
Pandas – How to Update Values in iterrows

In Pandas, iterrows() is a popular method for iterating over DataFrame rows as (index, Series) pairs. Sometimes you may want to update values in the DataFrame while looping over its rows. This is possible with iterrows(), but Pandas offers faster ways to do the same thing. In this article, we will look at how to update values correctly inside an iterrows() loop and then cover faster alternatives.

Using iterrows() to Update Values

Each row that iterrows() yields is a Pandas Series built from the DataFrame's data. Changing values on that row object inside the loop does not update the original DataFrame. To make an update stick, you have to write the new value back to the DataFrame itself, using the row's index label with an accessor such as df.at[] or df.loc[].

Example of Updating Values Using iterrows()

In the following example, we’ll update the “Age” column to add 5 years to each person’s age using iterrows():

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Update values using iterrows()
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5  # Add 5 years to each 'Age' value

print(df)

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male

In this example, the loop only reads the current age from row. The new value is written back with df.at[index, 'Age'], which sets the value in the original DataFrame at that row's index label.
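iterrows() is often used when the new value depends on a condition. The sketch below assumes the same three-row DataFrame and a hypothetical rule (not part of the original example) that adds 5 years only to rows where Gender is 'Male'. The write still goes through df.at[].

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Hypothetical rule for illustration: add 5 years only where Gender is 'Male'
for index, row in df.iterrows():
    if row['Gender'] == 'Male':
        # `index` is the row label, which is what df.at expects
        df.at[index, 'Age'] = row['Age'] + 5

print(df)
```

Alice's age stays at 30, while John and Bob become 30 and 27.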

Why Direct Modifications Within iterrows() Don’t Affect the DataFrame

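Assigning to row inside the loop (for example, row['Age'] = 30) changes only that row object, not the DataFrame. iterrows() yields each row as a new Series. Depending on the column data types, this is a copy rather than a view of the DataFrame's data, so writing to it has no effect on the DataFrame. The pandas documentation also advises against modifying something you are iterating over. Rather than modifying the row, write changes back through the DataFrame using the row's index label, with an accessor such as df.at[] or df.loc[].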
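The difference is easy to demonstrate. This minimal sketch uses the same sample data and assigns to row inside the loop instead of using df.at[]. With mixed column types like these, each row is a separate Series, so the DataFrame keeps its original ages.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

for _, row in df.iterrows():
    # This only changes the temporary `row` Series, not df
    row['Age'] = row['Age'] + 5

# The ages are still the original values
print(df['Age'].tolist())
```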

Alternative: Using apply() for Better Performance

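Although iterrows() works for updating values, it can be slow on large DataFrames because it creates a new Series for every row and runs the loop body in Python. A common alternative is apply(). DataFrame.apply() calls a function along an axis (on each column, or on each row with axis=1), and Series.apply() calls it on each value of a single column. It is typically faster than iterrows(), although it still calls a Python function for every value or row.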
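When the new value depends on several columns of the same row, DataFrame.apply() with axis=1 passes each row to the function as a Series. The sketch below uses a hypothetical helper, adjusted_age, and a hypothetical rule (not from the original example) that adds 5 years only where Gender is 'Male'. It assumes the same sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

def adjusted_age(row):
    # Hypothetical rule for illustration: only 'Male' rows get 5 years added
    if row['Gender'] == 'Male':
        return row['Age'] + 5
    return row['Age']

# axis=1 passes each row (as a Series) to adjusted_age; the results form a new column
df['Age'] = df.apply(adjusted_age, axis=1)

print(df)
```

This keeps the rule in one named function. Like the iterrows() loop, however, it still calls Python code once per row.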

Using apply() to Update Values

Here’s an example of how to use apply() to achieve the same result of adding 5 years to each person’s age:

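# Recreate the original data, since the iterrows() loop above already changed 'Age'
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22], 'Gender': ['Male', 'Female', 'Male']})

# Using apply(): add 5 to each value in the 'Age' column
df['Age'] = df['Age'].apply(lambda x: x + 5)

print(df)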

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male

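In this example, Series.apply() runs the lambda on each value in the “Age” column, adding 5 to it. Because it works on a single column, it avoids constructing a Series for every row, which is the main reason it typically outperforms the iterrows() loop.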
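For plain arithmetic like this, a vectorized column expression avoids calling a Python function per value altogether. Here is a minimal sketch on the same sample data. The second half uses a boolean mask with df.loc[] to apply a hypothetical conditional rule (adding 5 years only where Gender is 'Male'), which is the vectorized counterpart of an if statement inside a loop.

```python
import pandas as pd

original = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Vectorized: add 5 to every value in the 'Age' column in one operation
df = original.copy()
df['Age'] = df['Age'] + 5  # shorthand: df['Age'] += 5

# Vectorized conditional update (hypothetical rule): add 5 only where Gender is 'Male'
df_cond = original.copy()
mask = df_cond['Gender'] == 'Male'
df_cond.loc[mask, 'Age'] += 5

print(df['Age'].tolist())       # [30, 35, 27]
print(df_cond['Age'].tolist())  # [30, 30, 27]
```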

Summary

You can update values in a DataFrame inside an iterrows() loop, but it is not the most efficient approach, especially for large datasets. If you do use iterrows(), write each change back with an accessor such as df.at[] rather than modifying the row object. For better performance, prefer vectorized operations, and fall back to apply() when the logic cannot easily be vectorized.
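You can check the performance difference on your own machine. The rough timing sketch below assumes a synthetic single-column DataFrame of 10,000 random ages. The size, the random seed, and the function names are arbitrary choices for illustration. Absolute times depend on hardware and the pandas version, so none are quoted here.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data for timing only: 10,000 random ages
rng = np.random.default_rng(0)
base = pd.DataFrame({'Age': rng.integers(18, 80, size=10_000)})

def with_iterrows(df):
    df = df.copy()
    for index, row in df.iterrows():
        df.at[index, 'Age'] = row['Age'] + 5
    return df

def with_apply(df):
    df = df.copy()
    df['Age'] = df['Age'].apply(lambda x: x + 5)
    return df

def vectorized(df):
    df = df.copy()
    df['Age'] = df['Age'] + 5
    return df

results = {}
for func in (with_iterrows, with_apply, vectorized):
    start = time.perf_counter()
    results[func.__name__] = func(base)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__:>14}: {elapsed:.4f} s")

# All three approaches should produce identical ages
assert results['with_iterrows'].equals(results['vectorized'])
assert results['with_apply'].equals(results['vectorized'])
```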
