Pandas – How to Update Values in iterrows
In Pandas, iterrows() is a popular method for iterating over DataFrame rows as (index, Series) pairs. Sometimes, you might want to modify or update values in your DataFrame while iterating through rows. While it is possible to update values within iterrows(), there are more efficient ways to handle such operations in Pandas. In this article, we will explore how to update values using iterrows() and discuss best practices for better performance.
Using iterrows() to Update Values
To update values while iterating with iterrows(), write the new value back to the DataFrame by index instead of assigning to the row object. The row object is a Pandas Series built for each iteration, so it’s important to remember that modifications made to the row within the loop do not directly affect the original DataFrame.
Example of Updating Values Using iterrows()
In the following example, we’ll update the “Age” column to add 5 years to each person’s age using iterrows():
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Update values using iterrows()
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5  # Add 5 years to each 'Age' value

print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, the “Age” column is updated successfully using iterrows() by iterating through each row and using df.at[] to set the new value at the respective index.
Why Direct Modifications Within iterrows() Don’t Affect the DataFrame
While you can modify the row within the loop, doing so does not affect the DataFrame. This is because iterrows() yields each row as a new Series, which is generally a copy of the data rather than a reference to the original DataFrame. Therefore, you must use an index-based method, like df.at[], to modify the DataFrame in place.
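To make this concrete, here is a minimal sketch that assigns to row directly, using the same sample data as the example above. With mixed column types like these, the assignment only changes the temporary Series, and the DataFrame keeps its original values.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Assigning to `row` changes only the temporary Series for this iteration
for index, row in df.iterrows():
    row['Age'] = row['Age'] + 5

# The DataFrame still holds the original ages
print(df['Age'].tolist())  # [25, 30, 22]
```

Replacing the assignment with df.at[index, 'Age'] = row['Age'] + 5 is what makes the change reach the DataFrame.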
Alternative: Using apply() for Better Performance
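Although iterrows() works for updating values, it can be slow for large DataFrames because it builds a new Series for every row and runs the loop in Python. A more efficient approach is to use the apply() function, which applies a function along a DataFrame axis (rows or columns) or, when called on a single column, to each value in that column. It is generally faster than iterrows() for large datasets.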
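One way to check the performance difference on your own machine is to time both approaches on the same data. The sketch below is only an illustration: the 2,000-row DataFrame and the variable names are made up for the comparison, and the measured times will vary with hardware and Pandas version.

```python
import time

import pandas as pd

# Hypothetical test data: 2,000 rows with a single integer column
n_rows = 2_000
big = pd.DataFrame({'Age': [i % 80 for i in range(n_rows)]})

# Approach 1: iterrows() with a df.at[] write-back
df_loop = big.copy()
start = time.perf_counter()
for index, row in df_loop.iterrows():
    df_loop.at[index, 'Age'] = row['Age'] + 5
loop_seconds = time.perf_counter() - start

# Approach 2: apply() on the column
df_apply = big.copy()
start = time.perf_counter()
df_apply['Age'] = df_apply['Age'].apply(lambda x: x + 5)
apply_seconds = time.perf_counter() - start

print(f"iterrows: {loop_seconds:.4f} s")
print(f"apply:    {apply_seconds:.4f} s")
```

Both approaches produce the same column values, so any difference in the printed times comes from how the update is carried out.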
Using apply() to Update Values
Here’s an example of how to use apply() to achieve the same result of adding 5 years to each person’s age:
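# Recreate the original DataFrame so the ages start from 25, 30, 22
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22], 'Gender': ['Male', 'Female', 'Male']})

# Using apply() for better performance
df['Age'] = df['Age'].apply(lambda x: x + 5)

print(df)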
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, apply() is used to apply a lambda function that adds 5 to each value in the “Age” column, which is more efficient than using iterrows().
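For a simple arithmetic update like this one, a vectorized operation on the whole column is another option, and it avoids calling a Python function for each value. The following is a minimal sketch using the same sample data. It assumes the update can be written as a plain column expression, which is not always the case for more complex row logic.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Vectorized update: the addition is applied to the whole column at once
df['Age'] = df['Age'] + 5

print(df['Age'].tolist())  # [30, 35, 27]
```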
Summary
While you can update values in a DataFrame using iterrows(), it is not the most efficient way, especially for large datasets. If you do iterate, use df.at[] to modify the DataFrame in place, because changes made to the row object are not written back. For better performance, consider using vectorized operations or apply() when updating values in a Pandas DataFrame.