Pandas — How to Update Values in iterrows
In Pandas, iterrows() is a popular method for iterating over DataFrame rows as (index, Series) pairs. Sometimes, you might want to modify or update values in your DataFrame while iterating through rows. While it is possible to update values within iterrows(), there are more efficient ways to handle such operations in Pandas. In this article, we will explore how to update values using iterrows() and discuss best practices for better performance.
Using iterrows() to Update Values
To update values while iterating with iterrows(), write the new value back to the DataFrame itself, for example with df.at[]. The row object that iterrows() yields is a Pandas Series, and changes made to that Series inside the loop generally do not reach the original DataFrame.
Example of Updating Values Using iterrows()
In the following example, we’ll update the “Age” column to add 5 years to each person’s age using iterrows():
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Update values using iterrows()
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5  # Add 5 years to each 'Age' value

print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, the loop reads each row's current “Age” from the row object and writes the new value back to the DataFrame with df.at[index, 'Age'], so the update persists after the loop.
Why Direct Modifications Within iterrows() Don’t Affect the DataFrame
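You can assign to the row object inside the loop, but doing so generally does not change the DataFrame. iterrows() builds a new Series for each row, and depending on the column dtypes that Series is a copy rather than a view of the underlying data. The Pandas documentation also notes that iterrows() does not preserve dtypes across a row and that writing to the yielded row is not guaranteed to have any effect. To change the DataFrame, use a label-based setter such as df.at[] or df.loc[].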
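The behavior described above can be checked with a small sketch. The sample data is made up for illustration; the DataFrame mixes string and integer columns, so each yielded row is a separate object-dtype Series.

```python
import pandas as pd

# Hypothetical sample data with mixed dtypes (strings and integers)
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

for index, row in df.iterrows():
    # This assignment changes only the temporary Series for this row
    row['Age'] = row['Age'] + 5

# The DataFrame still holds the original ages
ages_after_loop = df['Age'].tolist()
print(ages_after_loop)  # [25, 30, 22]
```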
Alternative: Using apply() for Better Performance
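Although iterrows() works for updating values, it can be slow for large DataFrames because it builds a Series object for every row. Calling apply() on a single column avoids that per-row overhead and is usually faster. Note that apply() still calls a Python function once per element, so for simple arithmetic or comparisons a vectorized expression on the whole column is typically faster than both.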
Using apply() to Update Values
Here’s an example of how to use apply() to add 5 years to each person’s age, starting again from the original DataFrame (ages 25, 30, 22):
# Using apply() for better performance
df['Age'] = df['Age'].apply(lambda x: x + 5)
print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, apply() is used to apply a lambda function that adds 5 to each value in the “Age” column, which is more efficient than using iterrows().
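For arithmetic this simple, a vectorized expression on the column is another option. The following is a minimal sketch on the same sample data; it produces the same ages without a Python-level function call per element.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Vectorized update: the addition is applied to the whole column at once
df['Age'] = df['Age'] + 5

ages = df['Age'].tolist()
print(ages)  # [30, 35, 27]
```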
Summary
While you can update values in a DataFrame using iterrows(), it is not the most efficient way, especially for large datasets. The df.at[] method is used to modify the DataFrame in place during iteration. For better performance, consider using vectorized operations or apply() when updating values in a Pandas DataFrame.
Frequently Asked Questions — How to Update Values in iterrows() in Pandas
How do I update column values using iterrows() in Pandas?
You can modify DataFrame values by using df.at[] or df.loc[] inside the iterrows() loop. Example:
for i, row in df.iterrows():
    df.at[i, 'column_name'] = row['column_name'] * 2
This updates the DataFrame in place.
Why can’t I update values directly inside iterrows() using row['col']?
Because row is a copy (a Pandas Series), not a reference to the DataFrame. Changes to row don’t affect the original DataFrame unless you use df.at or df.loc.
How to conditionally update values while iterating with iterrows()?
Use an if-condition inside the loop:
for i, row in df.iterrows():
    if row['A'] > 10:
        df.at[i, 'B'] = 'High'
Is it efficient to update values using iterrows()?
No. iterrows() is slow for large DataFrames. Use vectorized operations or apply() for better performance:
df['B'] = df['A'].apply(lambda x: 'High' if x > 10 else 'Low')
How do I update multiple columns inside an iterrows() loop?
You can update multiple columns using df.at for each field:
for i, row in df.iterrows():
    df.at[i, 'A'] = row['A'] + 1
    df.at[i, 'B'] = row['B'].upper()
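The same two updates can usually be written without a loop. This is a sketch under the assumption that column A is numeric and column B holds strings; the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: 'A' is numeric, 'B' contains strings
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['low', 'mid', 'high']})

# Column-wise equivalents of the two per-row assignments
df['A'] = df['A'] + 1
df['B'] = df['B'].str.upper()

print(df['A'].tolist())  # [2, 3, 4]
print(df['B'].tolist())  # ['LOW', 'MID', 'HIGH']
```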
How to update a single row completely inside iterrows()?
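Use df.loc[i] to assign an entire row. The example below assumes every column is numeric; multiplying a row that contains strings by 2 would repeat each string instead:
for i, row in df.iterrows():
    df.loc[i] = row * 2
However, this is still slower than vectorized methods.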
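When a DataFrame mixes numeric and non-numeric columns, one way to double only the numeric values without a loop is to select those columns first. This is a sketch with made-up column names (name, x, y), not the only way to do it.

```python
import pandas as pd

# Hypothetical data with one string column and two numeric columns
df = pd.DataFrame({'name': ['a', 'b'], 'x': [1, 2], 'y': [1.5, 2.5]})

# Pick out the numeric columns, then update them in one vectorized step
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = df[numeric_cols] * 2

print(df['x'].tolist())     # [2, 4]
print(df['y'].tolist())     # [3.0, 5.0]
print(df['name'].tolist())  # ['a', 'b']
```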
How to update a DataFrame based on another column’s values?
Check one column and modify another during iteration:
for i, row in df.iterrows():
    if row['score'] < 50:
        df.at[i, 'grade'] = 'Fail'
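A loop-free version of this check uses a boolean mask with df.loc. The sketch below assumes a grade column that already exists and defaults to 'Pass'; the scores are sample values.

```python
import pandas as pd

# Hypothetical data: every row starts as 'Pass'
df = pd.DataFrame({
    'score': [40, 75, 50],
    'grade': ['Pass', 'Pass', 'Pass'],
})

# Select rows where score < 50 and overwrite 'grade' only for those rows
df.loc[df['score'] < 50, 'grade'] = 'Fail'

grades = df['grade'].tolist()
print(grades)  # ['Fail', 'Pass', 'Pass']
```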
Can I use iloc to update values inside iterrows()?
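Yes, but df.iloc expects integer positions, while iterrows() yields index labels. The two match only when the DataFrame has the default 0..n-1 index, so track the row position separately with enumerate():
for pos, (i, row) in enumerate(df.iterrows()):
    df.iloc[pos, 1] = row['A'] + 5  # row position pos, column position 1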
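Because iterrows() yields index labels rather than integer positions, position-based updates need extra care when the index is not 0..n-1. The sketch below uses a made-up DataFrame with string labels, tracks the row position with enumerate(), and looks up the column position with columns.get_loc() instead of hard-coding it.

```python
import pandas as pd

# Hypothetical data with a non-default (string) index
df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 0, 0]}, index=['x', 'y', 'z'])

# Integer position of column 'B'
b_pos = df.columns.get_loc('B')

# enumerate() supplies the integer row position that iloc needs
for pos, (label, row) in enumerate(df.iterrows()):
    df.iloc[pos, b_pos] = row['A'] + 5

b_values = df['B'].tolist()
print(b_values)  # [6, 7, 8]
```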
How to verify if updates made inside iterrows() worked?
Print or inspect your DataFrame after the loop:
print(df.head())
Make sure to use df.at or df.loc for changes to persist.
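Beyond printing, one way to check a loop programmatically is to compare its result with a vectorized computation of the same update. This is a sketch on sample data; the expected values are computed before the loop runs so the two results are independent.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 22]})

# Compute the expected result up front with a vectorized expression
expected = df['Age'] + 5

for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5

# Series.equals() is True when index, values, and dtype all match
matches = df['Age'].equals(expected)
print(matches)  # True
```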
What’s the best alternative to iterrows() for updating values?
Use vectorized operations or np.where() for speed:
import numpy as np
df['grade'] = np.where(df['score'] < 50, 'Fail', 'Pass')