In Pandas, iterrows() is a popular method for iterating over DataFrame rows as (index, Series) pairs. Sometimes, you might want to modify or update values in your DataFrame while iterating through rows. While it is possible to update values within an iterrows() loop, there are more efficient ways to handle such operations in Pandas. In this article, we will explore how to update values using iterrows() and discuss best practices for better performance.
To update values while iterating with iterrows(), write the new value back to the DataFrame by index. The row object yielded on each iteration is a Pandas Series, and modifications made to it within the loop do not directly affect the original DataFrame.
In the following example, we’ll update the “Age” column to add 5 years to each person’s age using iterrows():
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Update values using iterrows()
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5  # Add 5 years to each 'Age' value

print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, the “Age” column is updated successfully by iterating through each row with iterrows() and using df.at[] to set the new value at the respective index.
While you can modify the row within the loop, doing so does not affect the DataFrame. This is because iterrows() yields each row as a newly built Series, which is generally a copy rather than a view of the original data. Therefore, you must use an index-based setter, like df.at[], to modify the DataFrame in place.
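As a minimal sketch of this behavior, the snippet below assigns to the row object only and then inspects the DataFrame. The variable name ages_after_loop is introduced here purely for illustration, and the DataFrame is a shortened version of the one used earlier.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
})

for index, row in df.iterrows():
    # This changes only the temporary Series for the current row
    row['Age'] = row['Age'] + 5

# The DataFrame itself still holds the original ages
ages_after_loop = df['Age'].tolist()
print(ages_after_loop)  # [25, 30, 22]
```

Because the columns here have mixed dtypes, each row is assembled into a separate object Series, so the assignment never reaches df.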
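Although iterrows() works for updating values, it can be slow for large DataFrames because it builds a new Series for every row and runs the loop in Python. A more efficient approach is the apply() function, which applies a function to each value of a Series (or along a DataFrame axis, by row or column). For a single-column update like this one, it is typically faster than iterrows() because it avoids the per-row Series construction and the per-cell df.at[] writes.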
Here’s an example of how to use apply() to achieve the same result of adding 5 years to each person’s age:
# Using apply() for better performance
# (starting again from the original DataFrame, before the iterrows() update)
df['Age'] = df['Age'].apply(lambda x: x + 5)
print(df)
Output:
    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male
In this example, apply() runs a lambda function that adds 5 to each value in the “Age” column, which is typically more efficient than using iterrows(). For simple arithmetic like this, the vectorized form df['Age'] + 5 is usually faster still.
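To check the performance claim on your own machine, a rough timing sketch such as the following may help. The helper names (make_df, with_iterrows, with_apply, with_vectorized), the row count, and the random test data are all assumptions made for illustration. Absolute timings vary by machine and Pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd


def make_df(n=2000):
    # Hypothetical test data: n rows of random ages
    rng = np.random.default_rng(0)
    return pd.DataFrame({'Age': rng.integers(18, 80, size=n)})


def with_iterrows(df):
    df = df.copy()  # work on a copy so every run starts from the same data
    for index, row in df.iterrows():
        df.at[index, 'Age'] = row['Age'] + 5
    return df


def with_apply(df):
    df = df.copy()
    df['Age'] = df['Age'].apply(lambda x: x + 5)
    return df


def with_vectorized(df):
    df = df.copy()
    df['Age'] = df['Age'] + 5
    return df


base = make_df()

# Total seconds for 3 runs of each approach
timings = {
    func.__name__: timeit.timeit(lambda: func(base), number=3)
    for func in (with_iterrows, with_apply, with_vectorized)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

All three functions produce the same ages, so only the elapsed time differs between them.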
While you can update values in a DataFrame using iterrows(), it is not the most efficient way, especially for large datasets. If you do iterate, use df.at[] to write each new value back to the DataFrame, because changes made to the row object are discarded. For better performance, prefer vectorized operations, or apply() when the logic cannot be vectorized, when updating values in a Pandas DataFrame.
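For completeness, here is a minimal sketch of the vectorized form of the same update, reusing the example DataFrame from earlier. The arithmetic is applied to the whole column at once, with no explicit Python loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Vectorized update: add 5 to every value in the 'Age' column in one step
df['Age'] = df['Age'] + 5

print(df['Age'].tolist())  # [30, 35, 27]
```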