Pandas – How to Update Values in iterrows

In Pandas, iterrows() is a popular method for iterating over DataFrame rows as (index, Series) pairs. Sometimes, you might want to modify or update values in your DataFrame while iterating through rows. While it is possible to update values within iterrows(), there are more efficient ways to handle such operations in Pandas. In this article, we will explore how to update values using iterrows() and discuss best practices for better performance.

Using iterrows() to Update Values

To update values while iterating with iterrows(), write the new value back to the DataFrame itself, using the row's index label with an accessor such as df.at[]. The row object that iterrows() yields is a separate Pandas Series, so changing it inside the loop does not reliably change the original DataFrame.

Example of Updating Values Using iterrows()

In the following example, we’ll update the “Age” column to add 5 years to each person’s age using iterrows():

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Update values using iterrows()
for index, row in df.iterrows():
    df.at[index, 'Age'] = row['Age'] + 5  # Add 5 years to each 'Age' value

print(df)

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male

In this example, the loop reads each person’s current age from row['Age'] and writes the new value back with df.at[index, 'Age'], which sets a single cell identified by its row label and column name. The DataFrame is updated because the assignment targets df, not the row object.
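The same read-from-row, write-with-df.at[] pattern also covers updates that depend on another column. The sketch below is a small variation on the example above; the rule it applies (add 5 years only where “Gender” is 'Male') is hypothetical and chosen purely for illustration.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Hypothetical rule for illustration: add 5 years only where Gender is 'Male'
for index, row in df.iterrows():
    if row['Gender'] == 'Male':
        # Read from the row, but write to the DataFrame by label
        df.at[index, 'Age'] = row['Age'] + 5

print(df['Age'].tolist())
```

Only the rows for John and Bob change; Alice's age is left as it was.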

Why Direct Modifications Within iterrows() Don’t Affect the DataFrame

Assigning to the row inside the loop (for example, row['Age'] = ...) does not reliably change the DataFrame. iterrows() builds a new Series for each row, and depending on the column data types that Series is a copy of the data, not a view of it, so anything written to it is lost. To change the DataFrame itself, write through an index-based accessor such as df.at[] or df.loc[].
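A quick way to see this behavior is to assign to the row object and then inspect the DataFrame afterwards. The following is a minimal sketch using the same sample data as above; it assumes a DataFrame with mixed column types, as in this article, where each row is handed out as a copy.

```python
import pandas as pd

# Same sample data as in the example above (mixed string and integer columns)
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

for index, row in df.iterrows():
    # This changes only the temporary Series, not the DataFrame
    row['Age'] = row['Age'] + 5

# The 'Age' column still holds the original values
print(df['Age'].tolist())
```

The printed ages are unchanged, which is why the earlier example writes through df.at[] and not through the row.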

Alternative: Using apply() for Better Performance

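Although iterrows() works for updating values, it is slow on large DataFrames because it constructs a new Series for every row. A usually faster approach for a single column is apply(), which calls a function on each value of the column without building row objects (on a DataFrame, apply() can also run a function along an axis, by rows or by columns). Because apply() still calls a Python function once per value, a vectorized expression such as df['Age'] + 5 is typically faster still.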

Using apply() to Update Values

Here’s an example of how to use apply() to achieve the same result of adding 5 years to each person’s age:

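import pandas as pd

# Rebuild the original DataFrame so the result matches the output below
df = pd.DataFrame({'Name': ['John', 'Alice', 'Bob'], 'Age': [25, 30, 22], 'Gender': ['Male', 'Female', 'Male']})
# Using apply() for better performance than iterrows()
df['Age'] = df['Age'].apply(lambda x: x + 5)

print(df)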

Output:

    Name  Age  Gender
0   John   30    Male
1  Alice   35  Female
2    Bob   27    Male

In this example, apply() calls a lambda function that adds 5 to each value in the “Age” column. This avoids building a Series for every row and is generally faster than iterrows().
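For simple arithmetic like this, a vectorized operation on the whole column is usually the simplest and fastest option, since it needs neither a loop nor a per-value function call. A minimal sketch, again starting from the original sample data:

```python
import pandas as pd

# Same sample data as in the examples above
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 22],
    'Gender': ['Male', 'Female', 'Male']
})

# Vectorized update: add 5 to every value in the 'Age' column at once
df['Age'] = df['Age'] + 5

print(df['Age'].tolist())
```

The resulting ages match the iterrows() and apply() versions above (30, 35, 27).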

Summary

You can update a DataFrame while looping with iterrows(), but only by writing back through an index-based accessor such as df.at[]; changes made to the row object itself are not reliably saved. Because iterrows() is slow on large datasets, prefer vectorized operations, or apply() when you need custom per-value logic.

Published by
admin
