How to iterate over rows in a Pandas DataFrame?
Iterating over rows in a Pandas DataFrame can be done in several ways, depending on the use case and performance requirements. Here are the most common methods:
1. Using iterrows()
This method returns an iterator generating pairs of index and Series for each row.
import pandas as pd
# Sample DataFrame
data = {'ID': [101, 102, 103], 'Name': ['Alice', 'Bob', 'Charlie'], 'Score': [85, 90, 78]}
df = pd.DataFrame(data)
# Iterate over rows
for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Score: {row['Score']}")
Output:
Index: 0, Name: Alice, Score: 85
Index: 1, Name: Bob, Score: 90
Index: 2, Name: Charlie, Score: 78
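One caveat: because iterrows() packs each row into a single Series, values from columns with different numeric types may be upcast to a common dtype. The following is a minimal sketch of that behavior, assuming a small hypothetical DataFrame (df_mixed, not part of the example above) with one integer column and one float column:

```python
import pandas as pd

# Hypothetical DataFrame with an integer column and a float column
df_mixed = pd.DataFrame({'ID': [101, 102], 'Score': [85.5, 90.0]})

# Take only the first (index, row) pair produced by iterrows()
first_index, first_row = next(df_mixed.iterrows())

print(df_mixed['ID'].dtype)  # integer dtype inside the DataFrame
print(first_row.dtype)       # float64: the row Series holds floats
print(first_row['ID'])       # 101.0
```

If the original column types matter, itertuples() is usually the safer choice, since it does not pack each row into a Series.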
2. Using itertuples()
This method returns each row as a named tuple and is generally faster than iterrows().
# Iterate using itertuples
for row in df.itertuples(index=True):
    print(f"ID: {row.ID}, Name: {row.Name}, Score: {row.Score}")
Output:
ID: 101, Name: Alice, Score: 85
ID: 102, Name: Bob, Score: 90
ID: 103, Name: Charlie, Score: 78
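With index=True (the default), the index is available as the Index field of each named tuple. If only the values are needed, passing index=False and name=None yields plain tuples instead. A short sketch, rebuilding the same sample data so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Alice', 'Bob', 'Charlie'],
                   'Score': [85, 90, 78]})

# The DataFrame index is exposed as the `Index` field of each named tuple
indexes = [row.Index for row in df.itertuples()]
print(indexes)

# index=False drops the index; name=None yields plain tuples
plain_rows = list(df.itertuples(index=False, name=None))
print(plain_rows[0])
```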
3. Using apply()
For row-wise operations, you can use the apply() function with axis=1, which calls the given function once per row.
# Apply a function to each row
df['Status'] = df.apply(lambda row: 'Pass' if row['Score'] >= 80 else 'Fail', axis=1)
print(df)
Output:
    ID     Name  Score  Status
0  101    Alice     85    Pass
1  102      Bob     90    Pass
2  103  Charlie     78    Fail
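Since apply() with axis=1 still calls the function once per row, a simple condition like this one can often be written as a column-wise operation instead. The sketch below uses numpy.where as one possible alternative; it is an illustration rather than the only way to do this, and it rebuilds the sample data so that it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [101, 102, 103],
                   'Name': ['Alice', 'Bob', 'Charlie'],
                   'Score': [85, 90, 78]})

# Compare the whole Score column at once instead of row by row
df['Status'] = np.where(df['Score'] >= 80, 'Pass', 'Fail')
print(df['Status'].tolist())
```

This produces the same Status column as the apply() version without running a Python function for each row.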
4. Using zip() with DataFrame Columns
When you only need a few columns, you can iterate over them directly using zip().
# Iterate using zip
for id_, name, score in zip(df['ID'], df['Name'], df['Score']):
    print(f"ID: {id_}, Name: {name}, Score: {score}")
Output:
ID: 101, Name: Alice, Score: 85
ID: 102, Name: Bob, Score: 90
ID: 103, Name: Charlie, Score: 78
5. Using to_dict()
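You can convert the DataFrame to a list of dictionaries, one per row, and iterate over that list. The output below assumes the original three-column DataFrame, without the Status column added in the apply() example.
# Convert the DataFrame to a list of dictionaries (one per row)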
data_dict = df.to_dict(orient='records')
for row in data_dict:
    print(row)
Output:
{'ID': 101, 'Name': 'Alice', 'Score': 85}
{'ID': 102, 'Name': 'Bob', 'Score': 90}
{'ID': 103, 'Name': 'Charlie', 'Score': 78}
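If the index labels are needed as well, orient='index' is one alternative: it returns a dictionary keyed by index label, with each value holding that row's data. A minimal sketch, assuming a hypothetical DataFrame (df_labeled) with string index labels chosen only for illustration:

```python
import pandas as pd

# Hypothetical DataFrame with custom index labels
df_labeled = pd.DataFrame({'ID': [101, 102], 'Name': ['Alice', 'Bob']},
                          index=['a', 'b'])

# orient='index' maps each index label to a dict of that row's values
rows_by_label = df_labeled.to_dict(orient='index')
for label, row in rows_by_label.items():
    print(label, row)
```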
When to Use Each Method
itertuples(): for faster iteration when reading rows without modification.
iterrows(): when you need row data as a Series (but avoid it for performance-critical tasks).
apply(): for row-wise operations without writing an explicit loop. With axis=1 it still calls the function once per row, so it is not truly vectorized.
zip(): for iterating over specific columns.
to_dict(): for working with row data as dictionaries.
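To check these trade-offs on your own data, the methods can be timed with the standard timeit module. The following is a rough sketch, assuming a hypothetical DataFrame (big_df) generated only for the comparison. The helper function names are illustrative, and actual timings depend on the machine, the pandas version, and the data:

```python
import timeit

import pandas as pd

# Hypothetical larger DataFrame, built only for this comparison
n = 2_000
big_df = pd.DataFrame({'ID': range(n),
                       'Score': [i % 100 for i in range(n)]})

def total_iterrows():
    # Sum the Score column one row (Series) at a time
    return sum(row['Score'] for _, row in big_df.iterrows())

def total_itertuples():
    # Sum the Score column one named tuple at a time
    return sum(row.Score for row in big_df.itertuples(index=False))

def total_zip():
    # Sum the Score column by zipping the columns together
    return sum(score for _, score in zip(big_df['ID'], big_df['Score']))

def total_vectorized():
    # Column-wise sum with no Python-level loop
    return int(big_df['Score'].sum())

for func in (total_iterrows, total_itertuples, total_zip, total_vectorized):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds:.4f} s")
```

All four functions compute the same total, so any difference in the printed times comes from the iteration method rather than from the work being done.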