The index in a Pandas DataFrame plays a critical role in structuring and accessing your data. While Pandas assigns a default sequential integer index to rows, customizing or renaming the index can make your DataFrame more descriptive and better aligned with the context of your analysis.
In this tutorial, we’ll explore various methods to rename, reset, and manipulate the index, with examples to demonstrate their usage.
We will use the following DataFrame throughout this article:
import pandas as pd
# Sample DataFrame
data = {
    'Employee ID': ['E001', 'E002', 'E003', 'E004', 'E005'],
    'Name': ['John Doe', 'Alice Smith', 'Bob Johnson', 'Eve Davis', 'Charlie Brown'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
    'Age': [28, 34, 29, 42, 31],
    'Salary': [50000, 60000, 52000, 75000, 49000]
}
df = pd.DataFrame(data)
print(df)
Output:
   Employee ID           Name  Department  Age  Salary
0         E001       John Doe          HR   28   50000
1         E002    Alice Smith          IT   34   60000
2         E003    Bob Johnson     Finance   29   52000
3         E004      Eve Davis   Marketing   42   75000
4         E005  Charlie Brown  Operations   31   49000
rename_axis()
The rename_axis() method assigns a descriptive label to the index axis, which is useful when you want to make clear what the index represents. It returns a new DataFrame and leaves the original unchanged.
# Rename the index axis to 'Record ID'
df_renamed_axis = df.rename_axis('Record ID')
print(df_renamed_axis)
Output:
           Employee ID           Name  Department  Age  Salary
Record ID
0                 E001       John Doe          HR   28   50000
1                 E002    Alice Smith          IT   34   60000
2                 E003    Bob Johnson     Finance   29   52000
3                 E004      Eve Davis   Marketing   42   75000
4                 E005  Charlie Brown  Operations   31   49000
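rename_axis() can also label the column axis. The sketch below uses a trimmed two-row version of the sample data, and the label 'Field' is only an illustrative choice, not something from the original example.

```python
import pandas as pd

# Trimmed-down stand-in for the article's sample data (assumption: two rows are enough)
df = pd.DataFrame({
    'Employee ID': ['E001', 'E002'],
    'Name': ['John Doe', 'Alice Smith'],
})

# Label both axes in one call; 'Field' is a hypothetical label for the column axis
labeled = df.rename_axis(index='Record ID', columns='Field')

print(labeled.index.name)    # Record ID
print(labeled.columns.name)  # Field
print(df.index.name)         # None (the original DataFrame is untouched)
```

Because rename_axis() returns a new object, it fits well in method chains where you do not want to modify the source DataFrame.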
set_index()
You can replace the default integer index with the values from a specific column using the set_index() method.
# Set 'Employee ID' as the new index
df_with_new_index = df.set_index('Employee ID')
print(df_with_new_index)
Output:
                      Name  Department  Age  Salary
Employee ID
E001              John Doe          HR   28   50000
E002           Alice Smith          IT   34   60000
E003           Bob Johnson     Finance   29   52000
E004             Eve Davis   Marketing   42   75000
E005         Charlie Brown  Operations   31   49000
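Once a column becomes the index, its values work as row labels for .loc lookups. The following sketch, on a trimmed copy of the sample data, also shows the drop=False option, which keeps the column in place as well as using it for the index.

```python
import pandas as pd

# Trimmed-down stand-in for the article's sample data
df = pd.DataFrame({
    'Employee ID': ['E001', 'E002', 'E003'],
    'Name': ['John Doe', 'Alice Smith', 'Bob Johnson'],
    'Salary': [50000, 60000, 52000],
})

indexed = df.set_index('Employee ID')

# Label-based lookup through the new index
print(indexed.loc['E002', 'Name'])  # Alice Smith

# drop=False keeps 'Employee ID' as a regular column too
kept = df.set_index('Employee ID', drop=False)
print(list(kept.columns))  # ['Employee ID', 'Name', 'Salary']
```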
You can set multiple columns as a multi-level index:
# Set 'Department' and 'Name' as a multi-level index
df_multi_index = df.set_index(['Department', 'Name'])
print(df_multi_index)
Output:
                          Employee ID  Age  Salary
Department Name
HR         John Doe              E001   28   50000
IT         Alice Smith           E002   34   60000
Finance    Bob Johnson           E003   29   52000
Marketing  Eve Davis             E004   42   75000
Operations Charlie Brown         E005   31   49000
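With a multi-level index, each level keeps its own name, and rows are addressed by a tuple of labels. Here is a small sketch on a trimmed copy of the sample data; the new level name 'Employee Name' is chosen only for illustration.

```python
import pandas as pd

# Trimmed-down stand-in for the article's sample data
df = pd.DataFrame({
    'Name': ['John Doe', 'Alice Smith', 'Bob Johnson'],
    'Department': ['HR', 'IT', 'Finance'],
    'Age': [28, 34, 29],
})

multi = df.set_index(['Department', 'Name'])
print(list(multi.index.names))  # ['Department', 'Name']

# Look up a single row by giving one label per level
print(multi.loc[('IT', 'Alice Smith'), 'Age'])  # 34

# Rename one level only by passing a mapping to rename_axis()
renamed = multi.rename_axis(index={'Name': 'Employee Name'})
print(list(renamed.index.names))  # ['Department', 'Employee Name']
```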
index.name
To rename the index axis without creating a new DataFrame, assign to the index.name attribute. This changes df itself, so the label stays attached until you clear it with df.index.name = None.
# Rename the index in place
df.index.name = 'Record ID'
print(df)
Output:
           Employee ID           Name  Department  Age  Salary
Record ID
0                 E001       John Doe          HR   28   50000
1                 E002    Alice Smith          IT   34   60000
2                 E003    Bob Johnson     Finance   29   52000
3                 E004      Eve Davis   Marketing   42   75000
4                 E005  Charlie Brown  Operations   31   49000
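If you would rather not mutate the DataFrame, Index.rename() (the method covered by the reference at the end of this article) returns a renamed copy of the index instead. The sketch below contrasts the two on a one-column stand-in for the sample data; the name 'Row' is just an example.

```python
import pandas as pd

# One-column stand-in for the article's sample data
df = pd.DataFrame({'Name': ['John Doe', 'Alice Smith']})

# Assigning to index.name changes df itself
df.index.name = 'Record ID'
print(df.index.name)  # Record ID

# Index.rename() returns a new Index and leaves df's index alone
new_index = df.index.rename('Row')
print(new_index.name)  # Row
print(df.index.name)   # Record ID

# Remove the label again by setting it to None
df.index.name = None
print(df.index.name)  # None
```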
reset_index()
The reset_index() method restores the default integer index. By default the previous index is kept as a regular column; pass drop=True to discard it instead.
# Set 'Employee ID' as the index and reset it
df_with_new_index = df.set_index('Employee ID')
df_reset_index = df_with_new_index.reset_index()
print(df_reset_index)
Output:
   Employee ID           Name  Department  Age  Salary
0         E001       John Doe          HR   28   50000
1         E002    Alice Smith          IT   34   60000
2         E003    Bob Johnson     Finance   29   52000
3         E004      Eve Davis   Marketing   42   75000
4         E005  Charlie Brown  Operations   31   49000
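The drop parameter of reset_index() decides whether the old index comes back as a column. A minimal comparison, using a trimmed copy of the sample data:

```python
import pandas as pd

# Trimmed-down stand-in for the article's sample data, indexed by 'Employee ID'
indexed = pd.DataFrame({
    'Employee ID': ['E001', 'E002'],
    'Salary': [50000, 60000],
}).set_index('Employee ID')

# Default: the old index returns as a regular column
kept = indexed.reset_index()
print(list(kept.columns))  # ['Employee ID', 'Salary']

# drop=True: the old index is discarded
dropped = indexed.reset_index(drop=True)
print(list(dropped.columns))  # ['Salary']
```

Use drop=True when the old index holds no information you need, for example after filtering rows that left gaps in an integer index.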
You can also replace the index values directly by assigning a new list of labels to the DataFrame's index attribute. The list must contain exactly one label per row.
# Rename index values directly
df_renamed_index_values = df.copy()
df_renamed_index_values.index = ['Row1', 'Row2', 'Row3', 'Row4', 'Row5']
print(df_renamed_index_values)
Output:
      Employee ID           Name  Department  Age  Salary
Row1         E001       John Doe          HR   28   50000
Row2         E002    Alice Smith          IT   34   60000
Row3         E003    Bob Johnson     Finance   29   52000
Row4         E004      Eve Davis   Marketing   42   75000
Row5         E005  Charlie Brown  Operations   31   49000
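Direct assignment needs one label per row. This sketch builds the labels programmatically so the count always matches, then shows that a list of the wrong length is rejected with a ValueError. The 'Row<n>' naming scheme simply mirrors the example above.

```python
import pandas as pd

# One-column stand-in for the article's sample data
df = pd.DataFrame({'Name': ['John Doe', 'Alice Smith', 'Bob Johnson']})

relabeled = df.copy()

# Generate one label per row instead of typing them by hand
relabeled.index = [f'Row{i}' for i in range(1, len(relabeled) + 1)]
print(list(relabeled.index))  # ['Row1', 'Row2', 'Row3']

# A list with the wrong number of labels is rejected
try:
    relabeled.index = ['Row1', 'Row2']
except ValueError as exc:
    print('ValueError:', exc)
```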
Using the rename() method, you can rename columns and index values in a single call. Only the labels listed in each mapping change; any label you leave out keeps its current name.
# Rename columns and index together
df_renamed = df.rename(
    columns={'Name': 'Employee Name', 'Salary': 'Monthly Salary'},
    index={0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'}
)
print(df_renamed)
Output:
   Employee ID  Employee Name  Department  Age  Monthly Salary
A         E001       John Doe          HR   28           50000
B         E002    Alice Smith          IT   34           60000
C         E003    Bob Johnson     Finance   29           52000
D         E004      Eve Davis   Marketing   42           75000
E         E005  Charlie Brown  Operations   31           49000
If df still carries an index name such as 'Record ID' from an earlier step, that label is preserved and printed above the index column.
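rename() accepts a function as well as a dictionary, and by default it silently ignores mapping keys that do not exist. The sketch below, on a one-column stand-in for the sample data, shows a partial mapping, a function-based rename, and the errors='raise' option that turns a missing key into a KeyError.

```python
import pandas as pd

# One-column stand-in for the article's sample data
df = pd.DataFrame({'Name': ['John Doe', 'Alice Smith', 'Bob Johnson']})

# Partial mapping: only label 0 changes, labels 1 and 2 are kept
partial = df.rename(index={0: 'A'})
print(list(partial.index))

# A function is applied to every index label
by_func = df.rename(index=lambda i: f'Row{i + 1}')
print(list(by_func.index))  # ['Row1', 'Row2', 'Row3']

# errors='raise' reports mapping keys that are not in the index
try:
    df.rename(index={99: 'Z'}, errors='raise')
except KeyError as exc:
    print('KeyError:', exc)
```

Passing errors='raise' is a cheap safeguard against typos in the mapping, since a misspelled key would otherwise leave the label unchanged without any warning.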
Renaming the index and columns of a Pandas DataFrame provides flexibility and clarity in data analysis.
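Here’s a quick summary of the methods:
rename_axis(): labels the index axis and returns a new DataFrame.
set_index(): uses one or more existing columns as the index.
index.name: labels the index axis on the existing DataFrame.
reset_index(): restores the default integer index.
Assigning to index: replaces all index values with a new list of labels.
rename(): renames selected columns and index values through mappings.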
With these techniques, you can handle DataFrame indices effectively for any data analysis task.
Reference: https://pandas.pydata.org/docs/reference/api/pandas.Index.rename.html