Pandas DataFrames are the workhorse of data analysis in Python. They provide a flexible and efficient way to store and manipulate tabular data. This article provides a comprehensive guide on how to create, manipulate, and analyze data using the Pandas DataFrame, complete with practical examples and code snippets.
Creating a Pandas DataFrame
There are several ways to create a Pandas DataFrame. One common method is from a dictionary, where keys become column names, and values become column data.
import pandas as pd
data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
        'quantity': [3, 2, 5, 1],
        'price': [2.50, 1.00, 0.75, 1.50]}
df = pd.DataFrame(data)
print(df)

Output:
   fruits  quantity  price
0   apple         3   2.50
1  banana         2   1.00
2  orange         5   0.75
3   grape         1   1.50
> Why use this? This method is useful when you have data readily available in a dictionary format and want to quickly create a DataFrame.
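A dictionary of columns is only one of the several ways mentioned above. As a minimal sketch, the same kind of table can also be built from a list of row dictionaries, or from a list of rows plus explicit column names. The variable names below are illustrative and not part of the original example.

```python
import pandas as pd

# Each dictionary is one row; its keys become column names.
rows = [
    {'fruits': 'apple', 'quantity': 3, 'price': 2.50},
    {'fruits': 'banana', 'quantity': 2, 'price': 1.00},
]
df_from_records = pd.DataFrame(rows)

# Each inner list is one row; column names are supplied separately.
df_from_lists = pd.DataFrame(
    [['apple', 3, 2.50], ['banana', 2, 1.00]],
    columns=['fruits', 'quantity', 'price'],
)

# Both constructions describe the same table.
print(df_from_records.equals(df_from_lists))
```

Row-oriented input tends to be convenient when records arrive one at a time, for example from an API response, while the dictionary-of-columns form suits data that is already organized by column.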

Selecting Columns in a Pandas DataFrame
Selecting specific columns is a fundamental operation. You can select one or more columns by referencing their names.
import pandas as pd
data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
        'quantity': [3, 2, 5, 1],
        'price': [2.50, 1.00, 0.75, 1.50]}
df = pd.DataFrame(data)
# Selecting the 'fruits' and 'price' columns
selected_columns = df[['fruits', 'price']]
print(selected_columns)

Output:
   fruits  price
0   apple   2.50
1  banana   1.00
2  orange   0.75
3   grape   1.50
> Why use this? This is useful when you only need a subset of the data for analysis or visualization.
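One detail worth knowing when selecting by name is that the bracket style changes the return type. The sketch below uses a shortened version of the fruit table to show the difference; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana'],
                   'quantity': [3, 2],
                   'price': [2.50, 1.00]})

# Single brackets with one column name return a Series.
price_series = df['price']

# Double brackets (a list of names) return a DataFrame, even for one column.
price_frame = df[['price']]

print(type(price_series).__name__)  # Series
print(type(price_frame).__name__)   # DataFrame
```

Keeping the result as a DataFrame can matter when later code expects column labels or a two-dimensional shape.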
Filtering Rows in a Pandas DataFrame
Filtering rows based on certain conditions is a key part of data analysis. Here’s how to filter a Pandas DataFrame using boolean indexing.
import pandas as pd
data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
        'quantity': [3, 2, 5, 1],
        'price': [2.50, 1.00, 0.75, 1.50]}
df = pd.DataFrame(data)
# Filtering for rows where quantity is greater than 2
filtered_df = df[df['quantity'] > 2]
print(filtered_df)

Output:
   fruits  quantity  price
0   apple         3   2.50
2  orange         5   0.75
> When to use this? Use this when you need to isolate specific subsets of your data based on defined criteria.
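Filters often involve more than one criterion. As a hedged sketch on the same fruit table, conditions can be combined with `&` (and) or `|` (or), and `isin()` can test membership in a list. The thresholds and variable names here are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana', 'orange', 'grape'],
                   'quantity': [3, 2, 5, 1],
                   'price': [2.50, 1.00, 0.75, 1.50]})

# Combine conditions with & or |; each condition needs its own parentheses.
cheap_and_plentiful = df[(df['quantity'] > 2) & (df['price'] < 1.00)]

# isin() keeps rows whose value appears in the given list.
orange_or_grape = df[df['fruits'].isin(['orange', 'grape'])]

print(cheap_and_plentiful)
print(orange_or_grape)
```

The parentheses are required because `&` and `|` bind more tightly than comparison operators in Python.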
Adding a New Column to a Pandas DataFrame
Adding new columns to a Pandas DataFrame can be done by assigning a new Series to a new column name.
import pandas as pd
data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
        'quantity': [3, 2, 5, 1],
        'price': [2.50, 1.00, 0.75, 1.50]}
df = pd.DataFrame(data)
# Adding a new column 'total_cost'
df['total_cost'] = df['quantity'] * df['price']
print(df)

Output:
   fruits  quantity  price  total_cost
0   apple         3   2.50        7.50
1  banana         2   1.00        2.00
2  orange         5   0.75        3.75
3   grape         1   1.50        1.50
> Why use this? This is useful when you want to create new features based on existing data within the DataFrame.
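Direct assignment modifies the DataFrame in place. When the original should stay untouched, `assign()` is one alternative; the sketch below applies it to a shortened fruit table, with `with_total` as an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana'],
                   'quantity': [3, 2],
                   'price': [2.50, 1.00]})

# assign() returns a new DataFrame and leaves df unchanged.
with_total = df.assign(total_cost=df['quantity'] * df['price'])

print('total_cost' in df.columns)         # False
print(with_total['total_cost'].tolist())  # [7.5, 2.0]
```

This style also chains well with other methods, since each call returns a new DataFrame.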
Grouping and Aggregating Data in a Pandas DataFrame
Grouping and aggregating data allows you to perform calculations on subsets of your data. The groupby() method is powerful for this.
import pandas as pd
data = {'category': ['A', 'B', 'A', 'B', 'A'],
        'value': [10, 20, 15, 25, 12]}
df = pd.DataFrame(data)
# Grouping by 'category' and calculating the sum of 'value'
grouped_df = df.groupby('category')['value'].sum()
print(grouped_df)

Output:
category
A    37
B    45
Name: value, dtype: int64
> When to use this? This is perfect for summarizing data and calculating statistics for different groups within your dataset.
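A single `sum()` is the simplest case. As a sketch on the same category data, `agg()` can compute several statistics per group in one call; the choice of statistics here is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'A', 'B', 'A'],
                   'value': [10, 20, 15, 25, 12]})

# agg() with a list of function names returns one column per statistic.
summary = df.groupby('category')['value'].agg(['sum', 'mean', 'count'])
print(summary)
```

The result is a DataFrame indexed by category, so an individual figure can be read with `summary.loc['A', 'sum']`.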
Applying Functions to a Pandas DataFrame
Applying functions to a Pandas DataFrame lets you perform custom calculations on rows or columns.
import pandas as pd
data = {'numbers': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)
# Applying a square function to the 'numbers' column
def square(x):
    return x * x
df['squared'] = df['numbers'].apply(square)
print(df)

Output:
   numbers  squared
0        1        1
1        2        4
2        3        9
3        4       16
4        5       25
> Why use this? Applying custom functions allows for flexible data transformation and feature engineering.
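The example above applies a function to a single column. As a hedged sketch, the same square can be computed with a vectorized expression, and `apply()` with `axis=1` can be used when a function needs several columns from each row. The column names `squared_vectorized` and `sum_of_both` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 5]})

# Vectorized equivalent of the square() example: no Python-level loop.
df['squared_vectorized'] = df['numbers'] ** 2

# axis=1 passes each row to the function as a Series.
df['sum_of_both'] = df.apply(
    lambda row: row['numbers'] + row['squared_vectorized'], axis=1
)
print(df)
```

For simple arithmetic the vectorized form is usually faster; row-wise `apply()` is better kept for logic that cannot be expressed with column operations.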
Frequently Asked Questions
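What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, labeled data structure that stores tabular data in rows and columns.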
How do I install Pandas?
You can install Pandas using pip: pip install pandas.
Can I create a Pandas DataFrame from a CSV file?
Yes. Use the pd.read_csv() function to create a Pandas DataFrame from a CSV file.
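A minimal sketch of reading CSV data follows. To keep it self-contained, it reads from an in-memory string through `io.StringIO` rather than a file on disk; the CSV contents are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for a file on disk; pd.read_csv also accepts a file path.
csv_text = """fruits,quantity,price
apple,3,2.50
banana,2,1.00
"""

# The first line is treated as the header row by default.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```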
How can I iterate over rows in a Pandas DataFrame?
You can use iterrows(), but it’s generally recommended to use vectorized operations for better performance.
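To make the comparison concrete, the sketch below computes the same per-row totals twice: once with an `iterrows()` loop and once with a vectorized expression. The data and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'quantity': [3, 2, 5], 'price': [2.50, 1.00, 0.75]})

# Row by row: iterrows() yields (index, row) pairs, with each row as a Series.
totals_loop = []
for index, row in df.iterrows():
    totals_loop.append(row['quantity'] * row['price'])

# Vectorized: one expression over whole columns.
totals_vectorized = (df['quantity'] * df['price']).tolist()

print(totals_loop == totals_vectorized)
```

Both approaches give the same numbers here; the difference is that the loop runs in Python for every row, which tends to become slow on large tables.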
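What is boolean indexing in Pandas DataFrames?
Boolean indexing selects rows by placing a Series of True/False values, such as df['quantity'] > 2, inside the square brackets; only the rows marked True are kept.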
How do I handle missing data in a Pandas DataFrame?
Pandas provides methods such as fillna(), dropna(), and interpolate() to handle missing data.
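As a rough sketch of how these three methods differ, the example below applies each to a small column with two missing values. The data and the fill value of 0 are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [1.0, np.nan, 3.0, np.nan, 5.0]})

# fillna() replaces missing values with a chosen constant.
filled = df['value'].fillna(0)

# dropna() removes rows that contain missing values.
dropped = df.dropna()

# interpolate() estimates missing values from their neighbors (linear by default).
interpolated = df['value'].interpolate()

print(filled.tolist())
print(len(dropped))
print(interpolated.tolist())
```

Which method fits depends on the data: dropping loses rows, filling with a constant can distort statistics, and interpolation assumes the values change smoothly between neighbors.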
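What are some common data manipulation tasks with Pandas DataFrames?
Common tasks include selecting columns, filtering rows, adding new columns, grouping and aggregating data, and applying functions, all of which are covered in the sections above.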