Pandas dataframe with examples

Pandas DataFrame: Creation, Manipulation & Real-World Examples



Pandas DataFrames are the workhorse of data analysis in Python. They store tabular data in labeled rows and columns and provide fast, flexible operations for manipulating it. This article shows how to create, manipulate, and analyze data with a Pandas DataFrame. Each operation comes with a code example and the output it prints.

Creating a Pandas DataFrame

There are several ways to create a Pandas DataFrame. A common one is to pass a dictionary to pd.DataFrame(). The keys become the column names, and the values (lists of equal length) become the column data.

 import pandas as pd

 data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
         'quantity': [3, 2, 5, 1],
         'price': [2.50, 1.00, 0.75, 1.50]}

 df = pd.DataFrame(data)
 print(df)
 
   fruits  quantity  price
 0   apple         3   2.50
 1  banana         2   1.00
 2  orange         5   0.75
 3   grape         1   1.50
 

> Why use this? This method is useful when you have data readily available in a dictionary format and want to quickly create a DataFrame.
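Two other common constructors take the data row by row instead of column by column. The sketch below uses the same fruit data split into two pieces. The first piece is a list of dictionaries, with one dictionary per row. The second is a list of lists with explicit column names. The variable names are illustrative. pd.concat() is used only to show that the two pieces line up into one table.

```python
import pandas as pd

# From a list of dictionaries: each dict is one row, and its keys become columns.
records = [
    {'fruits': 'apple', 'quantity': 3, 'price': 2.50},
    {'fruits': 'banana', 'quantity': 2, 'price': 1.00},
]
df_records = pd.DataFrame(records)

# From a list of lists: each inner list is one row, so the column names
# must be supplied through the columns parameter.
rows = [['orange', 5, 0.75], ['grape', 1, 1.50]]
df_rows = pd.DataFrame(rows, columns=['fruits', 'quantity', 'price'])

# Both pieces share the same columns, so they stack into one four-row table.
combined = pd.concat([df_records, df_rows], ignore_index=True)
print(combined)
```

Building from rows suits data that arrives record by record, such as parsed JSON. The dictionary form suits data that is already organized by column.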

Selecting Columns in a Pandas DataFrame

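Selecting specific columns is a fundamental operation. Put a single column name in brackets (df['price']) to get that column as a Series. Put a list of names in brackets (df[['fruits', 'price']], note the double brackets) to get a DataFrame containing just those columns.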

 import pandas as pd

 data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
         'quantity': [3, 2, 5, 1],
         'price': [2.50, 1.00, 0.75, 1.50]}

 df = pd.DataFrame(data)

 # Selecting the 'fruits' and 'price' columns
 selected_columns = df[['fruits', 'price']]
 print(selected_columns)
 
   fruits  price
 0   apple   2.50
 1  banana   1.00
 2  orange   0.75
 3   grape   1.50
 

> Why use this? This is useful when you only need a subset of the data for analysis or visualization.
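The sketch below contrasts the two bracket forms. It also shows the label-based .loc and position-based .iloc indexers, which select rows and columns in a single step. The particular rows and columns chosen are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana', 'orange', 'grape'],
                   'quantity': [3, 2, 5, 1],
                   'price': [2.50, 1.00, 0.75, 1.50]})

# One column name in single brackets returns a Series (one-dimensional).
prices = df['price']
print(type(prices).__name__)        # Series

# A list of names returns a DataFrame, even when the list has one name.
price_frame = df[['price']]
print(type(price_frame).__name__)   # DataFrame

# .loc selects by label: index labels 1 through 2 (inclusive) and two named columns.
subset = df.loc[1:2, ['fruits', 'price']]

# .iloc selects by integer position: the first two rows and first two columns
# (the end position is exclusive, as in ordinary Python slicing).
corner = df.iloc[:2, :2]

print(subset)
print(corner)
```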

Filtering Rows in a Pandas DataFrame

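Filtering rows based on conditions is a key part of data analysis. Pandas does this with boolean indexing. A comparison such as df['quantity'] > 2 produces one True/False value per row. Indexing the DataFrame with that result keeps only the rows marked True.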

 import pandas as pd

 data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
         'quantity': [3, 2, 5, 1],
         'price': [2.50, 1.00, 0.75, 1.50]}

 df = pd.DataFrame(data)

 # Filtering for rows where quantity is greater than 2
 filtered_df = df[df['quantity'] > 2]
 print(filtered_df)
 
   fruits  quantity  price
 0   apple         3   2.50
 2  orange         5   0.75
 

> When to use this? Use this when you need to isolate specific subsets of your data based on defined criteria.
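Filters often combine several conditions. The sketch below reuses the fruit data to show three techniques. It joins two conditions with &, tests membership with isin(), and writes the same combined filter as a query() string. The thresholds and variable names are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana', 'orange', 'grape'],
                   'quantity': [3, 2, 5, 1],
                   'price': [2.50, 1.00, 0.75, 1.50]})

# The comparison on its own is a boolean Series with one value per row.
mask = df['quantity'] > 2
print(mask.tolist())   # [True, False, True, False]

# Combine conditions with & (and), | (or) and ~ (not). Each condition needs its
# own parentheses because & and | bind more tightly than > and <.
cheap_and_plentiful = df[(df['quantity'] > 2) & (df['price'] < 1.00)]

# isin() keeps rows whose value appears in the given list.
orange_or_grape = df[df['fruits'].isin(['orange', 'grape'])]

# query() expresses the same kind of condition as a string.
same_via_query = df.query('quantity > 2 and price < 1.00')

print(cheap_and_plentiful)
print(orange_or_grape)
```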

Adding a New Column to a Pandas DataFrame

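To add a new column to a Pandas DataFrame, assign values to a new column name with df['name'] = .... The values can be a Series, a list of matching length, or a single scalar. Arithmetic between existing columns, as in the example below, is computed row by row.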

 import pandas as pd

 data = {'fruits': ['apple', 'banana', 'orange', 'grape'],
         'quantity': [3, 2, 5, 1],
         'price': [2.50, 1.00, 0.75, 1.50]}

 df = pd.DataFrame(data)

 # Adding a new column 'total_cost'
 df['total_cost'] = df['quantity'] * df['price']
 print(df)
 
   fruits  quantity  price  total_cost
 0   apple         3   2.50        7.50
 1  banana         2   1.00        2.00
 2  orange         5   0.75        3.75
 3   grape         1   1.50        1.50
 

> Why use this? This is useful when you want to create new features based on existing data within the DataFrame.
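The sketch below shows two other common ways to add a column, using the same data. The first is assign(), which returns a new DataFrame instead of modifying the original. The second is numpy.where(), which fills a column based on a condition. The order_size column and its 5.00 threshold are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'fruits': ['apple', 'banana', 'orange', 'grape'],
                   'quantity': [3, 2, 5, 1],
                   'price': [2.50, 1.00, 0.75, 1.50]})

# assign() returns a new DataFrame with the extra column and leaves df unchanged.
# The lambda receives the DataFrame, which makes assign() easy to chain.
with_total = df.assign(total_cost=lambda d: d['quantity'] * d['price'])

# np.where() picks one value where the condition is True and another where it
# is False. The 'order_size' column and 5.00 threshold are hypothetical examples.
with_total['order_size'] = np.where(with_total['total_cost'] >= 5.00, 'large', 'small')

print(with_total)
print('total_cost' in df.columns)   # False: the original df was not modified
```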

Grouping and Aggregating Data in a Pandas DataFrame

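Grouping and aggregating data lets you perform calculations on subsets of your data. The groupby() method works in three steps. It splits the rows into groups that share a key value, applies an aggregation such as sum() or mean() to each group, and combines the results.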

 import pandas as pd

 data = {'category': ['A', 'B', 'A', 'B', 'A'],
         'value': [10, 20, 15, 25, 12]}

 df = pd.DataFrame(data)

 # Grouping by 'category' and calculating the sum of 'value'
 grouped_df = df.groupby('category')['value'].sum()
 print(grouped_df)
 
 category
 A    37
 B    45
 Name: value, dtype: int64
 

> When to use this? This is perfect for summarizing data and calculating statistics for different groups within your dataset.
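groupby() is not limited to one statistic. The sketch below applies several aggregations at once with agg(). It then uses named aggregation to choose the output column names, where total and largest are arbitrary names. Finally, it flattens the result with reset_index().

```python
import pandas as pd

df = pd.DataFrame({'category': ['A', 'B', 'A', 'B', 'A'],
                   'value': [10, 20, 15, 25, 12]})

# agg() with a list applies several aggregations; each becomes a column.
summary = df.groupby('category')['value'].agg(['sum', 'mean', 'count'])
print(summary)

# Named aggregation: output_name=(input_column, aggregation).
named = df.groupby('category').agg(total=('value', 'sum'),
                                   largest=('value', 'max'))

# reset_index() turns the group keys back into an ordinary column.
flat = named.reset_index()
print(flat)
```

For the data above, category A comes out with a sum of 37 and a count of 3. Category B comes out with a sum of 45 and a mean of 22.5.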

Applying Functions to a Pandas DataFrame

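The apply() method runs a custom function over your data. On a Series, as below, it calls the function once per element. On a DataFrame, it passes each column (the default, axis=0) or each row (axis=1) to the function.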

 import pandas as pd

 data = {'numbers': [1, 2, 3, 4, 5]}
 df = pd.DataFrame(data)

 # A custom function that squares a single value
 def square(x):
     return x * x

 # apply() calls square() once for each element of the 'numbers' column
 df['squared'] = df['numbers'].apply(square)
 print(df)
 
   numbers  squared
 0        1        1
 1        2        4
 2        3        9
 3        4       16
 4        5       25
 

> Why use this? Applying custom functions allows for flexible data transformation and feature engineering.
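apply() is flexible, but it calls a Python function once per element or row. For simple arithmetic, a vectorized expression is usually the better choice. The sketch below first computes the same squares with ** 2. It then uses apply(axis=1) for a row-wise case that reads two columns. The describe() helper is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'numbers': [1, 2, 3, 4, 5]})

# Vectorized arithmetic gives the same result as apply(square) and is typically
# faster, since it runs in compiled code rather than one Python call per element.
df['squared'] = df['numbers'] ** 2

# Hypothetical helper: with axis=1, each row arrives as a Series indexed by
# column name, so the function can combine several columns.
def describe(row):
    parity = 'even' if row['numbers'] % 2 == 0 else 'odd'
    return f"{row['numbers']} is {parity}, squared {row['squared']}"

df['description'] = df.apply(describe, axis=1)
print(df)
```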

Frequently Asked Questions

What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table.
How do I install Pandas?
You can install Pandas using pip: pip install pandas.
Can I create a Pandas DataFrame from a CSV file?
Yes, you can use the pd.read_csv() function to create a Pandas DataFrame from a CSV file.
How can I iterate over rows in a Pandas DataFrame?
You can iterate over rows with iterrows() or the generally faster itertuples(). However, vectorized operations on whole columns are usually much faster than any row-by-row loop and are the recommended approach.
What is boolean indexing in Pandas DataFrames?
Boolean indexing selects data using a sequence of True/False values, typically a boolean Series produced by a condition such as df['quantity'] > 2. Rows marked True are kept, which makes it the standard way to filter rows.
How do I handle missing data in a Pandas DataFrame?
Pandas provides functions like fillna(), dropna(), and interpolate() to handle missing data.
What are some common data manipulation tasks with Pandas DataFrames?
Common tasks include selecting columns, filtering rows, adding new columns, grouping and aggregating data, sorting, and applying functions.
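Several of these answers name specific functions. The sketch below applies read_csv(), dropna(), fillna(), sort_values() and itertuples() to a small CSV. The CSV content and the fill choices (0 for quantity, the column mean for price) are made-up assumptions for illustration.

```python
import io

import pandas as pd

# An in-memory string stands in for a CSV file. With a real file, pass its path
# to pd.read_csv() instead. Two fields are deliberately left empty.
csv_text = """fruits,quantity,price
apple,3,2.50
banana,,1.00
orange,5,
grape,1,1.50
"""
df = pd.read_csv(io.StringIO(csv_text))

# Empty fields are read as missing values (NaN); count them per column.
print(df.isna().sum())

# dropna() removes every row that contains a missing value.
complete_rows = df.dropna()

# fillna() replaces missing values, here with a different default per column.
filled = df.fillna({'quantity': 0, 'price': df['price'].mean()})

# sort_values() orders the rows by a column, highest price first.
by_price = filled.sort_values('price', ascending=False)

# If a row-by-row loop is unavoidable, itertuples() yields namedtuples.
for row in by_price.itertuples(index=False):
    print(row.fruits, row.quantity, row.price)
```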
