Python

Pandas Join Dataframes

Joining DataFrames in Pandas

Joining DataFrames is a crucial operation in data analysis. It enables you to combine multiple data sources for deeper insights. Pandas provides several methods to join DataFrames flexibly. In this guide, we will explore the various ways to join DataFrames using merge(), join(), concat(), and other advanced techniques.

Sample DataFrames


import pandas as pd

# Sample DataFrames
data1 = {'Name': ['John', 'Alice', 'Bob', 'Eve'],
         'Age': [25, 30, 22, 35],
         'Gender': ['Male', 'Female', 'Male', 'Female']}
df1 = pd.DataFrame(data1)

data2 = {'Name': ['John', 'Alice', 'Bob', 'Charlie'],
         'Salary': [50000, 55000, 40000, 48000]}
df2 = pd.DataFrame(data2)

data3 = {'ID': [1, 2, 3, 4],
         'Department': ['HR', 'IT', 'Finance', 'Marketing']}
df3 = pd.DataFrame(data3)

print(df1)
print(df2)
print(df3)
    

1. Joining DataFrames Using merge()

The merge() function is versatile and offers SQL-like joins, making it ideal for combining DataFrames based on specific columns.

Inner Join


merged_df = pd.merge(df1, df2, on='Name', how='inner')
print(merged_df)
    

Left Join


left_joined_df = pd.merge(df1, df2, on='Name', how='left')
print(left_joined_df)
    

Right Join


right_joined_df = pd.merge(df1, df2, on='Name', how='right')
print(right_joined_df)
    

Outer Join


outer_joined_df = pd.merge(df1, df2, on='Name', how='outer')
print(outer_joined_df)
    

Multi-Key Join


data4 = {'Name': ['John', 'Alice', 'Bob'],
         'Age': [25, 30, 22],
         'Score': [85, 90, 78]}
df4 = pd.DataFrame(data4)

multi_key_merge = pd.merge(df1, df4, on=['Name', 'Age'], how='inner')
print(multi_key_merge)
    

2. Joining DataFrames Using join()

The join() method is index-based, making it suitable for combining DataFrames with meaningful indices.

Basic Join


df1.set_index('Name', inplace=True)
df2.set_index('Name', inplace=True)

joined_df = df1.join(df2, how='inner')
print(joined_df)
    

Joining on Non-Index Columns


reset_df1 = df1.reset_index()
reset_df2 = df2.reset_index()

joined_on_column = reset_df1.join(reset_df2.set_index('Name'), on='Name')
print(joined_on_column)
    

3. Concatenating DataFrames Using concat()

Vertical Concatenation


vertical_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=0)
print(vertical_concat)
    

Horizontal Concatenation


horizontal_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
print(horizontal_concat)
    

Concatenation with Keys


concat_with_keys = pd.concat([df1, df2], keys=['Dataset1', 'Dataset2'])
print(concat_with_keys)
    

4. Cross Join (Cartesian Product)


df1['key'] = 1
df3['key'] = 1

cross_joined_df = pd.merge(df1, df3, on='key').drop('key', axis=1)
print(cross_joined_df)
    

5. Updating DataFrames Using update()


df1.update(df2)
print(df1)
    

6. Merging with Conditions (Custom Join)


custom_join = df1[df1['Name'].isin(df2['Name'])]
print(custom_join)
    

7. Merging Using Indicator


merge_with_indicator = pd.merge(df1, df2, on='Name', how='outer', indicator=True)
print(merge_with_indicator)
    

8. Using combine_first() for Hierarchical Data


combined_df = df1.combine_first(df2)
print(combined_df)
    

Summary

Pandas provides a rich set of tools to join and combine DataFrames, including:

  • merge(): SQL-like joins (inner, left, right, outer, cross).
  • join(): Index-based joins.
  • concat(): Concatenates DataFrames along rows or columns.
  • Custom Joins: Filter-based merging.
  • update(): Updates a DataFrame with another.
  • combine_first(): Combines two DataFrames, prioritizing the first.
admin

Share
Published by
admin

Recent Posts

Pandas Access Column by Name

Pandas: How to Access Columns by Name In Pandas, accessing columns by name is a…

1 month ago

Pandas Accessing Columns by index

Pandas: How to Access or Select Columns by Index, not by Name In Pandas, accessing…

1 month ago

Pandas Access Row by index

Pandas: How to Access Row by Index In Pandas, you can access rows in a…

1 month ago

Pandas Access column using iterrows

Pandas: How to Access a Column Using iterrows() In Pandas, iterrows() is commonly used to…

1 month ago

Pandas Update Values in iterrows

Pandas - How to Update Values in iterrows In Pandas, iterrows() is a popular method…

1 month ago

Pandas iterrows keyerror – How to Fix

Pandas KeyError When Using iterrows() In Pandas, the iterrows() method is often used to iterate…

1 month ago