Pandas Join Dataframes

| 0 Comments| 6:14 pm


Joining DataFrames in Pandas

Joining DataFrames is a crucial operation in data analysis. It enables you to combine multiple data sources for deeper insights. Pandas provides several methods to join DataFrames flexibly. In this guide, we will explore the various ways to join DataFrames using merge(), join(), concat(), and other advanced techniques.

Sample DataFrames


import pandas as pd

# Sample DataFrames
data1 = {'Name': ['John', 'Alice', 'Bob', 'Eve'],
         'Age': [25, 30, 22, 35],
         'Gender': ['Male', 'Female', 'Male', 'Female']}
df1 = pd.DataFrame(data1)

data2 = {'Name': ['John', 'Alice', 'Bob', 'Charlie'],
         'Salary': [50000, 55000, 40000, 48000]}
df2 = pd.DataFrame(data2)

data3 = {'ID': [1, 2, 3, 4],
         'Department': ['HR', 'IT', 'Finance', 'Marketing']}
df3 = pd.DataFrame(data3)

print(df1)
print(df2)
print(df3)
    

1. Joining DataFrames Using merge()

The merge() function is versatile and offers SQL-like joins, making it ideal for combining DataFrames based on specific columns.

Inner Join


merged_df = pd.merge(df1, df2, on='Name', how='inner')
print(merged_df)
    

Left Join


left_joined_df = pd.merge(df1, df2, on='Name', how='left')
print(left_joined_df)
    

Right Join


right_joined_df = pd.merge(df1, df2, on='Name', how='right')
print(right_joined_df)
    

Outer Join


outer_joined_df = pd.merge(df1, df2, on='Name', how='outer')
print(outer_joined_df)
    

Multi-Key Join


data4 = {'Name': ['John', 'Alice', 'Bob'],
         'Age': [25, 30, 22],
         'Score': [85, 90, 78]}
df4 = pd.DataFrame(data4)

multi_key_merge = pd.merge(df1, df4, on=['Name', 'Age'], how='inner')
print(multi_key_merge)
    

2. Joining DataFrames Using join()

The join() method is index-based, making it suitable for combining DataFrames with meaningful indices.

Basic Join


df1.set_index('Name', inplace=True)
df2.set_index('Name', inplace=True)

joined_df = df1.join(df2, how='inner')
print(joined_df)
    

Joining on Non-Index Columns


reset_df1 = df1.reset_index()
reset_df2 = df2.reset_index()

joined_on_column = reset_df1.join(reset_df2.set_index('Name'), on='Name')
print(joined_on_column)
    

3. Concatenating DataFrames Using concat()

Vertical Concatenation


vertical_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=0)
print(vertical_concat)
    

Horizontal Concatenation


horizontal_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
print(horizontal_concat)
    

Concatenation with Keys


concat_with_keys = pd.concat([df1, df2], keys=['Dataset1', 'Dataset2'])
print(concat_with_keys)
    

4. Cross Join (Cartesian Product)


df1['key'] = 1
df3['key'] = 1

cross_joined_df = pd.merge(df1, df3, on='key').drop('key', axis=1)
print(cross_joined_df)
    

5. Updating DataFrames Using update()


df1.update(df2)
print(df1)
    

6. Merging with Conditions (Custom Join)


custom_join = df1[df1['Name'].isin(df2['Name'])]
print(custom_join)
    

7. Merging Using Indicator


merge_with_indicator = pd.merge(df1, df2, on='Name', how='outer', indicator=True)
print(merge_with_indicator)
    

8. Using combine_first() for Hierarchical Data


combined_df = df1.combine_first(df2)
print(combined_df)
    

Summary

Pandas provides a rich set of tools to join and combine DataFrames, including:

  • merge(): SQL-like joins (inner, left, right, outer, cross).
  • join(): Index-based joins.
  • concat(): Concatenates DataFrames along rows or columns.
  • Custom Joins: Filter-based merging.
  • update(): Updates a DataFrame with another.
  • combine_first(): Combines two DataFrames, prioritizing the first.

Leave a Reply

Your email address will not be published. Required fields are marked *

Recommended Post