Joining DataFrames is a crucial operation in data analysis. It enables you to combine multiple data sources for deeper insights. Pandas provides several methods to join DataFrames flexibly. In this guide, we will explore the various ways to join DataFrames using merge()
, join()
, concat()
, and other advanced techniques.
import pandas as pd
# Sample DataFrames
data1 = {'Name': ['John', 'Alice', 'Bob', 'Eve'],
'Age': [25, 30, 22, 35],
'Gender': ['Male', 'Female', 'Male', 'Female']}
df1 = pd.DataFrame(data1)
data2 = {'Name': ['John', 'Alice', 'Bob', 'Charlie'],
'Salary': [50000, 55000, 40000, 48000]}
df2 = pd.DataFrame(data2)
data3 = {'ID': [1, 2, 3, 4],
'Department': ['HR', 'IT', 'Finance', 'Marketing']}
df3 = pd.DataFrame(data3)
print(df1)
print(df2)
print(df3)
merge()
The merge()
function is versatile and offers SQL-like joins, making it ideal for combining DataFrames based on specific columns.
merged_df = pd.merge(df1, df2, on='Name', how='inner')
print(merged_df)
left_joined_df = pd.merge(df1, df2, on='Name', how='left')
print(left_joined_df)
right_joined_df = pd.merge(df1, df2, on='Name', how='right')
print(right_joined_df)
outer_joined_df = pd.merge(df1, df2, on='Name', how='outer')
print(outer_joined_df)
data4 = {'Name': ['John', 'Alice', 'Bob'],
'Age': [25, 30, 22],
'Score': [85, 90, 78]}
df4 = pd.DataFrame(data4)
multi_key_merge = pd.merge(df1, df4, on=['Name', 'Age'], how='inner')
print(multi_key_merge)
join()
The join()
method is index-based, making it suitable for combining DataFrames with meaningful indices.
df1.set_index('Name', inplace=True)
df2.set_index('Name', inplace=True)
joined_df = df1.join(df2, how='inner')
print(joined_df)
reset_df1 = df1.reset_index()
reset_df2 = df2.reset_index()
joined_on_column = reset_df1.join(reset_df2.set_index('Name'), on='Name')
print(joined_on_column)
concat()
vertical_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=0)
print(vertical_concat)
horizontal_concat = pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
print(horizontal_concat)
concat_with_keys = pd.concat([df1, df2], keys=['Dataset1', 'Dataset2'])
print(concat_with_keys)
df1['key'] = 1
df3['key'] = 1
cross_joined_df = pd.merge(df1, df3, on='key').drop('key', axis=1)
print(cross_joined_df)
update()
df1.update(df2)
print(df1)
custom_join = df1[df1['Name'].isin(df2['Name'])]
print(custom_join)
merge_with_indicator = pd.merge(df1, df2, on='Name', how='outer', indicator=True)
print(merge_with_indicator)
combine_first()
for Hierarchical Data
combined_df = df1.combine_first(df2)
print(combined_df)
Pandas provides a rich set of tools to join and combine DataFrames, including:
merge()
: SQL-like joins (inner
, left
, right
, outer
, cross
).join()
: Index-based joins.concat()
: Concatenates DataFrames along rows or columns.update()
: Updates a DataFrame with another.combine_first()
: Combines two DataFrames, prioritizing the first.Pandas: How to Access Columns by Name In Pandas, accessing columns by name is a…
Pandas: How to Access or Select Columns by Index, not by Name In Pandas, accessing…
Pandas: How to Access Row by Index In Pandas, you can access rows in a…
Pandas: How to Access a Column Using iterrows() In Pandas, iterrows() is commonly used to…
Pandas - How to Update Values in iterrows In Pandas, iterrows() is a popular method…
Pandas KeyError When Using iterrows() In Pandas, the iterrows() method is often used to iterate…