PythonPandas.com

How to Calculate the Percentage of a Column in Pandas?



Pandas is a popular Python library for data manipulation and analysis. A common task is calculating the percentage of a column in a DataFrame, that is, expressing each value as a share of the column's total (value / total * 100). In this article, we explore several ways to do this in Pandas.

Method 1: Using the apply() Method

The apply() method in Pandas applies a function along an axis of a DataFrame, or to each value of a Series (a single column). To calculate the percentage of a column with apply(), we first compute the column total, then apply a function that divides a single value by that total and multiplies it by 100. Calling apply() on the column runs this function on every value.

Here is the code to calculate the percentage of a column using the apply() method:

import pandas as pd 
# create a sample dataframe 
data = {'name': ['John', 'Emma', 'Kate', 'Josh'], 
        'score': [80, 75, 90, 85]} 

df = pd.DataFrame(data) 

# calculate the percentage of the 'score' column 
total = df['score'].sum() 
df['percentage'] = df['score'].apply(lambda x: (x / total) * 100) 
print(df)

Output:

   name  score  percentage
0  John     80   24.242424
1  Emma     75   22.727273
2  Kate     90   27.272727
3  Josh     85   25.757576
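Since every value is divided by the same column total, the new percentage column should add up to 100, give or take floating-point rounding. A quick sanity check on the Method 1 sample data:

```python
import math

import pandas as pd

# Same sample data as in Method 1
df = pd.DataFrame({'name': ['John', 'Emma', 'Kate', 'Josh'],
                   'score': [80, 75, 90, 85]})

total = df['score'].sum()
df['percentage'] = df['score'].apply(lambda x: (x / total) * 100)

# Shares of a whole should add up to 100; compare with a tolerance
# because floating-point rounding can leave tiny differences
percent_total = df['percentage'].sum()
print(math.isclose(percent_total, 100.0))  # True
```

The same check works for every method in this article, since they all produce the same column.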

Method 2: Using the div() Method

The div() method in Pandas divides a Series or DataFrame element-wise by a scalar or by another Series or DataFrame (it is equivalent to the / operator). To calculate the percentage of a column with div(), we divide the column by the sum of its values and then multiply the result by 100, here using the companion mul() method.

Here is the code to calculate the percentage of a column using the div() method:

import pandas as pd 
# create a sample dataframe 
data = {'name': ['John', 'Emma', 'Kate', 'Josh'], 
        'score': [80, 75, 90, 85]} 

df = pd.DataFrame(data) 
# calculate the percentage of the 'score' column 
total = df['score'].sum() 
df['percentage'] = df['score'].div(total).mul(100) 
print(df)

Output:

   name  score  percentage
0  John     80   24.242424
1  Emma     75   22.727273
2  Kate     90   27.272727
3  Josh     85   25.757576
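div() also divides one column by another row by row, which is useful when each row has its own denominator instead of one shared column total. In this sketch, max_score is a hypothetical column added to the Method 2 sample data, and each score is expressed as a percentage of its own row's maximum:

```python
import pandas as pd

# Hypothetical: each student has their own maximum possible score
df = pd.DataFrame({'name': ['John', 'Emma', 'Kate', 'Josh'],
                   'score': [80, 75, 90, 85],
                   'max_score': [100, 100, 120, 100]})

# Series.div() aligns both columns on the index and divides row by row
df['pct_of_max'] = df['score'].div(df['max_score']).mul(100)
print(df)
```

Note that this answers a different question (each score relative to its own maximum) from the column-total percentage above, so the results do not add up to 100.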

Method 3: Using the sum() Method

The sum() method in Pandas returns the sum of the values in a column (or row). To calculate the percentage of a column with it, we first compute the column total with sum(), then divide each value in the column by that total, multiply by 100, and assign the result to a new column:

import pandas as pd 
# create example dataframe 
data = {'A': [1, 2, 3, 4, 5], 
        'B': [10, 20, 30, 40, 50]} 

df = pd.DataFrame(data) 
# calculate percentage and create new column 
total = df['B'].sum() 
df['B_Percentage'] = df['B'] / total * 100 
print(df)

Output:

   A   B  B_Percentage
0  1  10      6.666667
1  2  20     13.333333
2  3  30     20.000000
3  4  40     26.666667
4  5  50     33.333333

In this example, we first create a DataFrame with two columns A and B. We then calculate the total of column B using the sum() method. Next, we divide each value in the B column by the total and multiply it by 100 to get the percentage. Finally, we create a new column B_Percentage and assign the calculated percentages to it.
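If you compute column percentages in several places, the same steps (sum, divide, multiply by 100, assign) can be wrapped in a small function. This is a sketch only: add_percentage_column is a hypothetical helper, not part of the Pandas API, and its default column name simply follows the B_Percentage naming used above.

```python
from typing import Optional

import pandas as pd


def add_percentage_column(df: pd.DataFrame, col: str,
                          new_col: Optional[str] = None) -> pd.DataFrame:
    """Return a copy of df with each value of `col` as a percentage of its total.

    Hypothetical helper for illustration; not part of the pandas API.
    """
    new_col = new_col or f'{col}_Percentage'
    out = df.copy()                      # leave the caller's DataFrame untouched
    total = out[col].sum()               # sum() skips NaN values by default
    out[new_col] = out[col] / total * 100
    return out


df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50]})
print(add_percentage_column(df, 'B'))
```

Returning a copy is a design choice here; assigning into the original DataFrame, as the examples above do, works just as well for one-off scripts.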

Method 4: Using the apply() Method with a Lambda Function

Another way to calculate the percentage of a column is to use the apply() method along with a lambda function. Here’s an example:

import pandas as pd 
data = {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]} 

df = pd.DataFrame(data) 
# compute the total once, outside the lambda, so it is not recalculated for every value 
total = df['B'].sum() 
# calculate percentage using apply() method and lambda function 
df['B_Percentage'] = df['B'].apply(lambda x: (x / total) * 100) 
print(df)

Output:

   A   B  B_Percentage
0  1  10      6.666667
1  2  20     13.333333
2  3  30     20.000000
3  4  40     26.666667
4  5  50     33.333333

In this example, we use the apply() method to apply a lambda function to each value in the B column. The lambda function divides each value by the total of the B column and multiplies it by 100 to get the percentage. The total is computed once before calling apply(); calling df['B'].sum() inside the lambda would give the same result but recompute the sum for every row. Finally, we assign the calculated percentages to a new column, B_Percentage.
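Because apply() calls the lambda once per value, it produces the same numbers as plain vectorized division by the column total. The sketch below, reusing the A/B sample data, compares the two; the vectorized form is generally the faster choice on large columns because it avoids a Python function call per row.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [10, 20, 30, 40, 50]})

total = df['B'].sum()

# Element by element: apply() calls the Python lambda once per value
via_apply = df['B'].apply(lambda x: x / total * 100)

# Vectorized: one arithmetic expression over the whole Series
via_vectorized = df['B'] / total * 100

# Largest absolute difference between the two approaches
max_diff = (via_apply - via_vectorized).abs().max()
print(max_diff)
```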

Method 5: Using the mul() Method

The mul() method in Pandas multiplies each element of a Series by a given value (it is equivalent to the * operator). Here we multiply each value in the column by 100 first and then divide by the sum of the column with div().

import pandas as pd 

# create sample dataframe 
df = pd.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'], 
                    'Score': [70, 80, 90, 85, 75] }) 

# calculate percentage using mul() method 
df['Percentage'] = df['Score'].mul(100).div(df['Score'].sum()) 
# print dataframe 
print(df)

Output:

      Name  Score  Percentage
0    Alice     70       17.50
1      Bob     80       20.00
2  Charlie     90       22.50
3    David     85       21.25
4    Emily     75       18.75
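Multiplying by 100 before dividing (Method 5) and dividing before multiplying (Method 2) are mathematically the same, but floating-point arithmetic can round the two orders slightly differently in the last digits. A small sketch that compares them with a tolerance rather than exact equality:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
                   'Score': [70, 80, 90, 85, 75]})

total = df['Score'].sum()

# Multiply first, then divide (as in Method 5)
mul_first = df['Score'].mul(100).div(total)

# Divide first, then multiply (as in Method 2)
div_first = df['Score'].div(total).mul(100)

# Compare with a tolerance instead of ==
max_diff = (mul_first - div_first).abs().max()
print(max_diff < 1e-9)
```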


Frequently Asked Questions — How to Calculate the Percentage of a Column in Pandas

How do I calculate the percentage of a column in Pandas?

Divide the column by its total and multiply by 100:

df['percentage'] = (df['col'] / df['col'].sum()) * 100

How do I calculate the percentage of each row in a specific column?

Use vectorized division to find each row’s share of the column total:

df['col_percent'] = df['col'] / df['col'].sum() * 100

How do I calculate percentages for all numeric columns in a DataFrame?

Select the numeric columns first, then use apply() with a lambda to divide each column by its own sum:

df_percent = df.select_dtypes(include='number').apply(lambda x: (x / x.sum()) * 100)
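A runnable sketch for a mixed-type DataFrame; the name, math and science columns are made-up sample data. The text column is excluded before division, because dividing it by its sum would fail.

```python
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Emma', 'Kate', 'Josh'],
                   'math': [80, 75, 90, 85],
                   'science': [60, 70, 80, 90]})

# Keep only numeric columns; the 'name' column holds text
numeric = df.select_dtypes(include='number')

# apply() works column by column (axis=0 is the default)
df_percent = numeric.apply(lambda x: (x / x.sum()) * 100)
print(df_percent.round(2))
```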

How do I calculate the percentage of a column grouped by another column?

Use groupby() with transform('sum') to get each row's share of its group total:

df['perc'] = df['value'] / df.groupby('category')['value'].transform('sum') * 100
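A runnable sketch with made-up data: transform('sum') broadcasts each group's total back to every row of that group, so the division stays aligned with the original rows and the percentages within each category add up to 100.

```python
import pandas as pd

# Hypothetical values in two categories
df = pd.DataFrame({'category': ['A', 'A', 'B', 'B', 'B'],
                   'value': [20, 60, 10, 30, 60]})

# Each row gets its own group's total (A -> 80, B -> 100)
group_total = df.groupby('category')['value'].transform('sum')
df['perc'] = df['value'] / group_total * 100
print(df)
```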

How do I show percentages with two decimal places in Pandas?

Use round() or formatting:

df['perc'] = ((df['col'] / df['col'].sum()) * 100).round(2)
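round() keeps the column numeric, while string formatting produces text labels that are convenient for display but can no longer be used in calculations. A small sketch of both, using a made-up col column; the '{:.2f}%' format string is just one choice:

```python
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 3]})

perc = df['col'] / df['col'].sum() * 100

# Numeric rounding: still a float column you can compute with
df['perc'] = perc.round(2)

# String formatting: for display only, the result is text
df['perc_label'] = perc.map('{:.2f}%'.format)
print(df)
```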

How to calculate cumulative percentage in Pandas?

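Sort the data (for example, in descending order), then divide the running total from cumsum() by the column total:

df = df.sort_values('col', ascending=False)
df['cum_perc'] = df['col'].cumsum() / df['col'].sum() * 100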
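A runnable sketch with made-up counts. Sorting in descending order first means the running total shows how much of the whole the largest items account for; the last row always reaches 100.

```python
import pandas as pd

# Hypothetical counts per item
df = pd.DataFrame({'item': ['a', 'b', 'c', 'd'],
                   'col': [10, 50, 15, 25]})

# Largest first, so each row shows the share covered by the top items so far
df = df.sort_values('col', ascending=False)
df['cum_perc'] = df['col'].cumsum() / df['col'].sum() * 100
print(df)
```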

How to calculate percentage change between rows?

Use the pct_change() method, which compares each row with the previous one:

df['perc_change'] = df['col'].pct_change() * 100
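A small sketch with made-up values. pct_change() computes (current / previous) - 1, so the first row has nothing to compare against and comes out as NaN.

```python
import pandas as pd

df = pd.DataFrame({'col': [100, 110, 99]})

# Row 1: 110 vs 100 is about +10 %; row 2: 99 vs 110 is about -10 %
df['perc_change'] = df['col'].pct_change() * 100
print(df)
```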

How to convert a percentage column back to decimal format?

Divide by 100:

df['decimal'] = df['perc'] / 100

Can I calculate percentages ignoring NaN values?

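Yes. sum() skips NaN values by default (skipna=True is the default), so the total is computed from the non-missing values. Rows whose value is NaN will also have NaN as their percentage.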
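A small sketch with made-up values showing both effects: the total ignores the missing value, and the missing row stays NaN in the percentage column. If you prefer to show missing rows as 0%, you could fill them afterwards with fillna(0); that is a presentation choice, not a Pandas default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [50, np.nan, 150]})

# sum() skips NaN by default, so the total here is 50 + 150 = 200
total = df['col'].sum()

# The NaN input stays NaN; the other rows use the NaN-free total
df['perc'] = df['col'] / total * 100
print(df)
```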

What’s the best way to calculate and visualize percentages in Pandas?

Use df.plot(kind='bar', y='perc') to quickly visualize calculated percentages.
