Pandas – How to Select Columns

 

Selecting Columns in Pandas: A Complete Guide

When working with data in Pandas, selecting columns is one of the most common operations. Whether you’re extracting a single column or several, Pandas provides flexible ways to do it. In this guide, we’ll cover the main ways to select columns from a Pandas DataFrame: bracket and dot notation, the loc[] and iloc[] indexers, and the filter() method, each with a practical example.

Sample DataFrame

To demonstrate the various methods, we will use the following sample dataset:

import pandas as pd

# Create a sample DataFrame
data = {
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations']
}

df = pd.DataFrame(data)
print(df)
      Name  Age  Gender  Salary  Department
0     John   25    Male   50000          HR
1    Alice   30  Female   55000          IT
2      Bob   22    Male   40000     Finance
3      Eve   35  Female   70000   Marketing
4  Charlie   28    Male   48000  Operations

1. Selecting a Single Column

A) Using Bracket Notation

The most common way to select a single column is using bracket notation. This returns the column as a Pandas Series.

# Select the 'Age' column
age_column = df['Age']
print(age_column)
0    25
1    30
2    22
3    35
4    28
Name: Age, dtype: int64
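Single and double brackets return different types, which matters when you chain further operations. A minimal sketch, assuming the sample data above (rebuilt so the snippet runs on its own):

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# One label in single brackets -> a Series
age_series = df['Age']
# A list containing one label (double brackets) -> a one-column DataFrame
age_frame = df[['Age']]

print(type(age_series).__name__)  # Series
print(type(age_frame).__name__)   # DataFrame
print(age_frame.shape)            # (5, 1)
```

Use the double-bracket form when a later step expects a DataFrame rather than a Series.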

B) Using Dot Notation

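You can also use dot notation, but it has some limitations. It only works when the column name is a valid Python identifier (no spaces or special characters, and not starting with a digit) that does not clash with an existing DataFrame attribute or method such as count or shape. It also cannot create new columns; use bracket notation for assignment instead.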

# Select the 'Salary' column
salary_column = df.Salary
print(salary_column)
0    50000
1    55000
2    40000
3    70000
4    48000
Name: Salary, dtype: int64
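To see the dot-notation limits in practice, here is a small sketch using a hypothetical DataFrame, scores, whose column names (first name and count) are made up for illustration:

```python
import pandas as pd

# Hypothetical DataFrame with column names that break dot notation
scores = pd.DataFrame({
    'first name': ['John', 'Alice'],  # contains a space
    'count': [3, 5],                  # same name as the DataFrame.count() method
})

# Bracket notation works for any column label
print(scores['first name'].tolist())  # ['John', 'Alice']
print(scores['count'].tolist())       # [3, 5]

# scores.first name would be a SyntaxError, and scores.count
# returns the bound count() method rather than the column
print(callable(scores.count))         # True
```

Bracket notation avoids both problems, which is why it is usually the safer default.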

2. Selecting Multiple Columns

To select multiple columns, pass a list of column names inside the brackets (hence the double brackets). This returns a new DataFrame containing the columns in the order you listed them.

# Select 'Name', 'Age', and 'Salary' columns
selected_columns = df[['Name', 'Age', 'Salary']]
print(selected_columns)
      Name  Age  Salary
0     John   25   50000
1    Alice   30   55000
2      Bob   22   40000
3      Eve   35   70000
4  Charlie   28   48000
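Two behaviours of list selection are worth knowing: the columns come back in the order you list them, and any missing label raises a KeyError. A minimal sketch, assuming the sample data above; Bonus is a hypothetical column name that does not exist in it:

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# Columns come back in the order listed, not in the DataFrame's order
reordered = df[['Salary', 'Name']]
print(list(reordered.columns))  # ['Salary', 'Name']

# A label that does not exist raises KeyError ('Bonus' is hypothetical)
raised_key_error = False
try:
    df[['Name', 'Bonus']]
except KeyError as err:
    raised_key_error = True
    print('KeyError:', err)
```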

3. Selecting Columns with loc

The loc[] indexer selects rows and columns by their labels. To select specific columns, use : for all rows and pass a list of column names as the second argument.

# Select 'Gender' and 'Department' columns
selected_columns = df.loc[:, ['Gender', 'Department']]
print(selected_columns)
   Gender   Department
0    Male          HR
1  Female          IT
2    Male     Finance
3  Female   Marketing
4    Male   Operations
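Unlike Python slicing, a label slice in loc[] includes both endpoints, and the row selector can be a boolean condition instead of :. A minimal sketch, assuming the sample data above:

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# Label slices in loc[] include BOTH endpoints: Age, Gender and Salary
age_to_salary = df.loc[:, 'Age':'Salary']
print(list(age_to_salary.columns))  # ['Age', 'Gender', 'Salary']

# Filter rows with a condition while picking columns:
# keeps Alice, Eve and Charlie (ages 30, 35, 28)
over_25 = df.loc[df['Age'] > 25, ['Name', 'Salary']]
print(over_25)
```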

4. Selecting Columns with iloc

The iloc[] indexer selects columns by their integer positions, starting at 0. This is useful when you know the position of the columns but not their names.

# Select the first two columns (index positions 0 and 1)
first_two_columns = df.iloc[:, [0, 1]]
print(first_two_columns)

# Select a range of columns by position (1 through 3; the stop value 4 is excluded)
range_columns = df.iloc[:, 1:4]
print(range_columns)
# Output of first_two_columns:
      Name  Age
0     John   25
1    Alice   30
2      Bob   22
3      Eve   35
4  Charlie   28

# Output of range_columns:
   Age  Gender  Salary
0   25    Male   50000
1   30  Female   55000
2   22    Male   40000
3   35  Female   70000
4   28    Male   48000
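iloc[] also accepts negative positions, counted from the last column, and you can look up a column’s position from its name with Index.get_loc. A minimal sketch, assuming the sample data above:

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# Negative positions count from the end: -1 is the last column
last_column = df.iloc[:, -1]
print(last_column.name)          # Department

# A negative slice selects the last two columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))    # ['Salary', 'Department']

# Look up a position from a name when mixing the two styles
salary_pos = df.columns.get_loc('Salary')
print(salary_pos)                # 3
print(df.iloc[:, salary_pos].tolist())
```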

5. Selecting Columns Using filter

A) Select Columns by Name Containing a Substring

# Select columns whose names contain 'Age'
filtered_columns = df.filter(like='Age')
print(filtered_columns)
   Age
0   25
1   30
2   22
3   35
4   28
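The like= match is a case-sensitive substring test, and filter() also accepts items= for exact names. A minimal sketch, assuming the sample data above; Bonus is again a hypothetical column that does not exist:

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# like= is case-sensitive: lowercase 'a' appears in 'Name', 'Salary'
# and 'Department', but 'Age' only has an uppercase 'A'
with_a = df.filter(like='a')
print(list(with_a.columns))  # ['Name', 'Salary', 'Department']

# items= keeps exact names; the missing 'Bonus' is silently skipped
exact = df.filter(items=['Name', 'Bonus'])
print(list(exact.columns))   # ['Name']
```

Unlike df[['Name', 'Bonus']], filter(items=...) does not raise a KeyError for the missing label; it simply leaves it out.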

B) Select Columns by Regex Pattern

# Select columns that start with 'D'
filtered_columns = df.filter(regex='^D')
print(filtered_columns)
   Department
0          HR
1          IT
2     Finance
3   Marketing
4  Operations
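filter(regex=...) keeps every column whose name matches the pattern anywhere (it uses re.search), so anchors such as ^ and $ control where the match happens. A few more patterns, sketched against the sample data above:

```python
import pandas as pd

# Same sample data as above, rebuilt so this snippet runs on its own
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Eve', 'Charlie'],
    'Age': [25, 30, 22, 35, 28],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Salary': [50000, 55000, 40000, 70000, 48000],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Operations'],
})

# Columns whose names end with 'e'
ends_with_e = df.filter(regex='e$')
print(list(ends_with_e.columns))     # ['Name', 'Age']

# Alternation with anchors picks several exact names at once
name_or_salary = df.filter(regex='^(Name|Salary)$')
print(list(name_or_salary.columns))  # ['Name', 'Salary']

# The inline (?i) flag makes the pattern case-insensitive
starts_with_d = df.filter(regex='(?i)^d')
print(list(starts_with_d.columns))   # ['Department']
```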

6. Summary

Selecting columns is a fundamental operation when working with Pandas DataFrames.
Here’s a quick recap of the methods covered:

  • Bracket Notation: Simple and versatile for single or multiple columns.
  • Dot Notation: Concise but limited.
  • loc[] and iloc[]: Indexers for label-based and position-based selection.
  • filter(): Ideal for pattern-based selection.
  • Combining selections: loc[] accepts a row condition and a list of columns at once, e.g. df.loc[df['Age'] > 25, ['Name', 'Salary']].

By mastering these techniques, you can efficiently manipulate your DataFrame to suit your data analysis needs.

Published by
admin
