Pandas

Parsing JSON in Pandas dataframe

Parsing JSON in Pandas DataFrame

Parsing JSON data is a frequent task when working with APIs or handling JSON files. Pandas makes it easy to load and manipulate JSON data into a DataFrame. In this guide, we’ll explore various methods to parse JSON into a Pandas DataFrame and work with different JSON structures.

1. Parsing a Simple JSON String

You can directly parse a JSON string into a DataFrame using the pd.read_json() function.

import pandas as pd

# Simple JSON string
json_data = '''
[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''

# Parse JSON into a DataFrame
df = pd.read_json(json_data)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

2. Reading JSON from a File

If your JSON data is stored in a file, you can read it directly into a DataFrame using the pd.read_json() function.

import pandas as pd

# Reading JSON data from a file
df = pd.read_json('data.json')
print(df)

Example File Content (data.json):

[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

3. Parsing Nested JSON

If your JSON data is nested, you can use the pd.json_normalize() function to flatten it and create a DataFrame.

import pandas as pd

# Nested JSON example
nested_json = {
    "students": [
        {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
        {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
    ]
}

# Normalize the nested JSON
df = pd.json_normalize(nested_json['students'])
print(df)

Output:

     Name  Details.Age Details.Gender
0   Alice           25         Female
1     Bob           30           Male

4. Reading JSON Lines Format

JSON Lines format (one JSON object per line) can be loaded by setting lines=True in the pd.read_json() function.

import pandas as pd

# Example JSON lines file content
# {"Name": "Alice", "Age": 25, "Gender": "Female"}
# {"Name": "Bob", "Age": 30, "Gender": "Male"}

# Read JSON lines from file
df = pd.read_json('data.jsonl', lines=True)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

5. Parsing JSON from an API

When dealing with JSON data from APIs, you can fetch the data using the requests library and then parse it into a DataFrame.

import requests
import pandas as pd

# API URL (replace with an actual URL)
url = 'https://api.example.com/data'
response = requests.get(url)
json_data = response.json()

# Parse JSON into a DataFrame
df = pd.json_normalize(json_data)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

6. Handling Complex Nested JSON

For more complex, deeply nested JSON, you can specify the path to normalize specific sections of the data.

import pandas as pd

# Complex nested JSON
complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
        ]
    }
}

# Normalize the 'students' section of the JSON
df = pd.json_normalize(complex_json, record_path=['school', 'students'])
print(df)

Output:

     Name  Details.Age Details.Gender
0   Alice           25         Female
1     Bob           30           Male

Conclusion

With Pandas, parsing JSON data into a DataFrame is simple and effective, whether you’re working with a basic JSON string, reading data from a file, or handling complex, nested structures. Functions like pd.read_json() and pd.json_normalize() allow you to load and manipulate JSON data efficiently for your analysis needs.

admin

Share
Published by
admin

Recent Posts

Pandas Access Column by Name

Pandas: How to Access Columns by Name In Pandas, accessing columns by name is a…

1 month ago

Pandas Accessing Columns by index

Pandas: How to Access or Select Columns by Index, not by Name In Pandas, accessing…

1 month ago

Pandas Access Row by index

Pandas: How to Access Row by Index In Pandas, you can access rows in a…

1 month ago

Pandas Access column using iterrows

Pandas: How to Access a Column Using iterrows() In Pandas, iterrows() is commonly used to…

1 month ago

Pandas Update Values in iterrows

Pandas - How to Update Values in iterrows In Pandas, iterrows() is a popular method…

1 month ago

Pandas iterrows keyerror – How to Fix

Pandas KeyError When Using iterrows() In Pandas, the iterrows() method is often used to iterate…

1 month ago