Pandas: parse json in dataframe

How to Parse JSON into a Pandas DataFrame

Parsing JSON data is a common task in data analysis, especially when working with APIs or JSON files. Pandas provides flexible tools for loading JSON into a DataFrame. This guide covers the main ways to do that, with practical examples.

1. Parsing a JSON String

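You can parse a JSON string into a DataFrame with the pd.read_json() function. Wrap the string in io.StringIO first: passing a literal JSON string directly to pd.read_json() is deprecated as of pandas 2.1.

from io import StringIO

import pandas as pd

# JSON string
json_data = '''
[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''

# Parse JSON into a DataFrame (StringIO makes the string file-like)
df = pd.read_json(StringIO(json_data))
print(df)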

Output:

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
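When the JSON is already in memory as a string, another route is to decode it with the standard-library json module and pass the resulting list of dicts to the DataFrame constructor. The sketch below assumes the same two records as above; the variable names are illustrative.

```python
import json

import pandas as pd

json_data = (
    '[{"Name": "Alice", "Age": 25, "Gender": "Female"},'
    ' {"Name": "Bob", "Age": 30, "Gender": "Male"}]'
)

# json.loads turns the string into a list of Python dicts
records = json.loads(json_data)

# Each dict becomes a row; the dict keys become column names
df = pd.DataFrame(records)
print(df)
```

This route can be handy when you want to inspect or filter the decoded records before building the DataFrame.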

2. Reading JSON from a File

If your JSON data is stored in a file, use pd.read_json() with the file path.

import pandas as pd

# Read JSON from a file
df = pd.read_json('data.json')
print(df)

Example File Content (data.json):

[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]

Output:

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
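To try the file-based approach without preparing data.json by hand, the following sketch writes the sample records to a temporary file and reads them back. The temporary directory and the json_path variable are assumptions made for the demonstration only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

records = [
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file, created only for this demonstration
    json_path = Path(tmp_dir) / "data.json"
    json_path.write_text(json.dumps(records), encoding="utf-8")

    # read_json accepts a path-like object as well as a plain string path
    df = pd.read_json(json_path)

print(df)
```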

3. Parsing Nested JSON

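If the JSON has a nested structure, you can flatten it with pd.json_normalize(), which turns each nested key into its own column named with a dot-separated path (for example, Details.Age).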

import pandas as pd

# Nested JSON
nested_json = {
    "students": [
        {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
        {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
    ]
}

# Normalize the JSON
df = pd.json_normalize(nested_json['students'])
print(df)

Output:

    Name  Details.Age Details.Gender
0  Alice           25         Female
1    Bob           30           Male
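Dotted column names such as Details.Age can be awkward to work with in some workflows. As a small sketch, the sep argument of pd.json_normalize() changes the string used to join parent and child keys; the underscore chosen here is just one option.

```python
import pandas as pd

students = [
    {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
    {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}},
]

# sep controls the string that joins parent and child keys in column names
df = pd.json_normalize(students, sep="_")
print(df.columns.tolist())
```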

4. Reading JSON Lines Format

For JSON files with one JSON object per line, use lines=True in pd.read_json().

import pandas as pd

# Example JSON lines file (data.jsonl)
# {"Name": "Alice", "Age": 25, "Gender": "Female"}
# {"Name": "Bob", "Age": 30, "Gender": "Male"}

# Read JSON lines
df = pd.read_json('data.jsonl', lines=True)
print(df)

Output:

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
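JSON Lines files are often large, so it can help to read them in pieces. The sketch below keeps the two sample records in memory (instead of a data.jsonl file) and uses the chunksize argument, which is only valid together with lines=True, to iterate over the data one row at a time. The chunk size of 1 is chosen purely for illustration.

```python
from io import StringIO

import pandas as pd

# Two records in JSON Lines form, kept in memory instead of a data.jsonl file
jsonl_text = (
    '{"Name": "Alice", "Age": 25, "Gender": "Female"}\n'
    '{"Name": "Bob", "Age": 30, "Gender": "Male"}\n'
)

# With chunksize set, read_json returns an iterator of DataFrames
with pd.read_json(StringIO(jsonl_text), lines=True, chunksize=1) as reader:
    chunks = [chunk for chunk in reader]

# Stitch the chunks back together into a single DataFrame
df = pd.concat(chunks, ignore_index=True)
print(df)
```

In real use you would typically process each chunk inside the loop rather than collecting them all, so that only one chunk is held in memory at a time.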

5. Loading JSON from an API

When working with APIs, you often need to parse JSON responses into a DataFrame. Use the requests library to fetch data and then load it into Pandas.

import requests
import pandas as pd

# Fetch JSON data from an API (placeholder URL)
url = 'https://api.example.com/data'
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error on 4xx/5xx responses
json_data = response.json()

# Parse JSON into a DataFrame
df = pd.json_normalize(json_data)
print(df)

Output (assuming the API returns the same two records used in the earlier examples):

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
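Many APIs wrap the records in an envelope object rather than returning a bare list. The sketch below shows one way to handle that without making a network request: sample_payload stands in for response.json(), and both the "results" key and the payload_to_frame helper are hypothetical names chosen for illustration.

```python
import pandas as pd


def payload_to_frame(payload, records_key=None):
    """Turn a decoded JSON payload into a DataFrame.

    records_key is a hypothetical envelope key (for example "results");
    leave it as None when the payload is already a list of records.
    """
    records = payload[records_key] if records_key is not None else payload
    return pd.json_normalize(records)


# Stand-in for response.json(); the shape of this payload is an assumption
sample_payload = {
    "count": 2,
    "results": [
        {"Name": "Alice", "Age": 25, "Gender": "Female"},
        {"Name": "Bob", "Age": 30, "Gender": "Male"},
    ],
}

df = payload_to_frame(sample_payload, records_key="results")
print(df)
```

Check the documentation of the API you are calling to find where the records actually live in its response.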

6. Handling Complex JSON Structures

For deeply nested JSON, use the record_path argument of pd.json_normalize() to point at the list of records you want to turn into rows.

import pandas as pd

# Complex nested JSON
complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
        ]
    }
}

# Normalize the 'students' section
df = pd.json_normalize(complex_json, record_path=['school', 'students'])
print(df)

Output:

    Name  Details.Age Details.Gender
0  Alice           25         Female
1    Bob           30           Male
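The result above drops the school name, because only the records under record_path become rows. As a sketch, the meta argument of pd.json_normalize() can carry fields from the enclosing objects onto each row; here the nested path ['school', 'name'] is passed so that every student row also records its school.

```python
import pandas as pd

complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}},
        ],
    }
}

# record_path selects the rows; meta lists outer fields to repeat on each row
df = pd.json_normalize(
    complex_json,
    record_path=["school", "students"],
    meta=[["school", "name"]],
)
print(df)
```

The meta column is named by joining the path with the separator, so it appears as school.name alongside the flattened student columns.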

Conclusion

Pandas makes it easy to parse JSON into a DataFrame, whether it’s a simple JSON string, a nested structure, or data from an API. By leveraging functions like pd.read_json() and pd.json_normalize(), you can efficiently load and manipulate JSON data for analysis.
