Parsing JSON in Pandas dataframe

| 0 Comments| 11:05 am


Parsing JSON in Pandas DataFrame

Parsing JSON data is a frequent task when working with APIs or handling JSON files. Pandas makes it easy to load and manipulate JSON data into a DataFrame. In this guide, we’ll explore various methods to parse JSON into a Pandas DataFrame and work with different JSON structures.

1. Parsing a Simple JSON String

You can directly parse a JSON string into a DataFrame using the pd.read_json() function.

import pandas as pd

# Simple JSON string
json_data = '''
[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''

# Parse JSON into a DataFrame
df = pd.read_json(json_data)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

2. Reading JSON from a File

If your JSON data is stored in a file, you can read it directly into a DataFrame using the pd.read_json() function.

import pandas as pd

# Reading JSON data from a file
df = pd.read_json('data.json')
print(df)

Example File Content (data.json):

[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

3. Parsing Nested JSON

If your JSON data is nested, you can use the pd.json_normalize() function to flatten it and create a DataFrame.

import pandas as pd

# Nested JSON example
nested_json = {
    "students": [
        {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
        {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
    ]
}

# Normalize the nested JSON
df = pd.json_normalize(nested_json['students'])
print(df)

Output:

     Name  Details.Age Details.Gender
0   Alice           25         Female
1     Bob           30           Male

4. Reading JSON Lines Format

JSON Lines format (one JSON object per line) can be loaded by setting lines=True in the pd.read_json() function.

import pandas as pd

# Example JSON lines file content
# {"Name": "Alice", "Age": 25, "Gender": "Female"}
# {"Name": "Bob", "Age": 30, "Gender": "Male"}

# Read JSON lines from file
df = pd.read_json('data.jsonl', lines=True)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

5. Parsing JSON from an API

When dealing with JSON data from APIs, you can fetch the data using the requests library and then parse it into a DataFrame.

import requests
import pandas as pd

# API URL (replace with an actual URL)
url = 'https://api.example.com/data'
response = requests.get(url)
json_data = response.json()

# Parse JSON into a DataFrame
df = pd.json_normalize(json_data)
print(df)

Output:

     Name  Age  Gender
0   Alice   25  Female
1     Bob   30    Male

6. Handling Complex Nested JSON

For more complex, deeply nested JSON, you can specify the path to normalize specific sections of the data.

import pandas as pd

# Complex nested JSON
complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
        ]
    }
}

# Normalize the 'students' section of the JSON
df = pd.json_normalize(complex_json, record_path=['school', 'students'])
print(df)

Output:

     Name  Details.Age Details.Gender
0   Alice           25         Female
1     Bob           30           Male

Conclusion

With Pandas, parsing JSON data into a DataFrame is simple and effective, whether you’re working with a basic JSON string, reading data from a file, or handling complex, nested structures. Functions like pd.read_json() and pd.json_normalize() allow you to load and manipulate JSON data efficiently for your analysis needs.

Leave a Reply

Your email address will not be published. Required fields are marked *

Recommended Post