Parsing JSON in Pandas DataFrame
Parsing JSON data is a frequent task when working with APIs or handling JSON files. Pandas makes it easy to load and manipulate JSON data into a DataFrame. In this guide, we’ll explore various methods to parse JSON into a Pandas DataFrame and work with different JSON structures.
1. Parsing a Simple JSON String
You can directly parse a JSON string into a DataFrame using the pd.read_json()
function.
import pandas as pd
# Simple JSON string
json_data = '''
[
{"Name": "Alice", "Age": 25, "Gender": "Female"},
{"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''
# Parse JSON into a DataFrame
df = pd.read_json(json_data)
print(df)
Output:
Name Age Gender
0 Alice 25 Female
1 Bob 30 Male
2. Reading JSON from a File
If your JSON data is stored in a file, you can read it directly into a DataFrame using the pd.read_json()
function.
import pandas as pd
# Reading JSON data from a file
df = pd.read_json('data.json')
print(df)
Example File Content (data.json
):
[
{"Name": "Alice", "Age": 25, "Gender": "Female"},
{"Name": "Bob", "Age": 30, "Gender": "Male"}
]
Output:
Name Age Gender
0 Alice 25 Female
1 Bob 30 Male
3. Parsing Nested JSON
If your JSON data is nested, you can use the pd.json_normalize()
function to flatten it and create a DataFrame.
import pandas as pd
# Nested JSON example
nested_json = {
"students": [
{"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
{"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
]
}
# Normalize the nested JSON
df = pd.json_normalize(nested_json['students'])
print(df)
Output:
Name Details.Age Details.Gender
0 Alice 25 Female
1 Bob 30 Male
4. Reading JSON Lines Format
JSON Lines format (one JSON object per line) can be loaded by setting lines=True
in the pd.read_json()
function.
import pandas as pd
# Example JSON lines file content
# {"Name": "Alice", "Age": 25, "Gender": "Female"}
# {"Name": "Bob", "Age": 30, "Gender": "Male"}
# Read JSON lines from file
df = pd.read_json('data.jsonl', lines=True)
print(df)
Output:
Name Age Gender
0 Alice 25 Female
1 Bob 30 Male
5. Parsing JSON from an API
When dealing with JSON data from APIs, you can fetch the data using the requests
library and then parse it into a DataFrame.
import requests
import pandas as pd
# API URL (replace with an actual URL)
url = 'https://api.example.com/data'
response = requests.get(url)
json_data = response.json()
# Parse JSON into a DataFrame
df = pd.json_normalize(json_data)
print(df)
Output:
Name Age Gender
0 Alice 25 Female
1 Bob 30 Male
6. Handling Complex Nested JSON
For more complex, deeply nested JSON, you can specify the path to normalize specific sections of the data.
import pandas as pd
# Complex nested JSON
complex_json = {
"school": {
"name": "High School",
"students": [
{"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
{"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
]
}
}
# Normalize the 'students' section of the JSON
df = pd.json_normalize(complex_json, record_path=['school', 'students'])
print(df)
Output:
Name Details.Age Details.Gender
0 Alice 25 Female
1 Bob 30 Male
Conclusion
With Pandas, parsing JSON data into a DataFrame is simple and effective, whether you’re working with a basic JSON string, reading data from a file, or handling complex, nested structures. Functions like pd.read_json()
and pd.json_normalize()
allow you to load and manipulate JSON data efficiently for your analysis needs.