How to Parse JSON into a Pandas DataFrame
Parsing JSON data is a common task in data analysis, especially when working with APIs or JSON files. Pandas provides flexible tools for loading JSON data into a DataFrame, where it can then be manipulated like any other tabular data. This guide covers the main ways to parse JSON into a Pandas DataFrame, with practical examples.
1. Parsing a JSON String
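You can parse a JSON string into a DataFrame with the pd.read_json() function. Since pandas 2.1, passing a literal JSON string directly is deprecated, so wrap the string in io.StringIO first.
from io import StringIO
import pandas as pd
# JSON string
json_data = '''
[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''
# Parse JSON into a DataFrame (StringIO makes the string file-like)
df = pd.read_json(StringIO(json_data))
print(df)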
Output:
    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
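As an alternative sketch, the same kind of string can be decoded with the standard-library json module and passed to the DataFrame constructor, without going through pd.read_json(). This keeps the intermediate Python objects available for inspection. The variable names here are illustrative.

```python
import json

import pandas as pd

json_data = '''
[
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"}
]
'''

# json.loads turns the text into a list of dicts
records = json.loads(json_data)

# A list of dicts maps directly onto rows and columns
df = pd.DataFrame(records)
print(df)
```

This route is convenient when the decoded records need filtering or reshaping before they become a DataFrame.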
2. Reading JSON from a File
If your JSON data is stored in a file, use pd.read_json() with the file path.
import pandas as pd
# Read JSON from a file
df = pd.read_json('data.json')
print(df)
Example file content (data.json):
[
{"Name": "Alice", "Age": 25, "Gender": "Female"},
{"Name": "Bob", "Age": 30, "Gender": "Male"}
]
Output:
    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
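To try the file-based route without preparing data.json by hand, the sketch below writes the sample records to a temporary directory and reads them back. The temporary location and variable names are assumptions made for illustration only.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

records = [
    {"Name": "Alice", "Age": 25, "Gender": "Female"},
    {"Name": "Bob", "Age": 30, "Gender": "Male"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the sample records to a throwaway data.json
    path = Path(tmp_dir) / "data.json"
    path.write_text(json.dumps(records), encoding="utf-8")

    # pd.read_json accepts a path-like object as well as a string path
    df = pd.read_json(path)

print(df)
```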
3. Parsing Nested JSON
If the JSON has a nested structure, you can flatten it with pd.json_normalize().
import pandas as pd
# Nested JSON
nested_json = {
    "students": [
        {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
        {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
    ]
}
# Normalize the JSON
df = pd.json_normalize(nested_json['students'])
print(df)
Output:
    Name  Details.Age Details.Gender
0  Alice           25         Female
1    Bob           30           Male
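The dotted column names come from joining the nested keys with a separator. A minimal sketch of changing that separator through the sep parameter of pd.json_normalize(), which can be handy when dots in column names are inconvenient:

```python
import pandas as pd

students = [
    {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
    {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}},
]

# sep controls how nested keys are joined into column names
df = pd.json_normalize(students, sep="_")
print(df.columns.tolist())
```

With an underscore separator, columns such as Details_Age can be accessed as attributes (df.Details_Age).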
4. Reading JSON Lines Format
For JSON files with one JSON object per line, pass lines=True to pd.read_json().
import pandas as pd
# Example JSON lines file (data.jsonl)
# {"Name": "Alice", "Age": 25, "Gender": "Female"}
# {"Name": "Bob", "Age": 30, "Gender": "Male"}
# Read JSON lines
df = pd.read_json('data.jsonl', lines=True)
print(df)
Output:
    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
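The same option works on in-memory text. The sketch below, which assumes the two sample records shown above, wraps a JSON Lines string in io.StringIO so it can be run without a data.jsonl file on disk.

```python
from io import StringIO

import pandas as pd

# Two JSON objects, one per line (JSON Lines)
jsonl_text = (
    '{"Name": "Alice", "Age": 25, "Gender": "Female"}\n'
    '{"Name": "Bob", "Age": 30, "Gender": "Male"}\n'
)

# lines=True tells pandas to parse each line as a separate record
df = pd.read_json(StringIO(jsonl_text), lines=True)
print(df)
```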
5. Loading JSON from an API
When working with APIs, you often need to parse JSON responses into a DataFrame. Use the requests library to fetch the data and then load it into Pandas.
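import requests
import pandas as pd
# Fetch JSON data from an API (placeholder URL)
url = 'https://api.example.com/data'
response = requests.get(url, timeout=10)
# Stop early on HTTP errors (4xx/5xx) instead of parsing an error body
response.raise_for_status()
json_data = response.json()
# Parse JSON into a DataFrame
df = pd.json_normalize(json_data)
print(df)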
Example output (assuming the API returns the same two records as a JSON array):
    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
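Real API responses do not always return a bare list of records. Some wrap them in an object under a key. The sketch below uses an in-memory payload in place of response.json(); the helper name payload_to_frame and the "results" key are hypothetical and would need to match the actual API.

```python
import pandas as pd


def payload_to_frame(payload, records_key="results"):
    """Hypothetical helper: unwrap records stored under records_key, if present."""
    if isinstance(payload, dict) and records_key in payload:
        payload = payload[records_key]
    return pd.json_normalize(payload)


# Stand-in for response.json(); the shape is an assumption
sample_payload = {
    "count": 2,
    "results": [
        {"Name": "Alice", "Age": 25, "Gender": "Female"},
        {"Name": "Bob", "Age": 30, "Gender": "Male"},
    ],
}

df = payload_to_frame(sample_payload)
print(df)
```

Without the unwrapping step, pd.json_normalize() would produce a single row whose "results" column holds the whole list.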
6. Handling Complex JSON Structures
For deeply nested JSON, specify the path to normalize specific parts of the data.
import pandas as pd
# Complex nested JSON
complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}}
        ]
    }
}
# Normalize the 'students' section
df = pd.json_normalize(complex_json, record_path=['school', 'students'])
print(df)
Output:
    Name  Details.Age Details.Gender
0  Alice           25         Female
1    Bob           30           Male
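Normalizing only the students drops the school name from the result. One way to keep it, sketched here with the meta parameter of pd.json_normalize(), is to name the parent field to carry along with every record:

```python
import pandas as pd

complex_json = {
    "school": {
        "name": "High School",
        "students": [
            {"Name": "Alice", "Details": {"Age": 25, "Gender": "Female"}},
            {"Name": "Bob", "Details": {"Age": 30, "Gender": "Male"}},
        ],
    }
}

# record_path selects the list of records; meta lists parent fields to repeat per row
df = pd.json_normalize(
    complex_json,
    record_path=["school", "students"],
    meta=[["school", "name"]],
)
print(df)
```

The parent field appears as an extra column named school.name, repeated for each student row.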
Conclusion
Pandas makes it easy to parse JSON into a DataFrame, whether the source is a simple JSON string, a nested structure, or data from an API. With functions like pd.read_json() and pd.json_normalize(), you can efficiently load and manipulate JSON data for analysis.