Convert N-nested JSON to CSV
Any number of nesting and records in a JSON can be handled with minimal code using “json_normalize()” method in pandas.
Syntax:
json_normalize(data)
File in use: details.json file
{ "details": [ { "id": "STU001", "name": "Amit Pathak", "age": 24, "results": { "school": 85, "high_school": 75, "graduation": 70 }, "education": { "graduation": { "major": "Computers", "minor": "Sociology" } } }, { "id": "STU002", "name": "Yash Kotian", "age": 32, "results": { "school": 80, "high_school": 58, "graduation": 49 }, "education": { "graduation": { "major": "Biology", "minor": "Chemistry" } } }, { "id": "STU003", "name": "Aanchal Singh", "age": 28, "results": { "school": 90, "high_school": 70, "graduation":65 }, "education": { "graduation": { "major": "Art", "minor": "IT" } } }, { "id": "STU004", "name": "Juhi Vadia", "age": 23, "results": { "school": 95, "high_school": 89, "graduation": 83 }, "education": { "graduation": { "major": "IT", "minor": "Social" } } } ] }
Here the “details” key consists of an array of 4 elements, where each element contains 3-level of nested JSON objects. The “major” and “minor” key in each of these objects is in a level 3 nesting.
Approach
- The first step is to read the JSON file as a python dict object. This will help us to make use of python dict methods to perform some operations. The read_json() function is used for the task, which taken the file path along with the extension as a parameter and returns the contents of the JSON file as a python dict object.
- We have iterated for each JSON object present in the details array. In each iteration we first normalized the JSON and created a temporary dataframe. This dataframe was then appended to the output dataframe.
- Once done, the column name was renamed for better visibility. If we see the console output, the “major” column was named as “education.graduation.major” before renaming. This is because the “json_normalize()” method uses the keys in the complete nest for generating the column name to avoid duplicate column issue. So, “education” is the first level, “graduation” is second and “major” is third level in the JSON nesting. Therefore, the column “education.graduation.major” was simply renamed to “graduation”.
- After renaming the columns, the to_csv() method saves the pandas dataframe object as CSV to the provided file location.
Example: Converting n-nested JSON to CSV
Python
import json import pandas def read_json(filename: str ) - > dict : try : with open (filename, "r" ) as f: data = json.loads(f.read()) except : raise Exception(f "Reading {filename} file encountered an error" ) return data def create_dataframe(data: list ) - > pandas.DataFrame: # Declare an empty dataframe to append records dataframe = pandas.DataFrame() # Looping through each record for d in data: # Normalize the column levels record = pandas.json_normalize(d) # Append it to the dataframe dataframe = dataframe.append(record, ignore_index = True ) return dataframe def main(): # Read the JSON file as python dictionary data = read_json(filename = "details.json" ) # Generate the dataframe for the array items in # details key dataframe = create_dataframe(data = data[ 'details' ]) # Renaming columns of the dataframe print ( "Normalized Columns:" , dataframe.columns.to_list()) dataframe.rename(columns = { "results.school" : "school" , "results.high_school" : "high_school" , "results.graduation" : "graduation" , "education.graduation.major" : "grad_major" , "education.graduation.minor" : "grad_minor" }, inplace = True ) print ( "Renamed Columns:" , dataframe.columns.to_list()) # Convert dataframe to CSV dataframe.to_csv( "details.csv" , index = False ) if __name__ = = '__main__' : main() |
Output:
$ Console Output
—–
Normalized Columns: [‘id’, ‘name’, ‘age’, ‘results.school’, ‘results.high_school’, ‘results.graduation’, ‘education.graduation.major’, ‘education.graduation.minor’]
Renamed Columns: [‘id’, ‘name’, ‘age’, ‘school’, ‘high_school’, ‘graduation’, ‘grad_major’, ‘grad_minor’]
Convert nested JSON to CSV in Python
In this article, we will discuss how can we convert nested JSON to CSV in Python.
An example of a simple JSON file:
As you can see in the example, a single key-value pair is separated by a colon (:) whereas each key-value pairs are separated by a comma (,). Here, “name”, “profile”, “age”, and “location” are the key fields while the corresponding values are “Amit Pathak“, “Software Engineer“, “24”, “London, UK” respectively.
A nested JSON is a structure where the value for one or more fields can be an another JSON format. For example, follow the below example that we are going to use to convert to CSV format.
An example of a nested JSON file:
In the above example, the key field “article” has a value which is another JSON format. JSON supports multiple nests to create complex JSON files if required.