Recently I needed to take a JSON payload from a REST API and flatten it, then convert it to a two-column format with the path to each data item in the first column and its data type in the second. The goal was to place the API fields into an Excel document to create an integration mapping.

Unfortunately, pandas couldn’t do this natively with its json_normalize() function, since the payload contained nested arrays of data elements.
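To illustrate with a made-up payload: json_normalize() expands nested objects into dotted column names, but it leaves an array of objects as a single list-valued column instead of walking into it.

```python
import pandas

# A made-up payload with a nested array of objects
payload = {"id": 1, "items": [{"name": "a"}, {"name": "b"}]}

df = pandas.json_normalize(payload)

# Nested dicts would become dotted columns, but the array survives
# as one list-valued "items" column
print(df.columns.tolist())   # ['id', 'items']
print(df["items"].iloc[0])   # [{'name': 'a'}, {'name': 'b'}]
```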

I went searching and found some code on Geeks for Geeks.

It let me drill down to all the elements and put them into a format with the field names across the top and the types in the second row. After that, it was as simple as transposing the data frame and writing it out to a CSV file with to_csv().
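Here’s a quick sketch of that last step, using a couple of made-up field paths and type labels:

```python
import pandas

# Hypothetical flattened output: field path -> type label
flat = {"user.id": "number", "user.name": "string"}

# One row, one column per field path
df = pandas.json_normalize(flat)

# Transposing turns the paths into the index and the types into a
# column, ready for to_csv()
print(df.T)
```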

I added some command-line options to set the separator, choose whether to keep the index numbers of array elements, and specify the input and output files.

import json
import pandas
import argparse


def read_json(filename: str) -> dict:
    try:
        with open(filename, "r") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as e:
        raise Exception("Reading (unknown) file encountered an error") from e

    return data


# Flatten a nested JSON structure into a single-level dict whose keys
# are the full paths to each leaf value
def flatten_json(y, sep='.', keep_num=True):
    out = {}
    # Map sample leaf values to type labels; anything not listed
    # passes through unchanged
    transform_dict = {'0': 'number', '1': 'number',
                      'True': 'boolean', 'true': 'boolean',
                      'False': 'boolean', 'false': 'boolean',
                      'str': 'string'}

    def flatten(x, name=''):

        # If the nested key-value pair is of dict type
        if isinstance(x, dict):

            for a in x:
                flatten(x[a], name + a + sep)

        # If the nested key-value pair is of list type
        elif isinstance(x, list):

            for i, a in enumerate(x):
                # Accept both the boolean True and the strings
                # 'True'/'true' passed in from the command line
                if str(keep_num).lower() == 'true':
                    long_name = name + str(i) + sep
                else:
                    long_name = name
                flatten(a, long_name)
        else:
            # Strip the trailing separator and record the mapped type
            out[name[:-1]] = transform_dict.get(str(x), x)

    flatten(y)
    return out


parser = argparse.ArgumentParser()

parser.add_argument("-i", "--InputFile", help="Specify the input file.", default='api.json')
parser.add_argument("-o", "--OutputFile", help="Specify the output file.", default='api_output.csv')
parser.add_argument("-kn", "--KeepNums", help="Specify true/false if you want the output to print the list numbers.",
                    default='False')
parser.add_argument("-s", "--Sep", help="Specify the separator.", default='.')

args = parser.parse_args()

data = read_json(filename=args.InputFile)
flattened_json = flatten_json(data, sep=args.Sep, keep_num=args.KeepNums)

dataframe = pandas.json_normalize(flattened_json)

dataframe.T.to_csv(args.OutputFile, index=True)

There’s also a dictionary inside flatten_json() that lets you add your own mappings from sample values to field-type labels.
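The lookup falls back to the raw value when there’s no match, so adding a mapping is just another key in the dictionary. A small demo of that behavior (the '2024-01-01' entry is a hypothetical addition, not part of the original script):

```python
# The same mapping the script uses, plus one hypothetical extra entry
transform_dict = {'0': 'number', '1': 'number',
                  'True': 'boolean', 'true': 'boolean',
                  'False': 'boolean', 'false': 'boolean',
                  'str': 'string',
                  '2024-01-01': 'date'}  # hypothetical addition

print(transform_dict.get(str(0), 0))          # '0' matches -> 'number'
print(transform_dict.get(str(True), True))    # 'True' matches -> 'boolean'
print(transform_dict.get(str('abc'), 'abc'))  # no match -> raw value 'abc'
```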

To use:

py json2csv.py

To check on the arguments you may pass:

py json2csv.py --help

Let me know if you use it and how it goes.