How to Parse and Convert JSON to CSV in Python

JSON to CSV Python : In this tutorial we will show you how to convert a JSON file to a CSV file in the python programming language.
Overview
JSON literally means Javascript Object Notation. It allows to represent information in a structured way like XML for example. A Json file contains two types of structural elements:
- A set of keys/values
- Ordered lists of values (these can be objects, arrays or generic values).
To give you an idea of what a Json file represents, here is an example that will illustrate our different examples :
[{
"id": 1,
"name": {
"english": "Bulbasaur",
"japanese": "フシギダネ",
"chinese": "妙蛙种子",
"french": "Bulbizarre"
},
"type": [
"Grass",
"Poison"
],
"base": {
"HP": 45,
"Attack": 49,
"Defense": 49,
"Sp. Attack": 65,
"Sp. Defense": 65,
"Speed": 45
}
},
{
"id": 2,
"name": {
"english": "Ivysaur",
"japanese": "フシギソウ",
"chinese": "妙蛙草",
"french": "Herbizarre"
},
"type": [
"Grass",
"Poison"
],
"base": {
"HP": 60,
"Attack": 62,
"Defense": 63,
"Sp. Attack": 80,
"Sp. Defense": 80,
"Speed": 60
}
},
{
"id": 3,
"name": {
"english": "Venusaur",
"japanese": "フシギバナ",
"chinese": "妙蛙花",
"french": "Florizarre"
},
"type": [
"Grass",
"Poison"
],
"base": {
"HP": 80,
"Attack": 82,
"Defense": 83,
"Sp. Attack": 100,
"Sp. Defense": 100,
"Speed": 80
}
}]
This file represents the pokemons as well as the characteristics associated with each one of them, if you wish to recover the complete list of the Pokedex, you will find it at this address :
https://github.com/fanzeyi/pokemon.json/blob/master/pokedex.json
We will see at the end of this tutorial how to convert this type of file to csv and see how to do it in python. Let’s go !
Parsing JSON
The first step is to load the json file into a python object. There are several ways to do this:
- Use the pandas library and its read_json function
- Use the json module to read the JSON file
Parsing Json File using Pandas
Pandas is a python library that allows to easily manipulate data to be analyzed. It is often used to read JSON files. Indeed a lot of python API returns as a result of JSON and with pandas it is very easy to exploit this data directly. To read a JSON file we can use the read_json function. Here is an example with the pokedex.json file :
import pandas as pd
pd.set_option('display.max_columns', 50) # display columns
df = pd.read_json(r'pokedex.json')
print(df)
Output:
id name type \
0 1 {‘english’: ‘Bulbasaur’, ‘japanese’: ‘フシギダネ’, … [Grass, Poison]
1 2 {‘english’: ‘Ivysaur’, ‘japanese’: ‘フシギソウ’, ‘c… [Grass, Poison]
2 3 {‘english’: ‘Venusaur’, ‘japanese’: ‘フシギバナ’, ‘… [Grass, Poison]
0 {‘HP’: 45, ‘Attack’: 49, ‘Defense’: 49, ‘Sp. A…
1 {‘HP’: 60, ‘Attack’: 62, ‘Defense’: 63, ‘Sp. A…
2 {‘HP’: 80, ‘Attack’: 82, ‘Defense’: 83, ‘Sp. A…
Parsing Json File using JSON module
You can also use the json module to read a json file :
import json
with open('pokedex.json') as json_file:
data = json.load(json_file)
print(data)
Converting JSON to CSV in python
The CSV format (which stands for Comma Separated Values) is the most common import and export format used for Excel spreadsheets and databases. This is a main reason why we want to convert a Json file to CSV.
Converting JSON file to CSV file using Pandas
To convert our Json file, there is a function in Pandas called to_csv() that saves our file in CSV format. Using our previous example where we parsed our JSON file into a Pandas dataframe, we can export our dataframe to CSV like this:
import pandas as pd
pd.set_option('display.max_columns', 50) # display columns
df = pd.read_json(r'pokedex.json')
df.to_csv("pokedex.csv")
The CSV file has been exported in the directory you have chosen ( In the example, we have exported the file in the same directory as the python file but you can put an absolute path to store it in another directory).
Converting JSON file to CSV file using cvs module
We can also use the CSV module to export our Json file as a CSV file. Here’s an example that allows us to export only the pokemon names to the CSV file:
import json
import csv
with open('pokedex.json') as json_file:
data = json.load(json_file)
# Write only the Pokemon name
pokedex = open('pokedex2.csv', 'w')
csvwriter = csv.writer(pokedex)
# Get the header
csvwriter.writerow(data[1]['name'].keys())
for row in data:
csvwriter.writerow(row['name'].values())
pokedex.close()
Output:
english,japanese,chinese,french
Bulbasaur,フシギダネ,妙蛙种子,Bulbizarre
Ivysaur,フシギソウ,妙蛙草,Herbizarre
Venusaur,フシギバナ,妙蛙花,Florizarre
We were able to export the different names of the Pokémon in the CSV. To retrieve the header we need to use the keys() function which allows us to get the keys of each “Name” element of our JSON file. Of course it’s possible to get all the JSON file data.
Summary
As we have seen, it may be easy to convert a Json file to a CSV file. On the other hand, JSON files can have much more complex structures than CSV files, so a direct conversion is not always possible and will require us to rework our structure of the file concerned.
Don’t hesitate to tell me in comments if you have any concerns about the use of these codes, I would be happy to answer 🙂
Comments
On 08/21/2020 at 15 h 06 min, ALAN said:
Thanks for writing a clear and easy to understand post! I found this very helpful and useful.
On 09/22/2020 at 17 h 29 min, Thomas said:
Thanks for the nice overview. While having a very similar task, I saw your page, and knowing about the json_flatten library,
I ended up with the following version, which might be also useful for somebody:
```
import json
import sys
import json_flatten
import pandas as pd
# transforms a textfile that contains one json document per line (without
# array brackets surrounding or comma separators) to a csv file
# set in- and output
fname = "pre.json"
if len(sys.argv) > 1:
fname = sys.argv[1]
foutname = ".".join(fname.split(".")[:-1])+".csv"
# generate empty array to fill with json documents
rows = []
# loop over the lines in the file
with open(fname, "rt") as f:
while True:
line = f.readline()
if not line:
break
jd = json.loads(line)
jdflat = json_flatten.flatten(jd)
rows.append(jdflat)
# make dataframe and output to csv
df = pd.read_json(json.dumps(rows))
df.to_csv(foutname)
```
Leave a comment