I need to do some processing on my JSON data, but it turns out the JSON is formatted so that the whole file is a single line. In the terminal, wc -l file.json returns 0.
The file was created by converting a pandas DataFrame to JSON.
Here is a sample of file.json:
[
{"id":683156,"overall_rating":5.0,"hotel_id":220216,"hotel_name":"Beacon Hill Hotel","title":"\u201cgreat hotel, great location\u201d","text":"The rooms here are not palatial","author_id":"C0F"},
{"id":692745,"overall_rating":5.0,"hotel_id":113317,"hotel_name":"Casablanca Hotel Times Square","title":"\u201cabsolutely delightful\u201d","text":"I travelled from Spain...","author_id":"8C1"}
]
I want to split it into multiple files with, say, 10,000 records per file.
You could use jq to emit the top-level items in the array, one per line, as follows:
jq -c '.[]' file.json
If you simply want to partition this stream (without reconstituting each partition as an array), you can use a tool such as split.
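Since the file came from pandas, the same "one record per line, N lines per file" partitioning can also be sketched in Python with only the standard library. This is a minimal sketch, not part of the jq approach above: the function name write_line_chunks and the part_NNNN.jsonl naming are hypothetical, and it assumes the whole array fits in memory.

```python
import json
from pathlib import Path


def write_line_chunks(src, out_dir, chunk_size=10_000):
    """Write the top-level array items of `src` one per line,
    `chunk_size` lines per output file. Returns the paths written."""
    # Assumption: the file is small enough to load in one go.
    records = json.loads(Path(src).read_text(encoding="utf-8"))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for start in range(0, len(records), chunk_size):
        # Hypothetical naming scheme: part_0000.jsonl, part_0001.jsonl, ...
        path = out_dir / f"part_{start // chunk_size:04d}.jsonl"
        with path.open("w", encoding="utf-8") as fh:
            for rec in records[start:start + chunk_size]:
                # Compact separators keep each record on a single line.
                fh.write(json.dumps(rec, separators=(",", ":")) + "\n")
        paths.append(path)
    return paths
```

Calling write_line_chunks("file.json", "parts") would write files of at most 10,000 lines each into a parts directory. Note that the outputs are line-delimited JSON, not JSON arrays.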
If you want each partition to be an array, you could use jq to form the partitions, and then use a tool such as awk to create the separate files. See, for example, this SO Q&A:
Splitting / chunking JSON files with JQ in Bash or Fish shell?
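If Python is an option, the "each partition is an array" variant can be sketched as follows. This is only an illustration under assumptions, not the jq/awk method from the linked Q&A: the helper names chunked and write_array_chunks and the chunk_NNNN.json file names are hypothetical, and the input array is loaded into memory in full.

```python
import json
from itertools import islice
from pathlib import Path


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def write_array_chunks(src, out_dir, chunk_size=10_000):
    """Split the top-level JSON array in `src` into files that each
    hold a valid JSON array of at most `chunk_size` records."""
    with open(src, encoding="utf-8") as fh:
        records = json.load(fh)  # assumption: fits in memory

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, chunk in enumerate(chunked(records, chunk_size)):
        # Hypothetical naming scheme: chunk_0000.json, chunk_0001.json, ...
        path = out_dir / f"chunk_{i:04d}.json"
        with path.open("w", encoding="utf-8") as out:
            json.dump(chunk, out)
        paths.append(path)
    return paths
```

Each output file can then be read back on its own with json.load, which is the main reason to prefer arrays over a plain line split.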