I'm trying to find a group of files:
> find . -type f -iregex '.*geojson$'
> ./dir1/london.geojson
./manchester.geojson
Then, for each file found (30 to 40 across many nested folders), I want to wrap every original line in my own JSON structure, adding the filename and an extracted id. For example, given this input:
> cat manchester.geojson
{"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }
{"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }
I would like the following result:
{"_id": 11.0, "filename": "manchester.geojson", "document": {"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
{"_id": 22.0, "filename": "manchester.geojson", "document": {"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}
The closest I've got is piping to xargs and awk like this:
> find . -type f -iregex '.*geojson$' | xargs -d '\n' awk -F'[{:,]' '{print "{ \"_id\":"$7", \"file\": \""FILENAME"\", \"doc\": " $0 " }"}'
}"_id": 11.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
}"_id": 22.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}
I don't know what exactly is going wrong with the opening curly brace.
I can get at all the variables I need, as this example shows:
> find . -type f -iregex '.*geojson$' | xargs -d '\n' awk -F'[{:,]' '{print $7 " " FILENAME " " $0}'
11.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
22.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}
Then finally there is the question of sending each file's output to a new file with the same name but a new extension. I can send the combined output of all the files into one big file with a simple > redirect, but that is not what I need. Any ideas would be gratefully received.
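One way the per-file output could be approached is to let awk pick the output name itself and redirect each line there. The following is only a minimal sketch under assumptions: the function name wrap_geojson and the ".wrapped.json" suffix are placeholders, the id is assumed to be the fifth field when splitting on ":" and "," (as in the awk output shown above), and the input is assumed to use plain newline line endings.

```shell
# wrap_geojson DIR: for every *.geojson file under DIR, write a wrapped copy
# next to it. The function name and the ".wrapped.json" suffix are
# placeholders; $5 assumes the id is the fifth field when splitting on
# ":" and ",".
wrap_geojson() {
  find "${1:-.}" -type f -name '*.geojson' -exec awk -F'[:,]' '
    FNR == 1 {                        # first line of each input file
      if (out != "") close(out)       # finish the previous output file
      out = FILENAME
      sub(/\.geojson$/, ".wrapped.json", out)
    }
    {
      printf("{\"_id\":%s, \"file\": \"%s\", \"doc\": %s}\n", $5, FILENAME, $0) > out
    }
  ' {} +
}
```

Calling wrap_geojson . from the top folder would leave the originals untouched and create, for example, manchester.wrapped.json beside manchester.geojson. Filenames containing double quotes would produce invalid JSON with this approach.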
Thank you to both @EdMorton and @glenjackman for helping to point me in the right direction. In the end, I was almost there with the question. Once the dodgy line endings were cleaned up, the following single line does the job:
> find . -type f -name \*geojson | xargs -d '\n' awk -i inplace -F'[:,]' '{print "{ \"_id\":" $5 ", \"file\": \"" FILENAME "\", \"doc\": "$0"}"}'
The missing piece was -i inplace (a GNU awk extension), which modifies each file in place. It was an option I had not initially considered.
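For anyone hitting the same stray } at the start of each output line: that symptom is consistent with Windows-style CRLF line endings, where the carriage return at the end of $0 sends the cursor back to the start of the line and the closing " }" overwrites the opening "{ ". Assuming that is the problem, a small sketch of the cleanup step might look like this. The function name strip_cr and the temporary-file suffix are made up for illustration.

```shell
# strip_cr FILE...: remove a trailing carriage return from every line of each
# file, rewriting it through a temporary file. The function name and the
# ".tmp.$$" suffix are arbitrary choices for this sketch.
strip_cr() {
  for f in "$@"; do
    t="$f.tmp.$$"
    awk '{ sub(/\r$/, ""); print }' "$f" > "$t" && mv "$t" "$f"
  done
}
```

It could be run as strip_cr ./manchester.geojson ./dir1/london.geojson before the awk command above. Files that already use plain newlines pass through unchanged.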