bash, awk, xargs

Find piped to awk redirected to new files


I'm trying to find a group of files:

> find . -type f -iregex '.*geojson$'
  ./dir1/london.geojson
  ./manchester.geojson

Then, for each file found (30 to 40 across many nested folders), I want to wrap each original line in my own JSON structure, adding the filename and an id extracted from the line. For example:

> cat manchester.geojson
  {"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }
  {"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }

I would like the following result:

{"_id": 11.0, "filename": "manchester.geojson", "document": {"properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
{"_id": 22.0, "filename": "manchester.geojson", "document": {"properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

The closest I've got is piping to xargs and awk like this:

> find . -type f -iregex '.*geojson$' | xargs -d '\n' awk -F'[{:,]' '{print "{ \"_id\":"$7", \"file\": \""FILENAME"\", \"doc\": " $0 " }"}'

  }"_id": 11.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
  }"_id": 22.0, "file": "./manchester.geojson", "doc": { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

I don't know what is going wrong with the opening curly brace: each output line starts with } instead of {.
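One plausible cause, consistent with the line-ending cleanup mentioned in the solution, is that the files use Windows (CRLF) line endings. The carriage return stays at the end of `$0`, so the terminal returns to column 0 before the trailing ` }` is printed and overwrites the start of the line. The sketch below reproduces this on a made-up sample line and shows one way to strip the carriage return inside awk. The sample data and the variable names are assumptions for illustration only.

```shell
# Assumption: the input line ends with CRLF, as a file saved on Windows would.
line='{ "properties": { "id": 11.0 } }'

# With the carriage return left in, a terminal jumps back to column 0
# before the trailing " }" is printed, which overwrites the leading "{ ".
printf '%s\r\n' "$line" | awk '{ print "{ \"doc\": " $0 " }" }'

# Stripping a trailing carriage return first gives the intended line.
fixed=$(printf '%s\r\n' "$line" | awk '{ sub(/\r$/, ""); print "{ \"doc\": " $0 " }" }')
printf '%s\n' "$fixed"
```

The bytes written by the first command are still correct; only their display in a terminal is misleading, which is why the problem is easy to miss.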

I can get at all the values I need, as this example shows:

> find . -type f -iregex '.*geojson$' | xargs -d '\n' awk -F'[{:,]' '{print  $7 " " FILENAME " " $0}'

  11.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }}
  22.0 ./manchester.geojson { "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }}

Finally, there is the question of sending each file's output to a new file with the same name but a new extension. I can send the combined output of all the files into one big file with a simple > redirect, but that is not what I need. Any ideas would be gratefully received.
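One way to get a separate output file per input file is to redirect `print` from inside awk to a name derived from `FILENAME`. The following is a minimal sketch, not the asker's final method: the scratch directory, the sample data and the `.wrapped.json` suffix are all assumptions, and the id is taken from field 5 because the sample lines start with `"type": "Feature"`.

```shell
# Hypothetical sample data in a scratch directory, so no real files are touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/dir1"
printf '%s\n' \
  '{ "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" }, "geometry": {"removed": 0} }' \
  '{ "type": "Feature", "properties": { "id": 22.0, "borough": "Chorlton" }, "geometry": {"removed": 0} }' \
  > "$workdir/dir1/manchester.geojson"

cd "$workdir" || exit 1

# For every input file, derive the output name from FILENAME and redirect
# print to it from inside awk. The ".wrapped.json" suffix is an arbitrary choice.
find . -type f -name '*.geojson' -exec awk -F'[:,]' '
  FNR == 1 {                      # first line of each new input file
    if (out != "") close(out)     # release the previous output file
    out = FILENAME
    sub(/\.geojson$/, ".wrapped.json", out)
  }
  {
    sub(/\r$/, "")                # drop a Windows carriage return, if any
    id = $5                       # assumes the id is the 5th field, as in the sample
    gsub(/ /, "", id)
    print "{\"_id\": " id ", \"filename\": \"" FILENAME "\", \"document\": " $0 "}" > out
  }
' {} +

cat dir1/manchester.wrapped.json
```

Picking the id by field position is fragile: it only works while every line has the same key order. A JSON-aware tool would be more robust if the input varies.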


Solution

  • Thank you to both @EdMorton and @glenjackman for pointing me in the right direction. In the end, I was almost there with the command in the question. Once the dodgy line endings were cleaned up, the following single line does the job:

    > find . -type f -name \*geojson | xargs -d '\n' awk -i inplace -F'[:,]' '{print "{ \"_id\":" $5 ", \"file\": \"" FILENAME "\", \"doc\": "$0"}"}'
    

    The missing piece was -i inplace, which modifies each file in place (it requires GNU awk 4.1.0 or later). It was an option I had not initially considered.
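Where GNU awk is not available, a similar effect can be approximated by writing to a temporary file and moving it over the original. This is a hedged sketch of that idea rather than what `-i inplace` does internally; the scratch directory, the sample line and the `.tmp` suffix are assumptions, and like the `xargs -d '\n'` version it assumes filenames contain no newlines.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '{ "type": "Feature", "properties": { "id": 11.0, "borough": "Didsbury" } }' \
  > "$workdir/manchester.geojson"

# Rewrite each file through a temporary copy and replace the original
# only if awk succeeded; otherwise discard the temporary copy.
find "$workdir" -type f -name '*.geojson' | while IFS= read -r f; do
  tmp="$f.tmp"
  if awk -F'[:,]' '{ print "{ \"_id\":" $5 ", \"file\": \"" FILENAME "\", \"doc\": " $0 "}" }' "$f" > "$tmp"
  then
    mv "$tmp" "$f"
  else
    rm -f "$tmp"
  fi
done

cat "$workdir/manchester.geojson"
```

Either approach overwrites the original files, so it is worth running it on a copy of the data first.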