In Linux environments, cut has an --output-delimiter option that is extremely handy for my purposes of extracting fields from US Census data and outputting a delimited file. The data I'm working with are about 150K rows and 1K columns, and the Census data dictionary provides column ranges for each of a couple hundred fields, but the fields are not delimited in any consistent way -- you have to know the column positions. So if I want a few select fields with comma-separated output, this is easy:
cut -c 1-15,93-95,101-105 --output-delimiter=',' census_file.txt
But on a Mac, the --output-delimiter option is not available. The solutions I've seen for getting a comma-delimited file with selected columns are complex and ugly, and they are more manual than the Linux cut approach above: you have to specify exactly where you want the commas in each case.
Can anybody point me to some core bash commands that can reproduce the Linux cut functionality? Or if some third-party software is available to install, that would be fine too.
Without a clean solution, I will probably run an Ubuntu Docker container locally and just use that, but I'm hoping to find a set of tools available to my host machine.
Installing GNU tools works beautifully, following instructions here: Install GNU Tools
brew install coreutils
Unless you have explicitly made the GNU versions the default on your PATH, any GNU tool whose name overlaps with a built-in macOS command can be accessed with a 'g' prefix.
gcut -c [column list] --output-delimiter=',' census-file.txt > delimited-census-file.csv
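If installing anything is not an option, the awk that ships with macOS can approximate the same behavior. The following is a minimal sketch, not a drop-in replacement: fixed_to_csv is a name I made up, it only understands single columns (N) and closed ranges (M-N), it emits fields in the order you list them rather than in file order as cut does, and it assumes plain ASCII input so that character positions match bytes.

```shell
# Hypothetical helper: fixed_to_csv RANGES [FILE...]
# RANGES is a comma-separated list such as 1-15,93-95,101-105.
# Reads standard input when no FILE is given.
fixed_to_csv() {
  ranges=$1
  shift
  awk -v ranges="$ranges" '
    BEGIN {
      # Parse each range once into a start position and a length.
      n = split(ranges, r, ",")
      for (i = 1; i <= n; i++) {
        # A bare number N is treated as the range N-N.
        if (split(r[i], b, "-") < 2) b[2] = b[1]
        start[i] = b[1]
        len[i] = b[2] - b[1] + 1
      }
    }
    {
      # Build the output line by joining the extracted slices with commas.
      line = ""
      for (i = 1; i <= n; i++)
        line = line (i > 1 ? "," : "") substr($0, start[i], len[i])
      print line
    }
  ' "$@"
}
```

With that function defined in the current shell, the example from the question would look like this:

fixed_to_csv 1-15,93-95,101-105 census_file.txt > delimited-census-file.csv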