
Split text based on Column Difference Into Multiple Files


I have the trajectory data below:

EP, 13, 2017071012, 03, AP01, 126, 27.1, -130, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
AL, 07, 2017071012, 03, AP01, 132, 27, -131.1, 18, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
WP, 19, 2017071012, 03, AP01, 000, 18.5, -116.8, 56, 982, XX, 50, NEQ, 0057, 0047, 0034, 0036
AL, 08, 2017071012, 03, AP01, 132, 27, -132.1, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000

The information needs to be sorted by the 1st (name) and 2nd (numerical identifier) columns.

Running

sort -k1,2 file.txt

organizes the file into:

AL, 07, 2017071012, 03, AP01, 132, 27, -131.1, 18, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
AL, 08, 2017071012, 03, AP01, 132, 27, -132.1, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
EP, 13, 2017071012, 03, AP01, 126, 27.1, -130, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
WP, 19, 2017071012, 03, AP01, 000, 18.5, -116.8, 56, 982, XX, 50, NEQ, 0057, 0047, 0034, 0036

This is one step toward what I want.

I need to separate the data into separate files based on the second column - how would that be done? I imagine some type of regular expression is needed. Additionally, the second column is always numerical, and will not contain negative integers.

(The first column will always start with AL, EP, or WP)

Thank you for your information and help in advance!


Solution

  • sort -k1,2 file.txt | awk -F', *' '{print > ("out" $2)}'
    

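    If you are not using GNU awk and your file has a lot of unique "$2" values, you can run into the limit on simultaneously open files, so you'll need to close each file as you go (and append with ">>", because reopening a closed file with ">" would truncate it), e.g. at its simplest: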

    sort -k1,2 file.txt | awk -F', *' '{f="out" $2; print >> f; close(f)}'
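To check the behaviour end to end, here is a minimal sketch that runs the portable variant on the four sample rows from the question. The scratch directory and the cleanup step are assumptions added for the demo; the file name file.txt and the "out" prefix come from the commands above.

```shell
# Work in a scratch directory so the demo does not touch real data (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The four sample rows from the question.
cat > file.txt <<'EOF'
EP, 13, 2017071012, 03, AP01, 126, 27.1, -130, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
AL, 07, 2017071012, 03, AP01, 132, 27, -131.1, 18, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
WP, 19, 2017071012, 03, AP01, 000, 18.5, -116.8, 56, 982, XX, 50, NEQ, 0057, 0047, 0034, 0036
AL, 08, 2017071012, 03, AP01, 132, 27, -132.1, 17, 1018, XX, 34, NEQ, 0000, 0000, 0000, 0000
EOF

# ">>" appends, so clear any output left over from an earlier run first.
rm -f out[0-9]*

# Split on a comma plus optional spaces; $2 is then the bare identifier (e.g. 07).
# Each row is appended to "out<identifier>" and the file is closed right away.
sort -k1,2 file.txt |
  awk -F', *' '{ f = "out" $2; print >> f; close(f) }'

# List the files that were created.
ls out*
```

With this input each identifier occurs once, so every output file holds a single row. Rows that share an identifier but differ in the first column (say AL, 07 and EP, 07) would land in the same file, since only the second column determines the file name.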