I have a headerless CSV file that is sorted on the first column. When the 2nd and 3rd columns are identical, I want to "collapse" them into one - i.e. leave the last column empty. In other words, this:
0000001,11111,66666
0000002,12121,22222
0000003,33333,33333
0000004,74747,44444
0000005,12345,12345
...becomes this:
0000001,11111,66666
0000002,12121,22222
0000003,33333,
0000004,74747,44444
0000005,12345,
I've tried various permutations of grep
and cut
but can't get anything to work - the closest I've come is cut -c 8-19 file.csv
, which just isolates the 2nd and 3rd columns. I have a feeling needing to do this across columns and needing to delete the value instead of the whole line makes this complicated enough to require awk
or sed
, and I don't know enough about either to know how to approach it.
Using sed
$ sed -E 's/([^,]*,([^,]*),)\2/\1/' input_file
0000001,11111,66666
0000002,12121,22222
0000003,33333,
0000004,74747,44444
0000005,12345,