
Split file in Unix based on occurrence of some specific string


The contents of my file are as follows:

Tenor|CurrentCoupon
15Y|3.091731898890382
30Y|3.5773546584901617
Id|Cusip|Ticker|Status|Error|AsOfDate|Price|LiborOas
1|01F020430|FN 15 2 F0|1||20180312|95.19140625|-0.551161358515
2|01F020448|FN 15 2 F1|1||20180312|95.06640625|1.18958768351
3|01F020547|FN 20 2 F0|1||20180312|90.484375|50.742896921
4|01F020554|FN 20 2 F1|1||20180312|90.359375|52.4642397071
5|01F020646|FN 30 2 F0|1||20180312|90.25|6.26649840403

and I have to split it into two files, like this:

Tenor,CurrentCoupon
15Y,3.294202313
30Y,3.727696014

and

Id,Cusip,Ticker,Status,Error,AsOfDate,Price,LiborOas
1,01F020489,FN 15 2 F0,1,,20180807,94.27734375,6.199343069
2,01F020497,FN 15 2 F1,1,,20180807,94.15234375,8.225144379
3,01F020588,FN 20 2 F0,1,,20180807,89.984375,48.11248894

I have very little knowledge of UNIX scripts. The number of rows will vary.


Solution

  • With awk this can be done very simply:

    awk -F '|' '{print $0 > (NF ".txt")}' yourfile.txt
    

    This command splits your file into 2.txt (all rows with 2 fields) and 8.txt (all rows with 8 fields).

    How it works: the -F option sets the field delimiter, and awk reads your file line by line. $0 stands for the entire row and NF for the number of fields in the current row, so each row is written to a file named after its field count.

    If you also want to change the delimiter from | to a comma:

    awk -F '|' 'BEGIN{OFS=","} {$1=$1; print > (NF ".txt")}' yourfile.txt
    

    OFS stands for Output Field Separator. Assigning $1=$1 is a small hack that forces awk to rebuild the row using that separator.
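The question title asks for a split on the occurrence of a specific string rather than on the number of fields. Here is a minimal sketch of that variant. It assumes the second section always starts with a header row whose first field is exactly Id. The output names tenor.csv and ids.csv are hypothetical, and the sample input is a shortened copy of the data in the question.

```shell
# Recreate a shortened sample of the input from the question (illustrative only).
cat > yourfile.txt <<'EOF'
Tenor|CurrentCoupon
15Y|3.091731898890382
30Y|3.5773546584901617
Id|Cusip|Ticker|Status|Error|AsOfDate|Price|LiborOas
1|01F020430|FN 15 2 F0|1||20180312|95.19140625|-0.551161358515
2|01F020448|FN 15 2 F1|1||20180312|95.06640625|1.18958768351
EOF

# Write to tenor.csv (hypothetical name) until a row whose first field is "Id"
# appears, then switch to ids.csv (hypothetical name) for the rest of the file.
# $1 = $1 rebuilds each row so that it is printed with OFS (a comma).
awk -F '|' -v OFS=',' '
    BEGIN      { out = "tenor.csv" }
    $1 == "Id" { out = "ids.csv" }
               { $1 = $1; print > out }
' yourfile.txt
```

Unlike the field-count approach, this keeps working if both sections happen to have the same number of columns, but it depends on the header text staying the same.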