
Separating onto a new line based on a delimiter


I have some rows in my file that look like this:

 ENSG00000003096:E4.2|E5.1
 ENSG00000035115:E14.2|E15.1
 ENSG00000140987:E5.2|ENSG00000140987:E6.1
 ENSG00000154358:E46.1|E47.1

I would like to split each row onto separate lines at the "|" delimiter, repeating the gene ID for any fragment that lacks one, so that the output becomes:

  ENSG00000003096:E4.2
  ENSG00000003096:E5.1
  ENSG00000035115:E14.2
  ENSG00000035115:E15.1
  ENSG00000140987:E5.2
  ENSG00000140987:E6.1
  ENSG00000154358:E46.1
  ENSG00000154358:E47.1

Solution

  • With the input data given in your question, this seems to work with GNU awk (it uses a regular expression as RS, which GNU awk supports but POSIX leaves unspecified):

    awk -F: -v RS="[|]|\n" 'NF==1{print p FS $0;next}NF!=1{p=$1}1' file1
    # Output
    ENSG00000003096:E4.2
    ENSG00000003096:E5.1
    ENSG00000035115:E14.2
    ENSG00000035115:E15.1
    ENSG00000140987:E5.2
    ENSG00000140987:E6.1
    ENSG00000154358:E46.1
    ENSG00000154358:E47.1
    

    Logic:

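    | or \n (newline) is used as the record separator RS, so every fragment between "|" characters becomes its own record
    : is used as the field separator FS
    If a record has more than one field, save its first field (the gene ID) in the variable p and print the record unchanged
    If a record has only one field, it lacks the gene ID, so print the saved ID p, then FS, then the record $0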
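The same logic can also be expressed without a regular-expression RS. The sketch below assumes only a POSIX awk: it reads each line normally and splits it at "|" with split(), then applies the "has an ID or not" rule to each fragment. The function name split_on_pipe is hypothetical, chosen for illustration. One deliberate difference: the saved prefix is reset on every line instead of carried across records.

```shell
#!/bin/sh
# Sketch assuming only POSIX awk: no regular-expression RS is needed,
# because each line is split at "|" with split().
# split_on_pipe is a hypothetical helper name used for illustration.
split_on_pipe() {
    awk '
    {
        n = split($0, part, "|")            # a single-char separator is literal
        prefix = ""                         # reset the saved gene ID per line
        for (i = 1; i <= n; i++) {
            c = index(part[i], ":")
            if (c > 0) {                    # fragment already has "ID:exon"
                prefix = substr(part[i], 1, c - 1)
                print part[i]
            } else {                        # bare exon: reuse the saved ID
                print prefix ":" part[i]
            }
        }
    }' "$@"
}

# Demo on the sample rows from the question (reads stdin when no file is given)
printf '%s\n' \
    'ENSG00000003096:E4.2|E5.1' \
    'ENSG00000035115:E14.2|E15.1' \
    'ENSG00000140987:E5.2|ENSG00000140987:E6.1' \
    'ENSG00000154358:E46.1|E47.1' | split_on_pipe
```

To run it on a file instead, call split_on_pipe file1.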