Search code examples
linuxshellnotepad++delimitertrim

Linux - convert tab delimited to pipe delimited AND remove leading & trailing space


I want to covert a flat data file to a new file based on below requirement:

1) Change tab delimited to pile delimited "|".

2) Remove any leading and trailing SPACE on each "column".

3) Some columns are NULL, I want to keep the null. e.g., A||B. (2nd column is null.)

Example:

Original file (test.dat) has a row of tab delimited data , 7 columns in total, 2 columns after "NY" are NULL:

 A  New York    NY          Meal - Seafood       Grocery Department   

Please be aware there are leading/trailing space on some fields:

(" A ", "Meal - Seafood   ", "  Grocery Department   ")

This is the final version I want in the new file:

A|New York|NY|||Meal - Seafood|Grocery Department

Can any one write a sample code or shell script that I can use in Linux to ouput a new file?

Thanks!


Solution

  • The posix character class [[:space:]] and you can use * to match zero or more spaces. The literal \t matches a tab. Like,

    $ sed "s/^[[:space:]]*//" test.dat | sed "s/[[:space:]]*\t[[:space:]]*/|/g" \
         | sed "s/[[:space:]]*$//" 
    A|New York|NY|Meal - Seafood|Grocery Department   
    
    $ cat test.dat
    A   New York    NY   Meal - Seafood Grocery Department