Tags: bash, sed, cut

Deleting first n rows and column x from multiple files using Bash script


I am aware that the "deleting n rows" and "deleting column x" questions have both been answered individually before. My problem is that I'm writing my first bash script and am having trouble making it work the way I want.

file0001.csv (there are several hundred files like these in one folder)

Data number of lines 540
No.,Profile,Unit
1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm

Desired output

1,1027.84
2,1027.92
3,1028
4,1028.81

I can use sed and cut individually, but for some reason the following bash script ignores cut. It also prints the error "sed: can't read ls: No such file or directory", yet sed succeeds and its output is saved to the original files.

sem2csv.sh

for files in 'ls *.csv'  #list of all .csv files
do
  sed '1,2d' -i $files | cut -f  '1-2' -d  ','
done

Actual output:

1,1027.84,µm
2,1027.92,µm
3,1028,µm
4,1028.81,µm

I know there may be awk one-liners, but I would really like to understand why this particular bash script isn't working as intended. What am I missing?


Solution

  • The -i option of sed modifies the file in place. Your pipeline to cut receives no input, because sed -i writes nothing to standard output. Without this option, sed would write the results to standard output instead of back to the file, and then your pipeline would work. However, cut would then print to the terminal, so you would have to take care of writing the results back to the original file yourself.

    Moreover, single quotes inhibit expansion: you are "looping" over the single literal string ls *.csv. Because $files is not double-quoted inside the loop, that string is then split into the words ls and *.csv, and *.csv is subject to wildcard expansion. So after variable interpolation, your sed command expands to

    sed 1,2d -i ls *.csv
    

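    and then the shell expands *.csv because it is not quoted. That is also where the error sed: can't read ls: No such file or directory comes from, since there is no file named ls in the current directory. You probably tried to copy an example that used backticks (ASCII 96) instead of single quotes (ASCII 39). The difference is significant: backticks run the enclosed command and substitute its output (the modern equivalent is $(ls *.csv)), whereas single quotes produce a literal string.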

    Anyway, ls is unnecessary here, because the shell expands the wildcard itself. The proper idiom is

    for files in *.csv; do
      sed '1,2d' "$files" ...   # the double quotes here are important
    done
    

    Mixing sed and cut is usually not a good idea, because anything cut can do can also be expressed as a simple sed script. So your entire script could be

    for f in *.csv; do
        sed -i -e '1,2d' -e 's/,[^,]*$//' "$f"
    done
    

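    which deletes the first two lines and then, on every remaining line, removes the last comma and everything after it. (If your sed does not accept multiple -e options, try a semicolon separator instead: sed -i '1,2d;s/,[^,]*$//' "$f". On BSD or macOS sed, -i also requires an explicit backup suffix, so use sed -i '' '1,2d;s/,[^,]*$//' "$f" for no backup.)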
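If you would rather keep the sed | cut pipeline from the question, drop -i so that sed writes to standard output. Then send cut's output to a temporary file that replaces the original. The sketch below is a self-contained demo, not a drop-in script. The scratch directory, the single sample file (built from the question's data) and the .tmp suffix are all illustrative assumptions.

```shell
#!/bin/sh
# Self-contained demo: a scratch directory with one sample file in the
# layout from the question (two header lines, then three columns).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '%s\n' 'Data number of lines 540' 'No.,Profile,Unit' \
    '1,1027.84,µm' '2,1027.92,µm' > file0001.csv

for f in *.csv; do
    [ -e "$f" ] || continue   # no .csv files: the pattern stays literal, skip it
    # Without -i, sed writes to stdout, so cut actually receives input.
    # cut's output goes to a temporary file (arbitrary .tmp suffix),
    # which then replaces the original.
    sed '1,2d' "$f" | cut -d ',' -f 1-2 > "$f.tmp" && mv -- "$f.tmp" "$f"
done

cat file0001.csv
```

The pipeline's exit status is cut's, so a failing sed would not stop the mv. The sed-only loop has no pipeline and avoids that problem.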
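To see the quoting problem directly, the following demo prints what each variant of the loop header iterates over. It runs in a scratch directory containing two empty sample files, whose names are chosen only to resemble the question's. The last loop reproduces the original bug: the unquoted variable is split and globbed, so sed receives ls followed by every .csv file.

```shell
#!/bin/sh
# Demo: what each loop header actually iterates over. The scratch
# directory and the two empty sample files are for illustration only.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch file0001.csv file0002.csv

# Single quotes: one literal word, the string "ls *.csv"; nothing runs.
for x in 'ls *.csv'; do printf 'quoted:   [%s]\n' "$x"; done

# Command substitution (backticks, or the clearer $(...) form) runs ls
# and loops over the words it prints.
for x in $(ls *.csv); do printf 'ls:       [%s]\n' "$x"; done

# Plain glob: the shell expands *.csv itself, no ls required.
for x in *.csv; do printf 'glob:     [%s]\n' "$x"; done

# The original bug: $x is unquoted, so "ls *.csv" is split into ls and
# *.csv, and *.csv is globbed. These are the arguments sed received.
for x in 'ls *.csv'; do printf 'unquoted: [%s]\n' $x; done
```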
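With GNU sed, the loop is not strictly needed. Its -i option implies -s, which treats each file as a separate stream, so the address 1,2d applies to the first two lines of every file, not only the first file. A minimal sketch, assuming GNU sed and using a scratch directory with two copies of the question's sample data:

```shell
#!/bin/sh
# Assumes GNU sed. Two sample files with the question's layout are
# created in a scratch directory for illustration.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
for n in 1 2; do
    printf '%s\n' 'Data number of lines 540' 'No.,Profile,Unit' \
        '1,1027.84,µm' '2,1027.92,µm' > "file000$n.csv"
done

# One sed process for all files: -i (which implies -s in GNU sed)
# restarts line numbering per file, so each file loses its own two
# header lines, and s/,[^,]*$// drops the last column everywhere.
sed -i -e '1,2d' -e 's/,[^,]*$//' *.csv

cat file0001.csv file0002.csv
```

With several hundred files, this runs a single sed process instead of one per file.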