bash sed cat gnu-parallel

Remove multiple file extensions when using GNU parallel and cat in bash


I have a comma-separated CSV file that contains:

file1a.extension.extension,file1b.extension.extension
file2a.extension.extension,file2b.extension.extension

The problem is that these files are named like file.extension.extension.

I'm trying to feed both columns to parallel and remove all extensions.

I tried some variations of:

cat /home/filepairs.csv | sed 's/\..*//' | parallel --colsep ',' echo column 1 = {1}.extension.extension column 2 =  {2} 

I expected this to output:

column 1 = file1a.extension.extension column 2 = file1b
column 1 = file2a.extension.extension column 2 = file2b

But it outputs:

column 1 = file1a.extension.extension column 2 = 
column 1 = file2a.extension.extension column 2 =

The sed command works, but it feeds only column 1 to parallel.


Solution

  • As currently written, the sed command prints only one name per line:

    $ sed 's/\..*//'  filepairs.csv
    file1a
    file2a
    

    Where:

    • \. matches the first literal period (.)
    • .* matches the rest of the line (i.e., everything after the first literal period to the end of the line)
    • the empty replacement (//) removes everything from the first literal period to the end of the line, which includes the comma and the whole second column

    I'm guessing what you really want is two names per line ... one sed idea:

    $ sed 's/\.[^,]*//g'   filepairs.csv
    file1a,file1b
    file2a,file2b
    

    Where:

    • \. matches a literal period (.)
    • [^,]* matches everything up to, but not including, the next comma (or the end of the line)
    • the empty replacement (//) removes the literal period and everything after it up to a comma or the end of the line; the g flag repeats this for every match on the line (in this case the replacement occurs twice, once per column)

    NOTE: I don't have parallel on my system, so I'm unable to test that portion of the OP's code.
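
For checking the extension stripping without parallel installed, here is a minimal sketch of a different technique: a plain shell read loop with parameter expansion instead of sed. The function name strip_pairs is hypothetical, and the sample lines are copied from the question. It assumes exactly two comma-separated columns and no periods in the part of the name you want to keep.

```shell
#!/usr/bin/env bash
# Sketch: strip every extension from both CSV columns without sed or parallel.
# strip_pairs is a hypothetical helper name, not part of the original answer.
strip_pairs() {
    local first second
    # IFS=, makes read split each input line on the comma into two columns
    while IFS=, read -r first second; do
        # ${var%%.*} deletes the longest suffix that starts at a period,
        # i.e. everything from the first period onward
        printf 'column 1 = %s column 2 = %s\n' "${first%%.*}" "${second%%.*}"
    done
}

# Sample data copied from the question
printf '%s\n' \
    'file1a.extension.extension,file1b.extension.extension' \
    'file2a.extension.extension,file2b.extension.extension' |
    strip_pairs
```

To run it against the real file, redirect the file into the function: strip_pairs < /home/filepairs.csv. Unlike parallel, the loop processes lines one at a time, so it is only meant for inspecting the stripped names, not for running jobs concurrently.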