
Keep the lines of a text file whose column value appears in another file, on the command line


I am trying to filter one file against another on the command line. The files look like this:

File1:
,1,this is some content,
,2,another content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,
,6,zzzzzzzzzz,
... ...



File2:
1
3
4
5

Now I want to keep only the lines of File1 whose number column appears in File2, so the output should be:

,1,this is some content,
,3,blablabla,
,4,xxxxxxxx,
,5,yyyyyyyy,

I tried comm -3 File1 File2, but it didn't work. Then I tried sed, which didn't work either. Is there another handy tool for this?


Solution

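  • The following works on the example as given, but grep matches each number anywhere in the line, so it returns extra lines if a number from File2 also appears in the text after the comma or inside a longer number (1 also matches 10):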

    grep -F -f File2 File1
    

    An alternative would be

    join -t, -1 2 -2 1 -o 1.1,1.2,1.3 File1 File2
    

    Here is how that works:

    -t,                 use `,` as the field separator
    -1 2                join on the second field of file 1
    -2 1                join on the first field of file 2
    -o 1.1,1.2,1.3      output the first, second and third fields of file 1
    

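    Note that join expects both files to be sorted on the join field, which is the case in the example. This still has a drawback: the output stops after field 3, so the trailing comma is dropped, and if the text itself contains commas it is cut off at the first one.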

    One way to fix that is to use xargs:

    join -t, -1 2 -2 1 -o 1.1,1.2 File1 File2 | xargs -Ixx grep '^xx,' File1
    

    Explanation:

    -Ixx : replace the string xx in the command that follows with each output line of the preceding command, then execute that command once per line. join emits the leading ,number of every matching line, so grep looks for the lines of File1 that start with ,number, and prints them whole, which makes the result insensitive to anything in the text that follows.
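
Two further alternatives, which are not part of the answer above, can be sketched as follows: awk can compare the second field of File1 directly against the keys in File2, and grep can be given anchored patterns built from File2 so that a number only matches in the number column. The sample rows with an extra digit, extra commas and the number 10, the temporary directory, and the variable names are assumptions made for illustration.

```shell
# Recreate the question's inputs in a scratch directory. Rows 2, 3 and the
# extra row 10 are altered or added on purpose to exercise the edge cases.
tmp=$(mktemp -d)
printf '%s\n' ',1,this is some content,' ',2,another 3 content,' \
    ',3,bla,bla,bla,' ',4,xxxxxxxx,' ',5,yyyyyyyy,' ',6,zzzzzzzzzz,' \
    ',10,ten,' > "$tmp/File1"
printf '%s\n' 1 3 4 5 > "$tmp/File2"

# awk: while reading the first file (NR == FNR) remember each key of File2;
# for the second file, print the lines whose second field is a stored key.
awk_out=$(awk -F, 'NR == FNR { keep[$1]; next } $2 in keep' \
    "$tmp/File2" "$tmp/File1")

# grep: rewrite each key N as the anchored pattern ^,N, so that it can only
# match the number column at the start of the line.
sed 's/.*/^,&,/' "$tmp/File2" > "$tmp/patterns"
grep_out=$(grep -f "$tmp/patterns" "$tmp/File1")

printf '%s\n' "$awk_out"
rm -r "$tmp"
```

Neither variant needs sorted input, and both print the matching lines of File1 unchanged, including any commas in the text. Both assume File2 holds exactly one key per line with no surrounding whitespace.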