Compare the second column of file 1.txt with the complete file 2.txt and output the differences


I have two text files, 1.txt and 2.txt.

1.txt

https://soundcloud.com/track1,artisturi067
https://soundcloud.com/track4,artisturi428
https://soundcloud.com/track72,artisturi023
https://soundcloud.com/track22,artisturi181

2.txt

artisturi181
artisturi428
artisturi172
artisturi096

I'm looking for a way to compare the second column of each line in 1.txt against all the lines of 2.txt, keeping only the lines of 1.txt whose second column does not appear in 2.txt. The result should look something like this:

3.txt

https://soundcloud.com/track1,artisturi067
https://soundcloud.com/track72,artisturi023

A solution in Python, Bash for Windows, or even PowerShell would be helpful.


Solution

  • If you have the join command installed, you could do this:

    $ join -t, -12 -21 -v1 -o1.1 -o1.2 <(sort -t, -k2 1.txt) <(sort 2.txt)
    https://soundcloud.com/track72,artisturi023
    https://soundcloud.com/track1,artisturi067
    
    • -t,: sets the field separator to a comma, since that's what file 1.txt uses
    • -12 -21: says to join the 2nd field of the 1st file (-12) with the 1st field of the 2nd file (-21)
    • -v1: tells join to only output those rows in the 1st file that have no match in the 2nd file
    • -o1.1 -o1.2: says that we want to output the 1st and 2nd fields of the 1st file
    • <(sort -t, -k2 1.txt): since join requires its inputs to be sorted on the join field, we use process substitution to sort the file 1.txt on its 2nd field (-k2), using comma as the delimiter (-t,)
    • <(sort 2.txt): similarly, we sort the 2nd input file, but since it contains a single column, we don't have to specify either a separator, or a key, as we did for the previous file
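
If join is not available, or the original line order of 1.txt should be kept (the join output above comes out sorted by the 2nd field), a two-pass awk filter is one possible alternative. This is only a minimal sketch and not part of the answer above: it assumes Unix line endings, a non-empty 2.txt, and exact whole-line matches. The scratch directory is there just to make the example self-contained, and 3.txt is used as the output name because that is what the question calls the result.

```shell
# Sketch: keep the lines of 1.txt whose 2nd comma-separated field is absent from 2.txt.
# The scratch directory and the sample data below only make the example self-contained.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > 1.txt <<'EOF'
https://soundcloud.com/track1,artisturi067
https://soundcloud.com/track4,artisturi428
https://soundcloud.com/track72,artisturi023
https://soundcloud.com/track22,artisturi181
EOF

cat > 2.txt <<'EOF'
artisturi181
artisturi428
artisturi172
artisturi096
EOF

# While reading the first file (2.txt), NR == FNR holds: store each line as an array key.
# While reading the second file (1.txt): print only lines whose 2nd field is not a stored key.
awk -F, 'NR == FNR { seen[$0]; next } !($2 in seen)' 2.txt 1.txt > 3.txt

cat 3.txt
```

On the sample data this prints the track1 and track72 lines, in the same order as the 3.txt shown in the question. Files saved with Windows (CRLF) line endings would need the trailing carriage return stripped first, for example with tr -d '\r', otherwise no 2nd field would match.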