Please note: I understand how to output lines in one file that are not in another (here); my question is a little different.
In one file I have lines like:
Андреев
Барбашев
Иванов
...
in a different file there are lines:
Барбашёв
Семёнов
...
Now. I need the lines from the second file, but only if you cannot find a line in the first where you substitute ё for е. For example, Барбашёв should not be displayed, because Барбашев is in the first.
If I do something like
comm -13 first.txt <(cat second.txt | sed 's/ё/е/g')
I get the correct lines; however, they have already been transformed by that point, which is unacceptable for what I'm trying to do.
In other words the output is:
Барбашев
...
While it should be
Барбашёв
...
You meant:
"Now. I need the lines from the second file, but only if you cannot find a line in the first when you substitute ё for е in the second file."
instead of
"Now. I need the lines from the second file, but only if you cannot find a line in the first where you substitute ё for е."
Right?
Without using a Cyrillic charset, this solution works:
File test.awk:
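#!/usr/bin/gawk -f
{
    if (NR == FNR)
        arr[$1]++;                # first file: remember each name
    else {
        tmp = $1;
        gsub("t", "e", tmp);      # normalize a copy, keep $1 untouched
        if (!(tmp in arr))
            printf("%s\n", $1);   # print the original line
    }
}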
Use:
$ ./test.awk file1 file2
If you replace "t" with "ё" (and the Latin "e" with the Cyrillic "е") in the gsub call, this should also work, IMO. Maybe you can try it.
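For reference, here is a minimal sketch of the same idea with the actual characters, assuming both files are UTF-8 encoded. The function name filter_yo and the temporary file paths are made up for illustration; the sample names are the ones from the question. Only a copy of each line is normalized, so the line that gets printed still contains its original ё.

```shell
#!/bin/sh
# Hypothetical sample data standing in for first.txt and second.txt.
tmpdir=$(mktemp -d)
printf '%s\n' 'Андреев' 'Барбашев' 'Иванов' > "$tmpdir/first.txt"
printf '%s\n' 'Барбашёв' 'Семёнов' > "$tmpdir/second.txt"

# filter_yo FIRST SECOND (hypothetical helper):
# print the lines of SECOND whose ё->е form does not occur in FIRST.
filter_yo() {
    awk '
        NR == FNR { seen[$0]; next }   # first file: remember every line
        {
            key = $0
            gsub(/ё/, "е", key)        # normalize a copy used only for lookup
            if (!(key in seen))
                print                  # print the original, untouched line
        }
    ' "$1" "$2"
}

result=$(filter_yo "$tmpdir/first.txt" "$tmpdir/second.txt")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this prints only Семёнов: Барбашёв is skipped because its normalized form Барбашев is in the first file. Unlike the comm approach, this does not require either file to be sorted. Note that the NR == FNR test assumes the first file is not empty.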