Double single quotes in a sed command


I was trying to use sed to remove lines whose ID had already been seen, keeping only the first occurrence. I found a solution, but it came with no explanation and I am struggling to understand it.

Example of test.txt (IDs will not always be numerically sorted, but duplicates will always follow each other):

1
2
3
3
4
4
4
5
6
7
7

Result wanted:

1
2
3
4
5
6
7

The code:

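# Create an array of IDs (first tab-separated field of each line)
mapfile -t id_array < <(cut -f1 test.txt)
# Loop over the IDs, starting at the second one (the first has no predecessor)
for (( i=1; i < ${#id_array[@]}; i++ ))
do
     prev=$((i-1))
     # Compare each ID with the previous one; if equal, record the duplicate's line number
     if (( ${id_array[$prev]} == ${id_array[$i]} ))
     then
          # sed line numbers are 1-based, array indices are 0-based
          index_array+=($((i+1)))
     fi
done
# Line I don't fully understand: deletes the recorded lines from the file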
sed -i ''"${index_array[*]/%/d;}"'' test.txt

The last line deletes, in place, the lines whose numbers are stored in the array. [*] expands all values into a single word ([@] would not work, as it expands each value into its own word). The /%/d; part is a parameter expansion that appends d; to the end of each value (the % anchors the empty pattern at the end of the value). But I completely fail to understand the '' on each side. Using just one single quote on each side does not work. Why?
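
The expansion can be inspected on its own, without running sed. Below is a minimal sketch, assuming a hypothetical index_array that holds the 1-based numbers of the duplicate lines in the sample file (the variable names star_expr and at_words are only for illustration). The % anchors an empty pattern at the end of each element, so d; is appended to every element; the spaces in the result come from [*] joining the elements with the first character of IFS.

```bash
#!/usr/bin/env bash
# Hypothetical 1-based line numbers of the duplicate lines in the sample test.txt
index_array=(4 6 7 11)

# [*] inside double quotes joins all elements into ONE word;
# /%/d; appends "d;" to the end of each element before joining
star_expr="${index_array[*]/%/d;}"
printf '[%s]\n' "$star_expr"       # prints one line: [4d; 6d; 7d; 11d;]

# [@] inside double quotes yields one word PER element,
# so sed would receive four separate arguments instead of one script
at_words=("${index_array[@]/%/d;}")
printf '[%s]\n' "${at_words[@]}"   # prints four lines: [4d;] [6d;] [7d;] [11d;]
```

With [@], sed would take only the first word as its script and treat the remaining words as file names, which is why the single-word [*] form is needed here.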

EDIT: It occurred to me that the outer quotes might be there to preserve the first (inner) ', so that the sed expression stays in single quotes as required. Is that right?
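
On the quoting itself, a quick experiment suggests the '' pairs do nothing at all: each '' is an empty single-quoted string that the shell concatenates with the double-quoted part, and the shell removes all quotes before sed starts, so sed never sees a quote character. The sketch below assumes the same hypothetical index_array as the sample data implies, and uses a temporary file and sed without -i so nothing is modified in place.

```bash
#!/usr/bin/env bash
# Hypothetical 1-based line numbers of the duplicate lines in the sample data
index_array=(4 6 7 11)

# Temporary copy of the sample input
tmp=$(mktemp)
printf '%s\n' 1 2 3 3 4 4 4 5 6 7 7 > "$tmp"

# '' + "..." + '' : two empty strings around the double-quoted expansion
result_with_empty=$(sed ''"${index_array[*]/%/d;}"'' "$tmp")
# Double quotes only: the shell builds exactly the same argument
result_plain=$(sed "${index_array[*]/%/d;}" "$tmp")

# One single quote on each side makes the whole thing literal,
# double quotes included, so no expansion happens
single_quoted='"${index_array[*]/%/d;}"'
printf '%s\n' "$single_quoted"

rm -f "$tmp"
```

If this holds, the single-quote variant fails because sed is handed the literal text "${index_array[*]/%/d;}" as its script, and the original line could be written as sed -i "${index_array[*]/%/d;}" test.txt with the same effect.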


Solution

  • The correct awk solution for this is:

    awk '$1 != prev{print} {prev=$1}' test.txt
    

    The above stores almost nothing in memory, just 2 $1s at a time. If you did awk '!seen[$1]++' test.txt, on the other hand, you'd get the same output but then you'd have to store all unique $1 values in memory together and so YMMV if your input is massive.
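
    As a rough check, both commands can be run against the sample input and compared. This is only a sketch: the temporary file and the variable names adjacent and seen_all are made up for the illustration.

```bash
#!/usr/bin/env bash
# Temporary copy of the sample input from the question
tmp=$(mktemp)
printf '%s\n' 1 2 3 3 4 4 4 5 6 7 7 > "$tmp"

# Keeps a line when its first field differs from the previous line's first field
adjacent=$(awk '$1 != prev{print} {prev=$1}' "$tmp")

# Keeps a line the first time its first field is seen anywhere in the input
seen_all=$(awk '!seen[$1]++' "$tmp")

printf '%s\n' "$adjacent"
rm -f "$tmp"
```

    The two outputs agree here because duplicates are always adjacent; if the same ID could reappear later in the file, only the seen[] version would drop it.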