
Using awk to pull specific lines from a file


I have two files: one is my data, and the other is a list of line numbers that I want to extract from the data file. Can I use awk to read the line-numbers file and then print the data lines whose numbers match?

Example data file:

This is the first line of my data
This is the second line of my data
This is the third line of my data
This is the fourth line of my data
This is the fifth line of my data

Line numbers file:

1
4
5

Output:

This is the first line of my data
This is the fourth line of my data
This is the fifth line of my data

I've only ever used command line awk and sed for really simple stuff. This is way beyond me and I have been googling for an hour without an answer.


Solution

  • One way with sed:

    sed 's/$/p/' linesfile | sed -n -f - datafile
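
    To see why this works, it can help to look at the script the first sed generates. The following is a minimal sketch, assuming the question's sample files are recreated in a scratch directory (the paths and variable names are illustrative) and that your sed accepts a script on standard input via "-f -", as GNU sed does:

```shell
# Recreate the question's sample files in a scratch directory (illustrative names).
tmpdir=$(mktemp -d)
lines="$tmpdir/linesfile"
data="$tmpdir/datafile"
printf '%s\n' 1 4 5 > "$lines"
printf 'This is the %s line of my data\n' first second third fourth fifth > "$data"

# Step 1: append "p" to every line number, turning each one into a sed print command.
generated=$(sed 's/$/p/' "$lines")
printf '%s\n' "$generated"     # prints 1p, 4p and 5p, one per line

# Step 2: run those commands against the data; -n suppresses sed's default output,
# so only the addressed lines are printed. "-f -" (script from stdin) is assumed here.
selected=$(sed 's/$/p/' "$lines" | sed -n -f - "$data")
printf '%s\n' "$selected"
```

    Each generated command is just an address followed by p, so the second sed prints exactly the first, fourth and fifth lines.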
    

    You can use the same trick with awk:

    sed 's/^/NR==/' linesfile | awk -f - datafile
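
    Here the generated awk program is a list of patterns such as NR==1, each with the default action of printing the record. For comparison, a single awk process can do the same with an array lookup. This is a sketch of a common idiom, not part of the answer as originally tested; the file paths, variable names and the array name want are illustrative assumptions:

```shell
# Recreate the question's sample files in a scratch directory (illustrative names).
tmpdir=$(mktemp -d)
lines="$tmpdir/linesfile"
data="$tmpdir/datafile"
printf '%s\n' 1 4 5 > "$lines"
printf 'This is the %s line of my data\n' first second third fourth fifth > "$data"

# The program that the sed step generates: one pattern per wanted line number.
program=$(sed 's/^/NR==/' "$lines")
printf '%s\n' "$program"       # prints NR==1, NR==4 and NR==5, one per line

# Alternative idiom: NR == FNR holds only while awk reads the first file, so the
# first rule stores every wanted number as an array key ($1 + 0 forces a number).
# The second rule prints each data line whose per-file line number is a key.
picked=$(awk 'NR == FNR { want[$1 + 0]; next } FNR in want' "$lines" "$data")
printf '%s\n' "$picked"
```

    The array version keeps every wanted line number in memory, which is fine for a modest list but is the cost the alternative below tries to avoid.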
    

    Edit - Huge files alternative

    With a huge number of lines, it is not prudent to keep the whole list of line numbers in memory. In that case, sort the numbers file numerically (ascending) and read it one number at a time while stepping through the data. The following has been tested with GNU awk (ERRNO is a gawk extension):

    extract.awk

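    BEGIN {
      # Read the first wanted line number (getline returns -1 on error, 0 at EOF).
      if ((getline n < linesfile) < 0) {
        print "Unable to read linesfile '" linesfile "': " ERRNO > "/dev/stderr"
        exit 1
      }
    }
    
    # The current record is the next wanted line.
    NR == n { 
      print
      # Fetch the next wanted number; stop at the end of linesfile or on a read error.
      if ((getline n < linesfile) <= 0) {
        if (length(ERRNO))
          print "Unable to read linesfile '" linesfile "': " ERRNO > "/dev/stderr"
        exit
      }
    }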
    

    Run it like this:

    awk -v linesfile="$linesfile" -f extract.awk infile
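
    Because the script reads the numbers file strictly in order, an out-of-order or duplicated number causes later entries to be skipped silently. Below is a minimal sketch of a pre-processing step, assuming "sort -n -u" is acceptable for your numbers file. The scratch paths are illustrative, and the two-rule script is a condensed stand-in for extract.awk with the error reporting left out:

```shell
# Sample input: an unsorted numbers file containing a duplicate (illustrative names).
tmpdir=$(mktemp -d)
printf '%s\n' 5 1 4 4 > "$tmpdir/linesfile"
printf 'This is the %s line of my data\n' first second third fourth fifth > "$tmpdir/datafile"

# Sort numerically and drop duplicates, so each wanted number appears once, ascending.
sort -n -u "$tmpdir/linesfile" > "$tmpdir/linesfile.sorted"

# Condensed stand-in for extract.awk: same streaming idea, no error messages.
cat > "$tmpdir/extract.awk" <<'EOF'
BEGIN   { if ((getline n < linesfile) <= 0) exit }
NR == n { print; if ((getline n < linesfile) <= 0) exit }
EOF

# Only one wanted line number is held in memory at any time.
result=$(awk -v linesfile="$tmpdir/linesfile.sorted" -f "$tmpdir/extract.awk" "$tmpdir/datafile")
printf '%s\n' "$result"
```

    Sorting costs one extra pass over the numbers file, but sort can spill to disk for large inputs, so the memory argument still holds.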
    

    Testing (needs a shell with process substitution, such as bash):

    echo "2
    4
    7
    8
    10
    13" | awk -v linesfile=/dev/stdin -f extract.awk <(paste <(seq 50e3) <(seq 50e3 | tac))
    

    Output:

    2   49999
    4   49997
    7   49994
    8   49993
    10  49991
    13  49988