
extracting first line from file command such that


I have a file with almost 5*(10^6) lines of integers, so it is fairly large.

The question is about extracting specific lines by filtering them with a condition. For example, I'd like to:

  1. Extract the first N lines without reading the entire file.
  2. Extract the lines whose number is less than or equal to X (or compared with >=, <, >).
  3. Extract the lines whose number satisfies some condition (a math predicate).

Is there a clever way to perform these tasks (using sed, awk, cat, or head)?

Thanks in advance.


Solution

  • To extract the first $NUMBER lines,

    head -n "$NUMBER" filename
    

    Assuming every line contains just a number (although it will also work if only the first token is one), task 2 can be solved like this:

    awk '$1 >= 1234 && $1 < 5678' filename
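
The bounds do not have to be hard-coded. As a minimal sketch, assuming the threshold lives in a shell variable (X and the sample data below are made up for illustration), it can be passed to awk with -v and used with any of the comparison operators:

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 5 12 7 30 12 1 > "$tmpfile"

X=12   # threshold, chosen only for this illustration

# Lines whose number is less than or equal to X.
le_result=$(awk -v x="$X" '$1 <= x' "$tmpfile")

# Lines whose number is strictly greater than X.
gt_result=$(awk -v x="$X" '$1 > x' "$tmpfile")

printf 'less than or equal to %s:\n%s\n' "$X" "$le_result"
printf 'greater than %s:\n%s\n' "$X" "$gt_result"

rm -f "$tmpfile"
```

Passing the value with -v keeps the awk program in single quotes, so the shell never has to splice anything into the awk code itself.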
    

    In the same spirit, task 3 is just the generalization

    awk 'condition' filename
    

    It would have helped if you had specified what the condition is supposed to be, though. As it stands, you'll have to read the awk documentation to find out how to code it. Again, the number will be represented by $1.
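
Since the question does not say which predicate is wanted, the following is only a sketch of what such conditions might look like. The sample data and the function name is_prime are invented here; the operators and the user-defined function syntax are standard awk:

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 4 9 10 15 17 21 > "$tmpfile"

# Even numbers only.
evens=$(awk '$1 % 2 == 0' "$tmpfile")

# Multiples of 3 that are not multiples of 5.
threes=$(awk '$1 % 3 == 0 && $1 % 5 != 0' "$tmpfile")

# A predicate wrapped in a user-defined awk function (trial division).
primes=$(awk '
function is_prime(n,    i) {
    if (n < 2) return 0
    for (i = 2; i * i <= n; i++)
        if (n % i == 0) return 0
    return 1
}
is_prime($1)
' "$tmpfile")

printf 'even:\n%s\n' "$evens"
printf 'multiple of 3, not of 5:\n%s\n' "$threes"
printf 'prime:\n%s\n' "$primes"

rm -f "$tmpfile"
```

Any expression that evaluates to a nonzero value counts as true, so a function call on its own is a valid condition.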

    There is not much to explain about the head call; it does what it says on the tin, and it stops after printing the requested lines rather than reading the whole file. As for the awk lines: awk, like sed, works line by line. It fetches lines in a loop and applies your code to each one. This code takes the form

    condition1 { action1 }
    condition2 { action2 }
    # and so forth
    

    For every line awk fetches, the conditions are checked in the order they appear, and the action associated with each condition is performed if that condition is true. It would, for example, have been possible to extract the first $NUMBER lines of a file with awk like this:

    awk -v number="$NUMBER" '1 { print } NR == number { exit }' filename
    

    where 1 is synonymous with true (like in C) and NR is the line number. The -v command line option initializes the awk variable number to $NUMBER. If no action is specified, the default action is { print }, which prints the whole line. So

    awk 'condition' filename
    

    is shorthand for

    awk 'condition { print }' filename
    

    ...which prints every line where the condition holds.
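
The condition/action pairs can also be combined. As a sketch only (the variable names n, min, and found, and the sample data, are made up here, and N is assumed to be at least 1), this prints the first N lines that satisfy a condition and then stops reading, mixing tasks 1 and 2:

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 3 50 8 72 91 4 60 > "$tmpfile"

N=2   # how many matching lines to keep (assumed >= 1)

first_big=$(awk -v n="$N" -v min=50 '
$1 >= min  { print; found++ }   # matching line: print it and count it
found == n { exit }             # stop reading once n matches were printed
' "$tmpfile")

printf '%s\n' "$first_big"

rm -f "$tmpfile"
```

Because the rules run in order for every line, the exit check sees the updated counter on the same line that produced the last match, so awk does not read any further than it has to.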