Extracting the first lines of a file, or only the lines matching a condition

I have a file with almost 5*(10^6) lines of integers, so it is fairly large.

My question is about extracting specific lines, filtered by a condition. For example, I'd like to:

1. Extract the first N lines without reading the entire file.
2. Extract the lines whose number is less than or equal to X (or >=, <, >).
3. Extract the lines whose number satisfies some condition (a mathematical predicate).

Is there a clever way to perform these tasks (using sed, awk, cat, or head)?

Thanks in advance.

Solution

To extract the first $NUMBER lines,

head -n "$NUMBER" filename

Assuming every line contains just a number (it also works if the number is the first token on the line), 2 can be solved like this:

awk '$1 >= 1234 && $1 < 5678' filename

In the same spirit, 3 is just the generalization:

awk 'condition' filename

It would have helped if you had specified what the condition is supposed to be, though. As it stands, you'll have to read the awk documentation to find out how to code it. Again, the number is available as $1.
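Since the question does not say which predicate is wanted, here is a minimal sketch with two made-up ones (even numbers, and multiples of 5). The sample file and the variable names evens and fives are assumptions for illustration only.

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 3 8 15 22 40 > "$tmpfile"

# Predicate 1 (assumed for illustration): the number is even.
evens=$(awk '$1 % 2 == 0' "$tmpfile")

# Predicate 2 (assumed for illustration): the number is a multiple of 5.
fives=$(awk '$1 % 5 == 0' "$tmpfile")

printf 'even:\n%s\n' "$evens"
printf 'multiples of 5:\n%s\n' "$fives"

rm -f "$tmpfile"
```

Any expression that awk evaluates to true or false can take the place of the condition, so more elaborate predicates follow the same shape.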

I don't think there is much to explain about the head call; it does exactly what it says on the tin, and it stops reading once it has printed the requested lines. As for the awk lines: awk, like sed, works line by line. It reads lines in a loop and applies your code to each one. This code takes the form

condition1 { action1 }
condition2 { action2 }
# and so forth
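As a concrete instance of that form, the sketch below uses two condition/action pairs to label numbers. The thresholds 10 and 100, the labels, and the sample data are all assumptions chosen for illustration.

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 5 120 42 7 300 > "$tmpfile"

# Two condition/action pairs; a line matching neither prints nothing.
labeled=$(awk '
    $1 < 10   { print "small:", $1 }
    $1 >= 100 { print "large:", $1 }
' "$tmpfile")

printf '%s\n' "$labeled"

rm -f "$tmpfile"
```

Here 42 matches neither condition, so it produces no output at all.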

For every line awk fetches, the conditions are checked in the order they appear, and the action associated with each condition is performed if that condition is true. It would, for example, have been possible to extract the first $NUMBER lines of a file with awk like this:

awk -v number="$NUMBER" '1 { print } NR == number { exit }' filename

where 1 is synonymous with true (like in C) and NR is the line number. The -v command line option initializes the awk variable number to $NUMBER. If no action is specified, the default action is { print }, which prints the whole line. So

awk 'condition' filename

is shorthand for

awk 'condition { print }' filename

...which prints every line where the condition holds.
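Putting these pieces together, the threshold X from the question can be passed in with -v instead of being hard-coded, and exit can stop awk early once enough matches have been printed. This is only a sketch: the awk variable names x, n and count, the value 21, and the sample data are assumptions.

```shell
# Hypothetical sample data: one integer per line.
tmpfile=$(mktemp)
printf '%s\n' 10 55 3 78 21 > "$tmpfile"

X=21

# All lines whose number is <= X; the default action prints them.
at_most_x=$(awk -v x="$X" '$1 <= x' "$tmpfile")

# Only the first two such lines; exit stops awk from reading further.
first_two=$(awk -v x="$X" -v n=2 '$1 <= x { print; if (++count == n) exit }' "$tmpfile")

printf 'all matches:\n%s\n' "$at_most_x"
printf 'first two matches:\n%s\n' "$first_two"

rm -f "$tmpfile"
```

On a file with millions of lines, the early exit means awk never has to read past the last match it needs.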