regex bash sed grep large-files

grep the nth occurrence of a string in a very large file in constant time (independent of file size)?


Is there a grep-like tool (sed/awk) on Linux that finds the nth occurrence of a string (regex) in a very large file? I would also like to count the occurrences of the search string in the file. Keep in mind that the file is really large (> 2 GB).


Solution

  • Grep solution:

    grep -on regexp < file.txt

    test.txt:

    one two one
    two
    one
    two two
    two one

    Occurrences of the regexp one, each prefixed with its line number:

    grep -on one < test.txt

    1:one
    1:one
    3:one
    5:one

    How many occurrences:

    grep -on one < test.txt | wc -l

    4

    Nth matching line (here N = 1; -m N stops after N matching lines, so it counts lines, not individual occurrences):

    grep -m1 one < test.txt | tail -n1

    one two one

    Update: The solutions no longer use cat. Thanks to @tripleee for the hint.
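Because -m counts matching lines, a line with two hits (like line 1 of test.txt) counts only once. If the Nth individual occurrence is wanted, one possible approach is to take the Nth line of the grep -on output and quit right away, so that grep is cut off by the closed pipe rather than scanning the rest of the file. The sketch below assumes the sample file from above; nth_match is a hypothetical helper name, not a standard tool. Note that grep still has to read the file up to the Nth match, so this is linear in that prefix and not constant time.

```shell
# Recreate the sample file from the answer in a temporary location.
tmp=$(mktemp)
printf '%s\n' 'one two one' 'two' 'one' 'two two' 'two one' > "$tmp"

# nth_match N REGEX FILE  (hypothetical helper, for illustration only)
# Prints "line:match" for the Nth occurrence of REGEX in FILE.
nth_match() {
    # sed prints only line N of grep's output and then quits; once the
    # pipe is closed, grep stops on its next write instead of reading on.
    grep -on -- "$2" < "$3" | sed -n "$1{p;q;}"
}

third=$(nth_match 3 one "$tmp")
echo "$third"

# Total number of occurrences (not just matching lines).
total=$(grep -o -- one < "$tmp" | wc -l)
echo "$total"

rm -f "$tmp"
```

For the sample file this reports the third occurrence on line 3, matching the grep -on listing above. On a multi-gigabyte file the early exit only helps when the Nth occurrence sits well before the end; counting all occurrences always requires a full pass.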