Tags: python, python-3.x, filehandler

How to search for a string in a .gz file?


I am new to scripting and am trying to read .gz files and copy every line that contains "Alas!". The files are at myfiles/all*/input.gz: the script should look in every directory whose name starts with "all" for an input.gz file, search that file for the string "Alas!", and write the matching lines to a text file. I know how to do this on Linux with the zgrep command: zgrep 'Alas!' myfiles/all*/input.gz > file1.txt. I got lost somewhere while trying to write a script for this.


Solution

  • The statement

        if 'Alas!':
    

    merely checks whether the string value 'Alas!' is "truthy" (it always is, like any non-empty string). You want to check whether the variable line contains this substring:

        if 'Alas!' in line:
    

    Another problem is that you are opening the output file multiple times, overwriting any results from previous input files. You want to open it only once, at the beginning (or open for appending; but repeatedly opening and closing the same file is unnecessary and inefficient).
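
    To see the overwriting in isolation, here is a minimal sketch. The file name out.txt and the sample strings are made up for illustration, and everything happens in a temporary directory.

```python
import os
import tempfile

# Hypothetical sample data standing in for matches from two input files.
matches = ["first match\n", "second match\n"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # hypothetical output name

    # Reopening with 'w' inside the loop truncates the file each time.
    for text in matches:
        with open(path, "w") as o:
            o.write(text)
    with open(path) as f:
        reopened = f.read()

    # Opening once, before the loop, keeps every write.
    with open(path, "w") as o:
        for text in matches:
            o.write(text)
    with open(path) as f:
        opened_once = f.read()

print(repr(reopened))     # only the last write survives
print(repr(opened_once))  # both writes are present
```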

    A better design altogether might be to simply print to standard output, and let the user redirect the output to a file if they like. (Also, probably accept the input files as command-line arguments, rather than hardcoding a fugly complex relative path.)
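
    That alternative design could look roughly like the sketch below. The function names matching_lines and main are my own, not anything from the question, and the search string stays hardcoded as in the original.

```python
import gzip
import sys


def matching_lines(paths, needle="Alas!"):
    """Yield every line containing needle from the given gzip files."""
    for path in paths:
        # 'rt' decompresses and decodes to text, so each line is a str.
        with gzip.open(path, "rt") as f:
            for line in f:
                if needle in line:
                    yield line


def main(paths):
    for line in matching_lines(paths):
        # Each line already ends with a newline, so write it unchanged.
        sys.stdout.write(line)


if __name__ == "__main__":
    # Everything after the script name is treated as an input file.
    main(sys.argv[1:])
```

    Saved as, say, find_alas.py (a made-up name), it would be run as python3 find_alas.py myfiles/all*/input.gz > file1.txt, with the shell expanding the wildcard and handling the redirection.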

    A third problem is that each input line already ends with a newline, but print() adds another. Either strip the newline before printing, or tell print() not to add one with end='' (or switch to write(), which doesn't add one).
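
    The options can be compared side by side in a small sketch. The sample line and the helper printed() are made up for illustration; the helper just captures what print() would emit.

```python
import io

line = "Alas! poor Yorick\n"  # hypothetical line, as read from a text file


def printed(*args, **kwargs):
    """Return the text print() would emit for these arguments."""
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue()


doubled = printed(line)                # print() appends a second newline
stripped = printed(line.rstrip("\n"))  # strip first, let print() add one
no_end = printed(line, end="")         # keep the line, suppress print()'s newline

buf = io.StringIO()
buf.write(line)                        # write() never adds anything
written = buf.getvalue()
```

    With all three fixes applied, the complete script looks like this: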

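    import gzip
    import glob

    # Open the output file once, so matches from every input file accumulate.
    with open('file1.txt', 'w') as o:
        for file in glob.glob('myfiles/all*/input.gz'):
            # 'rt' decompresses and decodes to text, so each line is a str.
            with gzip.open(file, 'rt') as f:
                for line in f:
                    if 'Alas!' in line:
                        # line already ends with a newline; don't add another.
                        print(line, file=o, end='')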
    

    Demo: https://ideone.com/rTXBSS