Calling html2text iteratively on a set of HTML files is not working


This is the code snippet:

for i in obj:
    url = "someurl" + i
    oars = requests.get(url, timeout=1)
    soup = BeautifulSoup(oars.content)
    fout = open(i + ".html", "wt")
    print((type(soup.prettify)))
    fout.write(oars.text)
    oars.close
    #fout.write(soup.get_text())
    # Still not working, using zsh for now
    if call("html2text " + i + ".html" + ">" + i + ".txt", shell=True) == 0:
        print("yay")
        #call("rm -f " + i + ".html", shell=True)
    else:
        print(i)

But html2text is just creating empty txt files rather than properly piping the output. I even tried replacing html2text with elinks -dump but to no avail.


Solution

  • Not sure, but this might be what you're after

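    import subprocess
    import sys
    
    # i is the loop variable from the question's code
    outfile = i + ".txt"
    
    # sys.path[0] is normally the directory of the running script,
    # so this expects the htmltotext executable to sit next to it
    cmd = sys.path[0] + "/htmltotext " + i + ".html"
    
    with open(outfile, "w") as output_f:
        p = subprocess.Popen(cmd, stdout=output_f, shell=True)
        p.wait()  # wait for the command to finish so the output file is complete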
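
One possible cause of the empty .txt files, offered as a guess rather than a confirmed diagnosis: in the question's loop the .html file is never closed, so its contents may still be sitting in Python's write buffer when the converter reads the file. The sketch below writes the file inside a with-block, so it is closed before the converter starts, and then sends the command's stdout straight to the .txt file without going through the shell. The function names convert_file and save_and_convert and the command parameter are hypothetical, and the default assumes an html2text executable is available on PATH.

```python
import subprocess
from pathlib import Path


def convert_file(html_path, txt_path, command=("html2text",)):
    """Run `command html_path` and write its stdout to txt_path.

    Returns True when the command exits with status 0.
    """
    with open(txt_path, "w") as output_f:
        # Passing a list avoids shell quoting issues; stdout goes to the file.
        result = subprocess.run([*command, str(html_path)], stdout=output_f)
    return result.returncode == 0


def save_and_convert(name, html, command=("html2text",)):
    """Write html to <name>.html, close it, then convert it to <name>.txt."""
    html_path = Path(name + ".html")
    # Leaving the with-block closes (and flushes) the file
    # before the converter gets to read it.
    with open(html_path, "w") as fout:
        fout.write(html)
    return convert_file(html_path, Path(name + ".txt"), command)
```

Inside the question's loop this could be called as save_and_convert(i, oars.text). To try elinks instead, command=("elinks", "-dump") should work the same way, again assuming that program is installed.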