I have encountered an IOError in Python which I cannot grasp. I have a relatively simple script retrieving various scientific articles and organizing them into a directory structure.
The call to writing each output file is here (in a for-each loop):
(58) outfile = open(curr_dir + "/" + article + ".txt",'w')
(59) outfile.write("title: " + title + '\n')
(60) outfile.write("abstract: " + abstract + '\n')
(61) outfile.close()
For over a thousand articles, the output files are opened and written without trouble. However, on two, it fails with the following IOError pointing to the first line shown above:
Traceback (most recent call last):
File "script.py", line 58, in <module>
outfile = open(curr_dir + "/" + article + ".txt",'w')
IOError: [Errno 2] No such file or directory: '/path/to/file/text.html.txt'
Here are the two files:
/path/2-minute-not-invasive-screening-for-cardio-vascular-diseases-relative-limitation-of-c-reactive-protein-compared-with-more-sensitive-l-homocystine-as-cardio-vascular-risk-factors-safe-and-effective-treatment-using-the-selective-drug-uptake-enhancementme.html.txt
/path/expression-of-chemokine-receptors-i-immunohistochemical-analyses-with-new-monoclonal-antibodies-from-the-8th-iifferentiation-antigens.html.txt
As far as I can tell, all of the other 1000+ documents look more or less identical. For instance, other documents begin with a number and they were opened and written without trouble. Also, these articles are being written to the same directory that other articles have already been written to. I would suspect something with respect to length in the first case, but that couldn't be the problem with the second.
Is there something I'm missing? Thanks for the help!
Looking back, I should have posted my solution as an answer rather than just leaving it in the comments.
The issue had to do with the length of the absolute filepath (not just the filename!). Trimming these to fewer than 325 characters did the trick. Something like:
article = article[:325 - len(curr_dir)]
outfile = open(os.path.join(curr_dir, article + '.txt'), 'w')
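A slightly fuller sketch of the same idea, in case it helps anyone else. The helper name truncated_path and the MAX_PATH_LEN constant are made up for illustration; 325 is just the value that worked on my system, not a universal limit, so treat it as an assumption and adjust it for your filesystem. Unlike the one-liner above, this also leaves room for the path separator and the extension.

```python
import os

# Assumption: 325 is the limit that worked in this post; it is not a universal constant.
MAX_PATH_LEN = 325


def truncated_path(directory, name, ext=".txt", max_len=MAX_PATH_LEN):
    """Build directory/name+ext, trimming name so the full path is shorter than max_len."""
    # os.path.join(directory, "") yields the directory with exactly one trailing separator.
    dir_with_sep = os.path.join(directory, "")
    # Characters left for the name once the directory, separator, and extension are counted.
    budget = max_len - len(dir_with_sep) - len(ext) - 1
    if budget < 1:
        raise ValueError("directory path leaves no room for a filename")
    return os.path.join(directory, name[:budget] + ext)
```

In the loop from the question this would be used as outfile = open(truncated_path(curr_dir, article), 'w'). One caveat: two long article names that share the same prefix would end up with the same truncated path and overwrite each other, so it may be worth appending a counter or a short hash if that can happen in your data.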