While processing a file with pdfminer (pdf2txt.py) I received empty output:
dan@work:~/project$ pdf2txt.py docs/homericaeast.pdf
dan@work:~/project$
Can anybody tell me what is wrong with this file and what I can do to get the data out of it?
Here's the output of dumppdf.py docs/homericaeast.pdf:
<trailer>
<dict size="4">
<key>Info</key>
<value><ref id="2" /></value>
<key>Root</key>
<value><ref id="1" /></value>
<key>ID</key>
<value><list size="2">
<string size="16">on ¤µF¤5Á>ó_ýv¬`</string>
<string size="16">on ¤µF¤5Á>ó_ýv¬`</string>
</list></value>
<key>Size</key>
<value><number>27</number></value>
</dict>
</trailer>
<trailer>
<dict size="4">
<key>Info</key>
<value><ref id="2" /></value>
<key>Root</key>
<value><ref id="1" /></value>
<key>ID</key>
<value><list size="2">
<string size="16">on ¤µF¤5Á>ó_ýv¬`</string>
<string size="16">on ¤µF¤5Á>ó_ýv¬`</string>
</list></value>
<key>Size</key>
<value><number>27</number></value>
</dict>
</trailer>
I have now fixed the problem with /OneByteIdentityH, handling it similarly to the existing code for the two-byte Unicode mapping /Identity-H. The patch is in PR #179.
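For anyone wondering what "similarly to /Identity-H" means in practice, here is a rough sketch of the idea. It is not the code from the PR: the function names below are made up for illustration, and it assumes an identity mapping simply treats each character code as its own CID, read two bytes at a time for /Identity-H and one byte at a time for /OneByteIdentityH.

```python
import struct


def decode_identity_two_byte(code: bytes) -> tuple:
    """Hypothetical helper: read big-endian 2-byte codes, as for /Identity-H."""
    n = len(code) // 2
    # Ignore a trailing odd byte, if any, so struct.unpack gets an exact length.
    return struct.unpack(">%dH" % n, code[: 2 * n])


def decode_identity_one_byte(code: bytes) -> tuple:
    """Hypothetical helper: read 1-byte codes, as assumed for /OneByteIdentityH."""
    # Iterating over bytes in Python 3 yields one integer per byte.
    return tuple(code)


if __name__ == "__main__":
    print(decode_identity_two_byte(b"\x00H\x00i"))  # (72, 105)
    print(decode_identity_one_byte(b"Hi"))          # (72, 105)
```

If a one-byte string is fed to the two-byte decoder, the bytes get paired up into the wrong codes (or dropped entirely when only one byte is present), which would be one plausible way to end up with no text in the output.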