Search code examples
pythonpdfpdfminerpdf-parsing

PDFminer empty output


While processing a file with pdfminer (pdf2txt.py) I received empty output:

dan@work:~/project$ pdf2txt.py  docs/homericaeast.pdf 

dan@work:~/project$ 

Can anybody say what wrong with this file and what I can do to get data from it?

Here's dumppdf.py docs/homericaeast.pdf output:

<trailer>
<dict size="4">
<key>Info</key>
<value><ref id="2" /></value>
<key>Root</key>
<value><ref id="1" /></value>
<key>ID</key>
<value><list size="2">
<string size="16">on&#10;&#164;&#181;F&#164;5&#193;&#62;&#243;_&#253;v&#172;`</string>
<string size="16">on&#10;&#164;&#181;F&#164;5&#193;&#62;&#243;_&#253;v&#172;`</string>
</list></value>
<key>Size</key>
<value><number>27</number></value>
</dict>
</trailer>

<trailer>
<dict size="4">
<key>Info</key>
<value><ref id="2" /></value>
<key>Root</key>
<value><ref id="1" /></value>
<key>ID</key>
<value><list size="2">
<string size="16">on&#10;&#164;&#181;F&#164;5&#193;&#62;&#243;_&#253;v&#172;`</string>
<string size="16">on&#10;&#164;&#181;F&#164;5&#193;&#62;&#243;_&#253;v&#172;`</string>
</list></value>
<key>Size</key>
<value><number>27</number></value>
</dict>
</trailer>

Solution

  • Now I have fixed the problem with /OneByteIdentityH similarly to the code for two byte unicode mapping /Identity-H. The patch is in PR #179