python csv data-cleaning pdftotext

TypeError: argument of type 'PSLiteral' is not iterable


I am trying to remove hidden line breaks from the field values in my PDF form scraper script before I write them to a CSV file, but I keep getting the error in the title. The relevant piece of code is:

import glob
import os
import sys
import csv
from pdfminer.pdfparser import PDFParser
from pdfminer.pdfdocument import PDFDocument
from pdfminer.pdftypes import resolve1

path = 'C:\Users\Wonen\Downloads\Test'
for filename in glob.glob(os.path.join(path, '*.pdf')):
    fp = open(filename, 'rb')
    #read pdf's
    parser = PDFParser(fp)
    doc = PDFDocument(parser)
    #doc.initialize()    # <<if password is required
    fields = resolve1(doc.catalog['AcroForm'])['Fields']
    row = []
    for i in fields:
        field = resolve1(i)
        name, value = field.get('T'), field.get('V')
        #removing 'hidden enter'
        if value == None:
           print 'ok'
        elif value == NotImplementedError:
            print 'ok'
        elif '\n' in value:    
           value.replace('\n',' ')
        elif '\r' in value:    
           value.replace('\r',' ')
        row.append(value)
    writer.writerow(list(reversed(row)))

The complete error (+output) is:
ok
ok

Traceback (most recent call last):
  File "C:\Python27\Scripts\test3.py", line 37, in <module>
    elif '\n' in value:
TypeError: argument of type 'PSLiteral' is not iterable

Does anyone know how to solve this?


Solution

  • Without seeing the input file it is hard to be sure, but I think the problem is that field.get('V') sometimes returns a non-string value (here a PSLiteral), and the `in` test cannot search inside it. To solve this, I suggest converting value to a string before testing it. Try it like this:

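    if value == None:
        print 'ok'
    elif value == NotImplementedError:
        print 'ok'
    elif '\n' in str(value) or '\r' in str(value):
        # convert non-string values such as PSLiteral before cleaning
        value = str(value)
        # str.replace returns a new string, so assign the result back
        value = value.replace('\n', ' ').replace('\r', ' ')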
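
The same idea can be wrapped in a small helper so the loop body stays short. This is only a sketch, written for Python 3 rather than the Python 2.7 used in the question, and it rests on several assumptions: `clean_value` and `FakeLiteral` are hypothetical names, the helper assumes a literal object exposes its text through a `name` attribute, and it assumes byte strings can be decoded as latin-1, which may not hold for every PDF.

```python
def clean_value(value):
    """Return a single-line string for a form field value, or None."""
    if value is None:
        return None
    # Assumption: literal objects carry their text in a `name` attribute.
    name = getattr(value, 'name', None)
    if name is not None:
        value = name
    # Assumption: raw byte strings are decodable as latin-1.
    if isinstance(value, bytes):
        value = value.decode('latin-1')
    text = str(value)
    # Replace Windows, Unix and old Mac line breaks with a single space.
    return text.replace('\r\n', ' ').replace('\n', ' ').replace('\r', ' ')


class FakeLiteral(object):
    """Hypothetical stand-in for a PSLiteral, used only for this demo."""
    def __init__(self, name):
        self.name = name


print(clean_value('line one\nline two'))   # line one line two
print(clean_value(FakeLiteral('Yes')))     # Yes
```

In the loop from the question this would be used as row.append(clean_value(value)), which also avoids the problem that an if/elif chain handles only one kind of line break per value.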