Tags: python-2.7, matrix, pdfminer

How to access an existing(!) matrix which partly contains invalid syntax?


I use pdfminer to convert PDF text into plain text. pdfminer goes through the PDF file and reads it line by line, and each line is assigned to a matrix variable. The problem is that, for some reason, in rare cases the matrix x looks like this:

[[Г, 'problems', -436, 'have', -448, 'usually', -435, 'found', -452]]

Obviously, Г without quotes is invalid syntax for a matrix (or list). However, x does exist, yet I cannot access it to delete Г: del x[0][0] does not work.

Now I'm looking for ideas on how to access x and remove the first element. Many thanks in advance!
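One idea, offered as a sketch rather than a tested fix for pdfminer output: the missing quotes suggest that Г is not a string at all but some other object whose printed form happens to be Г. Under that assumption, the nested list can be rebuilt keeping only strings and numbers, without going through its text representation. `StrayObject` and `drop_non_literals` below are hypothetical names for illustration only; the code is written for Python 3 (on Python 2.7 you may also want `unicode` in `keep_types`).

```python
# -*- coding: utf-8 -*-
import numbers


class StrayObject(object):
    """Hypothetical stand-in for the unquoted Г: an object that is neither a string nor a number."""

    def __repr__(self):
        return 'Г'


def drop_non_literals(rows):
    """Return a new nested list keeping only strings, bytes and numbers in each row."""
    keep_types = (str, bytes, numbers.Number)
    return [[item for item in row if isinstance(item, keep_types)] for row in rows]


x = [[StrayObject(), 'problems', -436, 'have', -448, 'usually', -435, 'found', -452]]
print(x)  # the stray element prints without quotes, like Г in the question
x = drop_non_literals(x)
print(x)
# [['problems', -436, 'have', -448, 'usually', -435, 'found', -452]]
```

Filtering by type also removes stray objects that appear later in a row, not only in the first position.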


Solution

  • I solved my problem with:

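    from ast import literal_eval

    # str(x) renders the stray element unquoted, e.g. "[[Г, 'problems', -436, ...]]"
    mr_x = str(x)
    # Assume everything before the first quote belongs to the stray element
    quote_pos = mr_x.find("'")
    # Restore the outer brackets and parse the remainder back into a list
    mr_x = '[[' + mr_x[quote_pos:]
    x = literal_eval(mr_x)
    print(x)
    # prints:
    # [['problems', -436, 'have', -448, 'usually', -435, 'found', -452]]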
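The solution above assumes the stray element is always the first item in the row. A variation on the same literal_eval idea, sketched here under the assumption that every valid item's repr is itself a Python literal, checks each element separately, so stray items anywhere in a row are dropped. `StrayObject`, `is_literal` and `drop_unparseable` are hypothetical names used only for this illustration; the code is written for Python 3.

```python
# -*- coding: utf-8 -*-
from ast import literal_eval


class StrayObject(object):
    """Hypothetical stand-in for an element whose repr is not a Python literal."""

    def __repr__(self):
        return 'Г'


def is_literal(item):
    """Return True if repr(item) can be parsed back by ast.literal_eval."""
    try:
        literal_eval(repr(item))
    except (ValueError, SyntaxError, TypeError):
        return False
    return True


def drop_unparseable(rows):
    """Return a new nested list without the elements that fail is_literal."""
    return [[item for item in row if is_literal(item)] for row in rows]


x = [[StrayObject(), 'problems', -436, 'have', StrayObject(), -448]]
result = drop_unparseable(x)
print(result)
# [['problems', -436, 'have', -448]]
```

Compared with slicing the string at the first quote, this never round-trips the whole row through text, so a quote character inside a string element cannot shift the cut point.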