Is there a way to import rotated text from a PDF table, for example with tabula-py in Python?
I realize I can just rename the column headers in this case, but I was wondering whether there is a parameter for importing rotated text. I don't see any mention of rotation in the Read the Docs pages for tabula-py, and I haven't found other packages that do this either. (I did see a mention of rotating an entire page, which doesn't quite fit this use case, since renaming the columns would be easier.)
import tabula
list_df = tabula.read_pdf(
As @Francesco mentioned, camelot is better than tabula-py in one particular way: it finds the rotated text.
Installing camelot was a difficult process, so I thought I'd share some of what I learned here.
On a Mac, first install the system dependencies:
brew install ghostscript tcl-tk
and then troubleshoot any errors (there were many for me, but after copy-pasting each error, there was gold at the end of the rainbow).
Then install camelot itself:
pip install "camelot-py[cv]"
The documentation page currently says [base] rather than [cv], but the comments above say [cv] (as do Stack Overflow answers).
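Since most of the pain was in the install, here is a small sketch for checking that the pieces are visible to Python before trying to read a PDF. It only uses the standard library. The function name `check_camelot_setup` is my own, and I'm assuming the usual import names (`camelot` for camelot-py, `cv2` for the OpenCV package that the [cv] extra pulls in) and that Ghostscript is exposed as the `gs` command, as it is with a brew install.

```python
import importlib.util
import shutil


def check_camelot_setup(modules=("camelot", "cv2"), executables=("gs",)):
    """Report which Python modules and command-line tools can be found.

    The defaults are assumptions: 'camelot' and 'cv2' are the import names
    for camelot-py[cv], and 'gs' is the Ghostscript command on macOS.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        report[exe] = shutil.which(exe) is not None
    return report


for name, found in check_camelot_setup().items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Anything reported as MISSING is a reasonable place to start troubleshooting, although a "found" result does not guarantee that the versions are compatible.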
With the following, the rotated column headers are read in just fine.
import camelot
tables = camelot.read_pdf(
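For anyone who wants a complete call to adapt, here is a minimal sketch of reading tables with camelot and getting them back as pandas DataFrames. It is not the exact call I used: the helper name `read_tables` and the file name in the usage comment are placeholders, and `pages="1"` and `flavor="lattice"` are simply camelot's defaults written out. The lattice flavor is meant for tables with ruling lines; `flavor="stream"` is the alternative for tables separated by whitespace.

```python
def read_tables(pdf_path, pages="1", flavor="lattice"):
    """Return every table camelot finds in pdf_path as a pandas DataFrame.

    pdf_path is whatever PDF you are working with; the defaults mirror
    camelot's own defaults and may need changing for your file.
    """
    # Imported here so the function can be defined before camelot is installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    # tables.n is the number of tables found; each table exposes a .df DataFrame.
    return [tables[i].df for i in range(tables.n)]


# Hypothetical usage ("example.pdf" is a placeholder file name):
# dfs = read_tables("example.pdf")
# print(dfs[0].head())
```

If the headers still come out wrong for a particular file, renaming the columns afterwards, as mentioned in the question, remains a simple fallback.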