Extracting data from XBRL files converted to PDF

I have some XBRL files that were converted to PDF. I want to build a project in Java that automatically extracts all the data from these files, but I have not been able to find any leads. Any suggestions on how to start would be much appreciated, as there is very little information about this on the internet.

Solution

I would recommend trying to get the original XBRL (or iXBRL) files rather than working from the generated PDFs.

XBRL was designed to be machine-readable precisely so that nobody has to reverse-engineer printed documents or PDFs. A PDF rendering typically keeps only the displayed labels and numbers, not the concept names, contexts, and units attached to each fact. Reading the PDFs therefore throws away the main advantage of XBRL and is likely to introduce inaccuracies and errors.

Then, if you can get the source files, I recommend using an XBRL processor, which takes care of the complexity of the format (taxonomies, contexts, units) for you. This will save a lot of time compared to using a raw XML parser. It is likely that there are XBRL libraries written for Java.
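To give an idea of why the source files are so much easier to work with than PDFs, here is a minimal sketch that lists the facts in an XBRL instance using only the DOM parser that ships with Java. It is not an XBRL processor: it does not load the taxonomy, resolve contexts or units, or handle tuples and dimensions. The sample document, the `ex` namespace, and the class and method names are all made up for illustration, and treating every top-level element with a `contextRef` attribute as a fact is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class XbrlFactSketch {

    // Hypothetical sample instance; a real filing would reference a taxonomy schema.
    static final String SAMPLE =
        "<xbrli:xbrl xmlns:xbrli=\"http://www.xbrl.org/2003/instance\""
        + " xmlns:iso4217=\"http://www.xbrl.org/2003/iso4217\""
        + " xmlns:ex=\"http://example.com/taxonomy\">"
        + "<xbrli:context id=\"FY2023\">"
        + "<xbrli:entity><xbrli:identifier scheme=\"http://example.com/ids\">ACME</xbrli:identifier></xbrli:entity>"
        + "<xbrli:period><xbrli:instant>2023-12-31</xbrli:instant></xbrli:period>"
        + "</xbrli:context>"
        + "<xbrli:unit id=\"USD\"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>"
        + "<ex:Assets contextRef=\"FY2023\" unitRef=\"USD\" decimals=\"0\">1500000</ex:Assets>"
        + "<ex:CompanyName contextRef=\"FY2023\">ACME Corp</ex:CompanyName>"
        + "</xbrli:xbrl>";

    // Simple value holder for one reported fact.
    static final class Fact {
        final String namespace;
        final String name;
        final String contextRef;
        final String unitRef; // empty string for non-numeric facts
        final String value;

        Fact(String namespace, String name, String contextRef, String unitRef, String value) {
            this.namespace = namespace;
            this.name = name;
            this.contextRef = contextRef;
            this.unitRef = unitRef;
            this.value = value;
        }

        @Override
        public String toString() {
            return "{" + namespace + "}" + name + " = " + value
                + " (context=" + contextRef + ", unit=" + unitRef + ")";
        }
    }

    // Collects every direct child of the root element that carries a contextRef attribute.
    static List<Fact> extractFacts(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to get namespace URIs and local names
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<Fact> facts = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes and comments
            }
            Element el = (Element) node;
            if (!el.hasAttribute("contextRef")) {
                continue; // contexts, units, schema references, etc. are not facts
            }
            facts.add(new Fact(
                el.getNamespaceURI(),
                el.getLocalName(),
                el.getAttribute("contextRef"),
                el.getAttribute("unitRef"), // returns "" when the attribute is absent
                el.getTextContent().trim()));
        }
        return facts;
    }

    public static void main(String[] args) throws Exception {
        for (Fact fact : extractFacts(SAMPLE)) {
            System.out.println(fact);
        }
    }
}
```

Each fact already comes with its concept name, context, and unit, which is exactly the information that is lost in a PDF. A dedicated XBRL processor goes further by validating the instance against its taxonomy and resolving what each context and unit actually means.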

I am sorry not to be able to give you a better answer, but I hope this helps you get started.