Suppose I have a PDF file containing the following table:
Trainer: Giannis
Pokedex: Incomplete
Name | Type | Weight | Height | Color |
---|---|---|---|---|
Pikachu | Electric | 6.0 kg | 0.4 m | Yellow |
Bulbasaur | Grass/Poison | 6.9 kg | 0.7 m | Green |
Charizard | Fire/Flying | 90.5 kg | 1.7 m | Orange |
Jigglypuff | Normal/Fairy | 5.5 kg | 0.5 m | Pink |
Gyarados | Water/Flying | 235.0 kg | 6.5 m | Blue |
I am using the Form Parser to extract the table information.
If I know that the table columns will always be [Name, Type, ... , Color],
is there a way to pass this information to the Form Parser
processor to help it identify the header rows more reliably?
Thank you in advance for your time!
You can't currently pass any "hints" to the Form Parser to adjust how the model detects tables or headers. You can try a different version of the Form Parser model to see whether its results are closer to what you expect.
To extract values from a document using a custom-defined schema like the one you describe, you will likely get the best results from a Custom Document Extractor. You can follow this guide for instructions on how to build a custom processor, and this section about Quick Tables in the labeling documentation could help speed up labeling for tabular data.
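If you stay with the Form Parser, one possible workaround is to enforce the known columns yourself after extraction rather than inside the processor. The sketch below is only an illustration and is not part of the Document AI API: `EXPECTED_COLUMNS` and `split_header` are hypothetical names, and it assumes you have already flattened the parsed table (for example, the cell text from its `header_rows` and `body_rows`) into plain lists of strings in reading order.

```python
from typing import List, Sequence, Tuple

# Assumption: the column names are known ahead of time, as in the question.
EXPECTED_COLUMNS = ["Name", "Type", "Weight", "Height", "Color"]


def _normalize(cell: str) -> str:
    """Collapse whitespace and ignore case so 'name ' matches 'Name'."""
    return " ".join(cell.split()).casefold()


def split_header(
    rows: Sequence[Sequence[str]],
    expected: Sequence[str] = EXPECTED_COLUMNS,
) -> Tuple[List[str], List[List[str]]]:
    """Hypothetical helper: return (header, body) using the known columns.

    If a row matching the expected columns is found, it and everything
    before it are dropped from the body. If no such row is found, every
    row is treated as data.
    """
    target = [_normalize(c) for c in expected]
    for i, row in enumerate(rows):
        if [_normalize(c) for c in row] == target:
            return list(expected), [list(r) for r in rows[i + 1:]]
    return list(expected), [list(r) for r in rows]


# Example input: the parser mistook a line above the table for a row.
rows = [
    ["Trainer: Giannis"],
    ["Name", "Type", "Weight", "Height", "Color"],
    ["Pikachu", "Electric", "6.0 kg", "0.4 m", "Yellow"],
    ["Bulbasaur", "Grass/Poison", "6.9 kg", "0.7 m", "Green"],
]
header, body = split_header(rows)
records = [dict(zip(header, r)) for r in body]
```

This does not change what the model detects; it only makes the downstream result independent of which rows the parser happened to label as headers.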