Can we pass table column info to help FormParser determine header_row contents?

Suppose I have a PDF file containing the following table:

Trainer: Giannis

Pokedex: Incomplete

Name	Type	Weight	Height	Color
Pikachu	Electric	6.0 kg	0.4 m	Yellow
Bulbasaur	Grass/Poison	6.9 kg	0.7 m	Green
Charizard	Fire/Flying	90.5 kg	1.7 m	Orange
Jigglypuff	Normal/Fairy	5.5 kg	0.5 m	Pink
Gyarados	Water/Flying	235.0 kg	6.5 m	Blue

I am using the Form Parser to extract the table information.

If I know that the table columns will always be [Name, Type, ... , Color], is there a way to pass this info to the FormParser processor to help it better determine the header rows?

Thank you in advance for your time!

Solution

At this time, you can't pass any "hints" (such as expected column names) to the Form Parser to adjust how the model detects header rows. You can try a different version of the Form Parser model to see whether the results are closer to what you expect.

To extract values from a document using a custom-defined schema like the one you are suggesting, you will likely get the best results with a Custom Document Extractor. You can follow this guide for instructions on how to build a custom processor, and this section about Quick Tables in the labeling documentation could be useful to speed up labeling for tabular data.
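
Since the Form Parser itself takes no column hints, one possible workaround is to apply the known header on the client side after parsing. The sketch below is not a Document AI feature. It assumes you have already pulled the cell text out of the parsed table (header rows and body rows together) into plain lists of strings, and the names `EXPECTED_COLUMNS` and `records_from_rows` are hypothetical helpers made up for this illustration.

```python
from typing import Dict, List, Sequence

# Assumption: the column names you know in advance, in table order.
EXPECTED_COLUMNS = ["Name", "Type", "Weight", "Height", "Color"]


def _normalize(cell: str) -> str:
    """Collapse whitespace and ignore case so ' name ' matches 'Name'."""
    return " ".join(cell.split()).casefold()


def records_from_rows(
    rows: Sequence[Sequence[str]], expected: Sequence[str]
) -> List[Dict[str, str]]:
    """Turn raw table rows into dicts keyed by the known column names.

    Any row whose cells match the expected header is dropped, no matter
    whether the parser reported it as a header row or as a body row.
    Rows with a different number of cells are skipped.
    """
    target = [_normalize(name) for name in expected]
    records = []
    for row in rows:
        if [_normalize(cell) for cell in row] == target:
            continue  # this row is the header, not data
        if len(row) != len(expected):
            continue  # cannot be mapped onto the known columns
        records.append(dict(zip(expected, (cell.strip() for cell in row))))
    return records


# Example: the header ended up mixed in with the data rows.
rows = [
    ["Name", "Type", "Weight", "Height", "Color"],
    ["Pikachu", "Electric", "6.0 kg", "0.4 m", "Yellow"],
    ["Bulbasaur", "Grass/Poison", "6.9 kg", "0.7 m", "Green"],
]
records = records_from_rows(rows, EXPECTED_COLUMNS)
```

This only corrects which row is treated as the header. It does not help if the parser splits or merges cells incorrectly, in which case a Custom Document Extractor is still the better fit.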