Using UIMA for text extraction from XML file


I am building a text extractor for XML files using UIMA. As I am completely new to the UIMA framework, I would like to know how to approach this.

I understand that UIMA can annotate specific parts of the file, but how do I extract the information efficiently? Any help is appreciated.

Thanks, Jatin


Solution

  • Speaking from my limited perspective as a developer of UIMA Ruta: I use the HtmlAnnotator of UIMA Ruta for these use cases. This is certainly not the most efficient approach. The analysis engine knows only the most common HTML tags, so it won't create separate types for your XML elements, but I convert its annotations to a predefined type system in UIMA Ruta if needed. Under the hood, it applies the htmlparser library.
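
To make the idea more concrete, here is a minimal sketch in plain Java of what such an annotator does conceptually: it strips the markup, keeps the text, and records each element as a span with begin and end offsets into that text. This is not the UIMA or UIMA Ruta API. The names `XmlTextExtractor`, `Span`, `Result`, and `extract` are hypothetical and use only the JDK's SAX parser. In a real UIMA pipeline, each span would presumably become an annotation with the same begin/end offsets in the CAS.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of UIMA: extracts text from XML and
// records element spans as stand-off annotations.
class XmlTextExtractor {

    /** One element, described by its name and offsets into the extracted text. */
    static final class Span {
        final String element;
        final int begin;
        final int end;

        Span(String element, int begin, int end) {
            this.element = element;
            this.begin = begin;
            this.end = end;
        }
    }

    /** The markup-free text plus the spans of all elements. */
    static final class Result {
        final String text;
        final List<Span> spans;

        Result(String text, List<Span> spans) {
            this.text = text;
            this.spans = spans;
        }
    }

    static Result extract(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        List<Span> spans = new ArrayList<>();
        Deque<Integer> openOffsets = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Remember where this element's text starts.
                openOffsets.push(text.length());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Close the innermost open element at the current text length.
                spans.add(new Span(qName, openOffsets.pop(), text.length()));
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return new Result(text.toString(), spans);
    }

    public static void main(String[] args) throws Exception {
        Result r = extract("<doc><title>Hi</title><body>there</body></doc>");
        for (Span s : r.spans) {
            System.out.println(s.element + " [" + s.begin + ", " + s.end + "): "
                    + r.text.substring(s.begin, s.end));
        }
    }
}
```

Spans are emitted when an element closes, so nested elements appear before their parents. Mapping element names to your own types, as described above, would then be a lookup on `Span.element`.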