Retrieve values from UIMA FSArray


I have an annotation which has a feature of the type FSArray. This feature should contain a list of strings.

FSArray fsArray = (FSArray)annotation.getFeatureValue(fe);

How do I get the list of strings from the FSArray?

Looping through fsArray.toStringArray() only returns the string "FSArray" and not the actual value.


Solution

  • A few concepts are important to understand when retrieving values from an FSArray in UIMA:

    • org.apache.uima.cas.Type - A Type describes the data model. It is similar to the concept of a class in Java. A Type has a namespace and defines attributes (features).
    • org.apache.uima.cas.Feature - An attribute defined by a Type.
    • org.apache.uima.jcas.cas.TOP - Is the most general Type and could be compared with java.lang.Object.
    • org.apache.uima.cas.FeatureStructure - A FeatureStructure can best be described as an instance of a Type. The FeatureStructure is what you use to access data.

    Let's say that we have the following two Types:

    • com.a.b.c.ColoredCar
    • com.a.b.c.Car

    And we have the following sentence:

    Car A and car B are both blue.
    

    Let's assume that a previous UIMA stage has annotated the entire sentence using the Type com.a.b.c.ColoredCar as follows:

    begin: 0
    end: 30
    color: "blue"
    cars: FSArray
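
    For orientation, a type system descriptor that declares such types could look like the sketch below. The descriptor name, the choice of uima.tcas.Annotation as supertype, and the String range of color and manufacturer are assumptions made for illustration; the actual type definition is not part of this answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the descriptor name and the supertypes are assumptions. -->
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>CarTypeSystem</name>
  <types>
    <typeDescription>
      <name>com.a.b.c.Car</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>manufacturer</name>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
      </features>
    </typeDescription>
    <typeDescription>
      <name>com.a.b.c.ColoredCar</name>
      <supertypeName>uima.tcas.Annotation</supertypeName>
      <features>
        <featureDescription>
          <name>color</name>
          <rangeTypeName>uima.cas.String</rangeTypeName>
        </featureDescription>
        <!-- An FSArray feature whose elements are Car feature structures. -->
        <featureDescription>
          <name>cars</name>
          <rangeTypeName>uima.cas.FSArray</rangeTypeName>
          <elementType>com.a.b.c.Car</elementType>
        </featureDescription>
      </features>
    </typeDescription>
  </types>
</typeSystemDescription>
```

    The begin and end features do not need to be declared, since they are inherited from uima.tcas.Annotation.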
    

    Let's also assume that we know from the type definition that the feature cars is an FSArray of com.a.b.c.Car, and that the array holds two Car annotations with the following values:

    begin: 4
    end: 5
    manufacturer: "Volvo"
    
    begin: 14
    end: 15
    manufacturer: "Toyota"
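
    The begin and end values of the two Car annotations are character offsets into the document text, so the text an annotation covers is the substring between them. A minimal plain-Java sketch that checks both spans against the example sentence (the class and method names are hypothetical, and no UIMA classes are involved):

```java
class CoveredTextSketch {

    // Returns the span of documentText between begin (inclusive) and end (exclusive),
    // which is how annotation offsets map onto the document text.
    static String coveredText(String documentText, int begin, int end) {
        return documentText.substring(begin, end);
    }

    public static void main(String[] args) {
        String sentence = "Car A and car B are both blue.";
        System.out.println(coveredText(sentence, 4, 5));   // prints "A" (the Volvo)
        System.out.println(coveredText(sentence, 14, 15)); // prints "B" (the Toyota)
    }
}
```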
    

    The following code demonstrates how to retrieve the manufacturer feature of each Car in the cars FSArray.

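    import java.util.ArrayList;
    import java.util.List;
    import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
    import org.apache.uima.cas.Feature;
    import org.apache.uima.cas.FeatureStructure;
    import org.apache.uima.fit.util.JCasUtil; // uimaFIT 2.x package name
    import org.apache.uima.jcas.JCas;
    import org.apache.uima.jcas.cas.FSArray;
    import org.apache.uima.jcas.cas.TOP;

    public void process(JCas aJCas) throws AnalysisEngineProcessException {
        // Collect every feature structure in the CAS, then pick out the ColoredCar instances.
        List<TOP> tops = new ArrayList<TOP>(JCasUtil.selectAll(aJCas));
        List<String> manufacturers = new ArrayList<>();
        for (TOP t : tops) {
            if (t.getType().getName().endsWith("ColoredCar")) {
                Feature carsFeature = t.getType().getFeatureByBaseName("cars");
                FSArray fsArray = (FSArray) t.getFeatureValue(carsFeature);
                FeatureStructure[] arrayStructures = fsArray.toArray();
                for (int i = 0; i < arrayStructures.length; i++) {
                    // Each element is a Car; look up its own manufacturer feature, not "cars".
                    FeatureStructure fs = arrayStructures[i];
                    Feature manufacturerFeature = fs.getType().getFeatureByBaseName("manufacturer");
                    manufacturers.add(fs.getStringValue(manufacturerFeature));
                }
            }
        }
    }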
    

    To dig deeper into this, it's a good idea to read about how type systems, the heap, and the index repository work within the CAS.