
Java, Project Panama and how to deal with the Hunspell 'suggest' result


I'm experimenting with Hunspell and with calling it from Java via Project Panama (build 19-panama+1-13, 2022/1/18). Some initial tests already work: I can create a Hunspell handle and use it to spell-check a word. Now I'm trying something more elaborate: asking Hunspell for suggestions for a word that is not in the dictionary. This is my code so far:

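import jdk.incubator.foreign.*;
// plus a static import of the header class jextract generated from hunspell.h
// (its name depends on the jextract options used)

public class HelloHun {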
    public static void main(String[] args) {
        MemoryAddress hunspellHandle = null;
        try (ResourceScope scope = ResourceScope.newConfinedScope()) {
            var allocator = SegmentAllocator.nativeAllocator(scope);

            // Point it to US english dictionary and (so called) affix file
            // Note #1: it is possible to add words to the dictionary if you like
            // Note #2: it is possible to have separate/individual dictionaries and affix files (e.g. per user/doc type)
            var en_US_aff = allocator.allocateUtf8String("/usr/share/hunspell/en_US.aff");
            var en_US_dic = allocator.allocateUtf8String("/usr/share/hunspell/en_US.dic");

            // Get a handle to the Hunspell shared library and load up the dictionary and affix
            hunspellHandle = Hunspell_create(en_US_aff, en_US_dic);

            // Feed it a wrong word
            var javaWord = "koing";

            // Do a simple spell check of the word
            var word = allocator.allocateUtf8String(javaWord);
            var spellingResult = Hunspell_spell(hunspellHandle, word);
            System.out.println(String.format("%s is spelled %s", javaWord, (spellingResult == 0 ? "incorrectly" : "correctly")));

            // Hunspell also supports giving suggestions for a word - which is what we do next
            // Note #3: testing `koing` on its own shows that there are 4 alternatives for this word
            // Note #4: I'm still investigating how to access individual suggestions

            var suggestions = allocator.allocate(10);
            var suggestionCount = Hunspell_suggest(hunspellHandle, suggestions, word);

            System.out.println(String.format("There are %d suggestions for %s", suggestionCount, javaWord));

            // `suggestions` - according to the hunspell API - is a `pointer to an array of strings pointer`
            // we know how many string pointers there are, as that is the value returned by `suggest`
            // Question: how to process `suggestions` to get individual suggestions


        } finally {
            if (hunspellHandle != null) {
                Hunspell_destroy(hunspellHandle);
            }
        }
    }
}

The call to Hunspell_suggest (generated by jextract) succeeds and reports 4 suggestions, which matches what Hunspell gives me on the command line, so no problem there.

What I'm struggling with now is how to unpack the suggestions value that this call fills in. I've looked at various examples, but none of them go into this level of detail (and the ones I do find seem to use outdated Panama APIs).

So in essence, here is my question:

How do I use the JDK 19 Panama APIs to unpack a value that is reportedly a pointer to an array of string pointers into the corresponding collection of Java strings?


Solution

  • Looking at the header here: https://github.com/hunspell/hunspell/blob/master/src/hunspell/hunspell.h#L80

    /* suggest(suggestions, word) - search suggestions
     * input: pointer to an array of strings pointer and the (bad) word
     *   array of strings pointer (here *slst) may not be initialized
     * output: number of suggestions in string array, and suggestions in
     *   a newly allocated array of strings (*slts will be NULL when number
     *   of suggestion equals 0.)
     */
    LIBHUNSPELL_DLL_EXPORTED int Hunspell_suggest(Hunhandle* pHunspell,
                                                  char*** slst,
                                                  const char* word);
    

    slst is a classic 'out' parameter: we pass a pointer to some value (in this case a char**, i.e. an array of strings), and the function stores a new value through that pointer. This lets the function return more than one result (the other result being the number of suggestions, which is the return value).

    In Panama you use 'out' parameters by allocating a segment with the layout of the type the parameter points to. Here char*** is a pointer to char**, so the layout is ADDRESS. We then pass the created segment to the function and, after the call, read the value the function has written into that segment:

    // char***
    var suggestionsRef = allocator.allocate(ValueLayout.ADDRESS); // allocate space for an address
    var suggestionCount = Hunspell_suggest(hunspellHandle, suggestionsRef, word);
    // char** (the value set by the function)
    MemoryAddress suggestions = suggestionsRef.get(ValueLayout.ADDRESS, 0);
    

    After that, you can iterate over the array of strings:

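    for (int i = 0; i < suggestionCount; i++) {
        // char* (an element in the array)
        MemoryAddress suggestion = suggestions.getAtIndex(ValueLayout.ADDRESS, i);
        // read the NUL-terminated UTF-8 string starting at offset 0 of that address
        String javaSuggestion = suggestion.getUtf8String(0);
        System.out.println(javaSuggestion);
    }
    // Hunspell allocated the array and its strings, so hand them back to it when done
    // (hunspell.h: void Hunspell_free_list(Hunhandle* pHunspell, char*** slst, int n))
    Hunspell_free_list(hunspellHandle, suggestionsRef, suggestionCount);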
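The snippets above target the early JDK 19 incubator API. In the finalized FFM API (java.lang.foreign, assumed here to be JDK 22 or later) MemoryAddress no longer exists: a pointer read with ValueLayout.ADDRESS comes back as a zero-length MemorySegment, which has to be reinterpret()ed to a real size before it can be dereferenced. The following is a minimal sketch of the same out-parameter read plus array walk under that assumption. It does not call Hunspell: fakeSuggest is a hypothetical stand-in for Hunspell_suggest that writes a native char** into the caller's slot, so the example runs without libhunspell, and "going", "king", "coin" are placeholder words, not Hunspell's actual output for "koing". The NULL check follows the header comment that *slst is NULL when there are no suggestions.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.ArrayList;
import java.util.List;

public class SuggestOutParam {

    // HYPOTHETICAL stand-in for Hunspell_suggest(handle, char*** slst, word):
    // builds a native char** array of C strings and stores its address into *slst.
    static int fakeSuggest(MemorySegment slst, Arena arena, String... results) {
        if (results.length == 0) {
            slst.set(ValueLayout.ADDRESS, 0, MemorySegment.NULL); // *slst = NULL
            return 0;
        }
        MemorySegment array = arena.allocate(ValueLayout.ADDRESS, results.length);
        for (int i = 0; i < results.length; i++) {
            array.setAtIndex(ValueLayout.ADDRESS, i, arena.allocateFrom(results[i]));
        }
        slst.set(ValueLayout.ADDRESS, 0, array); // *slst = array
        return results.length;
    }

    // Reads the char** that the callee stored in the out-parameter slot into Java strings.
    static List<String> readSuggestions(MemorySegment slst, int count) {
        List<String> out = new ArrayList<>(count);
        MemorySegment array = slst.get(ValueLayout.ADDRESS, 0); // char** as a zero-length segment
        if (count == 0 || array.address() == 0L) {
            return out; // no suggestions: the array pointer may be NULL
        }
        // Give the zero-length segment the real size of the pointer array (restricted method)
        array = array.reinterpret(ValueLayout.ADDRESS.byteSize() * count);
        for (int i = 0; i < count; i++) {
            // char*: length unknown, so make it unbounded and let getString stop at the NUL
            MemorySegment cString = array.getAtIndex(ValueLayout.ADDRESS, i).reinterpret(Long.MAX_VALUE);
            out.add(cString.getString(0));
        }
        return out;
    }

    public static void main(String[] args) {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment slst = arena.allocate(ValueLayout.ADDRESS); // room for one char**
            int count = fakeSuggest(slst, arena, "going", "king", "coin");
            // prints "3 suggestions: [going, king, coin]"
            System.out.println(count + " suggestions: " + readSuggestions(slst, count));
        }
    }
}
```

With the real library, the fakeSuggest call would be replaced by the jextract-generated (or Linker-based) Hunspell_suggest downcall, and readSuggestions would stay the same. reinterpret() is a restricted method, so the JVM prints a warning unless native access is enabled (for example with --enable-native-access).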