
Migrate Lucene FST files from 5.1.0 to 8.9.0


I have files containing FSTs that were created with Lucene 5.1.0.

After upgrading to Lucene 8.9.0, I get the following exception when I try to read an FST from one of these files:

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource org.apache.lucene.store.InputStreamDataInput@34ce8af7): 4 (needs to be between 6 and 7). This version of Lucene only supports indexes created with release 6.0 and later.

Is there any way to upgrade the old FST files to the new format?


Solution

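  • I solved it by round-tripping the FST through a plain text file.

    While still on the old Lucene version (5.1.0), write the entire contents of the FST to a text file: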

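    // Uses org.apache.lucene.util.fst.{FST, BytesRefFSTEnum} plus java.io and java.nio.file.
    public static <T> void writeToTextFile(FST<T> fst, Path filePath) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(filePath)) {
            BytesRefFSTEnum<T> fstEnum = new BytesRefFSTEnum<>(fst);
            BytesRefFSTEnum.InputOutput<T> inputOutput;
            // The enum walks the FST in sorted key order; write one "input<TAB>output" line per entry.
            // Assumes keys are valid UTF-8 and contain no tab or newline characters.
            while ((inputOutput = fstEnum.next()) != null) {
                writer.write(inputOutput.input.utf8ToString() + "\t" + inputOutput.output.toString() + "\n");
            }
        }
    }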
    

    Then switch to the new Lucene version (8.9.0) and rebuild the FST from the text file (fromString must parse whatever output.toString() wrote):

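    // Uses org.apache.lucene.util.{BytesRef, IntsRefBuilder} and org.apache.lucene.util.fst.{Builder, FST, Outputs, Util}.
    public static <T> FST<T> readFromTextFile(Path filePath, Outputs<T> outputs, Function<String, T> fromString) throws IOException {
        Builder<T> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
        IntsRefBuilder scratchInts = new IntsRefBuilder();
    
        try (BufferedReader reader = Files.newBufferedReader(filePath)) {
            String line;
            // Process every line of the dump, not only the first one.
            // Keys must arrive in sorted order, which is the order the dump was written in.
            while ((line = reader.readLine()) != null) {
                // Split on the first tab only, so the output text may itself contain tabs.
                String[] split = line.split("\t", 2);
                BytesRef scratchBytes = new BytesRef(split[0]);
                builder.add(Util.toIntsRef(scratchBytes, scratchInts), fromString.apply(split[1]));
            }
        }
    
        return builder.finish();
    }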
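
    Because the rebuild depends on the dump being well formed, it may help to check the text file before handing it to the builder. The following is only a minimal sketch using the JDK alone: FstDumpCheck, compareUtf8 and firstBadLine are hypothetical names, not Lucene APIs. It assumes Java 9+ (for Arrays.compareUnsigned), one key<TAB>value pair per line, and keys in strictly ascending unsigned UTF-8 byte order, which is the order the FST builder expects its inputs in.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): sanity-checks a "key<TAB>value" dump. */
final class FstDumpCheck {

    /** Compares two keys by their UTF-8 bytes, treating each byte as unsigned. */
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        return Arrays.compareUnsigned(x, y); // available since Java 9
    }

    /**
     * Returns the 1-based number of the first line that does not contain exactly
     * one tab, or whose key is not strictly greater than the previous key.
     * Returns -1 if every line passes both checks.
     */
    static int firstBadLine(List<String> lines) {
        String previousKey = null;
        for (int i = 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty strings, so "key<TAB>" still yields two parts.
            String[] split = lines.get(i).split("\t", -1);
            if (split.length != 2) {
                return i + 1; // missing tab, or a stray extra tab
            }
            if (previousKey != null && compareUtf8(previousKey, split[0]) >= 0) {
                return i + 1; // duplicate or out-of-order key
            }
            previousKey = split[0];
        }
        return -1;
    }
}
```

    For example, FstDumpCheck.firstBadLine(Files.readAllLines(filePath)) returns -1 when the dump looks safe to pass to readFromTextFile. The check is stricter than the reader above, which tolerates tabs inside the value; relax the split if your outputs can contain tabs.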