google-chrome dictionary hunspell

Combine two BDIC files from Chrome (hunspell) into one


I have two BDIC (binary dictionary?) files from the Google Chrome spell checker (which is based on hunspell). I want to combine these two files into a single BDIC that contains all the words.

Here is a reader/writer for this format from the Chrome sources (LGPL/C++): chromium/src/third_party/hunspell/google/bdict_reader.h

How can I combine the two files with C++ or a command-line utility?


Solution

  • Merging two hunspell dictionaries is easy: tools like https://github.com/arty-name/hunspell-merge can merge any number of source dictionaries together.

    Creating a bdict file that Chrome understands is trickier. Chrome uses this format as an optimization and uses the convert_dict tool internally to convert aff and dic files to bdict. I couldn't find a prebuilt copy of this tool online, which left only one option: building it from the Chromium sources. Google's setup is pretty straightforward and, if followed carefully, will let you build the tool. First, follow http://dev.chromium.org/developers/how-tos/get-the-code to obtain the code and set up your environment based on your platform. After that, run ninja -C out\Debug convert_dict and, if it completes without errors, find the convert_dict executable in the out/Debug folder.

    You cannot add a custom language to Chrome (as far as I know), so you have to replace one of the predefined ones. I suggest installing one of the languages you don't understand and using its slot for your merged dictionary. The bdict files can be found in the Chrome user profile folder.
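To illustrate the merge step mentioned above, here is a minimal sketch in Python. It is not how hunspell-merge works internally; it only handles the simple case where both .dic files were written against the same .aff file, so that the flags after the "/" mean the same thing in both. It relies on the .dic layout of an approximate entry count on the first line followed by one entry per line. The function names parse_dic and merge_dic are made up for this example.

```python
def parse_dic(text):
    """Return the entry lines of a .dic file, skipping the leading word count."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]  # drop blank lines
    # The first non-empty line of a .dic file is an approximate entry count.
    if lines and lines[0].isdigit():
        lines = lines[1:]
    return lines


def merge_dic(text_a, text_b):
    """Merge two .dic bodies (assumed to share one .aff file).

    Exact duplicate entries are dropped; the result is sorted and
    prefixed with the new entry count.
    """
    merged = sorted(set(parse_dic(text_a)) | set(parse_dic(text_b)))
    return "\n".join([str(len(merged))] + merged) + "\n"


if __name__ == "__main__":
    first = "2\ncat/S\ndog\n"
    second = "2\ndog\nbird\n"
    print(merge_dic(first, second), end="")
```

When reading real files, open them with the encoding named by the SET line of the matching .aff file. If the two dictionaries use different .aff files, their flags will not line up and a dedicated tool such as hunspell-merge is the safer choice. The merged .dic, together with its .aff, is what you would then feed to convert_dict.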