google-cloud-automl

Sentence alignment for MT in Google AutoML


Does Google AutoML come with a sentence alignment tool?

I have lots of documents in English and Italian that were manually translated "almost" sentence by sentence, so it should be easy to detect the translated sentence pairs automatically. The documents are grammatically well written and relatively short: 5-10 sentences.

Is such a tool on the roadmap, and what would be a good tool or approach to use until it's included in the AutoML cloud service?


Solution

  • I found several sentence alignment tools online:

    https://github.com/rsennrich/Bleualign

    https://github.com/machinalis/yalign

    https://github.com/danielvarga/hunalign

    https://github.com/rali-udem/yasa

    https://github.com/cocoxu/Shakespeare

    https://www.microsoft.com/en-us/download/details.aspx?id=52608

    http://mi.eng.cam.ac.uk/~wjb31/distrib/mttkv1/

    http://champollion.sourceforge.net/
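
    For short documents that are translated almost sentence by sentence, as described in the question, even a simple length-based aligner may get you most of the way. The sketch below is not the implementation used by any of the tools listed above; it is a simplified dynamic-programming aligner loosely in the spirit of classic length-based alignment (Gale and Church). The names `BEADS`, `length_cost` and `align` and the penalty values are assumptions chosen for illustration, and the input is assumed to be already split into sentences.

```python
from typing import List, Tuple

# Allowed alignment "beads": (source sentences consumed, target sentences consumed),
# each with a fixed penalty. The penalty values are illustrative assumptions, not tuned.
BEADS = {(1, 1): 0.0, (1, 0): 3.0, (0, 1): 3.0, (2, 1): 1.0, (1, 2): 1.0}


def length_cost(src: List[str], tgt: List[str]) -> float:
    """Relative difference in total character length between the two sides."""
    ls = sum(len(s) for s in src)
    lt = sum(len(t) for t in tgt)
    if ls == 0 or lt == 0:
        return 0.0  # unmatched sentence: only the bead penalty applies
    return abs(ls - lt) / max(ls, lt)


def align(src: List[str], tgt: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Return the lowest-cost monotone alignment as (source group, target group) pairs."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Forward pass: extend every reachable prefix pair (i, j) by each allowed bead.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + penalty + length_cost(src[i:ni], tgt[j:nj])
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)

    # Backward pass: follow the stored beads from the end to the start.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        pairs.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    pairs.reverse()
    return pairs
```

    Calling `align(english_sentences, italian_sentences)` returns a list of sentence groups; pairs where both sides contain exactly one sentence are the safest candidates for a training set. A length-only cost cannot tell a correct pair from an unrelated pair of similar length, so for anything beyond near-parallel documents the dedicated tools above are the better choice.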