nlp · information-retrieval

Does the Google search engine penalize pages containing (machine or human) translated content?


Google Search has a zero-tolerance policy against duplicate and spun content, but I am not sure how it deals with translated text. Any guesses on how it might detect translated content? The first thing that comes to mind is that they use their own Google Translate to back-translate the translated content into the source language, but if that's the case, do they have to try back-translating into every language? Are there any specific similarity metrics for such a task? Thank you!
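
To make my guess concrete, here is a minimal sketch of the back-translation idea. This is purely speculation about how such a check *could* work: `translate` is a hypothetical placeholder for whatever MT service would be used (not a real API call), and `SequenceMatcher` is just one of many possible similarity metrics.

```python
# Speculative sketch: back-translate a suspect page into the original's
# language, then score string similarity against the original text.
from difflib import SequenceMatcher


def translate(text: str, source_lang: str, target_lang: str) -> str:
    """Placeholder for a machine-translation call -- an assumption, not a real API."""
    raise NotImplementedError("plug in an MT service here")


def back_translation_similarity(suspect: str, suspect_lang: str,
                                original: str, original_lang: str) -> float:
    """Back-translate `suspect` into the original's language and compare.

    Returns a ratio in [0, 1]; a value near 1 would suggest `suspect`
    is a translation of `original`.
    """
    back = translate(suspect, source_lang=suspect_lang, target_lang=original_lang)
    return SequenceMatcher(None, back.lower(), original.lower()).ratio()
```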


Solution

  • According to this video with a Google employee, auto-generated / machine-translated versions of webpages can count against your site as duplicate content. If you augment the machine-translated version with some text of your own, you might be able to get around the duplicate-content flag, but we can't know how much original text needs to be added to a translation for Google's crawlers to classify the page as original content rather than duplicated content.

    Your best bet would be to have an actual human translate the whole webpage, or to have a human translator augment or modify a machine-translated version so that the human-edited translation is sufficiently different from the machine-translated output (what counts as 'sufficiently different', we don't know). One rough way to gauge that difference yourself is sketched below.
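
    As an illustration only: word-shingle (n-gram) Jaccard similarity is one simple way to measure how far a human-edited translation has drifted from the raw machine translation. The texts and the interpretation of the score are made up for the example; Google does not publish any threshold for 'sufficiently different'.

    ```python
    # Rough gauge of edit distance between a machine translation and a
    # human-edited version: Jaccard similarity over word n-grams (shingles).
    def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
        """Return the set of word n-grams (shingles) in `text`."""
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


    def jaccard(a: str, b: str, n: int = 3) -> float:
        """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
        sa, sb = shingles(a, n), shingles(b, n)
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


    # Made-up example texts, purely for illustration.
    machine = "our products ship worldwide within five business days"
    edited = "all of our products ship anywhere in the world within five business days"

    print(f"shingle overlap: {jaccard(machine, edited):.2f}")  # lower = more original editing
    ```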