spam-prevention, language-detection

How to detect if a text is in a given language?


I run a kind of Q&A site (very roughly speaking) where users submit questions to be answered by our staff. I'm quite concerned about users posting non-questions, which are a nuisance. The best idea I've had so far is a system that detects whether the text is in Italian (our users' language) and, if it is, checks it against a list of common copypastas.

So, long story short: users will input some text, and I have to make sure it's a proper question in Italian and not random characters.
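For the copypasta part of this plan, one possible sketch in Python, assuming the known copypastas are stored as plain strings (`KNOWN_COPYPASTAS` and its contents are placeholders, not real data). It normalizes case, punctuation and whitespace, then uses the standard library's `difflib.SequenceMatcher` so that lightly edited copies are still caught:

```python
import difflib
import re
import unicodedata
from typing import Iterable

# Placeholder examples (assumption); a real list would come from your own moderation data.
KNOWN_COPYPASTAS = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    "Questo è un messaggio di prova, ignoratelo pure.",
)


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so accented letters survive
    return " ".join(text.split())


def is_copypasta(text: str, known: Iterable[str] = KNOWN_COPYPASTAS,
                 threshold: float = 0.9) -> bool:
    """True if text is identical or nearly identical to a known copypasta."""
    candidate = normalize(text)
    for pasta in known:
        ratio = difflib.SequenceMatcher(None, candidate, normalize(pasta)).ratio()
        if ratio >= threshold:
            return True
    return False
```

The 0.9 threshold is an arbitrary starting point to tune on real submissions. `SequenceMatcher` compares one pair at a time, so with a long list of copypastas it may be worth first checking for an exact match of the normalized text in a `set`.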


Solution

  • Not sure which programming language you're using:

    http://www.easywayserver.com/blog/java-string-contains-example/

    How do I check if a string contains a specific word in PHP?

    Checking whether the input string (the question) contains any forbidden word would be one way to approach it.
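In Python, the equivalent of the "contains" checks linked above is a plain substring test (`word in text`), but that also fires inside longer words: the forbidden word `"asta"` (auction) would match `"Basta"`. A minimal sketch that matches whole words only, case-insensitively; `find_forbidden` is a hypothetical helper name:

```python
import re


def find_forbidden(text: str, forbidden: list[str]) -> list[str]:
    """Return the forbidden words that occur in text as whole words (case-insensitive)."""
    hits = []
    for word in forbidden:
        # \b anchors prevent matches inside longer words, e.g. "asta" inside "Basta".
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(word)
    return hits
```

Note that `\b` only marks a boundary between a word character and a non-word character, so list entries that start or end with punctuation (such as `c++`) would need a different pattern.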

    Pseudo code

    ListOfForbiddenWords
    if Language == Italian
        if Input does not contain any word in ListOfForbiddenWords
            // It's fine: accept the question
        else
            // Spam: reject it
    else
        // Not Italian: reject it
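The pseudocode above could translate to Python roughly as follows. The original leaves open how the language is detected, so here it is passed in as a function (`is_italian` is an assumed parameter, not an existing API), and forbidden words are matched as whole words rather than as raw substrings:

```python
import re
from typing import Callable, Iterable


def check_question(
    text: str,
    forbidden_words: Iterable[str],
    is_italian: Callable[[str], bool],
) -> str:
    """Return "ok", "spam" or "not_italian", in the same order as the pseudocode."""
    if not is_italian(text):
        return "not_italian"  # You're not Italian
    for word in forbidden_words:
        # Whole-word, case-insensitive match for each forbidden word.
        if re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE):
            return "spam"  # Don't spam
    return "ok"  # It's fine
```

For a quick experiment any predicate will do, e.g. `lambda t: "il" in t.lower().split()`; a real check needs something sturdier than a single word.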
    

    I'm not sure what the best way is to check whether a string is written in a specific language.
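One simple heuristic (not the only option; trained character n-gram identifiers are generally more robust) is to count how many very common function words, or stopwords, of each candidate language appear in the text. The word lists below are tiny illustrative assumptions, not authoritative lists, and `guess_language` is a hypothetical helper:

```python
import re
from typing import List, Optional

# Tiny hand-picked stopword sets (assumption: real lists would be much longer).
STOPWORDS = {
    "it": {"il", "lo", "la", "gli", "le", "un", "una", "di", "da", "con", "per",
           "che", "non", "è", "e", "del", "della", "nel", "nella", "sono",
           "come", "cosa", "perché", "qual", "quale", "si", "mi", "ho", "ha"},
    "en": {"the", "is", "are", "of", "and", "to", "a", "in", "i", "how", "what",
           "why", "can", "do", "does", "it", "this", "that", "you", "my"},
}


def tokenize(text: str) -> List[str]:
    # Runs of letters only (Unicode-aware): "perché" stays one token, digits are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(text: str, min_hits: int = 2) -> Optional[str]:
    """Return the language with the most stopword hits, or None if too few or tied."""
    tokens = tokenize(text)
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_score), (_, runner_up) = ranked[0], ranked[1]
    if best_score < min_hits or best_score == runner_up:
        return None
    return best


def is_italian(text: str) -> bool:
    return guess_language(text) == "it"
```

Random keyboard mashing contains no stopwords, so `guess_language` returns `None` for it, which also covers the "not random characters" requirement. Closely related languages such as Spanish share words like `la` and `con`, so telling them apart reliably would need longer lists per language or a dedicated language-identification library.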