Tags: javascript, discord, discord.js, bots, message

Is there a way for content.replace to split them into more words than these?


const filter = ["bad1", "bad2"];

client.on("message", message => {
    var content = message.content;
    var stringToCheck = content.replace(/\s+/g, '').toLowerCase();

    for (var i = 0; i < filter.length; i++) {
        if (content.includes(filter[i])){  
            message.delete();
            break
        }
    }
});

My code above is a Discord bot that deletes a message when someone writes "bad1" or "bad2" (I'm going to add more filtered words later), and luckily it runs without any errors.

But right now the bot only deletes these words when they are written in lowercase letters, with no spaces or special characters in between.

I think I have found a solution, but I can't seem to fit it into my code. I tried different ways, but it either only deleted lowercase words or didn't react at all, and I got errors like "cannot read property of undefined".

var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];

bot.on('message', message => {
  var words = message.content.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g);
  var containsBadWord = words.some(word => {
    return badWords.includes(word);
  });

This is what I am looking at: the var words line, specifically .match(/\w+|\s+|[^\s\w]+/g).

Is there any way to implement that in my const filter code (at the top), or is there a different approach? Thanks in advance.


Solution

  • Well, I'm not sure what you're trying to do with .match(/\w+|\s+|[^\s\w]+/g). That's some unnecessary regex just to get an array of words and spaces. And it won't even work if someone were to split their bad word into something like "t h i s".
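    To make the point above concrete, here is a small sketch of what that regex produces. The helper names tokenize and containsBadWord are made up for illustration, and the list entry is written in lowercase on the assumption that the message is lower-cased before comparison, as in the question's snippet:

```javascript
// The regex from the question: runs of word characters, runs of whitespace,
// or runs of anything else (punctuation and symbols).
const tokenPattern = /\w+|\s+|[^\s\w]+/g;

// Hypothetical helper: lower-case, trim, then split into tokens.
// .match() returns null when nothing matches (e.g. an empty message),
// so fall back to an empty array to avoid a TypeError on .some().
function tokenize(text) {
  return text.toLowerCase().trim().match(tokenPattern) || [];
}

// Entries must be lowercase, because the tokens are lower-cased above.
const badWords = ["bannedword1"];

// Hypothetical helper mirroring the question's .some()/.includes() check.
function containsBadWord(text) {
  return tokenize(text).some(token => badWords.includes(token));
}

console.log(tokenize("Hello, world!"));                // [ 'hello', ',', ' ', 'world', '!' ]
console.log(containsBadWord("I am a BannedWord1"));    // true
console.log(containsBadWord("b a n n e d w o r d 1")); // false
```

    The last call shows the weakness: every letter becomes its own token, so no single token ever equals the banned word.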

    If you want your filter to be case insensitive and account for spaces/special characters, a better solution would probably require more than one regex, and separate checks for the split letters and the normal bad word check. And you need to make sure your split letters check is accurate, otherwise something like "wash it" might be considered a bad word despite the space between the words.
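    As a rough illustration of why the split-letter check needs care, the sketch below removes all whitespace before searching, which is what the question's stringToCheck line does. The function name naiveCheck and the sample messages are made up for this example:

```javascript
// The word list from the question.
const filter = ["bad1"];

// Naive check: remove all whitespace, lower-case, then look for substrings.
function naiveCheck(text) {
  const squashed = text.replace(/\s+/g, "").toLowerCase();
  return filter.some(word => squashed.includes(word));
}

console.log(naiveCheck("B A D 1"));            // true  (intended catch)
console.log(naiveCheck("Not bad 1 day left")); // true  (false positive)
console.log(naiveCheck("hello there"));        // false
```

    The second message is clean, but once the spaces are gone "bad" and "1" run together. This is the same effect as the "wash it" case.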

    A Solution

    So here's a possible solution. Note that it is just a solution, and is far from the only solution. I'm just going to use hard-coded string examples instead of message.content, to allow this to be in a working snippet:

    //Our array of bad words
    var badWords = [
      'bannedWord1',
      'bannedWord2',
      'bannedWord3',
      'bannedWord4'
    ];
    
    //A function that tests if a given string contains a bad word
    function testProfanity(string) {
    
      //Removes all non-letter, non-digit, and non-space chars
      var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
      
      //Replaces all non-letter, non-digit chars with spaces
      var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
    
      //Checks if a condition is true for at least one element in badWords
      return badWords.some(swear => {
      
        //Removes any non-word chars (anything but letters, digits, underscore) from the bad word (for normal)
        var filtered = swear.replace(/\W/g, "");
        
        //Splits the bad word into a 's p a c e d' word (for spaced)
        var spaced = filtered.split("").join(" ");
        
        //Two different regexes for normal and spaced bad word checks
        var checks = {
          spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
          normal: new RegExp(`\\b${filtered}\\b`, "gi")
        };
        
        //If the normal or spaced checks are true in the string, return true
        //so that '.some()' will return true for satisfying the condition
        return spacerString.match(checks.spaced) || normalString.match(checks.normal);
      
      });
    
    }
    
    var result;
    
    //Includes one banned word; expected result: true
    var test1 = "I am a bannedWord1";
    result = testProfanity(test1);
    
    console.log(result);
    
    //Includes one banned word; expected result: true
    var test2 = "I am a b a N_N e d w o r d 2";
    result = testProfanity(test2);
    
    console.log(result);
    
    //Includes one banned word; expected result: true
    var test3 = "A bann_eD%word4, I am";
    result = testProfanity(test3);
    
    console.log(result);
    
    //Includes no banned words; expected result: false
    var test4 = "No banned words here";
    result = testProfanity(test4);
    
    console.log(result);
    
    //This is a tricky one. 'bannedWord2' is technically present in this string,
    //but is 'bannedWord22' really the same? This prevents something like
    //"wash it" from being labeled a bad word; expected result: false
    var test5 = "Banned word 22 isn't technically on the list of bad words...";
    result = testProfanity(test5);
    
    console.log(result);

    I've commented each line thoroughly so that you can follow what each one does. Here it is again, without the comments or the test cases:

    var badWords = [
      'bannedWord1',
      'bannedWord2',
      'bannedWord3',
      'bannedWord4'
    ];
    
    function testProfanity(string) {
    
      var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
      var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
    
      return badWords.some(swear => {
      
        var filtered = swear.replace(/\W/g, "");
        var spaced = filtered.split("").join(" ");
        
        var checks = {
          spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
          normal: new RegExp(`\\b${filtered}\\b`, "gi")
        };
        
        return spacerString.match(checks.spaced) || normalString.match(checks.normal);
      
      });
    
    }
    

    Explanation

    As you can see, this filter copes with punctuation, capitalization, and even single spaces or symbols between the letters of a bad word. However, to avoid the "wash it" scenario I described (which could delete a clean message by mistake), I made it so that something like "bannedWord22" is not treated the same as "bannedWord2". If you want the opposite behavior (treating "bannedWord22" the same as "bannedWord2"), remove both of the \\b tokens from the normal check's regex.
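    Here is a minimal sketch of that trade-off, comparing the normal check with and without the \\b tokens. The variable names strict and loose are only for illustration, and the g flag is left out because it is not needed for a yes/no test:

```javascript
const word = "bannedWord2";

// Whole-word match only, as in the answer's normal check.
const strict = new RegExp(`\\b${word}\\b`, "i");

// Same pattern without the \b tokens: also matches inside longer words.
const loose = new RegExp(word, "i");

console.log(strict.test("bannedWord22"));   // false
console.log(loose.test("bannedWord22"));    // true
console.log(strict.test("a bannedword2!")); // true
```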

    I will also explain the regex, such that you fully understand what is going on here:

    • [^a-zA-Z0-9 ] means "select any character not in the ranges of a-z, A-Z, 0-9, or space" (meaning all characters not in those specified ranges will be replaced with an empty string, essentially removing them from the string).
    • \W means "select any character that is not a word character", where "word character" refers to the characters in ranges a-z, A-Z, 0-9, and underscore.
    • \b means "word boundary": a zero-width position where a word character sits next to a non-word character, or next to the start or end of the string. It matches the position itself and does not consume a character such as a space. Because the patterns here are built from strings, \b is written with an additional \ (as \\b); otherwise JavaScript would read \b as the string escape sequence for a backspace character.
    • The flags g and i used in both of the regex checks indicate "global" and "case-insensitive", respectively.
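    The escaping point in the list above is easy to check in isolation. The following sketch uses an arbitrary sample word, "cat", and made-up variable names to show what happens when the extra backslash is left out:

```javascript
// In a string or template literal, "\b" is the backspace character (code 8).
console.log("\b".charCodeAt(0)); // 8
console.log("\\b".length);       // 2 (a backslash followed by "b")

// Without doubling, the pattern looks for literal backspace characters.
const wrong = new RegExp(`\bcat\b`, "i");

// With doubling, the RegExp constructor receives \b and treats it as a word boundary.
const right = new RegExp(`\\bcat\\b`, "i");

console.log(wrong.test("a cat sat")); // false
console.log(right.test("a cat sat")); // true
console.log(right.test("A CAT sat")); // true  (the i flag ignores case)
console.log(right.test("concat"));    // false (no boundary before "cat")
```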

    Of course, to get this working with your discord bot, all you have to do in your message handler is something like this (and be sure to replace badWords with your filter variable in testProfanity()):

    if (testProfanity(message.content)) return message.delete();
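    Put together with the question's filter array, the whole thing might look like the sketch below. The handler name handleMessage and the stand-in message object are assumptions so that the snippet runs without discord.js; with a real bot the handler would be passed to client.on as in the question.

```javascript
// The asker's word list; testProfanity() reads it instead of badWords.
const filter = ["bad1", "bad2"];

// Condensed version of the check from this answer.
function testProfanity(string) {
  const normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  const spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");

  return filter.some(swear => {
    const filtered = swear.replace(/\W/g, "");
    const spaced = filtered.split("").join(" ");
    // The "g" flag is not needed for a yes/no test, so only "i" is used.
    const normal = new RegExp(`\\b${filtered}\\b`, "i");
    const spacedOut = new RegExp(`\\b${spaced}\\b`, "i");
    return spacedOut.test(spacerString) || normal.test(normalString);
  });
}

// Hypothetical handler name; deletes the message if it trips the filter.
function handleMessage(message) {
  if (testProfanity(message.content)) return message.delete();
}

// With a real bot this would be registered as in the question:
// client.on("message", handleMessage);
// (Newer discord.js releases name this event "messageCreate".)

// Stand-in message object so the sketch runs without discord.js.
const fakeMessage = {
  content: "B-a-d-1 ?!",
  deleted: false,
  delete() { this.deleted = true; }
};

handleMessage(fakeMessage);
console.log(fakeMessage.deleted); // true
```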
    

    If you want to learn more about regex, or if you want to mess around with it and/or test it out, this is a great resource for doing so.