Determine valid options for the next character in a sequence


Say I have the regex

const string regex = "[A-Za-z0-9]* [0-9]{1,3} [A-Za-z]* ?[A-Za-z]*";

const string address = "ABC 123 Sesame Street"; // this is a valid match

and so far I have typed "ABC 123 Se".

As a human, I can see that the next character needs to be a letter. Is there an algorithm that can do that for a computer?

I have looked at Levenshtein Distance algorithms, but in order for those to provide information I need two strings, and I only have a string and a regex. Spell Checking algorithms don't quite match my situation either.

I would prefer a generic solution, so that if for some reason I need to allow 123 N 4567 W Paris, Idaho all I have to do is modify the regex.

Edit

I should have said, "as a human, I can see that the regex won't allow the next character to be a number or special character, so I can exclude those options." Thanks for catching that!


Solution

  • According to this question, it is possible; you just have to be clever about the regexes you use. For example, if you are parsing IP addresses:

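    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
    List<string> validNextOptions = new List<string>();
    string currentString = "255.3";
    string newCharacter = "2";
    // Matches any prefix of a dotted-quad IPv4 address: up to three "octet." groups, then an optional octet.
    string partialIP = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])[.]){0,3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])?$";
    Regex partialIpRegex = new Regex(partialIP);

    // Keep the candidate only if appending it still leaves a valid partial IP.
    if (partialIpRegex.IsMatch(currentString + newCharacter))
    {
        validNextOptions.Add(newCharacter);
    }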
    

    This regex will return a match as long as you are progressing toward a valid IP. If you are unfamiliar with how regexes work, I recommend you paste the partialIP string into something like regex101.com and play with it a bit.
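To get the full set of valid next characters rather than checking a single one, the same check can be run over every candidate character. The following is only a minimal sketch: the class name NextCharacterOptions, the method ValidNext, and the PartialAddress pattern are hypothetical names introduced here for illustration. PartialAddress is a hand-derived "prefix" version of the regex from the question (each later part is wrapped in an optional group), so treat it as an assumption and verify it against your own inputs.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NextCharacterOptions
{
    // Same partial-IP pattern as in the answer, with the octet factored out for readability.
    private const string Octet = "([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])";

    public static readonly Regex PartialIp =
        new Regex("^(" + Octet + "[.]){0,3}" + Octet + "?$");

    // Assumption: hand-written prefix form of the question's address regex
    // "[A-Za-z0-9]* [0-9]{1,3} [A-Za-z]* ?[A-Za-z]*".
    public static readonly Regex PartialAddress =
        new Regex("^[A-Za-z0-9]*( ([0-9]{1,3}( [A-Za-z]* ?[A-Za-z]*)?)?)?$");

    // Returns every candidate character that, appended to the current input,
    // still matches the given partial (prefix) regex.
    public static List<char> ValidNext(Regex partial, string current, IEnumerable<char> candidates)
    {
        return candidates.Where(c => partial.IsMatch(current + c)).ToList();
    }
}
```

For example, calling NextCharacterOptions.ValidNext(NextCharacterOptions.PartialIp, "255.25", "0123456789.") keeps the digits 0 through 5 and the dot, because 256 through 259 are not valid octets. The cost is one regex match per candidate, which should be fine for interactive input with a small alphabet.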