regex language-agnostic penetration-testing websecurity

Validating an obfuscation token


I am building a secure algorithm to guard against obfuscation attacks. The user is validated with a token that must satisfy the following conditions:

  1. The username consists of lowercase letters only and is at least 5 characters long.
  2. The username is followed by #.
  3. The first two characters after # are significant: always a digit followed by a letter. The part between the two # contains at least one digit, one lowercase letter and one uppercase letter.
  4. In between, there can be any number of digits or letters, and nothing else.
  5. At the end, the digit and letter must exactly match the digit and letter from point 3.
  6. The token must end with #.
  7. The part between the two # must be at least 5 characters long.
  8. The complete token consists only of the two #, lowercase and uppercase letters, and digits.

I don't know regular expressions, but my guide told me this task is easily handled at validation time with one. After searching the internet for a long time, I found some similar examples, tried to combine them, and got this:

^[a-z]{5,}#[a-zA-Z0-9]{2}[A-Z][0-9A-Za-z]*[a-zA-Z0-9]{2}#$

But this matches only one of the test cases. I don't know how to handle the part between the two hashes. I have tried to explain my problem as well as my English allows. Please help.

The following test cases should pass:

userabcd#4a39A234a#

randomuser#4A39a234A#

abcduser#2Aa39232A#

abcdxyz#1q39A231q#

randzzs#1aB1a#

The following test cases should fail:

randuser#1aaa1a#

randuser#1112#

randuser#a1a1##

randuser#1aa#

u#4a39a234a#

userstre#1qqeqe123231q$

user#1239a23$a#

useabcd#4a39a234a#12


Solution

  • You may try:

     ^[a-z]{5,}#(?=[^a-z\n]*[a-z])(?=[^A-Z\n]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$    
    

    Explanation of the above regex:

    • ^, $ - Match the start and end of the line, respectively.

    • [a-z]{5,} - Matches the username: five or more lowercase letters.

    • # - Matches # literally.

    • (?=[^a-z\n]*[a-z]) - A positive look-ahead asserting that at least one lowercase letter follows on the same line.

    • (?=[^A-Z\n]*[A-Z]) - A positive look-ahead asserting that at least one uppercase letter follows on the same line.

    • (\d[a-zA-Z]) - A capturing group matching the first two characters, i.e. a digit followed by a letter. If you want them the other way round, use [a-zA-Z]\d.

    • [a-zA-Z\d]* - Matches zero or more letters or digits.

    • \1 - A back-reference that matches exactly the text captured by the group, so the token must end with the same digit and letter it started with.

    You can find a demo of the above regex here.

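    Note: The \n in the negated character sets only stops the look-aheads from running past the end of a line when several lines are tested at once. If you match one string at a time, as you would in practice, you can remove \n from the character sets.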
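The regex is language-agnostic, so here is a minimal Python sketch of how it could be applied to one token at a time. The names `TOKEN_RE`, `is_valid_token`, `SHOULD_PASS` and `SHOULD_FAIL` are hypothetical, not from the question. It assumes single-string validation, so `\n` is dropped from the character sets, and it uses `fullmatch()` in place of `^` and `$` because in Python `$` also matches just before a trailing newline. The `re.ASCII` flag is an added assumption so that `\d` accepts only 0-9, in line with condition 8.

```python
import re

# Sketch only: the pattern from the answer, split up so each part is commented.
TOKEN_RE = re.compile(
    r"[a-z]{5,}#"        # username: five or more lowercase letters, then #
    r"(?=[^a-z]*[a-z])"  # look-ahead: at least one lowercase letter follows
    r"(?=[^A-Z]*[A-Z])"  # look-ahead: at least one uppercase letter follows
    r"(\d[a-zA-Z])"      # group 1: a digit followed by a letter
    r"[a-zA-Z\d]*"       # zero or more letters or digits
    r"\1#",              # the same digit and letter again, then the closing #
    re.ASCII,            # assumption: restrict \d to the ASCII digits 0-9
)


def is_valid_token(token: str) -> bool:
    """Return True if the whole string is a valid token."""
    return TOKEN_RE.fullmatch(token) is not None


# Test cases copied from the question.
SHOULD_PASS = [
    "userabcd#4a39A234a#",
    "randomuser#4A39a234A#",
    "abcduser#2Aa39232A#",
    "abcdxyz#1q39A231q#",
    "randzzs#1aB1a#",
]
SHOULD_FAIL = [
    "randuser#1aaa1a#",
    "randuser#1112#",
    "randuser#a1a1##",
    "randuser#1aa#",
    "u#4a39a234a#",
    "userstre#1qqeqe123231q$",
    "user#1239a23$a#",
    "useabcd#4a39a234a#12",
]

if __name__ == "__main__":
    for token in SHOULD_PASS + SHOULD_FAIL:
        print(f"{token!r}: {is_valid_token(token)}")
```

Condition 7 (at least 5 characters between the two #) needs no separate check: the digit-letter pair appears twice with the same letter case, so the required letter of the other case has to sit between the two pairs, which already gives five characters.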


    Alternatively, you can use this regex, which uses a lazy .*? in the look-aheads instead of negated character sets:

    ^[a-z]{5,}#(?=.*?[a-z])(?=.*?[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$
    
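To check that the two variants behave the same on the question's test cases, a small Python comparison could look like the sketch below. The names `NEGATED_RE`, `LAZY_RE` and `compare` are hypothetical, and the check covers only the listed samples, so it is not a proof of equivalence.

```python
import re

# Both patterns are copied verbatim from the answer.
NEGATED_RE = re.compile(
    r"^[a-z]{5,}#(?=[^a-z\n]*[a-z])(?=[^A-Z\n]*[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$"
)
LAZY_RE = re.compile(
    r"^[a-z]{5,}#(?=.*?[a-z])(?=.*?[A-Z])(\d[a-zA-Z])[a-zA-Z\d]*\1#$"
)


def compare(tokens):
    """Map each token to a (negated-class result, lazy-dot result) pair."""
    return {
        t: (NEGATED_RE.match(t) is not None, LAZY_RE.match(t) is not None)
        for t in tokens
    }


# Test cases copied from the question.
SHOULD_PASS = [
    "userabcd#4a39A234a#",
    "randomuser#4A39a234A#",
    "abcduser#2Aa39232A#",
    "abcdxyz#1q39A231q#",
    "randzzs#1aB1a#",
]
SHOULD_FAIL = [
    "randuser#1aaa1a#",
    "randuser#1112#",
    "randuser#a1a1##",
    "randuser#1aa#",
    "u#4a39a234a#",
    "userstre#1qqeqe123231q$",
    "user#1239a23$a#",
    "useabcd#4a39a234a#12",
]

if __name__ == "__main__":
    for token, (negated, lazy) in compare(SHOULD_PASS + SHOULD_FAIL).items():
        print(f"{token!r}: negated={negated} lazy={lazy}")
```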

    Recommended reading: Principle of contrast