Tags: regex, validation, url, localhost, substitution

Regex for Tag Substitution on URLs


I want to create a Regex that validates a URL, but the URL can contain a substitution tag in the middle, like this:

  • http://localhost/path1/path2/{SubstitutionTag}/path3/path4/etc

So my validation needs an exception that accepts this tag. Currently I have this Regex:

  • ^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+$

In addition, I also need to accept the word "localhost" in my URL.

Any tips?


Solution

  • I abandoned the idea of validating URLs with Regex alone. Since my project is in C#, I combined Regex with the Uri class and ended up with this code:

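    // Requires: using System; using System.Linq; using System.Text.RegularExpressions;
    private bool IsValidURL(string url)
    {
        var validationPathCounter = 0;

        // Collect every distinct {Tag} placeholder found in the URL.
        var tags = Regex.Matches(url, @"\{(.*?)\}")
            .OfType<Match>()
            .Select(x => x.Value)
            .Distinct()
            .ToArray();

        // Swap each tag for a numbered dummy segment that is legal in a URL path.
        foreach (var tag in tags)
            url = url.Replace(tag, $"validationPath{++validationPathCounter}");

        // Reject anything that is not a well-formed absolute URI.
        if (!Uri.IsWellFormedUriString(url, UriKind.Absolute))
            return false;

        // Accept only the HTTP and HTTPS schemes.
        if (Uri.TryCreate(url, UriKind.Absolute, out Uri tmp))
            return tmp.Scheme == Uri.UriSchemeHttp || tmp.Scheme == Uri.UriSchemeHttps;
        else
            return false;
    }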
    

    In the code above, Regex is used only to find the {Tag} placeholders, that is, the path segments wrapped in braces. Each one is then replaced with a valid path segment so that the validation can proceed.
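
    To make the substitution step concrete, here is a minimal sketch of just that part, run against the URL from the question. The helper name ReplaceTags is hypothetical (it is not in the code above), and the snippet assumes a C# 9 or later top-level program.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Hypothetical helper: replaces each distinct {Tag} with a numbered
    // dummy segment, mirroring the first half of IsValidURL.
    static string ReplaceTags(string url)
    {
        var counter = 0;
        var tags = Regex.Matches(url, @"\{(.*?)\}")
            .OfType<Match>()
            .Select(m => m.Value)   // the full match, braces included
            .Distinct()
            .ToArray();

        foreach (var tag in tags)
            url = url.Replace(tag, $"validationPath{++counter}");

        return url;
    }

    var sample = "http://localhost/path1/path2/{SubstitutionTag}/path3/path4/etc";
    Console.WriteLine(ReplaceTags(sample));
    // prints: http://localhost/path1/path2/validationPath1/path3/path4/etc
    ```

    Because the tags are de-duplicated first, a tag that appears several times gets the same dummy segment everywhere.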

    The Uri class, on the other hand, does the actual validation through two of its methods: IsWellFormedUriString checks that the URL is well formed, while TryCreate parses it so that I can verify that its scheme is HTTP or HTTPS.

    For my scenario, only HTTP and HTTPS URLs were allowed. If you do not need that scheme restriction, the IsWellFormedUriString method alone is enough, since internally it already uses TryCreate.
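
    The difference between the two checks shows up with a URL that is well formed but uses another scheme. The sketch below illustrates this; the helper name IsHttpUrl and the sample URLs are assumptions made up for illustration, and it again assumes a C# 9 or later top-level program.

    ```csharp
    using System;

    // Hypothetical helper: the same two Uri checks as above, without tag handling.
    static bool IsHttpUrl(string url)
    {
        if (!Uri.IsWellFormedUriString(url, UriKind.Absolute))
            return false;

        return Uri.TryCreate(url, UriKind.Absolute, out var parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
    }

    var ftp = "ftp://localhost/file.txt";
    Console.WriteLine(Uri.IsWellFormedUriString(ftp, UriKind.Absolute)); // True: well formed
    Console.WriteLine(IsHttpUrl(ftp));                                    // False: scheme is ftp
    Console.WriteLine(IsHttpUrl("https://localhost/path1"));              // True
    ```

    So dropping the scheme check would let FTP and other well-formed absolute URIs through.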

    The code above worked for me: it lets me validate URLs whose paths contain generic parts to be replaced later, while ensuring that the rest of the URL complies with the standards for an HTTP or HTTPS URL.
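
    As a quick end-to-end check, here is a sketch that runs a condensed restatement of the method over a few inputs. The local function name IsValidUrl and every URL except the first are assumptions chosen for illustration; the snippet assumes a C# 9 or later top-level program.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Condensed restatement of IsValidURL as a local function (same logic).
    static bool IsValidUrl(string url)
    {
        var counter = 0;
        var tags = Regex.Matches(url, @"\{(.*?)\}")
            .OfType<Match>().Select(m => m.Value).Distinct().ToArray();

        foreach (var tag in tags)
            url = url.Replace(tag, $"validationPath{++counter}");

        return Uri.IsWellFormedUriString(url, UriKind.Absolute)
            && Uri.TryCreate(url, UriKind.Absolute, out var parsed)
            && (parsed.Scheme == Uri.UriSchemeHttp || parsed.Scheme == Uri.UriSchemeHttps);
    }

    string[] candidates =
    {
        "http://localhost/path1/path2/{SubstitutionTag}/path3/path4/etc", // True
        "https://localhost/{A}/{B}",                                      // True
        "ftp://localhost/{Tag}/file",                                     // False: not HTTP(S)
        "http://localhost/path 1/{Tag}",                                  // False: unescaped space
    };

    foreach (var candidate in candidates)
        Console.WriteLine($"{IsValidUrl(candidate)}  {candidate}");
    ```

    Note that "localhost" needs no special handling here, because the Uri class accepts it as a host name.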