c#, .net, regex, data-annotations

Regular expression for Website address validation


I have an input field where users enter a website address. Most users have no idea what a well-formed URL looks like, so I am looking for a website address regex that accepts all of the following:

1) www.someaddress.com - True
2) someaddress.com - True
3) http://someaddress.com - True
4) https://someaddress.com - True
5) https://www.someaddress.co.il - True
6) http://www.someaddress.com - True

I use this Regex:

[RegularExpression(@"^((http|ftp|https|www)://)?([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?$", ErrorMessage = "Not a valid website address")]
public string SiteUrl { get; set; }

But it's useless because it allows almost every string to pass.
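
A minimal sketch of why the pattern is so permissive: `[\w+?\.\w+]` is a character class, so it matches any single word character, `+`, `?` or `.` rather than the sequence "word, dot, word", and repeating it with `+` accepts any run of those characters. The class name `PermissivePatternDemo` and the sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PermissivePatternDemo
{
    // The pattern from the question, unchanged.
    public const string OriginalPattern =
        @"^((http|ftp|https|www)://)?([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?$";

    // True when the whole input is accepted by the original pattern.
    public static bool Matches(string input) => Regex.IsMatch(input, OriginalPattern);

    public static void Main()
    {
        // Only the last sample looks like a website address, yet all four match.
        foreach (var input in new[] { "cow", "....", "www://x", "someaddress.com" })
        {
            Console.WriteLine($"{input}: {Matches(input)}");
        }
    }
}
```

An input containing a space, such as "hi hello.com", is still rejected, because a space is in neither character class.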

Please supply a data annotation answer rather than an answer such as:

Uri.IsWellFormedUriString

because .NET doesn't support client-side validation for custom attributes.


Solution

  • There is a UrlAttribute for validating URLs, but it requires the protocol to be present, which it appears you don't want.

    However, its source code is available, and it uses a regular expression that you can borrow and modify. Making just the protocol portion optional, as you want, gives this:

    ^((http|ftp|https)://)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$
    

    (Side note: I noticed that your regex allowed www://, which is suspicious. I took it out here, but if you truly need it, you can add it back.)

    These are values I tested with:

    www.someaddress.com             Yes
    someaddress.com                 Yes
    http://someaddress.com          Yes
    https://someaddress.com         Yes
    https://www.someaddress.co.il   Yes
    cow                             No
    hi hello.com                    No
    this/that.com                   No
    

    A comment in the source code says:

    This attribute provides server-side url validation equivalent to jquery validate, and therefore shares the same regular expression. See unit tests for examples.
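
The test values above can be checked with a small console sketch that applies the modified pattern through `[RegularExpression]` and runs the data annotation validators. The names `SiteModel`, `UrlPattern` and `UrlValidationDemo` are made up for illustration, and the pattern is copied from the answer as-is.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

public class SiteModel
{
    // Pattern from the answer: the UrlAttribute expression with the protocol made optional.
    public const string UrlPattern = @"^((http|ftp|https)://)?(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?$";

    [RegularExpression(UrlPattern, ErrorMessage = "Not a valid website address")]
    public string SiteUrl { get; set; }
}

public static class UrlValidationDemo
{
    // Runs every data annotation validator on a SiteModel holding the given value.
    public static bool IsValid(string url)
    {
        var model = new SiteModel { SiteUrl = url };
        var results = new List<ValidationResult>();
        return Validator.TryValidateObject(
            model, new ValidationContext(model), results, validateAllProperties: true);
    }

    public static void Main()
    {
        var samples = new[]
        {
            "www.someaddress.com", "someaddress.com", "http://someaddress.com",
            "https://someaddress.com", "https://www.someaddress.co.il",
            "cow", "hi hello.com", "this/that.com",
        };
        foreach (var url in samples)
        {
            Console.WriteLine($"{url,-32}{(IsValid(url) ? "Yes" : "No")}");
        }

        // For comparison, the built-in attribute rejects an address without a protocol.
        Console.WriteLine(new UrlAttribute().IsValid("www.someaddress.com"));
    }
}
```

One caveat with this approach: the pattern only lists lowercase `a-z`, and a `[RegularExpression]` pattern is matched case-sensitively unless the expression itself says otherwise, so an address typed in uppercase would be rejected as written. If that matters, widening the classes to `[a-zA-Z]` is one option that should also work in client-side validation.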