I have made this regex to capture all types of URLs (it literally captures every URL), but it also captures bare IP addresses.
This is my scenario: I have a list full of IPs, hashes, and URLs, and my URL regex and IP regex both capture the same entries. I don't know whether a bare IP can be considered a "URL".
My regex: ((http|https)://)?(www)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,9}\b([-a-zA-Z0-9()@:%_\|+.~#?&//={};,\[\]'"$\x60]*)?
It captures all of these:
http://127.0.0.1/
http://127.0.0.1
https://127.0.0.1/m=weblogin/loginform238,363,771,89816356,2167
127.0.0.1:8080 ------> excluding this one is okay too (optional)
127.0.0.1 ------> I want to exclude this one
google.com
google.com:80
www.google.com
https://google.com
https://www.google.com
I want my regex to capture all URLs except bare IPs like this:
127.0.0.1
I am using the regexp.Compile() and FindAllString functions.
You can use a regex implementing the "best trick ever" with FindAllStringSubmatch: match what you need to skip/omit, and match and capture what you need to keep.
\b(?:https?://)?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(?:[^:]|$)|((?:https?://)?(?:www)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,9}\b[-a-zA-Z0-9()@:%_\|+.~#?&//={};,\[\]'"$\x60]*)
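The first alternative is an IP-matching regex to which I added a (?:https?://)? part to match an optional protocol and a (?:[^:]|$) part to make sure the IP pattern is immediately followed by a character other than : or by the end of the string; you may adjust this part further. When this alternative matches, Group 1 stays empty, so those matches can be discarded. The second alternative is your original pattern wrapped in a capturing group.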
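To see what the trailing (?:[^:]|$) check does, here is a small sketch that runs only the first alternative against a few inputs. The variable name ipOnly and the sample strings are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// ipOnly is the first (skip) alternative of the full pattern, taken alone.
// The name is hypothetical and only used in this sketch.
var ipOnly = regexp.MustCompile(`\b(?:https?://)?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(?:[^:]|$)`)

func main() {
	samples := []string{
		"127.0.0.1",         // bare IP at end of string: matched, so it is skipped
		"http://127.0.0.1/", // IP followed by "/": matched, so it is skipped
		"127.0.0.1:8080",    // IP followed by ":": not matched by this alternative
		"google.com",        // not an IP: not matched by this alternative
	}
	for _, s := range samples {
		fmt.Printf("%q skipped=%v\n", s, ipOnly.MatchString(s))
	}
}
```

Anything this alternative does not match is left for the second, capturing alternative, which is why 127.0.0.1:8080 still ends up in Group 1.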
Then, use it in Go like this:
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Alternative 1 matches IPs to skip; alternative 2 captures URLs into Group 1.
	r := regexp.MustCompile(`\b(?:https?://)?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(?:[^:]|$)|((?:https?://)?(?:www)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,9}\b[-a-zA-Z0-9()@:%_\|+.~#?&//={};,\[\]'"$\x60]*)`)
	matches := r.FindAllStringSubmatch(`http://127.0.0.1/
http://127.0.0.1
http://www.127.0.0.1/m=weblogin/loginform238,363,771,89816356,2167
127.0.0.1:8080
127.0.0.1
google.com
google.com:80
www.google.com
https://google.com
https://www.google.com`, -1)
	for _, v := range matches {
		if len(v[1]) > 0 { // if Group 1 matched
			fmt.Println(v[1]) // Display it, else do nothing
		}
	}
}
Output:
http://www.127.0.0.1/m=weblogin/loginform238,363,771,89816356,2167
127.0.0.1:8080
google.com
google.com:80
www.google.com
https://google.com
https://www.google.com
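If you need this in more than one place, the loop can be wrapped in a small helper. This is only a sketch: the names urlRe and extractURLs are hypothetical, and the pattern is the same one as above.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlRe is the same skip-or-capture pattern as above (hypothetical name).
var urlRe = regexp.MustCompile(`\b(?:https?://)?(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b(?:[^:]|$)|((?:https?://)?(?:www)?[-a-zA-Z0-9@:%._+~#=]{1,256}\.[a-zA-Z0-9()]{1,9}\b[-a-zA-Z0-9()@:%_\|+.~#?&//={};,\[\]'"$\x60]*)`)

// extractURLs returns only the Group 1 captures and drops every match
// that came from the IP (skip) alternative.
func extractURLs(text string) []string {
	var urls []string
	for _, m := range urlRe.FindAllStringSubmatch(text, -1) {
		if m[1] != "" { // Group 1 is empty when the IP alternative matched
			urls = append(urls, m[1])
		}
	}
	return urls
}

func main() {
	fmt.Println(extractURLs("127.0.0.1\ngoogle.com\n127.0.0.1:8080"))
	// prints [google.com 127.0.0.1:8080]
}
```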