I have been trying to find a URL in an HTML document. This has to be done with a regex, because the URL is not inside any HTML tag, so I can't use Nokogiri for it. To get the HTML I used HTTParty, like this:
```ruby
require 'httparty'

# Fetch the page and print the raw HTML
doc = HTTParty.get("")
puts doc
```
That prints the HTML. To get to the URL I used `String#split`. The full code is:
```ruby
require 'httparty'

# Cut the HTML at ".ngrok.io", then keep the piece after the second "https:"
doc = HTTParty.get('').body.split(".ngrok.io")[0].split('https:')[2]
puts "https:#{doc}.ngrok.io"
```
I want to do this with a regex instead, because ngrok might change their localhost HTML page, and then this code would stop working. How do I do it?
If I understood correctly, you want to find all URLs matching `https://(any subdomain).ngrok.io`, right?
If so, use `String#scan` with a regexp. Here is an example:
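```ruby
# Get your body (replace this string with the body of your HTTP response)
body = "my doc contains https://subdomain.ngrok.io and https://subdomain-1.subdomain.ngrok.io"
puts body

# Use scan and you're done: it returns every match in an array.
# The '-' goes last in the character class so it is a literal hyphen.
urls = body.scan(%r{https://[0-9A-Za-z.-]+\.ngrok\.io})
puts urls
```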
`urls` will be the array `["https://subdomain.ngrok.io", "https://subdomain-1.subdomain.ngrok.io"]` (`puts` prints each element on its own line).
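The question only needs one URL (the final `puts "https:#{doc}.ngrok.io"`), so `String#[]` with an equivalent regexp may be simpler than `scan`: it returns the first match, or `nil` when nothing matches. A minimal sketch, assuming the page body is already a string; `NGROK_URL` and `first_ngrok_url` are hypothetical names:

```ruby
# Hypothetical constant: equivalent to the pattern used with scan.
NGROK_URL = %r{https://[0-9A-Za-z.-]+\.ngrok\.io}

# Returns the first ngrok URL found in +body+, or nil when there is none.
def first_ngrok_url(body)
  body[NGROK_URL]
end

sample = "my doc contains https://subdomain.ngrok.io and https://subdomain-1.subdomain.ngrok.io"
puts first_ngrok_url(sample)          # prints https://subdomain.ngrok.io
p first_ngrok_url("no tunnel here")   # prints nil
```

With HTTParty you would pass the response body, e.g. `first_ngrok_url(HTTParty.get(url).body)`, and check for `nil` before using the result.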
Call `.uniq` on the array returned by `scan` if you want to get rid of duplicates.
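For illustration, here is `scan` followed by `uniq` on a made-up string in which the same tunnel URL appears twice (the sample text is an assumption, not ngrok's real page):

```ruby
# Made-up sample: the same tunnel URL appears twice.
body = "link https://abc.ngrok.io, again https://abc.ngrok.io and https://xyz.ngrok.io"

urls = body.scan(%r{https://[0-9A-Za-z.-]+\.ngrok\.io})
p urls       # prints ["https://abc.ngrok.io", "https://abc.ngrok.io", "https://xyz.ngrok.io"]
p urls.uniq  # prints ["https://abc.ngrok.io", "https://xyz.ngrok.io"]
```

`uniq` keeps the first occurrence of each element, so the original order is preserved.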
This doesn't handle every edge case, but it's probably enough for what you need.
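If some of those edge cases matter, the pattern can be loosened. The sketch below assumes you also want plain `http://` URLs and uppercase letters in the host (hostnames are case-insensitive); `NGROK_ANY_SCHEME` is a hypothetical name:

```ruby
# Variant (assumption: http URLs and mixed-case hosts should also match).
# "s?" makes the s optional; the /i flag makes [0-9a-z.-] match A-Z too.
NGROK_ANY_SCHEME = %r{https?://[0-9a-z.-]+\.ngrok\.io}i

sample = "old http://abc.ngrok.io, new https://ABC.ngrok.io"
urls = sample.scan(NGROK_ANY_SCHEME)
p urls  # prints ["http://abc.ngrok.io", "https://ABC.ngrok.io"]
```

It is still only a sketch: for a host such as `x.ngrok.io.example.com` it would return the prefix `https://x.ngrok.io`.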