Bash shell script to find Robots meta tag value

I found this bash script, which checks the status of each URL in a text file and prints the destination URL when there is a redirection:

while read -r url; do
    dt=$(date '+%H:%M:%S')
    # HEAD request only: print the status code and, for redirects, the target URL
    urlstatus=$(curl -kH 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code} %{redirect_url}' "$url")
    echo "$url $urlstatus $dt" >> urlstatus.txt
done < "$1"

I'm not that good at bash: for each URL, I'd like to add the value of its robots meta tag (if it exists).


  • I'd really suggest a DOM parser (e.g. Nokogiri, hxselect, etc.), but as a quick approach you can use sed. The following handles lines containing a <meta name="robots" ...> tag and extracts the value of its content attribute:

    curl -s "$url" | sed -n '/<meta/s/<meta[[:space:]][[:space:]]*name="*robots"*[[:space:]][[:space:]]*content="*\([^"]*\)"*>/\1/p'

    This prints the value of the content attribute, or nothing (an empty string when captured with $(...)) if the tag is not present.

    Do you need a pure Bash solution? Or do you have sed?
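
To tie the sed approach back to the loop from the question, here is a minimal sketch of how the two could be combined. The function names robots_meta and check_urls are hypothetical, and so are the "-" placeholder for pages without the tag and the choice to follow redirects (-L) on the second request. It assumes the tag sits on a single line with name before content, in lowercase; a DOM parser remains the more robust option.

```shell
#!/usr/bin/env bash
# Sketch only: robots_meta and check_urls are hypothetical helper names.

# Read HTML on stdin and print the content value of the first
# <meta name="robots" content="..."> tag found (nothing if absent).
robots_meta() {
    sed -n 's/.*<meta[[:space:]][[:space:]]*name="*robots"*[[:space:]][[:space:]]*content="*\([^">]*\)"*[^>]*>.*/\1/p' | head -n 1
}

# Read URLs from the file given as $1 and append one line per URL
# to urlstatus.txt: URL, status code, redirect target, robots value, time.
check_urls() {
    local url dt urlstatus robots
    while IFS= read -r url; do
        [ -n "$url" ] || continue   # skip blank lines
        dt=$(date '+%H:%M:%S')
        urlstatus=$(curl -kH 'Cache-Control: no-cache' -o /dev/null --silent --head --write-out '%{http_code} %{redirect_url}' "$url")
        # A HEAD response has no body, so a second request fetches the page.
        # Assumption: -L follows redirects so the final page is inspected.
        robots=$(curl -ksL "$url" | robots_meta)
        echo "$url $urlstatus ${robots:--} $dt" >> urlstatus.txt
    done < "$1"
}

# Run only when a URL file is passed on the command line.
if [ "$#" -gt 0 ]; then
    check_urls "$1"
fi
```

Saved as, say, check.sh, it would be called like the original script: bash check.sh urls.txt. Keeping the extraction in its own function makes it easy to swap the sed expression for a real HTML parser later.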