
How do I compare whether two web pages have the same layout and content?


The two URLs

http://www.bbprescott.com/

and

https://www.bbprescott.com/

have the same content, although one starts with "http://" and the other with "https://". Instead of checking them manually, how can I compare them automatically, with a script that returns true if they have the same content and false if they don't?


Solution

  • I based my answer on this link

    You can adapt it to your needs.

    Create a file called, for instance, myscript.sh with this content:

    #!/bin/sh
    wget --output-document=url_http.html http://www.bbprescott.com/
    wget --output-document=url_https.html https://www.bbprescott.com/
    
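    # diff exits with 0 if the files are identical, 1 if they differ, 2 on error
    diff --brief url_http.html url_https.html >/dev/null
    comp_value=$?
    
    if [ "$comp_value" -eq 0 ]
    then
        echo "The two web pages are identical"
    elif [ "$comp_value" -eq 1 ]
    then
        echo "The two web pages are different"
    else
        echo "Could not compare the two web pages"
    fi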
    
    rm -f url_http*.html
    

    Then, on the command line, give your user execute permission on the script:

    chmod u+x myscript.sh
    

    Then execute it:

    ./myscript.sh
    

    And if you want to see the differences between the content of the two URLs, you can run these commands manually:

    wget --output-document=url_http.html http://www.bbprescott.com/
    wget --output-document=url_https.html https://www.bbprescott.com/
    diff url_http.html url_https.html
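
Since the question asks for something that returns true or false, the same idea can be sketched as shell functions that report the result through their exit status. This is only a sketch under a few assumptions: the function names same_files and same_pages are made up for illustration, cmp -s is swapped in for diff --brief because it compares silently, and mktemp -d is assumed to be available (it is common but not required by POSIX). Keep in mind that this compares the raw HTML byte for byte, so pages with dynamic parts (timestamps, tokens) may be reported as different even when they look the same.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the answer above.

# same_files FILE1 FILE2
# Exit status 0 if the files are byte-for-byte identical, 1 if they differ,
# and greater than 1 if cmp could not read one of them.
same_files() {
    cmp -s "$1" "$2"
}

# same_pages URL1 URL2
# Download both URLs into a temporary directory and compare the results.
# Exit status 0: identical, 1: different, 2: download or comparison failed.
same_pages() {
    tmp_dir=$(mktemp -d) || return 2
    if wget --quiet --output-document="$tmp_dir/a.html" "$1" &&
       wget --quiet --output-document="$tmp_dir/b.html" "$2"
    then
        same_files "$tmp_dir/a.html" "$tmp_dir/b.html"
        status=$?
        # Collapse any cmp error code into the single "failed" status.
        [ "$status" -gt 1 ] && status=2
    else
        status=2
    fi
    rm -rf "$tmp_dir"
    return "$status"
}

# Example call (needs network access, so it is left commented out):
# if same_pages http://www.bbprescott.com/ https://www.bbprescott.com/
# then echo true
# else echo false
# fi
```

In the commented example, "false" covers both a real difference and a failed download; check for exit status 2 separately if you need to tell those cases apart.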