java, eclipse, web, web-scraping, webharvest

Web-Harvest: scraping a URL


I am using Web-Harvest and want to scrape data from this URL:

http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912

My code is:

<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
    <html-to-xml>
    <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&Bezirke=&Arten=&AngebotTyp=&timestamp=1363305908912"></http>
    </html-to-xml>
    </var-def>
</config>

However, I get this error:

Reference to the entity Bezirke has to end with an ';'

I do not understand what Web-Harvest means by the ';'. What is it asking for?


Solution

  • I don't know much about Web-Harvest, but their example has this:

    <xpath expression="//a[@shape='rect']/@href">
        <html-to-xml>
            <http url="http://www.somesite.com/"/>
        </html-to-xml>
    </xpath>
    
    <http url =".." />
    

    Whereas your code has:

    <http url = ".."></http> 
    

    Maybe this is your problem? There is no need for a separate closing tag.
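
    That said, `<http url=".."/>` and `<http url=".."></http>` are equivalent XML, so the closing tag is probably not the cause. The error message reads like the usual XML parser complaint about an unescaped ampersand: in XML, `&` starts an entity reference, so `&Bezirke=` is parsed as a reference to an entity named Bezirke that is missing its terminating `;`. A minimal sketch of the same config with every `&` in the URL written as `&amp;` (assuming nothing else in the config needs to change; not verified against the live site):

```xml
<?xml version="1.0" encoding="UTF-8"?>

<config>
    <var-def name="google">
        <html-to-xml>
            <!-- Every ampersand in the query string is written as the amp entity,
                 so the XML parser no longer sees unterminated entity references. -->
            <http url="http://derstandard.at/anzeiger/immoweb/Suchergebnis.aspx?Regionen=9&amp;Bezirke=&amp;Arten=&amp;AngebotTyp=&amp;timestamp=1363305908912"/>
        </html-to-xml>
    </var-def>
</config>
```

    The XML parser turns each `&amp;` back into a plain `&` when it reads the attribute, so the URL that is actually requested should be the same as the original one.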