
How to prepend the base URL to a list of links scraped with colly


I'm somewhat new to Go and am trying to scrape several web pages using colly. Two of the pages return relative links that lack the scheme and host. Below are the code and its output:

func PaloNet() {

    c := colly.NewCollector(
        colly.AllowedDomains("security.paloaltonetworks.com"),
    )

    c.OnHTML(".list", func(e *colly.HTMLElement) {
        PaloNetlinks := e.ChildAttrs("a", "href")
        fmt.Println("\n\n PaloAlto Security: \n\n", PaloNetlinks)
    })

    c.Visit("https://security.paloaltonetworks.com/")

}

Output:

[/CVE-2022-0031 /CVE-2022-42889 /PAN-SA-2022-0006 /CVE-2022-0030 /CVE-2022-0029 /PAN-SA-2022-0005 /CVE-2022-28199 /PAN-SA-2022-0004 /CVE-2022-0028 /PAN-SA-2022-0003 /CVE-2022-0024 /CVE-2022-0026 /CVE-2022-0025 /CVE-2022-0027 /PAN-SA-2022-0001 /PAN-SA-2022-0002 /CVE-2022-0023 /CVE-2022-0778 /CVE-2022-22963 /CVE-2022-0022 /CVE-2021-44142 /CVE-2022-0016 /CVE-2022-0017 /CVE-2022-0020 /CVE-2022-0011 /csv?]

As you can see, the links are missing the 'https://security.paloaltonetworks.com' prefix. What would be the best way to add it to the start of each link?


Solution

  • You can store the base URL in a variable and prepend it to each href as you collect the links:

    func PaloNet() {
        // Base URL without a trailing slash, since every href already starts with "/".
        visitUrl := "https://security.paloaltonetworks.com"
        urls := []string{}

        c := colly.NewCollector(
            colly.AllowedDomains("security.paloaltonetworks.com"),
        )

        c.OnHTML(".list", func(e *colly.HTMLElement) {
            PaloNetlinks := e.ChildAttrs("a", "href")

            // Prepend the base URL to each relative link.
            for _, link := range PaloNetlinks {
                urls = append(urls, visitUrl+link)
            }

            fmt.Println("\n\n PaloAlto Security: \n\n", urls)
        })

        c.Visit(visitUrl)
    }
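
Plain string concatenation works here because every href starts with "/". If a page mixes relative and absolute links, concatenation would produce broken URLs, so an alternative is to resolve each href against the base URL with the standard library's net/url package. The following is only a sketch under that assumption: absoluteLinks is a hypothetical helper name, and the hrefs in main are a few sample values copied from the output above rather than a live scrape. Colly's e.Request.AbsoluteURL(href) should give a similar result from inside the OnHTML callback.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteLinks resolves each href against base, so relative links such as
// "/CVE-2022-0031" become full URLs and already-absolute links are kept as-is.
// (absoluteLinks is a hypothetical helper for illustration, not part of colly.)
func absoluteLinks(base string, hrefs []string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}

	out := make([]string, 0, len(hrefs))
	for _, href := range hrefs {
		ref, err := url.Parse(href)
		if err != nil {
			return nil, err
		}
		// ResolveReference applies the usual relative-URL resolution rules.
		out = append(out, baseURL.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	// Sample hrefs taken from the question's output.
	hrefs := []string{"/CVE-2022-0031", "/PAN-SA-2022-0006", "/CVE-2022-0030"}

	links, err := absoluteLinks("https://security.paloaltonetworks.com/", hrefs)
	if err != nil {
		fmt.Println("resolve error:", err)
		return
	}
	for _, link := range links {
		fmt.Println(link)
	}
}
```

Inside the collector you would pass the result of e.ChildAttrs("a", "href") to the helper instead of the hard-coded slice. A trailing slash on the base URL does not matter with this approach, because the hrefs are resolved rather than glued on.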