
How to download an HTTP directory with all files and sub-directories, as they appear in the online files/folders list, using Go?


Currently I download single files with the function below. I would also like to download whole folders from a URL.

Any help would be appreciated.

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "os"
    )

    func main() {
        fileUrl := "http://example.com/file.txt"
        err := DownloadFile("./example.txt", fileUrl)
        if err != nil {
            panic(err)
        }
        fmt.Println("Downloaded: " + fileUrl)
    }

    // DownloadFile will download a url to a local file.
    func DownloadFile(filepath string, url string) error {
        // Get the data
        resp, err := http.Get(url)
        if err != nil {
            return err
        }
        defer resp.Body.Close()

        // Only binary responses are treated as downloadable
        contentType := resp.Header.Get("Content-Type")
        if contentType != "application/octet-stream" {
            return fmt.Errorf("requested URL is not downloadable (Content-Type %q)", contentType)
        }

        // Create the file
        out, err := os.Create(filepath)
        if err != nil {
            return err
        }
        defer out.Close()

        // Write the body to file
        _, err = io.Copy(out, resp.Body)
        return err
    }

I have looked at this related question: How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?

However, I want to do this in Go.


Solution

  • Here you can find a description of the algorithm behind wget --recursive: https://www.gnu.org/software/wget/manual/html_node/Recursive-Download.html

    Basically, you fetch the page, parse the HTML, and follow each href link (and CSS links, if necessary). The links can be extracted like this: https://vorozhko.net/get-all-links-from-html-page-with-go-lang.

    Once you have all the links, request each one and check the Content-Type header of the response: if it is text/html, parse it for further links; otherwise, save it to disk.