Search code examples
juliaxlsx

Is there a way to download an excel file (with multiple sheets) from google drive in Julia?


I´ve tried using google_download(“URL”, “local_path”) from “GoogleDrive” library, but it seems to only get the first sheet in csv format.

Have any clue?


Solution

  • It looks like the guts of GoogleDrive.jl is just doing some url manipulation.

    https://github.com/tejasvaidhyadev/GoogleDrive.jl/blob/master/src/GoogleDrive.jl#L26

    isg_sheet(url) = occursin("docs.google.com/spreadsheets", url)
    isg_drive(url) = occursin("drive.google.com", url)
    
    function sheet_handler(url; format=:csv)
        link, expo = splitdir(url)
        if startswith(expo, "edit") || expo == ""
            url = link * "/export?format=$format"
        elseif startswith(expo, "export")
            url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format\\2"))
        end
        url
    end
    
    function google_download(url, localdir)
        long_url = unshortlink(url)
        if isg_sheet(long_url)
    
            long_url = sheet_handler(long_url)
        end
    
        if isg_drive(long_url)
            drive_download(long_url, localdir)
        else
            DataDeps.fetch_http(long_url, localdir)
        end
    end
    

    Google Sheets API

    If you want to do much more than that you need to actually use the Google Sheet's API.

    https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/get

    GET https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}
    

    https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets#Spreadsheet

    With the GET on a spreadsheet id (found in the URL for the spreadsheet) the response contains the ids for all the sheets in the spreadsheet.

    {
      "spreadsheetId": string,
      "properties": {
        object (SpreadsheetProperties)
      },
      "sheets": [
        {
          object (Sheet)
        }
      ],
      "namedRanges": [
        {
          object (NamedRange)
        }
      ],
      "spreadsheetUrl": string,
      "developerMetadata": [
        {
          object (DeveloperMetadata)
        }
      ]
    }
    

    Then you can pull out those sheet id's and do something with them, such as do an export request on each with a format of csv.

    Manually scape the gid for each sheet

    If you don't want to use the API, and just copy those ids out of the browser URL as you click on each, you may be able to turn them into download links.

    To get them to go thru the export link properly you need to pass the gid into the export url like so:

    (This is a slightly modified version of GoogleDrive.sheet_handler that takes in the sheet gid)

    function sheet_handler(url; format=:csv, sheet_gid=0)
        link, expo = splitdir(url)
        if startswith(expo, "edit") || expo == ""
            url = link * "/export?format=$format&gid=$sheet_gid"
        elseif startswith(expo, "export")
            url = replace(url, r"format=([a-zA-Z]*)(.*)"=>SubstitutionString("format=$format&gid=$sheet_gid\\2"))
        end
        url
    end
    

    So for my example test sheet I had three sheets with the following gids

    • Sheet1, gid=0
    • Sheet2, gid=972467363
    • Sheet3, gid=1251741166

    So to grab the third one, I did this:

    DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")
    

    Here is the example run:

    julia> using GoogleDrive
    
    julia> using GoogleDrive.DataDeps
    
    julia> url = read("link.txt", String)
    "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/edit?usp=sharing"
    
    julia> DataDeps.fetch_http(sheet_handler(url; format=:csv, sheet_gid=1251741166), ".")
    ┌ Info: Downloading
    │   source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
    │   dest = "./export?format=csv&gid=1251741166"
    │   progress = NaN
    │   time_taken = "0.0 s"
    │   time_remaining = "NaN s"
    │   average_speed = "∞ B/s"
    │   downloaded = "404 bytes"
    │   remaining = "∞ B"
    └   total = "∞ B"
    ┌ Info: Downloading
    │   source = "https://docs.google.com/spreadsheets/d/13-LtgMi8evaxGxUTwlZZ_lqmr8Epcqt1ZSPUszqWhW4/export?format=csv&gid=1251741166"
    │   dest = "./download-test-julia-Sheet3.csv"
    │   progress = NaN
    │   time_taken = "0.0 s"
    │   time_remaining = "NaN s"
    │   average_speed = "7.324 KiB/s"
    │   downloaded = "15 bytes"
    │   remaining = "∞ B"
    └   total = "∞ B"
    "./download-test-julia-Sheet3.csv"
    
    shell> cat download-test-julia-Sheet3.csv
    Data on Sheet 3
    

    Download as a format that has all the sheets

    If you use the sheet_handler call and pass in a format that supports multiple sheets, then you can just parse and manipulate the output locally. Such as with xlsx. I haven't tried it just now, but the call would be something like:

    DataDeps.fetch_http(sheet_handler(url; format=:xlsx), ".")
    

    Then find your favorite Julia Excel library and you are off to the races.