python, google-sheets, gspread

Retrieve value of links from Google Sheet?


I have some values in a Google Sheet, some of which are hyperlinked, like the third one here:

[Screenshot: a column of cell values, the third of which is a hyperlink]

I want to retrieve the text value of each cell, and the hyperlink if present.

I am able to access the sheet easily enough with gspread:

import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    './credentials.json', scope)
gc = gspread.authorize(credentials)
key = 'xxxxx'
wks = gc.open_by_key(key).worksheets()
for wk in wks:
    # col_values() returns plain strings, not Cell objects
    links = wk.col_values(3)
    for link in links:
        print(link)

However, this only prints the string value of the links, not the actual href that the link is pointing to.

Does anyone know if it's possible to retrieve this programmatically, using gspread or another library?


Solution

  • In gspread, a Cell instance has an undocumented attribute input_value that gives you access to the formula.

    >>> formula = mycell.input_value
    >>> formula
    '=HYPERLINK("https://url.com","Link Text")'
    >>> lst = formula.split('"')
    >>> lst[1], lst[3]
    ('https://url.com', 'Link Text')
    

    As shown, splitting the formula on double quotes isolates the URL and the link text.

    In your case, you may want to subclass gspread.Worksheet like so:

    class CustomWorksheet(gspread.Worksheet):
        def get_links(self, col):
            """Return a list of the link URLs in column `col`.
               Cells that do not contain a HYPERLINK formula are skipped.
            """
            start_cell = self.get_addr_int(1, col)
            end_cell = self.get_addr_int(self.row_count, col)
    
            col_cells = self.range('%s:%s' % (start_cell, end_cell))
            return [
                cell.input_value.split('"')[1]
                for cell in col_cells
                if cell.input_value.startswith('=HYPERLINK')
            ]
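
Splitting on double quotes works when the formula has exactly the shape shown above. As a slightly more defensive sketch, a regular expression can extract the URL and the label from formula strings and return None for cells that are not links. This is not part of gspread: `parse_hyperlink` and `links_in_column` are hypothetical helper names, and the pattern assumes both arguments are double-quoted string literals with no embedded quotes (a comma or a semicolon is accepted as the argument separator).

```python
import re

# Assumed shape: =HYPERLINK("url","label") or =HYPERLINK("url").
# Formulas built from cell references or concatenation will not match.
_HYPERLINK_RE = re.compile(
    r'^=HYPERLINK\(\s*"([^"]*)"\s*(?:[,;]\s*"([^"]*)"\s*)?\)$',
    re.IGNORECASE,
)


def parse_hyperlink(formula):
    """Return (url, label) for a HYPERLINK formula, or None otherwise.

    `label` is None when the formula has no second argument.
    """
    match = _HYPERLINK_RE.match(formula or "")
    if match is None:
        return None
    return (match.group(1), match.group(2))


def links_in_column(input_values):
    """Map each formula string to (url, label), keeping None for non-links."""
    return [parse_hyperlink(value) for value in input_values]
```

With the cell formulas already collected into a list (for example from each cell's `input_value`), `links_in_column(formulas)` keeps one entry per cell, so the positions still line up with the rows of the sheet.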