I have some values in a Google Sheet, some of which are hyperlinked, like the third one here:
I want to retrieve the text value of each cell, and the hyperlink if present.
I am able to access the sheet easily enough with gspread:
import gspread
from oauth2client.service_account import ServiceAccountCredentials

scope = ['https://spreadsheets.google.com/feeds']
credentials = ServiceAccountCredentials.from_json_keyfile_name(
    './credentials.json', scope)
gc = gspread.authorize(credentials)

key = 'xxxxx'
wks = gc.open_by_key(key).worksheets()
for wk in wks:
    # col_values() returns plain strings, so only the displayed text is available
    links = wk.col_values(3)
    for link in links:
        print(link)
However, this only prints the string value of the links, not the actual href that the link is pointing to.
Does anyone know if it's possible to retrieve this programmatically, using gspread or another library?
In gspread, a Cell instance has an undocumented attribute input_value that gives you access to the formula.
>>> formula = mycell.input_value
>>> formula
'=HYPERLINK("https://url.com","Link Text")'
>>> lst = formula.split('"')
>>> lst[1], lst[3]
('https://url.com', 'Link Text')
From there you just need to split the string to remove the undesired parts.
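If splitting on quotes feels too fragile, a regular expression is one alternative. This is only a sketch: parse_hyperlink is a name I made up, and it assumes both arguments are plain double-quoted literals (not cell references or nested formulas), separated by a comma or, as in some locales, a semicolon.

```python
import re

# Matches =HYPERLINK("url","label") or =HYPERLINK("url"); the label is optional.
# Assumption: both arguments are double-quoted literals separated by "," or ";".
_HYPERLINK_RE = re.compile(
    r'^=HYPERLINK\(\s*"([^"]*)"\s*(?:[,;]\s*"([^"]*)"\s*)?\)$',
    re.IGNORECASE,
)


def parse_hyperlink(formula):
    """Return (url, label) for a HYPERLINK formula, or None for anything else."""
    if not formula:
        return None
    match = _HYPERLINK_RE.match(formula.strip())
    if match is None:
        return None
    url, label = match.groups()
    # With no label argument, fall back to the URL as the label.
    return url, (label if label is not None else url)


print(parse_hyperlink('=HYPERLINK("https://url.com","Link Text")'))
```

Unlike the bare split, this returns None for cells that are not hyperlinks instead of raising an IndexError.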
In your case, you may want to subclass gspread.Worksheet like so:
class CustomWorksheet(gspread.Worksheet):
    def get_links(self, col):
        """Returns a list of all link URLs in column `col`.

        Cells without a HYPERLINK formula are skipped.
        """
        start_cell = self.get_addr_int(1, col)
        end_cell = self.get_addr_int(self.row_count, col)
        col_cells = self.range('%s:%s' % (start_cell, end_cell))
        # The URL is the first double-quoted string in the formula
        return [cell.input_value.split('"')[1] for cell in col_cells
                if cell.input_value and cell.input_value.startswith('=HYPERLINK')]
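Since the question asks for the text of every cell plus the link where there is one, here is a rough sketch of how the two could be paired up. It relies only on the value and input_value attributes mentioned above; FakeCell and text_and_link are hypothetical names, and FakeCell is just a stand-in so the example runs without a real sheet. It also assumes the URL is written as a quoted literal in the formula.

```python
from collections import namedtuple

# Hypothetical stand-in for a gspread Cell, so the sketch runs offline.
FakeCell = namedtuple('FakeCell', ['value', 'input_value'])


def text_and_link(cells):
    """Return (text, url) per cell; url is None when there is no HYPERLINK formula."""
    pairs = []
    for cell in cells:
        formula = cell.input_value or ''
        if formula.startswith('=HYPERLINK'):
            # Quote-splitting trick: the URL is the first double-quoted string.
            url = formula.split('"')[1]
        else:
            url = None
        pairs.append((cell.value, url))
    return pairs


cells = [
    FakeCell('plain text', 'plain text'),
    FakeCell('Link Text', '=HYPERLINK("https://url.com","Link Text")'),
    FakeCell('', None),
]
print(text_and_link(cells))
```

With real data you would pass in the cells returned by self.range(...) instead of the FakeCell list.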