With some help I have been able to put together a script but with this block I keep getting an error:
import openpyxl
import xlrd
import requests
# loads the workbook - assumes the sussex.xlsx file is in the same directory as the script
wb = openpyxl.load_workbook("sussex.xlsx")
# load first worksheet
ws = wb.worksheets[0]
#go to the British Fencing Association website and download this file (specified)
url = "https://www.britishfencing.com/wp-content/uploads/2018/10/mf_oct_2018.xls"
downloaded_file = requests.get(url)
#write the contents to a new file called rankings.xls
with open("rankings.xls", 'wb') as file:
file.write(downloaded_file.content)
# Use xlrd to open older style .xls workbook
rank_wb = xlrd.open_workbook('rankings.xls')
# Get the first sheet of the ranked workbook
rank_ws = rank_wb.sheet_by_index(0)
# Get the total number of rows to be used to create our license list
rows = rank_ws.nrows
# Due to formatting, real numbers don't start until here
startrow = 5
# Create list of lic numbers
rank_lic = rank_ws.col_values(colx=4, start_rowx=startrow, end_rowx=rows)
# print the values in the second column of the first sheet
for row in ws['B1:B{}'.format(ws.max_row)]:
for cell in row:
print(cell.value)
# Putting values in same row as "Rank, Name, First name,...." adjust as necessary
ws.cell(2, 7).value = 'Fencer NIF'
ws.cell(2, 8).value = 'Points scored'
# Define function to lookup NIF and return value
def get_nif(x):
startrow = 5
for y in rank_lic:
if int(x) == y:
try:
return int(rank_ws.cell_value(startrow, 9))
except ValueError:
pass
startrow = startrow + 1
#sum of NIF values
Grand_Total_Row = ws.max_row + 1
ws.cell(Grand_Total_Row, 1).value = "Grand Total"
ws.cell(Grand_Total_Row, 4).value = "=SUM(G4:G{})".format(ws.max_row - 1)
for row in ws['D3:D{}'.format(ws.max_row)]:
for cell in row:
nif_val = get_nif(cell.value)
ws.cell(cell.row, 7).value = nif_val
# While testing I always save to a different workbook so I don't overwrite my test spreadsheet
wb.save('sussex2.xlsx')
The error is:
Traceback (most recent call last):
File "clubranking.py", line 63, in <module>
nif_val = get_nif(cell.value)
File "clubranking.py", line 48, in get_nif
if int(x) == y:
ValueError: invalid literal for int() with base 10: '=SUM(G4:G35)'
What I am trying to do is take cell G4 to the last cell with a value in that column, and on the row below it sum those values.
Does anyone have any ideas on how to resolve this?
FYI I am using request; xlrd; and openpyxl
Your issue is that the libraries you're using work with Excel files and this is not quite the same thing as working within the Excel program. The Excel program does quite a lot of things automatically that are not done by openpyxl or xlrd.
For example, one thing you have to know about Excel files is that formulas and their results are two completely separate things. They are stored separately and read independently. In openpyxl, if you write a formula to a cell, then that cell only has a formula stored in it. It does not (and cannot) calculate the result of the formula.
Similarly, when it comes time to read a cell, openpyxl will either give you the formula or the result, but not both (you have to choose which one you want; by default you get the formula if there is one). xlrd will only ever give you the result. And I cannot stress enough: In your case there is no result to read because it simply hasn't been calculated. So even if you tell openpyxl to give you the result instead of the formula, that won't help you.
In contrast, the Excel program (by default) always recalculates formulas, and stores both the formulas and their results at the same time, keeping them synchronized. This isn't the case with the libraries you're using.
The error message you showed is what you get when you try to convert '=SUM(G4:G35)'
into an integer. Note that Python's int
function has no idea what Excel is or what cells are or what formulas are. It is trying to convert an equals sign, a capital 'S', a capital 'U', a capital 'M', a left paren, a capital 'G', the digit 4, a colon, another capital 'G', the digits 3 and 5, and a right paren into an integer. Python is telling you that it can't make sense of that as an integer.
Your main options if you want to keep doing this with Python are (1) calculate the sum yourself, in Python, and use that; or (2) use a differently library, such as xlwings, which will work with the Excel program, rather than raw files.