How can we extract the data validation dropdown values(not just formulas/references or a single valued formula result) from a given excel sheet using python?
Currently openpyxl helps us to get the formulas but not the values directly. It even fails in cases where file contain extensions(extLst) to the OOXML specifications that are not supported.These datavalidation formulas can contain functions like OFFSET, SUBSTITUTE, VLOOKUP, INDIRECT etc. There are some libraries that support parsing/calculation of a subset of formulas(xlcalculator, pycel, formulas etc); but it fails in some cases(limitations).Is there a way we can get them using libraries like xlwings, win32com etc or even using macros with python? The datavalidation formulas can contain plain lists(values), definednames, database table reference etc. and the source file should not be ideally modified. Is there a solution which works in all these cases?
In order to obtain the results of a dynamic Excel formula you need to automate Excel itself, rather than just parse the worksheet file with an XML reader.
Hard to say this solution will fit all your cases, but it is a starting point. The Excel object model has a Validation Object, which contains all the information about the validation for a cell or range. One of the properties of the Validation might be a formula. The Worksheet/Application Evaluate method can return the result of the formula.
As an example, cell C3 has a validation formula based on an offset from cell F3:
Using win32com
, the formula for a validation range can be evaluated, as shown in the following minimal Python code:
import win32com.client as wc
#Create Excel object
xl = wc.gencache.EnsureDispatch('Excel.Application')
xl.Visible=True
#Open saved workbook, get the first worksheet, and a given cell
wb = xl.Workbooks.Open('MySheet.xlsx')
ws = wb.Worksheets[1]
rng = ws.Range('C3')
#Get the Validation object for the range
valid = rng.Validation
formula = valid.Formula1
if len(formula) >0: #If formula not empty
print('Formula for validation of cell',rng.Address,'is',formula)
#Evaluate will return different results depending on formula
#Offset returns a r x c array of cells (Range objects)
#Need to cut off the leading '=' character from the formula string
res = rng.Worksheet.Evaluate(formula[1:])
arr = [ [c.Value for c in r] for r in res]
print('Result of validation formula:',arr)
#Tidy up
wb.Close(False)
xl.Quit()
with the following output:
Formula for validation of cell $C$3 is =OFFSET($F$3,2,0,5)
Result of validation formula: [[3.0], [4.0], [5.0], [6.0], [7.0]]
You may need to experiment with the return values from Evaluate
depending on the exact formulas you are using. Using win32com
differs from file-reading methods (like openpyxl
) in that the actual Excel application is launched, and its calculation engine used to return the results. This does however mean that you need the Excel application installed and accessible (which can be an issue for web/server based use cases).