I have a program where the user can copy an excel table and paste it on the program itself for data analysis. The only value allowed for the table is float. The problem is that there are 4 different ways to write a float, those being:
x = 123456.789
y = 123456,789
a = 123.456,789
b = 123,456.789
and I need the code to interpret all 4 as:
123456.789
What would be the best way to do that, such that it would work on windows, linux and mac?
Here is the full function:
# Copies csv from clipboard to a pandas dataframe
clipboard_dataframe = pd.read_clipboard()
# Get row and column count
paste_rows = len(clipboard_dataframe.index)
paste_columns = len(clipboard_dataframe.columns)
# Create the table to show the values
paste_janela = QTableWidget(paste_rows, paste_columns)
# Define the labels for headers and indexes
paste_janela.setHorizontalHeaderLabels(clipboard_dataframe.columns)
paste_janela.setVerticalHeaderLabels(clipboard_dataframe.index)
# Populate the table with the proper values
for x in range(paste_janela.rowCount()):
for y in range(paste_janela.columnCount()):
# Error handling in case the cell value isn't a float
if not isinstance(clipboard_dataframe.iat[x,y], numbers.Number):
error_message = QMessageBox.critical(self.janela_principal,
"Erro de importação de dados",
"Houe um erro na importação de dados de tabela. \nOs únicos valores aceitos são números reais")
raise ValueError("Valor inválido foi encontrado na tabela. Só é aceito números reais nas tabelas")
table_value = str(clipboard_dataframe.iat[x,y])
table_item = QTableWidgetItem(table_value)
paste_janela.setItem(x, y, table_item)
# Pass the table to the MDI Area, turning it into a subwindow in the process
self.sandbox_mdiarea.addSubWindow(paste_janela)
# Needed to load the window, otherwise it will be hidden as default
paste_janela.show()
Here is a solution that works for your test cases, plus a couple more I came up with. The biggest problem is a number like 123,456
where its not clear if it should be 123456 or 123.456.
The basic solution is to check if it has a decimal or a period or if it has both, then adjust the string to fit the built-in float
function. Assuming a decimal divider is always provided, here is a function which meets your requirements I believe:
def read_float(floatstr: str) -> float:
period_count = floatstr.count('.')
comma_count = floatstr.count(',')
if period_count > 1 and comma_count > 1:
raise ValueError(f'Unable to read float with multiple of both period and comma: {floatstr}')
if period_count == 0 and comma_count == 0:
return float(floatstr)
if period_count == 0:
return float(floatstr.replace(',', '.'))
if comma_count == 0:
return float(floatstr)
period_first = floatstr.find('.')
comma_first = floatstr.find(',')
if period_first < comma_first:
return float(floatstr.replace('.', '').replace(',', '.'))
return float(floatstr.replace(',', ''))
def _main():
import numpy as np
test_points = [
('123', 123),
('1.25', 1.25),
('1234.56', 1234.56),
('1,234.56', 1234.56),
('1,234,567.89', 1234567.89),
('1234567.89', 1234567.89),
('1,25', 1.25),
('1234,56', 1234.56),
('1.234,56', 1234.56),
('1.234.567,89', 1234567.89),
('1234567,89', 1234567.89),
('123,456', 123.456) # this is problematic - no way to tell if 123456 or 123.456
]
for teststr, testval in test_points:
output = read_float(teststr)
print(f'{teststr:>12} - {"PASS" if np.equal(testval, output) else "FAIL"}')
if __name__ == '__main__':
_main()
Which gives the following output:
123 - PASS
1.25 - PASS
1234.56 - PASS
1,234.56 - PASS
1,234,567.89 - PASS
1234567.89 - PASS
1,25 - PASS
1234,56 - PASS
1.234,56 - PASS
1.234.567,89 - PASS
1234567,89 - PASS
123,456 - PASS
I'm sure a simpler solution exists using some localization library, but I was unable to find it.
Let me know if you have any questions. If you have a test point this function doesn't work for let me know and I'll see about updating it.