I have a Python program that generates reports from data in an SQL database. For CI pipeline tests I create an in-memory SQLite database as follows:
from sqlalchemy import create_engine
eng = create_engine("sqlite:///")
con = eng.connect()
I initialize this database from a bunch of CSV files (all UTF-8), with the filenames as the table names and the first rows as headers. The initialization is done like this:
import os
import petl as etl

for file in folder:  # folder is an iterable of CSV file paths
    filename = os.path.splitext(os.path.basename(os.path.abspath(file)))[0]
    data = etl.fromcsv(file, delimiter='|', encoding='utf8')
    etl.todb(data, con, filename, create=True)
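As a quick sanity check after loading (just an illustrative snippet, not part of the app), the tables that actually ended up in the in-memory database can be listed via SQLite's sqlite_master catalog:

# list the tables created in the in-memory SQLite database
print(etl.fromdb(con, "SELECT name FROM sqlite_master WHERE type = 'table'"))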
This database is then used to run tests for the app.
All the tests work great on my Windows 10 dev machine, but when I push to Git and the CI job is triggered (CI jobs run in CentOS 7 based Docker containers), the tests fail. After some debugging I tracked down the root cause with the following code snippet:
results = etl.fromdb(con, "SELECT * FROM PERSONS")
print(results)
Which causes the following error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 317: ordinal not in range(128)
Any idea what causes this? Both SQLAlchemy and SQLite should use UTF-8 by default. Why does my code work on Windows but not on Linux? I have been trying to fix this for a day now and I'm stumped. Any help is appreciated.
Python is version 3.6.8 and all packages are the same versions on my Windows and Linux machines.
PS The print call is only there to demonstrate the problem (the tests don't make use of print()). The actual problem is that when querying a table containing Unicode characters ('ä' in this case), the results are empty.
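For what it's worth, a quick way to compare the two environments is to print which codec Python picks up from the environment (a diagnostic sketch only):

import locale
import sys
print(sys.stdout.encoding)                  # codec used by print()
print(locale.getpreferredencoding(False))   # codec derived from the locale
# In a container where LANG is unset and the locale falls back to POSIX,
# both of these typically report an ASCII codec such as 'ANSI_X3.4-1968'.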
Okay, as per Klaus' comment, the problem was the locale. Our Docker container has no localectl process, and even though the locale is set in /etc/locale.conf, the system defaults to POSIX. Adding a LANG environment variable fixed the problem.
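For anyone hitting the same issue: the variable can be set in the Dockerfile (e.g. ENV LANG=en_US.UTF-8, assuming that locale exists in the image) or in the CI configuration. A minimal guard to fail fast if the locale ever regresses could look like this (illustrative only):

import locale
# Fail early if the container's locale still falls back to POSIX/ASCII;
# without a UTF-8 locale the Unicode data will not round-trip correctly.
assert "utf" in locale.getpreferredencoding(False).lower(), \
    "LANG is not set to a UTF-8 locale"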