I am trying to read in the following dataset: https://data.opensanctions.org/datasets/20230620/default/names.txt
I have run this code:
import pandas as pd

filename = "https://data.opensanctions.org/datasets/20230620/default/names.txt"
df = pd.read_csv(filename, encoding='latin1', nrows = 2, header=None)
print(df)
The dataframe looks like this:
0
0 SANAVBARI NIKITENKO
1 ÐÐÐÐÐТ Ð ÐÐÐÐÐÐÐÐÐ ÐÐ¥ÐÐÐÐ...
How can I automatically detect the character encoding when I read in the file?
The file is UTF-8 encoded, so reading it as latin1 garbles the Cyrillic names. For me it works after removing encoding='latin1', so that the default, encoding='utf-8', is used:
filename = "https://data.opensanctions.org/datasets/20230620/default/names.txt"
df = pd.read_csv(filename, nrows = 2, header=None)
print(df)
0
0 SANAVBARI NIKITENKO
1 АМИНАТ РАМЗАНОВНА АХМАДОВА
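If the encoding is not known in advance, one simple heuristic is to try strict UTF-8 decoding first and only fall back to latin1 when that fails. This works because UTF-8 rejects invalid byte sequences, while latin1 accepts any bytes and so never reports an error. The sketch below is only an illustration: detect_encoding is a hypothetical helper, not a pandas function, and the sample bytes stand in for the downloaded file.

```python
def detect_encoding(raw: bytes, candidates=("utf-8", "latin1")) -> str:
    """Return the first candidate encoding that decodes `raw` without error.

    Hypothetical helper: order matters, because latin1 accepts any byte
    sequence and must therefore come last.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("none of the candidate encodings could decode the data")


# Sample bytes standing in for the first lines of the downloaded file.
sample = "SANAVBARI NIKITENKO\nАМИНАТ РАМЗАНОВНА АХМАДОВА\n".encode("utf-8")

encoding = detect_encoding(sample)
print(encoding)
print(sample.decode(encoding))
```

The detected name can then be passed on, for example pd.read_csv(filename, encoding=encoding, header=None). The raw bytes could be fetched with urllib.request.urlopen(filename).read(); if only the first few kilobytes are read, a multi-byte character may be cut in half at the end, which would make the UTF-8 check fail even for a valid file. For guessing among many encodings, third-party packages such as chardet or charset-normalizer are an option.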