Load a txt with unstructured text in Python


I have found a .txt file with the names of more than 5000 cities around the world. The link is here. The text inside is messy. I would like to read the file in Python and store it in a list, so that I can search for the name of a city whenever I want.

I tried loading it as a dataframe with

import pandas as pd
cities = pd.read_csv('cities15000.txt',error_bad_lines=False)   

However, everything looks very messy. Is there an easier way to achieve this? Thanks in advance!


Solution

  • The linked file is like a CSV (comma-separated values) file, but it uses tabs instead of commas as the field separator. Set the sep parameter of pd.read_csv to '\t', the tab character, and pass header=None because the file has no header row.

    In [18]: import pandas as pd
        ...: 
        ...: pd.read_csv('cities15000.txt', sep = '\t', header = None)
    Out[18]: 
                0                    1                    2                                                  3         4         5   ...   13      14  15    16              17          18
    0      3040051         les Escaldes         les Escaldes  Ehskal'des-Ehndzhordani,Escaldes,Escaldes-Engo...  42.50729   1.53414  ...  NaN   15853 NaN  1033  Europe/Andorra  2008-10-15
    1      3041563     Andorra la Vella     Andorra la Vella  ALV,Ando-la-Vyey,Andora,Andora la Vela,Andora ...  42.50779   1.52109  ...  NaN   20430 NaN  1037  Europe/Andorra  2020-03-03
    2       290594   Umm Al Quwain City   Umm Al Quwain City  Oumm al Qaiwain,Oumm al Qaïwaïn,Um al Kawain,U...  25.56473  55.55517  ...  NaN   62747 NaN     2      Asia/Dubai  2019-10-24
    3       291074  Ras Al Khaimah City  Ras Al Khaimah City  Julfa,Khaimah,RAK City,RKT,Ra's al Khaymah,Ra'...  25.78953  55.94320  ...  NaN  351943 NaN     2      Asia/Dubai  2019-09-09
    4       291580           Zayed City           Zayed City  Bid' Zayed,Bid’ Zayed,Madinat Za'id,Madinat Za...  23.65416  53.70522  ...  NaN   63482 NaN   124      Asia/Dubai  2019-10-24
    ...        ...                  ...                  ...                                                ...       ...       ...  ...  ...     ...  ..   ...             ...         ...
    24563   894701             Bulawayo             Bulawayo  BUQ,Bulavajas,Bulavajo,Bulavejo,Bulawayo,bu la... -20.15000  28.58333  ...  NaN  699385 NaN  1348   Africa/Harare  2019-09-05
    24564   895061              Bindura              Bindura       Bindura,Bindura Town,Kimberley Reefs,Биндура -17.30192  31.33056  ...  NaN   37423 NaN  1118   Africa/Harare  2010-08-03
    24565   895269           Beitbridge           Beitbridge  Bajtbridz,Bajtbridzh,Beitbridge,Beitbridzas,Be... -22.21667  30.00000  ...  NaN   26459 NaN   461   Africa/Harare  2013-03-12
    24566  1085510              Epworth              Epworth                                            Epworth -17.89000  31.14750  ...  NaN  123250 NaN  1508   Africa/Harare  2012-01-19
    24567  1106542          Chitungwiza          Chitungwiza  Chitungviza,Chitungwiza,Chytungviza,Citungviza... -18.01274  31.07555  ...  NaN  340360 NaN  1435   Africa/Harare  2019-09-05
    
    [24568 rows x 19 columns]
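
Since the goal in the question is a plain list of names that can be searched, here is a minimal sketch of that last step using only the standard library. It assumes, based on the output above, that the city name sits in column 1 of each tab-separated row. The helper names load_city_names and find_cities are made up for this illustration, and the in-memory sample reuses three rows from the output above, cut down to three columns, so the snippet runs without the real file.

```python
import csv
import io


def load_city_names(lines, name_column=1):
    """Return one column of a tab-separated source as a list of strings.

    `lines` can be an open text file or any iterable of lines.
    Assumption: the city name is in column 1, as in the output above.
    """
    # QUOTE_NONE treats quote characters inside place names as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    # Skip rows that are too short to contain the requested column.
    return [row[name_column] for row in reader if len(row) > name_column]


def find_cities(names, query):
    """Case-insensitive substring search over a list of names."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]


# Small in-memory stand-in for the file (only the first three columns).
sample = io.StringIO(
    "3040051\tles Escaldes\tles Escaldes\n"
    "3041563\tAndorra la Vella\tAndorra la Vella\n"
    "894701\tBulawayo\tBulawayo\n"
)

names = load_city_names(sample)
matches = find_cities(names, "andorra")
print(matches)  # ['Andorra la Vella']
```

For the real file, the same function should work on an open file handle, for example open('cities15000.txt', encoding='utf-8', newline=''), assuming the file is UTF-8 encoded. If the data is already loaded with pandas as shown above, the column labelled 1 can be turned into the same kind of list with .tolist().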