Search code examples
pythonpandascsvparsingdelimiter

Python Pandas - Email column has values with the delimiter on it


So I have this big .csv in my work that looks something like this:

Name| Adress| Email| Paid Value    
John| x street | [email protected]| 0|
Chris| c street | [email protected]| 100|
Rebecca| y street| RebeccaFML|@dmail.com|177|
Bozo | z street| BozoSMH|@yahow.com|976|

As you can see, the .csv is seperated by pipes and the email of the last two people have pipes in it, causing formating problems. There are only 2 customers with this problem but these fellas will have more and more entries every month and we have to manually find them in the csv and change the email by hand . It is a very boring and time consuming process because the file is that big.

We use python to deal with data, I researched a bit and couldn't find anything to help me with it, any ideas?

Edit: So what I want is a way to change this email adresses automatically through code (like RebeccaFML|@dmail.com -> [email protected]). It doenst need to be pandas or anything, I am accepting ideas of any sort. The main thing is I only know how to replace once I read the file in python, but since these registers have the pipes in it, they dont read properly.

Ty in advance


Solution

  • You can pass regex as separator in read_csv. \|\s*(?!\@) will split on pipes (possibly followed by spaces), but exclude pipes followed by an at sign. You can subsequently remove the remaining pipes with replace:

    import pandas as pd
    import io
    
    data = '''Name| Adress| Email| Paid Value    
    John| x street | [email protected]| 0|
    Chris| c street | [email protected]| 100|
    Rebecca| y street| RebeccaFML|@dmail.com|177|
    Bozo | z street| BozoSMH|@yahow.com|976|'''
    
    df = pd.read_csv(io.StringIO(data), sep = r'(?<!\@)\s*\|\s*(?!\@)', engine='python',index_col=False,usecols=range(4)).replace('\|','', regex=True)
    

    Output:

    Name Adress Email Paid Value
    0 John x street [email protected] 0
    1 Chris c street [email protected] 100
    2 Rebecca y street [email protected] 177
    3 Bozo z street [email protected] 976