
Trim prefixes from each column's values in pandas


I am working with .xls files. After importing the data into a DataFrame with pandas, I need to trim the values. I have a lot of columns, and each value starts with xxx: or yyy:. For example, in one column:

  1. xxx:abc yyy:def \n
  2. xxx:def yyy:ghi \n
  3. xxx:ghi yyy:jkl \n
  4. ...

I need to remove xxx: and yyy: from every column. I researched and tried some suggested solutions, but they didn't work. How can I trim these prefixes efficiently? Thanks in advance.

The unwanted prefixes don't have a fixed length; I only know what they look like, much like stop words. For example:

  1. ['Comp:Apple', 'Product:iPhone', 'Year:2018', '128GB', ...]
  2. ['Comp:Samsung', 'Product:Note', 'Year:2017', '64GB', ...]

I want the new dataset to look like:

  1. ['Apple', 'iPhone', '2018', '128GB', ...]
  2. ['Samsung', 'Note', '2017', '64GB', ...]

So I want to strip these stop-word prefixes ('Comp:', 'Product:', 'Year:', ...) from every column.
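Because the prefixes form a known list, much like stop words, one option is to build a single regular expression from that list and let `DataFrame.replace` apply it to every cell at once. This is a minimal sketch, assuming the prefixes can be listed up front; `strip_prefixes` is a hypothetical helper name, not a pandas function. The `(?<!\S)` lookbehind matches only at the start of a value or right after whitespace, so a cell such as `xxx:abc yyy:def` also becomes `abc def`.

```python
import re
from typing import Iterable

import pandas as pd


def strip_prefixes(df: pd.DataFrame, prefixes: Iterable[str]) -> pd.DataFrame:
    """Remove any listed prefix that starts a whitespace-separated token.

    Hypothetical helper: only DataFrame.replace and re.escape are real APIs.
    """
    # Escape each prefix so characters like '.' or '(' are matched literally,
    # then join them into one alternation such as (?:Comp:|Product:|Year:).
    alternation = "|".join(re.escape(p) for p in prefixes)
    # (?<!\S) = "at the start of the string or right after whitespace".
    pattern = rf"(?<!\S)(?:{alternation})"
    # With regex=True, replace substitutes matches inside string cells
    # and leaves non-string cells (numbers, NaN) unchanged.
    return df.replace(pattern, "", regex=True)


df = pd.DataFrame([['Comp:Apple', 'Product:iPhone', 'Year:2018', '128GB'],
                   ['Comp:Samsung', 'Product:Note', 'Year:2017', '64GB']])

clean = strip_prefixes(df, ['Comp:', 'Product:', 'Year:'])
print(clean.values.tolist())
# [['Apple', 'iPhone', '2018', '128GB'], ['Samsung', 'Note', '2017', '64GB']]
```

Prefixes that are not in the list are left alone, so extending the list is the only change needed when new stop words appear.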


Solution

  • You can use pd.Series.str.split for this:

    import pandas as pd
    
    df = pd.DataFrame([['Comp:Apple', 'Product:iPhone', 'Year:2018', '128GB'],
                       ['Comp:Samsung', 'Product:Note', 'Year:2017', '64GB']],
                      columns=['Comp', 'Product', 'Year', 'Memory'])
    
    for col in ['Comp', 'Product', 'Year']:
        # Split at the first ':' only and keep the part after it
        df[col] = df[col].str.split(':', n=1).str.get(1)
    
    #       Comp Product  Year Memory
    # 0    Apple  iPhone  2018  128GB
    # 1  Samsung    Note  2017   64GB
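If you would rather not list the columns by hand, the same `str.split` idea can be applied to every text column. This is a sketch, assuming each prefix ends at the first `:` in a value. Unlike `.str.get(1)`, taking `.str[-1]` after `split(':', n=1)` keeps values that contain no colon (such as `'128GB'`) instead of turning them into NaN.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

df = pd.DataFrame([['Comp:Apple', 'Product:iPhone', 'Year:2018', '128GB'],
                   ['Comp:Samsung', 'Product:Note', 'Year:2017', '64GB']],
                  columns=['Comp', 'Product', 'Year', 'Memory'])

# Visit every column and only touch the ones holding text.
for col in df.columns:
    if is_string_dtype(df[col]):
        # n=1 splits only at the first ':'; .str[-1] takes the last piece,
        # i.e. the text after the prefix, or the whole value if there is no ':'.
        df[col] = df[col].str.split(':', n=1).str[-1]

print(df.values.tolist())
# [['Apple', 'iPhone', '2018', '128GB'], ['Samsung', 'Note', '2017', '64GB']]
```

Be aware that this removes whatever precedes the first colon, so a legitimate value such as a time `'12:30'` would become `'30'`; restrict the pattern to known prefixes if such values can occur.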