
How to strip whitespace from a pandas DataFrame in this example


I'm reading an Excel file into a DataFrame. I need to strip whitespace from every string cell, leaving the other cells unchanged, in Python 3.5. For example:

from pandas import Series, DataFrame
import pandas as pd
import numpy as np

#read data from DataFrame
data_ThisYear_Period=[[' 序 号','北  京','上  海','  广州'],\
                      ['  总计','11232',' 2334','3 4'],\
                      [' 温度','1223','23 23','2323'],\
                      ['人 口','1232','21 321','1222'],\
                      ['自行车', '1232', '21321', '12  22']]
data_LastYear_Period=DataFrame(data_ThisYear_Period)
print(type(data_LastYear_Period))

data_ThisYear_Period.apply(data_ThisYear_Period.str.strip(),axis=1)

Traceback (most recent call last):
  File "C:/test/temp.py", line 17, in <module>
    data_ThisYear_Period.apply(data_ThisYear_Period.str.strip(),axis=1)
AttributeError: 'list' object has no attribute 'apply'



Solution

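  • Use applymap on the DataFrame; applymap applies a function to each cell. In the lambda, split the string (split() with no arguments splits on any run of whitespace and discards it) and then join the pieces back together. Note that this removes all whitespace, including whitespace inside a cell, not just leading and trailing whitespace. If a cell may hold a non-string value such as an int, guard against it with an if/else in the lambda.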

    import pandas as pd
    
    # Sample data standing in for the Excel sheet. Row 3 holds an int (1232)
    # to show that non-string cells pass through unchanged.
    data_ThisYear_Period = [[' 序 号', '北  京', '上  海', '  广州'],
                            ['  总计', '11232', ' 2334', '3 4'],
                            [' 温度', '1223', '23 23', '2323'],
                            ['人 口', 1232, '21 321', '1222'],
                            ['自行车', '1232', '21321', '12  22']]
    
    data_LastYear_Period = pd.DataFrame(data_ThisYear_Period)
    print(data_LastYear_Period)
    
    # Remove all whitespace from string cells; leave other types untouched.
    # Note: pandas 2.1+ deprecates applymap in favor of DataFrame.map.
    data_LastYear_Period = data_LastYear_Period.applymap(
        lambda x: "".join(x.split()) if isinstance(x, str) else x)
    
    print(data_LastYear_Period)
    

    This prints the DataFrame before and after the cleanup:

          0      1       2       3
    0   序 号   北  京    上  海      广州
    1    总计  11232    2334     3 4
    2    温度   1223   23 23    2323
    3   人 口   1232  21 321    1222
    4   自行车   1232   21321  12  22
    
         0      1      2     3
    0   序号     北京     上海    广州
    1   总计  11232   2334    34
    2   温度   1223   2323  2323
    3   人口   1232  21321  1222
    4  自行车   1232  21321  1222
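
Note that the split-and-join approach removes every whitespace character, including the ones inside a cell ('3 4' becomes '34'). If only leading and trailing whitespace should go, str.strip() can be swapped in. A minimal sketch, assuming a small made-up frame; the helper names strip_cell and remove_all_whitespace are hypothetical, and Series.map is applied per column so the code behaves the same on older and newer pandas releases:

```python
import pandas as pd


def strip_cell(value):
    """Trim leading/trailing whitespace from strings; pass other types through."""
    return value.strip() if isinstance(value, str) else value


def remove_all_whitespace(value):
    """Drop every whitespace character from strings; pass other types through."""
    return "".join(value.split()) if isinstance(value, str) else value


# Hypothetical sample: two string cells, one padded string, one int.
df = pd.DataFrame([[' 序 号', '北  京'],
                   ['  总计', 11232]])

# DataFrame.apply hands each column to the lambda as a Series,
# and Series.map runs the helper on every cell of that column.
stripped = df.apply(lambda col: col.map(strip_cell))
collapsed = df.apply(lambda col: col.map(remove_all_whitespace))

print(stripped)
print(collapsed)
```

Which variant is right depends on the data: for numeric strings such as '23 23' the interior space is probably noise, while for free text it is usually meaningful.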
    

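    As a side note, you are getting this particular error because, in

    data_ThisYear_Period.apply(data_ThisYear_Period.str.strip(),axis=1)
    

    data_ThisYear_Period is a plain Python list, not a pandas DataFrame; the DataFrame is data_LastYear_Period. Calling the method on the DataFrame alone would not be enough, though: .str is a Series accessor that DataFrame does not have, and apply expects a callable rather than the result of calling strip().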
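
For comparison, here is one way the attempted call from the question could be made to work. This is only a sketch under the assumption that every column contains strings only, as in the question's original data; the names rows and result are made up for illustration. It trims leading and trailing whitespace and keeps interior whitespace:

```python
import pandas as pd

# Hypothetical all-string sample, shaped like the data in the question.
rows = [[' 序 号', '北  京', '  广州'],
        ['  总计', '11232', '3 4'],
        [' 温度', '1223', '2323']]

df = pd.DataFrame(rows)  # apply lives on the DataFrame, not on the list

# apply passes each column to the lambda as a Series, where the .str
# accessor is available. Caveat: on a column that mixes strings with
# other types, .str.strip() yields NaN for the non-string cells.
result = df.apply(lambda col: col.str.strip())

print(result)
```

If the sheet mixes strings and numbers in one column, the per-cell approach with a type check is the safer choice.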