Search code examples
pandasdataframedata-analysisdata-processing

How to split a column in a data frame containing only numbers into multiple columns in pandas


I have a .dat file containing the following data:

0001100000101010100
110101000001111
101100011001110111
0111111010100
1010111111100011

Need to count number of zeros and ones in each row

I have tried with Pandas.

  • Step-1: Read the data file
  • Step-2: Given a column name
  • Step-3: Tried to split the values into multiple columns. But could not succeed

df1=pd.read_csv('data.dat',header=None) df1.head()
                 0   1100000101010100

1   110101000001111
2   101100011001110111
3   111111010100
4   1010111111100011

df1.columns=['kirti']
df1.head()
        Kirti
_______________________
0   1100000101010100
1   110101000001111
2   101100011001110111
3   111111010100
4   1010111111100011

I need to split the data frame into multiple columns depending upon the 0s and 1s in each row.

the maximum number of columns will be equal to max no of zeros and ones in any of the rows in the data frame.


Solution

  • First create one column DataFrame by parameters names and dtype=str for convert column to strings:

    import pandas as pd
    
    temp="""0001100000101010100
    110101000001111
    101100011001110111
    0111111010100
    1010111111100011"""
    #after testing replace 'pd.compat.StringIO(temp)' to 'filename'
    df = pd.read_csv(StringIO(temp), header=None, names=['kirti'], dtype=str)
    
    print (df)
                     kirti
    0  0001100000101010100
    1      110101000001111
    2   101100011001110111
    3        0111111010100
    4     1010111111100011
    

    And then create new DataFrame by convert values to lists:

    df = pd.DataFrame([list(x) for x in df['kirti']])
    print (df)
    
       0  1  2  3  4  5  6  7  8  9 10 11 12    13    14    15    16    17    18
    0  0  0  0  1  1  0  0  0  0  0  1  0  1     0     1     0     1     0     0
    1  1  1  0  1  0  1  0  0  0  0  0  1  1     1     1  None  None  None  None
    2  1  0  1  1  0  0  0  1  1  0  0  1  1     1     0     1     1     1  None
    3  0  1  1  1  1  1  1  0  1  0  1  0  0  None  None  None  None  None  None
    4  1  0  1  0  1  1  1  1  1  1  1  0  0     0     1     1  None  None  None