Search code examples
pythonpandasdataframesplitdelimiter

Pandas Split Column by delimiter when the delimiter does not exist in the column at all: ValueError - Columns must be same length as key


I have dataframes as follows;

df1:

Col1 Col2
1 hey.1
2 and.5
3 hai.6
4 six.4

I want to split Col2 by delimiter ".". I am doing that by executing this code:

df1[["Col3", "Col2"]] = df1["Col2"].str.split(".", expand = True)

My outcome is this (which is ultimately what I want):

df1:

Col1 Col2 Col3
1 hey 1
2 and 5
3 hai 6
4 six 4

MY PROBLEM IS; I also have another df called df2, where col2 does not have the delimiter ".".

df2:

Col1 Col2
1 ho
2 has
3 text
4 map

The problem is that I still need to execute the same code on this df as well. But when the col2 does not have the delimiter, the desired outcome I want is this;

df2:

Col1 Col2 Col3
1 ho NA
2 has NA
3 text NA
4 map NA

But when running this code on df2;

df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True)

I receive the following error: "ValueError: Columns must be same length as key"

My question is: Is there a way to write the code so that it successfully executes on dfs like df1, but it also execute succesfully on dfs like df2 (by passing NAs when delimiter does not exist)?


Solution

  • Trick is create DataFrame with 2 columns after split - because swapped order is used default columns names [1, 0] in DataFrame.reindex:

    df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
    

    Or if original order use [0, 1]:

    df2[["Col2", "Col3"]] = df2["Col2"].str.split(".", expand = True).reindex([0, 1], axis=1)
    

    Sample:

    print (df2)
       Col1 Col2
    0     1   ho
    1     2  has
    
    print (df2["Col2"].str.split(".", expand = True))
         0
    0   ho
    1  has
    
    df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
    print (df2)
       Col1 Col2  Col3
    0     1   ho   NaN
    1     2  has   NaN
    

    If 2 columns DataFrame after split same solution working:

    print (df2)
       Col1   Col2
    0     1   ho.3
    1     2  has.4
    
    print (df2["Col2"].str.split(".", expand = True))
         0  1
    0   ho  3
    1  has  4
    
    df2[["Col3", "Col2"]] = df2["Col2"].str.split(".", expand = True).reindex([1, 0], axis=1)
    print (df2)
       Col1 Col2 Col3
    0     1   ho    3
    1     2  has    4