Tags: python, pandas, split, series

Pandas series.str.split throwing error if expand = False despite the single split


I have the following DataFrame:

import numpy as np
import pandas as pd

data = ['Swifter = ALL', 'COM-Swifter Testing & Monitoring = ALL',
   'Poll-Audit = ALL', 'Customer Relationship Management = ARC',
   'Credit Policy = ALL', 'CP-Credit Officers = ALL',
   'COM-Regulatory Swifter = ALL', 'Wrapper-Technology Wrapper = ABLM',
   'COM-Wrapper Assessments = ALL', np.nan, np.nan, np.nan]

df = pd.DataFrame(data=data, columns=['BASE'])
df


        BASE
0   Swifter = ALL
1   COM-Swifter Testing & Monitoring = ALL
2   Poll-Audit = ALL
3   Customer Relationship Management = ARC
4   Credit Policy = ALL
5   CP-Credit Officers = ALL
6   COM-Regulatory Swifter = ALL
7   Wrapper-Technology Wrapper = ABLM
8   COM-Wrapper Assessments = ALL
9   NaN
10  NaN
11  NaN

I just want to split the 'BASE' column into two columns using ' = ' as the delimiter, but I'm forced to pass expand=True even though each value is split only once. With expand=False (the default), the first three versions of the code below all raise: ValueError: Columns must be same length as key

Problematic code:

# 1
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ')

# 2
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ', n=1)

# 3
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ', n=2)


ValueError: Columns must be same length as key

Working code:

# 4
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ', expand=True)

# 5
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ', n=1, expand=True)

# 6
df[['BASE_TECH', 'BASE_TYPE']] = df['BASE'].str.split(' = ', n=2, expand=True)

Please clarify why expand=True is required even though I am not splitting any further. Also, why does the value of n have no effect?


Solution

  • Pandas documentation:

    expand : bool, default False

    Expand the split strings into separate columns.

    If True, return DataFrame/MultiIndex expanding dimensionality.

    If False, return Series/Index, containing lists of strings.
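
To make the quoted behaviour concrete, here is a minimal sketch on a shortened sample (three made-up rows in the same shape as the question's BASE column, not the full data). It shows that the default expand=False yields one Series of lists, which is a single column and so cannot be spread across two target columns, while expand=True yields a two-column DataFrame that matches the two names on the left-hand side. It also checks the n question: since each string here contains the delimiter at most once, limiting the number of splits should not change anything.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative sample in the same shape as the question's data
df = pd.DataFrame({"BASE": ["Swifter = ALL", "Credit Policy = ARC", np.nan]})

# expand=False (the default): a single Series; each non-missing element
# holds the pieces of one split string
as_lists = df["BASE"].str.split(" = ")

# expand=True: a DataFrame with one column per piece
as_frame = df["BASE"].str.split(" = ", expand=True)

print(type(as_lists).__name__, as_lists.shape)
print(type(as_frame).__name__, as_frame.shape)

# Two columns on the right match the two column names on the left
df[["BASE_TECH", "BASE_TYPE"]] = as_frame

# n caps the number of splits; with at most one ' = ' per string,
# n=1 and n=2 give the same frame as the unlimited default
same_with_n1 = as_frame.equals(df["BASE"].str.split(" = ", n=1, expand=True))
same_with_n2 = as_frame.equals(df["BASE"].str.split(" = ", n=2, expand=True))
print(same_with_n1, same_with_n2)
```

So expand controls the shape of the result rather than how many times each string is split, and n would only start to matter if some value contained ' = ' more than once.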