Search code examples
pythonpandasif-statementcharactergeopandas

Check if column contains (/,-,_, *or~) and split in another column - Pandas


I have a column with numbers separated by one of these characters : -,/,*,~,_. I need to check if the values contain any of the characters, then split the value in another column.

Is there a different solution than shown below?

In the end, columns subnumber1, subnumber2 ...subnumber5 will be merged in one column and column number5 will be without characters. I need to use those two columns in further process.

if gdf['column_name'].str.contains('~').any():
    gdf[['number1', 'subnumber1']] = gdf['column_name'].str.split('~', expand=True)
gdf
if gdf['column_name'].str.contains('^').any():
    gdf[['number2', 'subnumber2']] = gdf['column_name'].str.split('^', expand=True)
gdf
Input column:
column_name  
152/6*3
163/1-6
145/1
163/6^3

output:
 number5 |subnumber1 |subnumber2
152      | 6         |  3
163      | 1         |  6
145      | 1         |
163      | 6         |  3

Solution

  • Use Series.str.split with list of possible separators and create new DataFrame:

    import re
    
    L = ['-','/','*','~','_','^', '.']
    
    #some values like `^.` are escape
    pat = '|'.join(re.escape(x) for x in L)
    df = df['column_name'].str.split(pat, expand=True).add_prefix('num')
    print (df)
      num0 num1  num2
    0  152    6     3
    1  163    1     6
    2  145    1  None
    3  163    6     3
    

    EDIT: If need match values before value use:

    L = ["\-_",'\^|\*','~','/']
    
    for val in L:
        df[f'before {val}'] = df['column_name'].str.extract(rf'(\d+){[val]}')
    
    #for last value not exist separator, so match $ for end of string
    df['last'] = df['column_name'].str.extract(rf'(\d+)$')
    print (df)
       column_name before \-_ before \^|\* before ~ before / last
    0  152/2~3_4*5          3            4        2      152    5
    1  152/2~3-4^5          4            4        2      152    5
    2      152/6*3        NaN            6      NaN      152    3
    3      163/1-6        NaN          NaN      NaN      163    6
    4        145/1        NaN          NaN      NaN      145    1
    5      163/6^3          6            6      NaN      163    3