Search code examples
pythonpandasdataframeif-statementextract

Python write an if statement that extracts text from data frame column and shifts values accordingly


I have data in a txt. file that imports like this once I split it into columns:

import pandas as pd
df = pd.read_csv(r'df.txt', header=None)
df.columns = ['Test']
df.drop(wells.tail(1).index, inplace = True)
df = df.Test.str.split(expand=True)

 0          1         2              3          4           5          6
Name        3      20201117         0.00       0.00       0.00 
Name      text        1           20090307     757.69     0.00    2060.00 
Name      text        2           20090308    1141.72     0.00    2099.00 
Name        2      20200908         0.00       0.00       0.00 

I need to write an if statement for my data frame where, if there is text column#1, the text is merged into column#0, and those rows all shift as well:

 0          1         2            3          4           5        
Name        3      20201117       0.00       0.00       0.00 
Name text   1      20090307       757.69     0.00    2060.00 
Name text   2      20090308      1141.72     0.00    2099.00 
Name        2      20200908       0.00       0.00       0.00 

I need this in an "if" statement format as I am running multiple files of the same format through the code, not all of which have text in column#1. Was trying to use str.extract() but couldn't figure it out.


Solution

  • In [2]: df = pd.DataFrame({'0': {0: 'Name', 1: 'Name', 2: 'Name', 3: 'Name'},
       ...:  '1': {0: '3', 1: 'text', 2: 'text', 3: '2'},
       ...:  '2': {0: 20201117, 1: 1, 2: 2, 3: 20200908},
       ...:  '3': {0: 0.0, 1: 20090307.0, 2: 20090308.0, 3: 0.0},
       ...:  '4': {0: 0.0, 1: 757.69, 2: 1141.72, 3: 0.0},
       ...:  '5': {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0},
       ...:  '6': {0: np.nan, 1: 2060.0, 2: 2099.0, 3: np.nan}})
    
    In [3]: df
    Out[3]: 
          0     1         2           3        4    5       6
    0  Name     3  20201117         0.0     0.00  0.0     NaN
    1  Name  text         1  20090307.0   757.69  0.0  2060.0
    2  Name  text         2  20090308.0  1141.72  0.0  2099.0
    3  Name     2  20200908         0.0     0.00  0.0     NaN
    
    In [4]: x = df.iloc[:, 1]
       ...: rows = x[~x.str.isnumeric()].index.to_list()
       ...: df1 = df.drop(rows).iloc[:, :-1]
       ...: df = df.iloc[rows]
       ...: s = df.iloc[:, :2].agg(" ".join, axis=1)
       ...: s.index = df.index
       ...: df = df.iloc[:, 1:]
       ...: df.iloc[:, 0] = s
       ...: df.columns = df1.columns
       ...: df = pd.concat([df, df1]).sort_index()
       ...: df
    Out[4]: 
               0  1           2        3    4       5
    0       Name  3  20201117.0     0.00  0.0     0.0
    1  Name text  1  20090307.0   757.69  0.0  2060.0
    2  Name text  2  20090308.0  1141.72  0.0  2099.0
    3       Name  2  20200908.0     0.00  0.0     0.0