Search code examples
pythonpandasstringdataframesubstring

Printing substring from one dataframe to another dataframe


In the df_input['Visit' ] column, there are three different timepoints that I would like to extract and have print into a new dataframe (df_output). The time points are Pre, Post, and Screening.

data_input

data_output

I essentially would like to make a for loop (or just a single code strand) stating:

if data_input['Visit '] contains the word "Pre", print "Pre" in df_output['VISIT'] elif data_input['Visit '] contains the word "Post", print "Post" in df_output['VISIT'] else data_input['Visit '] contains the word "Screening", print "Screening" in df_output['VISIT']

I am just not sure of the proper way to do that.

So far, the only thing I have is this line of code:

df_output['VISIT'] = df_input[df_input['Visit '].str.contains('Pr|Po|Sc'))

that gives the error message "Columns must be the same length as key"

I've also tried: df_output['VISIT'] = df_input['Visit '].str.contains('Pr|Po|Sc'), which prints True or False into my output dataframe.


Solution

  • you can use np.select

    import numpy
    
    df_output['VISIT'] = np.select( [df_input['Visit'].str.contain('Pr'), # condition #1
                                    df_input['Visit'].str.contain('Po'),  # condition #2
                                    df_input['Visit'].str.contain('Sc')], # condition #3
                                    ['Pre','Post','Screening'], # corresponding value when true
                                    '')# default value