python pandas substring nan

Using pandas, make a new column from a string slice of another column -- getting NaN


I want to create a new column by extracting a substring from an existing DataFrame (DF) column. All my testing indicates that the values I am using are correct and should produce a level1 value rather than NaN. Help!

CODE SNIPPET:

import pandas as pd
string = df['currentagentsnapshot']
start  = string.str.find('agent-group') + 55
stop   = string.str.find('}, level2=')
df['start']  = string.str.find('agent-group') + 55
df['stop']   = string.str.find('}, level2=')
df['level1'] = string.str[df['start']:df['stop']]
print(df.head())

SAMPLE OUTPUT OF KEY FIELDS:

awsaccountid start stop level1
992974280925 410 414 NaN
992974280925 410 414 NaN
992974280925 410 414 NaN
992974280925 408 412 NaN
992974280925 408 412 NaN

Note: df['currentagentsnapshot'] is a LARGE text string. As long as start and stop are both numbers -- and stop > start -- I would expect string.str[df['start']:df['stop']] to produce the expected result.

Running the above script produces NaN instead of the expected string value.
All the examples I have found on the web use constant rather than calculated values.
When I substitute constants for the calculated values in string.str[start : stop], it works.


Solution

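Compute the slice row by row with apply:

    import pandas as pd

    data = {
        'currentagentsnapshot': [
            "some large text with agent-group info and }, level2=more text",
            "another example with agent-group data here and }, level2=continued",
            "yet another string agent-group details and }, level2=info",
            "text with agent-group data and }, level2=more",
            "last example of agent-group information and }, level2=content",
        ]
    }
    df = pd.DataFrame(data)

    def extract_level1(row):
        text = row['currentagentsnapshot']
        found = text.find('agent-group')
        stop = text.find('}, level2=')
        # str.find returns -1 when a marker is missing; check before adding the offset
        if found == -1 or stop == -1:
            return None
        start = found + 55  # offset taken from the question's real data
        if stop > start:
            return text[start:stop]
        return None

    df['level1'] = df.apply(extract_level1, axis=1)

    # With these short sample strings, start overshoots stop, so every row is None;
    # the real snapshot strings are long enough for the +55 offset to land before stop.
    print(df)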
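As for why the original attempt returns NaN: string.str[a:b] builds a single slice and applies it to every element, so the bounds have to be scalars. A plain Python string cannot be sliced with a Series as a bound, and pandas appears to turn that per-element failure into NaN (depending on the pandas version and string dtype it may raise instead). The sketch below shows the same row-wise idea as a list comprehension over the column, which avoids the overhead of apply(axis=1). The sample strings, the OFFSET value, and the helper name slice_between are made up for illustration; the question's real data uses an offset of 55.

```python
import pandas as pd

# Hypothetical offset: skips past "agent-group=" in the made-up data below.
# The question's real data uses 55 instead.
OFFSET = len('agent-group=')

def slice_between(text, offset=OFFSET):
    """Return the text between 'agent-group' + offset and '}, level2=', or None."""
    found = text.find('agent-group')
    stop = text.find('}, level2=')
    # str.find returns -1 when a marker is missing
    if found == -1 or stop == -1 or stop <= found + offset:
        return None
    return text[found + offset:stop]

# Made-up sample rows; the real 'currentagentsnapshot' strings are much longer.
df = pd.DataFrame({
    'currentagentsnapshot': [
        "id=1, agent-group=Sales}, level2=East",
        "id=22, agent-group=Support}, level2=West",
        "no markers here",
    ]
})

# Each row gets its own start and stop, which .str[start:stop] cannot do.
df['level1'] = [slice_between(s) for s in df['currentagentsnapshot']]
print(df)
```

With this data the first two rows yield Sales and Support, and the third row, which has neither marker, is left missing.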