I am trying to do 2 things - 1st use a startswith in a df to identify which rows contain the condition, and 2nd add a specific TAG to a column. I have been able to do both things correctly but separately - but unsuccessful in combining.
This is an example of the content in the interest column df 'PartNumber' (GLO, GLOA, GLO1, GLOA2, GLO9, etc) the common denominator is the "GLO" and I want the tag to be added "GLO Series".
This code successfully can ID the rows that have the starting condition
search = 'GLO'
search_series = df['PartNumber'].str.startswith(search, na= False)
df[search_series]
This code successfully can ID the row that has the specific condition - I know it is because of the '=='. The challenge is that I have to call out the specific part, which is what I do not know all values, hence my attempt to do the 'startswith'
df.loc[(df['PartNumber']== 'GLOA2'),'TAG_2']='Glo Series'
print (df)
I tried to combine both which did not work.
I am trying to achieve once completed a kind of "If this / then that" for several conditions summarized as follows:
df [PartNumber] used to ID set conditions-based on starting characters ('GLO', 'FL0', 'MOW') depending on this then a value would be placed in a set column resulting in ('Glo series', 'Flo series', 'Mow series')
Assume that TAG_2 column contains initially some other values (I marked them with ?):
ind PartNumber Val TAG_2
0 110101 ABC 20 ?
1 110102 GLO 22 ?
2 110103 GLO1 24 ?
3 110104 GLO2 26 ?
4 110105 XYZ 28 ?
To do your task in one go (assuming that you set search to the wanted value) you can run:
df.TAG_2.mask(df['PartNumber'].str.startswith(search, na=False), 'Glo Series', inplace=True)
The result is:
ind PartNumber Val TAG_2
0 110101 ABC 20 ?
1 110102 GLO 22 Glo Series
2 110103 GLO1 24 Glo Series
3 110104 GLO2 26 Glo Series
4 110105 XYZ 28 ?
To detect multiple group prefixes and set TAG_2 to corresponding group names, take a different approach.
Define a Series with index composed of "wanted" prefixes and corresponding group names as values:
search = pd.Series(['Glo Series', 'Flo Series', 'Mow Series'],
index=['GLO', 'FLO', 'MOW'])
Define a function to get the part series name from a part number:
def getSer(PartNo):
for idx, val in search.iteritems():
if PartNo.startswith(idx):
return val
return 'Alphabet start' if PartNo < search.index.min() else 'Alphabet end'
Note that if no match has been found in the loop, the last instruction returns either 'Alphabet start' or 'Alphabet end' (as you asked for in your comment).
Generate the result updating TAG_2 (in place) with the result of application of the above function:
df.TAG_2.update(df.PartNumber.apply(getSer))
This time, to test this code, I used a DataFrame containing:
ind PartNumber Val TAG_2
0 110101 ABC30 20 ?
1 110102 FLO34 22 ?
2 110103 GLO10 24 ?
3 110104 GLO22 26 ?
4 110105 XYZ26 28 ?
5 110105 MOW55 30 ?
6 110105 XYZ99 28 ?
After the above updata the result is:
ind PartNumber Val TAG_2
0 110101 ABC30 20 Alphabet start
1 110102 FLO34 22 Flo Series
2 110103 GLO10 24 Glo Series
3 110104 GLO22 26 Glo Series
4 110105 XYZ26 28 Alphabet end
5 110105 MOW55 30 Mow Series
6 110105 XYZ99 28 Alphabet end
If you have more "wanted" groups, change search variable accordingly.