Search code examples
pandasif-statementstartswith

How to combine startswith and IF for pandas


I am trying to do 2 things - 1st use a startswith in a df to identify which rows contain the condition, and 2nd add a specific TAG to a column. I have been able to do both things correctly but separately - but unsuccessful in combining.

This is an example of the content in the interest column df 'PartNumber' (GLO, GLOA, GLO1, GLOA2, GLO9, etc) the common denominator is the "GLO" and I want the tag to be added "GLO Series".

This code successfully can ID the rows that have the starting condition

search = 'GLO'
search_series = df['PartNumber'].str.startswith(search, na= False)
df[search_series] 

This code successfully can ID the row that has the specific condition - I know it is because of the '=='. The challenge is that I have to call out the specific part, which is what I do not know all values, hence my attempt to do the 'startswith'

df.loc[(df['PartNumber']== 'GLOA2'),'TAG_2']='Glo Series'
print (df)

I tried to combine both which did not work.

I am trying to achieve once completed a kind of "If this / then that" for several conditions summarized as follows:

df [PartNumber] used to ID set conditions-based on starting characters ('GLO', 'FL0', 'MOW') depending on this then a value would be placed in a set column resulting in ('Glo series', 'Flo series', 'Mow series')


Solution

  • Assume that TAG_2 column contains initially some other values (I marked them with ?):

          ind PartNumber  Val TAG_2
    0  110101        ABC   20     ?
    1  110102        GLO   22     ?
    2  110103       GLO1   24     ?
    3  110104       GLO2   26     ?
    4  110105        XYZ   28     ?
    

    To do your task in one go (assuming that you set search to the wanted value) you can run:

    df.TAG_2.mask(df['PartNumber'].str.startswith(search, na=False), 'Glo Series', inplace=True)
    

    The result is:

          ind PartNumber  Val       TAG_2
    0  110101        ABC   20           ?
    1  110102        GLO   22  Glo Series
    2  110103       GLO1   24  Glo Series
    3  110104       GLO2   26  Glo Series
    4  110105        XYZ   28           ?
    

    Part 2

    To detect multiple group prefixes and set TAG_2 to corresponding group names, take a different approach.

    1. Define a Series with index composed of "wanted" prefixes and corresponding group names as values:

      search = pd.Series(['Glo Series', 'Flo Series', 'Mow  Series'],
          index=['GLO', 'FLO', 'MOW'])
      
    2. Define a function to get the part series name from a part number:

      def getSer(PartNo):
          for idx, val in search.iteritems():
              if PartNo.startswith(idx):
                  return val
          return 'Alphabet start' if PartNo < search.index.min() else 'Alphabet end'
      

      Note that if no match has been found in the loop, the last instruction returns either 'Alphabet start' or 'Alphabet end' (as you asked for in your comment).

    3. Generate the result updating TAG_2 (in place) with the result of application of the above function:

      df.TAG_2.update(df.PartNumber.apply(getSer))
      

    This time, to test this code, I used a DataFrame containing:

          ind PartNumber  Val TAG_2
    0  110101      ABC30   20     ?
    1  110102      FLO34   22     ?
    2  110103      GLO10   24     ?
    3  110104      GLO22   26     ?
    4  110105      XYZ26   28     ?
    5  110105      MOW55   30     ?
    6  110105      XYZ99   28     ?
    

    After the above updata the result is:

          ind PartNumber  Val           TAG_2
    0  110101      ABC30   20  Alphabet start
    1  110102      FLO34   22      Flo Series
    2  110103      GLO10   24      Glo Series
    3  110104      GLO22   26      Glo Series
    4  110105      XYZ26   28    Alphabet end
    5  110105      MOW55   30     Mow  Series
    6  110105      XYZ99   28    Alphabet end
    

    If you have more "wanted" groups, change search variable accordingly.