Categories in Python with Pandas (special case)


I have some data and want to build some categories.

Now, the data looks like this:

Var     Category
a       cat1
a       cat1
b       cat2
a       cat1
b       cat2
a       cat1

But it should look like this:

Var     Category
a       cat1
a       cat1
b       cat2
a       cat2
b       cat3
a       cat3

So, whenever 'Var' != 'a', 'Category' should move on to the next category, and that category should stay in effect for the following rows until 'Var' != 'a' occurs again. How can I do this?


Solution

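  • Compare Var with 'a' for inequality using Series.ne, then take the cumulative sum of the resulting boolean mask with Series.cumsum. Each True counts as 1, so the running total goes up by one at every row where Var is not 'a'. Add 1 if the numbering should start at 1, convert to strings, and prepend 'cat':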

    df['Category'] = 'cat' + df.Var.ne('a').cumsum().add(1).astype(str)
    

    Alternatively, use the != operator instead of Series.ne:

    df['Category'] = 'cat' + (df.Var != 'a').cumsum().add(1).astype(str)
    print(df)
      Var Category
    0   a     cat1
    1   a     cat1
    2   b     cat2
    3   a     cat2
    4   b     cat3
    5   a     cat3
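
For reference, here is a self-contained sketch of the same approach broken into its individual steps. It assumes the DataFrame is rebuilt by hand from the sample data in the question; the intermediate names `is_not_a` and `counter` are illustrative only and do not appear in the one-liners above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumption: Var is the only input column).
df = pd.DataFrame({"Var": ["a", "a", "b", "a", "b", "a"]})

# Step 1: boolean mask, True wherever Var is not 'a'.
is_not_a = df["Var"].ne("a")

# Step 2: running count of True values seen so far.
# For the sample data this yields 0, 0, 1, 1, 2, 2.
counter = is_not_a.cumsum()

# Step 3: shift the counter so numbering starts at 1, then build the label.
df["Category"] = "cat" + counter.add(1).astype(str)

print(df)
```

Keeping the mask and the counter in separate variables makes it easier to inspect each stage if the labels do not come out as expected.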