Search code examples
pythonpandastransformationdenormalization

Denormalizing a column to a boolean matrix in Pandas?


I'm trying to take a column of values such as:

name  tag
   a    1
   a    2
   b    2
   c    1
   b    3

and ascribe a boolean matrix with new columns, "tag_(val)", such as:

name  tag_1  tag_2  tag_3
   a    T       T      F
   b    F       T      T
   c    T       F      F

How can this be done in Pandas?


Solution

  • You could add a column full of True and then pivot:

    >>> df["val"] = True
    >>> piv = df.pivot("name", "tag", "val").fillna(False)
    >>> piv
    tag       1      2      3
    name                     
    a      True   True  False
    b     False   True   True
    c      True  False  False
    

    Or if you're very particular:

    >>> piv.columns = ["tag_{}".format(x) for x in piv.columns]
    >>> piv = piv.reset_index()
    >>> piv
      name  tag_1  tag_2  tag_3
    0    a   True   True  False
    1    b  False   True   True
    2    c   True  False  False