Search code examples
pythonpandasdataframenumerical

How to convert a column of string to numerical?


I have this pandas dataframe from a query:

|    name    |    event    |
----------------------------
| name_1     | event_1     |
| name_1     | event_2     |
| name_2     | event_1     |

I need to convert the column event to numerical, or something to look like this:

| name    | event_1 | event_2 |
-------------------------------
| name_1  | 1       | 0       |
| name_1  | 0       | 1       |
| name_2  | 1       | 0       |

In the software rapidminer, i can do this with an operator "nominal to numerical", so i assume that in python convert the type of the column should be effective, but i can be mistaken.

In the final, the idea is make a sum on the columns value with same name and have as result a table that should look like this:

| name    | event_1 | event_2 |
-------------------------------
| name_1  | 1       | 1       |
| name_2  | 1       | 0       |

There is a function that returns what a expected?

important: i can't do a simple count of the events because i do not know them, and the events is different for the users

EDIT: well thanks guys, i can see there is multiple ways to do this, can you guys say which one of these is the most pythonic way?


Solution

  • Some ways of doing it

    1)

    In [366]: pd.crosstab(df.name, df.event)
    Out[366]:
    event   event_1  event_2
    name
    name_1        1        1
    name_2        1        0
    

    2)

    In [367]: df.groupby(['name', 'event']).size().unstack(fill_value=0)
    Out[367]:
    event   event_1  event_2
    name
    name_1        1        1
    name_2        1        0
    

    3)

    In [368]: df.pivot_table(index='name', columns='event', aggfunc=len, fill_value=0)
    Out[368]:
    event   event_1  event_2
    name
    name_1        1        1
    name_2        1        0
    

    4)

    In [369]: df.assign(v=1).pivot(index='name', columns='event', values='v').fillna(0)
    Out[369]:
    event   event_1  event_2
    name
    name_1      1.0      1.0
    name_2      1.0      0.0