python, pandas, graphlab

GraphLab: adding a variable number of columns from an existing SFrame


I have an SFrame, e.g.:

a | b
-----
2 | 31 4 5
0 | 1 9
1 | 2 84

Now I want to get the following result:

a | b      | c  | d  | e
------------------------
2 | 31 4 5 | 31 | 4  | 5
0 | 1 9    | 1  | 9  | 0
1 | 2 84   | 2  | 84 | 0

Any idea how to do it? Or do I need to use some other tool?

Thanks.


Solution

  • Using pandas:

    In [409]: sf
    Out[409]: 
    Columns:
        a   int
        b   str
    
    Rows: 3
    
    Data:
    +---+--------+
    | a |   b    |
    +---+--------+
    | 2 | 31 4 5 |
    | 0 |  1 9   |
    | 1 |  2 84  |
    +---+--------+
    [3 rows x 2 columns]
    
    In [410]: df = sf.to_dataframe()
    
    In [411]: newdf = pd.DataFrame(df.b.str.split().tolist(), columns=['c', 'd', 'e']).fillna('0')
    
    In [412]: df.join(newdf)
    Out[412]: 
       a       b   c   d  e
    0  2  31 4 5  31   4  5
    1  0     1 9   1   9  0
    2  1    2 84   2  84  0
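
    The column names above are hard-coded for three tokens. If the number of tokens per row is not known in advance, a minimal sketch along these lines may help. It assumes the same toy data as the question, uses pandas' `str.split(expand=True)`, and the `c, d, e, ...` naming scheme is only an illustration, not something the original answer prescribes:

```python
import pandas as pd

# Toy data from the question; in practice this would come from sf.to_dataframe().
df = pd.DataFrame({"a": [2, 0, 1], "b": ["31 4 5", "1 9", "2 84"]})

# expand=True yields one column per token; shorter rows get missing values.
parts = df["b"].str.split(expand=True)

# Hypothetical naming scheme: c, d, e, ... for however many columns were produced.
parts.columns = [chr(ord("c") + i) for i in range(parts.shape[1])]

# Fill the gaps with the string '0', matching the answer above, then join.
result = df.join(parts.fillna("0"))
print(result)
```

    The resulting frame can be passed to `SFrame(...)` in the same way as `df.join(newdf)`.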
    

    Converting back to SFrame:

    In [498]: SFrame(df.join(newdf))
    Out[498]: 
    Columns:
        a   int
        b   str
        c   str
        d   str
        e   str
    
    Rows: 3
    
    Data:
    +---+--------+----+----+---+
    | a |   b    | c  | d  | e |
    +---+--------+----+----+---+
    | 2 | 31 4 5 | 31 | 4  | 5 |
    | 0 |  1 9   | 1  | 9  | 0 |
    | 1 |  2 84  | 2  | 84 | 0 |
    +---+--------+----+----+---+
    [3 rows x 5 columns]
    

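    If you want numeric columns rather than strings, convert each token with int(). Rows with fewer values are padded with NaN, so column e comes out as float:

    In [506]: newdf = pd.DataFrame([[int(y) for y in x] for x in df.b.str.split().tolist()], columns=['c', 'd', 'e'])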
    
    In [507]: newdf
    Out[507]: 
        c   d    e
    0  31   4  5.0
    1   1   9  NaN
    2   2  84  NaN
    
    In [508]: SFrame(df.join(newdf))
    Out[508]: 
    Columns:
        a   int
        b   str
        c   int
        d   int
        e   float
    
    Rows: 3
    
    Data:
    +---+--------+----+----+-----+
    | a |   b    | c  | d  |  e  |
    +---+--------+----+----+-----+
    | 2 | 31 4 5 | 31 | 4  | 5.0 |
    | 0 |  1 9   | 1  | 9  | nan |
    | 1 |  2 84  | 2  | 84 | nan |
    +---+--------+----+----+-----+
    [3 rows x 5 columns]
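
    The question asks for 0 rather than NaN in the missing positions. One possible way to get integer columns with 0 as the filler, sketched here on the same toy data and assuming every token is a valid integer string, is to fill before casting:

```python
import pandas as pd

# Toy data from the question; in practice this would come from sf.to_dataframe().
df = pd.DataFrame({"a": [2, 0, 1], "b": ["31 4 5", "1 9", "2 84"]})

# Split into one column per token; missing positions are left empty.
parts = df["b"].str.split(expand=True)
parts.columns = ["c", "d", "e"]

# Fill the gaps with the string '0' first, then cast every column to int,
# so no column is forced to float by NaN.
int_parts = parts.fillna("0").astype(int)

result = df.join(int_parts)
print(result)
```

    Because no NaN remains when the cast happens, column e stays an integer column, and `SFrame(result)` should then infer int for c, d and e.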