Tags: pandas, dataframe, numpy, numpy-ndarray, series

How to change type of Pandas Series of vectors from str to numerical?


I have a Series that consists of fixed-size vectors, but they are stored as str. How can I convert this Series to numerical vectors?

Here is a preview of the Series:

[Image: preview of the Series]

P.S. The answers provided to a similar question did not help.


Solution

  • Because your data contains nan values, you can use pd.eval and map the name nan to np.nan:

    out = gen_vec.apply(pd.eval, local_dict={'nan': np.nan})
    

    If the strings contain no nan values, you can instead use literal_eval from the ast module (it raises a ValueError on a bare nan):

    import ast
    
    out = gen_vec.apply(ast.literal_eval)
    

    Output:

    >>> out
    0    [[0.6304918890918207, -0.5886238157645294, -0....
    1    [[-0.6302182776914216, 0.9368165801475401, 0.7...
    2    [[0.6153572001094536, -0.07547153598238743, -0...
    3    [[0.1583211249108949, -0.07501481771633367, -0...
    4    [[0.9793698091130785, 0.6140448218764745, -0.9...
    dtype: object
    
    >>> out.loc[0]
    [[0.6304918890918207, -0.5886238157645294, -0.3194771085022785],
     [-0.7222439829639373, 0.682891259912199, -0.9084527274979692],
     [0.9372246370318329, -0.8042811128682565, -0.39435908071826065]]
    
    >>> type(out.loc[0])
    list
    

    Input example:

    data = ['[[0.6304918890918207, -0.5886238157645294, -0.3194771085022785], [-0.7222439829639373, 0.682891259912199, -0.9084527274979692], [0.9372246370318329, -0.8042811128682565, -0.39435908071826065]]',
            '[[-0.6302182776914216, 0.9368165801475401, 0.7293141762489015], [-0.10363402231002539, 0.22356716941880794, 0.6796536411142267], [0.739412959837795, 0.3434906849876964, 0.6840523183724572]]',
            '[[0.6153572001094536, -0.07547153598238743, -0.3147739134079086], [-0.4517142976978141, -0.7661353319665889, -0.08218569081022897], [0.21828238409073308, -0.8458822924041092, -0.8100486062713181]]',
            '[[0.1583211249108949, -0.07501481771633367, -0.8430782622316249], [0.11189737816973255, -0.890710343331605, 0.2881597201674384], [-0.8188156405874802, -0.16829948165814113, -0.9222470203602522]]',
            '[[0.9793698091130785, 0.6140448218764745, -0.9485282042022696], [0.7188762127494397, 0.042247790689530884, -0.5645509356734524], [-0.26842956038325627, -0.993030492245303, -0.8585439320376391]]']
    
    gen_vec = pd.Series(data)
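
Note that the output above still holds plain Python lists. If actual NumPy arrays are wanted, one possible follow-up is sketched below. It is only an illustration under a few assumptions: the two short rows are stand-ins for the real data, parse_vec is a hypothetical helper name, and the nan handling uses a regex substitution with ast.literal_eval rather than the pd.eval approach shown in the answer.

```python
import ast
import re

import numpy as np
import pandas as pd

# Stand-in rows in the same string format as the input example;
# the second row contains a bare nan.
gen_vec = pd.Series([
    "[[0.1, -0.2, 0.3], [0.4, 0.5, -0.6]]",
    "[[1.0, nan, 3.0], [4.0, 5.0, 6.0]]",
])


def parse_vec(text):
    """Hypothetical helper: parse one stringified vector into a float ndarray."""
    # literal_eval rejects a bare nan, so swap it for None first;
    # NumPy converts None to nan when a float dtype is requested.
    safe = re.sub(r"\bnan\b", "None", text)
    return np.array(ast.literal_eval(safe), dtype=float)


# Series whose elements are ndarrays (the Series dtype itself stays object).
arrays = gen_vec.apply(parse_vec)

# All elements share one shape, so they can be stacked into a single array.
stacked = np.stack(arrays.to_list())
```

Here arrays keeps the original index, while stacked is a single float array of shape (rows, 2, 3), which is usually more convenient for vectorized arithmetic.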