python pandas dataframe numpy series

Store np.float64 and np.array values as column values in dataframe


I have two NumPy arrays and a dataframe, as given below:

val = np.array([0.501,0.32])
values = np.arange(24).reshape((2,3,4))
input_df = pd.DataFrame(columns=['colname_' + str(i) for i in range(4)])

I would like to

a) Create a new dataframe (dummy) with 3 columns: ROW_ID, Feature Name, Contribution

b) Populate the dummy dataframe using the np.array values above and the column names from input_df

c) Under the Feature Name column, use the input_df column names

d) Populate val[0] as the first Contribution value, followed by each element of values[0][1]. I tried the code below:

pd.DataFrame({
        "Feature Name": ["Base value"] + [f"{col}" for col in input_df.columns.tolist()],
        "Contribution": (val[0].tolist()) + list(values[0][1])
    })

But I get an error message:

TypeError: unsupported operand type(s) for +: 'float' and 'list'

Or I receive a different error:

ValueError: All arrays must be of the same length

I expect my output to look like the one shown below:

[image: expected output]

Update: real data issue

[image: real data issue]


Solution

  • Try:

    pd.DataFrame({
      "Feature Name": ["Base value"] + [f"{col}" for col in input_df.columns.tolist()],
      "Contribution": (val[:1].tolist()) + list(values[0][1])
      #                   ^^^^
    })
    

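    val[0] is a scalar (np.float64), so val[0].tolist() returns a plain Python float rather than a list, and a float cannot be added to a list. Slicing with val[:1] keeps a one-element array, so .tolist() returns a list. You can check the scalar case: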

    >>> type(val[0].tolist())
    <class 'float'>
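
For reference, here is a minimal end-to-end sketch using the sample arrays from the question. It assumes the frame whose column names are wanted is input_df (the question's snippet refers to it as df), and it leaves out the ROW_ID column because the question does not say how it should be filled. The names scalar and as_list are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
val = np.array([0.501, 0.32])
values = np.arange(24).reshape((2, 3, 4))
input_df = pd.DataFrame(columns=["colname_" + str(i) for i in range(4)])

# Indexing gives a NumPy scalar; .tolist() turns it into a plain float.
scalar = val[0].tolist()
# Slicing gives a one-element array; .tolist() turns it into a list.
as_list = val[:1].tolist()

# Both columns now have 1 + 4 = 5 entries, so the lengths match.
dummy = pd.DataFrame({
    "Feature Name": ["Base value"] + input_df.columns.tolist(),
    "Contribution": as_list + values[0][1].tolist(),
})
print(dummy)
```

If the real frame has a different number of columns than values[0][1] has elements, the two lists will differ in length, which is one way to run into "All arrays must be of the same length".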