Search code examples
pythonlistimportnested-lists

How can I import numeric lists of lists from Excel as actual numeric values, not strings in Python?


Im trying to work with data organized as a lists of lists within each vertical cell of a column, like this. The data type is numeric for all cells in this column, and each sublist has an int and a float.

I am importing the data using read_excel, and would like to be able to access elements of the sublists with indexing eg.

dataframe["ColumnName"][0][0][0] 

to get the first item in the first list

However, the contents of the cells are being read as strings (I also checked this with type()). So for example when I try the indexing above I would get "[", the first character of the string, rather than '2'.

I tried:

  • using pd.to_numeric on the column I got an "unable to parse string" error
  • when I try and iterate through the string and change characters to int(), i get : invalid literal for int() with base 10
  • using dtype = {"colname": float) - It cant (also cannot convert to int)
  • converting to a numpy array and then using arr.astype(float) i get "could not convert string to float".

When I copy and paste the data into the python file as a bunch of lists, it's fine and I can access the elements with indexing, but obviously I don't want to be doing that. I would like to better understand why it's converting the lists to strings, any advice is much appreciated.


Solution

  • Your Excel elements are strings along the lines of "[[2, 2.3], [2, 3.4]]" so cannot be directly read as Python Lists and subLists from which the numeric values can be extracted and converted. You need to first change the string into a true List by using for example:

    from ast import literal_eval
    ml = literal_eval(df['ColName'][0])
    

    and then to get the first value of the first subList:

    print(ml[0][0])
    

    No type conversion is necessary as the value is an int.