Tags: python, pandas, list-comprehension, valueerror

List comprehension with dataframe condition; ValueError: Item wrong length


I am trying to use a list comprehension to create a list of DataFrames, where each item is the DataFrame filtered by a condition (df0[condition], i.e. only the rows where the condition is True). However, I get a ValueError:

list_of_dataframes = [df0[(df0['Names'].values == my_list_of_names[i])] for i in range(len(my_list_of_names))]

  File "/home/josep/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2986, in __getitem__
    return self._getitem_bool_array(key)
  File "/home/josep/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3033, in _getitem_bool_array
    "Item wrong length %d instead of %d." % (len(key), len(self.index))
ValueError: Item wrong length 233 instead of 234.

As I understand it, a list comprehension rewrites a loop of this form:

new_list = []
for i in old_list:
    if filter(i):
        new_list.append(expression(i))

as:

new_list = [expression(i) for i in old_list if filter(i)]
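To make the pattern concrete, here is a small runnable sketch. The names keep and transform are hypothetical stand-ins for the placeholders filter(i) and expression(i) (renamed so the built-in filter is not shadowed); both forms should build the same list.

```python
# Hypothetical stand-ins for the placeholders filter(i) and expression(i)
def keep(i):
    return i % 2 == 0  # condition: keep even numbers only

def transform(i):
    return i * 10  # expression: the value that gets collected

old_list = [1, 2, 3, 4, 5, 6]

# Explicit loop form
new_list = []
for i in old_list:
    if keep(i):
        new_list.append(transform(i))

# Equivalent list comprehension
new_list_comp = [transform(i) for i in old_list if keep(i)]

print(new_list)       # [20, 40, 60]
print(new_list_comp)  # [20, 40, 60]
```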

So my current for loop is:

import pandas as pd

my_list_of_names = pd.DataFrame({'0': ['Jou', 'Lara']})
d = {'Names': ['John', 'Lara', 'Ari', 'Jou'], 'col2': [1, 2, 2, 2], 'col3': [1, 2, 3, 4], 'col4': [2, 1, 1, 1], 'col5': [2, 1, 0, 0], 'col6': [2, 1, 3, 1]}
df0 = pd.DataFrame(data=d)

list_of_dataframes = []
for i in range(len(my_list_of_names)):
    # keep the rows of df0 whose name matches row i of my_list_of_names
    df_i = df0[df0['Names'].values == my_list_of_names.values[i]]
    list_of_dataframes.append(df_i)

Which can be written as:

list_of_dataframes = [df0[df0['Names'].values == my_list_of_names.values[i]]
                      for i in range(len(my_list_of_names))]
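One detail worth seeing in the DataFrame version: my_list_of_names.values is a 2-D array of shape (2, 1), so my_list_of_names.values[i] is a one-element array rather than a string, and NumPy broadcasts the comparison back to one boolean per row of df0. A sketch with a trimmed copy of the sample frame (only two columns kept, purely for brevity):

```python
import pandas as pd

my_list_of_names = pd.DataFrame({'0': ['Jou', 'Lara']})
d = {'Names': ['John', 'Lara', 'Ari', 'Jou'], 'col2': [1, 2, 2, 2]}  # trimmed sample
df0 = pd.DataFrame(data=d)

# .values has shape (2, 1), so indexing one row gives a 1-element array
row = my_list_of_names.values[0]
print(row.shape)  # (1,)

# A length-4 array compared with a length-1 array broadcasts to length 4
mask = df0['Names'].values == row
print(mask.tolist())  # [False, False, False, True]

print(df0[mask])  # only the 'Jou' row
```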

This works fine. But if I try to simplify my code by changing my_list_of_names from a DataFrame to a plain list:

my_list_of_names2 = ['Jou', 'Lara']  # a plain list
list_of_df = [df0[df0['Names'].values == my_list_of_names2[measure]]
              for measure in range(len(my_list_of_names2))]
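With the shortened sample data shown here, the list version does run: comparing an object array with a single string is elementwise, so each mask has exactly one entry per row of df0. A quick check, assuming the same trimmed two-column copy of the sample frame:

```python
import pandas as pd

d = {'Names': ['John', 'Lara', 'Ari', 'Jou'], 'col2': [1, 2, 2, 2]}  # trimmed sample
df0 = pd.DataFrame(data=d)
my_list_of_names2 = ['Jou', 'Lara']

# Array == str compares elementwise, giving one boolean per row of df0
masks = [df0['Names'].values == name for name in my_list_of_names2]
print([len(m) for m in masks])  # [4, 4]

list_of_df = [df0[m] for m in masks]
print([x.index.tolist() for x in list_of_df])  # [[3], [1]]
```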

This raises a ValueError:

runcell(7, '~/sample.py')
Traceback (most recent call last):
  File "~/sample.py", line 263, in <module>
    for measure in range(len(my_list_of_names2))]
  File "~/sample.py", line 263, in <listcomp>
    for measure in range(len(my_list_of_names2))]
  File "~home/josep/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2986, in __getitem__
    return self._getitem_bool_array(key)
  File "/home/josep/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3033, in _getitem_bool_array
    "Item wrong length %d instead of %d." % (len(key), len(self.index))
ValueError: Item wrong length 233 instead of 234.
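The message compares two lengths: the boolean key had 233 entries, while the DataFrame being indexed has 234 rows. A minimal sketch that reproduces the same error on a trimmed sample frame, assuming (hypothetically) that the mask was built from a frame one row shorter than the one it is applied to; the name shorter is illustrative only:

```python
import pandas as pd

df0 = pd.DataFrame({'Names': ['John', 'Lara', 'Ari', 'Jou']})

# Hypothetical: a mask computed from a frame with one row fewer than df0
shorter = df0.iloc[:-1]
mask = shorter['Names'].values == 'Lara'  # 3 booleans, but df0 has 4 rows

error_message = None
try:
    df0[mask]
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Item wrong length 3 instead of 4.
```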

NOTE: The real list and DataFrame are different, but for the sake of the question I thought it would be easier to post shorter ones.


Solution

  • This might not be the best solution, but I believe it solves your direct example.

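    list_of_dataframes = []
    for name in my_list_of_names2:  # iterate over the plain list of names
        df_i = df0[df0['Names'] == name]  # boolean Series aligned with df0's index
        list_of_dataframes.append(df_i)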
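Since the question asked for a list comprehension, the loop above can be collapsed into one. A sketch, assuming the plain list my_list_of_names2 from the question and a trimmed two-column copy of the sample frame. Iterating over the names directly avoids the range(len(...)) indexing, and comparing the Series itself (rather than .values) yields a mask that pandas aligns on df0's index.

```python
import pandas as pd

d = {'Names': ['John', 'Lara', 'Ari', 'Jou'], 'col2': [1, 2, 2, 2]}  # trimmed sample
df0 = pd.DataFrame(data=d)
my_list_of_names2 = ['Jou', 'Lara']

# Same filter as the loop, written as a list comprehension
list_of_dataframes = [df0[df0['Names'] == name] for name in my_list_of_names2]

for name, frame in zip(my_list_of_names2, list_of_dataframes):
    print(name, frame.index.tolist())
# Jou [3]
# Lara [1]
```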