python pandas list dictionary count

How to assign the non-null count of a list item to a new column - pandas


Imagine that I have a DataFrame df with one column whose cell holds a list of dictionaries, each with two keys (list_A and list_B) whose values are lists:

import pandas as pd

data = [{"list_A": [2.93, 4.18, 4.18, None, 1.57, 1.57, 3.92, 6.27, 2.09, 3.14, 0.42, 2.09],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
        {"list_A": [2.51, 3.58, 3.58, None, 1.34, 1.34, 3.36, 5.37, 1.79, 2.69, 0.36, 1.79],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
        {"list_A": [None, 5.94, 5.94, None, 2.23, 2.23, 5.57, 8.9, 2.97, 4.45, 0.59, 2.97],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]}]

# Create a one-row DataFrame whose "column_dic" cell holds the whole list of dicts
df = pd.DataFrame({"column_dic": [data]})
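To see what this construction produces, the sketch below (using a shortened copy of `data`, two items per list instead of twelve) checks that the frame has a single row and that the one cell holds the whole list of three dicts. It is only an inspection aid, not part of the question's code.

```python
import pandas as pd

# Shortened copy of the question's data (two items per list instead of twelve)
data = [{"list_A": [2.93, 4.18], "list_B": [820, 3552]},
        {"list_A": [2.51, 3.58], "list_B": [820, 3552]},
        {"list_A": [None, 5.94], "list_B": [820, 3552]}]

# Wrapping data in an outer list makes it a single value, i.e. a single row
df = pd.DataFrame({"column_dic": [data]})
print(df.shape)  # (1, 1)

# The one cell holds the original list of three dicts
cell = df.loc[0, "column_dic"]
print(len(cell))  # 3
print(cell[2]["list_A"])  # [None, 5.94]
```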

Now, I want to create an additional column count_first_item that contains the count of non-null values among the first items ([0]) of the "list_A" lists.

The expected output is 2 (2.93 = +1; 2.51 = +1; None = 0).
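The expected value can be reproduced by hand in plain Python: take element 0 of each `list_A` and add 1 for every value that is not `None`. This is a minimal sketch assuming, as in the sample data, that missing values are always `None` rather than `float('nan')`; the lists are shortened to two items.

```python
# Shortened copy of the question's data: only list_A matters here
data = [{"list_A": [2.93, 4.18]},
        {"list_A": [2.51, 3.58]},
        {"list_A": [None, 5.94]}]

# Element [0] of each list_A
first_items = [d["list_A"][0] for d in data]
print(first_items)  # [2.93, 2.51, None]

# +1 for each non-None value, 0 for None
count_first_item = sum(1 for value in first_items if value is not None)
print(count_first_item)  # 2
```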


Solution

  • Use a list comprehension to get the first value of each list_A, test for non-missing values with notna, and count the Trues with sum:

    df['count_first_item'] = [pd.notna([y['list_A'][0] for y in x]).sum() 
                              for x in df['column_dic']]
    print(df)
                                              column_dic  count_first_item
    0  [{'list_A': [2.93, 4.18, 4.18, None, 1.57, 1.5...                 2
    

    Or use Series.explode to get one row per dict, Series.str.get to pull out each list_A, indexing with str[0] to take the first values, and SeriesGroupBy.count to count the non-missing values per original row:

    df['count_first_item'] = (df['column_dic'].explode().str.get('list_A').str[0]
                                              .groupby(level=0).count())
    print(df)
                                              column_dic  count_first_item
    0  [{'list_A': [2.93, 4.18, 4.18, None, 1.57, 1.5...                 2
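To make the explode chain easier to follow, the sketch below stores each intermediate Series in a variable (the names `exploded`, `lists_a`, `firsts` and `counts` are only illustrative) and prints what it holds, using a shortened copy of the sample data. Depending on pandas' type inference, `.str[0]` may hold `NaN` rather than `None` for the missing first item, so the check uses `notna()` instead of comparing values directly.

```python
import pandas as pd

# Shortened copy of the question's data (two items per list)
data = [{"list_A": [2.93, 4.18], "list_B": [820, 3552]},
        {"list_A": [2.51, 3.58], "list_B": [820, 3552]},
        {"list_A": [None, 5.94], "list_B": [820, 3552]}]
df = pd.DataFrame({"column_dic": [data]})

# 1. explode: one row per dict, each keeping the original index label 0
exploded = df["column_dic"].explode()
print(exploded.index.tolist())  # [0, 0, 0]

# 2. Series.str.get("list_A"): look up the "list_A" key in each dict
lists_a = exploded.str.get("list_A")
print(lists_a.iloc[0])  # [2.93, 4.18]

# 3. .str[0]: first element of each list (the third one is missing)
firsts = lists_a.str[0]
print(firsts.notna().tolist())  # [True, True, False]

# 4. groupby(level=0).count(): count non-missing values per original row
counts = firsts.groupby(level=0).count()
print(counts.tolist())  # [2]

# Assignment aligns on the index label 0
df["count_first_item"] = counts
```

Because explode repeats the original index label, `groupby(level=0)` folds the values back into one result per original row, which is what lets the assignment line up with `df`.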
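The list-comprehension solution raises `KeyError` if a dict lacks `list_A` and `IndexError` if `list_A` is empty. If such rows can occur, one option is a small helper that treats both cases as missing before counting with `pd.notna`, which, like the original, also treats `NaN` as missing. This is a minimal sketch under that assumption; `first_or_none` and `count_first_non_null` are hypothetical names, not pandas APIs.

```python
import pandas as pd

def first_or_none(d, key):
    """Hypothetical helper: first item of d[key], or None if the key is absent or the list is empty."""
    values = d.get(key) or []
    return values[0] if values else None

def count_first_non_null(dicts, key="list_A"):
    """Hypothetical helper: count dicts whose `key` list starts with a non-null value."""
    firsts = [first_or_none(d, key) for d in dicts]
    # pd.notna treats both None and NaN as missing
    return int(pd.notna(firsts).sum())

# Two rows: the question's first items, plus edge cases (empty list, missing key, NaN)
df = pd.DataFrame({"column_dic": [
    [{"list_A": [2.93, 4.18]}, {"list_A": [2.51]}, {"list_A": [None, 5.94]}],
    [{"list_A": []}, {"list_B": [820]}, {"list_A": [float("nan"), 1.0]}],
]})

df["count_first_item"] = [count_first_non_null(x) for x in df["column_dic"]]
print(df["count_first_item"].tolist())  # [2, 0]
```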