Tags: python, pandas, dataframe, counter

Build a pandas DataFrame from multiple collections.Counter objects


I am working with DNA sequence data, and I would like to count the frequency of each letter (A, C, G, T) in each sequence in my dataset.

To do so, I tried the following, using the Counter class from the collections module, with good results:

from collections import Counter

df = []  # despite the name, this is a plain list of Counter objects
for seq in pseudomona.sequence_DNA:
    df.append(Counter(seq))

[Counter({'C': 2156779, 'A': 1091782, 'G': 2143630, 'T': 1090617}),
 Counter({'T': 1050880, 'G': 2083283, 'C': 2101448, 'A': 1055877}),
 Counter({'C': 2180966, 'A': 1111267, 'G': 2176873, 'T': 1108010}),
 Counter({'C': 2196325, 'G': 2204478, 'A': 1128017, 'T': 1123038}),
 Counter({'T': 1117153, 'C': 2176409, 'A': 1115003, 'G': 2194606}),
 Counter({'G': 2054304, 'A': 1026830, 'T': 1044090, 'C': 2020029})]

However, this gives me a list of Counter instances (sorry if that's not the right terminology), and I would like a data frame of those frequencies with sorted columns, for instance:

   A     C     G    T
2237  4415   124  324
4565  8567  3776  623

I tried converting it into a list of lists, but then I cannot figure out how to turn that into a pandas DataFrame:

[list(items.items()) for items in df]

[[('C', 2156779), ('A', 1091782), ('G', 2143630), ('T', 1090617)],
 [('T', 1050880), ('G', 2083283), ('C', 2101448), ('A', 1055877)],
 [('C', 2180966), ('A', 1111267), ('G', 2176873), ('T', 1108010)],
 [('C', 2196325), ('G', 2204478), ('A', 1128017), ('T', 1123038)],
 [('T', 1117153), ('C', 2176409), ('A', 1115003), ('G', 2194606)],
 [('G', 2054304), ('A', 1026830), ('T', 1044090), ('C', 2020029)]]

It might be something foolish, but I can't figure out how to do it properly. Hope someone has the right clue! :)


Solution

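  • Make a Series out of each Counter, combine them with pd.concat using axis=1, and transpose. Here l is the list of Counter objects (the list named df in the question):

    import pandas as pd
    df = pd.concat([pd.Series(c) for c in l], axis=1).T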
    

    Output:

    >>> df
             C        A        G        T
    0  2156779  1091782  2143630  1090617
    1  2101448  1055877  2083283  1050880
    2  2180966  1111267  2176873  1108010
    3  2196325  1128017  2204478  1123038
    4  2176409  1115003  2194606  1117153
    5  2020029  1026830  2054304  1044090
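
If the columns should come out in A, C, G, T order as in the question (the output above is in C, A, G, T order), one possible variation is to pass the counters to the DataFrame constructor as plain dicts and reorder the columns afterwards. This is only a sketch under assumptions: the short sequences below are made-up stand-ins for pseudomona.sequence_DNA, and the names sequences, counters and freq are introduced here purely for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy sequences standing in for pseudomona.sequence_DNA
sequences = ["ACGTAC", "GGTTA", "CCCA"]

# One Counter per sequence, as in the question
counters = [Counter(seq) for seq in sequences]

# A Counter is a dict subclass; converting each one to a plain dict
# gives the constructor an ordinary list of dicts (one row per sequence)
freq = pd.DataFrame([dict(c) for c in counters])

# Force the A, C, G, T column order. A base that never occurs in a
# sequence shows up as NaN, so fill with 0 and cast back to integers.
freq = freq.reindex(columns=list("ACGT")).fillna(0).astype(int)

print(freq)
```

The reindex step is what fixes the column order; the fillna/astype pair only matters when some sequence lacks one of the four bases, which is unlikely for whole genomes but easy to hit with short test strings.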