Search code examples
pythonpandasdataframepivotcrosstab

how to merge lists? (pandas DataFrame)


I have 3 different lists named : "pos_date_data", "neu_date_data", "neg_date_data". (each means positive, neutral, and negative)

they contains datetime value in dataframe shape.

below is how they look like when printed list. and each of they have different lengths.

 datetime
0   2018-07-04
1   2018-07-04
2   2018-07-04
3   2018-07-04
4   2018-07-04
..         ...
212 2020-02-02
213 2020-02-02
214 2020-02-03
215 2020-02-04
216 2020-02-07

[217 rows x 1 columns]

    datetime
0    2018-07-04
1    2018-07-04
2    2018-07-04
3    2018-07-04
4    2018-07-04
...         ...
1527 2020-02-09
1528 2020-02-10
1529 2020-02-11
1530 2020-02-11
1531 2020-02-12

[1532 rows x 1 columns]

From them I'm trying to merge them in one DataFrame using pandas, python.

for that, I have to count the months by dates in whole range(2018-07 ~ 2020.03). for example : if the total date number of 2019-05 is 15 in the neu_date_data, I want to display it as integer number in a table.

I tried this code:

df = pd.crosstab(neg_date_data['datetime'].dt.month.rename('m'),
                 neg_date_data['datetime'].dt.year.rename('y'))

and printed as: in shape of crosstable, and each items stands for the number of dates in months.

y   2018  2019  2020
m                   
1      0     1    17
2      0     0     2
3      0     1     0
4      0     3     0
5      0    12     0
6      0    13     0
7     25    16     0
8      0    36     0
9      0     2     0
10     1     8     0
11     1     5     0
12     2     4     0

I want to fix the code in 3 ways:

  1. merge 3 lists and display them once.

  2. having entire 'year' and 'month' data as columns.

  3. for 3 rows, heading of each lists : 'pos', 'neu', 'neg', not years.


Solution

  • First convert datetimes to months periods by Series.dt.to_period and then use Series.value_counts, last concat together:

    I think for ploting should be better use months periods in rows and types in columns:

    neg = neg_date_data['datetime'].dt.to_period('m').value_counts()
    neu = neu_date_data['datetime'].dt.to_period('m').value_counts()
    
    df = pd.concat([neg, neu], axis=1, keys=('neg','neu'))
    print (df)
             neg  neu
    2020-02    5    5
    2018-07    5    5
    
    df.plot()