I have 3 different lists named : "pos_date_data", "neu_date_data", "neg_date_data". (each means positive, neutral, and negative)
they contains datetime value in dataframe shape.
below is how they look like when printed list. and each of they have different lengths.
datetime
0 2018-07-04
1 2018-07-04
2 2018-07-04
3 2018-07-04
4 2018-07-04
.. ...
212 2020-02-02
213 2020-02-02
214 2020-02-03
215 2020-02-04
216 2020-02-07
[217 rows x 1 columns]
datetime
0 2018-07-04
1 2018-07-04
2 2018-07-04
3 2018-07-04
4 2018-07-04
... ...
1527 2020-02-09
1528 2020-02-10
1529 2020-02-11
1530 2020-02-11
1531 2020-02-12
[1532 rows x 1 columns]
From them I'm trying to merge them in one DataFrame using pandas, python.
for that, I have to count the months by dates in whole range(2018-07 ~ 2020.03). for example : if the total date number of 2019-05 is 15 in the neu_date_data, I want to display it as integer number in a table.
I tried this code:
df = pd.crosstab(neg_date_data['datetime'].dt.month.rename('m'),
neg_date_data['datetime'].dt.year.rename('y'))
and printed as: in shape of crosstable, and each items stands for the number of dates in months.
y 2018 2019 2020
m
1 0 1 17
2 0 0 2
3 0 1 0
4 0 3 0
5 0 12 0
6 0 13 0
7 25 16 0
8 0 36 0
9 0 2 0
10 1 8 0
11 1 5 0
12 2 4 0
I want to fix the code in 3 ways:
merge 3 lists and display them once.
having entire 'year' and 'month' data as columns.
for 3 rows, heading of each lists : 'pos', 'neu', 'neg', not years.
First convert datetimes to months periods by Series.dt.to_period
and then use Series.value_counts
, last concat
together:
I think for ploting should be better use months periods in rows and types in columns:
neg = neg_date_data['datetime'].dt.to_period('m').value_counts()
neu = neu_date_data['datetime'].dt.to_period('m').value_counts()
df = pd.concat([neg, neu], axis=1, keys=('neg','neu'))
print (df)
neg neu
2020-02 5 5
2018-07 5 5
df.plot()