I'm trying to learn matplotlib and am stuck on a nuisance. I have this code:
import os
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Build the CSV path relative to this script; os.path.join keeps it portable
current_dir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(current_dir, "CSV", "MainData.csv")
# DataFrame.append was removed in pandas 2.0, so read the CSV directly
df = pd.read_csv(csv_path)
# Period in the SQL "BETWEEN ... AND ..." form shared with the BigQuery scripts
periodB4 = "'2023-05-10' AND '2023-05-13'"
def makeStartEndDates(x):
    # Split "'start' AND 'end'" into its two halves (the single quotes are kept)
    start_date, end_date = x.split(' AND ')
    start_date = start_date.strip()
    end_date = end_date.strip()
    return [start_date, end_date]
start_date_b4, end_date_b4 = makeStartEndDates(periodB4)
# Drop the last 5 rows; .copy() avoids SettingWithCopyWarning on the next line
selected_df = df.iloc[:-5, :].copy()
selected_df['date'] = pd.to_datetime(selected_df['date'], format='%Y-%m-%d')
# Keep the rows whose date lies in the period (both ends inclusive)
b4period = selected_df.loc[selected_df['date'].between(start_date_b4, end_date_b4)]
# print(b4period)
plt.bar(b4period['date'], b4period['dau'])
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Chart Example')
plt.tight_layout()
plt.savefig('chart.png')
The chart shows an extra date, 2023-05-09, and every other date appears twice on the x-axis. This happens only in the chart; neither the CSV nor the DataFrame contains the extra or duplicated dates.
How can I make the x-axis show each date from 2023-05-10 to 2023-05-13 exactly once?
The somewhat convoluted date handling (the quoted "'...' AND '...'" period string) is there because the same string is reused by other scripts that query BigQuery with SQL.
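If the period string has to stay in its SQL form, a small helper can turn it into pandas Timestamps before filtering, so the between() comparison does not depend on how pandas happens to parse the quoted strings. This is a minimal sketch; the function name period_to_timestamps is hypothetical, and it assumes the string always has the shape 'YYYY-MM-DD' AND 'YYYY-MM-DD'.

```python
import pandas as pd


def period_to_timestamps(period):
    """Hypothetical helper: turn "'2023-05-10' AND '2023-05-13'" into two Timestamps.

    Assumes exactly one ' AND ' separator and ISO dates wrapped in single
    quotes, as in the SQL BETWEEN clause used with BigQuery.
    """
    start, end = period.split(" AND ")
    # Strip surrounding whitespace first, then the SQL single quotes
    start = start.strip().strip("'")
    end = end.strip().strip("'")
    return pd.Timestamp(start), pd.Timestamp(end)


start_b4, end_b4 = period_to_timestamps("'2023-05-10' AND '2023-05-13'")
print(start_b4, end_b4)  # 2023-05-10 00:00:00 2023-05-13 00:00:00
```

The returned Timestamps can be passed to selected_df['date'].between(start_b4, end_b4), while the original quoted string remains available for the SQL query.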
Here is a sample of the data, as printed by print(b4period.head(10).to_dict('list')):
{'date': [Timestamp('2023-05-10 00:00:00'), Timestamp('2023-05-11 00:00:00'), Timestamp('2023-05-12 00:00:00'), Timestamp('2023-05-13 00:00:00')], 'new_users': [2885.0, 2954.0, 3160.0, 4086.0], 'dau': [8627.0, 9112.0, 9318.0, 9327.0], 'wau': [28542.0, 28542.0, 28542.0, 28542.0]}
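To reproduce the chart without MainData.csv, the sample above can be turned back into a DataFrame. The sketch below also lists where matplotlib's automatic date locator puts the visible x ticks. If some of them fall between midnights (for example at 12:00), a '%Y-%m-%d' formatter prints the same date for both, and a tick placed in the padding left of the first bar may be labelled 2023-05-09, which would explain the symptoms. The helper name default_tick_positions is made up for this illustration, and the Agg backend is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

# Sample data from the question (only the columns used for plotting)
b4period = pd.DataFrame({
    'date': pd.to_datetime(['2023-05-10', '2023-05-11', '2023-05-12', '2023-05-13']),
    'dau': [8627.0, 9112.0, 9318.0, 9327.0],
})


def default_tick_positions(df):
    """Hypothetical diagnostic: draw the bars and return the visible x ticks."""
    fig, ax = plt.subplots()
    ax.bar(df['date'], df['dau'])
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    fig.canvas.draw()  # make sure view limits and ticks are computed
    lo, hi = ax.get_xlim()
    # Keep only ticks inside the view, converted back to (UTC) datetimes
    ticks = [mdates.num2date(t) for t in ax.get_xticks() if lo <= t <= hi]
    plt.close(fig)
    return ticks


for tick in default_tick_positions(b4period):
    # Printing the time of day shows whether ticks sit between midnights
    print(tick.strftime('%Y-%m-%d %H:%M'))
```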
Given the code above, if you want the number of tick labels to equal the number of unique dates (and therefore the number of bars), you can control the number of ticks with MaxNLocator. Its argument is the maximum number of intervals between ticks, not the number of labels: with an argument of N you get N-1 tick labels in this plot, hence the +1 below. So, adding the line below after the set_major_formatter()
line...
plt.gca().xaxis.set_major_locator(plt.MaxNLocator(len(b4period['date'].unique())+1))
...will give you, based on your code and the 4 rows of sample data, the plot below (note that I removed the -5 from the iloc line, because with only 4 rows df.iloc[:-5, :] returns an empty DataFrame). Hope this is what you are looking for.
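A self-contained version of the fix, built from the sample data so it runs without the CSV, might look like the sketch below. The function name plot_period is hypothetical. It accepts any matplotlib locator, so the answer's MaxNLocator can be compared with mdates.DayLocator(), a swapped-in alternative (not part of the answer above) that puts one tick at midnight of every day and therefore does not depend on counting the bars. The Agg backend is assumed so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, assumed for this sketch
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import MaxNLocator  # same class as plt.MaxNLocator
import pandas as pd

# Sample rows from the question (date and dau columns only)
b4period = pd.DataFrame({
    'date': pd.to_datetime(['2023-05-10', '2023-05-11', '2023-05-12', '2023-05-13']),
    'dau': [8627.0, 9112.0, 9318.0, 9327.0],
})


def plot_period(df, locator, path=None):
    """Hypothetical helper: bar chart of dau per date using the given x locator.

    Saves to `path` if given and returns the visible tick labels.
    """
    fig, ax = plt.subplots()
    ax.bar(df['date'], df['dau'])
    ax.xaxis.set_major_locator(locator)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    ax.tick_params(axis='x', labelrotation=90)
    fig.tight_layout()
    fig.canvas.draw()  # compute view limits and ticks
    if path:
        fig.savefig(path)
    lo, hi = ax.get_xlim()
    labels = [mdates.num2date(t).strftime('%Y-%m-%d')
              for t in ax.get_xticks() if lo <= t <= hi]
    plt.close(fig)
    return labels


# The answer's approach: at most (number of unique dates + 1) intervals
print(plot_period(b4period, MaxNLocator(b4period['date'].nunique() + 1)))
# Alternative: one tick per calendar day
print(plot_period(b4period, mdates.DayLocator()))
```

With these four rows both calls should list each date from 2023-05-10 to 2023-05-13 once. DayLocator keeps working if the period grows, whereas MaxNLocator only caps the tick count and relies on a one-day step being the "nice" step it picks.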