I have managed to get two histograms to overlay but if you look closely, the bars start to skew and don't overlap exactly.
I have adjusted line width and width, and it hasn't improved it.
My goal is for all the bars to line up on top of each other with no skewing of the black edges.
Any ideas on how to fix this?
Here is my code:
import matplotlib.pyplot as plt
import numpy
True_Distance = sort_by_Distance_below_4kpc_and_retrabmag_no_99s["true distance"].tolist()
Retr_Distance = sort_by_Distance_below_4kpc_and_retrabmag_no_99s["retrieved distance from observed parallax"].tolist()
plt.figure(figsize=(8,6))
plt.hist(True_Distance, normed=True, bins = 40, alpha=0.75, color = "mediumorchid", label="True Distance", edgecolor='black', linewidth=0.1, width=200)
plt.hist(Retr_Distance, normed=True, bins = 20, alpha=0.5, color = "lightskyblue", label="Retrieved Distance", edgecolor='black', linewidth=0.1, width=200)
# Add title and axis names
plt.title('Number distribution of stars with distance')
plt.xlabel('Distance (parsecs)')
plt.ylabel('Number of stars')
plt.legend()
Following is the output:
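If the categories (e.g. 'methods') and the values (e.g. 'distance') are provided separately in a tidy format, the seaborn.histplot API will correctly align the bin edges of the various categories when using the hue parameter.
The dataframe in the question can be reshaped into that long format, with a 'method' column and a 'distance' column: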
df = sort_by_Distance_below_4kpc_and_retrabmag_no_99s[['true distance', 'retrieved distance from observed parallax']].stack().reset_index(level=1).rename(columns={'level_1': 'method', 0: 'distance'})
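To see what that reshaping produces, here is a small sketch on a made-up two-column dataframe. The values and the names `wide`, `long_df` and `melted` are invented for illustration; only the column names come from the question. `DataFrame.melt` is included as an alternative that should give the same long format.

```python
import pandas as pd

# Hypothetical stand-in for the dataframe in the question; the values are made up.
wide = pd.DataFrame({
    "true distance": [100.0, 250.0, 400.0],
    "retrieved distance from observed parallax": [110.0, 240.0, 390.0],
})

# Same chain as above: stack the two columns into one, then name the new columns.
long_df = (
    wide.stack()
    .reset_index(level=1)
    .rename(columns={"level_1": "method", 0: "distance"})
)

# Alternative reshaping with melt (row order differs, content is the same).
melted = wide.melt(var_name="method", value_name="distance")

print(long_df)
```

Each original row contributes two rows to the long table, one per method, which is the shape that the hue parameter expects.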
seaborn is a high-level API for matplotlib.
The planets dataset is one of the seaborn sample datasets, and is explained at NASA Exoplanet Explorations. Distance is light years from Earth.
The planets dataset coincides nicely with your star distance dataset. Here, there are several values for 'method'.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["patch.force_edgecolor"] = True
# import some test data
df = sns.load_dataset('planets')
# display(df.head())
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009
Plot all 'methods' together
Regardless of how bins is specified, the edges always align.
fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(10, 10))
data = df[df.distance < 801]
sns.histplot(data=data, x='distance', hue='method', ax=ax1, bins=np.arange(0, 801, 80))
sns.histplot(data=data, x='distance', hue='method', ax=ax2, bins=20)
sns.histplot(data=data, x='distance', hue='method', ax=ax3)
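The reason the edges line up with hue is, as far as I understand it, that histplot bins the pooled data once and reuses that grid for every category (its common_bins parameter defaults to True). The numpy sketch below illustrates the underlying idea on made-up samples; the arrays `a` and `b` are assumptions standing in for two categories, not data from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two made-up samples covering different ranges, standing in for two categories.
a = rng.uniform(0, 800, size=200)
b = rng.uniform(100, 600, size=200)

# An integer bin count is spread over each sample's own min..max range,
# so the two sets of edges do not coincide.
edges_a = np.histogram_bin_edges(a, bins=20)
edges_b = np.histogram_bin_edges(b, bins=20)
print(np.allclose(edges_a, edges_b))

# Computing the edges once from the pooled data gives one grid for both samples.
shared = np.histogram_bin_edges(np.concatenate([a, b]), bins=20)
counts_a, _ = np.histogram(a, bins=shared)
counts_b, _ = np.histogram(b, bins=shared)
```

Passing the same integer to two separate calls is therefore not enough; the edges themselves have to be identical.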
Select 'method' individually and plot
The edges align only in ax2, where the edges are defined the same for both sets of data.
sns.histplot, without using hue, is "mostly" equivalent to plotting with plt.hist(...).
The defaults for bins differ: sns.histplot uses auto and plt.hist defaults to 10, as pointed out by mwaskom, the creator of seaborn.
# create a dataframe for two values from the method column
radial = data[data.method == 'Radial Velocity']
transit = data[data.method == 'Transit']
fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, figsize=(10, 10))
# number of bins and edges determined by the API
sns.histplot(data=transit, x='distance', color="lightskyblue", ax=ax1)
sns.histplot(data=radial, x='distance', color="mediumorchid", ax=ax1)
# bin edges defined the same for both plots
sns.histplot(data=transit, x='distance', bins=np.arange(0, 801, 40), color="lightskyblue", ax=ax2)
sns.histplot(data=radial, x='distance', bins=np.arange(0, 801, 40), color="mediumorchid", ax=ax2)
# a number of bins is specified, edges determined by the API based on the data
sns.histplot(data=transit, x='distance', bins=20, color="lightskyblue", ax=ax3)
sns.histplot(data=radial, x='distance', bins=20, color="mediumorchid", ax=ax3)
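The same idea can be applied to the matplotlib code in the question without seaborn. The sketch below is one possible version, not a definitive fix: the distances are randomly generated stand-ins for the two columns, and the variable names are assumptions. It likely addresses the skew because the original calls used different bin counts (40 and 20) together with a fixed width=200, so the bars no longer matched their bins. It also uses density=True, which replaces normed=True in newer matplotlib releases.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Made-up distances in parsecs, standing in for the two columns in the question.
true_distance = rng.uniform(0, 4000, size=1000)
retr_distance = true_distance + rng.normal(0, 150, size=1000)

# One set of edges spanning both samples, reused for both histograms.
edges = np.histogram_bin_edges(np.concatenate([true_distance, retr_distance]), bins=40)

fig, ax = plt.subplots(figsize=(8, 6))
# No width argument is passed, so each bar fills exactly its own bin.
_, edges_true, _ = ax.hist(true_distance, bins=edges, density=True, alpha=0.75,
                           color="mediumorchid", edgecolor="black", linewidth=0.5,
                           label="True Distance")
_, edges_retr, _ = ax.hist(retr_distance, bins=edges, density=True, alpha=0.5,
                           color="lightskyblue", edgecolor="black", linewidth=0.5,
                           label="Retrieved Distance")
ax.set_xlabel("Distance (parsecs)")
ax.set_ylabel("Density")
ax.legend()
```

With density=True the y-axis shows a normalized density rather than a raw count, so the axis label is changed accordingly; drop density=True to plot counts. Call plt.show() to display the figure.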