I am trying to plot the age distribution broken down by the survived, sex, and class variables.
from matplotlib import pyplot
import seaborn

titanic = seaborn.load_dataset("titanic")
g = seaborn.catplot(data=titanic, x='survived', y='age',
                    hue='sex', split=True,
                    row='class', kind='violin', legend=False)
The result is shown in the picture below.
If you look at the age distribution in the first subplot, which I have circled, you can see that part of it is drawn over negative ages, which doesn't make sense.
How can I solve this problem? The age data does not contain any negative numbers.
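The particular violin plot you circled is based on only 3 values: [2, 25, 50]. A violin plot does not draw the raw data; it draws a kernel density estimate (KDE) computed from these 3 points. With so few, widely spread points the smoothing bandwidth is large, and the estimated density extends past the smallest and largest observations. In your case, the KDE has a significant portion below zero.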
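To get a feel for how much of the density ends up below zero, here is a rough sketch using only the standard library. It assumes a Gaussian kernel with a Scott's-rule bandwidth and a default `cut` of 2 bandwidths, which is my understanding of seaborn's defaults; the variable names are just for illustration.

```python
import math
import statistics

ages = [2, 25, 50]  # the three observations behind the circled violin
n = len(ages)

# Assumption: Scott's rule for 1-D data, bandwidth = n**(-1/5) * sample std.
bandwidth = n ** (-1 / 5) * statistics.stdev(ages)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A Gaussian KDE is an average of normal densities centred on each point,
# so the mass below 0 is the average of each kernel's mass below 0.
mass_below_zero = sum(normal_cdf((0 - a) / bandwidth) for a in ages) / n

# Assumption: with cut=2 the curve is drawn 2 bandwidths past the extremes.
lowest_drawn = min(ages) - 2 * bandwidth

print(f"bandwidth: {bandwidth:.1f}")
print(f"share of density below zero: {mass_below_zero:.1%}")
print(f"curve drawn down to: {lowest_drawn:.1f}")
```

The bandwidth comes out comparable to the spread of the data itself, which is why the kernel centred on age 2 spills well into negative values.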
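If you want, you can limit the plotting range of the violin plots to the range of the observed data by adding the parameter cut=0 (cf. violinplot). Note that this only truncates where the curve is drawn; the density is still estimated from the same 3 points.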
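As a small illustration of what `cut=0` changes, the sketch below draws the same 3 values with and without it and prints the range covered by each violin outline. It calls `violinplot` directly on a plain list instead of the titanic data so that it runs without downloading anything; `drawn_range` is a helper I made up for checking the result, and it assumes the violin body is stored in `ax.collections`.

```python
import seaborn
from matplotlib import pyplot

ages = [2, 25, 50]  # the three observations behind the circled violin

fig, (ax_default, ax_cut) = pyplot.subplots(1, 2, sharey=True)

# Default behaviour: the density is drawn beyond the observed extremes.
seaborn.violinplot(y=ages, inner=None, ax=ax_default)
ax_default.set_title("default cut")

# cut=0 stops the drawn density at the smallest and largest observation.
seaborn.violinplot(y=ages, inner=None, cut=0, ax=ax_cut)
ax_cut.set_title("cut=0")


def drawn_range(ax):
    """Return the lowest and highest y value of the violin outline on an axes."""
    ys = [
        y
        for coll in ax.collections
        for path in coll.get_paths()
        for y in path.vertices[:, 1]
    ]
    return min(ys), max(ys)


print("default:", drawn_range(ax_default))
print("cut=0:  ", drawn_range(ax_cut))
# Call pyplot.show() to display the two panels side by side.
```

In the question's code the same keyword should be forwarded by `catplot` to the underlying violin plot, so adding `cut=0` to the existing call ought to be enough.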