While going through a Kaggle kernel on regression, I saw it mentioned that the data should look like a normal distribution, but I don't understand why. I know this question might be very basic, but please help me understand this concept.
Thanks in advance!
Regression models make a number of assumptions, one of which is normality. When this assumption is violated, the p-values and confidence intervals around your coefficient estimates can be wrong, leading to incorrect conclusions about the statistical significance of your predictors.
However, a common misconception is that the data (i.e. the variables/predictors) needs to be normally distributed, but this is not true. These models don't make any assumptions about the distribution of predictors.
For example, imagine a regression with a binary predictor (Male/Female, Slow/Fast, etc.). It would be impossible for this variable to be normally distributed, and yet it is still a valid predictor to use in a regression model. The normality assumption actually refers to the distribution of the residuals, not the predictors themselves.
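To make the binary-predictor point concrete, here is a minimal simulation sketch in Python using only NumPy. Everything in it is an assumption chosen for illustration: the "true" model y = 1 + 2x + noise, the sample size, the seed, and the helper names `skewness` and `excess_kurtosis`. It fits ordinary least squares with a 0/1 predictor and then compares the shape of the predictor with the shape of the residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Binary predictor (0/1): clearly not normally distributed.
x = rng.integers(0, 2, size=n).astype(float)

# Hypothetical true model (an assumption for this demo):
# y = 1.0 + 2.0 * x + Gaussian noise
noise = rng.normal(loc=0.0, scale=1.0, size=n)
y = 1.0 + 2.0 * x + noise

# Ordinary least squares using the design matrix [1, x].
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef


def skewness(a):
    """Sample skewness; roughly 0 for a normal distribution."""
    a = a - a.mean()
    return float((a**3).mean() / (a**2).mean() ** 1.5)


def excess_kurtosis(a):
    """Sample excess kurtosis; roughly 0 for a normal distribution."""
    a = a - a.mean()
    return float((a**4).mean() / (a**2).mean() ** 2 - 3.0)


print("estimated intercept and slope:", coef)
print("predictor excess kurtosis:", excess_kurtosis(x))
print("residual skewness:", skewness(residuals))
print("residual excess kurtosis:", excess_kurtosis(residuals))
```

In a run like this, the predictor's excess kurtosis should sit near -2 (what you expect from a balanced two-valued variable, and very far from normal), while the residuals' skewness and excess kurtosis should both be close to 0. In practice you would also look at a histogram or Q-Q plot of the residuals rather than relying on summary numbers alone.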