I am splitting a single DataFrame, so why am I getting an "inconsistent number of samples" error for X_train and X_test (if that is what the error means)?
X_train, X_test = train_test_split(df[categorical_cols+ numeric_cols], test_size=0.2, random_state=4)
regression = LinearRegression().fit(X_train, X_test)
regression.score(X)
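In your example, train_test_split will do something roughly equivalent to the following:
Shuffle the rows of the DataFrame at random (reproducibly, because random_state=4 is set)
Put 20% of the shuffled rows, rounded up, in the test set
Put the rest in the training set
The sizes of the two sets are fixed by test_size, so there is no randomness in how many rows end up in each one, only in which rows. The two sets are different sizes by design, though: roughly 80% of the rows against 20%. The error comes from the next line. fit(X, y) expects a feature matrix and one target value per row, and here X_test is passed as y, so scikit-learn sees two inputs with different numbers of samples. Pass the target column to train_test_split along with the features, then fit on X_train and y_train and score on X_test and y_test.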
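A minimal sketch of one way to avoid the error, assuming the goal is to predict a single target column from the feature columns. The data here is synthetic, and the column names (rooms, area, price) and the target_col variable are hypothetical stand-ins, since the question does not show the real DataFrame. Categorical columns are left out because they would need encoding before LinearRegression can use them.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the asker's DataFrame (100 rows, synthetic values).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rooms": rng.integers(1, 6, size=100),
    "area": rng.uniform(30, 200, size=100),
})
df["price"] = 1000 * df["area"] + 5000 * df["rooms"] + rng.normal(0, 1000, size=100)

numeric_cols = ["rooms", "area"]   # assumed feature columns
target_col = "price"               # assumed target column

# Split features and target together so every X row keeps its matching y value.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols], df[target_col], test_size=0.2, random_state=4
)

# Fit on the training features and training targets (same number of rows).
regression = LinearRegression().fit(X_train, y_train)

# score() needs both features and true targets; it returns R^2 on the held-out data.
r2 = regression.score(X_test, y_test)

print(len(X_train), len(X_test))  # 80 20
```

Passing several arrays to train_test_split applies the same shuffle to each of them, which is what keeps X_train aligned with y_train and X_test aligned with y_test.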