python-3.x linear-regression train-test-split

Found input variables with inconsistent numbers of samples: [799996, 199999]

I am splitting a single DataFrame, so why does it report inconsistent numbers of samples in X_train and X_test (if that is what the error means)?

X_train, X_test = train_test_split(df[categorical_cols+ numeric_cols], test_size=0.2, random_state=4)
regression = LinearRegression().fit(X_train, X_test)
regression.score(X)

Solution

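In your example, the method will do something roughly equivalent to the following:

Shuffle the records at random (reproducibly, because random_state=4)
Put 20% of the records, rounded up, in the test set
Put the rest in the training set

Only which records land in each set is random, not how many. With 999,995 rows you get exactly 199,999 test rows and 799,996 training rows, which are the two numbers in the error message. The split itself is fine. The error is raised by fit: fit(X_train, X_test) treats X_test as the target y, and y must have one value per row of X_train. Split the features and the target together and pass y_train to fit. Likewise, score needs both the features and the true target, for example score(X_test, y_test).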
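A minimal sketch of a call sequence that avoids the error: the features and the target go through train_test_split together, and fit receives y_train rather than X_test. The synthetic DataFrame and the column names num_a, num_b and target are assumptions made for illustration, since the question does not show the data or name its target column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the asker's df; all column names are hypothetical.
rng = np.random.default_rng(0)
n_rows = 1003
df = pd.DataFrame({
    "num_a": rng.normal(size=n_rows),
    "num_b": rng.normal(size=n_rows),
})
df["target"] = 3.0 * df["num_a"] - 2.0 * df["num_b"] + rng.normal(scale=0.1, size=n_rows)

feature_cols = ["num_a", "num_b"]  # stands in for categorical_cols + numeric_cols
X = df[feature_cols]
y = df["target"]

# Passing X and y together keeps their rows aligned in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# fit takes the features and the target from the same split.
regression = LinearRegression().fit(X_train, y_train)

# score needs features and the true target; it returns R^2.
r2 = regression.score(X_test, y_test)

print(len(X_train), len(X_test), round(r2, 3))
```

With 1,003 rows and test_size=0.2, the test set holds 201 rows (20% rounded up) and the training set the remaining 802, regardless of random_state.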
