python csv scikit-learn joblib

Make a prediction on csv file, one line at a time


I have a large CSV file, and I need to take one row of data at a time and score it against a model. I tried the code below but get the error "X has 120839 features per sample; expecting 30". I can run the model against the entire dataset and it makes a prediction for each row, but I need to do it one line at a time. Thank you.

loaded_model = joblib.load('LR_model.sav')
with open(r'fordTestA.csv', "r") as f:
    for line in f:
        line = f.readlines()[1:]  ##minus headers
        result = loaded_model.predict(line)

In this scenario, it doesn't seem to split the lines, as there is a \n after each row. I tried adding

line = line.rstrip('\n')

This gives the error "'list' object has no attribute 'rstrip'". Thanks in advance for any feedback.


Solution

  • I'm not familiar with joblib or predict(), but:

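    import csv
    
    # other code (e.g. loading the model with joblib.load)
    
    with open(r'fordTestA.csv', 'r', newline='') as f:
        rows = csv.reader(f, delimiter=',')
        _ = next(rows)  # skip headers
        for row in rows:
            line = list(map(float, row))  # convert row of str to row of float
            # scikit-learn's predict() expects 2-D input (samples x features),
            # so wrap the single row in a list to pass it as one sample
            results = loaded_model.predict([line])
            # or, if something else needs the row as a ','-delimited string
            # (predict() itself needs the numeric form above):
            # text = ','.join(row)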
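
The error in the question most likely comes from passing the whole list returned by f.readlines() to predict(), which then treats every line of the file as a feature of a single sample. As a minimal sketch of the row-at-a-time approach, the helper below wraps the loop in a generator. The name predict_rows and its has_header parameter are hypothetical (they come from neither the question nor scikit-learn), and the sketch assumes every column in the file is a numeric feature the model was trained on.

```python
import csv


def predict_rows(model, csv_file, has_header=True):
    """Yield one prediction per CSV data row.

    `model` is any object with a scikit-learn style predict(X) method,
    where X is 2-D: a list of samples, each a list of feature values.
    `csv_file` is an open text file (or any iterable of CSV lines).
    """
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header row, if there is one
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Assumption: every column is a numeric feature.
        features = [float(value) for value in row]
        # Wrap the row in a list so the model sees 1 sample with
        # len(features) features, rather than a 1-D input.
        yield model.predict([features])[0]
```

With the files from the question, this could be used as `for prediction in predict_rows(loaded_model, f): ...` inside `with open('fordTestA.csv', newline='') as f:`. Because it is a generator, only one row is held in memory at a time, which should suit a large file.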