I have a problem: when I call describe() or head() on my dataset.csv, the output shows the dataset as having no separate columns. I already tried using split() and strip(), but the result is still the same.
Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import csv
directory = os.path.join('drive/My Drive/BCML/winequality-white.csv')
text_file = open(directory, "r")
lines = text_file.readlines()
line = line.strip(";")
line = line.strip()
parsed_values = []
for index, line in enumerate(lines):
    split_columns = line.split(";")
    if index == 0:
        continue
    parsed_values.append([float(split_columns[0]), float(split_columns[1]), float(split_columns[2]), float(split_columns[3]), float(split_columns[4]), float(split_columns[5]), float(split_columns[6]), float(split_columns[7]), float(split_columns[8]), float(split_columns[9]), float(split_columns[10]), int(split_columns[11])])
dataset = pd.read_csv(directory)
Additional information:
I have to use this particular file for this task, so I can't replace it with another copy of the same dataset from the internet. The .csv file looks normally structured when I open it in Excel. I also don't understand why, when I change the split in the for loop to split_columns = line.split(","), I get this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-fb83c4e7dfad> in <module>()
8 continue
9
---> 10 parsed_values.append([float(split_columns[0]), float(split_columns[1]), float(split_columns[2]), float(split_columns[3]), float(split_columns[4]), float(split_columns[5]), float(split_columns[6]), float(split_columns[7]), float(split_columns[8]), float(split_columns[9]), float(split_columns[10]), int(split_columns[11])])
11
12 print("Jumlah data (parsed) ", len(parsed_values))
ValueError: could not convert string to float: '7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n'
I would really appreciate any help I can get.
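It seems your CSV file uses a custom separator. pd.read_csv assumes a comma by default, so each semicolon-separated line is read into a single column. You can simply add the sep argument and set it to ";" like this:
dataset = pd.read_csv(directory, sep=";")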
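To see the difference without touching your file, here is a minimal sketch. The `sample` string is a made-up, three-column stand-in for your data (an assumption for illustration, not the real file), read through `io.StringIO` instead of a path. It also shows why `line.split(",")` leads to the `ValueError`: there is no comma in the line, so the whole line comes back as one string.

```python
import io

import pandas as pd

# Hypothetical sample in the same semicolon-separated layout,
# shortened to three columns for illustration.
sample = "fixed acidity;volatile acidity;quality\n7;0.27;6\n6.3;0.3;6\n"

# Default separator is a comma, so every line lands in a single column.
default_df = pd.read_csv(io.StringIO(sample))
print(default_df.shape)

# With sep=";" each field becomes its own column with a numeric dtype.
dataset = pd.read_csv(io.StringIO(sample), sep=";")
print(dataset.shape)
print(dataset.dtypes)

# Splitting on "," finds no comma, so the list holds the whole line;
# passing that single string to float() is what raises the ValueError.
pieces = "7;0.27;6\n".split(",")
print(pieces)
```

Once the file is read with sep=";", describe() and head() should show one column per field, and the manual parsing loop is no longer needed.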