python pandas google-colaboratory

Columns in DataFrame not split


When I call describe() or head() on my dataset.csv, the output shows that the data has not been split into columns. I already tried using split() and strip(), but the result is still the same.

So this is my code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import csv

directory = os.path.join('drive/My Drive/BCML/winequality-white.csv')

text_file = open(directory, "r")
lines = text_file.readlines()

parsed_values = []

for index, line in enumerate(lines):
  # Strip stray separators and the trailing newline from each line
  line = line.strip(";")
  line = line.strip()
  split_columns = line.split(";")


  if index == 0:
    continue
  
  parsed_values.append([float(split_columns[0]), float(split_columns[1]), float(split_columns[2]), float(split_columns[3]), float(split_columns[4]), float(split_columns[5]), float(split_columns[6]), float(split_columns[7]), float(split_columns[8]), float(split_columns[9]), float(split_columns[10]), int(split_columns[11])])

dataset = pd.read_csv(directory)

And the output is: [two screenshots, not included]

Additional notes: I have to use this exact dataset for this case, so I can't replace it with a copy of the same dataset from the internet. The .csv file is structured normally when I open it in Excel. Also, I don't understand why changing the split in the for loop to split_columns = line.split(",") gives this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-fb83c4e7dfad> in <module>()
      8     continue
      9 
---> 10   parsed_values.append([float(split_columns[0]), float(split_columns[1]), float(split_columns[2]), float(split_columns[3]), float(split_columns[4]), float(split_columns[5]), float(split_columns[6]), float(split_columns[7]), float(split_columns[8]), float(split_columns[9]), float(split_columns[10]), int(split_columns[11])])
     11 
     12 print("Jumlah data (parsed) ", len(parsed_values))

ValueError: could not convert string to float: '7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n'

I would really appreciate any help I can get.


Solution

  • It seems your CSV file uses a custom separator. read_csv splits on commas by default, so you can simply pass the sep argument and set it to ";", like this:

    dataset = pd.read_csv(directory, sep=";")
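
    To see the effect without touching the real file, here is a minimal sketch that reads a tiny in-memory sample both ways. The data row is the one quoted in the ValueError from the question. The header names c0 to c11 are placeholders I made up, not the real column names in your file. The same sample also shows why line.split(",") fails: the line contains no commas, so the whole line comes back as a single string that float() cannot parse.

```python
import io

import pandas as pd

# Hypothetical sample: the header names c0..c11 are placeholders (assumption);
# the data row is copied from the ValueError message in the question.
sample = (
    "c0;c1;c2;c3;c4;c5;c6;c7;c8;c9;c10;c11\n"
    "7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n"
)

# Default separator is ",": the file has no commas, so every row
# ends up in one single column.
without_sep = pd.read_csv(io.StringIO(sample))

# With sep=";" each row is split into 12 separate columns.
with_sep = pd.read_csv(io.StringIO(sample), sep=";")

print(without_sep.shape)
print(with_sep.shape)

# The same thing happens with plain string splitting.
row = sample.splitlines()[1]
comma_parts = row.split(",")      # one element: the whole line
semicolon_parts = row.split(";")  # twelve elements, one per column

print(len(comma_parts), len(semicolon_parts))
```

    If the shape printed for your own file shows only one column, the separator passed to read_csv most likely does not match the one used in the file.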