python, google-colaboratory

How to import a .dat file in Google Colab


I am implementing the famous Iris classification problem in Python for the first time. I have a data file named iris.data that I need to import into my Python project. I am trying this in Google Colab.

Sample data

Attributes are:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa

I wrote:

import torch
import numpy as np
import matplotlib.pyplot as plt
FILE_PATH = "E:\iris dataset"
MAIN_FILE_NAME = "iris.dat"

data = np.loadtxt(FILE_PATH+MAIN_FILE_NAME, delimiter=",")

But it did not work and threw errors.

The same code worked when I ran it on Linux, but I am currently using Windows 10, where it does not work.

Thank you in advance for your help.


Solution

  • When constructing the file name for np.loadtxt, a \ is missing: FILE_PATH+MAIN_FILE_NAME evaluates to 'E:\iris datasetiris.dat'. To avoid having to add the \ manually between FILE_PATH and MAIN_FILE_NAME, you could use os.path.join, which inserts the separator for you.

    import os
    import numpy as np
    
    FILE_PATH = r'E:\iris dataset'  # raw string, so the backslash is taken literally
    MAIN_FILE_NAME = 'iris.dat'
    
    data = np.loadtxt(os.path.join(FILE_PATH, MAIN_FILE_NAME), delimiter=',')  # still fails: the last column holds strings, not numbers
    

    On the other hand, I am not sure why it worked on Linux, because np.loadtxt tries to convert every field to a number and cannot convert the string "Iris-setosa". If you are only interested in the numeric values, you could use the usecols keyword of np.loadtxt:

    data = np.loadtxt(os.path.join(FILE_PATH, MAIN_FILE_NAME), delimiter=',', usecols=(0, 1, 2, 3))
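
If the class labels are needed as well, one possible approach is to read the file twice: once for the four numeric columns and once for the label column as strings. The sketch below is only an illustration under some assumptions: it uses an in-memory io.StringIO object holding three rows copied from the question instead of the real file, and the names sample, features and labels are made up for this example.

```python
import io

import numpy as np

# Three rows copied from the question, standing in for the real file.
# With a file on disk, pass its path to np.loadtxt instead of this object.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# First pass: the four numeric columns, parsed as floats (the default dtype).
features = np.loadtxt(sample, delimiter=",", usecols=(0, 1, 2, 3))

# Second pass: rewind the buffer and read only the class column as strings.
sample.seek(0)
labels = np.loadtxt(sample, delimiter=",", usecols=4, dtype=str)

print(features.shape)  # (3, 4)
print(labels[0])       # Iris-setosa
```

Note that a Colab notebook normally runs on a remote machine, so a path on a local E: drive would presumably not be visible there; the file would first have to be uploaded to the Colab runtime and loaded from its path there.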