python csv cluster-analysis weka

Python clustering from .csv file as input


I'm trying to find a way to perform clustering in Python similar to what I would do in Weka.

I tried SciPy, but it takes an array as input.

What I have is a .csv file consisting of

objectId, attribute1, attribute2, .., attributeN
e.g. '1234', 0, 1, 0,1,1,1, ..., 0

attribute1 through attributeN each take the value 0 or 1.

Is there a way to load this .csv file, perform clustering with a Python library, and get the cluster each objectId falls into?

My .csv file consists of 300,000 objectId records.

I have converted my .csv file to .arff format for Weka, but clustering takes up to 6 hours, so I'm looking for a faster way to do it and was hoping that a Python library would be faster.

Thanks in advance.


Solution

  • I don't know if this is what you want, but:

    To read the .csv:

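    # Open in text mode as UTF-8; the with-block closes the file automatically
    with open('yourcsv.csv', mode='r', encoding='utf-8') as f:
        content = f.readlines()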
    

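    Now you can create a dictionary that maps each objectId to its attributes:

    cluster = {}
    
    for line in content:
        # Split the row on commas, then trim spaces and quotes from each field
        fields = [field.strip().strip("'") for field in line.strip().split(',')]
        # Key on objectId and keep every attribute, including the last one
        cluster[fields[0]] = fields[1:]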
    

    Now you can access all the info like this:

    objectId = 'someIdentifier'
    
    info = cluster[objectId]
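
Since the question mentions that SciPy expects an array, it may help to keep the ids and the attribute rows in two parallel lists rather than in one dictionary. The rows can then be handed to an array-based clustering routine, and the labels it returns can be matched back to each objectId by position. This is only a rough sketch using the standard `csv` module: `load_binary_csv`, the sample rows, and the `labels` list are made up for illustration, and it assumes the file has no header row and that ids are wrapped in single quotes as in the example.

```python
import csv
import io

def load_binary_csv(handle):
    """Return (ids, rows): objectIds and their 0/1 attribute vectors, in file order."""
    ids, rows = [], []
    # skipinitialspace drops the blank after each comma; quotechar unwraps '1234'
    reader = csv.reader(handle, skipinitialspace=True, quotechar="'")
    for record in reader:
        if not record:  # skip blank lines
            continue
        ids.append(record[0])
        rows.append([int(value) for value in record[1:]])
    return ids, rows

# Tiny in-memory stand-in for the real file (made-up data)
sample = io.StringIO("'1234', 0, 1, 0,1\n'5678', 1, 1, 0,0\n")
ids, rows = load_binary_csv(sample)

# `labels` is a placeholder for what a clustering routine would return:
# one cluster label per row, in the same order as `rows`
labels = [0, 1]
cluster_of = dict(zip(ids, labels))
print(cluster_of)
```

For the real file, the same function could be called on `open('yourcsv.csv', newline='', encoding='utf-8')` instead of the in-memory sample.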