I'm trying to find a way to perform clustering in Python that is similar to what I would do in Weka.
I tried scipy, but it takes an array as input.
What I have is a .csv file with rows of the form
objectId, attribute1, attribute2, ..., attributeN
e.g. '1234', 0, 1, 0, 1, 1, 1, ..., 0
Attributes 1, 2, ..., N take the values 0 or 1.
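For a file like this, one option is Python's built-in `csv` module: keep the objectIds in one list and the 0/1 attributes in a parallel list of rows, so row i always belongs to ids[i]. That list of rows is the array-shaped input scipy asks for. This is a minimal sketch, assuming the file has no header row and that the ids are wrapped in single quotes as in the example above; `load_binary_csv` is a hypothetical helper name.

```python
import csv
import io


def load_binary_csv(stream):
    """Split rows of `objectId, a1, ..., aN` into an id list and a 0/1 matrix.

    Row i of the matrix belongs to ids[i], so cluster labels computed on the
    matrix can later be matched back to the objectIds.
    Assumes there is no header row (skip one with next(reader) if there is).
    """
    ids, rows = [], []
    # quotechar="'" strips the single quotes around ids like '1234';
    # skipinitialspace=True ignores the blanks after each comma.
    reader = csv.reader(stream, quotechar="'", skipinitialspace=True)
    for record in reader:
        if not record:  # skip blank lines
            continue
        ids.append(record[0])
        rows.append([int(value) for value in record[1:]])
    return ids, rows


# Small in-memory sample in the same format as the question's example.
sample = io.StringIO("'1234', 0, 1, 0, 1\n'5678', 1, 1, 0, 0\n")
ids, rows = load_binary_csv(sample)
print(ids)   # ['1234', '5678']
print(rows)  # [[0, 1, 0, 1], [1, 1, 0, 0]]
```

For the real file, the same helper can be called as `with open('yourcsv.csv', newline='') as f: ids, rows = load_binary_csv(f)`.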
Is there a way to load this .csv file, perform clustering with a Python library, and get the cluster each objectId falls into?
My .csv file contains 300,000 objectId records.
I have converted my .csv file into .arff format for Weka, but clustering takes up to 6 hours, so I'm looking for a faster way and was hoping a Python library would be faster.
Thanks in advance.
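Once the ids and the 0/1 rows are separated, getting the cluster each objectId falls into amounts to clustering the rows and pairing the labels back with the ids. The sketch below uses plain k-means (Lloyd's algorithm) with squared Euclidean distance purely as an example, since the question doesn't say which Weka clusterer was used; `kmeans_binary` and `squared_distance` are hypothetical names, and `k`, `max_iter` and `seed` are illustrative parameters. It is written in pure Python to show the steps, which makes it slow for 300,000 rows; at that size a vectorized implementation such as `scipy.cluster.vq.kmeans2` or scikit-learn's `sklearn.cluster.KMeans` (whose `fit_predict` returns one label per row, in row order) is the practical choice.

```python
import random


def squared_distance(row, centroid):
    """Squared Euclidean distance between a 0/1 row and a centroid."""
    return sum((x - c) ** 2 for x, c in zip(row, centroid))


def kmeans_binary(ids, rows, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns {objectId: cluster index}."""
    rng = random.Random(seed)
    # Initial centroids: k rows picked at random (by position).
    centroids = [[float(x) for x in row] for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j, row=row: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:  # no row changed cluster, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member rows.
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Row i belongs to ids[i], so pair each id with its row's label.
    return dict(zip(ids, labels))


ids = ['a', 'b', 'c', 'd']
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
assignments = kmeans_binary(ids, rows, k=2)
# a and b end up together, c and d together (the label numbers may differ).
print(assignments['a'] == assignments['b'], assignments['c'] == assignments['d'])  # True True
```

k-means can settle in a local optimum that depends on the starting centroids, so it is usually run with several seeds, keeping the result with the lowest total within-cluster distance.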
I don't know if this is what you want, but:
To read the .csv:
# 'with' closes the file automatically once the lines have been read
with open('yourcsv.csv', mode='r', encoding='utf-8') as f:
    content = f.readlines()
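Now you can create a dictionary that maps each objectId to its attributes:
cluster = {}  # a dict, since the keys are objectId strings
for line in content:
    fields = line.strip().split(',')
    object_id = fields[0].strip().strip("'")  # drop spaces and the quotes around the id
    cluster[object_id] = [int(v) for v in fields[1:]]  # all N attributes, as ints
# Now you can access all the info like this
objectId = 'someIdentifier'
info = cluster[objectId]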
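The `cluster` dict above is keyed by objectId, while array-based routines such as scipy's expect one row per object. A minimal sketch of bridging the two: fix the id order once, build the rows in that order, and pair whatever labels come back with the same ids. The sample data and the `labels` list here are placeholders, not real clustering output; `numpy.array(matrix)` would turn the rows into the 2-D array scipy accepts.

```python
# Placeholder data in the same shape as the `cluster` dict above:
# objectId -> list of 0/1 attribute values.
cluster = {
    '1234': [0, 1, 0, 1],
    '5678': [1, 1, 0, 0],
}

# Fix one order for the ids; row i of `matrix` belongs to ids[i].
ids = list(cluster)
matrix = [cluster[object_id] for object_id in ids]

# A clustering routine that accepts a 2-D array returns one label per row,
# in the same order as `matrix`.
labels = [0, 1]  # placeholder labels, for illustration only
assignments = dict(zip(ids, labels))
print(assignments)  # {'1234': 0, '5678': 1}
```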