I have the following two codes that I wrote and the excel file. I just don't understand how to combine them, so that I can read from the excel file and using those numbers make a cluster.
import matplotlib.pyplot as plt
import xlrd
from matplotlib import style
style.use("ggplot")
from sklearn.cluster import KMeans
fileWorkspace = 'C://Users/bob/Desktop/'
pull=[]
wb1 = xlrd.open_workbook(fileWorkspace + 'try.xlsx')
sh1 = wb1.sheet_by_index(0)
for a in range(0,sh1.nrows):
for b in range(0,sh1.ncols):
pull.append(sh1.cell(a,b).value)
print('Finished in row' + str(a))
x = [11,19,23,33,44,91,92,90,60,63]
y = [92,85,22,25,86,78,63,51,66,15]
X = [list(item) for item in zip(x,y)]
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
print(centroids)
print(labels)
colors = ["g.","r.","y.","c.","m.","k."]
for i in range(len(X)):
print("coordinate:",X[i], "label:", labels[i])
plt.plot(X[i][0], X[i][1], colors[labels[i]], markersize = 10)
plt.scatter(centroids[:, 0],centroids[:, 1], marker = "x", s=150, linewidths=5, zorder=10)
plt.show()
The excel file image:
It is complicated because I have to read that single row of data and then make clusters. Also I have to skip rows and columns to read them.
If you're not opposed to using pandas you can use the read_excel function by doing:
import pandas as pd
# Read in data from first sheet
df = pd.read_excel(filename, sheetname=0, parse_cols='B:D', index_col=0, header=[0,1])
That way you can deal with the blank column as well as the header and data labels at the same time. From there you can access the data via df.values
in the form of a numpy array, or get a list of lists (y, x) pairs by doing:
pairs = df.values.tolist()
You can also use xlrd by iterating through the rows and/or columns using the appropriate ranges. For example, if you wanted to read in just the data in your example file into a list of lists you could do something like:
import xlrd
workbook = xlrd.open_workbook(filename)
sheet = workbook.sheet_by_index(0)
pairs = []
# Iterate through rows starting at the 3rd row)
for i in range(2, 15):
# Iterate through columns starting at the 3rd column
pairs.append([sheet.cell(i, j).value for j in range(2, 4)])
There's probably a better way to do this in xlrd, but I rarely ever use it.