python pandas jupyter-notebook cluster-analysis

K-means clustering from a pandas DataFrame instead of sample data

I have a CSV that I imported as a pandas DataFrame into my Jupyter notebook. The DataFrame has 12 columns: one "Timestamp" column and 11 columns with data for different countries.

What I am trying to do is visualize them as clusters. I found the following code, which creates a basic cluster plot from random sample data, but I have been struggling for a while to do the same with my own data. Does anyone know what I have to change to replace the random sample data with my DataFrame?

import numpy as np
import pandas as pd
import datetime as dt
from pylab import mpl, plt
plt.style.use('seaborn-v0_8')  # the 'seaborn' style name was renamed in matplotlib 3.6 and removed in 3.8
mpl.rcParams['font.family'] = 'serif'
np.random.seed(1000)

from sklearn.datasets import make_blobs  # the samples_generator module was removed in newer scikit-learn releases

X, y = make_blobs(n_samples=250, centers=4, random_state=500, cluster_std=1.25) 

plt.figure(figsize=(10,6))
plt.scatter(X[:,0], X[:,1], s=50);

Solution

Assuming you only want to look at one dimension at a time, you can plot each column against itself, something like this:

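someDF = pd.read_csv("myFile.csv")
print(someDF.columns)  # check the exact column names first
# Replace these with the country columns from your own file
columnsOfInterest = ['Austria', 'Norway', 'Belgium', 'Sweden', 'Spain']
plt.figure(figsize=(10,6))
for c in columnsOfInterest:
    # Each column is plotted against itself, so the points fall on the diagonal
    plt.scatter(someDF[c], someDF[c], s=50, label=c)
plt.legend()
plt.show()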

If you want a 2D plot (which is more typical for clustering), you can plot each column against a second column, something like:

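someDF = pd.read_csv("myFile.csv")
print(someDF.columns)  # check the exact column names first
# Replace these with the country columns from your own file
columnsOfInterest = ['Austria', 'Norway', 'Belgium', 'Sweden', 'Spain']
# 'OtherColumn' is a placeholder: use the column you want on the y-axis
secondColumn = 'OtherColumn'
plt.figure(figsize=(10,6))
for c in columnsOfInterest:
    # One scatter series per country column, all against the same second column
    plt.scatter(someDF[c], someDF[secondColumn], s=50, label=c)
plt.legend()
plt.show()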
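
The plots above only draw the raw columns. If you also want cluster labels, as the make_blobs example suggests, one possible approach is to turn two DataFrame columns into the same kind of X array and pass it to scikit-learn's KMeans. This is only a rough sketch under assumptions: the DataFrame below is made-up stand-in data so the code runs without your CSV, the column names 'Austria' and 'Norway' are taken from the example above, and n_clusters=2 is an arbitrary choice for the toy data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for pd.read_csv("myFile.csv"): a Timestamp column
# plus two made-up country columns, so the sketch runs without the real file.
rng = np.random.default_rng(1000)
n_rows = 250
someDF = pd.DataFrame({
    "Timestamp": pd.date_range("2020-01-01", periods=n_rows, freq="D"),
    "Austria": np.concatenate([rng.normal(0, 1, 125), rng.normal(10, 1, 125)]),
    "Norway": np.concatenate([rng.normal(0, 1, 125), rng.normal(10, 1, 125)]),
})

# Pick two columns and convert them to an (n_samples, 2) array, the same
# shape make_blobs returns as X. Rows with missing values are dropped first.
features = ["Austria", "Norway"]
X = someDF[features].dropna().to_numpy()

# n_clusters=2 is an assumption that suits this toy data; choose your own.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=500)
labels = kmeans.fit_predict(X)

print(X.shape)
print(np.bincount(labels))  # number of rows assigned to each cluster
```

With X and labels in hand, the scatter call from the question carries over directly: plt.scatter(X[:,0], X[:,1], c=labels, s=50) colors each point by its assigned cluster.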