python dataframe cluster-analysis linear-regression k-means

Encountering an error when trying to create an artificial dataframe in Python


This is my first post, so please pardon any mistakes on my part.

I was trying to create an artificial data frame to use with k-means clustering. When I run the function that creates the data set and then try to view the data frame, I get the error below.

TypeError: _append_dispatcher() missing 1 required positional argument: 'values'

I would appreciate your help in resolving this.

from scipy.stats import norm 
import random
from numpy import *
import numpy as np
from ast import literal_eval
from pandas import DataFrame
def create_clustered_data(N,k):
    random.seed(10)
    points_per_cluster=float(N)/k
    x=[]
    
    for i in range(k):
        income_centroid=random.uniform(20000,200000)
        age_centroid=random.uniform(20,70)
        for j in range(int(points_per_cluster)):
            x=np.append([random.normal(income_centroid,10000),random.normal(age_centroid,2)])
        x=np.array(x)
    return(x)

df=create_clustered_data(100,5)
df

Error Message

TypeError                                 Traceback (most recent call last)
<ipython-input-204-0ff0b56b46c6> in <module>
     18     return(x)
     19 
---> 20 df=create_clustered_data(100,5)
     21 df
     22 

<ipython-input-204-0ff0b56b46c6> in create_clustered_data(N, k)
     14         age_centroid=random.uniform(20,70)
     15         for j in range(int(points_per_cluster)):
---> 16             x=np.append([random.normal(income_centroid,10000),random.normal(age_centroid,2)])
     17         x=np.array(x)
     18     return(x)

<__array_function__ internals> in append(*args, **kwargs)

TypeError: _append_dispatcher() missing 1 required positional argument: 'values'


Solution

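  • Here x=[] creates a Python list, not a NumPy array. Also check the signature of the NumPy append function: it is np.append(arr, values), so it needs the array to append to as well as the new values. The call in the question passes only one argument, which is why 'values' is reported as missing. One way to solve the problem is to add each point to the list with list.append and then convert the list to a NumPy array.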

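    import numpy as np

    def create_clustered_data(N, k):
        # np.random is used explicitly; in the question, `from numpy import *`
        # had rebound the name `random` to numpy.random.
        np.random.seed(10)  # fixed seed so the artificial data is reproducible
        points_per_cluster = int(N / k)
        x = []  # plain Python list of [income, age] rows
        for i in range(k):
            # Pick a random centre for this cluster
            income_centroid = np.random.uniform(20000, 200000)
            age_centroid = np.random.uniform(20, 70)
            for j in range(points_per_cluster):
                # list.append adds the new row to x in place
                x.append([np.random.normal(income_centroid, 10000),
                          np.random.normal(age_centroid, 2)])
        # Convert to a NumPy array once, after all rows have been collected
        return np.array(x)

    df = create_clustered_data(100, 5)
    df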
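
To make the difference between the two append calls concrete, here is a minimal sketch. The sample numbers and the variable names (a, b, rows, points, raised) are made up for illustration and are not part of the question's data.

```python
import numpy as np

# np.append(arr, values) needs both arguments and returns a NEW array;
# it never modifies `arr` in place.
a = np.array([1.0, 2.0])
b = np.append(a, [3.0, 4.0])  # result is flattened because no axis is given

# list.append, by contrast, modifies the list in place and takes one argument.
# Hypothetical two-point example of the list-then-convert pattern.
rows = []
rows.append([50000.0, 35.0])
rows.append([62000.0, 41.0])
points = np.array(rows)  # 2-D array with one row per point

# Calling np.append with a single argument raises TypeError,
# which is the situation in the question.
try:
    np.append([1.0, 2.0])
    raised = False
except TypeError:
    raised = True
```

Note that the function returns a NumPy array even though the result is stored in a variable called df. If an actual pandas DataFrame is wanted, a 2-D array like this can be passed to pandas.DataFrame together with column names of your choosing, for example columns=["income", "age"].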