python · scikit-learn · cluster-analysis · dbscan

Is there an easy way to use DBSCAN in python with dimensions higher than 2?


I've been working on a machine learning project using clustering algorithms, and scikit-learn's DBSCAN implementation looks like a good fit for the data I'm working with. However, whenever I try to run it on my feature arrays, it throws the following error:

ValueError: Found array with dim 3. Estimator expected <= 2.

This gives me the impression that scikit's DBSCAN only supports two-dimensional features. Am I wrong in thinking this? If not, is there an implementation of DBSCAN that supports higher-dimensional feature arrays? Thanks for any help you can offer.

Edit

Here's the code that I'm using for my DBSCAN script. The idea is to read data from a number of different CSVs, save them into an array, and then dump them into a pickle file so that the model can load them in the future and run DBSCAN.

def get_clusters(fileList, arraySavePath):
    # Create empty array
    fitting = []

    # Get values from all files, save to singular array
    for filePath in fileList:
        df = pd.read_csv(filePath, usecols=use_cols)
        fitting.append(df.values.tolist())

    # Save array to its own pickle file
    with open(arraySavePath, "wb") as fp:
        pickle.dump(fitting, fp)


def predict_cluster(modelPath, predictInput):
    # Load the cluster data
    with open(modelPath, "rb") as fp:
        fitting = pickle.load(fp)

    # DBSCAN fit
    clustering = DBSCAN(eps=3, min_samples=2)
    clustering.fit(fitting)

    # Predict the label
    return clustering.predict_fit(predictInput)

Solution

  • The error isn't about the number of features; DBSCAN handles any number of features. "dim 3" refers to the dimensionality of the input array itself: because get_clusters appends each file's rows as a nested list, fitting ends up as a 3-D structure of shape (n_files, n_rows, n_features), while scikit-learn estimators expect a 2-D array of shape (n_samples, n_features). Use fitting.extend(df.values.tolist()) instead of append (or stack the per-file arrays with np.vstack) so every row from every file lands in a single 2-D array. Also note that DBSCAN has no predict (or predict_fit) method, since it doesn't build a model that generalizes to unseen points; call fit_predict to get the cluster labels for the fitted data.
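
Here's a minimal sketch of both the problem and the fix, using made-up 3-feature data in place of the CSV rows (the nested `fitting` list below mimics what `get_clusters` builds with `append`):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stand-in for the per-file row lists built in get_clusters:
# appending each file's rows produces a 3-D nested structure.
fitting = [
    [[1.0, 2.0, 3.0], [1.1, 2.1, 3.1]],    # rows from file 1
    [[8.0, 9.0, 10.0], [8.1, 9.1, 10.1]],  # rows from file 2
]
print(np.asarray(fitting).ndim)  # 3 -> this is what triggers the ValueError

# Flatten to a single 2-D (n_samples, n_features) array before fitting
X = np.vstack(fitting)
print(X.shape)  # (4, 3) -- 4 samples, 3 features; DBSCAN is fine with this

clustering = DBSCAN(eps=3, min_samples=2)
labels = clustering.fit_predict(X)  # DBSCAN has no separate predict()
print(labels)  # [0 0 1 1]
```

With the rows stacked into one 2-D array, the same `eps`/`min_samples` settings run without error and return one label per sample (-1 would mark noise points).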