Tags: python, json, loops, scikit-learn, pipeline

Loading Scikit-Learn Pipeline Steps from JSON


I am a newbie to Python and ML, so please bear with me.

I’m trying to programmatically generate scikit-learn pipeline steps based on parameters in a JSON file.

The parameters are as follows:

{
    "scaler": ["STAN", "MINMAX", "MAXABS", "ROBUST", "PT"],
    "imbalance": ["SMOTE", "RUS", "SMOTEENN"],
    "classifier": ["SVM", "RF", "GBC"]
}

I’m trying to loop through each of the lists, starting with the scaler and adding pipeline steps with each iteration. The goal is to have one pipeline for each possible combination.

The following loops give me some output:

complete_pipelines = []
pipeline_steps = []
for scaler in scalerList:
    pipeline_steps.append(scaler)

    for imbalancer in imbalancerList:
        pipeline_steps.append(imbalancer)

        for classifier in classifierList:
            pipeline_steps.append(classifier)

            if (len(pipeline_steps) == 3):
                complete_pipelines.append(pipeline_steps)
                pipeline_steps = []
                pipeline_steps.append(scaler)
                pipeline_steps.append(imbalancer)

What I am getting with the above method is:

[
  ['STAN', 'SMOTE', 'SVM'], 
  ['STAN', 'SMOTE', 'RF'], 
  ['STAN', 'SMOTE', 'GBC']
]

Which is a start, but what I am looking for is:

[
  ["STAN", "SMOTE", "SVM"],
  ["STAN", "SMOTE", "RF"],
  ["STAN", "SMOTE", "GBC"],
  ["STAN", "RUS", "SVM"],
  ["STAN", "RUS", "RF"],
  ["STAN", "RUS", "GBC"],
  ["STAN", "SMOTEENN", "SVM"],
  ["STAN", "SMOTEENN", "RF"],
  ["STAN", "SMOTEENN", "GBC"],

  ["MINMAX", "SMOTE", "SVM"],
  ["MINMAX", "SMOTE", "RF"],
  ["MINMAX", "SMOTE", "GBC"],
  .
  .
  .
  ["PT", "SMOTEENN", "SVM"],
  ["PT", "SMOTEENN", "RF"],
  ["PT", "SMOTEENN", "GBC"]
]

After these lists are generated I will go through each of the lists and set up the pipelines accordingly. But just getting this to work has already stumped me…

I’d appreciate any pointers on how to implement such a loop, or on whether there is a built-in function that creates such sets.

Thanks, Steve


Solution

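  • Maybe something like this. Building a fresh list in the innermost loop avoids carrying steps over from one iteration to the next: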

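    # Assumes scalerList, imbalancerList and classifierList were already
    # loaded from the JSON file, as in the question.
    complete_pipelines = []
    for scaler in scalerList:
        for imbalancer in imbalancerList:
            for classifier in classifierList:
                # Build a fresh three-step list for every combination.
                complete_pipelines.append([scaler, imbalancer, classifier])

    print(complete_pipelines)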

    http://ideone.com/tftPGd
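
As for a "built in" way to do this, the standard library's itertools.product should produce the same combinations without writing the nested loops by hand. The following is only a sketch: the JSON text is inlined from the question so the example is self-contained, and the variable names params_json and params are assumptions, not part of the original code.

```python
import json
from itertools import product

# Parameter JSON from the question, inlined here so the sketch runs on its own.
# In practice this would be read from the JSON file instead.
params_json = """
{
    "scaler": ["STAN", "MINMAX", "MAXABS", "ROBUST", "PT"],
    "imbalance": ["SMOTE", "RUS", "SMOTEENN"],
    "classifier": ["SVM", "RF", "GBC"]
}
"""
params = json.loads(params_json)

# product() yields tuples in the same order as the nested loops:
# the last list (classifier) varies fastest, the first (scaler) slowest.
complete_pipelines = [
    list(combo)
    for combo in product(
        params["scaler"], params["imbalance"], params["classifier"]
    )
]

print(len(complete_pipelines))  # number of combinations: 5 * 3 * 3
print(complete_pipelines[0])
print(complete_pipelines[-1])
```

Each entry is converted to a list to match the output format asked for in the question; if tuples are acceptable, list(product(...)) alone would do.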