python pyspark modeling

How to visualize variable grouping or perform interactive grouping in the PySpark world?


I was wondering whether there is a way to perform interactive variable grouping (similar to the one offered by SAS Enterprise Miner) in the PySpark/Python world. Variable grouping is an integral part of model development, so I suppose there must already be some tool or library that supports it. Does anyone have experience with this?


Solution

  • Currently no such library exists for Python.

    Interactive variable grouping is a multi-step process (offered as a node called IGN in SAS Enterprise Miner) that is part of the SAS EM Credit Scoring solution, not of Base SAS. The Python world does have tools for some of the IGN steps, such as binning, weight of evidence (WoE), Gini, and decision trees. Scikit-learn is a good starting point for those.

    There are many scikit-learn related projects, including domain-specific ones. A project for credit scoring is a potential candidate for that list.
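As a rough illustration of the non-interactive pieces mentioned above, here is a minimal sketch of two of those steps: decision-tree binning of one numeric variable with scikit-learn, followed by WoE and information value per bin with NumPy. This is not a reimplementation of the SAS IGN node. The helper names `tree_bin_edges` and `woe_iv`, the smoothing constant `eps`, the convention WoE = ln(share of goods / share of bads), and the synthetic data are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tree_bin_edges(x, y, max_bins=4, min_bin_frac=0.05):
    """Derive cut points for one numeric variable from a shallow decision tree.

    Hypothetical helper: max_bins and min_bin_frac are illustrative defaults.
    """
    tree = DecisionTreeClassifier(
        max_leaf_nodes=max_bins,
        min_samples_leaf=min_bin_frac,  # a float is read as a fraction of rows
        random_state=0,
    )
    tree.fit(x.reshape(-1, 1), y)
    # Internal nodes carry a real feature index; leaves are marked with -2.
    thresholds = tree.tree_.threshold[tree.tree_.feature >= 0]
    return np.sort(thresholds)


def woe_iv(x, y, edges, eps=0.5):
    """Weight of evidence per bin and total information value.

    y is 1 for "bad" and 0 for "good". eps is an assumed smoothing count
    that keeps the logarithm finite when a bin has no goods or no bads.
    """
    # right=True mirrors the tree's "x <= threshold goes left" rule.
    bins = np.digitize(x, edges, right=True)
    n_bins = len(edges) + 1
    bad = np.bincount(bins, weights=y, minlength=n_bins)
    good = np.bincount(bins, weights=1 - y, minlength=n_bins)
    bad_dist = (bad + eps) / (bad + eps).sum()
    good_dist = (good + eps) / (good + eps).sum()
    woe = np.log(good_dist / bad_dist)
    iv = float(np.sum((good_dist - bad_dist) * woe))
    return woe, iv


# Synthetic data (assumption): the bad rate falls as the variable increases.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
y = (rng.random(5000) < 1 / (1 + np.exp(2 * x))).astype(int)

edges = tree_bin_edges(x, y)
woe, iv = woe_iv(x, y, edges)
print("cut points:", edges)
print("WoE per bin:", woe)
print("information value:", iv)
```

The interactive part of the SAS workflow would then amount to editing `edges` by hand (for example, dropping a cut point to merge two bins) and calling `woe_iv` again to see how the WoE pattern and information value change.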