Tags: machine-learning, deep-learning, artificial-intelligence, random-forest

Can I use machine learning on the dataset sample below?


Dataset Sample

Can I train any algorithm on the dataset above? Each row (Id) has a dependent variable (Status), but each Id also spans multiple rows of features. You can think of it as: each Id has multiple transactions, and all transactions of an Id share a common Status. Will machine learning find patterns in these transactions?

Is there any other approach to solving this type of problem?


Solution

  • Fill the empty ID cells with the value from the row above, and do the same for the Status column. This leads to:

    df
    ID    Feature1  Feature2  Feature3  Status
    8079  100       Asia      High      Approved
    8079  200       Africa    Low       Approved
    

    When you run a classification algorithm, you can use ID, Feature1, Feature2 and Feature3 as features and Status as the target. A classifier will learn from this, and everything else is exactly the same as before. The features are still independent: you would only have dependent features if the variables were somehow related to each other. In your case, the ID 8079 does not imply Feature2: Africa, so they are independent.

    You can fill your cells with:

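    import numpy as np

    # Treat empty strings as missing, then copy ID and Status down from the row above
    df = df.replace("", np.nan)
    df[["ID", "Status"]] = df[["ID", "Status"]].ffill()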
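
    As a rough end-to-end sketch of the steps so far, the snippet below forward-fills a small frame and then builds a feature matrix and target that a classifier could consume. The rows for ID 8080 and the "Rejected" status are made up for illustration, the column names are taken from the table above, and one-hot encoding with pd.get_dummies is just one possible way to handle the text columns, not something the question prescribes.

```python
import numpy as np
import pandas as pd

# Made-up rows in the question's layout: ID and Status appear only on the
# first row of each group; the remaining rows leave them blank.
raw = pd.DataFrame(
    {
        "ID": ["8079", "", "8080", ""],
        "Feature1": [100, 200, 150, 300],
        "Feature2": ["Asia", "Africa", "Europe", "Asia"],
        "Feature3": ["High", "Low", "Low", "High"],
        "Status": ["Approved", "", "Rejected", ""],
    }
)

# Blank strings become NaN so that ffill can copy the value from the row above.
df = raw.replace("", np.nan)
df[["ID", "Status"]] = df[["ID", "Status"]].ffill()

# One-hot encode the text features; ID is kept as a feature, as suggested above.
X = pd.get_dummies(
    df[["ID", "Feature1", "Feature2", "Feature3"]],
    columns=["Feature2", "Feature3"],
)
X["ID"] = X["ID"].astype(int)

# Status is the target.
y = df["Status"]
```

    X and y can then be passed to the classifier of your choice.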
    

    Based on your comments, the approach can be slightly different: you need to convert each ID's entries into new features (see "Python pandas convert rows to columns where multiple columns exist"). The dataframe should then look like:

    ID Feature1 Feature2 Feature3  Feature1a .... Feature3z Status
    8079 100    Asia      High    200                       Approved
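
    One possible way to get to this wide layout, sketched under assumptions: the frame has already been forward-filled, the helper column "txn" and the numeric suffixes ("Feature1_0", "Feature1_1", ...) are hypothetical names chosen here in place of the "a ... z" suffixes shown above, and the pivot approach is only one of several ways to do this.

```python
import pandas as pd

# Long format after forward-filling: one row per transaction.
df = pd.DataFrame(
    {
        "ID": [8079, 8079],
        "Feature1": [100, 200],
        "Feature2": ["Asia", "Africa"],
        "Feature3": ["High", "Low"],
        "Status": ["Approved", "Approved"],
    }
)

# Number the transactions within each ID: 0, 1, 2, ...
df["txn"] = df.groupby("ID").cumcount()

# Spread each transaction's features into its own set of columns.
wide = df.pivot(
    index="ID",
    columns="txn",
    values=["Feature1", "Feature2", "Feature3"],
)

# Flatten the (feature, txn) column MultiIndex into names like "Feature1_0".
wide.columns = [f"{feature}_{txn}" for feature, txn in wide.columns]

# Status is constant within an ID, so the first value per group is enough.
wide["Status"] = df.groupby("ID")["Status"].first()
wide = wide.reset_index()
```

    Each ID now occupies a single row. IDs with fewer transactions than the longest one end up with NaN in the unused columns, which need to be imputed or otherwise handled before training.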