math logic artificial-intelligence probability economics

Categorize a loan by risk into one of three categories: Red, Yellow, or Green. The risk is calculated from the applicant's data.

Here, risk means the chance that the applicant will not be able to pay the loan back if someone lends him the money.

Our app has data about users applying for loans, such as:

    grossAnnualIncome;
    monthlySalary;
    monthly rent;
    educationalExpenses;
    houseHoldExpenses;
    monthlyPersonalExpenses;
    monthlySavings;

Using the above data, I want to find the probability that he will be able to pay back the loan. If the probability is high he is Green, if it is low he is Red, and if it is in between he is Yellow.

Are there any formulas for this?

If there is a formula for this that requires more data than the fields listed above, those solutions would also be a lot of help.

Note: A person with a monthlySalary of $50,000 may be Red when he asks for a loan of $500,000, but he is definitely Green for loans of around $50,000 to $65,000 and Yellow for loans of around $70,000 to $90,000, since he may be able to pay those back (assuming his rent is $0 and his expenses are $0 as well).

Also, if that same person has a monthly salary of $50,000 but pays rent of $44,000 and has houseHoldExpenses of $9,000, a requested loan of $50,000 must be Red: his rent and household expenses ($53,000) already exceed his salary, so there is no way he can pay it back.

Solution

"Formula"?

No such thing.

This sounds like a multi-class classification problem.

You should have a data set of the variables you listed, with one record per individual. Each row should have the red/yellow/green designation.

If the rows are not assigned a red/yellow/green status, you'll have to create one. In that case you should create a regression model that gives you a probability of repayment, expressed either as 0-100% or as 0-1. You then assign red/yellow/green risk as probability bins: 0-50% = red, 50-70% = yellow, 70-100% = green.

You can adjust the bin cutoffs to suit your appetite for risk. These are just my examples. A real application would look at repayment patterns and set the bins based on those.
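To make the binning step concrete, here is a minimal sketch in Python. The function name `risk_category` and the parameters `red_below` and `green_from` are made up for illustration, and the defaults are only the example cutoffs from above. The example bins share their edges, so this sketch assumes a probability of exactly 0.50 counts as yellow and exactly 0.70 counts as green.

```python
def risk_category(p_repay, red_below=0.50, green_from=0.70):
    """Map a repayment probability in [0, 1] to 'red', 'yellow' or 'green'.

    The cutoffs are illustrative defaults, not values derived from real data.
    """
    if not 0.0 <= p_repay <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_repay < red_below:
        return "red"
    if p_repay < green_from:
        return "yellow"  # assumption: the lower edge belongs to the higher bin
    return "green"


# Example usage with the default cutoffs.
categories = [risk_category(p) for p in (0.20, 0.55, 0.90)]
```

A more cautious lender could call `risk_category(p, red_below=0.60, green_from=0.85)` without touching the model itself.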

Your job is to divide the data into three parts: train/validation/test.
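One way to do the split, sketched with only the standard library. The 70/15/15 proportions, the fixed seed, and the helper name `train_val_test_split` are assumptions for the example, not requirements.

```python
import random


def train_val_test_split(records, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the records and cut them into train / validation / test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is training data
    return train, val, test


# Example usage: 100 dummy record ids stand in for the loan applicants.
train, val, test = train_val_test_split(range(100))
```

Every record lands in exactly one of the three lists, so the test set stays unseen until the final check.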

Create a model using your technique of choice (e.g. neural network, random forest, XGBoost, etc.) by training it on the train set. Tune and validate your models on the validation set. Once you have settled on a model, give it the test set and compare its predictions to the known test set results.
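The final comparison on the test set can be as simple as an accuracy figure plus a confusion table. The sketch below is independent of which model you chose; the helper name `evaluate` and the sample labels are invented for illustration.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return (accuracy, confusion), where confusion counts (actual, predicted) pairs."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    confusion = Counter(zip(y_true, y_pred))
    correct = sum(n for (actual, predicted), n in confusion.items() if actual == predicted)
    return correct / len(y_true), confusion


# Made-up labels, only to show the call.
y_true = ["green", "green", "yellow", "red", "red", "yellow"]
y_pred = ["green", "yellow", "yellow", "red", "green", "yellow"]
accuracy, confusion = evaluate(y_true, y_pred)
```

The confusion table matters more than the single accuracy number here: an actual red predicted as green (`confusion[("red", "green")]`) is probably the most expensive mistake for a lender.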

You might try creating multiple models to see how they do. Sometimes blending several models gives better results than just one.
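The simplest form of blending is a (weighted) average of the repayment probabilities that each model predicts for the same applicant. This is a sketch of that one approach only; the function name `blend` and the numbers are hypothetical.

```python
def blend(probabilities, weights=None):
    """Weighted average of several models' repayment probabilities for one applicant."""
    if not probabilities:
        raise ValueError("need at least one probability")
    if weights is None:
        weights = [1.0] * len(probabilities)  # default: plain average
    if len(weights) != len(probabilities):
        raise ValueError("need one weight per probability")
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)


# Example: three hypothetical models score the same applicant.
blended = blend([0.62, 0.74, 0.80])
```

If you use weights, pick them on the validation set, not the test set, so the final test comparison stays honest.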

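You need to trade off bias and variance for each model: a model that is too simple underfits (high bias), while one that is too flexible overfits the training set (high variance). Performance on the validation set is how you check that balance.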

Good luck!