
How do I know the distribution of a dataset in R?


Here I take DemocracyIncome as an example. I need to simulate this dataset, but I don't know which probability distributions to choose for democracy and income. The dataset can be loaded with the following code:

library(pder)
data("DemocracyIncome", package = "pder")

Can anyone help me identify the distributions of income and democracy? How can I simulate democracy? I also made a density plot and found that democracy has two to three peaks, so it looks bimodal or multimodal.


Solution

  • I had a similar question some days ago and got helpful insights in the following post: How to identify the distribution of the given data using r

    Adapted to your dataset:

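    library(pder)
    library(fitdistrplus)
    data("DemocracyIncome", package = "pder")

    # Drop rows with missing values before fitting
    demo <- na.omit(DemocracyIncome)

    # Cullen and Frey graph: skewness vs. kurtosis against common distributions
    descdist(demo$income, discrete = FALSE)

    # Fit a normal distribution by maximum likelihood and plot the diagnostics
    normal_dist <- fitdist(demo$income, "norm")
    plot(normal_dist)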
    

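    The first plot (the Cullen and Frey graph from descdist) helps you identify candidate distributions; the second set of plots (density, Q-Q, CDF, and P-P) lets you check how well the normal distribution fits. Hope this helps.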
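For the simulation part of the question, a multimodal variable like democracy may not match any single named distribution well. One option, sketched below, is a smoothed bootstrap: resample the observed values and add Gaussian noise with the kernel bandwidth, which is equivalent to drawing from a Gaussian kernel density estimate. This is only a sketch and not part of the original answer; the helper name `simulate_from_kde` is hypothetical, and the default bandwidth (`bw.nrd0`, the same default `density()` uses) is an assumption you may want to tune.

```r
# Hypothetical helper (not from the original answer): draw n values from a
# Gaussian kernel density estimate of x, i.e. a smoothed bootstrap.
simulate_from_kde <- function(x, n, bw = NULL) {
  x <- x[!is.na(x)]                      # drop missing values
  if (is.null(bw)) {
    bw <- stats::bw.nrd0(x)              # assumption: default bandwidth rule
  }
  # Resample observed values, then jitter each draw with kernel-width noise
  sample(x, size = n, replace = TRUE) + stats::rnorm(n, mean = 0, sd = bw)
}

# Usage with the dataset from the question (runs only if pder is installed)
if (requireNamespace("pder", quietly = TRUE)) {
  data("DemocracyIncome", package = "pder")
  set.seed(1)
  sim_democracy <- simulate_from_kde(DemocracyIncome$democracy, n = 1000)
  # Compare the simulated and observed shapes
  plot(density(DemocracyIncome$democracy, na.rm = TRUE))
  lines(density(sim_democracy), lty = 2)
}
```

Because the noise is unbounded, simulated values can fall outside the observed range. If the variable is bounded (check with `range(DemocracyIncome$democracy, na.rm = TRUE)`), you may want to clip the draws or use a smaller bandwidth.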