
How can I build a histogram with factor intervals?


I need to build a histogram from some factors, but those factors describe numeric intervals, for example 0-2000, 2000-4000, 4000-6000, 6000-8000 and 8000-10000. I know how frequently items fall into each of those intervals. How would I do it?

I've tried turning the intervals into numbers, but I didn't really get anywhere.


Solution

  • Your problem is how to convert factors that encode numbers into something numeric so that you can plot a histogram from them.

    quux <- data.frame(x = factor(c("0-2000", "2000-4000", "4000-6000", "6000-8000", "8000-10000")))
    quux
    #            x
    # 1     0-2000
    # 2  2000-4000
    # 3  4000-6000
    # 4  6000-8000
    # 5 8000-10000
    

    I think the easiest start is to extract the two numbers from each level string, i.e. the lower and upper bound of each interval.

    nums <- lapply(strsplit(levels(quux$x), "[^0-9]+"), as.numeric)
    str(nums)
    # List of 5
    #  $ : num [1:2] 0 2000
    #  $ : num [1:2] 2000 4000
    #  $ : num [1:2] 4000 6000
    #  $ : num [1:2] 6000 8000
    #  $ : num [1:2] 8000 10000
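    If you want the bounds themselves, for example as histogram break points, you can flatten the pairs and drop the shared endpoints. This is a small sketch that assumes the intervals are contiguous and non-overlapping, as in your example; `brks` is just an illustrative name:

    ```r
    quux <- data.frame(x = factor(c("0-2000", "2000-4000", "4000-6000", "6000-8000", "8000-10000")))
    nums <- lapply(strsplit(levels(quux$x), "[^0-9]+"), as.numeric)

    # Flatten all lower/upper bounds, remove the duplicated shared endpoints
    # and sort them so they can serve as histogram break points
    brks <- sort(unique(unlist(nums)))
    brks
    # [1]     0  2000  4000  6000  8000 10000
    ```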
    

    You can then reduce each pair to whichever single number you want it to represent. Examples:

    ### first of each pair
    sapply(nums, `[[`, 1)
    # [1]    0 2000 4000 6000 8000
    
    ### minimum of each pair; differs from the above if the bounds are not
    ### always in order. 'na.rm = TRUE' drops any NA that as.numeric()
    ### produces when a piece of the string is not a number
    sapply(nums, min, na.rm = TRUE)
    # [1]    0 2000 4000 6000 8000
    
    ### average of each pair
    sapply(nums, mean)
    # [1] 1000 3000 5000 7000 9000
    

    Whichever you choose, you can then pass those values to whatever histogram-plotting function you plan to use.
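    Since you already know how many items fall into each interval, one way to feed them to `hist()` is to repeat each interval's midpoint by its frequency and bin with the original bounds as `breaks`. This is a sketch, not the only approach: the `freq` values below are hypothetical placeholders (replace them with your real counts, in the same order as `levels(quux$x)`), and `barplot()` is shown as an alternative that plots the counts directly:

    ```r
    quux <- data.frame(x = factor(c("0-2000", "2000-4000", "4000-6000", "6000-8000", "8000-10000")))
    nums <- lapply(strsplit(levels(quux$x), "[^0-9]+"), as.numeric)

    # Hypothetical frequencies, one per level of quux$x (in level order)
    freq <- c(5, 12, 20, 9, 3)

    mids <- sapply(nums, mean)           # midpoint of each interval
    brks <- sort(unique(unlist(nums)))   # interval bounds used as breaks

    # Repeat each midpoint by its frequency, then bin with the original bounds;
    # each midpoint lies inside its own interval, so the bar heights equal freq
    vals <- rep(mids, times = freq)
    h <- hist(vals, breaks = brks, main = "Items per interval", xlab = "Value")

    # Alternative: draw the pre-counted bars directly, with no gaps between them
    barplot(freq, names.arg = levels(quux$x), space = 0)
    ```

    Keep in mind that factor levels are sorted alphabetically by default, so a level such as "10000-12000" would sort before "2000-4000"; if your intervals go that high, order them by their lower bound before matching them to the frequencies.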