r plot ggplot2 power-law

Creating a power-law distribution chart from raw data


I have raw data that, when charted, should form a power-law distribution. I'm not sure how to smooth the chart. I can do it in Excel, but I want to do it in R. I have a data frame with two columns, frequency and proportion. Frequency is how often a word is used in a document; proportion is the percentage. I want to plot frequency on the X axis and proportion on the Y axis. I tried barplot and ggplot.

The barplot looks right after adjusting the space. But for some reason I can only show the numbers on the Y axis, and can't make the numbers appear on the X axis.

The ggplot isn't as smooth.

If I convert the plot into a density plot, it changes the measurement on the Y axis.

How do I plot X and Y and retain all the measurement labels?

barplot(height = speech$proportion, width = speech$frequency, density = FALSE, space = 10, border = "green", xlab = "Speech Frequency", ylab = "Percentage of Words")


And with ggplot:

ggplot(speech, aes(x = frequency, y = proportion)) + geom_bar(stat = "identity", fill = "green", colour = "green") + xlab("Speech Frequency") + ylab("Proportion")


This is what it looks like in Excel, which is what I want.


Solution

  • Changing the labels on the x-axis with barplot is tedious. I usually use the gridBase package for this purpose.

    CODE:

    # 1: generating some mock-up data
    speech <- data.frame(frequency = c(500, 250, 125, 75, 20, 10, 5, 3, 1, 1, 1),
                         proportion = c(500, 250, 125, 75, 20, 10, 5, 3, 1, 1, 1) / 100)
    # 2: calling barplot with filled bars and with space = 0 (no space between bars);
    #    barplot() returns the x-coordinates of the bar midpoints
    midpts <- barplot(height = speech$proportion, col = "green", space = 0, border = "green",
                      xlab = "Speech Frequency", ylab = "Percentage of Words")
    # 3: loading grid and gridBase, and using them to generate the x-axis labels
    library(grid)      # provides pushViewport(), grid.text() and unit()
    library(gridBase)  # provides baseViewports()
    vps <- baseViewports()
    pushViewport(vps$inner, vps$figure, vps$plot)
    grid.text(speech$frequency, x = unit(midpts, "native"), y = unit(-0.5, "lines"), just = "right", rot = 90)
    popViewport(3)     # pop the three viewports pushed above


    RESULT:

    barplot in R with x labels
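
For a simple case like this, base barplot() can also label the bars directly through its names.arg argument, without gridBase. This is only a minimal sketch, assuming the same mock-up data as above (not the asker's real data). The las = 2 and cex.names = 0.8 settings are illustrative choices to rotate and shrink the labels. Note that the axis may still drop labels that would overlap when there are many bars.

```r
# Mock-up data in the same shape as the example above (assumed values)
speech <- data.frame(
  frequency  = c(500, 250, 125, 75, 20, 10, 5, 3, 1, 1, 1),
  proportion = c(500, 250, 125, 75, 20, 10, 5, 3, 1, 1, 1) / 100
)

# names.arg puts one label under each bar;
# las = 2 draws the labels perpendicular to the axis
midpts <- barplot(
  height    = speech$proportion,
  names.arg = speech$frequency,
  space     = 0,
  col       = "green",
  border    = "green",
  las       = 2,
  cex.names = 0.8,
  xlab      = "Speech Frequency",
  ylab      = "Percentage of Words"
)
```

The gridBase approach gives finer control over label placement and rotation, while names.arg is usually enough when the labels are short.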