Simple Function to normalize related objects


I'm quite new to R and I'm trying to write a function that normalizes my data in different dataframes.

The normalization itself is simple: I divide the numbers I want to normalize by the population size of each object (stored in the table population). To know which objects correspond to each other, I tried to use the IDs stored in the first column of each dataframe.

I did it this way because some objects in the population dataframe have no corresponding object in the dataframes to be normalized; that is, those dataframes sometimes contain fewer objects.

Normally one would build a relational database (which I tried), but that didn't work out for me. So I tried to relate the objects within the function, but the function didn't work. Maybe someone here has experience with this and can help me.

My attempt at writing this function was:

    # Load Tables
    # Agriculture, Annual Crops
    table.annual.crops <-read.table ("C:\\Users\\etc", header=T,sep=";")
    # Agriculture, Biennial and Perennial Crops
    table.bianual.crops <-read.table ("C:\\Users\\etc", header=T,sep=";")
    # Fishery
    table.fishery <-read.table ("C:\\Users\\etc", header=T,sep=";")
    # Population per Municipality
    table.population <-read.table ("C:\\Users\\etc", header=T,sep=";")

    # attach data
    attach(table.annual.crops)
    attach(table.bianual.crops)
    attach(table.fishery)
    attach(table.population)


    # Create a function to normalize data
    # Objects should be related by their ID in the first column
    # Values to be normalized and the population appear in the second column
    funktion.norm.percapita<-function (x,y){if(x[,1]==y[,1]){x[,2]/y[,2]}else{return("0")}}

    # execute the function
    funktion.norm.percapita(table.annual.crops,table.population)

Solution

  • Let's start with the attach steps... why? It's usually unnecessary and can get you into trouble, especially since both your population data.frame and your crops data.frame have Geocode as a column!

    As suggested in the comments, you can use merge. By default, this combines data.frames on the columns whose names they share. You can specify the columns to merge on with the by, by.x, and by.y parameters.

    dat <- merge(table.annual.crops, table.population)
    dat$crop.norm <- dat$CropValue / dat$Population
    

    The reason your function isn't working? Look at the result of the condition in your if statement.

    table.annual.crops[,1] == table.population[,1]
    

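    This gives a logical vector, one element per position, with the shorter vector recycled to the length of the longer one. It compares rows by position rather than by ID, and if() expects a single TRUE or FALSE, not a whole vector.

    If your data is quite large (on the order of millions of rows), the merge function can be slow. If this is the case, take a look at the data.table package and use its merge function instead.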
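
To make both points concrete, here is a minimal sketch on made-up data. The column names Geocode, CropValue and Population follow the answer above, but the values are invented, and the helper norm.percapita is hypothetical: it uses match() as one possible way to keep the lookup inside a function, which is an alternative to merge rather than something the answer itself proposes.

```r
# Toy data; every value here is invented for illustration.
population <- data.frame(
  Geocode    = c(101, 102, 103, 104),
  Population = c(1000, 2500, 400, 8000)
)
crops <- data.frame(
  Geocode   = c(104, 101, 103),   # 102 is missing and the order differs
  CropValue = c(1600, 50, 20)
)

# 1. Why the original if() test fails: `==` compares position by position
#    and recycles the shorter vector (R warns here because 4 is not a
#    multiple of 3), so the result is a logical vector, not one TRUE/FALSE.
same.position <- suppressWarnings(crops[, 1] == population[, 1])
print(same.position)

# 2. merge() pairs rows by key instead of by position. `by` names the key
#    explicitly. Population rows without a crop record are dropped;
#    pass all.y = TRUE to keep them with NA in CropValue.
dat <- merge(crops, population, by = "Geocode")
dat$crop.norm <- dat$CropValue / dat$Population
print(dat)

# 3. Hypothetical helper: the same ID lookup inside a function.
#    match() returns, for each ID in x, its row position in y.
#    The result keeps the row order of x; IDs absent from y give NA.
norm.percapita <- function(x, y) {
  idx <- match(x[, 1], y[, 1])
  x[, 2] / y[idx, 2]
}
print(norm.percapita(crops, population))
```

Note that merge sorts the result by the key by default, whereas the match()-based helper returns values in the original row order of its first argument, so the two outputs line up by ID but not necessarily by position.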