R Saving function output to object when using assign function

I am currently trying to make my code dryer by rewriting some parts with the help of functions. One of the functions I am using is:

datasetperuniversity<-function(university,year){assign(paste("data",university,sep=""),subset(get(paste("originaldata",year,sep="")),get(paste("allcollaboration",university,sep=""))==1))}

Executing the function datasetperuniversity("Harvard","2000") would result within the function in something like this:

dataHarvard=subset(originaldata2000,allcollaborationHarvard==1)

The function runs nearly perfectly, except that it does not store a the results in dataHarvard. I read that this is normal in functions, and using the <<- instead of the = could solve this issue, however since I am making use of the assign function this is not really possible, since the = is just the outcome of the assign function.

Here some data:

sales = c(2, 3, 5,6) 
numberofemployees = c(1, 9, 20,12) 
allcollaborationHarvard = c(0, 1, 0,1) 
originaldata = data.frame(sales, numberofemployees, allcollaborationHarvard)

Solution

Generally, it's best not to embed data/a variable into the name of an object. So instead of using assign to dataHarvard, make a list data with an element called "Harvard":

# enumerate unis, attaching names for lapply to use
unis = setNames(, "Harvard")

# make a table for each subset with lapply
data = lapply(unis, function(x) 
  originaldata[originaldata[[ paste0("allcollaboration", x) ]] == 1, ]
)

which gives

> data
$Harvard
  sales numberofemployees allcollaborationHarvard
2     3                 9                       1
4     6                12                       1

As seen here, you can use DF[["column name"]] to access a column instead of get as in the OP. Also, see the note in ?subset:

Warning

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

Generally, it's also better not to embed data in column names if possible. If the allcollaboration* columns are mutually exclusive, they can be collapsed to a single categorical variable with values like "Harvard", "Yale", etc. Alternately, it might make sense to put the data in long form.

For more guidance on arranging data, I recommend Hadley Wickham's tidy data paper.