
Why is it not advisable to use attach() in R, and what should I use instead?


Let's assume that we have a data frame x which contains the columns job and income. Referring to the data in the frame normally requires the commands x$job for the data in the job column and x$income for the data in the income column.

However, the command attach(x) lets you drop the name of the data frame and the $ symbol when referring to the same data. Consequently, x$job becomes job and x$income becomes income in the R code.
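
A minimal sketch of the behaviour described above. Only the names x, job and income come from the question; the values are made up for illustration.

```r
# Hypothetical data; only the column names come from the question.
x <- data.frame(
  job    = c("clerk", "clerk", "engineer", "engineer"),
  income = c(30000, 32000, 55000, 57000)
)

# Explicit reference through the data frame.
mean_explicit <- mean(x$income)

# attach() places a copy of x's columns on the search path,
# so 'income' can be found without the x$ prefix.
attach(x)
mean_attached <- mean(income)

# Remove x from the search path again.
detach(x)
```

Both calls to mean() operate on the same column here; the only difference is how the name income is looked up.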

The problem is that many experts in R advise NOT to use the attach() command when coding in R.

What is the main reason for that? What should be used instead?


Solution

  • When to use it:

    I use attach() when I want the environment you get in most stats packages (e.g. Stata, SPSS), where you work with one rectangular dataset at a time.

    When not to use it:

    However, it gets very messy, and code quickly becomes unreadable, when you have several different datasets. This is particularly true if you are in effect using R as a crude relational database, where different rectangles of data, all relevant to the problem at hand and perhaps matched against each other in various ways, have variables with the same name.

    The with() function and the data= argument of many functions are excellent alternatives in many situations where attach() is tempting.
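
The name clash and the two alternatives mentioned in this answer can be sketched roughly as follows. The data frames x and y and their values are invented for illustration, and aggregate() is used only as one example of a function that accepts a data= argument.

```r
# Hypothetical data frames that share the column name 'income'.
x <- data.frame(
  job    = c("clerk", "clerk", "engineer", "engineer"),
  income = c(30000, 32000, 55000, 57000)
)
y <- data.frame(
  region = c("north", "south"),
  income = c(1000, 2000)
)

# Pitfall: with both frames attached, the most recently attached one wins.
attach(x)
attach(y)                     # R prints a message that 'income' from x is masked
masked_mean <- mean(income)   # uses y$income, not x$income
detach(y)
detach(x)

# Alternative 1: with() evaluates an expression inside one named data frame,
# so it is always clear which 'income' is meant.
mean_x <- with(x, mean(income))
mean_y <- with(y, mean(income))

# Alternative 2: pass the data frame through a function's data= argument.
by_job <- aggregate(income ~ job, data = x, FUN = mean)
```

With with() and data=, the data frame is named at the point of use, so the same column name in two datasets no longer depends on the order in which they were attached.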