
How to organize large R programs?


When I undertake an R project of any complexity, my scripts quickly get long and confusing.

What practices can I adopt so that my code stays a pleasure to work with? I'm thinking about things like:

  • Placement of functions in source files
  • When to break something out to another source file
  • What should be in the master file
  • Using functions as organizational units (whether this is worthwhile given that R makes it hard to access global state)
  • Indentation / line break practices.
    • Treat ( like {?
    • Put things like )} on 1 or 2 lines?

Basically, what are your rules of thumb for organizing large R scripts?


Solution

  • The standard answer is to use packages: see the Writing R Extensions manual as well as the various tutorials on the web.

    A package gives you

    • a quasi-automatic way to organize your code by topic
    • strong encouragement to write a help file, which makes you think about the interface
    • a lot of sanity checks via R CMD check
    • a place to add regression tests
    • namespaces.
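    To illustrate the regression-test point, here is a minimal sketch of a script that could live in a package's tests/ directory, where R CMD check runs it and fails if it raises an error. The file name, the clean_data() function and the sample data are all hypothetical; in a real package the function would be defined under R/ and loaded with library() rather than defined inline.

```r
# tests/test-clean_data.R  (hypothetical file name)

# Hypothetical function under test. It is defined inline only so this
# sketch runs on its own; in a package it would live in R/clean_data.R.
clean_data <- function(df) {
  # Keep only the rows that have no missing values.
  df[stats::complete.cases(df), , drop = FALSE]
}

# Small fixed input, so the expected result is known in advance.
input <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
result <- clean_data(input)

# stopifnot() raises an error if any condition is not TRUE.
stopifnot(
  nrow(result) == 1L,
  identical(result$x, 1),
  !anyNA(result)
)
```

    Plain stopifnot() scripts need no extra dependencies; a dedicated testing package is an alternative once the number of tests grows.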

    Running source() over code is fine only for really short snippets. Everything else should be in a package, even if you do not plan to publish it: you can write internal packages for internal repositories.
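    As a starting point, base R can generate the package layout for you. The sketch below uses utils::package.skeleton() to turn two functions into a package directory under a temporary folder. The package name "myproject" and both functions are made up for illustration, and the generated DESCRIPTION and help files still need editing by hand before R CMD check will pass cleanly.

```r
# Hypothetical project functions, collected in their own environment
# so that only they end up in the generated package.
env <- new.env()
env$clean_data <- function(df) df[stats::complete.cases(df), , drop = FALSE]
env$column_means <- function(df) vapply(df, mean, numeric(1))

# Build the skeleton in a temporary directory (an assumption made to keep
# the example side-effect free; a real project would use its own path).
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

utils::package.skeleton(
  name = "myproject",   # hypothetical package name
  environment = env,    # every object in env goes into the package
  path = pkg_parent
)

# The new package directory, with DESCRIPTION, NAMESPACE, R/ and man/.
pkg_dir <- file.path(pkg_parent, "myproject")
print(list.files(pkg_dir))
```

    Each function is written to its own file under R/ with a matching help-file stub under man/, which gives you the by-topic organization and the documentation prompt mentioned above.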

    As for the 'how to edit' part, the R Internals manual has excellent R coding standards in Section 6. Otherwise, I tend to use the defaults in Emacs' ESS mode.

    Update 2008-Aug-13: David Smith just blogged about the Google R Style Guide.