coding-style whitespace code-readability

Experimental research findings on white space (for language design and style guides)?


What does experimental research say about white space in code? Let me be specific: I'm talking about cognitive studies that compare how quickly and effectively people can read and grasp visual information that comes across in different formats.

Let's say you were designing a new computer language and had to make some decisions that affect how the source code looks. Or you were simply writing a style guide for a new language and wanted to make recommendations. Relevant topics might be identifier style (snake_cased_identifiers vs. camelCaseIdentifiers / PascalCaseIdentifiers), horizontal indentation, documentation styles, or vertical spacing.

I'm intentionally asking this question in this way to avoid recommendations such as:

  • "it doesn't matter" (without justification for the claim)
  • "do what the community already recommends for language X."

I don't want a flame war between people who support differing approaches; rather, I'd like to know what experimental research has to say about the matter. (And I don't expect any particular study to be completely 'objective' or 'neutral'.)

To give a 'squishier' motivation for this question: people appreciate white space when reading code, when reading documents, and in art (such as when listening to music). These fields all put a big emphasis on the importance of space.

So, thanks; I'd appreciate hearing what the studies have to say. To be clear, I'm not ruling out the importance of style and art -- I actually would hope that the wisdom from these worlds will show up in experimental studies.

In summary, if you can, please touch on one or more of the following:

  • variable naming convention
  • horizontal indentation
  • horizontal alignment (align the equal signs across multiple lines?)
  • vertical spacing

Solution

  • There is an annual IEEE conference, the International Conference on Program Comprehension (ICPC), which often publishes experimental studies on program comprehension. The most relevant ones I found from the past three years are:

    • An Eye Tracking Study on camelCase and under_score Identifier Styles: "While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."

    • To camelcase or under_score: "Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style."

    In addition to the computer-science-specific cognitive literature, there are studies on online and hypertext reading:

    • [Chaparro, 2005] Reading Online Text with a Poor Layout: Is Performance Worse? by Barbara S. Chaparro, A. Dawn Shaikh, & J. Ryan Baker, Usability News, Volume 7, Issue 1, February 2005.

    • [Lin, 2004] Evaluating older adults' retention in hypertext perusal: impacts of presentation media as a function of text topology by Dyi-Yih Michael Lin in "Computers in Human Behavior", Volume 20, Issue 4, July 2004, Pages 491-503. Available from ScienceDirect

    • Cognitive load in hypertext reading: A review by Diana DeStefano and Jo-Anne LeFevre.

    These papers address the question less directly, but I mention them in the hope that they provide some context. The first two references were found thanks to Michael Suodenjoki's blog post entitled White space matters in program source code.
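    If you want to look at the same identifiers in both of the styles the ICPC studies compare, a small converter is enough to produce matched pairs. This is only a sketch and not something taken from the papers: the function names are hypothetical, and it assumes plain ASCII identifiers without acronyms or digits.

```python
import re


def camel_to_snake(name):
    """Convert camelCase or PascalCase to snake_case (assumes no acronyms)."""
    # Insert an underscore before every capital letter except the first character.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_to_camel(name):
    """Convert snake_case to camelCase (assumes lowercase words joined by '_')."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


if __name__ == "__main__":
    for identifier in ["camelCaseIdentifiers", "PascalCaseIdentifiers"]:
        print(identifier, "->", camel_to_snake(identifier))
    print("snake_cased_identifiers", "->", snake_to_camel("snake_cased_identifiers"))
```

    Rendering one code sample both ways and reading each version yourself is no substitute for a controlled study, but it gives a feel for what the participants were comparing.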