Search code examples
performancefunctionreadability

How small should functions be?


How small should I be making functions? For example, if I have a cake baking program.

bakeCake(){
  if(cakeType == "chocolate") 
     fetchIngredients("chocolate")
  else

  if(cakeType == "plain")
     fetchIngredients("plain")
  else

  if(cakeType == "Red velvet")
     fetchIngredients("Red Velvet")

  //Rest of program

My question is, while this stuff is simple enough on its own, when I add much more stuff to the bakeCake function it becomes cluttered. But lets say that this program has to bake thousands of cakes per second. From what I've heard, it takes significantly longer (relative to computer time) to use another function compared to just doing the statements in the current function. So something that's similar like this should be very easy to read, and if efficiency is important wouldn't I want to keep it in there?

Basically, at what point do I sacrifice readability for efficiency. And a quick bonus question, at what point does having too many functions decrease readability? Here's an example of Apple's swift tutorial.

func isCandyAmountAcceptable(bandMemberCount: Int, candyCount: Int) -> Bool {
  return candyCount % bandMemberCount == 0

They said that because the function name isCandyAmountAcceptable was easier to read than candyCount % bandMemberCount == 0 that it'd be good to make a function for that. But from my perspective it may take a few seconds to figure out what the second option is saying, but it's also more readable when ti comes to knowing how it works.

Sorry about being all over the place and kinda asking 2 questions in one. Just to summarize my questions:

  1. Does using functions extraneously make efficiency (speed) suffer? If it does how can I figure out what the cutoff between readability and efficiency is?

  2. How small and simple should I make functions for? Obviously I'd make them if I ever have to repeat the function, but what about one time use functions?

Thanks guys, sorry if these questions are ignorant or anything but I'd really appreciate an answer.


Solution

  • Does using functions extraneously make efficiency (speed) suffer? If it does how can I figure out what the cutoff between readability and efficiency is?

    For performance I would generally not factor in any overhead of direct function calls against any decent optimizer, since it can even make those come free of charge. When it doesn't, it's still a negligible overhead in, say, 99.9% of scenarios. That applies even for performance-critical fields. I work in areas like raytracing, mesh processing, and image processing and still the cost of a function call is typically on the bottom of the priority list as opposed to, say, locality of reference, efficient data structures, parallelization, and vectorization. Even when you're micro-optimizing, there are much bigger priorities than the cost of a direct function call, and even when you're micro-optimizing, you often want to leave a lot of the optimization for your optimizer to perform instead of trying to fight against it and do it all by hand (unless you're actually writing assembly code).

    Of course with some compilers you might deal with ones that never inline function calls and have a bit of an overhead to every function call. But in that case I'd still say it's relatively negligible since you probably shouldn't be worrying about such micro-level optimizations when using those languages and interpreters/compilers. Even then it will probably often be bottom on the priority list, relatively speaking, as opposed to more impactful things like improving locality of reference and thread efficiency.

    It's like if you're using a compiler with very simplistic register allocation that has a stack spill for every single variable you use, that doesn't mean you should be trying to use and reuse as few variables as possible to work around its tendencies. It means reach for a new compiler in those cases where that's a non-negligible overhead (ex: write some C code into a dylib and use that for the most performance-critical parts), or focus on higher-level optimizations like making everything run in parallel.

    How small and simple should I make functions for? Obviously I'd make them if I ever have to repeat the function, but what about one time use functions?

    This is where I'm going to go slightly off-kilter and actually suggest you consider avoiding the teeniest of functions for maintainability reasons. This is admittedly a controversial opinion although at least John Carmack seems to somewhat agree (specifically in respect to inlining code and avoiding excess function calls for cases where side effects occur to make the side effects easier to comprehend).

    However, if you are going to make a lot of state changes, having them all happen inline does have advantages; you should be made constantly aware of the full horror of what you are doing.

    The reason I believe it can sometimes be good to err on the side of meatier functions is because there's often more to comprehend than that of a simple function to understand all the information necessary to make a change or fix a problem.

    Which is simpler to comprehend, a function whose logic consists of 80 lines of inlined code, or one distributed across a couple dozen functions and possibly ones that lead to disparate places throughout the codebase?

    The answer is not so clear cut. Naturally if the teeny functions are used widely, like say sqrt or abs, then the reader can simply skim over the function call, knowing full well what it does like the back of his hand. But if there are a lot of teeny exotic functions that are only used one time, then the ability to comprehend the operation as a whole requires looking them up and understanding what they all individually do before you can get a proper comprehension of what's going on in terms of the big picture.

    I actually disagree with that Apple Swift tutorial somewhat with that one-liner function because while it is easier to understand than figuring out what the arithmetic and comparison are supposed to do, in exchange it might require looking it up to see what it does in scenarios where you can't just say, isCandyAmountAcceptable is enough information for me and need to figure out exactly what makes an amount acceptable. Instead I would actually prefer a simple comment:

    // Determine if candy amount is acceptable.
    if (candyCount % bandMemberCount == 0)
        ...
    

    ... because then you don't have to jump to disparate places in code (the analogy of a book referring its reader to other pages in the book causing the readers to constantly have to flip back and forth between pages) to figure that out. Of course the idea behind this isCandyAmountAcceptable kind of function is that you shouldn't have to be concerned with such details about what makes a candy amount of acceptable, but too often in practice, we do end up having to understand the details more often than we optimally should to debug the code or make changes to it. If the code never needs to be debugged or changed, then it doesn't really matter how it's written. It could even be written in binary code for all we care. But if it's written to be maintained, as in debugged and changed in the future, then sometimes it is helpful to avoid making the readers have to jump through lots of hoops. The details do often matter in those scenarios.

    So sometimes it doesn't help to understand the big picture by fragmenting it into the teeniest of puzzle pieces. It's a balancing act, but certain types of developers can err on the side of overly dicing up their systems into the most granular bits and pieces and finding maintenance problems that way. Those types are still often promising engineers -- they just have to find their balance. The other extreme is the one that writes 500-line functions and doesn't even consider refactoring -- those are kinda hopeless. But I think you fit in the former category, and for you, I'd actually suggest erring on the side of meatier functions ever-so-slightly just to keep the puzzle pieces a healthy size (not too small, not too big).

    There's even a balancing act I see between code duplication and minimizing dependencies. An image library doesn't necessarily become easier to comprehend by shaving off a few dozen lines of duplicated math code if the exchange is a dependency to a complex math library with 800,000 lines of code and an epic manual on how to use it. In such cases, the image library might very well be easier to comprehend as well as use and deploy in new projects if it chooses instead to duplicate a few math functions here and there to avoid external dependencies, isolating its complexity instead of distributing it elsewhere.

    Basically, at what point do I sacrifice readability for efficiency.

    As stated above, I don't think readability of the small picture and comprehensibility of the big picture are synonymous. It can be really easy to read a two-line function and know what it does and still be miles away from understanding what you need to understand to make the necessary changes. Having many of those teeny one-shot two-liners can even delay the ability to comprehend the big picture.

    But if I use "comprehensibility vs. efficiency" instead, I'd say upfront at the design-level for cases where you anticipate processing huge inputs. As an example, a video processing application with custom filters knows it's going to be looping over millions of pixels many times per frame. That knowledge should be utilized to come up with an efficient design for looping over millions of pixels repeatedly. But that's with respect to design -- towards the central aspects of the system that many other places will depend upon because big central design changes are too costly to apply late in hindsight.

    That doesn't mean it has to start applying hard-to-understand SIMD code right off the bat. That's an implementation detail provided the design leaves enough breathing room to explore such an optimization in hindsight. Such a design would imply abstracting at the Image level, at the level of a million+ pixels, not at the level of a single IPixel. That's the worthy thing to take into consideration upfront.

    Then later on, you can optimize hotspots and potentially use some difficult-to-understand algorithms and micro-optimizations here and there for those truly critical cases where there's a strong perceived business need for the operation to go faster, and hopefully with good tools (profilers, i.e.) in hand. The user cases guide you about what operations to optimize based on what the users do most often and find a strong desire to spend less time waiting. The profiler guides you about precisely what parts of the code involved in that operation need to be optimized.