Search code examples
performancerust

What's the performance penalty of creating a closure vs struct?


I have done something that looks very smart. Trying to get better with error handling, I've written my DB operations like this

    fn handle_delete(&self, id: &Id) -> Result<(), SqliteError> {
        db_op(
            || DbOperationInfo::new("filerequests", OperationType::Delete),
            || {
                self.conn
                    .execute("DELETE FROM filerequests WHERE id = ?", [id.to_string()])?;
                Ok(())
            },
        )
    }

The idea is that I have my SqliteError enum, which contains a fairly small struct that contains some info about the attempted operation and the rusqlite::error::Error that caused it.

What the db_op function does is accept two closures, the first of which generates that small struct containing operation info and the second one doing the actual operation. If (and only if) an error occurs, the first closure is executed generating the operation info and then the causing error is added to create a SqliteError.

So the operation info creation is lazy.

But do I actually gain something here? Or is the creation of the closure similarly expensive? Or, perhaps the compiler will optimize the closure creation and do that only once? Or maybe it would optimize struct creation just the same (the contents are kinda static)?

Looking for some guidance here: Does this optimization (likely) actually help?


Solution

  • You can benchmark this situation using divan (or any other benchmarking tool in Rust, like criterion).

    You need to minimize the code first to isolate the actual code you want to test. Then, add a bunch of black_box at the right positions to avoid your test code getting compiled out.

    This is my attempt to minimize and benchmark your situation. Note that I did not know what exactly is going on in DbOperationInfo::new(), so I just put a String allocation in there, to demonstrate some kind of heap based initialization. If DbOperationInfo does not perform a heap allocation, the result might be a little less obvious.

    #![allow(dead_code)]
    
    use std::hint::black_box;
    
    enum OperationType {
        Delete,
        Insert,
    }
    
    struct DbOperationInfo {
        s: String,
        t: OperationType,
    }
    
    impl DbOperationInfo {
        pub fn new(s: impl Into<String>, t: OperationType) -> Self {
            Self { s: s.into(), t }
        }
    }
    
    fn db_op_closure(generate: bool, f: impl FnOnce() -> DbOperationInfo) {
        if black_box(generate) {
            black_box(f());
        }
    }
    
    fn db_op_value(generate: bool, val: DbOperationInfo) {
        if black_box(generate) {
            black_box(val);
        }
    }
    
    #[divan::bench(args = [true, false], min_time = 0.5)]
    fn handle_delete_closure(generate_value: bool) {
        db_op_closure(generate_value, || {
            DbOperationInfo::new("filerequests", OperationType::Delete)
        })
    }
    
    #[divan::bench(args = [true, false], min_time = 0.5)]
    fn handle_delete_value(generate_value: bool) {
        db_op_value(
            generate_value,
            DbOperationInfo::new("filerequests", OperationType::Delete),
        )
    }
    
    fn main() {
        // Run registered benchmarks.
        divan::main();
    }
    

    And here is the result:

    closure_vs_direct         fastest       │ slowest       │ median        │ mean          │ samples │ iters
    ├─ handle_delete_closure                │               │               │               │         │
    │  ├─ false               1.011 ns      │ 5.796 ns      │ 1.023 ns      │ 1.04 ns       │ 22881   │ 187441152
    │  ╰─ true                28.83 ns      │ 476.6 ns      │ 29.41 ns      │ 31.12 ns      │ 29588   │ 15149056
    ╰─ handle_delete_value                  │               │               │               │         │
       ├─ false               28.24 ns      │ 194.2 ns      │ 29.02 ns      │ 30.34 ns      │ 60225   │ 15417600
       ╰─ true                28.63 ns      │ 114.7 ns      │ 29.41 ns      │ 30.54 ns      │ 30153   │ 15438336
    

    So what does that mean?

    • My version of DbOperationInfo::new() takes about 28 ns.
    • When the operant is a closure and the closure does not get called, the time is only 1 ns.

    That means a single heap allocation is enough to make it way worth wrapping it in a closure.


    Just to see the difference, I removed the heap allocation from DbOperationInfo::new():

    struct DbOperationInfo {
        s: &'static str,
        t: OperationType,
    }
    
    impl DbOperationInfo {
        pub fn new(s: &'static str, t: OperationType) -> Self {
            Self { s, t }
        }
    }
    

    And now it does not matter if you wrap it in a closure or not:

    closure_vs_direct         fastest       │ slowest       │ median        │ mean          │ samples │ iters
    ├─ handle_delete_closure                │               │               │               │         │
    │  ├─ false               1.291 ns      │ 122 ns        │ 1.303 ns      │ 1.387 ns      │ 19944   │ 163381248
    │  ╰─ true                1.291 ns      │ 7.199 ns      │ 1.303 ns      │ 1.36 ns       │ 20790   │ 170311680
    ╰─ handle_delete_value                  │               │               │               │         │
       ├─ false               1.291 ns      │ 6.955 ns      │ 1.303 ns      │ 1.371 ns      │ 20484   │ 167804928
       ╰─ true                1.291 ns      │ 22.99 ns      │ 1.303 ns      │ 1.367 ns      │ 20653   │ 169189376
    

    That tells me that as long as you have a heap allocation or some kind of computation, wrap it in a closure.

    If your object is small and can be allocated statically, it does not matter. Theoretically removing the closure could be minimally faster in this situation, but the compiler will likely optimize the closure away to be identical.