Search code examples
c++recursionshared-ptrcompiler-optimizationtail-recursion

Does shared pointer break tail call optimization?


Preface

I'm practicing C++ and trying to implement immutable list. In one of my tests I'm trying to create a list with lot of values (1 million nodes) recursively. All values are const, so I cannot perform regular loop, also this isn't functional enough, you know. The test fails with Segmentation fault.

My system is 64-bit Xubuntu 16.04 LTS with Linux 4.4. I compile the my code with g++ 5.4 and clang++ 3.8 using --std=c++14 -O3 flags.

Source code

I've written a simple example, which shows the situation, when tail call should be easily optimized, but something goes wrong and Segmentation fault appears. The function f just waits amount iterations and then creates a pointer to single int and returns it

#include <memory>

using std::shared_ptr;

shared_ptr<int> f(unsigned amount) {
    return amount? f(amount - 1) : shared_ptr<int>{new int};
}

int main() {
    return f(1E6) != nullptr;
}

Note this example fails only with g++, while clang++ makes it okay. Though, on more complicated example it doesn't optimize too.

Here is an example of a simple list with recursive insertion of elements. Also I've added destroy function, which helps to avoid stack overflow during destruction. Here I get Segmentation fault with both compilers

#include <memory>

using std::shared_ptr;

struct L {
    shared_ptr<L> tail;

    L(const L&) = delete;
    L() = delete;
};

shared_ptr<L> insertBulk(unsigned amount, const shared_ptr<L>& tail) {
    return amount? insertBulk(amount - 1, shared_ptr<L>{new L{tail}})
                 : tail;
}

void destroy(shared_ptr<L> list) {
    if (!list) return;

    shared_ptr<L> tail = list->tail;
    list.reset();

    for (; tail; tail = tail->tail);
}

int main() {
    shared_ptr<L> list = shared_ptr<L>{new L{nullptr}};
    destroy(insertBulk(1E6, list));
    return 0;
}

NOTE Implementation with regular pointers is optimized well by both compilers.

Question

Does shared_ptr really break tail call optimization in my case? Is it a compilers' issue or a problem with shared_ptr implementation?


Solution

  • Answer

    Short answer is: yes and no.

    Shared pointer in C++ doesn't break tail call optimization, but it complicates creation of such recursive function, that can be converted to loop by compiler.

    Details

    Avoiding stack overflow during recursive construction of a long list

    I've recalled that shared_ptr has a destructor and C++ has RAII. This makes construction of optimizable tail call harder, as it was discussed in Can Tail Call Optimization and RAII Co-Exist? question.

    @KennyOstrom has proposed to use an ordinary pointer to solve this problem

    static const List* insertBulk_(unsigned amount, const List* tail=nullptr) {
        return amount? insertBulk_(amount - 1, new List{tail})
                     : tail;
    }
    

    The following constructor is used

    List(const List* tail): tail{tail} {}
    

    When tail of List is an instance of shared_ptr, tail call is successfully optimized.

    Avoiding stack overflow during destruction

    Custom destruction strategy is needed. Fortunately, shared_ptr allows us to set it, so I've hidden destructor of List by making it private, and use this for list destruction

    static void destroy(const List* list) {
        if (!list) return;
    
        shared_ptr<const List> tail = list->tail;
        delete list;
        for (; tail && tail.use_count() == 1; tail = tail->tail);
    }
    

    Constructor should pass this destruction function to tail initialization list

    List(const List* tail): tail{tail, List::destroy} {}
    

    Avoiding memory leak

    In the case of exceptions I'll not have proper cleanup, so the problem's not solved yet. I want to use shared_ptr because it's safe, but now I don't use it for current list head until the end of construction.

    It's needed to watch the "naked" pointer until it's wrapped into shared pointer, and free it in the case of emergency. Let's pass a reference to tail pointer instead of a pointer itself to insertBulk_. This will allow the last good pointer to be visible outside of the function

    static const List* insertBulk_(unsigned amount, const List*& tail) {
        if (!amount) {
            const List* result = tail;
            tail = nullptr;
            return result;
        }
        return insertBulk_(amount - 1, tail = new List{tail});
    }
    

    Then analogue of Finally is needed in order to automate destruction of a pointer, which will leak in the case of exception

    static const shared_ptr<const List> insertBulk(unsigned amount) {
        struct TailGuard {
            const List* ptr;
            ~TailGuard() {
                List::destroy(this->ptr);
            }
        } guard{};
        const List* result = insertBulk_(amount, guard.ptr);
        return amount? shared_ptr<const List>{result, List::destroy}
                     : nullptr;
    }
    

    Solution

    Now, I guess, the problem is solved:

    • g++ and clang++ successfully optimize recursive creation of long lists;
    • lists still use shared_ptr;
    • ordinary pointers seem to be in safety.

    Source code

    The final code is

    #include <memory>
    #include <cassert>
    
    using std::shared_ptr;
    
    class List {
        private:
            const shared_ptr<const List> tail;
    
            /**
             * I need a `tail` to be an instance of `shared_ptr`.
             * Separate `List` constructor was created for this purpose.
             * It gets a regular pointer to `tail` and wraps it
             * into shared pointer.
             *
             * The `tail` is a reference to pointer,
             * because `insertBulk`, which called `insertBulk_`,
             * should have an ability to free memory
             * in the case of `insertBulk_` fail
             * to avoid memory leak.
             */
            static const List* insertBulk_(unsigned amount, const List*& tail) {
                if (!amount) {
                    const List* result = tail;
                    tail = nullptr;
                    return result;
                }
                return insertBulk_(amount - 1, tail = new List{tail});
            }
            unsigned size_(unsigned acc=1) const {
                return this->tail? this->tail->size_(acc + 1) : acc;
            }
            /**
             * Destructor needs to be hidden,
             * because it causes stack overflow for long lists.
             * Custom destruction method `destroy` should be invoked first.
             */
            ~List() {}
        public:
            /**
             * List needs custom destruction strategy,
             * because default destructor causes stack overflow
             * in the case of long lists:
             * it will recursively remove its items.
             */
            List(const List* tail): tail{tail, List::destroy} {}
            List(const shared_ptr<const List>& tail): tail{tail} {}
            List(const List&) = delete;
            List() = delete;
    
            unsigned size() const {
                return this->size_();
            }
    
            /**
             * Public iterface for private `insertBulk_` method.
             * It wraps `insertBulk_` result into `shared_ptr`
             * with custom destruction function.
             *
             * Also it creates a guard for tail,
             * which will destroy it if something will go wrong.
             * `insertBulk_` should store `tail`,
             * which is not yet wrapped into `shared_ptr`,
             * in the guard, and set it to `nullptr` in the end
             * in order to avoid destruction of successfully created list.
             */
            static const shared_ptr<const List> insertBulk(unsigned amount) {
                struct TailGuard {
                    const List* ptr;
                    ~TailGuard() {
                        List::destroy(this->ptr);
                    }
                } guard{};
                const List* result = insertBulk_(amount, guard.ptr);
                return amount? shared_ptr<const List>{result, List::destroy}
                             : nullptr;
            }
            /**
             * Custom destruction strategy,
             * which should be called in order to delete a list.
             */
            static void destroy(const List* list) {
                if (!list) return;
    
                shared_ptr<const List> tail = list->tail;
                delete list;
    
                /**
                 * Watching references count allows us to stop,
                 * when we reached the node,
                 * which is used by another list.
                 *
                 * Also this prevents long loop of construction and destruction,
                 * because destruction calls this function `destroy` again
                 * and it will create a lot of redundant entities
                 * without `tail.use_count() == 1` condition.
                 */
                for (; tail && tail.use_count() == 1; tail = tail->tail);
            }
    };
    
    int main() {
        /**
         * Check whether we can create multiple lists.
         */
        const shared_ptr<const List> list{List::insertBulk(1E6)};
        const shared_ptr<const List> longList{List::insertBulk(1E7)};
        /**
         * Check whether we can use a list as a tail for another list.
         */
        const shared_ptr<const List> composedList{new List{list}, List::destroy};
        /**
         * Checking whether creation works well.
         */
        assert(list->size() == 1E6);
        assert(longList->size() == 1E7);
        assert(composedList->size() == 1E6 + 1);
        return 0;
    }
    

    Shorter version of the source code

    The List class without comments and checks in main function

    #include <memory>
    
    using std::shared_ptr;
    
    class List {
        private:
            const shared_ptr<const List> tail;
    
            static const List* insertBulk_(unsigned amount, const List*& tail) {
                if (!amount) {
                    const List* result = tail;
                    tail = nullptr;
                    return result;
                }
                return insertBulk_(amount - 1, tail = new List{tail});
            }
            ~List() {}
        public:
            List(const List* tail): tail{tail, List::destroy} {}
            List(const shared_ptr<const List>& tail): tail{tail} {}
            List(const List&) = delete;
            List() = delete;
    
            static const shared_ptr<const List> insertBulk(unsigned amount) {
                struct TailGuard {
                    const List* ptr;
                    ~TailGuard() {
                        List::destroy(this->ptr);
                    }
                } guard{};
                const List* result = insertBulk_(amount, guard.ptr);
                return amount? shared_ptr<const List>{result, List::destroy}
                             : nullptr;
            }
            static void destroy(const List* list) {
                if (!list) return;
    
                shared_ptr<const List> tail = list->tail;
                delete list;
    
                for (; tail && tail.use_count() == 1; tail = tail->tail);
            }
    };