python garbage-collection swig fluent chaining

Using Swig to Wrap Fluent Interfaces

I'm using Swig to wrap a class implemented in C++. This class uses a fluent interface to allow method chaining. That is, methods that modify the state of an object return the reference to the object and, thus, allow to call the next state modifying method. For example:

class FluentClass {
public:
    ...
    FluentClass & add(std::string s)
    {
        state += s;
        return *this;
    }
    ...
private:
    std::string state;
};

The method add adds the given string s to state and returns the reference to the object allowing one to chain multiple calls of add:

FluentClass fc;
c.add(std::string("hello ")).add(std::string("world!"));

You will find more comprehensive examples at: https://en.wikipedia.org/wiki/Fluent_interface

I wrote several swing files (nothing special) to create bindings for multiple languages, in particular: C#, Java, Python, and Ruby. The following example (Python) works as expected:

fc = FluentClass()
fc.add("hello").add("world!")

However, the following does not:

fc = FluentClass()
fc = fc.add("hello").add("world!")

I found that calling add on fc does not return the reference of fc but a reference (I'm expecting that the other bindings will do the same) to a newly create object that actually wraps the same memory:

fc = FluentClass()
nfc = fc.add("hello world!")
fc != nfc, though fc and nfc wrap the same memory :(

Therefore, assigning the result of add to the same variable leads to the destruction of the original object as part of the garbage collection. The result is that fc now points to invalid memory.

So my question is: Do you know how to wrap FluentClass properly, to let add return the same reference in order to prevent garbage collection?

Solution

The problem is that when the Python proxy created when you construct your instance gets destroyed the underlying C++ object gets deleted. Since SWIG doesn't know that the returned value is a reference to the same object it constructs a new proxy when you call add. As a result in the case you observed errors the original object has its ref count hit 0 before the chained methods are finished.

To investigate and fix first I created a test case to reproduce the issue properly. This was fluent.h:

#include <string>

class FluentClass {
public:
    FluentClass & add(std::string s)
    {
        state += s;
        return *this;
    }
private:
    std::string state;
};

Enough code to reliably hit SEGFAULT/SIGABRT in testing in Python:

import test

def test_fun():
    f=test.FluentClass()
    f=f.add("hello").add("world")

    return f

for i in range(1000):
    f2=test_fun()
    f2.add("moo")

And a SWIG interface file to build the module 'test':

%module test

%{
#include "fluent.h"
%}

%include <std_string.i>

%include "fluent.h"

With this extra work done I was able to reproduce the problem you reported. (Note: throughout this I'm targeting SWIG 3.0 with Python 3.4).

You need to write typemaps to handle the special case where 'returned value == this'. I initially wanted to target the argout typemap of the special 'this' argument as this felt like the right place to be doing this kind of work, but that also matches on the destructor call unfortunately which would have made writing the typemaps correctly harder than needed so I skipped that.

In my out typemap, which only gets applied to fluent types I check that we really do meet the "input is output" assumption and aren't simply returning something else. Then it bumps the reference count of the input so that I can safely return it with the semantics expected.

In order to make that work in the out typemap though we need to do a bit more work to safely and robustly capture the input Python object. The problem here is that SWIG generates the following function signature:

SWIGINTERN PyObject *_wrap_FluentClass_add(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {

where the SWIGUNUSEDPARAM marco expands to simply not name the first parameter at all. (This looks like a bug to me in the macro definition, since it's the minor version of GCC that determines which option gets picked in C++ mode, but nonetheless we want it to work still).

So what I ended up doing was writing a custom in typemap that reliably captures the C++ this pointer and the Python object associated with it. (The way it's written works even if you enable one of the other argument unpacking styles and should be robust to other variations. It will however fail if you name other arguments 'self'). To put the values somewhere that can be used from the later 'out' typemap and don't have problems crossing goto statements we need to use the _global_ prefix when declaring local variables.

Finally we need to do something sane in the non-fluent case. So the resulting file looks like:

%module test

%{
#include "fluent.h"
%}

%include <std_string.i>
%typemap(in) SWIGTYPE *self (PyObject *_global_self=0, $&1_type _global_in=0) %{
  $typemap(in, $1_type)
  _global_self = $input;
  _global_in = &$1;
%}

%typemap(out) FLUENT& %{
  if ($1 == *_global_in) {
    Py_INCREF(_global_self);
    $result = _global_self;
  }
  else {
    // Looks like it wasn't really fluent here!
    $result = SWIG_NewPointerObj($1, $descriptor, $owner);
  }
%}

%apply FLUENT& { FluentClass& };

%include "fluent.h"

Using %apply here makes the control of where this gets used simple and generic.

As an aside you can also tell SWIG that your FluentClass::add function consumes its first argument and creates a new one, using:

%module test

%{
#include "fluent.h"
%}

%include <std_string.i>

%delobject FluentClass::add;
%newobject FluentClass::add;

%include "fluent.h"

Which generates more correct code in a much simpler way by decoupling the death of the first proxy from the real delete call. Equally though it's more verbose to have to write this for every method and it wouldn't be correct in all cases still, even though in my test case it is correct e.g.

f1=test.FluentClass()
f2=f.add("hello").add("world") # f2 is another proxy object, which now owns
f3=f1.add("again") # badness starts here, two proxies own it now....