Search code examples
drake

Segfault when QueryObject leaves function scope


The following code segfaults:

from pydrake.all import (
    Diagram,
    DiagramBuilder,
    AddMultibodyPlantSceneGraph,
    UnitInertia,
    SpatialInertia,
    MultibodyPlant,
    SceneGraph,
    QueryObject,
)
import numpy as np


def get_query_object(
    diagram: Diagram,
    plant: MultibodyPlant,
    scene_graph: SceneGraph,
    q: np.ndarray,
) -> QueryObject:
    context = diagram.CreateDefaultContext()
    plant_context = diagram.GetMutableSubsystemContext(plant, context)
    plant.SetPositions(plant_context, q)
    sg_context = diagram.GetMutableSubsystemContext(scene_graph, context)
    query_object = (
        scene_graph.get_query_output_port().EvalAbstract(sg_context).get_value()
    )
    return query_object


builder = DiagramBuilder()

plant, scene_graph = AddMultibodyPlantSceneGraph(builder, 1e-4)
plant.AddRigidBody(
    "body",
    SpatialInertia(1, [0, 0, 0], UnitInertia(1, 1, 1)),
)

plant.Finalize()
diagram = builder.Build()

q = np.zeros(7)

query_object = get_query_object(
    diagram=diagram,
    plant=plant,
    scene_graph=scene_graph,
    q=q,
)
signed_distance_pair = query_object.ComputeSignedDistancePairwiseClosestPoints()

The segfault can be avoided by inlining the contents of get_query_object or by executing ComputeSignedDistancePairwiseClosestPoints inside of it, so I take it that the problem lies in doing geometric queries using a QueryObject that has left the scope in which it was created. The Drake documentation has a warning about QueryObject which I believe relates to this:

"The const reference returned by the input port is considered 'live' - it is linked to the context, system, and cache (making full use of all of those mechanisms). This const reference should never be persisted; doing so can lead to erroneous query results. It is simpler and more advisable to acquire it for evaluation in a limited scope (e.g., CalcTimeDerivatives()) and then discard it. If a QueryObject is needed for many separate functions in a LeafSystem, each should re-evaluate the input port. The underlying caching mechanism should make the cost of this negligible."

Here's my usecase: I wrote a bunch of functions that use geometry queries, like non-penetration constraints and contact jacobians. All of these use a lot of boilerplate for setting up a QueryObject, so I thought I'd wrap that boilerplate up in a get_query_object function, which the other functions would call and then run whichever geometric query they needed in it. This sounds safe to me, as each function is setting up its own QueryObject and it doesn't exit the scope of that function, so we shouldn't run into the problem described in the documentation. But it clearly doesn't work---so what is the recommended workflow?


Solution

  • The problem you have is that get_query_object creates a context that gets thrown out upon return. At the highest level, you should think of QueryObject as a view into the Context that can perform specific queries; it stores no data. No Context and your QueryObject dies.

    Your function attempts to do two things:

    1. Set some positions for a multibody plant.
    2. Get a query object so you can compute signed distance.

    Although you're tripping over the fact that the query object is now dead, your other problem is that the body positions you set got destroyed with the context as well.

    I'd recommend the following changes:

    1. Create the context and store it at a similar scope as the diagram, plant, and scene graph.
    2. Get the query object associated with that context and just hang onto it. It will be valid and "live" as long as the context (i.e., it will always be a view into that context's data). Alternatively, as the documentation suggests, QueryObject instances are cheap. You can just go get one from the context whenever you need one.
    3. Create a method whose sole purpose is to set the body positions for a plant in that same context.

    This should give you the functionality you're looking for.

    The warning in the documentation you pasted refers to the fact that the query object is live. If you hang on to that reference and the data in the context changes you'll get different answers. The answers will always reflect the current state of the associated context.