
OpenMDAO 1.x: recording in parallel


When I run an analysis under MPI with distributed components in a ParallelGroup, adding a DumpRecorder to the driver raises an error. The small example below reproduces it (run against the latest master branch, commit aaa67a4d51f4081e9e41b250b0a76b077f6f0c21 from 28/10/2015):

import numpy as np
from openmdao.core.mpi_wrap import MPI
from openmdao.api import Component, Group, DumpRecorder, Problem, ParallelGroup


class Sliced(Component):

    def __init__(self):
        super(Sliced, self).__init__()

        self.add_param('x', 0.)
        self.add_output('y', 0.)

    def solve_nonlinear(self, params, unknowns, resids):

        unknowns['y'] = params['x'] * 2.


class VectorComp(Component):

    def __init__(self, size):
        super(VectorComp, self).__init__()

        self.add_param('xin', np.zeros(size))

        self.add_output('x', np.zeros(size))

    def solve_nonlinear(self, params, unknowns, resids):

        unknowns['x'] = params['xin'] * 2.


class Analysis(Group):

    def __init__(self, size):
        super(Analysis, self).__init__()

        self.add('v', VectorComp(size), promotes=['*'])

        par = self.add('par', ParallelGroup())
        for i in range(size):
            par.add('sec%02d' % i, Sliced())
            self.connect('x', 'par.sec%02d.x' % i, src_indices=[i])


if __name__ == '__main__':

    if MPI:
        from openmdao.core.petsc_impl import PetscImpl as impl
    else:
        from openmdao.core.basic_impl import BasicImpl as impl

    p = Problem(impl=impl, root=Analysis(4))

    recorder = DumpRecorder('optimization.log')
    # adding specific includes works, but leaving it out results in a crash
    # recorder.options['includes'] = ['x']
    p.driver.add_recorder(recorder)
    p.setup()
    p.run()

The error which is raised is:

RuntimeError: Cannot access remote Variable 'par.sec00.x' in this process.

I see that the recorder dumps a file per processor, so shouldn't the BaseRecorder._filter_vectors method filter out params not present on a specific processor? I'm not yet familiar enough with the code to propose a fix, so I hope the OpenMDAO devs can easily figure out what goes wrong.

Manually specifying the includes works because the Sliced parameters are then excluded, but it would be nice if this were not necessary and were instead handled under the hood.
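The behaviour asked for here can be sketched in plain Python. This is only an illustration, not OpenMDAO's actual BaseRecorder._filter_vectors: the function filter_local, its arguments, and the variable names are all hypothetical, and the fnmatch-style matching of includes and excludes is an assumption made for the sketch.

```python
from fnmatch import fnmatchcase


def filter_local(names, local_names, includes=("*",), excludes=()):
    """Return the names that should be recorded on this process.

    Hypothetical helper: it skips variables that are not local to the
    current process, then applies include/exclude patterns.
    """
    kept = []
    for name in names:
        if name not in local_names:
            # Variable lives on another process: skip it instead of raising.
            continue
        if not any(fnmatchcase(name, pat) for pat in includes):
            continue
        if any(fnmatchcase(name, pat) for pat in excludes):
            continue
        kept.append(name)
    return kept


# Pretend this is one process of a 4-process run that owns only sec00.
all_names = ["x", "xin"] + ["par.sec%02d.x" % i for i in range(4)]
local_names = {"x", "xin", "par.sec00.x"}

# Default includes: everything local is kept, remote params are dropped.
recorded = filter_local(all_names, local_names)

# Explicit includes, as in the workaround: only 'x' survives.
only_x = filter_local(all_names, local_names, includes=["x"])
```

With a locality check like this in front of the pattern matching, the explicit includes would only narrow the output and would no longer be needed to avoid touching remote variables.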

I also want to let you guys know how excited we are about the new framework. It is so much faster than the 0.x version, and the parallel FD feature is much appreciated and works like a charm!


Solution

  • There were some recent changes that broke the dump recorder in parallel. We put a story up for someone to fix it, but in the meantime you might want to try the SqliteRecorder. It's what I have been using for performance testing on CADRE. You set it up the same way, but then read the values back using sqlitedict. There is a small example in the docs, and a more practical example is here in the CADRE code:

    https://github.com/OpenMDAO/CADRE/blob/master/plot_progress.py