Tags: python, multithreading, google-cloud-vision

Multi-threading Python calls to Google Cloud Vision API


I am running a server in a Docker container hosted on a macOS machine, from which I need to send several images for processing by the Google Cloud Vision (GCV) API.

I need to minimize the time spent uploading and processing the images.

I started by wrapping the calls to GCV in a Queue.Queue and threading.Thread, but this sporadically crashes the whole process (the failure is not trapped as a Python exception) with:

E1121 14:15:10.902211037   25448 sync_posix.c:38]            assertion failed: pthread_mutex_destroy(mu) == 0
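
For reference, a minimal sketch of the kind of queue-plus-worker-threads wrapper described above. It is written for Python 3 (where the module is `queue` rather than `Queue`), and `run_workers` and `fake_annotate` are hypothetical names; `fake_annotate` stands in for the real GCV call so the pattern can be run without credentials.

```python
import queue
import threading


def run_workers(items, process, num_workers=4):
    """Run process(item) for each item on worker threads fed from one queue."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                tasks.task_done()
                break
            try:
                outcome = process(item)
            except Exception as exc:  # keep the worker alive on Python errors
                outcome = exc
            with lock:
                results[item] = outcome
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results


def fake_annotate(path):
    """Hypothetical stand-in for a single GCV request."""
    return "labels for " + path
```

Note that the `try`/`except` only covers Python-level errors. An assertion failure inside the native gRPC library aborts the whole process, so no handler in the worker gets a chance to run.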

According to several GitHub threads, this crash is a gRPC-related (or httplib?) bug/feature, but I can't find an easy workaround; see e.g. https://github.com/grpc/grpc/issues/11184 and https://github.com/grpc/grpc/issues/10909

Given that this seems to be a commonly-experienced problem with no clear solution, what is the best way to mitigate it?

Is it as simple as sending the images to GCV in a single batch call (and if so, what about the overall speed)? Is there no other way to safely thread the calls?
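
To make the batch option concrete, here is a rough sketch of grouping images into batches so that fewer requests are issued from a single thread. The limit of 16 images per request is an assumption to check against the current Vision API quotas, and `send_batch` is a hypothetical callable; in real code it would presumably wrap the client's batch annotation call.

```python
MAX_IMAGES_PER_BATCH = 16  # assumption: verify against the current API quotas


def chunked(items, size=MAX_IMAGES_PER_BATCH):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def annotate_in_batches(image_paths, send_batch, size=MAX_IMAGES_PER_BATCH):
    """Call send_batch once per chunk and flatten the results in order.

    send_batch is hypothetical: it takes a list of paths and returns one
    result per path.
    """
    results = []
    for batch in chunked(list(image_paths), size):
        results.extend(send_batch(batch))
    return results
```

With 35 images this issues three requests (16, 16 and 3 images) instead of 35, all from one thread, which avoids the threaded code path entirely at the cost of less parallelism.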

Update:

Wary of breaking things, I rolled gRPC back to v1.2.x, following

https://github.com/grpc/grpc/issues/10909#issuecomment-302581954

The only difference was that I had to add /usr/local/lib to LD_LIBRARY_PATH in my Docker container in order to pick up libprotobuf.so.12.

I made no changes to any other python packages.

I then did:

pip install grpcio==1.2.1

Obviously this gets expensive, but so far the crash rate is below 1 in 20 calls (and counting), compared with 1 in 3 calls before.

Now: how can I test conclusively that this is fixed?
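
One way to approach this, sketched below, is to run the threaded workload repeatedly in child processes and count abnormal exits, since the crash kills the interpreter and cannot be caught from inside it. `count_crashes` is a hypothetical helper, and the snippet passed to it is a placeholder for whatever code exercises the threaded GCV calls.

```python
import subprocess
import sys


def count_crashes(snippet, runs=20, timeout=120):
    """Run a Python snippet in fresh child processes; count abnormal exits.

    A native assertion failure (SIGABRT) shows up as a non-zero return
    code in the parent, even though the child cannot trap it itself.
    """
    crashes = 0
    for _ in range(runs):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", snippet],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            crashes += 1  # treat a hang as a failure too
            continue
        if proc.returncode != 0:
            crashes += 1
    return crashes
```

A sporadic crash can never be ruled out conclusively this way, only bounded. By the usual "rule of three", n consecutive clean runs put the crash rate below roughly 3/n at about 95% confidence, so 300 clean runs would suggest a rate under about 1%.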


Solution

  • Here is the relevant part of the Dockerfile I've gone with (based on https://github.com/grpc/grpc/issues/10909#issuecomment-302581954):

    # gRPC fix (rollback to v1.2.x to mitigate mutex crashing bug)
    RUN apt-get install -y build-essential autoconf libtool
    RUN apt-get install -y libgflags-dev libgtest-dev
    RUN apt-get install -y clang libc++-dev
    RUN apt-get install -y sudo unzip locate
    RUN mkdir -p /app/src
    
    WORKDIR /app/src
    RUN git clone --branch v1.2.x https://github.com/grpc/grpc
    
    WORKDIR /app/src/grpc
    RUN git submodule foreach git clean -xfd
    RUN git submodule update --init
    
    WORKDIR /app/src/grpc/third_party/protobuf
    RUN ./autogen.sh
    RUN ./configure
    RUN make -j2
    RUN sudo make install
    
    WORKDIR /app/src/grpc
    RUN sudo make install
    ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
    RUN pip install --trusted-host pypi.python.org grpcio==1.2.1
    RUN pip install --trusted-host pypi.python.org google-cloud-vision
    

    I will leave this open for a few days to see if anyone responds.