
HuggingFace model in Flask multiprocess app doesn't return a result


I have a Flask app, within which I'm trying to load a HuggingFace SentenceTransformer.

import json

from flask import Flask, request
from sentence_transformers import SentenceTransformer

app = Flask(__name__)

class Encoder():
    def __init__(self):
        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    def encode(self, text):
        return self.model.encode(text)

encoder = Encoder()

@app.route("/get_encoded_vector", methods=['POST'])
def get_encoded_vector():
    data = json.loads(request.data)
    text = data['text']

    embeddings = encoder.encode(text)
    return json.dumps({'embeddings': str(embeddings)})

if __name__ == '__main__':
    app.run(port=5000, threaded=False, processes=4)

If I run the app and send a request to the API, it gets stuck and never returns a result. However, if I make no change other than the following:

 app.run(port=5000, threaded=False, processes=0)

it runs fine and returns a result. I suspect this is an issue with how the library handles multiprocessing versus how it's set up in Flask. How can I make this work with multiple processes?


Solution

  • Found the solution here: https://github.com/UKPLab/sentence-transformers/issues/1318

    It stems from torch's own multithreading setup. To avoid the issue, limit torch to a single thread by adding the following to your code before the model is loaded:

    import torch
    torch.set_num_threads(1)
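
    A minimal sketch of the fix in isolation (the placement matters: the call should come at module import time, before `SentenceTransformer(...)` is instantiated and before Flask forks its worker processes):

    ```python
    import torch

    # Restrict torch to a single intra-op thread. When Flask runs with
    # processes=4, workers are created by forking, and a forked child that
    # inherits a multi-threaded torch runtime can hang on inference. With
    # one thread per process there is no pool state to inherit.
    torch.set_num_threads(1)

    # Verify the setting took effect.
    print(torch.get_num_threads())  # -> 1

    # ... only now load the model, define routes, and call app.run(...)
    ```

    Each of the four worker processes still runs independently, so you keep process-level parallelism; you only give up intra-op threading inside each worker, which is usually an acceptable trade-off for short texts with a small model like all-MiniLM-L6-v2.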