I have a Flask app in which I'm trying to load a Hugging Face SentenceTransformer model.
from flask import Flask, request
from sentence_transformers import SentenceTransformer
import json

app = Flask(__name__)

class Encoder:
    def __init__(self):
        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

    def encode(self, text):
        return self.model.encode(text)

encoder = Encoder()

@app.route("/get_encoded_vector", methods=['POST'])
def get_encoded_vector():
    data = json.loads(request.data)
    text = data['text']
    embeddings = encoder.encode(text)
    return json.dumps({'embeddings': str(embeddings)})

if __name__ == '__main__':
    app.run(port=5000, threaded=False, processes=4)
If I run the app and send a request to the endpoint, it hangs and never returns a result. However, if I make no change other than the following:
app.run(port=5000, threaded=False, processes=0)
it runs fine and returns the result. I suspect this is an issue with how the library handles multiprocessing versus how it is set up in Flask. How can I make this work with multiple processes?
Found the solution here: https://github.com/UKPLab/sentence-transformers/issues/1318

The problem stems from PyTorch's own multithreading setup: the forked worker processes can deadlock in the model's encode call. To avoid the issue, add the following to your code before loading the model:
import torch
torch.set_num_threads(1)
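As a minimal sketch of the placement (the key point is that the thread limit is set at module import time, before the `SentenceTransformer` model is instantiated and before Flask forks its worker processes):

```python
import torch

# Limit PyTorch's intra-op thread pool to a single thread *before*
# any model is created, so the forked Flask workers don't deadlock.
torch.set_num_threads(1)

# Then load the model as in the original app:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

print(torch.get_num_threads())  # confirms the limit took effect
```

Each forked worker then runs the model single-threaded, which trades some per-request throughput for not hanging; if you need more parallelism, increase `processes` in `app.run` rather than torch's thread count.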