I recently built a web app in which a user can upload photos; the app then sends a POST request to the backend, which runs predictions on those photos. The current use case is that someone opens the browser on their phone, takes a photo with the phone, and uploads it. So the app runs in a mobile browser rather than on a computer.
Backend: Keras + Flask + Gunicorn + Nginx, hosted on a GPU-powered machine (2× GTX 1080 Ti)
My question is: is this a good architecture in terms of speed? I've heard that the POST request could be very slow, because sending a photo over HTTP is slow.
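The "slow POST" concern can be sanity-checked with a back-of-envelope estimate before redesigning anything. The numbers below (photo size, uplink bandwidth) are assumptions for illustration, not measurements:

```python
# Back-of-envelope: time to POST a photo over a mobile connection.
# Ignores HTTP overhead, TLS handshakes, and latency; real uploads
# will be somewhat slower, but the order of magnitude holds.
def upload_seconds(photo_mb: float, uplink_mbps: float) -> float:
    """Seconds to upload photo_mb megabytes at uplink_mbps megabits/s."""
    return (photo_mb * 8) / uplink_mbps

# An unresized 3 MB phone photo over an assumed 5 Mbit/s mobile uplink:
print(upload_seconds(3, 5))    # 4.8 s -- noticeable
# The same photo resized/compressed to ~300 KB before upload:
print(upload_seconds(0.3, 5))  # 0.48 s -- fine for most UIs
```

In other words, the transfer cost depends mostly on payload size, and resizing the image in the browser before POSTing (predictions rarely need full camera resolution) usually makes it a non-issue.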
I wonder if loading the model on the client side with TensorFlow.js would be a better choice? It looks attractive since there would be no need to POST photos to the backend, but it also means my GPUs would not be used. I've searched the Internet but couldn't find any reference or comparison.
Thank you!
There are many variables to consider, the key one being how many user requests you expect to serve per minute. The bottleneck in the system will be 'prediction', as you've termed it. Prediction speed varies with many factors, e.g. image resolution and algorithm complexity.

You should do some simple tests. Build an algorithm for the type of prediction you want to do, e.g. classification, detection, segmentation, etc. There are stock algorithms available that balance speed against accuracy, and trying one will give you a sense of what's possible. From memory, on a single 1080 Ti machine, an SSD detection algorithm takes less than 1 s (perhaps even 0.2 s) per high-resolution image.

Build your system diagram, identify the key risks, and run tests for each risk identified.
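For the "simple tests" above, a minimal timing harness is enough to measure prediction latency. A sketch, assuming `predict` is any single-input inference callable (e.g. a wrapper around your Keras `model.predict`); the `fake_predict` stand-in below is only so the harness runs anywhere:

```python
import statistics
import time

def benchmark(predict, inputs, warmup=3, runs=20):
    """Time a prediction callable; returns (mean_s, p95_s) in seconds."""
    for x in inputs[:warmup]:
        predict(x)  # warm-up: first calls often pay one-off costs (CUDA init, caches)
    times = []
    for _ in range(runs):
        for x in inputs:
            t0 = time.perf_counter()
            predict(x)
            times.append(time.perf_counter() - t0)
    times.sort()
    mean_s = statistics.mean(times)
    p95_s = times[int(0.95 * (len(times) - 1))]  # 95th-percentile latency
    return mean_s, p95_s

# Stand-in "model" so the harness is self-contained; in practice,
# swap in a function that calls your real model on a real image batch.
fake_predict = lambda x: sum(x)
mean_s, p95_s = benchmark(fake_predict, [[1.0] * 1000] * 5)
print(f"mean={mean_s * 1e6:.1f}us p95={p95_s * 1e6:.1f}us")
```

Report the p95 as well as the mean: under load, tail latency is what users actually feel, and mean latency alone hides it.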