I have a basic TensorFlow Serving Docker container exposing a model on a Kubernetes pod.
FROM tensorflow/serving:2.6.0

RUN mkdir /serving_model
WORKDIR /serving_model
COPY src/serving_model /serving_model

# 5225 is the port all the pods talk to each other on
EXPOSE 5225

ENTRYPOINT tensorflow_model_server --rest_api_port=5225 --model_name=MyModel --model_base_path=/serving_model/
It is called by a Python service running on another pod.
import json
import requests
from requests import Response

def call_tensorflow_serving(self, docker_pod_url: str, input: dict) -> Response:
    # POST the serialized payload to TensorFlow Serving's REST predict endpoint.
    response = requests.post(
        f"{docker_pod_url}/v1/models/MyModel:predict",
        data=json.dumps(input),
    )
    return response
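For context, a call looks roughly like this (the service URL, the instance values, and the caller object are placeholders of mine); the body follows TensorFlow Serving's REST predict format, i.e. {"instances": [...]} or {"inputs": ...}:

# Hypothetical usage; URL and feature values are placeholders.
payload = {"instances": [[0.1, 0.2, 0.3]]}  # must match the model's input signature
resp = caller.call_tensorflow_serving("http://tensorflow-server:5225", payload)
print(resp.status_code, resp.json())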
Once in a while this actually succeeds, but most of the time the Python service fails to retrieve a response from TensorFlow Serving, with the following error:
Traceback (most recent call last):
  File "/python-dependencies/lib/python3.8/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/python-dependencies/lib/python3.8/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/python-dependencies/lib/python3.8/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.8/http/client.py", line 1348, in getresponse
    response.begin()
  File "/usr/local/lib/python3.8/http/client.py", line 316, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.8/http/client.py", line 285, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
Since it seems like the TensorFlow Serving container is timing out and closing the connection, is there any way to extend the time it is allowed to take to complete a request?
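On the calling side, the only knobs I am aware of are the client timeout and retries; a rough sketch is below (URL and payload are placeholders). This only controls how long the Python service waits and whether it retries, it does not change anything on the TensorFlow Serving side:

import json
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Client-side sketch only: retry dropped connections and wait longer for a reply.
session = requests.Session()
retries = Retry(
    total=3,
    backoff_factor=1,
    allowed_methods=frozenset({"POST"}),  # urllib3>=1.26; older versions call this method_whitelist
)
session.mount("http://", HTTPAdapter(max_retries=retries))

response = session.post(
    "http://tensorflow-server:5225/v1/models/MyModel:predict",  # placeholder URL
    data=json.dumps({"instances": [[0.1, 0.2, 0.3]]}),          # placeholder payload
    timeout=(5, 120),  # (connect timeout, read timeout) in seconds
)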
Also, the logs of the TensorFlow Serving pod do not show anything once startup is complete:
2022-03-01 16:03:00.345546: I tensorflow_serving/model_server/server.cc:89] Building single TensorFlow model file config: model_name: MyModel model_base_path: /serving_model/
2022-03-01 16:03:00.348593: I tensorflow_serving/model_server/server_core.cc:465] Adding/updating models.
2022-03-01 16:03:00.348622: I tensorflow_serving/model_server/server_core.cc:591] (Re-)adding model: MyModel
2022-03-01 16:03:00.449013: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: MyModel version: 6}
2022-03-01 16:03:00.449051: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: MyModel version: 6}
2022-03-01 16:03:00.449064: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: MyModel version: 6}
2022-03-01 16:03:00.449114: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /serving_model/6
2022-03-01 16:03:01.418230: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-03-01 16:03:01.418305: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /serving_model/6
2022-03-01 16:03:01.418961: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-01 16:03:04.924449: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:211] Restoring SavedModel bundle.
2022-03-01 16:03:08.716223: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:195] Running initialization op on SavedModel bundle at path: /serving_model/6
2022-03-01 16:03:11.024820: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 10573787 microseconds.
2022-03-01 16:03:11.321916: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /serving_model/6/assets.extra/tf_serving_warmup_requests
2022-03-01 16:03:11.816916: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: MyModel version: 6}
2022-03-01 16:03:11.820605: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-03-01 16:03:11.824554: I tensorflow_serving/model_server/server.cc:133] Using InsecureServerCredentials
2022-03-01 16:03:11.824604: I tensorflow_serving/model_server/server.cc:383] Profiler service is enabled
2022-03-01 16:03:11.840760: I tensorflow_serving/model_server/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-03-01 16:03:11.856959: I tensorflow_serving/model_server/server.cc:430] Exporting HTTP/REST API at:localhost:5225 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
Is it possible to configure TensorFlow Serving to log more information?
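The only idea I have so far is raising TensorFlow's own C++ log verbosity via environment variables on the serving container (the deployment manifest is under "Additional information" below). This is an assumption on my part: TF_CPP_MIN_LOG_LEVEL and TF_CPP_MIN_VLOG_LEVEL are TensorFlow variables, and I am not sure TensorFlow Serving surfaces anything useful through them for failed requests:

      containers:
        - name: tensorflow-server
          env:
            - name: TF_CPP_MIN_LOG_LEVEL    # 0 keeps INFO-level messages
              value: "0"
            - name: TF_CPP_MIN_VLOG_LEVEL   # values > 0 enable more verbose VLOG output
              value: "1"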
================================================================
Additional information
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: tensorflow-server
  name: tensorflow-server
  namespace: app-namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tensorflow-server
  strategy:
    rollingUpdate:
      maxSurge: 50%
      maxUnavailable: 50%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: tensorflow-server
      name: tensorflow-server
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5225"
    spec:
      containers:
        - image: ...
          imagePullPolicy: ...
          name: tensorflow-server
          resources:
            limits:
              cpu: "100m"
              memory: "256Mi"
            requests:
              cpu: "100m"
              memory: "256Mi"
I eventually caught the pod in the act. For a brief moment tensorflow-predictor reported itself as "Killed" before silently restarting. It turns out the pod did not have enough memory, so the container was killing off tensorflow-predictor, as described here, whenever an actual query triggered it.
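The fix was simply giving the container more memory than the model actually needs at inference time. The numbers below are placeholders; the right values depend on the model:

          resources:
            limits:
              cpu: "500m"
              memory: "1Gi"
            requests:
              cpu: "500m"
              memory: "1Gi"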