We have deployed Vespa using Kubernetes on a GKE cluster with 3 nodes. While creating the Dockerfile we used Vespa 7.351.32 as the base image and added a few more things to it.
The workspace folder contains all the necessary .xml and other files required for the Vespa deployment.
Below are the steps we execute inside the three pods to deploy and restart the config server:
/opt/vespa/bin/vespa-deploy prepare /workspace && /opt/vespa/bin/vespa-deploy activate
wait (5 min)
vespa-stop-services
vespa-stop-configserver
wait (15 min)
vespa-start-configserver
vespa-start-services
vespa-get-cluster-state
vespa-config-status
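The sequence above can be sketched as a single script run inside each pod. This is only a sketch; the DRY_RUN guard is our addition (not part of the actual procedure) so the command order can be reviewed without touching a live node:

```shell
#!/bin/sh
# Sketch of the deploy-and-restart sequence run inside each pod.
# DRY_RUN is an addition for illustration: with DRY_RUN=1 (the
# default here) commands are echoed instead of executed. Set
# DRY_RUN=0 to actually run them on a node.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run /opt/vespa/bin/vespa-deploy prepare /workspace
run /opt/vespa/bin/vespa-deploy activate
run sleep 300   # wait 5 min
run vespa-stop-services
run vespa-stop-configserver
run sleep 900   # wait 15 min
run vespa-start-configserver
run vespa-start-services
run vespa-get-cluster-state
run vespa-config-status
```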
Then we receive the following error.
Please find below a screenshot of the connectivity to port 2181 on all three pods.
Upon further inspection of the logs (using vespa-logfmt -l error), we found that the com.yahoo.container.handler.threadpool.DefaultContainerThreadpool
bundle fails to load. Manually restarting the config server and Vespa services seems to resolve the issue.
Attaching the related log below.
Please help us understand the following points:
Does some service need to be running before this bundle is loaded?
Is there a path issue? If so, where can we find this bundle?
Is this because of a memory issue (we have the recommended 4G)?
How does Vespa load these bundles?
Below are additional details for the setup.
FROM vespaengine/vespa:7.351.32
# Copy necessary files
RUN mkdir -p workspace
COPY workspace /workspace
RUN yum install -y python3
COPY backup-pod.sh /
# Downloading gcloud package
RUN curl https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz > /tmp/google-cloud-sdk.tar.gz
# Installing the package
RUN mkdir -p /usr/local/gcloud \
&& tar -C /usr/local/gcloud -xvf /tmp/google-cloud-sdk.tar.gz \
&& /usr/local/gcloud/google-cloud-sdk/install.sh
# Adding the package path to local
ENV PATH $PATH:/usr/local/gcloud/google-cloud-sdk/bin
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vespa
  namespace: vespa
  labels:
    app: vespa
spec:
  replicas: 3
  # serviceName: vespa
  selector:
    matchLabels:
      app: vespa
      name: vespa-internal
  serviceName: vespa-internal
  template:
    metadata:
      labels:
        app: vespa
        name: vespa-internal
    spec:
      serviceAccount: vespa-sa
      # nodeSelector:
      #   iam.gke.io/gke-metadata-server-enabled: "true"
      containers:
      - name: vespa
        image: asia-south1-docker.pkg.dev/aurum-projec/vespa/vespa:latest
        imagePullPolicy: Always
        securityContext:
          privileged: true
        ports:
        - containerPort: 8080
          protocol: TCP
        readinessProbe:
          httpGet:
            path: /ApplicationStatus
            port: 19071
            scheme: HTTP
        volumeMounts:
        - name: vespa-var
          mountPath: /opt/vespa/var
        - name: vespa-logs
          mountPath: /opt/vespa/logs
        resources:
          requests:
            memory: "2G"
          limits:
            memory: "2G"
  volumeClaimTemplates:
  - metadata:
      name: vespa-var
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
  - metadata:
      name: vespa-logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
That message comes on startup, not on reconfig, and relates to one of our bundles which is always present and which consumes significant resources on construction, so yes, you are probably running out of memory.
To be clear, 4GB isn't recommended; it is the minimum you can get away with for trying it out.
Also note that you don't need this complex, time-consuming process for deploying changes: deploy prepare + activate is sufficient, and it works without disrupting queries and writes, so you can do it in production.
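For reference, meeting that minimum in the StatefulSet above would mean raising the container's resources block, e.g. (a fragment only, replacing the existing 2G values):

```yaml
        resources:
          requests:
            memory: "4Gi"
          limits:
            memory: "4Gi"
```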