Data from my pod is not persisting to my local machine.
I'm relatively new to Kubernetes. I'm using it to deploy a cluster for data processing. I know there might be better practices for this, so any advice would be greatly appreciated!
The main issue I'm facing is that I've defined a PersistentVolume (PV) and PersistentVolumeClaim (PVC) to persist data from my Hadoop nodes (I'm starting with the NameNode for now). I previously tried using a StorageClass, but wasn't successful, so for now I'm sticking with a manually defined PV and PVC.
My goal is to persist the main metadata the NameNode creates when it formats itself. That way, when I reapply the cluster, the DataNodes can re-synchronize with the NameNode without cluster-ID mismatches, and I avoid reformatting every time. In short, I want to check whether the metadata files already exist and skip the format if they do (roughly the guard sketched below).
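For context, the check I'm aiming for in start-service.sh looks roughly like this (a minimal sketch, not the actual script; the exact metadata path under the mount is an assumption on my part):

#!/bin/bash
# Only format the NameNode if no metadata exists yet.
# Assumes dfs.namenode.name.dir points under the mounted /opt/hadoop/data/hdfs/.
NAMENODE_DIR=/opt/hadoop/data/hdfs/namenode

if [ ! -f "$NAMENODE_DIR/current/VERSION" ]; then
  echo "No existing metadata found, formatting NameNode..."
  hdfs namenode -format -nonInteractive
else
  echo "Existing metadata found, skipping format."
fi

# Start the NameNode in the foreground so the container keeps running.
hdfs namenode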
However, even though the manifests declare the volume, and the PV and PVC seem to be set up correctly (as far as I can tell), no files ever appear on my local machine, so I have nothing to check against when deciding whether to format the HDFS NameNode. (I only want to format it once so there is a single cluster ID.)
I'm not sure what I'm doing wrong or what needs to be fixed.
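For reference, these are the kinds of checks I would use to confirm the binding and where the data actually lands (assuming the StatefulSet's single pod is named hadoop-namenode-0):

# Check that the PVC is Bound to the PV
kubectl get pv hadoop-pv-namenode
kubectl get pvc hadoop-pvc-namenode

# Inspect which volume the pod actually mounted
kubectl describe pod hadoop-namenode-0

# See what the NameNode wrote inside the container at the mount path
kubectl exec hadoop-namenode-0 -- ls -la /opt/hadoop/data/hdfs/

# And the hostPath directory on my local machine
ls -la /mnt/hadoop/namenode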
hadoopDeployment manifest for the NameNode (StatefulSet and Service):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-namenode
  labels:
    app: hadoop
    role: namenode
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hadoop
      role: namenode
  serviceName: "hadoop-namenode-service"
  template:
    metadata:
      labels:
        app: hadoop
        role: namenode
    spec:
      volumes:
        - name: hadoop-namenode-storage
          persistentVolumeClaim:
            claimName: hadoop-pvc-namenode
      containers:
        - name: namenode
          image: chrlrwork/hadoop-ubuntu-3.4.1:0.0.7
          ports:
            - containerPort: 9000
            - containerPort: 9870
            - containerPort: 9864
          volumeMounts:
            - mountPath: "/opt/hadoop/data/hdfs/"
              name: hadoop-namenode-storage
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          command:
            - "/bin/bash"
            - "/opt/hadoop/start-service.sh"
---
apiVersion: v1
kind: Service
metadata:
  name: hadoop-namenode-service
  labels:
    app: hadoop
    role: namenode
spec:
  selector:
    app: hadoop
    role: namenode
  type: NodePort
  ports:
    - protocol: TCP
      port: 9870
      targetPort: 9870
      nodePort: 32070
PVC and PV manifests:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: hadoop-pv-namenode
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: "/mnt/hadoop/namenode"
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hadoop-pvc-namenode
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
I should mention that I gave full permissions to the local directory. If you have a better way to do what I'm trying, I'd be happy to hear it!
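Concretely, by "full permissions" I mean roughly the following (illustrative; the exact commands I ran may have differed slightly):

# Create the hostPath directory and open up its permissions
sudo mkdir -p /mnt/hadoop/namenode
sudo chmod -R 777 /mnt/hadoop/namenode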
Thanks in advance!!