Tags: docker, kubernetes, azure-storage, azure-aks, cdap

Storage class in AKS can't chown a directory


I hope you're doing okay.

I'm trying to deploy a CDAP image that I have in GitLab to AKS using Argo CD.

The deployment works in my local Kubernetes cluster with the rook-ceph storage class, but with the managed-premium storage class in AKS something seems to be wrong with the permissions.

Here is my storage class:

# The default value for fileMode and dirMode is 0777 for Kubernetes
# version 1.13.0 and above; you can modify it as per your need.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-zrs
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict
parameters:
  skuName: Standard_LRS
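
As I understand it, an azure-file share is mounted over SMB/CIFS, so the uid, gid, file_mode and dir_mode options above fix ownership and permissions at mount time. One variant worth trying would be to match them to the user the container actually runs as; here is a sketch (the UID/GID 1000 values are guesses, since I don't know which user the CDAP sandbox image runs as):

# Hypothetical variant: mount the share as the container's user instead of root,
# so the files already belong to that user.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: azurefile-zrs-cdap
provisioner: kubernetes.io/azure-file
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=1000        # assumed CDAP sandbox UID; verify in the image
  - gid=1000        # assumed GID
  - mfsymlinks
  - cache=strict
parameters:
  skuName: Standard_LRS

That said, even with matching IDs a chmod from inside the container may still be rejected, because on CIFS the mode is fixed by the mount options.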

Here is my StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: {{ .Release.Name }}-sts
  labels:
    app: {{ .Release.Name }}
spec:
  revisionHistoryLimit: 2
  replicas: {{ .Values.replicas }}
  updateStrategy:
    type: RollingUpdate
  serviceName: {{ .Release.Name }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}
    spec:
      imagePullSecrets:
        - name: regcred-secret-argo
      volumes:
        - name: nginx-proxy-config
          configMap:
            name: {{ .Release.Name }}-nginx-conf
      containers:
        - name: nginx
          image: nginx:1.17
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 80
            - containerPort: 8080
          volumeMounts:
            - name: nginx-proxy-config
              mountPath: /etc/nginx/conf.d/default.conf
              subPath: default.conf
        - name: cdap-sandbox
          image: {{ .Values.containerrepo }}:{{ .Values.containertag }}
          imagePullPolicy: Always
          resources:
            limits:
              cpu: 1000m
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 6000Mi
          readinessProbe:
            httpGet:
              path: /
              port: 11011
            initialDelaySeconds: 30
            periodSeconds: 20  
          volumeMounts:
            - name: {{ .Release.Name }}-data
              mountPath: /opt/cdap/sandbox/data
          ports:
            - containerPort: 11011
            - containerPort: 11015
  volumeClaimTemplates:
    - metadata:
        name: {{ .Release.Name }}-data
      spec:
        accessModes: 
          - ReadWriteMany
        resources:
          requests:
            storage: "300Gi"
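
One thing I notice re-reading this: the volumeClaimTemplates don't set storageClassName, so the PVCs bind to the cluster's default storage class, not necessarily the azurefile-zrs class above. Making it explicit would look like this (assuming azurefile-zrs is the intended class):

  volumeClaimTemplates:
    - metadata:
        name: {{ .Release.Name }}-data
      spec:
        storageClassName: azurefile-zrs   # explicit, instead of the cluster default
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: "300Gi"

(As an aside, managed-premium Azure Disk volumes only support ReadWriteOnce, so a ReadWriteMany claim like this one cannot bind to that class at all.)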

The problem is that the CDAP pod can't change ownership of a directory.
Here are the logs:

Fri Oct 22 13:48:08 UTC 2021 Starting CDAP Sandbox ...LOGBACK: No context given for io.cdap.cdap.logging.framework.local.LocalLogAppender[io.cdap.cdap.logging.framework.local.LocalLogAppender]
log4j:WARN No appenders could be found for logger (DataNucleus.General).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2021-10-22 13:48:56,030 - ERROR [main:i.c.c.StandaloneMain@446] - Failed to start Standalone CDAP
Failed to start Standalone CDAP
com.google.common.util.concurrent.UncheckedExecutionException: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Error applying authorization policy on hive configuration: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/cdap/sandbox-6.2.3/data/explore/tmp/cdap/7f546668-0ccc-45ae-8188-9ac12af4c504': Operation not permitted
    at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015)
    at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
    at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
    at io.cdap.cdap.StandaloneMain.startUp(StandaloneMain.java:300)
    at io.cdap.cdap.StandaloneMain.doMain(StandaloneMain.java:436)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at io.cdap.cdap.StandaloneMain.main(StandaloneMain.java:418)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Error applying authorization policy on hive configuration: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/cdap/sandbox-6.2.3/data/explore/tmp/cdap/7f546668-0ccc-45ae-8188-9ac12af4c504': Operation not permitted
    at com.google.common.util.concurrent.Futures.wrapAndThrowUnchecked(Futures.java:1015)
    at com.google.common.util.concurrent.Futures.getUnchecked(Futures.java:1001)
    at com.google.common.util.concurrent.AbstractService.startAndWait(AbstractService.java:220)
    at com.google.common.util.concurrent.AbstractIdleService.startAndWait(AbstractIdleService.java:106)
    at io.cdap.cdap.explore.executor.ExploreExecutorService.startUp(ExploreExecutorService.java:99)
    at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Error applying authorization policy on hive configuration: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/cdap/sandbox-6.2.3/data/explore/tmp/cdap/7f546668-0ccc-45ae-8188-9ac12af4c504': Operation not permitted
    at org.apache.hive.service.cli.CLIService.init(CLIService.java:114)
    at io.cdap.cdap.explore.service.hive.BaseHiveExploreService.startUp(BaseHiveExploreService.java:309)
    at io.cdap.cdap.explore.service.hive.Hive14ExploreService.startUp(Hive14ExploreService.java:76)
    ... 2 more
Caused by: java.lang.RuntimeException: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/cdap/sandbox-6.2.3/data/explore/tmp/cdap/7f546668-0ccc-45ae-8188-9ac12af4c504': Operation not permitted
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:127)
    at org.apache.hive.service.cli.CLIService.init(CLIService.java:112)
    ... 4 more
Caused by: ExitCodeException exitCode=1: chmod: changing permissions of '/opt/cdap/sandbox-6.2.3/data/explore/tmp/cdap/7f546668-0ccc-45ae-8188-9ac12af4c504': Operation not permitted
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:1264)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:1246)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:771)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:515)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:555)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:533)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:313)
    at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:639)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:574)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 6 more

I don't know why it can't change permissions.

I would appreciate any help, because I'm stuck on this and have no idea how to fix it other than installing a new provisioner, which I really don't want to do.
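
For completeness: the usual first suggestion for permission errors is a pod-level securityContext, but as far as I can tell it doesn't help with azure-file, because fsGroup makes the kubelet chown the volume, and that chown is exactly the operation a CIFS mount refuses. A sketch anyway, in case someone knows better (the 1000 values are assumptions about the CDAP image):

# StatefulSet pod template excerpt. fsGroup helps with block-backed volumes,
# but not with azure-file/CIFS, where ownership is fixed by the mount options.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000   # assumed CDAP sandbox UID
        fsGroup: 1000     # kubelet chowns supported volume types to this GID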

Thank you in advance, and I hope you all have a good day.


Solution

  • After a lot of testing, I changed the storage class: I installed rook-ceph using this procedure.

    Note: you have to change the image version in cluster.yaml from ceph/ceph:v14.2.4 to ceph/ceph:v16 (see the excerpt below).
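
    For reference, the version change goes in the CephCluster spec inside cluster.yaml; a minimal excerpt (the metadata follows Rook's example manifests):

    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: ceph/ceph:v16   # was ceph/ceph:v14.2.4 in the linked procedure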