
vsphere-csi-controller fails to start due to "invalid memory address or nil pointer dereference"


I am installing k8s and the vSphere CPI/CSI drivers, following the instructions located here.

My setup: 2x CentOS 7.7 vSphere VMs (50 GB disk/16 GB RAM), 1 master and 1 node in the k8s cluster.

I made it to the part where I create the StorageClass (near the end) when I ran into exactly the problem described in this GitHub issue. The OP of the linked issue just started from scratch and their problem went away, so the report was closed. That has not been the case for me: I've redeployed my k8s cluster from scratch a bunch of times now and always hit this wall. The error is below if you don't want to check the linked GitHub issue.

Does anyone have ideas on what I can try to get past this? I've checked my disk space and RAM, and there is plenty of both.

# kubectl -n kube-system logs pod/vsphere-csi-controller-0 vsphere-csi-controller
I0127 18:49:43.292667       1 config.go:261] GetCnsconfig called with cfgPath: /etc/cloud/csi-vsphere.conf
I0127 18:49:43.292859       1 config.go:206] Initializing vc server 132.250.31.180
I0127 18:49:43.292867       1 controller.go:67] Initializing CNS controller
I0127 18:49:43.292884       1 virtualcentermanager.go:63] Initializing defaultVirtualCenterManager...
I0127 18:49:43.292892       1 virtualcentermanager.go:65] Successfully initialized defaultVirtualCenterManager
I0127 18:49:43.292905       1 virtualcentermanager.go:107] Successfully registered VC "132.250.31.180"
I0127 18:49:43.292913       1 manager.go:60] Initializing volume.volumeManager...
I0127 18:49:43.292917       1 manager.go:64] volume.volumeManager initialized
time="2020-01-27T18:50:03Z" level=info msg="received signal; shutting down" signal=terminated
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x128 pc=0x867dc7]

goroutine 10 [running]:
google.golang.org/grpc.(*Server).GracefulStop(0x0)
        /go/pkg/mod/google.golang.org/grpc@v1.23.0/server.go:1393 +0x37
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop.func1()
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:333 +0x35
sync.(*Once).Do(0xc0002cc8fc, 0xc000380ef8)
        /usr/local/go/src/sync/once.go:44 +0xb3
github.com/rexray/gocsi.(*StoragePlugin).GracefulStop(0xc0002cc870, 0x21183a0, 0xc000056018)
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:332 +0x56
github.com/rexray/gocsi.Run.func3()
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:121 +0x4e
github.com/rexray/gocsi.trapSignals.func1(0xc00052a240, 0xc000426990, 0xc000426900)
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:502 +0x143
created by github.com/rexray/gocsi.trapSignals
        /go/pkg/mod/github.com/rexray/gocsi@v1.0.0/gocsi.go:487 +0x107
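
For readers following the trace: the log shows a termination signal arriving about 20 seconds after startup, and the top frame is `GracefulStop(0x0)`, meaning the method was called on a nil `*grpc.Server`. One plausible reading (an assumption, not something the log confirms) is that the signal handler ran before the server had been created. The Go sketch below reproduces that general pattern with hypothetical stand-in types (`server`, `plugin`, `stopUnsafe`, `stopSafe`, and `tryStop` are invented for illustration and are not the real grpc or gocsi code), and shows the kind of nil guard that avoids the panic.

```go
package main

import "fmt"

// server is a hypothetical stand-in for a gRPC server.
// It is NOT the real grpc.Server type.
type server struct {
	stopped bool
}

// gracefulStop writes to a field of the receiver, so calling it
// on a nil *server dereferences a nil pointer and panics.
func (s *server) gracefulStop() {
	s.stopped = true
}

// plugin is a hypothetical stand-in for the wrapper that owns the server.
type plugin struct {
	srv *server // remains nil until serving has actually started
}

// stopUnsafe assumes srv already exists, which is the failing pattern.
func (p *plugin) stopUnsafe() {
	p.srv.gracefulStop()
}

// stopSafe tolerates a shutdown request that arrives before srv exists.
// It reports whether a server was actually stopped.
func (p *plugin) stopSafe() bool {
	if p.srv == nil {
		return false
	}
	p.srv.gracefulStop()
	return true
}

// tryStop runs f and converts a panic into an error so the demo can continue.
func tryStop(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	p := &plugin{} // srv is nil, as if shutdown arrived during startup

	fmt.Println(tryStop(p.stopUnsafe)) // the unguarded call panics
	fmt.Println(p.stopSafe())          // the guarded call is a no-op
}
```

Run as is, the program prints the recovered nil pointer dereference error and then `false`. The point is only that the panic describes how the process shut down, not why it was being shut down in the first place.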

Solution

  • OK, it turns out this SIGSEGV was a bug, and it was triggered by a network timeout, which makes the error something of a red herring.

    Details: My vsphere-csi-controller-0 pod was (and actually still is) unable to reach the vSphere server, which caused the container in the pod to time out and trigger this SIGSEGV fault. The CSI contributors updated some libraries and the fault is now gone, but the timeout remains. The timeout appears to be my problem and not related to CSI, but that's a new question :)

    If you want the details of what was fixed in the CSI driver, check the GitHub link in the question.