
Is this shared storage volume preventing self-heal from completing in GlusterFS?


I've inherited a 10-node Gluster cluster (v3.8.13) at a new-ish gig. The main problem I've encountered is that the nfs-ganesha service on one node routinely becomes unresponsive and requires a restart. Investigating this led me to check the cluster's health, and I found a very long list of files that require healing.

But I can't seem to do a heal on the ever-growing list of files.

Attempting to execute a heal with gluster volume heal volx yields an immediate warning about down bricks:

Launching heal operation to perform index self heal on volume volx has been unsuccessful on bricks that are down. Please check if all brick processes are running

When I check gluster volume status, all the bricks in the 'volx' volume are up, and the only suspicious thing is a message about shared storage:

Volume gluster_shared_storage is not started
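One read-only way to learn more about that stopped volume without changing anything on the production system (volume name taken from the status message above; gluster volume info works even when a volume is stopped, unlike gluster volume status):

```shell
# Show the stopped volume's definition, brick list, and status
# without starting it or otherwise touching cluster state
gluster volume info gluster_shared_storage
```

If the output shows "Status: Stopped" with a sane brick list, the volume was created deliberately at some point and then stopped (or never started), rather than being a broken leftover.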

I do see an entry in /etc/fstab for the mount:

1xx.1xx.1.xx:/gluster_shared_storage /var/run/gluster/shared_storage/ glusterfs defaults        0 0
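To confirm that the fstab entry really isn't mounted right now (again, read-only checks):

```shell
# Query the kernel's view of the mount point; exits non-zero if not mounted
findmnt /var/run/gluster/shared_storage

# Equivalent check against the live mount table
grep shared_storage /proc/mounts
```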

but it's not mounted as we speak.

It seems like someone has attempted to enable shared storage, but either brought the volume/mount down on purpose, or it kicked the bucket. I just don't want to remove the volume and find out it was critical (and/or find out that a heal does nothing to help the regular ganesha crashes). This is part of a production system, so I have to tread lightly here.

Are these two unrelated problems? It only said that healing was "unsuccessful on bricks that are down", so maybe it's healing the ones that are up? Is there a way to check?
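There are read-only heal-status commands that show pending entries per brick, which should reveal whether some bricks are in fact healing while others are stuck (volx is the volume name from the question):

```shell
# Per-brick count of entries still pending heal
gluster volume heal volx statistics heal-count

# Full per-brick listing of the files/gfids awaiting heal
gluster volume heal volx info
```

Running heal-count a few minutes apart shows whether the numbers are shrinking (heal is progressing) or growing (it isn't).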

Any insight into the ganesha crashes would be helpful too, but for the moment I'd settle for anything I can learn about Gluster.

UPDATE: docs seem to indicate that you need this shared storage volume to use nfs-ganesha:

Ensure that the following pre-requisites are taken into consideration before you run NFS-Ganesha in your environment: ...

  • Create and mount a gluster shared volume.

Sure does feel like A) I should keep the shared storage volume, and B) it needs to be started per ganesha requirements. I just don't want to start flipping switches on a production system that is mostly intact (if not optimized).
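For the record, if the volume does turn out to be needed, the documented way to (re-)enable shared storage is the cluster-wide option rather than hand-starting the volume; glusterd then creates/starts gluster_shared_storage and mounts it on the nodes. This is a sketch of the switch-flipping I'm trying to avoid on production, not something I've run yet:

```shell
# Officially supported toggle: glusterd manages the volume and mounts
gluster volume set all cluster.enable-shared-storage enable

# Or, since the volume definition and fstab entry already exist here,
# the manual equivalent would be:
gluster volume start gluster_shared_storage
mount /var/run/gluster/shared_storage/   # uses the existing fstab entry
```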


Solution

  • Just as a follow-up for any brave souls who run into this with Gluster: the aforementioned volume WAS bogus and completely inactive.
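For anyone cleaning up a similarly bogus shared-storage volume, a cautious sequence might look like the following (hedged sketch; verify the bricks are genuinely empty on each node before deleting anything):

```shell
# List the brick paths so their contents can be inspected on each node
gluster volume info gluster_shared_storage | grep -i brick

# Remove the stale fstab entry on each node, then, once confirmed unused:
gluster volume delete gluster_shared_storage
```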