device naming zfs vs smartctl vs iostat


I'm gathering performance metrics from a SAN (5.11 omnios-7648372). We use a DataON JBOD.

Here is a snippet of the output of zdb -C mypoolname:

children[0]:
    type: 'disk'
    id: 0
    guid: 7701924650939559899
    path: '/dev/dsk/c1t0d0s0'
    devid: 'id1,sd@n5000c5004cce9b53/a'
    phys_path: '/pci@0,0/pci8086,25f7@2/pci8086,350c@0,3/pci1000,3030@1/sd@0,0:a'
    whole_disk: 1
    DTL: 599
    create_txg: 4

Looking at the 'path' field, it says '/dev/dsk/c1t0d0s0'. I assume the device name is c1t0d0s0, and that matches smartctl, except that smartctl gives the path as /dev/rdsk/c1t0d0s0.

But 'iostat -extnc 3 1' names the device differently: c1t0d0

Why don't the three of them use the same name for the same device?


Solution

  • zdb is showing you the path that ZFS uses internally to address the device: a path under /dev that gives access to the block device file. (By the way, zdb is mainly meant as a debugging tool, and its output format isn't guaranteed to stay backward compatible.) ZFS addresses the device by its slice, which is why the s0 suffix appears in those listings. Disk slices are the Solaris/illumos equivalent of partitions. When you give ZFS an entire device (whole_disk: 1 in your output), ZFS labels the disk for you and puts the pool data in slice 0, which then covers essentially the whole disk. However, it's also possible to add just a single slice of a disk to your zpool (or, less sensibly, multiple slices of the same drive as separate vdevs), so ZFS has to track which slice(s) it actually controls.

    iostat is showing you just the device name, without the /dev path or the slice number. The way you ran it (without -p), iostat reports statistics per physical device rather than per slice, so there is no slice suffix to show.

    I'm less sure about smartctl, but I don't think it is getting anything through ZFS. It works at the device level like iostat and talks to the drive directly. On Solaris-derived systems it is pointed at the raw (character) device node under /dev/rdsk, and those nodes are named per slice, so the s0 suffix shows up even though the SMART data describes the whole disk. Ideally errors would be reported against the smallest failure domain they belong to, which in this case is the disk, so the suffix is a bit misleading. (Although at least it's easy to work around.)
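
    If the goal is to line up metrics from all three tools, one workaround is to reduce every name to the bare device name before joining the data. Here is a minimal Python sketch of that idea. The helper name base_device is made up for illustration, and it assumes the names follow the usual cXtYdZ pattern with an optional sN or pN suffix; adjust the pattern if your system names devices differently.

```python
import re

# Assumed naming pattern: c<ctrl>[t<target>]d<disk>, optionally followed by
# a slice (s0) or fdisk partition (p0) suffix. The target may be a plain
# number (t0) or a hex WWN (t5000C5004CCE9B53).
_DISK_RE = re.compile(r"^(c\d+(?:t[0-9A-Fa-f]+)?d\d+)(?:[sp]\d+)?$")


def base_device(name):
    """Return the bare device name for a /dev/dsk path, /dev/rdsk path, or bare name.

    Hypothetical helper: strips the directory part and any slice/partition suffix.
    """
    leaf = name.strip().rsplit("/", 1)[-1]  # drop /dev/dsk/ or /dev/rdsk/
    match = _DISK_RE.match(leaf)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    return match.group(1)


if __name__ == "__main__":
    # The three spellings from the question should collapse to one key.
    names = {
        "zdb": "/dev/dsk/c1t0d0s0",
        "smartctl": "/dev/rdsk/c1t0d0s0",
        "iostat": "c1t0d0",
    }
    for tool, name in names.items():
        print(tool, base_device(name))
```

    With the three names from the question, every tool ends up under the same key, so the zdb, smartctl, and iostat records can be merged per disk.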