iSCSI, Delayed ACK and the Confused VMware Hypervisor

I’ve recently had the pleasure of taking over management of a customer’s DR environment. It’s in pretty good shape, mostly. Much more modern hardware than I often see, save for the iSCSI VM storage. iSCSI is fine 99% of the time, but controller failover is slow and there is a multitude of tiny tweaks that can have a big impact on performance. A good time to talk about today’s obscure finding then.

Delayed ACK. Maybe you’ve heard of it. Maybe you’ve brushed past the setting. Maybe you couldn’t care less. Essentially it lets the receiver hold back a TCP ACK until every other data segment has arrived, rather than ACKing each segment as it lands. I’m sure someone with a computer science degree (wait, that’s me…) or a fetish for reciting the GENEVE header structure could tell me I’m “technically correct, except…” but that’s not what we’re here for.

Most major storage vendors recommend disabling Delayed ACK as a best practice. You can verify current settings in the vSphere Web Client under (select host) > Configure > Storage Adapters > (select iSCSI adapter) > Advanced Options > DelayedAck.

Oh good, my work here is already done.

Wait, what?

According to VMware’s documentation, the process to disable DelayedAck involves placing a host into maintenance mode, disabling DelayedAck, rebooting and slinging the host back into production, but that doesn’t seem to be enough. The GUI shows DelayedAck as disabled, yet the setting isn’t being applied to adapters with active LUNs.

The solution is as follows. This needs to be performed on all hosts connected over iSCSI.

Place the desired host into maintenance mode.

Remove all static and dynamic iSCSI servers (yes, really).

Ensure DelayedAck is disabled on the desired iSCSI adapter.

Re-add iSCSI targets.

Ensure the DelayedAck setting for each target is “inherit from parent”.

Rescan HBA to rediscover LUNs.

Confirm DelayedAck is false / 0 at the CLI with:

vmkiscsid --dump-db | grep Delayed
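With a lot of targets, reading that grep output by eye gets old quickly. Below is a minimal sketch of a helper that counts the entries still showing a non-zero value. The function name is my own invention, and it assumes each entry looks roughly like `node.conn[0].iscsi.DelayedAck`='0', so check the pattern against your own output before trusting it.

```shell
#!/bin/sh
# Hypothetical helper: reads `vmkiscsid --dump-db` style text on stdin and
# prints how many DelayedAck entries carry a non-zero value.
# Assumption: each entry looks like  `node.conn[0].iscsi.DelayedAck`='0'
count_delayed_ack_enabled() {
    # Pull out the number that follows "DelayedAck...=", one per line,
    # then count the values that are not exactly 0.
    sed -n "s/.*DelayedAck[^=]*=[^0-9]*\([0-9][0-9]*\).*/\1/p" \
        | grep -vc '^0$' || true   # grep -c exits 1 when the count is 0
}

# Usage on the host (commented out so the sketch has no side effects):
# vmkiscsid --dump-db | count_delayed_ack_enabled
```

A result of 0 means every entry found reports DelayedAck as disabled. Anything else means at least one target still needs attention.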

Any path-switching settings, such as a Round Robin IOPS limit, will need to be reapplied to each LUN where applicable:

esxcli storage nmp psp roundrobin deviceconfig set --device naa.<LUN ID> --type=iops --iops=1
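If there are many LUNs, typing that once per device is tedious. Here is a rough sketch that prints (rather than runs) the command for every naa.* device it is fed. The function name is hypothetical, and it assumes device IDs start at the beginning of a line, as they do in `esxcli storage nmp device list` output. It also makes no attempt to skip devices that aren’t using Round Robin, so review the list before executing anything.

```shell
#!/bin/sh
# Hypothetical helper: reads device-list text on stdin and prints one
# Round Robin command per naa.* device. Nothing is executed.
# Assumption: device IDs begin a line; indented property lines are ignored.
print_rr_commands() {
    grep '^naa\.' | while read -r dev _rest; do
        echo "esxcli storage nmp psp roundrobin deviceconfig set --device $dev --type=iops --iops=1"
    done
}

# Usage on the host (commented out so the sketch has no side effects):
# esxcli storage nmp device list | print_rr_commands
```

Printing first is deliberate: you can eyeball the generated commands, then pipe the output to sh once you are happy with it.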
