‘The host returns esxupdate error code:15.’ with VSAN and USB Boot Disks

I was recently having some difficulty with VMware Update Manager: remediating each host in my lab failed with ‘The host returns esxupdate error code:15.’ This was one of those very satisfying situations that I managed to (slowly) debug myself, as there were a few environmental specifics that Google couldn’t quite account for:

  • Each host in the cluster is running ESXi 6.0u2 (but other versions may be affected)
  • Each host in the cluster boots from a USB memory pen
  • Each host in the cluster is contributing storage to VSAN

The first step in debugging this was to tail the logs. Enable SSH for any affected host, either through vCenter or the new ESXi embedded host client. SSH to the host, then tail the update log with

tail -f /var/log/esxupdate.log

Run the VUM ‘Remediate’ action in vCenter while watching the log.

The log output is quite verbose, but watch carefully and you may find the first clue:

esxupdate: LockerInstaller: WARNING: There was an error in cleaning up product locker: [Errno 2] No such file or directory: '/locker/packages/var/db/locker'
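Because the log scrolls quickly, it can help to filter it after the remediation attempt rather than rely on spotting the line live. The following is a minimal sketch, assuming grep with -E is available in the ESXi shell; the function name esxupdate_problems is my own and not part of any VMware tooling.

```shell
# Print only the WARNING and ERROR lines from an esxupdate log.
# $1: path to the log (defaults to the log tailed above).
# esxupdate_problems is a hypothetical helper name.
esxupdate_problems() {
    log="${1:-/var/log/esxupdate.log}"
    grep -E 'WARNING|ERROR' "$log"
}
```

Calling esxupdate_problems with no argument reads /var/log/esxupdate.log; pass another path to inspect a copied log.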

This indicates that a few files may be missing, which led me to VMware KB 2030665. Running

ls /locker/packages/

should show the folder 6.0.0 (on version 6.0 of ESXi), but in my case it didn’t.

Let’s start with the basics and ensure the necessary symbolic links are in place

ls -l /

Should show

locker -> /store

and

store -> /vmfs/volumes/(long volume number)

If not, the symbolic links can be recreated with ln -s. For example, to recreate /locker so that it points at /store

ln -s /store /locker
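Rather than reading through the whole ls -l / listing, each link can be checked on its own. This is only a sketch, assuming readlink is present in the ESXi shell; check_link is a name I made up for illustration.

```shell
# Report whether a path is a symbolic link and where it points.
# $1: link path, $2: expected target (optional).
# check_link is a hypothetical helper name.
check_link() {
    if [ ! -L "$1" ]; then
        echo "$1: not a symbolic link"
        return 1
    fi
    target=$(readlink "$1")
    if [ -n "${2:-}" ] && [ "$target" != "$2" ]; then
        echo "$1 -> $target (expected $2)"
        return 1
    fi
    echo "$1 -> $target"
}
```

For example, check_link /locker /store confirms the first link, and check_link /store prints whichever volume the second link points at.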

One of my hosts shows

lrwxrwxrwx    1 root     root            49 Sep 12 13:25 store -> /vmfs/volumes/571bb679-563d7a10-e27b-0cc47aaaeef0

This is the volume where /locker is stored

df -h

Should show this partition and its free space

vfat       285.8M 227.3M     58.5M  80% /vmfs/volumes/571bb679-563d7a10-e27b-0cc47aaaeef0

212.1MB of free space is required on this volume for the missing files, but only 58.5MB is available here. It seems that in my case, the required files were being overwritten in favour of retaining other files.
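The comparison between the available space and the 212.1MB requirement can be scripted. The sketch below assumes awk is available in the ESXi shell and that df -h prints the available space in the fourth column with a K, M or G suffix, as in the line above. Both function names are hypothetical.

```shell
# Read df -h output on stdin and print the "Available" column
# for the given mount point ($1).
avail_from_df() {
    awk -v m="$1" '$NF == m { print $4 }'
}

# Succeed if an available-space value such as "58.5M" is at least
# the required number of megabytes ($2). Only K, M and G suffixes
# are handled; anything else is treated as megabytes.
enough_space() {
    awk -v avail="$1" -v need="$2" 'BEGIN {
        unit = substr(avail, length(avail))
        value = avail + 0              # leading numeric part, e.g. 58.5
        if (unit == "G") value *= 1024
        else if (unit == "K") value /= 1024
        exit (value >= need) ? 0 : 1
    }'
}
```

With the figures from my host, enough_space 58.5M 212.1 fails, which matches the symptoms. The first function can feed the second, for example by piping df -h into avail_from_df with the volume path from the store link.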

Cormac Hogan’s great post on booting from USB with VSAN led me to the reason for the lack of free space. VSAN traces are so I/O intensive that they would quickly burn out a USB drive, so they aren’t copied to disk in real time as they would be when using persistent storage. Instead they are stored in /scratch, which should reside in memory, then moved to the USB boot drive upon shutdown/reboot. To verify this, use (replacing the volume to suit your environment)

ls -l /vmfs/volumes/571c20bc-a4e14648-df22-0cc47aaafeb0/vsantraces

You will likely see a lot of VSAN trace files from around the last time the host was rebooted. What I will now describe is being done in a lab environment. Definitely reconsider before doing this in a production environment, but either way it is at your own risk! Now, these files… VSAN trace files are extremely useful to VMware support when debugging VSAN issues, but they may as well be written in Russian as far as I’m concerned (I don’t speak Russian, just so we’re clear), so I’m going to delete them to free up space.
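Before deleting anything, it is worth confirming how much space the trace files actually occupy. This sketch only reports and never deletes; it assumes find and du are available in the ESXi shell, and dir_usage is a made-up helper name.

```shell
# Summarise a directory's contents without deleting anything.
# $1: directory to inspect, e.g. the vsantraces folder on the boot volume.
# dir_usage is a hypothetical helper name.
dir_usage() {
    count=$(find "$1" -type f | wc -l | tr -d ' ')
    size_kb=$(du -sk "$1" | awk '{ print $1 }')
    echo "$count files, ${size_kb} KB in $1"
}
```

Point it at the vsantraces path for your own volume and compare the result with the free space reported by df -h.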

SCP can then be used to copy the /locker/packages/6.0.0 folder from a working host.
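As a rough sketch of that copy, run from the broken host, the function below builds the scp command and only prints it unless DRY_RUN is set to 0. The host name in the usage note is a placeholder, the function name is my own, and I am assuming SSH is enabled on the working host and that root can log in.

```shell
# Copy the locker packages folder from a working host to this host.
# $1: name or address of the working host (placeholder, supply your own)
# $2: packages version folder (defaults to 6.0.0, as in this post)
# Prints the command by default; set DRY_RUN=0 to actually run it.
copy_locker_packages() {
    src_host="$1"
    version="${2:-6.0.0}"
    cmd="scp -r root@${src_host}:/locker/packages/${version} /locker/packages/"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd    # unquoted on purpose so the string splits into arguments
    fi
}
```

For example, copy_locker_packages esxi-working.lab prints the command for review. If the real copy is refused or hangs, the outbound SSH client rule (sshClient) in the ESXi firewall may need enabling on the host you are copying to.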

If this is a recurring issue and you have no interest in VSAN traces, you can prevent the service that copies them to the USB disk on shutdown/reboot from starting at boot

chkconfig vsantraced off

This can be reversed with

chkconfig vsantraced on

The service can be temporarily stopped, too

/etc/init.d/vsantraced stop

It will start again at boot, or can be started manually

/etc/init.d/vsantraced start

In a production environment, the required partition should be extended or, alternatively, the boot disk should be moved to a hard disk.
