December/January Maintenance Window 2: How To Clean Everything #30
This issue exists to track our plan for an upcoming maintenance window where we will fix the underlying cause of our disks wearing out or die trying (and restore from backup).
This is the primary reason the maintenance is being performed:
upgrade qemu on baikal to fix virtio-blk thrashing our disks
There are a couple of other changes we would like to make which also require stopping all customer VMs and/or restarting baikal:
Capsul outage mitigation: need a way to shutdown the server
Finally, there are some other changes which are "nice to haves". They are not required for this maintenance, but if we have time I would like to get them fully in place before it starts.
Capsul outage mitigation: capsul hub's database should not run on a capsul
backup and rollback strategy
So, we will need the ability to boot from another drive in order to restore the backup in case things do not work. Baikal is not in UEFI mode, so we will have to have a KVM hooked up in order to do this.
So we'll need to coordinate with CyberWurx. IMO we should ask them to hook up KVM for us pre-emptively.
Either we can ask them to insert a Linux recovery USB for us, or we can possibly boot from an ISO file that sits as a file on the "normal" boot partition.
We have taken a full backup of the boot drive, and CyberWurx is standing by to assist with KVM and a recovery OS if needed.
Ok this deployment was carried out and succeeded... ish.
We got Debian, QEMU, and libvirt upgraded, but unfortunately it did not seem to fix our problem with the trims and discards. NEW capsuls have discard support OOTB, but existing ones don't, so it doesn't really fix our problem.
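To find which existing capsuls are still missing discard support, a crude check along these lines might help. It is only a sketch: `has_discard_unmap` is a hypothetical helper name, and it assumes discard is enabled through the `discard='unmap'` attribute on a disk's `<driver>` element in the libvirt domain XML, which may not be how it is configured here.

```shell
# Hypothetical helper: reads libvirt domain XML on stdin and succeeds only if
# at least one <driver ...> element on a single line carries discard='unmap'.
# This is a grep-based heuristic, not a real XML parser.
has_discard_unmap() {
    grep -Eq "<driver[^>]*discard=['\"]unmap['\"]"
}
```

It could be used in a loop such as `for dom in $(virsh list --all --name); do virsh dumpxml "$dom" | has_discard_unmap || echo "$dom: no discard"; done` to list the VMs that would still need their definitions changed.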
We also tried to deploy the systemd drop-in for fixing the shutdown process of libvirt-guests, but it was still calling the old script, not our new one.
`systemctl status` was showing the drop-in, but it wasn't actually overriding the `ExecStop` script we had specified.
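One possible explanation, not confirmed in this thread: `ExecStop=` is a list-type setting in systemd, so a drop-in that only adds a new `ExecStop=` line appends to the packaged command instead of replacing it. Assigning an empty value first resets the list. The sketch below writes such a drop-in; the function name, the `override.conf` file name, and the script path are placeholders, not what was actually deployed.

```shell
# Hypothetical helper: write a drop-in that replaces (rather than appends to)
# the ExecStop= command list of a unit.
#   $1 = drop-in directory, e.g. /etc/systemd/system/libvirt-guests.service.d
#   $2 = absolute path of the replacement stop script (placeholder)
write_execstop_override() {
    dropin_dir=$1
    stop_script=$2
    mkdir -p "$dropin_dir" || return 1
    {
        printf '[Service]\n'
        # The empty assignment clears the ExecStop= list inherited from the packaged unit.
        printf 'ExecStop=\n'
        # The only remaining stop command is now the replacement script.
        printf 'ExecStop=%s\n' "$stop_script"
    } > "$dropin_dir/override.conf"
}
```

After writing the file, `systemctl daemon-reload` followed by `systemctl show -p ExecStop libvirt-guests.service` should show whether the old script is still in the list.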