upgrade qemu on baikal to fix virtio-blk thrashing our disks #29

Closed
opened 2021-12-16 17:22:45 +00:00 by forest · 5 comments

Right now, when a guest machine issues a trim or discard operation to the disk, the virtio driver used for the VM's qcow2 image interprets that as "write a shit load of zeros to the disk", which counts as a write and makes the disk wear out a lot faster.

Instead we want the block storage driver to pass it through as a trim/discard operation, which is a special operation SSDs support to solve exactly this issue: it simply marks the blocks as unused instead of writing over them with zeros, so the disk doesn't wear out as quickly.
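
To make the difference concrete, here is a toy model in Python. It is only a sketch: `ToyDisk`, its method names, and the 4096-byte block size are made up for illustration and do not correspond to any QEMU or libvirt API. It just counts how many bytes hit the disk when a discard is turned into zero writes versus when it is passed through as an unmap.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDisk:
    """Hypothetical SSD model that only tracks mapped blocks and bytes written."""
    block_size: int = 4096  # assumed block size, for illustration only
    bytes_written: int = 0
    mapped: set = field(default_factory=set)

    def write(self, block: int) -> None:
        # Any write, including a write of zeros, wears the flash.
        self.mapped.add(block)
        self.bytes_written += self.block_size

    def discard_as_zero_write(self, blocks) -> None:
        # The behaviour described above: the discard becomes real writes of zeros.
        for block in blocks:
            self.write(block)

    def discard_as_unmap(self, blocks) -> None:
        # The behaviour we want: blocks are marked unused, nothing is written.
        self.mapped.difference_update(blocks)


zeroing = ToyDisk()
zeroing.discard_as_zero_write(range(1000))

unmapping = ToyDisk()
unmapping.discard_as_unmap(range(1000))

print("zero-write bytes:", zeroing.bytes_written)
print("unmap bytes:", unmapping.bytes_written)
```

In this model the zero-writing path costs one full block write per discarded block, while the unmap path costs nothing. Real drives are more complicated, but the direction of the effect is the same.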

This will require a maintenance window. Here are some notes:

j3s (he/him)
it sounds like virtio-scsi was meant to replace virtio-blk according to random stackoverflow sources.

j3s (he/him)
honestly remaining on virtio-blk would probably be fine for us
we don't reap any of the benefits of moving to scsi, and we lose a lot by having to have everyone migrate
i say we just do a full-upgrade, and then discard support should "just work" for virtio-blk

forest (he/him)
thanks for doing the research on QEMU virtio-blk and virtio-scsi
if we do a dist-upgrade, it's basically just "pray that it works" right?

forest (he/him)
Ok, so we can plan for another maintenance window in the future where we do the upgrade, make sure the grub isn't fcked up,
and then reboot the box ?
maybe we take a backup and notify cyberwurx about it as well?
or at very least just take a backup
and if it blows up we can call cyberwurx

j3s (he/him)
yep that’s fine - i'll work on getting borg set up and working between baikal and maggy/gibson


Nyaaori ⚛️
it's important to note that with debian you can't skip versions
so if you're on 9 and want to upgrade to 11 you have to upgrade to 10 first
you also need to fully upgrade your current version before you go to the next
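
The "no skipping" rule can be sketched as a small helper that lists the releases you have to pass through. This is an illustration only: `upgrade_path` is a hypothetical name, and the table covers only the three releases mentioned in this thread.

```python
# Debian major versions and their codenames, limited to the ones discussed here.
DEBIAN_RELEASES = {9: "stretch", 10: "buster", 11: "bullseye"}


def upgrade_path(current, target):
    """Return the codenames to upgrade through, one release at a time."""
    if current not in DEBIAN_RELEASES or target not in DEBIAN_RELEASES:
        raise ValueError("unknown Debian release")
    if target < current:
        raise ValueError("downgrades are not supported")
    # Every intermediate release is a separate upgrade step; none can be skipped.
    return [DEBIAN_RELEASES[v] for v in range(current + 1, target + 1)]


print(upgrade_path(9, 11))
print(upgrade_path(10, 11))
```

Each step in the returned list assumes the current release has been fully upgraded first, as noted above.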

forest (he/him)
good call

```
root@baikal:~# lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster
```

Here is the current trajectory of the SSDs wearing out

https://prometheus.cyberia.club/graph?g0.expr=smartmon_total_lbas_written_raw_value%7B%7D&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=2d

so we should see those lines bend downward if the upgrade actually fixes something
`smartmon_media_wearout_indicator_raw_value` also sounds like a good one to watch, and its graph looks the same.
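
One way to check whether the lines "bend downward" is to compare the slope of the counter before and after the upgrade. Below is a minimal sketch using a least-squares fit, which is similar in spirit to what Prometheus `deriv()` does. The function name and the sample values are made up for illustration; they are not real readings from baikal.

```python
def slope_per_second(samples):
    """Least-squares slope of (unix_seconds, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("all samples have the same timestamp")
    return covariance / variance


# Hypothetical LBAs-written readings, ten minutes apart (NOT real data).
before_upgrade = [(0, 1000), (600, 1600), (1200, 2200)]
after_upgrade = [(0, 2200), (600, 2500), (1200, 2800)]

rate_before = slope_per_second(before_upgrade)
rate_after = slope_per_second(after_upgrade)
print("improved" if rate_after < rate_before else "not improved")
```

If the upgrade helps, the slope after the maintenance window should be clearly lower than the slope before it.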

![image](/attachments/98f2402f-43c3-48ad-867d-909abfa93757)

Ok this deployment was carried out and succeeded... ish.

We got Debian, QEMU, and libvirt upgraded, but unfortunately that alone did not fix our problem with trims and discards: NEW capsuls have discard support OOTB, but existing ones don't.


j3s (he/him)
i updated my personal (old) box to 3.15 and diffed the configs, this is what i got

<domain type='kvm' id='87'>				      |	<domain type='kvm' id='88'>
  <name>capsul-userpb9vz6</name>			      |	  <name>capsul-6kyez54nyx</name>
  <uuid>2dad9bd7-8ca2-4148-9c6e-fbb99acf98b1</uuid>	      |	  <uuid>f1b6f1cf-6e75-4c01-b81a-ecd41b9f2fcb</uuid>
  <memory unit='KiB'>2097152</memory>			      |	  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>	      |	  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>4</vcpu>			      |	  <vcpu placement='static'>1</vcpu>
    <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>    |	    <type arch='x86_64' machine='pc-i440fx-5.2'>hvm</type>
      <source file='/tank/vm/capsul-userpb9vz6.qcow2' index=' |	      <source file='/tank/vm/capsul-6kyez54nyx.qcow2' index='
      <source file='/tank/vm/capsul-userpb9vz6.iso' index='1' |	      <source file='/tank/vm/capsul-6kyez54nyx.iso' index='1'
      <mac address='52:54:00:4f:9e:b6'/>		      |	      <mac address='52:54:00:e2:c7:f5'/>
      <source network='public2' portid='facc3920-176d-4012-9f |	      <source network='public3' portid='3333ebf3-392a-4294-9f
      <target dev='vnet86'/>				      |	      <target dev='vnet87'/>
      <source path='/dev/pts/30'/>			      |	      <source path='/dev/pts/82'/>
    <console type='pty' tty='/dev/pts/30'>		      |	    <console type='pty' tty='/dev/pts/82'>
      <source path='/dev/pts/30'/>			      |	      <source path='/dev/pts/82'/>
    <graphics type='vnc' port='5919' autoport='yes' listen='1 |	    <graphics type='vnc' port='5972' autoport='yes' listen='1
      <model type='qxl' ram='65536' vram='65536' vgamem='1638 |	      <model type='vga' vram='16384' heads='1' primary='yes'/
      <address type='pci' domain='0x0000' bus='0x00' slot='0x |	      <address type='pci' domain='0x0000' bus='0x00' slot='0x
    <label>libvirt-2dad9bd7-8ca2-4148-9c6e-fbb99acf98b1</labe |	    <label>libvirt-f1b6f1cf-6e75-4c01-b81a-ecd41b9f2fcb</labe
    <imagelabel>libvirt-2dad9bd7-8ca2-4148-9c6e-fbb99acf98b1< |	    <imagelabel>libvirt-f1b6f1cf-6e75-4c01-b81a-ecd41b9f2fcb<

oh shit i got it

to enable discard support, we must:

change machine='pc-i440fx-3.1' to machine='pc-i440fx-5.2'
stop the VM completely (not reboot)
start the VM again

we can easily do the change in the VM definitions. people might be annoyed if we force stop everyone's VMs again
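
The definition change could be sketched like this in Python. It is a sketch under assumptions, not the script that was actually used: `set_machine_type` is a hypothetical helper, the sample XML is a trimmed-down made-up domain, and it assumes the `machine` attribute lives on `<os><type>` as in the diff above.

```python
import xml.etree.ElementTree as ET


def set_machine_type(domain_xml, old="pc-i440fx-3.1", new="pc-i440fx-5.2"):
    """Return (xml, changed). Only rewrites domains that use the old machine type."""
    root = ET.fromstring(domain_xml)
    type_el = root.find("./os/type")
    if type_el is None:
        raise ValueError("domain XML has no <os><type> element")
    if type_el.get("machine") != old:
        # Leave other machine types (for example pc-q35) untouched.
        return domain_xml, False
    type_el.set("machine", new)
    return ET.tostring(root, encoding="unicode"), True


# Made-up minimal domain definition, for illustration only.
SAMPLE_DOMAIN = """<domain type='kvm'>
  <name>capsul-example</name>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.1'>hvm</type>
  </os>
</domain>"""

new_xml, changed = set_machine_type(SAMPLE_DOMAIN)
print(changed)
print(new_xml)
```

The rewritten XML would then have to be loaded back into libvirt (for example with `virsh define`), and the VM still needs a full stop and start, not a reboot, before the new machine type takes effect.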


Ok, we eventually figured out how to do this for the majority of the existing VMs.

Most of the VMs were `pc-i440fx-3.1` and could be upgraded to `pc-i440fx-5.2`, which caused discards to start working on them.

However, some of the VMs were the `pc-q35` machine type. I was able to get one of those (elliot) to upgrade to pc-i440fx-5.2, but discards did not start working on it.

In the end, enabling discards did not seem to fix the problem. Having discard support is nice, but it appears the real problem is that some of our users REALLY like to write to the disk a lot.

Here is the wear rate (derivative of the wearout indicator over a 30m window) according to the disk itself:
https://prometheus.cyberia.club/graph?g0.expr=deriv(smartmon_media_wearout_indicator_raw_value%7B%7D%5B30m%5D)&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=1d&g0.end_input=2021-12-17%2007%3A52%3A45&g0.moment_input=2021-12-17%2007%3A52%3A45
![image](/attachments/31d99b15-d086-4f16-9046-c4b1a87efe48)

Here is # of writes by capsul:
![image](/attachments/ec1d30ec-b6e4-4c84-b030-a9f1c35ae4a2)
https://grafana.cyberia.club/d/jMw9xSRMz/capsul-stats?viewPanel=3&orgId=1&from=1638748463900&to=1639762070109

Reference: cyberia/capsul-flask#29