dec2021 blog #1

Merged
forest merged 21 commits from dec2021 into main 2021-12-18 19:32:48 +00:00
5 changed files with 401 additions and 5 deletions

make
View File

@ -50,9 +50,10 @@ while read -r page; do
"../site/$page" |
# link https
sed -E "s|^(https[:]//[^ )]{50})([^ )]*)|<a href='\0'>\1</a>|g" |
sed -E "s|^(https[:]//[^ )]{32})([^ )]{3,1000})([^ )]{15})|<a href='\0'>\1...\3</a>|g" |
sed -E "s|^(https[:]//[^ )]{1,50})|<a href='\0'>\0</a>|g" |
# link mailto
sed -E "s|(mailto:[^ ]*)|<a href='\1'>\1</a>|g" |
sed -E "s|mailto:([^ ]*)|<a href='mailto:\1'>\1</a>|g" |
# color hexcodes
sed -E 's|(#[a-f0-9]{6})|<font color="\1">\1</font>|g' |
# color cyberia

View File

@ -9,6 +9,8 @@
| |
+-------------------------------------------------------+
- <a href=/blog/20211217-capsul-maintenance-updates.html>Capsul - Rumors Of My Demise Have Been Greatly Exaggerated</a>
- <a href=/blog/2020-11-27.html>COVIDaware MN app investigation</a>
- <a href=/blog/20200520-capsul-rollin-onward-with-a-web-application.html>Capsul rollin' onward with a Web Application</a>

View File

@ -215,6 +215,6 @@ verbose logging goes slightly against the "we don't
collect any more data about you than we need to" mantra.
If you would like to take a peek at the code, it's
hosted on forge:
hosted on our git server:
https://giit.cyberia.club/~forest/capsul-flask
https://git.cyberia.club/cyberia/capsul-flask

View File

@ -0,0 +1,385 @@
CAPSUL
rumors of my demise have been greatly exaggerated
Forest 2021-12-17
WHAT IS THIS?
If you're wondering "what is capsul?", see:
https://cyberia.club/blog/20200520-capsul-rollin-onward-with-a-web-application.html
For the capsul source code, navigate to:
https://git.cyberia.club/cyberia/capsul-flask
WHAT HAPPENED TO THE CRYPTOCURRENCY PAYMENT OPTION?
Life happens. Cyberia Computer Club has been hustling
and bustling to build out our new in-person space in
Minneapolis, MN:
https://wiki.cyberia.club/hypha/cyberia_hq/faq
Hackerspace, lab, clubhouse: we aren't sure what to call
it yet, but we're extremely excited to finish the
renovations and move in!
In the meantime, something went wrong with the physical
machine hosting our BTCPay server and we didn't have
anywhere convenient to move it, nor time to replace it,
so we simply disabled cryptocurrency payments
temporarily in September 2021.
Many of yall have emailed us asking "what gives??",
and I'm glad to finally be able to announce that
"the situation has been dealt with":
we have a brand new server, the blockchain syncing
process is complete, and cryptocurrency payments in
bitcoin, litecoin, and monero are back online now!
--> https://capsul.org/payment/btcpay <--
THAT ONE TIME CAPSUL WAS ALMOST fsync'd TO DEATH
Guess what? Yall loved capsul so much, you wore our disks
out. Well, almost.
We use redundant solid state disks + the ZFS file system
for your capsul's block storage needs, and it turns out
that some of our users like to write files. A lot.
Over time, SSDs will wear out, mostly dependent on how
many writes hit the disk. Baikal, the server behind
capsul.org, is a bit different from a typical desktop
computer, as it hosts about 100 virtual machines, each
with their own list of application processes, for over 50
individual capsul users, each of whom may be providing
services to many other individuals in turn.
The disk-wear-out situation was exacerbated by our
geographical separation from the server; we live in
Minneapolis, MN, but the server is in Georgia. We wanted
to install NVME drives to expand our storage capacity
ahead of growing demand, but when we would mail PCI-e to
NVME adapters to CyberWurx, our datacenter colocation
provider, they kept telling us the adapter didn't fit
inside the 1U chassis of the server.
At one point, we were forced to take a risk and undo the
redundancy of the disks in order to expand our storage
capacity and prevent "out of disk space" errors from
crashing your capsuls. It was a calculated risk, trading
certain doom now for the possibility of doom later.
Well, time passed while we were busy with other projects,
and those non-redundant disks started wearing out.
According to the "smartmon" monitoring indicator, they
reached about 25% lifespan remaining. Once the disk
theoretically hit 0%, it would become read-only in order
to protect itself from total data loss.
So we had to replace them before that happened.
https://picopublish.sequentialread.com/files/smartmon_dec2021.png
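By the way, if you want to check the wear on your own
SSDs, the smartctl tool from the smartmontools package
can report it too. A rough sketch; device paths and the
exact attribute names vary by vendor:
smartctl -a /dev/nvme0 | grep -i 'percentage used'
smartctl -A /dev/sda | grep -iE 'wear|percent'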
We were so scared of what could happen if we slept on
this that we booked a flight to Atlanta for maintenance.
We wanted to replace the disks in person, and ensure we
could restore the ZFS disk mirroring feature.
We even custom 3d-printed a bracket for the tiny PCI-e
NVME drive that we needed in order to restore redundancy
for the disks, just to make 100% sure that the
maintenance we were doing would succeed & maintain
stability for everyone who has placed their trust in us
and voted with their shells, investing their time and
money on virtual machines that we maintain on a volunteer
basis.
https://picopublish.sequentialread.com/files/silly-nvme-bracket2.jpg
Unfortunately, "100% sure" was still not good enough:
the new NVME drive didn't work as a ZFS mirroring partner
at first. The existing NVME drive was 951GB, and the
one we had purchased was 931GB. It was too small, and ZFS
would not accept that. f0x suggested:
> [you could] start a new pool on the new disk,
> zfs send all the old data over, then have an
> equally sized partition on the old disk then add
> that to the mirror
But we had no idea how to do that exactly or how long it
would take, & we didn't want to change the plan at the
last second, so we ended up taking the train from the
datacenter to Best Buy to buy a new disk instead.
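For the curious, f0x's idea would have looked roughly
like the following. This is an untested sketch, and the
pool and device names are made up:
zfs snapshot -r oldpool@migrate
zpool create newpool /dev/nvme1n1
zfs send -R oldpool@migrate | zfs recv -Fdu newpool
zpool destroy oldpool
# carve a partition on the old disk no larger than the
# new disk, then mirror it in:
zpool attach newpool /dev/nvme1n1 /dev/nvme0n1p1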
The actual formatted sizes of these drives are typically
never printed on the packaging or even mentioned on PDF
datasheets online. When I could find an actual number
for a model, it was always the lower 931GB.
So, we ended up buying a "2TB" drive as it was the only
one Best Buy had which we could guarantee would work.
So, lesson learned the hard way. If you want to use ZFS
mirroring and maybe replace a drive later, make sure to
choose a fixed partition size which is slightly smaller
than the typical available space on the size of drive
you're using, in case the replacement drive was
manufactured with slightly less available formatted
space!!!
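In other words, something roughly like this when first
creating the pool. This is a sketch with made-up device
names and sizes, not exactly what we ran:
parted -s /dev/nvme0n1 mklabel gpt mkpart zfs 1MiB 920GiB
parted -s /dev/nvme1n1 mklabel gpt mkpart zfs 1MiB 920GiB
zpool create tank mirror /dev/nvme0n1p1 /dev/nvme1n1p1
That way a future replacement only has to have at least
920GiB of formatted space, instead of matching the
original drive byte for byte.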
Once mirroring was restored, we made sure to test it
in practice by carefully removing a disk from the server
while it was running:
https://picopublish.sequentialread.com/files/zfs_disk_replacement/
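The thing to watch during a stunt like that is zpool
status: the pool should show up as DEGRADED while the
disk is out, and resilver once it's back. Pool name here
is made up:
zpool status -v tank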
While we could have theoretically done this maintenance
remotely with the folks at CyberWurx performing the
physical parts replacement per a ticket we'd open with
them, we wanted to be sure we could meet the timeline
that the disks had set for **US**. That's no knock on
CyberWurx, more so a knock on us for yolo-ing this server
into "production" with tape and no test environment :D
The reality is we are volunteer supported. Right now
the payments that the club receives from capsul users
don't add up to enough to compensate (that is, make ends
meet for) your average professional software developer
or sysadmin, at least if local tech labor market stats
are to be believed.
We are all also working on other things, so we can't
devote all of our time to capsul. But we do care about
capsul, and we want our service to live, mostly because
we use it ourselves, but also because the club benefits
from it. We want it to be easy and fun to use, while
also staying easy and fun to maintain. A system that's
aggressively maintained will be a lot more likely to
remain maintained when it's no one's job to come in
every weekday for that.
That's why we also decided to upgrade to the latest
stable Debian major version on baikal while we were
there. We encountered no issues during the upgrade
besides a couple of initial omissions in our package
source lists. The installer also notified us of several
configuration files we had modified, presenting us with
a git-merge-ish interface that displayed diffs and
allowed us to decide to keep our changes, replace our
file with the new version, or merge the two manually.
I can't speak more accurately about it than that, as
j3s did this part and I just watched :)
LOOKING TO THE FUTURE
We wanted to upgrade to this new Debian version because
it had a new major version of QEMU, supporting virtio-blk
storage devices that can pass-through file system discard
commands to the host operating system.
We didn't see any benefits right away, as the vms
stayed defined in libvirt as their original machine types,
either pc-i440fx-3.1 or a type from the pc-q35 family.
After returning home, we noticed that when we created
a new capsul, it would come up as the pc-i440fx-5.2
machine type, and the main disk on the guest would show
discard support in the form of a non-zero DISC-MAX size
reported by the `lsblk -D` command:
localhost:~# sudo lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sr0 0 0B 0B 0
vda 512 512B 2G 0
Most of our capsuls were pc-i440fx ones, and we upgraded
them to pc-i440fx-5.2, which finally got discards working
for the grand majority of capsuls.
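If you're curious which machine type your own libvirt
guests use, it lives in the domain XML. Something like
this, with a made-up domain name:
virsh dumpxml capsul-abc123 | grep -o "machine='[^']*'"
and `virsh edit` is one way to change it.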
If you see discard settings like that on your capsul,
you should also be able to run `fstrim -v /`, which
saves us disk space on baikal:
welcome, cyberian ^(;,;)^
your machine awaits
localhost:~# sudo lsblk -D
NAME DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
sr0 0 0B 0B 0
vda 512 512B 2G 0
localhost:~# sudo fstrim -v /
/: 15.1 GiB (16185487360 bytes) trimmed
^ Please do this if you are able to!
You might also be able to enable an fstrim service or
timer which will run fstrim to clean up and optimize
your disk periodically.
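On most systemd-based distros that's just:
sudo systemctl enable --now fstrim.timer
If your capsul doesn't run systemd, a cron job that runs
fstrim periodically works too.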
However, some of the older vms used the pc-q35 family of
QEMU machine types, and while I was able to get one of
ours to upgrade to pc-i440fx-5.2, discard support still
did not show up in the guest OS. We're not sure what's
happening there yet.
We also improved capsul's monitoring features; we began
work on proper infrastructure-as-code-style diffing
functionality, so we get notified if any key aspects of
your capsuls are out of whack. In the past this had been
an issue, with DHCP leases expiring during maintenance
downtimes and capsuls stealing each other's assigned IP
addresses when we turned everything back on.
Also, capsul-flask now includes an admin panel with
1-click-fix actions built in, leveraging this data:
https://git.cyberia.club/cyberia/capsul-flask/src/commit/b013f9c9758f2cc062f1ecefc4d7deef3aa484f2/capsulflask/admin.py#L36-L202
https://picopublish.sequentialread.com/files/admin-panel.jpg
I acknowledge that this is a bit of a silly system,
but it's an artifact of how we do what we do. Capsul
is always changing and evolving, and the web app was
built on the idea of simply "providing a button for"
any manual action that would have to be taken,
either by a user or by an admin.
At one point, back when capsul was called "cvm",
_everything_ was done by hand over email and the
commandline, so of course anything that reduced the
amount of manual administration work was welcome,
and we are still working on that today.
When we build new UIs and prototype features, we learn
more about how our system works, we expand what's
possible for capsul, and we come up with new ways to
organize data and intelligently direct the venerable
virtualization software our service is built on.
I think that's what the "agile development" buzzword from
professional software development circles was supposed to
be about: freedom to experiment means better designs
because we get the opportunity to experience some of the
consequences before we fully commit to any specific
design. A touch of humility and flexibility goes a
long way in my opinion.
We do have a lot of ideas about how to continue
making capsul easier for everyone involved, things
like:
1. Metered billing w/ stripe, so you get a monthly bill
with auto-pay to your credit card, and you only pay
for the resources you use, similar to what service
providers like Backblaze do.
(Note: of course we would also allow you to
pre-pay with cryptocurrency if you wish)
2. Looking into rewrite options for some parts of the
system: perhaps driving QEMU from capsul-flask
directly instead of going through libvirt,
and perhaps rewriting the web application in golang
instead of sticking with flask.
3. JSON API designed to make it easier to manage capsuls
in code, scripts, or with an infrastructure-as-code
tool like Terraform.
4. IO throttling your vms:
As I mentioned before, the vms wear out the disks
fast. We had hoped that enabling discards would help
with this, but it appears that it hasn't done much
to decrease the growth rate of the smartmon wearout
indicator metric.
So, most likely we will have to enforce some form of
limit on the amount of disk writes your capsul can
perform while it's running day in and day out.
80-90% of capsul users will never see this limit,
but our heaviest writers will be required to either
change their software so it writes less, or pay more
money for service. In any case, we'll send you a
warning email long before we throttle your capsul's
disk. (See the rough sketch just below this list for
what that throttling might look like.)
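For what it's worth, libvirt already exposes per-device
throttles, so the mechanism might be as simple as
something like this. The domain name, target device,
and limit are all made up for illustration:
virsh blkdeviotune capsul-abc123 vda --write-bytes-sec 10485760 --live --config
Nothing is decided yet, though.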
And last but not least, Cyberia Computer Club Congress
voted to use a couple thousand of the capsulbux we've
received in payment to purchase a new server, allowing
us to expand the service ahead of demand and improve our
processes all the way from hardware up.
(No tape this time!)
https://picopublish.sequentialread.com/files/baikal2
Shown: Dell PowerEdge R640 1U server with two
10-core Xeon Silver 4114 processors and 256GB of RAM.
(Upgradable to 768GB!!)
CAN I HELP?
Yes! We are not the only ones working on capsul these
days. For example, another group, https://coopcloud.tech,
has forked capsul-flask and set up their own instance at
https://yolo.servers.coop
Their source code repository is here
(not sure this is the right one):
https://git.autonomic.zone/3wordchant/capsul-flask
Having more people setting up instances of capsul-flask
really helps us, whether folks are simply testing or
aiming to run it in production like we do.
Unfortunately we don't have a direct incentive to
work on making capsul-flask easier to set up until folks
ask us how to do it. Autonomic helped us a lot as they
made their way through our terrible documentation and
asked for better organization / clarification along the
way, leading to much more expansive and organized README
files.
They also gave a great shove in the right direction when
they decided to contribute most of a basic automated
testing implementation and the beginnings of a JSON API
at the same time. They are building a command line tool
called abra that can create capsuls upon the user's
request, as well as many other things like installing
applications. I think it's very neat :)
Also, just donating or using the service helps support
cyberia.club, both in terms of maintaining capsul.org and
reaching out and supporting our local community.
We accept donations via either a credit card (stripe)
or in Bitcoin, Litecoin, or Monero via our BTCPay server:
https://cyberia.club/donate
As always, you may contact us at:
mailto:support@cyberia.club
Or on matrix:
#services:cyberia.club
For information on what matrix chat is and how to use it,
see: https://cyberia.club/matrix
Forest 2021-12-17

View File

@ -33,7 +33,7 @@
<item>
<title>20/05/2020: Capsul rollin' onward with a Web Application</title>
<description>A necessary post</description>
<description>Forest describes his thought process when building the capsul.org web application</description>
<link>https://cyberia.club/blog/20200520-capsul-rollin-onward-with-a-web-application</link>
<guid isPermaLink="true">https://cyberia.club/blog/20200520-capsul-rollin-onward-with-a-web-application</guid>
<pubDate>Wed, 20 May 2020 00:00:00 +0000</pubDate>
@ -47,5 +47,13 @@
<pubDate>Fri, 27 Nov 2020 13:46:13 UTC</pubDate>
</item>
<item>
<title>17/12/2021: Capsul - Rumors Of My Demise Have Been Greatly Exaggerated</title>
<description>Forest regales you with tales of maintenance, links to pictures and video, and hopes for the future of capsul</description>
<link>https://cyberia.club/blog/20211217-capsul-maintenance-updates</link>
<guid isPermaLink="true">https://cyberia.club/blog/20211217-capsul-maintenance-updates</guid>
<pubDate>Sat, 18 Dec 2021 00:00:00 +0000</pubDate>
</item>
</channel>
</rss>