# Cyberia operations handbook
This project provides guidance for those wonderful operations members who are technically "on the hook" for all Cyberia-related production services, _especially_
- Capsul
- Matrix
- Forge
- Jitsi (Cafe)
- Mumble

Our current list of operations members is:
- fack
- forest
- j3s
- plantdaddy
- skh
- vvesley

## Ansible
For information on how to run Ansible, see the [README.md in the `ansible` folder](ansible/README.md).

## On-Call

Operations provides shared, around-the-clock support. We are all technically on call 24/7/365.

## Things to keep an eye on

All operations members must:
- have a Matrix account and join #ops:cyberia.club and #services:cyberia.club
- have a Forge account
- subscribe to the [ops mailing list](https://lists.cyberia.club/~cyberia/ops)
- subscribe to the [ops todo tracker](https://todo.cyberia.club/~cyberia/services)

### Graphs

Grafana graphs are currently available at [grafana.cyberia.club](https://grafana.cyberia.club).

### Alerts

Check alerts on [Prometheus](https://prometheus.cyberia.club/alerts).

This handbook contains the definitions for these alerts, and Prometheus pulls them down automatically whenever they are updated. _You are welcome to adjust, add, and remove alerts._

#### Informational alerts

```
labels:
  severity: info
```
Informational alerts are free to be ignored, but indicate curious happenings in our infrastructure. Operations members are not expected to look at or react to these.
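
For illustration, a complete rule carrying the informational label might look like the sketch below. It assumes Prometheus's standard alerting-rule file format and that node_exporter metrics are being scraped; the group name, alert name, 20% threshold, and 30-minute `for` duration are hypothetical and not taken from our actual rule files.

```yaml
# Hypothetical informational rule: names and threshold are illustrative only.
# Assumes node_exporter metrics (node_filesystem_*) are being scraped.
groups:
  - name: example-info
    rules:
      - alert: DiskSpaceLow
        # Fires when less than 20% of a non-tmpfs filesystem is free.
        expr: node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"} < 0.20
        # Only fire once the condition has held for 30 minutes.
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "Less than 20% free on {{ $labels.instance }}:{{ $labels.mountpoint }}"
```

A slowly filling disk suits `info`: it is a curious happening worth knowing about, but not yet an outage anyone has to react to.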

#### Critical alerts

```
labels:
  severity: critical
```
Critical alerts indicate that one or more of our services is failing.
When a **critical alert** fires, we should respond with a **sense of urgency**.
If a critical alert fires and there was no associated outage, it is our shared responsibility to _eliminate that alert_: all critical alerts must be actionable. If the problem behind a critical alert can be resolved by a cronjob, it should be resolved by a cronjob and the alert removed.
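
As a sketch of an actionable critical alert, the rule below (same assumed rule-file format; the `synapse` job name and 5-minute window are hypothetical) fires when Prometheus cannot scrape a service. Prometheus records the built-in `up` metric for every scrape target as 1 when the last scrape succeeded and 0 when it failed, so the alert maps directly onto "this service is unreachable", which someone can investigate and fix.

```yaml
# Hypothetical critical rule: job name and duration are illustrative only.
groups:
  - name: example-critical
    rules:
      - alert: ServiceDown
        # `up` is recorded by Prometheus for every scrape target:
        # 1 if the most recent scrape succeeded, 0 if it failed.
        expr: up{job="synapse"} == 0
        # Require 5 minutes of failed scrapes so a single blip does not fire it.
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} on {{ $labels.instance }} has been down for 5 minutes"
```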

## Incidents

Don't panic. If you are feeling overwhelmed, please contact another operations member about the issue at hand.

## Communication

We communicate in the #ops:cyberia.club Matrix room. If Matrix is down, we communicate via [cafe.cyberia.club/ops](https://cafe.cyberia.club/ops) instead.

Communicate your activity. If you are bouncing a machine or reloading a service, tell #ops so that people don't step on each other's toes. This also helps us establish a timeline.

## References

### [Capsul](docs/capsul.md)

* [Create a Capsul](howto/capsul-create.md)
* [Delete a Capsul](howto/capsul-delete.md)
* [Restore a Capsul](howto/capsul-restore.md)
* [List Capsuls](howto/capsul-list.md)
* [Console into a Capsul](howto/capsul-console.md)

### [Prometheus](docs/prometheus.md)

* [Create an alert](howto/prometheus-create-alert.md)

### [Matrix](docs/matrix.md)

* [Upgrade Synapse](howto/synapse-upgrade.md)
* [Upgrade Riot](howto/riot-upgrade.md)
* [Reset a user password](howto/matrix-reset-pass.md)
* [Invite a user to Matrix](howto/matrix-invite-user.md)

### [Email](docs/email.md)

* [Make a new user](howto/email-make-user.md)

### [Forge](docs/forge.md)

* [Upgrade Forge](howto/forge-upgrade.md)

### [Git](docs/git.md)

* [Make a repo](howto/git-make-repo.md)