Cloud Break Glass Process
In an emergency, we may need to bypass the usual operational process to restore customer instances in a timely manner.
What events qualify? For example:
- A GCP zonal outage that requires a manual failover
- A customer instance is corrupted or deleted, requiring disaster recovery
Follow our incident playbook when handling these events.
How to gain escalated permissions
Use Entitle to obtain escalated permissions as usual. If that doesn't work, use the /break-glass slash command in Slack. If Slack is down, page Security Support or the Security EM.
Break the instance out of the control plane
When performing disruptive manual changes, first extract the instance from control plane management.
If cloud.sourcegraph.com/control-plane-mode=true is in config.yaml, follow the "Extract instance from control plane (break glass)" section of the instance's Ops Dashboard (go/cloud-ops).
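To check whether this applies, you can search the instance's config.yaml for the flag. The helper below is a minimal sketch, not an existing tool: the name control_plane_managed is hypothetical, and it assumes the flag appears literally as either cloud.sourcegraph.com/control-plane-mode=true or the YAML form cloud.sourcegraph.com/control-plane-mode: "true". The config.yaml path in the usage comment is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an existing tool): exits 0 if the given
# config.yaml marks the instance as managed by the control plane.
# ASSUMPTION: the flag appears either as
#   cloud.sourcegraph.com/control-plane-mode=true
# or in YAML key/value form, optionally double-quoted:
#   cloud.sourcegraph.com/control-plane-mode: "true"
# This is a plain text match, not a YAML parse.
control_plane_managed() {
  local config="$1"
  # -E: extended regex; -q: print nothing, only the exit status matters.
  grep -Eq 'cloud\.sourcegraph\.com/control-plane-mode"?[[:space:]]*[:=][[:space:]]*"?true' "$config"
}

# Example usage (the path is an assumption):
#   control_plane_managed "environments/$ENVIRONMENT/deployments/$INSTANCE_ID/config.yaml" \
#     && echo "Extract the instance from the control plane first."
```

A text match keeps the check dependency-free; if a YAML parser such as yq is available in your environment, querying the exact key would be stricter.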
How to apply Terraform
We use Terraform Cloud (TFC) for Terraform state storage and remote execution, using the VCS-driven workflow.
During day-to-day operations, human operators should not run Terraform manually. During an incident, however, this is permitted.
Update config.yaml with the following:
apiVersion: sourcegraph.com/v1
kind: SourcegraphCloud
metadata:
  name: <>
spec:
  debug:
    tfcRunsMode: cli
Then run the following command to regenerate the Terraform modules:
mi2 generate -e $ENVIRONMENT --domain $DOMAIN --slug $SLUG
This switches the TFC workspaces to the CLI-driven workflow, which allows a human operator to run Terraform locally:
cd environments/$ENVIRONMENT/deployments/$INSTANCE_ID/terraform/stacks/tfc
terraform init && terraform apply
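The steps in this section can be chained into one guarded script. This is a minimal sketch, not an existing tool: the function names tfc_cli_mode_enabled and break_glass_apply are hypothetical, and it assumes you run it from the repository root, that config.yaml sits at environments/$ENVIRONMENT/deployments/$INSTANCE_ID/config.yaml, and that ENVIRONMENT, DOMAIN, SLUG, and INSTANCE_ID are set. It refuses to run mi2 or Terraform unless config.yaml already sets tfcRunsMode: cli.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the break-glass Terraform steps (not an existing tool).
# ASSUMPTIONS: run from the repository root; config.yaml lives at
# environments/$ENVIRONMENT/deployments/$INSTANCE_ID/config.yaml.

# Exits 0 if the file contains a "tfcRunsMode: cli" line
# (plain text match, not a YAML parse).
tfc_cli_mode_enabled() {
  grep -Eq '^[[:space:]]*tfcRunsMode:[[:space:]]*"?cli"?[[:space:]]*$' "$1"
}

break_glass_apply() {
  local var
  # Fail fast if any required variable is unset or empty (bash indirect expansion).
  for var in ENVIRONMENT DOMAIN SLUG INSTANCE_ID; do
    if [ -z "${!var:-}" ]; then
      echo "missing required variable: $var" >&2
      return 1
    fi
  done

  local deploy_dir="environments/$ENVIRONMENT/deployments/$INSTANCE_ID"

  # Only proceed once config.yaml has been switched to CLI-driven runs.
  if ! tfc_cli_mode_enabled "$deploy_dir/config.yaml"; then
    echo "spec.debug.tfcRunsMode is not cli in $deploy_dir/config.yaml" >&2
    return 1
  fi

  # Regenerate the Terraform modules from the updated config.
  mi2 generate -e "$ENVIRONMENT" --domain "$DOMAIN" --slug "$SLUG" || return 1

  # Run Terraform in a subshell so the caller's working directory is unchanged.
  (
    cd "$deploy_dir/terraform/stacks/tfc" &&
      terraform init &&
      terraform apply
  )
}
```

The wrapper deliberately calls terraform apply without -auto-approve, so Terraform still shows the plan and waits for the operator to confirm before changing anything.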
Wrapping up
When you're done, if cloud.sourcegraph.com/control-plane-mode=true is in config.yaml, follow the "Backfill instance into control plane" section of the instance's Ops Dashboard (go/cloud-ops).