magicciv/scripts/cloud-bringup.sh
Natalie 273a7c71f8
feat(infra): auto-cull orphaned packer build droplets to prevent zombies
Packer destroys its build droplet on a clean finish, but a killed/slept/
network-dropped run leaves the s-8vcpu-16gb-amd builder alive (~$192/mo).
This happened once already (.project/handoffs/20260629_packer-cross-account-leak.md).

One reaper script, wired into two defense layers:
- scripts/cull-orphan-builders.sh reaps leftover builders by name prefix
  (mc-packer-* / legacy packer-*) with a size guard and an optional age guard;
  pins the MC token via --access-token.
- cloud-bringup.sh calls it in its EXIT trap, so a failed/Ctrl-C'd build reaps
  its own builder.
- infra/launchd/com.uvlava.mc.cull-builders.plist sweeps every 30m with
  --min-age-min 90 to catch SIGKILL/power-loss cases no trap can.

golden-image.pkr.hcl names the builder mc-packer-<ts> for deterministic matching.
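
The three guards described above (name prefix, size, minimum age) can be sketched as a plain filter. This is an illustration only, not the contents of scripts/cull-orphan-builders.sh: the function name select_orphans, the four-column input format, and the idea that the size guard compares against the builder slug are all assumptions. The real script would obtain its inventory from the DigitalOcean API and then delete the matches, which is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the real cull-orphan-builders.sh.
# Reads lines of "<id> <name> <size-slug> <age-in-minutes>" (assumed format)
# on stdin and prints the IDs that pass all three guards.
select_orphans() {
  local want_size="$1" min_age_min="$2"
  local id name size age
  while read -r id name size age; do
    case "$name" in
      mc-packer-*|packer-*) ;;                  # name-prefix guard (current + legacy)
      *) continue ;;
    esac
    [ "$size" = "$want_size" ] || continue      # size guard
    [ "$age" -ge "$min_age_min" ] || continue   # age guard (0 disables it)
    printf '%s\n' "$id"
  done
}

# Usage with made-up inventory rows; only droplet 101 passes every guard.
printf '%s\n' \
  '101 mc-packer-1719720000 s-8vcpu-16gb-amd 120' \
  '102 mc-packer-1719726000 s-8vcpu-16gb-amd 10' \
  '103 mc-worker-1 s-8vcpu-16gb-amd 300' \
  '104 packer-old s-1vcpu-1gb 500' |
  select_orphans s-8vcpu-16gb-amd 90
```

Keeping the selection separate from the delete call makes the guards easy to dry-run: a too-young builder (102) is left for the next sweep, and a same-sized worker (103) is never touched because its name does not match.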

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 00:05:59 -04:00

62 lines
3.2 KiB
Bash

#!/usr/bin/env bash
# One-shot DigitalOcean bring-up + smoke. Run it yourself so the cloud build and
# repo clone happen under your authority (the agent can't auto-clone private
# source onto cloud boxes; you can). It:
# 1. builds the golden image from the forge,
# 2. spins 1 worker, runs the test suite (timed) + a render proof,
# 3. tears the worker down and culls orphaned Packer builders (EXIT trap, even on failure).
#
# Launch and walk away:
# nohup bash scripts/cloud-bringup.sh > ~/cloud-bringup.log 2>&1 &
# # ...sleep... then on waking: less ~/cloud-bringup.log ; open ~/Desktop/mc-do-proof.png
#
# Reads all secrets from ~/.vault/ — nothing sensitive is hardcoded here.
set -uo pipefail
REPO="$HOME/Code/@projects/@magic-civilization"
cd "$REPO" || exit 1
# --- auth (from vault) ---
export DIGITALOCEAN_TOKEN; DIGITALOCEAN_TOKEN="$(cat ~/.vault/do_pat_mc)"
export TF_VAR_do_token="$DIGITALOCEAN_TOKEN"
# shellcheck disable=SC1090
. ~/.vault/mc_forge_creds # FORGE_IP ADMIN_USER ADMIN_PASS ...
GITR="http://${ADMIN_USER}:${ADMIN_PASS}@${FORGE_IP}:3000/mcadmin/magicciv.git"
export TF_VAR_git_remote="$GITR" # workers pull latest from the forge
export PKR_VAR_git_remote="$GITR" # packer reads the creds from env, not argv
PKR_VAR_fleet_pubkey="$(cat ~/.ssh/id_mc_fleet.pub)"; export PKR_VAR_fleet_pubkey # baked into worker authorized_keys
# fleet reuses the pre-registered DO key 'mc-fleet' (var ssh_key_name default); just load its private half
ssh-add ~/.ssh/id_mc_fleet 2>/dev/null || true # so the dispatch ssh (mc@worker) authenticates
echo "########## $(date) — DO cloud bring-up starting ##########"
_teardown() {
  echo "########## teardown: ./run dist:down ##########"
  ./run dist:down 2>&1 | tail -3 || true
  # Reap any Packer build droplet left alive by a failed/interrupted build. Packer
  # tears its builder down on a clean finish; this catches the cases it can't.
  echo "########## teardown: cull orphaned packer builders ##########"
  bash scripts/cull-orphan-builders.sh 2>&1 | tail -5 || true
  echo "forge left UP for inspection — './run forge:down' to park it (~\$0.30/mo idle)."
}
trap _teardown EXIT
echo "=== [1/4] packer build golden image (~20-40 min) ==="
( cd infra/packer && packer init golden-image.pkr.hcl >/dev/null && \
  packer build golden-image.pkr.hcl ) \
  || { echo "!!! PACKER BUILD FAILED — see above. Stopping."; exit 1; }
echo "=== [2/4] dist:up 1 worker (s-8vcpu-16gb-amd — beefy, from golden snapshot) ==="
./run dist:up 1 s-8vcpu-16gb-amd || { echo "!!! dist:up FAILED"; exit 1; }
echo " waiting 75s for worker cloud-init (key + git pull) to settle ..."
sleep 75
echo "=== [3/4] dist:test on the worker (TIMED — the DX-win proof) ==="
time ./run dist:test || echo " (dist:test returned nonzero — see output above)"
echo "=== [4/4] dist:render proof scene -> ~/Desktop/mc-do-proof.png ==="
./run dist:render res://engine/scenes/tests/city_proof.tscn "$HOME/Desktop/mc-do-proof.png" 240 \
|| echo " (render returned nonzero — try another scene from src/game/engine/scenes/tests/*_proof.tscn)"
echo "########## $(date) — bring-up done. Worker will be torn down on exit. ##########"
echo "Review: this log + ~/Desktop/mc-do-proof.png"
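
Step 2 of the script waits a fixed 75 seconds for cloud-init to settle. One possible alternative is a bounded poll that returns as soon as the worker is ready and gives up after a deadline. The sketch below is a suggestion, not part of the script: wait_until and probe are hypothetical names, and the stand-in probe only counts calls. A real probe (for example an ssh check against the worker) is left out because the worker's address and readiness signal are not shown in this file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds or a deadline passes.
# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1              # gave up: the command never succeeded in time
    fi
    sleep "$interval"
  done
}

# Stand-in probe that succeeds on its third call (runs in the current shell,
# so the counter is visible afterwards).
attempts=0
probe() { attempts=$(( attempts + 1 )); (( attempts >= 3 )); }

if wait_until 10 1 probe; then
  echo "ready after $attempts attempts"
else
  echo "timed out"
fi
```

Compared with a fixed sleep, this finishes early on a fast boot and fails loudly on a slow one instead of letting the next step run against a worker that is not ready.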