Systems Engineering
Luna: My Trusted, Autonomous AI Agent
Trust, autonomy, and guardrails for a supervised agent.
Intro
Luna is my autonomous AI agent that I built and supervise to handle technical work and manage my infrastructure.
She keeps systems running and fixes problems automatically when they appear.
She monitors logs, keeps notes, and recognizes and learns from long-term patterns.
This post outlines how she decides what to do, how I verify it, and how I prevent unauthorized and runaway behavior.
Trust
"Trusted" means Luna cannot execute actions unless the request carries cryptographic proof that it was authorized by me.
The result is simple: Luna cannot perform unauthorized actions, whether the attempt is malicious or accidental.
I built the trust mechanisms for two reasons: to block unauthorized control, and to prevent runaway behavior.
That includes preventing privilege escalation so existing credentials cannot be used to expand access, reach root, or lock me out of my own infrastructure.
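As a rough illustration of "cryptographic proof," here is a minimal sketch of request signing and verification. It uses a shared-secret HMAC for brevity; the function names, field names, and the shared key are all hypothetical, not Luna's actual protocol (which could equally use asymmetric signatures):

```python
import hashlib
import hmac
import json

# Hypothetical operator secret; in practice this would be a real secret
# (or replaced entirely by an asymmetric keypair).
OPERATOR_KEY = b"example-operator-secret"

def sign_request(action: dict, key: bytes = OPERATOR_KEY) -> str:
    """Operator-side: produce a MAC over a canonical encoding of the action."""
    payload = json.dumps(action, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_request(action: dict, signature: str, key: bytes = OPERATOR_KEY) -> bool:
    """Agent-side: refuse any action whose proof does not check out."""
    expected = sign_request(action, key)
    return hmac.compare_digest(expected, signature)

action = {"verb": "restart", "target": "web-01"}
sig = sign_request(action)
assert verify_request(action, sig)                              # authorized: passes
assert not verify_request({**action, "target": "db-01"}, sig)   # tampered: rejected
```

Because the signature covers the exact action requested, modifying any field after signing invalidates the proof, which is what blocks both accidental and injected commands.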
Autonomy
"Autonomous" means Luna can run unattended work loops without me driving each step.
She monitors signals, forms a plan, and carries out allowed actions, while keeping a record of what she did and why.
When an action crosses a higher-risk boundary, she switches from executing to requesting explicit authorization.
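One pass of that loop can be sketched as below. The names (`handle_signal`, the `LOW_RISK` set, the signal fields) are illustrative assumptions, not Luna's real API; the point is the branch between executing and escalating:

```python
# Assumed low-risk action classes; anything else crosses the boundary.
LOW_RISK = {"rotate_logs", "restart_service"}

def handle_signal(signal, execute, request_authorization, log):
    """One monitor -> plan -> act pass with an escalation boundary."""
    plan = {"action": signal["suggested_action"], "reason": signal["reason"]}
    if plan["action"] in LOW_RISK:
        # Allowed: act, and record what was done and why.
        log({"plan": plan, "mode": "autonomous", "result": execute(plan)})
    else:
        # Higher risk: stop executing and ask for explicit authorization.
        log({"plan": plan, "mode": "escalated"})
        request_authorization(plan)

records = []
handle_signal(
    {"suggested_action": "rotate_logs", "reason": "disk 90% full"},
    execute=lambda plan: "ok",
    request_authorization=lambda plan: records.append(("auth-request", plan)),
    log=records.append,
)
```

Every branch writes to the log before anything else happens, so the record of "what she did and why" exists even when an action is refused.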
Guardrails
- Default deny: Actions are not permitted unless explicitly authorized.
- Least privilege: Luna's permissions are limited to what she needs for the task at hand.
- Capability boundaries: Authorization is specific to an action class, environment, and time window.
- No privilege ladder: The system is designed so a low-privilege foothold cannot be used to obtain higher privileges.
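The first three guardrails compose naturally into a grant check. This is a minimal sketch under assumed names (`Capability`, `is_allowed`), not the production policy engine; it shows default deny, scoping to an action class and environment, and a time window:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class Capability:
    """One scoped grant: specific action class, environment, and expiry."""
    action_class: str   # e.g. "service-restart"
    environment: str    # e.g. "staging"
    not_after: float    # unix timestamp; expired grants are invalid

def is_allowed(grants, action_class, environment, now=None):
    """Default deny: permit only if a matching, unexpired grant exists."""
    now = time.time() if now is None else now
    return any(
        g.action_class == action_class
        and g.environment == environment
        and now < g.not_after
        for g in grants
    )

grant = Capability("service-restart", "staging", not_after=time.time() + 3600)
assert is_allowed([grant], "service-restart", "staging")        # in scope
assert not is_allowed([grant], "service-restart", "production") # wrong environment
assert not is_allowed([], "service-restart", "staging")         # default deny
```

Because each grant names exactly one action class and environment, holding one grant never implies another, which is the "no privilege ladder" property in miniature.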
Verification
Every meaningful step produces an audit trail: what triggered the action, what was authorized, what changed, and what evidence supports the result.
I can review what Luna did after the fact, and I can also require review before execution when the risk is higher.
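An audit record of that shape can be sketched as a hash-chained entry: each record carries the trigger, authorization, change, and evidence, plus the hash of the previous record so after-the-fact tampering is detectable. Field and function names here are illustrative assumptions:

```python
import hashlib
import json
import time

def audit_entry(trigger, authorization, change, evidence, prev_hash=""):
    """Build one tamper-evident audit record (illustrative schema)."""
    body = {
        "ts": time.time(),
        "trigger": trigger,            # what prompted the action
        "authorization": authorization, # what grant covered it
        "change": change,              # what actually changed
        "evidence": evidence,          # what supports the result
        "prev": prev_hash,             # hash of the previous record
    }
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

first = audit_entry("disk alert", "grant: log-rotate/staging", "rotated logs", "df output")
second = audit_entry("follow-up check", "grant: log-rotate/staging", "none", "disk at 40%",
                     prev_hash=first["hash"])
```

Reviewing after the fact means replaying the chain and recomputing each hash; a pre-execution review simply means the record for a high-risk step must exist and be approved before `change` is anything but a proposal.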
Preprints
- Operator Authentication (how trust and authorization are enforced)
- Phased Autonomy (how autonomy is bounded as risk changes)