Agents Need Job Descriptions, Not Just Instructions

Three active agents, one watcher script, and a longer list of decisions they were not allowed to make

A builder's journal — Part 2

Part 2 — The design question

Which decisions deserved their own owner

Part 1 ended with a curriculum too large for one prompt. That created the real design problem of the project: not how many agents to orchestrate, but which decisions deserved their own owner.

CrewAI, LangGraph, AutoGen — those solve orchestration: how to move information between agents reliably. Useful problem. Not the first one. The first one was preventing curriculum design, product design, and technical tradeoffs from collapsing into one generic assistant.

So only three agents were activated for the build phase, each a Markdown file in /agents/ with a role, a philosophy, a defined scope, and a list of things it defers to others:

Mira, Curriculum Director: Lesson design, learning science, tip and quiz systems, AI evaluation criteria per lesson

Kai, Product Designer: Every screen, every tap, every emotional moment in the mobile experience

Noa, Technical Architect: Stack selection, AI pipeline, cost modeling, scalability
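To make the "job description" idea concrete, here is a minimal sketch of how one of those agent files could be modeled and rendered. The AgentSpec class, the section headings, the philosophy line, and the specific deferrals are all illustrative assumptions; the real files in /agents/ may be laid out differently.

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """One agent's job description: what it owns and what it hands off."""
    name: str
    role: str
    philosophy: str
    scope: list[str]
    # topic -> name of the agent that owns it instead (hypothetical entries)
    defers: dict[str, str] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the spec as the text of a Markdown agent file."""
        lines = [f"# {self.name}, {self.role}", "", "## Philosophy", self.philosophy]
        lines += ["", "## Scope"]
        lines += [f"- {item}" for item in self.scope]
        lines += ["", "## Defers"]
        lines += [f"- {topic}: ask {owner}" for topic, owner in self.defers.items()]
        return "\n".join(lines)


# Scope comes from the list above; philosophy and deferrals are placeholders.
MIRA = AgentSpec(
    name="Mira",
    role="Curriculum Director",
    philosophy="(placeholder) One skill per lesson.",
    scope=["Lesson design", "Learning science", "Tip and quiz systems"],
    defers={"Screen layout": "Kai", "Stack selection": "Noa"},
)

if __name__ == "__main__":
    print(MIRA.to_markdown())
```

The deferral list is the part that does the work: it is what tells an agent which decisions are not its own.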

Two Python scripts handle coordination. squint.py invokes a single agent, loads its context files, calls the API, and saves output to /outputs/. collaborate.py runs pipelines where each agent's output feeds the next automatically. The point was not to create a swarm. It was to create clean handoffs. A new lesson flows like this:

Mira designs → Celeste checks AI feasibility → Kai designs the screen → Sam logs the decision
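The handoff above can be sketched in a few lines. This is not the actual collaborate.py, just an illustration of the idea under stated assumptions: run_pipeline, the step instructions, and the call_agent callable are hypothetical names, and the model call is replaced by a stand-in so the example runs offline.

```python
from typing import Callable, Iterable

# Hypothetical signature for whatever invokes one agent:
# (agent_name, task_text) -> output_text
AgentCall = Callable[[str, str], str]


def run_pipeline(
    steps: Iterable[tuple[str, str]],
    brief: str,
    call_agent: AgentCall,
) -> list[tuple[str, str]]:
    """Run agents in order; each step receives the previous step's output."""
    history: list[tuple[str, str]] = []
    previous = brief
    for agent, instruction in steps:
        task = f"{instruction}\n\n--- Input from previous step ---\n{previous}"
        output = call_agent(agent, task)
        history.append((agent, output))
        previous = output  # the handoff: this output becomes the next input
    return history


def fake_agent(agent: str, task: str) -> str:
    """Stand-in for a real model call; echoes the last line it was given."""
    return f"{agent} handled: {task.splitlines()[-1]}"


# Mirrors the lesson flow described above; the instructions are placeholders.
LESSON_PIPELINE = [
    ("mira", "Design the lesson."),
    ("celeste", "Check AI feasibility."),
    ("kai", "Design the screen."),
    ("sam", "Log the decision."),
]

if __name__ == "__main__":
    for agent, output in run_pipeline(LESSON_PIPELINE, "Lesson: shading a sphere", fake_agent):
        print(f"{agent}: {output}")
```

Passing call_agent in as a parameter keeps the ordering logic testable without an API key.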

How memory worked across sessions

Context files were the memory. Every agent loaded a curated set of project documents before responding. The curriculum, the team brief, and the decision log moved between sessions and tools. The agents were stateless. The documents were the state.
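A minimal sketch of "the documents were the state", assuming each context file is plain text on disk. The function names, the labeled-section format, and the system/user split are assumptions for illustration, not the project's actual loader.

```python
from pathlib import Path


def load_context(paths: list[Path]) -> str:
    """Concatenate context files into one block, each labeled by file name."""
    sections = []
    for path in paths:
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {path.name}\n{text}")
    return "\n\n".join(sections)


def build_request(agent_file: Path, context_paths: list[Path], task: str) -> dict:
    """Assemble everything one stateless call needs.

    Nothing is remembered between calls: the agent file supplies the role,
    and the context files supply whatever earlier sessions decided.
    """
    return {
        "system": agent_file.read_text(encoding="utf-8").strip(),
        "user": f"{load_context(context_paths)}\n\n## Task\n{task}",
    }
```

Because the request is rebuilt from files every time, editing the decision log changes what every agent knows on its next call, in any tool that can read the same files.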

The missing agent: Dev

The missing role mattered more than any additional reviewer. If Mira designs a lesson, Celeste reviews its evaluation criteria, and Kai specifies its screen, why can't a fourth agent named Dev build the Angular component? Because a plain messages API call, the kind squint.py makes, is text in, text out. Writing files and running commands requires tool access and something to execute those tools. Claude Code already had both, so we used that instead.

The result was an architectural inconsistency: the planning agents lived in one system, while the coding agent lived outside it. The missing Dev agent became the clearest gap in the architecture because it sat exactly where the handoff from design to running software should have been.

Activation scheduling

Not all agents run from day one. Elena, the legal agent, does not need to wake up before there are users to protect. Felix, the strategy agent, does not need to opine before Riya has retention data. Activation is a product decision, not a completionist exercise. Turning on an agent before its domain matters mostly produces confident-sounding advice with no grounding.
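One way to make that scheduling explicit is to tie each agent to the milestone that justifies it. The sketch below is an assumption about how this could be encoded: the milestone names and the Activation structure are hypothetical, and only the agents named so far are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activation:
    agent: str
    requires: str  # hypothetical milestone that must be reached first


ACTIVATION = [
    Activation("mira", "build_started"),
    Activation("kai", "build_started"),
    Activation("noa", "build_started"),
    Activation("elena", "has_users"),           # legal: nothing to protect before users exist
    Activation("felix", "has_retention_data"),  # strategy: waits for Riya's retention data
]


def active_agents(reached: set[str]) -> list[str]:
    """Return the agents whose milestone has been reached, in declared order."""
    return [entry.agent for entry in ACTIVATION if entry.requires in reached]


if __name__ == "__main__":
    print(active_agents({"build_started"}))
```

Keeping the gate in data rather than in prompts means activating an agent is a one-line, reviewable change.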

The coordination

The watcher

A watcher script monitors /outputs/ and triggers review pipelines automatically when new files appear. When Quinn finishes the video mapping, Mira reviews it for curriculum fit without any manual handoff. When Kai finishes the screen flow, Celeste reviews the feedback display. When both reviews complete, Sam synthesises a summary. Run one command, walk away, and return to six agents having reviewed each other's work.
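A polling version of that watcher fits in a short script. This is a sketch under assumptions, not the project's watcher: the routing table, the "author-topic.md" file naming, and the review callback are hypothetical, and Sam's final synthesis step is left out for brevity.

```python
import time
from pathlib import Path
from typing import Callable

# Hypothetical routing table: author of a new file -> agent that reviews it.
REVIEWERS = {"quinn": "mira", "kai": "celeste"}


def scan_once(
    outputs: Path,
    seen: set[str],
    review: Callable[[str, Path], None],
) -> list[tuple[str, str]]:
    """Check the outputs directory once; trigger a review for each new file."""
    triggered = []
    for path in sorted(outputs.glob("*.md")):
        if path.name in seen:
            continue
        seen.add(path.name)
        author = path.name.split("-", 1)[0]  # assumed naming: <author>-<topic>.md
        reviewer = REVIEWERS.get(author)
        if reviewer is not None:  # files from unrouted authors are only recorded
            review(reviewer, path)
            triggered.append((reviewer, path.name))
    return triggered


def watch(outputs: Path, review: Callable[[str, Path], None], interval: float = 2.0) -> None:
    """Poll forever, reviewing files as they appear."""
    seen: set[str] = set()
    while True:
        scan_once(outputs, seen, review)
        time.sleep(interval)
```

Because reviewers have no entry of their own in the routing table, a review saved back into the same directory does not trigger another review, which keeps the loop from feeding itself.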

Individual agents are useful. Coordinated agents are something qualitatively different.

The limits

What they cannot do

Noa produced theory when asked to write code. Not a failure of the agent — a failure of task assignment. An agent asked outside its scope drifts toward its strengths. Tighter scope, not a smarter prompt, is the fix.

More fundamentally: agents cannot observe humans. No system prompt produces the insight of watching a first-time learner sit with the app for twenty minutes. They cannot make the final call. Felix can frame a pricing tradeoff precisely, but the decision belongs to whoever lives with the consequences.

A team of advisors with no decision-maker is just an expensive echo chamber. — Something Sam would say

And they drift when underconstrained. The quality of agent output is almost entirely a function of prompt specificity, and the same holds for the briefs given to human teams.

What comes next

The rest of the team can wait

The design accounted for more roles than the product needed immediately. Celeste for auditing whether feedback on real student work is actually correct. Riya for funnels and retention. Jordan for support signal. Felix for pricing once behavior exists. Elena for legal and disclosure before scale. They were defined early so they could be activated later, not because they all deserved airtime on day one.

The measure of the whole system is still simpler than the org chart: does a complete beginner draw a better sphere after one round of feedback than before? Build the loop first. Prove it on one person. Then earn the right to the rest of the team.

Part Three covers what building actually looked like: the Claude Code sessions, the Ollama layer that kept iteration cheap, and the deployment work that finally got the app off a laptop and onto a phone. The architecture reads cleanly. The sessions did not.


The agent system — squint.py, collaborate.py, context file architecture, and all agent system prompts — is in the project repository. The curriculum (squint-complete-curriculum-v3.md) is the primary reference. The handoff document (squint-conversation-handoff.md) holds the full decision log.

Stack: Angular 21 PWA — Node.js + Express — SQLite / PostgreSQL — Claude Sonnet API — Capacitor (planned). Deployment: Vercel (frontend) + Railway (backend).