What Building With AI Assistants Actually Looks Like

Eight sessions, two evenings, one message that said “give one option please”

A builder's journal — Part 3

The transition

The browser can think. It cannot build.

The agent system from Parts 1 and 2 existed as documents and outputs. Mira had designed the curriculum. Kai had designed the screens. Every decision landed in /outputs/ as a markdown file. Acting on any of it meant opening a different tool, making changes, running a command, and hoping the documents and the code stayed consistent.

Claude Code is the same model in a terminal with filesystem access. Same prompts. It could actually write the files. What followed was not a clean pipeline.

Windows

Windows cp1252 caused the same crash four times across three files

The first error arrived within minutes of the watcher script running. A Unicode character in an agent's output, whether an arrow or an emoji in a print statement, hit Windows' default cp1252 encoding and the whole process crashed.

Session 1 — first crash

UnicodeEncodeError: 'charmap' codec can't encode character '\u2705' in position 341: character maps to <undefined>

The fix was straightforward: force UTF-8 at the top of the script. What was less straightforward was that the same error appeared again in a different file, then again in a third. Each fix was correct but local. The root cause was systemic: Windows defaulting to cp1252 in terminal sessions while the Anthropic API returned Unicode throughout. It took four separate incidents across three files before the permanent fix landed: UTF-8 reconfiguration at the top of every Python entry point, encoding="utf-8" on every write_text call, no exceptions.
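The permanent fix can be sketched as two small helpers. This is an illustration, not the project's actual code: it assumes Python 3.7 or later (where sys.stdout.reconfigure exists), and the names force_utf8_stdio, save_output, and the outputs/status.md path are hypothetical.

```python
import sys
from pathlib import Path


def force_utf8_stdio() -> None:
    """Switch stdout and stderr to UTF-8 so emoji in print() cannot hit cp1252."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() exists on io.TextIOWrapper (Python 3.7+); some hosts swap in
        # stream objects without it, so check first.
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8")


def save_output(path: Path, text: str) -> None:
    """Write agent output as UTF-8 regardless of the system's locale encoding."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Without encoding=, write_text falls back to the locale encoding (often cp1252 on Windows).
    path.write_text(text, encoding="utf-8")


def main() -> None:
    force_utf8_stdio()  # first thing in every entry point
    print("Watcher started \u2705")
    save_output(Path("outputs") / "status.md", "Ready \u2705\n")


if __name__ == "__main__":
    main()
```

Setting the environment variable PYTHONUTF8=1 turns on Python's UTF-8 mode and covers both cases process-wide. The explicit calls still help when a script is launched from somewhere that does not set it.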

A .bat file was created to load the API key from .env automatically before any Python invocation. That fixed a different piece of Windows friction: a variable set in one terminal session did not carry over to the next, so every time a new terminal opened, the key was gone. The batch file made loading it automatic.
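The .bat file itself is not reproduced here. The same idea can live inside Python, which keeps it working from any terminal. A minimal sketch, assuming simple KEY=VALUE lines; load_env_file is a hypothetical name, and the python-dotenv package's load_dotenv does this more thoroughly.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: Path, override: bool = False) -> Dict[str, str]:
    """Read KEY=VALUE lines from a .env file into os.environ.

    Deliberately minimal: no variable expansion, no multi-line values, no 'export'.
    """
    parsed: Dict[str, str] = {}
    if not path.is_file():
        return parsed
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if not key:
            continue  # "=value" has no name to set
        # Drop one pair of matching surrounding quotes: KEY="abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed


if __name__ == "__main__":
    loaded = load_env_file(Path(".env"))
    # Print names only, never values: keys should not end up in terminal scrollback.
    print("Loaded:", ", ".join(sorted(loaded)))
```

Because values are stripped, a trailing space pasted after a key never reaches an HTTP header. Whether silently stripping beats failing loudly is a judgment call.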

The invisible character

One invisible character. One hour lost. Thirty seconds to fix.

Session 2. The relevant line was at the bottom of a long traceback:

Session 2 — httpx traceback, final line

httpcore.LocalProtocolError: Illegal header value b'sk-ant-api03-...ZJQAA '

The space at the end of the key, one invisible character appended when the key was copied into the .env file, was enough to make every API call fail with a protocol error buried four levels deep in the httpx transport layer. The error message gave no indication of what was wrong. The traceback pointed at the HTTP library, not at the configuration file. Finding it required reading the raw header value and noticing the trailing whitespace.

Open .env, delete the space, save. Thirty seconds. The error pointed everywhere except at the actual problem.
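An error that names the real problem would have saved the hour. A sketch of a startup check, assuming the key lives in ANTHROPIC_API_KEY (the variable the Anthropic Python SDK reads by default; the project's own name may differ). require_api_key and ConfigError are hypothetical names, and the check deliberately never prints the key.

```python
import os
import re


class ConfigError(RuntimeError):
    """Raised when a required setting is missing or malformed."""


def require_api_key(name: str = "ANTHROPIC_API_KEY") -> str:
    """Fail at startup with a message that points at .env, not at the HTTP layer."""
    value = os.environ.get(name, "")
    if not value:
        raise ConfigError(f"{name} is not set. Check .env.")
    if re.search(r"\s", value):
        # Report position and length only; never echo the key itself.
        where = "trailing" if value != value.rstrip() else "embedded or leading"
        raise ConfigError(f"{name} contains {where} whitespace (length {len(value)}). Check .env.")
    return value


if __name__ == "__main__":
    require_api_key()
    print("API key format looks OK")
```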

The version problem

Confident, fast, and four major versions behind

When asked to scaffold the MVP, Claude Code chose Angular 17. Not because of any constraint or deliberate reasoning, but because it was the version current in its training data. It wrote 28 files confidently, ran a syntax check, confirmed everything was clean, and delivered a scaffold several major versions behind the current release.

The session that followed is worth quoting directly:

"Why Angular 17? Honest answer: it was the stable version I defaulted to, not a carefully considered choice. Angular 17 introduced standalone components as the default, which keeps the code simpler. But if you're starting fresh today, Angular 19 is current and there's no strong reason to pin to 17." — Claude Code, when asked

Even that answer was stale: at the time of the session, the latest version was 21.2.x. Claude Code acknowledged this when told, looked it up, and updated the package versions. The scaffold it had written was still compatible, because the standalone component API had not changed significantly, but the exchange illustrated something worth naming: Claude Code is fast, thorough, and confident. It is also operating from a training cutoff. Any dependency, framework version, or API that moved after that cutoff is likely to be handled incorrectly until you catch it.
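One way to catch this is to compare what a scaffold pinned against a minimum a human has checked, for example with npm view @angular/core version. A sketch, assuming a standard package.json with caret or tilde ranges; stale_dependencies is a hypothetical helper, and the minimum has to come from outside the model, which is the whole point.

```python
import json
import re
from pathlib import Path
from typing import Dict, List, Optional


def major_version(spec: str) -> Optional[int]:
    """First integer in an npm range: '^17.3.0' -> 17, '~21.2.1' -> 21, 'latest' -> None."""
    match = re.search(r"\d+", spec)
    return int(match.group()) if match else None


def stale_dependencies(package_json: Path, minimums: Dict[str, int]) -> List[str]:
    """List dependencies whose pinned major version is below a human-checked minimum."""
    data = json.loads(package_json.read_text(encoding="utf-8"))
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    problems = []
    for name, minimum in minimums.items():
        spec = deps.get(name)
        major = major_version(spec) if spec else None
        if major is not None and major < minimum:
            problems.append(f"{name} {spec} is below major {minimum}")
    return problems


if __name__ == "__main__":
    # Minimums are filled in by a person after checking the registry, not by the model.
    for problem in stale_dependencies(Path("package.json"), {"@angular/core": 21}):
        print(problem)
```

It reads only the first number in each range, so tags like latest are skipped rather than guessed at.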

The cost question

Ollama free, Gemini untested, Claude in production

Calling a production AI API on every photo upload during development is expensive. Not catastrophically so, but enough that you hesitate before every test run, which is the wrong state of mind when you are trying to iterate quickly on an evaluation prompt.

The solution was Ollama: an open-source tool that runs language models locally. During development, the backend pointed at a local Ollama instance. The API calls were free, the latency was manageable, and the prompt iteration loop was frictionless. The switch to a real API, Claude or Gemini, happened only when the prompt was stable enough to warrant it.

The switching layer is a single environment variable:

AI_PROVIDER=ollama   # local dev - free, fast iteration
AI_PROVIDER=gemini   # testing - evaluating response quality
AI_PROVIDER=claude   # production - best results, higher cost
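How the backend reads that variable is not shown in the sessions. A sketch of the selection step under stated assumptions: the Provider registry and select_provider are hypothetical, and a real version would construct an SDK client per entry instead of returning a plain record.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    is_local: bool  # local calls cost nothing per request


# Hypothetical registry; a real backend would build an SDK client per entry.
PROVIDERS: Dict[str, Provider] = {
    "ollama": Provider("ollama", is_local=True),
    "gemini": Provider("gemini", is_local=False),
    "claude": Provider("claude", is_local=False),
}


def select_provider(env: Optional[Mapping[str, str]] = None) -> Provider:
    """Pick the AI backend from AI_PROVIDER, defaulting to the free local option."""
    env = os.environ if env is None else env
    raw = env.get("AI_PROVIDER", "ollama")
    key = raw.strip().lower()  # tolerate 'Claude ' with a stray space
    if key not in PROVIDERS:
        raise ValueError(f"AI_PROVIDER={raw!r} is not one of {sorted(PROVIDERS)}")
    return PROVIDERS[key]


if __name__ == "__main__":
    provider = select_provider()
    print(f"Using {provider.name} ({'local' if provider.is_local else 'paid API'})")
```

Defaulting to ollama means a missing variable falls back to the free path rather than the paid one; a deployed backend should set AI_PROVIDER explicitly.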

The Gemini evaluation did not go smoothly. One model was deprecated mid-test; the next preview model returned a 404. The specific names mattered less than the pattern: model catalogs move faster than training data. Anything time-sensitive still needed a human check.
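Model churn can at least fail loudly. A sketch of trying a human-maintained list of model names in order, assuming the provider wrapper raises a dedicated exception on a 404. ModelNotFound, call_with_fallback, and the model names in the usage example are hypothetical; the real names still need the human check.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class ModelNotFound(Exception):
    """Hypothetical: raised by a provider wrapper when the API answers 404 for a model name."""


def call_with_fallback(models: Sequence[str], call: Callable[[str], T]) -> Tuple[str, T]:
    """Try each model name in order; return (model_used, result) from the first that exists."""
    failures: List[str] = []
    for model in models:
        try:
            return model, call(model)
        except ModelNotFound as exc:
            failures.append(f"{model}: {exc}")  # keep going, but remember why
    raise RuntimeError("No configured model is available:\n" + "\n".join(failures))


if __name__ == "__main__":
    def fake_call(model: str) -> str:
        # Stand-in for a real API call; pretend the first name was retired.
        if model == "retired-preview":
            raise ModelNotFound("404")
        return f"feedback from {model}"

    used, answer = call_with_fallback(["retired-preview", "current-model"], fake_call)
    print(used, "->", answer)
```

Returning the model that actually answered matters for an evaluation: results should be attributed to the model that produced them, not the one that was requested.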

The comparison is still running. The real question is not which model is strongest in the abstract, but which one gives a beginner feedback that is useful and accurate. Ollama kept that evaluation loop cheap while the answer stayed unsettled.

The gap between planning and code

Agents read documents. They do not read your codebase.

The app had been called Squint for weeks. The context documents said Squint. The agent system prompts said Squint. The curriculum said Squint. Late in the build sessions, a search of the source files turned up BrushCoach throughout, in component titles, in service class names, in comments, and in the master system prompt passed to the vision API.

One command found them all. Another replaced them. Four files updated, done in seconds. But the episode illustrated something the agent design had not accounted for: agents read documents. They do not read source files. The gap between what the planning layer knows and what the codebase contains is real, and it grows every time a decision is made in chat and not immediately reflected in the code.
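The exact commands are not recorded. A cross-platform equivalent in Python might look like this: a dry run that counts occurrences first, then an explicit apply. rename_everywhere, the skipped directories, and the file suffixes are all assumptions.

```python
from pathlib import Path
from typing import Dict, Iterator

SKIP_DIRS = {".git", "node_modules", "venv", ".venv", "dist", "__pycache__"}
SUFFIXES = {".py", ".ts", ".html", ".scss", ".json", ".md"}


def iter_source_files(root: Path) -> Iterator[Path]:
    """Yield project files with a known suffix, skipping dependency and build folders."""
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in SUFFIXES:
            if not SKIP_DIRS.intersection(path.relative_to(root).parts):
                yield path


def rename_everywhere(root: Path, old: str, new: str, apply: bool = False) -> Dict[Path, int]:
    """Count literal occurrences of `old`; rewrite them to `new` only when apply=True."""
    hits: Dict[Path, int] = {}
    for path in iter_source_files(root):
        text = path.read_text(encoding="utf-8")
        count = text.count(old)
        if count:
            hits[path] = count
            if apply:
                path.write_text(text.replace(old, new), encoding="utf-8")
    return hits


if __name__ == "__main__":
    for file, count in rename_everywhere(Path("."), "BrushCoach", "Squint").items():
        print(f"{count:3d}  {file}")
```

The match is case-sensitive and literal, so kebab-case selectors or file names such as brush-coach would need their own pass.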

The frustration

We are wasting time just getting started

Session 1, after yet another environment error:

"can we have this sorted we are wasting time just getting started" — Session 1, after the fourth incident

This is the honest version of building with AI assistants. The architecture sessions in the browser had been clean: a question asked, an answer given, a decision made. The build sessions were not. The errors were fixable, but they were mostly uninteresting friction: encoding, environment state, naming drift, deployment setup. AI did not remove that friction. It shortened the time spent inside it.

Claude Code was useful when the problem was local: find the root cause, patch the files, catch the missing import, update the affected call sites. It was weaker at noticing patterns across sessions or auditing what already existed outside the current thread. The local build took roughly two evenings instead of four or five. That is real leverage. It is not a clean pipeline.

Context drift

Claude Code built a second backend nobody asked for

Session 3 asked Claude Code to implement the full Phase 1 curriculum and wire up the backend. It did both. It also created a brand new Node.js Express backend at /backend/ on port 3000 — while a Python FastAPI backend already existed at /mvp/backend/ on port 8000, fully functional, with its own bat files, its own venv, and its own routes.

Nobody asked for it. Claude Code did not check whether one already existed.

The video seed data went into the Node backend nobody was querying. The video player showed “coming soon” for every lesson. Two sessions to untangle. Both backends still coexist. The Python one serves the app. The Node one sits there.

Context drift. The model knows what it built in this session. It does not audit what was already there.
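Part of the remedy is handing the model an inventory before it starts. A sketch, assuming backends can be recognized by a fastapi entry in requirements.txt or pyproject.toml, or an express dependency in package.json; find_backends is hypothetical. Its output could be pasted into the session prompt or kept in a project notes file the assistant reads at startup, such as Claude Code's CLAUDE.md.

```python
import json
from pathlib import Path
from typing import List

SKIP_DIRS = {".git", "node_modules", "venv", ".venv", "dist", "__pycache__"}


def find_backends(root: Path) -> List[str]:
    """Describe every backend declared in the repo, ignoring installed dependencies."""
    found: List[str] = []
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        if not path.is_file() or SKIP_DIRS.intersection(rel_parts):
            continue
        rel = path.relative_to(root).as_posix()
        if path.name in ("requirements.txt", "pyproject.toml"):
            if "fastapi" in path.read_text(encoding="utf-8").lower():
                found.append(f"FastAPI backend declared in {rel}")
        elif path.name == "package.json":
            data = json.loads(path.read_text(encoding="utf-8"))
            deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
            if "express" in deps:
                found.append(f"Express backend declared in {rel}")
    return found


if __name__ == "__main__":
    backends = find_backends(Path("."))
    print("\n".join(backends) if backends else "No backend found")
```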

Security

The API key was exposed twice in eight sessions

The API key was exposed twice. First: the trailing-space traceback printed it in full, and the traceback was pasted into the chat. Second: .env.example contained a real key and was committed and pushed to GitHub. Claude Code caught it during a file check and flagged it.

Same pattern both times: the key left .env and landed somewhere nobody was checking for secrets, first a chat transcript, then a tracked file that was not gitignored. The gap between what a file is supposed to contain and what it actually contains.

A pre-commit hook scanning for key patterns would have caught the committed key, though not the one pasted into chat. It does not exist in this project yet.
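A minimal sketch of such a hook in Python, saved as .git/hooks/pre-commit and made executable. The key pattern is an assumption based on the sk-ant- prefix visible in the traceback, and the script only checks files being added, copied, or modified. Dedicated scanners such as gitleaks or detect-secrets cover far more key formats. On Windows, Git runs hooks through its bundled shell, so the shebang may need to name python instead of python3.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: block commits whose staged content looks like an API key."""
import re
import subprocess
import sys
from typing import List

# Assumed pattern: the sk-ant- prefix followed by a long run of key characters.
# Placeholders like "sk-ant-..." in .env.example do not match.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}")


def find_secrets(text: str) -> List[str]:
    return KEY_PATTERN.findall(text)


def git(*args: str) -> str:
    # Explicit UTF-8: text=True alone would decode with cp1252 on many Windows setups.
    result = subprocess.run(["git", *args], capture_output=True,
                            encoding="utf-8", errors="replace", check=True)
    return result.stdout


def main() -> int:
    staged = [p for p in git("diff", "--cached", "--name-only", "--diff-filter=ACM").splitlines() if p]
    # ":path" reads the staged (index) version, which is what will actually be committed.
    flagged = [p for p in staged if find_secrets(git("show", f":{p}"))]
    if flagged:
        # Name the files only; printing the match would leak the key a second time.
        print("Possible API key in:", ", ".join(flagged), file=sys.stderr)
        print("Move it to .env (gitignored), then re-stage.", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```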

Working with AI assistants

AI assistants hedge. Senior engineers commit.

After several rounds of Option A / Option B / Option C, the message was:

“give one option please” — Session 6, mobile testing

A senior engineer who is not sure picks one option, tries it, adjusts. An AI assistant presents all options and defers. That looks thorough. It transfers the cognitive load back to the person who already needed help.

What comes next

Five users, no shared WiFi, no localhost

The sessions described here produced a running local build. The next problem was getting it somewhere five people who are not on the same WiFi could reach it. That means a real deployment — a backend that stays running when the laptop closes, a frontend that loads without knowing the laptop’s IP address, and API keys that live in an environment the code cannot accidentally commit.

Part 4 covers that: Railway’s free tier paused the week it was needed, the venv in the first commit, the PORT variable uvicorn did not pick up, and the moment the API finally returned {"status":"ok"} from a URL that was not localhost.

The design work in Parts 1 and 2 did not ship anything. The sessions did. Slower and messier than the architecture implies, faster and more capable than building alone. That is where AI-assisted development is right now: not a clean pipeline, not a replacement for knowing what you are doing, but a real compression of the distance between a decision and a running build.

Previous: Part 2 | Continue with Part 4.

Claude Code session history exported from VS Code: eight sessions across nine days, April 3–11, 2026. Stack: Angular 21 PWA — FastAPI backend — Ollama (local) / Claude Sonnet / Gemini 2.5 Flash (evaluation) — Vercel (frontend) — Railway (backend). The agent system, curriculum, and all session exports are in the project repository.