Introduction

This book is a learning roadmap tailored to one specific codebase: the FastVRP route planning POC at /Users/leonlan/Dropbox/agents. It is written for a backend engineer who needs to take that POC to a real product.

What you have today

  • Backend: FastAPI + LangGraph ReAct agent (Claude Haiku 4.5, Gemini fallback), 12 tools that wrap a FastVRP solver client, in-memory session store.
  • Frontend: React 19 + Vite, Leaflet map, SSE streaming chat, no state library (prop drilling), no TypeScript.
  • Data: Scenarios held in a Python dict; instances are JSON files on disk.
  • Deploy: Docker on a single GCE VM, nginx basic auth, Let's Encrypt cert.

The hard parts already work: the agent loop, the VRP schema, SSE streaming, and the solve-and-compare UX. The gap to "product" is almost entirely persistence, multi-user, and polish. It is not algorithm or agent work.

How to use this book

Each chapter covers one learning priority. Chapters are ordered by how much they unblock the rest of the work. Every chapter has the same shape:

  1. Why this matters in this codebase.
  2. What to learn, concretely.
  3. Resources worth reading.
  4. A hands-on exercise against the repo.

If you only have a weekend, read the "Start here this week" chapter and skip everything else.

TypeScript and React

Budget: 2-3 weeks.

Why this matters

The frontend is the weakest layer in the current codebase. It is plain JSX with no types. You have already said design quality matters. Adding types catches most of the bugs you will hit as you add features, and it gives future hires a codebase they can navigate without reading every function.

What to learn

Topic | Time
Modern JavaScript: arrow functions, destructuring, spread, async/await, modules, array methods | 1-2 days
TypeScript handbook (first half): interfaces, types, generics, unions, narrowing | 2-3 days
React mental model: UI as a function of state, pure components, effects as sync boundaries | 2 days
Hooks you will actually use: useState, useEffect, useMemo, useCallback, useRef | 2 days
One throwaway React app (todo list or weather fetcher) to get reps | 1 day
openapi-typescript setup against the FastAPI /openapi.json | 0.5 day
Port leaf components in frontend/src/components/ from .jsx to .tsx | 3-4 days
Port App.jsx to App.tsx once the types are in place | 2 days

Resources

Exercise

Port frontend/src/App.jsx and the components in frontend/src/components/ to TypeScript. Start with leaf components (MapSettingsPanel, RouteDetail, KpiPanel) and work up to App.tsx. Run openapi-typescript against the FastAPI server's /openapi.json and import the generated types into useChat.ts so the SSE payloads are typed. When you are done, remove allowJs from tsconfig.json and make sure the build is clean.

TanStack Query and frontend state

Budget: 1 week.

Why this matters

Today App.jsx manually fetches sessions and scenarios with useEffect and local useState. The second the product has features like "save scenario, list my scenarios, compare across runs," ad-hoc fetch calls fall apart. You end up reimplementing caching, loading flags, refetch-on-focus, and optimistic updates by hand. TanStack Query does all of that and is the default choice for new React apps.

What to learn

Topic | Time
useQuery for reads, with a typed fetch client | 1 day
Query keys: how to design them so invalidation is surgical | 0.5 day
useMutation for writes, and when to invalidateQueries vs return new data | 1 day
Optimistic updates for mutations that should feel instant | 0.5 day
Defaults: stale time, retry, refetchOnWindowFocus, and when to override | 0.5 day
Devtools: installing and reading the TanStack Query devtools panel | 0.5 day
Port every useEffect-based fetch in App.jsx to useQuery or useMutation | 2-3 days

Resources

Exercise

Replace every useEffect(() => fetch(...)) in App.jsx and the hooks in frontend/src/ with useQuery. Replace every POST or PATCH call with useMutation. When a scenario is created or renamed, invalidate the scenario list query instead of manually reloading. Keep the SSE chat stream as is. TanStack Query is for request/response, not for streams.

A real map library

Budget: 1 week.

Why this matters

MapPanel.jsx uses Leaflet. Leaflet is fine for a few markers and polylines, but it struggles once you have thousands of stops or want custom styling beyond DOM elements. MapLibre GL JS renders vector tiles on the GPU, handles large GeoJSON sources cleanly, and gives you data-driven styling. For a route planning tool, this is the piece that most affects whether the product feels real or feels like a demo.

What to learn

Topic | Time
The MapLibre concept stack: style, sources, layers | 1 day
GeoJSON sources and circle, line, symbol layers | 1 day
Data-driven styling with expressions (color by vehicle, width by load) | 1 day
Clustering for dense point sets | 0.5 day
Fit-to-bounds and camera transitions for scenario switching | 0.5 day
react-map-gl as a thin React wrapper (optional but helpful) | 0.5 day
Rewrite MapPanel.jsx on top of MapLibre with GeoJSON sources | 2-3 days
deck.gl overlay basics (only if you actually need heavy viz) | defer

Resources

Exercise

Rewrite frontend/src/components/MapPanel.jsx on top of MapLibre GL JS. Represent tasks and depots as GeoJSON sources rather than per-marker React components. Draw selected routes as a line layer whose highlight is driven by feature-state, so highlighting a route does not rerender the whole map. Wire the existing polyline decoding in server.py into the GeoJSON features returned by the API.
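The last step of the exercise, turning decoded route points into GeoJSON on the server, could look roughly like the sketch below. It assumes the existing decoder in server.py yields (latitude, longitude) pairs, which is the usual convention but not confirmed here. The function names and the vehicle_id property are hypothetical.

```python
from typing import Iterable

# Assumption: the decoder returns (latitude, longitude) pairs.
LatLon = tuple[float, float]


def route_feature(route_id: int, vehicle_id: str, points: Iterable[LatLon]) -> dict:
    """Build a GeoJSON LineString feature for one decoded route."""
    return {
        "type": "Feature",
        # Top-level id: MapLibre feature-state is keyed on it.
        "id": route_id,
        "geometry": {
            "type": "LineString",
            # GeoJSON positions are [longitude, latitude], so swap each pair.
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        # Properties are what data-driven style expressions read.
        "properties": {"vehicle_id": vehicle_id},
    }


def routes_collection(routes: list[dict]) -> dict:
    """Wrap per-route features in a FeatureCollection usable as a map source."""
    return {
        "type": "FeatureCollection",
        "features": [
            route_feature(i, r["vehicle_id"], r["points"])
            for i, r in enumerate(routes)
        ],
    }
```

The coordinate swap is the detail most worth getting right. Leaflet takes latitude first and GeoJSON takes longitude first, so routes that were fine in the old map can land in the wrong hemisphere after the port.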

Persistence and auth

Budget: 1-2 weeks.

Why this matters

SessionStore in models.py is an in-memory Python dict. Restart the server and everything a user built is gone. There are no user accounts, so every visitor shares the same nginx basic-auth credential. This is the single largest gap between the current POC and anything you can charge for.

This is also the chapter where your backend instincts carry you the furthest. You already know SQL, you already know how to model domains. What is new is the Python async plus SQLAlchemy 2.0 patterns, and wiring authentication into FastAPI.

What to learn

Topic | Time
Postgres refresher: transactions, indexes, JSONB for VRP payloads | 1 day
SQLAlchemy 2.0 async sessions, or SQLModel if you prefer Pydantic | 2-3 days
Alembic migrations: autogenerate, edit, apply, roll back | 1 day
Pick and set up a managed auth provider (Clerk, Supabase Auth, WorkOS) | 1-2 days
JWT verification as a FastAPI dependency | 0.5 day
Repository layer that scopes every query by user_id | 1-2 days
Design the schema (users, sessions, scenarios, solutions) | 0.5 day
Replace SessionStore and write the isolation integration test | 2-3 days

Resources

Exercise

Design and implement the schema for users, sessions, scenarios, and solutions. Keep the VRP request and solution payloads as JSONB. Replace SessionStore with a repository that reads and writes Postgres. Add a FastAPI dependency that resolves a JWT to a user_id and injects it into every request. Write one integration test that proves a user cannot read another user's scenario even if they guess the ID.

Schema sketch

create table users (
  id uuid primary key,
  email text unique not null,
  created_at timestamptz not null default now()
);

create table sessions (
  id uuid primary key,
  user_id uuid not null references users(id),
  name text not null,
  created_at timestamptz not null default now()
);

create table scenarios (
  id uuid primary key,
  session_id uuid not null references sessions(id),
  name text not null,
  request jsonb not null,
  original_request jsonb not null,
  created_at timestamptz not null default now()
);

create table solutions (
  id uuid primary key,
  scenario_id uuid not null references scenarios(id),
  status text not null,
  payload jsonb,
  created_at timestamptz not null default now()
);

create index on sessions (user_id);
create index on scenarios (session_id);
create index on solutions (scenario_id);
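
The isolation test from the exercise can be sketched against a cut-down version of this schema. To stay self-contained, the sketch uses the standard library's sqlite3 in place of Postgres and stores the uuid and jsonb columns as text. ScenarioRepo and seed are hypothetical names, not code from the repo. The point is the shape of the query: ownership is enforced by the join to sessions, not by a check after the fetch.

```python
import sqlite3
import uuid

# Cut-down schema. SQLite has no uuid or jsonb types, so both are text here.
SCHEMA = """
create table users (id text primary key, email text unique not null);
create table sessions (
  id text primary key,
  user_id text not null references users(id),
  name text not null
);
create table scenarios (
  id text primary key,
  session_id text not null references sessions(id),
  name text not null,
  request text not null
);
"""


class ScenarioRepo:
    """Hypothetical repository: every read is scoped to one user."""

    def __init__(self, conn, user_id):
        self.conn = conn
        self.user_id = user_id

    def get(self, scenario_id):
        # A guessed ID that belongs to someone else matches no row.
        return self.conn.execute(
            """
            select sc.id, sc.name, sc.request
            from scenarios sc
            join sessions se on se.id = sc.session_id
            where sc.id = ? and se.user_id = ?
            """,
            (scenario_id, self.user_id),
        ).fetchone()


def seed(conn, email):
    """Insert one user with one session and one scenario; return their ids."""
    user_id, session_id, scenario_id = (str(uuid.uuid4()) for _ in range(3))
    conn.execute("insert into users values (?, ?)", (user_id, email))
    conn.execute(
        "insert into sessions values (?, ?, ?)", (session_id, user_id, "default")
    )
    conn.execute(
        "insert into scenarios values (?, ?, ?, ?)",
        (scenario_id, session_id, "baseline", "{}"),
    )
    return user_id, scenario_id


def test_user_cannot_read_other_users_scenario():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    alice, alice_scenario = seed(conn, "alice@example.com")
    bob, _ = seed(conn, "bob@example.com")

    assert ScenarioRepo(conn, alice).get(alice_scenario) is not None
    # Bob knows the exact ID and still gets nothing back.
    assert ScenarioRepo(conn, bob).get(alice_scenario) is None
```

Passing user_id into the repository constructor, rather than into each method, makes it hard to write an unscoped query by accident. In the real app the FastAPI dependency that resolves the JWT would be the only place a repository gets built.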

Background jobs

Budget: 3-5 days.

Why this matters

Solves can take minutes. Today solve_vrp blocks the request. That is fine for a demo with one user, but it holds a worker the whole time, it breaks on server restarts, and it has no retry path when FastVRP transiently fails. A queue fixes all three.

What to learn

Topic | Time
Queue concepts: producers, consumers, visibility timeouts, dead letters | 0.5 day
Pick one of Celery, RQ, or arq and read its quickstart (arq fits this async codebase best) | 0.5 day
Redis as the broker: install, run, connect | 0.5 day
Progress reporting pattern: worker writes to Postgres, SSE handler reads | 1 day
Move solve_vrp from tools.py into an arq job | 1-2 days
Retry policy with exponential backoff for FastVRP timeouts | 0.5 day
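
The progress reporting pattern listed above (worker writes to Postgres, SSE handler reads) can be sketched as a polling async generator. This is a minimal sketch under assumptions: read_status is a hypothetical async callable that loads the job's row, the status values and the solve_progress event name are made up for illustration, and a real handler would also need to deal with client disconnects.

```python
import asyncio
import json


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


async def stream_job_progress(read_status, job_id, poll_seconds=1.0):
    """Yield an SSE frame each time the stored job status changes.

    `read_status` is a hypothetical async callable returning the row the
    worker writes, for example {"status": "running", "progress": 0.4}.
    """
    last = None
    while True:
        row = await read_status(job_id)
        if row != last:
            # Only emit when something changed, so idle polls stay silent.
            yield sse_event("solve_progress", {"job_id": job_id, **row})
            last = row
        if row["status"] in ("done", "failed"):
            return
        await asyncio.sleep(poll_seconds)
```

Polling the database is the simplest version that works. If the polling load ever matters, the same generator shape can sit on top of a push mechanism instead.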

Resources

  • arq docs: async-native task queue, closest fit to this codebase.
  • Celery docs: heavier, battle-tested alternative.
  • RQ docs: simple sync worker alternative.
  • FastAPI background tasks: read this to understand why they are not a substitute for a real queue.
  • Redis docs: you only need the basics to use it as a broker.

Exercise

Move the body of solve_vrp in tools.py into an arq job. The tool invocation should enqueue the job and return a job ID. Store status and result on the solutions table added in the previous chapter. Update the SSE chat handler so when the agent is waiting on a solve, it streams progress updates as they land in Postgres. Add a retry policy that retries FastVRP timeouts up to three times with exponential backoff.
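
The retry policy at the end of the exercise could be sketched as below, independent of any queue library. SolverTimeout is a hypothetical exception standing in for however the FastVRP client signals a timeout, and the base delay and cap are placeholder numbers. Queue libraries usually ship their own retry mechanism, so treat this as an illustration of the policy, not a replacement for what arq provides.

```python
import asyncio
import random


class SolverTimeout(Exception):
    """Hypothetical error raised when the FastVRP call times out."""


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Delay before retry number `attempt` (1-based): base * 2^(attempt-1), capped."""
    return min(cap, base * 2 ** (attempt - 1))


async def solve_with_retry(solve, request, max_retries=3, sleep=asyncio.sleep):
    """Call `solve(request)`, retrying timeouts up to `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return await solve(request)
        except SolverTimeout:
            if attempt == max_retries:
                raise  # out of retries: let the caller mark the job failed
            # Jitter spreads out retries from jobs that failed together.
            delay = backoff_delay(attempt + 1) * random.uniform(0.5, 1.0)
            await sleep(delay)
```

Only timeouts are retried. A validation error from the solver will fail the same way every time, so retrying it just delays the error message.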

LLM agent engineering

Budget: ongoing.

Why this matters

You already use LangGraph in agent.py. The loop works. What is missing is the set of production concerns that separate a demo agent from one that is cheap, debuggable, and stable under prompt edits.

What to learn

Topic | Time
Prompt caching: add cache_control to the system prompt in agent.py | 2 hours
Measure token cost before and after caching on a representative session | 1 hour
Streaming structured output so the UI can render partial tool progress | 1 day
Write 10 scripted eval cases at tests/agent_evals.jsonl | 1 day
Build a pytest-based eval harness that runs the agent against each case | 1-2 days
Wire LangSmith tracing behind an env flag | 0.5 day
Ongoing: review traces weekly, add eval cases when you find regressions | ongoing

Resources

  • Anthropic prompt caching: the official guide, the only thing you need for step 1 of the exercise.
  • Anthropic tool use: if you are reworking the tool layer.
  • LangGraph docs: concepts and recipes.
  • LangSmith: hosted tracing and eval platform.
  • promptfoo: lighter-weight eval framework.
  • The claude-api skill in this environment. Invoke it when you touch agent.py or system_prompt.md. It enforces caching and model-version hygiene.

Exercise

  1. Turn on prompt caching for the system prompt in agent.py. Measure token cost on a representative session before and after.
  2. Add a JSONL eval file at tests/agent_evals.jsonl with 10 scripted user turns and the expected tool calls or substrings in the final response. Write a pytest that runs the agent against each entry and fails if the expected tool was not called.
  3. Wire LangSmith tracing behind an env flag so you can turn it on in staging without leaking traces from production by default.
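
The eval harness in step 2 could start from something like the sketch below. The JSONL field names (input, expected_tools, expected_substrings) are an assumption about how you might lay out the file, and run_agent is a hypothetical adapter around the LangGraph agent that returns the names of the tools it called plus the final response text.

```python
import json

# Two example lines of the kind tests/agent_evals.jsonl might hold.
# The field names are illustrative, not an established format.
EXAMPLE_JSONL = """\
{"input": "Solve the current scenario", "expected_tools": ["solve_vrp"]}
{"input": "How many vehicles are there?", "expected_tools": [], "expected_substrings": ["vehicle"]}
"""


def load_cases(text):
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_case(case, tool_calls, final_text):
    """Return a list of failure messages; an empty list means the case passed."""
    failures = []
    for tool in case.get("expected_tools", []):
        if tool not in tool_calls:
            failures.append(f"expected tool {tool!r} was not called")
    for needle in case.get("expected_substrings", []):
        if needle.lower() not in final_text.lower():
            failures.append(f"final response is missing {needle!r}")
    return failures


def run_evals(cases, run_agent):
    """Run every case through `run_agent` and collect failures per input.

    `run_agent` is a hypothetical adapter that takes one user turn and
    returns (names_of_tools_called, final_response_text).
    """
    report = {}
    for case in cases:
        tool_calls, final_text = run_agent(case["input"])
        failures = check_case(case, tool_calls, final_text)
        if failures:
            report[case["input"]] = failures
    return report
```

Under pytest, you would typically parametrize one test over load_cases(...) and assert that check_case returns an empty list, so each eval case shows up as its own pass or fail line.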

Design fundamentals

Budget: weekends, parallel to everything.

Why this matters

Your product is a spatial data visualization with a chat sidebar. Design quality is most of what users will feel. You do not need to become a designer. You need enough taste and vocabulary to recognize when something is wrong and fix it.

What to learn

Topic | Time
Visual hierarchy, spacing, and color (most "looks off" bugs are these three) | 1 weekend
Typography: font choice, weight scale, line-height, line length | 1 afternoon
Cartographic principles: color vs size vs shape, avoiding chartjunk | 1 weekend
The CRAP acronym: contrast, repetition, alignment, proximity | 2 hours
Refactoring UI (Wathan and Schoger), cover to cover | 1 weekend
Practical Typography (Butterick), online | 1 afternoon, reread yearly
The Visual Display of Quantitative Information (Tufte) | 1-2 weekends
The Design of Everyday Things (Norman) | 2-3 weekends
Apply it: one-hour design pass on MapSettingsPanel.jsx and RouteDetail.jsx | 2 hours

Resources

Exercise

Read Refactoring UI cover to cover in one weekend. Then do a one-hour pass on MapSettingsPanel.jsx and RouteDetail.jsx applying what you learned. Fix spacing, align controls on a grid, reduce the number of distinct shades of gray, and pick one font weight scale and stick to it. Do not refactor the logic. Only touch styles. Commit before and after so you can see the delta.

Observability and deploy

Budget: 1 week, before the first real users arrive.

Why this matters

Your current deploy is one VM plus nginx. That is fine today. The day a user reports "my solve failed and I don't know why," you will wish you had structured logs, error tracking, and a CI pipeline that builds known good images. Do this before users arrive, not after.

What to learn

Topic | Time
structlog in Python: JSON in prod, pretty in dev | 0.5-1 day
Pick and wire a managed log destination (Axiom, Better Stack, Grafana Cloud) | 0.5 day
Sentry for the FastAPI backend | 0.5 day
Sentry for the React frontend, sourcemaps included | 0.5 day
GitHub Actions: one workflow for pytest + frontend build + Docker image | 1-2 days
Push Docker images to Artifact Registry on merge to main | 0.5 day
Evaluate Cloud Run or Fly.io as the next deploy target (defer the migration) | 0.5 day

Resources

Exercise

Replace server.log and ad-hoc print statements in server.py with structlog. Configure it to emit JSON in production and pretty human-readable output in development. Add Sentry to both the backend and the frontend. Write one GitHub Actions workflow that runs tests, builds the Docker image, and pushes it to Artifact Registry on a push to main. Leave the VM deploy as is for now. The goal is to know when something is broken, not to rearchitect the deploy.

What to skip

Just as important as what to learn is what to ignore. A backend engineer new to frontend will get pulled in every direction by blog posts and hype. These are the things that will come up, and that you should not touch for this product, at this stage.

Next.js, server components, SSR

You are building an authenticated internal tool and an authenticated product for paying users. You do not need SEO on the solver, and you do not need SSR. A plain Vite SPA talking to FastAPI is simpler, ships faster, and is easy to migrate later if marketing needs demand it.

Redux, Zustand, and other global state managers

TanStack Query covers 90% of what you would reach for them for. The remaining 10% is usually a few useStates in App.tsx. If you find yourself genuinely needing a global store for cross-cutting UI state, reach for Zustand at that point. Not before.

Component libraries beyond shadcn/ui plus Tailwind

shadcn/ui gives you accessible primitives you own and can modify. Do not install Material UI, Chakra, Ant Design, Mantine, or any other framework on top. They fight Tailwind and they fight your designer instincts.

Microservices

One FastAPI app plus one worker process is the right shape for a long time. You do not need a service mesh, you do not need gRPC between your own services, and you do not need to split the agent into its own service. Split only when a single service is genuinely in the way.

Kubernetes

Cloud Run or Fly.io will serve you well past your first hundred paying users. Kubernetes is worth learning the day you have a dedicated SRE who wants to run it. Not before.

Testing frameworks and Storybook, too early

Both are good tools. Both are premature if the UI is still changing every day. Add Vitest the day you have a regression you want to catch. Add Storybook the day you have a component library someone else needs to use.

Start here this week

If time is short, do these three things first. They are the highest leverage per hour in the whole book.

1. Turn on Anthropic prompt caching

Effort: two hours. Impact: immediate and ongoing cost reduction, plus lower latency.

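In agent.py, add cache_control to the system prompt so it hits Anthropic's prompt cache. system_prompt.md is 101 lines and ships on every turn, so without caching you pay the full input price for it each time. Cache reads are billed at a fraction of the normal input-token price, which adds up quickly over a long session. Anthropic only caches prefixes above a model-specific minimum length, so check the cache_read_input_tokens field in the response usage to confirm the cache is actually being hit. Invoke the claude-api skill when you do this, so the diff also handles model version hygiene.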
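
As a rough sketch of what the change amounts to: the system prompt becomes a content block carrying a cache_control marker, and the usage numbers on each response tell you whether the cache was hit. How the block is handed to the model depends on how agent.py builds its messages, which is not shown here. The helper names are hypothetical, and the cost multipliers are assumptions to check against Anthropic's current pricing page.

```python
def cached_system_blocks(system_prompt: str) -> list[dict]:
    """System prompt as a single content block marked cacheable."""
    return [
        {
            "type": "text",
            "text": system_prompt,
            # The prefix up to and including this block is what gets cached.
            "cache_control": {"type": "ephemeral"},
        }
    ]


def input_cost_units(
    usage: dict, write_mult: float = 1.25, read_mult: float = 0.1
) -> float:
    """Input cost of one call, in units of one uncached input token.

    `usage` mirrors the usage object on an Anthropic response. The two
    multipliers are assumptions; verify them before trusting the totals.
    """
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0) * write_mult
        + usage.get("cache_read_input_tokens", 0) * read_mult
    )
```

Summing input_cost_units over every call in a representative session, once with caching off and once with it on, gives you the before and after comparison. If cache_read_input_tokens stays at zero on the second turn, the prefix is probably changing between calls or is too short to be cached.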

2. Add TypeScript to the frontend

Effort: one weekend. Impact: catches most of the bugs you will hit in the next three months.

Add a tsconfig.json, rename the leaf components in frontend/src/components/ from .jsx to .tsx, and run openapi-typescript against the FastAPI server's /openapi.json to generate a typed client. Import those types into useChat.js and App.jsx as they get ported. You do not have to port everything in one pass. Leaving a few files as .jsx is fine.

3. Read Refactoring UI and do one design pass

Effort: one weekend. Impact: biggest visible-quality bump for the effort.

Read Refactoring UI cover to cover. Then spend one focused hour on MapSettingsPanel.jsx and RouteDetail.jsx applying what you just learned. Fix spacing, align on a grid, cut the number of gray shades, and pick a type scale. Only touch styles, do not refactor logic.

What comes next

Once these three are done, open the "TypeScript and React" chapter and work through the chapters in order. "Persistence and auth" is the chapter that unblocks paying customers. Everything before it is quality-of-life. Everything after it is scale and polish.