Engagement timeline / week 2 merge log

The seven PRs a forward deployed engineer lands in week 2.

Most pages on the FDE role describe what the role is. None publish the merge log. This is the literal sequence of pull requests that land between calendar day 8 and calendar day 14 of an fde10x engagement, with branch names, files touched, line counts, reviewers, and what each one unblocks for the Monday morning decision meeting.

Matthew Diakonov, Written with AI

Published May 2, 2026 · 8 min read

Direct answer (verified 2026-05-02 against the engagement template)

Seven PRs, one per calendar day from day 8 to day 14, all branched under a pilot/ prefix.

PR #1, day 8: rubric.yaml + cases.yaml skeleton
PR #2, day 9: staging deploy + rollout flag held at 10 percent
PR #3, day 10: first agent prototype + 5 seed cases
PR #4, day 11: expand cases.yaml to 15+ from production traces
PR #5, day 12: pilot-gate.yml as the only required CI check
PR #6, day 13: ragas + per-axis grader wiring
PR #7, day 14: Monday snapshot job + refund webhook

Average net change per PR is about 450 lines. The day 14 cron writes .pilot-gate/latest.json, which is the input to the 30 minute decision meeting at 09:30 UTC.

The merge log on Monday morning

Here is what gh pr list prints at 09:00 UTC on calendar day 14, after the seven PRs have landed but before the rubric runs against them.

gh pr list (calendar day 14, 09:00 UTC)

The seven PRs in detail

One card per PR. Branch name, day it merges, files added, files edited, net line count, what it unblocks, and the reviewer pool. Read straight through if you are about to run an engagement; jump to the one that matches your current calendar day if you are mid-week 2.

PR #1 · calendar day 8 · +118 / -2

`pilot/rubric-skeleton`

rubric: <agent> v0 (skeleton, week 2 row of ratchet)

files added

rubric.yaml
eval/cases.yaml
eval/__init__.py

files edited

README.md

What this PR unblocks. Locks the five axes, the per-axis floors, and the week 2 row of the ratchet. Subsequent PRs are not allowed to redefine the rubric without an explicit follow-up PR, so reviewers cannot accidentally lower the bar mid-week.

reviewers: engagement-owner, client-lead
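A minimal sketch, in Python, of what "per-axis floors plus a week 2 row" could mean as a check. The five axis names come from PR #6's title; the floor values, the `RUBRIC_MIN_SCORE` figure, the use of a plain mean as the overall score, and the function name are all illustrative assumptions, not the contents of the real rubric.yaml.

```python
from statistics import mean

# Hypothetical numbers for illustration only; the real ones live in rubric.yaml.
FLOORS = {
    "faithfulness": 0.70,
    "helpfulness": 0.60,
    "completeness": 0.60,
    "tone": 0.50,
    "policy": 1.00,
}
RUBRIC_MIN_SCORE = 0.65  # assumed week 2 row of the ratchet


def clears_week_2(scores: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (passed, reasons).

    Every axis must meet its floor, and the mean across the five axes
    must meet the week 2 minimum. A missing axis counts as 0.0.
    """
    reasons = [
        f"{axis} {scores.get(axis, 0.0):.2f} < floor {floor:.2f}"
        for axis, floor in FLOORS.items()
        if scores.get(axis, 0.0) < floor
    ]
    overall = mean(scores.get(axis, 0.0) for axis in FLOORS)
    if overall < RUBRIC_MIN_SCORE:
        reasons.append(f"overall {overall:.2f} < week 2 minimum {RUBRIC_MIN_SCORE:.2f}")
    return (not reasons, reasons)
```

Keeping the floors and the minimum as data rather than logic is what makes "nobody lowers the bar mid-week" reviewable: any change to the bar shows up as a diff to those numbers.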

PR #2 · calendar day 9 · +204 / -6

`pilot/staging-deploy`

infra: staging deploy + flags.<agent>.rollout_percent (held at 10)

files added

infra/staging/deploy.tf
infra/staging/flag.tf

files edited

.github/CODEOWNERS
infra/README.md

What this PR unblocks. The prototype now has a place to run. The rollout flag is pinned to 10 percent and CODEOWNERS forbids edits to that file outside a PR titled rollout: ramp <agent>. The cancel-and-refund clause has a real surface to act on.

reviewers: client-platform, engagement-owner
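CODEOWNERS routes review by path and does not look at PR titles, so the title rule described above would need a small check of its own somewhere in CI. A hedged sketch of that check follows. It assumes the protected file is `infra/staging/flag.tf` from this PR's file list, and the function name is hypothetical.

```python
FLAG_FILE = "infra/staging/flag.tf"  # assumption: this is the file the rule protects
ALLOWED_TITLE_PREFIX = "rollout: ramp "


def rollout_edit_allowed(pr_title: str, changed_files: list[str]) -> bool:
    """A PR may touch the flag file only if its title starts with 'rollout: ramp '."""
    if FLAG_FILE not in changed_files:
        return True  # the rule does not apply to this PR
    return pr_title.startswith(ALLOWED_TITLE_PREFIX)
```

A CI step could call this with the PR title and the changed-file list and fail the job when it returns `False`.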

PR #3 · calendar day 10 · +412 / -8

`pilot/agent-v0`

agent: first <agent> prototype + 5 seed cases

files added

agents/<agent>/__init__.py
agents/<agent>/prompt.md
agents/<agent>/run.py

files edited

eval/cases.yaml

What this PR unblocks. The agent runs end to end against the staging flag. Five cases are written by the senior engineer to cover the happy path plus one stakes:high case. Reviewers see the first signal: which axes the v0 prompt is naturally good at and which are below the floor.

reviewers: engagement-owner, client-lead

PR #4 · calendar day 11 · +1,148 / -42

`pilot/cases-from-traces`

eval: expand cases.yaml from 5 to 17 (production traces, 4 stakes:high)

files added

eval/traces/2026-04-trace-batch-01.jsonl
eval/traces/2026-04-trace-batch-02.jsonl

files edited

eval/cases.yaml
eval/README.md

What this PR unblocks. The case set is now drawn from real traffic, not happy-path demo questions. 17 rows total, 4 tagged stakes:high. The day 14 decision meeting will read a score against this case set, not a curated subset. Without this PR the rubric would be measuring the demo, not the product.

reviewers: client-data-lead, engagement-owner
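One way to keep the case set honest is to summarize it in review. The sketch below assumes each row of cases.yaml has already been loaded into a dict with hypothetical `source` and `stakes` keys; the real schema may differ.

```python
from collections import Counter

MIN_ROWS = 15  # the "15+" target for this PR


def summarize_cases(cases: list[dict]) -> dict:
    """Count rows, stakes:high rows, and rows per source for a loaded case set."""
    by_source = Counter(case.get("source", "unknown") for case in cases)
    stakes_high = sum(1 for case in cases if case.get("stakes") == "high")
    return {
        "total": len(cases),
        "stakes_high": stakes_high,
        "by_source": dict(by_source),
        "meets_minimum": len(cases) >= MIN_ROWS,
    }
```

Pasting the summary into the PR description gives the client data lead the row counts without having to read the YAML by eye.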

PR #5 · calendar day 12 · +186 / -4

`pilot/gate-workflow`

ci: pilot-gate.yml as the only required check on agents/** and eval/**

files added

.github/workflows/pilot-gate.yml

files edited

.github/branch-protection.yml
rubric.yaml

What this PR unblocks. The rubric file is now executable. Every PR touching agents/ or eval/ runs the rubric and must clear the week 2 row of the ratchet to merge. The Monday 09:00 UTC cron is wired, but the refund webhook environment variable stays empty until PR #7 lands.

reviewers: client-platform, engagement-owner
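GitHub documents that a required check stays pending when its workflow is skipped by a path filter, so one common pattern is to run the workflow on every PR and decide inside the job whether the rubric applies. A minimal sketch of that decision follows. The function names are hypothetical, and the actual pilot-gate.yml may handle this differently.

```python
GATED_PREFIXES = ("agents/", "eval/")  # mirrors agents/** and eval/**


def gated_files(changed_files: list[str]) -> list[str]:
    """Return the changed paths that fall under the gated directories."""
    return [path for path in changed_files if path.startswith(GATED_PREFIXES)]


def gate_required(changed_files: list[str]) -> bool:
    """True when at least one changed path should trigger the rubric run."""
    return bool(gated_files(changed_files))
```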

PR #6 · calendar day 13 · +724 / -3

`pilot/grader-wiring`

eval: ragas + per-axis grader wiring (faithfulness, helpfulness, completeness, tone, policy)

files added

eval/graders/ragas.py
eval/graders/helpfulness.py
eval/graders/completeness.py
eval/graders/tone.py
eval/graders/policy.py
eval/rubric.py

files edited

pyproject.toml

What this PR unblocks. The five axes now have actual graders behind them. Faithfulness uses ragas; policy is a hard gate that fails the run on a single miss; the other three use case-specific rubrics. The grader contract (in, score, reasoning) is the same shape across all five, so the day 14 snapshot is a single uniform JSON.

reviewers: engagement-owner, client-lead
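The uniform grader contract can be sketched as one small record type plus an aggregation step. This is an illustration under stated assumptions: the field names, the 0.0 to 1.0 score range, and the use of a per-axis mean are hypothetical, and `case_input` stands in for "in" because `in` is a reserved word in Python.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GraderResult:
    axis: str        # e.g. "faithfulness", "tone", "policy"
    case_input: str  # the "in" part of the contract
    score: float     # assumed range: 0.0 to 1.0
    reasoning: str


def summarize(results: list[GraderResult]) -> dict:
    """Collapse per-case results into per-axis means, with policy as a hard gate."""
    by_axis: dict[str, list[float]] = defaultdict(list)
    for result in results:
        by_axis[result.axis].append(result.score)
    # Hard gate: a single policy result below 1.0 fails the whole run.
    policy_hard_fail = any(score < 1.0 for score in by_axis.get("policy", []))
    return {
        "axes": {axis: round(mean(scores), 3) for axis, scores in sorted(by_axis.items())},
        "policy_hard_fail": policy_hard_fail,
    }
```

Because every grader returns the same record, the aggregation never needs to know which grader produced a row, which is what lets the snapshot stay one flat JSON object.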

PR #7 · calendar day 14 · +288 / -14

`pilot/snapshot-and-refund`

ci: Monday snapshot job + refund webhook (calendar day 14 decision input)

files added

.pilot-gate/.gitkeep
eval/pr_comment.py
eval/open_refund_issue.py

files edited

.github/workflows/pilot-gate.yml
rubric.yaml
infra/staging/secrets.tf

What this PR unblocks. The Monday cron writes .pilot-gate/latest.json and, on a miss, fires the refund webhook before anyone is at their desk. The 30 minute meeting at 09:30 UTC reads that file plus the day 7 snapshot to look at the slope. This PR is what makes the cancel-and-refund clause an actual workflow instead of a sentence in the MSA.

reviewers: engagement-owner, client-lead, client-finance

What the day 14 snapshot file looks like

PR #7 lands the cron that writes this file. It is the single artifact the 09:30 UTC meeting reads. The shape is locked in PR #6 so the refund webhook in PR #7 can post it without a transformer step.

.pilot-gate/latest.json

The Monday morning, in terminal output

The cron run at 09:00 UTC, the rubric scoring, the threshold miss, the refund webhook, and the 09:30 decision meeting reading the slope. A real run of an engagement that ended up continuing through the ratchet to week 6.

pilot-gate · 2026-05-04 calendar day 14

What goes wrong vs what we ship

Six failure modes we see in scoping calls, and what the seven-PR shape looks like instead. Each row is a real pattern from a real engagement that landed on the wrong side of the table once.

Feature	What we see in scoping calls	The seven-PR shape
How many PRs land in week 2	One giant PR titled "prototype" that lands on Friday of week 2 with 6,000 lines, 80 files, and a single reviewer who skims it.	Seven small PRs, one per calendar day from day 8 to day 14. Each is reviewable in under thirty minutes. Average net change per PR is around 450 lines.
Branch naming	Branches named after the engineer: matt/scratch, sarah/wip, fde-poc-2. The day 14 review meeting cannot enumerate the work without scrolling through six months of unrelated branches.	All branches under a pilot/ prefix. branch-protection.yml restricts the prefix to the engagement owner and the senior engineer. The prefix is what the day 14 gh pr list query filters on.
When the rubric file lands	The rubric is added in week 4 "once we know what we are evaluating." By then the team is grading whatever the agent already does well. The rubric is descriptive, not prescriptive.	PR #1, calendar day 8. Before any agent code. The rubric is the contract; the agent is the attempt to satisfy it. Reviewing the rubric first forces the team to agree on what the prototype is for.
When the gate becomes executable	The gate is wired the morning of the meeting. The first run is at 09:00 UTC, the score is below threshold, the team blames the gate, the meeting becomes a debate about whether the harness is broken.	PR #5, calendar day 12. Two days before the day 14 decision meeting. The gate runs against three real PRs (#3, #4, #6) before Monday morning, so the snapshot at 09:00 UTC is not the first time anyone sees the score.
Where the reviewer pool comes from	The senior FDE merges their own PRs because the client team is busy. By day 14 the prototype has a single owner, no client engineer can debug it, and the leave behind in week 6 is a pile of unfamiliar code.	Every PR has at least one named client engineer in the reviewers list. CODEOWNERS routes agent files to the engagement owner plus the client lead, eval files to the client data lead, infra files to the client platform owner. Six different humans have touched the prototype by day 14.
What the day 14 snapshot is	A Notion page assembled the night before the meeting. Numbers from three different runs, no slope, no comparison to last week. The meeting becomes a story instead of a decision.	A single .pilot-gate/latest.json file written by PR #7's cron job at 09:00 UTC. The 09:30 UTC meeting reads it alongside the day 7 snapshot to look at the slope, not just the score. The file format is locked in PR #6.

Week 2 PR list checklist

Every week 2 PR is on a branch with prefix pilot/
PR #1 (rubric.yaml) merges no later than calendar day 8
PR #4 expands cases.yaml to at least 15 rows, sourced from production traces, by day 11
PR #5 makes pilot-gate.yml the only required check on agents/** and eval/**
PR #7 wires the Monday snapshot job and the refund webhook by day 14
Each PR has at least one named client engineer in the reviewers list, not just the engagement owner
The day 7 snapshot is retained so day 14 can compute the slope
Every PR description includes the row of the ratchet it was scored against
No PR title is decoration: every title is action + scope (rubric:, infra:, agent:, eval:, ci:)
If any PR touches the rollout flag, it is gated by CODEOWNERS and titled rollout: ramp <agent>
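Several items on this checklist are mechanical enough to lint. A minimal sketch, assuming reviewer handles follow the `client-*` naming used in the reviewer lists on this page; the function name and messages are hypothetical.

```python
import re

# Title must be "<scope>: <something>", using the scopes named in the checklist.
TITLE_RE = re.compile(r"^(rubric|infra|agent|eval|ci|rollout): \S")


def lint_pr(branch: str, title: str, reviewers: list[str]) -> list[str]:
    """Return a list of checklist violations; an empty list means the PR passes."""
    problems = []
    if not branch.startswith("pilot/"):
        problems.append(f"branch '{branch}' is not under pilot/")
    if not TITLE_RE.match(title):
        problems.append(f"title '{title}' is not action + scope")
    if not any(name.startswith("client-") for name in reviewers):
        problems.append("no client engineer in the reviewers list")
    return problems
```

The remaining items (day deadlines, the retained day 7 snapshot, the ratchet row in the description) depend on dates and repo state, so they are left out of this sketch.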

What you can copy off this page

The branch prefix, the seven titles, the reviewer routing, and the ordering constraint (rubric before agent, gate before graders, graders before snapshot job) are not proprietary. We use them on every engagement and the leave behind in week 6 is the same set of files in your repo. Copying the shape today is the right move whether you ever talk to us or not.

What is hard to copy is PR #4. Pulling 12 production traces, tagging 4 of them stakes:high, and writing per-axis ground truth for each takes a senior engineer about three days. That is most of what the named engineer is doing on calendar days 9 to 11. Without that PR the rubric is grading the demo, not the product, and the day 14 decision meeting is reading the wrong number.

Want a named senior engineer landing this PR list inside your repo next week?

Sixty minutes with the engineer who would own the build. You leave with a written one-pager: outcome, data sources, rubric, and the seven-PR week 2 plan. The cancel-and-refund clause is in the MSA.

Frequently asked questions

How many PRs should an FDE land in week 2 of an engagement?

Seven, one per calendar day from day 8 to day 14, branched under a pilot/ prefix. The exact split we ship: PR #1 rubric.yaml + cases.yaml skeleton; PR #2 staging deploy + rollout flag; PR #3 first agent prototype + 5 seed cases; PR #4 expand cases.yaml to 15+ from production traces; PR #5 pilot-gate.yml CI workflow; PR #6 ragas + per-axis grader wiring; PR #7 Monday snapshot job + refund webhook. Average net change per PR is about 450 lines. The point is not the number 7. The point is that each PR is small enough to review in under thirty minutes and lands a single decision the team can argue with.

Why does the rubric land before the agent in PR #1?

Because a rubric written after the agent is descriptive, not prescriptive. It scores whatever the agent happens to do well. Landing rubric.yaml on calendar day 8, before any agents/ code exists, forces the engagement owner and the client lead to agree on what the prototype is for. The five axes, the floors, and the week 2 row of the ratchet become the contract that PRs #3 through #7 are attempts to satisfy. The cancel-and-refund clause has nothing to bind to without that PR.

What happens if the day 14 snapshot misses the rubric_min_score threshold?

PR #7's refund-signal job fires the BILLING_REFUND_WEBHOOK before anyone is at their desk. The invoice pauses. An issue opens with .pilot-gate/latest.json attached and assigned to the engagement owner and the client lead. The 30 minute meeting at 09:30 UTC reads the snapshot alongside the day 7 snapshot, computes the slope, and decides: continue if the slope puts next week's row of the ratchet in reach, fire the refund clause for real if not. The decision input is the same JSON file in both branches.
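The slope check is a human call made in the meeting, but the arithmetic behind "in reach" can be made explicit. The sketch below assumes a naive linear projection (the same gain again next week) and uses hypothetical field names; it illustrates one possible reading, not the rule the meeting actually applies.

```python
def slope_decision(day7_score: float, day14_score: float, next_row_min: float) -> dict:
    """Project next week's score from the last two snapshots and compare it
    to the next row of the ratchet."""
    slope = day14_score - day7_score   # change over one week
    projected = day14_score + slope    # assumes the same gain next week
    decision = "continue" if projected >= next_row_min else "refund"
    return {
        "slope": round(slope, 3),
        "projected_day21": round(projected, 3),
        "decision": decision,
    }
```

For example, a move from 0.50 to 0.60 against a next-row minimum of 0.68 projects to roughly 0.70 and reads as "continue", while a move from 0.58 to 0.60 projects to roughly 0.62 and reads as "refund".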

Can the seven PRs land out of order?

Three ordering constraints are hard: PR #1 (rubric) lands before PR #3 (agent); PR #5 (pilot-gate.yml) lands before PR #6 (graders) and PR #7 (snapshot job); and PR #4 (cases from production traces) lands before the day 14 cron fires. The other dependencies are softer. PR #2 (staging deploy) can move forward by a day if the client platform team is already running. PR #6 (graders) can be split into two PRs if the policy grader needs more review. We have shipped engagements where the PR count was 6 or 8; we have not shipped one where the rubric landed after the agent.
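The PR-to-PR part of those hard constraints can be written as pairs and checked against any proposed merge order. A minimal sketch with hypothetical names; the "PR #4 before the day 14 cron" constraint is left out because the cron is not a PR.

```python
# (earlier, later): the first PR must merge before the second.
HARD_ORDER = [(1, 3), (5, 6), (5, 7)]


def order_violations(merge_order: list[int]) -> list[tuple[int, int]]:
    """Return the hard-order pairs that a proposed merge order breaks."""
    position = {pr: index for index, pr in enumerate(merge_order)}
    return [
        (earlier, later)
        for earlier, later in HARD_ORDER
        if position[earlier] > position[later]
    ]
```

Swapping PR #1 and PR #2 passes, since that dependency is soft; merging the agent before the rubric does not.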

Who reviews each week 2 PR?

CODEOWNERS routes by directory. agents/ goes to the engagement owner plus the client lead. eval/ goes to the client data lead plus the engagement owner. infra/ goes to the client platform owner plus the engagement owner. The rubric file goes to the engagement owner plus the client lead. The .github/workflows/ file goes to the client platform owner. By calendar day 14, six different humans across two organizations have approved at least one PR. That is what makes the leave behind real instead of a pile of unfamiliar code on day 42.
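The routing above can be modeled as a prefix table, which is handy for predicting who a PR will need before opening it. This is a sketch with hypothetical handles borrowed from the reviewer lists on this page. A real CODEOWNERS file uses glob patterns where the last matching pattern wins; the prefixes here do not overlap, so that subtlety does not come up.

```python
ROUTES = [
    ("agents/", ["engagement-owner", "client-lead"]),
    ("eval/", ["client-data-lead", "engagement-owner"]),
    ("infra/", ["client-platform", "engagement-owner"]),
    ("rubric.yaml", ["engagement-owner", "client-lead"]),
    (".github/workflows/", ["client-platform"]),
]


def reviewers_for(changed_files: list[str]) -> list[str]:
    """Return the de-duplicated reviewer pool for a set of changed paths,
    in first-seen order. Paths with no matching route add nobody."""
    pool: list[str] = []
    for path in changed_files:
        for prefix, owners in ROUTES:
            if path.startswith(prefix):
                pool.extend(owner for owner in owners if owner not in pool)
                break
    return pool
```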

What is the relationship between the week 2 PR list and the FDE week 2 prototype rubric?

The rubric is the spec; the seven PRs are the implementation. PR #1 lands the rubric file. PR #3 lands the first agent attempt at it. PR #4 lands the case set the rubric is graded against. PR #5 lands the gate that runs the rubric on every subsequent PR. PR #6 lands the graders the rubric calls. PR #7 lands the snapshot job that writes the input to the day 14 decision meeting. If you read only one of the two, read the rubric page first; the PR list is what the rubric looks like as a sequence of merges.

Does fde10x land these PRs as the engineer's own, or as authored on the client team?

Authored by the named senior engineer from the scoping call, pushed from a fork to the client repo, merged by the engagement owner. The PR author column on calendar day 14 reads as the senior engineer's GitHub handle, not a generic vendor account. That is on purpose. The leave behind is the senior engineer's git history; if you have to fire the refund clause, you keep their commits.

Other engagement-shape pages on fde10x.com

Related guides

Rubric

FDE Week 2 prototype rubric: the file, the five axes, and the 30-minute decision meeting

The rubric file the seven week 2 PRs are graded against. Five axes, 15 cases drawn from production traces, the ratchet schedule, and the Monday morning refund webhook.


Engagement model

AI pilot to production: the two gates and the promotion PR

What goes in pilot-gate.yml at week 2 and production-gate.yml at week 6, plus the one-line promotion PR that flips the rollout flag from 10 to 100 percent.


Background

A short history of the forward deployed engineer

Where the FDE role came from, in three eras: Palantir Delta in the early 2010s, the Anthropic and OpenAI applied AI teams in 2024 to 2025, and the vendor neutral studios that came after.
