Author a complete evals scorecard for one of three AI feature briefs we provide (see the kit), or pick a real AI feature shipping in a public product. The scorecard is the artifact a hiring PM would attach to a launch review — it should answer "is this safe and useful enough to ship?"
Required deliverables
- Problem framing (1 page): user, JTBD, success criterion, an explicit non-goal, and a one-sentence trust principle the feature rests on.
- Metrics design: ≥ 2 leading metrics, ≥ 1 lagging metric, ≥ 2 guardrails with thresholds. For each metric, name how it could be gamed and the paired constraint that prevents it (e.g. a thumbs-up-rate metric can be gamed by over-agreeable answers; a faithfulness guardrail blocks that).
- Eval set design: ≥ 30 representative inputs sourced from real users where possible, plus ≥ 5 adversarial cases. Name the grading approach (programmatic / LLM-as-judge with anchors / human) and the calibration plan, e.g. a periodic human-vs-judge audit; an illustrative audit sketch follows this list.
- Tradeoff analysis: ≥ 2 real tradeoffs (e.g. faithfulness vs latency, autonomy vs oversight, model cost vs quality). Resolve each with concrete numbers and name a fallback for the failure mode your choice accepts.
- Risk + governance: a pre-mortem covering the top 5 failure modes (hallucination, refusal, latency, cost, adversarial use), with severity, likelihood, detection, and mitigation per row.
- Kill criteria: pre-committed thresholds (e.g. faithfulness < X%, refusal > Y%, p95 latency > Zs, cost > $A/user/day) and a named owner empowered to halt the rollout; an illustrative config sketch follows this list.
- Measurement plan: cadence, named owner, dashboard sketch.
- 60-second walkthrough video.
- ≤ 800-word narrative tying the scorecard back to the user: what would change about the user's day if this ships.
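Code is not a deliverable here (see Out of scope below), but to make the calibration plan concrete, here is a minimal, purely illustrative sketch of what a periodic human-vs-judge audit could compute: raw agreement between human labels and LLM-judge labels on the same sample, plus the disagreement pairs that signal the judge's anchors need tightening. All function names and labels are hypothetical.

```python
# Illustrative only; code is not a deliverable for this brief.
# Hypothetical shape of a periodic human-vs-judge calibration audit:
# both graders label the same eval inputs, and the audit reports agreement.
from collections import Counter

def agreement_report(human_labels: list[str], judge_labels: list[str]) -> dict:
    """Compare paired human and LLM-judge labels on the same inputs."""
    assert len(human_labels) == len(judge_labels), "labels must be paired"
    n = len(human_labels)
    agree = sum(h == j for h, j in zip(human_labels, judge_labels))
    # Disagreement pairs show *where* the judge drifts from the human,
    # which is what you re-anchor against.
    disagreements = Counter(
        (h, j) for h, j in zip(human_labels, judge_labels) if h != j
    )
    return {"n": n, "agreement": agree / n, "disagreements": disagreements}

# Hypothetical audit sample: "faithful" vs "unfaithful" verdicts.
humans = ["faithful", "faithful", "unfaithful", "faithful", "unfaithful"]
judge  = ["faithful", "unfaithful", "unfaithful", "faithful", "unfaithful"]
print(agreement_report(humans, judge))
# {'n': 5, 'agreement': 0.8, 'disagreements': Counter({('faithful', 'unfaithful'): 1})}
```

Pairing the audit with a pre-committed floor, for example re-anchoring the judge whenever agreement drops below 90%, is what turns "calibration plan" from a phrase into a procedure.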
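In the same spirit, kill criteria are easiest to honor when they live in one machine-checkable record rather than in prose. A hypothetical sketch follows; the values are placeholders for the X / Y / Z / A thresholds you pre-commit, not recommendations.

```python
# Illustrative only. The X / Y / Z / A thresholds are yours to pre-commit;
# the values below are hypothetical placeholders, not recommendations.
KILL_CRITERIA = {
    "faithfulness_min_pct": 95.0,         # halt if faithfulness < X%
    "refusal_max_pct": 5.0,               # halt if refusal rate > Y%
    "p95_latency_max_s": 3.0,             # halt if p95 latency > Z seconds
    "cost_max_usd_per_user_day": 0.10,    # halt if cost > $A/user/day
    "owner": "name.of.owner@example.com", # the named owner empowered to halt
}

def should_halt(observed: dict) -> bool:
    """Return True if any pre-committed kill threshold is breached."""
    return (
        observed["faithfulness_pct"] < KILL_CRITERIA["faithfulness_min_pct"]
        or observed["refusal_pct"] > KILL_CRITERIA["refusal_max_pct"]
        or observed["p95_latency_s"] > KILL_CRITERIA["p95_latency_max_s"]
        or observed["cost_usd_per_user_day"] > KILL_CRITERIA["cost_max_usd_per_user_day"]
    )
```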
Out of scope
- Code. PM Foundations is artifact-graded.
- A real working dashboard. A sketch is enough.
- Brand or visual polish. The grader does not score brand.
What we look for
- Metrics that cannot be gamed without tripping a guardrail.
- An eval set that catches regressions on real-user input distributions, not just author intuition.
- A named kill switch with a concrete owner.
- Honest articulation of one limitation the launch will ship with, and what compensates for it.
How it's graded
One rubric, Product Foundations, applied at full weight. Four criteria: problem framing (25%), metrics design (30%), tradeoff analysis (25%), and narrative quality (20%). Each criterion is scored 0–5 with a written rationale from the grader.
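To make the weighting concrete (illustrative scores, not a real grade): criterion scores of 4, 3, 5, and 4 would combine as 0.25×4 + 0.30×3 + 0.25×5 + 0.20×4 = 1.00 + 0.90 + 1.25 + 0.80 = 3.95 out of 5.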