Cohort Segmentation in the Copilot Usage Metrics API

The Copilot Usage Metrics API now sorts each user into one of four AI-adoption phases, recovering the segmented shape that aggregate utilization hides.

On 29 May 2026 the Copilot Usage Metrics API gained ai_adoption_phase (user-level) and totals_by_ai_adoption_phase (enterprise/org-level), keyed to a four-phase classification over a rolling 28-day window (GitHub Changelog, 2026-05-29).

The four phases

Phase	Name	Definition
0	No cohort	Did not meet engagement criteria in the 28-day window
1	Code-first	Code completion and/or IDE agent mode
2	Agent-first	A single GitHub-based agent surface
3	Multi-agent	Two or more agent surfaces, or the new Copilot app

Each entry carries a version field (starting at v1) so the classification logic can evolve without breaking historical comparisons. totals_by_ai_adoption_phase reports per-user averages inside each phase: engaged users, interactions, completion/acceptance activity, lines added/deleted, PRs created/merged/reviewed, and median time-to-merge (GitHub Changelog, 2026-05-29). These are averages, not sums, so a large phase cannot look more intense simply because it has more users.
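Because the per-phase figures are averages, an org-wide total has to be rebuilt by weighting each average by the number of users in that phase. The sketch below illustrates that arithmetic under a simplified record shape. Only the phase numbers and the version field come from the changelog; the key names engaged_users and avg_prs_merged, and all the numbers, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseTotals:
    phase: int             # 0-3, as in the phase table
    version: str           # classification version, e.g. "v1"
    engaged_users: int     # hypothetical key: users counted in this phase
    avg_prs_merged: float  # hypothetical key: per-user average over the window


def org_total_prs(rows: list[PhaseTotals]) -> float:
    """Rebuild an org-wide total by weighting each per-user average by phase size."""
    return sum(r.engaged_users * r.avg_prs_merged for r in rows)


def same_version(rows: list[PhaseTotals]) -> bool:
    """Rows are only comparable if they were classified under one version."""
    return len({r.version for r in rows}) <= 1


# Made-up numbers: a large Phase 1 with low intensity, a small Phase 3 with high intensity.
rows = [
    PhaseTotals(phase=1, version="v1", engaged_users=110, avg_prs_merged=2.0),
    PhaseTotals(phase=3, version="v1", engaged_users=12, avg_prs_merged=5.0),
]

print(org_total_prs(rows), same_version(rows))
```

Comparing avg_prs_merged directly (2.0 against 5.0) shows intensity; the weighted sum shows volume. Checking the version before comparing periods guards against mixing two classification schemes.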

When this applies

The cohort layer is diagnostic only when three conditions hold:

25 or more engaged users per phase: below that, one developer adopting Copilot Workspace can flip an org-level "Multi-agent growth" headline
60 or more days since rollout: earlier, cohort transitions are onboarding artifacts (Phase 0 to 1 to 2 happens mechanically in month one), not the result of interventions
outcome telemetry wired alongside: pass rate, revision rate, and cost per merged PR decide whether a phase shift mattered, and without them the dashboard is a vanity metric surface
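The three conditions above can be encoded as a guard that runs before any phase split reaches a dashboard. This is a minimal sketch: the function name, its parameters, and the message strings are hypothetical, and only the two numeric thresholds come from the list above.

```python
MIN_USERS_PER_PHASE = 25      # per-phase floor from the list above
MIN_DAYS_SINCE_ROLLOUT = 60   # rollout-age floor from the list above


def cohort_split_blockers(
    users_per_phase: dict[int, int],
    days_since_rollout: int,
    has_outcome_telemetry: bool,
) -> list[str]:
    """Return reasons to ignore the phase split; an empty list means it is usable."""
    blockers = []
    small = [p for p, n in sorted(users_per_phase.items()) if n < MIN_USERS_PER_PHASE]
    if small:
        blockers.append(f"phases below {MIN_USERS_PER_PHASE} users: {small}")
    if days_since_rollout < MIN_DAYS_SINCE_ROLLOUT:
        blockers.append(f"only {days_since_rollout} days since rollout")
    if not has_outcome_telemetry:
        blockers.append("no outcome telemetry wired")
    return blockers


# Phase counts taken from the Example section; rollout age is made up.
print(cohort_split_blockers({0: 48, 1: 110, 2: 30, 3: 12}, 90, True))
```

Returning the reasons rather than a bare boolean lets a report say why a split was suppressed, for instance that Phase 3 is too small to read as a trend.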

Mapping phases to interventions

Each phase has a different intervention surface, which is the reason to break adoption out:

Phase	Likely cause of stall	Intervention
0	Dormant licence, policy block, refusal	Triage before reclaiming — causes diverge (AI Adoption Footprint)
1	Has not crossed the agent-supervision skill gate	Pair sessions, agent-mode demos, low-stakes first tasks
2	Single surface is enough for current work	Multi-surface onboarding only if work requires it
3	Power user — retention question	Instrument outcomes, protect from churn, mine for patterns

Phase 0 does not equal "dormant." It means "did not meet engagement criteria in 28 days," which mixes non-users, policy-blocked users, people on leave, and users whose work happened to skip tracked surfaces.

Why it works

Aggregate utilization ("60% active this month") is a mean over a distribution that is almost always multimodal in engineering orgs: a small power-user mode, a long middle, and a dormant tail (Userpilot 2026, AI Adoption Footprint). A mean over a multimodal distribution hides the modes, and the modes are where interventions land. A 60% headline is consistent with a 20%-power / 40%-dormant org and with a uniform-60%-medium-use org: same number, opposite playbooks.
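The "same number, opposite playbooks" point can be checked with a few lines of arithmetic. The sketch below assumes one reading of the two orgs: in the first, the 40% not named in the text is a medium-use middle that counts as active; in the second, 60% are medium users and the rest are inactive. The segment names and that split are assumptions made for illustration.

```python
# Share of seats per segment, in whole percent. Names and splits are illustrative.
org_a = {"power": 20, "medium": 40, "dormant": 40}
org_b = {"power": 0, "medium": 60, "dormant": 40}

ACTIVE_SEGMENTS = {"power", "medium"}


def headline_active_pct(org: dict[str, int]) -> int:
    """Collapse a segmented distribution into the single '% active' headline."""
    return sum(pct for segment, pct in org.items() if segment in ACTIVE_SEGMENTS)


for name, org in (("org_a", org_a), ("org_b", org_b)):
    print(name, headline_active_pct(org), org)
```

Both orgs report the same headline, yet only the first has a power-user group whose practices could be studied and spread.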

Cohort segmentation recovers the conditional P(activity | phase) instead of the marginal P(activity) — the actionable shape (Zigpoll 2026). The API's depth-of-surface framing aligns with the capability-layering mechanism in AI Adoption Footprint: each phase boundary is a skill gate (autocomplete to agent supervision to multi-agent), so per-phase averages reveal which gate is choking adoption.
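The link between the marginal and the conditional is the law of total probability, written here for the four phases (the index k for the phase number is notation introduced for this sketch):

```latex
P(\text{activity}) \;=\; \sum_{k=0}^{3} P(\text{activity} \mid \text{phase}=k)\, P(\text{phase}=k)
```

An aggregate headline reports only the left-hand side. The per-phase averages and phase sizes expose each term on the right, which is what lets a stalled gate be located.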

When this backfires

Small teams or fresh rollouts: the 28-day window plus small per-phase samples produce noise that reads as trend, so below the thresholds above, ignore the split
Phase 3 treated as a goal: grade enablement on "% in Phase 3" and teams Goodhart it — open Workspace once, install the app, touch two surfaces, and Phase 3 climbs while throughput stays flat (the Agent Headcount Vanity Metric shape on a different axis)
Single-surface shops by policy: regulated environments that disable agent surfaces have a structural Phase 1 ceiling, so reading the distribution as adoption maturity there is a category error
Replacement for outcome telemetry: cohort distribution decides where to invest, while outcome metrics decide whether it worked, so reporting Phase 3 growth alone repeats the DORA-as-vanity drift (Larridin: Why DORA Metrics Break in the AI Era)
Depth confused with productivity: time savings plateaued around four hours per week even as adoption climbed from about 50% to 91% in DX 2025 (Rob Bowley on DX 2025), so a higher Phase 3 share is not a higher-productivity org
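Several of the failure modes above share one signature: Phase 3 share rises while an outcome metric stays flat. A rough check for that signature is sketched below. The function name and both default thresholds are hypothetical and would need tuning against real data; the outcome value can be any throughput-style metric an org already tracks.

```python
def phase3_looks_gamed(
    share_before: float,
    share_after: float,
    outcome_before: float,
    outcome_after: float,
    min_share_gain: float = 0.05,    # hypothetical: 5 percentage points
    min_outcome_gain: float = 0.02,  # hypothetical: 2% relative improvement
) -> bool:
    """Flag a rise in Phase 3 share that arrives without a matching outcome gain."""
    if outcome_before <= 0:
        raise ValueError("outcome_before must be positive")
    share_gain = share_after - share_before
    outcome_gain = (outcome_after - outcome_before) / outcome_before
    return share_gain >= min_share_gain and outcome_gain < min_outcome_gain


# Made-up numbers: Phase 3 share goes from 6% to 15% while merged PRs per week barely move.
print(phase3_looks_gamed(0.06, 0.15, outcome_before=40.0, outcome_after=40.2))
```

A flag is a prompt to inspect, not a verdict: a genuine lag between adoption and outcomes produces the same signature for a while.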

Cohort segmentation is the right diagnostic primitive for adoption shape; it is the wrong target for adoption success.

Example

A 200-engineer org sees weekly_active_users / engaged_users = 0.62 and reports "62% weekly active." After enabling the cohort fields, the same month's totals_by_ai_adoption_phase resolves to:

Phase 0:  48 users   (24%)   — needs refusal / policy / dormancy triage
Phase 1: 110 users   (55%)   — completion + IDE agent only
Phase 2:  30 users   (15%)   — one agent surface
Phase 3:  12 users    (6%)   — multi-agent / Copilot app

The 62% headline is consistent with this org and with a uniform-60%-medium-use org, but the intervention reads diverge sharply: this org needs Phase 0 triage (24% is too large to be people on leave) and a Phase 1 to 2 skill-gate program (the chat-tool middle from AI Adoption Footprint is 55% of the population). Phase 3 growth is not the lever; the Phase 1 to 2 boundary is.
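The percentages in the breakdown follow directly from the counts. A short sketch of that arithmetic, using the four counts given above; the variable names are illustrative:

```python
# Users per phase, copied from the breakdown above.
phase_counts = {0: 48, 1: 110, 2: 30, 3: 12}

total = sum(phase_counts.values())
shares = {phase: round(100 * n / total) for phase, n in phase_counts.items()}

# Everyone outside Phase 0 met the engagement criteria in the 28-day window.
engaged_pct = 100 - shares[0]

# The most populated phase is where a gate-crossing program reaches the most people.
largest_phase = max(phase_counts, key=phase_counts.get)

print(total, shares, engaged_pct, largest_phase)
```

The counts sum to the 200 engineers in the org, and Phase 1 is the largest group by a wide margin, which is why the Phase 1 to 2 gate is the natural first target.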

Key Takeaways

The Copilot API now exposes ai_adoption_phase (user) and totals_by_ai_adoption_phase (org/enterprise) over a 28-day window, with four phases keyed to surface depth.
Cohort segmentation is a diagnostic primitive: it recovers the conditional P(activity | phase) that an aggregate "% active" headline averages away.
Each phase maps to a distinct intervention; Phase 3 is a descriptor, not a target.
Apply only when the conditions hold: 25+ engaged users per phase, 60+ days post-rollout, and outcome telemetry wired alongside.
Without outcome metrics, cohort dashboards degenerate into the same vanity surface that headcount and aggregate utilization already produce.

AI Adoption Footprint: The Segmented Shape of Engineering Orgs — the underlying segmented-distribution mechanism cohort segmentation operationalises.
Agent Headcount as a Vanity Metric — the Goodhart failure mode Phase 3 will exhibit if treated as a target.
The Productivity-Experience Paradox in AI-Assisted Development — why depth-of-use can rise while developer experience and outcomes diverge.
Copilot vs Claude Billing Semantics — the other side of Copilot-specific instrumentation; cost telemetry that pairs with the cohort distribution.
Rolling Out CLI Coding Agents at Organization Scale — the rollout that these adoption-phase cohorts instrument: seed adoption socially, then track retention as a separate number.