Engineering Governance · AI Era

Size Change by Risk, Not Lines

How I retrained AI-assisted delivery to evaluate change by CLOC (changed-code volume, excluding comments and blanks), blast radius, criticality, and change character, not naive LOC.

Overall change risk = magnitude × character.

The point

Raw LOC is not a risk signal. CLOC is only a dial. Blast radius and criticality are gates. Change character selects the guard.

Change risk is not LOC

The correction is simple: stop treating line count like risk, and classify the shape of the change.

The LOC trap

  • Reduces scope to one number
  • Treats all changed lines as equal
  • Misses contracts, callsites, and critical layers
  • Can underweight a 10-line high-risk change
  • Can overweight a large cosmetic cleanup

The architect’s model

  • Size change by risk, not naive line count
  • Use CLOC/change size as a dial
  • Use blast radius and criticality as gates
  • Use change character to select the right guards
  • Compare expected vs measured classification before merge

Overall change risk = magnitude × character

  • CLOC / change size: The dial. Excludes comments/blanks; production and test counted separately. Scales review budget and split decisions.
  • Blast radius: A gate. Reach across imports, callsites, public symbols, shared contracts, protocols, data structures, and integration seams.
  • Criticality: A gate. Execution paths, risk gates, migrations, decisioning pipelines, observability, tooling, or UI.
  • Character: The guard. Mechanical, refactor, rewrite, new feature, contract change, or schema change.
  • High blast radius or high criticality: Deep manual inspection is mandatory regardless of CLOC. Automated gates are necessary but not sufficient.
  • Low blast radius and low criticality: Automated gates may be enough even at high CLOC, depending on change character.

A 10-line critical integration callsite change can outrank a 2,000-CLOC cosmetic cleanup.

The LOC trap

When AI started helping me plan implementation, one pattern kept showing up: it wanted to size work by lines of code.

That is a familiar trap. Engineering teams can fall into a similar pattern with story points or story size when the work is viewed too narrowly. A change becomes “small” because the code looks small, or “large” because the diff is large.

But architects cannot think that way.

A small diff can cross a critical boundary. A few lines can alter an integration callsite, a shared contract, a migration path, or a decisioning rule. A large cleanup can touch thousands of changed lines while leaving the real system risk low.

LOC is not a risk signal.

That was the correction I had to make in my AI-assisted development process. I stopped letting AI plan scope by naive line count and forced it to classify the shape of the change.

The change-risk model

The model is simple at the top:

Overall change risk = magnitude × character.

Magnitude is not one number. It has three axes.

CLOC or changed-code volume is the dial. It excludes comments and blanks, separates production from test, and helps scale review budget, pause-and-report cadence, and split decisions.

But CLOC is not a gate. It never decides whether the change is safe.
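
As a rough illustration of the dial, the sketch below counts changed code lines in a unified diff, skips blanks and comment-only lines, and reports production and test separately. Everything in it is an assumption made for the example: the names ClocSplit, is_test_path, header_path, and count_cloc are hypothetical, comments are taken to be Python-style lines starting with "#", and test code is assumed to live under a tests/ directory or in test_*.py files.

```python
from dataclasses import dataclass


@dataclass
class ClocSplit:
    production: int = 0
    test: int = 0


def is_test_path(path: str) -> bool:
    # Assumption: tests live under a tests/ directory or in test_*.py files.
    name = path.rsplit("/", 1)[-1]
    return ("/tests/" in ("/" + path)) or name.startswith("test_")


def header_path(line: str) -> str:
    # "--- a/src/app.py" or "+++ b/src/app.py" -> "src/app.py"
    path = line[4:].strip()
    if path.startswith(("a/", "b/")):
        path = path[2:]
    return path


def count_cloc(unified_diff: str) -> ClocSplit:
    """Count added and removed code lines, split into production and test."""
    split = ClocSplit()
    current_path = ""
    for line in unified_diff.splitlines():
        if line.startswith(("--- ", "+++ ")):
            path = header_path(line)
            if path != "/dev/null":  # keep the real path for added/deleted files
                current_path = path
            continue
        if not line.startswith(("+", "-")):
            continue  # context line, hunk header, or other diff metadata
        body = line[1:].strip()
        if not body or body.startswith("#"):
            continue  # blank or comment-only line does not count
        if is_test_path(current_path):
            split.test += 1
        else:
            split.production += 1
    return split
```

The function only returns a count. It deliberately makes no pass/fail decision, which matches the idea that CLOC scales review effort but never gates the change. One known simplification: a removed line whose content itself starts with "-- " would be misread as a file header.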

Blast radius is a gate. It asks how far the change reaches: imports, callsites, public symbols, shared contracts, core types, protocols, rule grammars, integration contracts, and multi-capability data structures.

Criticality is a gate. It asks where the change lands: execution paths, risk gates, constrained migrations, critical service layers, decisioning pipelines, observability, tooling, or UI.

Character selects the guard. Mechanical changes, refactors, rewrites, new features, and contract or schema changes do not carry the same inspection burden. When a change has multiple characters, every applicable guard applies.
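
The rule that every applicable guard applies can be read as a set union over the characters of a change. The sketch below shows that reading. The six characters come from the model above, but the guard names in GUARDS are purely hypothetical placeholders, since the article does not enumerate specific guards.

```python
from enum import Enum


class Character(Enum):
    MECHANICAL = "mechanical"
    REFACTOR = "refactor"
    REWRITE = "rewrite"
    NEW_FEATURE = "new feature"
    CONTRACT_CHANGE = "contract change"
    SCHEMA_CHANGE = "schema change"


# Hypothetical guard names, for illustration only.
GUARDS = {
    Character.MECHANICAL: {"automated test suite"},
    Character.REFACTOR: {"automated test suite", "behavior-parity check"},
    Character.REWRITE: {"automated test suite", "behavior-parity check", "design review"},
    Character.NEW_FEATURE: {"automated test suite", "new acceptance tests"},
    Character.CONTRACT_CHANGE: {"automated test suite", "dependent callsite review"},
    Character.SCHEMA_CHANGE: {"automated test suite", "migration and rollback rehearsal"},
}


def guards_for(characters):
    """Return the union of guards for every character the change carries."""
    required = set()
    for character in characters:
        required |= GUARDS[character]
    return required
```

A change that is both a refactor and a schema change would therefore carry the guards of both, never just the lighter set.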

The decision rule

The decision rule is where the model becomes useful.

If blast radius or criticality is high, deep manual inspection is mandatory regardless of CLOC. Automated gates are necessary, but not sufficient.

If both blast radius and criticality are low, automated gates may be enough even at high CLOC, depending on the character of the change.

That is the point.

A 10-line critical integration callsite change can outrank a 2,000-CLOC cosmetic cleanup.

The risk is not in the line count. The risk is in what the change touches, what it means, and what could break if the system absorbs it incorrectly.
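
A minimal sketch of the decision rule follows, assuming a two-level scale (low and high), which is all the rule above names. The types Level, Inspection, and Change and the function required_inspection are hypothetical names for this example. Note that cloc is carried on the change but never consulted by the gate.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


class Inspection(Enum):
    DEEP_MANUAL = "deep manual inspection, in addition to automated gates"
    AUTOMATED_MAY_SUFFICE = "automated gates may be enough, depending on character"


@dataclass(frozen=True)
class Change:
    cloc: int
    blast_radius: Level
    criticality: Level


def required_inspection(change: Change) -> Inspection:
    # Either gate being high forces deep manual inspection, regardless of CLOC.
    if Level.HIGH in (change.blast_radius, change.criticality):
        return Inspection.DEEP_MANUAL
    # Both gates low: automation may be enough; change character still has a say.
    return Inspection.AUTOMATED_MAY_SUFFICE


callsite_fix = Change(cloc=10, blast_radius=Level.LOW, criticality=Level.HIGH)
cosmetic_cleanup = Change(cloc=2000, blast_radius=Level.LOW, criticality=Level.LOW)
print(required_inspection(callsite_fix).value)
print(required_inspection(cosmetic_cleanup).value)
```

Under these assumptions the 10-line callsite change lands in the deep-inspection branch while the 2,000-CLOC cleanup does not, which is the ordering the example above describes.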

Expected vs measured classification

The model also creates a discipline before and after implementation.

At dispatch, the expected classification is declared: criticality, blast radius, change type, and estimated CLOC split between production and test.

At ship, the measured classification is reported: actual CLOC breakdown, files touched, changed public symbols, dependent callsite count, shared contracts touched, and the change character as built.

The delta is the signal.

If the measured shape diverges from the expected shape, the issue surfaces before merge — not after production learns the truth.

That delta can reveal scope creep, hidden coupling, or a wrong design estimate. It also teaches the AI how to plan the next change more like an architect and less like a line counter.
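
One way to make the delta concrete is to compare the classification declared at dispatch with the one measured at ship and report only the fields that diverge. The sketch below does that for the fields declared at dispatch. The Classification type, the classification_delta function, and the 25% CLOC tolerance are all assumptions chosen for illustration; the measured-only details (files touched, changed public symbols, callsite counts) are left out because they have no expected counterpart to compare against.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Classification:
    criticality: str
    blast_radius: str
    change_type: str
    production_cloc: int
    test_cloc: int


def classification_delta(expected, measured, cloc_tolerance=0.25):
    """Return {field: (expected, measured)} for every field that diverges."""
    delta = {}
    for field in fields(Classification):
        exp = getattr(expected, field.name)
        got = getattr(measured, field.name)
        if isinstance(exp, int):
            # CLOC is an estimate, so flag it only beyond a relative tolerance.
            allowed = cloc_tolerance * max(exp, 1)
            if abs(got - exp) > allowed:
                delta[field.name] = (exp, got)
        elif exp != got:
            delta[field.name] = (exp, got)
    return delta


expected = Classification("low", "low", "refactor", 120, 80)
measured = Classification("low", "high", "refactor", 400, 90)
print(classification_delta(expected, measured))
```

An empty result means the change shipped in the shape that was planned. A non-empty one, such as a blast radius that moved from low to high, is the prompt to stop and re-review before merge.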

Why this matters in the AI era

AI makes implementation faster, but it can also make locally reasonable changes at a scale humans struggle to inspect after the fact.

That means the sizing model matters.

If the model sizes work by LOC, AI optimizes for the wrong signal.

If the model sizes work by change risk, AI starts reasoning about boundaries, contracts, dependencies, rollback, and business/process impact before dispatch.

That is the shift I wanted.

Not AI asking, “How many lines will this take?”

AI asking, “What is the shape of this change, where does it land, what does it touch, and what guard does it require?”

That is the difference between code generation and governed execution.

Size means CLOC.
Risk means magnitude × character, gated by blast radius and criticality.
AI can generate implementation.
Humans govern the shape of change.