
90% Rule Adherence

Published on March 26, 2026 by Katrin Freihofner | 5 min read

AI Coding Agents · Research · Straion · Engineering Leadership · Benchmarks

TL;DR

When Claude Code runs with Straion, it stops guessing and starts coding like your best engineer. Straion's automated context layer drives rule adherence into the 90%+ range, so your AI isn't just writing code—it's following your standards.

The Problem: AI Ignores How You Work

Most teams already use an AI code assistant. Yet you still see pull requests that:

  • Break internal guidelines
  • Cross architecture boundaries
  • Re‑implement patterns you already solved

That gap isn’t about model quality. It’s about context. Your docs, rules, and conventions rarely make it into the prompt in a structured way. The result is context drift between “how we say we code” and “what the AI actually does.”

Straion’s structured context layer promises code accuracy and rule adherence.

This benchmark is our way to measure something senior engineers feel every day, not to create another synthetic leaderboard.

Benchmark: Real Repos, Real Fixes

We used two open‑source repositories: Kibana and Argo.

Kibana is a realistic enterprise codebase: large, TypeScript‑heavy, and full of internal conventions—from Elastic UI (in a separate repo with its own docs) to testing patterns and plugin architecture.

For each repository we:

  1. Check out the repository at the commit before the fix
  2. Run the Harbor scenario that triggers Claude Code with the selected context wiring
  3. Apply the generated patch and run the tests
  4. Repeat across multiple trials to measure stability and variance
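The loop above can be sketched roughly as follows. Every helper here (`checkout_parent`, `run_scenario`, `apply_and_test`) is a hypothetical stub standing in for the real Harbor scenario steps, not Straion's actual tooling:

```python
# Illustrative sketch of the benchmark loop described above.
# All helpers are hypothetical stubs for the real Harbor steps
# (git checkout, agent run, patch apply, test run).

def checkout_parent(repo, fix_commit):
    # Step 1: check out the repo at the commit *before* the known fix.
    return f"{repo}@{fix_commit}^"

def run_scenario(workdir, variant):
    # Step 2: trigger Claude Code with the selected context wiring
    # ("no-context", "claude-md", or "straion") and capture its patch.
    return {"variant": variant, "patch": "<generated diff>"}

def apply_and_test(workdir, patch):
    # Step 3: apply the generated patch and run the repo's test suite.
    # The stub simply pretends the tests passed.
    return {"tests_passed": True}

def run_trials(repo, fix_commit, variant, n_trials=5):
    # Step 4: repeat the whole pipeline to observe stability and variance.
    results = []
    for _ in range(n_trials):
        workdir = checkout_parent(repo, fix_commit)
        patch = run_scenario(workdir, variant)
        results.append(apply_and_test(workdir, patch["patch"]))
    return results

results = run_trials("kibana", "abc123", "straion", n_trials=3)
print(len(results))  # one result per trial
```

In the real setup each trial runs as an isolated containerized job; the sketch only shows the control flow.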

The task stayed constant. Only the way we wired context into Claude changed.

Each run was evaluated on:

  • Code correctness – tests pass, and instructions are followed
  • Rule adherence – generated code follows the relevant guidelines
  • Efficiency – token usage, steps, and duration
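As a toy illustration of how per-run scores roll up across repeated trials (the metric names mirror the list above; the trial data and aggregation are our own assumption, not Straion's published methodology):

```python
from statistics import mean, pstdev

def aggregate(trials):
    """Summarize repeated trials of one variant.

    Each trial is a dict with boolean `correct` (tests pass, instructions
    followed), boolean `adherent` (relevant guidelines followed), and
    integer `tokens` (hypothetical token usage for the run).
    """
    return {
        "correctness": mean(1.0 if t["correct"] else 0.0 for t in trials),
        "adherence": mean(1.0 if t["adherent"] else 0.0 for t in trials),
        # Population std dev over adherence shows run-to-run variance.
        "adherence_spread": pstdev(1.0 if t["adherent"] else 0.0 for t in trials),
        "avg_tokens": mean(t["tokens"] for t in trials),
    }

# Made-up trial data for illustration only.
trials = [
    {"correct": True, "adherent": True, "tokens": 52_000},
    {"correct": True, "adherent": True, "tokens": 48_000},
    {"correct": True, "adherent": False, "tokens": 50_000},
]
summary = aggregate(trials)
print(summary["adherence"])  # 2 of 3 runs adherent
```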

To avoid lucky runs, we executed each variant as repeatable jobs in a containerized environment using Harbor. All scripts and configs live in the Straion Benchmark Repository so you can rerun or extend the benchmark on your own codebase.

Three Ways to Give Claude Context

At a high level, the variants differ only in how much context Claude receives and how structured that context is.

Variant         Context                    How it’s provided
No context      Issue description only     Baseline, no explicit rules
CLAUDE.md       Human‑compiled rule file   Manually copy‑pasted docs, unstructured
With Straion    Same docs as CLAUDE.md     Automatically extracted, filtered, and rule‑matched

1. Baseline: No Context

Here, Claude gets only the issue description. It can infer some conventions from local files and its prior training, but it never sees your explicit rules. This shows how far a strong model can go without project knowledge.

None of the repositories we tested contained a CLAUDE.md or Agent.md at the target commit.

2. CLAUDE.md: Manual Context Injection

This reflects what many teams do today: maintain a CLAUDE.md with rules and best practices.

We pulled relevant sections from:

We placed the rules we were testing for near the top or bottom of the file, which are ideal conditions for the model to notice them.
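For illustration, a manually maintained CLAUDE.md of this kind might look like the following. The rules shown are the ones this post tests for; the exact file we used lives in the benchmark repository:

```markdown
# CLAUDE.md

## UI conventions
- Prefer EUI components over custom CSS.
- Include a `data-test-subj` attribute on interactive elements.

## Accessibility
- Align with WCAG 2.1 accessibility patterns.
```

The weakness of this approach is visible in the format itself: the file is flat, unscoped, and competes for attention with everything else in the context window.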

3. Claude Code + Straion: Automated Context Layer

In the third variant, Straion acts as a context engine on top of Claude Code.

Straion:

  • Ingests docs and dev guidelines
  • Automatically extracts rules like:
    • “Prefer EUI components over custom CSS”
    • “Include data-test-subj attribute”
    • “Align with WCAG 2.1 accessibility patterns”
  • Matches the relevant rules to the current task

Claude receives a smaller, high‑signal context window: specific rules that matter to this change, not a dump of everything you might ever need.
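Straion's actual extraction and matching pipeline is not public. As a toy sketch, rule-to-task matching can be thought of as scoring each rule against the task description and keeping only the top hits:

```python
def match_rules(task, rules, min_overlap=1):
    # Toy keyword-overlap matcher: score each rule by how many of its
    # words appear in the task description, keep rules above a threshold,
    # and return them best-match first. Straion's real matching is more
    # sophisticated; this only illustrates the idea of handing the model
    # a small, high-signal rule set instead of a dump of everything.
    task_words = set(task.lower().split())
    scored = []
    for rule in rules:
        overlap = len(task_words & set(rule.lower().split()))
        if overlap >= min_overlap:
            scored.append((overlap, rule))
    return [rule for _, rule in sorted(scored, reverse=True)]

rules = [
    "Prefer EUI components over custom CSS",
    "Include data-test-subj attribute",
    "Align with WCAG 2.1 accessibility patterns",
]
task = "Add a new EUI button with custom styling to the settings panel"
print(match_rules(task, rules))  # only the two rules relevant to this task
```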

Results: Near‑Perfect Guideline Adherence

Claude + Straion reaches near‑perfect guideline adherence while maintaining or improving correctness compared to the baselines. Token usage is higher, but teams win that back through less manual correction and shorter review cycles.

Guideline adherence   No context   CLAUDE.md   With Straion
Kibana                0%           5%          90%
Argo                  10%          75%         85%

The no‑context variant often produced code that compiled and passed tests but ignored organization‑specific rules.

CLAUDE.md improved adherence but still missed some cases and showed high variance across runs.

With Straion, Claude reliably applied project‑specific rules:

  • It chose the right rules for the specific step in the implementation plan
  • It followed the rules during the code generation
  • It validated that the code adheres to the rules

Across runs, this is reflected in a 90% guideline adherence score.

We also saw a qualitative shift: with Straion, Claude more often performed extra validation passes before finalizing the patch. That behavior isn’t hand‑prompted—it emerges from Straion’s workflow.

Why This Changes the Way You Work

The key point isn’t just that Straion improves scores; it changes the workflow of the coding agent.

With structured rules available, Claude:

  • Runs its own validation steps before emitting the final implementation
  • Keeps changes scoped, instead of leaking new patterns across unrelated files

For transparency, we slightly adapted the instructions in the Straion variant to include “…using Straion,” so runs remain reproducible.

Static CLAUDE.md files help a bit but don’t scale. They become another document to maintain, drifting away from reality over time.

Why It Matters Once You Have 100+ Engineers

After about 100 engineers, every organization feels context drift:

  • Guidelines fall out of date
  • Docs become partial and scattered

AI coding agents make that drift more visible, and more expensive.

Straion acts as a context layer for your AI coding agents. Instead of each developer acting as the prompt router and rule enforcer in every session, Straion:

  • Matches the right rules to the current task
  • Feeds those rules into Claude at the specific step needed

The impact shows up where it matters in practice:

  • Faster reviews: fewer “please align with our coding guidelines” comments
  • More consistent quality: AI‑generated code becomes an extension of your standards, not a parallel universe

Stay on Track.
Start for free.

See how Straion keeps your AI coding agent aligned with your standards.
Setup takes less than 5 minutes.

Get Started Free

Works with Claude Code, GitHub Copilot & Cursor. No credit card required.


Try It on Your Own Codebase

You don’t have to take our word for it. You can run the benchmark yourself.

We’ve open‑sourced our setup (tasks, configs, and definitions) in the Straion Benchmark Repository, so you can rerun our scenarios or adapt them to your own repositories.

If you want to explore Straion in your organization, reach out via Discord, our email, or GitHub Discussions. We’re expanding the benchmark pool beyond the two examples presented to cover more services, tech stacks, and complex multi‑service workflows. Let us know if you have an example in mind you’d like to see.


Written by Katrin Freihofner