The Linter That Knows What It Can't Catch
Can you grep for bad writing?
Yesterday I said no. Post 6 argued that AI-sounding text is structural, not lexical — you can’t regex an unearned arc or catch a backwards statement with pattern matching. The quality gate has a ceiling, and the ceiling is the interesting part.
Then I spent February 12 building a linter that tries.
The Copy Foundry
The context: a multi-agent copy generation pipeline for a drinkware brand. Four LLM agents in sequence — researcher, writer, critic, rewriter — looping until the output clears a quality bar. The pipeline needed a programmatic style checker that runs between each pass, catching what it can before the expensive LLM critic evaluates what it can’t.
The architecture:
```
Brief → Researcher → Writer → [Linter + Critic → Rewriter]×3 → Final Copy
                                  ↑         ↑
                         brand-brain/    style_linter.py
                           voice-kit.md  (free, instant)
                           gold-standards/
                           forbidden-patterns.json
```
The linter sits in the inner loop. Every draft gets linted before the LLM critic scores it. The critic costs money and takes seconds. The linter costs nothing and takes milliseconds. Anything the linter catches is a problem the critic doesn’t need to spend tokens on.
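To make that ordering concrete, here's a minimal sketch of the inner loop. The function names, signatures, and the decision to skip the critic when the linter flags something are illustrative assumptions, not the pipeline's actual API; the point is only that the free check runs first and the rewriter gets whichever notes exist.

```python
from typing import Callable

def refine(
    draft: str,
    lint: Callable[[str], list[str]],       # rule engine: free, milliseconds
    critique: Callable[[str], list[str]],   # LLM critic: costs money, takes seconds
    rewrite: Callable[[str, list[str]], str],
    max_passes: int = 3,
) -> str:
    """One possible shape of the inner loop; the callables stand in for the
    real pipeline stages and are not the project's actual interfaces."""
    for _ in range(max_passes):
        violations = lint(draft)
        if violations:
            # Surface problems found: fix them before spending critic tokens.
            draft = rewrite(draft, violations)
            continue
        notes = critique(draft)
        if not notes:
            break                            # draft clears the quality bar
        draft = rewrite(draft, notes)
    return draft
```

Passing the stages in as callables keeps the sketch self-contained; the real loop is wired differently, but the ordering is the same.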
What the linter checks
Six checks, loaded from a forbidden-patterns.json that encodes the brand’s actual voice guidelines:
| Check | What it detects | Severity |
|---|---|---|
| Punctuation | Em dashes (max 0 per piece), exclamation marks, ALL CAPS runs, ellipses | Critical |
| Banned phrases | “game-changer,” “next level,” “must-have,” “your new,” 21 others | Critical |
| AI-isms | “delve,” “robust,” “seamless,” “leverage,” “holistic,” 19 others | High |
| Hyphenated compounds | Density of “battery-free,” “aerospace-grade” patterns vs gold standard baseline | High |
| Hashtag copy | Consecutive micro-fragments: “Premium. Leakproof. Insulated. Yours.” | High |
| Sentence variance | Standard deviation of sentence lengths (burstiness) | Medium |
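The first three rows are essentially list lookups and regex tests over whatever forbidden-patterns.json contains. A minimal sketch follows, with assumed config keys (`banned_phrases`, `ai_isms`) that may not match the real file's schema:

```python
import json
import re

def lint_phrases(text: str, config_path: str = "forbidden-patterns.json") -> list[str]:
    """Sketch of the punctuation / banned-phrase / AI-ism checks.
    The config keys here are assumptions, not the file's actual layout."""
    with open(config_path) as f:
        cfg = json.load(f)
    lowered = text.lower()
    violations = []
    for phrase in cfg.get("banned_phrases", []):      # e.g. "game-changer"
        if phrase.lower() in lowered:
            violations.append(f"critical: banned phrase '{phrase}'")
    for word in cfg.get("ai_isms", []):               # e.g. "delve", "seamless"
        if re.search(rf"\b{re.escape(word)}\b", lowered):
            violations.append(f"high: AI-ism '{word}'")
    if "\u2014" in text:                              # em dash: max 0 per piece
        violations.append("critical: em dash")
    if "!" in text:
        violations.append("critical: exclamation mark")
    return violations
```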
The hyphenated compound check is the one I’m most interested in. The brand’s real website copy averages less than one hyphenated compound per piece across 18-43 word product descriptions. AI-generated copy for the same brief tends to pack in several per piece. The pattern is consistent enough that a simple density threshold catches it reliably.
From style_linter.py:
```python
# Find all hyphenated compounds (word-word pattern)
hyphenated = re.findall(
    r"\b([a-zA-Z]+-[a-zA-Z]+(?:-[a-zA-Z]+)*)\b", text)

# Filter out exceptions (USB-C, carry-on, etc.)
real_hyphens = [h for h in hyphenated if h.lower() not in exceptions]

if len(real_hyphens) > max_per_50_words:
    report.add(Violation(
        rule=f"Hyphenated compound overload ({len(real_hyphens)} found, max {max_per_50_words})",
        severity="high",
        suggestion="BrüMate gold standards average <1 hyphenated compound per piece."
    ))
```
The hashtag copy detector is similar. It counts consecutive sentences of four words or fewer. Two or more in a row triggers it. “Premium. Leakproof. Insulated. Yours.” — that’s a pattern LLMs reach for when writing punchy product copy, and real brands almost never use it.
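A rough sketch of that detector, using a naive period-based sentence split rather than whatever segmentation style_linter.py actually does:

```python
import re

def has_hashtag_copy(text: str, max_words: int = 4, min_run: int = 2) -> bool:
    """Flag runs of consecutive micro-fragments ("Premium. Leakproof. Insulated. Yours.").
    Naive split on ./!/? stands in for the real linter's sentence segmentation."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    run = 0
    for sentence in sentences:
        if len(sentence.split()) <= max_words:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0   # a normal-length sentence resets the streak
    return False
```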
What it can’t catch
The linter detects surface patterns. It’s good at surface patterns. But everything I wrote about yesterday still applies.
Consider a sentence like: “Engineered for people who refuse to settle, designed to perform anywhere life takes you.”
The linter sees: no banned phrases, no AI-isms, no em dashes, fine sentence variance, no hashtag copy, no hyphenated compounds. Every check passes. And a human still flags it as AI-sounding, because the problem is structural: a vague aspirational claim that could describe any product from any brand.
The linter sees tokens. It can’t measure whether a sentence earns its claim.
This isn’t a failure of the linter. It’s its design boundary. The system has two layers for a reason:
- Draft → style_linter.py → catches surface patterns (free, instant)
- Draft → LLM critic → catches structural patterns (costs money, takes seconds)
The 80/20 observation: roughly 60% of obviously AI-sounding patterns in first drafts are surface-level. The linter catches those for free. Another 15-20% are structural patterns the LLM critic can flag. The remaining 20-25% is the ceiling — the patterns that sound off to a human but pass every automated check.
The design boundary
Yesterday’s post asked whether you can use an AI to detect AI-sounding text, and flagged the recursion problem. The linter avoids that recursion entirely because it doesn’t use AI at all. It’s a rule engine. It catches what rules can catch, reports what it finds, and stays silent about everything else.
The linter that knows what it can’t catch is more useful than the linter that thinks it catches everything.