“Let me tell you, my good friend…”

If you’ve built LLM-powered NPCs, you’ve heard this. Your character starts strong, maybe even impressive. But by turn 20, something’s wrong. They’re repeating themselves. The same phrases. The same gestures. The same rhythm.

I ran 150 turns of dialogue with an LLM-powered character last week. Here’s what I found:

| Pattern | Frequency |
| --- | --- |
| “takes a long swig from his glass” | 59% of responses |
| “my good friend/man/fellow” | 30% of responses |
| “eyes unfocused” | 15% of responses |
| “let me tell you” | 13% of responses |

My character, Francis—a boozy ex-politician—couldn’t stop drinking. Not because it was dramatically appropriate. Because the LLM fell into a rut.

This isn’t a model problem. It’s a systems problem. And it’s solvable.


Why LLMs Repeat Themselves

LLMs are trained to predict the most likely next token. For character dialogue, this means:

  1. They latch onto successful patterns (things that “worked” in training)
  2. They lack session memory of what they’ve already said
  3. They have no variety pressure pushing them away from repetition

Your character YAML might say “Francis drinks whiskey” once. But the LLM sees that pattern succeed and keeps deploying it. Token by token, response by response, your character becomes a caricature of themselves.

The technical term is mode collapse in generation—the model converges on a small set of high-probability outputs.


The Research-Backed Fix: Diversity Penalties

This isn’t a new problem. In 2016, researchers published “Diverse Beam Search” (arXiv:1610.02424), introducing diversity penalties to neural sequence generation. The idea: penalize the model for repeating itself.
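
As a rough sketch of the idea (the Hamming-diversity variant from the paper, not its exact reference implementation): the beam budget is split into groups, and each group's candidate tokens are docked points for tokens that earlier groups already chose at the same step.

```python
# Rough sketch of the diversity penalty from Diverse Beam Search
# (arXiv:1610.02424), Hamming-diversity variant: a candidate token's
# log-probability is reduced in proportion to how many earlier beam
# groups already picked that token at this decoding step.
# Illustration only, not the paper's reference implementation.
def diversity_adjusted_score(log_prob: float, token: str,
                             tokens_chosen_by_earlier_groups: list[str],
                             diversity_strength: float = 0.5) -> float:
    reuse_count = tokens_chosen_by_earlier_groups.count(token)
    return log_prob - diversity_strength * reuse_count
```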

The technique works. But there’s a catch: it’s buried in inference code, inaccessible to game writers who actually craft the characters.

The insight: What if we made diversity penalties configurable by writers?


Solution: YAML-Driven Novelty Guards

Here’s the approach we built and validated:

```yaml
# characters/francis.yaml
novelty_rules:
  catchphrase_limits:
    "let me tell you": 4      # Max uses per session
    "my good friend": 5
    "back in my day": 4

  action_limits:
    "takes a swig": 3
    "takes a long": 4
    "eyes unfocused": 3

  variety_prompts:
    - "Note: Vary your approach this turn. Your catchphrases work best when they land occasionally, not constantly."
    - "Note: Show a different facet of your character here."
    - "Note: Find a fresh angle for this moment."
```

How it works:

  1. Track phrase usage during the session
  2. When a limit is hit, inject a variety prompt into the next generation
  3. The model gets a gentle nudge: “you just did that, try something else”

This isn’t rejection sampling (expensive, unpredictable). It’s guidance—working with the model, not fighting it.
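
Here's a minimal sketch of steps 1 and 2 (the class and method names are illustrative, not the engine's actual API):

```python
# Illustrative novelty guard: count signature phrases as they appear,
# and hand back a variety prompt once any per-session limit is reached.
import re
from collections import Counter

class NoveltyGuard:
    def __init__(self, catchphrase_limits: dict[str, int],
                 variety_prompts: list[str]):
        self.limits = {p.lower(): n for p, n in catchphrase_limits.items()}
        self.variety_prompts = variety_prompts
        self.counts = Counter()
        self.next_prompt = 0

    def record(self, response: str) -> None:
        """Step 1: track phrase usage during the session."""
        text = response.lower()
        for phrase in self.limits:
            self.counts[phrase] += len(re.findall(re.escape(phrase), text))

    def variety_note(self) -> str | None:
        """Step 2: if any limit is hit, return a note to inject next turn."""
        if any(self.counts[p] >= limit for p, limit in self.limits.items()):
            note = self.variety_prompts[self.next_prompt % len(self.variety_prompts)]
            self.next_prompt += 1
            return note
        return None
```

Each turn: `record()` the character's last response, then append whatever `variety_note()` returns to the end of the next prompt.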


The V6 Secret: Action Vocabulary

But novelty guards are only half the fix. The other half: give the model better options.

In an earlier version of our system (V6), we discovered that action vocabulary tables dramatically improved variety:

```yaml
action_vocabulary:
  drunk:
    - "(sloshing whiskey onto the carpet, not noticing)"
    - "(gripping the armrest to steady himself)"
    - "(squinting as if the room is too bright)"
    - "(fumbling with his glass, nearly dropping it)"

  defensive:
    - "(straightening his tie with trembling fingers)"
    - "(stepping back, bumping into the sideboard)"
    - "(glancing toward the door)"

  nostalgic:
    - "(staring into the fire, eyes distant)"
    - "(running a thumb along his wedding ring)"
```

Instead of the model inventing “takes a swig” over and over, it selects from a curated vocabulary. The variety is baked in.
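
One way to wire that up, as a sketch (the helper name and prompt wording are assumptions, not the engine's actual code): the code draws a small rotating palette from the YAML for the character's current emotional state and offers it to the model each turn.

```python
# Sketch: pick a few action beats for the current emotional state,
# skip anything used recently, and present them to the model as options.
import random

def build_action_palette(action_vocabulary: dict[str, list[str]],
                         state: str, recent: set[str], k: int = 3) -> str:
    options = [a for a in action_vocabulary.get(state, []) if a not in recent]
    if not options:                      # everything was used recently: reset
        options = action_vocabulary.get(state, [])
    sample = random.sample(options, min(k, len(options)))
    return ("Physical beats you may use this turn (at most one):\n"
            + "\n".join(f"  {beat}" for beat in sample))
```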

Key insight: This is the “Code as Director, AI as Screenwriter” pattern. Code provides the palette; AI paints with it.


Prompt Placement Matters

One more trick from the research: recency bias.

LLMs pay more attention to the end of prompts than the middle. If your “don’t repeat yourself” rules are buried in the middle of a long system prompt, they’ll be ignored.

Move your constraints to the last position:

```text
[Your character sheet ...]
[Scene context ...]
[Memory of recent turns ...]

IMPORTANT - VARIETY RULES:
- Do not repeat an expression you used in consecutive turns
- Your signature phrases (use 1-2 per scene): "let me tell you", "my good friend"
```

This simple reordering can reduce repetition by 20-30% with zero code changes.
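
In code, it's just a question of concatenation order. A sketch (the function name and section labels are illustrative):

```python
# Sketch: assemble the prompt so the variety rules are the very last thing
# the model reads, where recency bias works in their favor.
def assemble_prompt(character_sheet: str, scene: str, history: str,
                    variety_rules: list[str]) -> str:
    rules = "\n".join(f"- {rule}" for rule in variety_rules)
    return (
        f"{character_sheet}\n\n"
        f"{scene}\n\n"
        f"{history}\n\n"
        f"IMPORTANT - VARIETY RULES:\n{rules}"
    )
```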


Results: 75-Turn Validation

We ran a complete 75-turn, 3-act murder mystery playthrough. Here’s what happened:

Before Fix (59% Repetition Problem)

| Pattern | Frequency | Assessment |
| --- | --- | --- |
| “takes a long [swig]” | 59% | 🔴 Annoying |
| “my good friend/man” | 30% | 🔴 Robotic |
| Unique action variety | ~3 per 20 turns | 🔴 Boring |

After Fix (Novelty Guards + Action Vocab + Prompt Placement)

| Character | Signature Phrase | Frequency | Assessment |
| --- | --- | --- | --- |
| Mary (housekeeper) | “Begging your pardon, sir” | 12% | ✅ Sweet spot |
| Anu (financier) | “Let’s talk numbers” | 5% | ✅ Perfect |
| Thomas, Xavier, others | Various | 0% | 🟡 Need encouragement |

Key findings:

  • Characters WITH strong YAML vocabulary hit the 8-15% sweet spot naturally
  • Characters with sparse YAML definitions underperformed—they need richer options
  • The variety prompts prevented any phrase from exceeding limits

The Sweet Spot

| State | Occurrence | Assessment |
| --- | --- | --- |
| Original | 59% | 🔴 Annoying |
| Overcorrected | 0% | 🔴 Robot (no personality) |
| Calibrated | 8-15% | ✅ Character voice |

Catchphrases ARE character voice, not bugs. Mary SHOULD say “begging your pardon” more than Xavier. The goal isn’t elimination—it’s balance.


The Bigger Picture: Flavor vs. Mechanics

Here’s what I’ve learned: mechanics are easy, flavor is hard.

I can build a dialogue system that runs 75 turns without crashing. That’s table stakes. The hard part is making those 75 turns interesting—making a character feel alive instead of like a chatbot wearing a costume.

The techniques in this post—novelty guards, action vocabularies, prompt placement—are all about injecting flavor into mechanical systems. They’re the difference between:

Generic:

“Let me tell you, my good friend, everyone has secrets.”

Specific:

“(gripping the armrest to steady himself) The ‘98 fundraiser…” (stops) “You wouldn’t know about that. Before your time.”

The second one has texture. A specific reference. A self-interruption. Physical struggle. It earns attention.


Implementation Checklist

If you’re building LLM NPCs:

  1. Measure your repetition rate. Grep your transcripts. You’ll be surprised.

    grep -c "takes a.*swig\|my good friend" transcript.log
    
  2. Add catchphrase limits to your character configs. Start with 3-5 uses max.

    novelty_rules:
      catchphrase_limits:
        "signature phrase": 4
    
  3. Build an action vocabulary per emotional state. 4-5 options each.

    action_vocabulary:
      nervous: ["(fidgets)", "(glances at door)", "(wrings hands)"]
    
  4. Move variety rules to the end of your prompts. Leverage recency bias.

  5. Encourage usage, don’t just limit it. Change “Use phrases like X” to “Your signature phrases (use 1-2 per scene): X, Y, Z”

These aren’t silver bullets. But they’re the difference between “neat demo” and “compelling experience.”


What’s Next

We’re continuing to develop:

  • Consciousness contamination: Characters breaking the fourth wall at high narrative tension
  • Quantum death mechanics: Evidence-driven murder resolution with probability fields
  • Memory streams: Stanford Generative Agents-style persistent memory

The murder mystery game engine (MM-KMD) is in active development. If you’re working on similar problems, I’d love to hear your approach.


Got questions? Found a better approach? Reach out: paul@phasespace.co


Appendix: The Full Tech Stack

For the curious, here’s what we’re running:

  • Model: Ollama (Gemma2 27B, q8 quantized) on RTX 5090
  • Latency: <2s per response
  • Framework: FastAPI + WebSocket for real-time streaming
  • Config: YAML for all character data, feature flags, game rules
  • Testing: 970+ automated tests + Ghost Runner (LLM-as-player)
  • Philosophy: “YAML = DNA, AI = Life, Memory = Soul”

The key architectural insight: Code is Director, AI is Screenwriter. The code owns structure (who speaks, when, about what). The AI owns content (actual words, gestures, rhythm). Neither can do the other’s job well.