“Let me tell you, my good friend…”
If you’ve built LLM-powered NPCs, you’ve heard this. Your character starts strong, maybe even impressive. But by turn 20, something’s wrong. They’re repeating themselves. The same phrases. The same gestures. The same rhythm.
I ran 150 turns of dialogue with an LLM-powered character last week. Here’s what I found:
| Pattern | Frequency |
|---|---|
| “takes a long swig from his glass” | 59% of responses |
| “my good friend/man/fellow” | 30% of responses |
| “eyes unfocused” | 15% of responses |
| “let me tell you” | 13% of responses |
My character, Francis—a boozy ex-politician—couldn’t stop drinking. Not because it was dramatically appropriate. Because the LLM fell into a rut.
This isn’t a model problem. It’s a systems problem. And it’s solvable.
Why LLMs Repeat Themselves
LLMs are trained to predict the most likely next token. For character dialogue, this means:
- They latch onto successful patterns (things that “worked” in training)
- They lack session memory of what they’ve already said
- They have no variety pressure pushing them away from repetition
Your character YAML might say “Francis drinks whiskey” once. But the LLM sees that pattern succeed and keeps deploying it. Token by token, response by response, your character becomes a caricature of themselves.
The technical term is mode collapse in generation—the model converges on a small set of high-probability outputs.
The Research-Backed Fix: Diversity Penalties
This isn’t a new problem. In 2016, researchers published “Diverse Beam Search” (arXiv:1610.02424), introducing diversity penalties to neural sequence generation. The idea: penalize the model for repeating itself.
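As a toy illustration only (the paper's actual algorithm penalizes overlap across groups of beams during decoding), a diversity penalty lowers a candidate's score each time its token has already been used:

```python
# Toy sketch of a diversity penalty: subtract a fixed cost per prior use
# of a token when scoring candidates. This is an illustration of the idea,
# not the Diverse Beam Search algorithm itself.

def penalized_score(base_score: float, token: str,
                    used_counts: dict[str, int],
                    penalty: float = 0.5) -> float:
    """Lower a candidate's score for each time its token has already appeared."""
    return base_score - penalty * used_counts.get(token, 0)

used = {"swig": 3, "friend": 1}
print(penalized_score(2.0, "swig", used))     # 0.5 -- penalized three times
print(penalized_score(2.0, "tumbler", used))  # 2.0 -- unused, no penalty
```

The effect is the same in spirit: outputs the session has already leaned on become less attractive than fresh ones.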
The technique works. But there’s a catch: it’s buried in inference code, inaccessible to game writers who actually craft the characters.
The insight: What if we made diversity penalties configurable by writers?
Solution: YAML-Driven Novelty Guards
Here’s the approach we built and validated.
How it works:
- Track phrase usage during the session
- When a limit is hit, inject a variety prompt into the next generation
- The model gets a gentle nudge: “you just did that, try something else”
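The loop above can be sketched in a few lines of Python. The class name, limits, and prompt wording here are illustrative, not the project's actual API:

```python
# Hypothetical sketch of a session-level novelty guard: tally tracked
# phrases per response, and emit a variety prompt once a phrase hits
# its configured limit.
from collections import Counter

class NoveltyGuard:
    def __init__(self, catchphrase_limits: dict[str, int]):
        self.limits = catchphrase_limits
        self.counts = Counter()

    def record(self, response: str) -> None:
        """Tally each tracked phrase that appears in a generated response."""
        for phrase in self.limits:
            if phrase in response.lower():
                self.counts[phrase] += 1

    def variety_prompt(self) -> str:
        """Return a gentle nudge for any phrase that has hit its limit."""
        overused = [p for p, n in self.counts.items() if n >= self.limits[p]]
        if not overused:
            return ""
        return ("You have already used these phrases enough this session: "
                + ", ".join(overused)
                + ". Express the same personality with fresh wording.")

guard = NoveltyGuard({"takes a long swig": 3, "my good friend": 4})
guard.record("He takes a long swig and sighs.")
```

Whatever `variety_prompt()` returns gets appended to the next generation's prompt; when nothing is overused, nothing is injected and the model runs untouched.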
This isn’t rejection sampling (expensive, unpredictable). It’s guidance—working with the model, not fighting it.
The V6 Secret: Action Vocabulary
But novelty guards are only half the fix. The other half: give the model better options.
In an earlier version of our system (V6), we discovered that action vocabulary tables dramatically improved variety.
Instead of the model inventing “takes a swig” over and over, it selects from a curated vocabulary. The variety is baked in.
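A hedged sketch of what such a table might look like in character YAML (the states and entries are invented for illustration; the actual schema may differ):

```yaml
# Illustrative action vocabulary, keyed by emotional state.
# Keys and entries are examples, not the project's actual config.
action_vocabulary:
  melancholy:
    - "(stares into the middle distance)"
    - "(turns the glass without drinking)"
    - "(rubs the bridge of his nose)"
  defensive:
    - "(straightens his jacket)"
    - "(laughs too quickly)"
    - "(waves the question away)"
```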
Key insight: This is the “Code as Director, AI as Screenwriter” pattern. Code provides the palette; AI paints with it.
Prompt Placement Matters
One more trick from the research: recency bias.
LLMs pay more attention to the end of prompts than the middle. If your “don’t repeat yourself” rules are buried in the middle of a long system prompt, they’ll be ignored.
Move your constraints to the last position in the prompt.
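As a sketch, prompt assembly can simply concatenate sections so the variety constraints land last, where recency bias gives them the most weight (function and section names here are assumptions, not the project's code):

```python
# Assemble the system prompt with variety constraints at the end,
# not buried in the middle. Section contents are illustrative.

def build_system_prompt(persona: str, world_rules: str,
                        variety_rules: str) -> str:
    # Order matters: constraints go last, closest to the generation point.
    return "\n\n".join([persona, world_rules, variety_rules])

prompt = build_system_prompt(
    persona="You are Francis, a boozy ex-politician.",
    world_rules="Stay in the 1920s manor setting.",
    variety_rules="Do not reuse gestures or phrases from your last three replies.",
)
```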
This simple reordering can reduce repetition by 20-30% with zero code changes.
Results: 75-Turn Validation
We ran a complete 75-turn, 3-act murder mystery playthrough. Here’s what happened:
Before Fix (59% Repetition Problem)
| Pattern | Frequency | Assessment |
|---|---|---|
| “takes a long [swig]” | 59% | 🔴 Annoying |
| “my good friend/man” | 30% | 🔴 Robotic |
| Unique action variety | ~3 per 20 turns | 🔴 Boring |
After Fix (Novelty Guards + Action Vocab + Prompt Placement)
| Character | Signature Phrase | Frequency | Assessment |
|---|---|---|---|
| Mary (housekeeper) | “Begging your pardon, sir” | 12% | ✅ Sweet spot |
| Anu (financier) | “Let’s talk numbers” | 5% | ✅ Perfect |
| Thomas, Xavier, others | Various | 0% | 🟡 Need encouragement |
Key findings:
- Characters WITH strong YAML vocabulary hit the 8-15% sweet spot naturally
- Characters with sparse YAML definitions underperformed—they need richer options
- The variety prompts prevented any phrase from exceeding limits
The Sweet Spot
| State | Occurrence | Assessment |
|---|---|---|
| Original | 59% | 🔴 Annoying |
| Overcorrected | 0% | 🔴 Robot (no personality) |
| Calibrated | 8-15% | ✅ Character voice |
Catchphrases ARE character voice, not bugs. Mary SHOULD say “begging your pardon” more than Xavier. The goal isn’t elimination—it’s balance.
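One way to audit where a phrase sits relative to that band, as a hedged Python sketch (the thresholds mirror the table above; the function names are invented):

```python
# Illustrative check for the 8-15% catchphrase "sweet spot".

def phrase_frequency(transcript: list[str], phrase: str) -> float:
    """Fraction of responses containing the phrase (case-insensitive)."""
    hits = sum(phrase.lower() in r.lower() for r in transcript)
    return hits / len(transcript)

def assess(freq: float) -> str:
    if freq == 0:
        return "robotic: encourage usage"
    if freq > 0.15:
        return "annoying: tighten limits"
    if freq >= 0.08:
        return "sweet spot"
    return "faint: could use encouragement"

responses = ["Begging your pardon, sir."] + ["..."] * 9
print(assess(phrase_frequency(responses, "begging your pardon")))  # prints "sweet spot"
```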
The Bigger Picture: Flavor vs. Mechanics
Here’s what I’ve learned: mechanics are easy, flavor is hard.
I can build a dialogue system that runs 75 turns without crashing. That’s table stakes. The hard part is making those 75 turns interesting—making a character feel alive instead of like a chatbot wearing a costume.
The techniques in this post—novelty guards, action vocabularies, prompt placement—are all about injecting flavor into mechanical systems. They’re the difference between:
Generic:
“Let me tell you, my good friend, everyone has secrets.”
Specific:
“(gripping the armrest to steady himself) The ‘98 fundraiser…” (stops) “You wouldn’t know about that. Before your time.”
The second one has texture. A specific reference. A self-interruption. Physical struggle. It earns attention.
Implementation Checklist
If you’re building LLM NPCs:
1. Measure your repetition rate. Grep your transcripts. You’ll be surprised.

```shell
grep -c "takes a.*swig\|my good friend" transcript.log
```

2. Add catchphrase limits to your character configs. Start with 3-5 uses max.

```yaml
novelty_rules:
  catchphrase_limits:
    "signature phrase": 4
```

3. Build an action vocabulary per emotional state. 4-5 options each.

```yaml
action_vocabulary:
  nervous: ["(fidgets)", "(glances at door)", "(wrings hands)"]
```

4. Move variety rules to the end of your prompts. Leverage recency bias.

5. Encourage usage, don’t just limit it. Change “Use phrases like X” to “Your signature phrases (use 1-2 per scene): X, Y, Z”
These aren’t silver bullets. But they’re the difference between “neat demo” and “compelling experience.”
What’s Next
We’re continuing to develop:
- Consciousness contamination: Characters breaking the fourth wall at high narrative tension
- Quantum death mechanics: Evidence-driven murder resolution with probability fields
- Memory streams: Stanford Generative Agents-style persistent memory
The murder mystery game engine (MM-KMD) is in active development. If you’re working on similar problems, I’d love to hear your approach.
Got questions? Found a better approach? Reach out: paul@phasespace.co
Appendix: The Full Tech Stack
For the curious, here’s what we’re running:
- Model: Ollama (Gemma2 27B, q8 quantized) on RTX 5090
- Latency: <2s per response
- Framework: FastAPI + WebSocket for real-time streaming
- Config: YAML for all character data, feature flags, game rules
- Testing: 970+ automated tests + Ghost Runner (LLM-as-player)
- Philosophy: “YAML = DNA, AI = Life, Memory = Soul”
The key architectural insight: Code is Director, AI is Screenwriter. The code owns structure (who speaks, when, about what). The AI owns content (actual words, gestures, rhythm). Neither can do the other’s job well.