On Difficulty
Static Shocks and Rubber Bands
From a recent Seth Skorkowsky Video
Never Scale The Power-Level: The essential idea behind this is the world of your game is a living world. There are things in it that are bigger and badder than your Player Characters are. Like with real life, your Player Characters might encounter a threat that is tougher and meaner than they are… in which case they can either hide, negotiate, flee, or… die if they try to fight this thing that is clearly way more powerful than they are. Okay yeah — I’m fine with this so far; I’ve got a few exceptions I’m thinking about, but let’s keep going with this.
Therefore Game Masters are doing a disservice to the integrity of the game and the world if they adjust the threats based off whatever the character’s power level is. Yeahhhhhh, that’s where I have a problem with this advice. I agree that the Player Characters should have the opportunity of encountering opponents that are way far beyond their capability; just way tougher and more powerful than they are. In those cases, retreat, negotiation, surrender and whatnot — those should always be options instead of y’know fighting an opponent that’s way more powerful than them.
However, any adventure I run for my players has a reasonable chance of success and survival if they play this. I’m not going to curbstomp a bunch of first-level characters with an ancient red dragon for the sake of it being a “realistic world”, man. That’s not fun for me, that’s not fun for anyone else at the table.
Because most of the people giving this “sage advice” in my comments over the years… how do I put this… are gray-beard grognards that then go on about how hardcore and realistic their OD&D and AD&D games are while sissy games like 5e spoiled everyone who says otherwise — they seem to forget that the old AD&D games introduced the idea of power-scaling in the first place. When [the old modules] say what the number of characters and levels an adventure is for; that’s what that means. That is power-scaling.
Semantics
First, I think it’s very important to agree on what “Difficulty” is. I have an article from 2021, What Is Difficulty? that claims that all we’re ever doing is processing information, and then translating that into a decision about how to physically act. You can make the choice about how to physically act difficult (as in chess), and you can also make the physical action itself difficult (as in guitar hero), or both (fortnite, rocket league, starcraft, etc). Dwiz wrote almost exactly the same thing two months ago.
My original article was missing impact; good decisions and clean execution need to have meaningfully different outcomes from bad decisions and sloppy execution (otherwise it didn’t matter). In that context, an ancient red dragon from Seth’s example isn’t “high difficulty” for a table of first-level characters; they’re just going to die so there’s no real difficult puzzle to solve.
Power-scaling, like Seth is referring to, is one (very common) method of keeping difficulty in an area where the decisions the players make are impactful.
Wordle
I spun up https://beaurancourt.github.io/wordle-demonstration/ (source) to demonstrate an idea that’s been kicking around in my head ever since I played absurdle.
If you’re not familiar with Wordle, you can think of it as an evolution of mastermind. The rules are pretty simple: there is a secret 5-letter word, you guess a 5-letter word and get feedback. A letter will be colored green if it’s the right letter in the right place, yellow if it’s the right letter in the wrong place, and gray otherwise. If the word is PRONE, then a guess of CRANE will show a green -R-NE, while a guess of PILOT will show a green P and a yellow O.
Wordle has static difficulty; if you’re bad at guessing then you’ll have a harder time solving a puzzle than someone who is good. This has varying appeal: it might be so easy as to be boring if you’re very good, or so difficult as to be frustrating. There’s also a wide range of experience; you can just get lucky and just so happen to guess the word correctly on your first or second try.
https://beaurancourt.github.io/wordle-demonstration/ explores what wordle feels like when given dynamic difficulty. Rather than pre-committing to a secret word, the puzzle is allowed to be flexible. It can’t violate any of the evidence it’s given you so far, but it is allowed to “cheat”.
Every time you make a guess, you gain information. When a letter is gray, you can eliminate all of the words that have that letter. When a letter is yellow, you can eliminate all of the words that don’t have that letter. When a letter is green, you can eliminate all of the words that don’t have that letter in that spot.
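Here's a minimal sketch of that elimination step in Python. It is not the demonstration's actual source; the function names are made up for illustration, and the handling of repeated letters follows standard Wordle rules, which the simplified description above glosses over. Rather than encoding the three rules separately, it keeps every word that would have produced the exact clue you saw.

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Return a clue string: G = green, Y = yellow, - = gray."""
    clue = ["-"] * len(guess)
    # Answer letters that were not matched green are available for yellows.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            clue[i] = "G"
        elif unmatched[g] > 0:
            clue[i] = "Y"
            unmatched[g] -= 1
    return "".join(clue)


def consistent(candidates: list[str], guess: str, clue: str) -> list[str]:
    """Keep only the words that would have produced this clue for this guess."""
    return [word for word in candidates if feedback(guess, word) == clue]


# The two examples from earlier, with PRONE as the secret word.
print(feedback("CRANE", "PRONE"))  # -G-GG
print(feedback("PILOT", "PRONE"))  # G--Y-
```

Filtering by "would this word have given me the same clue?" covers gray, yellow, and green in one comparison.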
If we know the full list of possible answers ahead of time, we can simulate what clue we’d get back for our guess for each word. The clue reduces the possible answer list. The guess that, on average, reduces the answer list the most is the “best guess”.
You can also calculate how “lucky” you got; there’s some amount of information you expect to get for a guess, and if you got more information then you got lucky, and if you got less, you got unlucky.
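As a rough sketch of both ideas: the "best guess" below minimizes the expected number of remaining answers, and "luck" is measured in bits of information (actual minus expected). Measuring in bits is my assumption here, as are all of the function names; the demonstration may compute these differently. The clue helper is included so the block runs on its own.

```python
import math
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Return a clue string: G = green, Y = yellow, - = gray."""
    clue = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            clue[i] = "G"
        elif unmatched[g] > 0:
            clue[i] = "Y"
            unmatched[g] -= 1
    return "".join(clue)


def bucket_sizes(guess: str, candidates: list[str]) -> list[int]:
    """How many candidate answers would produce each possible clue."""
    return list(Counter(feedback(guess, w) for w in candidates).values())


def expected_remaining(guess: str, candidates: list[str]) -> float:
    """Average size of the answer list after this guess (every answer equally likely)."""
    n = len(candidates)
    return sum(s * s for s in bucket_sizes(guess, candidates)) / n


def best_guess(guesses: list[str], candidates: list[str]) -> str:
    """The guess that leaves the fewest answers on average."""
    return min(guesses, key=lambda g: expected_remaining(g, candidates))


def expected_bits(guess: str, candidates: list[str]) -> float:
    """Information (in bits) you expect to gain from this guess."""
    n = len(candidates)
    return sum((s / n) * math.log2(n / s) for s in bucket_sizes(guess, candidates))


def luck(guess: str, answer: str, candidates: list[str]) -> float:
    """Bits actually gained minus bits expected; positive means lucky."""
    clue = feedback(guess, answer)
    remaining = sum(1 for w in candidates if feedback(guess, w) == clue)
    actual = math.log2(len(candidates) / remaining)
    return actual - expected_bits(guess, candidates)
```

With a toy answer list of PRONE, DRONE, PHONE, and CRANE, guessing CRANE splits the list into groups of 2, 1, and 1, so you expect 1.5 bits. If the answer was PHONE you got 2 bits (lucky by half a bit); if it was PRONE you got 1 bit (unlucky by half a bit).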
The various variants in the demonstration put their thumb on the scales. For easy mode, you’ll always be given quite a bit more information than you’d expect to.
Normally, guessing FUFFY is terrible; you find out way less information than you’d get with RAISE or CRANE. But in easy mode, if you guess FUFFY… then the actual answer will tend to have a U and an F.
Where Easy mode is always trying to help you out and make you feel like you’re good at wordle, absurdle does the opposite:
It keeps providing the least amount of information possible, drawing out the game as long as it can until you corner it.
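One way to sketch both kinds of "cheating": group the remaining answers by the clue they would produce, then hand back the largest group (absurdle-style, least information) or the smallest group (a hypothetical stand-in for easy mode, most information). The mode names and the smallest-group rule are my assumptions for illustration, not a description of how the demonstration actually implements easy mode.

```python
from collections import Counter, defaultdict


def feedback(guess: str, answer: str) -> str:
    """Return a clue string: G = green, Y = yellow, - = gray."""
    clue = ["-"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            clue[i] = "G"
        elif unmatched[g] > 0:
            clue[i] = "Y"
            unmatched[g] -= 1
    return "".join(clue)


def respond(guess: str, candidates: list[str], mode: str) -> tuple[str, list[str]]:
    """Pick a clue after seeing the guess, without contradicting earlier evidence.

    `candidates` must already be consistent with every previous clue.
    mode="absurdle" keeps the biggest group of answers alive;
    any other mode (e.g. "easy") keeps the smallest group.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for word in candidates:
        groups[feedback(guess, word)].append(word)
    pick = max if mode == "absurdle" else min
    # Break ties on the clue string so the choice is deterministic.
    clue = pick(groups, key=lambda c: (len(groups[c]), c))
    return clue, groups[clue]


words = ["PRONE", "DRONE", "PHONE", "CRANE"]
print(respond("CRANE", words, "absurdle"))  # ('-G-GG', ['PRONE', 'DRONE'])
print(respond("CRANE", words, "easy"))      # ('---GG', ['PHONE'])
```

Either way the puzzle never lies about past clues; it just delays committing to an answer until your guess tells it which commitment helps or hurts you the most.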
Diablo 4
The current season of diablo 4 has sixteen difficulty levels. At each difficulty level, monster hp and damage increase, but so does xp, gold, and loot quality.
In diablo, you scale very fast, and there are razor-thin margins where the difficulty feels correct. The game has to account for a dizzying amount of character customization; classes, skill builds, paragon builds, dozens of unique pieces of gear, gear levels, random stats on the gear, etc.
Rather than attempt to design a difficulty, or figure out what level of difficulty you “should” be in to have the intended experience (where your decisions lead to meaningfully different outcomes), they instead let it be player-driven.
Quickly your character will become a god, capable of sprinting through areas, annihilating everything you look at. If you’re playing on too low of a difficulty, stuff will legitimately die just by being near you and can’t hope to damage you. That’s the cue to up the difficulty. If you up it too far, you start being unable to actually damage monsters and they’re one-shotting you (note how torment-12 monsters have 27x more HP and deal 2.6x more damage than torment-9 monsters).
In my experience, this still wasn’t enough; it always felt like the game was either mind-numbingly easy or numerically impossible, and it was very rare where it felt like my moment-to-moment gameplay mattered.
Instead, it felt like the lion’s share of my effectiveness came from my build decisions (class/skills/paragon/gear/etc). The trouble is that you effectively make that decision once, and then spend dozens of hours making moment-to-moment decisions that don’t matter.
TTRPGs
Getting back to the Seth quote:
[…] your game is a living world. […] Like with real life, your Player Characters might encounter a threat that is tougher and meaner than they are… in which case they can either hide, negotiate, flee, or… die […]
Therefore Game Masters are doing a disservice to the integrity of the game and the world if they adjust the threats based off whatever the character’s power level is. […]
This is directly equivalent to the idea I’m demonstrating with Wordle. If my players do “good” stuff; they pick strong builds at character creation and make good combat decisions, then against static difficulty (like a totally pre-written module) they’ll have a much easier time. If they pick weak builds, lack party synergy, or make bad decisions, then they’ll have a much harder time.
If I put my thumb on the scale and start adjusting for their capabilities, then some subset of their decisions get invalidated.
As an easy demonstration, imagine that Strong Party has twice the damage output as Weak Party, so I double the HP of everything Strong Party fights, to make their effectiveness the same. They went through the optimization and coordination effort to try to make good decisions, but then I went and removed the impact of those decisions, just like how the Wordle variants can post-hoc switch out the answer to make your guess good or bad.
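To put toy numbers on that, here's a small sketch. The damage and HP values are made up purely for illustration; only the "twice the damage, twice the HP" relationship comes from the example above.

```python
import math


def rounds_to_kill(monster_hp: int, party_damage_per_round: int) -> int:
    """Rounds of combat needed to drop a monster, assuming steady damage."""
    return math.ceil(monster_hp / party_damage_per_round)


# Hypothetical numbers: Strong Party deals twice what Weak Party does.
weak_damage, strong_damage = 10, 20
monster_hp = 40

# Static difficulty: the same monster for both parties.
print(rounds_to_kill(monster_hp, weak_damage))        # 4
print(rounds_to_kill(monster_hp, strong_damage))      # 2

# Scaled difficulty: Strong Party faces double HP.
print(rounds_to_kill(monster_hp * 2, strong_damage))  # 4
```

Under scaling, Strong Party's fights last exactly as long as Weak Party's, so the optimization bought them nothing at the table.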
Yet, if we try to implement static difficulty, we run into the diablo problem: it’s very easy for huge swaths of your decisions to be no-impact because of your previous decisions. Your party will be able to stomp content to the degree that literal months can be spent in a weird victory lap. TTRPG characters tend to have a low amount of highly impactful decisions (like class choices, feat choices, spell selection, etc) be front-loaded at character creation, and then drip-fed as they level up.
To use BX as an example, say a table with 6 players roll the following arrays
STR, INT, WIS, DEX, CON, CHA
Alice: 15 (+1), 18 (+3), 14 (+1), 8 (-1), 4 (-2), 11 (0)
Bob: 8 (-1), 9 (0), 12 (0), 13(+1), 8 (-1), 8 (-1)
Carol: 6 (-1), 8 (-1), 8 (-1), 10 (0), 16 (+2), 13 (+1)
David: 12 (0), 9 (0), 11 (0), 15 (+1), 10 (0), 16 (+2)
Eve: 13 (+1), 9 (0), 13 (+1), 9 (0), 15 (+1), 11 (0)
Fred: 10 (0), 12 (0), 10 (0), 13 (+1), 10 (0), 10 (0)
The party will be much stronger if the players analyze and coordinate than if they pick based on personal preference and vibes.
Analyze/coordinate might look like:
Many combats take place at doors and hallways, and we can usually retreat to such a position
We can fit three people abreast in a hallway with smaller weapons and shields
The other three people can use spears or ranged weapons
Dwarves are better than fighters (tiny xp penalty, infravision, better saving throws), and have the option to lower their INT or WIS to raise their STR
Elves are better than magic users (after they cast their spell, they revert to being fighters and can use spears/polearms/bows)
The campaign probably won’t last past the dwarf/elf level cap
Clerics are extremely good front-line combatants; they can wear plate armor and shields and can heal themselves and other front-liners
The most important stats for front-liners are dex and con; that’s what keeps them alive
CON is more important for clerics than dwarves (+1 on a d6 is +29%, vs +1 on a d8 is +22%)
We need a thief to pick locks, disable traps, and climb stuff.
Thieves can use any weapon, so let’s give them a polearm and a bow; most important stat is STR for the bonus polearm damage (not dex, since they don’t need the AC and the XP boost doesn’t matter)
So they grab a dwarf, two clerics, two elves, and a thief.
Alice plays an Elf, and chooses to lower their WIS from 14 to 12 to get their STR up to 16.
Bob plays an Elf and lowers their WIS from 12 to 10 to raise their STR up to 9.
Carol plays a thief; the +2 con represents an 80% HP increase over average
David plays a dwarf and lowers their WIS from 11 to 9 to get their STR up to 13 (+1 dex, 0 con, +1 str)
Eve plays a Cleric (+0 dex, +1 con, +1 str)
Fred plays a Cleric (+1 dex, +0 con, +0 str)
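The HP percentages quoted above (+29% for a cleric, +22% for a dwarf, +80% for Carol's thief) fall out of comparing average hit die rolls with and without the CON bonus. A quick sketch, assuming the usual BX hit dice (d4 thief, d6 cleric, d8 dwarf) and ignoring any minimum-HP rule, which only matters for penalties:

```python
from fractions import Fraction


def average_hp(die_size: int, con_mod: int = 0) -> Fraction:
    """Average result of one hit die plus the CON modifier."""
    return Fraction(die_size + 1, 2) + con_mod


def relative_gain(die_size: int, con_mod: int) -> float:
    """Fractional HP increase from the CON modifier versus no modifier."""
    base = average_hp(die_size)
    return float((average_hp(die_size, con_mod) - base) / base)


print(f"cleric d6, +1 CON: {relative_gain(6, 1):.0%}")  # 29%
print(f"dwarf  d8, +1 CON: {relative_gain(8, 1):.0%}")  # 22%
print(f"thief  d4, +2 CON: {relative_gain(4, 2):.0%}")  # 80%
```

The smaller the hit die, the more a flat bonus is worth, which is why the CON bonus goes further on a cleric or thief than on a dwarf.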
Vibes/Feels might look like:
Alice plays a magic user because 18 int sounds awesome and it’s the wizard stat.
Bob plays a thief because DEX is the thief stat.
Carol thinks it would be fun to roleplay a weak but tanky fighter (-1 str, +2 con)
David wants to be a nimble and charismatic hobbit (+1 dex, +2 cha) like Frodo.
Eve plays a fighter; +1 str and +1 con is obvious fighter material. Wants to wield a big sword.
Fred plays a thief since that’s the only class he’d get bonus XP on.
Try to imagine how much more effective the first party is than the second. They have three plate-wearers, two clerics that can cast cure light wounds and turn undead, two characters that can cast arcane spells each day. Their whole party can attack effectively (clerics and dwarf in the front, elves and thief in the back).
In plate and shield, David and Fred have 18 AC at level 1 (16 from plate, +1 from dex, +1 from shield); Eve, with no DEX bonus, has 17.
Then the second party; no cleric so no healing. Two thieves that’ll step on each other’s toes. They only have two natural front-liners and neither has positive DEX, so they have 17 (or 16 with no shield) instead of 18 AC, which means they get hit 20% or 25% of the time by orcs instead of 15%, representing a ~34% decrease in survivability.
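The hit chances above can be checked with a few lines of Python. This sketch assumes ascending AC, a d20 attack roll that hits when it meets or beats the target's AC, and an orc attack bonus of +0; none of those details are spelled out here, so treat them as assumptions. It doesn't try to reproduce the ~34% figure, which depends on how survivability is measured and on the mix of AC 17 and AC 16 characters.

```python
def hit_chance(target_ac: int, attack_bonus: int = 0) -> float:
    """Chance that d20 + attack_bonus meets or beats target_ac."""
    needed = target_ac - attack_bonus           # minimum d20 roll that hits
    hitting_faces = max(0, min(20, 21 - needed))
    return hitting_faces / 20


for ac in (18, 17, 16):
    chance = hit_chance(ac)
    extra = chance / hit_chance(18) - 1
    print(f"AC {ac}: hit {chance:.0%} of the time, {extra:+.0%} hits taken vs AC 18")
```

Going from 15% to 20% is a third more incoming hits, and 15% to 25% is two thirds more, so a single point of AC matters a lot at this end of the d20.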
The second party is just way worse than the first, and that’s in a game where there’s very little optimization to be done. Imagine the gulf in WWN/Pathfinder/5e. These parties should be tackling different content to keep the game interesting.
That’s fine if the GM is only trying to stay a couple of steps ahead of the players in terms of prep, but I think it invalidates the idea of spinning up a whole world ahead of time.
Here’s an example from ACKS 2e:
We’re creating all of this content ahead of time. I love the explicit breakdown here, it seems like the only real way to try to achieve static difficulty in a game like this. Despite the work, I think the inevitable result is that almost all of the content will be uninterestingly easy or difficult at any given time. Players can try to hunt for the needle in the haystack, but if I know anything about gamers, when given the choice between optimizing for fun and optimizing for %success, many players will ruin their own enjoyment in exchange for efficiency.
"Given the opportunity, players will optimize the fun right out of a game."
— Soren Johnson, Lead Designer of Civilization
Here’s some examples:
WoW players doing hundreds of hours of Alterac Valley for Don Julio's Band
Diablo 3 players doing the ~1 minute long loop of Road to Alcarnus hundreds of times
Hardcore WoW players almost exclusively doing low-level quests and fighting low-level mobs to get to 60.
The Destiny 1 Loot Cave
In the static-ttrpg-difficulty context, all of this too-easy content still has loot (often too much) and so is still very rewarding. It becomes very efficient for strong 6th level characters to go “farm” 3-4th level content. It’s not interesting, but it’s efficient.
So now, because of a few hours spent making meaningful character creation/build choices, way more hours at the table are spent playing wrong-difficulty content. The only real ways around this are to…
Dynamically adjust the difficulty of the content
Build a very tightly balanced game so that it’s hard to escape the bounds of the intended difficulty
I’ve never once seen the second one happen, so if we want to optimize for meaningful decisions at the table, we have to scale the power level, as Seth says.
Cursed Design
This is one of my favorite GDC videos. In it, Alex Jaffe (brother of Taliesin Jaffe of Critical Role fame, funny enough) goes over what he calls “Cursed Problems” - where two “core promises” of your game are in internal conflict.
I posit that one of the Cursed Problems for RPGs in general is
I want to have lots of build customization and freedom
I want well-balanced and interesting combat/difficulty
By giving players the freedom to craft a build, they will inevitably come up with builds that are weaker or stronger than the baseline you balanced your game around. The more freedom there is, the larger of a combinatorial problem you have, and eventually the balancing problem becomes intractable.
In an OSR context, if you must have static difficulty, I highly recommend constraining character creation as much as possible, so something like 7voz (3 classes total, very low impact for random stats, though I think a static array would be even better, all weapons do d6 damage) ends up being way more workable for this kind of play than something with a lot of room for optimization like ACKS 2e.

Let's say that I'm DMing Arden Vul for three different groups. The first group A is your optimized party, the second group B is your unoptimized party, and the third group C is also unoptimized but I use GM prerogative to start them at level 5. What will happen? Well, likely B will stick around in Lankos Basement fighting rats, while A can advance deeper into the dungeon but not as deep as C. None of them can go down to the deepest levels and assault the factions there. The sandbox balances itself: players will seek a danger level that's appropriate. Is there a risk that party C will start farming rats in Lankos Basement like a deranged WoW player? Not really: tabletop game time is too precious to waste and most players realize this in my experience. I think this solves your cursed problem: players can go where they want and they are unlikely to be WoW-level degenerate about it.
When I think of difficulty in OSR games, I don't think much about monster combat effectiveness. Say I make a simple dungeon: room 1 has a bunch of orcs, room 2 is empty and room 3 has some treasure. How well the players optimize the party determines how likely they are to defeat the orcs, but optimizing the party isn't that hard (as you note). I can add more orcs to room 1, or make them stronger. This will make the players more likely to fail. But it doesn't affect how their choices impact the risk of success: their best strategy is still to optimize the party, same as before. Rolling 10 on a d10 is "more difficult" than rolling 6 on a d6, but is it really when skill doesn't matter?
Now if I add a chess puzzle to room 2 then "real" difficulty comes into play. How hard do I make the puzzle? Or if I add a NPC to negotiate with: How well can the players convince the NPC? This is the player skill I'm interested in.
The true difficulty in OSR games mostly comes from faction play, in my experience. And per our previous discussion on your previous post that's a lot of GM fiat. But an adventure where the faction descriptions are less forgiving ("The vampires will lie and stab the PCs in the back, the goblins will adapt to every threat, the kobolds have an excellent order of battle", etc. ) will truly be more difficult than an adventure with "easy" factions ("The orcs are brutish and easy to fool, the ghouls are prone to infighting, the myconids will worship any proven magic user as a god", etc).
Finally the ancient red dragon. Examples like it are common. I don't buy either of your takes: If party B from earlier waltzes up to the dragon of Arden Vul, the dragon can easily kill them. But in my game it won't. Instead it will dominate the PCs and try to make them do its bidding. This can be a good deal for the PCs! Powerful dragons make great patrons. If the PCs play smart (here: difficulty!), they can profit mightily from the relationship. If the players insult the dragon or attack it, the dragon will eat them instead (bad play is punished).
This may be a bit of a ramble.
I think there is another aspect to this as well. I don't think you can actually escape dynamic difficulty, you can only try to keep it stable-ish. I know for a fact that my rulings are not entirely consistent. I'll make a decision one night, and three months later, I'll rule differently because I'm either tired and made a mistake, have gained new information, or just am tired of my group being in the same room after 2 hours of play and I want to keep things moving. That's going to have an effect on difficulty, even if I'm trying to stay consistent.
Something I'm coming to understand is that a lot of GMing and game design is about approximation. I work in a medical laboratory, so the reference that I'm going to use is quality control charts.
When working with lab tests you have three kinds of samples: Calibrators, Controls, and Patient Samples.
A calibrator is a fixed value. If your machine shows something else, your machine is broken or there is a misalignment somewhere.
A Control is a range of expected values, and that range will fluctuate over time. If things run a little high one morning, that's not a big deal. If it runs high over multiple days or shifts it's a problem.
Then you have patient samples, which are chaotic little monsters with no respect for the rules. They can be, and often are, anywhere on a chart.
Getting back to game design and GMing, I don't want the extremes of a calibrator or the chaos of a patient sample. I want to establish a comfortable range of possibilities. If the party steps outside of that range occasionally, that's fine. Sometimes it's fun to take on a challenge you know you shouldn't be able to, or to absolutely curb stomp a handful of low level NPCs. What I want to avoid is constantly being outside the intended ranges.
Keeping content within the ranges is the job of the dungeon master and adventure designers. Sometimes that means details need massaged a little. Other times it falls into place pretty easily. The simulation is, at least a little, flexible. If the party makes it clear they just want to talk, the dragon will probably let them. If the party cracks jokes at the dragon's expense to its face? Well, different story.