
7 Prompt Engineering Mistakes That Kill LLM Output Quality

March 18, 2026 · 7 min read · Prompt Optimizer Team

Most prompts fail for one of seven reasons. The frustrating part is that many of these failures look like model failures when they're actually prompt failures that would vanish with a small change.

1. Describing what you want instead of how you want it

"Write a good LinkedIn post about our product launch."

"Write a 150-word LinkedIn post for senior product managers at B2B SaaS companies. Open with a data point. No emojis. End with a question."

Specify format, length, tone, audience, and structure explicitly.
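One way to make those five fields impossible to forget is a small template that requires them all. A minimal sketch; the function and field names are illustrative, not from any library:

```python
# A tiny prompt builder that forces format, length, tone, audience,
# and structure to be stated explicitly on every call.
def build_prompt(task, audience, length, tone, structure):
    return (
        f"Task: {task}\n"
        f"Audience: {audience}\n"
        f"Length: {length}\n"
        f"Tone: {tone}\n"
        f"Structure: {structure}"
    )

prompt = build_prompt(
    task="Write a LinkedIn post about our product launch",
    audience="senior product managers at B2B SaaS companies",
    length="150 words",
    tone="direct, no emojis",
    structure="open with a data point, end with a question",
)
```

If a field is empty, that's your cue that the prompt is underspecified before the model ever sees it.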

2. Giving the model no persona or role

"You are a senior UX writer at a B2B software company. You write in a direct, jargon-free style. You prioritize clarity over cleverness."

A role narrows the sampling space toward what you actually want.
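In a chat API, the role belongs in the system message so it shapes every turn that follows. A minimal sketch using the common messages format; the provider call itself is omitted since it varies by SDK:

```python
# The persona sits in the system message; the task arrives separately
# as the user message. This is the standard chat-completion shape.
messages = [
    {
        "role": "system",
        "content": (
            "You are a senior UX writer at a B2B software company. "
            "You write in a direct, jargon-free style. "
            "You prioritize clarity over cleverness."
        ),
    },
    {
        "role": "user",
        "content": "Rewrite this error message: 'An error occurred.'",
    },
]
# response = client.chat(messages)  # placeholder for your provider's call
```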

3. Asking for multiple outputs without specifying priority

"Write a LinkedIn post, tweet thread, and blog intro about this topic."

Separate into three prompts, or explicitly specify what each format requires.
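Splitting can be as simple as one prompt per format, each carrying its own requirements instead of competing inside a single request. A sketch with illustrative wording:

```python
# One task per prompt: each format states its own length and structure
# rather than leaving the model to ration effort across all three.
topic = "our product launch"

prompts = {
    "linkedin": f"Write a 150-word LinkedIn post about {topic}. End with a question.",
    "tweet_thread": f"Write a 5-tweet thread about {topic}. Keep each tweet under 280 characters.",
    "blog_intro": f"Write a 2-paragraph blog intro about {topic}. Hook the reader in the first sentence.",
}

# Send each as a separate request rather than bundling them.
```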

4. Over-constraining with contradictory requirements

"Write a casual, friendly, professional, authoritative post that's both concise and comprehensive."

"Professional tone, optimized for a senior technical audience. Clarity over brevity, no paragraph longer than 4 lines."

Pick your primary constraint and make everything else secondary.

5. Not specifying what "done" looks like

"Explain machine learning to me"

"In 2–3 paragraphs" / "As a 5-item list" / "Under 200 words"

Scope gives the model a target.

6. Dropping context between iterations

"Now make it more casual" (more casual than what?)

Always pass the original prompt as context when optimizing.
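The simplest way to keep that context is to append the follow-up to the same message history instead of starting a fresh conversation. A sketch; `call_model` is a placeholder for your provider's API:

```python
# Iterating on an output: resend the prior exchange so
# "more casual" has a referent.
history = [
    {"role": "user", "content": "Write a 150-word LinkedIn post about our launch."},
    {"role": "assistant", "content": "<first draft returned by the model>"},
]

# The refinement is appended to the same history, not sent alone:
history.append({"role": "user", "content": "Now make it more casual."})
# revised = call_model(history)  # placeholder for the actual call
```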

7. Trusting output quality without testing against varied inputs

Testing with one example, deciding it works, and shipping it.

Test against at least 10–20 varied inputs: unusually short/long, off-topic, ambiguous.

A prompt is only as good as its worst-case performance on realistic inputs.
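A batch check over varied inputs can be a few lines of plain Python. In this sketch, `run_prompt` is a stub standing in for the real model call so the harness itself is runnable:

```python
# A minimal batch harness: run the prompt over deliberately varied
# inputs and assert on each output (here, a 150-word ceiling).
def run_prompt(topic: str) -> str:
    return f"Post about {topic}."  # stub; replace with a real model call

test_inputs = [
    "AI",                                                            # unusually short
    "quarterly B2B churn-reduction playbooks for mid-market SaaS",   # unusually long
    "my cat's birthday",                                             # off-topic
    "it",                                                            # ambiguous
]

failures = [t for t in test_inputs if len(run_prompt(t).split()) > 150]
pass_rate = 1 - len(failures) / len(test_inputs)
```

In practice you'd extend the input list to the 10–20 cases mentioned above and add one assertion per requirement (tone, structure, forbidden content), not just length.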

Fixing All Seven with Structured Evaluation

| Mistake | What to measure |
| --- | --- |
| Vague requirements | LLM-rubric assertion on specificity |
| No persona | Output tone consistency across inputs |
| Multiple outputs, no priority | Pass rate on each output type |
| Contradictory constraints | Constraint preservation check |
| No termination condition | length-max / length-min assertions |
| Dropped context | Semantic drift score between iterations |
| Untested on varied inputs | Batch evaluation across diverse test cases |
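The length assertions in the table reduce to simple word-count checks. A sketch with illustrative function names, not the product's actual API:

```python
# Word-count assertions: the "done" condition from mistake 5,
# expressed as checks you can run on every output.
def length_min(text: str, n: int) -> bool:
    return len(text.split()) >= n

def length_max(text: str, n: int) -> bool:
    return len(text.split()) <= n

output = "Short launch post with a closing question?"
assert length_min(output, 3) and length_max(output, 150)
```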

Run assertion tests on your prompts

No dataset required. Define your assertions and get a pass rate in under a minute.

Try quick-evaluate