What Makes a Good Prompt?
A prompt that works once is not the same as a prompt that works every time. The difference comes down to five dimensions that predict whether a prompt is ready for automation or still needs human supervision.
Clarity and Ambiguity
Weight: 25% of overall score
A vague instruction produces different output every time. An explicit instruction with output format, length constraints, and boundaries produces the same structure every time. The difference is not subtle. It is the difference between a prompt you can automate and a prompt you have to babysit.
Weak
Summarize this article.
Strong
You are a research analyst. Summarize this article in exactly 3 bullet points. Each bullet must start with a category label in brackets (e.g., [Finding], [Risk], [Action]). Keep each bullet under 40 words. Do not add information not present in the source.
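Because the Strong prompt states its rules explicitly, each rule can be checked by a script before the output moves downstream. The following is a minimal sketch, not a definitive implementation. It assumes the model returns one bullet per line, optionally prefixed with "-", "*", or "•", and it counts the bracketed label as one word. The function name `check_summary` is hypothetical.

```python
import re

# Assumption: one bullet per line, with an optional "-", "*", or "•" marker,
# followed by a bracketed category label and then the bullet text.
BULLET_RE = re.compile(r"^(?:[-*•]\s*)?\[[^\]]+\]\s*\S")


def check_summary(output: str, max_words: int = 40, expected_bullets: int = 3) -> list[str]:
    """Return a list of rule violations; an empty list means the output passed."""
    problems = []
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]

    # Rule: exactly 3 bullet points.
    if len(lines) != expected_bullets:
        problems.append(f"expected {expected_bullets} bullets, got {len(lines)}")

    for i, line in enumerate(lines, start=1):
        # Rule: each bullet starts with a category label in brackets.
        if BULLET_RE.match(line) is None:
            problems.append(f"bullet {i}: missing bracketed category label")
            continue
        # Rule: each bullet stays under the word limit (marker excluded).
        word_count = len(line.lstrip("-*• ").split())
        if word_count >= max_words:
            problems.append(f"bullet {i}: {word_count} words, limit is under {max_words}")
    return problems
```

An empty result means the structure is safe to pass along; anything else can trigger a retry or a human check. The Weak prompt offers no equivalent, because it defines no rules to test against.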
Context Completeness
Weight: 20% of overall score
Without context, the model fills in gaps with its own assumptions. A role, an audience, and a use case give it a frame. The frame determines whether the output is useful or just plausible.
Weak
Extract action items from the meeting.
Strong
You are an operations assistant for a 5-person agency. Extract action items from client meeting notes. Each action item must include: owner, deadline, and priority (high/medium/low). If an owner or deadline is missing, return null instead of guessing. The output is used by the project manager to update the team tracker.
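The null rule in the Strong prompt pays off on the consuming side: missing owners and deadlines arrive as explicit gaps instead of plausible guesses. A rough sketch of that handling follows. The prompt does not specify an output format, so the JSON array and the key names (`task`, `owner`, `deadline`, `priority`) are assumptions made for illustration, as are `ActionItem` and `parse_action_items`.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_PRIORITIES = {"high", "medium", "low"}


@dataclass
class ActionItem:
    task: str                # hypothetical field name
    owner: Optional[str]     # None when the model returned null
    deadline: Optional[str]  # None when the model returned null
    priority: str


def parse_action_items(raw: str) -> tuple[list[ActionItem], list[ActionItem]]:
    """Split model output into (complete, needs_follow_up) action items.

    Assumes `raw` is a JSON array of objects; that format is not part of
    the original prompt.
    """
    complete, needs_follow_up = [], []
    for entry in json.loads(raw):
        priority = str(entry.get("priority", "")).lower()
        if priority not in VALID_PRIORITIES:
            raise ValueError(f"unexpected priority: {entry.get('priority')!r}")
        item = ActionItem(
            task=entry["task"],
            owner=entry.get("owner"),
            deadline=entry.get("deadline"),
            priority=priority,
        )
        # A null owner or deadline is a signal to ask a human, not an error.
        if item.owner is None or item.deadline is None:
            needs_follow_up.append(item)
        else:
            complete.append(item)
    return complete, needs_follow_up
```

Items in the second list can go to the project manager as open questions, while the first list updates the tracker directly.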
Automation Robustness
Weight: 25% of overall score
A binary prompt with no fallback forces the model to guess when it is uncertain. A ternary prompt with a safety valve ('review needed') catches the edge cases that break automation pipelines. Robust prompts anticipate the gray area.
Weak
Classify this email as urgent or not.
Strong
Classify this email as 'urgent', 'not urgent', or 'review needed'. If the email contains words like 'urgent', 'ASAP', 'deadline today', or 'critical', classify it as 'urgent'. If it clearly carries no time pressure, classify it as 'not urgent'. If you cannot determine urgency with reasonable confidence, classify it as 'review needed' and explain why.
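The safety valve only helps if the pipeline honors it. One way to do that, sketched below under stated assumptions, is to treat any output that is not a clean, known label as 'review needed'. The sketch assumes the label sits alone on the first line of the response; the queue names and the functions `normalize_label` and `route` are hypothetical.

```python
ALLOWED_LABELS = {"urgent", "not urgent", "review needed"}

# Hypothetical destinations; substitute whatever the pipeline actually uses.
ROUTES = {
    "urgent": "page_on_call",
    "not urgent": "archive_queue",
    "review needed": "human_review_queue",
}


def normalize_label(model_output: str) -> str:
    """Map raw model output to one of the three labels.

    Assumption: the label is alone on the first line. Anything else,
    including an empty response, falls back to 'review needed'.
    """
    lines = model_output.strip().splitlines()
    if not lines:
        return "review needed"
    label = lines[0].strip().strip("'\".").lower()
    return label if label in ALLOWED_LABELS else "review needed"


def route(model_output: str) -> str:
    """Return the destination queue for a classified email."""
    return ROUTES[normalize_label(model_output)]
```

The fallback is deliberately conservative: an unexpected response costs a human a moment of attention instead of silently misfiling an email.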
Cost Efficiency
Weight: 15% of overall score
Open-ended extraction on a large document burns tokens without proportional value. A targeted extraction with a defined output shape costs less and produces more usable output. Cost efficiency is not about being cheap. It is about asking for exactly what you need.
Weak
Read this entire 50-page document and give me everything you find about budgets, timelines, risks, stakeholders, deliverables, dependencies, and assumptions.
Strong
Extract only budget figures and their fiscal quarters from this document. Return a table with columns: fiscal_quarter, budget_amount, budget_category. Skip any section that does not contain budget data.
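A defined output shape is also cheap to verify. The sketch below checks that a response carries exactly the three columns named in the Strong prompt. The prompt asks for "a table" without fixing a format, so the comma-separated layout is an assumption, and `parse_budget_table` is a hypothetical helper.

```python
import csv
import io

EXPECTED_COLUMNS = ["fiscal_quarter", "budget_amount", "budget_category"]


def parse_budget_table(raw: str) -> list[dict[str, str]]:
    """Parse the model's table and reject anything off-shape.

    Assumes the table is returned as CSV with a header row; the original
    prompt does not specify the table format.
    """
    reader = csv.DictReader(io.StringIO(raw.strip()), skipinitialspace=True)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")

    rows = []
    for row in reader:
        # DictReader stores surplus cells under the key None and fills
        # missing cells with the value None; both mean a malformed row.
        if None in row or any(value is None for value in row.values()):
            raise ValueError(f"malformed row: {row}")
        rows.append(row)
    return rows
```

With the Weak prompt there is nothing comparable to check, since "everything you find" has no expected shape.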
Failure Mode Safety
Weight: 15% of overall score
Without anti-hallucination instructions, the model tends to invent details to fill gaps. A single fabricated quote in a customer feedback summary can mislead a product decision. Safety guardrails are not optional for prompts that feed business workflows.
Weak
Write a summary of this customer feedback.
Strong
Write a summary of this customer feedback. Do not fabricate quotes or statistics. If the feedback is too short to summarize meaningfully, return 'Insufficient data for summary' instead of padding. Flag any claim that cannot be verified against the source text.
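Prompt-level guardrails can be backed by a simple check on the output. The sketch below handles the 'Insufficient data for summary' response and flags any quoted passage in the summary that does not appear in the source text. It is a rough heuristic, assuming quotes are wrapped in straight or curly double quotes; it does not verify statistics or paraphrased claims. The function names and status strings are hypothetical.

```python
import re

INSUFFICIENT = "Insufficient data for summary"

# Matches text inside straight or curly double quotes.
QUOTE_RE = re.compile(r'["“]([^"”]+)["”]')


def _squash(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks do not hide a match."""
    return " ".join(text.split()).lower()


def unverified_quotes(summary: str, source: str) -> list[str]:
    """Return quoted passages from the summary that are absent from the source."""
    haystack = _squash(source)
    return [q for q in QUOTE_RE.findall(summary) if _squash(q) not in haystack]


def review_summary(summary: str, source: str) -> str:
    """Return 'skipped', 'needs_review', or 'ok' (hypothetical status names)."""
    if summary.strip() == INSUFFICIENT:
        return "skipped"
    return "needs_review" if unverified_quotes(summary, source) else "ok"
```

A 'needs_review' result does not prove fabrication; it marks the summary for a human to compare against the source before it informs a decision.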
The bottom line
A good prompt is not one that produces a nice output once. It is one that produces a reliable output every time, even when the input is messy, the stakes are high, or the prompt is running unattended in a pipeline. If you can remove yourself from the loop and the prompt still works, it is production-ready.
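The weights listed for the five dimensions sum to 100%, so one natural reading is a weighted average. The article does not state how the overall score is computed, so the sketch below is an assumption: each dimension is scored from 0 to 100 and the results are combined linearly. The dimension keys and the function `overall_score` are hypothetical names.

```python
# Weights in percent, taken from the five sections above.
WEIGHTS = {
    "clarity": 25,       # Clarity and Ambiguity
    "context": 20,       # Context Completeness
    "robustness": 25,    # Automation Robustness
    "cost": 15,          # Cost Efficiency
    "safety": 15,        # Failure Mode Safety
}


def overall_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 0-100) into one weighted score.

    Assumption: a plain weighted average; the article does not specify
    the actual formula.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the five dimensions")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Illustrative scores only, not measured results.
example = {"clarity": 80, "context": 60, "robustness": 40, "cost": 90, "safety": 50}
print(overall_score(example))  # 63.0
```

Under this reading, clarity and robustness together account for half the score, so a prompt that is cheap and safe but vague or brittle still lands well below the top of the scale.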
Strong prompts are the input; a repeatable review habit is the operating system.