Prompt Engineering Guide

Mastering Regular Expression Writing on Llama 3.1 8B

Stop guessing. See how professional prompt engineering transforms Llama 3.1 8B's output for specific technical tasks.

The "Vibe" Prompt

"Write a regex for an email address."
Low specificity, inconsistent output

Optimized Version

STABLE
You are Regular Expression Expert Llama 3.1 8B. Your task is to write a regular expression. Follow these steps:

1. Understand the request: Analyze the user's need for a regular expression.
2. Identify key components: Break down the target pattern into its essential parts (e.g., for an email: username, '@', domain, TLD).
3. Consider common variations/edge cases: Think about valid characters, minimum/maximum lengths, optional parts, and common but invalid patterns.
4. Construct the regex: Build the regex incrementally, explaining each part.
5. Provide examples: List valid and invalid strings to test the regex.
6. Provide a concise final regex.

User Request: Write a regex to validate an email address, allowing subdomains but disallowing consecutive dots in the domain part, and ensuring a TLD of at least two letters. The username can include letters, numbers, dots, underscores, and hyphens.

Think step by step:

1. Understand the request: The user wants a robust email regex. Key constraints are on domain dots and TLD length, and allowed username characters.

2. Identify key components:
- Username: `[a-zA-Z0-9._-]+`
- `@` symbol: `\@`
- Domain: `([a-zA-Z0-9-]+\.)+` but disallowing consecutive dots.
- TLD: `[a-zA-Z]{2,}`

3. Consider common variations/edge cases:
- Username: Can start/end with a letter/number, allow internal `.` `_` `-`. Consecutive dots/dashes are handled by `[._-]+` if we're not being super strict, but `.+` for email usually means `[a-zA-Z0-9]+([._-][a-zA-Z0-9]+)*` if we want to avoid leading/trailing/consecutive separators. For this prompt, `[a-zA-Z0-9._-]+` is acceptable as per request for internal characters.
- Domain: Must not have `..`. `([a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*)` could work, ensuring no leading/trailing/consecutive hyphens or dots if combined with `[a-zA-Z0-9]`. Given the explicit 'disallowing consecutive dots' and 'subdomains', a non-capturing group for the domain parts `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` for segments (allowing internal hyphens but not starting/ending with them, and ensuring a character followed by a dot) is more robust.
- TLD: At least two letters: `[a-zA-Z]{2,}`.

4. Construct the regex:
- Start with `^` and end with `$` for full string match.
- Username: `[a-zA-Z0-9._-]+` (as per request)
- `@`: `\@`
- Domain segments: `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` This ensures a segment starts and ends with alphanumeric, and can have internal hyphens. The `+` after `\.` ensures subdomains. The `(?:...)` is a non-capturing group.
- TLD: `[a-zA-Z]{2,}`
- Putting it together: `^[a-zA-Z0-9._-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?)*\.[a-zA-Z]{2,}$` (simplification for domain and TLD, let's refine to given constraints).

Revised Domain/TLD construction:
- Domain part: `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` This ensures segments like `example.` `sub.example.` and avoids `-.` or `--` or `..` by requiring alphanumeric chars around dots and hyphens. But this doesn't fully capture allowing hyphens *not* at ends and disallowing consecutive dots in a simpler way. Let's simplify the domain to first allow valid segments (alphanumeric, internal hyphens) and then ensure no consecutive dots. Let's refine the domain construction. The simplest way to disallow consecutive dots is often `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` for parts before the TLD. A common pattern for domain part is `[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?`. For allowing subdomains and handling the 'no consecutive dots' and TLD, a good approach is:
- `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` followed by `[a-zA-Z]{2,}`.
- This regex `^[a-zA-Z0-9._-]+@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$`

Let's test this with the specific constraint: "disallowing consecutive dots in the domain part". `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` effectively prevents `..` because `.` is always followed by `[a-zA-Z0-9]`. And the `+` means there's at least one such segment before the TLD. This seems to fulfill the criteria. A slightly different way to ensure domain segments are valid (alphanum, internal hyphens, no consecutive dots): `([a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+` for the bulk of the domain, combined with the TLD.

Final attempt at constructing the domain/TLD part:
Domain: `(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+` (This ensures each label is valid, not starting/ending with hyphen, no leading/trailing dots between labels.)
Combined with TLD: `[a-zA-Z]{2,}`
So, `^[a-zA-Z0-9._-]+@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$`

5. Provide examples:
Valid:
- `test@example.com`
- `user.name@sub.domain.co.uk`
- `another-user@domain-name.com`
- `123test@sub.example.info`
Invalid:
- `bad..user@example.com` (consecutive dots in username not forbidden by `[a-zA-Z0-9._-]+` but usually bad in prod)
- `user@.example.com` (domain starts with dot)
- `user@example..com` (consecutive dots in domain)
- `user@example.c` (TLD too short)
- `user@-domain.com` (domain part starts with hyphen)
- `user@domain-.com` (domain part ends with hyphen)
- `user@domain` (no TLD)

6. Provide a concise final regex:
Structured, task-focused, reduced hallucinations
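
The final pattern reached in the prompt's walkthrough can be checked against the prompt's own example strings. The sketch below is one way to do that in Python; the names `EMAIL_RE`, `is_valid_email`, `VALID`, and `INVALID` are illustrative assumptions, not part of the original prompt. It also makes one caveat from the examples concrete: `bad..user@example.com` appears under "Invalid", yet the pattern accepts it, because the username class `[a-zA-Z0-9._-]+` places no restriction on consecutive dots.

```python
import re

# Final pattern from the prompt's walkthrough (copied verbatim).
EMAIL_RE = re.compile(
    r"^[a-zA-Z0-9._-]+@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}$"
)

# Example strings taken from step 5 of the prompt.
VALID = [
    "test@example.com",
    "user.name@sub.domain.co.uk",
    "another-user@domain-name.com",
    "123test@sub.example.info",
]
INVALID = [
    "user@.example.com",   # domain starts with a dot
    "user@example..com",   # consecutive dots in the domain
    "user@example.c",      # TLD shorter than two letters
    "user@-domain.com",    # label starts with a hyphen
    "user@domain-.com",    # label ends with a hyphen
    "user@domain",         # no TLD
]


def is_valid_email(candidate: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return EMAIL_RE.fullmatch(candidate) is not None


if __name__ == "__main__":
    for s in VALID + INVALID + ["bad..user@example.com"]:
        print(f"{s!r}: {is_valid_email(s)}")
```

If consecutive dots in the username should also be rejected, the stricter username form mentioned in step 3 of the prompt, `[a-zA-Z0-9]+([._-][a-zA-Z0-9]+)*`, could be swapped in for the username part.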

Engineering Rationale

The optimized prompt uses Chain-of-Thought (CoT) prompting to guide the model through regex construction. It assigns a persona ('Regular Expression Expert') and breaks the task into six steps: understanding the request, identifying components, considering edge cases, constructing the regex incrementally, listing test examples, and stating the final regex. Working through these steps in order makes it more likely that the model addresses nuanced requirements such as disallowing consecutive dots or enforcing a minimum TLD length. The prompt also contains a fully worked walkthrough of the email request, which acts as an in-context example of the expected reasoning and output format. The naive prompt offers none of this guidance, so its output tends to be generic or incomplete.
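
As a rough illustration of this structure, the instruction part of the optimized prompt can be assembled from a persona, a fixed list of steps, and the user's request. This is a minimal sketch, not the tool's actual implementation: the function `build_prompt` and the constants `PERSONA` and `STEPS` are hypothetical names, and the worked walkthrough that follows "Think step by step:" in the full prompt is left out.

```python
# Persona and step wording copied from the optimized prompt.
PERSONA = "Regular Expression Expert Llama 3.1 8B"

STEPS = [
    "Understand the request: Analyze the user's need for a regular expression.",
    "Identify key components: Break down the target pattern into its essential "
    "parts (e.g., for an email: username, '@', domain, TLD).",
    "Consider common variations/edge cases: Think about valid characters, "
    "minimum/maximum lengths, optional parts, and common but invalid patterns.",
    "Construct the regex: Build the regex incrementally, explaining each part.",
    "Provide examples: List valid and invalid strings to test the regex.",
    "Provide a concise final regex.",
]


def build_prompt(user_request: str, persona: str = PERSONA) -> str:
    """Assemble the step-structured prompt for a given request (hypothetical helper)."""
    lines = [
        f"You are {persona}. Your task is to write a regular expression. "
        "Follow these steps:",
    ]
    # Number the steps so the model can refer back to them in its answer.
    lines += [f"{i}. {step}" for i, step in enumerate(STEPS, start=1)]
    lines.append(f"User Request: {user_request}")
    lines.append("Think step by step:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("Write a regex to validate an email address."))
```

Keeping the steps in a list makes it straightforward to reuse the same scaffold for other regex requests while changing only the `user_request` argument.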

Token Efficiency Gain: 0%
The optimized prompt explicitly asks for a 'concise final regex', ensuring the output is not just the thought process but the actual solution.
The optimized prompt mentions specific constraints (subdomains, no consecutive dots, TLD length) that the regex should address.
The optimized prompt includes a step for considering common variations and edge cases, which is crucial for writing a robust regex.
