Evolve a basic prompt into a production-grade, reusable template for generating merchant risk assessments – through iterative prompt engineering.
⏱ 40 minutes
Exercise Overview
AnyCompany's risk teams assess thousands of merchants across Southeast Asia. Each assessment requires analyzing transaction data, chargeback rates, complaint history, and compliance status – then writing a clear narrative that non-technical stakeholders can act on.
Currently, analysts spend 30-45 minutes per merchant writing these manually. In this exercise, you'll build a prompt template that produces consistent, high-quality assessments in seconds.
⚙️ Setup: How to run this exercise
Use Kiro chat for this exercise. You'll paste prompts and observe how the output improves with each technique.
Keeping steps independent:
Start a New Session for each step (Steps 1–5, 7, 8)
Step 6 continues in the same session as Step 5
Each prompt includes an isolation instruction – a steering file is pre-configured to tell Kiro not to read previous step files, ensuring each technique is evaluated independently
Each step saves to a unique filename (step1-..., step2-...) so you can compare outputs side-by-side at the end
📦 First-time Kiro setup (do this once before starting)
Download and extract this zip into your workspace root folder. It adds steering files that configure Kiro for the exercises.
Extracting creates .kiro/steering/workshop-rules.md (AnyCompany context) and .kiro/steering/exercise-isolation.md (keeps exercise steps independent).
🎯 Exercise Approach
In this exercise, you'll iteratively refine a prompt through 6 steps – each applying a specific technique from the Advanced Prompting curriculum. At the end, you'll extract a reusable prompt template with variables, then test it against a completely different merchant profile to validate that it works at scale. The final deliverable is a production-ready prompt template that your team can deploy across thousands of merchant assessments.
Step 1: Establish the Zero-Shot Baseline
📘 Technique: Zero-Shot Prompting
Zero-shot means giving the model a task with no examples, no role, and minimal instruction. This establishes a baseline – you'll see what the model produces with almost no guidance, then improve from there.
In Kiro, start a New Session and paste:
PROMPT – Step 1: Zero-Shot
Assess the risk of this merchant based on the data below. Save the output as "step1-zero-shot.md" in a "lab6-risk-assessment" folder.
[PASTE MERCHANT DATA HERE]
MERCHANT DATA
🔍 Observe the output: The response is likely generic, unstructured, and missing key analysis. It may hallucinate details not in the data. Note what's missing – this is your improvement baseline.
💬 Discussion point: What's wrong with this output? Common issues:
No clear structure – hard to scan quickly
May include assumptions not supported by the data
No risk rating or actionable recommendation
Inconsistent depth – some areas over-analyzed, others ignored
Would look different every time you run it – not repeatable
Step 2: Add Role & Persona
📘 Technique: Role & Persona Prompting (Module 3)
Assigning a specific role shapes the model's vocabulary, reasoning depth, and what it considers important. A "risk analyst" will focus on different signals than a "customer support agent" looking at the same data.
Start a New Session in Kiro. This time, add a persona before the task:
PROMPT – Step 2: Persona
You are a Senior Merchant Risk Analyst at a Southeast Asian fintech company. You have 8 years of experience assessing payment merchants for fraud risk, operational risk, and compliance risk. You are known for being thorough, data-driven, and fair – you always distinguish between genuine business growth and suspicious patterns.
Assess the risk of this merchant based on the data below. Save the output as "step2-persona.md" in the "lab6-risk-assessment" folder.
[PASTE MERCHANT DATA HERE]
MERCHANT DATA
💬 Why does persona work? The model has been trained on millions of documents written by risk analysts. When you say "You are a Senior Merchant Risk Analyst," you're activating that specific knowledge cluster – the model draws on risk assessment frameworks, industry terminology, and analytical patterns it learned from real analyst writing.
Step 3: Add Few-Shot Examples
📘 Technique: Few-Shot Prompting (Module 1)
Providing 1-2 examples of the desired output teaches the model your exact format, tone, and level of detail. This is the single most effective technique for getting consistent, repeatable outputs.
Start a New Session in Kiro. Now include two short example assessments before the actual task:
PROMPT – Step 3: Few-Shot
You are a Senior Merchant Risk Analyst at a Southeast Asian fintech company. You have 8 years of experience assessing payment merchants for fraud risk, operational risk, and compliance risk.
Here are two examples of how merchant risk assessments should be written:
---
EXAMPLE 1 (LOW RISK):
Merchant: FreshDaily Grocers (MRC-1102) | Market: Malaysia | Category: Grocery
Assessment: FreshDaily Grocers demonstrates a healthy, stable transaction profile. Monthly volumes have grown steadily at 8-10% month-over-month, consistent with organic business expansion. Chargeback rate of 0.4% is well within the industry benchmark of 0.5-1.0%. Customer complaints are minimal (2-3/month) and resolved within SLA. KYC documentation is current and no compliance flags exist.
Risk Rating: 🟢 GREEN – No action required. Next review in 6 months.
---
EXAMPLE 2 (HIGH RISK):
Merchant: LuxeDeals Online (MRC-3391) | Market: Indonesia | Category: E-Commerce
Assessment: LuxeDeals Online presents significant risk indicators requiring immediate attention. Transaction volume spiked 400% in one month with no corresponding business explanation. Chargeback rate has reached 6.2%, far exceeding the 1.0% industry benchmark. 70% of chargebacks cite "unauthorized transaction," suggesting potential card-testing or account takeover fraud. The merchant has not responded to two compliance review requests.
Risk Rating: 🔴 RED – Recommend immediate PayLater suspension and enhanced monitoring. Escalate to Fraud Investigation team.
---
Now assess this merchant using the same format and depth. Save the output as "step3-few-shot.md" in the "lab6-risk-assessment" folder.
[PASTE MERCHANT DATA HERE]
MERCHANT DATA
🔍 Compare with Step 2: The output should now match the format of your examples – same structure, similar length, consistent tone. The model learned your "house style" from just 2 examples.
💬 Key insight: Few-shot examples are like showing a new analyst "here's how we write these reports." The model mimics the pattern. Notice how 2 examples (one GREEN, one RED) are enough – the model interpolates for AMBER cases on its own.
Step 4: Add Structured Output Requirements
📘 Technique: Structured Output (Module 4)
Defining exact sections and format ensures every assessment covers the same areas. This makes outputs comparable across merchants and scannable by busy stakeholders.
Start a New Session in Kiro. Now add explicit section requirements:
PROMPT – Step 4: Structured Output
You are a Senior Merchant Risk Analyst at a Southeast Asian fintech company with 8 years of experience in payment merchant risk assessment.
Produce a Merchant Risk Assessment Report with EXACTLY these sections:
1. MERCHANT SUMMARY
- One paragraph: who they are, what they do, how long on platform
2. TRANSACTION ANALYSIS
- Volume and GMV trends (highlight any anomalies)
- Average transaction size analysis
- PayLater adoption trends and risk implications
3. CHARGEBACK & DISPUTE ANALYSIS
- Current rate vs. industry benchmark
- Trend direction (improving/worsening)
- Root cause breakdown
- Dispute resolution effectiveness
4. CUSTOMER COMPLAINT ANALYSIS
- Volume trend and top categories
- SLA compliance
- Correlation with chargeback patterns
5. RISK FACTORS
- List each identified risk factor
- For each: severity (HIGH/MEDIUM/LOW) and supporting data point
6. MITIGATING FACTORS
- Any legitimate business explanations for the patterns observed
7. RISK RATING
- 🟢 GREEN (low risk) | 🟡 AMBER (elevated, monitor) | 🔴 RED (high, action required)
- One-sentence justification
8. RECOMMENDED ACTIONS
- Numbered list of specific, actionable next steps with owners and timelines
Save the output as "step4-structured.md" in the "lab6-risk-assessment" folder.
[PASTE MERCHANT DATA HERE]
MERCHANT DATA
🔍 Compare with Step 3: Every assessment now has the same 8 sections. You can compare Merchant A vs Merchant B side by side. Stakeholders know exactly where to look for the information they need.
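A side benefit of a fixed 8-section structure is that outputs become machine-checkable. As an illustrative sketch (not part of the exercise itself – the file path is just an example), a few lines of Python could verify that a saved report contains every required section:

```python
REQUIRED_SECTIONS = [
    "MERCHANT SUMMARY", "TRANSACTION ANALYSIS",
    "CHARGEBACK & DISPUTE ANALYSIS", "CUSTOMER COMPLAINT ANALYSIS",
    "RISK FACTORS", "MITIGATING FACTORS",
    "RISK RATING", "RECOMMENDED ACTIONS",
]

def missing_sections(report_text: str) -> list[str]:
    """Return the required section headings absent from a report."""
    upper = report_text.upper()
    return [s for s in REQUIRED_SECTIONS if s not in upper]

# Example usage (path is illustrative):
# text = open("lab6-risk-assessment/step4-structured.md").read()
# print(missing_sections(text))
```

An empty list means the structure requirement held; anything else tells you exactly which section the model skipped.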
Step 5: Add Grounding & Self-Critique
📘 Technique: Grounding & Self-Critique
Grounding instructions prevent the model from hallucinating facts not in the data. Self-critique makes the model review its own output for errors, bias, or unsupported claims – like having a second analyst review the report.
Start a New Session in Kiro. This is the last isolated step – after this, Step 6 continues in the same session. Add grounding rules and a self-review step:
PROMPT – Step 5: Grounding + Self-Critique
You are a Senior Merchant Risk Analyst at a Southeast Asian fintech company with 8 years of experience in payment merchant risk assessment.
CRITICAL GROUNDING RULES:
- Base your assessment ONLY on the data provided below. Do not infer, assume, or add information not present in the data.
- Every claim must reference a specific data point. Example: "Chargeback rate increased from 0.3% to 4.1% over 6 months" – not "chargebacks are high."
- If data is insufficient to assess an area, explicitly state: "[INSUFFICIENT DATA: need X to assess Y]"
- Do not speculate on intent or motivation. State patterns, not judgments about the merchant's character.
- Distinguish between correlation and causation. If two trends coincide, note the correlation but do not claim one caused the other.
Produce a Merchant Risk Assessment Report with these sections:
1. MERCHANT SUMMARY
2. TRANSACTION ANALYSIS
3. CHARGEBACK & DISPUTE ANALYSIS
4. CUSTOMER COMPLAINT ANALYSIS
5. RISK FACTORS (each with severity and supporting data point)
6. MITIGATING FACTORS
7. RISK RATING (GREEN / AMBER / RED with justification)
8. RECOMMENDED ACTIONS (numbered, with owners and timelines)
After completing the report, perform a SELF-REVIEW:
- Re-read your assessment and check: Is every claim supported by a specific data point from the input?
- Are there any assumptions or inferences that go beyond the data?
- Is the risk rating consistent with the evidence presented?
- Would a different analyst reading the same data reach the same conclusion?
If you find any issues, correct them before presenting the final report.
Save the output as "step5-grounded.md" in the "lab6-risk-assessment" folder.
[PASTE MERCHANT DATA HERE]
MERCHANT DATA
🔍 Compare with Step 4: The output should now cite specific numbers for every claim. The self-review section catches errors the model might have made. This is production-safe – auditable and defensible.
💬 Why self-critique matters for risk assessments: In regulated environments, every assessment may be audited. A report that says "chargebacks are concerning" is useless. A report that says "chargeback rate increased from 0.3% to 4.1% over 6 months, exceeding the 1.0% industry benchmark by 4x" is auditable. The grounding rules + self-critique ensure this level of rigor automatically.
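Grounding can also be spot-checked mechanically. The rough heuristic below (Python; the vague-term list is illustrative and will miss plenty) flags sentences that make a vague quantified claim without citing a number – the exact failure mode the grounding rules forbid:

```python
import re

# Illustrative list of vague quantifiers; extend for your domain
VAGUE_TERMS = ["high", "low", "significant", "concerning", "spiked", "many"]

def vague_sentences(report_text: str) -> list[str]:
    """Return sentences that use a vague quantifier but cite no number."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", report_text):
        has_vague = any(re.search(rf"\b{t}\b", sentence, re.I) for t in VAGUE_TERMS)
        has_number = re.search(r"\d", sentence)
        if has_vague and not has_number:
            flagged.append(sentence.strip())
    return flagged
```

This is a blunt lint, not a substitute for the model's self-review – but it catches the "chargebacks are high" pattern instantly.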
Step 6: Extract the Reusable Template
📘 Technique: Meta-Prompting (Module 5.2)
Meta-prompting asks the AI to analyze your conversation and produce a reusable artifact. Instead of manually extracting the template, you ask the model to do it – turning your iterative work into a production-ready template with variables. The quality of the template depends on how well you instruct the extraction.
In the same session from Step 5 (do not start a new one), paste this follow-up:
PROMPT – Step 6: Template Extraction
Now I want to turn this into a reusable template that any analyst on my team can use for ANY merchant – not just QuickMart Express.
Create a Markdown file called "merchant-risk-assessment-prompt-template.md" and save it in a "prompt-templates" folder.
The template should:
- Be completely self-contained – a new team member should be able to use it without additional training
- Use {{variables}} for all merchant-specific data (name, ID, market, transaction data, etc.)
- Include the persona, grounding rules, output structure, and self-review from our refined prompt
- Have clear usage instructions
Think about what makes a template truly production-ready and reusable at scale.
✅ Deliverable: Kiro will create a prompt-templates/merchant-risk-assessment-prompt-template.md file. This is your reusable artifact – open it and review the quality.
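The {{variables}} convention is what makes the template programmable: the same file can drive thousands of assessments once placeholders are filled mechanically. A minimal sketch (Python; the merchant values and variable names are examples, not a prescribed API):

```python
import re

def render_template(template: str, values: dict[str, str]) -> str:
    """Replace {{variable}} placeholders; raise if any are left unfilled."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"missing value for {{{{{name}}}}}")
        return values[name]
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)

# Hypothetical fill-in; real runs would load the full merchant data block
prompt = render_template(
    "Assess merchant {{merchant_name}} ({{merchant_id}}) in {{market}}.",
    {"merchant_name": "DemoMart", "merchant_id": "MRC-0000",
     "market": "Singapore"},
)
```

Failing loudly on a missing variable is deliberate – a silently half-filled prompt is exactly the kind of defect that produces ungrounded assessments.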
📤 Submit Your Template
Submit your template for automated scoring. How production-ready is it?
Resubmitting with the same name replaces your previous entry.
💡 How scoring works
Your template is sent to Amazon Bedrock, which evaluates how production-ready it is – structure, reusability, guardrails, quality controls, and domain relevance. This is the same LLM-as-Judge technique from the slides, now applied to your work. Scores appear on the leaderboard below.
🏆 Template Leaderboard
No submissions yet. Be the first!
🔒 Instructor Version – How would an expert extract this template?
After submitting, enter the passkey to reveal the production-grade extraction prompt. Compare it with what you used – notice the difference in specificity.
🔒 Instructor Version – Production-Grade Extraction Prompt
Compare this with the simplified prompt above. Notice how much more specific it is about variable definitions, data format examples, customization notes, and modifiable sections. This level of detail is what makes a template truly self-contained and usable by someone who wasn't in the room when it was built.
Excellent work. Now I want to turn this into a reusable template that any analyst on my team can use for ANY merchant – not just QuickMart Express.
Create a Markdown file called "merchant-risk-assessment-prompt-template.md" and save it in a "prompt-templates" folder. The file should contain a complete, self-contained prompt template with the following structure:
## HEADER
- Title: "Merchant Risk Assessment – Prompt Template"
- Version number, last updated date, purpose statement
- Brief usage instruction: copy from ---START PROMPT--- to ---END PROMPT---, replace variables, paste into LLM
## TEMPLATE USAGE GUIDE
A table with ALL variables used in the template. Columns:
| Variable | Description | Expected Format | Example |
Include every variable: {{merchant_name}}, {{merchant_id}}, {{market}}, {{merchant_category}}, {{onboarded_date}}, {{payment_channels}}, {{beneficial_owner}}, {{related_merchants}}, {{kyc_status}}, {{business_registration_status}}, {{analysis_period}}, {{transaction_data}}, {{chargeback_data}}, {{industry_chargeback_benchmark}}, {{complaint_data}}, {{complaint_sla}}, {{compliance_status}}, {{additional_context}}, {{currency}}
## DATA FORMAT EXAMPLES
Show the exact format expected for each complex variable (transaction_data, chargeback_data, complaint_data, compliance_status, additional_context) with realistic sample data.
## PREREQUISITES
Numbered list of what to prepare before using the template.
## ---START PROMPT--- / ---END PROMPT---
The actual prompt template containing:
- The persona we refined
- The grounding rules from Step 5
- The 8-section output format from Step 4
- Risk rating definitions (GREEN/AMBER/RED with specific criteria)
- The self-review instructions from Step 5
- A MERCHANT DATA INPUT block at the end with all {{variables}} organized by category
## CUSTOMIZATION NOTES (after ---END PROMPT---)
Three subsections:
1. **Market-Specific Adjustments** – Table of SEA markets (Singapore/MAS, Malaysia/BNM, Indonesia/OJK, Thailand/BOT, Vietnam/SBV, Philippines/BSP) with key regulatory considerations.
2. **Adjusting Risk Thresholds by Merchant Category** – Table of categories (Convenience, E-commerce, F&B, Digital Goods, Travel, Subscription) with typical chargeback benchmarks.
3. **Modifiable Sections** – Table showing which parts of the template can/cannot be modified and why. Grounding rules and self-review should be marked as "Do not modify."
The template must be self-contained โ a new team member should be able to use it without any additional context or training.
💡 Teaching point: The simplified prompt produces a decent template. The instructor version produces a production-grade one. The difference? Specificity – naming every variable, defining data formats, including customization notes. This is the gap between "good enough for a demo" and "ready for your team to use daily."
Step 7: Validate the Template with New Data
📘 Technique: Production Testing
A template is only useful if it works beyond the data it was built with. Test it against a completely different merchant – different market, category, and risk profile.
How to use your template
Open the file prompt-templates/merchant-risk-assessment-prompt-template.md that Kiro created in Step 6
Find the section between ---START PROMPT--- and ---END PROMPT---
Copy that entire block – this is your reusable prompt
Start a New Session in Kiro
Paste the prompt, then replace the entire merchant data section with the test data below – no need to replace variables one by one, just swap the whole data block
Add this instruction at the end: "Save the output as run1.md in the lab6-risk-assessment folder."
Test data โ a low-risk Indonesian merchant
TEST DATA – WarungMakan Digital (Indonesia, healthy growth)
Start a New Session for each run. Use the same template + test data each time, but change the output filename:
Run
Add this to the end of your prompt
Run 1
Save the output as "run1.md" in the "lab6-risk-assessment" folder.
Run 2
Save the output as "run2.md" in the "lab6-risk-assessment" folder.
Run 3
Save the output as "run3.md" in the "lab6-risk-assessment" folder.
🔍 Validate the outputs: This is a deliberately different profile – a low-risk Indonesian F&B merchant. Check across all 3 runs:
Are all 8 sections populated correctly?
Does it handle IDR currency (not just SGD)?
Does it correctly identify this as a low-risk (GREEN) merchant?
Are the recommended actions appropriate for a healthy F&B merchant?
Are the 3 runs consistent in structure and rating?
💬 If the template doesn't work well: That's valuable feedback. Go back to Step 6 and refine – maybe it needs better currency handling or market-aware regulatory references. This iterate-and-test cycle is exactly how production templates get hardened.
Step 8: Evaluate Your Template
Now use LLM-as-Judge to score each of your 3 runs from Step 7. You have run1.md, run2.md, and run3.md in your lab6-risk-assessment folder.
Score each run
Start a New Session in Kiro. Paste the rubric below and tell Kiro which file to evaluate:
PROMPT – Evaluate Run 1
You are a STRICT expert evaluator for merchant risk assessments. You have high standards and rarely give perfect scores. A score of 5 should be genuinely exceptional – most good outputs score 3-4.
Read the file "lab6-risk-assessment/run1.md" and score on 4 criteria (1-5 each):
1. **Completeness** (1-5): Are all 8 required sections present with SUBSTANTIVE content?
- 1 = Missing 3+ sections
- 2 = Missing 1-2 sections or several are just headers with one sentence
- 3 = All sections present but 2-3 are thin (under 2 sentences)
- 4 = All sections present with good detail, minor gaps
- 5 = RARE – every section has exceptional depth, derived insights, and cross-references between sections
2. **Data Grounding** (1-5): Does EVERY factual claim cite a SPECIFIC number from the input?
- 1 = Most claims are generic ("chargebacks are high")
- 2 = Some numbers cited but many vague claims remain
- 3 = Most claims cite data but 2-3 are generic or rounded
- 4 = Nearly all claims cite specific data, 1 minor gap
- 5 = RARE – zero vague statements, every single claim traces to an exact metric, includes calculated derived metrics (e.g., growth rates, ratios)
3. **Actionability** (1-5): Are recommendations SPECIFIC with named owners AND timelines?
- 1 = No actions or just "monitor the situation"
- 2 = Generic actions like "review the merchant" without specifics
- 3 = Some specific actions but missing owners OR timelines
- 4 = Most actions have owners and timelines, 1-2 are vague
- 5 = RARE – every action has: what to do, who does it, by when, and what triggers escalation
4. **Analytical Depth** (1-5): Does the assessment show REASONING beyond restating data?
- 1 = Just restates the input data in paragraph form
- 2 = Minimal interpretation, mostly data summary
- 3 = Some analysis (e.g., compares to benchmarks) but surface-level
- 4 = Good analysis with trend interpretation and risk implications
- 5 = RARE – identifies root causes, connects patterns across sections, calculates derived metrics, distinguishes correlation from causation
IMPORTANT: Be honest. Most good AI outputs score 14-17/20. A score of 20/20 should almost never be given. If you find yourself giving all 5s, you are being too lenient – re-read the "RARE" criteria.
Return your evaluation as JSON and save it as "eval-run1.md" in the "lab6-risk-assessment" folder:
{"completeness": X, "grounding": X, "actionability": X, "depth": X, "total": X, "strengths": "one sentence", "weaknesses": "one sentence – there is ALWAYS something to improve"}
💡 For runs 2 and 3: Start a New Session each time. Change run1.md → run2.md → run3.md and eval-run1.md → eval-run2.md → eval-run3.md in the prompt.
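Requesting JSON pays off here: once the three eval files exist, scores can be aggregated without re-reading any report. A small sketch (Python; it assumes each eval file contains just the JSON object the rubric asked for):

```python
import json
from statistics import mean

def summarize_evals(eval_jsons: list[str]) -> dict[str, float]:
    """Average each rubric criterion across a list of JSON eval strings."""
    evals = [json.loads(text) for text in eval_jsons]
    criteria = ["completeness", "grounding", "actionability", "depth", "total"]
    return {c: mean(e[c] for e in evals) for c in criteria}

# Example usage (paths illustrative):
# texts = [open(f"lab6-risk-assessment/eval-run{i}.md").read() for i in (1, 2, 3)]
# print(summarize_evals(texts))
```

The per-criterion averages drop straight into the score table below, and the same function works unchanged when you later compare template versions.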
Record your scores
| Run | Completeness | Grounding | Actionability | Depth | Total /20 |
| --- | --- | --- | --- | --- | --- |
| Run 1 | | | | | |
| Run 2 | | | | | |
| Run 3 | | | | | |
| Average | | | | | |
✅ What to look for:
All 3 runs score 17-19/20: Your template is production-ready and consistent – this is the ideal outcome
All 3 runs score the same: Excellent consistency – the template produces reliable results every time. This is what you want for production use.
Average total 14-16: Good but has room for improvement – check which criterion scored lowest and refine that part of the template
Scores vary by 3+ between runs: The template needs tighter constraints – add more structure, examples, or decision rules
Want to see real score differences? Try the bonus challenge below – switch to a different model and compare
💡 This is LLM-as-Judge – you're using one AI to evaluate another AI's output. This technique scales: you can evaluate 100 outputs in minutes instead of hours. The JSON format makes it easy to track scores over time and compare template versions.
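The "100 outputs in minutes" claim is structurally just a loop over a judge function. A hedged sketch (Python; `stub_judge` is a placeholder – in practice it would send the output plus the rubric to your model of choice and return the rubric JSON):

```python
import json
from typing import Callable

def batch_evaluate(outputs: dict[str, str],
                   judge: Callable[[str], str]) -> dict[str, dict]:
    """Run a judge over many outputs; judge returns the rubric JSON string."""
    return {name: json.loads(judge(text)) for name, text in outputs.items()}

def stub_judge(text: str) -> str:
    # Placeholder judge: rewards outputs that cite percentages.
    # A real judge would call an LLM with the rubric from Step 8.
    score = 4 if "%" in text else 2
    return json.dumps({"grounding": score, "total": score})

scores = batch_evaluate({"run1": "rate is 4.1%", "run2": "rate is high"},
                        stub_judge)
```

Because the judge is a plain function parameter, swapping the stub for a real model call changes nothing else in the pipeline.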
🎯 Bonus challenge (if time permits): The 3 runs above test consistency (same model, same template – does it produce reliable results?). For a different test, try running your template with a different model. In Kiro, you can switch models in the model selector. Try generating a run-nova.md or run-haiku.md and evaluate it with the same rubric. You'll likely see score differences – different models have different strengths. A cheaper model might score 14/20 where the default scores 17/20 – and that might be good enough for GREEN-rated merchants.
Reflection & Discussion
What You Built
Through 6 iterative steps, you evolved a 10-word prompt into a production-grade template that:
Produces consistent, structured assessments every time
Cites specific data points (auditable and defensible)
Includes self-review to catch errors before human review
Works for any merchant – just swap the variables
Can be customized per market and merchant category
Technique Recap
| Step | Technique | What It Fixed |
| --- | --- | --- |
| 1. Zero-Shot | Baseline | Established what "bad" looks like |
| 2. Persona | Role assignment | Better vocabulary, deeper analysis |
| 3. Few-Shot | Example-driven | Consistent format and tone |
| 4. Structured | Section requirements | Comparable, scannable outputs |
| 5. Grounding | Grounding rules + Self-Critique | No hallucination, auditable claims |
| 6. Meta-Prompt | Template extraction | Reusable at scale |
| 7. Validation | Test with new data | Confirmed template generalizes |
| 8. Evaluation | Rubric scoring | Measurable quality, consistency proof |
💡 Key takeaway: The prompt IS the product. In many business workflows, you don't need to build software – you need to build a great prompt template. A well-engineered template that takes 35 minutes to create can save your team hundreds of hours per month when deployed across thousands of merchant assessments.
Try It Yourself
To validate your template, try it with a completely different merchant profile โ a high-volume e-commerce merchant in Indonesia, or a small food stall in Thailand. Does the template still produce a useful assessment? If not, what needs adjusting?
💾 Save Your Game – AI Memory for Long Projects
You just completed 8 steps across multiple sessions. In a real project, you'd want the AI to "remember" this work next week. But AI has no memory between sessions – every new chat starts blank.
The fix: maintain two files as your project's persistent memory:
project-status.md
Current state – what exists, what's remaining, key decisions. Load this every session. Keep it compact (~2 pages).
session-log.md
History – what was done each session and why. Load only when needed (e.g., "when did we change the threshold?").
End-of-session prompt:
Update project-status.md with what we built today:
- List all files created or modified
- Update the "What's Remaining" section
- Note any key decisions we made
Then append a summary of today's session to session-log.md.
💡 Why be specific? "Update the status" is vague – the AI might miss details or append instead of replacing. The more specific your save prompt, the better your load next time. Think of it like writing a handover note for your future self.
What You Accomplished
🎓 Applied 6 advanced prompting techniques in a real business context
🎓 Experienced iterative prompt refinement – the core skill of prompt engineering
🎓 Produced a reusable, production-grade prompt template with variables
🎓 Learned to ground AI outputs in data and add self-critique for quality assurance
🎓 Evaluated your template with a rubric and LLM-as-Judge – measurable quality
🏗️ Built an artifact your team can deploy immediately for merchant risk assessments