AnyCompany Financial Group · Generative & Agentic AI on AWS
Module 1
Prompt Fundamentals Deep Dive
The 4 pillars that determine 80% of output quality
The 80/20 Rule of Prompting
80% of prompt quality comes from 4 fundamentals:
1. Clarity
Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.
2. Context
Give the AI the background it needs. Without context, it guesses — dangerous in finance.
3. Role Assignment
Tell the AI who to be. A "risk analyst" focuses on different signals than a "support agent."
4. Output Framing
Define what "done" looks like — format, length, structure, style.
Pillar 1: Clarity
Vague
Clear
"Summarize this report"
"Summarize this quarterly earnings report in 5 bullet points, focusing on revenue growth, cost changes, and risk factors"
"Help me with this data"
"Analyze this CSV of 500 transactions and identify the top 3 merchants by total volume"
"Write something about compliance"
"Draft a 200-word summary of MAS Notice 626 requirements for e-payment service providers"
Rule of thumb: The more specific your prompt, the less the AI has to guess.
Pillar 2: Context
Without context:
"Is this transaction suspicious?"
With context:
"This merchant is a convenience store
in Singapore, typically 50-80 txns/day
averaging $15 SGD. Today: 340 txns
averaging $4.50. Is this suspicious?"
Types of context: Domain · Data · Situational · Constraints
4 Types of Context
Type
What it tells the AI
Finance example
Domain
The industry, market, and business area
"In the context of Southeast Asian digital payments and PayLater services..."
Data
The specific numbers, records, or documents to analyze
"Here is the merchant's transaction history for the last 6 months: [data]"
Situational
Why you need this now — the trigger or event
"We are preparing for a quarterly board review" / "This merchant was flagged by our monitoring system"
Constraints
Rules, limits, and requirements the output must follow
"All amounts in SGD with 2 decimal places" / "Follow MAS Notice 626 guidelines"
Rule of thumb: If you skip Domain context, the AI gives generic answers. If you skip Data context, it hallucinates. If you skip Situational context, it guesses your purpose. If you skip Constraints, it ignores your standards.
Context in Action: Merchant Review
[DOMAIN]
You are reviewing an AnyCompany Pay merchant in
Singapore's food & beverage sector.
[DATA]
Merchant: Kopi Corner Pte Ltd (ID: MC-8842)
Monthly txn volume: 4,200 → 15,600 (6-month trend)
Avg transaction: $8.50 SGD
Chargeback rate: 0.3% → 4.1% (6-month trend)
Complaints: 12 in last 30 days (up from 2)
[SITUATIONAL]
Auto-flagged: chargeback rate exceeds 1.0% threshold.
Risk committee meets Friday.
[CONSTRAINTS]
- All amounts in SGD
- Reference AnyCompany's chargeback policy (max 1.0%)
- Use only the data provided above
- Include a GREEN/AMBER/RED risk rating
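The four labeled blocks above can be assembled programmatically once you review merchants regularly. A minimal Python sketch (the function name and field choices are illustrative, not part of any AnyCompany tooling):

```python
# Illustrative sketch: assemble the four context types into one labeled prompt.
def build_context_prompt(domain: str, data: str, situational: str, constraints: list[str]) -> str:
    """Combine DOMAIN, DATA, SITUATIONAL, and CONSTRAINTS blocks into a prompt."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"[DOMAIN]\n{domain}\n\n"
        f"[DATA]\n{data}\n\n"
        f"[SITUATIONAL]\n{situational}\n\n"
        f"[CONSTRAINTS]\n{constraint_lines}"
    )

prompt = build_context_prompt(
    domain="You are reviewing an AnyCompany Pay merchant in Singapore's F&B sector.",
    data="Merchant: Kopi Corner Pte Ltd (ID: MC-8842)\nChargeback rate: 0.3% -> 4.1% (6-month trend)",
    situational="Auto-flagged: chargeback rate exceeds 1.0% threshold. Risk committee meets Friday.",
    constraints=["All amounts in SGD", "Use only the data provided above",
                 "Include a GREEN/AMBER/RED risk rating"],
)
print(prompt)
```

Filling the blocks from your case data keeps every review consistent, so outputs stay comparable across merchants.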
Pillar 3: Role Assignment
Role
What changes in the output
Compliance Officer
Focuses on regulatory requirements, flags risks
Customer Support Agent
Empathetic language, resolution-focused
Financial Analyst
Numbers, trends, comparisons, frameworks
Fraud Investigator
Patterns, anomalies, evidence chains
Pro tip: Add experience level — "Senior Credit Risk Analyst with 10 years of experience in Southeast Asian consumer lending, specializing in PayLater products"
Pillar 4: Output Framing
Dimension
Example
Format
"Respond as a bullet list" / "Use a table"
Length
"In exactly 3 sentences" / "Under 200 words"
Structure
"Use sections: Summary, Analysis, Recommendation"
Style
"Write for a non-technical executive audience"
Constraints
"Do not include personal opinions"
Output Framing in Action
Same question: "What happened with PayLater chargebacks this quarter?"
❌ No output framing
PayLater chargebacks have been increasing this quarter. The rate went from about 2.1% to 2.8% which is a notable increase. There are several factors that could be contributing to this including seasonal spending patterns, new merchant onboarding, and changes in customer demographics. The team should look into this further and consider implementing additional fraud checks. It might also be worth reviewing the underwriting criteria for new merchants. Overall, while the increase is concerning, it's not yet at critical levels but should be monitored closely going forward.
⚠ Wall of text. No structure. Vague recommendations. Key numbers buried.
✅ With output framing
Summary: Chargeback rate rose 2.1% → 2.8% (+33%).
Root causes:
• New merchant cohort (Q2): 4.2% vs 1.8% established
• "Item not received" disputes +45% in Indonesia
Actions:
• Tighten onboarding for high-risk MCCs — Risk, 2 wks
• Add delivery confirmation for ID — Product, 1 wk
• Review top 20 merchants by volume — Ops, immediate
✓ Scannable. Key number upfront. Actions with owners & timelines.
All 4 Pillars Together
[ROLE] You are a Senior Merchant Risk Analyst at a
Southeast Asian digital payments company.
[CONTEXT] Below is the data for a flagged merchant.
Merchant: Kopi Corner Pte Ltd (ID: MC-8842)
Sector: Food & Beverage, Singapore
Monthly txn volume: 4,200 → 15,600 (6-month trend)
Avg transaction: $8.50 SGD
Chargeback rate: 0.3% → 4.1% (6-month trend)
Complaints: 12 in last 30 days (up from 2)
Top complaint: "Item not received" (45%)
[TASK] Produce a risk assessment with a
GREEN/AMBER/RED rating.
[OUTPUT] Format as:
1. Risk Rating (one sentence)
2. Transaction Analysis (cite numbers)
3. Chargeback Assessment (vs 1.0% threshold)
4. Recommended Actions (numbered, with owners)
Use ONLY the data provided. All amounts in SGD.
See the Difference: Merchant Review
❌ Without Technique
Prompt: "Review this merchant"
The merchant seems to be doing okay. Their transactions have been increasing which is good. There are some chargebacks but that's normal for this type of business. Overall they appear to be a reasonable merchant to work with. I would recommend continuing the relationship.
⚠ Vague, no data cited, no structure, no actionable recommendation, no risk rating
✅ With 4 Pillars
Prompt: Role + Context + Structure + Constraints
RISK RATING: AMBER
Transaction Analysis: Volume surged 271% (4,200→15,600) over 6 months. Avg txn flat at $8.50 suggests organic growth, not fraud.
Chargeback Alert: Rate escalated 0.3%→4.1%, exceeding the 1.0% benchmark by 4x. Primary driver: "Item not received" (45%).
Actions: 1) Issue chargeback warning letter (Merchant Ops, 5 days) 2) Reduce PayLater limit to minimum (Risk, immediate)
✓ Structured, data-cited, specific actions with owners and timelines
Module 2
Chain-of-Thought Reasoning
Making AI show its work — step by step
Why Chain-of-Thought?
Financial decisions require multi-step logic. CoT makes reasoning visible and auditable.
Without CoT
"Can this merchant afford a
$50K credit line?"
Answer: "Yes, they can afford it."
(No reasoning shown)
With CoT
"Think through this step by step."
Revenue: $120K
- Costs: $95K = $25K free cash
- Existing debt: $15K
= Available: $10K
New repayment: $5K
DSCR = 2.0x → Affordable
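The affordability math in the CoT answer is easy to verify outside the model. A quick Python check of the same figures (the 1.25x floor in the assertion is an illustrative lending threshold, not stated on the slide):

```python
# Recompute the CoT affordability check from the slide's numbers.
revenue = 120_000
costs = 95_000
existing_debt_service = 15_000
new_repayment = 5_000

free_cash = revenue - costs                    # 120K - 95K = 25K
available = free_cash - existing_debt_service  # 25K - 15K = 10K
dscr = available / new_repayment               # 10K / 5K = 2.0x

print(f"Available: ${available:,}  DSCR: {dscr:.1f}x")
assert dscr >= 1.25  # illustrative affordability floor, not a policy value
```

This is the real value of CoT in finance: because the model shows each step, you can recompute any of them.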
CoT Techniques
Technique
Trigger
Best for
Zero-Shot CoT
"Think step by step"
Quick calculations, simple logic
Few-Shot CoT
Provide example with reasoning
Consistent multi-step processes
Step-Back
"First identify key factors, then analyze"
Complex analysis needing prioritization
Self-Consistency
"Solve 3 ways, report majority"
High-stakes decisions
Finance rule: Any decision that could be audited should use CoT — the reasoning trail is your documentation.
Zero-Shot CoT Example
Just add "Think step by step" to any prompt — no examples needed.
❌ Without CoT
PROMPT: A merchant has revenue $120K, costs $95K, existing loan repayment $15K/month. They want a $50K credit line with $5K monthly repayments. Can they afford it?
AI RESPONSE:
Yes, they can afford it. The merchant has sufficient revenue to cover the additional repayment.
⚠ No math. No reasoning. Not auditable. Could be wrong.
✅ With "Think step by step"
PROMPT: Same data as left. + "Think through this step by step."
AI RESPONSE:
Step 1 — Free cash: $120K revenue − $95K costs = $25K
Step 2 — After existing debt service: $25K − $15K = $10K available
Step 3 — New repayment $5K → DSCR = $10K / $5K = 2.0x
Conclusion: Affordable, with 2.0x coverage on the new repayment.
Key insight: One added sentence turned an unverifiable "yes" into auditable math; every step can be checked against the inputs.
Step-Back Prompting Example
First identify principles, then apply them to the specific case.
Your prompt
PROMPT:
Before analyzing this insurance claim, first answer:
1. What are the 3 most important factors for travel insurance claim validity?
2. What are common red flags in Southeast Asia?
Now assess this claim using those factors:
Claimant: Sarah Tan (SG)
Trip: Bangkok, 3 nights
Claim: $4,200 SGD — laptop stolen from hotel
Filed: 12 days after return
Police report: Filed on last day of trip
Receipt: Online purchase, 2 months before trip
AI response
STEP 1 — KEY FACTORS:
• Timely filing & documentation • Official incident report
• Verifiable proof of ownership
STEP 2 — RED FLAGS:
• Filing delay >7 days • Police report on departure day
• High-value electronics • No hotel incident report
ASSESSMENT: HIGH RISK
• 12-day filing delay ❌
• Police report timing suspicious (last day) ❌
• $4,200 claim for 3-night trip ⚠
Recommend: Request hotel incident report + purchase proof before processing.
Why Step-Back works: The AI built a framework FIRST (factors + red flags), then applied it systematically — instead of jumping to "looks suspicious."
Self-Consistency for High Stakes
Solve 3 ways, report the majority — for decisions where being wrong is costly.
Your prompt
PROMPT:
Investigate this merchant for potential fraud.
Analyze using 3 independent approaches:
1. Velocity patterns (txn frequency vs norms)
2. Amount patterns (deviation from avg size)
3. Geographic patterns (location consistency)
For each: conclude FRAUDULENT / SUSPICIOUS / LEGITIMATE
Final assessment = majority conclusion.
If approaches disagree → flag for human review.
AI response
Approach 1 — Velocity: 340 txns/day vs 75 norm (+353%). Burst pattern at 2-4am. → FRAUDULENT
Approach 2 — Amount: Avg $4.50 vs $15 norm (−70%). Micro-transactions consistent with card testing. → FRAUDULENT
Approach 3 — Geographic: All transactions from Singapore IP. Consistent with merchant location. → LEGITIMATE
MAJORITY: 2/3 FRAUDULENT
Recommend: Suspend merchant, escalate to fraud team for full investigation.
Why 3 approaches? Approach 3 alone would have cleared this merchant. The majority vote catches what a single analysis misses.
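Once the model returns one verdict per approach, the majority vote itself can be deterministic code rather than another model judgment. A minimal sketch (the function name and the human-review fallback are illustrative):

```python
# Illustrative sketch: aggregate self-consistency verdicts by strict majority,
# escalating to human review when no verdict wins outright.
from collections import Counter

def majority_verdict(verdicts: list[str]) -> str:
    """Return the majority label, or HUMAN_REVIEW if there is no strict majority."""
    label, count = Counter(verdicts).most_common(1)[0]
    if count <= len(verdicts) // 2:  # e.g. a 1-1-1 split across 3 approaches
        return "HUMAN_REVIEW"
    return label

verdicts = ["FRAUDULENT", "FRAUDULENT", "LEGITIMATE"]  # the 3 approaches above
print(majority_verdict(verdicts))  # FRAUDULENT
```

Keeping the vote in code means the escalation rule is enforced exactly, not left to the model's judgment.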
See the Difference: Loan Decision
❌ Without CoT
Prompt: "Should we approve this PayLater application?"
Yes, I would recommend approving this PayLater application. The customer has a decent income and has been on the platform for a reasonable amount of time. Their payment history is mostly good with only minor issues.
⚠ No reasoning shown, no math, not auditable, "decent" and "reasonable" are not metrics
✅ With Chain-of-Thought
Prompt: "Think through each policy criterion step by step"
Step 1 — Income: $5,200 > $2,000 min ✅
Step 2 — DTI: ($400 loan + $3,000 req) / $5,200 = 65% > 40% max ❌
Step 3 — Late payments: 2 in 8 months ≤ 2 max ✅
Step 4 — History: 8 months > 6 min ✅
Decision: DECLINE. DTI of 65% exceeds 40% policy maximum. Suggest $1,500 counter-offer (DTI = 37%).
✓ Every criterion checked, math shown, auditable, includes counter-offer
Module 3
Role & Persona Prompting
Same data, dramatically different insights — just by changing who the AI "is"
The Persona Formula
You are [TITLE] at [COMPANY TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].
Conservative Analyst
"Senior Risk Analyst, 12 years in consumer credit. Known for being cautious. When in doubt, recommends additional verification."
Growth Analyst
"Business Development Analyst, 6 years in merchant acquisition. Known for finding opportunities. Always highlights growth potential alongside risks."
The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgment calls. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.
Persona in Action: Same Merchant, Different Eyes
Data: Merchant with 271% volume growth + chargeback rate rising from 0.3% to 4.1%
🛡 Conservative Risk Analyst
RISK RATING: RED — IMMEDIATE ACTION
Primary concern: Chargeback rate at 4.1% is 4x above the 1.0% policy threshold. This trajectory suggests systemic issues, not isolated incidents.
Volume growth concern: 271% growth without proportional infrastructure investment raises fulfillment risk. Rapid scaling often precedes quality collapse.
Recommendation:
• Reduce PayLater limit to minimum immediately
• Issue formal chargeback warning letter
• Place on 30-day enhanced monitoring
• If no improvement → initiate offboarding review
🚀 Growth Analyst
Growth signal: 271% volume increase indicates strong product-market fit. This merchant is scaling fast — a potential top-10 partner within 6 months.
Chargeback context: 4.1% rate is elevated but common during rapid scaling. 45% are "item not received" — a logistics issue, not fraud. Solvable with delivery confirmation.
Recommendation:
• Maintain current limits to support growth
• Offer delivery tracking integration (reduces chargebacks 60%)
• Assign dedicated merchant success manager
• Review in 60 days with chargeback target of 1.5%
Multi-Agent Framing
Get 3 perspectives in one prompt — no need to schedule 3 meetings.
Your prompt
PROMPT:
Evaluate this PayLater policy change (raising the limit from $500 to $1,000) as 3 experts: 🛡 Risk Officer, 📊 Product Manager, ⚖ Compliance Head.
Each: 3 concerns + Recommendation + 1 metric to monitor
Then: Balanced synthesis of all three views.
AI response (synthesis excerpt)
🛡 RISK: "Doubling limits increases exposure by $12M. Default rate likely rises 0.3-0.5pp. Recommend: phased rollout to top-tier customers first."
📊 PRODUCT: "Competitors offer $1,500+. Current $500 limit is #1 reason for churn. Revenue uplift est. $2.4M/quarter."
⚖ COMPLIANCE: "MAS Notice requires affordability assessment above $500. Must add income verification step."
SYNTHESIS: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.
Why this works: Forces balanced analysis. No single perspective dominates. The synthesis is where the real insight lives.
Same Data, Different Audiences
Data: "PayLater default rate increased from 2.1% to 2.8% this quarter"
Audience
Persona
Output style
Board
"You are the CFO presenting to the board"
Strategic, 5-minute read
Ops Team
"You are the Ops Manager briefing your team"
Actionable, task-oriented
Regulators
"You are Compliance Head responding to MAS"
Formal, regulation-referenced
Customers
"You are a support specialist"
Simple, empathetic
💡 Practice activity (10 min): Pick the same data point above. Write prompts for 2 different audiences. Compare how the tone, detail level, and recommendations change.
Module 4
Structured Outputs & RAG
JSON extraction, document grounding, and meta-prompting
Why Structure Matters
Unstructured = Conversation
Different every time. Hard to compare. Can't feed into systems. Requires human parsing.
Structured = Form
Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.
Finance use cases:
Invoice extraction → accounts payable system
Transaction categorization → reconciliation
Complaint classification → route to correct team
KYC document parsing → verification forms
How to Get Structured Output
Tell the AI exactly what shape the output should take. The more specific, the more consistent.
Technique
Prompt example
What you get
Named sections
"Use these sections: Summary, Risk Factors, Recommendation"
Same headings every time — scannable, comparable
Table format
"Present as a table with columns: Metric | Value | Benchmark | Status"
Aligned rows — easy to scan and compare across items
Forced rating
"Give a GREEN/AMBER/RED rating. Justify in exactly 2 sentences."
Consistent decision format across all reviews
Length control
"Executive summary: max 3 sentences. Detail section: max 200 words."
Right depth for the audience
Markdown output
"Save as .md with ## headings, bullet lists, and | tables"
AI-native format — low tokens, reusable, versionable
Pro tip: Combine techniques — "Use sections: Summary (3 sentences), Risk Table (Metric | Value | Benchmark), Actions (numbered, with owner and deadline). Return the risk rating as JSON at the end."
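If you follow the pro tip and ask for the risk rating as JSON at the end, the surrounding prose must be stripped before a downstream system can parse it. A minimal sketch (the response text here is invented for illustration):

```python
# Illustrative sketch: extract the trailing JSON rating from a model response.
import json
import re

response = """## Summary
Chargebacks exceed the 1.0% threshold; recommend enhanced monitoring.

{"risk_rating": "AMBER", "chargeback_rate": 4.1}"""

match = re.search(r"\{.*\}", response, re.DOTALL)  # grab the JSON-looking block
rating = json.loads(match.group(0))
print(rating["risk_rating"])  # AMBER
```

In production you would also handle the case where no JSON block is found, since models occasionally ignore format instructions.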
The Best Default Format: Markdown
When you ask AI to save output as a file or produce a reusable document, Markdown wins on every dimension:
Ask AI to "save as .md" — you get structured headings, tables, and lists with 60% fewer tokens than HTML. Readable by you, parseable by AI, and on Day 3 every artifact you create (SKILL.md, steering files) will be Markdown.
Why Markdown? The Numbers
Not just a preference — Markdown is measurably better for AI work:
60%
fewer tokens than HTML for the same content structure
35%
better RAG retrieval accuracy with clean Markdown vs unstructured text
61%
table extraction accuracy in Markdown vs 54% for HTML tables
llms.txt
new web standard (2024) — websites now serve Markdown specifically for AI agents
What this means for you: Your steering files and SKILL.md load on every AI request. Concise Markdown = lower cost, better accuracy, and outputs that are reusable across tools. Detailed sources in the interactive explainer.
The Grounding Problem
Without grounding rules, the AI mixes its training data with your documents — and you can't tell which is which.
❌ Without grounding
Fills gaps with plausible fiction — invents policy details that aren't yours
Uses "typically" and "usually" — hedging that masks guessing
Mixes sources invisibly — your doc + training data, no way to tell
Never says "I don't know" — answers confidently regardless
✅ With grounding rules
Every claim traces to a source — citations after each statement
Admits gaps explicitly — "[INSUFFICIENT DATA]" instead of inventing
No outside knowledge — only the provided documents
Audit-ready output — regulators can verify every claim
Why this matters in finance: If a customer disputes a charge based on AI-generated policy guidance that was hallucinated, your team has no defense. Grounding rules make every AI output traceable.
RAG — The 4 Grounding Rules
Add these rules to any prompt where accuracy matters:
CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in
provided documents"
4. Do NOT use outside knowledge
DOCUMENTS:
[Doc 1: PayLater Terms v3.2]
[Doc 2: MAS Notice PSN 06]
QUESTION: What are our obligations if a customer
misses 3 consecutive PayLater payments?
When to use: Policy lookups, compliance checks, contract review, regulatory Q&A, audit preparation — any task where the answer must come from YOUR documents.
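The four rules are boilerplate you can prepend automatically to any policy question. A sketch assuming documents arrive as a simple name-to-text mapping (the function name and <<< >>> delimiters are illustrative):

```python
# Illustrative sketch: wrap any question with the 4 grounding rules and
# clearly delimited documents, so instructions and data cannot blur together.
GROUNDING_RULES = """CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: say "Not available in provided documents"
4. Do NOT use outside knowledge"""

def grounded_prompt(documents: dict[str, str], question: str) -> str:
    """Prepend the grounding rules and delimit each document by name."""
    doc_blocks = "\n\n".join(
        f"[Doc: {name}]\n<<<\n{text}\n>>>" for name, text in documents.items()
    )
    return f"{GROUNDING_RULES}\n\nDOCUMENTS:\n{doc_blocks}\n\nQUESTION: {question}"

prompt = grounded_prompt(
    {"PayLater Terms v3.2": "Section 8.3: After 3 missed payments the account enters Default."},
    "What are our obligations after 3 consecutive missed payments?",
)
```

Because the rules are constant, every grounded query in your team uses identical wording, which makes audit review far simpler.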
See the Difference: Policy Q&A
❌ Without RAG Grounding
Prompt: "What happens if a customer misses 3 PayLater payments?"
If a customer misses 3 consecutive PayLater payments, the account is typically suspended and the outstanding balance may be sent to collections. Late fees of around 1-2% per month are usually applied. The customer's credit score will be negatively impacted and they may be barred from future PayLater services.
⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination.
✅ With RAG Grounding
Prompt: "ONLY from provided documents" + policy doc attached
Per the PayLater Terms v3.2:
• After 3 missed payments, the account enters "Default" status [Section 8.3]
• A late fee of $5 or 1.5% (whichever higher) per missed payment [Section 7.1]
• [INSUFFICIENT DATA: credit score reporting not in provided document]
✓ Every claim cites a section. Admits what it doesn't know. No hallucination.
Meta-Prompting
Ask the AI to write prompts for you:
I need a prompt that helps our compliance team
assess whether a new MAS regulatory update
affects our PayLater product.
The prompt should:
- Work with any MAS circular
- Produce a structured impact assessment
- Identify operations that need to change
- Suggest a compliance timeline
Write the best possible prompt for this task.
The AI builds a tool for your team. Use the generated prompt repeatedly — it's a reusable asset.
Module 5
Evaluating Your Prompts
How do you KNOW your prompts are working?
Why Evaluate?
Prompts degrade over time — model updates change behavior
"It looks good" is not a metric — you need measurable quality
Compliance requires evidence that AI outputs meet standards
You need to compare version A vs version B objectively
The problem: Most teams deploy prompts based on "it looked good when I tested it once." That's like shipping software without tests.
Manual Evaluation: Rubrics
Criterion
1 (Poor)
3 (OK)
5 (Excellent)
Completeness
Missing 3+ sections
All sections, some thin
All sections thorough
Data grounding
Unsupported claims
Mostly grounded
Every claim cites data
Actionability
No recommendation
Vague recommendation
Specific actions + owners
Consistency
Different each run
Mostly consistent
Identical structure
Process: Run same prompt 5 times → score each → average = quality score
Scale it up: wrap the same rubric in an LLM-as-judge prompt, run it on 10 outputs, and compare scores between template versions.
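The run-score-average process is a few lines of Python once scores are collected. A sketch with invented scores for five runs of the same prompt:

```python
# Illustrative sketch: average rubric scores (1-5 per criterion) over 5 runs.
from statistics import mean

CRITERIA = ["completeness", "grounding", "actionability", "consistency"]

runs = [  # invented scores for five runs of one prompt
    {"completeness": 5, "grounding": 4, "actionability": 4, "consistency": 5},
    {"completeness": 4, "grounding": 4, "actionability": 3, "consistency": 5},
    {"completeness": 5, "grounding": 5, "actionability": 4, "consistency": 4},
    {"completeness": 4, "grounding": 3, "actionability": 4, "consistency": 5},
    {"completeness": 5, "grounding": 4, "actionability": 5, "consistency": 4},
]

per_run = [mean(r[c] for c in CRITERIA) for r in runs]  # one score per run
quality_score = mean(per_run)                            # overall quality
print(f"Quality score: {quality_score:.2f} / 5")
```

Recording the per-run scores, not just the average, also shows you how consistent the prompt is across runs.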
A/B Testing Prompts
Process
Same input data, two prompt versions
Run both 10 times each
Score with the judge prompt
Higher average score wins
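Comparing two versions then reduces to a mean comparison. A sketch with invented scores (the 0.2 tie margin is an illustrative choice, not a standard; pick one that suits your rubric):

```python
# Illustrative sketch: A/B-compare two prompt versions by mean judge score,
# declaring a tie when the margin is too small to act on.
from statistics import mean

scores_a = [3.8, 4.0, 3.5, 4.2, 3.9, 3.7, 4.1, 3.6, 4.0, 3.8]  # 10 runs, version A
scores_b = [4.4, 4.6, 4.2, 4.5, 4.3, 4.7, 4.4, 4.1, 4.6, 4.5]  # 10 runs, version B

margin = mean(scores_b) - mean(scores_a)
winner = "B" if margin > 0.2 else ("A" if margin < -0.2 else "TIE")
print(f"A: {mean(scores_a):.2f}  B: {mean(scores_b):.2f}  -> {winner}")
```

A tie means "keep testing", not "pick either": ten runs is a small sample, and small margins flip easily.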
When to Re-evaluate
After any model update
When users report quality issues
Monthly for production templates
After any template modification
Module 6 · NEW
From Manual Prompts to Automated Tools
You build the template once. The tools do the rest.
The Reality: Nobody Writes Long Prompts Every Day
You learn the techniques → build the template once → let the tools handle the rest.
Phase
What you do
Tool
1. Learn
Master the techniques (today)
Your brain
2. Build
Create a reusable template with {{variables}}
Kiro / any AI chat
3. Optimize
Let AI rewrite your prompt for better performance
Bedrock Prompt Optimization
4. Store & Share
Save versioned templates with metadata
Bedrock Prompt Management
5. Reuse
Fill in variables and run — no rewriting needed
Bedrock Console / API
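Filling {{variables}} needs no special tooling while you are still in the manual phase. A minimal sketch (the template text and variable names are illustrative):

```python
# Illustrative sketch: fill a {{variable}} template, failing loudly if any
# placeholder is left unfilled.
TEMPLATE = """[ROLE] You are a {{role}}.
[CONTEXT] {{merchant_data}}
[TASK] Produce a risk assessment with a GREEN/AMBER/RED rating.
Use ONLY the data provided. All amounts in SGD."""

def fill(template: str, values: dict[str, str]) -> str:
    """Substitute {{name}} placeholders; raise if any remain unfilled."""
    out = template
    for name, value in values.items():
        out = out.replace("{{" + name + "}}", value)
    if "{{" in out:
        raise ValueError("unfilled template variables remain")
    return out

prompt = fill(TEMPLATE, {
    "role": "Senior Merchant Risk Analyst",
    "merchant_data": "Chargeback rate: 0.3% -> 4.1% (6-month trend)",
})
```

The same {{name}} syntax carries over directly when you later move the template into Bedrock Prompt Management.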
Bedrock Prompt Management
Your prompt library — stored, versioned, and shared across the team.
Manual (today)
Bedrock Prompt Management
Templates in markdown files
Stored as managed resources
Copy-paste to test
One-click testing across models
No version history
Immutable version snapshots
Manual comparison
Side-by-side model comparison
Share via email/Slack
Shared across team via API
No additional charge — you only pay for model tokens during testing.
Prompt Management: Key Features
Prompt Templates with {{variables}} — same syntax from the exercises. Define variables with descriptions and defaults.
Version Management — every change creates an immutable snapshot. Roll back anytime.
Multi-Model Testing — test across Claude, Nova, Llama side-by-side. Compare quality, latency, cost.
Up to 3 Prompt Variants — compare different versions of the same prompt to find the best performer.
Think of it as: Google Docs for prompts — versioned, shared, and always accessible. But with built-in testing across multiple AI models.
Prompt Optimization (Instructor Demo)
You write a basic prompt. Bedrock rewrites it for better performance — automatically.
Your prompt
"Assess this merchant's risk level"
6 words. No structure, no role, no constraints.
Bedrock's optimized version
"You are a Senior Risk Analyst
specializing in SEA digital payments.
Produce a risk assessment:
1. Rating (GREEN/AMBER/RED)
2. Transaction Pattern Analysis
3. Chargeback Assessment
4. Recommended Actions
Base analysis ONLY on provided data."
Persona + structure + grounding — applied automatically
How Prompt Optimization Works
Step 1: Submit your prompt (even a short, rough one)
Step 2: Bedrock analyzes the prompt components
Step 3: It rewrites with best practices — structure, constraints, model-specific formatting
Step 4: Compare original vs optimized output side-by-side
Step 5: Save the optimized version to your Prompt Management library
GA — April 2025. Supports Claude, Amazon Nova, Meta Llama, DeepSeek, Mistral. The techniques you learned today help you evaluate whether the optimized prompt is actually good.
The Bottom Line
Your concern
The solution
"I don't want to write long prompts every time"
Build the template once → reuse with {{variables}}
"I'm not sure my prompt is good enough"
Prompt Optimization rewrites it automatically
"My team needs to share and version prompts"
Prompt Management stores everything centrally
"Which model gives the best result?"
Multi-model testing compares side-by-side
For developers: Intelligent Prompt Routing auto-selects cheaper models for simple tasks (up to 30% cost savings). Prompt Flows chains prompts into automated workflows. These are covered in Day 3.
Deliverable: Reusable template for APPROVE/CONDITIONS/DECLINE credit narratives
Open the workshop site → Prompt Engineering Exercises
Wrap-up
Best Practices & Prompt Optimization
Common mistakes, optimization strategies, and recovery patterns
7 Prompt Mistakes Everyone Makes
Mistake
Why it hurts
Quick fix
The Kitchen Sink
Cramming 5 tasks into 1 prompt
One task per prompt, chain results
The Blank Canvas
No examples = AI guesses your format
Show 1-2 examples of desired output
The Trust Fall
No grounding = confident hallucinations
"ONLY from provided data"
The Vague Ask
"Analyze this" — analyze what, how, for whom?
Specify audience, format, length
The One-Shot Wonder
Expecting perfection on first try
Plan for 2-3 refinement turns
The Copy-Paste Trap
Using the same prompt for different models
Tune syntax per model family
The Set-and-Forget
Never re-testing after model updates
Monthly prompt health checks
The Draft-Score-Revise Loop
Don't accept the first output. Build a self-improving cycle into your prompt:
Step 1 — DRAFT: Write a merchant risk summary
using the data provided.
Step 2 — SCORE: Rate your draft on these criteria:
- Completeness (0-5): All required sections?
- Grounding (0-5): Every claim cites data?
- Actionability (0-5): Specific next steps?
Step 3 — REVISE: If total < 12, rewrite to fix
the lowest-scoring area. Max 2 revisions.
Output only the final version.
Result: The AI self-corrects before you even read it. Teams using this pattern report 40-60% fewer revision cycles.
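The same loop can also live in your code rather than inside one prompt. A sketch with the model call and the rubric scorer stubbed out (generate and score are placeholders, not a real API):

```python
# Illustrative sketch of draft -> score -> revise, with stubbed model calls.
def generate(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return "DRAFT: merchant risk summary with cited data ..."

def score(draft: str) -> dict[str, int]:
    """Stub rubric scorer (0-5 per criterion); replace with a judge prompt."""
    return {"completeness": 4, "grounding": 4, "actionability": 4}

def draft_score_revise(task: str, threshold: int = 12, max_revisions: int = 2) -> str:
    draft = generate(task)
    for _ in range(max_revisions):
        scores = score(draft)
        if sum(scores.values()) >= threshold:
            break  # good enough, stop revising
        weakest = min(scores, key=scores.get)  # revise the lowest-scoring area
        draft = generate(f"{task}\nRevise to improve: {weakest}\n\nPrevious draft:\n{draft}")
    return draft

final = draft_score_revise("Write a merchant risk summary using the data provided.")
```

Capping revisions (max 2, as in the prompt version) keeps cost bounded while still catching most weak drafts.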
Break Big Tasks into Small Steps
Complex tasks fail when you ask for everything at once. Decompose instead:
❌ One Giant Prompt
"Analyze our Q2 transactions,
identify fraud patterns, calculate
loss exposure, compare to Q1,
draft a board summary, and
recommend 3 prevention measures."
6 tasks = shallow work on each
✅ Chained Prompts
Prompt 1: "Analyze Q2 transactions
and flag anomalies"
Prompt 2: "From these anomalies,
identify the top 3 fraud patterns"
Prompt 3: "Calculate loss exposure
for each pattern"
Prompt 4: "Draft a board summary
with prevention measures"
Each step gets full attention
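Chaining is just a loop in which each answer becomes the next prompt's input. A sketch with the model call stubbed out (ask is a placeholder for whatever chat API you use):

```python
# Illustrative sketch of prompt chaining: each step's output feeds the next.
def ask(prompt: str) -> str:
    """Stub standing in for a real model call."""
    return f"<answer to: {prompt.splitlines()[0]}>"

steps = [
    "Analyze Q2 transactions and flag anomalies:",
    "From these anomalies, identify the top 3 fraud patterns:",
    "Calculate loss exposure for each pattern:",
    "Draft a board summary with prevention measures:",
]

result = "[Q2 transaction data]"
for step in steps:
    result = ask(f"{step}\n{result}")  # previous answer becomes next input
print(result)
```

In practice you would review each intermediate result before passing it on, which is exactly where chaining beats the one giant prompt.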
Tell the AI What NOT to Do
Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:
Problem
Negative constraint to add
AI adds unsolicited opinions
"Do not include personal opinions or speculation"
AI uses data not in your input
"Do not reference any data outside the provided documents"
AI writes too much
"Do not exceed 300 words. Do not add a conclusion section"
AI hedges everything
"Do not use phrases like 'it depends' or 'generally speaking'"
AI explains obvious things
"Do not explain what PayLater is or how digital wallets work"
AI invents numbers
"If a metric is not in the data, write [DATA NOT AVAILABLE]"
Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.
Decision Rules: Override Subjective Judgment
Different models give different ratings for the same data. Decision rules enforce your policy:
❌ Without rules
Claude: "4.1% chargebacks = critically high" → RED
Llama: "271% growth offsets risk" → AMBER
Same data, different conclusions.
✅ With decision rules
"Apply this rule: RED if chargeback rate > 3.0%"
Claude: "Rule: 4.1% > 3.0%" → RED
Llama: "Rule: 4.1% > 3.0%" → RED
Both agree. Policy enforced.
Use when: Consistency matters more than creativity — risk ratings, credit decisions, compliance. If your company has a policy threshold, encode it in the prompt.
Structure Your Prompts Like Documents
Well-organized prompts produce well-organized outputs. Use clear sections and delimiters:
### ROLE
You are a Senior Payment Operations Analyst.
### CONTEXT
<<<
[Paste your transaction data or document here]
>>>
### TASK
Analyze the data for anomalies in Thailand and Vietnam.
### OUTPUT FORMAT
- Executive summary (3 sentences)
- Anomaly table: Market | Type | Severity | Evidence
- Recommended actions (numbered, with owner)
### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words
Why delimiters matter: Without clear separation, the AI may confuse your instructions with your data — especially dangerous when pasting policy documents.
Show, Don't Tell: The Power of Examples
One good example is worth 100 words of instruction:
❌ Telling
"Categorize each transaction as
high risk, medium risk, or low
risk based on amount, frequency,
and merchant type. Format as a
table with columns for transaction
ID, category, and reasoning."
50 words of instruction, AI still guesses your format
✅ Showing
"Categorize transactions like this:
| ID | Risk | Reason |
| T001 | HIGH | $12K single txn,
new merchant, no history |
| T002 | LOW | $45 recurring,
12-month pattern |
Now categorize these: [data]"
One example = perfect format every time
The 3-Round Prompt Improvement Workflow
Every production-quality prompt goes through this cycle:
Round
What you do
What improves
Round 1: Baseline
Write your first prompt using the 4 pillars. Run it 3 times.
You see what the AI gets right and wrong
Round 2: Fix failures
Add negative constraints for each failure. Add an example of good output. Run 3 more times.
Consistency jumps from ~60% to ~85%
Round 3: Polish
Add self-review step. Tighten length/format. Test with edge cases.
Production-ready at ~95% consistency
Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.
Build a Team Prompt Library
Your best prompts are team assets, not personal notes. Treat them like shared templates:
What to include
Prompt name and purpose
The full prompt with {{variables}}
Which model and temperature to use
1-2 example outputs (good vs bad)
Known limitations and edge cases
Last tested date and model version
Starter library for finance
Merchant risk assessment
Transaction anomaly detection
Customer complaint classification
Policy document Q&A (RAG)
Board summary generator
Regulatory impact assessment
Start today: The template you built in the exercise is your first library entry. Share it with your team this week.
Why AI "Gets Dumber" Mid-Conversation
It's not a bug — it's a context window problem. Every AI has a limited "working memory."
What happens inside
Every message + every AI response stays in the context window
At 60-70% capacity, performance drops sharply — sudden cliffs, not gradual
AI compresses and deprioritizes earlier messages
"Lost in the Middle": AI remembers start and end best, forgets the middle
What you experience
AI contradicts instructions you gave 10 messages ago
AI re-introduces ideas you already rejected
AI ignores constraints from the start of the chat
Outputs get vague, generic, or repetitive
AI starts "hallucinating" more frequently
Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a better model.
5 Rules for Managing Long Conversations
Rule
Why it works
One task per session — don't mix debugging, writing, and analysis
Each session gets full attention capacity
Paste only what's relevant — don't dump entire documents
Reduces noise, keeps AI focused
Key instructions at start AND end — not buried in the middle
Exploits primacy + recency bias
Keep sessions under 15-20 turns — start fresh after that
Stays within the performance sweet spot
Use "session summaries" — ask AI to summarize, paste into new chat
Fresh context window with all the knowledge
The Session Summary Technique
When a conversation gets too long but you can't lose the state:
Step 1: Ask for a summary
PROMPT (in the old session):
Summarize our conversation so far:
• Key decisions we made
• Data and findings so far
• What we still need to do next
Format as a briefing I can paste into a new session.
Step 2: Start fresh with context
PROMPT (in the new session):
Here is the context from our previous session:
[PASTE SUMMARY HERE]
Continue from where we left off. The next step is to draft the risk committee report based on the findings above.
✓ Fresh context window + all accumulated knowledge = best of both worlds
Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.
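For teams that script their sessions, the two steps above can be captured as reusable strings. A minimal sketch; the wording comes straight from the prompts on this slide, and the function name is illustrative.

```python
# Sketch of the session-summary technique as reusable strings.

# Step 1: sent at the end of the old session.
SUMMARY_REQUEST = """Summarize our conversation so far:
- Key decisions we made
- Data and findings so far
- What we still need to do next
Format as a briefing I can paste into a new session."""

def new_session_prompt(summary: str, next_step: str) -> str:
    """Step 2: build the opening message for the fresh session
    from the saved briefing plus an explicit next step."""
    return ("Here is the context from our previous session:\n\n"
            f"{summary}\n\n"
            f"Continue from where we left off. The next step is to {next_step}.")
```

Naming the next step explicitly matters: a fresh session has zero memory, so "continue" alone gives it nothing to continue from.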
The Conversation Funnel
Start broad, then narrow. Each turn builds on context — but keep it focused.
The pattern
Turn 1 (Explore):
"Analyze this month's transaction data — identify top 3 trends"
Turn 2 (Deep-dive):
"Expand on trend #2 — the PayLater chargeback increase"
Turn 3 (Produce):
"Draft a 1-page summary for the risk committee"
Turn 4 (Polish):
"Make the tone more formal and add data citations"
Why it works
Each turn is focused on one thing
You review and correct at each step
Errors don't compound — you catch them early
4 focused turns > 1 massive prompt
When to reset: If Turn 3 goes wrong, don't keep correcting. Start a new session with: "Here's the data and the trend analysis. Draft a risk committee summary."
When to Start Fresh vs. Continue
🟢 Start a New Session
Switching to a completely different task
Conversation has gone off track
Testing a refined prompt cleanly
Session is longer than 15-20 turns
AI keeps repeating the same mistake
AI contradicts earlier instructions
🔵 Continue the Session
Iterating on the same output
Need AI to remember earlier context
Building step by step (funnel pattern)
Refining format or tone
Follow-up questions on same topic
Session is still under 15 turns
The 3-strike rule: If you've corrected the AI 3 times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.
The #1 Misconception: "AI Remembers Me"
It doesn't. Each session is completely isolated. Here's what AI actually sees:
❌
What people think
"The AI remembers our conversation from last week"
"It knows what I worked on yesterday"
"I should keep this session open so it doesn't forget"
"My old tabs are giving it context"
✅
How it actually works
Each session starts with zero memory
AI only sees: your current message + this session's history
Old tabs/sessions have no effect on new ones
Closing old sessions is safe — it's cosmetic, not functional
The mental model: chat is ephemeral, files are permanent. The AI's "memory" is the files it created — reports, templates, skills. Those persist in your workspace. The conversation that produced them does not. When you need context in a new session, reference the files — not the old chat.
"Save Your Game" — AI Memory for Long Projects
For projects spanning weeks or months, you need two files — not one giant document:
project-status.md
Load every session — compact, ~2 pages
What exists now (file list, decisions)
What's remaining (next steps)
Key rules and constraints
Like a project brief — current state only
session-log.md
Load only when needed — grows over time
What was done each session
Why decisions were made
Technical details and gotchas
Like meeting minutes — history archive
When
What to say
Start session
"Here's my project context: [paste project-status.md]"
End session
"Update project-status.md with current state. Append today's work to session-log.md."
Look back
"Load session-log.md — when did we change the approval threshold?"
Why two files? A single status doc that grows every session wastes tokens. After 10 sessions, you're loading 20 pages of history on every request. Split it: load the brief (2 pages) always, load the history only when you need it. Same knowledge, 90% fewer tokens.
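The end-of-session ritual is simple enough to automate. A minimal sketch of the update step, assuming plain markdown files in a project folder; the function name is illustrative. Note the asymmetry: the status file is overwritten (current state only), the log is appended (history archive).

```python
# Sketch of the end-of-session "save your game" update.
from datetime import date
from pathlib import Path

def save_game(workdir: str, status: str, session_notes: str) -> None:
    """Overwrite the compact brief; append today's notes to the archive."""
    root = Path(workdir)
    # project-status.md holds current state only, so it is rewritten each time.
    (root / "project-status.md").write_text(status, encoding="utf-8")
    # session-log.md grows over time; load it only when you need the history.
    with open(root / "session-log.md", "a", encoding="utf-8") as log:
        log.write(f"\n## Session {date.today().isoformat()}\n{session_notes}\n")
```

This is why the brief stays at ~2 pages after 50 sessions while the log keeps the full "why" trail for questions like "when did we change the approval threshold?"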
Circuit Breaker Patterns
Pattern
Symptom
Fix
Repetition Loop
Same wrong output after correction
New session, rephrase
Hallucination Spiral
Inventing data
"Use ONLY provided data"
Over-Eager Helper
2,000 words for 5 bullets
"Exactly 5 bullets, under 20 words"
Format Drift
Format changes mid-output
"Continue EXACTLY same format"
Confidence Trap
Uncertain info as fact
"Prefix uncertain with [UNCERTAIN]"
Working Safely: Undo, Revert, Recover
What happens when the AI makes a mistake? You have safety nets at every level.
Safety net
How it works
When to use
Supervised Mode
Shows changes, asks for approval before applying
High-stakes outputs, first time using a skill
Revert Changes
Click to undo individual file changes
AI modified a file incorrectly
New Session
Start fresh — clean context, no polluted history
AI went off track, switching tasks
Autopilot + Review
AI works autonomously, you review after
Trusted skills, routine tasks
Think of it like "track changes" in Word. Supervised mode shows you what's about to change. You approve or reject. If you approve and it's wrong, you can still revert.
Two Controls, Two Different Jobs
These are independent settings — changing one does not affect the other.
Execution Mode
Controls how much freedom Kiro has over your files
Supervised
Shows changes, waits for your approval
Autopilot
Applies changes directly, you review after
Model Selection
Controls which AI brain answers your question
Auto
Kiro picks the best model per task (recommended)
Sonnet / Opus / Haiku
You choose a specific model
Our recommendation: Use Supervised + Auto for today's labs. Supervised lets you see what Kiro is doing. Auto picks the right model so you don't have to. Switch to Autopilot on Day 3 when you're comfortable.
Using Kiro for Business Users
Vibe mode: Describe what you want → Kiro writes and runs the code
File context: Drag CSVs, PDFs, JSON into chat
Iterative refinement: "Make the chart bigger" / "Add a percentage column"
New Session per task: Keep context focused
Remember: You don't need to understand the code Kiro writes. You just need to describe what you want clearly — using the 4 pillars from Module 1.
Quick Reference Card
Technique
Trigger Phrase
Zero-Shot CoT
"Think step by step before answering"
Expert Persona
"You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective
"Present the case FOR and AGAINST"
Structured Output
"Use EXACTLY these sections: 1... 2... 3..."
RAG Grounding
"Base your answer ONLY on the provided documents"
Self-Critique
"Review: Is every claim supported by data?"
Meta-Prompting
"Write the best prompt for [TASK]"
LLM-as-Judge
"Score this output against these criteria"
Negative Constraints
"Do NOT include / Do NOT use / Do NOT exceed"
Decision Rules
"If [metric] > [threshold] → MUST be [rating]"
Task Decomposition
Break 1 big prompt into 3-4 focused prompts
Draft-Score-Revise
"Draft, then score on [rubric], then revise if < threshold"
Show Don't Tell
Include 1-2 examples of desired output format
Preview
From Prompts to Workflow Automation
Everything you learned today becomes the foundation for autonomous AI agents
Your Prompt Skills = Agent Design Skills
Every technique you learned today maps directly to how AI agents are built:
Day 2: Prompt Technique
Day 3: Agent Component
What it does in an agent
Persona prompting
Agent role definition
Defines who the agent "is" and how it behaves
Structured output
Output contracts
Ensures consistent, usable results
Chain-of-Thought
Reasoning strategy
Agent thinks step-by-step before acting
RAG grounding
Knowledge base
Agent accesses your company's documents
Negative constraints
Guardrails
Prevents the agent from doing things it shouldn't
Prompt template
SKILL.md file
The template becomes a reusable, shareable skill
Key insight: You don't need to code to design an AI agent. You need to write great instructions — which is exactly what you practiced today.
Preview: Templates → Skills → Automation
Tomorrow you'll turn your prompt templates into automated workflows:
Today: Prompt template
You are a Senior Risk Analyst...
Analyze merchant data and produce:
1. Risk Rating (GREEN/AMBER/RED)
2. Transaction Analysis
3. Recommended Actions
Tomorrow: The same template as a reusable skill
✓ Auto-activates, shared, versioned ✓ Works in Kiro AND Claude Cowork
Day 3 covers: Workflow patterns (chaining, parallelization, routing, orchestration), the Kiro stack (steering + skills + hooks), and you'll design an agent for your team's workflow.
The 3-Day Journey
📚
Day 1
"What can AI do?"
Fundamentals, use cases, responsible AI
💬
Day 2 (Today)
"How do I talk to AI?"
Prompt engineering, templates, tools
🤖
Day 3 (Tomorrow)
"How do I make AI work on its own?"
Agentic AI, workflow automation, no code
💡 Homework: What repetitive task does your team do every week that could be automated? Come to Day 3 with a specific workflow — you'll design an AI agent for it.
Day 2 Outcomes
Design prompts using the 4 pillars (Clarity, Context, Role, Output)
Apply Chain-of-Thought and Self-Consistency for financial reasoning
Create expert personas for different audiences
Extract structured data and ground responses in documents
Evaluate prompt quality with rubrics and LLM-as-Judge
Use Bedrock tools to optimize and manage prompts at scale
Manage long conversations and know when to start fresh
Build reusable prompt templates — the foundation for AI agents
Identify a workflow from your team to automate on Day 3
Thank You
Tomorrow: Make AI Work On Its Own
Agentic AI · Workflow Automation · Agent Design · No Coding Required
💡 Homework: Come with a workflow your team does every week that could be automated
AnyCompany Financial Group · Generative & Agentic AI on AWS