
prompt-repetition

by @supercent-io · v1.0.0
4.3 (0 ratings)

Uses prompt repetition to work around the limitations of causal language models in LLMs, improving how they handle context and questions and raising generation quality.

Tags: ai-engineering, prompt repetition · GitHub
Installation

```shell
npx skills add supercent-io/skills-template --skill prompt-repetition
```

Before / After Comparison

Before

Without an understanding of the LLM's internal mechanics (such as causal language modeling) and a prompt structure optimized for them, an LLM may underperform on complex contexts, multiple-choice questions, or long texts. For example, when the question is placed after the context, the model cannot fully exploit the context while answering, and the order of answer choices can sway its judgment.

After

By addressing the problems covered in the Prompt Repetition skill (the context-question problem, the options-first MCQ problem, and the position/index problem), you can design prompt structures that compensate for these inherent limitations. Repeating key information, or reordering it, ensures the LLM can fully relate the context to the question, improving answer accuracy and quality.

SKILL.md

prompt-repetition

## Problem Being Solved

LLMs are trained as causal language models, where each token attends only to previous tokens. This leads to:

- **Context-Question Problem**: the question is unknown while the context is being processed
- **Options-First MCQ Problem**: the model cannot fully understand the question's context when it first sees the answer choices
- **Position/Index Problem**: attention weights weaken for specific position information in long lists

Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some benefits of bidirectional attention.

## When to Use This Skill

- **Lightweight models**: claude-haiku, gemini-flash, gpt-4o-mini, etc.
- **Options-First MCQ**: multiple choice where answer choices appear before the question
- **Context + Question**: searching for specific information in long contexts
- **Index/Position tasks**: position-based queries in inventories or lists
- **NPC dialogue**: maintaining consistency for game AI characters
- **Non-reasoning tasks**: tasks that do not use chain-of-thought

## How It Works

### Limitations of Causal Attention

```
[Context] → [Question]
     ↓
Cannot reference Question content when processing Context tokens.
Attention weights for Context are already finalized by the time Question tokens appear.
```

### How Prompt Repetition Solves This

```
[First Pass]            [Second Pass]
Context → Question  →   Context' → Question'
                             ↑
             Can reference the entire first pass
```

In the second repetition, the model reprocesses the information in the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.

Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique that mitigates the limitations of causal models.
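The two-pass idea above reduces to a plain string transformation. A minimal sketch for illustration (the `repeat_prompt` helper name and the separator are assumptions, not part of this skill's API):

```python
def repeat_prompt(prompt: str, times: int = 2, sep: str = "\n\n") -> str:
    """Emit the prompt `times` times so tokens in the second copy
    can attend to the complete first copy."""
    if times <= 1:
        return prompt
    return sep.join([prompt] * times)

context = "Inventory: 1. Sword 2. Shield 3. Potion"
question = "What item is in slot 2?"
single_pass = f"{context}\n{question}"

# The question now appears before a full copy of the context,
# so the second copy of the context is processed "question-aware".
two_pass = repeat_prompt(single_pass)
print(two_pass.count(question))  # → 2
```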
## Research Results (Google Research 2025)

| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |

Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex, 21.33% → 97.33% (+76%p).

### Tested Models

- Gemini 2.0 Flash / Flash Lite
- GPT-4o / GPT-4o-mini
- Claude 3.7 Sonnet / Claude 3 Haiku
- Deepseek V3

### Tested Benchmarks

- ARC (Challenge): scientific reasoning
- OpenBookQA: open-domain QA
- GSM8K: math problems
- MMLU-Pro: multitask language understanding
- MATH: mathematical problem solving
- NameIndex / MiddleMatch: custom position tasks

## Application Procedure

### Step 1: Verify Auto-Apply Target Models

| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |

### Step 2: Determine Repetition Count by Task Type

| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | A. B. C. D. choices first | 2× | +15–40%p |
| Index/Position | slot, position, index, N-th | 3× | +50–76%p |
| Context + Question | general question | 2× | +5–15%p |
| With CoT | step by step, think through | 0× (not applied) | ~0% |

### Step 3: Check Token Limits

```python
# Check the context budget before auto-applying
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```

### Step 4: Prompt Transformation

```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times.

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```

## Practical Examples

### Example 1: Options-First MCQ (Greatest Effect)

Before:

```
A. Paris B. London C. Berlin D. Madrid
Which city is the capital of France? Reply with one letter.
```

After (repetition ×2 applied):

```
A. Paris B. London C. Berlin D. Madrid
Which city is the capital of France? Reply with one letter.

A. Paris B. London C. Berlin D. Madrid
Which city is the capital of France? Reply with one letter.
```

Expected output: `A`. Accuracy: original 78% → after repetition 93% (+15%p).

### Example 2: Index/Position Tasks (Maximum Effect)

Before:

```
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map

What item is in slot 25?
```

After (repetition ×3 applied): the prompt is repeated 3 times.

Expected output: `Dragon Scale`. Accuracy: original 21% → after repetition 97% (+76%p).

### Example 3: Tool Call Prompt Handling

Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.

Before:

```
Use the calculator tool to compute 234 * 567. What is the result?
```

After (repetition ×2):

```
Use the calculator tool to compute 234 * 567. What is the result?

Use the calculator tool to compute 234 * 567. What is the result?
```

Research results show that full repetition including tool call sections is also effective.
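Steps 2–4 can be wired together in a few lines. This is a sketch under stated assumptions: the keyword regexes and the `choose_repetitions` helper are illustrative stand-ins, not the skill's actual detection logic:

```python
import re

# Illustrative keyword patterns derived from the Step 2 table (assumptions)
POSITION_RE = re.compile(r"\b(?:slot|position|index)\s+\d+", re.IGNORECASE)
COT_RE = re.compile(r"step by step|think through", re.IGNORECASE)

def choose_repetitions(prompt: str) -> int:
    """Pick a repetition count per the Step 2 task-type table."""
    if COT_RE.search(prompt):
        return 1  # CoT prompt: repetition not applied
    if POSITION_RE.search(prompt):
        return 3  # index/position task: largest gains
    return 2      # general default

def repeat(prompt: str) -> str:
    times = choose_repetitions(prompt)
    return "\n\n".join([prompt] * times)

print(choose_repetitions("What item is in slot 25?"))    # → 3
print(choose_repetitions("Solve this step by step."))    # → 1
print(choose_repetitions("Which city is the capital?"))  # → 2
```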
## Production-Ready Implementation

### Auto-Apply Transformer

```python
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass
from typing import Callable, List, Optional
import re

# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
    "claude-3-haiku": 200_000,
    "claude-haiku": 200_000,
    "gemini-flash": 1_000_000,
    "gemini-flash-lite": 1_000_000,
    "gemini-2.0-flash": 1_000_000,
    "gpt-4o-mini": 128_000,
    "gpt-low": 128_000,
}

# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())

# CoT patterns (excluded from apply)
COT_PATTERNS = [
    r"step by step",
    r"think through",
    r"let's think",
    r"reasoning:",
    r"chain of thought",
]

# Position/Index patterns (3x repetition)
POSITION_PATTERNS = [
    r"slot \d+",
    r"position \d+",
    r"index \d+",
    r"\d+(st|nd|rd|th)",
    r"item \d+",
    r"row \d+",
    r"column \d+",
]


@dataclass
class PromptRepetitionConfig:
    """Prompt repetition configuration"""
    default_repetitions: int = 2
    position_repetitions: int = 3
    separator: str = "\n\n"
    max_context_ratio: float = 0.8
    # The original marker string did not survive HTML rendering;
    # this value is a placeholder.
    applied_marker: str = "<!--prompt-repetition-applied-->"


class PromptRepetitionTransformer:
    """Auto-apply prompt repetition transformer for lightweight models"""

    def __init__(self, config: Optional[PromptRepetitionConfig] = None):
        self.config = config or PromptRepetitionConfig()

    def should_apply(self, model: str, prompt: str) -> bool:
        """Determine whether to auto-apply"""
        # Skip if already applied
        if self.config.applied_marker in prompt:
            return False
        # Check target model
        model_lower = model.lower()
        if not any(m in model_lower for m in AUTO_APPLY_MODELS):
            return False
        # Skip when a CoT pattern is detected
        prompt_lower = prompt.lower()
        for pattern in COT_PATTERNS:
            if re.search(pattern, prompt_lower):
                return False
        return True

    def determine_repetitions(self, prompt: str, model: str) -> int:
        """Determine the repetition count based on task type"""
        prompt_lower = prompt.lower()
        # Position/Index pattern detected -> 3x
        for pattern in POSITION_PATTERNS:
            if re.search(pattern, prompt_lower):
                return self.config.position_repetitions
        return self.config.default_repetitions

    def estimate_tokens(self, text: str) -> int:
        """Simple token count estimation (speed over precision)"""
        # Estimate approximately 4 characters = 1 token
        return len(text) // 4

    def transform(self, prompt: str, model: str) -> str:
        """Apply repetition to the prompt"""
        if not self.should_apply(model, prompt):
            return prompt
        repetitions = self.determine_repetitions(prompt, model)

        # Check the context limit
        model_lower = model.lower()
        max_tokens = 128_000  # Default value
        for m, tokens in MODEL_CONTEXT_WINDOWS.items():
            if m in model_lower:
                max_tokens = tokens
                break
        max_allowed = int(max_tokens * self.config.max_context_ratio)
        prompt_tokens = self.estimate_tokens(prompt)

        # Reduce repetitions if the token limit would be exceeded
        while prompt_tokens * repetitions > max_allowed and repetitions > 1:
            repetitions -= 1
        if repetitions <= 1:
            return prompt

        # Apply repetition and prepend the marker
        repeated = self.config.separator.join([prompt] * repetitions)
        return f"{self.config.applied_marker}\n{repeated}"

    def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
        """Wrap an LLM call function"""
        def wrapped(prompt: str, **kwargs):
            transformed = self.transform(prompt, model)
            return llm_fn(transformed, **kwargs)
        return wrapped
```

## How to Measure Effectiveness (Verification)

### A/B Testing Method

```python
def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
    """A/B test for prompt repetition effectiveness"""
    transformer = PromptRepetitionTransformer()
    results = {"baseline": [], "repeated": []}

    for prompt, expected in zip(prompts, ground_truth):
        # Baseline
        response_a = llm_fn(prompt)
        results["baseline"].append(response_a == expected)
        # With repetition
        repeated_prompt = transformer.transform(prompt, model)
        response_b = llm_fn(repeated_prompt)
        results["repeated"].append(response_b == expected)

    baseline_acc = sum(results["baseline"]) / len(prompts)
    repeated_acc = sum(results["repeated"]) / len(prompts)
    print(f"Baseline accuracy: {baseline_acc:.2%}")
    print(f"Repeated accuracy: {repeated_acc:.2%}")
    print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
```

### Key Metrics

| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct-answer rates |
| Consistency | Variance across 10 runs of the same prompt |
| Token cost | Input-token increase rate |
| Latency | Compare p50 and p99 latency |

## When NOT to Use

| Case | Reason |
|---|---|
| Using CoT | The reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding the context limit |
| Already repeated | Duplicate application wastes tokens |

## Cost-Accuracy Analysis

| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450 ms | 460 ms | +2% |
| Latency (p99) | 1200 ms | 1250 ms | +4% |
| Accuracy | 78% | 89% | +14%p |
| Cost per correct answer | $0.019 | $0.020 | +5% |

Key insight: the prefill phase is highly parallelized on GPU, so doubling the input tokens has minimal impact on latency.

## Multi-Agent Integration

### Auto-Apply Strategy per Agent

| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |

### Preventing Duplicate Application

To prevent duplicate application in multi-agent pipelines:

- Use markers: detect already-applied prompts via the marker
- Pass metadata: pass an `x-prompt-repetition-applied: true` header between agents
- Orchestrator management: the Claude Orchestrator tracks whether repetition has been applied when calling sub-agents

### Application Pattern

```
[Claude Sonnet] Planning (no repetition needed)
      ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
      ↓
[Claude Haiku] Execution (marker detected → duplicate application skipped)
```

## skill_loader.py Integration Guide

### Recommended Implementation

```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt-repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt
```

## Constraints

### Required Rules

- Lightweight models first: most effective for the haiku, flash, and mini series
- Limit repetitions: 2× for general tasks, max 3× for position tasks
- Context monitoring: watch for context overflow caused by repetition
- Check markers: a marker check is mandatory to prevent duplicate application

### Prohibited Rules

- No padding substitution: increasing length with `.` etc. has no effect (per the research)
- Do not combine with CoT: the effects cancel out
- Do not force-apply to reasoning models: they are already optimized
- No duplicate application: consecutive application without markers wastes tokens

## Quick Reference

```
=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low

=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)

=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)

=== Cost ===
Input tokens: +100%
Latency: +2% (prefill parallelization)
Cost per correct answer: +5%

=== Duplicate Application Prevention ===
Marker: see `applied_marker` in the transformer config
```

## References

- Prompt Repetition Improves Non-Reasoning LLMs (Leviathan et al., 2025)
- Chain-of-Thought Prompting Elicits Reasoning (Wei et al., 2023)
- Re-Reading Improves Reasoning in LLMs (Xu et al., 2024)

Weekly installs: 10.3K · Repository: supercent-io/sk…template · GitHub stars: 53 · First seen: Jan 24, 2026
Security audits: Gen Agent Trust Hub Pass · Socket Fail · Snyk Pass
Installed on: codex 10.2K · gemini-cli 10.2K · opencode 10.2K · github-copilot 10.1K · cursor 10.1K · amp 10.1K
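The marker-based duplicate prevention used across the multi-agent pipeline fits in a few lines. A minimal self-contained sketch; the marker string here is a hypothetical placeholder, since the real value lives in the transformer's `applied_marker` config:

```python
MARKER = "<!--prompt-repetition-applied-->"  # hypothetical placeholder value

def apply_once(prompt: str, times: int = 2) -> str:
    """Repeat the prompt only if no marker is present, then stamp it."""
    if MARKER in prompt:
        return prompt  # already applied by an upstream agent: skip
    repeated = "\n\n".join([prompt] * times)
    return f"{MARKER}\n{repeated}"

# Two agents in a pipeline both call apply_once; the second call is a no-op.
stage1 = apply_once("What item is in slot 25?", times=3)
stage2 = apply_once(stage1)
print(stage1 == stage2)  # → True: the marker prevented duplicate repetition
```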

User Reviews (0)


No reviews yet.

Statistics

Installs: 10.2K
Rating: 4.3 / 5.0
Version: 1.0.0
Updated: March 16, 2026
Comparison cases: 1 set


Compatible Platforms

🔧 Claude Code
🔧 OpenClaw
🔧 OpenCode
🔧 Codex
🔧 Gemini CLI
🔧 GitHub Copilot
🔧 Amp
🔧 Kimi CLI

Timeline

Created: March 16, 2026
Last updated: March 16, 2026