langsmith-observability
An LLM observability platform for debugging, evaluating, and monitoring language models and AI applications, improving development efficiency and model performance.
```bash
npx skills add davila7/claude-code-templates --skill langsmith-observability
```

## Before / After Comparison
Without the `langsmith-observability` skill, debugging LLM applications (prompts, chains, agents) typically relies on log files and manual inspection of outputs. It is hard to evaluate model performance systematically or to monitor LLM systems effectively in production, so problems are discovered late.

With the `langsmith-observability` skill, you can use the LangSmith platform to debug, evaluate, and monitor LLM applications. The skill guides developers in using LangSmith's features to systematically evaluate model outputs, build regression tests, and monitor production systems in real time. This significantly shortens debugging time, improves visibility into model performance, and accelerates iteration on AI features.
Metrics tracked: LLM application debugging time, model performance evaluation accuracy, production issue discovery speed.
## SKILL.md (langsmith-observability)
# LangSmith - LLM Observability Platform

Development platform for debugging, evaluating, and monitoring language models and AI applications.

## When to use LangSmith

Use LangSmith when:

- Debugging LLM application issues (prompts, chains, agents)
- Evaluating model outputs systematically against datasets
- Monitoring production LLM systems
- Building regression testing for AI features
- Analyzing latency, token usage, and costs
- Collaborating on prompt engineering

Key features:

- **Tracing**: Capture inputs, outputs, and latency for all LLM calls
- **Evaluation**: Systematic testing with built-in and custom evaluators
- **Datasets**: Create test sets from production traces or manually
- **Monitoring**: Track metrics, errors, and costs in production
- **Integrations**: Works with OpenAI, Anthropic, LangChain, LlamaIndex

Use alternatives instead:

- **Weights & Biases**: Deep learning experiment tracking, model training
- **MLflow**: General ML lifecycle, model registry focus
- **Arize/WhyLabs**: ML monitoring, data drift detection

## Quick start

### Installation

```bash
pip install langsmith

# Set environment variables
export LANGSMITH_API_KEY="your-api-key"
export LANGSMITH_TRACING=true
```

### Basic tracing with @traceable

```python
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable
def generate_response(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

# Automatically traced to LangSmith
result = generate_response("What is machine learning?")
```

### OpenAI wrapper (automatic tracing)

```python
from langsmith.wrappers import wrap_openai
from openai import OpenAI

# Wrap client for automatic tracing
client = wrap_openai(OpenAI())

# All calls automatically traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
```

## Core concepts

### Runs and traces

A run is a single execution unit (LLM call, chain, tool). Runs form hierarchical traces showing the full execution flow.
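To build intuition for what a tracing decorator captures, here is a dependency-free sketch of the same idea; `RUNS` and `traced` are hypothetical names for illustration only, not part of the LangSmith SDK, which sends runs to its backend instead of an in-memory list:

```python
import functools
import time

# Hypothetical in-memory "trace log" standing in for the LangSmith backend.
RUNS = []

def traced(fn):
    """Record inputs, output, and latency for each call (sketch of the idea)."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        output = fn(*args, **kwargs)
        RUNS.append({
            "name": fn.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": output,
            "latency_s": time.perf_counter() - start,
        })
        return output
    return wrapper

@traced
def echo(prompt: str) -> str:
    return f"You said: {prompt}"

echo("hello")
print(RUNS[0]["name"], RUNS[0]["output"])  # echo You said: hello
```

The real `@traceable` additionally nests child runs under their parent and attaches project, tags, and metadata, as the examples below show.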
```python
from langsmith import traceable

@traceable(run_type="chain")
def process_query(query: str) -> str:  # Parent run
    context = retrieve_context(query)  # Child run
    response = generate_answer(query, context)  # Child run
    return response

@traceable(run_type="retriever")
def retrieve_context(query: str) -> list:
    return vector_store.search(query)

@traceable(run_type="llm")
def generate_answer(query: str, context: list) -> str:
    return llm.invoke(f"Context: {context}\n\nQuestion: {query}")
```

### Projects

Projects organize related runs. Set via environment or code:

```python
import os
os.environ["LANGSMITH_PROJECT"] = "my-project"

# Or per-function
@traceable(project_name="my-project")
def my_function():
    pass
```

### Client API

```python
from langsmith import Client

client = Client()

# List runs
runs = list(client.list_runs(
    project_name="my-project",
    filter='eq(status, "success")',
    limit=100
))

# Get run details
run = client.read_run(run_id="...")

# Create feedback
client.create_feedback(
    run_id="...",
    key="correctness",
    score=0.9,
    comment="Good answer"
)
```

## Datasets and evaluation

### Create dataset

```python
from langsmith import Client

client = Client()

# Create dataset
dataset = client.create_dataset("qa-test-set", description="QA evaluation")

# Add examples
client.create_examples(
    inputs=[
        {"question": "What is Python?"},
        {"question": "What is ML?"}
    ],
    outputs=[
        {"answer": "A programming language"},
        {"answer": "Machine learning"}
    ],
    dataset_id=dataset.id
)
```

### Run evaluation

```python
from langsmith import evaluate

def my_model(inputs: dict) -> dict:
    # Your model logic
    return {"answer": generate_answer(inputs["question"])}

def correctness_evaluator(run, example):
    prediction = run.outputs["answer"]
    reference = example.outputs["answer"]
    score = 1.0 if reference.lower() in prediction.lower() else 0.0
    return {"key": "correctness", "score": score}

results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[correctness_evaluator],
    experiment_prefix="v1"
)

print(f"Average score: {results.aggregate_metrics['correctness']}")
```

### Built-in evaluators

```python
from langsmith.evaluation import LangChainStringEvaluator

# Use LangChain evaluators
results = evaluate(
    my_model,
    data="qa-test-set",
    evaluators=[
        LangChainStringEvaluator("qa"),
        LangChainStringEvaluator("cot_qa")
    ]
)
```

## Advanced tracing

### Tracing context

```python
from langsmith import tracing_context

with tracing_context(
    project_name="experiment-1",
    tags=["production", "v2"],
    metadata={"version": "2.0"}
):
    # All traceable calls inherit context
    result = my_function()
```

### Manual runs

```python
from langsmith import trace

with trace(
    name="custom_operation",
    run_type="tool",
    inputs={"query": "test"}
) as run:
    result = do_something()
    run.end(outputs={"result": result})
```

### Process inputs/outputs

```python
def sanitize_inputs(inputs: dict) -> dict:
    if "password" in inputs:
        inputs["password"] = "***"
    return inputs

@traceable(process_inputs=sanitize_inputs)
def login(username: str, password: str):
    return authenticate(username, password)
```

### Sampling

```python
import os
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"  # 10% sampling
```

## LangChain integration

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

# Tracing enabled automatically with LANGSMITH_TRACING=true
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])

chain = prompt | llm

# All chain runs traced automatically
response = chain.invoke({"input": "Hello!"})
```

## Production monitoring

### Hub prompts

```python
from langsmith import Client

client = Client()

# Pull prompt from hub
prompt = client.pull_prompt("my-org/qa-prompt")

# Use in application
result = prompt.invoke({"question": "What is AI?"})
```

### Async client

```python
from langsmith import AsyncClient

async def main():
    client = AsyncClient()
    runs = []
    async for run in client.list_runs(project_name="my-project"):
        runs.append(run)
    return runs
```

### Feedback collection

```python
from langsmith import Client

client = Client()

# Collect user feedback
def record_feedback(run_id: str, user_rating: int, comment: str = None):
    client.create_feedback(
        run_id=run_id,
        key="user_rating",
        score=user_rating / 5.0,  # Normalize to 0-1
        comment=comment
    )

# In your application
record_feedback(run_id="...", user_rating=4, comment="Helpful response")
```

## Testing integration

### Pytest integration

```python
from langsmith import test

@test
def test_qa_accuracy():
    result = my_qa_function("What is Python?")
    assert "programming" in result.lower()
```

### Evaluation in CI/CD

```python
from langsmith import evaluate

def run_evaluation():
    results = evaluate(
        my_model,
        data="regression-test-set",
        evaluators=[accuracy_evaluator]
    )
    # Fail CI if accuracy drops
    assert results.aggregate_metrics["accuracy"] >= 0.9, \
        f"Accuracy {results.aggregate_metrics['accuracy']} below threshold"
```

## Best practices

- **Structured naming** - Use consistent project/run naming conventions
- **Add metadata** - Include version, environment, user info
- **Sample in production** - Use sampling rate to control volume
- **Create datasets** - Build test sets from interesting production cases
- **Automate evaluation** - Run evaluations in CI/CD pipelines
- **Monitor costs** - Track token usage and latency trends

## Common issues

Traces not appearing:

```python
import os

# Ensure tracing is enabled
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_API_KEY"] = "your-key"

# Verify connection
from langsmith import Client
client = Client()
print(client.list_projects())  # Should work
```

High latency from tracing:

```python
# Enable background batching (default)
from langsmith import Client
client = Client(auto_batch_tracing=True)

# Or use sampling
os.environ["LANGSMITH_TRACING_SAMPLING_RATE"] = "0.1"
```

Large payloads:

```python
# Hide sensitive/large fields
@traceable(
    process_inputs=lambda x: {k: v for k, v in x.items() if k != "large_field"}
)
def my_function(data):
    pass
```

## References

- Advanced Usage - Custom evaluators, distributed tracing, hub prompts
- Troubleshooting - Common issues, debugging, performance

## Resources

- Documentation: https://docs.smith.langchain.com
- Python SDK: https://github.com/langchain-ai/langsmith-sdk
- Web App: https://smith.langchain.com
- Version: 0.2.0+ · License: MIT

---

Weekly installs: 185 · Repository: davila7/claude-code-templates · GitHub stars: 23.0K · First seen: Jan 21, 2026
Security audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on: opencode (150), claude-code (147), gemini-cli (143), cursor (131), codex (130), github-copilot (121)
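The feedback-normalization and CI/CD-gating patterns above boil down to plain arithmetic, sketched here without the LangSmith dependency; `normalize_rating` and `gate_on_accuracy` are hypothetical helper names, not LangSmith APIs:

```python
def normalize_rating(user_rating: int, scale: int = 5) -> float:
    """Map a 1..scale star rating onto the 0-1 score range used for feedback."""
    if not 1 <= user_rating <= scale:
        raise ValueError(f"rating must be in 1..{scale}")
    return user_rating / scale

def gate_on_accuracy(scores: list, threshold: float = 0.9) -> float:
    """Compute mean accuracy and raise if it drops below the threshold,
    mirroring the assert in the Evaluation in CI/CD example."""
    mean = sum(scores) / len(scores)
    assert mean >= threshold, f"Accuracy {mean:.2f} below threshold {threshold}"
    return mean

print(normalize_rating(4))                    # 0.8
print(gate_on_accuracy([1.0, 0.9, 1.0]))      # mean 0.966..., above the 0.9 gate
```

In practice the scores would come from `results.aggregate_metrics` after an `evaluate(...)` run; the gating step is the same either way.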