
weights-and-biases

by @davila7 · v1.0.0

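Tracks machine-learning experiments and supports MLOps: it automatically logs metrics and visualizes results, helping you manage, compare, and optimize ML model development efficiently.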

Weights & Biases · ML Experiment Tracking · Model Versioning · Hyperparameter Tuning · Deep Learning Monitoring · GitHub
Installation
npx skills add davila7/claude-code-templates --skill weights-and-biases

Before / After Comparison

Before

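Machine-learning experiment results are hard to reproduce, hyperparameter tuning is chaotic, model performance metrics are scattered across many places, team collaboration is inefficient, and model versions are difficult to manage.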

After

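With Weights & Biases, you can visualize training in real time, automatically record all metrics and hyperparameters, easily compare experiments, find the best model with automated sweeps, and version your models, significantly improving experiment efficiency and team collaboration.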

SKILL.md

weights-and-biases

# Weights & Biases: ML Experiment Tracking & MLOps

## When to Use This Skill

Use Weights & Biases (W&B) when you need to:

- Track ML experiments with automatic metric logging
- Visualize training in real-time dashboards
- Compare runs across hyperparameters and configurations
- Optimize hyperparameters with automated sweeps
- Manage a model registry with versioning and lineage
- Collaborate on ML projects in team workspaces
- Track artifacts (datasets, models, code) with lineage

Users: 200,000+ ML practitioners | GitHub Stars: 10.5k+ | Integrations: 100+

## Installation

```bash
# Install W&B
pip install wandb

# Log in (prompts for your API key and stores it locally)
wandb login

# Or supply the API key through an environment variable (e.g., in CI)
export WANDB_API_KEY=your_api_key_here
```

## Quick Start

### Basic Experiment Tracking

```python
import wandb

# Initialize a run
run = wandb.init(
    project="my-project",
    config={
        "learning_rate": 0.001,
        "epochs": 10,
        "batch_size": 32,
        "architecture": "ResNet50",
    },
)

# Training loop
for epoch in range(run.config.epochs):
    # Your training code: train_epoch() and validate() are placeholders
    # assumed to return (loss, accuracy)
    train_loss, train_acc = train_epoch()
    val_loss, val_acc = validate()

    # Log metrics
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "train/accuracy": train_acc,
        "val/accuracy": val_acc,
    })

# Finish the run
wandb.finish()
```

### With PyTorch

```python
import torch
import wandb

# Initialize
wandb.init(project="pytorch-demo", config={
    "lr": 0.001,
    "epochs": 10,
})

# Access config
config = wandb.config

# model, criterion, and train_loader are assumed to be defined elsewhere;
# the choice of Adam here is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=config.lr)

# Training loop
for epoch in range(config.epochs):
    for batch_idx, (data, target) in enumerate(train_loader):
        # Forward pass
        output = model(data)
        loss = criterion(output, target)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log every 100 batches
        if batch_idx % 100 == 0:
            wandb.log({
                "loss": loss.item(),
                "epoch": epoch,
                "batch": batch_idx,
            })

# Save model
torch.save(model.state_dict(), "model.pth")
wandb.save("model.pth")  # Upload to W&B

wandb.finish()
```

## Core Concepts

### 1. Projects and Runs

- Project: a collection of related experiments
- Run: a single execution of your training script

```python
# Create/use project
run = wandb.init(
    project="image-classification",
    name="resnet50-experiment-1",  # Optional run name
    tags=["baseline", "resnet"],   # Organize with tags
    notes="First baseline run",    # Add notes
)

# Each run has a unique ID
print(f"Run ID: {run.id}")
print(f"Run URL: {run.url}")
```

### 2. Configuration Tracking

Record hyperparameters by passing a `config` dictionary to `wandb.init()`; W&B stores it with the run so you can filter and compare runs by these values:

```python
config = {
    # Model architecture
    "model": "ResNet50",
    "pretrained": True,

    # Training params
    "learning_rate": 0.001,
    "batch_size": 32,
    "epochs": 50,
    "optimizer": "Adam",

    # Data params
    "dataset": "ImageNet",
    "augmentation": "standard",
}

wandb.init(project="my-project", config=config)

# Access config during training
lr = wandb.config.learning_rate
batch_size = wandb.config.batch_size
```

### 3. Metric Logging

```python
# Log scalars
wandb.log({"loss": 0.5, "accuracy": 0.92})

# Log multiple metrics
wandb.log({
    "train/loss": train_loss,
    "train/accuracy": train_acc,
    "val/loss": val_loss,
    "val/accuracy": val_acc,
    "learning_rate": current_lr,
    "epoch": epoch,
})

# Log at an explicit step (the step must increase monotonically within a run)
wandb.log({"loss": loss}, step=global_step)

# Log media (images, audio, video)
wandb.log({"examples": [wandb.Image(img) for img in images]})

# Log histograms
wandb.log({"gradients": wandb.Histogram(gradients)})

# Log tables (predictions and labels are placeholder sequences; add rows before logging)
table = wandb.Table(columns=["id", "prediction", "ground_truth"])
for i, (pred, truth) in enumerate(zip(predictions, labels)):
    table.add_data(i, pred, truth)
wandb.log({"predictions": table})
```

### 4. Model Checkpointing

```python
import torch
import wandb

# Save model checkpoint
checkpoint = {
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,
}
torch.save(checkpoint, 'checkpoint.pth')

# Upload to W&B
wandb.save('checkpoint.pth')

# Or use Artifacts (recommended)
artifact = wandb.Artifact('model', type='model')
artifact.add_file('checkpoint.pth')
wandb.log_artifact(artifact)
```

## Hyperparameter Sweeps

Automatically search for optimal hyperparameters.
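How many runs a sweep schedules depends on its `method`. For `grid`, W&B tries every combination of the listed parameter values, so the run count is the product of the list lengths. The sketch below is plain Python with no W&B calls: `grid_combinations` is a hypothetical helper that mirrors this expansion so you can estimate a grid's size before launching it. It assumes every parameter is given as an explicit `values` list, since continuous distributions cannot be enumerated.

```python
from itertools import product


def grid_combinations(parameters: dict) -> list[dict]:
    """Expand a sweep-style ``parameters`` dict into every grid combination.

    Hypothetical helper for illustration only; it is not part of the wandb
    package and handles only parameters specified as ``{"values": [...]}``.
    """
    # Sort names so the output order is deterministic
    names = sorted(parameters)
    value_lists = [parameters[name]["values"] for name in names]
    # One dict per point in the Cartesian product of all value lists
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


if __name__ == "__main__":
    params = {
        "lr": {"values": [0.001, 0.01, 0.1]},
        "batch_size": {"values": [16, 32, 64]},
    }
    combos = grid_combinations(params)
    print(len(combos))  # 9 (3 learning rates x 3 batch sizes)
    print(combos[0])    # {'batch_size': 16, 'lr': 0.001}
```

Because the count grows multiplicatively, adding a fourth parameter with four values to this grid would raise it from 9 to 36 runs, which is one reason `random` or `bayes` is usually preferred for larger search spaces.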
### Define Sweep Configuration

```python
sweep_config = {
    'method': 'bayes',  # or 'grid', 'random'
    'metric': {
        'name': 'val/accuracy',  # must match a key passed to wandb.log
        'goal': 'maximize'
    },
    'parameters': {
        'learning_rate': {
            # 'log_uniform_values' takes min/max as actual values
            # ('log_uniform' would interpret them as natural-log exponents)
            'distribution': 'log_uniform_values',
            'min': 1e-5,
            'max': 1e-1
        },
        'batch_size': {
            'values': [16, 32, 64, 128]
        },
        'optimizer': {
            'values': ['adam', 'sgd', 'rmsprop']
        },
        'dropout': {
            'distribution': 'uniform',
            'min': 0.1,
            'max': 0.5
        }
    }
}

# Initialize sweep
sweep_id = wandb.sweep(sweep_config, project="my-project")
```

### Define Training Function

```python
NUM_EPOCHS = 10  # placeholder epoch count

def train():
    # Initialize run; inside an agent, wandb.config holds this trial's parameters
    run = wandb.init()

    # Access sweep parameters
    lr = wandb.config.learning_rate
    batch_size = wandb.config.batch_size
    optimizer_name = wandb.config.optimizer

    # Build model with sweep config (build_model, get_optimizer, train_epoch,
    # and validate are your own helpers)
    model = build_model(wandb.config)
    optimizer = get_optimizer(optimizer_name, lr)

    # Training loop
    for epoch in range(NUM_EPOCHS):
        train_loss = train_epoch(model, optimizer, batch_size)
        val_acc = validate(model)

        # Log metrics (the sweep optimizes "val/accuracy")
        wandb.log({
            "train/loss": train_loss,
            "val/accuracy": val_acc
        })

# Run sweep
wandb.agent(sweep_id, function=train, count=50)  # Run 50 trials
```

### Sweep Strategies

```python
# Grid search - exhaustive
sweep_config = {
    'method': 'grid',
    'parameters': {
        'lr': {'values': [0.001, 0.01, 0.1]},
        'batch_size': {'values': [16, 32, 64]}
    }
}

# Random search
sweep_config = {
    'method': 'random',
    'parameters': {
        'lr': {'distribution': 'uniform', 'min': 0.0001, 'max': 0.1},
        'dropout': {'distribution': 'uniform', 'min': 0.1, 'max': 0.5}
    }
}

# Bayesian optimization (recommended)
sweep_config = {
    'method': 'bayes',
    'metric': {'name': 'val/loss', 'goal': 'minimize'},
    'parameters': {
        'lr': {'distribution': 'log_uniform_values', 'min': 1e-5, 'max': 1e-1}
    }
}
```

## Artifacts

Track datasets, models, and other files with lineage.
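Artifact versions (`v0`, `v1`, ...) are tied to content: as we understand W&B's behavior, re-logging an artifact whose files have not changed reuses the existing version rather than creating a new one. The sketch below illustrates that idea with a simplified SHA-256 digest over a directory. `directory_digest` is a hypothetical, stdlib-only helper, not a W&B API, and W&B's own manifest and checksum format differ; treat it as a way to reason about when a re-log would produce a new version.

```python
import hashlib
import tempfile
from pathlib import Path


def directory_digest(root: str) -> str:
    """Return a SHA-256 digest over every file under ``root``.

    Simplified illustration of content-based versioning; this is NOT the
    digest W&B computes for artifacts.
    """
    root_path = Path(root)
    h = hashlib.sha256()
    # Sort so the digest does not depend on filesystem listing order
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        data = path.read_bytes()
        # Hash relative path, content length, then content, so renames and
        # edits both change the digest and entries cannot run together
        h.update(path.relative_to(root_path).as_posix().encode("utf-8") + b"\0")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "train.csv").write_text("id,label\n1,cat\n")
        first = directory_digest(d)
        print(first == directory_digest(d))  # True: unchanged contents
        Path(d, "train.csv").write_text("id,label\n1,dog\n")
        print(directory_digest(d) == first)  # False: contents changed
```

Hashing the relative path as well as the bytes means a renamed file counts as a change, which matches the intuition that an artifact's layout is part of its content.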
### Log Artifacts

```python
# Requires an active run (wandb.init)
# Create artifact
artifact = wandb.Artifact(
    name='training-dataset',
    type='dataset',
    description='ImageNet training split',
    metadata={'size': '1.2M images', 'split': 'train'}
)

# Add files
artifact.add_file('data/train.csv')
artifact.add_dir('data/images/')

# Log artifact
wandb.log_artifact(artifact)
```

### Use Artifacts

```python
# Download and use artifact
run = wandb.init(project="my-project")

# Download artifact
artifact = run.use_artifact('training-dataset:latest')
artifact_dir = artifact.download()

# Use the data (load_data is your own loader)
data = load_data(f"{artifact_dir}/train.csv")
```

### Model Registry

```python
# Log model as artifact (run is the active run returned by wandb.init)
model_artifact = wandb.Artifact(
    name='resnet50-model',
    type='model',
    metadata={'architecture': 'ResNet50', 'accuracy': 0.95}
)
model_artifact.add_file('model.pth')
wandb.log_artifact(model_artifact, aliases=['best', 'production'])

# Link to model registry
run.link_artifact(model_artifact, 'model-registry/production-models')
```

## Integration Examples

### HuggingFace Transformers

```python
from transformers import Trainer, TrainingArguments
import wandb

# Initialize W&B
wandb.init(project="hf-transformers")

# Training arguments with W&B
training_args = TrainingArguments(
    output_dir="./results",
    report_to="wandb",  # Enable W&B logging
    run_name="bert-finetuning",
    logging_steps=100,
    save_steps=500,
)

# Trainer automatically logs to W&B
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
```

### PyTorch Lightning

```python
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import WandbLogger

# Create W&B logger
wandb_logger = WandbLogger(
    project="lightning-demo",
    log_model=True,  # Log model checkpoints
)

# Use with Trainer
trainer = Trainer(
    logger=wandb_logger,
    max_epochs=10,
)

trainer.fit(model, datamodule=dm)
```

### Keras/TensorFlow

```python
import wandb
# WandbMetricsLogger replaces the deprecated wandb.keras.WandbCallback
from wandb.integration.keras import WandbMetricsLogger

# Initialize
wandb.init(project="keras-demo")

# Add callback
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=10,
    callbacks=[WandbMetricsLogger()],  # Auto-logs metrics
)
```

## Visualization & Analysis

### Custom Charts

```python
import matplotlib.pyplot as plt
import wandb

# Log custom visualizations
fig, ax = plt.subplots()
ax.plot(x, y)
wandb.log({"custom_plot": wandb.Image(fig)})
plt.close(fig)

# Log confusion matrix
wandb.log({"conf_mat": wandb.plot.confusion_matrix(
    probs=None,
    y_true=ground_truth,
    preds=predictions,
    class_names=class_names,
)})
```

### Reports

Create shareable reports in the W&B UI:

- Combine runs, charts, and text
- Markdown support
- Embeddable visualizations
- Team collaboration

## Best Practices

### 1. Organize with Tags and Groups

```python
wandb.init(
    project="my-project",
    tags=["baseline", "resnet50", "imagenet"],
    group="resnet-experiments",  # Group related runs
    job_type="train",            # Type of job
)
```

### 2. Log Everything Relevant

```python
# W&B records standard system metrics (GPU, CPU, memory) automatically in the
# run's System panel; log custom ones like this only if you need them
wandb.log({
    "gpu/util": gpu_utilization,
    "gpu/memory": gpu_memory_used,
    "cpu/util": cpu_utilization,
})

# Log code version (W&B also captures the git commit automatically
# when the script runs inside a git repository)
wandb.config.update({"git_commit": git_commit_hash})

# Log data splits (static values fit better in config than in wandb.log)
wandb.config.update({
    "train_size": len(train_dataset),
    "val_size": len(val_dataset),
})
```

### 3. Use Descriptive Names

```python
# ✅ Good: Descriptive run names
wandb.init(
    project="nlp-classification",
    name="bert-base-lr0.001-bs32-epoch10",
)

# ❌ Bad: Generic names
wandb.init(project="nlp", name="run1")
```

### 4. Save Important Artifacts

```python
# Save final model
artifact = wandb.Artifact('final-model', type='model')
artifact.add_file('model.pth')
wandb.log_artifact(artifact)

# Save predictions for analysis (predictions_data is a list of rows)
predictions_table = wandb.Table(
    columns=["id", "input", "prediction", "ground_truth"],
    data=predictions_data,
)
wandb.log({"predictions": predictions_table})
```

### 5. Use Offline Mode for Unstable Connections

```python
import os
import wandb

# Enable offline mode (set before calling wandb.init)
os.environ["WANDB_MODE"] = "offline"

wandb.init(project="my-project")
# ... your code ...
```
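Instead of hard-coding `WANDB_MODE`, you can pick the mode once at startup. The sketch below is a minimal, stdlib-only heuristic, assuming that a failed TCP connection to the W&B API host means the run should go offline. `pick_wandb_mode` is a hypothetical helper, not part of wandb, and the default host `api.wandb.ai:443` is an assumption for the public cloud (a self-hosted server would use its own address).

```python
import socket


def pick_wandb_mode(host: str = "api.wandb.ai", port: int = 443,
                    timeout: float = 3.0) -> str:
    """Return "online" if a TCP connection to host:port succeeds, else "offline".

    Hypothetical helper: a successful TCP connect is only a rough proxy for
    "W&B is reachable"; it does not check authentication or HTTP proxies.
    """
    try:
        # create_connection resolves the host and connects within the timeout
        with socket.create_connection((host, port), timeout=timeout):
            return "online"
    except OSError:
        # DNS failures, refused connections, and timeouts all subclass OSError
        return "offline"
```

Pass the result to `wandb.init(project="my-project", mode=pick_wandb_mode())`; `wandb.init` accepts `"online"` and `"offline"` for `mode`. The check runs only once, so it does not switch modes if the network drops mid-run.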
When you are back online, upload the offline run:

```bash
# Sync later
wandb sync <run_directory>
```

## Team Collaboration

### Share Runs

```python
# Every run has a URL; who can open it depends on the project's visibility settings
run = wandb.init(project="team-project")
print(f"Share this URL: {run.url}")
```

### Team Projects

- Create a team account at wandb.ai
- Add team members
- Set project visibility (private/public)
- Use team-level artifacts and the model registry

## Pricing

- Free: Unlimited public projects, 100GB storage
- Academic: Free for students/researchers
- Teams: $50/seat/month, private projects, unlimited storage
- Enterprise: Custom pricing, on-prem options

## Resources

- Documentation: https://docs.wandb.ai
- GitHub: https://github.com/wandb/wandb (10.5k+ stars)
- Examples: https://github.com/wandb/examples
- Community: https://wandb.ai/community
- Discord: https://wandb.me/discord

## See Also

- `references/sweeps.md` - Comprehensive hyperparameter optimization guide
- `references/artifacts.md` - Data and model versioning patterns
- `references/integrations.md` - Framework-specific examples

Weekly Installs: 173 · Repository: davila7/claude-…emplates · GitHub Stars: 23.0K · First Seen: Jan 21, 2026

Security Audits: Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn

Installed on: opencode (142) · claude-code (141) · gemini-cli (136) · cursor (125) · codex (122) · github-copilot (113)

Statistics

Installs: 0
Rating: 0.0 / 5.0
Version: 1.0.0
Updated: March 17, 2026
Comparison cases: 1

Compatible Platforms

🔧Claude Code
🔧OpenClaw
🔧OpenCode
🔧Codex
🔧Gemini CLI
🔧GitHub Copilot
🔧Amp
🔧Kimi CLI

Timeline

Created: March 17, 2026
Last updated: March 17, 2026