pytorch-lightning
PyTorch Lightning is a high-level training framework that organizes PyTorch code and eliminates boilerplate while preserving flexibility, simplifying deep learning model development.
```bash
npx skills add davila7/claude-code-templates --skill pytorch-lightning
```
Before / After comparison
Before: Training deep learning models with raw PyTorch requires hand-writing large amounts of boilerplate for training loops, data loading, logging, and distributed training, which makes the code long, hard to maintain, and prone to errors.
After: With PyTorch Lightning, developers can focus on model logic while the framework automatically handles the training loop, optimizer scheduling, multi-GPU training, and other complex tasks; the code becomes concise and modular, significantly speeding up model development and experiment iteration.
PyTorch Lightning - High-Level Training Framework
Quick start
PyTorch Lightning organizes PyTorch code to eliminate boilerplate while maintaining flexibility.
Installation:
```bash
pip install lightning
```
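To confirm the install, the package version can be checked directly:

```python
import lightning as L
print(L.__version__)  # should print a 2.x version
```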
Convert PyTorch to Lightning (3 steps):
```python
import lightning as L
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

# Step 1: Define LightningModule (organize your PyTorch code)
class LitModel(L.LightningModule):
    def __init__(self, hidden_size=128):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(28 * 28, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 10)
        )

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = nn.functional.cross_entropy(y_hat, y)
        self.log('train_loss', loss)  # Auto-logged to TensorBoard
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Step 2: Create data
train_loader = DataLoader(train_dataset, batch_size=32)

# Step 3: Train with Trainer (handles everything else!)
trainer = L.Trainer(max_epochs=10, accelerator='gpu', devices=2)
model = LitModel()
trainer.fit(model, train_loader)
```
That's it! Trainer handles:
- GPU/TPU/CPU switching
- Distributed training (DDP, FSDP, DeepSpeed)
- Mixed precision (FP16, BF16)
- Gradient accumulation
- Checkpointing
- Logging
- Progress bars
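Most of these are toggled through Trainer arguments. A minimal sketch; the specific values are illustrative, not defaults:

```python
import lightning as L

# Illustrative Trainer configuration; adjust the values for your setup
trainer = L.Trainer(
    max_epochs=10,
    accelerator='auto',               # picks GPU/TPU/CPU automatically
    devices='auto',
    precision='bf16-mixed',           # mixed precision ('16-mixed' on older GPUs)
    accumulate_grad_batches=4,        # gradient accumulation
    default_root_dir='checkpoints/',  # where logs and checkpoints are written
)
```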
Common workflows
Workflow 1: From PyTorch to Lightning
Original PyTorch code:
```python
model = MyModel()
optimizer = torch.optim.Adam(model.parameters())
model.to('cuda')

for epoch in range(max_epochs):
    for batch in train_loader:
        batch = batch.to('cuda')
        optimizer.zero_grad()
        loss = model(batch)
        loss.backward()
        optimizer.step()
```
Lightning version:
```python
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = MyModel()

    def training_step(self, batch, batch_idx):
        loss = self.model(batch)  # No .to('cuda') needed!
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())

# Train
trainer = L.Trainer(max_epochs=10, accelerator='gpu')
trainer.fit(LitModel(), train_loader)
```
Benefits: roughly 40 lines shrink to about 15, there is no manual device management, and distributed training works without code changes.
Workflow 2: Validation and testing
```python
class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = MyModel()

    def training_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        loss = nn.functional.cross_entropy(y_hat, y)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        val_loss = nn.functional.cross_entropy(y_hat, y)
        acc = (y_hat.argmax(dim=1) == y).float().mean()
        self.log('val_loss', val_loss)
        self.log('val_acc', acc)

    def test_step(self, batch, batch_idx):
        x, y = batch
        y_hat = self.model(x)
        test_loss = nn.functional.cross_entropy(y_hat, y)
        self.log('test_loss', test_loss)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Train with validation
model = LitModel()
trainer = L.Trainer(max_epochs=10)
trainer.fit(model, train_loader, val_loader)

# Test
trainer.test(model, test_loader)
```
Automatic features:
- Validation runs every epoch by default
- Metrics logged to TensorBoard
- Checkpointing of the last epoch (add a ModelCheckpoint monitoring val_loss, as in Workflow 4, to keep the best model)
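Once a ModelCheckpoint callback monitoring val_loss is attached (see Workflow 4), the best checkpoint can be reused for testing or reloaded later; a sketch:

```python
# Assumes a ModelCheckpoint(monitor='val_loss') callback was passed to the Trainer
trainer.test(model, dataloaders=test_loader, ckpt_path='best')

# Reload the best weights explicitly for later use
best_model = LitModel.load_from_checkpoint(trainer.checkpoint_callback.best_model_path)
```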
Workflow 3: Distributed training (DDP)
```python
# Same code as single GPU!
model = LitModel()

# 8 GPUs with DDP (automatic!)
trainer = L.Trainer(
    accelerator='gpu',
    devices=8,
    strategy='ddp'  # Or 'fsdp', 'deepspeed'
)
trainer.fit(model, train_loader)
```
Launch:
```bash
# Single command, Lightning handles the rest
python train.py
```
No changes needed:
- Automatic data distribution
- Gradient synchronization
- Multi-node support (just set num_nodes=2, as sketched below)
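A sketch of the multi-node variant; the node and GPU counts are placeholders, and the script still has to be launched on every node (for example through SLURM):

```python
import lightning as L

# 2 nodes × 8 GPUs = 16 processes (counts are placeholders)
trainer = L.Trainer(
    accelerator='gpu',
    devices=8,
    num_nodes=2,
    strategy='ddp',
)
trainer.fit(model, train_loader)
```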
Workflow 4: Callbacks for monitoring
```python
from lightning.pytorch.callbacks import ModelCheckpoint, EarlyStopping, LearningRateMonitor

# Create callbacks
checkpoint = ModelCheckpoint(
    monitor='val_loss',
    mode='min',
    save_top_k=3,
    filename='model-{epoch:02d}-{val_loss:.2f}'
)
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=5,
    mode='min'
)
lr_monitor = LearningRateMonitor(logging_interval='epoch')

# Add to Trainer
trainer = L.Trainer(
    max_epochs=100,
    callbacks=[checkpoint, early_stop, lr_monitor]
)
trainer.fit(model, train_loader, val_loader)
```
Result:
- Auto-saves the best 3 models
- Stops early if val_loss does not improve for 5 epochs
- Logs the learning rate to TensorBoard
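Custom behavior fits the same pattern by subclassing Callback; a minimal sketch (the hook name is part of the Lightning callback API, the printing logic is just illustrative):

```python
import lightning as L
from lightning.pytorch.callbacks import Callback

class PrintValLoss(Callback):
    """Illustrative callback: print the logged val_loss after each validation epoch."""
    def on_validation_epoch_end(self, trainer, pl_module):
        val_loss = trainer.callback_metrics.get('val_loss')
        if val_loss is not None:
            print(f"epoch {trainer.current_epoch}: val_loss={float(val_loss):.4f}")

trainer = L.Trainer(max_epochs=100, callbacks=[checkpoint, early_stop, PrintValLoss()])
```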
Workflow 5: Learning rate scheduling
```python
class LitModel(L.LightningModule):
    # ... (training_step, etc.)

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)
        # Cosine annealing
        scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
            optimizer,
            T_max=100,
            eta_min=1e-5
        )
        return {
            'optimizer': optimizer,
            'lr_scheduler': {
                'scheduler': scheduler,
                'interval': 'epoch',  # Update per epoch
                'frequency': 1
            }
        }

# Attach a LearningRateMonitor callback (Workflow 4) to log the learning rate
trainer = L.Trainer(max_epochs=100)
trainer.fit(model, train_loader)
```
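Schedulers can also step per batch by setting the interval to 'step'. A sketch of a drop-in replacement for the configure_optimizers method above, using OneCycleLR; trainer.estimated_stepping_batches is a Trainer property that supplies the total step count, and the learning rates are illustrative:

```python
def configure_optimizers(self):
    optimizer = torch.optim.AdamW(self.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer,
        max_lr=1e-3,  # illustrative peak learning rate
        total_steps=self.trainer.estimated_stepping_batches,  # computed by Lightning
    )
    return {
        'optimizer': optimizer,
        'lr_scheduler': {'scheduler': scheduler, 'interval': 'step'},  # step every batch
    }
```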
When to use vs alternatives
Use PyTorch Lightning when:
- Want clean, organized code
- Need production-ready training loops
- Switching between single GPU, multi-GPU, TPU
- Want built-in callbacks and logging
- Team collaboration (standardized structure)
Key advantages:
- Organized: Separates research code from engineering
- Automatic: DDP, FSDP, DeepSpeed with 1 line
- Callbacks: Modular training extensions
- Reproducible: Less boilerplate = fewer bugs
- Tested: 1M+ downloads/month, battle-tested
Use alternatives instead:
- Accelerate: Minimal changes to existing code, more flexibility
- Ray Train: Multi-node orchestration, hyperparameter tuning
- Raw PyTorch: Maximum control, learning purposes
- Keras: TensorFlow ecosystem
Common issues
Issue: Loss not decreasing
Check data and model setup:
```python
# Add to training_step
def training_step(self, batch, batch_idx):
    if batch_idx == 0:
        print(f"Batch shape: {batch[0].shape}")
        print(f"Labels: {batch[1]}")
    loss = ...
    return loss
```
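Two built-in Trainer flags are also useful for isolating this kind of problem:

```python
# Run a single train/val batch end-to-end to catch wiring bugs quickly
trainer = L.Trainer(fast_dev_run=True)

# Deliberately overfit one batch; if the loss does not approach zero,
# the model or optimizer setup is the likely culprit
trainer = L.Trainer(overfit_batches=1, max_epochs=200)
```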
Issue: Out of memory
Reduce batch size or use gradient accumulation:
```python
trainer = L.Trainer(
    accumulate_grad_batches=4,  # Effective batch = batch_size × 4
    precision='bf16-mixed'      # Or '16-mixed'; mixed precision roughly halves activation memory
)
```
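Lightning's Tuner can also search for the largest batch size that fits in memory; this sketch assumes the LightningModule defines its own train_dataloader() and exposes a batch_size attribute the tuner can adjust:

```python
from lightning.pytorch.tuner import Tuner

trainer = L.Trainer(accelerator='gpu', devices=1)
tuner = Tuner(trainer)
# Doubles the batch size until it runs out of memory, then keeps the largest working value
tuner.scale_batch_size(model, mode='power')
```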
Issue: Validation not running
Ensure you pass val_loader:
```python
# WRONG
trainer.fit(model, train_loader)

# CORRECT
trainer.fit(model, train_loader, val_loader)
```
Issue: DDP spawns multiple processes unexpectedly
Lightning auto-detects GPUs. Explicitly set devices:
```python
# Test on CPU first
trainer = L.Trainer(accelerator='cpu', devices=1)

# Then GPU
trainer = L.Trainer(accelerator='gpu', devices=1)
```
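Because the ddp strategy launches additional processes that re-run the training script, keeping the launch code under an entry-point guard avoids accidental re-execution (this matters most on macOS/Windows and with ddp_spawn):

```python
import lightning as L

def main():
    model = LitModel()
    trainer = L.Trainer(accelerator='gpu', devices=2, strategy='ddp')
    trainer.fit(model, train_loader)

if __name__ == '__main__':
    main()
```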
Advanced topics
Callbacks: See references/callbacks.md for EarlyStopping, ModelCheckpoint, custom callbacks, and callback hooks.
Distributed strategies: See references/distributed.md for DDP, FSDP, DeepSpeed ZeRO integration, multi-node setup.
Hyperparameter tuning: See references/hyperparameter-tuning.md for integration with Optuna, Ray Tune, and WandB sweeps.
Hardware requirements
- CPU: Works (good for debugging)
- Single GPU: Works
- Multi-GPU: DDP (default), FSDP, or DeepSpeed
- Multi-node: DDP, FSDP, DeepSpeed
- TPU: Supported (8 cores)
- Apple MPS: Supported
Precision options:
- FP32 (default)
- FP16 (V100, older GPUs)
- BF16 (A100/H100, recommended)
- FP8 (H100)
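These correspond to the Trainer's precision argument; in Lightning 2.x the mixed-precision spellings are '16-mixed' and 'bf16-mixed', and FP8 additionally requires the transformer-engine backend on H100-class hardware:

```python
trainer = L.Trainer(precision='32-true')     # FP32 (default)
trainer = L.Trainer(precision='16-mixed')    # FP16 mixed precision
trainer = L.Trainer(precision='bf16-mixed')  # BF16 mixed precision
```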
Resources
- GitHub: https://github.com/Lightning-AI/pytorch-lightning ⭐ 29,000+
- Version: 2.5.5+
- Examples: https://github.com/Lightning-AI/pytorch-lightning/tree/master/examples
- Discord: https://discord.gg/lightning-ai
- Used by: Kaggle winners, research labs, production teams