tooluniverse-image-analysis
```shell
npx skills add mims-harvard/tooluniverse --skill tooluniverse-image-analysis
```
name: tooluniverse-image-analysis
description: Production-ready microscopy image analysis and quantitative imaging data skill for colony morphometry, cell counting, fluorescence quantification, and statistical analysis of imaging-derived measurements. Processes ImageJ/CellProfiler output (area, circularity, intensity, cell counts), performs Dunnett's test, Cohen's d effect size, power analysis, Shapiro-Wilk normality tests, two-way ANOVA, polynomial regression, natural spline regression with confidence intervals, and comparative morphometry. Supports CSV/TSV measurement tables, multi-channel fluorescence data, colony swarming assays, and neuron counting datasets. Use when analyzing microscopy measurement data, colony area/circularity, cell count statistics, swarming assays, co-culture ratio optimization, or answering questions about imaging-derived quantitative data.
Microscopy Image Analysis and Quantitative Imaging Data
Production-ready skill for analyzing microscopy-derived measurement data using pandas, numpy, scipy, statsmodels, and scikit-image. Designed for BixBench imaging questions covering colony morphometry, cell counting, fluorescence quantification, regression modeling, and statistical comparisons.
IMPORTANT: This skill handles complex multi-workflow analysis. Most implementation details have been moved to references/ for progressive disclosure. This document focuses on high-level decision-making and workflow orchestration.
When to Use This Skill
Apply when users:
- Have microscopy measurement data (area, circularity, intensity, cell counts) in CSV/TSV
- Ask about colony morphometry (bacterial swarming, biofilm, growth assays)
- Need statistical comparisons of imaging measurements (t-test, ANOVA, Dunnett's, Mann-Whitney)
- Ask about cell counting statistics (NeuN, DAPI, marker counts)
- Need effect size calculations (Cohen's d) and power analysis
- Want regression models (polynomial, spline) fitted to dose-response or ratio data
- Ask about model comparison (R-squared, F-statistic, AIC/BIC)
- Need Shapiro-Wilk normality testing on imaging data
- Want confidence intervals for peak predictions from fitted models
- Questions mention imaging software output (ImageJ, CellProfiler, QuPath)
- Need fluorescence intensity quantification or colocalization analysis
- Ask about image segmentation results (counts, areas, shapes)
BixBench Coverage: 21 questions across 4 projects (bix-18, bix-19, bix-41, bix-54)
NOT for (use other skills instead):
- Phylogenetic analysis → Use `tooluniverse-phylogenetics`
- RNA-seq differential expression → Use `tooluniverse-rnaseq-deseq2`
- Single-cell scRNA-seq → Use `tooluniverse-single-cell`
- Statistical regression only (no imaging context) → Use `tooluniverse-statistical-modeling`
Core Principles
- Data-first approach - Load and inspect all CSV/TSV measurement data before analysis
- Question-driven - Parse the exact statistic, comparison, or model requested
- Statistical rigor - Proper effect sizes, multiple comparison corrections, model selection
- Imaging-aware - Understand ImageJ/CellProfiler measurement columns (Area, Circularity, Round, Intensity)
- Workflow flexibility - Support both pre-quantified data (CSV) and raw image processing
- Precision - Match expected answer format (integer, range, decimal places)
- Reproducible - Use standard Python/scipy equivalents to R functions
Required Python Packages
```python
# Core (MUST be installed)
import pandas as pd
import numpy as np
from scipy import stats
from scipy.interpolate import BSpline, make_interp_spline
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.power import TTestIndPower
from patsy import dmatrix, bs, cr

# Optional (for raw image processing)
import skimage
import cv2
import tifffile
```

Installation:

```shell
pip install pandas numpy scipy statsmodels patsy scikit-image opencv-python-headless tifffile
```
High-Level Workflow Decision Tree
START: User question about microscopy data
│
├─ Q1: What type of data is available?
│ │
│ ├─ PRE-QUANTIFIED DATA (CSV/TSV with measurements)
│ │ └─ Workflow: Load → Parse question → Statistical analysis
│ │ Pattern: Most common BixBench pattern (bix-18, bix-19, bix-41, bix-54)
│ │ See: Section "Quantitative Data Analysis" below
│ │
│ └─ RAW IMAGES (TIFF, PNG, multi-channel)
│ └─ Workflow: Load → Segment → Measure → Analyze
│ See: references/image_processing.md
│
├─ Q2: What type of analysis is needed?
│ │
│ ├─ STATISTICAL COMPARISON
│ │ ├─ Two groups → t-test or Mann-Whitney
│ │ ├─ Multiple groups → ANOVA or Dunnett's test
│ │ ├─ Two factors → Two-way ANOVA
│ │ └─ Effect size → Cohen's d, power analysis
│ │ See: references/statistical_analysis.md
│ │
│ ├─ REGRESSION MODELING
│ │ ├─ Dose-response → Polynomial (quadratic, cubic)
│ │ ├─ Ratio optimization → Natural spline
│ │ └─ Model comparison → R-squared, F-statistic, AIC/BIC
│ │ See: references/statistical_analysis.md
│ │
│ ├─ CELL COUNTING
│ │ ├─ Fluorescence (DAPI, NeuN) → Threshold + watershed
│ │ ├─ Brightfield → Adaptive threshold
│ │ └─ High-density → CellPose or StarDist (external)
│ │ See: references/cell_counting.md
│ │
│ ├─ COLONY SEGMENTATION
│ │ ├─ Swarming assays → Otsu threshold + morphology
│ │ ├─ Biofilms → Li threshold + fill holes
│ │ └─ Growth assays → Time-lapse tracking
│ │ See: references/segmentation.md
│ │
│ └─ FLUORESCENCE QUANTIFICATION
│ ├─ Intensity measurement → regionprops
│ ├─ Colocalization → Pearson/Manders
│ └─ Multi-channel → Channel-wise quantification
│ See: references/fluorescence_analysis.md
│
└─ Q3: When to use scikit-image vs OpenCV?
├─ scikit-image: Scientific analysis, measurements, regionprops
├─ OpenCV: Fast processing, real-time, large batches
└─ Both: Often interchangeable for basic operations
See: references/image_processing.md "Library Selection Guide"
Quantitative Data Analysis Workflow
Phase 0: Question Parsing and Data Discovery
CRITICAL FIRST STEP: Before writing ANY code, identify what data files are available and what the question is asking for.
```python
import os, glob
import pandas as pd

# Discover data files
data_dir = "."
csv_files = glob.glob(os.path.join(data_dir, '**', '*.csv'), recursive=True)
tsv_files = glob.glob(os.path.join(data_dir, '**', '*.tsv'), recursive=True)
img_files = glob.glob(os.path.join(data_dir, '**', '*.tif*'), recursive=True)

# Load and inspect the first measurement file
if csv_files:
    df = pd.read_csv(csv_files[0])
    print(f"Shape: {df.shape}")
    print(f"Columns: {list(df.columns)}")
    print(df.head())
    print(df.describe())
```
Common Column Names:
- Area: Colony or cell area in pixels or calibrated units
- Circularity: 4π·Area / Perimeter², range [0, 1], 1.0 = perfect circle
- Round: Roundness = 4·Area / (π·MajorAxis²)
- Genotype/Strain: Biological grouping variable
- Ratio: Co-culture mixing ratio (e.g., "1:3", "5:1")
- NeuN/DAPI/GFP: Cell marker counts or intensities
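When only raw measurements are present, the shape descriptors above can be recomputed directly. A minimal sketch with made-up values (the `Perim.` and `Major` column names follow ImageJ conventions but are assumptions here; the first row is a circle of radius 10, so both descriptors come out ≈ 1.0):

```python
import numpy as np
import pandas as pd

# Hypothetical measurement table; "Perim." and "Major" follow ImageJ naming
df = pd.DataFrame({
    "Area":   [314.159, 100.0],   # first row: circle of radius 10
    "Perim.": [62.832, 50.0],
    "Major":  [20.0, 15.0],
})

# Circularity = 4*pi*Area / Perimeter^2  (1.0 for a perfect circle)
df["Circularity"] = 4 * np.pi * df["Area"] / df["Perim."] ** 2

# Roundness = 4*Area / (pi * MajorAxis^2)
df["Round"] = 4 * df["Area"] / (np.pi * df["Major"] ** 2)

print(df[["Circularity", "Round"]].round(3))
```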
Phase 1: Grouped Statistics
```python
import numpy as np

def grouped_summary(df, group_cols, measure_col):
    """Calculate summary statistics by group."""
    summary = df.groupby(group_cols)[measure_col].agg(
        Mean='mean',
        SD='std',
        Median='median',
        Min='min',
        Max='max',
        N='count'
    ).reset_index()
    summary['SEM'] = summary['SD'] / np.sqrt(summary['N'])
    return summary

# Example: Colony morphometry by genotype
area_summary = grouped_summary(df, 'Genotype', 'Area')
circ_summary = grouped_summary(df, 'Genotype', 'Circularity')
```
For detailed statistical functions, see: references/statistical_analysis.md
Phase 2: Statistical Testing
Decision guide:
- Normality test needed? → Shapiro-Wilk
- Two groups comparison? → t-test or Mann-Whitney
- Multiple groups vs control? → Dunnett's test
- Multiple groups, all comparisons? → Tukey HSD
- Two factors? → Two-way ANOVA
- Effect size? → Cohen's d
- Sample size planning? → Power analysis
See: references/statistical_analysis.md for complete implementations
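The first two branches of this guide can be illustrated in a few lines. A minimal sketch on synthetic data (group sizes, means, and the conventional 0.05 threshold are all assumptions for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ctrl = rng.normal(100, 10, size=30)     # hypothetical control areas
mutant = rng.normal(120, 10, size=30)   # hypothetical mutant areas

# 1. Shapiro-Wilk normality check per group
normal = all(stats.shapiro(g).pvalue > 0.05 for g in (ctrl, mutant))

# 2. Choose the two-group test accordingly
if normal:
    stat, p = stats.ttest_ind(ctrl, mutant)
else:
    stat, p = stats.mannwhitneyu(ctrl, mutant)

print(f"normal={normal}, p={p:.3g}")
```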
Phase 3: Regression Modeling
When to use each model:
- Polynomial (quadratic/cubic): Smooth dose-response, clear peak
- Natural spline: Flexible, non-parametric, handles complex patterns
- Linear: Simple relationships, checking for trends
Model comparison metrics:
- R-squared: Overall fit (higher = better)
- Adjusted R-squared: Penalizes complexity
- F-statistic p-value: Model significance
- AIC/BIC: Compare non-nested models
See: references/statistical_analysis.md for complete implementations
Raw Image Processing Workflow
When Processing Raw Images
Workflow: Load → Preprocess → Segment → Measure → Export
```python
# Quick start for cell counting
from scripts.segment_cells import count_cells_in_image

result = count_cells_in_image(
    image_path="cells.tif",
    channel=0,     # DAPI channel
    min_area=50
)
print(f"Found {result['count']} cells")
```
Segmentation Method Selection
Decision guide:
| Cell Type | Density | Best Method | Notes |
|-----------|---------|-------------|-------|
| Nuclei (DAPI) | Low-Medium | Otsu + watershed | Standard approach |
| Nuclei (DAPI) | High | CellPose/StarDist | Handles touching |
| Colonies | Well-separated | Otsu threshold | Fast, reliable |
| Colonies | Touching | Watershed | Edge detection |
| Cells (phase) | Any | Adaptive threshold | Handles uneven illumination |
| Fluorescence | Low signal | Li threshold | More sensitive |
See: references/segmentation.md and references/cell_counting.md for detailed protocols
Library Selection: scikit-image vs OpenCV
Use scikit-image when:
- Scientific measurements needed (area, perimeter, intensity)
- regionprops for object properties
- Publication-quality analysis
- Easier syntax for scientists
Use OpenCV when:
- Processing large image batches
- Speed is critical
- Real-time processing
- Advanced computer vision features
Both work for:
- Thresholding, filtering, morphological operations
- Basic image transformations
- Most segmentation tasks
See: references/image_processing.md "Library Selection Guide"
Common BixBench Patterns
Pattern 1: Colony Morphometry (bix-18)
Question type: "Mean circularity of genotype with largest area?"
Data: CSV with Genotype, Area, Circularity columns
Workflow:
- Load CSV → group by Genotype
- Calculate mean Area per genotype
- Identify genotype with max mean Area
- Report mean Circularity for that genotype
See: references/segmentation.md "Colony Morphometry Analysis"
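The four workflow steps above can be sketched on a toy table (all values made up):

```python
import pandas as pd

# Toy morphometry table with the columns named in the pattern
df = pd.DataFrame({
    "Genotype":    ["WT", "WT", "mutA", "mutA"],
    "Area":        [100.0, 110.0, 150.0, 160.0],
    "Circularity": [0.90, 0.88, 0.60, 0.64],
})

mean_area = df.groupby("Genotype")["Area"].mean()                 # steps 1-2
largest = mean_area.idxmax()                                      # step 3
answer = df.loc[df["Genotype"] == largest, "Circularity"].mean()  # step 4
print(largest, round(answer, 3))  # → mutA 0.62
```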
Pattern 2: Cell Counting Statistics (bix-19)
Question type: "Cohen's d for NeuN counts between conditions?"
Data: CSV with Condition, NeuN_count, Sex, Hemisphere columns
Workflow:
- Load CSV → filter by hemisphere/sex if needed
- Split by Condition (KD vs CTRL)
- Calculate Cohen's d with pooled SD
- Report effect size
See: references/statistical_analysis.md "Effect Size Calculations"
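The pooled-SD Cohen's d in step 3 can be computed as follows (the count values are hypothetical):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (sample SD, ddof=1)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = a.size, b.size
    pooled_sd = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                        / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled_sd

# Hypothetical NeuN counts
kd   = [12, 14, 11, 13, 15]
ctrl = [18, 20, 19, 21, 17]
print(round(cohens_d(kd, ctrl), 3))  # → -3.795
```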
Pattern 3: Multi-Group Comparison (bix-41)
Question type: "Dunnett's test: How many ratios equivalent to control?"
Data: CSV with multiple co-culture ratios, Area, Circularity
Workflow:
- Create Strain_Ratio labels
- Run Dunnett's test for Area (vs control)
- Run Dunnett's test for Circularity (vs control)
- Count groups NOT significant in BOTH tests
See: references/statistical_analysis.md "Dunnett's Test"
Pattern 4: Regression Optimization (bix-54)
Question type: "Peak frequency from natural spline model?"
Data: CSV with co-culture frequencies and Area measurements
Workflow:
- Convert ratio strings to frequencies
- Fit natural spline model (df=4)
- Find peak via grid search
- Report peak frequency + confidence interval
See: references/statistical_analysis.md "Regression Modeling"
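A runnable sketch of the spline fit and grid search on synthetic data (patsy's `cr()`, a natural cubic regression spline, is used here as the analogue of R's `ns()`; the frequencies and the true peak location are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
# Synthetic co-culture frequencies with a response peaking near 0.6
freq = np.repeat([0.1, 0.25, 0.5, 0.6, 0.75, 0.9], 5)
area = 100 - 200 * (freq - 0.6) ** 2 + rng.normal(0, 2, freq.size)
df = pd.DataFrame({"freq": freq, "area": area})

# patsy's cr() (natural cubic regression spline) in a statsmodels formula
model = smf.ols("area ~ cr(freq, df=4)", data=df).fit()

# Grid search over the fitted curve for the peak
grid = pd.DataFrame({"freq": np.linspace(0.1, 0.9, 801)})
pred = np.asarray(model.predict(grid))
peak = float(grid["freq"].iloc[pred.argmax()])
print(f"peak frequency ≈ {peak:.3f}")
```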
Quick Reference Table
| Task | Primary Tool | Reference |
|------|-------------|-----------|
| Load measurement CSV | pandas.read_csv() | This file |
| Group statistics | df.groupby().agg() | This file |
| T-test | scipy.stats.ttest_ind() | statistical_analysis.md |
| ANOVA | statsmodels.ols + anova_lm() | statistical_analysis.md |
| Dunnett's test | scipy.stats.dunnett() | statistical_analysis.md |
| Cohen's d | Custom function (pooled SD) | statistical_analysis.md |
| Power analysis | statsmodels TTestIndPower | statistical_analysis.md |
| Polynomial regression | statsmodels.OLS + poly features | statistical_analysis.md |
| Natural spline | patsy.cr() + statsmodels.OLS | statistical_analysis.md |
| Cell segmentation | skimage.filters + watershed | cell_counting.md |
| Colony segmentation | skimage.filters.threshold_otsu | segmentation.md |
| Fluorescence quantification | skimage.measure.regionprops | fluorescence_analysis.md |
| Colocalization | Pearson/Manders | fluorescence_analysis.md |
| Image loading | tifffile, skimage.io | image_processing.md |
| Batch processing | scripts/batch_process.py | scripts/ |
Example Scripts
Ready-to-use scripts in scripts/ directory:
- segment_cells.py - Cell/nuclei counting with watershed
- measure_fluorescence.py - Multi-channel intensity quantification
- batch_process.py - Process folders of images
- colony_morphometry.py - Measure colony area/circularity
- statistical_comparison.py - Group comparison statistics
Usage:
```shell
# Count cells in image
python scripts/segment_cells.py cells.tif --channel 0 --min-area 50

# Batch process a folder
python scripts/batch_process.py input_folder/ output.csv --analysis cell_count
```
Detailed Reference Guides
For complete implementations and protocols:
- references/statistical_analysis.md - All statistical tests, regression models
- references/cell_counting.md - Cell/nuclei counting protocols
- references/segmentation.md - Colony and object segmentation
- references/fluorescence_analysis.md - Intensity quantification, colocalization
- references/image_processing.md - Image loading, preprocessing, library selection
- references/troubleshooting.md - Common issues and solutions
Important Notes
Matching R Statistical Functions
Some BixBench questions use R for analysis. Python equivalents:
- R's Dunnett test (`multcomp::glht`) → `scipy.stats.dunnett()` (SciPy ≥ 1.11)
- R's natural spline (`ns(x, df=4)`) → `patsy.cr(x, knots=...)` with explicit quantile knots
- R's t-test (`t.test()`) → `scipy.stats.ttest_ind()` (pass `equal_var=False` to match R's Welch default)
- R's ANOVA (`aov()`) → `statsmodels.formula.api.ols()` + `sm.stats.anova_lm()`
See: references/statistical_analysis.md for exact parameter matching
Answer Formatting
BixBench expects specific formats:
- "To the nearest thousand": `int(round(val, -3))`
- Percentages: usually integer or 1-2 decimal places
- Cohen's d: 3 decimal places
- Sample sizes: always integer (ceiling)
- Ratios: string format "5:1"
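These formats map onto one-liners (the example values are arbitrary):

```python
import math

val = 12_345.678

nearest_thousand = int(round(val, -3))  # "to the nearest thousand" → 12000
d = round(-3.79473, 3)                  # Cohen's d to 3 decimals → -3.795
n = math.ceil(17.2)                     # sample size: always round up → 18
ratio = "5:1"                           # ratios stay as strings

print(nearest_thousand, d, n, ratio)
```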
Completeness Checklist
Before returning your answer, verify:
- [ ] Loaded all data files and inspected column names
- [ ] Identified the specific statistic or model requested
- [ ] Used correct grouping variables and filter conditions
- [ ] Applied correct rounding or format
- [ ] For "how many" questions: counted correctly based on criteria
- [ ] For statistical tests: used appropriate multiple comparison correction
- [ ] For regression: properly prepared and transformed data
- [ ] Double-checked direction of comparisons
- [ ] Verified answer falls within expected range
Getting Help
- Start with decision tree at top of this file
- Check relevant reference guide for detailed protocol
- Use example scripts as templates
- See troubleshooting guide for common issues
- All statistical implementations in statistical_analysis.md