---
id: sm-statistical-analysis
name: "statistical-analysis"
url: https://skills.yangsir.net/skill/sm-statistical-analysis
author: anthropics
domain: data-ai
tags: ["statistical-modeling", "hypothesis-testing", "regression-analysis", "data-interpretation", "r/python"]
install_count: 2100
rating: 4.30 (20 reviews)
github: https://github.com/anthropics/knowledge-work-plugins
---

# statistical-analysis

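> Perform descriptive statistics, trend analysis, outlier detection, and hypothesis testing, with guidance on when to be cautious about statistical claims.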

**Stats**: 2,100 installs · 4.3/5 (20 reviews)

## Before / After Comparison

### Rigorous Statistical Data Analysis

## Readme

# Statistical Analysis Skill

Descriptive statistics, trend analysis, outlier detection, hypothesis testing, and guidance on when to be cautious about statistical claims.

## Descriptive Statistics Methodology

### Central Tendency

Choose the right measure of center based on the data:

| Situation | Use | Why |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Most efficient estimator |
| Skewed distribution | Median | Robust to outliers |
| Categorical or ordinal data | Mode | Only option for non-numeric |
| Highly skewed with outliers (e.g., revenue per user) | Median + mean | Report both; the gap shows skew |

**Always report mean and median together for business metrics.** If they diverge significantly, the data is skewed and the mean alone is misleading.

### Spread and Variability

- **Standard deviation**: How far values typically fall from the mean. Use with normally distributed data.

- **Interquartile range (IQR)**: Distance from p25 to p75. Robust to outliers. Use with skewed data.

- **Coefficient of variation (CV)**: StdDev / Mean. Use to compare variability across metrics with different scales.

- **Range**: Max minus min. Sensitive to outliers but gives a quick sense of data extent.
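
The four measures above can be computed side by side. This is a minimal sketch using only Python's standard library; the function name `spread_summary` and the `order_values` data are hypothetical, and the quartiles use the `inclusive` interpolation method, which is one of several reasonable conventions.

```python
import statistics

def spread_summary(values):
    """Return standard deviation, IQR, coefficient of variation, and range."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    # quantiles(n=4) returns the three quartile cut points: p25, p50, p75
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "stdev": stdev,
        "iqr": q3 - q1,
        "cv": stdev / mean,  # only meaningful when the mean is well above zero
        "range": max(values) - min(values),
    }

# Hypothetical order values with one extreme order at the end
order_values = [12, 15, 14, 13, 16, 15, 14, 250]
summary = spread_summary(order_values)
print(summary)
```

In this made-up sample the single extreme order inflates the standard deviation and the range, while the IQR stays small, which is the robustness the list above describes.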

### Percentiles for Business Context

Report key percentiles to tell a richer story than mean alone:

```
p1:   Bottom 1% (floor / minimum typical value)
p5:   Low end of normal range
p25:  First quartile
p50:  Median (typical user)
p75:  Third quartile
p90:  Top 10% / power users
p95:  High end of normal range
p99:  Top 1% / extreme users

```

**Example narrative**: "The median session duration is 4.2 minutes, but the top 10% of users spend over 22 minutes per session, pulling the mean up to 7.8 minutes."
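
A percentile report like the one above could be produced as follows. This is a sketch, not a prescribed implementation: `percentile_report` and the `durations` data are hypothetical, and the `inclusive` quantile method is an assumption (pandas' `quantile` defaults to a comparable linear interpolation).

```python
import statistics

def percentile_report(values, points=(1, 5, 25, 50, 75, 90, 95, 99)):
    """Return {"p1": ..., "p5": ..., ...} for the requested percentiles."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {f"p{p}": cuts[p - 1] for p in points}

# Hypothetical session durations in minutes: 1, 2, ..., 101
durations = list(range(1, 102))
report = percentile_report(durations)
for name, value in report.items():
    print(f"{name}: {value}")
```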

### Describing Distributions

Characterize every numeric distribution you analyze:

- **Shape**: Normal, right-skewed, left-skewed, bimodal, uniform, heavy-tailed

- **Center**: Mean and median (and the gap between them)

- **Spread**: Standard deviation or IQR

- **Outliers**: How many and how extreme

- **Bounds**: Is there a natural floor (zero) or ceiling (100%)?

## Trend Analysis and Forecasting

### Identifying Trends

**Moving averages** to smooth noise:

```
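# Assumes df is a pandas DataFrame with one row per day, sorted by date,
# and a numeric 'metric' column.
# min_periods=1 also returns a value for the earliest rows, averaged over fewer days.

# 7-day moving average (good for daily data with weekly seasonality)
df['ma_7d'] = df['metric'].rolling(window=7, min_periods=1).mean()

# 28-day moving average (four full weeks; smooths weekly patterns and most within-month noise)
df['ma_28d'] = df['metric'].rolling(window=28, min_periods=1).mean()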

```

**Period-over-period comparison**:

- Week-over-week (WoW): Compare to same day last week

- Month-over-month (MoM): Compare to the prior month

- Year-over-year (YoY): Gold standard for seasonal businesses

- Same-day-last-year: Compare to the same calendar day one year earlier

**Growth rates**:

```
Simple growth: (current - previous) / previous
CAGR: (ending / beginning) ^ (1 / years) - 1
Log growth: ln(current / previous)  -- better for volatile series

```
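
The three formulas translate directly into Python. The function names below are illustrative, and the inputs are made-up numbers chosen so the results are easy to check by hand.

```python
import math

def simple_growth(current, previous):
    """Fractional change from previous to current."""
    return (current - previous) / previous

def cagr(ending, beginning, years):
    """Compound annual growth rate over the given number of years."""
    return (ending / beginning) ** (1 / years) - 1

def log_growth(current, previous):
    """Natural-log growth rate between two periods."""
    return math.log(current / previous)

# Hypothetical revenue: 100 -> 110 -> 121 over two years
print(simple_growth(110, 100))  # one-year simple growth
print(cagr(121, 100, 2))        # annualized growth across both years
# Log growth rates add across periods, which simple growth rates do not
two_step = log_growth(110, 100) + log_growth(121, 110)
one_step = log_growth(121, 100)
```

One reason log growth suits volatile series: a doubling followed by a halving sums to zero in log terms, whereas the simple rates (+100%, then -50%) do not cancel.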

### Seasonality Detection

Check for periodic patterns:

- Plot the raw time series -- visual inspection first

- Compute day-of-week averages: is there a clear weekly pattern?

- Compute month-of-year averages: is there an annual cycle?

- When comparing periods, always use YoY or same-period comparisons to avoid conflating trend with seasonality
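
The day-of-week check can be sketched as below, assuming the data arrives as `(date, value)` pairs. The helper `day_of_week_averages` and the synthetic series (100 on weekdays, 60 on weekends) are hypothetical; with a pandas DataFrame, a `groupby` on the weekday would do the same job.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def day_of_week_averages(daily_values):
    """Average a daily metric by weekday. daily_values holds (date, value) pairs."""
    buckets = defaultdict(list)
    for day, value in daily_values:
        buckets[WEEKDAYS[day.weekday()]].append(value)
    return {name: mean(buckets[name]) for name in WEEKDAYS if name in buckets}

# Synthetic four-week series: weekdays at 100, weekends at 60
start = date(2024, 1, 1)
days = [start + timedelta(days=i) for i in range(28)]
series = [(d, 60 if d.weekday() >= 5 else 100) for d in days]

averages = day_of_week_averages(series)
weekly_swing = max(averages.values()) - min(averages.values())
print(averages, weekly_swing)
```

A large swing relative to the overall mean suggests a weekly pattern worth accounting for; the same grouping by month gives the annual check.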

### Forecasting (Simple Methods)

For business analysts (not data scientists), use straightforward methods:

- **Naive forecast**: Tomorrow = today. Use as a baseline.

- **Seasonal naive**: Tomorrow = same day last week/year.

- **Linear trend**: Fit a line to historical data. Only for clearly linear trends.

- **Moving average forecast**: Use trailing average as the forecast.
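
The four baseline methods above fit in a few lines each. This is a sketch under simple assumptions: `history` is an evenly spaced list of past values, each function forecasts exactly one step ahead, and all function names are illustrative.

```python
from statistics import mean

def naive_forecast(history):
    """Next value = last observed value."""
    return history[-1]

def seasonal_naive_forecast(history, season_length=7):
    """Next value = value observed one full season earlier."""
    return history[-season_length]

def moving_average_forecast(history, window=7):
    """Next value = mean of the trailing window."""
    return mean(history[-window:])

def linear_trend_forecast(history):
    """Fit y = intercept + slope * t by least squares and extrapolate one step."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = y_mean - slope * t_mean
    return intercept + slope * n

# Hypothetical daily signups following a perfectly linear trend
history = [10, 12, 14, 16, 18, 20, 22, 24]
forecasts = {
    "naive": naive_forecast(history),
    "seasonal_naive": seasonal_naive_forecast(history, season_length=7),
    "moving_average": moving_average_forecast(history, window=4),
    "linear_trend": linear_trend_forecast(history),
}
print(forecasts)
```

On trending data like this, the naive and moving-average forecasts lag behind the trend, which is why they serve as baselines to beat and not as final answers.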

**Always communicate uncertainty**. Provide a range, not a point estimate:

- "We expect 10K-12K signups next month based on the 3-month trend"

- NOT "We will get exactly 11,234 signups next month"

**When to escalate to a data scientist**: Non-linear trends, multiple seasonalities, external factors (marketing spend, holidays), or when forecast accuracy matters for resource allocation.

## Outlier and Anomaly Detection

### Statistical Methods

**Z-score method** (for normally distributed data):

```
z_scores = (df['value'] - df['value'].mean()) / df['value'].std()
outliers = df[abs(z_scores) > 3]  # More than 3 standard deviations

```

**IQR method** (robust to non-normal distributions):

```
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
outliers = df[(df['value'] < lower_bound) | (df['value'] > upper_bound)]

```

**Percentile method** (simplest; always flags roughly 2% of rows, whether or not they are truly unusual):

```
outliers = df[(df['value'] < df['value'].quantile(0.01)) |
              (df['value'] > df['value'].quantile(0.99))]

```

### Handling Outliers

Do NOT automatically remove outliers. Instead:

- **Investigate**: Is this a data error, a genuine extreme value, or a different population?

- **Data errors**: Fix or remove (e.g., negative ages, timestamps in year 1970)

- **Genuine extremes**: Keep them but consider using robust statistics (median instead of mean)

- **Different population**: Segment them out for separate analysis (e.g., enterprise vs. SMB customers)

**Report what you did**: "We excluded 47 records (0.3%) with transaction amounts >$50K, which represent bulk enterprise orders analyzed separately."

### Time Series Anomaly Detection

For detecting unusual values in a time series:

- Compute expected value (moving average or same-period-last-year)

- Compute deviation from expected

- Flag deviations beyond a threshold (typically 2-3 standard deviations of the residuals)

- Distinguish between point anomalies (single unusual value) and change points (sustained shift)
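
The first three steps above can be sketched as follows, using a trailing moving average as the expected value. `flag_anomalies`, the window of 7, the threshold of 2, and the sample series are all assumptions for illustration. The sketch does not attempt to separate point anomalies from change points.

```python
from statistics import mean, stdev

def flag_anomalies(series, window=7, threshold=2.0):
    """Return indices whose residual exceeds threshold * stdev of all residuals."""
    residuals = {}
    for i in range(window, len(series)):
        expected = mean(series[i - window:i])  # trailing window excludes point i
        residuals[i] = series[i] - expected
    spread = stdev(list(residuals.values()))
    return [i for i, r in residuals.items() if abs(r) > threshold * spread]

# Hypothetical daily metric hovering around 100 with one spike at index 13
series = [100, 102, 98, 101, 99, 100, 103, 97, 101, 100, 99,
          102, 98, 160, 100, 101, 99, 100, 102, 98, 101]
anomalies = flag_anomalies(series)
print(anomalies)
```

Note the limitation of this simple version: the spike itself inflates both the residual standard deviation and the next seven trailing averages. A median-based expected value or spread is a common way to reduce that effect.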

## Hypothesis Testing Basics

### When to Use

Use hypothesis testing when you need to determine whether an observed difference is likely real or could be due to random chance. Common scenarios:

- A/B test results: Is variant B actually better than A?

- Before/after comparison: Did the product change actually move the metric?

- Segment comparison: Do enterprise customers really have higher retention?

### The Framework

- **Null hypothesis (H0)**: There is no difference (the default assumption)

- **Alternative hypothesis (H1)**: There is a difference

- **Choose significance level (alpha)**: Typically 0.05, meaning you accept a 5% chance of a false positive when there is truly no difference

- **Compute test statistic and p-value**

- **Interpret**: If p < alpha, reject H0 (evidence of a real difference); otherwise, fail to reject H0, which is not proof that there is no difference

### Common Tests

| Scenario | Test | When to Use |
|---|---|---|
| Compare two group means | t-test (independent) | Normal data, two groups |
| Compare two group proportions | z-test for proportions | Conversion rates, binary outcomes |
| Compare paired measurements | Paired t-test | Before/after on same entities |
| Compare 3+ group means | ANOVA | Multiple segments or variants |
| Non-normal data, two groups | Mann-Whitney U test | Skewed metrics, ordinal data |
| Association between categories | Chi-squared test | Two categorical variables |
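
As one worked instance of the tests listed above, here is a sketch of the z-test for two proportions, written with only the standard library. The function name and the A/B counts are hypothetical; the test uses the pooled standard error under H0 and a two-sided p-value, and relies on the normal approximation, so it assumes reasonably large groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for H0: both groups share the same conversion rate."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Under H0 the best estimate of the common rate pools both groups
    pooled = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical A/B test: 4.0% vs 5.0% conversion, 5,000 users per variant
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up counts the p-value falls below 0.05, so H0 would be rejected at the usual alpha. Statistical libraries offer equivalent tests with more options.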

### Practical Significance vs. Statistical Significance

**Statistical significance** means a difference this large would be unlikely to arise from chance alone if there were no true effect.

**Practical significance** means the difference is large enough to matter for business decisions.

A difference can be statistically significant but practically meaningless (common with large samples). Always report:

- **Effect size**: How big is the difference? (e.g., "Variant B improved conversion by 0.3 percentage points")

- **Confidence interval**: What's the range of plausible true effects?

- **Business impact**: What does this translate to in revenue, users, or other business terms?
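
The three items above can be reported together. The sketch below computes the effect size and a normal-approximation (Wald) confidence interval for a difference in conversion rates, then converts it to a business figure. The function name, the counts, and the `monthly_visitors` figure are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def diff_in_proportions_ci(conversions_a, n_a, conversions_b, n_b, confidence=0.95):
    """Return (difference, (low, high)) for rate_b - rate_a."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Unpooled standard error: each group keeps its own variance
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical large test: 5.0% vs 5.3% conversion, one million users each
diff, (low, high) = diff_in_proportions_ci(50_000, 1_000_000, 53_000, 1_000_000)

monthly_visitors = 2_000_000  # assumed traffic, for illustration only
extra_low, extra_high = low * monthly_visitors, high * monthly_visitors
print(f"Effect: {diff * 100:.2f} pp, 95% CI {low * 100:.2f} to {high * 100:.2f} pp")
print(f"Roughly {extra_low:,.0f} to {extra_high:,.0f} extra conversions per month")
```

Here the interval excludes zero, so the result is statistically significant, yet the effect is only about 0.3 percentage points. Whether that matters depends on the business figure, not the p-value.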

### Sample Size Considerations

- Small samples produce unreliable results, even with significant p-values

- Rule of thumb for proportions: Need at least 30 events per group for basic reliability

- For detecting small effects (e.g., 1% conversion rate change), you may need thousands of observations per group

- If your sample is small, say so: "With only 200 observations per group, we have limited power to detect effects smaller than X%"
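
To see why small effects need thousands of observations, the sketch below applies a common normal-approximation formula for comparing two proportions: n per group = (z_alpha + z_power)^2 * (p1(1 - p1) + p2(1 - p2)) / (p2 - p1)^2. The function name, the 80% power default, and the example rates are assumptions; dedicated power-analysis tools may use slightly different formulas.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_baseline, p_variant, alpha=0.05, power=0.80):
    """Approximate observations needed per group for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_baseline * (1 - p_baseline) + p_variant * (1 - p_variant)
    effect = p_variant - p_baseline
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# Hypothetical: detect a lift from 5% to 6% conversion (one percentage point)
n_small_effect = sample_size_per_group(0.05, 0.06)
# A much larger lift, 5% to 10%, needs far fewer observations
n_large_effect = sample_size_per_group(0.05, 0.10)
print(n_small_effect, n_large_effect)
```

Under these assumptions, the one-percentage-point lift needs roughly 8,000 observations per group, because the required sample grows with the inverse square of the effect size.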

## When to Be Cautious About Statistical Claims

### Correlation Is Not Causation

When you find a correlation, explicitly consider:

- **Reverse causation**: Maybe B causes A, not A causes B

- **Confounding variables**: Maybe C causes both A and B

- **Coincidence**: With enough variables, spurious correlations are inevitable

**What you can say**: "Users who use feature X have 30% higher retention"
**What you cannot say without more evidence**: "Feature X causes 30% higher retention"

### Multiple Comparisons Problem

When you test many hypotheses, some will be "significant" by chance:

- Testing 20 metrics at alpha = 0.05 means about 1 will look significant by chance on average, even if none of them truly changed

- If you looked at many segments before finding one that's different, note that

- Adjust for multiple comparisons with Bonferroni correction (divide alpha by number of tests) or report how many tests were run
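
The arithmetic behind these points is short enough to check directly. The function names and the p-values below are illustrative, and the family-wise error rate formula assumes the tests are independent.

```python
def expected_false_positives(num_tests, alpha=0.05):
    """Average number of falsely significant results when no effect is real."""
    return num_tests * alpha

def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Compare each p-value against alpha divided by the number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return [p < adjusted_alpha for p in p_values]

print(expected_false_positives(20))  # average count of chance "wins"
fwer = family_wise_error_rate(20)    # probability of one or more chance "wins"

# Hypothetical p-values from four metrics; adjusted alpha is 0.05 / 4 = 0.0125
p_values = [0.001, 0.02, 0.04, 0.30]
decisions = bonferroni_significant(p_values)
print(fwer, decisions)
```

In this example two p-values that pass the unadjusted 0.05 threshold (0.02 and 0.04) no longer count as significant after the correction.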

### Simpson's Paradox

A trend in aggregated data can reverse when data is segmented:

- Always check whether the conclusion holds across key segments

- Example: Overall conversion goes up, but conversion goes down in every segment -- because the mix shifted toward a higher-converting segment
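
A small numeric example makes the reversal concrete. The segment names and counts below are invented for illustration: conversion falls in both segments, yet the overall rate rises because traffic shifts toward the higher-converting segment.

```python
# Hypothetical (visitors, conversions) per segment in two periods
before = {"returning": (100, 20), "new": (900, 45)}
after = {"returning": (500, 90), "new": (500, 20)}

def segment_rates(period):
    """Conversion rate for each segment."""
    return {name: conv / visits for name, (visits, conv) in period.items()}

def overall_rate(period):
    """Conversion rate across all segments combined."""
    visitors = sum(visits for visits, _ in period.values())
    conversions = sum(conv for _, conv in period.values())
    return conversions / visitors

rates_before, rates_after = segment_rates(before), segment_rates(after)
for name in before:
    print(f"{name}: {rates_before[name]:.1%} -> {rates_after[name]:.1%}")
print(f"overall: {overall_rate(before):.1%} -> {overall_rate(after):.1%}")
```

Returning visitors drop from 20% to 18% and new visitors from 5% to 4%, while the overall rate climbs from 6.5% to 11% because returning visitors grow from 10% to 50% of traffic.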

### Survivorship Bias

You can only analyze entities that "survived" to be in your dataset:

- Analyzing active users ignores those who churned

- Analyzing successful companies ignores those that failed

- Always ask: "Who is missing from this dataset, and would their inclusion change the conclusion?"

### Ecological Fallacy

Aggregate trends may not apply to individuals:

- "Countries with higher X have higher Y" does NOT mean "individuals with higher X have higher Y"

- Be careful about applying group-level findings to individual cases

### Anchoring on Specific Numbers

Be wary of false precision:

- "Churn will be 4.73% next quarter" implies more certainty than is warranted

- Prefer ranges: "We expect churn between 4-6% based on historical patterns"

- Round appropriately: "About 5%" is often more honest than "4.73%"

---
*Source: https://skills.yangsir.net/skill/sm-statistical-analysis*
*Markdown mirror: https://skills.yangsir.net/api/skill/sm-statistical-analysis/markdown*