data-analysis
An AI Agent Skill for data exploration, report generation, data consistency validation, and decision support. It improves productivity and automation.
npx skills add supercent-io/skills-template --skill data-analysis
Before / After comparison
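Before: Traditional data analysis usually involves manual data cleaning, transformation, and report generation, which is slow and error-prone. Validating data consistency and supporting decisions often takes substantial manual intervention, so it is hard to respond quickly to business needs.
After: With the data-analysis skill, data exploration, report generation, and consistency validation are automated. The skill handles complex datasets quickly and surfaces insights that speed up decision-making, so the business can adapt to market changes sooner.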
# Data Analysis

## When to use this skill

- Data exploration: Understand a new dataset
- Report generation: Derive data-driven insights
- Quality validation: Check data consistency
- Decision support: Make data-driven recommendations

## Instructions

### Step 1: Load and explore data

Python (Pandas):

```python
import pandas as pd
import numpy as np

# Load CSV
df = pd.read_csv('data.csv')

# Basic info (df.info() prints its summary directly and returns None)
df.info()
print(df.describe())
print(df.head(10))

# Check missing values
print(df.isnull().sum())

# Data types
print(df.dtypes)
```

SQL:

```sql
-- Inspect table schema (DESCRIBE is dialect-specific, e.g. MySQL)
DESCRIBE table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Basic stats
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT column_name) AS unique_values,
    MIN(numeric_column) AS min_val,
    MAX(numeric_column) AS max_val,
    AVG(numeric_column) AS avg_val
FROM table_name;
```

### Step 2: Data cleaning

```python
# Handle missing values
# (assign the result back; inplace=True on a selected column may not update df)
df['column'] = df['column'].fillna(df['column'].mean())
df.dropna(subset=['required_column'], inplace=True)

# Remove duplicates
df.drop_duplicates(inplace=True)

# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5 * IQR) & (df['value'] <= Q3 + 1.5 * IQR)]
```

### Step 3: Statistical analysis

```python
# Descriptive statistics
print(df['numeric_column'].describe())

# Grouped analysis
grouped = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other': 'nunique'
})
print(grouped)

# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)

# Pivot table
pivot = pd.pivot_table(
    df,
    values='sales',
    index='region',
    columns='month',
    aggfunc='sum'
)
```

### Step 4: Visualization

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')

# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')

# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')

# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')
```

### Step 5: Derive insights

```python
# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')

# Trend analysis (month-over-month growth in percent)
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100

# Segment analysis
segments = df.groupby('segment').agg({
    'revenue': 'sum',
    'customers': 'nunique',
    'orders': 'count'
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']
```

## Output format

### Analysis report structure

```markdown
# Data Analysis Report

## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD

## 2. Key findings
- Insight 1
- Insight 2
- Insight 3

## 3. Statistical summary
| Metric | Value |
|------|-----|
| Mean | X.XX |
| Median | X.XX |
| Std dev | X.XX |

## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
```

## Best practices

- Understand the data first: Learn structure and meaning before analysis
- Incremental analysis: Move from simple to complex analyses
- Use visualization: Use a variety of charts to spot patterns
- Validate assumptions: Always verify assumptions about the data
- Reproducibility: Document analysis code and results

## Constraints

### Required rules (MUST)

- Preserve raw data (work on a copy)
- Document the analysis process
- Validate results

### Prohibited (MUST NOT)

- Do not expose sensitive personal data
- Do not draw unsupported conclusions

## References

- Pandas Documentation
- Matplotlib Gallery
- Seaborn Tutorial

## Examples

### Example 1: Basic usage

### Example 2: Advanced usage
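
The skill lists "Preserve raw data (work on a copy)" and "Validate results" as required rules but does not show them in code. The following is a minimal sketch of how the cleaning step could follow both rules. The sample frame, its column names, and the `clean` helper are hypothetical and used only for illustration; they are not part of the skill itself.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
raw = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b", "c"],
    "value": [10.0, 12.0, 11.0, 11.0, 500.0, None],
})


def clean(raw_df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left untouched."""
    df = raw_df.copy()                   # preserve raw data
    df = df.dropna(subset=["value"])     # drop rows missing the required column
    df = df.drop_duplicates()            # remove exact duplicate rows
    # IQR outlier filter, same rule as in the cleaning step
    q1 = df["value"].quantile(0.25)
    q3 = df["value"].quantile(0.75)
    iqr = q3 - q1
    in_range = df["value"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


cleaned = clean(raw)

# Validate results: summarize what changed and check basic consistency.
report = {
    "rows_in": len(raw),
    "rows_out": len(cleaned),
    "missing_after": int(cleaned["value"].isnull().sum()),
    "duplicates_after": int(cleaned.duplicated().sum()),
}
print(report)
```

On this sample, the missing row, the duplicate row, and the 500.0 outlier are dropped, so three of the six rows survive while `raw` stays unchanged. Keeping the cleaning logic in one function that returns a new frame makes it easy to rerun the analysis from the raw data and to record the before/after row counts in the report.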