data-analysis
Used for data exploration, report generation, data consistency validation, and decision support. An AI Agent Skill that enhances work efficiency and automation capabilities.
npx skills add supercent-io/skills-template --skill data-analysisBefore / After Comparison
1 组Traditional data analysis processes typically involve manual data cleaning, transformation, and report generation, which are time-consuming and prone to errors. When verifying data consistency and supporting decision-making, significant manual intervention is often required, leading to low efficiency and difficulty in quickly responding to business needs.
By utilizing data-analysis skills, data exploration, report generation, and consistency verification become automated and highly efficient. It can quickly process complex datasets, provide accurate insights, significantly accelerate the decision-making process, and enable businesses to adapt more rapidly to market changes.
data-analysis
Data Analysis When to use this skill Data exploration: Understand a new dataset Report generation: Derive data-driven insights Quality validation: Check data consistency Decision support: Make data-driven recommendations Instructions Step 1: Load and explore data Python (Pandas): import pandas as pd import numpy as np # Load CSV df = pd.read_csv('data.csv') # Basic info print(df.info()) print(df.describe()) print(df.head(10)) # Check missing values print(df.isnull().sum()) # Data types print(df.dtypes) SQL: -- Inspect table schema DESCRIBE table_name; -- Sample data SELECT * FROM table_name LIMIT 10; -- Basic stats SELECT COUNT() as total_rows, COUNT(DISTINCT column_name) as unique_values, MIN(numeric_column) as min_val, MAX(numeric_column) as max_val, AVG(numeric_column) as avg_val FROM table_name; Step 2: Data cleaning # Handle missing values df['column'].fillna(df['column'].mean(), inplace=True) df.dropna(subset=['required_column'], inplace=True) # Remove duplicates df.drop_duplicates(inplace=True) # Type conversions df['date'] = pd.to_datetime(df['date']) df['category'] = df['category'].astype('category') # Remove outliers (IQR method) Q1 = df['value'].quantile(0.25) Q3 = df['value'].quantile(0.75) IQR = Q3 - Q1 df = df[(df['value'] >= Q1 - 1.5IQR) & (df['value'] <= Q3 + 1.5*IQR)] Step 3: Statistical analysis # Descriptive statistics print(df['numeric_column'].describe()) # Grouped analysis grouped = df.groupby('category').agg({ 'value': ['mean', 'sum', 'count'], 'other': 'nunique' }) print(grouped) # Correlation correlation = df[['col1', 'col2', 'col3']].corr() print(correlation) # Pivot table pivot = pd.pivot_table(df, values='sales', index='region', columns='month', aggfunc='sum' ) Step 4: Visualization import matplotlib.pyplot as plt import seaborn as sns # Histogram plt.figure(figsize=(10, 6)) df['value'].hist(bins=30) plt.title('Distribution of Values') plt.savefig('histogram.png') # Boxplot plt.figure(figsize=(10, 6)) sns.boxplot(x='category', y='value', data=df) plt.title('Value by Category') plt.savefig('boxplot.png') # Heatmap (correlation) plt.figure(figsize=(10, 8)) sns.heatmap(correlation, annot=True, cmap='coolwarm') plt.title('Correlation Matrix') plt.savefig('heatmap.png') # Time series plt.figure(figsize=(12, 6)) df.groupby('date')['value'].sum().plot() plt.title('Time Series of Values') plt.savefig('timeseries.png') Step 5: Derive insights # Top/bottom analysis top_10 = df.nlargest(10, 'value') bottom_10 = df.nsmallest(10, 'value') # Trend analysis df['month'] = df['date'].dt.to_period('M') monthly_trend = df.groupby('month')['value'].sum() growth = monthly_trend.pct_change() * 100 # Segment analysis segments = df.groupby('segment').agg({ 'revenue': 'sum', 'customers': 'nunique', 'orders': 'count' }) segments['avg_order_value'] = segments['revenue'] / segments['orders'] Output format Analysis report structure # Data Analysis Report ## 1. Dataset overview - Dataset: [name] - Records: X,XXX - Columns: XX - Date range: YYYY-MM-DD ~ YYYY-MM-DD ## 2. Key findings - Insight 1 - Insight 2 - Insight 3 ## 3. Statistical summary | Metric | Value | |------|-----| | Mean | X.XX | | Median | X.XX | | Std dev | X.XX | ## 4. Recommendations 1. [Recommendation 1] 2. [Recommendation 2] Best practices Understand the data first: Learn structure and meaning before analysis Incremental analysis: Move from simple to complex analyses Use visualization: Use a variety of charts to spot patterns Validate assumptions: Always verify assumptions about the data Reproducibility: Document analysis code and results Constraints Required rules (MUST) Preserve raw data (work on a copy) Document the analysis process Validate results Prohibited (MUST NOT) Do not expose sensitive personal data Do not draw unsupported conclusions References Pandas Documentation Matplotlib Gallery Seaborn Tutorial Examples Example 1: Basic usage Example 2: Advanced usageWeekly Installs12.3KRepositorysupercent-io/sk…templateGitHub Stars53First SeenJan 24, 2026Security AuditsGen Agent Trust HubPassSocketFailSnykPassInstalled oncodex12.2Kgemini-cli12.2Kopencode12.2Kgithub-copilot12.2Kcursor12.2Kamp12.1K
User Reviews (0)
Write a Review
No reviews yet
Statistics
User Rating
Rate this Skill