arize-dataset
npx skills add github/awesome-copilot --skill arize-dataset
# Arize Dataset Skill
## Concepts

- **Dataset** = a versioned collection of examples used for evaluation and experimentation
- **Dataset Version** = a snapshot of a dataset at a point in time; updates can be in-place or create a new version
- **Example** = a single record in a dataset with arbitrary user-defined fields (e.g., `question`, `answer`, `context`)
- **Space** = an organizational container; datasets belong to a space

System-managed fields on examples (`id`, `created_at`, `updated_at`) are auto-generated by the server -- never include them in create or append payloads.
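The system-field rule matters when round-tripping data — e.g., re-uploading an exported dataset as a new one. A minimal sketch of stripping server-managed fields before a create or append payload is built (field names are the ones documented above):

```python
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}  # server-managed, per the note above

def strip_system_fields(examples: list) -> list:
    """Drop server-managed fields so the payload is valid for create/append."""
    return [{k: v for k, v in ex.items() if k not in SYSTEM_FIELDS} for ex in examples]

# An example as it comes back from an export
exported = [{"id": "ex_001", "created_at": "2026-01-15T10:00:00Z",
             "updated_at": "2026-01-15T10:00:00Z",
             "question": "What is 2+2?", "answer": "4"}]

print(strip_system_fields(exported))  # [{'question': 'What is 2+2?', 'answer': '4'}]
```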
## Prerequisites

Proceed directly with the task — run the `ax` command you need. Do NOT check versions, env vars, or profiles upfront.

If an `ax` command fails, troubleshoot based on the error:

- `command not found` or version error → see references/ax-setup.md
- `401 Unauthorized` / missing API key → run `ax profiles show` to inspect the current profile. If the profile is missing or the API key is wrong: check `.env` for `ARIZE_API_KEY` and use it to create/update the profile via references/ax-profiles.md. If `.env` has no key either, ask the user for their Arize API key (https://app.arize.com/admin > API Keys)
- Space ID unknown → check `.env` for `ARIZE_SPACE_ID`, or run `ax spaces list -o json`, or ask the user
- Project unclear → check `.env` for `ARIZE_DEFAULT_PROJECT`, or ask, or run `ax projects list -o json --limit 100` and present the results as selectable options
## List Datasets: `ax datasets list`

Browse datasets in a space. Output goes to stdout.

```bash
ax datasets list
ax datasets list --space-id SPACE_ID --limit 20
ax datasets list --cursor CURSOR_TOKEN
ax datasets list -o json
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--space-id` | string | from profile | Filter by space |
| `--limit, -l` | int | 15 | Max results (1-100) |
| `--cursor` | string | none | Pagination cursor from previous response |
| `-o, --output` | string | table | Output format: table, json, csv, parquet, or file path |
| `-p, --profile` | string | default | Configuration profile |
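The `--cursor` flow is the standard follow-the-token loop: request a page, collect results, and repeat with the returned cursor until none comes back. A sketch of that loop — `fetch_page` and the `datasets`/`next_cursor` response keys are hypothetical stand-ins for shelling out to `ax datasets list -o json --cursor …` and parsing its output:

```python
from typing import Optional

# Hypothetical stand-in for `ax datasets list -o json [--cursor TOKEN]`.
# A real implementation would shell out to the CLI and parse its JSON.
PAGES = {
    None: {"datasets": [{"id": "ds_1"}, {"id": "ds_2"}], "next_cursor": "tok_a"},
    "tok_a": {"datasets": [{"id": "ds_3"}], "next_cursor": None},
}

def fetch_page(cursor: Optional[str]) -> dict:
    return PAGES[cursor]

def list_all_datasets() -> list:
    """Follow pagination cursors until the server stops returning one."""
    datasets, cursor = [], None
    while True:
        page = fetch_page(cursor)
        datasets.extend(page["datasets"])
        cursor = page.get("next_cursor")
        if not cursor:
            return datasets

print([d["id"] for d in list_all_datasets()])
```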
## Get Dataset: `ax datasets get`

Quick metadata lookup -- returns dataset name, space, timestamps, and version list.

```bash
ax datasets get DATASET_ID
ax datasets get DATASET_ID -o json
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `DATASET_ID` | string | required | Positional argument |
| `-o, --output` | string | table | Output format |
| `-p, --profile` | string | default | Configuration profile |

### Response fields

| Field | Type | Description |
|-------|------|-------------|
| `id` | string | Dataset ID |
| `name` | string | Dataset name |
| `space_id` | string | Space this dataset belongs to |
| `created_at` | datetime | When the dataset was created |
| `updated_at` | datetime | Last modification time |
| `versions` | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |
## Export Dataset: `ax datasets export`

Download all examples to a file. Use `--all` for datasets larger than 500 examples (unlimited bulk export).

```bash
ax datasets export DATASET_ID
# -> dataset_abc123_20260305_141500/examples.json
ax datasets export DATASET_ID --all
ax datasets export DATASET_ID --version-id VERSION_ID
ax datasets export DATASET_ID --output-dir ./data
ax datasets export DATASET_ID --stdout
ax datasets export DATASET_ID --stdout | jq '.[0]'
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `DATASET_ID` | string | required | Positional argument |
| `--version-id` | string | latest | Export a specific dataset version |
| `--all` | bool | false | Unlimited bulk export (use for datasets > 500 examples) |
| `--output-dir` | string | `.` | Output directory |
| `--stdout` | bool | false | Print JSON to stdout instead of file |
| `-p, --profile` | string | default | Configuration profile |

**Agent auto-escalation rule:** If an export returns exactly 500 examples, the result is likely truncated — re-run with `--all` to get the full dataset.

**Export completeness verification:** After exporting, confirm the row count matches what the server reports:

```bash
# Get the server-reported count from dataset metadata
ax datasets get DATASET_ID -o json | jq '.versions[-1] | {version: .id, examples: .example_count}'

# Compare to what was exported
jq 'length' dataset_*/examples.json

# If counts differ, re-export with --all
```
Output is a JSON array of example objects. Each example has system fields (`id`, `created_at`, `updated_at`) plus all user-defined fields:

```json
[
  {
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
    "topic": "math"
  }
]
```
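An exported `examples.json` file parses the same way with `json.load(f)`. A minimal sketch of separating system fields from user-defined ones, run here against an inline copy of the sample payload above:

```python
import json

# Inline copy of the sample export shown above; a real file loads via json.load(f)
raw = """[
  {
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
    "topic": "math"
  }
]"""

examples = json.loads(raw)

# Separate server-managed fields from user-defined ones
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}
user_fields = {k for ex in examples for k in ex} - SYSTEM_FIELDS
print(sorted(user_fields))  # ['answer', 'question', 'topic']
```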
## Create Dataset: `ax datasets create`

Create a new dataset from a data file.

```bash
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.csv
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.json
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.jsonl
ax datasets create --name "My Dataset" --space-id SPACE_ID --file data.parquet
```

### Flags

| Flag | Type | Required | Description |
|------|------|----------|-------------|
| `--name, -n` | string | yes | Dataset name |
| `--space-id` | string | yes | Space to create the dataset in |
| `--file, -f` | path | yes | Data file: CSV, JSON, JSONL, or Parquet |
| `-o, --output` | string | no | Output format for the returned dataset metadata |
| `-p, --profile` | string | no | Configuration profile |
### Passing data via stdin

Use `--file -` to pipe data directly — no temp file needed:

```bash
echo '[{"question": "What is 2+2?", "answer": "4"}]' | ax datasets create --name "my-dataset" --space-id SPACE_ID --file -

# Or with a heredoc
ax datasets create --name "my-dataset" --space-id SPACE_ID --file - << 'EOF'
[{"question": "What is 2+2?", "answer": "4"}]
EOF
```

To add rows to an existing dataset, use `ax datasets append --json '[...]'` instead — no file needed.
### Supported file formats

| Format | Extension | Notes |
|--------|-----------|-------|
| CSV | `.csv` | Column headers become field names |
| JSON | `.json` | Array of objects |
| JSON Lines | `.jsonl` | One object per line (NOT a JSON array) |
| Parquet | `.parquet` | Column names become field names; preserves types |

**Format gotchas:**

- CSV: Loses type information — dates become strings, `null` becomes an empty string. Use JSON/Parquet to preserve types.
- JSONL: Each line is a separate JSON object. A JSON array (`[{...}, {...}]`) in a `.jsonl` file will fail — use the `.json` extension instead.
- Parquet: Preserves column types. Requires pandas/pyarrow to read locally: `pd.read_parquet("examples.parquet")`.
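The JSONL gotcha above is easy to hit when you already have data as a JSON array. A stdlib sketch of converting a JSON array into valid JSONL (one compact object per line, no enclosing brackets); the sample data is illustrative:

```python
import json

# A JSON array, as you would find in a .json file
array_text = '[{"question": "What is 2+2?", "answer": "4"}, {"question": "What is 3+3?", "answer": "6"}]'

records = json.loads(array_text)

# JSONL = one JSON object per line, no enclosing [ ]
jsonl_text = "\n".join(json.dumps(r) for r in records)
print(jsonl_text)
```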
## Append Examples: `ax datasets append`

Add examples to an existing dataset. Two input modes -- use whichever fits.

### Inline JSON (agent-friendly)

Generate the payload directly -- no temp files needed:

```bash
ax datasets append DATASET_ID --json '[{"question": "What is 2+2?", "answer": "4"}]'

ax datasets append DATASET_ID --json '[
  {"question": "What is gravity?", "answer": "A fundamental force..."},
  {"question": "What is light?", "answer": "Electromagnetic radiation..."}
]'
```

### From a file

```bash
ax datasets append DATASET_ID --file new_examples.csv
ax datasets append DATASET_ID --file additions.json
```

### To a specific version

```bash
ax datasets append DATASET_ID --json '[{"q": "..."}]' --version-id VERSION_ID
```

### Flags

| Flag | Type | Required | Description |
|------|------|----------|-------------|
| `DATASET_ID` | string | yes | Positional argument |
| `--json` | string | mutex | JSON array of example objects |
| `--file, -f` | path | mutex | Data file (CSV, JSON, JSONL, Parquet) |
| `--version-id` | string | no | Append to a specific version (default: latest) |
| `-o, --output` | string | no | Output format for the returned dataset metadata |
| `-p, --profile` | string | no | Configuration profile |
Exactly one of --json or --file is required.
### Validation

- Each example must be a JSON object with at least one user-defined field
- Maximum 100,000 examples per request
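These rules, plus the system-field restriction from the Concepts section, can be checked client-side before calling the CLI. A minimal pre-flight sketch (the limits mirror the ones documented above; the server remains the authority):

```python
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}  # server-managed, forbidden in payloads
MAX_EXAMPLES = 100_000

def validate_append_payload(examples: list) -> None:
    """Raise ValueError if the payload would be rejected by the documented rules."""
    if not examples:
        raise ValueError("Examples array is empty")
    if len(examples) > MAX_EXAMPLES:
        raise ValueError(f"Maximum {MAX_EXAMPLES} examples per request")
    for i, ex in enumerate(examples):
        if not isinstance(ex, dict):
            raise ValueError(f"Element {i} is not a JSON object")
        if not set(ex) - SYSTEM_FIELDS:
            raise ValueError(f"Element {i} has no user-defined fields")
        forbidden = set(ex) & SYSTEM_FIELDS
        if forbidden:
            raise ValueError(f"Element {i} contains server-managed fields: {sorted(forbidden)}")

validate_append_payload([{"question": "What is 2+2?", "answer": "4"}])  # passes silently
```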
**Schema validation before append:** If the dataset already has examples, inspect its schema before appending to avoid silent field mismatches:

```bash
# Check existing field names in the dataset
ax datasets export DATASET_ID --stdout | jq '.[0] | keys'

# Verify your new data has matching field names
echo '[{"question": "..."}]' | jq '.[0] | keys'

# Both outputs should show the same user-defined fields
```

Fields are free-form: extra fields in new examples are added, and missing fields become null. However, typos in field names (e.g., `queston` vs `question`) create new columns silently -- verify spelling before appending.
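The same check can be done in Python before an append. A sketch that diffs field names between existing examples and a new batch, so a typo like `queston` surfaces instead of silently becoming a new column:

```python
SYSTEM_FIELDS = {"id", "created_at", "updated_at"}

def field_diff(existing: list, new_batch: list) -> set:
    """Return user-field names in the new batch that the dataset has never seen."""
    existing_fields = {k for ex in existing for k in ex} - SYSTEM_FIELDS
    new_fields = {k for ex in new_batch for k in ex}
    return new_fields - existing_fields

existing = [{"id": "ex_001", "question": "What is 2+2?", "answer": "4"}]
new_batch = [{"queston": "What is gravity?", "answer": "A force"}]  # note the typo

print(field_diff(existing, new_batch))  # {'queston'} -> would become a new column
```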
## Delete Dataset: `ax datasets delete`

```bash
ax datasets delete DATASET_ID
ax datasets delete DATASET_ID --force   # skip confirmation prompt
```

### Flags

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `DATASET_ID` | string | required | Positional argument |
| `--force, -f` | bool | false | Skip confirmation prompt |
| `-p, --profile` | string | default | Configuration profile |
## Workflows

### Find a dataset by name

Users often refer to datasets by name rather than ID. Resolve a name to an ID before running other commands:

```bash
# Find dataset ID by name
ax datasets list -o json | jq '.[] | select(.name == "eval-set-v1") | .id'

# If the list is paginated, fetch more
ax datasets list -o json --limit 100 | jq '.[] | select(.name | test("eval-set")) | {id, name}'
```
### Create a dataset from file for evaluation

1. Prepare a CSV/JSON/Parquet file with your evaluation columns (e.g., `input`, `expected_output`). If generating data inline, pipe it via stdin using `--file -` (see the Create Dataset section)
2. `ax datasets create --name "eval-set-v1" --space-id SPACE_ID --file eval_data.csv`
3. Verify: `ax datasets get DATASET_ID`
4. Use the dataset ID to run experiments
### Add examples to an existing dataset

```bash
# Find the dataset
ax datasets list

# Append inline or from a file (see Append Examples section for full syntax)
ax datasets append DATASET_ID --json '[{"question": "...", "answer": "..."}]'
ax datasets append DATASET_ID --file additional_examples.csv
```
### Download dataset for offline analysis

1. `ax datasets list` -- find the dataset
2. `ax datasets export DATASET_ID` -- download to file
3. Parse the JSON: `jq '.[] | .question' dataset_*/examples.json`
### Export a specific version

```bash
# List versions
ax datasets get DATASET_ID -o json | jq '.versions'

# Export that version
ax datasets export DATASET_ID --version-id VERSION_ID
```
### Iterate on a dataset

1. Export the current version: `ax datasets export DATASET_ID`
2. Modify the examples locally
3. Append new rows: `ax datasets append DATASET_ID --file new_rows.csv`
4. Or create a fresh version: `ax datasets create --name "eval-set-v2" --space-id SPACE_ID --file updated_data.json`
### Pipe export to other tools

```bash
# Count examples
ax datasets export DATASET_ID --stdout | jq 'length'

# Extract a single field
ax datasets export DATASET_ID --stdout | jq '.[].question'

# Convert to CSV with jq
ax datasets export DATASET_ID --stdout | jq -r '.[] | [.question, .answer] | @csv'
```
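If jq is unavailable, the CSV conversion above can be done with the Python stdlib. A sketch assuming the export has already been parsed into a list of dicts (sample rows are illustrative; system fields omitted for brevity):

```python
import csv
import io

# Examples as exported and parsed from JSON
examples = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "What is 3+3?", "answer": "6"},
]

# Mirrors `jq -r '.[] | [.question, .answer] | @csv'`
buf = io.StringIO()
writer = csv.writer(buf)
for ex in examples:
    writer.writerow([ex["question"], ex["answer"]])

print(buf.getvalue().strip())
```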
## Dataset Example Schema

Examples are free-form JSON objects. There is no fixed schema -- columns are whatever fields you provide. System-managed fields are added by the server:

| Field | Type | Managed by | Notes |
|-------|------|------------|-------|
| `id` | string | server | Auto-generated UUID. Required on update, forbidden on create/append |
| `created_at` | datetime | server | Immutable creation timestamp |
| `updated_at` | datetime | server | Auto-updated on modification |
| (any user field) | any JSON type | user | String, number, boolean, null, nested object, array |
## Related Skills

- `arize-trace`: Export production spans to understand what data to put in datasets
- `arize-experiment`: Run evaluations against this dataset (the usual next step)
- `arize-prompt-optimization`: Use dataset + experiment results to improve prompts
## Troubleshooting

| Problem | Solution |
|---------|----------|
| `ax: command not found` | See references/ax-setup.md |
| `401 Unauthorized` | API key is wrong, expired, or doesn't have access to this space. Fix the profile using references/ax-profiles.md |
| No profile found | No profile is configured. See references/ax-profiles.md to create one |
| Dataset not found | Verify the dataset ID with `ax datasets list` |
| File format error | Supported: CSV, JSON, JSONL, Parquet. Use `--file -` to read from stdin |
| `platform-managed column` | Remove `id`, `created_at`, `updated_at` from create/append payloads |
| `reserved column` | Remove `time`, `count`, or any `source_record_*` field |
| `Provide either --json or --file` | Append requires exactly one input source |
| `Examples array is empty` | Ensure your JSON array or file contains at least one example |
| `not a JSON object` | Each element in the `--json` array must be a `{...}` object, not a string or number |
## Save Credentials for Future Use

See references/ax-profiles.md § Save Credentials for Future Use.