Hand-written schema vs. automatic generation

Name: apify-generate-output-schema AI Agent Skill
Author: apify

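Before: Manually read the Actor source code, analyze the output data structure, write the JSON schema files, repeatedly debug field types and descriptions, and check the display after deploying to Apify.

After: Automatically scan the Actor code, parse the output data structure, generate spec-compliant schema files, infer field types and descriptions, and apply the result to Apify Console in one step.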

apify-generate-output-schema

Generate Actor Output Schema

You are generating output schema files for an Apify Actor. The output schema tells Apify Console how to display run results. You will analyze the Actor's source code, create dataset_schema.json, output_schema.json, and key_value_store_schema.json (if the Actor uses key-value store), and update actor.json.

Core Principles

Analyze code first: Read the Actor's source to understand what data it actually pushes to the dataset — never guess
Every field is nullable: APIs and websites are unpredictable — always set "nullable": true
Anonymize examples: Never use real user IDs, usernames, or personal data in examples
Verify against code: If TypeScript types exist, cross-check the schema against both the type definition AND the code that produces the values
Reuse existing patterns: Before generating schemas, check if other Actors in the same repository already have output schemas — match their structure, naming conventions, description style, and formatting
Don't reinvent the wheel: Reuse existing type definitions, interfaces, and utilities from the codebase instead of creating duplicate definitions

Phase 1: Discover Actor Structure

Goal: Locate the Actor and understand its output

Initial request: $ARGUMENTS

Actions:

1. Create todo list with all phases
2. Find the .actor/ directory containing actor.json
3. Read actor.json to understand the Actor's configuration
4. Check if dataset_schema.json, output_schema.json, and key_value_store_schema.json already exist
5. Search for existing schemas in the repository: Look for other .actor/ directories or schema files (e.g., **/dataset_schema.json, **/output_schema.json, **/key_value_store_schema.json) to learn the repo's conventions, then match their description style, field naming, example formatting, and overall structure
6. Find all places where data is pushed to the dataset:
   - JavaScript/TypeScript: Search for Actor.pushData(, dataset.pushData(, Dataset.pushData(
   - Python: Search for Actor.push_data(, dataset.push_data(, Dataset.push_data(
7. Find all places where data is stored in the key-value store:
   - JavaScript/TypeScript: Search for Actor.setValue(, keyValueStore.setValue(, KeyValueStore.setValue(
   - Python: Search for Actor.set_value(, key_value_store.set_value(, KeyValueStore.set_value(
8. Find output type definitions and reuse them directly instead of recreating from scratch:
   - TypeScript: Look for output type interfaces/types (e.g., in src/types/, src/types/output.ts). If an interface or type already defines the output shape, derive the schema fields from it; do not create a parallel definition
   - Python: Look for TypedDict, dataclass, or Pydantic model definitions. Use the existing field names, types, and docstrings as the source of truth
9. Check for existing shared schema utilities or helper functions in the codebase that handle schema generation or validation, and reuse them rather than creating new logic
10. If inline storages.dataset or storages.keyValueStore config exists in actor.json, note it for migration

Present findings to user: list all discovered dataset output fields, key-value store keys, their types, and where they come from.
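The search in steps 6 and 7 can be approximated with a plain substring scan. The sketch below is only an illustration: the call names come from the list above, while the file suffixes, the node_modules exclusion, and the function names scan_text and scan_tree are assumptions made for this example.

```python
from pathlib import Path

# Call names come from the search list above; the suffix set and the
# node_modules exclusion are assumptions made for this sketch.
DATASET_CALLS = (
    "Actor.pushData(", "dataset.pushData(", "Dataset.pushData(",
    "Actor.push_data(", "dataset.push_data(", "Dataset.push_data(",
)
KEY_VALUE_CALLS = (
    "Actor.setValue(", "keyValueStore.setValue(", "KeyValueStore.setValue(",
    "Actor.set_value(", "key_value_store.set_value(", "KeyValueStore.set_value(",
)
SOURCE_SUFFIXES = {".js", ".mjs", ".ts", ".py"}


def scan_text(text, needles):
    """Return (line_number, stripped_line) for each line containing any needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if any(needle in line for needle in needles)
    ]


def scan_tree(root, needles):
    """Walk a source tree and return (path, line_number, line) for every hit."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in SOURCE_SUFFIXES or "node_modules" in path.parts:
            continue
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits.extend((str(path), number, line) for number, line in scan_text(text, needles))
    return hits
```

Calling scan_tree(".", DATASET_CALLS) from the repository root lists candidate call sites; each hit still has to be read in context to see which fields are actually pushed.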

Phase 2: Generate `dataset_schema.json`

Goal: Create a complete dataset schema with field definitions and display views

File structure

{
    "actorSpecification": 1,
    "fields": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            // ALL output fields here — every field the Actor can produce,
            // not just the ones shown in the overview view
        },
        "required": [],
        "additionalProperties": true
    },
    "views": {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {
                "fields": [
                    // 8-12 most important field names
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    // Display config for each overview field
                }
            }
        }
    }
}

Consistency with existing schemas

If existing output schemas were found in the repository during Phase 1 (step 5), follow their conventions:

Match the description writing style (sentence case vs. lowercase, period vs. no period, etc.)
Match the field naming convention (camelCase vs. snake_case) — this must also match the actual keys produced by the Actor code
Match the example value style (e.g., date formats, URL patterns, placeholder names)
Match the view structure (number of fields in overview, display format choices)
Match the JSON formatting (indentation, property ordering, spacing) — all schemas in the same repository must use identical formatting, including standalone Actors

When the Actor code already has well-defined TypeScript interfaces or Python type classes, derive fields directly from those types rather than re-analyzing pushData/push_data calls from scratch. The type definition is the canonical source.
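For a Python Actor, deriving fields from an existing type might look like the following sketch. It assumes the output shape is a TypedDict; ExampleItem, json_type, strip_none, and fields_from_typed_dict are hypothetical names invented here, and the mapping covers only the basic types shown in this document. Descriptions and examples still have to be written by hand, and nested TypedDicts come out as a bare "object" without properties.

```python
import types
from typing import Optional, TypedDict, Union, get_args, get_origin, get_type_hints

PRIMITIVES = {str: "string", int: "number", float: "number", bool: "boolean"}
# `X | Y` annotations report types.UnionType on Python 3.10+.
UNION_ORIGINS = {Union, getattr(types, "UnionType", Union)}


def strip_none(annotation):
    """Unwrap Optional[X] to X; nullability is expressed by "nullable": true."""
    if get_origin(annotation) in UNION_ORIGINS:
        members = [m for m in get_args(annotation) if m is not type(None)]
        if len(members) == 1:
            return members[0]
    return annotation


def json_type(annotation):
    """Map a Python annotation to a JSON Schema type name (or list of names)."""
    origin = get_origin(annotation)
    if origin in UNION_ORIGINS:
        names = []
        for member in get_args(annotation):
            if member is type(None):
                continue
            name = json_type(member)
            if name not in names:
                names.append(name)
        return names[0] if len(names) == 1 else names
    if annotation in PRIMITIVES:
        return PRIMITIVES[annotation]
    if annotation in (list, tuple, set) or origin in (list, tuple, set):
        return "array"
    return "object"  # dicts, nested TypedDicts, and anything unrecognized


def fields_from_typed_dict(typed_dict):
    """Build skeleton field definitions (type + nullable) from a TypedDict."""
    fields = {}
    for name, annotation in get_type_hints(typed_dict).items():
        annotation = strip_none(annotation)
        field = {"type": json_type(annotation), "nullable": True}
        args = get_args(annotation)
        if field["type"] == "array" and args:
            field["items"] = {"type": json_type(args[0])}
        fields[name] = field
    return fields


class ExampleItem(TypedDict):  # hypothetical output type, for illustration only
    title: str
    viewCount: int
    isVerified: bool
    hashtags: Optional[list[str]]
    metadata: Union[dict, str]
```

The result of fields_from_typed_dict(ExampleItem) is a starting point for fields.properties; it should still be cross-checked against the code that produces the values.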

Hard rules (no exceptions)

| Rule | Detail |
| --- | --- |
| All fields in properties | The fields.properties object must contain every field the Actor can output, not just the fields shown in the overview view. The views section selects a subset for display; the properties section must be the complete superset |
| "nullable": true | On every field, because APIs are unpredictable |
| "additionalProperties": true | On the top-level fields object AND on every nested object within properties. This is the most commonly missed rule; it must appear at both levels |
| "required": [] | Always an empty array, on the top-level fields object AND on every nested object within properties |
| Anonymized examples | No real user IDs, usernames, or content |
| "type" required with "nullable" | AJV rejects nullable without a type on the same field |

Warning — most common mistakes:

Only including fields that appear in the overview view. The fields.properties must list ALL output fields, even if they are not in the views section.
Only adding "required": [] and "additionalProperties": true on nested object-type properties but forgetting them on the top-level fields object. Both levels need them.

Note: nullable is an Apify-specific extension to JSON Schema draft-07. It is intentional and correct.
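The structural rules above lend themselves to a mechanical check. The following is a minimal sketch, assuming the schema has already been loaded into a dict (for example with json.load); check_hard_rules is a hypothetical helper, not part of any Apify tooling. It only descends into fields whose type is exactly "object", so union types and array items are not inspected.

```python
def check_hard_rules(schema):
    """Return a list of violations of the structural hard rules; empty means OK."""
    problems = []

    def check_object(node, path):
        # Applies to the top-level fields object and to every nested object.
        if node.get("required") != []:
            problems.append(f'{path}: "required" must be []')
        if node.get("additionalProperties") is not True:
            problems.append(f'{path}: "additionalProperties" must be true')
        for name, field in node.get("properties", {}).items():
            check_field(field, f"{path}.properties.{name}")

    def check_field(field, path):
        if "type" not in field:
            problems.append(f'{path}: "type" is missing')
        if field.get("nullable") is not True:
            problems.append(f'{path}: "nullable" must be true')
        if field.get("type") == "object":
            check_object(field, path)

    check_object(schema.get("fields", {}), "fields")
    return problems
```

An empty list does not prove the schema is complete: whether every output field is listed, and whether examples are anonymized, still has to be verified against the Actor's code.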

Field type patterns

String field:

"title": {
    "type": "string",
    "description": "Title of the scraped item",
    "nullable": true,
    "example": "Example Item Title"
}

Number field:

"viewCount": {
    "type": "number",
    "description": "Number of views",
    "nullable": true,
    "example": 15000
}

Boolean field:

"isVerified": {
    "type": "boolean",
    "description": "Whether the account is verified",
    "nullable": true,
    "example": true
}

Array field:

"hashtags": {
    "type": "array",
    "description": "Hashtags associated with the item",
    "items": { "type": "string" },
    "nullable": true,
    "example": ["#example", "#demo"]
}

Nested object field:

"authorInfo": {
    "type": "object",
    "description": "Information about the author",
    "properties": {
        "name": { "type": "string", "nullable": true },
        "url": { "type": "string", "nullable": true }
    },
    "required": [],
    "additionalProperties": true,
    "nullable": true,
    "example": { "name": "Example Author", "url": "https://example.com/author" }
}

Enum field:

"contentType": {
    "type": "string",
    "description": "Type of content",
    "enum": ["article", "video", "image"],
    "nullable": true,
    "example": "article"
}

Union type (e.g., TypeScript ObjectType | string):

"metadata": {
    "type": ["object", "string"],
    "description": "Structured metadata object, or error string if unavailable",
    "nullable": true,
    "example": { "key": "value" }
}

Anonymized example values

Use realistic but generic values. Follow platform ID format conventions:

| Field type | Example approach |
| --- | --- |
| IDs | Match platform format and length (e.g., 11 chars for YouTube video IDs) |
| Usernames | "exampleuser", "sampleuser123" |
| Display names | "Example Channel", "Sample Author" |
| URLs | Use platform's standard URL format with fake IDs |
| Dates | "2025-01-15T12:00:00.000Z" (ISO 8601) |
| Text content | Generic descriptive text, e.g., "This is an example description." |
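Two of these value styles are easy to produce programmatically. The sketch below is illustrative only: placeholder_id and iso_example are hypothetical helpers, and the character pattern is an arbitrary, obviously synthetic choice rather than any platform's real ID alphabet.

```python
from datetime import datetime, timezone

# Hypothetical, obviously synthetic character run used to fill placeholder IDs.
PLACEHOLDER_PATTERN = "Ab1Cd2Ef3Gh4"


def placeholder_id(length):
    """Return a fake ID of the requested length by repeating a fixed pattern."""
    repeats = length // len(PLACEHOLDER_PATTERN) + 1
    return (PLACEHOLDER_PATTERN * repeats)[:length]


def iso_example(moment):
    """Format a timezone-aware datetime as ISO 8601 UTC with milliseconds and a Z suffix."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
```

For instance, placeholder_id(11) yields an 11-character string suitable for the YouTube-style case in the table, and iso_example expects a datetime that carries tzinfo.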

Views section

transformation.fields: List 8–12 most important field names (order = column order in UI)
display.properties: One entry per overview field with label and format
Available formats: "text", "number", "date", "link", "boolean", "image", "array", "object"

Pick fields that give users the most useful at-a-glance summary of the data.
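As a rough illustration, the views section can be assembled from the chosen field names and the already-written fields.properties. The helper names (to_label, build_views) and the type-to-format mapping are assumptions of this sketch; formats such as "date", "link", and "image" cannot be inferred from the JSON type and need to be set by hand afterwards.

```python
import re

# Assumed default mapping from JSON Schema type to display format.
FORMAT_BY_TYPE = {
    "string": "text",
    "number": "number",
    "boolean": "boolean",
    "array": "array",
    "object": "object",
}


def to_label(name):
    """Turn camelCase or snake_case into a sentence-case label."""
    words = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).replace("_", " ")
    return words.capitalize()


def build_views(properties, field_names):
    """Build the "views" object; list order becomes column order in the UI."""
    display = {}
    for name in field_names:
        field_type = properties[name].get("type")
        # Union types (lists) and unknown types fall back to plain text.
        fmt = FORMAT_BY_TYPE.get(field_type, "text") if isinstance(field_type, str) else "text"
        display[name] = {"label": to_label(name), "format": fmt}
    return {
        "overview": {
            "title": "Overview",
            "description": "Most important fields at a glance",
            "transformation": {"fields": list(field_names)},
            "display": {"component": "table", "properties": display},
        }
    }
```

Looking fields up in properties also catches a view that references a field missing from the schema, since that raises a KeyError.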

Phase 3: Generate `key_value_store_schema.json` (if applicable)

Goal: Define key-value store collections if the Actor stores data in the key-value store

Skip this phase if no Actor.setValue() / Actor.set_value() calls were found in Phase 1 (beyond the default INPUT key).

File structure

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "<Descriptive title — what the key-value store contains>",
    "description": "<One sentence describing the stored data>",
    "collections": {
        "<collectionName>": {
            "title": "<Human-readable title>",
            "description": "<What this collection contains>",
            "keyPrefix": "<prefix->"
        }
    }
}

How to identify collections

Group the discovered setValue / set_value calls by key pattern:

Fixed keys (e.g., "RESULTS", "summary") — use "key" (exact match)
Dynamic keys with a prefix (e.g., "screenshot-${id}", f"image-{name}") — use "keyPrefix"

Each group becomes a collection.
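The grouping can be sketched as follows, assuming the key expressions have already been extracted as strings from the setValue / set_value calls. The helper names and the way collection names and titles are derived from the key are inventions of this example; descriptions and contentTypes still need to be added manually.

```python
import re

# Matches the start of a JS template placeholder (${...}) or a Python f-string one ({...}).
PLACEHOLDER_START = re.compile(r"\$?\{")


def matcher_for(key_pattern):
    """Return {"key": ...} for a fixed key or {"keyPrefix": ...} for a dynamic one."""
    match = PLACEHOLDER_START.search(key_pattern)
    if match is None:
        return {"key": key_pattern}
    prefix = key_pattern[: match.start()]
    if not prefix:
        raise ValueError(f"{key_pattern!r} has no fixed prefix to group by")
    return {"keyPrefix": prefix}


def build_collections(key_patterns):
    """Group key patterns into collections; identical matchers collapse into one entry."""
    collections = {}
    for pattern in key_patterns:
        matcher = matcher_for(pattern)
        raw = next(iter(matcher.values()))
        name = re.sub(r"[^0-9A-Za-z]+", "_", raw).strip("_").lower()
        collections[name] = {"title": name.replace("_", " ").capitalize(), **matcher}
    return collections
```

A key that is entirely dynamic has no prefix to match on, so the sketch raises instead of guessing.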

Collection properties

| Property | Required | Description |
| --- | --- | --- |
| title | Yes | Shown in UI tabs |
| description | No | Shown in UI tooltips |
| key | Conditional | Exact key for single-key collections (use key OR keyPrefix, not both) |
| keyPrefix | Conditional | Prefix for multi-key collections (use key OR keyPrefix, not both) |
| contentTypes | No | Restrict allowed MIME types (e.g., ["image/jpeg"], ["application/json"]) |
| jsonSchema | No | JSON Schema draft-07 for validating application/json content |

Examples

Single file output (e.g., a report):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Analysis Results",
    "description": "Key-value store containing analysis output",
    "collections": {
        "report": {
            "title": "Report",
            "description": "Final analysis report",
            "key": "REPORT",
            "contentTypes": ["application/json"]
        }
    }
}

Multiple files with prefix (e.g., screenshots):

{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Scraped Files",
    "description": "Key-value store containing downloaded files and screenshots",
    "collections": {
        "screenshots": {
            "title": "Screenshots",
            "description": "Page screenshots captured during scraping",
            "keyPrefix": "screenshot-",
            "contentTypes": ["image/png", "image/jpeg"]
        },
        "documents": {
            "title": "Documents",
            "description": "Downloaded document files",
            "keyPrefix": "doc-",
            "contentTypes": ["application/pdf", "text/html"]
        }
    }
}

Phase 4: Generate `output_schema.json`

Goal: Create the output schema that tells Apify Console where to find results

For most Actors that push data to a dataset, this is a minimal file:

{
    "actorOutputSchemaVersion": 1,
    "title": "<Descriptive title — what the Actor returns>",
    "description": "<One sentence describing the output data>",
    "properties": {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}

Critical: Each property entry must include "type": "string" — this is an Apify-specific convention. The Apify meta-validator rejects properties without it (and rejects "type": "object" — only "string" is valid here).

If key_value_store_schema.json was generated in Phase 3, add a second property:

"files": {
    "type": "string",
    "title": "Files",
    "description": "Key-value store containing downloaded files",
    "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
}
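Putting the two property entries together, the file content can be generated with a small helper. This is a sketch under the conventions stated in this phase; build_output_schema and non_string_properties are hypothetical names, and the title and description arguments must be supplied by whoever knows what the Actor returns.

```python
def build_output_schema(title, description, has_key_value_store=False):
    """Return the output_schema.json content as a dict."""
    properties = {
        "dataset": {
            "type": "string",
            "title": "Results",
            "description": "Dataset containing all scraped data",
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if has_key_value_store:
        # Only added when key_value_store_schema.json was generated in Phase 3.
        properties["files"] = {
            "type": "string",
            "title": "Files",
            "description": "Key-value store containing downloaded files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "description": description,
        "properties": properties,
    }


def non_string_properties(output_schema):
    """List property names whose "type" is not "string"."""
    return [
        name
        for name, prop in output_schema["properties"].items()
        if prop.get("type") != "string"
    ]
```

Serializing the result with json.dumps(schema, indent=4) matches the four-space indentation used in the examples here; if the repository uses a different formatting, follow that instead.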

Available template variables

{{links.apiDefaultDatasetUrl}} — API URL of default dataset
{{links.apiDefaultKeyValueStoreUrl}} — API URL of default key-value store
{{links.publicRunUrl}} — Public run URL
{{links.consoleRunUrl}} — Console run URL
{{links.apiRunUrl}} — API run URL
{{links.containerRunUrl}} — URL of webserver running inside the run
{{run.defaultDatasetId}} — ID of the default dataset
{{run.defaultKeyValueStoreId}} — ID of the default key-value store

Phase 5: Update `actor.json`

Goal: Wire the schema files into the Actor configuration

Actions:

Read the current actor.json
Add or update the storages.dataset reference:

"storages": {
    "dataset": "./dataset_schema.json"
}

If key_value_store_schema.json was generated, add the reference:

"storages": {
    "dataset": "./dataset_schema.json",
    "keyValueStore": "./key_value_store_schema.json"
}

Add or update the output reference:

"output": "./output_schema.json"

If actor.json had inline storages.dataset or storages.keyValueStore objects (not string paths), migrate their content into the respective schema files and replace the inline objects with file path strings
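The wiring and migration steps can be expressed on the parsed actor.json as in the sketch below. wire_schemas is a hypothetical helper, and overwriting any existing string reference with the standard file names is an assumption of this example. It returns the updated config together with any inline objects that need to be written out to their schema files.

```python
import copy


def wire_schemas(actor_config, has_key_value_store=False):
    """Return (updated_config, migrated) without mutating the input dict.

    migrated maps a schema file path to the inline object that used to live
    in actor.json and now has to be written to that file.
    """
    config = copy.deepcopy(actor_config)
    storages = config.setdefault("storages", {})

    targets = {"dataset": "./dataset_schema.json"}
    if has_key_value_store or "keyValueStore" in storages:
        targets["keyValueStore"] = "./key_value_store_schema.json"

    migrated = {}
    for key, path in targets.items():
        if isinstance(storages.get(key), dict):
            migrated[path] = storages[key]  # inline object: move it to the file
        storages[key] = path

    config["output"] = "./output_schema.json"
    return config, migrated
```

The caller is expected to merge each migrated object into the corresponding generated schema file before saving actor.json, so that no inline configuration is lost.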

Phase 6: Review and Validate

Goal: Ensure correctness and completeness

Checklist:

Every output field from the source code is in dataset_schema.json fields.properties — not just the overview view fields but ALL fields the Actor can produce
Every field has "nullable": true
The top-level fields object has both "additionalProperties": true and "required": []
Every nested object within properties also has "additionalProperties": true and "required": []
Every field has a "description" and an "example"
All example values are anonymized
"type" is present on every field that has "nullable"
Views list 8–12 most useful fields with correct display formats
output_schema.json has "type": "string" on every property
If k

...
