Workflow Schema Reference

Complete YAML schema reference for Ploston workflows.

Canonical Example

This example shows every field. Copy it as a starting point for your workflows.

# ─────────────────────────────────────────────────────────────────
# METADATA (required)
# ─────────────────────────────────────────────────────────────────
name: data-pipeline                    # Required: Workflow identifier
version: "1.0.0"                       # Required: Semantic version
description: "Fetch, transform, and validate data"  # Optional

# ─────────────────────────────────────────────────────────────────
# PACKAGES (optional)
# ─────────────────────────────────────────────────────────────────
packages:
  profile: standard                    # minimal | standard | data_science
  additional:                          # Extra packages to allow
    - requests

# ─────────────────────────────────────────────────────────────────
# DEFAULTS (optional)
# ─────────────────────────────────────────────────────────────────
defaults:
  timeout: 30                          # Default step timeout (seconds)
  on_error: fail                       # fail | continue | retry
  retry:                               # Retry config (when on_error: retry)
    max_attempts: 3
    initial_delay: 1.0
    max_delay: 30.0
    backoff_multiplier: 2.0

# ─────────────────────────────────────────────────────────────────
# INPUTS (optional, but usually needed)
# Format: Array of input definitions
# ─────────────────────────────────────────────────────────────────
inputs:
  # Simple syntax: just the name (required, type: string)
  - url

  # With default value (makes it optional)
  - format: "json"

  # Full definition with all options
  - count:
      type: integer                    # string | integer | number | boolean | array | object
      required: false                  # Default: true
      default: 10                      # Default value
      description: "Number of items"   # For documentation
      minimum: 1                       # Validation: minimum value
      maximum: 100                     # Validation: maximum value

  # Enum constraint
  - output_format:
      type: string
      enum: ["json", "csv", "xml"]     # Allowed values
      default: "json"

  # Pattern constraint
  - email:
      type: string
      pattern: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"

# ─────────────────────────────────────────────────────────────────
# STEPS (required, at least one)
# ─────────────────────────────────────────────────────────────────
steps:
  # Tool step: calls an MCP tool
  - id: fetch                          # Required: unique step identifier
    tool: http_get                     # MCP tool name
    params:                            # Tool parameters (templates allowed)
      url: "{{ inputs.url }}"
      headers:
        Accept: "application/json"
    timeout: 60                        # Override default timeout
    on_error: retry                    # Override default error handling
    retry:
      max_attempts: 3
      initial_delay: 2.0

  # Code step: runs Python in sandbox
  - id: transform
    code: |
      import json

      # Access previous step output
      data = context.steps['fetch'].output

      # Access inputs
      limit = context.inputs.get('count', 10)

      # Process data
      items = data.get('items', [])[:limit]

      # Return result (available as steps.transform.output)
      return {"items": items, "count": len(items)}

  # Step with dependency
  - id: validate
    depends_on: [transform]            # Wait for these steps first
    code: |
      data = context.steps['transform'].output
      if data['count'] == 0:
          raise ValueError("No items found")
      return {"valid": True, "count": data['count']}

# ─────────────────────────────────────────────────────────────────
# OUTPUTS (optional)
# ─────────────────────────────────────────────────────────────────

# Option 1: Single output (simple)
output: "{{ steps.validate.output }}"

# Option 2: Multiple named outputs (use this OR output, not both)
# outputs:
#   - name: result
#     from_path: steps.validate.output
#     description: "Validation result"
#   - name: item_count
#     value: "{{ steps.transform.output.count }}"
#     description: "Number of items processed"

Top-Level Structure

# Required
name: string          # Workflow identifier (alphanumeric, hyphens)
version: string       # Semantic version (e.g., "1.0", "2.1.3")

# Optional
description: string   # Human-readable description
packages: object      # Python package configuration
defaults: object      # Default step settings

# Schema
inputs: array         # Input definitions (array format)
steps: array          # Step definitions (required, at least one)
outputs: array        # Output definitions (optional)
output: string        # Single output expression (alternative to outputs)

Metadata

`name` (required)

Unique workflow identifier.

Type: string
Pattern: ^[a-zA-Z][a-zA-Z0-9-]*$
Example: data-transform, hello-world
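
The pattern above can be checked before a workflow is submitted. The following is a minimal sketch using Python's standard `re` module; the helper name `is_valid_workflow_name` is hypothetical and not part of Ploston.

```python
import re

# Pattern copied from the `name` field reference above.
NAME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9-]*$")


def is_valid_workflow_name(name):
    """Return True if `name` matches the documented pattern (hypothetical helper)."""
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_workflow_name("data-transform"))  # True
print(is_valid_workflow_name("2fast"))           # False: must start with a letter
print(is_valid_workflow_name("my_workflow"))     # False: underscores are not allowed
```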

`version` (required)

Semantic version string.

Type: string
Example: "1.0", "2.1.3"

`description` (optional)

Human-readable description.

Type: string
Example: "Transform and validate JSON data"

Packages Configuration

packages:
  profile: string     # Package profile: minimal | standard | data_science
  additional: array   # Additional packages to allow

Profiles

Profile	Packages
`minimal`	json, re, datetime, math
`standard`	minimal + collections, itertools, functools, hashlib, uuid
`data_science`	standard + numpy, pandas (if available)
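
Each profile extends the one before it, and `additional` adds to the result. The sketch below illustrates that layering with plain sets. The module lists come from the table above; the helper `allowed_packages` and the way the sets are combined are assumptions, not Ploston's actual resolution logic.

```python
# Module lists copied from the Profiles table above.
MINIMAL = {"json", "re", "datetime", "math"}
STANDARD = MINIMAL | {"collections", "itertools", "functools", "hashlib", "uuid"}
DATA_SCIENCE = STANDARD | {"numpy", "pandas"}

PROFILES = {"minimal": MINIMAL, "standard": STANDARD, "data_science": DATA_SCIENCE}


def allowed_packages(profile, additional=()):
    """Return the set of importable packages (hypothetical helper)."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    # Assumption: `additional` simply extends the profile's set.
    return PROFILES[profile] | set(additional)


print("requests" in allowed_packages("standard", ["requests"]))  # True
print("numpy" in allowed_packages("standard"))                   # False
```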

Defaults

defaults:
  timeout: integer    # Default step timeout (seconds)
  on_error: string    # Error handling: fail | continue | retry
  retry: object       # Retry configuration
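
Steps can override any of these settings, as the `fetch` step in the canonical example does. One plausible way to picture the result is a per-key merge, sketched below. Both the helper `effective_settings` and the key-by-key merge of `retry` are assumptions; this reference does not state whether a step-level `retry` block replaces or extends the default one.

```python
def effective_settings(defaults, step):
    """Combine workflow defaults with one step's overrides (hypothetical helper)."""
    settings = {
        "timeout": step.get("timeout", defaults.get("timeout")),
        "on_error": step.get("on_error", defaults.get("on_error")),
    }
    # Assumption: retry options merge key by key, so a step that sets only
    # `initial_delay` keeps the default `max_delay` and `backoff_multiplier`.
    settings["retry"] = {**defaults.get("retry", {}), **step.get("retry", {})}
    return settings


defaults = {
    "timeout": 30,
    "on_error": "fail",
    "retry": {"max_attempts": 3, "initial_delay": 1.0,
              "max_delay": 30.0, "backoff_multiplier": 2.0},
}
fetch = {"id": "fetch", "timeout": 60, "on_error": "retry",
         "retry": {"max_attempts": 3, "initial_delay": 2.0}}

# The step's timeout, on_error, and initial_delay win; the rest comes from defaults.
print(effective_settings(defaults, fetch))
```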

Retry Configuration

defaults:
  retry:
    max_attempts: 3           # Maximum retry attempts
    initial_delay: 1.0        # Initial delay (seconds)
    max_delay: 30.0           # Maximum delay (seconds)
    backoff_multiplier: 2.0   # Exponential backoff multiplier
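
To see how these four values interact, the sketch below computes the wait before each retry under a capped exponential backoff. The formula and the helper `retry_delays` are assumptions for illustration; in particular, it assumes `max_attempts` counts the first attempt, so there is one fewer wait than attempts.

```python
def retry_delays(max_attempts=3, initial_delay=1.0, max_delay=30.0, backoff_multiplier=2.0):
    """Seconds to wait before each retry, assuming capped exponential backoff."""
    delays = []
    delay = initial_delay
    # The first attempt runs immediately, so there are max_attempts - 1 waits.
    for _ in range(max_attempts - 1):
        delays.append(min(delay, max_delay))
        delay *= backoff_multiplier
    return delays


print(retry_delays())                # [1.0, 2.0]
print(retry_delays(max_attempts=7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```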

Inputs

Format: inputs is an array (list) of input definitions.

Ploston supports three syntaxes for input definitions:

Syntax 1: Simple String (Required Input)

inputs:
  - url                    # Required string input named "url"
  - topic                  # Required string input named "topic"

Syntax 2: Name with Default (Optional Input)

inputs:
  - format: "json"         # Optional, defaults to "json"
  - count: 10              # Optional, defaults to 10

Syntax 3: Full Definition (All Options)

inputs:
  - url:
      type: string         # Optional: string | integer | number | boolean | array | object (default: string)
      required: true       # Optional: default is true
      default: null        # Optional: default value (makes input optional)
      description: "URL"   # Optional: human-readable description
      enum: [...]          # Optional: allowed values
      pattern: "^https?"   # Optional: regex pattern (strings only)
      minimum: 1           # Optional: minimum value (numbers only)
      maximum: 100         # Optional: maximum value (numbers only)
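
The three syntaxes describe the same underlying structure. The sketch below expands each form into one full definition, working on already-parsed YAML (plain Python strings and dicts). The helper `normalize_input` is hypothetical, and two points are assumptions: an explicit `required` takes precedence over the presence of `default`, and no type is set for Syntax 2 because this reference does not say how it is derived.

```python
def normalize_input(entry):
    """Expand one `inputs` entry into a full definition dict (hypothetical helper)."""
    # Syntax 1: bare name -> required string input.
    if isinstance(entry, str):
        return {"name": entry, "type": "string", "required": True}

    if not isinstance(entry, dict) or len(entry) != 1:
        raise ValueError(f"expected a name or a single-key mapping, got {entry!r}")
    name, spec = next(iter(entry.items()))

    # Syntax 3: the value is a mapping of options.
    if isinstance(spec, dict):
        definition = {"name": name, "type": "string", **spec}
        # Assumption: explicit `required` wins; otherwise a default makes it optional.
        definition.setdefault("required", "default" not in spec)
        return definition

    # Syntax 2: the value is the default, so the input is optional.
    return {"name": name, "default": spec, "required": False}


for entry in ["url", {"format": "json"}, {"count": {"type": "integer", "default": 10}}]:
    print(normalize_input(entry))
```

Note that the check for Syntax 3 means a mapping can never be used as a Syntax 2 default in this sketch; an object-valued default would need the full form.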

Input Types

Type	JSON Type	Example	Notes
`string`	string	`"hello"`	Default type if not specified
`integer`	number	`42`	Whole numbers only
`number`	number	`3.14`	Any numeric value
`boolean`	boolean	`true`	true or false
`array`	array	`[1, 2, 3]`	JSON array
`object`	object	`{"key": "value"}`	JSON object

Complete Input Examples

inputs:
  # Simple required inputs
  - url
  - topic

  # With default values
  - format: "json"
  - retries: 3

  # Full definitions
  - count:
      type: integer
      required: false
      default: 10
      description: "Number of items to fetch"
      minimum: 1
      maximum: 100

  - output_format:
      type: string
      enum: ["json", "csv", "xml"]
      default: "json"
      description: "Output format"

  - email:
      type: string
      required: true
      description: "Contact email"
      pattern: "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
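
The constraints in these definitions (`type`, `enum`, `pattern`, `minimum`, `maximum`) can be read as a sequence of checks on each supplied value. The following sketch shows one way to apply them; `validate_input` is a hypothetical helper, and the use of `re.search` (an unanchored match) is an assumption about how `pattern` is interpreted.

```python
import re

# Python types accepted for each documented input type.
TYPE_CHECKS = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_input(definition, value):
    """Return a list of problems; an empty list means the value passed."""
    kind = definition.get("type", "string")
    # bool is a subclass of int in Python, so reject it for non-boolean types.
    wrong_bool = isinstance(value, bool) and kind != "boolean"
    if wrong_bool or not isinstance(value, TYPE_CHECKS[kind]):
        return [f"expected {kind}, got {type(value).__name__}"]

    errors = []
    if "enum" in definition and value not in definition["enum"]:
        errors.append(f"{value!r} is not one of {definition['enum']}")
    # `pattern` applies to strings only.
    if kind == "string" and "pattern" in definition:
        if not re.search(definition["pattern"], value):
            errors.append(f"{value!r} does not match {definition['pattern']}")
    # `minimum` and `maximum` apply to numbers only.
    if kind in ("integer", "number"):
        if "minimum" in definition and value < definition["minimum"]:
            errors.append(f"{value} is below the minimum {definition['minimum']}")
        if "maximum" in definition and value > definition["maximum"]:
            errors.append(f"{value} is above the maximum {definition['maximum']}")
    return errors


count = {"type": "integer", "minimum": 1, "maximum": 100}
print(validate_input(count, 10))   # []
print(validate_input(count, 0))    # ['0 is below the minimum 1']
```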

Required vs Optional

Condition	Required?
Simple string syntax (`- url`)	✅ Required
Has `default` value	❌ Optional
`required: true` (explicit)	✅ Required
`required: false` (explicit)	❌ Optional

Steps

steps:
  - id: string          # Step identifier (required)

    # Type (exactly one required)
    tool: string        # MCP tool name
    code: string        # Python code block

    # Tool parameters (tool steps only)
    params: object      # Tool parameters

    # Dependencies
    depends_on: array   # List of step IDs to wait for

    # Error handling
    timeout: integer    # Step timeout (seconds)
    on_error: string    # Error handling: fail | continue | retry
    retry: object       # Retry configuration
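
The structural rules in this schema (a unique `id`, exactly one of `tool` or `code`, and `params` on tool steps only) can be checked mechanically. Below is a minimal sketch over already-parsed steps; `check_steps` is a hypothetical helper and the messages are illustrative, not Ploston's own error output.

```python
def check_steps(steps):
    """Return structural problems found in a list of step dicts."""
    problems = []
    seen = set()
    for index, step in enumerate(steps):
        step_id = step.get("id")
        label = step_id or f"step #{index}"
        if not step_id:
            problems.append(f"{label}: missing id")
        elif step_id in seen:
            problems.append(f"{label}: duplicate id")
        seen.add(step_id)
        # Exactly one of `tool` / `code` selects the step type.
        kinds = [key for key in ("tool", "code") if key in step]
        if len(kinds) != 1:
            problems.append(f"{label}: needs exactly one of tool or code")
        # `params` is documented for tool steps only.
        if "params" in step and "tool" not in step:
            problems.append(f"{label}: params is only valid on tool steps")
    return problems


steps = [
    {"id": "fetch", "tool": "http_get", "params": {"url": "{{ inputs.url }}"}},
    {"id": "fetch", "code": "return 1", "params": {}},
]
for problem in check_steps(steps):
    print(problem)
```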

Tool Step

steps:
  - id: fetch
    tool: http_get
    params:
      url: "{{ inputs.url }}"
      headers:
        Authorization: "Bearer {{ inputs.token }}"

Code Step

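steps:
  - id: process
    code: |
      import json

      # Read the input through context instead of templating it into the source
      data = json.loads(context.inputs['data'])

      # The returned value is available as steps.process.output
      return {"processed": data}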

Dependencies

steps:
  - id: step1
    code: |
      return "first"

  - id: step2
    depends_on: [step1]
    code: |
      return "second"

  - id: step3
    depends_on: [step1, step2]
    code: |
      return "third"
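
A `depends_on` list defines a directed graph, so a valid run order is a topological sort of that graph. The sketch below derives one with Python's standard `graphlib` module (Python 3.9+). It illustrates the ordering constraint only; Ploston's scheduler is not described in this reference, and `execution_order` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(steps):
    """Return step IDs in an order that respects every depends_on list."""
    ids = {step["id"] for step in steps}
    graph = {}
    for step in steps:
        deps = step.get("depends_on", [])
        unknown = [dep for dep in deps if dep not in ids]
        if unknown:
            raise ValueError(f"{step['id']} depends on unknown step(s): {unknown}")
        graph[step["id"]] = deps
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes that form the cycle.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


steps = [
    {"id": "step3", "depends_on": ["step1", "step2"]},
    {"id": "step2", "depends_on": ["step1"]},
    {"id": "step1"},
]
print(execution_order(steps))  # ['step1', 'step2', 'step3']
```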

Outputs

Single Output

output: "{{ steps.final.output }}"

Multiple Outputs

outputs:
  - name: string        # Output name
    from_path: string   # Path to value (e.g., "steps.process.output.data")
    value: string       # Template expression (alternative to from_path)
    description: string # Human-readable description

Output Examples

outputs:
  - name: result
    from_path: steps.transform.output
    description: Transformed data

  - name: count
    value: "{{ steps.count.output }}"
    description: Number of items processed
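
A `from_path` value is a dotted path into the workflow's results. The sketch below shows how such a path could be followed through nested dictionaries. The shape of `context` and the helper `resolve_path` are assumptions for illustration, not Ploston's internal data model.

```python
def resolve_path(path, context):
    """Follow a dotted path such as "steps.transform.output" through nested dicts."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(f"{path!r}: nothing found at {part!r}")
        value = value[part]
    return value


# Hypothetical results after a run, keyed the same way as the template paths.
context = {"steps": {"transform": {"output": {"items": [1, 2], "count": 2}}}}
print(resolve_path("steps.transform.output.count", context))  # 2
```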

Template Expressions

Use Jinja2 templates to reference values:

Expression	Description
`{{ inputs.name }}`	Input value
`{{ steps.id.output }}`	Step output
`{{ steps.id.output.field }}`	Nested field
`{{ value | tojson }}`	JSON encode
`{{ value | default('x') }}`	Default value

Complete Example

name: data-pipeline
version: "1.0"
description: Fetch, transform, and validate data

packages:
  profile: standard

defaults:
  timeout: 30
  on_error: fail

inputs:
  - url:
      type: string
      description: API endpoint URL
  - format:
      type: string
      enum: ["json", "csv"]
      default: "json"

steps:
  - id: fetch
    tool: http_get
    params:
      url: "{{ inputs.url }}"
    timeout: 60

  - id: transform
    depends_on: [fetch]
    code: |
      # Read the previous step's output through context
      data = context.steps['fetch'].output
      return [item for item in data if item.get("active")]

  - id: format
    depends_on: [transform]
    code: |
      import json
      data = context.steps['transform'].output
      return json.dumps(data, indent=2)

output: "{{ steps.format.output }}"