Module Packaging Guide

Data Generators

How to configure generator blocks to automatically fetch or scaffold input data for a rescile module.

Data generators let rescile fetch or scaffold the data a module processes. Instead of requiring users to hand-craft JSON or CSV inputs, [generators.<name>] blocks run scripts or commands (such as CLI tools) to produce inputs or assets.

The [generators] Block

A generator defines a command to execute, how to trigger it, and where the output should go.

[inputs."aws_ec2.json"]
format = "object_of_objects"

[generators.fetch-aws-inventory]
description = "Fetches EC2 instances via AWS CLI"
target_input = "aws_ec2.json"
env = [
  "AWS_REGION={{ params.region }}"
]
command = ["aws", "ec2", "describe-instances", "--region", "{{ params.region }}", "--output", "json"]
from_stdout = true
ttl = "1h"

Scripts placed in the module’s generators/ directory can be executed simply by specifying their filename. The generators/ directory is automatically added to the PATH environment variable during execution.

Key Configuration Fields

  • target_input / target_asset: The file the generator populates. Set exactly one of the two per generator.
  • command: An array of strings defining the executable and its arguments. Bypasses the shell, mitigating shell injection vulnerabilities. Executables within the generators/ directory can be called directly by name.
  • env: A list of environment variables to set for the command, formatted as "KEY=VALUE". Values can be static or rendered via Tera templating.
  • from_stdout: If true, captures standard output and writes it to the file named by target_input or target_asset. Standard error is streamed to the user’s console logs.
  • abort_on_failure: If true, aborts the entire rescile process if the command returns a non-zero exit code.
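
The fields above can be combined as in the following minimal sketch. The script name export_dns.py and the input name dns_records.json are hypothetical and chosen only for illustration; the sketch assumes the script is executable, lives in the module's generators/ directory, and prints JSON to stdout.

```toml
# Hypothetical example: export_dns.py and dns_records.json are illustrative names.
[inputs."dns_records.json"]
format = "object_of_objects"

[generators.export-dns]
description = "Exports DNS records with a bundled script"
target_input = "dns_records.json"
# Called by filename; generators/ is on PATH during execution.
command = ["export_dns.py"]
# stdout is written to dns_records.json; stderr goes to the console logs.
from_stdout = true
# Stop the whole rescile run if the script exits with a non-zero code.
abort_on_failure = true
```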

Available Template Variables for env and command:

  • {{ env.VAR_NAME }}: Environment variables inherited from the system running rescile. By default, commands run in a sandbox where only standard OS variables (like PATH) are exposed. You must explicitly map system environment variables if your script needs them (e.g. "AWS_ACCESS_KEY_ID={{ env.AWS_ACCESS_KEY_ID }}").
  • {{ params.PARAM_NAME }}: Module parameters passed via the CLI (e.g., --module-params "region=us-east-1").
  • {{ target_asset }}: The absolute file path to the CSV asset file if target_asset is used.
  • {{ target_input }}: The absolute file path to the JSON input file if target_input is used.
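
As a sketch of how these variables fit together, the earlier fetch-aws-inventory generator could pass credentials through from the calling environment. This assumes the AWS CLI's standard credential variables are set on the system running rescile and that the module is invoked with --module-params "region=us-east-1".

```toml
[inputs."aws_ec2.json"]
format = "object_of_objects"

[generators.fetch-aws-inventory]
description = "Fetches EC2 instances via AWS CLI"
target_input = "aws_ec2.json"
env = [
  # Explicitly mapped from the system environment; not exposed otherwise.
  "AWS_ACCESS_KEY_ID={{ env.AWS_ACCESS_KEY_ID }}",
  "AWS_SECRET_ACCESS_KEY={{ env.AWS_SECRET_ACCESS_KEY }}",
  # Rendered from the module parameter "region".
  "AWS_REGION={{ params.region }}",
]
command = ["aws", "ec2", "describe-instances", "--region", "{{ params.region }}", "--output", "json"]
from_stdout = true
```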

Output Handling & Data Flow

Generators must write their output where rescile can load it. rescile supports two methods; either way, each generator must be paired with exactly one target_input or target_asset file:

  1. Standard Output Capture (from_stdout = true): rescile captures the command’s stdout and writes it to the explicitly declared target file automatically.
  2. Sandboxed Environment Variables: If the script writes the file itself, you can expose the exact sandbox path by defining an environment variable in the env list (e.g., "OUT_FILE={{ target_asset }}").

[generators.vmware-discovery]
target_asset = "vmware_inventory.csv"
env = [
  "OUT_FILE={{ target_asset }}",
  "VMWARE_TOKEN={{ env.VMWARETOKEN }}"
]
command = ["python3", "vmware_discover.py", "-o", "{{ target_asset }}"]
# alternative
# command = ["vmware_discover.py", "-o", "${OUT_FILE}"]
condition = "on_missing"

Execution Triggers & Caching

Because fetching external data can be slow, rescile provides triggers and caching so that generators do not have to rerun on every invocation:

  • condition = "on_missing": The command runs only if the target output file does not exist. Ideal for heavy initial dataset seeding.
  • ttl = "1h" (Time-To-Live): Checks the modified time (mtime) of the target file. If the file is younger than the TTL (e.g., 15m, 1h, 2d), execution is skipped.
  • condition = "always" (Default): Runs on every execution unless a valid TTL skips it.
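
The two caching styles might be used side by side as sketched below. All script and file names here are hypothetical, and both scripts are assumed to live in generators/ and print their data to stdout.

```toml
# Hypothetical generators; script and file names are illustrative.

# Seed once: runs only while geo_baseline.csv does not exist.
[generators.seed-baseline]
target_asset = "geo_baseline.csv"
command = ["seed_baseline.sh"]
from_stdout = true
condition = "on_missing"

[inputs."tickets.json"]
format = "object_of_objects"

# Refresh periodically: condition defaults to "always", but the run is
# skipped while tickets.json was modified less than 15 minutes ago.
[generators.fetch-tickets]
target_input = "tickets.json"
command = ["fetch_tickets.py"]
from_stdout = true
ttl = "15m"
```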

Note: Run the CLI with --refresh-generators to bypass TTL caches, or with --ignore-generators to skip generator execution entirely.

Security & Trust Model

Because generators in remote modules can execute arbitrary code, rescile includes built-in safeguards:

  1. Trust on First Use (TOFU) & Interactive Prompts: When rescile-ce serve encounters a generator in an untrusted module, it halts and prompts the user. Selecting [a] saves the module to a local .rescile/trust.json file.
  2. Non-Interactive / CI/CD Safeguard: If rescile-importer runs in a CI pipeline without a TTY, it cannot prompt, so the process fails instead of running the generator. To proceed, explicitly pass --trust-modules "https://github.com/.../aws-base" or --trust-all-modules.