Scrape jobs
Set up scrape jobs to extract deployed versions from Git repositories using YQ, JQ, regex, or manual entry.
Scrape jobs track your currently deployed software version. Most scrape jobs extract a version from a file in a Git repository using an agent. Alternatively, you can use the manual parse type to enter the version directly – no agent or repository needed. Either way, scrape jobs answer the question “what version are we running?”
What scrape jobs do
A scrape job clones a Git repository, reads a specific file, and applies a parser expression to extract a version string. The result is stored as a version snapshot – a point-in-time record of the version found.
Every scrape creates a new snapshot, even if the version has not changed. This gives you a complete history of your deployed version over time, including rollbacks.
When a scrape job completes, all alert configs that reference it are re-evaluated immediately.
Create a scrape job
- Open the Scrape Jobs page from the sidebar.
- Click Create Scrape Job.
- Fill in the form:
| Field | Required | Description |
|---|---|---|
| Name | No | A human-readable label |
| Repository URL | Yes (except manual) | Git clone URL (HTTPS or SSH) |
| Ref (Branch/Tag) | Yes (except manual) | Git ref to check out (default: main) |
| Target file | Yes (except manual) | Path to the file containing the version |
| Parser type | Yes | yq, jq, regex, or manual |
| Parse expression | Yes (except manual) | Parser-specific expression to extract the version |
| Credential name | No | Named credential for private repository access |
| Schedule | No | Cron expression for recurring runs |
| Version transform | No | Post-parse transformation for the version string |
| History limit | No | Maximum snapshots to retain (1-20) |
- Click Create.
Manual entry shortcut
When you select Manual Entry as the parser type, the repository, file, expression, credential, and schedule fields are hidden. Only the job name is required. See Manual version entry below for details.
Manual version entry
Manual scrape jobs let you enter a deployed version directly, without requiring agent infrastructure or Git repository access. This is useful for:
- Demos and testing – quickly set up the full monitoring pipeline without deploying an agent
- Environments without agents – track versions for systems where agent deployment is not practical
- One-off checks – verify that rules, alerts, and notifications work correctly with a known version
Create a manual scrape job
- Open the Scrape Jobs page from the sidebar.
- Click Create Scrape Job.
- Select Manual Entry as the parser type. The repository, file, expression, credential, and schedule fields are hidden automatically.
- Enter a Name for the job (this is the only required field).
- Optionally set a Version transform if you need to normalize the version format.
- Click Create.
Set a version
- On the Scrape Jobs page, expand the row for your manual scrape job.
- Use the Set Version form to enter the deployed version string (for example, 1.2.3 or v2.0.0-rc1).
- Submit the form.
The version is saved as a new version snapshot, exactly like an agent-discovered version. If the scrape job has a version transform configured, it is applied to the entered version before storage.
How it integrates with the pipeline
Manual scrape jobs participate in the full monitoring pipeline:
- Alert configs – link a manual scrape job to a gather job and a rule, just like any other scrape job.
- Rule evaluation – when you set a version, all alert configs referencing the scrape job are re-evaluated immediately.
- Alerts – if the version violates a rule threshold, an alert is created or updated automatically.
- Notifications – alert events (created, escalated, resolved) trigger notifications to your configured channels.
The only difference from agent-based scrape jobs is how the version enters the system – everything downstream is identical.
API usage
You can also set the version programmatically:
```
POST /api/v1/client/scrape-jobs/{id}/set-version
Content-Type: application/json

{
  "version": "1.2.3"
}
```
This is useful for CI/CD pipelines that want to report deployed versions directly to Planekeeper.
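As a rough illustration, a CI/CD step could build this request in Go. The endpoint path and payload come from the example above; the base URL, the job ID (`42`), and the absence of authentication headers are placeholder assumptions for whatever your deployment requires:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildSetVersionRequest constructs the POST request for the
// set-version endpoint; sending it is left to the caller.
func buildSetVersionRequest(baseURL, jobID, version string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"version": version})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/api/v1/client/scrape-jobs/%s/set-version", baseURL, jobID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Add authentication headers here if your deployment requires them.
	return req, nil
}

func main() {
	// "http://localhost:8080" and job ID "42" are placeholders.
	req, err := buildSetVersionRequest("http://localhost:8080", "42", "1.2.3")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// Send with http.DefaultClient.Do(req) once the server is reachable.
}
```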
Parser types
Choose the parser type based on the file format you are reading.
Current parser implementation
The YQ and JQ parsers are lightweight, built-in path navigators — they do not shell out to the yq or jq CLI tools. This keeps Planekeeper dependency-free and fast, but it means the parsers support a subset of what the full CLI tools offer. Both use dot-notation path expressions (.field.subfield) rather than the full query languages.
The YQ parser supports array indexing (.dependencies[0].version) and can parse both YAML and JSON files, making it the more capable of the two. The JQ parser only supports simple key-based navigation and does not handle arrays.
We are actively exploring a more feature-rich parser implementation with broader query support. For now, if you need array access in JSON files, use the YQ parser. See the Parser types reference for detailed capabilities and limitations.
YQ (YAML and JSON files)
Use YQ for YAML configuration files like Chart.yaml, values.yaml, or Kubernetes manifests. The YQ parser also handles JSON files and is the only parser that supports array indexing.
Expression format: Dot-notation with array indexing.
Simple field
```yaml
# Chart.yaml
version: 5.51.4
```
**Expression:** `.version`
**Result:** `5.51.4`
Nested path
```yaml
# values.yaml
metadata:
  version: 2.1.0
```
**Expression:** `.metadata.version`
**Result:** `2.1.0`
Array access
```yaml
# Chart.yaml
dependencies:
  - name: argo-cd
    version: 5.51.4
```
**Expression:** `.dependencies[0].version`
**Result:** `5.51.4`
JQ (JSON files — simple key lookups)
Use JQ for simple key-based lookups in JSON files like package.json or composer.json.
Expression format: Dot-notation only — no array indexing, filters, or pipes.
Simple field
```json
{
  "version": "3.2.1"
}
```
**Expression:** `.version`
**Result:** `3.2.1`
Nested path
```json
{
  "dependencies": {
    "react": "18.2.0"
  }
}
```
**Expression:** `.dependencies.react`
**Result:** `18.2.0`
Limited functionality
The JQ parser only supports simple dot-notation key access. It does not support array indexing (e.g., .items[0].version), filters, or pipes. If your JSON file requires array access, use the YQ parser instead — it handles both YAML and JSON with full array support.
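To make the documented behavior concrete, here is a minimal sketch of how a dot-notation navigator over JSON can work. This is illustrative only, not Planekeeper's actual parser code; the `lookup` function and its error messages are invented for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup walks a parsed JSON document using a dot-notation path
// such as ".dependencies.react". No arrays, filters, or pipes,
// mirroring the documented JQ-parser limitations.
func lookup(doc []byte, expr string) (string, error) {
	var node interface{}
	if err := json.Unmarshal(doc, &node); err != nil {
		return "", err
	}
	for _, key := range strings.Split(strings.TrimPrefix(expr, "."), ".") {
		obj, ok := node.(map[string]interface{})
		if !ok {
			return "", fmt.Errorf("cannot descend into %q", key)
		}
		node, ok = obj[key]
		if !ok {
			return "", fmt.Errorf("key %q not found", key)
		}
	}
	s, ok := node.(string)
	if !ok {
		return "", fmt.Errorf("value at path is not a string")
	}
	return s, nil
}

func main() {
	doc := []byte(`{"dependencies": {"react": "18.2.0"}}`)
	v, err := lookup(doc, ".dependencies.react")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 18.2.0
}
```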
Regex (any text file)
Use regex for any text file where the version is not in a structured format, or when you need precise control over extraction.
Expression format: A Go RE2 regular expression. If the pattern contains a capture group, the first captured group is returned. Otherwise, the full match is returned. Note that Go’s RE2 engine does not support lookahead, lookbehind, or backreferences — see Parser Types Reference for details.
Capture group
```
# Dockerfile
FROM nginx:1.25.3-alpine
```
**Expression:** `FROM nginx:([\d.]+)`
**Result:** `1.25.3`
Full line match
```
# VERSION file
v2.4.1
```
**Expression:** `^v(\d+\.\d+\.\d+)$`
**Result:** `2.4.1`
Key-value pair
```
# .env file
APP_VERSION=1.0.5
```
**Expression:** `APP_VERSION=([\d.]+)`
**Result:** `1.0.5`
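The capture-group rule described above (first capture group if the pattern has one, otherwise the full match) can be reproduced with Go's standard regexp package, which uses the same RE2 engine. The `extract` helper here is a sketch for local testing, not Planekeeper's implementation:

```go
package main

import (
	"fmt"
	"regexp"
)

// extract mirrors the documented rule: return the first capture
// group if the pattern contains one, otherwise the full match.
func extract(pattern, content string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(content)
	if m == nil {
		return "", fmt.Errorf("no match")
	}
	if len(m) > 1 {
		return m[1], nil // first capture group
	}
	return m[0], nil // full match
}

func main() {
	v, err := extract(`FROM nginx:([\d.]+)`, "FROM nginx:1.25.3-alpine")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 1.25.3
}
```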
Version transforms
After extracting the version, you can apply an optional transform to normalize the format:
| Transform | Input | Output | Use case |
|---|---|---|---|
| add_v_lower | 1.2.3 | v1.2.3 | Match tags that include a v prefix |
| add_v_upper | 1.2.3 | V1.2.3 | Match tags with an uppercase V prefix |
| strip_v_lower | v1.2.3 | 1.2.3 | Remove v prefix for clean comparison |
| strip_v_upper | V1.2.3 | 1.2.3 | Remove V prefix for clean comparison |
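The four transforms in the table amount to simple prefix operations. A sketch of that mapping in Go, assuming inputs shaped like the table's (the function name and fall-through behavior are illustrative, not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// applyTransform sketches the four documented transforms.
// Unknown transform names pass the version through unchanged
// (an assumption for this example).
func applyTransform(name, version string) string {
	switch name {
	case "add_v_lower":
		return "v" + version
	case "add_v_upper":
		return "V" + version
	case "strip_v_lower":
		return strings.TrimPrefix(version, "v")
	case "strip_v_upper":
		return strings.TrimPrefix(version, "V")
	default:
		return version
	}
}

func main() {
	fmt.Println(applyTransform("add_v_lower", "1.2.3"))    // v1.2.3
	fmt.Println(applyTransform("strip_v_upper", "V1.2.3")) // 1.2.3
}
```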
For example, if Chart.yaml contains 1.2.3 but your GitHub tags are v1.2.3, apply add_v_lower so the versions align for comparison.
Private repository access
For private Git repositories or container registries, configure credentials on the agent and reference them by name in the job configuration.
Security model
Planekeeper uses a decentralized credential model – all secrets stay local to the agent.
- Credentials are defined in the agent's local config.yaml file and never leave the machine.
- During heartbeat, the agent advertises only credential names (e.g. github_pat) to the API server – secret values are never transmitted.
- The API server uses these names for job routing: a job that requires a credential is only assigned to an agent that has advertised that name.
- All authenticated Git clones and registry pulls happen locally on the agent.
Credential types
Three credential types are supported. SSH keys and HTTPS PATs are used by scrape jobs (Git repositories). Registry credentials are used by gather jobs (OCI container registries) but are configured the same way.
ssh_key – SSH key authentication
For repositories accessed via SSH URLs (git@github.com:org/repo.git).
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be ssh_key |
| private_key_file | One of these | Path to an SSH private key file (mount into the container) |
| private_key | One of these | Inline PEM content (alternative to file) |
| passphrase | No | Passphrase for encrypted keys |
private_key_file and private_key are mutually exclusive. Use private_key_file when mounting a key from a Docker volume or host path. Use private_key when embedding the key directly in the config.
https_pat – HTTPS personal access token
For repositories accessed via HTTPS URLs (https://github.com/org/repo.git).
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be https_pat |
| token | Yes | Personal access token (GitHub, GitLab, Bitbucket, etc.) |
registry_basic – OCI container registry
For private container registries (Docker Hub, GHCR, quay.io, etc.). Used by gather jobs that fetch release metadata from registries.
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be registry_basic |
| username | Yes | Registry username |
| password | Yes | Registry password or access token |
Agent configuration
Credentials are defined under agent.credentials in the agent’s config.yaml file. Each credential has a name (the map key) that you reference when creating jobs.
Mount the config file into the agent container at /etc/planekeeper/config.yaml.
SSH key (file-based)
```yaml
agent:
  credentials:
    my_ssh_key:
      type: ssh_key
      private_key_file: /ssh/id_ed25519
      passphrase: "" # optional
```
SSH key (inline)
```yaml
agent:
  credentials:
    my_inline_key:
      type: ssh_key
      private_key: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        ...key content...
        -----END OPENSSH PRIVATE KEY-----
      passphrase: "" # optional
```
HTTPS PAT
```yaml
agent:
  credentials:
    github_pat:
      type: https_pat
      token: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Registry
```yaml
agent:
  credentials:
    dockerhub:
      type: registry_basic
      username: myuser
      password: dckr_pat_xxxxxxxxxxxx
```
You can define multiple credentials of different types in the same config file:
```yaml
agent:
  credentials:
    github_ssh:
      type: ssh_key
      private_key_file: /ssh/id_ed25519
    gitlab_pat:
      type: https_pat
      token: glpat-xxxxxxxxxxxxxxxxxxxx
    ghcr:
      type: registry_basic
      username: github-username
      password: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
How job assignment works
- The agent reads credentials from config.yaml on startup.
- During each heartbeat, the agent sends the list of credential names to the API server.
- When a job with a credential name is due to run, the task engine assigns it only to agents that advertised that name.
- The agent performs the authenticated Git clone or registry pull locally using the full credential.
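The routing rule in step 3 can be sketched as a simple filter over advertised credential names. This is an illustrative model of the behavior described above, not the task engine's actual code; the type and function names are invented for the example:

```go
package main

import "fmt"

// Agent carries the credential names it advertised during
// heartbeat; secret values never reach the API server.
type Agent struct {
	Name        string
	Credentials []string // names only, never secret values
}

// eligibleAgents returns the agents that advertised the credential
// a job requires. Jobs without a credential can run anywhere.
func eligibleAgents(agents []Agent, credential string) []Agent {
	if credential == "" {
		return agents
	}
	var out []Agent
	for _, a := range agents {
		for _, name := range a.Credentials {
			if name == credential {
				out = append(out, a)
				break
			}
		}
	}
	return out
}

func main() {
	agents := []Agent{
		{Name: "agent-1", Credentials: []string{"github_pat"}},
		{Name: "agent-2"}, // no credentials advertised
	}
	for _, a := range eligibleAgents(agents, "github_pat") {
		fmt.Println(a.Name) // agent-1
	}
}
```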
See Agents for more on agent deployment and configuration.
Schedule and manual runs
Set a schedule
Add a cron expression to run the scrape job on a recurring basis:
| Expression | Frequency |
|---|---|
| 0 */6 * * * | Every 6 hours |
| 0 0 * * * | Daily at midnight |
| */30 * * * * | Every 30 minutes |
Trigger a manual run
- Open the scrape job detail page.
- Click Run Now.
Version snapshots and history
Every scrape run creates a new version snapshot. View the history on the scrape job detail page under Version History.
Each snapshot records:
- The version string extracted
- The Git commit SHA at the time of scraping
- A timestamp of when the version was discovered
History limit: Set a limit (1-20) to control how many snapshots are retained. Older snapshots beyond this limit are automatically deleted during periodic cleanup. This prevents unbounded growth while keeping enough history for tracking version changes.
Rollback detection: Because every scrape creates a new snapshot regardless of whether the version changed, Planekeeper correctly detects rollbacks. If you deploy version 2.0.0 and then roll back to 1.5.0, the snapshot history shows the full sequence.
Bulk actions
Select multiple scrape jobs using the checkboxes on the list page, then click Delete Selected to remove them in a single operation. Use the checkbox in the table header to select all visible items.
Tips
- Test your parse expression on a local copy of the file before creating the job. Make sure the expression returns only the version string, not surrounding text.
- Use the Test button in the scrape job form to verify your regex compiles. Note that this only checks syntax validity — it does not test the pattern against file content. To verify extraction, test your pattern locally or use regex101.com with the Golang flavor. See Parser Types Reference for details on the Go RE2 engine.
- Start with a short history limit (5-10) and increase it if you need more historical data.
- If a scrape job consistently fails, check the job detail page for error messages. Common issues include incorrect file paths, unreachable repositories, or parse expressions that do not match the file content.