Scrape jobs
Set up scrape jobs to extract deployed versions from Git repositories using YQ, JQ, regex, or manual entry.
Scrape jobs track your currently deployed software version. Most scrape jobs extract a version from a file in a Git repository using an agent. Alternatively, you can use the manual parse type to enter the version directly – no agent or repository needed. Either way, scrape jobs answer the question “what version are we running?”
What scrape jobs do
A scrape job clones a Git repository, reads a specific file, and applies a parser expression to extract a version string. The result is stored as a version snapshot – a point-in-time record of the version found.
Every scrape creates a new snapshot, even if the version has not changed. This gives you a complete history of your deployed version over time, including rollbacks.
When a scrape job completes, all alert configs that reference it are re-evaluated immediately.
Create a scrape job
- Open the Scrape Jobs page from the sidebar.
- Click Create Scrape Job.
- Fill in the form:
| Field | Required | Description |
|---|---|---|
| Name | No | A human-readable label |
| Repository URL | Yes (except manual) | Git clone URL (HTTPS or SSH) |
| Ref (Branch/Tag) | Yes (except manual) | Git ref to check out (default: main) |
| Target file | Yes (except manual) | Path to the file containing the version |
| Parser type | Yes | yq, jq, regex, or manual |
| Parse expression | Yes (except manual) | Parser-specific expression to extract the version |
| Credential name | No | Named credential for private repository access |
| Schedule | No | Cron expression for recurring runs |
| Version transform | No | Post-parse transformation for the version string |
| History limit | No | Maximum snapshots to retain (1-20) |
- Click Create.
Manual entry shortcut
When you select Manual Entry as the parser type, the repository, file, expression, credential, and schedule fields are hidden. Only the job name is required. See Manual version entry below for details.
Manual version entry
Manual scrape jobs let you enter a deployed version directly, without requiring agent infrastructure or Git repository access. This is useful for:
- Demos and testing – quickly set up the full monitoring pipeline without deploying an agent
- Environments without agents – track versions for systems where agent deployment is not practical
- One-off checks – verify that rules, alerts, and notifications work correctly with a known version
Create a manual scrape job
- Open the Scrape Jobs page from the sidebar.
- Click Create Scrape Job.
- Select Manual Entry as the parser type. The repository, file, expression, credential, and schedule fields are hidden automatically.
- Enter a Name for the job (this is the only required field).
- Optionally set a Version transform if you need to normalize the version format.
- Click Create.
Set a version
- On the Scrape Jobs page, expand the row for your manual scrape job.
- Use the Set Version form to enter the deployed version string (for example, 1.2.3 or v2.0.0-rc1).
- Submit the form.
The version is saved as a new version snapshot, exactly like an agent-discovered version. If the scrape job has a version transform configured, it is applied to the entered version before storage.
How it integrates with the pipeline
Manual scrape jobs participate in the full monitoring pipeline:
- Alert configs – link a manual scrape job to a gather job and a rule, just like any other scrape job.
- Rule evaluation – when you set a version, all alert configs referencing the scrape job are re-evaluated immediately.
- Alerts – if the version violates a rule threshold, an alert is created or updated automatically.
- Notifications – alert events (created, escalated, resolved) trigger notifications to your configured channels.
The only difference from agent-based scrape jobs is how the version enters the system – everything downstream is identical.
API usage
You can also set the version programmatically:
```
POST /api/v1/client/scrape-jobs/{id}/set-version
Content-Type: application/json

{
  "version": "1.2.3"
}
```
This is useful for CI/CD pipelines that want to report deployed versions directly to Planekeeper.
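As a rough illustration, a CI/CD step could build this request in Go. The endpoint path and payload come from the example above; the base URL, the job ID (`42`), and the absence of authentication headers are placeholder assumptions for whatever your deployment requires:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// buildSetVersionRequest constructs the POST request for the
// set-version endpoint; sending it is left to the caller.
func buildSetVersionRequest(baseURL, jobID, version string) (*http.Request, error) {
	payload, err := json.Marshal(map[string]string{"version": version})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/api/v1/client/scrape-jobs/%s/set-version", baseURL, jobID)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Add authentication headers here if your deployment requires them.
	return req, nil
}

func main() {
	// "http://localhost:8080" and job ID "42" are placeholders.
	req, err := buildSetVersionRequest("http://localhost:8080", "42", "1.2.3")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// Send with http.DefaultClient.Do(req) once the server is reachable.
}
```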
Parser types
Choose the parser type based on the file format you are reading.
Current parser implementation
The YQ and JQ parsers are lightweight, built-in path navigators — they do not shell out to the yq or jq CLI tools. This keeps Planekeeper dependency-free and fast, but it means the parsers support a subset of what the full CLI tools offer. Both use dot-notation path expressions (.field.subfield) rather than the full query languages.
The YQ parser supports array indexing (.dependencies[0].version) and can parse both YAML and JSON files, making it the more capable of the two. The JQ parser only supports simple key-based navigation and does not handle arrays.
We are actively exploring a more feature-rich parser implementation with broader query support. For now, if you need array access in JSON files, use the YQ parser. See the Parser types reference for detailed capabilities and limitations.
YQ (YAML and JSON files)
Use YQ for YAML configuration files like Chart.yaml, values.yaml, or Kubernetes manifests. The YQ parser also handles JSON files and is the only parser that supports array indexing.
Expression format: Dot-notation with array indexing.
Simple field
```yaml
# Chart.yaml
version: 5.51.4
```
**Expression:** `.version`
**Result:** `5.51.4`
Nested path
```yaml
# values.yaml
metadata:
  version: 2.1.0
```
**Expression:** `.metadata.version`
**Result:** `2.1.0`
Array access
```yaml
# Chart.yaml
dependencies:
  - name: argo-cd
    version: 5.51.4
```
**Expression:** `.dependencies[0].version`
**Result:** `5.51.4`
JQ (JSON files — simple key lookups)
Use JQ for simple key-based lookups in JSON files like package.json or composer.json.
Expression format: Dot-notation only — no array indexing, filters, or pipes.
Simple field
```json
{
  "version": "3.2.1"
}
```
**Expression:** `.version`
**Result:** `3.2.1`
Nested path
```json
{
  "dependencies": {
    "react": "18.2.0"
  }
}
```
**Expression:** `.dependencies.react`
**Result:** `18.2.0`
Limited functionality
The JQ parser only supports simple dot-notation key access. It does not support array indexing (e.g., .items[0].version), filters, or pipes. If your JSON file requires array access, use the YQ parser instead — it handles both YAML and JSON with full array support.
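To make the documented behavior concrete, here is a minimal sketch of how a dot-notation navigator over JSON can work. This is illustrative only, not Planekeeper's actual parser code; the `lookup` function and its error messages are invented for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup walks a parsed JSON document using a dot-notation path
// such as ".dependencies.react". No arrays, filters, or pipes,
// mirroring the documented JQ-parser limitations.
func lookup(doc []byte, expr string) (string, error) {
	var node interface{}
	if err := json.Unmarshal(doc, &node); err != nil {
		return "", err
	}
	for _, key := range strings.Split(strings.TrimPrefix(expr, "."), ".") {
		obj, ok := node.(map[string]interface{})
		if !ok {
			return "", fmt.Errorf("cannot descend into %q", key)
		}
		node, ok = obj[key]
		if !ok {
			return "", fmt.Errorf("key %q not found", key)
		}
	}
	s, ok := node.(string)
	if !ok {
		return "", fmt.Errorf("value at path is not a string")
	}
	return s, nil
}

func main() {
	doc := []byte(`{"dependencies": {"react": "18.2.0"}}`)
	v, err := lookup(doc, ".dependencies.react")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 18.2.0
}
```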
Regex (any text file)
Use regex for any text file where the version is not in a structured format, or when you need precise control over extraction.
Expression format: A Go RE2 regular expression. If the pattern contains a capture group, the first captured group is returned. Otherwise, the full match is returned. Note that Go’s RE2 engine does not support lookahead, lookbehind, or backreferences — see Parser Types Reference for details.
Capture group
```
# Dockerfile
FROM nginx:1.25.3-alpine
```
**Expression:** `FROM nginx:([\d.]+)`
**Result:** `1.25.3`
Full line match
```
# VERSION file
v2.4.1
```
**Expression:** `^v(\d+\.\d+\.\d+)$`
**Result:** `2.4.1`
Key-value pair
```
# .env file
APP_VERSION=1.0.5
```
**Expression:** `APP_VERSION=([\d.]+)`
**Result:** `1.0.5`
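The capture-group rule described above (first capture group if the pattern has one, otherwise the full match) can be reproduced with Go's standard regexp package, which uses the same RE2 engine. The `extract` helper here is a sketch for local testing, not Planekeeper's implementation:

```go
package main

import (
	"fmt"
	"regexp"
)

// extract mirrors the documented rule: return the first capture
// group if the pattern contains one, otherwise the full match.
func extract(pattern, content string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", err
	}
	m := re.FindStringSubmatch(content)
	if m == nil {
		return "", fmt.Errorf("no match")
	}
	if len(m) > 1 {
		return m[1], nil // first capture group
	}
	return m[0], nil // full match
}

func main() {
	v, err := extract(`FROM nginx:([\d.]+)`, "FROM nginx:1.25.3-alpine")
	if err != nil {
		panic(err)
	}
	fmt.Println(v) // 1.25.3
}
```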
Version transforms
After extracting the version, you can apply an optional transform to normalize the format:
| Transform | Input | Output | Use case |
|---|---|---|---|
| add_v_lower | 1.2.3 | v1.2.3 | Match tags that include a v prefix |
| add_v_upper | 1.2.3 | V1.2.3 | Match tags with an uppercase V prefix |
| strip_v_lower | v1.2.3 | 1.2.3 | Remove v prefix for clean comparison |
| strip_v_upper | V1.2.3 | 1.2.3 | Remove V prefix for clean comparison |
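The four transforms in the table amount to simple prefix operations. A sketch of that mapping in Go, assuming inputs shaped like the table's (the function name and fall-through behavior are illustrative, not the actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// applyTransform sketches the four documented transforms.
// Unknown transform names pass the version through unchanged
// (an assumption for this example).
func applyTransform(name, version string) string {
	switch name {
	case "add_v_lower":
		return "v" + version
	case "add_v_upper":
		return "V" + version
	case "strip_v_lower":
		return strings.TrimPrefix(version, "v")
	case "strip_v_upper":
		return strings.TrimPrefix(version, "V")
	default:
		return version
	}
}

func main() {
	fmt.Println(applyTransform("add_v_lower", "1.2.3"))    // v1.2.3
	fmt.Println(applyTransform("strip_v_upper", "V1.2.3")) // 1.2.3
}
```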
For example, if Chart.yaml contains 1.2.3 but your GitHub tags are v1.2.3, apply add_v_lower so the versions align for comparison.
Private repository access
For private Git repositories or container registries, configure credentials on the agent and reference them by name in the job configuration.
Security model
Planekeeper uses a decentralized credential model – all secrets stay local to the agent.
- Credentials are defined in the agent's local config.yaml file and never leave the machine.
- During heartbeat, the agent advertises only credential names (e.g. github_pat) to the API server – secret values are never transmitted.
- The API server uses these names for job routing: a job that requires a credential is only assigned to an agent that has advertised that name.
- All authenticated Git clones and registry pulls happen locally on the agent.
Credential types
Three credential types are supported. SSH keys and HTTPS PATs are used by scrape jobs (Git repositories). Registry credentials are used by gather jobs (OCI container registries) but are configured the same way.
ssh_key – SSH key authentication
For repositories accessed via SSH URLs (git@github.com:org/repo.git).
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be ssh_key |
| private_key_file | One of these | Path to an SSH private key file (mount into the container) |
| private_key | One of these | Inline PEM content (alternative to file) |
| passphrase | No | Passphrase for encrypted keys |
private_key_file and private_key are mutually exclusive. Use private_key_file when mounting a key from a Docker volume or host path. Use private_key when embedding the key directly in the config.
https_pat – HTTPS personal access token
For repositories accessed via HTTPS URLs (https://github.com/org/repo.git).
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be https_pat |
| token | Yes | Personal access token (GitHub, GitLab, Bitbucket, etc.) |
registry_basic – OCI container registry
For private container registries (Docker Hub, GHCR, quay.io, etc.). Used by gather jobs that fetch release metadata from registries.
| Field | Required | Description |
|---|---|---|
| type | Yes | Must be registry_basic |
| username | Yes | Registry username |
| password | Yes | Registry password or access token |
Agent configuration
Credentials are defined under agent.credentials in the agent’s config.yaml file. Each credential has a name (the map key) that you reference when creating jobs.
Mount the config file into the agent container at /etc/planekeeper/config.yaml.
SSH key (file-based)
```yaml
agent:
  credentials:
    my_ssh_key:
      type: ssh_key
      private_key_file: /ssh/id_ed25519
      passphrase: "" # optional
```
SSH key (inline)
```yaml
agent:
  credentials:
    my_inline_key:
      type: ssh_key
      private_key: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        ...key content...
        -----END OPENSSH PRIVATE KEY-----
      passphrase: "" # optional
```
HTTPS PAT
```yaml
agent:
  credentials:
    github_pat:
      type: https_pat
      token: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
Registry
```yaml
agent:
  credentials:
    dockerhub:
      type: registry_basic
      username: myuser
      password: dckr_pat_xxxxxxxxxxxx
```
You can define multiple credentials of different types in the same config file:
```yaml
agent:
  credentials:
    github_ssh:
      type: ssh_key
      private_key_file: /ssh/id_ed25519
    gitlab_pat:
      type: https_pat
      token: glpat-xxxxxxxxxxxxxxxxxxxx
    ghcr:
      type: registry_basic
      username: github-username
      password: ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```
How job assignment works
- The agent reads credentials from config.yaml on startup.
- During each heartbeat, the agent sends the list of credential names to the API server.
- When a job with a credential name is due to run, the task engine assigns it only to agents that advertised that name.
- The agent performs the authenticated Git clone or registry pull locally using the full credential.
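The routing rule in step 3 can be sketched as a simple filter over advertised credential names. This is an illustrative model of the behavior described above, not the task engine's actual code; the type and function names are invented for the example:

```go
package main

import "fmt"

// Agent carries the credential names it advertised during
// heartbeat; secret values never reach the API server.
type Agent struct {
	Name        string
	Credentials []string // names only, never secret values
}

// eligibleAgents returns the agents that advertised the credential
// a job requires. Jobs without a credential can run anywhere.
func eligibleAgents(agents []Agent, credential string) []Agent {
	if credential == "" {
		return agents
	}
	var out []Agent
	for _, a := range agents {
		for _, name := range a.Credentials {
			if name == credential {
				out = append(out, a)
				break
			}
		}
	}
	return out
}

func main() {
	agents := []Agent{
		{Name: "agent-1", Credentials: []string{"github_pat"}},
		{Name: "agent-2"}, // no credentials advertised
	}
	for _, a := range eligibleAgents(agents, "github_pat") {
		fmt.Println(a.Name) // agent-1
	}
}
```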
See Agents for more on agent deployment and configuration.
Schedule and manual runs
Set a schedule
Add a cron expression to run the scrape job on a recurring basis:
| Expression | Frequency |
|---|---|
| 0 */6 * * * | Every 6 hours |
| 0 0 * * * | Daily at midnight |
| */30 * * * * | Every 30 minutes |
Trigger a manual run
- Open the scrape job detail page.
- Click Run Now.
Version snapshots and history
Every scrape run creates a new version snapshot. View the history on the scrape job detail page under Version History.
Each snapshot records:
- The version string extracted
- The Git commit SHA at the time of scraping
- A timestamp of when the version was discovered
History limit: Set a limit (1-20) to control how many snapshots are retained. Older snapshots beyond this limit are automatically deleted during periodic cleanup. This prevents unbounded growth while keeping enough history for tracking version changes.
Rollback detection: Because every scrape creates a new snapshot regardless of whether the version changed, Planekeeper correctly detects rollbacks. If you deploy version 2.0.0 and then roll back to 1.5.0, the snapshot history shows the full sequence.
Bulk actions
Select multiple scrape jobs using the checkboxes on the list page, then click Delete Selected to remove them in a single operation. Use the checkbox in the table header to select all visible items.
Tips
- Test your parse expression on a local copy of the file before creating the job. Make sure the expression returns only the version string, not surrounding text.
- Use the Test button in the scrape job form to verify your regex compiles. Note that this only checks syntax validity — it does not test the pattern against file content. To verify extraction, test your pattern locally or use regex101.com with the Golang flavor. See Parser Types Reference for details on the Go RE2 engine.
- Start with a short history limit (5-10) and increase it if you need more historical data.
- If a scrape job consistently fails, check the job detail page for error messages. Common issues include incorrect file paths, unreachable repositories, or parse expressions that do not match the file content.