Alex Goodman b3c70da3ea
Add experimental cataloger capabilities command (#4317)
* add info command from generated capabilities

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* correct gentoo and arch ecosystems

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* rename os pkg types

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* better binary cataloger description

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* expose metadata and pacakge types in json

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* expose json schema types

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* add completeness tests for metadata types

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* latest generation

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix linting

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* improve testing a docs

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix tests and linting

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* restore goreleaser config

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* tweak diagram

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix pdm

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* chore: java binary data

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* new capability descriptions for gguf and python

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* correct poetry lock integrity hash claim

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix compile error

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix: remove purl version from overrides

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* fix lua deps ref

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* keep gguf as ai ecosystem

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* split packages.yaml to multiple files by go package

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* ensure tests do not use go test cache

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* sort json output for info command

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* docs: fix ocaml, php, and portage capabilities yaml

Signed-off-by: Will Murphy <willmurphyscode@users.noreply.github.com>

* chore: update erlang capabilities

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* chore: update java capabilities

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* chore: update javascript capabilities

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* chore: update linux kernel capabilities

Signed-off-by: Keith Zantow <kzantow@gmail.com>

* remove missing tests

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix package.yaml references

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* revert license list change

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* check for drift in capability descriptions

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* regenerate capabilities

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* test cleanup

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* use fixture cache in static analysis

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* claim fixtures pre-req for cap generation

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* update documentation with correct regeneration procedure

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* chore: ruby-gemspec-cataloger finds no dependencies

Signed-off-by: Will Murphy <willmurphyscode@users.noreply.github.com>

* chore: fix python docs and config comment

Signed-off-by: Will Murphy <willmurphyscode@users.noreply.github.com>

* chore: commit re-generated java yaml

Signed-off-by: Will Murphy <willmurphyscode@users.noreply.github.com>

* add cataloger selection to caps command

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* re-generate cap yamls

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix tests for cataloger selection

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix cli test

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* add missing tests

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix linting

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* rename cmd to `cataloger info`

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* [wip] change capability description locations

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* [wip] continued

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* [wip] adjust for import cycles

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* correct docs

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* fix linting

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

---------

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Signed-off-by: Keith Zantow <kzantow@gmail.com>
Signed-off-by: Will Murphy <willmurphyscode@users.noreply.github.com>
Co-authored-by: Keith Zantow <kzantow@gmail.com>
Co-authored-by: Will Murphy <willmurphyscode@users.noreply.github.com>
2025-12-22 19:34:10 +00:00

228 lines
6.9 KiB
Markdown

# Cataloger Capabilities Documentation
This documentation describes the format and structure of cataloger capabilities YAML files.
## File Organization
Capabilities are organized as follows:
- **Cataloger capabilities**: Located in `syft/pkg/cataloger/*/capabilities.yaml` (one file per ecosystem, alongside the cataloger source code: `golang/capabilities.yaml`, `python/capabilities.yaml`, etc.)
- **Application configuration**: Located in `internal/capabilities/appconfig.yaml`
Each `capabilities.yaml` file is partially auto-generated. Run `go generate ./internal/capabilities` to regenerate.
- Fields marked **AUTO-GENERATED** will be updated during regeneration
- All **capabilities** sections are **MANUAL** - edit these to describe cataloger behavior
## Capability Sections
There are two types of capability sections depending on cataloger type:
### 1. Generic Catalogers (`type: generic`)
- Have capabilities at the **PARSER level**
- Each parser function has its own capabilities section
- Allows different parsers within the same cataloger to have different capabilities
### 2. Custom Catalogers (`type: custom`)
- Have capabilities at the **CATALOGER level**
- Single capabilities section for the entire cataloger
## Capabilities Format
Capabilities use a field-based format with defaults and optional conditional overrides:
```yaml
capabilities:
- field: <field-name> # dot-notation path (e.g., "license", "dependency.depth")
default: <value> # value when no conditions match
conditions: # optional - conditional overrides evaluated in order
- when: {ConfigField: val} # when these config fields match (AND logic)
value: <override-value> # use this value instead
comment: "explanation" # optional - why this condition exists
evidence: # optional - source code references
- "StructName.FieldName"
comment: "explanation" # optional - general field explanation
```
## Detector Conditions
Detectors (used by custom catalogers) can have optional conditions that control when they are active. This allows a single cataloger to have different detection behavior based on configuration.
### Structure
```yaml
detectors:
- method: glob # AUTO-GENERATED - detection method
criteria: ["**/*.jar"] # AUTO-GENERATED - patterns to match
comment: "always active" # MANUAL - optional explanation
- method: glob
criteria: ["**/*.zip"]
conditions: # MANUAL - when this detector is active
- when: {IncludeZipFiles: true} # config fields that must match
comment: "optional explanation"
comment: "ZIP detection requires config"
```
### Notes
- Conditions reference fields from the cataloger's config struct
- Multiple conditions in the array use **OR logic** (any condition can activate)
- Multiple fields in a `when` clause use **AND logic** (all must match)
- Detectors without conditions are always active
- Only custom catalogers support detectors with conditions
## Condition Evaluation
- Conditions are evaluated in array order (first match wins)
- Multiple fields in a `when` clause use **AND logic** (all must match)
- Multiple conditions in the array use **OR logic** (first matching condition)
- If no conditions match, the default value is used
## Capability Fields
Standard capability field names and their value types:
### `license` (boolean)
Whether license information is available.
**Examples:**
```yaml
default: true # always available
default: false # never available
default: false # requires configuration
conditions:
- when: {SearchRemoteLicenses: true}
value: true
```
### `dependency.depth` (array of strings)
Which dependency depths can be discovered.
**Values:** `direct` (immediate deps), `indirect` (transitive deps)
**Examples:**
```yaml
default: [direct] # only immediate dependencies
default: [direct, indirect] # full transitive closure
default: [] # no dependency information
```
### `dependency.edges` (string)
Relationships between nodes and completeness of the dependency graph.
**Values:**
- `""` - dependencies found but no edges between them
- `"flat"` - single level of dependencies with edges to root package only
- `"reduced"` - transitive reduction (redundant edges removed)
- `"complete"` - all relationships with accurate direct and indirect edges
**Examples:**
```yaml
default: complete
default: ""
```
### `dependency.kinds` (array of strings)
Types of dependencies that can be discovered.
**Values:** `runtime`, `dev`, `build`, `test`, `optional`
**Examples:**
```yaml
default: [runtime] # production dependencies only
default: [runtime, dev] # production and development
default: [runtime, dev, build] # all dependency types
default: [runtime] # with conditional dev deps
conditions:
- when: {IncludeDevDeps: true}
value: [runtime, dev]
```
### `package_manager.files.listing` (boolean)
Whether file listings are available (which files belong to the package).
**Examples:**
```yaml
default: true
default: false
conditions:
- when: {CaptureOwnedFiles: true}
value: true
```
### `package_manager.files.digests` (boolean)
Whether file digests/checksums are included in listings.
**Examples:**
```yaml
default: true
default: false
```
### `package_manager.package_integrity_hash` (boolean)
Whether a hash for verifying package integrity is available.
**Examples:**
```yaml
default: true
default: false
```
## Examples
### Simple cataloger with no configuration
```yaml
capabilities:
- name: license
default: true
comment: "license field always present in package.json"
- name: dependency.depth
default: [direct]
- name: dependency.edges
default: ""
- name: dependency.kinds
default: [runtime]
comment: "devDependencies not parsed by this cataloger"
- name: package_manager.files.listing
default: false
- name: package_manager.files.digests
default: false
- name: package_manager.package_integrity_hash
default: false
```
### Cataloger with configuration-dependent capabilities
```yaml
capabilities:
- name: license
default: false
conditions:
- when: {SearchLocalModCacheLicenses: true}
value: true
comment: "searches for licenses in GOPATH mod cache"
- when: {SearchRemoteLicenses: true}
value: true
comment: "fetches licenses from proxy.golang.org"
comment: "license scanning requires configuration"
- name: dependency.depth
default: [direct, indirect]
- name: dependency.edges
default: flat
- name: dependency.kinds
default: [runtime, dev]
- name: package_manager.files.listing
default: false
- name: package_manager.files.digests
default: false
- name: package_manager.package_integrity_hash
default: true
evidence:
- "GolangBinaryBuildinfoEntry.H1Digest"
```