Capabilities Generation System
This internal tool is responsible for:
- partially generating the packages.yaml file, which documents what capabilities each cataloger in syft has
- running completeness/consistency tests of the claims in packages.yaml against actual test observations
Syft has dozens of catalogers across many ecosystems. Each cataloger has different capabilities, such as:
- Some provide license information, others don't
- Some detect transitive dependencies, others only direct
- Some capabilities depend on configuration
The packages.yaml file records all of these capability claims.
The capabilities generation system itself:
- Discovers cataloger information from source code using AST parsing
- Extracts metadata about parsers, detectors, and configuration from code and tests
- Merges discovered information with manually-maintained capability documentation
- Validates that the generated document is complete and in sync with the codebase
Why do this? The short answer is to provide a foundation for the OSS documentation, where the source of truth for facts about the capabilities of Syft can be derived from verifiable claims from the tool itself.
Quick Start
To regenerate packages.yaml after code changes:
go generate ./internal/capabilities
To run validation of capability claims:
# update test evidence
go test ./syft/pkg/...
# check claims against test evidence
go test ./internal/capabilities/generate
Data Flow
graph LR
subgraph "Source Code Inputs"
A1[syft/pkg/cataloger/*/<br/>cataloger.go]
A2[syft/pkg/cataloger/*/<br/>config.go]
A3[cmd/syft/internal/options/<br/>catalog.go, ecosystem.go]
A4[syft task factories<br/>AllCatalogers]
end
subgraph "Test Inputs"
B1[test-fixtures/<br/>test-observations.json]
end
subgraph "Discovery Processes"
C1[discover_catalogers.go<br/>AST Parse Catalogers]
C2[discover_cataloger_configs.go<br/>AST Parse Configs]
C3[discover_app_config.go<br/>AST Parse App Configs]
C4[discover_metadata.go<br/>Read Observations]
C5[cataloger_config_linking.go<br/>Link Catalogers to Configs]
C6[cataloger_names.go<br/>Query Task Factories]
end
subgraph "Discovered Data"
D1[Generic Catalogers<br/>name, parsers, detectors]
D2[Config Structs<br/>fields, app-config keys]
D3[App Config Fields<br/>keys, descriptions, defaults]
D4[Metadata Types<br/>per parser/cataloger]
D5[Package Types<br/>per parser/cataloger]
D6[Cataloger-Config Links<br/>mapping]
D7[Selectors<br/>tags per cataloger]
end
subgraph "Configuration/Overrides"
E2[metadataTypeCoverageExceptions<br/>packageTypeCoverageExceptions<br/>observationExceptions]
E1[catalogerTypeOverrides<br/>catalogerConfigOverrides<br/>catalogerConfigExceptions]
end
subgraph "Merge Process"
F1[io.go<br/>Load Existing YAML]
F2[merge.go<br/>Merge Logic]
F3[Preserve MANUAL fields<br/>Update AUTO-GENERATED]
end
subgraph "Output"
G1[packages.yaml<br/>Complete Catalog Document]
end
subgraph "Validation"
H1[completeness_test.go<br/>Comprehensive Tests]
H2[metadata_check.go<br/>Type Coverage]
end
A1 --> C1
A2 --> C2
A3 --> C3
A4 --> C6
B1 --> C4
C1 --> D1
C2 --> D2
C3 --> D3
C4 --> D4
C4 --> D5
C5 --> D6
C6 --> D7
D1 --> F2
D2 --> F2
D3 --> F2
D4 --> F2
D5 --> F2
D6 --> F2
D7 --> F2
E1 -.configure.-> F2
E2 -.configure.-> H1
F1 --> F3
F2 --> F3
F3 --> G1
G1 --> H1
G1 --> H2
style D1 fill:#e1f5ff
style D2 fill:#e1f5ff
style D3 fill:#e1f5ff
style D4 fill:#e1f5ff
style D5 fill:#e1f5ff
style D6 fill:#e1f5ff
style D7 fill:#e1f5ff
style G1 fill:#c8e6c9
style E1 fill:#fff9c4
style E2 fill:#fff9c4
Key Data Flows
- Cataloger Discovery: AST parser walks syft/pkg/cataloger/ to find generic.NewCataloger() calls and extract parser information
- Config Discovery: AST parser finds config structs and extracts fields with // app-config: annotations
- App Config Discovery: AST parser extracts ecosystem configurations from the options package, including descriptions and defaults
- Metadata Discovery: JSON reader loads test observations that record what metadata/package types each parser produces
- Linking: AST analyzer connects catalogers to their config structs by examining constructor parameters
- Merge: Discovered data combines with existing YAML, preserving all manually-maintained capability sections
- Validation: Comprehensive tests ensure the output is complete and synchronized with codebase
The packages.yaml File
Purpose
internal/capabilities/packages.yaml is the canonical documentation of:
- Every cataloger in syft
- What files/patterns each cataloger detects
- What metadata and package types each cataloger produces
- What capabilities each cataloger has (licenses, dependencies, etc.)
- How configuration affects these capabilities
Structure
# File header with usage instructions (AUTO-GENERATED)
application: # AUTO-GENERATED
# Application-level config keys with descriptions
- key: golang.search-local-mod-cache-licenses
description: search for go package licences in the GOPATH...
default_value: false
configs: # AUTO-GENERATED
# Config struct definitions
golang.CatalogerConfig:
fields:
- key: SearchLocalModCacheLicenses
description: SearchLocalModCacheLicenses enables...
app_key: golang.search-local-mod-cache-licenses
catalogers: # Mixed AUTO-GENERATED structure, MANUAL capabilities
- ecosystem: golang # MANUAL
name: go-module-cataloger # AUTO-GENERATED
type: generic # AUTO-GENERATED
source: # AUTO-GENERATED
file: syft/pkg/cataloger/golang/cataloger.go
function: NewGoModuleBinaryCataloger
config: golang.CatalogerConfig # AUTO-GENERATED
selectors: [go, golang, ...] # AUTO-GENERATED
parsers: # AUTO-GENERATED structure
- function: parseGoMod # AUTO-GENERATED
detector: # AUTO-GENERATED
method: glob
criteria: ["**/go.mod"]
metadata_types: # AUTO-GENERATED
- pkg.GolangModuleEntry
package_types: # AUTO-GENERATED
- go-module
json_schema_types: # AUTO-GENERATED
- GolangModEntry
capabilities: # MANUAL - preserved across regeneration
- name: license
default: false
conditions:
- when: {SearchRemoteLicenses: true}
value: true
comment: fetches licenses from proxy.golang.org
- name: dependency.depth
default: [direct, indirect]
- name: dependency.edges
default: complete
AUTO-GENERATED vs MANUAL Fields
AUTO-GENERATED Fields
These are updated on every regeneration:
Cataloger Level:
- name - cataloger identifier
- type - "generic" or "custom"
- source.file - source file path
- source.function - constructor function name
- config - linked config struct name
- selectors - tags from task factories
Parser Level (generic catalogers):
- function - parser function name (as used in the generic cataloger)
- detector.method - glob/path/mimetype
- detector.criteria - patterns matched
- metadata_types - from test-observations.json
- package_types - from test-observations.json
- json_schema_types - converted from metadata_types
Custom Cataloger Level:
- metadata_types - from test-observations.json
- package_types - from test-observations.json
- json_schema_types - converted from metadata_types
Sections:
- The entire application: section, a flat mapping of the application config keys relevant to catalogers
- The entire configs: section, a flat mapping of the API-level cataloger config keys for each cataloger (map of maps)
MANUAL Fields
These are preserved across regeneration and must be edited by hand:
- ecosystem - ecosystem/language identifier (cataloger level)
- capabilities - capability definitions with conditions
- detectors - for custom catalogers (except binary-classifier-cataloger)
- conditions on detectors - when a detector is active based on config
How Regeneration Works
When you run go generate ./internal/capabilities, the generator:
- Loads the existing YAML into both a struct (for logic) and a node tree (for comment preservation)
- Discovers all cataloger data from source code and tests
- Merges discovered data with the existing data:
  - Updates AUTO-GENERATED fields
  - Preserves all MANUAL fields (capabilities, ecosystem, etc.)
- Adds annotations (# AUTO-GENERATED, # MANUAL) to field comments
- Writes back using the node tree to preserve all comments
- Validates the result with completeness tests
Note
Don't forget to update the test observation evidence by running go test ./syft/pkg/... before regenerating.
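The merge rule can be sketched in Go. This is a minimal illustration, not the generator's code. Entry, Parser and merge are hypothetical names, capabilities are reduced to plain strings, and the yaml.v3 node-tree handling that preserves comments is left out. In the sketch, discovered data supplies every AUTO-GENERATED field. MANUAL fields are copied from the existing entry, with parser capabilities matched by parser function name. Existing entries that were not discovered are reported as orphans and dropped from the output.

```go
package main

import "fmt"

// Parser is a hypothetical, simplified parser entry.
type Parser struct {
	Function      string   // AUTO-GENERATED
	MetadataTypes []string // AUTO-GENERATED
	Capabilities  []string // MANUAL (simplified to capability names)
}

// Entry is a hypothetical, simplified cataloger entry.
type Entry struct {
	Name         string   // AUTO-GENERATED
	Type         string   // AUTO-GENERATED
	Selectors    []string // AUTO-GENERATED
	Ecosystem    string   // MANUAL
	Capabilities []string // MANUAL
	Parsers      []Parser
}

// merge starts from the discovered entries (so AUTO-GENERATED fields always
// reflect the code), copies MANUAL fields from the existing YAML entries,
// and reports existing entries that were not discovered as orphans.
func merge(existing, discovered []Entry) (merged []Entry, orphans []string) {
	byName := map[string]Entry{}
	for _, e := range existing {
		byName[e.Name] = e
	}
	seen := map[string]bool{}
	for _, d := range discovered {
		seen[d.Name] = true
		out := d
		if old, ok := byName[d.Name]; ok {
			out.Ecosystem = old.Ecosystem
			out.Capabilities = old.Capabilities
			// Parser capabilities are matched by parser function name.
			oldCaps := map[string][]string{}
			for _, p := range old.Parsers {
				oldCaps[p.Function] = p.Capabilities
			}
			out.Parsers = make([]Parser, len(d.Parsers))
			for i, p := range d.Parsers {
				p.Capabilities = oldCaps[p.Function]
				out.Parsers[i] = p
			}
		}
		merged = append(merged, out)
	}
	for _, e := range existing {
		if !seen[e.Name] {
			orphans = append(orphans, e.Name)
		}
	}
	return merged, orphans
}

func main() {
	existing := []Entry{
		{Name: "go-module-cataloger", Ecosystem: "golang",
			Parsers: []Parser{{Function: "parseGoMod", Capabilities: []string{"license"}}}},
		{Name: "removed-cataloger", Ecosystem: "example"},
	}
	discovered := []Entry{
		{Name: "go-module-cataloger", Type: "generic", Selectors: []string{"go", "golang"},
			Parsers: []Parser{{Function: "parseGoMod", MetadataTypes: []string{"pkg.GolangModuleEntry"}}}},
	}
	merged, orphans := merge(existing, discovered)
	fmt.Printf("%+v\n", merged[0])
	fmt.Println("orphans:", orphans)
}
```

Starting from the discovered entry rather than the existing one means a stale AUTO-GENERATED value can never survive regeneration. Only the fields that are explicitly copied carry over.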
Generation Process
High-Level Workflow
1. Discovery Phase
├─ Parse cataloger source code (AST)
├─ Find all parsers and detectors
├─ Read test observations for metadata types
├─ Discover config structs and fields
├─ Discover app-level configurations
└─ Link catalogers to their configs
2. Merge Phase
├─ Load existing packages.yaml
├─ Process each cataloger:
│ ├─ Update AUTO-GENERATED fields
│ └─ Preserve MANUAL fields
├─ Add new catalogers
└─ Detect orphaned entries
3. Write Phase
├─ Update YAML node tree in-place
├─ Add field annotations
└─ Write to disk
4. Validation Phase
├─ Check all catalogers present
├─ Check metadata/package type coverage
└─ Run completeness tests
Detailed Discovery Processes
1. Generic Cataloger Discovery (discover_catalogers.go)
What it finds: catalogers using the generic.NewCataloger() pattern
Process:
- Walk syft/pkg/cataloger/ recursively for .go files
- Parse each file with the Go AST parser (go/ast, go/parser)
- Find functions matching the pattern New*Cataloger() pkg.Cataloger
- Within the function body, find the generic.NewCataloger(name, ...) call
- Extract the cataloger name from the first argument
- Find all chained WithParserBy*() calls:
    generic.NewCataloger("my-cataloger").
        WithParserByGlobs(parseMyFormat, "**/*.myformat").
        WithParserByMimeTypes(parseMyBinary, "application/x-mytype")
- For each parser call:
  - Extract the parser function name (e.g., parseMyFormat)
  - Extract the detection method (Globs/Path/MimeTypes)
  - Extract the criteria (patterns or MIME types)
  - Resolve constant references across files if needed
Output: map[string]DiscoveredCataloger with full parser information
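The steps above can be sketched with the standard go/ast and go/parser packages. This is a simplified illustration under stated assumptions, not discover_catalogers.go itself. ParserInfo, Cataloger and discover are hypothetical names. Criteria are only read from string literals, so the cross-file constant resolution mentioned above is not attempted. One detail matters in practice: ast.Inspect visits the outermost chained call first, so the sketch sorts parsers by the position of each WithParserBy* identifier to restore source order.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
	"strings"
)

// ParserInfo and Cataloger are hypothetical stand-ins for discovered data.
type ParserInfo struct {
	Function string
	Method   string // e.g. "Globs", "Path", "MimeTypes"
	Criteria []string
	pos      token.Pos // position of the WithParserBy* identifier
}

type Cataloger struct {
	Name    string
	Parsers []ParserInfo
}

// stringLit returns the value of a string literal expression, or "".
func stringLit(e ast.Expr) string {
	lit, ok := e.(*ast.BasicLit)
	if !ok || lit.Kind != token.STRING {
		return ""
	}
	s, err := strconv.Unquote(lit.Value)
	if err != nil {
		return ""
	}
	return s
}

// discover finds New*Cataloger functions that call generic.NewCataloger and
// records each chained WithParserBy* call. Constants are not resolved.
func discover(src string) ([]Cataloger, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "cataloger.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []Cataloger
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Body == nil ||
			!strings.HasPrefix(fn.Name.Name, "New") || !strings.HasSuffix(fn.Name.Name, "Cataloger") {
			continue
		}
		var c Cataloger
		ast.Inspect(fn.Body, func(n ast.Node) bool {
			call, ok := n.(*ast.CallExpr)
			if !ok {
				return true
			}
			sel, ok := call.Fun.(*ast.SelectorExpr)
			if !ok || len(call.Args) == 0 {
				return true
			}
			if pkg, ok := sel.X.(*ast.Ident); ok && pkg.Name == "generic" && sel.Sel.Name == "NewCataloger" {
				c.Name = stringLit(call.Args[0])
			} else if strings.HasPrefix(sel.Sel.Name, "WithParserBy") {
				p := ParserInfo{Method: strings.TrimPrefix(sel.Sel.Name, "WithParserBy"), pos: sel.Sel.Pos()}
				if id, ok := call.Args[0].(*ast.Ident); ok {
					p.Function = id.Name
				}
				for _, arg := range call.Args[1:] {
					if s := stringLit(arg); s != "" {
						p.Criteria = append(p.Criteria, s)
					}
				}
				c.Parsers = append(c.Parsers, p)
			}
			return true
		})
		if c.Name == "" {
			continue
		}
		// Inspect visits the outermost chained call first; restore source order.
		sort.Slice(c.Parsers, func(i, j int) bool { return c.Parsers[i].pos < c.Parsers[j].pos })
		out = append(out, c)
	}
	return out, nil
}

const sample = `package mine

func NewMyCataloger() pkg.Cataloger {
	return generic.NewCataloger("my-cataloger").
		WithParserByGlobs(parseMyFormat, "**/*.myformat").
		WithParserByMimeTypes(parseMyBinary, "application/x-mytype")
}
`

func main() {
	cats, err := discover(sample)
	if err != nil {
		panic(err)
	}
	for _, c := range cats {
		fmt.Println(c.Name)
		for _, p := range c.Parsers {
			fmt.Printf("  %s by %s: %v\n", p.Function, p.Method, p.Criteria)
		}
	}
}
```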
2. Config Discovery (discover_cataloger_configs.go)
What it finds: cataloger configuration structs
Process:
- Find all .go files in syft/pkg/cataloger/*/
- Look for structs with "Config" in their name
- For each config struct:
  - Extract struct fields
  - Look for // app-config: key.name annotations in field comments
  - Extract field descriptions from doc comments
- Filter results by whitelist (only configs referenced in pkgcataloging.Config)
Example source:
type CatalogerConfig struct {
// SearchLocalModCacheLicenses enables searching for go package licenses
// in the local GOPATH mod cache.
// app-config: golang.search-local-mod-cache-licenses
SearchLocalModCacheLicenses bool
}
Output: map[string]ConfigInfo with field details and app-config keys
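Splitting a field's doc comment into a description and an app-config key can be sketched with go/parser in ParseComments mode. In that mode each struct field's leading comment is available as ast.Field.Doc. This is a hypothetical simplification: FieldInfo and configFields are illustrative names, and the allowlist filter against pkgcataloging.Config is omitted.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// FieldInfo is a hypothetical stand-in for a discovered config field.
type FieldInfo struct {
	Name        string
	Description string
	AppKey      string
}

const annotation = "app-config:"

// configFields returns the fields of every struct whose name contains
// "Config", splitting each field's doc comment into description lines and
// an optional "// app-config: key" annotation.
func configFields(src string) (map[string][]FieldInfo, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "config.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	out := map[string][]FieldInfo{}
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || !strings.Contains(ts.Name.Name, "Config") {
			return true
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, f := range st.Fields.List {
			var desc []string
			var key string
			if f.Doc != nil {
				for _, c := range f.Doc.List {
					text := strings.TrimSpace(strings.TrimPrefix(c.Text, "//"))
					if strings.HasPrefix(text, annotation) {
						key = strings.TrimSpace(strings.TrimPrefix(text, annotation))
					} else {
						desc = append(desc, text)
					}
				}
			}
			for _, name := range f.Names {
				out[ts.Name.Name] = append(out[ts.Name.Name], FieldInfo{
					Name:        name.Name,
					Description: strings.Join(desc, " "),
					AppKey:      key,
				})
			}
		}
		return true
	})
	return out, nil
}

const sample = `package golang

type CatalogerConfig struct {
	// SearchLocalModCacheLicenses enables searching for go package licenses
	// in the local GOPATH mod cache.
	// app-config: golang.search-local-mod-cache-licenses
	SearchLocalModCacheLicenses bool
}
`

func main() {
	fields, err := configFields(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", fields["CatalogerConfig"])
}
```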
3. App Config Discovery (discover_app_config.go)
What it finds: application-level configuration from the options package
Process:
- Parse cmd/syft/internal/options/catalog.go to find the Catalog struct
- Extract ecosystem config fields (e.g., Golang golangConfig)
- For each ecosystem:
  - Find the config file (e.g., golang.go)
  - Parse the config struct
  - Find the DescribeFields() []FieldDescription method
  - Extract field descriptions from the returned descriptions
  - Find the default*Config() function and extract default values
- Build full key paths (e.g., golang.search-local-mod-cache-licenses)
Example source:
// golang.go
type golangConfig struct {
SearchLocalModCacheLicenses bool `yaml:"search-local-mod-cache-licenses" ...`
}
func (c golangConfig) DescribeFields(opts ...options.DescribeFieldsOption) []options.FieldDescription {
return []options.FieldDescription{
{
Name: "search-local-mod-cache-licenses",
Description: "search for go package licences in the GOPATH...",
},
}
}
Output: []AppConfigField with keys, descriptions, and defaults
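The "build full key paths" step joins the ecosystem's yaml key with each field's yaml key. The generator does this with AST parsing. To keep the example short, this sketch uses reflect over hypothetical mirror structs that carry only yaml tags. golangConfig and Catalog here are simplified copies, not the real option types, and appKeys and tagName are illustrative names.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Hypothetical mirrors of the options structs, reduced to their yaml tags.
type golangConfig struct {
	SearchLocalModCacheLicenses bool `yaml:"search-local-mod-cache-licenses"`
	SearchRemoteLicenses        bool `yaml:"search-remote-licenses"`
}

type Catalog struct {
	Golang golangConfig `yaml:"golang"`
}

// tagName returns the yaml key of a field, ignoring options like ",omitempty".
func tagName(tag reflect.StructTag) string {
	name, _, _ := strings.Cut(tag.Get("yaml"), ",")
	if name == "-" {
		return ""
	}
	return name
}

// appKeys joins each ecosystem's yaml key with its fields' yaml keys.
func appKeys(catalog any) []string {
	var keys []string
	t := reflect.TypeOf(catalog)
	for i := 0; i < t.NumField(); i++ {
		eco := t.Field(i)
		prefix := tagName(eco.Tag)
		if prefix == "" || eco.Type.Kind() != reflect.Struct {
			continue
		}
		for j := 0; j < eco.Type.NumField(); j++ {
			if name := tagName(eco.Type.Field(j).Tag); name != "" {
				keys = append(keys, prefix+"."+name)
			}
		}
	}
	return keys
}

func main() {
	for _, k := range appKeys(Catalog{}) {
		fmt.Println(k)
	}
}
```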
4. Cataloger-Config Linking (cataloger_config_linking.go)
What it finds: which config struct each cataloger uses
Process:
- For each discovered cataloger, find its constructor function
- Extract the first parameter type from the function signature
- Filter for types that look like configs (contain "Config")
- Build mapping: cataloger name → config struct name
- Apply manual overrides from the catalogerConfigOverrides map
- Apply exceptions from the catalogerConfigExceptions set
Example:
// Constructor signature:
func NewGoModuleBinaryCataloger(cfg golang.CatalogerConfig) pkg.Cataloger
// Results in link:
"go-module-binary-cataloger" → "golang.CatalogerConfig"
Output: map[string]string (cataloger → config mapping)
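The linking rule can be sketched with go/ast under stated assumptions. The constructor-to-cataloger-name mapping is passed in as a plain map, whereas the real tool gets it from discovery. Pointer parameters are ignored. Exceptions are applied after overrides, and that precedence is this sketch's choice. configType and link are hypothetical names. A same-package type such as CatalogerConfig is qualified with the file's package name, so it yields golang.CatalogerConfig just like the qualified form in the example above.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// configType returns the package-qualified type of fn's first parameter
// when that type name contains "Config", otherwise "".
func configType(fn *ast.FuncDecl, pkgName string) string {
	params := fn.Type.Params
	if params == nil || len(params.List) == 0 {
		return ""
	}
	var typeName string
	switch t := params.List[0].Type.(type) {
	case *ast.Ident: // same-package type, e.g. CatalogerConfig
		typeName = pkgName + "." + t.Name
	case *ast.SelectorExpr: // qualified type, e.g. golang.CatalogerConfig
		if x, ok := t.X.(*ast.Ident); ok {
			typeName = x.Name + "." + t.Sel.Name
		}
	}
	if !strings.Contains(typeName, "Config") {
		return ""
	}
	return typeName
}

// link maps cataloger names to config types, then applies overrides and
// exceptions. constructors maps constructor function name -> cataloger name.
func link(src string, constructors, overrides map[string]string, exceptions map[string]bool) (map[string]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "cataloger.go", src, 0)
	if err != nil {
		return nil, err
	}
	links := map[string]string{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		if name, known := constructors[fn.Name.Name]; known {
			if cfg := configType(fn, file.Name.Name); cfg != "" {
				links[name] = cfg
			}
		}
	}
	for name, cfg := range overrides {
		links[name] = cfg
	}
	for name, skip := range exceptions {
		if skip {
			delete(links, name)
		}
	}
	return links, nil
}

const sample = `package golang

func NewGoModuleBinaryCataloger(cfg CatalogerConfig) pkg.Cataloger { return nil }

func NewPlainCataloger() pkg.Cataloger { return nil }
`

// constructors is a hypothetical constructor -> cataloger name mapping.
var constructors = map[string]string{
	"NewGoModuleBinaryCataloger": "go-module-binary-cataloger",
	"NewPlainCataloger":          "plain-cataloger",
}

func main() {
	links, err := link(sample, constructors,
		map[string]string{"nix-store-cataloger": "nix.Config"},
		map[string]bool{"binary-classifier-cataloger": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(links)
}
```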
5. Metadata Discovery (discover_metadata.go)
What it finds: metadata types and package types each parser produces
Process:
- Find all test-fixtures/test-observations.json files
- Parse the JSON, which contains:
    {
      "package": "golang",
      "parsers": {
        "parseGoMod": {
          "metadata_types": ["pkg.GolangModuleEntry"],
          "package_types": ["go-module"]
        }
      },
      "catalogers": {
        "linux-kernel-cataloger": {
          "metadata_types": ["pkg.LinuxKernel"],
          "package_types": ["linux-kernel"]
        }
      }
    }
- Build an index by package name and parser function
- Apply to discovered catalogers:
  - Parser-level observations → attached to specific parsers
  - Cataloger-level observations → for custom catalogers
- Convert metadata types to JSON schema types using the packagemetadata registry
Why this exists: the AST parser can't determine what types a parser produces just by reading code. This information comes from test execution.
Output: populated MetadataTypes and PackageTypes on catalogers/parsers
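Reading and indexing the observation files needs only encoding/json. In this sketch the struct fields mirror the JSON shape shown above. Observations, TypeObservation, indexParsers and loadAll are hypothetical names, and only parser-level entries are indexed (cataloger-level entries would be handled the same way). The index key combines the package name with the parser function, because a function name like parseGoMod is only unique within its Go package.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// TypeObservation records what one parser or cataloger produced in tests.
type TypeObservation struct {
	MetadataTypes []string `json:"metadata_types"`
	PackageTypes  []string `json:"package_types"`
}

// Observations mirrors the shape of a test-observations.json file.
type Observations struct {
	Package    string                     `json:"package"`
	Parsers    map[string]TypeObservation `json:"parsers"`
	Catalogers map[string]TypeObservation `json:"catalogers"`
}

// indexParsers keys parser observations by "package/parserFunction".
func indexParsers(files ...[]byte) (map[string]TypeObservation, error) {
	idx := map[string]TypeObservation{}
	for _, data := range files {
		var obs Observations
		if err := json.Unmarshal(data, &obs); err != nil {
			return nil, err
		}
		for fn, o := range obs.Parsers {
			idx[obs.Package+"/"+fn] = o
		}
	}
	return idx, nil
}

// loadAll reads every observation file matching a glob pattern.
func loadAll(pattern string) ([][]byte, error) {
	paths, err := filepath.Glob(pattern)
	if err != nil {
		return nil, err
	}
	var files [][]byte
	for _, p := range paths {
		data, err := os.ReadFile(p)
		if err != nil {
			return nil, err
		}
		files = append(files, data)
	}
	return files, nil
}

const sample = `{
  "package": "golang",
  "parsers": {
    "parseGoMod": {"metadata_types": ["pkg.GolangModuleEntry"], "package_types": ["go-module"]},
    "parseGoSum": {"metadata_types": ["pkg.GolangModuleEntry"], "package_types": ["go-module"]}
  }
}`

func main() {
	// In a syft checkout one could instead call:
	// loadAll("syft/pkg/cataloger/*/test-fixtures/test-observations.json")
	idx, err := indexParsers([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", idx["golang/parseGoMod"])
}
```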
Input Sources
1. Source Code Inputs
Cataloger Constructors (syft/pkg/cataloger/*/cataloger.go)
What's extracted:
- Cataloger names
- Parser function names
- Detection methods (glob, path, mimetype)
- Detection criteria (patterns)
Example:
func NewGoModuleBinaryCataloger() pkg.Cataloger {
return generic.NewCataloger("go-module-binary-cataloger").
WithParserByGlobs(parseGoBin, "**/go.mod").
WithParserByMimeTypes(parseGoArchive, "application/x-archive")
}
Config Structs (syft/pkg/cataloger/*/config.go)
What's extracted:
- Config struct fields
- Field types
- Field descriptions from comments
- App-config key mappings from annotations
Example:
type CatalogerConfig struct {
// SearchRemoteLicenses enables downloading go package licenses from the upstream
// go proxy (typically proxy.golang.org).
// app-config: golang.search-remote-licenses
SearchRemoteLicenses bool
// LocalModCacheDir specifies the location of the local go module cache directory.
// When not set, syft will attempt to discover the GOPATH env or default to $HOME/go.
// app-config: golang.local-mod-cache-dir
LocalModCacheDir string
}
Options Package (cmd/syft/internal/options/)
What's extracted:
- Ecosystem config structs
- App-level configuration keys
- Field descriptions from DescribeFields() methods
- Default values from default*Config() functions
Example:
// catalog.go
type Catalog struct {
Golang golangConfig `yaml:"golang" json:"golang" mapstructure:"golang"`
}
// golang.go
func (c golangConfig) DescribeFields(opts ...options.DescribeFieldsOption) []options.FieldDescription {
return []options.FieldDescription{
{
Name: "search-remote-licenses",
Description: "search for go package licences by retrieving the package from a network proxy",
},
}
}
2. Test-Driven Inputs
test-observations.json Files
Location: syft/pkg/cataloger/*/test-fixtures/test-observations.json
Purpose: records what metadata and package types each parser produces during test execution
How they're generated: automatically by the pkgtest.CatalogTester helpers when tests run
Example test code:
func TestGoModuleCataloger(t *testing.T) {
tester := NewGoModuleBinaryCataloger()
pkgtest.NewCatalogTester().
FromDirectory(t, "test-fixtures/go-module-fixture").
TestCataloger(t, tester) // Auto-writes observations on first run
}
Example observations file:
{
"package": "golang",
"parsers": {
"parseGoMod": {
"metadata_types": ["pkg.GolangModuleEntry"],
"package_types": ["go-module"]
},
"parseGoSum": {
"metadata_types": ["pkg.GolangModuleEntry"],
"package_types": ["go-module"]
}
}
}
Why this exists:
- Metadata types can't be determined from AST parsing alone
- Ensures tests use the pkgtest helpers (enforced by TestCatalogersHaveTestObservations)
- Provides test coverage visibility
3. Syft Runtime Inputs
Task Factories (allPackageCatalogerInfo())
What's extracted:
- Canonical list of all catalogers (ensures sync with binary)
- Selectors (tags) for each cataloger
Example:
info := cataloger.CatalogerInfo{
Name: "go-module-binary-cataloger",
Selectors: []string{"go", "golang", "binary", "language", "package"},
}
4. Global Configuration Variables
Merge Logic Overrides (merge.go)
// catalogerTypeOverrides forces a specific cataloger type when discovery gets it wrong
var catalogerTypeOverrides = map[string]string{
"java-archive-cataloger": "custom", // technically generic but treated as custom
}
// catalogerConfigExceptions lists catalogers that should NOT have config linked
var catalogerConfigExceptions = strset.New(
"binary-classifier-cataloger",
)
// catalogerConfigOverrides manually specifies config when linking fails
var catalogerConfigOverrides = map[string]string{
"dotnet-portable-executable-cataloger": "dotnet.CatalogerConfig",
"nix-store-cataloger": "nix.Config",
}
When to update:
- Add to catalogerTypeOverrides when a cataloger's type is misdetected
- Add to catalogerConfigExceptions when a cataloger shouldn't have config
- Add to catalogerConfigOverrides when automatic config linking fails
Completeness Test Configuration (completeness_test.go)
// requireParserObservations controls observation validation strictness
// - true: fail if ANY parser is missing observations (strict)
// - false: only check custom catalogers (lenient, current mode)
const requireParserObservations = false
// metadataTypeCoverageExceptions lists metadata types allowed to not be documented
var metadataTypeCoverageExceptions = strset.New(
reflect.TypeOf(pkg.MicrosoftKbPatch{}).Name(),
)
// packageTypeCoverageExceptions lists package types allowed to not be documented
var packageTypeCoverageExceptions = strset.New(
string(pkg.JenkinsPluginPkg),
string(pkg.KbPkg),
)
// observationExceptions maps cataloger/parser names to observation types to skip
// - nil value: skip ALL observation checks for this cataloger/parser
// - set value: skip only specified observation types
var observationExceptions = map[string]*strset.Set{
"graalvm-native-image-cataloger": nil, // skip all checks
"linux-kernel-cataloger": strset.New("relationships"), // skip only relationships
}
When to update:
- Add to metadataTypeCoverageExceptions or packageTypeCoverageExceptions when a type is intentionally not documented
- Add to observationExceptions when a cataloger lacks reliable test fixtures
- Set requireParserObservations = true when ready to enforce full parser coverage
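The nil-versus-set semantics of observationExceptions come down to a two-value map lookup. The lookup distinguishes "listed with a nil set" (skip everything) from "not listed" (check everything). In this sketch a map[string]bool stands in for *strset.Set. shouldSkip and the observation type names such as "metadata_types" are hypothetical.

```go
package main

import "fmt"

// observationExceptions mirrors the completeness_test.go variable, using a
// map[string]bool as a stand-in for *strset.Set. A nil set skips everything.
var observationExceptions = map[string]map[string]bool{
	"graalvm-native-image-cataloger": nil,
	"linux-kernel-cataloger":         {"relationships": true},
}

// shouldSkip reports whether checks of observationType are skipped for name.
func shouldSkip(name, observationType string) bool {
	set, listed := observationExceptions[name]
	if !listed {
		return false // not an exception: always check
	}
	if set == nil {
		return true // nil value: skip all checks
	}
	return set[observationType] // skip only the listed types
}

func main() {
	fmt.Println(shouldSkip("graalvm-native-image-cataloger", "metadata_types"))
	fmt.Println(shouldSkip("linux-kernel-cataloger", "relationships"))
	fmt.Println(shouldSkip("linux-kernel-cataloger", "metadata_types"))
}
```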
Completeness Tests
Purpose
The completeness_test.go file ensures packages.yaml stays in perfect sync with the codebase. These tests catch:
- New catalogers that haven't been documented
- Orphaned cataloger entries (cataloger was removed but YAML wasn't updated)
- Missing metadata/package type documentation
- Invalid capability field references
- Catalogers not using test helpers
Test Categories
1. Synchronization Tests
TestCatalogersInSync
- Ensures all catalogers from syft cataloger list appear in the YAML
- Ensures all catalogers in the YAML exist in the binary
- Ensures all capabilities sections are filled (no TODOs/nulls)
Failure means: you added/removed a cataloger but didn't regenerate packages.yaml
Fix: run go generate ./internal/capabilities
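The first two checks in TestCatalogersInSync amount to a two-way set difference between the cataloger names the binary reports and the names listed in packages.yaml. A minimal sketch follows. syncDiff is a hypothetical helper, and the real test gets its inputs from the task factories and the parsed YAML.

```go
package main

import (
	"fmt"
	"sort"
)

// syncDiff compares cataloger names known to the binary with names in
// packages.yaml. missing: in the binary but not the YAML (new cataloger).
// orphaned: in the YAML but not the binary (removed cataloger).
func syncDiff(binary, yaml []string) (missing, orphaned []string) {
	inYAML := map[string]bool{}
	for _, n := range yaml {
		inYAML[n] = true
	}
	inBinary := map[string]bool{}
	for _, n := range binary {
		inBinary[n] = true
		if !inYAML[n] {
			missing = append(missing, n)
		}
	}
	for _, n := range yaml {
		if !inBinary[n] {
			orphaned = append(orphaned, n)
		}
	}
	sort.Strings(missing)
	sort.Strings(orphaned)
	return missing, orphaned
}

func main() {
	missing, orphaned := syncDiff(
		[]string{"go-module-binary-cataloger", "my-new-cataloger"},
		[]string{"go-module-binary-cataloger", "removed-cataloger"},
	)
	fmt.Println("missing from YAML:", missing)
	fmt.Println("orphaned in YAML:", orphaned)
}
```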
TestCapabilitiesAreUpToDate
- Runs only in CI
- Ensures regeneration succeeds
- Ensures generated file has no uncommitted changes
Failure means: packages.yaml wasn't regenerated after code changes
Fix: run go generate ./internal/capabilities and commit changes
2. Coverage Tests
TestPackageTypeCoverage
- Ensures all types in pkg.AllPkgs are documented in some cataloger
- Allows exceptions via packageTypeCoverageExceptions
Failure means: you added a new package type but no cataloger documents it
Fix: either add a cataloger entry or add to exceptions if intentionally not supported
TestMetadataTypeCoverage
- Ensures all types in packagemetadata.AllTypes() are documented
- Allows exceptions via metadataTypeCoverageExceptions
Failure means: you added a new metadata type but no cataloger produces it
Fix: either add metadata_types to a cataloger or add to exceptions
TestMetadataTypesHaveJSONSchemaTypes
- Ensures metadata_types and json_schema_types are synchronized
- Validates every metadata type has a corresponding json_schema_type with correct conversion
- Checks both cataloger-level and parser-level types
Failure means: metadata_types and json_schema_types are out of sync
Fix: run go generate ./internal/capabilities to regenerate synchronized types
3. Structure Tests
TestCatalogerStructure
- Validates generic vs custom cataloger structure rules:
- Generic catalogers must have parsers, no cataloger-level capabilities
- Custom catalogers must have detectors and cataloger-level capabilities
- Ensures ecosystem is always set
Failure means: cataloger structure doesn't follow conventions
Fix: correct the cataloger structure in packages.yaml
TestCatalogerDataQuality
- Checks for duplicate cataloger names
- Validates detector formats for custom catalogers
- Checks for duplicate parser functions within catalogers
Failure means: data integrity issue in packages.yaml
Fix: remove duplicates or fix detector definitions
4. Config Tests
TestConfigCompleteness
- Ensures all configs in the configs: section are referenced by a cataloger
- Ensures all cataloger config references exist
- Ensures all app-key references exist in the application: section
Failure means: orphaned config or broken reference
Fix: remove unused configs or add missing entries
TestAppConfigFieldsHaveDescriptions
- Ensures all application config fields have descriptions
Failure means: missing DescribeFields() entry
Fix: add description in the ecosystem's DescribeFields() method
TestAppConfigKeyFormat
- Validates that config keys follow the format ecosystem.field-name
- Ensures kebab-case (no underscores or spaces)
Failure means: malformed config key
Fix: rename the config key to follow conventions
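One way to express the ecosystem.field-name rule is a regular expression over lowercase kebab-case segments joined by dots. This pattern is an assumption, not the one in completeness_test.go. It requires at least one dot and also allows deeper nesting, so the real check may be stricter or looser. validKey is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// keyPattern: lowercase kebab-case segments joined by dots, with at least one
// dot (ecosystem.field-name). Underscores, spaces and capitals are rejected.
var keyPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+$`)

func validKey(key string) bool {
	return keyPattern.MatchString(key)
}

func main() {
	for _, k := range []string{
		"golang.search-local-mod-cache-licenses",
		"golang.search_remote_licenses",
		"golang",
	} {
		fmt.Printf("%-40s %v\n", k, validKey(k))
	}
}
```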
5. Capability Tests
TestCapabilityConfigFieldReferences
- Validates that config fields referenced in capability conditions actually exist
- Checks both cataloger-level and parser-level capabilities
Example failure:
capabilities:
- name: license
conditions:
- when: {NonExistentField: true} # ← this field doesn't exist in config struct
value: true
Fix: correct the field name to match the actual config struct
TestCapabilityFieldNaming
- Ensures capability field names follow known patterns:
  - license
  - dependency.depth
  - dependency.edges
  - dependency.kinds
  - package_manager.files.listing
  - package_manager.files.digests
  - package_manager.package_integrity_hash
Failure means: typo in capability field name
Fix: correct the typo or add new field to known list
TestCapabilityValueTypes
- Validates capability values match expected types:
  - Boolean fields: license, package_manager.*
  - Array fields: dependency.depth, dependency.kinds
  - String fields: dependency.edges
Example failure:
capabilities:
- name: license
default: "yes" # ← should be boolean true/false
Fix: use correct type for the field
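A check along these lines can be sketched with a type switch. It assumes the capability value was decoded into an interface{} the way a YAML decoder typically does it: bool, string, or []any for sequences. checkValueType is a hypothetical name and is not the validateCapabilityValueType() function in completeness_test.go. The accepted dependency.edges values are the four documented under Capabilities Format and Guidelines.

```go
package main

import (
	"fmt"
	"strings"
)

// checkValueType validates a capability value decoded into interface{}.
func checkValueType(field string, v any) error {
	switch {
	case field == "license" || strings.HasPrefix(field, "package_manager."):
		if _, ok := v.(bool); !ok {
			return fmt.Errorf("%s: want bool, got %T", field, v)
		}
	case field == "dependency.depth" || field == "dependency.kinds":
		items, ok := v.([]any)
		if !ok {
			return fmt.Errorf("%s: want array, got %T", field, v)
		}
		for _, item := range items {
			if _, ok := item.(string); !ok {
				return fmt.Errorf("%s: want string elements, got %T", field, item)
			}
		}
	case field == "dependency.edges":
		s, ok := v.(string)
		if !ok {
			return fmt.Errorf("%s: want string, got %T", field, v)
		}
		switch s {
		case "", "flat", "reduced", "complete":
		default:
			return fmt.Errorf("%s: unknown value %q", field, s)
		}
	default:
		return fmt.Errorf("unknown capability field %q", field)
	}
	return nil
}

func main() {
	fmt.Println(checkValueType("license", "yes"))
	fmt.Println(checkValueType("dependency.depth", []any{"direct", "indirect"}))
}
```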
TestCapabilityEvidenceFieldReferences
- Validates that evidence references point to real struct fields
- Uses AST parsing to verify field paths exist
Example:
capabilities:
- name: package_manager.files.digests
default: true
evidence:
- AlpmDBEntry.Files[].Digests # ← validates this path exists
Failure means: typo in evidence reference or struct was changed
Fix: correct the evidence reference or update after struct changes
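Resolving an evidence path means walking it segment by segment. Each segment must name a struct field, and a [] suffix steps into the element type of a slice or array field. The real test does this with AST parsing. For brevity this sketch uses reflect over hypothetical, simplified mirrors of the metadata structs (these are not syft's actual definitions). checkEvidence and knownTypes are also hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Hypothetical, simplified mirrors of metadata structs.
type Digest struct {
	Algorithm string
	Value     string
}

type AlpmFileRecord struct {
	Path    string
	Digests []Digest
}

type AlpmDBEntry struct {
	Package string
	Files   []AlpmFileRecord
}

var knownTypes = map[string]reflect.Type{
	"AlpmDBEntry": reflect.TypeOf(AlpmDBEntry{}),
}

// checkEvidence resolves a reference such as "AlpmDBEntry.Files[].Digests".
func checkEvidence(ref string) error {
	parts := strings.Split(ref, ".")
	t, ok := knownTypes[parts[0]]
	if !ok {
		return fmt.Errorf("unknown type %q", parts[0])
	}
	for _, seg := range parts[1:] {
		name := strings.TrimSuffix(seg, "[]")
		if t.Kind() == reflect.Ptr {
			t = t.Elem()
		}
		if t.Kind() != reflect.Struct {
			return fmt.Errorf("%q: %s is not a struct", seg, t)
		}
		field, ok := t.FieldByName(name)
		if !ok {
			return fmt.Errorf("%s has no field %q", t.Name(), name)
		}
		t = field.Type
		if strings.HasSuffix(seg, "[]") {
			if t.Kind() != reflect.Slice && t.Kind() != reflect.Array {
				return fmt.Errorf("%q is not a slice or array", name)
			}
			t = t.Elem() // step into the element type
		}
	}
	return nil
}

func main() {
	fmt.Println(checkEvidence("AlpmDBEntry.Files[].Digests"))
	fmt.Println(checkEvidence("AlpmDBEntry.Files[].Checksums"))
}
```

Requiring an explicit [] to cross a slice keeps references unambiguous. For example, AlpmDBEntry.Files.Digests fails because Files is a slice, not a struct.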
6. Observations Test
TestCatalogersHaveTestObservations
- Ensures all custom catalogers have test observations
- Optionally checks parsers (controlled by requireParserObservations)
- Allows exceptions via observationExceptions
Failure means: cataloger tests aren't using pkgtest helpers
Fix: update tests to use pkgtest.CatalogTester:
pkgtest.NewCatalogTester().
FromDirectory(t, "test-fixtures/my-fixture").
TestCataloger(t, myCataloger)
How to Fix Test Failures
General Approach
- Read the test error message: it usually tells you exactly what's wrong
- Check whether regeneration is needed: most failures are fixed by running go generate ./internal/capabilities
- Check for code/test changes: did you add or modify a cataloger?
- Consider exceptions: is this intentionally unsupported?
Common Failures and Fixes
| Failure | Most Likely Cause | Fix |
|---|---|---|
| Cataloger not in YAML | Added new cataloger | Regenerate |
| Orphaned YAML entry | Removed cataloger | Regenerate |
| Missing metadata type | Added type but no test observations | Add pkgtest usage or exception |
| Missing observations | Test not using pkgtest | Update test to use CatalogTester |
| Config field reference | Typo in capability condition | Fix field name in YAML |
| Incomplete capabilities | Missing capability definition | Add capabilities section to YAML |
Manual Maintenance
What Requires Manual Editing
These fields in packages.yaml are MANUAL and must be maintained by hand:
1. Ecosystem Field (Cataloger Level)
catalogers:
- ecosystem: golang # MANUAL - identify the ecosystem
Guidelines: use the ecosystem/language name (golang, python, java, rust, etc.)
2. Capabilities Sections
For Generic Catalogers (parser level):
parsers:
- function: parseGoMod
capabilities: # MANUAL
- name: license
default: false
conditions:
- when: {SearchRemoteLicenses: true}
value: true
comment: fetches licenses from proxy.golang.org
- name: dependency.depth
default: [direct, indirect]
- name: dependency.edges
default: complete
For Custom Catalogers (cataloger level):
catalogers:
- name: linux-kernel-cataloger
type: custom
capabilities: # MANUAL
- name: license
default: true
3. Detectors for Custom Catalogers
For most custom catalogers:
detectors: # MANUAL
- method: glob
criteria:
- '**/lib/modules/**/modules.builtin'
comment: kernel modules directory
Exception: binary-classifier-cataloger has AUTO-GENERATED detectors extracted from source
4. Detector Conditions
When a detector should only be active with certain configuration:
detectors:
- method: glob
criteria: ['**/*.zip']
conditions: # MANUAL
- when: {IncludeZipFiles: true}
comment: ZIP detection requires explicit config
Capabilities Format and Guidelines
Standard Capability Fields
Boolean Fields:
- name: license
default: true # always available
# OR
default: false # never available
# OR
default: false
conditions:
- when: {SearchRemoteLicenses: true}
value: true
comment: requires network access to fetch licenses
Array Fields (dependency.depth):
- name: dependency.depth
default: [direct] # only immediate dependencies
# OR
default: [direct, indirect] # full transitive closure
# OR
default: [] # no dependency information
String Fields (dependency.edges):
- name: dependency.edges
default: "" # dependencies found but no edges between them
# OR
default: flat # single level of dependencies with edges to root only
# OR
default: reduced # transitive reduction (redundant edges removed)
# OR
default: complete # all relationships with accurate direct/indirect edges
Array Fields (dependency.kinds):
- name: dependency.kinds
default: [runtime] # production dependencies only
# OR
default: [runtime, dev] # production and development
# OR
default: [runtime, dev, build, test] # all dependency types
Using Conditions
Conditions allow capabilities to vary based on configuration values:
capabilities:
- name: license
default: false
conditions:
- when: {SearchLocalModCacheLicenses: true}
value: true
comment: searches for licenses in GOPATH mod cache
- when: {SearchRemoteLicenses: true}
value: true
comment: fetches licenses from proxy.golang.org
comment: license scanning requires configuration
Rules:
- Conditions are evaluated in array order (first match wins)
- Multiple fields WITHIN a when clause use AND logic (all must match)
- Multiple conditions in the array use OR logic (the first matching condition applies)
- If no conditions match, the default value is used
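These rules can be sketched as a small evaluator over simplified mirrors of the YAML format. Capability, Condition, matches and evaluate are hypothetical names, values are compared with reflect.DeepEqual, and a field absent from the config map is treated as not matching. The license capability below reuses the golang example from this section.

```go
package main

import (
	"fmt"
	"reflect"
)

// Condition and Capability are simplified mirrors of the YAML format.
type Condition struct {
	When  map[string]any
	Value any
}

type Capability struct {
	Name       string
	Default    any
	Conditions []Condition
}

// matches reports whether every field in when equals the config value (AND).
func matches(when, config map[string]any) bool {
	for field, want := range when {
		got, ok := config[field]
		if !ok || !reflect.DeepEqual(got, want) {
			return false
		}
	}
	return true
}

// evaluate returns the value of the first matching condition (OR across
// conditions, first match wins), falling back to the default.
func evaluate(c Capability, config map[string]any) any {
	for _, cond := range c.Conditions {
		if matches(cond.When, config) {
			return cond.Value
		}
	}
	return c.Default
}

var license = Capability{
	Name:    "license",
	Default: false,
	Conditions: []Condition{
		{When: map[string]any{"SearchLocalModCacheLicenses": true}, Value: true},
		{When: map[string]any{"SearchRemoteLicenses": true}, Value: true},
	},
}

func main() {
	fmt.Println(evaluate(license, map[string]any{"SearchRemoteLicenses": true}))
	fmt.Println(evaluate(license, map[string]any{}))
}
```

Because the first match wins, a more specific multi-field condition should be listed before a broader condition that shares a field with it.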
Adding Evidence
evidence documents which struct fields provide the capability:
- name: package_manager.files.listing
default: true
evidence:
- AlpmDBEntry.Files
comment: file listings stored in Files array
For nested fields:
evidence:
- CondaMetaPackage.PathsData.Paths
For array element fields:
evidence:
- AlpmDBEntry.Files[].Digests
Best Practices
- Be specific in comments: explain WHY, not just WHAT
- Document conditions clearly: explain what configuration enables the capability
- Use evidence references: helps verify capabilities are accurate
- Test after edits: run go test ./internal/capabilities/generate to validate
Development Workflows
Adding a New Cataloger
If Using generic.NewCataloger():
What happens automatically:
- Generator discovers the cataloger via AST parsing
- Extracts parsers, detectors, and patterns
- Adds entry to packages.yaml with structure
- Links to config (if constructor has config parameter)
- Extracts metadata types from test-observations.json (if test uses pkgtest)
What you must do manually:
- Run go generate ./internal/capabilities to create the entry
- Set the ecosystem field in packages.yaml
- Add capabilities sections to each parser
- Commit the updated packages.yaml
Example workflow:
# 1. Write cataloger code
vim syft/pkg/cataloger/mynew/cataloger.go
# 2. Write tests using pkgtest (generates observations)
vim syft/pkg/cataloger/mynew/cataloger_test.go
# 3. Run tests to generate observations
go test ./syft/pkg/cataloger/mynew
# 4. Regenerate packages.yaml
go generate ./internal/capabilities
# 5. Edit packages.yaml manually
vim internal/capabilities/packages.yaml
# - Set ecosystem field
# - Add capabilities sections
# 6. Validate
go test ./internal/capabilities/generate
# 7. Commit
git add internal/capabilities/packages.yaml
git add syft/pkg/cataloger/mynew/test-fixtures/test-observations.json
git commit
If Writing a Custom Cataloger:
What happens automatically:
- Generator creates entry with name and type
- Extracts metadata types from test-observations.json
What you must do manually:
- Run go generate ./internal/capabilities to create the entry
- Set ecosystem
- Add a detectors array with detection methods
- Add a capabilities section (cataloger level, not parser level)
Modifying an Existing Cataloger
If Changing Parser Detection Patterns:
Impact: AUTO-GENERATED field, automatically updated
Workflow:
# 1. Change the code
vim syft/pkg/cataloger/something/cataloger.go
# 2. Regenerate
go generate ./internal/capabilities
# 3. Review changes
git diff internal/capabilities/packages.yaml
# 4. Commit
git add internal/capabilities/packages.yaml
git commit
If Changing Metadata Type:
Impact: AUTO-GENERATED field, updated via test observations
Workflow:
# 1. Change the code
vim syft/pkg/cataloger/something/parser.go
# 2. Update tests (if needed)
vim syft/pkg/cataloger/something/parser_test.go
# 3. Run tests to update observations
go test ./syft/pkg/cataloger/something
# 4. Regenerate
go generate ./internal/capabilities
# 5. Commit
git add internal/capabilities/packages.yaml
git add syft/pkg/cataloger/something/test-fixtures/test-observations.json
git commit
If Changing Capabilities:
Impact: MANUAL field, preserved across regeneration
Workflow:
# 1. Edit packages.yaml directly
vim internal/capabilities/packages.yaml
# 2. Validate
go test ./internal/capabilities/generate
# 3. Commit
git commit internal/capabilities/packages.yaml
Adding New Capability Fields
If you need to add a completely new capability field (e.g., package_manager.build_tool_info):
Steps:
- Add the field name to the known fields in TestCapabilityFieldNaming (completeness_test.go)
- Add value type validation to validateCapabilityValueType() (completeness_test.go)
- Update the file header documentation in packages.yaml
- Add the field to relevant catalogers in packages.yaml
- Update any runtime code that consumes capabilities
When to Update Exceptions
Add to catalogerTypeOverrides:
- Discovery incorrectly classifies a cataloger's type
- Example: cataloger uses generic framework but behaves like custom
Add to catalogerConfigExceptions:
- Cataloger should not have config linked
- Example: simple catalogers with no configuration
Add to catalogerConfigOverrides:
- Automatic config linking fails
- Cataloger in a subpackage or unusual structure
- Example: dotnet catalogers split across multiple packages
Add to metadataTypeCoverageExceptions:
- Metadata type is deprecated or intentionally unused
- Example: MicrosoftKbPatch (special case type)
Add to packageTypeCoverageExceptions:
- Package type is deprecated or special case
- Example: JenkinsPluginPkg, KbPkg
Add to observationExceptions:
- Cataloger lacks reliable test fixtures (e.g., requires specific binaries)
- Cataloger produces relationships but they're not standard dependencies
- Example: graalvm-native-image-cataloger (requires native images)
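Each coverage exception set filters a "what exists minus what was observed" difference. The sketch below assumes plain string sets. This tool builds its exception variables with strset.New, and a map[string]struct{} stands in for it here to keep the example dependency-free. `uncovered` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// set is a minimal stand-in for strset.Set.
type set map[string]struct{}

func newSet(items ...string) set {
	s := set{}
	for _, it := range items {
		s[it] = struct{}{}
	}
	return s
}

// uncovered returns the sorted names in all that were neither observed in
// test observations nor listed as exceptions.
func uncovered(all []string, observed, exceptions set) []string {
	var missing []string
	for _, name := range all {
		if _, ok := observed[name]; ok {
			continue
		}
		if _, ok := exceptions[name]; ok {
			continue
		}
		missing = append(missing, name)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	all := []string{"AlpmDBEntry", "MicrosoftKbPatch", "MyNewType"}
	observed := newSet("AlpmDBEntry")
	exceptions := newSet("MicrosoftKbPatch")
	fmt.Println(uncovered(all, observed, exceptions)) // [MyNewType]
}
```

An exception hides a type from the coverage check entirely, so it should be reserved for the deprecated or special-case situations listed above.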
File Inventory
Core Generation
- main.go: entry point; orchestrates regeneration and prints status messages
- merge.go: core merging logic; preserves manual sections while updating auto-generated ones
- io.go: YAML reading/writing with comment preservation using gopkg.in/yaml.v3
Discovery
- discover_catalogers.go: AST parsing to discover generic catalogers and parsers from source code
- discover_cataloger_configs.go: AST parsing to discover cataloger config structs
- discover_app_config.go: AST parsing to discover application-level config from the options package
- cataloger_config_linking.go: links catalogers to config structs by analyzing constructors
- discover_metadata.go: reads test-observations.json files to get metadata/package types
Validation & Utilities
- completeness_test.go: comprehensive test suite ensuring packages.yaml is complete and in sync
- cataloger_names.go: helper to get all cataloger names from syft task factories
- metadata_check.go: validates metadata and package type coverage
Tests
- config_discovery_test.go: tests for config discovery
- cataloger_config_linking_test.go: tests for config linking
- detector_validation_test.go: tests for detector validation
- merge_test.go: tests for merge logic
Troubleshooting
"Cataloger X not found in packages.yaml"
Cause: you added a new cataloger but didn't regenerate packages.yaml
Fix:
go generate ./internal/capabilities
"Cataloger X in YAML but not in binary"
Cause: you removed a cataloger but didn't regenerate
Fix:
go generate ./internal/capabilities
# Review the diff - the cataloger entry should be removed
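Both cataloger-name errors come from comparing two name sets in each direction. One set holds the names from the syft task factories (cataloger_names.go), and the other holds the names in packages.yaml. The following is a minimal sketch. `diffNames` is hypothetical and the cataloger names are placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// diffNames returns names present in the binary but missing from the YAML
// ("not found in packages.yaml") and names present in the YAML but missing
// from the binary ("in YAML but not in binary"), each sorted.
func diffNames(binary, yamlNames []string) (missingFromYAML, staleInYAML []string) {
	inBinary := map[string]bool{}
	for _, n := range binary {
		inBinary[n] = true
	}
	inYAML := map[string]bool{}
	for _, n := range yamlNames {
		inYAML[n] = true
	}
	for n := range inBinary {
		if !inYAML[n] {
			missingFromYAML = append(missingFromYAML, n)
		}
	}
	for n := range inYAML {
		if !inBinary[n] {
			staleInYAML = append(staleInYAML, n)
		}
	}
	sort.Strings(missingFromYAML)
	sort.Strings(staleInYAML)
	return missingFromYAML, staleInYAML
}

func main() {
	fromBinary := []string{"existing-cataloger", "my-new-cataloger"}
	fromYAML := []string{"existing-cataloger", "removed-cataloger"}
	missing, stale := diffNames(fromBinary, fromYAML)
	fmt.Println(missing, stale) // [my-new-cataloger] [removed-cataloger]
}
```

Regeneration fixes both directions at once, because it rebuilds the YAML name set from the binary.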
"Metadata type X not represented in any cataloger"
Cause: you added a new metadata type but:
- No cataloger produces it yet, OR
- Tests don't use pkgtest helpers (so observations aren't generated)
Fix Option 1 - Add test observations:
// Update the test to use pkgtest
pkgtest.NewCatalogTester().
	FromDirectory(t, "test-fixtures/my-fixture").
	TestCataloger(t, myCataloger)
Then run the tests and regenerate:
# Run tests
go test ./syft/pkg/cataloger/mypackage
# Regenerate
go generate ./internal/capabilities
Fix Option 2 - Add exception (if intentionally unused):
// completeness_test.go
var metadataTypeCoverageExceptions = strset.New(
	reflect.TypeOf(pkg.MyNewType{}).Name(),
)
"Parser X has no test observations"
Cause: test doesn't use pkgtest helpers
Fix:
// Before:
func TestMyParser(t *testing.T) {
	// manual test code
}
// After:
func TestMyParser(t *testing.T) {
	cataloger := NewMyCataloger()
	pkgtest.NewCatalogTester().
		FromDirectory(t, "test-fixtures/my-fixture").
		TestCataloger(t, cataloger)
}
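A parser counts as observed only if some test-observations.json mentions it, which is why switching the test to pkgtest clears this error. The sketch below rests on a loudly hypothetical assumption. The JSON shape (a `parsers` object keyed by function name, each entry with `metadata_types`) is invented for illustration, and the real format read by discover_metadata.go may differ. `unobservedParsers` is also hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// observations models a HYPOTHETICAL test-observations.json layout.
type observations struct {
	Parsers map[string]struct {
		MetadataTypes []string `json:"metadata_types"`
	} `json:"parsers"`
}

// unobservedParsers returns the sorted parser functions from declared that
// have no entry in the decoded observations file.
func unobservedParsers(raw []byte, declared []string) ([]string, error) {
	var obs observations
	if err := json.Unmarshal(raw, &obs); err != nil {
		return nil, err
	}
	var missing []string
	for _, fn := range declared {
		if _, ok := obs.Parsers[fn]; !ok {
			missing = append(missing, fn)
		}
	}
	sort.Strings(missing)
	return missing, nil
}

func main() {
	raw := []byte(`{"parsers": {"parseLockfile": {"metadata_types": ["pkg.MyNewType"]}}}`)
	missing, err := unobservedParsers(raw, []string{"parseLockfile", "parseMyFormat"})
	if err != nil {
		panic(err)
	}
	fmt.Println(missing) // [parseMyFormat]
}
```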
"Config field X not found in struct Y"
Cause: capability condition references a non-existent config field
Fix: edit packages.yaml and correct the field name:
# Before:
conditions:
- when: {SerachRemoteLicenses: true} # typo!
# After:
conditions:
- when: {SearchRemoteLicenses: true}
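Config field names are learned by parsing config structs with the AST (discover_cataloger_configs.go), so a `when` key must match a declared field name exactly, including case. Below is a sketch of that lookup. `structFields` is hypothetical, and the `Config` source is an illustrative stand-in, not a real syft config.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// structFields parses Go source and returns the field names of the named
// struct type, or false if no such struct is declared.
func structFields(src, structName string) (map[string]bool, bool) {
	file, err := parser.ParseFile(token.NewFileSet(), "config.go", src, 0)
	if err != nil {
		return nil, false
	}
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue
		}
		for _, spec := range gen.Specs {
			ts := spec.(*ast.TypeSpec) // type declarations only hold TypeSpecs
			st, ok := ts.Type.(*ast.StructType)
			if !ok || ts.Name.Name != structName {
				continue
			}
			fields := map[string]bool{}
			for _, f := range st.Fields.List {
				for _, name := range f.Names { // embedded fields have no names
					fields[name.Name] = true
				}
			}
			return fields, true
		}
	}
	return nil, false
}

const configSrc = `package mypackage

type Config struct {
	SearchRemoteLicenses bool
	Proxy                string
}
`

func main() {
	fields, ok := structFields(configSrc, "Config")
	fmt.Println(ok, fields["SearchRemoteLicenses"], fields["SerachRemoteLicenses"]) // true true false
}
```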
"Evidence field X.Y not found in struct X"
Cause:
- Typo in evidence reference, OR
- Struct was refactored and field moved/renamed
Fix: edit packages.yaml and correct the evidence reference:
# Before:
evidence:
- AlpmDBEntry.FileListing # wrong field name
# After:
evidence:
- AlpmDBEntry.Files
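Evidence references are dotted paths rooted at a metadata struct name. The following sketch resolves one with reflection. It assumes local stand-ins for AlpmDBEntry and a file record type; their fields are illustrative, not syft's actual definitions. `resolveEvidence` is a hypothetical helper that also descends through pointers, slices, and nested structs.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// AlpmFileRecord and AlpmDBEntry are illustrative stand-ins only.
type AlpmFileRecord struct {
	Path string
}

type AlpmDBEntry struct {
	Package string
	Files   []AlpmFileRecord
}

// resolveEvidence checks that a reference like "AlpmDBEntry.Files.Path" names
// real fields, starting from the type of root.
func resolveEvidence(ref string, root any) error {
	parts := strings.Split(ref, ".")
	t := reflect.TypeOf(root)
	if parts[0] != t.Name() {
		return fmt.Errorf("struct %q not found", parts[0])
	}
	for _, name := range parts[1:] {
		// Look through pointers and slices to the element type.
		for t.Kind() == reflect.Pointer || t.Kind() == reflect.Slice {
			t = t.Elem()
		}
		if t.Kind() != reflect.Struct {
			return fmt.Errorf("cannot look up %q in non-struct %s", name, t)
		}
		f, ok := t.FieldByName(name)
		if !ok {
			return fmt.Errorf("evidence field %s not found in struct %s", name, t.Name())
		}
		t = f.Type
	}
	return nil
}

func main() {
	fmt.Println(resolveEvidence("AlpmDBEntry.Files", AlpmDBEntry{}))       // <nil>
	fmt.Println(resolveEvidence("AlpmDBEntry.FileListing", AlpmDBEntry{})) // evidence field FileListing not found in struct AlpmDBEntry
}
```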
"packages.yaml has uncommitted changes after regeneration"
Cause: packages.yaml is out of date (usually caught in CI)
Fix:
go generate ./internal/capabilities
git add internal/capabilities/packages.yaml
git commit -m "chore: regenerate capabilities"
Generator Fails with "struct X not found"
Cause: config linking is trying to link a cataloger to a struct that doesn't exist
Fix Option 1 - Add override:
// merge.go
var catalogerConfigOverrides = map[string]string{
	"my-cataloger": "mypackage.MyConfig",
}
Fix Option 2 - Add exception:
// merge.go
var catalogerConfigExceptions = strset.New(
	"my-cataloger", // doesn't use config
)
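The two fixes correspond to lookups that can short-circuit automatic linking. The sketch below assumes this precedence: exception first, then override, then the discovered link. The actual ordering in merge.go is not stated in this document. `resolveConfig` and its inputs are hypothetical, and a map replaces strset for the exceptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins shaped like the merge.go variables above.
var catalogerConfigOverrides = map[string]string{
	"my-cataloger": "mypackage.MyConfig",
}

var catalogerConfigExceptions = map[string]bool{
	"simple-cataloger": true,
}

// resolveConfig returns the config struct to link for a cataloger, or "" when
// it should have none. Assumed precedence: exception, override, discovery.
func resolveConfig(cataloger string, discovered map[string]string, knownStructs map[string]bool) (string, error) {
	if catalogerConfigExceptions[cataloger] {
		return "", nil
	}
	name, ok := catalogerConfigOverrides[cataloger]
	if !ok {
		name, ok = discovered[cataloger]
	}
	if !ok {
		return "", nil
	}
	if !knownStructs[name] {
		return "", errors.New("struct " + name + " not found")
	}
	return name, nil
}

func main() {
	known := map[string]bool{"mypackage.MyConfig": true}
	discovered := map[string]string{"my-cataloger": "mypackage.Wrong"}
	fmt.Println(resolveConfig("my-cataloger", discovered, known)) // mypackage.MyConfig <nil>
}
```

Under this ordering an override corrects a wrong link, while an exception removes the link altogether.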
"Parser capabilities must be defined"
Cause: parser in packages.yaml has no capabilities section
Fix: add capabilities to the parser:
parsers:
  - function: parseMyFormat
    capabilities:
      - name: license
        default: false
      - name: dependency.depth
        default: []
      # ... (add all required capability fields)
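This error comes from a presence check. Each parser must declare a capabilities section, and that section must cover every required field. The sketch below assumes the required list is just the two fields shown above, while the real list is longer. `requiredCapabilities` and `missingCapabilities` are hypothetical.

```go
package main

import "fmt"

// capability mirrors one entry of a parser's capabilities list.
type capability struct {
	Name    string
	Default any
}

// requiredCapabilities is a hypothetical, abbreviated required-field list.
var requiredCapabilities = []string{"license", "dependency.depth"}

// missingCapabilities reports an empty section, or any required names absent.
func missingCapabilities(function string, caps []capability) []string {
	if len(caps) == 0 {
		return []string{function + ": parser capabilities must be defined"}
	}
	present := map[string]bool{}
	for _, c := range caps {
		present[c.Name] = true
	}
	var problems []string
	for _, name := range requiredCapabilities {
		if !present[name] {
			problems = append(problems, function+": missing capability "+name)
		}
	}
	return problems
}

func main() {
	fmt.Println(missingCapabilities("parseMyFormat", nil))
	fmt.Println(missingCapabilities("parseMyFormat", []capability{{Name: "license", Default: false}})) // [parseMyFormat: missing capability dependency.depth]
}
```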
Understanding Error Messages
Most test failures include detailed guidance. Look for:
- List of missing items: tells you exactly what to add/remove
- Suggestions: usually includes the command to fix (e.g., "Run 'go generate ./internal/capabilities'")
- File locations: tells you which file to edit
General debugging approach:
- Read the full error message
- Check if it's fixed by regeneration
- Check for recent code/test changes
- Consider if it should be an exception
- Ask for help if still stuck (include full error message)
Questions or Issues?
If you encounter problems not covered here:
- Check test error messages (they're usually quite helpful)
- Look at recent commits for examples of similar changes
- Ask in the team chat with the full error message