oss/syft

mirror of https://github.com/anchore/syft.git synced 2025-11-17 08:23:15 +01:00

History

Alex Goodman a97e1c6e1a tweak diagram

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

2025-10-29 15:18:36 -04:00


Capabilities Generation System

This internal tool is responsible for:

partially generating the packages.yaml file, which documents the capabilities of each cataloger in syft
running completeness / consistency tests that check the claims in packages.yaml against actual test observations

Syft has dozens of catalogers across many ecosystems. Each cataloger has different capabilities, such as:

Some provide license information, others don't
Some detect transitive dependencies, others only direct
Some capabilities depend on configuration

The packages.yaml contains all of these capability claims.

The capabilities generation system itself:

Discovers cataloger information from source code using AST parsing
Extracts metadata about parsers, detectors, and configuration from code and tests
Merges discovered information with manually-maintained capability documentation
Validates that the generated document is complete and in sync with the codebase

Why do this? The short answer is to give the OSS documentation a solid foundation: facts about Syft's capabilities are derived from verifiable claims made by the tool itself, so the tool remains the source of truth.

Quick Start

To regenerate packages.yaml after code changes:

go generate ./internal/capabilities

To run validation of capability claims:

# update test evidence
go test ./syft/pkg/...

# check claims against test evidence
go test ./internal/capabilities/generate

Data Flow

graph LR
    subgraph "Source Code Inputs"
        A1[syft/pkg/cataloger/*/<br/>cataloger.go]
        A2[syft/pkg/cataloger/*/<br/>config.go]
        A3[cmd/syft/internal/options/<br/>catalog.go, ecosystem.go]
        A4[syft task factories<br/>AllCatalogers]
    end

    subgraph "Test Inputs"
        B1[test-fixtures/<br/>test-observations.json]
    end

    subgraph "Discovery Processes"
        C1[discover_catalogers.go<br/>AST Parse Catalogers]
        C2[discover_cataloger_configs.go<br/>AST Parse Configs]
        C3[discover_app_config.go<br/>AST Parse App Configs]
        C4[discover_metadata.go<br/>Read Observations]
        C5[cataloger_config_linking.go<br/>Link Catalogers to Configs]
        C6[cataloger_names.go<br/>Query Task Factories]
    end

    subgraph "Discovered Data"
        D1[Generic Catalogers<br/>name, parsers, detectors]
        D2[Config Structs<br/>fields, app-config keys]
        D3[App Config Fields<br/>keys, descriptions, defaults]
        D4[Metadata Types<br/>per parser/cataloger]
        D5[Package Types<br/>per parser/cataloger]
        D6[Cataloger-Config Links<br/>mapping]
        D7[Selectors<br/>tags per cataloger]
    end

    subgraph "Configuration/Overrides"
        E2[metadataTypeCoverageExceptions<br/>packageTypeCoverageExceptions<br/>observationExceptions]
        E1[catalogerTypeOverrides<br/>catalogerConfigOverrides<br/>catalogerConfigExceptions]
    end

    subgraph "Merge Process"
        F1[io.go<br/>Load Existing YAML]
        F2[merge.go<br/>Merge Logic]
        F3[Preserve MANUAL fields<br/>Update AUTO-GENERATED]
    end

    subgraph "Output"
        G1[packages.yaml<br/>Complete Catalog Document]
    end

    subgraph "Validation"
        H1[completeness_test.go<br/>Comprehensive Tests]
        H2[metadata_check.go<br/>Type Coverage]
    end

    A1 --> C1
    A2 --> C2
    A3 --> C3
    A4 --> C6
    B1 --> C4

    C1 --> D1
    C2 --> D2
    C3 --> D3
    C4 --> D4
    C4 --> D5
    C5 --> D6
    C6 --> D7

    D1 --> F2
    D2 --> F2
    D3 --> F2
    D4 --> F2
    D5 --> F2
    D6 --> F2
    D7 --> F2

    E1 -.configure.-> F2
    E2 -.configure.-> H1

    F1 --> F3
    F2 --> F3
    F3 --> G1

    G1 --> H1
    G1 --> H2

    style D1 fill:#e1f5ff
    style D2 fill:#e1f5ff
    style D3 fill:#e1f5ff
    style D4 fill:#e1f5ff
    style D5 fill:#e1f5ff
    style D6 fill:#e1f5ff
    style D7 fill:#e1f5ff
    style G1 fill:#c8e6c9
    style E1 fill:#fff9c4
    style E2 fill:#fff9c4

Key Data Flows

Cataloger Discovery: AST parser walks syft/pkg/cataloger/ to find generic.NewCataloger() calls and extract parser information
Config Discovery: AST parser finds config structs and extracts fields with // app-config: annotations
App Config Discovery: AST parser extracts ecosystem configurations from options package, including descriptions and defaults
Metadata Discovery: JSON reader loads test observations that record what metadata/package types each parser produces
Linking: AST analyzer connects catalogers to their config structs by examining constructor parameters
Merge: Discovered data combines with existing YAML, preserving all manually-maintained capability sections
Validation: Comprehensive tests ensure the output is complete and synchronized with codebase

The `packages.yaml` File

Purpose

internal/capabilities/packages.yaml is the canonical documentation of:

Every cataloger in syft
What files/patterns each cataloger detects
What metadata and package types each cataloger produces
What capabilities each cataloger has (licenses, dependencies, etc.)
How configuration affects these capabilities

Structure

# File header with usage instructions (AUTO-GENERATED)

application:  # AUTO-GENERATED
  # Application-level config keys with descriptions
  - key: golang.search-local-mod-cache-licenses
    description: search for go package licences in the GOPATH...
    default_value: false

configs:  # AUTO-GENERATED
  # Config struct definitions
  golang.CatalogerConfig:
    fields:
      - key: SearchLocalModCacheLicenses
        description: SearchLocalModCacheLicenses enables...
        app_key: golang.search-local-mod-cache-licenses

catalogers:  # Mixed AUTO-GENERATED structure, MANUAL capabilities
  - ecosystem: golang            # MANUAL
    name: go-module-cataloger    # AUTO-GENERATED
    type: generic                # AUTO-GENERATED
    source:                      # AUTO-GENERATED
      file: syft/pkg/cataloger/golang/cataloger.go
      function: NewGoModuleBinaryCataloger
    config: golang.CatalogerConfig  # AUTO-GENERATED
    selectors: [go, golang, ...]    # AUTO-GENERATED
    parsers:                     # AUTO-GENERATED structure
      - function: parseGoMod     # AUTO-GENERATED
        detector:                # AUTO-GENERATED
          method: glob
          criteria: ["**/go.mod"]
        metadata_types:          # AUTO-GENERATED
          - pkg.GolangModuleEntry
        package_types:           # AUTO-GENERATED
          - go-module
        json_schema_types:       # AUTO-GENERATED
          - GolangModEntry
        capabilities:            # MANUAL - preserved across regeneration
          - name: license
            default: false
            conditions:
              - when: {SearchRemoteLicenses: true}
                value: true
                comment: fetches licenses from proxy.golang.org
          - name: dependency.depth
            default: [direct, indirect]
          - name: dependency.edges
            default: complete

AUTO-GENERATED vs MANUAL Fields

AUTO-GENERATED Fields

These are updated on every regeneration:

Cataloger Level:

name - cataloger identifier
type - "generic" or "custom"
source.file - source file path
source.function - constructor function name
config - linked config struct name
selectors - tags from task factories

Parser Level (generic catalogers):

function - parser function name (as used in the generic cataloger)
detector.method - glob/path/mimetype
detector.criteria - patterns matched
metadata_types - from test-observations.json
package_types - from test-observations.json
json_schema_types - converted from metadata_types

Custom Cataloger Level:

metadata_types - from test-observations.json
package_types - from test-observations.json
json_schema_types - converted from metadata_types

Sections:

Entire application: section: a flat mapping of the application config keys relevant to catalogers
Entire configs: section: a flat mapping of the API-level cataloger config keys, for each cataloger (map of maps)

MANUAL Fields

These are preserved across regeneration and must be edited by hand:

ecosystem - ecosystem/language identifier (cataloger level)
capabilities - capability definitions with conditions
detectors - for custom catalogers (except binary-classifier-cataloger)
conditions on detectors - when detector is active based on config

How Regeneration Works

When you run go generate ./internal/capabilities:

Loads existing YAML into both a struct (for logic) and a node tree (for comment preservation)
Discovers all cataloger data from source code and tests
Merges discovered data with existing:
- Updates AUTO-GENERATED fields
- Preserves all MANUAL fields (capabilities, ecosystem, etc.)
- Adds annotations (# AUTO-GENERATED, # MANUAL) to field comments
Writes back using the node tree to preserve all comments
Validates the result with completeness tests
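The merge step above can be illustrated with a minimal sketch. The `entry` type and `merge` function below are hypothetical simplifications, not the tool's actual types: AUTO-GENERATED fields are taken from discovery, MANUAL fields are carried over from the existing document, and existing entries that were not rediscovered are reported as orphans.

```go
package main

import "fmt"

// entry is a simplified, hypothetical stand-in for one cataloger record.
type entry struct {
	Name         string   // AUTO-GENERATED
	Type         string   // AUTO-GENERATED
	Ecosystem    string   // MANUAL
	Capabilities []string // MANUAL (capability names only, for brevity)
}

// merge takes AUTO-GENERATED fields from discovered and MANUAL fields from
// existing. Existing entries that were not rediscovered are returned as orphans.
func merge(existing, discovered []entry) (merged []entry, orphans []string) {
	byName := make(map[string]entry, len(existing))
	for _, e := range existing {
		byName[e.Name] = e
	}
	seen := make(map[string]bool, len(discovered))
	for _, d := range discovered {
		seen[d.Name] = true
		if old, ok := byName[d.Name]; ok {
			// preserve hand-maintained fields across regeneration
			d.Ecosystem = old.Ecosystem
			d.Capabilities = old.Capabilities
		}
		merged = append(merged, d)
	}
	for _, e := range existing {
		if !seen[e.Name] {
			orphans = append(orphans, e.Name)
		}
	}
	return merged, orphans
}

func main() {
	existing := []entry{
		{Name: "go-module-cataloger", Type: "custom", Ecosystem: "golang", Capabilities: []string{"license"}},
		{Name: "removed-cataloger", Type: "generic", Ecosystem: "example"},
	}
	discovered := []entry{
		{Name: "go-module-cataloger", Type: "generic"},
		{Name: "my-new-cataloger", Type: "generic"},
	}
	merged, orphans := merge(existing, discovered)
	fmt.Printf("%+v\n%v\n", merged, orphans)
}
```

A newly discovered cataloger comes out of this sketch with empty MANUAL fields, which matches the expectation that someone fills in ecosystem and capabilities by hand afterwards.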

Note

Don't forget to update test observation evidence with go test ./syft/pkg/... before regeneration.

Generation Process

High-Level Workflow

1. Discovery Phase
   ├─ Parse cataloger source code (AST)
   ├─ Find all parsers and detectors
   ├─ Read test observations for metadata types
   ├─ Discover config structs and fields
   ├─ Discover app-level configurations
   └─ Link catalogers to their configs

2. Merge Phase
   ├─ Load existing packages.yaml
   ├─ Process each cataloger:
   │  ├─ Update AUTO-GENERATED fields
   │  └─ Preserve MANUAL fields
   ├─ Add new catalogers
   └─ Detect orphaned entries

3. Write Phase
   ├─ Update YAML node tree in-place
   ├─ Add field annotations
   └─ Write to disk

4. Validation Phase
   ├─ Check all catalogers present
   ├─ Check metadata/package type coverage
   └─ Run completeness tests

Detailed Discovery Processes

1. Generic Cataloger Discovery (`discover_catalogers.go`)

What it finds: catalogers using the generic.NewCataloger() pattern

Process:

Walk syft/pkg/cataloger/ recursively for .go files
Parse each file with Go AST parser (go/ast, go/parser)
Find functions matching pattern: New*Cataloger() pkg.Cataloger
Within function body, find generic.NewCataloger(name, ...) call
Extract cataloger name from first argument

Find all chained WithParserBy*() calls:

generic.NewCataloger("my-cataloger").
    WithParserByGlobs(parseMyFormat, "**/*.myformat").
    WithParserByMimeTypes(parseMyBinary, "application/x-mytype")

For each parser call:
- Extract parser function name (e.g., parseMyFormat)
- Extract detection method (Globs/Path/MimeTypes)
- Extract criteria (patterns or mime types)
- Resolve constant references across files if needed

Output: map[string]DiscoveredCataloger with full parser information
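As a rough illustration of the process above, the following sketch uses go/parser and go/ast to pull the cataloger name and chained WithParserBy* calls out of a source string. The `parserCall` type, `discoverCataloger` function, and sample source are assumptions made for this example; the real discover_catalogers.go also resolves constants across files, which this sketch does not attempt (only string literals are handled).

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"sort"
	"strconv"
	"strings"
)

// sampleCataloger is a hypothetical cataloger file used only for this sketch.
const sampleCataloger = `package mynew

func NewMyCataloger() pkg.Cataloger {
	return generic.NewCataloger("my-cataloger").
		WithParserByGlobs(parseMyFormat, "**/*.myformat").
		WithParserByMimeTypes(parseMyBinary, "application/x-mytype")
}
`

// parserCall is an illustrative record of one chained WithParserBy* call.
type parserCall struct {
	Method   string   // e.g. WithParserByGlobs
	Function string   // e.g. parseMyFormat
	Criteria []string // string-literal arguments only
	pos      token.Pos
}

// stringLit returns the unquoted value of e if it is a string literal.
func stringLit(e ast.Expr) (string, bool) {
	lit, ok := e.(*ast.BasicLit)
	if !ok || lit.Kind != token.STRING {
		return "", false
	}
	s, err := strconv.Unquote(lit.Value)
	return s, err == nil
}

// discoverCataloger parses src (syntax only, no type checking) and returns
// the cataloger name plus its parser calls in source order.
func discoverCataloger(src string) (string, []parserCall, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "cataloger.go", src, 0)
	if err != nil {
		return "", nil, err
	}
	var name string
	var calls []parserCall
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		switch {
		case sel.Sel.Name == "NewCataloger":
			if s, ok := stringLit(call.Args[0]); ok {
				name = s
			}
		case strings.HasPrefix(sel.Sel.Name, "WithParserBy"):
			pc := parserCall{Method: sel.Sel.Name, pos: sel.Sel.Pos()}
			if id, ok := call.Args[0].(*ast.Ident); ok {
				pc.Function = id.Name
			}
			for _, arg := range call.Args[1:] {
				if s, ok := stringLit(arg); ok {
					pc.Criteria = append(pc.Criteria, s)
				}
			}
			calls = append(calls, pc)
		}
		return true
	})
	// A method chain is visited outermost call first, so restore source order.
	sort.Slice(calls, func(i, j int) bool { return calls[i].pos < calls[j].pos })
	return name, calls, nil
}

func main() {
	name, calls, err := discoverCataloger(sampleCataloger)
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
	for _, c := range calls {
		fmt.Println(c.Method, c.Function, c.Criteria)
	}
}
```

Because only the syntax tree is inspected, the sample does not need to import `pkg` or `generic` for parsing to succeed.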

2. Config Discovery (`discover_cataloger_configs.go`)

What it finds: cataloger configuration structs

Process:

Find all .go files in syft/pkg/cataloger/*/
Look for structs with "Config" in their name
For each config struct:
- Extract struct fields
- Look for // app-config: key.name annotations in field comments
- Extract field descriptions from doc comments
Filter results by whitelist (only configs referenced in pkgcataloging.Config)

Example source:

type CatalogerConfig struct {
    // SearchLocalModCacheLicenses enables searching for go package licenses
    // in the local GOPATH mod cache.
    // app-config: golang.search-local-mod-cache-licenses
    SearchLocalModCacheLicenses bool
}

Output: map[string]ConfigInfo with field details and app-config keys
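A minimal sketch of the annotation extraction described above, assuming that a field's description is simply its doc comment with the app-config line removed. The `fieldInfo` type and `configFields` function are hypothetical names for this example, and the whitelist filtering step is omitted.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// configSource mirrors the example struct shown above.
const configSource = `package golang

type CatalogerConfig struct {
	// SearchLocalModCacheLicenses enables searching for go package licenses
	// in the local GOPATH mod cache.
	// app-config: golang.search-local-mod-cache-licenses
	SearchLocalModCacheLicenses bool
}
`

// fieldInfo is an illustrative record for one config struct field.
type fieldInfo struct {
	Name        string
	Description string
	AppKey      string
}

// configFields returns the fields of every struct whose name contains
// "Config", keyed by struct name.
func configFields(src string) (map[string][]fieldInfo, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "config.go", src, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	out := map[string][]fieldInfo{}
	ast.Inspect(file, func(n ast.Node) bool {
		ts, ok := n.(*ast.TypeSpec)
		if !ok || !strings.Contains(ts.Name.Name, "Config") {
			return true
		}
		st, ok := ts.Type.(*ast.StructType)
		if !ok {
			return true
		}
		for _, f := range st.Fields.List {
			if len(f.Names) == 0 {
				continue // embedded field, skipped in this sketch
			}
			info := fieldInfo{Name: f.Names[0].Name}
			var desc []string
			// Doc.Text() strips the comment markers and is safe on a nil group.
			for _, line := range strings.Split(f.Doc.Text(), "\n") {
				line = strings.TrimSpace(line)
				switch {
				case strings.HasPrefix(line, "app-config:"):
					info.AppKey = strings.TrimSpace(strings.TrimPrefix(line, "app-config:"))
				case line != "":
					desc = append(desc, line)
				}
			}
			info.Description = strings.Join(desc, " ")
			out[ts.Name.Name] = append(out[ts.Name.Name], info)
		}
		return true
	})
	return out, nil
}

func main() {
	fields, err := configFields(configSource)
	if err != nil {
		panic(err)
	}
	for name, fs := range fields {
		for _, f := range fs {
			fmt.Printf("%s.%s -> %s\n", name, f.Name, f.AppKey)
		}
	}
}
```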

3. App Config Discovery (`discover_app_config.go`)

What it finds: application-level configuration from the options package

Process:

Parse cmd/syft/internal/options/catalog.go to find Catalog struct
Extract ecosystem config fields (e.g., Golang golangConfig)
For each ecosystem:
- Find the config file (e.g., golang.go)
- Parse the config struct
- Find DescribeFields() []FieldDescription method
- Extract field descriptions from the returned descriptions
- Find default*Config() function and extract default values
Build full key paths (e.g., golang.search-local-mod-cache-licenses)

Example source:

// golang.go
type golangConfig struct {
    SearchLocalModCacheLicenses bool `yaml:"search-local-mod-cache-licenses" ...`
}

func (c golangConfig) DescribeFields(opts ...options.DescribeFieldsOption) []options.FieldDescription {
    return []options.FieldDescription{
        {
            Name: "search-local-mod-cache-licenses",
            Description: "search for go package licences in the GOPATH...",
        },
    }
}

Output: []AppConfigField with keys, descriptions, and defaults

4. Cataloger-Config Linking (`cataloger_config_linking.go`)

What it finds: which config struct each cataloger uses

Process:

For each discovered cataloger, find its constructor function
Extract the first parameter type from the function signature
Filter for types that look like configs (contain "Config")
Build mapping: cataloger name → config struct name
Apply manual overrides from catalogerConfigOverrides map
Apply exceptions from catalogerConfigExceptions set

Example:

// Constructor signature:
func NewGoModuleBinaryCataloger(cfg golang.CatalogerConfig) pkg.Cataloger

// Results in link:
"go-module-binary-cataloger" → "golang.CatalogerConfig"

Output: map[string]string (cataloger → config mapping)
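The linking step can be sketched as follows, assuming the cataloger name is the string literal passed to NewCataloger inside the constructor body. `linkConfigs` and `catalogerName` are hypothetical helpers for this example; overrides and exceptions would be applied to the resulting map afterwards and are left out here.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
	"strconv"
	"strings"
)

// ctorSource is a hypothetical constructor modelled on the signature above.
const ctorSource = `package example

func NewGoModuleBinaryCataloger(cfg golang.CatalogerConfig) pkg.Cataloger {
	return generic.NewCataloger("go-module-binary-cataloger")
}

func NewPlainCataloger() pkg.Cataloger {
	return generic.NewCataloger("plain-cataloger")
}
`

// catalogerName returns the first string literal passed to a NewCataloger call.
func catalogerName(body *ast.BlockStmt) string {
	var name string
	ast.Inspect(body, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok || len(call.Args) == 0 {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || sel.Sel.Name != "NewCataloger" {
			return true
		}
		if lit, ok := call.Args[0].(*ast.BasicLit); ok && lit.Kind == token.STRING {
			if s, err := strconv.Unquote(lit.Value); err == nil && name == "" {
				name = s
			}
		}
		return true
	})
	return name
}

// linkConfigs maps cataloger name to the type of the constructor's first
// parameter, when that type looks like a config (contains "Config").
func linkConfigs(src string) (map[string]string, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "cataloger.go", src, 0)
	if err != nil {
		return nil, err
	}
	links := map[string]string{}
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Recv != nil || fn.Body == nil {
			continue
		}
		name := catalogerName(fn.Body)
		if name == "" || len(fn.Type.Params.List) == 0 {
			continue
		}
		// ExprString renders the parameter type as written, e.g. golang.CatalogerConfig.
		typ := types.ExprString(fn.Type.Params.List[0].Type)
		if strings.Contains(typ, "Config") {
			links[name] = typ
		}
	}
	return links, nil
}

func main() {
	links, err := linkConfigs(ctorSource)
	if err != nil {
		panic(err)
	}
	fmt.Println(links)
}
```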

5. Metadata Discovery (`discover_metadata.go`)

What it finds: metadata types and package types each parser produces

Process:

Find all test-fixtures/test-observations.json files

Parse JSON which contains:

{
  "package": "golang",
  "parsers": {
    "parseGoMod": {
      "metadata_types": ["pkg.GolangModuleEntry"],
      "package_types": ["go-module"]
    }
  },
  "catalogers": {
    "linux-kernel-cataloger": {
      "metadata_types": ["pkg.LinuxKernel"],
      "package_types": ["linux-kernel"]
    }
  }
}

Build index by package name and parser function
Apply to discovered catalogers:
- Parser-level observations → attached to specific parsers
- Cataloger-level observations → for custom catalogers
Convert metadata types to JSON schema types using packagemetadata registry

Why this exists: the AST parser can't determine what types a parser produces just by reading code. This information comes from test execution.

Output: populated MetadataTypes and PackageTypes on catalogers/parsers
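To make the observation format concrete, here is a small sketch that decodes the JSON shown above with encoding/json and indexes parser-level observations by package and function. The struct and function names are assumptions for this example, not the tool's actual types, and the "package/function" key format is chosen only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sampleObservations reproduces the example file contents shown above.
const sampleObservations = `{
  "package": "golang",
  "parsers": {
    "parseGoMod": {
      "metadata_types": ["pkg.GolangModuleEntry"],
      "package_types": ["go-module"]
    }
  },
  "catalogers": {
    "linux-kernel-cataloger": {
      "metadata_types": ["pkg.LinuxKernel"],
      "package_types": ["linux-kernel"]
    }
  }
}`

// observation holds the types seen for one parser or cataloger.
type observation struct {
	MetadataTypes []string `json:"metadata_types"`
	PackageTypes  []string `json:"package_types"`
}

// observations models one test-observations.json file.
type observations struct {
	Package    string                 `json:"package"`
	Parsers    map[string]observation `json:"parsers"`
	Catalogers map[string]observation `json:"catalogers"`
}

// parseObservations decodes a single observations document.
func parseObservations(data []byte) (observations, error) {
	var obs observations
	err := json.Unmarshal(data, &obs)
	return obs, err
}

// indexParsers keys parser-level observations by "package/function".
func indexParsers(all []observations) map[string]observation {
	idx := map[string]observation{}
	for _, o := range all {
		for fn, ob := range o.Parsers {
			idx[o.Package+"/"+fn] = ob
		}
	}
	return idx
}

func main() {
	obs, err := parseObservations([]byte(sampleObservations))
	if err != nil {
		panic(err)
	}
	idx := indexParsers([]observations{obs})
	fmt.Println(idx["golang/parseGoMod"].MetadataTypes)
	fmt.Println(obs.Catalogers["linux-kernel-cataloger"].PackageTypes)
}
```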

Input Sources

1. Source Code Inputs

Cataloger Constructors (`syft/pkg/cataloger/*/cataloger.go`)

What's extracted:

Cataloger names
Parser function names
Detection methods (glob, path, mimetype)
Detection criteria (patterns)

Example:

func NewGoModuleBinaryCataloger() pkg.Cataloger {
    return generic.NewCataloger("go-module-binary-cataloger").
        WithParserByGlobs(parseGoBin, "**/go.mod").
        WithParserByMimeTypes(parseGoArchive, "application/x-archive")
}

Config Structs (`syft/pkg/cataloger/*/config.go`)

What's extracted:

Config struct fields
Field types
Field descriptions from comments
App-config key mappings from annotations

Example:

type CatalogerConfig struct {
    // SearchRemoteLicenses enables downloading go package licenses from the upstream
    // go proxy (typically proxy.golang.org).
    // app-config: golang.search-remote-licenses
    SearchRemoteLicenses bool

    // LocalModCacheDir specifies the location of the local go module cache directory.
    // When not set, syft will attempt to discover the GOPATH env or default to $HOME/go.
    // app-config: golang.local-mod-cache-dir
    LocalModCacheDir string
}

Options Package (`cmd/syft/internal/options/`)

What's extracted:

Ecosystem config structs
App-level configuration keys
Field descriptions from DescribeFields() methods
Default values from default*Config() functions

Example:

// catalog.go
type Catalog struct {
    Golang golangConfig `yaml:"golang" json:"golang" mapstructure:"golang"`
}

// golang.go
func (c golangConfig) DescribeFields(opts ...options.DescribeFieldsOption) []options.FieldDescription {
    return []options.FieldDescription{
        {
            Name: "search-remote-licenses",
            Description: "search for go package licences by retrieving the package from a network proxy",
        },
    }
}

2. Test-Driven Inputs

test-observations.json Files

Location: syft/pkg/cataloger/*/test-fixtures/test-observations.json

Purpose: records what metadata and package types each parser produces during test execution

How they're generated: automatically by the pkgtest.CatalogTester helpers when tests run

Example test code:

func TestGoModuleCataloger(t *testing.T) {
    tester := NewGoModuleBinaryCataloger()

    pkgtest.NewCatalogTester().
        FromDirectory(t, "test-fixtures/go-module-fixture").
        TestCataloger(t, tester)  // Auto-writes observations on first run
}

Example observations file:

{
  "package": "golang",
  "parsers": {
    "parseGoMod": {
      "metadata_types": ["pkg.GolangModuleEntry"],
      "package_types": ["go-module"]
    },
    "parseGoSum": {
      "metadata_types": ["pkg.GolangModuleEntry"],
      "package_types": ["go-module"]
    }
  }
}

Why this exists:

Metadata types can't be determined from AST parsing alone
Ensures tests use the pkgtest helpers (enforced by TestAllCatalogersHaveObservations)
Provides test coverage visibility

3. Syft Runtime Inputs

Task Factories (`allPackageCatalogerInfo()`)

What's extracted:

Canonical list of all catalogers (ensures sync with binary)
Selectors (tags) for each cataloger

Example:

info := cataloger.CatalogerInfo{
    Name: "go-module-binary-cataloger",
    Selectors: []string{"go", "golang", "binary", "language", "package"},
}

4. Global Configuration Variables

Merge Logic Overrides (`merge.go`)

// catalogerTypeOverrides forces a specific cataloger type when discovery gets it wrong
var catalogerTypeOverrides = map[string]string{
    "java-archive-cataloger": "custom",  // technically generic but treated as custom
}

// catalogerConfigExceptions lists catalogers that should NOT have config linked
var catalogerConfigExceptions = strset.New(
    "binary-classifier-cataloger",
)

// catalogerConfigOverrides manually specifies config when linking fails
var catalogerConfigOverrides = map[string]string{
    "dotnet-portable-executable-cataloger": "dotnet.CatalogerConfig",
    "nix-store-cataloger":                  "nix.Config",
}

When to update:

Add to catalogerTypeOverrides when a cataloger's type is misdetected
Add to catalogerConfigExceptions when a cataloger shouldn't have config
Add to catalogerConfigOverrides when automatic config linking fails

Completeness Test Configuration (`completeness_test.go`)

// requireParserObservations controls observation validation strictness
// - true: fail if ANY parser is missing observations (strict)
// - false: only check custom catalogers (lenient, current mode)
const requireParserObservations = false

// metadataTypeCoverageExceptions lists metadata types allowed to not be documented
var metadataTypeCoverageExceptions = strset.New(
    reflect.TypeOf(pkg.MicrosoftKbPatch{}).Name(),
)

// packageTypeCoverageExceptions lists package types allowed to not be documented
var packageTypeCoverageExceptions = strset.New(
    string(pkg.JenkinsPluginPkg),
    string(pkg.KbPkg),
)

// observationExceptions maps cataloger/parser names to observation types to skip
// - nil value: skip ALL observation checks for this cataloger/parser
// - set value: skip only specified observation types
var observationExceptions = map[string]*strset.Set{
    "graalvm-native-image-cataloger": nil,  // skip all checks
    "linux-kernel-cataloger": strset.New("relationships"),  // skip only relationships
}

When to update:

Add to exceptions when a type is intentionally not documented
Add to observationExceptions when a cataloger lacks reliable test fixtures
Set requireParserObservations = true when ready to enforce full parser coverage
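The nil-versus-set semantics of observationExceptions can be sketched with plain maps. This example substitutes `map[string]struct{}` for the strset type to stay within the standard library, and `shouldSkip` is a hypothetical helper; the observation type name "metadata_types" is used here only as an illustrative value.

```go
package main

import "fmt"

// observationExceptions mirrors the structure above using plain maps:
// a nil inner map skips every check, a non-nil map skips only listed types.
var observationExceptions = map[string]map[string]struct{}{
	"graalvm-native-image-cataloger": nil,
	"linux-kernel-cataloger":         {"relationships": {}},
}

// shouldSkip reports whether the given observation type should be skipped
// for the named cataloger or parser.
func shouldSkip(name, observationType string) bool {
	set, listed := observationExceptions[name]
	if !listed {
		return false // no exception registered
	}
	if set == nil {
		return true // nil means skip ALL observation checks
	}
	_, ok := set[observationType]
	return ok
}

func main() {
	fmt.Println(shouldSkip("graalvm-native-image-cataloger", "metadata_types"))
	fmt.Println(shouldSkip("linux-kernel-cataloger", "relationships"))
	fmt.Println(shouldSkip("linux-kernel-cataloger", "metadata_types"))
}
```

The lookup uses the two-value map form so that "registered with a nil set" can be told apart from "not registered at all".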

Completeness Tests

Purpose

The completeness_test.go file ensures packages.yaml stays in perfect sync with the codebase. These tests catch:

New catalogers that haven't been documented
Orphaned cataloger entries (cataloger was removed but YAML wasn't updated)
Missing metadata/package type documentation
Invalid capability field references
Catalogers not using test helpers

Test Categories

1. Synchronization Tests

TestCatalogersInSync

Ensures all catalogers from the syft cataloger list appear in the YAML
Ensures all catalogers in YAML exist in the binary
Ensures all capabilities sections are filled (no TODOs/nulls)

Failure means: you added/removed a cataloger but didn't regenerate packages.yaml

Fix: run go generate ./internal/capabilities

TestCapabilitiesAreUpToDate

Runs only in CI
Ensures regeneration succeeds
Ensures generated file has no uncommitted changes

Failure means: packages.yaml wasn't regenerated after code changes

Fix: run go generate ./internal/capabilities and commit changes

2. Coverage Tests

TestPackageTypeCoverage

Ensures all types in pkg.AllPkgs are documented in some cataloger
Allows exceptions via packageTypeCoverageExceptions

Failure means: you added a new package type but no cataloger documents it

Fix: either add a cataloger entry or add to exceptions if intentionally not supported

TestMetadataTypeCoverage

Ensures all types in packagemetadata.AllTypes() are documented
Allows exceptions via metadataTypeCoverageExceptions

Failure means: you added a new metadata type but no cataloger produces it

Fix: either add metadata_types to a cataloger or add to exceptions

TestMetadataTypesHaveJSONSchemaTypes

Ensures metadata_types and json_schema_types are synchronized
Validates every metadata type has a corresponding json_schema_type with correct conversion
Checks both cataloger-level and parser-level types

Failure means: metadata_types and json_schema_types are out of sync

Fix: run go generate ./internal/capabilities to regenerate synchronized types

3. Structure Tests

TestCatalogerStructure

Validates generic vs custom cataloger structure rules:
- Generic catalogers must have parsers, no cataloger-level capabilities
- Custom catalogers must have detectors and cataloger-level capabilities
Ensures ecosystem is always set

Failure means: cataloger structure doesn't follow conventions

Fix: correct the cataloger structure in packages.yaml

TestCatalogerDataQuality

Checks for duplicate cataloger names
Validates detector formats for custom catalogers
Checks for duplicate parser functions within catalogers

Failure means: data integrity issue in packages.yaml

Fix: remove duplicates or fix detector definitions

4. Config Tests

TestConfigCompleteness

Ensures all configs in the configs: section are referenced by a cataloger
Ensures all cataloger config references exist
Ensures all app-key references exist in application: section

Failure means: orphaned config or broken reference

Fix: remove unused configs or add missing entries

TestAppConfigFieldsHaveDescriptions

Ensures all application config fields have descriptions

Failure means: missing DescribeFields() entry

Fix: add description in the ecosystem's DescribeFields() method

TestAppConfigKeyFormat

Validates config keys follow format: ecosystem.field-name
Ensures kebab-case (no underscores or spaces)

Failure means: malformed config key

Fix: rename the config key to follow conventions
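One way such a key-format check could look is a single regular expression. The pattern below is an assumption for illustration (lowercase kebab-case segments joined by dots, with at least two segments); the actual test may apply different or additional rules.

```go
package main

import (
	"fmt"
	"regexp"
)

// appKeyPattern is a hypothetical pattern: two or more dot-separated
// segments, each made of lowercase alphanumerics joined by single hyphens.
var appKeyPattern = regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+$`)

// validAppKey reports whether key follows the assumed ecosystem.field-name format.
func validAppKey(key string) bool {
	return appKeyPattern.MatchString(key)
}

func main() {
	for _, key := range []string{
		"golang.search-local-mod-cache-licenses",
		"golang.search_remote_licenses",
		"golang",
	} {
		fmt.Printf("%-45s %v\n", key, validAppKey(key))
	}
}
```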

5. Capability Tests

TestCapabilityConfigFieldReferences

Validates that config fields referenced in capability conditions actually exist
Checks both cataloger-level and parser-level capabilities

Example failure:

capabilities:
  - name: license
    conditions:
      - when: {NonExistentField: true}  # ← this field doesn't exist in config struct
        value: true

Fix: correct the field name to match the actual config struct

TestCapabilityFieldNaming

Ensures capability field names follow known patterns:
- license
- dependency.depth
- dependency.edges
- dependency.kinds
- package_manager.files.listing
- package_manager.files.digests
- package_manager.package_integrity_hash

Failure means: typo in capability field name

Fix: correct the typo or add new field to known list

TestCapabilityValueTypes

Validates capability values match expected types:
- Boolean fields: license, package_manager.*
- Array fields: dependency.depth, dependency.kinds
- String fields: dependency.edges

Example failure:

capabilities:
  - name: license
    default: "yes"  # ← should be boolean true/false

Fix: use correct type for the field

TestCapabilityEvidenceFieldReferences

Validates that evidence references point to real struct fields
Uses AST parsing to verify field paths exist

Example:

capabilities:
  - name: package_manager.files.digests
    default: true
    evidence:
      - AlpmDBEntry.Files[].Digests  # ← validates this path exists

Failure means: typo in evidence reference or struct was changed

Fix: correct the evidence reference or update after struct changes

6. Observations Test

TestCatalogersHaveTestObservations

Ensures all custom catalogers have test observations
Optionally checks parsers (controlled by requireParserObservations)
Allows exceptions via observationExceptions

Failure means: cataloger tests aren't using pkgtest helpers

Fix: update tests to use pkgtest.CatalogTester:

pkgtest.NewCatalogTester().
    FromDirectory(t, "test-fixtures/my-fixture").
    TestCataloger(t, myCataloger)

How to Fix Test Failures

General Approach

Read the test error message - it usually tells you exactly what's wrong
Check if regeneration is needed - most failures are fixed by running: go generate ./internal/capabilities
Check for code/test changes - did you add/modify a cataloger?
Consider exceptions - is this intentionally unsupported?

Common Failures and Fixes

| Failure | Most Likely Cause | Fix |
| --- | --- | --- |
| Cataloger not in YAML | Added new cataloger | Regenerate |
| Orphaned YAML entry | Removed cataloger | Regenerate |
| Missing metadata type | Added type but no test observations | Add pkgtest usage or exception |
| Missing observations | Test not using pkgtest | Update test to use `CatalogTester` |
| Config field reference | Typo in capability condition | Fix field name in YAML |
| Incomplete capabilities | Missing capability definition | Add capabilities section to YAML |

Manual Maintenance

What Requires Manual Editing

These fields in packages.yaml are MANUAL and must be maintained by hand:

1. Ecosystem Field (Cataloger Level)

catalogers:
  - ecosystem: golang  # MANUAL - identify the ecosystem

Guidelines: use the ecosystem/language name (golang, python, java, rust, etc.)

2. Capabilities Sections

For Generic Catalogers (parser level):

parsers:
  - function: parseGoMod
    capabilities:  # MANUAL
      - name: license
        default: false
        conditions:
          - when: {SearchRemoteLicenses: true}
            value: true
            comment: fetches licenses from proxy.golang.org
      - name: dependency.depth
        default: [direct, indirect]
      - name: dependency.edges
        default: complete

For Custom Catalogers (cataloger level):

catalogers:
  - name: linux-kernel-cataloger
    type: custom
    capabilities:  # MANUAL
      - name: license
        default: true

3. Detectors for Custom Catalogers

For most custom catalogers:

detectors:  # MANUAL
  - method: glob
    criteria:
      - '**/lib/modules/**/modules.builtin'
    comment: kernel modules directory

Exception: binary-classifier-cataloger has AUTO-GENERATED detectors extracted from source

4. Detector Conditions

Use detector conditions when a detector should only be active with certain configuration:

detectors:
  - method: glob
    criteria: ['**/*.zip']
    conditions:  # MANUAL
      - when: {IncludeZipFiles: true}
        comment: ZIP detection requires explicit config

Capabilities Format and Guidelines

Standard Capability Fields

Boolean Fields:

- name: license
  default: true  # always available
  # OR
  default: false  # never available
  # OR
  default: false
  conditions:
    - when: {SearchRemoteLicenses: true}
      value: true
      comment: requires network access to fetch licenses

Array Fields (dependency.depth):

- name: dependency.depth
  default: [direct]              # only immediate dependencies
  # OR
  default: [direct, indirect]    # full transitive closure
  # OR
  default: []                    # no dependency information

String Fields (dependency.edges):

- name: dependency.edges
  default: ""        # dependencies found but no edges between them
  # OR
  default: flat      # single level of dependencies with edges to root only
  # OR
  default: reduced   # transitive reduction (redundant edges removed)
  # OR
  default: complete  # all relationships with accurate direct/indirect edges

Array Fields (dependency.kinds):

- name: dependency.kinds
  default: [runtime]                    # production dependencies only
  # OR
  default: [runtime, dev]               # production and development
  # OR
  default: [runtime, dev, build, test]  # all dependency types

Using Conditions

Conditions allow capabilities to vary based on configuration values:

capabilities:
  - name: license
    default: false
    conditions:
      - when: {SearchLocalModCacheLicenses: true}
        value: true
        comment: searches for licenses in GOPATH mod cache
      - when: {SearchRemoteLicenses: true}
        value: true
        comment: fetches licenses from proxy.golang.org
    comment: license scanning requires configuration

Rules:

Conditions are evaluated in array order (first match wins)
Multiple fields WITHIN a when clause use AND logic (all must match)
Multiple conditions in the array use OR logic (first matching condition)
If no conditions match, the default value is used
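The rules above can be expressed as a short evaluation routine. This is a sketch under the assumption that configuration is available as a flat map of field name to value; the `capability` and `condition` types and the `resolve` function are hypothetical and not taken from the Syft codebase.

```go
package main

import (
	"fmt"
	"reflect"
)

// condition pairs a set of required config values with the resulting value.
type condition struct {
	When  map[string]any
	Value any
}

// capability is a simplified model of one capability entry.
type capability struct {
	Name       string
	Default    any
	Conditions []condition
}

// matches reports whether every field in when equals the configured value
// (AND logic within a single when clause).
func matches(when, cfg map[string]any) bool {
	for field, want := range when {
		got, ok := cfg[field]
		if !ok || !reflect.DeepEqual(got, want) {
			return false
		}
	}
	return true
}

// resolve walks conditions in array order and returns the value of the
// first match, falling back to the default when nothing matches.
func resolve(c capability, cfg map[string]any) any {
	for _, cond := range c.Conditions {
		if matches(cond.When, cfg) {
			return cond.Value
		}
	}
	return c.Default
}

func main() {
	license := capability{
		Name:    "license",
		Default: false,
		Conditions: []condition{
			{When: map[string]any{"SearchLocalModCacheLicenses": true}, Value: true},
			{When: map[string]any{"SearchRemoteLicenses": true}, Value: true},
		},
	}
	fmt.Println(resolve(license, map[string]any{}))
	fmt.Println(resolve(license, map[string]any{"SearchRemoteLicenses": true}))
}
```

Note that in this sketch an empty when clause matches any configuration, so such a condition would shadow everything after it.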

Adding Evidence

The evidence field documents which struct fields provide the capability:

- name: package_manager.files.listing
  default: true
  evidence:
    - AlpmDBEntry.Files
  comment: file listings stored in Files array

For nested fields:

evidence:
  - CondaMetaPackage.PathsData.Paths

For array element fields:

evidence:
  - AlpmDBEntry.Files[].Digests

Best Practices

Be specific in comments: explain WHY, not just WHAT
Document conditions clearly: explain what configuration enables the capability
Use evidence references: helps verify capabilities are accurate
Test after edits: run go test ./internal/capabilities/generate to validate

Development Workflows

Adding a New Cataloger

If Using `generic.NewCataloger()`:

What happens automatically:

Generator discovers the cataloger via AST parsing
Extracts parsers, detectors, and patterns
Adds entry to packages.yaml with structure
Links to config (if constructor has config parameter)
Extracts metadata types from test-observations.json (if test uses pkgtest)
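
As a rough picture of what "discovers the cataloger via AST parsing" can mean, the sketch below uses the standard `go/parser` and `go/ast` packages to find `generic.NewCataloger(...)` calls and read the cataloger name from the first argument. The `findCatalogerNames` function and the sample source are hypothetical; the real logic in discover_catalogers.go also extracts parsers, detectors, and patterns and may work differently.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// sample is illustrative source text; it is only parsed, never compiled.
const sample = `package mynew

func NewCataloger() pkg.Cataloger {
	return generic.NewCataloger("my-new-cataloger").
		WithParserByGlobs(parseMyFormat, "**/my.lock")
}
`

// findCatalogerNames returns the string-literal first argument of every
// generic.NewCataloger(...) call found in src.
func findCatalogerNames(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "cataloger.go", src, 0)
	if err != nil {
		return nil, err
	}
	var names []string
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok {
			return true
		}
		pkgIdent, ok := sel.X.(*ast.Ident)
		if !ok || pkgIdent.Name != "generic" || sel.Sel.Name != "NewCataloger" {
			return true
		}
		if len(call.Args) == 0 {
			return true
		}
		lit, ok := call.Args[0].(*ast.BasicLit)
		if !ok || lit.Kind != token.STRING {
			return true // name is not a literal; a real tool would need to resolve it
		}
		if name, err := strconv.Unquote(lit.Value); err == nil {
			names = append(names, name)
		}
		return true
	})
	return names, nil
}

func main() {
	names, err := findCatalogerNames(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Parsing without type checking keeps discovery fast and independent of whether the package builds, at the cost of missing names that are not plain string literals.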

What you must do manually:

Set the ecosystem field in packages.yaml
Add capabilities sections to each parser
Run go generate ./internal/capabilities
Commit the updated packages.yaml

Example workflow:

# 1. Write cataloger code
vim syft/pkg/cataloger/mynew/cataloger.go

# 2. Write tests using pkgtest (generates observations)
vim syft/pkg/cataloger/mynew/cataloger_test.go

# 3. Run tests to generate observations
go test ./syft/pkg/cataloger/mynew

# 4. Regenerate packages.yaml
go generate ./internal/capabilities

# 5. Edit packages.yaml manually
vim internal/capabilities/packages.yaml
# - Set ecosystem field
# - Add capabilities sections

# 6. Validate
go test ./internal/capabilities/generate

# 7. Commit
git add internal/capabilities/packages.yaml
git add syft/pkg/cataloger/mynew/test-fixtures/test-observations.json
git commit

If Writing a Custom Cataloger:

What happens automatically:

Generator creates entry with name and type
Extracts metadata types from test-observations.json

What you must do manually:

Set ecosystem
Add detectors array with detection methods
Add capabilities section (cataloger level, not parser level)
Run go generate ./internal/capabilities

Modifying an Existing Cataloger

If Changing Parser Detection Patterns:

Impact: AUTO-GENERATED field, automatically updated

Workflow:

# 1. Change the code
vim syft/pkg/cataloger/something/cataloger.go

# 2. Regenerate
go generate ./internal/capabilities

# 3. Review changes
git diff internal/capabilities/packages.yaml

# 4. Commit
git add internal/capabilities/packages.yaml
git commit

If Changing Metadata Type:

Impact: AUTO-GENERATED field, updated via test observations

Workflow:

# 1. Change the code
vim syft/pkg/cataloger/something/parser.go

# 2. Update tests (if needed)
vim syft/pkg/cataloger/something/parser_test.go

# 3. Run tests to update observations
go test ./syft/pkg/cataloger/something

# 4. Regenerate
go generate ./internal/capabilities

# 5. Commit
git add internal/capabilities/packages.yaml
git add syft/pkg/cataloger/something/test-fixtures/test-observations.json
git commit

If Changing Capabilities:

Impact: MANUAL field, preserved across regeneration

Workflow:

# 1. Edit packages.yaml directly
vim internal/capabilities/packages.yaml

# 2. Validate
go test ./internal/capabilities/generate

# 3. Commit
git commit internal/capabilities/packages.yaml
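
The difference between AUTO-GENERATED and MANUAL fields in the workflows above comes down to which side wins during a merge. The following sketch shows the idea under simplifying assumptions: the `Entry` type, its fields, and `mergeEntries` are hypothetical and much smaller than what merge.go actually handles.

```go
package main

import "fmt"

// Entry is a simplified, hypothetical stand-in for a cataloger entry in packages.yaml.
type Entry struct {
	Name         string
	Patterns     []string // AUTO-GENERATED: always taken from discovery
	Ecosystem    string   // MANUAL: carried over from the existing file
	Capabilities []string // MANUAL: carried over from the existing file
}

// mergeEntries rebuilds the entry list from what discovery found, copying the
// manual fields from the existing file. Entries that were not discovered are dropped.
func mergeEntries(existing, discovered []Entry) []Entry {
	manual := make(map[string]Entry, len(existing))
	for _, e := range existing {
		manual[e.Name] = e
	}
	merged := make([]Entry, 0, len(discovered))
	for _, d := range discovered {
		if prev, ok := manual[d.Name]; ok {
			d.Ecosystem = prev.Ecosystem
			d.Capabilities = prev.Capabilities
		}
		merged = append(merged, d)
	}
	return merged
}

func main() {
	existing := []Entry{
		{Name: "my-cataloger", Patterns: []string{"**/old.lock"}, Ecosystem: "myeco", Capabilities: []string{"license"}},
		{Name: "removed-cataloger", Ecosystem: "gone"},
	}
	discovered := []Entry{
		{Name: "my-cataloger", Patterns: []string{"**/new.lock"}},
		{Name: "brand-new-cataloger", Patterns: []string{"**/*.new"}},
	}
	for _, e := range mergeEntries(existing, discovered) {
		fmt.Printf("%+v\n", e)
	}
}
```

In this sketch a newly discovered cataloger appears with empty manual fields, which matches the manual steps listed earlier: the ecosystem and capabilities still have to be filled in by hand.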

Adding New Capability Fields

If you need to add a completely new capability field (e.g., package_manager.build_tool_info):

Steps:

Add field name to known fields in TestCapabilityFieldNaming (completeness_test.go)
Add value type validation to validateCapabilityValueType() (completeness_test.go)
Update file header documentation in packages.yaml
Add the field to relevant catalogers in packages.yaml
Update any runtime code that consumes capabilities
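
The first two steps amount to registering the field name and deciding which value type it accepts. The sketch below illustrates that pairing; `knownFields` and `validateValueType` are hypothetical names, the real TestCapabilityFieldNaming and validateCapabilityValueType() in completeness_test.go are not shown here and may be structured differently, and treating package_manager.build_tool_info as a boolean is an assumption made only for this example.

```go
package main

import "fmt"

// knownFields maps each capability field name to the value type it accepts.
// Only a few fields are listed; the last one is the hypothetical new field.
var knownFields = map[string]string{
	"license":                         "bool",
	"dependency.kinds":                "list",
	"package_manager.files.listing":   "bool",
	"package_manager.build_tool_info": "bool", // assumption for illustration
}

// validateValueType rejects unknown field names and values of the wrong type.
func validateValueType(field string, value any) error {
	kind, ok := knownFields[field]
	if !ok {
		return fmt.Errorf("unknown capability field %q", field)
	}
	switch kind {
	case "bool":
		if _, isBool := value.(bool); !isBool {
			return fmt.Errorf("field %q expects a boolean, got %T", field, value)
		}
	case "list":
		if _, isList := value.([]string); !isList {
			return fmt.Errorf("field %q expects a list of strings, got %T", field, value)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateValueType("package_manager.build_tool_info", true))
	fmt.Println(validateValueType("dependency.kinds", []string{"runtime", "dev"}))
	fmt.Println(validateValueType("dependency.kinds", true))
	fmt.Println(validateValueType("package_manager.unknown", true))
}
```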

When to Update Exceptions

Add to `catalogerTypeOverrides`:

Discovery incorrectly classifies a cataloger's type
Example: cataloger uses generic framework but behaves like custom

Add to `catalogerConfigExceptions`:

Cataloger should not have config linked
Example: simple catalogers with no configuration

Add to `catalogerConfigOverrides`:

Automatic config linking fails
Cataloger in a subpackage or unusual structure
Example: dotnet catalogers split across multiple packages

Add to `metadataTypeCoverageExceptions`:

Metadata type is deprecated or intentionally unused
Example: MicrosoftKbPatch (special case type)

Add to `packageTypeCoverageExceptions`:

Package type is deprecated or special case
Example: JenkinsPluginPkg, KbPkg

Add to `observationExceptions`:

Cataloger lacks reliable test fixtures (e.g., requires specific binaries)
Cataloger produces relationships but they're not standard dependencies
Example: graalvm-native-image-cataloger (requires native images)

File Inventory

Core Generation

main.go: entry point, orchestrates regeneration, prints status messages
merge.go: core merging logic, preserves manual sections while updating auto-generated ones
io.go: YAML reading/writing with comment preservation using gopkg.in/yaml.v3

Discovery

discover_catalogers.go: AST parsing to discover generic catalogers and parsers from source code
discover_cataloger_configs.go: AST parsing to discover cataloger config structs
discover_app_config.go: AST parsing to discover application-level config from options package
cataloger_config_linking.go: links catalogers to config structs by analyzing constructors
discover_metadata.go: reads test-observations.json files to get metadata/package types

Validation & Utilities

completeness_test.go: comprehensive test suite ensuring packages.yaml is complete and synced
cataloger_names.go: helper to get all cataloger names from syft task factories
metadata_check.go: validates metadata and package type coverage

Tests

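cataloger_config_linking_test.go: tests for config linking
discover_app_config_test.go: tests for application config discovery
discover_cataloger_configs_test.go: tests for cataloger config discovery
discover_catalogers_test.go: tests for cataloger discovery
discover_metadata_test.go: tests for metadata discovery
io_test.go: tests for YAML reading/writing
merge_test.go: tests for merge logic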

Troubleshooting

"Cataloger X not found in packages.yaml"

Cause: you added a new cataloger but didn't regenerate packages.yaml

Fix:

go generate ./internal/capabilities

"Cataloger X in YAML but not in binary"

Cause: you removed a cataloger but didn't regenerate

Fix:

go generate ./internal/capabilities
# Review the diff - the cataloger entry should be removed

"Metadata type X not represented in any cataloger"

Cause: you added a new metadata type but:

No cataloger produces it yet, OR
Tests don't use pkgtest helpers (so observations aren't generated)

Fix Option 1 - Add test observations:

// Update test to use pkgtest
pkgtest.NewCatalogTester().
    FromDirectory(t, "test-fixtures/my-fixture").
    TestCataloger(t, myCataloger)

Then run the tests and regenerate:

# Run tests
go test ./syft/pkg/cataloger/mypackage

# Regenerate
go generate ./internal/capabilities

Fix Option 2 - Add exception (if intentionally unused):

// completeness_test.go
var metadataTypeCoverageExceptions = strset.New(
    reflect.TypeOf(pkg.MyNewType{}).Name(),
)

"Parser X has no test observations"

Cause: test doesn't use pkgtest helpers

Fix:

// Before:
func TestMyParser(t *testing.T) {
    // manual test code
}

// After:
func TestMyParser(t *testing.T) {
    cataloger := NewMyCataloger()
    pkgtest.NewCatalogTester().
        FromDirectory(t, "test-fixtures/my-fixture").
        TestCataloger(t, cataloger)
}

"Config field X not found in struct Y"

Cause: capability condition references a non-existent config field

Fix: edit packages.yaml and correct the field name:

# Before:
conditions:
  - when: {SerachRemoteLicenses: true}  # typo!

# After:
conditions:
  - when: {SearchRemoteLicenses: true}

"Evidence field X.Y not found in struct X"

Cause:

Typo in evidence reference, OR
Struct was refactored and field moved/renamed

Fix: edit packages.yaml and correct the evidence reference:

# Before:
evidence:
  - AlpmDBEntry.FileListing  # wrong field name

# After:
evidence:
  - AlpmDBEntry.Files

"packages.yaml has uncommitted changes after regeneration"

Cause: packages.yaml is out of date (usually caught in CI)

Fix:

go generate ./internal/capabilities
git add internal/capabilities/packages.yaml
git commit -m "chore: regenerate capabilities"

Generator Fails with "struct X not found"

Cause: config linking is trying to link a cataloger to a non-existent struct

Fix Option 1 - Add override:

// merge.go
var catalogerConfigOverrides = map[string]string{
    "my-cataloger": "mypackage.MyConfig",
}

Fix Option 2 - Add exception:

// merge.go
var catalogerConfigExceptions = strset.New(
    "my-cataloger",  // doesn't use config
)

"Parser capabilities must be defined"

Cause: parser in packages.yaml has no capabilities section

Fix: add capabilities to the parser:

parsers:
  - function: parseMyFormat
    capabilities:
      - name: license
        default: false
      - name: dependency.depth
        default: []
      # ... (add all required capability fields)

Understanding Error Messages

Most test failures include detailed guidance. Look for:

List of missing items: tells you exactly what to add/remove
Suggestions: usually includes the command to fix (e.g., "Run 'go generate ./internal/capabilities'")
File locations: tells you which file to edit

General debugging approach:

Read the full error message
Check if it's fixed by regeneration
Check for recent code/test changes
Consider if it should be an exception
Ask for help if still stuck (include full error message)

Questions or Issues?

If you encounter problems not covered here:

Check test error messages (they're usually quite helpful)
Look at recent commits for examples of similar changes
Ask in the team chat with the full error message
