syft/syft/pkg/dpkg.go
Alan Pope 5fa8e9c6e9
feat: add Debian archive (.deb) file cataloger (#3704)
* feat: add Debian archive (.deb) file cataloger

Add a cataloger that parses Debian package (.deb) archive files directly,
allowing Syft to discover packages from .deb files without requiring
them to be installed on the system. This implements issue #3315.

Key features:
- Parse .deb AR archives to extract package metadata
- Support for gzip, xz, and zstd compressed control files
- Extract package metadata from control files
- Process file information from md5sums files
- Mark configuration files from conffiles entries
- Handle trailing slashes in archive member names

Signed-off-by: Alan Pope <alan.pope@anchore.com>

* chore: run go mod tidy to fix failing workflow

Signed-off-by: Alan Pope <alan.pope@anchore.com>

* add license processing to dpkg archive cataloger + add tests

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* update json schema with dpkg archive type

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

* update comments

Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>

---------

Signed-off-by: Alan Pope <alan.pope@anchore.com>
Signed-off-by: Alex Goodman <wagoodman@users.noreply.github.com>
Co-authored-by: Alex Goodman <wagoodman@users.noreply.github.com>
2025-03-19 20:03:21 +00:00


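package pkg

import (
	"sort"

	"github.com/scylladb/go-set/strset"

	"github.com/anchore/syft/syft/file"
)

// DpkgDBGlob matches the dpkg status database file and any files under status.d/.
const DpkgDBGlob = "**/var/lib/dpkg/{status,status.d/**}"

// compile-time check that DpkgDBEntry satisfies the FileOwner interface
var _ FileOwner = (*DpkgDBEntry)(nil)

// DpkgArchiveEntry represents package metadata read directly from a Debian archive (.deb) file; it shares
// the field layout of DpkgDBEntry.
type DpkgArchiveEntry DpkgDBEntry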

// DpkgDBEntry represents all captured data for a Debian package DB entry; available fields are described
// at http://manpages.ubuntu.com/manpages/xenial/man1/dpkg-query.1.html in the --showformat section.
// Additional information about how these fields are used can be found at
// - https://www.debian.org/doc/debian-policy/ch-controlfields.html
// - https://www.debian.org/doc/debian-policy/ch-relationships.html
// - https://www.debian.org/doc/debian-policy/ch-binary.html#s-virtual-pkg
// - https://www.debian.org/doc/debian-policy/ch-relationships.html#s-virtual
type DpkgDBEntry struct {
	Package       string `json:"package"`
	Source        string `json:"source" cyclonedx:"source"`
	Version       string `json:"version"`
	SourceVersion string `json:"sourceVersion" cyclonedx:"sourceVersion"`

	// Architecture can include the following sets of values depending on context and the control file used:
	//  - a unique single word identifying a Debian machine architecture, as described in the Architecture specification string section (https://www.debian.org/doc/debian-policy/ch-customized-programs.html#s-arch-spec).
	//  - an architecture wildcard identifying a set of Debian machine architectures, see Architecture wildcards (https://www.debian.org/doc/debian-policy/ch-customized-programs.html#s-arch-wildcard-spec). "any" matches all Debian machine architectures and is the most frequently used.
	//  - "all", which indicates an architecture-independent package.
	//  - "source", which indicates a source package.
	Architecture string `json:"architecture"`

	// Maintainer is the package maintainer's name and email address. The name must come first, then the email
	// address inside angle brackets <> (in RFC822 format).
	Maintainer    string `json:"maintainer"`
	InstalledSize int    `json:"installedSize" cyclonedx:"installedSize"`

	// Description contains a description of the binary package, consisting of two parts: the synopsis (the short
	// description) and the long description (in a multiline format).
	Description string `hash:"ignore" json:"-"`

	// Provides lists the virtual packages provided by this package. A virtual package is one which appears in the
	// Provides control field of another package. The effect is as if the package(s) which provide a particular
	// virtual package name had been listed by name everywhere the virtual package name appears (see also Virtual packages).
	Provides []string `json:"provides,omitempty"`

	// Depends declares an absolute dependency. A package will not be configured unless all of the packages listed in
	// its Depends field have been correctly configured (unless there is a circular dependency).
	Depends []string `json:"depends,omitempty"`

	// PreDepends is like Depends, except that it also forces dpkg to complete installation of the packages named
	// before even starting the installation of the package which declares the pre-dependency.
	PreDepends []string `json:"preDepends,omitempty"`

	Files []DpkgFileRecord `json:"files"`
}

// DpkgFileRecord represents a single file attributed to a debian package.
type DpkgFileRecord struct {
	Path         string       `json:"path"`
	Digest       *file.Digest `json:"digest,omitempty"`
	IsConfigFile bool         `json:"isConfigFile"`
}
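
// OwnedFiles returns the sorted, de-duplicated set of non-empty file paths recorded for this package.
func (m DpkgDBEntry) OwnedFiles() (result []string) {
	s := strset.New()
	for _, f := range m.Files {
		if f.Path != "" {
			s.Add(f.Path)
		}
	}
	result = s.List()
	sort.Strings(result)
	return
}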