diff --git a/DEVELOPING.md b/DEVELOPING.md
new file mode 100644
index 000000000..01dfe790e
--- /dev/null
+++ b/DEVELOPING.md
@@ -0,0 +1,179 @@
+# Developing
+
+## Getting started
+
+In order to test and develop in this repo you will need the following dependencies installed:
+- docker
+- make
+
+After cloning do the following:
+1. run `make bootstrap` to download go mod dependencies, create the `/.tmp` dir, and download helper utilities.
+2. run `make` to run linting, tests, and other verifications to make certain everything is working correctly.
+
+Check out `make help` to see what other actions you can take.
+
+The main make tasks for common static analysis and testing are `lint`, `lint-fix`, `unit`, `integration`, and `cli`.
+
+## Levels of testing
+
+- `unit`: Unit tests are the default level of testing and are distributed throughout the repo. Any `_test.go` file that
+  does not reside somewhere within the `/test` directory is a unit test. Other forms of testing should be organized in
+  the `/test` directory. These tests should focus on correctness of functionality in depth. Percent test coverage
+  metrics consider only unit tests and no other forms of testing.
+
+- `integration`: located within `test/integration`, these tests focus on the behavior surfaced by the common library
+  entrypoints from the `syft` package and make light assertions about the results surfaced. Additionally, these tests
+  tend to make diversity assertions for enum-like objects, ensuring that, as enum values are added to a definition,
+  integration tests automatically fail if no test attempts to use that enum value. For more details see
+  the "Data diversity and freshness assertions" section below.
+
+- `cli`: located within `test/cli`, these tests check the correctness of application behavior from a
+  snapshot build.
+  These tests should be used in cases where a unit or integration test will not do or if you are looking
+  for in-depth testing of code in the `cmd/` package (such as testing the proper behavior of application configuration,
+  CLI switches, and glue code before syft library calls).
+
+- `acceptance`: located within `test/acceptance`, these are smoke-like tests that ensure that application packaging
+  and installation work as expected. For example, during release we provide RPM packages as a download artifact. We
+  also have an accompanying RPM acceptance test that installs the RPM from a snapshot build and ensures the output
+  of a syft invocation matches canned expected output. New acceptance tests should be added for each release artifact
+  and architecture supported (when possible).
+
+### Data diversity and freshness assertions
+
+It is important that tests against the codebase are flexible enough to begin failing when they do not cover "enough"
+of the objects under test. "Cover" in this case does not mean that some percentage of the code has been executed
+during testing, but instead that there is enough diversity of data input reflected in testing relative to the
+definitions available.
+
+For instance, consider an enum-like value such as:
+```go
+type Language string
+
+const (
+	Java       Language = "java"
+	JavaScript Language = "javascript"
+	Python     Language = "python"
+	Ruby       Language = "ruby"
+	Go         Language = "go"
+)
+```
+
+Say we have a test that exercises all the languages defined today:
+
+```go
+func TestCatalogPackages(t *testing.T) {
+	cases := []struct{ name, inputFixturePath string }{
+		// ... the set of test cases that test all languages
+	}
+	for _, test := range cases {
+		t.Run(test.name, func(t *testing.T) {
+			// use test.inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
+			// ...
+		})
+	}
+}
+```
+
+Here each test case has an `inputFixturePath` that would result in packages from each language.
+This test is brittle since it does not directly assert that all languages were exercised, and future modifications
+(such as adding a new language) won't be covered by any test cases.
+
+To address this, the enum-like object should have a definition of all objects that can be used in testing:
+
+```go
+type Language string
+
+// const( Java Language = ..., ... )
+
+var AllLanguages = []Language{
+	Java,
+	JavaScript,
+	Python,
+	Ruby,
+	Go,
+	Rust,
+}
+```
+
+This allows tests to fail automatically when a new language is added:
+
+```go
+func TestCatalogPackages(t *testing.T) {
+	cases := []struct{ name, inputFixturePath string }{
+		// ... the set of test cases that (hopefully) covers all languages
+	}
+
+	// new stuff...
+	observedLanguages := strset.New()
+
+	for _, test := range cases {
+		t.Run(test.name, func(t *testing.T) {
+			// use test.inputFixturePath and assert that syft.CatalogPackages() returns the set of expected Package objects
+			// ...
+
+			// new stuff...
+			for _, actualPkg := range actual {
+				observedLanguages.Add(string(actualPkg.Language))
+			}
+
+		})
+	}
+
+	// new stuff...
+	for _, expectedLanguage := range pkg.AllLanguages {
+		// Language is a named string type, so convert it before querying the string set
+		if !observedLanguages.Contains(string(expectedLanguage)) {
+			t.Errorf("failed to test language=%q", expectedLanguage)
+		}
+	}
+}
+```
+
+This is a better test since it will fail when someone adds a new language but fails to write a test case that
+exercises that new language. This method is ideal for integration-level testing, where the goal is not testing
+correctness in depth (that is what unit tests are for) but testing in breadth to ensure that units are well integrated.
+
+A similar case can be made for data freshness: if the quality of the results would be diminished by input data that
+is not kept up to date, then a test should be written (when possible) to assert that the input data is not stale.
+
+An example of this is the static list of licenses that is stored in `internal/spdxlicense` for use by the SPDX
+presenters.
+This list is updated and published periodically by an external group, and syft can grab and update this
+list by running `go generate ./...` from the root of the repo.
+
+An integration test has been written that grabs the latest license list version externally and compares that version
+with the version generated in the codebase. If they differ, the test fails, indicating that someone needs to take
+action to update the list.
+
+**_The key takeaway is to try to write tests that fail when data assumptions change and not just when code changes._**
+
+### Snapshot tests
+
+The format objects make heavy use of "snapshot" testing, where you save the expected output bytes from a call into the
+git repository and during testing make a comparison of the actual bytes from the subject under test with the golden
+copy saved in the repo. The "golden" files are stored in the `test-fixtures/snapshot` directory relative to the go
+package under test and should always be updated by invoking `go test` on the specific test file with a specific CLI
+update flag provided.
+
+Many of the `Format` tests make use of this approach, where the raw SBOM report is saved in the repo and the test
+compares that SBOM with what is generated from the latest presenter code. For instance, at the time of this writing
+the CycloneDX presenter snapshots can be updated by running:
+
+```bash
+go test ./internal/formats -update-cyclonedx
+```
+
+These flags are defined at the top of the test files that have tests that use the snapshot files.
+
+Snapshot testing is only as good as the manual verification of the golden snapshot file saved to the repo! Be careful
+and diligent when updating these files.
+
+## Architecture
+
+TODO: outline:
+- analysis creates a static SBOM which can be encoded and decoded.
+- format objects should strive not to add or enrich data during encoding when that could instead be done during analysis
+- pkg.Catalogers
+- file catalogers
+- source.Source
+- file.Resolvers
+- logger abstraction
+- events / bus abstraction
\ No newline at end of file