[FEATURE]: component type "source" #612
Comments
Thanks for the suggestion. I know SPDX added source as a "purpose". Not sure of the logic behind that. The purpose is typically tied to a use case or behavior, both of which are coming to CycloneDX v1.7. Its purpose is not to be a source file. Regardless, my question is: a "source of what"? An assumption cannot be made that it is source code, as "source" is context specific and could mean many different things. For example, an Adobe Illustrator Using a combination of:
should provide everything necessary to indicate that a component is a source file. I'm not sure what value adding an additional component type would provide. CycloneDX is typically very prescriptive in how things are represented. This approach, while less flexible, leads to greater adoption and fewer deviations between implementations. Adding a second way to represent source code would likely add unnecessary confusion to the spec. For interpreted languages such as Python and JavaScript, where non-packaged files are included in a deliverable, adding "source" may cause additional confusion for SBOM consumers. Generating SBOMs for source files is briefly discussed on page 34 of the CycloneDX Authoritative Guide to SBOM. Looking through it, there is certainly room for improvement. Are there any use cases where the above strategy does not work?
Our use case is related to how the packaging of sources and binaries works in Debian: a Source-Package describes the sources for a Binary-Package (the ones you install e.g. with This is an excerpt of how this relationship can be expressed in an SPDX JSON document:
We are not quite sure how this can be expressed with CycloneDX. We would either need a better way to express the relationship in the |
Also note especially that there's not a single file for a Debian source package, but it usually consists of three files which together make up the source package, so when we would use a Roughly the same applies to Alpine Linux where you get patches from the "aports" Git repo plus a control file which tells Alpine tooling where to download upstream source archives. So in a nutshell, I think what we want to express is an abstracted "source code component" which doesn't necessarily map to a single "source code file". |
CycloneDX takes a very different approach from SPDX. CycloneDX incorporates the concept of formulation, which can describe the precise steps necessary for reproduction of physical or virtual goods. It's commonly used in the manufacturing world and is referred to as MBOM, or Manufacturing Bill of Materials. In this example, you can see the components that ultimately were responsible for creating Hello World, including the source file, makefile, and the tools.
{
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
"version": 1,
"components": [
{
"type": "application",
"bom-ref": "helloworld",
"name": "Hello World",
"externalReferences": [
{
"type": "formulation",
"url": "urn:cdx:3e671687-395b-41f5-a30f-a58921a69b79/1#formula-1"
}
]
}
],
"formulation": [
{
"bom-ref": "formula-1",
"components": [
{
"bom-ref": "file:///CycloneDX/MBOM-examples/simple-application-makefile/helloworld.c",
"type": "file",
"name": "helloworld.c",
"version": "1.0",
"mime-type": "text/x-csrc"
},
{
"bom-ref": "file:///CycloneDX/MBOM-examples/simple-application-makefile/Makefile",
"type": "file",
"name": "Makefile",
"version": "1.0"
},
{
"bom-ref": "file:///Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/gcc",
"type": "application",
"name": "gcc",
"version": "16.0.0 (clang-1600.0.26.4)"
},
{
"bom-ref": "file:///Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/make",
"type": "application",
"name": "GNU Make",
"version": "3.81"
}
]
}
]
}
The above example is woefully incomplete, but provides the same level of information as SPDX. A more complete formulation would include the workflow. For example:
{
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
"version": 1,
"components": [
{
"type": "application",
"bom-ref": "helloworld",
"name": "Hello World",
"externalReferences": [
{
"type": "formulation",
"url": "urn:cdx:3e671687-395b-41f5-a30f-a58921a69b79/1#formula-1"
}
]
}
],
"formulation": [
{
"bom-ref": "formula-1",
"components": [
{
"bom-ref": "file:///CycloneDX/MBOM-examples/simple-application-makefile/helloworld.c",
"type": "file",
"name": "helloworld.c",
"version": "1.0",
"mime-type": "text/x-csrc"
},
{
"bom-ref": "file:///CycloneDX/MBOM-examples/simple-application-makefile/Makefile",
"type": "file",
"name": "Makefile",
"version": "1.0"
},
{
"bom-ref": "file:///Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/gcc",
"type": "application",
"name": "gcc",
"version": "16.0.0 (clang-1600.0.26.4)"
},
{
"bom-ref": "file:///Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/make",
"type": "application",
"name": "GNU Make",
"version": "3.81"
}
],
"workflows": [
{
"bom-ref": "workflow-1",
"uid": "uuid:8a2c3bf8-77fe-4c1d-849b-c534abc73aee",
"taskTypes": [ "clean", "build" ],
"tasks": [
{
"bom-ref": "task-1",
"uid": "uuid:dbb6c5c0-6958-4a18-ac67-d897dbee76b6",
"taskTypes": ["clean", "build"],
"name": "make build task",
"description": "A task that captures 'make build' step.",
"steps": [
{
"name": "run make build",
"commands": [
{
"executed": "make build"
}
]
}
]
}
],
"trigger": {
"bom-ref": "trigger-1",
"uid": "uuid:c77a73f0-cc1b-40ac-9ee8-e68dbe6e9583",
"type": "manual"
}
}
]
}
]
}
It's possible to get even more granular than that by incorporating the systems and environments that were present at the time, thus recreating the exact conditions necessary for true reproducibility, not just bit-for-bit.
Typically, formulation information is NOT included in an SBOM. Rather, the SBOM contains the inventory of what is delivered in a final product and the MBOM describes how that product came into existence. They are tied together through the |
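As a rough illustration of how a consumer might read the formulation data shown above, the following Python sketch groups the components of each formula by their component type, which separates the input files from the tools. The function name `formulation_inputs` and the trimmed-down BOM are assumptions made for this example; they are not part of CycloneDX or of any CycloneDX library.

```python
def formulation_inputs(bom):
    """Group the components of every formula by their component type."""
    grouped = {}
    for formula in bom.get("formulation", []):
        for comp in formula.get("components", []):
            grouped.setdefault(comp["type"], []).append(comp["name"])
    return grouped


# Trimmed-down version of the Hello World example (bom-refs and versions omitted).
bom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "formulation": [
        {
            "bom-ref": "formula-1",
            "components": [
                {"type": "file", "name": "helloworld.c"},
                {"type": "file", "name": "Makefile"},
                {"type": "application", "name": "gcc"},
                {"type": "application", "name": "GNU Make"},
            ],
        }
    ],
}

inputs = formulation_inputs(bom)
print(inputs)
```

The sketch only looks at the flat component list of each formula; it does not inspect workflows, tasks, or nested components.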
There are likely two other external references that you may find useful.
{
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
"version": 1,
"components": [
{
"type": "application",
"bom-ref": "helloworld",
"name": "Hello World",
"externalReferences": [
{
"type": "vcs",
"url": "https://url-to-version-control"
},
{
"type": "source-distribution",
"url": "https://url-to-source-distribution-artifact"
}
]
}
]
}
Optionally, the source distribution artifact(s) can also be represented in the BOM and be referenced. For example:
{
"$schema": "http://cyclonedx.org/schema/bom-1.6.schema.json",
"bomFormat": "CycloneDX",
"specVersion": "1.6",
"serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
"version": 1,
"components": [
{
"type": "application",
"bom-ref": "helloworld",
"name": "Hello World",
"externalReferences": [
{
"type": "source-distribution",
"url": "urn:cdx:3e671687-395b-41f5-a30f-a58921a69b79/1#helloworld-source-archive"
}
]
},
{
"type": "file",
"bom-ref": "helloworld-source-archive",
"name": "Hello-World-(sources).gzip",
"mime-type": "application/gzip"
}
]
}
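To make the referencing mechanism in the last example more concrete, here is a minimal Python sketch of how a consumer could follow "source-distribution" references that point back into the same BOM. It assumes the link shape used in the examples above (urn:cdx:<serial>/<version>#<bom-ref>); the regular expression, the function name `resolve_source_distributions`, and the restriction to top-level components are simplifications chosen for this illustration, not an official resolver. It needs Python 3.9 or later for `str.removeprefix`.

```python
import re

# Link shape as used in the examples above: urn:cdx:<uuid>/<version>#<bom-ref>
BOM_LINK = re.compile(
    r"^urn:cdx:(?P<serial>[0-9a-fA-F-]{36})/(?P<version>\d+)#(?P<ref>.+)$"
)


def resolve_source_distributions(bom, component):
    """Return the in-BOM components that a component's source-distribution links point to."""
    # Index top-level components by bom-ref (nested components are ignored here).
    by_ref = {c["bom-ref"]: c for c in bom.get("components", []) if "bom-ref" in c}
    own_serial = bom.get("serialNumber", "").removeprefix("urn:uuid:")
    resolved = []
    for ref in component.get("externalReferences", []):
        if ref.get("type") != "source-distribution":
            continue
        match = BOM_LINK.match(ref.get("url", ""))
        # Only follow links into this very BOM (same serial number and version).
        if match and match["serial"] == own_serial and int(match["version"]) == bom.get("version"):
            target = by_ref.get(match["ref"])
            if target is not None:
                resolved.append(target)
    return resolved


bom = {
    "serialNumber": "urn:uuid:3e671687-395b-41f5-a30f-a58921a69b79",
    "version": 1,
    "components": [
        {
            "type": "application",
            "bom-ref": "helloworld",
            "name": "Hello World",
            "externalReferences": [
                {
                    "type": "source-distribution",
                    "url": "urn:cdx:3e671687-395b-41f5-a30f-a58921a69b79/1#helloworld-source-archive",
                }
            ],
        },
        {
            "type": "file",
            "bom-ref": "helloworld-source-archive",
            "name": "Hello-World-(sources).gzip",
        },
    ],
}

sources = resolve_source_distributions(bom, bom["components"][0])
print([c["name"] for c in sources])
```

Plain https URLs, such as the ones in the "vcs" example, do not match the pattern and are simply skipped.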
Thank you very much, @stevespringett, for the detailed answer!! So if I apply this to my Debian use case, because there are usually three source files, I would end up with something like:
{
"serialNumber": "urn:uuid:85635bff-dad1-42ae-bc74-cd3b70ca2a2d",
"version": 1,
...
"components": [
{
"type": "application",
"bom-ref": "bsdutils",
"name": "bsdutils",
"version": "2.38.1-5+deb12u3",
"externalReferences": [
{
"type": "source-distribution",
"url": "urn:cdx:85635bff-dad1-42ae-bc74-cd3b70ca2a2d/1#util-linux.dsc"
},
{
"type": "source-distribution",
"url": "urn:cdx:85635bff-dad1-42ae-bc74-cd3b70ca2a2d/1#util-linux.orig"
},
{
"type": "source-distribution",
"url": "urn:cdx:85635bff-dad1-42ae-bc74-cd3b70ca2a2d/1#util-linux.debian.tar.xz"
}
]
},
{
"type": "file",
"bom-ref": "util-linux.dsc",
"name": "util-linux_2.38.1-5+deb12u3.dsc"
},
{
"type": "file",
"bom-ref": "util-linux.orig",
"name": "util-linux_2.38.1.orig.tar.xz"
},
{
"type": "file",
"bom-ref": "util-linux.debian.tar.xz",
"name": "util-linux_2.38.1-5+deb12u3.debian.tar.xz"
}
]
}
(And if we look at Alpine, we can have even more source files, 8 in this example, which are not available in a single .zip.) However, there is the concept of a Debian "source package", which would be much more natural for me to express:
{
"serialNumber": "urn:uuid:85635bff-dad1-42ae-bc74-cd3b70ca2a2d",
"version": 2,
...
"components": [
{
"type": "application",
"bom-ref": "bsdutils",
"name": "bsdutils",
"version": "2.38.1-5+deb12u3",
"externalReferences": [
{
"type": "source-distribution",
"url": "urn:cdx:85635bff-dad1-42ae-bc74-cd3b70ca2a2d/2#util-linux"
}
]
},
{
"type": "source",
"name": "util-linux",
"version": "2.38.1-5+deb12u3",
"purl": "pkg:deb/debian/util-linux@2.38.1-5+deb12u3?arch=source"
}
]
}
So a Debian source package can either be described with one PURL or multiple files. Referring to the "source package" instead of the files it consists of not only makes version 2 of my SBOM more concise, but also makes it easy to map CVEs against it. To find out that CVE-2024-28085 applies to your system, you need to know that it's derived from the Debian source package "util-linux", which is obvious when looking at version 2 of the SBOM, but not in version 1. (You could search for the *.dsc file in version 1 and guess the package name from it, but my experience is that such heuristics will fail for corner cases sooner or later.) Also, if you want to download the source files for your SBOM, you would usually use the Debian command "apt-get source util-linux", which again requires that you know the name of your source package, not the individual files it consists of. So in a nutshell, what I want to express is a "source package" which doesn't correspond to a single physical file.
Another use case for us is to create SBOMs which only list source packages, e.g. describing an all-sources.zip you distribute together with a product. Having it list only the file names in the ZIP file doesn't really help much. What you are interested in instead is which source packages are contained in all-sources.zip. I hope this makes our use case a bit more tangible?
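The CVE mapping argument can be sketched in a few lines of Python: if source packages carry a PURL with the qualifier arch=source, a consumer can list the source package names without guessing them from file names. This is only an illustration using the standard library; the helper names `is_debian_source_purl` and `source_package_names` are made up for this example, the component list is a simplified stand-in for an SBOM, the amd64 PURL for bsdutils is an assumed value, and the PURL handling covers only the parts needed here rather than the full PURL specification.

```python
from urllib.parse import parse_qs


def is_debian_source_purl(purl):
    """True if the purl has type deb and carries the qualifier arch=source."""
    if not purl.startswith("pkg:deb/"):
        return False
    # Drop an optional #subpath, then split off the ?qualifiers part.
    without_subpath = purl.split("#", 1)[0]
    _, _, qualifiers = without_subpath.partition("?")
    return parse_qs(qualifiers).get("arch") == ["source"]


def source_package_names(components):
    """Return the sorted names of all components identified as Debian source packages."""
    return sorted(
        {c["name"] for c in components if is_debian_source_purl(c.get("purl", ""))}
    )


# Simplified component list; the purl values are illustrative assumptions.
components = [
    {
        "name": "bsdutils",
        "version": "2.38.1-5+deb12u3",
        "purl": "pkg:deb/debian/bsdutils@2.38.1-5+deb12u3?arch=amd64",
    },
    {
        "name": "util-linux",
        "version": "2.38.1-5+deb12u3",
        "purl": "pkg:deb/debian/util-linux@2.38.1-5+deb12u3?arch=source",
    },
]

names = source_package_names(components)
print(names)
```

The resulting names are what a tool would compare against an advisory that lists affected source packages, or pass to "apt-get source".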
So a source package is "compiled" into a dist-package, and this dist-package is the thing that is installed and used? This means that the source is an intermediate, a thing that is never shipped? That sounds like process-related internals no downstream user should care about, so they should not be part of an SBOM, but part of individual MBOMs. So if I had an SBOM of a Linux system, I would expect to see the installed packages, each as a component.
Regarding the original request, adding a component type "source":
Either the source is a file, or actually a process that can be described with |
Most Linux distributions ship source packages in a dedicated place, see e.g. https://packages.debian.org/source/bookworm/openssl or http://download.fedoraproject.org/pub/fedora/linux/releases/41/Everything/source/tree/. And according to the GPL license, we as a company shipping Linux-based products also have to provide our customers with the source packages. So if we build a Linux image, we also build a source bundle at the same time. Both artifacts then tend to go different routes, e.g. to the factory flashing the image into the devices and to the web team providing the source bundle as a download. So the source bundle is an artifact of its own, often accompanied by an SBOM. And the information about source packages is not only an intermediate thing, but of particular interest. Linux security incidents are usually tracked against source packages: CVE-2024-28085 as well as the Debian security tracker only tell you which source packages are affected, so you want to get this information from your SBOM.
As both an integrator and end user, the process by which the component was made is usually a black box for me. I just take the binary package and the source package from the Linux distributor and trust that they correspond to each other. Only if I want to change something do I run some specific tool which knows how to re-create the binary package from the source package. So in my eyes, the source packages have their own "lifecycle", so they should also be considered "first-class" components in an SBOM. And only in some cases does "one source package" mean "one file", so using the "file" type doesn't really describe it. (You could also describe a Windows "application" component as a series of *.exe and *.dll "files" in an SBOM, but that's usually not the abstraction level you're interested in.)
Well, that depends on the abstraction level you are interested in. Basically any application, library, OS or framework is a series of "files" in the end, but you usually prefer to have some abstract SBOM instead of a directory listing. ;-) |
Allow listing of "source" type components
Creating SBOMs for source collections is valuable, e.g. if you (have to) provide a source code bundle for OSS components you ship in a product. To partly address this need, I previously contributed the qualifier arch=source for Debian packages to the PURL specification.
SPDX provides primaryPackagePurpose SOURCE for such use cases. (Note that other "package purposes" closely align with CycloneDX's component types.)
A dedicated "source" component type would enable defining binaries, sources, and their dependencies elegantly within a single SBOM.
For a real world example, see our contribution to a Debian OS file system builder: SBOM Generation for isar. There, we generate both CycloneDX and SPDX SBOMs – with SPDX covering source and binary packages with their relationship, while the CycloneDX SBOM covers binary packages only due to the missing "source" component type.
Possible solutions
I think the straightforward solution is adding "source" to the list of possible component types. This approach aligns with SPDX's model and would enable easy, lossless conversion between the formats.
Alternatives
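Alternatively, we could use the generic component type file, but I think sources are specific and important enough to warrant their own component type.
Currently, we work around this limitation using a property sbomNature in our taxonomy to indicate that all components in an SBOM are sources. However, this is confusing for both human and coded readers, and prevents proper specification of source-binary dependencies.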
in our taxonomy to indicate that all components in an SBOM are sources. However, this is confusing for both human and coded readers, and prevents proper specification of source-binary dependencies.The text was updated successfully, but these errors were encountered: