Skip to content

License validity check incompatible with CycloneDX 1.4 #747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Joerki opened this issue Dec 13, 2024 · 6 comments
Closed

License validity check incompatible with CycloneDX 1.4 #747

Joerki opened this issue Dec 13, 2024 · 6 comments
Labels
question Further information is requested

Comments

@Joerki
Copy link

Joerki commented Dec 13, 2024

Hi guys,

In
cyclonedx/model/bom.py, class Bom, method validate starting from there is a check that is NOT compatible with CycloneDX 1.4.
It is applicable only for CycloneDX 1.5 and 1.6.

    # 3. If a LicenseExpression is set, then there must be no other license.
    # see https://github.com/CycloneDX/specification/pull/205

Up to 1.4 we have a list of all licenses, each item may have a license id/name or SPDX expression.
Starting with 1.5 we have a list of license id/name element OR a SINGLE SPDX expression.
A significant difference.

This is a true problem since with the definition of CycloneDX 1.5 there is a significant design fault in CycloneDX from my point of view, which makes it more or less impossiible to create components that both have simple/single and compound SPDX expressions (I recommend to read the SPDX annex chapters of their spec).

The only chance to not get messed up with 1.5 and above is to stick with 1.4.
But unfortunately, the check - which does not consider the CycloneDX version - makes it impossible to use this library. :(

@jkowalleck
Copy link
Member

jkowalleck commented Dec 13, 2024

It is applicable only for CycloneDX 1.5 and 1.6.
[...]
Up to 1.4 we have a list of all licenses, each item may have a license id/name or SPDX expression.

Are you sure about that?
see https://github.com/CycloneDX/specification/blob/fcc722db8140bc93f1cee57cb8410a842826bb5a/schema/bom-1.4.xsd#L1400-L1410
or https://github.com/CycloneDX/specification/blob/fcc722db8140bc93f1cee57cb8410a842826bb5a/schema/bom-1.3.xsd#L1368-L1378

    <xs:complexType name="licenseChoiceType">
        <xs:choice>
            <xs:element name="license" type="bom:licenseType" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element name="expression" type="xs:normalizedString" minOccurs="0" maxOccurs="1">
                <xs:annotation>
                    <xs:documentation>A valid SPDX license expression.
                        Refer to https://spdx.org/specifications for syntax requirements</xs:documentation>
                </xs:annotation>
            </xs:element>
        </xs:choice>
    </xs:complexType>

Licenses is either a list of 0..n license(bom:licenseType) or 0..1 expression(xs:normalizedString).


This is a true problem since with the definition of CycloneDX 1.5 there is a significant design fault in CycloneDX from my point of view, which makes it more or less impossiible to create components that both have simple/single and compound SPDX expressions (I recommend to read the SPDX annex chapters of their spec).

What's the matter? I don't see any issue here.
See, if you had a component that was licensed under MIT or under LGPL3+ WITH PCRE2-exception you could write it as MIT OR (LGPL3+ WITH PCRE2-exception). SPDX expressions are very flexible and verbose. Why would you need multiple "simple/single" licenses and multiple expression at the same time, when you could convert this composition to a single expression?

Anyway, this ticket is not the place to discuss the specification. please keep it to CycloneDX/specification#349 (comment)

@jkowalleck jkowalleck added the question Further information is requested label Dec 13, 2024
@Joerki
Copy link
Author

Joerki commented Dec 14, 2024

Hi Jan,

OK, I agree that this place is not a discussion about the spec, but I want to answer to your question. Please let me show this with the JSON Schema. Here you see that the array definition has "moved".

JSON Schema CycloneDX 1.4

"licenses": {
  "type": "array",
  "title": "BOM License(s)",
  "additionalItems": false,
  "items": {"$ref": "#/definitions/licenseChoice"}
}

"licenseChoice": {
  "type": "object",
  "title": "License(s)",
  "additionalProperties": false,
  "properties": {
	"license": {
	  "$ref": "#/definitions/license"
	},
	"expression": {
	  "type": "string",
	  "title": "SPDX License Expression",
	  "examples": [
		"Apache-2.0 AND (MIT OR GPL-2.0-only)",
		"GPL-3.0-only WITH Classpath-exception-2.0"
	  ]
	}
  },
  "oneOf":[
	{
	  "required": ["license"]
	},
	{
	  "required": ["expression"]
	}
  ]
}
	
JSON Schema CycloneDX 1.5

"licenses": {
  "$ref": "#/definitions/licenseChoice",
  "title": "Component License(s)"
}
	
	
"licenseChoice": {
  "title": "License Choice",
  "description": "EITHER (list of SPDX licenses and/or named licenses) OR (tuple of one SPDX License Expression)",
  "type": "array",
  "oneOf": [
	{
	  "title": "Multiple licenses",
	  "description": "A list of SPDX licenses and/or named licenses.",
	  "type": "array",
	  "items": {
		"type": "object",
		"required": ["license"],
		"additionalProperties": false,
		"properties": {
		  "license": {"$ref": "#/definitions/license"}
		}
	  }
	},
	{
	  "title": "SPDX License Expression",
	  "description": "A tuple of exactly one SPDX License Expression.",
	  "type": "array",
	  "additionalItems": false,
	  "minItems": 1,
	  "maxItems": 1,
	  "items": [{
		"type": "object",
		"additionalProperties": false,
		"required": ["expression"],
		"properties": {
		  "expression": {
			"type": "string",
			"title": "SPDX License Expression",
			"examples": [
			  "Apache-2.0 AND (MIT OR GPL-2.0-only)",
			  "GPL-3.0-only WITH Classpath-exception-2.0"
			]
		  },
		  "bom-ref": {
			"$ref": "#/definitions/refType",
			"title": "BOM Reference",
			"description": "An optional identifier which can be used to reference the license elsewhere in the BOM. Every bom-ref MUST be unique within the BOM."
		  }
		}
	  }]
	}
  ]
}

So this means that with the definition of 1.4 it is totally fine a list with both license expressions with items that are either a "license" or "expression".

@jkowalleck
Copy link
Member

jkowalleck commented Dec 14, 2024

please understand that CycloneDX is not a schema, it is a specification.
And the JSON implementation of 1.4 - which is that schema you just showed - had a "bug". It did not reflect the actual specification. So this was fixed in v1.5.

This very python library intends to work with the specification.
when it comes to the specification, this library is gracious on import/ingest, but is hard on output. It allows "wrong" input to a certain extend, but it tries to prevent wrong output.


So, anyway, I read that you understood that your original request is solved - the implemented license validity check is compatible with CycloneDX 1.4.
I will close this issue. Not locked, (non-spec related) discussions may continue, if needed.

(please be aware of license-related upcoming changes in the spec, which will cause new interesting challenges in this very library.)

@Joerki
Copy link
Author

Joerki commented Jan 6, 2025

OK, I just focus on the current implementation.

We are both using the license-expression library from aboutcode-org (formerly nexB), but in a different way.
I don't use the default JSON data containing the license data (Lib/site-packages/license_expression/data/scancode-license-index.json) and initialize the module with my own JSON file that I can extend and manage.

The file spdx.py has a function named is_component_expression. It invokes __SPDX_EXPRESSION_LICENSING.validate(value).
But the function is_compound_expression is not returning what the name indicates for several cases.

The validate checks whether the expression is formally corrent (simple/compound license: e.g X, X OR Y) AND whether the given simple license expressions (X, Y) can be found in the own database.
But this does not indicate in general whether the given expression string is a compound statement or not.

If the expression string is an SPDX simple expression string that can be found in the internal JSON file, that one is accepted as valid SPDX expression, e.g. LicenseRef-scancode-aardvark-py-2014. The resulting CycloneDX puts this as expression into the generated SBOM. This is the correct behavior.
But it is not a component expression as indicated by the method name is_compound_expression.

LicenseRef-scancode- is a prefix AboutCode (nexB) is using for their own database, the SPDX standard specifies LicenseRef-* for
custom simple expressions. AboutCode's IDs are compatible with SPDX's defintion, but the license-expression library does not assume LicenseRef-other-custom-license expressions as valid custom expression (since such an does not appear in its database).
But according to the SPDX specification it IS a correct simple expression.

There is no way to initialize the cyclonedx library with a custom JSON datafile for license-expression. With this option in this it would be possible to manage LicenseRef-* items that are not provided by license-expression itself (starting with LicenseRef-scancode-*).

If the given expression passed to is_compound_expression like GPL-1.0-only or LGPL-3.3-only I would expect that an exception is raised in the function and properly handled later.
The created CycloneDX contains faulty data GPL-1.0-only or LGPL-3.3-only as license name (!) and the user does not get a complaint. This should be clearly avoided.

@jkowalleck
Copy link
Member

jkowalleck commented Jan 7, 2025

There is no way to initialize the cyclonedx library with a custom JSON datafile for license-expression. With this option in this it would be possible to manage LicenseRef-* items that are not provided by license-expression itself (starting with LicenseRef-scancode-*).

I see.
@Joerki , could you open a new ticket for that, so that we could discuss implications, expectations, (none-)use-cases and (out-of-)scopes, and testability. Lets find a feasible implementation/solution/architecture for it 👍

@jkowalleck
Copy link
Member

If the given expression passed to is_compound_expression like GPL-1.0-only or LGPL-3.3-only I would expect that an exception is raised in the function and properly handled later.
The created CycloneDX contains faulty data GPL-1.0-only or LGPL-3.3-only as license name (!) and the user does not get a complaint. This should be clearly avoided.

@Joerki , could you open another new ticket for that, so that we could discuss implications, expectations, (none-)use-cases and (out-of-)scopes, and testability?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants