Add LicenseRef support #1148
Conversation
I've got a question about the test results.
@@ -58,7 +58,7 @@ describe('ScanCodeNewSummarizer basic compatability', () => {
 const coordinates = { type: 'pypi', provider: 'pypi' }
 const harvestData = getHarvestData(scancodeVersion, 'pypi-complex-declared-license')
 const result = summarizer.summarize(coordinates, harvestData)
-assert.equal(result.licensed.declared, 'HPND')
+assert.equal(result.licensed.declared, 'LicenseRef-scancode-secret-labs-2011')
The test originally had the license HPND, and now it is just LicenseRef-scancode-secret-labs-2011. Where did the original license come from? I would have expected the change to end up with something like HPND AND LicenseRef-scancode-secret-labs-2011.
secret-labs-2011 is the declared license according to the raw ScanCode results. Before this change, our logic fell back to the first package's declared license, which is HPND.
I'm not sure which one is ultimately correct, but we need this change to surface the ScanCode result.
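To make the behavior change concrete, here is a minimal sketch of the fallback order as described in this thread. It is not the summarizer's actual code: pickDeclaredLicense and the two normalize stand-ins are hypothetical names, and it assumes the summary expression previously normalized to NOASSERTION because no LicenseRef mapping existed.

```javascript
// Hypothetical sketch of the fallback order; not the actual summarizer code.
// `normalize` stands in for whatever turns a ScanCode key into an SPDX id or LicenseRef.
function pickDeclaredLicense(content, normalize) {
  const fromSummary = normalize(content.summary?.declared_license_expression)
  if (fromSummary && fromSummary !== 'NOASSERTION') return fromSummary
  // Fallback: the first package's declared SPDX expression
  return content.packages?.[0]?.declared_license_expression_spdx || null
}

// Trimmed-down fixture using the field names visible in the ScanCode output
const content = {
  summary: { declared_license_expression: 'secret-labs-2011' },
  packages: [{ declared_license_expression_spdx: 'HPND' }]
}

// Assumed "before": a ScanCode-only key cannot be expressed, so it becomes NOASSERTION
const withoutLicenseRef = () => 'NOASSERTION'
// Assumed "after": a ScanCode-only key becomes a LicenseRef-scancode-* identifier
const withLicenseRef = key => (key ? `LicenseRef-scancode-${key}` : 'NOASSERTION')

const before = pickDeclaredLicense(content, withoutLicenseRef)
const after = pickDeclaredLicense(content, withLicenseRef)
console.log(before, after)
```

Under these assumptions, the summary value wins as soon as it can be represented, which is why the fallback to the first package no longer applies in the updated test.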
I want to be sure I am understanding "logic fell back to the first package's declared license" correctly. Looking at the fixture, I see a license under summary, which seems like the correct license...
"summary": {
"declared_license_expression": "secret-labs-2011",
and farther down, I see that the first package (a transitive dependency) has HPND as its declared license...
"packages": [
{
"type": "pypi",
"namespace": null,
"name": "Pillow",
"version": "9.5.0",
...
"declared_license_expression": "historical",
"declared_license_expression_spdx": "HPND",
It would be interesting to understand whether that is a correct interpretation of how HPND was identified as the license, and why that approach was chosen. To me, that doesn't seem correct, as that is the license for Pillow 9.5.0.
@qtomlinson any insights into this?
The v30 result (lines 760-766) shows content.packages[0].declared_license as HPND:
"license_expression": "historical",
"declared_license": {
"license": "HPND",
"classifiers": [
"License :: OSI Approved :: Historical Permission Notice and Disclaimer (HPND)"
]
},
For v30 ScanCode results, reading from content.packages[0].declared_license was the preferred source, tried before deriving the license from files. So with v30 ScanCode, the license would be HPND.
PyPI also shows the license as "OSI Approved :: Historical Permission Notice and Disclaimer (HPND)".
Just noticed Pillow 9.5 was curated as HPND
secret-labs-2011 is the declared license according to the raw ScanCode results.
Just noticed another case where declared_license_expression (v32) seems to differ from what the package itself declares. Adding it here for documentation purposes.
32.3.0.json:
"summary": {
"declared_license_expression": "cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0",
...
package[0]
"declared_license_expression": "gpl-2.0-plus AND gpl-2.0",
"declared_license_expression_spdx": "GPL-2.0-or-later AND GPL-2.0-only",
...
files:
{
"path": "pylint-3.2.3/LICENSE",
"detected_license_expression": "gpl-2.0",
"detected_license_expression_spdx": "GPL-2.0-only",
package[0]
"license_expression": "gpl-2.0-plus AND gpl-2.0",
"declared_license": {
"license": "GPL-2.0-or-later",
"classifiers": [
"License :: OSI Approved :: GNU General Public License v2 (GPLv2)"
]
},
....
files:
{
"path": "pylint-3.2.3/LICENSE",
"key": "gpl-2.0",
"cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0" in v32 is different from "gpl-2.0-plus AND gpl-2.0" in v30
Either way, I think all of these cases are bugs/regressions in ScanCode, right? Meaning, our code is behaving as expected here, just producing unexpected/wrong results based on the underlying raw data 🤔
lib/utils.js
Outdated
// parse() checks for LicenseRef- and other special types of expressions before calling the visitor
// therefore use the mapped license expression as an argument if it was found
const mappedLicenseExpression = scancodeMap.get(rawLicenseExpression)
const parsed = SPDX.parse(mappedLicenseExpression || rawLicenseExpression || '', licenseVisitor)
const result = SPDX.stringify(parsed)
Hm... The lookup scancodeMap.get(rawLicenseExpression) helps to convert a leaf expression node to a mapped value, e.g. afpl-9.0 can be converted to LicenseRef-scancode-afpl-9.0. If rawLicenseExpression is afl-1.1 AND afpl-9.0, there will still be problems.
describe('normalizeLicenseExpression', () => {
  it('should normalize license', () => {
    const expression = 'MIT AND GPL-3.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('MIT AND GPL-3.0')
  })
  it('should normalize single licenseRef', () => {
    const expression = 'afpl-9.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('LicenseRef-scancode-afpl-9.0')
  })
  it('should normalize license and licenseRef', () => {
    const expression = 'afl-1.1 AND afpl-9.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('AFL-1.1 AND LicenseRef-scancode-afpl-9.0') // This one fails with 'AFL-1.1 AND NOASSERTION'
  })
})
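A rough sketch of why the compound case slips through, and of what a leaf-by-leaf lookup could look like. The two-entry scancodeMap and the mapLeaves helper below are made up for illustration; this is a simple token-based approximation that skips the real SPDX parser and does not handle WITH exceptions.

```javascript
// Hypothetical two-entry stand-in for the ScanCode-key -> SPDX/LicenseRef map.
const scancodeMap = new Map([
  ['afl-1.1', 'AFL-1.1'],
  ['afpl-9.0', 'LicenseRef-scancode-afpl-9.0']
])

// Whole-string lookup only hits when the expression is a single key.
const wholeSingle = scancodeMap.get('afpl-9.0')
const wholeCompound = scancodeMap.get('afl-1.1 AND afpl-9.0') // no such key in the map

// Leaf-wise lookup: keep whitespace, operators and parentheses, map every other token.
// Exceptions after WITH are not handled in this sketch.
function mapLeaves(expression) {
  return expression
    .split(/(\s+|\(|\))/)
    .map(token => {
      const isStructure = token === '' || /^\s+$/.test(token) || /^(AND|OR|WITH|\(|\))$/i.test(token)
      if (isStructure) return token
      return scancodeMap.get(token) || 'NOASSERTION'
    })
    .join('')
}

console.log(wholeSingle, wholeCompound)
console.log(mapLeaves('afl-1.1 AND afpl-9.0'))
```

Because the whole-string lookup misses for the compound expression, the raw text is what reaches the parser, which is consistent with afpl-9.0 ending up as NOASSERTION in the failing test.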
I looked up AFL-1.1 and confirmed that it is an SPDX-registered license. AFPL-9.0 is not, so the expectation is written correctly. The question is why the second part of the last test was incorrectly assigned NOASSERTION.
Third commit adds tests and processing for complex licenses. This fixes the issue identified in this thread.
Thanks @qtomlinson for the tests. Made it very easy to locate and fix the problem.
A couple comments:
Also, do you mind filing issues with ScanCode Toolkit? This will not be tracked otherwise!
if (result === 'NOASSERTION') logger.info(`ScanCode NOASSERTION from ${rawLicenseExpression}`)

return result
}

function _normalizeParsedLicenseExpression(parsedLicenseExpression, logger) {
Would it make more sense to implement this logic in SPDX or in spdx-expression-parse? Similar to the licenseVisitor in parseLicense, would it be beneficial to introduce a licenseRefVisitor in parseLicenseRef? For example, SPDX.parse(rawLicenseExpression || '', licenseVisitor, licenseRefVisitor), where licenseRefVisitor converts the licenseRef based on scancodeMap via scancodeMap.get(licenseRef). It may be a good idea to consider implementing the fix in the forked spdx-expression-parse, as it would be a smaller fix with better encapsulation.
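To illustrate the shape of that idea, here is a small sketch of a two-visitor walk over an already-parsed expression. Everything in it is an assumption for discussion: the AST shape ({ license }, { licenseRef }, { left, conjunction, right }), the visit and stringify helpers, and the sample maps are hypothetical and are not the spdx-expression-parse API.

```javascript
// Hypothetical lookup tables; the real maps would be far larger.
const spdxMap = new Map([['afl-1.1', 'AFL-1.1']])
const scancodeMap = new Map([['afpl-9.0', 'LicenseRef-scancode-afpl-9.0']])

const licenseVisitor = license => spdxMap.get(license) || 'NOASSERTION'
const licenseRefVisitor = licenseRef => scancodeMap.get(licenseRef) || 'NOASSERTION'

// Walk an assumed AST, applying one visitor to license leaves and another to licenseRef leaves.
function visit(node, onLicense, onLicenseRef) {
  if (node.left && node.right) {
    return {
      left: visit(node.left, onLicense, onLicenseRef),
      conjunction: node.conjunction,
      right: visit(node.right, onLicense, onLicenseRef)
    }
  }
  if (node.licenseRef !== undefined) return { licenseRef: onLicenseRef(node.licenseRef) }
  return { license: onLicense(node.license) }
}

// Turn the visited AST back into an expression string.
function stringify(node) {
  if (node.left && node.right) {
    return `${stringify(node.left)} ${node.conjunction.toUpperCase()} ${stringify(node.right)}`
  }
  return node.licenseRef !== undefined ? node.licenseRef : node.license
}

// Hand-built AST for 'afl-1.1 AND afpl-9.0'; a real parser would produce this.
const ast = { left: { license: 'afl-1.1' }, conjunction: 'and', right: { licenseRef: 'afpl-9.0' } }
const normalized = stringify(visit(ast, licenseVisitor, licenseRefVisitor))
console.log(normalized)
```

The point of the second visitor is that the mapping happens per leaf inside the parser, so the calling code would not need a separate pass over the parsed expression.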
Draft PR: see clearlydefined/spdx-expression-parse.js#9
This is a new version of lumaxis#2
Discussion around this at #1096 (comment)