Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LicenseRef support #1148

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Add LicenseRef support #1148

wants to merge 3 commits into from

Conversation

lumaxis
Copy link
Contributor

@lumaxis lumaxis commented Jun 26, 2024

This is a new version of lumaxis#2

Discussion around this at #1096 (comment)

Copy link
Collaborator

@elrayle elrayle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a question about the test results.

@@ -58,7 +58,7 @@ describe('ScanCodeNewSummarizer basic compatability', () => {
const coordinates = { type: 'pypi', provider: 'pypi' }
const harvestData = getHarvestData(scancodeVersion, 'pypi-complex-declared-license')
const result = summarizer.summarize(coordinates, harvestData)
assert.equal(result.licensed.declared, 'HPND')
assert.equal(result.licensed.declared, 'LicenseRef-scancode-secret-labs-2011')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test originally had a license HPND. And now it is just LicenseRef-scancode-secret-labs-2011. Where did the original license come from? I would have thought a change would end up something like HPND AND LicenseRef-scancode-secret-labs-2011.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret-labs-2011 is the declared license according to the raw ScanCode results. Before this change, our logic fell back to the first package's declared license which is HPND.

I'm not sure which is the ultimately correct one but we need this change to surface the ScanCode result.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to be sure I am understanding logic fell back to the first package's declared license correctly. Looking at the fixture, I see a license under summary which seems like the correct license...

    "summary": {
      "declared_license_expression": "secret-labs-2011",

and farther down, I see the first package (transient dependency) has the HPND as its declared license...

    "packages": [
      {
        "type": "pypi",
        "namespace": null,
        "name": "Pillow",
        "version": "9.5.0",
        ...
        "declared_license_expression": "historical",
        "declared_license_expression_spdx": "HPND",

It would be interesting to understand if that is a correct interpretation of how HPND was identified as the license and why that approach was chosen. To me, that doesn't seem correct as that is the license for Pillow 9.5.0.

@qtomlinson any insights into this?

Copy link
Collaborator

@qtomlinson qtomlinson Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v30 result (line 760-766) shows content.packages[0].declared_license as HPND

          "license_expression": "historical",
          "declared_license": {
            "license": "HPND",
            "classifiers": [
              "License :: OSI Approved :: Historical Permission Notice and Disclaimer (HPND)"
            ]
          },

Reading from content.packages[0].declared_license was the preferred way before deriving from files in v30 scancode results. So using v30 scancode, the license would be HPND.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pypi also shows "License as [OSI Approved :: Historical Permission Notice and Disclaimer (HPND)]"

Copy link
Collaborator

@qtomlinson qtomlinson Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed Pillow 9.5 was curated as HPND

Copy link
Collaborator

@qtomlinson qtomlinson Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret-labs-2011 is the declared license according to the raw ScanCode results.

Just noticed another case where declared_license_expression (v32) seems to be different from what is declared from the package. Added here for documentation purposes.
32.3.0.json:

    "summary": {
      "declared_license_expression": "cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0",
	  ...
    package[0]
        "declared_license_expression": "gpl-2.0-plus AND gpl-2.0",
        "declared_license_expression_spdx": "GPL-2.0-or-later AND GPL-2.0-only",
	...
    files:
    {
        "path": "pylint-3.2.3/LICENSE",
        "detected_license_expression": "gpl-2.0",
        "detected_license_expression_spdx": "GPL-2.0-only",

30.3.0.json:

    package[0]					
	"license_expression": "gpl-2.0-plus AND gpl-2.0",
	"declared_license": {
		"license": "GPL-2.0-or-later",
		"classifiers": [
			"License :: OSI Approved :: GNU General Public License v2 (GPLv2)"
		]
	},
	....
    files: 
    {
         "path": "pylint-3.2.3/LICENSE",
		"key": "gpl-2.0",

"cc-by-4.0 AND cc-by-sa-4.0 AND gpl-2.0" in v32 is different from "gpl-2.0-plus AND gpl-2.0" in v30

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, I think all of this cases are bugs/regressions in ScanCode, right? Meaning, our code is behaving as expected here, just producing unexpected/wrong results based on the underlying raw data 🤔

lib/utils.js Outdated
// parse() checks for LicenseRef- and other special types of expressions before calling the visitor
// therefore use the mapped license expression as an argument if it was found
const mappedLicenseExpression = scancodeMap.get(rawLicenseExpression)
const parsed = SPDX.parse(mappedLicenseExpression || rawLicenseExpression || '', licenseVisitor)
const result = SPDX.stringify(parsed)
Copy link
Collaborator

@qtomlinson qtomlinson Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm... Lookup by scancodeMap.get(rawLicenseExpression) helps to convert a leaf expression node to a mapped value, e.g. afpl-9.0 can be converted to LicenseRef-scancode-afpl-9.0. If rawLicenseExpression is afl-1.1 AND afpl-9.0, there will still be problems.

describe('normalizeLicenseExpression', () => {
  it('should normalize license', () => {
    const expression = 'MIT AND GPL-3.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('MIT AND GPL-3.0')
  })
  it('should normalize single licenseRef', () => {
    const expression = 'afpl-9.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('LicenseRef-scancode-afpl-9.0')
  })
  it('should normalize license and licenseRef', () => {
    const expression = 'afl-1.1 AND afpl-9.0'
    const result = utils.normalizeLicenseExpression(expression)
    expect(result).to.eq('AFL-1.1 AND LicenseRef-scancode-afpl-9.0')// This one fails with 'AFL-1.1 AND NOASSERTION' 
  })
})

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked up ALF-1.1 and confirmed that it is an SPDX registered license. ALPL-9.0 is not, so the expectation is correctly written. The question is why the second part of the last test was incorrectly assigned NOASSERTION.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third commit adds tests and processing for complex licenses. This fixes the issue identified in this thread.

Thanks @qtomlinson for the tests. Made it very easy to locate and fix the problem.

@pombredanne
Copy link
Member

@elrayle

A couple comments:

  1. About Add LicenseRef support #1148 (comment) The latest ScanCode version has SPDX license expressions side-by-side with the ScanCode license expression. This cannot be guessed from the ScanCode license keys alone, this is a mapping that is explicitly kept in the license database at https://scancode-licensedb.aboutcode.org/ and at https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses

  2. About Add LicenseRef support #1148 (comment) I see this is likely for a wheel or tar of pkg:pypi/[email protected]. For a top level declared package, I would use the --package (and I would always use also these together with --license : --license-text-diagnostics --license-text --license-diagnostics to get the correct details. Note that since version 30.1.0 there have been quite a few changes aboutcode-org/scancode-toolkit@v30.1.0...develop Showing 83,666 changed files with 1,730,392 additions and 515,652 deletions. which is rather big. There have been major improvements to the license detection since. There could be bugs there too, but from a quick look, the correct declared license is GPL and there is other files under CC-BY-SA (two doc icons). The pylint project has otherwise a poor license clarity for its wheel and a much better one for its source distribution. See attached PDFs (using ScanCode.io).
    ScanCode.io_ 3fa2a1f68d.f95201a764.ce71b4612d.pdf
    ScanCode.io_ 258179e40f.f95201a764.ce71b4612d.pdf

  3. About Add LicenseRef support #1148 (comment) ... what are the download URLs? Based on a scan of one of the wheel of pkg:pypi/[email protected] this is detected as an HPND/historical license alright as declared, but there are other licenses involved too. Pillow did change the text of their license and the name they use to refer to it too. See for instance python-pillow/Pillow@997932b ... AFAIRC older versions of PIL and Pillow were using LicenseRef-scancode-secret-labs-2011 https://scancode-licensedb.aboutcode.org/secret-labs-2011.html until they clarified this as being a plain HPND. It may not have not met exactly the SPDX matching guidelines by then.

Also, do you mind filing issues ScanCode Toolkit? This will not be tracked otherwise!

if (result === 'NOASSERTION') logger.info(`ScanCode NOASSERTION from ${rawLicenseExpression}`)

return result
}

function _normalizeParsedLicenseExpression(parsedLicenseExpression, logger) {
Copy link
Collaborator

@qtomlinson qtomlinson Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to implement this logic in SPDX or in spdx-expression-parse? Similar to the licenseVisitor in parseLicense, would it be beneficial to introduce a licenseRefVisitor in parseLicenseRef? For example, SPDX.parse(rawLicenseExpression || '', licenseVisitor, licenseRefVisitor), where licenseRefVisitor converts the licenseRef based on scancodeMap via scancodeMap.get(licenseRef). It may be a good idea to consider implementing the fix in the forked spdx-expression-parse, as it would be a smaller fix and better encapsulation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants