No files showing on components details page #964

Open
capfei opened this issue Mar 10, 2022 · 9 comments

Comments

@qtomlinson
Collaborator

qtomlinson commented Mar 10, 2022

The raw data for https://clearlydefined.io/definitions/pypi/pypi/-/dnspython/1.10.0 shows files as [].

  "clearlydefined": {
    "1.3.1": {
      "_metadata": {
      },
      "summaryInfo": {
      },
      "files": [],
      "registryData": {
        ...

The issue for pypi/pypi/-/dnspython/1.10.0 is a bug in the crawler: pypiFetch failed to find a tar.gz file and aborted the file download without reporting an error.
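
One way to spot this condition in raw harvest data is to check whether the clearlydefined tool output lists any files. This is a minimal sketch, assuming the raw data has already been loaded into a dict with the shape shown above (tool name, then tool version, then `files`). The function name `clearlydefined_file_count` is made up for illustration and is not part of any ClearlyDefined API.

```python
from typing import Any, Dict, Optional


def clearlydefined_file_count(raw: Dict[str, Any]) -> Optional[int]:
    """Return the largest file count across clearlydefined tool versions.

    Returns None when the raw data has no clearlydefined tool output at all.
    """
    versions = raw.get("clearlydefined")
    if not isinstance(versions, dict):
        return None
    counts = [
        len(entry.get("files") or [])
        for entry in versions.values()
        if isinstance(entry, dict)
    ]
    return max(counts, default=None)


# Same shape as the raw data excerpt above (elided parts left empty).
sample = {
    "clearlydefined": {
        "1.3.1": {"_metadata": {}, "summaryInfo": {}, "files": [], "registryData": {}}
    }
}

print(clearlydefined_file_count(sample))  # 0: harvested, but no files were recorded
```

A result of 0 corresponds to the dnspython case here, while None would mean the clearlydefined tool output is missing altogether.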

@qtomlinson
Collaborator

qtomlinson commented Mar 11, 2022

For github.com/linux-audit/audit-userspace/5fae55c1ad15b3cefe6890eba7311af163e9133c, and git/github/golang/crypto/c084706c2272f3d44b722e988e70d4a58e60e7f4, the reason for "no files" is that only the "licensee" tool was run. On the definition page, the "Tools" section shows only licensee and curation. In the raw data section, only the licensee portion of the JSON is available. For the files to be listed properly, the "clearlydefined" tool needs to run and its JSON result needs to be available.

image

In my local environment, files are available in both cases after "source" typed harvests (clearlydefined + licensee + scancode) complete. A harvest was also initiated on the dev server, and the result is available at: https://dev.clearlydefined.io/definitions/git/github/linux-audit/audit-userspace/5fae55c1ad15b3cefe6890eba7311af163e9133c/5fae55c1ad15b3cefe6890eba7311af163e9133c. Files are available and displayed once the harvest completes.

These two look like cases of incomplete harvest.
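
An incomplete harvest like this can be flagged by comparing the tools present in the raw data against the tools expected for a source component. The sketch below assumes the raw data is a dict keyed by tool name, and takes the expected tool list from the "source" typed harvest described above. `EXPECTED_SOURCE_TOOLS` and `missing_tools` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Iterable, List

# Assumption: the tool set named in this comment for "source" typed harvests.
EXPECTED_SOURCE_TOOLS = ("clearlydefined", "licensee", "scancode")


def missing_tools(
    raw: Dict[str, Any], expected: Iterable[str] = EXPECTED_SOURCE_TOOLS
) -> List[str]:
    """List the expected tools that have no (or empty) output in the raw data."""
    return [tool for tool in expected if not raw.get(tool)]


# Raw data where only licensee ran; the version key is a placeholder.
partial = {"licensee": {"x.y.z": {}}}

print(missing_tools(partial))  # ['clearlydefined', 'scancode']
```

Any non-empty result suggests the component is a candidate for re-harvesting.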

@qtomlinson
Collaborator

definitions/pypi/pypi/-/dnspython/1.10.0: there is no download URL in the PyPI registry for dnspython 1.10.0, so the download failed. See the commit message in clearlydefined/crawler#470.
For the remaining partial harvest cases, a re-harvest needs to be triggered to resolve them:
- https://clearlydefined.io/definitions/git/github/golang/crypto/c084706c2272f3d44b722e988e70d4a58e60e7f4: files are now available.
- Retriggered harvest for git/github/linux-audit/audit-userspace/5fae55c1ad15b3cefe6890eba7311af163e9133c/5fae55c1ad15b3cefe6890eba7311af163e9133c

@bduranc

bduranc commented Jan 13, 2023

@qtomlinson Thanks for looking into this.

https://clearlydefined.io/definitions/git/github/golang/crypto/c084706c2272f3d44b722e988e70d4a58e60e7f4

and

https://clearlydefined.io/definitions/git/github/linux-audit/audit-userspace/5fae55c1ad15b3cefe6890eba7311af163e9133c/5fae55c1ad15b3cefe6890eba7311af163e9133c

both look to have successfully harvested.

There's still an issue with the one below. I can confirm there is no download package on PyPI for this component.

https://clearlydefined.io/definitions/pypi/pypi/-/dnspython/1.10.0

Question: Is CD supposed to show "harvested" if the system can't find the package, as in this example?

@qtomlinson
Collaborator

@bduranc Those harvest requests will be marked as missing in the crawler (see the commit message at clearlydefined/crawler#470) and will not be marked as successful in the future.

@bduranc

bduranc commented Jan 14, 2023

Thanks @qtomlinson. This is a fairly important issue since it involved scans that were "harvested" but had no files to scan (or just a LICENSE file, in a few other examples I had observed previously but have since re-harvested). But it sounds like there is a solution in place to address at least the cases like dnspython, where the package download/source cannot be found.

For the other two, where the package/source is indeed available, is the best solution just to re-harvest them when encountered, or is there something else we can do?

@qtomlinson
Collaborator

@bduranc If the clearlydefined tool has not completed, then re-harvesting the package is the best solution.

@bduranc

bduranc commented Jan 16, 2023

> @bduranc If the clearlydefined tool has not completed, then re-harvesting the package is the best solution.

@qtomlinson should I go ahead and create a separate issue for this then?

@qtomlinson
Collaborator

qtomlinson commented Jan 16, 2023

@bduranc Typically, the clearlydefined, reuse, licensee and scancode tools are dispatched for source components. It is possible that all four tools were dispatched but only one was processed, and the other three runs were somehow not successful. Retriggering the harvests can verify whether a potential issue exists. The re-harvested data are now available and seem OK.

Alternatively, a user has the option to run a harvest with a specific tool (e.g. licensee or scancode) via the REST API. In that case, only the result for the user-specified tool is available (as expected). The two components listed here might have been harvested with a specific tool (licensee). To get the complete definition in that scenario, the solution is to retrigger the harvest with all tools.
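
For illustration, a harvest request through the REST API could be prepared roughly as follows. This is a sketch under stated assumptions, not a confirmed recipe. The endpoint URL, the payload shape (a JSON array of objects with `tool` and `coordinates`) and the `source` tool value (borrowed from the "source" typed harvests mentioned earlier in the thread) should all be checked against the ClearlyDefined API documentation. The helper names `build_harvest_body` and `post_harvest` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable

# Assumption: harvest endpoint; verify against the ClearlyDefined API docs.
HARVEST_URL = "https://api.clearlydefined.io/harvest"


def build_harvest_body(coordinates: Iterable[str], tool: str) -> str:
    """Build the JSON body: one entry per component, all using the same tool."""
    entries = [{"tool": tool, "coordinates": c} for c in coordinates]
    return json.dumps(entries)


def post_harvest(body: str, url: str = HARVEST_URL) -> int:
    """Send the harvest request and return the HTTP status (not called below)."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Coordinates taken from this thread; "source" is an assumed tool value.
body = build_harvest_body(
    ["git/github/golang/crypto/c084706c2272f3d44b722e988e70d4a58e60e7f4"],
    tool="source",
)
print(body)
```

Passing a single tool name such as licensee instead would, per the comment above, yield only that tool's result.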
