Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sha256 uniqueness to CollectionVersion #1815

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mdellweg
Copy link
Member

No description provided.

@mdellweg mdellweg force-pushed the 1052_uniqueness branch 8 times, most recently from e97a863 to 775fb7c Compare April 16, 2024 10:22
@mdellweg mdellweg force-pushed the 1052_uniqueness branch 2 times, most recently from 03bef99 to baf54a1 Compare April 16, 2024 12:33
try:
await sync_to_async(artifact.save)()
except IntegrityError:
artifact = Artifact.objects.get(sha256=artifact.sha256)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line would never have worked in an async context.

Copy link
Contributor

@gerrod3 gerrod3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is based on my original work on this, but there was discussion back then on how the sha256 for the Collection should be the digest of the MANIFEST.json file inside the tarball. I actually never got around to finishing all of the changes to my PR, so most of my comments here is me pointing out all the areas I didn't change it.

#1119 (comment)

)

return col
return CollectionVersion.objects.filter(sha256=validated_data["sha256"]).first()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This is so much more simple.

Comment on lines +538 to +569
# There was an autogenerated validator rendering sha256 required.
validators = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a really annoying feature of DRF. It tripped me up a lot when I was originally adding domains.

except IntegrityError:
artifact = Artifact.objects.get(sha256=artifact.sha256)
if existing_artifact := await Artifact.objects.filter(sha256=artifact.sha256).afirst():
existing_artifact.touch()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be wrapped in a sync_to_async or do we have a atouch?

pulp_ansible/app/serializers.py Show resolved Hide resolved
pulp_ansible/app/tasks/collections.py Outdated Show resolved Hide resolved
for key in ["description", "documentation", "homepage", "issues", "repository"]:
if collection_info[key] is None:
collection_info.pop(key)
collection, created = Collection.objects.get_or_create(
Copy link
Contributor

@gerrod3 gerrod3 Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a database operation? This breaks the comment about metadata_only=True.

pulp_ansible/app/tasks/collections.py Show resolved Hide resolved
Comment on lines +1184 to +1214
if artifact_file_name := d_artifact_files.get(d_artifact):
artifact_file = open(artifact_file_name, mode="rb")
Copy link
Contributor

@gerrod3 gerrod3 Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this logic. Can you help explain it? Edit Looking at the original PR I was the one who wrote this... Well present me can't remember what past me intended with this logic.

pulp_ansible/app/tasks/upload.py Show resolved Hide resolved
@mdellweg
Copy link
Member Author

I know this is based on my original work on this, but there was discussion back then on how the sha256 for the Collection should be the digest of the MANIFEST.json file inside the tarball. I actually never got around to finishing all of the changes to my PR, so most of my comments here is me pointing out all the areas I didn't change it.

#1119 (comment)

The reason raised for this idea was: A user can rebuild the collection and reupload it. After that they may be confused to find two content units with the same (namespace, name, version) but different artifacts. The whole idea of this pr is to allow having different artifacts claiming the same (namespace, name, version). I as a user would on the other hand be most suspicious if i built a collection, uploaded it downloaded it again and then got a different artifact.
I'd like to revisit that discussion, as it probably decides on whether we can ever implement on_demand sync in a sane way.

@mdellweg mdellweg force-pushed the 1052_uniqueness branch 4 times, most recently from 80667d2 to b09eed7 Compare May 7, 2024 10:48
@mdellweg mdellweg linked an issue May 8, 2024 that may be closed by this pull request
@mdellweg mdellweg force-pushed the 1052_uniqueness branch 2 times, most recently from dcd8b05 to 2733da5 Compare May 13, 2024 14:31
@mdellweg mdellweg force-pushed the 1052_uniqueness branch 2 times, most recently from d857fe1 to b4ded81 Compare May 27, 2024 13:19
This should be ZDU compatible. The transition will be completed with the
next y-release.

fixes: pulp#1052

Co-authored-by: Matthias Dellweg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Content should have a digest in the (system-)uniqueness constraints
2 participants