Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better ergonomics around updating car root. #245

Closed
willscott opened this issue Sep 29, 2021 · 13 comments · Fixed by #250
Closed

Better ergonomics around updating car root. #245

willscott opened this issue Sep 29, 2021 · 13 comments · Fixed by #250
Assignees

Comments

@willscott
Copy link
Member

willscott commented Sep 29, 2021

The v2/blockstore package requires the caller to specify the car root []cid.Cid on opening the file.
In the common case, the caller will not know their intended root until the end of writing.

This currently seems to require manual file editing by the caller. I've tried finalizing, and the re-opening via

cdest, err = blockstore.OpenReadWrite(c.String("file"), []cid.Cid{root})

specifying the actual root CID, where on the initial write I instead generate a proxy CID that I know has the same length as the final CID will. This fails, telling me cannot resume on file with mismatching data header. Forcing me to manually go in and edit the root unsupported seems like a failing in this library interface.

@rvagg
Copy link
Member

rvagg commented Sep 30, 2021

There was another thread about this somewhere in this repo. The API I went with in JS is called CarWriter.updateRootsInBytes() (in-memory form, like the browser) and CarWriter.updateRootsInFile() - for basically the same reason as you want this API I think @willscott since it's needed by ipfs-car to do UnixFS packing. This was kept as a discrete API that requires intentional usage because of the weirdness around having to make sure the header ends up being exactly the same length. IMO it shouldn't be easy to accidentally do this, nor should possible errors regarding length be surprising.

@masih
Copy link
Member

masih commented Sep 30, 2021

relates to #196

@mvdan
Copy link
Contributor

mvdan commented Sep 30, 2021

I think adding an API to amend the roots in a CAR file after-the-fact should be fine. I guess the only wrinkle is that it might need to shift the entire contents of the CAR file if there isn't enough space for the new header before the start of the inner CARv1.

@rvagg
Copy link
Member

rvagg commented Sep 30, 2021

https://github.com/ipld/js-car/blob/9fb8dd81e3600501f62848e97fb62b977766aa2d/lib/writer.js#L75

In JS we've just made that not allowed - you need to plan ahead. If you're using a blake2b for your CIDs then you should use a dummy that's of the same length and not a sha2-256 which is going to be a byte shorter.

@masih
Copy link
Member

masih commented Sep 30, 2021

If we take the temporary file approach for writing a CARv2 suggested in #249, then I wonder if we can structure data such that deferring the writing of roots is much easier. We could write inner CARv1 data sections separately or write inner CARv1 header at the end, then re-assemble the file in finalisation to conform with CARv2 format.

Taking a step further, we could also use the same technique to avoid storing index in memory or re-generate it on resumption. But that's a separate kettle of fish.

@mvdan
Copy link
Contributor

mvdan commented Sep 30, 2021

@masih you're right, we could get more clever if we go with #249 for supplying the roots at the very end.

That said, I think we need a separate fix that's simpler and more like what JS provides. Given a CAR file, update its roots list. Similar to JS, we could make it so that it just errors if the new header does not fit in the space we have (i.e. in the space up to DataOffset).

This could be useful for other use cases aside from OpenReadWrite, too, which is why I think this separate bit of API would be nice. The API signature could be somewhat similar to WrapV1File.

@masih
Copy link
Member

masih commented Sep 30, 2021

Yup that makes sense 👍

@mvdan
Copy link
Contributor

mvdan commented Sep 30, 2021

Scratch what I said about "fits up to DataOffset" - the roots are in the inner CarV1 header, so I think Rod is right - the roots list must be exactly the same size as before.

@willscott
Copy link
Member Author

the point of this whole setup is to not re-write the whole car.
As a user, I don't want to potentially shift / re-write the whole car while doing this, since I've put a bunch of work into calculating the size of my final CID in advance to let the car library know how much space to make the root already.

that said, you could do something tricky by initially putting some extra data offset padding.
then, if roots grow, you can shift the carv1 header to start earlier and just re-write the v1 header so it continues to end at the same place. I would not do that for now.

@mvdan
Copy link
Contributor

mvdan commented Sep 30, 2021

In terms of nice API, maybe we could have a companion to OpenReadWrite to create space for dummy roots:

func PreallocateCidV1Roots(number int, codec, mhType multicodec.Code) []cid.Cid

We could fill the multihash digests with some special sequence of bytes that we can later recognise, to have commands like car verify say "you preallocated the roots but never filled them in", or to make ReadWrite.Finalize refuse to finalize until the final roots are provided.

And, of course, it would let the user provide a list of "preallocating roots" in a single line, without having to work out how to hold go-cid right (e.g. with dummy data to hash).

masih added a commit that referenced this issue Sep 30, 2021
Implement an API that allows a caller to replace root CIDs in an
existing CAR file, may it be v1 or v2, as long as the resulting
serialized header is of identical size to the existing header.

Assert that the new API works in a variety of CARv1 and CARv2 files
along with failure scenarios.

Fixed #245
@masih
Copy link
Member

masih commented Sep 30, 2021

I like the PreallocateCidV1Roots idea 👍
I wonder if it would make a nicer API if we make PreallocateCidV1Roots an Option.

@mvdan
Copy link
Contributor

mvdan commented Sep 30, 2021

@masih maybe, though then you have to worry about people using PreallocateCidV1Roots when they shouldn't be (e.g. in OpenReadOnly), or using PreallocateCidV1Roots with OpenReadWrite while also providing non-zero roots. The non-option function can only be used in a single way, given our current API.

@masih
Copy link
Member

masih commented Sep 30, 2021

using PreallocateCidV1Roots when they shouldn't be (e.g. in OpenReadOnly)

This is true today about our existing options. Irrelevant options are ignored.

using PreallocateCidV1Roots with OpenReadWrite while also providing non-zero roots

For this my thought was to explicitly document if option is set given roots must be empty or something along those lines.

Either approach works.

masih added a commit that referenced this issue Sep 30, 2021
Implement an API that allows a caller to replace root CIDs in an
existing CAR file, may it be v1 or v2, as long as the resulting
serialized header is of identical size to the existing header.

Assert that the new API works in a variety of CARv1 and CARv2 files
along with failure scenarios.

Fixes #245
masih added a commit that referenced this issue Sep 30, 2021
Implement an API that allows a caller to replace root CIDs in an
existing CAR file, may it be v1 or v2, as long as the resulting
serialized header is of identical size to the existing header.

Assert that the new API works in a variety of CARv1 and CARv2 files
along with failure scenarios.

Fixes #245
@mvdan mvdan closed this as completed in #250 Oct 1, 2021
mvdan pushed a commit that referenced this issue Oct 1, 2021
Implement an API that allows a caller to replace root CIDs in an
existing CAR file, may it be v1 or v2, as long as the resulting
serialized header is of identical size to the existing header.

Assert that the new API works in a variety of CARv1 and CARv2 files
along with failure scenarios.

Fixes #245
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants