-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Phil/pub conflicts deux #1662
base: master
Are you sure you want to change the base?
Phil/pub conflicts deux #1662
Conversation
30cce7e
to
c59397b
Compare
I moved this back to a draft. At this point, I think it'd be unwise to merge/deploy it until after I'm back from vacation, so I'm thinking I might as well fold in the inferred schema changes to finish out the rest of #1520 . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great so far! Really like the DraftPublication refactor.
draft, | ||
dry_run: false, | ||
detail: Some(format!("publication for data-plane {base_name}")), | ||
// TODO: Should we verify user authz for publications to create a data plane? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's currently handled at L69 at the routine entrypoint: we require the user is an admin of ops/
.
dry_run, | ||
detail: Some(format!("publication for updating L2 reporting")), | ||
default_data_plane_name: Some(default_data_plane.clone()), | ||
// TODO: should we verify user authz for updating L2 reporting? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same story, it's done first-thing by the routine, unless I'm not understanding the question.
I suppose a difference here is that the user could be an admin of ops/
but not be an admin of ops.us-central1/
and then it would fail? Meh 🤷
} | ||
} | ||
|
||
/// A `FinalizeBuild` that removes and built collection specs that are not referenced by other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supernit: removes and built => removes any built
Note that this must be used in conjuction with the
ExpandDraft
Initialize
impl, or else it can prune collections that are referenced by live specs that are not included in the draft.
Also, is it a problem that it might remove collections referenced only by pass-through live specs? I'm not seeing why.
pub struct ExpandDraft { | ||
/// Whether to filter specs based on the user's capability. If true, then only specs for which | ||
/// the user has `admin` capability will be added to the draft. | ||
pub filter_user_has_admin: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been debating whether admin
is too restrictive -- I could see an argument that Alice's update of C should be able to observe / touch-publish a materialization M for which they have read but not admin.
But ultimately I think you're right, and it should be admin, because we don't want to fail Alice's publish of C due to a materialization M that they're unable to change. That's the admin of M's problem, and they'll see a controller error about the issue.
This is a departure from my prior understanding of how spec expansion should work. In the past, I've thought that you would see specs that make use of a read grant you provide for your collection, for which you yourself have no grant. Controllers and the eventual consistency they provide have changed the narrative here.
.await?; | ||
let mut expanded_names = Vec::with_capacity(expanded_rows.len()); | ||
for exp in expanded_rows { | ||
if self.filter_user_has_admin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be pushed down into fetch_expanded_live_specs
? I'm betting that routine can be simplified a bunch given the refined / simpler authorization expectations noted above ☝️
Passing an explicit publication id was important for our previous approach to handling dependencies, but is no longer needed now that we have touch publications. With a touch, the publication id is irrelevant since the `last_pub_id` of the touched sped will not change. Removing it from the `publish` function arguments allows the actual id to be generated by the publisher, and sets us up for automatically re-trying `PublicationSuperseded` errors, which requires generating a new publication id.
Significantly refactors the `Publisher`, in order to address a number of outstanding issues. Introduces a `DraftPublication` struct that represents the desire to publish a draft, along with associated metadata and configuration. Also added the `Publisher::publish(&self, DraftPublication)` function, which handles build, commit, and retries. Fixes #1634 by bypassing authorization checks for controller-initiated publications. The `verify_user_authz` field of `DraftPublication` can be used to toggle this behavior. Takes care of a portion of #1520 by generating a new publication id for each attempted publication, rather than re-using the `publications.id` column value. This required adding a new `publications.pub_id` column to hold the effective publication id, since `publications.id` can no longer be used to join to `publication_specs` or `live_specs.last_pub_id`.
Updates the default publication retry policy to retry `BuildSuperseded` and `PublicationSuperseded` errors. This addresses a portion of #1520
Updates the agent to always update collection inferred schemas on any non-touch collection publications. This makes it impossible for a user to accidentally clobber an embedded inferred schema definition. Also removes inferred schema handling from the `validation` crate, since the drafted specs will already contain the updated schemas. Now all validation needs to do is to bundle the write schema if necessary.
c59397b
to
4e6f5ea
Compare
Resolves #1520 Updates the agent discovers handler to no longer set `expect_pub_id` for collection specs. The goal is to avoid conflicts for collections that use inferred schemas. There's no longer a risk of clobbering inferred schemas due to the previous commit, which ensures published collections always use the latest inferred schema. So it seems relatively safe to omit `expect_pub_id` for the collection specs, and doing so ensures that publications cannot fail simply due to a concurrent update of the inferred schema.
Description:
Rolls up some refactors of the publish api and the remainder of the backend work for #1520.
Build/PublicationSuperseded
errors (inferred schema updates blocking interactive publications of captures and materializations #1520)This change should not really be visible to end users.
The one caveat is that now the
publications.id
column will no longer match up withlive_specs.last_pub_id
orpublication_specs.pub_id
, since the true publication id is now determined by the publisher. I added thepublications.pub_id
column in order to allow joiningpublications
to those tables, since that's still useful for debugging.Note that the
publication_specs_ext
view is unaffected by that change, since it does not join topublications
. Thusflowctl catalog history
still works without any changes.Workflow steps: no changes
Documentation links affected: n/a
Notes for reviewers:
I'm planning to follow on soon with one last PR to finish out the remainder of #1520. The basic idea for that is to add an
Initialize
impl that will update drafted (non-touch) collection specs with their current inferred schema values. Then rip out the inferred schema updating behavior from thevalidation
crate.This change is