Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Phil/pub conflicts deux #1662

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Phil/pub conflicts deux #1662

wants to merge 5 commits into from

Conversation

psFried
Copy link
Member

@psFried psFried commented Sep 26, 2024

Description:

Rolls up some refactors of the publish api and the remainder of the backend work for #1520.

This change should not really be visible to end users.
The one caveat is that now the publications.id column will no longer match up with live_specs.last_pub_id or publication_specs.pub_id, since the true publication id is now determined by the publisher. I added the publications.pub_id column in order to allow joining publications to those tables, since that's still useful for debugging.

Note that the publication_specs_ext view is unaffected by that change, since it does not join to publications. Thus flowctl catalog history still works without any changes.

Workflow steps: no changes

Documentation links affected: n/a

Notes for reviewers:

I'm planning to follow on soon with one last PR to finish out the remainder of #1520. The basic idea for that is to add an Initialize impl that will update drafted (non-touch) collection specs with their current inferred schema values. Then rip out the inferred schema updating behavior from the validation crate.


This change is Reviewable

@psFried psFried force-pushed the phil/pub-conflicts-deux branch 2 times, most recently from 30cce7e to c59397b Compare September 27, 2024 16:21
@psFried psFried marked this pull request as draft September 27, 2024 17:04
@psFried
Copy link
Member Author

psFried commented Sep 27, 2024

I moved this back to a draft. At this point, I think it'd be unwise to merge/deploy it until after I'm back from vacation, so I'm thinking I might as well fold in the inferred schema changes to finish out the rest of #1520 .

Copy link
Member

@jgraettinger jgraettinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great so far! Really like the DraftPublication refactor.

draft,
dry_run: false,
detail: Some(format!("publication for data-plane {base_name}")),
// TODO: Should we verify user authz for publications to create a data plane?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's currently handled at L69 at the routine entrypoint: we require the user is an admin of ops/.

dry_run,
detail: Some(format!("publication for updating L2 reporting")),
default_data_plane_name: Some(default_data_plane.clone()),
// TODO: should we verify user authz for updating L2 reporting?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same story, it's done first-thing by the routine, unless I'm not understanding the question.
I suppose a difference here is that the user could be an admin of ops/ but not be an admin of ops.us-central1/ and then it would fail? Meh 🤷

}
}

/// A `FinalizeBuild` that removes and built collection specs that are not referenced by other
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supernit: removes and built => removes any built

Note that this must be used in conjuction with the ExpandDraft Initialize impl, or else it can prune collections that are referenced by live specs that are not included in the draft.

Also, is it a problem that it might remove collections referenced only by pass-through live specs? I'm not seeing why.

pub struct ExpandDraft {
/// Whether to filter specs based on the user's capability. If true, then only specs for which
/// the user has `admin` capability will be added to the draft.
pub filter_user_has_admin: bool,
Copy link
Member

@jgraettinger jgraettinger Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been debating whether admin is too restrictive -- I could see an argument that Alice's update of C should be able to observe / touch-publish a materialization M for which they have read but not admin.

But ultimately I think you're right, and it should be admin, because we don't want to fail Alice's publish of C due to a materialization M that they're unable to change. That's the admin of M's problem, and they'll see a controller error about the issue.

This is a departure from my prior understanding of how spec expansion should work. In the past, I've thought that you would see specs that make use of a read grant you provide for your collection, for which you yourself have no grant. Controllers and the eventual consistency they provide have changed the narrative here.

.await?;
let mut expanded_names = Vec::with_capacity(expanded_rows.len());
for exp in expanded_rows {
if self.filter_user_has_admin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be pushed down into fetch_expanded_live_specs ? I'm betting that routine can be simplified a bunch given the refined / simpler authorization expectations noted above ☝️

Passing an explicit publication id was important for our previous
approach to handling dependencies, but is no longer needed now that we
have touch publications. With a touch, the publication id is irrelevant
since the `last_pub_id` of the touched sped will not change.

Removing it from the `publish` function arguments allows the actual id
to be generated by the publisher, and sets us up for automatically
re-trying `PublicationSuperseded` errors, which requires generating a
new publication id.
Significantly refactors the `Publisher`, in order to address a number of
outstanding issues. Introduces a `DraftPublication` struct that
represents the desire to publish a draft, along with associated metadata
and configuration. Also added the `Publisher::publish(&self, DraftPublication)`
function, which handles build, commit, and retries.

Fixes #1634 by bypassing authorization checks for controller-initiated
publications. The `verify_user_authz` field of `DraftPublication` can be
used to toggle this behavior.

Takes care of a portion of #1520 by generating a new publication id for
each attempted publication, rather than re-using the `publications.id`
column value. This required adding a new `publications.pub_id` column
to hold the effective publication id, since `publications.id` can no
longer be used to join to `publication_specs` or
`live_specs.last_pub_id`.
Updates the default publication retry policy to retry `BuildSuperseded`
and `PublicationSuperseded` errors. This addresses a portion of #1520
Updates the agent to always update collection inferred schemas on any
non-touch collection publications. This makes it impossible for a user
to accidentally clobber an embedded inferred schema definition.

Also removes inferred schema handling from the `validation` crate, since
the drafted specs will already contain the updated schemas. Now all
validation needs to do is to bundle the write schema if necessary.
Resolves #1520

Updates the agent discovers handler to no longer set `expect_pub_id` for
collection specs. The goal is to avoid conflicts for collections that
use inferred schemas. There's no longer a risk of clobbering inferred
schemas due to the previous commit, which ensures published collections
always use the latest inferred schema. So it seems relatively safe to
omit `expect_pub_id` for the collection specs, and doing so ensures that
publications cannot fail simply due to a concurrent update of the
inferred schema.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

agent: controller publications should bypass authorization checks
2 participants