Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed Engine: push down even more expressions #483

Open
MichaHoffmann opened this issue Aug 15, 2024 · 0 comments
Open

Distributed Engine: push down even more expressions #483

MichaHoffmann opened this issue Aug 15, 2024 · 0 comments
Assignees

Comments

@MichaHoffmann
Copy link
Contributor

MichaHoffmann commented Aug 15, 2024

I have been thinking what makes an expression computable in a remote engine and to be reassembled in the root engine. I think the following might be a sound algorithm that enables us to push down even more expressions.

Problem

Assume we have remote engines that are partitioned by a label P. Imagine an expression like topk(10, sum by (P, instance) (X)) and assume that instance is a high-cardinality label. In the current implementation we would have to evaluate sum by (P, instance) (X) remotely and fetch all values of instance into the root engine to compute the surrounding topk even though we would not need most of them, clearly we can evaluate topk per engine and only coalesce the top 10 values for the inner expression since we dont have any overlap. But the algorithm right now doesnt know that.

Observation

Imagine we are traversing the AST of a PromQL expression from the bottom up. Assume we have expression nodes P and C, where P is the parent of node C. If the expression at node C still has its original P label I want to conjecture that we can evaluate C in the remote engines and how we coalesce in the root engine depends on the parent node P. If P is nil and we are at the root we just join the series together, if its an aggregation we do the aggregation in the root engine, etc.

What is meant with the heuristic that expression node C is having the original P label still? I think we can define this recursively. In the base case we are at a vector selector, by definition it will have P (disregarding constant expressions and absent functions right now). If we are a node that has children and all children have the P label still then we also have it if we are not a relabel_replace function that overwrites it, if we are not aggregating it away (i.e. sum without(P) (X) would lose P) or if we are not a binary expression that doesnt have P in its matching labels (i.e. A * ignoring (P) B would lose P). In all other cases unless im forgetting something we should still have the partition label P and should be computable in a remote engine.

Suggestion

We rewrite the distribution algorithm by checking recursively if we keep the partition labels P, if so we traverse up until we cannot anymore. Once a parent P would lose the label we stop recursion and evaluate the expression henceforth in the root engine, we coalesce depending on the parent node P.

Is there anything obiously wrong with this approach?

Edit: this should also hold if P is multiple labels that form a logical partition of remote engines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants