
Automatic Registration of metrics, nodes, MV, transforms etc as well as simpler way to allow any query dimensions available for Transform #1117

Open
bazooka720 opened this issue Jul 25, 2024 · 6 comments

Comments


bazooka720 commented Jul 25, 2024

Is there a simpler way to enable registration of queries as metrics and nodes (as well as allowing nodes other than source nodes to serve as the basis for transforms)?

One approach is a simple folder structure where the file name and a description comment at the top of each SQL file can be automatically consumed into DJ. Essentially, anything that makes it easy for existing projects to transition into DJ would help accelerate adoption of DJ.
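A minimal sketch of this folder convention, assuming each `.sql` file maps to one node named after the file and a leading `--` comment block is its description. The `discover_nodes` helper and the payload shape are hypothetical, not an existing DJ API:

```python
# Sketch: walk a project folder of .sql files and turn each one into a
# candidate DJ node. The file name becomes the node name and the leading
# "--" comment lines become the description. Illustrative only.
from pathlib import Path

def discover_nodes(project_dir: str) -> list[dict]:
    nodes = []
    for sql_file in sorted(Path(project_dir).rglob("*.sql")):
        lines = sql_file.read_text().splitlines()
        # Collect the leading "-- ..." comment lines as the description.
        description = []
        for line in lines:
            if line.startswith("--"):
                description.append(line.lstrip("- ").strip())
            else:
                break
        query = "\n".join(lines[len(description):]).strip()
        nodes.append({
            "name": sql_file.stem,               # file name -> node name
            "description": " ".join(description),
            "query": query,
        })
    return nodes
```

A client or CLI could then walk this list and register each entry with the DJ server.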

@agorajek @samredai, hope this summarizes our chat properly.
PS: See https://github.com/Bl3f/yato for an example; it's not exactly the same as above, but the intent is relevant.

@samredai (Contributor)

@bazooka720 I think this is a really interesting idea and should definitely be possible with a bit of work. It could replace our concept of a "project", which is essentially a collection of YAML files. However, that logic is part of the client library and runs locally; I mention that because the client doesn't have access to the SQL parser and builder, which are currently all part of the server-side code.

So the first question we need to answer is where this logic would live: should it all run client-side, or will the files be bundled and sent to the server for processing? If we decide it should all run client-side, the next question is how to share the SQL AST and related logic from the server with the client library (maybe break it out as a standalone shared library?).

@agorajek (Member)

Let's consider a single SQL file (a DBT model, a Looker explore, etc.). We could make a set of similar APIs:

  • register_sql( file-name|query-string, db_connection, target_namespace)
  • register_dbt( file-name|model-string, db_connection, target_namespace)
  • etc.

In all of the above cases we could parse the content of the first argument, find all the referenced tables, connect via db_connection to pull the schemas of the referenced tables, and create a transform node plus any other metric/dimension nodes based on the content. Both target_namespace and db_connection could be optional.

And once we get the above APIs right, we could simply apply them to a collection of such files using a client or a CLI.
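The proposed register_sql flow could be sketched roughly as below. Everything here is illustrative: the naive regex stands in for DJ's real SQL parser, SQLite's PRAGMA stands in for a warehouse's information schema, and both `register_sql` and the returned payload shape are hypothetical:

```python
# Hypothetical register_sql sketch: extract referenced tables (a real
# implementation would use DJ's server-side SQL parser, not a regex),
# pull each table's schema over db_connection, and assemble a
# transform-node payload. Not an existing DJ API.
import re
import sqlite3

TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def register_sql(query: str, db_connection: sqlite3.Connection,
                 target_namespace: str = "default") -> dict:
    tables = sorted(set(TABLE_RE.findall(query)))
    # Pull (column name, type) pairs for every referenced table.
    # SQLite's PRAGMA is used here as a stand-in for the warehouse schema.
    schemas = {
        t: [(row[1], row[2]) for row in
            db_connection.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }
    return {
        "namespace": target_namespace,
        "type": "transform",
        "query": query,
        "upstream_tables": tables,
        "schemas": schemas,
    }
```

A register_dbt variant would differ mainly in how it resolves `ref()`/`source()` calls into table names before the same schema-pulling step.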

@bazooka720 (Author)

This makes a lot of sense @agorajek. It would allow pretty much instant onboarding for any company with pre-existing models. @samredai I think it would probably be easier for the client to push the logic to DJ (this is good because the client can change more easily and can set triggers to sync with DJ).

@samredai (Contributor)

Sounds good, I really like how this sounds. If we want to do the processing on the server side, I think we should lean into that and batch the entire set of local SQL files together into one request. Then the server can validate them all in one go, including the references between them.
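The batching idea could look something like this on the client side, assuming a hypothetical server endpoint that accepts the whole project at once (no such endpoint exists today; the payload shape is made up for illustration):

```python
# Sketch: bundle every local SQL file into a single request body so the
# server can validate the whole set, including cross-file references,
# in one pass. The payload shape and endpoint are hypothetical.
import json
from pathlib import Path

def build_batch_payload(project_dir: str) -> str:
    files = [
        {"path": str(p.relative_to(project_dir)), "sql": p.read_text()}
        for p in sorted(Path(project_dir).rglob("*.sql"))
    ]
    return json.dumps({"files": files})
```

The client would then POST this payload once (e.g. to a hypothetical `/projects/validate` route) instead of issuing one request per file, letting the server resolve references between files before committing anything.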

I'll try to create a set of these SQL files that models the tpch.sf1 sample schema that comes with the OSS Trino image, and then we can use that as a test suite when developing this.

@samredai (Contributor)

samredai commented Aug 1, 2024

> In all above cases we could parse the content of the 1st argument, find all the referenced tables, connect to db_connection to pull the schema of the referenced tables and create a transform node + any other metric|dim nodes based on the content. Both the target_namespace and db_connection could be optional.

@agorajek I think we should talk about this. If you look at the issue @vcknorket linked, I believe it's an alternative to what you're describing here. If we could register raw database SQL as a source node, that would prevent us from having to actually parse it and create a virtual DAG within DJ. This is really just a handy shortcut that's better than telling people to go create their view(s) on their own and then come register them as source nodes in DJ. It's still an open question how DJ will create/manage the view, but maybe it's as simple as: DJ creates the view if it doesn't exist, and updates it whenever it's changed in DJ.
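The "create it if missing, update it when it changes" behavior could be sketched like this, using in-memory SQLite as a stand-in warehouse. The `sync_view` helper is hypothetical; DJ does not currently expose anything like it:

```python
# Sketch: idempotent view management. Create the view if it's missing;
# if its stored definition differs from the desired one, drop and
# recreate it. SQLite stands in for the real warehouse.
import sqlite3

def sync_view(conn: sqlite3.Connection, name: str, definition: str) -> None:
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = ?",
        (name,),
    ).fetchone()
    desired = f"CREATE VIEW {name} AS {definition}"
    if row is None:
        conn.execute(desired)              # view missing: create it
    elif row[0] != desired:
        conn.execute(f"DROP VIEW {name}")  # definition changed: recreate
        conn.execute(desired)
```

Calling `sync_view` on every DJ-side edit would keep the warehouse view in step with the node definition without special-casing first-time creation.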

Once these views are in DJ as source nodes, nothing else really changes, meaning transforms/metrics/dimensions/cubes can be created on top of them just as happens today.

@agorajek (Member)

agorajek commented Aug 5, 2024

@samredai I think we are getting closer to a solution. I just read your suggestion above and I think it matches pretty closely with the solution described in #1127.
