
Introduction of SQL functions to the style #5004

Open
imagico opened this issue Aug 15, 2024 · 12 comments
Labels
consensus needed Indicates the lack of consensus among maintainers blocks a PR/issue general

Comments

@imagico
Collaborator

imagico commented Aug 15, 2024

In #4952 we have the first concrete change proposed in this style that uses SQL functions in its implementation. This choice was based on the implementation of a similar change elsewhere (#214 (comment)) which #4952 is largely inspired by. To me this seems a natural and non-controversial way to approach this, but apparently it is not, so i open this issue to discuss introducing the technique of SQL functions as a strategic measure. This would allow us to expand our technical capabilities significantly and address key issues of the style while keeping our code base well maintainable and style development accessible for map designers.

Background

Our SQL code base has expanded quite massively over the history of this style, and despite efforts to minimize code duplication using methods supplied by SQL and by YAML, there is a lot of repetition in our SQL code.

Ultimately the problem is that all SQL functionality is implemented inline in project.mml. This severely limits our ability to implement necessary changes. #4952 could still be implemented with inline SQL code, but it would be very awkward and difficult to maintain this way. Other strategic issues like #3880 or design ideas that require ground unit processing (see for example #4340, #4346, #4661, #4948), however, depend fully on us moving to new techniques of structuring and modularizing SQL code.

I had mentioned this necessity before in #4431 (comment):

So far on the SQL code level of this project (which - after the MSS code - is the second most important level where design work happens and that most map designers need to deal with) we are inlining all code in project.mml (with the exception of use of YAML features to de-duplicate completely identical queries). I don't think this is sustainable any more. In particular resolving #3880 in a scalable and future-proof way would not be possible under that paradigm. Same applies most likely for ground unit rendering (#4340, #4346). So introducing use of views, functions or lookup tables or a combination of such seems inevitable if we want to make progress on the larger issues this project faces.

Proposal

The suggestion is to introduce SQL functions as a method to structure and modularize SQL functionality and to avoid unnecessarily duplicating SQL. In #4952 the purpose of these functions would be tag interpretation tasks - which need to be performed identically at several places in the road layers. But the use of functions is not limited to that; for ground unit processing, functions would be used for geometric calculations (converting between map units, ground units and screen pixels). For addressing #3880 a central use of functions would likely be the scale_denominator to zoom level conversion.

SQL functions are not the only method that can be used to improve our abilities to work with SQL in solving map design problems. #3880 will likely require other methods in addition (like script generated SQL code or lookup tables) but SQL functions are quite clearly the most versatile method here. Other use cases can be found in the AC-Style.

The idea would be to approach this in a conservative way:

  • use Rendering specific access tags #4952 as a first test to evaluate the method
  • have the functions in a file functions.sql next to the existing indexes.sql.
  • stick to functions in LANGUAGE SQL as far as reasonably possible.
  • if Rendering specific access tags #4952 turns out to be successful we could identify existing cases of substantial duplication in our SQL code and explore if these can be reasonably avoided by using functions.
  • new suggestions to using functions would be looked at on a case-by-case basis.
  • care would need to be taken to evaluate if the use of functions has a significant performance impact. This can usually be tested on a small test database already, so it is not a big practical problem.
  • when we make changes to function names/parameters we need to take care that previously installed older versions are correctly removed.
  • we need to communicate the need to install the functions we use to style users. Practically this is not going to be a big issue since we already require users to install our indices.
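As a hypothetical sketch (the function name, signature and tag values below are illustrative only, not part of the proposal), an entry in such a functions.sql could look like this - including the explicit DROP so that a previously installed version with a different signature is correctly removed:

```sql
-- Illustrative sketch only: name, signature and values are assumptions.
-- Dropping the old definition first ensures a previously installed
-- version with different parameters does not linger in the database.
DROP FUNCTION IF EXISTS carto_int_access(text);

CREATE FUNCTION carto_int_access(accessvalue text)
RETURNS text AS $$
  SELECT CASE
    WHEN accessvalue IN ('no', 'private') THEN 'no'
    WHEN accessvalue IN ('destination') THEN 'restricted'
  END
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;
```

Declaring such functions IMMUTABLE lets the PostgreSQL planner inline and cache them, which is relevant to the performance question above.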
@imagico imagico added general consensus needed Indicates the lack of consensus among maintainers blocks a PR/issue labels Aug 15, 2024
@pnorman
Collaborator

pnorman commented Aug 19, 2024

Functions can be useful for logic that operates across multiple objects or multiple tables, but most of the examples I see are functions which transform tags into something more usable. This is the job of osm2pgsql, which converts OSM tags into Postgres columns that are useful for making a map.

Do you have examples of functions that don't duplicate functionality of osm2pgsql?

@imagico
Collaborator Author

imagico commented Aug 19, 2024

Let me get one thing straight: it is us who decide how we want to use the tools we use, not the developers of those tools deciding what they should be used for long after OSM-Carto was created. As said before in #4952 - if you want to suggest that we move to doing style specific tag interpretation in preprocessing, either in SQL (#4952 (comment): moving the SQL logic to import time) or via osm2pgsql, then please present your approach and argue why you think this would be of strategic benefit for us. And since the linear discussions on this issue tracker are not suitable for mixing several topics i would strongly suggest moving that to a separate issue.

Regarding applications of SQL functions other than tag interpretation: I have already provided several pointers: a viable solution to #3880 will likely require a !scale_denominator! to zoom level conversion - which is most suitable for being implemented in an SQL function. And since !scale_denominator! is a render time variable this is inherently impossible to do during database import of course. Another natural use case is - as mentioned - conversion between map units, ground units and screen pixels. Currently we only do the map units to screen pixels conversion (20 times, via way_area/NULLIF(POW(!scale_denominator!*0.001*0.28,2),0)) - there is not that much benefit in encapsulating that in a function, but if we decide to work with ground units at some point (and - as mentioned - there are plenty of cases where this could be desirable) that would change. Further examples can be found in the AC-Style, for example the per-country spatially differentiated lookup of default language, currency and driving side of roads. I also widely use functions that combine numeric interpretation of tags with ground unit calculation to determine the drawing size of map elements. Like:

https://github.com/imagico/osm-carto-alternative-colors/blob/7f2aa12ce375e318b60b05332bb7ad135d98a613/sql/symbols.sql#L10
https://github.com/imagico/osm-carto-alternative-colors/blob/7f2aa12ce375e318b60b05332bb7ad135d98a613/sql/roads.sql#L449
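For illustration, the !scale_denominator! to zoom level conversion mentioned above for #3880 could be sketched roughly as follows; the function name is hypothetical, and the z0 scale denominator constant is the approximate value for 256px Web Mercator tiles at standardized rendering:

```sql
-- Hypothetical sketch: derive the zoom level from Mapnik's render-time
-- !scale_denominator! token. 559082263.95 is approximately the scale
-- denominator at zoom 0 for 256px tiles in Web Mercator.
CREATE OR REPLACE FUNCTION carto_zoom(scale_denominator numeric)
RETURNS integer AS $$
  SELECT round(log(2, 559082263.95 / scale_denominator))::integer
$$ LANGUAGE SQL IMMUTABLE;
```

Since the token is only substituted by Mapnik at render time, such a function would be called as carto_zoom(!scale_denominator!) inside the layer queries.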

But yes, a wide range of potentially useful applications of functions for us is style specific tag interpretation. That is why i think it makes sense to test this method with a tag interpretation application in #4952.

@dch0ph
Contributor

dch0ph commented Aug 19, 2024

This is the job of osm2pgsql, which converts OSM tags into Postgres columns that are useful for making a map.

I disagree with this presentation. The job of osm2pgsql is to import the data needed to make a useful map. But unless there are real performance issues, interpretation of data should be left to the point of use for maximum flexibility; otherwise any change to interpretation involves a database reload.

I feel there's a big difference between data normalisation, which should be done at import, and tag interpretation which is best handled by the style, not least because it allows the database to be shared. I used to create custom columns etc. in my own style, and then got rid of this since it was so much better to have a common database structure that could be shared between styles.

Generated columns may be useful in future. These may be easier to support in the flex style. In "classic" osm2pgsql mode they were a bit clunky, as re-importing the database killed the generated columns. In either case, the SQL function that generates the column is still needed!

@imagico
Collaborator Author

imagico commented Aug 30, 2024

Reminder to everyone: This proposal has been open for discussion for more than a week now and progress on #4952 depends on establishing consensus on this.

The question asked regarding examples of SQL functions for purposes other than tag interpretation has been answered.

I would like to once more explicitly invite a critical look at what is suggested here from everyone. Although several IMO weighty arguments have been put forward for the use of SQL functions (including their use for style specific tag interpretation) arguments speaking against it would be very valuable to hear.

I also want to summarize the timeline on #214/#4952 to make clear again why making progress here is of high importance:

If there are arguments why - also in light of this history - the approach taken in #4952, using SQL functions to implement the tag interpretation logic as part of a principal strategy of conservatively introducing SQL functions in this project as proposed here, is not a good idea for OSM-Carto in pursuit of its goals, i want to hear them. Otherwise i would say it is really more than time to get #4952 out of the door.

@pnorman
Collaborator

pnorman commented Aug 30, 2024

Regarding applications of SQL functions other than tag interpretation: I have already provided several pointers: a viable solution to #3880 will likely require a !scale_denominator! to zoom level conversion

Yes, I agree that some calculations that require information only available at time of generation can make sense as functions - basically anything involving Mapnik tokens. I still don't see that it's the best route for calculating values that are only determined by the object's properties.

@dch0ph
Contributor

dch0ph commented Aug 30, 2024

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

But we calculate internal values from object properties all the time:

                CASE WHEN surface IN ('unpaved', 'compacted', 'dirt', 'earth', 'fine_gravel', 'grass', 'grass_paver', 'gravel', 'ground',
                                      'mud', 'pebblestone', 'salt', 'sand', 'woodchips', 'clay', 'ice', 'snow') THEN 'unpaved'
                  WHEN surface IN ('paved', 'asphalt', 'cobblestone', 'cobblestone:flattened', 'sett', 'concrete', 'concrete:lanes',
                                      'concrete:plates', 'paving_stones', 'metal', 'wood', 'unhewn_cobblestone') THEN 'paved'
                END AS int_surface,

Apart from the one-off bump in install complexity, I don't follow why it is worse to do this via an SQL function rather than as inline SQL. My understanding is that small functions would be inlined anyway.

We would not want to calculate int_surface at import time - any change to the set of interpreted surface values would require a reload.

As noted above, generated columns may offer a third way in the future, but you still need the function that does that calculation.
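For illustration, the int_surface CASE quoted above could be encapsulated like this (the function name is hypothetical); whether called inline or used to populate a generated column, the lists of surface values would then live in one place:

```sql
-- Hypothetical wrapper around the int_surface CASE shown above;
-- the function name is illustrative. An inline call and a generated
-- column could both reuse this single definition.
CREATE OR REPLACE FUNCTION carto_int_surface(surface text)
RETURNS text AS $$
  SELECT CASE
    WHEN surface IN ('unpaved', 'compacted', 'dirt', 'earth', 'fine_gravel', 'grass',
                     'grass_paver', 'gravel', 'ground', 'mud', 'pebblestone', 'salt',
                     'sand', 'woodchips', 'clay', 'ice', 'snow') THEN 'unpaved'
    WHEN surface IN ('paved', 'asphalt', 'cobblestone', 'cobblestone:flattened', 'sett',
                     'concrete', 'concrete:lanes', 'concrete:plates', 'paving_stones',
                     'metal', 'wood', 'unhewn_cobblestone') THEN 'paved'
  END
$$ LANGUAGE SQL IMMUTABLE;
```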

@dch0ph invested a lot of time over the months in adjusting this to various suggestions and requests made.

But at least I'm benefitting from this work in my own style ☺️ ...

@imagico
Collaborator Author

imagico commented Aug 30, 2024

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

This does not provide much of a chance to achieve consensus here. You are essentially asking us to prove a negative - the impossibility of a better solution existing. I am (a) not convinced that this is the case and (b) not making any claim in that direction at all with this proposal.

Various arguments have been presented why we think introducing use of SQL functions in OSM-Carto at this stage is a good idea and why #4952 is a suitable first use case - exactly because, like @dch0ph pointed out - it does not introduce something technically fundamentally new (like it would be for ground unit rendering) but structures and modularizes something into functions that we have so far always done inline in project.mml - at the expense of substantial code duplication and poor maintainability. If you find the arguments presented flawed or have concrete arguments against this proposal please present them so we can have a fact based discussion rather than just an exchange of opinions and can try to come to a consensus view on this.

Stating that you are not convinced but not actually arguing on the matter does not provide much of a foundation to work towards consensus.

@imagico
Collaborator Author

imagico commented Aug 31, 2024

My understanding is that small functions would be inlined anyway.

I don't think this is primarily a matter of the length in code, more about what the function is doing. I used the case example of the map units to pixels conversion (way_area/NULLIF(POW(!scale_denominator!*0.001*0.28,2),0)) - that is a static unit conversion that does not implement any map design decision and will never need to be modified. Separating that out into a function with a descriptive name (like mercator_square_meters_to_pixels()) would only have the benefit of hiding the exact implementation of the math - which might make it a bit easier to work with for some map designers but more difficult for others. Bottom line: very little or negative net benefit. This will change if the logic gets more complicated of course (like for ground unit conversion).
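Written out (names here are the hypothetical ones from the text, not an actual proposal), such a function would be little more than the existing inline expression with a label - which illustrates why the net benefit is small for so simple a conversion:

```sql
-- Sketch of the hypothetical mercator_square_meters_to_pixels() from
-- the text; it merely wraps the existing inline expression, with the
-- !scale_denominator! token passed in as a parameter at render time.
CREATE OR REPLACE FUNCTION mercator_square_meters_to_pixels(
  way_area real, scale_denominator numeric)
RETURNS real AS $$
  SELECT (way_area / NULLIF(POW(scale_denominator*0.001*0.28, 2), 0))::real
$$ LANGUAGE SQL IMMUTABLE;
```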

Things are different for map design specific logic. For example we have 7 occurrences of

CASE WHEN service IN ('parking_aisle', 'drive-through', 'driveway') THEN 'INT-minor'::text ELSE 'INT-normal'::text END 

which generates the minor service road distinction from the service tag. That is much shorter than the int_surface case you used as an example - still it could be worthwhile to encapsulate that logic in a function so someone who wants to change which service tags (or potentially other tags) lead to classification as minor or normal service roads only has to make one instead of seven modifications. I am not saying we should necessarily do that in this case (discussing that would be part of step 4/5 of my proposal) - but the benefit would be much clearer than for the map units to pixels conversion.
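Encapsulated in a function (the name is hypothetical, for illustration only), that logic would look like this, and changing the classification would be a single edit instead of seven:

```sql
-- Hypothetical encapsulation of the minor service road distinction
-- quoted above; the function name is illustrative only.
CREATE OR REPLACE FUNCTION carto_int_service(service text)
RETURNS text AS $$
  SELECT CASE
    WHEN service IN ('parking_aisle', 'drive-through', 'driveway')
      THEN 'INT-minor'::text
    ELSE 'INT-normal'::text
  END
$$ LANGUAGE SQL IMMUTABLE;
```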

There is a third type of highly repetitive code element in our SQL code that does not represent style specific logic but implements practically immutable aspects of the OSM data model. Most obvious of those is probably the COALESCE(layer,0) AS layernotnull (16 occurrences). Since such elements do not practically need to be edited by map designers and are simple enough to be intuitively clear to everyone familiar with OSM, they could indeed be implemented in preprocessing. But note this is not trivial even in such a simple case - the difference between an explicit layer=0 and a missing layer tag is not necessarily irrelevant for rendering. But this is definitely something i would consider worth discussing (it would be part of developing a data preprocessing strategy in #4901). At this stage, however, this is a theoretical discussion since no concrete proposal has been presented for a strategy in that direction. And i do not see much overlap with the discussion here - the distinction between style specific and style independent interpretation of the OSM data is quite clear.

@pnorman
Collaborator

pnorman commented Aug 31, 2024

I still don't see that it's the best route for calculating values that are only determined by the object's properties.

This does not provide much of a chance to achieve consensus here. You are essentially asking us to prove a negative - the impossibility of a better solution to exist.

But I have provided what I view as a better solution, you have just considered it off-topic for this discussion.

@imagico
Collaborator Author

imagico commented Sep 1, 2024

But I have provided what I view as a better solution, you have just considered it off-topic for this discussion.

You have hinted at the abstract idea to do style specific tag interpretation in pre-processing - but you have not presented a practical strategy how to do that or arguments why you think this is a beneficial approach for OSM-Carto. I have asked you to do so several times in #4952 and explained there why i consider it necessary to separate this from the PR review discussion (#4952 (comment), #4952 (comment), #4952 (comment), #4952 (comment)).

@dch0ph has argued a bit why style specific tag interpretation with SQL functions is preferable to the abstract idea of doing it in pre-processing in some way. I do not see how doing this in pre-processing can be preferable to what is suggested here either, considering the goals of this project - but i do not consider it a productive approach to argue this in an abstract fashion at this stage. I want to be open to the possibility that i am wrong about this and that a well thought through, practically viable concept can address my arguments and convince me to change my mind. But you (or someone else) have to present such a concept and arguments why this is preferable to using SQL functions. Then we can have a fact based discussion of the arguments for and against the different approaches and develop consensus or come up with a third strategy based on the insights gained that is better than the individual proposals.

This is what an openly cooperative project like OSM-Carto is all about. Developing a common strategy on how to tackle the challenges we meet that is better suited than what each of us could come up with individually by combining our diverse viewpoints and experiences to jointly develop the best approach to do things under our shared goals. That requires investing the time and energy to present your ideas and reasoning to the critical review of others. I am doing this here for the approach we have in #4952 developed over the past months based on ground work of the past years. I expect you to do the same for your preferred approach.

A key to productive cooperation is also to value each other's views and perspectives even if you disagree with them, and the willingness to discuss them to develop a common understanding (see also #3951 (comment)). Your repeated assertion that tag interpretation in pre-processing is preferable, combined with the non-engagement in argument either for your preferred approach or against the proposal presented here, makes it difficult to develop consensus.

@imagico
Collaborator Author

imagico commented Sep 30, 2024

Another month has passed with no substantially new input. Everyone had plenty of time to contemplate the matter so i think we can try to wrap this up.

@pnorman - please indicate if you want to put up a concrete alternative proposal for a strategy to move style specific tag interpretation to the database import as a (partial) alternative to what is suggested here. I would very much like us to decide on this proposal with a concrete alternative to compare to, and choose the better of two options rather than the only one practically available. But for that we'd need a concrete alternative with an implementation strategy (like i provided here) and an idea of how this will practically look (as we have in #4952). The general assertion that you'd rather do that in preprocessing is neither suitable for a consensus decision among the maintainers regarding strategic direction nor does it provide sufficient guidance to developers on how to practically implement changes that do tag interpretation.

@mboeringa

@imagico

I think introducing functions is reasonable. Duplicate code is never desirable from a maintenance point of view, and should be avoided whenever and to the extent possible.

As you said, it is also not too much of a burden - the installation of the functions - compared to what already needs to be done, e.g. indexing.

I do also see the point @pnorman tries to make, and of course we should be glad osm2pgsql has so much more to offer nowadays in terms of tag processing via the new flex style processing, but I think these options are largely complementary and interchangeable, rather than clear-cut choices in specific cases. One is not necessarily better than the other, as the arguments by @dch0ph also show; it all depends on your goals with the data processing.

I do also agree with the general wish to minimize tag processing in the import stage, and leave the tag interpretation to the style (but of course the 'flex' configuration files can now largely represent "style" as well if developers wish to, that is what we gained with recent versions of osm2pgsql).
