Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle empty result sets in CTEs #92

Merged
merged 5 commits into from
Jun 3, 2024

Conversation

camuthig
Copy link
Contributor

If Django finds a CTE during SQL generation that will result in empty results, it will throw an EmptyResultSet immediately. Because the CTEs are compiled to SQL prior to the base query in the CTEQueryCompiler, though, the base compiler is missing core information Django needs to gracefully handle this situation, like col_count and klass_info.

To resolve this, in the case of EmptyResultSet being thrown by CTE compilation, the base as_sql will be called as well before reraising.

The creation of the base queryset can not be move before the explain behaviors in generate_sql or the generated EXPLAIN will throw an error, so the try/except pattern is used instead.

Resolves #84

I think this will also work for #64 as well, but I couldn't figure out a way to throw that particular error.

If Django finds a CTE during SQL generation that will result in empty
results, it will throw an EmptyResultSet immediately. Because the CTEs
are compiled to SQL prior to the base query in the CTEQueryCompiler,
though, the base compiler is missing core information Django needs to
gracefully handle this situation, like `col_count` and `klass_info`.

To resolve this, in the case of EmptyResultSet being thrown by CTE
compilation, the base `as_sql` will be called as well before reraising.

The creation of the base queryset can not be move before the explain
behaviors in `generate_sql` or the generated `EXPLAIN` will throw an
error, so the try/except pattern is used instead.
@millerdev
Copy link
Contributor

The wrong result is returned when using LEFT OUTER JOIN on the CTE. Here's a test to demonstrate:

    def test_left_outer_join_on_empty_result_set_cte(self):
        totals = With(
            Order.objects
            .filter(id__in=[])
            .values("region_id")
            .annotate(total=Sum("amount")),
            name="totals",
        )
        orders = (
            totals.join(Order, region=totals.col.region_id, _join_type=LOUTER)
            .with_cte(totals)
            .annotate(region_total=totals.col.total)
            .order_by("amount")
        )

        self.assertEqual(len(orders), 22)

This explicitly passes in the `elide_empty` parameter to each CTE
compiler based on if the `join` used by it is an outer or inner join. By
passing in `elide_empty` as false, instead of the compiler raising an
`EmptyResultSet` error, it will instead alter the query to always return
no results by setting the where clause to something that is always
false.
@camuthig
Copy link
Contributor Author

camuthig commented May 20, 2024

@millerdev I believe the latest commit should handle that case. It requires more introspection to handle the join, since the join aspect isn't what is throwing the EmptyResultSet - that is coming from the CTE itself. I leveraged the aliases map to understand how the CTE is being used in the rest of the query, and then the existing elide_empty parameter on the get_compiler function to not throw those errors if those CTEs were left joins outer joins.

In the master branch, the left outer joins still caused an empty result set and resulted in the klass_info access error. In this branch, that was originally altered to silently return no results after the empty result set was handled. Now it actually runs the query, but forces it to just not have results without throwing any errors.

Comment on lines 82 to 85
elide_empty = True
alias = query.alias_map.get(cte.name)
if isinstance(alias, QJoin) and alias.join_type == LOUTER:
elide_empty = False
Copy link
Contributor

@millerdev millerdev May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this conditional value of elide_empty needed? Would it work to always pass elide_empty=False? Is there a disadvantage to doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kind of going with the flow of what Django does as default behavior in the core compiler. The default behavior in the compiler is to raise an assertion error and not run the query in the case of a knowable empty result set. The alternate flow, with elide_empty set to false, is used when queries need to be run regardless of the fact that they will not return results.

@millerdev
Copy link
Contributor

Tests are failing. Unfortunately elide_empty is not available on all supported versions of Django. There is also a lint error.

Before Django 4.0, the `elide_empty` argument on the compiler did not
exist. To ensure that left outer joins in these versions of Django are
not optimized away by the SqlCompile, essentially the same behavior is
added to the error handling in those versions of Django. This creates
a copy of the CTE query, forces the where to have a always-false
condition that the SqlCompiler cannot optimize away, and then builds
the SQL for that query instead.
@camuthig
Copy link
Contributor Author

It looks like elide_empty was added in 4.0.

I have fixed the lint errors and cleaned up the code a bit to enable specifically handling the situation in the except block for the older versions of Django. This is basically the same way that elide_empty works under the hood, it just isn't as clean of an implementation in the library-space.

Copy link
Contributor

@millerdev millerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I made a few suggestions.

django_cte/query.py Outdated Show resolved Hide resolved
django_cte/query.py Outdated Show resolved Hide resolved
django_cte/query.py Outdated Show resolved Hide resolved
django_cte/query.py Outdated Show resolved Hide resolved
@camuthig
Copy link
Contributor Author

Thanks for the feedback. It looks like the connector has always been optional, so I have removed it as requested and pulled in the other feedback.

Copy link
Contributor

@millerdev millerdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dragging this out, but I noticed one more thing that I'd like to see changed. Otherwise this is looking great. Thanks for contributing to Django CTE!

django_cte/raw.py Outdated Show resolved Hide resolved
@millerdev millerdev merged commit c02b8e4 into dimagi:master Jun 3, 2024
15 checks passed
@millerdev
Copy link
Contributor

Thank you for the contribution @camuthig!

@camuthig
Copy link
Contributor Author

camuthig commented Jun 5, 2024

No problem. Thanks for the awesome project @millerdev . My team is starting to lean on it more to improve our more out of control queries. Do you know when you might cut a release that includes this change? We are pinned to my branch until we can get back onto official releases, but we have a few behaviors that fall over without this fix.

@millerdev
Copy link
Contributor

Thanks for the prompt. New release: https://pypi.org/project/django-cte/1.3.3/

@SupImDos
Copy link
Contributor

@millerdev Not a huge deal, but would it be possible to do a tag / release on GitHub for v1.3.3 as well?

@millerdev
Copy link
Contributor

Yes, it had been tagged, but I forgot to push it. https://github.com/dimagi/django-cte/releases/tag/v1.3.3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

issue while trying to use with_cte on EmptyResultSet
3 participants