
Trigger Geopoint ES Index on Geospatial Feature Flag Enable #35126

Open · wants to merge 18 commits into base: master

Conversation

zandre-eng
Contributor

@zandre-eng zandre-eng commented Sep 16, 2024

Product Description

When the Geospatial feature flag is enabled, the domain will see the following alert on all Geospatial pages, showing the progress of indexing the domain's location properties:
[Screenshot from 2024-09-16 11-36-30: indexing progress alert]

If, however, the domain's case count is over the limit then the following error alert will be displayed instead:
[Screenshot from 2024-09-16 11-35-32: case limit error alert]

Technical Summary

Link to ticket here.
Link to tech spec here.

A new Celery task has been added which will be automatically started when the GEOSPATIAL feature flag is enabled for a domain. This new task essentially runs through the same logic as the index_geolocation_case_properties management command, where ES docs for a domain have their location case properties indexed to add a geopoint property. This is done so that these ES docs can be used within the context of the Geospatial feature.
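The batched shape of the indexing logic described above can be sketched as follows. This is a self-contained illustration with hypothetical names (`index_case_docs`, `process_batch`); the actual implementation lives in the `index_geolocation_case_properties` management command and operates on ES docs rather than an in-memory list.

```python
# Sketch only: the real task reuses the management command's logic and
# writes a geopoint property onto each ES doc. Names here are hypothetical.
from math import ceil

DEFAULT_CHUNK_SIZE = 100


def index_case_docs(doc_ids, process_batch, chunk_size=DEFAULT_CHUNK_SIZE):
    """Process doc ids in fixed-size batches; returns the batch count."""
    batch_count = ceil(len(doc_ids) / chunk_size)
    for i in range(batch_count):
        batch = doc_ids[i * chunk_size:(i + 1) * chunk_size]
        process_batch(batch)  # stand-in for indexing the geopoint property
    return batch_count
```

Processing in batches keeps memory bounded and lets the task report incremental progress after each chunk.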

The CeleryTaskTracker model has been expanded to support storing a message in Redis, as well as an error status. These are used to show either the task's progress or the applicable error message to the user on the front-end.
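A minimal sketch of the expanded tracker, using the key names that come up in the review discussion below (`task_key`, an error slug key) and substituting a plain dict for the Redis client so the example is self-contained; timeouts and progress tracking are omitted.

```python
# Sketch, not the real model: a dict stands in for the Redis client,
# and timeout handling is left out.
class CeleryTaskTracker:
    def __init__(self, task_key):
        self.task_key = task_key
        self.error_slug_key = f'{task_key}_error_slug'
        self._client = {}  # stand-in for the Redis client

    def mark_as_active(self):
        self._client[self.task_key] = 'ACTIVE'

    def mark_as_error(self, error_slug=None):
        if error_slug:
            self._client[self.error_slug_key] = error_slug
        self._client[self.task_key] = 'ERROR'

    def get_status(self):
        status = self._client.get(self.task_key)
        return {
            'status': status,
            'error_slug': (self._client.get(self.error_slug_key)
                           if status == 'ERROR' else None),
        }
```

Storing only a slug (rather than a message string) lets the front-end pick a translated message at render time.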

Feature Flag

GEOSPATIAL

Safety Assurance

Safety story

  • Local testing
  • QA to be done

This change will only affect domains that enable the Geospatial feature flag. Additionally, an ES doc limit has been put in place so that the Celery task only starts indexing if the domain has fewer cases than the specified limit. The Celery task itself has also been set to run as a serial task, so no more than one of these tasks is expected to be active at a time in the environment.
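The two safety checks described above can be expressed as a small guard. This is a sketch with a hypothetical function name; only the `MAX_GEOSPATIAL_INDEX_DOC_LIMIT` setting comes from the PR's diff.

```python
# Sketch of the safety guard: indexing starts only when the domain's case
# count is under the configured limit and no tracked task is already
# running. The function name is hypothetical.
MAX_GEOSPATIAL_INDEX_DOC_LIMIT = 1000000


def should_start_indexing(case_count, task_already_active,
                          limit=MAX_GEOSPATIAL_INDEX_DOC_LIMIT):
    if task_already_active:
        return False  # serial task: at most one active at a time
    return case_count < limit
```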

Automated test coverage

Automated tests exist for the new Celery task that has been added. Additionally, the tests for the CeleryTaskHelper class have been extended to account for the new changes there.

QA Plan

QA to be completed. Ticket is available here.

Rollback instructions

  • This PR can be reverted after deploy with no further considerations

Labels & Review

  • Risk label is set correctly
  • The set of people pinged as reviewers is appropriate for the level of risk of the change

@zandre-eng zandre-eng added labels "awaiting QA" (QA in progress. Do not merge) and "product/feature-flag" (Change will only affect users who have a specific feature flag enabled) on Sep 16, 2024
@zandre-eng zandre-eng marked this pull request as ready for review September 16, 2024 10:00
@AmitPhulera AmitPhulera (Contributor) left a comment

The PR looks mostly in good shape. I have a few suggestions regarding task tracking and a few names, which I have shared above. Thanks for adding tests for the changes.

Resolved review threads on: corehq/apps/geospatial/utils.py, corehq/apps/geospatial/tasks.py, corehq/apps/geospatial/reports.py
@@ -2093,3 +2093,5 @@ def _pkce_required(client_id):

# NOTE: if you are adding a new setting that you intend to have other environments override,
# make sure you add it before localsettings are imported (from localsettings import *)

MAX_GEOSPATIAL_INDEX_DOC_LIMIT = 1000000
Contributor
I think the number should be fine, but we should at least have some benchmarks on it, to see if it affects the ES cluster in any way.

Contributor Author
Sounds good. I'll see if QA is able to do some load testing, otherwise I'll look into doing this.

Contributor
No need to change this here, but just useful to know about. Since Python 3.6, you can use underscores to separate thousands in decimals (and words in hexadecimals and nibbles in binaries) to make numbers more readable. e.g.

MAX_GEOSPATIAL_INDEX_DOC_LIMIT = 1_000_000
addr = 0xCAFE_F00D
flags = 0b_0011_1111_0100_1110

Contributor Author
@kaapstorm Initially this is how I had it, but I ultimately decided against it since I didn't see this formatting used anywhere else within the settings.py file. I think this is fine to keep as is, but good to keep in mind for next time!

@@ -68,8 +68,7 @@ def template_context(self):
{'id': p.id, 'name': p.name, 'geo_json': p.geo_json}
for p in GeoPolygon.objects.filter(domain=self.domain).all()
],
'es_indexing_message': celery_task_tracker.get_message(),
'is_error_es_index_message': celery_task_tracker.is_error(),
'task_status': celery_task_tracker.get_status(),
Contributor
🥇

@Charl1996 Charl1996 (Contributor) left a comment
All good from my side.

@AmitPhulera AmitPhulera (Contributor) left a comment
LGTM 🚀

chunk_size=DEFAULT_CHUNK_SIZE,
)
celery_task_tracker.update_progress(current=i + 1, total=batch_count)
finally:
@ajeety4 ajeety4 (Contributor) Sep 20, 2024
Thinking of a situation where an exception occurs while processing the cases: this would mark the task as completed in the tracker.
I think it would be good to handle this scenario.

Contributor Author
I followed the same logic as the task to async update owners. If we want to be a bit more cautious here, we could do a generic Exception catch, and then mark the tracker as having an error. I would need to slightly think through the tracker's current usage though, as we have no way of determining what error message to show when.

@ajeety4 ajeety4 (Contributor) Sep 20, 2024
Yes, this is more of a safeguard. Good point about the task to async update owners; I think it would have been good to have the error marked and stored for that as well. I'm not sure what norm is followed in HQ, but a quick search shows that exceptions are handled on a case-by-case basis.

I would recommend it for this indexing task, considering that if it throws an exception, the pending cases would never be processed and would not be available for use by the feature.
Agreed that the tracker would need to be updated to store the error message as well.

Contributor Author
I agree, handling an exception here more gracefully would be a good idea. Thinking through this however, I'm a little unsure about trying to store a message in the redis cache again. We would need to store the raw string in redis and then translate it when rendering it on the front-end, and I'm not entirely confident or sure whether pulling a string from the redis cache into a variable and then trying to translate it would work correctly.

@ajeety4 What do you think of extending the status system to have the ability for custom error statuses instead? For the mark_as_error function we could pass in an optional slug to append to the end of the "ERROR" string (e.g. "ERROR_CELERY"). Having different error statuses would allow us to know which message to show on the front-end.

@ajeety4 ajeety4 (Contributor) Sep 20, 2024
I'm not entirely confident or sure whether pulling a string from the redis cache into a variable and then trying to translate it would work correctly.

Great catch. You are right on this. As per the Django docs: "The caveat with using variables or computed values, as in the previous two examples, is that Django's translation-string-detecting utility, django-admin makemessages, won't be able to find these strings."

What do you think of extending the status system to have the ability for custom error statuses instead? For the mark_as_error function we could pass in an optional slug to append to the end of the "ERROR" string (e.g. "ERROR_CELERY")

This is a good idea. I feel a cleaner approach would be to use a separate key, error_slug, instead of using the task_key while marking it as an error. This keeps the choices for the task_key predictable while giving the consumer the flexibility to set an error_slug of their choice.
That being said, I am good with the initial approach if this makes things complicated.

@zandre-eng zandre-eng (Contributor Author) Sep 20, 2024
Nice find! Doing message strings is definitely out then.

use a separate key error_slug instead of using the task_key while marking it is an error

Do you mean that the task_key would then only have "ACTIVE" or "ERROR" as its states, and we would keep the error_slug as a separate field, combining the two if the task_key is an error? Something like:

def get_status(self):
    status = self._client.get(self.task_key)
    if status == 'ERROR':
        status += f'_{self._client.get(self.error_slug)}'
    return {
        'status': status,
        'progress': self.get_progress(),
    }

Contributor
Yes, correct. What do you think of returning the error_slug as well, as shown below? This would avoid needing to append or strip a prefix.

def get_status(self):
    status = self._client.get(self.task_key)
    return {
        'status': status,
        'error_slug': self._client.get(self.error_slug_key) if status == 'ERROR' else None,
        'progress': self.get_progress(),
    }


def mark_as_error(self, error_slug=None, timeout=ONE_DAY * 3):
    if error_slug:
        self._client.set(self.error_slug_key, error_slug)
    return self._client.set(self.task_key, 'ERROR', timeout=timeout)

Contributor Author
I like this idea; it would keep it completely clean of the task_key. This looks like quite a small lift, so I'll implement it as such.

@zandre-eng zandre-eng (Contributor Author) Sep 23, 2024
Addressed in ded63ee.

Contributor
<3 Great discussion.

@ajeety4 ajeety4 (Contributor) left a comment
Overall looks good to me.
Left some minor queries and suggestions.

@zandre-eng zandre-eng force-pushed the ze/trigger-es-index-geospatial-enable branch from 8a75544 to f84cff5 Compare September 23, 2024 09:48
@zandre-eng zandre-eng force-pushed the ze/trigger-es-index-geospatial-enable branch from f84cff5 to ded63ee Compare September 23, 2024 09:50
@kaapstorm kaapstorm (Contributor) left a comment
👍 A couple of comments, nothing blocking.

update_cases_owner,
get_geo_case_property,
)
from corehq.apps.geospatial.management.commands.index_geolocation_case_properties import (
@kaapstorm kaapstorm (Contributor) Oct 1, 2024
Nit: It feels awkward for the tasks module to be importing from a management command, instead of the other way round (or both importing from somewhere else, but I'd prefer not to throw everything in the utils module, like that second drawer in the kitchen that somehow ends up not only with tongs and skewers, but also clothes pegs, and screws, and one rusty paper clip).

Contributor Author
Perhaps a good alternative would be to move the es_case_query function to the es.py file, as I feel it would fit well there. It could then be renamed to something like es_case_query_for_missing_geopoint_val.

Alternatively, I could create a file in a new "utils" directory to contain just this helper function. I feel the first option makes sense and is a bit more straightforward, but happy to go in this direction as well if you think it sounds more suitable @kaapstorm.

Contributor
The es module sounds ideal for es_case_query! Yeah, I agree, the function is worth renaming too, and you could drop the "es_" prefix because we would know from its module that it would be an Elasticsearch query.

Contributor Author
Addressed in a161b10.

get_geo_case_property,
)
from corehq.apps.geospatial.management.commands.index_geolocation_case_properties import (
_es_case_query,
Contributor
If you're importing a protected function somewhere other than a test, then it's possibly no longer being treated as protected and you should drop the leading underscore. -- This is a guideline, not a rule, so I'm not requesting a change, and I'll leave it to your discretion ... but interesting that the linter didn't flag this.

Contributor Author
Good point. I have always been a little unsure as to how strict we should be when it comes to marking functions in Python as private. Is a single external reference enough justification, or do we need to reference it a few times before it becomes obvious that it is clearly a public function? (I have seen plenty of examples in HQ of both approaches.) I suppose this is the potential downside of having it as a guideline rather than a rule; it can be difficult to gauge where the line lies at times.

Thinking this through, however, the former makes more sense, since even a single external reference clearly means it's not private/self-contained anymore. Given this, I agree it would make sense to drop the _ here.

@kaapstorm kaapstorm (Contributor) left a comment
<3 Nice. Thank you.
