[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

hari-kuriakose · 2024-08-15T17:44:31Z

NOTE: This needs approval from ALL reviewers before merge.

What

Segregate the Django settings required for backend versus Celery powered services (execution-consumer, celery-flower and celery-beat).
Improve logging in some cases.
Fix issue where pointing to a new Django settings module in backend/.env was not picked up.
Fix issue where backend was breaking when directly running in local due to version mismatch of some protocol buffers generated code. THIS FIXES CONTRIBUTOR INSTALLATIONS.
Allow building docker images locally from current branch. BENEFITS CONTRIBUTORS.
Ensure another virtual env is not active while invoking dev env CLI commands. BENEFITS CONTRIBUTORS.
Update pdm.lock files for backend and prompt-service.

Why

Integrating Celery distributed queue with Django is beneficial but Django settings need to be isolated by component, to ensure only required modules are loaded by Django. For example, various INSTALLED_APPS, plugins, etc for backend need not be loaded for Celery.
- Renaming settings/dev.py to settings/platform.py might be more apt, as gradually we can keep all common settings in base.py, and unique ones in platform.py and celery.py respectively.
- Adding settings/celery.py keeps all Celery related settings in one place and also eliminates the need for flowerconfig.py.
- Adding .celery.env ensures env is not polluted.
If fetching user orgs failed due to some reason, the error was not logged making it difficult to debug.
DJANGO_SETTINGS_MODULE should not be set in random places but ONLY in these four main entrypoints, BEFORE importing anything else:
- backend/asgi.py (production ASGI)
- backend/wsgi.py (production WSGI)
- manage.py (development server)
- backend/celery.py (Celery powered services)
Protocol buffers by nature generates some stub code for client and server integration from a .proto file. However, this stub code needs to be regenerated should the protocol buffers dependency version change. The easier alternative is to have PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python env setting which lets Python internally handle the stub code generation although relatively slower. This is what we do. However the other issue where the env vars and Django settings got loaded out of order caused mentioned protocol buffers env var to be unset, thus breaking backend.
Contributors should be able to run services directly in local always.
Evaluation server if not running will throw warnings, which could be suppressed by default to remove noise from the logs. We could have an option to show warnings when required, which will continue to help in debugging.
A local build of Docker images should be allowed to happen from any feature branch instead of main only.
Executing dev-env-cli.sh while another virtual env is already activated will cause undesired effects, hence need to be checked.
pdm.lock files base branch caused resolution errors, hence needed to be regenerated manually.

How

Features

Add error logging for fetching user orgs
Improve logging and docs
Add settings/celery.py
Add .celery.env and corresponding .sample.celery.env
Add backend/celery.py entrypoint for execution-consumer, celery-flower and celery-beat
Rename celery instance to celery to avoid confusion
Refactor celery services config
Rename settings/dev.py to settings/platform.py
Remove django settings module update from utils
Remove unnecessary flowerconfig.py
Add env var to suppress evaluation server warnings
Add check for active venv in dev-env-cli.sh
Cleanup run-platform.sh

Fixes

Load env vars and Django settings in properorder and BEFORE other modules are imported in all backend entrypoints.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

YES, this refactors the Django settings for backend (v1 and v2), execution-consumer, celery-flower and celery-beat.

Database Migrations

Env Config

New env var EVALUATION_SERVER_WARNINGS (default 'false') in backend/.env.
New default value of DJANGO_SETTINGS_MODULE='backend.settings.platform'.
New env file backend/.celery.env.

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

The following modes of invocations seem to be running fine:

Component	Command (after `cd backend && . .venv/bin/activate`)	Running mode
backend via gunicorn	`./entrypoint.sh`	Direct in local
backend development server	`python manage.py runserver 0.0.0.0:8000`	Direct in local
execution consumer	`.venv/bin/celery -A backend worker --loglevel=info -Q celery,celery_periodic_logs,celery_log_task_queue --autoscale=8,1`	Direct in local
celery beat	`.venv/bin/celery -A backend beat --scheduler django_celery_beat.schedulers:DatabaseScheduler -l INFO`	Direct in local
celery flower	`.venv/bin/celery -A backend flower --port=5555 --purge_offline_workers=5`	Direct in local
all	`./run-platform.sh -u -b -v current`	In Docker containers

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

* Improve logging and docs * Fix protobuf generated code version mismtach issue * Add settings/celery.py * Rename celery instance to celery * Refactor celery services config * Rename settings/dev.py to settings/platform.py * Fix env and django settings loading order in backend entrypoints * Remove django settings module update from utils * Remove unnecessary flowerconfig.py * Add env var to suppress evaluation server warnings * Add check for active venv in dev-env-cli.sh

* Allow building docker images locally from current branch * Cleanup run platform script * Update pdm lock files for backend and prompt service

hari-kuriakose · 2024-08-16T10:38:17Z

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

Please see
https://docs.celeryq.dev/en/latest/internals/protocol.html
for more information.

The full contents of the message body was:
b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)

The full contents of the message headers:
{'lang': 'py', 'task': 'workflow_manager.workflow.execution_log_utils.consume_log_history', 'id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None],'root_id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen1@7747504f1b7b', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}}

The delivery info for this task is:
{'exchange': '', 'routing_key': 'celery_periodic_logs'}
Traceback (most recent call last):
File "/app/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 659, in on_task_received
strategy = strategies[type_]
KeyError: 'workflow_manager.workflow.execution_log_utils.consume_log_history'

Seems like one import was missed in the backend entrypoint for Celery based services after the changes. Please review.

hari-kuriakose · 2024-08-16T10:40:43Z

@tahierhussain Observed that static URL for logo was throwing not found in the login page after the changes:

Please review.

hari-kuriakose · 2024-08-16T10:42:42Z

@gaya3-zipstack @kirtimanmishrazipstack pdm.lock files from base branch caused resolution errors, hence needed to be regenerated manually. We might need to see if previously if the PDM lock PR agent didn't run or exited with error.

Please review.

jaseemjaskp

LGTM

chandrasekharan-zipstack · 2024-08-16T15:14:12Z

backend/backend/asgi.py

-from dotenv import load_dotenv
+from dotenv import find_dotenv, load_dotenv
+
+load_dotenv(find_dotenv() or "")


@hari-kuriakose isn't this the default params / behaviour of the load_dotenv() ?

Why do we call it this way?

Isi it necessary to call load_dotenv() again?

sonarcloud · 2024-08-16T15:57:53Z

Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarCloud

ritwik-g · 2024-08-20T06:41:56Z

backend/backend/settings/celery.py

+missing_settings = []
+
+
+def get_required_setting(


@hari-kuriakose we need to check missing_settings and raise error. Else using the get_required_setting wouldn't have any effect. Reference: https://github.com/Zipstack/unstract/blob/main/backend/backend/settings/base.py#L532-L536

muhammad-ali-e

LGTM

muhammad-ali-e · 2024-08-21T09:01:49Z

backend/backend/asgi.py

 )

+from django.core.asgi import get_asgi_application  # noqa: E402


Why this move to down? Is it for getting envs/settings before loading asgi?

muhammad-ali-e · 2024-08-21T09:18:02Z

backend/sample.env

@@ -97,6 +97,8 @@ STRUCTURE_TOOL_IMAGE_TAG="0.0.36"
 # Feature Flags
 EVALUATION_SERVER_IP=unstract-flipt
 EVALUATION_SERVER_PORT=9000
+EVALUATION_SERVER_WARNINGS="false"


why this wrapped in double quotes. I believe it would still be treated as a string even without the quote

muhammad-ali-e · 2024-08-21T09:39:10Z

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

@hari-kuriakose
It looks like there's a task stored in the database with the task path workflow_manager.workflow.execution_log_utils.consume_log_history. celery beat is triggering this task, but the consumer isn’t finding any registered task with this name when it receives the request.

Additionally, we have some tasks (methods/functions) in our backend service that use the shared_task decorator. These tasks are used both by our Django backend and by Celery. They should be registered with Celery when they run, since in the current setup, the backend code is also part of the consumer's (Celery's) code.

Are we planning to separate the consumer from the backend code?

hari-kuriakose self-assigned this Aug 15, 2024

* Add .celery.env

47d08fe

* Allow building docker images locally from current branch * Cleanup run platform script * Update pdm lock files for backend and prompt service

hari-kuriakose requested review from jaseemjaskp, muhammad-ali-e, chandrasekharan-zipstack, kirtimanmishrazipstack, gaya3-zipstack, Deepak-Kesavan, ritwik-g and athul-rs August 16, 2024 10:42

hari-kuriakose marked this pull request as ready for review August 16, 2024 10:45

jaseemjaskp approved these changes Aug 16, 2024

View reviewed changes

chandrasekharan-zipstack reviewed Aug 16, 2024

View reviewed changes

hari-kuriakose added 2 commits August 16, 2024 21:21

Update gitignore for env files

d285c4d

Update gitignore for env files

267e122

ritwik-g reviewed Aug 20, 2024

View reviewed changes

ritwik-g approved these changes Aug 21, 2024

View reviewed changes

Deepak-Kesavan approved these changes Aug 21, 2024

View reviewed changes

muhammad-ali-e approved these changes Aug 21, 2024

View reviewed changes

hari-kuriakose changed the title ~~Django settings cleanup, logging improvements and critical fixes~~ [ON HOLD] Django settings cleanup, logging improvements and critical fixes Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

hari-kuriakose commented Aug 15, 2024 •

edited

Loading

hari-kuriakose commented Aug 16, 2024

hari-kuriakose commented Aug 16, 2024 •

edited

Loading

hari-kuriakose commented Aug 16, 2024

jaseemjaskp left a comment

chandrasekharan-zipstack Aug 16, 2024

sonarcloud bot commented Aug 16, 2024

ritwik-g Aug 20, 2024

muhammad-ali-e left a comment

muhammad-ali-e Aug 21, 2024

muhammad-ali-e Aug 21, 2024

muhammad-ali-e commented Aug 21, 2024

		)

		from django.core.asgi import get_asgi_application # noqa: E402

[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

Are you sure you want to change the base?

[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

Conversation

hari-kuriakose commented Aug 15, 2024 • edited Loading

NOTE: This needs approval from ALL reviewers before merge.

What

Why

How

Features

Fixes

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

Database Migrations

Env Config

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

Screenshots

Checklist

hari-kuriakose commented Aug 16, 2024

hari-kuriakose commented Aug 16, 2024 • edited Loading

hari-kuriakose commented Aug 16, 2024

jaseemjaskp left a comment

Choose a reason for hiding this comment

chandrasekharan-zipstack Aug 16, 2024

Choose a reason for hiding this comment

sonarcloud bot commented Aug 16, 2024

Quality Gate failed

ritwik-g Aug 20, 2024

Choose a reason for hiding this comment

muhammad-ali-e left a comment

Choose a reason for hiding this comment

muhammad-ali-e Aug 21, 2024

Choose a reason for hiding this comment

muhammad-ali-e Aug 21, 2024

Choose a reason for hiding this comment

muhammad-ali-e commented Aug 21, 2024

hari-kuriakose commented Aug 15, 2024 •

edited

Loading

hari-kuriakose commented Aug 16, 2024 •

edited

Loading