Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ON HOLD] Django settings cleanup, logging improvements and critical fixes #585

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

hari-kuriakose
Copy link
Contributor

@hari-kuriakose hari-kuriakose commented Aug 15, 2024

NOTE: This needs approval from ALL reviewers before merge.

What

  • Segregate the Django settings required for backend versus Celery powered services (execution-consumer, celery-flower and celery-beat).
  • Improve logging in some cases.
  • Fix issue where pointing to a new Django settings module in backend/.env was not picked up.
  • Fix issue where backend was breaking when directly running in local due to version mismatch of some protocol buffers generated code. THIS FIXES CONTRIBUTOR INSTALLATIONS.
  • Allow building docker images locally from current branch. BENEFITS CONTRIBUTORS.
  • Ensure another virtual env is not active while invoking dev env CLI commands. BENEFITS CONTRIBUTORS.
  • Update pdm.lock files for backend and prompt-service.

Why

  • Integrating Celery distributed queue with Django is beneficial but Django settings need to be isolated by component, to ensure only required modules are loaded by Django. For example, various INSTALLED_APPS, plugins, etc for backend need not be loaded for Celery.

    • Renaming settings/dev.py to settings/platform.py might be more apt, as gradually we can keep all common settings in base.py, and unique ones in platform.py and celery.py respectively.
    • Adding settings/celery.py keeps all Celery related settings in one place and also eliminates the need for flowerconfig.py.
    • Adding .celery.env ensures env is not polluted.
  • If fetching user orgs failed due to some reason, the error was not logged making it difficult to debug.

  • DJANGO_SETTINGS_MODULE should not be set in random places but ONLY in these four main entrypoints, BEFORE importing anything else:

    • backend/asgi.py (production ASGI)
    • backend/wsgi.py (production WSGI)
    • manage.py (development server)
    • backend/celery.py (Celery powered services)
  • Protocol buffers by nature generates some stub code for client and server integration from a .proto file. However, this stub code needs to be regenerated should the protocol buffers dependency version change. The easier alternative is to have PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python env setting which lets Python internally handle the stub code generation although relatively slower. This is what we do. However the other issue where the env vars and Django settings got loaded out of order caused mentioned protocol buffers env var to be unset, thus breaking backend.

  • Contributors should be able to run services directly in local always.

  • Evaluation server if not running will throw warnings, which could be suppressed by default to remove noise from the logs. We could have an option to show warnings when required, which will continue to help in debugging.

  • A local build of Docker images should be allowed to happen from any feature branch instead of main only.

  • Executing dev-env-cli.sh while another virtual env is already activated will cause undesired effects, hence need to be checked.

  • pdm.lock files base branch caused resolution errors, hence needed to be regenerated manually.

How

Features

  • Add error logging for fetching user orgs
  • Improve logging and docs
  • Add settings/celery.py
  • Add .celery.env and corresponding .sample.celery.env
  • Add backend/celery.py entrypoint for execution-consumer, celery-flower and celery-beat
  • Rename celery instance to celery to avoid confusion
  • Refactor celery services config
  • Rename settings/dev.py to settings/platform.py
  • Remove django settings module update from utils
  • Remove unnecessary flowerconfig.py
  • Add env var to suppress evaluation server warnings
  • Add check for active venv in dev-env-cli.sh
  • Cleanup run-platform.sh

Fixes

  • Load env vars and Django settings in properorder and BEFORE other modules are imported in all backend entrypoints.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • YES, this refactors the Django settings for backend (v1 and v2), execution-consumer, celery-flower and celery-beat.

Database Migrations

Env Config

  • New env var EVALUATION_SERVER_WARNINGS (default 'false') in backend/.env.
  • New default value of DJANGO_SETTINGS_MODULE='backend.settings.platform'.
  • New env file backend/.celery.env.

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

The following modes of invocations seem to be running fine:

Component Command (after cd backend && . .venv/bin/activate) Running mode
backend via gunicorn ./entrypoint.sh Direct in local
backend development server python manage.py runserver 0.0.0.0:8000 Direct in local
execution consumer .venv/bin/celery -A backend worker --loglevel=info -Q celery,celery_periodic_logs,celery_log_task_queue --autoscale=8,1 Direct in local
celery beat .venv/bin/celery -A backend beat --scheduler django_celery_beat.schedulers:DatabaseScheduler -l INFO Direct in local
celery flower .venv/bin/celery -A backend flower --port=5555 --purge_offline_workers=5 Direct in local
all ./run-platform.sh -u -b -v current In Docker containers

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

* Improve logging and docs
* Fix protobuf generated code version mismtach issue
* Add settings/celery.py
* Rename celery instance to celery
* Refactor celery services config
* Rename settings/dev.py to settings/platform.py
* Fix env and django settings loading order in backend entrypoints
* Remove django settings module update from utils
* Remove unnecessary flowerconfig.py
* Add env var to suppress evaluation server warnings
* Add check for active venv in dev-env-cli.sh
@hari-kuriakose hari-kuriakose self-assigned this Aug 15, 2024
* Allow building docker images locally from current branch
* Cleanup run platform script
* Update pdm lock files for backend and prompt service
@hari-kuriakose
Copy link
Contributor Author

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

Please see
https://docs.celeryq.dev/en/latest/internals/protocol.html
for more information.

The full contents of the message body was:
b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)

The full contents of the message headers:
{'lang': 'py', 'task': 'workflow_manager.workflow.execution_log_utils.consume_log_history', 'id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None],'root_id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen1@7747504f1b7b', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}}

The delivery info for this task is:
{'exchange': '', 'routing_key': 'celery_periodic_logs'}
Traceback (most recent call last):
File "/app/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 659, in on_task_received
strategy = strategies[type_]
KeyError: 'workflow_manager.workflow.execution_log_utils.consume_log_history'

Seems like one import was missed in the backend entrypoint for Celery based services after the changes. Please review.

@hari-kuriakose
Copy link
Contributor Author

hari-kuriakose commented Aug 16, 2024

@tahierhussain Observed that static URL for logo was throwing not found in the login page after the changes:

oss_login_no_logo

Please review.

@hari-kuriakose
Copy link
Contributor Author

@gaya3-zipstack @kirtimanmishrazipstack pdm.lock files from base branch caused resolution errors, hence needed to be regenerated manually. We might need to see if previously if the PDM lock PR agent didn't run or exited with error.

Please review.

Copy link
Contributor

@jaseemjaskp jaseemjaskp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

from dotenv import load_dotenv
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv() or "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hari-kuriakose isn't this the default params / behaviour of the load_dotenv() ?

  1. Why do we call it this way?
  2. Isi it necessary to call load_dotenv() again?

Copy link

sonarcloud bot commented Aug 16, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarCloud

missing_settings = []


def get_required_setting(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hari-kuriakose we need to check missing_settings and raise error. Else using the get_required_setting wouldn't have any effect. Reference: https://github.com/Zipstack/unstract/blob/main/backend/backend/settings/base.py#L532-L536

Copy link
Contributor

@muhammad-ali-e muhammad-ali-e left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

)

from django.core.asgi import get_asgi_application # noqa: E402
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this move to down? Is it for getting envs/settings before loading asgi?

@@ -97,6 +97,8 @@ STRUCTURE_TOOL_IMAGE_TAG="0.0.36"
# Feature Flags
EVALUATION_SERVER_IP=unstract-flipt
EVALUATION_SERVER_PORT=9000
EVALUATION_SERVER_WARNINGS="false"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this wrapped in double quotes. I believe it would still be treated as a string even without the quote

@muhammad-ali-e
Copy link
Contributor

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

@hari-kuriakose
It looks like there's a task stored in the database with the task path workflow_manager.workflow.execution_log_utils.consume_log_history. celery beat is triggering this task, but the consumer isn’t finding any registered task with this name when it receives the request.

Additionally, we have some tasks (methods/functions) in our backend service that use the shared_task decorator. These tasks are used both by our Django backend and by Celery. They should be registered with Celery when they run, since in the current setup, the backend code is also part of the consumer's (Celery's) code.

Are we planning to separate the consumer from the backend code?

@hari-kuriakose hari-kuriakose changed the title Django settings cleanup, logging improvements and critical fixes [ON HOLD] Django settings cleanup, logging improvements and critical fixes Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants