Install RabbitMQ from conda on arm64 #482

Draft
wants to merge 3 commits into
base: awb

Conversation

danielhollas
Contributor

@danielhollas danielhollas commented Jul 19, 2024

After a fair amount of work, we have finally arrived: the rabbitmq arm64 build is now available on conda-forge! This lets us simplify the build considerably.

EDIT: This is staged on top of #483; we should deploy these breaking changes together.

@@ -6,11 +6,6 @@ set -x
# Environment.
export SHELL=/bin/bash

# Fix https://github.com/aiidalab/aiidalab-docker-stack/issues/225
Contributor Author

This issue was fixed a long time ago and should not be relevant in the new docker stack.

@danielhollas danielhollas force-pushed the rmq-arm64-from-conda branch 2 times, most recently from 90fa979 to 54ad202 Compare July 19, 2024 16:31
@danielhollas danielhollas marked this pull request as ready for review July 19, 2024 17:13
@danielhollas
Contributor Author

@mikibonacci could you please test the full-stack image from this PR on your Mac Arm machine? I am specifically interested to see if rabbitmq works correctly, so if you could install the QeApp and run some quantum espresso calculations that would be great. Thank you!

@mikibonacci
Member

Hi @danielhollas ! Thank you for this PR.
While trying the new image, I ran into memory issues: when I try to open the App Store, the kernel dies.
Maybe my laptop is not powerful enough (8 GB RAM), but this does not happen with the old full-stack image, so it seems there is some additional memory consumption.

Not sure why, but I also tried rabbitmq-server==3.13.4 and the issue is still there. I will check more deeply, but it seems that when I use the notebook the memory usage increases from ~500 MB to ~700 MB (with the old image, the usage stays constant at ~500 MB). It is not much, but maybe it is enough to make the kernel die.

This is a screenshot of the stats of the container with the new image:

[Screenshot 2024-07-20 at 13 29 52: container stats with the new image]

Instead, the container with the old image shows these stats:

[Screenshot 2024-07-20 at 13 30 01: container stats with the old image]

@danielhollas
Contributor Author

@mikibonacci thank you very much for testing and debugging!

Very strange, the rabbitmq version should not influence the notebook in any way. So, to understand: the AiiDAlab home page loads, but when you open a second page (the app store) the kernel crashes?

I'll test locally as well.

8 GB should be plenty. Can I ask how you launch the container? Is it possible that Docker on Mac reduces the memory available in the container by default?

@mikibonacci
Member

mikibonacci commented Jul 20, 2024

Exactly, the container runs well as long as I only open the home page; if I then open the app store, I hit the issue.

I launched the container from the Docker Desktop interface, and also manually (not with aiidalab-launch), in both cases (new and old image). After running the command rabbitmq-diagnostics status (in the aiida-core-services env), I see that the total memory usage for the new-image container is 0.1 GB versus 0.0919 GB for the old one.

So maybe it is not even a rabbitmq problem. I am not sure that Docker on Mac reduces memory by default: I increased the memory per container from 4 to 6 GB (8 GB is the total memory of my laptop), but nothing changed for the kernel.

@danielhollas
Contributor Author

I can reproduce this on my computer, so this is not ARM specific, and probably comes from the new version of RabbitMQ. When I run verdi status it says it cannot connect to rabbitmq.
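
A quick way to separate "the broker is not running" from "the client is misconfigured" is a plain TCP probe of the broker port. The sketch below is only an illustration: it assumes the broker would listen on localhost at 5672 (the default AMQP port), and the function name `broker_port_open` is made up for this example. It checks TCP reachability only and is not a substitute for `verdi status`.

```python
import socket


def broker_port_open(host: str = "127.0.0.1", port: int = 5672, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Assumption: 5672 is used because it is the default AMQP port; a
    deployment may be configured differently.
    """
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError,
        # timeout) when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if broker_port_open() else "not reachable"
    print(f"AMQP port on localhost is {state}")
```

If the port is closed, the broker log (as in the next comment) is the place to look for the reason.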

@danielhollas
Contributor Author

danielhollas commented Jul 20, 2024

Here's the error I see when running rabbitmq-server start

=INFO REPORT==== 20-Jul-2024::15:17:09.629468 ===
    alarm_handler: {set,{system_memory_high_watermark,[]}}
2024-07-20 15:17:09.785577+00:00 [warning] <0.156.0> Using RABBITMQ_ADVANCED_CONFIG_FILE: /opt/conda/envs/aiida-core-services/etc/rabbitmq/advanced.config
2024-07-20 15:17:10.657751+00:00 [error] <0.254.0> Feature flags: `classic_mirrored_queue_version`: required feature flag not enabled! It must be enabled before upgrading RabbitMQ.
2024-07-20 15:17:10.657800+00:00 [error] <0.254.0> Failed to initialize feature flags registry: {disabled_required_feature_flag,
2024-07-20 15:17:10.657800+00:00 [error] <0.254.0>                                               classic_mirrored_queue_version}
2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> 
2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> BOOT FAILED
2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> ===========
2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> Error during startup: {error,failed_to_initialize_feature_flags_registry}
2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> 

Article about RabbitMQ feature flags:

https://www.rabbitmq.com/docs/feature-flags
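
For automated checks (for example in a container start-up test), the boot failure above can be recognised from the log text. The following is a minimal sketch, assuming the message wording stays exactly as in the log excerpt above; the function name `disabled_required_flags` and the regular expression are illustrative and not part of RabbitMQ or AiiDAlab.

```python
from __future__ import annotations

import re

# Matches log lines such as:
#   Feature flags: `classic_mirrored_queue_version`: required feature flag not enabled!
# Assumption: the wording is taken from the log excerpt above and may differ
# between RabbitMQ versions.
_FLAG_RE = re.compile(r"Feature flags: `([^`]+)`: required feature flag not enabled")


def disabled_required_flags(log_text: str) -> list[str]:
    """Return names of required feature flags reported as disabled.

    Names are returned in order of first appearance, without duplicates.
    """
    seen: list[str] = []
    for name in _FLAG_RE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "2024-07-20 15:17:10.657751+00:00 [error] <0.254.0> Feature flags: "
    "`classic_mirrored_queue_version`: required feature flag not enabled! "
    "It must be enabled before upgrading RabbitMQ.\n"
    "2024-07-20 15:17:10.662211+00:00 [error] <0.254.0> BOOT FAILED\n"
)
print(disabled_required_flags(sample))  # ['classic_mirrored_queue_version']
```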

@danielhollas
Contributor Author

Looks like there might be multiple issues:

  • If upgrading from a previously run container, rabbitmq may fail to start. It looks like upgrading RabbitMQ directly from 3.9 to 3.13 might not be possible.
    https://www.rabbitmq.com/docs/upgrade#rolling-upgrades
  • The App Store page crashing seems to be independent of whether rabbitmq is running or not (as it should be). I have no idea why it is crashing on this PR though; I will need to investigate more.
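
To make the size of the jump explicit, the version strings can be compared numerically. This is a small sketch with hypothetical helper names (`parse_version`, `skipped_minor_releases`); it only counts how many minor releases lie between two versions and says nothing about which upgrade paths RabbitMQ actually supports, which has to be checked against the upgrade documentation linked above.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.9.13' into (3, 9, 13) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def skipped_minor_releases(old: str, new: str) -> int:
    """Count the minor releases strictly between old and new (same major only)."""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[0] != new_v[0]:
        raise ValueError("different major versions; compare manually")
    return max(new_v[1] - old_v[1] - 1, 0)


# Plain string comparison orders these versions the wrong way round,
# because "9" sorts after "1" character by character.
print("3.9.13" > "3.13.5")                                # True (misleading)
print(parse_version("3.9.13") < parse_version("3.13.5"))  # True (correct)

# Versions taken from this PR: 3.10, 3.11 and 3.12 are skipped.
print(skipped_minor_releases("3.9.13", "3.13.5"))         # 3
```

A non-zero result is a hint to review the upgrade path by hand before reusing existing broker data.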

@danielhollas danielhollas marked this pull request as draft July 20, 2024 15:40
@danielhollas danielhollas removed the request for review from unkcpz July 20, 2024 15:40
@danielhollas
Contributor Author

I have filed a bug report about the appstore crashing: aiidalab/aiidalab#442. This issue is also present on aiidalab/full-stack:edge and is unrelated to this PR.

@mikibonacci I've rebased this PR and now the AppStore should not be crashing. But in any case, you can also install QEApp via the command line with aiidalab install quantum-espresso, and that should allow you to test it. Thank you!

Member

@mikibonacci mikibonacci left a comment

Hi @danielhollas, I tested a relax + band structure calculation and everything was fine. As far as I am concerned, it can be merged. Thanks!

@@ -7,13 +7,13 @@
"default": "15"
},
"RMQ_VERSION": {
- "default": "3.9.13"
+ "default": "3.13.5"
Member

Hi @danielhollas, you once mentioned that you wanted to bring this up with the aiida team; I was on holiday that week. @superstar54 told me you were at the meeting and mentioned this. Are there going to be some tests added for the different versions of RMQ?

Comment on lines +18 to +22
RUN mamba create -p /opt/conda/envs/aiida-core-services --yes \
        postgresql=${PGSQL_VERSION} \
        rabbitmq-server=${RMQ_VERSION} && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}"
Member

Cool!

Member

@unkcpz unkcpz left a comment

Looks all good, I don't have much to test. Since it only affects arm64, I'll trust @mikibonacci's tests.

@danielhollas danielhollas added the blocked This issue/PR is blocked by another issue/PR. label Jul 22, 2024
@danielhollas
Contributor Author

Thank you for testing @mikibonacci.

Unfortunately, this cannot be merged yet, since it seems that when updating the image in an existing deployment that used the older RMQ version, the RMQ server fails to start, see my previous message with a link about rmq upgrades. We need to figure out how to upgrade safely first.

@unkcpz
Member

unkcpz commented Jul 23, 2024

For the update, if we can ensure the daemon is off, is it safe to delete the old RMQ configs in home entirely and start from the newly installed one?

@danielhollas
Contributor Author

danielhollas commented Jul 23, 2024

For the update, if we can ensure the daemon is off, is it safe to delete the old RMQ configs in home entirely and start from the newly installed one?

Yeah, that was my thought as well, but I am not sure what will happen if, for example, a user had some calculations running and then turned off the container. If we then delete the RMQ config, can this lose some messages? What would be the effect of that? I don't understand the details well enough; maybe this is something to bring up at the aiida-core meeting, maybe Seb would have a better idea.
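
One conservative variant of the idea discussed here is to move the old broker state aside instead of deleting it, so that nothing is destroyed while the open question about lost messages is unresolved. The sketch below is only an illustration under stated assumptions: the function name `set_aside_rmq_state`, the `daemon_running` flag, and the location of the state directory are all hypothetical, and it does not answer whether messages for running calculations would be lost when the broker starts from a fresh state.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def set_aside_rmq_state(state_dir: Path, daemon_running: bool) -> Path | None:
    """Move state_dir to a timestamped sibling directory instead of deleting it.

    Returns the backup path, or None if state_dir does not exist.
    Assumption: the caller has determined daemon_running by other means
    and knows where the broker state lives in the user's home.
    """
    if daemon_running:
        raise RuntimeError("stop the AiiDA daemon before touching RabbitMQ state")
    if not state_dir.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = state_dir.with_name(f"{state_dir.name}.bak-{stamp}")
    if backup.exists():
        # Refuse to merge into an existing backup from the same second.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.move(str(state_dir), str(backup))
    return backup
```

Keeping the backup leaves room to restore the old state if deleting it turns out to lose messages.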

@danielhollas danielhollas changed the base branch from main to awb September 8, 2024 15:15
Labels
blocked This issue/PR is blocked by another issue/PR.
3 participants