Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use db_backup + management commands to implement data dump/load/seed #22693

Merged
merged 13 commits into from
Sep 26, 2024

Conversation

KevinMind
Copy link
Contributor

@KevinMind KevinMind commented Sep 23, 2024

Relates to: mozilla/addons#15020
Unblocks: #22663

Description

  • migrate data backup management to use django manmagement commands and db-backup under the hood
  • add test coverage for the behaviour
  • add synchronization of the storage volume with the database snapshot as those two data sources are tightly coupled.

Context

In order to include data initialization into the make up flow, we need a better way to dump/load/seed data. This PR moves the mechanism for exporting/importing data to use the db-backup lib orchestrated by management commands, includes test coverage for these behaviours and removes the concept from our make files.

db-backup is a light wrapper around django database connectors to use mysql commands for dumping/loading. This side steps issues we have with using native django dump/load data commands. Before this patch, we had a db connector and could implement this logic ourselves.. but looking at this lib it is well maintained, has a lot of activity and is very transparent looking through the source code. I cannot imagine implementing something much different then what they already have.

This unblocks the initialization part because the command for initializing data in make up will use seeding (which uses dumping) and loading in order to make sure the database is in the "correct" state based on the the arguments passed by the developer.

This can be extracted as a separate PR to make review and testing easier as this part is relatively low impact.

From this PR we also get synchronization of the ./storage directory with the database backup which squashes a whole range of bugs related to mismatches from db file references and the underlying file system.

Testing

  1. run a data dump.
make data_dump

Expect a directory with the timestamp name is added to ./backups. It should include a file db.sql and a storage file storage.tar which is an exact replica of the ./storage directory.

Optionally pass an argument ARGS=--name foo to put the data in a specific named directory.

  1. run data dump force

Assuming you dumped data to a directory with the name "foo" if you try to dump data there again, it will fail. You can make it pass by adding ARGS="--name foo --force" which will clean the foo directory before dumping.

  1. Seed data

Now run the command

make initialize

This should reset your database to the initial data.

Note. any data you previously had in your DB will now be gone. Expect no add-ons for the local_admin user.

  1. Load data

You should be able to load data from a directory by passing the directory name. Use the name from step 1.

Note, this should restore the data you had originally

make data_load ARGS=--name foo

If you don't pass a name or pass an invalid name the command will raise.

Checklist

  • Add #ISSUENUM at the top of your PR to an existing open issue in the mozilla/addons repository.
  • Successfully verified the change locally.
  • The change is covered by automated tests, or otherwise indicated why doing so is unnecessary/impossible.
  • Add before and after screenshots (Only for changes that impact the UI).
  • Add or update relevant docs reflecting the changes made.

@KevinMind KevinMind requested review from a team and eviljeff and removed request for a team September 23, 2024 11:50
@KevinMind KevinMind force-pushed the data-dump-load branch 2 times, most recently from be0348b to 15df10b Compare September 23, 2024 12:21
Copy link
Member

@eviljeff eviljeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc @diox had concerns that django's dumpdata wasn't going to work for us, but I don't know the details.

docs/topics/development/data_management.md Outdated Show resolved Hide resolved
src/olympia/amo/management/commands/dump_data.py Outdated Show resolved Hide resolved
src/olympia/lib/settings_base.py Outdated Show resolved Hide resolved
src/olympia/amo/management/commands/seed_data.py Outdated Show resolved Hide resolved
@KevinMind KevinMind changed the title Use django commands for data load/dump/seed with test coverage Use db_backup + management commands to implement data dump/load/seed Sep 24, 2024
@KevinMind
Copy link
Contributor Author

iirc @diox had concerns that django's dumpdata wasn't going to work for us, but I don't know the details.

I ran into these issues myself. I think it could be possible to mitigate but I found a library that does mysql level dumping/loading and also handles files.. so pretty much exactly what we are aiming for..

It's a dev dep and only installed on dev so I think it's relatively low risk to add. Also seems well maintained and I read through the code fairly easily.

Copy link
Member

@diox diox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using django-dbbackup seems the way to go, yeah. dumpdata and loaddata are a pain to work with when you have complex relationships (especially the kind that we have).

docs/topics/development/data_management.md Outdated Show resolved Hide resolved
docs/topics/development/data_management.md Outdated Show resolved Hide resolved
@KevinMind KevinMind requested a review from diox September 24, 2024 11:14
@KevinMind KevinMind force-pushed the data-dump-load branch 2 times, most recently from 189201c to 748234a Compare September 24, 2024 12:34
@KevinMind KevinMind requested a review from diox September 24, 2024 12:34
@KevinMind KevinMind force-pushed the data-dump-load branch 2 times, most recently from f10455b to 1dd6aa7 Compare September 24, 2024 17:02
KevinMind added a commit that referenced this pull request Sep 24, 2024
Use db_backup to manage database dump/load

TMP: fix docs

Align file names and command names

Reindex on data_load

Fix broken class assignment and add better error handling

TMP: Why is this test failing ..now

Remove dead settings
Makefile-docker Outdated Show resolved Hide resolved
src/olympia/amo/management/commands/data_seed.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_admin.py Outdated Show resolved Hide resolved
src/olympia/blocklist/tests/test_admin.py Outdated Show resolved Hide resolved
@eviljeff
Copy link
Member

@KevinMind I'm attempting to follow the testing instructions but I don't think they're up to date? make data_seed doesn't exist any longer, and there's no mention of the changes to make initialize

@KevinMind
Copy link
Contributor Author

@KevinMind I'm attempting to follow the testing instructions but I don't think they're up to date? make data_seed doesn't exist any longer, and there's no mention of the changes to make initialize

@eviljeff updated. You should run make initialize this will in turn run the seed management command

Copy link
Member

@eviljeff eviljeff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+wc - for me, just need to apply the suggestion to reintroduce create_db (or a similar change)

@KevinMind KevinMind merged commit 0d71e69 into master Sep 26, 2024
31 checks passed
@KevinMind KevinMind deleted the data-dump-load branch September 26, 2024 17:12
KevinMind added a commit that referenced this pull request Sep 26, 2024
Use db_backup to manage database dump/load

TMP: fix docs

Align file names and command names

Reindex on data_load

Fix broken class assignment and add better error handling

TMP: Why is this test failing ..now

Remove dead settings
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants