Move autoremove builds logic to a scheduled job #2468

Open

zackgalbreath wants to merge 9 commits into master

Conversation

zackgalbreath (Contributor)

Decouple the cleanup of old builds from the submission parsing step

We've noticed that some large production sites struggle to delete large
batches of builds efficiently.

To help address this issue, this commit introduces a new configuration option
to control the number of builds that CDash attempts to remove in a single pass.

This commit also reduces the default batch size from 100 to 10.
zackgalbreath (Contributor Author)

It would be good to verify that this doesn't cause any performance degradation on a "production-sized" database.

williamjallen (Collaborator) left a comment

I'll have more code comments later once we agree on the big-picture design.

# How many builds should CDash try to remove per iteration?
# Consider tweaking this value if you notice the autoremove queries
# are slow to execute.
#AUTOREMOVE_BUILDS_BATCH_SIZE=10
williamjallen (Collaborator)

Instead of adding another configuration option, what do you think about adding some logic that automatically adjusts the batch size, keeping the running time for a given batch under a set time limit while maximizing the deletion rate within that limit?
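
A rough sketch of how that adaptive loop could work (purely illustrative; nextExpiredBuildIds() and removeBuilds() stand in for whatever CDash actually uses to find and delete expired builds, and the time budget is an assumed value, not an existing setting):

<?php

// Grow the batch while a pass finishes well under the time budget,
// shrink it when a pass runs long.
const TIME_BUDGET_SECONDS = 30;

$batchSize = 10;
while (($buildIds = nextExpiredBuildIds($batchSize)) !== []) {
    $start = microtime(true);
    removeBuilds($buildIds);
    $elapsed = microtime(true) - $start;

    if ($elapsed < TIME_BUDGET_SECONDS / 2) {
        $batchSize = min(100, $batchSize * 2);
    } elseif ($elapsed > TIME_BUDGET_SECONDS) {
        $batchSize = max(1, intdiv($batchSize, 2));
    }
}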

zackgalbreath (Contributor Author)

I like the idea of figuring out the batch size automatically rather than making the CDash admin figure it out via trial & error.

williamjallen (Collaborator)

After thinking about this further, I wonder if we should even bother with batching in the first place. If we simply delete one at a time, the deletion rate will probably be suboptimal, but we are practically guaranteed that the deletion process won't hang.

If each delete is guaranteed to be fast, we can delete for a maximum period of 5 minutes, and schedule the job to run every 10 minutes. This 5-minutes-on-5-minutes-off approach gets rid of the large monolithic job and keeps cleanup tasks out of the way of submission processing.
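
As a sketch of that arrangement (the command name, retention query, and remove_build() wrapper below are assumptions for illustration, not the actual CDash implementation):

<?php

use Illuminate\Console\Command;
use Illuminate\Support\Facades\DB;

final class RemoveExpiredBuilds extends Command
{
    // In app/Console/Kernel.php this would be scheduled with something like:
    //   $schedule->command('build:remove-expired')->everyTenMinutes()->withoutOverlapping();
    protected $signature = 'build:remove-expired';
    protected $description = 'Delete expired builds one at a time for up to five minutes';

    public function handle(): int
    {
        $deadline = time() + 5 * 60;

        // One build per DELETE keeps each statement small and fast, so the
        // job can always stop promptly when the five-minute window closes.
        while (time() < $deadline) {
            // Placeholder retention rule; CDash's real expiry logic differs.
            $buildId = DB::table('build')
                ->where('starttime', '<', now()->subDays(30))
                ->value('id');
            if ($buildId === null) {
                break;
            }
            remove_build($buildId); // assumed wrapper around the existing removal code
        }

        return Command::SUCCESS;
    }
}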

zackgalbreath (Contributor Author)

One-at-a-time is fine with me! Practically speaking, that's the patch I've resorted to applying on large production instances that hang when attempting to delete 100 builds in a single pass.

app/Console/Commands/CleanupDatabase.php
/** Delete rows from $table whose $field value no longer exists as $selectfield in $targettable */
private static function delete_unused_rows(string $table, string $field, string $targettable, string $selectfield = 'id'): void
{
DB::delete("DELETE FROM $table WHERE $field NOT IN (SELECT $selectfield AS $field FROM $targettable)");
williamjallen (Collaborator)

Should this delete in batches too? You can add a LIMIT clause to the end, and then iterate until no more rows are deleted (DB::delete returns the number of rows deleted).
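
For example, the body of delete_unused_rows could become a loop along these lines (the batch size of 1000 is an arbitrary placeholder):

// Keep deleting orphaned rows 1000 at a time until a pass removes nothing.
do {
    $deleted = DB::delete(
        "DELETE FROM $table WHERE $field NOT IN (SELECT $selectfield FROM $targettable) LIMIT 1000"
    );
} while ($deleted > 0);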

zackgalbreath (Contributor Author)

It turns out Postgres does not support DELETE FROM ... LIMIT, so we would need to do something more clever here to allow deleting in batches.

williamjallen (Collaborator) on Sep 30, 2024

You could try DELETE FROM ... WHERE ... IN (SELECT ... LIMIT ...) instead. I'm not sure whether the optimizer would be smart enough to do it optimally, but it's probably worth a shot.

zackgalbreath (Contributor Author)

Wouldn't that result in data loss? The inner SELECT is telling us which rows to keep. If we put a LIMIT 1 there, for example, the outer DELETE would then remove all but one record from the table, right?

williamjallen (Collaborator)

Oops, I mistyped the pseudo-query. See edit.
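
Presumably the corrected pattern looks something like the sketch below, where the inner SELECT chooses a bounded set of orphaned rows to delete rather than the rows to keep (this assumes $table has an id primary key; everything else reuses the placeholders from delete_unused_rows, and the query runs on Postgres):

do {
    $deleted = DB::delete(
        "DELETE FROM $table
         WHERE id IN (
             SELECT id FROM $table
             WHERE $field NOT IN (SELECT $selectfield FROM $targettable)
             LIMIT 1000
         )"
    );
} while ($deleted > 0);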

app/Console/Kernel.php
app/Http/Controllers/AdminController.php

# Skip the autoremove builds step if more than this number of submissions
# are waiting to be parsed.
#AUTOREMOVE_BUILDS_SKIP_THRESHOLD=100
williamjallen (Collaborator)

Instead of a new configuration option, what do you think about using queue priorities to automatically prioritize submissions if such jobs exist? That way, the queuing system will automatically handle cleanup tasks whenever submission activity is low.
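
For instance (the job classes and queue names below are illustrative, not necessarily what CDash uses today), submissions and cleanup could be dispatched to differently prioritized queues and drained by a single worker:

<?php

use App\Jobs\CleanupDatabase;    // assumed namespaces for this sketch
use App\Jobs\ProcessSubmission;

// Submission parsing goes on a high-priority queue, cleanup on a low-priority one.
// $submissionFile here is whatever payload the submission job already takes.
ProcessSubmission::dispatch($submissionFile)->onQueue('submissions');
CleanupDatabase::dispatch()->onQueue('cleanup');

// A worker started as
//   php artisan queue:work --queue=submissions,cleanup
// always drains 'submissions' before taking anything from 'cleanup', so
// cleanup only runs when no submissions are waiting.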

zackgalbreath (Contributor Author)

Sounds like a good idea to me!

Commits

This commit extracts the automatic removal of old builds from CDash's
submission parsing logic. Instead, old builds will now be periodically
cleaned up as a scheduled job.

The autoremove functionality has been decoupled from the submission parsing
workflow, so this test is now obsolete.

Delete the custom queries in removeBuilds that detect when shared records are
no longer needed.

Instead, run the unconditional db:cleanup command at the end of the scheduled
autoremove task.

At the time of this commit, the following tables were already handled by db:cleanup:
  - buildfailuredetails
  - configure
  - configureerror
  - coveragefile
  - test2image

The following tables represent potentially shared data that wasn't already handled
by db:cleanup:
  - note
  - buildupdate
  - testoutput
  - updatefile
  - image
  - uploadfile

The following tables were found to already have cascade-on-delete foreign keys,
and thus their explicit DELETE logic was deemed safe to remove:
  - build2uploadfile
  - dynamicanalysisdefect
  - label2dynamicanalysis
  - label2buildfailure

Use the artisan command or scheduled task instead.

Attempting to clean these CDash tables is redundant because they are already
using cascading deletions.