Move autoremove builds logic to a scheduled job #2468
base: master
Conversation
We've noticed some large production sites struggle to delete large batches of builds in a performant manner. To help address this issue, this commit introduces a new configuration option to control the number of builds that CDash attempts to remove in a single pass. This commit also reduces the default batch size from 100 to 10.
It would be good to verify that this doesn't cause any performance degradation on a "production-sized" database.
I'll have more code comments later once we agree on the big-picture design.
# How many builds should CDash try to remove per iteration?
# Consider tweaking this value if you notice the autoremove queries
# are slow to execute.
#AUTOREMOVE_BUILDS_BATCH_SIZE=10
Instead of adding another configuration option, what do you think about adding logic to automatically adjust the batch size, keeping the running time of each batch under a set limit while maximizing the deletion rate within that limit?
I like the idea of figuring out the batch size automatically rather than making the CDash admin figure it out via trial & error.
After thinking about this further, I wonder if we should even bother with batching in the first place. If we simply delete one at a time, deletion rate will probably be suboptimal, but we are practically guaranteed that the deletion process won't hang.
If each delete is guaranteed to be fast, we can delete for a maximum period of 5 minutes, and schedule the job to run every 10 minutes. This 5-minutes-on-5-minutes-off approach gets rid of the large monolithic job and keeps cleanup tasks out of the way of submission processing.
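A rough sketch of what this time-boxed, one-at-a-time loop might look like. The helper names here (`findNextExpiredBuildId`, `deleteBuildById`) are placeholders for illustration, not CDash's actual API:

```php
// Hypothetical sketch of a time-boxed, one-at-a-time autoremove loop.
// findNextExpiredBuildId() and deleteBuildById() are placeholder names.
$deadline = time() + 5 * 60; // hard stop after five minutes
while (time() < $deadline) {
    $buildId = findNextExpiredBuildId();
    if ($buildId === null) {
        break; // nothing left to clean up
    }
    deleteBuildById($buildId); // each individual delete is small and fast
}
```

Scheduling this to run every 10 minutes gives the 5-minutes-on / 5-minutes-off behavior described above, since the loop itself never exceeds the five-minute deadline.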
One-at-a-time is fine with me! Practically speaking, that's the patch I've resorted to applying on large production instances that hang when attempting to delete 100 builds in a single pass.
/** Delete unused rows */
private static function delete_unused_rows(string $table, string $field, string $targettable, string $selectfield = 'id'): void
{
    DB::delete("DELETE FROM $table WHERE $field NOT IN (SELECT $selectfield AS $field FROM $targettable)");
}
Should this delete in batches too? You could add a LIMIT clause to the end, then iterate until no more rows are deleted (DB::delete returns the number of rows deleted).
It turns out Postgres does not support DELETE FROM ... LIMIT, so we would need to do something more clever here to allow for deleting in batches.
You could try DELETE FROM ... WHERE ... IN (SELECT ... LIMIT ...) instead. I'm not sure whether the optimizer would be smart enough to handle it efficiently, but it's probably worth a shot.
Wouldn't that result in data loss? The inner SELECT is telling us which rows to keep. If we put a LIMIT 1 (e.g.) there, the outer DELETE would then remove all but one record from the table, right?
Oops, I mistyped the pseudo-query; see the edited comment above.
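Spelled out, the corrected pattern would look something like the following. The angle-bracket names stand in for delete_unused_rows's parameters, and the batch size of 1000 is an arbitrary example; whether Postgres plans this efficiently would still need to be measured:

```sql
-- Hedged sketch: batched variant of the delete_unused_rows query.
-- The inner SELECT picks a limited batch of orphaned rows to DELETE,
-- so no rows that should be kept are ever targeted.
-- Run in a loop until the delete affects 0 rows.
DELETE FROM <table>
WHERE id IN (
    SELECT t.id
    FROM <table> t
    WHERE t.<field> NOT IN (SELECT <selectfield> FROM <targettable>)
    LIMIT 1000
);
```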
# Skip the autoremove builds step if more than this number of submissions
# are waiting to be parsed.
#AUTOREMOVE_BUILDS_SKIP_THRESHOLD=100
Instead of a new configuration option, what do you think about using queue priorities to automatically prioritize submission jobs when they exist? That way, the queueing system will handle cleanup tasks whenever submission activity is low.
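In Laravel (which CDash is built on), queue priority is typically expressed by listing queues in order on the worker command. A sketch, with hypothetical queue names:

```shell
# Hypothetical queue names: the worker drains `submissions` first and
# only picks up `cleanup` jobs when no submission jobs are waiting.
php artisan queue:work --queue=submissions,cleanup
```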
Sounds like a good idea to me!
This commit extracts the automatic removal of old builds from CDash's submission parsing logic. Instead, old builds will now be periodically cleaned up as a scheduled job.
The autoremove functionality has been decoupled from the submission parsing workflow, so this test is now obsolete.
Delete custom queries from removeBuilds to detect when shared records are no longer needed. Instead, run the unconditional db:cleanup command at the end of the scheduled autoremove task.

While writing this commit, the following tables were already handled by db:cleanup:
- buildfailuredetails
- configure
- configureerror
- coveragefile
- test2image

The following tables represent potentially shared data that wasn't already handled by db:cleanup:
- note
- buildupdate
- testoutput
- updatefile
- image
- uploadfile

The following tables were found to already have cascade-on-delete foreign keys, so their explicit DELETE logic was deemed safe to remove:
- build2uploadfile
- dynamicanalysisdefect
- label2dynamicanalysis
- label2buildfailure
Use the artisan command or scheduled task instead.
Attempting to clean these CDash tables is redundant because they are already using cascading deletions.
Decouple the cleanup of old builds from the submission parsing step