Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor selectivity of claim_id based indexes of actions table when querying unclaimed actions. #1105

Open
prettyboymp opened this issue Sep 24, 2024 · 0 comments

Comments

@prettyboymp
Copy link
Contributor

The current process of running a batch of actions is followed by the release of the claim on actions. The release of the claim after running a batch does the following:

  1. Updates the claims with the given claim_id from that claim_id to 0.
  2. Delete the claim from the claims table.

Because we're resetting the claim_id back to 0 for all actions, including completed ones that don't need to be reclaimed, it reduces the selectivity of claim_id based indexes when searching for unclaimed actions. As an example, we have a site with 2.52M actions. Of those, 2.49M are completed actions and 25K are pending. This makes it so that almost all of the 2.52M records have a claim_id of 0 increasing the cost to use the claim_id_status_scheduled_date_gmt index, especially when trying to claim pending actions.

Suggestion

Instead of deleting a claim and resetting all the completed actions back to a claim_id of 0, only reset non-completed actions back to a claim_id of 0 when releasing a claim. Removing old claims can then be part of the job to clean old completed actions. This will increase the cardinality of claim_id column in the actions table, making indexes using it much more useful.

Related #1104

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant