Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable parquet cache config through CLI #25415

Merged
merged 2 commits into from
Oct 1, 2024

Conversation

hiltontj
Copy link
Contributor

Closes #25395

This adds the following options to the influxdb3 serve CLI:

      --parquet-mem-cache-size <PARQUET_MEM_CACHE_SIZE>
          The size of the in-memory Parquet cache in bytes
          
          [env: INFLUXDB3_PARQUET_MEM_CACHE_SIZE=]
          [default: 1073741824]

      --parquet-mem-cache-prune-percentage <PARQUET_MEM_CACHE_PRUNE_PERCENTAGE>
          The percentage of entries to prune during a prune operation on the in-memory Parquet cache
          
          [env: INFLUXDB3_PARQUET_MEM_CACHE_PRUNE_PERCENTAGE=]
          [default: 0.1]

      --parquet-mem-cache-prune-interval <PARQUET_MEM_CACHE_PRUNE_INTERVAL>
          The interval on which to check if the in-memory Parquet cache needs to be pruned
          
          [env: INFLUXDB3_PARQUET_MEM_CACHE_PRUNE_INTERVAL=]
          [default: 10ms]

      --disable-parquet-mem-cache
          Disable the in-memory Parquet cache
          
          [env: INFLUXDB3_DISABLE_PARQUET_MEM_CACHE=]

With respect to the original issue, I decided to add the prune percent and interval asa CLI options, since if we or the perf team end up needing them there for whatever reason, it would be a pain if they were omitted.

Copy link
Member

@pauldix pauldix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small comments.

@@ -216,6 +217,67 @@ pub struct Config {
/// for any hosts that share the same object store configuration, i.e., the same bucket.
#[clap(long = "host-id", env = "INFLUXDB3_HOST_IDENTIFIER_PREFIX", action)]
pub host_identifier_prefix: String,

/// The size of the in-memory Parquet cache in bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better/more user friendly to have this value in MB?

#[clap(
long = "parquet-mem-cache-prune-interval",
env = "INFLUXDB3_PARQUET_MEM_CACHE_PRUNE_INTERVAL",
default_value = "10ms",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this just grab a read lock? Seems like we don't need to check this so frequently by default, once a second seems more than enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, only if it needs to prune would lock the inner map, but I agree, this is a little over-zealous.

Copy link
Contributor

@praveen-influx praveen-influx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - apart from the points Paul's raised

@hiltontj hiltontj merged commit 83aca43 into main Oct 1, 2024
12 checks passed
@hiltontj hiltontj deleted the hiltontj/parquet-cache-cli branch October 1, 2024 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the parquet cache configurable from the CLI
3 participants