
high system memory usage when extracting embeddings #687

Open
joshctaylor opened this issue Aug 30, 2024 · 1 comment

joshctaylor commented Aug 30, 2024

Hi,

I'm wondering if there is a simple way to restrict RAM (not vRAM) usage when calculating embeddings - sometimes I need to run on a laptop.

I'm figuring that the multi-threaded audio loading used to keep the GPU fed is taking up a lot of RAM?

Thanks all

@joshctaylor joshctaylor changed the title Is there a way to limit threads to use embedding.py when RAM is tight high system memory usage when extracting embeddings Sep 1, 2024

joshctaylor commented Sep 1, 2024

I've dug into this a little now. It seems that around 150 GB of RAM is needed when the process starts: five memory-allocation warning messages are shown before the process settles down to using 15 GB of system RAM.

As there are five workers for multi_load_audio_window in chirp/audio_utils.py, this could point to where the issue lies.

I'm using an NVIDIA A100 80 GB on a 24-core Intel Xeon VM with 220 GB of RAM. If I try to use a smaller machine, the process is killed by the Linux kernel when it exhausts RAM and swap.

I'm working on FLAC-format files of one hour duration.

Found 0 existing embedding ids. 
Processing 1574 new source infos. 
  0%|          | 0/1574 [00:00<?, ?it/s]2024-09-01 20:47:30.019656: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
2024-09-01 20:47:30.484099: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert
2024-09-01 20:47:33.430616: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng0{} for conv (f32[708,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[708,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...

...... similar messages ......

Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[708,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[708,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
W0000 00:00:1725223732.544358    5221 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
  0%|          | 1/1574 [01:36<42:03:23, 96.25s/it]2024-09-01 20:48:56.767735: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 2/1574 [01:40<18:27:41, 42.28s/it]2024-09-01 20:48:58.825938: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 3/1574 [01:43<10:31:21, 24.11s/it]2024-09-01 20:49:00.984838: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 4/1574 [01:44<6:38:29, 15.23s/it] 2024-09-01 20:49:02.478230: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
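In case it helps anyone hitting the same wall: I haven't checked multi_load_audio_window's actual signature, so the names below are hypothetical, but the pattern I'd expect to bound peak RAM is limiting both the worker count and the prefetch depth, so only a few one-hour files are decoded in memory at once. A generic sketch (not chirp's real API), with a synthetic work function standing in for audio decoding:

```python
import collections
import concurrent.futures
import itertools

_SENTINEL = object()


def bounded_map(fn, items, max_workers=2, prefetch=2):
    """Apply fn to items with a thread pool, yielding results in order.

    At most max_workers + prefetch items are in flight at any time,
    so peak memory is bounded by that count times the size of one
    decoded item, instead of growing with the whole input list.
    """
    items = iter(items)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        # Prime the pipeline with a bounded number of submissions.
        pending = collections.deque(
            ex.submit(fn, x)
            for x in itertools.islice(items, max_workers + prefetch))
        while pending:
            # Wait for the oldest submission, preserving input order.
            yield pending.popleft().result()
            nxt = next(items, _SENTINEL)
            if nxt is not _SENTINEL:
                pending.append(ex.submit(fn, nxt))


if __name__ == "__main__":
    # Stand-in for "decode one audio file": just doubles a number.
    results = list(bounded_map(lambda x: x * 2, range(10),
                               max_workers=2, prefetch=1))
    print(results)
```

With this shape, dropping max_workers to 1 and prefetch to 1 trades throughput for a peak working set of roughly (max_workers + prefetch) times one decoded file; given the ~450 MB allocations in the log above, that difference adds up quickly on a laptop.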
