Memory issue #42

Open
cmehta126 opened this issue Nov 13, 2017 · 6 comments

@cmehta126

cmehta126 commented Nov 13, 2017

I'm running Broccoli for permutation tests on MRI data. I'm getting an error of the type:

Run kernel error for kernel 'CalculateBetaWeightsGLM' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTest' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTestSecondLevelPermutation' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'TransformData' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'

It seems this is a memory issue. Is there any way of getting around this? The data I'm permuting are spatial maps of dimension 256x256x256 for several hundred samples. I'm using a mask generated from "Smoothing".

What is more, prior to this error, the output of RandomiseGroupLevel states

Permutation threshold for contrast 1 for a significance level of 0.050000 is 0.000000
Permutation threshold for contrast 2 for a significance level of 0.050000 is 0.000000
Permutation threshold for contrast 3 for a significance level of 0.050000 is 0.000000

Could this imply that the data is too highly correlated over voxels for "RandomiseGroupLevel" to work properly or is it most likely a memory issue? The volumes file is 2.5 GB in total with 356 subjects. My device information is:

Device info


Platform number: 0

Platform vendor: NVIDIA Corporation
Platform name: NVIDIA CUDA
Platform extentions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform profile: FULL_PROFILE


Device number: 0

Device vendor: NVIDIA Corporation
Device name: Tesla K80
Hardware version: OpenCL 1.2 CUDA
Software version: 375.66
OpenCL C version: OpenCL C 1.2
Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Global memory size in MB: 11439
Size of largest memory object in MB: 2859
Global memory cache size in KB: 208
Local memory size in KB: 48
Constant memory size in KB: 64
Parallel compute units: 13
Clock frequency in MHz: 823
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1024 64

It seems the hardware could in theory handle loading 2.5 GB of data, but I am not sure whether that is enough for permutation tests.

Thank you.

Best,
Chintan
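
As a rough sanity check on the question above, the on-disk size of the volumes file (2.5 GB) may understate what has to live on the device. The sketch below estimates the footprint under the assumption that every voxel is held as an uncompressed 32-bit float; the thread does not confirm how Broccoli stores the data, so `bytes_per_voxel` and the helper name `volume_stack_mib` are illustrative only. The two limits are copied from the GetOpenCLInfo output and treated as MiB.

```python
# Rough device-memory estimate. ASSUMPTION: uncompressed float32 voxels,
# which the thread does not confirm for Broccoli.
MIB = 1024 * 1024


def volume_stack_mib(nx, ny, nz, n_volumes, bytes_per_voxel=4):
    """Size in MiB of n_volumes volumes of nx * ny * nz voxels each."""
    return nx * ny * nz * n_volumes * bytes_per_voxel / MIB


# Limits reported by GetOpenCLInfo for the Tesla K80 in this post.
GLOBAL_MEM_MB = 11439
MAX_OBJECT_MB = 2859

n_subjects = 356  # subject count stated in the post
full = volume_stack_mib(256, 256, 256, n_subjects)

print(f"256^3 x {n_subjects} as float32: {full:.0f} MiB")
print("fits in one memory object:", full <= MAX_OBJECT_MB)
print("fits in global memory:", full <= GLOBAL_MEM_MB)
```

Under that assumption each 256x256x256 volume takes 64 MiB, so the full stack is well beyond both the largest single allocation and the total global memory, before any buffers for the permutation results are counted.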

@wanderine
Owner

wanderine commented Nov 13, 2017 via email

@cmehta126
Author

RandomiseGroupLevel worked as I expected on my dataset after I downsampled the spatial maps in the input volume from 256x256x256 (1mm x 1mm x 1mm) to 128x128x128 (2mm x 2mm x 2mm). That significantly reduced the amount of memory needed for loading this data, without sacrificing specificity. The voxel resolution of the original data from diffusion weighted imaging (DWI) was on the order of (2mm x 2mm x 2mm) to begin with. I registered the DWI to FreeSurfer's CVS template, which has a voxel resolution of (1mm x 1mm x 1mm), to enable group analysis, but I don't believe much information is lost by downsampling.

Regardless, here is the output from GetOpenCLInfo and RandomiseGroupLevel when using volumes of spatial maps with dimensionality 256x256x256 (as I did originally, which produced the error):
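
To illustrate why halving the resolution helps so much: going from 256x256x256 to 128x128x128 cuts the voxel count by a factor of 8. The snippet below is a minimal sketch of one simple way to downsample by 2 (averaging non-overlapping 2x2x2 blocks) on a toy nested-list volume. It is not the resampling method used in this post, which is not stated; the function name `downsample_by_2` is hypothetical, and a real pipeline would normally use a neuroimaging resampling tool rather than pure Python.

```python
def downsample_by_2(vol):
    """Average non-overlapping 2x2x2 blocks of a nested-list volume.

    ASSUMPTION: all three dimensions are even.
    """
    nx, ny, nz = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for x in range(0, nx, 2):
        plane = []
        for y in range(0, ny, 2):
            row = []
            for z in range(0, nz, 2):
                # Collect the 8 voxels of the current block and average them.
                block = [
                    vol[x + dx][y + dy][z + dz]
                    for dx in (0, 1)
                    for dy in (0, 1)
                    for dz in (0, 1)
                ]
                row.append(sum(block) / 8)
            plane.append(row)
        out.append(plane)
    return out


# Toy 4x4x4 volume whose value equals its x index.
toy = [[[float(x) for _ in range(4)] for _ in range(4)] for x in range(4)]
small = downsample_by_2(toy)

print(len(small), len(small[0]), len(small[0][0]))  # 2 2 2
print((256 ** 3) // (128 ** 3))  # 8, the reduction in voxel count
```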

GetOpenCLInfo:

Device info


Platform number: 0

Platform vendor: NVIDIA Corporation
Platform name: NVIDIA CUDA
Platform extentions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform profile: FULL_PROFILE


Device number: 0

Device vendor: NVIDIA Corporation
Device name: Tesla K80
Hardware version: OpenCL 1.2 CUDA
Software version: 375.66
OpenCL C version: OpenCL C 1.2
Device extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Global memory size in MB: 11439
Size of largest memory object in MB: 2859
Global memory cache size in KB: 208
Local memory size in KB: 48
Constant memory size in KB: 64
Parallel compute units: 13
Clock frequency in MHz: 823
Max number of threads per block: 1024
Max number of threads in each dimension: 1024 1024 64

The output of the call to RandomiseGroupLevel follows (this run used 500 permutations, but the same thing happened with 5000 permutations):

Authored by K.A. Eklund
Data size: 256 x 256 x 256 x 362
Number of permutations: 500
Number of regressors: 8
Number of contrasts: 3
Performing 3 t-tests
Correlation design detected for t-contrast 1
Correlation design detected for t-contrast 2
Correlation design detected for t-contrast 3
Max number of permutations for contrast 1 is inf
Max number of permutations for contrast 2 is inf
Max number of permutations for contrast 3 is inf
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 1 for a significance level of 0.050000 is 0.000000
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 2 for a significance level of 0.050000 is 0.000000
Starting permutation 1
Starting permutation 101
Starting permutation 201
Starting permutation 301
Starting permutation 401
Permutation threshold for contrast 3 for a significance level of 0.050000 is 0.000000
Run kernel error for kernel 'CalculateBetaWeightsGLM' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTest' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'CalculateStatisticalMapsGLMTTestSecondLevelPermutation' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
Run kernel error for kernel 'TransformData' is 'CL_MEM_OBJECT_ALLOCATION_FAILURE'
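
One possible reading of the 0.000000 thresholds in the log above: in a max-statistic permutation test, the threshold is a high quantile of the largest test statistic seen in each permutation. If the kernels fail to allocate memory and every statistical map comes back as zeros, that quantile is exactly zero. The sketch below shows this generic scheme; whether Broccoli computes its threshold in exactly this way is an assumption, and `permutation_threshold` is a hypothetical name, not a Broccoli function.

```python
def permutation_threshold(max_stats, alpha=0.05):
    """Return the (1 - alpha) quantile of per-permutation maximum statistics.

    Generic max-statistic scheme; NOT taken from Broccoli's source code.
    """
    ordered = sorted(max_stats)
    # Position of the (1 - alpha) quantile, clamped to the last element.
    idx = min(len(ordered) - 1, int((1 - alpha) * len(ordered)))
    return ordered[idx]


# If every permutation produced an all-zero map (for example because the
# GPU kernels never ran), every maximum is 0 and so is the threshold.
failed_run = [0.0] * 500
print(permutation_threshold(failed_run))  # prints 0.0
```

On this reading, identical zero thresholds for all three contrasts would point to the computation not running at all rather than to strong correlation between voxels.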

@wanderine
Owner

wanderine commented Nov 14, 2017 via email

@cmehta126
Author

Thank you. Does Broccoli have a way of using additional memory to augment the RAM of a graphics card? My graphics card is limited to 11 GB of memory (with the largest object size capped at 2.8 GB). I have ~1 TB of fast SSD storage mounted.

@wanderine
Owner

wanderine commented Nov 15, 2017 via email

@cmehta126
Author

cmehta126 commented Nov 15, 2017 via email
