diff --git a/.wordlist.txt b/.wordlist.txt index 2488764c1d..1578e1e1ea 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -48,6 +48,7 @@ FNUZ fp gedit GPGPU +GROMACS GWS hardcoded HC diff --git a/docs/conf.py b/docs/conf.py index 82bcefee89..23904ec0e0 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -50,5 +50,6 @@ exclude_patterns = [ "doxygen/mainpage.md", - "understand/glossary.md" + "understand/glossary.md", + 'how-to/debugging_env.rst' ] \ No newline at end of file diff --git a/docs/how-to/debugging.rst b/docs/how-to/debugging.rst index 4925e87b02..433d31de10 100644 --- a/docs/how-to/debugging.rst +++ b/docs/how-to/debugging.rst @@ -2,12 +2,13 @@ :description: How to debug using HIP. :keywords: AMD, ROCm, HIP, debugging, ltrace, ROCgdb, WinGDB +.. _debugging_with_hip: + ************************************************************************* Debugging with HIP ************************************************************************* -AMD debugging tools include *ltrace* and *ROCgdb*. External tools are available and can be found -online. For example, if you're using Windows, you can use *Microsoft Visual Studio* and *WinGDB*. +HIP debugging tools include `ltrace `_ and :doc:`ROCgdb `. External tools are available and can be found online. For example, if you're using Windows, you can use Microsoft Visual Studio and WinGDB. You can trace and debug your code using the following tools and techniques. @@ -272,102 +273,7 @@ HIP environment variable summary Here are some of the more commonly used environment variables: -.. - -.. # COMMENT: The following lines define a break for use in the table below. -.. |break| raw:: html - -
- -.. - -.. list-table:: - - * - **Environment variable** - - **Default value** - - **Usage** - - * - AMD_LOG_LEVEL - |break| Enable HIP log on different Level - - 0 - - 0: Disable log. - |break| 1: Enable log on error level - |break| 2: Enable log on warning and below levels - |break| 0x3: Enable log on information and below levels - |break| 0x4: Decode and display AQL packets - - * - AMD_LOG_MASK - |break| Enable HIP log on different Level - - 0x7FFFFFFF - - 0x1: Log API calls - |break| 0x02: Kernel and Copy Commands and Barriers - |break| 0x4: Synchronization and waiting for commands to finish - |break| 0x8: Enable log on information and below levels - |break| 0x20: Queue commands and queue contents - |break| 0x40: Signal creation, allocation, pool - |break| 0x80: Locks and thread-safety code - |break| 0x100: Copy debug - |break| 0x200: Detailed copy debug - |break| 0x400: Resource allocation, performance-impacting events - |break| 0x800: Initialization and shutdown - |break| 0x1000: Misc debug, not yet classified - |break| 0x2000: Show raw bytes of AQL packet - |break| 0x4000: Show code creation debug - |break| 0x8000: More detailed command info, including barrier commands - |break| 0x10000: Log message location - |break| 0xFFFFFFFF: Log always even mask flag is zero - - * - HIP_LAUNCH_BLOCKING - |break| Used for serialization on kernel execution. - - 0 - - 0: Disable. Kernel executes normally. - |break| 1: Enable. Serializes kernel enqueue, behaves the same as AMD_SERIALIZE_KERNEL. 
- - * - HIP_VISIBLE_DEVICES (or CUDA_VISIBLE_DEVICES) - |break| Only devices whose index is present in the sequence are visible to HIP - - - - 0,1,2: Depending on the number of devices on the system - - * - GPU_DUMP_CODE_OBJECT - |break| Dump code object - - 0 - - 0: Disable - |break| 1: Enable - - * - AMD_SERIALIZE_KERNEL - |break| Serialize kernel enqueue - - 0 - - 1: Wait for completion before enqueue - |break| 2: Wait for completion after enqueue - |break| 3: Both - - * - AMD_SERIALIZE_COPY - |break| Serialize copies - - 0 - - 1: Wait for completion before enqueue - |break| 2: Wait for completion after enqueue - |break| 3: Both - - * - HIP_HOST_COHERENT - |break| Coherent memory in hipHostMalloc - - 0 - - 0: memory is not coherent between host and GPU - |break| 1: memory is coherent with host - - * - AMD_DIRECT_DISPATCH - |break| Enable direct kernel dispatch (Currently for Linux; under development for Windows) - - 1 - - 0: Disable - |break| 1: Enable - - * - GPU_MAX_HW_QUEUES - |break| The maximum number of hardware queues allocated per device - - 4 - - The variable controls how many independent hardware queues HIP runtime can create per process, - per device. If an application allocates more HIP streams than this number, then HIP runtime reuses - the same hardware queues for the new streams in a round-robin manner. Note that this maximum - number does not apply to hardware queues that are created for CU-masked HIP streams, or - cooperative queues for HIP Cooperative Groups (single queue per device). +.. include:: ../how-to/debugging_env.rst General debugging tips ====================================================== diff --git a/docs/how-to/debugging_env.rst b/docs/how-to/debugging_env.rst new file mode 100644 index 0000000000..312cbb01db --- /dev/null +++ b/docs/how-to/debugging_env.rst @@ -0,0 +1,94 @@ +.. 
list-table:: + :header-rows: 1 + :widths: 35,14,51 + + * - **Environment variable** + - **Default value** + - **Value** + + * - | ``AMD_LOG_LEVEL`` + | Enables HIP logging at various levels. + - ``0`` + - | 0: Disable log. + | 1: Enables log on error level. + | 2: Enables log on warning and lower levels. + | 3: Enables log on information and lower levels. + | 4: Enables log on debug and lower levels. + + * - | ``AMD_LOG_LEVEL_FILE`` + | Sets output file for ``AMD_LOG_LEVEL``. + - stderr output + - + + * - | ``AMD_LOG_MASK`` + | Specifies HIP log filters. Here is the ` complete list of log masks `_. + - ``0x7FFFFFFF`` + - | 0x1: Log API calls. + | 0x2: Kernel and copy commands and barriers. + | 0x4: Synchronization and waiting for commands to finish. + | 0x8: Decode and display AQL packets. + | 0x10: Queue commands and queue contents. + | 0x20: Signal creation, allocation, pool. + | 0x40: Locks and thread-safety code. + | 0x80: Kernel creations and arguments, etc. + | 0x100: Copy debug. + | 0x200: Detailed copy debug. + | 0x400: Resource allocation, performance-impacting events. + | 0x800: Initialization and shutdown. + | 0x1000: Misc debug, not yet classified. + | 0x2000: Show raw bytes of AQL packet. + | 0x4000: Show code creation debug. + | 0x8000: More detailed command info, including barrier commands. + | 0x10000: Log message location. + | 0x20000: Memory allocation. + | 0x40000: Memory pool allocation, including memory in graphs. + | 0x80000: Timestamp details. + | 0xFFFFFFFF: Always log, even if the mask flag is zero. + + * - | ``HIP_LAUNCH_BLOCKING`` + | Used for serialization on kernel execution. + - ``0`` + - | 0: Disable. Kernel executes normally. + | 1: Enable. Serializes kernel enqueue; behaves the same as ``AMD_SERIALIZE_KERNEL``. + + * - | ``HIP_VISIBLE_DEVICES`` (or ``CUDA_VISIBLE_DEVICES``) + | Only devices whose index is present in the sequence are visible to HIP. + - Unset by default. + - 0,1,2: Depending on the number of devices on the system. 
+ + * - | ``GPU_DUMP_CODE_OBJECT`` + | Dump code object. + - ``0`` + - | 0: Disable + | 1: Enable + + * - | ``AMD_SERIALIZE_KERNEL`` + | Serialize kernel enqueue. + - ``0`` + - | 0: Disable + | 1: Wait for completion before enqueue. + | 2: Wait for completion after enqueue. + | 3: Both + + * - | ``AMD_SERIALIZE_COPY`` + | Serialize copies + - ``0`` + - | 0: Disable + | 1: Wait for completion before enqueue. + | 2: Wait for completion after enqueue. + | 3: Both + + * - | ``AMD_DIRECT_DISPATCH`` + | Enable direct kernel dispatch (Currently for Linux; under development for Windows). + - ``1`` + - | 0: Disable + | 1: Enable + + * - | ``GPU_MAX_HW_QUEUES`` + | The maximum number of hardware queues allocated per device. + - ``4`` + - The variable controls how many independent hardware queues HIP runtime can create per process, + per device. If an application allocates more HIP streams than this number, then HIP runtime reuses + the same hardware queues for the new streams in a round-robin manner. Note that this maximum + number does not apply to hardware queues that are created for CU-masked HIP streams, or + cooperative queues for HIP Cooperative Groups (single queue per device). 
\ No newline at end of file diff --git a/docs/index.md b/docs/index.md index a659b9b83a..1934dcd5b7 100644 --- a/docs/index.md +++ b/docs/index.md @@ -61,6 +61,7 @@ On non-AMD platforms, like NVIDIA, HIP provides header files required to support * [C++ language extensions](./reference/cpp_language_extensions) * [C++ language support](./reference/cpp_language_support) * [HIP math API](./reference/math_api) +* [HIP environment variables](./reference/env_variables) * [Comparing syntax for different APIs](./reference/terms) * [List of deprecated APIs](./reference/deprecated_api_list) * [FP8 numbers in HIP](./reference/fp8_numbers) diff --git a/docs/reference/env_variables.rst b/docs/reference/env_variables.rst new file mode 100644 index 0000000000..bb27def80a --- /dev/null +++ b/docs/reference/env_variables.rst @@ -0,0 +1,165 @@ +.. meta:: + :description: HIP environment variables reference + :keywords: AMD, HIP, environment variables, environment, reference + +************************************************************* +HIP environment variables +************************************************************* + +This section lists the important HIP environment variables on the AMD platform, grouped by functionality. The full collection of ROCm environment variables, grouped by project, is on the :doc:`ROCm environment variables page `. + +GPU isolation variables +======================= + +The GPU isolation environment variables in HIP are listed in the following table. For more information, see the :doc:`GPU isolation page `. + +.. list-table:: + :header-rows: 1 + :widths: 70,30 + + * - **Environment variable** + - **Value** + + * - | ``ROCR_VISIBLE_DEVICES`` + | A list of device indices or UUIDs that will be exposed to applications. + - Example: ``0,GPU-DEADBEEFDEADBEEF`` + + * - | ``GPU_DEVICE_ORDINAL`` + | Device indices exposed to OpenCL and HIP applications. 
+ - Example: ``0,2`` + + * - | ``HIP_VISIBLE_DEVICES`` or ``CUDA_VISIBLE_DEVICES`` + | Device indices exposed to HIP applications. + - Example: ``0,2`` + +Profiling variables +=================== + +The profiling environment variables in HIP are listed in the following table. For more information, see the :doc:`setting the number of CUs page `. + +.. list-table:: + :header-rows: 1 + :widths: 70,30 + + * - **Environment variable** + - **Value** + + * - | ``HSA_CU_MASK`` + | Sets the mask on a lower level of queue creation in the driver; + | this mask will also be set for queues being profiled. + - Example: ``1:0-8`` + + * - | ``ROC_GLOBAL_CU_MASK`` + | Sets the mask on queues created by the HIP or the OpenCL runtimes; + | this mask will also be set for queues being profiled. + - Example: ``0xf``, enables only 4 CUs + + * - | ``HIP_FORCE_QUEUE_PROFILING`` + | Used to run the app as if it were run in rocprof. Forces command queue + | profiling on by default. + - | 0: Disable + | 1: Enable + +Debug variables +=============== + +The debugging environment variables in HIP are listed in the following table. For more information, see :ref:`debugging_with_hip`. + +.. include:: ../how-to/debugging_env.rst + +Memory management related variables +=================================== + +The memory management related environment variables in HIP are listed in the following table. + +.. list-table:: + :header-rows: 1 + :widths: 35,14,51 + + * - **Environment variable** + - **Default value** + - **Value** + + * - | ``HIP_HIDDEN_FREE_MEM`` + | Amount of memory to hide from the free memory reported by ``hipMemGetInfo``. + - ``0`` + - | 0: Disable + | Unit: megabyte (MB) + + * - | ``HIP_HOST_COHERENT`` + | Specifies if the memory is coherent between the host and GPU in ``hipHostMalloc``. + - ``0`` + - | 0: Memory is not coherent. + | 1: Memory is coherent. 
+ | The environment variable takes effect only if the following conditions are satisfied: + | - One of the ``hipHostMallocDefault``, ``hipHostMallocPortable``, ``hipHostMallocWriteCombined`` or ``hipHostMallocNumaUser`` flags is set to 1. + | - The ``hipHostMallocCoherent``, ``hipHostMallocNonCoherent`` and ``hipHostMallocMapped`` flags are set to 0. + + * - | ``HIP_INITIAL_DM_SIZE`` + | Sets the initial heap size for device malloc. + - ``8388608`` + - | Unit: Byte + | The default value corresponds to 8 MB. + + * - | ``HIP_MEM_POOL_SUPPORT`` + | Enables memory pool support in HIP. + - ``0`` + - | 0: Disable + | 1: Enable + + * - | ``HIP_MEM_POOL_USE_VM`` + | Enables memory pool support in HIP. + - | ``0``: other OS + | ``1``: Windows + - | 0: Disable + | 1: Enable + + * - | ``HIP_VMEM_MANAGE_SUPPORT`` + | Virtual Memory Management Support. + - ``1`` + - | 0: Disable + | 1: Enable + + * - | ``GPU_MAX_HEAP_SIZE`` + | Sets the maximum size of the GPU heap as a percentage of board memory. + - ``100`` + - | Unit: Percentage + + * - | ``GPU_MAX_REMOTE_MEM_SIZE`` + | Maximum size that allows device memory substitution with system memory. + - ``2`` + - | Unit: kilobyte (KB) + + * - | ``GPU_NUM_MEM_DEPENDENCY`` + | Number of memory objects for dependency tracking. + - ``256`` + - + + * - | ``GPU_STREAMOPS_CP_WAIT`` + | Force the stream memory operation to wait on CP. + - ``0`` + - | 0: Disable + | 1: Enable + + * - | ``HSA_LOCAL_MEMORY_ENABLE`` + | Enable HSA device local memory usage. + - ``1`` + - | 0: Disable + | 1: Enable + + * - | ``PAL_ALWAYS_RESIDENT`` + | Force memory resources to become resident at allocation time. + - ``0`` + - | 0: Disable + | 1: Enable + + * - | ``PAL_PREPINNED_MEMORY_SIZE`` + | Size of prepinned memory. + - ``64`` + - | Unit: kilobyte (KB) + + * - | ``REMOTE_ALLOC`` + | Use remote memory for the global heap allocation. 
+ - ``0`` + - | 0: Disable + | 1: Enable diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 6a70b9e2ad..bac25b2930 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -87,6 +87,7 @@ subtrees: - file: reference/cpp_language_support title: C++ language support - file: reference/math_api + - file: reference/env_variables - file: reference/terms title: Comparing syntax for different APIs - file: reference/deprecated_api_list
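The ``AMD_LOG_LEVEL`` and ``AMD_LOG_MASK`` rows in the debugging table describe a level from 0 to 4 and a bitmask of log categories. The sketch below shows one way those two values might be combined into an environment for a child process. The names ``LogMask`` and ``hip_debug_env`` are hypothetical helpers, not part of HIP, and only a few of the documented mask bits are included. The mask is written in decimal, because this sketch does not assume how the runtime parses hexadecimal strings.

```python
import os
from enum import IntFlag


class LogMask(IntFlag):
    """Hypothetical names for a few AMD_LOG_MASK bits from the table."""
    API = 0x1        # log API calls
    CMD = 0x2        # kernel and copy commands and barriers
    WAIT = 0x4       # synchronization and waiting for commands to finish
    AQL = 0x8        # decode and display AQL packets
    INIT = 0x800     # initialization and shutdown
    MEM = 0x20000    # memory allocation


def hip_debug_env(level, mask, base=None):
    """Return a copy of an environment with the HIP logging variables set.

    level: 0 (disabled) through 4 (debug), as listed for AMD_LOG_LEVEL.
    mask:  bitwise OR of LogMask members, written out in decimal.
    base:  environment to copy; defaults to the current process environment.
    """
    if not 0 <= level <= 4:
        raise ValueError("AMD_LOG_LEVEL is documented for values 0 through 4")
    env = dict(os.environ if base is None else base)
    env["AMD_LOG_LEVEL"] = str(level)
    env["AMD_LOG_MASK"] = str(int(mask))
    return env


# Information-level logging, restricted to API calls and synchronization.
env = hip_debug_env(3, LogMask.API | LogMask.WAIT, base={})
print(env)
```

The resulting dictionary can be passed as the ``env`` argument of ``subprocess.run`` when launching a HIP application; the application path is left to the reader.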