diff --git a/docs/data/how-to/hipgraph/hip_graph.drawio b/docs/data/how-to/hip_runtime_api/hipgraph/hip_graph.drawio
similarity index 100%
rename from docs/data/how-to/hipgraph/hip_graph.drawio
rename to docs/data/how-to/hip_runtime_api/hipgraph/hip_graph.drawio
diff --git a/docs/data/how-to/hipgraph/hip_graph.svg b/docs/data/how-to/hip_runtime_api/hipgraph/hip_graph.svg
similarity index 100%
rename from docs/data/how-to/hipgraph/hip_graph.svg
rename to docs/data/how-to/hip_runtime_api/hipgraph/hip_graph.svg
diff --git a/docs/data/how-to/hipgraph/hip_graph_speedup.drawio b/docs/data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.drawio
similarity index 100%
rename from docs/data/how-to/hipgraph/hip_graph_speedup.drawio
rename to docs/data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.drawio
diff --git a/docs/data/how-to/hipgraph/hip_graph_speedup.svg b/docs/data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.svg
similarity index 100%
rename from docs/data/how-to/hipgraph/hip_graph_speedup.svg
rename to docs/data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.svg
diff --git a/docs/how-to/hipgraph.rst b/docs/how-to/hip_runtime_api/hipgraph.rst
similarity index 98%
rename from docs/how-to/hipgraph.rst
rename to docs/how-to/hip_runtime_api/hipgraph.rst
index 83d6fa0f61..8c543dbf31 100644
--- a/docs/how-to/hipgraph.rst
+++ b/docs/how-to/hip_runtime_api/hipgraph.rst
@@ -9,6 +9,7 @@ HIP graphs
 ********************************************************************************

 .. note::
+
 The HIP graph API is currently in Beta. Some features can change and might
 have outstanding issues. Not all features supported by CUDA graphs are yet
 supported. For a list of all currently supported functions see the
@@ -31,11 +32,12 @@ The nodes can be one of the following:
 - signalling or waiting on external semaphores

 .. note::
+
 The available node types are specified by :cpp:enumerator:`hipGraphNodeType`.

 The following figure visualizes the concept of graphs, compared to using streams.

-.. figure:: ../data/how-to/hipgraph/hip_graph.svg
+.. figure:: ../../data/how-to/hip_runtime_api/hipgraph/hip_graph.svg
 :alt: Diagram depicting the difference between using streams to execute
 kernels with dependencies, resolved by explicitly synchronizing, or
 using graphs, where the edges denote the dependencies.
@@ -56,7 +58,7 @@ HIP runtime takes care of executing the operations within the graph.
 Graphs can provide additional performance benefits, by enabling optimizations
 that are only possible when knowing the dependencies between the operations.

-.. figure:: ../data/how-to/hipgraph/hip_graph_speedup.svg
+.. figure:: ../../data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.svg
 :alt: Diagram depicting the speed up achievable with HIP graphs compared to
 HIP streams when launching many short-running kernels.

@@ -189,6 +191,7 @@ The following code is an example of how to use the HIP graph API to capture a
 graph from a stream.

 .. code-block:: cpp
+
 #include
 #include
 #include
@@ -321,7 +324,7 @@ the graph, for example :cpp:func:`hipGraphAddKernelNode`
 See the available functions, they are of type ``hipGraphAdd{Type}Node``. Each
 type of node also has a predefined set of parameters depending on the operation,
 for example :cpp:class:`hipKernelNodeParams` for a kernel launch. See the
-:doc:`documentation for the general hipGraphNodeParams type<../doxygen/html/structhip_graph_node_params>`
+:doc:`documentation for the general hipGraphNodeParams type <../doxygen/html/structhip_graph_node_params>`
 for a list of available parameter types and their members.

 The general flow for explicitly creating a graph is usually:
diff --git a/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst b/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst
index a2cee45a04..8d573530ad 100644
--- a/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst
+++ b/docs/how-to/hip_runtime_api/memory_management/coherence_control.rst
@@ -32,6 +32,7 @@ In HIP, host and device memory can be allocated with two different types of cohe
 them read-only.

 .. TODO: Is this still valid? What about Mi300?
+
 Developers should use coarse-grained coherence where they can to reduce
 host-device interconnect communication and also Mi200 accelerators hardware
 based floating point instructions are working on coarse grained memory regions.
diff --git a/docs/how-to/hip_runtime_api/memory_management/device_memory/texture_fetching.rst b/docs/how-to/hip_runtime_api/memory_management/device_memory/texture_fetching.rst
index b3edcf637f..a7f2873dd5 100644
--- a/docs/how-to/hip_runtime_api/memory_management/device_memory/texture_fetching.rst
+++ b/docs/how-to/hip_runtime_api/memory_management/device_memory/texture_fetching.rst
@@ -9,7 +9,7 @@ Texture fetching
 *******************************************************************************

-`Textures <../../../doxygen/html/group___texture.html>`_ are more than just a buffer
+`Textures <../../../../doxygen/html/group___texture.html>`_ are more than just a buffer
 interpreted as a 1D, 2D, or 3D array.

 As textures are associated with graphics, they are indexed using floating-point
@@ -34,7 +34,7 @@ sections.
 Here is the sample texture used in this document for demonstration purposes. It
 is 2x2 texels and indexed in the [0 to 1] range.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/original.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/original.png
 :width: 150
 :alt: Sample texture
 :align: center
@@ -68,7 +68,7 @@ The following image shows a texture stretched to a 4x4 pixel quad but still
 indexed in the [0 to 1] range. The in-between values are the same as the values
 of the nearest texel.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/nearest.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/nearest.png
 :width: 300
 :alt: Texture upscaled with nearest point sampling
 :align: center
@@ -99,7 +99,7 @@ This following image shows a texture stretched out to a 4x4 pixel quad, but
 still indexed in the [0 to 1] range. The in-between values are interpolated
 between the neighboring texels.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/linear.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/linear.png
 :width: 300
 :alt: Texture upscaled with linear filtering
 :align: center
@@ -126,7 +126,7 @@ bounds. The border value must be set before texture fetching.
 The following image shows the texture on a 4x4 pixel quad, indexed in the
 [0 to 3] range. The out-of-bounds values are the border color, which is yellow.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/border.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/border.png
 :width: 300
 :alt: Texture with yellow border color
 :align: center
@@ -149,7 +149,7 @@ The following image shows the texture on a 4x4 pixel quad, indexed in the
 [0 to 3] range. The out-of-bounds values are repeating the values at the edge of
 the texture.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/clamp.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/clamp.png
 :width: 300
 :alt: Texture with clamp addressing
 :align: center
@@ -174,7 +174,7 @@ This creates a repeating image effect.
 The following image shows the texture on a 4x4 pixel quad, indexed in the
 [0 to 3] range. The out-of-bounds values are repeating the original texture.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/wrap.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/wrap.png
 :width: 300
 :alt: Texture with wrap addressing
 :align: center
@@ -203,7 +203,7 @@ The following image shows the texture on a 4x4 pixel quad, indexed in
 The [0 to 3] range. The out-of-bounds values are repeating the original texture,
 but mirrored.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/mirror.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/mirror.png
 :width: 300
 :alt: Texture with mirror addressing
 :align: center
diff --git a/docs/how-to/stream_ordered_allocator.rst b/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst
similarity index 99%
rename from docs/how-to/stream_ordered_allocator.rst
rename to docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst
index 5cef65a2c9..692dfae0de 100644
--- a/docs/how-to/stream_ordered_allocator.rst
+++ b/docs/how-to/hip_runtime_api/memory_management/stream_ordered_allocator.rst
@@ -216,6 +216,7 @@ Trim pools
 The memory allocator allows you to allocate and free memory in stream order. To control memory usage, set the release threshold attribute using ``hipMemPoolAttrReleaseThreshold``. This threshold specifies the amount of reserved memory in bytes to hold onto.

 .. code-block:: cpp
+
 uint64_t threshold = UINT64_MAX;
 hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold);

@@ -466,6 +467,7 @@ Here is how to read the pool exported in the preceding example:
 }

 .. _shareable-handle:
+
 Shareable handle
 ----------------

diff --git a/docs/how-to/performance_guidelines.rst b/docs/how-to/performance_guidelines.rst
index e119931865..d76ab1c1a0 100644
--- a/docs/how-to/performance_guidelines.rst
+++ b/docs/how-to/performance_guidelines.rst
@@ -22,6 +22,7 @@ optimization potential:
 This document discusses the usage and benefits of these cornerstones in detail.

 .. _parallel execution:
+
 Parallel execution
 ====================

@@ -67,6 +68,7 @@ GPU resources, ranging from individual multiprocessors to the device as a whole.


 .. _memory optimization:
+
 Memory throughput optimization
 ===============================

@@ -94,6 +96,7 @@ impact on performance.
 The memory throughput optimization techniques are further discussed in detail in the following sections.

 .. _data transfer:
+
 Data transfer
 ---------------

@@ -112,6 +115,7 @@ memory accesses. The process where threads in a warp access sequential memory lo
 On integrated systems where device and host memory are physically the same, no copy operation between host and device memory is required and hence mapped page-locked memory should be used instead. To check if the device is integrated, applications can query the integrated device property.

 .. _device memory access:
+
 Device memory access
 ---------------------

@@ -158,6 +162,7 @@ Reading device memory through texture or surface fetching provides the following
 - Optional conversion of 8-bit and 16-bit integer input data to 32-bit floating-point values on the fly.

 .. _instruction optimization:
+
 Optimization for maximum instruction throughput
 =================================================

@@ -185,6 +190,7 @@ Leverage intrinsic functions: Intrinsic functions are predefined functions avail
 Optimize memory access: The memory access efficiency can impact the speed of arithmetic operations. See: :ref:`device memory access`.

 .. _control flow instructions:
+
 Control flow instructions
 ---------------------------

diff --git a/docs/index.md b/docs/index.md
index 2e3f0b9f5e..5e0a5fd9e3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -33,13 +33,11 @@ The HIP documentation is organized into the following categories:
 * {doc}`./how-to/hip_runtime_api/memory_management`
 * {doc}`./how-to/hip_runtime_api/cooperative_groups`
 * [HIP porting guide](./how-to/hip_porting_guide)
-* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
+ * [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
 * {doc}`./how-to/hip_rtc`
 * {doc}`./how-to/performance_guidelines`
 * [Debugging with HIP](./how-to/debugging)
 * {doc}`./how-to/logging`
-* {doc}`./how-to/stream_ordered_allocator`
-* [HIP graphs](./how-to/hipgraph)

 :::

@@ -48,7 +46,6 @@ The HIP documentation is organized into the following categories:
 * [HIP runtime API](./reference/hip_runtime_api_reference)
 * [Modules](./reference/hip_runtime_api/modules)
 * [Global defines, enums, structs and files](./reference/hip_runtime_api/global_defines_enums_structs_files)
-* [HSA runtime API for ROCm](./reference/virtual_rocr)
 * [C++ language extensions](./reference/cpp_language_extensions)
 * [C++ language support](./reference/cpp_language_support)
 * [HIP math API](./reference/math_api)
diff --git a/docs/reference/virtual_rocr.rst b/docs/reference/virtual_rocr.rst
deleted file mode 100644
index 444882fc7e..0000000000
--- a/docs/reference/virtual_rocr.rst
+++ /dev/null
@@ -1,35 +0,0 @@
-.. meta::
- :description: This chapter lists user-mode API interfaces and libraries
- necessary for host applications to launch compute kernels to
- available HSA ROCm kernel agents.
- :keywords: AMD, ROCm, HIP, HSA, ROCR runtime, virtual memory management
-
-*******************************************************************************
-HSA runtime API for ROCm
-*******************************************************************************
-
-The following functions are located in the https://github.com/ROCm/ROCR-Runtime repository.
-
-.. doxygenfunction:: hsa_amd_vmem_address_reserve
-
-.. doxygenfunction:: hsa_amd_vmem_address_free
-
-.. doxygenfunction:: hsa_amd_vmem_handle_create
-
-.. doxygenfunction:: hsa_amd_vmem_handle_release
-
-.. doxygenfunction:: hsa_amd_vmem_map
-
-.. doxygenfunction:: hsa_amd_vmem_unmap
-
-.. doxygenfunction:: hsa_amd_vmem_set_access
-
-.. doxygenfunction:: hsa_amd_vmem_get_access
-
-.. doxygenfunction:: hsa_amd_vmem_export_shareable_handle
-
-.. doxygenfunction:: hsa_amd_vmem_import_shareable_handle
-
-.. doxygenfunction:: hsa_amd_vmem_retain_alloc_handle
-
-.. doxygenfunction:: hsa_amd_vmem_get_alloc_properties_from_handle
diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in
index d9f300119c..0f73d42dfe 100644
--- a/docs/sphinx/_toc.yml.in
+++ b/docs/sphinx/_toc.yml.in
@@ -46,16 +46,15 @@ subtrees:
  - file: how-to/hip_runtime_api/memory_management/coherence_control
  - file: how-to/hip_runtime_api/memory_management/unified_memory
  - file: how-to/hip_runtime_api/memory_management/virtual_memory
+ - file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
  - file: how-to/hip_runtime_api/cooperative_groups
+ - file: how-to/hip_runtime_api/hipgraph
  - file: how-to/hip_porting_guide
  - file: how-to/hip_porting_driver_api
  - file: how-to/hip_rtc
  - file: how-to/performance_guidelines
  - file: how-to/debugging
  - file: how-to/logging
- - file: how-to/hipgraph
- title: HIP graphs
- - file: how-to/stream_ordered_allocator

  - caption: Reference
  entries:
@@ -101,7 +100,6 @@ subtrees:
  - file: reference/hip_runtime_api/global_defines_enums_structs_files/driver_types
  - file: doxygen/html/annotated
  - file: doxygen/html/files
- - file: reference/virtual_rocr
  - file: reference/cpp_language_extensions
  title: C++ language extensions
  - file: reference/cpp_language_support