Documentation fix
neon60 committed Oct 3, 2024
1 parent d5a6ca3 commit 4afc564
Showing 12 changed files with 26 additions and 54 deletions.
File renamed without changes.
File renamed without changes
Original file line number Diff line number Diff line change
@@ -9,6 +9,7 @@ HIP graphs
********************************************************************************

.. note::

The HIP graph API is currently in Beta. Some features can change and might
have outstanding issues. Not all features supported by CUDA graphs are yet
supported. For a list of all currently supported functions see the
@@ -31,11 +32,12 @@ The nodes can be one of the following:
- signalling or waiting on external semaphores

.. note::

The available node types are specified by :cpp:enumerator:`hipGraphNodeType`.

The following figure visualizes the concept of graphs, compared to using streams.

-.. figure:: ../data/how-to/hipgraph/hip_graph.svg
+.. figure:: ../../data/how-to/hip_runtime_api/hipgraph/hip_graph.svg
:alt: Diagram depicting the difference between using streams to execute
kernels with dependencies, resolved by explicitly synchronizing,
or using graphs, where the edges denote the dependencies.
@@ -56,7 +58,7 @@ HIP runtime takes care of executing the operations within the graph.
Graphs can provide additional performance benefits, by enabling optimizations
that are only possible when knowing the dependencies between the operations.

-.. figure:: ../data/how-to/hipgraph/hip_graph_speedup.svg
+.. figure:: ../../data/how-to/hip_runtime_api/hipgraph/hip_graph_speedup.svg
:alt: Diagram depicting the speed up achievable with HIP graphs compared to
HIP streams when launching many short-running kernels.

@@ -189,6 +191,7 @@ The following code is an example of how to use the HIP graph API to capture a
graph from a stream.

.. code-block:: cpp
#include <hip/hip_runtime.h>
#include <vector>
#include <iostream>
@@ -321,7 +324,7 @@ the graph, for example :cpp:func:`hipGraphAddKernelNode` See the
available functions, they are of type ``hipGraphAdd{Type}Node``. Each type of
node also has a predefined set of parameters depending on the operation, for
example :cpp:class:`hipKernelNodeParams` for a kernel launch. See the
-:doc:`documentation for the general hipGraphNodeParams type<../doxygen/html/structhip_graph_node_params>`
+:doc:`documentation for the general hipGraphNodeParams type <../doxygen/html/structhip_graph_node_params>`
for a list of available parameter types and their members.

The general flow for explicitly creating a graph is usually:
Original file line number Diff line number Diff line change
@@ -32,6 +32,7 @@ In HIP, host and device memory can be allocated with two different types of cohe
them read-only.

.. TODO: Is this still valid? What about Mi300?
Developers should use coarse-grained coherence where they can to reduce
host-device interconnect communication and also Mi200 accelerators hardware
based floating point instructions are working on coarse grained memory regions.
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@
Texture fetching
*******************************************************************************

-`Textures <../../../doxygen/html/group___texture.html>`_ are more than just a buffer
+`Textures <../../../../doxygen/html/group___texture.html>`_ are more than just a buffer
interpreted as a 1D, 2D, or 3D array.

As textures are associated with graphics, they are indexed using floating-point
@@ -34,7 +34,7 @@ sections.
Here is the sample texture used in this document for demonstration purposes. It
is 2x2 texels and indexed in the [0 to 1] range.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/original.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/original.png
:width: 150
:alt: Sample texture
:align: center
@@ -68,7 +68,7 @@ The following image shows a texture stretched to a 4x4 pixel quad but still
indexed in the [0 to 1] range. The in-between values are the same as the values
of the nearest texel.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/nearest.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/nearest.png
:width: 300
:alt: Texture upscaled with nearest point sampling
:align: center
@@ -99,7 +99,7 @@ This following image shows a texture stretched out to a 4x4 pixel quad, but
still indexed in the [0 to 1] range. The in-between values are interpolated
between the neighboring texels.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/linear.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/linear.png
:width: 300
:alt: Texture upscaled with linear filtering
:align: center
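
The two filtering hunks above describe nearest point sampling and linear filtering only in words. As a rough host-side illustration, the following plain C++ sketch reproduces both behaviors for a one-dimensional row of texels. It is not HIP code: ``clamp_index``, ``sample_nearest`` and ``sample_linear`` are hypothetical helper names, and placing texel centers at ``i + 0.5`` is an assumption made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: keep a texel index inside [0, n - 1].
inline int clamp_index(int i, int n) {
    return std::min(std::max(i, 0), n - 1);
}

// Nearest point sampling: return the texel whose cell contains the
// normalized coordinate u (u in [0, 1] spans the whole row).
inline float sample_nearest(const std::vector<float>& texels, float u) {
    assert(!texels.empty());
    const int n = static_cast<int>(texels.size());
    const int i = static_cast<int>(std::floor(u * n));
    return texels[clamp_index(i, n)];
}

// Linear filtering: blend the two texels whose centers surround u.
// Assumption: texel i has its center at (i + 0.5) in texel space.
inline float sample_linear(const std::vector<float>& texels, float u) {
    assert(!texels.empty());
    const int n = static_cast<int>(texels.size());
    const float x = u * n - 0.5f;         // shift so centers sit on integers
    const float base = std::floor(x);
    const float alpha = x - base;         // blend weight in [0, 1)
    const int i0 = clamp_index(static_cast<int>(base), n);
    const int i1 = clamp_index(static_cast<int>(base) + 1, n);
    return (1.0f - alpha) * texels[i0] + alpha * texels[i1];
}
```

Sampling a two-texel row ``{0, 1}`` at the four pixel centers of a four-pixel-wide quad (``u`` = 0.125, 0.375, 0.625, 0.875) yields 0, 0, 1, 1 with ``sample_nearest`` and 0, 0.25, 0.75, 1 with ``sample_linear``. This mirrors the difference between the nearest and linear figures.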
@@ -126,7 +126,7 @@ bounds. The border value must be set before texture fetching.
The following image shows the texture on a 4x4 pixel quad, indexed in the
[0 to 3] range. The out-of-bounds values are the border color, which is yellow.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/border.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/border.png
:width: 300
:alt: Texture with yellow border color
:align: center
@@ -149,7 +149,7 @@ The following image shows the texture on a 4x4 pixel quad, indexed in the
[0 to 3] range. The out-of-bounds values are repeating the values at the edge of
the texture.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/clamp.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/clamp.png
:width: 300
:alt: Texture with clamp addressing
:align: center
@@ -174,7 +174,7 @@ This creates a repeating image effect.
The following image shows the texture on a 4x4 pixel quad, indexed in the
[0 to 3] range. The out-of-bounds values are repeating the original texture.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/wrap.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/wrap.png
:width: 300
:alt: Texture with wrap addressing
:align: center
@@ -203,7 +203,7 @@ The following image shows the texture on a 4x4 pixel quad, indexed in The
[0 to 3] range. The out-of-bounds values are repeating the original texture, but
mirrored.

-.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/textures/mirror.png
+.. figure:: ../../../../data/how-to/hip_runtime_api/memory_management/textures/mirror.png
:width: 300
:alt: Texture with mirror addressing
:align: center
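
The border, clamp, wrap and mirror hunks above differ only in how an out-of-bounds coordinate is mapped back onto the texture. A minimal plain C++ sketch of that mapping, working on integer texel indices, could look like the following. ``AddressMode``, ``kBorderTexel`` and ``resolve_index`` are hypothetical names introduced for illustration. They are not part of the HIP API, and the actual runtime operates on floating-point coordinates.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical enum for this sketch only; not the HIP address mode type.
enum class AddressMode { Border, Clamp, Wrap, Mirror };

// Sentinel meaning "use the border value instead of a texel".
constexpr int kBorderTexel = -1;

// Map a possibly out-of-bounds texel index i onto a texture of n texels.
inline int resolve_index(int i, int n, AddressMode mode) {
    assert(n > 0);
    switch (mode) {
    case AddressMode::Border:
        // Anything outside the texture reads the border value.
        return (i < 0 || i >= n) ? kBorderTexel : i;
    case AddressMode::Clamp:
        // Out-of-bounds reads repeat the texel at the nearest edge.
        return std::min(std::max(i, 0), n - 1);
    case AddressMode::Wrap:
        // The texture repeats with period n (also for negative i).
        return ((i % n) + n) % n;
    case AddressMode::Mirror: {
        // The texture repeats with period 2n; every second copy is flipped.
        const int m = ((i % (2 * n)) + 2 * n) % (2 * n);
        return (m < n) ? m : (2 * n - 1 - m);
    }
    }
    return kBorderTexel;  // unreachable for valid enum values
}
```

For a two-texel row and indices 0 to 5, this gives 0, 1, 0, 1, 0, 1 for wrap, 0, 1, 1, 0, 0, 1 for mirror and 0, 1, 1, 1, 1, 1 for clamp. Border returns the sentinel for every index from 2 upward.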
Original file line number Diff line number Diff line change
@@ -216,6 +216,7 @@ Trim pools
The memory allocator allows you to allocate and free memory in stream order. To control memory usage, set the release threshold attribute using ``hipMemPoolAttrReleaseThreshold``. This threshold specifies the amount of reserved memory in bytes to hold onto.

.. code-block:: cpp
uint64_t threshold = UINT64_MAX;
hipMemPoolSetAttribute(memPool, hipMemPoolAttrReleaseThreshold, &threshold);
@@ -466,6 +467,7 @@ Here is how to read the pool exported in the preceding example:
}
.. _shareable-handle:

Shareable handle
----------------

6 changes: 6 additions & 0 deletions docs/how-to/performance_guidelines.rst
Original file line number Diff line number Diff line change
@@ -22,6 +22,7 @@ optimization potential:
This document discusses the usage and benefits of these cornerstones in detail.

.. _parallel execution:

Parallel execution
====================

@@ -67,6 +68,7 @@ GPU resources, ranging from individual multiprocessors to the device as a
whole.

.. _memory optimization:

Memory throughput optimization
===============================

@@ -94,6 +96,7 @@ impact on performance.
The memory throughput optimization techniques are further discussed in detail in the following sections.

.. _data transfer:

Data transfer
---------------

@@ -112,6 +115,7 @@ memory accesses. The process where threads in a warp access sequential memory lo
On integrated systems where device and host memory are physically the same, no copy operation between host and device memory is required and hence mapped page-locked memory should be used instead. To check if the device is integrated, applications can query the integrated device property.

.. _device memory access:

Device memory access
---------------------

@@ -158,6 +162,7 @@ Reading device memory through texture or surface fetching provides the following
- Optional conversion of 8-bit and 16-bit integer input data to 32-bit floating-point values on the fly.

.. _instruction optimization:

Optimization for maximum instruction throughput
=================================================

@@ -185,6 +190,7 @@ Leverage intrinsic functions: Intrinsic functions are predefined functions avail
Optimize memory access: The memory access efficiency can impact the speed of arithmetic operations. See: :ref:`device memory access`.

.. _control flow instructions:

Control flow instructions
---------------------------

5 changes: 1 addition & 4 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -33,13 +33,11 @@ The HIP documentation is organized into the following categories:
* {doc}`./how-to/hip_runtime_api/memory_management`
* {doc}`./how-to/hip_runtime_api/cooperative_groups`
* [HIP porting guide](./how-to/hip_porting_guide)
-* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
+* [HIP porting: driver API guide](./how-to/hip_porting_driver_api)
* {doc}`./how-to/hip_rtc`
* {doc}`./how-to/performance_guidelines`
* [Debugging with HIP](./how-to/debugging)
* {doc}`./how-to/logging`
-* {doc}`./how-to/stream_ordered_allocator`
-* [HIP graphs](./how-to/hipgraph)

:::

@@ -48,7 +46,6 @@ The HIP documentation is organized into the following categories:
* [HIP runtime API](./reference/hip_runtime_api_reference)
* [Modules](./reference/hip_runtime_api/modules)
* [Global defines, enums, structs and files](./reference/hip_runtime_api/global_defines_enums_structs_files)
-* [HSA runtime API for ROCm](./reference/virtual_rocr)
* [C++ language extensions](./reference/cpp_language_extensions)
* [C++ language support](./reference/cpp_language_support)
* [HIP math API](./reference/math_api)
35 changes: 0 additions & 35 deletions docs/reference/virtual_rocr.rst

This file was deleted.

6 changes: 2 additions & 4 deletions docs/sphinx/_toc.yml.in
Original file line number Diff line number Diff line change
@@ -46,16 +46,15 @@ subtrees:
- file: how-to/hip_runtime_api/memory_management/coherence_control
- file: how-to/hip_runtime_api/memory_management/unified_memory
- file: how-to/hip_runtime_api/memory_management/virtual_memory
+- file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator
- file: how-to/hip_runtime_api/cooperative_groups
+- file: how-to/hip_runtime_api/hipgraph
- file: how-to/hip_porting_guide
- file: how-to/hip_porting_driver_api
- file: how-to/hip_rtc
- file: how-to/performance_guidelines
- file: how-to/debugging
- file: how-to/logging
-- file: how-to/hipgraph
-title: HIP graphs
-- file: how-to/stream_ordered_allocator

- caption: Reference
entries:
@@ -101,7 +100,6 @@ subtrees:
- file: reference/hip_runtime_api/global_defines_enums_structs_files/driver_types
- file: doxygen/html/annotated
- file: doxygen/html/files
-- file: reference/virtual_rocr
- file: reference/cpp_language_extensions
title: C++ language extensions
- file: reference/cpp_language_support
