[SYCL][Ext] Query kernel maximum active work-groups based on occupancy
The currently proposed and implemented query is `max_num_work_group_occupancy_per_cu`, which retrieves the maximum number of actively executing work-groups based on compute-unit occupancy granularity. This commit also fixes an issue in the `max_num_work_group_sync` query that could previously have led to an out-of-launch-resources error. Additionally, it overloads the `max_num_work_group_sync` query to take extra parameters for the local work-group size and the dynamic local memory size (in bytes), so that users can pass these important resource-usage factors to the query and have them taken into account in the suggested group count. This overload is currently only usable when targeting CUDA.
Showing 13 changed files with 360 additions and 19 deletions.
sycl/doc/extensions/experimental/sycl_ext_oneapi_group_occupancy_queries.asciidoc (154 additions, 0 deletions)
= sycl_ext_oneapi_group_occupancy_queries

:source-highlighter: coderay
:coderay-linenums-mode: table

// This section needs to be after the document title.
:doctype: book
:toc2:
:toc: left
:encoding: utf-8
:lang: en
:dpcpp: pass:[DPC++]

// Set the default source code type in this document to C++,
// for syntax highlighting purposes. This is needed because
// docbook uses c++ and html5 uses cpp.
:language: {basebackend@docbook:c++:cpp}


== Notice

[%hardbreaks]
Copyright (C) 2024 Intel Corporation. All rights reserved.

Khronos(R) is a registered trademark and SYCL(TM) and SPIR(TM) are trademarks
of The Khronos Group Inc. OpenCL(TM) is a trademark of Apple Inc. used by
permission by Khronos.


== Contact

To report problems with this extension, please open a new issue at:

https://github.com/intel/llvm/issues


== Dependencies

This extension is written against the SYCL 2020 revision 5 specification. All
references below to the "core SYCL specification" or to section numbers in the
SYCL specification refer to that revision.

This extension also depends on the following other SYCL extensions:

* link:../proposed/sycl_ext_oneapi_launch_queries.asciidoc[
  sycl_ext_oneapi_launch_queries]


== Status

This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in this
specification are implemented in {dpcpp}, but they are not finalized and may
change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in this
specification.*


== Overview

This extension is based on the kernel-queue-specific querying mechanism
introduced by the sycl_ext_oneapi_launch_queries extension.

The purpose of the queries added by this extension is to aid occupancy-based
calculations for kernel launches, using hardware occupancy at compute-unit
granularity. The queries take into account the kernel's resource usage and
user-specified constraints, such as (but not limited to) the local
(work-group) size and the amount of dynamic work-group local memory (in
bytes). The motivation is to aid kernel tuning: developers can design an
algorithm's implementation to maintain the highest possible occupancy in a
portable way.

The following queries are currently planned:

* `max_num_work_group_occupancy_per_cu`

[source,c++]
----
#include <sycl/sycl.hpp>

namespace syclex = sycl::ext::oneapi::experimental;

sycl::queue q{};
// KernelName must name a kernel that this program submits elsewhere.
auto bundle =
    sycl::get_kernel_bundle<sycl::bundle_state::executable>(q.get_context());
auto kernel = bundle.get_kernel<class KernelName>();

// Work-group size and dynamic work-group local memory (in bytes) that the
// kernel is expected to be launched with.
auto wgSizeRange = sycl::range{32, 1, 1};
size_t localMemorySize = 32;

uint32_t maxWGsPerCU = kernel.ext_oneapi_get_info<
    syclex::info::kernel_queue_specific::max_num_work_group_occupancy_per_cu>(
    q, wgSizeRange, localMemorySize);
----

NOTE: SYCL 2020 requires lambdas to be named in order to locate the associated
`sycl::kernel` object used to query information descriptors. Reducing the
verbosity of the queries shown above is left to a future extension.
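
As a sketch of the occupancy-based calculation mentioned in the overview, the
per-compute-unit result can be scaled up to a device-wide launch size. The
helper names below are hypothetical and not part of this extension.
Multiplying by the device's compute-unit count assumes that
`sycl::info::device::max_compute_units` counts the same compute units that
`max_num_work_group_occupancy_per_cu` refers to, which this specification does
not state.

[source,c++]
----
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers (not part of the extension) for sizing a launch from
// the result of max_num_work_group_occupancy_per_cu.

// Device-wide work-group count that places as many resident work-groups on
// every compute unit as the query reports. numComputeUnits is assumed to come
// from a device query such as sycl::info::device::max_compute_units.
std::size_t occupancyGroupCount(std::uint32_t maxWGsPerCU,
                                std::uint32_t numComputeUnits) {
  return static_cast<std::size_t>(maxWGsPerCU) * numComputeUnits;
}

// Total number of work-items launched with that many groups of wgSize items.
std::size_t occupancyGlobalSize(std::uint32_t maxWGsPerCU,
                                std::uint32_t numComputeUnits,
                                std::size_t wgSize) {
  return occupancyGroupCount(maxWGsPerCU, numComputeUnits) * wgSize;
}

// The launch size is fixed by occupancy rather than by the problem size, so
// each work-item walks the data with a stride of globalSize (a grid-stride
// loop). This returns the maximum number of iterations any work-item runs.
std::size_t gridStrideIterations(std::size_t problemSize,
                                 std::size_t globalSize) {
  assert(globalSize > 0 && "global size must be non-zero");
  return (problemSize + globalSize - 1) / globalSize;
}
----

For example, 4 work-groups per compute unit on a device with 20 compute units
and 32 work-items per group gives 80 work-groups and 2560 work-items in total,
so a 10000-element problem needs at most 4 grid-stride iterations per
work-item.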


== Specification

=== Feature test macro

This extension provides a feature-test macro as described in the core SYCL
specification. An implementation supporting this extension must predefine the
macro `SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES` to one of the values defined in
the table below. Applications can test for the existence of this macro to
determine if the implementation supports this feature, or applications can test
the macro's value to determine which of the extension's features the
implementation supports.

[%header,cols="1,5"]
|===
|Value
|Description

|1
|The APIs of this experimental extension are not versioned, so the
feature-test macro always has this value.
|===
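
A minimal sketch of testing the macro at compile time follows. The function
and constant names are hypothetical. In a real program the test would come
after `#include <sycl/sycl.hpp>`, so that a definition supplied through the
SYCL headers is visible; the include is left out here only so that the sketch
compiles on its own.

[source,c++]
----
// In a real program, include <sycl/sycl.hpp> before this point.

// Returns the extension version reported by the implementation, or 0 if the
// implementation does not define the feature-test macro.
constexpr int groupOccupancyQueriesVersion() {
#ifdef SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES
  return SYCL_EXT_ONEAPI_GROUP_OCCUPANCY_QUERIES;
#else
  return 0;
#endif
}

// Hypothetical compile-time switch for choosing between an occupancy-based
// launch configuration and a fallback one.
constexpr bool kHasGroupOccupancyQueries = groupOccupancyQueriesVersion() >= 1;
----

Code that actually names `max_num_work_group_occupancy_per_cu` still needs a
preprocessor guard on the macro: outside a template, the discarded branch of an
`if constexpr` is still checked by the compiler, so an undeclared descriptor
would be rejected.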


=== Occupancy queries

[source, c++]
----
namespace ext::oneapi::experimental::info::kernel_queue_specific {
struct max_num_work_group_occupancy_per_cu;
}
----

[%header,cols="1,5,5,5"]
|===
|Kernel Descriptor
|Argument Types
|Return Type
|Description

|`max_num_work_group_occupancy_per_cu`
|`sycl::queue`, `sycl::range`, `size_t`
|`uint32_t`
|Returns the maximum number of actively executing work-groups per compute
unit when the kernel is submitted to the specified queue with the specified
work-group size and the specified amount of dynamic work-group local memory
(in bytes). Actively executing work-groups are the work-groups that can
simultaneously occupy a compute unit, the fundamental hardware unit that
executes work-groups in parallel.

|===
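
The query folds several per-compute-unit limits and the user-specified
constraints into one number. The sketch below is a deliberately simplified
model of that kind of calculation, not the algorithm any backend uses: the
`ComputeUnitLimits` type, its fields, and the example limit values are
hypothetical, and real devices also account for registers, barriers and
allocation granularity. CUDA, for instance, exposes a comparable calculation
through `cuOccupancyMaxActiveBlocksPerMultiprocessor`.

[source,c++]
----
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-compute-unit resource limits. The real set of limiting
// resources and their values depend on the device and backend.
struct ComputeUnitLimits {
  std::uint32_t maxWorkGroups;   // cap on resident work-groups
  std::uint32_t maxWorkItems;    // cap on resident work-items
  std::size_t localMemoryBytes;  // local memory shared by resident groups
};

// Simplified model of max_num_work_group_occupancy_per_cu: each resource
// bounds the number of resident work-groups separately, and the tightest
// bound wins.
std::uint32_t modelGroupsPerCU(const ComputeUnitLimits &cu,
                               std::size_t workGroupSize,
                               std::size_t staticLocalMemBytes,
                               std::size_t dynamicLocalMemBytes) {
  assert(workGroupSize > 0 && "work-group size must be non-zero");
  std::size_t byGroups = cu.maxWorkGroups;
  std::size_t byWorkItems = cu.maxWorkItems / workGroupSize;
  std::size_t localPerGroup = staticLocalMemBytes + dynamicLocalMemBytes;
  // A kernel that uses no local memory is not limited by it.
  std::size_t byLocalMem =
      localPerGroup == 0 ? byGroups : cu.localMemoryBytes / localPerGroup;
  return static_cast<std::uint32_t>(
      std::min({byGroups, byWorkItems, byLocalMem}));
}
----

With limits of 32 groups, 2048 work-items and 65536 bytes of local memory per
compute unit, a 128-item work-group is bounded to 16 resident groups by
work-items alone, but requesting 16384 bytes of dynamic local memory per group
lowers the result to 4. This is why the query takes the dynamic local memory
size as an argument.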

== Implementation notes

The implementation needs to define an overload of
`sycl::kernel::ext_oneapi_get_info` that takes the extra `sycl::range` and
`size_t` parameters in addition to the `sycl::queue`.

The CUDA, HIP and Level Zero backend adapters have the infrastructure required
to implement the extension.
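
The overload described above can be pictured with a self-contained mock.
Everything in the `mock` namespace is hypothetical and only illustrates the
shape of the API: an info descriptor that carries its `return_type`, and a
templated `ext_oneapi_get_info` that accepts the work-group size and the
dynamic local memory size alongside the queue. It does not reproduce the
{dpcpp} declarations.

[source,c++]
----
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical mock of the API shape; none of these are the real SYCL types.
namespace mock {

struct queue {};  // stands in for sycl::queue

struct range3 {  // stands in for sycl::range<3>
  std::size_t dims[3];
  std::size_t size() const { return dims[0] * dims[1] * dims[2]; }
};

namespace info::kernel_queue_specific {
// Like SYCL info descriptors, the tag type names the query's result type.
struct max_num_work_group_occupancy_per_cu {
  using return_type = std::uint32_t;
};
}  // namespace info::kernel_queue_specific

class kernel {
 public:
  // Stand-in for the backend adapter that performs the actual calculation.
  using occupancy_fn = std::uint32_t (*)(std::size_t workGroupSize,
                                         std::size_t dynamicLocalMemBytes);
  explicit kernel(occupancy_fn fn) : fn_(fn) {}

  // Overload with the extra work-group size and dynamic local memory
  // arguments next to the queue.
  template <typename Param>
  typename Param::return_type ext_oneapi_get_info(
      const queue &, const range3 &workGroupSize,
      std::size_t dynamicLocalMemBytes) const {
    static_assert(
        std::is_same_v<
            Param,
            info::kernel_queue_specific::max_num_work_group_occupancy_per_cu>,
        "only the occupancy descriptor takes these arguments in this mock");
    assert(workGroupSize.size() > 0 && "work-group size must be non-zero");
    return fn_(workGroupSize.size(), dynamicLocalMemBytes);
  }

 private:
  occupancy_fn fn_;
};

}  // namespace mock
----

Keeping the result type on the descriptor lets one templated member serve
every query, while the `static_assert` rejects descriptors that do not accept
the extra arguments at compile time.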