Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL] Implement max_num_work_groups from the launch queries extension #14333
[SYCL] Implement max_num_work_groups from the launch queries extension #14333
Changes from 8 commits
aead3e3
e172c1e
a840be1
a9e17b4
4f81d0a
b2756c9
28b09f4
fd51cfb
377cf3b
3a8f3bf
b5b3d43
594727e
efcf44f
448b191
54c1b57
eb60b1c
b4b355e
20aa7c5
8fa7b09
e5910fa
a7411c8
7de06c1
e50e837
dc2dde4
b7807f9
4344064
0ef4baa
2fa280b
da8cde2
27b2416
b51f965
ef4cd8b
6483da7
4fc7353
7636f78
ea1e525
1698bb8
529caa5
2e190a2
762c7e1
2169579
c2788f6
6c5485f
cb5e47e
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't render well in HTML. Asciidoc requires a blank line before the bullet.
Are you planning to add more queries to this extension soon? This list of currently planned queries seems odd. I'd suggest removing it unless you have some specific plan to add more things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning on at least one more, being
recommended_work_group_size
. This would be useful in combination with the currently added one, to let the runtime assist in selecting a configuration for max HW occupancy. This is not super useful in most kernel launch configurations, but for small ones that are not the hot-path wheresycl::nd_range
is specified explicitly and manual fine-tuning is not required, it is a useful feature. However, I am not adding this yet.Also, this is not much of a list with one addition at this point, so I am removing it.
Thank you for questioning this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to be a little more detailed about what counts as dynamic work-group local memory. I assume this is the sum of the sizes of all local accessors, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dynamically allocated (SYCL) local memory, for which the size is know only at runtime and can change between kernel submissions, so yes, you are right. I will elaborate in a little more detail to make it clear.
I can see how it is not clear the way I phrased it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that
max_num_work_group_occupancy_per_cu
returns a recommended work-group size? If that is the case, can we rename the query to something likerecommended_num_work_groups
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a recommendation for the maximum number of work-groups (or Cuda blocks) of specified block size etc. that will theoretically execute concurrently on the compute unit (Cuda SM) to achieve maximum occupancy.
I think I like the naming you suggest and it does sound to me like a recommendation. What do you think would be a good name based on my description? (I am sold based on the fact the original I came up with sounds a little weird.)
Thank you, @gmlueck !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it to
recommended_num_work_groups
. Initially, I wanted to indicate that this is not per-device semantics but per compute unit (or whatever this maps to in the HW, i.e. SM for Cuda, EU for Intel Level-Zero or CU for AMD HIP), hence why I had the_per_cu
in the name. However, the extension docs describe the semantics, so I think that's okay now.