Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Xnn f32 reduce window #7192

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

LokeshReddyOVS-MCW
Copy link
Contributor

No description provided.

@LokeshReddyOVS-MCW LokeshReddyOVS-MCW marked this pull request as ready for review September 26, 2024 12:19
Copy link
Collaborator

@dsharlet dsharlet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only looked at microkernels so far


int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1];
int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ?
0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to convert the size to a float, which could drop some precision, which is a major problem for size/shape values. I think this needs to be done with integer arithmetic.

int output_size = (padded_size < (window_dimensions - 1) * window_dilations + 1) ?
0 : FLOOR((padded_size - (window_dimensions - 1) * window_dilations - 1) / (float)window_strides) + 1;

// replaced modulo and division by multiplicative scaled inverse
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an approximation, isn't this a problem for something like this? Similar to float comment above: approximation for the "values" of buffers is usually OK, but not for the "shape" parameters.

float sum = init_value;
int window_start = i * window_strides;
int pad_high_boundary = CEIL((((padding[0] - window_start) * inverse_win_dilation) >> 32));
int pad_low_boundary = CEIL((((padded_size - padding[1] - window_start) * inverse_win_dilation) >> 32));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the argument to CEIL here an integer? I think you need something more like: (((padded_size - padding[1] - window_start) * inverse_win_dilation + ((1 << 32) - 1)) >> 32)

But of course, this assumes it's OK to approximate this division in the first place (see above comment).

#include "xnnpack/common.h"
#include "xnnpack/reduce.h"

#define CEILING_POS(X) ((X-(int)(X)) > 0 ? (int)(X+1) : (int)(X))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These macros should be helper functions in math.h. I think we already have some of the functions you need, e.g. divide_round_up (we shouldn't divide first and then try to compute the ceiling of the result, that approach assumes approximating the result with float or fixed point arithmetic).

#define FLOORING_POS(X) (int)(X)
#define FLOORING_NEG(X) ((X-(int)(X)) > 0 ? (int)(X-1) : (int)(X))
#define FLOOR(X) ( ((X) > 0) ? FLOORING_POS(X) : FLOORING_NEG(X) )
#define MAX(X,Y) (X > Y ? X : Y)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use functions in math.h for these

const float* input,
float init_value,
int* padding,
int base_dilation,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be int, i.e. signed? If we can assume they are positive, a lot of the necessary arithmetic gets simpler (e.g. divide_round_up is pretty simple for size_t, not so much for int).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that you should use size_t, and if you need a signed value, use a more specific type, e.g. int32_t.

# LICENSE file in the root directory of this source tree.

# SCALAR
- name: xnn_f32_rwdsum_ukernel_1p1x__scalar_c1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add new yaml files, and instead try to use this new system: https://github.com/google/XNNPACK/blob/master/doc/microkernel-enumerators.md, e.g. https://github.com/google/XNNPACK/blob/master/src/f16-vabs/f16-vabs.h.

We are working on converting existing yamls to such headers.

assert(input != NULL);
assert(output != NULL);

int padded_size = rows + MAX((rows - 1),0) * (base_dilation - 1) + padding[0] + padding[1];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixing of int and size_t here is concerning, I think there's a lot of risk of integer overflow, especially in the approximated division below which will multiply values that should be size_t with very large fixed point reciprocals (2^32 if dilation/stride is 1).

Comment on lines +237 to +247
[xnnpack_benchmark(
name = "%s_bench" % kernel,
srcs = [
"%s.cc" % kernel.replace("_", "-"),
"rsum-benchmark.h",
"rw-benchmark.h",
],
deps = MICROKERNEL_BENCHMARK_DEPS,
) for kernel in [
"f32_rwsum",
]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a single target it does not need to be in a list expansion.

Comment on lines +10 to +20
#include "bench/rw-benchmark.h"
#include "bench/rsum-benchmark.h"
#include "bench/utils.h"
#include <benchmark/benchmark.h>

#include "xnnpack.h"
#include "xnnpack/aligned-allocator.h"
#include "xnnpack/common.h"
#include "xnnpack/reduce.h"
#include "xnnpack/microfnptr.h"
#include "xnnpack/microparams-init.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort the includes correctly, i.e. system headers first, then #include <...>, then the #include "...", each group sorted alphabetically.

Comment on lines +15 to +23
#include "bench/rw-benchmark.h"
#include "bench/utils.h"
#include <benchmark/benchmark.h>

#include "xnnpack.h"
#include "xnnpack/aligned-allocator.h"
#include "xnnpack/common.h"
#include "xnnpack/reduce.h"
#include "xnnpack/microfnptr.h"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please sort as described above.

state.counters["cpufreq"] = cpu_frequency;
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove extra space.


namespace {

void f32_rwsum(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions are not templated, so this should actually be it's own build target, with a .cc and a .h file.

const float* input,
float init_value,
int* padding,
int base_dilation,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that you should use size_t, and if you need a signed value, use a more specific type, e.g. int32_t.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants