Add support for LLVM Flang #4016

Merged (4 commits) on Aug 5, 2023


mmuetzel
Contributor

@mmuetzel mmuetzel commented Apr 21, 2023

LLVM Flang is now at a point where it can actually produce usable code.

This PR changes the cmake rules to use appropriate flags for that compiler.
Additionally, it installs LLVM Flang for the CLANG64 runners using MSYS2.
To continue testing CLAPACK in the CI, a new CLANG32 runner is added to the CI. It is highly likely that no Fortran compiler will be available for that build environment.

I've also noticed that none of the runners on GitHub builds with OpenMP. That probably won't work with LLVM Flang in its current form, but I haven't tested it yet.
Should some of the runners be switched to test OpenMP support?

Edit: Trying to compile with -DUSE_OPENMP=ON leads to the following error for me:

[4243/18430] Building Fortran object CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj
FAILED: CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj
C:\msys64\clang64\bin\flang.exe -ID:\repo\OpenBLAS\lapack-netlib\SRC -ID:/repo/OpenBLAS/lapack-netlib/LAPACKE/include -fopenmp -m64 -fdefault-integer-8 -fopenmp     -m64 -fdefault-integer-8  -ffixed-line-length-72 -c CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F-pp.f -o CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F.obj
error: loc("./CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/ssytrd_sb2st.F-pp.f":241:27): C:/M/B/src/flang-16.0.1.src/lib/Lower/OpenMP.cpp:815: not yet implemented: OpenMP Block construct clauses
[4260/18430] Building Fortran object CMakeFiles/LAPACK_OVERRIDES.dir/lapack-netlib/SRC/strsyl3.f.obj
ninja: build stopped: subcommand failed.

Looks like that is hitting a current limitation of LLVM Flang.

@mmuetzel
Contributor Author

mmuetzel commented Apr 21, 2023

ccache in MSYS2 has been updated to a newer version since the CI rules for that platform have been added.
That new version stores its cache at a different location where the current rules don't pick it up.

Is it OK to change that as part of this PR (even though it's not related to LLVM Flang at all)? Or would you prefer that I open a separate PR for that?

@martin-frbg
Collaborator

Thanks - the new flang still looks like a moving target to me. Can we be certain that it does not have the reentrancy issue of previous implementations (and gfortran) without a specific command-line option?
(And yes, I think it would probably be better to have the ccache rule changes in a separate patch)

@mmuetzel
Contributor Author

Thanks - the new flang still looks like a moving target to me. Can we be certain that it does not have the reentrancy issue of previous implementations (and gfortran) without a specific command-line option?

I agree. It still feels like a work in progress, see the limited support for OpenMP.

I'm not sure if such a flag is needed. I tried to search the documentation of Flang and googled for information on that. But I wasn't successful in that search.

Here is what ChatGPT has to say about that:

Hi ChatGPT! Does LLVM Flang need a specific flag to produce code that is thread safe? Similar to gfortran that needs the flag -frecursive.

Hello! LLVM Flang supports thread-safe code generation by default, without requiring any specific flags like -frecursive in gfortran.

LLVM Flang is designed to produce thread-safe code by default, which means that it should be able to handle multiple threads accessing the same data without causing data races or other synchronization issues. However, it's always a good idea to carefully review and test your code for thread safety, even if you're using a compiler that supports thread-safe code generation.

If you encounter any issues with thread safety while using LLVM Flang, you may need to review your code and consider using synchronization primitives such as mutexes, locks, or atomic operations to ensure correct behavior in a multi-threaded environment.

But I don't know how reliable that is.

Fwiw, ctest passes all tests for me locally.

@martin-frbg
Collaborator

Well, I have seen too many examples of ChatGPT simply confabulating answers to trust it over a conventional search...
Pity the LLVM website itself does not have usable release notes for LLVMFlang, only design documents that do not tell much about what is actually implemented.

@mmuetzel
Contributor Author

Well, I have seen too many examples of ChatGPT simply confabulating answers to trust it over a conventional search...

I completely agree.

Pity the LLVM website itself does not have usable release notes for LLVMFlang, only design documents that do not tell much about what is actually implemented.

In the light of that, I wouldn't mind if you prefer to not merge this change just yet.
Also it looks like ctest failed in the CI when using LLVM Flang. I tried again and it passes all tests reliably for me locally.

Do you prefer to close this PR? Or leave it hanging around and maybe revisit for LLVM 17?

@martin-frbg
Collaborator

Seems to be failing (all) the BLAS3 tests specifically. I have restarted one of the CI jobs just to see if it fails consistently. Perhaps "convert to draft" and leave it hanging around (unless having that PR branch around bothers you)?

@mmuetzel mmuetzel marked this pull request as draft April 21, 2023 16:28
@mmuetzel
Contributor Author

mmuetzel commented Apr 21, 2023

It looks like it still failed at the same 4 tests. Not sure why it is passing for me locally.
Maybe that is processor specific?
For me:

PS C:\Users\Markus> Get-CIMInstance -Class Win32_Processor | Select-Object -Property Name

Name
----
AMD Ryzen 7 5800X 8-Core Processor

For the CI, there are different runners. Most of them are Intel CPUs. E.g., in the last run:
CLANG64, int32:

Name
----
Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

CLANG64, int64:

Name
----
Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz

Perhaps "convert to draft" and leave it hanging around (unless having that PR branch around bothers you)?

Done. It doesn't bother me. I just need to remember not to force push that branch with something else.

@martin-frbg
Collaborator

Zen3 is mostly equivalent to Haswell, the Xeons would appear to be SKYLAKEX target (AVX512) so indeed different GEMM kernels - but same code for the infrastructure bits like thread management. I am away from my SKX system for the weekend, but IIRC the Windows jobs in Azure CI alternate between Haswell and SkylakeX hosts so it may be possible to repeat the test there

@h-vetinari
Contributor

not yet implemented: OpenMP Block construct clauses

See here and here. Apparently, this is a Fortran 2008 feature; this means it'll probably still take a while...

@martin-frbg
Collaborator

We could try to ifdef that based on advertised OpenMP support capability (if I did not do that already?) but given the other oddities it is probably not worth doing it now

@mmuetzel
Contributor Author

I added an additional step that will hopefully display more information if the tests failed.
Maybe, that will help to understand what causes the test failure.

@martin-frbg
Collaborator

So some of them complain about a division by zero "somewhere", and all report at least one result that is "less than half accurate" (leading to the "Error" statement output by the cmake test helper script). At that point it might make sense to print one of the SUMM files - or change one of the failing jobs to release type "debug" (upon which it will probably pass instead of providing line number information about the problem)

@mmuetzel
Contributor Author

mmuetzel commented Apr 25, 2023

I added commands that display the SUMM files for those tests.

See, e.g.: https://github.com/xianyi/OpenBLAS/actions/runs/4798792488/jobs/8537546152#step:13:141 (until that log is purged)

@mmuetzel
Contributor Author

For some reason, one of the LLVM Flang runners passed the last run. Used processor:

Name
----
Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz

Compared to one of the ones that failed previously:
https://www.intel.com/content/www/us/en/products/compare.html?productIds=192482,79930

Maybe the difference is AVX2 vs. AVX-512?

@martin-frbg
Collaborator

Quite likely - #4016 (comment) but then this would either mean the new flang is trying to use AVX512 instructions/registers or there is a latent bug in (all) the AVX512 GEMM kernels that somehow only shows up in this particular combination of compilers

@mmuetzel
Contributor Author

mmuetzel commented Apr 26, 2023

Looking for avx in the folder flang of llvm/llvm-project using grep, I don't find it mentioned once. But maybe it could still use functions from other subprojects in the monorepo that might be implemented with or lower some code to AVX-512 instructions.

Like you already suspected, the debug build passed its tests even though it was running on an Intel(R) Xeon(R) Platinum 8171M CPU (https://www.cpu-world.com/CPUs/Xeon/Intel-Xeon%208171M.html).

I don't know how to track this down any further without being able to reproduce this on a machine to which I've physical access...

@martin-frbg
Collaborator

So probably some kind of over-optimization in AVX512-enabled builds (and cmake unhelpfully defaults to -O3 for "Release" builds). We could enforce NO_AVX512 in builds involving flang-new (meaning nobody gets anything better than HASWELL target), try to strip any mention of -mcpu=skylake-avx512 and similar from FFLAGS, or try to override the -O3. (Or simply wait for LLVM17)
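
The second and third options amount to filtering the flag string. The sketch below is written in Python only to show the string handling; a real change would live in the CMake files. The helper name `sanitize_fflags` and the list of patterns are assumptions for illustration, not something taken from the OpenBLAS build system.

```python
import re

# Assumed patterns for flags that select an AVX-512 target; the real set
# would need to be checked against what the build system actually emits.
AVX512_FLAG = re.compile(r"^-m(?:cpu|arch|tune)=\S*avx512\S*$|^-mavx512\S*$")


def sanitize_fflags(fflags: str, max_opt: str = "-O2") -> str:
    """Drop AVX-512 target flags and cap -O3 at max_opt (hypothetical helper)."""
    kept = []
    for flag in fflags.split():
        if AVX512_FLAG.match(flag):
            continue  # strip e.g. -mcpu=skylake-avx512
        if flag == "-O3":
            flag = max_opt  # override the Release default of -O3
        kept.append(flag)
    return " ".join(kept)


if __name__ == "__main__":
    print(sanitize_fflags("-O3 -m64 -mcpu=skylake-avx512 -fdefault-integer-8"))
```

Unlike NO_AVX512, a filter like this would only touch the Fortran flags and leave the C kernels alone.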

@mmuetzel
Contributor Author

mmuetzel commented Apr 26, 2023

Enforcing NO_AVX512 when using LLVM Flang sounds like the easiest work-around for the time being to me.

I'm also seeing these warnings during compilation:

In file included from D:/repo/OpenBLAS/.build-clang64/interface/CMakeFiles/cblas_zgemmt.c:10:
D:/repo/OpenBLAS/interface/gemmt.c:395:9: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
        if ((m == 0) )
             ~~^~~~
D:/repo/OpenBLAS/interface/gemmt.c:395:9: note: remove extraneous parentheses around the comparison to silence this warning
        if ((m == 0) )
            ~  ^   ~
D:/repo/OpenBLAS/interface/gemmt.c:395:9: note: use '=' to turn this equality comparison into an assignment
        if ((m == 0) )
               ^~
               =
D:/repo/OpenBLAS/interface/gemmt.c:252:13: warning: variable 'm' is uninitialized when used here [-Wuninitialized]
                if (ldc < m)
                          ^
D:/repo/OpenBLAS/interface/gemmt.c:202:11: note: initialize the variable 'm' to silence this warning
        blasint m, lda, ldb;
                 ^
                  = 0

Could that be related?

@martin-frbg
Collaborator

Not related: the tests only cover the standard BLAS functions, not extensions like GEMMT. (And I'm pretty sure the warnings are harmless: just an extra set of parentheses, and some variables that are initialized in an if-else where there are no other options for the condition.)

@mmuetzel
Contributor Author

I added some changes that enforce NO_AVX512 with LLVM Flang, leaving a FIXME note that this should be checked again with the next version of LLVM.

@martin-frbg
Collaborator

Not sure if it is a good idea to move the enable_language to the top-level CMakeLists.txt (apart from it being outside the - announced - scope of the PR, there may have been a reason for putting it in a separate script)

@mmuetzel
Contributor Author

I was mostly interested to see whether the tests would pass in CI with that change.

Most projects using CMake include the steps to enable languages pretty early on.
For this case, we'd need to set NO_AVX512 before it is first used in system.cmake. If we'd like to make that dependent on the identification of the Fortran compiler in use, that identification has to happen before that.
If "Add support for LLVM Flang" should mean that the project should not only be able to be built with this compiler - but also that the tests should pass, we'll need to make one of the changes you proposed a couple of comments before. The last two commits implement one of those options.
But I agree that the block that I moved to the top-level CMakeLists.txt file unconditionally should be guarded by if (NOT NOFORTRAN) instead.

Would that be ok for you?

@martin-frbg
Collaborator

The additional guard would not change that much - I'm questioning if it is wise to change known working code that every build uses, just to accommodate an unfinished, experimental and arguably broken compiler.

@mmuetzel
Contributor Author

mmuetzel commented Apr 26, 2023

I'd argue that it is similar to what was needed for the tree-vectorizer with gcc. But much more coarse grained.
We could probably spend more time to find which over-optimization(s) exactly cause the test errors. But I agree with you that this might not be worth the effort for a compiler that doesn't feel like it is a finished product yet.

Having the option to use that compiler doesn't force people to use it. But imho it would be "kind" to the people that venture that way if they'd end up with something that kind of works.

Moving that compiler check earlier should come with little to no risk.
The check is currently done in f_check.cmake unconditionally.
This file is included in prebuild.cmake conditional on NOT NOFORTRAN.
prebuild.cmake is included in system.cmake unconditionally.
system.cmake is included in the top-level CMakeLists.txt file unconditionally.

So, the only variable we need to be careful about is NOFORTRAN.
That variable is set by the user or in f_check.cmake. It is set in f_check.cmake after the existing test for the Fortran compiler. So, it won't make a difference to move that test to earlier on. But we'd need to keep the condition on a (user-set) NOT NOFORTRAN.

@mmuetzel
Contributor Author

mmuetzel commented Apr 26, 2023

Staring at the .cmake files for longer, there might be an existing minor inconsistency if ONLY_CBLAS is set:
If there is no Fortran compiler on the build system but the user doesn't set NOFORTRAN, NOFORTRAN is set to 2 and NO_FBLAS is set to 1.
If the user would explicitly set NOFORTRAN, it wouldn't be touched and NO_FBLAS wouldn't be set.

With the changes from here, the latter would happen in both cases. (Which feels more consistent tbh.)
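
To make the two cases concrete, here is a small Python model of the existing behaviour as described above for the ONLY_CBLAS case. It only mirrors that description and is not a transcription of f_check.cmake. The function name, the use of `None` for "not set", and the third case (compiler found, nothing set by the user) are assumptions for illustration.

```python
from typing import Optional, Tuple


def current_behaviour(user_nofortran: Optional[int],
                      have_fortran: bool) -> Tuple[Optional[int], Optional[int]]:
    """Return (NOFORTRAN, NO_FBLAS) as described for the existing rules."""
    if user_nofortran is not None:
        # User set NOFORTRAN explicitly: left untouched, NO_FBLAS not set.
        return user_nofortran, None
    if not have_fortran:
        # No Fortran compiler found and nothing set by the user.
        return 2, 1
    # Assumed: a compiler was found, so neither variable is set.
    return None, None


if __name__ == "__main__":
    print(current_behaviour(None, False))
    print(current_behaviour(1, False))
```

The two printed results differ in NO_FBLAS even though no Fortran compiler is available in either case, which is the inconsistency in question.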

@martin-frbg
Collaborator

Interesting - but there is something wrong with ONLY_CBLAS in the CMAKE builds that I need to look into anyway.
Another thought - setting NO_AVX512 would have the drawback of disabling it for the C compiler, that is AVX512 kernels and all, while the passing CBLAS tests suggest that the problem is only in the flang-new compilation of the Fortran-based test cases. So massaging the FFLAGS in test/CMakeLists.txt (and potentially lapack-netlib/TESTING/CMakeLists.txt) might be a better solution

@martin-frbg
Collaborator

I had both HLFIRDialect.h.inc and HLFIRenums.h.inc missing in my build, but both should apparently be generated and my VM ran out of disk space once during the build, so it could still be a local problem.

@martin-frbg
Collaborator

Issue created as llvm/llvm-project#64268 after reproducing the problem in a clean build.

@martin-frbg
Collaborator

So with the build problem fixed, LLVM 17.0-rc1 builds, and also builds OpenBLAS - the earlier OpenMP problem is indeed gone. Interestingly, the Fortran compiler is still called flang-new in this release (candidate). In any case I have modified f_check to query the compiler for its version, instead of relying on a particular name for its binary. Probably time to elevate this PR from draft status now...
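
As a rough illustration of that idea (not the actual f_check code), the sketch below identifies the compiler from the text it prints for `--version` rather than from the name of the executable. The helper names and the assumed shape of the version banner are made up for this example.

```python
import re
import subprocess
from typing import Optional, Tuple

# Assumed banner shape, e.g. "flang-new version 17.0.0"; other releases
# may print something different, so treat this pattern as illustrative.
_FLANG_BANNER = re.compile(
    r"\bflang(?:-new)?\s+version\s+(\d+)\.(\d+)(?:\.(\d+))?")


def parse_flang_version(banner: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) if the banner looks like LLVM Flang."""
    match = _FLANG_BANNER.search(banner)
    if match is None:
        return None
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def query_compiler(executable: str) -> Optional[Tuple[int, int, int]]:
    """Run `<executable> --version` and parse whatever it prints."""
    result = subprocess.run([executable, "--version"],
                            capture_output=True, text=True, check=False)
    return parse_flang_version(result.stdout + result.stderr)
```

A caller could then branch on the major version without caring what the binary happens to be called.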

@martin-frbg martin-frbg marked this pull request as ready for review August 5, 2023 09:43
@martin-frbg
Collaborator

closing&reopening to rerun CI

@martin-frbg martin-frbg closed this Aug 5, 2023
@martin-frbg martin-frbg reopened this Aug 5, 2023
@mmuetzel
Contributor Author

mmuetzel commented Aug 5, 2023

I stripped the commits that I only did for testing (that didn't make a difference) and rebased the changes on a current head of the develop branch.

The Flang compiler for MSYS2/CLANG64 is still at version 16.0.5. So, I don't expect any difference when it comes to the failing tests on hardware that uses AVX512 instructions.
The only way to avoid that afaict would be to not use AVX512 instructions when using Flang. But you didn't like to include such a change in this PR.
I can open a follow-up PR if you prefer.

@mmuetzel
Contributor Author

mmuetzel commented Aug 5, 2023

Added another commit that should be de-activating AVX512 only on the CI-runners that build with LLVM Flang 16 (currently). Maybe, that's less intrusive than integrating that logic into the build system itself.

@mmuetzel
Contributor Author

mmuetzel commented Aug 5, 2023

With that change to the build rules in the CI, the tests passed running on "Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz".

This is ready to land imho.

(Note to future me or anyone else coming back to this when the CI logs are gone: The tests on CLANG64 still failed without -DNO_AVX512.)

@martin-frbg
Collaborator

Thanks. I saw v16 of flang-new as more of a transient nuisance, given the other issue with openmp. Maybe I should reconsider this due to the likely inertia until everybody has upgraded to 17+

@mmuetzel
Contributor Author

mmuetzel commented Aug 5, 2023

I don't have access to hardware with AVX-512 support (other than the CI here). So, I don't know how to test if Flang 17 would actually behave differently compared to Flang 16.

But imho these changes would still be helpful for anyone trying to use LLVM Flang.

@martin-frbg
Collaborator

I have built and tested with 17.0.0-rc1 on Xeon, though not specifically on Windows (idk if the OS is expected to make a difference in this regard)

@martin-frbg martin-frbg added this to the 0.3.24 milestone Aug 5, 2023
@martin-frbg martin-frbg merged commit b63e458 into OpenMathLib:develop Aug 5, 2023
55 checks passed
@mmuetzel
Contributor Author

mmuetzel commented Aug 7, 2023

Thanks. I saw v16 of flang-new as more of a transient nuisance, given the other issue with openmp. Maybe I should reconsider this due to the likely inertia until everybody has upgraded to 17+

Thank you for merging this.
PR #4180 might keep users of LLVM Flang versions prior to the upcoming version 17 from running into known issues with those (older) versions of that compiler.

@mmuetzel
Contributor Author

Fwiw, I tried to rebuild with "CI: Build with NO_AVX512 for the runners that use Flang 16." reverted now that MSYS2 updated to Flang 17.
But the ?blas3 tests still fail:
https://github.com/mmuetzel/OpenBLAS/actions/runs/6297082053/job/17093337718

Test project D:/a/OpenBLAS/OpenBLAS/build
      Start  4: sblas3
  1/4 Test  #4: sblas3 ...........................***Failed   17.75 sec
  Fortran STOP
  IEEE arithmetic exceptions signaled: DIVBYZERO INEXACT
  Error
  
      Start  7: dblas3
  2/4 Test  #7: dblas3 ...........................***Failed   17.79 sec
  Fortran STOP
  IEEE arithmetic exceptions signaled: DIVBYZERO INEXACT
  Error
  
      Start 10: cblas3
  3/4 Test #10: cblas3 ...........................***Failed   17.90 sec
  Fortran STOP
  IEEE arithmetic exceptions signaled: INEXACT INVALID
  Error
  
      Start 13: zblas3
  4/4 Test #13: zblas3 ...........................***Failed   17.80 sec
  Fortran STOP
  IEEE arithmetic exceptions signaled: INEXACT
  Error
  
  
  0% tests passed, 4 tests failed out of 4
  
  Total Test time (real) =  71.26 sec
  
  The following tests FAILED:
  	  4 - sblas3 (Failed)
  Errors while running CTest
  	  7 - dblas3 (Failed)
  	 10 - cblas3 (Failed)
  	 13 - zblas3 (Failed)

Log from these tests

  Start testing: Sep 25 11:47 Coordinated Universal Time
  ----------------------------------------------------------
  4/115 Testing: sblas3
  4/115 Test: sblas3
  Command: "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe" "-ExecutionPolicy" "Bypass" "D:/a/OpenBLAS/OpenBLAS/build/test/test_helper.ps1" "D:/a/OpenBLAS/OpenBLAS/build/test/sblat3.exe" "D:/a/OpenBLAS/OpenBLAS/test/sblat3.dat" "SBLAT3.SUMM"
  Directory: D:/a/OpenBLAS/OpenBLAS/build/test
  "sblas3" start time: Sep 25 11:47 Coordinated Universal Time
  Output:
  ----------------------------------------------------------
  Fortran STOP
  IEEE arithmetic exceptions signaled: DIVBYZERO INEXACT
  Error
  <end of output>
  Test time =  17.75 sec
  ----------------------------------------------------------
  Test Failed.
  "sblas3" end time: Sep 25 11:47 Coordinated Universal Time
  "sblas3" time elapsed: 00:00:17
  ----------------------------------------------------------
  
  7/115 Testing: dblas3
  7/115 Test: dblas3
  Command: "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe" "-ExecutionPolicy" "Bypass" "D:/a/OpenBLAS/OpenBLAS/build/test/test_helper.ps1" "D:/a/OpenBLAS/OpenBLAS/build/test/dblat3.exe" "D:/a/OpenBLAS/OpenBLAS/test/dblat3.dat" "DBLAT3.SUMM"
  Directory: D:/a/OpenBLAS/OpenBLAS/build/test
  "dblas3" start time: Sep 25 11:47 Coordinated Universal Time
  Output:
  ----------------------------------------------------------
  Fortran STOP
  IEEE arithmetic exceptions signaled: DIVBYZERO INEXACT
  Error
  <end of output>
  Test time =  17.79 sec
  ----------------------------------------------------------
  Test Failed.
  "dblas3" end time: Sep 25 11:47 Coordinated Universal Time
  "dblas3" time elapsed: 00:00:17
  ----------------------------------------------------------
  
  10/115 Testing: cblas3
  10/115 Test: cblas3
  Command: "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe" "-ExecutionPolicy" "Bypass" "D:/a/OpenBLAS/OpenBLAS/build/test/test_helper.ps1" "D:/a/OpenBLAS/OpenBLAS/build/test/cblat3.exe" "D:/a/OpenBLAS/OpenBLAS/test/cblat3.dat" "CBLAT3.SUMM"
  Directory: D:/a/OpenBLAS/OpenBLAS/build/test
  "cblas3" start time: Sep 25 11:47 Coordinated Universal Time
  Output:
  ----------------------------------------------------------
  Fortran STOP
  IEEE arithmetic exceptions signaled: INEXACT INVALID
  Error
  <end of output>
  Test time =  17.90 sec
  ----------------------------------------------------------
  Test Failed.
  "cblas3" end time: Sep 25 11:47 Coordinated Universal Time
  "cblas3" time elapsed: 00:00:17
  ----------------------------------------------------------
  
  13/115 Testing: zblas3
  13/115 Test: zblas3
  Command: "C:/Windows/System32/WindowsPowerShell/v1.0/powershell.exe" "-ExecutionPolicy" "Bypass" "D:/a/OpenBLAS/OpenBLAS/build/test/test_helper.ps1" "D:/a/OpenBLAS/OpenBLAS/build/test/zblat3.exe" "D:/a/OpenBLAS/OpenBLAS/test/zblat3.dat" "ZBLAT3.SUMM"
  Directory: D:/a/OpenBLAS/OpenBLAS/build/test
  "zblas3" start time: Sep 25 11:47 Coordinated Universal Time
  Output:
  ----------------------------------------------------------
  Fortran STOP
  IEEE arithmetic exceptions signaled: INEXACT
  Error
  <end of output>
  Test time =  17.80 sec
  ----------------------------------------------------------
  Test Failed.
  "zblas3" end time: Sep 25 11:48 Coordinated Universal Time
  "zblas3" time elapsed: 00:00:17
  ----------------------------------------------------------
  
  End testing: Sep 25 11:48 Coordinated Universal Time

@martin-frbg
Collaborator

Interesting, any idea what's in the ?BLAT3.SUMM files ? (Just to exclude problems with writing those files, or searching them for error messages)
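
For reference, a check of that kind could be as simple as the sketch below: read a SUMM file and collect the lines that report a fatal error or a failed call. This is an assumption about what such a search might look like, not the logic of the actual test_helper.ps1, and the function names are made up for the example.

```python
from pathlib import Path
from typing import List


def find_failures(summ_text: str) -> List[str]:
    """Return the lines of a ?BLAT3.SUMM report that signal a failure."""
    markers = ("FATAL ERROR", "FAILED ON CALL NUMBER")
    return [line.strip() for line in summ_text.splitlines()
            if any(marker in line for marker in markers)]


def check_summ_file(path: str) -> bool:
    """True if the file exists, is non-empty and reports no failures."""
    summ = Path(path)
    if not summ.is_file() or summ.stat().st_size == 0:
        return False  # covers the case where the file was never written
    return not find_failures(summ.read_text(errors="replace"))
```

Treating a missing or empty file as a failure separates "the test wrote nothing" from "the test wrote a clean report".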

@mmuetzel
Contributor Author

See, e.g.: https://github.com/mmuetzel/OpenBLAS/actions/runs/6301097094/job/17109175034

Content of test/SBLAT3.SUMM
   TESTS OF THE REAL             LEVEL 3 BLAS
  
   THE FOLLOWING PARAMETER VALUES WILL BE USED:
     FOR N                   0     1     2     3     7    31
     FOR ALPHA             0.0   1.0   0.7
     FOR BETA              0.0   1.0   1.3
  
   ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00
  
   RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07
  
   SGEMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  6 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 11 WAS CHANGED INCORRECTLY *******
   ******* SGEMM  FAILED ON CALL NUMBER:
     8374: SGEMM ('T','N',  2, 31,  1, 1.0, A,  2, B,  2, 0.0, C,  3).
  
   SSYMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* SSYMM  FAILED ON CALL NUMBER:
      616: SSYMM ('L','U',  2, 31, 1.0, A,  3, B,  3, 0.0, C,  3)    .
  
   STRMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
             EXPECTED RESULT   COMPUTED RESULT
         1     -0.372627          0.183522    
        THESE ARE THE RESULTS FOR COLUMN  13
   ******* STRMM  FAILED ON CALL NUMBER:
      794: STRMM ('L','U','N','U',  1, 31, 1.0, A,  2, B,  2)        .
  
   STRSM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
             EXPECTED RESULT   COMPUTED RESULT
         1      0.296703          0.269034    
         2     -0.372627         -0.526591    
         3     -0.342657         -0.214524    
         4      0.386613          0.386613    
         5     -0.529471E-01     -0.529471E-01
         6     -0.222777         -0.222777    
         7      0.306693          0.306693    
        THESE ARE THE RESULTS FOR COLUMN   1
   ******* STRSM  FAILED ON CALL NUMBER:
     2018: STRSM ('L','U','N','U',  7,  7, 1.0, A,  8, B,  8)        .
  
   SSYRK  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* SSYRK  FAILED ON CALL NUMBER:
     1354: SSYRK ('U','N',  7,  1, 1.0, A,  8, 0.0, C,  8)           .
  
   SSYR2K PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* SSYR2K FAILED ON CALL NUMBER:
     1354: SSYR2K('U','N',  7,  1, 1.0, A,  8, B,  8, 0.0, C,  8)    .
  
   END OF TESTS
Content of test/DBLAT3.SUMM
   TESTS OF THE DOUBLE PRECISION LEVEL 3 BLAS
  
   THE FOLLOWING PARAMETER VALUES WILL BE USED:
     FOR N                   0     1     2     3     7    31
     FOR ALPHA             0.0   1.0   0.7
     FOR BETA              0.0   1.0   1.3
  
   ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00
  
   RELATIVE MACHINE PRECISION IS TAKEN TO BE  2.2D-16
  
   DGEMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  6 WAS CHANGED INCORRECTLY *******
   ******* DGEMM  FAILED ON CALL NUMBER:
     7888: DGEMM ('T','N',  2,  7,  1, 1.0, A,  2, B,  2, 0.0, C,  3).
  
   DSYMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* DSYMM  FAILED ON CALL NUMBER:
      580: DSYMM ('L','U',  2,  7, 1.0, A,  3, B,  3, 0.0, C,  3)    .
  
   DTRMM  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
             EXPECTED RESULT   COMPUTED RESULT
         1     -0.462537          0.244900E-01
        THESE ARE THE RESULTS FOR COLUMN   7
   ******* DTRMM  FAILED ON CALL NUMBER:
      722: DTRMM ('L','U','N','U',  1,  7, 1.0, A,  2, B,  2)        .
  
   DTRSM  PASSED THE TESTS OF ERROR-EXITS
  
   DTRSM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)
  
   DSYRK  PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* DSYRK  FAILED ON CALL NUMBER:
     1354: DSYRK ('U','N',  7,  1, 1.0, A,  8, 0.0, C,  8)           .
  
   DSYR2K PASSED THE TESTS OF ERROR-EXITS
  
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* DSYR2K FAILED ON CALL NUMBER:
     1354: DSYR2K('U','N',  7,  1, 1.0, A,  8, B,  8, 0.0, C,  8)    .
  
   END OF TESTS
Content of test/CBLAT3.SUMM
   TESTS OF THE COMPLEX          LEVEL 3 BLAS
  
   THE FOLLOWING PARAMETER VALUES WILL BE USED:
     FOR N                   0     1     2     3     7    31
     FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
     FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  
  
   ERROR-EXITS WILL NOT BE TESTED
  
   ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00
  
   RELATIVE MACHINE PRECISION IS TAKEN TO BE  1.2E-07
  
   ******* FATAL ERROR - PARAMETER NUMBER  6 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 11 WAS CHANGED INCORRECTLY *******
   ******* CGEMM  FAILED ON CALL NUMBER:
     7861: CGEMM ('N','N',  2,  7,  1,( 1.0, 0.0), A,  3, B,  2,( 0.0, 0.0), C,  3).
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* CHEMM  FAILED ON CALL NUMBER:
      580: CHEMM ('L','U',  2,  7,( 1.0, 0.0), A,  3, B,  3,( 0.0, 0.0), C,  3)    .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* CSYMM  FAILED ON CALL NUMBER:
      580: CSYMM ('L','U',  2,  7,( 1.0, 0.0), A,  3, B,  3,( 0.0, 0.0), C,  3)    .
  
   CTRMM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)
  
   ******* FATAL ERROR - PARAMETER NUMBER  7 WAS CHANGED INCORRECTLY *******
   ******* CTRSM  FAILED ON CALL NUMBER:
     2234: CTRSM ('L','U','N','U', 31,  1,( 1.0, 0.0), A, 32, B, 32)               .
  
   ******* FATAL ERROR - COMPUTED RESULT IS LESS THAN HALF ACCURATE *******
                         EXPECTED RESULT                    COMPUTED RESULT
         1  (  -0.929071E-01,    0.00000    )  (  -0.929071E-01,  -0.100000E+11)
         2  (   0.669331E-01,   0.442557    )  (   0.669331E-01,   0.442557    )
         3  (  -0.302697    ,   0.450549    )  (  -0.302697    ,   0.450549    )
         4  (   0.269730E-01,   0.106893    )  (   0.269730E-01,   0.106893    )
         5  (   0.569431E-01,  -0.100899    )  (   0.569431E-01,  -0.100899    )
         6  (  -0.212787    ,  -0.156843    )  (  -0.212787    ,  -0.156843    )
         7  (   0.346653    ,  -0.292707    )  (   0.346653    ,  -0.292707    )
        THESE ARE THE RESULTS FOR COLUMN   1
   ******* CHERK  FAILED ON CALL NUMBER:
      911: CHERK ('L','N',  7,  1, 0.0, A,  8, 1.0, C,  8)                         .
  
   ******* FATAL ERROR - PARAMETER NUMBER  8 WAS CHANGED INCORRECTLY *******
   ******* CSYRK  FAILED ON CALL NUMBER:
      904: CSYRK ('U','N',  7,  1,( 1.0, 0.0) , A,  8,( 0.0, 0.0), C,  8)          .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* CHER2K FAILED ON CALL NUMBER:
     1120: CHER2K('U','N', 31,  1,( 1.0, 0.0), A, 32, B, 32, 0.0, C, 32)           .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* CSYR2K FAILED ON CALL NUMBER:
     1120: CSYR2K('U','N', 31,  1,( 1.0, 0.0), A, 32, B, 32,( 0.0, 0.0), C, 32)    .
  
   END OF TESTS
Content of test/ZBLAT3.SUMM
   TESTS OF THE COMPLEX*16       LEVEL 3 BLAS
  
   THE FOLLOWING PARAMETER VALUES WILL BE USED:
     FOR N                   0     1     2     3     7    31
     FOR ALPHA          ( 0.0, 0.0)  ( 1.0, 0.0)  ( 0.7,-0.9)  
     FOR BETA           ( 0.0, 0.0)  ( 1.0, 0.0)  ( 1.3,-1.1)  
  
   ERROR-EXITS WILL NOT BE TESTED
  
   ROUTINES PASS COMPUTATIONAL TESTS IF TEST RATIO IS LESS THAN   16.00
  
   RELATIVE MACHINE PRECISION IS TAKEN TO BE  2.2D-16
  
   ******* FATAL ERROR - PARAMETER NUMBER 11 WAS CHANGED INCORRECTLY *******
   ******* ZGEMM  FAILED ON CALL NUMBER:
     7861: ZGEMM ('N','N',  2,  7,  1,( 1.0, 0.0), A,  3, B,  2,( 0.0, 0.0), C,  3).
  
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* ZHEMM  FAILED ON CALL NUMBER:
      580: ZHEMM ('L','U',  2,  7,( 1.0, 0.0), A,  3, B,  3,( 0.0, 0.0), C,  3)    .
  
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* ZSYMM  FAILED ON CALL NUMBER:
      580: ZSYMM ('L','U',  2,  7,( 1.0, 0.0), A,  3, B,  3,( 0.0, 0.0), C,  3)    .
  
   ZTRMM  PASSED THE COMPUTATIONAL TESTS (  2592 CALLS)
  
   ******* FATAL ERROR - PARAMETER NUMBER  7 WAS CHANGED INCORRECTLY *******
   ******* ZTRSM  FAILED ON CALL NUMBER:
     1802: ZTRSM ('L','U','N','U',  7,  1,( 1.0, 0.0), A,  8, B,  8)               .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER  8 WAS CHANGED INCORRECTLY *******
   ******* ZHERK  FAILED ON CALL NUMBER:
      904: ZHERK ('U','N',  7,  1, 1.0, A,  8, 0.0, C,  8)                         .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER  8 WAS CHANGED INCORRECTLY *******
   ******* ZSYRK  FAILED ON CALL NUMBER:
      904: ZSYRK ('U','N',  7,  1,( 1.0, 0.0) , A,  8,( 0.0, 0.0), C,  8)          .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* ZHER2K FAILED ON CALL NUMBER:
      904: ZHER2K('U','N',  7,  1,( 1.0, 0.0), A,  8, B,  8, 0.0, C,  8)           .
  
   ******* FATAL ERROR - PARAMETER NUMBER  5 WAS CHANGED INCORRECTLY *******
   ******* FATAL ERROR - PARAMETER NUMBER 10 WAS CHANGED INCORRECTLY *******
   ******* ZSYR2K FAILED ON CALL NUMBER:
      904: ZSYR2K('U','N',  7,  1,( 1.0, 0.0), A,  8, B,  8,( 0.0, 0.0), C,  8)    .
  
   END OF TESTS
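
For long `*.SUMM` dumps like the ones above, it can help to condense the output into one line per routine. The following is only a minimal sketch: the function name `summarize` and the regular expressions are illustrative assumptions based on the message formats visible in this log, not part of the BLAS test suite or of OpenBLAS.

```python
import re
from collections import defaultdict

# Patterns modelled on the messages visible in the *.SUMM output above.
PARAM_RE = re.compile(r"PARAMETER NUMBER\s+(\d+)\s+WAS CHANGED INCORRECTLY")
FAILED_RE = re.compile(r"\*{7}\s+(\w+)\s*FAILED ON CALL NUMBER")
PASSED_RE = re.compile(r"^\s*(\w+)\s+PASSED THE COMPUTATIONAL TESTS")


def summarize(summ_text):
    """Return ({routine: [changed parameter numbers]}, [passed routines])."""
    failures = defaultdict(list)
    passed = []
    pending = []  # parameter numbers reported before the next "FAILED" line
    for line in summ_text.splitlines():
        match = PARAM_RE.search(line)
        if match:
            pending.append(int(match.group(1)))
            continue
        match = FAILED_RE.search(line)
        if match:
            # A routine may fail for another reason (e.g. accuracy); it then
            # gets an empty parameter list.
            failures[match.group(1)].extend(pending)
            pending = []
            continue
        match = PASSED_RE.match(line)
        if match:
            passed.append(match.group(1))
    return dict(failures), passed
```

Calling `summarize(open("test/ZBLAT3.SUMM").read())` would return the failing routines with the parameter numbers the tester flagged, plus the list of routines that passed.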

@martin-frbg
Collaborator

Funny - it is claiming that the BLAS kernels are trashing input-only arguments, but more likely it is simply not reading the returned values correctly. Not sure what to do about that as it only happens in BLAS3, so unlikely to be a general C/Fortran ABI problem. And it is only the "clang64-int32" build that fails (or is that the only one using flang17 ?)
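
To see which input arguments the tester believes were modified, the reported parameter numbers can be mapped onto the argument lists of the routines. This is a rough sketch that assumes the standard reference BLAS argument order; the table `BLAS3_ARGS` and the helper `parameter_name` are illustrative names, not part of OpenBLAS or the test programs.

```python
# Argument order of the complex Level 3 BLAS routines in the reference BLAS
# interface. The leading type letter (C or Z) is dropped from the key.
_MM = ["SIDE", "UPLO", "M", "N", "ALPHA", "A", "LDA", "B", "LDB", "BETA", "C", "LDC"]
_TR = ["SIDE", "UPLO", "TRANSA", "DIAG", "M", "N", "ALPHA", "A", "LDA", "B", "LDB"]
_RK = ["UPLO", "TRANS", "N", "K", "ALPHA", "A", "LDA", "BETA", "C", "LDC"]
_R2K = ["UPLO", "TRANS", "N", "K", "ALPHA", "A", "LDA", "B", "LDB", "BETA", "C", "LDC"]

BLAS3_ARGS = {
    "GEMM": ["TRANSA", "TRANSB", "M", "N", "K", "ALPHA", "A", "LDA",
             "B", "LDB", "BETA", "C", "LDC"],
    "HEMM": _MM, "SYMM": _MM,
    "TRMM": _TR, "TRSM": _TR,
    "HERK": _RK, "SYRK": _RK,
    "HER2K": _R2K, "SYR2K": _R2K,
}


def parameter_name(routine, number):
    """Map a 1-based parameter number from the test log to an argument name."""
    args = BLAS3_ARGS[routine[1:].upper()]  # strip the type prefix, e.g. "Z"
    return args[number - 1]
```

For example, `parameter_name("ZGEMM", 11)` gives `"BETA"` and `parameter_name("CTRSM", 7)` gives `"ALPHA"`. Under this mapping, the parameter numbers reported in the log excerpt above (5, 7, 8, 10, and 11) all correspond to the scalar arguments `ALPHA` or `BETA`.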

@mmuetzel
Contributor Author

mmuetzel commented Sep 25, 2023

And it is only the "clang64-int32" build that fails (or is that the only one using flang17 ?)

Both CLANG64 runners should be affected. (There is no 32-bit Flang, and the MINGW* runners are using gfortran.) But it's a bit of a lottery whether a runner with AVX512 instructions picks up the job.
I can try to re-run the CLANG64 int64 runner. Maybe an AVX512 machine will pick up the job...

Edit: It fails very similarly:
https://github.com/mmuetzel/OpenBLAS/actions/runs/6301097094/job/17114983202#step:13:141

@h-vetinari
Contributor

@mmuetzel, we're still running into the ?blas3 failures on Windows with the in-progress flang 19 (when AVX512 instructions are present). I think this would be worth an upstream issue by now (assuming one doesn't exist already?).

It seems to me from this PR that you researched this issue in more depth than I have, but I'm happy to raise the issue myself in case you're not in a position to do it.

@mmuetzel
Contributor Author

mmuetzel commented Jul 8, 2024

@h-vetinari: Feel free to raise that issue upstream.
IIRC, I wasn't really successful in tracking this issue down to specific sources or instructions. I'm also unable to reproduce the issue locally (no AVX-512 instructions here).
The only thing that I remember is that some of the GitHub-hosted runners failed at the time if OpenBLAS was configured without -DNO_AVX512=1. All other tests were dead ends. And I'm not even sure those flags propagated correctly or had an effect for flang at that time.

So, I'm not sure I could contribute anything more than you.
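
Since the failures only seem to show up on machines with AVX512, it may help to record whether a given runner exposes those instructions. The following is a small sketch that assumes a `/proc/cpuinfo`-style listing is available (as on Linux); the function names are hypothetical and nothing here is part of the OpenBLAS CI.

```python
def avx512_flags(cpuinfo_text):
    """Return the sorted AVX512 feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "flags" lines list CPU features; skip everything else.
        if sep and key.strip() == "flags":
            flags.update(f for f in value.split() if f.startswith("avx512"))
    return sorted(flags)


def read_local_flags(path="/proc/cpuinfo"):
    """Read the local CPU description. The default path is an assumption."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return avx512_flags(handle.read())
```

An empty list from `read_local_flags()` would suggest that the machine cannot reproduce the AVX512-specific failures, which matches the situation described above.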
