BLAS Performance Comparisons
Discussions center on benchmarking a new matrix library against optimized BLAS implementations such as OpenBLAS, MKL, and Eigen, debating whether it actually outperforms them on various hardware, and questioning the value of alternatives to established linear algebra libraries.
Sample Comments
Why exactly is this better than atlas / blas / any library using it, e.g. Eigen?
BLAS is a very well optimized library. A lot of it is in Fortran, which can be faster than C, and it is very heavily used in scientific computing. BLAS implementations also have routines that were hand-tuned in assembly. It's not magic, but the amount of work that has gone into it is not something you would want to replicate.
Are OpenBLAS and MKL not well optimized lol? They literally compared against OpenBLAS/MKL and posted the results in the article. As someone already mentioned, this implementation is faster than MKL even on an Intel Xeon with 96 cores. Maybe you missed the point, but the purpose of the article was to show HOW to implement matmul with NumPy-like performance without Fortran/assembly code, NOT how to write a BLIS-competitive library. So the article and the code look good to me.
It would help to see performance benchmarks against blas or armadillo, etc.
I always hear this "fast matrix operations" argument from MATLAB users, but don't they both use BLAS? The difference can only be marginal.
Changed my matrix library to a BLAS binding for machine learning: 150x speedup.
Uses BLAS but no mention of cuBLAS to speed things up? Does that mean the linear algebra wasn't a big enough component to be worth optimizing?
Eagerly awaiting matrix libraries written in pure Python that outperform BLAS.
You don't just call DGEMM from vendor BLAS?
Don't all high performance math libraries have the option of LAPACK interfaces?
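Several comments above hinge on the same point: a "just call DGEMM" binding beats a hand-rolled loop by orders of magnitude. A minimal sketch of that gap, using NumPy (whose `@` operator dispatches to the GEMM routine of whatever BLAS NumPy was built against, e.g. OpenBLAS or MKL) versus a textbook triple loop; the matrix size and timing harness here are illustrative choices, not from the discussion:

```python
import time
import numpy as np

def naive_matmul(A, B):
    """Textbook O(n^3) matrix multiply: no blocking, no vectorization."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += A[i, p] * B[p, j]
            C[i, j] = s
    return C

rng = np.random.default_rng(0)
n = 96  # kept small so the pure-Python loop finishes quickly
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t0 = time.perf_counter()
C_naive = naive_matmul(A, B)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
C_blas = A @ B  # BLAS dgemm under the hood
t_blas = time.perf_counter() - t0

# Both paths compute the same product; only the speed differs.
assert np.allclose(C_naive, C_blas)
print(f"naive: {t_naive:.3f}s  BLAS: {t_blas:.5f}s")
```

Even at this tiny size the BLAS path is typically orders of magnitude faster, which is the gap the "150x speedup" comment above is describing; the exact ratio depends on the BLAS build and hardware.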