[C++] Fast Matrix Multiplication of large matrix

Math and Physics Programming C++ Optimization LargeMatrix

Started by Alundra April 04, 2020 08:26 PM

5 comments, last by bartoshe 4 years, 1 month ago

2,326

Author

April 04, 2020 08:26 PM

Hi everybody!
It looks like large matrix multiplication is extremely slow and I was looking for strategy to improve it.
The classical 3 for loop is extremely slow:

C(m, n) = A(m, k) * B(k, n)

for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            C(i, j) += A(i, p) * B(p, j);
        }
    }
}

This version is 2.5 times faster but 2.5 times faster is unfortunately not enough for large matrix:

float* dstPtr = output.m_data;
const float* leftPtr = m_data;
for (size_t i = 0; i < m_rows; ++i) {
    for (size_t j = 0; j < other.m_cols; ++j) {
        const float* rightPtr = other.m_data + j;
        float sum = leftPtr[0] * rightPtr[0];
        for (size_t n = 1; n < m_cols; ++n) {
            rightPtr += other.m_cols;
            sum += leftPtr[n] * rightPtr[0];
        }
        *dstPtr++ = sum;
    }
    leftPtr += m_cols;
}

Any experience or idea about this problem?
Thanks a lot!

alvaro

21,604

April 04, 2020 10:46 PM

Multiplying matrices fast might be the most studied problem in high-performance computing. There are lots of linear-algebra libraries out there, many of which implement an interface called BLAS. One of the fastest options is Nvidia's cuBLAS, which runs on the GPU.

Alundra

2,326

Author

April 04, 2020 11:07 PM

I use Eigen which is the fastest apparently for CPU based library.
It's mostly to learn when I saw such a big difference in performance using custom matrix class.
Transpose the right matrix is actually a big help on performance!
Adding SSE is another help!
These two adds increase performance, it's 2 times faster than the previous 2 times faster result!
I'm thinking what more can be done there since Eigen still beats this result.

alvaro

21,604

April 04, 2020 11:55 PM

You can process a few (4 or 8) rows at a time, to take advantage of SIMD parallelism.

Also, you can use multiple threads.

Beyond that, there are fast matrix multiplication algorithms, but I don't think those are practical.

Alberth

10,211

April 05, 2020 05:53 AM

Read the “What every programmer should know about memory” article by Ulrich Drepper. It explains in detail how accessing memory works in a computer at electrical level and up, what performance problems you run into, and what you can do as a programmer.

bartoshe

129

April 05, 2020 09:28 AM

@Alundra What I have read that checking zeros and not apply the multiplication solves the problem.

[C++] Fast Matrix Multiplication of large matrix

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

[C++] Fast Matrix Multiplication of large matrix

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines