[C++] Fast Matrix Multiplication of large matrix

Started by
5 comments, last by bartoshe 4 years, 1 month ago

Hi everybody!
It looks like large matrix multiplication is extremely slow and I was looking for strategy to improve it.
The classical 3 for loop is extremely slow:

C(m, n) = A(m, k) * B(k, n)
for (int i = 0; i < m; i++) {
    for (int j = 0; j < n; j++) {
        for (int p = 0; p < k; p++) {
            C(i, j) += A(i, p) * B(p, j);
        }
    }
}

This version is 2.5 times faster but 2.5 times faster is unfortunately not enough for large matrix:

float* dstPtr = output.m_data;
const float* leftPtr = m_data;
for (size_t i = 0; i < m_rows; ++i) {
    for (size_t j = 0; j < other.m_cols; ++j) {
        const float* rightPtr = other.m_data + j;
        float sum = leftPtr[0] * rightPtr[0];
        for (size_t n = 1; n < m_cols; ++n) {
            rightPtr += other.m_cols;
            sum += leftPtr[n] * rightPtr[0];
        }
        *dstPtr++ = sum;
    }
    leftPtr += m_cols;
}

Any experience or idea about this problem?
Thanks a lot!

Advertisement

Multiplying matrices fast might be the most studied problem in high-performance computing. There are lots of linear-algebra libraries out there, many of which implement an interface called BLAS. One of the fastest options is Nvidia's cuBLAS, which runs on the GPU.

I use Eigen which is the fastest apparently for CPU based library.
It's mostly to learn when I saw such a big difference in performance using custom matrix class.
Transpose the right matrix is actually a big help on performance!
Adding SSE is another help!
These two adds increase performance, it's 2 times faster than the previous 2 times faster result!
I'm thinking what more can be done there since Eigen still beats this result.

You can process a few (4 or 8) rows at a time, to take advantage of SIMD parallelism.

Also, you can use multiple threads.

Beyond that, there are fast matrix multiplication algorithms, but I don't think those are practical.

Read the “What every programmer should know about memory” article by Ulrich Drepper. It explains in detail how accessing memory works in a computer at electrical level and up, what performance problems you run into, and what you can do as a programmer.

@Alundra What I have read that checking zeros and not apply the multiplication solves the problem.

This topic is closed to new replies.

Advertisement