One thing about Matlab that’s good: everyone uses it. Another thing: plots are pretty useful!

One bad thing: matrix manipulations are slow. Wait, really? Matlab, as in Matrix Laboratory? A laboratory of something that doesn’t do its job well is never a good thing…

Enter MMX. It speeds up matrix operations in Matlab, multiplication in particular, and is similar to ndfun or mtimesx. Who doesn’t want fast matrix stuff in Matlab? I’m not 100% sure what MMX stands for, but Yuval Tassa wrote it, with some contributions from me, as a faster way of crunching some sweet numbers. I can let some pictures do the non-verbal talking:

aka squiggly lines

What is happening here is that many matrix manipulations can be completed at once if the matrices are stacked: N-by-M matrices, D pages deep. Two stacks of matrix pages can then be manipulated at the same time, with one computational thread per page. What is happening around dimension 36 in the plot above is that some optimization libraries like to sub-divide a single matrix so that it is handled by multiple threads. At that size, the threading overhead is terrible for performance. I would like to note that the above shows the native C compiled code as doing ridiculously well; it will pretty quickly reach the natural computational capacity of your computer (read: number of cores).
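To make the stacked-pages idea a bit more concrete, here is a minimal sketch in plain C. It is not MMX's actual source: the function names `page_mult` and `stack_mult` are made up for illustration, the inner multiply is a naive triple loop rather than a BLAS call, and the threading is just an OpenMP pragma on the page loop (which only does anything if you compile with OpenMP enabled, e.g. `-fopenmp`). Matrices are stored column-major, the way Matlab lays them out.

```c
#include <stddef.h>

/* Hypothetical helper: multiply one page, C (n-by-p) = A (n-by-m) * B (m-by-p).
   All matrices are column-major, as in Matlab. Naive loops, no BLAS. */
static void page_mult(const double *A, const double *B, double *C,
                      size_t n, size_t m, size_t p)
{
    for (size_t j = 0; j < p; ++j) {
        for (size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (size_t k = 0; k < m; ++k)
                sum += A[i + k * n] * B[k + j * m];
            C[i + j * n] = sum;
        }
    }
}

/* Hypothetical driver: multiply two stacks page by page,
   i.e. C(:,:,d) = A(:,:,d) * B(:,:,d) for every page d. */
void stack_mult(const double *A, const double *B, double *C,
                size_t n, size_t m, size_t p, size_t pages)
{
    /* Each page is independent of the others, so the loop can be split
       across threads. Without OpenMP the pragma is ignored and the loop
       simply runs serially, page after page. */
    #pragma omp parallel for
    for (long d = 0; d < (long)pages; ++d) {
        size_t page = (size_t)d;
        page_mult(A + page * n * m,   /* start of page d in A */
                  B + page * m * p,   /* start of page d in B */
                  C + page * n * p,   /* start of page d in C */
                  n, m, p);
    }
}
```

The point of the layout is that no thread ever has to coordinate with another inside a single small matrix; the work is divided at the page level, so each thread gets a whole multiply to itself.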

How do we get the speedups over Matlab code or even compiled C? BLAS. The Basic Linear Algebra Subprograms are a set of routines for common linear algebra operations, and they are like a nerdy way of getting more power from your engine: highly optimized and constantly benchmarked to provide efficient computations. Many libraries are available, but we went with Intel’s MKL (Math Kernel Library). While MKL is usually commercial, you can test out the headers and libraries for BLAS for free. It would be awesome to see how CUDA or OpenCL does with this in the future (maybe).

In summary, using BLAS and correctly allocating software threads provides a much faster way of computing many matrix manipulations. While you may have to slightly rethink some scripts to take full advantage of MMX, the improvements are obvious. Check it out and bug me if you have questions.