next up previous contents
Next: ``test_getg.c'' Up: Basic Linear Algebra Subroutines Previous: BLAS2

BLAS3

The BLAS3 routines, which involve tex2html_wrap_inline588 operations (and only n tex2html_wrap_inline584 data) are where the great gains from cache may be had. Write the routines below (same prototype as mult()), and uncomment the rest of the main program. Note that, for debugging, the main program checks your two matrix-matrix multiplications routines against each other. This is useful because block-matrix multiply is somewhat intricate and prone to coding bugs.

  1. matmat1(): standard implementation of the matrix multiplication

    displaymath592

  2. matmat3(): matrix multiplication tex2html_wrap_inline594 via blocks of size Nb tex2html_wrap_inline572 Nb, where Nb is set in a #define statement. (Nb=32 is a reasonable number.)

    Note: To simplify the exercise, you may assume that Nb divides the total matrix size evenly.

Hint: Pseudo-code for the block-matrix multiplication follows:

double b1[Nb][Nb],b2[Nb][Nb],b3[Nb][Nb]; /* Regular C arrays are faster
                                               -- use them! */

for (I=0; I<n/Nb; I++)
  for (J=0; J<n/Nb; J++) { /* Compute IJ block of M3: */
  

    -> YOUR CODE: zero out b3, an Nb x Nb array to store IJ block of M3

    for (K=0; K<n/Nb; K++) { /* Add to b3 product of IK block of M1
                                with KJ block of M2 */
      /* Copy IK block of M1 into b1, an Nb x Nb matrix */
      for (i=0; i<Nb; i++) 
        for (k=0; k<Nb; k++)
         b1[i][k]=M1[I*Nb+i][K*Nb+k];

      /* Copy KJ block of M2 into b2, an Nb x Nb matrix */
      -> YOUR CODE

      /* Do b1*b2, adding result into b3 */
      for (i=0; i<Nb; i++)
        for (j=0; j<Nb; j++)
          for (k=0; k<Nb; k++)
            b3[i][j]+=b1[i][k]*b2[k][j];
    }
  -> YOUR CODE: Copy b3 into IJ block of M3
  }
}



Tomas Arias
Mon Apr 2 13:24:52 EDT 2001