next up previous contents
Next: Basic Linear Algebra Subroutines Up: Inner-loop operations Previous: timer()

Other inner-loop operations

Write subroutines, with prototypes identical to mult(), which perform the following operations:

  1. divide(): works just like mult (doing n tex2html_wrap_inline582 operations) but, after initializing x to 1,does n tex2html_wrap_inline584 divisions of x by 1.000001.
  2. multsub(): performs the same operation as mult, but where multsub() makes a subroutine call
    sub(x,1.000001)
    to perform the computation. This measures the time needed to make the subroutine calls. To prevent the compiler from automatically in-lining the subroutine, you should put sub() in a separate file.
  3. multif(): within the same loop as for mult(), test whether x<1.00001. If so, multiply by 1.000001; otherwise multiply by 0.99. Note that you must return the value of x: If you forget to return x, many compilers are smart enough to realize that you don't need the loop, and they'll skip it entirely!
  4. multal(): works just like mult, but, within the loop, makes a call to allocate/deallocate memory
    p=dvector(0,1);
    free_dvector(p,0,1);
    This gives a feel for the cost of making calls to the operating system.
  5. multadd(): works like mult(), but performs multiply-add operations instead,
      x+=x*0.000001;
    Note that each of these counts as two FLOP (one add and one multiply).
  6. cmat1(): loops through all n tex2html_wrap_inline572 n elements of the matrix M1 and copies them into matrix M3 with a loop which steps through memory sequentially, and repeats this operation n times:
    for (k=0; k<n; k++)
      for (i=0; i<n; i++)
        for (j=0; j<n; j++)
          M3[i][j]=M1[i][j];
    Note that this loop will run a lot slower than mult(), which does a number of multiplies equal to the number of copies here. This gives an idea of the cost of accessing the main memory.
  7. cmat2(): The same as cmat1(), but now with the order of the loops changed. This gives an idea of the cost of a cache-miss, equivalently the cache-line length (measured in number of double's). Note that, because some part of your matrices fit in cache, you'll have to run large problems to see the true cost of a cache miss.

Finally, in main, uncomment the timer() calls to your new routines as well as the section of print statements labeled ``Print out time ratios.'' Running your code for various sizes, you should find that most operations proceed at a constant rate, except for the cmat operations. The speed of these depends on the size of your matrices compared to the size of the cache memory!


next up previous contents
Next: Basic Linear Algebra Subroutines Up: Inner-loop operations Previous: timer()

Tomas Arias
Mon Apr 2 13:24:52 EDT 2001