Next: Basic Linear Algebra Subroutines
Up: Inner-loop operations
Previous: timer()
Write subroutines, with prototypes identical to mult(), which perform
the following operations:
- divide(): works just like mult (doing n
operations) but,
after initializing x to 1,does n
divisions of x by 1.000001. - multsub(): performs the same operation as mult, but where multsub()
makes a subroutine call
sub(x,1.000001)
to perform the computation. This measures the time needed to make the
subroutine calls. To prevent the compiler from automatically in-lining
the subroutine, you should put sub() in a separate file. - multif(): within the same loop as for mult(), test whether
x<1.00001. If so, multiply by 1.000001; otherwise multiply by 0.99.
Note that you must return the value of x: If you forget to return x,
many compilers are smart enough to realize that you don't need the
loop, and they'll skip it entirely!
- multal(): works just like mult, but, within the loop, makes a
call to allocate/deallocate memory
p=dvector(0,1);
free_dvector(p,0,1);
This gives a feel for the cost of making calls to the operating
system. - multadd(): works like mult(), but performs multiply-add
operations instead,
x+=x*0.000001;
Note that each of these counts as two FLOP (one add and one multiply). - cmat1(): loops through all n
n elements of the matrix M1
and copies them into matrix M3 with a loop which steps through memory
sequentially, and repeats this operation n times:
for (k=0; k<n; k++)
for (i=0; i<n; i++)
for (j=0; j<n; j++)
M3[i][j]=M1[i][j];
Note that this loop will
run a lot slower than mult(), which does a number of multiplies
equal to the number of copies here. This gives an idea of the
cost of accessing the main memory. - cmat2(): The same as cmat1(), but now with the order of the
loops changed. This gives an idea of the cost of a cache-miss,
equivalently the cache-line length (measured in number of double's).
Note that, because some part of your matrices fit in cache, you'll
have to run large problems to see the true cost of a cache miss.
Finally, in main, uncomment the timer() calls to your new routines as
well as the section of print statements labeled ``Print out time
ratios.'' Running your code for various sizes, you should find that
most operations proceed at a constant rate, except for the cmat
operations. The speed of these depends on the size of your matrices
compared to the size of the cache memory!
Next: Basic Linear Algebra Subroutines
Up: Inner-loop operations
Previous: timer()
Tomas Arias
Mon Apr 2 13:24:52 EDT 2001