Optimization

The most dramatic improvements (often by many orders of magnitude) in computational science almost always come from the development of better algorithms, such as the modified Broyden algorithm above.

Important gains (one or two orders of magnitude) may also be achieved by making proper use of the CPU and its architecture. Among such optimizations, the greatest gains often come from eliminating unnecessary operations in those parts of the software that are executed many times. These operations are most frequently buried in the deepest loop in the code, the so-called "inner-most" loop.

As we learned in lab, subroutine calls are among the most wasteful operations that can appear in the inner-most loop, and they should be avoided there at all costs.
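
As a rough illustration of this point, the sketch below contrasts a loop that allocates and frees a scratch vector on every pass with one that allocates it once, outside the loop. The names step, total_alloc_inside and total_alloc_hoisted are hypothetical and are not taken from our code; plain malloc/free stand in for whatever allocation routine a real program uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inner-loop kernel (not from the course code): fills a
   caller-supplied scratch vector and returns the sum of its entries. */
static double step(double *scratch, int n, double x)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        scratch[i] = x * (i + 1);
        sum += scratch[i];
    }
    return sum;
}

/* Wasteful: one malloc/free pair on every pass through the loop. */
double total_alloc_inside(int nsteps, int n)
{
    double total = 0.0;
    for (int k = 0; k < nsteps; k++) {
        double *scratch = malloc((size_t)n * sizeof *scratch);
        assert(scratch != NULL);
        total += step(scratch, n, (double)k);
        free(scratch);
    }
    return total;
}

/* Leaner: allocate the scratch vector once, outside the loop. */
double total_alloc_hoisted(int nsteps, int n)
{
    double *scratch = malloc((size_t)n * sizeof *scratch);
    assert(scratch != NULL);
    double total = 0.0;
    for (int k = 0; k < nsteps; k++)
        total += step(scratch, n, (double)k);
    free(scratch);
    return total;
}
```

Both functions return the same value; the second simply makes nsteps - 1 fewer allocation and deallocation calls. How much time this saves depends on the allocator and on how much work each pass does, so it should be measured rather than assumed.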

To ascertain the impact of the improvements we are about to make, please add "#define Itmx 10" to your code and time it. (For us, at this stage, broyden took 10 seconds to run the initial five plus the ten Broyden iterations.)
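
One possible way to take such a timing from inside a program is the standard C clock() function, sketched below. The helper cpu_seconds and the workload placeholder_work are hypothetical names introduced only for this illustration; in practice the function being timed would be the actual solver, and the value of Itmx is the one given above.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define Itmx 10   /* iteration count, as set in the text */

/* Returns the CPU time, in seconds, consumed by one call to work(). */
double cpu_seconds(void (*work)(void))
{
    assert(work != NULL);
    clock_t start = clock();
    work();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Placeholder workload standing in for the real solver (hypothetical). */
void placeholder_work(void)
{
    volatile double s = 0.0;   /* volatile keeps the loop from being optimized away */
    for (int it = 0; it < Itmx; it++)
        for (long i = 0; i < 100000L; i++)
            s += (double)i;
}
```

A call such as cpu_seconds(placeholder_work) returns the elapsed CPU time as a double. Note that clock() measures processor time rather than wall-clock time, so the figure may differ from what a stopwatch would show.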

Running our initial code through the GNU profiler (compiling with the -pg -O3 flags, running the code, and then typing "gprof"), we found the data below.

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 56.54      1.08     1.08       25    43.20    76.40  getg
 27.75      1.61     0.53     2419     0.22     0.27  schint
  6.81      1.74     0.13       25     5.20     5.25  getphi
  3.66      1.81     0.07  4737925     0.00     0.00  rk4p480
  1.57      1.84     0.03 14063775     0.00     0.00  derivs_Schrodinger
  1.57      1.87     0.03   200050     0.00     0.00  excp
  1.05      1.89     0.02   100025     0.00     0.00  exc
  0.52      1.90     0.01 14221421     0.00     0.00  dvector
  0.52      1.91     0.01 14221421     0.00     0.00  free_dvector
  0.00      1.91     0.00   150000     0.00     0.00  derivs_Poisson
  0.00      1.91     0.00     1880     0.00     0.27  func_Schrodinger
  0.00      1.91     0.00      389     0.00     0.27  func_SchrodingerNodes
  0.00      1.91     0.00      150     0.00     0.00  simpint
  0.00      1.91     0.00      125     0.00     0.83  rtbisp480
  0.00      1.91     0.00       75     0.00     0.54  getPsi
  0.00      1.91     0.00       75     0.00     6.72  zriddrp480
  0.00      1.91     0.00       30     0.00     0.00  dmatrix
  0.00      1.91     0.00       30     0.00     0.00  free_dmatrix
  0.00      1.91     0.00       25     0.00     0.00  d3tensor
  0.00      1.91     0.00       25     0.00     0.00  free_d3tensor
  0.00      1.91     0.00       10     0.00     0.00  lubksbp480
  0.00      1.91     0.00       10     0.00     0.00  ludcmpp480
  0.00      1.91     0.00        2     0.00     0.00  free_ivector
  0.00      1.91     0.00        2     0.00     0.00  ivector
  0.00      1.91     0.00        1     0.00  1910.00  main




Tomas Arias
Mon Apr 2 13:24:52 EDT 2001