The most dramatic improvements (often by many orders of magnitude) in computational science almost always come from the development of better algorithms, such as the modified Broyden algorithm above.
Important gains (one or two orders of magnitude) may also be achieved by making proper use of the CPU and its architecture. Among such optimizations, the greatest gains often come from eliminating unnecessary operations in those parts of the software that are executed many times over. These operations are most frequently buried in the deepest loop in the code, the so-called ``inner-most'' loop.
As we learned in lab, subroutine calls are among the most wasteful operations in the inner-most loop and should be avoided at all costs.
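To make the point about subroutine calls concrete, here is a minimal sketch that is not taken from our code. The names square, sum_squares_call and sum_squares_inline are hypothetical and stand in for any small routine invoked once per grid point. Both versions compute the same sum; the second simply writes the body of the routine out in place, so the inner-most loop contains no call.

```c
#include <stddef.h>

/* Hypothetical stand-in for any small routine called once per grid point. */
double square(double x)
{
    return x * x;
}

/* Version 1: one subroutine call on every pass through the inner-most loop. */
double sum_squares_call(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += square(x[i]);
    return sum;
}

/* Version 2: the body of square() written out in place, so the loop makes no call. */
double sum_squares_inline(const double *x, size_t n)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += x[i] * x[i];
    return sum;
}
```

An optimizing compiler may already inline a routine this small when its definition is visible, so the difference between the two versions tends to matter most when the routine lives in another source file or is reached through a function pointer.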
To ascertain the impact of the improvements we are about to make, please ``#define Itmx 10'' in your code and time it. (For us, at this stage, broyden took 10 sec to run the initial five plus the ten Broyden iterations.)
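If you prefer to time a section of the code from inside the program rather than from the shell, one possible approach is the standard C clock() function. The sketch below is only an illustration: cpu_seconds and busy_work are hypothetical names, and busy_work is a placeholder for whatever routine you actually want to time.

```c
#include <time.h>

/* Placeholder workload; replace with the routine you want to time. */
void busy_work(void)
{
    volatile double s = 0.0;   /* volatile keeps the loop from being optimized away */
    long i;
    for (i = 0; i < 1000000L; i++)
        s += 1.0e-6 * (double)i;
}

/* Hypothetical helper: CPU seconds consumed by one call to work(). */
double cpu_seconds(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Note that clock() reports processor time used by the program, not wall-clock time, so the number can differ from what a stopwatch would show on a busy machine.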
Running our initial code through the GNU profiler (compiling with the -pg -O3 flags, running the code, and then typing ``gprof''), we found the data below.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 56.54      1.08     1.08       25    43.20    76.40  getg
 27.75      1.61     0.53     2419     0.22     0.27  schint
  6.81      1.74     0.13       25     5.20     5.25  getphi
  3.66      1.81     0.07  4737925     0.00     0.00  rk4p480
  1.57      1.84     0.03 14063775     0.00     0.00  derivs_Schrodinger
  1.57      1.87     0.03   200050     0.00     0.00  excp
  1.05      1.89     0.02   100025     0.00     0.00  exc
  0.52      1.90     0.01 14221421     0.00     0.00  dvector
  0.52      1.91     0.01 14221421     0.00     0.00  free_dvector
  0.00      1.91     0.00   150000     0.00     0.00  derivs_Poisson
  0.00      1.91     0.00     1880     0.00     0.27  func_Schrodinger
  0.00      1.91     0.00      389     0.00     0.27  func_SchrodingerNodes
  0.00      1.91     0.00      150     0.00     0.00  simpint
  0.00      1.91     0.00      125     0.00     0.83  rtbisp480
  0.00      1.91     0.00       75     0.00     0.54  getPsi
  0.00      1.91     0.00       75     0.00     6.72  zriddrp480
  0.00      1.91     0.00       30     0.00     0.00  dmatrix
  0.00      1.91     0.00       30     0.00     0.00  free_dmatrix
  0.00      1.91     0.00       25     0.00     0.00  d3tensor
  0.00      1.91     0.00       25     0.00     0.00  free_d3tensor
  0.00      1.91     0.00       10     0.00     0.00  lubksbp480
  0.00      1.91     0.00       10     0.00     0.00  ludcmpp480
  0.00      1.91     0.00        2     0.00     0.00  free_ivector
  0.00      1.91     0.00        2     0.00     0.00  ivector
  0.00      1.91     0.00        1     0.00  1910.00  main
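In the profile above, dvector and free_dvector are each called 14,221,421 times, which suggests that temporary vectors are being allocated and freed inside a routine that runs very often. As a hedged illustration of one way such calls could be removed (not necessarily the change made in our code), the sketch below contrasts a step routine that allocates its own scratch array on every call with one that reuses an array supplied by the caller. The names step_alloc and step_scratch are hypothetical, and plain malloc/free stand in for dvector/free_dvector.

```c
#include <stdlib.h>

/* Version 1: allocates and frees a scratch array on every call. */
double step_alloc(const double *y, size_t n, double h)
{
    double *tmp, sum = 0.0;
    size_t i;
    if (n == 0)
        return 0.0;
    tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        abort();                      /* out of memory: give up */
    for (i = 0; i < n; i++)
        tmp[i] = y[i] + h * y[i];     /* toy update, for illustration only */
    for (i = 0; i < n; i++)
        sum += tmp[i];
    free(tmp);
    return sum;
}

/* Version 2: the caller allocates tmp once and passes it in on every call. */
double step_scratch(const double *y, size_t n, double h, double *tmp)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        tmp[i] = y[i] + h * y[i];     /* same toy update as above */
    for (i = 0; i < n; i++)
        sum += tmp[i];
    return sum;
}
```

With the second form, a driver that takes millions of steps performs a single allocation before its loop and a single free after it, instead of one pair per step.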