The profile above shows that dvector and free_dvector are among the most frequently called routines. These calls are particularly wasteful because each one goes through the memory allocator, which may in turn have to request memory from the operating system. Gprof also tells us that almost all of these calls come from rk4p480!
-----------------------------------------------
                0.00    0.00        4/14221421     main [1]
                0.00    0.00       10/14221421     ludcmpp480 [17]
                0.00    0.00       75/14221421     getphi [7]
                0.00    0.00      150/14221421     getPsi [11]
                0.00    0.00      150/14221421     getg [3]
                0.00    0.00     7257/14221421     schint [4]
                0.01    0.00 14213775/14221421     rk4p480 [8]
[15]     0.5    0.01    0.00 14221421              dvector [15]
-----------------------------------------------
To eliminate these calls, comment out the lines
dym=dvector(1,n);
dyt=dvector(1,n);
yt=dvector(1,n);
.
.
free_dvector(yt,1,n);
free_dvector(dyt,1,n);
free_dvector(dym,1,n);
from rk4p480(), and replace the declarations of dym, dyt and yt with
double dym[3],dyt[3],yt[3];
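Because the original code indexes these arrays from 1 to n, the fixed-size arrays need at least n+1 elements. This simple change cut our run time from 10 sec to 5 sec (after recompiling with just the -O3 flag to get an accurate timing)!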
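The change described above can be sketched as follows. This is only an illustration, not the actual rk4p480 source: the names derivs, rk4_core, rk4_heap, rk4_stack and the bound NMAX are hypothetical, plain malloc/free stand in for dvector/free_dvector, and the right-hand side is a made-up two-variable harmonic oscillator. The arrays keep the 1-based indexing of the original code, so each has n+1 elements and index 0 is unused.

```c
#include <assert.h>
#include <stdlib.h>

#define NMAX 2  /* assumed upper bound on n; adjust to the real problem size */

/* Hypothetical right-hand side (n = 2 only): harmonic oscillator, 1-indexed
   like the original code (y[1] = position, y[2] = velocity). */
static void derivs(double x, const double y[], double dydx[])
{
    (void)x;            /* this example has no explicit x dependence */
    dydx[1] = y[2];
    dydx[2] = -y[1];
}

/* One classical RK4 step; the caller supplies the three work arrays. */
static void rk4_core(const double y[], const double dydx[], int n, double x,
                     double h, double yout[],
                     double dym[], double dyt[], double yt[])
{
    double hh = 0.5 * h, h6 = h / 6.0, xh = x + hh;
    int i;

    for (i = 1; i <= n; i++) yt[i] = y[i] + hh * dydx[i];
    derivs(xh, yt, dyt);                       /* second slope */
    for (i = 1; i <= n; i++) yt[i] = y[i] + hh * dyt[i];
    derivs(xh, yt, dym);                       /* third slope */
    for (i = 1; i <= n; i++) {
        yt[i] = y[i] + h * dym[i];
        dym[i] += dyt[i];                      /* second + third slope */
    }
    derivs(x + h, yt, dyt);                    /* fourth slope */
    for (i = 1; i <= n; i++)
        yout[i] = y[i] + h6 * (dydx[i] + dyt[i] + 2.0 * dym[i]);
}

/* Before: three heap allocations and three frees on every step. */
static void rk4_heap(const double y[], const double dydx[], int n, double x,
                     double h, double yout[])
{
    /* n + 1 elements so that indices 1..n are valid */
    double *dym = malloc((size_t)(n + 1) * sizeof *dym);
    double *dyt = malloc((size_t)(n + 1) * sizeof *dyt);
    double *yt  = malloc((size_t)(n + 1) * sizeof *yt);

    if (!dym || !dyt || !yt) abort();
    rk4_core(y, dydx, n, x, h, yout, dym, dyt, yt);
    free(yt);
    free(dyt);
    free(dym);
}

/* After: fixed-size automatic arrays, no allocator calls at all. */
static void rk4_stack(const double y[], const double dydx[], int n, double x,
                      double h, double yout[])
{
    double dym[NMAX + 1], dyt[NMAX + 1], yt[NMAX + 1];

    assert(n <= NMAX);  /* guard: the fixed arrays must cover indices 1..n */
    rk4_core(y, dydx, n, x, h, yout, dym, dyt, yt);
}
```

Both wrappers run the same arithmetic, so they should produce the same yout for the same inputs; the only difference is whether the work arrays come from the allocator or from the stack. The assert is there because a fixed-size array silently overflows if n ever grows beyond the bound that was compiled in.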