CamCASP/ToDo/diskIO
Disk I/O
We currently spend a lot of time in disk I/O. Three years ago the situation was not so bad: we used striping (RAID 0) and CPUs were slower, so the ratio of CPU time to I/O time was large. Now, with multicore CPUs, much faster memory access and larger caches, this ratio is getting smaller. I should find a concrete example that compares the old and the new, but until I do, here are some numbers for a calculation done on the new quad-core machine at UCL:
This is the pyridine...Ne daTZ/MC scan of <math>E^{(1)}_{\rm elst}</math>, <math>E^{(2)}_{\rm ind,pol}</math> and <math>E^{(2)}_{\rm disp,pol}</math> over 2370 configurations, so most routines were called many times over.
Timing Report
===============
Subroutine              Number of Calls    Time (seconds)
main_parser                           1          62202.78
matrix_write                      82132          14067.99  <----***
matmult_types                     70179          41490.18
df_parser                             2           2414.72
df_monomer                        21357           2435.48
df_int_for_df                     28475           8678.77
make_integrals_for_d              52175           8658.32
make_T_AO_mono                        2           1385.31
lineq_solver_lu                      23           3422.38
matrix_read                      187396          17104.03  <----***
lineq_lu_iter                        25           3182.26
matvec_types                       9482           3451.95
energy_scan                           1          59787.24
df_int                            40294          26046.73
make_oneeint                       9480           2970.26
DIaux                              7110           2959.49
calculate_e2ind                    2370          18534.72
init_DF_algorithm                     1           1713.89
densfit_prop                         21           4777.32
init_prop                            21           1403.91
make_twoeint                       7114          23033.80
make_D_S_D                         7114          16333.19
make_j_matrix                      2370           6694.01
calculate_e2disp                   2370          41134.33
calculate_e2disp_UC                2370          12592.81
n3_algorithm                      23700          25477.80
====================================
This is a partial report. I've trimmed out the irrelevant information.
Look at the amount of time spent in disk I/O (matrix_write plus matrix_read): 14067.99 + 17104.03 = 31172 sec, or about 50% of the 62203 sec total reported for main_parser. This is lousy and must be avoided.
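The arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: the numbers are copied from the timing report, and it assumes that the main_parser time is an inclusive total for the whole run, which the report does not state explicitly.

```python
# (number of calls, seconds) copied from the timing report.
timings = {
    "main_parser": (1, 62202.78),
    "matrix_write": (82132, 14067.99),
    "matrix_read": (187396, 17104.03),
}

# Assumption: main_parser is the outermost routine, so its time is the total.
total_seconds = timings["main_parser"][1]

# Time spent in the two disk I/O routines, and its share of the total.
io_seconds = timings["matrix_write"][1] + timings["matrix_read"][1]
io_fraction = io_seconds / total_seconds

# Average cost of a single call to each I/O routine.
per_call = {
    name: secs / calls
    for name, (calls, secs) in timings.items()
    if name.startswith("matrix_")
}

print(f"I/O time: {io_seconds:.2f} s ({100 * io_fraction:.1f}% of total)")
for name, secs in per_call.items():
    print(f"{name}: {1000 * secs:.1f} ms per call")
```

The per-call averages come out at roughly 0.17 s per write and 0.09 s per read, so the cost comes from the sheer number of calls as much as from any single transfer.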
So what do we do?
- Identify objects that are needed very often and keep them in memory.
- Make a large-memory route possible, so that nothing is written out to disk.
- Parallelize the code.
The first is the easiest to implement and should work very well. What would these objects be?
- DF objects: D, S matrices. All the integrals need these.
- Density-fitted Hessians.
- MOs
None of these are large: the biggest is at most <math> M \times v^2</math>, and we could choose to keep only the <math>M \times ov</math>, <math>M \times o^2</math> or smaller objects in memory. But they are needed very often. The simplest way of preventing them from being written out repeatedly is to define a flag such as D%AlwaysInMemory in these object types, which stops any of the matrix routines from releasing them from memory or writing them to disk.
I think this will work, and will not require a large amount of my time.
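The effect of such a flag can be illustrated with a small toy model. This is a Python sketch of the idea only, not CamCASP code: the classes `Matrix` and `MatrixStore`, the methods `release` and `fetch`, and the helper `run_scan` are all hypothetical names, and the attribute `always_in_memory` merely stands in for the proposed D%AlwaysInMemory flag.

```python
import os
import pickle
import tempfile


class Matrix:
    """Toy stand-in for a matrix object that lives in memory or on disk."""

    def __init__(self, name, data, always_in_memory=False):
        self.name = name
        self.data = data  # set to None once the matrix is released to disk
        self.always_in_memory = always_in_memory  # analogue of D%AlwaysInMemory


class MatrixStore:
    """Hypothetical manager that counts disk writes and reads."""

    def __init__(self, scratch_dir):
        self.scratch_dir = scratch_dir
        self.n_writes = 0
        self.n_reads = 0

    def _path(self, m):
        return os.path.join(self.scratch_dir, m.name + ".pkl")

    def release(self, m):
        """Write m to disk and free its memory, unless it is pinned."""
        if m.always_in_memory or m.data is None:
            return  # pinned (or already on disk): nothing to do
        with open(self._path(m), "wb") as fh:
            pickle.dump(m.data, fh)
        m.data = None
        self.n_writes += 1

    def fetch(self, m):
        """Return the data of m, reading it back from disk if necessary."""
        if m.data is None:
            with open(self._path(m), "rb") as fh:
                m.data = pickle.load(fh)
            self.n_reads += 1
        return m.data


def run_scan(pinned, n_configs=5):
    """Use one matrix once per configuration; return (writes, reads)."""
    with tempfile.TemporaryDirectory() as scratch:
        store = MatrixStore(scratch)
        d = Matrix("D", [[1.0, 0.0], [0.0, 1.0]], always_in_memory=pinned)
        for _ in range(n_configs):
            store.fetch(d)    # needed for this configuration
            store.release(d)  # the routine tries to give the memory back
        return store.n_writes, store.n_reads


print("unpinned (writes, reads):", run_scan(pinned=False))
print("pinned   (writes, reads):", run_scan(pinned=True))
```

In this model the unpinned matrix is written out after every use and read back before the next one, while the pinned matrix never touches the disk. The only change needed in the release routine is the early return on the flag, which is why the approach should be cheap to implement.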