CamCASP/ToDo/diskIO
Disk I/O
We currently spend a lot of time in disk I/O. Three years ago the situation was not so bad: we used striping (RAID 0) and CPUs were not as fast, so the CPU-time/IO-time ratio was large. Now, with multicore CPUs, much faster memory access, larger caches and so on, this ratio is getting smaller. I should find a concrete example that compares the old and new, but until I do that, here are some numbers for a calculation done on the new quad-core machine at UCL:
This is the pyridine...Ne daTZ/MC scan of <math>E^{(1)}_{\rm elst}</math>, <math>E^{(2)}_{\rm ind,pol}</math> and <math>E^{(2)}_{\rm disp,pol}</math>. 2370 configurations. So routines were called many times over.
Timing Report
====================================================
Subroutine             Number of Calls  Time (seconds)
main_parser                          1        62202.78
matrix_write                     82132        14067.99  <----***
matmult_types                    70179        41490.18
df_parser                            2         2414.72
df_monomer                       21357         2435.48
df_int_for_df                    28475         8678.77
make_integrals_for_d             52175         8658.32
make_T_AO_mono                       2         1385.31
lineq_solver_lu                     23         3422.38
matrix_read                     187396        17104.03  <---***
lineq_lu_iter                       25         3182.26
matvec_types                      9482         3451.95
energy_scan                          1        59787.24
df_int                           40294        26046.73
make_oneeint                      9480         2970.26
DIaux                             7110         2959.49
calculate_e2ind                   2370        18534.72
init_DF_algorithm                    1         1713.89
densfit_prop                        21         4777.32
init_prop                           21         1403.91
make_twoeint                      7114        23033.80
make_D_S_D                        7114        16333.19
make_j_matrix                     2370         6694.01
calculate_e2disp                  2370        41134.33
calculate_e2disp_UC               2370        12592.81
n3_algorithm                     23700        25477.80
====================================================
This is a partial report. I've trimmed out the irrelevant information.
Look at the amount of time spent in disk I/O (matrix_read plus matrix_write): 31172 sec, or about 50% of the 62203 sec recorded for main_parser. This is lousy and must be avoided.
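For reference, the arithmetic behind that figure can be checked with a few lines of Python. This is only a sketch: it assumes the main_parser timer covers the whole run and can therefore stand in for the total time, which the report itself does not state.

```python
# Times in seconds, copied from the timing report.
matrix_write = 14067.99
matrix_read = 17104.03
main_parser = 62202.78  # assumption: this timer spans the whole run

io_time = matrix_write + matrix_read
io_fraction = io_time / main_parser

print(f"disk I/O: {io_time:.2f} s, {100 * io_fraction:.1f}% of main_parser")
```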
So what do we do?
- Identify objects that are needed very often and keep them in memory.
- Make a large-memory route possible, so that nothing is written out to disk.
- Parallelize the code.
The first is the easiest to implement and should work very well. What would these objects be?
- DF objects: D, S matrices. All the integrals need these.
- Density-fitted Hessians.
- MOs
None of these are large (at most <math>M \times v^2</math>, and we could choose to keep only the <math>M \times ov</math> and <math>M \times o^2</math>, or smaller, objects in memory), but they are needed very often. The simplest way of preventing them from being written out repeatedly is to define a flag like D%AlwaysInMemory in these object types that will prevent any of the matrix routines from releasing them from memory or writing them to disk.
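A minimal sketch of the flag idea, written in Python rather than in the project's own code so that it is easy to run. Everything here is hypothetical: the `Matrix` and `MatrixStore` classes, the `release`/`fetch` names and the pickle scratch files are stand-ins for illustration, not the actual matrix routines. Only the idea of an always-in-memory flag that the release routine checks comes from the note above.

```python
import os
import pickle
import tempfile


class Matrix:
    """Toy stand-in for a matrix object (hypothetical, not the real type)."""

    def __init__(self, name, data, always_in_memory=False):
        self.name = name
        self.data = data                      # None once released to disk
        self.always_in_memory = always_in_memory


class MatrixStore:
    """Hypothetical store that writes matrices to scratch files on release."""

    def __init__(self, scratch_dir):
        self.scratch_dir = scratch_dir
        self.writes = 0
        self.reads = 0

    def _path(self, m):
        return os.path.join(self.scratch_dir, m.name + ".pkl")

    def release(self, m):
        """Write m to disk and free its memory, unless it is pinned."""
        if m.always_in_memory or m.data is None:
            return                            # pinned or already on disk: no I/O
        with open(self._path(m), "wb") as f:
            pickle.dump(m.data, f)
        m.data = None
        self.writes += 1

    def fetch(self, m):
        """Return the data for m, reading it back from disk if needed."""
        if m.data is None:
            with open(self._path(m), "rb") as f:
                m.data = pickle.load(f)
            self.reads += 1
        return m.data


def run_demo():
    with tempfile.TemporaryDirectory() as scratch:
        store = MatrixStore(scratch)
        d = Matrix("D", [[1.0, 0.0], [0.0, 1.0]], always_in_memory=True)
        tmp = Matrix("tmp", [[2.0, 3.0]])
        # Use and release both objects three times, as a scan would.
        for _ in range(3):
            for m in (d, tmp):
                store.fetch(m)
                store.release(m)
        return store.writes, store.reads, d.data is not None


writes, reads, d_resident = run_demo()
print(writes, reads, d_resident)
```

In this toy run only the unpinned `tmp` object generates any disk traffic; the pinned `D` is never written or read back, which is the saving the flag is meant to buy.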
I think this will work, and will not require a large amount of my time.
I just thought of a more interesting solution: Use the above method for keeping some objects in memory all the time, but allow high-level modules to decide on which of these are to be kept in memory. This would be more flexible as the kinds of objects that are often used could differ in different calculations.
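One possible shape for this, again as a hedged Python sketch rather than a design for the real code: a small registry that high-level drivers fill in for the duration of a calculation and that the matrix routines would consult before releasing anything. The `PinRegistry` class, the driver functions and the object names are all hypothetical, and the choice of which objects each driver pins is made up for illustration.

```python
from contextlib import contextmanager

# Hypothetical object names, for illustration only.
CANDIDATES = ("D", "S", "MO_A", "MO_B", "hessian_A", "hessian_B")


class PinRegistry:
    """Hypothetical registry of object names that must stay in memory."""

    def __init__(self):
        self._pinned = set()

    def is_pinned(self, name):
        return name in self._pinned

    @contextmanager
    def keep_in_memory(self, *names):
        """Pin the given names for the duration of a with-block."""
        added = set(names) - self._pinned     # only undo what this block added
        self._pinned |= added
        try:
            yield self
        finally:
            self._pinned -= added


def dispersion_scan(reg):
    # Hypothetical driver that chooses to pin the DF matrices and Hessians.
    with reg.keep_in_memory("D", "S", "hessian_A", "hessian_B"):
        return sorted(n for n in CANDIDATES if reg.is_pinned(n))


def induction_scan(reg):
    # Hypothetical driver that chooses to pin the DF matrices and the MOs.
    with reg.keep_in_memory("D", "S", "MO_A", "MO_B"):
        return sorted(n for n in CANDIDATES if reg.is_pinned(n))


registry = PinRegistry()
pinned_disp = dispersion_scan(registry)
pinned_ind = induction_scan(registry)
pinned_after = sorted(n for n in CANDIDATES if registry.is_pinned(n))
print(pinned_disp, pinned_ind, pinned_after)
```

The pins are dropped when each driver finishes, so one calculation's choices do not leak into the next.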
I suspect that a lot of the matrix writing is done in df_int. Is this true?