CamCASP/ToDo/diskIO

From CUC3


Disk I/O

We currently spend a lot of time in disk I/O. Three years ago the situation was not so bad: we used striping (RAID 0) and CPUs were not as fast, so the CPU-time/IO-time ratio was large. Now, with multicore CPUs, much faster memory access, larger caches and so on, this ratio is getting smaller. I should find a concrete example that compares the old and the new, but until I do, here are some numbers for a calculation done on the new quad-core machine at UCL:

This is the pyridine...Ne daTZ/MC scan of <math>E^{(1)}_{\rm elst}</math>, <math>E^{(2)}_{\rm ind,pol}</math> and <math>E^{(2)}_{\rm disp,pol}</math>. The scan covers 2370 configurations, so each routine was called many times over.

  Timing Report
 ===============
 Subroutine          Number of Calls   Time (seconds)
 main_parser                  1            62202.78
 matrix_write             82132            14067.99  <----***
 matmult_types            70179            41490.18
 df_parser                    2             2414.72
 df_monomer               21357             2435.48
 df_int_for_df            28475             8678.77
 make_integrals_for_d     52175             8658.32
 make_T_AO_mono               2             1385.31
 lineq_solver_lu             23             3422.38
 matrix_read             187396            17104.03  <---***
 lineq_lu_iter               25             3182.26

 matvec_types              9482             3451.95

 energy_scan                  1            59787.24

 df_int                   40294            26046.73
 make_oneeint              9480             2970.26

 DIaux                     7110             2959.49
 calculate_e2ind           2370            18534.72
 init_DF_algorithm            1             1713.89
 densfit_prop                21             4777.32
 init_prop                   21             1403.91
 make_twoeint              7114            23033.80
 make_D_S_D                7114            16333.19

 make_j_matrix             2370             6694.01
 calculate_e2disp          2370            41134.33

 calculate_e2disp_UC       2370            12592.81
 n3_algorithm             23700            25477.80
 ====================================

This is a partial report. I've trimmed out the irrelevant information.

Look at the amount of time spent in disk I/O (matrix_read plus matrix_write): 31172 sec, or about 50% of the total time reported for main_parser. This is lousy and must be avoided.
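
As a check on the arithmetic, here is a small Python sketch (Python is used only for illustration; CamCASP itself is Fortran). It sums the two I/O rows of the timing report and divides by the main_parser time, which is assumed here to stand for the total run time.

```python
# Times (seconds) copied from the timing report above.
timings = {
    "main_parser": 62202.78,   # assumption: taken as the total run time
    "matrix_write": 14067.99,
    "matrix_read": 17104.03,
}


def io_fraction(t):
    """Return (I/O seconds, I/O share of the total) for a timing dict."""
    io_seconds = t["matrix_write"] + t["matrix_read"]
    return io_seconds, io_seconds / t["main_parser"]


io_seconds, share = io_fraction(timings)
print(f"disk I/O: {io_seconds:.2f} s = {100 * share:.1f}% of total")
```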

So what do we do?

  1. Identify objects that are needed very often and keep them in memory.
  2. Make a large-memory route possible, so that nothing is written out to disk.
  3. Parallelize the code.

The first is the easiest to implement and should work very well. What would these objects be?

  1. DF objects: D, S matrices. All the integrals need these.
  2. Density-fitted Hessians.
  3. MOs

None of these are large: at most <math> M \times v^2</math>, and we could choose to keep only the <math>M \times ov</math> and <math>M \times o^2</math> (or smaller) objects in memory. But they are needed very often. The simplest way to stop them being written out repeatedly is to define a flag such as D%AlwaysInMemory in these object types, which prevents any of the matrix routines from releasing them from memory or writing them to disk.
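
The flag idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not CamCASP code: the `Matrix` and `MatrixStore` classes, the pickle scratch files and the `release`/`fetch` names are all hypothetical stand-ins for the real matrix routines. Only the `always_in_memory` attribute mirrors the proposed D%AlwaysInMemory flag.

```python
import os
import pickle
import tempfile


class Matrix:
    """Toy stand-in for a CamCASP matrix object (hypothetical)."""

    def __init__(self, name, data, always_in_memory=False):
        self.name = name
        self.data = data                          # None once released from memory
        self.always_in_memory = always_in_memory  # mirrors D%AlwaysInMemory


class MatrixStore:
    """Hypothetical matrix layer that counts its own disk traffic."""

    def __init__(self, scratch_dir):
        self.scratch_dir = scratch_dir
        self.writes = 0
        self.reads = 0

    def _path(self, m):
        return os.path.join(self.scratch_dir, m.name + ".pkl")

    def release(self, m):
        """Write the matrix to disk and free it, unless it is pinned."""
        if m.always_in_memory or m.data is None:
            return                      # pinned, or already on disk: no I/O
        with open(self._path(m), "wb") as f:
            pickle.dump(m.data, f)
        self.writes += 1
        m.data = None

    def fetch(self, m):
        """Return the data, reading from disk only if it was released."""
        if m.data is None:
            with open(self._path(m), "rb") as f:
                m.data = pickle.load(f)
            self.reads += 1
        return m.data


with tempfile.TemporaryDirectory() as scratch:
    store = MatrixStore(scratch)
    D = Matrix("D", [[1.0, 0.0], [0.0, 1.0]], always_in_memory=True)
    T = Matrix("T", [[2.0, 3.0]])       # an ordinary, unpinned matrix
    for _ in range(100):                # mimic repeated use during a scan
        for m in (D, T):
            store.fetch(m)
            store.release(m)
    pinned_ok = D.data is not None      # D never left memory
    print(store.writes, store.reads)
```

In this toy loop all disk traffic comes from the unpinned matrix. The pinned one never reaches the write path, which is the behaviour the flag is meant to give.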

I think this will work, and will not require a large amount of my time.


I just thought of a more interesting solution: use the above method for keeping some objects in memory all the time, but let the high-level modules decide which objects are kept there. This would be more flexible, as the objects that are used most often could differ from one kind of calculation to another.
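
One possible shape for this, again as a Python sketch rather than actual CamCASP code: a high-level routine pins the objects it wants for the duration of its work and restores the previous state afterwards. The `keep_in_memory` helper, the `Matrix` class and the two example selections are hypothetical.

```python
from contextlib import contextmanager


class Matrix:
    """Toy stand-in for a matrix object with a pin flag (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.always_in_memory = False


@contextmanager
def keep_in_memory(*matrices):
    """Pin the given matrices for the duration of a block, then restore."""
    previous = [m.always_in_memory for m in matrices]
    for m in matrices:
        m.always_in_memory = True
    try:
        yield
    finally:
        for m, flag in zip(matrices, previous):
            m.always_in_memory = flag


def pinned_names(matrices):
    """List the names of the matrices that are currently pinned."""
    return [m.name for m in matrices if m.always_in_memory]


D, S, hessian, mos = (Matrix(n) for n in ("D", "S", "hessian", "MOs"))
everything = [D, S, hessian, mos]

# One hypothetical high-level module pins the objects it reuses most.
with keep_in_memory(D, S, hessian):
    first_choice = pinned_names(everything)

# Another hypothetical module makes a different choice.
with keep_in_memory(D, S, mos):
    second_choice = pinned_names(everything)

after = pinned_names(everything)        # nothing stays pinned afterwards
print(first_choice, second_choice, after)
```

Restoring the earlier flag values on exit means that nested or successive modules do not leave objects pinned by accident.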


I suspect that a lot of the matrix writing is done in df_int. Is this true?
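
One way to check would be to count matrix_write calls per calling routine. The Python sketch below shows only the bookkeeping: the `caller` argument is an assumed addition to the interface, and the call pattern is made up purely to exercise the counter. It is not measured data.

```python
from collections import Counter

write_calls = Counter()   # caller name -> number of matrix_write calls


def matrix_write(caller, name):
    """Stand-in for the real routine: it only records who asked for the write."""
    write_calls[caller] += 1


# Made-up call pattern, only to exercise the counter (not real timings).
for _ in range(3):
    matrix_write("df_int", "integrals")
matrix_write("make_D_S_D", "D")

for caller, n in write_calls.most_common():
    print(f"{caller:<12} {n}")
```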