CamCASP/Programming/6

From CUC3
Revision as of 16:58, 20 April 2010 by import>Am592 (→‎init_record_handler)

CamCASP => Programming => Direct Access Files

CamCASP uses objects (matrices and vectors) that can be stored as direct-access files, which allows the code to free memory on demand. At present, file I/O is handled by record_handler.F90. This module is now about 6 years old, is unnecessarily complicated, and is probably slow (its performance was never tested). Also, recent versions of gfortran seem unable to handle the module correctly, with the result that we cannot access files larger than about 512MB. I'm not sure what is causing this. It could well be a genuine bug in the module or elsewhere, but it is proving hard to find, mainly because of the complexity of record_handler.F90. So I feel the time has come to retire this module.

record_handler.F90

This module was constructed at a time when I needed to use integrals from the SAPT code and (rightly) wanted to hide the complex file structure from the higher-level modules. In this structure, each matrix is stored in a direct-access file with fixed record length, organised like a linked list. The first record contains the table of contents, which indexes the records written; the data follow. Matrix rows can be written across multiple records. Furthermore, a list of indices is written along with the data, so sparse matrices can be stored. There's more to it, but this will do. The module can do a lot, but at the expense of complexity and CPU time.

Much of the functionality is no longer used. We no longer use integrals from SAPT, and we do not use sparsity. Also, there is no fundamental reason to use records of fixed length; instead we can use a record length that depends on the size of the matrix. I am doing this now with records of 50000 double precision reals (400000 bytes each) in a file of 19GB. This is likely the upper bound on the file sizes we would access: for larger systems, we'd have to go parallel. If we settle for variable record length direct-access files, we no longer need a table of contents and could dramatically simplify the structure of the file.
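
As a rough illustration of the simpler layout, here is a minimal sketch that stores a matrix with one column per record, with the record length taken from the column size. The module and routine names (matrix_da_sketch, write_matrix_da, read_column_da) are made up for this example and are not part of CamCASP. The one-column-per-record layout is also just an assumption. The units of recl are compiler dependent, so the sketch asks for the length with inquire(iolength=...) rather than computing bytes by hand.

```fortran
module matrix_da_sketch
  ! Hypothetical helper, NOT part of CamCASP: one matrix per direct-access
  ! file, one column per record, record length set by the column size.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  subroutine write_matrix_da(filename, a)
    character(len=*), intent(in) :: filename
    real(dp), intent(in) :: a(:,:)
    integer :: unit, reclen, j
    ! Ask the compiler how long a record holding one column must be.
    inquire(iolength=reclen) a(:,1)
    open(newunit=unit, file=filename, access='direct', form='unformatted', &
         recl=reclen, status='replace', action='write')
    ! No table of contents: column j simply lives in record j.
    do j = 1, size(a, 2)
      write(unit, rec=j) a(:,j)
    end do
    close(unit)
  end subroutine write_matrix_da

  subroutine read_column_da(filename, nrows, j, col)
    character(len=*), intent(in) :: filename
    integer, intent(in) :: nrows, j
    real(dp), intent(out) :: col(nrows)
    integer :: unit, reclen
    ! The reader must know the number of rows to reproduce the record length.
    inquire(iolength=reclen) col
    open(newunit=unit, file=filename, access='direct', form='unformatted', &
         recl=reclen, status='old', action='read')
    read(unit, rec=j) col
    close(unit)
  end subroutine read_column_da

end module matrix_da_sketch
```

The price of dropping the table of contents is that the matrix dimensions have to be known to the caller (or stored elsewhere), since the file itself no longer describes its contents.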

User-accessible subroutines of record_handler.F90 and functionality:

init_record_handler

  !INPUT: 
  !    filename = character(10) file name
  !    filestatus = character(3): OLD/old or NEW/new or OVR/ovr or ADD/add
  !                 NEW: This is a new file to be opened
  !                 OLD: This is an old file already opened and closed in the
  !                      current run.
  !                 OVR: This file has been written and closed, overwrite it.
  !                 ADD: This file has been created in an earlier run. Add it to
  !                      record_handler.
  !    info      =  0 : normal exit
  !               < 0 : Error
  !    num_records = integer (optional) is set equal to the number of records
  !    printinfo = LOGICAL (optional) controls printing.
  !
  !filestatus tells the record_handler whether the file has already been created
  !(old or OLD) or whether it is to be written into for the first time (new or
  !NEW). In the latter case, any existing file by the same name is over-written.
  !
  !Table of Contents (TOC):
  !------------------------
  !Unlike the case with SAPT where only a small (less than 90) number of types
  !of integrals needed to be written to disk, the routines in this module are
  !designed to handle a large (really large!) number of types of integrals. Each
  !type needs its own entry in the TOC. This means that the TOC can be quite
  !large and may well span many records itself. This, of course, means that we
  !must have some mechanism for expanding the TOC to arbitrary length. This
  !will be done by using the linked-list idea which is described next.
  !  Record no.     Contents
  !     1           TOC(1) 
  !     2           integrals
  !     ...         ...
  !    8189         integrals
  !    8190         TOC(2)
  !    8191         integrals
  !     ...         ...
  !    etc
  !
  !There are reclen = 32760 bytes in a record. Thus, each TOC can store 8190
  !integer(4) numbers. 8189 of these will point to records containing integrals.
  !The last, the 8190^th, will contain the record number of the next TOC. In
  !this way the TOC can be expanded arbitrarily. The user need not worry about
  !this.
  !
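
The linked-list TOC described in the comment above can be sketched as follows. This is not the code in record_handler.F90, only an illustration of the idea under stated assumptions: the names (toc_chain_sketch, open_toc_file, find_record) are hypothetical, TOC(1) is taken to be in record 1, and a zero in the last slot is assumed to mean "no further TOC". Only the numbers (32760 bytes per record, 8190 integer(4) slots, 8189 pointers plus one link) come from the comment.

```fortran
module toc_chain_sketch
  ! Hypothetical illustration of a chained table of contents (TOC);
  ! NOT the actual record_handler.F90 implementation.
  use iso_fortran_env, only: int32
  implicit none
  integer, parameter :: reclen_bytes = 32760        ! from the comment above
  integer, parameter :: toc_len = reclen_bytes / 4  ! 8190 integer(4) slots
  integer, parameter :: n_ptr = toc_len - 1         ! 8189 data pointers per TOC
contains

  subroutine open_toc_file(filename, status, unit)
    character(len=*), intent(in) :: filename, status
    integer, intent(out) :: unit
    integer(int32) :: toc(toc_len)
    integer :: rl
    ! recl units are compiler dependent, so derive them from a TOC-sized array.
    inquire(iolength=rl) toc
    open(newunit=unit, file=filename, access='direct', form='unformatted', &
         recl=rl, status=status)
  end subroutine open_toc_file

  function find_record(unit, k) result(recno)
    ! Returns the record number held in the k-th TOC entry, following the
    ! chain of TOC records; returns 0 if k < 1 or the chain ends first.
    integer, intent(in) :: unit, k
    integer :: recno
    integer(int32) :: toc(toc_len)
    integer :: toc_rec, remaining
    recno = 0
    if (k < 1) return
    toc_rec = 1          ! assumption: TOC(1) lives in record 1
    remaining = k
    do
      read(unit, rec=toc_rec) toc
      if (remaining <= n_ptr) then
        recno = toc(remaining)
        return
      end if
      remaining = remaining - n_ptr
      toc_rec = toc(toc_len)       ! last slot: record number of the next TOC
      if (toc_rec == 0) return     ! assumption: 0 marks the end of the chain
    end do
  end function find_record

end module toc_chain_sketch
```

Every lookup beyond the first 8189 entries costs one extra record read per TOC in the chain, which is part of the overhead that a variable record length layout would avoid.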

write_record

read_record

leave_record_handler

query_record_handler