Difference between revisions of "CamCASP/Programming/6"

Revision as of 17:57, 20 April 2010

CamCASP => Programming => Direct Access Files

CamCASP uses objects (matrices and vectors) that can be stored as direct-access files. This allows the code to free memory on demand. At present, file I/O is handled by record_handler.F90. This module is now about 6 years old, is unnecessarily complicated, and is probably slow (no tests were ever done). Also, recent versions of gfortran seem unable to handle the module correctly, with the result that we cannot access files larger than about 512 MB. I'm not sure what's causing this. It could well be a genuine bug in the module, or elsewhere, but it is proving rather hard to find, mainly because of the complexity of record_handler.F90. So I feel the time has come for us to retire this module.

record_handler.F90

This module was written at a time when I needed to use integrals from the SAPT code and (rightly) wanted to hide the complex file structure from the higher-level modules. In this structure, each matrix is stored in a direct-access file with a fixed record length, in a linked-list-like manner. The first record contains the table of contents that indexes the records written. This is followed by the data. Matrix rows can be written across multiple records. Furthermore, a list of indices is written along with the data, so sparse matrices can be stored. There's more to it, but this will do. The module can do a lot, but at the expense of complexity and CPU time.

Much of the functionality is no longer used. We no longer use integrals from SAPT, and we do not use sparsity. Also, there is no fundamental reason to use records of fixed length; instead, we can very well use record lengths that depend on the size of the matrix. I am doing this now with records of 50000 double precision reals (400000 bytes each) in a file of about 19 GB. This is likely the upper bound on the files we would access; for larger systems, we'd have to go parallel. If we settle for direct-access files whose record length varies with the matrix, we no longer need a table of contents and could dramatically simplify the structure of the file.
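
To make the simplified layout concrete, here is a minimal sketch of one direct-access file per matrix, with the record length chosen from the matrix dimension and one column per record, so that the record number is the only index needed. This is not existing CamCASP code: the module name matrix_file, its subroutines, the file name, and the one-column-per-record choice are all assumptions made for illustration.

```fortran
! Minimal sketch (not CamCASP code): one direct-access file per matrix.
! The record length is set by the matrix dimension, one column per record,
! so no table of contents is needed. All names here are hypothetical.
module matrix_file
  use, intrinsic :: iso_fortran_env, only: dp => real64
  implicit none
  private
  public :: dp, open_matrix_file, write_column, read_column, close_matrix_file
contains

  subroutine open_matrix_file(unit, filename, nrows)
    integer, intent(out) :: unit
    character(len=*), intent(in) :: filename
    integer, intent(in) :: nrows
    real(dp) :: buffer(nrows)   ! template for one record
    integer :: reclen
    buffer = 0.0_dp
    ! Ask the compiler for the record length in its own RECL units
    inquire(iolength=reclen) buffer
    open(newunit=unit, file=filename, access='direct', form='unformatted', &
         recl=reclen, status='replace', action='readwrite')
  end subroutine open_matrix_file

  subroutine write_column(unit, j, column)
    integer, intent(in) :: unit, j
    real(dp), intent(in) :: column(:)
    write(unit, rec=j) column   ! column j lives in record j
  end subroutine write_column

  subroutine read_column(unit, j, column)
    integer, intent(in) :: unit, j
    real(dp), intent(out) :: column(:)
    read(unit, rec=j) column
  end subroutine read_column

  subroutine close_matrix_file(unit)
    integer, intent(in) :: unit
    close(unit, status='delete')   ! the demo removes its scratch file
  end subroutine close_matrix_file
end module matrix_file

program matrix_file_demo
  use matrix_file
  implicit none
  integer, parameter :: n = 4, m = 3
  real(dp) :: a(n, m), col(n)
  integer :: unit, i, j

  do j = 1, m
    do i = 1, n
      a(i, j) = real(i + 10*j, dp)
    end do
  end do

  call open_matrix_file(unit, 'matrix_demo.da', n)
  ! Write the columns in reverse order: direct access does not care
  do j = m, 1, -1
    call write_column(unit, j, a(:, j))
  end do

  call read_column(unit, 2, col)
  if (any(col /= a(:, 2))) error stop 'column 2 did not round-trip'
  print *, 'column 2 read back correctly'
  call close_matrix_file(unit)
end program matrix_file_demo
```

The sketch uses inquire(iolength=...) rather than a hard-coded byte count because the units of recl are compiler dependent: gfortran counts bytes, while Intel Fortran counts 4-byte words unless told otherwise. At 400000 bytes per record, a file of about 19 GB holds on the order of 50000 records, which is well within the range of a default integer record number.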

User-accessible subroutines of record_handler.F90 and their functionality:

init_record_handler

write_record

read_record

leave_record_handler

query_record_handler