NECI Re-write

From CUC3

Revision as of 17:59, 11 December 2007

Issues

  • Code has been hacked together over time, with an emphasis on just getting it to work. Rarely have we gone back to tidy up.
  • The code base has become increasingly hard to deal with, maintain and develop.

Alex, George and James have all agreed to spend time cleaning it up. We need to apportion tasks and decide on the code structure. We will set aside time for this when Alex is in Cambridge on 19th-20th December. To make the best use of this time, we should think in advance about what changes we need to make and how.

Please sign contributions!

Proposals

  • Use the only specifier in use statements when importing modules (for me, this is non-negotiable). --james 15:21, 6 December 2007 (GMT)
  • Transfer include files to modules? How far do we want to go on this? --james 15:21, 6 December 2007 (GMT)

Tasks

  • Rewrite the test suite (I'll do this). --james 15:21, 6 December 2007 (GMT)
  • Create a branch for this work on svn (ditto). --james 15:24, 6 December 2007 (GMT)
  • Rewrite the main structure (neci.F). This will be painful, but it is a good opportunity to document or excise old code and to modularise (I suppose I have the knowledge for that, if not the inclination). --alex 16:36, 11 December 2007 (GMT)
  • Add 'dated' input default sets -- we've talked about this but never quite got round to it --alex 16:36, 11 December 2007 (GMT)
  • Modularize the input. By this I mean that it should be possible to add options to the input without having to edit include files. --alex 16:36, 11 December 2007 (GMT)
  • Modularize the system-specific data. I think it would be useful to have some sort of global 'system' object which contains the details specifying the system (e.g. for Hubbard, the size and the t and U values; other data for CPMD; for read-in integrals, this could just be symmetry information). --alex 16:36, 11 December 2007 (GMT)
  • Modularize the basis. Again this will include lists of basis functions, energies and symmetries. Probably also the interfaces to access the 1- and 2-electron integrals. --alex 16:36, 11 December 2007 (GMT)
  • Modularize the many-electron system. By this I'm thinking of how we deal with determinants. The initial code just used lists of electrons in the determinants. As the number of electrons increases, this will mean copying large amounts of essentially redundant data (very few electrons actually change in a particular process). I've already hacked some of the code (the star, I think) to deal with 'excitation-based' determinants, i.e. excitations with respect to the Hartree-Fock determinant. This sort of thing should be transparent to the actual users (i.e. subroutines) of the determinants. --alex 16:36, 11 December 2007 (GMT)
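One possible reading of the 'dated' input default sets task is sketched below. It is in C++, a language already floated on this page, and is for illustration only. Each default set is stored with the date it was introduced. An input that names a date gets the newest set introduced on or before that date, so old inputs keep reproducing old behaviour. The `Defaults` fields, the YYYYMMDD integer keys and `defaultsAsOf` are all hypothetical; none of them is existing NECI code.

```cpp
#include <cassert>
#include <iterator>
#include <map>

// Hypothetical default values; the field names are illustrative, not NECI's.
struct Defaults {
    double tau;    // e.g. a time-step default
    int maxExcit;  // e.g. a maximum excitation level
};

// Each default set keyed by the date it was introduced, as YYYYMMDD,
// so that std::map keeps the sets in chronological order.
using DefaultHistory = std::map<int, Defaults>;

// Return the newest default set introduced on or before `date`.
const Defaults& defaultsAsOf(const DefaultHistory& history, int date) {
    auto it = history.upper_bound(date);  // first set introduced after `date`
    assert(it != history.begin() && "no default set existed at that date");
    return std::prev(it)->second;
}
```

With this layout, changing a default means adding a new dated entry rather than editing an existing one.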
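The input-modularisation task could work roughly as follows, assuming a central registry. Each module registers the keywords it understands, and the parser dispatches every input line to the matching handler. Adding an option then touches only the module that owns it. This is a C++ sketch of the idea only; `InputRegistry`, its methods and any keyword names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical central registry: each module registers the keywords it
// understands, so adding an option means touching only that module.
class InputRegistry {
public:
    // A handler reads its own arguments from the rest of the input line.
    using Handler = std::function<void(std::istringstream&)>;

    void add(const std::string& keyword, Handler handler) {
        bool inserted = handlers_.emplace(keyword, std::move(handler)).second;
        assert(inserted && "keyword registered twice");
        (void)inserted;  // silence unused-variable warnings when NDEBUG is set
    }

    // Dispatch one line of the form "KEYWORD args...".
    // Returns false if no module has registered the keyword.
    bool parseLine(const std::string& line) const {
        std::istringstream in(line);
        std::string keyword;
        if (!(in >> keyword)) return true;  // blank line: nothing to do
        auto it = handlers_.find(keyword);
        if (it == handlers_.end()) return false;
        it->second(in);
        return true;
    }

private:
    std::map<std::string, Handler> handlers_;
};
```

Unknown keywords are reported back to the caller instead of being silently ignored, so typos in input files can be caught.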
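A hypothetical shape for the global 'system' object from the system-specific data task is a C++17 `std::variant` holding exactly one system description. The Hubbard fields follow the task (size, t and U). The CPMD alternative is left empty because the task does not say what it holds, and the read-in case stores only per-orbital symmetry labels. All names here are illustrative assumptions.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical system descriptions; field names are illustrative only.
struct HubbardSystem {
    int nSites;  // lattice size
    double t;    // hopping parameter
    double U;    // on-site repulsion
};

struct CpmdSystem {
    // CPMD-specific data would go here; left empty in this sketch.
};

struct ReadInSystem {
    std::vector<int> orbitalSymmetries;  // symmetry label of each read-in orbital
};

// Exactly one of the alternatives describes the current system.
using System = std::variant<HubbardSystem, CpmdSystem, ReadInSystem>;

std::string systemName(const System& sys) {
    if (std::holds_alternative<HubbardSystem>(sys)) return "Hubbard";
    if (std::holds_alternative<CpmdSystem>(sys)) return "CPMD";
    return "read-in integrals";
}

// Number of spatial orbitals, where the system description determines it.
int nOrbitals(const System& sys) {
    if (const auto* h = std::get_if<HubbardSystem>(&sys))
        return h->nSites;  // single-band Hubbard: one spatial orbital per site
    if (const auto* r = std::get_if<ReadInSystem>(&sys))
        return static_cast<int>(r->orbitalSymmetries.size());
    assert(false && "orbital count not stored for this system type in the sketch");
    return 0;
}
```

Routines that need system details can then take this single object as an argument, rather than reading variables from include files.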
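For the basis task, one option is an abstract interface exposing basis-function energies, symmetries and the 1- and 2-electron integrals, with one concrete implementation per system type. This C++ sketch assumes that design. `Basis`, `HubbardRingBasis` and their methods are hypothetical names.

The toy implementation is a 1D periodic Hubbard ring in the site basis. It has hopping -t between nearest neighbours and only the on-site integral <ii|ii> = U. The site "energies" and symmetry labels are zero placeholders.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-function data: orbital energy and symmetry label.
struct BasisFunction {
    double energy;
    int symmetry;
};

// Hypothetical abstract basis: callers see energies, symmetries and integrals
// without knowing whether the integrals were read in, computed or come from CPMD.
class Basis {
public:
    explicit Basis(std::vector<BasisFunction> fns) : fns_(std::move(fns)) {}
    virtual ~Basis() = default;

    std::size_t size() const { return fns_.size(); }
    const BasisFunction& basisFunction(std::size_t i) const { return fns_.at(i); }

    virtual double oneElectron(int i, int j) const = 0;                // <i|h|j>
    virtual double twoElectron(int i, int j, int k, int l) const = 0;  // <ij|kl>

private:
    std::vector<BasisFunction> fns_;
};

// Toy implementation: a 1D periodic Hubbard ring in the site basis.
class HubbardRingBasis : public Basis {
public:
    HubbardRingBasis(int nSites, double t, double U)
        : Basis(std::vector<BasisFunction>(nSites, BasisFunction{0.0, 0})),  // placeholders
          n_(nSites), t_(t), U_(U) {
        assert(nSites >= 3 && "this sketch assumes a ring of at least three sites");
    }

    // Hopping -t between nearest neighbours on the ring, zero otherwise.
    double oneElectron(int i, int j) const override {
        int d = ((i - j) % n_ + n_) % n_;
        return (d == 1 || d == n_ - 1) ? -t_ : 0.0;
    }

    // On-site repulsion: only <ii|ii> is non-zero.
    double twoElectron(int i, int j, int k, int l) const override {
        return (i == j && j == k && k == l) ? U_ : 0.0;
    }

private:
    int n_;
    double t_;
    double U_;
};
```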
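The 'excitation-based' determinants from the many-electron task can be sketched in C++ as follows. This is illustrative only: `ExcitedDet`, `Reference` and `expand` are hypothetical names, not the existing star code.

A determinant stores only its holes and particles relative to a shared Hartree-Fock reference. A k-fold excitation therefore costs 2k integers rather than one integer per electron. `expand` rebuilds the full orbital list for routines that still want it, which keeps the representation transparent to them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical excitation-based determinant: only the orbitals that differ
// from the shared Hartree-Fock reference are stored.
struct ExcitedDet {
    std::vector<int> holes;      // occupied in HF, empty here
    std::vector<int> particles;  // empty in HF, occupied here
    std::size_t level() const { return holes.size(); }  // 0 = HF, 1 = single, 2 = double, ...
};

// Shared reference: the orbitals occupied in the HF determinant.
struct Reference {
    std::vector<int> occupied;

    // Rebuild the full sorted orbital list, only when a caller needs it.
    std::vector<int> expand(const ExcitedDet& d) const {
        assert(d.holes.size() == d.particles.size());
        std::vector<int> orbs;
        orbs.reserve(occupied.size());
        for (int o : occupied)
            if (std::find(d.holes.begin(), d.holes.end(), o) == d.holes.end())
                orbs.push_back(o);  // keep HF orbitals that were not excited out of
        orbs.insert(orbs.end(), d.particles.begin(), d.particles.end());
        std::sort(orbs.begin(), orbs.end());
        return orbs;
    }
};
```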

Problems

  • I have been thinking about objectifying the code for some time. One problem is that FORTRAN objects (TYPEs) cannot contain variable-length data. In particular, a general Determinant type would need an array field whose length is the number of electrons, which is only known at run time. One solution (which I dislike) is to include a Fortran pointer array of variable size. This means that every time you allocate memory for a determinant (which is done quite a lot), you must also allocate a separate portion of memory for the actual list of electrons. This is messy and inefficient. Another solution is to consider another language: C++ (with templates) can handle this sort of object with ease. Since object files and linking conventions have been around for a good 30 years, I think most of the problems with linking different languages have been sorted out by now. Could we venture into cross-language programming? My experience from Q-Chem is mixed. It is written in FORTRAN, C and C++, and this seems to cause quite a bit of hassle, not with linking, but with misunderstood interfaces. Worse still, there are C++ wrappers around Fortran routines which seem to misunderstand the data-encapsulation nature of objects, and this can lead to computational overhead if handled badly. I'm undecided. --alex 16:59, 11 December 2007 (GMT)
  • Documentation. Experience shows that documentation must be added at the time the code is written. If interfaces are not well-defined when writing code (i.e. the code is very experimental, and you're not quite sure what it can/will do) then keep it out of the repository. Once code is checked into the repository it should be documented and understandable (both in the interface and the general algorithm). References to papers in the code are good. If a concept is too difficult to describe in the code, then refer to a document kept with the code (otherwise it will be lost). --alex 16:59, 11 December 2007 (GMT)
  • Documentation 2. I've been wondering how to enforce this. The best way may be to get someone else to read through the code once it is checked in (or before) and add extra comments to it, or complain about it. I'm not sure whether this is a good idea: it might mean more work, but it would ensure a well-commented codebase. --alex 16:59, 11 December 2007 (GMT)
  • Programmers' Reference. There are automatic programmers' reference generation systems which take comments from the code and turn them into a TeX reference manual. doxygen (http://www.stack.nl/~dimitri/doxygen/) is one such system. Does anyone know of it? --alex 16:59, 11 December 2007 (GMT)
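The C++ template approach raised under the first problem might look like this minimal sketch. It assumes the electron count is fixed at compile time, and `Determinant` and `DeterminantList` are hypothetical names.

The orbital list is a `std::array` stored inline, so each determinant is a single fixed-size object. A `std::vector` of determinants is one contiguous allocation, with no separate allocation per electron list.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A determinant whose orbital list is stored inline: the electron count is a
// template parameter, so every Determinant<NEl> has the same fixed size.
template <std::size_t NEl>
struct Determinant {
    std::array<int, NEl> orbitals;  // occupied spin-orbital indices

    constexpr std::size_t nElectrons() const { return NEl; }

    // Number of orbitals occupied here but not in `other` (the excitation level).
    std::size_t excitationLevel(const Determinant& other) const {
        std::size_t diff = 0;
        for (int o : orbitals) {
            bool found = false;
            for (int p : other.orbitals) found = found || (o == p);
            if (!found) ++diff;
        }
        return diff;
    }
};

// All determinants of one size live in a single contiguous buffer.
template <std::size_t NEl>
using DeterminantList = std::vector<Determinant<NEl>>;
```

The catch is that `NEl` must be a compile-time constant. A program handling arbitrary electron counts would need to instantiate the template for each count it supports. Alternatively, it could fall back to a run-time-sized layout, such as one flat array holding many determinants of equal length.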