Difference between revisions of "NECI Re-write"

From CUC3
import>Jss43

Revision as of 14:06, 12 December 2007

Issues

  • Code has been hacked together over time, with an emphasis on just getting it to work. Rarely have we gone back to tidy up.
  • The code base has become increasingly hard to deal with, maintain and develop.

Alex, George and James have all agreed to spend time cleaning it up. We need to apportion tasks and decide on the code structure. We will set aside time for this when Alex is in Cambridge 19th-20th December. To best make use of this time, we should think about what changes we need to make and how.

Please sign contributions!

Proposals

  • Use the ONLY specifier on every USE statement, so that each routine names exactly what it imports from a module (for me, this is non-negotiable). --james 15:21, 6 December 2007 (GMT)
  • Transfer include files to modules? How far do we want to go on this? --james 15:21, 6 December 2007 (GMT)
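As a minimal sketch of the two proposals above, the following assumes a hypothetical module `basis_data` that takes over from an include file of shared variables. The names `basis_data`, `sizes_report`, `nbasis`, `nel`, `scratch` and `n_virtual` are illustrative only and are not taken from NECI.

```fortran
! Hypothetical module replacing an include file that declared shared data.
module basis_data
    implicit none
    private
    integer, public :: nbasis = 0    ! number of basis functions (illustrative)
    integer, public :: nel = 0       ! number of electrons (illustrative)
    integer, public :: scratch = 0   ! something most callers never need
end module basis_data

module sizes_report
    implicit none
    private
    public :: n_virtual
contains
    ! Returns the number of unoccupied orbitals.
    function n_virtual() result(nvirt)
        ! Only the names actually needed are imported, so the dependency
        ! on basis_data is explicit and scratch is not visible here.
        use basis_data, only: nbasis, nel
        integer :: nvirt
        nvirt = nbasis - nel
    end function n_virtual
end module sizes_report
```

With the ONLY list in place, a reader of `n_virtual` can see where every non-local name comes from, which is much harder with include files or a bare USE.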

Tasks

  • Rewrite the test suite (I'll do this). --james 15:21, 6 December 2007 (GMT)
  • Create a branch for this work on svn (ditto). --james 15:24, 6 December 2007 (GMT)
  • Rewrite the main structure (neci.F). This will be painful, but a good opportunity to document/excise old code and to modularise (I suppose I've got the knowledge for that, but less the inclination) --alex 16:36, 11 December 2007 (GMT)
  • Add 'dated' input default sets -- we've talked about this but never quite got round to it --alex 16:36, 11 December 2007 (GMT)

For this to be useful, we need to have very clear documentation of what is in each default input set. --james 13:55, 12 December 2007 (GMT)

  • Modularize the input. By this I mean that it should be possible to add options to the input without having to add things to include files. --alex 16:36, 11 December 2007 (GMT)

So all the input options are stored in modules instead, right? This would be good. We can also get rid of the IMPLICIT REAL statement (ugh!). --james 13:55, 12 December 2007 (GMT)
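One way this could look is sketched below, assuming the input has already been split into keyword/value strings. The module name `input_options`, the keywords `MAXCYCLES` and `TIMESTEP`, and their defaults are all invented for illustration; the PROTECTED attribute needs a Fortran 2003 compiler.

```fortran
module input_options
    implicit none   ! every variable must be declared: no implicit typing
    private
    public :: set_option

    ! Each option lives here with its default; other code may read these
    ! values, but only this module may change them (PROTECTED).
    integer, public, protected :: max_cycles = 100
    double precision, public, protected :: time_step = 0.01d0

contains

    ! Set one option from a keyword and the text of its value.
    ! ok is .false. if the keyword is unknown or the text cannot be read;
    ! in that case the stored option is left unchanged.
    subroutine set_option(keyword, text, ok)
        character(len=*), intent(in) :: keyword, text
        logical, intent(out) :: ok
        integer :: ios, itmp
        double precision :: rtmp

        select case (trim(keyword))
        case ('MAXCYCLES')
            read (text, *, iostat=ios) itmp
            if (ios == 0) max_cycles = itmp
        case ('TIMESTEP')
            read (text, *, iostat=ios) rtmp
            if (ios == 0) time_step = rtmp
        case default
            ios = -1
        end select
        ok = (ios == 0)
    end subroutine set_option
end module input_options
```

Adding a new option then means one declaration and one CASE branch in a single file, and routines that need it pick it up with `use input_options, only: ...`.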

  • Modularize the system-specific data. I think it would be useful to have some sort of global 'system' object which contains the details specifying the system (e.g. for Hubbard, the size and the t and U values; other data for CPMD; for read-in integrals, this could just be symmetry info). --alex 16:36, 11 December 2007 (GMT)

Would it be useful to have separate modules for each system type, or just one covering everything in the System section? --james 13:55, 12 December 2007 (GMT)
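A rough sketch of the single-module option follows, assuming one derived type tagged with the kind of system. The names `system_data`, `system_t`, the `SYS_*` tags, `init_hubbard` and the field names are hypothetical and only stand in for whatever the System section actually holds.

```fortran
module system_data
    implicit none
    private
    public :: system_t, SYS_NONE, SYS_HUBBARD, SYS_READ_INTS, init_hubbard

    ! Hypothetical tags for the kind of system being studied.
    integer, parameter :: SYS_NONE = 0, SYS_HUBBARD = 1, SYS_READ_INTS = 2

    type system_t
        integer :: sys_type = SYS_NONE     ! one of the SYS_* tags
        integer :: nel = 0                 ! number of electrons
        ! Hubbard-specific fields, unused for other system types.
        integer :: lattice(3) = 0          ! lattice dimensions
        double precision :: hub_t = 0.0d0  ! hopping parameter t
        double precision :: hub_u = 0.0d0  ! on-site repulsion U
    end type system_t

contains

    ! Build a system object describing a Hubbard model.
    function init_hubbard(lattice, nel, hub_t, hub_u) result(sys)
        integer, intent(in) :: lattice(3), nel
        double precision, intent(in) :: hub_t, hub_u
        type(system_t) :: sys

        sys%sys_type = SYS_HUBBARD
        sys%nel = nel
        sys%lattice = lattice
        sys%hub_t = hub_t
        sys%hub_u = hub_u
    end function init_hubbard
end module system_data
```

The alternative of one module per system type would keep Hubbard-only fields out of the shared type, at the price of more modules to import; this sketch does not settle that choice.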

  • Modularize the basis. Again this will include lists of basis functions, energies and symmetries. Probably also the interfaces to access the 1- and 2-electron integrals. --alex 16:36, 11 December 2007 (GMT)
  • Modularize the many-electron system. By this I'm thinking of how we deal with determinants. The initial code used just lists of electrons in the determinants. As the number of electrons increases, this will mean copying large amounts of essentially redundant data (very few electrons actually change in a particular process). I've already hacked some of the code (the star, I think) to deal with 'excitation-based' determinants, i.e. excitations with respect to the Hartree-Fock determinant. This sort of thing should be transparent to the actual users (i.e. subroutines) of the determinants. --alex 16:36, 11 December 2007 (GMT)

Problems

  • I have been thinking about objectifying code for some time. One problem is that Fortran objects (TYPEs) cannot contain variable-length data. In particular, a general Determinant type would need a field whose length is the number of electrons, which is a variable. One solution (which I dislike) is to include a Fortran variable-sized pointer array. This means that every time you allocate memory for a determinant (which is done quite a lot) you also then allocate a separate portion of memory for the actual list of electrons. This is messy and inefficient. Another solution is to consider another language. C++ (with templates) can handle this sort of object with ease. Since object files and linking conventions have been around for a good 30 years, I think most of the problems linking different languages have been sorted out in this time. Could we venture into cross-language programming? My experience from Q-Chem is mixed. It is written in Fortran, C and C++, and this seems to cause quite a bit of hassle -- not with linking, but with misunderstanding of interfaces. Worse still, there are C++ wrappers on Fortran routines which seem to misunderstand the data-encapsulation nature of objects. This can lead to computational overhead if handled badly. I'm undecided. --alex 16:59, 11 December 2007 (GMT)

My (limited) experience is that mixed code is frequently messy, nasty and best avoided if possible. Plus, only Alex has any real experience with C or C++... Could we have a general determinant type based on a reference determinant, which is regarded as "special" (i.e. contains the full list of electrons)? --james 14:06, 12 December 2007 (GMT)
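The reference-determinant idea can be sketched as follows, under the assumption that excitations are capped at a fixed level so that the type has a fixed size. The names `determinants`, `excit_det_t`, `MAX_EXCIT`, `set_reference` and `expand_det`, and the cap of 4, are hypothetical.

```fortran
module determinants
    implicit none
    private
    public :: excit_det_t, MAX_EXCIT, set_reference, expand_det

    integer, parameter :: MAX_EXCIT = 4   ! assumed cap on excitation level

    ! A determinant stored as an excitation of the reference. The type has
    ! a fixed size, so arrays of these need no per-determinant allocation.
    type excit_det_t
        integer :: level = 0                 ! number of electrons moved
        integer :: from_orb(MAX_EXCIT) = 0   ! orbitals vacated in the reference
        integer :: to_orb(MAX_EXCIT) = 0     ! orbitals occupied instead
    end type excit_det_t

    ! The one "special" determinant that holds the full electron list.
    integer, allocatable, save :: ref_orbs(:)

contains

    ! Store the full list of occupied orbitals of the reference.
    subroutine set_reference(orbs)
        integer, intent(in) :: orbs(:)
        if (allocated(ref_orbs)) deallocate (ref_orbs)
        allocate (ref_orbs(size(orbs)))
        ref_orbs = orbs
    end subroutine set_reference

    ! Expand det into a full list of occupied orbitals, in reference order.
    ! orbs must have the same length as the reference list.
    subroutine expand_det(det, orbs)
        type(excit_det_t), intent(in) :: det
        integer, intent(out) :: orbs(:)
        integer :: i, j

        orbs = ref_orbs
        do i = 1, det%level
            do j = 1, size(orbs)
                if (orbs(j) == det%from_orb(i)) then
                    orbs(j) = det%to_orb(i)
                    exit
                end if
            end do
        end do
    end subroutine expand_det
end module determinants
```

Routines that need the full electron list call `expand_det` and never see how the determinant is stored. If arbitrary excitation levels were needed, Fortran 2003 allocatable components would be an alternative to pointer components, though they bring back the per-determinant allocation objected to above.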

  • Documentation. Experience shows that documentation must be added at the time the code is written. If interfaces are not well-defined when writing code (i.e. the code is very experimental, and you're not quite sure what it can/will do) then keep it out of the repository. Once code is checked into the repository it should be documented and understandable (both in the interface and the general algorithm). References to papers in the code are good. If a concept is too difficult to describe in the code, then refer to a document kept with the code (otherwise it will be lost). --alex 16:59, 11 December 2007 (GMT)

Rather, a much better practice would be to use a development branch for experimental work, and only commit clean, well-commented code to the main repository. I will still carry out the nightly tests of the main (stable) repository, but there should really be no breakage. If anyone wishes, I can provide the code to carry out regular tests of a branch. --james 14:06, 12 December 2007 (GMT)

  • Documentation 2. I've been wondering how to enforce this. The best way may be to get someone else to read through the code once it is checked in (or before) and add extra comments to it, or complain about it. I'm not sure whether this is a good idea: it might mean more work, but it would ensure a well-commented codebase. --alex 16:59, 11 December 2007 (GMT)

Agreed. I think it's worth the extra work. Also, as we modify code, commenting on the existing codebase would be A Good Thing. --james 14:06, 12 December 2007 (GMT)

  • Programmers' Reference. There are automatic programmers' reference generation systems which take comments from the code and turn them into a TeX reference manual. doxygen is one such system. Does anyone know of it? --alex 16:59, 11 December 2007 (GMT)

Such a resource would be very helpful for new people joining the group and for anyone working on areas of the code they did not write. Doxygen cannot parse Fortran code, sadly. I assume there's a similar resource available for Fortran... --james 14:06, 12 December 2007 (GMT)