NECI Re-write

From CUC3

Revision as of 17:18, 18 December 2007

Issues

  • Code has been hacked together over time, with an emphasis on just getting it to work. Rarely have we gone back to tidy up.
  • The code base has become increasingly hard to deal with, maintain and develop.

Alex, George and James have all agreed to spend time cleaning it up. We need to apportion tasks and decide on the code structure. We will set aside time for this when Alex is in Cambridge 19th-20th December. To best make use of this time, we should think about what changes we need to make and how.

Please sign contributions!

Proposals

  • Use the only specifier in every USE statement, so that each routine states exactly which module entities it takes (for me, this is non-negotiable). --james 15:21, 6 December 2007 (GMT)
  • Transfer include files to modules? How far do we want to go on this? --james 15:21, 6 December 2007 (GMT)
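A minimal sketch of these two proposals together, assuming a hypothetical include file that held the electron and basis counts in a COMMON block. The data move into a module with explicit types under implicit none, and each routine names exactly what it needs in an only list. The module, variable and routine names (SystemSizes, NEl, nBasis, CountVirtuals) are illustrative and are not taken from the NECI source.

```fortran
! Hypothetical replacement for an include file such as
!     INTEGER NEL, NBASIS
!     COMMON /SYSSIZE/ NEL, NBASIS
! The data now live in a module and are typed explicitly.
module SystemSizes
    implicit none
    private
    ! Only the names listed here are visible to routines that use the module.
    public :: NEl, nBasis, SetSystemSizes

    integer :: NEl = 0      ! number of electrons (hypothetical)
    integer :: nBasis = 0   ! number of spin orbitals in the basis (hypothetical)

contains

    subroutine SetSystemSizes(nElIn, nBasisIn)
        integer, intent(in) :: nElIn, nBasisIn
        NEl = nElIn
        nBasis = nBasisIn
    end subroutine SetSystemSizes

end module SystemSizes

module Virtuals
    implicit none
    private
    public :: CountVirtuals

contains

    ! Returns the number of unoccupied spin orbitals.
    integer function CountVirtuals()
        ! The only list documents exactly which module data this routine reads,
        ! and keeps every other name in SystemSizes out of this scope.
        use SystemSizes, only: NEl, nBasis
        CountVirtuals = nBasis - NEl
    end function CountVirtuals

end module Virtuals
```

Because implicit none is in force in both modules, a mistyped variable name becomes a compile-time error instead of a silently created REAL.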

Tasks

  • Rewrite the test suite (I'll do this). --james 15:21, 6 December 2007 (GMT)
  • Create a branch for this work on svn (ditto). --james 15:24, 6 December 2007 (GMT)
  • Rewrite the main structure (neci.F). This will be painful, but a good opportunity to document/excise old code and to modularise (I suppose I've got the knowledge for that, but less the inclination) --alex 16:36, 11 December 2007 (GMT)

I think this will be very useful - there is little commenting in the neci.F file, and it is difficult to follow a logical thread through it. Many of the flags for the IF blocks are very cryptic, especially the BTEST ones, and if the variable has a different name from the one in readinput, things get very tricky. --ghb24 17:51, 12 December 2007 (GMT)
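One way to make the BTEST-controlled blocks readable is to give every bit position a named constant and hide the test behind a small function. This is a minimal sketch assuming a hypothetical integer flag word CalcFlags and invented option names; the real meaning of each bit would have to be read off the existing neci.F and readinput code.

```fortran
module CalcOptions
    implicit none
    private
    public :: CalcFlags, iBitStar, iBitMC, OptionSet

    ! Hypothetical bit positions. A name such as iBitMC says what is being
    ! tested, where a bare BTEST(CalcFlags, 3) does not.
    integer, parameter :: iBitStar = 0   ! hypothetical: star calculation
    integer, parameter :: iBitMC = 3     ! hypothetical: Monte Carlo sampling

    ! Hypothetical flag word, filled in by the input reader.
    integer :: CalcFlags = 0

contains

    ! True if the option at bit position iBit is switched on in CalcFlags.
    logical function OptionSet(iBit)
        integer, intent(in) :: iBit
        OptionSet = btest(CalcFlags, iBit)
    end function OptionSet

end module CalcOptions
```

A block in neci.F would then read if (OptionSet(iBitMC)) then. A further step would be to replace the bit word with named logicals set directly by readinput, which would also remove the problem of the same option having two different names.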

  • Add 'dated' input default sets -- we've talked about this but never quite got round to it --alex 16:36, 11 December 2007 (GMT)

For this to be useful, we need to have very clear documentation of what is in each default input set. --james 13:55, 12 December 2007 (GMT)

I'm happy to do this - seems easy enough --ghb24 17:51, 12 December 2007 (GMT)
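A minimal sketch of dated default sets, assuming a hypothetical input keyword whose argument is a date string. Each dated set is a fixed block of assignments that can be documented alongside it, so an old input file keeps reproducing old results after the defaults change. The option names, dates and values are invented for illustration.

```fortran
module InputDefaults
    implicit none
    private
    public :: dp, Beta, nCycles, TStar, SetDefaults

    integer, parameter :: dp = selected_real_kind(15, 307)

    ! Hypothetical input options whose defaults differ between dated sets.
    real(dp) :: Beta = 0.0_dp
    integer :: nCycles = 0
    logical :: TStar = .false.

contains

    ! Load the default set named by sDate (e.g. '20071220').
    ! ierr is set non-zero, and nothing is changed, if the set is unknown.
    subroutine SetDefaults(sDate, ierr)
        character(len=*), intent(in) :: sDate
        integer, intent(out) :: ierr
        ierr = 0
        select case (sDate)
        case ('20071101')
            ! Original defaults (hypothetical values).
            Beta = 1000.0_dp
            nCycles = 100
            TStar = .false.
        case ('20071220')
            ! Later set (hypothetical): more cycles, star switched on.
            Beta = 1000.0_dp
            nCycles = 500
            TStar = .true.
        case default
            ierr = 1
        end select
    end subroutine SetDefaults

end module InputDefaults
```

In this scheme, options given explicitly in the input would be read after SetDefaults is called, so they always override the dated set.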

  • Modularize the input. By this I mean that it would be useful to be able to add options to the input without having to add things to include files.

--alex 16:36, 11 December 2007 (GMT)

So all the input options are stored in modules instead, right? This would be good. We can also get rid of the IMPLICIT REAL statement (ugh!). --james 13:55, 12 December 2007 (GMT)

I'm happy to do this too - I'll make a module file, then use lots of 'use' (only!) statements in the code, and then get rid of the include files. The only problem is that this really wants to go hand-in-hand with actually removing some of the variables which are passed through the whole program in ridiculously long argument lists (i.e. the next three tasks), which is more involved... I would want to help here, but doing it all would probably require a little more knowledge of the code... --ghb24 17:51, 12 December 2007 (GMT)

  • Modularize the system-specific data. I think it would be useful to have some sort of global 'system' object which contains details specifying the system (e.g. for Hubbard the size, t and U values etc. Other data for CPMD. For read-in integrals, this could just be symmetry info.) --alex 16:36, 11 December 2007 (GMT)

Would it be useful to have separate modules for each system type, or just one covering everything in the System section? --james 13:55, 12 December 2007 (GMT)
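One possible shape for the 'system' object is a derived type per system type. This is a minimal sketch assuming hypothetical names (SystemData, HubbardSystem, nSites, t, U) and, for the helper routine, the simplest case of a one-dimensional periodic Hubbard chain with hopping entering as -t. If each system type gets its own module, the type would move there unchanged.

```fortran
module SystemData
    implicit none
    private
    public :: dp, HubbardSystem, HubbardOrbitalEnergies

    integer, parameter :: dp = selected_real_kind(15, 307)

    ! Hypothetical container for everything that specifies a Hubbard model.
    type HubbardSystem
        integer :: nSites = 0        ! number of lattice sites
        real(dp) :: t = 1.0_dp       ! hopping integral
        real(dp) :: U = 0.0_dp       ! on-site repulsion
    end type HubbardSystem

contains

    ! Momentum-space orbital energies -2 t cos(2 pi k / N), k = 0..N-1, for a
    ! 1D periodic chain; the sign convention is an assumption.
    ! energies must have at least sys%nSites elements.
    subroutine HubbardOrbitalEnergies(sys, energies)
        type(HubbardSystem), intent(in) :: sys
        real(dp), intent(out) :: energies(:)
        real(dp), parameter :: pi = 3.14159265358979323846_dp
        integer :: k
        do k = 0, sys%nSites - 1
            energies(k + 1) = -2.0_dp * sys%t * cos(2.0_dp * pi * k / sys%nSites)
        end do
    end subroutine HubbardOrbitalEnergies

end module SystemData
```

Routines then receive one HubbardSystem argument (or use the module) instead of a separate argument for every parameter.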

  • Modularize the basis. Again this will include lists of basis functions, energies and symmetries. Probably also the interfaces to access the 1- and 2-electron integrals. --alex 16:36, 11 December 2007 (GMT)
  • Modularize the many-electron system. By this I'm thinking of how we deal with determinants. The initial code used just lists of electrons in the determinants. As the number of electrons increases, this will mean large amounts of copying essentially redundant data (very few electrons actually change in a particular process). I've already hacked some of the code (the star I think) to deal with 'excitation-based' determinants - i.e. excitations with respect to the Hartree-Fock determinant. This sort of thing should be transparent to the actual users (i.e. subroutines) of the determinants. --alex 16:36, 11 December 2007 (GMT)

This would be a big bonus in how we deal with new code, and would almost certainly affect the speed, and possibly the scaling, of the program --ghb24 17:51, 12 December 2007 (GMT)
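For the basis module proposed above, the aim is that callers ask for an integral through a function and never see how it is stored. A minimal sketch, assuming real orbitals so that the two-electron integrals (ij|kl) in chemists' notation have eightfold permutational symmetry, packed into a one-dimensional array with the standard compound index ij = i(i-1)/2 + j for i >= j. The names are hypothetical, and the packing is a textbook scheme swapped in for illustration, not a description of how NECI stores integrals.

```fortran
module BasisIntegrals
    implicit none
    private
    public :: dp, AllocateIntegrals, SetTwoElInt, GetTwoElInt

    integer, parameter :: dp = selected_real_kind(15, 307)

    ! Packed store of the unique (ij|kl); callers never index this directly.
    real(dp), allocatable :: TwoElStore(:)

contains

    ! Compound index for an unordered pair: p(p-1)/2 + q with p >= q.
    pure integer function PairIndex(i, j)
        integer, intent(in) :: i, j
        PairIndex = max(i, j) * (max(i, j) - 1) / 2 + min(i, j)
    end function PairIndex

    ! Position of (ij|kl) in the packed store, using the eightfold symmetry.
    pure integer function QuadIndex(i, j, k, l)
        integer, intent(in) :: i, j, k, l
        QuadIndex = PairIndex(PairIndex(i, j), PairIndex(k, l))
    end function QuadIndex

    subroutine AllocateIntegrals(nOrb)
        integer, intent(in) :: nOrb
        integer :: nPairs
        nPairs = nOrb * (nOrb + 1) / 2
        if (allocated(TwoElStore)) deallocate(TwoElStore)
        allocate(TwoElStore(nPairs * (nPairs + 1) / 2))
        TwoElStore = 0.0_dp
    end subroutine AllocateIntegrals

    subroutine SetTwoElInt(i, j, k, l, val)
        integer, intent(in) :: i, j, k, l
        real(dp), intent(in) :: val
        TwoElStore(QuadIndex(i, j, k, l)) = val
    end subroutine SetTwoElInt

    ! Returns (ij|kl); all eight equivalent index orders give the same value.
    real(dp) function GetTwoElInt(i, j, k, l)
        integer, intent(in) :: i, j, k, l
        GetTwoElInt = TwoElStore(QuadIndex(i, j, k, l))
    end function GetTwoElInt

end module BasisIntegrals
```

If the storage later changes (to exploit spatial symmetry, say, or to fetch integrals from CPMD), only the bodies of these routines change, not their callers.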

  • Many have heard horror stories of the infamous excitation generators, but only Alex has confronted the beast - is there a way that these can be improved and made accessible to mere mortals, or are the problems with interfacing with CPMD too hard? - I'm not sure of the quality of the commenting in these, or of the generality of their use, but it might be worth thinking about... --ghb24 17:51, 12 December 2007 (GMT)
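As a concrete baseline for discussing the generators, the simplest possible double-excitation generator enumerates every ij->ab from a reference determinant, with occupied i<j and virtual a<b. This minimal sketch ignores spin and spatial symmetry entirely and is hypothetical; it is not a description of NECI's generators.

```fortran
module DoubleExcits
    implicit none
    private
    public :: CountDoubles, GenerateDoubles

contains

    ! Number of ij->ab doubles from nEl occupied into nBasis-nEl virtual orbitals.
    integer function CountDoubles(nEl, nBasis)
        integer, intent(in) :: nEl, nBasis
        integer :: nVirt
        nVirt = nBasis - nEl
        CountDoubles = (nEl * (nEl - 1) / 2) * (nVirt * (nVirt - 1) / 2)
    end function CountDoubles

    ! Fill excits(1:4, n) with (i, j, a, b) for every double excitation of the
    ! reference ref (distinct, sorted orbital labels in 1..nBasis).
    ! excits needs at least CountDoubles(size(ref), nBasis) columns.
    ! No spin or spatial symmetry restrictions are applied.
    subroutine GenerateDoubles(ref, nBasis, excits, nExcit)
        integer, intent(in) :: ref(:), nBasis
        integer, intent(out) :: excits(:, :)
        integer, intent(out) :: nExcit
        logical :: occupied(nBasis)
        integer :: ii, jj, a, b
        occupied = .false.
        occupied(ref) = .true.
        nExcit = 0
        do ii = 1, size(ref) - 1
            do jj = ii + 1, size(ref)
                do a = 1, nBasis - 1
                    if (occupied(a)) cycle
                    do b = a + 1, nBasis
                        if (occupied(b)) cycle
                        nExcit = nExcit + 1
                        excits(:, nExcit) = (/ ref(ii), ref(jj), a, b /)
                    end do
                end do
            end do
        end do
    end subroutine GenerateDoubles

end module DoubleExcits
```

Symmetry restrictions, and any generation probabilities needed for Monte Carlo, would have to be layered on top of a loop like this, which is presumably where most of the complexity in the existing generators lies.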

Problems

  • I have been thinking about objectifying code for some time. One problem is that FORTRAN objects (TYPEs) cannot contain variable-length data. In particular a general Determinant type would need to have a field with the length of the number of electrons, which is a variable. One solution (which I dislike) is to include a Fortran variable-sized pointer array. This means that every time you allocate memory for a determinant (which is done quite a lot) you also then allocate a separate portion of memory for the actual list of electrons. This is messy and inefficient. Another solution is to consider another language. C++ (with templates) can handle this sort of object with ease. Since object files and linking conventions have been around for a good 30 years, I think most of the problems with linking different languages have been sorted out in this time. Could we venture into cross-language programming? My experience from Q-Chem is mixed. It is written in FORTRAN, C and C++, and this seems to cause quite a bit of hassle -- not with linking, but with misunderstanding of interfaces. Worse still, there are C++ wrappers on Fortran routines which seem to misunderstand the data-encapsulation nature of objects. This can lead to computational overhead if handled badly. I'm undecided. --alex 16:59, 11 December 2007 (GMT)

My (limited) experience is that mixed code is frequently messy, nasty and best avoided if possible. Plus, only Alex has any real experience with C or C++... Could we have a general determinant type based on a reference determinant, which is regarded as "special" (i.e. contains the full list of electrons)? --james 14:06, 12 December 2007 (GMT)

The idea is to allow things to be as general as possible without incurring computational overhead. So, for example, for the 2-vertex star one only needs double excitations from a reference determinant, i.e. one only needs to store the reference, and then the ij->ab excitation for each determinant. However, one can base such a star on multiple determinants (especially in the case of degenerate systems like metals), at which point you'll need to specify a different reference. Further, I want to develop the idea of an 'active space' of determinants from which one might excite. As much of this manipulation as possible should go on 'under the hood', so that you can deal with a 'Determinant' without knowing quite how it is specified. This leads to nice, generally applicable code, without having to rewrite it for specific cases. --alex 17:28, 12 December 2007 (GMT)

I tend to agree with the above - I guess, though, that as we get more adventurous, the solution which minimises memory requirements will be key. Being able to specify an active space, a la CASSCF, would be good - would there be subtleties here when doing MC and normalising graphs when some excitations within a vertex level lie outside the active space? --ghb24 17:51, 12 December 2007 (GMT)
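Following the reference-plus-excitation idea above, here is a minimal sketch of a determinant type whose size does not depend on the number of electrons, so it needs no pointer components. It stores the index of its reference in a hypothetical table of reference determinants plus at most two removed and two added orbitals, and only code that really needs the full orbital list calls the expansion routine. The names and the limit of double excitations are assumptions for illustration.

```fortran
module ExcitDets
    implicit none
    private
    public :: MaxExcit, ExcitDet, SetReferences, ExpandDet

    integer, parameter :: MaxExcit = 2   ! doubles only, as for the 2-vertex star

    ! Fixed-size determinant: which reference, and which orbitals were swapped.
    type ExcitDet
        integer :: iRef = 1              ! column of RefDets used as reference
        integer :: nEx = 0               ! excitation level (0, 1 or 2)
        integer :: from(MaxExcit) = 0    ! occupied orbitals removed (i, j)
        integer :: to(MaxExcit) = 0      ! virtual orbitals added (a, b)
    end type ExcitDet

    ! Hypothetical table of references; only these hold full electron lists.
    integer, allocatable :: RefDets(:, :)

contains

    ! Store the full orbital lists of the reference determinants, one per column.
    subroutine SetReferences(refs)
        integer, intent(in) :: refs(:, :)
        if (allocated(RefDets)) deallocate(RefDets)
        allocate(RefDets(size(refs, 1), size(refs, 2)))
        RefDets = refs
    end subroutine SetReferences

    ! Build the full, sorted orbital list of det (orbs has one entry per electron).
    subroutine ExpandDet(det, orbs)
        type(ExcitDet), intent(in) :: det
        integer, intent(out) :: orbs(:)
        integer :: k, m, tmp
        orbs = RefDets(:, det%iRef)
        do k = 1, det%nEx
            where (orbs == det%from(k)) orbs = det%to(k)
        end do
        ! Insertion sort to restore ascending orbital order.
        do k = 2, size(orbs)
            tmp = orbs(k)
            m = k - 1
            do while (m >= 1)
                if (orbs(m) <= tmp) exit
                orbs(m + 1) = orbs(m)
                m = m - 1
            end do
            orbs(m + 1) = tmp
        end do
    end subroutine ExpandDet

end module ExcitDets
```

Basing a star on a different reference then only changes iRef; routines that merely loop over determinants need not know which representation is in use.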

  • Documentation. Experience shows that documentation must be added at the time the code is written. If interfaces are not well-defined when writing code (i.e. the code is very experimental, and you're not quite sure what it can/will do) then keep it out of the repository. Once code is checked into the repository it should be documented and understandable (both in the interface and the general algorithm). References to papers in the code are good. If a concept is too difficult to describe in the code, then refer to a document kept with the code (otherwise it will be lost). --alex 16:59, 11 December 2007 (GMT)

Rather, a much better practice would be to use a development branch for experimental work, and only commit clean, well-commented code to the main repository. I will still carry out the nightly tests of the main (stable) repository, but there should really be no breakage. If anyone wishes, I can provide the code to carry out regular tests of a branch. --james 14:06, 12 December 2007 (GMT)

So, last night I was doing some development on ajax, and then realised the systems I wanted to test were on destiny, so I had to transfer the development across. To do this I checked the code in (after running the short purely NECI test suite), and then updated my working copy on destiny. It turns out that what I had coded was utter rubbish (which is why I needed the tests), but as it was new, it shouldn't affect anyone. Eventually I put an updated version in. This seems to be something of an abuse of the repository; however, my other options were:

  • scp the code across. This would lead to two copies of experimental code lying around, which I disliked the idea of.
  • create a new branch, and use that to get it across. This seemed more complicated than I would want to consider to test some experimental ideas.

I suspect the solution is to have my own branch which I can use for experimental things and transfers, and then sync those to the trunk when they're ready. This seems like extra effort, but it may be necessary --alex 17:28, 12 December 2007 (GMT)


  • Documentation 2. I've been wondering how to enforce this, and it seems that the best way may be to get someone else to read through the code and add extra comments to it (or complain about it) once checked in (or before) - not sure whether this is a good idea - it might mean more work, but it will ensure a well-commented codebase. --alex 16:59, 11 December 2007 (GMT)

Agreed. I think it's worth the extra work. Also, as we modify code, commenting on the existing codebase would be A Good Thing. --james 14:06, 12 December 2007 (GMT)

How about emailing all newly-committed code to the other developers for 'peer review' of its documentation? --ghb24 17:51, 12 December 2007 (GMT)

  • Programmers' Reference. There are automatic programmers' reference generation systems which take comments from the code and turn them into a TeX reference manual. doxygen is one such system. Does anyone know of it? --alex 16:59, 11 December 2007 (GMT)

Such a resource would be very helpful for new people joining the group/working on areas of the code not written by you. Doxygen cannot parse Fortran code, sadly. I assume there's a similar resource available for Fortran... --james 14:06, 12 December 2007 (GMT)

In addition to the code comments, it might well be worth rewriting INPUT_DOC as a .tex file and expanding it, so that maths can easily be inserted and references handled properly. I can give that a go if people think it's worthwhile. --ghb24 17:51, 12 December 2007 (GMT)

The beauty of generating a programmers' reference from the source is that it reduces duplicate effort: reading the source is easy because of the commenting, and the documentation is complete from the comments. It's a neat solution and one I'll investigate.--james 17:58, 12 December 2007 (GMT)

A further thought on this: a good practice would be to comment at the beginning of each routine what its purpose is (I know that this is done in many places, but certainly not all!).--james 18:01, 12 December 2007 (GMT)
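A possible convention for the purpose comment at the top of each routine, shown on a small hypothetical helper that counts determinants, C(nBasis, nEl). The field layout is only a suggestion; it is kept as plain comments so that it reads well in the source whether or not a reference generator is eventually used.

```fortran
module DetCounting
    implicit none
    private
    public :: dp, NumDeterminants

    integer, parameter :: dp = selected_real_kind(15, 307)

contains

    !=======================================================================
    ! NumDeterminants
    ! Purpose:  Number of Slater determinants of nEl electrons in nBasis
    !           spin orbitals, i.e. the binomial coefficient C(nBasis, nEl),
    !           ignoring spin and spatial symmetry.
    ! In:       nEl    - number of electrons (0 <= nEl <= nBasis)
    !           nBasis - number of spin orbitals
    ! Returns:  the count as real(dp), since it overflows a default integer
    !           for quite modest systems.
    ! Method:   product form C(n,k) = prod_{m=1..k} (n-k+m)/m.
    !=======================================================================
    real(dp) function NumDeterminants(nEl, nBasis)
        integer, intent(in) :: nEl, nBasis
        integer :: m
        NumDeterminants = 1.0_dp
        do m = 1, nEl
            NumDeterminants = NumDeterminants * real(nBasis - nEl + m, dp) / real(m, dp)
        end do
    end function NumDeterminants

end module DetCounting
```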

  • Scalability: we are (well, I am) looking at larger and larger systems. There are various places in the code (e.g. uhfdet) where a maximum system size is hard-coded in and causes seg faults etc. Allocatable arrays are the way forward...--james 16:18, 18 December 2007 (GMT)
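The change for hard-coded limits is typically mechanical: an array dimensioned by a maximum-size parameter becomes allocatable and is sized from the input, with the allocation status checked so that a failure is reported rather than ending in a seg fault. A minimal sketch with a hypothetical array of orbital energies (not the actual uhfdet arrays):

```fortran
module OrbitalStore
    implicit none
    private
    public :: dp, OrbEnergies, AllocOrbitals, FreeOrbitals

    integer, parameter :: dp = selected_real_kind(15, 307)

    ! Before (hypothetical):  INTEGER, PARAMETER :: MAXORB = 200
    !                         REAL*8 ORBENERGIES(MAXORB)
    ! Without bounds checking, a system larger than MAXORB silently
    ! overruns the array.
    real(dp), allocatable :: OrbEnergies(:)

contains

    ! Size the array for the system actually being run; ierr /= 0 on failure.
    subroutine AllocOrbitals(nOrb, ierr)
        integer, intent(in) :: nOrb
        integer, intent(out) :: ierr
        if (allocated(OrbEnergies)) deallocate(OrbEnergies)
        allocate(OrbEnergies(nOrb), stat=ierr)
        if (ierr == 0) OrbEnergies = 0.0_dp
    end subroutine AllocOrbitals

    subroutine FreeOrbitals()
        if (allocated(OrbEnergies)) deallocate(OrbEnergies)
    end subroutine FreeOrbitals

end module OrbitalStore
```

Callers pass the system size read from the input, and a non-zero ierr can be reported together with the requested size, which is much easier to diagnose than a crash somewhere downstream.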