Notes on MINPERMDIST

From Docswiki
Jump to navigation Jump to search

The subroutine that must not be named. Bringing of headaches. Bane of Friday afternoons. Mother of alignment algorithms. This page aims to collect useful information on how MINPERMDIST works to assist in the inevitable case that you find a problem and need to fix it. You are encouraged to add more detail as appropriate. These notes are based on commit 35132: please keep them up to date.

Purpose

MINPERMDIST takes two configurations and aims to find their optimal alignment with respect to translation, rotation and allowed permutation. This is, in general, a difficult problem. The algorithm is complicated by there being many special cases requiring different treatment.

Imports (l55-l71)

There's a fair bit here. As usual, only add things if you actually need them and always use ONLY.

Inputs and Outputs (l75-l84)

DOUBLE PRECISION, INTENT(INOUT) :: COORDSB(3*NATOMS)

One of the configurations for the matching. This one is the reference and will not be changed in the routine, except by the valine and leucine chirality matching routine if AMBER is in use.

DOUBLE PRECISION, INTENT(INOUT) :: COORDSA(3*NATOMS)

The other configuration for matching. This one will be adjusted to the best match with COORDSB.

INTEGER, INTENT(IN) :: NATOMS

The number of atoms in the configuration. This doesn't have to be the same as the global number of atoms, but problems will arise if it's not and the energy and gradients are calculated.

LOGICAL, INTENT(IN) :: DEBUG

Whether to produce debug printing and do various checks on the coordinates.

DOUBLE PRECISION, INTENT(IN) :: BOXLX, BOXLY, BOXLZ

Dimensions of the orthorhombic periodic boundary conditions, if this is a periodic system.

LOGICAL, INTENT(IN) :: BULKT

Whether this is a periodic system, which matters for translational and rotational matching.

LOGICAL, INTENT(IN) :: TWOD

Whether this is a two-dimensional system, which matters for rotational matching.

DOUBLE PRECISION, INTENT(OUT) :: DISTANCE, DIST2

Gets set to the RMS distance between aligned configurations (DISTANCE) or that distance squared (DIST2).

LOGICAL, INTENT(IN) :: RIGID

Whether the deprecated rigid body framework is in use. Affects the number of coordinates with respect to NATOMS.

DOUBLE PRECISION, INTENT(OUT) :: RMATBEST(3, 3)

Gets set to the rotation matrix mapping COORDSA to its best alignment with COORDSB.

Local Variables (l87-l109)

Lots here too. If you add things for testing, please remove them when you're done. Compiler warnings, particularly with nagfor, can let you know if there are unused local variables. This file currently compiles warning free. Don't be the one to mess it up.

Early forks (l113-l123)

GTHOMSON (generalised Thomson problem, for putting particles on non-Euclidean two-dimensional surfaces) has its own version. Fork to it here and return.

If VARIABLES is set, we only calculate the distance between the two configurations and return.

Generalised rigid body checks (l126-l151)

If DEBUG is on, we attempt to determine whether the routine has been passed CoM and angle-axis coordinates. If all the coordinates beyond the number of degrees of freedom are zero, we suspect this is true and transform to Cartesian coordinates.

If the system used the rigid body framework and is in bulk, we fork to the specialised routine and return.

Initial energy and gradient checks (l164-l187)

If DEBUG is on, here we calculate the energy and gradients for both configurations. We print out the energy and RMS force for information, and issue a warning if the configuration is not a stationary point.

If NATOMS is not the same as the global NATOMS, this section can cause a segmentation fault, as POTENTIAL uses the global NATOMS. It looks like the only cases where this might be a problem so far are when using GAUSSIAN and when checking the mapping from Cartesian to rigid body coordinates. The second case is indicated by TRANSFORMCHECK. In these cases, we don't do the check. The energies and gradients are probably meaningless anyway if we're not using a full system.

Rigid body forks (l189-l211)

Another couple of forks into specialised routines. The first case: when using the generalised rigid body framework, but the alignment is to be carried out over only a subset of the bodies, indicated by ALIGNRBST, which corresponds to the keyword ALIGNRBS; the second case, when using the deprecated rigid body angle-axis scheme (RBAAT) and permutational alignment is required (PERMDIST).

AMBER chiral checks (l218-l231)

If we're using any of the AMBER variants, we do some chiral fiddling here. The structures are rotationally aligned. Any prochiral centres are permuted into the same permutations and the structures are realigned.

Alignment without permutation (l233-l288)

This block is for when we don't want to do any permutational alignment (indicated by PERMDIST). Basically, we fork to another routine that does purely the rotational alignment. As ever, the deprecated rigid body angle-axis scheme (RBAAT) has its special routine. For bulk, the origins of both configurations are put to the coordinates of atom 1 and the distance is calculated. In other cases, the routine NEWMINDIST (in newmindist.f90) carries out the rotational alignment.

If no permutations are carried out, this is as far as we go.

Local permutational alignment (l289-315)

If we're using the LPERMDIST scheme, this is where we fork into its special routine. We repeat the energy and gradient checks if DEBUG is set (see #Initial energy and gradient checks (l164-l187)).

Internal coordinates (l317-l351)

If we're using a CHARMM or AMBER, we might choose to use internal coordinates and minimise the distance calculated from the internal torsion angles, specified by INTMINPERMT which corresponds to the INTMINPERM keyword. We call a specialised routine to do the alignment, depending on whether we're using CHARMM or AMBER and whether prochirality flips are allowed. After permutational alignment, we do a final rotational alignment, calculate the distance and return.

Centre of Geometry (l356-l388)

Here we calculate the centres of geometry for the initial configurations. There are cases where translation is not allowed, so these are exempted. Stockmeyer clusters (STOCKT) are a special case because they are using CoM and angle-axis coordinates.

Initialisation (l395-l451)

Some local variables are initialised before we launch into the main loops.

We initialise the best distance to be the distance between the initial configurations after rotational alignment. Naturally, it is possible that we never find a better match. The configurations are copied into the helpfully named DUMMYA and DUMMYB so they can be altered without side-effects.

We initialise the best coordinates to DUMMYA if translation is not allowed, or to the A configuration shifted by the centre of geometry of B is translation is allowed.

The best permutation vector is set to the identity operation.

We set the best rotation matrix to any rotations we've already done and set the local rotation matrices to identity operations.

Main loops start (l455-l468)

Here we enter various loops that span most of the rest of the algorithm. These loops are oh so helpfully done with GOTOs rather than explicit DO or WHILE loops.

NRANROT relates to using random starting orientations.

INVERT is for trying both inversion isomers.

Anything labelled a variant of NCHOOSE is to do with the orbits used for the standard alignment in MYORIENT.

BMTEST refers to the keyword ATOMMATCHDIST, which is used for bulk systems.

OPNUM is for counting the point group operations of a cube, for the keyword OHCELL.

Main loop resets (l476-l491)

We reset everything necessary for each loop iteration. The local coordinate copies are set back to their originals and the permutation vector is reset to the identity.

OHCELLT is a special case and DUMMYA is set to the original configuration with the next cube point group operation applied.

Standard alignment (l508-l727)

This is a very long section that does some translational and rotational matching. There are lots of special cases, so it's broken down a bit below.

Bulk alignment (l510-l630)

The first case dealt with is bulk systems. Further detail is required here.

Trap Potential (l631-l650)

This happens if MKTRAPT is set, which refers to the undocumented keyword TRAPMK. It seems we set various matrices to the identity, apply an inversion if necessary and calculate the distance.

Stockmeyer clusters (l651-l666)

A special case due to using CoM and angle axis coordinates, as well as the alignment of dipoles. Inversion is applied, configurations are put into a standard orientation and the distance is calculated.

Everything else (l667-l727)

Inversion (l668-l675)

If an inversion is specified, here we either invert the structure, or apply a reflection in some special cases, such as two-dimensional systems.

Random rotation (l676-716)

One option is to try random orientations as well as the standard orientation, as specified by the keyword RANROT. NRANROT tells us the total number or orientations to try and NROTDONE is the iteration we're on. Both configurations have their centres of geometry moved to the origin, then a random rotation is applied to DUMMYA by a call to RANROT.

Standard Orientation (l717-l722)

The other option is to just use the standard orientation, coded in MYORIENT, which is called on both configurations. The centres of geometry are moved to the origin and a rotation is applied depending on the values of the orbit parameters.

Initial distance (l723-l726)

Finally, we calculate the initial distance after standard orientation before launching into permutations. It is possible we won't find anything better.

Permutational matching loop (l751-l912)

Now we enter the permutational alignment section. The basic idea is to go through the groups and permute to the best match. Then reapply the orientational matching and get the best distance for that permutation. Rinse and repeat. The loop is controlled with a GOTO on label 10 and the counter NTRIES. The loop is broken down below.

Initialisation (l751-l761)

We set the rotation matrix from this loop to the identity before entering. Then we enter the loop.

At the start of each iteration, we set the current permutation (NEWPERM) to the identity. This is the identity with respect to the the permutation contained in ALLPERM, not with respect to the initial configuration. NDUMMY keeps track of the index in the permutational group list.

Loop over groups (l774-l856)

Now we enter a loop over the different permutational groups. NPERMGROUP gives the number of groups and J1 is the loop index. PATOMS is set to the number of atoms in this group. The list of atoms in all the groups is kept as a single array (PERMGROUP) and the indices of the atoms in the current groups go from NDUMMY + 1 to NDUMMY + PATOMS.

We create subsets of the coordinates in PDUMMYA and PDUMMYB. These contain only the coordinates of the atoms in the current group. The PDUMMYA set is based on the current best permutation. The PDUMMYB set is based on the unpermuted second coordinate set.

Now we call MINPERM which finds the best arrangement of the atoms in the group. The best permutation is returned in the LPERM array. SAVEPERM is set to NEWPERM with the new permutation applied. This should work even if the atoms in the group have already been permuted by an earlier group.

Next we have to update the permutation list if there are any dependent subgroups. Since we can't guarantee these atoms haven't already been moved, we have to search the permutation vector for their current indices. The indices of the subgroups are stored in the three-dimensional SETS array. The first index of this array is the primary atom on which the subset is dependent. The second index is the permutational group we're currently working on. The third dimension contains the list of dependent atoms. The switching is done as a series of swaps, iterating through the list and putting the correct atom into the current place. This may not be the most efficient method, but it has the advantage of working.

Finally we update NEWPERM and NDUMMY, before trying the next group.

Extra permutaions(l860)

Sometimes we want groups where not all permutations are allowed. Examples would be rigid bodies where rotations of a body are allowed, but completely messing up the order is not. These are specified by the EXTRAPERM and we fork to that routine here.

Updates to permutation vector and coordinates (l870-l912)

ALLPERM is updated with the new swaps. ALLPERM is not reset: this is a cumulative permutation. Stockmeyer and old angle-axis coordinates are a special case yet again.

DUMMYA has the new permutation applied too and the distance is recalculated.

End of attempts loop (l948-l959)

Having found our best current permutation, we call NEWMINDIST again to find the best rotational alignment given the current permutation. The distance and cumulative rotation matrix are updated. If we've got more tries to go, we jump back to the start of the loop (l738). If we've exceeded MAXIMUMTRIES, we move on.

Update best configuration (l960-l995)

Now we've found the best rotational and permutational alignment we can for the current iteration of the main loops (#Main loops start (l455-l468)). If we've improved the best distance, we update the best configuration, best permutation and best rotation stores. These bests refer to what needs to be done to COORDSA to get it as close as possible to COORDSB.

There's a commented out check here to see whether the combined effect of the best operations on COORDSA reproduces the best configuration (XBEST).

Main loop terminations (l1000-l1031)

Now we reach the end of the main loops (#Main loops start (l455-l468)).

If we've done better than GEOMDIFFTOL we can call them the same and exit. Actually it looks like we don't exit prematurely unless we've done 100 times better than GEOMDIFFTOL as even if we're going to call them the same, we still want the best match. 50 is the GOTO label to escape the main loops.

We jump back to the start of the main loops for different random rotations, the other inversion, other cubic point group operations and different values of the NCHOOSE variants.

There are several special cases that mean we won't try inversion, not least NOINVERSION.

Coordinate fiddles (l1032-l1083)

The best coordinates in XBEST are the best match with the standard orientation of configuration B produced by MYORIENT, not the original COORDSB. We undo that change with ROTINVBEST and shift the centre of geometry to that of configuration B if translation is allowed. Then we recalculate the distance between the best coordinates and configuration B, to check it matches our current best distance.

Generate best rotation matrix (l1085-l1091)

We need to apply the ROTINVBEST transformation to our best rotation matrix too, generating the final ROTMATBEST which is returned from the routine. There are special cases where rotation was not allowed.

Trap potential (l1112-l1132)

Another special case for the trap potentials, here EYTRAP which corresponds to the documented TRAP keyword. It looks like this is to do with disallowing translation.

Set return values (l1134-1135)

We can now set COORDSA to our best match. This coordinate set is returned from the routine.

The distance is sqrt'd as above it has always been calculated as the distance squared, but we return the distance.

AMBER chiral checks (l1155-l1164)

A final chiral matching and rotational alignment seems to be necessary for the AMBER variants.

Final energy and gradient checks (l1169-l1192)

If DEBUG is on, another check of the energy and gradients is performed. See #Initial energy and gradient checks (l164-l187)