Connecting Sub-databases

From Docswiki

Revision as of 16:30, 14 May 2019

Definitions

For the purposes of this tutorial, I define a sub-database as a set of minima and transition states that are connected to one another within a larger database.
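One way to make this definition concrete is to treat each minimum as a node and each transition state as an edge joining the two minima it connects. A sub-database is then a connected component of that graph. The sketch below groups minima into components with a union-find structure. It illustrates the definition under those assumptions and is not the algorithm disconnectionDPS itself uses. The function name `sub_databases` and the edge-list input format are hypothetical.

```python
from collections import defaultdict


def sub_databases(n_minima, ts_edges):
    """Group minima 1..n_minima into connected sets (sub-databases).

    ts_edges is a list of (min1, min2) pairs, one per transition state,
    using 1-based minimum indices (i.e. line numbers in min.data).
    Hypothetical helper for illustration only.
    """
    parent = list(range(n_minima + 1))  # index 0 is unused

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each transition state merges the sets of the two minima it joins.
    for a, b in ts_edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(list)
    for m in range(1, n_minima + 1):
        groups[find(m)].append(m)
    # Largest sub-databases first; ties broken by lowest minimum index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Six minima: TSs join 1-2, 2-3 and 5-6, while minimum 4 is isolated.
print(sub_databases(6, [(1, 2), (2, 3), (5, 6)]))  # → [[1, 2, 3], [5, 6], [4]]
```

An isolated minimum (one with no transition states) still counts as a sub-database of size one in this sketch.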

Context and Motivation

In databases containing many thousands of minima and transition states (TSs), it is unlikely that all of them will be connected to one another. This is particularly true when the database has been grown using methods such as ADDPATH and MERGEDB. Instead, the database is likely to consist of many sub-databases of varying size. A disconnectivity graph can only plot one set of connected minima (i.e. one sub-database) at a time, so when one is constructed, much of the data in the min.data, points.min, points.ts and ts.data files is ignored.

Which sub-database gets plotted depends on the numerical argument to the CONNECTMIN keyword in the dinfo file. This argument refers to a minimum by its position in the min.data file: an argument of 12, for example, corresponds to line 12 of min.data. Only this minimum and the minima connected to it are plotted on the disconnectivity graph.
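To see which minima a given CONNECTMIN value would pick up, one can run a breadth-first search over the transition states starting from that minimum. This sketch assumes that each line of ts.data lists the indices of the two minima a transition state connects in its fourth and fifth whitespace-separated columns (the usual PATHSAMPLE layout; check this against your own files). It ignores any other dinfo settings that might exclude transition states from the graph, and the function name `minima_plotted` is hypothetical.

```python
from collections import defaultdict, deque


def minima_plotted(ts_lines, connectmin):
    """Return the sorted minimum indices reachable from `connectmin`.

    ts_lines: iterable of ts.data lines. Columns 4 and 5 (1-based) are
    assumed to hold the indices of the two minima each TS connects.
    Hypothetical helper for illustration only.
    """
    neighbours = defaultdict(set)
    for line in ts_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        a, b = int(fields[3]), int(fields[4])
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search outward from the CONNECTMIN minimum.
    seen = {connectmin}
    queue = deque([connectmin])
    while queue:
        m = queue.popleft()
        for n in neighbours[m]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return sorted(seen)


# Toy ts.data lines; only columns 4 and 5 (the two minima) matter here.
toy_ts = [
    "-100.5 12.3 1 12 7",
    "-99.8 11.9 1 7 3",
    "-98.2 12.1 1 20 21",
]
print(minima_plotted(toy_ts, 12))  # → [3, 7, 12]
```

On a real database the same function can be applied to an open file, e.g. `with open("ts.data") as f: print(minima_plotted(f, 12))`.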

The question, therefore, is how to connect minima already present in the min.data file efficiently. Connecting sub-databases that contain many minima matters most; connecting every sub-database with only two minima, for example, would probably be a waste of time, since doing so adds little new information.
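If each minimum has already been assigned a sub-database label (however that labelling was obtained), prioritising the larger sub-databases comes down to counting and sorting. In this sketch the label list, the `min_size` threshold of 3 and the function name `connection_priorities` are all illustrative assumptions.

```python
from collections import Counter


def connection_priorities(labels, min_size=3):
    """Rank sub-databases by size, dropping those smaller than min_size.

    labels[i] is the sub-database label of minimum i + 1 (the minimum on
    line i + 1 of min.data). Returns (label, size) pairs, largest first.
    Hypothetical helper for illustration only.
    """
    sizes = Counter(labels)
    ranked = [(lab, n) for lab, n in sizes.items() if n >= min_size]
    # Largest first; the sort is stable, so ties keep first-seen order.
    ranked.sort(key=lambda pair: -pair[1])
    return ranked


# Minima 1-10 fall into four sub-databases, labelled A to D.
labels = ["A", "A", "B", "A", "C", "C", "B", "D", "B", "B"]
print(connection_priorities(labels))  # → [('B', 4), ('A', 3)]
```

Raising or lowering `min_size` trades coverage against the time spent on small sub-databases.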

Another consideration is that connection attempts between sub-databases should be efficient. Namely, we want to try to connect sub-databases that are close to one another or, more specifically, sub-databases in which at least one minimum is close in chemical space to a minimum in another sub-database. This is especially important for large systems (such as large proteins with cofactors), as attempts to connect minima that are far apart can be very slow or even fail because of memory issues.
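A simple way to choose which pair of minima to try connecting between two sub-databases is to compare every minimum in one with every minimum in the other and take the closest pair. This sketch assumes the Cartesian coordinates of each minimum (as stored in points.min) have already been read into memory as one flat list of 3N values. It uses a plain root-mean-square distance with no translational, rotational or permutational alignment, so it is only meaningful for structures already in a common frame. The names `rms_distance` and `closest_pair` are hypothetical.

```python
import math
from itertools import product


def rms_distance(x, y):
    """Root-mean-square atomic displacement between two flat 3N coordinate lists.

    No alignment is performed, so both structures must share a reference frame.
    """
    if len(x) != len(y) or len(x) % 3:
        raise ValueError("coordinate lists must both have length 3N")
    natoms = len(x) // 3
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / natoms)


def closest_pair(coords, group_a, group_b):
    """Return (distance, min_a, min_b) for the closest minima across two groups.

    coords maps a minimum index to its coordinate list; group_a and group_b
    are lists of minimum indices from two different sub-databases.
    Hypothetical helper for illustration only.
    """
    return min(
        (rms_distance(coords[a], coords[b]), a, b)
        for a, b in product(group_a, group_b)
    )


# One-atom "structures": sub-database {1, 2} versus sub-database {5, 6}.
coords = {
    1: [0.0, 0.0, 0.0],
    2: [1.0, 0.0, 0.0],
    5: [4.0, 0.0, 0.0],
    6: [1.5, 0.0, 0.0],
}
print(closest_pair(coords, [1, 2], [5, 6]))  # → (0.5, 2, 6)
```

The all-pairs comparison scales with the product of the two sub-database sizes, so it can itself become expensive for very large sub-databases.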

Step 1: Using disconnectionDPS to determine the breakdown of sub-databases within your database

Requirements

A folder containing the files min.data, points.min, points.ts, ts.data and dinfo, the disconnectionDPS binary, and any other auxiliary files you may need.
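Before running disconnectionDPS it can help to confirm that the working folder holds everything listed above. This sketch only checks that the named files exist and that disconnectionDPS is executable. It does not validate file contents, and it assumes the binary sits in the same folder under exactly that name. The function name `missing_requirements` is hypothetical.

```python
import os

REQUIRED_FILES = ["min.data", "points.min", "points.ts", "ts.data", "dinfo"]
BINARY = "disconnectionDPS"


def missing_requirements(folder="."):
    """Return a list of problems found in `folder` (empty if all is well).

    Hypothetical helper: checks presence only, not file contents.
    """
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not os.path.isfile(os.path.join(folder, name))
    ]
    binary_path = os.path.join(folder, BINARY)
    if not os.path.isfile(binary_path):
        problems.append(f"missing binary: {BINARY}")
    elif not os.access(binary_path, os.X_OK):
        problems.append(f"binary is not executable: {BINARY}")
    return problems


if __name__ == "__main__":
    # Report anything missing from the current directory.
    for problem in missing_requirements():
        print(problem)
```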