Stochastic Density Fitting
Development of stochastic density fitting approaches
Project aims/abstract
One of the main bottlenecks in practical applications of quantum chemistry is the storage and the AO-MO transformation of two-electron integrals. These processes are highly memory intensive, which makes calculations challenging for systems of chemical interest.
In this project, we implement a tensor-factorisation-based technique and exploit the inherent sparsity of the resulting intermediates to reduce this memory requirement. The procedure will be based on the “RI-type” density fitting technique, and we sample a set of small but still non-negligible integrals in order to decrease the number of integrals stored.
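For orientation, the RI-type factorisation in its common Coulomb-metric form can be written as below. The notation is an assumption made here for illustration and is not taken from the project code: p, q, r, s label orbital basis functions, P, Q label auxiliary functions, and B denotes the three-index intermediate.

```latex
(pq|rs) \approx \sum_{P,Q} (pq|P)\,[\mathbf{V}^{-1}]_{PQ}\,(Q|rs)
        = \sum_{P} B_{pq}^{P}\, B_{rs}^{P},
\qquad
B_{pq}^{P} = \sum_{Q} (pq|Q)\,[\mathbf{V}^{-1/2}]_{QP},
\qquad
V_{PQ} = (P|Q)
```

With N orbital and N_aux auxiliary functions, storing B takes on the order of N^2 N_aux numbers instead of the N^4 of the full four-index tensor. Sampling the small elements of such three-index intermediates is one way the stored quantity could be reduced further.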
The method will be tested within deterministic methods (Hartree-Fock and MP2), and its performance will be compared with that of existing techniques. In addition, the technique will be implemented within coupled cluster quantum Monte Carlo, which requires only sufficiently good estimators of the exact values and is therefore expected to tolerate integral errors better. Applying the technique there may thus provide an even more powerful way of extending feasibility.
Current state of the project and next steps
Earlier stages of the project involved a review of the different types of density fitting techniques and the choice of a deterministic variant to serve as the basis for the stochastic algorithm. In addition, a density-fitted Hartree-Fock and MP2 code has been developed in C++, and it is now compatible with several proposed algorithms for stochastic fitting. These explore different combinations of sampling the objects that result from the tensor decomposition, as well as the effect of contraction order. The current algorithm is based on Vose's alias method and uses the integral values explicitly, so it does not yet deliver the desired memory advantage (although the fitting time is currently half that of the original technique).
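As a concrete reference point for the sampling step, the following is a minimal, self-contained sketch of Vose's alias method in C++. It is not the project's implementation: the `AliasTable` name and its members are illustrative assumptions. The table is built in O(n) time from non-negative weights, after which each sample costs one uniform integer draw and one uniform real draw.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative alias table (hypothetical names, not the project code).
struct AliasTable {
    std::vector<double> prob;        // probability of keeping column i
    std::vector<std::size_t> alias;  // index returned when column i is rejected
    std::vector<double> p;           // normalised selection probabilities

    explicit AliasTable(const std::vector<double>& weights)
        : prob(weights.size(), 1.0), alias(weights.size()), p(weights.size()) {
        const std::size_t n = weights.size();
        assert(n > 0);
        double total = 0.0;
        for (double w : weights) {
            assert(w >= 0.0);
            total += w;
        }
        assert(total > 0.0);

        // Scale probabilities so that the average column height is 1.
        std::vector<double> scaled(n);
        std::vector<std::size_t> small, large;
        for (std::size_t i = 0; i < n; ++i) {
            p[i] = weights[i] / total;
            scaled[i] = p[i] * static_cast<double>(n);
            alias[i] = i;
            (scaled[i] < 1.0 ? small : large).push_back(i);
        }

        // Pair each under-full column with an over-full one.
        while (!small.empty() && !large.empty()) {
            const std::size_t s = small.back();
            small.pop_back();
            const std::size_t l = large.back();
            large.pop_back();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] = (scaled[l] + scaled[s]) - 1.0;
            (scaled[l] < 1.0 ? small : large).push_back(l);
        }
        // Columns left over keep prob = 1, which absorbs rounding error.
    }

    // Draw one index i with probability p[i].
    template <class Rng>
    std::size_t sample(Rng& rng) const {
        std::uniform_int_distribution<std::size_t> column(0, prob.size() - 1);
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        const std::size_t i = column(rng);
        return coin(rng) < prob[i] ? i : alias[i];
    }
};
```

A table built from, for example, absolute integral values can be sampled with `AliasTable table(weights); std::mt19937_64 rng(seed); auto k = table.sample(rng);`. Each selected term should then be divided by `table.p[k]` (and by the number of samples) so that the resulting estimator stays unbiased.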
The next steps in the project are to enable integral screening, to estimate selection probabilities from bounds (without calculating the explicit integral values), and to parallelise the sampling process. After this, we will determine general thresholds for choosing the stochastic integrals based on test set results, and examine the performance of the resulting approach within both deterministic and stochastic methods.
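To illustrate what bound-based selection probabilities could look like, here is a hedged C++ sketch. It assumes the Cauchy-Schwarz inequality, |(k|l)| <= sqrt((k|k)) sqrt((l|l)), as the bound; the project text does not say which bounds are intended. The function names `bound_probabilities` and `sampled_sum` are hypothetical, and `std::discrete_distribution` stands in for whatever sampler is actually used. Only the diagonal integrals (k|k) are needed to build the probabilities, not the full set of integral values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: turn diagonal integrals diag[k] = (k|k) >= 0 into
// probabilities q[k] = sqrt((k|k)) / sum_m sqrt((m|m)).
// Because the Cauchy-Schwarz bound factorises, the product q[k] * q[l]
// is proportional to an upper bound on |(k|l)|.
std::vector<double> bound_probabilities(const std::vector<double>& diag) {
    assert(!diag.empty());
    std::vector<double> q(diag.size());
    double total = 0.0;
    for (std::size_t k = 0; k < diag.size(); ++k) {
        assert(diag[k] >= 0.0);
        q[k] = std::sqrt(diag[k]);
        total += q[k];
    }
    assert(total > 0.0);
    for (double& x : q) x /= total;
    return q;
}

// Unbiased importance-sampling estimate of sum_k term(k):
// draw k with probability q[k] and average term(k) / q[k].
// Requires q[k] > 0 wherever term(k) is non-zero.
template <class Term, class Rng>
double sampled_sum(const std::vector<double>& q, Term term,
                   std::size_t n_samples, Rng& rng) {
    assert(n_samples > 0);
    std::discrete_distribution<std::size_t> pick(q.begin(), q.end());
    double acc = 0.0;
    for (std::size_t s = 0; s < n_samples; ++s) {
        const std::size_t k = pick(rng);
        acc += term(k) / q[k];
    }
    return acc / static_cast<double>(n_samples);
}
```

The closer the probabilities follow the magnitude of the summed terms, the smaller the variance of the estimate; in the limiting case where every term is exactly proportional to its probability, each sample already returns the exact sum.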
Useful skills and knowledge
This project requires knowledge of the following (which is also what you can prepare in advance if you would like):
Theoretical
- Familiarity with the integral types of electronic structure theory (including their symmetries), and the efficient process of integral transformation
- Basic notions of linear algebra, and matrix decomposition techniques
- Understanding the mindset of scaling arguments (memory and computational)
- Understanding of Hartree-Fock and MP2 theory, and the basic notions of coupled cluster theory (derivation is not required)
Practical
- Basic understanding of C++ syntax (or of the syntax of another programming language, e.g. Python, together with a willingness to explore how C++ works)
- Some familiarity with terminal commands, bash scripting, and the vi editor
Learning outcomes
Theoretical
- Navigating electronic structure literature on integrals, and finding relevant information for understanding/implementation purposes
- Knowledge of existing approximation techniques that are used extensively in the current literature
- Understanding the context of fitting (where and why we use it in the methods we are interested in, and what the advantage/limitations of the proposed technique are)
- Knowledge of relevant statistical measures for performance testing
Practical
- Knowledge of C++-specific structures
- Familiarity with, and practical use of, OpenMP/MPI parallelisation techniques
- Efficiency optimisation of codes: using relevant matrix operation packages, and appropriate computational algorithms
- Efficient ways of dealing with test sets and extracting data (bash/Python scripting)
- Using Linux-based systems, computer clusters and schedulers
Interesting references
- O. Vahtras, J. Almlöf, and M. W. Feyereisen, Chem. Phys. Lett. 213, 514–518 (1993).
- M. Vose, IEEE Trans. Softw. Eng. 17, 972–975 (1991).
- Practical account on the alias method
- T. Y. Takeshita, W. A. de Jong, D. Neuhauser, R. Baer, and E. Rabani, J. Chem. Theory Comput. 13, 4605–4610 (2017).