Code standard

General considerations

The PySCF code base is designed to provide a convenient environment for the development of new computational methods, ranging from proof-of-concept implementations to calculations on moderate-size systems. Our emphasis is first on simplicity, next on generality, and last on efficiency. We favor implementations that have a clear structure, with optimization only at the Python level. If Python performance becomes a major bottleneck, parts of the implementation can be written in C to improve efficiency. The following guidelines (not strict rules!) have been followed in the development of PySCF. Please refer to them when suggesting new contributions.

  • 90/10 functional/OOP. Functions are pure unless performance is critical.

  • 90/10 Python/C. Only computational hot spots are written in C.

  • To extend Python functions with C:

    • Follow the C89 (gnu89) standard for C code, except for complex numbers and variable-length arrays. http://flash-gordon.me.uk/ansi.c.txt

    • Use ctypes to call C functions.

  • Be conservative with advanced language features.

  • Minimal dependence principle

    • Minimal requirements on third-party programs or libraries.

    • Loose coupling between modules, so that the failure of one module has minimal effect on other modules.

    • Third-party Python library imports need either back-up implementations or error/exception handling to avoid breaking the import chain.

  • Guidelines for use of external C and Fortran libraries within C extensions to PySCF. The extensions are compiled and linked into PySCF, with compile/link flags resolved by CMake.

    • BLAS, FFTW: Yes.

    • LAPACK: Yes, but not recommended. LAPACK can be used in the PySCF C-level library. However, we recommend restructuring your code by moving all linear algebra and sparse matrix operations to NumPy operations in pure Python.

    • MPI and other parallel libraries: No. MPI communications should be implemented in Python through the MPI4py library.

  • Code format. Code should comply with the [PEP8](https://www.python.org/dev/peps/pep-0008/) style.
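The guideline on third-party imports can be illustrated with a minimal sketch. This is not PySCF code: `fast_linalg_backend` is a hypothetical optional package and `dot` is a hypothetical helper, chosen only to show how a guarded import with a pure-Python back-up keeps the import chain intact.

```python
# Sketch: guard an optional third-party import so that a missing package
# never breaks the import chain. `fast_linalg_backend` is a hypothetical
# package name used purely for illustration.
try:
    from fast_linalg_backend import dot as _fast_dot  # optional accelerator
except ImportError:
    _fast_dot = None


def dot(a, b):
    """Inner product of two equal-length sequences."""
    if _fast_dot is not None:
        return _fast_dot(a, b)
    # Back-up implementation in pure Python.
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x * y for x, y in zip(a, b))


result = dot([1, 2, 3], [4, 5, 6])
```

Callers only ever see `dot`, so whether the optional package is installed changes performance but not behavior.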

Naming conventions

  • A leading or trailing underscore in a function name has a special meaning:

    • Functions with a leading underscore, like _fn, are private functions. They are typically not documented, and their use is not recommended.

    • Functions with a trailing underscore, like fn_, have side effects. The side effects include changes to the input arguments, runtime modification of class definitions (attributes or members), or modification of module definitions (global variables or functions), etc.

    • Regular (pure) functions have no leading or trailing underscore.
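The three naming cases can be sketched as follows. The function names (`scaled`, `scale_`, `_check_factor`) are hypothetical and exist only to illustrate the convention.

```python
def _check_factor(factor):
    """Leading underscore: private helper, not part of the public API."""
    if factor == 0:
        raise ValueError("factor must be non-zero")


def scaled(values, factor):
    """Pure function: returns a new list and leaves `values` untouched."""
    _check_factor(factor)
    return [v * factor for v in values]


def scale_(values, factor):
    """Trailing underscore: modifies `values` in place (a side effect)."""
    _check_factor(factor)
    for i, v in enumerate(values):
        values[i] = v * factor
    return values


data = [1.0, 2.0]
new = scaled(data, 2.0)   # `data` is unchanged after this call
scale_(data, 3.0)         # `data` itself is modified by this call
```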

API conventions

  • gto.Mole (or gto.Cell for PBC calculations) holds all global parameters, like the log level, the max memory usage etc. They are used as the default values for all other classes.

  • Classes for quantum chemistry models or algorithms

    • Most QC method classes (like HF, CASSCF, FCI, …) have three attributes, verbose, stdout and max_memory, which are copied directly from gto.Mole (or gto.Cell). Overwriting these attributes only affects the behavior of the local instance of that method class. In the following example, setting mf.verbose = 0 mutes all messages produced by the RHF method, and the output of MP2 is written to the log file example.log:

      >>> from pyscf import gto, scf, mp
      >>> mol = gto.M(atom='H 0 0 0; H 0 0 1', verbose=5)
      >>> mf = scf.RHF(mol)
      >>> mf.verbose = 0
      >>> mf.kernel()
      >>> mp2 = mp.MP2(mf)
      >>> mp2.stdout = open('example.log', 'w')
      >>> mp2.kernel()
      
    • Method classes only hold the options or environment settings (like convergence threshold, max iterations, …) that control the behavior/convergence of the method. Intermediate states at runtime are not supposed to be saved in the method class (in contrast to the usual object-oriented paradigm). However, the final results or outputs can be kept in the method object so that they can be easily accessed in subsequent steps. Assume that the result attributes will be used as default inputs or environment settings by other objects in the rest of the program. Result attributes should therefore be treated as immutable once they have been generated and stored (after calling the kernel() method) in a particular object.

    • In the __init__ function, initialize/define the problem size. The problem size parameters (like num_orbitals etc.) can be considered part of the environment. They should be immutable.

    • Kernel functions: Classes for QC models should provide a kernel() method as the entry point/main function. The kernel() function then calls other code to complete the calculation. Although not required, it is recommended that the kernel function return certain key results. If your class inherits from pyscf.lib.StreamObject, it has a run() method which calls kernel() and returns the object itself. One can call either the kernel() method or the run() method to start a QC calculation.

  • Function arguments

    • The first argument is a handler. The handler is a gto.Mole object, a mean-field object, or a post-Hartree-Fock object.

  • Return value. Create return values for all functions whenever possible. For methods defined in a class, return self instead of None if the method has no particular value to return.
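The conventions above can be sketched without PySCF itself. In the toy code below, `ToyMole` and `ToyMethod` are hypothetical stand-ins (not PySCF classes), and the "calculation" is just a Newton iteration for the square root of 2. The sketch assumes only what the list states: defaults are copied from the mol-like object, options live on the instance, intermediates stay local to kernel(), and final results are stored on the object.

```python
import io
import sys


class ToyMole:
    """Hypothetical stand-in for gto.Mole: holds the global defaults."""

    def __init__(self, verbose=3, max_memory=4000, stdout=None):
        self.verbose = verbose
        self.max_memory = max_memory
        self.stdout = stdout if stdout is not None else sys.stdout


class ToyMethod:
    """Hypothetical stand-in for a QC method class."""

    def __init__(self, mol):
        self.mol = mol
        # Defaults copied from the mol-like object. Overriding them later
        # affects this instance only.
        self.verbose = mol.verbose
        self.stdout = mol.stdout
        self.max_memory = mol.max_memory
        # Control options.
        self.max_cycle = 50
        self.conv_tol = 1e-9
        # Final results, filled in by kernel().
        self.converged = False
        self.e_tot = None

    def kernel(self, x0=1.0):
        """Entry point: Newton iteration for sqrt(2) as a toy calculation."""
        x = x0  # intermediates stay in local variables, not on self
        for cycle in range(self.max_cycle):
            x_new = 0.5 * (x + 2.0 / x)
            if self.verbose > 0:
                self.stdout.write("cycle %d  x = %.12f\n" % (cycle, x_new))
            done = abs(x_new - x) < self.conv_tol
            x = x_new
            if done:
                self.converged = True
                break
        self.e_tot = x
        return self.e_tot  # return the key result as well as storing it


mol = ToyMole(verbose=5)

quiet = ToyMethod(mol)
quiet.verbose = 0          # mutes this instance only
e1 = quiet.kernel()

log = io.StringIO()
logged = ToyMethod(mol)
logged.stdout = log        # redirects this instance only
e2 = logged.kernel()
```

Overriding `verbose` or `stdout` on one instance leaves `mol` and the other instance untouched, which mirrors the behavior described for mf.verbose and mp2.stdout.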

Unit Tests and Example Scripts

  • Examples for modules should be placed in the appropriate directory inside the /examples directory. While the examples should be light enough to run on a modest personal computer, the examples should not be trivial. Instead, the point of the examples is to showcase the functionality of the module. The format for naming examples is:

    /examples/name_of_module/XX-function_name.py
    

    where XX is a two-digit numeric string.

  • Test cases are placed in the /test/name_of_module directory and performed with nosetest (https://nose.readthedocs.io/en/latest/). These tests are to ensure the robustness of both simple functions and more complex drivers between version changes.

General designs

Kernel and Stream functions

Every class has a kernel method which serves as the entry point or driver of the method. Once an object of a method class has been created, you can always call .kernel() to start or restart a calculation.

The return value of the kernel method differs from class to class. To unify the return value, the package introduces stream methods to pipe the computing stream. A stream method of an object returns only the object itself. Three general stream methods are available for most method classes:

1. The .set method updates object attributes. For example:

mf = scf.RHF(mol).set(conv_tol=1e-5)

is identical to the two statements:

mf = scf.RHF(mol)
mf.conv_tol = 1e-5

2. The .run method passes the call to the .kernel method. Positional arguments given to .run are passed on to the kernel function. If keyword arguments are given, .run first calls .set to update the attributes and then executes .kernel. For example:

mf = scf.RHF(mol).run(dm_init, conv_tol=1e-5)

is identical to the three statements:

mf = scf.RHF(mol)
mf.conv_tol = 1e-5
mf.kernel(dm_init)

3. The .apply method passes the current object (as the first argument) to the given function/class and returns a new object. Any positional and keyword arguments are passed on to the function/class. For example:

mc = mol.apply(scf.RHF).run().apply(mcscf.CASSCF, 6, 4, frozen=4)

is identical to:

mf = scf.RHF(mol)
mf.kernel()
mc = mcscf.CASSCF(mf, 6, 4, frozen=4)

Aside from the three general stream methods, regular class methods may also return the object itself when they have no particular value to return. Using the stream methods, you can evaluate certain quantities with one line of code:

dm = gto.M(atom='H 0 0 0; H 0 0 1') \
.apply(scf.RHF) \
.dump_flags() \
.run() \
.make_rdm1()
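To make the semantics of the three stream methods concrete, here is a minimal sketch of how they could be implemented. `ToyStream`, `Counter` and `Doubler` are hypothetical classes written for this illustration, and the sketch is not the actual pyscf.lib.StreamObject code.

```python
class ToyStream:
    """Hypothetical base class mimicking the three stream methods."""

    def kernel(self, *args, **kwargs):
        raise NotImplementedError

    def set(self, **kwargs):
        # Update attributes, then return the object itself.
        for key, val in kwargs.items():
            setattr(self, key, val)
        return self

    def run(self, *args, **kwargs):
        # Keyword arguments update attributes; positional ones go to kernel.
        self.set(**kwargs)
        self.kernel(*args)
        return self

    def apply(self, fn, *args, **kwargs):
        # Pass the current object as the first argument of fn.
        return fn(self, *args, **kwargs)


class Counter(ToyStream):
    def __init__(self, start=0):
        self.start = start
        self.step = 1
        self.total = None

    def kernel(self, n=3):
        self.total = self.start + n * self.step
        return self.total


class Doubler(ToyStream):
    def __init__(self, prev, offset=0):
        self.prev = prev
        self.offset = offset
        self.total = None

    def kernel(self):
        self.total = 2 * self.prev.total + self.offset
        return self.total


# One chained expression: configure and run Counter, then feed it to Doubler.
result = Counter().run(4, step=5).apply(Doubler, offset=1).run()
```

Because .set and .run return the object rather than the kernel result, the result has to be read from an attribute (`result.total` here), which is why results are stored on the object.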

Pure function and Class

Classes are designed to hold only the final results and the control parameters, such as the maximum number of iterations, convergence threshold, etc. Intermediates are NOT saved in the class. After the .kernel() or .run() method is called, results are generated and saved in the object. For example:

from pyscf import gto, scf, cc
mol = gto.M(atom='H 0 0 0; H 0 0 1.1', basis='ccpvtz')
mf = scf.RHF(mol).run()
mycc = cc.CCSD(mf).run()
print(mycc.e_tot)
print(mycc.e_corr)
print(mycc.t1.shape)
print(mycc.t2.shape)

Many useful functions are defined at both the module level and class level. They can be accessed from either the module functions or the class methods and the return values should be the same:

vj, vk = scf.hf.get_jk(mol, dm)
vj, vk = scf.hf.SCF(mol).get_jk(mol, dm)

Note that some module functions may require an instance of the class as the first argument.
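One way to obtain identical results from both entry points is to let the method delegate to the module-level function. The sketch below uses hypothetical names (`get_norm`, `ToyModel`) and only illustrates the pattern. It makes no claim about how any particular PySCF function is wired.

```python
def get_norm(obj, vec):
    """Module-level pure function; `obj` is the handler (first argument)."""
    return sum(v * v for v in vec) ** 0.5 * obj.scale


class ToyModel:
    def __init__(self, scale=1.0):
        self.scale = scale

    def get_norm(self, vec):
        # Thin wrapper: delegate to the module-level function so that both
        # call styles share one implementation.
        return get_norm(self, vec)


model = ToyModel(scale=2.0)
a = get_norm(model, [3.0, 4.0])    # module-level call
b = model.get_norm([3.0, 4.0])     # method call
```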

Most functions and classes are pure, i.e. no intermediate state is held within the classes, and the arguments of the methods and functions are not modified during calculations. These functions can be called any number of times in any order and their return values should always be the same.

Exceptions are usually marked with a trailing underscore in the function name, e.g. mcscf.state_average_(mc), where the attributes of the mc object may be changed or overwritten by the state_average_ method. Take care when you see functions or methods with this suffix.

Global configurations

The global configuration file is a Python script that contains PySCF configurations. When the pyscf module is imported in a Python program (or Python interpreter), the package preloads the global configuration file and takes its settings as the default values for function parameters and class attributes during initialization. For example, the configuration file below detects the memory available in the operating system at runtime and sets the maximum memory for PySCF:

$ cat ~/.pyscf_conf.py
import psutil
# psutil reports bytes, while PySCF memory settings are expressed in MB.
# Reading the named field avoids depending on the platform-specific tuple length.
MAX_MEMORY = psutil.virtual_memory().available / 1e6

By setting MAX_MEMORY in the global configuration file, you no longer need a statement that sets the max_memory attribute in every script. The dynamically determined max_memory is loaded automatically during program initialization.

There are two ways to make the PySCF package load the global configuration. One is to create a configuration file .pyscf_conf.py in the home directory or the work directory. The other is to set the environment variable PYSCF_CONFIG_FILE to the absolute path of the configuration file. The environment variable PYSCF_CONFIG_FILE takes priority over the configuration file in the default locations (home directory or work directory). If PYSCF_CONFIG_FILE is set, the program reads the configuration from $PYSCF_CONFIG_FILE. If PYSCF_CONFIG_FILE is not set, or the file it points to does not exist, the program falls back to the default locations of .pyscf_conf.py. If no configuration file exists, the program uses the built-in configuration, which generally consists of conservative settings.
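The lookup order described above can be sketched as a small function. `find_config_file` is a hypothetical helper written for this illustration, not a PySCF function. The text does not say which default location wins when both exist, so the sketch assumes the work directory is checked before the home directory.

```python
import os


def find_config_file(environ=None, workdir=None, homedir=None):
    """Return the path of the configuration file to load, or None."""
    environ = os.environ if environ is None else environ
    workdir = os.getcwd() if workdir is None else workdir
    homedir = os.path.expanduser("~") if homedir is None else homedir

    candidates = [
        environ.get("PYSCF_CONFIG_FILE"),          # highest priority
        os.path.join(workdir, ".pyscf_conf.py"),   # assumption: work dir first
        os.path.join(homedir, ".pyscf_conf.py"),   # assumption: home dir second
    ]
    for path in candidates:
        # Skip an unset variable and any path that does not exist.
        if path is not None and os.path.isfile(path):
            return path
    return None  # caller falls back to the built-in settings
```

Passing `environ`, `workdir` and `homedir` explicitly keeps the function pure, so the lookup can be exercised without touching the real environment.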

In the source code, global configurations are loaded by importing the pyscf.__config__ module:

from pyscf import __config__
MAX_MEMORY = getattr(__config__, 'MAX_MEMORY')

Please refer to the source code for the available configurations.
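When a setting may be absent from the configuration, the three-argument form of getattr supplies a built-in fallback. The sketch below uses a `SimpleNamespace` as a stand-in for the configuration module. The names `config`, `BUILTIN_DEFAULTS` and `get_setting`, and the numeric values, are all illustrative assumptions rather than PySCF definitions.

```python
from types import SimpleNamespace

# Stand-in for the configuration module: only MAX_MEMORY was set by the user.
config = SimpleNamespace(MAX_MEMORY=8000)

# Illustrative built-in defaults (values are assumptions for this sketch).
BUILTIN_DEFAULTS = {"MAX_MEMORY": 4000, "VERBOSE": 3}


def get_setting(name):
    """Return the configured value, or the built-in default if it is unset."""
    return getattr(config, name, BUILTIN_DEFAULTS[name])


max_memory = get_setting("MAX_MEMORY")  # taken from the configuration
verbose = get_setting("VERBOSE")        # falls back to the built-in default
```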

Scanner

A scanner is a function that takes a Mole (or Cell) object as input and returns the energy or nuclear gradients of the given Mole (or Cell) object. A scanner can be regarded as a shortcut for a sequence of statements: initializing the required calculation model with the necessary precomputation, updating the attributes based on the settings of the referenced object, calling the kernel function, and finally returning the results. For example:

cc_scanner = gto.M().apply(scf.RHF).apply(cc.CCSD).as_scanner()
for r in (1.0, 1.1, 1.2):
  print(cc_scanner(gto.M(atom='H 0 0 0; H 0 0 %g'%r)))

Equivalent but slightly more verbose code is:

for r in (1.0, 1.1, 1.2):
  mol = gto.M(atom='H 0 0 0; H 0 0 %g'%r)
  mf = scf.RHF(mol).run()
  mycc = cc.CCSD(mf).run()
  print(mycc.e_tot)

Two types of scanner are available in the package: the energy scanner and the nuclear gradients scanner. The example above is an energy scanner. An energy scanner returns only the energy of the given molecular structure, while a nuclear gradients scanner returns the nuclear gradients in addition.

A scanner is a special derived object of the caller. Most methods defined in the caller's class can be used with the scanner object. For example:

mf_scanner = gto.M().apply(scf.RHF).as_scanner()
mf_scanner(gto.M(atom='H 0 0 0; H 0 0 1.2'))
mf_scanner.analyze()
dm1 = mf_scanner.make_rdm1()

mf_grad_scanner = mf_scanner.nuc_grad_method().as_scanner()
mf_grad_scanner(gto.M(atom='H 0 0 0; H 0 0 1.2'))

As shown in the example above, a scanner behaves much like the corresponding class object, except that it does not need the kernel or run methods to run a calculation. Given a molecular structure, the scanner automatically checks and updates the necessary object dependencies and passes the workflow to the kernel method. The computational results are held in the scanner object, just as in a regular class object.

To make the structure of scanner objects uniform across all methods, two attributes (.e_tot and .converged) are defined for all energy scanners and three attributes (.e_tot, .de and .converged) are defined for all nuclear gradients scanners.
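The scanner idea can be sketched in plain Python. `ToyEnergy` and `ToyScanner` below are hypothetical classes, and the "energy" is just a sum of squared coordinates. The sketch only illustrates a callable object that is derived from the caller's class, reruns the kernel for each new geometry, and keeps .e_tot and .converged up to date. It does not reproduce PySCF's implementation.

```python
class ToyEnergy:
    """Hypothetical method class with a trivial 'energy' model."""

    def __init__(self, geometry):
        self.geometry = geometry
        self.shift = 0.0          # control option
        self.e_tot = None         # result
        self.converged = False    # result

    def kernel(self):
        self.e_tot = sum(x * x for x in self.geometry) + self.shift
        self.converged = True
        return self.e_tot

    def describe(self):
        return "E = %.3f" % self.e_tot

    def as_scanner(self):
        return ToyScanner(self)


class ToyScanner(ToyEnergy):
    """Derived from the caller's class, so its other methods stay usable."""

    def __init__(self, method):
        # Copy options (and any existing results) from the method object.
        self.__dict__.update(method.__dict__)

    def __call__(self, geometry):
        # Update the dependence on the geometry, then rerun the kernel.
        self.geometry = geometry
        self.kernel()
        return self.e_tot


scanner = ToyEnergy([0.0]).set_shift if False else ToyEnergy([0.0]).as_scanner()
energies = [scanner([r]) for r in (1.0, 2.0, 3.0)]
summary = scanner.describe()   # methods of the caller's class still work
```

After the loop, the scanner holds the results of the last geometry, in the same attributes a regular method object would use.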