\(\renewcommand{\AA}{\text{Å}}\)

fitpod command

Syntax

fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

fitpod = style name of this command
Ta_param.pod = an input file that describes proper orthogonal descriptors (PODs)
Ta_data.pod = an input file that specifies DFT data used to fit a POD potential
Ta_coefficients.pod (optional) = an input file that specifies trainable coefficients of a POD potential

Examples

fitpod Ta_param.pod Ta_data.pod
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

Description

Added in version 22Dec2022.

Fit a machine-learning interatomic potential (ML-IAP) based on proper orthogonal descriptors (POD); please see (Nguyen and Rohskopf), (Nguyen2023), (Nguyen2024), and (Nguyen and Sema) for details. The fitted POD potential can be used to run MD simulations via pair_style pod.

Two input files are required for this command. The first input file describes a POD potential parameter settings, while the second input file specifies the DFT data used for the fitting procedure. All keywords except species have default values. If a keyword is not set in the input file, its default value is used. The table below has one-line descriptions of all the keywords that can be used in the first input file (i.e. Ta_param.pod)

Keyword	Default	Type	Description
species	(none)	STRING	Chemical symbols for all elements in the system and have to match XYZ training files.
pbc	1 1 1	INT	three integer constants specify boundary conditions
rin	0.5	REAL	a real number specifies the inner cut-off radius
rcut	5.0	REAL	a real number specifies the outer cut-off radius
bessel_polynomial_degree	4	INT	the maximum degree of Bessel polynomials
inverse_polynomial_degree	8	INT	the maximum degree of inverse radial basis functions
number_of_environment_clusters	1	INT	the number of clusters for environment-adaptive potentials
number_of_principal_components	2	INT	the number of principal components for dimensionality reduction
onebody	1	BOOL	turns on/off one-body potential
twobody_number_radial_basis_functions	8	INT	number of radial basis functions for two-body potential
threebody_number_radial_basis_functions	6	INT	number of radial basis functions for three-body potential
threebody_angular_degree	5	INT	angular degree for three-body potential
fourbody_number_radial_basis_functions	4	INT	number of radial basis functions for four-body potential
fourbody_angular_degree	3	INT	angular degree for four-body potential
fivebody_number_radial_basis_functions	0	INT	number of radial basis functions for five-body potential
fivebody_angular_degree	0	INT	angular degree for five-body potential
sixbody_number_radial_basis_functions	0	INT	number of radial basis functions for six-body potential
sixbody_angular_degree	0	INT	angular degree for six-body potential
sevenbody_number_radial_basis_functions	0	INT	number of radial basis functions for seven-body potential
sevenbody_angular_degree	0	INT	angular degree for seven-body potential

Note that both the number of radial basis functions and angular degree must decrease as the body order increases. The next table describes all keywords that can be used in the second input file (i.e. Ta_data.pod in the example above):

Keyword	Default	Type	Description
file_format	extxyz	STRING	only the extended xyz format (extxyz) is currently supported
file_extension	xyz	STRING	extension of the data files
path_to_training_data_set	(none)	STRING	specifies the path to training data files in double quotes
path_to_test_data_set	“”	STRING	specifies the path to test data files in double quotes
path_to_environment_configuration_set	“”	STRING	specifies the path to environment configuration files in double quotes
fraction_training_data_set	1.0	REAL	a real number (<= 1.0) specifies the fraction of the training set used to fit POD
randomize_training_data_set	0	BOOL	turns on/off randomization of the training set
fraction_test_data_set	1.0	REAL	a real number (<= 1.0) specifies the fraction of the test set used to validate POD
randomize_test_data_set	0	BOOL	turns on/off randomization of the test set
fitting_weight_energy	100.0	REAL	a real constant specifies the weight for energy in the least-squares fit
fitting_weight_force	1.0	REAL	a real constant specifies the weight for force in the least-squares fit
fitting_regularization_parameter	1.0e-10	REAL	a real constant specifies the regularization parameter in the least-squares fit
error_analysis_for_training_data_set	0	BOOL	turns on/off error analysis for the training data set
error_analysis_for_test_data_set	0	BOOL	turns on/off error analysis for the test data set
basename_for_output_files	pod	STRING	a basename string added to the output files
precision_for_pod_coefficients	8	INT	number of digits after the decimal points for numbers in the coefficient file
group_weights	global	STRING	`table` uses group weights defined for each group named by filename

All keywords except path_to_training_data_set have default values. If a keyword is not set in the input file, its default value is used. After successful training, a number of output files are produced, if enabled:

<basename>_training_errors.pod reports the errors in energy and forces for the training data set
<basename>_training_analysis.pod reports detailed errors for all training configurations
<basename>_test_errors.pod reports errors for the test data set
<basename>_test_analysis.pod reports detailed errors for all test configurations
<basename>_coefficients.pod contains the coefficients of the POD potential

After training the POD potential, Ta_param.pod and <basename>_coefficients.pod are the two files needed to use the POD potential in LAMMPS. See pair_style pod for using the POD potential. Examples about training and using POD potentials are found in the directory lammps/examples/PACKAGES/pod and the Github repo https://github.com/cesmix-mit/pod-examples.

Loss Function Group Weights

The group_weights keyword in the data.pod file is responsible for weighting certain groups of configurations in the loss function. For example:

group_weights table
Displaced_A15 100.0 1.0
Displaced_BCC 100.0 1.0
Displaced_FCC 100.0 1.0
Elastic_BCC   100.0 1.0
Elastic_FCC   100.0 1.0
GSF_110       100.0 1.0
GSF_112       100.0 1.0
Liquid        100.0 1.0
Surface       100.0 1.0
Volume_A15    100.0 1.0
Volume_BCC    100.0 1.0
Volume_FCC    100.0 1.0

This will apply an energy weight of 100.0 and a force weight of 1.0 for all groups in the Ta example. The groups are named by their respective filename. If certain groups are left out of this table, then the globally defined weights from the fitting_weight_energy and fitting_weight_force keywords will be used.

POD Potential

We consider a multi-element system of N atoms with \(N_{\rm e}\) unique elements. We denote by \(\boldsymbol r_n\) and \(Z_n\) position vector and type of an atom n in the system, respectively. Note that we have \(Z_n \in \{1, \ldots, N_{\rm e} \}\), \(\boldsymbol R = (\boldsymbol r_1, \boldsymbol r_2, \ldots, \boldsymbol r_N) \in \mathbb{R}^{3N}\), and \(\boldsymbol Z = (Z_1, Z_2, \ldots, Z_N) \in \mathbb{N}^{N}\). The total energy of the POD potential is expressed as \(E(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N E_i(\boldsymbol R_i, \boldsymbol Z_i)\), where

\[E_i(\boldsymbol R_i, \boldsymbol Z_i) \ = \ \sum_{m=1}^M c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\]

Here \(c_m\) are trainable coefficients and \(\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\) are per-atom POD descriptors. Summing the per-atom descriptors over \(i\) yields the global descriptors \(d_m(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\). It thus follows that \(E(\boldsymbol R, \boldsymbol Z) = \sum_{m=1}^M c_m d_m(\boldsymbol R, \boldsymbol Z)\).

The per-atom POD descriptors include one, two, three, four, five, six, and seven-body descriptors, which can be specified in the first input file. Furthermore, the per-atom POD descriptors also depend on the number of environment clusters specified in the first input file. Please see (Nguyen2024) and (Nguyen and Sema) for the detailed description of the per-atom POD descriptors.

Training

A POD potential is trained using the least-squares regression against density functional theory (DFT) data. Let \(J\) be the number of training configurations, with \(N_j\) being the number of atoms in the j-th configuration. The training configurations are extracted from the extended XYZ files located in a directory (i.e., path_to_training_data_set in the second input file). Let \(\{E^{\star}_j\}_{j=1}^{J}\) and \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) be the DFT energies and forces for \(J\) configurations. Next, we calculate the global descriptors and their derivatives for all training configurations. Let \(d_{jm}, 1 \le m \le M\), be the global descriptors associated with the j-th configuration, where \(M\) is the number of global descriptors. We then form a matrix \(\boldsymbol A \in \mathbb{R}^{J \times M}\) with entries \(A_{jm} = d_{jm}/ N_j\) for \(j=1,\ldots,J\) and \(m=1,\ldots,M\). Moreover, we form a matrix \(\boldsymbol B \in \mathbb{R}^{\mathcal{N} \times M}\) by stacking the derivatives of the global descriptors for all training configurations from top to bottom, where \(\mathcal{N} = 3\sum_{j=1}^{J} N_j\).

The coefficient vector \(\boldsymbol c\) of the POD potential is found by solving the following least-squares problem

\[{\min}_{\boldsymbol c \in \mathbb{R}^{M}} \ w_E \|\boldsymbol A \boldsymbol c - \bar{\boldsymbol E}^{\star} \|^2 + w_F \|\boldsymbol B \boldsymbol c + \boldsymbol F^{\star} \|^2 + w_R \|\boldsymbol c \|^2,\]

where \(w_E\) and \(w_F\) are weights for the energy (fitting_weight_energy) and force (fitting_weight_force), respectively; and \(w_R\) is the regularization parameter (fitting_regularization_parameter). Here \(\bar{\boldsymbol E}^{\star} \in \mathbb{R}^{J}\) is a vector of with entries \(\bar{E}^{\star}_j = E^{\star}_j/N_j\) and \(\boldsymbol F^{\star}\) is a vector of \(\mathcal{N}\) entries obtained by stacking \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) from top to bottom.

Validation

POD potential can be validated on a test dataset in a directory specified by setting path_to_test_data_set in the second input file. It is possible to validate the POD potential after the training is complete. This is done by providing the coefficient file as an input to fitpod, for example,

fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod

Restrictions

This command is part of the ML-POD package. It is only enabled if LAMMPS was built with that package. See the Build package page for more info.

Default

The keyword defaults are also given in the description of the input files.

(Nguyen and Rohskopf) Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).

(Nguyen2023) Nguyen, Physical Review B, 107(14), 144103, (2023).

(Nguyen2024) Nguyen, Journal of Computational Physics, 113102, (2024).

(Nguyen and Sema) Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).