fitpod command
Syntax
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
fitpod = style name of this command
Ta_param.pod = an input file that describes proper orthogonal descriptors (PODs)
Ta_data.pod = an input file that specifies DFT data used to fit a POD potential
Ta_coefficients.pod (optional) = an input file that specifies trainable coefficients of a POD potential
Examples
fitpod Ta_param.pod Ta_data.pod
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
Description
Added in version 22Dec2022.
Fit a machine-learning interatomic potential (ML-IAP) based on proper orthogonal descriptors (POD); please see (Nguyen and Rohskopf), (Nguyen2023), (Nguyen2024), and (Nguyen and Sema) for details. The fitted POD potential can be used to run MD simulations via pair_style pod.
Two input files are required for this command. The first input file
describes the POD potential parameter settings, while the second input
file specifies the DFT data used for the fitting procedure. All keywords
except species have default values. If a keyword is not set in the
input file, its default value is used. The table below gives one-line
descriptions of all the keywords that can be used in the first input
file (i.e. Ta_param.pod in the example above):
Keyword | Default | Type | Description |
---|---|---|---|
species | (none) | STRING | chemical symbols of all elements in the system; these must match the symbols in the XYZ training files |
pbc | 1 1 1 | INT | three integers that specify the boundary conditions in the x, y, and z directions |
rin | 0.5 | REAL | inner cut-off radius |
rcut | 5.0 | REAL | outer cut-off radius |
bessel_polynomial_degree | 4 | INT | maximum degree of the Bessel polynomials |
inverse_polynomial_degree | 8 | INT | maximum degree of the inverse radial basis functions |
number_of_environment_clusters | 1 | INT | number of clusters for environment-adaptive potentials |
number_of_principal_components | 2 | INT | number of principal components for dimensionality reduction |
onebody | 1 | BOOL | turns the one-body potential on/off |
twobody_number_radial_basis_functions | 8 | INT | number of radial basis functions for the two-body potential |
threebody_number_radial_basis_functions | 6 | INT | number of radial basis functions for the three-body potential |
threebody_angular_degree | 5 | INT | angular degree for the three-body potential |
fourbody_number_radial_basis_functions | 4 | INT | number of radial basis functions for the four-body potential |
fourbody_angular_degree | 3 | INT | angular degree for the four-body potential |
fivebody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the five-body potential |
fivebody_angular_degree | 0 | INT | angular degree for the five-body potential |
sixbody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the six-body potential |
sixbody_angular_degree | 0 | INT | angular degree for the six-body potential |
sevenbody_number_radial_basis_functions | 0 | INT | number of radial basis functions for the seven-body potential |
sevenbody_angular_degree | 0 | INT | angular degree for the seven-body potential |
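The parameter file is a plain-text list of keyword settings. As a hedged illustration, the Python sketch below reads such a file, fills in the defaults from the table, and checks the rule that the number of radial basis functions and the angular degree decrease with body order. It assumes one keyword per line followed by its value(s), that `#` starts a comment, and that a setting of 0 disables a body order; it reads "must decrease" literally as strictly decreasing. The names `parse_pod_params` and `check_body_orders` are hypothetical and not part of LAMMPS.

```python
from typing import Dict, List

# Defaults from the parameter-file table; "species" has no default.
POD_PARAM_DEFAULTS: Dict[str, List[str]] = {
    "pbc": ["1", "1", "1"],
    "rin": ["0.5"],
    "rcut": ["5.0"],
    "bessel_polynomial_degree": ["4"],
    "inverse_polynomial_degree": ["8"],
    "number_of_environment_clusters": ["1"],
    "number_of_principal_components": ["2"],
    "onebody": ["1"],
    "twobody_number_radial_basis_functions": ["8"],
    "threebody_number_radial_basis_functions": ["6"],
    "threebody_angular_degree": ["5"],
    "fourbody_number_radial_basis_functions": ["4"],
    "fourbody_angular_degree": ["3"],
    "fivebody_number_radial_basis_functions": ["0"],
    "fivebody_angular_degree": ["0"],
    "sixbody_number_radial_basis_functions": ["0"],
    "sixbody_angular_degree": ["0"],
    "sevenbody_number_radial_basis_functions": ["0"],
    "sevenbody_angular_degree": ["0"],
}

BODY_ORDERS = ["two", "three", "four", "five", "six", "seven"]


def parse_pod_params(text: str) -> Dict[str, List[str]]:
    """Parse 'keyword value [value ...]' lines, filling in table defaults."""
    params = {key: list(vals) for key, vals in POD_PARAM_DEFAULTS.items()}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        key, *values = line.split()
        params[key] = values
    if "species" not in params:
        raise ValueError("species has no default and must be set")
    return params


def check_body_orders(params: Dict[str, List[str]]) -> None:
    """Raise if enabled (nonzero) settings do not strictly decrease with body order."""
    radial = [int(params[f"{o}body_number_radial_basis_functions"][0]) for o in BODY_ORDERS]
    # two-body terms have no angular degree, so start at three-body
    angular = [int(params[f"{o}body_angular_degree"][0]) for o in BODY_ORDERS[1:]]
    for name, seq in (("radial basis functions", radial), ("angular degree", angular)):
        enabled = [v for v in seq if v > 0]  # assumption: 0 disables a body order
        if any(a <= b for a, b in zip(enabled, enabled[1:])):
            raise ValueError(f"{name} must decrease with body order, got {seq}")
```

For instance, `check_body_orders(parse_pod_params("species Ta\nrcut 5.5\n"))` passes, because the defaults (8, 6, 4 radial functions and angular degrees 5, 3) already decrease with body order.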
Note that both the number of radial basis functions and the angular degree
must decrease as the body order increases. The next table describes all
keywords that can be used in the second input file (i.e. Ta_data.pod
in the example above):
Keyword | Default | Type | Description |
---|---|---|---|
file_format | extxyz | STRING | data file format; only the extended XYZ format (extxyz) is currently supported |
file_extension | xyz | STRING | extension of the data files |
path_to_training_data_set | (none) | STRING | path to the training data files, in double quotes |
path_to_test_data_set | "" | STRING | path to the test data files, in double quotes |
path_to_environment_configuration_set | "" | STRING | path to the environment configuration files, in double quotes |
fraction_training_data_set | 1.0 | REAL | fraction (<= 1.0) of the training set used to fit POD |
randomize_training_data_set | 0 | BOOL | turns randomization of the training set on/off |
fraction_test_data_set | 1.0 | REAL | fraction (<= 1.0) of the test set used to validate POD |
randomize_test_data_set | 0 | BOOL | turns randomization of the test set on/off |
fitting_weight_energy | 100.0 | REAL | weight for energies in the least-squares fit |
fitting_weight_force | 1.0 | REAL | weight for forces in the least-squares fit |
fitting_regularization_parameter | 1.0e-10 | REAL | regularization parameter in the least-squares fit |
error_analysis_for_training_data_set | 0 | BOOL | turns error analysis for the training data set on/off |
error_analysis_for_test_data_set | 0 | BOOL | turns error analysis for the test data set on/off |
basename_for_output_files | pod | STRING | basename prepended to the names of the output files |
precision_for_pod_coefficients | 8 | INT | number of digits after the decimal point for numbers in the coefficient file |
group_weights | global | STRING | global uses fitting_weight_energy and fitting_weight_force for all configurations; table reads per-group weights (see Loss Function Group Weights below) |
All keywords except path_to_training_data_set have default values. If a keyword is not set in the input file, its default value is used. After successful training, a number of output files are produced, if enabled:
<basename>_training_errors.pod = reports the errors in energy and forces for the training data set
<basename>_training_analysis.pod = reports detailed errors for all training configurations
<basename>_test_errors.pod = reports errors for the test data set
<basename>_test_analysis.pod = reports detailed errors for all test configurations
<basename>_coefficients.pod = contains the coefficients of the POD potential
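The data file uses the same one-keyword-per-line layout, with path values in double quotes. The Python sketch below is a hedged illustration that renders a dictionary of settings into that layout and rejects a missing training path or an out-of-range fraction. The function name `format_pod_data_file` and the directory name `XYZ` are hypothetical; the lower bound on the fractions (rejecting values <= 0) is our assumption, since the table only states the upper bound of 1.0.

```python
from typing import Dict, Union


def format_pod_data_file(settings: Dict[str, Union[str, int, float]]) -> str:
    """Render 'keyword value' lines for the second input file (sketch)."""
    if "path_to_training_data_set" not in settings:
        raise ValueError("path_to_training_data_set has no default and must be set")
    for key in ("fraction_training_data_set", "fraction_test_data_set"):
        fraction = float(settings.get(key, 1.0))
        # upper bound from the table; rejecting values <= 0 is our assumption
        if not 0.0 < fraction <= 1.0:
            raise ValueError(f"{key} must lie in (0, 1], got {fraction}")
    lines = []
    for key, value in settings.items():
        if key.startswith("path_to_"):
            value = f'"{value}"'  # path keywords take their value in double quotes
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


print(format_pod_data_file({
    "file_format": "extxyz",
    "file_extension": "xyz",
    "path_to_training_data_set": "XYZ",  # hypothetical directory name
    "fitting_weight_energy": 100.0,
    "error_analysis_for_training_data_set": 1,
    "basename_for_output_files": "Ta",
}))
```

With basename_for_output_files set to Ta as above, the coefficient file would be named Ta_coefficients.pod.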
After training the POD potential, Ta_param.pod and
<basename>_coefficients.pod are the two files needed to use the POD
potential in LAMMPS. See pair_style pod for using the
POD potential. Examples of training and using POD potentials are
found in the directory lammps/examples/PACKAGES/pod and in the GitHub repo
https://github.com/cesmix-mit/pod-examples.
Loss Function Group Weights
The group_weights keyword in the second input file (e.g. Ta_data.pod)
weights certain groups of configurations in the loss function. For
example:
group_weights table
Displaced_A15 100.0 1.0
Displaced_BCC 100.0 1.0
Displaced_FCC 100.0 1.0
Elastic_BCC 100.0 1.0
Elastic_FCC 100.0 1.0
GSF_110 100.0 1.0
GSF_112 100.0 1.0
Liquid 100.0 1.0
Surface 100.0 1.0
Volume_A15 100.0 1.0
Volume_BCC 100.0 1.0
Volume_FCC 100.0 1.0
This applies an energy weight of 100.0 and a force weight of 1.0
to all groups in the Ta example. The groups are named after
their respective filenames. If certain groups are left out of this table,
the globally defined weights from the fitting_weight_energy
and fitting_weight_force keywords are used.
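The fallback rule can be expressed as a small lookup. This Python sketch is a hedged illustration, not the fitpod implementation: it parses rows of the form `group energy_weight force_weight`, as in the table above, and falls back to the global weights (defaults 100.0 and 1.0) for unlisted groups. The mapping from a training file to its group name (dropping the directory and extension) is our assumption, and all function names are hypothetical.

```python
import os
from typing import Dict, Iterable, Tuple


def parse_group_weights(rows: Iterable[str]) -> Dict[str, Tuple[float, float]]:
    """Parse 'group energy_weight force_weight' rows of a group_weights table."""
    table = {}
    for row in rows:
        name, w_energy, w_force = row.split()
        table[name] = (float(w_energy), float(w_force))
    return table


def group_name(path: str) -> str:
    """Assumed mapping from a training file to its group: drop directory and extension."""
    return os.path.splitext(os.path.basename(path))[0]


def weights_for(path: str, table: Dict[str, Tuple[float, float]],
                fitting_weight_energy: float = 100.0,
                fitting_weight_force: float = 1.0) -> Tuple[float, float]:
    """Per-group (energy, force) weights if listed, otherwise the global weights."""
    return table.get(group_name(path), (fitting_weight_energy, fitting_weight_force))
```

For example, with a table containing only `Liquid 10.0 0.5`, a file `XYZ/Liquid.xyz` would get weights (10.0, 0.5), while `XYZ/Surface.xyz` would fall back to (100.0, 1.0).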
POD Potential
We consider a multi-element system of \(N\) atoms with \(N_{\rm e}\) unique elements. We denote by \(\boldsymbol r_n\) and \(Z_n\) the position vector and type of atom \(n\) in the system, respectively. Note that \(Z_n \in \{1, \ldots, N_{\rm e} \}\), \(\boldsymbol R = (\boldsymbol r_1, \boldsymbol r_2, \ldots, \boldsymbol r_N) \in \mathbb{R}^{3N}\), and \(\boldsymbol Z = (Z_1, Z_2, \ldots, Z_N) \in \mathbb{N}^{N}\). The total energy of the POD potential is expressed as \(E(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N E_i(\boldsymbol R_i, \boldsymbol Z_i)\), where

\[
E_i(\boldsymbol R_i, \boldsymbol Z_i) = \sum_{m=1}^{M} c_m \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i).
\]
Here \(c_m\) are trainable coefficients and \(\mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\) are per-atom POD descriptors. Summing the per-atom descriptors over \(i\) yields the global descriptors \(d_m(\boldsymbol R, \boldsymbol Z) = \sum_{i=1}^N \mathcal{D}_{im}(\boldsymbol R_i, \boldsymbol Z_i)\). It thus follows that \(E(\boldsymbol R, \boldsymbol Z) = \sum_{m=1}^M c_m d_m(\boldsymbol R, \boldsymbol Z)\).
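Because the energy is linear in the coefficients, summing per-atom energies and contracting the global descriptors with \(c_m\) give the same total. The NumPy sketch below checks this identity numerically; the descriptor matrix is filled with random stand-ins rather than actual POD descriptors, and the name `pod_energy` is hypothetical.

```python
import numpy as np


def pod_energy(D: np.ndarray, c: np.ndarray):
    """D[i, m] holds per-atom descriptors, c[m] the trainable coefficients."""
    E_atoms = D @ c               # E_i = sum_m c_m D_im
    d = D.sum(axis=0)             # global descriptors d_m = sum_i D_im
    return E_atoms.sum(), d @ c   # sum_i E_i and sum_m c_m d_m


rng = np.random.default_rng(0)
D = rng.normal(size=(5, 3))  # 5 atoms, 3 descriptors: random stand-ins
c = rng.normal(size=3)
E_from_atoms, E_from_global = pod_energy(D, c)
print(np.isclose(E_from_atoms, E_from_global))  # True
```

This linearity is what lets the fit below be posed as a linear least-squares problem in \(\boldsymbol c\).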
The per-atom POD descriptors include one, two, three, four, five, six, and seven-body descriptors, which can be specified in the first input file. Furthermore, the per-atom POD descriptors also depend on the number of environment clusters specified in the first input file. Please see (Nguyen2024) and (Nguyen and Sema) for the detailed description of the per-atom POD descriptors.
Training
A POD potential is trained using least-squares regression against density functional theory (DFT) data. Let \(J\) be the number of training configurations, with \(N_j\) being the number of atoms in the j-th configuration. The training configurations are extracted from the extended XYZ files located in a directory (specified by path_to_training_data_set in the second input file). Let \(\{E^{\star}_j\}_{j=1}^{J}\) and \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) be the DFT energies and forces for \(J\) configurations. Next, we calculate the global descriptors and their derivatives for all training configurations. Let \(d_{jm}, 1 \le m \le M\), be the global descriptors associated with the j-th configuration, where \(M\) is the number of global descriptors. We then form a matrix \(\boldsymbol A \in \mathbb{R}^{J \times M}\) with entries \(A_{jm} = d_{jm}/ N_j\) for \(j=1,\ldots,J\) and \(m=1,\ldots,M\). Moreover, we form a matrix \(\boldsymbol B \in \mathbb{R}^{\mathcal{N} \times M}\) by stacking the derivatives of the global descriptors for all training configurations from top to bottom, where \(\mathcal{N} = 3\sum_{j=1}^{J} N_j\).
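The assembly of \(\boldsymbol A\), \(\bar{\boldsymbol E}^{\star}\), \(\boldsymbol B\), and \(\boldsymbol F^{\star}\) can be sketched in NumPy as follows. This is a hedged illustration: it assumes the descriptors and their derivatives have already been computed, that each configuration's derivative block has \(3N_j\) rows (so \(N_j\) is inferred from its row count), and that forces are flattened in the same row order. The function name is hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np


def assemble_training_system(
    global_desc: Sequence[np.ndarray],  # d_j, shape (M,) per configuration
    desc_derivs: Sequence[np.ndarray],  # shape (3*N_j, M) per configuration
    energies: Sequence[float],          # DFT energies E*_j
    forces: Sequence[np.ndarray],       # DFT forces, shape (3*N_j,) per configuration
) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    # N_j recovered from the derivative block, which has 3 rows per atom
    n_atoms = np.array([g.shape[0] // 3 for g in desc_derivs], dtype=float)
    A = np.vstack(global_desc) / n_atoms[:, None]        # A_jm = d_jm / N_j
    E_bar = np.asarray(energies, dtype=float) / n_atoms  # per-atom DFT energies
    B = np.vstack(desc_derivs)                           # stacked top to bottom
    F = np.concatenate(forces)                           # length 3 * sum_j N_j
    return A, E_bar, B, F
```

Normalizing energies by \(N_j\) keeps configurations of different sizes on a comparable per-atom scale in the energy term.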
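The coefficient vector \(\boldsymbol c\) of the POD potential is found by solving the following least-squares problem

\[
\min_{\boldsymbol c \in \mathbb{R}^{M}} \; w_E \|\boldsymbol A \boldsymbol c - \bar{\boldsymbol E}^{\star}\|^2 + w_F \|\boldsymbol B \boldsymbol c + \boldsymbol F^{\star}\|^2 + w_R \|\boldsymbol c\|^2,
\]

where \(w_E\) and \(w_F\) are the weights for the energy (fitting_weight_energy) and force (fitting_weight_force), respectively, and \(w_R\) is the regularization parameter (fitting_regularization_parameter). Here \(\bar{\boldsymbol E}^{\star} \in \mathbb{R}^{J}\) is a vector with entries \(\bar{E}^{\star}_j = E^{\star}_j/N_j\), and \(\boldsymbol F^{\star}\) is a vector of \(\mathcal{N}\) entries obtained by stacking \(\{\boldsymbol F^{\star}_j\}_{j=1}^{J}\) from top to bottom. The force term carries a plus sign because the predicted forces are the negative gradient of the energy, \(-\boldsymbol B \boldsymbol c\).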
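Setting the gradient of this objective to zero gives the normal equations \((w_E \boldsymbol A^T \boldsymbol A + w_F \boldsymbol B^T \boldsymbol B + w_R \boldsymbol I)\,\boldsymbol c = w_E \boldsymbol A^T \bar{\boldsymbol E}^{\star} - w_F \boldsymbol B^T \boldsymbol F^{\star}\). The NumPy sketch below solves them directly; it is a hedged illustration of the regularized weighted fit, and the solver actually used inside fitpod may differ. Default weights mirror the table defaults, and the function name is hypothetical.

```python
import numpy as np


def solve_pod_coefficients(A, E_bar, B, F, w_E=100.0, w_F=1.0, w_R=1.0e-10):
    """Minimize w_E||A c - E_bar||^2 + w_F||B c + F||^2 + w_R||c||^2 via normal equations."""
    M = A.shape[1]
    lhs = w_E * A.T @ A + w_F * B.T @ B + w_R * np.eye(M)
    # minus sign: the force residual is B c + F, so its gradient term is B^T (B c + F)
    rhs = w_E * A.T @ E_bar - w_F * B.T @ F
    return np.linalg.solve(lhs, rhs)
```

As a sanity check, data generated from known coefficients (with \(\bar{\boldsymbol E}^{\star} = \boldsymbol A \boldsymbol c\) and \(\boldsymbol F^{\star} = -\boldsymbol B \boldsymbol c\)) should be recovered almost exactly when \(w_R\) is small. For large or ill-conditioned problems, a QR- or SVD-based solver is usually more robust than forming the normal equations.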
Validation
A POD potential can be validated on a test data set located in the directory specified by path_to_test_data_set in the second input file. Validation can also be done after training is complete by providing the coefficient file as the third argument to fitpod, for example:
fitpod Ta_param.pod Ta_data.pod Ta_coefficients.pod
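The page does not spell out which error measures are reported, so the following NumPy sketch is only a hedged illustration, assuming mean absolute error (MAE) and root-mean-square error (RMSE) of per-atom energies and of force components. It reuses the \(\boldsymbol A\), \(\bar{\boldsymbol E}^{\star}\), \(\boldsymbol B\), \(\boldsymbol F^{\star}\) notation of the Training section, now built from test configurations; the function names are hypothetical.

```python
import numpy as np


def error_stats(pred, ref):
    """MAE and RMSE between predicted and reference values."""
    diff = np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float)
    return {"MAE": float(np.mean(np.abs(diff))),
            "RMSE": float(np.sqrt(np.mean(diff ** 2)))}


def validate(c, A, E_bar, B, F):
    """Per-atom energy and force-component errors on a test set."""
    energy = error_stats(A @ c, E_bar)  # predicted per-atom energies vs DFT
    force = error_stats(-B @ c, F)      # predicted forces are -B c
    return energy, force
```

Comparing these numbers for the training and test sets is a quick way to spot overfitting.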
Restrictions
This command is part of the ML-POD package. It is only enabled if LAMMPS was built with that package. See the Build package page for more info.
Default
The keyword defaults are also given in the description of the input files.
(Nguyen and Rohskopf) Nguyen and Rohskopf, Journal of Computational Physics, 480, 112030, (2023).
(Nguyen2023) Nguyen, Physical Review B, 107(14), 144103, (2023).
(Nguyen2024) Nguyen, Journal of Computational Physics, 113102, (2024).
(Nguyen and Sema) Nguyen and Sema, https://arxiv.org/abs/2405.00306, (2024).