首页 >> 仿真软件说明 >>ABACUS >>朱雪刚版 >> 4.1.3.1 首先来分析DP势函数的准确性。

详细内容

4.1.3.1 首先来分析DP势函数的准确性。

其中inter000000/00.train是DeePMD做训练的文件夹里边包括了 000/ 001/ 002/ 003/四个训练势函数的文件。每个文件下都有用前边训练集训练的frozen_model.pb势函数和记录损失函数的lcurve.out。

损失函数画图

首先对某个训练的lcurve.out作图，作图脚本可参考《DFT 到 MD｜超详细「深度势能」材料计算上手指南》

#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt("/home/zxg/BeCu/dpgen/run/iter.000000/00.train/000/lcurve.out", names=True)
for name in data.dtype.names[1:-1]:
    plt.plot(data['step'], data[name], label=name)
plt.legend()
plt.xlabel('Step')
plt.ylabel('Loss')
plt.xscale('log')
plt.yscale('log')
plt.savefig('000_lcurve.png')

关于损失函数分析经验目前不足，目前来看结果貌似不好，需要长训一下，（暂不明白各个数值的具体含义，后补）

如果想画多次迭代中各个DP势函数的loss函数可以在python脚本中加一个for循环进行分别画图

#!/usr/bin/env python
import os
import numpy as np
import matplotlib.pyplot as plt

for i in range(0, 9):
    for j in range(0, 4):
        file_path = f"/home/zxg/BeCu/dpgen/run/iter.00000{i}/00.train/00{j}/lcurve.out"
        data = np.genfromtxt(file_path, names=True)
        for name in data.dtype.names[1:-1]:
            plt.plot(data['step'], data[name], label=name)
        plt.legend()
        plt.xlabel('Step')
        plt.ylabel('Loss')
        plt.yscale('log')
        plt.xscale('log')
        plt.savefig(f'inter{i}_{j}_lcurve_nolog.png')
        print(f'finish the plt of inter{i}_{j}_lcurve')
        plt.clf()

通过

python you_script_name.py

执行脚本，则可以在python所在的文件夹内画出对应的图。

力和能量差值对比--DFT和dp2024.04.08

对比势函数是否准确的首要核对信息就是力和能量，特别是原子受力的计算。

dptest来测试势函数质量

将前边计算获得的frozen_model.pb复制到计划做测试的文件夹下。然后运行dp test进行势函数质量的检测，由于要激活DeePMD环境，因此我把bash命令写到脚本中：然后运行bash script.sh即可执行，跟命令行一行行输入是一样的。

#!/bin/bash
source activate deepmd
echo "clean the frozen_model.pb file"
rm -rf frozen_model.pb
cp /home/zxg/BeCu/dpgen/run/iter.000006/00.train/000/frozen_model.pb ./frozen_model.pb
dp test -m frozen_model.pb -s /home/zxg/BeCu/dpgennew/deepmd_file/Be108_fcc -n 154 -d dptest_06.txt > log_rmse.out 2>&1

运行完成dp test命令之后会在文件下生成能量和力偏差文件，以及屏幕输出的包括rmse的文件，如下

(base) zxg@zxg:~/BeCu/dpgen/run/dptest$ tree
.
├── dptest_06.e.out
├── dptest_06.e_peratom.out
├── dptest_06.f.out
├── dptest_06.v.out
├── dptest_06.v_peratom.out
├── dptest.sh
├── frozen_model.pb
├── log_rmse.out

分别是能量、单原子能量、力、应力压力、单原子应力压力数据对比数据。用python或origin作图即可显示力和能量的差值。python脚本参考《DFT 到 MD｜超详细「深度势能」材料计算上手指南》

#!/usr/bin/env python
import numpy as np
import matplotlib.pyplot as plt

# 定义绘制散点图和对角线的函数
def plot(ax, data, key, xlabel, ylabel, min_val, max_val):
    data_key = f'data_{key}'
    pred_key = f'pred_{key}'
    ax.scatter(data[data_key], data[pred_key], label=key, s=6)
    ax.legend()
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_xlim(min_val, max_val)
    ax.set_ylim(min_val, max_val)
    ax.plot([min_val, max_val], [min_val, max_val], 'r', lw=1)

# 读取数据，并对e数据进行原子化处理
natom = 16
data_e = np.genfromtxt("/home/zxg/BeCu/dpgen/run/train_properties_test/results.e.out", names=["data_e", "pred_e"])
data_f = np.genfromtxt("/home/zxg/BeCu/dpgen/run/train_properties_test/results.f.out", names=["data_fx", "data_fy", "data_fz", "pred_fx", "pred_fy", "pred_fz"])

for col in ['data_e', 'pred_e']:
    data_e[col] /= natom

# 计算e和f的最小值和最大值
data_e_stacked = np.column_stack((data_e['data_e'], data_e['pred_e']))
data_f_stacked = np.column_stack((data_f['data_fx'], data_f['data_fy'], data_f['data_fz'], data_f['pred_fx'], data_f['pred_fy'], data_f['pred_fz']))

min_val_e, max_val_e = np.min(data_e_stacked), np.max(data_e_stacked)
min_val_f, max_val_f = np.min(data_f_stacked), np.max(data_f_stacked)

# 绘制散点图并保存结果
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
plot(axs[0], data_e, 'e', 'DFT energy (eV/atom)', 'DP energy (eV/atom)', min_val_e, max_val_e)
for force_direction in ['fx', 'fy', 'fz']:
    plot(axs[1], data_f, force_direction, 'DFT force (eV/Å)', 'DP force (eV/Å)', min_val_f, max_val_f)
plt.savefig('energy_force.png')

我测试的是一个准确率最低的构型Be108atom fcc。测试结果如下显示，可以明显看出力和能量偏差还是比较大的，目前还在迭代中。（结果比较差可能是训练参数设置的原因，但更多的可能是初始训练集存在一定的错误，具体如何按构型来挑选结构，以及确定那些构型不稳定，我暂时还不会，但dp的工作人员是可以搞定的，后补）

另外，只画对角线图对于某些元素比例变化数量较大的体系，显示结果不太准，这时候把对应值的差值作图画出会更加直观，需要修改python脚本来做相关图，目前计划是画成跟rdf或探索时力的偏差类似形式的图。

（python脚本后补，在DeePMD的入门培训中也提供了相似的脚本可作出如下力的偏差值，因收费相关暂且不在本教程内加入）课程链接《DeePMD：从入门到精通》给个课程内的效果图

为了节约手动执行时间，把上边的脚本和bash命令放到一个python脚本中，以便用循环对所有的数据集作图分析。

现在我把bash下的命令放到了python脚本中，并用python对不同的体系作图（脚本修改自其他教程，非原创），运行下边脚本需要在DeePMD的环境下，因此需要先激活环境

conda activate deepmd

然后用python调用以下脚本，进行势函数质量检测。

python plt_dpgen_f_e_diff_dp_dft.py

运行之后会直接生成训练集质量检测的图片。

#!/usr/bin/env python
import os
import subprocess
import numpy as np
import matplotlib.pyplot as plt
#subprocess.run(['conda', 'init'])
#subprocess.run(['conda', 'activate', 'deepmd'])
for i in range(8):
    for j in range(4):
        
        dft_dp_path = '/home/zxg/BeCu/dpgennew/deepmd_file/'
        init_fp_list = os.listdir(dft_dp_path)

        for file in init_fp_list:
            folder_list = os.path.join(dft_dp_path, file)
            energy_raw_file = os.path.join(folder_list, 'deepmd/energy.raw')
            print(f"now test for {file}")
            with open(energy_raw_file, "r") as energy_raw:
                lines = energy_raw.readlines()
            num_frame = str(len(lines))
            print(f"there are {num_frame} frame for dft")

            dp_file_path = f"/home/zxg/BeCu/dpgen/run/iter.00000{i}/00.train/00{j}/frozen_model.pb"
            results = f"results_{file}_inter{i}_model{j}"
            print(f"now do the dp test for {results}")

            subprocess.run(['dp', 'test', '-m', dp_file_path, '-s', folder_list, '-n', num_frame, '-d', results], check=True)

            # 定义绘制散点图和对角线的函数
            def plot(ax, data, key, xlabel, ylabel, min_val, max_val):
                data_key = f'data_{key}'
                pred_key = f'pred_{key}'
                ax.scatter(data[data_key], data[pred_key], label=key, s=6)
                ax.legend()
                ax.set_xlabel(xlabel)
                ax.set_ylabel(ylabel)
                ax.set_xlim(min_val, max_val)
                ax.set_ylim(min_val, max_val)
                ax.plot([min_val, max_val], [min_val, max_val], 'r', lw=1)
            
            # 读取数据，并对e数据进行原子化处理
            type_raw_file = os.path.join(folder_list, 'deepmd/type.raw')
            with open(type_raw_file, "r") as type_raw:
                lines = type_raw.readlines()
            natom = len(lines)
            print(f"there are {natom} for dft")
            results_e = f"{results}.e.out"
            results_f = f"{results}.f.out"
            data_e = np.genfromtxt(results_e, names=["data_e", "pred_e"])
            data_f = np.genfromtxt(results_f, names=["data_fx", "data_fy", "data_fz", "pred_fx", "pred_fy", "pred_fz"])
            
#            for col in ['data_e', 'pred_e']:
#                data_e[col] /= natom
            
            # 计算e和f的最小值和最大值
            data_e_stacked = np.column_stack((data_e['data_e'], data_e['pred_e']))
            data_f_stacked = np.column_stack((data_f['data_fx'], data_f['data_fy'], data_f['data_fz'], data_f['pred_fx'], data_f['pred_fy'], data_f['pred_fz']))
            
            min_val_e, max_val_e = np.min(data_e_stacked), np.max(data_e_stacked)
            min_val_f, max_val_f = np.min(data_f_stacked), np.max(data_f_stacked)
            
            # 绘制散点图并保存结果
            fig, axs = plt.subplots(1, 2, figsize=(12, 5))
            plot(axs[0], data_e, 'e', 'DFT energy (eV/atom)', 'DP energy (eV/atom)', min_val_e, max_val_e)
            for force_direction in ['fx', 'fy', 'fz']:
                plot(axs[1], data_f, force_direction, 'DFT force (eV/Å)', 'DP force (eV/Å)', min_val_f, max_val_f)
            plt.savefig(f'{results}.png')
            plt.clf()
#subprocess.run(['conda', 'deactivate'])

dpdata测试势函数质量 2024.04.14

在《快速开始 DeePMD-kit｜训练甲烷深度势能分子动力学模型》中有用dpdata做准确性对比的脚本，且脚本行数更少，这里也给出来，原文件只计算了能量。

注：要运行dpdata的预测功能需要调用DeePMD，因此先要确保在当前环境下DeePMD可运行

可激活DeePMD的欢迎，

conda activate deepmd

#!/usr/bin/env python
import dpdata
import matplotlib.pyplot as plt
import numpy as np

training_systems = dpdata.LabeledSystem("/home/zxg/BeCu/dpgennew/deepmd_file/Be54_hcp/deepmd", fmt = "deepmd/npy")  # 得到训练数据点
predict = training_systems.predict("/home/zxg/BeCu/dpgen/run/iter.000000/00.train/000/frozen_model.pb")  # 得到预测数据点

plt.scatter(training_systems["energies"], predict["energies"])

x_range = np.linspace(plt.xlim()[0], plt.xlim()[1])

plt.plot(x_range, x_range, "r--", linewidth = 0.25)
plt.xlabel("Energy of DFT")  # 设置 x 轴标题
plt.ylabel("Energy predicted by deep potential")  # 设置 y 轴标题
plt.savefig('inter00.png')

经过咨询人工智能大模型后将以上脚本修改为做能量，力各个分量的python脚本

#!/usr/bin/env python
import dpdata
import matplotlib.pyplot as plt
import numpy as np

# 读取训练数据点
training_systems = dpdata.LabeledSystem("/home/zxg/BeCu/dpgennew/deepmd_file/Be54_hcp/deepmd", fmt = "deepmd/npy")
predict = training_systems.predict("/home/zxg/BeCu/dpgen/run/iter.000000/00.train/000/frozen_model.pb")

# 提取训练和预测的力分量
train_forces = training_systems["forces"]
predict_forces = predict["forces"]

# 提取 x、y、z 轴力的分量
train_forces_x = [force[0] for sublist in train_forces for force in sublist]
train_forces_y = [force[1] for sublist in train_forces for force in sublist]
train_forces_z = [force[2] for sublist in train_forces for force in sublist]

predict_forces_x = [force[0] for sublist in predict_forces for force in sublist]
predict_forces_y = [force[1] for sublist in predict_forces for force in sublist]
predict_forces_z = [force[2] for sublist in predict_forces for force in sublist]

# 绘制散点图
plt.figure(figsize=(12, 9))  # 设置figure的大小

plt.subplot(2, 2, 1)
plt.scatter(training_systems["energies"], predict["energies"], label='energy', color='blue', marker='o')
plt.xlabel('Energy of DFT')
plt.ylabel('Energy of DeepMD')

# 绘制 x 轴力的散点图
plt.subplot(2, 2, 2)
plt.scatter(train_forces_x, predict_forces_x, label='X Forces', color='blue', marker='o')
plt.xlabel('Forces x of DFT')
plt.ylabel('Forces x of DeepMD')

# 绘制 y 轴力的散点图
plt.subplot(2, 2, 3)
plt.scatter(train_forces_y, predict_forces_y, label='Y Forces', color='green', marker='x')
plt.xlabel('Forces y of DFT')
plt.ylabel('Forces y of DeepMD')

# 绘制 z 轴力的散点图
plt.subplot(2, 2, 4)
plt.scatter(train_forces_z, predict_forces_z, label='Z Forces', color='red', marker='^')
plt.xlabel('Forces z of DFT')
plt.ylabel('Forces z of DeepMD')

x_range = np.linspace(plt.xlim()[0], plt.xlim()[1])

plt.plot(x_range, x_range, "r--", linewidth = 0.25)
# 添加图例
plt.legend()

# 设置图片的分辨率和保存路径
plt.savefig('inter00_f_e_diff_test.png', dpi=300)

由于对角线图显示不是很准确，因此也可以改成差值图，这里为了显示范围更明显对差值进行了排序。

#!/usr/bin/env python
import dpdata
import matplotlib.pyplot as plt
import numpy as np

# 读取训练数据点
training_systems = dpdata.LabeledSystem("/home/zxg/BeCu/dpgennew/deepmd_file/Be54_hcp/deepmd", fmt = "deepmd/npy")
predict = training_systems.predict("/home/zxg/BeCu/dpgen/run/iter.000000/00.train/000/frozen_model.pb")

# 提取训练和预测的力分量
train_forces = training_systems["forces"]
predict_forces = predict["forces"]
train_energies = training_systems["energies"]
predict_energies = predict["energies"]

# 提取 x、y、z 轴力的分量
train_forces_x = [force[0] for sublist in train_forces for force in sublist]
train_forces_y = [force[1] for sublist in train_forces for force in sublist]
train_forces_z = [force[2] for sublist in train_forces for force in sublist]

predict_forces_x = [force[0] for sublist in predict_forces for force in sublist]
predict_forces_y = [force[1] for sublist in predict_forces for force in sublist]
predict_forces_z = [force[2] for sublist in predict_forces for force in sublist]

# 计算差值
energydiff = np.array(train_energies) - np.array(predict_energies)
forcediff_x = np.array(train_forces_x) - np.array(predict_forces_x)
forcediff_y = np.array(train_forces_y) - np.array(predict_forces_y)
forcediff_z = np.array(train_forces_z) - np.array(predict_forces_z)

# 创建一个figure
plt.figure(figsize=(12, 9))  # 设置figure的大小

# 对差值进行排序
sorted_indices_energy = np.argsort(energydiff)
sorted_indices_x = np.argsort(forcediff_x)
sorted_indices_y = np.argsort(forcediff_y)
sorted_indices_z = np.argsort(forcediff_z)

# 提取排序后的差值数组
sorted_energydiff = energydiff[sorted_indices_energy]
sorted_forcediff_x = forcediff_x[sorted_indices_x]
sorted_forcediff_y = forcediff_y[sorted_indices_y]
sorted_forcediff_z = forcediff_z[sorted_indices_z]

# 绘制能量差值图
plt.subplot(2, 2, 1)
plt.plot(sorted_energydiff, color='blue', label='Energy Difference (DFT - DeepMD)')
plt.title('Energy Difference')
plt.xlabel('Data Points')
plt.ylabel('Energy Difference (DFT - DeepMD)')
plt.legend()

# 绘制 x 轴力差值图
plt.subplot(2, 2, 2)
plt.plot(sorted_forcediff_x, color='green', label='Forces Difference x (DFT - DeepMD)')
plt.title('Forces Difference x')
plt.xlabel('Data Points')
plt.ylabel('Forces Difference x (DFT - DeepMD)')
plt.legend()

# 绘制 y 轴力差值图
plt.subplot(2, 2, 3)
plt.plot(sorted_forcediff_y, color='red', label='Forces Difference y (DFT - DeepMD)')
plt.title('Forces Difference y')
plt.xlabel('Data Points')
plt.ylabel('Forces Difference y (DFT - DeepMD)')
plt.legend()

# 绘制 z 轴力差值图
plt.subplot(2, 2, 4)
plt.plot(sorted_forcediff_z, color='blue', label='Forces Difference y (DFT - DeepMD)')
plt.title('Forces Difference y')
plt.xlabel('Data Points')
plt.ylabel('Forces Difference y (DFT - DeepMD)')
plt.legend()

# 设置图片的分辨率和保存路径
plt.tight_layout()  # 调整子图之间的间距
plt.savefig('inter00_f_e_direct_value.png', dpi=300)

还可以改成力和能量差值的直方图，具体分析，后补

#!/usr/bin/env python
import dpdata
import matplotlib.pyplot as plt
import numpy as np

# 读取训练数据点
training_systems = dpdata.LabeledSystem("/home/zxg/BeCu/dpgennew/deepmd_file/Be54_hcp/deepmd", fmt = "deepmd/npy")
predict = training_systems.predict("/home/zxg/BeCu/dpgen/run/iter.000000/00.train/000/frozen_model.pb")

# 提取训练和预测的力分量
train_forces = training_systems["forces"]
predict_forces = predict["forces"]
train_energies = training_systems["energies"]
predict_energies = predict["energies"]

# 提取 x、y、z 轴力的分量
train_forces_x = [force[0] for sublist in train_forces for force in sublist]
train_forces_y = [force[1] for sublist in train_forces for force in sublist]
train_forces_z = [force[2] for sublist in train_forces for force in sublist]

predict_forces_x = [force[0] for sublist in predict_forces for force in sublist]
predict_forces_y = [force[1] for sublist in predict_forces for force in sublist]
predict_forces_z = [force[2] for sublist in predict_forces for force in sublist]

# 计算绝对差值
energydiff = np.abs(np.array(train_energies) - np.array(predict_energies))
forcediff_x = np.abs(np.array(train_forces_x) - np.array(predict_forces_x))
forcediff_y = np.abs(np.array(train_forces_y) - np.array(predict_forces_y))
forcediff_z = np.abs(np.array(train_forces_z) - np.array(predict_forces_z))

# 使用 np.histogram 绘制直方图
hist_energy, bin_edges= np.histogram(energydiff, bins=100)
hist_x, bin_edges  = np.histogram(forcediff_x, bins=100)
hist_y, bin_edges  = np.histogram(forcediff_y, bins=100)
hist_z, bin_edges  = np.histogram(forcediff_z, bins=100)

# 创建一个figure
plt.figure(figsize=(12, 9))  # 设置figure的大小

# 创建子图来绘制直方图
plt.subplot(2, 2, 1)
plt.bar(bin_edges[:-1], hist_energy, width=bin_edges[1] - bin_edges[0], color='blue', alpha=0.7)
plt.xlabel('Absolute Difference Energy (|DFT - DeepMD|)')
plt.ylabel('Number of Data Points')

# 创建子图来绘制 x 轴力的直方图
plt.subplot(2, 2, 2)
plt.bar(bin_edges[:-1], hist_x, width=bin_edges[1] - bin_edges[0], color='green', alpha=0.7)
plt.xlabel('Absolute Difference forces x (|DFT - DeepMD|)')
plt.ylabel('Number of Data Points')

# 创建子图来绘制 y 轴力的直方图
plt.subplot(2, 2, 3)
plt.bar(bin_edges[:-1], hist_y, width=bin_edges[1] - bin_edges[0], color='red', alpha=0.7)
plt.xlabel('Absolute Difference forces y (|DFT - DeepMD|)')
plt.ylabel('Number of Data Points')

# 创建子图来绘制 z 轴力的直方图
plt.subplot(2, 2, 4)
plt.bar(bin_edges[:-1], hist_z, width=bin_edges[1] - bin_edges[0], color='blue', alpha=0.7)
plt.xlabel('Absolute Difference forces z (|DFT - DeepMD|)')
plt.ylabel('Number of Data Points')

# 设置图片的分辨率和保存路径
plt.savefig('inter00_f_e_diff_histogram.png')

另外，可以修改dp test的脚本实现相似的功能。

注：在Bohrium的《DeePMD：从入门到精通》的课程中也有相关的修改脚本可以参考（目前收费才能观看2024.4.14）里边有个极端条件的脚本把力的差值和对应的力的绝对值联系在一起进行画图，比上边只画力的差值图效果更加明显。

作者：朱雪刚邮箱:xuegangzhu@qq.com；工作单位：石家庄学院理学院/北京科学智能研究院（AISI）访问学者2023.07-2024.09，访问导师北京大学陈默涵; 徐张满仓邮箱: xuzhangmancang@dp.tech

截止2024.07.21录制视频教程已上传至Bohrium的课程《DeePMD应用案例讲解：铜原子掺杂铍晶体的机器学习势函数拟合过程演示》网址： https://bohrium.dp.tech/courses/1075495070?tab=courses 后续会在Bohrium平台更新

注意：后续的更新，大部分会在Bohrium课程平台进行，请看教程入门的同学加入课程进行学习；且录制的视频课程会把一些个人观点给加入，从个人观点来看比文字教程的内容更多

写教程内容讨论QQ群:143276924 DPGEN+ABACUS教程准备；

ABACUS软件的QQ群：759914681，群内有专职开发人员，目前ABACUS提问问题以github的issue为主，群内可作为辅助提问。

欢迎大家推广本教程，让更多的dp入门学习者有个参考，目前2024.09.16我联系的微信公众号推广是 lammps加油站的小马老师（我也报名了小马老师的一对一辅导）。当然也看到lammps爱好者在转发推广。感谢大家公众号的推广。

上一篇4.1.3.2 dp训练质量测试 2024.04.15下一篇4.1.3 dp质量分析2024.04.07

4.1.3.1 首先来分析DP势函数的准确性。

损失函数画图

力和能量差值对比--DFT和dp2024.04.08

dptest来测试势函数质量

dpdata测试势函数质量 2024.04.14