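During potential-function iteration, DPGEN uses the entries in "model_devi_jobs" to control the iterations. To run more iterations, add one more line to "model_devi_jobs" and resubmit the DPGEN job; the run then continues from where it stopped. Likewise, if a task fails and the run stops partway, edit the record file (record.dpgen) first and then resubmit the job to continue.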
"model_devi_jobs": [
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 2000, "ensemble": "npt-tri", "_idx": "00"},
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "01"},
{"sys_idx": [0], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "02"},
{"sys_idx": [0,4,7], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "03"},
{"sys_idx": [0], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "04"},
{"sys_idx": [0], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "05"},
{"sys_idx": [0,16], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "06"}
],
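For a first attempt, it is generally advisable to put only one to three exploration lines in "model_devi_jobs", wait until DPGEN has produced some results, and check the accuracy of the potential before continuing with further iterations.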
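Adding one more exploration line by hand is easy, but it can also be scripted. The following is a minimal sketch, not part of DPGEN itself: the helper name `append_job` is hypothetical, and it assumes that the new entry should reuse the temperatures, pressures, ensemble and trj_freq of the last existing entry.

```python
import copy
import json


def append_job(param, sys_idx, nsteps):
    """Return a copy of `param` with one more entry in "model_devi_jobs".

    Hypothetical helper: the new entry copies the last existing entry and
    only changes sys_idx, nsteps and the "_idx" label.
    """
    new_param = copy.deepcopy(param)
    jobs = new_param["model_devi_jobs"]
    job = copy.deepcopy(jobs[-1])      # reuse temps, press, ensemble, trj_freq
    job["sys_idx"] = list(sys_idx)
    job["nsteps"] = nsteps
    job["_idx"] = f"{len(jobs):02d}"   # "00", "01", "02", ...
    jobs.append(job)
    return new_param


# Minimal stand-in for the relevant part of param.json
param = {
    "model_devi_jobs": [
        {"sys_idx": list(range(16)), "temps": [50, 250, 450, 650],
         "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10,
         "nsteps": 2000, "ensemble": "npt-tri", "_idx": "00"}
    ]
}

updated = append_job(param, sys_idx=range(16), nsteps=10000)
print(json.dumps(updated["model_devi_jobs"][-1]))
```

The original dictionary is left untouched, so the result can be inspected before it is written back to param.json.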
First iteration
Taking my own case as an example, my settings for the first DPGEN run were:
"model_devi_jobs": [
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 2000, "ensemble": "npt-tri", "_idx": "00"}
],
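That is, the number of LAMMPS exploration steps (nsteps) was kept small, at 2000, for the first exploration, and set to 10000 for the second. Because the lower and upper limits of the force deviation in this first DPGEN run were 0.05 and 0.35, many structures ended up among the candidate configurations.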
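The effect of the two limits can be illustrated with a small sketch. It assumes the usual three-way split that also appears in dpgen.log (accurate, candidate, failed) based on a frame's maximum force deviation; the function name `classify_frame` and the treatment of values exactly on a limit are assumptions made for illustration, and DPGEN's own selection may apply further criteria.

```python
def classify_frame(max_devi_f, trust_lo=0.05, trust_hi=0.35):
    """Sort one frame by its maximum force deviation (illustrative only)."""
    if max_devi_f < trust_lo:
        return "accurate"    # the models agree well enough, nothing to label
    if max_devi_f < trust_hi:
        return "candidate"   # worth sending to a first-principles calculation
    return "failed"          # deviation too large, structure likely unphysical


# Made-up deviations, just to show how the lower limit shifts the split
devs = [0.02, 0.08, 0.20, 0.40]
print([classify_frame(d) for d in devs])
# ['accurate', 'candidate', 'candidate', 'failed']
print([classify_frame(d, trust_lo=0.15) for d in devs])
# ['accurate', 'accurate', 'candidate', 'failed']
```

With a low lower limit such as 0.05, frames with only a modest deviation already count as candidates, which is why so many structures land in the candidate set.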
The parameters that set the lower and upper limits of the force deviation:
"model_devi_f_trust_lo": 0.05,
"model_devi_f_trust_hi": 0.35,
The dpgen.log output of my first iteration was:
(base) zxg@zxg:~/BeCu/dpgen/run$ cat dpgen.log
2024-03-14 06:31:34,184 - INFO : start running
2024-03-14 06:31:34,188 - INFO : =============================iter.000000==============================
2024-03-14 06:31:34,188 - INFO : -------------------------iter.000000 task 00--------------------------
2024-03-14 06:31:34,224 - INFO : -------------------------iter.000000 task 01--------------------------
2024-03-15 01:27:40,850 - INFO : -------------------------iter.000000 task 02--------------------------
2024-03-15 01:27:40,850 - INFO : -------------------------iter.000000 task 03--------------------------
2024-03-15 01:27:40,981 - INFO : -------------------------iter.000000 task 04--------------------------
2024-03-15 04:42:08,985 - INFO : -------------------------iter.000000 task 05--------------------------
2024-03-15 04:42:08,986 - INFO : -------------------------iter.000000 task 06--------------------------
2024-03-15 04:42:09,011 - INFO : system 000 candidate : 2766 in 4000 69.15 %
2024-03-15 04:42:09,012 - INFO : system 000 failed : 179 in 4000 4.47 %
2024-03-15 04:42:09,012 - INFO : system 000 accurate : 1055 in 4000 26.38 %
2024-03-15 04:42:09,017 - INFO : system 000 accurate_ratio: 0.2637 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:11,456 - INFO : system 001 candidate : 1927 in 4000 48.18 %
2024-03-15 04:42:11,456 - INFO : system 001 failed : 79 in 4000 1.98 %
2024-03-15 04:42:11,456 - INFO : system 001 accurate : 1994 in 4000 49.85 %
2024-03-15 04:42:11,461 - INFO : system 001 accurate_ratio: 0.4985 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:14,151 - INFO : system 002 candidate : 1309 in 4000 32.73 %
2024-03-15 04:42:14,151 - INFO : system 002 failed : 15 in 4000 0.38 %
2024-03-15 04:42:14,151 - INFO : system 002 accurate : 2676 in 4000 66.90 %
2024-03-15 04:42:14,156 - INFO : system 002 accurate_ratio: 0.6690 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:14,949 - INFO : system 003 candidate : 2176 in 4000 54.40 %
2024-03-15 04:42:14,949 - INFO : system 003 failed : 39 in 4000 0.97 %
2024-03-15 04:42:14,949 - INFO : system 003 accurate : 1785 in 4000 44.62 %
2024-03-15 04:42:14,954 - INFO : system 003 accurate_ratio: 0.4462 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:16,007 - INFO : system 004 candidate : 1760 in 4000 44.00 %
2024-03-15 04:42:16,007 - INFO : system 004 failed : 33 in 4000 0.83 %
2024-03-15 04:42:16,007 - INFO : system 004 accurate : 2207 in 4000 55.17 %
2024-03-15 04:42:16,012 - INFO : system 004 accurate_ratio: 0.5517 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:17,464 - INFO : system 005 candidate : 1774 in 4000 44.35 %
2024-03-15 04:42:17,464 - INFO : system 005 failed : 4 in 4000 0.10 %
2024-03-15 04:42:17,464 - INFO : system 005 accurate : 2222 in 4000 55.55 %
2024-03-15 04:42:17,469 - INFO : system 005 accurate_ratio: 0.5555 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:18,267 - INFO : system 006 candidate : 2641 in 4000 66.03 %
2024-03-15 04:42:18,267 - INFO : system 006 failed : 6 in 4000 0.15 %
2024-03-15 04:42:18,267 - INFO : system 006 accurate : 1353 in 4000 33.83 %
2024-03-15 04:42:18,272 - INFO : system 006 accurate_ratio: 0.3382 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:19,724 - INFO : system 007 candidate : 1812 in 4000 45.30 %
2024-03-15 04:42:19,724 - INFO : system 007 failed : 16 in 4000 0.40 %
2024-03-15 04:42:19,724 - INFO : system 007 accurate : 2172 in 4000 54.30 %
2024-03-15 04:42:19,729 - INFO : system 007 accurate_ratio: 0.5430 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:21,176 - INFO : system 008 candidate : 2204 in 4000 55.10 %
2024-03-15 04:42:21,176 - INFO : system 008 failed : 3 in 4000 0.07 %
2024-03-15 04:42:21,176 - INFO : system 008 accurate : 1793 in 4000 44.82 %
2024-03-15 04:42:21,181 - INFO : system 008 accurate_ratio: 0.4482 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:22,633 - INFO : system 009 candidate : 2376 in 4000 59.40 %
2024-03-15 04:42:22,633 - INFO : system 009 failed : 0 in 4000 0.00 %
2024-03-15 04:42:22,633 - INFO : system 009 accurate : 1624 in 4000 40.60 %
2024-03-15 04:42:22,638 - INFO : system 009 accurate_ratio: 0.4060 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:24,087 - INFO : system 010 candidate : 3306 in 4000 82.65 %
2024-03-15 04:42:24,087 - INFO : system 010 failed : 12 in 4000 0.30 %
2024-03-15 04:42:24,087 - INFO : system 010 accurate : 682 in 4000 17.05 %
2024-03-15 04:42:24,092 - INFO : system 010 accurate_ratio: 0.1705 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:25,135 - INFO : system 011 candidate : 2839 in 4000 70.97 %
2024-03-15 04:42:25,135 - INFO : system 011 failed : 0 in 4000 0.00 %
2024-03-15 04:42:25,135 - INFO : system 011 accurate : 1161 in 4000 29.03 %
2024-03-15 04:42:25,140 - INFO : system 011 accurate_ratio: 0.2903 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:26,275 - INFO : system 012 candidate : 3266 in 4000 81.65 %
2024-03-15 04:42:26,275 - INFO : system 012 failed : 6 in 4000 0.15 %
2024-03-15 04:42:26,275 - INFO : system 012 accurate : 728 in 4000 18.20 %
2024-03-15 04:42:26,280 - INFO : system 012 accurate_ratio: 0.1820 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:27,736 - INFO : system 013 candidate : 3229 in 4000 80.73 %
2024-03-15 04:42:27,736 - INFO : system 013 failed : 15 in 4000 0.38 %
2024-03-15 04:42:27,736 - INFO : system 013 accurate : 756 in 4000 18.90 %
2024-03-15 04:42:27,741 - INFO : system 013 accurate_ratio: 0.1890 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:29,193 - INFO : system 014 candidate : 2418 in 4000 60.45 %
2024-03-15 04:42:29,193 - INFO : system 014 failed : 2 in 4000 0.05 %
2024-03-15 04:42:29,193 - INFO : system 014 accurate : 1580 in 4000 39.50 %
2024-03-15 04:42:29,197 - INFO : system 014 accurate_ratio: 0.3950 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:30,540 - INFO : system 015 candidate : 1973 in 4000 49.33 %
2024-03-15 04:42:30,540 - INFO : system 015 failed : 5 in 4000 0.12 %
2024-03-15 04:42:30,540 - INFO : system 015 accurate : 2022 in 4000 50.55 %
2024-03-15 04:42:30,545 - INFO : system 015 accurate_ratio: 0.5055 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:32,750 - INFO : -------------------------iter.000000 task 07--------------------------
2024-03-17 03:18:27,120 - INFO : -------------------------iter.000000 task 08--------------------------
2024-03-17 03:18:29,896 - INFO : failed tasks: 0 in 320 0.00 %
2024-03-17 03:18:29,900 - INFO : =============================iter.000001==============================
2024-03-17 03:18:29,900 - INFO : -------------------------iter.000001 task 00--------------------------
2024-03-17 03:18:30,007 - INFO : -------------------------iter.000001 task 01--------------------------
2024-03-17 22:09:35,041 - INFO : -------------------------iter.000001 task 02--------------------------
2024-03-17 22:09:35,041 - INFO : -------------------------iter.000001 task 03--------------------------
2024-03-17 22:09:35,041 - INFO : finished
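Reading the per-system accuracy out of a long dpgen.log by eye is tedious. A minimal sketch of how the accurate_ratio lines shown above could be collected follows; the regular expression is written against the log format in this post only, and the function name `parse_accurate_ratios` is hypothetical.

```python
import re

# Matches e.g. "system 000 accurate_ratio: 0.2637 thresholds: ..."
RATIO_RE = re.compile(r"system (\d+) accurate_ratio:\s*([0-9.]+)")


def parse_accurate_ratios(log_text):
    """Map system index -> accurate_ratio.

    If a system appears several times (several iterations in one log),
    the last occurrence wins.
    """
    ratios = {}
    for line in log_text.splitlines():
        match = RATIO_RE.search(line)
        if match:
            ratios[int(match.group(1))] = float(match.group(2))
    return ratios


# Two lines copied from the log above
sample = """\
2024-03-15 04:42:09,017 - INFO : system 000 accurate_ratio: 0.2637 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
2024-03-15 04:42:11,461 - INFO : system 001 accurate_ratio: 0.4985 thresholds: 0.9980 and 0.9990 eff. task min and max -1 20 number of fp tasks: 20
"""

ratios = parse_accurate_ratios(sample)
print(ratios)  # {0: 0.2637, 1: 0.4985}
```

To use it on a real run, pass the contents of dpgen.log instead of `sample`.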
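Second iteration (iter.000001)
Because the force deviation range in the first iteration was too wide, for the second iteration I set the lower and upper limits to 0.15 and 0.35, i.e. param.json was changed to: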
"model_devi_f_trust_lo": 0.15,
"model_devi_f_trust_hi": 0.35,
"model_devi_jobs": [
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 2000, "ensemble": "npt-tri", "_idx": "00"},
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "01"}
],
Resubmit the DPGEN job:
nohup dpgen run param.json ~/machine.json 1>log 2>err &
After the run finished, DPGEN output the following:
2024-03-18 14:33:38,707 - INFO : start running
2024-03-18 14:33:38,771 - INFO : continue from iter 001 task 02
2024-03-18 14:33:38,771 - INFO : =============================iter.000000==============================
2024-03-18 14:33:38,771 - INFO : =============================iter.000001==============================
2024-03-18 14:33:38,771 - INFO : -------------------------iter.000001 task 03--------------------------
2024-03-18 14:33:39,025 - INFO : -------------------------iter.000001 task 04--------------------------
2024-03-19 05:00:05,791 - INFO : -------------------------iter.000001 task 05--------------------------
2024-03-19 05:00:05,791 - INFO : -------------------------iter.000001 task 06--------------------------
2024-03-19 05:00:05,892 - INFO : system 000 candidate : 3473 in 18020 19.27 %
2024-03-19 05:00:05,892 - INFO : system 000 failed : 1 in 18020 0.01 %
2024-03-19 05:00:05,892 - INFO : system 000 accurate : 14546 in 18020 80.72 %
2024-03-19 05:00:05,916 - INFO : system 000 accurate_ratio: 0.8072 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:17,907 - INFO : system 001 candidate : 234 in 18020 1.30 %
2024-03-19 05:00:17,907 - INFO : system 001 failed : 0 in 18020 0.00 %
2024-03-19 05:00:17,907 - INFO : system 001 accurate : 17786 in 18020 98.70 %
2024-03-19 05:00:17,927 - INFO : system 001 accurate_ratio: 0.9870 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:31,332 - INFO : system 002 candidate : 94 in 18020 0.52 %
2024-03-19 05:00:31,332 - INFO : system 002 failed : 0 in 18020 0.00 %
2024-03-19 05:00:31,332 - INFO : system 002 accurate : 17926 in 18020 99.48 %
2024-03-19 05:00:31,352 - INFO : system 002 accurate_ratio: 0.9948 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:35,257 - INFO : system 003 candidate : 159 in 18020 0.88 %
2024-03-19 05:00:35,257 - INFO : system 003 failed : 0 in 18020 0.00 %
2024-03-19 05:00:35,257 - INFO : system 003 accurate : 17861 in 18020 99.12 %
2024-03-19 05:00:35,276 - INFO : system 003 accurate_ratio: 0.9912 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:40,488 - INFO : system 004 candidate : 89 in 18020 0.49 %
2024-03-19 05:00:40,488 - INFO : system 004 failed : 0 in 18020 0.00 %
2024-03-19 05:00:40,488 - INFO : system 004 accurate : 17931 in 18020 99.51 %
2024-03-19 05:00:40,508 - INFO : system 004 accurate_ratio: 0.9951 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:47,738 - INFO : system 005 candidate : 161 in 18020 0.89 %
2024-03-19 05:00:47,738 - INFO : system 005 failed : 3 in 18020 0.02 %
2024-03-19 05:00:47,738 - INFO : system 005 accurate : 17856 in 18020 99.09 %
2024-03-19 05:00:47,758 - INFO : system 005 accurate_ratio: 0.9909 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:51,687 - INFO : system 006 candidate : 169 in 18020 0.94 %
2024-03-19 05:00:51,688 - INFO : system 006 failed : 10 in 18020 0.06 %
2024-03-19 05:00:51,688 - INFO : system 006 accurate : 17841 in 18020 99.01 %
2024-03-19 05:00:51,707 - INFO : system 006 accurate_ratio: 0.9901 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:00:58,865 - INFO : system 007 candidate : 131 in 18020 0.73 %
2024-03-19 05:00:58,865 - INFO : system 007 failed : 3 in 18020 0.02 %
2024-03-19 05:00:58,865 - INFO : system 007 accurate : 17886 in 18020 99.26 %
2024-03-19 05:00:58,885 - INFO : system 007 accurate_ratio: 0.9926 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:06,181 - INFO : system 008 candidate : 179 in 18020 0.99 %
2024-03-19 05:01:06,181 - INFO : system 008 failed : 3 in 18020 0.02 %
2024-03-19 05:01:06,181 - INFO : system 008 accurate : 17838 in 18020 98.99 %
2024-03-19 05:01:06,202 - INFO : system 008 accurate_ratio: 0.9899 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:13,402 - INFO : system 009 candidate : 443 in 18020 2.46 %
2024-03-19 05:01:13,402 - INFO : system 009 failed : 11 in 18020 0.06 %
2024-03-19 05:01:13,402 - INFO : system 009 accurate : 17566 in 18020 97.48 %
2024-03-19 05:01:13,422 - INFO : system 009 accurate_ratio: 0.9748 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:20,658 - INFO : system 010 candidate : 651 in 18020 3.61 %
2024-03-19 05:01:20,658 - INFO : system 010 failed : 2 in 18020 0.01 %
2024-03-19 05:01:20,658 - INFO : system 010 accurate : 17367 in 18020 96.38 %
2024-03-19 05:01:20,678 - INFO : system 010 accurate_ratio: 0.9638 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:25,859 - INFO : system 011 candidate : 888 in 18020 4.93 %
2024-03-19 05:01:25,859 - INFO : system 011 failed : 8 in 18020 0.04 %
2024-03-19 05:01:25,859 - INFO : system 011 accurate : 17124 in 18020 95.03 %
2024-03-19 05:01:25,879 - INFO : system 011 accurate_ratio: 0.9503 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:31,182 - INFO : system 012 candidate : 918 in 18020 5.09 %
2024-03-19 05:01:31,182 - INFO : system 012 failed : 8 in 18020 0.04 %
2024-03-19 05:01:31,182 - INFO : system 012 accurate : 17094 in 18020 94.86 %
2024-03-19 05:01:31,202 - INFO : system 012 accurate_ratio: 0.9486 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:38,391 - INFO : system 013 candidate : 1492 in 18020 8.28 %
2024-03-19 05:01:38,391 - INFO : system 013 failed : 10 in 18020 0.06 %
2024-03-19 05:01:38,391 - INFO : system 013 accurate : 16518 in 18020 91.66 %
2024-03-19 05:01:38,410 - INFO : system 013 accurate_ratio: 0.9166 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:45,586 - INFO : system 014 candidate : 89 in 18020 0.49 %
2024-03-19 05:01:45,586 - INFO : system 014 failed : 10 in 18020 0.06 %
2024-03-19 05:01:45,586 - INFO : system 014 accurate : 17921 in 18020 99.45 %
2024-03-19 05:01:45,606 - INFO : system 014 accurate_ratio: 0.9945 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:01:52,207 - INFO : system 015 candidate : 110 in 18020 0.61 %
2024-03-19 05:01:52,207 - INFO : system 015 failed : 6 in 18020 0.03 %
2024-03-19 05:01:52,207 - INFO : system 015 accurate : 17904 in 18020 99.36 %
2024-03-19 05:01:52,227 - INFO : system 015 accurate_ratio: 0.9936 thresholds: 0.9980 and 0.9990 eff. task min and max -1 60 number of fp tasks: 60
2024-03-19 05:02:01,561 - INFO : -------------------------iter.000001 task 07--------------------------
2024-03-22 13:24:42,480 - INFO : -------------------------iter.000001 task 08--------------------------
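At this point the accuracy for most structures is above 90%; only system 000, i.e. Be108_fcc/scale-1.000/000000/STRU, is at about 80%.
Third iteration (iter.000002) and further iterations
I therefore added one line to "model_devi_jobs" in param.json, so that the third iteration explores only the Be_fcc configuration, which contains 108 Be atoms: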
"model_devi_jobs": [
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 2000, "ensemble": "npt-tri", "_idx": "00"},
{"sys_idx": [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "01"},
{"sys_idx": [0], "temps":[50, 250, 450, 650], "press": [0, 100, 1000, 5000, 10000], "trj_freq": 10, "nsteps": 10000, "ensemble": "npt-tri", "_idx": "02"}
]
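The choice of "sys_idx" for the extra line follows from the accurate_ratio values in the log. A minimal sketch of that selection, using a few ratios copied from the second-iteration log; the helper name `systems_below` and the 0.90 cutoff are illustrative assumptions, not DPGEN settings.

```python
def systems_below(ratios, threshold):
    """Return the sorted system indices whose accurate_ratio is below `threshold`."""
    return sorted(idx for idx, ratio in ratios.items() if ratio < threshold)


# accurate_ratio values of four systems from the iter.000001 log
ratios = {0: 0.8072, 1: 0.9870, 2: 0.9948, 13: 0.9166}

print(systems_below(ratios, 0.90))  # [0]
```

The returned list can be used directly as the "sys_idx" of the next exploration line.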
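In the third iteration the accuracy of the LAMMPS-explored configurations still did not exceed 95%, so I kept adding iterations.
Tips: at one point during the iterations I changed the setting for the number of neighboring atoms used in training, after which the fit converged quickly. Suitable DeePMD settings therefore help the potential fit quickly; I will write up this experience later.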
"sel": "auto",
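After 8 iterations in total, the accuracy reported in dpgen.log was above 98%. When I added the other exploration configurations back in, the accuracy was also high, so I stopped enriching the exploration set for now and moved on to validating the potential.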
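Authors: 朱雪刚, email: xuegangzhu@qq.com; affiliation: School of Science, Shijiazhuang University / visiting scholar at the AI for Science Institute, Beijing (AISI), 2023.07-2024.09, visiting advisor 陈默涵, Peking University; 徐张满仓, email: xuzhangmancang@dp.tech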