Study of data mining research based on real-world obstetrical medical records diagnosis text
MA Yinyao1, BI Wenshuai2▲, MAO Jinjiang3, MENG Chenwei2, LYU Hanlin2, WANG Lei2
1. Department of Obstetrics, Guangxi Zhuang Autonomous Region People's Hospital, Guangxi Zhuang Autonomous Region, Nanning 530000, China;
2. Institute of Biointelligence Technology, BGI Research-Shenzhen, Guangdong Province, Shenzhen 518083, China;
3. Department of Obstetrics, Guigang City People's Hospital, Guangxi Zhuang Autonomous Region, Guigang 537000, China
Abstract
Objective: Diagnostic texts in obstetric medical records are an important resource for scientific research, but standard diagnostic terms are difficult to extract from them. This paper presents a combined algorithm that automatically extracts standard diagnostic terms from these texts and can be applied to the obstetrics departments of different hospitals.
Methods: The combined algorithm works in two stages. First, an MC-BERT model was trained on a labeled corpus, and the trained model was used to normalize the diagnostic terms. Then, the Louvain algorithm was used to cluster redundant standard terms and automatically output scientific research diagnostic terms.
Results: For term normalization, the combined algorithm achieved an F1 score of 0.9235 on the test set, and it automatically clustered 1,107 standard diagnostic terms into 106 scientific research diagnostic terms. The combined algorithm was also validated on a validation set from another hospital, where the F1 score of term normalization reached 0.9094.
Conclusion: This method can efficiently obtain scientific research diagnostic terms in batches from the diagnostic texts of medical records, and the trained model can be applied to the obstetrics departments of many hospitals.
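The abstract does not say how the graph used for Louvain clustering is built. As a minimal sketch, assuming that standard terms are nodes and that edge weights encode some similarity between terms (a hypothetical construction, not necessarily the authors'), the first, local-moving phase of the Louvain algorithm can be written in plain Python. Each node is greedily moved to the neighbouring community with the largest modularity gain until no move improves modularity. A full Louvain run would then collapse each community into a single node and repeat; that aggregation phase is omitted here. The names `louvain_local_moving` and `group_terms` are illustrative only.

```python
from collections import defaultdict


def louvain_local_moving(edges):
    """One local-moving pass of Louvain (graph aggregation omitted).

    edges: iterable of (u, v, weight) for an undirected graph without self-loops.
    In this sketch, nodes are assumed to be standard terms and weights a
    hypothetical term-similarity score. Returns {node: community label}.
    """
    adj = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        adj[u][v] += w
        adj[v][u] += w
    nodes = sorted(adj)
    degree = {n: sum(adj[n].values()) for n in nodes}
    two_m = sum(degree.values())  # 2m: twice the total edge weight
    community = {n: n for n in nodes}  # start from singleton communities
    tot = dict(degree)  # sum of node degrees inside each community

    improved = True
    while improved:
        improved = False
        for n in nodes:
            current = community[n]
            # Total edge weight from n to each neighbouring community.
            links = defaultdict(float)
            for nb, w in adj[n].items():
                links[community[nb]] += w
            # Remove n from its community before evaluating candidate moves.
            tot[current] -= degree[n]
            # Modularity gain of inserting n into c, up to a constant factor:
            # k_{n,c} - tot[c] * k_n / (2m)
            best = current
            best_gain = links[current] - tot[current] * degree[n] / two_m
            for c, k_in in links.items():
                gain = k_in - tot[c] * degree[n] / two_m
                if gain > best_gain:  # strict: only move on a real improvement
                    best, best_gain = c, gain
            tot[best] += degree[n]
            community[n] = best
            if best != current:
                improved = True
    return community


def group_terms(community):
    """Group nodes by community label, as sorted lists (deterministic output)."""
    groups = defaultdict(list)
    for node, label in community.items():
        groups[label].append(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy graph: two tightly linked triangles joined by one weak bridge.
    toy = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
           ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
           ("c", "d", 1.0)]
    print(group_terms(louvain_local_moving(toy)))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

In the toy graph, each triangle ends up as one community. By analogy, a cluster of redundant standard terms would be merged into a single research diagnostic term. Because a node moves only when the gain is strictly positive, modularity increases with every move and the loop is guaranteed to terminate.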