Objective Diagnostic texts in obstetric medical records are important for scientific research, but terms are difficult to extract from them. This paper presents a combined algorithm that automatically extracts standard diagnostic terms from diagnostic texts and can be applied in the obstetrics departments of different hospitals. Methods First, the MC-BERT model was trained on a labeled corpus, and the trained model was used to normalize the terms. Then, the Louvain algorithm was used to cluster redundant terms and automatically output scientific research diagnostic terms. Results Term normalization achieved an F1 of 0.9235 on the test set, and the combined algorithm automatically clustered 1,107 standard diagnostic terms into 106 scientific research diagnostic terms. The combined algorithm was also validated on a validation set from another hospital, where the F1 of the term normalization algorithm reached 0.9094. Conclusion This method can efficiently obtain scientific research diagnostic terms in batches from the diagnostic texts of medical records, and the trained model can be applied in the obstetrics departments of multiple hospitals.
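For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1 score for term normalization might be computed. The abstract does not state the averaging scheme, so micro-averaging over per-record sets of standard terms is an assumption here; the function name `micro_f1` and the toy terms are hypothetical and are not taken from the study.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over per-record sets of standard terms.

    Assumption: each record has a set of gold terms and a set of
    predicted terms; the paper's exact scoring scheme is not given.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g, p = set(g), set(p)
        tp += len(g & p)   # terms predicted and present in gold
        fp += len(p - g)   # terms predicted but absent from gold
        fn += len(g - p)   # gold terms that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two records with gold and predicted terms.
gold = [
    {"gestational diabetes mellitus"},
    {"placenta previa", "anemia in pregnancy"},
]
pred = [
    {"gestational diabetes mellitus"},
    {"placenta previa", "preeclampsia"},
]

score = micro_f1(gold, pred)
print(round(score, 4))
```

In this toy case two of three predicted terms are correct and two of three gold terms are recovered, so precision and recall are both 2/3.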
马银瑶; 毕文帅; 毛锦江; 孟晨伟; 吕翰林; 王雷. 基于真实世界的产科病案诊断文本的数据挖掘研究[J]. 中国当代医药, 2023, 30(20): 23-28.
MA Yinyao; BI Wenshuai; MAO Jinjiang; MENG Chenwei; LYU Hanlin; WANG Lei. Study of data mining research based on real-world obstetrical medical records diagnosis text[J]. China Modern Medicine, 2023, 30(20): 23-28.