NLP-Based De-Identification Techniques for Patient Data Anonymization

Veerendra Nath Jasthi

doi:10.47363/8kmq6z78

Authors

Veerendra Nath Jasthi, USA

Keywords:

De-identification, Patient Anonymization, Natural Language Processing (NLP), Electronic Health Records (EHR)

Abstract

Patient data in electronic health records (EHRs) is sensitive and personal, and legal frameworks such as HIPAA exist to protect it. De-identifying this data is essential to protecting patient privacy while allowing the records to be used in medical research and in the development of AI models. Natural Language Processing (NLP) has become an effective means of automating the de-identification of unstructured clinical narratives. This paper reviews NLP-based de-identification techniques, including rule-based methods, machine learning models, and deep learning approaches. These approaches are compared, and a hybrid model is proposed that combines Named Entity Recognition (NER) with BERT-based contextual models. Precision, recall, and F1-score are used as evaluation measures on benchmark datasets. The findings show that hybrid NLP techniques are generally more accurate and generalize better. The study contributes to data privacy in healthcare by enabling useful anonymization of textual patient records.

Author Biography

Veerendra Nath Jasthi, USA
