NLP-Based De-Identification Techniques for Patient Data Anonymization

Authors

  • Veerendra Nath Jasthi, USA

DOI:

https://doi.org/10.47363/8kmq6z78

Keywords:

De-identification, Patient Anonymization, Natural Language Processing (NLP), Electronic Health Records (EHR)

Abstract

Patient data in electronic health records (EHRs) is sensitive and personal, and legal frameworks such as HIPAA exist to protect it. De-identifying these records protects patient privacy while still allowing the data to be used in medical research and in the development of AI models. Natural Language Processing (NLP) has become an effective way to automate the de-identification of unstructured clinical narratives. This paper discusses NLP-based de-identification techniques, including rule-based methods, machine learning models, and deep learning approaches. These approaches are compared, and a hybrid model is developed that combines Named Entity Recognition (NER) with BERT-based contextual models. The approaches are evaluated on benchmark datasets using precision, recall, and F1-score. The findings show that hybrid NLP techniques are more accurate and generalize better. The study contributes to healthcare data privacy by enabling effective anonymization of patients' textual records.
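
The abstract names rule-based de-identification as one family of techniques and precision, recall, and F1-score as the evaluation measures. The Python sketch below illustrates both under stated assumptions. It is not the paper's hybrid NER/BERT model. The regular-expression rules (PHI_PATTERNS), the helper names (find_phi, redact, prf1), the synthetic note, and the choice of exact-match span scoring are all hypothetical and chosen for illustration. Rule sets used in practice cover many more identifier types.

```python
import re
from typing import List, Set, Tuple

# A span is (start offset, end offset, PHI label) over the raw note text.
Span = Tuple[int, int, str]

# Hypothetical rules for illustration only; real rule sets cover many more
# identifier types and are tuned to local note formats.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),
}


def find_phi(text: str) -> List[Span]:
    """Return every span matched by any rule, sorted by start offset."""
    spans = []
    for label, pattern in PHI_PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Works right to left so earlier offsets stay valid; assumes the
    spans do not overlap.
    """
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


def prf1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span-level precision, recall and F1-score."""
    tp = len(predicted & gold)  # spans with identical offsets and label
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    note = "Seen 03/14/2021, MRN: 12345678, call 555-123-4567. Patient John Smith."
    predicted = find_phi(note)
    print(redact(note, predicted))
    # Seen [DATE], [MRN], call [PHONE]. Patient John Smith.

    def gold_span(sub: str, label: str) -> Span:
        """Build a hand-labeled gold span by locating `sub` in the note."""
        i = note.index(sub)
        return (i, i + len(sub), label)

    gold = {
        gold_span("03/14/2021", "DATE"),
        gold_span("MRN: 12345678", "MRN"),
        gold_span("555-123-4567", "PHONE"),
        gold_span("John Smith", "NAME"),
    }
    scores = prf1(set(predicted), gold)
    print(tuple(round(x, 3) for x in scores))
    # (1.0, 0.75, 0.857)
```

The name "John Smith" survives redaction. Fixed patterns handle formatted identifiers such as dates and phone numbers, but they do not handle free-text names. As a result, recall drops while precision stays perfect. Contextual NER models are meant to close this kind of gap.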

Author Biography

  • Veerendra Nath Jasthi, USA

Published

2023-06-20

How to Cite

NLP-Based De-Identification Techniques for Patient Data Anonymization. (2023). Journal of Artificial Intelligence & Cloud Computing, 2(2), 1-6. https://doi.org/10.47363/8kmq6z78
