Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing.

Hdl Handle:
http://hdl.handle.net/2336/227413
Title:
Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing.
Authors:
Love, Thorvardur Jon; Cai, Tianxi; Karlson, Elizabeth W
Citation:
Semin. Arthritis Rheum. 2011, 40(5):413-20
Issue Date:
Apr-2011
Abstract:
OBJECTIVES: To test whether data extracted from full text patient visit notes from an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data. METHODS: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operator curve was used to identify the optimal algorithm and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. RESULTS: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operator curve (P < 0.001). CONCLUSIONS: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research.
Description:
To access the publisher's full-text version of this article, please click on the hyperlink in the Additional Links field.
Additional Links:
http://dx.doi.org/10.1016/j.semarthrit.2010.05.002
Rights:
Archived with thanks to Seminars in arthritis and rheumatism
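
The cut-point rule described in the abstract (maximum sensitivity at a PPV of at least 90%) can be illustrated with a minimal sketch. This is not the authors' code: the function name `choose_cut_point`, the toy scores, and the toy labels are all hypothetical, and the scores merely stand in for the predicted probabilities a random forest would produce. Because PPV does not have to change monotonically with the threshold, the sketch checks every candidate threshold rather than stopping at the first one that fails.

```python
from typing import List, Optional, Tuple


def choose_cut_point(
    scores: List[float],
    labels: List[int],
    target_ppv: float = 0.90,
) -> Optional[Tuple[float, float, float]]:
    """Pick the threshold with the highest sensitivity among those whose
    PPV is at least target_ppv.

    scores: predicted probability of being a true case (hypothetical input).
    labels: 1 for a chart-review-confirmed case, 0 otherwise.
    Returns (threshold, ppv, sensitivity), or None if no threshold qualifies.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return None  # sensitivity is undefined without any true cases

    best = None
    # Every distinct score is a candidate threshold; a chart is called
    # positive when its score is >= the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        ppv = tp / (tp + fp)  # tp + fp >= 1 because threshold is a real score
        sensitivity = tp / total_pos
        if ppv >= target_ppv and (best is None or sensitivity > best[2]):
            best = (threshold, ppv, sensitivity)
    return best


# Toy data, invented for illustration only.
toy_scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

result = choose_cut_point(toy_scores, toy_labels, target_ppv=0.90)
```

On the toy data, lowering the target from 0.90 to 0.85 moves the selected threshold from 0.80 down to 0.40, which shows the trade-off between PPV and sensitivity that the cut-point controls.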

Full metadata record

DC Field | Value | Language
dc.contributor.author | Love, Thorvardur Jon | en_GB
dc.contributor.author | Cai, Tianxi | en_GB
dc.contributor.author | Karlson, Elizabeth W | en_GB
dc.date.accessioned | 2012-06-04T11:09:30Z | -
dc.date.available | 2012-06-04T11:09:30Z | -
dc.date.issued | 2011-04 | -
dc.date.submitted | 2012-06-04 | -
dc.identifier.citation | Semin. Arthritis Rheum. 2011, 40(5):413-20 | en_GB
dc.identifier.issn | 1532-866X | -
dc.identifier.pmid | 20701955 | -
dc.identifier.doi | 10.1016/j.semarthrit.2010.05.002 | -
dc.identifier.uri | http://hdl.handle.net/2336/227413 | -
dc.description | To access the publisher's full-text version of this article, please click on the hyperlink in the Additional Links field. | en_GB
dc.description.abstract | OBJECTIVES: To test whether data extracted from full text patient visit notes from an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data. METHODS: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operator curve was used to identify the optimal algorithm and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA. RESULTS: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operator curve (P < 0.001). CONCLUSIONS: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research. | en_GB
dc.language.iso | en | en
dc.publisher | W.B. Saunders | en_GB
dc.relation.url | http://dx.doi.org/10.1016/j.semarthrit.2010.05.002 | en_GB
dc.rights | Archived with thanks to Seminars in arthritis and rheumatism | en_GB
dc.subject.mesh | Aged | en_GB
dc.subject.mesh | Algorithms | en_GB
dc.subject.mesh | Arthritis, Psoriatic | en_GB
dc.subject.mesh | Electronic Health Records | en_GB
dc.subject.mesh | Female | en_GB
dc.subject.mesh | Humans | en_GB
dc.subject.mesh | Male | en_GB
dc.subject.mesh | Middle Aged | en_GB
dc.subject.mesh | Natural Language Processing | en_GB
dc.subject.mesh | ROC Curve | en_GB
dc.subject.mesh | Sensitivity and Specificity | en_GB
dc.title | Validation of psoriatic arthritis diagnoses in electronic medical records using natural language processing. | en
dc.type | Article | en
dc.contributor.department | Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA. tlove2@partners.org | en_GB
dc.identifier.journal | Seminars in arthritis and rheumatism | en_GB
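
The abstract reports each PPV and sensitivity with a 95% confidence interval but does not say which interval method was used. As a rough illustration of how such an interval for a proportion can be computed, the sketch below uses the Wilson score interval; that choice, the function name `wilson_interval`, and the example count of 279 confirmed cases out of 300 (an assumed count consistent with a 93% PPV, not a figure given in the record) are all assumptions, so the result should not be expected to reproduce the published interval exactly.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion.

    successes: number of confirmed cases (e.g. true positives).
    n: number of charts reviewed.
    z: normal quantile; 1.96 corresponds to roughly 95% coverage.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")

    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed counts for illustration: 279 of 300 reviewed charts confirmed.
lower, upper = wilson_interval(279, 300)
```

The Wilson interval is used here only because it behaves well for proportions near 0 or 1; other methods, such as the exact (Clopper-Pearson) interval, give slightly different bounds.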

All Items in Hirsla are protected by copyright, with all rights reserved, unless otherwise indicated.