The feasibility of constructing a Predictive Outcome Model for breast cancer using the tools of data mining

Hdl Handle:
http://hdl.handle.net/2336/20532
Title:
The feasibility of constructing a Predictive Outcome Model for breast cancer using the tools of data mining
Authors:
Jonsdottir, T; Hvannberg, E.T; Sigurdsson, H; Sigurdsson, S
Citation:
Expert Systems with Applications 2008, 34(1):108-118
Issue Date:
1-Jan-2008
Abstract:
A Predictive Outcome Model (POM) for breast cancer was built, and its ability to accurately predict the 5-year outcome of a cancer case was assessed. A wide range of feature selection and classification methods were applied to find the best-performing algorithms on the given dataset. A dedicated Model Selection Tool (MST) was developed to facilitate the search for the most efficient classifier model. The MST includes programs for choosing among classification algorithms, selecting feature subsets, dealing with imbalance in the data, and evaluating predictive performance by various measures. These steps are important in most data mining tasks, and carrying them out manually would be time-consuming. The dataset, Rose, was assembled retrospectively for this study and contains records of 257 women diagnosed with primary breast cancer in Iceland during 1996-1998. An extra feature containing a doctor's risk assessment was added to the dataset, which initially contained 400 features, both to see how much it could enhance the model's performance and to investigate to what extent such a subjective assessment can be predicted from the remaining features. The main result is that similar performance is achieved regardless of which algorithm is used. Furthermore, including the doctor's assessment does not appear to enhance performance significantly. This is also reflected in the fact that, when the resulting Kappa values are compared, the models are generally more successful at predicting the doctor's risk assessment than the actual outcome.
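The abstract compares models by their Kappa values. As a minimal sketch, assuming the standard Cohen's kappa for two labelings (the record does not say which kappa variant the authors used), the statistic corrects observed agreement for the agreement expected by chance, which is why it is informative on imbalanced outcome data. The function name `cohens_kappa` and the toy labels below are hypothetical illustrations, not part of the authors' MST or the Rose dataset.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two labelings, corrected for chance.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of cases where the two labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement p_e, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both labelings use one identical class only; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Invented toy labels (0 = no event within 5 years, 1 = event).
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5

# A majority-class predictor on imbalanced data: 80% accuracy, zero kappa.
outcome = [0] * 8 + [1] * 2
always_zero = [0] * 10
print(cohens_kappa(outcome, always_zero))  # 0.0
```

The second example shows why a chance-corrected measure suits this setting: a classifier that ignores the minority outcome can still score high accuracy, but its kappa is zero.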
Description:
To access the publisher's full-text version of this article, please click on the hyperlink in the Additional Links field.
Additional Links:
http://www.sciencedirect.com/science/article/B6V03-4M0J33F-5/2/70c309e81337f936d97c1873a8d3668a

Full metadata record

DC Field | Value | Language
dc.contributor.author | Jonsdottir, T | -
dc.contributor.author | Hvannberg, E.T | -
dc.contributor.author | Sigurdsson, H | -
dc.contributor.author | Sigurdsson, S | -
dc.date.accessioned | 2008-03-13T08:40:52Z | -
dc.date.available | 2008-03-13T08:40:52Z | -
dc.date.issued | 2008-01-01 | -
dc.date.submitted | 2008-03-13 | -
dc.identifier.citation | Expert Systems with Applications 2008, 34(1):108-118 | en
dc.identifier.issn | 0957-4174 | -
dc.identifier.doi | 10.1016/j.eswa.2006.08.029 | -
dc.identifier.uri | http://hdl.handle.net/2336/20532 | -
dc.description | To access the publisher's full-text version of this article, please click on the hyperlink in the Additional Links field. | en
dc.description.abstract | Same as the Abstract above. | en
dc.language.iso | is | en
dc.publisher | Pergamon Press Ltd. | en
dc.relation.url | http://www.sciencedirect.com/science/article/B6V03-4M0J33F-5/2/70c309e81337f936d97c1873a8d3668a | en
dc.subject.mesh | Breast Neoplasms | en
dc.subject.mesh | Data Collection | en
dc.title | The feasibility of constructing a Predictive Outcome Model for breast cancer using the tools of data mining | is
dc.type | Article | en
dc.contributor.department | Cancer Center for Research and Development, Landspitali-University Hospital, Kopavogsbraut 5-7, 105 Kopavogur, Iceland | en
dc.identifier.journal | Expert systems with applications | en
All Items in Hirsla are protected by copyright, with all rights reserved, unless otherwise indicated.