A Data Representation Model for Personalized Medicine
Abstract
Personalized medicine generates and exploits patient data such as genetic composition, key biomarkers, treatment history, environmental factors, behavioral preferences, and demographic data. Exploring these data is difficult for three reasons: information is lost when the data are transformed, the data types are heterogeneous, and events form time series. To address these problems, we propose a data representation model. It handles structured data, temporal or non-temporal, of four types: numeric, nominal, date, and Boolean. After transforming the date and Boolean data, we treat the nominal data by dispersion and apply several clustering techniques with different numbers of clusters to control the distribution of the numeric data. The result is three homogeneous representations, each with only two dimensions and easy to explore. Compared with the Symbolic Aggregate Approximation (SAX) technique, our model preserves time-series information, retains as much of the data as possible, and offers multiple simple representations to explore.
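For context, the SAX baseline mentioned above can be sketched in a few lines. This is a minimal illustration of the standard SAX steps (z-normalization, Piecewise Aggregate Approximation, and discretization with equiprobable Gaussian breakpoints), not the model proposed here. The function name `sax`, its parameters, and the sample values are assumptions chosen for illustration.

```python
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # 1. Z-normalize so that the standard Gaussian breakpoints apply.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * n
    else:
        normalized = [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: one mean per segment.
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(mean(normalized[start:end]))

    # 3. Breakpoints that cut N(0, 1) into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Hypothetical sample values: two different series, one SAX word.
print(sax([1, 2, 3, 4, 5, 6, 7, 8], 4))
print(sax([2, 1, 4, 3, 6, 5, 8, 7], 4))
```

Both calls yield the same word because the order of values within a segment is averaged away. This is one way in which a symbolic summary of this kind can discard information that a representation may want to keep.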
Origin: Files produced by the author(s)