
Data Mining for Genomics and Proteomics

  • E-Book (pdf)
  • 328 pages
Read e-books easily with the free Ex Libris Reader app.
CHF 77.00
Download available immediately
About e-books
E-books can also be read on mobile devices.
E-books from Ex Libris are copy-protected with Adobe DRM.

Description

Data Mining for Genomics and Proteomics uses pragmatic examples and a complete case study to demonstrate step-by-step how biomedical studies can be used to maximize the chance of extracting new and useful biomedical knowledge from data. It is an excellent resource for students and professionals involved with gene or protein expression data in a variety of settings.

About the Author
Darius M. Dziuda, PhD, is Associate Professor of Data Mining and Statistics in the Department of Mathematical Sciences at Central Connecticut State University (CCSU). His research and professional activities have been focused on efficient data mining of biomedical data and on methods for identification of parsimonious multivariate biomarkers for medical diagnosis, prognosis, personalized medicine, and drug discovery. For CCSU's data mining program, Dr. Dziuda developed and teaches graduate-level courses on Data Mining for Genomics and Proteomics and on Biomarker Discovery.

Blurb
Practical methods for mining gene and protein expression data

Proper analysis and mining of the rapidly growing amount of available genomic and proteomic data is vital for advances in biomedical research. Data Mining for Genomics and Proteomics describes efficient methods for analysis of gene and protein expression data. Dr. Darius Dziuda demonstrates step by step how biomedical studies can and should be performed to maximize the chance of extracting new and useful biomedical knowledge from available data. Readers receive clear guidance on when to use particular data mining methods and why, along with the reasons why some popular approaches can lead to inferior results.

This book covers all aspects of gene and protein expression analysis, from technology, data preprocessing, quality assessment, and basic exploratory analysis to unsupervised and supervised learning algorithms, feature selection, and biomarker discovery. Also presented is a novel method for identification of the Informative Set of Genes, defined as a set containing all information significant for the differentiation of classes represented in training data. Special attention is given to multivariate biomarker discovery leading to parsimonious and generalizable classifiers. In addition, exercises and examples of hands-on analysis of real-world gene expression data sets give readers an opportunity to put the methods they have learned to practical use.

Data Mining for Genomics and Proteomics is an excellent resource for data mining specialists, bioinformaticians, computational biologists, biomedical scientists, computer scientists, molecular biologists, and life scientists. It is also ideal for upper-level undergraduate and graduate-level students of bioinformatics, data mining, computational biology, and biomedical sciences, as well as anyone interested in efficient methods of knowledge discovery based on high-dimensional data.



Contents
Preface.

Acknowledgments.

1 Introduction.

1.1 Basic Terminology.

1.1.1 The Central Dogma of Molecular Biology.

1.1.2 Genome.

1.1.3 Proteome.

1.1.4 DNA (Deoxyribonucleic Acid).

1.1.5 RNA (Ribonucleic Acid).

1.1.6 mRNA (messenger RNA).

1.1.7 Genetic Code.

1.1.8 Gene.

1.1.9 Gene Expression and the Gene Expression Level.

1.1.10 Protein.

1.2 Overlapping Areas of Research.

1.2.1 Genomics.

1.2.2 Proteomics.

1.2.3 Bioinformatics.

1.2.4 Transcriptomics and Other -omics.

1.2.5 Data Mining.

2 Basic Analysis of Gene Expression Microarray Data.

2.1 Introduction.

2.2 Microarray Technology.

2.2.1 Spotted Microarrays.

2.2.2 Affymetrix GeneChip® Microarrays.

2.2.3 Bead-Based Microarrays.

2.3 Low-Level Preprocessing of Affymetrix Microarrays.

2.3.1 MAS5.

2.3.2 RMA.

2.3.3 GCRMA.

2.3.4 PLIER.

2.4 Public Repositories of Microarray Data.

2.4.1 Microarray Gene Expression Data Society (MGED) Standards.

2.4.2 Public Databases.

2.4.2.1 Gene Expression Omnibus (GEO).

2.4.2.2 ArrayExpress.

2.5 Gene Expression Matrix.

2.5.1 Elements of Gene Expression Microarray Data Analysis.

2.6 Additional Preprocessing, Quality Assessment, and Filtering.

2.6.1 Quality Assessment.

2.6.2 Filtering.

2.7 Basic Exploratory Data Analysis.

2.7.1 t Test.

2.7.1.1 t Test for Equal Variances.

2.7.1.2 t Test for Unequal Variances.

2.7.2 ANOVA F Test.

2.7.3 SAM t Statistic.

2.7.4 Limma.

2.7.5 Adjustment for Multiple Comparisons.

2.7.5.1 Single-Step Bonferroni Procedure.

2.7.5.2 Single-Step Sidak Procedure.

2.7.5.3 Step-Down Holm Procedure.

2.7.5.4 Step-Up Benjamini and Hochberg Procedure.

2.7.5.5 Permutation Based Multiplicity Adjustment.

2.8 Unsupervised Learning (Taxonomy-Related Analysis).

2.8.1 Cluster Analysis.

2.8.1.1 Measures of Similarity or Distance.

2.8.1.2 K-Means Clustering.

2.8.1.3 Hierarchical Clustering.

2.8.1.4 Two-Way Clustering and Related Methods.

2.8.2 Principal Component Analysis.

2.8.3 Self-Organizing Maps.

Exercises.

3 Biomarker Discovery and Classification.

3.1 Overview.

3.1.1 Gene Expression Matrix . . . Again.

3.1.2 Biomarker Discovery.

3.1.3 Classification Systems.

3.1.3.1 Parametric and Nonparametric Learning Algorithms.

3.1.3.2 Terms Associated with Common Assumptions Underlying Parametric Learning Algorithms.

3.1.3.3 Visualization of Classification Results.

3.1.4 Validation of the Classification Model.

3.1.4.1 Reclassification.

3.1.4.2 Leave-One-Out and K-Fold Cross-Validation.

3.1.4.3 External and Internal Cross-Validation.

3.1.4.4 Holdout Method of Validation.

3.1.4.5 Ensemble-Based Validation (Using Out-of-Bag Samples).

3.1.4.6 Validation on an Independent Data Set.

3.1.5 Reporting Validation Results.

3.1.5.1 Binary Classifiers.

3.1.5.2 Multiclass Classifiers.

3.1.6 Identifying Biological Processes Underlying the Class Differentiation.

3.2 Feature Selection.

3.2.1 Introduction.

3.2.2 Univariate Versus Multivariate Approaches.

3.2.3 Supervised Versus Unsupervised Methods.

3.2.4 Taxonomy of Feature Selection Methods.

3.2.4.1 Filters, Wrappers, Hybrid, and Embedded Models.

3.2.4.2 Strategy: Exhaustive, Complete, Sequential, Random, and Hybrid Searches.

3.2.4.3 Subset Evaluation Criteria.

3.2.4.4 Search-Stopping Criteria.

3.2.5 Feature Selection for Multiclass Discrimination.

3.2.6 Regularization and Feature Selection.

3.2.7 Stability of Biomarkers.

3.3 Discriminant Analysis.

3.3.1 Introduction.

3.3.2 Learning Algorithm.

3.3.3 A Stepwise Hybrid Feature Selection with T².

3.4 Support Vector Machines.

3.4.1 Hard-Margin Support Vector Machines.

3.4.2 Soft-Margin Support Vector Machines.

3.4.3 Kernels.

3.4.4 SVMs and Multiclass Discrimination.

3.4.4.1 One-Versus-the-Rest Approach.

3.4.4.2 Pairwise Approach.

3.4.4.3 All-Classes-Simultaneously Approach.

3.4.5 SVMs and Feature Selection: Recursive Feature Elimination.

3.4.6 Summary.

3.5 Random Forests.

3.5.1 Introduction.

3.5.2 Random Forests Learning Algorithm.

3.5.3 Random Forests and Feature Selection.

3.5.4 Summary.

3.6 Ensemble Classifiers, Bootstrap Methods, and The Modified Bagging Schema.

3.6.1 Ensemble Classifiers.

3.6.1.1 Parallel Approach.

3.6.1.2 Serial...

Product Information

Title: Data Mining for Genomics and Proteomics
Subtitle: Analysis of Gene and Protein Expression Data
Author: Darius M. Dziuda
EAN: 9780470593400
ISBN: 978-0-470-59340-0
Digital copy protection: Adobe DRM
Format: E-Book (pdf)
Publisher: Wiley-Interscience
Genre: Computer Science
Number of pages: 328
Publication date: 16.07.2010
Year: 2010
Language: English
File size: 7.9 MB