
Nearest Neighbor Methods for the Imputation of Missing Values in Low and High-Dimensional Data

Material type: Text
Publication details: Göttingen : Cuvillier Verlag, 2018.
Description: 1 online resource (219 pages)
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 3736987412
  • 9783736987418
Additional physical formats: Print version: Nearest Neighbor Methods for the Imputation of Missing Values in Low and High-Dimensional Data.
DDC classification:
  • 519.53 23
LOC classification:
  • QA278
Contents:
Intro; Introduction; Methodological Concepts for Missing Data; Missing Values; Missing Data Mechanism; An Overview of Traditional Missing Data Techniques; Deletion Methods; Substitution Methods; Imputation; Nearest Neighbors Methods; Traditional k Nearest Neighbors Imputation; Modification of the Traditional kNN Imputation; Nearest Neighbors for High-Dimensional Data; Practical Issues in High-Dimensional Data (n<<p); Extensions to Binary and Multi-Categorical Data; Selection of Attributes by Weighted Distances; Using Nearest Neighbors to Impute Missing Values; A Pearson Correlation Strategy;
Extension to Mixed Type Data; Available Distances for Mixed Type Data; Weighted Distance for Mixed Type Data; Weighted Imputation by Nearest Neighbors; Bootstrap Inference; Missing Data and Classification; Multiple Imputation in High-Dimensional Data; MI using Nearest Neighbors; Improved Methods for the Imputation of Missing Data by Nearest Neighbor Methods; Introduction; Weighted Neighbors; Distances and Computation of Nearest Neighbors; Imputation Procedure; Choice of Tuning Parameters by Cross-Validation; Performance Measures; Evaluation of Weighted Neighbors;
Weighted Neighbors Including the Selection of Predictors; Selection of Dimensions; Evaluation of the Method with Selected Weighted Neighbors; Case Studies; Gene Expression Data; Non-gene Expression Data; Concluding Remarks; Missing Value Imputation for Gene Expression Data by Tailored Nearest Neighbors; Introduction; Materials and Methods; Nearest Neighbors Approaches to Imputation; Nearest Neighbor Based on Selected Genes; Choice of Tuning Parameters; Overview of Competing Methods; Results and Discussion; Data; Simulation and Evaluation; Real Data Sets; Concluding Remarks;
Nearest Neighbor Imputation for Categorical Data by Weighting of Attributes; Introduction; Methods; Distances for Categorical Variables; Selection of Attributes by Weighted Distances; Measuring Association Among Attributes; Using Nearest Neighbors to Impute Missing Values; A Pearson Correlation Strategy; Cross Validation; Evaluation of Performance; Existing Methods; Simulation Studies; Binary Variables; Multi-Categorical Variables; Mixed (Binary and Multi-Categorical) Variables; Applications; Concluding Remarks; Imputation Methods for High-Dimensional Mixed-Type Datasets by Nearest Neighbors;
Introduction; Distances for Mixed-Type Data; Available Methods; Weighted Distance; Weighted Distance With Selection of Variables; Weighted Imputation by Nearest Neighbors; Choice of Tuning Parameters by Cross-Validation; Measuring Association Among Mixed Variables; A Pearson Correlation Strategy; Existing Approaches for Comparison; Performance Measures; Simulation Studies; Real Data Applications; Concluding Remarks; Bootstrap Inference for Weighted Nearest Neighbors Imputation; Introduction; Nearest Neighbors Imputation of Missing Values; Bootstrap Sampling and Missing Data
Summary: Rapid advances in technology have made the collection of high-dimensional data routine. Despite these advances, the analysis of high-dimensional data still faces challenges concerning interpretation and integration. One of the major problems in high-dimensional data is the occurrence of missing values. The problem is particularly hard to handle when the variables follow different distributions or are measured on different scales (e.g. binary, multi-categorical, continuous). Whatever the reason, missing data may occur in all areas of applied research. Inadequate handling of missing values may lead to biased results and incorrect inference. Standard statistical techniques require complete cases without any missing observations. Deleting the cases with missing information to obtain complete data not only loses important information but can also affect inferences. In this dissertation, different imputation techniques using nearest neighbors are developed to address missing data in both high-dimensional and low-dimensional data structures.
Holdings
  • Item type: eBook
  • Current library: e-Library
  • Collection: EBSCO Business
  • Status: Available
Total holds: 0

Print version record.

Proposed Bootstrap Imputation Procedure

Master record variable field(s) change: 050, 082, 650
