Classification analysis of uniformly-handled data — uni.handled.simulate • PRECISION.array

Perform classification analysis on the uniformly-handled data by repeatedly re-assigning samples to training and test sets. More details can be found in Qin et al. (see References).

Usage

uni.handled.simulate(
  seed,
  N,
  biological.effect,
  norm.list = c("NN", "QN"),
  class.list = c("PAM", "LASSO"),
  norm.funcs = NULL,
  class.funcs = NULL,
  pred.funcs = NULL
)

Arguments

seed: an integer used to initialize a pseudorandom number generator.
N: number of simulation runs.
biological.effect: the estimated biological effect dataset. This dataset must have rows as probes and columns as samples.
norm.list: a list of strings for normalization methods to be compared in the simulation study. The built-in normalization methods are "NN", "QN", "MN", and "VSN", for "No Normalization", "Quantile Normalization", "Median Normalization", and "Variance Stabilizing Normalization", respectively. Users can add their own normalization methods to the list, provided the corresponding functions are supplied (see also norm.funcs).
class.list: a list of strings for classification methods to be compared in the simulation study. The built-in classification methods are "PAM" and "LASSO", for "prediction analysis for microarrays" and "least absolute shrinkage and selection operator", respectively. Users can add their own classification methods to the list, provided the corresponding model-building and predicting functions are supplied (see also class.funcs and pred.funcs).
norm.funcs: a list of strings for names of user-defined normalization method functions, in the order of norm.list, excluding any built-in normalization methods.
class.funcs: a list of strings for names of user-defined classification model-building functions, in the order of class.list, excluding any built-in classification methods.
pred.funcs: a list of strings for names of user-defined classification predicting functions, in the order of class.list, excluding any built-in classification methods.

Value

Benchmark analysis results: a list of training-and-test-set splits, fitted models, and misclassification error rates across simulation runs:

assign_store: random training-and-test-set splits
model_store: models for each combination of normalization methods and classification methods
error_store: internal and external misclassification error rates for each combination of normalization methods and classification methods

Details

The analysis for the uniformly-handled dataset consists of the following main steps:

(1) randomly split the data into a training set and a test set, balanced by sample group of interest

(2) preprocess the training data and the test data

(3) build a classifier using the preprocessed training data

(4) assess the misclassification error rate of the classifier using the preprocessed test data

This analysis is repeated for N random splits of training set and test set.
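The balanced random split in step (1), repeated over N runs, can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper balanced_split, the train_frac argument, and the toy group labels "A" and "B" are all assumptions made for the example.

```r
# Hypothetical helper (not part of PRECISION.array): split sample indices
# into training and test sets, drawing the same fraction from every group.
balanced_split <- function(group, train_frac = 0.5) {
  by_group <- split(seq_along(group), group)
  train_idx <- unlist(lapply(by_group, function(idx) {
    n_train <- round(length(idx) * train_frac)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  list(train = sort(train_idx),
       test  = setdiff(seq_along(group), train_idx))
}

set.seed(1)                            # plays the role of the `seed` argument
group <- rep(c("A", "B"), each = 10)   # toy sample-group labels
N <- 3                                 # number of simulation runs

# One independent split per simulation run
splits <- lapply(seq_len(N), function(i) balanced_split(group))

# Group counts in the first training set: both groups are equally represented
table(group[splits[[1]]$train])
```

Sampling within each group, rather than from all samples at once, is what keeps the two sets balanced by the sample group of interest.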

Data preprocessing in (2) includes three steps: log2 transformation, normalization of the training data with frozen normalization of the test data, and probe-set summarization using the median. Normalization methods are specified in norm.list.
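As an illustration of these three steps, the base R sketch below applies quantile normalization to toy training data and then maps toy test data onto the training reference distribution, which is one common reading of "frozen" normalization. The helper names (quantile_normalize, frozen_quantile_normalize, median_summarize) and the simulated data are assumptions for this example; the package's own implementation may differ.

```r
# Hypothetical helpers; data have probes in rows and samples in columns.

# Quantile normalization: every sample receives the same reference
# distribution (the mean of the sorted columns).
quantile_normalize <- function(x) {
  ref <- unname(rowMeans(apply(x, 2, sort)))
  out <- apply(x, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x)
  list(data = out, ref = ref)
}

# Frozen normalization: test samples are mapped onto the reference
# distribution learned from the training data only.
frozen_quantile_normalize <- function(x_new, ref) {
  out <- apply(x_new, 2, function(col) ref[rank(col, ties.method = "first")])
  dimnames(out) <- dimnames(x_new)
  out
}

# Probe-set summarization: median over rows that share a probe name.
median_summarize <- function(x) {
  probe_set <- rownames(x)
  apply(x, 2, function(col) tapply(col, probe_set, median))
}

set.seed(1)
probes <- rep(paste0("probe", 1:4), each = 3)   # 3 replicates per probe set
train_raw <- matrix(2^rnorm(12 * 5, mean = 8), nrow = 12,
                    dimnames = list(probes, paste0("train", 1:5)))
test_raw  <- matrix(2^rnorm(12 * 3, mean = 8), nrow = 12,
                    dimnames = list(probes, paste0("test", 1:3)))

train_qn <- quantile_normalize(log2(train_raw))                   # log2, then QN
test_qn  <- frozen_quantile_normalize(log2(test_raw), train_qn$ref)

train_final <- median_summarize(train_qn$data)   # 4 probe sets x 5 samples
test_final  <- median_summarize(test_qn)         # 4 probe sets x 3 samples
```

Because the reference distribution comes from the training data alone, the test samples never influence how the training data are normalized.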

Classifier building in (3) includes choosing the tuning parameter for each method using five-fold cross-validation and measuring classifier accuracy using the misclassification error rate. Classification methods are specified in class.list.

The error rate is evaluated both by external validation on the test data and by cross-validation on the training data. For user-defined normalization or classification methods, please refer to the vignette.
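The tuning and error evaluation described above can be sketched in base R. To keep the example self-contained, a toy nearest-centroid classifier stands in for PAM and LASSO, with the number of retained features k as its tuning parameter. The classifier, the helper names, the class-balanced fold assignment, and the simulated data are all assumptions for illustration and do not reproduce the package's code.

```r
# Toy classifier (an assumption, standing in for PAM/LASSO): nearest centroid
# on the k features with the largest absolute difference in group means.
fit_centroid <- function(x, y, k) {
  lev <- levels(y)
  m1 <- rowMeans(x[, y == lev[1], drop = FALSE])
  m2 <- rowMeans(x[, y == lev[2], drop = FALSE])
  keep <- order(abs(m1 - m2), decreasing = TRUE)[seq_len(k)]
  list(keep = keep, centroids = cbind(m1, m2)[keep, , drop = FALSE], levels = lev)
}

predict_centroid <- function(model, x_new) {
  z <- x_new[model$keep, , drop = FALSE]
  d1 <- colSums((z - model$centroids[, 1])^2)
  d2 <- colSums((z - model$centroids[, 2])^2)
  factor(ifelse(d1 <= d2, model$levels[1], model$levels[2]), levels = model$levels)
}

error_rate <- function(truth, pred) mean(truth != pred)

# Assign folds within each class so that every fold contains both classes.
make_folds <- function(y, n_folds = 5) {
  fold <- integer(length(y))
  for (lev in levels(y)) {
    idx <- which(y == lev)
    fold[idx] <- sample(rep_len(seq_len(n_folds), length(idx)))
  }
  fold
}

# Cross-validated misclassification error for one value of k.
cv_error <- function(x, y, k, fold) {
  mean(vapply(sort(unique(fold)), function(f) {
    model <- fit_centroid(x[, fold != f, drop = FALSE], y[fold != f], k)
    error_rate(y[fold == f], predict_centroid(model, x[, fold == f, drop = FALSE]))
  }, numeric(1)))
}

set.seed(1)
simulate_data <- function(n_per_group, n_feat = 50) {
  y <- factor(rep(c("A", "B"), each = n_per_group))
  x <- matrix(rnorm(n_feat * 2 * n_per_group), nrow = n_feat)
  x[1:5, y == "B"] <- x[1:5, y == "B"] + 2    # five informative features
  list(x = x, y = y)
}
train <- simulate_data(20)
test  <- simulate_data(10)

fold    <- make_folds(train$y, n_folds = 5)
k_grid  <- c(1, 5, 10, 25, 50)
cv_errs <- vapply(k_grid, function(k) cv_error(train$x, train$y, k, fold), numeric(1))
best_k  <- k_grid[which.min(cv_errs)]

final_model    <- fit_centroid(train$x, train$y, best_k)
internal_error <- min(cv_errs)   # cross-validation error on the training data
external_error <- error_rate(test$y, predict_centroid(final_model, test$x))
```

The same folds are reused for every candidate k, so differences in cross-validated error reflect the tuning parameter rather than the random fold assignment.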

References

Qin LX, Huang HC, Begg CB. Cautionary note on cross validation in molecular classification. Journal of Clinical Oncology. 2016.

Examples

if (FALSE) {
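# Estimate the biological effect from the uniformly-handled dataset
biological.effect <- estimate.biological.effect(uhdata = uhdata.pl)

# Collect the names of the control probes (names containing "NC")
ctrl.genes <- unique(rownames(uhdata.pl))[grep("NC", unique(rownames(uhdata.pl)))]

# Remove the control probes from the estimated biological effect
biological.effect.nc <- biological.effect[!rownames(biological.effect) %in% ctrl.genes, ]

# Run 3 simulations comparing no normalization vs. quantile normalization,
# each with PAM and LASSO classifiers
uni.handled.results <- uni.handled.simulate(seed = 1, N = 3,
                                            biological.effect = biological.effect.nc,
                                            norm.list = c("NN", "QN"),
                                            class.list = c("PAM", "LASSO"))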
}