Level of confounding calculation
calc.confounding.level.Rd
Calculate the level of confounding between handling effects and the sample group of interest for a dataset. First, principal component analysis is applied to the non-biological subset of the data. Each of the first five principal components is then used in a simple linear regression model to predict the sample group. The highest adjusted R-squared among these models is returned as the level of confounding.
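The procedure described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name `sketch.confounding.level`, the `n.pc` argument, the 0/1 coding of a two-level sample group, and the choice to return `k_pc` as a component index are all assumptions made here.

```r
# Hypothetical sketch for illustration only; not the package's source code.
sketch.confounding.level <- function(data, group.id, nbe.genes, n.pc = 5) {
  stopifnot(length(group.id) == ncol(data),
            length(nbe.genes) == nrow(data))

  # Keep the non-biological probes; transpose so samples are rows for prcomp()
  nbe.data <- t(data[nbe.genes, , drop = FALSE])
  pca <- prcomp(nbe.data, center = TRUE, scale. = FALSE)
  n.pc <- min(n.pc, ncol(pca$x))

  # Assumption: a two-level sample group, coded 0/1 as the response
  y <- as.numeric(factor(group.id)) - 1

  # One simple linear regression per principal component
  adj.r2 <- vapply(seq_len(n.pc), function(k) {
    summary(lm(y ~ pca$x[, k]))$adj.r.squared
  }, numeric(1))

  # Assumption: k_pc is reported as the index of the best-fitting component
  list(locc = max(adj.r2), k_pc = which.max(adj.r2))
}

# Simulated data: 200 probes by 20 samples, with a shift on the
# non-biological probes that coincides with the sample group
set.seed(1)
sim.data <- matrix(rnorm(200 * 20), nrow = 200)
sim.group <- rep(c("E", "V"), each = 10)
sim.nbe <- rep(c(TRUE, FALSE), times = 100)
sim.data[sim.nbe, sim.group == "V"] <- sim.data[sim.nbe, sim.group == "V"] + 2

sim.res <- sketch.confounding.level(sim.data, sim.group, sim.nbe)
```

With a single predictor, the adjusted R-squared is the same whichever of the two variables is treated as the response, so the direction of the regression does not affect the result in this sketch.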
Arguments
- data
microarray dataset. It must have rows as probes and columns as samples.
- group.id
a vector of sample-group labels for each sample of the dataset.
- nbe.genes
a logical vector of non-biological genes used to filter the dataset. Non-biological genes are indicated as TRUE, otherwise as FALSE. The vector must have an equal length to the number of probes in the dataset.
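The shape requirements on the three arguments can be checked before calling the function. The helper below is a hypothetical sketch (the name `check.confounding.inputs` and the toy data are assumptions, not part of the package); it only encodes the constraints stated in the argument descriptions.

```r
# Hypothetical helper, not part of the package: checks the documented shapes.
check.confounding.inputs <- function(data, group.id, nbe.genes) {
  stopifnot(
    is.matrix(data) || is.data.frame(data),
    length(group.id) == ncol(data),    # one label per sample (column)
    is.logical(nbe.genes),
    length(nbe.genes) == nrow(data)    # one TRUE/FALSE flag per probe (row)
  )
  invisible(TRUE)
}

# Toy inputs: 6 probes (rows) by 4 samples (columns)
toy.data <- matrix(rnorm(24), nrow = 6,
                   dimnames = list(paste0("probe", 1:6), paste0("s", 1:4)))
toy.group <- c("E", "E", "V", "V")
toy.nbe <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

toy.ok <- check.confounding.inputs(toy.data, toy.group, toy.nbe)
```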
Value
a list of two elements:
- locc
the level of confounding
- k_pc
the most correlated principal component of the non-biological genes in the dataset with the sample group
References
Leek J., Scharpf R., Bravo H., et al. Tackling the widespread and critical impact of batch effects in high-throughput data. Nat Rev Genet 11:733-9, 2010.
Examples
if (FALSE) {
  # Estimate the biological and handling effects
  biological.effect <- estimate.biological.effect(uhdata = uhdata.pl)
  handling.effect <- estimate.handling.effect(uhdata = uhdata.pl,
                                              nuhdata = nuhdata.pl)

  # Drop control probes (row names containing "NC")
  ctrl.genes <- unique(rownames(uhdata.pl))[grep("NC", unique(rownames(uhdata.pl)))]
  biological.effect.nc <- biological.effect[!rownames(biological.effect)
                                            %in% ctrl.genes, ]
  handling.effect.nc <- handling.effect[!rownames(handling.effect) %in% ctrl.genes, ]

  # The sample-group label is the 7th character of each column name
  group.id <- substr(colnames(biological.effect.nc), 7, 7)

  # Select 64 samples from each group for training
  biological.effect.train.ind <- colnames(biological.effect.nc)[c(
    sample(which(group.id == "E"), size = 64),
    sample(which(group.id == "V"), size = 64))]
  handling.effect.train.ind <- colnames(handling.effect.nc)[c(1:64, 129:192)]

  # Randomly create a logical vector for nbe.genes (for illustration only)
  nbe.genes <- sample(c(TRUE, FALSE), size = nrow(biological.effect.nc), replace = TRUE)

  calc.confounding.level(data = biological.effect.nc[, biological.effect.train.ind],
                         group.id = substr(biological.effect.train.ind, 7, 7),
                         nbe.genes = nbe.genes)
}