Data-driven miRNA sequencing Normalization Assessment

Usage

assessNormalization(raw, normalized, negControls, posControls, clusters)

raw: Raw read count matrix (rows = genes, columns = samples). Both the rows and the columns must be named: rownames(raw) are the marker names and colnames(raw) are the sample names.
normalized: Named list of normalized count matrices, one for each normalization method under study. Each list member must be named (e.g. after the normalization method used). Each matrix in normalized must itself be named, with marker names as row names and sample names as column names. A list of normalized counts can be generated using the applyNormalization function.
negControls: Vector of negative control markers as generated by the function defineControls.
posControls: Vector of positive control markers as generated by the function defineControls.
clusters: Named vector of clusters that associates each miRNA in raw with a polycistronic cluster. Usually generated using the function defineClusters.
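
The naming requirements for raw and normalized can be checked before calling assessNormalization. The following is a minimal sketch in base R using toy data: the marker names, sample names, the counts-per-million stand-in normalization, and the helper isNamedMatrix are all hypothetical and not part of DANA. The final call is left commented out because negControls, posControls and clusters would first have to be produced by defineControls and defineClusters.

```r
set.seed(1)
markers <- paste0("miR-", 1:6)    # hypothetical marker names
samples <- paste0("sample", 1:4)  # hypothetical sample names

# Toy raw count matrix with named rows (markers) and columns (samples)
raw <- matrix(rpois(24, lambda = 50), nrow = 6,
              dimnames = list(markers, samples))

# Stand-in normalization (counts per million), used only to build an example list;
# in practice the list would typically come from applyNormalization
cpm <- t(t(raw) / colSums(raw)) * 1e6
normalized <- list(CPM = cpm)

# Hypothetical helper: TRUE if m is a matrix with both row and column names
isNamedMatrix <- function(m) {
  is.matrix(m) && !is.null(rownames(m)) && !is.null(colnames(m))
}

# The list itself must be named, and every member must be a named matrix
inputsOk <- isNamedMatrix(raw) &&
  !is.null(names(normalized)) &&
  all(nzchar(names(normalized))) &&
  all(vapply(normalized, isNamedMatrix, logical(1)))

# With valid inputs, the call would follow the usage shown above:
# res <- assessNormalization(raw, normalized, negControls, posControls, clusters)
```

If inputsOk is FALSE, the missing row, column, or list names should be fixed before running the assessment.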

Returns the DANA assessment metrics for each normalized count matrix in normalized. DANA computes two assessment metrics:

cc: Measures how well biological signals are preserved by normalization. Specifically, cc is the concordance correlation coefficient of the within-cluster partial correlations among positive controls before and after normalization. Its value is at most 1, and a higher value indicates better preservation of biological signals.
mscr: Measures the relative reduction of handling effects achieved by normalization. Specifically, mscr is the mean-squared correlation reduction among negative controls before versus after normalization. A higher mscr indicates greater removal of handling effects.
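
To make the two quantities more concrete, the sketch below computes a concordance correlation coefficient and a relative mean-squared correlation reduction on plain numeric vectors. It is an illustration only and not DANA's internal implementation: the function names ccc and relReduction are hypothetical, the coefficient uses the standard definition with population (1/n) moments, and the reduction formula is one plausible reading of "relative reduction". The input vectors stand in for correlation values that DANA would derive from the control markers.

```r
# Concordance correlation coefficient of two equally long numeric vectors,
# using population (1/n) moments
ccc <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) > 1)
  mx <- mean(x)
  my <- mean(y)
  sxy <- mean((x - mx) * (y - my))  # covariance
  sx2 <- mean((x - mx)^2)           # variance of x
  sy2 <- mean((y - my)^2)           # variance of y
  2 * sxy / (sx2 + sy2 + (mx - my)^2)
}

# Relative reduction of the mean-squared correlation (assumed formula):
# (mean(before^2) - mean(after^2)) / mean(before^2)
relReduction <- function(rBefore, rAfter) {
  mscBefore <- mean(rBefore^2)
  mscAfter <- mean(rAfter^2)
  (mscBefore - mscAfter) / mscBefore
}

# Made-up correlation values, for illustration only
rBefore <- c(0.8, -0.6, 0.5)
rAfter <- c(0.4, -0.3, 0.25)

agreement <- ccc(rBefore, rAfter)
reduction <- relReduction(rBefore, rAfter)
```

Identical vectors give a coefficient of exactly 1, and any shift or rescaling between them lowers it, which is why the coefficient suits a before-versus-after comparison.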

When selecting a normalization method for the raw data, aim for the best possible trade-off between high cc and high mscr.
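
One way to narrow down the candidates is to discard every method that another method matches or beats on both metrics. The sketch below applies this filter in base R. It is a heuristic offered here for illustration, not a selection rule prescribed by DANA; the data frame layout, the method labels, and all metric values are invented, and the helper isDominated is hypothetical.

```r
# Invented metric values for four hypothetical normalization methods
metrics <- data.frame(
  method = c("A", "B", "C", "D"),
  cc     = c(0.90, 0.70, 0.85, 0.60),
  mscr   = c(0.20, 0.60, 0.50, 0.55)
)

# Method i is dominated if some method is at least as good on both metrics
# and strictly better on at least one of them
isDominated <- function(i, m) {
  any(m$cc >= m$cc[i] & m$mscr >= m$mscr[i] &
        (m$cc > m$cc[i] | m$mscr > m$mscr[i]))
}

dominated <- vapply(seq_len(nrow(metrics)), isDominated, logical(1), m = metrics)

# Remaining candidates; the final choice among them is still a judgment call
candidates <- metrics[!dominated, ]
```

In this toy table only method D is removed, because method B has both a higher cc and a higher mscr. Choosing among the remaining methods still depends on how much weight is placed on preserving biological signal versus removing handling effects.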