Title: | Improved Allele-Specific Copy Number of SNP Microarrays for Downstream Segmentation |
---|---|
Description: | The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina. |
Authors: | Maria Ortiz [aut, ctb], Ander Aramburu [ctb], Henrik Bengtsson [aut, cre, cph], Pierre Neuvial [aut, ctb], Angel Rubio [aut, ctb] |
Maintainer: | Henrik Bengtsson <[email protected]> |
License: | LGPL (>= 2.1) |
Version: | 0.13.0 |
Built: | 2024-11-22 04:36:35 UTC |
Source: | https://github.com/HenrikBengtsson/calmate |
The CalMaTe method calibrates preprocessed allele-specific copy number estimates (ASCNs) from DNA microarrays by controlling for single-nucleotide polymorphism-specific allelic crosstalk. The resulting ASCNs are on average more accurate, which increases the power of segmentation methods for detecting changes between copy number states in tumor studies including copy neutral loss of heterozygosity. CalMaTe applies to any ASCNs regardless of preprocessing method and microarray technology, e.g. Affymetrix and Illumina.
This package depends on a set of packages that are all available via CRAN. It has been tested and verified to run on all common operating systems on which R runs, including Linux, Windows and OSX.
To install this package, do install.packages("calmate")
.
To process SNP and non-polymorphic signals, see calmateByTotalAndFracB
(). If you are working solely with SNP signals, calmateByThetaAB
() is also available, but we recommend the former.
For processing data in the aroma framework, see CalMaTeCalibration
.
Please cite [1] when using CalMaTe.
LGPL (>= 2.1).
Maria Ortiz [aut, ctb], Ander Aramburu [ctb], Henrik Bengtsson [aut, cre, cph], Pierre Neuvial [aut, ctb], Angel Rubio [aut, ctb].
[1] M. Ortiz-Estevez, A. Aramburu, H. Bengtsson, P. Neuvial and A. Rubio, CalMaTe: A method and software to improve allele-specific copy number of SNP arrays for downstream segmentation, Bioinformatics, 2012 [PMC3381965].
Normalize allele-specific copy numbers (CA,CB).
## S3 method for class 'array' calmateByThetaAB(data, references=NULL, ..., truncate=FALSE, refAvgFcn=NULL, flavor=c("v2", "v1"), verbose=FALSE)
## S3 method for class 'array' calmateByThetaAB(data, references=NULL, ..., truncate=FALSE, refAvgFcn=NULL, flavor=c("v2", "v1"), verbose=FALSE)
data |
An Jx2xI |
references |
An index |
... |
Additional arguments passed to the internal fit function
|
truncate |
If |
refAvgFcn |
(optional) A |
flavor |
A |
verbose |
See |
Returns an Jx2xI numeric
array
with the same dimension names as argument data
.
For backward compatibility, we try to keep all major versions of
the CalMaTe algorithm available. Older versions can be used by
specifying argument flavor
.
The default flavor is v2
.
For more information about the different flavors,
see fitCalMaTeInternal
.
[1] M. Ortiz-Estevez, A. Aramburu, H. Bengtsson, P. Neuvial and A. Rubio, CalMaTe: A method and software to improve allele-specific copy number of SNP arrays for downstream segmentation, Bioinformatics, 2012 [PMC3381965].
To calibrate (total,fracB) data,
see *calmateByTotalAndFracB()
.
We strongly recommend to always work with (total,fracB) data
instead of (CA,CB) data, because it is much more general.
For further information on the internal fit functions, see
fitCalMaTeInternal
.
# Load example (thetaA,thetaB) signals path <- system.file("exData", package="calmate"); theta <- loadObject("thetaAB,100x2x40.Rbin", path=path); # Calculate (CA,CB) thetaR <- matrixStats::rowMedians(theta[,"A",] + theta[,"B",], na.rm=TRUE); C <- 2*theta/thetaR; # Calibrate (CA,CB) by CalMaTe CC <- calmateByThetaAB(theta); # Plot two "random" arrays Clim <- c(0,4); subplots(4, ncol=2, byrow=FALSE); for (ii in c(1,5)) { sampleName <- dimnames(C)[[3]][ii]; sampleLabel <- sprintf("Sample #%d ('%s')", ii, sampleName); plot(C[,,ii], xlim=Clim, ylim=Clim); title(main=sampleLabel); plot(CC[,,ii], xlim=Clim, ylim=Clim); title(main=sprintf("%s\ncalibrated", sampleLabel)); }
# Load example (thetaA,thetaB) signals path <- system.file("exData", package="calmate"); theta <- loadObject("thetaAB,100x2x40.Rbin", path=path); # Calculate (CA,CB) thetaR <- matrixStats::rowMedians(theta[,"A",] + theta[,"B",], na.rm=TRUE); C <- 2*theta/thetaR; # Calibrate (CA,CB) by CalMaTe CC <- calmateByThetaAB(theta); # Plot two "random" arrays Clim <- c(0,4); subplots(4, ncol=2, byrow=FALSE); for (ii in c(1,5)) { sampleName <- dimnames(C)[[3]][ii]; sampleLabel <- sprintf("Sample #%d ('%s')", ii, sampleName); plot(C[,,ii], xlim=Clim, ylim=Clim); title(main=sampleLabel); plot(CC[,,ii], xlim=Clim, ylim=Clim); title(main=sprintf("%s\ncalibrated", sampleLabel)); }
Normalize allele-specific copy numbers (total,fracB), where total is the total (non-polymorphic) signal and
fracB is the allele B fraction.
It is only loci with a non-missing (NA
) fracB value that are
considered to be SNPs and normalized by CalMaTe. The other loci
are left untouched.
## S3 method for class 'array' calmateByTotalAndFracB(data, references=NULL, ..., refAvgFcn=NULL, verbose=FALSE)
## S3 method for class 'array' calmateByTotalAndFracB(data, references=NULL, ..., refAvgFcn=NULL, verbose=FALSE)
data |
An Jx2xI |
references |
A |
... |
Additional arguments passed to |
refAvgFcn |
(optional) A |
verbose |
See |
Returns an Jx2xI numeric
array
with the same dimension names as argument data
.
[1] M. Ortiz-Estevez, A. Aramburu, H. Bengtsson, P. Neuvial and A. Rubio, CalMaTe: A method and software to improve allele-specific copy number of SNP arrays for downstream segmentation, Bioinformatics, 2012 [PMC3381965].
To calibrate (thetaA,thetaB) or (CA,CB) signals,
see *calmateByThetaAB()
.
# Load example (thetaA,thetaB) signals path <- system.file("exData", package="calmate"); theta <- loadObject("thetaAB,100x2x40.Rbin", path=path); # Transform to (total,fracB) signals data <- thetaAB2TotalAndFracB(theta); # Calibrate (total,fracB) by CalMaTe dataC <- calmateByTotalAndFracB(data); # Calculate copy-number ratios theta <- data[,"total",]; thetaR <- matrixStats::rowMedians(theta, na.rm=TRUE); data[,"total",] <- 2*theta/thetaR; # Plot two "random" arrays Clim <- c(0,4); Blim <- c(0,1); subplots(4, ncol=2, byrow=FALSE); for (ii in c(1,5)) { sampleName <- dimnames(data)[[3]][ii]; sampleLabel <- sprintf("Sample #%d ('%s')", ii, sampleName); plot(data[,,ii], xlim=Clim, ylim=Blim); title(main=sampleLabel); plot(dataC[,,ii], xlim=Clim, ylim=Blim); title(main=sprintf("%s\ncalibrated", sampleLabel)); } # Assert that it also works with a single unit dummy <- calmateByTotalAndFracB(data[1,,,drop=FALSE]); stopifnot(length(dim(dummy)) == 3);
# Load example (thetaA,thetaB) signals path <- system.file("exData", package="calmate"); theta <- loadObject("thetaAB,100x2x40.Rbin", path=path); # Transform to (total,fracB) signals data <- thetaAB2TotalAndFracB(theta); # Calibrate (total,fracB) by CalMaTe dataC <- calmateByTotalAndFracB(data); # Calculate copy-number ratios theta <- data[,"total",]; thetaR <- matrixStats::rowMedians(theta, na.rm=TRUE); data[,"total",] <- 2*theta/thetaR; # Plot two "random" arrays Clim <- c(0,4); Blim <- c(0,1); subplots(4, ncol=2, byrow=FALSE); for (ii in c(1,5)) { sampleName <- dimnames(data)[[3]][ii]; sampleLabel <- sprintf("Sample #%d ('%s')", ii, sampleName); plot(data[,,ii], xlim=Clim, ylim=Blim); title(main=sampleLabel); plot(dataC[,,ii], xlim=Clim, ylim=Blim); title(main=sprintf("%s\ncalibrated", sampleLabel)); } # Assert that it also works with a single unit dummy <- calmateByTotalAndFracB(data[1,,,drop=FALSE]); stopifnot(length(dim(dummy)) == 3);
Package: calmate
Class CalMaTeCalibration
Object
~~|
~~+--
ParametersInterface
~~~~~~~|
~~~~~~~+--
CalMaTeCalibration
Directly known subclasses:
public static class CalMaTeCalibration
extends ParametersInterface
This class represents the CalMaTe method [1], which corrects for SNP effects in allele-specific copy-number estimates (ASCNs).
CalMaTeCalibration(data=NULL, tags="*", references=NULL, flavor=c("v2", "v1"), ...)
CalMaTeCalibration(data=NULL, tags="*", references=NULL, flavor=c("v2", "v1"), ...)
data |
A named |
tags |
Tags added to the output data sets. |
references |
An optional |
flavor |
A |
... |
Additional arguments passed to |
Methods:
findUnitsTodo |
- | |
getDataSets |
- | |
getFullName |
- | |
getName |
- | |
getOutputDataSets |
- | |
getPath |
- | |
getReferences |
- | |
getRootPath |
- | |
getTags |
- | |
nbrOfFiles |
- | |
process |
- | |
setTags |
- | |
Methods inherited from ParametersInterface:
getParameterSets, getParameters, getParametersAsString
Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, names, objectSize, print, save, asThis
In order to estimate the calibration parameters, the model assumes that, for any given SNP, there are a majority of samples that are diploid at that SNP. Note that it does not have to be the same set of samples for all SNPs.
By using argument references
, it is possible so specify which
samples should be used when estimating the calibration parameters.
This is useful when for instance there are several tumor samples with
unknown properties as well as a set of normal samples that can be
assumed to be diploid.
Theoretical, a minimum of three reference samples are needed in order for the model to be identifiable. If less, an error is thrown. However, in practice more reference samples should be used, that is, in the order of at least 6-10 reference samples with a diverse set of genotypes.
For backward compatibility, we try to keep all major versions of
the CalMaTe algorithm available. Older versions can be used by
specifying argument flavor
.
For more information about the different flavors,
see fitCalMaTeInternal
.
[1] M. Ortiz-Estevez, A. Aramburu, H. Bengtsson, P. Neuvial and A. Rubio, CalMaTe: A method and software to improve allele-specific copy number of SNP arrays for downstream segmentation, Bioinformatics, 2012 [PMC3381965].
Low-level versions of the CalMaTe method is available
via the calmateByThetaAB
() and
calmateByTotalAndFracB
() methods.
For further information on the internal fit functions, see
fitCalMaTeInternal
.
## Not run: # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # CRMAv2 - Preprocess raw Affymetrix data # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - library("aroma.affymetrix"); # Needed for CRMAv2 dataSet <- "Affymetrix_2006-TumorNormal"; chipType <- "Mapping250K_Nsp"; dsList <- doCRMAv2(dataSet, chipType=chipType, combineAlleles=FALSE, plm="RmaCnPlm", verbose=-10); print(dsList); # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # CalMaTe - Post-calibration of ASCNs estimates # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - asn <- CalMaTeCalibration(dsList); print(asn); # For speed issues, we will here only process loci on Chromosome 17. chr <- 17; ugp <- getAromaUgpFile(dsList$total); units <- getUnitsOnChromosome(ugp, chr); dsNList <- process(asn, units=units, verbose=verbose); print(dsNList); # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Plot allele B fractions (before and after) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Sample #1 and Chromosome 17 ii <- 1; # Extract raw (TCN,BAF) df <- getFile(dsList$total, ii); dfR <- getAverageFile(dsList$total, verbose=verbose); gamma <- extractRawCopyNumbers(df, logBase=NULL, chromosome=chr); gammaR <- extractRawCopyNumbers(dfR, logBase=NULL, chromosome=chr); gamma <- 2*divideBy(gamma, gammaR); df <- getFile(dsList$fracB, ii); beta <- extractRawAlleleBFractions(df, chromosome=chr); # Extract calibrated (TCN,BAF) dfN <- getFile(dsNList$fracB, ii); betaN <- extractRawAlleleBFractions(dfN, chromosome=chr); dfN <- getFile(dsNList$total, ii); gammaN <- extractRawCopyNumbers(dfN, logBase=NULL, chromosome=chr); # Plot subplots(4, ncol=2, byrow=FALSE); plot(beta); title(sprintf("%s", getName(beta))); plot(gamma); plot(betaN); title(sprintf("%s (CalMaTe)", getName(betaN))); plot(gammaN); ## End(Not run)
## Not run: # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # CRMAv2 - Preprocess raw Affymetrix data # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - library("aroma.affymetrix"); # Needed for CRMAv2 dataSet <- "Affymetrix_2006-TumorNormal"; chipType <- "Mapping250K_Nsp"; dsList <- doCRMAv2(dataSet, chipType=chipType, combineAlleles=FALSE, plm="RmaCnPlm", verbose=-10); print(dsList); # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # CalMaTe - Post-calibration of ASCNs estimates # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - asn <- CalMaTeCalibration(dsList); print(asn); # For speed issues, we will here only process loci on Chromosome 17. chr <- 17; ugp <- getAromaUgpFile(dsList$total); units <- getUnitsOnChromosome(ugp, chr); dsNList <- process(asn, units=units, verbose=verbose); print(dsNList); # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Plot allele B fractions (before and after) # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - # Sample #1 and Chromosome 17 ii <- 1; # Extract raw (TCN,BAF) df <- getFile(dsList$total, ii); dfR <- getAverageFile(dsList$total, verbose=verbose); gamma <- extractRawCopyNumbers(df, logBase=NULL, chromosome=chr); gammaR <- extractRawCopyNumbers(dfR, logBase=NULL, chromosome=chr); gamma <- 2*divideBy(gamma, gammaR); df <- getFile(dsList$fracB, ii); beta <- extractRawAlleleBFractions(df, chromosome=chr); # Extract calibrated (TCN,BAF) dfN <- getFile(dsNList$fracB, ii); betaN <- extractRawAlleleBFractions(dfN, chromosome=chr); dfN <- getFile(dsNList$total, ii); gammaN <- extractRawCopyNumbers(dfN, logBase=NULL, chromosome=chr); # Plot subplots(4, ncol=2, byrow=FALSE); plot(beta); title(sprintf("%s", getName(beta))); plot(gamma); plot(betaN); title(sprintf("%s (CalMaTe)", getName(betaN))); plot(gammaN); ## End(Not run)