Package 'PSCBS' reference manual

Title:	Analysis of Parent-Specific DNA Copy Numbers
Description:	Segmentation of allele-specific DNA copy number data and detection of regions with abnormal copy number within each parental chromosome. Both tumor-normal paired and tumor-only analyses are supported.
Authors:	Henrik Bengtsson [aut, cre, cph], Pierre Neuvial [aut], Venkatraman E. Seshan [aut], Adam B. Olshen [aut], Paul T. Spellman [aut], Richard A. Olshen [aut], Frank E Harrell Jr [ctb] (Weighted quantile estimator adopted from Hmisc::wtd.quantile())
Maintainer:	Henrik Bengtsson <[email protected]>
License:	GPL (>= 2)
Version:	0.67.0
Built:	2025-02-11 04:43:11 UTC
Source:	https://github.com/HenrikBengtsson/PSCBS

Package PSCBS

Description

Segmentation of allele-specific DNA copy number data and detection of regions with abnormal copy number within each parental chromosome. Both tumor-normal paired and tumor-only analyses are supported..

This package should be considered to be in an alpha or beta phase. You should expect the API to be changing over time.

Installation and updates

To install this package, use install.packages("PSCBS").

To get started

To get started, see:

Vignettes 'Parent-specific copy-number segmentation using Paired PSCBS' and 'Total copy-number segmentation using CBS'.
segmentByCBS() - segments total copy-numbers, or any other unimodal genomic signals, using the CBS method [3,4].
segmentByPairedPSCBS() - segments allele-specific tumor signal from a tumor with a matched normal using the Paired PSCBS method [1,2].
segmentByNonPairedPSCBS() - segments allele-specific tumor signal from a tumor without a matched normal using the Non-Paired PSCBS method adopted from [1,2].

How to cite

Please use [1] and [2] to cite when using Paired PSCBS, and [3] and [4] when using CBS. When using Non-Paired PSCBS, please cite [1] and [2] as well.

License

GPL (>= 2).

Author(s)

Henrik Bengtsson

References

[1] A.B. Olshen, H. Bengtsson, P. Neuvial, P.T. Spellman, R.A. Olshen, V.E. Seshan, Parent-specific copy number in paired tumor-normal studies using circular binary segmentation, Bioinformatics, 2011
[2] H. Bengtsson, P. Neuvial and T.P. Speed, TumorBoost: Normalization of allele-specific tumor copy numbers from a single pair of tumor-normal genotyping microarrays, BMC Bioinformatics, 2010
[3] A.B. Olshen, E.S. Venkatraman (aka Venkatraman E. Seshan), R. Lucito and M. Wigler, Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 2004
[4] E.S. Venkatraman and A.B. Olshen, A faster circular binary segmentation algorithm for the analysis of array CGH data, Bioinformatics, 2007

Calls/drops single-locus outliers along the genome

Description

Calls/drops single-locus outliers along the genome that have a signal that differ significantly from the neighboring loci.

Usage

 ## Default S3 method:
callSegmentationOutliers(y, chromosome=0, x=NULL, method="DNAcopy::smooth.CNA", ...,
  verbose=FALSE)
 ## S3 method for class 'data.frame'
callSegmentationOutliers(y, ...)
 ## Default S3 method:
dropSegmentationOutliers(y, ...)
 ## S3 method for class 'data.frame'
dropSegmentationOutliers(y, ...)
## Default S3 method:
callSegmentationOutliers(y, chromosome=0, x=NULL, method="DNAcopy::smooth.CNA", ...,
  verbose=FALSE)
 ## S3 method for class 'data.frame'
callSegmentationOutliers(y, ...)
 ## Default S3 method:
dropSegmentationOutliers(y, ...)
 ## S3 method for class 'data.frame'
dropSegmentationOutliers(y, ...)

Arguments

`y`	A `numeric` `vector` of J genomic signals to be segmented.
`chromosome`	(Optional) An `integer` scalar (or a `vector` of length J contain a unique value). Only used for annotation purposes.
`x`	Optional `numeric` `vector` of J genomic locations. If `NULL`, index locations `1:J` are used.
`method`	A `character` string specifying the method used for calling outliers.
`...`	Additional arguments passed to internal outlier detection method.
`verbose`	See `Verbose`.

Value

callSegmentationOutliers() returns a logical vector of length J. dropSegmentationOutliers() returns an object of the same type as argument y, where the signals for which outliers were called have been set to NA.

Missing and non-finite values

Signals as well as genomic positions may contain missing values, i.e. NAs or NaNs. By definition, these cannot be outliers.

Author(s)

Henrik Bengtsson

The CBS class

Description

A CBS object holds results from the Circular Binary Segmentation (CBS) method for one sample for one or more chromosomes.

Package: PSCBS
Class CBS

list
~~|
~~+--AbstractCBS
~~~~~~~|
~~~~~~~+--CBS

Directly known subclasses:

public abstract class CBS
extends AbstractCBS

Usage

CBS(...)
CBS(...)

Arguments

...

Arguments passed to the constructor of AbstractCBS.

Fields and Methods

Methods:

	`as`	-
	`c`	-
	`estimateStandardDeviation`	-
	`plotTracks`	-
	`pruneBySdUndo`	-
	`segmentByCBS`	-
	`seqOfSegmentsByDP`	-
	`writeSegments`	-

Methods inherited from AbstractCBS:
adjustPloidyScale, all.equal, as.data.frame, clearCalls, drawChangePoints, drawKnownSegments, dropChangePoint, dropChangePoints, dropRegion, dropRegions, extractCNs, extractChromosome, extractChromosomes, extractRegions, extractSegments, extractWIG, getChangePoints, getChromosomeOffsets, getChromosomeRanges, getChromosomes, getLocusData, getLocusSignalNames, getMeanEstimators, getSampleName, getSegmentSizes, getSegmentTrackPrefixes, getSegments, mergeThreeSegments, mergeTwoSegments, nbrOfChangePoints, nbrOfChromosomes, nbrOfLoci, nbrOfSegments, normalizeTotalCNs, ploidy, ploidy<-, plotTracks, print, pruneByDP, pruneByHClust, renameChromosomes, report, resegment, resetSegments, sampleCNs, sampleName, sampleName<-, seqOfSegmentsByDP, setLocusData, setMeanEstimators, setPloidy, setSampleName, setSegments, shiftTCN, tileChromosomes, updateMeans, writeWIG

Methods inherited from list:
Ops,nonStructure,vector-method, Ops,structure,vector-method, Ops,vector,nonStructure-method, Ops,vector,structure-method, all.equal, as.data.frame, attachLocally, callHooks, coerce,ANY,list-method, relist, type.convert, within

Difference to DNAcopy object

A CBS object is similar to DNAcopy objects with the major difference that a CBS object holds only one sample, whereas a DNAcopy object can hold more than one sample.

Author(s)

Henrik Bengtsson

Identifies gaps of a genome where there exist no observations

Description

Identifies gaps of a genome where there exist no observations.

Usage

## Default S3 method:
findLargeGaps(chromosome=NULL, x, minLength, resolution=1L, ...)
## Default S3 method:
findLargeGaps(chromosome=NULL, x, minLength, resolution=1L, ...)

Arguments

`chromosome`	(Optional) An `integer` `vector` of length J of chromosome indices.
`x`	A `numeric` `vector` of J of genomic locations.
`minLength`	A positive `numeric` scalar specifying the minimum length of a gap.
`resolution`	A non-negative `numeric` specifying the minimum length unit, which by default equals one nucleotide/base pair.
`...`	Not used.

Value

Returns data.frame zero or more rows and with columns chromosome (if given), start, stop, and length.

Author(s)

Henrik Bengtsson

Gets the genomic segments that are complementary to the gaps

Description

Gets the genomic segments that are complementary to the gaps, with default chromosome boundaries being -Inf and +Inf.

Usage

## S3 method for class 'data.frame'
gapsToSegments(gaps, resolution=1L, minLength=0L, dropGaps=FALSE, ...)
## S3 method for class 'data.frame'
gapsToSegments(gaps, resolution=1L, minLength=0L, dropGaps=FALSE, ...)

Arguments

`gaps`	A `data.frame` with columns `chromosome`, `start`, and `stop`. Any overlapping gaps will throw an error.
`resolution`	A non-negative `numeric` specifying the minimum length unit, which by default equals one nucleotide/base pair.
`minLength`	Minimum length of segments to be kept.
`dropGaps`	If `TRUE`, the gaps themselves are not part of the output.
`...`	Not used.

Value

Returns data.frame of least one row with columns chromosome if that argument is given), start, stop and length. The segments are ordered along the genome.

Author(s)

Henrik Bengtsson

The NonPairedPSCBS class

Description

Package: PSCBS
Class NonPairedPSCBS

list
~~|
~~+--AbstractCBS
~~~~~~~|
~~~~~~~+--PSCBS
~~~~~~~~~~~~|
~~~~~~~~~~~~+--NonPairedPSCBS

Directly known subclasses:

public abstract class NonPairedPSCBS
extends PSCBS

A NonPairedPSCBS is an object containing the results from the Non-paired PSCBS method.

Usage

NonPairedPSCBS(fit=list(), ...)NonPairedPSCBS(fit=list(), ...)

Arguments

`fit`	A `list` structure containing the Non-paired PSCBS results.
`...`	Not used.

Fields and Methods

Methods:
No methods defined.

Methods inherited from PSCBS:
as.data.frame, c, drawChangePoints, extractChromosomes, extractWIG, getLocusData, getLocusSignalNames, getSegmentTrackPrefixes, isLocallyPhased, isSegmentSplitter, normalizeTotalCNs, writeSegments

Author(s)

Henrik Bengtsson

The PairedPSCBS class

Description

Package: PSCBS
Class PairedPSCBS

list
~~|
~~+--AbstractCBS
~~~~~~~|
~~~~~~~+--PSCBS
~~~~~~~~~~~~|
~~~~~~~~~~~~+--PairedPSCBS

Directly known subclasses:

public abstract class PairedPSCBS
extends PSCBS

A PairedPSCBS is an object containing the results from the Paired PSCBS method.

Usage

PairedPSCBS(fit=list(), ...)PairedPSCBS(fit=list(), ...)

Arguments

`fit`	A `list` structure containing the Paired PSCBS results.
`...`	Not used.

Fields and Methods

Methods:

	`callAB`	-
	`callCopyNeutral`	-
	`callGNL`	-
	`callGNLByTCNofAB`	-
	`callGainNeutralLoss`	-
	`callLOH`	-
	`callNTCN`	-
	`callROH`	-
	`estimateDeltaAB`	-
	`estimateDeltaLOH`	-
	`estimateKappa`	-
	`extractCNs`	-
	`hasBootstrapSummaries`	-
	`plotTracks`	-
	`segmentByNonPairedPSCBS`	-
	`segmentByPairedPSCBS`	-
	`seqOfSegmentsByDP`	-

Author(s)

Henrik Bengtsson

The PSCBS class

Description

Package: PSCBS
Class PSCBS

list
~~|
~~+--AbstractCBS
~~~~~~~|
~~~~~~~+--PSCBS

Directly known subclasses:
NonPairedPSCBS, PairedPSCBS

public abstract class PSCBS
extends AbstractCBS

A PSCBS is an object containing results from parent-specific copy-number (PSCN) segmentation.

Usage

PSCBS(fit=list(), ...)PSCBS(fit=list(), ...)

Arguments

`fit`	A `list` structure containing the PSCN segmentation results.
`...`	Not used.

Fields and Methods

Methods:

	`c`	-
	`isLocallyPhased`	-
	`normalizeTotalCNs`	-
	`writeSegments`	-

Author(s)

Henrik Bengtsson

Segment genomic signals using the CBS method

Description

Segment genomic signals using the CBS method of the DNAcopy package. This is a convenient low-level wrapper for the DNAcopy::segment() method. It is intended to be applied to a sample at the time. For more details on the Circular Binary Segmentation (CBS) method see [1,2].

Usage

## Default S3 method:
segmentByCBS(y, chromosome=0L, x=NULL, index=seq_along(y), w=NULL, undo=0,
  avg=c("mean", "median"), ..., joinSegments=TRUE, knownSegments=NULL, seed=NULL,
  verbose=FALSE)
## Default S3 method:
segmentByCBS(y, chromosome=0L, x=NULL, index=seq_along(y), w=NULL, undo=0,
  avg=c("mean", "median"), ..., joinSegments=TRUE, knownSegments=NULL, seed=NULL,
  verbose=FALSE)

Arguments

`y`	A `numeric` `vector` of J genomic signals to be segmented.
`chromosome`	Optional `numeric` `vector` of length J, specifying the chromosome of each loci. If a scalar, it is expanded to a vector of length J.
`x`	Optional `numeric` `vector` of J genomic locations. If `NULL`, index locations `1:J` are used.
`index`	An optional `integer` `vector` of length J specifying the genomewide indices of the loci.
`w`	Optional `numeric` `vector` in [0,1] of J weights.
`undo`	A non-negative `numeric`. If greater than zero, then arguments `undo.splits="sdundo"` and `undo.SD=undo` are passed to `DNAcopy::segment()`. In the special case when `undo` is +`Inf`, the segmentation result will not contain any changepoints (in addition to what is specified by argument `knownSegments`).
`avg`	A `character` string specifying how to calculating segment mean levels after change points have been identified.
`...`	Additional arguments passed to the `DNAcopy::segment()` segmentation function.
`joinSegments`	If `TRUE`, there are no gaps between neighboring segments. If `FALSE`, the boundaries of a segment are defined by the support that the loci in the segments provides, i.e. there exist a locus at each end point of each segment. This also means that there is a gap between any neighboring segments, unless the change point is in the middle of multiple loci with the same position. The latter is what `DNAcopy::segment()` returns.
`knownSegments`	Optional `data.frame` specifying non-overlapping known segments. These segments must not share loci. See `findLargeGaps`() and `gapsToSegments`().
`seed`	An (optional) `integer` specifying the random seed to be set before calling the segmentation method. The random seed is set to its original state when exiting. If `NULL`, it is not set.
`verbose`	See `Verbose`.

Details

Internally segment of DNAcopy is used to segment the signals. This segmentation method support weighted segmentation.

Value

Returns a CBS object.

Reproducibility

The DNAcopy::segment() implementation of CBS uses approximation through random sampling for some estimates. Because of this, repeated calls using the same signals may result in slightly different results, unless the random seed is set/fixed.

Missing and non-finite values

Signals may contain missing values (NA or NaN), but not infinite values (+/-Inf). Loci with missing-value signals are preserved and keep in the result.

Likewise, genomic positions may contain missing values. However, if they do, such loci are silently excluded before performing the segmentation, and are not kept in the results. The mapping between the input locus-level data and ditto of the result can be inferred from the index column of the locus-level data of the result.

None of the input data may have infinite values, i.e. -Inf or +Inf. If so, an informative error is thrown.

Author(s)

Henrik Bengtsson

References

[1] A.B. Olshen, E.S. Venkatraman (aka Venkatraman E. Seshan), R. Lucito and M. Wigler, Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 2004
[2] E.S. Venkatraman and A.B. Olshen, A faster circular binary segmentation algorithm for the analysis of array CGH data, Bioinformatics, 2007

Examples

 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Simulating copy-number data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xBEEF)

# Number of loci
J <- 1000

mu <- double(J)
mu[200:300] <- mu[200:300] + 1
mu[350:400] <- NA # centromere
mu[650:800] <- mu[650:800] - 1
eps <- rnorm(J, sd=1/2)
y <- mu + eps
x <- sort(runif(length(y), max=length(y))) * 1e5
w <- runif(J)
w[650:800] <- 0.001


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x)
print(fit)
plotTracks(fit)


     
xlab <- "Position (Mb)"
ylim <- c(-3,3)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)

 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# TESTS
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
print(fit)
##   id chromosome       start      end nbrOfLoci    mean
## 1  y          0    55167.82 20774251       201  0.0164
## 2  y          0 20774250.85 29320105        99  1.0474
## 3  y          0 29320104.86 65874675       349 -0.0227
## 4  y          0 65874675.06 81348129       151 -1.0813
## 5  y          0 81348129.20 99910827       200 -0.0612


# Test #1: Reverse the ordering and segment
fitR <- segmentByCBS(rev(y), x=rev(x), seed=0xBEEF)
# Sanity check
stopifnot(all.equal(getSegments(fitR), getSegments(fit)))
# Sanity check
stopifnot(all.equal(rev(getLocusData(fitR)$index), getLocusData(fit)$index))

# Test #2: Reverse, but preserve ordering of 'data' object
fitRP <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE)
stopifnot(all.equal(getSegments(fitRP), getSegments(fit)))


# (Test #3: Change points inbetween data points at the same locus)
x[650:654] <- x[649]
fitC <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE, seed=0xBEEF)

# Test #4: Allow for some missing values in signals
y[450] <- NA
fitD <- segmentByCBS(y, x=x, seed=0xBEEF)


# Test #5: Allow for some missing genomic annotations
x[495] <- NA
fitE <- segmentByCBS(y, x=x, seed=0xBEEF)


# Test #6: Undo all change points found
fitF <- segmentByCBS(y, x=x, undo=Inf, seed=0xBEEF)
print(fitF)
stopifnot(nbrOfSegments(fitF) == 1L)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# MISC.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Emulate a centromere
x[650:699] <- NA
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)

fitC <- segmentByCBS(y, x=x, joinSegments=FALSE, seed=0xBEEF)
drawLevels(fitC, col="blue", lwd=2, xScale=1e-6)



# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Multiple chromosomes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Appending CBS results
fit1 <- segmentByCBS(y, chromosome=1, x=x)
fit2 <- segmentByCBS(y, chromosome=2, x=x)
fit <- c(fit1, fit2)
print(fit)
plotTracks(fit, subset=NULL, lwd=2, Clim=c(-3,3))


# Segmenting multiple chromosomes at once
chromosomeWG <- rep(1:2, each=J)
xWG <- rep(x, times=2)
yWG <- rep(y, times=2)
fitWG <- segmentByCBS(yWG, chromosome=chromosomeWG, x=xWG)
print(fitWG)
plotTracks(fitWG, subset=NULL, lwd=2, Clim=c(-3,3))

# Assert same results
fit$data[,"index"] <- getLocusData(fitWG)[,"index"] # Ignore 'index'
stopifnot(all.equal(getLocusData(fitWG), getLocusData(fit)))
stopifnot(all.equal(getSegments(fitWG), getSegments(fit)))

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Simulating copy-number data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
set.seed(0xBEEF)

# Number of loci
J <- 1000

mu <- double(J)
mu[200:300] <- mu[200:300] + 1
mu[350:400] <- NA # centromere
mu[650:800] <- mu[650:800] - 1
eps <- rnorm(J, sd=1/2)
y <- mu + eps
x <- sort(runif(length(y), max=length(y))) * 1e5
w <- runif(J)
w[650:800] <- 0.001


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x)
print(fit)
plotTracks(fit)


     
xlab <- "Position (Mb)"
ylim <- c(-3,3)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)

 
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# TESTS
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
print(fit)
##   id chromosome       start      end nbrOfLoci    mean
## 1  y          0    55167.82 20774251       201  0.0164
## 2  y          0 20774250.85 29320105        99  1.0474
## 3  y          0 29320104.86 65874675       349 -0.0227
## 4  y          0 65874675.06 81348129       151 -1.0813
## 5  y          0 81348129.20 99910827       200 -0.0612


# Test #1: Reverse the ordering and segment
fitR <- segmentByCBS(rev(y), x=rev(x), seed=0xBEEF)
# Sanity check
stopifnot(all.equal(getSegments(fitR), getSegments(fit)))
# Sanity check
stopifnot(all.equal(rev(getLocusData(fitR)$index), getLocusData(fit)$index))

# Test #2: Reverse, but preserve ordering of 'data' object
fitRP <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE)
stopifnot(all.equal(getSegments(fitRP), getSegments(fit)))


# (Test #3: Change points inbetween data points at the same locus)
x[650:654] <- x[649]
fitC <- segmentByCBS(rev(y), x=rev(x), preserveOrder=TRUE, seed=0xBEEF)

# Test #4: Allow for some missing values in signals
y[450] <- NA
fitD <- segmentByCBS(y, x=x, seed=0xBEEF)


# Test #5: Allow for some missing genomic annotations
x[495] <- NA
fitE <- segmentByCBS(y, x=x, seed=0xBEEF)


# Test #6: Undo all change points found
fitF <- segmentByCBS(y, x=x, undo=Inf, seed=0xBEEF)
print(fitF)
stopifnot(nbrOfSegments(fitF) == 1L)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# MISC.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Emulate a centromere
x[650:699] <- NA
fit <- segmentByCBS(y, x=x, seed=0xBEEF)
xMb <- x/1e6
plot(xMb,y, pch=20, col="#aaaaaa", xlab=xlab, ylim=ylim)
drawLevels(fit, col="red", lwd=2, xScale=1e-6)

fitC <- segmentByCBS(y, x=x, joinSegments=FALSE, seed=0xBEEF)
drawLevels(fitC, col="blue", lwd=2, xScale=1e-6)



# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Multiple chromosomes
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Appending CBS results
fit1 <- segmentByCBS(y, chromosome=1, x=x)
fit2 <- segmentByCBS(y, chromosome=2, x=x)
fit <- c(fit1, fit2)
print(fit)
plotTracks(fit, subset=NULL, lwd=2, Clim=c(-3,3))


# Segmenting multiple chromosomes at once
chromosomeWG <- rep(1:2, each=J)
xWG <- rep(x, times=2)
yWG <- rep(y, times=2)
fitWG <- segmentByCBS(yWG, chromosome=chromosomeWG, x=xWG)
print(fitWG)
plotTracks(fitWG, subset=NULL, lwd=2, Clim=c(-3,3))

# Assert same results
fit$data[,"index"] <- getLocusData(fitWG)[,"index"] # Ignore 'index'
stopifnot(all.equal(getLocusData(fitWG), getLocusData(fit)))
stopifnot(all.equal(getSegments(fitWG), getSegments(fit)))

Segment total copy numbers and allele B fractions using the Non-paired PSCBS method

Description

Segment total copy numbers and allele B fractions using the Non-paired PSCBS method [1]. This method does not requires matched normals. This is a low-level segmentation method. It is intended to be applied to one tumor sample at the time.

Usage

## Default S3 method:
segmentByNonPairedPSCBS(CT, betaT, ..., flavor=c("tcn", "tcn&dh", "tcn,dh",
  "sqrt(tcn),dh", "sqrt(tcn)&dh"), tauA=NA, tauB=1 - tauA, verbose=FALSE)
## Default S3 method:
segmentByNonPairedPSCBS(CT, betaT, ..., flavor=c("tcn", "tcn&dh", "tcn,dh",
  "sqrt(tcn),dh", "sqrt(tcn)&dh"), tauA=NA, tauB=1 - tauA, verbose=FALSE)

Arguments

`CT`	A `numeric` `vector` of J tumor total copy number (TCN) ratios in [0,+`Inf`) (due to noise, small negative values are also allowed). The TCN ratios are typically scaled such that copy-neutral diploid loci have a mean of two.
`betaT`	A `numeric` `vector` of J tumor allele B fractions (BAFs) in [0,1] (due to noise, values may be slightly outside as well) or `NA` for non-polymorphic loci.
`...`	Additional arguments passed to `segmentByPairedPSCBS`().
`flavor`	A `character` specifying what type of segmentation and calling algorithm to be used.
`tauA`, `tauB`	Lower and upper thresholds (`tauA < tauB` for calling SNPs heterozygous based on the tumor allele B fractions (`betaT`). If `NA`, then they are estimates from data.
`verbose`	See `Verbose`.

Details

Internally segmentByPairedPSCBS() is used for segmentation. This segmentation method does not support weights.

Value

Returns the segmentation results as a NonPairedPSCBS object.

Reproducibility

The "DNAcopy::segment" implementation of CBS uses approximation through random sampling for some estimates. Because of this, repeated calls using the same signals may result in slightly different results, unless the random seed is set/fixed.

Whole-genome segmentation is preferred

Although it is possible to segment each chromosome independently using Paired PSCBS, we strongly recommend to segment whole-genome (TCN,BAF) data at once. The reason for this is that downstream CN-state calling methods, such as the AB and the LOH callers, performs much better on whole-genome data. In fact, they may fail to provide valid calls if done chromosome by chromosome.

Missing and non-finite values

The total copy number signals as well as any optional positions must not contain missing values, i.e. NAs or NaNs. If there are any, an informative error is thrown. Allele B fractions may contain missing values, because such are interpreted as representing non-polymorphic loci.

None of the input signals may have infinite values, i.e. -Inf or +Inf. If so, an informative error is thrown.

Non-Paired PSCBS with known genotypes

If allele B fractions for the matched normal (betaN) are not available, but genotypes (muN) are, then it is possible to run Paired PSCBS. See segmentByPairedPSCBS() for details.

Author(s)

Henrik Bengtsson

References

Examples

verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)

# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=20),]

str(dataS)

R.oo::attachLocally(dataS)

# Non-Paired PSCBS segmentation
fit <- segmentByNonPairedPSCBS(CT, betaT=betaT,
                            chromosome=chromosome, x=x,
                            seed=0xBEEF, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
#  will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)

fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)


# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)

fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)
verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)

# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=20),]

str(dataS)

R.oo::attachLocally(dataS)

# Non-Paired PSCBS segmentation
fit <- segmentByNonPairedPSCBS(CT, betaT=betaT,
                            chromosome=chromosome, x=x,
                            seed=0xBEEF, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
#  will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)

fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)


# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)

fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)

Segment total copy numbers and allele B fractions using the Paired PSCBS method

Description

Segment total copy numbers and allele B fractions using the Paired PSCBS method [1]. This method requires matched normals. This is a low-level segmentation method. It is intended to be applied to one tumor-normal sample at the time.

Usage

## Default S3 method:
segmentByPairedPSCBS(CT, thetaT=NULL, thetaN=NULL, betaT=NULL, betaN=NULL, muN=NULL,
  rho=NULL, chromosome=0, x=NULL, alphaTCN=0.009, alphaDH=0.001, undoTCN=0, undoDH=0,
  ..., avgTCN=c("mean", "median"), avgDH=c("mean", "median"),
  flavor=c("tcn&dh", "tcn,dh", "sqrt(tcn),dh", "sqrt(tcn)&dh", "tcn"), tbn=is.null(rho),
  joinSegments=TRUE, knownSegments=NULL, dropMissingCT=TRUE, seed=NULL, verbose=FALSE,
  preserveScale=FALSE)
## Default S3 method:
segmentByPairedPSCBS(CT, thetaT=NULL, thetaN=NULL, betaT=NULL, betaN=NULL, muN=NULL,
  rho=NULL, chromosome=0, x=NULL, alphaTCN=0.009, alphaDH=0.001, undoTCN=0, undoDH=0,
  ..., avgTCN=c("mean", "median"), avgDH=c("mean", "median"),
  flavor=c("tcn&dh", "tcn,dh", "sqrt(tcn),dh", "sqrt(tcn)&dh", "tcn"), tbn=is.null(rho),
  joinSegments=TRUE, knownSegments=NULL, dropMissingCT=TRUE, seed=NULL, verbose=FALSE,
  preserveScale=FALSE)

Arguments

`CT`	A `numeric` `vector` of J tumor total copy number (TCN) ratios in [0,+`Inf`) (due to noise, small negative values are also allowed). The TCN ratios are typically scaled such that copy-neutral diploid loci have a mean of two.
`thetaT`, `thetaN`	(alternative) As an alternative to specifying tumor TCN ratios relative to the match normal by argument `CT`, on may specify total tumor and normal signals seperately, in which case the TCN ratios `CT` are calculated as $CT = 2*thetaT/thetaN$ .
`betaT`	A `numeric` `vector` of J tumor allele B fractions (BAFs) in [0,1] (due to noise, values may be slightly outside as well) or `NA` for non-polymorphic loci.
`betaN`	A `numeric` `vector` of J matched normal BAFs in [0,1] (due to noise, values may be slightly outside as well) or `NA` for non-polymorphic loci.
`muN`	An optional `numeric` `vector` of J genotype calls in {0,1/2,1} for AA, AB, and BB, respectively, and `NA` for non-polymorphic loci. If not given, they are estimated from the normal BAFs using `callNaiveGenotypes` as described in [2].
`rho`	(alternative to `betaT` and `betaN`/`muN`) A `numeric` `vector` of J decrease-of-heterozygosity signals (DHs) in [0,1] (due to noise, values may be slightly larger than one as well). By definition, DH should be `NA` for homozygous loci and for non-polymorphic loci.
`chromosome`	(Optional) An `integer` scalar (or a `vector` of length J), which can be used to specify which chromosome each locus belongs to in case multiple chromosomes are segments. This argument is also used for annotation purposes.
`x`	Optional `numeric` `vector` of J genomic locations. If `NULL`, index locations `1:J` are used.
`alphaTCN`, `alphaDH`	The significance levels for segmenting total copy numbers (TCNs) and decrease-in-heterozygosity signals (DHs), respectively.
`undoTCN`, `undoDH`	Non-negative `numeric`s. If greater than 0, then a cleanup of segmentions post segmentation is done. See argument `undo` of `segmentByCBS`() for more details.
`avgTCN`, `avgDH`	A `character` string specifying how to calculating segment mean levels after change points have been identified.
`...`	Additional arguments passed to `segmentByCBS`().
`flavor`	A `character` specifying what type of segmentation and calling algorithm to be used.
`tbn`	If `TRUE`, `betaT` is normalized before segmentation using the TumorBoost method [2], otherwise not.
`joinSegments`	If `TRUE`, there are no gaps between neighboring segments. If `FALSE`, the boundaries of a segment are defined by the support that the loci in the segments provides, i.e. there exist a locus at each end point of each segment. This also means that there is a gap between any neighboring segments, unless the change point is in the middle of multiple loci with the same position. The latter is what `DNAcopy::segment()` returns.
`knownSegments`	Optional `data.frame` specifying non-overlapping known segments. These segments must not share loci. See `findLargeGaps`() and `gapsToSegments`().
`dropMissingCT`	If `TRUE`, loci for which 'CT' is missing are dropped, otherwise not.
`seed`	An (optional) `integer` specifying the random seed to be set before calling the segmentation method. The random seed is set to its original state when exiting. If `NULL`, it is not set.
`verbose`	See `Verbose`.
`preserveScale`	Defunct - gives an error is specified.

Details

Internally segmentByCBS() is used for segmentation. The Paired PSCBS segmentation method does not support weights.

Value

Returns the segmentation results as a PairedPSCBS object.

Reproducibility

Whole-genome segmentation is preferred

Missing and non-finite values

None of the input signals may have infinite values, i.e. -Inf or +Inf. If so, an informative error is thrown.

Paired PSCBS with only genotypes

If allele B fractions for the matched normal (betaN) are not available, but genotypes (muN) are, then it is possible to run a version of Paired PSCBS where TumorBoost normalization of the tumor allele B fractions is skipped. In order for this to work, argument tbn must be set to FALSE.

Author(s)

Henrik Bengtsson

References

Examples

verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)

# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=10),]

str(dataS)

R.oo::attachLocally(dataS)

# Paired PSCBS segmentation
fit <- segmentByPairedPSCBS(CT, betaT=betaT, betaN=betaN,
                            chromosome=chromosome, x=x,
                            seed=0xBEEF, verbose=verbose)
print(fit)

# Plot results
plotTracks(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
#  will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)
plotTracks(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)
## [1] 0.1657131

fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)
## [1] 0.625175

fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)
verbose <- R.utils::Arguments$getVerbose(-10*interactive(), timestamp=TRUE)

# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Load SNP microarray data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
data <- PSCBS::exampleData("paired.chr01")
str(data)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Paired PSCBS segmentation
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Drop single-locus outliers
dataS <- dropSegmentationOutliers(data)

# Speed up example by segmenting fewer loci
dataS <- dataS[seq(from=1, to=nrow(data), by=10),]

str(dataS)

R.oo::attachLocally(dataS)

# Paired PSCBS segmentation
fit <- segmentByPairedPSCBS(CT, betaT=betaT, betaN=betaN,
                            chromosome=chromosome, x=x,
                            seed=0xBEEF, verbose=verbose)
print(fit)

# Plot results
plotTracks(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Bootstrap segment level estimates
# (used by the AB caller, which, if skipped here,
#  will do it automatically)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
fit <- bootstrapTCNandDHByRegion(fit, B=100, verbose=verbose)
print(fit)
plotTracks(fit)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in allelic balance (AB)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in DH for calling AB
# (which be done by default by the caller, if skipped here)
deltaAB <- estimateDeltaAB(fit, flavor="qq(DH)", verbose=verbose)
print(deltaAB)
## [1] 0.1657131

fit <- callAB(fit, delta=deltaAB, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaAB == deltaAB)


# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Calling segments in loss-of-heterozygosity (LOH)
# NOTE: Ideally, this should be done on whole-genome data
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Explicitly estimate the threshold in C1 for calling LOH
# (which be done by default by the caller, if skipped here)
deltaLOH <- estimateDeltaLOH(fit, flavor="minC1|nonAB", verbose=verbose)
print(deltaLOH)
## [1] 0.625175

fit <- callLOH(fit, delta=deltaLOH, verbose=verbose)
print(fit)
plotTracks(fit)

# Even if not explicitly specified, the estimated
# threshold parameter is returned by the caller
stopifnot(fit$params$deltaLOH == deltaLOH)

Package 'PSCBS'

Help Index

Package PSCBS

Description

Installation and updates

To get started

How to cite

License

Author(s)

References

Calls/drops single-locus outliers along the genome

Description

Usage

Arguments

Value

Missing and non-finite values

Author(s)

See Also

The CBS class

Description

Usage

Arguments

Fields and Methods

Difference to DNAcopy object

See also

Author(s)

Identifies gaps of a genome where there exist no observations

Description

Usage

Arguments

Value

Author(s)

See Also

Gets the genomic segments that are complementary to the gaps

Description

Usage

Arguments

Value

Author(s)

See Also

The NonPairedPSCBS class

Description

Usage

Arguments

Fields and Methods

Author(s)

See Also

The PairedPSCBS class

Description

Usage

Arguments

Fields and Methods

Author(s)

See Also

The PSCBS class

Description

Usage

Arguments

Fields and Methods

Author(s)

See Also

Segment genomic signals using the CBS method

Description

Usage

Arguments

Details

Value

Reproducibility

Missing and non-finite values

Author(s)

References

See Also

Examples

Segment total copy numbers and allele B fractions using the Non-paired PSCBS method

Description

Usage

Arguments

Details

Value

Reproducibility