Fixed one sprintf-related coercion issue, reported by the GCC compiler, that would produce an incorrect warning message.
Fixed two sprintf-related flag issues, reported by the GCC compiler, in internal assertions, which would never be reached.
src/_mingw.h
provided by Tomas Kalibera.
-fpic
flag. The symptom was a linking error
ld: 000.init.o: relocation R_X86_64_32 against '.rodata' can not be used when making a shared object; recompile with -fPIC
collect2: error: ld returned 1 exit status
.
Updates to build package from source on MS Windows with UCRT. Thanks to Tomas Kalibera for the contribution.
Now registering native routines; apparently this never happened before.
use of undeclared identifier 'finite'; did you mean 'isfinite'?
. This issue goes back to 2014, when macOS produced
warning: 'finite' is deprecated: first deprecated in OS X 10.9 [-Wdeprecated-declarations]. isOk = finite(x);
. Patched by using
isfinite()
instead of finite()
.
Link to Affx Fusion SDK archive on GitHub.
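The finite() to isfinite() patch noted above can be illustrated with a minimal C sketch. The helper name is_ok() is hypothetical, chosen only to echo the isOk variable in the quoted warning; it is not taken from the package sources.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper (not from the package sources): returns true when x is
 * neither infinite nor NaN. It relies on the C99 isfinite() macro from
 * <math.h> rather than the deprecated BSD function finite(). */
bool is_ok(double x) {
    return isfinite(x) != 0;
}
```

Because isfinite() is part of standard C99, this form should compile on toolchains that no longer declare finite().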
Spelling corrections.
Using c(x,y)
instead of append(x,y)
internally.
CLEANUP: Dropped obsolete src/R_affx_test.*cmdline.cpp
files.
<R.h>
and extern C, reported by
Brian Ripley.
BiocViews
field of DESCRIPTION.
readCelHeader()
and readCel()
would core dump R/affxparser if
trying to read multi-channel CEL files (Issue #16). Now an error is
generated instead. Multi-channel CEL files (e.g. Axiom) are not
supported by affxparser. Thanks to Kevin McLoughlin (Lawrence
Livermore National Laboratory, USA) for reporting on this.
readCelHeader()
and readCel()
on corrupt CEL files could core
dump R/affxparser (Issues #13 & #15). Now an error is generated
instead. Thanks to Benilton Carvalho (Universidade Estadual de
Campinas, Sao Paulo, Brazil) and Malte Bismarck (Martin Luther
University of Halle-Wittenberg) for reports.
R_affx_GetCHPEntries()
and R_affx_ReadCHP()
had unbalanced PROTECT()
/UNPROTECT()
. Also, native
R_affx_GetCHPGenotypingResults()
had two non-PROTECT()
:ed
usages of mkString()
. Thanks to Tomas Kalibera at Northeastern
University for reporting on this.
SystemRequirements: GNU make
.
ROBUSTNESS: Now readPgfEnv()
/readPgf()
validate indices
, iff
possible.
Now readPgfEnv()
/readPgf()
coerces some header fields to
integers, iff they exist, specifically num-cols
, num-rows
,
probesets
, and datalines
.
CLEANUP: Package no longer gives readBin()
warnings on 'signed = FALSE' is only valid for integers of sizes 1 and 2
.
convertCel()
on a CCG/v1 CEL file could give Error in sprintf("GridCorner%s=%d %d\n" ... invalid format '%d' ...)
.
Added package test for convertCel()
, but in this particular case
it would not have caught it because it only happened for chip types
of particular dimensions. Thanks to Malte Bismarck at UK Halle
(Germany) for reporting on this.
SystemRequirements
(for now).
ROBUSTNESS: Did not seem to be needed, but the package is now a good
citizen and does library.dynam.unload()
when unloaded.
Now using requireNamespace()
instead of require()
.
Internal cleanup of native code.
readPgf()
and readPgfEnv()
failed to read all units (probesets)
on some systems. Extensive package tests have been added to test
this and other cases. Thanks to Grischa Toedt at EMBL Germany for
reporting on, troubleshooting, and helping out with patches for
this bug.
units
to readCdf()
and readCdfQc()
was never performed due to a typo, meaning it was possible to
request units out of range. Depending on system this could result
in either a core dump or random garbage read for the out of range
units.
units
and
indices
arguments for most read functions.
ROBUSTNESS: Now all methods give an informative error message if
zero elements are requested, i.e. via zero-length argument
indices
or units
that is not NULL. Previously this case would
access all values just like NULL does.
ROBUSTNESS: Now readCelRectangle()
gives an informative error
message if argument xrange
or yrange
is not of length two.
readPgf()
and readPgfEnv()
would give an error if argument
indices
was specified as a double rather than as an integer
vector.
R CMD check
NOTEs that appeared in recent R versions.
:::
.
readCelUnits()
could throw Error in vector("double", nbrOfCells * nbrOfArrays) : vector size cannot be NA. In addition: Warning message: In nbrOfCells * nbrOfArrays : NAs produced by integer overflow
when reading from a large number of arrays and/or
a large number of units. Previously the limit of
nbrOfCells*nbrOfArrays
was .Machine$integer.max
(=2147483647),
whereas now it is .Machine$double.xmax
(=1.797693e+308). Thanks
to Damian Plichta at the Technical University of Denmark for
reporting on this.
which()
instead of whichVector()
of
R.utils. Before R (< 2.11.0), which()
used to be 10x slower
than whichVector()
, but now it's 3x faster.
Since affxparser 1.30.2/1.31.2 (r72352; 2013-01-08),
writeCdf()
would incorrectly encode the unit types, iff the input
cdf
argument specified them as integers, e.g. as done by
writeCdf()
for AffyGenePDInfo
in aroma.affymetrix. More
specifically, the unit type index would be off by one, e.g. an
expression
unit (1) would be encoded as an unknown
unit (0) and
so on. On the other hand, if they were specified by their
unit-type names (e.g. 'expression') the encoding should still be
correct, e.g. if input is constructed from readCdf()
of
affxparser. Thanks to Guido Hooiveld at Wageningen UR (The
Netherlands) for reporting on this.
Similarly, writeCdf()
has "always", at least since affxparser
1.7.4 (r21888; 2007-01-09), encoded unit directions and QC
unit types incorrectly, iff they were specified as integers.
Removed all remaining gc()
calls.
Replaced all rm()
calls with NULL assignments.
\usage{}
lines are at most 90 characters
long.
example(invertMap)
a bit faster so R CMD check
won't
complain.
isPackageLoaded()
of findFiles()
no longer uses
defunct manglePackageName()
function.
compareCdfs()
gives a more precise reason
attribute when
there is a difference in (regular or QC) units. It narrows down
the first unit that differs and reports its unit number.
writeCdf()
did not encode unit types as decoded by readCdf()
.
Unit type unknown
was incorrectly encoded such that readCdf()
would decode it as copynumber
. Also, unit types
genotypingcontrol
and expressioncontrol
were not encoded at
all.
cdf=FALSE'
to createCel()
. Note, the previous
implementation corresponded to cdf=TRUE
.
ROBUSTNESS: Now createCel()
validates/sets CEL header field
total
based on cols
and rows
.
ROBUSTNESS: Added a system test for validating that the package can write and read a CEL file. The test spawns another R process so that it is robust against core dumps.
aliases
to arrangeCelFilesByChipType()
, e.g.
arrangeCelFilesByChipType(..., aliases=c("Focus"="HG-Focus"))
.
arrangeCelFilesByChipType(pathnames)
assumed pathnames
were
files in the current directory.
patchdir
.
arrangeCelFilesByChipType()
for moving CEL files to
subdirectories named according to their chip types, which can be
useful when, for instance, downloading GEO data sets.
readPgfEnv(..., indices=NULL)
no longer gives a warning.
Updated the error messages for the CLF and PGF parsers.
tests/testWriteAndReadEmptyCdf.R
generates an
error that is detected and reported by R CMD check
.
readCel()
and readCelUnits()
are no
longer calling .Internal(qsort(...))
.
throw()
with stop()
, because the former
assumes that R.methodsS3 is loaded, which it may not be.
In readBin(con, what = "integer", size = 4, n = 1, signed = FALSE, 'signed = FALSE' is only valid for integers of sizes 1 and 2
that some read methods would generate.
convertCel()
.
.unwrapDatHeaderString()
, used by convertCel()
among others, would throw Internal error: Failed to extract 'pixelRange' and 'sampleName' from DAT header. They became identical: ...
in case the DAT header of the CEL file did not
contain all fields. The function has now been updated to be more
forgiving and robust so that missing values are returned for such
fields instead.
matrix(...)
is used instead of
.Internal(matrix(...))
.
readCdfDataFrame()
also returns the cell field expos
.
parseDatHeaderString()
, which in combination with
readCelHeader()
can be used to infer the timestamp in the header
of a CEL file.
applyCdfGroupFields()
and cdfSetDimension()
.
readChp()
would crash (segmentation fault) for (at least) some
CHP files for GenomeWideSNP_5 generated by Affymetrix Power Tools.
Updated compareCels()
to work with new readCelHeader()
.
readCelHeader()
also reads DAT headers from Calvin CEL files.
newChipType
to convertCel()
for
overriding the default chip type. Useful for updating the formal
chip type of old CEL files.
gc()
calls in convertCel()
.
readCcg()
and readCcgHeader()
no longer give warnings on
truncating string with embedded nul in 'rawToChar()'
. These
warnings made no difference, but were annoying.
readChp()
would not read all data. Thanks Gabor Csardi for
reporting this and providing a patch.
//
or \\
, then the chiptype
reported by
readCdfHeader()
contains a path component as well. This seems to
be due to a bug in Fusion SDK.
readCcg()
is substantially faster after removing all gc()
calls.
readCdf()
recognize more unit types.
writeCdf()
would write CustomSeq
units as Tag
units, and vice
versa. This means that ASCII CDFs containing such units and
converted with convertCdf()
would have an incorrect unit type
for these units. Also, unit type 'Copy Number' is reported as
"copynumber"
and no longer as "unknown"
.
The increase of the internal buffer for reading the refseq
header
field of ASCII CDFs that was done in 1.11.2 was mistakenly undone
in 1.13.3.
help(createCel)
(and its example) clarifies that the template
CEL header can be of v3 (ASCII), v4 (binary;XDA), or v1
(binary;Calvin).
HISTORY
file to NEWS
.
writeTpmap()
works.
readChp()
. Contribution by Robert Gentleman.
readClf()
and readPgf()
.
cdfMergeStrands()
to merge any even number of groups, not
only units with two or four group pairs.
findFiles()
for testing if R.utils is loaded or
not was not correct, making it fail to detect R.utils.
Added argument 'allFiles = TRUE'
to findFiles()
.
Updated readCcg()
according to the newer file format
specifications. Now it is possible to do low-level reading of
copy-number CNCHP files generated by the Affymetrix Genotype
Console v2.
findFiles()
and hence findCdf()
is only utilizing the
R.utils package if it is already loaded. It will no longer try
to load R.utils.
reorder
from readCel()
and readCelUnits()
since its name was misleading (the returned value was identical
regardless of reorder
, but the reading speed was faster when
reorder
was TRUE, which is how it is now hardwired).
Reading a CDF that has a refseq
header field longer than 65,000
symbols would crash R, e.g. when reading certain CDFs for
resequencing chip types. An internal buffer size of Fusion SDK was
increased from 65,000 to 400,000 bytes. Thanks Wenyi Wang for
reporting this.
Argument verbose
of tpmap2bpmap()
was not coerced to integer
before being passed to the native code.
The internal .initializeCdf()
, used when creating new CDFs, had
an error message referring to an invalid qcUnitLengths
when it was
supposed to be unitLengths
. Thanks Elizabeth Purdom for
reporting this.
/inst/info
for comparing Fusion SDK with
affxparser.
convertCel()
will no longer generate a warning if the
corresponding CDF file was not found.
affymetrix-dat-header
but only parameter
affymetrix-partial-dat-header
. In that case convertCel()
would
throw an error about sprintf("DatHeader= %s\n", datHeader)
. Now
a "fake" DAT header is created from the partial one. If neither is
found, a slightly more informative exception is thrown.
a-Z
is illegal in (at least) some
locales, e.g. C
(where A-z
works). The only way to specify the
ASCII alphabet is to list all characters explicitly, which we now
do in all methods of the package. See the r-devel thread "invalid
regular expression '[a-Z]'" on 2008-03-05 for details.
Added argument 'recursive=TRUE'
to findCdf()
. Note, the
current working directory is always scanned first, but never
recursively (unless explicitly added to the search path). This is
to avoid "endless" scans in case the search path has not been set.
findFiles()
now does a breadth-first search in lexicographic order.
Removed default search paths cdf/
and data/cdf/
. We do not
want to enforce a standard path.
Now the examples (as well as test scripts) utilize data available
in the new Bioconductor AffymetrixDataTestFiles package. This
means that R CMD check
now runs many more tests, which is good.
CLEAN UP: Removed many of the old testscripts/
scripts. They are
now under tests/
.
findFiles()
was not robust against broken Unix links.
If the destination file already existed, convertCel()
would
correctly detect that, but would report the name of the source
file.
See updates made to release v1.8.3 below.
The only difference between v1.9.3 and v1.8.3 is the modification
of findCdf()
in v1.9.2.
findCdf()
such that it is possible to set an alternative
function for how CDFs are located.
isCelFile()
recognized Calvin CEL files.
convertCel()
can convert Calvin CEL files into v4 CEL files.
writeCelHeader()
can write v4 CEL headers given Calvin CEL
header.
Optimized writeCdfHeader()
for memory. For a CDF with 1,200,000+
units just writing the unit names would consume 1-1.5 GiB RAM. Now
it writes unit names in chunks keeping the memory overhead around
100-200 MiB.
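The chunked writing described above can be sketched in C as follows. This is only an illustration of the general technique, not the package's implementation (which is written in R): the function name write_names_chunked(), the record width NAME_WIDTH, and the zero-padded record layout are all assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_WIDTH 64  /* assumed fixed record width, for illustration only */

/* Hypothetical sketch: writes n names to `out` as zero-padded records of
 * NAME_WIDTH bytes, staging at most `chunk` records in memory at a time.
 * Returns 0 on success and -1 on error. */
int write_names_chunked(FILE *out, const char *const *names,
                        size_t n, size_t chunk) {
    if (chunk == 0) return -1;
    char *buf = malloc(chunk * NAME_WIDTH);
    if (buf == NULL) return -1;

    for (size_t start = 0; start < n; start += chunk) {
        size_t count = (n - start < chunk) ? n - start : chunk;
        memset(buf, 0, count * NAME_WIDTH);
        for (size_t i = 0; i < count; i++) {
            /* Copy at most NAME_WIDTH - 1 bytes so every record stays
             * null-terminated; longer names are truncated. */
            strncpy(buf + i * NAME_WIDTH, names[start + i], NAME_WIDTH - 1);
        }
        if (fwrite(buf, NAME_WIDTH, count, out) != count) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}
```

The staging buffer is bounded by chunk * NAME_WIDTH bytes regardless of n, which is the property that keeps the memory overhead roughly constant as the number of units grows.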
Made convertCdf()
more memory efficient.
isCelFile()
when the file was not found was
broken.
truncateGroupNames
to readCdfGroupNames()
which
defaults to TRUE for backward compatibility. When TRUE, any prefix
of group names identical to the unit name will be stripped from the
group names.
Now readCelUnits()
can handle unit groups for which there are no
probes, e.g. when stratifying on PM in a unit containing only MMs.
Added writeCdfHeader()
, writeCdfQcUnits()
and
writeCdfUnits()
. These are all used by writeCdf()
. They also
make it possible to write a CDF in chunks in order to for instance
convertCdf()
in constant memory.
Added cdfAddPlasqTypes()
.
Now readCdfUnits(..., readDirections=TRUE)
also returns group
directions.
Now readCdf()
reads all unit and group fields by default.
In addition to optimizing IO time, read maps can be used to unrotate CEL data rotated by the dChip software. For more information, see help on "Cell-index maps for reading and writing".
readCel()
would give an error saying the read
map is invalid even when it is not.
isPm
to readCdf()
.
readCdfUnits()
and readCdfCellIndices()
with stratifyBy="mm"
would return the same as stratifyBy="pm"
. Options "pm"
and
"pmmm"
are unaffected by this fix.
Updated to Fusion SDK v1.0.8.
Windows build change: The Windows version is building against the
Windows code of Fusion SDK not the POSIX code. In order to do this
we have had to patch the preprocessor code in several of the Fusion
SDK source-code files, which has to be redone manually whenever
Fusion is updated. Starting with this version, we instead set the
_MSC_VER
flag used in the Fusion code to indicate Windows (set by
the Microsoft Visual C++ compiler). Since we are using MINGW this
flag is obviously not set. Faking _MSC_VER
this way leaves us
only having to patch one single file in the Fusion release instead
of 10-20. Hopefully there are no other side effects.
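The _MSC_VER approach described above might look roughly like the following preprocessor sketch. It assumes the MinGW-predefined macro __MINGW32__ is used for detection, and the value 1200 is a placeholder; neither detail is confirmed by this changelog, and platform_branch() is a hypothetical function that only makes the selected branch observable.

```c
/* Assumption: MinGW compilers predefine __MINGW32__. The value 1200 is a
 * placeholder, not necessarily the value used by the package. */
#if defined(__MINGW32__) && !defined(_MSC_VER)
#define _MSC_VER 1200
#endif

/* Hypothetical probe: reports which code path conditional code keyed on
 * _MSC_VER would take on the current toolchain. */
const char *platform_branch(void) {
#ifdef _MSC_VER
    return "windows";
#else
    return "posix";
#endif
}
```

Defining the macro in one place means code keyed on _MSC_VER takes its Windows branch under MinGW without each source file being patched individually.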
readCelUnits()
a bit more clever if a cdf
structure with
only cell indices is passed. Then all fields are just indices and
one can call unlist immediately. This speeds things up a bit.
writeCdf()
would create an invalid CDF file if there were no QC
units. This would in turn make readCdfUnits()
etc core dump.
Similar to the bug fix in the C code for readCelHeader()
, much of
the C-level code for CDF (and BPMAP) files assumes that the strings
from Fusion SDK have a null terminator. At least for CDF unit
names, this is not necessarily the case. To be on the safe side,
for all retrieved Fusion SDK strings we now make sure there is a
null terminator before converting it into an R string. Thanks to
Ken Simpson at WEHI for all the troubleshooting.
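The null-terminator safeguard described above can be sketched as follows. The helper copy_terminated() is hypothetical and stands in for whatever the package does internally; it assumes the length of the source string is known separately, as it would be for a string object with an explicit size.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies `len` bytes from `src`, which may lack a
 * trailing '\0', into a newly allocated buffer that is always terminated.
 * Returns NULL if allocation fails; the caller frees the result. */
char *copy_terminated(const char *src, size_t len) {
    char *dst = malloc(len + 1);   /* +1 leaves room for the terminator */
    if (dst == NULL) return NULL;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return dst;
}
```

Only after such a copy is it safe to hand the buffer to a function that scans for the terminator, such as one that builds an R string from a C string.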
Because of the above bug fix, the ASCII mouse exon CDF can now be converted into a valid binary CDF.
The new implementation of updateCel()
utilizing raw vectors was
not correct; extra zeros were written too. The example code of
updateCel()
reveals such errors much easier now.
updateCel()
would in some cases give Error: subscript out of bounds
when writing the last chunk.
updateCel()
to update binary (v4)
CEL files. Currently, the code does make use the Fusion SDK.
There is currently no writeCel()
to create a CEL file from
scratch. However, with the auxiliary function copyCel()
one can
copy an existing CEL file and then update that one. Thus, it is
now possible to write, say, normalized probe intensities to a CEL
file. Note that this is only a first prototype and functions may
change in a future release.
updateCel()
substantially by first working
with raw vectors in memory and then writing binary data to file. Data
is also written in chunks (instead of all at once), to minimize the
memory overhead of using raw vectors, which is especially important
for the larger chips, e.g. 500K.
Makevars
, _Makefile
and cmd_line
scripts.
Added compareCdfs()
to verify that a converted CDF is correct.
Added convertCdf()
utilizing the new writeCdf()
.
Added trial version of createCel()
.
Added trial version of updateCelUnits()
.
The C code for readCelHeader()
did not allocate space for the
string null terminator for the header elements that originates from
wide C++ string. This caused readCelHeader()
to contain string
elements with random characters at the end.
nrows and ncols were swapped in the CDF header when written by
writeCdf()
. This was missed because all tested CDFs were square.
R CMD check
without warnings.
cdfOrderBy()
and cdfOrderColumnsBy()
for restructuring
group fields in a CDF list structure. Added cdfGetGroups()
too.
Cleaned up and restructured the help pages; several Rd pages are now made "internal" so they do not show up on the help index page. Instead they are accessible from within other help pages (if you are browsing via HTML, that is). Added a help page on common terms.
Added a bit more documentation on how to set the default CDF path.
On 64-bit Linux, CEL intensities read would all be zero. This was due to compiler settings in the Fusion SDK package, which is circumvented by having gcc compile it with a lower optimization level.
When argument cdf
was a CDF list structure with elements type
or direction
, readCelUnits()
would not read the correct cells
because the values of type
and direction
would be included in
the extracted list of cell indices.
The package now works on Solaris.
Updated the Fusion SDK to version 1.0.5 (an unofficial release).
New method readCdfCellIndices()
, which is a 5-10 times faster
special-case implementation of readCdfUnits()
to read cell
indices only.
Renamed readCdfUnitsMap()
to readCdfUnitsWriteMap()
.
New method invertMap()
for fast inversion of maps.
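As an illustration of what inverting a map involves, here is a minimal C sketch. It assumes a map is a permutation of cell indices and uses zero-based indices, unlike R's one-based ones; invert_map() is a hypothetical name and not the package's implementation.

```c
#include <stddef.h>

/* Hypothetical sketch: given a permutation `map` of 0..n-1, fills `inv` so
 * that inv[map[i]] == i for every i. Returns 0 on success and -1 if an entry
 * is out of range. Runs in a single O(n) pass. */
int invert_map(const size_t *map, size_t *inv, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (map[i] >= n) return -1;
        inv[map[i]] = i;
    }
    return 0;
}
```

For example, the map {2, 0, 3, 1} inverts to {1, 3, 0, 2}, so applying one after the other returns every index to its original position.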
readCelUnits()
sorts the cell indices before reading the data
from each file. This minimizes the amount of jumping around in the
CEL files, resulting in a speed-up of about 5-10 times.
readCdfCellIndices()
replaces readCdfUnits()
, but the error is
still the same.
readCelRectangle()
to read probe signals from a
specific area of the chip.
readCelUnits()
will
sooner or later core dump R; it seems to be a memory-related problem
that occurs when reading the CDF and extracting the name of the
unit. However, when "torturing" readCdfUnits()
the crash won't
happen so it might be that readCel()
does something. Have not
tried on other platforms.