Title: | Fast and Light-Weight Caching (Memoization) of Objects and Results to Speed Up Computations |
---|---|
Description: | Memoization can be used to speed up repetitive and computationally expensive function calls. The first time a function that implements memoization is called, the results are stored in a cache memory. The next time the function is called with the same set of parameters, the results are retrieved directly from the cache, avoiding repeated calculations. With this package, any R object can be cached in a key-value storage where the key can be an arbitrary set of R objects. The cache memory is persistent (on the file system). |
Authors: | Henrik Bengtsson [aut, cre, cph] |
Maintainer: | Henrik Bengtsson |
License: | LGPL (>= 2.1) |
Version: | 0.16.0 |
Built: | 2024-11-07 02:40:07 UTC |
Source: | https://github.com/HenrikBengtsson/R.cache |
To install this package and all of its dependent packages, do:
install.packages("R.cache")
loadCache(), saveCache(): Methods for loading objects from, and saving objects to, the cache.
getCacheRootPath(), setCacheRootPath(): Methods for getting and setting the directory where cache files are stored.
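The load-or-compute pattern behind memoization can be illustrated without the package. The following is a minimal in-memory sketch in base R, not R.cache's implementation: the hypothetical helper memoizeInMemory() keeps results in an environment for the current session only and uses deparse() of the arguments as the key, whereas R.cache persists results on the file system and identifies them by an MD5 hash of the key objects.

```r
# Hypothetical helper (not part of R.cache): wraps 'fcn' so that results are
# kept in an in-memory environment, keyed on the argument values.
memoizeInMemory <- function(fcn) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # deparse() of the arguments stands in for R.cache's MD5 checksum of the key
    key <- paste(deparse(list(...)), collapse = "")
    if (exists(key, envir = cache, inherits = FALSE)) {
      return(get(key, envir = cache, inherits = FALSE))  # cache hit
    }
    value <- fcn(...)                  # cache miss: compute ...
    assign(key, value, envir = cache)  # ... and remember the result
    value
  }
}

nCalls <- 0
slowAdd <- function(x, y) {
  nCalls <<- nCalls + 1  # count how often the body actually runs
  x + y
}

fastAdd <- memoizeInMemory(slowAdd)
r1 <- fastAdd(1, 2)  # computed
r2 <- fastAdd(1, 2)  # returned from the in-memory cache
r3 <- fastAdd(2, 3)  # new arguments, computed
```

Because this cache lives in an environment, it disappears when the R session ends; keeping results across sessions is what loadCache() and saveCache() add.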
Whenever using this package, please cite [1] as
Bengtsson, H. The R.oo package - Object-Oriented Programming with References Using Standard R Code, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), ISSN 1609-395X, Hornik, K.; Leisch, F. & Zeileis, A. (ed.), 2003
Here is a list of features that would be useful, but which I have too little time to add myself. Contributions are appreciated.
Add functionality to identify cache files that are no longer of use. For now, there is an extra header field for arbitrary comments that can be used for this, but more formal fields, e.g. keywords or user, may be useful.
If you consider implementing any of the above, first make sure it is not already implemented by downloading the latest "devel" version!
See also the filehash package, and the cache() function in the Biobase package of Bioconductor.
The releases of this package are licensed under LGPL version 2.1 or newer.
[1] H. Bengtsson, The R.oo package - Object-Oriented Programming with References Using Standard R Code, In Kurt Hornik, Friedrich Leisch and Achim Zeileis, editors, Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC 2003), March 20-22, Vienna, Austria. https://www.r-project.org/conferences/DSC-2003/Proceedings/
Henrik Bengtsson
Creates a copy of an existing function such that its results are memoized.
## Default S3 method:
addMemoization(fcn, envir=parent.frame(), ...)
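fcn |
A function (or the name of a function) that should be copied and have memoization added. |
envir |
The environment from where to look for the function, if it is given by name. |
... |
Additional arguments for controlling the memoization, i.e. all arguments of memoizedCall() that are not passed on to the memoized function itself. |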
The new function is set up such that the memoized call is done in the environment of the caller (the parent frame of the function).
If the function returns NULL, that particular function call is not memoized.
Returns a function.
The returned function uses memoizedCall() internally.
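A short usage sketch of addMemoization(), assuming R.cache is installed. The names slowSquare() and fastSquare() are hypothetical, and the cache root is pointed at a temporary directory via setCacheRootPath() so that the demo does not write into the user's regular cache.

```r
library(R.cache)

# Assumption: a throwaway cache root under tempdir() for this demo
root <- file.path(tempdir(), "Rcache-addMemoization-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

# Hypothetical "expensive" function
slowSquare <- function(x) {
  Sys.sleep(0.5)  # emulate a slow computation
  x^2
}

# addMemoization() returns a new function; slowSquare() itself is unchanged
fastSquare <- addMemoization(slowSquare)

y1 <- fastSquare(4)  # first call: evaluated and written to the cache
y2 <- fastSquare(4)  # same argument: expected to be read back from the cache
y3 <- fastSquare(5)  # new argument: evaluated and cached separately
```

Wrapping the two fastSquare(4) calls in system.time() is an easy way to see the cache at work.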
Evaluates an R expression with memoization such that the same objects are assigned to the current environment and the same result is returned, if any.
evalWithMemoization(expr, key=NULL, ..., envir=parent.frame(), drop=c("srcref", "srcfile", "wholeSrcref"), force=FALSE)
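expr |
The expression to be evaluated. |
key |
Additional objects to uniquely identify the evaluation. |
... |
Additional arguments passed to loadCache(). |
envir |
The environment in which the expression is evaluated. |
drop |
A character vector of attributes to drop from expr before it is used to identify the cached result. The default drops all source-reference information. |
force |
If TRUE, any cached result is ignored and the expression is evaluated again. |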
Returns the value of the evaluated expression expr, if any.
Internally, eval() is used to evaluate the expression.
for (kk in 1:5) {
  cat(sprintf("Iteration #%d:\n", kk))
  res <- evalWithMemoization({
    cat("Evaluating expression...")
    a <- 1
    b <- 2
    c <- 4
    Sys.sleep(1)
    cat("done\n")
    b
  })
  print(res)
  # Sanity checks
  stopifnot(a == 1 && b == 2 && c == 4)
  # Clean up
  rm(a, b, c)
} # for (kk ...)
## OUTPUTS:
## Iteration #1:
## Evaluating expression...done
## [1] 2
## Iteration #2:
## [1] 2
## Iteration #3:
## [1] 2
## Iteration #4:
## [1] 2
## Iteration #5:
## [1] 2
############################################################
# WARNING
############################################################
# If the expression being evaluated depends on
# "input" objects, then these must be specified
# explicitly as "key" objects.
for (ii in 1:2) {
  for (kk in 1:3) {
    cat(sprintf("Iteration #%d:\n", kk))
    res <- evalWithMemoization({
      cat("Evaluating expression...")
      a <- kk
      Sys.sleep(1)
      cat("done\n")
      a
    }, key=list(kk=kk))
    print(res)
    # Sanity checks
    stopifnot(a == kk)
    # Clean up
    rm(a)
  } # for (kk ...)
} # for (ii ...)
## OUTPUTS:
## Iteration #1:
## Evaluating expression...done
## [1] 1
## Iteration #2:
## Evaluating expression...done
## [1] 2
## Iteration #3:
## Evaluating expression...done
## [1] 3
## Iteration #1:
## [1] 1
## Iteration #2:
## [1] 2
## Iteration #3:
## [1] 3
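The sketch below, assuming R.cache is installed, makes the caching behavior of evalWithMemoization() observable with a counter: the expression bumps a global counter (nEval, a hypothetical name) via <<-, so the counter only grows when the expression is actually evaluated. It also shows that force=TRUE bypasses the cache. A temporary cache root is used so that no files are left in the user's cache.

```r
library(R.cache)

# Assumption: a throwaway cache root under tempdir() for this demo
root <- file.path(tempdir(), "Rcache-evalWithMemoization-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

nEval <- 0  # incremented only when the expression really runs
results <- integer(0)
for (i in 1:3) {
  res <- evalWithMemoization({
    nEval <<- nEval + 1  # side effect: happens on a cache miss only
    z <- 10L             # assigned to the calling environment
    z * 2L
  })
  results <- c(results, res)
}
# nEval is now 1: iterations 2 and 3 were served from the cache

# force=TRUE ignores the cached result and evaluates the expression again
res <- evalWithMemoization({
  nEval <<- nEval + 1
  z <- 10L
  z * 2L
}, force = TRUE)
# nEval is now 2
```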
Gets the root path to the file cache directory.
## Default S3 method:
getCacheRootPath(defaultPath=NULL, ...)
defaultPath |
The default path, if no user-specified directory has been given. |
... |
Not used. |
Returns the path as a character string.
To set the directory where cache files are stored, see setCacheRootPath().
print(getCacheRootPath())
Loads data from file cache, which is unique for an optional key object.
## Default S3 method:
loadCache(key=NULL, sources=NULL, suffix=".Rcache", removeOldCache=TRUE, pathname=NULL, dirs=NULL, ..., onError=c("warning", "error", "message", "quiet", "print"))
key |
An optional object from which a hexadecimal hash code will be generated and appended to the filename. |
sources |
Optional source objects. If the cache object has a timestamp older than one of the source objects, it will be ignored and removed. |
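suffix |
A character string appended to the end of the cache filename. |
removeOldCache |
If TRUE and the cache is older than one of the sources, the cache file is removed; otherwise not. |
pathname |
The pathname to the cache file. If specified, arguments key, suffix, and dirs are ignored. |
dirs |
A character vector constituting the path to the cache subdirectory (of the cache root directory as returned by getCacheRootPath()) to be used. If NULL, the cache root directory itself is used. |
... |
Not used. |
onError |
A character string specifying what to do if an error occurs while loading the cache file: "warning" (default), "error", "message", "quiet", or "print". |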
The hash code calculated from the key object is a 32-character hexadecimal MD5 hash code. For more details, see getChecksum().
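To see what such a hash code looks like, the following sketch (assuming R.cache is installed) computes getChecksum() for a few made-up key objects. Equal keys give equal codes, and a different key gives a different code (barring hash collisions), which is what lets a key select a unique cache file. The exact hex strings depend on how R serializes the objects, so they are not reproduced here.

```r
library(R.cache)

# Example key objects (made up for illustration)
key1 <- list(mean = 2.3, sd = 3.0)
key2 <- list(mean = 2.3, sd = 3.0)  # same content as key1
key3 <- list(mean = 2.3, sd = 3.5)  # differs in one value

h1 <- getChecksum(key1)
h2 <- getChecksum(key2)
h3 <- getChecksum(key3)

print(c(h1, h2, h3))  # 32-character hexadecimal strings; h1 and h2 are equal
```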
Returns an R object, or NULL if the cache does not exist.
See also saveCache().
simulate <- function(mean, sd) {
  # 1. Try to load cached data, if already generated
  key <- list(mean, sd)
  data <- loadCache(key)
  if (!is.null(data)) {
    cat("Loaded cached data\n")
    return(data)
  }
  # 2. If not available, generate it.
  cat("Generating data from scratch...")
  data <- rnorm(1000, mean=mean, sd=sd)
  Sys.sleep(1) # Emulate slow algorithm
  cat("ok\n")
  saveCache(data, key=key, comment="simulate()")
  data
}
data <- simulate(2.3, 3.0)
data <- simulate(2.3, 3.5)
data <- simulate(2.3, 3.0) # Will load cached data
# Clean up
file.remove(findCache(key=list(2.3,3.0)))
file.remove(findCache(key=list(2.3,3.5)))
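Complementing the example above, this sketch (assuming R.cache is installed) checks the two documented return conventions: loadCache() returns NULL when no cache exists for a key, and returns the saved object once saveCache() has written it. It also uses the dirs argument to place the cache file in a subdirectory of the cache root; the key, the subdirectory name, and the temporary cache root are illustrative assumptions.

```r
library(R.cache)

# Assumption: a throwaway cache root under tempdir() for this demo
root <- file.path(tempdir(), "Rcache-loadsave-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

key <- list(n = 5L, label = "demo")  # hypothetical key object
dirs <- "loadsave-demo"              # hypothetical subdirectory of the cache root
dir.create(file.path(root, dirs), showWarnings = FALSE)

before <- loadCache(key, dirs = dirs)  # nothing cached yet, so NULL
pathname <- saveCache(1:5, key = key, dirs = dirs, comment = "round-trip demo")
after <- loadCache(key, dirs = dirs)   # the object saved above
```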
Calls a function with memoization, that is, caches the results to be retrieved if the function is called again with the exact same arguments.
## Default S3 method:
memoizedCall(what, ..., envir=parent.frame(), force=FALSE, sources=NULL, dirs=NULL)
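what |
The function to be called, or a character string with the name of the function to be called, cf. do.call(). |
... |
Arguments passed to the function. |
envir |
The environment in which the function is called. |
force |
If TRUE, any cached result is ignored and the function is called again. |
sources, dirs |
Optional arguments passed on to loadCache() and saveCache(). |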
If the function returns NULL, that particular function call is not memoized.
Returns the result of the function call.
Internally, loadCache() is used to load memoized results, if available. If not available, then do.call() is used to evaluate the function call, and saveCache() is used to save the results to the cache.
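A sketch of the flow just described, assuming R.cache is installed. The functions slowSum() and returnsNull() are hypothetical; each bumps a counter so you can see when the function body actually runs. The function is passed to memoizedCall() by name, which do.call() accepts. The last part illustrates that NULL results are not memoized, and a temporary cache root keeps the demo out of the user's cache.

```r
library(R.cache)

# Assumption: a throwaway cache root under tempdir() for this demo
root <- file.path(tempdir(), "Rcache-memoizedCall-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

nCalls <- 0
slowSum <- function(x, y) {
  nCalls <<- nCalls + 1  # counts real evaluations
  x + y
}

a <- memoizedCall("slowSum", 1, 2)  # cache miss: slowSum() is called
b <- memoizedCall("slowSum", 1, 2)  # cache hit: result loaded from the cache
callsAfterCache <- nCalls           # 1

a2 <- memoizedCall("slowSum", 1, 2, force = TRUE)  # force=TRUE skips the cache
callsAfterForce <- nCalls                           # 2

nNull <- 0
returnsNull <- function(x) {
  nNull <<- nNull + 1
  NULL  # a NULL result is not memoized
}
invisible(memoizedCall("returnsNull", 1))
invisible(memoizedCall("returnsNull", 1))  # evaluated again; nNull is now 2
```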
Saves data to file cache, which is unique for an optional key object.
## Default S3 method:
saveCache(object, key=NULL, sources=NULL, suffix=".Rcache", comment=NULL, pathname=NULL, dirs=NULL, compress=NULL, ...)
object |
The object to be saved to file. |
key |
An optional object from which a hexadecimal hash code will be generated and appended to the filename. |
sources |
Source objects used for comparison of timestamps when cache is loaded later. |
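suffix |
A character string appended to the end of the cache filename. |
comment |
An optional character string stored as a comment in the header of the cache file. |
pathname |
(Advanced) An optional character string specifying the explicit pathname of the cache file. If NULL (default), a unique pathname within the cache root directory is generated from the key. |
dirs |
A character vector constituting the path to the cache subdirectory (of the cache root directory as returned by getCacheRootPath()) to be used. If NULL, the cache root directory itself is used. |
compress |
If TRUE, the cache file is saved with gzip compression (filename extension *.gz); otherwise not. |
... |
Additional arguments passed to save(). |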
Returns (invisible) the pathname of the cache file.
The saveCache() method saves a compressed cache file (with filename extension *.gz) if argument compress is TRUE.
The loadCache() method locates (via findCache()) and loads such cache files as well.
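A minimal round trip through a compressed cache file, assuming R.cache is installed; the key and the temporary cache root are illustrative. The object is saved with compress=TRUE and then loaded back with a plain loadCache() call that does not say the file is compressed.

```r
library(R.cache)

# Assumption: a throwaway cache root under tempdir() for this demo
root <- file.path(tempdir(), "Rcache-compress-demo")
dir.create(root, recursive = TRUE, showWarnings = FALSE)
setCacheRootPath(root)

set.seed(1)
x <- rnorm(100)
key <- list(what = "compressed-demo")  # hypothetical key object

pathname <- saveCache(x, key = key, compress = TRUE)
print(basename(pathname))  # expected to end in ".gz"

y <- loadCache(key)  # locates the compressed file and loads it
```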
For more details on how the hash code is generated, etc., see loadCache().
## Not run: For an example, see ?loadCache
Sets the root path to the file cache directory.
## Default S3 method:
setCacheRootPath(path=NULL, ...)
path |
The path. |
... |
Not used. |
Returns (invisibly) the old root path.
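A sketch of switching cache roots, assuming R.cache is installed; the two directories under tempdir() are hypothetical. It relies on the documented return value: setCacheRootPath() returns (invisibly) the previous root, which makes it easy to restore later. Paths are compared after normalizePath() in case the package stores them in a different but equivalent form.

```r
library(R.cache)

# Hypothetical cache roots for this demo
rootA <- file.path(tempdir(), "Rcache-root-A")
rootB <- file.path(tempdir(), "Rcache-root-B")
dir.create(rootA, recursive = TRUE, showWarnings = FALSE)
dir.create(rootB, recursive = TRUE, showWarnings = FALSE)

setCacheRootPath(rootA)
old <- setCacheRootPath(rootB)  # previous root, i.e. rootA
current <- getCacheRootPath()   # now rootB

# Switch back to the previous root
setCacheRootPath(old)
restored <- getCacheRootPath()  # rootA again
```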