Title: | Summary Statistics for Histogram/Count Data |
---|---|
Description: | In some cases you will have data in a histogram format, where you have a vector of all possible observations, and a vector of how many times each observation appeared. You could expand this into a single 1D vector, but this may not be advisable if the counts are extremely large. 'HistDat' allows for the calculation of summary statistics without the need for expanding your data. |
Authors: | Michael Milton |
Maintainer: | Michael Milton <[email protected]> |
License: | GPL (>=3) |
Version: | 0.2.0 |
Built: | 2025-02-16 04:46:03 UTC |
Source: | https://github.com/multimeric/histdat |
In some cases you will have data in a "histogram" format, where you have a vector of all possible observations, and a vector of how many times each observation appeared. You could expand this into a single 1D vector, but this may not be advisable if the counts are extremely large. 'HistDat' allows for the calculation of summary statistics without the need for expanding your data.
Note that all the methods described for HistDat instances have been transformed into generic methods in this package where they are not already, with default implementations for general numeric vectors. This allows you to apply these same functions equally to any type of data.
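To make the "histogram" format concrete, the following base-R sketch compares a values/counts pair with its expanded 1D vector. It does not use 'HistDat' itself; the variable names are illustrative assumptions, and the point is only that several summaries are recoverable from the two short vectors.

```r
# Illustrative data in "histogram" format (names are hypothetical)
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

# Expanded form: one element per observation (costly when counts are large)
expanded <- rep(vals, times = counts)

# The same summaries computed from the compact form, with no expansion
n_obs <- sum(counts)                 # number of observations
total <- sum(vals * counts)          # sum of all observations
value_range <- range(vals[counts > 0])  # smallest and largest observed value

stopifnot(n_obs == length(expanded), total == sum(expanded))
```

The expanded vector has one element per observation, whereas the compact form only grows with the number of distinct values.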
Index the histogram data
## S4 method for signature 'HistDat'
x[i, j, ..., drop = TRUE]
x | An instance of the class HistDat |
i | A vector of indices to find in the sorted array of observations |
j, drop, ... | Included for compatibility, but ignored |
The observations that would be returned if you flattened the array and then indexed it
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
hd[1] # returns 1
hd[2] # returns 2
hd[3] # returns 2
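One way to index the flattened data without ever building it is to look positions up in the cumulative counts. The sketch below is a base-R illustration of that idea, not necessarily how 'HistDat' implements `[`; `index_counts` is a hypothetical helper name.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

# Hypothetical helper: value at position(s) i of the sorted, expanded data
index_counts <- function(vals, counts, i) {
  upper <- cumsum(counts)  # last expanded position occupied by each value
  # left.open = TRUE makes the intervals (upper[k], upper[k + 1]], so a
  # position equal to a boundary still belongs to the earlier value
  vals[findInterval(i, upper, left.open = TRUE) + 1]
}

index_counts(vals, counts, 1:4) # 1 2 2 3
```

The lookup cost depends on the number of distinct values rather than on the total count.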
Converts an object to an empirical cumulative distribution function (eCDF). This is a generic function.
as.ecdf(x)
x | The object to coerce to an eCDF |
An instance of the "ecdf" class
cdf <- as.ecdf(1:4)
cdf(2) # returns 0.5
Converts this histogram to an instance of the "ecdf" class, allowing the calculation of cumulative probabilities and quantiles
## S4 method for signature 'HistDat'
as.ecdf(x)
x | An instance of the class HistDat |
An instance of the ecdf class. It can be invoked as a function to return the cumulative proportion of the count data less than or equal to x.
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
cdf <- as.ecdf(hd)
cdf(2) # returns 0.75
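The cumulative proportion described above can be derived from the counts alone. The following base-R sketch builds a comparable step function with `stats::stepfun`; it is an illustration under the assumption that the values are sorted and distinct, and `cdf_sketch` is a hypothetical name rather than part of the package.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

# Cumulative proportion of observations at or below each distinct value
cum_prop <- cumsum(counts) / sum(counts)

# Right-continuous step function that is 0 below the smallest value
cdf_sketch <- stepfun(vals, c(0, cum_prop))

cdf_sketch(2) # 0.75
```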
Converts this histogram to a vector. Not recommended if there are many counts as this would result in an incredibly long vector
## S4 method for signature 'HistDat'
as.vector(x)
x | An instance of the class HistDat |
A vector with the same length as x, but as a 1-D vector with an element for each count in the counts vector. In other words, each of the length(x) observations is represented as its own element, instead of being just counted as in the original HistDat object.
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
as.vector(hd) # returns 1 2 2 3
Concatenate observations into this instance
## S4 method for signature 'HistDatCompatible'
c(x, ...)
x | The first value to concatenate |
... | The remaining values to concatenate |
A new HistDat object, with the other numeric values integrated into it
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
hd_2 <- c(1, 1, hd)
hd@counts # returns 1 2 1
hd_2@counts # returns 3 2 1, as the first value now has 2 more counts
hd_2@vals # returns 1 2 3 (this is unchanged)
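Conceptually, concatenation adds one count for each new observation to the matching value. The base-R sketch below reproduces the counts from the example using `tapply`; it is only an illustration of the bookkeeping, not the package's implementation, and the variable names are hypothetical.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)
extra <- c(1, 1)  # raw observations to fold into the histogram

# Treat each raw observation as a value with a count of 1, then sum the
# counts per distinct value (tapply returns groups in sorted order)
all_vals <- c(vals, extra)
all_counts <- c(counts, rep(1, length(extra)))
merged <- tapply(all_counts, all_vals, sum)

merged_vals <- as.numeric(names(merged)) # 1 2 3
merged_counts <- as.vector(merged)       # 3 2 1
```

Recovering the values from `names()` goes through a character conversion, which is fine for small integers like these but may lose precision for arbitrary doubles.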
The constructor function for the HistDat class. This is the only official way to create an instance of this class.
HistDat(vals, counts)
vals | A vector of observation values, i.e. all the possible values that could be observed |
counts | A vector of counts, each of which corresponds to the same index in the vals parameter |
hd <- HistDat::HistDat(vals = 1:3, counts = c(1, 2, 1))
length(hd) # returns 4
S4 class for histogram data
vals
A vector of observations
counts
A vector of counts, each of which corresponds to the same index in the vals parameter
Calculates the total number of observations in a histogram dataset
## S4 method for signature 'HistDat'
length(x)
x | An instance of the class HistDat |
A numeric of length 1, holding the number of observations in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
length(hd) # returns 4
Calculates the largest observation in the histogram dataset
## S4 method for signature 'HistDat'
max(x, ..., na.rm = FALSE)
x | An instance of the class HistDat |
... | Passed verbatim to |
na.rm | Passed verbatim to |
A numeric of length 1, holding the largest observation in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
max(hd) # returns 3
Calculates the mean value of all observations in the histogram dataset
## S4 method for signature 'HistDat'
mean(x, ...)
x | An instance of the class HistDat |
... | Additional arguments that will be ignored |
Both an S3 and an S4 generic are defined for this method, allowing compatibility with existing code that calls base::mean() instead of mean(), which is defined as an S4 generic in this package
A numeric of length 1, holding the mean of the observations in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
mean(hd) # returns 2
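The mean of count data is a weighted mean of the distinct values, with the counts as weights. A minimal base-R sketch of that relationship follows; it uses `stats::weighted.mean` and makes no claim about how the package computes it internally.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

# Each distinct value contributes in proportion to how often it was observed
mean_sketch <- stats::weighted.mean(vals, w = counts)

mean_sketch # 2
```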
Calculates the median value of the observations in the histogram dataset
## S4 method for signature 'HistDat'
median(x, na.rm = FALSE, ...)
x | An instance of the class HistDat |
na.rm | Provided for compatibility with |
... | Additional arguments that will be ignored |
Both an S3 and an S4 generic are defined for this method, allowing compatibility with existing code that calls stats::median() instead of median(), which is defined as an S4 generic in this package
A numeric of length 1, holding the median value of the observations in the histogram dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
median(hd) # returns 2
Calculates the smallest observation in the histogram dataset
## S4 method for signature 'HistDat'
min(x, ..., na.rm = FALSE)
x | An instance of the class HistDat |
... | Passed verbatim to |
na.rm | Passed verbatim to |
A numeric of length 1, holding the smallest observation in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
min(hd) # returns 1
Returns the empirical quantiles of the observations represented by this class
## S4 method for signature 'HistDat'
quantile(x, ...)
x | An instance of the class HistDat |
... | Remaining arguments to pass to |
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
quantile(hd, 0.1) # returns 1.3
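The value 1.3 in the example matches what R's default quantile definition (type 7, linear interpolation between order statistics) gives for the expanded data 1 2 2 3. The base-R sketch below computes that definition from the counts alone; it assumes type 7 and sorted values, and `quantile_sketch` is a hypothetical helper, not the package's method.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

# Hypothetical helper: type-7 quantiles without expanding the data
quantile_sketch <- function(vals, counts, p) {
  upper <- cumsum(counts)
  # k-th smallest observation, found via the cumulative counts
  order_stat <- function(k) vals[findInterval(k, upper, left.open = TRUE) + 1]
  n <- sum(counts)
  h <- (n - 1) * p + 1  # fractional position in the sorted data
  lo <- floor(h)
  hi <- ceiling(h)
  # Linear interpolation between the two neighbouring order statistics
  order_stat(lo) + (h - lo) * (order_stat(hi) - order_stat(lo))
}

quantile_sketch(vals, counts, 0.1)
```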
Calculates the range of values of the observations in the histogram dataset
## S4 method for signature 'HistDat'
range(x, ..., na.rm = FALSE)
x | An instance of the class HistDat |
... | Additional arguments to pass to |
na.rm | Passed verbatim to |
A numeric of length 2, indicating the minimum and maximum value of the observations
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
range(hd) # returns 1 3
Calculates the standard deviation of the observations in the histogram dataset
## S4 method for signature 'HistDat'
sd(x)
x | An instance of the class HistDat |
A numeric of length 1, holding the standard deviation of all observations in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
sd(hd) # returns 0.8164966
This is a dummy method so that sort can be applied to HistDat entries. However, it does nothing, because the values in a HistDat are sorted at the time of creation.
## S4 method for signature 'HistDat'
sort(x, decreasing = F, ...)
x | A HistDat instance |
decreasing | If TRUE, this function will fail, as the observations are sorted in ascending order by default and this cannot be changed |
... | Additional arguments allowed for compatibility that will be ignored |
Both an S3 and an S4 generic are defined for this method, allowing compatibility with existing code that calls base::sort() instead of sort(), which is defined as an S4 generic in this package
The same HistDat instance, completely unchanged
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
sort(hd) # returns `hd` verbatim
Calculates the sum of all observations in the histogram dataset
## S4 method for signature 'HistDat'
sum(x, ..., na.rm = FALSE)
x | An instance of the class HistDat |
... | Additional arguments to pass to |
na.rm | Passed verbatim to |
A numeric of length 1, holding the sum of all values in the dataset
sum,HistDat-method: The S4 version
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
sum(hd) # returns 8
Calculates the variance of observations in the histogram dataset
## S4 method for signature 'HistDat'
var(x, y = NULL, na.rm = FALSE, use)
x | An instance of the class HistDat |
y | Provided for compatibility with |
na.rm | Provided for compatibility with |
use | Provided for compatibility with |
A numeric of length 1, holding the variance of all observations in the dataset
hd <- HistDat(vals = 1:3, counts = c(1, 2, 1))
var(hd) # returns 0.6666667
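The example values for var (0.6666667) and sd (0.8164966) are consistent with the usual sample variance, which divides by n - 1. The base-R sketch below computes that quantity from counts; the package's internal formula is not stated in this manual, so treat this as an assumption that happens to agree with the documented output.

```r
vals <- c(1, 2, 3)
counts <- c(1, 2, 1)

n <- sum(counts)              # total number of observations
mu <- sum(vals * counts) / n  # count-weighted mean

# Sum of squared deviations, each weighted by its count, over n - 1
var_sketch <- sum(counts * (vals - mu)^2) / (n - 1)
sd_sketch <- sqrt(var_sketch)
```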