HistDat

First we need to import the package:

library(HistDat)

Now, let’s say that we have seen the number one 1000 billion times, the number two 2000 billion times, the number three also 2000 billion times, and the number four 1000 billion times.

If we turned this into a single vector with the true number of each observation, we would have a vector of length 6,000,000,000,000! It seems unlikely that this would even fit into RAM, and if it did, calculations would be very difficult.

h = HistDat(
  vals = 1:4,
  counts = c(1e12, 2e12, 2e12, 1e12)
)

Now let’s calculate some summary statistics, without using RAM we don’t need!

mean(h)
#> [1] 2.5
min(h)
#> [1] 1
length(h)
#> [1] 6e+12
median(h)
#> [1] 2.5

We actually can convert a hist_dat object into a 1-D vector, which is reasonable if we only have a small number of counts:

h = HistDat(
  vals = 1:4,
  counts = c(1, 2, 2, 1)
)
as.vector(h)
#> [1] 1 2 2 3 3 4