An HdrHistogram histogram supports the recording and analysis of sampled data value counts across a configurable integer value range, with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.
In contrast to traditional histograms that use linear, logarithmic, or arbitrary-sized bins or buckets, HdrHistograms use a fixed-storage internal data representation that simultaneously supports an arbitrarily high dynamic range and arbitrary precision throughout that dynamic range. This capability makes HdrHistograms extremely useful for tracking and reporting on the distribution of percentile values with high resolution and across a wide dynamic range, a common need in latency behavior characterization.
The HdrHistogram package was specifically designed with latency- and performance-sensitive applications in mind. Experimental micro-benchmark measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. All Histogram variants can maintain a fixed cost in both space and time. When not configured to auto-resize, a Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant: storage index locations are computed directly, so no iteration or searching is ever involved in recording data values.
NOTE: Histograms can optionally be configured to auto-resize their dynamic range as a convenience feature. When configured to auto-resize, recording operations that need to expand a histogram will auto-resize its dynamic range to include recorded values as they are encountered. Note that recording calls that cause auto-resizing may take longer to execute, and that resizing incurs allocation and copying of internal data structures.
The combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded value count information is kept in high resolution allows for accurate post-recording analysis with low (and ultimately configurable) loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data value samples.
An HdrHistogram histogram is usually configured to maintain value count data with a resolution good enough to support a desired precision in post-recording analysis and reporting on the collected data. Analysis can include the computation and reporting of distribution by percentiles, linear or logarithmic arbitrary value buckets, mean and standard deviation, as well as any other computations that can be supported using the various iteration techniques available on the collected value count data. In practice, precision levels of 2 or 3 decimal points are most commonly used, as they maintain a value accuracy of +/- ~1% or +/- ~0.1% respectively for derived distribution statistics.
A good example of HdrHistogram use would be tracking of latencies across a wide dynamic range, e.g. from a
microsecond to an hour. A Histogram can be configured to track and later report on the counts of observed integer
usec-unit latency values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits
across that range. Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000
and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB
(see "Footprint estimation" below).
Code for this use example would include these basic elements:
Histogram histogram = new Histogram(3600000000L, 3);
...
// Repeatedly record measured latencies:
histogram.recordValue(latency);
...
// Report histogram percentiles, expressed in msec units:
histogram.outputPercentileDistribution(histogramLog, 1000.0);
Specifying 3 decimal points of precision in this example guarantees that value quantization within the value range
will be no larger than 1/1,000th (or 0.1%) of any recorded value. This example Histogram can therefore be used to
track, analyze and report the counts of observed latencies ranging between 1 microsecond and 1 hour in magnitude,
while maintaining a value resolution of 1 microsecond (or better) up to 1 millisecond, a resolution of 1 millisecond
(or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked
value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
The HdrHistogram package includes several implementations of the AbstractHistogram class:
- Histogram, which is the commonly used Histogram form and tracks value counts in long fields.
- IntCountsHistogram and ShortCountsHistogram, which track value counts in int and short fields respectively, and are
  provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g.
  systems where tens of thousands of in-memory histograms are being tracked).
- AtomicHistogram, ConcurrentHistogram and SynchronizedHistogram, which support concurrent access.
Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating
point number representation: using an exponent and a (non-normalized) mantissa to support a wide dynamic range at
a high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value
ranges (the parallel of the exponent portion of a floating point number), with each bucket containing a fixed
number (per bucket) of linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point
number). Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range,
and numberOfSignificantValueDigits controlling resolution.
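The exponent/mantissa analogy can be made concrete with a small, library-independent sketch. It assumes a configuration of 3 significant digits with a unit of 1, for which the footprint formula later in this document gives 2048 sub-buckets; the class name, the constants and the exact index arithmetic are illustrative assumptions rather than the library's own code. The point it shows is that a value's storage slot can be computed directly with a few bit operations, with no iteration or searching.

```java
final class BucketLayoutSketch {
    // Assumed configuration: 3 significant digits -> 2 * 10^3 rounded up to a power of 2.
    static final int SUB_BUCKET_COUNT = 2048;
    static final int SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10; // log2(SUB_BUCKET_COUNT / 2)
    static final long SUB_BUCKET_MASK = SUB_BUCKET_COUNT - 1;

    /** The "exponent": which exponentially sized bucket a non-negative value falls in. */
    public static int bucketIndex(long value) {
        int bitLength = 64 - Long.numberOfLeadingZeros(value | SUB_BUCKET_MASK);
        return bitLength - (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1);
    }

    /** The "mantissa": the linear sub-bucket inside that bucket. */
    public static int subBucketIndex(long value, int bucketIndex) {
        return (int) (value >>> bucketIndex);
    }

    /** Flat slot in a single counts array, computed without any searching. */
    public static int countsIndex(long value) {
        int bucket = bucketIndex(value);
        int subBucket = subBucketIndex(value, bucket);
        // Buckets after the first reuse only the upper half of the sub-bucket range,
        // because their lower half is already covered by the previous bucket.
        int bucketBase = (bucket + 1) << SUB_BUCKET_HALF_COUNT_MAGNITUDE;
        return bucketBase + (subBucket - SUB_BUCKET_COUNT / 2);
    }

    public static void main(String[] args) {
        long[] samples = {1L, 2047L, 2048L, 2049L, 4096L, 3_600_000_000L};
        for (long v : samples) {
            int b = bucketIndex(v);
            System.out.println(v + " -> bucket " + b
                    + ", sub-bucket " + subBucketIndex(v, b)
                    + ", slot " + countsIndex(v));
        }
    }
}
```

In this sketch, values below 2048 each get their own slot, while 2048 and 2049 share one: the first sign of resolution decreasing as the exponent grows.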
The Histogram class and its IntCountsHistogram and ShortCountsHistogram variants are NOT internally synchronized,
and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries
against Histogram objects should either take care to externally synchronize and/or order their access, or use the
ConcurrentHistogram, AtomicHistogram, or SynchronizedHistogram variants.
A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded
or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such
continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or
reporting operations are needed, consider using the Recorder and SingleWriterRecorder variants that were
specifically designed for that purpose. Recorders provide a recording API similar to Histogram, and internally
maintain and coordinate active/inactive histograms such that recording remains wait-free in the presence of
accurate and stable interval sampling.
It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread
non-synchronized histograms or SingleWriterRecorders, and to have a summary/reporting thread perform histogram
aggregation math across time and/or threads.
The iteration mechanisms all provide HistogramIterationValue data points along the histogram's iterated data set,
and are available via the following methods:
- percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator
- linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator
- logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator
- recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator
- allValues: An Iterable<HistogramIterationValue> through the histogram using an AllValuesIterator
Iteration is typically done with a for-each loop statement. E.g.:
for (HistogramIterationValue v : histogram.percentiles(percentileTicksPerHalfDistance)) {
    ...
}
or
for (HistogramIterationValue v : histogram.linearBucketValues(valueUnitsPerBucket)) {
    ...
}
The iterators associated with each iteration method are resettable, such that a caller that would like to avoid
allocating a new iterator object for each iteration loop can reuse an iterator to repeatedly iterate through the
histogram. This iterator reuse usually takes the form of a traditional loop using the Iterator's hasNext() and
next() methods:
PercentileIterator iter = new PercentileIterator(histogram, percentileTicksPerHalfDistance);
...
iter.reset(percentileTicksPerHalfDistance);
while (iter.hasNext()) {
    HistogramIterationValue v = iter.next();
    ...
}
Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. Histogram provides methods for determining the lowest and highest equivalent values for any given value, as well as determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting counts).
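A minimal sketch of what "equivalent" means in practice, assuming the same illustrative layout of 2048 linear sub-buckets per power-of-two bucket (3 significant digits, unit of 1). The class and method names here are hypothetical stand-ins chosen for readability, and the arithmetic is a simplified model rather than the library's implementation: two values are treated as equivalent when they fall in the same range after the low-order bits that the resolution cannot distinguish are dropped.

```java
final class EquivalentValuesSketch {
    // Assumed configuration: 2048 sub-buckets (3 significant digits, unit of 1).
    static final long SUB_BUCKET_MASK = 2047;
    static final int SUB_BUCKET_HALF_COUNT_MAGNITUDE = 10;

    /** Number of low-order bits that the histogram cannot distinguish at this magnitude. */
    static int indistinguishableBits(long value) {
        int bitLength = 64 - Long.numberOfLeadingZeros(value | SUB_BUCKET_MASK);
        return bitLength - (SUB_BUCKET_HALF_COUNT_MAGNITUDE + 1);
    }

    public static long sizeOfEquivalentRange(long value) {
        return 1L << indistinguishableBits(value);
    }

    public static long lowestEquivalent(long value) {
        int shift = indistinguishableBits(value);
        return (value >>> shift) << shift;
    }

    public static long highestEquivalent(long value) {
        return lowestEquivalent(value) + sizeOfEquivalentRange(value) - 1;
    }

    /** First value that would be counted separately from the given one. */
    public static long nextNonEquivalent(long value) {
        return lowestEquivalent(value) + sizeOfEquivalentRange(value);
    }

    public static boolean areEquivalent(long a, long b) {
        return lowestEquivalent(a) == lowestEquivalent(b);
    }

    public static void main(String[] args) {
        // Walk distinct value ranges without double counting, starting at 2040.
        for (long v = 2040; v < 2060; v = nextNonEquivalent(v)) {
            System.out.println(lowestEquivalent(v) + ".." + highestEquivalent(v));
        }
    }
}
```

Under these assumptions the equivalent range at 3,600,000,000 usec spans 2,097,152 usec (about 2.1 seconds), which is consistent with the "3.6 seconds (or better)" resolution quoted for the one-hour example.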
Regular, raw value data recording into an HdrHistogram is achieved with the recordValue() method.
Histogram variants also provide an auto-correcting recordValueWithExpectedInterval() form in support of a common
use case found when histogram values are used to track response time distribution in the presence of Coordinated
Omission, an extremely common phenomenon found in latency recording systems.
This correcting form is useful in (e.g. load generator) scenarios where measured response times may exceed the
expected interval between issuing requests, leading to the "omission" of response time measurements that would
typically correlate with "bad" results. This coordinated (non-random) omission of source data, if left uncorrected,
will then dramatically skew any overall latency stats computed on the recorded information, as the recorded data set
itself will be significantly skewed towards good results.
When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded histogram
data will reflect an appropriate number of additional values, linearly decreasing in steps of
expectedIntervalBetweenValueSamples (down to the last value that would still be higher than
expectedIntervalBetweenValueSamples).
To illustrate why this corrective behavior is critically needed in order to accurately represent value
distribution when large value measurements may lead to missed samples, imagine a system for which response
time samples are taken once every 10 msec to characterize response time distribution.
The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample
showing a 1msec response time value. The hypothetical system then encounters a 100 sec pause during which only a
single sample is recorded (with a 100 second value).
A normally recorded (uncorrected) data histogram collected for such a hypothetical system (over the 200 second
scenario above) would show ~99.99% of results at 1msec or below, which is obviously "not right". In contrast, a
histogram that records the same data using the auto-correcting recordValueWithExpectedInterval() method with the
knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the real world response time
distribution of this hypothetical system. Only ~50% of results will be at 1msec or below, with the remaining 50%
coming from the auto-generated value records covering the missing increments spread between 10msec and 100 sec.
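The arithmetic of this scenario can be checked with a small stand-alone simulation. The sketch below does not use the library: it keeps samples in a plain list, and its back-filling loop is an assumption modeled on the description above. In particular, it includes the value equal to one expected interval, so that the filled-in values run from 100 sec down to 10 msec as in the example. All values are in msec.

```java
import java.util.ArrayList;
import java.util.List;

final class CoordinatedOmissionSketch {

    /** Records a value as-is, with no correction. */
    public static void recordRaw(List<Long> samples, long value) {
        samples.add(value);
    }

    /**
     * Records a value, then back-fills the samples a stalled measurer would have missed:
     * value - interval, value - 2 * interval, ... while the result is at least one interval
     * (the inclusive lower bound is an assumption of this sketch).
     */
    public static void recordWithExpectedInterval(List<Long> samples, long value, long expectedInterval) {
        samples.add(value);
        if (expectedInterval <= 0) {
            return;
        }
        for (long missing = value - expectedInterval; missing >= expectedInterval; missing -= expectedInterval) {
            samples.add(missing);
        }
    }

    public static double fractionAtOrBelow(List<Long> samples, long threshold) {
        long hits = samples.stream().filter(v -> v <= threshold).count();
        return (double) hits / samples.size();
    }

    /** 10,000 samples of 1 msec, followed by a single 100 sec (100,000 msec) sample. */
    public static List<Long> scenario(boolean corrected) {
        final long expectedIntervalMsec = 10;
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            if (corrected) {
                recordWithExpectedInterval(samples, 1, expectedIntervalMsec);
            } else {
                recordRaw(samples, 1);
            }
        }
        if (corrected) {
            recordWithExpectedInterval(samples, 100_000, expectedIntervalMsec);
        } else {
            recordRaw(samples, 100_000);
        }
        return samples;
    }

    public static void main(String[] args) {
        System.out.printf("uncorrected: %.4f of samples at or below 1 msec%n",
                fractionAtOrBelow(scenario(false), 1));
        System.out.printf("corrected:   %.4f of samples at or below 1 msec%n",
                fractionAtOrBelow(scenario(true), 1));
    }
}
```

The uncorrected run holds 10,001 samples, of which all but one are at 1 msec; the corrected run holds 20,000 samples, exactly half of them at 1 msec.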
Data sets recorded with recordValue() and with recordValueWithExpectedInterval() will differ only if at least one
value recorded was greater than its associated expectedIntervalBetweenValueSamples parameter.
Data sets recorded with recordValueWithExpectedInterval() will be identical to ones recorded with recordValue() if
all values recorded were smaller than their associated expectedIntervalBetweenValueSamples parameters.
In addition to the at-recording-time correction option, Histogram variants also provide the post-recording
correction methods copyCorrectedForCoordinatedOmission() and addWhileCorrectingForCoordinatedOmission().
These methods can be used for post-recording correction, and are useful when the
expectedIntervalBetweenValueSamples parameter is estimated to be the same for all recorded values. However, it is
important to note that only one correction method (during or post recording) should be used on a given histogram
data set, since applying both would correct the same data twice.
When used for response time characterization, recording with the optional expectedIntervalBetweenValueSamples
parameter will tend to produce data sets that much more accurately reflect the response time distribution that a
random, uncoordinated request would have experienced.
The discussion above relates to integer value histograms (the subclasses of AbstractHistogram and their related
supporting classes). HdrHistogram supports floating point value recording and reporting with a similar set of
classes, including the DoubleHistogram, ConcurrentDoubleHistogram and SynchronizedDoubleHistogram histogram
classes. Support for floating point value iteration is provided with DoubleHistogramIterationValue and related
iterator classes (DoubleLinearIterator, DoubleLogarithmicIterator, DoublePercentileIterator,
DoubleRecordedValuesIterator, DoubleAllValuesIterator). Support for interval recording is provided with
DoubleRecorder and SingleWriterDoubleRecorder.
Unlike integer value histograms, the specific value range tracked by a DoubleHistogram (and variants) is not
specified upfront. Only the dynamic range of values that the histogram can cover is (optionally) specified. E.g.
when a DoubleHistogram is created to track a dynamic range of 3600000000000 (enough to track values from a
nanosecond to an hour), values could be recorded into it in any consistent unit of time as long as the ratio
between the highest and lowest non-zero values stays within the specified dynamic range, so recording in units of
nanoseconds (1.0 thru 3600000000000.0), milliseconds (0.000001 thru 3600000.0), seconds (0.000000001 thru 3600.0),
or hours (1/3.6E12 thru 1.0) will all work just as well.
Footprint estimation: a Histogram's memory footprint is determined by its highestTrackableValue and
numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields
and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up
by its data value recording counts array. The total footprint can be conservatively estimated by:
largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits);
subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
expectedHistogramFootprintInBytes = 512 +
({primitive type size} / 2) *
(log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
subBucketSize
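As a worked instance of the estimate above, here is a minimal Java sketch that evaluates the formula with integer arithmetic. The class and method names are hypothetical; the count size is passed in bytes (8 for long, 4 for int, 2 for short), and the sketch assumes highestTrackableValue is at least as large as subBucketSize.

```java
final class FootprintEstimateSketch {

    /** Smallest power of two that is >= v (for v >= 1). */
    public static long roundedUpToNearestPowerOf2(long v) {
        long p = 1;
        while (p < v) {
            p <<= 1;
        }
        return p;
    }

    /** ceil(log2(numerator / denominator)), assuming numerator >= denominator > 0. */
    public static int log2RoundedUp(long numerator, long denominator) {
        long ratioRoundedUp = (numerator + denominator - 1) / denominator;
        return 64 - Long.numberOfLeadingZeros(ratioRoundedUp - 1);
    }

    public static long estimateBytes(long highestTrackableValue,
                                     int numberOfSignificantValueDigits,
                                     int countSizeInBytes) {
        // largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits)
        long largestValueWithSingleUnitResolution = 2;
        for (int i = 0; i < numberOfSignificantValueDigits; i++) {
            largestValueWithSingleUnitResolution *= 10;
        }
        long subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
        long bucketTerm = log2RoundedUp(highestTrackableValue, subBucketSize) + 2;
        // 512 + (count size / 2) * bucketTerm * subBucketSize; the division is done last
        // so that it stays exact (subBucketSize is always even).
        return 512 + countSizeInBytes * bucketTerm * subBucketSize / 2;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(3_600_000_000L, 3, Long.BYTES));
    }
}
```

For the latency example described earlier (highestTrackableValue of 3,600,000,000, 3 significant digits, long counts), subBucketSize is 2048 and the estimate evaluates to 188,928 bytes, or about 184.5KB, in line with the "around 185KB" figure quoted there.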
A conservative (high) estimate of a Histogram's footprint in bytes is available via the
getEstimatedFootprintInBytes() method.

Interface | Description
DoubleValueRecorder |
HistogramLogScanner.EncodableHistogramSupplier |
HistogramLogScanner.EventHandler | Handles log events, return true to stop processing.
ValueRecorder |

Class | Description
AbstractHistogram | An abstract base class for integer values High Dynamic Range (HDR) Histograms
AllValuesIterator | Used for iterating through histogram values using the finest granularity steps supported by the underlying representation.
AtomicHistogram | A High Dynamic Range (HDR) Histogram using atomic long count type
Base64Helper | Base64Helper exists to bridge inconsistencies in Java SE support of Base64 encoding and decoding.
ConcurrentDoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram that supports safe concurrent recording operations.
ConcurrentHistogram | An integer values High Dynamic Range (HDR) Histogram that supports safe concurrent recording operations.
DoubleAllValuesIterator | Used for iterating through DoubleHistogram values using the finest granularity steps supported by the underlying representation.
DoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram
DoubleHistogramIterationValue | Represents a value point iterated through in a DoubleHistogram, with associated stats.
DoubleLinearIterator | Used for iterating through DoubleHistogram values in linear steps.
DoubleLogarithmicIterator | Used for iterating through DoubleHistogram values in logarithmically increasing levels.
DoublePercentileIterator | Used for iterating through DoubleHistogram values according to percentile levels.
DoubleRecordedValuesIterator | Used for iterating through DoubleHistogram values using the finest granularity steps supported by the underlying representation.
DoubleRecorder | Records floating point (double) values, and provides stable interval DoubleHistogram samples from live recorded data without interrupting or stalling active recording of values.
EncodableHistogram | A base class for all encodable (and decodable) histogram classes.
Histogram | A High Dynamic Range (HDR) Histogram
HistogramIterationValue | Represents a value point iterated through in a Histogram, with associated stats.
HistogramLogProcessor | HistogramLogProcessor will process an input log and [can] generate two separate log files from a single histogram log file: a sequential interval log file and a histogram percentile distribution log file.
HistogramLogReader | A histogram log reader.
HistogramLogScanner |
HistogramLogWriter | A histogram log writer.
IntCountsHistogram | A High Dynamic Range (HDR) Histogram using an int count type
LinearIterator | Used for iterating through histogram values in linear steps.
LogarithmicIterator | Used for iterating through histogram values in logarithmically increasing levels.
PercentileIterator | Used for iterating through histogram values according to percentile levels.
RecordedValuesIterator | Used for iterating through all recorded histogram values using the finest granularity steps supported by the underlying representation.
Recorder | Records integer values, and provides stable interval Histogram samples from live recorded data without interrupting or stalling active recording of values.
ShortCountsHistogram | A High Dynamic Range (HDR) Histogram using a short count type
SingleWriterDoubleRecorder | Records floating point values, and provides stable interval DoubleHistogram samples from live recorded data without interrupting or stalling active recording of values.
SingleWriterRecorder | Records integer values, and provides stable interval Histogram samples from live recorded data without interrupting or stalling active recording of values.
SynchronizedDoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram that is synchronized as a whole
SynchronizedHistogram | An integer values High Dynamic Range (HDR) Histogram that is synchronized as a whole
WriterReaderPhaser | WriterReaderPhaser provides an asymmetric means for synchronizing the execution of wait-free "writer" critical sections against a "reader phase flip" that needs to make sure no writer critical sections that were active at the beginning of the flip are still active after the flip is done.
Copyright © 2019. All rights reserved.