An HdrHistogram histogram supports the recording and analysis of sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.
In contrast to traditional histograms that use linear, logarithmic, or arbitrarily sized bins or buckets, HdrHistograms use a fixed storage internal data representation that simultaneously supports an arbitrarily high dynamic range and arbitrary precision throughout that dynamic range. This capability makes HdrHistograms extremely useful for tracking and reporting on the distribution of percentile values with high resolution and across a wide dynamic range -- a common need in latency behavior characterization.
The HdrHistogram package was specifically designed with latency and performance sensitive applications in mind. Experimental u-benchmark measurements show value recording times as low as 3-6 nanoseconds on modern (circa 2012) Intel CPUs. All Histogram variants can maintain a fixed cost in both space and time. When not configured to auto-resize, a Histogram's memory footprint is constant, with no allocation operations involved in recording data values or in iterating through them. The memory footprint is fixed regardless of the number of data value samples recorded, and depends solely on the dynamic range and precision chosen. The amount of work involved in recording a sample is constant, and directly computes storage index locations such that no iteration or searching is ever involved in recording data values.
NOTE: Histograms can optionally be configured to auto-resize their dynamic range as a convenience feature. When configured to auto-resize, recording operations that need to expand a histogram will auto-resize its dynamic range to include recorded values as they are encountered. Note that recording calls that cause auto-resizing may take longer to execute, and that resizing incurs allocation and copying of internal data structures.
The combination of high dynamic range and precision is useful for collection and accurate post-recording analysis of sampled value data distribution in various forms. Whether it's calculating or plotting arbitrary percentiles, iterating through and summarizing values in various ways, or deriving mean and standard deviation values, the fact that the recorded value count information is kept in high resolution allows for accurate post-recording analysis with low [and ultimately configurable] loss in accuracy when compared to performing the same analysis directly on the potentially infinite series of sourced data value samples.
An HdrHistogram histogram is usually configured to maintain value count data with a resolution good enough to support a desired precision in post-recording analysis and reporting on the collected data. Analysis can include the computation and reporting of distribution by percentiles, linear or logarithmic arbitrary value buckets, mean and standard deviation, as well as any other computations that can be supported using the various iteration techniques available on the collected value count data. In practice, precision levels of 2 or 3 decimal points are most commonly used, as they maintain a value accuracy of +/- ~1% or +/- ~0.1% respectively for derived distribution statistics.
A good example of HdrHistogram use would be tracking of latencies across a wide dynamic range, e.g. from a
microsecond to an hour. A Histogram can be configured to track and later report on the counts of observed integer
usec-unit latency values between 0 and 3,600,000,000 while maintaining a value precision of 3 significant digits
across that range. Such an example Histogram would simply be created with a highestTrackableValue of 3,600,000,000,
and a numberOfSignificantValueDigits of 3, and would occupy a fixed, unchanging memory footprint of around 185KB
(see "Footprint estimation" below).
Code for this use example would include these basic elements:
Histogram histogram = new Histogram(3600000000L, 3);
.
.
.
// Repeatedly record measured latencies:
histogram.recordValue(latency);
.
.
.
// Report histogram percentiles, expressed in msec units:
histogram.outputPercentileDistribution(histogramLog, 1000.0);
Specifying 3 decimal points of precision in this example guarantees that value quantization within the value range
will be no larger than 1/1,000th (or 0.1%) of any recorded value. This example Histogram can therefore be used to
track, analyze and report the counts of observed latencies ranging between 1 microsecond and 1 hour in magnitude,
while maintaining a value resolution of 1 microsecond (or better) up to 1 millisecond, a resolution of 1 millisecond
(or better) up to one second, and a resolution of 1 second (or better) up to 1,000 seconds. At its maximum tracked
value (1 hour), it would still maintain a resolution of 3.6 seconds (or better).
The package provides several implementations of the AbstractHistogram class:
Histogram, which is the commonly used Histogram form and tracks value counts in long fields.
IntCountsHistogram and ShortCountsHistogram, which track value counts in int and short fields respectively, and are
provided for use cases where smaller count ranges are practical and smaller overall storage is beneficial (e.g.
systems where tens of thousands of in-memory histograms are being tracked).
AtomicHistogram, ConcurrentHistogram and SynchronizedHistogram.
Internally, data in HdrHistogram variants is maintained using a concept somewhat similar to that of floating
point number representation: using an exponent and a (non-normalized) mantissa to support a wide dynamic range at a
high but varying (by exponent value) resolution. AbstractHistogram uses exponentially increasing bucket value ranges
(the parallel of the exponent portion of a floating point number), with each bucket containing a fixed number of
linear sub-buckets (the parallel of a non-normalized mantissa portion of a floating point number).
Both dynamic range and resolution are configurable, with highestTrackableValue controlling dynamic range, and
numberOfSignificantValueDigits controlling resolution.
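The exponent/mantissa analogy can be made concrete with a small standalone sketch. It does not use the library and should not be read as its internal code: the class BucketIndexSketch and its members are hypothetical names, and the sketch assumes non-negative values and a lowest trackable unit of 1. The bucket index plays the role of the exponent, and the sub-bucket index plays the role of the non-normalized mantissa.

```java
// Hypothetical standalone sketch; not the library's internal code.
// Assumes non-negative values and a lowest trackable unit of 1.
final class BucketIndexSketch {
    final int subBucketCount;           // linear sub-buckets per bucket (a power of 2)
    final int subBucketCountMagnitude;  // log2(subBucketCount)

    BucketIndexSketch(int significantDigits) {
        // Single-unit resolution is needed up to 2 * 10^digits, rounded up to a power of 2.
        long needed = 2;
        for (int i = 0; i < significantDigits; i++) {
            needed *= 10;
        }
        int magnitude = 0;
        while ((1L << magnitude) < needed) {
            magnitude++;
        }
        subBucketCountMagnitude = magnitude;
        subBucketCount = 1 << magnitude;
    }

    // The "exponent": how many times the covered range has doubled past bucket 0.
    int bucketIndex(long value) {
        int bitLength = 64 - Long.numberOfLeadingZeros(value | (subBucketCount - 1));
        return bitLength - subBucketCountMagnitude;
    }

    // The "non-normalized mantissa": the linear slot within that bucket.
    int subBucketIndex(long value) {
        return (int) (value >>> bucketIndex(value));
    }
}
```

With 3 significant digits this sketch uses 2,048 sub-buckets: the value 1,000 maps to bucket 0, sub-bucket 1,000, while 3,600,000,000 maps to bucket 21, sub-bucket 1,716, where each sub-bucket spans 2^21 units. Both indices come from a few bit operations, with no searching.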
The Histogram class and its IntCountsHistogram and ShortCountsHistogram variants are NOT internally synchronized,
and do NOT use atomic variables. Callers wishing to make potentially concurrent, multi-threaded updates or queries
against Histogram objects should either take care to externally synchronize and/or order their access, or use the
ConcurrentHistogram, AtomicHistogram, or SynchronizedHistogram variants.
A common pattern seen in histogram value recording involves recording values in a critical path (multi-threaded
or not), coupled with a non-critical path reading the recorded data for summary/reporting purposes. When such
continuous non-blocking recording operation (concurrent or not) is desired even when sampling, analyzing, or
reporting operations are needed, consider using the Recorder and SingleWriterRecorder variants that were
specifically designed for that purpose.
Recorders provide a recording API similar to Histogram, and internally maintain and coordinate active/inactive
histograms such that recording remains wait-free in the presence of accurate and stable interval sampling.
It is worth mentioning that since Histogram objects are additive, it is common practice to use per-thread
non-synchronized histograms or SingleWriterRecorders, and to have a summary/reporting thread perform histogram
aggregation math across time and/or threads.
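Additivity is what makes the per-thread pattern work: combining histograms of identical configuration amounts to adding their counts slot by slot. The sketch below illustrates only that idea. Plain long[] arrays stand in for per-thread histograms, PerThreadCountsSketch and recordAndAggregate are hypothetical names, and aggregation happens after the workers finish (a simplification; it does not model the live interval sampling that Recorders provide).

```java
// Hypothetical sketch: plain long[] arrays stand in for per-thread histograms.
final class PerThreadCountsSketch {
    static long[] recordAndAggregate(int threads, int samplesPerThread, int slots) {
        long[][] perThread = new long[threads][slots];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final long[] mine = perThread[t]; // written by exactly one thread, so no locking
            final int offset = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < samplesPerThread; i++) {
                    mine[(offset + i) % slots]++; // stand-in for recording a measured value
                }
            });
            workers[t].start();
        }
        long[] total = new long[slots];
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join(); // after join(), this worker's counts are safely visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a worker", e);
            }
            for (int s = 0; s < slots; s++) {
                total[s] += perThread[t][s]; // aggregation is plain element-wise addition
            }
        }
        return total;
    }
}
```

No sample is lost or double-counted: the aggregated counts always sum to threads * samplesPerThread, even though the recording threads never synchronize with each other.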
Iteration over a histogram yields HistogramIterationValue data points along the histogram's iterated data set, and
is available via the following methods:
percentiles: An Iterable<HistogramIterationValue> through the histogram using a PercentileIterator
linearBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LinearIterator
logarithmicBucketValues: An Iterable<HistogramIterationValue> through the histogram using a LogarithmicIterator
recordedValues: An Iterable<HistogramIterationValue> through the histogram using a RecordedValuesIterator
allValues: An Iterable<HistogramIterationValue> through the histogram using an AllValuesIterator
Iteration is typically done with a for-each loop statement. E.g.:
for (HistogramIterationValue v : histogram.percentiles(percentileTicksPerHalfDistance)) {
...
}
or
for (HistogramIterationValue v : histogram.linearBucketValues(valueUnitsPerBucket)) {
...
}
The iterators associated with each iteration method are resettable, such that a caller that would like to avoid
allocating a new iterator object for each iteration loop can re-use an iterator to repeatedly iterate through the
histogram. This iterator re-use usually takes the form of a traditional loop using the Iterator's hasNext() and
next() methods:
PercentileIterator iter = new PercentileIterator(histogram, percentileTicksPerHalfDistance);
...
iter.reset(percentileTicksPerHalfDistance);
while (iter.hasNext()) {
    HistogramIterationValue v = iter.next();
    ...
}
Due to the finite (and configurable) resolution of the histogram, multiple adjacent integer data values can be "equivalent". Two values are considered "equivalent" if samples recorded for both are always counted in a common total count due to the histogram's resolution level. Histogram provides methods for determining the lowest and highest equivalent values for any given value, as well as for determining whether two values are equivalent, and for finding the next non-equivalent value for a given value (useful when looping through values, in order to avoid double-counting counts).
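As a rough illustration of equivalence, the standalone sketch below derives the range of values that share a count slot with a given value. It is not the library's implementation: EquivalentValuesSketch and its method names are hypothetical, and it assumes non-negative values, a lowest trackable unit of 1, and a sub-bucket count equal to the smallest power of 2 at or above 2 * 10^digits.

```java
// Hypothetical standalone sketch of value equivalence; not the library's code.
// Assumes non-negative values and a lowest trackable unit of 1.
final class EquivalentValuesSketch {
    private final long subBucketMask;     // subBucketCount - 1
    private final int subBucketMagnitude; // log2(subBucketCount)

    EquivalentValuesSketch(int significantDigits) {
        long needed = 2;
        for (int i = 0; i < significantDigits; i++) {
            needed *= 10;
        }
        int magnitude = 0;
        while ((1L << magnitude) < needed) {
            magnitude++;
        }
        subBucketMagnitude = magnitude;
        subBucketMask = (1L << magnitude) - 1;
    }

    // Number of adjacent integer values that share one count with `value`.
    long sizeOfEquivalentRange(long value) {
        int bitLength = 64 - Long.numberOfLeadingZeros(value | subBucketMask);
        return 1L << (bitLength - subBucketMagnitude);
    }

    long lowestEquivalent(long value) {
        return value & ~(sizeOfEquivalentRange(value) - 1);
    }

    long nextNonEquivalent(long value) {
        return lowestEquivalent(value) + sizeOfEquivalentRange(value);
    }

    long highestEquivalent(long value) {
        return nextNonEquivalent(value) - 1;
    }

    boolean areEquivalent(long a, long b) {
        return lowestEquivalent(a) == lowestEquivalent(b);
    }
}
```

With 3 significant digits, the values 5,000 through 5,003 are equivalent in this sketch, and 5,004 starts the next range. A loop of the form `for (long v = 0; v <= max; v = sketch.nextNonEquivalent(v))` therefore visits each counted range exactly once.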
Regular, raw value data recording into an HdrHistogram is achieved with the recordValue() method.
Histogram variants also provide an auto-correcting recordValueWithExpectedInterval() form in support of a common
use case found when histogram values are used to track response time distribution in the presence of Coordinated
Omission - an extremely common phenomenon found in latency recording systems.
This correcting form is useful in [e.g. load generator] scenarios where measured response times may exceed the
expected interval between issuing requests, leading to the "omission" of response time measurements that would
typically correlate with "bad" results. This coordinated (non random) omission of source data, if left uncorrected,
will then dramatically skew any overall latency stats computed on the recorded information, as the recorded data set
itself will be significantly skewed towards good results.
When a value recorded in the histogram exceeds the expectedIntervalBetweenValueSamples parameter, recorded
histogram data will reflect an appropriate number of additional values, linearly decreasing in steps of
expectedIntervalBetweenValueSamples, down to the last value that would still be higher than
expectedIntervalBetweenValueSamples.
To illustrate why this corrective behavior is critically needed in order to accurately represent value
distribution when large value measurements may lead to missed samples, imagine a system for which response
time samples are taken once every 10 msec to characterize response time distribution.
The hypothetical system behaves "perfectly" for 100 seconds (10,000 recorded samples), with each sample
showing a 1msec response time value. The hypothetical system then encounters a 100 sec pause during which only a
single sample is recorded (with a 100 second value).
A normally recorded (uncorrected) data histogram collected for such a hypothetical system (over the 200 second
scenario above) would show ~99.99% of results at 1msec or below, which is obviously "not right". In contrast, a
histogram that records the same data using the auto-correcting recordValueWithExpectedInterval() method with the
knowledge of an expectedIntervalBetweenValueSamples of 10msec will correctly represent the real world response
time distribution of this hypothetical system. Only ~50% of results will be at 1msec or below, with the remaining
50% coming from the auto-generated value records covering the missing increments spread between 10msec and 100 sec.
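The arithmetic of this scenario can be checked with a short standalone sketch. It does not call the library; ExpectedIntervalSketch and its methods are hypothetical names. It also assumes that back-filling continues while the stepped-down value is at least the expected interval, an assumption chosen to match the 10 msec lower bound of the spread described above.

```java
// Hypothetical sketch of at-recording-time correction; not the library's code.
final class ExpectedIntervalSketch {
    // Returns the measured value followed by the back-filled values, which step
    // down by expectedInterval (assumption: a value equal to the interval is kept).
    static long[] valuesToRecord(long value, long expectedInterval) {
        if (expectedInterval <= 0 || value <= expectedInterval) {
            return new long[] { value };
        }
        int backfilled = (int) (value / expectedInterval) - 1;
        long[] out = new long[backfilled + 1];
        out[0] = value;
        int i = 1;
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval) {
            out[i++] = missing;
        }
        return out;
    }

    // Scenario from the text, in msec: 10,000 samples of 1 msec, then a single
    // 100 sec (100,000 msec) sample, with an expected interval of 10 msec.
    static double shareAtOrBelowOneMsec(boolean corrected) {
        long atOrBelowOneMsec = 10_000;
        long total = 10_000;
        total += corrected ? valuesToRecord(100_000, 10).length : 1;
        return (double) atOrBelowOneMsec / total;
    }
}
```

Under these assumptions the single 100 sec sample expands into 10,000 recorded values (the measurement plus 9,999 back-filled ones), so the corrected share of results at 1 msec or below is exactly 50%, while the uncorrected share is 10,000 out of 10,001.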
Data sets recorded with recordValue() and with recordValueWithExpectedInterval() will differ only if at least one
value recorded was greater than its associated expectedIntervalBetweenValueSamples parameter.
Data sets recorded with recordValueWithExpectedInterval() will be identical to ones recorded with recordValue() if
all values recorded were smaller than their associated expectedIntervalBetweenValueSamples parameters.
In addition to the at-recording-time correction option, Histogram variants also provide the post-recording
correction methods copyCorrectedForCoordinatedOmission() and addWhileCorrectingForCoordinatedOmission().
These methods can be used for post-recording correction, and are useful when the
expectedIntervalBetweenValueSamples parameter is estimated to be the same for all recorded values. However, it is
important to note that only one correction method (during or post recording) should be used on a given histogram
data set, since applying both would correct the same omission twice.
When used for response time characterization, recording with the optional expectedIntervalBetweenValueSamples
parameter will tend to produce data sets that much more accurately reflect the response time distribution that a
random, uncoordinated request would have experienced.
The discussion above relates to integer value histograms (the classes based on AbstractHistogram and their related
supporting classes). HdrHistogram supports floating point value recording and reporting with a similar set of
classes, including the DoubleHistogram, ConcurrentDoubleHistogram and SynchronizedDoubleHistogram histogram
classes. Support for floating point value iteration is provided with DoubleHistogramIterationValue and related
iterator classes (DoubleLinearIterator, DoubleLogarithmicIterator, DoublePercentileIterator,
DoubleRecordedValuesIterator, DoubleAllValuesIterator). Support for interval recording is provided with
DoubleRecorder and SingleWriterDoubleRecorder.
The specific value range tracked by a DoubleHistogram (and variants) is not specified upfront. Only the dynamic
range of values that the histogram can cover is (optionally) specified. E.g. when a DoubleHistogram is created to
track a dynamic range of 3600000000000 (enough to track values from a nanosecond to an hour), values could be
recorded into it in any consistent unit of time as long as the ratio between the highest and lowest non-zero
values stays within the specified dynamic range, so recording in units of nanoseconds (1.0 thru 3600000000000.0),
milliseconds (0.000001 thru 3600000.0), seconds (0.000000001 thru 3600.0), or hours (1/3.6E12 thru 1.0) will all
work just as well.
A Histogram's memory footprint can be estimated for any given highestTrackableValue and
numberOfSignificantValueDigits combination. Beyond a relatively small fixed-size footprint used for internal fields
and stats (which can be estimated as "fixed at well less than 1KB"), the bulk of a Histogram's storage is taken up
by its data value recording counts array. The total footprint can be conservatively estimated by:
largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits);
subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution);
expectedHistogramFootprintInBytes = 512 +
({primitive type size} / 2) *
(log2RoundedUp((highestTrackableValue) / subBucketSize) + 2) *
subBucketSize
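The estimate above translates directly into a few lines of standalone Java. FootprintEstimateSketch and estimateBytes are hypothetical names used for illustration, and bytesPerCount stands for the {primitive type size} term (8 for long counts, 4 for int, 2 for short).

```java
// Hypothetical helper that evaluates the footprint formula above.
final class FootprintEstimateSketch {
    static long estimateBytes(long highestTrackableValue, int significantDigits, int bytesPerCount) {
        // largestValueWithSingleUnitResolution = 2 * (10 ^ numberOfSignificantValueDigits)
        long largestValueWithSingleUnitResolution = 2;
        for (int i = 0; i < significantDigits; i++) {
            largestValueWithSingleUnitResolution *= 10;
        }
        // subBucketSize = roundedUpToNearestPowerOf2(largestValueWithSingleUnitResolution)
        long subBucketSize = 1;
        while (subBucketSize < largestValueWithSingleUnitResolution) {
            subBucketSize <<= 1;
        }
        // log2RoundedUp(highestTrackableValue / subBucketSize)
        double ratio = (double) highestTrackableValue / subBucketSize;
        int log2RoundedUp = 0;
        while ((1L << log2RoundedUp) < ratio) {
            log2RoundedUp++;
        }
        // 512 + ({primitive type size} / 2) * (log2RoundedUp + 2) * subBucketSize
        return 512 + (bytesPerCount * (long) (log2RoundedUp + 2) * subBucketSize) / 2;
    }
}
```

For the earlier latency example (a highestTrackableValue of 3,600,000,000, 3 significant digits, long counts), this evaluates to 188,928 bytes, in line with the "around 185KB" figure quoted for that configuration.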
A conservative (high) estimate of a Histogram's footprint in bytes is available via the
getEstimatedFootprintInBytes() method.

Interface | Description |
---|---|
DoubleValueRecorder | |
HistogramLogScanner.EncodableHistogramSupplier | |
HistogramLogScanner.EventHandler | Handles log events, return true to stop processing. |
ValueRecorder | |
Class | Description |
---|---|
AbstractHistogram | An abstract base class for integer values High Dynamic Range (HDR) Histograms |
AllValuesIterator | Used for iterating through histogram values using the finest granularity steps supported by the underlying representation. |
AtomicHistogram | A High Dynamic Range (HDR) Histogram using atomic long count type |
Base64Helper | Base64Helper exists to bridge inconsistencies in Java SE support of Base64 encoding and decoding. |
ConcurrentDoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram that supports safe concurrent recording operations. |
ConcurrentHistogram | An integer values High Dynamic Range (HDR) Histogram that supports safe concurrent recording operations. |
DoubleAllValuesIterator | Used for iterating through DoubleHistogram values using the finest granularity steps supported by the underlying representation. |
DoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram |
DoubleHistogramIterationValue | Represents a value point iterated through in a DoubleHistogram, with associated stats. |
DoubleLinearIterator | Used for iterating through DoubleHistogram values in linear steps. |
DoubleLogarithmicIterator | Used for iterating through DoubleHistogram values in logarithmically increasing levels. |
DoublePercentileIterator | Used for iterating through DoubleHistogram values according to percentile levels. |
DoubleRecordedValuesIterator | Used for iterating through DoubleHistogram values using the finest granularity steps supported by the underlying representation. |
DoubleRecorder | Records floating point (double) values, and provides stable interval DoubleHistogram samples from live recorded data without interrupting or stalling active recording of values. |
EncodableHistogram | A base class for all encodable (and decodable) histogram classes. |
Histogram | A High Dynamic Range (HDR) Histogram |
HistogramIterationValue | Represents a value point iterated through in a Histogram, with associated stats. |
HistogramLogProcessor | HistogramLogProcessor will process an input log and [can] generate two separate log files from a single histogram log file: a sequential interval log file and a histogram percentile distribution log file. |
HistogramLogReader | A histogram log reader. |
HistogramLogScanner | |
HistogramLogWriter | A histogram log writer. |
IntCountsHistogram | A High Dynamic Range (HDR) Histogram using an int count type |
LinearIterator | Used for iterating through histogram values in linear steps. |
LogarithmicIterator | Used for iterating through histogram values in logarithmically increasing levels. |
PercentileIterator | Used for iterating through histogram values according to percentile levels. |
RecordedValuesIterator | Used for iterating through all recorded histogram values using the finest granularity steps supported by the underlying representation. |
Recorder | Records integer values, and provides stable interval Histogram samples from live recorded data without interrupting or stalling active recording of values. |
ShortCountsHistogram | A High Dynamic Range (HDR) Histogram using a short count type |
SingleWriterDoubleRecorder | Records floating point values, and provides stable interval DoubleHistogram samples from live recorded data without interrupting or stalling active recording of values. |
SingleWriterRecorder | Records integer values, and provides stable interval Histogram samples from live recorded data without interrupting or stalling active recording of values. |
SynchronizedDoubleHistogram | A floating point values High Dynamic Range (HDR) Histogram that is synchronized as a whole |
SynchronizedHistogram | An integer values High Dynamic Range (HDR) Histogram that is synchronized as a whole |
WriterReaderPhaser | WriterReaderPhaser provides an asymmetric means for synchronizing the execution of wait-free "writer" critical sections against a "reader phase flip" that needs to make sure no writer critical sections that were active at the beginning of the flip are still active after the flip is done. |
Copyright © 2019. All rights reserved.