Notes on the Random-Random Term in Autocorrelations

The following discussion is adapted from this notebook by Lehman Garrison.

\[\newcommand{\RR}{\mathrm{RR}} \newcommand{\DD}{\mathrm{DD}}\]

When computing a two-point correlation function estimator like

\[\xi(r) = \frac{\DD}{\RR} - 1,\]

the \(\RR\) term can be computed analytically if the domain is a periodic box. Often, this is done as

\[\begin{aligned} \RR_i &= N V_i \bar\rho \\ &= N V_i \frac{N}{L^3}, \end{aligned}\]

where \(\RR_i\) is the expected number of random-random pairs in bin \(i\), \(N\) is the total number of points, \(V_i\) is the volume (or area if 2D) of bin \(i\), \(L\) is the box size, and \(\bar\rho\) is the average density in the box.

However, using \(\bar\rho = \frac{N}{L^3}\) is only correct for continuous fields, not sets of particles. When sitting on a particle, only \(N-1\) particles are available to be in a bin at some non-zero distance. The remaining particle is the particle you’re sitting on, which is always at distance \(0\). Thus, the correct expression is

\[\RR_i = N V_i \frac{N-1}{L^3}.\]
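As a minimal sketch of the corrected expression, the function below evaluates \(\RR_i\) for spherical-shell bins in a 3D periodic box. The function name `analytic_rr` and its arguments are illustrative and not part of Corrfunc; the shell volume \(V_i = \frac{4}{3}\pi(r_{\mathrm{hi}}^3 - r_{\mathrm{lo}}^3)\) assumes 3D radial bins.

```python
import math

def analytic_rr(n_points, box_size, bin_edges):
    """Expected random-random pair counts per spherical-shell bin (3D periodic box).

    Hypothetical helper for illustration; not a Corrfunc function.
    """
    # Density of the *other* particles seen from any one particle: N-1, not N.
    density_of_others = (n_points - 1) / box_size**3
    rr = []
    for r_lo, r_hi in zip(bin_edges[:-1], bin_edges[1:]):
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)  # V_i
        rr.append(n_points * shell_volume * density_of_others)
    return rr

rr = analytic_rr(n_points=1000, box_size=100.0, bin_edges=[1.0, 2.0, 4.0])
```

The result counts ordered pairs, so each pair of distinct particles contributes twice, which matches the factor \(N(N-1)\) in the expression above.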

See this notebook for an empirical demonstration of this effect; specifically, that computing the density with \(N-1\) is correct, and that using \(N\) introduces bias of order \(\frac{1}{N}\) into the estimator. Using \(N\) inflates \(\RR_i\) by a factor of \(N/(N-1)\), so the estimator returns \(\xi - (1+\xi)/N\) instead of \(\xi\). This is a tiny correction for large-\(N\) problems, but it matters for small \(N\).
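The effect can also be checked with a small Monte Carlo experiment. The sketch below is not the notebook referred to above; it is an independent, pure-Python illustration with hypothetical names. It scatters a few points uniformly in a periodic unit box many times, counts ordered pairs in one radial bin by brute force, and compares the average count with the two candidate expressions for \(\RR_i\). The bin's outer radius is kept below half the box size so that the minimum-image separation is unambiguous.

```python
import math
import random

def count_pairs_periodic(points, box_size, r_lo, r_hi):
    """Count ordered pairs (j != k) with r_lo <= separation < r_hi in a periodic box."""
    count = 0
    for j, p in enumerate(points):
        for k, q in enumerate(points):
            if j == k:
                continue  # skip the self-pair at zero separation
            d2 = 0.0
            for a, b in zip(p, q):
                d = abs(a - b)
                d = min(d, box_size - d)  # minimum-image convention
                d2 += d * d
            if r_lo**2 <= d2 < r_hi**2:
                count += 1
    return count

rng = random.Random(42)
n_points, box_size, r_lo, r_hi = 5, 1.0, 0.1, 0.4  # r_hi < box_size / 2
n_trials = 2000

total = 0
for _ in range(n_trials):
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]
    total += count_pairs_periodic(points, box_size, r_lo, r_hi)
mean_dd = total / n_trials  # average pair count for unclustered points

shell_volume = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)
rr_with_n_minus_1 = n_points * shell_volume * (n_points - 1) / box_size**3
rr_with_n = n_points * shell_volume * n_points / box_size**3

print(mean_dd, rr_with_n_minus_1, rr_with_n)
```

A deliberately small \(N\) is used so that the factor \(N/(N-1) = 1.25\) between the two expressions is easy to distinguish from the sampling noise.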

Any Corrfunc function that returns a clustering statistic (not just raw pair counts) implements this correction. Currently, this includes Corrfunc.theory.xi and Corrfunc.theory.wp.

Cross-correlations of two different particle sets don’t suffer from this problem; the particle you’re sitting on is never part of the set of particles under consideration for pair-making.

Corrfunc also allows bins of zero separation, in which “self-pairs” are included in the pair counting. \(\RR_i\) must reflect this by simply adding \(N\) to any such bin.
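A minimal sketch of this adjustment follows. The helper name `add_self_pairs` is hypothetical, and the sketch assumes that a bin includes zero separation exactly when its lower edge is at or below zero; how Corrfunc identifies such bins internally is not shown here.

```python
def add_self_pairs(rr, bin_edges, n_points):
    """Return a copy of rr with N added to every bin that contains zero separation.

    Hypothetical helper for illustration; not a Corrfunc function.
    """
    corrected = list(rr)
    for i, r_lo in enumerate(bin_edges[:-1]):
        if r_lo <= 0.0:  # assumption: the bin starts at r = 0, so it holds the N self-pairs
            corrected[i] += n_points
    return corrected

rr_without_self = [4.0, 28.0]  # illustrative RR values for two bins
rr_with_self = add_self_pairs(rr_without_self, bin_edges=[0.0, 1.0, 2.0], n_points=100)
```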

RR in Weighted Clustering Statistics

We can extend the above discussion to weighted correlation functions in which each particle is assigned a weight, and the pair weight is taken as the product of the particle weights (see Computing Weighted Correlation Functions).

Let \(w_j\) be the weight of particle \(j\), and \(W = \sum_{j=1}^N w_j\) be the sum of the weights. We will define the “unclustered” particle distribution to be the case of \(N\) particles uniformly distributed, where each is assigned the mean weight \(\bar w = W/N\). We thus have

\[\begin{aligned} \RR_i &= \sum_{j=1}^N \bar w (W - \bar w) \frac{V_i}{L^3} \\ &= N \bar w (W - \bar w) \frac{V_i}{L^3} \\ &= (W^2 - \bar w W) \frac{V_i}{L^3} \\ &= W^2\left(1 - \frac{1}{N}\right) \frac{V_i}{L^3}. \end{aligned}\]

When the particles all have \(w_j = 1\), then \(W = N\) and we recover the unweighted result from above.
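The weighted expression can be sketched in the same way. The function below is an illustration with hypothetical names, not Corrfunc's implementation, and again assumes spherical-shell bins in a 3D periodic box.

```python
import math

def weighted_rr_mean_weight(weights, box_size, bin_edges):
    """Weighted RR per bin when every particle carries the mean weight.

    Hypothetical helper for illustration; not a Corrfunc function.
    """
    n_points = len(weights)
    total_weight = sum(weights)  # W
    # W^2 (1 - 1/N) / L^3, common to all bins
    prefactor = total_weight**2 * (1.0 - 1.0 / n_points) / box_size**3
    rr = []
    for r_lo, r_hi in zip(bin_edges[:-1], bin_edges[1:]):
        shell_volume = 4.0 / 3.0 * math.pi * (r_hi**3 - r_lo**3)  # V_i
        rr.append(prefactor * shell_volume)
    return rr

# With unit weights, W = N and the result reduces to N (N-1) V_i / L^3.
rr_unit = weighted_rr_mean_weight([1.0] * 50, box_size=10.0, bin_edges=[0.5, 1.0])
```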

There are other ways to define the unclustered distribution. If we were to redistribute the particles uniformly but preserve their individual weights, we would find

\[\begin{aligned} \RR_i &= \sum_{j=1}^N w_j (W - w_j) \frac{V_i}{L^3} \\ &= \left(W^2 - \sum_{j=1}^N w_j^2\right) \frac{V_i}{L^3}. \end{aligned}\]
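To make the difference between the two definitions concrete, the sketch below computes only the weight-dependent factor that multiplies \(V_i / L^3\) in each case. Both function names are hypothetical and serve only as illustration.

```python
def pair_weight_factor_mean(weights):
    """W^2 (1 - 1/N): every particle is assigned the mean weight."""
    n_points = len(weights)
    total_weight = sum(weights)
    return total_weight**2 * (1.0 - 1.0 / n_points)

def pair_weight_factor_preserved(weights):
    """W^2 - sum_j w_j^2: particles keep their individual weights."""
    total_weight = sum(weights)
    return total_weight**2 - sum(w * w for w in weights)

weights = [1.0, 2.0, 3.0]
factor_mean = pair_weight_factor_mean(weights)            # 36 * (2/3)
factor_preserved = pair_weight_factor_preserved(weights)  # 36 - 14
```

The two factors agree when all weights are equal. Otherwise the preserved-weight factor is the smaller of the two, because \(\sum_j w_j^2 \ge W^2/N\) by the Cauchy-Schwarz inequality.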

This is not what we use in Corrfunc, but this should help illuminate some of the considerations that go into defining the “unclustered” case when writing a custom weight function (see Implementing Custom Weight Functions).