Computing Weighted Correlation Functions¶
Every clustering statistic in Corrfunc
accepts an array
of weights that can be used to compute weighted correlation
functions. The API reference for each clustering statistic
(Corrfunc.theory.xi
, Corrfunc.mocks.DDrppi_mocks
,
etc.) contains examples of how to do this. The interface is standard across functions: the
inputs are a weights
array and a weight_type
string
that specifies how to use the “point weights” to compute a “pair weight”.
Currently, the only supported weight_type
is pair_product
,
in which the pair weight is the product of the point weights
(but see Implementing Custom Weight Functions for how to write your own
function).
Warning
The computation of the weighted result is susceptible to loss of floating
point precision, especially in single precision. If you are using single
precision, make sure you test double precision as well (by casting all
pos and weight input arrays to type np.float64
, for example)
and check that the difference with the single-precision result
is acceptable.
If weight_type
and weights
(or weights1
and weights2
for cross-correlations) are given, the mean pair weight in a
separation bin will be given in the weightavg
field of the
output. This field is 0.0 if weights are disabled.
Pair counts (i.e. the npairs
field in the results
array)
are never affected by weights. For theory functions like
Corrfunc.theory.xi
and Corrfunc.theory.wp
that actually return a clustering statistic, the statistic is weighted.
For pair_product
, the distribution used to compute the
expected bin weight from an unclustered particle set (the RR
term)
is taken to be a spatially uniform particle set where every particle
has the mean weight. See RR in Weighted Clustering Statistics for more discussion.
Running with weights incurrs a modest performance hit (around
20%, similar to enabling ravg
). Weights are supported for
all instruction sets (SSE, AVX, and fallback).
Consider the following simple example adapted from the Corrfunc.theory.xi
docstring, in which we assign a weight of 0.5 to every particle and get
the expected average pair weight of 0.25 (last column of the output).
Note that xi
(fourth column) is also weighted, but the case of uniform
weights is equivalent to the unweighted case.
>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.xi import xi
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
... "../theory/tests/", "bins")
>>> N = 100000
>>> boxsize = 420.0
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.full_like(X, 0.5)
>>> results = xi(boxsize, nthreads, binfile, X, Y, Z, weights=weights, weight_type='pair_product', output_ravg=True)
>>> for r in results: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.6f} {4:10d} {5:10.6f}"
... .format(r['rmin'], r['rmax'],
... r['ravg'], r['xi'], r['npairs'], r['weightavg']))
...
0.167536 0.238755 0.226592 -0.205733 4 0.250000
0.238755 0.340251 0.289277 -0.176729 12 0.250000
0.340251 0.484892 0.426819 -0.051829 40 0.250000
0.484892 0.691021 0.596187 -0.131853 106 0.250000
0.691021 0.984777 0.850100 -0.049207 336 0.250000
0.984777 1.403410 1.225112 0.028543 1052 0.250000
1.403410 2.000000 1.737153 0.011403 2994 0.250000
2.000000 2.850200 2.474588 0.005405 8614 0.250000
2.850200 4.061840 3.532018 -0.014098 24448 0.250000
4.061840 5.788530 5.022241 -0.010784 70996 0.250000
5.788530 8.249250 7.160648 -0.001588 207392 0.250000
8.249250 11.756000 10.207213 -0.000323 601002 0.250000
11.756000 16.753600 14.541171 0.000007 1740084 0.250000
16.753600 23.875500 20.728773 -0.001595 5028058 0.250000