Corrfunc.theory package

Wrapper for all clustering statistic calculations on galaxies in a simulation volume.

Corrfunc.theory.DD(autocorr, nthreads, binfile, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_ravg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 3-D pair-counts corresponding to the real-space correlation function, \(\xi(r)\).

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

This module only returns pair counts and not the actual correlation function \(\xi(r)\). See Corrfunc.utils.convert_3d_counts_to_cf for computing for computing \(\xi(r)\) from the pair counts returned.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the r bins for DD. The file should contain white-space separated values of (rmin, rmax) for each r wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of r values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • X1/Y1/Z1 (array_like, real (float/double)) – The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
  • weights1 (array_like, real (float/double), optional) – A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field. If only one of weights1 and weights2 is specified, the other will be set to uniform weights.
  • periodic (boolean) – Boolean flag to indicate periodic boundary conditions.
  • X2/Y2/Z2 (array-like, real (float/double)) – Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
  • weights2 (array-like, real (float/double), optional) – Same as weights1, but for the second set of positions
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • boxsize (double) – The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
  • output_ravg (boolean (default false)) –

    Boolean flag to output the average r for each bin. Code will run slower if you set this flag.

    Note: If you are calculating in single-precision, ravg will suffer from numerical loss of precision and can not be trusted. If you need accurate ravg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rmin, rmax, ravg, npairs, weightavg] for each radial bin specified in the binfile. If output_ravg is not set, then ravg will be set to 0.0 for all bins; similarly for weightavg. npairs contains the number of pairs in that bin and can be used to compute the actual \(\xi(r)\) by combining with (DR, RR) counts.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DD import DD
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DD(autocorr, nthreads, binfile, X, Y, Z, weights1=weights, weight_type='pair_product', output_ravg=True)
>>> for r in results: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10d} {4:10.6f}".
...                         format(r['rmin'], r['rmax'], r['ravg'],
...                         r['npairs'], r['weightavg'])) 
  0.167536   0.238755   0.000000          0   0.000000
  0.238755   0.340251   0.000000          0   0.000000
  0.340251   0.484892   0.000000          0   0.000000
  0.484892   0.691021   0.000000          0   0.000000
  0.691021   0.984777   0.945372          2   1.000000
  0.984777   1.403410   1.340525         10   1.000000
  1.403410   2.000000   1.732968         36   1.000000
  2.000000   2.850200   2.558878         54   1.000000
  2.850200   4.061840   3.564959        208   1.000000
  4.061840   5.788530   4.999278        674   1.000000
  5.788530   8.249250   7.126673       2154   1.000000
  8.249250  11.756000  10.201834       5996   1.000000
 11.756000  16.753600  14.517830      17746   1.000000
 16.753600  23.875500  20.716017      50252   1.000000
Corrfunc.theory.DDrppi(autocorr, nthreads, pimax, binfile, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_rpavg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 3-D pair-counts corresponding to the real-space correlation function, \(\xi(r_p, \pi)\) or \(\wp(r_p)\). Pairs which are separated by less than the rp bins (specified in binfile) in the X-Y plane, and less than pimax in the Z-dimension are counted.

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

that this module only returns pair counts and not the actual correlation function \(\xi(r_p, \pi)\) or \(wp(r_p)\). See the utilities Corrfunc.utils.convert_3d_counts_to_cf and Corrfunc.utils.convert_rp_pi_counts_to_wp for computing \(\xi(r_p, \pi)\) and \(wp(r_p)\) respectively from the pair counts.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • pimax (double) –

    A double-precision value for the maximum separation along the Z-dimension.

    Distances along the :math:\pi direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the pi direction.

    Note: Only pairs with 0 <= dz < pimax are counted (no equality).

binfile: string or an list/array of floats

For string input: filename specifying the rp bins for DDrppi. The file should contain white-space separated values of (rpmin, rpmax) for each rp wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

For array-like input: A sequence of rp values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

X1/Y1/Z1: array-like, real (float/double)
The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
weights1: array_like, real (float/double), optional
A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field. If only one of weights1 and weights2 is specified, the other will be set to uniform weights.
X2/Y2/Z2: array-like, real (float/double)
Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
weights2: array-like, real (float/double), optional
Same as weights1, but for the second set of positions
periodic: boolean
Boolean flag to indicate periodic boundary conditions.
verbose: boolean (default false)
Boolean flag to control output of informational messages
boxsize: double
The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
output_rpavg: boolean (default false)

Boolean flag to output the average rp for each bin. Code will run slower if you set this flag.

Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rpmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rpmin, rpmax, rpavg, pimax, npairs, weightavg] for each radial bin specified in the binfile. If output_rpavg is not set, then rpavg will be set to 0.0 for all bins; similarly for weightavg. npairs contains the number of pairs in that bin and can be used to compute \(\xi(r_p, \pi)\) by combining with (DR, RR) counts.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DDrppi import DDrppi
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> pimax = 40.0
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DDrppi(autocorr, nthreads, pimax, binfile,
...                  X, Y, Z, weights1=weights, weight_type='pair_product', output_rpavg=True)
>>> for r in results[519:]: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.1f}"
...                               " {4:10d} {5:10.6f}".format(r['rmin'], r['rmax'],
...                               r['rpavg'], r['pimax'], r['npairs'], r['weightavg']))
...                         
 11.756000  16.753600  14.379250       40.0       1150   1.000000
 16.753600  23.875500  20.449131        1.0       2604   1.000000
 16.753600  23.875500  20.604834        2.0       2370   1.000000
 16.753600  23.875500  20.523989        3.0       2428   1.000000
 16.753600  23.875500  20.475181        4.0       2462   1.000000
 16.753600  23.875500  20.458005        5.0       2532   1.000000
 16.753600  23.875500  20.537162        6.0       2522   1.000000
 16.753600  23.875500  20.443087        7.0       2422   1.000000
 16.753600  23.875500  20.474580        8.0       2360   1.000000
 16.753600  23.875500  20.420360        9.0       2512   1.000000
 16.753600  23.875500  20.478355       10.0       2472   1.000000
 16.753600  23.875500  20.485268       11.0       2406   1.000000
 16.753600  23.875500  20.372985       12.0       2420   1.000000
 16.753600  23.875500  20.647998       13.0       2378   1.000000
 16.753600  23.875500  20.556208       14.0       2420   1.000000
 16.753600  23.875500  20.527992       15.0       2462   1.000000
 16.753600  23.875500  20.581017       16.0       2380   1.000000
 16.753600  23.875500  20.491819       17.0       2346   1.000000
 16.753600  23.875500  20.534440       18.0       2496   1.000000
 16.753600  23.875500  20.529129       19.0       2512   1.000000
 16.753600  23.875500  20.501946       20.0       2500   1.000000
 16.753600  23.875500  20.513349       21.0       2544   1.000000
 16.753600  23.875500  20.471915       22.0       2430   1.000000
 16.753600  23.875500  20.450651       23.0       2354   1.000000
 16.753600  23.875500  20.550753       24.0       2460   1.000000
 16.753600  23.875500  20.540262       25.0       2490   1.000000
 16.753600  23.875500  20.559572       26.0       2350   1.000000
 16.753600  23.875500  20.534245       27.0       2382   1.000000
 16.753600  23.875500  20.511302       28.0       2508   1.000000
 16.753600  23.875500  20.491632       29.0       2456   1.000000
 16.753600  23.875500  20.592493       30.0       2386   1.000000
 16.753600  23.875500  20.506234       31.0       2484   1.000000
 16.753600  23.875500  20.482109       32.0       2538   1.000000
 16.753600  23.875500  20.518463       33.0       2544   1.000000
 16.753600  23.875500  20.482515       34.0       2534   1.000000
 16.753600  23.875500  20.503124       35.0       2382   1.000000
 16.753600  23.875500  20.471307       36.0       2356   1.000000
 16.753600  23.875500  20.384231       37.0       2554   1.000000
 16.753600  23.875500  20.454012       38.0       2458   1.000000
 16.753600  23.875500  20.585543       39.0       2394   1.000000
 16.753600  23.875500  20.504965       40.0       2500   1.000000
Corrfunc.theory.wp(boxsize, pimax, nthreads, binfile, X, Y, Z, weights=None, weight_type=None, verbose=False, output_rpavg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, c_cell_timer=False, isa=u'fastest')[source]

Function to compute the projected correlation function in a periodic cosmological box. Pairs which are separated by less than the rp bins (specified in binfile) in the X-Y plane, and less than pimax in the Z-dimension are counted.

If weights are provided, the resulting correlation function is weighted. The weighting scheme depends on weight_type.

Note

Pairs are double-counted. And if rpmin is set to 0.0, then all the self-pairs (i’th particle with itself) are added to the first bin => minimum number of pairs in the first bin is the total number of particles.

Parameters:
  • boxsize (double) – A double-precision value for the boxsize of the simulation in same units as the particle positions and the rp bins.
  • pimax (double) –

    A double-precision value for the maximum separation along the Z-dimension.

    Note: Only pairs with 0 <= dz < pimax are counted (no equality).

nthreads: integer
Number of threads to use.
binfile: string or an list/array of floats

For string input: filename specifying the rp bins for wp. The file should contain white-space separated values of (rpmin, rpmax) for each rp wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

For array-like input: A sequence of rp values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

X/Y/Z: arraytype, real (float/double)

Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

weights: array_like, real (float/double), optional
A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field.
verbose: boolean (default false)
Boolean flag to control output of informational messages
output_rpavg: boolean (default false)

Boolean flag to output the average rp for each bin. Code will run slower if you set this flag.

Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rpmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
c_cell_timer : boolean (default false)
Boolean flag to measure actual time spent per cell-pair within the C libraries. A very detailed timer that stores information about the number of particles in each cell, the thread id that processed that cell-pair and the amount of time in nano-seconds taken to process that cell pair. This timer can be used to study the instruction set efficiency, and load-balancing of the code.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rpmin, rpmax, rpavg, wp, npairs, weightavg] for each radial specified in the binfile. If output_rpavg is not set then rpavg will be set to 0.0 for all bins; similarly for weightavg. wp contains the projected correlation function while npairs contains the number of unique pairs in that bin. If using weights, wp will be weighted while npairs will not be.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.
  • cell_time (list, optional) – Only returned if c_cell_timer is set. Contains detailed stats about each cell-pair visited during pair-counting, viz., number of particles in each of the cells in the pair, 1-D cell-indices for each cell in the pair, time (in nano-seconds) to process the pair and the thread-id for the thread that processed that cell-pair.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.wp import wp
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> pimax = 40.0
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> results = wp(boxsize, pimax, nthreads, binfile, X, Y, Z, weights=np.ones_like(X), weight_type='pair_product')
>>> for r in results:
...     print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.6f} {4:10d} {5:10.6f}".
...           format(r['rmin'], r['rmax'],
...           r['rpavg'], r['wp'], r['npairs'], r['weightavg']))
...           
  0.167536   0.238755   0.000000  66.717143         18   1.000000
  0.238755   0.340251   0.000000 -15.786045         16   1.000000
  0.340251   0.484892   0.000000   2.998470         42   1.000000
  0.484892   0.691021   0.000000 -15.779885         66   1.000000
  0.691021   0.984777   0.000000 -11.966728        142   1.000000
  0.984777   1.403410   0.000000  -9.699906        298   1.000000
  1.403410   2.000000   0.000000 -11.698771        588   1.000000
  2.000000   2.850200   0.000000   3.848375       1466   1.000000
  2.850200   4.061840   0.000000  -0.921452       2808   1.000000
  4.061840   5.788530   0.000000   0.454851       5802   1.000000
  5.788530   8.249250   0.000000   1.428344      11926   1.000000
  8.249250  11.756000   0.000000  -1.067885      23478   1.000000
 11.756000  16.753600   0.000000  -0.553319      47994   1.000000
 16.753600  23.875500   0.000000  -0.086433      98042   1.000000
Corrfunc.theory.xi(boxsize, nthreads, binfile, X, Y, Z, weights=None, weight_type=None, verbose=False, output_ravg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest')[source]

Function to compute the projected correlation function in a periodic cosmological box. Pairs which are separated by less than the r bins (specified in binfile) in 3-D real space.

If weights are provided, the resulting correlation function is weighted. The weighting scheme depends on weight_type.

Note

Pairs are double-counted. And if rmin is set to 0.0, then all the self-pairs (i’th particle with itself) are added to the first bin => minimum number of pairs in the first bin is the total number of particles.

Parameters:
  • boxsize (double) – A double-precision value for the boxsize of the simulation in same units as the particle positions and the r bins.
  • nthreads (integer) – Number of threads to use.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the r bins for xi. The file should contain white-space separated values of (rmin, rmax) for each r wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of r values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • X/Y/Z (arraytype, real (float/double)) –

    Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

    Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

  • weights (array_like, real (float/double), optional) – A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field.
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • output_ravg (boolean (default false)) –

    Boolean flag to output the average r for each bin. Code will run slower if you set this flag.

    Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional, Default: None.
The type of weighting to apply. One of [“pair_product”, None].
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rmin, rmax, ravg, xi, npairs, weightavg] for each radial specified in the binfile. If output_ravg is not set then ravg will be set to 0.0 for all bins; similarly for weightavg. xi contains the correlation function while npairs contains the number of pairs in that bin. If using weights, xi will be weighted while npairs will not be.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.xi import xi
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 100000
>>> boxsize = 420.0
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = xi(boxsize, nthreads, binfile, X, Y, Z, weights=weights, weight_type='pair_product', output_ravg=True)
>>> for r in results: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.6f} {4:10d} {5:10.6f}"
...                         .format(r['rmin'], r['rmax'],
...                         r['ravg'], r['xi'], r['npairs'], r['weightavg']))
...                   
  0.167536   0.238755   0.226592  -0.205733          4   1.000000
  0.238755   0.340251   0.289277  -0.176729         12   1.000000
  0.340251   0.484892   0.426819  -0.051829         40   1.000000
  0.484892   0.691021   0.596187  -0.131853        106   1.000000
  0.691021   0.984777   0.850100  -0.049207        336   1.000000
  0.984777   1.403410   1.225112   0.028543       1052   1.000000
  1.403410   2.000000   1.737153   0.011403       2994   1.000000
  2.000000   2.850200   2.474588   0.005405       8614   1.000000
  2.850200   4.061840   3.532018  -0.014098      24448   1.000000
  4.061840   5.788530   5.022241  -0.010784      70996   1.000000
  5.788530   8.249250   7.160648  -0.001588     207392   1.000000
  8.249250  11.756000  10.207213  -0.000323     601002   1.000000
 11.756000  16.753600  14.541171   0.000007    1740084   1.000000
 16.753600  23.875500  20.728773  -0.001595    5028058   1.000000
Corrfunc.theory.vpf(rmax, nbins, nspheres, numpN, seed, X, Y, Z, verbose=False, periodic=True, boxsize=0.0, xbin_refine_factor=1, ybin_refine_factor=1, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest')[source]

Function to compute the counts-in-cells on 3-D real-space points.

Returns a numpy structured array containing the probability of a sphere of radius up to rmax containing [0, numpN-1] galaxies.

Parameters:
  • rmax (double) – Maximum radius of the sphere to place on the particles
  • nbins (integer) – Number of bins in the counts-in-cells. Radius of first shell is rmax/nbins
  • nspheres (integer (>= 0)) – Number of random spheres to place within the particle distribution. For a small number of spheres, the error is larger in the measured pN’s.
  • numpN (integer (>= 1)) –

    Governs how many unique pN’s are to returned. If numpN is set to 1, then only the vpf (p0) is returned. For numpN=2, p0 and p1 are returned.

    More explicitly, the columns in the results look like the following:

    numpN Columns in output
    1 p0
    2 p0 p1
    3 p0 p1 p2
    4 p0 p1 p2 p3

    and so on…

    Note: p0 is the vpf

seed: unsigned integer
Random number seed for the underlying GSL random number generator. Used to draw centers of the spheres.
X/Y/Z: arraytype, real (float/double)

Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

verbose: boolean (default false)
Boolean flag to control output of informational messages
periodic: boolean
Boolean flag to indicate periodic boundary conditions.
boxsize: double
The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
(xyz)bin_refine_factor: integer, default is (1,1,1); typically within [1-3]

Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.

Note: Since the counts in spheres calculation is symmetric in all 3 dimensions, the defaults are different from the clustering routines.

max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

Returns:results – A numpy structured array containing [rmax, pN[numpN]] with nbins elements. Each row contains the maximum radius of the sphere and the numpN elements in the pN array. Each element of this array contains the probability that a sphere of radius rmax contains exactly N galaxies. For example, pN[0] (p0, the void probibility function) is the probability that a sphere of radius rmax contains 0 galaxies.

if c_api_timer is set, then the return value is a tuple containing (results, api_time). api_time measures only the time spent within the C library and ignores all python overhead.

Return type:Numpy structured array

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from Corrfunc.theory.vpf import vpf
>>> rmax = 10.0
>>> nbins = 10
>>> nspheres = 10000
>>> numpN = 5
>>> seed = -1
>>> N = 100000
>>> boxsize = 420.0
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> results = vpf(rmax, nbins, nspheres, numpN, seed, X, Y, Z)
>>> for r in results:
...     print("{0:10.1f} ".format(r[0]), end="")
...     
...     for pn in r[1]:
...         print("{0:10.3f} ".format(pn), end="")
...         
...     print("") 
1.0      0.995      0.005      0.000      0.000      0.000
2.0      0.956      0.044      0.001      0.000      0.000
3.0      0.858      0.130      0.012      0.001      0.000
4.0      0.695      0.252      0.047      0.005      0.001
5.0      0.493      0.347      0.127      0.028      0.005
6.0      0.295      0.362      0.219      0.091      0.026
7.0      0.141      0.285      0.265      0.179      0.085
8.0      0.056      0.159      0.228      0.229      0.161
9.0      0.019      0.066      0.135      0.192      0.192
10.0      0.003      0.019      0.054      0.106      0.150
Corrfunc.theory.DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_savg=False, fast_divide_and_NR_steps=0, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 2-D pair-counts corresponding to the redshift-space correlation function, \(\xi(s, \mu)\) Pairs which are separated by less than the s bins (specified in binfile) in 3-D, and less than s*mu_max in the Z-dimension are counted.

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

This module only returns pair counts and not the actual correlation function \(\xi(s, \mu)\). See the utilities Corrfunc.utils.convert_3d_counts_to_cf for computing \(\xi(s, \mu)\) from the pair counts.

New in version 2.1.0.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the s bins for DDsmu_mocks. The file should contain white-space separated values of (smin, smax) specifying each s bin wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of s values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • mu_max (double. Must be in range (0.0, 1.0]) –

    A double-precision value for the maximum cosine of the angular separation from the line of sight (LOS). Here, LOS is taken to be along the Z direction.

    Note: Only pairs with \(0 <= \cos(\theta_{LOS}) < \mu_{max}\) are counted (no equality).

  • nmu_bins (int) – The number of linear mu bins, with the bins ranging from from (0, \(\mu_{max}\))
  • X1/Y1/Z1 (array-like, real (float/double)) – The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
  • weights1 (array-like, real (float/double), shape (n_particles,) or (n_weights_per_particle,n_particles), optional) – Weights for computing a weighted pair count.
  • weight_type (str, optional) – The type of pair weighting to apply. Options: “pair_product”, None; Default: None.
  • periodic (boolean) – Boolean flag to indicate periodic boundary conditions.
  • X2/Y2/Z2 (array-like, real (float/double)) – Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
  • weights2 (array-like, real (float/double), shape (n_particles,) or (n_weights_per_particle,n_particles), optional) – Weights for computing a weighted pair count.
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • boxsize (double) – The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
  • output_savg (boolean (default false)) – Boolean flag to output the average s for each bin. Code will run slower if you set this flag. Also, note, if you are calculating in single-precision, s will suffer from numerical loss of precision and can not be trusted. If you need accurate s values, then pass in double precision arrays for the particle positions.
  • fast_divide_and_NR_steps (integer (default 0)) – Replaces the division in AVX implementation with an approximate reciprocal, followed by fast_divide_and_NR_steps of Newton-Raphson. Can improve runtime by ~15-20% on older computers. Value of 0 uses the standard division operation.
  • (xyz)bin_refine_factor (integer (default (2,2,1) typical values in [1-3])) – Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
  • max_cells_per_dim (integer (default 100, typical values in [50-300])) – Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
  • c_api_timer (boolean (default false)) – Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
  • isa (integer (default -1)) –

    Controls the runtime dispatch for the instruction set to use. Possible options are: [-1, AVX, SSE42, FALLBACK]

    Setting isa to -1 will pick the fastest available instruction set on the current computer. However, if you set isa to, say, AVX and AVX is not available on the computer, then the code will revert to using FALLBACK (even though SSE42 might be available).

    Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the integer values correspond to the enum for the instruction set defined in utils/defs.h.

Returns:

  • results (A python list) – A python list containing nmu_bins of [smin, smax, savg, mu_max, npairs, weightavg] for each spatial bin specified in the binfile. There will be a total of nmu_bins ranging from [0, mu_max) per spatial bin. If output_savg is not set, then savg will be set to 0.0 for all bins; similarly for weight_avg. npairs contains the number of pairs in that bin.
  • time (if c_api_timer is set, then the return value contains the time spent) – in the API; otherwise time is set to 0.0

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DDsmu import DDsmu
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> mu_max = 1.0
>>> seed = 42
>>> nmu_bins = 10
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins,
...                  X, Y, Z, weights1=weights, weight_type='pair_product', output_savg=True)
>>> for r in results[100:]: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.1f}"
...                               " {4:10d} {5:10.6f}".format(r['smin'], r['smax'],
...                               r['savg'], r['mu_max'], r['npairs'], r['weightavg']))
...                         
 5.788530   8.249250   7.148213        0.1        230   1.000000
 5.788530   8.249250   7.157218        0.2        236   1.000000
 5.788530   8.249250   7.165338        0.3        208   1.000000
 5.788530   8.249250   7.079905        0.4        252   1.000000
 5.788530   8.249250   7.251661        0.5        184   1.000000
 5.788530   8.249250   7.118536        0.6        222   1.000000
 5.788530   8.249250   7.083466        0.7        238   1.000000
 5.788530   8.249250   7.198184        0.8        170   1.000000
 5.788530   8.249250   7.127409        0.9        208   1.000000
 5.788530   8.249250   6.973090        1.0        206   1.000000
 8.249250  11.756000  10.149183        0.1        592   1.000000
 8.249250  11.756000  10.213009        0.2        634   1.000000
 8.249250  11.756000  10.192220        0.3        532   1.000000
 8.249250  11.756000  10.246931        0.4        544   1.000000
 8.249250  11.756000  10.102675        0.5        530   1.000000
 8.249250  11.756000  10.276180        0.6        644   1.000000
 8.249250  11.756000  10.251264        0.7        666   1.000000
 8.249250  11.756000  10.138399        0.8        680   1.000000
 8.249250  11.756000  10.191916        0.9        566   1.000000
 8.249250  11.756000  10.243229        1.0        608   1.000000
11.756000  16.753600  14.552776        0.1       1734   1.000000
11.756000  16.753600  14.579991        0.2       1806   1.000000
11.756000  16.753600  14.599611        0.3       1802   1.000000
11.756000  16.753600  14.471100        0.4       1820   1.000000
11.756000  16.753600  14.480192        0.5       1740   1.000000
11.756000  16.753600  14.493679        0.6       1746   1.000000
11.756000  16.753600  14.547713        0.7       1722   1.000000
11.756000  16.753600  14.465390        0.8       1750   1.000000
11.756000  16.753600  14.547465        0.9       1798   1.000000
11.756000  16.753600  14.440975        1.0       1828   1.000000
16.753600  23.875500  20.720406        0.1       5094   1.000000
16.753600  23.875500  20.735403        0.2       5004   1.000000
16.753600  23.875500  20.721069        0.3       5172   1.000000
16.753600  23.875500  20.723648        0.4       5014   1.000000
16.753600  23.875500  20.650621        0.5       5094   1.000000
16.753600  23.875500  20.688135        0.6       5076   1.000000
16.753600  23.875500  20.735691        0.7       4910   1.000000
16.753600  23.875500  20.714097        0.8       4864   1.000000
16.753600  23.875500  20.751836        0.9       4954   1.000000
16.753600  23.875500  20.721183        1.0       5070   1.000000

Submodules

Corrfunc.theory.DD module

Python wrapper around the C extension for the pair counter in theory/DD/. This wrapper is in Corrfunc.theory.DD

Corrfunc.theory.DD.DD(autocorr, nthreads, binfile, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_ravg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 3-D pair-counts corresponding to the real-space correlation function, \(\xi(r)\).

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

This module only returns pair counts and not the actual correlation function \(\xi(r)\). See Corrfunc.utils.convert_3d_counts_to_cf for computing for computing \(\xi(r)\) from the pair counts returned.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the r bins for DD. The file should contain white-space separated values of (rmin, rmax) for each r wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of r values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • X1/Y1/Z1 (array_like, real (float/double)) – The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
  • weights1 (array_like, real (float/double), optional) – A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field. If only one of weights1 and weights2 is specified, the other will be set to uniform weights.
  • periodic (boolean) – Boolean flag to indicate periodic boundary conditions.
  • X2/Y2/Z2 (array-like, real (float/double)) – Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
  • weights2 (array-like, real (float/double), optional) – Same as weights1, but for the second set of positions
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • boxsize (double) – The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
  • output_ravg (boolean (default false)) –

    Boolean flag to output the average r for each bin. Code will run slower if you set this flag.

    Note: If you are calculating in single-precision, ravg will suffer from numerical loss of precision and can not be trusted. If you need accurate ravg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rmin, rmax, ravg, npairs, weightavg] for each radial bin specified in the binfile. If output_ravg is not set, then ravg will be set to 0.0 for all bins; similarly for weightavg. npairs contains the number of pairs in that bin and can be used to compute the actual \(\xi(r)\) by combining with (DR, RR) counts.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DD import DD
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DD(autocorr, nthreads, binfile, X, Y, Z, weights1=weights, weight_type='pair_product', output_ravg=True)
>>> for r in results: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10d} {4:10.6f}".
...                         format(r['rmin'], r['rmax'], r['ravg'],
...                         r['npairs'], r['weightavg'])) 
  0.167536   0.238755   0.000000          0   0.000000
  0.238755   0.340251   0.000000          0   0.000000
  0.340251   0.484892   0.000000          0   0.000000
  0.484892   0.691021   0.000000          0   0.000000
  0.691021   0.984777   0.945372          2   1.000000
  0.984777   1.403410   1.340525         10   1.000000
  1.403410   2.000000   1.732968         36   1.000000
  2.000000   2.850200   2.558878         54   1.000000
  2.850200   4.061840   3.564959        208   1.000000
  4.061840   5.788530   4.999278        674   1.000000
  5.788530   8.249250   7.126673       2154   1.000000
  8.249250  11.756000  10.201834       5996   1.000000
 11.756000  16.753600  14.517830      17746   1.000000
 16.753600  23.875500  20.716017      50252   1.000000

Corrfunc.theory.DDrppi module

Python wrapper around the C extension for the pair counter in theory/DDrppi/. This wrapper is in Corrfunc.theory.DDrppi

Corrfunc.theory.DDrppi.DDrppi(autocorr, nthreads, pimax, binfile, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_rpavg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 3-D pair-counts corresponding to the real-space correlation function, \(\xi(r_p, \pi)\) or \(\wp(r_p)\). Pairs which are separated by less than the rp bins (specified in binfile) in the X-Y plane, and less than pimax in the Z-dimension are counted.

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

that this module only returns pair counts and not the actual correlation function \(\xi(r_p, \pi)\) or \(wp(r_p)\). See the utilities Corrfunc.utils.convert_3d_counts_to_cf and Corrfunc.utils.convert_rp_pi_counts_to_wp for computing \(\xi(r_p, \pi)\) and \(wp(r_p)\) respectively from the pair counts.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • pimax (double) –

    A double-precision value for the maximum separation along the Z-dimension.

    Distances along the :math:\pi direction are binned with unit depth. For instance, if pimax=40, then 40 bins will be created along the pi direction.

    Note: Only pairs with 0 <= dz < pimax are counted (no equality).

binfile: string or an list/array of floats

For string input: filename specifying the rp bins for DDrppi. The file should contain white-space separated values of (rpmin, rpmax) for each rp wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

For array-like input: A sequence of rp values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

X1/Y1/Z1: array-like, real (float/double)
The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
weights1: array_like, real (float/double), optional
A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field. If only one of weights1 and weights2 is specified, the other will be set to uniform weights.
X2/Y2/Z2: array-like, real (float/double)
Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
weights2: array-like, real (float/double), optional
Same as weights1, but for the second set of positions
periodic: boolean
Boolean flag to indicate periodic boundary conditions.
verbose: boolean (default false)
Boolean flag to control output of informational messages
boxsize: double
The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
output_rpavg: boolean (default false)

Boolean flag to output the average rp for each bin. Code will run slower if you set this flag.

Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rpmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rpmin, rpmax, rpavg, pimax, npairs, weightavg] for each radial bin specified in the binfile. If output_rpavg is not set, then rpavg will be set to 0.0 for all bins; similarly for weightavg. npairs contains the number of pairs in that bin and can be used to compute \(\xi(r_p, \pi)\) by combining with (DR, RR) counts.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DDrppi import DDrppi
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> pimax = 40.0
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DDrppi(autocorr, nthreads, pimax, binfile,
...                  X, Y, Z, weights1=weights, weight_type='pair_product', output_rpavg=True)
>>> for r in results[519:]: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.1f}"
...                               " {4:10d} {5:10.6f}".format(r['rmin'], r['rmax'],
...                               r['rpavg'], r['pimax'], r['npairs'], r['weightavg']))
...                         
 11.756000  16.753600  14.379250       40.0       1150   1.000000
 16.753600  23.875500  20.449131        1.0       2604   1.000000
 16.753600  23.875500  20.604834        2.0       2370   1.000000
 16.753600  23.875500  20.523989        3.0       2428   1.000000
 16.753600  23.875500  20.475181        4.0       2462   1.000000
 16.753600  23.875500  20.458005        5.0       2532   1.000000
 16.753600  23.875500  20.537162        6.0       2522   1.000000
 16.753600  23.875500  20.443087        7.0       2422   1.000000
 16.753600  23.875500  20.474580        8.0       2360   1.000000
 16.753600  23.875500  20.420360        9.0       2512   1.000000
 16.753600  23.875500  20.478355       10.0       2472   1.000000
 16.753600  23.875500  20.485268       11.0       2406   1.000000
 16.753600  23.875500  20.372985       12.0       2420   1.000000
 16.753600  23.875500  20.647998       13.0       2378   1.000000
 16.753600  23.875500  20.556208       14.0       2420   1.000000
 16.753600  23.875500  20.527992       15.0       2462   1.000000
 16.753600  23.875500  20.581017       16.0       2380   1.000000
 16.753600  23.875500  20.491819       17.0       2346   1.000000
 16.753600  23.875500  20.534440       18.0       2496   1.000000
 16.753600  23.875500  20.529129       19.0       2512   1.000000
 16.753600  23.875500  20.501946       20.0       2500   1.000000
 16.753600  23.875500  20.513349       21.0       2544   1.000000
 16.753600  23.875500  20.471915       22.0       2430   1.000000
 16.753600  23.875500  20.450651       23.0       2354   1.000000
 16.753600  23.875500  20.550753       24.0       2460   1.000000
 16.753600  23.875500  20.540262       25.0       2490   1.000000
 16.753600  23.875500  20.559572       26.0       2350   1.000000
 16.753600  23.875500  20.534245       27.0       2382   1.000000
 16.753600  23.875500  20.511302       28.0       2508   1.000000
 16.753600  23.875500  20.491632       29.0       2456   1.000000
 16.753600  23.875500  20.592493       30.0       2386   1.000000
 16.753600  23.875500  20.506234       31.0       2484   1.000000
 16.753600  23.875500  20.482109       32.0       2538   1.000000
 16.753600  23.875500  20.518463       33.0       2544   1.000000
 16.753600  23.875500  20.482515       34.0       2534   1.000000
 16.753600  23.875500  20.503124       35.0       2382   1.000000
 16.753600  23.875500  20.471307       36.0       2356   1.000000
 16.753600  23.875500  20.384231       37.0       2554   1.000000
 16.753600  23.875500  20.454012       38.0       2458   1.000000
 16.753600  23.875500  20.585543       39.0       2394   1.000000
 16.753600  23.875500  20.504965       40.0       2500   1.000000

Corrfunc.theory.DDsmu module

Python wrapper around the C extension for the pair counter in theory/DDsmu/. This wrapper is in Corrfunc.theory.DDsmu

Corrfunc.theory.DDsmu.DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins, X1, Y1, Z1, weights1=None, periodic=True, X2=None, Y2=None, Z2=None, weights2=None, verbose=False, boxsize=0.0, output_savg=False, fast_divide_and_NR_steps=0, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest', weight_type=None)[source]

Calculate the 2-D pair-counts corresponding to the redshift-space correlation function, \(\xi(s, \mu)\) Pairs which are separated by less than the s bins (specified in binfile) in 3-D, and less than s*mu_max in the Z-dimension are counted.

If weights are provided, the resulting pair counts are weighted. The weighting scheme depends on weight_type.

Note

This module only returns pair counts and not the actual correlation function \(\xi(s, \mu)\). See the utilities Corrfunc.utils.convert_3d_counts_to_cf for computing \(\xi(s, \mu)\) from the pair counts.

New in version 2.1.0.

Parameters:
  • autocorr (boolean, required) – Boolean flag for auto/cross-correlation. If autocorr is set to 1, then the second set of particle positions are not required.
  • nthreads (integer) – The number of OpenMP threads to use. Has no effect if OpenMP was not enabled during library compilation.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the s bins for DDsmu_mocks. The file should contain white-space separated values of (smin, smax) specifying each s bin wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of s values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • mu_max (double. Must be in range (0.0, 1.0]) –

    A double-precision value for the maximum cosine of the angular separation from the line of sight (LOS). Here, LOS is taken to be along the Z direction.

    Note: Only pairs with \(0 <= \cos(\theta_{LOS}) < \mu_{max}\) are counted (no equality).

  • nmu_bins (int) – The number of linear mu bins, with the bins ranging from from (0, \(\mu_{max}\))
  • X1/Y1/Z1 (array-like, real (float/double)) – The array of X/Y/Z positions for the first set of points. Calculations are done in the precision of the supplied arrays.
  • weights1 (array-like, real (float/double), shape (n_particles,) or (n_weights_per_particle,n_particles), optional) – Weights for computing a weighted pair count.
  • weight_type (str, optional) – The type of pair weighting to apply. Options: “pair_product”, None; Default: None.
  • periodic (boolean) – Boolean flag to indicate periodic boundary conditions.
  • X2/Y2/Z2 (array-like, real (float/double)) – Array of XYZ positions for the second set of points. Must be the same precision as the X1/Y1/Z1 arrays. Only required when autocorr==0.
  • weights2 (array-like, real (float/double), shape (n_particles,) or (n_weights_per_particle,n_particles), optional) – Weights for computing a weighted pair count.
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • boxsize (double) – The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
  • output_savg (boolean (default false)) – Boolean flag to output the average s for each bin. Code will run slower if you set this flag. Also, note, if you are calculating in single-precision, s will suffer from numerical loss of precision and can not be trusted. If you need accurate s values, then pass in double precision arrays for the particle positions.
  • fast_divide_and_NR_steps (integer (default 0)) – Replaces the division in AVX implementation with an approximate reciprocal, followed by fast_divide_and_NR_steps of Newton-Raphson. Can improve runtime by ~15-20% on older computers. Value of 0 uses the standard division operation.
  • (xyz)bin_refine_factor (integer (default (2,2,1) typical values in [1-3])) – Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
  • max_cells_per_dim (integer (default 100, typical values in [50-300])) – Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
  • c_api_timer (boolean (default false)) – Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
  • isa (integer (default -1)) –

    Controls the runtime dispatch for the instruction set to use. Possible options are: [-1, AVX, SSE42, FALLBACK]

    Setting isa to -1 will pick the fastest available instruction set on the current computer. However, if you set isa to, say, AVX and AVX is not available on the computer, then the code will revert to using FALLBACK (even though SSE42 might be available).

    Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the integer values correspond to the enum for the instruction set defined in utils/defs.h.

Returns:

  • results (A python list) – A python list containing nmu_bins of [smin, smax, savg, mu_max, npairs, weightavg] for each spatial bin specified in the binfile. There will be a total of nmu_bins ranging from [0, mu_max) per spatial bin. If output_savg is not set, then savg will be set to 0.0 for all bins; similarly for weight_avg. npairs contains the number of pairs in that bin.
  • time (if c_api_timer is set, then the return value contains the time spent) – in the API; otherwise time is set to 0.0

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.DDsmu import DDsmu
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> nthreads = 4
>>> autocorr = 1
>>> mu_max = 1.0
>>> seed = 42
>>> nmu_bins = 10
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = DDsmu(autocorr, nthreads, binfile, mu_max, nmu_bins,
...                  X, Y, Z, weights1=weights, weight_type='pair_product', output_savg=True)
>>> for r in results[100:]: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.1f}"
...                               " {4:10d} {5:10.6f}".format(r['smin'], r['smax'],
...                               r['savg'], r['mu_max'], r['npairs'], r['weightavg']))
...                         
 5.788530   8.249250   7.148213        0.1        230   1.000000
 5.788530   8.249250   7.157218        0.2        236   1.000000
 5.788530   8.249250   7.165338        0.3        208   1.000000
 5.788530   8.249250   7.079905        0.4        252   1.000000
 5.788530   8.249250   7.251661        0.5        184   1.000000
 5.788530   8.249250   7.118536        0.6        222   1.000000
 5.788530   8.249250   7.083466        0.7        238   1.000000
 5.788530   8.249250   7.198184        0.8        170   1.000000
 5.788530   8.249250   7.127409        0.9        208   1.000000
 5.788530   8.249250   6.973090        1.0        206   1.000000
 8.249250  11.756000  10.149183        0.1        592   1.000000
 8.249250  11.756000  10.213009        0.2        634   1.000000
 8.249250  11.756000  10.192220        0.3        532   1.000000
 8.249250  11.756000  10.246931        0.4        544   1.000000
 8.249250  11.756000  10.102675        0.5        530   1.000000
 8.249250  11.756000  10.276180        0.6        644   1.000000
 8.249250  11.756000  10.251264        0.7        666   1.000000
 8.249250  11.756000  10.138399        0.8        680   1.000000
 8.249250  11.756000  10.191916        0.9        566   1.000000
 8.249250  11.756000  10.243229        1.0        608   1.000000
11.756000  16.753600  14.552776        0.1       1734   1.000000
11.756000  16.753600  14.579991        0.2       1806   1.000000
11.756000  16.753600  14.599611        0.3       1802   1.000000
11.756000  16.753600  14.471100        0.4       1820   1.000000
11.756000  16.753600  14.480192        0.5       1740   1.000000
11.756000  16.753600  14.493679        0.6       1746   1.000000
11.756000  16.753600  14.547713        0.7       1722   1.000000
11.756000  16.753600  14.465390        0.8       1750   1.000000
11.756000  16.753600  14.547465        0.9       1798   1.000000
11.756000  16.753600  14.440975        1.0       1828   1.000000
16.753600  23.875500  20.720406        0.1       5094   1.000000
16.753600  23.875500  20.735403        0.2       5004   1.000000
16.753600  23.875500  20.721069        0.3       5172   1.000000
16.753600  23.875500  20.723648        0.4       5014   1.000000
16.753600  23.875500  20.650621        0.5       5094   1.000000
16.753600  23.875500  20.688135        0.6       5076   1.000000
16.753600  23.875500  20.735691        0.7       4910   1.000000
16.753600  23.875500  20.714097        0.8       4864   1.000000
16.753600  23.875500  20.751836        0.9       4954   1.000000
16.753600  23.875500  20.721183        1.0       5070   1.000000

Corrfunc.theory.vpf module

Python wrapper around the C extension for the counts-in-cells for 3-D real space. Corresponding C codes are in theory/vpf while the python wrapper is in Corrfunc.theory.vpf.

Corrfunc.theory.vpf.vpf(rmax, nbins, nspheres, numpN, seed, X, Y, Z, verbose=False, periodic=True, boxsize=0.0, xbin_refine_factor=1, ybin_refine_factor=1, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest')[source]

Function to compute the counts-in-cells on 3-D real-space points.

Returns a numpy structured array containing the probability of a sphere of radius up to rmax containing [0, numpN-1] galaxies.

Parameters:
  • rmax (double) – Maximum radius of the sphere to place on the particles
  • nbins (integer) – Number of bins in the counts-in-cells. Radius of first shell is rmax/nbins
  • nspheres (integer (>= 0)) – Number of random spheres to place within the particle distribution. For a small number of spheres, the error is larger in the measured pN’s.
  • numpN (integer (>= 1)) –

    Governs how many unique pN’s are to returned. If numpN is set to 1, then only the vpf (p0) is returned. For numpN=2, p0 and p1 are returned.

    More explicitly, the columns in the results look like the following:

    numpN Columns in output
    1 p0
    2 p0 p1
    3 p0 p1 p2
    4 p0 p1 p2 p3

    and so on…

    Note: p0 is the vpf

seed: unsigned integer
Random number seed for the underlying GSL random number generator. Used to draw centers of the spheres.
X/Y/Z: arraytype, real (float/double)

Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

verbose: boolean (default false)
Boolean flag to control output of informational messages
periodic: boolean
Boolean flag to indicate periodic boundary conditions.
boxsize: double
The side-length of the cube in the cosmological simulation. Present to facilitate exact calculations for periodic wrapping. If boxsize is not supplied, then the wrapping is done based on the maximum difference within each dimension of the X/Y/Z arrays.
(xyz)bin_refine_factor: integer, default is (1,1,1); typically within [1-3]

Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.

Note: Since the counts in spheres calculation is symmetric in all 3 dimensions, the defaults are different from the clustering routines.

max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

Returns:results – A numpy structured array containing [rmax, pN[numpN]] with nbins elements. Each row contains the maximum radius of the sphere and the numpN elements in the pN array. Each element of this array contains the probability that a sphere of radius rmax contains exactly N galaxies. For example, pN[0] (p0, the void probibility function) is the probability that a sphere of radius rmax contains 0 galaxies.

if c_api_timer is set, then the return value is a tuple containing (results, api_time). api_time measures only the time spent within the C library and ignores all python overhead.

Return type:Numpy structured array

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from Corrfunc.theory.vpf import vpf
>>> rmax = 10.0
>>> nbins = 10
>>> nspheres = 10000
>>> numpN = 5
>>> seed = -1
>>> N = 100000
>>> boxsize = 420.0
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> results = vpf(rmax, nbins, nspheres, numpN, seed, X, Y, Z)
>>> for r in results:
...     print("{0:10.1f} ".format(r[0]), end="")
...     
...     for pn in r[1]:
...         print("{0:10.3f} ".format(pn), end="")
...         
...     print("") 
1.0      0.995      0.005      0.000      0.000      0.000
2.0      0.956      0.044      0.001      0.000      0.000
3.0      0.858      0.130      0.012      0.001      0.000
4.0      0.695      0.252      0.047      0.005      0.001
5.0      0.493      0.347      0.127      0.028      0.005
6.0      0.295      0.362      0.219      0.091      0.026
7.0      0.141      0.285      0.265      0.179      0.085
8.0      0.056      0.159      0.228      0.229      0.161
9.0      0.019      0.066      0.135      0.192      0.192
10.0      0.003      0.019      0.054      0.106      0.150

Corrfunc.theory.wp module

Python wrapper around the C extension for the theoretical projected auto-correlation function, wp(rp), in theory/wp. This python wrapper is in Corrfunc.theory.wp.

Corrfunc.theory.wp.wp(boxsize, pimax, nthreads, binfile, X, Y, Z, weights=None, weight_type=None, verbose=False, output_rpavg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, c_cell_timer=False, isa=u'fastest')[source]

Function to compute the projected correlation function in a periodic cosmological box. Pairs which are separated by less than the rp bins (specified in binfile) in the X-Y plane, and less than pimax in the Z-dimension are counted.

If weights are provided, the resulting correlation function is weighted. The weighting scheme depends on weight_type.

Note

Pairs are double-counted. And if rpmin is set to 0.0, then all the self-pairs (i’th particle with itself) are added to the first bin => minimum number of pairs in the first bin is the total number of particles.

Parameters:
  • boxsize (double) – A double-precision value for the boxsize of the simulation in same units as the particle positions and the rp bins.
  • pimax (double) –

    A double-precision value for the maximum separation along the Z-dimension.

    Note: Only pairs with 0 <= dz < pimax are counted (no equality).

nthreads: integer
Number of threads to use.
binfile: string or an list/array of floats

For string input: filename specifying the rp bins for wp. The file should contain white-space separated values of (rpmin, rpmax) for each rp wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

For array-like input: A sequence of rp values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

X/Y/Z: arraytype, real (float/double)

Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

weights: array_like, real (float/double), optional
A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field.
verbose: boolean (default false)
Boolean flag to control output of informational messages
output_rpavg: boolean (default false)

Boolean flag to output the average rp for each bin. Code will run slower if you set this flag.

Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rpmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
c_cell_timer : boolean (default false)
Boolean flag to measure actual time spent per cell-pair within the C libraries. A very detailed timer that stores information about the number of particles in each cell, the thread id that processed that cell-pair and the amount of time in nano-seconds taken to process that cell pair. This timer can be used to study the instruction set efficiency, and load-balancing of the code.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional
The type of weighting to apply. One of [“pair_product”, None]. Default: None.
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rpmin, rpmax, rpavg, wp, npairs, weightavg] for each radial specified in the binfile. If output_rpavg is not set then rpavg will be set to 0.0 for all bins; similarly for weightavg. wp contains the projected correlation function while npairs contains the number of unique pairs in that bin. If using weights, wp will be weighted while npairs will not be.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.
  • cell_time (list, optional) – Only returned if c_cell_timer is set. Contains detailed stats about each cell-pair visited during pair-counting, viz., number of particles in each of the cells in the pair, 1-D cell-indices for each cell in the pair, time (in nano-seconds) to process the pair and the thread-id for the thread that processed that cell-pair.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.wp import wp
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 10000
>>> boxsize = 420.0
>>> pimax = 40.0
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> results = wp(boxsize, pimax, nthreads, binfile, X, Y, Z, weights=np.ones_like(X), weight_type='pair_product')
>>> for r in results:
...     print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.6f} {4:10d} {5:10.6f}".
...           format(r['rmin'], r['rmax'],
...           r['rpavg'], r['wp'], r['npairs'], r['weightavg']))
...           
  0.167536   0.238755   0.000000  66.717143         18   1.000000
  0.238755   0.340251   0.000000 -15.786045         16   1.000000
  0.340251   0.484892   0.000000   2.998470         42   1.000000
  0.484892   0.691021   0.000000 -15.779885         66   1.000000
  0.691021   0.984777   0.000000 -11.966728        142   1.000000
  0.984777   1.403410   0.000000  -9.699906        298   1.000000
  1.403410   2.000000   0.000000 -11.698771        588   1.000000
  2.000000   2.850200   0.000000   3.848375       1466   1.000000
  2.850200   4.061840   0.000000  -0.921452       2808   1.000000
  4.061840   5.788530   0.000000   0.454851       5802   1.000000
  5.788530   8.249250   0.000000   1.428344      11926   1.000000
  8.249250  11.756000   0.000000  -1.067885      23478   1.000000
 11.756000  16.753600   0.000000  -0.553319      47994   1.000000
 16.753600  23.875500   0.000000  -0.086433      98042   1.000000
Corrfunc.theory.wp.find_fastest_wp_bin_refs(boxsize, pimax, nthreads, binfile, X, Y, Z, verbose=False, output_rpavg=False, max_cells_per_dim=100, isa=u'fastest', maxbinref=3, nrepeats=3, return_runtimes=False)[source]

Finds the combination of bin refine factors that produces the fastest computation for the given dataset and rp limits.

Parameters:
  • boxsize (double) – A double-precision value for the boxsize of the simulation in same units as the particle positions and the rp bins.
  • pimax (double) –

    A double-precision value for the maximum separation along the Z-dimension.

    Note: Only pairs with 0 <= dz < pimax are counted (no equality).

nthreads: integer
Number of threads to use.
binfile: string or an list/array of floats

For string input: filename specifying the rp bins for wp. The file should contain white-space separated values of (rpmin, rpmax) for each rp wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

For array-like input: A sequence of rp values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

X/Y/Z: arraytype, real (float/double)

Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

verbose: boolean (default false)
Boolean flag to control output of informational messages
output_rpavg: boolean (default false)

Boolean flag to output the average rp for each bin. Code will run slower if you set this flag.

Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rpmax is too small relative to the boxsize (and increasing helps the runtime).
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

maxbinref: integer (default 3)

The maximum bin refine factor to use along each dimension. From experience, values larger than 3 do not improve wp runtime.

Runtime of module scales as maxbinref^3, so change the value of maxbinref with caution.

nrepeats: integer (default 3)
Number of times to repeat the timing for an individual run. Accounts for the dispersion in runtimes on computers with multiple user processes.
return_runtimes: boolean (default false)
If set, also returns the array of runtimes.
Returns:
  • (nx, ny, nz) (tuple of integers) – The combination of bin refine factors along each dimension that produces the fastest code.
  • runtimes (numpy structured array) – if return_runtimes is set, then the return value is a tuple containing ((nx, ny, nz), runtimes). runtimes is a numpy structured array containing the fields, [nx, ny, nz, avg_runtime, sigma_time]. Here, avg_runtime is the average time, measured over nrepeats invocations, spent in the python extension. sigma_time is the dispersion of the run times across those nrepeats invocations.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.io import read_catalog
>>> from Corrfunc.theory.wp import find_fastest_wp_bin_refs
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> X, Y, Z = read_catalog(return_dtype=np.float32)
>>> boxsize = 420.0
>>> pimax = 40.0
>>> nthreads = 4
>>> verbose = 1
>>> best, _ = find_fastest_wp_bin_refs(boxsize, pimax, nthreads, binfile,
...                                    X, Y, Z, maxbinref=2, nrepeats=3,
...                                    verbose=verbose,
...                                    return_runtimes=True)
>>> print(best) 
(2, 2, 1)

Note

Since the result might change depending on the computer, doctest is skipped for this function.

Corrfunc.theory.xi module

Python wrapper around the C extension for the theoretical 3-D real-space correlation function, \(\xi(r)\). Corresponding C routines are in theory/xi/, python interface is Corrfunc.theory.xi.

Corrfunc.theory.xi.xi(boxsize, nthreads, binfile, X, Y, Z, weights=None, weight_type=None, verbose=False, output_ravg=False, xbin_refine_factor=2, ybin_refine_factor=2, zbin_refine_factor=1, max_cells_per_dim=100, c_api_timer=False, isa=u'fastest')[source]

Function to compute the projected correlation function in a periodic cosmological box. Pairs which are separated by less than the r bins (specified in binfile) in 3-D real space.

If weights are provided, the resulting correlation function is weighted. The weighting scheme depends on weight_type.

Note

Pairs are double-counted. And if rmin is set to 0.0, then all the self-pairs (i’th particle with itself) are added to the first bin => minimum number of pairs in the first bin is the total number of particles.

Parameters:
  • boxsize (double) – A double-precision value for the boxsize of the simulation in same units as the particle positions and the r bins.
  • nthreads (integer) – Number of threads to use.
  • binfile (string or an list/array of floats) –

    For string input: filename specifying the r bins for xi. The file should contain white-space separated values of (rmin, rmax) for each r wanted. The bins need to be contiguous and sorted in increasing order (smallest bins come first).

    For array-like input: A sequence of r values that provides the bin-edges. For example, np.logspace(np.log10(0.1), np.log10(10.0), 15) is a valid input specifying 14 (logarithmic) bins between 0.1 and 10.0. This array does not need to be sorted.

  • X/Y/Z (arraytype, real (float/double)) –

    Particle positions in the 3 axes. Must be within [0, boxsize] and specified in the same units as rp_bins and boxsize. All 3 arrays must be of the same floating-point type.

    Calculations will be done in the same precision as these arrays, i.e., calculations will be in floating point if XYZ are single precision arrays (C float type); or in double-precision if XYZ are double precision arrays (C double type).

  • weights (array_like, real (float/double), optional) – A scalar, or an array of weights of shape (n_weights, n_positions) or (n_positions,). weight_type specifies how these weights are used; results are returned in the weightavg field.
  • verbose (boolean (default false)) – Boolean flag to control output of informational messages
  • output_ravg (boolean (default false)) –

    Boolean flag to output the average r for each bin. Code will run slower if you set this flag.

    Note: If you are calculating in single-precision, rpavg will suffer from numerical loss of precision and can not be trusted. If you need accurate rpavg values, then pass in double precision arrays for the particle positions.

(xyz)bin_refine_factor: integer, default is (2,2,1); typically within [1-3]
Controls the refinement on the cell sizes. Can have up to a 20% impact on runtime.
max_cells_per_dim: integer, default is 100, typical values in [50-300]
Controls the maximum number of cells per dimension. Total number of cells can be up to (max_cells_per_dim)^3. Only increase if rmax is too small relative to the boxsize (and increasing helps the runtime).
c_api_timer: boolean (default false)
Boolean flag to measure actual time spent in the C libraries. Here to allow for benchmarking and scaling studies.
isa: string (default fastest)

Controls the runtime dispatch for the instruction set to use. Possible options are: [fastest, avx, sse42, fallback]

Setting isa to fastest will pick the fastest available instruction set on the current computer. However, if you set isa to, say, avx and avx is not available on the computer, then the code will revert to using fallback (even though sse42 might be available).

Unless you are benchmarking the different instruction sets, you should always leave isa to the default value. And if you are benchmarking, then the string supplied here gets translated into an enum for the instruction set defined in utils/defs.h.

weight_type: string, optional, Default: None.
The type of weighting to apply. One of [“pair_product”, None].
Returns:
  • results (Numpy structured array) – A numpy structured array containing [rmin, rmax, ravg, xi, npairs, weightavg] for each radial specified in the binfile. If output_ravg is not set then ravg will be set to 0.0 for all bins; similarly for weightavg. xi contains the correlation function while npairs contains the number of pairs in that bin. If using weights, xi will be weighted while npairs will not be.
  • api_time (float, optional) – Only returned if c_api_timer is set. api_time measures only the time spent within the C library and ignores all python overhead.

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.theory.xi import xi
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../theory/tests/", "bins")
>>> N = 100000
>>> boxsize = 420.0
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> X = np.random.uniform(0, boxsize, N)
>>> Y = np.random.uniform(0, boxsize, N)
>>> Z = np.random.uniform(0, boxsize, N)
>>> weights = np.ones_like(X)
>>> results = xi(boxsize, nthreads, binfile, X, Y, Z, weights=weights, weight_type='pair_product', output_ravg=True)
>>> for r in results: print("{0:10.6f} {1:10.6f} {2:10.6f} {3:10.6f} {4:10d} {5:10.6f}"
...                         .format(r['rmin'], r['rmax'],
...                         r['ravg'], r['xi'], r['npairs'], r['weightavg']))
...                   
  0.167536   0.238755   0.226592  -0.205733          4   1.000000
  0.238755   0.340251   0.289277  -0.176729         12   1.000000
  0.340251   0.484892   0.426819  -0.051829         40   1.000000
  0.484892   0.691021   0.596187  -0.131853        106   1.000000
  0.691021   0.984777   0.850100  -0.049207        336   1.000000
  0.984777   1.403410   1.225112   0.028543       1052   1.000000
  1.403410   2.000000   1.737153   0.011403       2994   1.000000
  2.000000   2.850200   2.474588   0.005405       8614   1.000000
  2.850200   4.061840   3.532018  -0.014098      24448   1.000000
  4.061840   5.788530   5.022241  -0.010784      70996   1.000000
  5.788530   8.249250   7.160648  -0.001588     207392   1.000000
  8.249250  11.756000  10.207213  -0.000323     601002   1.000000
 11.756000  16.753600  14.541171   0.000007    1740084   1.000000
 16.753600  23.875500  20.728773  -0.001595    5028058   1.000000