Corrfunc package

Corrfunc is a set of high-performance routines for computing clustering statistics on a distribution of points.

Corrfunc.read_text_file(filename, encoding=u'utf-8')[source]

Reads a file under python3 with encoding (default UTF-8). Also works under python2, without encoding. Uses the EAFP (https://docs.python.org/2/glossary.html#term-eafp) principle.

Corrfunc.which(program, mode=1, path=None)[source]

Mimics the Unix utility which. For python3.3+, shutil.which provides all of the required functionality. An implementation is provided in case shutil.which does not exist.

Parameters:
  • program – (required) string Name of program (can be fully-qualified path as well)
  • mode – (optional) integer flag bits Permissions to check for in the executable Default: os.F_OK (file exists) | os.X_OK (executable file)
  • path – (optional) string A custom path list to check against. Implementation taken from shutil.py.
Returns:

A fully qualified path to program as resolved by path or user environment. Returns None when program can not be resolved.

Corrfunc.write_text_file(filename, contents, encoding=u'utf-8')[source]

Writes a file under python3 with encoding (default UTF-8). Also works under python2, without encoding. Uses the EAFP (https://docs.python.org/2/glossary.html#term-eafp) principle.

Submodules

Corrfunc.call_correlation_functions module

Corrfunc.call_correlation_functions_mocks module

Example python code to call the mocks clustering functions from python. This script calls the python extensions directly; however the recommended use is via the wrappers provided in Corrfunc.mocks.

Corrfunc.call_correlation_functions_mocks.main()[source]

Corrfunc.io module

Routines to read galaxy catalogs from disk.

Corrfunc.io.read_fastfood_catalog(filename, return_dtype=None, need_header=None)[source]

Read a galaxy catalog from a fast-food binary file.

Parameters:
  • filename (string) – Filename containing the galaxy positions
  • return_dtype (numpy dtype for returned arrays. Default numpy.float) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32}
  • need_header (boolean, default None.) – Returns the header found in the fast-food file in addition to the X/Y/Z arrays.
Returns:

X, Y, Z – Returns the triplet of X/Y/Z positions as separate numpy arrays.

If need_header is set, then the header is also returned

Return type:

numpy arrays

Example

>>> import numpy as np
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.io import read_fastfood_catalog
>>> filename = pjoin(dirname(abspath(Corrfunc.__file__)),
...                  "../theory/tests/data/",
...                  "gals_Mr19.ff")
>>> X, Y, Z = read_fastfood_catalog(filename)
>>> N = 20
>>> for x,y,z in zip(X[0:N], Y[0:N], Z[0:]):
...     print("{0:10.5f} {1:10.5f} {2:10.5f}".format(x, y, z))
...     
419.94550    1.96340    0.01610
419.88272    1.79736    0.11960
0.32880   10.63620    4.16550
0.15314   10.68723    4.06529
0.46400    8.91150    6.97090
6.30690    9.77090    8.61080
5.87160    9.65870    9.29810
8.06210    0.42350    4.89410
11.92830    4.38660    4.54410
11.95543    4.32622    4.51485
11.65676    4.34665    4.53181
11.75739    4.26262    4.31666
11.81329    4.27530    4.49183
11.80406    4.54737    4.26824
12.61570    4.14470    3.70140
13.23640    4.34750    5.26450
13.19833    4.33196    5.29435
13.21249    4.35695    5.37418
13.06805    4.24275    5.35126
13.19693    4.37618    5.28772
Corrfunc.io.read_ascii_catalog(filename, return_dtype=None)[source]

Read a galaxy catalog from an ascii file.

Parameters:
  • filename (string) – Filename containing the galaxy positions
  • return_dtype (numpy dtype for returned arrays. Default numpy.float) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32}
Returns:

X, Y, Z – Returns the triplet of X/Y/Z positions as separate numpy arrays.

Return type:

numpy arrays

Example

>>> from __future__ import print_function
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.io import read_ascii_catalog
>>> filename = pjoin(dirname(abspath(Corrfunc.__file__)),
...                 "../mocks/tests/data/", "Mr19_mock_northonly.rdcz.dat")
>>> ra, dec, cz = read_ascii_catalog(filename)
>>> N = 20
>>> for r,d,c in zip(ra[0:N], dec[0:N], cz[0:]):
...     print("{0:10.5f} {1:10.5f} {2:10.5f}".format(r, d, c))
...     
178.45087   67.01112 19905.28514
178.83495   67.72519 19824.02285
179.50132   67.67628 19831.21553
182.75497   67.13004 19659.79825
186.29853   68.64099 20030.64412
186.32346   68.65879 19763.38137
187.36173   68.15151 19942.66996
187.20613   68.56189 19996.36607
185.56358   67.97724 19729.32308
183.27930   67.11318 19609.71345
183.86498   67.82823 19500.44130
184.07771   67.43429 19440.53790
185.13370   67.15382 19390.60304
189.15907   68.28252 19858.85853
190.12209   68.55062 20044.29744
193.65245   68.36878 19445.62469
194.93514   68.34870 19158.93155
180.36897   67.50058 18671.40780
179.63278   67.51318 18657.59191
180.75742   67.95530 18586.88913
Corrfunc.io.read_catalog(filebase=None, return_dtype=<Mock id='140231671971536'>)[source]

Reads a galaxy/randoms catalog and returns 3 XYZ arrays.

Parameters:
  • filebase (string (optional)) – The fully qualified path to the file. If omitted, reads the theory galaxy catalog under ../theory/tests/data/
  • return_dtype (numpy dtype for returned arrays. Default numpy.float) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32}
Returns:

  • x y z - Unpacked numpy arrays compatible with the installed
  • version of Corrfunc.

Note

If the filename is omitted, then first the fast-food file is searched for, and then the ascii file. End-users should always supply the full filename.

Corrfunc.tests module

Corrfunc.tests.tests()[source]

Wrapper to run the two scripts that should have been installed with the Corrfunc package.

If the two scripts (one for theory extensions, one for mocks extensions) run successfully, then the package is working correctly.

Corrfunc.utils module

A set of utility routines

Corrfunc.utils.convert_3d_counts_to_cf(ND1, ND2, NR1, NR2, D1D2, D1R2, D2R1, R1R2, estimator=u'LS')[source]

Converts raw pair counts to a correlation function.

Parameters:
  • ND1 (integer) – Number of points in the first dataset
  • ND2 (integer) – Number of points in the second dataset
  • NR1 (integer) – Number of points in the randoms for first dataset
  • NR2 (integer) – Number of points in the randoms for second dataset
  • D1D2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and D2
  • D1R2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and R2
  • D2R1 (array-like, integer) – Pair-counts for the cross-correlation between D2 and R1
  • R1R2 (array-like, integer) – Pair-counts for the cross-correlation between R1 and R2
  • all of these pair-counts arrays, the corresponding numpy (For) –
  • returned by the theory/mocks modules can also be passed (struct) –
  • estimator (string, default='LS' (Landy-Szalay)) – The kind of estimator to use for computing the correlation function. Currently, only supports Landy-Szalay
Returns:

cf – The correlation function, calculated using the chosen estimator, is returned. NAN is returned for the bins where the RR count is 0.

Return type:

A numpy array

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from Corrfunc.theory.DD import DD
>>> from Corrfunc.io import read_catalog
>>> from Corrfunc.utils import convert_3d_counts_to_cf
>>> X, Y, Z = read_catalog()
>>> N = len(X)
>>> boxsize = 420.0
>>> rand_N = 3*N
>>> seed = 42
>>> np.random.seed(seed)
>>> rand_X = np.random.uniform(0, boxsize, rand_N)
>>> rand_Y = np.random.uniform(0, boxsize, rand_N)
>>> rand_Z = np.random.uniform(0, boxsize, rand_N)
>>> nthreads = 2
>>> rmin = 0.1
>>> rmax = 15.0
>>> nbins = 10
>>> bins = np.linspace(rmin, rmax, nbins + 1)
>>> autocorr = 1
>>> DD_counts = DD(autocorr, nthreads, bins, X, Y, Z)
>>> autocorr = 0
>>> DR_counts = DD(autocorr, nthreads, bins,
...                X, Y, Z,
...                X2=rand_X, Y2=rand_Y, Z2=rand_Z)
>>> autocorr = 1
>>> RR_counts = DD(autocorr, nthreads, bins, rand_X, rand_Y, rand_Z)
>>> cf = convert_3d_counts_to_cf(N, N, rand_N, rand_N,
...                              DD_counts, DR_counts,
...                              DR_counts, RR_counts)
>>> for xi in cf: print("{0:10.6f}".format(xi))
...                    
22.769019
 3.612709
 1.621372
 1.000969
 0.691646
 0.511819
 0.398872
 0.318815
 0.255643
 0.207759
Corrfunc.utils.convert_rp_pi_counts_to_wp(ND1, ND2, NR1, NR2, D1D2, D1R2, D2R1, R1R2, nrpbins, pimax, dpi=1.0, estimator=u'LS')[source]

Converts raw pair counts to a correlation function.

Parameters:
  • ND1 (integer) – Number of points in the first dataset
  • ND2 (integer) – Number of points in the second dataset
  • NR1 (integer) – Number of points in the randoms for first dataset
  • NR2 (integer) – Number of points in the randoms for second dataset
  • D1D2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and D2
  • D1R2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and R2
  • D2R1 (array-like, integer) – Pair-counts for the cross-correlation between D2 and R1
  • R1R2 (array-like, integer) – Pair-counts for the cross-correlation between R1 and R2
  • all of these pair-counts arrays, the corresponding numpy (For) –
  • returned by the theory/mocks modules can also be passed (struct) –
  • nrpbins (integer) – Number of bins in rp
  • pimax (float) – Integration distance along the line of sight direction
  • dpi (float, default=1.0 Mpc/h) – Binsize in the line of sight direction
  • estimator (string, default='LS' (Landy-Szalay)) – The kind of estimator to use for computing the correlation function. Currently, only supports Landy-Szalay
Returns:

wp – The projected correlation function, calculated using the chosen estimator, is returned. If any of the pi bins (in an rp bin) contains 0 for the RR counts, then NAN is returned for that rp bin.

Return type:

A numpy array

Example

>>> from __future__ import print_function
>>> import numpy as np
>>> from Corrfunc.theory.DDrppi import DDrppi
>>> from Corrfunc.io import read_catalog
>>> from Corrfunc.utils import convert_rp_pi_counts_to_wp
>>> X, Y, Z = read_catalog()
>>> N = len(X)
>>> boxsize = 420.0
>>> rand_N = 3*N
>>> seed = 42
>>> np.random.seed(seed)
>>> rand_X = np.random.uniform(0, boxsize, rand_N)
>>> rand_Y = np.random.uniform(0, boxsize, rand_N)
>>> rand_Z = np.random.uniform(0, boxsize, rand_N)
>>> nthreads = 4
>>> pimax = 40.0
>>> nrpbins = 20
>>> rpmin = 0.1
>>> rpmax = 10.0
>>> bins = np.linspace(rpmin, rpmax, nrpbins + 1)
>>> autocorr = 1
>>> DD_counts = DDrppi(autocorr, nthreads, pimax, bins,
...                    X, Y, Z)
>>> autocorr = 0
>>> DR_counts = DDrppi(autocorr, nthreads, pimax, bins,
...                    X, Y, Z,
...                    X2=rand_X, Y2=rand_Y, Z2=rand_Z)
>>> autocorr = 1
>>> RR_counts = DDrppi(autocorr, nthreads, pimax, bins,
...                    rand_X, rand_Y, rand_Z)
>>> wp = convert_rp_pi_counts_to_wp(N, N, rand_N, rand_N,
...                                 DD_counts, DR_counts,
...                                 DR_counts, RR_counts,
...                                 nrpbins, pimax)
>>> for w in wp: print("{0:10.6f}".format(w))
...                    
187.592199
 83.059181
 53.200599
 40.389354
 33.356371
 29.045476
 26.088133
 23.628340
 21.703961
 20.153125
 18.724781
 17.433235
 16.287183
 15.443230
 14.436193
 13.592727
 12.921226
 12.330074
 11.696364
 11.208365
Corrfunc.utils.translate_isa_string_to_enum(isa)[source]

Helper function to convert an user-supplied string to the underlying enum in the C-API. The extensions only have specific implementations for AVX, SSE42 and FALLBACK. Any other value will raise a ValueError.

Parameters:isa (string) – A string containing the desired instruction set. Valid values are [‘AVX’, ‘SSE42’, ‘FALLBACK’, ‘FASTEST’]
Returns:instruction_set – An integer corresponding to the desired instruction set, as used in the underlying C API. The enum used here should be defined exactly the same way as the enum in utils/defs.h.
Return type:integer
Corrfunc.utils.return_file_with_rbins(rbins)[source]

Helper function to ensure that the binfile required by the Corrfunc extensions is a actually a string.

Checks if the input is a string and file; return if True. If not, and the input is an array, then a temporary file is created and the contents of rbins is written out.

Parameters:rbins (string or array-like) – Expected to be a string or an array containing the bins
Returns:binfile – If the input rbins was a valid filename, then returns the same string. If rbins was an array, then this function creates a temporary file with the contents of the rbins arrays. This temporary filename is returned
Return type:string, filename
Corrfunc.utils.fix_ra_dec(ra, dec)[source]

Wraps input RA and DEC values into range expected by the extensions.

Parameters:
  • RA (array-like, units must be degrees) – Right Ascension values (astronomical longitude)
  • DEC (array-like, units must be degrees) – Declination values (astronomical latitude)
Returns:

Tuple (RA, DEC) – RA is wrapped into range [0.0, 360.0] Declination is wrapped into range [-90.0, 90.0]

Return type:

array-like

Corrfunc.utils.fix_cz(cz)[source]

Multiplies the input array by speed of light, if the input values are too small.

Essentially, converts redshift into cz, if the user passed redshifts instead of cz.

Parameters:cz (array-like, reals) – An array containing [Speed of Light *] redshift values.
Returns:cz – Actual cz values, multiplying the input cz array by the Speed of Light, if redshift values were passed as input cz.
Return type:array-like
Corrfunc.utils.compute_nbins(max_diff, binsize, refine_factor=1, max_nbins=None)[source]

Helper utility to find the number of bins for that satisfies the constraints of (binsize, refine_factor, and max_nbins).

Parameters:
  • max_diff (double) – Max. difference (spatial or angular) to be spanned, (i.e., range of allowed domain values)
  • binsize (double) – Min. allowed binsize (spatial or angular)
  • refine_factor (integer, default 1) – How many times to refine the bins. The refinements occurs after nbins has already been determined (with refine_factor-1). Thus, the number of bins will be exactly higher by refine_factor compared to the base case of refine_factor=1
  • max_nbins (integer, default None) – Max number of allowed cells
Returns:

nbins – Number of bins that satisfies the constraints of bin size >= binsize, the refinement factor and nbins <= max_nbins.

Return type:

integer, >= 1

Example

>>> from Corrfunc.utils import compute_nbins
>>> max_diff = 180
>>> binsize = 10
>>> compute_nbins(max_diff, binsize)
18
>>> refine_factor=2
>>> max_nbins = 20
>>> compute_nbins(max_diff, binsize, refine_factor=refine_factor,
...              max_nbins=max_nbins)
20

A method to optimally partition spherical regions such that pairs of points within a certain angular separation, thetamax, can be quickly computed.

Generates the binning scheme used in Corrfunc.mocks.DDtheta_mocks for a spherical region in Right Ascension (RA), Declination (DEC) and a maximum angular separation.

For a given thetamax, regions on the sphere are divided into bands in DEC bands, with the width in DEC equal to thetamax. If link_in_ra is set, then these DEC bands are further sub-divided into RA cells.

Parameters:
  • thetamax (double) – Max. angular separation of pairs. Expected to be in degrees unless input_in_degrees is set to False.
  • ra_limits (array of 2 doubles. Default [0.0, 2*pi]) – Range of Righ Ascension (longitude) for the spherical region
  • dec_limits (array of 2 doubles. Default [-pi/2, pi/2]) – Range of Declination (latitude) values for the spherical region
  • link_in_ra (Boolean. Default True) – Whether linking in RA is done (in addition to linking in DEC)
  • ra_refine_factor (integer, >= 1. Default 1) – Controls the sub-division of the RA cells. For a large number of particles, higher ra_refine_factor typically results in a faster runtime
  • dec_refine_factor (integer, >= 1. Default 1) – Controls the sub-division of the DEC cells. For a large number of particles, higher dec_refine_factor typically results in a faster runtime
  • max_ra_cells (integer, >= 1. Default 100) – The max. number of RA cells per DEC band.
  • max_dec_cells (integer >= 1. Default 200) – The max. number of total DEC bands
  • return_num_ra_cells (bool, default False) – Flag to return the number of RA cells per DEC band
  • input_in_degrees (Boolean. Default True) – Flag to show if the input quantities are in degrees. If set to False, all angle inputs will be taken to be in radians.
Returns:

  • sphere_grid (A numpy compound array, shape (ncells, 2)) – A numpy compound array with fields dec_limit and ra_limit of size 2 each. These arrays contain the beginning and end of DEC and RA regions for the cell.
  • num_ra_cells (numpy array, returned if return_num_ra_cells is set) – A numpy array containing the number of RA cells per declination band

Note

If link_in_ra=False, then there is effectively one RA bin per DEC band. The ‘ra_limit’ field will show the range of allowed RA values.

Example

>>> from Corrfunc.utils import gridlink_sphere
>>> import numpy as np
>>> np.set_printoptions(precision=8)
>>> thetamax=30
>>> grid = gridlink_sphere(thetamax) 
>>> print(grid)
[([-1.57079633, -1.04719755], [ 0.        ,  3.14159265])
 ([-1.57079633, -1.04719755], [ 3.14159265,  6.28318531])
 ([-1.04719755, -0.52359878], [ 0.        ,  3.14159265])
 ([-1.04719755, -0.52359878], [ 3.14159265,  6.28318531])
 ([-0.52359878,  0.        ], [ 0.        ,  1.25663706])
 ([-0.52359878,  0.        ], [ 1.25663706,  2.51327412])
 ([-0.52359878,  0.        ], [ 2.51327412,  3.76991118])
 ([-0.52359878,  0.        ], [ 3.76991118,  5.02654825])
 ([-0.52359878,  0.        ], [ 5.02654825,  6.28318531])
 ([ 0.        ,  0.52359878], [ 0.        ,  1.25663706])
 ([ 0.        ,  0.52359878], [ 1.25663706,  2.51327412])
 ([ 0.        ,  0.52359878], [ 2.51327412,  3.76991118])
 ([ 0.        ,  0.52359878], [ 3.76991118,  5.02654825])
 ([ 0.        ,  0.52359878], [ 5.02654825,  6.28318531])
 ([ 0.52359878,  1.04719755], [ 0.        ,  3.14159265])
 ([ 0.52359878,  1.04719755], [ 3.14159265,  6.28318531])
 ([ 1.04719755,  1.57079633], [ 0.        ,  3.14159265])
 ([ 1.04719755,  1.57079633], [ 3.14159265,  6.28318531])]
>>> grid = gridlink_sphere(60, dec_refine_factor=3, ra_refine_factor=2) 
>>> print(grid)
[([-1.57079633, -1.22173048], [ 0.        ,  1.57079633])
 ([-1.57079633, -1.22173048], [ 1.57079633,  3.14159265])
 ([-1.57079633, -1.22173048], [ 3.14159265,  4.71238898])
 ([-1.57079633, -1.22173048], [ 4.71238898,  6.28318531])
 ([-1.22173048, -0.87266463], [ 0.        ,  1.57079633])
 ([-1.22173048, -0.87266463], [ 1.57079633,  3.14159265])
 ([-1.22173048, -0.87266463], [ 3.14159265,  4.71238898])
 ([-1.22173048, -0.87266463], [ 4.71238898,  6.28318531])
 ([-0.87266463, -0.52359878], [ 0.        ,  1.57079633])
 ([-0.87266463, -0.52359878], [ 1.57079633,  3.14159265])
 ([-0.87266463, -0.52359878], [ 3.14159265,  4.71238898])
 ([-0.87266463, -0.52359878], [ 4.71238898,  6.28318531])
 ([-0.52359878, -0.17453293], [ 0.        ,  1.57079633])
 ([-0.52359878, -0.17453293], [ 1.57079633,  3.14159265])
 ([-0.52359878, -0.17453293], [ 3.14159265,  4.71238898])
 ([-0.52359878, -0.17453293], [ 4.71238898,  6.28318531])
 ([-0.17453293,  0.17453293], [ 0.        ,  1.57079633])
 ([-0.17453293,  0.17453293], [ 1.57079633,  3.14159265])
 ([-0.17453293,  0.17453293], [ 3.14159265,  4.71238898])
 ([-0.17453293,  0.17453293], [ 4.71238898,  6.28318531])
 ([ 0.17453293,  0.52359878], [ 0.        ,  1.57079633])
 ([ 0.17453293,  0.52359878], [ 1.57079633,  3.14159265])
 ([ 0.17453293,  0.52359878], [ 3.14159265,  4.71238898])
 ([ 0.17453293,  0.52359878], [ 4.71238898,  6.28318531])
 ([ 0.52359878,  0.87266463], [ 0.        ,  1.57079633])
 ([ 0.52359878,  0.87266463], [ 1.57079633,  3.14159265])
 ([ 0.52359878,  0.87266463], [ 3.14159265,  4.71238898])
 ([ 0.52359878,  0.87266463], [ 4.71238898,  6.28318531])
 ([ 0.87266463,  1.22173048], [ 0.        ,  1.57079633])
 ([ 0.87266463,  1.22173048], [ 1.57079633,  3.14159265])
 ([ 0.87266463,  1.22173048], [ 3.14159265,  4.71238898])
 ([ 0.87266463,  1.22173048], [ 4.71238898,  6.28318531])
 ([ 1.22173048,  1.57079633], [ 0.        ,  1.57079633])
 ([ 1.22173048,  1.57079633], [ 1.57079633,  3.14159265])
 ([ 1.22173048,  1.57079633], [ 3.14159265,  4.71238898])
 ([ 1.22173048,  1.57079633], [ 4.71238898,  6.28318531])]