Corrfunc package¶
Corrfunc is a set of high-performance routines for computing clustering statistics on a distribution of points.
-
Corrfunc.
read_text_file
(filename, encoding=u'utf-8')[source]¶ Reads a file under python3 with encoding (default UTF-8). Also works under python2, without encoding. Uses the EAFP (https://docs.python.org/2/glossary.html#term-eafp) principle.
-
Corrfunc.
which
(program, mode=1, path=None)[source]¶ Mimics the Unix utility which. For python3.3+, shutil.which provides all of the required functionality. An implementation is provided in case shutil.which does not exist.
Parameters: - program – (required) string Name of program (can be fully-qualified path as well)
- mode – (optional) integer flag bits Permissions to check for in the executable Default: os.F_OK (file exists) | os.X_OK (executable file)
- path – (optional) string A custom path list to check against. Implementation taken from shutil.py.
Returns: A fully qualified path to program as resolved by path or user environment. Returns None when program can not be resolved.
-
Corrfunc.
write_text_file
(filename, contents, encoding=u'utf-8')[source]¶ Writes a file under python3 with encoding (default UTF-8). Also works under python2, without encoding. Uses the EAFP (https://docs.python.org/2/glossary.html#term-eafp) principle.
Subpackages¶
Corrfunc.io module¶
Routines to read galaxy catalogs from disk.
-
Corrfunc.io.
read_fastfood_catalog
(filename, return_dtype=None, need_header=None)[source]¶ Read a galaxy catalog from a fast-food binary file.
Parameters: - filename (string) – Filename containing the galaxy positions
- return_dtype (numpy dtype for returned arrays. Default
numpy.float
) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32} - need_header (boolean, default None.) – Returns the header found in the fast-food file in addition to the X/Y/Z arrays.
Returns: X, Y, Z – Returns the triplet of X/Y/Z positions as separate numpy arrays.
If need_header is set, then the header is also returned
Return type: numpy arrays
Example
>>> import numpy as np >>> from os.path import dirname, abspath, join as pjoin >>> import Corrfunc >>> from Corrfunc.io import read_fastfood_catalog >>> filename = pjoin(dirname(abspath(Corrfunc.__file__)), ... "../theory/tests/data/", ... "gals_Mr19.ff") >>> X, Y, Z = read_fastfood_catalog(filename) >>> N = 20 >>> for x,y,z in zip(X[0:N], Y[0:N], Z[0:]): ... print("{0:10.5f} {1:10.5f} {2:10.5f}".format(x, y, z)) ... # doctest: +NORMALIZE_WHITESPACE 419.94550 1.96340 0.01610 419.88272 1.79736 0.11960 0.32880 10.63620 4.16550 0.15314 10.68723 4.06529 0.46400 8.91150 6.97090 6.30690 9.77090 8.61080 5.87160 9.65870 9.29810 8.06210 0.42350 4.89410 11.92830 4.38660 4.54410 11.95543 4.32622 4.51485 11.65676 4.34665 4.53181 11.75739 4.26262 4.31666 11.81329 4.27530 4.49183 11.80406 4.54737 4.26824 12.61570 4.14470 3.70140 13.23640 4.34750 5.26450 13.19833 4.33196 5.29435 13.21249 4.35695 5.37418 13.06805 4.24275 5.35126 13.19693 4.37618 5.28772
-
Corrfunc.io.
read_ascii_catalog
(filename, return_dtype=None)[source]¶ Read a galaxy catalog from an ascii file.
Parameters: - filename (string) – Filename containing the galaxy positions
- return_dtype (numpy dtype for returned arrays. Default
numpy.float
) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32}
Returns: X, Y, Z – Returns the triplet of X/Y/Z positions as separate numpy arrays.
Return type: numpy arrays
Example
>>> from __future__ import print_function >>> from os.path import dirname, abspath, join as pjoin >>> import Corrfunc >>> from Corrfunc.io import read_ascii_catalog >>> filename = pjoin(dirname(abspath(Corrfunc.__file__)), ... "../mocks/tests/data/", "Mr19_mock_northonly.rdcz.dat") >>> ra, dec, cz = read_ascii_catalog(filename) >>> N = 20 >>> for r,d,c in zip(ra[0:N], dec[0:N], cz[0:]): ... print("{0:10.5f} {1:10.5f} {2:10.5f}".format(r, d, c)) ... # doctest: +NORMALIZE_WHITESPACE 178.45087 67.01112 19905.28514 178.83495 67.72519 19824.02285 179.50132 67.67628 19831.21553 182.75497 67.13004 19659.79825 186.29853 68.64099 20030.64412 186.32346 68.65879 19763.38137 187.36173 68.15151 19942.66996 187.20613 68.56189 19996.36607 185.56358 67.97724 19729.32308 183.27930 67.11318 19609.71345 183.86498 67.82823 19500.44130 184.07771 67.43429 19440.53790 185.13370 67.15382 19390.60304 189.15907 68.28252 19858.85853 190.12209 68.55062 20044.29744 193.65245 68.36878 19445.62469 194.93514 68.34870 19158.93155 180.36897 67.50058 18671.40780 179.63278 67.51318 18657.59191 180.75742 67.95530 18586.88913
-
Corrfunc.io.
read_catalog
(filebase=None, return_dtype=<Mock id='140520821399120'>)[source]¶ Reads a galaxy/randoms catalog and returns 3 XYZ arrays.
Parameters: - filebase (string (optional)) – The fully qualified path to the file. If omitted, reads the theory galaxy catalog under ../theory/tests/data/
- return_dtype (numpy dtype for returned arrays. Default
numpy.float
) – Specifies the datatype for the returned arrays. Must be in {np.float, np.float32}
Returns: x y z
- Unpacked numpy arrays compatible with the installed- version of
Corrfunc
.
Note
If the filename is omitted, then first the fast-food file is searched for, and then the ascii file. End-users should always supply the full filename.
Corrfunc.utils module¶
A set of utility routines
-
Corrfunc.utils.
convert_3d_counts_to_cf
(ND1, ND2, NR1, NR2, D1D2, D1R2, D2R1, R1R2, estimator=u'LS')[source]¶ Converts raw pair counts to a correlation function.
Parameters: - ND1 (integer) – Number of points in the first dataset
- ND2 (integer) – Number of points in the second dataset
- NR1 (integer) – Number of points in the randoms for first dataset
- NR2 (integer) – Number of points in the randoms for second dataset
- D1D2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and D2
- D1R2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and R2
- D2R1 (array-like, integer) – Pair-counts for the cross-correlation between D2 and R1
- R1R2 (array-like, integer) – Pair-counts for the cross-correlation between R1 and R2
- all of these pair-counts arrays, the corresponding numpy (For) –
- returned by the theory/mocks modules can also be passed (struct) –
- estimator (string, default='LS' (Landy-Szalay)) – The kind of estimator to use for computing the correlation function. Currently, only supports Landy-Szalay
Returns: cf – The correlation function, calculated using the chosen estimator, is returned. NAN is returned for the bins where the
RR
count is 0.Return type: A numpy array
Example
>>> from __future__ import print_function >>> import numpy as np >>> from Corrfunc.theory.DD import DD >>> from Corrfunc.io import read_catalog >>> from Corrfunc.utils import convert_3d_counts_to_cf >>> X, Y, Z = read_catalog() >>> N = len(X) >>> boxsize = 420.0 >>> rand_N = 3*N >>> seed = 42 >>> np.random.seed(seed) >>> rand_X = np.random.uniform(0, boxsize, rand_N) >>> rand_Y = np.random.uniform(0, boxsize, rand_N) >>> rand_Z = np.random.uniform(0, boxsize, rand_N) >>> nthreads = 2 >>> rmin = 0.1 >>> rmax = 15.0 >>> nbins = 10 >>> bins = np.linspace(rmin, rmax, nbins + 1) >>> autocorr = 1 >>> DD_counts = DD(autocorr, nthreads, bins, X, Y, Z) >>> autocorr = 0 >>> DR_counts = DD(autocorr, nthreads, bins, ... X, Y, Z, ... X2=rand_X, Y2=rand_Y, Z2=rand_Z) >>> autocorr = 1 >>> RR_counts = DD(autocorr, nthreads, bins, rand_X, rand_Y, rand_Z) >>> cf = convert_3d_counts_to_cf(N, N, rand_N, rand_N, ... DD_counts, DR_counts, ... DR_counts, RR_counts) >>> for xi in cf: print("{0:10.6f}".format(xi)) ... # doctest: +NORMALIZE_WHITESPACE 22.769019 3.612709 1.621372 1.000969 0.691646 0.511819 0.398872 0.318815 0.255643 0.207759
-
Corrfunc.utils.
convert_rp_pi_counts_to_wp
(ND1, ND2, NR1, NR2, D1D2, D1R2, D2R1, R1R2, nrpbins, pimax, dpi=1.0, estimator=u'LS')[source]¶ Converts raw pair counts to a correlation function.
Parameters: - ND1 (integer) – Number of points in the first dataset
- ND2 (integer) – Number of points in the second dataset
- NR1 (integer) – Number of points in the randoms for first dataset
- NR2 (integer) – Number of points in the randoms for second dataset
- D1D2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and D2
- D1R2 (array-like, integer) – Pair-counts for the cross-correlation between D1 and R2
- D2R1 (array-like, integer) – Pair-counts for the cross-correlation between D2 and R1
- R1R2 (array-like, integer) – Pair-counts for the cross-correlation between R1 and R2
- all of these pair-counts arrays, the corresponding numpy (For) –
- returned by the theory/mocks modules can also be passed (struct) –
- nrpbins (integer) – Number of bins in
rp
- pimax (float) – Integration distance along the line of sight direction
- dpi (float, default=1.0 Mpc/h) – Binsize in the line of sight direction
- estimator (string, default='LS' (Landy-Szalay)) – The kind of estimator to use for computing the correlation function. Currently, only supports Landy-Szalay
Returns: wp – The projected correlation function, calculated using the chosen estimator, is returned. If any of the
pi
bins (in anrp
bin) contains 0 for theRR
counts, thenNAN
is returned for thatrp
bin.Return type: A numpy array
Example
>>> from __future__ import print_function >>> import numpy as np >>> from Corrfunc.theory.DDrppi import DDrppi >>> from Corrfunc.io import read_catalog >>> from Corrfunc.utils import convert_rp_pi_counts_to_wp >>> X, Y, Z = read_catalog() >>> N = len(X) >>> boxsize = 420.0 >>> rand_N = 3*N >>> seed = 42 >>> np.random.seed(seed) >>> rand_X = np.random.uniform(0, boxsize, rand_N) >>> rand_Y = np.random.uniform(0, boxsize, rand_N) >>> rand_Z = np.random.uniform(0, boxsize, rand_N) >>> nthreads = 4 >>> pimax = 40.0 >>> nrpbins = 20 >>> rpmin = 0.1 >>> rpmax = 10.0 >>> bins = np.linspace(rpmin, rpmax, nrpbins + 1) >>> autocorr = 1 >>> DD_counts = DDrppi(autocorr, nthreads, pimax, bins, ... X, Y, Z) >>> autocorr = 0 >>> DR_counts = DDrppi(autocorr, nthreads, pimax, bins, ... X, Y, Z, ... X2=rand_X, Y2=rand_Y, Z2=rand_Z) >>> autocorr = 1 >>> RR_counts = DDrppi(autocorr, nthreads, pimax, bins, ... rand_X, rand_Y, rand_Z) >>> wp = convert_rp_pi_counts_to_wp(N, N, rand_N, rand_N, ... DD_counts, DR_counts, ... DR_counts, RR_counts, ... nrpbins, pimax) >>> for w in wp: print("{0:10.6f}".format(w)) ... # doctest: +NORMALIZE_WHITESPACE 187.592199 83.059181 53.200599 40.389354 33.356371 29.045476 26.088133 23.628340 21.703961 20.153125 18.724781 17.433235 16.287183 15.443230 14.436193 13.592727 12.921226 12.330074 11.696364 11.208365
-
Corrfunc.utils.
translate_isa_string_to_enum
(isa)[source]¶ Helper function to convert an user-supplied string to the underlying enum in the C-API. The extensions only have specific implementations for AVX512F, AVX, SSE42 and FALLBACK. Any other value will raise a ValueError.
Parameters: isa (string) – A string containing the desired instruction set. Valid values are [‘AVX512F’, ‘AVX’, ‘SSE42’, ‘FALLBACK’, ‘FASTEST’] Returns: instruction_set – An integer corresponding to the desired instruction set, as used in the underlying C API. The enum used here should be defined exactly the same way as the enum in utils/defs.h
.Return type: integer
-
Corrfunc.utils.
return_file_with_rbins
(rbins)[source]¶ Helper function to ensure that the
binfile
required by the Corrfunc extensions is a actually a string.Checks if the input is a string and file; return if True. If not, and the input is an array, then a temporary file is created and the contents of rbins is written out.
Parameters: rbins (string or array-like) – Expected to be a string or an array containing the bins Returns: binfile – If the input rbins
was a valid filename, then returns the same string. Ifrbins
was an array, then this function creates a temporary file with the contents of therbins
arrays. This temporary filename is returnedReturn type: string, filename
-
Corrfunc.utils.
fix_ra_dec
(ra, dec)[source]¶ Wraps input RA and DEC values into range expected by the extensions.
Parameters: - RA (array-like, units must be degrees) – Right Ascension values (astronomical longitude)
- DEC (array-like, units must be degrees) – Declination values (astronomical latitude)
Returns: Tuple (RA, DEC) – RA is wrapped into range [0.0, 360.0] Declination is wrapped into range [-90.0, 90.0]
Return type: array-like
-
Corrfunc.utils.
fix_cz
(cz)[source]¶ Multiplies the input array by speed of light, if the input values are too small.
Essentially, converts redshift into cz, if the user passed redshifts instead of cz.
Parameters: cz (array-like, reals) – An array containing [Speed of Light *] redshift
values.Returns: cz – Actual cz
values, multiplying the inputcz
array by theSpeed of Light
, ifredshift
values were passed as inputcz
.Return type: array-like
-
Corrfunc.utils.
compute_nbins
(max_diff, binsize, refine_factor=1, max_nbins=None)[source]¶ Helper utility to find the number of bins for that satisfies the constraints of (binsize, refine_factor, and max_nbins).
Parameters: - max_diff (double) – Max. difference (spatial or angular) to be spanned, (i.e., range of allowed domain values)
- binsize (double) – Min. allowed binsize (spatial or angular)
- refine_factor (integer, default 1) – How many times to refine the bins. The refinements occurs
after
nbins
has already been determined (withrefine_factor-1
). Thus, the number of bins will be exactly higher byrefine_factor
compared to the base case ofrefine_factor=1
- max_nbins (integer, default None) – Max number of allowed cells
Returns: nbins – Number of bins that satisfies the constraints of bin size >=
binsize
, the refinement factor and nbins <=max_nbins
.Return type: integer, >= 1
Example
>>> from Corrfunc.utils import compute_nbins >>> max_diff = 180 >>> binsize = 10 >>> compute_nbins(max_diff, binsize) 18 >>> refine_factor=2 >>> max_nbins = 20 >>> compute_nbins(max_diff, binsize, refine_factor=refine_factor, ... max_nbins=max_nbins) 20
-
Corrfunc.utils.
gridlink_sphere
(thetamax, ra_limits=None, dec_limits=None, link_in_ra=True, ra_refine_factor=1, dec_refine_factor=1, max_ra_cells=100, max_dec_cells=200, return_num_ra_cells=False, input_in_degrees=True)[source]¶ A method to optimally partition spherical regions such that pairs of points within a certain angular separation,
thetamax
, can be quickly computed.Generates the binning scheme used in
Corrfunc.mocks.DDtheta_mocks
for a spherical region in Right Ascension (RA), Declination (DEC) and a maximum angular separation.For a given
thetamax
, regions on the sphere are divided into bands in DEC bands, with the width in DEC equal tothetamax
. Iflink_in_ra
is set, then these DEC bands are further sub-divided into RA cells.Parameters: - thetamax (double) – Max. angular separation of pairs. Expected to be in degrees
unless
input_in_degrees
is set toFalse
. - ra_limits (array of 2 doubles. Default [0.0, 2*pi]) – Range of Righ Ascension (longitude) for the spherical region
- dec_limits (array of 2 doubles. Default [-pi/2, pi/2]) – Range of Declination (latitude) values for the spherical region
- link_in_ra (Boolean. Default True) – Whether linking in RA is done (in addition to linking in DEC)
- ra_refine_factor (integer, >= 1. Default 1) – Controls the sub-division of the RA cells. For a large number of particles, higher ra_refine_factor typically results in a faster runtime
- dec_refine_factor (integer, >= 1. Default 1) – Controls the sub-division of the DEC cells. For a large number of particles, higher dec_refine_factor typically results in a faster runtime
- max_ra_cells (integer, >= 1. Default 100) – The max. number of RA cells per DEC band.
- max_dec_cells (integer >= 1. Default 200) – The max. number of total DEC bands
- return_num_ra_cells (bool, default False) – Flag to return the number of RA cells per DEC band
- input_in_degrees (Boolean. Default True) – Flag to show if the input quantities are in degrees. If set to False, all angle inputs will be taken to be in radians.
Returns: - sphere_grid (A numpy compound array, shape (ncells, 2)) – A numpy compound array with fields
dec_limit
andra_limit
of size 2 each. These arrays contain the beginning and end of DEC and RA regions for the cell. - num_ra_cells (numpy array, returned if
return_num_ra_cells
is set) – A numpy array containing the number of RA cells per declination band
Note
If
link_in_ra=False
, then there is effectively one RA bin per DEC band. The ‘ra_limit’ field will show the range of allowed RA values.See also
Example
>>> from Corrfunc.utils import gridlink_sphere >>> import numpy as np >>> try: # Backwards compatibility with old Numpy print formatting ... np.set_printoptions(legacy='1.13') ... except TypeError: ... pass >>> thetamax=30 >>> grid = gridlink_sphere(thetamax) >>> print(grid) # doctest: +NORMALIZE_WHITESPACE [([-1.57079633, -1.04719755], [ 0. , 3.14159265]) ([-1.57079633, -1.04719755], [ 3.14159265, 6.28318531]) ([-1.04719755, -0.52359878], [ 0. , 3.14159265]) ([-1.04719755, -0.52359878], [ 3.14159265, 6.28318531]) ([-0.52359878, 0. ], [ 0. , 1.25663706]) ([-0.52359878, 0. ], [ 1.25663706, 2.51327412]) ([-0.52359878, 0. ], [ 2.51327412, 3.76991118]) ([-0.52359878, 0. ], [ 3.76991118, 5.02654825]) ([-0.52359878, 0. ], [ 5.02654825, 6.28318531]) ([ 0. , 0.52359878], [ 0. , 1.25663706]) ([ 0. , 0.52359878], [ 1.25663706, 2.51327412]) ([ 0. , 0.52359878], [ 2.51327412, 3.76991118]) ([ 0. , 0.52359878], [ 3.76991118, 5.02654825]) ([ 0. , 0.52359878], [ 5.02654825, 6.28318531]) ([ 0.52359878, 1.04719755], [ 0. , 3.14159265]) ([ 0.52359878, 1.04719755], [ 3.14159265, 6.28318531]) ([ 1.04719755, 1.57079633], [ 0. , 3.14159265]) ([ 1.04719755, 1.57079633], [ 3.14159265, 6.28318531])] >>> grid = gridlink_sphere(60, dec_refine_factor=3, ra_refine_factor=2) >>> print(grid) # doctest: +NORMALIZE_WHITESPACE [([-1.57079633, -1.22173048], [ 0. , 1.57079633]) ([-1.57079633, -1.22173048], [ 1.57079633, 3.14159265]) ([-1.57079633, -1.22173048], [ 3.14159265, 4.71238898]) ([-1.57079633, -1.22173048], [ 4.71238898, 6.28318531]) ([-1.22173048, -0.87266463], [ 0. , 1.57079633]) ([-1.22173048, -0.87266463], [ 1.57079633, 3.14159265]) ([-1.22173048, -0.87266463], [ 3.14159265, 4.71238898]) ([-1.22173048, -0.87266463], [ 4.71238898, 6.28318531]) ([-0.87266463, -0.52359878], [ 0. , 1.57079633]) ([-0.87266463, -0.52359878], [ 1.57079633, 3.14159265]) ([-0.87266463, -0.52359878], [ 3.14159265, 4.71238898]) ([-0.87266463, -0.52359878], [ 4.71238898, 6.28318531]) ([-0.52359878, -0.17453293], [ 0. , 1.57079633]) ([-0.52359878, -0.17453293], [ 1.57079633, 3.14159265]) ([-0.52359878, -0.17453293], [ 3.14159265, 4.71238898]) ([-0.52359878, -0.17453293], [ 4.71238898, 6.28318531]) ([-0.17453293, 0.17453293], [ 0. , 1.57079633]) ([-0.17453293, 0.17453293], [ 1.57079633, 3.14159265]) ([-0.17453293, 0.17453293], [ 3.14159265, 4.71238898]) ([-0.17453293, 0.17453293], [ 4.71238898, 6.28318531]) ([ 0.17453293, 0.52359878], [ 0. , 1.57079633]) ([ 0.17453293, 0.52359878], [ 1.57079633, 3.14159265]) ([ 0.17453293, 0.52359878], [ 3.14159265, 4.71238898]) ([ 0.17453293, 0.52359878], [ 4.71238898, 6.28318531]) ([ 0.52359878, 0.87266463], [ 0. , 1.57079633]) ([ 0.52359878, 0.87266463], [ 1.57079633, 3.14159265]) ([ 0.52359878, 0.87266463], [ 3.14159265, 4.71238898]) ([ 0.52359878, 0.87266463], [ 4.71238898, 6.28318531]) ([ 0.87266463, 1.22173048], [ 0. , 1.57079633]) ([ 0.87266463, 1.22173048], [ 1.57079633, 3.14159265]) ([ 0.87266463, 1.22173048], [ 3.14159265, 4.71238898]) ([ 0.87266463, 1.22173048], [ 4.71238898, 6.28318531]) ([ 1.22173048, 1.57079633], [ 0. , 1.57079633]) ([ 1.22173048, 1.57079633], [ 1.57079633, 3.14159265]) ([ 1.22173048, 1.57079633], [ 3.14159265, 4.71238898]) ([ 1.22173048, 1.57079633], [ 4.71238898, 6.28318531])]
- thetamax (double) – Max. angular separation of pairs. Expected to be in degrees
unless