API
Top level user functions:
all (a[, axis, keepdims, split_every, out]): Test whether all array elements along a given axis evaluate to True.
angle (x[, deg]): Return the angle of the complex argument.
any (a[, axis, keepdims, split_every, out]): Test whether any array element along a given axis evaluates to True.
apply_along_axis (func1d, axis, arr, *args, ...): Apply a function to 1-D slices along the given axis.
apply_over_axes (func, a, axes): Apply a function repeatedly over multiple axes.
arange (*args, **kwargs): Return evenly spaced values from start to stop with step size step.
arccos: arccos(x[, out])
arccosh: arccosh(x[, out])
arcsin: arcsin(x[, out])
arcsinh: arcsinh(x[, out])
arctan: arctan(x[, out])
arctan2: arctan2(x1, x2[, out])
arctanh: arctanh(x[, out])
argmax (x[, axis, split_every, out]): Returns the indices of the maximum values along an axis.
argmin (x[, axis, split_every, out]): Returns the indices of the minimum values along an axis.
argwhere (a): Find the indices of array elements that are non-zero, grouped by element.
around (x[, decimals]): Evenly round to the given number of decimals.
array (object[, dtype, copy, order, subok, ndmin]): Create an array.
bincount (x[, weights, minlength]): Count number of occurrences of each value in array of non-negative ints.
broadcast_to (x, shape): Broadcast an array to a new shape.
coarsen (reduction, x, axes[, trim_excess]): Coarsen array by applying reduction to fixed size neighborhoods
ceil: ceil(x[, out])
choose (a, choices): Construct an array from an index array and a set of arrays to choose from.
clip (*args, **kwargs): Clip (limit) the values in an array.
compress (condition, a[, axis]): Return selected slices of an array along given axis.
concatenate (seq[, axis, ...]): Concatenate arrays along an existing axis
conj: conjugate(x[, out])
copysign: copysign(x1, x2[, out])
corrcoef (x[, y, rowvar]): Return Pearson product-moment correlation coefficients.
cos: cos(x[, out])
cosh: cosh(x[, out])
count_nonzero (a): Counts the number of non-zero values in the array a.
cov (m[, y, rowvar, bias, ddof]): Estimate a covariance matrix, given data and weights.
cumprod (x[, axis, dtype, out]): Return the cumulative product of elements along a given axis.
cumsum (x[, axis, dtype, out]): Return the cumulative sum of the elements along a given axis.
deg2rad: deg2rad(x[, out])
degrees: degrees(x[, out])
diag (v): Extract a diagonal or construct a diagonal array.
diff (a[, n, axis]): Calculate the n-th discrete difference along given axis.
digitize (x, bins[, right]): Return the indices of the bins to which each value in input array belongs.
dot (a, b[, out]): Dot product of two arrays.
dstack (tup): Stack arrays in sequence depth wise (along third axis).
ediff1d (ary[, to_end, to_begin]): The differences between consecutive elements of an array.
empty: Blocked variant of empty
empty_like (a[, dtype, chunks]): Return a new array with the same shape and type as a given array.
exp: exp(x[, out])
expm1: expm1(x[, out])
eye (N, chunks[, M, k, dtype]): Return a 2-D Array with ones on the diagonal and zeros elsewhere.
fabs: fabs(x[, out])
fix (*args, **kwargs): Round to nearest integer towards zero.
flatnonzero (a): Return indices that are non-zero in the flattened version of a.
floor: floor(x[, out])
fmax: fmax(x1, x2[, out])
fmin: fmin(x1, x2[, out])
fmod: fmod(x1, x2[, out])
frexp (x[, out1, out2]): Decompose the elements of x into mantissa and twos exponent.
fromfunction (func[, chunks, shape, dtype]): Construct an array by executing a function over each coordinate.
full: Blocked variant of full
full_like (a, fill_value[, dtype, chunks]): Return a full array with the same shape and type as a given array.
histogram (a[, bins, range, normed, weights, ...]): Blocked variant of numpy.histogram.
hstack (tup): Stack arrays in sequence horizontally (column wise).
hypot: hypot(x1, x2[, out])
imag (*args, **kwargs): Return the imaginary part of the elements of the array.
indices (dimensions[, dtype, chunks]): Implements NumPy’s indices for Dask Arrays.
insert (arr, obj, values, axis): Insert values along the given axis before the given indices.
isclose (arr1, arr2[, rtol, atol, equal_nan]): Returns a boolean array where two arrays are element-wise equal within a tolerance.
iscomplex (*args, **kwargs): Returns a bool array, where True if input element is complex.
isfinite: isfinite(x[, out])
isinf: isinf(x[, out])
isnan: isnan(x[, out])
isnull (values): pandas.isnull for dask arrays
isreal (*args, **kwargs): Returns a bool array, where True if input element is real.
ldexp: ldexp(x1, x2[, out])
linspace (start, stop[, num, chunks, dtype]): Return num evenly spaced values over the closed interval [start, stop].
log: log(x[, out])
log10: log10(x[, out])
log1p: log1p(x[, out])
log2: log2(x[, out])
logaddexp: logaddexp(x1, x2[, out])
logaddexp2: logaddexp2(x1, x2[, out])
logical_and: logical_and(x1, x2[, out])
logical_not: logical_not(x[, out])
logical_or: logical_or(x1, x2[, out])
logical_xor: logical_xor(x1, x2[, out])
max (a[, axis, keepdims, split_every, out]): Return the maximum of an array or maximum along an axis.
maximum: maximum(x1, x2[, out])
mean (a[, axis, dtype, keepdims, ...]): Compute the arithmetic mean along the specified axis.
min (a[, axis, keepdims, split_every, out]): Return the minimum of an array or minimum along an axis.
minimum: minimum(x1, x2[, out])
modf (x[, out1, out2]): Return the fractional and integral parts of an array, element-wise.
moment (a, order[, axis, dtype, keepdims, ...])
nanargmax (x[, axis, split_every, out])
nanargmin (x[, axis, split_every, out])
nancumprod (x, axis[, dtype, out]): Return the cumulative product of array elements over a given axis treating Not a Numbers (NaNs) as one.
nancumsum (x, axis[, dtype, out]): Return the cumulative sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
nanmax (a[, axis, keepdims, split_every, out]): Return the maximum of an array or maximum along an axis, ignoring any NaNs.
nanmean (a[, axis, dtype, keepdims, ...]): Compute the arithmetic mean along the specified axis, ignoring NaNs.
nanmin (a[, axis, keepdims, split_every, out]): Return minimum of an array or minimum along an axis, ignoring any NaNs.
nanprod (a[, axis, dtype, keepdims, ...]): Return the product of array elements over a given axis treating Not a Numbers (NaNs) as one.
nanstd (a[, axis, dtype, keepdims, ddof, ...]): Compute the standard deviation along the specified axis, while ignoring NaNs.
nansum (a[, axis, dtype, keepdims, ...]): Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
nanvar (a[, axis, dtype, keepdims, ddof, ...]): Compute the variance along the specified axis, while ignoring NaNs.
nextafter: nextafter(x1, x2[, out])
nonzero (a): Return the indices of the elements that are non-zero.
notnull (values): pandas.notnull for dask arrays
ones: Blocked variant of ones
ones_like (a[, dtype, chunks]): Return an array of ones with the same shape and type as a given array.
percentile (a, q[, interpolation]): Approximate percentile of 1-D array
prod (a[, axis, dtype, keepdims, ...]): Return the product of array elements over a given axis.
ptp (a[, axis]): Range of values (maximum - minimum) along an axis.
rad2deg: rad2deg(x[, out])
radians: radians(x[, out])
ravel (array): Return a contiguous flattened array.
real (*args, **kwargs): Return the real part of the elements of the array.
rechunk (x, chunks[, threshold, block_size_limit]): Convert blocks in dask array x for new chunks.
repeat (a, repeats[, axis]): Repeat elements of an array.
reshape (x, shape): Reshape array to new shape
result_type (*arrays_and_dtypes): Returns the type that results from applying the NumPy type promotion rules to the arguments.
rint: rint(x[, out])
roll (array, shift[, axis]): Roll array elements along a given axis.
round (a[, decimals]): Round an array to the given number of decimals.
sign: sign(x[, out])
signbit: signbit(x[, out])
sin: sin(x[, out])
sinh: sinh(x[, out])
sqrt: sqrt(x[, out])
square: square(x[, out])
squeeze (a[, axis]): Remove single-dimensional entries from the shape of an array.
stack (seq[, axis]): Stack arrays along a new axis
std (a[, axis, dtype, keepdims, ddof, ...]): Compute the standard deviation along the specified axis.
sum (a[, axis, dtype, keepdims, split_every, out]): Sum of array elements over a given axis.
take (a, indices[, axis]): Take elements from an array along an axis.
tan: tan(x[, out])
tanh: tanh(x[, out])
tensordot (lhs, rhs[, axes]): Compute tensor dot product along specified axes for arrays >= 1-D.
tile (A, reps): Construct an array by repeating A the number of times given by reps.
topk (k, x): The top k elements of an array
transpose (a[, axes]): Permute the dimensions of an array.
tril (m[, k]): Lower triangle of an array with elements above the k-th diagonal zeroed.
triu (m[, k]): Upper triangle of an array with elements below the k-th diagonal zeroed.
trunc: trunc(x[, out])
unique (x): Find the unique elements of an array.
var (a[, axis, dtype, keepdims, ddof, ...]): Compute the variance along the specified axis.
vnorm (a[, ord, axis, dtype, keepdims, ...]): Vector norm
vstack (tup): Stack arrays in sequence vertically (row wise).
where (condition, [x, y]): Return elements, either from x or y, depending on condition.
zeros: Blocked variant of zeros
zeros_like (a[, dtype, chunks]): Return an array of zeros with the same shape and type as a given array.
Fast Fourier Transforms
fft.fft_wrap (fft_func[, kind, dtype]): Wrap 1D complex FFT functions
fft.fft (a[, n, axis]): Wrapping of numpy.fft.fftpack.fft
fft.fft2 (a[, s, axes]): Wrapping of numpy.fft.fftpack.fft2
fft.fftn (a[, s, axes]): Wrapping of numpy.fft.fftpack.fftn
fft.ifft (a[, n, axis]): Wrapping of numpy.fft.fftpack.ifft
fft.ifft2 (a[, s, axes]): Wrapping of numpy.fft.fftpack.ifft2
fft.ifftn (a[, s, axes]): Wrapping of numpy.fft.fftpack.ifftn
fft.rfft (a[, n, axis]): Wrapping of numpy.fft.fftpack.rfft
fft.rfft2 (a[, s, axes]): Wrapping of numpy.fft.fftpack.rfft2
fft.rfftn (a[, s, axes]): Wrapping of numpy.fft.fftpack.rfftn
fft.irfft (a[, n, axis]): Wrapping of numpy.fft.fftpack.irfft
fft.irfft2 (a[, s, axes]): Wrapping of numpy.fft.fftpack.irfft2
fft.irfftn (a[, s, axes]): Wrapping of numpy.fft.fftpack.irfftn
fft.hfft (a[, n, axis]): Wrapping of numpy.fft.fftpack.hfft
fft.ihfft (a[, n, axis]): Wrapping of numpy.fft.fftpack.ihfft
fft.fftfreq (n[, d, chunks]): Return the Discrete Fourier Transform sample frequencies.
fft.rfftfreq (n[, d, chunks]): Return the Discrete Fourier Transform sample frequencies (for usage with rfft, irfft).
fft.fftshift (x[, axes]): Shift the zero-frequency component to the center of the spectrum.
fft.ifftshift (x[, axes]): The inverse of fftshift.
Linear Algebra
linalg.cholesky (a[, lower]): Returns the Cholesky decomposition, \(A = L L^*\) or \(A = U^* U\) of a Hermitian positive-definite matrix A.
linalg.inv (a): Compute the inverse of a matrix with LU decomposition and forward / backward substitutions.
linalg.lstsq (a, b): Return the least-squares solution to a linear matrix equation using QR decomposition.
linalg.lu (a): Compute the LU decomposition of a matrix.
linalg.norm (x[, ord, axis, keepdims]): Matrix or vector norm.
linalg.qr (a[, name]): Compute the QR factorization of a matrix.
linalg.solve (a, b[, sym_pos]): Solve the equation a x = b for x.
linalg.solve_triangular (a, b[, lower]): Solve the equation a x = b for x, assuming a is a triangular matrix.
linalg.svd (a[, name]): Compute the singular value decomposition of a matrix.
linalg.svd_compressed (a, k[, n_power_iter, ...]): Randomly compressed rank-k thin Singular Value Decomposition.
linalg.tsqr (data[, name, compute_svd]): Direct Tall-and-Skinny QR algorithm
Masked Arrays
ma.filled (a[, fill_value]): Return input as an array with masked data replaced by a fill value.
ma.fix_invalid (a[, fill_value]): Return input with invalid data masked and replaced by a fill value.
ma.getdata (a): Return the data of a masked array as an ndarray.
ma.getmaskarray (a): Return the mask of a masked array, or full boolean array of False.
ma.masked_array (data[, mask, fill_value]): An array class with possibly masked values.
ma.masked_equal (a, value): Mask an array where equal to a given value.
ma.masked_greater (a, value): Mask an array where greater than a given value.
ma.masked_greater_equal (a, value): Mask an array where greater than or equal to a given value.
ma.masked_inside (x, v1, v2): Mask an array inside a given interval.
ma.masked_invalid (a): Mask an array where invalid values occur (NaNs or infs).
ma.masked_less (a, value): Mask an array where less than a given value.
ma.masked_less_equal (a, value): Mask an array where less than or equal to a given value.
ma.masked_not_equal (a, value): Mask an array where not equal to a given value.
ma.masked_outside (x, v1, v2): Mask an array outside a given interval.
ma.masked_values (x, value[, rtol, atol, shrink]): Mask using floating point equality.
ma.masked_where (condition, a): Mask an array where a condition is met.
ma.set_fill_value (a, fill_value): Set the filling value of a, if a is a masked array.
Random
random.beta: beta(a, b, size=None)
random.binomial: binomial(n, p, size=None)
random.chisquare: chisquare(df, size=None)
random.exponential: exponential(scale=1.0, size=None)
random.f: f(dfnum, dfden, size=None)
random.gamma: gamma(shape, scale=1.0, size=None)
random.geometric: geometric(p, size=None)
random.gumbel: gumbel(loc=0.0, scale=1.0, size=None)
random.hypergeometric: hypergeometric(ngood, nbad, nsample, size=None)
random.laplace: laplace(loc=0.0, scale=1.0, size=None)
random.logistic: logistic(loc=0.0, scale=1.0, size=None)
random.lognormal: lognormal(mean=0.0, sigma=1.0, size=None)
random.logseries: logseries(p, size=None)
random.negative_binomial: negative_binomial(n, p, size=None)
random.noncentral_chisquare: noncentral_chisquare(df, nonc, size=None)
random.noncentral_f: noncentral_f(dfnum, dfden, nonc, size=None)
random.normal: normal(loc=0.0, scale=1.0, size=None)
random.pareto: pareto(a, size=None)
random.poisson: poisson(lam=1.0, size=None)
random.power: power(a, size=None)
random.random: random_sample(size=None)
random.random_sample: random_sample(size=None)
random.rayleigh: rayleigh(scale=1.0, size=None)
random.standard_cauchy: standard_cauchy(size=None)
random.standard_exponential: standard_exponential(size=None)
random.standard_gamma: standard_gamma(shape, size=None)
random.standard_normal: standard_normal(size=None)
random.standard_t: standard_t(df, size=None)
random.triangular: triangular(left, mode, right, size=None)
random.uniform: uniform(low=0.0, high=1.0, size=None)
random.vonmises: vonmises(mu, kappa, size=None)
random.wald: wald(mean, scale, size=None)
random.weibull: weibull(a, size=None)
random.zipf: Standard distributions
Stats
stats.ttest_ind (a, b[, axis, equal_var]): Calculates the T-test for the means of two independent samples of scores.
stats.ttest_1samp (a, popmean[, axis, nan_policy]): Calculates the T-test for the mean of ONE group of scores.
stats.ttest_rel (a, b[, axis, nan_policy]): Calculates the T-test on TWO RELATED samples of scores, a and b.
stats.chisquare (f_obs[, f_exp, ddof, axis]): Calculates a one-way chi square test.
stats.power_divergence (f_obs[, f_exp, ddof, ...]): Cressie-Read power divergence statistic and goodness of fit test.
stats.skew (a[, axis, bias, nan_policy]): Computes the skewness of a data set.
stats.skewtest (a[, axis, nan_policy]): Tests whether the skew is different from the normal distribution.
stats.kurtosis (a[, axis, fisher, bias, ...]): Computes the kurtosis (Fisher or Pearson) of a dataset.
stats.kurtosistest (a[, axis, nan_policy]): Tests whether a dataset has normal kurtosis
stats.normaltest (a[, axis, nan_policy]): Tests whether a sample differs from a normal distribution.
stats.f_oneway (*args): Performs a 1-way ANOVA.
stats.moment (a[, moment, axis, nan_policy]): Calculates the nth moment about the mean for a sample.
Image Support
image.imread (filename[, imread, preprocess]): Read a stack of images into a dask array
Slightly Overlapping Ghost Computations
ghost.ghost (x, depth, boundary): Share boundaries between neighboring blocks
ghost.map_overlap (x, func, depth[, ...])
Create and Store Arrays
from_array (x, chunks[, name, lock, asarray, ...]): Create dask array from something that looks like an array
from_delayed (value, shape, dtype[, name]): Create a dask array from a dask delayed value
from_npy_stack (dirname[, mmap_mode]): Load dask array from stack of npy files
store (sources, targets[, lock, regions, compute]): Store dask arrays in array-like objects, overwrite data in target
to_hdf5 (filename, *args, **kwargs): Store arrays in HDF5 file
to_npy_stack (dirname, x[, axis]): Write dask array to a stack of .npy files
Internal functions
map_blocks (func, *args, **kwargs): Map a function across all blocks of a dask array.
atop (func, out_ind, *args, **kwargs): Tensor operation: Generalized inner and outer products
top (func, output, out_indices, ...): Tensor operation
Other functions
dask.array.from_array(x, chunks, name=None, lock=False, asarray=True, fancy=True, getitem=None)
Create dask array from something that looks like an array
Input must have a .shape and support numpy-style slicing.
Parameters:
- x (array_like) –
- chunks (int, tuple) – How to chunk the array. Must be one of the following forms:
  - A blocksize like 1000.
  - A blockshape like (1000, 1000).
  - Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
- name (str, optional) – The key name to use for the array. Defaults to a hash of x. Use name=False to generate a random name instead of hashing (fast).
- lock (bool or Lock, optional) – If x doesn’t support concurrent reads then provide a lock here, or pass in True to have dask.array create one for you.
- asarray (bool, optional) – If True (default), then chunks will be converted to instances of ndarray. Set to False to pass chunks through unchanged.
- fancy (bool, optional) – If x doesn’t support fancy indexing (e.g. indexing with lists or arrays) then set to False. Default is True.
Examples
>>> x = h5py.File('...')['/data/path']
>>> a = da.from_array(x, chunks=(1000, 1000))
If your underlying datastore does not support concurrent reads then include the lock=True keyword argument, or lock=mylock if you want multiple arrays to coordinate around the same lock.
>>> a = da.from_array(x, chunks=(1000, 1000), lock=True)
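The three accepted forms of chunks all end up describing the explicit size of every block along every dimension. A minimal plain-Python sketch of that expansion, using a hypothetical helper expand_chunks (it is not part of dask.array, and dask's own normalization handles more cases than this):

```python
def expand_chunks(chunks, shape):
    """Expand a blocksize, a blockshape, or explicit block sizes into
    explicit per-dimension block sizes (hypothetical helper)."""
    if isinstance(chunks, int):
        # A single blocksize applies to every dimension.
        chunks = (chunks,) * len(shape)
    if len(chunks) != len(shape):
        raise ValueError("chunks must have one entry per dimension")
    expanded = []
    for size, c in zip(shape, chunks):
        if isinstance(c, tuple):
            # Already explicit: the blocks must cover the dimension exactly.
            if sum(c) != size:
                raise ValueError(f"block sizes {c} do not sum to {size}")
            expanded.append(c)
        else:
            # Full blocks of size c, plus one smaller trailing block if needed.
            full, rest = divmod(size, c)
            expanded.append((c,) * full + ((rest,) if rest else ()))
    return tuple(expanded)


shape = (2500, 800)
print(expand_chunks(1000, shape))         # ((1000, 1000, 500), (800,))
print(expand_chunks((1000, 500), shape))  # ((1000, 1000, 500), (500, 300))
print(expand_chunks(((1000, 1000, 500), (400, 400)), shape))
```

Only the last form can express uneven blocks anywhere other than the end of a dimension.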
dask.array.from_delayed(value, shape, dtype, name=None)
Create a dask array from a dask delayed value
This routine is useful for constructing dask arrays in an ad-hoc fashion using dask delayed, particularly when combined with stack and concatenate.
The dask array will consist of a single chunk.
Examples
>>> from dask import delayed
>>> value = delayed(np.ones)(5)
>>> array = from_delayed(value, (5,), float)
>>> array
dask.array<from-value, shape=(5,), dtype=float64, chunksize=(5,)>
>>> array.compute()
array([ 1., 1., 1., 1., 1.])
dask.array.store(sources, targets, lock=True, regions=None, compute=True, **kwargs)
Store dask arrays in array-like objects, overwrite data in target
This stores dask arrays into objects that support numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling np.array(myarray) instead.
Parameters:
- sources (Array or iterable of Arrays) –
- targets (array-like or iterable of array-likes) – These should support setitem syntax target[10:20] = ...
- lock (boolean or threading.Lock, optional) – Whether or not to lock the data stores while storing. Pass True (lock each file individually), False (don’t lock) or a particular threading.Lock object to be shared among all writes.
- regions (tuple of slices or iterable of tuple of slices) – Each region tuple in regions should be such that target[region].shape = source.shape for the corresponding source and target in sources and targets, respectively.
- compute (boolean, optional) – If True, compute immediately; otherwise return a dask.delayed.Delayed.
Examples
>>> x = ...
>>> import h5py
>>> f = h5py.File('myfile.hdf5')
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> store(x, dset)
Alternatively store many arrays at the same time
>>> store([x, y, z], [dset1, dset2, dset3])
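The chunk-by-chunk behaviour and the lock parameter can be pictured with a plain-Python sketch: each block is written into its own region of the target through slice assignment, so only one block needs to be held at a time. store_blocks is a hypothetical name, the sketch handles 1-D data only, and it is not dask's implementation.

```python
import threading


def store_blocks(blocks, target, lock=None):
    """Write an iterable of 1-D blocks into `target` with setitem syntax.

    Hypothetical sketch: each block goes to the region right after the
    previous one, and `lock` (if given) serializes every write.
    """
    offset = 0
    for block in blocks:
        region = slice(offset, offset + len(block))
        if lock is not None:
            with lock:
                target[region] = block
        else:
            target[region] = block
        offset += len(block)
    if offset != len(target):
        raise ValueError("blocks do not fill the target exactly")
    return target


def lazy_blocks():
    # A generator yields one block at a time, standing in for chunks
    # that are computed on demand.
    for start in range(0, 10, 4):
        yield list(range(start, min(start + 4, 10)))


target = [None] * 10
store_blocks(lazy_blocks(), target, lock=threading.Lock())
print(target)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Sharing one lock object across several targets in this sketch corresponds to passing a single threading.Lock to be shared among all writes.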
dask.array.topk(k, x)
The top k elements of an array
Returns the k greatest elements of the array in descending order. Only works on arrays of a single dimension.
This assumes that k is small. All results will be returned in a single chunk.
Examples
>>> x = np.array([5, 1, 3, 6])
>>> d = from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])
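Because the k greatest elements overall must be among the k greatest elements of each chunk, top-k can be computed blockwise and then merged, which keeps the final result small enough for a single chunk when k is small. A plain-Python sketch of that two-stage reduction with heapq (chunked_topk is a hypothetical name; this is not dask's code):

```python
import heapq


def chunked_topk(k, chunks):
    """Return the k greatest values, largest first, from a list of 1-D chunks."""
    # Stage 1: keep at most k candidates per chunk.
    candidates = []
    for chunk in chunks:
        candidates.extend(heapq.nlargest(k, chunk))
    # Stage 2: the global top k is the top k of the surviving candidates.
    return heapq.nlargest(k, candidates)


# Mirrors the example above: [5, 1, 3, 6] split into chunks of 2.
print(chunked_topk(2, [[5, 1], [3, 6]]))  # [6, 5]
```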
dask.array.coarsen(reduction, x, axes, trim_excess=False)
Coarsen array by applying reduction to fixed size neighborhoods
Examples
>>> x = np.array([1, 2, 3, 4, 5, 6])
>>> coarsen(np.sum, x, {0: 2})
array([ 3, 7, 11])
>>> coarsen(np.max, x, {0: 3})
array([3, 6])
Provide a dictionary of scale per dimension
>>> x = np.arange(24).reshape((4, 6))
>>> x
array([[ 0, 1, 2, 3, 4, 5],
       [ 6, 7, 8, 9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> coarsen(np.min, x, {0: 2, 1: 3})
array([[ 0, 3],
       [12, 15]])
Excess elements that do not fill a whole neighborhood must be trimmed explicitly with trim_excess=True
>>> x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
>>> coarsen(np.min, x, {0: 3}, trim_excess=True)
array([1, 4])
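Along one axis, coarsening splits the values into consecutive groups of the given size and reduces each group. A plain-Python sketch for a single axis (coarsen_1d is a hypothetical helper; raising ValueError when the length is not a multiple of the factor and trim_excess is False is this sketch's own choice):

```python
def coarsen_1d(reduction, values, factor, trim_excess=False):
    """Apply `reduction` to consecutive, non-overlapping groups of `factor`
    elements (hypothetical 1-D sketch of coarsening)."""
    excess = len(values) % factor
    if excess:
        if not trim_excess:
            raise ValueError(f"length {len(values)} is not a multiple of {factor}")
        values = values[:len(values) - excess]  # drop the trailing partial group
    return [reduction(values[i:i + factor]) for i in range(0, len(values), factor)]


x = [1, 2, 3, 4, 5, 6]
print(coarsen_1d(sum, x, 2))  # [3, 7, 11]
print(coarsen_1d(max, x, 3))  # [3, 6]
print(coarsen_1d(min, [1, 2, 3, 4, 5, 6, 7, 8], 3, trim_excess=True))  # [1, 4]
```

The multi-axis case in the examples applies the same grouping independently along each axis listed in the dictionary.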
dask.array.stack(seq, axis=0)
Stack arrays along a new axis
Given a sequence of dask Arrays, form a new dask Array by stacking them along a new dimension (axis=0 by default).
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.stack(data, axis=0)
>>> x.shape
(3, 4, 4)
>>> da.stack(data, axis=1).shape
(4, 3, 4)
>>> da.stack(data, axis=-1).shape
(4, 4, 3)
Result is a new dask Array
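The shapes in the examples follow one rule: the number of stacked arrays becomes a new dimension inserted at axis, counted in the result's dimensions so that axis=-1 appends it at the end. A plain-Python sketch of that shape rule (stacked_shape is a hypothetical helper working on shapes only):

```python
def stacked_shape(shapes, axis=0):
    """Shape of stacking arrays of identical shape along a new axis (sketch)."""
    first = tuple(shapes[0])
    if any(tuple(s) != first for s in shapes):
        raise ValueError("all input arrays must have the same shape")
    ndim = len(first) + 1  # the result gains one dimension
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} out of range for {ndim} dimensions")
    if axis < 0:
        axis += ndim
    return first[:axis] + (len(shapes),) + first[axis:]


shapes = [(4, 4)] * 3
print(stacked_shape(shapes, 0))   # (3, 4, 4)
print(stacked_shape(shapes, 1))   # (4, 3, 4)
print(stacked_shape(shapes, -1))  # (4, 4, 3)
```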
dask.array.concatenate(seq, axis=0, allow_unknown_chunksizes=False)
Concatenate arrays along an existing axis
Given a sequence of dask Arrays, form a new dask Array by stacking them along an existing dimension (axis=0 by default).
Parameters:
- seq (list of dask.arrays) –
- axis (int) – Dimension along which to align all of the arrays
- allow_unknown_chunksizes (bool) – Allow unknown chunksizes, such as those that come from converting dask dataframes. dask.array is then unable to verify that chunks line up. If data comes from differently aligned sources then this can cause unexpected results.
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.concatenate(data, axis=0)
>>> x.shape
(12, 4)
>>> da.concatenate(data, axis=1).shape
(4, 12)
Result is a new dask Array
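At the level of block sizes, concatenating along an existing axis joins the inputs' chunk tuples along that axis, while the chunk tuples on every other axis have to line up; that alignment is what cannot be verified when chunk sizes are unknown. A plain-Python sketch on chunk tuples only (concat_chunks is a hypothetical helper; it simply rejects mismatched inputs, which is this sketch's choice and not necessarily dask's behaviour):

```python
def concat_chunks(chunks_seq, axis=0):
    """Combine per-array chunk tuples for concatenation along `axis`
    (hypothetical sketch). Chunks on every other axis must match exactly."""
    ndim = len(chunks_seq[0])
    if axis < 0:
        axis += ndim
    first = chunks_seq[0]
    for chunks in chunks_seq[1:]:
        for i in range(ndim):
            if i != axis and chunks[i] != first[i]:
                raise ValueError(f"chunks do not line up on axis {i}")
    result = list(first)
    # Blocks along the concatenation axis are simply laid end to end.
    result[axis] = tuple(c for chunks in chunks_seq for c in chunks[axis])
    return tuple(result)


# Three 4x4 arrays with chunks=(2, 2), as in the example above.
c = ((2, 2), (2, 2))
out = concat_chunks([c, c, c], axis=0)
print(out)                          # ((2, 2, 2, 2, 2, 2), (2, 2))
print(tuple(sum(d) for d in out))   # (12, 4)
```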
dask.array.all(a, axis=None, keepdims=False, split_every=None, out=None)
Test whether all array elements along a given axis evaluate to True.
Parameters:
- a (array_like) – Input array or object that can be converted to an array.
- axis (None or int or tuple of ints, optional) – Axis or axes along which a logical AND reduction is performed. The default (axis=None) is to perform a logical AND over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
  New in version 1.7.0.
  If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
- out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if dtype(out) is float, the result will consist of 0.0’s and 1.0’s). See doc.ufuncs (Section “Output arguments”) for more details.
- keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
  If the default value is passed, then keepdims will not be passed through to the all method of sub-classes of ndarray, however any non-default value will be. If the sub-class’s all method does not implement keepdims, any exceptions will be raised.
Returns: all – A new boolean or array is returned unless out is specified, in which case a reference to out is returned.
Return type: ndarray, bool
See also
ndarray.all(): equivalent method
any(): Test whether any element along a given axis evaluates to True.
Notes
Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.
Examples
>>> np.all([[True,False],[True,True]])
False
>>> np.all([[True,False],[True,True]], axis=0)
array([ True, False], dtype=bool)
>>> np.all([-1, 4, 5])
True
>>> np.all([1.0, np.nan])
True
>>> o=np.array([False])
>>> z=np.all([-1, 4, 5], out=o)
>>> id(z), id(o), z
(28293632, 28293632, array([ True], dtype=bool))
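The axis semantics and the NaN rule can be reproduced on plain nested lists: axis=None reduces every element, axis=0 gives one result per column, axis=1 one per row, and NaN is truthy because it is non-zero. A hypothetical 2-D sketch (all_along is not a dask function):

```python
def all_along(rows, axis=None):
    """Logical AND reduction of a 2-D nested list (hypothetical sketch)."""
    if axis is None:
        # Reduce over every element.
        return all(bool(v) for row in rows for v in row)
    if axis == 0:
        # One result per column.
        return [all(bool(v) for v in col) for col in zip(*rows)]
    if axis == 1:
        # One result per row.
        return [all(bool(v) for v in row) for row in rows]
    raise ValueError("this sketch supports axis None, 0 or 1 only")


print(all_along([[True, False], [True, True]]))          # False
print(all_along([[True, False], [True, True]], axis=0))  # [True, False]
print(all_along([[1.0, float("nan")]]))                  # True: NaN is non-zero
```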
dask.array.angle(x, deg=0)
Return the angle of the complex argument.
Parameters:
- x (array_like) – A complex number or sequence of complex numbers.
- deg (bool, optional) – Return angle in degrees if True, radians if False (default).
Returns: angle – The counterclockwise angle from the positive real axis on the complex plane, with dtype as numpy.float64.
Return type: ndarray or scalar
See also
arctan2(), absolute()
Examples
>>> np.angle([1.0, 1.0j, 1+1j])  # in radians
array([ 0. , 1.57079633, 0.78539816])
>>> np.angle(1+1j, deg=True)  # in degrees
45.0
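For a single complex number, the angle is atan2 of the imaginary part over the real part, optionally converted to degrees. A scalar sketch with the standard library (angle_scalar is a hypothetical name; cmath.phase computes the same radians value):

```python
import math


def angle_scalar(z, deg=False):
    """Counterclockwise angle of z from the positive real axis (sketch)."""
    theta = math.atan2(z.imag, z.real)  # radians in [-pi, pi]
    return math.degrees(theta) if deg else theta


print(angle_scalar(1.0), angle_scalar(1j), angle_scalar(1 + 1j))
print(angle_scalar(1 + 1j, deg=True))
```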
dask.array.any(a, axis=None, keepdims=False, split_every=None, out=None)
Test whether any array element along a given axis evaluates to True.
Returns single boolean unless axis is not None.
Parameters:
- a (array_like) – Input array or object that can be converted to an array.
- axis (None or int or tuple of ints, optional) – Axis or axes along which a logical OR reduction is performed. The default (axis=None) is to perform a logical OR over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
  New in version 1.7.0.
  If this is a tuple of ints, a reduction is performed on multiple axes, instead of a single axis or all the axes as before.
- out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output and its type is preserved (e.g., if it is of type float, then it will remain so, returning 1.0 for True and 0.0 for False, regardless of the type of a). See doc.ufuncs (Section “Output arguments”) for details.
- keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
  If the default value is passed, then keepdims will not be passed through to the any method of sub-classes of ndarray, however any non-default value will be. If the sub-class’s any method does not implement keepdims, any exceptions will be raised.
Returns: any – A new boolean or ndarray is returned unless out is specified, in which case a reference to out is returned.
Return type: bool or ndarray
See also
ndarray.any(): equivalent method
all(): Test whether all elements along a given axis evaluate to True.
Notes
Not a Number (NaN), positive infinity and negative infinity evaluate to True because these are not equal to zero.
Examples
>>> np.any([[True, False], [True, True]])
True
>>> np.any([[True, False], [False, False]], axis=0)
array([ True, False], dtype=bool)
>>> np.any([-1, 0, 5])
True
>>> np.any(np.nan)
True
>>> o=np.array([False])
>>> z=np.any([-1, 4, 5], out=o)
>>> z, o
(array([ True], dtype=bool), array([ True], dtype=bool))
>>> # Check now that z is a reference to o
>>> z is o
True
>>> id(z), id(o) # identity of z and o
(191614240, 191614240)
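What keepdims changes is only the shape of the result: the reduced axis stays as a length-1 dimension, so a row result has shape (1, n) and a column result (m, 1), which broadcasts back against the (m, n) input. A hypothetical nested-list sketch (any_along is not a dask function):

```python
def any_along(rows, axis, keepdims=False):
    """Logical OR over axis 0 or 1 of a 2-D nested list (hypothetical sketch)."""
    if axis == 0:
        result = [any(col) for col in zip(*rows)]
        return [result] if keepdims else result  # shape (1, n) vs (n,)
    if axis == 1:
        result = [any(row) for row in rows]
        return [[r] for r in result] if keepdims else result  # (m, 1) vs (m,)
    raise ValueError("this sketch supports axis 0 or 1 only")


grid = [[True, False], [False, False]]
print(any_along(grid, 0))                 # [True, False]
print(any_along(grid, 0, keepdims=True))  # [[True, False]]
print(any_along(grid, 1, keepdims=True))  # [[True], [False]]
```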
dask.array.apply_along_axis(func1d, axis, arr, *args, **kwargs)
Apply a function to 1-D slices along the given axis.
Execute func1d(a, *args) where func1d operates on 1-D arrays and a is a 1-D slice of arr along axis.
Parameters:
- func1d (function) – This function should accept 1-D arrays. It is applied to 1-D slices of arr along the specified axis.
- axis (integer) – Axis along which arr is sliced.
- arr (ndarray) – Input array.
- args (any) – Additional arguments to func1d.
- kwargs (any) – Additional named arguments to func1d.
  New in version 1.9.0.
Returns: apply_along_axis – The output array. The shape of outarr is identical to the shape of arr, except along the axis dimension, where the length of outarr is equal to the size of the return value of func1d. If func1d returns a scalar, outarr will have one fewer dimension than arr.
Return type: ndarray
See also
apply_over_axes(): Apply a function repeatedly over multiple axes.
Examples
>>> def my_func(a):
...     """Average first and last element of a 1-D array"""
...     return (a[0] + a[-1]) * 0.5
>>> b = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> np.apply_along_axis(my_func, 0, b)
array([ 4., 5., 6.])
>>> np.apply_along_axis(my_func, 1, b)
array([ 2., 5., 8.])
For a function that doesn’t return a scalar, the number of dimensions in outarr is the same as arr.
>>> b = np.array([[8,1,7], [4,3,9], [5,2,6]])
>>> np.apply_along_axis(sorted, 1, b)
array([[1, 7, 8],
       [3, 4, 9],
       [2, 5, 6]])
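For a 2-D input, "1-D slices along axis" means columns for axis=0 and rows for axis=1. A plain-Python sketch reproducing the examples (apply_along_axis_2d is hypothetical; for a non-scalar func1d with axis=0, NumPy lays the results out along axis 0, which this sketch does not do):

```python
def apply_along_axis_2d(func1d, axis, rows):
    """Apply func1d to every column (axis=0) or row (axis=1) of a 2-D list."""
    if axis == 0:
        slices = [list(col) for col in zip(*rows)]
    elif axis == 1:
        slices = [list(row) for row in rows]
    else:
        raise ValueError("this sketch supports axis 0 or 1 only")
    return [func1d(s) for s in slices]


def my_func(a):
    """Average first and last element of a 1-D sequence."""
    return (a[0] + a[-1]) * 0.5


b = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(apply_along_axis_2d(my_func, 0, b))  # [4.0, 5.0, 6.0]
print(apply_along_axis_2d(my_func, 1, b))  # [2.0, 5.0, 8.0]
```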
dask.array.apply_over_axes(func, a, axes)
Apply a function repeatedly over multiple axes.
func is called as res = func(a, axis), where axis is the first element of axes. The result res of the function call must have either the same dimensions as a or one less dimension. If res has one less dimension than a, a dimension is inserted before axis. The call to func is then repeated for each axis in axes, with res as the first argument.
Parameters:
- func (function) – This function must take two arguments, func(a, axis).
- a (array_like) – Input array.
- axes (array_like) – Axes over which func is applied; the elements must be integers.
Returns: apply_over_axis – The output array. The number of dimensions is the same as a, but the shape can be different. This depends on whether func changes the shape of its output with respect to its input.
Return type: ndarray
See also
apply_along_axis(): Apply a function to 1-D slices of an array along the given axis.
Notes
This function is equivalent to tuple axis arguments to reorderable ufuncs with keepdims=True. Tuple axis arguments to ufuncs have been available since version 1.7.0.
Examples
>>> a = np.arange(24).reshape(2,3,4)
>>> a
array([[[ 0, 1, 2, 3],
        [ 4, 5, 6, 7],
        [ 8, 9, 10, 11]],
       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
Sum over axes 0 and 2. The result has the same number of dimensions as the original array:
>>> np.apply_over_axes(np.sum, a, [0,2])
array([[[ 60],
        [ 92],
        [124]]])
Tuple axis arguments to ufuncs are equivalent:
>>> np.sum(a, axis=(0,2), keepdims=True)
array([[[ 60],
        [ 92],
        [124]]])
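The dimension-insertion rule depends only on shapes, so it can be traced without an array library. A hypothetical sketch that follows the loop step by step; reduce_shape stands in for func and reports the shape func would return, and only non-negative axes are handled. For the sum example, (2, 3, 4) becomes (1, 3, 4) and then (1, 3, 1), the shape of the nested result shown.

```python
def apply_over_axes_shape(reduce_shape, shape, axes):
    """Track the result shape of apply_over_axes (hypothetical sketch).

    reduce_shape(shape, axis) must return the shape that func(a, axis)
    would produce for an input of the given shape.
    """
    shape = tuple(shape)
    for axis in axes:  # non-negative axes only in this sketch
        new = tuple(reduce_shape(shape, axis))
        if len(new) == len(shape) - 1:
            # func dropped a dimension: re-insert a length-1 axis in its place.
            new = new[:axis] + (1,) + new[axis:]
        elif len(new) != len(shape):
            raise ValueError("func must keep ndim or drop exactly one dimension")
        shape = new
    return shape


def summed_shape(shape, axis):
    """Shape after summing over `axis` without keepdims."""
    return shape[:axis] + shape[axis + 1:]


print(apply_over_axes_shape(summed_shape, (2, 3, 4), [0, 2]))  # (1, 3, 1)
```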
dask.array.arange(*args, **kwargs)
Return evenly spaced values from start to stop with step size step.
The values are half-open [start, stop), so including start and excluding stop. This is basically the same as Python’s range function, but for dask arrays.
When using a non-integer step, such as 0.1, the results will often not be consistent. It is better to use linspace for these cases.
Parameters: - start (int, optional) – The starting value of the sequence. The default is 0.
- stop (int) – The end of the interval, this value is excluded from the interval.
- step (int, optional) – The spacing between the values. The default is 1 when not specified.
- chunks (int) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: samples
Return type: dask array
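The half-open interval, the block-size note for chunks, and the warning about non-integer steps can all be checked with a small stdlib sketch. It assumes the element count is ceil((stop - start) / step), the usual NumPy-style rule; `arange_length` and `block_sizes` are hypothetical helpers, not dask functions.

```python
import math


def arange_length(start, stop, step=1):
    """Number of values in the half-open interval [start, stop) with the given step."""
    return max(0, math.ceil((stop - start) / step))


def block_sizes(n, chunks):
    """Split n samples into blocks of `chunks` samples; the last block may be shorter."""
    sizes = [chunks] * (n // chunks)
    if n % chunks:
        sizes.append(n % chunks)
    return tuple(sizes)


n = arange_length(0, 10)   # 10 values: 0, 1, ..., 9 (10 itself is excluded)
print(block_sizes(n, 4))   # (4, 4, 2) because 10 % 4 != 0

# A non-integer step is fragile: (1.3 - 1.0) / 0.1 rounds to slightly more than 3,
# so the count becomes 4 and the "excluded" stop value 1.3 effectively sneaks in.
print(arange_length(1.0, 1.3, 0.1))  # 4
```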
dask.array.arccos(x[, out])
Trigonometric inverse cosine, element-wise.
The inverse of cos so that, if y = cos(x), then x = arccos(y).
Parameters: - x (array_like) – x-coordinate on the unit circle. For real arguments, the domain is [-1, 1].
- out (ndarray, optional) – Array of the same shape as x, to store results in. See doc.ufuncs (Section “Output arguments”) for more details.
Returns: angle – The angle of the ray intersecting the unit circle at the given x-coordinate in radians [0, pi]. If x is a scalar then a scalar is returned, otherwise an array of the same shape as x is returned.
Return type: ndarray
Notes
arccos is a multivalued function: for each x there are infinitely many numbers z such that cos(z) = x. The convention is to return the angle z whose real part lies in [0, pi].
For real-valued input data types, arccos always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arccos is a complex analytic function that has branch cuts [-inf, -1] and [1, inf] and is continuous from above on the former and from below on the latter.
The inverse cos is also known as acos or cos^-1.
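Python's `math.acos` raises `ValueError` outside [-1, 1] instead of returning nan, so the real-valued convention described above can be mimicked with a small wrapper. This is a sketch under that assumption: the hypothetical `arccos_real` returns nan for out-of-domain input but does not set any floating point error flag the way the ufunc does.

```python
import math


def arccos_real(x):
    """Real-valued arccos returning nan (instead of raising) outside [-1, 1]."""
    try:
        return math.acos(x)  # principal value in [0, pi]
    except ValueError:       # math.acos raises for |x| > 1
        return math.nan


values = [1.0, 0.5, -1.0, 2.0]
print([arccos_real(v) for v in values])  # last entry is nan

# Inverse relationship: if y = cos(x) then x = arccos(y) for x in [0, pi].
y = 0.25
print(math.cos(arccos_real(y)))
```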
References
M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 79. http://www.math.sfu.ca/~cbm/aands/
Examples
We expect the arccos of 1 to be 0, and of -1 to be pi:
>>> np.arccos([1, -1])
array([ 0.        ,  3.14159265])
Plot arccos:
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-1, 1, num=100)
>>> plt.plot(x, np.arccos(x))
>>> plt.axis('tight')
>>> plt.show()
dask.array.arccosh(x[, out])
Inverse hyperbolic cosine, element-wise.
Parameters: - x (array_like) – Input array.
- out (ndarray, optional) – Array of the same shape as x, to store results in. See doc.ufuncs (Section “Output arguments”) for details.
Returns: arccosh – Array of the same shape as x.
Return type: ndarray
Notes
arccosh is a multivalued function: for each x there are infinitely many numbers z such that cosh(z) = x. The convention is to return the z whose imaginary part lies in [-pi, pi] and the real part in [0, inf].
For real-valued input data types, arccosh always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arccosh is a complex analytical function that has a branch cut [-inf, 1] and is continuous from above on it.
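For real x >= 1 the principal value has the closed form arccosh(x) = ln(x + sqrt(x^2 - 1)), whose real part is in [0, inf] as the convention above requires. The sketch below checks that form against Python's `math.acosh`; `arccosh_via_log` is a hypothetical helper and returns nan below 1 to mirror the real-input behavior described above.

```python
import math


def arccosh_via_log(x):
    """Principal arccosh for real x >= 1 via ln(x + sqrt(x**2 - 1)); nan below 1."""
    if x < 1.0:
        return math.nan  # outside the real domain
    return math.log(x + math.sqrt(x * x - 1.0))


for x in [1.0, math.e, 10.0]:
    # Compare the closed form with the standard library's acosh.
    print(x, arccosh_via_log(x), math.acosh(x))
```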
References
[1] M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. http://www.math.sfu.ca/~cbm/aands/
[2] Wikipedia, “Inverse hyperbolic function”, http://en.wikipedia.org/wiki/Arccosh
Examples
>>> np.arccosh([np.e, 10.0])
array([ 1.65745445,  2.99322285])
>>> np.arccosh(1)
0.0
dask.array.arcsin(x[, out])
Inverse sine, element-wise.
Parameters: - x (array_like) – y-coordinate on the unit circle.
- out (ndarray, optional) – Array of the same shape as x, in which to store the results. See doc.ufuncs (Section “Output arguments”) for more details.
Returns: angle – The inverse sine of each element in x, in radians and in the closed interval [-pi/2, pi/2]. If x is a scalar, a scalar is returned, otherwise an array.
Return type: ndarray
Notes
arcsin is a multivalued function: for each x there are infinitely many numbers z such that sin(z) = x. The convention is to return the angle z whose real part lies in [-pi/2, pi/2].
For real-valued input data types, arcsin always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arcsin is a complex analytic function that has, by convention, the branch cuts [-inf, -1] and [1, inf] and is continuous from above on the former and from below on the latter.
The inverse sine is also known as asin or sin^{-1}.
References
Abramowitz, M. and Stegun, I. A., Handbook of Mathematical Functions, 10th printing, New York: Dover, 1964, pp. 79ff. http://www.math.sfu.ca/~cbm/aands/
Examples
>>> np.arcsin(1)     # pi/2
1.5707963267948966
>>> np.arcsin(-1)    # -pi/2
-1.5707963267948966
>>> np.arcsin(0)
0.0
dask.array.arcsinh(x[, out])
Inverse hyperbolic sine element-wise.
Parameters: - x (array_like) – Input array.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: out – Array of the same shape as x.
Return type: ndarray
Notes
arcsinh is a multivalued function: for each x there are infinitely many numbers z such that sinh(z) = x. The convention is to return the z whose imaginary part lies in [-pi/2, pi/2].
For real-valued input data types, arcsinh always returns real output. For each value that cannot be expressed as a real number or infinity, it returns nan and sets the invalid floating point error flag.
For complex-valued input, arcsinh is a complex analytical function that has branch cuts [1j, infj] and [-1j, -infj] and is continuous from the right on the former and from the left on the latter.
The inverse hyperbolic sine is also known as asinh or sinh^-1.
References
[1] M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. http://www.math.sfu.ca/~cbm/aands/
[2] Wikipedia, “Inverse hyperbolic function”, http://en.wikipedia.org/wiki/Arcsinh
Examples
>>> np.arcsinh(np.array([np.e, 10.0]))
array([ 1.72538256,  2.99822295])
dask.array.arctan(x[, out])
Trigonometric inverse tangent, element-wise.
The inverse of tan, so that if y = tan(x) then x = arctan(y).
Parameters: x (array_like) – Input values. arctan is applied to each element of x.
Returns: out – Out has the same shape as x. Its real part is in [-pi/2, pi/2] (arctan(+/-inf) returns +/-pi/2). It is a scalar if x is a scalar.
Return type: ndarray
Notes
arctan is a multi-valued function: for each x there are infinitely many numbers z such that tan(z) = x. The convention is to return the angle z whose real part lies in [-pi/2, pi/2].
For real-valued input data types, arctan always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arctan is a complex analytic function that has [1j, infj] and [-1j, -infj] as branch cuts, and is continuous from the left on the former and from the right on the latter.
The inverse tangent is also known as atan or tan^{-1}.
References
Abramowitz, M. and Stegun, I. A., Handbook of Mathematical Functions, 10th printing, New York: Dover, 1964, pp. 79. http://www.math.sfu.ca/~cbm/aands/
Examples
We expect the arctan of 0 to be 0, and of 1 to be pi/4:
>>> np.arctan([0, 1])
array([ 0.        ,  0.78539816])
>>> np.pi/4
0.78539816339744828
Plot arctan:
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-10, 10)
>>> plt.plot(x, np.arctan(x))
>>> plt.axis('tight')
>>> plt.show()
dask.array.arctan2(x1, x2[, out])
Element-wise arc tangent of x1/x2 choosing the quadrant correctly.
The quadrant (i.e., branch) is chosen so that arctan2(x1, x2) is the signed angle in radians between the ray ending at the origin and passing through the point (1,0), and the ray ending at the origin and passing through the point (x2, x1). (Note the role reversal: the “y-coordinate” is the first function parameter, the “x-coordinate” is the second.) By IEEE convention, this function is defined for x2 = +/-0 and for either or both of x1 and x2 = +/-inf (see Notes for specific values).
This function is not defined for complex-valued arguments; for the so-called argument of complex values, use angle.
Parameters: - x1 (array_like, real-valued) – y-coordinates.
- x2 (array_like, real-valued) – x-coordinates. x2 must be broadcastable to match the shape of x1 or vice versa.
Returns: angle – Array of angles in radians, in the range [-pi, pi].
Return type: ndarray
Notes
arctan2 is identical to the atan2 function of the underlying C library. The following special values are defined in the C standard [1]:
x1 | x2 | arctan2(x1, x2)
+/- 0 | +0 | +/- 0
+/- 0 | -0 | +/- pi
> 0 | +/-inf | +0 / +pi
< 0 | +/-inf | -0 / -pi
+/-inf | +inf | +/- (pi/4)
+/-inf | -inf | +/- (3*pi/4)
Note that +0 and -0 are distinct floating point numbers, as are +inf and -inf.
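Python's `math.atan2(y, x)` uses the same argument order (y-coordinate first) and follows the same C99 special-value rules, so it is a convenient way to see both the quadrant selection and the signed-zero cases without NumPy. The sketch below compares it with the naive `atan(y/x)`, which can only return angles in (-90, 90) degrees and therefore loses the quadrant.

```python
import math

# Four points, one per quadrant, as (x, y) pairs.
points = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
for x, y in points:
    naive = math.degrees(math.atan(y / x))     # quadrant information is lost
    correct = math.degrees(math.atan2(y, x))   # note the order: y first, then x
    print(f"({x:+.0f}, {y:+.0f}): atan(y/x) = {naive:7.1f}, atan2(y, x) = {correct:7.1f}")

# Signed zeros and infinities follow the special-value table.
print(math.atan2(0.0, 0.0), math.atan2(0.0, -0.0))           # 0.0 and pi
print(math.atan2(math.inf, math.inf), math.atan2(math.inf, -math.inf))
```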
References
[1] ISO/IEC standard 9899:1999, “Programming language C.”
Examples
Consider four points in different quadrants:
>>> x = np.array([-1, +1, +1, -1])
>>> y = np.array([-1, -1, +1, +1])
>>> np.arctan2(y, x) * 180 / np.pi
array([-135.,  -45.,   45.,  135.])
Note the order of the parameters. arctan2 is defined also when x2 = 0 and at several other special points, obtaining values in the range [-pi, pi]:
>>> np.arctan2([1., -1.], [0., 0.])
array([ 1.57079633, -1.57079633])
>>> np.arctan2([0., 0., np.inf], [+0., -0., np.inf])
array([ 0.        ,  3.14159265,  0.78539816])
dask.array.arctanh(x[, out])
Inverse hyperbolic tangent element-wise.
Parameters: x (array_like) – Input array.
Returns: out – Array of the same shape as x.
Return type: ndarray
See also
emath.arctanh()
Notes
arctanh is a multivalued function: for each x there are infinitely many numbers z such that tanh(z) = x. The convention is to return the z whose imaginary part lies in [-pi/2, pi/2].
For real-valued input data types, arctanh always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arctanh is a complex analytical function that has branch cuts [-1, -inf] and [1, inf] and is continuous from above on the former and from below on the latter.
The inverse hyperbolic tangent is also known as atanh or tanh^-1.
References
[1] M. Abramowitz and I.A. Stegun, “Handbook of Mathematical Functions”, 10th printing, 1964, pp. 86. http://www.math.sfu.ca/~cbm/aands/
[2] Wikipedia, “Inverse hyperbolic function”, http://en.wikipedia.org/wiki/Arctanh
Examples
>>> np.arctanh([0, -0.5])
array([ 0.        , -0.54930614])
dask.array.argmax(x, axis=None, split_every=None, out=None)
Returns the indices of the maximum values along an axis.
Parameters: - x (array_like) – Input array.
- axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.
- out (array, optional) – If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
Returns: index_array – Array of indices into the array. It has the same shape as x.shape with the dimension along axis removed.
Return type: ndarray of ints
See also
ndarray.argmax(), argmin()
amax()
- The maximum value along a given axis.
unravel_index()
- Convert a flat index into an index tuple.
Notes
In case of multiple occurrences of the maximum values, the indices corresponding to the first occurrence are returned.
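Two behaviors described here, the first-occurrence rule and the flat index used when axis is None, can be sketched in plain Python. The helpers `argmax_flat` and `unravel_2d` are hypothetical; `unravel_2d` plays the role of unravel_index for a C-ordered 2-D shape.

```python
def argmax_flat(values):
    """Index of the first occurrence of the maximum in a flat sequence."""
    if not values:
        raise ValueError("argmax of an empty sequence")
    best = 0
    for i, v in enumerate(values):
        if v > values[best]:  # strict '>' keeps the earliest index on ties
            best = i
    return best


def unravel_2d(flat_index, shape):
    """Convert a flat (C-order) index into a (row, col) tuple for a 2-D shape."""
    _, n_cols = shape
    return divmod(flat_index, n_cols)


b = [0, 5, 2, 3, 4, 5]
print(argmax_flat(b))  # 1, not 5: only the first maximum is reported

a = [[0, 1, 2], [3, 4, 5]]
flat = [v for row in a for v in row]
print(unravel_2d(argmax_flat(flat), (2, 3)))  # (1, 2)
```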
Examples
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmax(a)
5
>>> np.argmax(a, axis=0)
array([1, 1, 1])
>>> np.argmax(a, axis=1)
array([2, 2])
>>> b = np.arange(6)
>>> b[1] = 5
>>> b
array([0, 5, 2, 3, 4, 5])
>>> np.argmax(b)  # Only the first occurrence is returned.
1
dask.array.argmin(x, axis=None, split_every=None, out=None)
Returns the indices of the minimum values along an axis.
Parameters: - x (array_like) – Input array.
- axis (int, optional) – By default, the index is into the flattened array, otherwise along the specified axis.
- out (array, optional) – If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
Returns: index_array – Array of indices into the array. It has the same shape as x.shape with the dimension along axis removed.
Return type: ndarray of ints
See also
ndarray.argmin(), argmax()
amin()
- The minimum value along a given axis.
unravel_index()
- Convert a flat index into an index tuple.
Notes
In case of multiple occurrences of the minimum values, the indices corresponding to the first occurrence are returned.
Examples
>>> a = np.arange(6).reshape(2,3)
>>> a
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argmin(a)
0
>>> np.argmin(a, axis=0)
array([0, 0, 0])
>>> np.argmin(a, axis=1)
array([0, 0])
>>> b = np.arange(6)
>>> b[4] = 0
>>> b
array([0, 1, 2, 3, 0, 5])
>>> np.argmin(b)  # Only the first occurrence is returned.
0
dask.array.argwhere(a)
Find the indices of array elements that are non-zero, grouped by element.
Parameters: a (array_like) – Input data.
Returns: index_array – Indices of elements that are non-zero. Indices are grouped by element.
Return type: ndarray
Notes
np.argwhere(a) is the same as np.transpose(np.nonzero(a)).
The output of argwhere is not suitable for indexing arrays. For this purpose use where(a) instead.
Examples
>>> x = np.arange(6).reshape(2,3)
>>> x
array([[0, 1, 2],
       [3, 4, 5]])
>>> np.argwhere(x>1)
array([[0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])
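The identity argwhere(a) == transpose(nonzero(a)) is easy to see in plain Python: nonzero produces one index list per axis, and transposing groups those indices per element. A minimal sketch for 2-D nested lists, with hypothetical helper names:

```python
def nonzero_2d(a):
    """Return (row_indices, col_indices) of non-zero entries, one list per axis."""
    rows, cols = [], []
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            if v:
                rows.append(i)
                cols.append(j)
    return rows, cols


def argwhere_2d(a):
    """Group the per-axis indices by element: the transpose of nonzero_2d(a)."""
    return [list(pair) for pair in zip(*nonzero_2d(a))]


x = [[0, 1, 2], [3, 4, 5]]
mask = [[v > 1 for v in row] for row in x]
print(nonzero_2d(mask))   # ([0, 1, 1, 1], [2, 0, 1, 2])  per-axis, usable for indexing
print(argwhere_2d(mask))  # [[0, 2], [1, 0], [1, 1], [1, 2]]  grouped by element
```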
dask.array.around(x, decimals=0)
Evenly round to the given number of decimals.
Parameters: - x (array_like) – Input data.
- decimals (int, optional) – Number of decimal places to round to (default: 0). If decimals is negative, it specifies the number of positions to the left of the decimal point.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output, but the type of the output values will be cast if necessary. See doc.ufuncs (Section “Output arguments”) for details.
Returns: rounded_array – An array of the same type as x, containing the rounded values. Unless out was specified, a new array is created. A reference to the result is returned.
The real and imaginary parts of complex numbers are rounded separately. The result of rounding a float is a float.
Return type: ndarray
Notes
For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. Results may also be surprising due to the inexact representation of decimal fractions in the IEEE floating point standard [1] and errors introduced when scaling by powers of ten.
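Both effects can be reproduced with Python's built-in `round()`, which also rounds exact halves to the nearest even value. This is only an analogy: `round()` is not the same algorithm as `np.around` (which scales by powers of ten), so the two can disagree on some inputs, but the underlying representation issue is the same.

```python
from decimal import Decimal

# Exact halves go to the nearest even integer (round-half-to-even).
halves = [0.5, 1.5, 2.5, 3.5, 4.5]
print([round(v) for v in halves])  # [0, 2, 2, 4, 4]

# 2.675 cannot be stored exactly; its binary value is slightly below 2.675,
# so rounding to two decimals gives 2.67 rather than 2.68.
print(round(2.675, 2))  # 2.67
print(Decimal(2.675))   # the exact stored value, a little less than 2.675
```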
References
[1] “Lecture Notes on the Status of IEEE 754”, William Kahan, http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
[2] “How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?”, William Kahan, http://www.cs.berkeley.edu/~wkahan/Mindless.pdf
Examples
>>> np.around([0.37, 1.64])
array([ 0.,  2.])
>>> np.around([0.37, 1.64], decimals=1)
array([ 0.4,  1.6])
>>> np.around([.5, 1.5, 2.5, 3.5, 4.5]) # rounds to nearest even value
array([ 0.,  2.,  2.,  4.,  4.])
>>> np.around([1,2,3,11], decimals=1) # ndarray of ints is returned
array([ 1,  2,  3, 11])
>>> np.around([1,2,3,11], decimals=-1)
array([ 0,  0,  0, 10])
dask.array.array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
Create an array.
Parameters: - object (array_like) – An array, any object exposing the array interface, an object whose __array__ method returns an array, or any (nested) sequence.
- dtype (data-type, optional) – The desired data-type for the array. If not given, then the type will be determined as the minimum type required to hold the objects in the sequence. This argument can only be used to ‘upcast’ the array. For downcasting, use the .astype(t) method.
- copy (bool, optional) – If true (default), then the object is copied. Otherwise, a copy will only be made if __array__ returns a copy, if obj is a nested sequence, or if a copy is needed to satisfy any of the other requirements (dtype, order, etc.).
- order ({'C', 'F', 'A'}, optional) – Specify the order of the array. If order is ‘C’, then the array will be in C-contiguous order (last-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (first-index varies the fastest). If order is ‘A’ (default), then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous), unless a copy is required, in which case it will be C-contiguous.
- subok (bool, optional) – If True, then sub-classes will be passed-through, otherwise the returned array will be forced to be a base-class array (default).
- ndmin (int, optional) – Specifies the minimum number of dimensions that the resulting array should have. Ones will be pre-pended to the shape as needed to meet this requirement.
Returns: out – An array object satisfying the specified requirements.
Return type: ndarray
See also
empty(), empty_like(), zeros(), zeros_like(), ones(), ones_like(), fill()
Examples
>>> np.array([1, 2, 3])
array([1, 2, 3])
Upcasting:
>>> np.array([1, 2, 3.0])
array([ 1.,  2.,  3.])
More than one dimension:
>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
       [3, 4]])
Minimum dimensions 2:
>>> np.array([1, 2, 3], ndmin=2)
array([[1, 2, 3]])
Type provided:
>>> np.array([1, 2, 3], dtype=complex)
array([ 1.+0.j,  2.+0.j,  3.+0.j])
Data-type consisting of more than one element:
>>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
>>> x['a']
array([1, 3])
Creating an array from sub-classes:
>>> np.array(np.matrix('1 2; 3 4'))
array([[1, 2],
       [3, 4]])
>>> np.array(np.matrix('1 2; 3 4'), subok=True)
matrix([[1, 2],
        [3, 4]])
dask.array.bincount(x, weights=None, minlength=None)
Count number of occurrences of each value in array of non-negative ints.
The number of bins (of size 1) is one larger than the largest value in x. If minlength is specified, there will be at least this number of bins in the output array (though it will be longer if necessary, depending on the contents of x). Each bin gives the number of occurrences of its index value in x. If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weights[i] instead of out[n] += 1.
Parameters: - x (array_like, 1 dimension, nonnegative ints) – Input array.
- weights (array_like, optional) – Weights, array of the same shape as x.
- minlength (int, optional) –
A minimum number of bins for the output array.
New in version 1.6.0.
Returns: out – The result of binning the input array. The length of out is equal to np.amax(x)+1, or minlength if that is larger.
Return type: ndarray of ints
Raises: ValueError – If the input is not 1-dimensional, or contains elements with negative values, or if minlength is non-positive.
TypeError – If the type of the input is float or complex.
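The bin rule described above (bin n counts occurrences of n, or sums weights[i] at each position i where n occurs, with at least minlength bins) can be modeled in a few lines of plain Python. `bincount_sketch` is a hypothetical helper; it uses minlength=0 as "no minimum" and does not reproduce the dtype checks that raise TypeError.

```python
def bincount_sketch(x, weights=None, minlength=0):
    """Plain-Python model of bincount for a 1-D sequence of non-negative ints."""
    if any(v < 0 for v in x):
        raise ValueError("bincount requires non-negative integers")
    n_bins = max(minlength, (max(x) + 1) if x else 0)
    out = [0] * n_bins if weights is None else [0.0] * n_bins
    for i, n in enumerate(x):
        # Each occurrence of value n adds 1 (or weights[i]) to bin n.
        out[n] += 1 if weights is None else weights[i]
    return out


print(bincount_sketch([0, 1, 1, 3, 2, 1, 7]))  # [1, 3, 1, 1, 0, 0, 0, 1]

# Weighted sums over variable-size groups: positions with equal x values share a bin.
w = [0.3, 0.5, 0.2, 0.7, 1.0, -0.6]
print(bincount_sketch([0, 1, 1, 2, 2, 2], weights=w))  # approximately [0.3, 0.7, 1.1]
```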
Examples
>>> np.bincount(np.arange(5))
array([1, 1, 1, 1, 1])
>>> np.bincount(np.array([0, 1, 1, 3, 2, 1, 7]))
array([1, 3, 1, 1, 0, 0, 0, 1])
>>> x = np.array([0, 1, 1, 3, 2, 1, 7, 23])
>>> np.bincount(x).size == np.amax(x)+1
True
The input array needs to be of integer dtype, otherwise a TypeError is raised:
>>> np.bincount(np.arange(5, dtype=float))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: array cannot be safely cast to required type
A possible use of bincount is to perform sums over variable-size chunks of an array, using the weights keyword.
>>> w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
>>> x = np.array([0, 1, 1, 2, 2, 2])
>>> np.bincount(x, weights=w)
array([ 0.3,  0.7,  1.1])
dask.array.broadcast_to(x, shape)
Broadcast an array to a new shape.
Parameters: - x (array_like) – The array to broadcast.
- shape (tuple) – The shape of the desired array.
- subok (bool, optional) – If True, then sub-classes will be passed-through, otherwise the returned array will be forced to be a base-class array (default).
Returns: broadcast – A readonly view on the original array with the given shape. It is typically not contiguous. Furthermore, more than one element of a broadcasted array may refer to a single memory location.
Raises: ValueError – If the array is not compatible with the new shape according to NumPy’s broadcasting rules.
Notes
New in version 1.10.0.
Examples
>>> x = np.array([1, 2, 3])
>>> np.broadcast_to(x, (3, 3))
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
dask.array.coarsen(reduction, x, axes, trim_excess=False)
Coarsen array by applying reduction to fixed size neighborhoods.
Examples
>>> x = np.array([1, 2, 3, 4, 5, 6])
>>> coarsen(np.sum, x, {0: 2})
array([ 3,  7, 11])
>>> coarsen(np.max, x, {0: 3})
array([3, 6])
Provide a dictionary of scale per dimension:
>>> x = np.arange(24).reshape((4, 6))
>>> x
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
>>> coarsen(np.min, x, {0: 2, 1: 3})
array([[ 0,  3],
       [12, 15]])
Excess elements (when an axis length is not a multiple of its coarsening factor) must be trimmed explicitly with trim_excess=True:
>>> x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
>>> coarsen(np.min, x, {0: 3}, trim_excess=True)
array([1, 4])
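For a single axis, coarsening amounts to applying the reduction to consecutive, non-overlapping blocks of `factor` elements. The plain-Python sketch below reproduces the 1-D examples; `coarsen_1d` is hypothetical, and raising ValueError when the length is not a multiple of the factor (and trim_excess is False) is this sketch's own choice.

```python
def coarsen_1d(reduction, x, factor, trim_excess=False):
    """Apply `reduction` to consecutive, non-overlapping blocks of `factor` items."""
    excess = len(x) % factor
    if excess:
        if not trim_excess:
            raise ValueError(f"length {len(x)} is not a multiple of {factor}")
        x = x[:len(x) - excess]  # drop the trailing excess elements
    return [reduction(x[i:i + factor]) for i in range(0, len(x), factor)]


print(coarsen_1d(sum, [1, 2, 3, 4, 5, 6], 2))  # [3, 7, 11]
print(coarsen_1d(max, [1, 2, 3, 4, 5, 6], 3))  # [3, 6]
print(coarsen_1d(min, [1, 2, 3, 4, 5, 6, 7, 8], 3, trim_excess=True))  # [1, 4]
```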
dask.array.ceil(x[, out])
Return the ceiling of the input, element-wise.
The ceil of the scalar x is the smallest integer i, such that i >= x. It is often denoted as \(\lceil x \rceil\).
Parameters: x (array_like) – Input data.
Returns: y – The ceiling of each element in x, with float dtype.
Return type: ndarray or scalar
Examples
>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> np.ceil(a)
array([-1., -1., -0.,  1.,  2.,  2.,  2.])
dask.array.choose(a, choices)
Construct an array from an index array and a set of arrays to choose from.
First of all, if confused or uncertain, definitely look at the Examples - in its full generality, this function is less simple than it might seem from the following code description (below ndi = numpy.lib.index_tricks):
np.choose(a,c) == np.array([c[a[I]][I] for I in ndi.ndindex(a.shape)]).
But this omits some subtleties. Here is a fully general summary:
Given an “index” array (a) of integers and a sequence of n arrays (choices), a and each choice array are first broadcast, as necessary, to arrays of a common shape; calling these Ba and Bchoices[i], i = 0,...,n-1 we have that, necessarily, Ba.shape == Bchoices[i].shape for each i. Then, a new array with shape Ba.shape is created as follows:
- if mode=raise (the default), then, first of all, each element of a (and thus Ba) must be in the range [0, n-1]; now, suppose that i (in that range) is the value at the (j0, j1, ..., jm) position in Ba - then the value at the same position in the new array is the value in Bchoices[i] at that same position;
- if mode=wrap, values in a (and thus Ba) may be any (signed) integer; modular arithmetic is used to map integers outside the range [0, n-1] back into that range; and then the new array is constructed as above;
- if mode=clip, values in a (and thus Ba) may be any (signed) integer; negative integers are mapped to 0; values greater than n-1 are mapped to n-1; and then the new array is constructed as above.
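The three index-handling modes can be modeled in plain Python for the simple case where `indices` and every choice array are already 1-D and of the same length, so no broadcasting is needed. `choose_1d` is a hypothetical helper, not the dask or NumPy implementation.

```python
def choose_1d(indices, choices, mode="raise"):
    """Pick choices[i][pos] for each position pos, where i = indices[pos]."""
    if mode not in ("raise", "wrap", "clip"):
        raise ValueError(f"unknown mode {mode!r}")
    n = len(choices)
    out = []
    for pos, i in enumerate(indices):
        if mode == "wrap":
            i = i % n                   # Python's % is non-negative for positive n
        elif mode == "clip":
            i = min(max(i, 0), n - 1)   # negatives -> 0, too large -> n - 1
        elif not 0 <= i < n:
            raise ValueError(f"index {i} out of range [0, {n - 1}]")
        out.append(choices[i][pos])     # same position, chosen array
    return out


choices = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]]
print(choose_1d([2, 3, 1, 0], choices))               # [20, 31, 12, 3]
print(choose_1d([2, 4, 1, 0], choices, mode="clip"))  # [20, 31, 12, 3]
print(choose_1d([2, 4, 1, 0], choices, mode="wrap"))  # [20, 1, 12, 3]
```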
Parameters: - a (int array) – This array must contain integers in [0, n-1], where n is the number of choices, unless mode=wrap or mode=clip, in which cases any integers are permissible.
- choices (sequence of arrays) – Choice arrays. a and all of the choices must be broadcastable to the same shape. If choices is itself an array (not recommended), then its outermost dimension (i.e., the one corresponding to choices.shape[0]) is taken as defining the “sequence”.
- out (array, optional) – If provided, the result will be inserted into this array. It should be of the appropriate shape and dtype.
- mode ({'raise' (default), 'wrap', 'clip'}, optional) –
Specifies how indices outside [0, n-1] will be treated:
- ‘raise’ : an exception is raised
- ‘wrap’ : value becomes value mod n
- ‘clip’ : values < 0 are mapped to 0, values > n-1 are mapped to n-1
Returns: merged_array – The merged result.
Raises: ValueError: shape mismatch – If a and each choice array are not all broadcastable to the same shape.
See also
ndarray.choose()
- equivalent method
Notes
To reduce the chance of misinterpretation, even though the following “abuse” is nominally supported, choices should neither be, nor be thought of as, a single array, i.e., the outermost sequence-like container should be either a list or a tuple.
Examples
>>> choices = [[0, 1, 2, 3], [10, 11, 12, 13],
...   [20, 21, 22, 23], [30, 31, 32, 33]]
>>> np.choose([2, 3, 1, 0], choices
... # the first element of the result will be the first element of the
... # third (2+1) "array" in choices, namely, 20; the second element
... # will be the second element of the fourth (3+1) choice array, i.e.,
... # 31, etc.
... )
array([20, 31, 12,  3])
>>> np.choose([2, 4, 1, 0], choices, mode='clip') # 4 goes to 3 (4-1)
array([20, 31, 12,  3])
>>> # because there are 4 choice arrays
>>> np.choose([2, 4, 1, 0], choices, mode='wrap') # 4 goes to (4 mod 4)
array([20,  1, 12,  3])
>>> # i.e., 0
A couple of examples illustrating how choose broadcasts:
>>> a = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
>>> choices = [-10, 10]
>>> np.choose(a, choices)
array([[ 10, -10,  10],
       [-10,  10, -10],
       [ 10, -10,  10]])
>>> # With thanks to Anne Archibald
>>> a = np.array([0, 1]).reshape((2,1,1))
>>> c1 = np.array([1, 2, 3]).reshape((1,3,1))
>>> c2 = np.array([-1, -2, -3, -4, -5]).reshape((1,1,5))
>>> np.choose(a, (c1, c2)) # result is 2x3x5, res[0,:,:]=c1, res[1,:,:]=c2
array([[[ 1,  1,  1,  1,  1],
        [ 2,  2,  2,  2,  2],
        [ 3,  3,  3,  3,  3]],
       [[-1, -2, -3, -4, -5],
        [-1, -2, -3, -4, -5],
        [-1, -2, -3, -4, -5]]])
dask.array.clip(*args, **kwargs)
Clip (limit) the values in an array.
Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.
Parameters: - a (array_like) – Array containing elements to clip.
- a_min (scalar or array_like) – Minimum value.
- a_max (scalar or array_like) – Maximum value. If a_min or a_max are array_like, then they will be broadcast to the shape of a.
- out (ndarray, optional) – The results will be placed in this array. It may be the input array for in-place clipping. out must be of the right shape to hold the output. Its type is preserved.
Returns: clipped_array – An array with the elements of a, but where values < a_min are replaced with a_min, and those > a_max with a_max.
Return type: ndarray
See also
numpy.doc.ufuncs()
- Section “Output arguments”
Examples
>>> a = np.arange(10)
>>> np.clip(a, 1, 8)
array([1, 1, 2, 3, 4, 5, 6, 7, 8, 8])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, 3, 6, out=a)
array([3, 3, 3, 3, 4, 5, 6, 6, 6, 6])
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.clip(a, [3,4,1,1,1,4,4,4,4,4], 8)
array([3, 4, 2, 3, 4, 5, 6, 7, 8, 8])
dask.array.compress(condition, a, axis=None)
Return selected slices of an array along given axis.
When working along a given axis, a slice along that axis is returned in output for each index where condition evaluates to True. When working on a 1-D array, compress is equivalent to extract.
Parameters: - condition (1-D array of bools) – Array that selects which entries to return. If len(condition) is less than the size of a along the given axis, then output is truncated to the length of the condition array.
- a (array_like) – Array from which to extract a part.
- axis (int, optional) – Axis along which to take slices. If None (default), work on the flattened array.
- out (ndarray, optional) – Output array. Its type is preserved and it must be of the right shape to hold the output.
Returns: compressed_array – A copy of a without the slices along axis for which condition is false.
Return type: ndarray
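The selection rule, including the truncation when condition is shorter than the axis, maps directly onto Python's `zip`, which stops at the shorter input. A minimal sketch with hypothetical helpers for axis=0 on a 2-D nested list and for the flattened (axis=None) case; unlike the real function, it does not check a condition that is longer than the axis.

```python
def compress_rows(condition, rows):
    """Keep rows (axis=0 slices) where condition is truthy; zip truncates to len(condition)."""
    return [row for keep, row in zip(condition, rows) if keep]


def compress_flat(condition, rows):
    """axis=None behavior: select individual elements of the flattened array."""
    flat = [v for row in rows for v in row]
    return [v for keep, v in zip(condition, flat) if keep]


a = [[1, 2], [3, 4], [5, 6]]
print(compress_rows([0, 1], a))               # [[3, 4]]  (third row not considered)
print(compress_rows([False, True, True], a))  # [[3, 4], [5, 6]]
print(compress_flat([False, True], a))        # [2]
```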
See also
take(), choose(), diag(), diagonal(), select()
ndarray.compress()
- Equivalent method in ndarray
np.extract()
- Equivalent method when working on 1-D arrays
numpy.doc.ufuncs()
- Section “Output arguments”
Examples
>>> a = np.array([[1, 2], [3, 4], [5, 6]])
>>> a
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.compress([0, 1], a, axis=0)
array([[3, 4]])
>>> np.compress([False, True, True], a, axis=0)
array([[3, 4],
       [5, 6]])
>>> np.compress([False, True], a, axis=1)
array([[2],
       [4],
       [6]])
Working on the flattened array does not return slices along an axis but selects elements.
>>> np.compress([False, True], a)
array([2])
dask.array.concatenate(seq, axis=0, allow_unknown_chunksizes=False)
Concatenate arrays along an existing axis.
Given a sequence of dask Arrays, form a new dask Array by stacking them along an existing dimension (axis=0 by default).
Parameters: - seq (list of dask.arrays) – The arrays to concatenate.
- axis (int) – Dimension along which to align all of the arrays.
- allow_unknown_chunksizes (bool) – Allow unknown chunksizes, such as those that come from converting dask dataframes. Dask.array is unable to verify that chunks line up. If data comes from differently aligned sources then this can cause unexpected results.
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.concatenate(data, axis=0)
>>> x.shape
(12, 4)
>>> da.concatenate(data, axis=1).shape
(4, 12)
Result is a new dask Array
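A dask array records its block layout as a `chunks` tuple (one tuple of block lengths per axis), and the shapes in the example follow from simple bookkeeping on those tuples: block lengths along the concatenation axis are appended, the others carry over. The stdlib sketch below models only that bookkeeping with hypothetical helpers. It simply rejects mismatched chunks on the other axes, whereas dask itself may rechunk inputs to a common layout, so treat it as an illustration rather than dask's logic.

```python
def concat_chunks(chunks_list, axis=0):
    """Combine per-array chunk tuples as concatenation along `axis` would (model only)."""
    first = chunks_list[0]
    for chunks in chunks_list[1:]:
        for ax, (c0, c1) in enumerate(zip(first, chunks)):
            if ax != axis and c0 != c1:
                raise ValueError(f"chunks differ on axis {ax}: {c0} vs {c1}")
    joined = tuple(c for chunks in chunks_list for c in chunks[axis])
    return tuple(joined if ax == axis else first[ax] for ax in range(len(first)))


def shape_of(chunks):
    """The array shape is the sum of block lengths along each axis."""
    return tuple(sum(c) for c in chunks)


block = ((2, 2), (2, 2))  # chunks of a (4, 4) array split into 2x2 blocks
print(concat_chunks([block] * 3, axis=0))            # ((2, 2, 2, 2, 2, 2), (2, 2))
print(shape_of(concat_chunks([block] * 3, axis=0)))  # (12, 4)
print(shape_of(concat_chunks([block] * 3, axis=1)))  # (4, 12)
```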
dask.array.conj()
conjugate(x[, out])
Return the complex conjugate, element-wise.
The complex conjugate of a complex number is obtained by changing the sign of its imaginary part.
Parameters: x (array_like) – Input value.
Returns: y – The complex conjugate of x, with same dtype as x.
Return type: ndarray
Examples
>>> np.conjugate(1+2j)
(1-2j)
>>> x = np.eye(2) + 1j * np.eye(2)
>>> np.conjugate(x)
array([[ 1.-1.j,  0.-0.j],
       [ 0.-0.j,  1.-1.j]])
dask.array.copysign(x1, x2[, out])
Change the sign of x1 to that of x2, element-wise.
If both arguments are arrays or sequences, they have to be of the same length. If x2 is a scalar, its sign will be copied to all elements of x1.
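For scalars, Python's `math.copysign(x, y)` applies the same rule, which makes the two cases above (one scalar sign for every element, or element-wise signs from a same-length sequence) easy to try with list comprehensions. This is a stdlib analogy, not the ufunc itself; note that it also distinguishes the sign of zero.

```python
import math

print(math.copysign(1.3, -1))        # -1.3
neg_zero = math.copysign(0.0, -1.0)  # -0.0: zero with the sign bit set
print(neg_zero, neg_zero == 0.0)     # -0.0 True (signed zeros compare equal)
print(math.copysign(1.0, neg_zero))  # -1.0: the sign is still observable

# A scalar x2: its sign is copied to every element of x1.
scalar_sign = [math.copysign(v, -1.1) for v in [-1, 0, 1]]
print(scalar_sign)   # [-1.0, -0.0, -1.0]

# Sequences of the same length: signs are copied element by element.
elementwise = [math.copysign(v, s) for v, s in zip([-1, 0, 1], [-1, 0, 1])]
print(elementwise)   # [-1.0, 0.0, 1.0]
```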
Parameters: - x1 (array_like) – Values to change the sign of.
- x2 (array_like) – The sign of x2 is copied to x1.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: out – The values of x1 with the sign of x2.
Examples
>>> np.copysign(1.3, -1)
-1.3
>>> 1/np.copysign(0, 1)
inf
>>> 1/np.copysign(0, -1)
-inf
>>> np.copysign([-1, 0, 1], -1.1)
array([-1., -0., -1.])
>>> np.copysign([-1, 0, 1], np.arange(3)-1)
array([-1.,  0.,  1.])
dask.array.corrcoef(x, y=None, rowvar=1)
Return Pearson product-moment correlation coefficients.
Please refer to the documentation for cov for more detail. The relationship between the correlation coefficient matrix, R, and the covariance matrix, C, is
\[R_{ij} = \frac{ C_{ij} }{ \sqrt{ C_{ii} C_{jj} } }\]
The values of R are between -1 and 1, inclusive.
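The formula can be checked directly in plain Python: build the covariance matrix C for a few variables (one list of observations per variable, matching the default rowvar behavior), then normalize each entry by sqrt(C_ii C_jj). The helper names are hypothetical; the N - 1 divisor used for C cancels in R, so any consistent normalization gives the same coefficients.

```python
import math


def covariance_matrix(variables):
    """Sample covariance (divide by N - 1) for equally long lists of observations."""
    n = len(variables[0])
    means = [sum(v) / n for v in variables]
    return [[sum((a - ma) * (b - mb) for a, b in zip(va, vb)) / (n - 1)
             for vb, mb in zip(variables, means)]
            for va, ma in zip(variables, means)]


def corrcoef_sketch(variables):
    """R_ij = C_ij / sqrt(C_ii * C_jj), computed from the covariance matrix C."""
    c = covariance_matrix(variables)
    k = len(c)
    return [[c[i][j] / math.sqrt(c[i][i] * c[j][j]) for j in range(k)] for i in range(k)]


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # perfectly correlated with x
z = [4.0, 3.0, 2.0, 1.0]   # perfectly anti-correlated with x
R = corrcoef_sketch([x, y, z])
for row in R:
    print([round(v, 6) for v in row])
```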
Parameters: - x (array_like) – A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.
- y (array_like, optional) – An additional set of variables and observations. y has the same shape as x.
- rowvar (int, optional) – If rowvar is non-zero (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
- bias (_NoValue, optional) –
Has no effect, do not use.
Deprecated since version 1.10.0.
- ddof (_NoValue, optional) –
Has no effect, do not use.
Deprecated since version 1.10.0.
Returns: R – The correlation coefficient matrix of the variables.
Return type: ndarray
See also
cov()
- Covariance matrix
Notes
Due to floating point rounding, the resulting array may not be Hermitian, the diagonal elements may not be 1, and the elements may not satisfy the inequality abs(a) <= 1. The real and imaginary parts are clipped to the interval [-1, 1] in an attempt to improve on that situation, but this does not help much in the complex case.
This function accepts but discards arguments bias and ddof. This is for backwards compatibility with previous versions of this function. These arguments had no effect on the return values of the function and can be safely ignored in this and previous versions of numpy.
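The relationship between R and C stated above, together with the clipping described in the Notes, can be sketched in plain Python. This is an illustration only: `corrcoef_from_cov` is a hypothetical helper (not part of dask.array or NumPy), and it assumes a real-valued covariance matrix given as nested lists.

```python
import math

def corrcoef_from_cov(c):
    """Hypothetical helper: convert a real covariance matrix (nested lists)
    into a correlation matrix using R[i][j] = C[i][j] / sqrt(C[i][i] * C[j][j])."""
    n = len(c)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = c[i][j] / math.sqrt(c[i][i] * c[j][j])
            # Clip to [-1, 1] to absorb floating point rounding, as the Notes describe.
            r[i][j] = max(-1.0, min(1.0, value))
    return r

# Covariance matrix of two perfectly anti-correlated variables (see the cov() examples).
print(corrcoef_from_cov([[1.0, -1.0], [-1.0, 1.0]]))
```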
-
dask.array.
cos
(x[, out])¶ Cosine element-wise.
Parameters: - x (array_like) – Input array in radians.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding cosine values.
Return type: ndarray
Raises: ValueError: invalid return array shape – if out is provided and out.shape != x.shape (See Examples)
Notes
If out is provided, the function writes the result into it, and returns a reference to out. (See Examples)
References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY: Dover, 1972.
Examples
>>> np.cos(np.array([0, np.pi/2, np.pi])) array([ 1.00000000e+00, 6.12303177e-17, -1.00000000e+00]) >>> >>> # Example of providing the optional output parameter >>> out1 = np.array([0], dtype='d') >>> out2 = np.cos([0.1], out1) >>> out2 is out1 True >>> >>> # Example of ValueError due to provision of shape-mismatched `out` >>> np.cos(np.zeros((3,3)),np.zeros((2,2))) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: invalid return array shape
-
dask.array.
cosh
(x[, out])¶ Hyperbolic cosine, element-wise.
Equivalent to 1/2 * (np.exp(x) + np.exp(-x)) and np.cos(1j*x).
Parameters: x (array_like) – Input array.
Returns: out – Output array of same shape as x.
Return type: ndarray
Examples
>>> np.cosh(0) 1.0
The hyperbolic cosine describes the shape of a hanging cable:
>>> import matplotlib.pyplot as plt >>> x = np.linspace(-4, 4, 1000) >>> plt.plot(x, np.cosh(x)) >>> plt.show()
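Both equivalences stated for cosh can be checked numerically with the Python standard library. This sketch uses `math` and `cmath` on a few real scalars rather than dask.array; the sample values are arbitrary.

```python
import cmath
import math

for x in [0.0, 0.5, -1.25, 3.0]:
    via_exp = 0.5 * (math.exp(x) + math.exp(-x))  # 1/2 * (e**x + e**-x)
    via_cos = cmath.cos(1j * x)                   # cos(ix); its imaginary part is zero for real x
    print(x, math.cosh(x), via_exp, via_cos.real)
```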
-
dask.array.
count_nonzero
(a)¶ Counts the number of non-zero values in the array a.
Parameters: a (array_like) – The array for which to count non-zeros.
Returns: count – Number of non-zero values in the array.
Return type: int or array of int
See also
nonzero()
- Return the coordinates of all the non-zero values.
Examples
>>> np.count_nonzero(np.eye(4)) 4 >>> np.count_nonzero([[0,1,7,0,0],[3,0,0,2,19]]) 5
-
dask.array.
cov
(m, y=None, rowvar=1, bias=0, ddof=None)¶ Estimate a covariance matrix, given data and weights.
Covariance indicates the level to which two variables vary together. If we examine N-dimensional samples, \(X = [x_1, x_2, ... x_N]^T\), then the covariance matrix element \(C_{ij}\) is the covariance of \(x_i\) and \(x_j\). The element \(C_{ii}\) is the variance of \(x_i\).
See the notes for an outline of the algorithm.
Parameters: - m (array_like) – A 1-D or 2-D array containing multiple variables and observations. Each row of m represents a variable, and each column a single observation of all those variables. Also see rowvar below.
- y (array_like, optional) – An additional set of variables and observations. y has the same form as that of m.
- rowvar (bool, optional) – If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
- bias (bool, optional) – Default normalization (False) is by (N - 1), where N is the number of observations given (unbiased estimate). If bias is True, then normalization is by N. These values can be overridden by using the keyword ddof in numpy versions >= 1.5.
- ddof (int, optional) –
If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified, and ddof=0 will return the simple average. See the notes for the details. The default value is None.
New in version 1.5.
- fweights (array_like, int, optional) –
1-D array of integer frequency weights; the number of times each observation vector should be repeated.
New in version 1.10.
- aweights (array_like, optional) –
1-D array of observation vector weights. These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0, the array of weights can be used to assign probabilities to observation vectors.
New in version 1.10.
Returns: out – The covariance matrix of the variables.
Return type: ndarray
See also
corrcoef()
- Normalized covariance matrix
Notes
Assume that the observations are in the columns of the observation array m and let f = fweights and a = aweights for brevity. The steps to compute the weighted covariance are as follows:
>>> w = f * a >>> v1 = np.sum(w) >>> v2 = np.sum(w * a) >>> m -= np.sum(m * w, axis=1, keepdims=True) / v1 >>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)
Note that when a == 1, the normalization factor v1 / (v1**2 - ddof * v2) goes over to 1 / (np.sum(f) - ddof) as it should.
Examples
Consider two variables, \(x_0\) and \(x_1\), which correlate perfectly, but in opposite directions:
>>> x = np.array([[0, 2], [1, 1], [2, 0]]).T >>> x array([[0, 1, 2], [2, 1, 0]])
Note how \(x_0\) increases while \(x_1\) decreases. The covariance matrix shows this clearly:
>>> np.cov(x) array([[ 1., -1.], [-1., 1.]])
Note that element \(C_{0,1}\), which shows the correlation between \(x_0\) and \(x_1\), is negative.
Further, note how x and y are combined:
>>> x = [-2.1, -1, 4.3] >>> y = [3, 1.1, 0.12] >>> X = np.vstack((x,y)) >>> print(np.cov(X)) [[ 11.71 -4.286 ] [ -4.286 2.14413333]] >>> print(np.cov(x, y)) [[ 11.71 -4.286 ] [ -4.286 2.14413333]] >>> print(np.cov(x)) 11.71
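The weighted-covariance steps from the Notes can be transcribed into plain Python to see each quantity (w, v1, v2, the centering step and the normalization factor) in isolation. This is a sketch, not dask's or NumPy's implementation: `weighted_cov` is a hypothetical name, it assumes observations lie in the columns (rowvar true) and that the data are nested lists of real numbers.

```python
def weighted_cov(m, f=None, a=None, ddof=1):
    """Hypothetical pure-Python transcription of the weighted-covariance steps.
    m is a list of variables (rows); each row holds the same N observations.
    f are integer frequency weights, a are observation weights."""
    n_obs = len(m[0])
    f = [1] * n_obs if f is None else list(f)
    a = [1.0] * n_obs if a is None else list(a)
    w = [fi * ai for fi, ai in zip(f, a)]                 # w = f * a
    v1 = sum(w)                                           # v1 = np.sum(w)
    v2 = sum(wi * ai for wi, ai in zip(w, a))             # v2 = np.sum(w * a)
    # m -= weighted mean of each row (variable)
    centered = []
    for row in m:
        mean = sum(x * wi for x, wi in zip(row, w)) / v1
        centered.append([x - mean for x in row])
    factor = v1 / (v1 ** 2 - ddof * v2)                   # normalization factor
    k = len(m)
    # cov = np.dot(m * w, m.T) * factor
    return [[sum(ci * wi * cj for ci, wi, cj in zip(centered[i], w, centered[j])) * factor
             for j in range(k)]
            for i in range(k)]

print(weighted_cov([[0, 1, 2], [2, 1, 0]]))
print(weighted_cov([[-2.1, -1, 4.3], [3, 1.1, 0.12]]))
```

With all weights equal to 1 the factor reduces to 1 / (N - ddof), matching the unweighted examples above; integer frequency weights behave like repeating observations.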
-
dask.array.
cumprod
(x, axis=None, dtype=None, out=None)¶ Return the cumulative product of elements along a given axis.
Parameters: - a (array_like) – Input array.
- axis (int, optional) – Axis along which the cumulative product is computed. By default the input is flattened.
- dtype (dtype, optional) – Type of the returned array, as well as of the accumulator in which the elements are multiplied. If dtype is not specified, it defaults to the dtype of a, unless a has an integer dtype with a precision less than that of the default platform integer. In that case, the default platform integer is used instead.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type of the resulting values will be cast if necessary.
Returns: cumprod – A new array holding the result is returned unless out is specified, in which case a reference to out is returned.
Return type: ndarray
See also
numpy.doc.ufuncs()
- Section “Output arguments”
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
Examples
>>> a = np.array([1,2,3]) >>> np.cumprod(a) # intermediate results 1, 1*2 ... # total product 1*2*3 = 6 array([1, 2, 6]) >>> a = np.array([[1, 2, 3], [4, 5, 6]]) >>> np.cumprod(a, dtype=float) # specify type of output array([ 1., 2., 6., 24., 120., 720.])
The cumulative product for each column (i.e., over the rows) of a:
>>> np.cumprod(a, axis=0) array([[ 1, 2, 3], [ 4, 10, 18]])
The cumulative product for each row (i.e. over the columns) of a:
>>> np.cumprod(a,axis=1) array([[ 1, 2, 6], [ 4, 20, 120]])
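The note that integer arithmetic is modular can be illustrated by simulating a fixed-width accumulator in plain Python, whose own integers never overflow. This sketch assumes a signed 64-bit integer dtype; `wrap_int64` and `cumprod_int64` are hypothetical helpers, not library functions.

```python
from itertools import accumulate

def wrap_int64(value):
    """Map a Python int onto the signed 64-bit range (two's-complement wraparound)."""
    return (value + 2**63) % 2**64 - 2**63

def cumprod_int64(values):
    # Running product, wrapped after every multiplication like an int64 accumulator.
    return list(accumulate(values, lambda acc, x: wrap_int64(acc * x)))

print(cumprod_int64([1, 2, 3]))      # ordinary case
print(cumprod_int64([2**62, 2, 2]))  # overflows silently: no error is raised
```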
-
dask.array.
cumsum
(x, axis=None, dtype=None, out=None)¶ Return the cumulative sum of the elements along a given axis.
Parameters: - a (array_like) – Input array.
- axis (int, optional) – Axis along which the cumulative sum is computed. The default (None) is to compute the cumsum over the flattened array.
- dtype (dtype, optional) – Type of the returned array and of the accumulator in which the elements are summed. If dtype is not specified, it defaults to the dtype of a, unless a has an integer dtype with a precision less than that of the default platform integer. In that case, the default platform integer is used.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type will be cast if necessary. See doc.ufuncs (Section “Output arguments”) for more details.
Returns: cumsum_along_axis – A new array holding the result is returned unless out is specified, in which case a reference to out is returned. The result has the same size as a, and the same shape as a if axis is not None or a is a 1-d array.
Return type: ndarray
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
Examples
>>> a = np.array([[1,2,3], [4,5,6]]) >>> a array([[1, 2, 3], [4, 5, 6]]) >>> np.cumsum(a) array([ 1, 3, 6, 10, 15, 21]) >>> np.cumsum(a, dtype=float) # specifies type of output value(s) array([ 1., 3., 6., 10., 15., 21.])
>>> np.cumsum(a,axis=0) # sum over rows for each of the 3 columns array([[1, 2, 3], [5, 7, 9]]) >>> np.cumsum(a,axis=1) # sum over columns for each of the 2 rows array([[ 1, 3, 6], [ 4, 9, 15]])
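The three axis choices in these examples can be mirrored on nested lists with `itertools.accumulate`. This is a plain-Python illustration of the semantics, not how dask.array computes cumulative sums across chunks.

```python
from itertools import accumulate

a = [[1, 2, 3], [4, 5, 6]]

# axis=None: flatten first, then accumulate.
flat = list(accumulate(x for row in a for x in row))
# axis=1: accumulate within each row.
along_rows = [list(accumulate(row)) for row in a]
# axis=0: accumulate down each column, then transpose back to rows.
columns = [list(accumulate(col)) for col in zip(*a)]
along_cols = [list(r) for r in zip(*columns)]

print(flat, along_rows, along_cols)
```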
-
dask.array.
deg2rad
(x[, out])¶ Convert angles from degrees to radians.
Parameters: x (array_like) – Angles in degrees.
Returns: y – The corresponding angle in radians.
Return type: ndarray
See also
rad2deg()
- Convert angles from radians to degrees.
unwrap()
- Remove large jumps in angle by wrapping.
Notes
New in version 1.3.0.
deg2rad(x) is x * pi / 180.
Examples
>>> np.deg2rad(180) 3.1415926535897931
-
dask.array.
degrees
(x[, out])¶ Convert angles from radians to degrees.
Parameters: - x (array_like) – Input array in radians.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding degree values; if out was supplied this is a reference to it.
Return type: ndarray of floats
See also
rad2deg()
- equivalent function
Examples
Convert a radian array to degrees
>>> rad = np.arange(12.)*np.pi/6 >>> np.degrees(rad) array([ 0., 30., 60., 90., 120., 150., 180., 210., 240., 270., 300., 330.])
>>> out = np.zeros((rad.shape)) >>> r = np.degrees(rad, out) >>> np.all(r == out) True
-
dask.array.
diag
(v)¶ Extract a diagonal or construct a diagonal array.
See the more detailed documentation for numpy.diagonal if you use this function to extract a diagonal and wish to write to the resulting array; whether it returns a copy or a view depends on what version of numpy you are using.
Parameters: - v (array_like) – If v is a 2-D array, return a copy of its k-th diagonal. If v is a 1-D array, return a 2-D array with v on the k-th diagonal.
- k (int, optional) – Diagonal in question. The default is 0. Use k>0 for diagonals above the main diagonal, and k<0 for diagonals below the main diagonal.
Returns: out – The extracted diagonal or constructed diagonal array.
Return type: ndarray
Examples
>>> x = np.arange(9).reshape((3,3)) >>> x array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
>>> np.diag(x) array([0, 4, 8]) >>> np.diag(x, k=1) array([1, 5]) >>> np.diag(x, k=-1) array([3, 7])
>>> np.diag(np.diag(x)) array([[0, 0, 0], [0, 4, 0], [0, 0, 8]])
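The two modes of diag (extract the k-th diagonal from a 2-D input, or build a 2-D array from a 1-D input) can be sketched on nested lists. `diag` here is a hypothetical pure-Python stand-in written for illustration; it is not the dask.array function.

```python
def diag(v, k=0):
    """Hypothetical sketch of diag for nested lists.
    2-D input: return its k-th diagonal. 1-D input: build a square
    matrix with v on the k-th diagonal."""
    if v and isinstance(v[0], list):
        rows, cols = len(v), len(v[0])
        # Element (i, i + k) lies on the k-th diagonal.
        return [v[i][i + k] for i in range(rows) if 0 <= i + k < cols]
    n = len(v) + abs(k)
    out = [[0] * n for _ in range(n)]
    for i, value in enumerate(v):
        row, col = (i, i + k) if k >= 0 else (i - k, i)
        out[row][col] = value
    return out

x = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(diag(x), diag(x, k=1), diag(x, k=-1))
print(diag(diag(x)))
```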
-
dask.array.
diff
(a, n=1, axis=-1)¶ Calculate the n-th discrete difference along given axis.
The first difference is given by out[n] = a[n+1] - a[n] along the given axis; higher differences are calculated by using diff recursively.
Parameters: - a (array_like) – Input array.
- n (int, optional) – The number of times values are differenced.
- axis (int, optional) – The axis along which the difference is taken; default is the last axis.
Returns: diff – The n-th differences. The shape of the output is the same as a except along axis, where the dimension is smaller by n.
Return type: ndarray
See also
gradient, ediff1d, cumsum
Examples
>>> x = np.array([1, 2, 4, 7, 0]) >>> np.diff(x) array([ 1, 2, 3, -7]) >>> np.diff(x, n=2) array([ 1, 1, -10])
>>> x = np.array([[1, 3, 6, 10], [0, 5, 6, 8]]) >>> np.diff(x) array([[2, 3, 4], [5, 1, 2]]) >>> np.diff(x, axis=0) array([[-1, 2, 0, -2]])
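The recursive definition of the n-th difference can be sketched for a 1-D list: apply out[i] = a[i+1] - a[i] n times, losing one element per pass. The function below is a hypothetical illustration that handles only the 1-D case (no axis argument).

```python
def diff(a, n=1):
    """Hypothetical 1-D sketch: apply out[i] = a[i+1] - a[i] n times."""
    for _ in range(n):
        a = [nxt - cur for cur, nxt in zip(a, a[1:])]
    return a

print(diff([1, 2, 4, 7, 0]))       # first difference
print(diff([1, 2, 4, 7, 0], n=2))  # second difference, one element shorter
```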
-
dask.array.
digitize
(x, bins, right=False)¶ Return the indices of the bins to which each value in input array belongs.
Each index i returned is such that bins[i-1] <= x < bins[i] if bins is monotonically increasing, or bins[i-1] > x >= bins[i] if bins is monotonically decreasing. If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate. If right is True, then the right bin is closed so that the index i is such that bins[i-1] < x <= bins[i] or bins[i-1] >= x > bins[i] if bins is monotonically increasing or decreasing, respectively.
Parameters: - x (array_like) – Input array to be binned. Prior to NumPy 1.10.0, this array had to be 1-dimensional, but it can now have any shape.
- bins (array_like) – Array of bins. It has to be 1-dimensional and monotonic.
- right (bool, optional) – Indicating whether the intervals include the right or the left bin edge. Default behavior is (right==False), indicating that the interval does not include the right edge. The left bin end is closed in this case, i.e., bins[i-1] <= x < bins[i] is the default behavior for monotonically increasing bins.
Returns: out – Output array of indices, of same shape as x.
Return type: ndarray of ints
Raises: - ValueError – If bins is not monotonic.
- TypeError – If the type of the input is complex.
Notes
If values in x are such that they fall outside the bin range, attempting to index bins with the indices that digitize returns will result in an IndexError.
New in version 1.10.0.
np.digitize is implemented in terms of np.searchsorted. This means that a binary search is used to bin the values, which scales much better for a larger number of bins than the previous linear search. It also removes the requirement for the input array to be 1-dimensional.
Examples
>>> x = np.array([0.2, 6.4, 3.0, 1.6]) >>> bins = np.array([0.0, 1.0, 2.5, 4.0, 10.0]) >>> inds = np.digitize(x, bins) >>> inds array([1, 4, 3, 2]) >>> for n in range(x.size): ... print(bins[inds[n]-1], "<=", x[n], "<", bins[inds[n]]) ... 0.0 <= 0.2 < 1.0 4.0 <= 6.4 < 10.0 2.5 <= 3.0 < 4.0 1.0 <= 1.6 < 2.5
>>> x = np.array([1.2, 10.0, 12.4, 15.5, 20.]) >>> bins = np.array([0, 5, 10, 15, 20]) >>> np.digitize(x,bins,right=True) array([1, 2, 3, 4, 4]) >>> np.digitize(x,bins,right=False) array([1, 3, 3, 4, 5])
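Since digitize is a binary search, its behavior for monotonically increasing bins can be reproduced with the standard `bisect` module: right=False corresponds to `bisect_right` and right=True to `bisect_left`. This sketch covers increasing bins only, and `digitize` here is a hypothetical stand-in, not the dask.array function.

```python
from bisect import bisect_left, bisect_right

def digitize(xs, bins, right=False):
    """Hypothetical sketch for monotonically increasing bins only.
    right=False: bins[i-1] <= x <  bins[i]  -> bisect_right
    right=True:  bins[i-1] <  x <= bins[i]  -> bisect_left"""
    search = bisect_left if right else bisect_right
    return [search(bins, x) for x in xs]

print(digitize([0.2, 6.4, 3.0, 1.6], [0.0, 1.0, 2.5, 4.0, 10.0]))
print(digitize([1.2, 10.0, 12.4, 15.5, 20.0], [0, 5, 10, 15, 20], right=True))
```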
-
dask.array.
dot
(a, b, out=None)¶ Dot product of two arrays.
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
Parameters: - a (array_like) – First argument.
- b (array_like) – Second argument.
- out (ndarray, optional) – Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
Returns: output – Returns the dot product of a and b. If a and b are both scalars or both 1-D arrays then a scalar is returned; otherwise an array is returned. If out is given, then it is returned.
Return type: ndarray
Raises: ValueError – If the last dimension of a is not the same size as the second-to-last dimension of b.
See also
vdot()
- Complex-conjugating dot product.
tensordot()
- Sum products over arbitrary axes.
einsum()
- Einstein summation convention.
matmul()
- ‘@’ operator as method with out parameter.
Examples
>>> np.dot(3, 4) 12
Neither argument is complex-conjugated:
>>> np.dot([2j, 3j], [2j, 3j]) (-13+0j)
For 2-D arrays it is the matrix product:
>>> a = [[1, 0], [0, 1]] >>> b = [[4, 1], [2, 2]] >>> np.dot(a, b) array([[4, 1], [2, 2]])
>>> a = np.arange(3*4*5*6).reshape((3,4,5,6)) >>> b = np.arange(3*4*5*6)[::-1].reshape((5,4,6,3)) >>> np.dot(a, b)[2,3,2,1,2,2] 499128 >>> sum(a[2,3,2,:] * b[1,2,:,2]) 499128
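For 2-D inputs, "sum product over the last axis of a and the second-to-last of b" is ordinary matrix multiplication. The sketch below spells that out on nested lists and raises ValueError on mismatched dimensions, mirroring the documented behavior; `dot_2d` is a hypothetical helper and handles only the 2-D case.

```python
def dot_2d(a, b):
    """Hypothetical sketch: 2-D dot product of nested lists, summing over
    the last axis of a and the second-to-last (here: first) axis of b."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        # Mirrors the documented ValueError for mismatched dimensions.
        raise ValueError("last dimension of a must match second-to-last dimension of b")
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

print(dot_2d([[1, 0], [0, 1]], [[4, 1], [2, 2]]))
print(dot_2d([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```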
-
dask.array.
dstack
(tup)¶ Stack arrays in sequence depth wise (along third axis).
Takes a sequence of arrays and stacks them along the third axis to make a single array. Rebuilds arrays divided by dsplit. This is a simple way to stack 2D arrays (images) into a single 3D array for processing.
Parameters: tup (sequence of arrays) – Arrays to stack. All of them must have the same shape along all but the third axis.
Returns: stacked – The array formed by stacking the given arrays.
Return type: ndarray
See also
stack()
- Join a sequence of arrays along a new axis.
vstack()
- Stack along first axis.
hstack()
- Stack along second axis.
concatenate()
- Join a sequence of arrays along an existing axis.
dsplit()
- Split array along third axis.
Notes
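Equivalent to np.concatenate(tup, axis=2) after 1-D arrays of shape (N,) have been reshaped to (1, N, 1) and 2-D arrays of shape (M, N) to (M, N, 1).
Examples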
>>> a = np.array((1,2,3)) >>> b = np.array((2,3,4)) >>> np.dstack((a,b)) array([[[1, 2], [2, 3], [3, 4]]])
>>> a = np.array([[1],[2],[3]]) >>> b = np.array([[2],[3],[4]]) >>> np.dstack((a,b)) array([[[1, 2]], [[2, 3]], [[3, 4]]])
-
dask.array.
ediff1d
(ary, to_end=None, to_begin=None)¶ The differences between consecutive elements of an array.
Parameters: - ary (array_like) – If necessary, will be flattened before the differences are taken.
- to_end (array_like, optional) – Number(s) to append at the end of the returned differences.
- to_begin (array_like, optional) – Number(s) to prepend at the beginning of the returned differences.
Returns: ediff1d – The differences. Loosely, this is ary.flat[1:] - ary.flat[:-1].
Return type: ndarray
See also
diff(), gradient()
Notes
When applied to masked arrays, this function drops the mask information if the to_begin and/or to_end parameters are used.
Examples
>>> x = np.array([1, 2, 4, 7, 0]) >>> np.ediff1d(x) array([ 1, 2, 3, -7])
>>> np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99])) array([-99, 1, 2, 3, -7, 88, 99])
The returned array is always 1D.
>>> y = [[1, 2, 4], [1, 6, 24]] >>> np.ediff1d(y) array([ 1, 2, -3, 5, 18])
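The behavior shown in these examples (flatten first, take consecutive differences, then prepend to_begin and append to_end) can be sketched in plain Python. `ediff1d` below is a hypothetical stand-in for illustration that works on nested lists and scalars.

```python
def ediff1d(ary, to_end=None, to_begin=None):
    """Hypothetical sketch: flatten nested lists, take consecutive
    differences, then prepend to_begin and append to_end."""
    def flatten(x):
        if isinstance(x, (list, tuple)):
            for item in x:
                yield from flatten(item)
        else:
            yield x
    flat = list(flatten(ary))
    out = [b - a for a, b in zip(flat, flat[1:])]
    begin = list(flatten(to_begin)) if to_begin is not None else []
    end = list(flatten(to_end)) if to_end is not None else []
    return begin + out + end

print(ediff1d([1, 2, 4, 7, 0], to_begin=-99, to_end=[88, 99]))
print(ediff1d([[1, 2, 4], [1, 6, 24]]))  # the result is always 1-D
```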
-
dask.array.
empty
()¶ Blocked variant of empty
Follows the signature of empty exactly except that it also requires a keyword argument chunks=(...)
Original signature follows below.
empty(shape, dtype=float, order='C')
Return a new array of given shape and type, without initializing entries.
Parameters: - shape (int or tuple of int) – Shape of the empty array
- dtype (data-type, optional) – Desired output data-type.
- order ({'C', 'F'}, optional) – Whether to store multi-dimensional data in row-major (C-style) or column-major (Fortran-style) order in memory.
Returns: out – Array of uninitialized (arbitrary) data of the given shape, dtype, and order. Object arrays will be initialized to None.
Return type: ndarray
Notes
empty, unlike zeros, does not set the array values to zero, and may therefore be marginally faster. On the other hand, it requires the user to manually set all the values in the array, and should be used with caution.
Examples
>>> np.empty([2, 2]) array([[ -9.74499359e+001, 6.69583040e-309], [ 2.13182611e-314, 3.06959433e-309]]) #random
>>> np.empty([2, 2], dtype=int) array([[-1073741821, -1067949133], [ 496041986, 19249760]]) #random
-
dask.array.
empty_like
(a, dtype=None, chunks=None)¶ Return a new array with the same shape and type as a given array.
Parameters: - a (array_like) – The shape and data-type of a define these same attributes of the returned array.
- dtype (data-type, optional) – Overrides the data type of the result.
- chunks (sequence of ints) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: out – Array of uninitialized (arbitrary) data with the same shape and type as a.
Return type: ndarray
See also
ones_like()
- Return an array of ones with shape and type of input.
zeros_like()
- Return an array of zeros with shape and type of input.
empty()
- Return a new uninitialized array.
ones()
- Return a new array setting values to one.
zeros()
- Return a new array setting values to zero.
Notes
This function does not initialize the returned array; to do that use zeros_like or ones_like instead. It may be marginally faster than the functions that do set the array values.
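The chunks description says the last block is shorter when the length is not a multiple of the chunk size. A hypothetical helper, `block_sizes` (not a dask function), spells out that arithmetic for one dimension with an integer chunk size; it illustrates the rule as stated rather than dask's internal chunk normalization.

```python
def block_sizes(length, chunk):
    """Hypothetical helper: split `length` samples into blocks of `chunk`
    samples; the last block is shorter when length % chunk != 0."""
    full, rest = divmod(length, chunk)
    return [chunk] * full + ([rest] if rest else [])

print(block_sizes(10, 4))  # last block holds the remaining 2 samples
print(block_sizes(8, 4))   # evenly divisible: all blocks are full
```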
-
dask.array.
exp
(x[, out])¶ Calculate the exponential of all elements in the input array.
Parameters: x (array_like) – Input values.
Returns: out – Output array, element-wise exponential of x.
Return type: ndarray
See also
expm1()
- Calculate exp(x) - 1 for all elements in the array.
exp2()
- Calculate 2**x for all elements in the array.
Notes
The irrational number e is also known as Euler’s number. It is approximately 2.718281, and is the base of the natural logarithm, ln (this means that, if \(x = \ln y = \log_e y\), then \(e^x = y\)). For real input, exp(x) is always positive.
For complex arguments, x = a + ib, we can write \(e^x = e^a e^{ib}\). The first term, \(e^a\), is already known (it is the real argument, described above). The second term, \(e^{ib}\), is \(\cos b + i \sin b\), a function with magnitude 1 and a periodic phase.
References
[1] Wikipedia, “Exponential function”, http://en.wikipedia.org/wiki/Exponential_function
[2] M. Abramowitz and I. A. Stegun, “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables,” Dover, 1964, p. 69, http://www.math.sfu.ca/~cbm/aands/page_69.htm
Examples
Plot the magnitude and phase of exp(x) in the complex plane:
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-2*np.pi, 2*np.pi, 100) >>> xx = x + 1j * x[:, np.newaxis] # a + ib over complex plane >>> out = np.exp(xx)
>>> plt.subplot(121) >>> plt.imshow(np.abs(out), ... extent=[-2*np.pi, 2*np.pi, -2*np.pi, 2*np.pi]) >>> plt.title('Magnitude of exp(x)')
>>> plt.subplot(122) >>> plt.imshow(np.angle(out), ... extent=[-2*np.pi, 2*np.pi, -2*np.pi, 2*np.pi]) >>> plt.title('Phase (angle) of exp(x)') >>> plt.show()
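The decomposition \(e^{a+ib} = e^a(\cos b + i \sin b)\) from the Notes can be checked for a single complex scalar with the standard library. The value of z is arbitrary; this is a numerical illustration, not dask code.

```python
import cmath
import math

z = complex(0.5, 2.0)                  # x = a + ib with a = 0.5, b = 2.0
a, b = z.real, z.imag
direct = cmath.exp(z)
split = math.exp(a) * complex(math.cos(b), math.sin(b))  # e**a * (cos b + i sin b)
print(direct, split)
print(abs(cmath.exp(1j * b)))          # magnitude of e**(ib) is 1
```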
-
dask.array.
expm1
(x[, out])¶ Calculate exp(x) - 1 for all elements in the array.
Parameters: x (array_like) – Input values.
Returns: out – Element-wise exponential minus one: out = exp(x) - 1.
Return type: ndarray
See also
log1p()
- log(1 + x), the inverse of expm1.
Notes
This function provides greater precision than exp(x) - 1 for small values of x.
Examples
The true value of exp(1e-10) - 1 is 1.00000000005e-10 to about 32 significant digits. This example shows the superiority of expm1 in this case.
>>> np.expm1(1e-10) 1.00000000005e-10 >>> np.exp(1e-10) - 1 1.000000082740371e-10
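The same cancellation effect can be reproduced with Python's `math.expm1` and `math.exp`. The reference value below uses the first three Taylor terms of exp(x) - 1, which is far more accurate than double precision for x = 1e-10.

```python
import math

x = 1e-10
reference = x + x**2 / 2 + x**3 / 6   # Taylor series of exp(x) - 1
accurate = math.expm1(x)              # computed without forming 1 + tiny
naive = math.exp(x) - 1               # subtracting nearly equal numbers loses digits
print(accurate, naive)
print(abs(accurate - reference) / reference, abs(naive - reference) / reference)
```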
-
dask.array.
eye
(N, chunks, M=None, k=0, dtype=<type 'float'>)¶ Return a 2-D Array with ones on the diagonal and zeros elsewhere.
Parameters: - N (int) – Number of rows in the output.
- chunks (int) – Chunk size of the resulting blocks.
- M (int, optional) – Number of columns in the output. If None, defaults to N.
- k (int, optional) – Index of the diagonal: 0 (the default) refers to the main diagonal, a positive value refers to an upper diagonal, and a negative value to a lower diagonal.
- dtype (data-type, optional) – Data-type of the returned array.
Returns: I – An array where all elements are equal to zero, except for the k-th diagonal, whose values are equal to one.
Return type: Array of shape (N,M)
-
dask.array.
fabs
(x[, out])¶ Compute the absolute values element-wise.
This function returns the absolute values (positive magnitude) of the data in x. Complex values are not handled; use absolute to find the absolute values of complex data.
Parameters: - x (array_like) – The array of numbers for which the absolute values are required. If x is a scalar, the result y will also be a scalar.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: y – The absolute values of x; the returned values are always floats.
Return type: ndarray or scalar
See also
absolute()
- Absolute values including complex types.
Examples
>>> np.fabs(-1) 1.0 >>> np.fabs([-1.2, 1.2]) array([ 1.2, 1.2])
-
dask.array.
fix
(*args, **kwargs)¶ Round to nearest integer towards zero.
Round an array of floats element-wise to nearest integer towards zero. The rounded values are returned as floats.
Parameters: - x (array_like) – An array of floats to be rounded
- y (ndarray, optional) – Output array
Returns: out – The array of rounded numbers
Return type: ndarray of floats
Examples
>>> np.fix(3.14) 3.0 >>> np.fix(3) 3.0 >>> np.fix([2.1, 2.9, -2.1, -2.9]) array([ 2., 2., -2., -2.])
-
dask.array.
flatnonzero
(a)¶ Return indices that are non-zero in the flattened version of a.
This is equivalent to a.ravel().nonzero()[0].
Parameters: a (ndarray) – Input array.
Returns: res – Output array, containing the indices of the elements of a.ravel() that are non-zero.
Return type: ndarray
Examples
>>> x = np.arange(-2, 3) >>> x array([-2, -1, 0, 1, 2]) >>> np.flatnonzero(x) array([0, 1, 3, 4])
Use the indices of the non-zero elements as an index array to extract these elements:
>>> x.ravel()[np.flatnonzero(x)] array([-2, -1, 1, 2])
-
dask.array.
floor
(x[, out])¶ Return the floor of the input, element-wise.
The floor of the scalar x is the largest integer i, such that i <= x. It is often denoted as \(\lfloor x \rfloor\).
Parameters: x (array_like) – Input data.
Returns: y – The floor of each element in x.
Return type: ndarray or scalar
Notes
Some spreadsheet programs calculate the “floor-towards-zero”, in other words floor(-2.5) == -2. NumPy instead uses the definition of floor where floor(-2.5) == -3.
Examples
>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0]) >>> np.floor(a) array([-2., -2., -1., 0., 1., 1., 2.])
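The difference between floor and “floor-towards-zero” (the rounding that fix performs) only shows up for negative non-integers. Python's `math.floor` and `math.trunc` illustrate the two conventions; note they return Python ints, whereas the array functions return floats.

```python
import math

values = [-2.5, -1.7, -0.2, 0.2, 1.5, 2.0]
floors = [math.floor(v) for v in values]  # largest integer <= v
truncs = [math.trunc(v) for v in values]  # rounds toward zero, like fix
print(floors)
print(truncs)
```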
-
dask.array.
fmax
(x1, x2[, out])¶ Element-wise maximum of array elements.
Compares two arrays and returns a new array containing the element-wise maxima. If one of the elements being compared is a NaN, then the non-NaN element is returned. If both elements are NaNs, then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being a NaN. The net effect is that NaNs are ignored when possible.
Parameters: x1, x2 (array_like) – The arrays holding the elements to be compared. They must have the same shape.
Returns: y – The maximum of x1 and x2, element-wise. Returns scalar if both x1 and x2 are scalars.
Return type: ndarray or scalar
Notes
New in version 1.3.0.
fmax is equivalent to np.where(x1 >= x2, x1, x2) when neither x1 nor x2 is a NaN, but it is faster and does proper broadcasting.
Examples
>>> np.fmax([2, 3, 4], [1, 5, 2]) array([2, 5, 4])
>>> np.fmax(np.eye(2), [0.5, 2]) array([[ 1. , 2. ], [ 0.5, 2. ]])
>>> np.fmax([np.nan, 0, np.nan],[0, np.nan, np.nan]) array([ 0., 0., nan])
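The NaN rules described for fmax (and, symmetrically, fmin) can be written out as scalar functions. These are hypothetical illustrations for real floats only; the complex-NaN case and broadcasting are not modeled.

```python
import math

def fmax_scalar(x1, x2):
    """Hypothetical scalar sketch of fmax for real numbers."""
    if math.isnan(x1) and math.isnan(x2):
        return x1                  # both NaN: the first is returned
    if math.isnan(x1):
        return x2                  # a single NaN operand is ignored
    if math.isnan(x2):
        return x1
    return x1 if x1 >= x2 else x2

def fmin_scalar(x1, x2):
    """Hypothetical scalar sketch of fmin for real numbers."""
    if math.isnan(x1) and math.isnan(x2):
        return x1
    if math.isnan(x1):
        return x2
    if math.isnan(x2):
        return x1
    return x1 if x1 <= x2 else x2

nan = float("nan")
print([fmax_scalar(a, b) for a, b in zip([nan, 0.0, nan], [0.0, nan, nan])])
print([fmin_scalar(a, b) for a, b in zip([2, 3, 4], [1, 5, 2])])
```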
-
dask.array.
fmin
(x1, x2[, out])¶ Element-wise minimum of array elements.
Compares two arrays and returns a new array containing the element-wise minima. If one of the elements being compared is a NaN, then the non-NaN element is returned. If both elements are NaNs, then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being a NaN. The net effect is that NaNs are ignored when possible.
Parameters: x1, x2 (array_like) – The arrays holding the elements to be compared. They must have the same shape.
Returns: y – The minimum of x1 and x2, element-wise. Returns scalar if both x1 and x2 are scalars.
Return type: ndarray or scalar
Notes
New in version 1.3.0.
fmin is equivalent to np.where(x1 <= x2, x1, x2) when neither x1 nor x2 is a NaN, but it is faster and does proper broadcasting.
Examples
>>> np.fmin([2, 3, 4], [1, 5, 2]) array([1, 3, 2])
>>> np.fmin(np.eye(2), [0.5, 2]) array([[ 0.5, 0. ], [ 0. , 1. ]])
>>> np.fmin([np.nan, 0, np.nan],[0, np.nan, np.nan]) array([ 0., 0., nan])
-
dask.array.
fmod
(x1, x2[, out])¶ Return the element-wise remainder of division.
This is the NumPy implementation of the C library function fmod: the remainder has the same sign as the dividend x1. It is equivalent to the Matlab(TM) rem function and should not be confused with the Python modulus operator x1 % x2.
Parameters: - x1 (array_like) – Dividend.
- x2 (array_like) – Divisor.
Returns: y – The remainder of the division of x1 by x2.
Return type: ndarray or scalar
See also
remainder()
- Equivalent to the Python % operator.
divide()
Notes
The result of the modulo operation for negative dividend and divisors is bound by conventions. For fmod, the sign of the result is the sign of the dividend, while for remainder the sign of the result is the sign of the divisor. The fmod function is equivalent to the Matlab(TM) rem function.
Examples
>>> np.fmod([-3, -2, -1, 1, 2, 3], 2) array([-1, 0, -1, 1, 0, 1]) >>> np.remainder([-3, -2, -1, 1, 2, 3], 2) array([1, 0, 1, 1, 0, 1])
>>> np.fmod([5, 3], [2, 2.]) array([ 1., 1.]) >>> a = np.arange(-3, 3).reshape(3, 2) >>> a array([[-3, -2], [-1, 0], [ 1, 2]]) >>> np.fmod(a, [2,2]) array([[-1, 0], [-1, 0], [ 1, 0]])
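The two sign conventions can be compared directly in the standard library: `math.fmod` follows the C convention (sign of the dividend), while Python's `%` follows the sign of the divisor, as remainder does. The inputs repeat the first example above.

```python
import math

dividends = [-3, -2, -1, 1, 2, 3]
fmod_results = [math.fmod(x, 2) for x in dividends]  # sign of the dividend
mod_results = [x % 2 for x in dividends]             # sign of the divisor
print(fmod_results)
print(mod_results)
```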
-
dask.array.
frexp
(x[, out1, out2])¶ Decompose the elements of x into mantissa and twos exponent.
Returns (mantissa, exponent), where x = mantissa * 2**exponent. The mantissa lies in the open interval (-1, 1), while the twos exponent is a signed integer.
Parameters: - x (array_like) – Array of numbers to be decomposed.
- out1 (ndarray, optional) – Output array for the mantissa. Must have the same shape as x.
- out2 (ndarray, optional) – Output array for the exponent. Must have the same shape as x.
Returns: (mantissa, exponent) – mantissa is a float array with values between -1 and 1. exponent is an int array which represents the exponent of 2.
Return type: tuple of ndarrays, (float, int)
See also
ldexp()
- Compute y = x1 * 2**x2, the inverse of frexp.
Notes
Complex dtypes are not supported; they will raise a TypeError.
Examples
>>> x = np.arange(9) >>> y1, y2 = np.frexp(x) >>> y1 array([ 0. , 0.5 , 0.5 , 0.75 , 0.5 , 0.625, 0.75 , 0.875, 0.5 ]) >>> y2 array([0, 1, 2, 2, 3, 3, 3, 3, 4]) >>> y1 * 2**y2 array([ 0., 1., 2., 3., 4., 5., 6., 7., 8.])
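Python's `math.frexp` and `math.ldexp` follow the same decomposition for scalars, so they can be used to check the example above element by element. This is a standard-library illustration, not the array function.

```python
import math

for x in range(9):
    mantissa, exponent = math.frexp(x)          # x == mantissa * 2**exponent
    assert x == mantissa * 2**exponent
    assert math.ldexp(mantissa, exponent) == x  # ldexp inverts frexp
    print(x, mantissa, exponent)
```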
-
dask.array.
fromfunction
(func, chunks=None, shape=None, dtype=None)¶ Construct an array by executing a function over each coordinate.
The resulting array therefore has a value fn(x, y, z) at coordinate (x, y, z).
Parameters: - function (callable) – The function is called with N parameters, where N is the rank of shape. Each parameter represents the coordinates of the array varying along a specific axis. For example, if shape were (2, 2), then the parameters would be array([[0, 0], [1, 1]]) and array([[0, 1], [0, 1]]).
- shape ((N,) tuple of ints) – Shape of the output array, which also determines the shape of the coordinate arrays passed to function.
- dtype (data-type, optional) – Data-type of the coordinate arrays passed to function. By default, dtype is float.
Returns: fromfunction – The result of the call to function is passed back directly. Therefore the shape of fromfunction is completely determined by function. If function returns a scalar value, the shape of fromfunction would match the shape parameter.
Return type: any
See also
indices(), meshgrid()
Notes
Keywords other than dtype are passed to function.
Examples
>>> np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int) array([[ True, False, False], [False, True, False], [False, False, True]], dtype=bool)
>>> np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int) array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
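NumPy's fromfunction calls function once, passing one coordinate array per axis, rather than calling it once per element. The sketch below builds those coordinate grids for a 2-D shape in plain Python and then evaluates i + j position by position to reproduce the second example. `coordinate_grids` is a hypothetical helper, and this is an illustration rather than dask's implementation.

```python
def coordinate_grids(shape):
    """Hypothetical helper for a 2-D shape: one grid per axis, where each
    grid holds that axis's index at every position."""
    rows, cols = shape
    i_grid = [[i for _ in range(cols)] for i in range(rows)]
    j_grid = [[j for j in range(cols)] for _ in range(rows)]
    return i_grid, j_grid

print(coordinate_grids((2, 2)))

i, j = coordinate_grids((3, 3))
# Element-wise i + j over the grids, like fromfunction(lambda i, j: i + j, (3, 3)).
sums = [[a + b for a, b in zip(ri, rj)] for ri, rj in zip(i, j)]
print(sums)
```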
-
dask.array.
full
(*args, **kwargs)¶ Blocked variant of full
Follows the signature of full exactly except that it also requires a keyword argument chunks=(...)
Original signature follows below.
Return a new array of given shape and type, filled with fill_value.
Parameters: - shape (int or sequence of ints) – Shape of the new array, e.g., (2, 3) or 2.
- fill_value (scalar) – Fill value.
- dtype (data-type, optional) – The desired data-type for the array, e.g., np.int8. Default is float, but will change to np.array(fill_value).dtype in a future release.
- order ({'C', 'F'}, optional) – Whether to store multidimensional data in C- or Fortran-contiguous (row- or column-wise) order in memory.
Returns: out – Array of fill_value with the given shape, dtype, and order.
Return type: ndarray
See also
zeros_like()
- Return an array of zeros with shape and type of input.
ones_like()
- Return an array of ones with shape and type of input.
empty_like()
- Return an empty array with shape and type of input.
full_like()
- Fill an array with shape and type of input.
zeros()
- Return a new array setting values to zero.
ones()
- Return a new array setting values to one.
empty()
- Return a new uninitialized array.
Examples
>>> np.full((2, 2), np.inf)
array([[ inf, inf],
       [ inf, inf]])
>>> np.full((2, 2), 10, dtype=int)
array([[10, 10],
       [10, 10]])
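A sketch of the blocked variant itself, assuming a recent dask release. The chunks keyword is passed explicitly, which satisfies the requirement stated above in any version:

```python
import numpy as np
import dask.array as da

# Same arguments as numpy.full, plus the chunks keyword that sets the block size.
x = da.full((4, 4), 7, dtype=np.int64, chunks=2)

print(x.chunks)       # ((2, 2), (2, 2)): four 2x2 blocks
values = x.compute()  # materialize as a NumPy array
```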
dask.array.full_like(a, fill_value, dtype=None, chunks=None)
Return a full array with the same shape and type as a given array.
Parameters: - a (array_like) – The shape and data-type of a define these same attributes of the returned array.
- fill_value (scalar) – Fill value.
- dtype (data-type, optional) – Overrides the data type of the result.
- chunks (sequence of ints) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: out – Array of fill_value with the same shape and type as a.
Return type: ndarray
See also
zeros_like()
- Return an array of zeros with shape and type of input.
ones_like()
- Return an array of ones with shape and type of input.
empty_like()
- Return an empty array with shape and type of input.
zeros()
- Return a new array setting values to zero.
ones()
- Return a new array setting values to one.
empty()
- Return a new uninitialized array.
full()
- Fill a new array.
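A short sketch of full_like on a dask array, assuming a recent dask release in which the result reuses the chunks of a unless chunks is given explicitly:

```python
import numpy as np
import dask.array as da

a = da.ones((4, 6), chunks=(2, 3))

b = da.full_like(a, 0.5)                 # same shape, dtype (float64) and chunks as a
c = da.full_like(a, 0.5, chunks=(4, 6))  # same shape, but a single block
d = da.full_like(a, 3, dtype=np.int32)   # dtype overridden

print(b.chunks, c.chunks, d.dtype)
```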
dask.array.histogram(a, bins=None, range=None, normed=False, weights=None, density=None)
Blocked variant of numpy.histogram.
Follows the signature of numpy.histogram exactly with the following exceptions:
- Either an iterable specifying the bins or the number of bins and a range argument is required, as computing min and max over blocked arrays is an expensive operation that must be performed explicitly.
- weights must be a dask.array.Array with the same block structure as a.
Examples
Using number of bins and range:
>>> import dask.array as da
>>> import numpy as np
>>> x = da.from_array(np.arange(10000), chunks=10)
>>> h, bins = da.histogram(x, bins=10, range=[0, 10000])
>>> bins
array([ 0., 1000., 2000., 3000., 4000., 5000., 6000., 7000., 8000., 9000., 10000.])
>>> h.compute()
array([1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000])
Explicitly specifying the bins:
>>> h, bins = da.histogram(x, bins=np.array([0, 5000, 10000]))
>>> bins
array([ 0, 5000, 10000])
>>> h.compute()
array([5000, 5000])
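The weights requirement can be met by deriving the weights from x itself, so that both share one block structure. A sketch, assuming a recent dask release:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(10000), chunks=1000)

# ones_like(x) inherits x's chunks, so the weights line up block for block.
w = da.ones_like(x) * 0.5

# bins and range are both given, so no separate min/max pass is needed.
h, bins = da.histogram(x, bins=10, range=[0, 10000], weights=w)
counts = h.compute()  # each of the 10 bins holds 1000 values, each weighted by 0.5
```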
dask.array.hstack(tup)
Stack arrays in sequence horizontally (column wise).
Take a sequence of arrays and stack them horizontally to make a single array. Rebuild arrays divided by hsplit.
Parameters: tup (sequence of ndarrays) – All arrays must have the same shape along all but the second axis.
Returns: stacked – The array formed by stacking the given arrays.
Return type: ndarray
See also
stack()
- Join a sequence of arrays along a new axis.
vstack()
- Stack arrays in sequence vertically (row wise).
dstack()
- Stack arrays in sequence depth wise (along third axis).
concatenate()
- Join a sequence of arrays along an existing axis.
hsplit()
- Split array along second axis.
Notes
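Equivalent to np.concatenate(tup, axis=1) when the arrays are at least 2-dimensional; 1-D arrays are instead concatenated along their only axis, as the first example shows.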
Examples
>>> a = np.array((1,2,3))
>>> b = np.array((2,3,4))
>>> np.hstack((a,b))
array([1, 2, 3, 2, 3, 4])
>>> a = np.array([[1],[2],[3]])
>>> b = np.array([[2],[3],[4]])
>>> np.hstack((a,b))
array([[1, 2],
       [2, 3],
       [3, 4]])
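The same calls work on dask arrays. A sketch, assuming a recent dask release, showing that the blocks of the inputs are laid side by side along the second axis and that 1-D inputs are joined end to end:

```python
import numpy as np
import dask.array as da

a = da.ones((2, 3), chunks=2)   # blocks along axis 1: (2, 1)
b = da.zeros((2, 2), chunks=2)  # blocks along axis 1: (2,)

c = da.hstack([a, b])
print(c.shape, c.chunks)        # (2, 5) ((2,), (2, 1, 2))

# 1-D inputs are concatenated along their only axis.
v = da.hstack([da.arange(3, chunks=2), da.arange(3, chunks=2)])
```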
dask.array.hypot(x1, x2[, out])
Given the “legs” of a right triangle, return its hypotenuse.
Equivalent to sqrt(x1**2 + x2**2), element-wise. If x1 or x2 is scalar_like (i.e., unambiguously cast-able to a scalar type), it is broadcast for use with each element of the other argument. (See Examples.)
Parameters: - x1, x2 (array_like) – Leg of the triangle(s).
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: z – The hypotenuse of the triangle(s).
Return type: ndarray
Examples
>>> np.hypot(3*np.ones((3, 3)), 4*np.ones((3, 3)))
array([[ 5., 5., 5.],
       [ 5., 5., 5.],
       [ 5., 5., 5.]])
Example showing broadcast of scalar_like argument:
>>> np.hypot(3*np.ones((3, 3)), [4])
array([[ 5., 5., 5.],
       [ 5., 5., 5.],
       [ 5., 5., 5.]])
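With dask arrays, the scalar broadcast works the same way. A minimal sketch, assuming a recent dask release:

```python
import numpy as np
import dask.array as da

legs = da.full((3, 3), 3.0, chunks=2)  # a 3x3 array of 3.0 split into 2x2 blocks

# The scalar 4.0 is broadcast against every element of legs.
z = da.hypot(legs, 4.0)
result = z.compute()  # every entry is 5.0
```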
dask.array.imag(*args, **kwargs)
Return the imaginary part of the elements of the array.
Parameters: val (array_like) – Input array.
Returns: out – Output array. If val is real, the type of val is used for the output. If val has complex elements, the returned type is float.
Return type: ndarray
Examples
>>> a = np.array([1+2j, 3+4j, 5+6j])
>>> a.imag
array([ 2., 4., 6.])
>>> a.imag = np.array([8, 10, 12])
>>> a
array([ 1. +8.j, 3.+10.j, 5.+12.j])
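On a dask array, the function form can be applied lazily; this sketch does not attempt the in-place a.imag assignment used in the NumPy example. It assumes a recent dask release:

```python
import numpy as np
import dask.array as da

z = da.from_array(np.array([1 + 2j, 3 + 4j, 5 + 6j]), chunks=2)
im = da.imag(z).compute()  # imaginary parts as float64: [2., 4., 6.]

r = da.from_array(np.array([1.0, 2.0]), chunks=1)
im_real = da.imag(r).compute()  # real input: zeros, keeping the input's float dtype
```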
dask.array.indices(dimensions, dtype=int, chunks=None)
Implements NumPy’s indices for Dask Arrays. Generates a grid of indices covering the dimensions provided.
The final array has the shape (len(dimensions), *dimensions). The chunks are used to specify the chunking for axis 1 up to len(dimensions). The 0th axis always has chunks of length 1.
Parameters: - dimensions (sequence of ints) – The shape of the index grid.
- dtype (dtype, optional) – Type to use for the array. Default is int.
- chunks (sequence of ints) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: grid
Return type: dask array
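A sketch of the chunking rule described above, assuming a recent dask release: chunks describes the blocking of the grid axes, while the leading axis gets chunks of length 1.

```python
import numpy as np
import dask.array as da

# chunks covers the two grid axes (lengths 2 and 3); axis 0 is chunked in ones.
g = da.indices((2, 3), chunks=(1, 2))

print(g.shape)      # (2, 2, 3): one 2x3 grid per dimension
print(g.chunks[0])  # (1, 1)
```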
dask.array.insert(arr, obj, values, axis)
Insert values along the given axis before the given indices.
Parameters: - arr (array_like) – Input array.
- obj (int, slice or sequence of ints) –
Object that defines the index or indices before which values is inserted.
New in version 1.8.0.
Support for multiple insertions when obj is a single scalar or a sequence with one element (similar to calling insert multiple times).
- values (array_like) – Values to insert into arr. If the type of values is different from that of arr, values is converted to the type of arr. values should be shaped so that arr[...,obj,...] = values is legal.
- axis (int, optional) – Axis along which to insert values. If axis is None then arr is flattened first.
Returns: out – A copy of arr with values inserted. Note that insert does not occur in-place: a new array is returned. If axis is None, out is a flattened array.
Return type: ndarray
See also
append()
- Append elements at the end of an array.
concatenate()
- Join a sequence of arrays along an existing axis.
delete()
- Delete elements from an array.
Notes
Note that for higher dimensional inserts obj=0 behaves very differently from obj=[0], just like arr[:,0,:] = values is different from arr[:,[0],:] = values.
Examples
>>> a = np.array([[1, 1], [2, 2], [3, 3]])
>>> a
array([[1, 1],
       [2, 2],
       [3, 3]])
>>> np.insert(a, 1, 5)
array([1, 5, 1, 2, 2, 3, 3])
>>> np.insert(a, 1, 5, axis=1)
array([[1, 5, 1],
       [2, 5, 2],
       [3, 5, 3]])
Difference between sequence and scalars:
>>> np.insert(a, [1], [[1],[2],[3]], axis=1)
array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])
>>> np.array_equal(np.insert(a, 1, [1, 2, 3], axis=1),
...                np.insert(a, [1], [[1],[2],[3]], axis=1))
True
>>> b = a.flatten()
>>> b
array([1, 1, 2, 2, 3, 3])
>>> np.insert(b, [2, 2], [5, 6])
array([1, 1, 5, 6, 2, 2, 3, 3])
>>> np.insert(b, slice(2, 4), [5, 6])
array([1, 1, 5, 2, 6, 2, 3, 3])
>>> np.insert(b, [2, 2], [7.13, False]) # type casting
array([1, 1, 7, 0, 2, 2, 3, 3])
>>> x = np.arange(8).reshape(2, 4)
>>> idx = (1, 3)
>>> np.insert(x, idx, 999, axis=1)
array([[ 0, 999, 1, 2, 999, 3],
       [ 4, 999, 5, 6, 999, 7]])
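Several examples above call np.insert without axis, but the dask signature at the top of this entry lists axis without a default, so this sketch always passes it. It assumes a recent dask release and checks the result against NumPy:

```python
import numpy as np
import dask.array as da

base = np.arange(8).reshape(2, 4)
x = da.from_array(base, chunks=2)

# Insert a column of 99s before column 1.
y = da.insert(x, 1, 99, axis=1)

# Insert 0 before columns 1 and 3 (indices given in increasing order).
z = da.insert(x, [1, 3], 0, axis=1)
```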
dask.array.isclose(arr1, arr2, rtol=1e-05, atol=1e-08, equal_nan=False)
Returns a boolean array where two arrays are element-wise equal within a tolerance.
The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.
Returns: y – A boolean array indicating where a and b (the inputs arr1 and arr2) are equal within the given tolerance. If both a and b are scalars, returns a single boolean value.
See also
allclose()
Notes
New in version 1.7.0.
For finite values, isclose uses the following equation to test whether two floating point values are equivalent:
absolute(a - b) <= (atol + rtol * absolute(b))
The above equation is not symmetric in a and b, so that isclose(a, b) might be different from isclose(b, a) in some rare cases.
Examples
>>> np.isclose([1e10,1e-7], [1.00001e10,1e-8])
array([True, False])
>>> np.isclose([1e10,1e-8], [1.00001e10,1e-9])
array([True, True])
>>> np.isclose([1e10,1e-8], [1.0001e10,1e-9])
array([False, True])
>>> np.isclose([1.0, np.nan], [1.0, np.nan])
array([True, False])
>>> np.isclose([1.0, np.nan], [1.0, np.nan], equal_nan=True)
array([True, True])
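The asymmetry noted above can be made visible with a deliberately large rtol and atol=0. A sketch on dask arrays, assuming a recent dask release; the values are chosen only so that the two tolerances land on either side of the difference:

```python
import numpy as np
import dask.array as da

a = da.from_array(np.array([10.0, 1.0]), chunks=1)
b = da.from_array(np.array([11.0, 1.0]), chunks=1)

# The tolerance is scaled by abs(b), the second argument, so swapping them can matter.
ab = da.isclose(a, b, rtol=0.095, atol=0).compute()  # |10 - 11| = 1 <= 0.095 * 11 = 1.045
ba = da.isclose(b, a, rtol=0.095, atol=0).compute()  # |11 - 10| = 1 >  0.095 * 10 = 0.95
```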
dask.array.iscomplex(*args, **kwargs)
Return a bool array that is True where the input element is complex.
What is tested is whether the input has a non-zero imaginary part, not if the input type is complex.
Parameters: x (array_like) – Input array.
Returns: out – Output array.
Return type: ndarray of bools
Examples
>>> np.iscomplex([1+1j, 1+0j, 4.5, 3, 2, 2j])
array([ True, False, False, False, False, True], dtype=bool)
dask.array.isfinite(x[, out])
Test element-wise for finiteness (neither infinity nor Not a Number).
The result is returned as a boolean array.
Parameters: - x (array_like) – Input values.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: y – For scalar input, the result is a new boolean with value True if the input is finite; otherwise the value is False (input is either positive infinity, negative infinity or Not a Number).
For array input, the result is a boolean array with the same dimensions as the input and the values are True if the corresponding element of the input is finite; otherwise the values are False (element is either positive infinity, negative infinity or Not a Number).
Return type: ndarray, bool
Notes
Not a Number, positive infinity and negative infinity are considered to be non-finite.
NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754). This means that Not a Number is not equivalent to infinity. Also, positive infinity is not equivalent to negative infinity. But infinity is equivalent to positive infinity. Errors result if the second argument is also supplied when x is a scalar input, or if the first and second arguments have different shapes.
Examples
>>> np.isfinite(1)
True
>>> np.isfinite(0)
True
>>> np.isfinite(np.nan)
False
>>> np.isfinite(np.inf)
False
>>> np.isfinite(-np.inf)
False
>>> np.isfinite([np.log(-1.),1.,np.log(0)])
array([False, True, False], dtype=bool)
>>> x = np.array([-np.inf, 0., np.inf])
>>> y = np.array([2, 2, 2])
>>> np.isfinite(x, y)
array([0, 1, 0])
>>> y
array([0, 1, 0])
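A common use of the finiteness mask on a dask array is to replace or count non-finite values lazily. A sketch, assuming a recent dask release; replacing with 0.0 is an arbitrary illustrative choice:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.array([1.0, np.nan, np.inf, -2.0, -np.inf]), chunks=2)

finite = da.isfinite(x)             # lazy boolean mask
cleaned = da.where(finite, x, 0.0)  # replace NaN and +/-inf with 0.0
n_bad = (~finite).sum()             # count non-finite entries

print(cleaned.compute(), n_bad.compute())
```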
dask.array.isinf(x[, out])
Test element-wise for positive or negative infinity.
Returns a boolean array of the same shape as x, True where x == +/-inf, otherwise False.
Parameters: - x (array_like) – Input values.
- out (array_like, optional) – An array with the same shape as x to store the result.
Returns: y – For scalar input, the result is a new boolean with value True if the input is positive or negative infinity; otherwise the value is False.
For array input, the result is a boolean array with the same shape as the input and the values are True where the corresponding element of the input is positive or negative infinity; elsewhere the values are False. If a second argument was supplied the result is stored there. If the type of that array is a numeric type the result is represented as zeros and ones, if the type is boolean then as False and True, respectively. The return value y is then a reference to that array.
Return type: bool (scalar) or boolean ndarray
See also
isneginf(), isposinf(), isnan(), isfinite()
Notes
NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754).
Errors result if the second argument is supplied when the first argument is a scalar, or if the first and second arguments have different shapes.
Examples
>>> np.isinf(np.inf)
True
>>> np.isinf(np.nan)
False
>>> np.isinf(-np.inf)
True
>>> np.isinf([np.inf, -np.inf, 1.0, np.nan])
array([ True, True, False, False], dtype=bool)
>>> x = np.array([-np.inf, 0., np.inf])
>>> y = np.array([2, 2, 2])
>>> np.isinf(x, y)
array([1, 0, 1])
>>> y
array([1, 0, 1])
dask.array.isnan(x[, out])
Test element-wise for NaN and return result as a boolean array.
Parameters: x (array_like) – Input array.
Returns: y – For scalar input, the result is a new boolean with value True if the input is NaN; otherwise the value is False. For array input, the result is a boolean array of the same dimensions as the input and the values are True if the corresponding element of the input is NaN; otherwise the values are False.
Return type: ndarray or bool
See also
isinf(), isneginf(), isposinf(), isfinite()
Notes
NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754). This means that Not a Number is not equivalent to infinity.
Examples
>>> np.isnan(np.nan)
True
>>> np.isnan(np.inf)
False
>>> np.isnan([np.log(-1.),1.,np.log(0)])
array([ True, False, False], dtype=bool)
dask.array.isnull(values)
pandas.isnull for dask arrays.
dask.array.isreal(*args, **kwargs)
Return a bool array that is True where the input element is real.
If an element has complex type with a zero imaginary part, the return value for that element is True.
Parameters: x (array_like) – Input array.
Returns: out – Boolean array of same shape as x.
Return type: ndarray, bool
Examples
>>> np.isreal([1+1j, 1+0j, 4.5, 3, 2, 2j])
array([False, True, True, True, True, False], dtype=bool)
dask.array.ldexp(x1, x2[, out])
Returns x1 * 2**x2, element-wise.
The mantissas x1 and twos exponents x2 are used to construct floating point numbers x1 * 2**x2.
Parameters: - x1 (array_like) – Array of multipliers.
- x2 (array_like, int) – Array of twos exponents.
- out (ndarray, optional) – Output array for the result.
Returns: y – The result of x1 * 2**x2.
Return type: ndarray or scalar
See also
frexp()
- Return (y1, y2) from x = y1 * 2**y2, inverse to ldexp.
Notes
Complex dtypes are not supported; they will raise a TypeError.
ldexp is useful as the inverse of frexp; if used by itself, it is clearer to simply use the expression x1 * 2**x2.
Examples
>>> np.ldexp(5, np.arange(4))
array([ 5., 10., 20., 40.], dtype=float32)
>>> x = np.arange(6)
>>> np.ldexp(*np.frexp(x))
array([ 0., 1., 2., 3., 4., 5.])
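The frexp/ldexp round trip also works lazily on dask arrays. A sketch, assuming a recent dask release in which da.frexp returns a (mantissa, exponent) pair of dask arrays:

```python
import numpy as np
import dask.array as da

x = da.arange(6, chunks=3, dtype=float)

# frexp splits each value into a mantissa and a base-2 exponent ...
mantissa, exponent = da.frexp(x)
# ... and ldexp rebuilds mantissa * 2**exponent, recovering x exactly.
y = da.ldexp(mantissa, exponent)
```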
dask.array.linspace(start, stop, num=50, chunks=None, dtype=None)
Return num evenly spaced values over the closed interval [start, stop].
TODO: implement the endpoint, retstep, and dtype keyword args
Parameters: - start (scalar) – The starting value of the sequence.
- stop (scalar) – The last value of the sequence.
- num (int, optional) – Number of samples to include in the returned dask array, including the endpoints.
- chunks (int) – The number of samples on each block. Note that the last block will have fewer samples if num % chunks != 0.
Returns: samples
Return type: dask array
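A sketch of how num and chunks interact, assuming a recent dask release: five samples in blocks of two leave a shorter final block.

```python
import numpy as np
import dask.array as da

# Five samples over the closed interval [0, 1], split into blocks of 2.
x = da.linspace(0, 1, num=5, chunks=2)

print(x.chunks)  # ((2, 2, 1),): the last block is shorter because 5 % 2 != 0
values = x.compute()  # 0.0, 0.25, 0.5, 0.75, 1.0
```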
dask.array.log(x[, out])
Natural logarithm, element-wise.
The natural logarithm log is the inverse of the exponential function, so that log(exp(x)) = x. The natural logarithm is the logarithm in base e.
Parameters: x (array_like) – Input value.
Returns: y – The natural logarithm of x, element-wise.
Return type: ndarray
Notes
Logarithm is a multivalued function: for each x there is an infinite number of z such that exp(z) = x. The convention is to return the z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, log is a complex analytical function that has a branch cut [-inf, 0] and is continuous from above on it. log handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard.
Examples
>>> np.log([1, np.e, np.e**2, 0])
array([ 0., 1., 2., -Inf])
dask.array.log10(x[, out])
Return the base 10 logarithm of the input array, element-wise.
Parameters: x (array_like) – Input values.
Returns: y – The logarithm to the base 10 of x, element-wise. NaNs are returned where x is negative.
Return type: ndarray
See also
emath.log10()
Notes
Logarithm is a multivalued function: for each x there is an infinite number of z such that 10**z = x. The convention is to return the z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log10 always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, log10 is a complex analytical function that has a branch cut [-inf, 0] and is continuous from above on it. log10 handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard.
Examples
>>> np.log10([1e-15, -3.])
array([-15., NaN])
dask.array.log1p(x[, out])
Return the natural logarithm of one plus the input array, element-wise.
Calculates log(1 + x).
Parameters: x (array_like) – Input values.
Returns: y – Natural logarithm of 1 + x, element-wise.
Return type: ndarray
See also
expm1()
- exp(x) - 1, the inverse of log1p.
Notes
For real-valued input, log1p is accurate also for x so small that 1 + x == 1 in floating-point accuracy.
Logarithm is a multivalued function: for each x there is an infinite number of z such that exp(z) = 1 + x. The convention is to return the z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log1p always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, log1p is a complex analytical function that has a branch cut [-inf, -1] and is continuous from above on it. log1p handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard.
Examples
>>> np.log1p(1e-99)
1e-99
>>> np.log(1 + 1e-99)
0.0
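The same precision gap appears when the computation runs lazily on a dask array. A sketch, assuming a recent dask release:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.array([1e-99, 1e-10]), chunks=1)

accurate = da.log1p(x).compute()  # stays accurate even when 1 + x rounds to 1.0
naive = da.log(1 + x).compute()   # 1 + 1e-99 rounds to exactly 1.0, so log gives 0.0
```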
dask.array.log2(x[, out])
Base-2 logarithm of x.
Parameters: x (array_like) – Input values.
Returns: y – Base-2 logarithm of x.
Return type: ndarray
Notes
New in version 1.3.0.
Logarithm is a multivalued function: for each x there is an infinite number of z such that 2**z = x. The convention is to return the z whose imaginary part lies in [-pi, pi].
For real-valued input data types, log2 always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, log2 is a complex analytical function that has a branch cut [-inf, 0] and is continuous from above on it. log2 handles the floating-point negative zero as an infinitesimal negative number, conforming to the C99 standard.
Examples
>>> x = np.array([0, 1, 2, 2**4])
>>> np.log2(x)
array([-Inf, 0., 1., 4.])
>>> xi = np.array([0+1.j, 1, 2+0.j, 4.j])
>>> np.log2(xi)
array([ 0.+2.26618007j, 0.+0.j , 1.+0.j , 2.+2.26618007j])
dask.array.logaddexp(x1, x2[, out])
Logarithm of the sum of exponentiations of the inputs.
Calculates log(exp(x1) + exp(x2)). This function is useful in statistics where the calculated probabilities of events may be so small as to exceed the range of normal floating point numbers. In such cases the logarithm of the calculated probability is stored. This function allows adding probabilities stored in such a fashion.
Parameters: x1, x2 (array_like) – Input values.
Returns: result – Logarithm of exp(x1) + exp(x2).
Return type: ndarray
See also
logaddexp2()
- Logarithm of the sum of exponentiations of inputs in base 2.
Notes
New in version 1.3.0.
Examples
>>> prob1 = np.log(1e-50)
>>> prob2 = np.log(2.5e-50)
>>> prob12 = np.logaddexp(prob1, prob2)
>>> prob12
-113.87649168120691
>>> np.exp(prob12)
3.5000000000000057e-50
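The benefit shows up when the naive formula under- or overflows. A sketch comparing the two on dask arrays, assuming a recent dask release; np.errstate is used only to silence the expected floating point warnings in the naive version:

```python
import numpy as np
import dask.array as da

log_p1 = da.from_array(np.array([-1000.0, 1000.0]), chunks=1)
log_p2 = da.from_array(np.array([-1000.0, 1000.0]), chunks=1)

# Stable: log(exp(x1) + exp(x2)) without materializing exp(x1) or exp(x2).
stable = da.logaddexp(log_p1, log_p2).compute()  # [-1000 + log(2), 1000 + log(2)]

# Naive: exp(-1000) underflows to 0 and exp(1000) overflows to inf.
with np.errstate(over="ignore", divide="ignore"):
    naive = np.log(np.exp(log_p1.compute()) + np.exp(log_p2.compute()))  # [-inf, inf]
```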
dask.array.logaddexp2(x1, x2[, out])
Logarithm of the sum of exponentiations of the inputs in base-2.
Calculates log2(2**x1 + 2**x2). This function is useful in machine learning when the calculated probabilities of events may be so small as to exceed the range of normal floating point numbers. In such cases the base-2 logarithm of the calculated probability can be used instead. This function allows adding probabilities stored in such a fashion.
Parameters: - x1, x2 (array_like) – Input values.
- out (ndarray, optional) – Array to store results in.
Returns: result – Base-2 logarithm of 2**x1 + 2**x2.
Return type: ndarray
See also
logaddexp()
- Logarithm of the sum of exponentiations of the inputs.
Notes
New in version 1.3.0.
Examples
>>> prob1 = np.log2(1e-50)
>>> prob2 = np.log2(2.5e-50)
>>> prob12 = np.logaddexp2(prob1, prob2)
>>> prob1, prob2, prob12
(-166.09640474436813, -164.77447664948076, -164.28904982231052)
>>> 2**prob12
3.4999999999999914e-50
dask.array.logical_and(x1, x2[, out])
Compute the truth value of x1 AND x2 element-wise.
Parameters: x1, x2 (array_like) – Input arrays. x1 and x2 must have the same shape or be broadcastable to a common shape.
Returns: y – Boolean result of the logical AND operation applied to the corresponding elements of x1 and x2, with their broadcast shape.
Return type: ndarray or bool
See also
logical_or(), logical_not(), logical_xor(), bitwise_and()
Examples
>>> np.logical_and(True, False)
False
>>> np.logical_and([True, False], [False, False])
array([False, False], dtype=bool)
>>> x = np.arange(5)
>>> np.logical_and(x>1, x<4)
array([False, False, True, True, False], dtype=bool)
dask.array.logical_not(x[, out])
Compute the truth value of NOT x element-wise.
Parameters: x (array_like) – Logical NOT is applied to the elements of x.
Returns: y – Boolean result with the same shape as x of the NOT operation on elements of x.
Return type: bool or ndarray of bool
Examples
>>> np.logical_not(3)
False
>>> np.logical_not([True, False, 0, 1])
array([False, True, True, False], dtype=bool)
>>> x = np.arange(5)
>>> np.logical_not(x<3)
array([False, False, False, True, True], dtype=bool)
dask.array.logical_or(x1, x2[, out])
Compute the truth value of x1 OR x2 element-wise.
Parameters: x1, x2 (array_like) – Logical OR is applied to the elements of x1 and x2. They must have the same shape or be broadcastable to a common shape.
Returns: y – Boolean result of the logical OR operation applied to the elements of x1 and x2, with their broadcast shape.
Return type: ndarray or bool
See also
logical_and(), logical_not(), logical_xor(), bitwise_or()
Examples
>>> np.logical_or(True, False)
True
>>> np.logical_or([True, False], [False, False])
array([ True, False], dtype=bool)
>>> x = np.arange(5)
>>> np.logical_or(x < 1, x > 3)
array([ True, False, False, False, True], dtype=bool)
dask.array.logical_xor(x1, x2[, out])
Compute the truth value of x1 XOR x2, element-wise.
Parameters: x1, x2 (array_like) – Logical XOR is applied to the elements of x1 and x2. They must be broadcastable to the same shape.
Returns: y – Boolean result of the logical XOR operation applied to the elements of x1 and x2; the shape is determined by whether or not broadcasting of one or both arrays was required.
Return type: bool or ndarray of bool
See also
logical_and(), logical_or(), logical_not(), bitwise_xor()
Examples
>>> np.logical_xor(True, False)
True
>>> np.logical_xor([True, True, False, False], [True, False, True, False])
array([False, True, True, False], dtype=bool)
>>> x = np.arange(5)
>>> np.logical_xor(x < 1, x > 3)
array([ True, False, False, False, True], dtype=bool)
Simple example showing support of broadcasting:
>>> np.logical_xor(0, np.eye(2))
array([[ True, False],
       [False, True]], dtype=bool)
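The four logical functions compose into lazy masks on dask arrays. A sketch, assuming a recent dask release; indexing with a dask boolean mask yields an array of unknown length until it is computed:

```python
import numpy as np
import dask.array as da

x = da.arange(10, chunks=4)

is_odd = da.logical_not(x % 2 == 0)
mask = da.logical_and(x > 2, is_odd)   # odd values greater than 2
edges = da.logical_or(x < 1, x > 8)    # first and last element
flip = da.logical_xor(x < 5, is_odd)   # True where exactly one condition holds

selected = x[mask].compute()  # boolean indexing with a dask mask
```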
dask.array.max(a, axis=None, keepdims=False, split_every=None, out=None)
Return the maximum of an array or maximum along an axis.
Parameters: - a (array_like) – Input data.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which to operate. By default, flattened input is used.
If this is a tuple of ints, the maximum is selected over multiple axes, instead of a single axis or all the axes as before.
- out (ndarray, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the amax method of sub-classes of ndarray; however, any non-default value will be. If the sub-class's method does not implement keepdims, any exceptions will be raised.
Returns: amax – Maximum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.
Return type: ndarray or scalar
See also
amin()
- The minimum value of an array along a given axis, propagating any NaNs.
nanmax()
- The maximum value of an array along a given axis, ignoring any NaNs.
maximum()
- Element-wise maximum of two arrays, propagating any NaNs.
fmax()
- Element-wise maximum of two arrays, ignoring any NaNs.
argmax()
- Return the indices of the maximum values.
Notes
NaN values are propagated, that is if at least one item is NaN, the corresponding max value will be NaN as well. To ignore NaN values (MATLAB behavior), please use nanmax.
Don’t use amax for element-wise comparison of 2 arrays; when a.shape[0] is 2, maximum(a[0], a[1]) is faster than amax(a, axis=0).
Examples
>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1],
       [2, 3]])
>>> np.amax(a)           # Maximum of the flattened array
3
>>> np.amax(a, axis=0)   # Maxima along the first axis
array([2, 3])
>>> np.amax(a, axis=1)   # Maxima along the second axis
array([1, 3])
>>> b = np.arange(5, dtype=float)
>>> b[2] = np.nan
>>> np.amax(b)
nan
>>> np.nanmax(b)
4.0
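The signature above also accepts split_every, which the inherited NumPy text does not describe. In dask it is understood to set how many intermediate block results are combined at each level of the tree reduction; treat that reading as an assumption. A sketch, assuming a recent dask release:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(24).reshape(4, 6), chunks=(2, 2))  # 2 x 3 = 6 blocks

overall = da.max(x, split_every=2).compute()  # combine partial maxima two at a time
per_column = da.max(x, axis=0).compute()      # maxima along the first axis
kept = da.max(x, axis=1, keepdims=True)       # reduced axis kept with length 1
```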
dask.array.maximum(x1, x2[, out])
Element-wise maximum of array elements.
Compare two arrays and return a new array containing the element-wise maxima. If one of the elements being compared is a NaN, then that element is returned. If both elements are NaNs then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being a NaN. The net effect is that NaNs are propagated.
Parameters: x1, x2 (array_like) – The arrays holding the elements to be compared. They must have the same shape, or shapes that can be broadcast to a single shape.
Returns: y – The maximum of x1 and x2, element-wise. Returns scalar if both x1 and x2 are scalars.
Return type: ndarray or scalar
Notes
The maximum is equivalent to np.where(x1 >= x2, x1, x2) when neither x1 nor x2 are NaNs, but it is faster and does proper broadcasting.
Examples
>>> np.maximum([2, 3, 4], [1, 5, 2])
array([2, 5, 4])
>>> np.maximum(np.eye(2), [0.5, 2]) # broadcasting
array([[ 1. , 2. ],
       [ 0.5, 2. ]])
>>> np.maximum([np.nan, 0, np.nan], [0, np.nan, np.nan])
array([ NaN, NaN, NaN])
>>> np.maximum(np.inf, 1)
inf
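The NaN propagation described above contrasts with fmax, which the max entry lists as the NaN-ignoring counterpart. A sketch on dask arrays, assuming a recent dask release:

```python
import numpy as np
import dask.array as da

a = da.from_array(np.array([1.0, np.nan, 3.0]), chunks=2)
b = da.from_array(np.array([2.0, 0.0, np.nan]), chunks=2)

propagated = da.maximum(a, b).compute()  # NaN wins: [2., nan, nan]
ignored = da.fmax(a, b).compute()        # NaN is skipped if the other value is a number: [2., 0., 3.]
```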
dask.array.mean(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Compute the arithmetic mean along the specified axis.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
Parameters: - a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which the means are computed. The default is to compute the mean of the flattened array.
If this is a tuple of ints, a mean is performed over multiple axes, instead of a single axis or all the axes as before.
- dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for floating point inputs, it is the same as the input dtype.
- out (ndarray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the mean method of sub-classes of ndarray; however, any non-default value will be. If the sub-class's method does not implement keepdims, any exceptions will be raised.
Returns: m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned.
Return type: ndarray, see dtype parameter above
Notes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.mean(a)
2.5
>>> np.mean(a, axis=0)
array([ 2., 3.])
>>> np.mean(a, axis=1)
array([ 1.5, 3.5])
In single precision, mean can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64)
0.55000000074505806
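A sketch of the dask reduction with an axis and with a higher-precision accumulator, assuming a recent dask release; the input is small and exact, so it only illustrates the calls, not the precision loss shown above:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(12, dtype=np.float32).reshape(3, 4), chunks=2)

col_means = da.mean(x, axis=0).compute()  # mean of each column
overall = da.mean(x, dtype=np.float64)    # accumulate in float64, as the notes suggest
print(col_means, overall.dtype, overall.compute())
```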
dask.array.min(a, axis=None, keepdims=False, split_every=None, out=None)
Return the minimum of an array or minimum along an axis.
Parameters: - a (array_like) – Input data.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which to operate. By default, flattened input is used.
If this is a tuple of ints, the minimum is selected over multiple axes, instead of a single axis or all the axes as before.
- out (ndarray, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output. See doc.ufuncs (Section “Output arguments”) for more details.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the amin method of sub-classes of ndarray; however, any non-default value will be. If the sub-class's method does not implement keepdims, any exceptions will be raised.
Returns: amin – Minimum of a. If axis is None, the result is a scalar value. If axis is given, the result is an array of dimension a.ndim - 1.
Return type: ndarray or scalar
See also
amax()
- The maximum value of an array along a given axis, propagating any NaNs.
nanmin()
- The minimum value of an array along a given axis, ignoring any NaNs.
minimum()
- Element-wise minimum of two arrays, propagating any NaNs.
fmin()
- Element-wise minimum of two arrays, ignoring any NaNs.
argmin()
- Return the indices of the minimum values.
Notes
NaN values are propagated; that is, if at least one item is NaN, the corresponding min value will be NaN as well. To ignore NaN values (MATLAB behavior), use nanmin.
Don’t use amin for element-wise comparison of 2 arrays; when a.shape[0] is 2, minimum(a[0], a[1]) is faster than amin(a, axis=0).
Examples
>>> a = np.arange(4).reshape((2,2))
>>> a
array([[0, 1], [2, 3]])
>>> np.amin(a)  # Minimum of the flattened array
0
>>> np.amin(a, axis=0)  # Minima along the first axis
array([0, 1])
>>> np.amin(a, axis=1)  # Minima along the second axis
array([0, 2])
>>> b = np.arange(5, dtype=float)
>>> b[2] = np.nan
>>> np.amin(b)
nan
>>> np.nanmin(b)
0.0
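The keepdims option described in the parameters above is easiest to see on a small array. The following is a minimal sketch in plain NumPy, on the assumption (suggested by the shared docstring and the keepdims argument in the signature) that dask.array.min behaves the same way; the array values are arbitrary.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# Without keepdims the reduced axis disappears: shape (2,)
row_min = np.min(a, axis=1)

# With keepdims=True the reduced axis is kept with length 1: shape (2, 1),
# so the result broadcasts against `a` row by row.
row_min_kd = np.min(a, axis=1, keepdims=True)
shifted = a - row_min_kd  # subtract each row's minimum from that row

print(row_min.shape, row_min_kd.shape)
print(shifted)
```

Subtracting row_min (without keepdims) from a would instead try to broadcast a shape (2,) array against the trailing axis of length 3 and fail.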
-
dask.array.minimum(x1, x2[, out])
Element-wise minimum of array elements.
Compares two arrays and returns a new array containing the element-wise minima. If one of the elements being compared is a NaN, then that element is returned. If both elements are NaNs, then the first is returned. The latter distinction is important for complex NaNs, which are defined as at least one of the real or imaginary parts being a NaN. The net effect is that NaNs are propagated.
Parameters: x1, x2 (array_like) – The arrays holding the elements to be compared. They must have the same shape, or shapes that can be broadcast to a single shape.
Returns: y – The minimum of x1 and x2, element-wise. Returns scalar if both x1 and x2 are scalars.
Return type: ndarray or scalar
Notes
The minimum is equivalent to np.where(x1 <= x2, x1, x2) when neither x1 nor x2 are NaNs, but it is faster and does proper broadcasting.
Examples
>>> np.minimum([2, 3, 4], [1, 5, 2])
array([1, 3, 2])
>>> np.minimum(np.eye(2), [0.5, 2])  # broadcasting
array([[ 0.5, 0. ], [ 0. , 1. ]])
>>> np.minimum([np.nan, 0, np.nan], [0, np.nan, np.nan])
array([ nan, nan, nan])
>>> np.minimum(-np.inf, 1)
-inf
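The Notes say that minimum matches np.where(x1 <= x2, x1, x2) only when no NaNs are involved. This small NumPy sketch with arbitrary inputs shows where the two diverge, and how fmin (listed in the See also section of min) differs again by ignoring NaNs:

```python
import numpy as np

x1 = np.array([2.0, 3.0, np.nan, 4.0])
x2 = np.array([1.0, 5.0, 0.0, np.nan])

m = np.minimum(x1, x2)          # NaNs propagate: positions 2 and 3 become NaN
w = np.where(x1 <= x2, x1, x2)  # any comparison with NaN is False, so x2 is picked
f = np.fmin(x1, x2)             # NaN-ignoring: the non-NaN operand wins

print(m)
print(w)
print(f)
```

At index 2, where returns 0.0 because nan <= 0.0 is False, while minimum returns NaN; at index 3 both happen to give NaN, but for different reasons.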
-
dask.array.modf(x[, out1, out2])
Return the fractional and integral parts of an array, element-wise.
The fractional and integral parts are negative if the given number is negative.
Parameters: x (array_like) – Input array. Returns: - y1 (ndarray) – Fractional part of x.
- y2 (ndarray) – Integral part of x.
Notes
For integer input the return values are floats.
Examples
>>> np.modf([0, 3.5])
(array([ 0. , 0.5]), array([ 0., 3.]))
>>> np.modf(-0.5)
(-0.5, -0.0)
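The sign rule stated above can be checked directly: both parts take the sign of the input, and adding them back recovers the input. A short NumPy sketch with arbitrary values:

```python
import numpy as np

x = np.array([-2.75, -0.5, 0.0, 3.5])
frac, whole = np.modf(x)

print(frac)   # fractional parts, same sign as x
print(whole)  # integral parts (floats), same sign as x; -0.5 gives -0.0
print(np.array_equal(frac + whole, x))
```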
-
dask.array.moment(a, order, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
-
dask.array.nanargmax(x, axis=None, split_every=None, out=None)
-
dask.array.nanargmin(x, axis=None, split_every=None, out=None)
-
dask.array.nancumprod(x, axis, dtype=None, out=None)
Return the cumulative product of array elements over a given axis treating Not a Numbers (NaNs) as one. The cumulative product does not change when NaNs are encountered and leading NaNs are replaced by ones.
Ones are returned for slices that are all-NaN or empty.
New in version 1.12.0.
Parameters: - a (array_like) – Input array.
- axis (int, optional) – Axis along which the cumulative product is computed. By default the input is flattened.
- dtype (dtype, optional) – Type of the returned array, as well as of the accumulator in which the elements are multiplied. If dtype is not specified, it defaults to the dtype of a, unless a has an integer dtype with a precision less than that of the default platform integer. In that case, the default platform integer is used instead.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type of the resulting values will be cast if necessary.
Returns: nancumprod – A new array holding the result is returned unless out is specified, in which case it is returned.
Return type: ndarray
See also
numpy.cumprod()
- Cumulative product across array propagating NaNs.
isnan()
- Show which elements are NaN.
Examples
>>> np.nancumprod(1)
array([1])
>>> np.nancumprod([1])
array([1])
>>> np.nancumprod([1, np.nan])
array([ 1., 1.])
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nancumprod(a)
array([ 1., 2., 6., 6.])
>>> np.nancumprod(a, axis=0)
array([[ 1., 2.], [ 3., 2.]])
>>> np.nancumprod(a, axis=1)
array([[ 1., 2.], [ 3., 3.]])
-
dask.array.nancumsum(x, axis, dtype=None, out=None)
Return the cumulative sum of array elements over a given axis treating Not a Numbers (NaNs) as zero. The cumulative sum does not change when NaNs are encountered and leading NaNs are replaced by zeros.
Zeros are returned for slices that are all-NaN or empty.
New in version 1.12.0.
Parameters: - a (array_like) – Input array.
- axis (int, optional) – Axis along which the cumulative sum is computed. The default (None) is to compute the cumsum over the flattened array.
- dtype (dtype, optional) – Type of the returned array and of the accumulator in which the elements are summed. If dtype is not specified, it defaults to the dtype of a, unless a has an integer dtype with a precision less than that of the default platform integer. In that case, the default platform integer is used.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type will be cast if necessary. See doc.ufuncs (Section “Output arguments”) for more details.
Returns: nancumsum – A new array holding the result is returned unless out is specified, in which case it is returned. The result has the same size as a, and the same shape as a if axis is not None or a is a 1-d array.
Return type: ndarray.
See also
numpy.cumsum()
- Cumulative sum across array propagating NaNs.
isnan()
- Show which elements are NaN.
Examples
>>> np.nancumsum(1)
array([1])
>>> np.nancumsum([1])
array([1])
>>> np.nancumsum([1, np.nan])
array([ 1., 1.])
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nancumsum(a)
array([ 1., 3., 6., 6.])
>>> np.nancumsum(a, axis=0)
array([[ 1., 2.], [ 4., 2.]])
>>> np.nancumsum(a, axis=1)
array([[ 1., 3.], [ 3., 3.]])
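Treating NaNs as zero means nancumsum should agree with an ordinary cumsum over a copy in which the NaNs have been replaced by 0. The NumPy sketch below checks that reading of the description on an arbitrary input with a leading NaN:

```python
import numpy as np

a = np.array([np.nan, 1.0, np.nan, 2.0, 3.0])

# Replace NaNs with the additive identity, then take a plain cumulative sum.
zeroed = np.where(np.isnan(a), 0.0, a)
expected = np.cumsum(zeroed)

result = np.nancumsum(a)  # leading NaN -> 0; later NaNs leave the total unchanged
print(result)
print(np.array_equal(result, expected))
```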
-
dask.array.nanmax(a, axis=None, keepdims=False, split_every=None, out=None)
Return the maximum of an array or maximum along an axis, ignoring any NaNs. When all-NaN slices are encountered, a RuntimeWarning is raised and NaN is returned for that slice.
Parameters: - a (array_like) – Array containing numbers whose maximum is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the maximum is computed. The default is to compute the maximum of the flattened array.
- out (ndarray, optional) –
Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
New in version 1.8.0.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
If the value is anything but the default, then keepdims will be passed through to the max method of sub-classes of ndarray. If the sub-class's method does not implement keepdims, any exceptions will be raised.
New in version 1.8.0.
Returns: nanmax – An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, an ndarray scalar is returned. The same dtype as a is returned.
Return type: ndarray
See also
nanmin()
- The minimum value of an array along a given axis, ignoring any NaNs.
amax()
- The maximum value of an array along a given axis, propagating any NaNs.
fmax()
- Element-wise maximum of two arrays, ignoring any NaNs.
maximum()
- Element-wise maximum of two arrays, propagating any NaNs.
isnan()
- Shows which elements are Not a Number (NaN).
isfinite()
- Shows which elements are neither NaN nor infinity.
Notes
NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754). This means that Not a Number is not equivalent to infinity. Positive infinity is treated as a very large number and negative infinity is treated as a very small (i.e. negative) number.
If the input has an integer type, the function is equivalent to np.max.
Examples
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nanmax(a)
3.0
>>> np.nanmax(a, axis=0)
array([ 3., 2.])
>>> np.nanmax(a, axis=1)
array([ 2., 3.])
When positive infinity and negative infinity are present:
>>> np.nanmax([1, 2, np.nan, -np.inf])
2.0
>>> np.nanmax([1, 2, np.nan, np.inf])
inf
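The all-NaN behaviour mentioned in the description (a RuntimeWarning plus NaN for that slice) can be observed with the standard warnings module. A minimal NumPy sketch, assuming the dask wrapper reports the same warning when the result is computed:

```python
import warnings

import numpy as np

a = np.array([[1.0, np.nan],
              [np.nan, np.nan]])  # the second row is all-NaN

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    row_max = np.nanmax(a, axis=1)

print(row_max)                                # first row -> 1.0, second row -> nan
print([w.category.__name__ for w in caught])  # includes 'RuntimeWarning'
```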
-
dask.array.nanmean(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Compute the arithmetic mean along the specified axis, ignoring NaNs.
Returns the average of the array elements. The average is taken over the flattened array by default, otherwise over the specified axis. float64 intermediate and return values are used for integer inputs.
For all-NaN slices, NaN is returned and a RuntimeWarning is raised.
New in version 1.8.0.
Parameters: - a (array_like) – Array containing numbers whose mean is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the means are computed. The default is to compute the mean of the flattened array.
- dtype (data-type, optional) – Type to use in computing the mean. For integer inputs, the default is float64; for inexact inputs, it is the same as the input dtype.
- out (ndarray, optional) – Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
If the value is anything but the default, then keepdims will be passed through to the mean or sum methods of sub-classes of ndarray. If the sub-class's methods do not implement keepdims, any exceptions will be raised.
Returns: m – If out=None, returns a new array containing the mean values, otherwise a reference to the output array is returned. NaN is returned for slices that contain only NaNs.
Return type: ndarray, see dtype parameter above
See also
average()
- Weighted average
mean()
- Arithmetic mean taken while not ignoring NaNs
Notes
The arithmetic mean is the sum of the non-NaN elements along the axis divided by the number of non-NaN elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32. Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, np.nan], [3, 4]])
>>> np.nanmean(a)
2.6666666666666665
>>> np.nanmean(a, axis=0)
array([ 2., 4.])
>>> np.nanmean(a, axis=1)
array([ 1., 3.5])
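The Notes define the result as the sum of the non-NaN elements divided by the number of non-NaN elements. That definition can be spelled out with a NaN mask in NumPy; this is an illustration of the formula, not how dask computes it internally:

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0]])

valid = ~np.isnan(a)                                # True where a value is present
manual = np.nansum(a, axis=0) / valid.sum(axis=0)   # non-NaN sum / non-NaN count

print(manual)
print(np.nanmean(a, axis=0))
```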
-
dask.array.nanmin(a, axis=None, keepdims=False, split_every=None, out=None)
Return the minimum of an array or minimum along an axis, ignoring any NaNs. When all-NaN slices are encountered, a RuntimeWarning is raised and NaN is returned for that slice.
Parameters: - a (array_like) – Array containing numbers whose minimum is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the minimum is computed. The default is to compute the minimum of the flattened array.
- out (ndarray, optional) –
Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details.
New in version 1.8.0.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
If the value is anything but the default, then keepdims will be passed through to the min method of sub-classes of ndarray. If the sub-class's method does not implement keepdims, any exceptions will be raised.
New in version 1.8.0.
Returns: nanmin – An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, an ndarray scalar is returned. The same dtype as a is returned.
Return type: ndarray
See also
nanmax()
- The maximum value of an array along a given axis, ignoring any NaNs.
amin()
- The minimum value of an array along a given axis, propagating any NaNs.
fmin()
- Element-wise minimum of two arrays, ignoring any NaNs.
minimum()
- Element-wise minimum of two arrays, propagating any NaNs.
isnan()
- Shows which elements are Not a Number (NaN).
isfinite()
- Shows which elements are neither NaN nor infinity.
Notes
NumPy uses the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754). This means that Not a Number is not equivalent to infinity. Positive infinity is treated as a very large number and negative infinity is treated as a very small (i.e. negative) number.
If the input has an integer type, the function is equivalent to np.min.
Examples
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nanmin(a)
1.0
>>> np.nanmin(a, axis=0)
array([ 1., 2.])
>>> np.nanmin(a, axis=1)
array([ 1., 3.])
When positive infinity and negative infinity are present:
>>> np.nanmin([1, 2, np.nan, np.inf])
1.0
>>> np.nanmin([1, 2, np.nan, -np.inf])
-inf
-
dask.array.nanprod(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Return the product of array elements over a given axis treating Not a Numbers (NaNs) as ones.
One is returned for slices that are all-NaN or empty.
New in version 1.10.0.
Parameters: - a (array_like) – Array containing numbers whose product is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the product is computed. The default is to compute the product of the flattened array.
- dtype (data-type, optional) – The type of the returned array and of the accumulator in which the elements are multiplied. By default, the dtype of a is used. An exception is when a has an integer type with less precision than the platform (u)intp. In that case, the default will be either (u)int32 or (u)int64 depending on whether the platform is 32 or 64 bits. For inexact inputs, dtype must be inexact.
- out (ndarray, optional) – Alternate output array in which to place the result. The default is None. If provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details. The casting of NaN to integer can yield unexpected results.
- keepdims (bool, optional) – If True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
Returns: y
Return type: ndarray or numpy scalar
See also
numpy.prod()
- Product across array propagating NaNs.
isnan()
- Show which elements are NaN.
Notes
NumPy integer arithmetic is modular. If the size of a product exceeds the size of an integer accumulator, its value will wrap around and the result will be incorrect. Specifying dtype=double can alleviate that problem.
Examples
>>> np.nanprod(1)
1
>>> np.nanprod([1])
1
>>> np.nanprod([1, np.nan])
1.0
>>> a = np.array([[1, 2], [3, np.nan]])
>>> np.nanprod(a)
6.0
>>> np.nanprod(a, axis=0)
array([ 3., 2.])
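Because NaNs act as the multiplicative identity, an all-NaN slice yields 1 rather than NaN, and the result matches an ordinary prod over a copy with NaNs replaced by 1. A NumPy sketch with arbitrary values:

```python
import numpy as np

a = np.array([[2.0, np.nan],
              [np.nan, np.nan]])  # the second row is all-NaN

row_prod = np.nanprod(a, axis=1)
ones_filled = np.prod(np.where(np.isnan(a), 1.0, a), axis=1)

print(row_prod)   # the all-NaN row gives 1.0
print(np.array_equal(row_prod, ones_filled))
```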
-
dask.array.nanstd(a, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Compute the standard deviation along the specified axis, while ignoring NaNs.
Returns the standard deviation, a measure of the spread of a distribution, of the non-NaN array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
For all-NaN slices or slices with zero degrees of freedom, NaN is returned and a RuntimeWarning is raised.
New in version 1.8.0.
Parameters: - a (array_like) – Calculate the standard deviation of the non-NaN values.
- axis (int, optional) – Axis along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
- dtype (dtype, optional) – Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.
- ddof (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of non-NaN elements. By default ddof is zero.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
If this value is anything but the default it is passed through as-is to the relevant functions of the sub-classes. If these functions do not have a keepdims kwarg, a RuntimeError will be raised.
Returns: standard_deviation – If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array. If ddof is >= the number of non-NaN elements in a slice or the slice contains only NaNs, then the result for that slice is NaN.
Return type: ndarray, see dtype parameter above.
Notes
The standard deviation is the square root of the average of the squared deviations from the mean: std = sqrt(mean(abs(x - x.mean())**2)).
The average squared deviation is normally calculated as the sum of the squared deviations divided by N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
For floating-point input, the std is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32. Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, np.nan], [3, 4]])
>>> np.nanstd(a)
1.247219128924647
>>> np.nanstd(a, axis=0)
array([ 1., 0.])
>>> np.nanstd(a, axis=1)
array([ 0., 0.5])
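The ddof rule in the Notes (divide by N - ddof, with N the number of non-NaN values) can be written out by hand and compared with nanstd. A minimal NumPy sketch, using ddof=1 and arbitrary data:

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0, 4.0])

x = a[~np.isnan(a)]  # the non-NaN values: [1., 3., 4.]
N = x.size
ddof = 1

# Square root of (sum of squared deviations) / (N - ddof)
manual = np.sqrt(np.sum(np.abs(x - x.mean()) ** 2) / (N - ddof))

print(manual)
print(np.nanstd(a, ddof=ddof))
```

With three non-NaN values the sum of squared deviations is 14/3, so ddof=1 gives sqrt(7/3) while the default ddof=0 gives sqrt(14/9), the 1.2472... shown in the examples above.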
-
dask.array.nansum(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Return the sum of array elements over a given axis treating Not a Numbers (NaNs) as zero.
In NumPy versions <= 1.8, NaN is returned for slices that are all-NaN or empty. In later versions, zero is returned.
Parameters: - a (array_like) – Array containing numbers whose sum is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the sum is computed. The default is to compute the sum of the flattened array.
- dtype (data-type, optional) –
The type of the returned array and of the accumulator in which the elements are summed. By default, the dtype of a is used. An exception is when a has an integer type with less precision than the platform (u)intp. In that case, the default will be either (u)int32 or (u)int64 depending on whether the platform is 32 or 64 bits. For inexact inputs, dtype must be inexact.
New in version 1.8.0.
- out (ndarray, optional) –
Alternate output array in which to place the result. The default is None. If provided, it must have the same shape as the expected output, but the type will be cast if necessary. See doc.ufuncs for details. The casting of NaN to integer can yield unexpected results.
New in version 1.8.0.
- keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
Returns: y
Return type: ndarray or numpy scalar
See also
numpy.sum()
- Sum across array propagating NaNs.
isnan()
- Show which elements are NaN.
isfinite()
- Show which elements are not NaN or +/-inf.
Notes
If both positive and negative infinity are present, the sum will be Not A Number (NaN).
NumPy integer arithmetic is modular. If the size of a sum exceeds the size of an integer accumulator, its value will wrap around and the result will be incorrect. Specifying dtype=double can alleviate that problem.
Examples
>>> np.nansum(1)
1
>>> np.nansum([1])
1
>>> np.nansum([1, np.nan])
1.0
>>> a = np.array([[1, 1], [1, np.nan]])
>>> np.nansum(a)
3.0
>>> np.nansum(a, axis=0)
array([ 2., 1.])
>>> np.nansum([1, np.nan, np.inf])
inf
>>> np.nansum([1, np.nan, -np.inf])
-inf
>>> np.nansum([1, np.nan, np.inf, -np.inf])  # both +/- infinity present
nan
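The modular-arithmetic note above can be made concrete with two int64 values whose sum is 2**63, one past the int64 maximum. This NumPy sketch uses arbitrary values chosen to overflow and shows the dtype=double workaround:

```python
import numpy as np

a = np.array([2**62, 2**62], dtype=np.int64)

wrapped = np.nansum(a)                   # int64 accumulator: 2**63 wraps around silently
widened = np.nansum(a, dtype=np.double)  # float64 accumulator avoids the wrap-around

print(wrapped)  # a large negative number
print(widened)  # 2**63 as a float
```

The float64 accumulator trades exactness for range: it is exact here only because both inputs and the sum are powers of two.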
-
dask.array.nanvar(a, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Compute the variance along the specified axis, while ignoring NaNs.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
For all-NaN slices or slices with zero degrees of freedom, NaN is returned and a RuntimeWarning is raised.
New in version 1.8.0.
Parameters: - a (array_like) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
- axis (int, optional) – Axis along which the variance is computed. The default is to compute the variance of the flattened array.
- dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
- out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
- ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of non-NaN elements. By default ddof is zero.
- keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original a.
Returns: variance – If out is None, return a new array containing the variance, otherwise return a reference to the output array. If ddof is >= the number of non-NaN elements in a slice or the slice contains only NaNs, then the result for that slice is NaN.
Return type: ndarray, see dtype parameter above
See also
numpy.doc.ufuncs()
- Section “Output arguments”
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).
The mean of the squared deviations is normally calculated as their sum divided by N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32. Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
For this function to work on sub-classes of ndarray, they must define sum with the kwarg keepdims.
Examples
>>> a = np.array([[1, np.nan], [3, 4]])
>>> np.nanvar(a)
1.5555555555555554
>>> np.nanvar(a, axis=0)
array([ 1., 0.])
>>> np.nanvar(a, axis=1)
array([ 0., 0.25])
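The formula in the Notes (mean of the squared deviations, here with the default ddof=0) can be applied to the non-NaN values by hand. The sketch below, in NumPy, also checks that nanstd is the square root of nanvar, as the nanstd Notes describe:

```python
import numpy as np

a = np.array([[1.0, np.nan],
              [3.0, 4.0]])

x = a[~np.isnan(a)]                        # flattened non-NaN values: [1., 3., 4.]
var0 = np.mean(np.abs(x - x.mean()) ** 2)  # ddof=0: divide by N

print(var0, np.nanvar(a))   # both about 1.5556
print(np.nanstd(a) ** 2)    # same value again
```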
-
dask.array.nextafter(x1, x2[, out])
Return the next floating-point value after x1 towards x2, element-wise.
Parameters: - x1 (array_like) – Values to find the next representable value of.
- x2 (array_like) – The direction where to look for the next representable value of x1.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: out – The next representable values of x1 in the direction of x2.
Return type: ndarray or scalar
Examples
>>> eps = np.finfo(np.float64).eps
>>> np.nextafter(1, 2) == eps + 1
True
>>> np.nextafter([1, 2], [2, 1]) == [eps + 1, 2 - eps]
array([ True, True], dtype=bool)
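The step size nextafter takes depends on where x1 sits: float64 values just below 1.0 are spaced twice as densely as those just above it. A short NumPy sketch illustrating this:

```python
import numpy as np

one = np.float64(1.0)
eps = np.finfo(np.float64).eps

up = np.nextafter(one, 2.0)    # next representable float64 above 1.0
down = np.nextafter(one, 0.0)  # next representable float64 below 1.0

print(up - one)    # equals eps
print(one - down)  # equals eps / 2
```

Asking for the next value towards x1 itself (np.nextafter(one, one)) returns x1 unchanged.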
-
dask.array.nonzero(a)
Return the indices of the elements that are non-zero.
Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The values in a are always tested and returned in row-major, C-style order. The corresponding non-zero values can be obtained with:
a[nonzero(a)]
To group the indices by element, rather than dimension, use:
transpose(nonzero(a))
The result of this is always a 2-D array, with a row for each non-zero element.
Parameters: a (array_like) – Input array.
Returns: tuple_of_arrays – Indices of elements that are non-zero.
Return type: tuple
See also
flatnonzero()
- Return indices that are non-zero in the flattened version of the input array.
ndarray.nonzero()
- Equivalent ndarray method.
count_nonzero()
- Counts the number of non-zero elements in the input array.
Examples
>>> x = np.eye(3)
>>> x
array([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])
>>> np.nonzero(x)
(array([0, 1, 2]), array([0, 1, 2]))
>>> x[np.nonzero(x)]
array([ 1., 1., 1.])
>>> np.transpose(np.nonzero(x))
array([[0, 0], [1, 1], [2, 2]])
A common use for nonzero is to find the indices of an array where a condition is True. Given an array a, the condition a > 3 is a boolean array, and since False is interpreted as 0, np.nonzero(a > 3) yields the indices of a where the condition is true.
>>> a = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a > 3
array([[False, False, False], [ True, True, True], [ True, True, True]], dtype=bool)
>>> np.nonzero(a > 3)
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
The nonzero method of the boolean array can also be called.
>>> (a > 3).nonzero()
(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
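Grouping the indices by element with transpose(nonzero(a)), as described above, gives the same (row, col) pairs as argwhere from the top-level function list. A NumPy sketch reusing the array from the examples:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

rows, cols = np.nonzero(a > 3)           # one index array per dimension
pairs = np.transpose(np.nonzero(a > 3))  # one (row, col) pair per matching element

print(pairs)
print(a[rows, cols])                     # the selected values
print(np.array_equal(pairs, np.argwhere(a > 3)))
```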
-
dask.array.notnull(values)
pandas.notnull for dask arrays
-
dask.array.ones()
Blocked variant of ones
Follows the signature of ones exactly except that it also requires a keyword argument chunks=(...).
Original signature follows below.
Return a new array of given shape and type, filled with ones.
Parameters: - shape (int or sequence of ints) – Shape of the new array, e.g., (2, 3) or 2.
- dtype (data-type, optional) – The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.
- order ({'C', 'F'}, optional) – Whether to store multidimensional data in C- or Fortran-contiguous (row- or column-wise) order in memory.
Returns: out – Array of ones with the given shape, dtype, and order.
Return type: ndarray
Examples
>>> np.ones(5)
array([ 1., 1., 1., 1., 1.])
>>> np.ones((5,), dtype=int)
array([1, 1, 1, 1, 1])
>>> np.ones((2, 1))
array([[ 1.], [ 1.]])
>>> s = (2,2)
>>> np.ones(s)
array([[ 1., 1.], [ 1., 1.]])
-
dask.array.ones_like(a, dtype=None, chunks=None)
Return an array of ones with the same shape and type as a given array.
Parameters: - a (array_like) – The shape and data-type of a define these same attributes of the returned array.
- dtype (data-type, optional) – Overrides the data type of the result.
- chunks (sequence of ints) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: out – Array of ones with the same shape and type as a.
Return type: ndarray
See also
zeros_like()
- Return an array of zeros with shape and type of input.
empty_like()
- Return an empty array with shape and type of input.
zeros()
- Return a new array setting values to zero.
ones()
- Return a new array setting values to one.
empty()
- Return a new uninitialized array.
-
dask.array.percentile(a, q, interpolation='linear')
Approximate percentile of 1-D array
See numpy.percentile for more information
-
dask.array.prod(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Return the product of array elements over a given axis.
Parameters: - a (array_like) – Input data.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which a product is performed. The default, axis=None, will calculate the product of all the elements in the input array. If axis is negative it counts from the last to the first axis.
New in version 1.7.0.
If axis is a tuple of ints, a product is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.
- dtype (dtype, optional) – The type of the returned array, as well as of the accumulator in which the elements are multiplied. The dtype of a is used by default unless a has an integer dtype of less precision than the default platform integer. In that case, if a is signed then the platform integer is used while if a is unsigned then an unsigned integer of the same precision as the platform integer is used.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output, but the type of the output values will be cast if necessary.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the prod method of sub-classes of ndarray; however, any non-default value will be. If the sub-class's prod method does not implement keepdims, any exceptions will be raised.
Returns: product_along_axis – An array shaped as a but with the specified axis removed. Returns a reference to out if specified.
Return type: ndarray, see dtype parameter above.
See also
ndarray.prod()
- equivalent method
numpy.doc.ufuncs()
- Section “Output arguments”
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow. That means that, on a 32-bit platform:
>>> x = np.array([536870910, 536870910, 536870910, 536870910])
>>> np.prod(x)  # platform-dependent result
16
The product of an empty array is the neutral element 1:
>>> np.prod([])
1.0
Examples
By default, calculate the product of all elements:
>>> np.prod([1.,2.])
2.0
Even when the input array is two-dimensional:
>>> np.prod([[1.,2.],[3.,4.]])
24.0
But we can also specify the axis over which to multiply:
>>> np.prod([[1.,2.],[3.,4.]], axis=1)
array([ 2., 12.])
If the type of x is unsigned, then the output type is the unsigned platform integer:
>>> x = np.array([1, 2, 3], dtype=np.uint8)
>>> np.prod(x).dtype == np.uint
True
If x is of a signed integer type, then the output type is the default platform integer:
>>> x = np.array([1, 2, 3], dtype=np.int8)
>>> np.prod(x).dtype == np.int_
True
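The axis parameter also accepts a tuple of ints, reducing over several axes at once. A NumPy sketch on an arbitrary 2×2×2 array, checked against two successive single-axis reductions:

```python
import numpy as np

a = np.arange(1, 9).reshape(2, 2, 2)  # values 1..8

# Reduce over the first two axes at once; the last axis survives.
p = np.prod(a, axis=(0, 1))

# Same result as reducing axis 0 twice (the second call acts on what was axis 1).
p_stepwise = np.prod(np.prod(a, axis=0), axis=0)

print(p)  # products of [1, 3, 5, 7] and [2, 4, 6, 8]
```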
-
dask.array.ptp(a, axis=None)
Range of values (maximum - minimum) along an axis.
The name of the function comes from the acronym for ‘peak to peak’.
Parameters: - a (array_like) – Input values.
- axis (int, optional) – Axis along which to find the peaks. By default, flatten the array.
- out (array_like) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output, but the type of the output values will be cast if necessary.
Returns: ptp – A new array holding the result, unless out was specified, in which case a reference to out is returned.
Return type: ndarray
Examples
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])
>>> np.ptp(x, axis=0)
array([2, 2])
>>> np.ptp(x, axis=1)
array([1, 1])
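Because ptp is the maximum minus the minimum along an axis, it can be cross-checked directly. Here is a minimal NumPy sketch using the same 2x2 array as the examples:

```python
import numpy as np

x = np.arange(4).reshape((2, 2))

# ptp ("peak to peak") is just the maximum minus the minimum along an axis.
by_column = np.ptp(x, axis=0)
manual = x.max(axis=0) - x.min(axis=0)

print(by_column)                          # [2 2]
print(np.array_equal(by_column, manual))  # True
```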
dask.array.rad2deg(x[, out])
Convert angles from radians to degrees.
Parameters: - x (array_like) – Angle in radians.
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: y – The corresponding angle in degrees.
Return type: ndarray
See also
deg2rad()
- Convert angles from degrees to radians.
unwrap()
- Remove large jumps in angle by wrapping.
Notes
New in version 1.3.0.
rad2deg(x) is 180 * x / pi.
Examples
>>> np.rad2deg(np.pi/2)
90.0
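The defining formula from the notes can be checked numerically, together with the round trip through deg2rad. This is a minimal NumPy sketch, and the sample angles are arbitrary:

```python
import numpy as np

angles = np.array([0.0, np.pi / 6, np.pi / 2, np.pi])

# rad2deg(x) is defined as 180 * x / pi; compare against that formula.
converted = np.rad2deg(angles)
by_formula = 180 * angles / np.pi

print(np.allclose(converted, by_formula))          # True
# deg2rad undoes the conversion.
print(np.allclose(np.deg2rad(converted), angles))  # True
```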
dask.array.radians(x[, out])
Convert angles from degrees to radians.
Parameters: - x (array_like) – Input array in degrees.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding radian values.
Return type: ndarray
See also
deg2rad()
- equivalent function
Examples
Convert a degree array to radians:
>>> deg = np.arange(12.) * 30.
>>> np.radians(deg)
array([ 0.        ,  0.52359878,  1.04719755,  1.57079633,  2.0943951 ,
        2.61799388,  3.14159265,  3.66519143,  4.1887902 ,  4.71238898,
        5.23598776,  5.75958653])
>>> out = np.zeros((deg.shape))
>>> ret = np.radians(deg, out)
>>> ret is out
True
dask.array.ravel(array)
Return a contiguous flattened array.
A 1-D array, containing the elements of the input, is returned. A copy is made only if needed.
As of NumPy 1.10, the returned array will have the same type as the input array (for example, a masked array will be returned for a masked array input).
Parameters: - a (array_like) – Input array. The elements in a are read in the order specified by order, and packed as a 1-D array.
- order ({'C','F', 'A', 'K'}, optional) – The elements of a are read using this index order. ‘C’ means to index the elements in row-major, C-style order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in column-major, Fortran-style order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
Returns: y – If a is a matrix, y is a 1-D ndarray; otherwise y is an array of the same subtype as a. The shape of the returned array is (a.size,). Matrices are special cased for backward compatibility.
Return type: array_like
See also
ndarray.flat()
- 1-D iterator over an array.
ndarray.flatten()
- 1-D array copy of the elements of an array in row-major order.
ndarray.reshape()
- Change the shape of an array without changing its data.
Notes
In row-major, C-style order, in two dimensions, the row index varies the slowest, and the column index the quickest. This can be generalized to multiple dimensions, where row-major order implies that the index along the first axis varies slowest, and the index along the last quickest. The opposite holds for column-major, Fortran-style index ordering.
When a view is desired in as many cases as possible, arr.reshape(-1) may be preferable.
Examples
It is equivalent to reshape(-1, order=order).
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> print(np.ravel(x))
[1 2 3 4 5 6]
>>> print(x.reshape(-1))
[1 2 3 4 5 6]
>>> print(np.ravel(x, order='F'))
[1 4 2 5 3 6]
When order is ‘A’, it will preserve the array’s ‘C’ or ‘F’ ordering:
>>> print(np.ravel(x.T))
[1 4 2 5 3 6]
>>> print(np.ravel(x.T, order='A'))
[1 2 3 4 5 6]
When order is ‘K’, it will preserve orderings that are neither ‘C’ nor ‘F’, but won’t reverse axes:
>>> a = np.arange(3)[::-1]; a
array([2, 1, 0])
>>> a.ravel(order='C')
array([2, 1, 0])
>>> a.ravel(order='K')
array([2, 1, 0])
>>> a = np.arange(12).reshape(2,3,2).swapaxes(1,2); a
array([[[ 0,  2,  4],
        [ 1,  3,  5]],
       [[ 6,  8, 10],
        [ 7,  9, 11]]])
>>> a.ravel(order='C')
array([ 0,  2,  4,  1,  3,  5,  6,  8, 10,  7,  9, 11])
>>> a.ravel(order='K')
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
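Whether ravel returns a view or a copy depends on memory layout, which is the point of the note on arr.reshape(-1). The following is a minimal sketch of NumPy's in-memory behaviour, using np.shares_memory to check:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# On a C-contiguous array, C-order ravel can return a view: no data is copied.
flat_c = np.ravel(x)
print(np.shares_memory(flat_c, x))   # True

# Fortran index order cannot be expressed as a view of this C-ordered
# buffer, so a copy is made.
flat_f = np.ravel(x, order='F')
print(flat_f)                        # [0 3 1 4 2 5]
print(np.shares_memory(flat_f, x))   # False
```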
dask.array.real(*args, **kwargs)
Return the real part of the elements of the array.
Parameters: val (array_like) – Input array.
Returns: out – Output array. If val is real, the type of val is used for the output. If val has complex elements, the returned type is float.
Return type: ndarray
Examples
>>> a = np.array([1+2j, 3+4j, 5+6j])
>>> a.real
array([ 1.,  3.,  5.])
>>> a.real = 9
>>> a
array([ 9.+2.j,  9.+4.j,  9.+6.j])
>>> a.real = np.array([9, 8, 7])
>>> a
array([ 9.+2.j,  8.+4.j,  7.+6.j])
dask.array.rechunk(x, chunks, threshold=4, block_size_limit=100000000.0)
Convert blocks in dask array x for new chunks.
>>> import dask.array as da
>>> a = np.random.uniform(0, 1, 7**4).reshape((7,) * 4)
>>> x = da.from_array(a, chunks=((2, 3, 2),)*4)
>>> x.chunks
((2, 3, 2), (2, 3, 2), (2, 3, 2), (2, 3, 2))
>>> y = da.rechunk(x, chunks=((2, 4, 1), (4, 2, 1), (4, 3), (7,)))
>>> y.chunks
((2, 4, 1), (4, 2, 1), (4, 3), (7,))
The chunks argument also accepts a dict mapping an axis to its new block shape:
>>> y = da.rechunk(x, chunks={1: 2})  # rechunk axis 1 with blockshape 2
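Here is a small end-to-end sketch of the dict form. It assumes dask is installed and uses the Array.rechunk method, which mirrors dask.array.rechunk. Axes that are missing from the dict keep their existing chunks, and rechunking changes only the block layout, not the values:

```python
import numpy as np
import dask.array as da

a = np.arange(36).reshape(6, 6)
x = da.from_array(a, chunks=(3, 3))
print(x.chunks)  # ((3, 3), (3, 3))

# Dict form: only axis 1 is rechunked; axis 0 keeps its blocks.
y = x.rechunk({1: 2})
print(y.chunks)  # ((3, 3), (2, 2, 2))

# The data itself is unchanged.
print((y.compute() == a).all())  # True
```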
Parameters:
dask.array.repeat(a, repeats, axis=None)
Repeat elements of an array.
Parameters: - a (array_like) – Input array.
- repeats (int or array of ints) – The number of repetitions for each element. repeats is broadcasted to fit the shape of the given axis.
- axis (int, optional) – The axis along which to repeat values. By default, use the flattened input array, and return a flat output array.
Returns: repeated_array – Output array which has the same shape as a, except along the given axis.
Return type: ndarray
See also
tile()
- Tile an array.
Examples
>>> x = np.array([[1,2],[3,4]])
>>> np.repeat(x, 2)
array([1, 1, 2, 2, 3, 3, 4, 4])
>>> np.repeat(x, 3, axis=1)
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4]])
>>> np.repeat(x, [1, 2], axis=0)
array([[1, 2],
       [3, 4],
       [3, 4]])
dask.array.reshape(x, shape)
Reshape array to new shape.
This is a parallelized version of the np.reshape function with the following limitations:
- It assumes that the array is stored in C-order.
- It only allows for reshapings that collapse or split dimensions, like (1, 2, 3, 4) -> (1, 6, 4) or (64,) -> (4, 4, 4).
When communication is necessary, this algorithm depends on the logic within rechunk. It endeavors to keep chunk sizes roughly the same when possible.
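Below is a minimal sketch of both kinds of reshaping named above. It assumes dask is installed, and the chunk sizes are arbitrary choices for illustration. Each result is checked against NumPy's C-order reshape:

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(24).reshape(1, 2, 3, 4), chunks=(1, 1, 3, 4))

# Collapse the middle two axes: (1, 2, 3, 4) -> (1, 6, 4).
merged = da.reshape(x, (1, 6, 4))
print(merged.shape)  # (1, 6, 4)

# Split a flat axis: (24,) -> (2, 3, 4).
flat = da.arange(24, chunks=6)
split = da.reshape(flat, (2, 3, 4))

# Both agree with NumPy's C-order reshape.
print((merged.compute() == np.arange(24).reshape(1, 6, 4)).all())  # True
print((split.compute() == np.arange(24).reshape(2, 3, 4)).all())   # True
```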
See also
dask.array.result_type(*arrays_and_dtypes)
Returns the type that results from applying the NumPy type promotion rules to the arguments.
Type promotion in NumPy works similarly to the rules in languages like C++, with some slight differences. When both scalars and arrays are used, the array’s type takes precedence and the actual value of the scalar is taken into account.
For example, calculating 3*a, where a is an array of 32-bit floats, intuitively should result in a 32-bit float output. If the 3 is a 32-bit integer, the NumPy rules indicate it can’t convert losslessly into a 32-bit float, so a 64-bit float should be the result type. By examining the value of the constant, ‘3’, we see that it fits in an 8-bit integer, which can be cast losslessly into the 32-bit float.
Parameters: arrays_and_dtypes (list of arrays and dtypes) – The operands of some operation whose result type is needed.
Returns: out – The result type.
Return type: dtype
See also
dtype(), promote_types(), min_scalar_type(), can_cast()
Notes
New in version 1.6.0.
The specific algorithm used is as follows.
Categories are determined by first checking which of boolean, integer (int/uint), or floating point (float/complex) the maximum kind of all the arrays and the scalars are.
If there are only scalars or the maximum category of the scalars is higher than the maximum category of the arrays, the data types are combined with promote_types() to produce the return value.
Otherwise, min_scalar_type() is called on each array, and the resulting data types are all combined with promote_types() to produce the return value.
The set of int values is not a subset of the uint values for types with the same number of bits, something not reflected in min_scalar_type(), but handled as a special case in result_type.
Examples
>>> np.result_type(3, np.arange(7, dtype='i1'))
dtype('int8')
>>> np.result_type('i4', 'c8')
dtype('complex128')
>>> np.result_type(3.0, -2)
dtype('float64')
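The note about int and uint values shows up with dtype-only inputs. This minimal NumPy sketch avoids Python scalars on purpose, because NumPy 2.0 replaced the value-based scalar rules described above (NEP 50). Promotion between dtypes is not affected by that change:

```python
import numpy as np

# int8 and uint8 have the same width, but neither holds all values of the
# other, so the promoted type must be wider.
print(np.result_type(np.int8, np.uint8))    # int16

# No integer type holds every int64 and every uint64 value, so NumPy
# falls back to a floating-point type.
print(np.result_type(np.int64, np.uint64))  # float64

# Mixing a 32-bit integer with complex64 needs complex128 to stay lossless.
print(np.result_type('i4', 'c8'))           # complex128
```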
dask.array.rint(x[, out])
Round elements of the array to the nearest integer.
Parameters: x (array_like) – Input array.
Returns: out – Output array is same shape and type as x.
Return type: ndarray or scalar
Examples
>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> np.rint(a)
array([-2., -2., -0.,  0.,  2.,  2.,  2.])
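rint rounds exact halves to the nearest even integer, which is why -1.5 and 1.5 above map to -2.0 and 2.0. The minimal NumPy sketch below contrasts it with trunc, which simply discards the fractional part:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5, -2.5])

# Round half to even ("banker's rounding").
print(np.rint(halves))   # values: 0., 2., 2., 4., -2.

# Truncation toward zero, for comparison.
print(np.trunc(halves))  # values: 0., 1., 2., 3., -2.
```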
dask.array.roll(array, shift, axis=None)
Roll array elements along a given axis.
Elements that roll beyond the last position are re-introduced at the first.
Parameters: - a (array_like) – Input array.
- shift (int) – The number of places by which elements are shifted.
- axis (int, optional) – The axis along which elements are shifted. By default, the array is flattened before shifting, after which the original shape is restored.
Returns: res – Output array, with the same shape as a.
Return type: ndarray
See also
rollaxis()
- Roll the specified axis backwards, until it lies in a given position.
Examples
>>> x = np.arange(10)
>>> np.roll(x, 2)
array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])
>>> x2 = np.reshape(x, (2,5))
>>> x2
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> np.roll(x2, 1)
array([[9, 0, 1, 2, 3],
       [4, 5, 6, 7, 8]])
>>> np.roll(x2, 1, axis=0)
array([[5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4]])
>>> np.roll(x2, 1, axis=1)
array([[4, 0, 1, 2, 3],
       [9, 5, 6, 7, 8]])
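Rolling a 1-D array by k amounts to moving its last k elements to the front. In the minimal sketch below, roll_1d is a hypothetical helper written only for illustration and checked against np.roll:

```python
import numpy as np

def roll_1d(x, shift):
    """Hypothetical re-implementation of np.roll for a 1-D array."""
    n = len(x)
    if n == 0:
        return x.copy()
    k = shift % n  # a negative shift becomes an equivalent positive one
    # The last k elements wrap around to the front.
    return np.concatenate((x[n - k:], x[:n - k]))

x = np.arange(10)
print(roll_1d(x, 2))  # [8 9 0 1 2 3 4 5 6 7]
print(np.array_equal(roll_1d(x, -3), np.roll(x, -3)))  # True
```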
dask.array.round(a, decimals=0)
Round an array to the given number of decimals.
Refer to around for full documentation.
See also
around()
- equivalent function
dask.array.sign(x[, out])
Returns an element-wise indication of the sign of a number.
The sign function returns -1 if x < 0, 0 if x == 0, 1 if x > 0. nan is returned for nan inputs.
For complex inputs, the sign function returns sign(x.real) + 0j if x.real != 0 else sign(x.imag) + 0j.
complex(nan, 0) is returned for complex nan inputs.
Parameters: x (array_like) – Input values.
Returns: y – The sign of x.
Return type: ndarray
Notes
There is more than one definition of sign in common use for complex numbers. The definition used here is equivalent to \(x/\sqrt{x*x}\) which is different from a common alternative, \(x/|x|\).
Examples
>>> np.sign([-5., 4.5])
array([-1.,  1.])
>>> np.sign(0)
0
>>> np.sign(5-2j)
(1+0j)
dask.array.signbit(x[, out])
Returns element-wise True where signbit is set (less than zero).
Parameters: - x (array_like) – The input value(s).
- out (ndarray, optional) – Array into which the output is placed. Its type is preserved and it must be of the right shape to hold the output. See doc.ufuncs.
Returns: result – Output array, or reference to out if that was supplied.
Return type: ndarray of bool
Examples
>>> np.signbit(-1.2)
True
>>> np.signbit(np.array([1, -2.3, 2.1]))
array([False,  True, False], dtype=bool)
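"Less than zero" is slightly loose for signed zeros, because signbit reads the IEEE 754 sign bit directly. The minimal NumPy sketch below compares it with a plain x < 0 test:

```python
import numpy as np

values = np.array([-0.0, 0.0, -1.5, np.inf, -np.inf])

# -0.0 has its sign bit set even though it compares equal to 0.0.
print(np.signbit(values))  # values: True, False, True, False, True
print(values < 0)          # values: False, False, True, False, True
```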
dask.array.sin(x[, out])
Trigonometric sine, element-wise.
Parameters: x (array_like) – Angle, in radians (\(2 \pi\) rad equals 360 degrees).
Returns: y – The sine of each element of x.
Return type: array_like
Notes
The sine is one of the fundamental functions of trigonometry (the mathematical study of triangles). Consider a circle of radius 1 centered on the origin. A ray comes in from the \(+x\) axis, makes an angle at the origin (measured counter-clockwise from that axis), and departs from the origin. The \(y\) coordinate of the outgoing ray’s intersection with the unit circle is the sine of that angle. It ranges from -1 for \(x=3\pi / 2\) to +1 for \(x=\pi / 2\). The function has zeroes where the angle is a multiple of \(\pi\). Sines of angles between \(\pi\) and \(2\pi\) are negative. The numerous properties of the sine and related functions are included in any standard trigonometry text.
Examples
Print sine of one angle:
>>> np.sin(np.pi/2.)
1.0
Print sines of an array of angles given in degrees:
>>> np.sin(np.array((0., 30., 45., 60., 90.)) * np.pi / 180. )
array([ 0.        ,  0.5       ,  0.70710678,  0.8660254 ,  1.        ])
Plot the sine function:
>>> import matplotlib.pyplot as plt
>>> x = np.linspace(-np.pi, np.pi, 201)
>>> plt.plot(x, np.sin(x))
>>> plt.xlabel('Angle [rad]')
>>> plt.ylabel('sin(x)')
>>> plt.axis('tight')
>>> plt.show()
dask.array.sinh(x[, out])
Hyperbolic sine, element-wise.
Equivalent to 1/2 * (np.exp(x) - np.exp(-x)) or -1j * np.sin(1j*x).
Parameters: - x (array_like) – Input array.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding hyperbolic sine values.
Return type: ndarray
Raises: ValueError: invalid return array shape – if out is provided and out.shape != x.shape (See Examples)
Notes
If out is provided, the function writes the result into it, and returns a reference to out. (See Examples)
References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY: Dover, 1972, pg. 83.
Examples
>>> np.sinh(0)
0.0
>>> np.sinh(np.pi*1j/2)
1j
>>> np.sinh(np.pi*1j) # (exact value is 0)
1.2246063538223773e-016j
>>> # Discrepancy due to vagaries of floating point arithmetic.
>>> # Example of providing the optional output parameter
>>> out1 = np.array([0], dtype='d')
>>> out2 = np.sinh([0.1], out1)
>>> out2 is out1
True
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> np.sinh(np.zeros((3,3)),np.zeros((2,2)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid return array shape
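The two identities quoted at the top of this entry can be checked numerically. This is a minimal NumPy sketch, and the sample points are arbitrary:

```python
import numpy as np

x = np.linspace(-2.0, 2.0, 9)

# 1/2 * (exp(x) - exp(-x))
via_exp = 0.5 * (np.exp(x) - np.exp(-x))
# -1j * sin(1j*x); since sin(iy) = i*sinh(y), the imaginary part vanishes.
via_sin = -1j * np.sin(1j * x)

print(np.allclose(np.sinh(x), via_exp))  # True
print(np.allclose(np.sinh(x), via_sin))  # True
```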
dask.array.sqrt(x[, out])
Return the positive square-root of an array, element-wise.
Parameters: - x (array_like) – The values whose square-roots are required.
- out (ndarray, optional) – Alternate array object in which to put the result; if provided, it must have the same shape as x.
Returns: y – An array of the same shape as x, containing the positive square-root of each element in x. If any element in x is complex, a complex array is returned (and the square-roots of negative reals are calculated). If all of the elements in x are real, so is y, with negative elements returning nan. If out was provided, y is a reference to it.
Return type: ndarray
See also
lib.scimath.sqrt()
- A version which returns complex numbers when given negative reals.
Notes
Consistent with common convention, sqrt has as its branch cut the real “interval” [-inf, 0), and is continuous from above on it. A branch cut is a curve in the complex plane across which a given complex function fails to be continuous.
Examples
>>> np.sqrt([1,4,9])
array([ 1.,  2.,  3.])
>>> np.sqrt([4, -1, -3+4J])
array([ 2.+0.j,  0.+1.j,  1.+2.j])
>>> np.sqrt([4, -1, np.inf])
array([  2.,  NaN,  Inf])
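The difference between sqrt and the lib.scimath version listed under “See also” appears with negative reals. This minimal sketch assumes NumPy's np.emath alias for numpy.lib.scimath:

```python
import numpy as np

vals = np.array([4.0, -1.0])

# Real input: the negative entry gives nan (NumPy also emits a
# RuntimeWarning for it, silenced here).
with np.errstate(invalid='ignore'):
    real_roots = np.sqrt(vals)
print(real_roots)          # values: 2.0 and nan

# The scimath version switches to a complex result for negative reals.
print(np.emath.sqrt(vals))  # values: 2+0j and 1j
```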
dask.array.square(x[, out])
Return the element-wise square of the input.
Parameters: x (array_like) – Input data.
Returns: out – Element-wise x*x, of the same shape and dtype as x. Returns scalar if x is a scalar.
Return type: ndarray
See also
numpy.linalg.matrix_power(), sqrt(), power()
Examples
>>> np.square([-1j, 1])
array([-1.-0.j,  1.+0.j])
dask.array.squeeze(a, axis=None)
Remove single-dimensional entries from the shape of an array.
Parameters: - a (array_like) – Input data.
- axis (None or int or tuple of ints, optional) –
New in version 1.7.0.
Selects a subset of the single-dimensional entries in the shape. If an axis is selected with shape entry greater than one, an error is raised.
Returns: squeezed – The input array, but with all or a subset of the dimensions of length 1 removed. This is always a itself or a view into a.
Return type: ndarray
Examples
>>> x = np.array([[[0], [1], [2]]])
>>> x.shape
(1, 3, 1)
>>> np.squeeze(x).shape
(3,)
>>> np.squeeze(x, axis=(2,)).shape
(1, 3)
dask.array.stack(seq, axis=0)
Stack arrays along a new axis.
Given a sequence of dask Arrays, form a new dask Array by stacking them along a new dimension (axis=0 by default).
Examples
Create slices
>>> import dask.array as da
>>> import numpy as np
>>> data = [da.from_array(np.ones((4, 4)), chunks=(2, 2))
...         for i in range(3)]
>>> x = da.stack(data, axis=0)
>>> x.shape
(3, 4, 4)
>>> da.stack(data, axis=1).shape
(4, 3, 4)
>>> da.stack(data, axis=-1).shape
(4, 4, 3)
Result is a new dask Array
See also
dask.array.std(a, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Compute the standard deviation along the specified axis.
Returns the standard deviation, a measure of the spread of a distribution, of the array elements. The standard deviation is computed for the flattened array by default, otherwise over the specified axis.
Parameters: - a (array_like) – Calculate the standard deviation of these values.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which the standard deviation is computed. The default is to compute the standard deviation of the flattened array.
If this is a tuple of ints, a standard deviation is performed over multiple axes, instead of a single axis or all the axes as before.
- dtype (dtype, optional) – Type to use in computing the standard deviation. For arrays of integer type the default is float64, for arrays of float types it is the same as the array type.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type (of the calculated values) will be cast if necessary.
- ddof (int, optional) – Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the std method of sub-classes of ndarray; however, any non-default value will be. If the sub-class’s std method does not implement keepdims, any exceptions will be raised.
Returns: standard_deviation – If out is None, return a new array containing the standard deviation, otherwise return a reference to the output array.
Return type: ndarray, see dtype parameter above.
Notes
The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(abs(x - x.mean())**2)).
The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables. The standard deviation computed in this function is the square root of the estimated variance, so even with ddof=1, it will not be an unbiased estimate of the standard deviation per se.
Note that, for complex numbers, std takes the absolute value before squaring, so that the result is always real and nonnegative.
For floating-point input, the std is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.std(a)
1.1180339887498949
>>> np.std(a, axis=0)
array([ 1.,  1.])
>>> np.std(a, axis=1)
array([ 0.5,  0.5])
In single precision, std() can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.std(a)
0.45000005
Computing the standard deviation in float64 is more accurate:
>>> np.std(a, dtype=np.float64)
0.44999999925494177
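The formula in the notes, with divisor N - ddof, can be written out by hand and compared with np.std. This is a minimal NumPy sketch on a small data set whose population standard deviation is exactly 2:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
squared_dev = np.abs(x - x.mean()) ** 2  # mean is 5, squares sum to 32

# ddof=0 (default) divides by N; ddof=1 divides by N - 1.
pop_std = np.sqrt(squared_dev.sum() / n)
sample_std = np.sqrt(squared_dev.sum() / (n - 1))

print(np.std(x))                                 # 2.0
print(np.isclose(np.std(x), pop_std))            # True
print(np.isclose(np.std(x, ddof=1), sample_std)) # True
```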
dask.array.sum(a, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Sum of array elements over a given axis.
Parameters: - a (array_like) – Elements to sum.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which a sum is performed. The default, axis=None, will sum all of the elements of the input array. If axis is negative it counts from the last to the first axis.
New in version 1.7.0.
If axis is a tuple of ints, a sum is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.
- dtype (dtype, optional) – The type of the returned array and of the accumulator in which the elements are summed. The dtype of a is used by default unless a has an integer dtype of less precision than the default platform integer. In that case, if a is signed then the platform integer is used while if a is unsigned then an unsigned integer of the same precision as the platform integer is used.
- out (ndarray, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output, but the type of the output values will be cast if necessary.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the sum method of sub-classes of ndarray; however, any non-default value will be. If the sub-class’s sum method does not implement keepdims, any exceptions will be raised.
Returns: sum_along_axis – An array with the same shape as a, with the specified axis removed. If a is a 0-d array, or if axis is None, a scalar is returned. If an output array is specified, a reference to out is returned.
Return type: ndarray
See also
ndarray.sum()
- Equivalent method.
cumsum()
- Cumulative sum of array elements.
trapz()
- Integration of array values using the composite trapezoidal rule.
mean(), average()
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
The sum of an empty array is the neutral element 0:
>>> np.sum([])
0.0
Examples
>>> np.sum([0.5, 1.5])
2.0
>>> np.sum([0.5, 0.7, 0.2, 1.5], dtype=np.int32)
1
>>> np.sum([[0, 1], [0, 5]])
6
>>> np.sum([[0, 1], [0, 5]], axis=0)
array([0, 6])
>>> np.sum([[0, 1], [0, 5]], axis=1)
array([1, 5])
If the accumulator is too small, overflow occurs:
>>> np.ones(128, dtype=np.int8).sum(dtype=np.int8)
-128
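The keepdims description (the result "will broadcast correctly against the original arr") is easiest to see by normalising rows. This is a minimal NumPy sketch using the array from the examples:

```python
import numpy as np

a = np.array([[0, 1], [0, 5]])

# Without keepdims the reduced axis disappears.
row_sums = a.sum(axis=1)
print(row_sums.shape)       # (2,)

# With keepdims=True it stays with length 1, so it broadcasts against `a`.
row_sums_kept = a.sum(axis=1, keepdims=True)
print(row_sums_kept.shape)  # (2, 1)

# Divide each row by its own total.
fractions = a / row_sums_kept
print(fractions)            # rows: [0, 1] and [0, 1]
```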
dask.array.take(a, indices, axis=0)
Take elements from an array along an axis.
This function does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
Parameters: - a (array_like) – The source array.
- indices (array_like) –
The indices of the values to extract.
New in version 1.8.0.
Also allow scalars for indices.
- axis (int, optional) – The axis over which to select values. By default, the flattened input array is used.
- out (ndarray, optional) – If provided, the result will be placed in this array. It should be of the appropriate shape and dtype.
- mode ({'raise', 'wrap', 'clip'}, optional) –
Specifies how out-of-bounds indices will behave.
- ‘raise’ – raise an error (default)
- ‘wrap’ – wrap around
- ‘clip’ – clip to the range
‘clip’ mode means that all indices that are too large are replaced by the index that addresses the last element along that axis. Note that this disables indexing with negative numbers.
Returns: subarray – The returned array has the same type as a.
Return type: ndarray
See also
compress()
- Take elements using a boolean mask
ndarray.take()
- equivalent method
Examples
>>> a = [4, 3, 5, 7, 6, 8]
>>> indices = [0, 1, 4]
>>> np.take(a, indices)
array([4, 3, 6])
In this example, if a is an ndarray, “fancy” indexing can be used.
>>> a = np.array(a)
>>> a[indices]
array([4, 3, 6])
If indices is not one dimensional, the output also has these dimensions.
>>> np.take(a, [[0, 1], [2, 3]])
array([[4, 3],
       [5, 7]])
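The mode parameter is easiest to see with out-of-range indices. This minimal sketch calls np.take directly. The dask.array.take signature above does not list mode, so treat this as NumPy behaviour:

```python
import numpy as np

a = np.array([4, 3, 5, 7, 6, 8])
out_of_range = [7, -1]

# 'wrap': indices are taken modulo len(a), so 7 -> 1 and -1 -> 5.
print(np.take(a, out_of_range, mode='wrap'))  # [3 8]

# 'clip': too-large indices become the last index, and negative indices
# become 0 (negative indexing is disabled in this mode).
print(np.take(a, out_of_range, mode='clip'))  # [8 4]

# 'raise' (the default) rejects the out-of-bounds index 7.
try:
    np.take(a, out_of_range)
except IndexError as exc:
    print("IndexError:", exc)
```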
dask.array.tan(x[, out])
Compute tangent element-wise.
Equivalent to np.sin(x)/np.cos(x) element-wise.
Parameters: - x (array_like) – Input array.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding tangent values.
Return type: ndarray
Raises: ValueError: invalid return array shape – if out is provided and out.shape != x.shape (See Examples)
Notes
If out is provided, the function writes the result into it, and returns a reference to out. (See Examples)
References
M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY: Dover, 1972.
Examples
>>> from math import pi
>>> np.tan(np.array([-pi,pi/2,pi]))
array([  1.22460635e-16,   1.63317787e+16,  -1.22460635e-16])
>>> # Example of providing the optional output parameter illustrating
>>> # that what is returned is a reference to said parameter
>>> out1 = np.array([0], dtype='d')
>>> out2 = np.tan([0.1], out1)
>>> out2 is out1
True
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> np.tan(np.zeros((3,3)),np.zeros((2,2)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid return array shape
dask.array.tanh(x[, out])
Compute hyperbolic tangent element-wise.
Equivalent to np.sinh(x)/np.cosh(x) or -1j * np.tan(1j*x).
Parameters: - x (array_like) – Input array.
- out (ndarray, optional) – Output array of same shape as x.
Returns: y – The corresponding hyperbolic tangent values.
Return type: ndarray
Raises: ValueError: invalid return array shape – if out is provided and out.shape != x.shape (See Examples)
Notes
If out is provided, the function writes the result into it, and returns a reference to out. (See Examples)
References
[1] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York, NY: Dover, 1972, pg. 83. http://www.math.sfu.ca/~cbm/aands/
[2] Wikipedia, “Hyperbolic function”, http://en.wikipedia.org/wiki/Hyperbolic_function
Examples
>>> np.tanh((0, np.pi*1j, np.pi*1j/2))
array([ 0. +0.00000000e+00j,  0. -1.22460635e-16j,  0. +1.63317787e+16j])
>>> # Example of providing the optional output parameter illustrating
>>> # that what is returned is a reference to said parameter
>>> out1 = np.array([0], dtype='d')
>>> out2 = np.tanh([0.1], out1)
>>> out2 is out1
True
>>> # Example of ValueError due to provision of shape mis-matched `out`
>>> np.tanh(np.zeros((3,3)),np.zeros((2,2)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid return array shape
dask.array.tensordot(lhs, rhs, axes=2)
Compute tensor dot product along specified axes for arrays >= 1-D.
Given two tensors (arrays of dimension greater than or equal to one), a and b, and an array_like object containing two array_like objects, (a_axes, b_axes), sum the products of a’s and b’s elements (components) over the axes specified by a_axes and b_axes. The third argument can be a single non-negative integer_like scalar, N; if it is such, then the last N dimensions of a and the first N dimensions of b are summed over.
Parameters: - a, b (array_like) – Tensors to “dot”.
- axes (int or (2,) array_like) –
- integer_like: If an int N, sum over the last N axes of a and the first N axes of b in order. The sizes of the corresponding axes must match.
- (2,) array_like: Or, a list of axes to be summed over, first sequence applying to a, second to b. Both elements array_like must be of the same length.
See also
dot(), einsum()
Notes
Three common use cases are:
- axes = 0 : tensor product \(a \otimes b\)
- axes = 1 : tensor dot product \(a \cdot b\)
- axes = 2 : (default) tensor double contraction \(a : b\)
When axes is integer_like, the sequence for evaluation will be: first the -Nth axis in a and 0th axis in b, and the -1th axis in a and Nth axis in b last.
When there is more than one axis to sum over - and they are not the last (first) axes of a (b) - the argument axes should consist of two sequences of the same length, with the first axis to sum over given first in both sequences, the second axis second, and so forth.
Examples
A “traditional” example:
>>> a = np.arange(60.).reshape(3,4,5)
>>> b = np.arange(24.).reshape(4,3,2)
>>> c = np.tensordot(a,b, axes=([1,0],[0,1]))
>>> c.shape
(5, 2)
>>> c
array([[ 4400.,  4730.],
       [ 4532.,  4874.],
       [ 4664.,  5018.],
       [ 4796.,  5162.],
       [ 4928.,  5306.]])
>>> # A slower but equivalent way of computing the same...
>>> d = np.zeros((5,2))
>>> for i in range(5):
...   for j in range(2):
...     for k in range(3):
...       for n in range(4):
...         d[i,j] += a[k,n,i] * b[n,k,j]
>>> c == d
array([[ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True]], dtype=bool)
An extended example taking advantage of the overloading of + and *:
>>> a = np.array(range(1, 9))
>>> a.shape = (2, 2, 2)
>>> A = np.array(('a', 'b', 'c', 'd'), dtype=object)
>>> A.shape = (2, 2)
>>> a; A
array([[[1, 2],
        [3, 4]],
       [[5, 6],
        [7, 8]]])
array([[a, b],
       [c, d]], dtype=object)
>>> np.tensordot(a, A) # third argument default is 2 for double-contraction
array([abbcccdddd, aaaaabbbbbbcccccccdddddddd], dtype=object)
>>> np.tensordot(a, A, 1)
array([[[acc, bdd],
        [aaacccc, bbbdddd]],
       [[aaaaacccccc, bbbbbdddddd],
        [aaaaaaacccccccc, bbbbbbbdddddddd]]], dtype=object)
>>> np.tensordot(a, A, 0) # tensor product (result too long to incl.)
array([[[[[a, b],
          [c, d]],
          ...
>>> np.tensordot(a, A, (0, 1))
array([[[abbbbb, cddddd],
        [aabbbbbb, ccdddddd]],
       [[aaabbbbbbb, cccddddddd],
        [aaaabbbbbbbb, ccccdddddddd]]], dtype=object)
>>> np.tensordot(a, A, (2, 1))
array([[[abb, cdd],
        [aaabbbb, cccdddd]],
       [[aaaaabbbbbb, cccccdddddd],
        [aaaaaaabbbbbbbb, cccccccdddddddd]]], dtype=object)
>>> np.tensordot(a, A, ((0, 1), (0, 1)))
array([abbbcccccddddddd, aabbbbccccccdddddddd], dtype=object)
>>> np.tensordot(a, A, ((2, 1), (1, 0)))
array([acccbbdddd, aaaaacccccccbbbbbbdddddddd], dtype=object)
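The nested-loop check in the first example can also be written as a single einsum call, which makes the summed-over axes explicit. This is a minimal NumPy sketch, and the subscript string is one way to spell that contraction:

```python
import numpy as np

a = np.arange(60.).reshape(3, 4, 5)
b = np.arange(24.).reshape(4, 3, 2)

# Contract axis 1 of a with axis 0 of b, and axis 0 of a with axis 1 of b.
c = np.tensordot(a, b, axes=([1, 0], [0, 1]))

# Same contraction: k and n are summed over, i and j survive, matching
# d[i,j] += a[k,n,i] * b[n,k,j] in the loop version.
d = np.einsum('kni,nkj->ij', a, b)

print(c.shape)            # (5, 2)
print(np.allclose(c, d))  # True
```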
dask.array.tile(A, reps)
Construct an array by repeating A the number of times given by reps.
If reps has length d, the result will have dimension of max(d, A.ndim).
If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.
If A.ndim > d, reps is promoted to A.ndim by prepending 1’s to it. Thus for an A of shape (2, 3, 4, 5), a reps of (2, 2) is treated as (1, 1, 2, 2).
Note: Although tile may be used for broadcasting, it is strongly recommended to use numpy’s broadcasting operations and functions.
Parameters: - A (array_like) – The input array.
- reps (array_like) – The number of repetitions of A along each axis.
Returns: c – The tiled output array.
Return type: ndarray
See also
repeat()
- Repeat elements of an array.
broadcast_to()
- Broadcast an array to a new shape
Examples
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]],
       [[0, 1, 2, 0, 1, 2]]])
>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2)
array([[1, 2, 1, 2],
       [3, 4, 3, 4]])
>>> np.tile(b, (2, 1))
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])
>>> c = np.array([1,2,3,4])
>>> np.tile(c,(4,1))
array([[1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4],
       [1, 2, 3, 4]])
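tile and repeat (listed under “See also”) are easy to confuse, and the note recommends broadcasting over tile. The minimal NumPy sketch below shows all three. np.broadcast_to returns a read-only view that shares memory with its input:

```python
import numpy as np

a = np.array([0, 1, 2])

# tile repeats the whole array; repeat repeats each element in place.
print(np.tile(a, 2))    # [0 1 2 0 1 2]
print(np.repeat(a, 2))  # [0 0 1 1 2 2]

# A broadcast view gives the same values as tiling rows, without copying.
view = np.broadcast_to(a, (4, 3))
print(np.array_equal(view, np.tile(a, (4, 1))))  # True
print(np.shares_memory(view, a))                 # True
```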
dask.array.topk(k, x)
The top k elements of an array.
Returns the k greatest elements of the array in sorted order. Only works on arrays of a single dimension.
This assumes that k is small. All results will be returned in a single chunk.
Examples
>>> x = np.array([5, 1, 3, 6])
>>> d = da.from_array(x, chunks=2)
>>> d.topk(2).compute()
array([6, 5])
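Why k should be small: a chunked top-k can keep at most k candidates per chunk and then merge them, so the final merge sees up to k times the number of chunks values in one chunk. A NumPy sketch of that two-stage idea, assuming a 1-D input split into fixed-size chunks; blockwise_topk is hypothetical and not dask's implementation:

```python
import numpy as np


def blockwise_topk(x, k, chunk_size):
    """Sketch: top-k of a 1-D array via per-chunk candidates and one final merge."""
    chunks = [x[i:i + chunk_size] for i in range(0, len(x), chunk_size)]
    # A chunk can contribute at most k elements to the global top k.
    candidates = np.concatenate([np.sort(c)[::-1][:k] for c in chunks])
    # Final merge: at most k * n_chunks values, which stays small when k is small.
    return np.sort(candidates)[::-1][:k]


x = np.array([5, 1, 3, 6])
print(blockwise_topk(x, 2, chunk_size=2))  # [6 5]
```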
-
dask.array.transpose(a, axes=None)
Permute the dimensions of an array.
Parameters: - a (array_like) – Input array.
- axes (list of ints, optional) – By default, reverse the dimensions, otherwise permute the axes according to the values given.
Returns: p – a with its axes permuted. A view is returned whenever possible.
Return type: ndarray
See also
moveaxis(), argsort()
Notes
Use transpose(a, argsort(axes)) to invert the transposition of tensors when using the axes keyword argument.
Transposing a 1-D array returns an unchanged view of the original array.
Examples
>>> x = np.arange(4).reshape((2,2))
>>> x
array([[0, 1],
       [2, 3]])
>>> np.transpose(x)
array([[0, 2],
       [1, 3]])
>>> x = np.ones((1, 2, 3))
>>> np.transpose(x, (1, 0, 2)).shape
(2, 1, 3)
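The Notes recommend transpose(a, argsort(axes)) to undo a permutation. A short NumPy check of that identity on a 3-D array:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)
axes = (2, 0, 1)
y = np.transpose(x, axes)        # shape (4, 2, 3)
inverse = np.argsort(axes)       # array([1, 2, 0])
z = np.transpose(y, inverse)     # shape (2, 3, 4) again
print(y.shape, z.shape, np.array_equal(x, z))  # (4, 2, 3) (2, 3, 4) True
```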
-
dask.array.tril(m, k=0)
Lower triangle of an array with elements above the k-th diagonal zeroed.
Parameters: - m (array_like, shape (M, M)) – Input array.
- k (int, optional) – Diagonal above which to zero elements. k = 0 (the default) is the main diagonal, k < 0 is below it and k > 0 is above.
Returns: tril – Lower triangle of m, of same shape and data-type as m.
Return type: ndarray, shape (M, M)
See also
triu()
- upper triangle of an array
-
dask.array.triu(m, k=0)
Upper triangle of an array with elements below the k-th diagonal zeroed.
Parameters: - m (array_like, shape (M, N)) – Input array.
- k (int, optional) – Diagonal below which to zero elements. k = 0 (the default) is the main diagonal, k < 0 is below it and k > 0 is above.
Returns: triu – Upper triangle of m, of same shape and data-type as m.
Return type: ndarray, shape (M, N)
See also
tril()
- lower triangle of an array
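tril and triu are complementary: tril(m, k) keeps entries on and below the k-th diagonal, while triu(m, k + 1) keeps entries strictly above it, so the two parts sum back to m. A NumPy sketch; split_at_diagonal is a hypothetical helper:

```python
import numpy as np


def split_at_diagonal(m, k=0):
    """Hypothetical helper: split m into the tril(m, k) part and the triu(m, k + 1) part."""
    return np.tril(m, k), np.triu(m, k + 1)


m = np.arange(1, 10).reshape(3, 3)
lower, upper = split_at_diagonal(m, k=0)
print(np.array_equal(lower + upper, m))  # True
```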
-
dask.array.trunc(x[, out])
Return the truncated value of the input, element-wise.
The truncated value of the scalar x is the nearest integer i which is closer to zero than x is. In short, the fractional part of the signed number x is discarded.
Parameters: x (array_like) – Input data.
Returns: y – The truncated value of each element in x.
Return type: ndarray or scalar
Notes
New in version 1.3.0.
Examples
>>> a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
>>> np.trunc(a)
array([-1., -1., -0., 0., 1., 1., 2.])
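"Closer to zero" is what distinguishes trunc from floor: the two agree for non-negative inputs and differ for negative non-integers. A NumPy check, including the equivalent form sign(x) * floor(abs(x)):

```python
import numpy as np

a = np.array([-1.7, -1.5, -0.2, 0.2, 1.5, 1.7, 2.0])
toward_zero = np.trunc(a)        # rounds toward zero: trunc(-1.7) is -1.0
toward_neg_inf = np.floor(a)     # rounds toward -inf: floor(-1.7) is -2.0
same_as_trunc = np.sign(a) * np.floor(np.abs(a))
print(np.array_equal(toward_zero, same_as_trunc))  # True
```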
-
dask.array.unique(x)
Find the unique elements of an array.
Returns the sorted unique elements of an array. There are three optional outputs in addition to the unique elements: the indices of the input array that give the unique values, the indices of the unique array that reconstruct the input array, and the number of times each unique value comes up in the input array.
Parameters: - ar (array_like) – Input array. This will be flattened if it is not already 1-D.
- return_index (bool, optional) – If True, also return the indices of ar that result in the unique array.
- return_inverse (bool, optional) – If True, also return the indices of the unique array that can be used to reconstruct ar.
- return_counts (bool, optional) –
If True, also return the number of times each unique value comes up in ar.
New in version 1.9.0.
Returns: unique (ndarray) – The sorted unique values.
unique_indices (ndarray, optional) – The indices of the first occurrences of the unique values in the (flattened) original array. Only provided if return_index is True.
unique_inverse (ndarray, optional) – The indices to reconstruct the (flattened) original array from the unique array. Only provided if return_inverse is True.
unique_counts (ndarray, optional) – The number of times each of the unique values comes up in the original array. Only provided if return_counts is True.
New in version 1.9.0.
See also
numpy.lib.arraysetops
- Module with a number of other functions for performing set operations on arrays.
Examples
>>> np.unique([1, 1, 2, 2, 3, 3])
array([1, 2, 3])
>>> a = np.array([[1, 1], [2, 3]])
>>> np.unique(a)
array([1, 2, 3])
Return the indices of the original array that give the unique values:
>>> a = np.array(['a', 'b', 'b', 'c', 'a'])
>>> u, indices = np.unique(a, return_index=True)
>>> u
array(['a', 'b', 'c'], dtype='<U1')
>>> indices
array([0, 1, 3])
>>> a[indices]
array(['a', 'b', 'c'], dtype='<U1')
Reconstruct the input array from the unique values:
>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> u, indices = np.unique(a, return_inverse=True)
>>> u
array([1, 2, 3, 4, 6])
>>> indices
array([0, 1, 4, 3, 1, 2, 1])
>>> u[indices]
array([1, 2, 6, 4, 2, 3, 2])
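Finding unique values composes well across chunks: taking the unique values of each chunk and then the unique values of their concatenation gives the same sorted result as a single global pass. A NumPy sketch of that idea, assuming fixed-size chunks of the flattened input; blockwise_unique is hypothetical and not dask's implementation:

```python
import numpy as np


def blockwise_unique(x, chunk_size):
    """Sketch: unique per chunk, then unique over the concatenated partial results."""
    x = np.ravel(x)  # like np.unique, treat the input as flattened
    partial = [np.unique(x[i:i + chunk_size]) for i in range(0, x.size, chunk_size)]
    return np.unique(np.concatenate(partial))


a = np.array([1, 2, 6, 4, 2, 3, 2])
print(blockwise_unique(a, chunk_size=3))  # [1 2 3 4 6]
```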
-
dask.array.var(a, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
Parameters: - a (array_like) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
- axis (None or int or tuple of ints, optional) –
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.
- dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
- out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
- ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
- keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original arr.
If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of ndarray; however, any non-default value will be. If the sub-class's var method does not implement keepdims, any exceptions will be raised.
Returns: variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
Return type: ndarray, see dtype parameter above
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).
The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([ 1., 1.])
>>> np.var(a, axis=1)
array([ 0.25, 0.25])
In single precision, var() can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.var(a)
0.20250003
Computing the variance in float64 is more accurate:
>>> np.var(a, dtype=np.float64)
0.20249999932944759
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.2025
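A reduction over a chunked array needs a way to combine per-chunk partial results. For the variance, one standard choice is to summarize each chunk as (count, mean, sum of squared deviations) and merge summaries pairwise. The sketch below illustrates that pairwise update in NumPy; merge_moments and chunked_var are hypothetical helpers, and this is not a description of dask's internal code:

```python
import numpy as np


def merge_moments(a, b):
    """Combine (count, mean, M2) summaries of two chunks; M2 = sum of squared deviations."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta**2 * n_a * n_b / n
    return n, mean, m2


def chunked_var(x, chunk_size, ddof=0):
    """Sketch: variance of the flattened array from per-chunk summaries."""
    x = np.ravel(x).astype(np.float64)
    summaries = []
    for i in range(0, x.size, chunk_size):
        c = x[i:i + chunk_size]
        summaries.append((c.size, c.mean(), ((c - c.mean()) ** 2).sum()))
    total = summaries[0]
    for s in summaries[1:]:
        total = merge_moments(total, s)
    n, _, m2 = total
    return m2 / (n - ddof)  # divisor N - ddof, as in the Notes


print(chunked_var(np.array([[1, 2], [3, 4]]), chunk_size=3))  # 1.25
```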
-
dask.array.vnorm(a, ord=None, axis=None, dtype=None, keepdims=False, split_every=None, out=None)
Vector norm.
See np.linalg.norm.
-
dask.array.vstack(tup)
Stack arrays in sequence vertically (row wise).
Take a sequence of arrays and stack them vertically to make a single array. Rebuild arrays divided by vsplit.
Parameters: tup (sequence of ndarrays) – Tuple containing arrays to be stacked. The arrays must have the same shape along all but the first axis.
Returns: stacked – The array formed by stacking the given arrays.
Return type: ndarray
See also
stack()
- Join a sequence of arrays along a new axis.
hstack()
- Stack arrays in sequence horizontally (column wise).
dstack()
- Stack arrays in sequence depth wise (along third dimension).
concatenate()
- Join a sequence of arrays along an existing axis.
vsplit()
- Split array into a list of multiple sub-arrays vertically.
Notes
Equivalent to np.concatenate(tup, axis=0) if tup contains arrays that are at least 2-dimensional.
Examples
>>> a = np.array([1, 2, 3])
>>> b = np.array([2, 3, 4])
>>> np.vstack((a,b))
array([[1, 2, 3],
       [2, 3, 4]])
>>> a = np.array([[1], [2], [3]])
>>> b = np.array([[2], [3], [4]])
>>> np.vstack((a,b))
array([[1],
       [2],
       [3],
       [2],
       [3],
       [4]])
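For 1-D inputs the concatenate equivalence in the Notes holds after each array is promoted to shape (1, N). A NumPy check of that promotion using np.atleast_2d:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
# 1-D inputs behave as rows of shape (1, N) before being joined along axis 0.
stacked = np.vstack((a, b))
manual = np.concatenate([np.atleast_2d(a), np.atleast_2d(b)], axis=0)
print(stacked.shape, np.array_equal(stacked, manual))  # (2, 3) True
```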
-
dask.array.where(condition[, x, y])
Return elements, either from x or y, depending on condition.
If only condition is given, return condition.nonzero().
Parameters: - condition (array_like, bool) – When True, yield x, otherwise yield y.
- x, y (array_like, optional) – Values from which to choose. x, y and condition need to be broadcastable to a common shape.
Returns: out – If both x and y are specified, the output array contains elements of x where condition is True, and elements from y elsewhere.
If only condition is given, return the tuple condition.nonzero(), the indices where condition is True.
Return type: ndarray or tuple of ndarrays
Notes
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
Examples
>>> np.where([[True, False], [True, True]],
...          [[1, 2], [3, 4]],
...          [[9, 8], [7, 6]])
array([[1, 8],
       [3, 4]])
>>> np.where([[0, 1], [1, 0]])
(array([0, 1]), array([1, 0]))
>>> x = np.arange(9.).reshape(3, 3)
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))
>>> x[np.where( x > 3.0 )]               # Note: result is 1D.
array([ 4., 5., 6., 7., 8.])
>>> np.where(x < 5, x, -1)               # Note: broadcasting.
array([[ 0., 1., 2.],
       [ 3., 4., -1.],
       [-1., -1., -1.]])
Find the indices of elements of x that are in goodvalues.
>>> goodvalues = [3, 4, 7]
>>> ix = np.in1d(x.ravel(), goodvalues).reshape(x.shape)
>>> ix
array([[False, False, False],
       [ True, True, False],
       [False, True, False]], dtype=bool)
>>> np.where(ix)
(array([1, 1, 2]), array([0, 1, 1]))
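Both calling forms can be checked against their plain-Python descriptions from the Notes: the three-argument form against the list comprehension, and the one-argument form against condition.nonzero():

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Three-argument form: element-wise selection, as in the list comprehension above.
selected = np.where(condition, x, y)
by_hand = np.array([xv if c else yv for (c, xv, yv) in zip(condition, x, y)])
print(selected.tolist())  # [1, 20, 3, 40]

# One-argument form: the same indices as condition.nonzero().
print(np.where(condition)[0].tolist())  # [0, 2]
```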
-
dask.array.zeros()
Blocked variant of zeros.
Follows the signature of zeros exactly except that it also requires a keyword argument chunks=(...).
The original signature follows below: zeros(shape, dtype=float, order='C')
Return a new array of given shape and type, filled with zeros.
Parameters: - shape (int or sequence of ints) – Shape of the new array, e.g., (2, 3) or 2.
- dtype (data-type, optional) – The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.
- order ({'C', 'F'}, optional) – Whether to store multidimensional data in C- or Fortran-contiguous (row- or column-wise) order in memory.
Returns: out – Array of zeros with the given shape, dtype, and order.
Return type: ndarray
See also
zeros_like()
- Return an array of zeros with shape and type of input.
ones_like()
- Return an array of ones with shape and type of input.
empty_like()
- Return an empty array with shape and type of input.
ones()
- Return a new array setting values to one.
empty()
- Return a new uninitialized array.
Examples
>>> np.zeros(5)
array([ 0., 0., 0., 0., 0.])
>>> np.zeros((5,), dtype=int)
array([0, 0, 0, 0, 0])
>>> np.zeros((2, 1))
array([[ 0.],
       [ 0.]])
>>> s = (2,2)
>>> np.zeros(s)
array([[ 0., 0.],
       [ 0., 0.]])
>>> np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]) # custom dtype
array([(0, 0), (0, 0)],
      dtype=[('x', '<i4'), ('y', '<i4')])
-
dask.array.zeros_like(a, dtype=None, chunks=None)
Return an array of zeros with the same shape and type as a given array.
Parameters: - a (array_like) – The shape and data-type of a define these same attributes of the returned array.
- dtype (data-type, optional) – Overrides the data type of the result.
- chunks (sequence of ints) – The number of samples on each block. Note that the last block will have fewer samples if len(array) % chunks != 0.
Returns: out – Array of zeros with the same shape and type as a.
Return type: ndarray
See also
ones_like()
- Return an array of ones with shape and type of input.
empty_like()
- Return an empty array with shape and type of input.
zeros()
- Return a new array setting values to zero.
ones()
- Return a new array setting values to one.
empty()
- Return a new uninitialized array.
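The chunks rule above (equal blocks, with a smaller last block when the length is not a multiple of the chunk size) can be written out for a single axis. block_sizes is a hypothetical helper for illustration; it mirrors the per-axis block sizes that a dask array reports in its chunks attribute, for example ((4, 4, 2),) for a length-10 array with chunks=4:

```python
def block_sizes(length, chunk):
    """Hypothetical helper: split `length` elements along one axis into blocks of size `chunk`."""
    full, remainder = divmod(length, chunk)
    sizes = [chunk] * full
    if remainder:
        # The last block is smaller when length % chunk != 0.
        sizes.append(remainder)
    return tuple(sizes)


print(block_sizes(10, 4))  # (4, 4, 2)
print(block_sizes(8, 4))   # (4, 4)
```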
-
dask.array.linalg.cholesky(a, lower=False)
Returns the Cholesky decomposition, \(A = L L^*\) or \(A = U^* U\) of a Hermitian positive-definite matrix A.
Parameters: - a ((M, M) array_like) – Matrix to be decomposed.
- lower (bool, optional) – Whether to compute the upper or lower triangular Cholesky factorization. Default is upper-triangular.
Returns: c – Upper- or lower-triangular Cholesky factor of a.
Return type: (M, M) Array
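The two forms \(A = L L^*\) and \(A = U^* U\) are the same factorization, with \(U = L^*\). NumPy's np.linalg.cholesky returns the lower factor by default, so the upper factor (this function's default) is its conjugate transpose. A NumPy check on a small symmetric positive-definite matrix:

```python
import numpy as np

a = np.array([[4.0, 2.0], [2.0, 3.0]])   # real symmetric positive-definite
l = np.linalg.cholesky(a)                # NumPy's default: lower factor L
u = l.conj().T                           # upper factor U = L^*
print(np.allclose(l @ l.conj().T, a))    # True: A = L L^*
print(np.allclose(u.conj().T @ u, a))    # True: A = U^* U
```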
-
dask.array.linalg.inv(a)
Compute the inverse of a matrix with LU decomposition and forward / backward substitutions.
Parameters: a (array_like) – Square matrix to be inverted.
Returns: ainv – Inverse of the matrix a.
Return type: Array
-
dask.array.linalg.lstsq(a, b)
Return the least-squares solution to a linear matrix equation using QR decomposition.
Solves the equation a x = b by computing a vector x that minimizes the Euclidean 2-norm || b - a x ||^2. The equation may be under-, well-, or over-determined (i.e., the number of linearly independent rows of a can be less than, equal to, or greater than its number of linearly independent columns). If a is square and of full rank, then x (but for round-off error) is the “exact” solution of the equation.
Parameters: - a ((M, N) array_like) – “Coefficient” matrix.
- b ((M,) array_like) – Ordinate or “dependent variable” values.
Returns: - x ((N,) Array) – Least-squares solution. If b is two-dimensional, the solutions are in the K columns of x.
- residuals ((1,) Array) – Sums of residuals; squared Euclidean 2-norm for each column in b - a*x.
- rank (Array) – Rank of matrix a.
- s ((min(M, N),) Array) – Singular values of a.
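How QR gives a least-squares solution: with a = q r (q having orthonormal columns, r upper-triangular), minimizing || b - a x || reduces to the small square system r x = q^T b. A minimal in-memory NumPy sketch, assuming a is tall with full column rank; it is not dask's blocked implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((20, 3))   # tall matrix, full column rank
b = rng.standard_normal(20)

# Reduced QR: q is (20, 3) with orthonormal columns, r is (3, 3) upper-triangular.
q, r = np.linalg.qr(a)
x = np.linalg.solve(r, q.T @ b)    # solve r x = q^T b
x_ref, *_ = np.linalg.lstsq(a, b, rcond=None)
print(np.allclose(x, x_ref))  # True
```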
-
dask.array.linalg.lu(a)
Compute the LU decomposition of a matrix.
Examples
>>> p, l, u = da.linalg.lu(x)
Returns: - p (Array, permutation matrix)
- l (Array, lower triangular matrix with unit diagonal.)
- u (Array, upper triangular matrix)
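What the three returned factors mean: p is a permutation matrix, l is lower-triangular with unit diagonal, u is upper-triangular, and a = p @ l @ u. The NumPy sketch below computes such a factorization for a small nonsingular square matrix using Doolittle elimination with partial pivoting; lu_partial_pivot is hypothetical, and the a = p @ l @ u convention is assumed to match the factors listed above:

```python
import numpy as np


def lu_partial_pivot(a):
    """Sketch: LU with partial pivoting for a nonsingular square matrix; a = p @ l @ u."""
    u = np.array(a, dtype=float)
    n = u.shape[0]
    l = np.eye(n)
    perm = np.arange(n)
    for j in range(n - 1):
        # Bring the row with the largest candidate pivot in column j to row j.
        pivot = j + np.argmax(np.abs(u[j:, j]))
        if pivot != j:
            u[[j, pivot], j:] = u[[pivot, j], j:]
            l[[j, pivot], :j] = l[[pivot, j], :j]
            perm[[j, pivot]] = perm[[pivot, j]]
        # Eliminate entries below the pivot, recording the multipliers in l.
        for i in range(j + 1, n):
            l[i, j] = u[i, j] / u[j, j]
            u[i, j:] -= l[i, j] * u[j, j:]
    # Row i of l @ u equals row perm[i] of a, so p[perm[i], i] = 1 gives a = p @ l @ u.
    p = np.zeros((n, n))
    p[perm, np.arange(n)] = 1.0
    return p, l, np.triu(u)


a = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
p, l, u = lu_partial_pivot(a)
print(np.allclose(p @ l @ u, a))  # True
```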
-
dask.array.linalg.norm(x, ord=None, axis=None, keepdims=False)
Matrix or vector norm.
This function is able to return one of eight different matrix norms, or one of an infinite number of vector norms (described below), depending on the value of the ord parameter.
Parameters: - x (array_like) – Input array. If axis is None, x must be 1-D or 2-D.
- ord ({non-zero int, inf, -inf, 'fro', 'nuc'}, optional) – Order of the norm (see table under Notes). inf means numpy’s inf object.
- axis ({int, 2-tuple of ints, None}, optional) – If axis is an integer, it specifies the axis of x along which to compute the vector norms. If axis is a 2-tuple, it specifies the axes that hold 2-D matrices, and the matrix norms of these matrices are computed. If axis is None then either a vector norm (when x is 1-D) or a matrix norm (when x is 2-D) is returned.
- keepdims (bool, optional) –
If this is set to True, the axes which are normed over are left in the result as dimensions with size one. With this option the result will broadcast correctly against the original x.
New in version 1.10.0.
Returns: n – Norm of the matrix or vector(s).
Return type: float or ndarray
Notes
For values of ord <= 0, the result is, strictly speaking, not a mathematical ‘norm’, but it may still be useful for various numerical purposes.
The following norms can be calculated:
ord | norm for matrices | norm for vectors
None | Frobenius norm | 2-norm
'fro' | Frobenius norm | –
'nuc' | nuclear norm | –
inf | max(sum(abs(x), axis=1)) | max(abs(x))
-inf | min(sum(abs(x), axis=1)) | min(abs(x))
0 | – | sum(x != 0)
1 | max(sum(abs(x), axis=0)) | as below
-1 | min(sum(abs(x), axis=0)) | as below
2 | 2-norm (largest sing. value) | as below
-2 | smallest singular value | as below
other | – | sum(abs(x)**ord)**(1./ord)
The Frobenius norm is given by [1]:
\(\|A\|_F = \left[\sum_{i,j} |a_{i,j}|^2\right]^{1/2}\)
The nuclear norm is the sum of the singular values.
References
[1] G. H. Golub and C. F. Van Loan, Matrix Computations, Baltimore, MD, Johns Hopkins University Press, 1985, pg. 15
Examples
>>> from numpy import linalg as LA
>>> a = np.arange(9) - 4
>>> a
array([-4, -3, -2, -1, 0, 1, 2, 3, 4])
>>> b = a.reshape((3, 3))
>>> b
array([[-4, -3, -2],
       [-1, 0, 1],
       [ 2, 3, 4]])
>>> LA.norm(a)
7.745966692414834
>>> LA.norm(b)
7.745966692414834
>>> LA.norm(b, 'fro')
7.745966692414834
>>> LA.norm(a, np.inf)
4.0
>>> LA.norm(b, np.inf)
9.0
>>> LA.norm(a, -np.inf)
0.0
>>> LA.norm(b, -np.inf)
2.0
>>> LA.norm(a, 1)
20.0
>>> LA.norm(b, 1)
7.0
>>> LA.norm(a, -1)
-4.6566128774142013e-010
>>> LA.norm(b, -1)
6.0
>>> LA.norm(a, 2)
7.745966692414834
>>> LA.norm(b, 2)
7.3484692283495345
>>> LA.norm(a, -2)
nan
>>> LA.norm(b, -2)
1.8570331885190563e-016
>>> LA.norm(a, 3)
5.8480354764257312
>>> LA.norm(a, -3)
nan
Using the axis argument to compute vector norms:
>>> c = np.array([[ 1, 2, 3],
...               [-1, 1, 4]])
>>> LA.norm(c, axis=0)
array([ 1.41421356, 2.23606798, 5. ])
>>> LA.norm(c, axis=1)
array([ 3.74165739, 4.24264069])
>>> LA.norm(c, ord=1, axis=1)
array([ 6., 6.])
Using the axis argument to compute matrix norms:
>>> m = np.arange(8).reshape(2,2,2)
>>> LA.norm(m, axis=(1,2))
array([ 3.74165739, 11.22497216])
>>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :])
(3.7416573867739413, 11.224972160321824)
-
dask.array.linalg.qr(a, name=None)
Compute the QR factorization of a matrix.
Examples
>>> q, r = da.linalg.qr(x)
Returns: - q (Array, orthonormal)
- r (Array, upper-triangular)
See also
np.linalg.qr()
- Equivalent NumPy Operation
dask.array.linalg.tsqr()
- Actual implementation with citation
-
dask.array.linalg.solve(a, b, sym_pos=False)
Solve the equation a x = b for x. By default, use LU decomposition and forward / backward substitutions. When sym_pos is True, use Cholesky decomposition.
Parameters: - a ((M, M) array_like) – A square matrix.
- b ((M,) or (M, N) array_like) – Right-hand side matrix in a x = b.
- sym_pos (bool) – Assume a is symmetric and positive definite. If True, use Cholesky decomposition.
Returns: x – Solution to the system a x = b. Shape of the return matches the shape of b.
Return type: (M,) or (M, N) Array
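The sym_pos=True path replaces LU with a Cholesky factor a = L L^T, after which the solve is one forward and one backward triangular step. A NumPy sketch for a real symmetric positive-definite a; cholesky_solve is hypothetical, and for brevity it calls np.linalg.solve on the triangular factors, which does not exploit their structure:

```python
import numpy as np


def cholesky_solve(a, b):
    """Sketch: solve a x = b for real symmetric positive-definite a via a = L L^T."""
    l = np.linalg.cholesky(a)          # lower-triangular factor
    y = np.linalg.solve(l, b)          # forward step:  L y = b
    return np.linalg.solve(l.T, y)     # backward step: L^T x = y


a = np.array([[4.0, 2.0, 0.6], [2.0, 5.0, 1.0], [0.6, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0])
x = cholesky_solve(a, b)
print(np.allclose(a @ x, b))  # True
```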
-
dask.array.linalg.solve_triangular(a, b, lower=False)
Solve the equation a x = b for x, assuming a is a triangular matrix.
Parameters: - a ((M, M) array_like) – A triangular matrix.
- b ((M,) or (M, N) array_like) – Right-hand side matrix in a x = b
- lower (bool, optional) – Use only data contained in the lower triangle of a. Default is to use upper triangle.
Returns: x – Solution to the system a x = b. Shape of return matches b.
Return type: (M,) or (M, N) array
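With the default lower=False, only the upper triangle is used, and the system can be solved by back substitution from the last row upward. A NumPy sketch for a 1-D right-hand side; back_substitution is a hypothetical helper and assumes a nonzero diagonal:

```python
import numpy as np


def back_substitution(u, b):
    """Sketch: solve u x = b for upper-triangular u with a nonzero diagonal (1-D b)."""
    n = u.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # x[i+1:] is already known; solve row i for x[i].
        x[i] = (b[i] - u[i, i + 1:] @ x[i + 1:]) / u[i, i]
    return x


u = np.array([[2.0, 1.0, -1.0], [0.0, 3.0, 2.0], [0.0, 0.0, 4.0]])
b = np.array([1.0, 5.0, 8.0])
x = back_substitution(u, b)
print(np.allclose(u @ x, b))  # True
```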
-
dask.array.linalg.svd(a, name=None)
Compute the singular value decomposition of a matrix.
Examples
>>> u, s, v = da.linalg.svd(x)
Returns: - u (Array, unitary / orthogonal)
- s (Array, singular values in decreasing order (largest first))
- v (Array, unitary / orthogonal)
See also
np.linalg.svd()
- Equivalent NumPy Operation
dask.array.linalg.tsqr()
- Actual implementation with citation
-
dask.array.linalg.svd_compressed(a, k, n_power_iter=0, seed=None, name=None)
Randomly compressed rank-k thin Singular Value Decomposition.
This computes the approximate singular value decomposition of a large array. This algorithm is generally faster than the normal algorithm but does not provide exact results. One can balance between performance and accuracy with input parameters (see below).
Examples
>>> u, s, vt = svd_compressed(x, 20)
Returns: - u (Array, unitary / orthogonal)
- s (Array, singular values in decreasing order (largest first))
- v (Array, unitary / orthogonal)
References
N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev., Survey and Review section, Vol. 53, num. 2, pp. 217-288, June 2011 http://arxiv.org/abs/0909.4061
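The idea behind the randomized approach in the reference above: multiply a by a random test matrix to sample its range, orthonormalize the sample, and take an exact SVD of the much smaller projected matrix. A NumPy sketch of that basic scheme; randomized_svd is hypothetical, its k, n_power_iter and seed arguments only mirror the signature above, and it is not dask's implementation:

```python
import numpy as np


def randomized_svd(a, k, n_power_iter=0, seed=None):
    """Sketch of a randomized rank-k thin SVD: range finder plus a small exact SVD."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((a.shape[1], k))    # random test matrix
    y = a @ omega                                   # sample the range of a
    for _ in range(n_power_iter):
        # Power iterations help when singular values decay slowly. No re-orthonormalization
        # between steps here, so accuracy can degrade for large n_power_iter.
        y = a @ (a.T @ y)
    q, _ = np.linalg.qr(y)                          # orthonormal basis for the sample
    b = q.T @ a                                     # small (k, n) matrix
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    return q @ u_small, s, vt


rng = np.random.default_rng(0)
low_rank = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))  # exact rank 5
u, s, vt = randomized_svd(low_rank, k=5, seed=1)
print(np.allclose(u @ np.diag(s) @ vt, low_rank))  # True
```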
-
dask.array.linalg.tsqr(data, name=None, compute_svd=False)
Direct Tall-and-Skinny QR algorithm.
As presented in:
A. Benson, D. Gleich, and J. Demmel. Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures. IEEE International Conference on Big Data, 2013. http://arxiv.org/abs/1301.1071
This algorithm is used to compute both the QR decomposition and the Singular Value Decomposition. It requires that the input array have a single column of blocks, each of which fits in memory.
If blocks are of size (n, k) then this algorithm has memory use that scales as n**2 * k * nthreads.
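The core of the tall-and-skinny approach, for a single column of row blocks: factor each block independently, factor the stack of the small R factors, then push the second-level Q back into each block. A one-level NumPy sketch, assuming every row block has at least as many rows as columns; tsqr_two_level is hypothetical and omits the recursion and scheduling of the real algorithm:

```python
import numpy as np


def tsqr_two_level(a, n_blocks):
    """Sketch: one level of tall-and-skinny QR over row blocks of a."""
    k = a.shape[1]
    blocks = np.array_split(a, n_blocks, axis=0)
    # Step 1: independent reduced QR of each row block.
    qs, rs = zip(*(np.linalg.qr(blk) for blk in blocks))
    # Step 2: QR of the stacked (n_blocks * k, k) matrix of R factors.
    q2, r = np.linalg.qr(np.vstack(rs))
    # Step 3: combine each block's Q with its k-row slice of q2 to form the global Q.
    q = np.vstack([qi @ q2[i * k:(i + 1) * k] for i, qi in enumerate(qs)])
    return q, r


rng = np.random.default_rng(0)
a = rng.standard_normal((100, 4))
q, r = tsqr_two_level(a, n_blocks=4)
print(np.allclose(q @ r, a), np.allclose(q.T @ q, np.eye(4)))  # True True
```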
-
dask.array.ma.filled(a, fill_value=None)
Return input as an array with masked data replaced by a fill value.
If a is not a MaskedArray, a itself is returned. If a is a MaskedArray and fill_value is None, fill_value is set to a.fill_value.
Parameters: - a (MaskedArray or array_like) – An input object.
- fill_value (scalar, optional) – Filling value. Default is None.
Returns: a – The filled array.
Return type: ndarray
See also
compressed()
Examples
>>> x = np.ma.array(np.arange(9).reshape(3, 3), mask=[[1, 0, 0],
...                                                   [1, 0, 0],
...                                                   [0, 0, 0]])
>>> x.filled()
array([[999999, 1, 2],
       [999999, 4, 5],
       [ 6, 7, 8]])
-
dask.array.ma.fix_invalid(a, fill_value=None)
Return input with invalid data masked and replaced by a fill value.
Invalid data means values of nan, inf, etc.
Parameters: - a (array_like) – Input array, a (subclass of) ndarray.
- mask (sequence, optional) – Mask. Must be convertible to an array of booleans with the same shape as data. True indicates a masked (i.e. invalid) data.
- copy (bool, optional) – Whether to use a copy of a (True) or to fix a in place (False). Default is True.
- fill_value (scalar, optional) – Value used for fixing invalid data. Default is None, in which case the a.fill_value is used.
Returns: b – The input array with invalid entries fixed.
Return type: MaskedArray
Notes
A copy is performed by default.
Examples
>>> x = np.ma.array([1., -1, np.nan, np.inf], mask=[1] + [0]*3)
>>> x
masked_array(data = [-- -1.0 nan inf], mask = [ True False False False], fill_value = 1e+20)
>>> np.ma.fix_invalid(x)
masked_array(data = [-- -1.0 -- --], mask = [ True False True True], fill_value = 1e+20)
>>> fixed = np.ma.fix_invalid(x)
>>> fixed.data
array([ 1.00000000e+00, -1.00000000e+00, 1.00000000e+20, 1.00000000e+20])
>>> x.data
array([ 1., -1., NaN, Inf])
-
dask.array.ma.getdata(a)
Return the data of a masked array as an ndarray.
Return the data of a (if any) as an ndarray if a is a MaskedArray, else return a as an ndarray or subclass (depending on subok).
Parameters: - a (array_like) – Input MaskedArray, alternatively an ndarray or a subclass thereof.
- subok (bool) – Whether to force the output to be a pure ndarray (False) or to return a subclass of ndarray if appropriate (True, default).
See also
getmask()
- Return the mask of a masked array, or nomask.
getmaskarray()
- Return the mask of a masked array, or full array of False.
Examples
>>> import numpy.ma as ma
>>> a = ma.masked_equal([[1,2],[3,4]], 2)
>>> a
masked_array(data = [[1 --] [3 4]], mask = [[False True] [False False]], fill_value=999999)
>>> ma.getdata(a)
array([[1, 2],
       [3, 4]])
Equivalently use the MaskedArray data attribute.
>>> a.data
array([[1, 2],
       [3, 4]])
-
dask.array.ma.getmaskarray(a)
Return the mask of a masked array, or full boolean array of False.
Return the mask of arr as an ndarray if arr is a MaskedArray and the mask is not nomask, else return a full boolean array of False of the same shape as arr.
Parameters: arr (array_like) – Input MaskedArray for which the mask is required.
See also
getmask()
- Return the mask of a masked array, or nomask.
getdata()
- Return the data of a masked array as an ndarray.
Examples
>>> import numpy.ma as ma
>>> a = ma.masked_equal([[1,2],[3,4]], 2)
>>> a
masked_array(data = [[1 --] [3 4]], mask = [[False True] [False False]], fill_value=999999)
>>> ma.getmaskarray(a)
array([[False, True],
       [False, False]], dtype=bool)
Result when mask == nomask:
>>> b = ma.masked_array([[1,2],[3,4]])
>>> b
masked_array(data = [[1 2] [3 4]], mask = False, fill_value=999999)
>>> ma.getmaskarray(b)
array([[False, False],
       [False, False]], dtype=bool)
-
dask.array.ma.masked_array(data, mask=False, fill_value=None, **kwargs)
An array class with possibly masked values.
Masked values of True exclude the corresponding element from any computation.
Construction:
x = MaskedArray(data, mask=nomask, dtype=None, copy=False, subok=True, ndmin=0, fill_value=None, keep_mask=True, hard_mask=None, shrink=True, order=None)
Parameters: - data (array_like) – Input data.
- mask (sequence, optional) – Mask. Must be convertible to an array of booleans with the same shape as data. True indicates a masked (i.e. invalid) data.
- dtype (dtype, optional) – Data type of the output. If dtype is None, the type of the data argument (data.dtype) is used. If dtype is not None and different from data.dtype, a copy is performed.
- copy (bool, optional) – Whether to copy the input data (True), or to use a reference instead. Default is False.
- subok (bool, optional) – Whether to return a subclass of MaskedArray if possible (True) or a plain MaskedArray. Default is True.
- ndmin (int, optional) – Minimum number of dimensions. Default is 0.
- fill_value (scalar, optional) – Value used to fill in the masked values when necessary. If None, a default based on the data-type is used.
- keep_mask (bool, optional) – Whether to combine mask with the mask of the input data, if any (True), or to use only mask for the output (False). Default is True.
- hard_mask (bool, optional) – Whether to use a hard mask or not. With a hard mask, masked values cannot be unmasked. Default is False.
- shrink (bool, optional) – Whether to force compression of an empty mask. Default is True.
- order ({'C', 'F', 'A'}, optional) – Specify the order of the array. If order is ‘C’, then the array will be in C-contiguous order (last-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (first-index varies the fastest). If order is ‘A’ (default), then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous), unless a copy is required, in which case it will be C-contiguous.
-
dask.array.ma.masked_equal(a, value)
Mask an array where equal to a given value.
This function is a shortcut to masked_where, with condition = (x == value). For floating point arrays, consider using masked_values(x, value).
See also
masked_where()
- Mask where a condition is met.
masked_values()
- Mask using floating point equality.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_equal(a, 2)
masked_array(data = [0 1 -- 3], mask = [False False True False], fill_value=999999)
-
dask.array.ma.masked_greater(a, value)
Mask an array where greater than a given value.
This function is a shortcut to masked_where, with condition = (x > value).
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_greater(a, 2)
masked_array(data = [0 1 2 --], mask = [False False False True], fill_value=999999)
-
dask.array.ma.masked_greater_equal(a, value)
Mask an array where greater than or equal to a given value.
This function is a shortcut to masked_where, with condition = (x >= value).
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_greater_equal(a, 2)
masked_array(data = [0 1 -- --], mask = [False False True True], fill_value=999999)
-
dask.array.ma.masked_inside(x, v1, v2)
Mask an array inside a given interval.
Shortcut to masked_where, where condition is True for x inside the interval [v1,v2] (v1 <= x <= v2). The boundaries v1 and v2 can be given in either order.
See also
masked_where()
- Mask where a condition is met.
Notes
The array x is prefilled with its filling value.
Examples
>>> import numpy.ma as ma
>>> x = [0.31, 1.2, 0.01, 0.2, -0.4, -1.1]
>>> ma.masked_inside(x, -0.3, 0.3)
masked_array(data = [0.31 1.2 -- -- -0.4 -1.1], mask = [False False True True False False], fill_value=1e+20)
The order of v1 and v2 doesn’t matter.
>>> ma.masked_inside(x, 0.3, -0.3)
masked_array(data = [0.31 1.2 -- -- -0.4 -1.1], mask = [False False True True False False], fill_value=1e+20)
-
dask.array.ma.masked_invalid(a)
Mask an array where invalid values occur (NaNs or infs).
This function is a shortcut to masked_where, with condition = ~(np.isfinite(a)). Any pre-existing mask is conserved. Only applies to arrays with a dtype where NaNs or infs make sense (i.e. floating point types), but accepts any array_like object.
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(5, dtype=float)
>>> a[2] = np.nan
>>> a[3] = np.inf
>>> a
array([ 0., 1., nan, inf, 4.])
>>> ma.masked_invalid(a)
masked_array(data = [0.0 1.0 -- -- 4.0], mask = [False False True True False], fill_value=1e+20)
-
dask.array.ma.masked_less(a, value)
Mask an array where less than a given value.
This function is a shortcut to masked_where, with condition = (x < value).
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_less(a, 2)
masked_array(data = [-- -- 2 3], mask = [ True True False False], fill_value=999999)
-
dask.array.ma.masked_less_equal(a, value)
Mask an array where less than or equal to a given value.
This function is a shortcut to masked_where, with condition = (x <= value).
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_less_equal(a, 2)
masked_array(data = [-- -- -- 3], mask = [ True True True False], fill_value=999999)
-
dask.array.ma.masked_not_equal(a, value)
Mask an array where not equal to a given value.
This function is a shortcut to masked_where, with condition = (x != value).
See also
masked_where()
- Mask where a condition is met.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_not_equal(a, 2)
masked_array(data = [-- -- 2 --], mask = [ True True False True], fill_value=999999)
-
dask.array.ma.masked_outside(x, v1, v2)
Mask an array outside a given interval.
Shortcut to masked_where, where condition is True for x outside the interval [v1,v2] (x < v1)|(x > v2). The boundaries v1 and v2 can be given in either order.
See also
masked_where()
- Mask where a condition is met.
Notes
The array x is prefilled with its filling value.
Examples
>>> import numpy.ma as ma
>>> x = [0.31, 1.2, 0.01, 0.2, -0.4, -1.1]
>>> ma.masked_outside(x, -0.3, 0.3)
masked_array(data = [-- -- 0.01 0.2 -- --], mask = [ True True False False True True], fill_value=1e+20)
The order of v1 and v2 doesn’t matter.
>>> ma.masked_outside(x, 0.3, -0.3)
masked_array(data = [-- -- 0.01 0.2 -- --], mask = [ True True False False True True], fill_value=1e+20)
-
dask.array.ma.masked_values(x, value, rtol=1e-05, atol=1e-08, shrink=True)
Mask using floating point equality.
Return a MaskedArray, masked where the data in array x are approximately equal to value, i.e. where the following condition is True:
(abs(x - value) <= atol+rtol*abs(value))
The fill_value is set to value and the mask is set to nomask if possible. For integers, consider using masked_equal.
Parameters: - x (array_like) – Array to mask.
- value (float) – Masking value.
- rtol (float, optional) – Tolerance parameter.
- atol (float, optional) – Tolerance parameter (1e-8).
- copy (bool, optional) – Whether to return a copy of x.
- shrink (bool, optional) – Whether to collapse a mask full of False to
nomask
.
Returns: result – The result of masking x where approximately equal to value.
Return type: MaskedArray
See also
masked_where()
- Mask where a condition is met.
masked_equal()
- Mask where equal to a given value (integers).
Examples
>>> import numpy.ma as ma
>>> x = np.array([1, 1.1, 2, 1.1, 3])
>>> ma.masked_values(x, 1.1)
masked_array(data = [1.0 -- 2.0 -- 3.0], mask = [False True False True False], fill_value=1.1)
Note that mask is set to nomask if possible.
>>> ma.masked_values(x, 1.5)
masked_array(data = [ 1. 1.1 2. 1.1 3. ], mask = False, fill_value=1.5)
For integers, the fill value will in general be different from the result of masked_equal.
>>> x = np.arange(5)
>>> x
array([0, 1, 2, 3, 4])
>>> ma.masked_values(x, 2)
masked_array(data = [0 1 -- 3 4], mask = [False False True False False], fill_value=2)
>>> ma.masked_equal(x, 2)
masked_array(data = [0 1 -- 3 4], mask = [False False True False False], fill_value=999999)
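The tolerance test quoted above can be evaluated directly and compared with what masked_values produces. This sketch uses numpy.ma on float input, where the approximate comparison applies.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1, 1.1, 2, 1.1, 3])
value, rtol, atol = 1.1, 1e-05, 1e-08

# Condition from the docstring: abs(x - value) <= atol + rtol*abs(value)
manual_mask = np.abs(x - value) <= atol + rtol * abs(value)
result = ma.masked_values(x, value, rtol=rtol, atol=atol)
```

The fill value of the result is value itself, which is why it differs from masked_equal for integer input.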
-
dask.array.ma.masked_where(condition, a)
Mask an array where a condition is met.
Return a as an array masked where condition is True. Any masked values of a or condition are also masked in the output.
Parameters: - condition (array_like) – Masking condition. When condition tests floating point values for equality, consider using masked_values instead.
- a (array_like) – Array to mask.
- copy (bool) – If True (default) make a copy of a in the result. If False modify a in place and return a view.
Returns: result – The result of masking a where condition is True.
Return type: MaskedArray
See also
masked_values()
- Mask using floating point equality.
masked_equal()
- Mask where equal to a given value.
masked_not_equal()
- Mask where not equal to a given value.
masked_less_equal()
- Mask where less than or equal to a given value.
masked_greater_equal()
- Mask where greater than or equal to a given value.
masked_less()
- Mask where less than a given value.
masked_greater()
- Mask where greater than a given value.
masked_inside()
- Mask inside a given interval.
masked_outside()
- Mask outside a given interval.
masked_invalid()
- Mask invalid values (NaNs or infs).
Examples
>>> import numpy.ma as ma
>>> a = np.arange(4)
>>> a
array([0, 1, 2, 3])
>>> ma.masked_where(a <= 2, a)
masked_array(data = [-- -- -- 3], mask = [ True True True False], fill_value=999999)
Mask array b conditional on a.
>>> b = ['a', 'b', 'c', 'd']
>>> ma.masked_where(a == 2, b)
masked_array(data = [a b -- d], mask = [False False True False], fill_value=N/A)
Effect of the copy argument.
>>> c = ma.masked_where(a <= 2, a)
>>> c
masked_array(data = [-- -- -- 3], mask = [ True True True False], fill_value=999999)
>>> c[0] = 99
>>> c
masked_array(data = [99 -- -- 3], mask = [False True True False], fill_value=999999)
>>> a
array([0, 1, 2, 3])
>>> c = ma.masked_where(a <= 2, a, copy=False)
>>> c[0] = 99
>>> c
masked_array(data = [99 -- -- 3], mask = [False True True False], fill_value=999999)
>>> a
array([99, 1, 2, 3])
When condition or a contains masked values.
>>> a = np.arange(4)
>>> a = ma.masked_where(a == 2, a)
>>> a
masked_array(data = [0 1 -- 3], mask = [False False True False], fill_value=999999)
>>> b = np.arange(4)
>>> b = ma.masked_where(b == 0, b)
>>> b
masked_array(data = [-- 1 2 3], mask = [ True False False False], fill_value=999999)
>>> ma.masked_where(a == 3, b)
masked_array(data = [-- 1 -- --], mask = [ True False True True], fill_value=999999)
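The last example combines two sources of masking. The sketch below (numpy.ma, not dask) rebuilds the expected mask by treating masked entries of the condition as True and OR-ing in the existing mask of b; this mirrors the documented rule that masked values of a or condition are also masked in the output.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_where(np.arange(4) == 2, np.arange(4))  # masked at index 2
b = ma.masked_where(np.arange(4) == 0, np.arange(4))  # masked at index 0

# Masked entries of the condition count as True; b's own mask is OR-ed in.
cond = ma.filled(a == 3, True)
expected_mask = cond | ma.getmaskarray(b)

result = ma.masked_where(a == 3, b)
```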
-
dask.array.ma.set_fill_value(a, fill_value)
Set the filling value of a, if a is a masked array.
This function changes the fill value of the masked array a in place. If a is not a masked array, the function returns silently, without doing anything.
Parameters: - a (array_like) – Input array.
- fill_value (dtype) – Filling value. A consistency test is performed to make sure the value is compatible with the dtype of a.
Returns: Nothing is returned by this function.
See also
maximum_fill_value()
- Return the default fill value for a dtype.
MaskedArray.fill_value()
- Return current fill value.
MaskedArray.set_fill_value()
- Equivalent method.
Examples
>>> import numpy.ma as ma
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a = ma.masked_where(a < 3, a)
>>> a
masked_array(data = [-- -- -- 3 4], mask = [ True True True False False], fill_value=999999)
>>> ma.set_fill_value(a, -999)
>>> a
masked_array(data = [-- -- -- 3 4], mask = [ True True True False False], fill_value=-999)
Nothing happens if a is not a masked array.
>>> a = list(range(5))
>>> a
[0, 1, 2, 3, 4]
>>> ma.set_fill_value(a, 100)
>>> a
[0, 1, 2, 3, 4]
>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> ma.set_fill_value(a, 100)
>>> a
array([0, 1, 2, 3, 4])
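The in-place behavior shows up when the masked array is filled afterwards. A short numpy.ma sketch follows; the plain list is there to show the silent no-op path.

```python
import numpy as np
import numpy.ma as ma

a = ma.masked_where(np.arange(5) < 3, np.arange(5))
ma.set_fill_value(a, -999)  # modifies `a` in place
filled = a.filled()         # masked slots now use the new fill value

plain = [0, 1, 2, 3, 4]
ma.set_fill_value(plain, 100)  # not a masked array: silently does nothing
```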
-
dask.array.ghost.ghost(x, depth, boundary)
Share boundaries between neighboring blocks.
Parameters: - x (da.Array) – A dask array
- depth (dict) – The size of the shared boundary per axis
- boundary (dict) – The boundary condition on each axis. Options are ‘reflect’, ‘periodic’, ‘nearest’, ‘none’, or an array value. Such a value will fill the boundary with that value.
The depth input informs how many cells to overlap between neighboring blocks: for example, {0: 2, 1: 1} shares two cells along axis 0 and one cell along axis 1. Axes missing from this input will not be overlapped.
Examples
>>> import numpy as np
>>> import dask.array as da
>>> x = np.arange(64).reshape((8, 8))
>>> d = da.from_array(x, chunks=(4, 4))
>>> d.chunks
((4, 4), (4, 4))
>>> g = da.ghost.ghost(d, depth={0: 2, 1: 1},
...                    boundary={0: 100, 1: 'reflect'})
>>> g.chunks
((8, 8), (6, 6))
>>> np.array(g)
array([[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
       [  0,   0,   1,   2,   3,   4,   3,   4,   5,   6,   7,   7],
       [  8,   8,   9,  10,  11,  12,  11,  12,  13,  14,  15,  15],
       [ 16,  16,  17,  18,  19,  20,  19,  20,  21,  22,  23,  23],
       [ 24,  24,  25,  26,  27,  28,  27,  28,  29,  30,  31,  31],
       [ 32,  32,  33,  34,  35,  36,  35,  36,  37,  38,  39,  39],
       [ 40,  40,  41,  42,  43,  44,  43,  44,  45,  46,  47,  47],
       [ 16,  16,  17,  18,  19,  20,  19,  20,  21,  22,  23,  23],
       [ 24,  24,  25,  26,  27,  28,  27,  28,  29,  30,  31,  31],
       [ 32,  32,  33,  34,  35,  36,  35,  36,  37,  38,  39,  39],
       [ 40,  40,  41,  42,  43,  44,  43,  44,  45,  46,  47,  47],
       [ 48,  48,  49,  50,  51,  52,  51,  52,  53,  54,  55,  55],
       [ 56,  56,  57,  58,  59,  60,  59,  60,  61,  62,  63,  63],
       [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]])
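To make the overlap along axis 0 concrete, the sketch below reproduces it on a 1-D array with plain NumPy: depth 2 and a constant boundary of 100, as in the example. The helper share_boundaries_1d is hypothetical and only illustrates the idea; it is not how dask implements ghost.

```python
import numpy as np

def share_boundaries_1d(x, chunk, depth, fill):
    """Hypothetical helper: split x into blocks of size `chunk` and extend
    each block with `depth` cells from its neighbours, using the constant
    `fill` beyond the outer edges of the array."""
    # Pad the whole array once so the edge blocks see the constant boundary.
    padded = np.concatenate([np.full(depth, fill), x, np.full(depth, fill)])
    blocks = []
    for start in range(0, len(x), chunk):
        stop = min(start + chunk, len(x))
        # x[start:stop] sits at padded[start + depth:stop + depth], so widening
        # by `depth` on each side gives padded[start:stop + 2 * depth].
        blocks.append(padded[start:stop + 2 * depth])
    return blocks

blocks = share_boundaries_1d(np.arange(8), chunk=4, depth=2, fill=100)
```

Each block grows from 4 to 8 cells, matching the chunk sizes (8, 8) reported along axis 0 above.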
-
dask.array.ghost.map_overlap(x, func, depth, boundary=None, trim=True, **kwargs)
-
dask.array.from_array(x, chunks, name=None, lock=False, asarray=True, fancy=True, getitem=None)
Create dask array from something that looks like an array.
Input must have a .shape and support numpy-style slicing.
Parameters: - x (array_like) – The array-like object to wrap.
- chunks (int, tuple) – How to chunk the array. Must be one of the following forms:
  - A blocksize like 1000.
  - A blockshape like (1000, 1000).
  - Explicit sizes of all blocks along all dimensions like ((1000, 1000, 500), (400, 400)).
- name (str, optional) – The key name to use for the array. Defaults to a hash of x. Use name=False to generate a random name instead of hashing (fast).
- lock (bool or Lock, optional) – If x doesn’t support concurrent reads then provide a lock here, or pass in True to have dask.array create one for you.
- asarray (bool, optional) – If True (default), then chunks will be converted to instances of ndarray. Set to False to pass chunks through unchanged.
- fancy (bool, optional) – If x doesn’t support fancy indexing (e.g. indexing with lists or arrays) then set to False. Default is True.
Examples
>>> x = h5py.File('...')['/data/path']
>>> a = da.from_array(x, chunks=(1000, 1000))
If your underlying datastore does not support concurrent reads then include the lock=True keyword argument, or lock=mylock if you want multiple arrays to coordinate around the same lock.
>>> a = da.from_array(x, chunks=(1000, 1000), lock=True)
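The three chunks forms are related: a blocksize or blockshape can be expanded into explicit per-dimension block sizes, with a smaller final block when the dimension does not divide evenly. The pure-Python helper below is hypothetical and only illustrates that relationship for a shape of (2500, 800); it is not dask's own normalization routine.

```python
def expand_chunks(shape, chunks):
    """Hypothetical helper: turn a blocksize (int) or blockshape (tuple)
    into explicit block sizes along every dimension."""
    if isinstance(chunks, int):
        chunks = (chunks,) * len(shape)  # blocksize: same size on every axis
    result = []
    for dim, size in zip(shape, chunks):
        sizes = [size] * (dim // size)
        if dim % size:
            sizes.append(dim % size)  # final, smaller block
        result.append(tuple(sizes))
    return tuple(result)

explicit = expand_chunks((2500, 800), (1000, 400))
```

Under these assumptions, explicit is ((1000, 1000, 500), (400, 400)), the third form listed above.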
-
dask.array.from_delayed(value, shape, dtype, name=None)
Create a dask array from a dask delayed value.
This routine is useful for constructing dask arrays in an ad-hoc fashion using dask delayed, particularly when combined with stack and concatenate.
The dask array will consist of a single chunk.
Examples
>>> from dask import delayed
>>> value = delayed(np.ones)(5)
>>> array = from_delayed(value, (5,), float)
>>> array
dask.array<from-value, shape=(5,), dtype=float64, chunksize=(5,)>
>>> array.compute()
array([ 1.,  1.,  1.,  1.,  1.])
-
dask.array.from_npy_stack(dirname, mmap_mode='r')
Load dask array from stack of npy files.
See da.to_npy_stack for docstring.
Parameters: - dirname (string) – Directory of .npy files
- mmap_mode (None or 'r') – Read data in memory map mode
-
dask.array.store(sources, targets, lock=True, regions=None, compute=True, **kwargs)
Store dask arrays in array-like objects, overwriting data in the targets.
This stores dask arrays into objects that support numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance you can align the block size of the storage target with the block size of your array.
If your data fits in memory then you may prefer calling np.array(myarray) instead.
Parameters: - sources (Array or iterable of Arrays) –
- targets (array-like or iterable of array-likes) – These should support setitem syntax target[10:20] = ...
- lock (boolean or threading.Lock, optional) – Whether or not to lock the data stores while storing. Pass True (lock each file individually), False (don’t lock) or a particular threading.Lock object to be shared among all writes.
- regions (tuple of slices or iterable of tuple of slices) – Each region tuple in regions should be such that target[region].shape == source.shape for the corresponding source and target in sources and targets, respectively.
- compute (boolean, optional) – If True, compute immediately; otherwise return a dask.delayed.Delayed.
Examples
>>> x = ...
>>> import h5py
>>> f = h5py.File('myfile.hdf5')
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> store(x, dset)
Alternatively store many arrays at the same time
>>> store([x, y, z], [dset1, dset2, dset3])
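The chunk-by-chunk strategy described above needs nothing from the target except numpy-style setitem. The hypothetical helper store_blockwise below sketches that idea with plain NumPy arrays standing in for both source and target; it is an illustration of the access pattern, not dask's implementation (no locking, no task graph).

```python
from itertools import accumulate, product

import numpy as np

def store_blockwise(source, target, chunks):
    """Hypothetical sketch: copy `source` into `target` one block at a time,
    using only target[region] = block assignments."""
    per_axis = []
    for sizes in chunks:
        bounds = [0, *accumulate(sizes)]
        per_axis.append([slice(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])])
    writes = 0
    for region in product(*per_axis):  # every combination of per-axis slices
        target[region] = source[region]
        writes += 1
    return writes

source = np.arange(15).reshape(5, 3)
target = np.zeros_like(source)
n_writes = store_blockwise(source, target, ((2, 2, 1), (3,)))
```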
-
dask.array.to_hdf5(filename, *args, **kwargs)
Store arrays in HDF5 file.
This saves several dask arrays into several datapaths in an HDF5 file. It creates the necessary datasets and handles clean file opening/closing.
>>> da.to_hdf5('myfile.hdf5', '/x', x)
or
>>> da.to_hdf5('myfile.hdf5', {'/x': x, '/y': y})
Optionally provide arguments as though to h5py.File.create_dataset:
>>> da.to_hdf5('myfile.hdf5', '/x', x, compression='lzf', shuffle=True)
This can also be used as a method on a single Array
>>> x.to_hdf5('myfile.hdf5', '/x')
See also
da.store(), h5py.File.create_dataset()
-
dask.array.to_npy_stack(dirname, x, axis=0)
Write dask array to a stack of .npy files.
This partitions the dask.array along one axis and stores each block along that axis as a single .npy file in the specified directory.
Examples
>>> x = da.ones((5, 10, 10), chunks=(2, 4, 4))
>>> da.to_npy_stack('data/', x, axis=0)
The .npy files store numpy arrays for x[0:2], x[2:4], and x[4:5] respectively, as is specified by the chunk size along the zeroth axis. The info file stores the dtype, chunks, and axis information of the array.
You can load these stacks with the da.from_npy_stack function.
>>> y = da.from_npy_stack('data/')
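The slices x[0:2], x[2:4] and x[4:5] follow from the running sum of the chunk sizes (2, 2, 1). The sketch below derives them and round-trips the blocks through .npy files in a temporary directory with plain NumPy. The file names used here are an assumption for illustration only; dask's on-disk layout (including its info file) is not reproduced.

```python
import os
import tempfile
from itertools import accumulate

import numpy as np

x = np.ones((5, 10, 10))
chunks_axis0 = (2, 2, 1)  # chunk sizes along axis 0, as in the example

# Running sum of chunk sizes gives slice boundaries 0, 2, 4, 5.
bounds = [0, *accumulate(chunks_axis0)]
slices = [slice(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]

with tempfile.TemporaryDirectory() as dirname:
    # Hypothetical naming scheme: one file per block, "0.npy", "1.npy", ...
    for i, s in enumerate(slices):
        np.save(os.path.join(dirname, f"{i}.npy"), x[s])
    # Reload each block and stitch them back together along axis 0.
    loaded = [np.load(os.path.join(dirname, f"{i}.npy")) for i in range(len(slices))]
    y = np.concatenate(loaded, axis=0)
```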
-
dask.array.fft.fft_wrap(fft_func, kind=None, dtype=None)
Wrap 1D complex FFT functions.
Takes a function that behaves like the numpy.fft functions, together with a specified kind to match it to, where the kinds are named after the functions in the numpy.fft API. Supported kinds include:
- fft
- ifft
- rfft
- irfft
- hfft
- ihfft
Examples
>>> parallel_fft = fft_wrap(np.fft.fft)
>>> parallel_ifft = fft_wrap(np.fft.ifft)
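The reason a wrapped FFT can run block by block is that a 1-D FFT along one axis never mixes data across the other axes. The plain-NumPy sketch below splits an array into three chunks along axis 0, keeps the transformed axis 1 whole in each chunk, and checks that the stitched result equals the FFT of the full array. It illustrates the principle only and does not use fft_wrap itself.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))

# Three chunks along axis 0; axis 1 (the FFT axis) stays whole in each chunk.
blocks = np.split(x, 3, axis=0)
blockwise = np.concatenate([np.fft.fft(b, axis=1) for b in blocks], axis=0)

full = np.fft.fft(x, axis=1)
```

If axis 1 were split instead, each block would only see part of the signal, which is why the transformed axis must be a single chunk.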
-
dask.array.fft.fft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.fft.
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.fft docstring follows below:
Compute the one-dimensional discrete Fourier Transform.
This function computes the one-dimensional n-point discrete Fourier Transform (DFT) with the efficient Fast Fourier Transform (FFT) algorithm [CT].
Parameters: - a (array_like) – Input array, can be complex.
- n (int, optional) – Length of the transformed axis of the output. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used.
- axis (int, optional) – Axis over which to compute the FFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified.
Return type: complex ndarray
Raises: IndexError
– If axis is larger than the last axis of a.
Notes
FFT (Fast Fourier Transform) refers to a way the discrete Fourier Transform (DFT) can be calculated efficiently, by using symmetries in the calculated terms. The symmetry is highest when n is a power of 2, and the transform is therefore most efficient for these sizes.
The DFT is defined, with the conventions used in this implementation, in the documentation for the numpy.fft module.
References
[CT] Cooley, James W., and John W. Tukey, 1965, “An algorithm for the machine calculation of complex Fourier series,” Math. Comput. 19: 297-301.
Examples
>>> np.fft.fft(np.exp(2j * np.pi * np.arange(8) / 8))
array([ -3.44505240e-16 +1.14383329e-17j,
         8.00000000e+00 -5.71092652e-15j,
         2.33482938e-16 +1.22460635e-16j,
         1.64863782e-15 +1.77635684e-15j,
         9.95839695e-17 +2.33482938e-16j,
         0.00000000e+00 +1.66837030e-15j,
         1.14383329e-17 +1.22460635e-16j,
        -1.64863782e-15 +1.77635684e-15j])
>>> import matplotlib.pyplot as plt
>>> t = np.arange(256)
>>> sp = np.fft.fft(np.sin(t))
>>> freq = np.fft.fftfreq(t.shape[-1])
>>> plt.plot(freq, sp.real, freq, sp.imag)
[<matplotlib.lines.Line2D object at 0x...>, <matplotlib.lines.Line2D object at 0x...>]
>>> plt.show()
In this example, real input has an FFT which is Hermitian, i.e., symmetric in the real part and anti-symmetric in the imaginary part, as described in the numpy.fft documentation.
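For reference, numpy.fft's forward convention is A[k] = sum over m of a[m] * exp(-2j*pi*m*k/n). The sketch below evaluates that sum directly in O(n^2) time with NumPy, compares it with np.fft.fft, and checks the Hermitian symmetry A[n-k] == conj(A[k]) for real input mentioned above.

```python
import numpy as np

def naive_dft(a):
    """O(n^2) DFT straight from the definition A[k] = sum_m a[m] exp(-2j*pi*m*k/n)."""
    a = np.asarray(a, dtype=complex)
    n = a.shape[0]
    m = np.arange(n)
    k = m.reshape(-1, 1)  # column vector so k * m forms the n x n exponent grid
    return np.exp(-2j * np.pi * k * m / n) @ a

signal = np.sin(np.arange(16))
spectrum = np.fft.fft(signal)
```

The FFT computes the same values in O(n log n) by reusing symmetric terms, which is why it is preferred in practice.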
-
dask.array.fft.fft2(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.fft2.
Each axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.fft2 docstring follows below:
Compute the 2-dimensional discrete Fourier Transform
This function computes the n-dimensional discrete Fourier Transform over any axes in an M-dimensional array by means of the Fast Fourier Transform (FFT). By default, the transform is computed over the last two axes of the input array, i.e., a 2-dimensional FFT.
Parameters: - a (array_like) – Input array, can be complex
- s (sequence of ints, optional) – Shape (length of each transformed axis) of the output (s[0] refers to axis 0, s[1] to axis 1, etc.). This corresponds to n for fft(x, n). Along each axis, if the given shape is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used.
- axes (sequence of ints, optional) – Axes over which to compute the FFT. If not given, the last two axes are used. A repeated index in axes means the transform over that axis is performed multiple times. A one-element sequence means that a one-dimensional FFT is performed.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or the last two axes if axes is not given.
Return type: complex ndarray
Raises: ValueError
– If s and axes have different length, or axes not given and len(s) != 2.
IndexError
– If an element of axes is larger than the number of axes of a.
See also
numpy.fft()
- Overall view of discrete Fourier transforms, with definitions and conventions used.
ifft2()
- The inverse two-dimensional FFT.
fft()
- The one-dimensional FFT.
fftn()
- The n-dimensional FFT.
fftshift()
- Shifts zero-frequency terms to the center of the array. For two-dimensional input, swaps first and third quadrants, and second and fourth quadrants.
Notes
fft2 is just fftn with a different default for axes.
The output, analogously to fft, contains the term for zero frequency in the low-order corner of the transformed axes, the positive frequency terms in the first half of these axes, the term for the Nyquist frequency in the middle of the axes and the negative frequency terms in the second half of the axes, in order of decreasingly negative frequency.
See fftn for details and a plotting example, and numpy.fft for definitions and conventions used.
Examples
>>> a = np.mgrid[:5, :5][0]
>>> np.fft.fft2(a)
array([[ 50.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j ],
       [-12.5+17.20477401j, 0.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j ],
       [-12.5 +4.0614962j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j ],
       [-12.5 -4.0614962j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j ],
       [-12.5-17.20477401j, 0.0 +0.j , 0.0 +0.j , 0.0 +0.j , 0.0 +0.j ]])
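Since fft2 is fftn with a different default for axes, and a multidimensional DFT is separable, the same result can be obtained by applying 1-D FFTs one axis at a time. A short NumPy check:

```python
import numpy as np

a = np.mgrid[:5, :5][0].astype(float)

two_d = np.fft.fft2(a)
# fft2 is fftn restricted to the last two axes ...
as_fftn = np.fft.fftn(a, axes=(-2, -1))
# ... which equals 1-D FFTs applied along each axis in turn.
separable = np.fft.fft(np.fft.fft(a, axis=0), axis=1)
```

The zero-frequency term two_d[0, 0] is the sum of all entries, 5 * (0 + 1 + 2 + 3 + 4) = 50, matching the example.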
-
dask.array.fft.fftn(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.fftn.
Each axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.fftn docstring follows below:
Compute the N-dimensional discrete Fourier Transform.
This function computes the N-dimensional discrete Fourier Transform over any number of axes in an M-dimensional array by means of the Fast Fourier Transform (FFT).
Parameters: - a (array_like) – Input array, can be complex.
- s (sequence of ints, optional) – Shape (length of each transformed axis) of the output (s[0] refers to axis 0, s[1] to axis 1, etc.). This corresponds to n for fft(x, n). Along any axis, if the given shape is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used.
- axes (sequence of ints, optional) – Axes over which to compute the FFT. If not given, the last len(s) axes are used, or all axes if s is also not specified. Repeated indices in axes means that the transform over that axis is performed multiple times.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or by a combination of s and a, as explained in the parameters section above.
Return type: complex ndarray
Raises: ValueError
– If s and axes have different length.
IndexError
– If an element of axes is larger than the number of axes of a.
See also
numpy.fft()
- Overall view of discrete Fourier transforms, with definitions and conventions used.
ifftn()
- The inverse of fftn, the inverse n-dimensional FFT.
fft()
- The one-dimensional FFT, with definitions and conventions used.
rfftn()
- The n-dimensional FFT of real input.
fft2()
- The two-dimensional FFT.
fftshift()
- Shifts zero-frequency terms to centre of array
Notes
The output, analogously to fft, contains the term for zero frequency in the low-order corner of all axes, the positive frequency terms in the first half of all axes, the term for the Nyquist frequency in the middle of all axes and the negative frequency terms in the second half of all axes, in order of decreasingly negative frequency.
See numpy.fft for details, definitions and conventions used.
Examples
>>> a = np.mgrid[:3, :3, :3][0]
>>> np.fft.fftn(a, axes=(1, 2))
array([[[  0.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j]],
       [[  9.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j]],
       [[ 18.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j],
        [  0.+0.j,   0.+0.j,   0.+0.j]]])
>>> np.fft.fftn(a, (2, 2), axes=(0, 1))
array([[[ 2.+0.j,  2.+0.j,  2.+0.j],
        [ 0.+0.j,  0.+0.j,  0.+0.j]],
       [[-2.+0.j, -2.+0.j, -2.+0.j],
        [ 0.+0.j,  0.+0.j,  0.+0.j]]])
>>> import matplotlib.pyplot as plt
>>> [X, Y] = np.meshgrid(2 * np.pi * np.arange(200) / 12,
...                      2 * np.pi * np.arange(200) / 34)
>>> S = np.sin(X) + np.cos(Y) + np.random.uniform(0, 1, X.shape)
>>> FS = np.fft.fftn(S)
>>> plt.imshow(np.log(np.abs(np.fft.fftshift(FS))**2))
<matplotlib.image.AxesImage object at 0x...>
>>> plt.show()
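The s parameter crops or zero-pads the transformed axes before the transform is taken. The NumPy sketch below checks both cases against doing the cropping or padding by hand; padding is appended at the end of each axis.

```python
import numpy as np

a = np.mgrid[:3, :3, :3][0]

# s=(2, 2) on axes (0, 1) crops those axes to length 2 before transforming.
cropped = np.fft.fftn(a, s=(2, 2), axes=(0, 1))
explicit_crop = np.fft.fftn(a[:2, :2, :], axes=(0, 1))

# A larger s appends zeros at the end of each transformed axis instead.
padded = np.fft.fftn(a, s=(4, 4), axes=(0, 1))
explicit_pad = np.fft.fftn(np.pad(a, ((0, 1), (0, 1), (0, 0))), axes=(0, 1))
```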
-
dask.array.fft.ifft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.ifft.
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.ifft docstring follows below:
Compute the one-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the one-dimensional n-point discrete Fourier transform computed by fft. In other words, ifft(fft(a)) == a to within numerical accuracy. For a general description of the algorithm and definitions, see numpy.fft.
The input should be ordered in the same way as is returned by fft, i.e., a[0] should contain the zero frequency term, a[1:n//2] should contain the positive-frequency terms, and a[n//2 + 1:] should contain the negative-frequency terms, in increasing order starting from the most negative frequency.
For an even number of input points, A[n//2] represents the sum of the values at the positive and negative Nyquist frequencies, as the two are aliased together. See numpy.fft for details.
Parameters: - a (array_like) – Input array, can be complex.
- n (int, optional) – Length of the transformed axis of the output. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used. See notes about padding issues.
- axis (int, optional) – Axis over which to compute the inverse DFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified.
Return type: complex ndarray
Raises: IndexError
– If axis is larger than the last axis of a.
Notes
If the input parameter n is larger than the size of the input, the input is padded by appending zeros at the end. Even though this is the common approach, it might lead to surprising results. If a different padding is desired, it must be performed before calling ifft.
Examples
>>> np.fft.ifft([0, 4, 0, 0])
array([ 1.+0.j,  0.+1.j, -1.+0.j,  0.-1.j])
Create and plot a band-limited signal with random phases:
>>> import matplotlib.pyplot as plt
>>> t = np.arange(400)
>>> n = np.zeros((400,), dtype=complex)
>>> n[40:60] = np.exp(1j*np.random.uniform(0, 2*np.pi, (20,)))
>>> s = np.fft.ifft(n)
>>> plt.plot(t, s.real, 'b-', t, s.imag, 'r--')
...
>>> plt.legend(('real', 'imaginary'))
...
>>> plt.show()
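The input ordering described above (zero frequency first, then positive, then negative frequencies) is exactly what np.fft.fftfreq reports. This NumPy sketch shows that ordering for n = 8 and checks the round trip ifft(fft(a)) == a to within numerical accuracy.

```python
import numpy as np

n = 8
# Frequency of each output bin (in cycles per n samples), in fft order.
order = np.fft.fftfreq(n) * n

rng = np.random.default_rng(1)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
roundtrip = np.fft.ifft(np.fft.fft(a))
```

For even n, the bin at index n//2 is listed as -4 here: it is the aliased Nyquist term.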
-
dask.array.fft.ifft2(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.ifft2.
Each axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.ifft2 docstring follows below:
Compute the 2-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the 2-dimensional discrete Fourier Transform over any number of axes in an M-dimensional array by means of the Fast Fourier Transform (FFT). In other words, ifft2(fft2(a)) == a to within numerical accuracy. By default, the inverse transform is computed over the last two axes of the input array.
The input, analogously to ifft, should be ordered in the same way as is returned by fft2, i.e. it should have the term for zero frequency in the low-order corner of the two axes, the positive frequency terms in the first half of these axes, the term for the Nyquist frequency in the middle of the axes and the negative frequency terms in the second half of both axes, in order of decreasingly negative frequency.
Parameters: - a (array_like) – Input array, can be complex.
- s (sequence of ints, optional) – Shape (length of each axis) of the output (s[0] refers to axis 0, s[1] to axis 1, etc.). This corresponds to n for ifft(x, n). Along each axis, if the given shape is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used. See notes for issue on ifft zero padding.
- axes (sequence of ints, optional) – Axes over which to compute the FFT. If not given, the last two axes are used. A repeated index in axes means the transform over that axis is performed multiple times. A one-element sequence means that a one-dimensional FFT is performed.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or the last two axes if axes is not given.
Return type: complex ndarray
Raises: ValueError
– If s and axes have different length, or axes not given and len(s) != 2.
IndexError
– If an element of axes is larger than the number of axes of a.
Notes
ifft2 is just ifftn with a different default for axes.
See ifftn for details and a plotting example, and numpy.fft for definition and conventions used.
Zero-padding, analogously with ifft, is performed by appending zeros to the input along the specified dimension. Although this is the common approach, it might lead to surprising results. If another form of zero padding is desired, it must be performed before ifft2 is called.
Examples
>>> a = 4 * np.eye(4)
>>> np.fft.ifft2(a)
array([[ 1.+0.j,  0.+0.j,  0.+0.j,  0.+0.j],
       [ 0.+0.j,  0.+0.j,  0.+0.j,  1.+0.j],
       [ 0.+0.j,  0.+0.j,  1.+0.j,  0.+0.j],
       [ 0.+0.j,  1.+0.j,  0.+0.j,  0.+0.j]])
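The zero-padding note above can be checked directly: passing a larger s gives the same result as appending zeros to the end of each axis by hand and then calling ifft2. A NumPy sketch:

```python
import numpy as np

a = 4 * np.eye(4)

# s larger than the input: zeros are appended at the end of each axis.
padded_result = np.fft.ifft2(a, s=(6, 6))
manual = np.fft.ifft2(np.pad(a, ((0, 2), (0, 2))))

# Without padding, the round trip recovers the input.
roundtrip = np.fft.ifft2(np.fft.fft2(a))
```

If a different placement of the zeros is wanted (for example, splitting them around the Nyquist term), the padding has to be done manually before calling ifft2.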
-
dask.array.fft.ifftn(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.ifftn.
Each axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.ifftn docstring follows below:
Compute the N-dimensional inverse discrete Fourier Transform.
This function computes the inverse of the N-dimensional discrete Fourier Transform over any number of axes in an M-dimensional array by means of the Fast Fourier Transform (FFT). In other words, ifftn(fftn(a)) == a to within numerical accuracy. For a description of the definitions and conventions used, see numpy.fft.
The input, analogously to ifft, should be ordered in the same way as is returned by fftn, i.e. it should have the term for zero frequency in all axes in the low-order corner, the positive frequency terms in the first half of all axes, the term for the Nyquist frequency in the middle of all axes and the negative frequency terms in the second half of all axes, in order of decreasingly negative frequency.
Parameters: - a (array_like) – Input array, can be complex.
- s (sequence of ints, optional) – Shape (length of each transformed axis) of the output (s[0] refers to axis 0, s[1] to axis 1, etc.). This corresponds to n for ifft(x, n). Along any axis, if the given shape is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used. See notes for issue on ifft zero padding.
- axes (sequence of ints, optional) – Axes over which to compute the IFFT. If not given, the last len(s) axes are used, or all axes if s is also not specified. Repeated indices in axes means that the inverse transform over that axis is performed multiple times.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or by a combination of s or a, as explained in the parameters section above.
Return type: complex ndarray
Raises: ValueError
– If s and axes have different length.
IndexError
– If an element of axes is larger than the number of axes of a.
See also
numpy.fft()
- Overall view of discrete Fourier transforms, with definitions and conventions used.
fftn()
- The forward n-dimensional FFT, of which ifftn is the inverse.
ifft()
- The one-dimensional inverse FFT.
ifft2()
- The two-dimensional inverse FFT.
ifftshift()
- Undoes fftshift, shifts zero-frequency terms to beginning of array.
Notes
See numpy.fft for definitions and conventions used.
Zero-padding, analogously with ifft, is performed by appending zeros to the input along the specified dimension. Although this is the common approach, it might lead to surprising results. If another form of zero padding is desired, it must be performed before ifftn is called.
Examples
>>> a = np.eye(4)
>>> np.fft.ifftn(np.fft.fftn(a, axes=(0,)), axes=(1,))
array([[ 1.+0.j,  0.+0.j,  0.+0.j,  0.+0.j],
       [ 0.+0.j,  1.+0.j,  0.+0.j,  0.+0.j],
       [ 0.+0.j,  0.+0.j,  1.+0.j,  0.+0.j],
       [ 0.+0.j,  0.+0.j,  0.+0.j,  1.+0.j]])
Create and plot an image with band-limited frequency content:
>>> import matplotlib.pyplot as plt
>>> n = np.zeros((200,200), dtype=complex)
>>> n[60:80, 20:40] = np.exp(1j*np.random.uniform(0, 2*np.pi, (20, 20)))
>>> im = np.fft.ifftn(n).real
>>> plt.imshow(im)
<matplotlib.image.AxesImage object at 0x...>
>>> plt.show()
-
dask.array.fft.rfft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.rfft.
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking use dask.array.Array.rechunk.
The numpy.fft.fftpack.rfft docstring follows below:
Compute the one-dimensional discrete Fourier Transform for real input.
This function computes the one-dimensional n-point discrete Fourier Transform (DFT) of a real-valued array by means of an efficient algorithm called the Fast Fourier Transform (FFT).
Parameters: - a (array_like) – Input array
- n (int, optional) – Number of points along transformation axis in the input to use. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used.
- axis (int, optional) – Axis over which to compute the FFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. If n is even, the length of the transformed axis is (n/2)+1. If n is odd, the length is (n+1)/2.
Return type: complex ndarray
Raises: IndexError
– If axis is larger than the last axis of a.
Notes
When the DFT is computed for purely real input, the output is Hermitian-symmetric, i.e. the negative frequency terms are just the complex conjugates of the corresponding positive-frequency terms, and the negative-frequency terms are therefore redundant. This function does not compute the negative frequency terms, and the length of the transformed axis of the output is therefore n//2 + 1.
When A = rfft(a) and fs is the sampling frequency, A[0] contains the zero-frequency term 0*fs, which is real due to Hermitian symmetry.
If n is even, A[-1] contains the term representing both positive and negative Nyquist frequency (+fs/2 and -fs/2), and must also be purely real. If n is odd, there is no term at fs/2; A[-1] contains the largest positive frequency (fs/2*(n-1)/n), and is complex in the general case.
If the input a contains an imaginary part, it is silently discarded.
Examples
>>> np.fft.fft([0, 1, 0, 0])
array([ 1.+0.j,  0.-1.j, -1.+0.j,  0.+1.j])
>>> np.fft.rfft([0, 1, 0, 0])
array([ 1.+0.j,  0.-1.j, -1.+0.j])
Notice how the final element of the fft output is the complex conjugate of the second element, for real input. For rfft, this symmetry is exploited to compute only the non-negative frequency terms.
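The symmetry described in the Notes can be checked directly. This sketch uses NumPy only, with hypothetical test signals of even and odd length: rfft returns exactly the first n//2 + 1 terms of fft, the dropped terms are conjugates of kept ones, and the zero-frequency (and, for even n, Nyquist) terms are real.

```python
import numpy as np

# Hypothetical real-valued test signals of even (8) and odd (7) length.
for n in (8, 7):
    a = np.cos(np.arange(n)) + 0.5 * np.arange(n)
    full = np.fft.fft(a)
    half = np.fft.rfft(a)

    # rfft keeps only the n//2 + 1 non-negative frequency terms ...
    assert half.shape == (n // 2 + 1,)
    assert np.allclose(half, full[: n // 2 + 1])
    # ... because the others are conjugates: full[n - k] == conj(full[k]).
    k = np.arange(1, n)
    assert np.allclose(full[n - k], np.conj(full[k]))
    # The zero-frequency term is real; for even n, so is the Nyquist term.
    assert abs(half[0].imag) < 1e-9
    if n % 2 == 0:
        assert abs(half[-1].imag) < 1e-9
    print(n, half.shape)  # prints "8 (5,)" then "7 (4,)"
```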
dask.array.fft.rfft2(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.rfft2
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.rfft2 docstring follows below:
Compute the 2-dimensional FFT of a real array.
Parameters: - a (array) – Input array, taken to be real.
- s (sequence of ints, optional) – Shape of the FFT.
- axes (sequence of ints, optional) – Axes over which to compute the FFT.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The result of the real 2-D FFT.
Return type: ndarray
See also
rfftn() – Compute the N-dimensional discrete Fourier Transform for real input.
Notes
This is really just rfftn with different default behavior. For more details see rfftn.
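To make "rfftn with different default behavior" concrete, here is a NumPy-only sketch on a hypothetical 3-D array: rfft2 transforms the last two axes by default, so it should agree with rfftn restricted to axes (-2, -1).

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4, 6))  # hypothetical 3-D real input

# rfft2 transforms the last two axes by default ...
two = np.fft.rfft2(a)
# ... which is the same as rfftn restricted to axes (-2, -1).
n_dim = np.fft.rfftn(a, axes=(-2, -1))

print(two.shape)                # (3, 4, 4): last axis becomes 6//2 + 1
print(np.allclose(two, n_dim))  # True
```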
dask.array.fft.rfftn(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.rfftn
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.rfftn docstring follows below:
Compute the N-dimensional discrete Fourier Transform for real input.
This function computes the N-dimensional discrete Fourier Transform over any number of axes in an M-dimensional real array by means of the Fast Fourier Transform (FFT). By default, all axes are transformed, with the real transform performed over the last axis, while the remaining transforms are complex.
Parameters: - a (array_like) – Input array, taken to be real.
- s (sequence of ints, optional) – Shape (length along each transformed axis) to use from the input (s[0] refers to axis 0, s[1] to axis 1, etc.). The final element of s corresponds to n for rfft(x, n), while for the remaining axes, it corresponds to n for fft(x, n). Along any axis, if the given shape is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used.
- axes (sequence of ints, optional) – Axes over which to compute the FFT. If not given, the last len(s) axes are used, or all axes if s is also not specified.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or by a combination of s and a, as explained in the parameters section above. The length of the last axis transformed will be s[-1]//2+1, while the remaining transformed axes will have lengths according to s, or unchanged from the input.
Return type: complex ndarray
Raises: ValueError – If s and axes have different length.
IndexError – If an element of axes is larger than the number of axes of a.
See also
Notes
The transform for real input is performed over the last transformation axis, as by rfft, then the transform over the remaining axes is performed as by fftn. The order of the output is as for rfft for the final transformation axis, and as for fftn for the remaining transformation axes.
See fft for details, definitions and conventions used.
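The two-stage description in the Notes (a real transform over the last axis, then complex transforms over the rest) can be reproduced by hand. This NumPy-only sketch on a hypothetical 3-D input compares the staged result with rfftn.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 5, 6))  # hypothetical real input

# Step 1: real FFT over the last axis (length 6 -> 6//2 + 1 = 4).
step1 = np.fft.rfft(a, axis=-1)
# Step 2: complex FFT over the remaining axes.
step2 = np.fft.fftn(step1, axes=(0, 1))

direct = np.fft.rfftn(a)
print(direct.shape)                # (4, 5, 4)
print(np.allclose(direct, step2))  # True
```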
Examples
>>> a = np.ones((2, 2, 2))
>>> np.fft.rfftn(a)
array([[[ 8.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]],
       [[ 0.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]]])
>>> np.fft.rfftn(a, axes=(2, 0))
array([[[ 4.+0.j,  0.+0.j],
        [ 4.+0.j,  0.+0.j]],
       [[ 0.+0.j,  0.+0.j],
        [ 0.+0.j,  0.+0.j]]])
dask.array.fft.irfft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.irfft
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.irfft docstring follows below:
Compute the inverse of the n-point DFT for real input.
This function computes the inverse of the one-dimensional n-point discrete Fourier Transform of real input computed by rfft. In other words, irfft(rfft(a), len(a)) == a to within numerical accuracy. (See Notes below for why len(a) is necessary here.)
The input is expected to be in the form returned by rfft, i.e. the real zero-frequency term followed by the complex positive frequency terms in order of increasing frequency. Since the discrete Fourier Transform of real input is Hermitian-symmetric, the negative frequency terms are taken to be the complex conjugates of the corresponding positive frequency terms.
Parameters: - a (array_like) – The input array.
- n (int, optional) – Length of the transformed axis of the output. For n output points, n//2+1 input points are necessary. If the input is longer than this, it is cropped. If it is shorter than this, it is padded with zeros. If n is not given, it is determined from the length of the input along the axis specified by axis.
- axis (int, optional) – Axis over which to compute the inverse FFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. The length of the transformed axis is n, or, if n is not given, 2*(m-1) where m is the length of the transformed axis of the input. To get an odd number of output points, n must be specified.
Return type: ndarray
Raises: IndexError – If axis is larger than the last axis of a.
See also
Notes
Returns the real valued n-point inverse discrete Fourier transform of a, where a contains the non-negative frequency terms of a Hermitian-symmetric sequence. n is the length of the result, not the input.
If you specify an n such that a must be zero-padded or truncated, the extra/removed values will be added/removed at high frequencies. One can thus resample a series to m points via Fourier interpolation by: a_resamp = irfft(rfft(a), m).
Examples
>>> np.fft.ifft([1, -1j, -1, 1j])
array([ 0.+0.j,  1.+0.j,  0.+0.j,  0.+0.j])
>>> np.fft.irfft([1, -1j, -1])
array([ 0.,  1.,  0.,  0.])
Notice how the last term in the input to the ordinary ifft is the complex conjugate of the second term, and the output has zero imaginary part everywhere. When calling irfft, the negative frequencies are not specified, and the output array is purely real.
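Why len(a) matters: an odd-length signal and the next even length share the same number of rfft terms, so irfft cannot tell them apart. A NumPy-only sketch with a hypothetical 5-point signal shows the default guess, the exact round trip, and the resampling recipe from the Notes.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # odd length: 5
spectrum = np.fft.rfft(a)                # 5//2 + 1 = 3 terms

# Without n, irfft assumes an even output of 2*(m-1) = 4 points.
guess = np.fft.irfft(spectrum)
# Passing the original length recovers a (to numerical accuracy).
exact = np.fft.irfft(spectrum, len(a))

print(guess.shape)            # (4,)
print(np.allclose(exact, a))  # True

# Fourier resampling to m points, as in a_resamp = irfft(rfft(a), m).
a_resamp = np.fft.irfft(spectrum, 8)
print(a_resamp.shape)         # (8,)
```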
dask.array.fft.irfft2(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.irfft2
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.irfft2 docstring follows below:
Compute the 2-dimensional inverse FFT of a real array.
Parameters: - a (array_like) – The input array
- s (sequence of ints, optional) – Shape of the inverse FFT.
- axes (sequence of ints, optional) – The axes over which to compute the inverse fft. Default is the last two axes.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The result of the inverse real 2-D FFT.
Return type: ndarray
See also
irfftn() – Compute the inverse of the N-dimensional FFT of real input.
Notes
This is really irfftn with different defaults. For more details see irfftn.
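A NumPy-only sketch of the equivalence, on hypothetical data with an odd last axis: irfft2 with an explicit s matches irfftn over axes (-2, -1), and both restore the original array. Passing s is what keeps the odd length.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((2, 5, 7))  # hypothetical real data, odd last axis

spec = np.fft.rfft2(a)                         # last two axes transformed
back2 = np.fft.irfft2(spec, s=a.shape[-2:])    # explicit s keeps the odd length
backn = np.fft.irfftn(spec, s=a.shape[-2:], axes=(-2, -1))

print(np.allclose(back2, backn))  # True
print(np.allclose(back2, a))      # True
```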
dask.array.fft.irfftn(a, s=None, axes=None)
Wrapping of numpy.fft.fftpack.irfftn
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.irfftn docstring follows below:
Compute the inverse of the N-dimensional FFT of real input.
This function computes the inverse of the N-dimensional discrete Fourier Transform for real input over any number of axes in an M-dimensional array by means of the Fast Fourier Transform (FFT). In other words, irfftn(rfftn(a), a.shape) == a to within numerical accuracy. (The a.shape is necessary like len(a) is for irfft, and for the same reason.)
The input should be ordered in the same way as is returned by rfftn, i.e. as for irfft for the final transformation axis, and as for ifftn along all the other axes.
Parameters: - a (array_like) – Input array.
- s (sequence of ints, optional) – Shape (length of each transformed axis) of the output (s[0] refers to axis 0, s[1] to axis 1, etc.). s is also the number of input points used along this axis, except for the last axis, where s[-1]//2+1 points of the input are used. Along any axis, if the shape indicated by s is smaller than that of the input, the input is cropped. If it is larger, the input is padded with zeros. If s is not given, the shape of the input along the axes specified by axes is used.
- axes (sequence of ints, optional) – Axes over which to compute the inverse FFT. If not given, the last len(s) axes are used, or all axes if s is also not specified. Repeated indices in axes means that the inverse transform over that axis is performed multiple times.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axes indicated by axes, or by a combination of s or a, as explained in the parameters section above. The length of each transformed axis is as given by the corresponding element of s, or the length of the input in every axis except for the last one if s is not given. In the final transformed axis the length of the output when s is not given is 2*(m-1) where m is the length of the final transformed axis of the input. To get an odd number of output points in the final axis, s must be specified.
Return type: ndarray
Raises: ValueError – If s and axes have different length.
IndexError – If an element of axes is larger than the number of axes of a.
See also
Notes
See fft for definitions and conventions used.
See rfft for definitions and conventions used for real input.
Examples
>>> a = np.zeros((3, 2, 2))
>>> a[0, 0, 0] = 3 * 2 * 2
>>> np.fft.irfftn(a)
array([[[ 1.,  1.],
        [ 1.,  1.]],
       [[ 1.,  1.],
        [ 1.,  1.]],
       [[ 1.,  1.],
        [ 1.,  1.]]])
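The round trip irfftn(rfftn(a), a.shape) == a from the description can be sketched with NumPy, using a hypothetical array whose last axis is odd. Without the shape, the last axis comes back as 2*(m-1) points. The axes are passed explicitly alongside s because some newer NumPy releases warn when s is given without axes.

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.standard_normal((3, 4, 5))  # hypothetical input; last axis is odd

spec = np.fft.rfftn(a)              # shape (3, 4, 5//2 + 1) == (3, 4, 3)
default = np.fft.irfftn(spec)       # last axis becomes 2*(3-1) = 4
exact = np.fft.irfftn(spec, s=a.shape, axes=(0, 1, 2))

print(spec.shape, default.shape)    # (3, 4, 3) (3, 4, 4)
print(np.allclose(exact, a))        # True
```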
dask.array.fft.hfft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.hfft
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.hfft docstring follows below:
Compute the FFT of a signal which has Hermitian symmetry (real spectrum).
Parameters: - a (array_like) – The input array.
- n (int, optional) – Length of the transformed axis of the output. For n output points, n//2+1 input points are necessary. If the input is longer than this, it is cropped. If it is shorter than this, it is padded with zeros. If n is not given, it is determined from the length of the input along the axis specified by axis.
- axis (int, optional) – Axis over which to compute the FFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. The length of the transformed axis is n, or, if n is not given, 2*(m-1) where m is the length of the transformed axis of the input. To get an odd number of output points, n must be specified.
Return type: ndarray
Raises: IndexError – If axis is larger than the last axis of a.
Notes
hfft/ihfft are a pair analogous to rfft/irfft, but for the opposite case: here the signal has Hermitian symmetry in the time domain and is real in the frequency domain. So here it’s hfft for which you must supply the length of the result if it is to be odd: ihfft(hfft(a, 2*len(a) - 2)) == a for an even-length result, and ihfft(hfft(a, 2*len(a) - 1)) == a for an odd-length result, within numerical accuracy.
Examples
>>> signal = np.array([1, 2, 3, 4, 3, 2])
>>> np.fft.fft(signal)
array([ 15.+0.j,  -4.+0.j,   0.+0.j,  -1.-0.j,   0.+0.j,  -4.+0.j])
>>> np.fft.hfft(signal[:4])  # Input first half of signal
array([ 15.,  -4.,   0.,  -1.,   0.,  -4.])
>>> np.fft.hfft(signal, 6)   # Input entire signal and truncate
array([ 15.,  -4.,   0.,  -1.,   0.,  -4.])
>>> signal = np.array([[1, 1.j], [-1.j, 2]])
>>> np.conj(signal.T) - signal   # check Hermitian symmetry
array([[ 0.-0.j,  0.+0.j],
       [ 0.+0.j,  0.-0.j]])
>>> freq_spectrum = np.fft.hfft(signal)
>>> freq_spectrum
array([[ 1.,  1.],
       [ 2., -2.]])
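A NumPy-only sketch of the hfft/ihfft round trip for both output parities. The half-spectrum a is hypothetical; its first and last entries are chosen real because hfft, like irfft, ignores the imaginary part of the zero-frequency term (and of the Nyquist term when the output length is even).

```python
import numpy as np

# Hypothetical half-spectrum with m = 4 points (real first and last entries).
a = np.array([1.0, 2 - 1j, 3 + 0.5j, 4.0])
m = len(a)

even = np.fft.hfft(a, 2 * m - 2)  # 6 real output points
odd = np.fft.hfft(a, 2 * m - 1)   # 7 real output points

print(even.shape, odd.shape)               # (6,) (7,)
print(np.allclose(np.fft.ihfft(even), a))  # True
print(np.allclose(np.fft.ihfft(odd), a))   # True
```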
dask.array.fft.ihfft(a, n=None, axis=None)
Wrapping of numpy.fft.fftpack.ihfft
The axis along which the FFT is applied must consist of a single chunk. To change the array’s chunking, use dask.array.Array.rechunk.
The numpy.fft.fftpack.ihfft docstring follows below:
Compute the inverse FFT of a signal which has Hermitian symmetry.
Parameters: - a (array_like) – Input array.
- n (int, optional) – Length of the inverse FFT. Number of points along transformation axis in the input to use. If n is smaller than the length of the input, the input is cropped. If it is larger, the input is padded with zeros. If n is not given, the length of the input along the axis specified by axis is used.
- axis (int, optional) – Axis over which to compute the inverse FFT. If not given, the last axis is used.
- norm ({None, "ortho"}, optional) –
New in version 1.10.0.
Normalization mode (see numpy.fft). Default is None.
Returns: out – The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. If n is even, the length of the transformed axis is (n/2)+1. If n is odd, the length is (n+1)/2.
Return type: complex ndarray
Notes
hfft/ihfft are a pair analogous to rfft/irfft, but for the opposite case: here the signal has Hermitian symmetry in the time domain and is real in the frequency domain. So here it’s hfft for which you must supply the length of the result if it is to be odd: ihfft(hfft(a, 2*len(a) - 2)) == a for an even-length result, and ihfft(hfft(a, 2*len(a) - 1)) == a for an odd-length result, within numerical accuracy.
Examples
>>> spectrum = np.array([ 15, -4, 0, -1, 0, -4])
>>> np.fft.ifft(spectrum)
array([ 1.+0.j,  2.-0.j,  3.+0.j,  4.+0.j,  3.+0.j,  2.-0.j])
>>> np.fft.ihfft(spectrum)
array([ 1.-0.j,  2.-0.j,  3.-0.j,  4.-0.j])
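With the default normalization, NumPy's ihfft is the complex conjugate of rfft scaled by 1/n. This sketch checks that relation on the spectrum from the example above and recovers its first half.

```python
import numpy as np

spectrum = np.array([15.0, -4, 0, -1, 0, -4])
n = len(spectrum)

via_ihfft = np.fft.ihfft(spectrum)
# Default norm: ihfft equals conj(rfft) scaled by 1/n.
via_rfft = np.conj(np.fft.rfft(spectrum)) / n

print(np.allclose(via_ihfft, via_rfft))  # True
print(np.round(via_ihfft.real, 6))       # [1. 2. 3. 4.]
```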
dask.array.fft.fftfreq(n, d=1.0, chunks=None)
Return the Discrete Fourier Transform sample frequencies.
The returned float array f contains the frequency bin centers in cycles per unit of the sample spacing (with zero at the start). For instance, if the sample spacing is in seconds, then the frequency unit is cycles/second.
Given a window length n and a sample spacing d:
f = [0, 1, ..., n/2-1, -n/2, ..., -1] / (d*n)   if n is even
f = [0, 1, ..., (n-1)/2, -(n-1)/2, ..., -1] / (d*n)   if n is odd
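The bin layout above can be built by hand and compared against NumPy. The helper fftfreq_from_formula is hypothetical and written only for this illustration; it concatenates the non-negative and negative bin indices and divides by d*n.

```python
import numpy as np

def fftfreq_from_formula(n, d=1.0):
    """Hypothetical helper: the bin layout described above, built by hand."""
    positive = np.arange(0, (n - 1) // 2 + 1)  # 0, 1, ..., n/2-1 or (n-1)/2
    negative = np.arange(-(n // 2), 0)         # -n/2 or -(n-1)/2, ..., -1
    return np.concatenate([positive, negative]) / (d * n)

for n in (8, 9):
    assert np.allclose(fftfreq_from_formula(n, d=0.1), np.fft.fftfreq(n, d=0.1))

# Same values as np.fft.fftfreq(8, d=0.1) in the example below.
print(fftfreq_from_formula(8, d=0.1))
```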
Parameters: - n (int) – Window length.
- d (scalar, optional) – Sample spacing (inverse of the sampling rate). Defaults to 1.
Returns: f – Array of length n containing the sample frequencies.
Return type: ndarray
Examples
>>> signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5], dtype=float)
>>> fourier = np.fft.fft(signal)
>>> n = signal.size
>>> timestep = 0.1
>>> freq = np.fft.fftfreq(n, d=timestep)
>>> freq
array([ 0.  ,  1.25,  2.5 ,  3.75, -5.  , -3.75, -2.5 , -1.25])
dask.array.fft.rfftfreq(n, d=1.0, chunks=None)
Return the Discrete Fourier Transform sample frequencies (for usage with rfft, irfft).
The returned float array f contains the frequency bin centers in cycles per unit of the sample spacing (with zero at the start). For instance, if the sample spacing is in seconds, then the frequency unit is cycles/second.
Given a window length n and a sample spacing d:
f = [0, 1, ..., n/2-1, n/2] / (d*n)   if n is even
f = [0, 1, ..., (n-1)/2-1, (n-1)/2] / (d*n)   if n is odd
Unlike fftfreq (but like scipy.fftpack.rfftfreq) the Nyquist frequency component is considered to be positive.
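A NumPy-only sketch of the formula and of the sign difference at the Nyquist bin, using the hypothetical 100 Hz sample rate from the example below: rfftfreq is simply 0, 1, ..., n//2 scaled by 1/(d*n), and its last bin is the positive counterpart of the negative Nyquist bin in fftfreq.

```python
import numpy as np

sample_rate = 100  # hypothetical sampling rate, as in the example below
n, d = 10, 1.0 / sample_rate

# Non-negative bins 0, 1, ..., n//2, scaled by 1/(d*n).
by_hand = np.arange(n // 2 + 1) / (d * n)
print(np.allclose(by_hand, np.fft.rfftfreq(n, d)))  # True

# The Nyquist bin is positive here but negative in fftfreq.
nyquist_r = np.fft.rfftfreq(n, d)[-1]
nyquist_f = np.fft.fftfreq(n, d)[n // 2]
print(nyquist_r > 0, nyquist_f < 0)                  # True True
print(np.isclose(nyquist_r, -nyquist_f))             # True
```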
Parameters: - n (int) – Window length.
- d (scalar, optional) – Sample spacing (inverse of the sampling rate). Defaults to 1.
Returns: f – Array of length n//2 + 1 containing the sample frequencies.
Return type: ndarray
Examples
>>> signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5, -3, 4], dtype=float)
>>> fourier = np.fft.rfft(signal)
>>> n = signal.size
>>> sample_rate = 100
>>> freq = np.fft.fftfreq(n, d=1./sample_rate)
>>> freq
array([  0.,  10.,  20.,  30.,  40., -50., -40., -30., -20., -10.])
>>> freq = np.fft.rfftfreq(n, d=1./sample_rate)
>>> freq
array([  0.,  10.,  20.,  30.,  40.,  50.])
dask.array.fft.fftshift(x, axes=None)
Shift the zero-frequency component to the center of the spectrum.
This function swaps half-spaces for all axes listed (defaults to all). Note that y[0] is the Nyquist component only if len(x) is even.
Parameters: - x (array_like) – Input array.
- axes (int or shape tuple, optional) – Axes over which to shift. Default is None, which shifts all axes.
Returns: y – The shifted array.
Return type: ndarray
See also
ifftshift() – The inverse of fftshift.
Examples
>>> freqs = np.fft.fftfreq(10, 0.1)
>>> freqs
array([ 0.,  1.,  2.,  3.,  4., -5., -4., -3., -2., -1.])
>>> np.fft.fftshift(freqs)
array([-5., -4., -3., -2., -1.,  0.,  1.,  2.,  3.,  4.])
Shift the zero-frequency component only along the second axis:
>>> freqs = np.fft.fftfreq(9, d=1./9).reshape(3, 3)
>>> freqs
array([[ 0.,  1.,  2.],
       [ 3.,  4., -4.],
       [-3., -2., -1.]])
>>> np.fft.fftshift(freqs, axes=(1,))
array([[ 2.,  0.,  1.],
       [-4.,  3.,  4.],
       [-1., -3., -2.]])
dask.array.fft.ifftshift(x, axes=None)
The inverse of fftshift. Although identical for even-length x, the functions differ by one sample for odd-length x.
Parameters: - x (array_like) – Input array.
- axes (int or shape tuple, optional) – Axes over which to calculate. Defaults to None, which shifts all axes.
Returns: y – The shifted array.
Return type: ndarray
See also
fftshift() – Shift zero-frequency component to the center of the spectrum.
Examples
>>> freqs = np.fft.fftfreq(9, d=1./9).reshape(3, 3)
>>> freqs
array([[ 0.,  1.,  2.],
       [ 3.,  4., -4.],
       [-3., -2., -1.]])
>>> np.fft.ifftshift(np.fft.fftshift(freqs))
array([[ 0.,  1.,  2.],
       [ 3.,  4., -4.],
       [-3., -2., -1.]])
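The one-sample difference for odd lengths is easiest to see on a short 1-D array. In this NumPy-only sketch, applying fftshift twice does not restore a length-5 array, while ifftshift does; for 1-D input, fftshift behaves like a roll by n//2.

```python
import numpy as np

x = np.arange(5)  # odd length, where the two shifts differ

shifted = np.fft.fftshift(x)
print(shifted)                    # [3 4 0 1 2]
print(np.fft.fftshift(shifted))   # [1 2 3 4 0]  (not x)
print(np.fft.ifftshift(shifted))  # [0 1 2 3 4]  (x again)

# For this 1-D case, fftshift is a roll by n//2.
assert np.array_equal(shifted, np.roll(x, len(x) // 2))
```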
dask.array.random.beta(a, b, size=None)
Draw samples from a Beta distribution.
The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. With \(\alpha\) = a and \(\beta\) = b, it has the probability density function
\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]
where the normalisation, B, is the beta function,
\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
It is often seen in Bayesian inference and order statistics.
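A quick sanity check of the density above, as a sketch: it uses NumPy's numpy.random.Generator.beta as a stand-in for the dask wrapper (an assumption that both draw from the same distribution) with hypothetical parameters. Samples should stay in [0, 1], and their mean should approach \(\alpha / (\alpha + \beta)\).

```python
import numpy as np

a, b = 2.0, 5.0                   # hypothetical alpha and beta
rng = np.random.default_rng(0)
s = rng.beta(a, b, size=200_000)  # NumPy's Generator, not the dask wrapper

# Samples lie in [0, 1] and their mean approaches alpha / (alpha + beta).
print(s.min() >= 0, s.max() <= 1)
print(s.mean(), a / (a + b))
```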
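Parameters: - a (float) – Alpha (\(\alpha\)), positive (> 0).
- b (float) – Beta (\(\beta\)), positive (> 0).
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Array of the given shape, containing values drawn from a Beta distribution.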
Return type: ndarray
dask.array.random.binomial(n, p, size=None)
Draw samples from a binomial distribution.
Samples are drawn from a binomial distribution with specified parameters, n trials and p probability of success, where n is an integer >= 0 and p is in the interval [0, 1]. (n may be input as a float, but it is truncated to an integer in use.)
Parameters: - n (float (but truncated to an integer)) – parameter, >= 0.
- p (float) – parameter, >= 0 and <=1.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Samples drawn from the distribution, where the values are all integers in [0, n].
Return type: ndarray or scalar
See also
scipy.stats.distributions.binom() – probability density function, distribution or cumulative density function, etc.
Notes
The probability density for the binomial distribution is
\[P(N) = \binom{n}{N}p^N(1-p)^{n-N},\]
where \(n\) is the number of trials, \(p\) is the probability of success, and \(N\) is the number of successes.
When estimating the standard error of a proportion in a population by using a random sample, the normal distribution works well unless the product p*n <=5, where p = population proportion estimate, and n = number of samples, in which case the binomial distribution is used instead. For example, a sample of 15 people shows 4 who are left handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4, so the binomial distribution should be used in this case.
Examples
Draw samples from the distribution:
>> n, p = 10, .5  # number of trials, probability of each trial
>> s = np.random.binomial(n, p, 1000)  # result of flipping a coin 10 times, tested 1000 times.
A real world example. A company drills 9 wild-cat oil exploration wells, each with an estimated probability of success of 0.1. All nine wells fail. What is the probability of that happening?
Let’s do 20,000 trials of the model, and count the number that generate zero positive results.
>> sum(np.random.binomial(9, 0.1, 20000) == 0)/20000. # answer = 0.38885, or 38%.
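The oil-well question also has a closed-form answer: all nine wells fail with probability \((1-p)^9\). This sketch computes it and compares it with a simulation that uses NumPy's numpy.random.Generator.binomial in place of the legacy np.random calls shown above.

```python
import numpy as np

p_fail_all = (1 - 0.1) ** 9          # exact: P(0 successes) = (1-p)**n
rng = np.random.default_rng(0)
trials = rng.binomial(9, 0.1, 20_000)
estimate = np.mean(trials == 0)

print(round(p_fail_all, 4))          # 0.3874
print(estimate)                      # close to the exact value
```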
dask.array.random.chisquare(df, size=None)
Draw samples from a chi-square distribution.
When df independent random variables, each with standard normal distributions (mean 0, variance 1), are squared and summed, the resulting distribution is chi-square (see Notes). This distribution is often used in hypothesis testing.
Parameters: - df (int) – Number of degrees of freedom.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: output – Samples drawn from the distribution, packed in a size-shaped array.
Return type: ndarray
Raises: ValueError – When df <= 0 or when an inappropriate size (e.g. size=-1) is given.
Notes
The variable obtained by summing the squares of df independent, standard normally distributed random variables:
\[Q = \sum_{i=1}^{\mathtt{df}} X^2_i\]
is chi-square distributed, denoted
\[Q \sim \chi^2_k,\]
where \(k = \mathtt{df}\). The probability density function of the chi-squared distribution is
\[p(x) = \frac{(1/2)^{k/2}}{\Gamma(k/2)} x^{k/2 - 1} e^{-x/2},\]
where \(\Gamma\) is the gamma function,
\[\Gamma(x) = \int_0^{\infty} t^{x - 1} e^{-t} dt.\]
Examples
>> np.random.chisquare(2,4)
array([ 1.89920014,  9.00867716,  3.13710533,  5.62318272])
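The construction in the Notes, summing the squares of df standard normals, can be compared with direct draws. This sketch uses NumPy's numpy.random.Generator in place of the dask wrapper and a hypothetical df = 3; both sample sets should have mean close to df and variance close to 2*df.

```python
import numpy as np

df = 3
rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, df))  # df independent standard normals per row
q = (z ** 2).sum(axis=1)                # Q = X_1^2 + ... + X_df^2

direct = rng.chisquare(df, size=100_000)
# Both should have mean df and variance 2*df.
print(q.mean(), direct.mean())
print(q.var(), direct.var())
```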
dask.array.random.exponential(scale=1.0, size=None)
Draw samples from an exponential distribution.
Its probability density function is
\[f(x; \frac{1}{\beta}) = \frac{1}{\beta} \exp(-\frac{x}{\beta}),\]
for x > 0 and 0 elsewhere. \(\beta\) is the scale parameter, which is the inverse of the rate parameter \(\lambda = 1/\beta\). The rate parameter is an alternative, widely used parameterization of the exponential distribution [3].
The exponential distribution is a continuous analogue of the geometric distribution. It describes many common situations, such as the size of raindrops measured over many rainstorms [1], or the time between page requests to Wikipedia [2].
Parameters: - scale (float) – The scale parameter, \(\beta = 1/\lambda\).
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
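Because scale is \(\beta = 1/\lambda\), not the rate, a quick check helps avoid mixing the two up. This sketch uses NumPy's numpy.random.Generator.exponential as a stand-in for the dask wrapper, with a hypothetical \(\beta = 2\): the sample mean should be near \(\beta\), and the tail fraction above t near \(e^{-\lambda t}\).

```python
import numpy as np

beta = 2.0        # scale parameter (hypothetical value)
lam = 1.0 / beta  # rate parameter, lambda = 1/beta
rng = np.random.default_rng(0)
s = rng.exponential(scale=beta, size=100_000)

# Mean is beta (= 1/lambda); P(X > t) = exp(-t/beta).
t = 3.0
print(s.mean())                          # close to beta
print(np.mean(s > t), np.exp(-lam * t))  # empirical vs exact tail
```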
References
[1] Peyton Z. Peebles Jr., “Probability, Random Variables and Random Signal Principles”, 4th ed, 2001, p. 57.
[2] “Poisson Process”, Wikipedia, http://en.wikipedia.org/wiki/Poisson_process
[3] “Exponential Distribution”, Wikipedia, http://en.wikipedia.org/wiki/Exponential_distribution
dask.array.random.f(dfnum, dfden, size=None)
Draw samples from an F distribution.
Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters should be greater than zero.
The random variate of the F distribution (also known as the Fisher distribution) is a continuous probability distribution that arises in ANOVA tests, and is the ratio of two chi-square variates.
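A sketch of the "ratio of two chi-square variates" statement, using NumPy's numpy.random.Generator in place of the dask wrapper and hypothetical degrees of freedom. Each chi-square variate is divided by its own degrees of freedom; for dfden > 2 both the hand-built ratio and direct F draws should have mean dfden / (dfden - 2).

```python
import numpy as np

dfnum, dfden = 5, 10  # hypothetical degrees of freedom
n = 200_000
rng = np.random.default_rng(0)

# Ratio of two independent chi-square variates, each divided by its df.
ratio = (rng.chisquare(dfnum, n) / dfnum) / (rng.chisquare(dfden, n) / dfden)
direct = rng.f(dfnum, dfden, n)

# For dfden > 2 the mean is dfden / (dfden - 2), i.e. 1.25 here.
print(ratio.mean(), direct.mean())
```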
Parameters: - dfnum (float) – Degrees of freedom in numerator. Should be greater than zero.
- dfden (float) – Degrees of freedom in denominator. Should be greater than zero.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Samples from the Fisher distribution.
Return type: ndarray or scalar
See also
scipy.stats.distributions.f() – probability density function, distribution or cumulative density function, etc.
Notes
The F statistic is used to compare in-group variances to between-group variances. Calculating the distribution depends on the sampling, and so it is a function of the respective degrees of freedom in the problem. The variable dfnum is the number of groups minus one, the between-groups degrees of freedom, while dfden is the within-groups degrees of freedom, the sum of the number of samples in each group minus the number of groups.
References
[1] Glantz, Stanton A. “Primer of Biostatistics.”, McGraw-Hill, Fifth Edition, 2002.
[2] Wikipedia, “F-distribution”, http://en.wikipedia.org/wiki/F-distribution
Examples
An example from Glantz[1], pp 47-40:
Two groups, children of diabetics (25 people) and children from people without diabetes (25 controls). Fasting blood glucose was measured; the case group had a mean value of 86.1 and the controls had a mean value of 82.2. Standard deviations were 2.09 and 2.49 respectively. Are these data consistent with the null hypothesis that the parents’ diabetic status does not affect their children’s blood glucose levels? Calculating the F statistic from the data gives a value of 36.01.
Draw samples from the distribution:
>> dfnum = 1.  # between group degrees of freedom
>> dfden = 48.  # within groups degrees of freedom
>> s = np.random.f(dfnum, dfden, 1000)
The lower bound for the top 1% of the samples is:
>> np.sort(s)[-10]
7.61988120985
So there is about a 1% chance that the F statistic will exceed 7.62. The measured value is 36, so the null hypothesis is rejected at the 1% level.
dask.array.random.gamma(shape, scale=1.0, size=None)
Draw samples from a Gamma distribution.
Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale (sometimes designated “theta”), where both parameters are > 0.
Parameters: - shape (scalar > 0) – The shape of the gamma distribution.
- scale (scalar > 0, optional) – The scale of the gamma distribution. Default is equal to 1.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Returns one sample unless size parameter is specified.
Return type: ndarray, float
See also
scipy.stats.distributions.gamma() – probability density function, distribution or cumulative density function, etc.
Notes
The probability density for the Gamma distribution is
\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]
where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.
The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
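The shape/scale parameterization in the density above implies a mean of \(k\theta\) and a variance of \(k\theta^2\). This sketch checks both with NumPy's numpy.random.Generator.gamma as a stand-in for the dask wrapper, using the same shape and scale as the example below.

```python
import numpy as np

k, theta = 2.0, 2.0  # shape and scale, as in the example below
rng = np.random.default_rng(0)
s = rng.gamma(k, theta, size=200_000)

# Mean is k*theta and variance is k*theta**2.
print(s.mean(), k * theta)      # sample mean vs k*theta
print(s.var(), k * theta ** 2)  # sample variance vs k*theta**2
```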
Examples
Draw samples from the distribution:
>> shape, scale = 2., 2.  # mean and dispersion
>> s = np.random.gamma(shape, scale, 1000)
Display the histogram of the samples, along with the probability density function:
>> import matplotlib.pyplot as plt
>> import scipy.special as sps
>> count, bins, ignored = plt.hist(s, 50, density=True)
>> y = bins**(shape-1)*(np.exp(-bins/scale) /
..                      (sps.gamma(shape)*scale**shape))
>> plt.plot(bins, y, linewidth=2, color='r')
>> plt.show()
dask.array.random.geometric(p, size=None)
Draw samples from the geometric distribution.
Bernoulli trials are experiments with one of two outcomes: success or failure (an example of such an experiment is flipping a coin). The geometric distribution models the number of trials that must be run in order to achieve success. It is therefore supported on the positive integers, k = 1, 2, ....
The probability mass function of the geometric distribution is
\[f(k) = (1 - p)^{k - 1} p\]
where p is the probability of success of an individual trial.
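A short check of the mass function against simulated draws, as a sketch: NumPy's numpy.random.Generator.geometric stands in for the dask wrapper, and p = 0.35 matches the example below. The smallest draw is 1, because the count includes the successful trial.

```python
import numpy as np

p = 0.35
rng = np.random.default_rng(0)
z = rng.geometric(p, size=100_000)

k = np.arange(1, 6)
pmf = (1 - p) ** (k - 1) * p  # f(k) = (1-p)^(k-1) p
empirical = np.array([np.mean(z == i) for i in k])

print(z.min())                # 1: support starts at k = 1
print(np.round(pmf, 3))
print(np.round(empirical, 3))
```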
Parameters: - p (float) – The probability of success of an individual trial.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Samples from the geometric distribution, shaped according to size.
Return type: ndarray
Examples
Draw ten thousand values from the geometric distribution, with the probability of an individual success equal to 0.35:
>> z = np.random.geometric(p=0.35, size=10000)
How many trials succeeded after a single run?
>> (z == 1).sum() / 10000.
0.34889999999999999 #random
dask.array.random.gumbel(loc=0.0, scale=1.0, size=None)
Draw samples from a Gumbel distribution.
Draw samples from a Gumbel distribution with specified location and scale. For more information on the Gumbel distribution, see Notes and References below.
Parameters: - loc (float) – The location of the mode of the distribution.
- scale (float) – The scale parameter of the distribution.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples
Return type: ndarray or scalar
See also
scipy.stats.gumbel_l(), scipy.stats.gumbel_r(), scipy.stats.genextreme(), weibull()
Notes
The Gumbel (or Smallest Extreme Value (SEV) or the Smallest Extreme Value Type I) distribution is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. The Gumbel is a special case of the Extreme Value Type I distribution for maximums from distributions with “exponential-like” tails.
The probability density for the Gumbel distribution is
\[p(x) = \frac{e^{-(x - \mu)/ \beta}}{\beta} e^{ -e^{-(x - \mu)/ \beta}},\]
where \(\mu\) is the mode, a location parameter, and \(\beta\) is the scale parameter.
The Gumbel (named for German mathematician Emil Julius Gumbel) was used very early in the hydrology literature, for modeling the occurrence of flood events. It is also used for modeling maximum wind speed and rainfall rates. It is a “fat-tailed” distribution - the probability of an event in the tail of the distribution is larger than if one used a Gaussian, hence the surprisingly frequent occurrence of 100-year floods. Floods were initially modeled as a Gaussian process, which underestimated the frequency of extreme events.
It is one of a class of extreme value distributions, the Generalized Extreme Value (GEV) distributions, which also includes the Weibull and Frechet.
The function has a mean of \(\mu + 0.57721\beta\) and a variance of \(\frac{\pi^2}{6}\beta^2\).
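The mean and variance formulas can be checked by simulation. This sketch uses NumPy's numpy.random.Generator.gumbel as a stand-in for the dask wrapper, with the location and scale from the example below; 0.57721... is the Euler–Mascheroni constant.

```python
import numpy as np

mu, beta = 0.0, 0.1
rng = np.random.default_rng(0)
s = rng.gumbel(mu, beta, size=200_000)

euler_gamma = 0.5772156649  # the 0.57721 in the mean formula
print(s.mean(), mu + euler_gamma * beta)
print(s.var(), np.pi ** 2 / 6 * beta ** 2)
```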
Examples
Draw samples from the distribution:
>>> mu, beta = 0, 0.1  # location and scale
>>> s = np.random.gumbel(mu, beta, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp(-np.exp(-(bins - mu)/beta)),
...          linewidth=2, color='r')
>>> plt.show()
Show how an extreme value distribution can arise from a Gaussian process and compare to a Gaussian:
>>> means = []
>>> maxima = []
>>> for i in range(0, 1000):
...     a = np.random.normal(mu, beta, 1000)
...     means.append(a.mean())
...     maxima.append(a.max())
>>> count, bins, ignored = plt.hist(maxima, 30, density=True)
>>> beta = np.std(maxima) * np.sqrt(6) / np.pi
>>> mu = np.mean(maxima) - 0.57721*beta
>>> plt.plot(bins, (1/beta)*np.exp(-(bins - mu)/beta)
...          * np.exp(-np.exp(-(bins - mu)/beta)),
...          linewidth=2, color='r')
>>> plt.plot(bins, 1/(beta * np.sqrt(2 * np.pi))
...          * np.exp(-(bins - mu)**2 / (2 * beta**2)),
...          linewidth=2, color='g')
>>> plt.show()
-
dask.array.random.hypergeometric(ngood, nbad, nsample, size=None)
Draw samples from a Hypergeometric distribution.
Samples are drawn from a hypergeometric distribution with specified parameters: ngood (ways to make a good selection), nbad (ways to make a bad selection), and nsample (number of items sampled, which is less than or equal to the sum ngood + nbad).
Parameters: - ngood (int or array_like) – Number of ways to make a good selection. Must be nonnegative.
- nbad (int or array_like) – Number of ways to make a bad selection. Must be nonnegative.
- nsample (int or array_like) – Number of items sampled. Must be at least 1 and at most ngood + nbad.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The number of good items in each drawn sample; the values are all integers in [max(0, nsample - nbad), min(nsample, ngood)].
Return type: ndarray or scalar
See also
scipy.stats.distributions.hypergeom(): probability mass function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Hypergeometric distribution is
\[P(x) = \frac{\binom{g}{x}\binom{b}{n-x}}{\binom{g+b}{n}},\]where \(0 \le x \le n\) and \(n-b \le x \le g\),
for P(x) the probability of x good results in the drawn sample, g = ngood, b = nbad, and n = nsample.
Consider an urn with black and white marbles in it, ngood of them black and nbad of them white. If you draw nsample marbles without replacement, then the hypergeometric distribution describes the distribution of black marbles in the drawn sample.
Note that this distribution is very similar to the binomial distribution, except that in this case, samples are drawn without replacement, whereas in the Binomial case samples are drawn with replacement (or the sample space is infinite). As the sample space becomes large, this distribution approaches the binomial.
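The probability mass function can be checked directly. The sketch below evaluates P(x) = C(ngood, x) C(nbad, nsample - x) / C(ngood + nbad, nsample) with Python's math.comb, confirms that it sums to one with mean nsample * ngood / (ngood + nbad), and computes the exact answer to the 15-marble urn question in the Examples. The helper hypergeom_pmf is hypothetical, and NumPy's Generator API stands in for dask.array.random.

```python
from math import comb

import numpy as np


def hypergeom_pmf(x, ngood, nbad, nsample):
    """P(x) = C(ngood, x) * C(nbad, nsample - x) / C(ngood + nbad, nsample)."""
    # Outside its support the probability is zero.
    if x < max(0, nsample - nbad) or x > min(nsample, ngood):
        return 0.0
    return comb(ngood, x) * comb(nbad, nsample - x) / comb(ngood + nbad, nsample)


# Urn from the Examples: 15 black ("good") and 15 white marbles, draw 15.
ngood, nbad, nsample = 15, 15, 15
pmf = [hypergeom_pmf(x, ngood, nbad, nsample) for x in range(nsample + 1)]

total = sum(pmf)                              # should be 1
mean = sum(x * p for x, p in enumerate(pmf))  # theory: nsample * ngood / (ngood + nbad)
one_color_12_plus = sum(pmf[12:]) + sum(pmf[:4])  # 12 or more of either color

rng = np.random.default_rng(seed=0)
s = rng.hypergeometric(ngood, nbad, nsample, size=100_000)
print(total, mean, one_color_12_plus, s.mean())
```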
References
[1] Lentner, Marvin, “Elementary Applied Statistics”, Bogden and Quigley, 1972.
[2] Weisstein, Eric W. “Hypergeometric Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/HypergeometricDistribution.html
[3] Wikipedia, “Hypergeometric-distribution”, http://en.wikipedia.org/wiki/Hypergeometric_distribution
Examples
Draw samples from the distribution:
>>> ngood, nbad, nsamp = 100, 2, 10  # number of good, number of bad, and number of samples
>>> s = np.random.hypergeometric(ngood, nbad, nsamp, 1000)
>>> import matplotlib.pyplot as plt
>>> plt.hist(s)  # note that it is very unlikely to grab both bad items
Suppose you have an urn with 15 white and 15 black marbles. If you pull 15 marbles at random, how likely is it that 12 or more of them are one color?
>>> s = np.random.hypergeometric(15, 15, 15, 100000)
>>> np.sum(s >= 12)/100000. + np.sum(s <= 3)/100000.  # answer = 0.003, pretty unlikely!
-
dask.array.random.laplace(loc=0.0, scale=1.0, size=None)
Draw samples from the Laplace or double exponential distribution with specified location (or mean) and scale (decay).
The Laplace distribution is similar to the Gaussian/normal distribution, but is sharper at the peak and has fatter tails. It represents the difference between two independent, identically distributed exponential random variables.
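The difference-of-exponentials characterization above can be checked by simulation. This sketch assumes NumPy's Generator API rather than dask.array.random; it compares the difference of two independent exponential draws with direct Laplace draws, using the Laplace variance 2 * scale**2 and the two-sided tail P(|X - loc| > t) = exp(-t / scale). Seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
loc, scale, n = 0.0, 1.0, 200_000

# Difference of two independent Exponential(scale) variables, shifted by loc.
diff = loc + rng.exponential(scale, n) - rng.exponential(scale, n)
# Direct Laplace draws for comparison.
lap = rng.laplace(loc, scale, n)

# Laplace(loc, scale) has variance 2 * scale**2 and
# P(|X - loc| > t) = exp(-t / scale); compare at t = scale.
theory_var = 2 * scale ** 2
theory_tail = np.exp(-1.0)
for name, x in [("difference", diff), ("laplace", lap)]:
    print(name, x.var(), np.mean(np.abs(x - loc) > scale))
```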
Parameters: - loc (float, optional) – The position, \(\mu\), of the distribution peak.
- scale (float, optional) – \(\lambda\), the exponential decay.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples
Return type: ndarray or float
Notes
It has the probability density function
\[f(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right).\]The first law of Laplace, from 1774, states that the frequency of an error can be expressed as an exponential function of the absolute magnitude of the error, which leads to the Laplace distribution. For many problems in economics and health sciences, this distribution seems to model the data better than the standard Gaussian distribution.
References
[1] Abramowitz, M. and Stegun, I. A. (Eds.). “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing,” New York: Dover, 1972.
[2] Kotz, Samuel, et al. “The Laplace Distribution and Generalizations,” Birkhauser, 2001.
[3] Weisstein, Eric W. “Laplace Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/LaplaceDistribution.html
[4] Wikipedia, “Laplace Distribution”, http://en.wikipedia.org/wiki/Laplace_distribution
Examples
Draw samples from the distribution:
>>> loc, scale = 0., 1.
>>> s = np.random.laplace(loc, scale, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> x = np.arange(-8., 8., .01)
>>> pdf = np.exp(-abs(x-loc)/scale)/(2.*scale)
>>> plt.plot(x, pdf)
Plot Gaussian for comparison:
>>> g = (1/(scale * np.sqrt(2 * np.pi)) *
...      np.exp(-(x - loc)**2 / (2 * scale**2)))
>>> plt.plot(x, g)
-
dask.array.random.logistic(loc=0.0, scale=1.0, size=None)
Draw samples from a logistic distribution.
Samples are drawn from a logistic distribution with specified parameters, loc (location or mean, also median), and scale (>0).
Parameters: - loc (float) – Location of the distribution (its mean, also its median).
- scale (float > 0.) – Scale of the distribution.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn samples from the logistic distribution.
Return type: ndarray or scalar
See also
scipy.stats.distributions.logistic(): probability density function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Logistic distribution is
\[P(x) = \frac{e^{-(x-\mu)/s}}{s(1+e^{-(x-\mu)/s})^2},\]where \(\mu\) = location and \(s\) = scale.
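A minimal numerical check of this density, assuming NumPy: the sketch integrates it on a wide grid (it should integrate to one and have variance s**2 * pi**2 / 3) and compares against draws from NumPy's Generator.logistic, used here in place of dask.array.random. The helper logistic_pdf, the grid, and the seed are illustrative.

```python
import numpy as np


def logistic_pdf(x, mu, s):
    """Logistic density e^{-(x-mu)/s} / (s * (1 + e^{-(x-mu)/s})**2)."""
    z = np.exp(-(x - mu) / s)
    return z / (s * (1 + z) ** 2)


mu, s = 10.0, 1.0
x = np.linspace(mu - 40 * s, mu + 40 * s, 400_001)  # tails beyond +-40 s are negligible
dx = x[1] - x[0]
pdf = logistic_pdf(x, mu, s)

area = np.sum(pdf) * dx                      # should be close to 1
variance = np.sum((x - mu) ** 2 * pdf) * dx  # theory: s**2 * pi**2 / 3

rng = np.random.default_rng(seed=0)
samples = rng.logistic(mu, s, 200_000)
print(area, variance, np.median(samples), samples.var())
```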
The Logistic distribution is used in Extreme Value problems where it can act as a mixture of Gumbel distributions, in Epidemiology, and by the World Chess Federation (FIDE) where it is used in the Elo ranking system, assuming the performance of each player is a logistically distributed random variable.
References
[1] Reiss, R.-D. and Thomas M. (2001), “Statistical Analysis of Extreme Values, from Insurance, Finance, Hydrology and Other Fields,” Birkhauser Verlag, Basel, pp 132-133.
[2] Weisstein, Eric W. “Logistic Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/LogisticDistribution.html
[3] Wikipedia, “Logistic-distribution”, http://en.wikipedia.org/wiki/Logistic_distribution
Examples
Draw samples from the distribution:
>>> import matplotlib.pyplot as plt
>>> loc, scale = 10, 1
>>> s = np.random.logistic(loc, scale, 10000)
>>> count, bins, ignored = plt.hist(s, bins=50)
Plot against the distribution:
>>> def logist(x, loc, scale):
...     return np.exp((loc-x)/scale)/(scale*(1+np.exp((loc-x)/scale))**2)
>>> plt.plot(bins, logist(bins, loc, scale)*count.max()/
...          logist(bins, loc, scale).max())
>>> plt.show()
-
dask.array.random.lognormal(mean=0.0, sigma=1.0, size=None)
Draw samples from a log-normal distribution.
Draw samples from a log-normal distribution with specified mean, standard deviation, and array shape. Note that the mean and standard deviation are not the values for the distribution itself, but of the underlying normal distribution it is derived from.
Parameters: - mean (float) – Mean value of the underlying normal distribution
- sigma (float, > 0.) – Standard deviation of the underlying normal distribution
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The desired samples: an array of shape size if size is given; if size is None, a float is returned.
Return type: ndarray or float
See also
scipy.stats.lognorm(): probability density function, distribution, cumulative distribution function, etc.
Notes
A variable x has a log-normal distribution if log(x) is normally distributed. The probability density function for the log-normal distribution is:
\[p(x) = \frac{1}{\sigma x \sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\]where \(\mu\) is the mean and \(\sigma\) is the standard deviation of the normally distributed logarithm of the variable. A log-normal distribution results if a random variable is the product of a large number of independent, identically-distributed variables in the same way that a normal distribution results if the variable is the sum of a large number of independent, identically-distributed variables.
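Because mean and sigma parameterize the underlying normal distribution, the log of the samples recovers them, while the mean of the samples themselves is exp(mu + sigma**2 / 2). The sketch below illustrates this using NumPy's Generator.lognormal as a stand-in for dask.array.random.lognormal; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
mu, sigma = 3.0, 1.0
s = rng.lognormal(mu, sigma, 500_000)

# mean and sigma describe log(s), the underlying normal variable, not s itself.
log_s = np.log(s)
print(log_s.mean(), log_s.std())  # close to mu and sigma

# The mean of s itself is exp(mu + sigma**2 / 2), which is larger than exp(mu).
theory_mean = np.exp(mu + sigma ** 2 / 2)
print(s.mean(), theory_mean)
```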
References
[1] Limpert, E., Stahel, W. A., and Abbt, M., “Log-normal Distributions across the Sciences: Keys and Clues,” BioScience, Vol. 51, No. 5, May, 2001. http://stat.ethz.ch/~stahel/lognormal/bioscience.pdf
[2] Reiss, R.D. and Thomas, M., “Statistical Analysis of Extreme Values,” Basel: Birkhauser Verlag, 2001, pp. 31-32.
Examples
Draw samples from the distribution:
>>> mu, sigma = 3., 1.  # mean and standard deviation
>>> s = np.random.lognormal(mu, sigma, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 100, density=True, align='mid')
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, linewidth=2, color='r')
>>> plt.axis('tight')
>>> plt.show()
Demonstrate that the products of random samples from a uniform distribution can be fit well by a log-normal probability density function.
>>> # Generate a thousand samples: each is the product of 100 random
>>> # values, drawn from a uniform distribution.
>>> b = []
>>> for i in range(1000):
...     a = 10. + np.random.random(100)
...     b.append(np.prod(a))
>>> b = np.array(b) / np.min(b)  # rescale so the smallest value is 1
>>> count, bins, ignored = plt.hist(b, 100, density=True, align='mid')
>>> sigma = np.std(np.log(b))
>>> mu = np.mean(np.log(b))
>>> x = np.linspace(min(bins), max(bins), 10000)
>>> pdf = (np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2))
...        / (x * sigma * np.sqrt(2 * np.pi)))
>>> plt.plot(x, pdf, color='r', linewidth=2)
>>> plt.show()
-
dask.array.random.logseries(p, size=None)
Draw samples from a logarithmic series distribution.
Samples are drawn from a log series distribution with specified shape parameter, 0 < p < 1.
Parameters: - p (float) – Shape parameter of the distribution; must be in the open interval (0, 1).
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn samples; the values are all positive integers (k ≥ 1).
Return type: ndarray or scalar
See also
scipy.stats.distributions.logser(): probability mass function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Log Series distribution is
\[P(k) = \frac{-p^k}{k \ln(1-p)},\]for \(k = 1, 2, \ldots\), where p is the shape parameter, 0 < p < 1.
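A short sketch of this mass function, assuming NumPy for the sampling step: it sums P(k) over enough terms to confirm it totals one, compares the mean with the closed form -p / ((1 - p) ln(1 - p)), and checks draws from NumPy's Generator.logseries (standing in for dask.array.random). The helper logseries_pmf and the truncation at k = 200 are illustrative choices.

```python
import math

import numpy as np


def logseries_pmf(k, p):
    """P(k) = -p**k / (k * ln(1 - p)) for k = 1, 2, ..."""
    return -p ** k / (k * math.log(1 - p))


p = 0.6
ks = range(1, 200)  # for p = 0.6, terms beyond k = 200 are negligible
pmf = [logseries_pmf(k, p) for k in ks]

total = sum(pmf)
mean = sum(k * q for k, q in zip(ks, pmf))
theory_mean = -p / ((1 - p) * math.log(1 - p))

rng = np.random.default_rng(seed=0)
s = rng.logseries(p, 100_000)
print(total, mean, theory_mean, s.min(), s.mean())
```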
The log series distribution is frequently used to represent species richness and occurrence, first proposed by Fisher, Corbet, and Williams in 1943 [2]. It may also be used to model the numbers of occupants seen in cars [3].
References
[1] Buzas, Martin A.; Culver, Stephen J., Understanding regional species diversity through the log series distribution of occurrences: BIODIVERSITY RESEARCH Diversity & Distributions, Volume 5, Number 5, September 1999, pp. 187-195(9).
[2] Fisher, R.A., A.S. Corbet, and C.B. Williams. 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. Journal of Animal Ecology, 12:42-58.
[3] D. J. Hand, F. Daly, D. Lunn, E. Ostrowski, A Handbook of Small Data Sets, CRC Press, 1994.
[4] Wikipedia, “Logarithmic-distribution”, http://en.wikipedia.org/wiki/Logarithmic-distribution
Examples
Draw samples from the distribution:
>>> import matplotlib.pyplot as plt
>>> a = .6
>>> s = np.random.logseries(a, 10000)
>>> count, bins, ignored = plt.hist(s)
Plot against the distribution:
>>> def logseries(k, p):
...     return -p**k/(k*np.log(1-p))
>>> plt.plot(bins, logseries(bins, a)*count.max()/
...          logseries(bins, a).max(), 'r')
>>> plt.show()
-
dask.array.random.negative_binomial(n, p, size=None)
Draw samples from a negative binomial distribution.
Samples are drawn from a negative binomial distribution with specified parameters, n successes and p probability of success, where n is an integer > 0 and p is in the interval [0, 1]. Each sample is the number of failures that occur before the n-th success.
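Parameters: - n (int) – Number of successes, > 0.
- p (float) – Probability of success, in the interval [0, 1].
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn samples.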
Return type: int or ndarray of ints
Notes
The probability density for the negative binomial distribution is
\[P(N;n,p) = \binom{N+n-1}{n-1}p^{n}(1-p)^{N},\]where \(n\) is the number of successes, \(p\) is the probability of success, and \(N\) is the number of failures, so that \(N+n\) is the number of trials. The negative binomial distribution gives the probability of n-1 successes and N failures in the first N+n-1 trials, followed by a success on the (N+n)th trial.
If one throws a die repeatedly until the third time a “1” appears, then the probability distribution of the number of non-“1”s that appear before the third “1” is a negative binomial distribution.
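The die example can be made concrete with a small sketch of the formula above: n = 3 successes with p = 1/6. The mass function should sum to one and have mean n(1 - p)/p = 15 failures, and draws from NumPy's Generator.negative_binomial (used here instead of dask.array.random) should agree. The helper neg_binomial_pmf and the truncation at N = 2000 are illustrative.

```python
from math import comb

import numpy as np


def neg_binomial_pmf(N, n, p):
    """P(N; n, p) = C(N + n - 1, n - 1) * p**n * (1 - p)**N (N failures before the n-th success)."""
    return comb(N + n - 1, n - 1) * p ** n * (1 - p) ** N


# Die example from the Notes: failures (non-"1"s) before the third "1".
n, p = 3, 1 / 6
pmf = [neg_binomial_pmf(N, n, p) for N in range(2000)]  # tail beyond 2000 is negligible
mean = sum(N * q for N, q in enumerate(pmf))            # theory: n * (1 - p) / p = 15

rng = np.random.default_rng(seed=0)
s = rng.negative_binomial(n, p, 200_000)
print(sum(pmf), mean, s.mean())
```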
References
[1] Weisstein, Eric W. “Negative Binomial Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/NegativeBinomialDistribution.html
[2] Wikipedia, “Negative binomial distribution”, http://en.wikipedia.org/wiki/Negative_binomial_distribution
Examples
Draw samples from the distribution:
A real-world example: a company drills wild-cat oil exploration wells, each with an estimated probability of success of 0.1. What is the probability that the first success occurs within the first 1, 2, ..., 10 wells drilled?
>>> s = np.random.negative_binomial(1, 0.1, 100000)
>>> for i in range(1, 11):
...     probability = np.sum(s < i) / 100000.
...     print(i, "wells drilled, probability of one success =", probability)
-
dask.array.random.noncentral_chisquare(df, nonc, size=None)
Draw samples from a noncentral chi-square distribution.
The noncentral \(\chi^2\) distribution is a generalisation of the \(\chi^2\) distribution.
Parameters: - df (int) – Degrees of freedom, should be > 0 as of Numpy 1.10, should be > 1 for earlier versions.
- nonc (float) – Non-centrality, should be non-negative.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Notes
The probability density function for the noncentral Chi-square distribution is
\[P(x;df,nonc) = \sum^{\infty}_{i=0} \frac{e^{-nonc/2}(nonc/2)^{i}}{i!} P_{Y_{df+2i}}(x),\]where \(Y_{q}\) is a Chi-square random variable with q degrees of freedom.
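The series above is a Poisson mixture: draw i from a Poisson distribution with mean nonc/2, then draw a central chi-square with df + 2i degrees of freedom. The sketch below samples that way and compares it with direct noncentral draws, using the known mean df + nonc and variance 2(df + 2 nonc). It assumes NumPy's Generator API in place of dask.array.random; noncentral_chisquare_mixture is a hypothetical helper.

```python
import numpy as np


def noncentral_chisquare_mixture(df, nonc, size, rng):
    """Sample via the Poisson mixture in the density above:
    i ~ Poisson(nonc / 2), then a central chi-square with df + 2*i degrees of freedom."""
    i = rng.poisson(nonc / 2, size)
    return rng.chisquare(df + 2 * i)  # one chi-square draw per entry of the df array


rng = np.random.default_rng(seed=0)
df, nonc, n = 3, 20, 200_000
mixture = noncentral_chisquare_mixture(df, nonc, n, rng)
direct = rng.noncentral_chisquare(df, nonc, n)

# Both should have mean df + nonc and variance 2 * (df + 2 * nonc).
print(mixture.mean(), direct.mean(), mixture.var(), direct.var())
```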
Holla (1970) [1] notes that the noncentral chi-square is useful in bombing and coverage problems, where the probability of killing the point target is given by the noncentral chi-square distribution.
References
[1] Holla, M.S., “On a noncentral chi-square distribution in the analysis of weapon systems effectiveness”, Metrika, Volume 15, Number 1, December 1970.
[2] Wikipedia, “Noncentral chi-square distribution” http://en.wikipedia.org/wiki/Noncentral_chi-square_distribution
Examples
Draw values from the distribution and plot the histogram
>>> import matplotlib.pyplot as plt
>>> values = plt.hist(np.random.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()
Draw values from a noncentral chisquare with very small noncentrality, and compare to a chisquare.
>>> plt.figure()
>>> values = plt.hist(np.random.noncentral_chisquare(3, .0000001, 100000),
...                   bins=np.arange(0., 25, .1), density=True)
>>> values2 = plt.hist(np.random.chisquare(3, 100000),
...                    bins=np.arange(0., 25, .1), density=True)
>>> plt.plot(values[1][0:-1], values[0]-values2[0], 'ob')
>>> plt.show()
Demonstrate how large values of non-centrality lead to a more symmetric distribution.
>>> plt.figure()
>>> values = plt.hist(np.random.noncentral_chisquare(3, 20, 100000),
...                   bins=200, density=True)
>>> plt.show()
-
dask.array.random.noncentral_f(dfnum, dfden, nonc, size=None)
Draw samples from the noncentral F distribution.
Samples are drawn from an F distribution with specified parameters, dfnum (degrees of freedom in numerator) and dfden (degrees of freedom in denominator), where both parameters > 1. nonc is the non-centrality parameter.
Parameters: - dfnum (int) – Parameter, should be > 1.
- dfden (int) – Parameter, should be > 1.
- nonc (float) – Parameter, should be >= 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn samples.
Return type: scalar or ndarray
Notes
When calculating the power of an experiment (power = probability of rejecting the null hypothesis when a specific alternative is true), the non-central F statistic becomes important. When the null hypothesis is true, the F statistic follows a central F distribution. When the null hypothesis is not true, it follows a non-central F distribution.
References
[1] Weisstein, Eric W. “Noncentral F-Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/NoncentralF-Distribution.html
[2] Wikipedia, “Noncentral F distribution”, http://en.wikipedia.org/wiki/Noncentral_F-distribution
Examples
In a study, testing for a specific alternative to the null hypothesis requires use of the Noncentral F distribution. We need to calculate the area in the tail of the distribution that exceeds the value of the F distribution for the null hypothesis. We’ll plot the two probability distributions for comparison.
>>> import matplotlib.pyplot as plt
>>> dfnum = 3  # between group degrees of freedom
>>> dfden = 20  # within groups degrees of freedom
>>> nonc = 3.0
>>> nc_vals = np.random.noncentral_f(dfnum, dfden, nonc, 1000000)
>>> NF = np.histogram(nc_vals, bins=50, density=True)
>>> c_vals = np.random.f(dfnum, dfden, 1000000)
>>> F = np.histogram(c_vals, bins=50, density=True)
>>> plt.plot(F[1][1:], F[0])
>>> plt.plot(NF[1][1:], NF[0])
>>> plt.show()
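The example above only plots the two distributions; the tail area it mentions, which is the power of the test, can be estimated from the same kind of samples. This sketch assumes a 5% significance level and estimates the critical value empirically from central F draws rather than from a table. NumPy's Generator API stands in for dask.array.random.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
dfnum, dfden, nonc = 3, 20, 3.0
n = 1_000_000

c_vals = rng.f(dfnum, dfden, n)                    # null hypothesis: central F
nc_vals = rng.noncentral_f(dfnum, dfden, nonc, n)  # specific alternative

# Critical value at an assumed 5% significance level, estimated from the central samples.
critical = np.quantile(c_vals, 0.95)

# Power: fraction of noncentral samples beyond the critical value.
power = np.mean(nc_vals > critical)
size_check = np.mean(c_vals > critical)  # about 0.05 by construction
print(critical, power, size_check)
```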
-
dask.array.random.normal(loc=0.0, scale=1.0, size=None)
Draw random samples from a normal (Gaussian) distribution.
The probability density function of the normal distribution, first derived by De Moivre and later by both Gauss and Laplace independently [2], is often called the bell curve because of its characteristic shape (see the example below).
The normal distribution occurs often in nature. For example, it describes the commonly occurring distribution of samples influenced by a large number of tiny, random disturbances, each with its own unique distribution [2].
Parameters: - loc (float) – Mean (“centre”) of the distribution.
- scale (float) – Standard deviation (spread or “width”) of the distribution.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
See also
scipy.stats.distributions.norm(): probability density function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Gaussian distribution is
\[p(x) = \frac{1}{\sqrt{ 2 \pi \sigma^2 }} e^{ - \frac{ (x - \mu)^2 } {2 \sigma^2} },\]where \(\mu\) is the mean and \(\sigma\) the standard deviation. The square of the standard deviation, \(\sigma^2\), is called the variance.
The function has its peak at the mean, and its “spread” increases with the standard deviation (the function reaches 0.607 times its maximum at \(\mu + \sigma\) and \(\mu - \sigma\) [2]). This implies that numpy.random.normal is more likely to return samples lying close to the mean, rather than those far away.
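The factor 0.607 is exp(-1/2): substituting x = mu ± sigma into the density above leaves exp(-1/2) times the peak value, whatever mu and sigma are. The sketch below checks this numerically and also estimates the share of samples within one standard deviation of the mean (about 0.683 for a Gaussian). It assumes NumPy's Generator API; normal_pdf is an illustrative helper.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Gaussian density from the Notes."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)


mu, sigma = 0.0, 0.1
peak = normal_pdf(mu, mu, sigma)
ratio_plus = normal_pdf(mu + sigma, mu, sigma) / peak
ratio_minus = normal_pdf(mu - sigma, mu, sigma) / peak

rng = np.random.default_rng(seed=0)
s = rng.normal(mu, sigma, 200_000)
within_one_sigma = np.mean(np.abs(s - mu) < sigma)  # about 0.683 for a Gaussian
print(ratio_plus, ratio_minus, within_one_sigma)
```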
References
[1] Wikipedia, “Normal distribution”, http://en.wikipedia.org/wiki/Normal_distribution
[2] P. R. Peebles Jr., “Central Limit Theorem” in “Probability, Random Variables and Random Signal Principles”, 4th ed., 2001, pp. 51, 125.
Examples
Draw samples from the distribution:
>>> mu, sigma = 0, 0.1  # mean and standard deviation
>>> s = np.random.normal(mu, sigma, 1000)
Verify the mean and the variance:
>>> abs(mu - np.mean(s)) < 0.01
True
>>> abs(sigma - np.std(s, ddof=1)) < 0.01
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 30, density=True)
>>> plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) *
...          np.exp(-(bins - mu)**2 / (2 * sigma**2)),
...          linewidth=2, color='r')
>>> plt.show()
-
dask.array.random.pareto(a, size=None)
Draw samples from a Pareto II or Lomax distribution with specified shape.
The Lomax or Pareto II distribution is a shifted Pareto distribution. The classical Pareto distribution can be obtained from the Lomax distribution by adding 1 and multiplying by the scale parameter m (see Notes). The smallest value of the Lomax distribution is zero, while for the classical Pareto distribution it is m, where the standard Pareto distribution has m = 1. Lomax can also be considered as a simplified version of the Generalized Pareto distribution (available in SciPy), with the scale set to one and the location set to zero.
Values of the classical Pareto distribution are greater than zero and unbounded above. The Pareto distribution is also associated with the “80-20 rule”: in this distribution, 80 percent of the weights are in the lowest 20 percent of the range, while the other 20 percent fill the remaining 80 percent of the range.
Parameters: - a (float, > 0.) – Shape of the distribution.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
See also
scipy.stats.distributions.lomax.pdf(): probability density function, distribution or cumulative distribution function, etc.
scipy.stats.distributions.genpareto.pdf(): probability density function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Pareto distribution is
\[p(x) = \frac{am^a}{x^{a+1}}\]where \(a\) is the shape and \(m\) the scale.
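A minimal sketch of the Lomax-to-classical conversion, assuming NumPy's Generator.pareto behaves like the function documented here: shifting by 1 and scaling by m should give samples no smaller than m, a mean of a * m / (a - 1) (for a > 1), and a survival function P(X > x) = (m / x)**a. The values of a, m, x0, the seed, and the sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
a, m = 3.0, 2.0                 # shape a and scale m, as in the density above
lomax = rng.pareto(a, 500_000)  # Lomax (Pareto II) samples, support [0, inf)
classical = (lomax + 1) * m     # classical Pareto samples, support [m, inf)

# For a > 1 the classical Pareto mean is a * m / (a - 1).
theory_mean = a * m / (a - 1)
# Its survival function is P(X > x) = (m / x)**a for x >= m.
x0 = 4.0
theory_tail = (m / x0) ** a
print(classical.mean(), theory_mean, np.mean(classical > x0), theory_tail)
```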
The Pareto distribution, named after the Italian economist Vilfredo Pareto, is a power law probability distribution useful in many real world problems. Outside the field of economics it is generally referred to as the Bradford distribution. Pareto developed the distribution to describe the distribution of wealth in an economy. It has also found use in insurance, web page access statistics, oil field sizes, and many other problems, including the download frequency for projects in Sourceforge [1]. It is one of the so-called “fat-tailed” distributions.
References
[1] Francis Hunt and Paul Johnson, On the Pareto Distribution of Sourceforge projects.
[2] Pareto, V. (1896). Course of Political Economy. Lausanne.
[3] Reiss, R.D., Thomas, M.(2001), Statistical Analysis of Extreme Values, Birkhauser Verlag, Basel, pp 23-30.
[4] Wikipedia, “Pareto distribution”, http://en.wikipedia.org/wiki/Pareto_distribution
Examples
Draw samples from the distribution:
>>> a, m = 3., 2.  # shape and mode
>>> s = (np.random.pareto(a, 1000) + 1) * m
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, _ = plt.hist(s, 100, density=True)
>>> fit = a*m**a / bins**(a+1)
>>> plt.plot(bins, max(count)*fit/max(fit), linewidth=2, color='r')
>>> plt.show()
-
dask.array.random.poisson(lam=1.0, size=None)
Draw samples from a Poisson distribution.
The Poisson distribution is the limit of the binomial distribution for large N.
Parameters: - lam (float or sequence of float) – Expectation of interval, should be >= 0. A sequence of expectation intervals must be broadcastable over the requested size.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The drawn samples, of shape size, if it was provided.
Return type: ndarray or scalar
Notes
The probability mass function of the Poisson distribution is
\[f(k; \lambda)=\frac{\lambda^k e^{-\lambda}}{k!}\]For events with an expected number \(\lambda\) of occurrences in an interval, the Poisson distribution \(f(k; \lambda)\) describes the probability of \(k\) events occurring within that interval.
Because the output is limited to the range of the C long type, a ValueError is raised when lam is within 10 sigma of the maximum representable value.
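A minimal check of the mass function, assuming NumPy for sampling: summing f(k; lambda) over k confirms it totals one and that the mean and the variance both equal lambda; NumPy's Generator.poisson (standing in for dask.array.random.poisson) should agree. The helper poisson_pmf and the cutoff at k = 100 are illustrative choices for lambda = 5.

```python
import math

import numpy as np


def poisson_pmf(k, lam):
    """f(k; lam) = lam**k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)


lam = 5.0
pmf = [poisson_pmf(k, lam) for k in range(100)]  # terms beyond k = 100 are negligible here
mean = sum(k * p for k, p in enumerate(pmf))
var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

rng = np.random.default_rng(seed=0)
s = rng.poisson(lam, 100_000)
print(sum(pmf), mean, var, s.mean(), s.var())
```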
References
[1] Weisstein, Eric W. “Poisson Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/PoissonDistribution.html
[2] Wikipedia, “Poisson distribution”, http://en.wikipedia.org/wiki/Poisson_distribution
Examples
Draw samples from the distribution:
>>> import numpy as np
>>> s = np.random.poisson(5, 10000)
Display histogram of the sample:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 14, density=True)
>>> plt.show()
Draw 100 values each for lambda 100 and 500:
>>> s = np.random.poisson(lam=(100., 500.), size=(100, 2))
-
dask.array.random.power(a, size=None)
Draws samples in [0, 1] from a power distribution with positive exponent a - 1.
Also known as the power function distribution.
Parameters: - a (float) – Parameter of the distribution. Must be > 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The returned samples lie in [0, 1].
Return type: ndarray or scalar
Raises: ValueError – If a <= 0.
Notes
The probability density function is
\[P(x; a) = ax^{a-1}, 0 \le x \le 1, a>0.\]The power function distribution is just the inverse of the Pareto distribution. It may also be seen as a special case of the Beta distribution.
It is used, for example, in modeling the over-reporting of insurance claims.
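Since the density a x**(a-1) on [0, 1] has CDF F(x) = x**a, the distribution can also be sampled by inverse-transform sampling: if U is uniform on [0, 1), then U**(1/a) follows this power distribution. The sketch below compares that construction with NumPy's Generator.power, used in place of dask.array.random.power, via the mean a/(a + 1) and the median 0.5**(1/a). The helper name and settings are illustrative.

```python
import numpy as np


def power_inverse_transform(a, size, rng):
    """Inverse-transform sampling with F(x) = x**a on [0, 1]:
    if U ~ Uniform[0, 1), then U**(1/a) has density a * x**(a - 1)."""
    u = rng.random(size)
    return u ** (1.0 / a)


rng = np.random.default_rng(seed=0)
a, n = 5.0, 200_000
manual = power_inverse_transform(a, n, rng)
direct = rng.power(a, n)

theory_mean = a / (a + 1)  # E[X] = a / (a + 1)
print(manual.mean(), direct.mean(), theory_mean)
```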
References
[1] Christian Kleiber, Samuel Kotz, “Statistical size distributions in economics and actuarial sciences”, Wiley, 2003.
[2] Heckert, N. A. and Filliben, James J. “NIST Handbook 148: Dataplot Reference Manual, Volume 2: Let Subcommands and Library Functions”, National Institute of Standards and Technology Handbook Series, June 2003. http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/powpdf.pdf
Examples
Draw samples from the distribution:
>>> a = 5.  # shape
>>> samples = 1000
>>> s = np.random.power(a, samples)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, bins=30)
>>> x = np.linspace(0, 1, 100)
>>> y = a*x**(a-1.)
>>> normed_y = samples*np.diff(bins)[0]*y
>>> plt.plot(x, normed_y)
>>> plt.show()
Compare the power function distribution to the inverse of the Pareto.
>>> from scipy import stats
>>> rvs = np.random.power(5, 1000000)
>>> rvsp = np.random.pareto(5, 1000000)
>>> xx = np.linspace(0, 1, 100)
>>> powpdf = stats.powerlaw.pdf(xx, 5)
>>> plt.figure()
>>> plt.hist(rvs, bins=50, density=True)
>>> plt.plot(xx, powpdf, 'r-')
>>> plt.title('np.random.power(5)')
>>> plt.figure()
>>> plt.hist(1./(1.+rvsp), bins=50, density=True)
>>> plt.plot(xx, powpdf, 'r-')
>>> plt.title('inverse of 1 + np.random.pareto(5)')
>>> plt.figure()
>>> plt.hist(1./stats.pareto.rvs(5, size=1000000), bins=50, density=True)
>>> plt.plot(xx, powpdf, 'r-')
>>> plt.title('inverse of stats.pareto(5)')
-
dask.array.random.random(self, size=None, chunks=None)
random_sample(size=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the stated interval. To sample \(Unif[a, b), b > a\), multiply the output of random_sample by (b-a) and add a:
(b - a) * random_sample() + a
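The scaling rule above can be wrapped in a small helper. This is a sketch under assumptions: uniform_ab is a hypothetical name, and NumPy's Generator.random (which also draws from [0.0, 1.0)) stands in for random_sample. As a caveat, floating-point rounding in (b - a) * u + a can, in rare cases, produce exactly b.

```python
import numpy as np


def uniform_ab(a, b, size, rng):
    """Hypothetical helper: map [0, 1) floats to [a, b) via (b - a) * u + a."""
    if not b > a:
        raise ValueError("need b > a")
    # rng.random draws from [0.0, 1.0); rare floating-point rounding in
    # (b - a) * u + a can still produce exactly b.
    return (b - a) * rng.random(size) + a


rng = np.random.default_rng(seed=0)
x = uniform_ab(-5.0, 0.0, (3, 2), rng)  # same interval as the [-5, 0) example
y = uniform_ab(2.0, 7.0, 100_000, rng)  # mean should be near (2 + 7) / 2
print(x)
print(y.mean())
```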
Parameters: size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Array of random floats of shape size (unless size=None, in which case a single float is returned).
Return type: float or ndarray of floats
Examples
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
-
dask.array.random.random_sample(size=None)
Return random floats in the half-open interval [0.0, 1.0).
Results are from the “continuous uniform” distribution over the stated interval. To sample \(Unif[a, b), b > a\), multiply the output of random_sample by (b-a) and add a:
(b - a) * random_sample() + a
Parameters: size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Array of random floats of shape size (unless size=None, in which case a single float is returned).
Return type: float or ndarray of floats
Examples
>>> np.random.random_sample()
0.47108547995356098
>>> type(np.random.random_sample())
<class 'float'>
>>> np.random.random_sample((5,))
array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428])
Three-by-two array of random numbers from [-5, 0):
>>> 5 * np.random.random_sample((3, 2)) - 5
array([[-3.99149989, -0.52338984],
       [-2.99091858, -0.79479508],
       [-1.23204345, -1.75224494]])
-
dask.array.random.rayleigh(scale=1.0, size=None)
Draw samples from a Rayleigh distribution.
The \(\chi\) and Weibull distributions are generalizations of the Rayleigh.
Parameters: - scale (scalar) – Scale, also equals the mode. Should be >= 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Notes
The probability density function for the Rayleigh distribution is
\[P(x;scale) = \frac{x}{scale^2}e^{\frac{-x^2}{2 \cdot scale^2}}\]The Rayleigh distribution would arise, for example, if the East and North components of the wind velocity were independent and had identical zero-mean Gaussian distributions. Then the wind speed would have a Rayleigh distribution.
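The wind-speed construction can be simulated directly. This sketch assumes the East and North components are independent zero-mean Gaussians whose standard deviation equals the Rayleigh scale, and uses NumPy's Generator API in place of dask.array.random. The speed should match direct Rayleigh draws, with mean scale * sqrt(pi/2) and median scale * sqrt(2 ln 2).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
scale, n = 3.0, 200_000

# Independent zero-mean Gaussian East and North components with std = scale.
east = rng.normal(0.0, scale, n)
north = rng.normal(0.0, scale, n)
speed = np.hypot(east, north)  # sqrt(east**2 + north**2)

direct = rng.rayleigh(scale, n)

theory_mean = scale * np.sqrt(np.pi / 2)        # Rayleigh mean
theory_median = scale * np.sqrt(2 * np.log(2))  # Rayleigh median
print(speed.mean(), direct.mean(), theory_mean)
```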
References
[1] Brighton Webs Ltd., “Rayleigh Distribution,” http://www.brighton-webs.co.uk/distributions/rayleigh.asp
[2] Wikipedia, “Rayleigh distribution” http://en.wikipedia.org/wiki/Rayleigh_distribution
Examples
Draw values from the distribution and plot the histogram
>>> import matplotlib.pyplot as plt
>>> values = plt.hist(np.random.rayleigh(3, 100000), bins=200, density=True)
Wave heights tend to follow a Rayleigh distribution. If the mean wave height is 1 meter, what fraction of waves are likely to be larger than 3 meters?
>>> meanvalue = 1
>>> modevalue = np.sqrt(2 / np.pi) * meanvalue
>>> s = np.random.rayleigh(modevalue, 1000000)
The percentage of waves larger than 3 meters is:
>> 100.*sum(s>3)/1000000. 0.087300000000000003
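Integrating the density in the Notes gives the survival function P(X > x) = exp(-x²/(2·scale²)). That lets the simulated percentage be cross-checked in closed form. Below is a minimal standard-library sketch that assumes the same 1 meter mean height and 3 meter threshold:

```python
import math

mean_height = 1.0                                 # meters
scale = math.sqrt(2 / math.pi) * mean_height      # mode, as in the example
threshold = 3.0                                   # meters

# Rayleigh survival function: P(X > x) = exp(-x**2 / (2 * scale**2))
fraction = math.exp(-threshold**2 / (2 * scale**2))
percentage = 100.0 * fraction
print(f"{percentage:.4f} %")
```

The result is about 0.085%, which agrees with the simulated value up to sampling noise.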
dask.array.random.standard_cauchy(size=None)
Draw samples from a standard Cauchy distribution with mode = 0.
Also known as the Lorentz distribution.
Parameters: size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The drawn samples.
Return type: ndarray or scalar
Notes
The probability density function for the full Cauchy distribution is
\[P(x; x_0, \gamma) = \frac{1}{\pi \gamma \bigl[ 1+ (\frac{x-x_0}{\gamma})^2 \bigr] }\]
and the standard Cauchy distribution just sets \(x_0=0\) and \(\gamma=1\).
The Cauchy distribution arises in the solution to the driven harmonic oscillator problem, and also describes spectral line broadening. It also describes the distribution of values at which a line tilted at a random angle will cut the x axis.
When studying hypothesis tests that assume normality, seeing how the tests perform on data from a Cauchy distribution is a good indicator of their sensitivity to a heavy-tailed distribution, since the Cauchy looks very much like a Gaussian distribution, but with heavier tails.
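The statement about a randomly tilted line can be simulated directly. Assume the line pivots about the point (0, 1), and that its angle θ from the vertical is drawn uniformly from [-π/2, π/2). It then meets the x axis at x = tan θ, which is standard Cauchy. Half of a standard Cauchy's mass lies in [-1, 1], so the median of |x| should be close to 1, even though the mean does not exist. The sketch below uses NumPy only, with an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Angle from the vertical, uniform on [-pi/2, pi/2)
theta = rng.uniform(-np.pi / 2, np.pi / 2, n)
# A line through (0, 1) at angle theta from the vertical crosses y = 0 at x = tan(theta)
crossings = np.tan(theta)

# For a standard Cauchy, P(|X| <= 1) = 1/2, so the median of |X| is 1
print(np.median(np.abs(crossings)))

direct = rng.standard_cauchy(n)
print(np.median(np.abs(direct)))
```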
References
[1] NIST/SEMATECH e-Handbook of Statistical Methods, “Cauchy Distribution”, http://www.itl.nist.gov/div898/handbook/eda/section3/eda3663.htm
[2] Weisstein, Eric W. “Cauchy Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/CauchyDistribution.html
[3] Wikipedia, “Cauchy distribution” http://en.wikipedia.org/wiki/Cauchy_distribution
Examples
Draw samples and plot the distribution:
>>> import matplotlib.pyplot as plt
>>> s = np.random.standard_cauchy(1000000)
>>> s = s[(s > -25) & (s < 25)]  # truncate distribution so it plots well
>>> plt.hist(s, bins=100)
>>> plt.show()
dask.array.random.standard_exponential(size=None)
Draw samples from the standard exponential distribution.
standard_exponential is identical to the exponential distribution with a scale parameter of 1.
Parameters: size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Drawn samples.
Return type: float or ndarray
Examples
Output a 3x8000 array:
>>> n = np.random.standard_exponential((3, 8000))
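Inverse-transform sampling gives one way to see the link to the unit-scale exponential: if U is uniform on (0, 1], then -ln(U) has the standard exponential distribution. The NumPy sketch below, which uses NumPy rather than dask, checks that both the inverse-transform draws and `standard_exponential` have mean and variance close to 1. It is only an illustration; neither library is claimed to generate its samples this way.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# rng.random() is uniform on [0, 1); 1 - U is uniform on (0, 1], so the log stays finite
u = 1.0 - rng.random(n)
inverse_transform = -np.log(u)

library = rng.standard_exponential(n)

for name, x in [("inverse transform", inverse_transform), ("standard_exponential", library)]:
    # Exponential with scale 1: mean = 1 and variance = 1
    print(name, x.mean(), x.var())
```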
dask.array.random.standard_gamma(shape, size=None)
Draw samples from a standard Gamma distribution.
Samples are drawn from a Gamma distribution with specified parameters, shape (sometimes designated “k”) and scale=1.
Parameters: - shape (float) – Parameter, should be > 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The drawn samples.
Return type: ndarray or scalar
See also
scipy.stats.distributions.gamma() – probability density function, distribution or cumulative distribution function, etc.
Notes
The probability density for the Gamma distribution is
\[p(x) = x^{k-1}\frac{e^{-x/\theta}}{\theta^k\Gamma(k)},\]where \(k\) is the shape and \(\theta\) the scale, and \(\Gamma\) is the Gamma function.
The Gamma distribution is often used to model the times to failure of electronic components, and arises naturally in processes for which the waiting times between Poisson distributed events are relevant.
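The link to Poisson waiting times can be made concrete. In a unit-rate Poisson process, the waiting time until the k-th event is the sum of k independent standard exponential gaps, and that sum follows a Gamma distribution with shape k and scale 1. The NumPy sketch below takes k = 2 as an arbitrary illustrative choice and compares that sum with `standard_gamma` draws. Both should have mean and variance near k.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 2          # integer shape, chosen for illustration
n = 200_000

# Waiting time to the k-th event = sum of k exponential inter-arrival gaps
waiting = rng.standard_exponential((n, k)).sum(axis=1)
gamma_draws = rng.standard_gamma(k, n)

# Gamma(shape=k, scale=1) has mean k and variance k
print(waiting.mean(), waiting.var())
print(gamma_draws.mean(), gamma_draws.var())
```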
References
[1] Weisstein, Eric W. “Gamma Distribution.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/GammaDistribution.html
[2] Wikipedia, “Gamma-distribution”, http://en.wikipedia.org/wiki/Gamma-distribution
Examples
Draw samples from the distribution:
>>> shape, scale = 2., 1.  # shape (k) and scale (theta)
>>> s = np.random.standard_gamma(shape, 1000000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> import scipy.special as sps
>>> count, bins, ignored = plt.hist(s, 50, density=True)
>>> y = bins**(shape-1) * ((np.exp(-bins/scale))/
...                      (sps.gamma(shape) * scale**shape))
>>> plt.plot(bins, y, linewidth=2, color='r')
>>> plt.show()
dask.array.random.standard_normal(size=None)
Draw samples from a standard Normal distribution (mean=0, stdev=1).
Parameters: size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Drawn samples.
Return type: float or ndarray
Examples
>>> s = np.random.standard_normal(8000)
>>> s
array([ 0.6888893 ,  0.78096262, -0.89086505, ...,  0.49876311,  # random
       -0.38672696, -0.4685006 ])                                # random
>>> s.shape
(8000,)
>>> s = np.random.standard_normal(size=(3, 4, 2))
>>> s.shape
(3, 4, 2)
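To get normal samples with a different mean and standard deviation, shift and scale standard normal draws as mu + sigma * Z. In the sketch below, mu = 10 and sigma = 2 are illustrative values, and NumPy is used in place of dask.array.random:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 10.0, 2.0            # illustrative target mean and standard deviation

z = rng.standard_normal(size=(3, 4, 2))
x = mu + sigma * z               # now N(mu, sigma**2), same shape as z
print(x.shape)

big = mu + sigma * rng.standard_normal(200_000)
print(big.mean(), big.std())
```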
dask.array.random.standard_t(df, size=None)
Draw samples from a standard Student’s t distribution with df degrees of freedom.
A special case of the hyperbolic distribution. As df gets large, the result resembles that of the standard normal distribution (standard_normal).
Parameters: - df (int) – Degrees of freedom, should be > 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn samples.
Return type: ndarray or scalar
Notes
The probability density function for the t distribution is
\[P(x, df) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{\pi df} \Gamma(\frac{df}{2})}\Bigl( 1+\frac{x^2}{df} \Bigr)^{-(df+1)/2}\]The t test is based on an assumption that the data come from a Normal distribution. The t test provides a way to test whether the sample mean (that is the mean calculated from the data) is a good estimate of the true mean.
The derivation of the t-distribution was first published in 1908 by William Gosset while he was working for the Guinness Brewery in Dublin. For proprietary reasons he had to publish under a pseudonym, and so he used the name Student.
References
[1] Dalgaard, Peter, “Introductory Statistics With R”, Springer, 2002.
[2] Wikipedia, “Student’s t-distribution” http://en.wikipedia.org/wiki/Student’s_t-distribution
Examples
From Dalgaard page 83 [1], suppose the daily energy intake for 11 women in kJ is:
>>> intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805, 7515,
...                    7515, 8230, 8770])
Does their energy intake deviate systematically from the recommended value of 7725 kJ?
With 11 observations there are 10 degrees of freedom. Is the sample mean significantly different from the recommended value at the 95% level?
>>> s = np.random.standard_t(10, size=100000)
>>> np.mean(intake)
6753.636363636364
>>> intake.std(ddof=1)
1142.1232221373727
Calculate the t statistic, setting the ddof parameter to the unbiased value so that the divisor in the standard deviation is the degrees of freedom, N-1.
>>> t = (np.mean(intake)-7725)/(intake.std(ddof=1)/np.sqrt(len(intake)))
>>> import matplotlib.pyplot as plt
>>> h = plt.hist(s, bins=100, density=True)
For a one-sided t-test, how far out in the distribution does the t statistic appear?
>>> np.sum(s<t) / float(len(s))
0.0090699999999999999  # random
So the p-value is about 0.009. If the null hypothesis of no systematic deviation were true, a t statistic this far into the lower tail would occur in only about 0.9% of samples. The null hypothesis is therefore rejected at the 5% level, and also at the 1% level.
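The calculation can also be done without plotting. The NumPy sketch below recomputes the t statistic from the intake data above, then estimates the one-sided p-value as the fraction of simulated t(10) draws that fall below it. The seed and the number of draws are arbitrary choices.

```python
import numpy as np

intake = np.array([5260., 5470, 5640, 6180, 6390, 6515, 6805, 7515,
                   7515, 8230, 8770])
recommended = 7725.0

n = len(intake)
# Standard error uses the unbiased (ddof=1) standard deviation
se = intake.std(ddof=1) / np.sqrt(n)
t = (intake.mean() - recommended) / se
print(t)                                   # roughly -2.82

rng = np.random.default_rng(5)
s = rng.standard_t(n - 1, size=200_000)    # 10 degrees of freedom
p_one_sided = np.mean(s < t)
print(p_one_sided)                         # close to 0.009
```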
dask.array.random.triangular(left, mode, right, size=None)
Draw samples from the triangular distribution.
The triangular distribution is a continuous probability distribution with lower limit left, peak at mode, and upper limit right. Unlike the other distributions, these parameters directly define the shape of the pdf.
Parameters: - left (scalar) – Lower limit.
- mode (scalar) – The value where the peak of the distribution occurs. The value should fulfill the condition left <= mode <= right.
- right (scalar) – Upper limit, should be larger than left.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The returned samples all lie in the interval [left, right].
Return type: ndarray or scalar
Notes
The probability density function for the triangular distribution is
\[\begin{split}P(x;l, m, r) = \begin{cases} \frac{2(x-l)}{(r-l)(m-l)}& \text{for $l \leq x \leq m$},\\ \frac{2(r-x)}{(r-l)(r-m)}& \text{for $m \leq x \leq r$},\\ 0& \text{otherwise}. \end{cases}\end{split}\]
The triangular distribution is often used in ill-defined problems where the underlying distribution is not known, but some knowledge of the limits and mode exists. Often it is used in simulations.
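The piecewise density can be transcribed directly. In the NumPy sketch below, `triangular_pdf` is a hypothetical helper and not part of dask. It evaluates the density for left = -3, mode = 0, right = 8, the parameters used in the example below. It then checks that the peak height at the mode is 2/(right - left) and that a midpoint-rule integral over [left, right] is close to 1. The sketch assumes left < mode < right, so that neither denominator is zero.

```python
import numpy as np

def triangular_pdf(x, left, mode, right):
    """Piecewise-linear triangular density (hypothetical helper; assumes left < mode < right)."""
    x = np.asarray(x, dtype=float)
    rising = 2 * (x - left) / ((right - left) * (mode - left))
    falling = 2 * (right - x) / ((right - left) * (right - mode))
    return np.where((x >= left) & (x <= mode), rising,
                    np.where((x > mode) & (x <= right), falling, 0.0))

left, mode, right = -3.0, 0.0, 8.0
peak = triangular_pdf(mode, left, mode, right)
print(peak, 2 / (right - left))

# Midpoint rule over [left, right]
edges = np.linspace(left, right, 100_001)
mid = (edges[:-1] + edges[1:]) / 2
area = np.sum(triangular_pdf(mid, left, mode, right) * np.diff(edges))
print(area)
```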
References
[1] Wikipedia, “Triangular distribution” http://en.wikipedia.org/wiki/Triangular_distribution
Examples
Draw values from the distribution and plot the histogram:
>>> import matplotlib.pyplot as plt
>>> h = plt.hist(np.random.triangular(-3, 0, 8, 100000), bins=200,
...              density=True)
>>> plt.show()
dask.array.random.uniform(low=0.0, high=1.0, size=None)
Draw samples from a uniform distribution.
Samples are uniformly distributed over the half-open interval [low, high) (includes low, but excludes high). In other words, any value within the given interval is equally likely to be drawn by uniform.
Parameters: - low (float, optional) – Lower boundary of the output interval. All values generated will be greater than or equal to low. The default value is 0.
- high (float, optional) – Upper boundary of the output interval. All values generated will be less than high. The default value is 1.0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: out – Drawn samples, with shape size.
Return type: ndarray
See also
randint() – Discrete uniform distribution, yielding integers.
random_integers() – Discrete uniform distribution over the closed interval [low, high].
random_sample() – Floats uniformly distributed over [0, 1).
random() – Alias for random_sample.
rand() – Convenience function that accepts dimensions as input, e.g., rand(2,2) would generate a 2-by-2 array of floats, uniformly distributed over [0, 1).
Notes
The probability density function of the uniform distribution is
\[p(x) = \frac{1}{b - a}\]
anywhere within the interval [a, b), and zero elsewhere.
Examples
Draw samples from the distribution:
>>> s = np.random.uniform(-1, 0, 1000)
All values are within the given interval:
>>> np.all(s >= -1)
True
>>> np.all(s < 0)
True
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> count, bins, ignored = plt.hist(s, 15, density=True)
>>> plt.plot(bins, np.ones_like(bins), linewidth=2, color='r')
>>> plt.show()
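To get draws on an arbitrary interval [low, high), rescale draws on [0, 1) as low + (high - low) * U. This is the same affine map used in the random_sample example earlier. Below is a NumPy sketch using the interval from this example:

```python
import numpy as np

rng = np.random.default_rng(6)
low, high = -1.0, 0.0

u = rng.random(1000)                 # floats in [0, 1)
s = low + (high - low) * u           # affine map onto [low, high)

print(np.all(s >= low), np.all(s < high))
print(s.mean())                      # near (low + high) / 2
```

Floating-point rounding can occasionally make low + (high - low) * U equal high exactly for some intervals. Code that relies on the strict upper bound should therefore check for this case.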
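dask.array.random.vonmises(mu, kappa, size=None)
Draw samples from a von Mises distribution.
Samples are drawn from a von Mises distribution with specified mode (mu) and dispersion (kappa), on the interval [-pi, pi].
The von Mises distribution (also known as the circular normal distribution) is a continuous probability distribution on the unit circle. It may be thought of as the circular analogue of the normal distribution.
Parameters: - mu (float) – Mode (“center”) of the distribution.
- kappa (float) – Dispersion of the distribution, has to be >= 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – The returned samples, which are in the interval [-pi, pi].
Return type: scalar or ndarray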
See also
scipy.stats.distributions.vonmises() – probability density function, distribution, or cumulative distribution function, etc.
Notes
The probability density for the von Mises distribution is
\[p(x) = \frac{e^{\kappa \cos(x-\mu)}}{2\pi I_0(\kappa)},\]where \(\mu\) is the mode and \(\kappa\) the dispersion, and \(I_0(\kappa)\) is the modified Bessel function of order 0.
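The density can be evaluated with `numpy.i0`, NumPy's modified Bessel function of order 0, as a quick check that it integrates to 1 over [-π, π). The values mu = 0 and kappa = 4 match the example below. The helper name `vonmises_pdf` is hypothetical.

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    """von Mises density on [-pi, pi) (hypothetical helper for illustration)."""
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

mu, kappa = 0.0, 4.0
edges = np.linspace(-np.pi, np.pi, 20_001)
mid = (edges[:-1] + edges[1:]) / 2
area = np.sum(vonmises_pdf(mid, mu, kappa) * np.diff(edges))
print(area)                          # close to 1

# The density is largest at the mode mu
peak_is_max = vonmises_pdf(mu, mu, kappa) >= vonmises_pdf(mid, mu, kappa).max()
print(peak_is_max)
```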
The von Mises distribution is named for Richard Edler von Mises, who was born in Austria-Hungary, in what is now Ukraine. He fled to the United States in 1939 and became a professor at Harvard. He worked in probability theory, aerodynamics, fluid mechanics, and philosophy of science.
References
[1] Abramowitz, M. and Stegun, I. A. (Eds.). “Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th printing,” New York: Dover, 1972.
[2] von Mises, R., “Mathematical Theory of Probability and Statistics”, New York: Academic Press, 1964.
Examples
Draw samples from the distribution:
>>> mu, kappa = 0.0, 4.0  # mean and dispersion
>>> s = np.random.vonmises(mu, kappa, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> from scipy.special import i0
>>> plt.hist(s, 50, density=True)
>>> x = np.linspace(-np.pi, np.pi, num=51)
>>> y = np.exp(kappa*np.cos(x-mu))/(2*np.pi*i0(kappa))
>>> plt.plot(x, y, linewidth=2, color='r')
>>> plt.show()
dask.array.random.wald(mean, scale, size=None)
Draw samples from a Wald, or inverse Gaussian, distribution.
As the scale approaches infinity, the distribution becomes more like a Gaussian. Some references claim that the Wald is an inverse Gaussian with mean equal to 1, but this is by no means universal.
The inverse Gaussian distribution was first studied in relationship to Brownian motion. In 1956 M.C.K. Tweedie used the name inverse Gaussian because there is an inverse relationship between the time to cover a unit distance and distance covered in unit time.
Parameters: - mean (scalar) – Distribution mean, should be > 0.
- scale (scalar) – Scale parameter, should be >= 0.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples – Drawn sample, all greater than zero.
Return type: ndarray or scalar
Notes
The probability density function for the Wald distribution is
\[P(x;mean,scale) = \sqrt{\frac{scale}{2\pi x^3}}e^{\frac{-scale(x-mean)^2}{2\cdotp mean^2x}}\]
As noted above, the inverse Gaussian distribution first arose from attempts to model Brownian motion. It is also a competitor to the Weibull for use in reliability modeling and in modeling stock returns and interest rate processes.
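A transcription of the density into NumPy gives a quick sanity check of the formula. A midpoint-rule integral over a long positive range should be close to 1, and the first moment should be close to mean. The parameters mean = 3 and scale = 2 match the example below. The helper `wald_pdf` and the integration grid are illustrative choices.

```python
import numpy as np

def wald_pdf(x, mean, scale):
    """Inverse Gaussian (Wald) density for x > 0 (hypothetical helper)."""
    return np.sqrt(scale / (2 * np.pi * x**3)) * np.exp(
        -scale * (x - mean) ** 2 / (2 * mean**2 * x))

mean, scale = 3.0, 2.0
edges = np.linspace(0.0, 300.0, 600_001)
mid = (edges[:-1] + edges[1:]) / 2          # midpoints avoid x = 0
dx = np.diff(edges)
density = wald_pdf(mid, mean, scale)

area = np.sum(density * dx)
first_moment = np.sum(mid * density * dx)
print(area)                                 # close to 1
print(first_moment)                         # close to mean
```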
References
[1] Brighton Webs Ltd., Wald Distribution, http://www.brighton-webs.co.uk/distributions/wald.asp
[2] Chhikara, Raj S., and Folks, J. Leroy, “The Inverse Gaussian Distribution: Theory, Methodology, and Applications”, CRC Press, 1988.
[3] Wikipedia, “Wald distribution” http://en.wikipedia.org/wiki/Wald_distribution
Examples
Draw values from the distribution and plot the histogram:
>>> import matplotlib.pyplot as plt
>>> h = plt.hist(np.random.wald(3, 2, 100000), bins=200, density=True)
>>> plt.show()
dask.array.random.weibull(a, size=None)
Draw samples from a Weibull distribution.
Draw samples from a 1-parameter Weibull distribution with the given shape parameter a.
\[X = (-\ln(U))^{1/a}\]
Here, U is drawn from the uniform distribution over (0,1].
The more common 2-parameter Weibull, including a scale parameter \(\lambda\), is just \(X = \lambda(-\ln(U))^{1/a}\).
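The generating formula can be applied directly. The NumPy sketch below draws U from (0, 1] and forms X = (-ln U)^(1/a) for an illustrative shape a = 5. The CDF is 1 - exp(-x^a), so the median is (ln 2)^(1/a), and the sketch compares the sample median with it and with `Generator.weibull`. Multiplying by a scale λ (here an arbitrary λ = 2) gives the 2-parameter form.

```python
import numpy as np

rng = np.random.default_rng(7)
a = 5.0            # shape, chosen for illustration
n = 200_000

u = 1.0 - rng.random(n)          # uniform on (0, 1]
x = (-np.log(u)) ** (1.0 / a)    # 1-parameter Weibull via the formula above

# CDF is 1 - exp(-x**a), so the median solves x**a = ln 2
analytic_median = np.log(2.0) ** (1.0 / a)
library = rng.weibull(a, n)
print(np.median(x), np.median(library), analytic_median)

lam = 2.0                        # illustrative scale for the 2-parameter form
x_scaled = lam * x
print(np.median(x_scaled), lam * analytic_median)
```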
Parameters: - a (float) – Shape of the distribution.
- size (int or tuple of ints, optional) – Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
Returns: samples
Return type: ndarray
See also
scipy.stats.distributions.weibull_max(), scipy.stats.distributions.weibull_min(), scipy.stats.distributions.genextreme(), gumbel()
Notes
The Weibull (or Type III asymptotic extreme value distribution for smallest values, SEV Type III, or Rosin-Rammler distribution) is one of a class of Generalized Extreme Value (GEV) distributions used in modeling extreme value problems. This class includes the Gumbel and Frechet distributions.
The probability density for the Weibull distribution is
\[p(x) = \frac{a} {\lambda}(\frac{x}{\lambda})^{a-1}e^{-(x/\lambda)^a},\]where \(a\) is the shape and \(\lambda\) the scale.
The function has its peak (the mode) at \(\lambda(\frac{a-1}{a})^{1/a}\).
When a = 1, the Weibull distribution reduces to the exponential distribution.
References
[1] Waloddi Weibull, Royal Technical University, Stockholm, 1939 “A Statistical Theory Of The Strength Of Materials”, Ingeniorsvetenskapsakademiens Handlingar Nr 151, 1939, Generalstabens Litografiska Anstalts Forlag, Stockholm.
[2] Waloddi Weibull, “A Statistical Distribution Function of Wide Applicability”, Journal Of Applied Mechanics ASME Paper 1951.
[3] Wikipedia, “Weibull distribution”, http://en.wikipedia.org/wiki/Weibull_distribution
Examples
Draw samples from the distribution:
>>> a = 5.  # shape
>>> s = np.random.weibull(a, 1000)
Display the histogram of the samples, along with the probability density function:
>>> import matplotlib.pyplot as plt
>>> x = np.arange(1, 100.) / 50.
>>> def weib(x, n, a):
...     return (a / n) * (x / n)**(a - 1) * np.exp(-(x / n)**a)
...
>>> count, bins, ignored = plt.hist(np.random.weibull(5., 1000))
>>> scale = count.max() / weib(x, 1., 5.).max()
>>> plt.plot(x, weib(x, 1., 5.) * scale)
>>> plt.show()
dask.array.random.zipf(a, size=None)
Standard distributions
dask.array.stats.ttest_ind(a, b, axis=0, equal_var=True)
Calculates the T-test for the means of two independent samples of scores.
This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. This test assumes that the populations have identical variances by default.
Parameters: - a, b (array_like) – The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
- axis (int or None, optional) – Axis along which to compute test. If None, compute over the whole arrays, a, and b.
- equal_var (bool, optional) – If True (default), perform a standard independent 2 sample test that assumes equal population variances [1]. If False, perform Welch’s t-test, which does not assume equal population variance [2].
New in version 0.11.0.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float or array) – The calculated t-statistic.
- pvalue (float or array) – The two-tailed p-value.
Notes
We can use this test if we observe two independent samples from the same or different populations, e.g. exam scores of boys and girls or of two ethnic groups. The test measures whether the average (expected) value differs significantly across samples. If we observe a large p-value, for example larger than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages.
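The default test and Welch's test differ only in the standard error. The NumPy sketch below implements both; the helper names are hypothetical, and the sketch computes only the statistic, not the p-value. It shows that the two statistics coincide when the samples have equal size, as in the first examples below, and differ otherwise.

```python
import numpy as np

def pooled_t(a, b):
    """Student's t with pooled variance (the equal_var=True form)."""
    n1, n2 = len(a), len(b)
    v1, v2 = np.var(a, ddof=1), np.var(b, ddof=1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_t(a, b):
    """Welch's t, which does not pool the variances (the equal_var=False form)."""
    n1, n2 = len(a), len(b)
    v1, v2 = np.var(a, ddof=1), np.var(b, ddof=1)
    return (np.mean(a) - np.mean(b)) / np.sqrt(v1 / n1 + v2 / n2)

rng = np.random.default_rng(8)
a = rng.normal(5, 10, 500)
b = rng.normal(5, 20, 500)     # same size, different spread
c = rng.normal(5, 20, 100)     # different size

print(pooled_t(a, b), welch_t(a, b))   # equal when n1 == n2
print(pooled_t(a, c), welch_t(a, c))   # generally different
```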
References
[1] http://en.wikipedia.org/wiki/T-test#Independent_two-sample_t-test
[2] http://en.wikipedia.org/wiki/Welch%27s_t_test
Examples
>>> from scipy import stats
>>> np.random.seed(12345678)
Test with samples with identical means:
>>> rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
>>> rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
>>> stats.ttest_ind(rvs1, rvs2)
(0.26833823296239279, 0.78849443369564776)
>>> stats.ttest_ind(rvs1, rvs2, equal_var=False)
(0.26833823296239279, 0.78849452749500748)
ttest_ind underestimates p for unequal variances:
>>> rvs3 = stats.norm.rvs(loc=5, scale=20, size=500)
>>> stats.ttest_ind(rvs1, rvs3)
(-0.46580283298287162, 0.64145827413436174)
>>> stats.ttest_ind(rvs1, rvs3, equal_var=False)
(-0.46580283298287162, 0.64149646246569292)
When n1 != n2, the equal variance t-statistic is no longer equal to the unequal variance t-statistic:
>>> rvs4 = stats.norm.rvs(loc=5, scale=20, size=100)
>>> stats.ttest_ind(rvs1, rvs4)
(-0.99882539442782481, 0.3182832709103896)
>>> stats.ttest_ind(rvs1, rvs4, equal_var=False)
(-0.69712570584654099, 0.48716927725402048)
T-test with different means, variance, and n:
>>> rvs5 = stats.norm.rvs(loc=8, scale=20, size=100)
>>> stats.ttest_ind(rvs1, rvs5)
(-1.4679669854490653, 0.14263895620529152)
>>> stats.ttest_ind(rvs1, rvs5, equal_var=False)
(-0.94365973617132992, 0.34744170334794122)
dask.array.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate')
Calculates the T-test for the mean of ONE group of scores.
This is a two-sided test for the null hypothesis that the expected value (mean) of a sample of independent observations a is equal to the given population mean, popmean.
Parameters: - a (array_like) – sample observation
- popmean (float or array_like) – expected value in null hypothesis; if array_like, then it must have the same shape as a excluding the axis dimension
- axis (int or None, optional) – Axis along which to compute test. If None, compute over the whole array a.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float or array) – t-statistic
- pvalue (float or array) – two-tailed p-value
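The statistic is (sample mean - popmean) divided by the standard error s/√n, where s uses ddof=1. The NumPy sketch below, built around a hypothetical helper, vectorizes this along an axis. It broadcasts popmean in the same way as the examples below, with one population mean per column.

```python
import numpy as np

def one_sample_t(a, popmean, axis=0):
    """t statistic for H0: mean == popmean along `axis` (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    mean = a.mean(axis=axis)
    se = a.std(axis=axis, ddof=1) / np.sqrt(n)
    return (mean - np.asarray(popmean, dtype=float)) / se

rng = np.random.default_rng(9)
rvs = rng.normal(loc=5, scale=10, size=(50, 2))

# One popmean per column, as in ttest_1samp(rvs, [5.0, 0.0])
by_column = one_sample_t(rvs, [5.0, 0.0])
# Transposed data tested along axis=1 gives the same values
by_row = one_sample_t(rvs.T, [5.0, 0.0], axis=1)
print(by_column, by_row)
```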
Examples
>>> from scipy import stats
>>> np.random.seed(7654567)  # fix seed to get the same result
>>> rvs = stats.norm.rvs(loc=5, scale=10, size=(50,2))
Test whether the mean of the random sample equals the true mean (5.0), and whether it equals a different mean (0.0). We don’t reject the null hypothesis in the first case, but we reject it in the second.
>>> stats.ttest_1samp(rvs, 5.0)
(array([-0.68014479, -0.04323899]), array([ 0.49961383, 0.96568674]))
>>> stats.ttest_1samp(rvs, 0.0)
(array([ 2.77025808, 4.11038784]), array([ 0.00789095, 0.00014999]))
Examples using axis and non-scalar dimension for population mean.
>>> stats.ttest_1samp(rvs, [5.0, 0.0])
(array([-0.68014479, 4.11038784]), array([ 4.99613833e-01, 1.49986458e-04]))
>>> stats.ttest_1samp(rvs.T, [5.0, 0.0], axis=1)
(array([-0.68014479, 4.11038784]), array([ 4.99613833e-01, 1.49986458e-04]))
>>> stats.ttest_1samp(rvs, [[5.0], [0.0]])
(array([[-0.68014479, -0.04323899],
        [ 2.77025808, 4.11038784]]), array([[ 4.99613833e-01, 9.65686743e-01],
        [ 7.89094663e-03, 1.49986458e-04]]))
dask.array.stats.ttest_rel(a, b, axis=0, nan_policy='propagate')
Calculates the T-test on TWO RELATED samples of scores, a and b.
This is a two-sided test for the null hypothesis that 2 related or repeated samples have identical average (expected) values.
Parameters: - a, b (array_like) – The arrays must have the same shape.
- axis (int or None, optional) – Axis along which to compute test. If None, compute over the whole arrays, a, and b.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float or array) – t-statistic
- pvalue (float or array) – two-tailed p-value
Notes
Examples for the use are scores of the same set of students in different exams, or repeated sampling from the same units. The test measures whether the average score differs significantly across samples (e.g. exams). If we observe a large p-value, for example greater than 0.05 or 0.1, then we cannot reject the null hypothesis of identical average scores. If the p-value is smaller than the threshold, e.g. 1%, 5% or 10%, then we reject the null hypothesis of equal averages. Small p-values are associated with large t-statistics.
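A paired test is equivalent to a one-sample t test of the differences a - b against zero. The small NumPy sketch below makes that reduction explicit. The scores for five students are made up, and `paired_t` is a hypothetical helper.

```python
import numpy as np

def paired_t(a, b):
    """Paired t statistic: one-sample t of the differences against 0."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

exam1 = [5, 7, 8, 6, 9]    # made-up scores for the same five students
exam2 = [4, 6, 8, 5, 7]

t = paired_t(exam1, exam2)
print(t)                   # differences [1, 1, 0, 1, 2] give sqrt(10), about 3.1623
```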
References
http://en.wikipedia.org/wiki/T-test#Dependent_t-test
Examples
>>> from scipy import stats
>>> np.random.seed(12345678)  # fix random seed to get same numbers
>>> rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
>>> rvs2 = (stats.norm.rvs(loc=5, scale=10, size=500) +
...         stats.norm.rvs(scale=0.2, size=500))
>>> stats.ttest_rel(rvs1, rvs2)
(0.24101764965300962, 0.80964043445811562)
>>> rvs3 = (stats.norm.rvs(loc=8, scale=10, size=500) +
...         stats.norm.rvs(scale=0.2, size=500))
>>> stats.ttest_rel(rvs1, rvs3)
(-3.9995108708727933, 7.3082402191726459e-005)
dask.array.stats.chisquare(f_obs, f_exp=None, ddof=0, axis=0)
Calculates a one-way chi square test.
The chi square test tests the null hypothesis that the categorical data has the given frequencies.
Parameters: - f_obs (array_like) – Observed frequencies in each category.
- f_exp (array_like, optional) – Expected frequencies in each category. By default the categories are assumed to be equally likely.
- ddof (int, optional) – “Delta degrees of freedom”: adjustment to the degrees of freedom for the p-value. The p-value is computed using a chi-squared distribution with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. The default value of ddof is 0.
- axis (int or None, optional) – The axis of the broadcast result of f_obs and f_exp along which to apply the test. If axis is None, all values in f_obs are treated as a single data set. Default is 0.
Returns: - chisq (float or ndarray) – The chi-squared test statistic. The value is a float if axis is None or f_obs and f_exp are 1-D.
- p (float or ndarray) – The p-value of the test. The value is a float if ddof and the return value chisq are scalars.
See also
power_divergence(), mstats.chisquare()
Notes
This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5.
The default degrees of freedom, k-1, are for the case when no parameters of the distribution are estimated. If p parameters are estimated by efficient maximum likelihood then the correct degrees of freedom are k-1-p. If the parameters are estimated in a different way, then the dof can be between k-1-p and k-1. However, it is also possible that the asymptotic distribution is not a chisquare, in which case this test is not appropriate.
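The statistic is the sum over categories of (observed - expected)²/expected. The sketch below uses NumPy and the standard library to reproduce the values 2.0 and 3.5 from the examples. For the p-value it uses the closed-form chi-squared survival function exp(-x/2)·Σ_{j<df/2} (x/2)^j/j!, which holds only for even degrees of freedom. With ddof=1 there are 6 - 1 - 1 = 4 degrees of freedom, and the result matches the 0.7358 shown in the ddof example. The helper names are hypothetical.

```python
import math
import numpy as np

def chisquare_statistic(f_obs, f_exp=None):
    """Pearson chi-squared statistic (hypothetical helper for illustration)."""
    f_obs = np.asarray(f_obs, dtype=float)
    if f_exp is None:
        # Uniform expected counts equal to the mean of the observed counts
        f_exp = np.full_like(f_obs, f_obs.mean())
    else:
        f_exp = np.asarray(f_exp, dtype=float)
    return float(np.sum((f_obs - f_exp) ** 2 / f_exp))

def chi2_sf_even_df(x, df):
    """P(chi2_df > x); this closed form is valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

obs = [16, 18, 16, 14, 12, 12]
stat = chisquare_statistic(obs)                               # approximately 2.0
stat_exp = chisquare_statistic(obs, [16, 16, 16, 16, 16, 8])  # approximately 3.5

k, ddof = len(obs), 1
p_ddof1 = chi2_sf_even_df(stat, k - 1 - ddof)                 # df = 4
print(stat, stat_exp, p_ddof1)
```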
References
[1] Lowry, Richard. “Concepts and Applications of Inferential Statistics”. Chapter 8. http://faculty.vassar.edu/lowry/ch8pt1.html
[2] “Chi-squared test”, http://en.wikipedia.org/wiki/Chi-squared_test
Examples
When just f_obs is given, it is assumed that the expected frequencies are uniform and given by the mean of the observed frequencies.
>>> from scipy.stats import chisquare
>>> chisquare([16, 18, 16, 14, 12, 12])
(2.0, 0.84914503608460956)
With f_exp the expected frequencies can be given.
>>> chisquare([16, 18, 16, 14, 12, 12], f_exp=[16, 16, 16, 16, 16, 8])
(3.5, 0.62338762774958223)
When f_obs is 2-D, by default the test is applied to each column.
>>> obs = np.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T
>>> obs.shape
(6, 2)
>>> chisquare(obs)
(array([ 2. , 6.66666667]), array([ 0.84914504, 0.24663415]))
By setting axis=None, the test is applied to all data in the array, which is equivalent to applying the test to the flattened array.
>>> chisquare(obs, axis=None)
(23.31034482758621, 0.015975692534127565)
>>> chisquare(obs.ravel())
(23.31034482758621, 0.015975692534127565)
ddof is the change to make to the default degrees of freedom.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=1)
(2.0, 0.73575888234288467)
The calculation of the p-values is done by broadcasting the chi-squared statistic with ddof.
>>> chisquare([16, 18, 16, 14, 12, 12], ddof=[0,1,2])
(2.0, array([ 0.84914504, 0.73575888, 0.5724067 ]))
f_obs and f_exp are also broadcast. In the following, f_obs has shape (6,) and f_exp has shape (2, 6), so the result of broadcasting f_obs and f_exp has shape (2, 6). To compute the desired chi-squared statistics, we use axis=1:
>>> chisquare([16, 18, 16, 14, 12, 12],
...           f_exp=[[16, 16, 16, 16, 16, 8], [8, 20, 20, 16, 12, 12]],
...           axis=1)
(array([ 3.5 , 9.25]), array([ 0.62338763, 0.09949846]))
dask.array.stats.power_divergence(f_obs, f_exp=None, ddof=0, axis=0, lambda_=None)
Cressie-Read power divergence statistic and goodness of fit test.
This function tests the null hypothesis that the categorical data has the given frequencies, using the Cressie-Read power divergence statistic.
Parameters: - f_obs (array_like) – Observed frequencies in each category.
- f_exp (array_like, optional) – Expected frequencies in each category. By default the categories are assumed to be equally likely.
- ddof (int, optional) – “Delta degrees of freedom”: adjustment to the degrees of freedom for the p-value. The p-value is computed using a chi-squared distribution with k - 1 - ddof degrees of freedom, where k is the number of observed frequencies. The default value of ddof is 0.
- axis (int or None, optional) – The axis of the broadcast result of f_obs and f_exp along which to apply the test. If axis is None, all values in f_obs are treated as a single data set. Default is 0.
- lambda_ (float or str, optional) –
lambda_ gives the power in the Cressie-Read power divergence statistic. The default is 1. For convenience, lambda_ may be assigned one of the following strings, in which case the corresponding numerical value is used:
String                 Value   Description
"pearson"              1       Pearson's chi-squared statistic. In this case, the function is equivalent to stats.chisquare.
"log-likelihood"       0       Log-likelihood ratio. Also known as the G-test [3].
"freeman-tukey"        -1/2    Freeman-Tukey statistic.
"mod-log-likelihood"   -1      Modified log-likelihood ratio.
"neyman"               -2      Neyman's statistic.
"cressie-read"         2/3     The power recommended in [5].
Returns: - statistic (float or ndarray) – The Cressie-Read power divergence test statistic. The value is a float if axis is None or if f_obs and f_exp are 1-D.
- pvalue (float or ndarray) – The p-value of the test. The value is a float if ddof and the return value stat are scalars.
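For reference, the Cressie-Read statistic is 2/(λ(λ+1)) · Σ f_obs·[(f_obs/f_exp)^λ - 1]. Two limits need special handling. As λ → 0 it becomes 2·Σ f_obs·ln(f_obs/f_exp), the G-test. As λ → -1, when observed and expected totals are equal, it becomes 2·Σ f_exp·ln(f_exp/f_obs). The NumPy sketch below is a hypothetical helper, not the dask implementation, and it does not handle zero counts. It reproduces the Pearson value 2.0 and the log-likelihood values from the examples below.

```python
import numpy as np

def cressie_read(f_obs, f_exp=None, lambda_=1.0):
    """Cressie-Read power divergence statistic (illustrative helper, no p-value).

    Zero counts are not handled: they produce nan or inf in the log terms.
    """
    f_obs = np.asarray(f_obs, dtype=float)
    if f_exp is None:
        f_exp = np.full_like(f_obs, f_obs.mean())   # uniform expected counts
    else:
        f_exp = np.asarray(f_exp, dtype=float)
    if lambda_ == 0:      # log-likelihood ratio (G-test) limit
        return 2.0 * np.sum(f_obs * np.log(f_obs / f_exp))
    if lambda_ == -1:     # modified log-likelihood limit
        return 2.0 * np.sum(f_exp * np.log(f_exp / f_obs))
    terms = f_obs * ((f_obs / f_exp) ** lambda_ - 1.0)
    return 2.0 / (lambda_ * (lambda_ + 1.0)) * np.sum(terms)

obs = [16, 18, 16, 14, 12, 12]
pearson = cressie_read(obs, lambda_=1.0)                         # approximately 2.0
g = cressie_read(obs, lambda_=0.0)                               # approximately 2.00657
g_exp = cressie_read(obs, [16, 16, 16, 16, 16, 8], lambda_=0.0)  # approximately 3.32810
print(pearson, g, g_exp)
```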
Notes
This test is invalid when the observed or expected frequencies in each category are too small. A typical rule is that all of the observed and expected frequencies should be at least 5.
When lambda_ is less than zero, the formula for the statistic involves dividing by f_obs, so a warning or error may be generated if any value in f_obs is 0.
Similarly, a warning or error may be generated if any value in f_exp is zero when lambda_ >= 0.
The default degrees of freedom, k-1, are for the case when no parameters of the distribution are estimated. If p parameters are estimated by efficient maximum likelihood then the correct degrees of freedom are k-1-p. If the parameters are estimated in a different way, then the dof can be between k-1-p and k-1. However, it is also possible that the asymptotic distribution is not a chisquare, in which case this test is not appropriate.
This function handles masked arrays. If an element of f_obs or f_exp is masked, then data at that position is ignored, and does not count towards the size of the data set.
New in version 0.13.0.
References
[1] Lowry, Richard. “Concepts and Applications of Inferential Statistics”. Chapter 8. http://faculty.vassar.edu/lowry/ch8pt1.html [2] “Chi-squared test”, http://en.wikipedia.org/wiki/Chi-squared_test [3] “G-test”, http://en.wikipedia.org/wiki/G-test [4] Sokal, R. R. and Rohlf, F. J. “Biometry: the principles and practice of statistics in biological research”, New York: Freeman (1981) [5] Cressie, N. and Read, T. R. C., “Multinomial Goodness-of-Fit Tests”, J. Royal Stat. Soc. Series B, Vol. 46, No. 3 (1984), pp. 440-464. Examples
(See chisquare for more examples.)
When just f_obs is given, it is assumed that the expected frequencies are uniform and given by the mean of the observed frequencies. Here we perform a G-test (i.e. use the log-likelihood ratio statistic):
>> from scipy.stats import power_divergence >> power_divergence([16, 18, 16, 14, 12, 12], lambda_=’log-likelihood’) (2.006573162632538, 0.84823476779463769)
The expected frequencies can be given with the f_exp argument:
>> power_divergence([16, 18, 16, 14, 12, 12], .. f_exp=[16, 16, 16, 16, 16, 8], .. lambda_=’log-likelihood’) (3.3281031458963746, 0.6495419288047497)
When f_obs is 2-D, by default the test is applied to each column.
>> obs = np.array([[16, 18, 16, 14, 12, 12], [32, 24, 16, 28, 20, 24]]).T >> obs.shape (6, 2) >> power_divergence(obs, lambda_=”log-likelihood”) (array([ 2.00657316, 6.77634498]), array([ 0.84823477, 0.23781225]))
By setting
axis=None
, the test is applied to all data in the array, which is equivalent to applying the test to the flattened array.>> power_divergence(obs, axis=None) (23.31034482758621, 0.015975692534127565) >> power_divergence(obs.ravel()) (23.31034482758621, 0.015975692534127565)
ddof is the change to make to the default degrees of freedom.
>>> power_divergence([16, 18, 16, 14, 12, 12], ddof=1)
(2.0, 0.73575888234288467)
The calculation of the p-values is done by broadcasting the test statistic with ddof.
>>> power_divergence([16, 18, 16, 14, 12, 12], ddof=[0, 1, 2])
(2.0, array([ 0.84914504, 0.73575888, 0.5724067 ]))
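Broadcasting the statistic against ddof amounts to evaluating the chi-squared survival function at k - 1 - ddof degrees of freedom for each ddof, with k = 6 categories here. The sketch below (pearson_statistic and chi2_sf are hypothetical helpers) uses the closed-form survival function that exists only for integer degrees of freedom; it is an illustration, not how SciPy evaluates the chi-squared distribution.

```python
import math


def pearson_statistic(f_obs):
    """Pearson chi-squared statistic against uniform expected frequencies."""
    mean = sum(f_obs) / len(f_obs)
    return sum((o - mean) ** 2 / mean for o in f_obs)


def chi2_sf(x, df):
    """P(X > x) for a chi-squared variable with integer df >= 1 (closed forms)."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)**i / i!
        term, total = 1.0, 0.0
        for i in range(df // 2):
            if i > 0:
                term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df=1 tail plus one extra term per additional pair of df.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return total


counts = [16, 18, 16, 14, 12, 12]
k = len(counts)
stat = pearson_statistic(counts)
# One p-value per ddof, all sharing the same statistic.
pvalues = [chi2_sf(stat, k - 1 - ddof) for ddof in (0, 1, 2)]
```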
f_obs and f_exp are also broadcast. In the following, f_obs has shape (6,) and f_exp has shape (2, 6), so the result of broadcasting f_obs and f_exp has shape (2, 6). To compute the desired chi-squared statistics, we must use axis=1:
>>> power_divergence([16, 18, 16, 14, 12, 12],
...                  f_exp=[[16, 16, 16, 16, 16, 8],
...                         [8, 20, 20, 16, 12, 12]],
...                  axis=1)
(array([ 3.5 , 9.25]), array([ 0.62338763, 0.09949846]))
-
dask.array.stats.skew(a, axis=0, bias=True, nan_policy='propagate')
Computes the skewness of a data set.
For normally distributed data, the skewness should be about 0. A skewness value > 0 means that there is more weight in the right tail of the distribution. The function skewtest can be used to determine if the skewness value is close enough to 0, statistically speaking.
Parameters: - a (ndarray) – data
- axis (int or None, optional) – Axis along which skewness is calculated. Default is 0. If None, compute over the whole array a.
- bias (bool, optional) – If False, then the calculations are corrected for statistical bias.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: skewness – The skewness of values along an axis, returning 0 where all values are equal.
Return type: ndarray
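The skewness reported here is the moment ratio g1 = m3 / m2^(3/2) built from biased central moments. A plain-Python sketch on a list (skewness is a hypothetical name; dask evaluates the same moments blockwise on arrays) shows the sign convention and the bias=False adjustment, which this sketch assumes to be the adjusted Fisher-Pearson factor sqrt(n(n-1))/(n-2).

```python
import math


def skewness(data, bias=True):
    """Sample skewness g1 = m3 / m2**1.5 from biased central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    if m2 == 0:
        # All values equal: report 0, as described under Returns.
        return 0.0
    g1 = m3 / m2 ** 1.5
    if not bias and n > 2:
        # Assumed bias correction: adjusted Fisher-Pearson factor.
        g1 *= math.sqrt(n * (n - 1)) / (n - 2)
    return g1


right_skewed = skewness([1, 2, 3, 4, 10])   # long tail toward large values
symmetric = skewness([1, 2, 3, 4, 5])
constant = skewness([7, 7, 7])
```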
References
[1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000. Section 2.2.24.1
-
dask.array.stats.skewtest(a, axis=0, nan_policy='propagate')
Tests whether the skew is different from the normal distribution.
This function tests the null hypothesis that the skewness of the population that the sample was drawn from is the same as that of a corresponding normal distribution.
Parameters: - a (array) – The data to be tested
- axis (int or None, optional) – Axis along which statistics are calculated. Default is 0. If None, compute over the whole array a.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float) – The computed z-score for this test.
- pvalue (float) – a 2-sided p-value for the hypothesis test
Notes
The sample size must be at least 8.
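For reference, D'Agostino's transformation, as it is usually stated, maps the sample skewness to an approximately standard normal z-score. The sketch below (skewtest_z and two_sided_p are hypothetical names) follows that published recipe and is assumed, not verified, to agree with scipy.stats.skewtest away from edge cases such as constant data. It also illustrates the size limit: at n = 7 the constants give W² = 1, so ln W = 0 and δ is undefined.

```python
import math


def skewtest_z(data):
    """z-score for the sample skewness via D'Agostino's transformation (sketch)."""
    n = len(data)
    if n < 8:
        raise ValueError("skewtest needs at least 8 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    b1 = m3 / m2 ** 1.5 if m2 > 0 else 0.0   # biased sample skewness
    y = b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2.0) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1.0 + math.sqrt(2.0 * (beta2 - 1.0))
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    # delta * ln(u + sqrt(u**2 + 1)) written as delta * asinh(u)
    return delta * math.asinh(y / alpha)


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_sym = skewtest_z([1, 2, 3, 4, 5, 6, 7, 8])
z_skewed = skewtest_z([1, 1, 1, 2, 2, 3, 5, 20])
```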
-
dask.array.stats.kurtosis(a, axis=0, fisher=True, bias=True, nan_policy='propagate')
Computes the kurtosis (Fisher or Pearson) of a dataset.
Kurtosis is the fourth central moment divided by the square of the variance. If Fisher’s definition is used, then 3.0 is subtracted from the result to give 0.0 for a normal distribution.
If bias is False then the kurtosis is calculated using k statistics to eliminate bias coming from biased moment estimators.
Use kurtosistest to see if result is close enough to normal.
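Both definitions come from the same ratio b2 = m4 / m2², computed with biased central moments; Fisher's definition simply subtracts 3. The sketch below is a plain-Python illustration with a hypothetical function name; it covers only the bias=True case (the k-statistic correction is omitted) and reports the constant-data values listed under Returns.

```python
def kurtosis(data, fisher=True):
    """Biased sample kurtosis m4 / m2**2; Fisher's definition subtracts 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    # Constant data: Pearson kurtosis 0, hence -3 under Fisher's definition.
    pearson = m4 / m2 ** 2 if m2 > 0 else 0.0
    return pearson - 3.0 if fisher else pearson


pearson_k = kurtosis([1, 2, 3, 4, 10], fisher=False)
fisher_k = kurtosis([1, 2, 3, 4, 10])
```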
Parameters: - a (array) – data for which the kurtosis is calculated
- axis (int or None, optional) – Axis along which the kurtosis is calculated. Default is 0. If None, compute over the whole array a.
- fisher (bool, optional) – If True, Fisher’s definition is used (normal ==> 0.0). If False, Pearson’s definition is used (normal ==> 3.0).
- bias (bool, optional) – If False, then the calculations are corrected for statistical bias.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: kurtosis – The kurtosis of values along an axis. If all values are equal, return -3 for Fisher’s definition and 0 for Pearson’s definition.
Return type: array
References
[1] Zwillinger, D. and Kokoska, S. (2000). CRC Standard Probability and Statistics Tables and Formulae. Chapman & Hall: New York. 2000.
-
dask.array.stats.kurtosistest(a, axis=0, nan_policy='propagate')
Tests whether a dataset has normal kurtosis
This function tests the null hypothesis that the kurtosis of the population from which the sample was drawn is that of the normal distribution:
\[\mathrm{kurtosis} = \frac{3(n-1)}{n+1}\]
Parameters: - a (array) – array of the sample data
- axis (int or None, optional) – Axis along which to compute test. Default is 0. If None, compute over the whole array a.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float) – The computed z-score for this test.
- pvalue (float) – The 2-sided p-value for the hypothesis test
Notes
Valid only for n > 20. The z-score is set to 0 for bad entries. This function uses the method described in [1].
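The value 3(n-1)/(n+1) is the expected value of Pearson's sample kurtosis b2 for a normal sample of size n. Anscombe and Glynn [1] standardize b2 with that mean and the variance 24n(n-2)(n-3)/((n+1)²(n+3)(n+5)), then apply a cube-root transformation to obtain the z-score. The sketch below (hypothetical helper names) stops at the standardized value, so it is not the statistic kurtosistest returns; it only shows the first step of the method.

```python
import math


def b2_null_moments(n):
    """Mean and variance of Pearson's sample kurtosis b2 for normal samples of size n."""
    mean = 3.0 * (n - 1) / (n + 1)
    var = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    return mean, var


def standardized_b2(data):
    """(b2 - E[b2]) / sqrt(Var[b2]); the cube-root transformation step is omitted."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    b2 = m4 / m2 ** 2
    e_b2, var_b2 = b2_null_moments(n)
    return (b2 - e_b2) / math.sqrt(var_b2)


light_tailed = standardized_b2(list(range(1, 22)))   # evenly spread values
heavy_tailed = standardized_b2([0] * 20 + [10])       # a single outlier
```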
References
[1] see e.g. F. J. Anscombe, W. J. Glynn, “Distribution of the kurtosis statistic b2 for normal samples”, Biometrika, vol. 70, pp. 227-234, 1983.
-
dask.array.stats.normaltest(a, axis=0, nan_policy='propagate')
Tests whether a sample differs from a normal distribution.
This function tests the null hypothesis that a sample comes from a normal distribution. It is based on D’Agostino and Pearson’s [1], [2] test that combines skew and kurtosis to produce an omnibus test of normality.
Parameters: - a (array_like) – The array containing the data to be tested.
- axis (int or None, optional) – Axis along which to compute test. Default is 0. If None, compute over the whole array a.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: - statistic (float or array) – \(s^2 + k^2\), where \(s\) is the z-score returned by skewtest and \(k\) is the z-score returned by kurtosistest.
- pvalue (float or array) – A 2-sided chi squared probability for the hypothesis test.
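Since the two z-scores are combined as a sum of squares, the p-value comes from a chi-squared distribution with 2 degrees of freedom, whose survival function is simply exp(-x/2). A minimal sketch, where omnibus is a hypothetical helper and the z-scores are made-up stand-ins for skewtest and kurtosistest results:

```python
import math


def omnibus(s, k):
    """K^2 = s^2 + k^2 and its p-value from a chi-squared distribution with 2 df."""
    k2 = s * s + k * k
    # With 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    return k2, math.exp(-k2 / 2.0)


# Made-up z-scores standing in for skewtest and kurtosistest results.
k2, p = omnibus(1.5, -2.0)
```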
References
[1] D’Agostino, R. B. (1971), “An omnibus test of normality for moderate and large sample size”, Biometrika, 58, 341-348 [2] D’Agostino, R. and Pearson, E. S. (1973), “Tests for departure from normality”, Biometrika, 60, 613-622
-
dask.array.stats.f_oneway(*args)
Performs a 1-way ANOVA.
The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
Parameters: sample1, sample2, ... – The sample measurements for each group.
Returns: - statistic (float) – The computed F-value of the test.
- pvalue (float) – The associated p-value from the F-distribution.
Notes
The ANOVA test has important assumptions that must be satisfied in order for the associated p-value to be valid.
- The samples are independent.
- Each sample is from a normally distributed population.
- The population standard deviations of the groups are all equal. This property is known as homoscedasticity.
If these assumptions are not true for a given set of data, it may still be possible to use the Kruskal-Wallis H-test (scipy.stats.kruskal) although with some loss of power.
The algorithm is from Heiman [2], pp. 394-7.
References
[1] Lowry, Richard. “Concepts and Applications of Inferential Statistics”. Chapter 14. http://faculty.vassar.edu/lowry/ch14pt1.html
[2] Heiman, G.W. Research Methods in Statistics. 2002.
[3] McDonald, G. H. “Handbook of Biological Statistics”, One-way ANOVA. http://www.biostathandbook.com/onewayanova.html
Examples
>>> import scipy.stats as stats
Here are some data [3] on a shell measurement (the length of the anterior adductor muscle scar, standardized by dividing by length) in the mussel Mytilus trossulus from five locations: Tillamook, Oregon; Newport, Oregon; Petersburg, Alaska; Magadan, Russia; and Tvarminne, Finland, taken from a much larger data set used in McDonald et al. (1991).
>>> tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735,
...              0.0659, 0.0923, 0.0836]
>>> newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835,
...            0.0725]
>>> petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
>>> magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764,
...            0.0689]
>>> tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]
>>> stats.f_oneway(tillamook, newport, petersburg, magadan, tvarminne)
(7.1210194716424473, 0.00028122423145345439)
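The F-value is the ratio of the between-group mean square to the within-group mean square. The plain-Python sketch below (one_way_f is a hypothetical name) computes that ratio for the mussel data above and is expected to match the statistic printed by stats.f_oneway; the p-value needs the F distribution's survival function and is left out.

```python
def one_way_f(*groups):
    """F statistic and (between, within) degrees of freedom for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735,
             0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]
f, df_b, df_w = one_way_f(tillamook, newport, petersburg, magadan, tvarminne)
```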
-
dask.array.stats.moment(a, moment=1, axis=0, nan_policy='propagate')
Calculates the nth moment about the mean for a sample.
A moment is a specific quantitative measure of the shape of a set of points. It is often used to calculate coefficients of skewness and kurtosis due to its close relationship with them.
Parameters: - a (array_like) – data
- moment (int or array_like of ints, optional) – order of central moment that is returned. Default is 1.
- axis (int or None, optional) – Axis along which the central moment is computed. Default is 0. If None, compute over the whole array a.
- nan_policy ({'propagate', 'raise', 'omit'}, optional) – Defines how to handle when input contains nan. ‘propagate’ returns nan, ‘raise’ throws an error, ‘omit’ performs the calculations ignoring nan values. Default is ‘propagate’.
Returns: n-th central moment – The appropriate moment along the given axis or over all values if axis is None. The denominator for the moment calculation is the number of observations, no degrees of freedom correction is done.
Return type: ndarray or float
See also
kurtosis(), skew(), describe()
Notes
The k-th central moment of a data sample is:
\[m_k = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^k\]
where n is the number of samples and \(\bar{x}\) is the mean. This function uses exponentiation by squares [1] for efficiency.
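Exponentiation by squaring computes x^k with about log2(k) multiplications: it repeatedly squares the base and multiplies it into the result for each set bit of k. The following pure-Python sketch applies that idea to the m_k formula above; the helper names are hypothetical, and it makes no claim about how dask implements moment internally.

```python
def power_by_squaring(base, exponent):
    """base**exponent for a non-negative integer exponent via repeated squaring."""
    result = 1.0
    while exponent > 0:
        if exponent & 1:          # current lowest bit set: multiply it in
            result *= base
        base *= base              # square for the next bit
        exponent >>= 1
    return result


def central_moment(data, k=1):
    """m_k = (1/n) * sum((x_i - mean)**k), with no degrees-of-freedom correction."""
    n = len(data)
    mean = sum(data) / n
    return sum(power_by_squaring(x - mean, k) for x in data) / n
```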
References
[1] http://eli.thegreenplace.net/2009/03/21/efficient-integer-exponentiation-algorithms
-
dask.array.image.imread(filename, imread=None, preprocess=None)
Read a stack of images into a dask array
Parameters: - filename (string) – A globstring like ‘myfile.*.png’
- imread (function (optional)) – Optionally provide custom imread function. Function should expect a filename and produce a numpy array. Defaults to skimage.io.imread.
- preprocess (function (optional)) – Optionally provide custom function to preprocess the image. Function should expect a numpy array for a single image.
Returns: Dask array of all images stacked along the first dimension. All images will be treated as individual chunks.
Examples
>>> from dask.array.image import imread
>>> im = imread('2015-*-*.png')
>>> im.shape
(365, 1000, 1000, 3)
-
dask.array.core.map_blocks(func, *args, **kwargs)
Map a function across all blocks of a dask array.
Parameters: - func (callable) – Function to apply to every block in the array.
- args (dask arrays or constants) –
- dtype (np.dtype, optional) – The dtype of the output array. It is recommended to provide this. If not provided, it will be inferred by applying the function to a small set of fake data.
- chunks (tuple, optional) – Chunk shape of resulting blocks if the function does not preserve shape. If not provided, the resulting array is assumed to have the same block structure as the first input array.
- drop_axis (number or iterable, optional) – Dimensions lost by the function.
- new_axis (number or iterable, optional) – New dimensions created by the function. Note that these are applied after drop_axis (if present).
- token (string, optional) – The key prefix to use for the output array. If not provided, will be determined from the function name.
- name (string, optional) – The key name to use for the output array. Note that this fully specifies the output key name, and must be unique. If not provided, will be determined by a hash of the arguments.
- **kwargs – Other keyword arguments to pass to function. Values must be constants (not dask.arrays)
Examples
>>> import dask.array as da
>>> x = da.arange(6, chunks=3)
>>> x.map_blocks(lambda x: x * 2).compute()
array([ 0, 2, 4, 6, 8, 10])
The da.map_blocks function can also accept multiple arrays.
>>> d = da.arange(5, chunks=2)
>>> e = da.arange(5, chunks=2)
>>> f = da.map_blocks(lambda a, b: a + b**2, d, e)
>>> f.compute()
array([ 0, 2, 6, 12, 20])
If the function changes shape of the blocks then you must provide chunks explicitly.
>>> y = x.map_blocks(lambda x: x[::2], chunks=((2, 2),))
You have a bit of freedom in specifying chunks. If all of the output chunk sizes are the same, you can provide just that chunk size as a single tuple.
>>> a = da.arange(18, chunks=(6,))
>>> b = a.map_blocks(lambda x: x[:3], chunks=(3,))
If the function changes the dimension of the blocks you must specify the created or destroyed dimensions.
>>> b = a.map_blocks(lambda x: x[None, :, None], chunks=(1, 6, 1),
...                  new_axis=[0, 2])
map_blocks aligns blocks by block positions without regard to shape. In the following example we have two arrays with the same number of blocks but with different shapes and chunk sizes.
>>> x = da.arange(1000, chunks=(100,))
>>> y = da.arange(100, chunks=(10,))
The relevant attribute to match is numblocks.
>>> x.numblocks
(10,)
>>> y.numblocks
(10,)
If these match (up to broadcasting rules), then we can map arbitrary functions across blocks:
>>> import numpy as np
>>> def func(a, b):
...     return np.array([a.max(), b.max()])
>>> da.map_blocks(func, x, y, chunks=(2,), dtype='i8')
dask.array<func, shape=(20,), dtype=int64, chunksize=(2,)>
>>> _.compute()
array([ 99, 9, 199, 19, 299, 29, 399, 39, 499, 49, 599, 59, 699, 69, 799, 79, 899, 89, 999, 99])
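The pairing by block position can be mimicked in plain Python: split each input into its blocks and call the function on the i-th block of every input, regardless of block length. The helpers below are hypothetical, ignore broadcasting of numblocks, and are expected to reproduce the array computed above.

```python
def split_into_blocks(values, chunk):
    """Split a list into consecutive blocks of (at most) `chunk` elements."""
    return [values[i:i + chunk] for i in range(0, len(values), chunk)]


def map_blocks_by_position(func, *block_lists):
    """Call func on the i-th block of every input, ignoring block shapes."""
    if len({len(blocks) for blocks in block_lists}) != 1:
        raise ValueError("inputs must have the same number of blocks")
    output = []
    for blocks in zip(*block_lists):
        output.extend(func(*blocks))
    return output


x_blocks = split_into_blocks(list(range(1000)), 100)   # 10 blocks of 100
y_blocks = split_into_blocks(list(range(100)), 10)     # 10 blocks of 10
result = map_blocks_by_position(lambda a, b: [max(a), max(b)], x_blocks, y_blocks)
```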
Your block function can learn where in the array it is if it supports a block_id keyword argument. This will receive entries like (2, 0, 1), the position of the block in the dask array.
>>> def func(block, block_id=None):
...     pass
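A rough plain-Python model of the block_id mechanism: the mapper passes block_id=(i,) only to functions that declare that parameter. The names below are hypothetical, the sketch is 1-D only, and checking for a parameter literally named block_id via inspect.signature is an assumption of this sketch rather than a description of dask's code.

```python
import inspect


def accepts_block_id(func):
    """Assumed rule for this sketch: func declares a parameter named block_id."""
    return "block_id" in inspect.signature(func).parameters


def map_blocks_with_ids(func, blocks):
    """Apply func to each block of a 1-D blocked list, passing block_id=(i,) when accepted."""
    pass_id = accepts_block_id(func)
    return [func(block, block_id=(i,)) if pass_id else func(block)
            for i, block in enumerate(blocks)]


def tag_with_position(block, block_id=None):
    # Replace every element by the index of the block it came from.
    return [block_id[0]] * len(block)


blocks = [[10, 11, 12], [13, 14, 15], [16, 17]]
tags = map_blocks_with_ids(tag_with_position, blocks)
doubled = map_blocks_with_ids(lambda b: [2 * v for v in b], blocks)
```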
You may specify the key name prefix of the resulting task in the graph with the optional token keyword argument.
>>> x.map_blocks(lambda x: x + 1, token='increment')
dask.array<increment, shape=(1000,), dtype=int64, chunksize=(100,)>
-
dask.array.core.atop(func, out_ind, *args, **kwargs)
Tensor operation: Generalized inner and outer products
A broad class of blocked algorithms and patterns can be specified with a concise multi-index notation. The atop function applies an in-memory function across multiple blocks of multiple inputs in a variety of ways. Many dask.array operations are special cases of atop including elementwise, broadcasting, reductions, tensordot, and transpose.
Parameters: - func (callable) – Function to apply to individual tuples of blocks
- out_ind (iterable) – Block pattern of the output, something like ‘ijk’ or (1, 2, 3)
- *args (sequence of Array, index pairs) – Sequence like (x, ‘ij’, y, ‘jk’, z, ‘i’)
- **kwargs (dict) – Extra keyword arguments to pass to function
- dtype (np.dtype) – Datatype of resulting array.
- concatenate (bool, keyword only) – If true concatenate arrays along dummy indices, else provide lists
- adjust_chunks (dict) – Dictionary mapping index to function to be applied to chunk sizes
- new_axes (dict, keyword only) – New indexes and their dimension lengths
Examples
2D embarrassingly parallel operation from two arrays, x, and y.
>>> z = atop(operator.add, 'ij', x, 'ij', y, 'ij', dtype='f8') # z = x + y
Outer product multiplying x by y, two 1-d vectors
>>> z = atop(operator.mul, 'ij', x, 'i', y, 'j', dtype='f8')
z = x.T
>>> z = atop(np.transpose, 'ji', x, 'ij', dtype=x.dtype)
The transpose case above is illustrative because it does the same transposition both on each in-memory block, by calling np.transpose, and on the order of the blocks themselves, by switching the order of the index ij -> ji.
We can compose these same patterns with more variables and more complex in-memory functions
z = X + Y.T
>>> z = atop(lambda x, y: x + y.T, 'ij', x, 'ij', y, 'ji', dtype='f8')
Any index, like i, missing from the output index is interpreted as a contraction (note that this differs from Einstein convention; repeated indices do not imply contraction). In the case of a contraction the passed function should expect an iterable of blocks on any array that holds that index. To receive arrays concatenated along contracted dimensions instead, pass concatenate=True.
Inner product multiplying x by y, two 1-d vectors
>>> def sequence_dot(x_blocks, y_blocks):
...     result = 0
...     for x, y in zip(x_blocks, y_blocks):
...         result += x.dot(y)
...     return result
>>> z = atop(sequence_dot, '', x, 'i', y, 'i', dtype='f8')
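Because i does not appear in the output index '', every block of x and y is handed to sequence_dot at once, and the per-block dot products are summed. The sketch below replays that logic on plain Python lists standing in for numpy blocks (dot is a hypothetical helper) to show that the blockwise sum equals the dot product of the unchunked vectors.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sequence_dot(x_blocks, y_blocks):
    # Same logic as the atop example: accumulate the dot product block by block.
    result = 0
    for xb, yb in zip(x_blocks, y_blocks):
        result += dot(xb, yb)
    return result


x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]
chunk = 2
x_blocks = [x[i:i + chunk] for i in range(0, len(x), chunk)]
y_blocks = [y[i:i + chunk] for i in range(0, len(y), chunk)]
blockwise = sequence_dot(x_blocks, y_blocks)
```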
Add new single-chunk dimensions with the new_axes= keyword, including the length of the new dimension. New dimensions will always be in a single chunk.
>>> def f(x):
...     return x[:, None] * np.ones((1, 5))
>>> z = atop(f, 'az', x, 'a', new_axes={'z': 5}, dtype=x.dtype)
If the applied function changes the size of each chunk you can specify this with an adjust_chunks={...} dictionary holding a function for each index that modifies the dimension size in that index.
>>> def double(x):
...     return np.concatenate([x, x])
>>> y = atop(double, 'ij', x, 'ij',
...          adjust_chunks={'i': lambda n: 2 * n}, dtype=x.dtype)
Include literals by indexing with None
>>> y = atop(operator.add, 'ij', x, 'ij', 1234, None, dtype=x.dtype)
See also
top(), contains()
-
dask.array.core.top(func, output, out_indices, *arrind_pairs, **kwargs)
Tensor operation
Applies a function, func, across blocks from many different input dasks. We arrange the pattern with which those blocks interact with sets of matching indices. E.g.:
top(func, 'z', 'i', 'x', 'i', 'y', 'i')
yields an embarrassingly parallel communication pattern and is read as
$$ z_i = func(x_i, y_i) $$
More complex patterns may emerge, including multiple indices:
top(func, 'z', 'ij', 'x', 'ij', 'y', 'ji')
$$ z_{ij} = func(x_{ij}, y_{ji}) $$
Indices missing in the output but present in the inputs result in many inputs being sent to one function (see examples).
Examples
Simple embarrassingly parallel map operation
>>> inc = lambda x: x + 1
>>> top(inc, 'z', 'ij', 'x', 'ij', numblocks={'x': (2, 2)})
{('z', 0, 0): (inc, ('x', 0, 0)),
 ('z', 0, 1): (inc, ('x', 0, 1)),
 ('z', 1, 0): (inc, ('x', 1, 0)),
 ('z', 1, 1): (inc, ('x', 1, 1))}
Simple operation on two datasets
>>> add = lambda x, y: x + y
>>> top(add, 'z', 'ij', 'x', 'ij', 'y', 'ij', numblocks={'x': (2, 2),
...                                                      'y': (2, 2)})
{('z', 0, 0): (add, ('x', 0, 0), ('y', 0, 0)),
 ('z', 0, 1): (add, ('x', 0, 1), ('y', 0, 1)),
 ('z', 1, 0): (add, ('x', 1, 0), ('y', 1, 0)),
 ('z', 1, 1): (add, ('x', 1, 1), ('y', 1, 1))}
Operation that flips one of the datasets
>>> addT = lambda x, y: x + y.T  # Transpose each chunk
>>> # z_ij ~ x_ij y_ji  (notice the swapped indices on y)
>>> top(addT, 'z', 'ij', 'x', 'ij', 'y', 'ji', numblocks={'x': (2, 2),
...                                                       'y': (2, 2)})
{('z', 0, 0): (addT, ('x', 0, 0), ('y', 0, 0)),
 ('z', 0, 1): (addT, ('x', 0, 1), ('y', 1, 0)),
 ('z', 1, 0): (addT, ('x', 1, 0), ('y', 0, 1)),
 ('z', 1, 1): (addT, ('x', 1, 1), ('y', 1, 1))}
Dot product with contraction over j index. Yields list arguments
>>> top(dotmany, 'z', 'ik', 'x', 'ij', 'y', 'jk', numblocks={'x': (2, 2),
...                                                          'y': (2, 2)})
{('z', 0, 0): (dotmany, [('x', 0, 0), ('x', 0, 1)], [('y', 0, 0), ('y', 1, 0)]),
 ('z', 0, 1): (dotmany, [('x', 0, 0), ('x', 0, 1)], [('y', 0, 1), ('y', 1, 1)]),
 ('z', 1, 0): (dotmany, [('x', 1, 0), ('x', 1, 1)], [('y', 0, 0), ('y', 1, 0)]),
 ('z', 1, 1): (dotmany, [('x', 1, 0), ('x', 1, 1)], [('y', 0, 1), ('y', 1, 1)])}
Pass concatenate=True to concatenate arrays ahead of time
>>> top(f, 'z', 'i', 'x', 'ij', 'y', 'ij', concatenate=True,
...     numblocks={'x': (2, 2), 'y': (2, 2)})
{('z', 0): (f, (concatenate_axes, [('x', 0, 0), ('x', 0, 1)], (1,)),
               (concatenate_axes, [('y', 0, 0), ('y', 0, 1)], (1,))),
 ('z', 1): (f, (concatenate_axes, [('x', 1, 0), ('x', 1, 1)], (1,)),
               (concatenate_axes, [('y', 1, 0), ('y', 1, 1)], (1,)))}
Supports broadcasting rules
>>> top(add, 'z', 'ij', 'x', 'ij', 'y', 'ij', numblocks={'x': (1, 2),
...                                                      'y': (2, 2)})
{('z', 0, 0): (add, ('x', 0, 0), ('y', 0, 0)),
 ('z', 0, 1): (add, ('x', 0, 1), ('y', 0, 1)),
 ('z', 1, 0): (add, ('x', 0, 0), ('y', 1, 0)),
 ('z', 1, 1): (add, ('x', 0, 1), ('y', 1, 1))}
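To make the index notation concrete, here is a simplified, hypothetical re-implementation (top_sketch) that only handles indices that also appear in the output: no contraction, no concatenate, no literals and no keyword arguments. It treats an input dimension with a single block as broadcast, and it is expected to rebuild the graphs shown in the map, transpose and broadcasting examples above.

```python
from itertools import product


def top_sketch(func, output, out_indices, *arrind_pairs, numblocks):
    """Build {output_key: task} for index patterns without contraction (sketch)."""
    pairs = list(zip(arrind_pairs[::2], arrind_pairs[1::2]))
    # Blocks per index letter; an input with one block along a letter broadcasts.
    dims = {}
    for name, ind in pairs:
        for letter, n in zip(ind, numblocks[name]):
            dims[letter] = max(dims.get(letter, 1), n)
    graph = {}
    for coords in product(*(range(dims[c]) for c in out_indices)):
        where = dict(zip(out_indices, coords))
        args = []
        for name, ind in pairs:
            # Reuse block 0 along any dimension where this input has a single block.
            key = tuple(where[c] if n > 1 else 0 for c, n in zip(ind, numblocks[name]))
            args.append((name,) + key)
        graph[(output,) + coords] = (func,) + tuple(args)
    return graph


def inc(x):
    return x + 1


def add(x, y):
    return x + y


simple = top_sketch(inc, 'z', 'ij', 'x', 'ij', numblocks={'x': (2, 2)})
swapped = top_sketch(add, 'z', 'ij', 'x', 'ij', 'y', 'ji',
                     numblocks={'x': (2, 2), 'y': (2, 2)})
broadcast = top_sketch(add, 'z', 'ij', 'x', 'ij', 'y', 'ij',
                       numblocks={'x': (1, 2), 'y': (2, 2)})
```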
Support keyword arguments with apply
>>> def f(a, b=0): return a + b
>>> top(f, 'z', 'i', 'x', 'i', numblocks={'x': (2,)}, b=10)
{('z', 0): (apply, f, [('x', 0)], {'b': 10}),
 ('z', 1): (apply, f, [('x', 1)], {'b': 10})}
Include literals by indexing with None
>>> top(add, 'z', 'i', 'x', 'i', 100, None, numblocks={'x': (2,)})
{('z', 0): (add, ('x', 0), 100),
 ('z', 1): (add, ('x', 1), 100)}
Array Methods
-
class dask.array.Array
Parallel Dask Array
A parallel nd-array comprised of many numpy arrays arranged in a grid.
This constructor is for advanced uses only. For normal use see the da.from_array function.
-
all(axis=None, out=None, keepdims=False)
Returns True if all elements evaluate to True.
Refer to numpy.all for full documentation.
See also
numpy.all()
- equivalent function
-
any(axis=None, out=None, keepdims=False)
Returns True if any of the elements of a evaluate to True.
Refer to numpy.any for full documentation.
See also
numpy.any()
- equivalent function
-
argmax(axis=None, out=None)
Return indices of the maximum values along the given axis.
Refer to numpy.argmax for full documentation.
See also
numpy.argmax()
- equivalent function
-
argmin(axis=None, out=None)
Return indices of the minimum values along the given axis of a.
Refer to numpy.argmin for detailed documentation.
See also
numpy.argmin()
- equivalent function
-
astype(dtype, **kwargs)
Copy of the array, cast to a specified type.
Parameters: - dtype (str or dtype) – Typecode or data-type to which the array is cast.
- casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –
Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility.
- ‘no’ means the data types should not be cast at all.
- ‘equiv’ means only byte-order changes are allowed.
- ‘safe’ means only casts which can preserve values are allowed.
- ‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
- ‘unsafe’ means any data conversions may be done.
- copy (bool, optional) – By default, astype always returns a newly allocated array. If this is set to False and the dtype requirement is satisfied, the input array is returned instead of a copy.
-
choose(choices, out=None, mode='raise')
Use an index array to construct a new array from a set of choices.
Refer to numpy.choose for full documentation.
See also
numpy.choose()
- equivalent function
-
clip(min=None, max=None, out=None)
Return an array whose values are limited to [min, max]. One of max or min must be given.
Refer to numpy.clip for full documentation.
See also
numpy.clip()
- equivalent function
-
copy()
Copy array. This is a no-op for dask.arrays, which are immutable.
-
cumprod(axis, dtype=None, out=None)
See da.cumprod for docstring
-
cumsum(axis, dtype=None, out=None)
See da.cumsum for docstring
-
dot(b, out=None)
Dot product of two arrays.
Refer to numpy.dot for full documentation.
See also
numpy.dot()
- equivalent function
Examples
>>> a = np.eye(2)
>>> b = np.ones((2, 2)) * 2
>>> a.dot(b)
array([[ 2., 2.],
       [ 2., 2.]])
This array method can be conveniently chained:
>>> a.dot(b).dot(b)
array([[ 8., 8.],
       [ 8., 8.]])
-
flatten()
a.ravel([order])
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel()
- equivalent function
ndarray.flat()
- a flat iterator on the array.
-
map_blocks(func, *args, **kwargs)
Map a function across all blocks of a dask array.
See da.map_blocks for docstring
-
map_overlap(func, depth, boundary=None, trim=True, **kwargs)
Map a function over blocks of the array with some overlap
We share neighboring zones between blocks of the array, then map a function, then trim away the neighboring strips.
Parameters: - func (function) – The function to apply to each extended block
- depth (int, tuple, or dict) – The number of cells that each block should share with its neighbors. If a tuple or dict, this can be different per axis.
- boundary (str, tuple, dict) – how to handle the boundaries. Values include ‘reflect’, ‘periodic’, ‘nearest’, ‘none’, or any constant value like 0 or np.nan
- trim (bool) – Whether or not to trim the excess after the map function. Set this to False if your mapping function does this for you.
- **kwargs – Other keyword arguments valid in map_blocks
Examples
>>> import numpy as np
>>> from dask.array import from_array
>>> x = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
>>> x = from_array(x, chunks=5)
>>> def derivative(x):
...     return x - np.roll(x, 1)
>>> y = x.map_overlap(derivative, depth=1, boundary=0)
>>> y.compute()
array([ 1, 0, 1, 1, 0, 0, -1, -1, 0])
>>> import dask.array as da
>>> x = np.arange(16).reshape((4, 4))
>>> d = da.from_array(x, chunks=(2, 2))
>>> d.map_overlap(lambda x: x + x.size, depth=1).compute()
array([[16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])
>>> func = lambda x: x + x.size
>>> depth = {0: 1, 1: 1}
>>> boundary = {0: 'reflect', 1: 'none'}
>>> d.map_overlap(func, depth, boundary).compute()
array([[12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27]])
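The share/map/trim sequence can be sketched for the 1-D case in plain Python. This hypothetical map_overlap_1d pads the whole sequence with a constant (the boundary=0 case), gives each block a halo of depth cells from its neighbors, applies the function and trims the halo again. It is expected to reproduce the derivative example above, while mapping the same function over the blocks without any overlap gives wrong values at the block edges.

```python
def derivative(block):
    # Same as x - np.roll(x, 1): the first element wraps around to the last one.
    return [block[i] - block[i - 1] for i in range(len(block))]


def map_overlap_1d(func, data, chunk, depth, boundary_value):
    """Share `depth` cells between neighboring blocks, map func, then trim."""
    padded = [boundary_value] * depth + list(data) + [boundary_value] * depth
    out = []
    for start in range(0, len(data), chunk):
        size = min(chunk, len(data) - start)
        # Block plus `depth` cells on each side, in padded coordinates.
        extended = padded[start:start + size + 2 * depth]
        result = func(extended)
        out.extend(result[depth:depth + size])   # trim the shared strips
    return out


x = [1, 1, 2, 3, 3, 3, 2, 1, 1]
with_overlap = map_overlap_1d(derivative, x, chunk=5, depth=1, boundary_value=0)
blocks = [x[i:i + 5] for i in range(0, len(x), 5)]
without_overlap = [v for b in blocks for v in derivative(b)]
```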
-
max(axis=None, out=None)
Return the maximum along a given axis.
Refer to numpy.amax for full documentation.
See also
numpy.amax()
- equivalent function
-
mean(axis=None, dtype=None, out=None, keepdims=False)
Returns the average of the array elements along given axis.
Refer to numpy.mean for full documentation.
See also
numpy.mean()
- equivalent function
-
min(axis=None, out=None, keepdims=False)
Return the minimum along a given axis.
Refer to numpy.amin for full documentation.
See also
numpy.amin()
- equivalent function
-
moment(order, axis=None, dtype=None, keepdims=False, ddof=0, split_every=None, out=None)
Calculate the nth centralized moment.
Parameters: - order (int) – Order of the moment that is returned, must be >= 2.
- axis (int, optional) – Axis along which the central moment is computed. The default is to compute the moment of the flattened array.
- dtype (data-type, optional) – Type to use in computing the moment. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
- keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the original array.
- ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
Returns: moment
Return type: ndarray
References
[1] Pebay, Philippe (2008), “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments” (PDF), Technical Report SAND2008-6212, Sandia National Laboratories
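The reference [1] concerns one-pass, parallel moment formulas: partial results from separate chunks can be merged without revisiting the data. The sketch below shows the pairwise merge for the second central moment only (count, mean and the sum of squared deviations M2); it is a plain-Python illustration with hypothetical helper names and makes no claim about the exact update dask uses for higher orders.

```python
def summarize(chunk):
    """(count, mean, M2) for one chunk, where M2 = sum((x - mean)**2)."""
    n = len(chunk)
    mean = sum(chunk) / n
    return n, mean, sum((x - mean) ** 2 for x in chunk)


def combine(a, b):
    """Merge two (count, mean, M2) summaries without revisiting the data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


data = [1.0, 2.0, 3.0, 4.0, 10.0, 6.0, 7.0]
chunks = [data[0:3], data[3:5], data[5:7]]
n, mean, m2 = summarize(chunks[0])
for chunk in chunks[1:]:
    n, mean, m2 = combine((n, mean, m2), summarize(chunk))
second_moment = m2 / n   # ddof=0, the method's default
```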
-
nonzero()
Return the indices of the elements that are non-zero.
Refer to numpy.nonzero for full documentation.
See also
numpy.nonzero()
- equivalent function
-
prod(axis=None, dtype=None, out=None, keepdims=False)
Return the product of the array elements over the given axis.
Refer to numpy.prod for full documentation.
See also
numpy.prod()
- equivalent function
-
ravel([order])
Return a flattened array.
Refer to numpy.ravel for full documentation.
See also
numpy.ravel()
- equivalent function
ndarray.flat()
- a flat iterator on the array.
-
rechunk(chunks, threshold=None, block_size_limit=None)
See da.rechunk for docstring
-
repeat(repeats, axis=None)
Repeat elements of an array.
Refer to numpy.repeat for full documentation.
See also
numpy.repeat()
- equivalent function
-
reshape(shape, order='C')
Returns an array containing the same data with a new shape.
Refer to numpy.reshape for full documentation.
See also
numpy.reshape()
- equivalent function
-
round(decimals=0, out=None)
Return a with each element rounded to the given number of decimals.
Refer to numpy.around for full documentation.
See also
numpy.around()
- equivalent function
-
squeeze(axis=None)
Remove single-dimensional entries from the shape of a.
Refer to numpy.squeeze for full documentation.
See also
numpy.squeeze()
- equivalent function
-
std(axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Returns the standard deviation of the array elements along given axis.
Refer to numpy.std for full documentation.
See also
numpy.std()
- equivalent function
-
store
(target, **kwargs)¶ Store dask arrays in array-like objects, overwriting data in target.
This stores dask arrays into objects that support numpy-style setitem indexing. It stores values chunk by chunk so that it does not have to fill up memory. For best performance, align the block size of the storage target with the block size of your array.
If your data fits in memory, you may prefer calling np.array(myarray) instead.
Parameters:
- sources (Array or iterable of Arrays) – The array or arrays to store.
- targets (array-like or iterable of array-likes) – These should support setitem syntax, e.g. target[10:20] = ...
- lock (boolean or threading.Lock, optional) – Whether or not to lock the data stores while storing. Pass True (lock each file individually), False (don’t lock) or a particular threading.Lock object to be shared among all writes.
- regions (tuple of slices or iterable of tuple of slices) – Each region tuple in regions should be such that target[region].shape = source.shape for the corresponding source and target in sources and targets, respectively.
- compute (boolean, optional) – If True, compute immediately; otherwise return a dask.delayed.Delayed object.
Examples
>>> x = ...
>>> import h5py
>>> f = h5py.File('myfile.hdf5', mode='a')
>>> dset = f.create_dataset('/data', shape=x.shape,
...                         chunks=x.chunks,
...                         dtype='f8')
>>> store(x, dset)
Alternatively, store many arrays at the same time:
>>> store([x, y, z], [dset1, dset2, dset3])
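The chunk-by-chunk strategy described above can be sketched with NumPy alone. store_blockwise is a hypothetical helper, not part of dask: it assumes a 2-D source, a fixed block size and a target that supports setitem, and it runs serially, whereas dask writes the array's actual chunks.

```python
import numpy as np


def store_blockwise(source, target, blocksize):
    """Hypothetical sketch: copy a 2-D ``source`` into ``target`` block by block.

    ``target`` only needs NumPy-style setitem (``target[region] = block``).
    Each assignment moves a single block, rather than the whole array at once.
    """
    n_rows, n_cols = source.shape
    step_r, step_c = blocksize
    for r in range(0, n_rows, step_r):
        for c in range(0, n_cols, step_c):
            region = (slice(r, min(r + step_r, n_rows)),
                      slice(c, min(c + step_c, n_cols)))
            block = np.asarray(source[region])
            # Same requirement as ``regions`` above: target[region].shape == block shape.
            if target[region].shape != block.shape:
                raise ValueError("target region and source block shapes differ")
            target[region] = block


source = np.arange(35).reshape(5, 7)
target = np.zeros((5, 7), dtype=source.dtype)
store_blockwise(source, target, (2, 3))
```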
-
sum
(axis=None, dtype=None, out=None, keepdims=False)¶ Return the sum of the array elements over the given axis.
Refer to numpy.sum for full documentation.
See also
numpy.sum()
- equivalent function
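The dtype parameter matters for small integer types. Following numpy.sum's documented rule, an integer input with less precision than the platform integer is summed in the platform integer by default, while an explicit narrow dtype accumulates in that type and can wrap around silently. Shown with NumPy:

```python
import numpy as np

small = np.array([200, 100], dtype=np.uint8)

# Default dtype: uint8 is accumulated in the platform integer, so 300 fits.
total = np.sum(small)

# Forcing dtype=np.uint8 accumulates in 8 bits: 300 wraps modulo 256 to 44.
wrapped = np.sum(small, dtype=np.uint8)
```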
-
swapaxes
(axis1, axis2)¶ Return a view of the array with axis1 and axis2 interchanged.
Refer to numpy.swapaxes for full documentation.
See also
numpy.swapaxes()
- equivalent function
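A NumPy sketch: only the two named axes trade places, and every other axis keeps its position.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
b = a.swapaxes(0, 2)     # axes 0 and 2 trade places; axis 1 stays put

# Element (i, j, k) of a is element (k, j, i) of b.
```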
-
to_dask_dataframe
(columns=None)¶ Convert dask Array to dask DataFrame.
Parameters: columns (list or string) – List of column names if DataFrame, single string if Series.
See also
dask.dataframe.from_dask_array()
-
to_delayed
()¶ Convert the Array into dask Delayed objects.
Returns an array of values, one value per chunk.
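The returned values are laid out as a NumPy object array with one entry per chunk, arranged on the block grid. The sketch below reproduces that layout with eager NumPy blocks standing in for Delayed objects; blocks_from_chunks is a hypothetical helper, not part of dask, and assumes a 2-D array.

```python
import numpy as np


def blocks_from_chunks(x, chunks):
    """Hypothetical helper: split a 2-D array into an object array of blocks.

    ``chunks`` uses dask's tuple-of-tuples layout, e.g. ((2, 2, 1), (3, 3)).
    The result has one entry per chunk, shaped like the block grid.
    """
    row_edges = np.cumsum((0,) + chunks[0])
    col_edges = np.cumsum((0,) + chunks[1])
    grid = np.empty((len(chunks[0]), len(chunks[1])), dtype=object)
    for i in range(len(chunks[0])):
        for j in range(len(chunks[1])):
            grid[i, j] = x[row_edges[i]:row_edges[i + 1],
                           col_edges[j]:col_edges[j + 1]]
    return grid


x = np.arange(30).reshape(5, 6)
grid = blocks_from_chunks(x, ((2, 2, 1), (3, 3)))   # block grid of shape (3, 2)
```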
See also
dask.array.from_delayed()
-
to_hdf5
(filename, datapath, **kwargs)¶ Store array in HDF5 file.
>>> x.to_hdf5('myfile.hdf5', '/x')
Optionally provide arguments as though to h5py.File.create_dataset:
>>> x.to_hdf5('myfile.hdf5', '/x', compression='lzf', shuffle=True)
See also
da.store(), h5py.File.create_dataset()
-
topk
(k)¶ The top k elements of an array.
See da.topk for docstring.
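Top-k is well suited to chunked arrays because the k largest values overall must be among the k largest values of each chunk, so a per-chunk top-k followed by a top-k of the survivors gives the exact answer. A NumPy sketch of that two-stage reduction; topk_1d and chunked_topk are hypothetical helpers, not dask's implementation, and this sketch returns values in descending order.

```python
import numpy as np


def topk_1d(values, k):
    """The k largest elements of a 1-D array, in descending order."""
    # np.partition places the k largest values in the last k positions
    # without fully sorting the array.
    part = np.partition(values, -k)[-k:]
    return np.sort(part)[::-1]


def chunked_topk(chunks, k):
    """Per-chunk top-k, then top-k of the concatenated survivors."""
    survivors = np.concatenate([topk_1d(c, min(k, len(c))) for c in chunks])
    return topk_1d(survivors, k)


data = np.array([5, 1, 9, 3, 7, 2, 8, 6, 4, 0])
best = chunked_topk(np.array_split(data, 3), 3)
```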
-
transpose
(*axes)¶ Returns a view of the array with axes transposed.
For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.) For a 2-D array, this is the usual matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and a.shape = (i[0], i[1], ... i[n-2], i[n-1]), then a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0]).
Parameters: axes (None, tuple of ints, or n ints) –
- None or no argument: reverses the order of the axes.
- tuple of ints: i in the j-th place in the tuple means a's i-th axis becomes a.transpose()'s j-th axis.
- n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form).
Returns: out – View of a, with axes suitably permuted.
Return type: ndarray
See also
ndarray.T
- Array property returning the array transposed.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
       [2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
       [2, 4]])
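The 2-D examples cannot distinguish a reversal from a general permutation. A 3-D NumPy sketch of the "i in the j-th place" rule: with axes (1, 2, 0), old axis 1 becomes new axis 0, old axis 2 becomes new axis 1, and old axis 0 becomes new axis 2.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

reversed_axes = a.transpose()      # no argument: axes reversed, shape (4, 3, 2)
permuted = a.transpose(1, 2, 0)    # shape (3, 4, 2)

# Element (i, j, k) of a is element (j, k, i) of permuted.
```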
-
var
(axis=None, dtype=None, out=None, ddof=0, keepdims=False)¶ Returns the variance of the array elements, along given axis.
Refer to numpy.var for full documentation.
See also
numpy.var()
- equivalent function
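A NumPy sketch of var along an axis with ddof: deviations are taken from the mean along that axis, and the summed squares are divided by N - ddof, where N is the length of the reduced axis. keepdims=True on the mean keeps a (2, 1) shape so the subtraction broadcasts across each row.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 6.0, 8.0]])

# Per-row deviations; the (2, 1) mean broadcasts against the (2, 3) array.
dev = a - a.mean(axis=1, keepdims=True)

# ddof=1: divide by N - 1 = 2 along each row.
row_var = np.sum(dev ** 2, axis=1) / (a.shape[1] - 1)
```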
-
view
(dtype, order='C')¶ Get a view of the array as a new data type.
Parameters: - dtype – The dtype by which to view the array
- order (string) – ‘C’ or ‘F’ (Fortran) ordering
This reinterprets the bytes of the array under a new dtype. If that dtype does not have the same size as the original dtype, then the shape will change.
Beware that both numpy and dask.array can behave oddly when taking shape-changing views of arrays under Fortran ordering. Under some versions of NumPy this function will fail when taking shape-changing views of Fortran ordered arrays if the first dimension has chunks of size one.
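A NumPy sketch of a shape-changing view: the number of bytes is unchanged, so a smaller item size yields more elements and a larger one fewer. For C-ordered data only the last axis absorbs the change.

```python
import numpy as np

a = np.arange(4, dtype=np.int32)     # 4 items x 4 bytes = 16 bytes

narrow = a.view(np.int16)            # same 16 bytes as 2-byte items: 8 elements
wide = a.view(np.int64)              # same 16 bytes as 8-byte items: 2 elements

grid = np.zeros((3, 4), dtype=np.int32)
grid_narrow = grid.view(np.int16)    # C order: only the last axis changes, (3, 8)
```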
-
vnorm
(ord=None, axis=None, keepdims=False, split_every=None, out=None)¶ Vector norm
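The entry does not spell out the meaning of ord. Assuming vnorm follows the usual vector-norm conventions of numpy.linalg.norm (an assumption, not stated in this entry), the common choices look like this in NumPy:

```python
import numpy as np

v = np.array([3.0, -4.0])

two_norm = np.linalg.norm(v)               # ord=None: Euclidean norm, sqrt(9 + 16)
one_norm = np.linalg.norm(v, ord=1)        # sum of absolute values
max_norm = np.linalg.norm(v, ord=np.inf)   # largest absolute value
```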
-
itemsize
¶ Length of one array element in bytes
-
nbytes
¶ Number of bytes in array
-
size
¶ Number of elements in array
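The three properties above are related by nbytes = size × itemsize. For a dask array they come from shape and dtype metadata, so no chunk has to be computed. The relationship, shown with NumPy:

```python
import numpy as np

a = np.ones((3, 4), dtype=np.float64)

n_elements = a.size        # 3 * 4 = 12
item_bytes = a.itemsize    # float64 occupies 8 bytes
total_bytes = a.nbytes     # size * itemsize = 96
```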
-
vindex
¶ Vectorized indexing with broadcasting.
This is equivalent to numpy’s advanced indexing, using arrays that are broadcast against each other. This allows for pointwise indexing:
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> x = from_array(x, chunks=2)
>>> x.vindex[[0, 1, 2], [0, 1, 2]].compute()
array([1, 5, 9])
Mixed basic/advanced indexing with slices/arrays is also supported. The order of dimensions in the result follows the one proposed for ndarray.vindex: the subspace spanned by the arrays is followed by all slices.
Note: vindex provides more general functionality than standard indexing, but it also has fewer optimizations and can be significantly slower.
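Pointwise indexing pairs the index arrays element by element after broadcasting them against each other, whereas outer (orthogonal) indexing crosses every row index with every column index. A NumPy sketch of the difference:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Pointwise: selects (0, 0), (1, 1), (2, 2), matching the vindex doctest in this entry.
diagonal = x[[0, 1, 2], [0, 1, 2]]

# Broadcasting a (2, 1) row index against a (2,) column index gives a (2, 2) result.
broadcast = x[np.array([[0], [2]]), np.array([0, 2])]

# Outer selection for contrast: every listed row crossed with every listed column.
outer = x[np.ix_([0, 2], [0, 2])]
```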
-