API
This page contains a comprehensive list of functionality within blaze.
Docstrings should provide sufficient understanding for any individual function or class.
Interactive Use
_Data | Bind a data resource to a symbol, for use in expressions and computation.
Expressions
Projection | Select a subset of fields from data.
Selection | Filter elements of an expression based on a predicate.
Label | An expression with a name.
ReLabel | Table with the same content but with new labels.
Map | Map an arbitrary Python function across elements in a collection.
Apply | Apply an arbitrary Python function onto an expression.
Coerce | Coerce an expression to a different type.
Coalesce | SQL-like coalesce.
Cast | Cast an expression to a different type.
Sort | Table in sorted order.
Distinct | Remove duplicate elements from an expression.
Head | First n elements of a collection.
Merge | Merge many fields together.
Join | Join two tables on common columns.
Concat | Stack tables on common columns.
IsIn | Check if an expression contains values from a set.
By | Split-apply-combine operator.
Blaze Server
Server([data, formats, authorization, ...]) | Blaze Data Server
Client(url[, serial, verify_ssl, auth]) | Client for Blaze Server
Additional Server Utilities
expr_md5(expr) | Returns the md5 hash of the str of the expression.
to_tree(expr[, names]) | Represent Blaze expression with core data structures
from_tree(expr[, namespace]) | Convert core data structures to Blaze expression
data_spider(path[, ignore, followlinks, ...]) | Traverse a directory and call blaze.data on its contents.
from_yaml(fh[, ignore, followlinks, hidden, ...]) | Construct a dictionary of resources from a YAML specification.
Definitions
blaze.interactive.data(data_source, dshape=None, name=None, fields=None, schema=None, **kwargs)
Bind a data resource to a symbol, for use in expressions and computation.
A data object presents a consistent view onto a variety of concrete data sources. Like symbol objects, data objects are meant to be used in expressions. Because they are tied to concrete data resources, data objects can be used with compute directly, making them convenient for interactive exploration.
Parameters:
- data_source (object) – Any type with discover and compute implementations
- fields (list, optional) – Field or column names; inferred from data_source if possible
- dshape (str or DataShape, optional) – DataShape describing input data
- name (str, optional) – A name for the data.
Examples
>>> t = data([(1, 'Alice', 100),
...           (2, 'Bob', -200),
...           (3, 'Charlie', 300),
...           (4, 'Denis', 400),
...           (5, 'Edith', -500)],
...          fields=['id', 'name', 'balance'])
>>> t[t.balance < 0].name
    name
0    Bob
1  Edith
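For comparison, the expression t[t.balance < 0].name in the data() example corresponds roughly to a boolean filter followed by a column lookup. The sketch below uses pandas directly, not Blaze, and only shows the equivalent eager computation on the same records; it is an analogy, not how Blaze evaluates the expression.

```python
import pandas as pd

# The same records as the data() example, held in an ordinary DataFrame.
df = pd.DataFrame([(1, 'Alice', 100),
                   (2, 'Bob', -200),
                   (3, 'Charlie', 300),
                   (4, 'Denis', 400),
                   (5, 'Edith', -500)],
                  columns=['id', 'name', 'balance'])

# Keep rows with a negative balance, then select only the name column.
# reset_index renumbers the result 0, 1, ... like the output shown above.
negative_names = df[df.balance < 0]['name'].reset_index(drop=True)
print(negative_names)
```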
blaze.server.spider.data_spider(path, ignore=(ValueError, NotImplementedError), followlinks=True, hidden=False, extra_kwargs=None)
Traverse a directory and call blaze.data on its contents.
Parameters:
- path (str) – Path to a directory of resources to load
- ignore (tuple of Exception, optional) – Ignore these exceptions when calling blaze.data
- followlinks (bool, optional) – Follow symbolic links
- hidden (bool, optional) – Load hidden files
- extra_kwargs (dict, optional) – Extra keyword arguments to forward to blaze.data
Returns: A possibly nested dictionary mapping basenames to resources
Return type: dict
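The docstring leaves the traversal details open, so the following is a hypothetical, simplified sketch of what "traverse a directory and call blaze.data on its contents" can look like, not Blaze's implementation. It assumes that "hidden" means names starting with a dot, that empty subdirectories are dropped, and it takes a hypothetical load parameter in place of blaze.data so it runs without Blaze.

```python
import os


def spider_sketch(path, load=lambda p: p, ignore=(ValueError, NotImplementedError),
                  followlinks=True, hidden=False):
    """Hypothetical stand-in for data_spider.

    `load` plays the role of blaze.data; by default it just returns the path.
    Returns a possibly nested dict mapping basenames to loaded resources.
    """
    result = {}
    for entry in sorted(os.listdir(path)):
        if not hidden and entry.startswith('.'):
            continue  # skip hidden files and directories
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            if os.path.islink(full) and not followlinks:
                continue  # do not descend into symlinked directories
            nested = spider_sketch(full, load, ignore, followlinks, hidden)
            if nested:
                result[entry] = nested  # subdirectories become nested dicts
        else:
            try:
                result[entry] = load(full)
            except ignore:
                pass  # files that raise an ignored exception are skipped
    return result
```

A real implementation would also need to guard against symbolic-link cycles when followlinks is true.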
blaze.server.spider.from_yaml(fh, ignore=(ValueError, NotImplementedError), followlinks=True, hidden=False, relative_to_yaml_dir=False)
Construct a dictionary of resources from a YAML specification.
Parameters:
- fh (file) – File object referring to the YAML specification of resources to load.
- ignore (tuple of Exception, optional) – Ignore these exceptions when calling blaze.data.
- followlinks (bool, optional) – Follow symbolic links.
- hidden (bool, optional) – Load hidden files.
- relative_to_yaml_dir (bool, optional, default False) – Load paths relative to the YAML file's directory. The default is to load paths relative to the process's current working directory.
Returns: A dictionary mapping top-level keys in the YAML file to resources.
Return type: dict
See also
data_spider() – Traverse a directory tree for resources
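The relative_to_yaml_dir flag changes only the base directory used for relative paths. The helper below is hypothetical (from_yaml's internal path handling is not documented here) and just illustrates the two resolution rules the parameter describes, assuming absolute paths are left untouched.

```python
import os


def resolve_source(source, yaml_path, relative_to_yaml_dir=False):
    """Hypothetical helper: resolve a path taken from a YAML spec."""
    if os.path.isabs(source):
        return source  # absolute paths are used unchanged
    if relative_to_yaml_dir:
        base = os.path.dirname(os.path.abspath(yaml_path))  # the YAML file's directory
    else:
        base = os.getcwd()  # default: the process's current working directory
    return os.path.normpath(os.path.join(base, source))
```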
class blaze.server.server.Server(data=None, formats=None, authorization=None, allow_profiler=False, profiler_output=None, profile_by_default=False, allow_add=False, logfile=sys.stdout, loglevel='WARNING', log_exception_formatter=<toolz.functoolz.Compose object>)
Blaze Data Server
Host local data through a web API.
Parameters:
- data (dict, optional) – A dictionary mapping dataset names to any data format that Blaze understands.
- formats (iterable, optional) – An iterable of supported serialization formats. By default, the server supports JSON. A serialization format is an object that supports name, loads, and dumps.
- authorization (callable, optional) – A callable used to check the auth header sent by the client. It should accept a single argument, which is either None, indicating that no header was passed, or an object with username and password attributes. By default, all requests are allowed.
- allow_profiler (bool, optional) – Allow payloads to specify "profile": true, which runs the computation under cProfile.
- profiler_output (str, optional) – The directory to write pstats files to after profile runs. The files are written in a structure like:
  {profiler_output}/{hash(expr)}/{timestamp}
  This defaults to a relative path of profiler_output and requires allow_profiler=True.
  If this is the string ':response', writing to the local filesystem is disabled. Only requests that specify profiler_output=':response' are served; all others return a 403 (Forbidden).
- profile_by_default (bool, optional) – Run the profiler on any computation that does not explicitly set "profile": false. This requires allow_profiler=True.
- allow_add (bool, optional) – Expose an /add endpoint to allow datasets to be dynamically added to the server. Since this increases the risk of security holes, it defaults to False.
- logfile (str or file-like object, optional) – A filename or open file-like stream to which to send log output. Defaults to sys.stdout.
- loglevel (str, optional) – A string logging level (e.g. 'WARNING', 'INFO') that sets how verbose the log output is.
- log_exception_formatter (callable, optional) – A callable used to format an exception traceback for logging. It should take a traceback argument and return the string to be logged. This defaults to the standard library traceback.format_tb.
Examples
>>> from pandas import DataFrame
>>> df = DataFrame([[1, 'Alice', 100],
...                 [2, 'Bob', -200],
...                 [3, 'Alice', 300],
...                 [4, 'Dennis', 400],
...                 [5, 'Bob', -500]],
...                columns=['id', 'name', 'amount'])
>>> server = Server({'accounts': df})
>>> server.run()
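The authorization parameter expects a callable that receives either None or an object with username and password attributes. Below is a minimal sketch of such a callable, assuming the server treats a truthy return value as "allowed" (the docstring does not spell out the return contract). Credentials and ALLOWED are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for the object the server passes in; the docstring
# only promises that it has `username` and `password` attributes.
Credentials = namedtuple('Credentials', ['username', 'password'])

# Hypothetical user table; a real deployment should not hard-code passwords.
ALLOWED = {'alice': 's3cret'}


def authorize(auth):
    """Return True if the request may proceed, False otherwise."""
    if auth is None:
        return False  # no auth header was sent
    return ALLOWED.get(auth.username) == auth.password
```

It would then be passed as Server({'accounts': df}, authorization=authorize).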
run(port=6363, retry=False, **kwargs)
Run the server.
Parameters:
- port (int, optional) – The port to bind to.
- retry (bool, optional) – If the port is busy, whether to retry with the next available port.
- **kwargs – Forwarded to the underlying Flask app's run method.
Notes
This function blocks forever when successful.
blaze.server.server.to_tree(expr, names=None)
Represent Blaze expression with core data structures
Transform a Blaze expression into a form using only strings, dicts, lists and base types (int, float, datetime, ...). This form can be useful for serialization.
Parameters: expr (Expr) – A Blaze expression
Examples
>>> t = symbol('t', 'var * {x: int32, y: int32}')
>>> to_tree(t)
{'op': 'Symbol', 'args': ['t', 'var * { x : int32, y : int32 }', False]}
>>> to_tree(t.x.sum())
{'op': 'sum', 'args': [{'op': 'Field', 'args': [{'op': 'Symbol', 'args': ['t', 'var * { x : int32, y : int32 }', False]}, 'x']}]}
Simplify the expression using an explicit names dictionary. In the example below we replace the Symbol node with the string 't'.
>>> tree = to_tree(t.x, names={t: 't'})
>>> tree
{'op': 'Field', 'args': ['t', 'x']}
>>> from_tree(tree, namespace={'t': t})
t.x
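Because a tree from to_tree contains only dicts, lists, strings and base types, it can be handed to a generic serializer. As a sketch, here is the {'op': 'Field', 'args': ['t', 'x']} tree from the names example round-tripped through the standard-library json module; Blaze itself is not imported.

```python
import json

# Tree produced by to_tree(t.x, names={t: 't'}) in the example above.
tree = {'op': 'Field', 'args': ['t', 'x']}

payload = json.dumps(tree)    # text that can travel over the wire
restored = json.loads(payload)  # an equal structure, ready for from_tree
```

Plain json cannot encode datetime values, so trees that contain them need a serializer that handles those types.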
blaze.server.server.from_tree(expr, namespace=None)
Convert core data structures to Blaze expression
Core data structure representations created by to_tree are converted back into Blaze expressions.
Parameters: expr (dict) – A tree of core data structures, as produced by to_tree
Examples
>>> t = symbol('t', 'var * {x: int32, y: int32}')
>>> tree = to_tree(t)
>>> tree
{'op': 'Symbol', 'args': ['t', 'var * { x : int32, y : int32 }', False]}
>>> from_tree(tree)
<`t` symbol; dshape='var * {x: int32, y: int32}'>
>>> tree = to_tree(t.x.sum())
>>> tree
{'op': 'sum', 'args': [{'op': 'Field', 'args': [{'op': 'Symbol', 'args': ['t', 'var * {x : int32, y : int32}', False]}, 'x']}]}
>>> from_tree(tree)
sum(t.x)
Simplify the expression using an explicit names dictionary. In the example below we replace the Symbol node with the string 't'.
>>> tree = to_tree(t.x, names={t: 't'})
>>> tree
{'op': 'Field', 'args': ['t', 'x']}
>>> from_tree(tree, namespace={'t': t})
t.x
blaze.server.server.expr_md5(expr)
Returns the md5 hash of the str of the expression.
Parameters: expr (Expr) – The expression to hash.
Returns: hexdigest – The hexdigest of the md5 of the str of expr.
Return type: str
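The description "md5 hash of the str of the expression" can be sketched with hashlib. This assumes the string is UTF-8 encoded before hashing, which this page does not specify, and uses a plain string in place of a Blaze expression.

```python
import hashlib


def md5_of_str(obj):
    """Sketch of expr_md5: hex MD5 digest of str(obj), assuming UTF-8."""
    return hashlib.md5(str(obj).encode('utf-8')).hexdigest()


key = md5_of_str('sum(t.x)')  # stand-in for str(t.x.sum())
```

Two expressions with the same string form get the same 32-character digest, so the value works as a stable key.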
class blaze.server.client.Client(url, serial=<SerializationFormat: 'json'>, verify_ssl=True, auth=None, **kwargs)
Client for Blaze Server
Provides programmatic access to datasets living on a Blaze Server.
Parameters:
- url (str) – URL of a Blaze server
- serial (SerializationFormat, optional) – The serialization format object to use. Defaults to JSON. A serialization format is an object that supports name, loads, and dumps.
- verify_ssl (bool, optional) – Verify the SSL certificate from the server. This is enabled by default.
- auth (tuple, optional) – The username and password to use when connecting to the server. If not provided, no auth header will be sent.
Examples
>>> # This example matches the docstring of Server
>>> from blaze import data
>>> c = Client('localhost:6363')
>>> t = data(c)
add(name, resource_uri, *args, **kwargs)
Add the given resource URI to the Blaze server.
Parameters:
- name (str) – The name to give the resource
- resource_uri (str) – The URI string describing the resource to add to the server, e.g. 'sqlite:///path/to/file.db::table'
- imports (list) – A list of string names for any modules that must be imported on the Blaze server before the resource can be added. This is identical to the imports field in a Blaze server YAML file.
- args (any, optional) – Any additional positional arguments that can be passed to the blaze.resource constructor for this resource type
- kwargs (any, optional) – Any additional keyword arguments that can be passed to the blaze.resource constructor for this resource type
dshape
The datashape of the client.
class blaze.expr.collections.Concat
Stack tables on common columns
Parameters:
- lhs, rhs – Collections to concatenate
- axis (int, optional) – The axis to concatenate on.
Examples
>>> from blaze import symbol
Vertically stack tables:
>>> names = symbol('names', '5 * {name: string, id: int32}')
>>> more_names = symbol('more_names', '7 * {name: string, id: int32}')
>>> stacked = concat(names, more_names)
>>> stacked.dshape
dshape("12 * {name: string, id: int32}")
Vertically stack matrices:
>>> mat_a = symbol('a', '3 * 5 * int32')
>>> mat_b = symbol('b', '3 * 5 * int32')
>>> vstacked = concat(mat_a, mat_b, axis=0)
>>> vstacked.dshape
dshape("6 * 5 * int32")
Horizontally stack matrices:
>>> hstacked = concat(mat_a, mat_b, axis=1)
>>> hstacked.dshape
dshape("3 * 10 * int32")
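The shape arithmetic in the matrix examples (axis=0 adds rows, axis=1 adds columns) follows the same convention as NumPy's np.concatenate. The sketch uses NumPy arrays with the same shapes as the symbols a and b purely as an analogy for checking the resulting dimensions; it does not show how Blaze computes concat.

```python
import numpy as np

# Concrete arrays with the same shapes as the symbols `a` and `b` above.
mat_a = np.zeros((3, 5), dtype=np.int32)
mat_b = np.ones((3, 5), dtype=np.int32)

vstacked = np.concatenate([mat_a, mat_b], axis=0)  # rows add up: 3 + 3
hstacked = np.concatenate([mat_a, mat_b], axis=1)  # columns add up: 5 + 5
```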
class blaze.expr.collections.Distinct
Remove duplicate elements from an expression
Parameters: on (tuple of Field) – The subset of fields or names of fields to be distinct on.
Examples
>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = distinct(t)
>>> data = [('Alice', 100, 1),
...         ('Bob', 200, 2),
...         ('Alice', 100, 1)]
>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 100, 1), ('Bob', 200, 2)]
Use a subset of fields by passing on:
>>> import pandas as pd
>>> e = distinct(t, 'name')
>>> data = pd.DataFrame([['Alice', 100, 1],
...                      ['Alice', 200, 2],
...                      ['Bob', 100, 1],
...                      ['Bob', 200, 2]],
...                     columns=['name', 'amount', 'id'])
>>> compute(e, data)
    name  amount  id
0  Alice     100   1
1    Bob     100   1
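In the distinct(t, 'name') example on a DataFrame, the first row for each name is kept, which matches pandas' own drop_duplicates with its default keep='first'. The sketch below reproduces that output with pandas alone; whether every Blaze backend picks the first row is not stated on this page.

```python
import pandas as pd

data = pd.DataFrame([['Alice', 100, 1],
                     ['Alice', 200, 2],
                     ['Bob', 100, 1],
                     ['Bob', 200, 2]],
                    columns=['name', 'amount', 'id'])

# Keep the first row seen for each distinct name, then renumber the index.
first_per_name = data.drop_duplicates(subset='name').reset_index(drop=True)
```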
class blaze.expr.collections.Head
First n elements of collection
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.head(5).dshape
dshape("5 * {name: string, amount: int32}")
class blaze.expr.collections.IsIn
Check if an expression contains values from a set.
Return a boolean expression indicating whether another expression contains values that are members of a collection.
Parameters:
- expr (Expr) – Expression whose elements to check for membership in keys
- keys (Sequence) – Elements to test against. Blaze stores this as a frozenset.
Examples
Check, element by element, whether a vector's values are among 1, 2 or 3:
>>> from blaze import symbol
>>> t = symbol('t', '10 * int64')
>>> expr = t.isin([1, 2, 3])
>>> expr.dshape
dshape("10 * bool")
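The result dshape "10 * bool" reflects that isin is elementwise: each element produces one boolean. A plain-Python sketch of that behavior, with a list standing in for the 10-element vector and the keys held in a frozenset as the docstring describes:

```python
keys = frozenset([1, 2, 3])  # the docstring says Blaze stores keys as a frozenset
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # stand-in for a '10 * int64' vector

# One boolean per element, so the output has the same length as the input.
mask = [v in keys for v in values]
```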
class blaze.expr.collections.Join
Join two tables on common columns
Parameters:
- lhs, rhs – Expressions to join
- on_left (str, optional) – The fields from the left side to join on. If no on_right is passed, then these are the fields for both sides.
- on_right (str, optional) – The fields from the right side to join on.
- how ({'inner', 'outer', 'left', 'right'}) – What type of join to perform.
- suffixes (pair of str) – The suffixes to be applied to the left and right sides in order to resolve duplicate field names.
Examples
>>> from blaze import symbol
>>> names = symbol('names', 'var * {name: string, id: int}')
>>> amounts = symbol('amounts', 'var * {amount: int, id: int}')
Join tables based on a shared column name:
>>> joined = join(names, amounts, 'id')
Join based on different column names:
>>> amounts = symbol('amounts', 'var * {amount: int, acctNumber: int}')
>>> joined = join(names, amounts, 'id', 'acctNumber')
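The parameters map closely onto pandas.merge (left_on, right_on, how, suffixes). As an analogy only, here is the "different column names" example carried out eagerly in pandas on a few made-up rows; the data values are hypothetical.

```python
import pandas as pd

# Hypothetical rows matching the two schemas in the example.
names = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'], 'id': [1, 2, 3]})
amounts = pd.DataFrame({'amount': [100, 200, 400], 'acctNumber': [1, 2, 4]})

# Counterpart of join(names, amounts, 'id', 'acctNumber'): different key names
# on each side, an inner join, and suffixes for any clashing columns.
joined = pd.merge(names, amounts, left_on='id', right_on='acctNumber',
                  how='inner', suffixes=('_left', '_right'))
```

With an inner join, Charlie (id 3) and account 4 have no partner and are dropped.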
class blaze.expr.collections.Merge
Merge many fields together
Examples
>>> from blaze import symbol, label
>>> accounts = symbol('accounts', 'var * {name: string, x: int, y: real}')
>>> merge(accounts.name, z=accounts.x + accounts.y).fields
['name', 'z']
To control the ordering of the fields, use label:
>>> merge(label(accounts.name, 'NAME'), label(accounts.x, 'X')).dshape
dshape("var * {NAME: string, X: int32}")
>>> merge(label(accounts.x, 'X'), label(accounts.name, 'NAME')).dshape
dshape("var * {X: int32, NAME: string}")
class blaze.expr.collections.Sample
Random row-wise sample. Specify either n or frac for an absolute or fractional number of rows, respectively.
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sample(n=2).dshape
dshape("var * {name: string, amount: int32}")
>>> accounts.sample(frac=0.1).dshape
dshape("var * {name: string, amount: int32}")
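pandas' DataFrame.sample uses the same n and frac parameter names, so it is a convenient way to see the difference between an absolute and a fractional sample size. This is a pandas analogy on made-up data, not Blaze; random_state only makes the draw reproducible.

```python
import pandas as pd

# Ten hypothetical accounts.
accounts = pd.DataFrame({'name': list('abcdefghij'), 'amount': range(10)})

by_count = accounts.sample(n=2, random_state=0)          # exactly 2 rows
by_fraction = accounts.sample(frac=0.5, random_state=0)  # 50% of 10 rows
```

Note that the Blaze examples above report a var dimension for both cases rather than a fixed length.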
class blaze.expr.collections.Shift
Shift a column backward or forward by N elements
class blaze.expr.collections.Sort
Table in sorted order
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sort('amount', ascending=False).schema
dshape("{name: string, amount: int32}")
Some backends support sorting by arbitrary row-wise expressions, e.g.
>>> accounts.sort(-accounts.amount)
class blaze.expr.collections.Tail
Last n elements of collection
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.tail(5).dshape
dshape("5 * {name: string, amount: int32}")
blaze.expr.collections.concat(lhs, rhs, axis=0)
Stack tables on common columns
Parameters:
- lhs, rhs – Collections to concatenate
- axis (int, optional) – The axis to concatenate on.
Examples
>>> from blaze import symbol
Vertically stack tables:
>>> names = symbol('names', '5 * {name: string, id: int32}')
>>> more_names = symbol('more_names', '7 * {name: string, id: int32}')
>>> stacked = concat(names, more_names)
>>> stacked.dshape
dshape("12 * {name: string, id: int32}")
Vertically stack matrices:
>>> mat_a = symbol('a', '3 * 5 * int32')
>>> mat_b = symbol('b', '3 * 5 * int32')
>>> vstacked = concat(mat_a, mat_b, axis=0)
>>> vstacked.dshape
dshape("6 * 5 * int32")
Horizontally stack matrices:
>>> hstacked = concat(mat_a, mat_b, axis=1)
>>> hstacked.dshape
dshape("3 * 10 * int32")
blaze.expr.collections.distinct(expr, *on)
Remove duplicate elements from an expression
Parameters: on (tuple of Field) – The subset of fields or names of fields to be distinct on.
Examples
>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = distinct(t)
>>> data = [('Alice', 100, 1),
...         ('Bob', 200, 2),
...         ('Alice', 100, 1)]
>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 100, 1), ('Bob', 200, 2)]
Use a subset of fields by passing on:
>>> import pandas as pd
>>> e = distinct(t, 'name')
>>> data = pd.DataFrame([['Alice', 100, 1],
...                      ['Alice', 200, 2],
...                      ['Bob', 100, 1],
...                      ['Bob', 200, 2]],
...                     columns=['name', 'amount', 'id'])
>>> compute(e, data)
    name  amount  id
0  Alice     100   1
1    Bob     100   1
blaze.expr.collections.head(child, n=10)
First n elements of collection
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.head(5).dshape
dshape("5 * {name: string, amount: int32}")
blaze.expr.collections.isin(expr, keys)
Check if an expression contains values from a set.
Return a boolean expression indicating whether another expression contains values that are members of a collection.
Parameters:
- expr (Expr) – Expression whose elements to check for membership in keys
- keys (Sequence) – Elements to test against. Blaze stores this as a frozenset.
Examples
Check, element by element, whether a vector's values are among 1, 2 or 3:
>>> from blaze import symbol
>>> t = symbol('t', '10 * int64')
>>> expr = t.isin([1, 2, 3])
>>> expr.dshape
dshape("10 * bool")
blaze.expr.collections.join(lhs, rhs, on_left=None, on_right=None, how='inner', suffixes=('_left', '_right'))
Join two tables on common columns
Parameters:
- lhs, rhs – Expressions to join
- on_left (str, optional) – The fields from the left side to join on. If no on_right is passed, then these are the fields for both sides.
- on_right (str, optional) – The fields from the right side to join on.
- how ({'inner', 'outer', 'left', 'right'}) – What type of join to perform.
- suffixes (pair of str) – The suffixes to be applied to the left and right sides in order to resolve duplicate field names.
Examples
>>> from blaze import symbol
>>> names = symbol('names', 'var * {name: string, id: int}')
>>> amounts = symbol('amounts', 'var * {amount: int, id: int}')
Join tables based on a shared column name:
>>> joined = join(names, amounts, 'id')
Join based on different column names:
>>> amounts = symbol('amounts', 'var * {amount: int, acctNumber: int}')
>>> joined = join(names, amounts, 'id', 'acctNumber')
blaze.expr.collections.merge(*exprs, **kwargs)
Merge many fields together
Examples
>>> from blaze import symbol, label
>>> accounts = symbol('accounts', 'var * {name: string, x: int, y: real}')
>>> merge(accounts.name, z=accounts.x + accounts.y).fields
['name', 'z']
To control the ordering of the fields, use label:
>>> merge(label(accounts.name, 'NAME'), label(accounts.x, 'X')).dshape
dshape("var * {NAME: string, X: int32}")
>>> merge(label(accounts.x, 'X'), label(accounts.name, 'NAME')).dshape
dshape("var * {X: int32, NAME: string}")
blaze.expr.collections.sample(child, n=None, frac=None)
Random row-wise sample. Specify either n or frac for an absolute or fractional number of rows, respectively.
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.sample(n=2).dshape
dshape("var * {name: string, amount: int32}")
>>> accounts.sample(frac=0.1).dshape
dshape("var * {name: string, amount: int32}")
blaze.expr.collections.shift(expr, n)
Shift a column backward or forward by N elements
blaze.expr.collections.sort(child, key=None, ascending=True)
Sort a collection
Parameters:
- key (str, list of str, or Expr) – Defines what to sort by.
  - A single column string: t.sort('amount')
  - A list of column strings: t.sort(['name', 'amount'])
  - An expression: t.sort(-t.amount)
  If sorting a single column, the key is ignored, as it is not necessary:
  t.amount.sort()
  t.amount.sort('amount')
  t.amount.sort('foobar')
  are all equivalent.
- ascending (bool, optional) – Determines the order of the sort
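The three forms of key behave like the three kinds of key functions one would pass to Python's sorted. The sketch below is plain Python on hypothetical (name, amount) tuples, not Blaze; it only illustrates what each key form orders by.

```python
rows = [('Bob', 50), ('Alice', 300), ('Alice', 100)]  # hypothetical (name, amount) rows

# key='amount': a single column.
by_amount = sorted(rows, key=lambda r: r[1])

# key=['name', 'amount']: compare by name first, then break ties by amount.
by_name_then_amount = sorted(rows, key=lambda r: (r[0], r[1]))

# key=-t.amount: an expression; negating a numeric column sorts descending.
by_neg_amount = sorted(rows, key=lambda r: -r[1])
```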
blaze.expr.collections.tail(child, n=10)
Last n elements of collection
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.tail(5).dshape
dshape("5 * {name: string, amount: int32}")
blaze.expr.collections.transform(t, replace=True, **kwargs)
Add named columns to table
>>> from blaze import symbol
>>> t = symbol('t', 'var * {x: int, y: int}')
>>> transform(t, z=t.x + t.y).fields
['x', 'y', 'z']
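For readers coming from pandas, transform(t, z=t.x + t.y) reads much like DataFrame.assign: existing columns are kept and the new named column is appended at the end. The sketch is a pandas analogy on hypothetical values; the replace parameter has no counterpart here.

```python
import pandas as pd

t = pd.DataFrame({'x': [1, 2, 3], 'y': [10, 20, 30]})  # hypothetical values

# Keep x and y, and append z computed row by row.
with_z = t.assign(z=t.x + t.y)
```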
class blaze.expr.expressions.Apply
Apply an arbitrary Python function onto an expression
Examples
>>> t = symbol('t', 'var * {name: string, amount: int}')
>>> h = t.apply(hash, dshape='int64')  # Hash value of resultant dataset
You must provide the datashape of the result with the dshape= keyword. For datashape examples see http://datashape.pydata.org/grammar.html#some-simple-examples
If you are using a chunking backend and your operation may be safely split and concatenated, then add the splittable=True keyword argument:
>>> t.apply(f, dshape='...', splittable=True)
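One way to read "safely split and concatenated": applying the function chunk by chunk and concatenating the partial results must give the same answer as applying it to the whole input. The plain-Python sketch below checks that property for two hypothetical functions; chunks and apply_chunked are illustrative helpers, not Blaze APIs.

```python
def chunks(seq, size):
    """Split a list into consecutive chunks, as a chunking backend might."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def apply_chunked(func, seq, size):
    """Apply func to each chunk and concatenate the partial results."""
    out = []
    for chunk in chunks(seq, size):
        out.extend(func(chunk))
    return out


def absolute(xs):
    """Elementwise, so chunking does not change the result."""
    return [abs(x) for x in xs]


def running_total(xs):
    """Depends on earlier elements, so chunking changes the result."""
    total, out = 0, []
    for x in xs:
        total += x
        out.append(total)
    return out


data = [5, -3, 8, -1, 2, -7]
```

Under this reading, absolute would qualify for splittable=True and running_total would not.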
class blaze.expr.expressions.Cast
Cast an expression to a different type.
This is only an expression-time operation.
Examples
>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")
Cast to correct mislabeled optionals:
>>> s.cast('int64').dshape
dshape("int64")
Cast to give concrete dimension length:
>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")
class blaze.expr.expressions.Coalesce
SQL-like coalesce.
coalesce(a, b) = a if a is not NULL, b otherwise
Examples
>>> coalesce(1, 2)
1
>>> coalesce(1, None)
1
>>> coalesce(None, 2)
2
>>> coalesce(None, None) is None
True
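The definition above translates directly into a plain-Python function when None plays the role of SQL NULL. This is a sketch of the semantics, not Blaze's implementation; it reproduces the four doctest results.

```python
def coalesce_sketch(a, b):
    """Return a if a is not NULL (None here), otherwise b."""
    return a if a is not None else b
```

Testing against None, rather than writing a or b, matters for values such as 0 or the empty string: they are not NULL, so they must be returned as-is.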
class blaze.expr.expressions.Coerce
Coerce an expression to a different type.
Examples
>>> t = symbol('t', '100 * float64')
>>> t.coerce(to='int64')
t.coerce(to='int64')
>>> t.coerce('float32')
t.coerce(to='float32')
>>> t.coerce('int8').dshape
dshape("100 * int8")
class blaze.expr.expressions.ElemWise
Elementwise operation.
The shape of this expression matches the shape of the child.
class blaze.expr.expressions.Expr
Symbolic expression of a computation
All Blaze expressions (Join, By, Sort, ...) descend from this class. It contains shared logic and syntax. It in turn inherits from Node, which holds all tree traversal logic.
cast(expr, to)
Cast an expression to a different type.
This is only an expression-time operation.
Examples
>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")
Cast to correct mislabeled optionals:
>>> s.cast('int64').dshape
dshape("int64")
Cast to give concrete dimension length:
>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")
map(func, schema=None, name=None)
Map an arbitrary Python function across elements in a collection
Examples
>>> from datetime import datetime
>>> t = symbol('t', 'var * {price: real, time: int64}')  # times as integers
>>> datetimes = t.time.map(datetime.utcfromtimestamp)
Optionally provide extra schema information:
>>> datetimes = t.time.map(datetime.utcfromtimestamp,
...                        schema='{time: datetime}')
See also
blaze.expr.expressions.Apply
class blaze.expr.expressions.Field
A single field from an expression.
Get a single field from an expression with record-type schema. We store the name of the field in the _name attribute.
Examples
>>> points = symbol('points', '5 * 3 * {x: int32, y: int32}')
>>> points.x.dshape
dshape("5 * 3 * int32")
For fields that aren't valid Python identifiers, use [] syntax:
>>> points = symbol('points', '5 * 3 * {"space station": float64}')
>>> points['space station'].dshape
dshape("5 * 3 * float64")
class blaze.expr.expressions.Label
An expression with a name.
Examples
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> expr = accounts.amount * 100
>>> expr._name
'amount'
>>> expr.label('new_amount')._name
'new_amount'
class blaze.expr.expressions.Map
Map an arbitrary Python function across elements in a collection
Examples
>>> from datetime import datetime
>>> t = symbol('t', 'var * {price: real, time: int64}')  # times as integers
>>> datetimes = t.time.map(datetime.utcfromtimestamp)
Optionally provide extra schema information:
>>> datetimes = t.time.map(datetime.utcfromtimestamp,
...                        schema='{time: datetime}')
See also
blaze.expr.expressions.Apply
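The map examples pass datetime.utcfromtimestamp, which still works but has been deprecated since Python 3.12. Outside of Blaze, the same elementwise conversion of integer epoch seconds can be sketched with the timezone-aware form. Note that this yields aware datetimes, whereas utcfromtimestamp returns naive ones, so the two are not drop-in identical.

```python
from datetime import datetime, timezone

times = [0, 86400, 1_000_000_000]  # seconds since the epoch, like the int64 time column

# Apply the conversion to every element, as t.time.map(...) describes.
datetimes = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in times]
```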
class blaze.expr.expressions.Projection
Select a subset of fields from data.
Examples
>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> accounts[['name', 'amount']].schema
dshape("{name: string, amount: int32}")
>>> accounts[['name', 'amount']]
accounts[['name', 'amount']]
class blaze.expr.expressions.ReLabel
Table with the same content but with new labels
Examples
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.schema
dshape("{name: string, amount: int32}")
>>> accounts.relabel(amount='balance').schema
dshape("{name: string, balance: int32}")
>>> accounts.relabel(not_a_column='definitely_not_a_column')
Traceback (most recent call last):
    ...
ValueError: Cannot relabel non-existent child fields: {'not_a_column'}
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> s.relabel(0='foo')
Traceback (most recent call last):
    ...
SyntaxError: keyword can't be an expression
Notes
When names are not valid Python names, such as integers or strings with spaces, you must pass a dictionary to relabel. For example:
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> t = symbol('t', 'var * {"whoo hoo": ?float32}')
>>> t.relabel({"whoo hoo": 'foo'})
t.relabel({'whoo hoo': 'foo'})
class blaze.expr.expressions.Selection
Filter elements of expression based on predicate
Examples
>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> deadbeats = accounts[accounts.amount < 0]
class blaze.expr.expressions.SimpleSelection
Internal selection class that does not treat the predicate as an input.
class blaze.expr.expressions.Slice
Elements from start until stop. On many backends, a step parameter is also allowed.
Examples
>>> from blaze import symbol
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts[2:7].dshape
dshape("5 * {name: string, amount: int32}")
>>> accounts[2:7:2].dshape
dshape("3 * {name: string, amount: int32}")
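The fixed lengths 5 and 3 in the dshapes above agree with Python's own slice arithmetic, len(range(start, stop, step)). The sketch shows that arithmetic; it assumes, as the examples appear to, that the collection has at least stop rows, since Python would otherwise clamp the result.

```python
def sliced_length(start, stop, step=1):
    """Number of elements a start:stop:step slice selects, given enough rows."""
    return len(range(start, stop, step))


plain = sliced_length(2, 7)       # rows 2, 3, 4, 5, 6
stepped = sliced_length(2, 7, 2)  # rows 2, 4, 6
```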
class blaze.expr.expressions.Symbol
Symbolic data. The leaf of a Blaze expression
Examples
>>> points = symbol('points', '5 * 3 * {x: int, y: int}')
>>> points
<`points` symbol; dshape='5 * 3 * {x: int32, y: int32}'>
>>> points.dshape
dshape("5 * 3 * {x: int32, y: int32}")
blaze.expr.expressions.apply(expr, func, dshape, splittable=False)
Apply an arbitrary Python function onto an expression
Examples
>>> t = symbol('t', 'var * {name: string, amount: int}')
>>> h = t.apply(hash, dshape='int64')  # Hash value of resultant dataset
You must provide the datashape of the result with the dshape= keyword. For datashape examples see http://datashape.pydata.org/grammar.html#some-simple-examples
If you are using a chunking backend and your operation may be safely split and concatenated, then add the splittable=True keyword argument:
>>> t.apply(f, dshape='...', splittable=True)
blaze.expr.expressions.cast(expr, to)
Cast an expression to a different type.
This is only an expression-time operation.
Examples
>>> s = symbol('s', '?int64')
>>> s.cast('?int32').dshape
dshape("?int32")
Cast to correct mislabeled optionals:
>>> s.cast('int64').dshape
dshape("int64")
Cast to give concrete dimension length:
>>> t = symbol('t', 'var * float32')
>>> t.cast('10 * float32').dshape
dshape("10 * float32")
blaze.expr.expressions.coalesce(a, b)
SQL-like coalesce.
coalesce(a, b) = a if a is not NULL, b otherwise
Examples
>>> coalesce(1, 2)
1
>>> coalesce(1, None)
1
>>> coalesce(None, 2)
2
>>> coalesce(None, None) is None
True
blaze.expr.expressions.coerce(expr, to)
Coerce an expression to a different type.
Examples
>>> t = symbol('t', '100 * float64')
>>> t.coerce(to='int64')
t.coerce(to='int64')
>>> t.coerce('float32')
t.coerce(to='float32')
>>> t.coerce('int8').dshape
dshape("100 * int8")
blaze.expr.expressions.label(expr, lab)
An expression with a name.
Examples
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> expr = accounts.amount * 100
>>> expr._name
'amount'
>>> expr.label('new_amount')._name
'new_amount'
blaze.expr.expressions.ndim(expr)
Number of dimensions of expression
>>> symbol('s', '3 * var * int32').ndim
2
blaze.expr.expressions.projection(expr, names)
Select a subset of fields from data.
Examples
>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> accounts[['name', 'amount']].schema
dshape("{name: string, amount: int32}")
>>> accounts[['name', 'amount']]
accounts[['name', 'amount']]
blaze.expr.expressions.relabel(child, labels=None, **kwargs)
Table with the same content but with new labels
Examples
>>> accounts = symbol('accounts', 'var * {name: string, amount: int}')
>>> accounts.schema
dshape("{name: string, amount: int32}")
>>> accounts.relabel(amount='balance').schema
dshape("{name: string, balance: int32}")
>>> accounts.relabel(not_a_column='definitely_not_a_column')
Traceback (most recent call last):
    ...
ValueError: Cannot relabel non-existent child fields: {'not_a_column'}
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> s.relabel(0='foo')
Traceback (most recent call last):
    ...
SyntaxError: keyword can't be an expression
Notes
When names are not valid Python names, such as integers or strings with spaces, you must pass a dictionary to relabel. For example:
>>> s = symbol('s', 'var * {"0": int64}')
>>> s.relabel({'0': 'foo'})
s.relabel({'0': 'foo'})
>>> t = symbol('t', 'var * {"whoo hoo": ?float32}')
>>> t.relabel({"whoo hoo": 'foo'})
t.relabel({'whoo hoo': 'foo'})
blaze.expr.expressions.selection(table, predicate)
Filter elements of expression based on predicate
Examples
>>> accounts = symbol('accounts',
...                   'var * {name: string, amount: int, id: int}')
>>> deadbeats = accounts[accounts.amount < 0]
blaze.expr.expressions.symbol(name, dshape, token=None)
Symbolic data. The leaf of a Blaze expression
Examples
>>> points = symbol('points', '5 * 3 * {x: int, y: int}')
>>> points
<`points` symbol; dshape='5 * 3 * {x: int32, y: int32}'>
>>> points.dshape
dshape("5 * 3 * {x: int32, y: int32}")
class blaze.expr.reductions.Reduction
A column-wise reduction
Blaze supports the same class of reductions as NumPy and Pandas:
sum, min, max, any, all, mean, var, std, count, nunique
Examples
>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = t['amount'].sum()
>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 3]]
>>> from blaze.compute.python import compute
>>> compute(e, data)
350
-
class blaze.expr.reductions.Summary
A collection of named reductions.
Examples
>>> from blaze import symbol, summary
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> expr = summary(number=t.id.nunique(), sum=t.amount.sum())
>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 1]]
>>> from blaze import compute
>>> compute(expr, data)
(2, 350)
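A summary evaluates several named reductions over the same data. The following plain-Python sketch makes that idea concrete for the example above. `summarize` is a hypothetical helper that returns a dict keyed by name. This is a simplification: Blaze's `compute` returns a tuple, as shown in the doctest.

```python
def summarize(rows, **reductions):
    """Apply each named reduction to the same rows.

    Hypothetical helper: every keyword maps an output name to a function
    that reduces the whole list of rows to a single value.
    """
    return {name: reduce_rows(rows) for name, reduce_rows in reductions.items()}


data = [['Alice', 100, 1],
        ['Bob', 200, 2],
        ['Alice', 50, 1]]
result = summarize(
    data,
    number=lambda rows: len({row[2] for row in rows}),  # nunique of id
    sum=lambda rows: sum(row[1] for row in rows),       # sum of amount
)
print(result)  # {'number': 2, 'sum': 350}
```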
-
class blaze.expr.reductions.count
The number of non-null elements.
-
class blaze.expr.reductions.nelements
Compute the number of elements in a collection, including missing values.
See also
blaze.expr.reductions.count: compute the number of non-null elements.
Examples
>>> from blaze import symbol
>>> t = symbol('t', 'var * {name: string, amount: float64}')
>>> t[t.amount < 1].nelements()
nelements(t[t.amount < 1])
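The difference between count and nelements is whether missing values are included. A plain-Python sketch shows it. This sketch assumes that `None` marks a missing value; real backends may use NaN or a masked representation instead. Both helper names are hypothetical.

```python
def count_non_null(values):
    """Like count: the number of elements that are not missing (None here)."""
    return sum(1 for v in values if v is not None)


def count_all(values):
    """Like nelements: the number of elements, missing ones included."""
    return len(values)


amounts = [0.5, None, 2.0, None, 0.25]
print(count_non_null(amounts), count_all(amounts))  # 3 5
```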
-
class blaze.expr.reductions.std
Standard deviation.
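Parameters:
- child (Expr) – An expression
- unbiased (bool, optional) – Compute the square root of an unbiased estimate of the population variance if this is True.
Warning
This does not return an unbiased estimate of the population standard deviation. The square root is concave, so by Jensen's inequality the square root of an unbiased variance estimate tends to underestimate the standard deviation.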
-
class blaze.expr.reductions.var
Variance.
Parameters:
- child (Expr) – An expression
- unbiased (bool, optional) – Compute an unbiased estimate of the population variance if this is True. In NumPy and pandas, this parameter is called ddof (delta degrees of freedom) and is equal to 1 for unbiased and 0 for biased.
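The `unbiased`/`ddof` relationship described for var and std can be written out directly. The divisor is n - ddof, and std is the square root of var. This is a minimal plain-Python sketch. The helper names and the `unbiased=False` default belong to the sketch and are not taken from Blaze.

```python
import math


def variance(values, unbiased=False):
    """Population variance, or the unbiased estimate when unbiased=True.

    unbiased=True corresponds to ddof=1 in NumPy/pandas terms and
    unbiased=False to ddof=0; the divisor is n - ddof.
    """
    n = len(values)
    ddof = 1 if unbiased else 0
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)


def std(values, unbiased=False):
    """Square root of variance(); with unbiased=True this is still biased low."""
    return math.sqrt(variance(values, unbiased))


xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(xs), std(xs))        # 4.0 2.0
print(variance(xs, unbiased=True))  # 32 / 7, about 4.571
```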
-
blaze.expr.reductions.summary(keepdims=False, axis=None, **kwargs)
A collection of named reductions.
Examples
>>> from blaze import symbol, summary
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> expr = summary(number=t.id.nunique(), sum=t.amount.sum())
>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 1]]
>>> from blaze import compute
>>> compute(expr, data)
(2, 350)
-
blaze.expr.reductions.vnorm(expr, ord=None, axis=None, keepdims=False)
Vector norm. See np.linalg.norm.
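Because vnorm defers to the conventions of np.linalg.norm, it can help to see what the common `ord` values compute for a single vector. This plain-Python sketch covers only the 1-D case. It ignores the `axis` and `keepdims` parameters, and `vector_norm` is a hypothetical name.

```python
import math


def vector_norm(values, ord=None):
    """Norm of a 1-D sequence using np.linalg.norm's vector conventions.

    ord=None or 2 -> Euclidean norm, inf -> max |x|, -inf -> min |x|,
    0 -> number of non-zero entries, other p -> (sum |x|**p) ** (1/p).
    """
    if ord is None or ord == 2:
        return math.sqrt(sum(abs(x) ** 2 for x in values))
    if ord == math.inf:
        return max(abs(x) for x in values)
    if ord == -math.inf:
        return min(abs(x) for x in values)
    if ord == 0:
        return float(sum(1 for x in values if x != 0))
    return sum(abs(x) ** ord for x in values) ** (1.0 / ord)


v = [3, -4]
print(vector_norm(v), vector_norm(v, ord=1), vector_norm(v, ord=math.inf))
# 5.0 7.0 4
```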
-
class blaze.expr.arrays.Transpose
Transpose dimensions in an N-dimensional array.
Examples
>>> x = symbol('x', '10 * 20 * int32')
>>> x.T
transpose(x)
>>> x.T.shape
(20, 10)
Specify the axis ordering with the axes keyword argument:
>>> x = symbol('x', '10 * 20 * 30 * int32')
>>> x.transpose([2, 0, 1])
transpose(x, axes=[2, 0, 1])
>>> x.transpose([2, 0, 1]).shape
(30, 10, 20)
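The shapes in these examples follow a simple rule: output dimension i is input dimension axes[i], and the default reverses the dimensions. This plain-Python sketch reproduces both documented results. `transpose_shape` is a hypothetical helper and is not part of Blaze.

```python
def transpose_shape(shape, axes=None):
    """Shape after transposing; axes=None reverses the dimensions (like x.T)."""
    if axes is None:
        axes = list(range(len(shape)))[::-1]
    if sorted(axes) != list(range(len(shape))):
        raise ValueError("axes must be a permutation of range(ndim)")
    # Output dimension i is input dimension axes[i].
    return tuple(shape[i] for i in axes)


print(transpose_shape((10, 20)))                  # (20, 10)
print(transpose_shape((10, 20, 30), [2, 0, 1]))   # (30, 10, 20)
```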
-
class blaze.expr.arrays.TensorDot
Dot product: contract and sum dimensions of two arrays.
>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')
>>> x.dot(y)
tensordot(x, y)
>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])
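Contracting one axis of each operand removes those axes and concatenates the remaining ones. This sketch computes the resulting shape for a single pair of axes, such as axes=[0, 0] above. It assumes that axes=None follows the np.dot convention (last axis of lhs against first axis of rhs), which may not match Blaze's default exactly. `tensordot_shape` is a hypothetical helper.

```python
def tensordot_shape(lhs_shape, rhs_shape, axes=None):
    """Result shape when contracting one axis of lhs against one axis of rhs.

    axes is a pair (lhs_axis, rhs_axis). When it is None this sketch assumes
    the np.dot convention: lhs's last axis against rhs's first axis.
    """
    if axes is None:
        axes = (len(lhs_shape) - 1, 0)
    left, right = axes
    if lhs_shape[left] != rhs_shape[right]:
        raise ValueError("contracted dimensions differ: %d vs %d"
                         % (lhs_shape[left], rhs_shape[right]))
    # The contracted axes disappear; the remaining axes are concatenated.
    kept_lhs = [n for i, n in enumerate(lhs_shape) if i != left]
    kept_rhs = [n for i, n in enumerate(rhs_shape) if i != right]
    return tuple(kept_lhs + kept_rhs)


print(tensordot_shape((20, 20), (20, 30)))          # (20, 30)
print(tensordot_shape((20, 20), (20, 30), [0, 0]))  # (20, 30)
```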
-
blaze.expr.arrays.dot(lhs, rhs)
Dot product: contract and sum dimensions of two arrays.
>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')
>>> x.dot(y)
tensordot(x, y)
>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])
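For two 2-D arrays, "contract and sum" means that each output entry sums products along the shared axis. This is the ordinary matrix product. The sketch below uses nested lists to show the arithmetic only and is not how Blaze evaluates dot. `matmul` is a hypothetical name.

```python
def matmul(x, y):
    """Contract the last axis of x with the first axis of y (2-D lists)."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions do not match")
    cols = len(y[0])
    # result[i][j] = sum over k of x[i][k] * y[k][j]
    return [[sum(x[i][k] * y[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(len(x))]


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```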
-
blaze.expr.arrays.transpose(expr, axes=None)
Transpose dimensions in an N-dimensional array.
Examples
>>> x = symbol('x', '10 * 20 * int32')
>>> x.T
transpose(x)
>>> x.T.shape
(20, 10)
Specify the axis ordering with the axes keyword argument:
>>> x = symbol('x', '10 * 20 * 30 * int32')
>>> x.transpose([2, 0, 1])
transpose(x, axes=[2, 0, 1])
>>> x.transpose([2, 0, 1]).shape
(30, 10, 20)
-
blaze.expr.arrays.tensordot(lhs, rhs, axes=None)
Dot product: contract and sum dimensions of two arrays.
>>> x = symbol('x', '20 * 20 * int32')
>>> y = symbol('y', '20 * 30 * int32')
>>> x.dot(y)
tensordot(x, y)
>>> tensordot(x, y, axes=[0, 0])
tensordot(x, y, axes=[0, 0])
-
class blaze.expr.arithmetic.Arithmetic
Superclass for arithmetic operators such as add or mul.
-
class blaze.expr.math.notnull
Return whether an expression is not null.
Examples
>>> from blaze import symbol, compute
>>> s = symbol('s', 'var * int64')
>>> expr = notnull(s)
>>> expr.dshape
dshape("var * bool")
>>> list(compute(expr, [1, 2, None, 3]))
[True, True, False, True]
-
class blaze.expr.math.UnaryMath
Mathematical unary operator with a real-valued dshape, such as sin or exp.
-
class blaze.expr.broadcast.Broadcast
Fuse scalar expressions over collections.
Given elementwise operations on collections, e.g.
>>> from blaze import sin
>>> a = symbol('a', '100 * int')
>>> t = symbol('t', '100 * {x: int, y: int}')
>>> expr = sin(a) + t.y**2
It may be best to represent this as a scalar expression mapped over a collection:
>>> sa = symbol('a', 'int')
>>> st = symbol('t', '{x: int, y: int}')
>>> sexpr = sin(sa) + st.y**2
>>> expr = Broadcast((a, t), (sa, st), sexpr)
This provides opportunities for optimized computation.
In practice, expressions are often collected into Broadcast expressions automatically. This class is mainly intended for internal use.
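The benefit of representing `sin(a) + t.y**2` as one scalar expression is that it can be evaluated in a single pass. The unfused approach builds a full intermediate collection for every step. This plain-Python sketch contrasts the two. The three-element inputs are illustrative, and `scalar_expr` only stands in for `sexpr`. It does not show how Blaze's backends actually execute a Broadcast.

```python
import math

# Illustrative inputs standing in for the collections a and t.
a = [0.0, 1.0, 2.0]
t = [{'x': 1, 'y': 2}, {'x': 3, 'y': 4}, {'x': 5, 'y': 6}]

# Unfused: every elementwise step materializes a full intermediate list.
sin_a = [math.sin(v) for v in a]
y_squared = [row['y'] ** 2 for row in t]
unfused = [s + q for s, q in zip(sin_a, y_squared)]


# Fused: write the scalar expression once (the role of sexpr) ...
def scalar_expr(sa, st):
    return math.sin(sa) + st['y'] ** 2


# ... and map it over the inputs in one pass, with no intermediate lists.
fused = [scalar_expr(sa, st) for sa, st in zip(a, t)]

print(fused == unfused)  # True
```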
-
blaze.expr.broadcast.scalar_symbols(exprs)
Give a sequence of scalar symbols that mirror these expressions.
Examples
>>> x = symbol('x', '5 * 3 * int32')
>>> y = symbol('y', '5 * 3 * int32')
>>> xx, yy = scalar_symbols([x, y])
>>> xx._name, xx.dshape
('x', dshape("int32"))
>>> yy._name, yy.dshape
('y', dshape("int32"))
-
blaze.expr.broadcast.broadcast_collect(expr, broadcastable=(<class 'blaze.expr.expressions.Map'>, <class 'blaze.expr.expressions.Field'>, <class 'blaze.expr.datetime.DateTime'>, <class 'blaze.expr.arithmetic.UnaryOp'>, <class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze.expr.expressions.Coerce'>, <class 'blaze.expr.collections.Shift'>, <class 'blaze.expr.strings.Like'>, <class 'blaze.expr.strings.StrCat'>), want_to_broadcast=(<class 'blaze.expr.expressions.Map'>, <class 'blaze.expr.datetime.DateTime'>, <class 'blaze.expr.arithmetic.UnaryOp'>, <class 'blaze.expr.arithmetic.BinOp'>, <class 'blaze.expr.expressions.Coerce'>, <class 'blaze.expr.collections.Shift'>, <class 'blaze.expr.strings.Like'>, <class 'blaze.expr.strings.StrCat'>), no_recurse=None)
Collapse an expression down using Broadcast (tabular cases only).
Expressions whose types are listed in broadcastable are swallowed into Broadcast operations:
>>> t = symbol('t', 'var * {x: int, y: int, z: int, when: datetime}')
>>> expr = (t.x + 2*t.y).distinct()
>>> broadcast_collect(expr)
distinct(Broadcast(_children=(t,), _scalars=(t,), _scalar_expr=t.x + (2 * t.y)))
>>> from blaze import exp
>>> expr = t.x + 2 * exp(-(t.x - 1.3) ** 2)
>>> broadcast_collect(expr)
Broadcast(_children=(t,), _scalars=(t,), _scalar_expr=t.x + (2 * (exp(-((t.x - 1.3) ** 2)))))
-
class blaze.expr.datetime.DateTime
Superclass for datetime accessors.
-
class blaze.expr.split_apply_combine.By
Split-Apply-Combine Operator.
Examples
>>> from blaze import symbol, by
>>> t = symbol('t', 'var * {name: string, amount: int, id: int}')
>>> e = by(t['name'], total=t['amount'].sum())
>>> data = [['Alice', 100, 1],
...         ['Bob', 200, 2],
...         ['Alice', 50, 3]]
>>> from blaze.compute.python import compute
>>> sorted(compute(e, data))
[('Alice', 150), ('Bob', 200)]
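The three phases of split-apply-combine can be spelled out by hand for the `by(t['name'], total=t['amount'].sum())` example. First, bucket rows by key. Next, apply the reduction to each bucket. Finally, combine the results into (key, value) pairs. This plain-Python sketch reproduces the documented result. `group_sum` is a hypothetical helper that supports only a sum reduction.

```python
from collections import defaultdict


def group_sum(rows, key_index, value_index):
    """Sum one column per distinct value of another column."""
    groups = defaultdict(list)
    for row in rows:                      # split: bucket rows by key
        groups[row[key_index]].append(row[value_index])
    # apply sum to each bucket, then combine into (key, total) pairs
    return [(key, sum(values)) for key, values in groups.items()]


data = [['Alice', 100, 1],
        ['Bob', 200, 2],
        ['Alice', 50, 3]]
print(sorted(group_sum(data, key_index=0, value_index=1)))
# [('Alice', 150), ('Bob', 200)]
```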
-
blaze.expr.split_apply_combine.count_values(expr, sort=True)
Count occurrences of elements in this column.
Results are sorted by count by default. Pass the sort=False keyword to avoid this behavior.
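The effect of the `sort` flag can be seen with a plain-Python tally. This sketch assumes that sorted output lists the most frequent values first, as pandas `value_counts` does. The unsorted output keeps first-seen order, which is a property of this sketch only and not a documented Blaze guarantee. `count_values_sketch` is a hypothetical name.

```python
from collections import Counter


def count_values_sketch(values, sort=True):
    """Return (value, count) pairs, most frequent first when sort=True."""
    pairs = list(Counter(values).items())  # first-seen order
    if sort:
        # sorted() is stable, so equal counts keep their first-seen order
        pairs = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return pairs


column = ['b', 'a', 'a', 'c', 'a', 'b']
print(count_values_sketch(column))              # [('a', 3), ('b', 2), ('c', 1)]
print(count_values_sketch(column, sort=False))  # [('b', 2), ('a', 3), ('c', 1)]
```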