Accumulo Adapter
The AccumuloAdapter module reads data from Accumulo key/value stores and produces a NumPy array containing the parsed values.
- The AccumuloAdapter engine is written in C so that returned data is parsed as fast as it can be read from the source. Data is read and parsed in small chunks rather than loaded into memory all at once.
- Python slicing notation can be used to specify a subset of records to be read from the data source.
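The chunked read-and-parse idea can be illustrated in pure Python. This is only a sketch of the general technique, not the adapter's actual C implementation: the function name `parse_in_chunks`, the binary float32 record layout, and the chunk size are all assumptions made for the example.

```python
import io
import struct

def parse_in_chunks(stream, records_per_chunk=2, fmt="<f"):
    """Yield values parsed from a binary stream, a few records at a time."""
    record_size = struct.calcsize(fmt)
    chunk_size = record_size * records_per_chunk
    while True:
        # Only one small chunk is held in memory at any moment.
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # iter_unpack walks the chunk one fixed-size record at a time.
        for (value,) in struct.iter_unpack(fmt, chunk):
            yield value

# Assumed sample data: five little-endian float32 records.
raw = struct.pack("<5f", 0.5, 1.5, 2.5, 3.5, 4.5)
values = list(parse_in_chunks(io.BytesIO(raw)))
print(values)
```

An in-memory `io.BytesIO` always returns whole records here; a real network stream could return partial records, which this sketch does not handle.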
Adapter Methods
Accumulo Adapter Constructor:
AccumuloAdapter (server='localhost', port=42424, username='', password='', table=None, field_type='f8', start_key=None, stop_key=None, start_key_inclusive=True, stop_key_inclusive=False, missing_values=None, fill_value=None):
Create an adapter for connecting to an Accumulo key/value store.
server: Accumulo server address
port: Accumulo port
username: Accumulo user name
password: Accumulo user password
table: Accumulo table to read data from
field_type: str, NumPy dtype to interpret table values as
start_key: str, key of the record where scanning will start
stop_key: str, key of the record where scanning will stop
start_key_inclusive: If True, start_key is inclusive (default is True)
stop_key_inclusive: If True, stop_key is inclusive (default is False)
missing_values: list, missing value strings. Any values in the table equal to one of these strings will be replaced with fill_value.
fill_value: fill value used to replace missing values when scanning
- close ()
- Close connection to the database.
The AccumuloAdapter object supports array slicing:
Read all records: adapter[:]
Read first ten records: adapter[0:10]
Read last record: adapter[-1]
Read every other record: adapter[::2]
Adapter Properties
- field_type (readonly)
- Get dtype of output NumPy array
- start_key
- Get/set key of record where reading/scanning will start. The start_key_inclusive property specifies whether this key is inclusive (default is inclusive).
- stop_key
- Get/set key of record where reading/scanning will stop. The stop_key_inclusive property specifies whether this key is inclusive (default is exclusive).
- start_key_inclusive
- Toggle whether start key is inclusive. Default is True.
- stop_key_inclusive
- Toggle whether stop key is inclusive. Default is False.
- missing_values
- Get/set missing value strings. Any values in the Accumulo table matching one of these strings will be replaced with fill_value.
- fill_value
- Fill value used to replace missing_values. The fill value type should match the specified output type.
Basic Usage
Create an AccumuloAdapter object for the data source:
>>> import iopro
>>> adapter = iopro.AccumuloAdapter(server='172.17.0.1',
...                                 port=42424,
...                                 username='root',
...                                 password='password',
...                                 field_type='f4',
...                                 table='iopro_tutorial_data')
IOPro adapters use slicing to retrieve data. To retrieve records from the table, standard NumPy slicing notation can be used:
>>> # read all records
>>> adapter[:]
array([ 0.5, 1.5, 2.5, 3.5, 4.5], dtype=float32)
>>> # read first three records
>>> adapter[0:3]
array([ 0.5, 1.5, 2.5], dtype=float32)
>>> # read every other record from the first four records
>>> adapter[:4:2]
array([ 0.5, 2.5], dtype=float32)
The Accumulo adapter does not support seeking from the last record.
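To see which record positions a given slice selects, Python's built-in `slice.indices` can be applied to a known record count. The sketch below only illustrates slice semantics and says nothing about the adapter's internals; the helper name `selected_positions` is hypothetical, and the record count of five is assumed from the example output above.

```python
record_count = 5  # assumed: the example table holds five records

def selected_positions(s, n=record_count):
    """Return the record positions that slice `s` selects out of `n` records."""
    return list(range(*s.indices(n)))

all_records = selected_positions(slice(None))               # like adapter[:]
first_three = selected_positions(slice(0, 3))               # like adapter[0:3]
every_other_of_four = selected_positions(slice(None, 4, 2)) # like adapter[:4:2]
print(all_records, first_three, every_other_of_four)
```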
The field_type property can be used to see what type the output NumPy array will have:
>>> adapter.field_type
'f4'
Since Accumulo is essentially a key/value store, results can be filtered based on key. For example, a start key can be set using the start_key property. This will retrieve all values with a key equal to or greater than the start key.
>>> adapter.start_key = 'row02'
>>> adapter[:]
array([ 1.5, 2.5, 3.5, 4.5], dtype=float32)
Likewise, a stop key can be set using the stop_key property. This will retrieve all values with a key less than the stop key but equal to or greater than the start key.
>>> adapter.stop_key = 'row04'
>>> adapter[:]
array([ 1.5, 2.5], dtype=float32)
By default, the start key is inclusive. This can be changed by setting the start_key_inclusive property to False.
>>> adapter.start_key_inclusive = False
>>> adapter[:]
array([ 2.5], dtype=float32)
By default, the stop key is exclusive. This can be changed by setting the stop_key_inclusive property to True.
>>> adapter.stop_key_inclusive = True
>>> adapter[:]
array([ 2.5, 3.5], dtype=float32)
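The combined effect of the four key properties can be modelled with a small pure-Python filter. This is a sketch of the range semantics described above, not the adapter's implementation: the `scan` function is hypothetical, and every row key other than 'row02' and 'row04' is an assumption chosen to match the example output.

```python
def scan(rows, start_key=None, stop_key=None,
         start_key_inclusive=True, stop_key_inclusive=False):
    """Return values whose keys fall in the requested range, in key order."""
    result = []
    for key, value in sorted(rows.items()):
        if start_key is not None:
            # Skip keys before the start key, or the start key itself if exclusive.
            if key < start_key or (key == start_key and not start_key_inclusive):
                continue
        if stop_key is not None:
            # Skip keys after the stop key, or the stop key itself if exclusive.
            if key > stop_key or (key == stop_key and not stop_key_inclusive):
                continue
        result.append(value)
    return result

# Assumed table contents, keyed so that the outputs match the examples above.
rows = {"row01": 0.5, "row02": 1.5, "row03": 2.5, "row04": 3.5, "row05": 4.5}

print(scan(rows, start_key="row02"))
print(scan(rows, start_key="row02", stop_key="row04"))
print(scan(rows, "row02", "row04", start_key_inclusive=False))
print(scan(rows, "row02", "row04", start_key_inclusive=False,
           stop_key_inclusive=True))
```

The four calls correspond, in order, to the four adapter states shown in the examples above.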
The Accumulo adapter can handle missing values. If it is known that the strings 'NA' and 'nan' signify missing float values, the missing_values property can be used to tell the adapter to treat these strings as missing values. The fill_value property specifies the value that replaces them.
>>> adapter = iopro.AccumuloAdapter('172.17.0.1', 42424, 'root', 'password', 'iopro_tutorial_missing_data', field_type='S10')
>>> adapter[:]
array([b'NA', b'nan'], dtype='|S10')
>>> import numpy as np
>>> adapter = iopro.AccumuloAdapter('172.17.0.1', 42424, 'root', 'password', 'iopro_tutorial_missing_data', field_type='f8')
>>> adapter.missing_values = ['NA', 'nan']
>>> adapter.fill_value = np.nan
>>> adapter[:]
array([ nan, nan])
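The replacement rule can be sketched in plain Python. This only models the behaviour described above and is not the adapter's code: the function `parse_values` is hypothetical, and the extra value '1.5' and the alternative fill value -1.0 are assumptions added to show a non-missing entry and a non-NaN fill.

```python
import math

def parse_values(raw_values, missing_values, fill_value):
    """Convert strings to floats, substituting fill_value for missing markers."""
    missing = set(missing_values)
    return [fill_value if raw in missing else float(raw) for raw in raw_values]

# Same missing markers as in the example above, plus one ordinary value.
raw = ["NA", "nan", "1.5"]

filled_with_nan = parse_values(raw, ["NA", "nan"], math.nan)
filled_with_minus_one = parse_values(raw, ["NA", "nan"], -1.0)
print(filled_with_nan)
print(filled_with_minus_one)
```

Using a float fill value here mirrors the note that the fill value type should match the output type.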
Close the database connection:
>>> adapter.close()
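Because the adapter exposes a close() method, the standard library's `contextlib.closing` is one way to make sure the connection is closed even if reading raises an exception. The sketch below uses a hypothetical `StandInAdapter` class so that it runs without an Accumulo server; whether AccumuloAdapter supports the `with` statement directly is not stated here, so `closing` is used rather than assuming it does.

```python
from contextlib import closing

class StandInAdapter:
    """Hypothetical stand-in for an adapter; models only slicing and close()."""

    def __init__(self):
        self.closed = False
        self._values = [0.5, 1.5, 2.5, 3.5, 4.5]  # assumed sample data

    def __getitem__(self, key):
        return self._values[key]

    def close(self):
        self.closed = True

adapter = StandInAdapter()
# closing() calls adapter.close() when the block exits, even on error.
with closing(adapter) as a:
    first_three = a[0:3]

print(first_three, adapter.closed)
```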