Developer guide
This section is for those who wish to author notebooks to distribute to users for use in Excel.
Register functions for Excel
To expose a Python function inside Excel with Anaconda Fusion, decorate it with @fusion.register():
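```python
from anacondafusion.fusion import fusion

@fusion.register()
def add_evens(data):
    # Excel passes a range as a list of rows (nested lists),
    # so visit every cell in every row.
    total = 0
    for row in data:
        for item in row:
            if item % 2 == 0:
                total = total + item
    return total
```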
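The function above assumes every cell holds a number. If the range includes text, such as a header cell, `item % 2` would typically raise a `TypeError`. The following is a minimal sketch of a more defensive variant. It assumes the range arrives as nested lists, and the name `add_evens_numeric` is hypothetical. It is written as plain Python without the decorator so the logic can be checked outside Excel.

```python
from itertools import chain
from numbers import Real


def add_evens_numeric(data):
    """Sum the even numbers in a range, skipping non-numeric cells."""
    total = 0
    # chain.from_iterable flattens the list of rows into one stream of cells
    for item in chain.from_iterable(data):
        # Skip text, empty cells, and booleans (bool is a subclass of int)
        if isinstance(item, bool) or not isinstance(item, Real):
            continue
        if item % 2 == 0:
            total += item
    return total


print(add_evens_numeric([['value', 'note'], [2, 'a'], [3, 4.0]]))  # 6.0
```

To expose a function like this in Excel, decorate it with @fusion.register() as shown above.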
To offer Excel users a drop-down menu of predefined input values, pass the values for each parameter in the args argument of the decorator. For example:
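```python
# Assumes `fusion` was imported as in the previous example
algorithms = ['MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
              'SpectralClustering', 'Ward', 'AgglomerativeClustering',
              'Birch', 'DBSCAN']

@fusion.register(args={'algorithm': {"values": algorithms},
                       'n_clusters': {"values": [3, 4, 5, 6]}})
def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
    ...
```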
The Excel user can then select values for algorithm and n_clusters from drop-down menus that contain the given options.
Document your functions for end users
You can document your Fusion functions for end users by writing docstrings, which can contain Markdown. For example:
```python
@fusion.register(args={'algorithm': {"values": algorithms},
                       'n_clusters': {"values": [3, 4, 5, 6]}})
def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
    """
    Use Clustering function
    -----------------------

    The clustering function receives a 2-column table (x, y) `data`
    and applies the selected `algorithm` with the number of clusters `n_clusters`.

    The available algorithms are:

    * MiniBatchKMeans
    * AffinityPropagation
    * MeanShift
    * SpectralClustering
    * Ward
    * AgglomerativeClustering
    * Birch
    * DBSCAN

    See more information about `clustering and scikit-learn <http://scikit-learn.org/stable/modules/clustering.html>`_.
    """
    ...  # clustering implementation omitted
```
When a user clicks the Information (i) icon next to the function in the Anaconda Fusion UI, they can read the documentation written in the docstring.
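To check the text of a docstring before publishing the notebook, you can read it with Python's standard inspect.getdoc, which removes the indentation that the source code adds. This sketch assumes that Fusion displays the function's `__doc__` text. The exact rendering in the Fusion UI may differ, and the function scale is a hypothetical stand-in.

```python
import inspect


def scale(data, factor=2):
    """
    Scale values
    ------------

    Multiplies every cell in the range `data` by `factor`.
    """
    return [[cell * factor for cell in row] for row in data]


# getdoc strips the leading blank line and the common indentation
print(inspect.getdoc(scale))
```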
Data types and structures
Excel data types and shapes closely resemble their counterparts in high-level languages like Python. In general, Anaconda Fusion handles the conversion between the two sides transparently. However, it is important to understand the more complex structures, such as ranges, so that you can transform the source data when you need to override the default conversion.
The default data conversion relies on JSON serialization, since JSON is a stable and widely adopted de facto standard. As a result, simple data types like string, integer, or float are translated automatically, while the conversion of more complex data types deserves a closer look.
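The snippet below illustrates what a JSON round trip does to Python values. It uses only Python's standard json module and does not call Fusion. It assumes Fusion's conversion behaves like a plain JSON round trip, and the helper json_round_trip is hypothetical.

```python
import datetime
import json


def json_round_trip(value):
    """Serialize a value to JSON text and parse it back."""
    return json.loads(json.dumps(value))


# Simple scalar types come back unchanged
for value in ['Austin', 42, 3.5, True, None]:
    assert json_round_trip(value) == value

# A range is a list of rows; nested lists keep their shape
range_values = [['name', 'age', 'city'], ['Bob', 42, 'Austin']]
print(json.dumps(range_values))  # [["name", "age", "city"], ["Bob", 42, "Austin"]]

# Tuples have no JSON counterpart and come back as lists
print(json_round_trip((1, 2)))  # [1, 2]

# Values that JSON cannot encode, such as dates, need an explicit conversion
try:
    json.dumps(datetime.date(2017, 1, 1))
except TypeError:
    print('date objects must be converted first, e.g. with .isoformat()')
```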
Excel data types
Excel has two main complex data types, ranges and tables:
- A range is a set of one or more contiguous cells, such as a single cell, a row, a column, or a block of cells.
- A table is a more complex object that maps onto a range but also carries metadata that a range lacks (such as explicitly defined headers or table styles) and supports additional operations (such as PivotTable summaries or removing duplicates).
NOTE: An Excel cell is not a data type of its own. It is simply a range with a single element. As a result, Excel never exports scalar types such as string, integer, or float.
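A single cell arrives as a one-element range, so a function that expects a scalar has to unwrap it first. The following is a minimal sketch that assumes the cell arrives as a nested list such as [[42]]. The helper unwrap_cell is hypothetical and not part of Fusion.

```python
def unwrap_cell(value):
    """Return the scalar inside a single-cell range such as [[42]].

    Anything that is not a 1x1 nested list is returned unchanged.
    """
    if (isinstance(value, list) and len(value) == 1
            and isinstance(value[0], list) and len(value[0]) == 1):
        return value[0][0]
    return value


print(unwrap_cell([[42]]))    # 42
print(unwrap_cell([[1, 2]]))  # [[1, 2]]
```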
The current version of Anaconda Fusion fully supports ranges and named ranges (ranges with a custom alias), but it does not support tables directly. To work with the contents of a table, manipulate the range that the table maps to.
By default, Excel represents range values as nested arrays: a list of rows, where each row is itself a list of cell values. For example, the 3x3 range $A2:$C4 maps to an array of 3 elements, each of which is an array of 3 elements:
[['name', 'age', 'city'], ['Bob', 42, 'Austin'], ['Barbara', 24, 'NY']]
The same principle applies to single cells, rows (horizontal range selections), and columns (vertical range selections). Given the content of $A2:$C4 above:
- the content of cell $B3 is: [[42]]
- the content of row $A3:$C3 is: [['Bob', 42, 'Austin']]
- the content of column $A2:$A4 is: [['name'], ['Bob'], ['Barbara']]
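These shapes are easier to work with after a small transformation. The sketch below flattens a single row or column into a flat list and turns a range with a header row into a list of records. It uses only the standard library, the helper names flatten_vector and to_records are hypothetical, and it assumes ranges arrive as the nested lists shown above.

```python
def flatten_vector(rng):
    """Flatten a one-row or one-column range into a flat list of values."""
    if len(rng) == 1:                      # a row: [[a, b, c]]
        return list(rng[0])
    if all(len(row) == 1 for row in rng):  # a column: [[a], [b], [c]]
        return [row[0] for row in rng]
    raise ValueError('expected a single row or a single column')


def to_records(rng):
    """Use the first row as headers and return one dict per remaining row."""
    headers, *rows = rng
    return [dict(zip(headers, row)) for row in rows]


people = [['name', 'age', 'city'], ['Bob', 42, 'Austin'], ['Barbara', 24, 'NY']]

print(flatten_vector([['Bob', 42, 'Austin']]))           # ['Bob', 42, 'Austin']
print(flatten_vector([['name'], ['Bob'], ['Barbara']]))  # ['name', 'Bob', 'Barbara']
print(to_records(people)[0])  # {'name': 'Bob', 'age': 42, 'city': 'Austin'}
```

If pandas is available in the notebook, pandas.DataFrame(people[1:], columns=people[0]) builds the equivalent table.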