Data Scientist tasks

A data scientist is a user who authors notebooks and distributes them to other users for use in Excel. This guide describes how to register Python functions for Excel, offer predefined inputs, document functions for end users, and display Bokeh plots.

Registering Excel functions

To register a Python function that will be exposed in Excel using Fusion:

  1. Import fusion, apply the @fusion.register() decorator to the function, and run the code:

    from anacondafusion.fusion import fusion
    
    @fusion.register()
    def add_evens(data):
        total = 0
        for item in data:
            if item % 2 == 0:
                total = total + item
        return total
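
The logic of add_evens can be checked in plain Python before the function is registered. The sketch below repeats the function without the Fusion import and decorator so that it runs outside Excel; the sample values are made up, and it assumes that `data` arrives as a flat iterable of integers, which this guide does not confirm.

```python
# Same logic as add_evens above, minus the Fusion decorator, so that it
# can be exercised outside Excel.
# Assumption: `data` is a flat iterable of integers.
def add_evens(data):
    total = 0
    for item in data:
        if item % 2 == 0:  # keep only even values
            total += item
    return total


sample = [1, 2, 3, 4, 10]  # made-up test values
result = add_evens(sample)
print(result)
```

Once the function behaves as expected, the import and decorator can be added back unchanged.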
    

Passing a list of predefined inputs

You can give Excel users drop-down lists of predefined inputs by passing the allowed values for each argument in the args parameter of the decorator.

EXAMPLE:

algorithms = ['MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
              'SpectralClustering', 'Ward', 'AgglomerativeClustering',
              'Birch', 'DBSCAN']

@fusion.register(args={'algorithm': {"values": algorithms},
                       'n_clusters': {"values": [3, 4, 5, 6]}})
def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
    ...  # function body omitted

The Excel user can then select the values for algorithm and n_clusters from drop-down lists containing the options you entered:

../../../../_images/fusion_clustering_options.png
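
When several arguments need predefined values, the nested dictionary can be built from plain lists. The following is a minimal sketch: `make_args` is a hypothetical helper, not part of Fusion, and it only reproduces the `{'name': {"values": [...]}}` shape used in the example above. It makes no claim about how Fusion validates the mapping.

```python
def make_args(**choices):
    """Build an args mapping of the form {name: {"values": [...]}}.

    Hypothetical helper for illustration; it is not part of Fusion.
    """
    args = {}
    for name, values in choices.items():
        values = list(values)  # accept any iterable, e.g. range()
        if not values:
            raise ValueError(f"{name!r} needs at least one value")
        args[name] = {"values": values}
    return args


algorithms = ['MiniBatchKMeans', 'DBSCAN']  # shortened list for the sketch
args = make_args(algorithm=algorithms, n_clusters=range(3, 7))
print(args)
```

The resulting dictionary has the same form as the one written by hand above, so it could be passed as `@fusion.register(args=args)`.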

Documenting functions for end users

You can document your Fusion functions for end users by writing their docstrings in markdown.

EXAMPLE:

@fusion.register(args={'algorithm': {"values": algorithms},
                       'n_clusters': {"values": [3, 4, 5, 6]}})
def clustering(data, algorithm='MiniBatchKMeans',
               n_clusters=3):
    """
    Use Clustering function
    -----------------------

    The clustering function receives a 2-column table (x, y)
    `data` and applies the selected `algorithm` with the
    number of clusters `n_clusters`.

    The available algorithms are:

    * MiniBatchKMeans
    * AffinityPropagation
    * MeanShift
    * SpectralClustering
    * Ward
    * AgglomerativeClustering
    * Birch
    * DBSCAN

    For more information, see `clustering and scikit-learn
    <http://scikit-learn.org/stable/modules/clustering.html>`_.
    """

When a user clicks the Information (i) icon next to the function in the Fusion pane of Excel, the documentation written in the docstring is displayed:

../../../../_images/fusion_clustering_docs.png
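
To preview the docstring text before distributing a notebook, the standard library's `inspect.getdoc` returns a function's docstring with the common indentation removed. The sketch below uses a shortened, undecorated copy of the clustering function so that it runs without Fusion; how Fusion itself reads and renders the docstring is not described in this guide, so this is only a way to inspect the raw text.

```python
import inspect


# Shortened, undecorated copy of the clustering function, so that the
# sketch runs without Fusion installed.
def clustering(data, algorithm='MiniBatchKMeans', n_clusters=3):
    """
    Use Clustering function
    -----------------------

    The clustering function receives a 2-column table (x, y)
    `data` and applies the selected `algorithm` with the
    number of clusters `n_clusters`.
    """


# getdoc() strips the leading indentation and surrounding blank lines.
doc_text = inspect.getdoc(clustering)
print(doc_text)
```

Reading the cleaned text this way makes indentation mistakes or a missing closing quote easy to spot.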

Plotting with Bokeh

When plotting with Bokeh, pass the figure to the display_plot function so that the plot is displayed in Fusion.

EXAMPLE:

from bokeh.plotting import figure
from anacondafusion.fusion import fusion, display_plot

@fusion.register()
def plot_example():
    # Build a 400x400 figure with five semi-transparent navy circles
    plot = figure(plot_width=400, plot_height=400)
    plot.circle([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], size=20,
                color="navy", alpha=0.5)

    # Hand the figure to Fusion for display
    display_plot(plot)

../../../../_images/fusion_display_plot.png