Getting started

This getting started guide walks you, the data scientist, through using Fusion for the first time after it has been installed and activated.

After completing this guide, you will be able to:

  • Open a demo data file and run a clustering algorithm in Excel.
  • Display an interactive plot of the clustering results in Fusion.
  • Create a Jupyter Notebook to export functions from Anaconda to Excel.

Before you start

If you have not yet installed and started Fusion, you must do so before using this guide. For more information, see Installation.

Opening the Clustering notebook on Windows

To open the notebook named Clustering:

  1. Click Start and then select the Fusion Example Spreadsheet icon.
  2. Open the clustering.xlsx demo spreadsheet in Excel 2016.
  3. In the Excel ribbon, click the Insert tab then select My Add-ins.
  4. In the Office Add-ins window, click the Shared Folder tab and select Anaconda Fusion.
  5. Click the OK button.
  6. In the Fusion pane, click the Notebooks tab.
  7. In the example notebooks list, select clustering.ipynb.

Opening the Clustering notebook on macOS

To open the notebook named Clustering:

  1. Open the terminal window, and run the command fusion-examples.

  2. In the Finder window, select the clustering.xlsx spreadsheet and open it in Excel 2016.

  3. In the Excel ribbon, click the Insert tab then go to My Add-ins > Anaconda Fusion to activate the Fusion Add-in:

    ../../../_images/fusion_menu_activation_mac.png

  4. In the Fusion pane, click the Notebooks tab.

  5. In the example notebooks list, select clustering.ipynb.

Executing the clustering algorithm in Excel

In the following clustering demonstration, you will learn how to execute code created by a data scientist directly from Excel. You will execute Python code that runs a machine learning algorithm on your data and visualizes the resulting output.

Creating a dataset from Excel data

To select data in the spreadsheet and make it visible to Fusion as a named dataset:

  1. In the NOISY_CIRCLES table of the clustering.xlsx spreadsheet, find the columns x and y.

  2. Highlight at least 100 rows of those two columns, without selecting the x and y column headers.

  3. In the Fusion pane, click the Data tab, then select Current Selection:

    ../../../_images/fusion_add_data.png

  4. In the Name field, type noisy_circles_small to name the dataset you selected in step 2.

  5. Click the Confirm button to save the dataset.

    ../../../_images/fusion_confirm_add_data.png

  6. Click the clustering.ipynb tab at the bottom of the pane.

Running the clustering algorithm from the Fusion pane

  1. In the list at the top of the Fusion pane, select clustering.

  2. In the three lists under Inputs, select the following parameters for the algorithm:

    • In the Select Data list, select your noisy_circles_small dataset.
    • In the Select Algorithm list, select MiniBatchKMeans.
    • In the n_clusters list, leave the default selection ---.
    ../../../_images/fusion_clustering_params.png

  3. Click the Run button to produce a plot of the clustering results in Fusion:

    ../../../_images/fusion_clustering_plot.png

The Python code you executed from within Excel runs a machine learning algorithm on your data and visualizes its output.
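This guide does not show the code inside clustering.ipynb. For readers curious about what a mini-batch k-means pass does with two-column data such as x and y, the following is a minimal standard-library sketch, not the notebook's implementation. The function names, the synthetic data generator, and the values chosen for n_clusters, batch_size, and n_iter are all assumptions made for illustration.

```python
import math
import random


def make_noisy_circles(n_points, noise=0.05, seed=0):
    """Generate [x, y] rows on two concentric circles with Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_points):
        radius = 1.0 if i % 2 == 0 else 0.5  # alternate outer and inner circle
        angle = rng.uniform(0.0, 2.0 * math.pi)
        rows.append([
            radius * math.cos(angle) + rng.gauss(0.0, noise),
            radius * math.sin(angle) + rng.gauss(0.0, noise),
        ])
    return rows


def nearest_center(point, centers):
    """Return the index of the center closest to point (squared distance)."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, center))
        for center in centers
    ]
    return distances.index(min(distances))


def mini_batch_kmeans(rows, n_clusters=2, batch_size=20, n_iter=50, seed=0):
    """Cluster rows with mini-batch k-means; return (labels, centers)."""
    rng = random.Random(seed)
    # Start from randomly chosen rows (assumed initialization strategy).
    centers = [list(row) for row in rng.sample(rows, n_clusters)]
    counts = [0] * n_clusters
    for _ in range(n_iter):
        batch = rng.sample(rows, min(batch_size, len(rows)))
        # Assign the whole batch first, then update the centers.
        assignments = [nearest_center(row, centers) for row in batch]
        for row, idx in zip(batch, assignments):
            counts[idx] += 1
            rate = 1.0 / counts[idx]  # per-center learning rate
            centers[idx] = [
                (1.0 - rate) * c + rate * x
                for c, x in zip(centers[idx], row)
            ]
    labels = [nearest_center(row, centers) for row in rows]
    return labels, centers


rows = make_noisy_circles(100)  # stands in for 100 selected spreadsheet rows
labels, centers = mini_batch_kmeans(rows, n_clusters=2)
```

Each entry in labels is the cluster index assigned to the corresponding row, which is the kind of information a clustering plot colors by.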

Running the clustering algorithm in the Excel formula bar

To run the clustering algorithm in the Excel formula bar:

NOTE: Parameters in brackets are optional and their default values will be used if you do not specify new ones.

  1. Select an empty cell and type =clustering(data, [algorithm], [n_clusters]).

    EXAMPLE: =clustering(B3:C1502, "MiniBatchKMeans", 5)

  2. Press the Enter key to execute the algorithm.

The Python code you executed from within Excel runs a machine learning algorithm on your data and visualizes its output.
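Optional parameters like these correspond naturally to Python default arguments. The sketch below is a hypothetical stand-in, not the function defined in clustering.ipynb: the default values shown for algorithm and n_clusters are assumptions, and the body only reports which arguments it received.

```python
def clustering(data, algorithm="MiniBatchKMeans", n_clusters=2):
    """Hypothetical stand-in that only reports the arguments it received."""
    return {
        "rows": len(data),
        "algorithm": algorithm,
        "n_clusters": n_clusters,
    }


data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

# Omitting the optional parameters falls back to the (assumed) defaults.
only_data = clustering(data)

# Supplying every parameter mirrors the formula-bar example.
all_arguments = clustering(data, "MiniBatchKMeans", 5)
```

A call that passes only the data behaves like a formula that leaves out the bracketed parameters.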

Creating a notebook to export functions to Excel

If you already use Python for data analysis and want to make your code available to coworkers who use Excel, this section shows you how to create Python code that others can open and execute from Excel.

NOTE: The following script will calculate and display the sum of the even numbers in an Excel list.

To run this script:

  1. If the Fusion server is not running, click Start and select the Fusion icon. A black window with white text opens, showing the Fusion server log.

  2. Open a browser.

  3. Type the address of the Jupyter kernel that was installed with Fusion. The default address is https://localhost:9888.

    TIP: If you did not use the default address, you can find it by opening the Fusion window and looking for the localhost:... string. Type that address in your browser.

    ../../../_images/fusion_jupyter_opening.png

  4. In your Jupyter Notebooks window, click the Files tab.

  5. On the Files tab, click on the Notebooks folder.

  6. Click the New button and select Python [default] to display a new Jupyter Notebook with the Python 3 kernel.

  7. In the new notebook’s single cell, type:

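      from anacondafusion.fusion import fusion

      # Register add_evens so that Fusion exposes it to Excel.
      @fusion.register()
      def add_evens(data):
          # data is treated as rows of cells: loop over rows, then items.
          total = 0
          for row in data:
              for item in row:
                  if item % 2 == 0:
                      total += item
          return total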
    
    TIP: The ``add_evens`` function is exposed to Excel with the ``@fusion.register`` decorator.
    
    .. figure:: /img/fusion_jupyter_function.png
      :width: 50%
    

  8. Click on File and select Save and Checkpoint.

    NOTE: At the top left, next to the Jupyter symbol, is the notebook name followed by “Last Checkpoint:...”. By default, the notebook name is Untitled.

The add_evens function is saved and can now be used.
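Before switching to Excel, you may want to check the logic in plain Python. The sketch below repeats the function without the Fusion decorator so that it runs anywhere. It assumes that a one-row selection reaches the function as a list containing a single row, which this guide does not state explicitly.

```python
def add_evens(data):
    """Sum the even numbers in a 2-D collection of rows."""
    total = 0
    for row in data:
        for item in row:
            if item % 2 == 0:
                total += item
    return total


# Assumed shape of a one-row selection holding 1, 2, 3, and 4.
result = add_evens([[1, 2, 3, 4]])
print(result)  # 6
```

Only 2 and 4 are even, so the total is 6.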

Using the Jupyter Notebooks interface

The Jupyter Notebooks interface in your browser is the best way to create, edit, and delete your Fusion notebooks. To run a function from a saved notebook in Excel:

  1. In the Excel ribbon, click the Insert tab then select My Add-ins.

  2. In the Office Add-ins window, click the Shared Folder tab and select Anaconda Fusion.

  3. Click the OK button.

  4. Click the Notebooks tab and then select the notebook you saved in the last section. By default, it was named Untitled.ipynb.

  5. In the list with the add_evens function displayed, verify that add_evens is selected.

  6. On the blank Excel worksheet, in cells B2:E2 type 1, 2, 3, and 4.

  7. Select B2:E2.

  8. Name the dataset mydata in Fusion, as described in steps 9 through 11.

  9. In the Fusion pane, select Data and Current Selection.

  10. In the Name field, type mydata.

  11. Click the Confirm button to define mydata as the dataset you selected in Excel. This creates an object, accessible to Jupyter, that points to your Excel dataset.

  12. In Excel, click on cell C4.

    NOTE: This cell will contain the result.

  13. Call the add_evens function on mydata by clicking Fusion’s Inputs list and selecting mydata.

  14. Click the Run button.

In Excel, in the blank cell C4 that you selected, you should see the sum 6.
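Real worksheets often contain blank or text cells, and the add_evens function as written would raise an error on a value that does not support the % operator. The variant below is one possible way to skip such cells. It is a sketch under assumptions: the name add_evens_tolerant is hypothetical, and this guide does not say how Fusion represents blank or text cells, so None and strings are used here only as examples.

```python
from numbers import Real


def add_evens_tolerant(data):
    """Sum the even numbers in rows of cells, skipping non-numeric cells."""
    total = 0
    for row in data:
        for item in row:
            # Skip booleans and anything that is not a real number
            # (for example None or text, assumed stand-ins for odd cells).
            if isinstance(item, bool) or not isinstance(item, Real):
                continue
            if item % 2 == 0:
                total += item
    return total


# A row mixing integers, a float, a blank (None), and text.
mixed_total = add_evens_tolerant([[1, 2.0, None, "n/a", 4]])
```

To expose a variant like this to Excel, you would register it in the notebook the same way add_evens is registered.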

More practice

At this point you may want to go back to your browser and look at the first example to see how we wrote the clustering.ipynb notebook.

Other Output options

In the Output section of Fusion, the Options link is displayed. Clicking this link displays the Select Default Output list, which sets the default way Fusion outputs data to Excel: either Selection or Cell/Range.

The Output section also displays the Export link which exports the most recent result from Fusion to Excel.

The Select Export Destination list also offers a choice of Selection or Cell/Range.

Next steps

Now that you have a basic understanding of how to use Fusion, you are ready to start performing some user-specific tasks.

For more information, see Business Analyst tasks and/or Data Scientist tasks.