Concepts¶
Fusion users¶
- Data scientists create Notebooks for business analysts to use in Fusion. See Data Scientist tasks.
- Business analysts use the created Notebooks to run data science analysis functions in the Fusion pane of Excel. See Business Analyst tasks.
Anaconda data storage and accessibility¶
Data created with Fusion is stored by default in the local directory with the spreadsheet in which it was defined.
If there is no corresponding spreadsheet, the defined data will be stored in a global namespace.
NOTE: All spreadsheets will have access to data that is not connected to a spreadsheet–even though the data itself will reference the spreadsheet it was created in. Otherwise, defined data will only be visible and accessible within the spreadsheet where it was originally defined.
EXAMPLE: A business analyst started a New Excel Workbook without saving it. The data entered in that workbook is visible and accessible to any workbook that calls on the global namespace. Once the New Excel Workbook is saved properly, the data in it will only be visible within the saved spreadsheet.
Notebooks¶
Fusion connects Excel to the Anaconda platform using Jupyter notebooks.
From the Jupyter team:
The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
Data scientists use notebooks to share their work with others. Using Fusion, data scientists write functions in notebooks and share them with business analysts using Excel so that the analysts can leverage advanced data science functionality in a code-free application within Excel .
Functions¶
A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide improvements to the modularity of your application through code reuse.
In Fusion, data scientists write functions in notebooks that provide Excel users with advanced functionality, including database queries, interactive visualizations and machine learning.
To register a function with Fusion, data scientists should use the decorator
@fusion.register()
.
There are 3 types of function parameters available for Fusion:
- Open parameters–The user can enter any value, defined data or reference to Excel ranges.
- Closed parameters–The user must select a value from a list of options.
- Optional parameters–The user is not required to enter a value. If no value is entered, the default value is used. Optional parameters are marked with
*
.
Data¶
In Fusion, data is defined as a predefined dataset that can be used and referenced at any time during a function execution. Fusion Data datasets can be references to Excel ranges and cells or to Python expressions.
Data types and structures¶
Excel data types and shapes are similar to their counterparts in high-level
languages like Python. In general, Fusion takes care of the conversion between
the 2 data types seamlessly; however, it is important that you understand the
more complex structures–such as Ranges
–so that you will be able to
manipulate the source data through transformations, in order to change the
default conversion behavior when needed.
The default data conversion is driven by data serialization in JSON format,
since it’s a strong, stable and de-facto industry standard. Given the use of
JSON, most simple data types, like string
, integer
or float
, are
automatically translated. More complex data type conversions should be reviewed.
Excel data types¶
Excel presents two main complex data types, Ranges and Tables:
Ranges–Represent a set of one, or more, contiguous cells such as a cell, a row, a column or block of cells.
Tables–More complex objects that map on top of a Range but also include metadata that is not available on a Range
EXAMPLE: Explicitly defining headers or table styles and additional operations–PivotTable summaries, removing duplicates, etc.
NOTE: An Excel cell is not considered as a data type on its own and is instead a singular case of a range–with only one element. In this sense, Excel never exports scalar types such as
string
,integer
orfloat
.
The current version of Fusion fully supports ranges, including named ranges–ranges with a custom alias name; however, tables are not directly supported at the moment. It is possible to manipulate the contents of a table by manipulating the range mapped to it.
By default, Excel defines Range values as nested arrays.
EXAMPLE: A 3x3 range, $A2:$C4
, will map to an array with 3 elements, in
which every element is itself an array of 3 elements:
[['name', 'age', 'city'], ['Bob', 42, 'Austin'], ['Barbara', 24, 'NY']]
The same principle applies to cells
, rows
–in the sense of horizontal
range selections and columns
–in the sense of vertical range selections.
Given the previous example where the content of $A2:$C4
is
[['name', 'age', 'city'], ['Bob', 42, 'Austin'], ['Barbara', 24, 'NY']]
.
Then:
- the cell
$B3
content is[['42']]
. - the row
A3:$C3
content is[['Bob',42,'Austin']]
. - the column
$A2:$A4
content is[['name'], ['bob'], ['Barbara']]
.
Inputs¶
Inputs are the values assigned to function parameters.
In Fusion, business analysts assign values to function parameters in the Input section. The name of the parameter is displayed on the left and the value is displayed on the right.
The input values can be:
A manually entered value, such as a number or a reference to an Excel cell, cell range, or name.
EXAMPLE:
2
,A2
,A2:B4
orPrice
A named dataset defined inside Fusion; see data.
EXAMPLE:
noisy_circles_small
A reference to an Excel range
EXAMPLE:
A2:B4
Output¶
Output is what results from running Fusion functions for the inputs.
In Fusion, business analysts view their results in the Output section as:
- Visualizations displayed in Fusion.
- Output data, such as query results or model scores.