Getting started

Anaconda Mosaic is available only to current Anaconda Enterprise subscribers. Mosaic is not available separately.

In this Anaconda Mosaic getting started guide you will learn how to:

  • launch Mosaic;
  • import your dataset to the Mosaic catalog;
  • transform your imported dataset;
  • explore your data with visualizations directly available from Mosaic;
  • dive deeper into your data from a Jupyter Notebook for further analysis.

If you have not yet installed Mosaic, follow the Install and update instructions.

Launch Mosaic

After installing, launch Mosaic from a terminal:

anaconda-mosaic

Your default browser will open the following window:

Figure: Mosaic Opening View

Log in with the username “admin” and the password “admin”.

Working with the sample dataset

We provide the classic Iris dataset as a built-in sample for initial exploration. The pane on the right shows its schema information (called Datashape in Mosaic) and other metadata about the dataset.

Let’s look at what Mosaic can do, starting with data transformations.

You can transform your data by entering Blaze expressions in the input box at the top of the center pane.

The current dataset is always denoted x. The notation should be familiar if you have used pandas or Blaze before. Enter x[x.sepal_length > 5.0] in the input box and click Apply to keep only the rows where the sepal length is greater than 5.

Figure: Selecting rows from the Iris dataset
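If it helps to see what the selection computes, here is a minimal plain-Python sketch of the same filter. It is not how Mosaic or Blaze evaluate expressions. It assumes a CSV whose columns match the names used above (petal_length and petal_width are assumed), and it uses a handful of illustrative rows rather than the full Iris dataset.

```python
import csv
import io

# A handful of illustrative rows in an assumed iris.csv layout (not the full dataset).
SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
"""

NUMERIC_COLUMNS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def load_rows(text):
    """Parse CSV text into a list of dicts, converting measurements to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in NUMERIC_COLUMNS:
            row[column] = float(row[column])
        rows.append(row)
    return rows


x = load_rows(SAMPLE_CSV)

# Plain-Python counterpart of the Blaze expression x[x.sepal_length > 5.0]:
# keep every row whose sepal length is strictly greater than 5.0.
selected = [row for row in x if row["sepal_length"] > 5.0]

print(len(selected), "of", len(x), "rows kept")  # prints: 5 of 7 rows kept
```

The rows with sepal lengths of 4.9 and exactly 5.0 are dropped, because the comparison is strict.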

When working with large datasets, you can preview your changes by clicking Preview instead of Apply.

While you can write Blaze expressions manually, you don’t have to. Mosaic provides buttons that paste a template of the chosen expression into the input box. Let’s select the rows where the sepal length is greater than 5 again, this time using the select button.

Figure: Entering Blaze expressions with a button

Notice the help icon “?” next to the input box.

Figure: Obtaining help about Blaze expressions (a)

Click it to see direct links to the documentation for that Blaze expression.

Figure: Obtaining help about Blaze expressions (b)

The button inserts a template for the transformation. Edit it so that the input box reads x[x.sepal_length > 5], then click Apply. You should get the same results as before.

You can get to the information you need by applying transformations in sequence. For example, to find out how many observations of each species have a sepal length greater than 5 cm, first apply the selection above, then group by species and count with the by expression. Click the by button, then enter by(x.species, total=x.species.count()):

Figure: Expression to count observations by species where the sepal length exceeds 5

Figure: Count of observations by species where the sepal length exceeds 5
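The two-step sequence above can be sketched in plain Python as a filter followed by a grouped count. This only illustrates what the two expressions compute, not Mosaic's implementation; it assumes the same CSV layout and a few illustrative rows rather than the full Iris dataset.

```python
import csv
import io
from collections import Counter

# A handful of illustrative rows in an assumed iris.csv layout (not the full dataset).
SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Step 1, like x[x.sepal_length > 5]: the 5.0 row is excluded (strict comparison).
selected = [row for row in rows if float(row["sepal_length"]) > 5]

# Step 2, like by(x.species, total=x.species.count()): one total per species.
totals = Counter(row["species"] for row in selected)
result = [{"species": s, "total": n} for s, n in sorted(totals.items())]

for record in result:
    print(record["species"], record["total"])
```

With these sample rows the totals are 2 for setosa, 2 for versicolor, and 1 for virginica; the real dataset will give different numbers.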

Below the input box, Mosaic keeps track of the transformations you apply to your data, so you can build expressions interactively and check each step as you go. If you get the wrong results at any point, go back a step by clicking the “x” symbol next to that step in the sequence Mosaic recorded.
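One way to picture the recorded sequence is as an ordered list of steps that is replayed against the original data, where going back a step means dropping a step and replaying the rest. The sketch below is a hypothetical model of that idea, not Mosaic's internal design; the step functions `select_long_sepals` and `count_by_species` are illustrative names, and the rows are a small illustrative sample.

```python
import csv
import io
from collections import Counter

# A handful of illustrative rows in an assumed iris.csv layout (not the full dataset).
SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
"""


def select_long_sepals(rows):
    """Hypothetical step mirroring x[x.sepal_length > 5]."""
    return [row for row in rows if float(row["sepal_length"]) > 5]


def count_by_species(rows):
    """Hypothetical step mirroring by(x.species, total=x.species.count())."""
    totals = Counter(row["species"] for row in rows)
    return [{"species": s, "total": n} for s, n in sorted(totals.items())]


def replay(data, steps):
    """Apply the recorded steps in order, feeding each result into the next."""
    for step in steps:
        data = step(data)
    return data


original = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
history = [select_long_sepals, count_by_species]
print(replay(original, history))

# Going back a step: drop the last recorded step and replay from the original data.
history.pop()
print(len(replay(original, history)), "rows after undoing the grouping step")
```

Replaying from the unmodified original keeps every step a pure function of its input, so removing a step never requires reversing a transformation.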

For more details on writing your own Blaze expressions, consult the Blaze documentation.

Visualize your data

Below the input box, in the center, you will find the Table, Plot, List, and Stats tabs. The Table tab is selected by default and shows a tabular representation of your data.

In the Plot tab you can explore your data with Mosaic’s built-in visualizations.

You can now choose a plot type (such as bar or scatter), the variables to place on each axis, and the variables to encode with color and size. For example, you can quickly make sense of the Iris dataset with a scatter plot of sepal length against sepal width, with the dots colored by species. (Make sure you are working with the original Iris dataset, not the transformed result from the previous section.)

Figure: Petal length and width by species
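Color coding by species amounts to splitting the points into one series per species and giving each series its own color. The plain-Python sketch below shows only that grouping step, which is the kind of input a plotting library typically takes; it is not how Mosaic draws its plots, the palette is an arbitrary assumption, and the rows are illustrative.

```python
import csv
import io
from collections import defaultdict

# A handful of illustrative rows in an assumed iris.csv layout (not the full dataset).
SAMPLE_CSV = """\
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# Arbitrary, assumed palette: one color per species.
PALETTE = {"setosa": "red", "versicolor": "green", "virginica": "blue"}


def scatter_series(rows, x_column, y_column, color_column):
    """Group (x, y) points by the category in color_column, one series per category."""
    series = defaultdict(list)
    for row in rows:
        point = (float(row[x_column]), float(row[y_column]))
        series[row[color_column]].append(point)
    return dict(series)


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
series = scatter_series(rows, "sepal_length", "sepal_width", "species")
for species, points in series.items():
    print(f"{species} ({PALETTE[species]}): {points}")
```

Each series can then be drawn as a separate set of markers in its assigned color, which produces the species legend seen in the plot.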

Dive deeper with Jupyter Notebook

Mosaic itself is well suited to quick exploration of the data and to evaluating simple hypotheses. It also works with other environments, so you can create custom visualizations, apply statistical and machine learning models, and dig deeper into your data in any other way.

You can switch to a Jupyter Notebook and import your data straight from Mosaic by clicking the “Open Dataset in New Notebook” button in the top right corner.

Figure: Opening your dataset in a new notebook

In the notebook you can use the full power of any package in Anaconda to work with your data and obtain the answers you need.

Import your dataset to the Mosaic catalog

Now that you have learned the basics of how Mosaic works, you are ready to work with your own data.

To import a dataset to the Mosaic catalog, click the “+” button in the top left corner. You should see the following:

Figure: Adding a dataset to Mosaic

Add your dataset by entering its Uniform Resource Identifier (URI) and choosing a name to label it. You may also provide a short description of your data.

Here we will import the same iris.csv dataset as before.

Figure: Importing a CSV file into Mosaic

You should now see the first rows of the Iris dataset, just like before.