Quickstart¶
In this Anaconda Mosaic quickstart you will learn how to:
- launch Mosaic;
- import your dataset to the Mosaic catalog;
- transform your imported dataset;
- explore your data with visualizations directly available from Mosaic;
- dive deeper into your data from a Jupyter Notebook for further analysis.
If you have not yet installed Mosaic, follow the Install and update instructions.
Launch Mosaic¶
After installing, launch Mosaic:
anaconda-mosaic
Your default browser will open the following window:

Mosaic Opening View
Log in with the username “admin” and the password “admin”.
Working with the sample dataset¶
We provide the classic Iris dataset as a built-in sample for initial exploration.
To the right you can see the schema information (called Datashape in Mosaic) and other metadata about your dataset.
Let’s see a sample of what we can do with Mosaic, starting with data transformations.
You can transform your data by entering Blaze expressions in the input box at the top of the center pane.
The dataset you have imported is always denoted x. The notation should be familiar if you have used pandas or Blaze before. Enter x[x.sepal_length > 5.0] in the input box and click Apply to obtain only those rows where the sepal length is greater than 5.

Selecting rows from the Iris dataset
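To make explicit what this selection computes, here is a minimal sketch in plain Python. It is neither Blaze nor Mosaic code. The rows are made-up illustrative values rather than the real Iris data, and the helper name select_long_sepals is hypothetical.

```python
# Illustrative rows only: these values are made up, not the real Iris data.
rows = [
    {"sepal_length": 4.9, "species": "setosa"},
    {"sepal_length": 5.8, "species": "setosa"},
    {"sepal_length": 6.4, "species": "versicolor"},
    {"sepal_length": 5.0, "species": "versicolor"},
    {"sepal_length": 7.1, "species": "virginica"},
]


def select_long_sepals(rows, threshold=5.0):
    """Keep rows whose sepal_length is strictly greater than threshold.

    Plain-Python counterpart of the expression x[x.sepal_length > 5.0].
    """
    return [row for row in rows if row["sepal_length"] > threshold]


selected = select_long_sepals(rows)
for row in selected:
    print(row)
```

The comparison is strict, so a row with a sepal length of exactly 5.0 is dropped.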
When you have large datasets, you can preview your changes by clicking on Preview instead of Apply.
While you can write Blaze expressions manually, you don’t have to. Mosaic helps you write Blaze expressions by providing buttons to paste in a template of the selected expression. Let’s try again to select those rows where the sepal length is greater than 5 using the select button.

Entering Blaze expressions with a button
Notice the help icon “?” next to the input box.

Obtaining help about Blaze expressions (a)
Once you click there, you will see direct links to the documentation about that Blaze expression.

Obtaining help about Blaze expressions (b)
The UI provides a template for the transformation. Go ahead and adapt the expression so that you obtain x[x.sepal_length > 5] in the input box and click Apply. You should obtain the same results as before.
You can get to the information you need by applying transformations in sequence. For example, you can find out how many observations of each species have a sepal length above 5 cm by first applying the selection we did above, then grouping by species and counting with the by expression. Start by clicking on the by button, then enter by(x.species, total=x.species.count()):

Expression to count observations by species where the sepal length exceeds 5

Count of observations by species where the sepal length exceeds 5
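The two chained steps can be sketched in plain Python as a filter followed by a per-species count. This is an illustration of the result, not Blaze or Mosaic code. The rows are made-up values, the helper name count_by_species is hypothetical, and the sketch assumes that x in the second step refers to the output of the first step.

```python
from collections import Counter

# Illustrative rows only: these values are made up, not the real Iris data.
rows = [
    {"sepal_length": 4.9, "species": "setosa"},
    {"sepal_length": 5.8, "species": "setosa"},
    {"sepal_length": 6.4, "species": "versicolor"},
    {"sepal_length": 5.0, "species": "versicolor"},
    {"sepal_length": 7.1, "species": "virginica"},
    {"sepal_length": 6.3, "species": "virginica"},
]


def count_by_species(rows, threshold=5.0):
    """Select rows above the threshold, then count them per species."""
    # Step 1: the selection, as in x[x.sepal_length > 5.0].
    selected = (row for row in rows if row["sepal_length"] > threshold)
    # Step 2: group by species and count, as in
    # by(x.species, total=x.species.count()).
    return Counter(row["species"] for row in selected)


totals = count_by_species(rows)
for species, total in sorted(totals.items()):
    print(species, total)
```

Because the count runs on the already filtered rows, the order of the two steps matters: counting first and filtering afterwards would answer a different question.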
Notice below the input box that Mosaic keeps track of the transformations you are applying on your data. With Mosaic you can build your expressions interactively, checking your steps as you go. If you end up with the wrong results at any point, you can just go back a step by clicking on the “x” symbols right next to the sequence of steps that Mosaic recorded.
For more details on writing your own Blaze expressions, consult the Blaze documentation.
Visualize your data¶
Below the input box, in the center, you will find the Table, Plot, List, and Stats tabs. By default, you should be in the Table tab, which shows you a tabular representation of your data.
In the Plot tab you can explore your data with Mosaic’s built-in visualizations.
You can now choose the type of plot (bar, scatter, and others), the variables to place on each axis, and the variables to encode with the color and size attributes. For example, you can quickly make sense of the Iris dataset by making a scatter plot with the sepal length and width on the axes and color coding the dots according to their species. (Make sure you are working with the original Iris dataset.)

Petal length and width by species
Dive deeper with Jupyter Notebook¶
Mosaic itself is useful for quick exploration of the data and evaluation of simple hypotheses. It also works with other environments, so you can create custom visualizations, use statistical and machine learning models, and analyze your data in greater depth.
You can easily switch environments to a Jupyter Notebook and import your data straight from Mosaic by clicking on the button “Open Dataset in New Notebook” at the top right corner.

Opening your dataset in a new notebook
In the notebook you can use the full power of any package in Anaconda to work with your data and obtain the answers you need.
Import your dataset to the Mosaic catalog¶
Now that you have learned the basics of how Mosaic works, you are ready to work with your own data.
To import a dataset to the Mosaic catalog, click the “+” button in the top left corner. After you click that, you should see the following:

Adding a dataset to Mosaic
Add your dataset with a Uniform Resource Identifier (URI) and choose a name to label it. You may also provide a short description of your data.
Here we will import the same iris.csv dataset as before.

Importing a CSV file into Mosaic
You should now see the first rows of the Iris dataset, just like before.
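If you want a small file of your own to try the import with, the following sketch writes a tiny CSV and prints its location. Everything in it is an assumption made for illustration: the file name my_sample.csv, the helper name write_sample_csv, the column names other than sepal_length and species, and the made-up measurements. The form of URI that Mosaic expects for a local file is not stated here, so the sketch prints both the absolute path and the file:// form.

```python
import csv
import tempfile
from pathlib import Path

# Only sepal_length and species appear in this quickstart; the other
# column names are assumptions.
HEADER = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Made-up measurements for illustration, not the real Iris data.
ROWS = [
    [5.8, 3.1, 1.6, 0.3, "setosa"],
    [6.4, 2.9, 4.4, 1.3, "versicolor"],
    [7.1, 3.0, 5.9, 2.1, "virginica"],
]


def write_sample_csv(directory):
    """Write the sample rows to my_sample.csv and return its absolute path."""
    path = Path(directory) / "my_sample.csv"
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path.resolve()


sample_path = write_sample_csv(tempfile.mkdtemp())
print(sample_path)           # absolute filesystem path
print(sample_path.as_uri())  # the same location as a file:// URI
```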