Anaconda Mosaic

Heterogeneous Data Exploration

Overview

Anaconda Mosaic is a module of the Anaconda Platform used for interactive exploration of larger-than-memory datasets. Anaconda Mosaic 1.3 was released on July 22, 2016, and is available as part of the Anaconda Enterprise subscription. It is not available separately.

[Figure: Mosaic opening view]

Mosaic provides these core benefits:

  1. Write once, compute anywhere. Mosaic translates the same computational code to various backends, including relational databases, NoSQL stores, Spark and more, by leveraging the power of Blaze. You can access data in multiple data stores with the same code, and you can swap backends without rewriting queries or analytic pipelines.
  2. Push computations to the data. For maximum efficiency, Mosaic will push the computations to your data backend, minimizing the costly movement of data across the network and taking full advantage of the built-in highly optimized code featured in your data backend (PostgreSQL, Oracle, MongoDB, Spark, and more).
  3. Instantly work with large flat file repositories. Mosaic allows you to work with data scattered across directories of CSVs with minimal effort. You no longer have to write tedious Extract/Transform/Load (ETL) code. All you have to do is tell Mosaic how to parse the information embedded in the directory structure with simple wildcards.
  4. Collect all your data sources in one place. Mosaic saves you time that would be wasted discovering what data is available and where it is located.
  5. Visually explore your data. Mosaic provides many built-in visualizations, including scatter plots, bar charts and more, allowing you to map variables to different visual traits such as axis, color, glyph size and more. With these visualizations you can quickly make sense of your data and test simple hypotheses.
  6. Integrate seamlessly with an Enterprise environment. Mosaic provides Enterprise-ready features such as provenance, governance and orchestration.
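
To make the idea in item 3 concrete, the following is a minimal sketch of how information embedded in a directory structure can be recovered with a simple wildcard-style template. It is not Mosaic's actual syntax or implementation: the `{name}` placeholder notation, the function names `template_to_regex` and `parse_path`, and the example paths are all assumptions made for illustration.

```python
import re


def template_to_regex(template):
    """Compile a path template such as 'sales/{year}/{region}.csv'.

    Each {name} placeholder (a hypothetical notation, not Mosaic's)
    matches one path component and becomes a named group.
    """
    # Splitting on a capturing group yields literal text at even
    # indices and the captured placeholders at odd indices.
    parts = re.split(r"\{([A-Za-z_]\w*)\}", template)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            pieces.append("(?P<%s>[^/]+)" % part)
        else:
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


def parse_path(template, path):
    """Return the fields encoded in path, or None if it does not match."""
    match = template_to_regex(template).match(path)
    return match.groupdict() if match else None


# Hypothetical file layout: one CSV per region, grouped by year.
fields = parse_path("sales/{year}/{region}.csv", "sales/2015/east.csv")
print(fields)
```

Each matching file contributes its rows plus the fields recovered from its path, which is what removes the need for a separate loading script per directory.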

What’s new in Mosaic 1.3?

  1. Add Excel spreadsheets. Users can now load Excel spreadsheets containing tabular data into Mosaic in .xls, .xlsx or .csv format. Click the top left Add Dataset icon “+” and, in the dialog box that appears, enter the path and filename of the spreadsheet, then create expressions or plots to explore the data. Users can also join their Excel spreadsheet data with other datasets by using the join operation.
  2. Improved downsampling functionality and controls to further enable interactive exploration of large datasets. Click the “Settings” icon immediately above the tabular data display, then specify how many rows to take from either the start or the end of the dataset.
  3. Enhanced “Add Dataset” dialog to enable custom arguments for every computational backend. This gives users full control over the configuration parameters for the various backends. Additionally, the dataset is automatically reloaded when a configuration parameter is changed. Click the top left Add Dataset icon “+” and, in the Add Custom Fields section of the dialog box that appears, click the “+” icon and add one or more extra options as key-value pairs. Keys are parsed as strings. Values must be a string, an integer, or a list. Input examples: 1, “two”, [3, 4, 5]
  4. New “Edit Metadata” dialog to allow the user to easily edit the metadata and custom arguments associated with a dataset. This functions much the same way as Add Dataset above.
  5. New row count button allows users to refresh the count of the underlying dataset relative to the current expression. Simply click the “Refresh” icon in the bottom right “Datashape” menu.
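
The value rules for custom fields in item 3 (string, integer, or list values, with keys treated as strings) can be sketched as follows. This is an illustrative assumption about how such input might be validated, not Mosaic's own parser: the function names are hypothetical, string values are assumed to be entered with straight quotes, and rejecting booleans is a choice made for this sketch.

```python
import ast


def parse_custom_value(text):
    """Parse one value typed into a custom field (hypothetical helper).

    Accepts Python-style literals and keeps only strings, integers,
    and lists, mirroring the rule stated in the text.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError):
        raise ValueError("not a valid literal: %r" % text) from None
    # bool is a subclass of int, so it is excluded explicitly here;
    # whether Mosaic accepts booleans is not stated in the source.
    if isinstance(value, bool) or not isinstance(value, (str, int, list)):
        raise ValueError("value must be a string, integer, or list: %r" % text)
    return value


def parse_custom_fields(pairs):
    """Build an options dict from (key, raw_value) pairs; keys become strings."""
    return {str(key): parse_custom_value(raw) for key, raw in pairs}


# The three input examples from the text, attached to made-up key names.
options = parse_custom_fields([
    ("first", "1"),
    ("second", '"two"'),
    ("third", "[3, 4, 5]"),
])
print(options)
```

Under these assumptions, an input such as 1.5 or an unquoted word is rejected with a `ValueError` rather than being passed to the backend.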

See the Release Notes for further information.

Help & support

If you have any questions or problems with Anaconda Mosaic, and you are a current Anaconda Enterprise subscriber, please contact your support representative.

What’s next?

If you are an Anaconda Enterprise subscriber and want to try out Anaconda Mosaic, get in touch, and learn more about Anaconda Mosaic: