The SAMI Galaxy Survey: Data Archive, Web Interface, and Tools for Big Science Exploration

We describe the data archive and database for the SAMI Galaxy Survey, an ongoing observational program that will cover ≈3400 galaxies with integral-field (spatially resolved) spectroscopy. Amounting to some three million spectra, this is the largest sample of its kind to date. The data archive and built-in query engine use the versatile Hierarchical Data Format (HDF5), which removes the need for external metadata tables and hence the setup and maintenance overhead they carry. The code produces simple outputs that can easily be translated to plots and tables, and the combination of these tools makes for a lightweight system that can handle heavy data. This article acts as a contextual companion to the SAMI Survey Database source code repository, samiDB, which is freely available online and written entirely in Python. We also discuss the decisions related to the selection of tools and the creation of data visualisation modules. Our aim is that the work presented in this article (descriptions, rationale, and source code) will be of use to scientists looking to set up a maintenance-light data archive for a Big Science data load.
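
To illustrate how an HDF5 file can carry its own metadata and so be queried without an external table, here is a minimal sketch using the h5py and NumPy libraries. It is not taken from samiDB: the group names, the `redshift` attribute, the `flux` dataset, and the functions `build_archive` and `query_by_redshift` are all hypothetical, and the layout is assumed purely for illustration.

```python
import h5py
import numpy as np


def build_archive(path):
    """Create a small in-memory HDF5 file (hypothetical layout, not samiDB's)."""
    # driver="core" with backing_store=False keeps the file in memory only.
    f = h5py.File(path, "w", driver="core", backing_store=False)
    # Made-up entries: (group name, redshift, cube shape).
    galaxies = [
        ("gal_0001", 0.021, (4, 3, 3)),
        ("gal_0002", 0.054, (4, 3, 3)),
        ("gal_0003", 0.087, (4, 3, 3)),
    ]
    for name, z, shape in galaxies:
        grp = f.create_group(name)
        # Metadata is stored as an attribute on the group itself.
        grp.attrs["redshift"] = z
        # Placeholder data cube standing in for spatially resolved spectra.
        grp.create_dataset("flux", data=np.zeros(shape, dtype="f4"))
    return f


def query_by_redshift(f, z_min, z_max):
    """Return names of top-level groups whose redshift attribute is in range."""
    return sorted(
        name
        for name, grp in f.items()
        if z_min <= grp.attrs["redshift"] <= z_max
    )


if __name__ == "__main__":
    archive = build_archive("demo.h5")
    print(query_by_redshift(archive, 0.0, 0.06))
    archive.close()
```

Because the attributes travel with the data, the query only needs to walk the file's own hierarchy; no separate metadata table has to be kept in sync.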

Publication Date: November 2015