SkyhookDM is an extension of Ceph for scalable table storage and for offloading common data management operations on those tables, including selection, projection, aggregation, and indexing, as well as user-defined functions. Its goal is to transparently scale out data management operations across many storage servers, leveraging the scale-out and availability properties of Ceph while significantly reducing the CPU cycles and interconnect bandwidth spent on unnecessary data transfers. The SkyhookDM architecture is also designed to transparently optimize for future storage devices as they become more heterogeneous and specialized.
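The offloading idea can be illustrated with a minimal, self-contained sketch in plain Python. It does not use any SkyhookDM or Ceph API: `Partition`, `offloaded_query`, `offloaded_sum`, and `client_sum` are hypothetical names invented for this example, and each partition is assumed to stand in for one storage object holding part of a table. The point is only that selection, projection, and partial aggregation run next to the data, so the client receives small results rather than whole partitions.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical stand-in for one storage object holding part of a table."""
    rows: list  # list of dicts, one dict per row


def offloaded_query(partition, predicate, columns):
    """Storage-side work: selection (filter rows), then projection (keep columns)."""
    return [
        {name: row[name] for name in columns}
        for row in partition.rows
        if predicate(row)
    ]


def offloaded_sum(partition, predicate, column):
    """Storage-side partial aggregate: returns (sum, count) for matching rows."""
    values = [row[column] for row in partition.rows if predicate(row)]
    return sum(values), len(values)


def client_sum(partitions, predicate, column):
    """Client-side work: combine the small partial results from each partition."""
    partials = [offloaded_sum(p, predicate, column) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total, count


# Illustrative data only; two partitions of a small table.
partitions = [
    Partition([{"id": 1, "price": 10, "qty": 2}, {"id": 2, "price": 50, "qty": 1}]),
    Partition([{"id": 3, "price": 70, "qty": 5}, {"id": 4, "price": 5, "qty": 9}]),
]


def expensive(row):
    return row["price"] >= 50


selected = [r for p in partitions for r in offloaded_query(p, expensive, ["id", "qty"])]
total, count = client_sum(partitions, expensive, "qty")
```

In this sketch only the filtered, projected rows and one `(sum, count)` pair per partition cross the boundary between storage and client, which is the source of the CPU and bandwidth savings described above.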

Tables can be serialized either with Google FlatBuffers (for row-based processing) or with Apache Arrow (for column-based processing). Current SkyhookDM clients include a foreign data wrapper for PostgreSQL as well as Python clients for pandas DataFrames, Apache Arrow data, and SQL.
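The difference between the row-based and column-based layouts can be sketched in plain Python. This is only an illustration of the two layouts, not of the FlatBuffers or Arrow formats themselves, and it calls neither library; `to_columns` and `to_rows` are hypothetical helpers, and the sketch assumes a non-empty table in which every row has the same keys.

```python
# Row-based layout: each record keeps all of its fields together,
# which suits operations that touch whole rows.
rows = [
    {"id": 1, "name": "a", "price": 10.0},
    {"id": 2, "name": "b", "price": 50.0},
    {"id": 3, "name": "c", "price": 70.0},
]


def to_columns(rows):
    """Convert a row-based table into a column-based one (assumes uniform keys)."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def to_rows(columns):
    """Convert a column-based table back into a row-based one."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


# Column-based layout: each field is stored contiguously, so a scan or
# aggregate over one column does not need to read the others.
columns = to_columns(rows)
max_price = max(columns["price"])
```

The round trip `to_rows(to_columns(rows))` returns the original rows, which shows that the two layouts hold the same information and differ only in which access pattern they favor.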

SkyhookDM is currently an incubator project at the Center for Research on Open Source Software (CROSS) at the University of California, Santa Cruz.

Team

Presentations

Publications