SkyhookDM is an extension of Ceph for the scalable storage of tables and for offloading common data management operations on them, including selection, projection, aggregation, and indexing, as well as user-defined functions. The goal of SkyhookDM is to transparently scale out data management operations across many storage servers leveraging the scale-out and availability properties of Ceph while significantly reducing the use of CPU cycles and interconnect bandwidth for unnecessary data transfers. The SkyhookDM architecture is also designed to transparently optimize for future storage devices of increasing heterogeneity and specialization.
Tables can be stored using either Google FlatBuffers serialization (for row-based processing) or Apache Arrow serialization (for column-based processing). Current SkyhookDM clients include a foreign data wrapper for PostgreSQL as well as Python clients for pandas DataFrames, Apache Arrow data, and SQL.
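The idea of offloading selection and projection to storage can be illustrated with a small, self-contained sketch. This is not SkyhookDM code and does not use the Ceph, FlatBuffers, or Arrow APIs. It assumes a table split round-robin into in-memory partitions that stand in for objects, and the names `partition_rows`, `scan_object`, and `offloaded_query` are hypothetical. The point it shows is that when filtering and column pruning run next to the data, only matching rows and requested columns travel back to the client.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def partition_rows(rows: List[Row], num_objects: int) -> List[List[Row]]:
    """Split a table round-robin into partitions (hypothetical stand-ins for objects)."""
    parts: List[List[Row]] = [[] for _ in range(num_objects)]
    for i, row in enumerate(rows):
        parts[i % num_objects].append(row)
    return parts


def scan_object(obj_rows: List[Row],
                predicate: Callable[[Row], bool],
                columns: List[str]) -> List[Row]:
    """Work that would run on the storage server: select rows, then project columns."""
    return [{c: r[c] for c in columns} for r in obj_rows if predicate(r)]


def offloaded_query(objects: List[List[Row]],
                    predicate: Callable[[Row], bool],
                    columns: List[str]) -> List[Row]:
    """Client side: send the same select/project request to every object and merge."""
    result: List[Row] = []
    for obj in objects:
        result.extend(scan_object(obj, predicate, columns))
    return result


# Illustrative data only: a wide "note" column that the query never needs.
table = [{"id": i, "price": i * 10, "note": "x" * 100} for i in range(8)]
objects = partition_rows(table, num_objects=3)

result = offloaded_query(objects, lambda r: r["price"] >= 50, ["id", "price"])
rows_stored = sum(len(obj) for obj in objects)
rows_returned = len(result)
```

In this sketch the client receives `rows_returned` narrow rows instead of all `rows_stored` wide rows, which is the kind of transfer reduction the description above refers to. A real deployment would execute the scan inside the storage servers rather than in a local loop.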
- 26 May 2020 - "Skyhook Data Management: programmable object storage for databases", Jeff LeFevre, Fujitsu Labs
- 5 Nov 2019 - "Mapping datasets to object storage", Jeff LeFevre, CHEP 2019
- 3 Oct 2019 - "Skyhook Data Management: Scaling Databases and Applications with Open Source Extensible Storage", Jeff LeFevre, CROSS Research Symposium 2019
- 24 Apr 2019 - "Skyhook: Programmable Object Storage for Analysis", Jeff LeFevre, IRIS-HEP Topical Meetings
- 26 Feb 2019 - "Skyhook: programmable storage for databases", Jeff LeFevre, Vault'19
- 16 Nov 2017 - "SkyhookDB - Leveraging object storage toward database elasticity in the cloud", Jeff LeFevre, DOMA Workshop 2017 (Flatiron Institute)
- Scale-out Edge Storage Systems with Embedded Storage Nodes to Get Better Availability and Cost-Efficiency At the Same Time, Jianshen Liu, Matthew Leon Curry, Carlos Maltzahn, and Philip Kufeldt, 3rd USENIX Workshop on Hot Topics in Edge Computing (HotEdge ’20), Santa Clara, CA, June 25-26 2020 (26 May 2020).
- Scaling databases and file APIs with programmable Ceph object storage, Jeff LeFevre and Carlos Maltzahn, 2020 Linux Storage and Filesystems Conference (Vault'20, co-located with FAST'20 and NSDI'20), Santa Clara, CA, February 24-25 2020 (24 Feb 2020).
- Towards Physical Design Management in Storage Systems, Kathryn Dahlgren, Jeff LeFevre, Ashay Shirwadkar, Ken Iizawa, Aldrin Montana, Peter Alvaro, Carlos Maltzahn, 4th International Parallel Data Systems Workshop (PDSW 2019, co-located with SC’19), Denver, CO, November 18, 2019. (18 Nov 2019) [NSF PAR].
- MBWU: Benefit Quantification for Data Access Function Offloading, Jianshen Liu, Philip Kufeldt, Carlos Maltzahn, HPC I/O in the Data Center Workshop (HPC-IODC 2019, co-located with ISC-HPC 2019), Frankfurt, Germany, June 20, 2019. (20 Jun 2019).
- Skyhook: Programmable storage for databases, Jeff LeFevre, Noah Watkins, Michael Sevilla, and Carlos Maltzahn, 2019 Linux Storage and Filesystems Conference (Vault'19, co-located with FAST'19), Santa Clara, CA, February 25-26 2019 (25 Feb 2019).