IRIS-HEP Fellow: Jayjeet Chakraborty
Fellowship dates: Jun – Sep, 2020
Jan – Jul, 2021
Home Institution: National Institute Of Technology, Durgapur
Project: Reproducible, large-scale SkyhookDM experiments
SkyhookDM injects programmable data management and data storage capabilities directly into the storage layer of distributed object databases such as Ceph. It extends the Ceph distributed object storage platform with custom C++ object classes that allow database operations such as SELECT, PROJECT, and AGGREGATE to be offloaded into the object storage layer, so applications can query multi-dimensional arrays efficiently. Compiling Ceph together with SkyhookDM and running benchmark tests involves many steps, and the results can be hard to reproduce. The aim of this project is to implement a reproducible workflow with Popper that automates large-scale tests on different infrastructure, such as GCP, CloudLab, and Kubernetes clusters, and to benchmark SkyhookDM at the scale of tens of terabytes across the supported data formats.
More information: My project proposal
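The offloading idea described above can be illustrated with a toy sketch in plain Python. This is not the Ceph or SkyhookDM API: the `ToyObjectStore` class, its `exec_scan` and `exec_sum` methods, the object name, and the sample rows are all hypothetical names and data made up for illustration. The sketch only shows why running SELECT, PROJECT, and AGGREGATE next to the data returns less to the client than a plain read.

```python
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Hypothetical stand-in for a storage node (not Ceph, not SkyhookDM)."""

    objects: dict = field(default_factory=dict)  # object name -> list of row dicts

    def put(self, name, rows):
        self.objects[name] = list(rows)

    def read(self, name):
        # Plain read: every row and every column travels to the client.
        return list(self.objects[name])

    def exec_scan(self, name, columns, predicate):
        # Offloaded SELECT (row filter) and PROJECT (column subset),
        # evaluated where the data lives; only matching values are returned.
        return [
            {c: row[c] for c in columns}
            for row in self.objects[name]
            if predicate(row)
        ]

    def exec_sum(self, name, column, predicate):
        # Offloaded AGGREGATE: a single number is returned to the client.
        return sum(row[column] for row in self.objects[name] if predicate(row))


store = ToyObjectStore()
store.put("events.0", [
    {"run": 1, "pt": 12.5, "eta": 0.3},
    {"run": 1, "pt": 48.0, "eta": -1.1},
    {"run": 2, "pt": 73.2, "eta": 2.0},
])


def high_pt(row):
    return row["pt"] > 40.0


# Client-side processing: fetch the whole object, then filter and project.
client_side = [{"pt": r["pt"]} for r in store.read("events.0") if high_pt(r)]

# Storage-side processing: the same query, but only the result is returned.
offloaded = store.exec_scan("events.0", ["pt"], high_pt)
total = store.exec_sum("events.0", "pt", high_pt)
```

Both paths produce the same rows; the difference is that the plain read moves three full rows to the client, while the offloaded scan moves only the two projected values.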
Mentors:
- Carlos Maltzahn (UC Santa Cruz)
- Ivo Jimenez (UC Santa Cruz)
- Jeff LeFevre (UC Santa Cruz)
Project: Arrow-Native Storage with SkyhookDM Ceph
Apache Arrow is a columnar in-memory format for seamless data transfer between different big data systems. It reduces the need to serialize and deserialize data, and it provides native abstractions for use in big data storage systems. We aim to turn SkyhookDM into an Arrow-native storage system: using the object class SDK provided by Ceph, we add a layer on the storage side, built with the Arrow C++ SDK, that allows tabular datasets stored as objects in Apache Arrow format to be queried and processed on both the storage side and the client side. We also aim to upstream the RADOS-specific implementations of the Arrow C++ SDK. Native support for Arrow will allow applications such as Coffea processors and ServiceX transformers to interact seamlessly with SkyhookDM, as well as with other storage systems.
More information: My project proposal
Mentors:
- Carlos Maltzahn (UC Santa Cruz)
- Ivo Jimenez (UC Santa Cruz)
- Jeff LeFevre (UC Santa Cruz)
- 5 Oct 2020 - "Reproducible and Scalable Experiments with SkyhookDM Ceph", Jayjeet Chakraborty, IRIS-HEP Topical Meetings (Recording: Reproducible and Scalable Experiments with SkyhookDM Ceph)
- 30 Jun 2021 - "SkyhookDM: Towards an Arrow-Native Storage System", Jayjeet Chakraborty, IRIS-HEP Topical Meetings (Recording: SkyhookDM: Towards an Arrow-Native Storage System)
Current Status
July 2021 - As of Fall 2021, Jayjeet is beginning graduate studies in Computer Science at the University of California, Santa Cruz.