IRIS-HEP Fellow: Jayjeet Chakraborty



Fellowship dates: Jun – Sep 2020; Jan – Jul 2021

Home Institution: National Institute of Technology, Durgapur


Project: Reproducible, large-scale SkyhookDM experiments

SkyhookDM injects programmable data management and data storage capabilities directly into the storage layer of distributed object storage systems such as Ceph. SkyhookDM uses and extends the Ceph distributed object storage platform with customized C++ object classes that allow database operations such as SELECT, PROJECT, and AGGREGATE to be offloaded directly to the object storage layer, so that applications can efficiently query multi-dimensional arrays. Compiling Ceph together with SkyhookDM and running benchmark tests involves many steps and can be difficult to reproduce. The aim of this project is to implement a reproducible workflow with Popper that automates large-scale tests on different cloud infrastructures, such as GCP, CloudLab, and Kubernetes clusters, and benchmarks SkyhookDM at the scale of tens of terabytes across its supported data formats.

More information: My project proposal

Mentors:
  • Carlos Maltzahn (UC Santa Cruz)

  • Ivo Jimenez (UC Santa Cruz)

  • Jeff LeFevre (UC Santa Cruz)


Project: Arrow-Native Storage with SkyhookDM Ceph

Apache Arrow is a columnar in-memory format for seamless data transfer between different big data systems. It removes the need to serialize and deserialize data when moving it between systems, and it provides native abstractions for use in big data storage systems. We aim to convert SkyhookDM into an Arrow-native storage system by using the object class SDK provided by Ceph to add a storage-side layer, built with the Arrow C++ SDK, that allows tabular datasets stored as objects in Apache Arrow format to be queried and processed on both the storage side and the client side. We also aim to upstream the RADOS-specific implementations to the Arrow C++ SDK. Native support for Arrow will allow applications such as Coffea processors and ServiceX transformers to interact seamlessly with SkyhookDM, as well as with other storage systems.

More information: My project proposal

Mentors:
  • Carlos Maltzahn (UC Santa Cruz)

  • Ivo Jimenez (UC Santa Cruz)

  • Jeff LeFevre (UC Santa Cruz)

Presentations and Publications
Current Status
July 2021 - As of Fall 2021, Jayjeet is beginning graduate studies in Computer Science at the University of California, Santa Cruz.
