
Coffea-Casa Analysis Facility
The HL-LHC era will bring more than an order-of-magnitude increase in event counts for analysts. The increased data volume will force physicists to adopt new methods and approaches: what fit comfortably on a laptop during the LHC era will require a distributed system in the next generation.
Coffea-Casa is a prototype analysis facility that provides services for “low-latency columnar analysis”, enabling rapid processing of data in a column-wise fashion. This provides an interactive experience and quick “initial results” while scaling to full-size datasets.
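To make “column-wise” concrete, here is a toy sketch in plain Python. It assumes events are stored as one typed array per variable (a column), as in NanoAOD-style files; the branch names and values are illustrative, and this is not the coffea or awkward API, where the same selection would be a single vectorized expression.

```python
"""Toy illustration of column-wise ("columnar") event selection.

Assumption: each variable is stored as its own column; plain Python stands in
for the vectorized array libraries used in real columnar analysis.
"""
from array import array

# Hypothetical event table: one contiguous, typed column per branch.
events = {
    "MET_pt": array("d", [12.0, 45.5, 80.2, 33.1, 120.7]),
    "nJet": array("i", [1, 3, 4, 2, 5]),
    "Electron_pt": array("d", [0.0, 25.0, 0.0, 40.0, 18.0]),  # never read below
}


def select(columns, met_cut, min_jets):
    """Build an event mask from two columns, then apply it to the MET column.

    Only the columns the selection references are touched; other branches
    (such as Electron_pt) never need to be read.
    """
    met = columns["MET_pt"]
    njet = columns["nJet"]
    mask = [m > met_cut and n >= min_jets for m, n in zip(met, njet)]
    return [m for m, keep in zip(met, mask) if keep]


selected_met = select(events, met_cut=40.0, min_jets=3)
print(selected_met)  # [45.5, 80.2, 120.7]
```

Reading only the columns a selection needs is what keeps latency low: the cost scales with the variables used, not with the full event record.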
These services, based on the Dask parallelism library and Jupyter notebooks, aim to dramatically reduce the time needed for analysis and to provide an easily scalable, user-friendly computational environment that will simplify and accelerate the delivery of particle physics measurements. The facility is built on top of a Kubernetes cluster and integrates dedicated resources with resources allocated via fairshare through the local HTCondor system. In addition to user-facing interfaces such as Dask, the facility also manages access control through a common single sign-on (SSO) authentication and authorization layer for data access (the data access strategy aligns with the new authorization technologies used by OSG-LHC).
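Scaling out with Dask typically follows a chunked map-reduce pattern: the dataset is split into chunks, each worker fills a partial result, and the partial results are merged. The sketch below assumes a standard-library thread pool as a stand-in for Dask workers; the chunk size, binning, and function names are hypothetical and only illustrate the pattern.

```python
"""Sketch of the chunked map-reduce pattern used when an analysis scales out.

Assumption: a standard-library thread pool stands in for Dask workers; the
chunk size and histogram binning are illustrative only.
"""
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_chunks(n_events, chunk_size):
    """Split the event range [0, n_events) into (start, stop) chunks."""
    return [(start, min(start + chunk_size, n_events))
            for start in range(0, n_events, chunk_size)]


def process_chunk(values, bounds, bin_width=10.0):
    """Map step: fill a partial histogram (bin index -> count) for one chunk."""
    start, stop = bounds
    hist = Counter()
    for v in values[start:stop]:
        hist[int(v // bin_width)] += 1
    return hist


def run(values, chunk_size=2, workers=4):
    """Reduce step: merge the partial histograms returned by each worker."""
    chunks = make_chunks(len(values), chunk_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda b: process_chunk(values, b), chunks):
            total += partial
    return total


met = [12.0, 45.5, 80.2, 33.1, 120.7, 47.0]
print(sorted(run(met).items()))  # [(1, 1), (3, 1), (4, 2), (8, 1), (12, 1)]
```

Because the merged result does not depend on how events are chunked or which worker processes them, the same code can run on a laptop or across many cluster workers.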
After authentication (e.g., via the CERN SSO), the user is presented with a Jupyter notebook interface that can be populated with code from a Git repository specified by the user. When the notebook is executed, the processing automatically scales out to available resources (such as the Nebraska Tier-2 facility for the SSL instance at Nebraska), giving the user transparent interactive access to a large computing resource. The CMS instance of the facility has access to the entire CMS dataset, thanks to the global data federation and local caches. It supports the Coffea framework, which provides a declarative programming interface that treats the data in its natural columnar form. An important feature is access to a “column service” such as ServiceX: if a user is working with a compact data format (such as CMS NanoAOD or ATLAS PHYSLITE) that lacks a data element the user needs, the facility can serve that “column” from a remote site. As a result, only the compact data formats need to be stored locally and augmented as needed, a critical strategy for CMS and ATLAS to control storage costs in the HL-LHC era.
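The column-service idea can be sketched as a local store that falls back to a remote source only for missing columns and then caches them. In this minimal sketch, `fetch_remote_column`, `fake_service`, and the column names are hypothetical stand-ins; this is not the ServiceX client API.

```python
"""Sketch of augmenting a compact local dataset with remotely served columns.

Assumption: `fetch_remote_column` is a hypothetical stand-in for a column
service such as ServiceX; it is not the ServiceX client API.
"""
from typing import Callable, Dict, List

Column = List[float]


class AugmentedDataset:
    """Serve columns from local storage, fetching and caching missing ones."""

    def __init__(self, local: Dict[str, Column],
                 fetch_remote_column: Callable[[str], Column]):
        self._local = dict(local)          # compact format stored locally
        self._fetch = fetch_remote_column  # hypothetical remote column service
        self.remote_requests: List[str] = []

    def column(self, name: str) -> Column:
        if name not in self._local:
            # Request only the missing column, then cache it for later reads.
            self.remote_requests.append(name)
            self._local[name] = self._fetch(name)
        return self._local[name]


def fake_service(name: str) -> Column:
    """Hypothetical remote site holding a fuller set of columns."""
    full = {"Muon_pt": [30.1, 52.4], "Muon_isolation": [0.05, 0.2]}
    return full[name]


ds = AugmentedDataset({"Muon_pt": [30.1, 52.4]}, fake_service)
ds.column("Muon_pt")          # served locally, no remote request
ds.column("Muon_isolation")   # fetched once from the service
ds.column("Muon_isolation")   # now served from the local cache
print(ds.remote_requests)     # ['Muon_isolation']
```

The design keeps local storage proportional to the compact format while still letting an analysis read an extra column on demand.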
Core software components and plugins developed for the design of the Coffea-Casa analysis facility:
Coffea-casa repositories and related resources
More information can be found in the corresponding repository:
Recent accomplishments and plans
Recent accomplishments:
- The CMS facility, deployed at the Nebraska Tier-2 center, is accommodating its first users: try it! More than 140 users have used the CMS facility over the last year.
- For non-CMS users, we have enabled the Opendata coffea-casa facility: try it! More than 60 users have used the Opendata facility over the last year.
- For ATLAS physicists, an ATLAS Coffea-Casa analysis facility instance has been deployed at the University of Chicago.
- The coffea-casa analysis facility is a key component of the IRIS-HEP Analysis Grand Challenge preparations.
- Both the Opendata coffea-casa analysis facility at the University of Nebraska-Lincoln and the ATLAS analysis facility instance at the University of Chicago were used to showcase various Python analysis packages and services at the Analysis Grand Challenge Tools workshop 2021 and the Analysis Grand Challenge Tools workshop 2022.
Future plans for 2022:
- Test Helm charts and other by-products on other facilities.
- Recruit more physics analysis groups to use the facility.
- Benchmark various software components and packages deployed at the Coffea-Casa analysis facility at the University of Nebraska-Lincoln.
- Prepare and execute the Analysis Grand Challenge at Coffea-Casa Analysis Facilities deployed at the University of Nebraska-Lincoln and the University of Chicago.
Recent videos and tutorials
- The Coffea-Casa analysis facility demo “Scale-out with coffea: coffea-casa” - YouTube video at the Analysis Grand Challenge Tools workshop 2021
- The Coffea-Casa analysis facility introduction - YouTube video at PyHEP 2020
- The Coffea-Casa tutorial “Coffea columnar analysis at scale” - YouTube video at PyHEP 2020
Fellows
Team
- Oksana Shadura
- Ken Bloom
- Zhengkai Wu
- Carl Lundstedt
- John Thiltges
- Brian Bockelman
- Garhan Attebury
- Matous Adamec
Presentations
- 23 May 2022 - "Analysis user experience with the Python HEP ecosystem", Jim Pivarski, Analysis Ecosystems Workshop II
- 25 Apr 2022 - "IRIS-HEP Analysis Grand Challenge Tools Workshop", Oksana Shadura, IRIS-HEP AGC Tools 2022 Workshop
- 25 Apr 2022 - "Scale-out with coffea: coffea-casa analysis facility", Carl Lundstedt, IRIS-HEP AGC Tools (April) 2022 Workshop
- 5 Apr 2022 - "Analysis Grand Challenge updates", Oksana Shadura, IRIS-HEP / Ops Program Analysis Grand Challenge Planning
- 1 Apr 2022 - "Report about HSF Analysis Facilities Forum Kick-off meeting", Oksana Shadura, CMS Spring 2022 O&C Week
- 1 Mar 2022 - "Analysis Grand Challenge updates", Alexander Held, IRIS-HEP / Ops Program Analysis Grand Challenge Planning
- 28 Jan 2022 - "Analysis Grand Challenge updates", Oksana Shadura, IRIS-HEP / Ops Program Analysis Grand Challenge Planning
- 16 Dec 2021 - "Analysis Grand Challenge updates", Oksana Shadura, IRIS-HEP Executive Board / Ops Program Grand Challenge Discussion
- 30 Nov 2021 - "Analysis Grand Challenge updates", Alexander Held, Steering Board Meeting
- 22 Nov 2021 - "Coffea-casa news and developments", Oksana Shadura, Coffea Users Meeting
- 17 Nov 2021 - "Deep Dive - Analysis Grand Challenge", Oksana Shadura, NSF / IRIS-HEP Meeting (November 2021)
- 9 Nov 2021 - "Analysis Facilities", Oksana Shadura, CMS Operations & Computing R&D meeting
- 4 Nov 2021 - "Scale-out with coffea", Oksana Shadura, IRIS-HEP AGC Tools 2021 Workshop
- 2 Nov 2021 - "Analysis Grand Challenge", Oksana Shadura, SwiftHep/ExcaliburHep workshop
- 24 Sep 2021 - "Coffea-casa - an analysis facility prototype", Oksana Shadura, Joint AMG and WFMS Meeting on Analysis Facilities
- 9 Jun 2021 - "Advances in Analysis tools/ecosystem", Oksana Shadura, 9th Edition of the Large Hadron Collider Physics Conference
- 21 May 2021 - "Dask in High-Energy Physics community (workshop)", Oksana Shadura, Dask Distributed Summit 2021
- 21 May 2021 - "Dask at U.S.CMS analysis facilities", Carl Lundstedt, Dask Distributed Summit 2021, Dask in High Energy Physics Community, Tutorials and Workshops
- 20 May 2021 - "Coffea-casa an analysis facility prototype (plenary)", Oksana Shadura, 25th International Conference on Computing in High-Energy and Nuclear Physics
- 19 May 2021 - "Challenges Designing Interactive Analysis Facilities with Dask", Oksana Shadura, Dask Distributed Summit 2021
- 3 Feb 2021 - "Future analysis facilities", Oksana Shadura, CMS Week
- 24 Nov 2020 - "U.S. CMS Managed Analysis Facilities", Oksana Shadura, HSF WLCG Virtual Workshop
- 27 Oct 2020 - "Analysis on LHC-Managed Facilities: Coffea-Casa", Oksana Shadura, IRIS-HEP Future Analysis Systems and Facilities Blueprint Workshop
- 23 Sep 2020 - "Analysis facilities", Oksana Shadura, Upgrade R&D/CMP Meeting (Presented on Weekly CMS O&C Meeting slot)
Publications
- Coffea-casa: an analysis facility prototype, M. Adamec, G. Attebury, K. Bloom, B. Bockelman, C. Lundstedt, O. Shadura and J. Thiltges, EPJ Web Conf. 251 02061 (2021) (02 Mar 2021) [2 citations].