Caching Analysis Data
A significant fraction of LHC analysis jobs read the same datasets, and each dataset is typically processed several times. Caching is therefore an opportunity to improve the efficiency of CPU use (through reduced latency) and of the network (through reduced WAN traffic). We are investigating the use of regional caches to store selected datasets on demand.
In Southern California, the UCSD CMS Tier-2 and Caltech CMS Tier-2 joined forces to create and maintain a regional cache, commonly referred to as the “CMS SoCal cache”, that benefits all Southern California CMS researchers.
Later, ESnet approached the SoCal CMS group to integrate a caching server into the SoCal cache. The server is deployed at the ESnet PoP in Sunnyvale, CA, but it is managed by UCSD through the PRP Kubernetes cluster.
A recent ESnet-led study estimated the network savings of the SoCal cache by analyzing the XRootD monitoring records from the XCache servers. The results showed a factor of 3 reduction in network bandwidth over the analyzed period.
Network utilization savings
Network utilization reduction ratio in terms of (a) number of accesses and (b) volume transferred.
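As a rough illustration of how such a reduction ratio could be computed from per-access records, the sketch below assumes the ratio is defined as everything served to clients divided by what had to be fetched from the remote origin. That definition, the `AccessRecord` schema, and the sample values are all assumptions made for illustration; they are not the study's actual method or the real XRootD monitoring format.

```python
from dataclasses import dataclass


@dataclass
class AccessRecord:
    """One file access seen by a cache server (hypothetical, simplified schema)."""
    server: str
    bytes_from_cache: int   # bytes served from the cache's local storage (hit)
    bytes_from_origin: int  # bytes fetched over the WAN from the origin (miss)


def reduction_ratios(records):
    """Return (ratio by number of accesses, ratio by volume transferred).

    Assumed definition: total demand divided by the part that still
    had to cross the WAN. A ratio of 3 means WAN traffic is one third
    of what it would have been without the cache.
    """
    total_accesses = len(records)
    miss_accesses = sum(1 for r in records if r.bytes_from_origin > 0)
    total_bytes = sum(r.bytes_from_cache + r.bytes_from_origin for r in records)
    miss_bytes = sum(r.bytes_from_origin for r in records)

    # With no misses at all, nothing crossed the WAN; report infinity.
    by_access = total_accesses / miss_accesses if miss_accesses else float("inf")
    by_volume = total_bytes / miss_bytes if miss_bytes else float("inf")
    return by_access, by_volume


# Made-up sample: one miss followed by two hits on a file of the same size.
records = [
    AccessRecord("cache-1", bytes_from_cache=0, bytes_from_origin=100),
    AccessRecord("cache-1", bytes_from_cache=100, bytes_from_origin=0),
    AccessRecord("cache-2", bytes_from_cache=100, bytes_from_origin=0),
]
by_access, by_volume = reduction_ratios(records)
print(by_access, by_volume)
```

The two ratios are kept separate because they can differ: many hits on small files raise the access-based ratio far more than the volume-based one.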
The same study also showed that accesses to the cache are evenly distributed among the servers that make up the SoCal cache.
Misses (a) and hits (b) distribution in the SoCal cache
The above shows the distribution of hits and misses among the servers that make up the SoCal cache.
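A per-server breakdown like this can be sketched as a simple count of events per server, normalized to shares. The event tuples and server names below are hypothetical placeholders, not the real monitoring schema or the actual SoCal hosts.

```python
from collections import Counter


def share_per_server(events, kind):
    """Fraction of events of the given kind ("hit" or "miss") handled by each server.

    `events` is an iterable of (server, kind) tuples (assumed format).
    """
    counts = Counter(server for server, k in events if k == kind)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {server: n / total for server, n in counts.items()}


# Made-up events for two hypothetical servers.
events = [
    ("cache-1", "hit"), ("cache-2", "hit"),
    ("cache-1", "miss"), ("cache-2", "miss"),
    ("cache-1", "hit"), ("cache-2", "hit"),
]
hit_share = share_per_server(events, "hit")
miss_share = share_per_server(events, "miss")
print(hit_share, miss_share)
```

Shares that are close to equal across servers, for both hits and misses, would correspond to the even distribution described above.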
We also worked with CMS to set up a monitoring page that shows the popularity of the analyzed data. This helps us consider changes to the namespace definition of what we cache.
CMS data popularity
The above shows the distribution of accesses, in terms of volume, of the CMS analysis tasks by data campaign.
Currently, XCache is distributed by the OSG both as RPMs and as Docker images. The following are the corresponding repositories where the base code can be found:
- Report on cache usage on the WLCG and potential use cases and deployment scenarios for the US LHC facilities
- Report on LHC data access patterns, data uses, and intelligent caching approaches for the HL-LHC (draft)
- 25 Feb 2021 - "GeoIP HTTPS Redirector", Edgar Fajardo, XCache DevOps Meeting
- 15 Sep 2020 - "Data lake prototyping for US CMS", Edgar Fajardo, DOMA / ACCESS Meeting
- 4 Sep 2020 - "A US Data Lake", Edgar Fajardo, OSG All Hands Meeting (US ATLAS/CMS Combined session)
- 2 Sep 2020 - "Stashcache: CDN for Science", Edgar Fajardo, OSG All Hands Meeting
- 23 Apr 2020 - "How CMS user jobs use the caches", Edgar Fajardo, XCache DevOps SPECIAL
- 22 Apr 2020 - "XRootD Transfer Accounting Validation Plan", Diego Davila, S&C Blueprint Meeting
- 27 Feb 2020 - "XCache", Edgar Fajardo, IRIS-HEP Poster Session
- 5 Nov 2019 - "Creating a content delivery network for general science on the backbone of the Internet using xcaches.", Igor Sfiligoi, CHEP 2019
- 5 Nov 2019 - "Creating a content delivery network for general science on the backbone of the Internet using xcaches.", Edgar Fajardo, CHEP 2019
- 5 Nov 2019 - "Moving the California distributed CMS xcache from bare metal into containers using Kubernetes", Edgar Fajardo, CHEP 2019
- 12 Sep 2019 - "OSG XCache Discussion", Frank Wuerthwein, IRIS-HEP retreat
- 31 Jul 2019 - "CMS XCache Monitoring Dashboard", Diego Davila, OSG Area Coordination
- 8 Jul 2019 - "XCache Initiatives and Experiences", Frank Wuerthwein, pre-GDB meeting on XCache
- 11 Jun 2019 - "XCache Packaging", Brian Lin, XRootD Workshop
- 20 Mar 2019 - "Data Access in DOMA", Frank Wuerthwein, HOW2019 (Joint HSF/OSG/WLCG Workshop)
- 7 Mar 2019 - "The OSG Data Federation", Frank Wuerthwein, Internet2 Global Summit 2019
- 16 Jan 2019 - "OSG Cache on Internet Backbone developments", Edgar Fajardo, GDB Jan 2019
- 12 Dec 2018 - "OSG-LHC and XCache", Brian Lin, ATLAS Software & Computing Week #61
- 2 Oct 2018 - "Current production use of caching for CMS in Southern California", Edgar Fajardo, DOMA / ACCESS Meeting
- Analyzing scientific data sharing patterns for in-network data caching, Elizabeth Copps, Huiyi Zhang, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, and Edgar Fajardo. 2021 (21 Jun 2021).
- Creating a content delivery network for general science on the internet backbone using XCaches, Edgar Fajardo and Marian Zvada and Derek Weitzel and Mats Rynge and John Hicks and Mat Selmeci and Brian Lin and Pascal Paschos and Brian Bockelman and Igor Sfiligoi and Andrew Hanushevsky and Frank Würthwein, arXiv:2007.01408 [cs.DC] (Submitted to CHEP 2019) (08 Nov 2019).