Third Party Copy
LHC data is constantly being moved between computing and storage sites to support analysis, processing, and simulation; this is done at a scale that is currently unique within the science community. For example, the CMS experiment on the LHC manages approximately 200PB of data and, on a daily basis, moves 1PB between sites. Across all four experiments, global data movement regularly peaked above 250Gbps in 2021, and this is without the LHC accelerator taking new data!
The HL-LHC promises a data deluge: we will need to modernize the infrastructure to sustain at least 1Tbps by 2027, likely peaking at twice that level. Historically, bulk data movement has been done with the GridFTP protocol; as the community looks to the increased data volumes of HL-LHC and GridFTP becomes increasingly niche, there is a need to use modern software, protocols, and techniques to move data. The IRIS-HEP DOMA area, in collaboration with the WLCG DOMA activity, is helping the LHC, and HEP in general, transition to using HTTP for bulk data transfer.
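To put the figures above on a common footing, the following minimal Python sketch converts between daily data volume and average network rate. It assumes decimal units (1PB = 10^15 bytes, 1Gbps = 10^9 bits per second) and a perfectly even transfer rate over the day; the function names are ours and purely illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
BITS_PER_BYTE = 8
PB = 10**15    # assumption: decimal petabyte (10^15 bytes)
GBPS = 10**9   # bits per second in one Gbps


def pb_per_day_to_gbps(pb_per_day: float) -> float:
    """Average rate in Gbps needed to move `pb_per_day` petabytes in one day."""
    bits_per_day = pb_per_day * PB * BITS_PER_BYTE
    return bits_per_day / SECONDS_PER_DAY / GBPS


def gbps_to_pb_per_day(gbps: float) -> float:
    """Petabytes moved in one day at a sustained rate of `gbps`."""
    bytes_per_day = gbps * GBPS / BITS_PER_BYTE * SECONDS_PER_DAY
    return bytes_per_day / PB


if __name__ == "__main__":
    # CMS moving 1PB per day corresponds to roughly 92.6 Gbps on average.
    print(f"1 PB/day  = {pb_per_day_to_gbps(1):.1f} Gbps")
    # A sustained 1Tbps corresponds to roughly 10.8 PB per day.
    print(f"1000 Gbps = {gbps_to_pb_per_day(1000):.1f} PB/day")
```

Under these assumptions, the 1Tbps target corresponds to roughly ten times the daily volume that CMS alone moves today.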
How fast is HTTP?
The above graph shows data movement rates (up to 24Gbps) for a single host, achieved during standalone tests; a typical LHC site will load-balance across multiple hosts in order to saturate available network links. With a sufficiently performant HTTP server, we have observed that the protocol can run as fast as the underlying network infrastructure allows.
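As a rough illustration of the load-balancing point, the sketch below estimates how many transfer hosts a site would need to fill a given link. The 24Gbps per-host figure is the standalone result quoted above; the link capacities are hypothetical inputs, and the estimate assumes ideal load balancing with every host sustaining its standalone peak, which real deployments will not quite achieve.

```python
import math


def hosts_needed(link_gbps: float, per_host_gbps: float = 24.0) -> int:
    """Idealized number of hosts required to saturate a link.

    Assumes perfect load balancing and that each host sustains
    `per_host_gbps` (24Gbps is the standalone figure quoted in the text).
    """
    if link_gbps <= 0 or per_host_gbps <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(link_gbps / per_host_gbps)


if __name__ == "__main__":
    # Hypothetical link capacities, chosen only for illustration.
    for link in (100, 400, 1000):
        print(f"{link} Gbps link: {hosts_needed(link)} hosts")
```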
During the initial phase of IRIS-HEP, the team worked with a variety of implementations to improve code and ensure interoperability. The first goal was to get all commonly-used storage implementations for the LHC to provide an HTTP endpoint. The initial target was for one site to get more than 30% of its data using the HTTP protocol. This was accomplished in 2020; for 2021, the goal is to have every LHC site use HTTP-TPC.
For CMS, we picked two sites, Nebraska and UCSD, to lead the transition by using the ‘davs’ protocol for all their incoming production transfers from the many sites that can support it.
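For readers unfamiliar with HTTP-TPC, the sketch below shows the general shape of a third-party-copy request in "pull" mode, as we understand the convention: the client sends a WebDAV `COPY` request to the destination endpoint, names the source in a `Source` header, and passes a credential for the source in a header prefixed with `TransferHeader`, which the destination forwards with the prefix removed. The URLs, token strings, and the `CopyRequest` helper are hypothetical; the code only builds a description of the request and sends nothing. Consult the WLCG HTTP-TPC specification for the authoritative details.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CopyRequest:
    """Plain description of an HTTP request (illustrative helper, not a library type)."""
    method: str
    url: str
    headers: Dict[str, str]


def build_pull_request(source_url: str, destination_url: str,
                       destination_token: str, source_token: str) -> CopyRequest:
    """Describe a pull-mode third-party copy.

    The request goes to the destination, which then fetches the file
    directly from the source; the data never passes through the client.
    """
    return CopyRequest(
        method="COPY",
        url=destination_url,
        headers={
            # Authorizes the client at the destination endpoint.
            "Authorization": f"Bearer {destination_token}",
            # Tells the destination where to pull the data from.
            "Source": source_url,
            # Forwarded to the source as a plain "Authorization" header.
            "TransferHeaderAuthorization": f"Bearer {source_token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical endpoints and placeholder tokens.
    request = build_pull_request(
        source_url="https://source.example.org/store/file.root",
        destination_url="https://dest.example.org/store/file.root",
        destination_token="DEST_TOKEN",
        source_token="SRC_TOKEN",
    )
    print(request.method, request.url)
    for name, value in request.headers.items():
        print(f"{name}: {value}")
```

Because the client only orchestrates the copy, the transfer rate is bounded by the storage endpoints and the network path between them rather than by the client's own connection.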
Percentage of data transferred to UCSD using GridFTP and HTTP
The above shows the amount of data transferred to UCSD using the GridFTP protocol compared with HTTP during July 2020.
The next goal was for a single site to have 50% of all its data transferred via HTTPS.
Percentage of data transferred to/from Nebraska via HTTPS vs non-HTTPS
The above shows the amount of production data transferred to and from Nebraska using HTTPS compared with non-HTTPS during April 2021.
On the ATLAS side, the transition has moved at a faster pace: most ATLAS sites have adopted an HTTPS endpoint.
Protocol breakdown for transfers at all ATLAS sites
The above shows the percentage of data transferred among all sites (excluding tape endpoints) using each of the available protocols during April 2021.
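The protocol breakdowns and the 30% and 50% goals described above all reduce to the same calculation: each protocol's share of the total bytes transferred. A minimal sketch of that calculation follows; the protocol names and byte counts in the example are made up for illustration and are not measurements from any site.

```python
from typing import Dict


def protocol_shares(bytes_by_protocol: Dict[str, float]) -> Dict[str, float]:
    """Return each protocol's share of the total volume, in percent."""
    total = sum(bytes_by_protocol.values())
    if total <= 0:
        raise ValueError("total transferred volume must be positive")
    return {proto: 100.0 * volume / total
            for proto, volume in bytes_by_protocol.items()}


def meets_goal(shares: Dict[str, float], protocol: str, goal_percent: float) -> bool:
    """True if `protocol` carries at least `goal_percent` of the volume."""
    return shares.get(protocol, 0.0) >= goal_percent


if __name__ == "__main__":
    # Hypothetical monthly volumes in TB; not real measurements.
    example = {"davs": 60.0, "gsiftp": 30.0, "root": 10.0}
    shares = protocol_shares(example)
    for proto, pct in shares.items():
        print(f"{proto}: {pct:.1f}%")
    print("50% goal met for davs:", meets_goal(shares, "davs", 50.0))
```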
- Brian Bockelman
- Diego Davila
- 23 Jun 2021 - "OSG Xrootd Monitoring", Diego Davila, WLCG - xrootd monitoring discussion
- 12 May 2021 - "Transferring at 500Gbps with XRootD", Diego Davila, S&C Blueprint Meeting - Data Challenge
- 5 Mar 2021 - "Latest updates on the WLCG Token Transition Planning", Brian Bockelman, OSG All-Hands Meeting 2021
- 3 Mar 2021 - "HTTP Third-Party Copy: Getting rid of GridFTP", Diego Davila, OSG All-Hands Meeting 2021
- 24 Feb 2021 - "Update on the adoption of WebDAV for Third Party Copy transfers", Diego Davila, Offline and Computing Weekly meeting
- 10 Feb 2021 - "Follow-up on the WLCG Token Transition Timeline", Brian Bockelman, February 2021 GDB
- 9 Dec 2020 - "Update on OSG Token & Transfer Transition", Brian Bockelman, December 2020 GDB
- 21 Oct 2020 - "Benchmarking TPC Transfers on 100G links", Edgar Fajardo, DOMA / TPC Meeting
- 5 Aug 2020 - "Progress on transferring with HTTP-TPC", Diego Davila, Offline and Computing Weekly meeting
- 29 Apr 2020 - "OSG-LHC Technical Roadmap", Brian Bockelman, US ATLAS Computing Facility
- 19 Mar 2020 - "Update on the Globus transition", Brian Bockelman, OSG Council March 2020 Meeting
- 9 Mar 2020 - "IRIS-HEP and DOMA related activities", Brian Bockelman, WLCG DOMA F2F @ FNAL
- 27 Feb 2020 - "Modernizing the LHC’s transfer infrastructure", Edgar Fajardo, IRIS-HEP Poster Session
- 27 Feb 2020 - "Modernizing the LHC’s transfer infrastructure", Brian Bockelman, IRIS-HEP Poster Session
- 27 Nov 2019 - "Benchmarking xrootd HTTP tests", Edgar Fajardo, WLCG DOMA General Meeting
- 5 Nov 2019 - "Third-party transfers in WLCG using HTTP", Brian Bockelman, 24th International Conference on Computing in High Energy & Nuclear Physics
- 12 Jun 2019 - "XRootD and HTTP performance studies", Edgar Fajardo, XrootD Workshop@CC-IN2P3
- 28 May 2019 - "WLCG DOMA TPC Working Group", Brian Bockelman, US CMS Tier-2 Facilities May 2019 Meeting
- 6 Feb 2019 - "IRIS-HEP DOMA", Brian Bockelman, IRIS-HEP Steering Board Meeting
- Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD, Edgar Fajardo and Aashay Arora and Diego Davila and Richard Gao and Frank Würthwein and Brian Bockelman, arXiv:2103.12116 (2021) (Submitted to CHEP 2019) (22 Mar 2021).
- WLCG Authorisation from X.509 to Tokens, Brian Bockelman and Andrea Ceccanti and Ian Collier and Linda Cornwall and Thomas Dack and Jaroslav Guenther and Mario Lassnig and Maarten Litmaath and Paul Millar and Mischa Sallé and Hannah Short and Jeny Teheran and Romain Wartel, arXiv:2007.03602 [cs.CR] (Submitted to CHEP 2019) (08 Nov 2019).
- Third-party transfers in WLCG using HTTP, Brian Bockelman and Andrea Ceccanti and Fabrizio Furano and Paul Millar and Dmitry Litvintsev and Alessandra Forti, arXiv:2007.03490 [cs.DC] (Submitted to CHEP 2019) (08 Nov 2019).