Third Party Copy


LHC data is constantly being moved between computing and storage sites to support analysis, processing, and simulation, at a scale that is currently unique within the science community. For example, the CMS experiment at the LHC manages approximately 200PB of data and moves 1PB between sites every day. Across all four experiments, global data movement regularly peaked above 250Gbps in 2021, and this is without the LHC accelerator taking new data!
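To put the daily volume above in network terms, it can be converted into an average sustained rate. The sketch below is a back-of-the-envelope helper of our own (the function names are hypothetical), assuming decimal units (1 PB = 10^15 bytes, 1 Gbps = 10^9 bit/s) and traffic spread evenly over the day; real traffic is bursty, so peaks run well above the average.

```python
# Back-of-the-envelope conversion between daily data volume and average network rate.
# Assumptions: decimal units (1 PB = 10**15 bytes, 1 Gbps = 10**9 bit/s) and
# traffic spread evenly over 24 hours, so results are averages, not peaks.

SECONDS_PER_DAY = 24 * 60 * 60
BITS_PER_PETABYTE = 8 * 10**15


def average_rate_gbps(petabytes_per_day: float) -> float:
    """Average rate (Gbps) needed to move the given volume every day."""
    return petabytes_per_day * BITS_PER_PETABYTE / SECONDS_PER_DAY / 10**9


def petabytes_per_day(rate_gbps: float) -> float:
    """Volume (PB) moved in one day at a constant rate given in Gbps."""
    return rate_gbps * 10**9 * SECONDS_PER_DAY / BITS_PER_PETABYTE


if __name__ == "__main__":
    print(f"1 PB/day  -> {average_rate_gbps(1.0):.1f} Gbps")
    print(f"250 Gbps  -> {petabytes_per_day(250.0):.1f} PB/day")
```

Under these assumptions, CMS's 1PB per day averages out to roughly 93Gbps, while the 250Gbps peak, if it were sustained for a full day, would move about 2.7PB.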

The HL-LHC promises a data deluge: we will need to modernize the infrastructure to sustain at least 1Tbps by 2027, likely peaking at twice that level. Historically, bulk data movement has been done with the GridFTP protocol; as the community prepares for the increased data volumes of the HL-LHC and GridFTP becomes increasingly niche, we need modern software, protocols, and techniques to move data. The IRIS-HEP DOMA area, in collaboration with the WLCG DOMA activity, is helping the LHC, and HEP in general, transition to HTTP for bulk data transfer.

Figure: How fast is HTTP? TPC rates from testing

The graph above shows data movement rates (up to 24Gbps) for a single host, achieved during standalone tests; a typical LHC site will load-balance across multiple hosts in order to saturate its available network links. With a sufficiently performant HTTP server, we have observed that the protocol can move data as fast as the underlying network allows.


During the initial phase of IRIS-HEP, the team worked with a variety of implementations to improve their code and ensure interoperability. The first goal was for all storage implementations commonly used by the LHC to provide an HTTP endpoint. The initial transfer milestone was for one site to move more than 30% of its data using HTTP; this was accomplished in 2020. For 2021, the goal is for every LHC site to use HTTP-TPC.

For CMS, we picked two sites, Nebraska and UCSD, to lead the transition by using the ‘davs’ protocol for all incoming production transfers from the many sites that support it.
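In a third-party copy, the client never handles the data itself: it asks one storage endpoint to transfer the file directly to or from another. The ‘davs’ scheme used by transfer tools refers to WebDAV over HTTPS. As a hedged illustration of the "pull" mode of HTTP-TPC as we understand it, the client sends a WebDAV COPY request to the destination with a Source header naming the file to fetch. The sketch below only builds such a request with Python's standard urllib.request and does not send it; the URLs, the helper name, and the optional bearer token are hypothetical, and a real transfer also requires the destination to authenticate to the source, which this sketch omits.

```python
import urllib.request


def build_pull_copy_request(destination_url, source_url, bearer_token=None):
    """Build (but do not send) an HTTP-TPC pull-mode COPY request.

    Assumption: pull mode is expressed as a COPY sent to the destination, with a
    Source header telling the destination where to fetch the file from.
    """
    headers = {"Source": source_url}
    if bearer_token is not None:
        # Hypothetical token authorizing the client at the destination endpoint.
        headers["Authorization"] = f"Bearer {bearer_token}"
    return urllib.request.Request(destination_url, headers=headers, method="COPY")


if __name__ == "__main__":
    # Hypothetical endpoints; building the request generates no network traffic.
    req = build_pull_copy_request(
        "https://dest.example.org/store/file.root",
        "https://source.example.org/store/file.root",
    )
    print(req.get_method(), req.full_url)
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Keeping the data path between the two storage endpoints is what lets a single client orchestrate many transfers without itself becoming a bottleneck.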


Figure: GridFTP vs HTTP. Percentage of data transferred to UCSD using GridFTP and HTTP

The figure above compares the amount of data transferred to UCSD using GridFTP versus HTTP during July 2020.


The next goal was for a single site to transfer 50% of all its data via HTTPS.

Figure: HTTPS vs non-HTTPS. Percentage of data transferred to/from Nebraska via HTTPS vs non-HTTPS

The figure above compares the amount of production data transferred to and from Nebraska using HTTPS versus non-HTTPS protocols during April 2021.



On the ATLAS side, the transition has moved faster, as most ATLAS sites have already adopted an HTTPS endpoint.


Figure: ATLAS protocol breakdown. Protocol breakdown for transfers at all ATLAS sites

The figure above shows the percentage of data transferred among all ATLAS sites (excluding tape endpoints) using each of the available protocols during April 2021.


More Information

Publications

  • Systematic benchmarking of HTTPS third party copy on 100Gbps links using XRootD, Edgar Fajardo, Aashay Arora, Diego Davila, Richard Gao, Frank Würthwein, and Brian Bockelman, arXiv:2103.12116 (Submitted to CHEP 2019) (22 Mar 2021).
  • WLCG Authorisation from X.509 to Tokens, Brian Bockelman, Andrea Ceccanti, Ian Collier, Linda Cornwall, Thomas Dack, Jaroslav Guenther, Mario Lassnig, Maarten Litmaath, Paul Millar, Mischa Sallé, Hannah Short, Jeny Teheran, and Romain Wartel, arXiv:2007.03602 [cs.CR] (Submitted to CHEP 2019) (08 Nov 2019).
  • Third-party transfers in WLCG using HTTP, Brian Bockelman, Andrea Ceccanti, Fabrizio Furano, Paul Millar, Dmitry Litvintsev, and Alessandra Forti, arXiv:2007.03490 [cs.DC] (Submitted to CHEP 2019) (08 Nov 2019).