OSG Network Monitoring
Deployed
The OSG Network Monitoring team designs, deploys, and maintains an infrastructure for collecting and using network monitoring data coming from LHC, OSG, and other collaborations and sites around the world.
Activities
- Network Pipeline Development: Using feedback from our operating experience, our team upgrades and evolves the network data pipeline to provide near real-time metrics for our analytics and visualization tools.
- Operating the Network Pipeline: We monitor and maintain our global network pipeline, from the data source (perfSONAR toolkits) to all data destinations. Maintenance includes tuning and optimizing component settings to ensure quick, reliable access to the data.
- Site and User Support: Using the network measurement data, site administrators and network users can identify potential network issues. Our team helps them triage initial problem reports to rule out a network issue, suggest next steps in the diagnosis, or, in some cases, identify the root cause.
- perfSONAR Deployment: Our network monitoring infrastructure critically depends on deploying and properly configuring perfSONAR to make network measurements to and from each site. We advise users on best practices and help diagnose perfSONAR issues.
- Documentation for Network Tools and Services: The team maintains documentation on the OSG website as well as additional documentation for our tools and services.
- Community Engagement: Because OSG and our WLCG partners serve the broader research and education community, we attend relevant community meetings and present our tools and services there.
- Training: To help train the next generation of network cyberinfrastructure specialists, we run a weekly meeting that brings together undergraduates, graduate students, team members, and project leaders to discuss our work, plans, and efforts.
Accomplishments
- Network Data Pipeline: We have created and evolved a robust network data pipeline that continuously gathers data from more than 250 perfSONAR toolkits worldwide and sends it to multiple locations supporting analysis, visualization, and backup.
- Alerting and Alarming Service: Over the last year we have created an alerting and alarming service for our network data. Any user can authenticate with their institutional credentials and select various types of alerts to subscribe to.
- Toolkit Information Server: We created a web service, ToolKitInfo, that serves as a central location for finding network-related tools, documentation, and applications.
- Network Data User Interfaces: With the extensive set of network data we collect, we have also worked to provide various user interfaces to allow easy exploration and visualization of the data. We have created Kibana and Grafana dashboards and stand-alone applications like TRACER and pSDash, all findable via the Toolkit Info link above.
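As a rough illustration of the fan-out idea behind the Network Data Pipeline item above (one measurement record delivered to several destinations, such as analysis and backup), here is a minimal sketch. It is not the team's implementation: the `Measurement` and `FanOutPipeline` names, the record fields, and the example hostnames are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Measurement:
    """One network measurement record (hypothetical schema)."""
    source: str       # hostname of the measuring toolkit (illustrative)
    destination: str  # hostname of the measured peer (illustrative)
    metric: str       # e.g. "throughput" or "latency"
    value: float


# A sink is any callable that accepts a measurement.
Sink = Callable[[Measurement], None]


@dataclass
class FanOutPipeline:
    """Delivers each record to every registered sink (hypothetical helper)."""
    sinks: Dict[str, Sink] = field(default_factory=dict)
    failures: List[str] = field(default_factory=list)

    def add_sink(self, name: str, sink: Sink) -> None:
        self.sinks[name] = sink

    def publish(self, record: Measurement) -> int:
        """Send the record to all sinks; return how many accepted it."""
        delivered = 0
        for name, sink in self.sinks.items():
            try:
                sink(record)
                delivered += 1
            except Exception as exc:
                # One failing destination must not block the others.
                self.failures.append(f"{name}: {exc}")
        return delivered


if __name__ == "__main__":
    analysis: List[Measurement] = []
    backup: List[Measurement] = []

    pipeline = FanOutPipeline()
    pipeline.add_sink("analysis", analysis.append)
    pipeline.add_sink("backup", backup.append)

    record = Measurement("ps1.example.org", "ps2.example.org", "throughput", 9.4e9)
    print(pipeline.publish(record), "sinks received the record")
```

Isolating each sink behind its own `try` block reflects one plausible design choice for a multi-destination pipeline: an outage at a single destination is recorded but does not stop delivery to the rest.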
Collaborations
This project collaborates, or has collaborated, with a number of projects and working groups, including:
- SAND (Service Analysis and Network Diagnosis): (2017-2021) https://sand-ci.org/ (NSF Grant #1827116)
- WLCG Throughput Working Group: (2014-Ongoing) https://twiki.cern.ch/twiki/bin/view/LCG/NetworkTransferMetrics
- HEPiX Network Function Virtualization Working Group: (2018-2020) see final report
- Research Networking Technical Working Group: (2019-Ongoing) see charter
- WLCG Monitoring Task Force: (2021-Ongoing) https://twiki.cern.ch/twiki/bin/view/LCG/
- WLCG DOMA Working Group: (2020-Ongoing) https://twiki.cern.ch/twiki/bin/view/LCG/DomaActivities
Team
Presentations
- 23 Oct 2024 - "Enhancing Network Analytics through Machine Learning", Petya Vasileva, CHEP 2024
- 11 Oct 2024 - "perfSONAR Network Analytics - Status & Plans", Petya Vasileva, #53 LHCOPN-LHCONE Meeting
- 17 Apr 2024 - "WLCG Network Monitoring and Analytics Updates", Shawn McKee, HEPiX Spring 2024 meeting
- 17 Apr 2024 - "Research Networking Technical Working Group Status and Plans", Shawn McKee, HEPiX Spring 2024 meeting
- 8 Mar 2024 - "Medium to Long Term Network Plans for ATLAS and CMS", Shawn McKee, #12 RCS-ICT Meeting
- 28 Feb 2024 - "USATLAS Data Challenge 2024 Take-aways", Shawn McKee, USATLAS Topical meeting (virtual)
- 9 Nov 2023 - "perfSONAR Plans for DC24", Shawn McKee, Data Challenge 2024 Workshop
- 14 Jun 2023 - "WLCG Data Challenge 2024 (DC24) Status and Plans Related to ATLAS DDM", Shawn McKee, ATLAS Software and Computing #75
- 14 Jun 2023 - "Network Problem Detection and Notification", Petya Vasileva, ATLAS Software & Computing Week #75
- 5 Jun 2023 - "USATLAS Facility and Data Challenge 2024", Shawn McKee, USATLAS Technical Meeting June 2023
- 8 May 2023 - "Identifying and Understanding Scientific Network Flows", Shawn McKee, CHEP 2023
- 19 Apr 2023 - "perfSONAR Strategy in Support of DC24 Preparation", Shawn McKee, #50 LHCOPN/LHCONE Meeting
- 19 Apr 2023 - "RNTWG Packet Marking and Pacing Update", Shawn McKee, #50 LHCOPN/LHCONE Meeting
- 6 Apr 2023 - "WLCG Site Network Monitoring Campaign", Shawn McKee, WLCG Operations Coordination
- 29 Mar 2023 - "perfSONAR Global Monitoring and Analytics Framework Update", Shawn McKee, Spring 2023 HEPiX Meeting
- 29 Mar 2023 - "Status and Plans for the Research Networking Technical WG", Shawn McKee, Spring 2023 HEPiX Meeting
- 11 Nov 2022 - "Network Packet Marking and Flow Labeling, the Technical Details", Shawn McKee, 5th Rucio Workshop
- 9 Nov 2022 - "WLCG Networking Topics from the LHCONE/LHCOPN Meeting", Shawn McKee, WLCG Meeting
- 2 Nov 2022 - "Update on the Global perfSONAR Monitoring Framework", Shawn McKee, Fall 2022 HEPiX Meeting
- 25 Oct 2022 - "RNTWG Packet Marking and Pacing", Shawn McKee, #49 LHCOPN-LHCONE Meeting
- 24 Oct 2022 - "LHC[ONE|OPN] Network Monitoring Update", Shawn McKee, #49 LHCOPN-LHCONE Meeting
- 10 Oct 2022 - "Packet and Flow Marking for Global Science Domains", Shawn McKee, 3rd Global Research Platform
- 14 Sep 2022 - "Monitoring WLCG Networks and Site Connections", Shawn McKee, Grid Deployment Board (GDB) at Nikhef
- 19 Aug 2022 - "Networking Analytics Activities", Shawn McKee, ATLAS ADC Analytics
- 13 Jun 2022 - "ATLAS Site Network Monitoring", Shawn McKee, ATLAS Software & Computing Week Sites Round Table
- 2 Jun 2022 - "WLCG Site Network Monitoring", Shawn McKee, WLCG Ops Coordination
- 11 May 2022 - "OSG Networking Update", Shawn McKee, OSG/PATh Staff Meeting
- 2 May 2022 - "Networking Topics for Science - Activities and Plans", Shawn McKee, IRIS-HEP Topical Meeting: What's going on in Networking
- 26 Apr 2022 - "Research Networking Technical WG Status and Plans", Shawn McKee, Spring 2022 HEPiX Meeting
- 30 Mar 2022 - "Research Networking Technical WG Update", Shawn McKee, LHCOPN-LHCONE meeting
- 29 Mar 2022 - "LHCOPN/LHCONE Monitoring Update", Shawn McKee, LHCOPN-LHCONE meeting
- 18 Mar 2022 - "Networking Activities and Plans", Shawn McKee, OSG 2022 All-Hands Meeting, Joint LHC Session
- 2 Mar 2022 - "Scientific Network Tags Packet and Flow Marking", Shawn McKee, WLCG DOMA Bulk Data Transfer (BDT) WG
- 27 Jan 2022 - "Packet and Flow Marking Technical Specification Update", Shawn McKee, Research Networking Technical Working Group Meeting
- 15 Nov 2021 - "The Service Analysis and Network Diagnosis (SAND) Data Pipeline", Shawn McKee, Supercomputing 2021 INDIS Workshop
- 3 Nov 2021 - "Using perfSONAR for WLCG and its Data Challenges", Shawn McKee, Internet2/TechEXtra21 perfSONAR Day
- 28 Oct 2021 - "WLCG Network Monitoring and Analytics Update", Shawn McKee, Fall 2021 HEPiX Meeting
- 12 Oct 2021 - "FABRIC and FAB Project Overviews and Status", Shawn McKee, Fall 2021 LHCOPN/LHCONE Meeting
- 11 Oct 2021 - "Research Network Technical WG Update", Shawn McKee, Fall 2021 LHCOPN/LHCONE Meeting
- 11 Oct 2021 - "LHCOPN/LHCONE Monitoring Update", Shawn McKee, Fall 2021 LHCOPN/LHCONE Meeting
- 8 Oct 2021 - "Towards a National Research Platform that federates all academic Cyberinfrastructure in the USA", Frank Wuerthwein, CARLA 2021
- 7 Oct 2021 - "Networking for the HL-LHC Era", Shawn McKee, ATLAS Software and Computing, ADC Session Towards Run-4
- 20 Sep 2021 - "Prototype National Research Platform", Frank Wuerthwein, 2nd Global Research Platform Workshop
- 21 Jun 2021 - "Big Science on PRP", Frank Wuerthwein, Pacific Research Platform, Its Legacy and Promise - 6 Year Virtual Symposium
- 27 Apr 2021 - "Network Monitoring R&D (and data challenge needs)", Shawn McKee, Data Challenge Monitoring Mini-Workshop
- 23 Mar 2021 - "LHCOPN/LHCONE Monitoring Updated", Shawn McKee, LHCONE/LHCOPN Spring 2021 Meeting
- 23 Mar 2021 - "Towards A National Research Platform that federates all academic Cyberinfrastructure in the USA", Frank Wuerthwein, Open Science Workshop, UCLA 2022
- 16 Mar 2021 - "WLCG/OSG Network Activities, Status and Plans", Shawn McKee, HEPiX Spring 2021 Meeting
- 16 Mar 2021 - "Relationship of OSG and NRP", Frank Wuerthwein, OSG All-Hands Meeting 2022
- 12 Mar 2021 - "Global Cyberinfrastructure for LIGO, Virgo, Kagra, IceCube, and others", Frank Wuerthwein, IPAM Mathematical and Computational Challenges in the Era of Gravitational Wave Astronomy
- 5 Mar 2021 - "Network Topics For Discussion", Shawn McKee, Open Science Grid All-Hands Meeting, Joint USATLAS-USCMS Session
- 19 Jan 2021 - "Update on the Packet Marking WG", Shawn McKee, HEPiX IPv6 Working Group Virtual F2F
- 12 Nov 2020 - "Data-intensive IceCube Cloud Burst", Igor Sfiligoi, NRP Pilot weekly meeting
- 29 Sep 2020 - "Update from RNTWG - Packet Marking Subgroup", Shawn McKee, HEPiX IPv6 Working Group
- 16 Sep 2020 - "Research Networking Technical WG (RNTWG) Update", Shawn McKee, LHCONE/LHCOPN Fall 2020 Meeting
- 28 Jul 2020 - "Demonstrating 100 Gbps in and out of the public Clouds", Igor Sfiligoi, PEARC20
- 3 Jun 2020 - "IPv6 and RNTWG Sub-groups", Shawn McKee, HEPiX IPv6 Working Group
- 27 May 2020 - "TransAtlantic networking using Cloud links", Igor Sfiligoi, S&C Blueprint Meeting
- 13 May 2020 - "Research Network Technology Working Group", Shawn McKee, LHCONE/LHCOPN Spring 2020 Meeting
- 13 May 2020 - "LHCOPN/LHCONE perfSONAR Update", Shawn McKee, LHCONE/LHCOPN Spring 2020 Meeting
- 12 May 2020 - "Report on the Research Networking Technical WG", Shawn McKee, Internet2 Community Measurement, Metrics and Telemetry Meeting
- 12 May 2020 - "The SAND Project and a New Research Networking Technical Working Group", Shawn McKee, Internet2 Community Measurement, Metrics and Telemetry Meeting
- 5 May 2020 - "Networking: Status and Discussion", Shawn McKee, ATLAS Distributed Data Management Roundtable
- 30 Apr 2020 - "GPU Cloud Bursting for Multi-Messenger Astrophysics with IceCube", Frank Wuerthwein, AWS Education: Research Seminar Series
- 16 Apr 2020 - "FABRIC for High-Energy Physics Prototyping", Shawn McKee, FABRIC Community Workshop
- 14 Apr 2020 - "New National Science Foundation International Research and education Network Connections Testbed Solicitation", Shawn McKee, WLCG Management Board
- 1 Apr 2020 - "USATLAS: OSG/WLCG perfSONAR Network Monitoring and Analytics", Shawn McKee, USATLAS Facilities Meeting
- 27 Feb 2020 - "OSG Network Monitoring (poster)", Shawn McKee, IRIS-HEP Poster Session
- 27 Jan 2020 - "Running a 380PFLOP32s GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud", Frank Wuerthwein, NRP Engagement webinar
- 15 Jan 2020 - "LHCONE/LHCOPN Meeting Summary", Shawn McKee, Grid Deployment Board
- 15 Jan 2020 - "Updates on OSG/WLCG perfSONAR Network Monitoring and Analytics", Shawn McKee, Grid Deployment Board
- 14 Jan 2020 - "Network Function Virtualization Report and Next Steps for Experiments", Shawn McKee, LHCONE/LHCOPN Meeting
- 4 Dec 2019 - "Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud", Frank Wuerthwein, Middleware and Grid Interagency Cooperation (MAGIC) meeting
- 3 Dec 2019 - "perfSONAR Analytics", Shawn McKee, ATLAS Software and Computing Week
- 19 Nov 2019 - "Burst data retrieval after 50k GPU Cloud run", Igor Sfiligoi, SC 19 Internet2 Booth
- 19 Nov 2019 - "Running a GPU burst for Multi-Messenger Astrophysics with IceCube across all available GPUs in the Cloud", Frank Wuerthwein, SC 19 Multiple Booths
- 7 Nov 2019 - "Network Capabilities for the HL-LHC Era", Shawn McKee, CHEP2019
- 4 Nov 2019 - "WLCG Networks: Update on Monitoring and Analytics", Shawn McKee, CHEP2019
- 4 Nov 2019 - "Characterizing network paths in and out of the Clouds", Igor Sfiligoi, CHEP 2019
- 16 Oct 2019 - "The SAND Project at the Halfway Point", Shawn McKee, The Fall 2019 HEPiX Meeting
- 16 Oct 2019 - "Network Function Virtualization Working Group Update", Shawn McKee, The Fall 2019 HEPiX Meeting
- 16 Oct 2019 - "WLCG/OSG Network Activities, Status and Plans", Shawn McKee, The Fall 2019 HEPiX Meeting
- 2 Jul 2019 - "The Service Analysis and Network Diagnostic (SAND) Project", Shawn McKee, The 6th Special Interest Group on Performance Monitoring and Verification Meeting
- 4 Jun 2019 - "LHCOPN/LHCONE perfSONAR Update", Shawn McKee, LHCONE/LHCOPN Meeting
- 26 Mar 2019 - "WLCG/OSG Network Activities, Status and Plans", Shawn McKee, HEPiX Spring 2019
- 20 Mar 2019 - "OSG Networking: Status, Collaborations and Plans", Shawn McKee, Joint HSF/OSG/WLCG Workshop (HOW2019)
Publications
- The Service Analysis and Network Diagnosis DataPipeline, D. Weitzel, S. McKee, B. Bockelman, J. Thiltges, M. Babik and I. Vukotic, arXiv 2112.03074 (06 Dec 2021).
- TRACER (TRACe route ExploRer): A tool to explore OSG/WLCG network route topologies, E. Tretyakov, A. Artamonov, M. Grigorieva, A. Klimentov, S. McKee and I. Vukotic, Int.J.Mod.Phys.A 36 2130005 (2021) (10 Mar 2021) [NSF PAR].