#system-reliability

[ follow ]
San Francisco
fromsfist.com
4 days ago

BART Board Does Some Grilling, Managers Give Some Explanation About Last Week's Systemwide Meltdown

A fiber-optic cable move during upgrades triggered another BART systemwide meltdown, prompting management accountability and urgent mitigation across dozens more stations.
Software development
fromInfoQ
1 month ago

Grafana 12.1 Brings Built-in Diagnostics and Enhanced Alerting

Grafana 12.1 introduces features for system reliability, alert management, and dashboard interactivity, including Grafana Advisor and trendline transformations.
Privacy technologies
fromInfoQ
5 months ago

How Meta Uses Precision Time Protocol to Handle Leap Seconds

Leap seconds are critical for time-sensitive systems, requiring precise handling to prevent errors.
Meta's algorithmic method improves PTP synchronization by introducing a Window of Uncertainty for leap seconds.
[ Load more ]