[18:07] <sarnold_> teward: jeeeeze, that's an unhappy system and unhappy sysadmin..
[18:08] <teward> 'grumpy' sysadmin is the proper wording
[18:08] <teward> i also didn't sleep well so i'm extra grumpy today
[18:08] <teward> :P
[18:19] <JanC> talking about bad updates: the person who pushed that BGP update to the routers in Facebook's DCs is probably looking for flights out of the country by now  :P
[18:36] <teward> hah
[18:37] <teward> we sure it's BGP updates that took it out?  I mean, probably was, but still xD
[18:37] <teward> (there's nowhere on the planet they can hide, their flight better be a SpaceX)
[18:37] <JanC> from a FB tech on reddit (now deleted again):
[18:38] <JanC> """As many of you know, DNS for FB services has been affected and this is likely a symptom of the actual issue, and that's that BGP peering with Facebook peering routers has gone down, very likely due to a configuration change that went into effect shortly before the outages happened (started roughly 1540 UTC).
[18:38] <JanC> There are people now trying to gain access to the peering routers to implement fixes, but the people with physical access is separate from the people with knowledge of how to actually authenticate to the systems and people who know what to actually do, so there is now a logistical challenge with getting all that knowledge unified.
[18:38] <JanC> Part of this is also due to lower staffing in data centers due to pandemic measures."""
[18:39] <JanC> also: allegedly they normally use FB Messenger for internal communications
[18:40] <teward> so basically, FB screwed themselves xD
[18:42] <sarnold> it's super-helpful to have a channel on oftc or libera or maybe even both :)
[18:42] <teward> or a Slack for them xD
[18:42] <teward> but you're not wrong
[18:42] <JanC> they already apologised on Twitter
[18:43] <JanC> that must have hurt  :P
[18:43] <JanC> teward/sarnold: now you assume their internal network routing to Slack servers is still up...
[18:44] <teward> accurate statement
[18:44] <JanC> or to IRC
[18:44] <teward> well if they fubar'd their network THAT badly with bad BGP routes they failed hard xD
[18:44] <JanC> mobile phones probably still work, but maybe not inside the DC
[18:45] <sarnold> JanC: easy peasy, cell phone tether
[18:47] <JanC> I guess you could set up some route over a mobile phone to the internal network inside each DC, once you can log into the router, but the person with the password/key for that is maybe 500km away   :)
[18:48] <JanC> I'm sort of surprised they don't have some sort of "hardcoded" route into their DCs...
[18:55] <sarnold> a modem with serial port ..
[19:00] <JanC> """Was just on phone with someone who works for FB who described employees unable to enter buildings this morning to begin to evaluate extent of outage because their badges weren’t working to access doors."""
[19:01] <JanC> imagine not getting into the DC either  ;)
[19:01] <teward> heheh whoops xD
[19:01] <JanC> hello, can we rent a bulldozer from you?
[19:08] <teward> accurate
 a process control failure led to applying an overly aggressive export filter; the routers complied, stopped announcing routes to the internet, and FB's OOB network management fell over because it had a sneaky dependency on the rest of FB's network
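The failure mode described above — an export filter that rejects more than intended, so the routers dutifully withdraw everything — can be sketched in BIRD filter syntax. This is a hypothetical illustration, not FB's actual configuration; the filter name and prefix are made up for the example:

```
# Hypothetical sketch of an overly aggressive BGP export filter (BIRD syntax).
# Intent: keep announcing one prefix and prune the rest of a candidate set.
# Effect: the bare fall-through "reject" withdraws every other route the
# router was announcing -- the peers see a full withdrawal, exactly as if
# the network had vanished.
filter export_to_peers
{
    if net ~ [ 192.0.2.0/24 ] then accept;  # the one prefix meant to survive
    reject;                                 # everything else is withdrawn
}
```

Once the announcements are gone, anything that resolved through those prefixes — including, per the discussion above, the OOB management path and FB's authoritative DNS — goes dark with them.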
[20:29] <JanC> so they *did* have a "hardcoded" independent control route... except it wasn't as independent as they thought!  :P