[13:17] <cmaloney> Morning
[13:20] <jrwren> mornin
[13:31] <cmaloney> Man, surprise covers abound
[13:31]  * cmaloney is listening to Deconbrio - Guilty
[13:31] <cmaloney> cover of Gravity Kills - Guilty
[13:31] <cmaloney> I like.
[13:31] <jrwren> rather it be a klute cover :p
[13:32] <cmaloney> Not familiar
[13:38] <jrwren> leather strip side project
[13:38] <jrwren> very aggro
[13:44] <cmaloney> Ah
[13:44] <cmaloney> laeeaeaeaaeeeaeeeather strip?
[13:44] <cmaloney> ;)
[17:36] <greg-g> weeee
[17:36] <greg-g> https://twitter.com/Wikimedia/status/563388375898411008
[17:36] <greg-g> "All Wikimedia sites are experiencing issues due to a network problem. We'll be back up shortly!"
[17:39] <cmaloney> greg-g: Woo woo
[18:26] <brousch> greg-g: Fix it!
[18:30] <greg-g> luckily, I don't have to
[18:30] <greg-g> power outage to an important switch
[18:54] <cmaloney> Lovely
[18:55] <greg-g> yeah, I feel bad for the DC tech
[18:55] <cmaloney> I'm sure his death will be swift.
[18:55] <cmaloney> Though not honorable
[18:58] <jrwren> single switch?
[18:58] <jrwren> that is a shit spof design.
[18:58] <jrwren> bridging is your friend FFS! :p
[18:58] <cmaloney> jrwren: Likely one switch with bad failover.
[18:59] <greg-g> yeah, I don't know all the details, but it sounds like it was a perfect shitty accidental storm
[19:00] <cmaloney> yeah, there's a lot of instances where if one component fails then things are fine, but if one component disappears (or conversely doesn't disappear enough) then shit doesn't work
[19:01] <cmaloney> "Hi, I know this is weird and all but I just powered up and I have no idea what a route is. Pleased to meet me"
[19:02] <greg-g> the best part is, it looks like our logging system kept us from coming back up in a timely manner, we had to disable logging for a $timespan
[19:02] <cmaloney> "Hi other router. Apparently you're up, so here's all the traffic. Derp derp"
[19:02] <greg-g> :)
[19:02] <cmaloney> greg-g: Those are the worst
[19:03] <cmaloney> When your tracking is actively fucking you.
[19:03] <greg-g> yeah, which we also just beefed up a bit (and started logging a lot more stuff by default)
[19:12] <greg-g> We've indeed had a total site outage for roughly 30 minutes. We're still
[19:12] <greg-g> collecting all data, but we've tracked down the cause to multiple cascading
[19:12] <greg-g> issues including loss of power to a critical SPOF network switch and HHVM
[19:12] <greg-g> MediaWiki application servers getting blocked due to multiple unoptimal
[19:12] <greg-g> timeout settings. We'll post a full incident report soon, and work to
[19:12] <greg-g> correct the underlying issues as soon as possible.
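The failure mode described above, where a logging path stalls the application servers, can be illustrated with a small sketch. This is not what Wikimedia runs or how they fixed it; it is a generic Python example using the standard library's `logging.handlers.QueueHandler` and `QueueListener`. The names `DroppingQueueHandler`, `make_logger`, and `max_pending` are hypothetical and exist only for this illustration. The idea is that the application thread only ever does a non-blocking put onto a bounded queue, and when the queue is full the record is dropped and counted instead of making the caller wait.

```python
import logging
import logging.handlers
import queue


class DroppingQueueHandler(logging.handlers.QueueHandler):
    """Hypothetical handler: drop records instead of blocking when the queue is full."""

    def __init__(self, log_queue):
        super().__init__(log_queue)
        self.dropped = 0  # number of records discarded because the queue was full

    def enqueue(self, record):
        try:
            # Never wait on the logging backend from the application thread.
            self.queue.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def make_logger(name, sink, max_pending=1000):
    """Build a logger whose records reach `sink` via a bounded queue.

    `sink` is any logging.Handler (file, socket, syslog, ...). The returned
    listener must be started with listener.start() to drain the queue in a
    background thread, and stopped with listener.stop() on shutdown.
    """
    log_queue = queue.Queue(maxsize=max_pending)
    handler = DroppingQueueHandler(log_queue)
    listener = logging.handlers.QueueListener(log_queue, sink)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep records away from the root logger's handlers
    return logger, handler, listener
```

If the sink slows down or hangs, the queue fills up and `handler.dropped` starts climbing, which is a lossy trade-off: log lines go missing, but request handling is not held up by the logging backend.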
[21:44] <brousch> hmmmm
[21:44] <brousch> https://www.irccloud.com/pastebin/oz47PtCo
[21:49] <brousch> Ah, it's docker.io, not docker
[21:57] <greg-g> https://wikitech.wikimedia.org/wiki/Incident_documentation/20150205-SiteOutage
[22:23] <rick_h_> evening
[22:23] <cmaloney> Hello from OCC
[22:24] <rick_h_> party!
[22:25] <cmaloney> W00t
[23:00]  * DrDaemonEye waves at cmaloney
[23:14] <cmaloney> Howdy
[23:17] <DrDaemonEye> how goes?
[23:18] <cmaloney> Writing
[23:18] <DrDaemonEye> fun times
[23:18] <cmaloney> Yeah