[10:54] blackboxsw: https://slexy.org/view/s21hGF2TPF - it's just a single NIC dual-stack setup with stateful DHCPv6
=== shardy is now known as shardy_mtg
=== shardy_mtg is now known as shardy
[13:45] Hi all! Can someone explain why we check it in this line and then again in line 182? https://github.com/cloud-init/cloud-init/blob/master/cloudinit/ssh_util.py#L178
[13:46] looks like 'k' should be checked in line 182
[14:12] kholkina: reading.
[14:22] kholkina: you would appear to be correct. it seems you could expose that pretty easily in a unit test
[14:35] well, it doesn't test your issue
[14:35] but http://paste.ubuntu.com/26500979/
[14:36] adds a test of that. and i guess to expose your bug, we just need to make the 'new_entry' invalid.
[14:38] http://paste.ubuntu.com/26500992/
[14:38] and there is a unit test showing your issue, kholkina
[17:07] mcb30: thanks!
[18:54] * blackboxsw sets up vpn * vmware on my desktop
[18:54] oops
[21:24] smoser - I think I've tracked down my problem to being a form of PEBKAC. That being said, there's a reasonable case for protecting against this failure mode (as well as the case where the metadata server goes to lunch in the middle of a request) that doesn't involve a ton of code. I'd like to get your opinion on this.
[21:24] https://bugs.launchpad.net/cloud-init/+bug/1746605
[21:24] Ubuntu bug 1746605 in cloud-init "stack trace when sdc:* not defined" [Undecided,New]
[21:27] mgerdts: login prompt
[21:27] right ?
[21:27] login is running on that console and writing as cloud-init is
[21:27] no, a mdata-get command running at the same time as cloud-init
[21:28] rm /usr/sbin/mdata-get; reboot; problem fixed!
[21:29] since pretty much anything can connect to the port and mess up the conversation, a retry or two may be useful.
[21:29] That would also provide some resilience in case the metadata server in the host is bounced while cloud-init is in the middle of a transaction.
[21:30] * mgerdts wishing we had a transport that was aware of multiple sessions
[21:50] mgerdts: you mention a race with other mdata-get users ... what else would be running in parallel with cloud-init that early? does mdata-get write a lock file somewhere that cloud-init could check for? does mdata-get or the scripts themselves do any retry? that may still result in some sort of storm of clobbering each other, then requiring some sort of back-off (or both sides failing)
[21:52] We could follow the advice at tldp.org for locking serial ports in mdata-{get,put,...} and cloud-init. That would be a good thing. It doesn't prevent some other random process from opening the serial port and producing or consuming data.
[21:52] Options 2 + 3 would be the most robust.
[21:54] cloud-init's use of the serial port is pretty limited, just early boot; it's somewhat surprising that any other user of the serial is active at that time
[21:56] Right now I'm trying to track down what else on the system is using the metadata port.
[21:57] I do think your suggestion (2 + 3) is a reasonable approach though; but maybe pushing the parallel usage issue somewhere else, only to come back once those users add their retry and locking
[21:59] there are various serial multiplexors; conserver and the like; maybe something in front of the raw serial device can help serialize access
[21:59] mgerdts: yeah.. we can add retry.
[21:59] and we can add advisory locking too, and put a change in mdata-get to do the same.
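The ssh_util.py issue discussed at 13:45-14:38 above comes down to a nested loop that re-tests the outer entry where it should test the inner key. The sketch below is not cloud-init's actual code; the KeyEntry class and update_keys() helper are invented purely to illustrate the pattern and how an invalid 'new_entry' would drive a test through the inner check.

    class KeyEntry:
        """Hypothetical stand-in for a parsed authorized_keys line."""
        def __init__(self, line):
            self.line = line
            parts = line.split()
            self.base64 = parts[1] if len(parts) >= 2 else None

        def valid(self):
            return self.base64 is not None


    def update_keys(old_entries, new_keys):
        """Replace existing entries that share a base64 blob with a new key."""
        to_add = list(new_keys)
        for i, ent in enumerate(old_entries):
            if not ent.valid():      # outer check: skip invalid existing entries
                continue
            for k in new_keys:
                if not k.valid():    # inner check must look at 'k';
                    continue         # re-checking 'ent' here is the bug pattern
                if k.base64 == ent.base64:
                    old_entries[i] = k
                    if k in to_add:
                        to_add.remove(k)
        return old_entries + to_add


    # A test along the lines of the pastes above would feed an invalid new
    # entry through this path, so the inner check is actually exercised:
    old = [KeyEntry("ssh-rsa AAAAB3NzaC1yc2E user@host")]
    new = [KeyEntry("not-a-key")]    # no base64 part, so valid() is False
    merged = update_keys(old, new)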
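A minimal sketch of what the "options 2 + 3" combination (advisory locking plus retry) from 21:52-21:59 could look like. This is not cloud-init's SmartOS datasource code; the lock path, device path, and do_transaction() callable are assumptions chosen for illustration, and the lock is advisory, so it only helps tools that agree to take it.

    import fcntl
    import os
    import time

    LOCKFILE = "/var/lock/LCK..ttyS1"   # hypothetical tldp.org-style lock path
    DEVICE = "/dev/ttyS1"               # hypothetical metadata serial device
    RETRIES = 3


    def with_serial_lock_and_retry(do_transaction):
        """Run do_transaction(device_path) under an advisory lock, retrying on failure."""
        lock_fd = os.open(LOCKFILE, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            # Advisory only: cloud-init and mdata-{get,put} would all need to
            # take the same lock; a random process opening the device bypasses it.
            fcntl.flock(lock_fd, fcntl.LOCK_EX)
            last_err = None
            for attempt in range(1, RETRIES + 1):
                try:
                    return do_transaction(DEVICE)
                except (OSError, ValueError) as err:
                    # A garbled conversation (another writer on the port, or the
                    # host-side metadata server bounced mid-request) lands here.
                    last_err = err
                    time.sleep(attempt)   # crude back-off between attempts
            raise last_err
        finally:
            fcntl.flock(lock_fd, fcntl.LOCK_UN)
            os.close(lock_fd)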
[22:02] I don't think that trying to add a multiplexer is the solution, as there's no session information attached to each byte that goes across the connection. For now, the best solution seems to be a lock along with a retry.
[22:02] There's been some discussion of supporting vsockets. If we get that in place, it would be great to transition the SmartOS metadata service over to that.
[22:09] +1 on vsockets
[22:14] mgerdts: well we use a socket in the container solution
[22:15] right ?
[22:15] yep
[22:15] We just don't have a way to pass a socket all the way into a VM yet.
[22:15] right.
[22:19] https://github.com/joyent/mdata-client/issues/11
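For reference, the appeal of the vsock transport mentioned above is that each client gets its own guest-to-host stream, so concurrent users can no longer clobber one another the way they do on the shared serial line. The sketch below is only an illustration of a guest-side client; the port number is made up and the SmartOS metadata service does not currently speak this transport.

    import socket

    VSOCK_PORT = 12345  # hypothetical service port on the host


    def query_host(payload: bytes) -> bytes:
        """Send one request to a host-side vsock service and return its reply."""
        # AF_VSOCK gives this connection its own session, unlike the raw
        # serial device that any process can open and write to.
        with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as s:
            s.connect((socket.VMADDR_CID_HOST, VSOCK_PORT))
            s.sendall(payload)
            return s.recv(65536)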