[12:12] rharper: i think i probably paste-binned you a git-propose-merge once ?
[12:36] smoser: good morning; I'm about to push the new version of my MR in our infrastructure as it fixes an IPv6 regression
[12:36] hopefully it'll be adequate for merging, otherwise I'll push a new version
[12:37] btw, I took ownership of LP: 1662345 as it is a showstopper for us to deploy cloud-init on arm64
[12:37] Launchpad bug 1662345 in qemu (Ubuntu Xenial) "smbios parameter settings not visible in guest" [Medium,Confirmed] https://launchpad.net/bugs/1662345
[14:08] caribou: ok. i will look.
[14:09] smoser: fun thing is that your suggestion fixed an issue we have with IPv6 following our recent deployment
[14:22] smoser: hrm
[14:31] rharper: i found it.
[14:31] thanks to irc logs.
[14:31] heh
[14:31] git-propose-merge is http://paste.ubuntu.com/p/gMBWKvD42W/
[14:31] what was it again? oh, like sparkie's git to MP
[14:31] right?
[14:31] for launchpad ?
[14:31] yeah, but sparkiegeeks' requires launchpadlib and auth.
[14:32] which is sort of the right way to do it, but sure, we're already logged in anyhow
[14:32] this is much faster, and works in a small subset of places. but places that i use.
[14:32] yep
[14:32] well, this just opens up the browser for you
[14:32] and you hit 'merge'
[14:32] his actually creates it.
[14:33] there are other issues i have with his. nothing that couldn't be fixed. that is definitely the right place (or more right than hacky smoser scripts)
[14:40] smoser: i am working on https://bugs.launchpad.net/cloud-init/+bug/1779207
[14:40] Ubuntu bug 1779207 in cloud-init "Failed mount of '/dev/sdb1' with Swap File cloud-init config" [Medium,Triaged]
[14:41] Do you know if other cloud platforms have hit this issue?
[14:56] danMS_: by "platform" you mean cloud platform (not linux distro)
[14:56] right?
[14:57] yes
[14:57] the issue is probably present everywhere.
[14:57] however it is dramatically worse on azure
[14:57] because of "redeploy" with same instance id.
[14:57] er... maybe not relevant to instance id.
[14:58] let me try again
[14:58] the problem exists any time there are "stale" entries in /etc/fstab
[14:58] ok, on Azure they will get a new ntfs formatted ephemeral on redeploy or deallocate
[14:58] on Azure, stale entries occur in a regular lifespan of an instance (with 'redeploy'... that might not be the right word)
[14:59] on other platforms, the issue can occur on a snapshot -> new-instance
[14:59] so it sounds like, when they deallocate their VM, they present the previously attached ephemeral drive
[14:59] whereas we are not
[15:00] blackboxsw: Trying to update DataSourceSmartOS to use EventType.BOOT to get network reconfiguration on boot and hitting a snag. The change is simple enough, I think: https://hastebin.com/vahizocidi.diff
[15:00] danMS_: well no one else "deallocates" to my knowledge.
[15:01] i don't know actually.
[15:01] i have not done a deep dive, but i did not see this issue on Ubuntu vms
[15:01] But when I install the new deb networking is not configured properly. The ENI file has dhcp, and it looks like the saved configuration is for the wrong datasource.
[15:02] >>> pickle.load(open('obj.pkl', 'rb'))
[15:02]
[15:02] That was in /var/lib/cloud/intance.
[15:02] *instance
[15:03] mgerdts: blackboxsw is out for a while
[15:03] oh, ok. Any idea what could be causing the wrong datasource data to be cached? That is, does this sound familiar?
=== r-daneel_ is now known as r-daneel
[15:20] mgerdts: hrm; that's curious; what's the recreate scenario ? new instance boot, upgrade deb, reboot ?
[15:23] install xenial, upgrade to bionic, reboot, install new cloud-init, reboot (ssh host key changed - bad) (did not look closely at networking config), poweroff vm, modify network config in host, restart host metadata service to be sure it picked it up, booted VM.
[15:24] mgerdts: ok; I suppose it's best to pkl.load and print after each state to see where it changes
[15:24] Since then, I did cloud-init clean -l, reboot. It put DataSourceNone on obj.pkl
[15:24] ah
[15:24] that sounds like unidentified change
[15:24] clean wipes the previous instance info
[15:25] so next boot ds-identify needs to run and pick; what does the log look like after that reboot ?
[15:27] https://hastebin.com/axaguxusit.txt
[15:28] no local data found from DataSourceSmartOS
[15:28] so the SmartOS DS didn't say "yes" I have metadata/user-data
[15:28] when cloud-init called .get_data() on it
[15:28] which means that the fallback is DataSourceNone
[15:29] so the question is, why did the SmartOS DS say it had no local metadata ?
[15:29] isn't that over the serial interface ?
[15:29] yeah,
[15:29] I'll try some debug statements in _get_data
[15:31] yeah, I don't see a return without a boolean, and the False path has logging =(
[15:31] that's what I was thinking
[15:32] and some debugging in sources/__init__.py
[15:32] we now do this metadata caching;
[15:32] mgerdts: you should be able to use the main too
[15:32] python -m cloudinit.sources.DataSourceSmartOS
[15:33] might be easier to debug that way
[15:33] that dumps a bunch of metadata
[15:35] so i think the pickled object must have a method that is getting in the way. method or attribute i guess.
[15:35] IIUC, cloud-init clean was run
[15:36] which wipes the object
[15:36] oh. hm..
[15:36] yes, and verified that clean worked
[15:36] so then this is essentially fresh boot ?
[15:37] or as close as clean can get us ?
[15:38] commenting the change to DataSourceSmartOS caused ENI/50-cloud-init.cfg to get static network config. Oddly, not the right network config.
[15:38] https://hastebin.com/aviqacabew.txt
[15:39] ip should be .223 per sdc:nics, but is .222
[15:39] "ip":"10.88.88.223","ips":["10.88.88.222/24"]
[15:39] your data disagrees
[15:40] huh. I guess I missed one of the ip's in the zonecfg.
[15:40] I think we only look at ips since it was a superset ?
[15:40] notice 222 and 223 in there
[15:40] yeah
[15:41] ok, so something in the get_data() path is unhappy with update_events = {'network': [EventType.BOOT]} in DataSourceSmartOS.
[15:42] I'll go hunting
[15:57] Looks like the comment in class DataSource is wrong. This seems to work:
[15:57] update_events = {'network': [EventType.BOOT_NEW_INSTANCE, EventType.BOOT]}
[15:58] Apparently BOOT_NEW_INSTANCE is not a subset of BOOT.
[15:58] oh, yeah
[15:58] that must have been aspirational
[15:59] :)
[15:59] Don't we log if we skip reading it ?
[16:00] I think DataSourceSmartOS should do: update_events['network'].append(EventType.BOOT)
[16:01] Doesn't look like it.
[16:01] update_metadata could use some logging in the negative path I think
[16:01] otherwise you get return False and nothing
[16:02] yeah, I'll add something there along with updating the aspirational comment.
[16:05] thanks for debugging that
[16:08] Yeah, no problem. Thanks for the nudges in the right direction.
[16:09] hopefully this addresses the changed ssh host key after upgrade + reboot.
[16:10] would you like the fix for blackboxsw's change in a separate changeset from the smartos changes?
[16:11] likely: https://hastebin.com/evokubiluj.diff
[16:18] mgerdts: the ssh won't regen if the instance-id hasn't changed;
[16:19] separate is best if that's not too much trouble
[16:19] so maybe dpkg -i cloud-init_all.deb caused it to get clobbered.
[16:19] sure, easy enough.
[16:19] I suspect we should also add a unittest on the derived Datasource
[16:19] I wonder if the actual unittests do the .append() like you did
[16:21] sadly unittests fail now. So probably need some fixes there too.
=== akik_ is now known as akik
[19:18] I think I managed to sort this out. https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350374 then https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350375
[19:27] mgerdts: thanks, reviewing
[19:27] mgerdts: you just want the first to land first ?
[19:27] the second is a superset right ? but not separate commits.
[19:27] i think.. ?
[19:27] yeah, the second will break without the first.
[19:27] due to list vs. set
[19:28] I tried to set dependencies in the merge proposal, but not sure if that is actually hooked into anything.
[20:50] awesome. python 2.6 strikes again.
[21:16] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350381
[21:17] mgerdts: ./tools/run-container is fairly easily usable from ubuntu if you have lxc to get you a centos/6
[21:17] set notation?
[21:17] yeah
[21:22] should be fixed now. will CI bot automatically re-run or does it need to be nudged?
[21:22] rharper: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350382 will fix tip-pylint
[21:22] auto reruns
[21:22] looking
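
Editor's note: the update_events behaviour debugged in the log above (an event list holding only EventType.BOOT silently skips the first boot of a new instance, and the skip is not logged) can be illustrated with a small sketch. Everything here is a stand-in written for illustration: the `EventType` string values, the `SketchDataSource`, `SketchSmartOS` and `BootOnly` classes, and the `should_update` helper are hypothetical names and are not cloud-init's actual code or the contents of the merge proposals linked above.

```python
import logging

LOG = logging.getLogger("update_events_sketch")


class EventType(object):
    """Stand-in for the event names in the log; the string values are made up."""
    BOOT = "boot"
    BOOT_NEW_INSTANCE = "boot-new-instance"


class SketchDataSource(object):
    """Hypothetical base class, not cloud-init's DataSource."""

    # Default: only refresh network config on a new instance's first boot.
    update_events = {"network": [EventType.BOOT_NEW_INSTANCE]}

    def should_update(self, scope, event):
        """Return True if `event` is listed for `scope`; log the skip otherwise."""
        allowed = self.update_events.get(scope, [])
        if event in allowed:
            return True
        # The "negative path": say why nothing happened instead of
        # returning False silently.
        LOG.debug("Skipping %s update: event %r not in %r",
                  scope, event, allowed)
        return False


class BootOnly(SketchDataSource):
    # Listing only BOOT replaces the default, so the first boot of a new
    # instance is no longer covered: the two events are matched separately.
    update_events = {"network": [EventType.BOOT]}


class SketchSmartOS(SketchDataSource):
    # Build a new list rather than appending to the parent's list in place,
    # so the base class default is left untouched.
    update_events = {
        "network": SketchDataSource.update_events["network"] + [EventType.BOOT]
    }


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(BootOnly().should_update("network", EventType.BOOT_NEW_INSTANCE))
    print(SketchSmartOS().should_update("network", EventType.BOOT_NEW_INSTANCE))
    print(SketchSmartOS().should_update("network", EventType.BOOT))
```

If the lists become sets, as the "list vs. set" remark suggests, writing `set([...])` rather than a `{...}` literal keeps the module importable on Python 2.6, which has no set-literal syntax.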