[12:12] <smoser> rharper: i think i probably paste-binned you a git-propose-merge once ?
[12:36] <caribou> smoser: good morning; I'm about to push the new version of my MR in our infrastructure as it fixes an IPv6 regression
[12:36] <caribou> hopefully it'll be adequate for merging, otherwise I'll push a new version
[12:37] <caribou> btw, I took ownership of LP: 1662345 as it is a showstopper for us to deploy cloud-init on arm64
[14:08] <smoser> caribou: ok. i will look.
[14:09] <caribou> smoser: fun thing is that your suggestion fixed an issue we have with IPv6 following our recent deployment
[14:22] <rharper> smoser: hrm
[14:31] <smoser> rharper: i found it.
[14:31] <smoser> thanks to irc logs.
[14:31] <rharper> heh
[14:31] <smoser> git-propose-merge is http://paste.ubuntu.com/p/gMBWKvD42W/
[14:31] <rharper> what was it again?  oh, like sparkie's git to MP
[14:31] <rharper> right?
[14:31] <rharper> for launchpad ?
[14:31] <smoser> yeah, but sparkiegeeks' requires launchpadlib and auth.
[14:32] <rharper> which is sort of the right way to do it, but sure, we're already logged in anyhow
[14:32] <smoser> this is much faster, and works in a small subset of places. but places that i use.
[14:32] <rharper> yep
[14:32] <smoser> well, this just opens up the browser for you
[14:32] <smoser> and you hit 'merge'
[14:32] <smoser> his actually creates it.
[14:33] <smoser> there are other issues i have with his. nothing that couldn't be fixed. that is definitely the right place (or more right than hacky smoser scripts)
[14:40] <danMS_> smoser: i am working on https://bugs.launchpad.net/cloud-init/+bug/1779207
[14:41] <danMS_> Do you know if other cloud platforms have hit this issue?
[14:56] <smoser> danMS_: by "platform" you mean cloud platform (not linux distro)
[14:56] <smoser> right?
[14:57] <danMS_> yes
[14:57] <smoser> the issue is probably present everywhere.
[14:57] <smoser> however it is dramatically worse on azure
[14:57] <smoser> because of "redeploy" with same instance id.
[14:57] <smoser> er... maybe not relevant to instance id.
[14:58] <smoser> let me try again
[14:58] <smoser> the problem exists any time there are "stale" entries in /etc/fstab
[14:58] <danMS_> ok, on Azure they will get a new ntfs formatted ephemeral on redeploy or deallocate
[14:58] <smoser> on Azure, stale entries occur in a regular lifespan of an instance (with 'redeploy'... that might not be the right word)
[14:59] <smoser> on other platforms, the issue can occur on a snapshot -> new-instance
[14:59] <danMS_> so it sounds like, when they deallocate their VM, they present the previously attached ephemeral drive
[14:59] <danMS_> whereas we are not
[15:00] <mgerdts> blackboxsw: Trying to update DataSourceSmartOS to use EventType.BOOT to get network reconfiguration on boot and hitting a snag.  The change is simple enough, I think: https://hastebin.com/vahizocidi.diff
[15:00] <smoser> danMS_: well no one else "deallocates" to my knowledge.
[15:01] <smoser> i don't know actually.
[15:01] <danMS_> i have not done a deep dive, but i did not see this issue on Ubuntu vms
[15:01] <mgerdts> But when I install the new deb networking is not configured properly.  The ENI file has dhcp, and it looks like the saved configuration is for the wrong datasource.
[15:02] <mgerdts> >>> pickle.load(open('obj.pkl', 'rb'))
[15:02] <mgerdts> <cloudinit.sources.DataSourceNone.DataSourceNone object at 0x7fc92c313c50>
[15:02] <mgerdts> That was in /var/lib/cloud/intance.
[15:02] <mgerdts> *instance
[15:03] <smoser> mgerdts: blackboxsw is out for a while
[15:03] <mgerdts> oh, ok.  Any idea what could be causing the wrong datasource data to be cached?  That is, does this sound familiar?
[15:20] <rharper> mgerdts: hrm;  that's curious;   what's the recreate scenario ? new instance boot, upgrade deb , reboot ?
[15:23] <mgerdts> install xenial, upgrade to bionic, reboot, install new cloud-init, reboot (ssh host key changed - bad) (did not look closely at networking config), poweroff vm, modify network config in host, restart host metadata service to be sure it picked it up, booted VM.
[15:24] <rharper> mgerdts: ok;  I suppose it's best to pkl.load and print after each state to see where it changes
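A minimal sketch of the check rharper suggests, assuming the cached object lives at /var/lib/cloud/instance/obj.pkl as mentioned above. The helper name `describe_cached_datasource` is hypothetical, and unpickling a real obj.pkl requires the cloudinit package to be importable in the Python that runs it.

```python
import pickle


def describe_cached_datasource(path="/var/lib/cloud/instance/obj.pkl"):
    """Return the module-qualified class name of the pickled object."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)
    cls = type(obj)
    return "%s.%s" % (cls.__module__, cls.__name__)
```

Calling it after each step (upgrade, reboot, clean) and comparing the returned names would show at which step the cached datasource changes.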
[15:24] <mgerdts> Since then, I did cloud-init clean -l, reboot.  It put DataSourceNone on obj.pkl
[15:24] <rharper> ah
[15:24] <rharper> that sounds like unidentified change
[15:24] <rharper> clean wipes the previous instance info
[15:25] <rharper> so next boot ds-identify needs to run and pick; what does the log look like after that reboot ?
[15:27] <mgerdts> https://hastebin.com/axaguxusit.txt
[15:28] <rharper> no local data found from DataSourceSmartOS
[15:28] <rharper> so the SmartOS DS didn't say "yes" I have metadata/user-data
[15:28] <rharper> when cloud-init called .get_data() on it
[15:28] <rharper> which means that the fallback is DataSourceNone
[15:29] <rharper> so the question is , why did the SmartOS DS say it had no local metadata  ?
[15:29] <rharper> isn't that over the serial interface ?
[15:29] <mgerdts> yeah,
[15:29] <mgerdts> I'll try some debug statements in _get_data
[15:31] <rharper> yeah, I don't see a return without a boolean, and the False path has logging =(
[15:31] <mgerdts> that's what I was thinking
[15:32] <rharper> and some debugging in sources/__init__.py
[15:32] <rharper> we now do this metadata caching;
[15:32] <smoser> mgerdts: you should be able to use the main too
[15:32] <smoser> python -m cloudinit.sources.DataSourceSmartOS
[15:33] <smoser> might be easier to debug that way
[15:33] <mgerdts> that dumps a bunch of metadata
[15:35] <smoser> so i think the pickled object must have a method that is getting in the way. method or attribute i guess.
[15:35] <rharper> IIUC, cloud-init clean was run
[15:36] <rharper> which wipes the object
[15:36] <smoser> oh. hm..
[15:36] <mgerdts> yes, and verified that clean worked
[15:36] <smoser> so then this is essentially fresh boot ?
[15:37] <smoser> or as close as clean can get us ?
[15:38] <mgerdts> commenting the change to DataSourceSmartOS caused ENI/50-cloud-init.cfg to get static network config.  Oddly, not the right network config.
[15:38] <mgerdts> https://hastebin.com/aviqacabew.txt
[15:39] <mgerdts> ip should be .223 per sdc:nics, but is .222
[15:39] <rharper> "ip":"10.88.88.223","ips":["10.88.88.222/24"]
[15:39] <rharper> your data disagrees
[15:40] <mgerdts> huh.  I guess I missed one of the ip's in the zonecfg.
[15:40] <rharper> I think we only look at ips since it was a superset ?
[15:40] <mgerdts> notice 222 and 223 in there
[15:40] <rharper> yeah
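The mismatch rharper points out can be checked mechanically. This is only a sketch: the dict mirrors the two fields quoted above, `primary_ip_listed` is a hypothetical helper, and entries in `ips` that are not addresses are assumed to be skippable.

```python
import ipaddress


def primary_ip_listed(nic):
    """True if nic['ip'] appears among the addresses in nic['ips']."""
    primary = ipaddress.ip_address(nic["ip"])
    listed = []
    for entry in nic.get("ips", []):
        try:
            listed.append(ipaddress.ip_interface(entry).ip)
        except ValueError:
            continue  # skip entries that are not an address or address/prefix
    return primary in listed


nic = {"ip": "10.88.88.223", "ips": ["10.88.88.222/24"]}
print(primary_ip_listed(nic))  # False: .223 is not among the ips entries
```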
[15:41] <mgerdts> ok, so something in the get_data() path is unhappy with update_events = {'network': [EventType.BOOT]} in DataSourceSmartOS.
[15:42] <mgerdts> I'll go hunting
[15:57] <mgerdts> Looks like the comment in class DataSource is wrong.  This seems to work:
[15:57] <mgerdts> update_events = {'network': [EventType.BOOT_NEW_INSTANCE, EventType.BOOT]}
[15:58] <mgerdts> Apparently BOOT_NEW_INSTANCE is not a subset of BOOT.
[15:58] <rharper> oh, yeah
[15:58] <rharper> that must have been aspirational
[15:59] <mgerdts> :)
[15:59] <rharper> Don't we log if we skip reading it ?
[16:00] <mgerdts> I think DataSourceSmartOS should do: update_events['network'].append(EventType.BOOT)
[16:01] <mgerdts> Doesn't look like it.
[16:01] <rharper> update_metadata could use some logging in the negative path I think
[16:01] <rharper> otherwise you get return False and nothing
[16:02] <mgerdts> yeah, I'll add something there along with updating the aspirational comment.
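The behaviour mgerdts found can be illustrated with a simplified model. This is not cloud-init's actual code: `EventType` here is a stand-in enum, `scopes_to_update` is a hypothetical function, and the sketch assumes the check is a plain membership test, which is what the conversation suggests.

```python
from enum import Enum


class EventType(Enum):
    # Stand-in for cloud-init's EventType; the values are illustrative only.
    BOOT = "boot"
    BOOT_NEW_INSTANCE = "boot-new-instance"


def scopes_to_update(update_events, source_events):
    """Return the scopes whose allowed events include any incoming event."""
    return sorted(
        scope
        for scope, allowed in update_events.items()
        if any(event in allowed for event in source_events)
    )


only_boot = {"network": [EventType.BOOT]}
both = {"network": [EventType.BOOT_NEW_INSTANCE, EventType.BOOT]}

print(scopes_to_update(only_boot, [EventType.BOOT_NEW_INSTANCE]))  # []
print(scopes_to_update(both, [EventType.BOOT_NEW_INSTANCE]))       # ['network']
```

Under a membership test, listing only BOOT means a BOOT_NEW_INSTANCE event matches nothing, so both events have to be listed explicitly.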
[16:05] <rharper> thanks for debugging that
[16:08] <mgerdts> Yeah, no problem.  Thanks for the nudges in the right direction.
[16:09] <mgerdts> hopefully this addresses the changed ssh host key after upgrade + reboot.
[16:10] <mgerdts> would you like the fix for blackboxsw's change in a separate changeset from the smartos changes?
[16:11] <mgerdts> likely: https://hastebin.com/evokubiluj.diff
[16:18] <rharper> mgerdts: the ssh won't regen if the instance-id hasn't changed;
[16:19] <rharper> separate is best if that's not too much trouble
[16:19] <mgerdts> so maybe dpkg -i cloud-init_all.deb caused it to get clobbered.
[16:19] <mgerdts> sure, easy enough.
[16:19] <rharper> I suspect we should also add a unittest on the derived Datasource
[16:19] <rharper> I wonder if the actual unittests do the .append() like you did
[16:21] <mgerdts> sadly unittests fail now.  So probably need some fixes there too.
[19:18] <mgerdts> I think I managed to sort this out.  https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350374 then https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350375
[19:27] <rharper> mgerdts: thanks, reviewing
[19:27] <smoser> mgerdts: you just want the first to land first ?
[19:27] <smoser> the second is a superset right ? but not separate commits.
[19:27] <smoser> i think.. ?
[19:27] <mgerdts> yeah, the second will break without the first.
[19:27] <mgerdts> due to list vs. set
[19:28] <mgerdts> I tried to set dependencies in the merge proposal, but not sure if that is actually hooked into anything.
[20:50] <mgerdts> awesome.  python 2.6 strikes again.
[21:16] <smoser> https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350381
[21:17] <smoser> mgerdts: ./tools/run-container is fairly easily usable from ubuntu if you have lxc to get you a centos/6
[21:17] <smoser> set notation?
[21:17] <mgerdts> yeah
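A short sketch of the two pitfalls mentioned here, assuming the Python 2.6 failure was the set literal syntax (which only parses on 2.7 and later) and that "list vs. set" refers to `.append()` existing on lists but not on sets. `add_event` is a hypothetical helper, and plain strings stand in for the event values.

```python
def add_event(events, event):
    """Add event to a list or a set in place, whichever container was given."""
    if isinstance(events, set):
        events.add(event)
    elif event not in events:
        events.append(event)
    return events


# set([...]) parses on Python 2.6; the literal form {"a", "b"} needs 2.7+.
as_set = set(["boot-new-instance"])
as_list = ["boot-new-instance"]

print(sorted(add_event(as_set, "boot")))  # ['boot', 'boot-new-instance']
print(add_event(as_list, "boot"))         # ['boot-new-instance', 'boot']
```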
[21:22] <mgerdts> should be fixed now.  will CI bot automatically re-run or does it need to be nudged?
[21:22] <smoser> rharper: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350382 will fix tip-pylint
[21:22] <rharper> auto reruns
[21:22] <rharper> looking