smoser | rharper: i think i probably paste-binned you a git-propose-merge once ? | 12:12 |
---|---|---|
caribou | smoser: good morning; I'm about to push the new version of my MR in our infrastructure as it fixes an IPv6 regression | 12:36 |
caribou | hopefully it'll be adequate for merging, otherwise I'll push a new version | 12:36 |
caribou | btw, I took ownership of LP: 1662345 as it is a showstopper for us to deploy cloud-init on arm64 | 12:37 |
ubot5 | Launchpad bug 1662345 in qemu (Ubuntu Xenial) "smbios parameter settings not visible in guest" [Medium,Confirmed] https://launchpad.net/bugs/1662345 | 12:37 |
smoser | caribou: ok. i will look. | 14:08 |
caribou | smoser: fun thing is that your suggestion fixed an issue we have with IPv6 following our recent deployment | 14:09 |
rharper | smoser: hrm | 14:22 |
smoser | rharper: i found it. | 14:31 |
smoser | thanks to irc logs. | 14:31 |
rharper | heh | 14:31 |
smoser | git-propose-merge is http://paste.ubuntu.com/p/gMBWKvD42W/ | 14:31 |
rharper | what was it again? oh, like sparkie's git to MP | 14:31 |
rharper | right? | 14:31 |
rharper | for launchpad ? | 14:31 |
smoser | yeah, but sparkiegeeks' requires launchpadlib and auth. | 14:31 |
rharper | which is sort of the right way to do it, but sure , we're already logged in anyhow | 14:32 |
smoser | this is much faster, and works in a small subset of places . but places that i use. | 14:32 |
rharper | yep | 14:32 |
smoser | well, this just opens up the browser for you | 14:32 |
smoser | and you hit 'merge' | 14:32 |
smoser | his actually creates it. | 14:32 |
smoser | there are other issues i have with his. nothing that couldn't be fixed. that is definitely the right place (or more right than hacky smoser scripts) | 14:33 |
danMS_ | smoser: i am working on https://bugs.launchpad.net/cloud-init/+bug/1779207 | 14:40 |
ubot5 | Ubuntu bug 1779207 in cloud-init "Failed mount of '/dev/sdb1' with Swap File cloud-init config" [Medium,Triaged] | 14:40 |
danMS_ | Do you know if other cloud platforms have hit this issue? | 14:41 |
smoser | danMS_: by "platform" you mean cloud platform (not linux distro) | 14:56 |
smoser | right? | 14:56 |
danMS_ | yes | 14:57 |
smoser | the issue is probably present everywhere. | 14:57 |
smoser | however it is dramatically worse on azure | 14:57 |
smoser | because of "redeploy" with same instance id. | 14:57 |
smoser | er... maybe not relevant to instance id. | 14:57 |
smoser | let me try again | 14:58 |
smoser | the problem exists any time there are "stale" entries in /etc/fstab | 14:58 |
danMS_ | ok, on Azure they will get a new ntfs formatted ephemeral on redeploy or deallocate | 14:58 |
smoser | on Azure, stale entries occur in a regular lifespan of an instance (with 'redeploy'... that might not be the right word) | 14:58 |
smoser | on other platforms, the issue can occur on a snapshot -> new-instance | 14:59 |
danMS_ | so it sounds like, when they deallocate their VM, they present the previously attached ephemeral drive | 14:59 |
danMS_ | whereas we are not | 14:59 |
mgerdts | blackboxsw: Trying to update DataSourceSmartOS to use EventType.BOOT to get network reconfiguration on boot and hitting a snag. The change is simple enough, I think: https://hastebin.com/vahizocidi.diff | 15:00 |
smoser | danMS_: well no one else "deallocates" to my knowledge. | 15:00 |
smoser | i dont know actually. | 15:01 |
danMS_ | i have not done a deep dive, but i did not see this issue on Ubuntu vms | 15:01 |
mgerdts | But when I install the new deb networking is not configured properly. The ENI file has dhcp, and it looks like the saved configuration is for the wrong datasource. | 15:01 |
mgerdts | >>> pickle.load(open('obj.pkl', 'rb')) | 15:02 |
mgerdts | <cloudinit.sources.DataSourceNone.DataSourceNone object at 0x7fc92c313c50> | 15:02 |
mgerdts | That was in /var/lib/cloud/intance. | 15:02 |
mgerdts | *instance | 15:02 |
smoser | mgerdts: blackboxsw is out for a while | 15:03 |
mgerdts | oh, ok. Any idea what could be causing the wrong datasource data to be cached? That is, does this sound familiar? | 15:03 |
=== r-daneel_ is now known as r-daneel | ||
rharper | mgerdts: hrm; that's curious; what's the recreate scenario ? new instance boot, upgrade deb , reboot ? | 15:20 |
mgerdts | install xenial, upgrade to bionic, reboot, install new cloud-init, reboot (ssh host key changed - bad) (did not look closely at networking config), poweroff vm, modify network config in host, restart host metadata service to be sure it picked it up, booted VM. | 15:23 |
rharper | mgerdts: ok; I suppose it's best to pkl.load and print after each state to see where it changes | 15:24 |
mgerdts | Since then, I did cloud-init clean -l, reboot. It put DataSourceNone on obj.pkl | 15:24 |
rharper | ah | 15:24 |
rharper | that sounds like unidentified change | 15:24 |
rharper | clean wipes the previous instance info | 15:24 |
rharper | so next boot ds-identify needs to run and pick; what does the log look like after that reboot ? | 15:25 |
mgerdts | https://hastebin.com/axaguxusit.txt | 15:27 |
rharper | no local data found from DataSourceSmartOS | 15:28 |
rharper | so the SmartOS DS didn't say "yes" I have metadata/user-data | 15:28 |
rharper | when cloud-init called .get_data() on it | 15:28 |
rharper | which means that the fallback is DataSourceNone | 15:28 |
rharper | so the question is , why did the SmartOS DS say it had no local metadata ? | 15:29 |
rharper | isn't that over the serial interface ? | 15:29 |
mgerdts | yeah, | 15:29 |
mgerdts | I'll try some debug statements in _get_data | 15:29 |
rharper | yeah, I don't see a return without a boolean, and the False path has logging =( | 15:31 |
mgerdts | that's what I was thinking | 15:31 |
rharper | and some debugging in sources/__init__.py | 15:32 |
rharper | we now do this metadata caching; | 15:32 |
smoser | mgerdts: you should be able to use the main too | 15:32 |
smoser | python -m cloudinit.sources.DataSourceSmartOS | 15:32 |
smoser | might be easier to debug that way | 15:33 |
mgerdts | that dumps a bunch of metadata | 15:33 |
smoser | so i think the pickled object must have a method that is getting in the way. method or attribute i guess. | 15:35 |
rharper | IIUC, cloud-init clean was run | 15:35 |
rharper | which wipes the object | 15:36 |
smoser | oh. hm.. | 15:36 |
mgerdts | yes, and verified that clean worked | 15:36 |
smoser | so then this is essentially fresh boot ? | 15:36 |
smoser | or as close as clean can get us ? | 15:37 |
mgerdts | commenting the change to DataSourceSmartOS caused ENI/50-cloud-init.cfg to get static network config. Oddly, not the right network config. | 15:38 |
mgerdts | https://hastebin.com/aviqacabew.txt | 15:38 |
mgerdts | ip should be .223 per sdc:nics, but is .222 | 15:39 |
rharper | "ip":"10.88.88.223","ips":["10.88.88.222/24"] | 15:39 |
rharper | your data disagrees | 15:39 |
mgerdts | huh. I guess I missed one of the ip's in the zonecfg. | 15:40 |
rharper | I think we only look at ips since it was a superset ? | 15:40 |
mgerdts | notice 222 and 223 in there | 15:40 |
rharper | yeah | 15:40 |
mgerdts | ok, so something in the get_data() path is unhappy with update_events = {'network': [EventType.BOOT]} in DataSourceSmartOS. | 15:41 |
mgerdts | I'll go hunting | 15:42 |
mgerdts | Looks like the comment in class DataSource is wrong. This seems to work: | 15:57 |
mgerdts | update_events = {'network': [EventType.BOOT_NEW_INSTANCE, EventType.BOOT]} | 15:57 |
mgerdts | Apparently BOOT_NEW_INSTANCE is not a subset of BOOT. | 15:58 |
rharper | oh, yeah | 15:58 |
rharper | that must have been aspirational | 15:58 |
mgerdts | :) | 15:59 |
rharper | Don't we log if we skip reading it ? | 15:59 |
mgerdts | I think DataSourceSmartOS should do: update_events['network'].append(EventType.BOOT) | 16:00 |
mgerdts | Doesn't look like it. | 16:01 |
rharper | update_metadata could use some logging in the negative path I think | 16:01 |
rharper | otherwise you get return False and nothing | 16:01 |
mgerdts | yeah, I'll add something there along with updating the aspirational comment. | 16:02 |
rharper | thanks for debugging that | 16:05 |
mgerdts | Yeah, no problem. Thanks for the nudges in the right direction. | 16:08 |
mgerdts | hopefully this addresses the changed ssh host key after upgrade + reboot. | 16:09 |
mgerdts | would you like the fix for blackboxsw's change in a separate changeset from the smartos changes? | 16:10 |
mgerdts | likely: https://hastebin.com/evokubiluj.diff | 16:11 |
rharper | mgerdts: the ssh won't regen if the instance-id hasn't changed; | 16:18 |
rharper | separate is best if that's not too much trouble | 16:19 |
mgerdts | so maybe dpkg -i cloud-init_all.deb caused it to get clobbered. | 16:19 |
mgerdts | sure, easy enough. | 16:19 |
rharper | I suspect we should also add a unittest on the derived Datasource | 16:19 |
rharper | I wonder if the actual unittests do the .append() like you did | 16:19 |
mgerdts | sadly unittests fail now. So probably need some fixes there too. | 16:21 |
=== akik_ is now known as akik | ||
mgerdts | I think I managed to sort this out. https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350374 then https://code.launchpad.net/~mgerdts/cloud-init/+git/cloud-init/+merge/350375 | 19:18 |
rharper | mgerdts: thanks, reviewing | 19:27 |
smoser | mgerdts: you just want the first to land first ? | 19:27 |
smoser | the second is a superset right ? but not separate commits. | 19:27 |
smoser | i think.. ? | 19:27 |
mgerdts | yeah, the second will break without the first. | 19:27 |
mgerdts | due to list vs. set | 19:27 |
mgerdts | I tried to set dependencies in the merge proposal, but not sure if that is actually hooked into anything. | 19:28 |
mgerdts | awesome. python 2.6 strikes again. | 20:50 |
smoser | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350381 | 21:16 |
smoser | mgerdts: ./tools/run-container is fairly easily usable from ubuntu if you have lxc to get you a centos/6 | 21:17 |
smoser | set notation? | 21:17 |
mgerdts | yeah | 21:17 |
mgerdts | should be fixed now. will CI bot automatically re-run or does it need to be nudged? | 21:22 |
smoser | rharper: https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/350382 will fix tip-pylint | 21:22 |
rharper | auto reruns | 21:22 |
rharper | looking | 21:22 |
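
Editor's note: the `update_events` exchange between 15:41 and 16:00 can be illustrated with a minimal sketch. This is not cloud-init's code. `EventType` here is a stand-in class with made-up string values, and `should_update` is a hypothetical helper that assumes a plain membership test against the per-scope collection. Under that assumption, listing only `EventType.BOOT` does not cover a first boot, which matches mgerdts's observation that `BOOT_NEW_INSTANCE` is not a subset of `BOOT`. The sets are built with `set([...])` rather than a set literal, on the assumption that the "python 2.6" and "set notation" remarks near the end of the log refer to set literals being unavailable there.

```python
class EventType(object):
    """Stand-in for cloud-init's EventType; the values are illustrative only."""
    BOOT = "System boot"
    BOOT_NEW_INSTANCE = "New instance first boot"


def should_update(update_events, scope, event):
    """Hypothetical check: True only if `event` is explicitly listed for `scope`."""
    return event in update_events.get(scope, ())


# Only BOOT is listed, so a first boot of a new instance is not matched.
boot_only = {'network': set([EventType.BOOT])}

# Both events are listed explicitly, as in the configuration that worked.
both = {'network': set([EventType.BOOT_NEW_INSTANCE, EventType.BOOT])}

first_boot_with_boot_only = should_update(
    boot_only, 'network', EventType.BOOT_NEW_INSTANCE)
first_boot_with_both = should_update(
    both, 'network', EventType.BOOT_NEW_INSTANCE)

print(first_boot_with_boot_only, first_boot_with_both)
```

With a check of this shape, a negative result is silent unless the caller logs it, which is the gap rharper points out at 16:01.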