[16:19] blackboxsw: if you had a minute
[16:20] - bug 1770712 fixes for ubuntu package branches.
[16:20] bug 1770712 in cloud-init (Ubuntu Cosmic) "It would be nice if cloud-init provides full version in logs" [Medium,Confirmed] https://launchpad.net/bugs/1770712
[16:20] devel https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380
[16:20] bionic https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347381
[16:20] artful https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347382
[16:20] xenial https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347384
[16:20] it'd be good to land those in cosmic today... just get those all in, and an upload would be good.
[16:23] +1
[16:23] will do
[18:05] smoser: we're good on the devel portion of those branches, because PACKAGED_VERSION exists in cloudinit/version.py. in bionic and older branches we haven't yet pulled back 5446c788160412189200c6cc688b14c9f9071943
[18:05] shouldn't we pull that back too?
[18:05] I realize the packaging change doesn't break anything currently, but it also doesn't do anything yet
[18:08] blackboxsw: well, the next new-upstream-snapshot will get it
[18:08] so at this point it is just "staged".
[18:08] you're correct, though, in that it basically adds dead code.
[18:09] (the daily builds *would* have it)
[18:09] ok, just wanted to make sure this was intended. the two changes are somewhat decoupled from each other, so I wanted to confirm that we are staging it and that we are not yet expecting to report the full package version number in <= bionic. ok, I'm good
[18:12] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380 with a nit on UNRELEASED -> cosmic
[18:12] going through the rest now to approve
[18:12] blackboxsw: if i uploaded to cosmic i think i'd just do a new-upstream-snapshot
[18:13] which would then dtrt (do the right thing)
[18:13] +1, good deal
[18:13] so same there... we're just "staging" a change, basically. i think that is generally the flow we'd have on all changes to the packaging branches.
[18:15] smoser: want me to queue a new-upstream-snapshot for cosmic then?
[18:15] and you can merge in your existing branches?
[18:15] you can if you'd like. or i can just do it.
[18:15] i'll pull existing.
[18:16] merged devel
[18:17] ok, putting up the MP
[18:17] i'll grab the others too
[18:18] ok. all packaging branches have it now.
[18:21] smoser: testing this now https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/347396
[18:21] I didn't push the tag
[18:23] ok, package built and the proper _package_version is showing up
[18:26] blackboxsw: doing 'build and push' so that will land in cosmic-proposed shortly
[19:06] rharper: have 5 mins to chat about https://trello.com/c/Uk7OA71K/798-cloud-provided-network-configuration-openstack-azure-aws ?
[19:06] specifically the azure portion
[19:34] blackboxsw: here now; still need to chat?
[19:43] cool rharper, yeah, just for a couple mins
[19:43] *-mtj
[19:43] ok, lemme get set up
[19:44] headphone trouble here. coming
[19:56] rharper: lost you at 'wouldn't really'
[20:33] hi there,
[20:34] I'm encountering an issue that is hard to debug :-/
[20:35] a few months ago I installed cloud-init on scaleway images (17.1 was the cloud-init version then).
[20:35] booting from these images worked fine.
[20:36] https://bugs.launchpad.net/cloud-init/+bug/1775074 filed.
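A minimal sketch of the PACKAGED_VERSION mechanism staged by commit 5446c788, as discussed above (paraphrased from cloudinit/version.py, not the verbatim upstream source; the example version string is hypothetical):

```python
# cloudinit/version.py (sketch): the Ubuntu package build substitutes the
# @@PACKAGED_VERSION@@ marker at build time, so logs can report the full
# package version instead of only the bare upstream release.
__VERSION__ = "18.2"
_PACKAGED_VERSION = '@@PACKAGED_VERSION@@'

def version_string():
    """Prefer the build-time package version when it was substituted."""
    if not _PACKAGED_VERSION.startswith('@@'):
        # e.g. something like "18.2-0ubuntu1~18.04.1" from the .deb build
        return _PACKAGED_VERSION
    return __VERSION__  # marker unreplaced: fall back to upstream version
```

Until a new-upstream-snapshot pulls the commit back into the bionic and older packaging branches, the template marker is never substituted there, which is why the change is "staged" dead code on those branches.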
[20:36] Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed]
[20:36] hi vila, yeah, just explain the prob as best you can; maybe someone can help
[20:36] I'm now using the exact same scripts to install cloud-init (18.2) on more recent images and things break:
[20:37] 2018-06-04 19:37:35,378 - stages.py[DEBUG]: cache invalid in datasource: DataSourceScaleway
[20:37] 2018-06-04 19:37:35,378 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceScaleway
[20:38] on top of that, the run finished by creating /var/lib/cloud/instance/boot-finished at a time when /var/lib/cloud/instance did not exist (i.e. the '/var/lib/cloud/instance' symlink), so a dir is created instead
[20:39] further runs of cloud-init then fail because they can't delete the dir (a symlink is expected)
[20:41] any hints on how such issues can be debugged would be highly appreciated
[20:42] vila: hrm, looking. that specific "cache invalid" log message means the datasource will re-run metadata collection, because the instance cache appeared invalid (and needed a refresh).
[20:42] blackboxsw: where and how is the cache said to be invalid?
[20:42] blackboxsw: I was able to unpickle it from python
[20:43] specifically in /usr/lib/python3/dist-packages/cloudinit/stages.py
[20:43] it checks datasource.check_instance_id
[20:43] yeah, opened already
[20:44] in Scaleway it looks like that always returns False *I think*
[20:44] which means always re-run get-data
[20:44] when I unpickled it, I checked hasattr(ds, 'check_instance_id')
[20:44] but I was unclear about inferring self.cfg
[20:45] I guessed it was the instance-id, and as far as I could see it was the same instance (I did a reboot)
[20:45] blackboxsw: I have a vague feeling it may be related to /var/lib/cloud/instance being deleted at that point, but I don't know where to look for that
[20:45] from the base class, cloudinit/sources/__init__.py:DataSource.check_instance_id is just a dummy function returning False, and I don't see that Scaleway is overriding it
[20:46] nope indeed
[20:46] and it used to work
[20:46] let's rewind a bit, maybe
[20:46] starting from a booted image, I run apt-get install cloud-init --no-install-recommends
[20:47] please do.
[20:47] anything I need to do for c-i to behave at next boot (same instance or different one)?
[20:47] a way to test a clean boot scenario from cloud-init would be... hrm, so 'sudo cloud-init clean --logs --reboot' would perform a 'greenfield' install, as if the system had never run cloud-init before
[20:47] \o/
[20:47] it's what we use for upgrade testing and fresh boot validation
[20:48] that will blow away /var/log/cloud* and /var/lib/cloud/*, with the exception of a /var/lib/cloud/seed subdir if applicable (as that seeds some metadata on some clouds)
[20:49] blackboxsw: done
[20:49] cloud-init analyze show
[20:49] http://cloudinit.readthedocs.io/en/latest/topics/capabilities.html#cloud-init-collect-logs for more details
[20:50] yeah, analyze show is good for quick inspection of what cloud-init performed.
[20:50] says no cache found but fails to find the scaleway ds
[20:51] and cloud-init.log shows the scaleway datasource is not seen nor used
[20:51] cloud-init status --long?
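A minimal sketch of the two pieces being discussed above: inspecting the pickled datasource cache, and the simplified shape of the cache check in stages.py (paraphrased; the real cloud-init code handles more cases, and unpickling requires the system python3 so the cloudinit package is importable):

```python
import pickle

# Inspect the pickled datasource cache, as vila did above.
with open('/var/lib/cloud/instance/obj.pkl', 'rb') as f:
    ds = pickle.load(f)
print(type(ds), hasattr(ds, 'check_instance_id'))

# Simplified shape of the check stages.py performs: the base
# DataSource.check_instance_id just returns False, so a datasource that
# does not override it (as Scaleway does not, per the discussion above)
# always yields "cache invalid in datasource" and re-runs get_data().
def restore_from_cache(ds, sys_cfg):
    if ds.check_instance_id(sys_cfg):
        return ds   # same instance: reuse the cached datasource
    return None     # cache invalid: rediscover and re-fetch metadata
```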
[20:52] detail:
[20:52] ('ssh-authkey-fingerprints', KeyError('getpwnam(): name not found: ubuntu',))
[20:52] but that's fallout from not using the ds and not finding the user-data
[20:52] hrm, ok, so we have a couple of errors it looks like
[20:52] right
[20:52] hmm
[20:52] so, red herring
[20:52] why is the datasource missed?
[20:53] I have datasource_list: [ Scaleway, None ]
[20:53] and disable_root: false
[20:53] (right, forgot to mention that I added the latter because scaleway's default login is root)
[20:53] which was a first hint that things behave differently
[20:54] that's good at least. mind doing a 'sudo cloud-init collect-logs' and sending an email to chad.smith@canonical.com? I can glance at it quickly here
[20:54] collect-logs will dump cloud-init.tar.gz in your cwd
[20:54] it'll contain all logs and potentially your user-data
[20:55] no worries, nothing secret there
[20:55] good deal
[20:59] sent
[20:59] checking, thanks
[21:01] I've tried various workflows giving different results. for example, after installing, running 'systemctl start cloud-final && cloud-init status --wait' finds the datasource and processes properly, but the next boot fails
[21:02] right now, for the logs I sent, I have a broken /var/lib/cloud/instance (a dir rather than a symlink)
[21:02] vila: hrm, normally in cloud-init logs I am accustomed to seeing the init-local stage, then init, then modules:config, but your logs skip the 'init' stage
[21:03] can you run 'cloud-init analyze show | pastebinit'
[21:04] https://paste.ubuntu.com/p/ZwG4BnnJRd/
[21:04] normally I'd see a "Starting stage: init-network" after init-local and before modules-config
[21:04] hrm
[21:05] can you cat /etc/cloud/cloud.cfg | pastebinit
[21:05] https://paste.ubuntu.com/p/Gb6Mzxms9q/ <- that one worked
[21:05] like ~2 hours ago
[21:06] blackboxsw: /etc/cloud/cloud.cfg is untouched, but I add:
[21:06] cat <<EOC > /etc/cloud/cloud.cfg.d/99_byov.cfg
[21:06] # Generated by byov at $(date)
[21:06] datasource_list: [ Scaleway, None ]
[21:06] apt_preserve_sources_list: true
[21:06] disable_root: false
[21:06] EOC
[21:07] https://paste.ubuntu.com/p/zVQDMtmG65/ <- cat /etc/cloud/cloud.cfg
[21:08] but yeah, it's the missing init-network stage that I'm tracking, indeed
[21:09] interesting. so your cloud-init.log mentions: 2018-06-04 20:48:32,448 - __init__.py[DEBUG]: Searching for local data source in: []
[21:09] that list should have had Scaleway in it
[21:09] something is modifying that datasource list
[21:09] exactly, sometimes it's there, sometimes it's not
[21:09] any other files in /etc/cloud/cloud.cfg.d?
[21:10] and it seems the invalid cache somehow marks the datasource entirely wrong
[21:10] like /etc/cloud/cloud.cfg.d/90_dpkg.cfg?
[21:10] blackboxsw: nope, I used to have datasource_list: [ NoCloud, OpenStack, Scaleway, None ] when that was working
[21:10] right, that one is overridden... oh, let me check
[21:10] nope, standard content:
[21:11] # to update this file, run dpkg-reconfigure cloud-init
[21:11] datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, None ]
[21:11] and scaleway is there
[21:11] ok, that's good.
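On the question above of which datasource_list wins: drop-ins under /etc/cloud/cloud.cfg.d are merged over /etc/cloud/cloud.cfg with lexically later files taking precedence, so the 99_byov.cfg written above overrides 90_dpkg.cfg. A rough sketch of that precedence (assumption: the real merge in cloud-init's util code uses its own merger machinery, not a shallow dict update; this only illustrates the ordering):

```python
import glob
import yaml  # PyYAML, a dependency cloud-init itself uses

def effective_cloud_cfg():
    """Illustrate drop-in precedence: lexically later files win."""
    with open('/etc/cloud/cloud.cfg') as f:
        cfg = yaml.safe_load(f) or {}
    for path in sorted(glob.glob('/etc/cloud/cloud.cfg.d/*.cfg')):
        with open(path) as f:
            cfg.update(yaml.safe_load(f) or {})  # shallow merge, for illustration
    return cfg

# 99_byov.cfg sorts after 90_dpkg.cfg, so its short datasource_list
# ['Scaleway', 'None'] replaces the long default list.
print(effective_cloud_cfg().get('datasource_list'))
```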
yeah, and your ds-identify.log in the cloud-init.tar.gz also shows ds-identify properly detected Scaleway as an option
[21:12] also, I could NOT find when ds-identify is run, but I noticed it's run more than once in some scenarios
[21:13] so, I'll keep 'cloud-init clean --logs --reboot' in my notes for the future, but it failed here
[21:14] which reproduces my issue, so it's still a good recipe, but it doesn't give the result you expected, I think
[21:16] vila: on the failed case, 'systemctl list-dependencies | grep cloud'. this is what I see: http://paste.ubuntu.com/p/TQM8RWbwkP/
[21:17] I'd expect a cloud-init.service job/unit listed in systemd. it's what runs 'cloud-init init', which is the network stage that we are missing in your failed case
[21:17] not sure if I'm going down a rat hole there
[21:18] blackboxsw: right, I can rebuild the instance without installing cloud-init and restart from there maybe?
[21:19] blackboxsw: once cloud-init is installed, I'm running https://paste.ubuntu.com/p/xtGNYpfzZT/
[21:19] sounds like a good plan; installing the new cloud-init deb in your environment after the fact should take care of creating the right systemd generators to queue cloud-init stages during boot (if something got mangled across the upgrade path)
[21:20] vila: running 'cloud-init init --local' and 'cloud-init init' naked, outside of the standard boot process on an instance, is not exactly what we intended (and could be rife with some error conditions)
[21:20] blackboxsw: I used to run nothing and all was rosy ;-)
[21:20] yeah, nothing is what we hope is always rosy (and intended). just booting normally should take care of ordering all cloud-init stages appropriately (including module configuration etc.).
[21:20] blackboxsw: what *is* the intended workflow? install, save image, boot?
[21:21] yes vila: boot a clean image, install cloud-init, power off, copy the clean image, then let cloud-init boot in the user-configured environment to collect and configure based on metadata/user-data
[21:21] trying to look more at your latest paste
[21:22] damn it, that's what I did, and I thought maybe I missed a step
[21:27] vila, yeah, something smells funky (I don't have a scaleway acct, unfortunately). I'll try to bisect the diffs on the Scaleway datasource from 17.1 -> 18.2; I didn't think we had anything significant in that upgrade path other than some exception-handling changes on url retries in that space
[21:28] blackboxsw: yup, went there, saw that, couldn't find a link either (but I'm not the expert ;-)
[21:29] I'd like to see a /var/log/cloud-init.log in the case where cloud-init was upgraded and only a reboot run (not a manual run of 'cloud-init init --local' and 'cloud-init init').
[21:29] just got the instance without cloud-init installed
[21:29] so, apt-get install cloud-init --no-install-recommends
[21:32] reboot
[21:33] https://paste.ubuntu.com/p/HDhH8hWpmd/
[21:34] yet https://paste.ubuntu.com/p/z6wXtF6Jtx/
[21:35] blackboxsw: you said "in the case where cloud-init was upgraded"; s/upgraded/installed/. otherwise, nothing from my script
[21:38] ok, so Scaleway ordered before Ec2, Ec2 considered maybe; that shouldn't break anything specifically for Scaleway's datasource.
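For reference, the "standard boot process" mentioned above runs four stages in a fixed order, each driven by its own systemd unit; the script below is only a sketch of that ordering and of the by-hand flow being discussed, not how a healthy boot invokes the stages:

```python
import subprocess

# The four cloud-init boot stages and the systemd units that normally
# run them (unit names from the cloud-init packaging).
STAGES = [
    (['cloud-init', 'init', '--local'], 'cloud-init-local.service'),  # before network
    (['cloud-init', 'init'], 'cloud-init.service'),  # init-network: the stage missing above
    (['cloud-init', 'modules', '--mode', 'config'], 'cloud-config.service'),
    (['cloud-init', 'modules', '--mode', 'final'], 'cloud-final.service'),
]

for cmd, unit in STAGES:
    print('%s (normally run by %s)' % (' '.join(cmd), unit))
    subprocess.run(['sudo'] + cmd, check=True)
```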
reading your cloud-init.log
[21:38] 2018-06-04 21:31:31,651 - stages.py[DEBUG]: no cache found === fresh boot, no cruft from a previous run around
[21:39] right, so the cache itself is not the root cause; well done
[21:40] line 74 of your first paste is showing us we are properly attempting to discover Scaleway (and many other datasources) in python (instead of ds-identify, which is just a shell script, for speed)
[21:40] and https://paste.ubuntu.com/p/PJ2C5tQsSJ/ should cover all the datasource inputs
[21:40] line 78 rather
[21:41] ohh wait
[21:41] right
[21:41] no Scaleway in line 78
[21:41] while still there in line 77
[21:42] * vila thinks
[21:42] could it be that /var/run/scaleway is created too late (aka a race?)
[21:42] I had thought the Scaleway datasource was defined as FILESYSTEM only. checking DataSourceScaleway.py again
[21:42] my bad
[21:42] (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)),
[21:43] that means the Scaleway datasource is detected in the init-network stage only... ok, so we expect it to be filtered out of the init-local stage
[21:43] ok, so we're still good in the init-local stage (not detecting scaleway)
[21:44] but init-network (otherwise called via the CLI as 'cloud-init init') should not be skipped,
[21:44] that's what should have detected scaleway....
[21:44] reading down past init-local in your cloud-init log now. sorry for the noise
[21:45] no no! very helpful
[21:45] (and entertaining ;)
[21:45] 2018-06-04 21:31:32,137 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources
[21:45] 2018-06-04 21:31:34,280 - util.py[DEBUG]: Cloud-init v. 18.2 running 'modules:config' at Mon, 04 Jun 2018 21:31:34 +0000. Up 16.05 seconds.
[21:45] heh
[21:45] (don't get derailed, but line 136: 2018-06-04 21:31:31,963 - util.py[DEBUG]: dmi data /sys/class/dmi/id/sys_vendor returned Scaleway)
[21:46] there is a comment that dmi is not implemented, IIRC...
[21:46] * check DMI data: not yet implemented by Scaleway, but the check is made to be future-proof.
[21:48] vila: what does 'systemctl list-dependencies | grep cloud' tell you?
[21:48] https://paste.ubuntu.com/p/7SSDVB7ZsG/
[21:49] that's ok on the dmi read; it was something cloud-init did to determine that it's not running on DigitalOcean.
[21:49] ha
[21:51] meh. something is causing cloud-init to skip the init-network stage in that environment (like a systemd job falling over, maybe?). I see no tracebacks indicating why it is skipped. lemme see if I can dig up the format of the systemd job
[21:51] do you have a /lib/systemd/system/cloud-init.service?
[21:51] yes
[21:52] https://paste.ubuntu.com/p/JgDkpMCy5v/
[21:55] bah. ok, I think we need a bug here. I'll have to get a scaleway account set up to check it out. nothing should have changed w.r.t. 17.1 -> 18.2 and the systemd startup jobs/units, but skipping the init-network stage is broken and that's why things are falling over. I'll have to get a scaleway acct set up to triage more
[21:55] what ubuntu release was this instance?
[21:55] xenial
[21:55] bionic? xenial?
[21:55] ok
[21:56] blackboxsw:
[21:56] would you kindly run 'ubuntu-bug cloud-init', vila, and file a bug per the instructions?
[21:56] it'll dump your collect-logs output into a bug attachment
[21:57] blackboxsw: from inside the instance?
[21:58] -bash: ubuntu-bug: command not found; installing
[21:58] vila: yes please (if it has outbound connectivity).
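The (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)) tuple quoted above is what filters Scaleway out of init-local: a datasource is only offered to a stage whose available dependencies cover the datasource's declared ones. A condensed sketch of that filter (paraphrased from cloudinit/sources; the real list_sources also handles module importing):

```python
DEP_FILESYSTEM = 'FILESYSTEM'
DEP_NETWORK = 'NETWORK'

# Each datasource module exports (class, required-deps) pairs like the
# DataSourceScaleway tuple quoted above.
DATASOURCES = {
    'DataSourceScaleway': (DEP_FILESYSTEM, DEP_NETWORK),
    'DataSourceNoCloud': (DEP_FILESYSTEM,),
}

def list_sources(names, available_deps):
    """Keep only the configured datasources whose deps are all available."""
    return [n for n in names
            if set(DATASOURCES.get(n, ())).issubset(available_deps)]

# With datasource_list: [ Scaleway, None ], nothing satisfies the
# init-local stage (filesystem only), which is exactly why the log shows
# "Searching for local data source in: []".
print(list_sources(['DataSourceScaleway'], {DEP_FILESYSTEM}))  # -> []
print(list_sources(['DataSourceScaleway'],
                   {DEP_FILESYSTEM, DEP_NETWORK}))  # -> ['DataSourceScaleway']
```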
otherwise you could just file a bug at https://bugs.launchpad.net/cloud-init/+filebug and attach the cloud-init.tar.gz from your latest run to the bug
[21:59] all ubuntu-bug does is ask a question or two about your cloud platform and collate the output from 'sudo cloud-init collect-logs' when filing
[21:59] * vila installs apport
[22:00] * vila thinks about giving access... should be a matter of adding an ssh key on my account?
[22:01] yeah, in the near term your 'sudo cloud-init init --local; sudo cloud-init init; sudo cloud-init modules --mode config; sudo cloud-init modules --mode final' I *think* should get you 90% of the way there
[22:01] vila: right, you could run 'ssh-import-id chad.smith' on the instance; then I'd be able to log in as whatever user you ran that under
[22:02] blackboxsw: root! what else? :-D
[22:02] hah! but that said, I'm going to have to disappear shortly, so I may not get to it until tomorrow morning my time
[22:02] <--- and file your bank acct and social security number here ;)
[22:03] it may be good to have a reference bug so the others on the team can peek at the triage/response too
[22:03] hehe
[22:04] https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1775086
[22:04] Ubuntu bug 1775086 in cloud-init (Ubuntu) "cloud-init fails to recognize scaleway" [Undecided,New]
[22:06] thanks again, vila. I have to bail for a while; will check it out
[22:07] blackboxsw: thanks to you; at least I'm not mad, and something is going on that is worth fixing ;-)
[22:07] blackboxsw: if only for /var/lib/cloud/instance being a dir...
[22:19] thx vila; on the not-being-a-dir issue I'll track a separate bug on the 'cloud-init collect-logs' cmd being more resilient in failure cases
[22:22] added that content to https://bugs.launchpad.net/cloud-init/+bug/1775074
[22:22] Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed]
[22:22] will try to kill 2 birds with 1 stone there
[22:31] vila: comment for you on your bug. ok, systemd has removed the cloud-init.service job for some reason, and I need to dig into why
[22:44] haaaa