=== frickler_ is now known as frickler | ||
=== r-daneel_ is now known as r-daneel | ||
=== rangerpb is now known as baude | ||
smoser | blackboxsw: if you had a minute | 16:19 |
smoser | - bug 1770712 fixes for ubuntu package branches. | 16:20 |
ubot5 | bug 1770712 in cloud-init (Ubuntu Cosmic) "It would be nice if cloud-init provides full version in logs" [Medium,Confirmed] https://launchpad.net/bugs/1770712 | 16:20 |
smoser | devel https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380 | 16:20 |
smoser | bionic https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347381 | 16:20 |
smoser | artful https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347382 | 16:20 |
smoser | xenial https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347384 | 16:20 |
smoser | that'd be good to land into cosmic today... just get those all in and an upload would be good. | 16:20 |
blackboxsw | +1 | 16:23 |
blackboxsw | will do | 16:23 |
blackboxsw | smoser: we're good on the devel portion of those branches, because PACKAGED_VERSION exists in cloudinit/version.py. in bionic and older branches we haven't yet pulled back 5446c788160412189200c6cc688b14c9f9071943 | 18:05 |
blackboxsw | shouldn't we pull that back too ? | 18:05 |
blackboxsw | I realize the packaging change doesn't break currently, but it also doesn't do anything yet | 18:05 |
smoser | blackboxsw: well, the next new-upstream-snapshot will get it | 18:08 |
smoser | so at this point it is just "staged". | 18:08 |
smoser | you're correct though in that it basically adds dead code. | 18:08 |
smoser | (the daily builds *would* have it ) | 18:09 |
blackboxsw | ok, just wanted to make sure this was intended. they're kind of decoupled from each other, so I wanted to confirm that we are staging it and that we don't yet expect to report the full package version number in <= Bionic. ok, I'm good | 18:09 |
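The "staging" above works because the packaging change only has an effect once the source tree carries a slot for it. The following is a minimal sketch of that fallback idea. It assumes that packaging substitutes a placeholder string with the full package version. The names `UPSTREAM_VERSION`, `PACKAGED_VERSION_PLACEHOLDER` and `version_string` are hypothetical here and are not copied from cloudinit/version.py.

```python
# Hypothetical stand-ins; the real identifiers in cloudinit/version.py may differ.
UPSTREAM_VERSION = "18.2"
PACKAGED_VERSION_PLACEHOLDER = "@@PACKAGED_VERSION@@"


def version_string(packaged_version: str = PACKAGED_VERSION_PLACEHOLDER) -> str:
    """Return the full package version if packaging filled it in, else upstream."""
    # An unreplaced placeholder means the packaging-side change is only "staged":
    # nothing substituted a real value, so report the plain upstream version.
    if packaged_version.startswith("@@"):
        return UPSTREAM_VERSION
    return packaged_version
```

On a branch whose source never defines the placeholder, the packaging substitution has nothing to act on. That is the "dead code" state smoser describes until the next new-upstream-snapshot.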
blackboxsw | https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380 with nit on UNRELEASED -> cosmic | 18:12 |
blackboxsw | going through the rest now | 18:12 |
blackboxsw | to approve | 18:12 |
smoser | blackboxsw: if i uploaded to cosmic i think i'd just do a new-upstream-snapshot | 18:12 |
smoser | which would then dtrt | 18:13 |
blackboxsw | +1 good deal | 18:13 |
smoser | so same there... we're just "staging" a change, basically. i think that is generally the flow we'd have on all changes to the packaging branches. | 18:13 |
blackboxsw | smoser: want me to queue a new-upstream-snapshot for cosmic then? | 18:15 |
blackboxsw | and you can merge in your existing branches? | 18:15 |
smoser | you can if you'd like. or i can just do it. | 18:15 |
smoser | i'll pull existing. | 18:15 |
smoser | merged devel | 18:16 |
blackboxsw | ok putting up MP | 18:17 |
smoser | i'll grab the others too | 18:17 |
smoser | ok. all packaging branches have it now. | 18:18 |
blackboxsw | smoser: testing this now https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/347396 | 18:21 |
blackboxsw | I didn't push the tag | 18:21 |
blackboxsw | ok package built and proper _package_version is showing up | 18:23 |
smoser | blackboxsw: doing 'build and push' so that will land in cosmic-proposed shortly | 18:26 |
blackboxsw | rharper: have 5 mins to chat about https://trello.com/c/Uk7OA71K/798-cloud-provided-network-configuration-openstack-azure-aws ? | 19:06 |
blackboxsw | specifically the azure portion | 19:06 |
rharper | blackboxsw: here now; still need to chat? | 19:34 |
blackboxsw | cool rharper yeah just for a couple mins | 19:43 |
blackboxsw | *-mtj | 19:43 |
rharper | ok, lemme get setup | 19:43 |
blackboxsw | headphone trouble here. coming | 19:44 |
blackboxsw | rharper: lost you at 'wouldn't really' | 19:56 |
vila | hi there, | 20:33 |
vila | I'm encountering an issue that is hard to debug :-/ | 20:34 |
vila | A few months ago I installed cloud-init (version 17.1 at the time) on scaleway images. | 20:35 |
vila | Booting from these images worked fine. | 20:35 |
blackboxsw | https://bugs.launchpad.net/cloud-init/+bug/1775074 filed. | 20:36 |
ubot5 | Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed] | 20:36 |
blackboxsw | hi vila, yeah just explain the prob as best you can, maybe someone can help | 20:36 |
vila | I'm now using the exact same scripts to install cloud-init (18.2) on more recent images and things break: | 20:36 |
vila | 2018-06-04 19:37:35,378 - stages.py[DEBUG]: cache invalid in datasource: DataSourceScaleway | 20:37 |
vila | 2018-06-04 19:37:35,378 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceScaleway | 20:37 |
vila | on top of that, the run then finishes by creating /var/lib/cloud/instance/boot-finished at a time when /var/lib/cloud/instance (i.e. the '/var/lib/cloud/instance' symlink) does not exist, so a dir is created instead | 20:38 |
vila | further runs of cloud-init then fail because they can't delete the dir (a symlink is expected) | 20:39 |
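You can detect the broken state described above before rerunning cloud-init. The following is a minimal sketch that assumes only what the log states: /var/lib/cloud/instance is normally a symlink. The function name `instance_link_state` is hypothetical.

```python
import os


def instance_link_state(path: str = "/var/lib/cloud/instance") -> str:
    """Classify the instance path as 'symlink', 'directory', 'missing' or 'other'."""
    # Check islink() first: isdir() follows symlinks, so a healthy symlink
    # pointing at a directory would otherwise be reported as 'directory'.
    if os.path.islink(path):
        return "symlink"
    if os.path.isdir(path):
        return "directory"  # the broken state: later runs expect a symlink here
    if os.path.exists(path):
        return "other"
    return "missing"
```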
vila | any hints on how such issues can be debugged are highly appreciated | 20:41 |
blackboxsw | vila: hrm, looking. That specific "cache invalid" log message means that the datasource will re-run metadata collection because the instance cache appeared to be invalid (and needed a refresh). | 20:42 |
vila | blackboxsw: Where and how is the cache said to be invalid ? | 20:42 |
vila | blackboxsw: I was able to unpickle it from python | 20:42 |
blackboxsw | specifically in /usr/lib/python3/dist-packages/cloudinit/stages.py | 20:43 |
blackboxsw | it checks datasource.check_instance_id | 20:43 |
vila | yeah, opened already | 20:43 |
blackboxsw | in Scaleway it looks like that always returns False *I think* | 20:44 |
blackboxsw | which means always re-run get-data | 20:44 |
vila | when I unpickled it, I checked hasattr(ds, 'check_instance_id') | 20:44 |
vila | but I was unclear about inferring self.cfg | 20:44 |
vila | I guessed it was the instance-id and as far as I could see it was the same instance (I did a reboot) | 20:45 |
vila | blackboxsw: I have a vague feeling it may be related to /var/lib/cloud/instance being deleted at that point, but I don't know where to look for that | 20:45 |
blackboxsw | from the base class cloudinit/sources/__init__.py:DataSource.check_instance_id is just a dummy function returning False and I don't see that Scaleway is overriding that | 20:45 |
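The following simplified model shows why inheriting the base `check_instance_id` leads to "cache invalid" on every boot. The class and function names are illustrative and are not the actual code in stages.py or sources/__init__.py, which has additional paths that are omitted here. In this model, a datasource that does not override the check always has its cache rejected, so `get_data` runs again.

```python
class DataSource:
    """Illustrative base class: by default a cached instance is never confirmed."""

    def check_instance_id(self, sys_cfg) -> bool:
        # Base behaviour as described above: a dummy that always returns False.
        return False


class DataSourceWithIdCheck(DataSource):
    """Hypothetical subclass that can confirm the cached instance-id."""

    def __init__(self, cached_id: str, current_id: str):
        self.cached_id = cached_id
        self.current_id = current_id

    def check_instance_id(self, sys_cfg) -> bool:
        return self.cached_id == self.current_id


def restore_from_cache(cached_ds, sys_cfg):
    """Return the cached datasource if still valid, else None (forcing get_data)."""
    if cached_ds is None:
        return None  # corresponds to "no cache found"
    if not cached_ds.check_instance_id(sys_cfg):
        return None  # corresponds to "cache invalid in datasource: ..."
    return cached_ds
```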
vila | nope indeed | 20:46 |
vila | and it used to work | 20:46 |
vila | let's rewind a bit maybe | 20:46 |
vila | starting from a booted image, I run apt-get install cloud-init --no-install-recommends | 20:46 |
blackboxsw | please do. | 20:47 |
vila | anything I need to do for c-i to behave at next boot (same instance or different one)? | 20:47 |
blackboxsw | A way to test a clean boot scenario from cloud-init would be 'sudo cloud-init clean --logs --reboot', which would perform a | 20:47 |
blackboxsw | 'greenfield' install as if the system had never run cloud-init before | 20:47 |
vila | \o/ | 20:47 |
blackboxsw | it's what we use for upgrade testing and fresh boot validation | 20:47 |
blackboxsw | that will blow away /var/log/cloud* /var/lib/cloud/* with the exception of a /var/lib/cloud/seed subdir if applicable (as that seeds some metadata on some clouds) | 20:48 |
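Before running the removal rule described above, you can use this dry-run sketch to preview what would be deleted. It is an illustration of the rule as stated in the chat and is not cloud-init's implementation. It lists paths matching /var/log/cloud* and the entries of /var/lib/cloud other than seed. The `root` parameter is hypothetical and exists only so you can try the sketch on a scratch tree.

```python
import glob
import os


def paths_clean_would_remove(root: str = "/") -> list:
    """List /var/log/cloud* and /var/lib/cloud/* entries, keeping seed, under root."""
    log_glob = os.path.join(root, "var", "log", "cloud*")
    lib_dir = os.path.join(root, "var", "lib", "cloud")
    targets = sorted(glob.glob(log_glob))
    if os.path.isdir(lib_dir):
        for name in sorted(os.listdir(lib_dir)):
            if name == "seed":
                continue  # seed data is preserved; some clouds rely on it
            targets.append(os.path.join(lib_dir, name))
    return targets
```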
vila | blackboxsw: done | 20:49 |
vila | cloud-init analyze show | 20:49 |
blackboxsw | http://cloudinit.readthedocs.io/en/latest/topics/capabilities.html#cloud-init-collect-logs for more details | 20:49 |
blackboxsw | yeah analyze show is good for quick inspection of what cloud-init performed. | 20:50 |
vila | says no cache found but fails to find the scaleway ds | 20:50 |
vila | and cloud-init.log shows the scaleway datasource is not seen nor used | 20:51 |
blackboxsw | cloud-init status --long? | 20:51 |
vila | detail: | 20:52 |
vila | ('ssh-authkey-fingerprints', KeyError('getpwnam(): name not found: ubuntu',)) | 20:52 |
vila | but that's a fallout from not using the ds and not finding the user-data | 20:52 |
blackboxsw | hrm, ok so we have a couple errors looks like | 20:52 |
blackboxsw | right | 20:52 |
blackboxsw | hmm | 20:52 |
vila | so, red herring | 20:52 |
vila | why is the datasource missed ? | 20:52 |
vila | I have datasource_list: [ Scaleway, None] | 20:53 |
vila | and disable_root: false | 20:53 |
vila | (right, forgot to mention that I added the latter because scaleway default login is on root) | 20:53 |
vila | which was a first hint that things behave differently | 20:53 |
blackboxsw | that's good at least. mind doing a 'sudo cloud-init collect-logs' and sending an email to chad.smith@canonical.com. I can glance at it quickly here | 20:54 |
blackboxsw | collect-logs will dump cloud-init.tar.gz in your cwd | 20:54 |
blackboxsw | it'll contain all logs and potentially your user-data | 20:54 |
vila | no worries, nothing secret there | 20:55 |
blackboxsw | good deal | 20:55 |
vila | sent | 20:59 |
blackboxsw | checking thanks | 20:59 |
vila | I've tried various workflows giving different results; for example, after installing, running 'systemctl start cloud-final && cloud-init status --wait' finds the datasource and processes it properly, but the next boot fails | 21:01 |
vila | right now, for the logs I sent, I have a broken /var/lib/cloud/instance (a dir rather than a symlink) | 21:02 |
blackboxsw | vila: hrm, normally in cloud-init logs I | 21:02 |
blackboxsw | am accustomed to seeing init-local stage, then init then modules:config, but your logs skip the 'init' stage | 21:03 |
blackboxsw | can you 'cloud-init analyze show | pastebinit' | 21:03 |
vila | https://paste.ubuntu.com/p/ZwG4BnnJRd/ | 21:04 |
blackboxsw | normally I'd see a 'Starting stage: init-network' after init-local and before modules-config | 21:04 |
blackboxsw | hrm | 21:04 |
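The following helper follows the reasoning above and scans cloud-init.log for the per-stage banner lines. It reports which of the usual stages never started. The regex follows the "Cloud-init v. 18.2 running 'modules:config' at ..." line quoted later in this log. The expected stage names are an assumption based on the stage order discussed here.

```python
import re

# Stage names as they appear in the banner line, in normal boot order
# (assumption based on the stage order discussed in this log).
EXPECTED_STAGES = ["init-local", "init", "modules:config", "modules:final"]
BANNER = re.compile(r"Cloud-init v\. \S+ running '([^']+)'")


def missing_stages(log_text: str) -> list:
    """Return expected stages that have no "running '<stage>'" banner in the log."""
    seen = set(BANNER.findall(log_text))
    return [stage for stage in EXPECTED_STAGES if stage not in seen]
```

For the failing boots in this log, you would expect the result to contain 'init', which is the network stage that is being skipped.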
blackboxsw | can you cat /etc/cloud/cloud.cfg | pastebinit | 21:05 |
vila | https://paste.ubuntu.com/p/Gb6Mzxms9q/ <- that one worked | 21:05 |
vila | like ~2 hours ago | 21:05 |
vila | blackboxsw: /etc/cloud/cloud.cfg is untouched, but I add: | 21:06 |
vila | cat <<EOC > /etc/cloud/cloud.cfg.d/99_byov.cfg | 21:06 |
vila | # Generated by byov at $(date) | 21:06 |
vila | datasource_list: [ Scaleway, None] | 21:06 |
vila | apt_preserve_sources_list: true | 21:06 |
vila | disable_root: false | 21:06 |
vila | EOC | 21:06 |
vila | https://paste.ubuntu.com/p/zVQDMtmG65/ <- cat /etc/cloud/cloud.cfg | 21:07 |
vila | but yeah, it's the missing init-network that I'm tracking indeed | 21:08 |
blackboxsw | interesting. so your cloud-init.log mentions 2018-06-04 20:48:32,448 - __init__.py[DEBUG]: Searching for local data source in: [] | 21:09 |
blackboxsw | that list should have represented Scaleway in it | 21:09 |
blackboxsw | something is modifying that datasource list | 21:09 |
vila | exactly, sometimes it's there sometimes it's not | 21:09 |
blackboxsw | any other files in /etc/cloud/cloud.cfg.d? | 21:09 |
vila | and it seems the invalid cache somehow marks the datasource entirely wrong | 21:10 |
blackboxsw | like /etc/cloud/cloud.cfg.d/90_dpkg.cfg ? | 21:10 |
vila | blackboxsw: nope, I used to have datasource_list: [ NoCloud, OpenStack, Scaleway, None] when that was working | 21:10 |
vila | right, that one is overridden... oh, let me check | 21:10 |
vila | nope, standard content: | 21:10 |
vila | # to update this file, run dpkg-reconfigure cloud-init | 21:11 |
vila | datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, None ] | 21:11 |
vila | and scaleway is there | 21:11 |
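The datasource_list in 99_byov.cfg takes effect over the one in 90_dpkg.cfg because files in /etc/cloud/cloud.cfg.d are read in lexical order, and a later file's setting takes precedence. The following simplified sketch models that precedence. It assumes each file is already parsed into a dict and that top-level keys are simply replaced. cloud-init's real merging supports more elaborate strategies. The function name is hypothetical.

```python
def merge_conf_d(files: dict) -> dict:
    """Merge parsed config dicts keyed by filename; lexically later names win."""
    merged = {}
    for name in sorted(files):
        # Simplification: replace whole top-level values instead of deep-merging.
        merged.update(files[name])
    return merged
```

With the two files from this log, the effective list is [Scaleway, None], which is what vila configured. This model cannot explain Scaleway going missing.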
blackboxsw | ok that's good. yeah and your ds-identify.log in the cloud-init.tar.gz also shows ds-identify properly detected Scaleway as an option | 21:11 |
vila | Also, I could NOT find when ds-identify is run, but I noticed it's run more than once in some scenarios | 21:12 |
vila | so, I keep 'cloud-init clean --logs --reboot' in my notes for the future, but it failed here | 21:13 |
vila | which reproduces my issue, so it's still a good recipe, but it doesn't give the result you expected, I think | 21:14 |
blackboxsw | vila: for the failed case, try 'systemctl list-dependencies | grep cloud'; this is what I see http://paste.ubuntu.com/p/TQM8RWbwkP/ | 21:16 |
blackboxsw | I'd expect a cloud-init.service job/unit listed in systemd. it's what runs 'cloud-init init', which is the network stage that we are missing in your failed case | 21:17 |
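You can apply the same kind of check to the text printed by 'systemctl list-dependencies'. The following sketch assumes the four unit names below are the ones that should appear. cloud-init.service is named above, and cloud-final appears earlier in this log. Treating cloud-init-local.service and cloud-config.service as expected entries is an assumption. The function name is hypothetical.

```python
import re

EXPECTED_UNITS = [
    "cloud-init-local.service",
    "cloud-init.service",  # runs 'cloud-init init', the network stage
    "cloud-config.service",
    "cloud-final.service",
]


def missing_units(list_dependencies_output: str) -> list:
    """Return expected cloud-init units absent from systemctl's dependency tree."""
    found = set(re.findall(r"cloud-[\w-]+\.service", list_dependencies_output))
    return [unit for unit in EXPECTED_UNITS if unit not in found]
```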
blackboxsw | not sure if I'm going down a rat hole there | 21:17 |
vila | blackboxsw: right, I can rebuild the instance without installing cloud-init and restart from there maybe? | 21:18 |
vila | blackboxsw: once cloud-init is installed, I'm running https://paste.ubuntu.com/p/xtGNYpfzZT/ | 21:19 |
blackboxsw | sounds like a good plan, installing the new cloud-init deb in your environment after the fact should take care of creating the right systemd generators to queue cloud-init stages during boot (if something got mangled across the upgrade path) | 21:19 |
blackboxsw | vila: running cloud-init init-local and cloud-init init 'naked' outside of the standard boot process on an instance is not exactly what we intended (and could be rife with error conditions) | 21:20 |
vila | blackboxsw: I used to run nothing and all was rosy ;-) | 21:20 |
blackboxsw | yeah nothing is what we hope is always rosy (and intended). Just booting normally should take care of ordering all cloud-init stages appropriately (including module configuration etc). | 21:20 |
vila | blackboxsw: what *is* the intended workflow ? install, save image, boot ? | 21:20 |
blackboxsw | yes vila, boot clean image, install cloud-init, power off, copy clean image, let cloud-init boot in user-configured environment to collect and config based on metadata/user-data | 21:21 |
blackboxsw | trying to look more at your latest paste | 21:21 |
vila | damn it, that's what I did, and I thought maybe I missed a step | 21:22 |
blackboxsw | vila, yeah something smells funky (I don't have a scaleway acct unfortunately). I'll try to bisect the diffs on the Scaleway datasource from 17.1 -> 18.2; I didn't think we had anything significant in that upgrade path other than some exception handling changes on url retries in that space | 21:27 |
vila | blackboxsw: yup, went there, saw that, couldn't find a link either (but I'm not the expert ;-) | 21:28 |
blackboxsw | I'd like to see a /var/log/cloud-init.log in the case where cloud-init was upgraded and only a reboot run. (not manual run of cloud-init init --local and 'cloud-init init'). | 21:29 |
vila | just got the instance without cloud-init installed | 21:29 |
vila | so, apt-get install cloud-init --no-install-recommends | 21:29 |
vila | reboot | 21:32 |
vila | https://paste.ubuntu.com/p/HDhH8hWpmd/ | 21:33 |
vila | yet https://paste.ubuntu.com/p/z6wXtF6Jtx/ | 21:34 |
vila | blackboxsw: you said "in the case where cloud-init was upgraded" s/upgraded/installed/ otherwise, nothing from my script | 21:35 |
blackboxsw | ok so Scaleway is ordered before Ec2 and Ec2 is considered a maybe; that shouldn't break anything specifically for Scaleway's datasource. reading your cloud-init.log | 21:38 |
blackboxsw | 2018-06-04 21:31:31,651 - stages.py[DEBUG]: no cache found === fresh boot, no cruft from previous around | 21:38 |
vila | right, so the cache itself is not the root cause, well done | 21:39 |
blackboxsw | line 78 of your first paste is showing us we are properly attempting to discover Scaleway (and many other datasources) in python (instead of in ds-identify, which is just a shell script, for speed) | 21:40 |
vila | and https://paste.ubuntu.com/p/PJ2C5tQsSJ/ should cover all the datasource inputs | 21:40 |
blackboxsw | ohh wait | 21:41 |
vila | right | 21:41 |
blackboxsw | no Scaleway in line 78 | 21:41 |
vila | while still there in line 77 | 21:41 |
* vila thinks | 21:42 | |
vila | could it be that /var/run/scaleway is created too late (aka race ?) | 21:42 |
blackboxsw | I had thought Scaleway datasource was defined as FILESYSTEM only. checking the DataSourceScaleway.py again | 21:42 |
blackboxsw | my bad | 21:42 |
blackboxsw | (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)), | 21:42 |
blackboxsw | that means the Scaleway datasource is detected in the init-network stage only ... ok so we expect it to be filtered out of the init-local stage | 21:43 |
blackboxsw | ok so we're still good in init-local stage (not detecting scaleway) | 21:43 |
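You can model the filtering described above as an exact match between the dependencies a stage provides and the ones each datasource declares. The following sketch assumes exact set equality. That assumption is consistent with the empty "Searching for local data source in: []" line earlier in the log. The constant values, the `sources_for_stage` name, and the second table entry are illustrative. Only Scaleway's dependency tuple comes from the log.

```python
DEP_FILESYSTEM = "FILESYSTEM"
DEP_NETWORK = "NETWORK"

# (datasource name, declared dependencies). Scaleway's entry mirrors the tuple
# quoted above; the second entry is a hypothetical local-only datasource.
DATASOURCES = [
    ("DataSourceScaleway", (DEP_FILESYSTEM, DEP_NETWORK)),
    ("DataSourceLocalOnly", (DEP_FILESYSTEM,)),
]


def sources_for_stage(depends, ds_list=DATASOURCES):
    """Return datasources whose declared dependencies exactly equal `depends`."""
    wanted = set(depends)
    return [name for name, deps in ds_list if set(deps) == wanted]
```

In this model, init-local (filesystem only) never sees Scaleway. Only the init-network stage can detect it, which is why skipping that stage leaves the instance without a datasource.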
blackboxsw | but init-network (otherwise called via CLI as 'cloud-init init') should not be skipped, | 21:44 |
blackboxsw | that's what should have detected scaleway.... | 21:44 |
blackboxsw | reading down past init-local in your cloud-init log now. sorry for the noise | 21:44 |
vila | no no ! very helpful | 21:45 |
vila | (and entertaining ;) | 21:45 |
blackboxsw | 2018-06-04 21:31:32,137 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources | 21:45 |
blackboxsw | 2018-06-04 21:31:34,280 - util.py[DEBUG]: Cloud-init v. 18.2 running 'modules:config' at Mon, 04 Jun 2018 21:31:34 +0000. Up 16.05 seconds. | 21:45 |
blackboxsw | heh | 21:45 |
vila | (don't get derailed but line 136 : 2018-06-04 21:31:31,963 - util.py[DEBUG]: dmi data /sys/class/dmi/id/sys_vendor returned Scaleway) | 21:45 |
vila | There is a comment that dmi is not implemented IIRC... | 21:46 |
vila | * check DMI data: not yet implemented by Scaleway, but the check is made to | 21:46 |
vila | be future-proof. | 21:46 |
blackboxsw | vila: what does 'systemctl list-dependencies | grep cloud' tell you? | 21:48 |
vila | https://paste.ubuntu.com/p/7SSDVB7ZsG/ | 21:48 |
blackboxsw | that's ok on the dmi read, as it was something cloud-init did to determine that it's not running on DigitalOcean. | 21:49 |
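The DMI read mentioned above consists of reading a sysfs file and comparing its contents. The following minimal sketch assumes a plain exact-match comparison. The function name and parameters are hypothetical, and this is not the DigitalOcean datasource's actual code.

```python
def dmi_vendor_matches(expected: str, path: str = "/sys/class/dmi/id/sys_vendor") -> bool:
    """Return True if the DMI sys_vendor file exists and equals `expected`."""
    try:
        with open(path, encoding="utf-8") as fh:
            vendor = fh.read().strip()
    except OSError:
        return False  # no readable DMI data, so no match
    return vendor == expected
```

On the instance in this log, sys_vendor reads "Scaleway", so a check for "DigitalOcean" returns False, as blackboxsw explains.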
vila | ha | 21:49 |
blackboxsw | meh. something is causing cloud-init to skip the init-network stage in that environment (like a systemd job falling over maybe?). I see no tracebacks indicating why it is skipped. lemme see if I can dig up the format of the systemd job | 21:51 |
blackboxsw | do you have a /lib/systemd/system/cloud-init.service ? | 21:51 |
vila | yes | 21:51 |
vila | https://paste.ubuntu.com/p/JgDkpMCy5v/ | 21:52 |
blackboxsw | bah. ok I think we need a bug here. nothing should have changed w.r.t. 17.1->18.2 and the systemd startup jobs/units, but the init-network stage is being skipped and that's why things are falling over. I'll have to get a scaleway acct set up to triage more | 21:55 |
blackboxsw | what ubuntu release was this instance? | 21:55 |
vila | xenial | 21:55 |
blackboxsw | bionic? xenial? | 21:55 |
blackboxsw | ok | 21:55 |
vila | blackboxsw: | 21:56 |
blackboxsw | would you kindly 'ubuntu-bug cloud-init' vila and file a bug per instructions? | 21:56 |
blackboxsw | it'll dump your collect-logs output into a bug attachment | 21:56 |
vila | blackboxsw: from inside the instance ? | 21:57 |
vila | -bash: ubuntu-bug: command not found, installing | 21:58 |
blackboxsw | vila: yes please (if it has outbound connectivity). otherwise you could just file a bug at https://bugs.launchpad.net/cloud-init/+filebug and attach the cloud-init.tar.gz from your latest run to the bug | 21:58 |
blackboxsw | all ubuntu-bug does is ask a question or two about your cloud platform and collate that data when filing output from 'sudo cloud-init collect-logs' | 21:59 |
* vila installs apport | 21:59 | |
* vila thinks about giving access... should be a matter of adding an ssh key on my account ? | 22:00 | |
blackboxsw | yeah, in the near term your 'sudo cloud-init init --local; sudo cloud-init init; sudo cloud-init modules --mode config; sudo cloud-init modules --mode final' should get you 90% of the way there, I *think* | 22:01 |
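For reference, the interim workaround above runs the four stages in their normal boot order. blackboxsw notes earlier that running stages outside the normal boot process is not the intended workflow. The following sketch only returns the commands unless `dry_run` is False. The helper name `run_stages` is hypothetical. The subcommands are the ones listed in the message above.

```python
import subprocess

# The four stages in boot order, exactly as listed in the message above.
STAGE_COMMANDS = [
    ["cloud-init", "init", "--local"],
    ["cloud-init", "init"],
    ["cloud-init", "modules", "--mode", "config"],
    ["cloud-init", "modules", "--mode", "final"],
]


def run_stages(dry_run: bool = True) -> list:
    """Run (or, with dry_run, just list) the stage commands in order."""
    executed = []
    for cmd in STAGE_COMMANDS:
        if not dry_run:
            # check=True raises CalledProcessError so later stages don't run on
            # top of a failed earlier one. Needs root, like the sudo above.
            subprocess.run(cmd, check=True)
        executed.append(" ".join(cmd))
    return executed
```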
blackboxsw | vila: right, you could run 'ssh-import-id chad.smith' on the instance, then I'd be able to log in as whatever user you did that under | 22:01 |
vila | blackboxsw: root ! what else ? :-D | 22:02 |
blackboxsw | hah! but that said, I'm going to have to disappear shortly so I may not get to it until tomorrow morn my time | 22:02 |
blackboxsw | <--- and file your bank acct and social security number here ;) | 22:02 |
blackboxsw | it may be good to have a reference bug so the others on the team can peek at the triage/response too | 22:03 |
vila | hehe | 22:03 |
vila | https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1775086 | 22:04 |
ubot5 | Ubuntu bug 1775086 in cloud-init (Ubuntu) "cloud-init fails to recognize scaleway" [Undecided,New] | 22:04 |
blackboxsw | thanks again. vila I have to bail for a while. will check it out | 22:06 |
vila | blackboxsw: thanks to you, at least I'm not mad and something is going on that is worth fixing ;-) | 22:07 |
vila | blackboxsw: if only for /var/lib/cloud/instance being a dir... | 22:07 |
blackboxsw | thx vila. for the /var/lib/cloud/instance-being-a-dir issue, I'll track a separate bug on making the 'cloud-init collect-logs' cmd more resilient to failure cases | 22:19 |
blackboxsw | added that content to https://bugs.launchpad.net/cloud-init/+bug/1775074 | 22:22 |
ubot5 | Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed] | 22:22 |
blackboxsw | will try to kill 2 birds with 1 stone there | 22:22 |
blackboxsw | vila: comment for you on your bug. ok systemd has removed the cloud-init.service job for some reason and I need to dig into why | 22:31 |
vila | haaaa | 22:44 |