/srv/irclogs.ubuntu.com/2018/06/04/#cloud-init.txt

=== frickler_ is now known as frickler
=== r-daneel_ is now known as r-daneel
=== rangerpb is now known as baude
smoserblackboxsw: if you had a minute16:19
smoser  - bug 1770712 fixes for ubuntu package branches.16:20
ubot5bug 1770712 in cloud-init (Ubuntu Cosmic) "It would be nice if cloud-init provides full version in logs" [Medium,Confirmed] https://launchpad.net/bugs/177071216:20
smoser    devel https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/34738016:20
smoser    bionic https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/34738116:20
smoser    artful https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/34738216:20
smoser    xenial https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/34738416:20
smoserthat'd be good to land into cosmic today... just get those all in and an upload would be good.16:20
blackboxsw+116:23
blackboxswwill do16:23
blackboxswsmoser: we're good on the devel portion of those branches, because PACKAGED_VERSION exists in cloudinit/version.py. in bionic and older branches we haven't yet pulled back 5446c788160412189200c6cc688b14c9f907194318:05
blackboxswshouldn't we pull that back too ?18:05
blackboxswI realize the packaging change doesn't break currently, but it also doesn't do anything yet18:05
smoserblackboxsw: well, the next new-upstream-snapshot will get it18:08
smoserso at this point it is just "staged".18:08
smoseryou're correct though in that it basically adds dead code.18:08
smoser(the daily builds *would* have it )18:09
blackboxswok just wanted to make sure this was intended, they are decoupled from each other kind of, so I wanted to confirm that we are staging it and know that we are not yet expecting to report full pkg version number in <= Bionic. ok I'm good18:09
blackboxswhttps://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380 with nit on UNRELEASED -> cosmic18:12
blackboxswgoing through the rest now18:12
blackboxswto approve18:12
smoserblackboxsw: if i uploaded to cosmic i think i'd just do a new-upstream-snapshot18:12
smoserwhich would then dtrt18:13
blackboxsw+1 good dela18:13
blackboxswdeal18:13
smoserso same there... we're just "staging" a change basically.  i think that is generally the flow we'd have on all changes to the packaging branches.18:13
blackboxswsmoser: want me to queue a new-upstream snapshot then for cosmic.18:15
blackboxsw?18:15
blackboxswand you can merge in your existing branches?18:15
smoseryou can if you'd like. or i can just do it.18:15
smoseri'll pull existing.18:15
smosermerged devel18:16
blackboxswok putting up MP18:17
smoseri'll grab the others too18:17
smoserok. all packaging branches have it now.18:18
blackboxswsmoser: testing this now https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/34739618:21
blackboxswI didn't push the tag18:21
blackboxswok package built and proper _package_version is showing up18:23
smoserblackboxsw: doing 'build and push' so that will land in cosmic-proposed shortly18:26
blackboxswrharper: have 5 mins to chat about https://trello.com/c/Uk7OA71K/798-cloud-provided-network-configuration-openstack-azure-aws ?19:06
blackboxswspecifically the azure portion19:06
rharperblackboxsw: here now;  still need to chat?19:34
blackboxswcool rharper yeah just for a couple mins19:43
blackboxsw*-mtj19:43
rharperok, lemme get setup19:43
blackboxswheadphone trouble here. coming19:44
blackboxswrharper: lost you at 'wouldn't really'19:56
vilahi there,20:33
vilaI'm encountering an issue that is hard to debug :-/20:34
vilaA few months ago I did install cloud-init on scaleway images (17.1 was the cloud-init version then).20:35
vilaBooting from these images worked fine.20:35
blackboxswhttps://bugs.launchpad.net/cloud-init/+bug/1775074 filed.20:36
ubot5Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed]20:36
blackboxswhi vila, yeah just explain the prob as best you can, maybe someone can help20:36
vilaI'm now using the exact same scripts to install cloud-init (18.2) on more recent images and things break:20:36
vila2018-06-04 19:37:35,378 - stages.py[DEBUG]: cache invalid in datasource: DataSourceScaleway20:37
vila2018-06-04 19:37:35,378 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceScaleway20:37
vilaon top of that, the run finished by creating /var/lib/cloud/instance/boot-finished at a time where /var/lib/cloud/instance does not exist (i.e. the '/var/lib/cloud/instance' symlink), so a dir is created instead20:38
vilafurther runs of cloud-init then fail because they can't delete the dir (a symlink is expected)20:39
vilaany hints on how such issues can be debugged highly appreciated20:41
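A minimal sketch of how the broken state described above (a real directory where a symlink is expected) could be checked from Python. It assumes only the path named in the chat; `classify_instance_path` is a hypothetical helper for illustration, not part of cloud-init.

```python
import os


def classify_instance_path(path="/var/lib/cloud/instance"):
    """Report whether path is a symlink, a real directory, missing, or other."""
    # islink() must be tested first: isdir() follows symlinks, so it is
    # also True for a healthy symlink that points at a directory.
    if os.path.islink(path):
        return "symlink"
    if os.path.isdir(path):
        return "directory"
    if not os.path.exists(path):
        return "missing"
    return "other"


if __name__ == "__main__":
    print(classify_instance_path())
```

A result of "directory" would match the failure vila reports; "symlink" is the state the chat says later runs expect.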
blackboxswvila: hrm looking. That specific log message on "cache invalid" means that the datasource will attempt to re-run metadata collection because it appeared that the instance cache was invalid (and needed a refresh).20:42
vilablackboxsw: Where and how is the cache said to be invalid ?20:42
vilablackboxsw: I was able to unpickle it from python20:42
blackboxswspecifically in /usr/lib/python3/dist-packages/cloudinit/stages.py20:43
blackboxswit checks datasource.check_instance_id20:43
vilayeah, opened already20:43
blackboxswin Scaleway it looks like that always returns False *I think*20:44
blackboxswwhich means always re-run get-data20:44
vilawhen I unpickled it, I check hasattr(ds, 'check_instance_id')20:44
vilabut I was unclear about inferring self.cfg20:44
vilaI guessed it was the instance-id and as far as I could see it was the same instance (I did a reboot)20:45
vilablackboxsw: I have a vague feeling it may be related to /var/lib/cloud/instance being deleted at that point but I don't know where to look for that20:45
blackboxswfrom the base class cloudinit/sources/__init__.py:DataSource.check_instance_id  is just a dummy function returning False and I don't see that Scaleway is overriding that20:45
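A toy model of the behaviour described above, assuming the cache is only trusted when the datasource's `check_instance_id` returns a true value. The class names and `cache_is_valid` are hypothetical stand-ins, not cloud-init's real classes or functions; only the method name `check_instance_id` comes from the chat.

```python
class BaseDataSource:
    """Toy stand-in for a datasource base class (not cloud-init's real one)."""

    def check_instance_id(self, sys_cfg):
        # Default described in the chat: a dummy that always returns False.
        return False


class ToyScaleway(BaseDataSource):
    """Does not override check_instance_id, mirroring what was observed."""


class ToyCheckingSource(BaseDataSource):
    """Hypothetical datasource that compares a cached id with the current one."""

    def __init__(self, cached_id, current_id):
        self.cached_id = cached_id
        self.current_id = current_id

    def check_instance_id(self, sys_cfg):
        return self.cached_id == self.current_id


def cache_is_valid(ds, sys_cfg=None):
    """Assumed decision: reuse the cache only if the datasource vouches for it."""
    return bool(ds.check_instance_id(sys_cfg))


if __name__ == "__main__":
    print(cache_is_valid(ToyScaleway()))
    print(cache_is_valid(ToyCheckingSource("i-1", "i-1")))
```

Under this assumption a datasource that inherits the default would report "cache invalid" on every boot, which is consistent with the log message but would not by itself explain the missing stage.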
vilanope indeed20:46
vilaand it used to work20:46
vilalet's rewing a bit may be20:46
vilarewind20:46
vilastarting from a booted image, I run apt-get install cloud-init --no-install-recommends20:46
blackboxswplease do.20:47
vilaanything I need to do for c-i to behave at next boot (same instance or different one)20:47
blackboxswA way to test a clean boot scenario  from cloud-init would be hrm so 'sudo cloud-init clean --logs --reboot' would perform a20:47
blackboxsw'greenfield' install as if the system had never run cloud-init before20:47
vila\o/20:47
blackboxswit's what we use for upgrade testing and fresh boot validatin20:47
blackboxswvalidation20:47
blackboxswthat will blow away /var/log/cloud* /var/lib/cloud/* with the exception of a /var/lib/cloud/seed  subdir if applicable (as that seeds some metadata on some clouds)20:48
vilablackboxsw: done20:49
vilacloud-init analyze show20:49
blackboxswhttp://cloudinit.readthedocs.io/en/latest/topics/capabilities.html#cloud-init-collect-logs for more details20:49
blackboxswyeah analyze show is good for quick inspection of what cloud-init performed.20:50
vilasays no cache found but fails to find the scaleway ds20:50
vilaand cloud-init.log shows the scaleway datasource is not seen nor used20:51
blackboxswcloud-init status --long?20:51
viladetail:20:52
vila('ssh-authkey-fingerprints', KeyError('getpwnam(): name not found: ubuntu',))20:52
vilabut that's a fallout from not using the ds and not finding the user-data20:52
blackboxswhrm, ok so we have a couple errors looks like20:52
blackboxswright20:52
blackboxswhmm20:52
vilaso, red herring20:52
vilawhy is the datasource missed ?20:52
vilaI have datasource_list: [ Scaleway, None]20:53
vilaand disable_root: false20:53
vila(right, forgot to mention that I added the latter because scaleway default login is on root)20:53
vilawhich was a first hint that things behave differently20:53
blackboxswthat's good at least. mind doing a 'sudo cloud-init collect-logs' and sending an email to chad.smith@canonical.com. I can glance at it quickly here20:54
blackboxswcollect-logs will dump cloud-init.tar.gz in your cwd20:54
blackboxswit'll contain all logs and potentially your user-data20:54
vilano worries, nothing secret there20:55
blackboxswgood deal20:55
vilasent20:59
blackboxswchecking thanks20:59
vilaI've tried various workflows giving different results, for example, after installing, running 'systemctl start cloud-final && cloud-init status --wait' finds the datasource and processes properly, but the next boot fails21:01
vilaright now, for the logs I sent, I have a broken /var/lib/cloud/instance (a dir rather than a symlink)21:02
blackboxswvila: hrm, normally in cloud-init logs I21:02
blackboxswam accustomed to seeing init-local stage, then init   then modules:config, but your logs skip the 'init' stage21:03
blackboxswcan you cloud-init analyze show | pastebinit21:03
blackboxswcan you 'cloud-init analyze show | pastebinit'21:03
vilahttps://paste.ubuntu.com/p/ZwG4BnnJRd/21:04
blackboxswnormally I'd see an Starting stage: init-network after init-local and before modules-config21:04
blackboxswhrm21:04
blackboxswcan you cat /etc/cloud/cloud.cfg | pastebinit21:05
vilahttps://paste.ubuntu.com/p/Gb6Mzxms9q/ <- that one worked21:05
vilalike ~2 hours ago21:05
vilablackboxsw: /etc/cloud/cloud.cfg is untouched, but I add:21:06
vilacat <<EOC > /etc/cloud/cloud.cfg.d/99_byov.cfg21:06
vila# Generated by byov at $(date)21:06
viladatasource_list: [ Scaleway, None]21:06
vilaapt_preserve_sources_list: true21:06
viladisable_root: false21:06
vilaEOC21:06
vilahttps://paste.ubuntu.com/p/zVQDMtmG65/ <-  cat /etc/cloud/cloud.cfg21:07
vilabut yeah, it's the missing init-network that I'm tracking indeed21:08
blackboxswinteresting. so your cloud-init.log mentions 2018-06-04 20:48:32,448 - __init__.py[DEBUG]: Searching for local data source in: []21:09
blackboxswthat list should have represented Scaleway in it21:09
blackboxswsomething is modifying that datasource list21:09
vilaexactly, sometimes it's there sometimes it's not21:09
blackboxswany other files in /etc/cloud/cloud.cfg.d ?21:09
vilaand it seems the invalid cache somehow mark the datasource entirely wrong21:10
blackboxswlike /etc/cloud/cloud.cfg.d/90_dpkg.cfg ?21:10
vilablackboxsw: nope, I used to have datasource_list: [ NoCloud, OpenStack, Scaleway, None] when that was working21:10
vilaright, that one is overridden... oh, let me check21:10
vilanope, standard content:21:10
vila# to update this file, run dpkg-reconfigure cloud-init21:11
viladatasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, None ]21:11
vilaand scaleway is there21:11
blackboxswok that's good. yeah and your ds-itentify.log in the cloud-init.tar.gz also shows ds-identify properly detected Scaleway as an option21:11
vilaAlso, I could find when ds-identify is run, but I noticed it's run more than once in some scenarios21:11
blackboxswds-identify rather21:11
vilaAlso, I could NOT find when ds-identify is run, but I noticed it's run more than once in some scenarios21:12
vilaso, I keep 'cloud-init clean --logs --reboot' in my notes for the future, but it failed here21:13
vilawhich reproduces my issue so it's still a good recipe but it doesn't give the result you expected I think21:14
blackboxswvila: on the failed case, 'systemctl list-dependencies | grep cloud' this is what I see http://paste.ubuntu.com/p/TQM8RWbwkP/21:16
blackboxswI'd expect a cloud-init.service job/unit listed in systemd. it's what runs 'cloud-init init' which is the network stage that we are missing in your failed case21:17
blackboxswnot sure I'm going down a rat hole there21:17
blackboxswnot sure *if*21:17
vilablackboxsw: right, I can rebuild the instance without installing cloud-init and restart from there may be ?21:18
vilablackboxsw: once cloud-init is installed, I'm running https://paste.ubuntu.com/p/xtGNYpfzZT/21:19
blackboxswsounds like a good plan, installing the new cloud-init deb in your environment after the fact should take care of creating the right systemd generators to queue cloud-init stages during boot (if something got mangled across the upgrade path)21:19
blackboxswvila: running cloud-init init-local; and cloud-init init 'naked' outside of the standard boot process on an instance is not exactly what we intended (and could be rife with some error conditions)21:20
vilablackboxsw: I used to run nothing and all was rosy ;-)21:20
blackboxswyeah nothing is what we hope is always rosy (and intended).    Just booting normally should take care of ordering all cloud-init stages appropriately (including module configuration etc).21:20
vilablackboxsw: what *is* the intended workflow ? install, save image, boot ?21:20
blackboxswyes vila, boot clean image, install cloud-init, power off, copy clean image, let cloud-init boot in user-configured environment to collect and config based on metadata/user-data21:21
blackboxswtrying to look more at your latest paste21:21
viladamn it, that was what I did and I thought maybe I missed a step21:22
blackboxswvila, yeah something smells funky, (I don't have a scaleway acct unfortunately), I'll try to bisect the diffs on Scaleway datasource from 17.1 -> 18.2 I didn't think we have anything significant in that upgrade path other than some exception handling changes on url retries  in that space21:27
vilablackboxsw: yup, went there saw that, couldn't find a link either (but I'm not the expert ;-)21:28
blackboxswI'd like to see a /var/log/cloud-init.log in the case where cloud-init was upgraded and only a reboot run. (not manual run of cloud-init init --local and 'cloud-init init').21:29
vilajust got the instance without cloud-init installed21:29
vilaso, apt-get cloud-init --no-install-recommends21:29
vila*install21:29
vilareboot21:32
vilahttps://paste.ubuntu.com/p/HDhH8hWpmd/21:33
vilayet https://paste.ubuntu.com/p/z6wXtF6Jtx/21:34
vilablackboxsw: you said "in the case where cloud-init was upgraded" s/upgraded/installed/ otherwise, nothing from my script21:35
blackboxswok so Scaleway ordered before Ec2, Ec2 considered maybe, that shouldn't break anything specifically for Scaleway's datasource. reading your cloud-init.log21:38
blackboxsw2018-06-04 21:31:31,651 - stages.py[DEBUG]: no cache found    === fresh boot, no cruft from previous around21:38
vilaright, so the cache itself is not the root cause, well done21:39
blackboxswline 74 of your first paste is showing us we are properly attempting to discover Scaleway (and many other datasources) in python (instead of ds-identify which is just a shell script (for speed))21:40
vilaand https://paste.ubuntu.com/p/PJ2C5tQsSJ/ should cover all the datasource inputs21:40
blackboxswline 78 rather21:40
blackboxswohh wait21:41
vilaright21:41
blackboxswno Scaleway in line 7821:41
vilawhile still there in line 7721:41
* vila thinks21:42
vilacould it be that /var/run/scaleway is created too late (aka race ?)21:42
blackboxswI had thought Scaleway datasource was defined as FILESYSTEM only. checking the DataSourceScaleway.py again21:42
blackboxswmy bad21:42
blackboxsw    (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)),21:42
blackboxswthat means Scaleway datasource is init-network stage detected only ... ok so we expect it to be filtered out of init-local stage21:43
blackboxswok so we're still good in init-local stage (not detecting scaleway)21:43
blackboxswbut the fact that init-network (otherwise called via CLI as 'cloud-init init') should not be skipped,21:44
blackboxswthat's what should have detected scaleway....21:44
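A simplified sketch of the dependency filtering described above: a datasource declared with both filesystem and network dependencies is skipped by the local stage and picked up only by the network stage. The constant values, `sources_for_stage`, and the `LocalOnlyExample` entry are hypothetical, and the exact-match rule is an assumption of this sketch rather than something stated in the chat.

```python
DEP_FILESYSTEM = "FILESYSTEM"
DEP_NETWORK = "NETWORK"

# (name, dependencies) pairs; only the Scaleway entry reflects the chat.
DATASOURCES = [
    ("LocalOnlyExample", (DEP_FILESYSTEM,)),  # hypothetical local-only source
    ("Scaleway", (DEP_FILESYSTEM, DEP_NETWORK)),
]


def sources_for_stage(available_deps, datasources):
    """Return names whose declared dependency set equals what the stage offers."""
    wanted = set(available_deps)
    return [name for name, deps in datasources if set(deps) == wanted]


if __name__ == "__main__":
    # init-local: only the filesystem is available.
    print(sources_for_stage([DEP_FILESYSTEM], DATASOURCES))
    # init (network stage): filesystem and network are available.
    print(sources_for_stage([DEP_FILESYSTEM, DEP_NETWORK], DATASOURCES))
```

With this model an empty or Scaleway-free list in init-local is expected, so the real problem is the network stage never running.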
blackboxswreading down past init-local in your cloud-init log now. sorry for the noise21:44
vilano no ! very helpful21:45
vila(and entertaining ;)21:45
blackboxsw2018-06-04 21:31:32,137 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources21:45
blackboxsw2018-06-04 21:31:34,280 - util.py[DEBUG]: Cloud-init v. 18.2 running 'modules:config' at Mon, 04 Jun 2018 21:31:34 +0000. Up 16.05 seconds.21:45
blackboxswheh21:45
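The gap between those two log lines can also be spotted mechanically. This is a rough sketch that scans log text for "running '<stage>'" banners in the format quoted above and lists which expected stages never appear. The stage names in `EXPECTED` other than 'modules:config' are assumptions about how the banners are spelled, and both function names are hypothetical.

```python
import re

# Matches banners such as: Cloud-init v. 18.2 running 'modules:config' at ...
STAGE_RE = re.compile(r"Cloud-init v\. \S+ running '([^']+)'")

# Assumed banner names for a normal boot, in order.
EXPECTED = ["init-local", "init", "modules:config", "modules:final"]


def stages_seen(log_text):
    """Return stage names in the order their banners appear in the log."""
    return [m.group(1) for m in STAGE_RE.finditer(log_text)]


def missing_stages(log_text, expected=EXPECTED):
    """Return expected stages that have no banner in the log."""
    seen = set(stages_seen(log_text))
    return [stage for stage in expected if stage not in seen]


if __name__ == "__main__":
    with open("/var/log/cloud-init.log") as fh:
        print(missing_stages(fh.read()))
```

If 'init' shows up in the missing list while 'init-local' and 'modules:config' do not, that matches the skipped network stage discussed here.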
vila(don't get derailed but line 136 : 2018-06-04 21:31:31,963 - util.py[DEBUG]: dmi data /sys/class/dmi/id/sys_vendor returned Scaleway)21:45
vilaThere is a comment that dmi is not implemented IIRC...21:46
vila* check DMI data: not yet implemented by Scaleway, but the check is made to21:46
vila      be future-proof.21:46
blackboxswvila: what's systemctl list-dependencies | grep cloud tell you?21:48
vilahttps://paste.ubuntu.com/p/7SSDVB7ZsG/21:48
blackboxswthat's ok on the dmi read, as it was something cloud-init did to determine that it's not running on DigitalOcean.21:49
vilaha21:49
blackboxswmeh. something is causing cloud-init to skip init-network stage in that environment. (like a systemd job falling over maybe?) I see no tracebacks indicating why that is skipped. lemme see if I can dig up the format of the systemd job21:51
blackboxswdo you have a /lib/systemd/system/cloud-init.service  ?21:51
vilayes21:51
vilahttps://paste.ubuntu.com/p/JgDkpMCy5v/21:52
blackboxswbah. ok I think we need a bug here. I'll have to get a scaleway account setup to check it out. nothing should have changed w.r.t. 17.1->18.2 and the systemd startup jobs/units. but skipping init-network stage is broken and that's why things are falling over.  I'll have to get a scaleway acct setup to triage more21:55
blackboxswwhat ubuntu release was this instance?21:55
vilaxenial21:55
blackboxswbionic? xenial?21:55
blackboxswok21:55
vilablackboxsw:21:56
blackboxswwould you kindly 'ubuntu-bug cloud-init' vila and file a bug per instructions?21:56
blackboxswit'll dump your collect-logs output into a bug attachment21:56
vilablackboxsw: from inside the instance ?21:57
vila-bash: ubuntu-bug: command not found, installing21:58
blackboxswvila: yes please (if it has outbound connectivity). otherwise  you could just file a bug at https://bugs.launchpad.net/cloud-init/+filebug and attach the cloud-init.tar.gz from your latest run to the bug21:58
blackboxswall ubuntu-bug does is ask a question or two about your cloud platform and collate that data when filing output from 'sudo cloud-init collect-logs'21:59
* vila installs apport21:59
* vila thinks about giving access... should be a matter of adding an ssh key on my account ?22:00
blackboxswyeah in the nearterm your sudo cloud-init init --local; sudo cloud-init init;  sudo cloud-init modules --mode config,  sudo cloud-init modules --mode final    I *think* should get you 90% of the way there22:01
blackboxswvila: right you could add ssh-import-id chad.smith to the instance22:01
blackboxswvila: right you could run 'ssh-import-id chad.smith' to the instance then I'd be able to login as whatever user you did that under22:01
vilablackboxsw: root ! what else ? :-D22:02
blackboxswhah! but that said, I'm going to have to disappear shortly so I may not get to it until tomorrow morn my time22:02
blackboxsw<--- and file your bank acct and social security number here ;)22:02
blackboxswit may be good to have a reference bug so the others on the team can peek at the triage/response too22:03
vilahehe22:03
vilahttps://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/177508622:04
ubot5Ubuntu bug 1775086 in cloud-init (Ubuntu) "cloud-init fails to recognize scaleway" [Undecided,New]22:04
blackboxswthanks again. vila I have to bail for a while. will check it out22:06
vilablackboxsw: thanks to you, at least I'm not mad and something is going on that is worth fixing ;-)22:07
vilablackboxsw: if only for /var/lib/cloud/instance being a dir...22:07
blackboxswthx vila. on the not being a dir issue I'll track a separate bug on 'cloud-init collect-logs' cmd being more resilient to failure cases22:19
blackboxswadded that content to https://bugs.launchpad.net/cloud-init/+bug/177507422:22
ubot5Ubuntu bug 1775074 in cloud-init "collect logs: grab /var/lib/cloud data files" [Medium,Confirmed]22:22
blackboxswwill try to kill 2 birds with 1 stone there22:22
blackboxswvila: comment for you on your bug. ok systemd has removed the cloud-init.service job for some reason and I need to dig into why22:31
vilahaaaa22:44
