[16:19] <smoser> blackboxsw: if you had a minute
[16:20] <smoser>   - bug 1770712 fixes for ubuntu package branches.
[16:20] <smoser>     devel https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380
[16:20] <smoser>     bionic https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347381
[16:20] <smoser>     artful https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347382
[16:20] <smoser>     xenial https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347384
[16:20] <smoser> that'd be good to land into cosmic today... just get those all in and an upload would be good.
[16:23] <blackboxsw> +1
[16:23] <blackboxsw> will do
[18:05] <blackboxsw> smoser: we're good on the devel portion of those branches, because PACKAGED_VERSION exists in cloudinit/version.py. in bionic and older branches we haven't yet pulled back 5446c788160412189200c6cc688b14c9f9071943
[18:05] <blackboxsw> shouldn't we pull that back too ?
[18:05] <blackboxsw> I realize the packaging change doesn't break anything currently, but it also doesn't do anything yet
[18:08] <smoser> blackboxsw: well, the next new-upstream-snapshot will get it
[18:08] <smoser> so at this point it is just "staged".
[18:08] <smoser> you're correct though in that it basically adds dead code.
[18:09] <smoser> (the daily builds *would* have it )
[18:09] <blackboxsw> ok, just wanted to make sure this was intended. They are kind of decoupled from each other, so I wanted to confirm that we are staging it and know that we are not yet expecting to report the full package version number in <= Bionic. OK, I'm good.
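The "staged" change can be pictured as a placeholder that the package build substitutes. The sketch below is hypothetical: the names `__VERSION__`, `_PACKAGED_VERSION` and `version_string()` are assumptions, not a quote of cloudinit/version.py. It illustrates why, until both halves (the upstream placeholder and the packaging substitution) are present on a branch, only the plain upstream version gets reported.

```python
# Hypothetical sketch of the PACKAGED_VERSION idea; the names below are
# illustrative assumptions, not necessarily cloud-init's actual code.
__VERSION__ = "18.2"

# Assumption: the packaging step rewrites this placeholder with the full
# package version at build time. If never rewritten, it still starts "@@".
_PACKAGED_VERSION = "@@PACKAGED_VERSION@@"


def version_string(packaged=_PACKAGED_VERSION, upstream=__VERSION__):
    """Report the packaged version if it was substituted, else upstream."""
    if packaged.startswith("@@"):
        # Placeholder untouched: nothing more specific to report.
        return upstream
    return packaged


if __name__ == "__main__":
    print(version_string())  # placeholder left as-is
    print(version_string("18.2-0ubuntu1~example"))  # hypothetical value
```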
[18:12] <blackboxsw> https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/347380 with nit on UNRELEASED -> cosmic
[18:12] <blackboxsw> going through the rest now
[18:12] <blackboxsw> to approve
[18:12] <smoser> blackboxsw: if i uploaded to cosmic i think i'd just do a new-upstream-snapshot
[18:13] <smoser> which would then dtrt (do the right thing)
[18:13] <blackboxsw> +1 good deal
[18:13] <smoser> so same there... we're basically just "staging" a change. i think that is generally the flow we'd have on all changes to the packaging branches.
[18:15] <blackboxsw> smoser: want me to queue a new-upstream-snapshot for cosmic then?
[18:15] <blackboxsw> and you can merge in your existing branches?
[18:15] <smoser> you can if you'd like. or i can just do it.
[18:15] <smoser> i'll pull existing.
[18:16] <smoser> merged devel
[18:17] <blackboxsw> ok putting up MP
[18:17] <smoser> i'll grab the others too
[18:18] <smoser> ok. all packaging branches have it now.
[18:21] <blackboxsw> smoser: testing this now https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/347396
[18:21] <blackboxsw> I didn't push the tag
[18:23] <blackboxsw> ok package built and proper _package_version is showing up
[18:26] <smoser> blackboxsw: doing 'build and push' so that will land in cosmic-proposed shortly
[19:06] <blackboxsw> rharper: have 5 mins to chat about https://trello.com/c/Uk7OA71K/798-cloud-provided-network-configuration-openstack-azure-aws ?
[19:06] <blackboxsw> specifically the azure portion
[19:34] <rharper> blackboxsw: here now;  still need to chat?
[19:43] <blackboxsw> cool rharper yeah just for a couple mins
[19:43] <blackboxsw> *-mtj
[19:43] <rharper> ok, lemme get setup
[19:44] <blackboxsw> headphone trouble here. coming
[19:56] <blackboxsw> rharper: lost you at 'wouldn't really'
[20:33] <vila> hi there,
[20:34] <vila> I'm encountering an issue that is hard to debug :-/
[20:35] <vila> A few months ago I installed cloud-init on Scaleway images (cloud-init was at version 17.1 then).
[20:35] <vila> Booting from these images worked fine.
[20:36] <blackboxsw> https://bugs.launchpad.net/cloud-init/+bug/1775074 filed.
[20:36] <blackboxsw> hi vila, yeah just explain the prob as best you can, maybe someone can help
[20:36] <vila> I'm now using the exact same scripts to install cloud-init (18.2) on more recent images and things break:
[20:37] <vila> 2018-06-04 19:37:35,378 - stages.py[DEBUG]: cache invalid in datasource: DataSourceScaleway
[20:37] <vila> 2018-06-04 19:37:35,378 - handlers.py[DEBUG]: finish: init-local/check-cache: SUCCESS: cache invalid in datasource: DataSourceScaleway
[20:38] <vila> on top of that, the run then finishes by creating /var/lib/cloud/instance/boot-finished at a time when /var/lib/cloud/instance (i.e. the symlink) does not exist, so a directory is created instead
[20:39] <vila> further runs of cloud-init then fail because they can't delete the directory (a symlink is expected)
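A quick way to tell the healthy layout from the broken one described here is to check whether /var/lib/cloud/instance is a symlink or a real directory. This is a minimal diagnostic sketch; the function name is hypothetical, and it only inspects the path, it does not repair anything.

```python
import os
import tempfile


def describe_instance_path(path="/var/lib/cloud/instance"):
    """Classify the instance path; cloud-init expects a symlink here."""
    if os.path.islink(path):
        # Healthy layout: a link into /var/lib/cloud/instances/<id>.
        return "symlink -> " + os.readlink(path)
    if os.path.isdir(path):
        # Broken layout from the log: a real directory created in place of
        # the link, which later runs then fail to replace.
        return "directory (unexpected)"
    if os.path.exists(path):
        return "other file type (unexpected)"
    return "missing"


if __name__ == "__main__":
    # Demonstrate both layouts in a throwaway directory.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "instances", "example-id")
        os.makedirs(target)
        link = os.path.join(root, "instance")
        os.symlink(target, link)
        print(describe_instance_path(link))
        broken = os.path.join(root, "instance-as-dir")
        os.mkdir(broken)
        print(describe_instance_path(broken))
```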
[20:41] <vila> any hints on how such issues can be debugged highly appreciated
[20:42] <blackboxsw> vila: hrm, looking. That specific "cache invalid" log message means that the datasource will re-run metadata collection because the instance cache appeared to be invalid (and needed a refresh).
[20:42] <vila> blackboxsw: Where and how is the cache said to be invalid ?
[20:42] <vila> blackboxsw: I was able to unpickle it from python
[20:43] <blackboxsw> specifically in /usr/lib/python3/dist-packages/cloudinit/stages.py
[20:43] <blackboxsw> it checks datasource.check_instance_id
[20:43] <vila> yeah, opened already
[20:44] <blackboxsw> in Scaleway it looks like that always returns False *I think*
[20:44] <blackboxsw> which means always re-run get-data
[20:44] <vila> when I unpickled it, I checked hasattr(ds, 'check_instance_id')
[20:44] <vila> but I was unclear about inferring self.cfg
[20:45] <vila> I guessed it was the instance-id and as far as I could see it was the same instance (I did a reboot)
[20:45] <vila> blackboxsw: I have a vague feeling it may be related to /var/lib/cloud/instance being deleted at that point, but I don't know where to look for that
[20:45] <blackboxsw> from the base class cloudinit/sources/__init__.py:DataSource.check_instance_id  is just a dummy function returning False and I don't see that Scaleway is overriding that
[20:46] <vila> nope indeed
[20:46] <vila> and it used to work
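The flow described above (stages.py consults datasource.check_instance_id, and the base class always answers False) can be modelled in a few lines. This is a simplified, hypothetical model, not the code in cloudinit/stages.py: the class and function names are stand-ins, and the idea that an instance-id recorded earlier in the same boot is consulted first (and is absent after a reboot, hence `None`) is an assumption. It reproduces the "cache invalid in datasource" outcome seen in the log.

```python
class DataSource:
    """Stand-in for the base datasource class (hypothetical, simplified)."""

    def __init__(self, instance_id):
        self.instance_id = instance_id

    def get_instance_id(self):
        return self.instance_id

    def check_instance_id(self, sys_cfg):
        # As described in the chat: the base class always returns False,
        # so it never vouches for a cached copy of itself.
        return False


class DataSourceScaleway(DataSource):
    """Does not override check_instance_id, per the discussion above."""


def restore_from_checked_cache(cached_ds, run_instance_id, sys_cfg):
    """Return (datasource or None, reason), echoing the log messages."""
    if cached_ds is None:
        return None, "no cache found"
    # Assumption: an instance-id recorded earlier in the same boot lets the
    # cache be trusted directly; after a reboot no such record exists.
    if (run_instance_id is not None
            and run_instance_id == cached_ds.get_instance_id()):
        return cached_ds, "restored from cache with run check"
    if cached_ds.check_instance_id(sys_cfg):
        return cached_ds, "restored from checked cache"
    name = type(cached_ds).__name__
    return None, "cache invalid in datasource: %s" % name


if __name__ == "__main__":
    cached = DataSourceScaleway("example-instance-id")
    # After a reboot there is no same-boot record, so None is passed.
    print(restore_from_checked_cache(cached, None, {}))
```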
[20:46] <vila> let's rewind a bit maybe
[20:46] <vila> starting from a booted image, I run apt-get install cloud-init --no-install-recommends
[20:47] <blackboxsw> please do.
[20:47] <vila> is there anything I need to do for c-i to behave at the next boot (same instance or a different one)?
[20:47] <blackboxsw> A way to test a clean boot scenario with cloud-init: 'sudo cloud-init clean --logs --reboot' would perform a 'greenfield' install as if the system had never run cloud-init before
[20:47] <vila> \o/
[20:47] <blackboxsw> it's what we use for upgrade testing and fresh boot validation
[20:48] <blackboxsw> that will blow away /var/log/cloud* /var/lib/cloud/* with the exception of a /var/lib/cloud/seed  subdir if applicable (as that seeds some metadata on some clouds)
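The effect of `cloud-init clean` on /var/lib/cloud, as described here, is roughly "delete every entry except seed". Below is a hypothetical sketch of that behaviour against an arbitrary directory, so it can be tried safely in a temp dir. It is not cloud-init's implementation: it skips dotfiles because of the glob pattern, and it leaves out log removal and the reboot.

```python
import glob
import os
import shutil
import tempfile


def clean_cloud_dir(cloud_dir="/var/lib/cloud", keep=("seed",)):
    """Delete every entry in cloud_dir except names listed in `keep`.

    Note: the "*" glob skips dotfiles; a real tool would handle those too.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(cloud_dir, "*"))):
        if os.path.basename(path) in keep:
            continue  # seed may hold metadata some clouds rely on
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # real directories, e.g. instances/
        else:
            os.unlink(path)  # files and symlinks, e.g. the instance link
        removed.append(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "seed", "nocloud"))
        os.makedirs(os.path.join(root, "instances", "example-id"))
        os.symlink(os.path.join(root, "instances", "example-id"),
                   os.path.join(root, "instance"))
        clean_cloud_dir(root)
        print(sorted(os.listdir(root)))
```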
[20:49] <vila> blackboxsw: done
[20:49] <vila> cloud-init analyze show
[20:49] <blackboxsw> http://cloudinit.readthedocs.io/en/latest/topics/capabilities.html#cloud-init-collect-logs for more details
[20:50] <blackboxsw> yeah analyze show is good for quick inspection of what cloud-init performed.
[20:50] <vila> says no cache found but fails to find the scaleway ds
[20:51] <vila> and cloud-init.log shows the scaleway datasource is not seen nor used
[20:51] <blackboxsw> cloud-init status --long?
[20:52] <vila> detail:
[20:52] <vila> ('ssh-authkey-fingerprints', KeyError('getpwnam(): name not found: ubuntu',))
[20:52] <vila> but that's a fallout from not using the ds and not finding the user-data
[20:52] <blackboxsw> hrm, ok so we have a couple errors looks like
[20:52] <blackboxsw> right
[20:52] <blackboxsw> hmm
[20:52] <vila> so, red herring
[20:52] <vila> why is the datasource missed?
[20:53] <vila> I have datasource_list: [ Scaleway, None]
[20:53] <vila> and disable_root: false
[20:53] <vila> (right, forgot to mention that I added the latter because the Scaleway default login is root)
[20:53] <vila> which was a first hint that things behave differently
[20:54] <blackboxsw> that's good at least. mind doing a 'sudo cloud-init collect-logs' and sending an email to chad.smith@canonical.com. I can glance at it quickly here
[20:54] <blackboxsw> collect-logs will dump cloud-init.tar.gz in your cwd
[20:54] <blackboxsw> it'll contain all logs and potentially your user-data
[20:55] <vila> no worries, nothing secret there
[20:55] <blackboxsw> good deal
[20:59] <vila> sent
[20:59] <blackboxsw> checking thanks
[21:01] <vila> I've tried various workflows giving different results. For example, after installing, running 'systemctl start cloud-final && cloud-init status --wait' finds the datasource and processes properly, but the next boot fails
[21:02] <vila> right now, for the logs I sent, I have a broken /var/lib/cloud/instance (a dir rather than a symlink)
[21:02] <blackboxsw> vila: hrm, normally in cloud-init logs I am accustomed to seeing the init-local stage, then init, then modules:config, but your logs skip the 'init' stage
[21:03] <blackboxsw> can you 'cloud-init analyze show | pastebinit'
[21:04] <vila> https://paste.ubuntu.com/p/ZwG4BnnJRd/
[21:04] <blackboxsw> normally I'd see a 'Starting stage: init-network' after init-local and before modules-config
[21:04] <blackboxsw> hrm
[21:05] <blackboxsw> can you cat /etc/cloud/cloud.cfg | pastebinit
[21:05] <vila> https://paste.ubuntu.com/p/Gb6Mzxms9q/ <- that one worked
[21:05] <vila> like ~2 hours ago
[21:06] <vila> blackboxsw: /etc/cloud/cloud.cfg is untouched, but I add:
[21:06] <vila> cat <<EOC > /etc/cloud/cloud.cfg.d/99_byov.cfg
[21:06] <vila> # Generated by byov at $(date)
[21:06] <vila> datasource_list: [ Scaleway, None]
[21:06] <vila> apt_preserve_sources_list: true
[21:06] <vila> disable_root: false
[21:06] <vila> EOC
[21:07] <vila> https://paste.ubuntu.com/p/zVQDMtmG65/ <-  cat /etc/cloud/cloud.cfg
[21:08] <vila> but yeah, it's the missing init-network that I'm tracking indeed
[21:09] <blackboxsw> interesting. so your cloud-init.log mentions 2018-06-04 20:48:32,448 - __init__.py[DEBUG]: Searching for local data source in: []
[21:09] <blackboxsw> that list should have represented Scaleway in it
[21:09] <blackboxsw> something is modifying that datasource list
[21:09] <vila> exactly, sometimes it's there sometimes it's not
[21:09] <blackboxsw> any other files in /etc/cloud/cloud.cfg.d?
[21:10] <vila> and it seems the invalid cache somehow marks the datasource entirely wrong
[21:10] <blackboxsw> like /etc/cloud/cloud.cfg.d/90_dpkg.cfg ?
[21:10] <vila> blackboxsw: nope, I used to have datasource_list: [ NoCloud, OpenStack, Scaleway, None] when that was working
[21:10] <vila> right, that one is overridden... oh, let me check
[21:10] <vila> nope, standard content:
[21:11] <vila> # to update this file, run dpkg-reconfigure cloud-init
[21:11] <vila> datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudStack, None ]
[21:11] <vila> and scaleway is there
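Both files quoted here (90_dpkg.cfg and 99_byov.cfg) carry a datasource_list that includes Scaleway, so whichever one wins, Scaleway stays in the list. A hypothetical sketch of the precedence, assuming cloud.cfg.d files are applied in lexical order with later files replacing top-level keys (cloud-init's real merging is more involved, and YAML parsing is omitted to keep the sketch dependency-free):

```python
def effective_config(files):
    """Merge parsed cloud.cfg.d files; later names (lexically) win.

    `files` maps a file name to its already-parsed top-level dict.
    Simplification: top-level keys are replaced wholesale.
    """
    merged = {}
    for name in sorted(files):
        if name.endswith(".cfg"):
            merged.update(files[name])
    return merged


if __name__ == "__main__":
    files = {
        # Shortened version of the 90_dpkg.cfg list quoted above.
        "90_dpkg.cfg": {"datasource_list": ["NoCloud", "Scaleway", "None"]},
        "99_byov.cfg": {"datasource_list": ["Scaleway", "None"],
                        "disable_root": False},
    }
    print(effective_config(files)["datasource_list"])
```

Either way the merged list still names Scaleway, which fits the conclusion that the empty "Searching for local data source in: []" list comes from somewhere other than the config files.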
[21:11] <blackboxsw> ok that's good. yeah and your ds-identify.log in the cloud-init.tar.gz also shows ds-identify properly detected Scaleway as an option
[21:12] <vila> Also, I could NOT find when ds-identify is run, but I noticed it's run more than once in some scenarios
[21:13] <vila> so, I keep 'cloud-init clean --logs --reboot' in my notes for the future, but it failed here
[21:14] <vila> which reproduces my issue, so it's still a good recipe, but it doesn't give the result you expected I think
[21:16] <blackboxsw> vila: on the failed case, 'systemctl list-dependencies | grep cloud' this is what I see http://paste.ubuntu.com/p/TQM8RWbwkP/
[21:17] <blackboxsw> I'd expect a cloud-init.service job/unit listed in systemd. It's what runs 'cloud-init init', which is the network stage that we are missing in your failed case
[21:17] <blackboxsw> not sure if I'm going down a rat hole there
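The check being done by eye here can be scripted: feed the output of `systemctl list-dependencies` to a small parser and report which stage units are absent. The unit names are the usual ones for the four boot stages but should be treated as an assumption for a given release; `missing_units` is a hypothetical helper that only strips the tree-drawing characters systemctl prints.

```python
# Usual unit names for the four boot stages (assumed for this release).
EXPECTED_UNITS = (
    "cloud-init-local.service",  # init-local
    "cloud-init.service",        # init-network ('cloud-init init')
    "cloud-config.service",      # modules --mode config
    "cloud-final.service",       # modules --mode final
)

TREE_CHARS = "●│├└─ "


def missing_units(list_dependencies_output, expected=EXPECTED_UNITS):
    """Return expected units absent from `systemctl list-dependencies`."""
    seen = set()
    for line in list_dependencies_output.splitlines():
        parts = line.split()
        if parts:
            # e.g. "● ├─cloud-init.service" -> "cloud-init.service"
            seen.add(parts[-1].lstrip(TREE_CHARS))
    return [unit for unit in expected if unit not in seen]


if __name__ == "__main__":
    sample = "\n".join([
        "● ├─cloud-init-local.service",
        "● ├─cloud-config.service",
        "● ├─cloud-final.service",
    ])
    print(missing_units(sample))
```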
[21:18] <vila> blackboxsw: right, I can rebuild the instance without installing cloud-init and restart from there maybe?
[21:19] <vila> blackboxsw: once cloud-init is installed, I'm running https://paste.ubuntu.com/p/xtGNYpfzZT/
[21:19] <blackboxsw> sounds like a good plan, installing the new cloud-init deb in your environment after the fact should take care of creating the right systemd generators to queue cloud-init stages during boot (if something got mangled across the upgrade path)
[21:20] <blackboxsw> vila: running 'cloud-init init --local' and 'cloud-init init' 'naked', outside of the standard boot process on an instance, is not exactly what we intended (and could be rife with error conditions)
[21:20] <vila> blackboxsw: I used to run nothing and all was rosy ;-)
[21:20] <blackboxsw> yeah, doing nothing is what we hope is always rosy (and intended). Just booting normally should take care of ordering all cloud-init stages appropriately (including module configuration etc).
[21:20] <vila> blackboxsw: what *is* the intended workflow ? install, save image, boot ?
[21:21] <blackboxsw> yes vila, boot clean image, install cloud-init, power off, copy clean image, let cloud-init boot in user-configured environment to collect and config based on metadata/user-data
[21:21] <blackboxsw> trying to look more at your latest paste
[21:22] <vila> damn it, that's what I did, and I thought maybe I missed a step
[21:27] <blackboxsw> vila, yeah something smells funky (I don't have a Scaleway acct unfortunately). I'll try to bisect the diffs on the Scaleway datasource from 17.1 -> 18.2. I didn't think we had anything significant in that upgrade path other than some exception handling changes on url retries in that space
[21:28] <vila> blackboxsw: yup, went there, saw that, couldn't find a link either (but I'm not the expert ;-)
[21:29] <blackboxsw> I'd like to see a /var/log/cloud-init.log in the case where cloud-init was upgraded and only a reboot run. (not manual run of cloud-init init --local and 'cloud-init init').
[21:29] <vila> just got the instance without cloud-init installed
[21:29] <vila> so, apt-get install cloud-init --no-install-recommends
[21:32] <vila> reboot
[21:33] <vila> https://paste.ubuntu.com/p/HDhH8hWpmd/
[21:34] <vila> yet https://paste.ubuntu.com/p/z6wXtF6Jtx/
[21:35] <vila> blackboxsw: you said "in the case where cloud-init was upgraded" s/upgraded/installed/ otherwise, nothing from my script
[21:38] <blackboxsw> ok so Scaleway is ordered before Ec2, and Ec2 is considered 'maybe'; that shouldn't break anything specifically for Scaleway's datasource. reading your cloud-init.log
[21:38] <blackboxsw> 2018-06-04 21:31:31,651 - stages.py[DEBUG]: no cache found
[21:39] <vila> right, so the cache itself is not the root cause, well done
[21:40] <blackboxsw> line 78 of your first paste is showing us we are properly attempting to discover Scaleway (and many other datasources) in python (instead of ds-identify, which is just a shell script for speed)
[21:40] <vila> and https://paste.ubuntu.com/p/PJ2C5tQsSJ/ should cover all the datasource inputs
[21:41] <blackboxsw> ohh wait
[21:41] <vila> right
[21:41] <blackboxsw> no Scaleway in line 78
[21:41] <vila> while still there in line 77
[21:42]  * vila thinks
[21:42] <vila> could it be that /var/run/scaleway is created too late (aka race ?)
[21:42] <blackboxsw> I had thought Scaleway datasource was defined as FILESYSTEM only. checking the DataSourceScaleway.py again
[21:42] <blackboxsw> my bad
[21:42] <blackboxsw>     (DataSourceScaleway, (sources.DEP_FILESYSTEM, sources.DEP_NETWORK)),
[21:43] <blackboxsw> that means the Scaleway datasource is detected only in the init-network stage ... ok so we expect it to be filtered out of the init-local stage
[21:43] <blackboxsw> ok so we're still good in init-local stage (not detecting scaleway)
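The filtering that leaves the init-local search list empty can be sketched as an exact match between a datasource's dependency set and the stage's set. This is a hypothetical model: the dependency constants are plain strings, `LocalOnlyExample` is an invented entry included only for contrast, and the None fallback is left out for simplicity. With `datasource_list: [ Scaleway, None]`, nothing survives in init-local (matching the "Searching for local data source in: []" line), and Scaleway can only be picked up in init-network, which is why skipping that stage breaks the boot.

```python
DEP_FILESYSTEM = "FILESYSTEM"
DEP_NETWORK = "NETWORK"

# Hypothetical registry of (name, dependencies). The Scaleway entry mirrors
# the tuple quoted above; LocalOnlyExample is invented for contrast.
REGISTRY = [
    ("Scaleway", (DEP_FILESYSTEM, DEP_NETWORK)),
    ("LocalOnlyExample", (DEP_FILESYSTEM,)),
]

STAGE_DEPS = {
    "init-local": (DEP_FILESYSTEM,),
    "init-network": (DEP_FILESYSTEM, DEP_NETWORK),
}


def sources_for_stage(stage, datasource_list, registry=REGISTRY):
    """Configured names whose deps exactly match the stage's deps."""
    wanted = set(STAGE_DEPS[stage])
    deps_by_name = dict(registry)
    # Keep the configured order, as the datasource_list defines priority.
    return [name for name in datasource_list
            if name in deps_by_name and set(deps_by_name[name]) == wanted]


if __name__ == "__main__":
    configured = ["Scaleway", "None"]
    print(sources_for_stage("init-local", configured))
    print(sources_for_stage("init-network", configured))
```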
[21:44] <blackboxsw> but init-network (otherwise called via CLI as 'cloud-init init') should not be skipped;
[21:44] <blackboxsw> that's what should have detected scaleway....
[21:44] <blackboxsw> reading down past init-local in your cloud-init log now. sorry for the noise
[21:45] <vila> no no ! very helpful
[21:45] <vila> (and entertaining ;)
[21:45] <blackboxsw> 2018-06-04 21:31:32,137 - handlers.py[DEBUG]: finish: init-local: SUCCESS: searching for local datasources
[21:45] <blackboxsw> 2018-06-04 21:31:34,280 - util.py[DEBUG]: Cloud-init v. 18.2 running 'modules:config' at Mon, 04 Jun 2018 21:31:34 +0000. Up 16.05 seconds.
[21:45] <blackboxsw> heh
[21:45] <vila> (don't get derailed but line 136 : 2018-06-04 21:31:31,963 - util.py[DEBUG]: dmi data /sys/class/dmi/id/sys_vendor returned Scaleway)
[21:46] <vila> There is a comment that dmi is not implemented IIRC...
[21:46] <vila> * check DMI data: not yet implemented by Scaleway, but the check is made to
[21:46] <vila>       be future-proof.
[21:48] <blackboxsw> vila: what does 'systemctl list-dependencies | grep cloud' tell you?
[21:48] <vila> https://paste.ubuntu.com/p/7SSDVB7ZsG/
[21:49] <blackboxsw> that's ok on the dmi read, as it was something cloud-init did to determine that it's not running on DigitalOcean.
[21:49] <vila> ha
[21:51] <blackboxsw> meh. something is causing cloud-init to skip the init-network stage in that environment (like a systemd job falling over maybe?). I see no tracebacks indicating why it is skipped. lemme see if I can dig up the format of the systemd job
[21:51] <blackboxsw> do you have a /lib/systemd/system/cloud-init.service  ?
[21:51] <vila> yes
[21:52] <vila> https://paste.ubuntu.com/p/JgDkpMCy5v/
[21:55] <blackboxsw> bah. ok I think we need a bug here. nothing should have changed w.r.t. 17.1->18.2 and the systemd startup jobs/units, but the init-network stage is being skipped and that's why things are falling over. I'll have to get a Scaleway acct set up to triage more
[21:55] <blackboxsw> what ubuntu release was this instance?
[21:55] <vila> xenial
[21:55] <blackboxsw> bionic? xenial?
[21:55] <blackboxsw> ok
[21:56] <vila> blackboxsw:
[21:56] <blackboxsw> would you kindly 'ubuntu-bug cloud-init' vila and file a bug per instructions?
[21:56] <blackboxsw> it'll dump your collect-logs output into a bug attachment
[21:57] <vila> blackboxsw: from inside the instance ?
[21:58] <vila> -bash: ubuntu-bug: command not found, installing
[21:58] <blackboxsw> vila: yes please (if it has outbound connectivity). otherwise you could just file a bug at https://bugs.launchpad.net/cloud-init/+filebug and attach the cloud-init.tar.gz from your latest run to the bug
[21:59] <blackboxsw> all ubuntu-bug does is ask a question or two about your cloud platform and collate that data when filing output from 'sudo cloud-init collect-logs'
[21:59]  * vila installs apport
[22:00]  * vila thinks about giving access... should be a matter of adding an ssh key on my account ?
[22:01] <blackboxsw> yeah, in the near term your 'sudo cloud-init init --local; sudo cloud-init init; sudo cloud-init modules --mode config; sudo cloud-init modules --mode final' I *think* should get you 90% of the way there
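As a stopgap only (a normal boot is the intended path, as noted earlier in the log), the four stages can be run in the order given in that message. The helper below is hypothetical: it returns the commands without running them in dry-run mode, and otherwise runs them one by one, stopping at the first non-zero exit status. The `runner` parameter exists only so the logic can be exercised without root.

```python
import subprocess

# Stage commands in boot order, as listed in the chat message above.
STAGES = [
    ["cloud-init", "init", "--local"],
    ["cloud-init", "init"],
    ["cloud-init", "modules", "--mode", "config"],
    ["cloud-init", "modules", "--mode", "final"],
]


def run_stages(dry_run=True, runner=subprocess.run):
    """Run each stage in order; stop at the first non-zero exit status."""
    executed = []
    for cmd in STAGES:
        full = ["sudo"] + cmd
        executed.append(" ".join(full))
        if dry_run:
            continue  # only report what would run
        result = runner(full)
        if result.returncode != 0:
            break  # later stages depend on earlier ones succeeding
    return executed


if __name__ == "__main__":
    for command in run_stages(dry_run=True):
        print(command)
```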
[22:01] <blackboxsw> vila: right, you could run 'ssh-import-id chad.smith' on the instance, then I'd be able to log in as whatever user you did that under
[22:02] <vila> blackboxsw: root ! what else ? :-D
[22:02] <blackboxsw> hah! but that said, I'm going to have to disappear shortly so I may not get to it until tomorrow morn my time
[22:02] <blackboxsw> <--- and file your bank acct and social security number here ;)
[22:03] <blackboxsw> it may be good to have a reference bug so the others on the team can peek at the triage/response too
[22:03] <vila> hehe
[22:04] <vila> https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1775086
[22:06] <blackboxsw> thanks again. vila I have to bail for a while. will check it out
[22:07] <vila> blackboxsw: thanks to you, at least I'm not mad and something is going on that is worth fixing ;-)
[22:07] <vila> blackboxsw: if only for /var/lib/cloud/instance being a dir...
[22:19] <blackboxsw> thx vila. on the not-being-a-dir issue, I'll track a separate bug on the 'cloud-init collect-logs' cmd being more resilient to failure cases
[22:22] <blackboxsw> added that content to https://bugs.launchpad.net/cloud-init/+bug/1775074
[22:22] <blackboxsw> will try to kill 2 birds with 1 stone there
[22:31] <blackboxsw> vila: comment for you on your bug. ok systemd has removed the cloud-init.service job for some reason and I need to dig into why
[22:44] <vila> haaaa