[07:49] blackboxsw: Odd_Bloke: Nice! I'll review the SRU for what I can today (enabling -proposed)
[14:54] what is the correct way to pass a fqdn via user data?
[14:56] nvm found it
[15:05] Nick_A, what was it?
[15:10] blackboxsw: it seems like one of my colleagues found a problem with our new datasource, but we can't really understand exactly what is happening. runcmd: in the #cloud-config doesn't seem to be picked up. Current hypothesis is that our merge of set_passwords to run always is triggering a version of https://bugs.launchpad.net/cloud-init/+bug/1532234 (list of cloud_config_modules is *overwritten*, not merged). Could you help confirm?
[15:10] Launchpad bug 1532234 in cloud-init "Merging with data in /etc/cloud/cloud.cfg does not work as expected" [High,Confirmed]
[15:14] marlinc just fqdn: ____
[15:14] It's not necessary though - default hostname sets it as a fqdn if specified that way
[16:13] how are boot records "split" for cloud-init? looking through cloud-init analyze i see that a machine i just provisioned and booted up has 3 different boot records
[16:13] how did cloud-init determine to have multiple boot records in this case?
[16:44] chillysurfer: boot records are based on each time cloud-init sees an init-local 'start' log in /var/log/cloud-init.log. That log line has the format:
[16:45] Cloud-init v. running 'init-local'
[16:45] so analyze counts the number of those logs in your log file as they are emitted each boot
[16:46] blackboxsw: ahh ok i see
[16:46] thanks for the explanation!
[16:47] blackboxsw: so if you run `cloud-init init --local` manually though it'll create another boot record even though there was no reboot?
[16:48] yeah it's all handled in either cloudinit/analyze/dump.py:parse_ci_logline which creates event json objects and cloudinit/analyze/__main__.py:analyze_show which parses those parsed log events by type/name and formats the boot output messages
[16:48] chillysurfer: true. that would emit that log line and analyze would then lie I believe
[16:48] s/lie/be wrong
[16:49] blackboxsw: got it makes total sense! thanks!
[16:49] chillysurfer: just validated that behavior
[16:50] blackboxsw: I think the use of mergemanydict is wrong :/ Instead, we should look for the particular key we are interested in ("cloud_config_modules"), and merge the list by hand (if we find that key, look for set-passwords; if you find it, replace it with the two-element list ["set-passwords", "always"])
[16:51] tribaal: I assume you are looking at the -proposed SRU content for Exoscale :/
[16:51] blackboxsw: correct. Well, I found the problem in Eoan...
[16:51] tribaal: this is *good*, as we can try to resolve that on Eoan quickly and get that into the current SRU
[16:52] but I'm afraid I don't know enough about the internals of cloud-init there
[16:52] which just started friday
[16:52] so no big 'loss' of test/devel time. I was just starting to test clouds today
[16:52] tribaal: if you could post me access to an instance (ssh-import-id chad.smith) I can ssh into it and add rharper and we can poke around
[16:53] blackboxsw: sure thing
[16:54] blackboxsw: I created this instance with the following userdata: https://gist.github.com/chrisglass/fb0cf860be8cf01f456dfff8e162e004
[16:54] tribaal: can you also file a bug against cloud-init (as we'll need one to get it fixed in the SRU/Eoan)
[16:54] blackboxsw: ack
[16:54] blackboxsw: actually let me file the bug first and link all the relevant stuff there
[16:54] that'd be great tribaal thx
=== blackboxsw changed the topic of #cloud-init to: Reviews: http://bit.ly/ci-reviews | Meeting minutes: https://goo.gl/mrHdaj | Next status meeting Sept 02 16:15 UTC | cloud-init v 19.2 (07/17) | https://bugs.launchpad.net/cloud-init/+filebug
[17:02] blackboxsw: https://bugs.launchpad.net/cloud-init/+bug/1841454
[17:02] Launchpad bug 1841454 in cloud-init "Exoscale datasource overwrites *all* cloud_config_modules" [Undecided,New]
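The boot-record counting described at 16:44 can be sketched roughly as below. This is not the code in cloudinit/analyze: the function name count_boot_records and the sample log lines are made up for illustration, and the only thing assumed about the real log is that each init-local start emits a line containing running 'init-local'.

```python
def count_boot_records(log_text):
    """Count lines that look like an init-local start marker.

    Hypothetical helper: it only mimics the idea that one boot record is
    reported per init-local start line found in the log.
    """
    marker = "running 'init-local'"
    return sum(1 for line in log_text.splitlines() if marker in line)


# Illustrative log content only; real cloud-init.log lines carry more detail.
sample_log = "\n".join([
    "Cloud-init v. 19.2 running 'init-local' at boot one",
    "Cloud-init v. 19.2 running 'init' at boot one",
    "Cloud-init v. 19.2 running 'init-local' at boot two",
    "Cloud-init v. 19.2 running 'modules:config' at boot two",
    # A manual `cloud-init init --local` run emits the same kind of line,
    # so it is counted as a boot even though no reboot happened.
    "Cloud-init v. 19.2 running 'init-local' from a manual run",
])

boot_records = count_boot_records(sample_log)
```

With this sample the count is 3 although only two boots took place, which is the "analyze would be wrong" case confirmed at 16:49.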
[17:05] blackboxsw: I imported your pubkeys in "ssh ubuntu@159.100.241.237"
[17:05] blackboxsw: it's a test machine on our preprod, so you can break anything you want
[17:05] bonus points if you manage to break anything more than the instance itself :)
[17:08] haha! thanks, tribaal ok, yeah something going on with datasource config merging order in stages.py:_read_cfg. I'll refresh on why that merge is being overridden instead of merged there.
[17:09] tribaal: as Azure does the same type of thing :/
[17:13] ahh your builtin is setting all of cloud_config_modules to your list. as you were supposing, we need to only augment the ds config on disk with your defaults. I'll work up something.
[17:22] Hi! I ran into a weird problem / edge-case, where ds-identify does or does not correctly detect a NoCloud datasource, depending on which version of util-linux (blkid) is installed on the system. Could somebody spare a few minutes to have a look and decide if this should be fixed in cloud-init or in the third party software I am using (xen orchestra)? I described my findings in more detail here: https://github.com/vatesfr/xen-orchestra/issues/4449 Thanks!
[18:03] flipsa: thanks for reporting it; can you run a cloud-init collect-logs in the failing case? and ideally open a bug in launchpad? I'd like to see what udevadm info --query=all /sys/class/block/xvdXX shows so we can see what sorts of properties were on the device;
[18:45] tribaal: thanks for access to the system. I have a fix for #1841454. I can get it to run the modules each boot now https://pastebin.ubuntu.com/p/wTtf9JYDHs/
[18:45] tribaal: rharper I'll put up a branch for this fix shortly (SRU-regression/ Eoan bug)
[18:46] nice
[18:46] rharper: also openstack v2 related. fixing idempotent normalize_route works wonders for fixing unit test issues
[18:46] \o/
[18:47] I thought it might; just prevents mutation in paths which call it multiple times
[18:47] https://paste.ubuntu.com/p/kgxN8qJfcM/
[18:47] I'd like an eye-catcher though so we can see where we're getting multipath passes through
[18:47] oh
[18:47] nasty
[18:47] yeah I think this mutation only happened when prefix was /0 for default routes
[18:48] needed a None check instead
[18:48] indeed
[18:48] +1 on tracking the multiple callers for normalization as well will add some debug
[19:11] blackboxsw: \o/
[19:27] rharper: https://bugs.launchpad.net/cloud-init/+bug/1841466
[19:27] Launchpad bug 1841466 in cloud-init "ds-identify fails to detect NoCloud datastore with LABEL_FATBOOT instead of LABEL (change introduced recently in util-linux-2.33-rc1)" [Undecided,New]
[19:27] rharper: if you need anything else, ping me...
[19:28] rharper: the logs are almost all empty / non existent, because ds-identify bails with exit code 1 and cloud-init doesn't even run...
[19:29] but i guess the problem / cause of the problem are clear anyway. But if not I can clarify / do more testing
[19:39] flipsa: do you have the command that creates the partitionless disk with FAT16?
[19:40] flipsa: it seems reasonable to me to support both, but I'd like to be able to create one of these type of disks so we can verify the behavior and the fix
[19:48] @rharper: I am not familiar with the Xen Orchestra code base, but this should be it: https://github.com/vatesfr/xen-orchestra/blob/master/packages/xo-server/src/fatfs-buffer.js
[19:49] it's all done in a web app with nodejs. The external npm library they use is commented out on line 33
[19:51] flipsa: ok, my reading of the util linux code seemed to indicate to me that the filesystem label detection in blkid used to fall back to reading the boot record label as well, but blkid stopped doing that; this is almost certainly going to cause more widespread failures where FAT16 labels were provided but now they aren't, instead all of the tools which used to get a LABEL value no longer get that.
[19:51] yeah, as soon as people upgrade some will get bitten, quite sure
[19:52] now util linux is more accurate separating the two labels out; but now everyone else gets to fix their stuff as well. I wonder what case really broke where blkid reported the boot label as the fs label
[19:53] no clue
[19:55] flipsa: would you be able to duplicate the original disk? is it something really small we could attach to the bug?
[19:55] looks like dosfstools writes both boot record and volume label with the same value
[19:56] https://github.com/dosfstools/dosfstools/blob/master/src/boot.c#L755
[19:56] flipsa: I wonder if the fatfs would do that as well
[19:56] and we can certainly check if LABEL_FATBOOT is present as well
[19:57] rharper: 10MiB
[19:58] yeah, I bet if you xz it, that'll drop smaller, either way if you don't mind attaching that to the bug would be great
[19:59] rharper: see the bottom of my initial bug report where i did some experiments... dosfslabel incorrectly reads from the LABEL_FATBOOT field (if LABEL is not present), but it writes to the LABEL field but does not over-write LABEL_FATBOOT. seems like a bug as well imho
[19:59] well, it's not _incorrect_ by my reading, rather it checks in _both_ locations
[20:00] rharper: will try. what's the size limit for bugs.launchpad?
[20:00] much bigger than 10MiB
[20:01] flipsa: we'll handle this in the cloud-init side with a fix to ds-identify to support checking LABEL_FATBOOT for cidata
[20:01] I would suggest also mentioning writing both label values to orchestra
[20:01] so they can generate nocloud data for images which won't yet have cloud-init with the fix
[20:02] I see no reason not to write the same value in both places for the cloud-init use-case
[20:02] rharper: awesome!
[20:02] yeah, think they were hoping that upstream fixes it, but you are right, until this is pushed to distros will take enough time to create problems for people...
[20:14] rharper: will a dd image of the virtual disk do? don't think there's a supported way to export it from the Xen Orchestra interface
[20:15] is there only a single disk or do you have a rootfs disk and a separate cloud-init data disk?
[20:16] only one disk, no partitions
[20:16] and the disk is FAT16?
[20:17] that's surely not the Operating System disk?
[20:22] I am sure it's not the OS disk... but you might be right, i assumed that it's FAT16 from reading the source code of Xen Orchestra. cfdisk tells me: Label: dos, identifier: 0x00000000, no partitions only "free space". blkid says type "vFAT"
[20:25] is there a diff between FAT16 and vFAT?
[20:25] btw, this is the whole content:
[20:26] .
[20:26] ├── meta-data
[20:26] ├── network-config
[20:26] ├── openstack
[20:26] │   └── latest
[20:26] │       ├── meta_data.json
[20:26] │       └── user_data
[20:26] └── user-data
[20:29] that's just the metadata disk
[20:29] so yeah, just dd that
[20:36] rharper: https://bugs.launchpad.net/cloud-init/+bug/1841466/+attachment/5284769/+files/xvdb.img.tar.gz
[20:36] Launchpad bug 1841466 in cloud-init "ds-identify fails to detect NoCloud datastore with LABEL_FATBOOT instead of LABEL (change introduced recently in util-linux-2.33-rc1)" [Undecided,New]
[20:36] flipsa: thanks!
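The label check proposed at 20:01 might look something like the sketch below. ds-identify itself is a shell script, so this Python version only illustrates the logic; the function name has_nocloud_label, the dict-of-properties input, and the case-insensitive comparison are assumptions made for the example, not the actual fix.

```python
NOCLOUD_LABEL = "cidata"


def has_nocloud_label(blkid_props):
    """Return True if either label field carries the NoCloud label.

    blkid_props is assumed to be a dict of KEY=value pairs as blkid would
    report them for one device (hypothetical input shape).
    """
    for key in ("LABEL", "LABEL_FATBOOT"):
        value = blkid_props.get(key, "")
        # Case-insensitive match is an assumption of this sketch.
        if value.lower() == NOCLOUD_LABEL:
            return True
    return False


# Older blkid behaviour: the label shows up under LABEL.
old_style = {"LABEL": "cidata", "TYPE": "vfat"}
# Behaviour reported in the bug: only LABEL_FATBOOT is set.
new_style = {"LABEL_FATBOOT": "cidata", "TYPE": "vfat"}
unlabeled = {"TYPE": "vfat"}

results = [has_nocloud_label(p) for p in (old_style, new_style, unlabeled)]
```

Checking both keys keeps disks written by older tooling working while also accepting disks where only the boot-record label is set.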
[20:37] tribaal: rharper I have a branch up that fixes #1841454
[20:37] rharper: tribaal https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/371823
[20:38] it allows for overriding existing sys_cfg options
[20:39] ok
[20:39] blackboxsw: so, why didn't it work ...
[20:40] vs. azure's built-in ds config merging? because of the list type?
[20:42] I guess yes
[20:43] blackboxsw: I'll give it a try
[20:57] blackboxsw: yep, building a local deb and installing to a fresh eoan instance makes it work indeed
[20:58] runcmd runs, and set-passwords runs with frequency "always" as was initially expected.
[21:01] so, azure's built-in ds doesn't update or modify the modules list, so that's why it's not an issue;
[21:04] in azure, the ds.activate() method is used to clear our disk_setup/mount semaphores in the case that they need to reinitialize the disks due to migration; exoscale could clean up the set_passwords semaphore that way.
[21:06] rharper: instead of forcing the frequency?
[21:07] yes; in your template that you loaded, did you just copy the current value of modules and replace the one entry?
[21:08] rharper: I'm not sure I understand the question
[21:08] before I suggested that the datasource indicate the frequency, you said there was an in-image file which set this value
[21:08] which meant that stock images wouldn't run password on each boot
[21:08] ah yes
[21:09] so the in-image file does it "wrong" as well it turns out
[21:09] heh
[21:09] that's not surprising; the config is awkward in that you have to somehow know or read the current list;
[21:09] we did not copy the list and change the value, we assumed it would be merged and therefore just added the one entry there
[21:10] actually that's how we detected the problem - with a bionic image in our prod. But the fix there is easy enough for us - we'll just copy the full list and tweak the frequency until we can forget about it all and use the proper datasource instead that should do the right thing for us
[21:11] in other words as a workaround we'll copy the full list in our in-image file, until we can get rid of the in-image file :)
[22:10] rharper: yeah sorry was on an errand. because of the list type, full list value overrode the entire cloud_config_modules list item
[22:15] rharper: per overriding sysconfig values from the file system, I didn't see any existing cases where we did that in datasources after that initial /etc/cloud/config.d merge; normally our ds's pull only ds_cfg under datasource: : key/values
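The list-overwrite behaviour discussed above, and the merge-by-hand idea from 16:50, can be illustrated with a small standalone sketch. It does not use cloud-init's own merge code: naive_merge is a stand-in that replaces whole values per key, set_module_frequency is a hypothetical helper, and the module list is shortened for the example.

```python
import copy


def naive_merge(base, override):
    """Stand-in merge: every key in override replaces base's value wholesale."""
    merged = copy.deepcopy(base)
    merged.update(copy.deepcopy(override))
    return merged


def set_module_frequency(cfg, module, frequency, key="cloud_config_modules"):
    """Return a copy of cfg with one module entry replaced by [module, frequency].

    Hypothetical helper: all other entries in the list are left untouched.
    """
    result = copy.deepcopy(cfg)
    if key not in result:
        return result
    patched = []
    for entry in result[key]:
        # Entries are either a bare name or a [name, frequency] pair.
        name = entry[0] if isinstance(entry, list) else entry
        patched.append([module, frequency] if name == module else entry)
    result[key] = patched
    return result


# Shortened, illustrative module list (not the full default list).
system_cfg = {"cloud_config_modules": ["set-passwords", "runcmd", "ssh-import-id"]}
ds_builtin = {"cloud_config_modules": [["set-passwords", "always"]]}

overwritten = naive_merge(system_cfg, ds_builtin)
patched = set_module_frequency(system_cfg, "set-passwords", "always")
```

In the overwritten result runcmd is gone from the list, which matches the symptom reported at 15:10; the patched result keeps every module and only changes the frequency of set-passwords.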