[11:23] <catphish> morning, i'm almost there with my datasource, but i have a problem that i don't understand, on first boot, my local stage populates netplan with my network config, but the network never comes up. if i then reboot, the network *does* come up
[11:23] <catphish> it feels like cloud-init is suppressing netplan, but not actually configuring the network itself
[11:26] <catphish> this is (i think) the relevant log output: https://paste.ubuntu.com/p/zV8ndvcf7k/
[11:34] <catphish> the documentation for the local stage says "Cloud-init then exits and expects for the continued boot of the operating system to bring network configuration up as configured." but it would seem that for some reason, netplan is not being applied *after* this local stage
[11:41] <catphish> oh, the netplan config is written during the main stage, not the local stage, hmm
[11:43] <meena> catphish: glad you figured all that out without any help
[11:44] <catphish> thank you, though i'm still not clear about how the network should get configured at first boot :(
[11:46] <catphish> my datasource fetches and returns network config (yay), but this doesn't get written to the netplan config file until after netplan has already executed, and hence the network is never actually brought up during the first boot, i hope i'm missing something simple, but i'm not sure what it is
[12:01] <otubo> Hey guys, any chance someone could take a look at my PR before it goes stale? https://github.com/canonical/cloud-init/pull/586
[12:04] <otubo> Also, a quick question about logging: why is multi_log() in util.py and not in log.py? I just got an IOError from multi_log()'s flush call over here.
[12:08] <otubo> I see that log.flushLoggers() is the single point of flushing (except for log._resetLogger()), and it silently ignores IOError. Wondering if we should do the same in util.multi_log(), or move multi_log into log and route its flush through log.flushLoggers()
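A minimal sketch of otubo's first option, having multi_log() ignore a failing flush the same way log.flushLoggers() is described as doing. Everything here is hypothetical and simplified: the streams/log/log_level signature and the flush_quietly and BrokenConsole helpers are illustrative, not cloud-init's actual util.multi_log() API. On Python 3, IOError is an alias of OSError, so catching OSError covers it.

```python
import io
import logging


def flush_quietly(stream):
    """Flush stream, ignoring OSError (IOError is an alias of OSError on Python 3).

    Returns True if the flush succeeded and False if an error was ignored.
    """
    try:
        stream.flush()
        return True
    except OSError:
        return False


def multi_log(text, streams=(), log=None, log_level=logging.DEBUG):
    """Hypothetical, simplified multi_log(): write text to each stream and a logger.

    A failing flush on one stream (e.g. a console that went away) is ignored
    instead of aborting the whole call.
    """
    for stream in streams:
        stream.write(text)
        flush_quietly(stream)
    if log is not None:
        # Drop the trailing newline so the log record is a single clean line.
        log.log(log_level, text.rstrip("\n"))


class BrokenConsole:
    """Hypothetical stream whose flush() fails, simulating the reported IOError."""

    def __init__(self):
        self.written = []

    def write(self, text):
        self.written.append(text)

    def flush(self):
        raise IOError("simulated flush failure")


if __name__ == "__main__":
    good, bad = io.StringIO(), BrokenConsole()
    multi_log("hello\n", streams=(good, bad), log=logging.getLogger(__name__))
    print(repr(good.getvalue()))
```

Moving the function into log.py instead (otubo's second option) would not change this logic, only where the flush handling lives.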
[12:10] <otubo> The bug in question is this one: https://bugzilla.redhat.com/show_bug.cgi?id=1831107
[13:42] <Odd_Bloke> meena: pickle is a way of serialising Python objects: https://docs.python.org/3/library/pickle.html.  cloud-init uses it to persist instance state between boots.
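To make Odd_Bloke's description concrete, here is a minimal pickle round trip of the kind used to carry state across boots. The InstanceState class, its fields, and the obj.pkl file name in a temporary directory are hypothetical stand-ins, not cloud-init's real cached object; the point is only that pickle.dump() turns an object graph into bytes on disk and pickle.load() rebuilds it later.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass, field


@dataclass
class InstanceState:
    """Hypothetical stand-in for the object cloud-init caches between boots."""

    instance_id: str
    network_config: dict = field(default_factory=dict)


def save_state(state, path):
    """Serialise state to path with pickle."""
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_state(path):
    """Rebuild the object from path.

    pickle.load() can run arbitrary code embedded in the file, so it must
    only ever be used on files that untrusted users cannot write.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "obj.pkl")
        save_state(InstanceState("i-0123", {"version": 2}), path)
        print(load_state(path))  # a new InstanceState equal to the saved one
```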
[13:43] <Odd_Bloke> meena: Yep, I discovered you can do the mocking after I'd written that concrete subclass implementation.
[14:27] <rharper> catphish: is your datasource configured to run at local time?   2020-10-14 11:21:08,512 - main.py[DEBUG]: [local] Exiting. datasource DataSourceKatapult not in local mode.  It sounds like it is not; the message your paste ends with comes from cloudinit/cmd/main.py, where it compares datasource.mode with the current mode. What does the datasources = line in your new datasource look like? If you look at DataSourceConfigDrive, you can see how a datasource class is bound to a (sources.DEP_FILESYSTEM,) tuple; this tells cloud-init that ConfigDrive runs at "local time", i.e. it only needs access to the local filesystem, as opposed to depending on the network (a datasource that needs networking to fetch its data).
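The dependency matching rharper describes can be sketched as below. This is a self-contained re-creation based on my understanding of cloud-init's sources module, not an import of it: the constant values, the DataSourceKatapultNet stub, the per-stage dependency lists, and the exact-set-match rule are assumptions to check against cloudinit/sources/__init__.py. The idea is that the local stage asks only for filesystem-only datasources, while the network stage asks for ones that also need the network.

```python
# Assumed values; the real constants live in cloudinit/sources/__init__.py.
DEP_FILESYSTEM = "FILESYSTEM"
DEP_NETWORK = "NETWORK"


class DataSourceKatapult:
    """Stub for a datasource that only needs the local filesystem."""


class DataSourceKatapultNet:
    """Hypothetical stub for a variant that needs networking to fetch its data."""


# Same shape as the datasources list in catphish's module:
# each entry pairs a class with the dependencies it needs.
datasources = [
    (DataSourceKatapult, (DEP_FILESYSTEM,)),
    (DataSourceKatapultNet, (DEP_FILESYSTEM, DEP_NETWORK)),
]


def list_from_depends(depends, ds_list):
    """Return the classes whose dependency set exactly equals depends."""
    wanted = set(depends)
    return [cls for cls, deps in ds_list if set(deps) == wanted]


# Assumed per-stage dependency sets: local stage vs. network stage.
LOCAL_STAGE = [DEP_FILESYSTEM]
NETWORK_STAGE = [DEP_FILESYSTEM, DEP_NETWORK]

if __name__ == "__main__":
    print([c.__name__ for c in list_from_depends(LOCAL_STAGE, datasources)])
    print([c.__name__ for c in list_from_depends(NETWORK_STAGE, datasources)])
```

Because the match is on the whole set, a datasource declared with (DEP_FILESYSTEM, DEP_NETWORK) is never considered by the local stage.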
[14:47] <catphish> rharper: i was fairly sure my data source runs at local time. it runs if i execute "cloud-init init --local", and the dependencies are defined as follows: datasources = [ (DataSourceKatapult, (sources.DEP_FILESYSTEM, )) ]
[14:47] <catphish> rharper: this is the complete (WIP) datasource: https://paste.ubuntu.com/p/s9F4xth7Yc/
[14:48] <catphish> i will check whether "cloud-init init --local" actually populates the netplan config file or not
[14:50] <rharper> catphish: your log suggested it did,  2020-10-14 11:21:08,250 - netplan.py[DEBUG]: V2 to V2 passthrough
[14:50] <rharper> 2020-10-14 11:21:08,252 - util.py[DEBUG]: Writing to /etc/netplan/50-cloud-init.yaml - wb: [644] 662 bytes
[14:50] <rharper> and it invoked netplan generate
[14:50] <rharper> 2020-10-14 11:21:08,254 - subp.py[DEBUG]: Running command ['netplan', 'generate'] with allowed return codes [0] (shell=False, capture=True)
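When digging through /var/log/cloud-init.log for lines like the ones rharper quotes, a small parser can help. This sketch assumes the "timestamp - source[LEVEL]: message" layout visible in the quoted lines; the LogEntry type and the parse_line and find helpers are hypothetical names, not part of cloud-init.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
#   2020-10-14 11:21:08,250 - netplan.py[DEBUG]: V2 to V2 passthrough
LOG_LINE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<source>[^\[]+)\[(?P<level>[A-Z]+)\]: (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    stamp: str
    source: str
    level: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one cloud-init.log line into its fields, or return None."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    return LogEntry(**m.groupdict())


def find(lines, needle):
    """Return parsed entries whose message contains needle."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e is not None and needle in e.message]
```

For example, find(open("/var/log/cloud-init.log"), "netplan") would pick out the write to /etc/netplan/50-cloud-init.yaml and the netplan generate call quoted above.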
[14:52] <rharper> so, the questions that remain are: 1) what's in /etc/netplan/50-cloud-init.yaml? 2) what's in /run/systemd/network/*? 3) is systemd-networkd.service enabled? 4) if so, did it run before or after cloud-init-local.service? (journalctl -b 0 -o short-monotonic -u cloud-init-local.service -u systemd-networkd.service)
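rharper's fourth question (did systemd-networkd run before or after cloud-init-local?) can be answered mechanically from the short-monotonic journalctl output he suggests. A minimal sketch, assuming the usual "[ seconds] host identifier[pid]: message" layout of that output format and matching on the syslog identifier; the helper names and the sample lines are made up for illustration. Note that lines logged by systemd itself about a unit carry the identifier "systemd", not the unit name.

```python
import re

# short-monotonic lines look like "[    5.678901] host ident[pid]: message"
JOURNAL_LINE = re.compile(r"^\[\s*(?P<mono>\d+\.\d+)\]\s+\S+\s+(?P<ident>[^\[:\s]+)")


def first_seen(lines, ident):
    """Earliest monotonic timestamp logged under syslog identifier ident, or None."""
    stamps = [
        float(m.group("mono"))
        for m in (JOURNAL_LINE.match(line) for line in lines)
        if m and m.group("ident") == ident
    ]
    return min(stamps) if stamps else None


def ran_before(lines, first="cloud-init", second="systemd-networkd"):
    """True if first logged before second, False if not, None if either is missing."""
    a, b = first_seen(lines, first), first_seen(lines, second)
    if a is None or b is None:
        return None
    return a < b
```

A None result is itself informative here: it means one of the two never logged anything during that boot, which is what a "dead" systemd-networkd would look like.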
[14:53] <catphish> thank you, i will go back through this now, what i know for sure is that after the first boot, /etc/netplan/50-cloud-init.yaml is populated, but not applied, after a subsequent reboot, it's applied
[15:09] <catphish> so, a --local run definitely populates netplan: https://paste.ubuntu.com/p/hMDgNHjPrW/
[15:14] <catphish> so, after the initial boot, /etc/netplan/50-cloud-init.yaml is populated, but systemd-networkd.service is "dead", if i run "systemctl start systemd-networkd.service", the network comes up fine
[15:15] <rharper> dead usually means it's not enabled by default on your system
[15:16] <catphish> hmm, yes it does seem like it may be disabled
[15:16] <rharper> on Ubuntu, we have /etc/systemd/system/network-online.target.wants/systemd-networkd-wait-online.service
[15:16] <catphish> but on a subsequent boot, it runs
[15:16] <catphish> nb. this is ubuntu 18.04 with cloud-init installed after a regular installation
[15:16] <rharper> because netplan will now parse the yaml during boot and emit the systemd target wants
[15:16] <rharper> oh
[15:16] <rharper> desktop ?
[15:17] <catphish> server
[15:17] <rharper> from the legacy installer?
[15:17] <catphish> yes
[15:17] <rharper> do you have the wants I mentioned in /etc/systemd/system/network-online.target.wants ?
[15:18] <catphish> i'm happy to abandon this weird installation and start from a new cloud image, i just assumed it would work the same
[15:19] <catphish> i don't have anything called /etc/systemd/system/network-online.target.wants
[15:19] <rharper> yeah, I suspect this is part of the cloud images that may not be present without
[15:20] <catphish> it's very much my suspicion that netplan is starting before cloud-init-local
[15:20] <rharper> netplan only runs at generator time, and all it does is write config files
[15:20] <rharper> mkdir -p /etc/systemd/system/network-online.target.wants; cd /etc/systemd/system/network-online.target.wants && ln -s /lib/systemd/system/systemd-networkd-wait-online.service
[15:21] <rharper> that should ensure that systemd-networkd is part of the boot target; such that when cloud-init calls netplan generate and those files are created in /run/systemd/network/ then networkd should run
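For provisioning scripts, rharper's one-liner can also be written as an idempotent Python function. This sketch mirrors the mkdir -p and ln -s steps exactly (same directory, same absolute link target); the add_wants_link name and its root parameter, which lets it be exercised against a scratch directory, are hypothetical. On a real system it has to run as root with root="/".

```python
import os
import tempfile
from pathlib import Path

UNIT = "/lib/systemd/system/systemd-networkd-wait-online.service"


def add_wants_link(root="/", target="network-online.target", unit=UNIT):
    """Create <root>/etc/systemd/system/<target>.wants/<unit name> -> unit.

    Same effect as: mkdir -p .../<target>.wants && cd there && ln -s <unit>
    Does nothing if the link already exists.
    """
    wants_dir = Path(root, "etc/systemd/system", target + ".wants")
    wants_dir.mkdir(parents=True, exist_ok=True)
    link = wants_dir / os.path.basename(unit)
    if not link.is_symlink():
        # Absolute target, like ln -s; it may dangle when root is a scratch dir.
        link.symlink_to(unit)
    return link


if __name__ == "__main__":
    # Dry run against a throwaway directory instead of the real /etc.
    with tempfile.TemporaryDirectory() as scratch:
        link = add_wants_link(root=scratch)
        print(link, "->", os.readlink(link))
```

Unlike a bare ln -s, which fails with "File exists" on a second run, the is_symlink() check makes repeated runs harmless.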
[15:21] <rharper> also, systemctl status systemd-networkd should say: Loaded: loaded (/lib/systemd/system/systemd-networkd.service; enabled; vendor preset: enabled)
[15:21] <catphish> rharper: that's fixed it, thanks!
[15:22] <catphish>    Loaded: loaded (/lib/systemd/system/systemd-networkd.service; disabled; vendor preset: enabled)
[15:22] <catphish> interestingly "disabled" but it works
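To check the state rharper mentions from a script, the Loaded: line of systemctl status can be parsed as below (systemctl is-enabled systemd-networkd is a simpler way to print just the unit file state). The sketch assumes the "(path; state; vendor preset: X)" layout of the lines quoted in this exchange; the regex and function name are illustrative. The "disabled" result is not necessarily a contradiction: a unit whose own file state is disabled can still be started when another unit pulls it in through a dependency such as Wants= or Requires=, which is presumably what happens here.

```python
import re

LOADED = re.compile(
    r"Loaded:\s+(?P<load>\S+)\s+\((?P<path>[^;]+);\s*(?P<state>[^;]+);"
    r"\s*vendor preset:\s*(?P<preset>[^)]+)\)"
)


def unit_file_state(status_text):
    """Return (unit file state, vendor preset) from systemctl status output, or None."""
    m = LOADED.search(status_text)
    if m is None:
        return None
    return m.group("state").strip(), m.group("preset").strip()
```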
[15:23] <catphish> yay - https://paste.ubuntu.com/p/Bg59SWTvc4/
[15:24] <rharper> catphish: \o/
[15:24] <catphish> anyway, that was probably a waste of time, but at least now i know my datasource isn't the problem, i'll try to get some actual cloud images onto my platform now, thank you for your assistance
[15:25] <catphish> hopefully the cloud images will work out of the box
[15:26] <rharper> yeah, I hope so
[15:27] <catphish> there's still a lot i need to understand, but getting there
[15:27] <meena> Odd_Bloke, that warning sounds…dangerous. do we trust our data??
[15:28] <Odd_Bloke> meena: I was thinking about that earlier: if you have the ability to write nasty data into obj.pkl, then you almost certainly have the ability to do substantially worse things in a much less indirect fashion.
[15:29] <Odd_Bloke> obj.pkl is root:root, 400.
[15:30] <Odd_Bloke> And if you can feed in user-data that would trigger a vulnerability, then you may as well just give yourself root more directly, without exploiting some pickling bug.
[15:34] <Odd_Bloke> (If anyone has something more concrete than that, then please follow our security process! https://cloudinit.readthedocs.io/en/latest/topics/security.html)
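Odd_Bloke's argument rests on obj.pkl being writable only by root. A hypothetical helper that checks that precondition before trusting a pickle might look like the sketch below; the writable_only_by name and owner_uid parameter are illustrative, this is not something cloud-init is known to do, and it only inspects the file itself (a writable parent directory would also let someone replace the file).

```python
import os
import stat
import tempfile


def writable_only_by(path, owner_uid=0):
    """True if path is owned by owner_uid and has no group/other write bits."""
    st = os.stat(path)
    if st.st_uid != owner_uid:
        return False
    return not st.st_mode & (stat.S_IWGRP | stat.S_IWOTH)


if __name__ == "__main__":
    # Demo with the current user standing in for root (uid 0).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "obj.pkl")
        open(path, "wb").close()
        os.chmod(path, 0o400)  # the "400" part of root:root 400
        print(writable_only_by(path, owner_uid=os.getuid()))
```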
[16:10] <meena> Odd_Bloke, aye
[16:34] <meena> i need to fix the Azure tests, and get this pr done
[16:36] <meena> i say this, while sitting here watching buddi https://www.netflix.com/title/80993590
[16:57] <vijayendra> rharper, LP: 1893770. I tried your suggestion of adding NoCloud as another datasource alongside ConfigDrive, but I still see it reset to the fallback (dhcp)
[17:04] <rharper> vijayendra: sure, update the bug with logs from that run;  I suggest that you interactively work with ds-identify and your cloud.cfg until you see ds-identify report disabled;  sudo DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force
[17:10] <vijayendra> rharper, updated logs on bug. Sure. Currently doing the same by running /usr/libexec/cloud-init/ds-identify --force
[17:15] <vijayendra> rharper, I tried this change: https://paste.ubuntu.com/p/Ywqv58nGxs/. It looks like it works with this change
[18:02] <rharper> vijayendra: ok, but you shouldn't need this; I'll look at the logs
[18:02] <vijayendra> rharper, Sure
[18:04] <rharper> this is what it looks like for me; what I expect it should do for you as well;  https://paste.ubuntu.com/p/vfZ98k3VRm/
[18:16] <vijayendra> rharper, in my case the last line is: No ds found [mode=search, notfound=enabled]. Enabled cloud-init [0]. https://paste.ubuntu.com/p/zqfxXfnQXy/
[18:17] <vijayendra> rharper, cloud-init is disabled for you, but for me it's enabled
[18:19] <rharper> vijayendra: I updated the bug
[18:19] <rharper> something has the ds-identify policy set to notfound=enabled ; which means if you don't find any datasources, enable cloud-init anyway
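rharper's explanation of notfound=enabled can be modelled as a tiny decision function. This is a simplified sketch of ds-identify's final step as described in this conversation, not a port of the real shell script: should_enable and its arguments are hypothetical, and the handling of maybe=none is an assumption.

```python
def should_enable(found, maybe, policy):
    """Simplified model of how ds-identify decides whether to enable cloud-init.

    found  -- datasources positively identified
    maybe  -- datasources that could not be ruled out
    policy -- settings such as {"maybe": "all", "notfound": "disabled"}
    """
    if found:
        return True
    if maybe and policy.get("maybe", "all") != "none":
        return True
    # Nothing identified at all: the notfound setting decides.
    return policy.get("notfound", "disabled") == "enabled"
```

vijayendra's output ("No ds found [mode=search, notfound=enabled]. Enabled cloud-init") corresponds to the last branch with notfound set to enabled.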
[18:19] <rharper> that's not the upstream default, or how we run it in Ubuntu
[18:19] <rharper> I don't think RHEL sets that as default either
[18:19] <rharper> but it's possible
[18:20] <rharper> in any case though, smoser already mentioned that a ds-identify change won't help, because any cloud-init operation that was supposed to run on every boot (user-data may have enabled such things) wouldn't run once cloud-init had disabled itself;
[18:21] <rharper> so the path forward is either looking at the datasource fallback PR; or deal with manual-cache-clean issues you've found in any backup/template script;
[18:26] <vijayendra> rharper, True. Let me work on the fallback PR you suggested. I don't see much of an impact on our platform if cloud-init gets disabled on subsequent boots, so I wanted to give it a try
[18:34] <rharper> you don't accept user-data ?
[18:46] <vijayendra> rharper, yes, that will not work, but the guest falling back to DHCP networking after a reboot seems to be a bigger problem than that; I was just trying to buy some time by dealing with this problem before we address the issue completely.
[18:49] <rharper> vijayendra: I see; it's definitely an improvement in the short term
[18:53] <rharper> vijayendra: ah, I see what's up; since POWER does not have DMI, the default policy is to enable cloud-init even if no DS is found, primarily because we can't rule some datasources out without looking at DMI data. So, since you're already writing a datasource_list: [], you can also include a ds-identify policy to set notfound=disabled; you can write to /etc/cloud/ds-identify.cfg with the content: policy: search,found=all,maybe=all,notfound=disabled
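The policy value rharper suggests is a mode followed by key=value pairs. A minimal sketch of splitting it and rendering the /etc/cloud/ds-identify.cfg line; parse_policy and render_ds_identify_cfg are hypothetical helpers, and only the resulting "policy: ..." text comes from the conversation. The starting policy in the demo is an assumption consistent with vijayendra's "mode=search, notfound=enabled" output.

```python
def parse_policy(text):
    """Split 'mode,key=value,...' into (mode, {key: value})."""
    mode, *pairs = [part.strip() for part in text.split(",")]
    settings = dict(pair.split("=", 1) for pair in pairs)
    return mode, settings


def render_ds_identify_cfg(mode, settings):
    """Build the one-line content for /etc/cloud/ds-identify.cfg."""
    body = ",".join([mode] + [f"{key}={value}" for key, value in settings.items()])
    return f"policy: {body}\n"


if __name__ == "__main__":
    # Assumed starting point: the no-DMI behaviour vijayendra is seeing.
    mode, settings = parse_policy("search,found=all,maybe=all,notfound=enabled")
    settings["notfound"] = "disabled"  # the one setting that needs to change
    print(render_ds_identify_cfg(mode, settings), end="")
```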
[18:56] <vijayendra> rharper, ah! ok. Sure. Let me try this
[19:08] <vijayendra> rharper, yes. This change did not reset to fallback
[19:11] <vijayendra> rharper, this helps in short term, I will work on the PR suggested and update you. Thanks for the support.
[19:17] <rharper> cool
[20:52] <meena> https://discourse.ubuntu.com/t/path-to-a-commit-bit-proposal/18770 ⬅️ i commented on rick_h's commit bit proposal
[20:52] <meena> aaaaand now, i sleep
[23:54] <johnsonshi> Odd_Bloke: I think the Azure datasource report_diagnostic_event refactor can now be merged since I've addressed all of the comments: https://github.com/canonical/cloud-init/pull/563