=== mgagne is now known as Guest16323
[14:15] Hi All!
[14:15] is there a command to update only security upgrade on a linux with cloud-init?
[14:19] well, he didn't wait around long. :)
[14:42] Hello everyone, I'm investigating an issue where the cloud-init.service (bionic) fails to complete because it encounters /var/lib/cloud/instance being a directory instead of a symlink
[14:43] I only found LP: #1531880 that talks about a similar issue but it went silent
[14:43] Launchpad bug 1531880 in cloud-init "Failed to start Initial cloud-init job (pre-networking)" [Undecided,New] https://launchpad.net/bugs/1531880
[14:44] looks to me like it could be a race condition as two instances off the same image on our infrastructure (Scaleway) do behave differently
[14:52] caribou: I've seen those every so often when rebooting before the cloud-init has completed running; we've never been able to reproduce them
[14:56] well, I started two instances off the same image, one has the symlink & the other has a dir :-/ I'm starting a third one for a tie breaker
[15:40] I cannot call it reproducing it but I'm hitting it rather often, let me know if you want me to turn on some more debugging (cc_debug maybe)
[16:40] smoser: I just read your comments on my NTFS mp
[16:40] yeh
[16:41] or backwards response 'hey'. which ever seems more appropriate.
[16:41] :-)
[16:42] I like how you propose to always wipe NTFS and explain that there are better choices for filesystems
[16:42] on linux
[16:42] sure, let's not speak of any other os...
[16:43] not that ntfs isnt a fine filesystem, but the only reason i can think to use ntfs on linux is dual boot
[16:43] or data recovery
[16:43] and ... dual boot on a cloud instance ?
[16:43] ... yeah
[16:44] but still, rharper seemed still pretty concerned with possible data on the NTFS partition...
[16:44] * paulmey checks what waagent does
[16:45] well, we're only doing this to the ephemeral disk
[16:45] right ?
[16:45] i think we safeguard that pretty well
[16:45] so the people that we could potentially foobar...
[16:46] are the ones that somehow managed to use ntfs on that ephemeral disk (which cloud-init would have wiped if it could access it)
[16:46] so on that boot, cloud-init was able to mount the filesystem
[16:46] and then that person booted into a kernel (or os) that did not have ntfs
[16:47] all while ignoring the "do not put important data here" warning
[16:47] https://github.com/Azure/WALinuxAgent/blob/fb7d6c51dac236538a8c9eb8e752159d5e3f54b8/azurelinuxagent/daemon/resourcedisk/default.py#L149
[16:47] *and* MS wipes that disk for them sometimes...
[16:47] waagent doesn't care at all
[16:47] (wipes on a re-locate)
[16:48] so basically... if this issue comes up, i'll just say "i think your instance must have gotten re-located, and MS lost your data, not cloud-init"
[16:48] :)
[16:48] can i have your phone number to give the user in that case, paulmey ?
[16:48] sure, I'll pm it
[16:48] :)
[16:49] can you think of some reasonable path that got a user to this stage ?
[16:49] paulmey: I just wanted smoser to sign off on the approach as well given the effort we went through to make sure cloud-init didn't wipe user-data;
[16:49] i really really dont want to delete people's data
[16:50] I understand... but their data should not be there in the first place, it's not your fault ?
[16:50] :) that was joking.
[16:51] how did they get the data there was more the question
[16:51] i think this policy seems sane:
[16:51] a.) if we can mount it, then check... dont reformat it if it has files
[16:52] agreed
[16:52] b.) if we cannot mount it... then check a config variable, if that variable says not to delete, then dont.
[16:53] how did the user get into the 'b' case...
[16:53] the 'b' case where they *had* data on it.
[16:53] Default when config is not there is to delete if we can't mount?
[16:53] i really think that is probably in the order of 7 people ever
[16:54] right. default the config to allow cloud-init to reformat ntfs filesystems on the ephemeral disk
[16:54] but we also want to make sure that this is *ONLY* the ephemeral disk that we might do this to
[16:54] In the 4 years that I've done oncall here, I've never seen a complaint about data loss on the ephemeral drive...
[16:55] My guess is that customer support just points to the docs and helps the customer go through the stages of grief
[16:55] https://siliconangle.com/blog/2011/08/01/third-largest-bitcoin-exchange-bitomat-lost-their-wallet-over-17000-bitcoins-missing/
[16:56] * smoser wants to avoid being the reason that that happened
[16:56] it's very hard to protect people from aiming at their foot
[16:57] but I appreciate the effort
[16:57] :)
[16:57] I sign off on your policy, @rharper ?
[16:59] btw, usually people end up in the b) case due to cloud-init not being able to format the disk in the first place. However, the issue is usually more that the fs is now NTFS instead of ext4...
[16:59] paulmey: smoser's additions sounds fine to me; but that is new space; also requires new cloud-init though I suppose they need that anyhow today
[16:59] creating swap files or the docker directory on the ephemeral drive if it is NTFS is not going to work well...
[17:00] smoser: did you have a suggestion for the config namespace , what key should users add to say, don't nuke my ntfs drive even if you can't mount it ?
[17:01] note that the code is in the DSAzure... so maybe a ds config?
[17:05] yeah, in the azure datasource is the right place it seems
[17:05] you'll have to be careful though to not just set it in the dsconfig in that module.
[17:05] because if you do that, it ends up trumping config on disk
[17:06] we just need to make sure that if the user put it in /etc/cloud.cfg.d/ or in user-data, that they are respected.
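The reformat policy agreed on above (cases a and b, restricted to the ephemeral disk) can be sketched roughly as follows. This is only an illustration of the decision logic as discussed, not cloud-init's actual code: the function name and all parameter names are hypothetical, and the config option is modelled as a plain boolean that defaults to allowing the reformat.

```python
def may_reformat_ntfs(is_ephemeral, can_mount, has_files, allow_reformat=True):
    """Decide whether an NTFS filesystem may be reformatted.

    Hypothetical helper: every name here is an assumption made for
    illustration; it mirrors the policy from the discussion only.
    """
    # Only the ephemeral disk is ever a candidate for reformatting.
    if not is_ephemeral:
        return False
    if can_mount:
        # Case (a): the filesystem is readable, so keep it if it holds files.
        return not has_files
    # Case (b): the contents are unknown, so defer to the config option,
    # which defaults to allowing the reformat.
    return allow_reformat
```

For example, `may_reformat_ntfs(True, False, False, allow_reformat=False)` models a user who opted out in config: the disk cannot be mounted, and the option keeps it from being wiped.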
[17:06] ok, I'll try to make tests for that
[17:06] specifically
[17:07] paulmey: and can we please make sure that we WARN if the 'a' case found files on it
[17:07] WARNING: it looks like you're using NTFS on the ephemeral disk, to ensure that filesystem does not get wiped, set THIS_CONFIG_OPTION
[17:08] will do
[17:09] datasource.azure.never-destroy-ntfs ?
[17:10] or datasource.Azure.never-destroy-ntfs (I need to look that up)
[17:13] underscores rather than -
[17:13] and lets qualify that with 'ephemeral' in some way
[17:15] ok, datasource.Azure.never_destroy_ntfs_on_ephemeral_disk ?
[17:19] Need some help please, out of ideas about what to try :( need the yaml to configure ntp
[17:20] robjo: ok.
[17:20] - ntp as entry under cloud_config_modules: enters cc_ntp.py
[17:20] but as soon as I add anything then things fall over
[17:21] I am thinking I should be able to do
[17:21] - ntp:
[17:22] pastebin what you're giving it ?
[17:24] https://paste.ubuntu.com/p/58DdVk3wYB/
[17:25] its not under cloud_config_modules
[17:26] cloud_config_modules is just a list of modules that are to be run
[17:26] OK, so what should it look like?
[17:26] just put it in as
[17:26] http://paste.ubuntu.com/p/52DgZBNcJh/
[17:28] http://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp
[17:29] the code blocks (Examples) might help as an example
[17:30] but shouldn't I see cc_ntp somewhere in the log?
[17:30] I even put a debug statement in handle (first line) and I am not getting it
[17:31] I am only getting the debug if I have -ntp under config modules
[17:31] but then of course I cannot configure anything
[17:35] or maybe I am testing wrong, I tested with "cloud-init modules" and "cloud-init init" and for both cases I am not seeing the expected output in the log??
[17:38] robjo: well to most easily test,
[17:39] if you're running cloud-init from the command line to re-run modules, you likely want cloud-init --file mycloud.cfg --debug single --name cc_ntp --frequency always --report
[17:39] cloud-init single --name=
[17:39] yeah that^
[17:39] where myconfig.cfg contains the config from smoser's paste
[17:39] but i'd just put myconfig.cfg in /etc/cloud/cloud.cfg.d and not bother with --file
[17:40] otherwise, if you want to test how it's called during firstboot, you can do: cloud-init clean --logs --reboot;
[17:46] thanks, testing
[17:53] smoser: so should I still check for mount.ntfs? or is it safe enough to assume that if that existed on $PATH, mount would have found it?
[17:56] well, i think we just try to mount it
[17:57] with mount -t ntfs
[17:57] if that doesn't work, then we consult that config option
[17:57] right ?
[17:58] yeah I'm all for that
[18:05] OK, I cleared out /etc/cloud.cfg and set it to only contain what's in http://paste.ubuntu.com/p/52DgZBNcJh/
[18:06] then I ran cloud-init clean --logs --reboot
[18:06] and when the machine comes back there is no trace of cc_ntp running in the log file
[18:07] and grep ubuntu /etc/ntp.conf comes back empty
[18:07] can you paste the log ? /var/log/cloud-init.log
[18:07] i suspect your changes to cloud_config_modules made it not run
[18:11] well there is no cloud_config_modules section now, do I need both? i.e. -ntp in the modules section and the ntp config outside?
[18:11] cloud_config_modules: is just a list of modules
[18:11] not config of those modules.
[18:11] if you took that out, then it wont run the ntp module
[18:14] yay, sorry for being so dumb about this, just goes to show how often I look/need to look at the config
[18:15] I guess it would be good to have an ntp configuration example in the docs
[18:15] I'm pretty sure we do
[18:15] http://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp
[18:16] I looked, I promise and search fro ntp turned up empty on http://cloudinit.readthedocs.io/en/latest/topics/examples.html
[18:16] s/fro/for/
[18:16] yes, not added to examples. but examples under the module documentation
[18:16] we could add a message in the examples page to refer to module docs for further examples ?
[18:18] well the disconnect, and I was looking at the sources which basically is that example is the it is not obvious, at least it was not obvious to me that - ntp was needed under the cloud_config_modules and the actual configuration for the parameters was outside of the cloud_config_modules section
[18:18] which is why I poked around at first trying to shoehorn the ntp config itself into the cloud_config_modules section
[18:19] which obviously didn't work either
[18:19] so, the way I look at it, and we can clarify, is /etc/cloud configures cloud-init; and users configure the modules via user-data
[18:19] but I too struggled with the disconnect for a while as well
[18:20] /etc being the system/distro config space, and user-data covers things the user would like to change, modify
[18:20] Well that's fair but there are cases where this kind of stuff is built into an image and then you end up in the situation I was just in
[18:21] of course I now learned my lesson and thanks for the help, but if it happens to me I bet I will not be the only one
[18:21] for sure
[18:21] others may just not necessarily know where to go for help
[18:21] do you have a suggestion for the docs page?
[18:21] so a short example cloud.cfg file that shows the key bits of this case may be helpful
[18:24] this case being ? system-wide ntp default configuration ?
[18:24] sorry, I missed the start of the convo
[18:26] yes
[18:27] yeah; that seems like a good add
[18:30] hmm doc/examples/ in the source tree has cloud-config-ntp.txt but that doesn't show up on http://cloudinit.readthedocs.io/en/latest/topics/examples.html
[18:30] but that looks to be an example about how to send the information as user-data
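The split that caused the confusion above (the module must be listed under cloud_config_modules, while its settings sit under a separate top-level ntp key) can be illustrated with a small sketch. The dict below stands in for what a system-wide YAML file in /etc/cloud/cloud.cfg.d would parse to; the pool and server names are placeholders, the contents of the pastes in the log are not known, and `module_settings` is a hypothetical helper rather than anything in cloud-init.

```python
# Hypothetical system-wide config, written as the Python dict that the
# equivalent YAML would parse to. All values are placeholders.
config = {
    # This list only names which modules run; it holds no module settings.
    "cloud_config_modules": ["ntp"],
    # Settings for the ntp module live under their own top-level key.
    "ntp": {
        "pools": ["0.pool.example.org"],   # placeholder pool name
        "servers": ["ntp.example.org"],    # placeholder server name
    },
}


def module_settings(cfg, name):
    """Return a module's settings, or None if it is not listed to run.

    Illustrative only: models the "both are needed" rule from the log.
    """
    if name not in cfg.get("cloud_config_modules", []):
        return None
    return cfg.get(name, {})
```

Dropping the cloud_config_modules list, as happened in the log, makes `module_settings` return None even though the ntp settings are still present, which matches the "no trace of cc_ntp in the log" symptom.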