=== mgagne is now known as Guest16323 | ||
Gael | Hi All! | 14:15 |
---|---|---|
Gael | is there a command to apply only security upgrades on a Linux system with cloud-init? | 14:15 |
dpb1 | well, he didn't wait around long. :) | 14:19 |
caribou | Hello everyone, I'm investigating an issue where the cloud-init.service (bionic) fails to complete because it encounters /var/lib/cloud/instance being a directory instead of a symlink | 14:42 |
caribou | I only found LP: #1531880 that talks about a similar issue but it went silent | 14:43 |
ubot5 | Launchpad bug 1531880 in cloud-init "Failed to start Initial cloud-init job (pre-networking)" [Undecided,New] https://launchpad.net/bugs/1531880 | 14:43 |
caribou | looks to me like it could be a race condition as two instances off the same image on our infrastructure (Scaleway) do behave differently | 14:44 |
rharper | caribou: I've seen those every so often when rebooting before cloud-init has completed running; we've never been able to reproduce them | 14:52 |
caribou | well, I started two instances off the same image, one has the symlink & the other has a dir :-/ I'm starting a third one for a tie breaker | 14:56 |
caribou | I cannot call it reproducing it but I'm hitting it rather often, let me know if you want me to turn on some more debugging (cc_debug maybe) | 15:40 |
paulmey | smoser: I just read your comments on my NTFS mp | 16:40 |
smoser | yeh | 16:40 |
smoser | or backwards response 'hey'. whichever seems more appropriate. | 16:41 |
paulmey | :-) | 16:41 |
paulmey | I like how you propose to always wipe NTFS and explain that there are better choices for filesystems | 16:42 |
smoser | on linux | 16:42 |
paulmey | sure, let's not speak of any other os... | 16:42 |
smoser | not that ntfs isnt a fine filesystem, but the only reason i can think to use ntfs on linux is dual boot | 16:43 |
smoser | or data recovery | 16:43 |
smoser | and ... dual boot on a cloud instance ? | 16:43 |
paulmey | ... yeah | 16:43 |
paulmey | but still, rharper seemed pretty concerned with possible data on the NTFS partition... | 16:44 |
* paulmey checks what waagent does | 16:44 | |
smoser | well, we're only doing this to the ephemeral disk | 16:45 |
smoser | right ? | 16:45 |
smoser | i think we safeguard that pretty well | 16:45 |
smoser | so the people that we could potentially foobar... | 16:45 |
smoser | are the ones that somehow managed to use ntfs on that ephemeral disk (which cloud-init would have wiped if it could access it) | 16:46 |
smoser | so on that boot, cloud-init was able to mount the filesystem | 16:46 |
smoser | and then that person booted into a kernel (or os) that did not have ntfs | 16:46 |
smoser | all while ignoring the "do not put important data here" warning | 16:47 |
paulmey | https://github.com/Azure/WALinuxAgent/blob/fb7d6c51dac236538a8c9eb8e752159d5e3f54b8/azurelinuxagent/daemon/resourcedisk/default.py#L149 | 16:47 |
smoser | *and* MS wipes that disk for them sometimes... | 16:47 |
paulmey | waagent doesn't care at all | 16:47 |
smoser | (wipes on a re-locate) | 16:47 |
smoser | so basically... if this issue comes up, i'll just say "i think your instance must have gotten re-located, and MS lost your data, not cloud-init" | 16:48 |
smoser | :) | 16:48 |
smoser | can i have your phone number to give the user in that case, paulmey ? | 16:48 |
paulmey | sure, I'll pm it | 16:48 |
smoser | :) | 16:48 |
smoser | can you think of some reasonable path that got a user to this stage ? | 16:49 |
rharper | paulmey: I just wanted smoser to sign off on the approach as well given the effort we went through to make sure cloud-init didn't wipe user-data; | 16:49 |
smoser | i really really don't want to delete people's data | 16:49 |
paulmey | I understand... but their data should not be there in the first place, it's not your fault ? | 16:50 |
smoser | :) that was joking. | 16:50 |
smoser | how did they get the data there was more the question | 16:51 |
smoser | i think this policy seems sane: | 16:51 |
smoser | a.) if we can mount it, then check... dont reformat it if it has files | 16:51 |
paulmey | agreed | 16:52 |
smoser | b.) if we cannot mount it... then check a config variable, if that variable says not to delete, then dont. | 16:52 |
smoser | how did the user get into the 'b' case... | 16:53 |
smoser | the 'b' case where they *had* data on it. | 16:53 |
paulmey | Default when config is not there is to delete if we can't mount? | 16:53 |
smoser | i really think that is probably in the order of 7 people ever | 16:53 |
smoser | right. default the config to allow cloud-init to reformat ntfs filesystems on the ephemeral disk | 16:54 |
smoser | but we also want to make sure that this is *ONLY* the ephemeral disk that we might do this to | 16:54 |
paulmey | In the 4 years that I've done oncall here, I've never seen a complaint about data loss on the ephemeral drive... | 16:54 |
paulmey | My guess is that customer support just points to the docs and helps the customer go through the stages of grief | 16:55 |
smoser | https://siliconangle.com/blog/2011/08/01/third-largest-bitcoin-exchange-bitomat-lost-their-wallet-over-17000-bitcoins-missing/ | 16:55 |
* smoser wants to avoid being the reason that that happened | 16:56 | |
paulmey | it's very hard to protect people from aiming at their foot | 16:56 |
paulmey | but I appreciate the effort | 16:57 |
smoser | :) | 16:57 |
paulmey | I sign off on your policy, @rharper ? | 16:57 |
paulmey | btw, usually people end up in the b) case due to cloud-init not being able to format the disk in the first place. However, the issue is usually more that the fs is now NTFS instead of ext4... | 16:59 |
rharper | paulmey: smoser's additions sound fine to me; but that is new space; also requires new cloud-init though I suppose they need that anyhow today | 16:59 |
paulmey | creating swap files or the docker directory on the ephemeral drive if it is NTFS is not going to work well... | 16:59 |
rharper | smoser: did you have a suggestion for the config namespace , what key should users add to say, don't nuke my ntfs drive even if you can't mount it ? | 17:00 |
paulmey | note that the code is in the DSAzure... so maybe a ds config? | 17:01 |
smoser | yeah, in the azure datasource is the right place it seems | 17:05 |
smoser | you'll have to be careful though to not just set it in the dsconfig in that module. | 17:05 |
smoser | because if you do that, it ends up trumping config on disk | 17:05 |
smoser | we just need to make sure that if the user put it in /etc/cloud/cloud.cfg.d/ or in user-data, that it is respected. | 17:06 |
paulmey | ok, I'll try to make tests for that | 17:06 |
paulmey | specifically | 17:06 |
smoser | paulmey: and can we please make sure that we WARN if the 'a' case found files on it | 17:07 |
smoser | WARNING: it looks like you're using NTFS on the ephemeral disk, to ensure that filesystem does not get wiped, set THIS_CONFIG_OPTION | 17:07 |
paulmey | will do | 17:08 |
paulmey | datasource.azure.never-destroy-ntfs ? | 17:09 |
paulmey | or datasource.Azure.never-destroy-ntfs (I need to look that up) | 17:10 |
smoser | underscores rather than - | 17:13 |
smoser | and lets qualify that with 'ephemeral' in some way | 17:13 |
paulmey | ok, datasource.Azure.never_destroy_ntfs_on_ephemeral_disk ? | 17:15 |
robjo | Need some help please, out of ideas about what to try :( need the yaml to configure ntp | 17:19 |
smoser | robjo: ok. | 17:20 |
robjo | - ntp as entry under cloud_config_modules: enters cc_ntp.py | 17:20 |
robjo | but as soon as I add anything then things fall over | 17:20 |
robjo | I am thinking I should be able to do | 17:21 |
robjo | - ntp: | 17:21 |
smoser | pastebin what you're giving it ? | 17:22 |
robjo | https://paste.ubuntu.com/p/58DdVk3wYB/ | 17:24 |
smoser | it's not under cloud_config_modules | 17:25 |
smoser | cloud_config_modules is just a list of modules that are to be run | 17:26 |
robjo | OK, so what should it look like? | 17:26 |
smoser | just put it in as | 17:26 |
smoser | http://paste.ubuntu.com/p/52DgZBNcJh/ | 17:26 |
smoser | http://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp | 17:28 |
smoser | the code blocks (Examples) might help as an example | 17:29 |
robjo | but shouldn't I see cc_ntp somewhere in the log? | 17:30 |
robjo | I even put a debug statement in handle (first line) and I am not getting it | 17:30 |
robjo | I am only getting the debug if I have -ntp under config modules | 17:31 |
robjo | but then of course I cannot configure anything | 17:31 |
robjo | or maybe I am testing wrong, I tested with "cloud-init modules" and "cloud-init init" and in both cases I am not seeing the expected output in the log?? | 17:35 |
smoser | robjo: well to most easily test, | 17:38 |
rharper | if you're running cloud-init from the command line to re-run modules, you likely want cloud-init --file myconfig.cfg --debug single --name cc_ntp --frequency always --report | 17:39 |
smoser | cloud-init single --name= | 17:39 |
smoser | yeah that^ | 17:39 |
rharper | where myconfig.cfg contains the config from smoser's paste | 17:39 |
smoser | but i'd just put myconfig.cfg in /etc/cloud/cloud.cfg.d and not bother with --file | 17:39 |
rharper | otherwise, if you want to test how it's called during firstboot, you can do: cloud-init clean --logs --reboot; | 17:40 |
robjo | thanks, testing | 17:46 |
paulmey | smoser: so should I still check for mount.ntfs? or is it safe enough to assume that if that existed on $PATH, mount would have found it? | 17:53 |
smoser | well, i think we just try to mount it | 17:56 |
smoser | with mount -t ntfs | 17:57 |
smoser | if that doesn't work, then we consult that config option | 17:57 |
smoser | right ? | 17:57 |
paulmey | yeah I'm all for that | 17:58 |
robjo | OK, I cleared out /etc/cloud/cloud.cfg and set it to only contain what's in http://paste.ubuntu.com/p/52DgZBNcJh/ | 18:05 |
robjo | then I ran cloud-init clean --logs --reboot | 18:06 |
robjo | and when the machine comes back there is no trace of cc_ntp running in the log file | 18:06 |
robjo | and grep ubuntu /etc/ntp.conf comes back empty | 18:07 |
smoser | can you paste the log ? /var/log/cloud-init.log | 18:07 |
smoser | i suspect your changes to cloud_config_modules made it not run | 18:07 |
robjo | well there is no cloud_config_modules section now, do I need both? i.e. -ntp in the modules section and the ntp config outside? | 18:11 |
smoser | cloud_config_modules: is just a list of modules | 18:11 |
smoser | not config of those modules. | 18:11 |
smoser | if you took that out, then it wont run the ntp module | 18:11 |
robjo | yay, sorry for being so dumb about this, just goes to show how often I look/need to look at the config | 18:14 |
robjo | I guess it would be good to have an ntp configuration example in the docs | 18:15 |
rharper | I'm pretty sure we do | 18:15 |
rharper | http://cloudinit.readthedocs.io/en/latest/topics/modules.html#ntp | 18:15 |
robjo | I looked, I promise and search fro ntp turned up empty on http://cloudinit.readthedocs.io/en/latest/topics/examples.html | 18:16 |
robjo | s/fro/for/ | 18:16 |
rharper | yes, not added to examples. but examples under the module documentation | 18:16 |
rharper | we could add a message in the examples page to refer to module docs for further examples ? | 18:16 |
robjo | well, the disconnect (and I was looking at the sources, which basically contain that example) is that it is not obvious, at least it was not obvious to me, that - ntp was needed under cloud_config_modules and the actual configuration for the parameters goes outside of the cloud_config_modules section | 18:18 |
robjo | which is why I poked around at first trying to shoehorn the ntp config itself into the cloud_config_modules section | 18:18 |
robjo | which obviously didn't work either | 18:19 |
rharper | so, the way I look at it, and we can clarify, is /etc/cloud configures cloud-init; and users configure the modules via user-data | 18:19 |
rharper | but I too struggled with the disconnect for a while | 18:19 |
rharper | /etc being the system/distro config space, and user-data covers things the user would like to change, modify | 18:20 |
robjo | Well that's fair but there are cases where this kind of stuff is built into an image and then you end up in the situation I was just in | 18:20 |
robjo | of course I now learned my lesson and thanks for the help, but if it happens to me I bet I will not be the only one | 18:21 |
rharper | for sure | 18:21 |
robjo | others may just not necessarily know where to go for help | 18:21 |
rharper | do you have a suggestion for the docs page? | 18:21 |
robjo | so a short example cloud.cfg file that shows the key bits of this case may be helpful | 18:21 |
rharper | this case being ? system-wide ntp default configuration ? | 18:24 |
rharper | sorry, I missed the start of the convo | 18:24 |
robjo | yes | 18:26 |
rharper | yeah; that seems like a good add | 18:27 |
robjo | hmm doc/examples/ in the source tree has cloud-config-ntp.txt but that doesn't show up on http://cloudinit.readthedocs.io/en/latest/topics/examples.html | 18:30 |
robjo | but that looks to be an example about how to send the information as user-data | 18:30 |
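
The policy smoser and paulmey settle on for an NTFS-formatted ephemeral disk (the a/b cases at 16:51 to 16:54, plus the warning requested at 17:07) can be sketched as a small decision function. This is only an illustration under stated assumptions: `Action`, `decide_ntfs_action`, and its parameters are hypothetical names, not cloud-init code, and the option name in the warning text is the one proposed in the chat at 17:15, which was not confirmed there.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    KEEP = "keep"
    REFORMAT = "reformat"


# Assumption: the option name is the proposal from the chat, not a confirmed key.
PROPOSED_OPTION = "never_destroy_ntfs_on_ephemeral_disk"


def decide_ntfs_action(
    can_mount: bool,
    has_files: bool = False,
    never_destroy_ntfs: bool = False,
) -> Tuple[Action, Optional[str]]:
    """Return (action, warning) for an NTFS ephemeral disk.

    Hypothetical helper: it applies only to the ephemeral disk, and the
    caller is assumed to have already verified that.
    """
    if can_mount:
        if has_files:
            # Case a: mountable and holds files, so keep it and warn the user.
            warning = (
                "WARNING: it looks like you're using NTFS on the ephemeral "
                "disk, to ensure that filesystem does not get wiped, set "
                + PROPOSED_OPTION
            )
            return Action.KEEP, warning
        # Mountable but empty: nothing to lose, so reformatting is allowed.
        return Action.REFORMAT, None
    # Case b: cannot mount, so the config option decides.
    # The default (option unset) allows the reformat.
    if never_destroy_ntfs:
        return Action.KEEP, None
    return Action.REFORMAT, None


if __name__ == "__main__":
    print(decide_ntfs_action(can_mount=True, has_files=True)[0])
    print(decide_ntfs_action(can_mount=False)[0])
    print(decide_ntfs_action(can_mount=False, never_destroy_ntfs=True)[0])
```

Keeping the decision separate from the actual `mount -t ntfs` attempt makes it straightforward to cover every branch in unit tests, which is what paulmey says he intends to do for the config handling.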
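
The split robjo ran into, where `cloud_config_modules` only lists which modules run while each module's settings live under their own top-level key, can be modelled in a few lines. This is a deliberately simplified sketch, not cloud-init's loader: `modules_to_run` is a hypothetical helper, the configuration is written as a Python dict mirroring the YAML, and `ntp.example.com` is a placeholder server name.

```python
from typing import Any, Dict, List, Optional, Tuple

# Equivalent YAML for this hypothetical configuration:
#
#   cloud_config_modules:
#     - ntp
#   ntp:
#     servers:
#       - ntp.example.com
full_config: Dict[str, Any] = {
    "cloud_config_modules": ["ntp"],
    "ntp": {"servers": ["ntp.example.com"]},
}


def modules_to_run(cfg: Dict[str, Any]) -> List[Tuple[str, Optional[Any]]]:
    """Pair each listed module with the settings stored under its own key.

    Simplified model: a module runs only if it is listed, and its settings
    are looked up at the top level, never inside the module list.
    """
    listed = cfg.get("cloud_config_modules", [])
    return [(name, cfg.get(name)) for name in listed]


if __name__ == "__main__":
    # Listed and configured: the module runs with its settings.
    print(modules_to_run(full_config))

    # Settings present but module not listed: nothing runs.
    print(modules_to_run({"ntp": {"servers": ["ntp.example.com"]}}))

    # Listed but no settings: the module is invoked with nothing to apply.
    print(modules_to_run({"cloud_config_modules": ["ntp"]}))
```

The second case corresponds to what robjo saw after clearing the file down to the ntp settings alone: with no module list, there was no trace of cc_ntp in the log.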