/srv/irclogs.ubuntu.com/2023/11/02/#cloud-init.txt

cjp256	if anyone has the cycles, can I get a review on 4560? I have one more queued up after it I'd like to get in for 23.4 if possible	13:56
Guest99	Can I use cloud-init to completely auto-install a server image? I have deployed a python http server on port 3003 and created the user-data and meta-data files and placed them in the www directory. I can see that when I boot up qemu that it sees the http server and gives a 200 on the user-data and meta-data files yet when I watch over a SPICE	14:08
Guest99	terminal it always gets stuck on the first step which is select your language/locale/keyboard. I've tried so many different combinations of user-data and I even did the install manually and then grabbed the installer user-data file out of the /var/log/autoinstall directory and tried it with that and still it gets stuck on that same bit. I'm kinda	14:08
Guest99	lost on where to go from here debugging-wise. Have you got any tips or help to point me in the right direction?	14:08
minimal	Guest99: from the reference to /var/log/autoinstall i'm guessing you're using ubuntu server, is that correct?	14:44
blackboxsw	cjp256: will grab it today.	15:47
dbungert	Guest99: one common problem that people run into is that if you're sending autoinstall via cloud-config, it has to be under an autoinstall top level keyword - double check that https://canonical-subiquity.readthedocs-hosted.com/en/latest/intro-to-autoinstall.html	15:49
blackboxsw	Guest99: how are you providing the URL to your system? I'm presuming kernel cmdline in GRUB with `ds=nocloud-net;s=http://your-service:3003/`? I'd also expect you provide an "autoinstall" param on the kernel cmdline too to avoid getting prompted for input	15:50
Guest99	minimal yeah the server version. Ultimately I'm trying to build an image that is like desktop ubuntu but with a lot of the stuff removed for our medical staff (Nurses) who are placed around the country.	15:50
Guest99	I will quickly add it to a pastebin	15:50
blackboxsw	Guest99: given you see the GET of user-data and meta-data. I'm just going to guess that the format of the autoinstall config in #cloud-config user-data is not valid and cloud-init doesn't process it in the ephemeral boot of the installer and so the installer doesn't see a processed autoinstall key. One can check by entering the shell via the help menu and `sudo cloud-init query userdata` or	15:53
blackboxsw	`sudo cloud-init schema --system`	15:53
Guest99	https://pastebin.com/We1yn5nB	15:54
Guest99	blackboxsw I will try with your grub string as it's different to mine. I read I would need to escape the ';' so it would look like `ds=nocloud-net\;s=http://your-service:3003/`	15:58
Guest99	But I will try without first	15:58
minimal	blackboxsw: does the Ubuntu installer support using cloud-init for BOTH doing an install via subiquity and THEN using cloud-init with a NoCloud-Net DS also?	15:59
blackboxsw	Guest99: yes that is invalid yaml user-data. try running `python3 -c 'import yaml; yaml.safe_load(open("your-userdata.yaml"))'` or `sudo cloud-init schema --system --annotate` and either will show you invalid yaml syntax for user-data	16:01
Guest99	Unfortunately it didn't work. Ahh I see thanks	16:01
Guest99	Where should I run the `cloud-init schema --system --annotate` command? In the www directory with the user-data?	16:02
blackboxsw	Guest99: `sudo cloud-init schema --system` only works on the target machine that has cloud-init installed and active to report the cached user-data processed by cloud-init. Since you already have a host that has cloud-init installed (and failed) you could run `sudo cloud-init schema --config-file <your_yaml_file>` to start seeing errors in config and it'd allow you to quickly change <your_yaml_file> and reattempt validation	16:04
blackboxsw	Guest99: so something like `sudo cloud-init query userdata > my_userdata.yaml; sudo cloud-init schema --config-file my_userdata.yaml`	16:05
blackboxsw	.. on that qemu system you have	16:05
Guest99	Okay thanks! I will give it a go :D	16:06
dbungert	Guest99: you have several things that should be under autoinstall and aren't, like source, ssh, storage, updates	16:06
dbungert	also line 76 needs an indentation fix	16:07
dbungert	minimal: yes, you can use cloud-config to configure the install environment, then have autoinstall, then have more user-data for the target system actually being installed.	16:07
blackboxsw	thx dbungert and Guest99 line 8 needs a trailing ":" after primary	16:08
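A rough sketch of the check dbungert describes, assuming the user-data has already been parsed into a dict (for instance with the `yaml.safe_load` one-liner above). The key set only lists the four names dbungert mentions, the helper name and sample data are made up for illustration, and some of these names may also be valid plain cloud-config keys, so a hit is a prompt to look, not proof of an error.

```python
# Keys dbungert called out as misplaced in the pastebin. This is NOT the
# full autoinstall schema, only the names mentioned in the discussion.
AUTOINSTALL_KEYS = {"source", "ssh", "storage", "updates"}


def misplaced_autoinstall_keys(user_data: dict) -> list:
    """Return top-level keys that probably belong under `autoinstall:`."""
    return sorted(AUTOINSTALL_KEYS & set(user_data))


# Hypothetical parsed user-data with the kind of mistake discussed above.
broken = {
    "autoinstall": {"version": 1},
    "storage": {"layout": {"name": "lvm"}},
    "ssh": {"install-server": True},
}

# The same settings nested where the installer looks for them.
fixed = {
    "autoinstall": {
        "version": 1,
        "storage": broken["storage"],
        "ssh": broken["ssh"],
    }
}

print(misplaced_autoinstall_keys(broken))  # ['ssh', 'storage']
print(misplaced_autoinstall_keys(fixed))   # []
```

This does not replace `cloud-init schema --config-file`, which validates the whole document; it only catches the nesting mistake.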
blackboxsw	minimal: the Ubuntu installer has two boot stages, "ephemeral" and "first boot". The ephemeral stage uses cloud-init to detect any viable datasource and consume its user-data, which can provide config applied during the "ephemeral" stage itself or "autoinstall" config that drives target system setup such as disk/partition setup etc.	16:19
blackboxsw	The installer then writes the supplemental configuration required by "first boot" as DataSourceNone configuration in /etc/cloud/cloud.cfg.d, so that cloud-init in that boot stage will only detect DataSourceNone config	16:19
blackboxsw	minimal: so ephemeral boot can accept any viable datasource (including nocloud content which could be provided by kernel cmdline or in image in /var/lib/cloud/seed etc or via a mounted device with CIDATA label etc). And that user-data doesn't have to be "autoinstall:" config, but the installer image itself and ephemeral boot stage is designed to take users through the installer prompts to ultimately configure a target vm.	16:22
blackboxsw	typically the easiest way to provide that nocloud data is manipulating grub on the cmdline appending cloud-init's ds=nocloud-net params like https://ubuntu.com/server/docs/install/autoinstall-quickstart and https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html#method-2-local-filesystem-kernel-commandline-or-smbios	16:26
blackboxsw	easy, yet manual.	16:27
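A small sketch of assembling that kernel command-line fragment. The helper name is hypothetical. It assumes, as Guest99 read, that the `;` has to be escaped when the argument goes into a GRUB entry, and it appends a trailing slash to match blackboxsw's example URL.

```python
def nocloud_cmdline(seed_url: str, for_grub: bool = True) -> str:
    """Build the `autoinstall ds=nocloud-net;s=...` kernel argument."""
    # blackboxsw's example URL ends in "/", so add one if it is missing.
    if not seed_url.endswith("/"):
        seed_url += "/"
    # Assumption: GRUB treats a bare ";" as a command separator, so it is
    # escaped there; elsewhere the plain form is used.
    separator = r"\;" if for_grub else ";"
    return f"autoinstall ds=nocloud-net{separator}s={seed_url}"


print(nocloud_cmdline("http://your-service:3003"))
print(nocloud_cmdline("http://your-service:3003/", for_grub=False))
```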
minimal	which reminds me I need to go back to collect up info/logs to write up an Issue for the NoCloud-Net network handling	16:56
meena	is there a way to override a runcmd instance script provided by hetzner? it's basically just: `udevadm trigger -c add -s block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume` and doesn't work on FreeBSD	18:22
rawtaz	i have a really weird problem. using terraform im provisioning a debian 12 image on an openstack (not mine). it uses cloud-init to among other things create one "admin" user (with users:) and put an ed25519 pubkey into that user's authorized_keys (with ssh_authorized_keys: for that user).	18:23
rawtaz	the weird thing is that while the instance is provisioned seemingly properly, it's up and has network, only like one third of the times i provision this sucker the authorized_keys are actually populated. which makes me unable to log in on the host in the other two out of three times.	18:23
rawtaz	what makes it weird is that this exact setup is not changing at all between when it works and doesnt. the only thing i do between reproductions of the problem is to `terraform destroy` and then clean all terraform files, and then i start over with a new `terraform init` followed by apply. so its literally the exact same thing running every time. yet, sometimes cloud-init does not populate the authorized_keys xD	18:25
rawtaz	if anyone recognizes any of these symptoms, feel free to let me know. im gonna try to change the image used to debian 11 instead, to see if i can isolate it to the openstack provider's debian 12 image.	18:25
rawtaz	its just so weird.	18:26
rawtaz	ive used this terraform set up with the exact same cloud-init configuration like a year ago, on the same openstack provider, and it worked fine every single time i provisioned an instance (which were a number of times). that suggests it might be debian 12 image used, but its also possible something else changed in their infra since then.	18:27
rawtaz	around a year ago or whenever i ran this and it worked fine 100% of the time, cloud-init v. 20.4.1 was used, now it's v. 22.4.2 in case that matters.	18:34
minimal	meena: I assume that's coming from their vendor stuff, what if you remove "scripts_vendor" from cloud.cfg's module list? ;-)	18:35
meena	I've killed the machine already, but that would make sense.	18:36
meena	it's just confusing that we log it as: 2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)	18:36
meena	cc_scripts_user.py	18:36
minimal	rawtaz: have you checked the /var/log/cloud-init.log file when it doesn't work? or can you not SSH in to do that? ;-) Console access instead?	18:37
rawtaz	minimal: i have not. im still doing some big picture isolating of low hanging fruit, e.g. making sure i can reproduce it and then trying the old image i know worked before. after that, i might try to get console access (i'll have to remove `ssh_pwauth: false` and set a password for that, effectively changing the cloud-init config, which kind of sucks). i do have console access but cant log in on it due to no password for root	18:39
rawtaz	i do however have the logs for the provisioning of the instance, where i see most of the cloud-init and dmesg output. i guess there's a more specific cloud-init log though, i recall.	18:40
rawtaz	just to be clear; the symptoms that make me think it's cloud-init not adding pubkey to authorized_keys are: 1) the instance log does not show the "Authorized keys from /home/admin/.ssh/authorized_keys for user admin" output when it doesnt work; and 2) i cant ssh to it, the different methods are exhausted when i try.	18:41
rawtaz	question: im of course interested in /var/log/cloud-init.log and /var/log/cloud-init-output.log - is there a setting to enable more verbose debugging in them?	18:43
minimal	rawtaz: for cloud-init.log that's controlled by /etc/cloud/cloud.cfg.d/05_logging.cfg	18:46
rawtaz	and that is not something i can change with user-data, right?	18:47
minimal	basically in "[logger_cloudinit]" section you want level=DEBUG	18:47
rawtaz	(i presume it's set in the image that the openstack provider loads when provisioning the instance)	18:47
minimal	hmm, not sure. Once you get in check it anyway as it might have debugging on already	18:48
rawtaz	yeah. thanks :)	18:49
minimal	are you referring to "ssh_authorized_keys:" in top-level of user-data or "ssh_authorized_keys:" inside a "users:" section?	18:51
blackboxsw	rawtaz: by default cloud-init's log level should be set to DEBUG in most images. And the logs are very very noisy. For network config, openstack cloud-init will typically get network config from network_data.json, it'd be good to 'egrep "Applying network\|network_data" /var/log/cloud-init.log' on a failed system as it's possible to see that network_data.json was provided by the openstack API and that cloud-init applied it.	18:52
rawtaz	minimal: the latter, under a user in users:	18:52
rawtaz	blackboxsw: that's good news :D	18:53
minimal	rawtaz: then you'd want to search the cloud-init.log file for "Running module users_groups" and then in that section, when it handles the relevant user(s), you should see "Writing to /home/XYZ/ssh/authorizedkeys - wb: [600] 99 bytes"	18:56
minimal	s/ssh/.ssh/	18:56
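A minimal sketch of that log search. The pattern is based on minimal's recollection of the message, with the `.ssh` correction applied and assuming the file name is `authorized_keys`. The sample text is invented for illustration and is not real cloud-init output.

```python
import re

# Matches "Writing to <path>/.ssh/authorized_keys" as quoted from memory
# in the channel; the exact wording in a given release may differ.
WRITE_RE = re.compile(r"Writing to (\S+/\.ssh/authorized_keys)\b")


def authorized_keys_writes(log_text: str) -> list:
    """Return every authorized_keys path the log says was written."""
    return WRITE_RE.findall(log_text)


# Made-up log excerpt shaped like the lines minimal describes.
sample_log = """\
Running module users_groups
Writing to /home/admin/.ssh/authorized_keys - wb: [600] 99 bytes
"""

print(authorized_keys_writes(sample_log))    # ['/home/admin/.ssh/authorized_keys']
print(authorized_keys_writes("no writes"))   # []
```

An empty result on a failed instance would point at the users_groups step never handling the user, rather than at sshd.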
rawtaz	awesome. i will look for that if i can get myself into the instance after it shows the problem. thank you	18:57
blackboxsw	rawtaz: if you are seeing highly repeatable failure on openstack that seems to alternate on a predictable frequency (as in every other launch or every third launch) it's possible that there's a misconfiguration issue in some of the openstack nodes participating in the nova service back plane resulting in invalid data from that service every time round-robin load balancing talks to the "broken" service node.	19:04
rawtaz	hmm, good point. but even if that is the case, wouldnt my provisionings compete with other users on the same openstack platform's provisionings, i mean in the round-robin? such that i can never know if i hit the first, second or third round, so to speak, as others interfere all the time.	19:08
blackboxsw	rawtaz: good pt. wasn't sure how public/private this openstack deployment is for you.	21:05
rawtaz	it's one that is used by many customers. i have no way of knowing when or whether other customers provision at the same time, but i would presume that over a couple of hours i can hardly be the only one.	21:06
rawtaz	ive been doing provision after provision now, every time the exact same way so the only difference is time (which varies by at most 30 seconds between each try).	21:08
rawtaz	i am unable to reproduce it over six attempts on the debian 11 image with cloud-init 20.4.1 but can reproduce it on the debian 12 image with cloud-init 22.4.2	21:09
rawtaz	the bad news is that i have not been able to reproduce it with a slightly altered user-data which sets a password for root so i could get to the logs :(	21:09
rawtaz	BEWM! i think i got a reproduction on the debian 12 and with ability to log in (assuming cloud-init correctly set that part of the user-data)	21:19
rawtaz	considering the fact that the exact same user-data can result in both success and failure, i'd say that suggests theres something in the openstack. but heck. very inconsistent	21:20
blackboxsw	yes agreed, if you are using the same base image, it's either a timing thing with network setup or a timing thing where openstack isn't providing the right network-config info to the VM launched	21:21
rawtaz	grr. i cant login with the user that's in my users: in the user-data, neither on the console of the instance nor using ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password ..	21:30
rawtaz	i wonder if it's even created. i might have to see if i can put a root user password in the user-data at a higher level than users: (in case users: isnt processed at all)	21:31
rawtaz	i think i found something that might be very relevant. in the failed provisionings, i have "cloud-init[546]: 2023-11-02 16:09:00,377 - cc_final_message.py[WARNING]: Used fallback datasource" right after the "Cloud-init v. 22.4.2 finished" line.	21:43
minimal	yeah that doesn't seem good	21:44
rawtaz	this warning is not present when provisioning succeeds, and what is described about it at https://cloudinit.readthedocs.io/en/latest/reference/datasources/fallback.html is in line with the symptoms im seeing (parts or all of user-data not being used at all).	21:44
blackboxsw	ok yes definitely. `cloud-init status --long` or `cloud-id` should confirm that you are DatasourceNone. So something didn't detect openstack for some reason	21:44
blackboxsw	and `cat /run/cloud-init/ds-identify.log` should tell you whether OpenStack datasource was detected as a maybe or match on the failed runs.	21:45
rawtaz	wow, spot on there blackboxsw. the failed ones have Datasource DataSourceNone and the successful ones have Datasource DataSourceOpenStack	21:45
blackboxsw	I expect there's a possibility of URL GET retry expiration in /var/log/cloud-init.log resulting in some message like OpenStack not detected	21:46
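Since rawtaz can read the instance console log without logging in, the fallback warning can be checked from that text alone. A rough sketch, with a hypothetical function name; the warning string and the failing sample line are taken from rawtaz's message above, while the successful sample is just that log minus the warning.

```python
FALLBACK_MARKER = "Used fallback datasource"


def used_fallback_datasource(console_log: str) -> bool:
    """True if cloud-init warned that it fell back to DataSourceNone."""
    return any(FALLBACK_MARKER in line for line in console_log.splitlines())


# Line quoted by rawtaz from a failed provisioning.
failed_log = (
    "cloud-init[546]: 2023-11-02 16:09:00,377 - "
    "cc_final_message.py[WARNING]: Used fallback datasource\n"
)
# Illustrative stand-in for a successful run: no warning present.
ok_log = "Cloud-init v. 22.4.2 finished\n"

print(used_fallback_datasource(failed_log))  # True
print(used_fallback_datasource(ok_log))      # False
```

Running this over the console log of each fresh instance would give a quick pass/fail count to hand to the hosting provider.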
rawtaz	the thing is i have no idea how i could possibly get into the system. im not aware of a default login/password in this debian image they provision :D	21:46
rawtaz	i think you may be right.. i think that with this information the next step would be to talk with the hosting provider about it. is there anything else you think i should do/gather before that?	21:47
rawtaz	(without being able to log in to the failed systems)	21:47
blackboxsw	rawtaz: you could override default root user with something like this in user-data https://dpaste.com/HLYJM7YZA	21:55
blackboxsw	hrm..... n/m yet the prob is that your openstack datasource isn't being detected so user-data isn't passed through nevermind .... thinking again	21:56
rawtaz	blackboxsw: the problem seems to be that user-data isnt processed at all :D	21:56
rawtaz	im writing them a mail and will talk with them tomorrow. i have a good feeling that theyll be able to resolve it now that we identified such a specific message :)	21:57
rawtaz	it could very well be a round-robin thing, that i cant see a pattern in due to other customers rotating it as well	21:57
blackboxsw	sounds good, right, I can't see how you'd otherwise be able to debug it if Openstack isn't detected on some launches unless you created your own snapshot image that defined DataSourceNone configuration in /etc/cloud/cloud.cfg.d/90-myds.cfg which provided the cloud-config I mentioned above. Then at least you'd have set the default password if DataSourceNone got detected. Example https://pastebin.mozilla.org/cx8Xiem3	22:04
blackboxsw	#do_not_use_in_production as it sets a default passwd for root :)	22:05
rawtaz	yeah that would have been good. but its too much work, i'd rather they fix the issue for real	22:07
rawtaz	haha nice :)	22:07
blackboxsw	yep, not your job to sort the openstack config issues	22:07
rawtaz	thank you. really, for all the input and help. it was very valuable!	22:09
rawtaz	both of you :)	22:09
rawtaz	bam, mail done	22:13
rawtaz	thats about nine hours of my life i will never get back :/	22:13
rawtaz	i had to do other work today, bah	22:13
