cjp256 | if anyone has the cycles, can I get a review on 4560? I have one more queued up after it I'd like to get in for 23.4 if possible | 13:56 |
---|---|---|
Guest99 | Can I use cloud-init to completely auto install a server image? I have deployed a python http server on port 3003 and created the user-data and meta-data files and placed them in the www directory. I can see that when I boot up qemu that it sees the http server and gives a 200 on the user-data and meta-data files yet when I watch over a SPICE | 14:08 |
Guest99 | terminal it always gets stuck on the first step which is select your language/locale/keyboard. I've tried so many different combinations of user-data and I even did the installer manually and then grabbed the installer user-data file out of the /var/log/autoinstall directory and tried it with that and still it gets stuck on that same bit. I'm kinda | 14:08 |
Guest99 | lost on where to go from here debugging-wise. have you got any tips or help to aid me in the right direction? | 14:08 |
minimal | Guest99: from the reference to /var/log/autoinstall i'm guessing you're using ubuntu server, is that correct? | 14:44 |
blackboxsw | cjp256: will grab it today. | 15:47 |
dbungert | Guest99: one common problem that people run into is that if you're sending autoinstall via cloud-config, it has to be under an autoinstall top level keyword - double check that https://canonical-subiquity.readthedocs-hosted.com/en/latest/intro-to-autoinstall.html | 15:49 |
blackboxsw | Guest99: how are you providing the URL to your system? I'm presuming kernel cmdline in GRUB with `ds=nocloud-net;s=http://your-service:3003/`? I'd also expect you provide an "autoinstall" param on the kernel cmdline too to avoid getting prompted for input | 15:50 |
Guest99 | minimal yeah the server version. Ultimately I'm trying to build an image that is like desktop ubuntu but with a lot of the stuff removed for our medical staff (Nurses) who are placed around the country. | 15:50 |
Guest99 | I will quickly add it to a pastebin | 15:50 |
blackboxsw | Guest99: given you see the GET of user-data and meta-data. I'm just going to guess that the format of the autoinstall config in #cloud-config user-data is not valid and cloud-init doesn't process it in the ephemeral boot of the installer and so the installer doesn't see a processed autoinstall key. One can check by entering the shell via the help menu and `sudo cloud-init query userdata` or | 15:53 |
blackboxsw | `sudo cloud-init schema --system` | 15:53 |
Guest99 | https://pastebin.com/We1yn5nB | 15:54 |
Guest99 | blackboxsw I will try with your grub string as it's different to mine. I read I would need to escape the ';' so it would look like `ds=nocloud-net\;s=http://your-service:3003/` | 15:58 |
Guest99 | But I will try without first | 15:58 |
minimal | blackboxsw: does the Ubuntu installer support using cloud-init for BOTH doing an install via subiquity and THEN using cloud-init with a NoCloud-Net DS also? | 15:59 |
blackboxsw | Guest99: yes that is invalid yaml user-data. try running `python3 -c 'import yaml; yaml.safe_load(open("your-userdata.yaml"))'` or `sudo cloud-init schema --system --annotate` and either will show you invalid yaml syntax for user-data | 16:01 |
Guest99 | Unfortunately it didn't work. Ahh I see thanks | 16:01 |
Guest99 | Where should i run the `cloud-init schema --system --annotate` command? in the www directory with the user-data? | 16:02 |
blackboxsw | Guest99: `sudo cloud-init schema --system` only works on the target machine that has cloud-init installed and active to report the cached user-data processed by cloud-init. Since you already have a host that has cloud-init installed (and failed) you could run `sudo cloud-init schema --config-file <your_yaml_file>` to start seeing errors in config and it'd allow you to quickly change <your_yaml_file> and reattempt validation | 16:04 |
blackboxsw | Guest99: so something like `sudo cloud-init query userdata > my_userdata.yaml; sudo cloud-init schema --config-file my_userdata.yaml` | 16:05 |
blackboxsw | .. on that qemu system you have | 16:05 |
Guest99 | Okay thanks! I will give it a go :D | 16:06 |
dbungert | Guest99: you have several things that should be under autoinstall and aren't, like source, ssh, storage, updates | 16:06 |
dbungert | also line 76 needs an indentation fix | 16:07 |
dbungert | minimal: yes, you can use cloud-config to configure the install environment, then have autoinstall, then have more user-data for the target system actually being installed. | 16:07 |
blackboxsw | thx dbungert and Guest99 line 8 needs a trailing ":" after primary | 16:08 |
blackboxsw | minimal: the Ubuntu installer has two boot stages, "ephemeral" and "first boot". The ephemeral stage uses cloud-init to detect any viable datasource and consume its user-data, which provides either config applied during the "ephemeral" stage or "autoinstall" config which performs target system setup such as disk/partition setup etc. | 16:19 |
blackboxsw | The installer then writes the supplemental configuration required by "first boot" as DataSourceNone configuration in /etc/cloud/cloud.cfg.d, so cloud-init in that boot stage will only detect DataSourceNone config | 16:19 |
blackboxsw | minimal: so ephemeral boot can accept any viable datasource (including nocloud content which could be provided by kernel cmdline or in image in /var/lib/cloud/seed etc or via a mounted device with CIDATA label etc). And that user-data doesn't have to be "autoinstall:" config, but the installer image itself and ephemeral boot stage is designed to take users through the installer prompts to ultimately configure a target vm. | 16:22 |
blackboxsw | typically the easiest way to provide that nocloud data is manipulating grub on the cmdline appending cloud-init's ds=nocloud-net params like https://ubuntu.com/server/docs/install/autoinstall-quickstart and https://cloudinit.readthedocs.io/en/latest/reference/datasources/nocloud.html#method-2-local-filesystem-kernel-commandline-or-smbios | 16:26 |
blackboxsw | easy, yet manual. | 16:27 |
minimal | which reminds me I need to go back to collect up info/logs to write up an Issue for the NoCloud-Net network handling | 16:56 |
meena | is there a way to override a runcmd instance script provided by hetzner? it's basically just: `udevadm trigger -c add -s block -p ID_VENDOR=HC --verbose -p ID_MODEL=Volume` and doesn't work on FreeBSD | 18:22 |
rawtaz | i have a really weird problem. using terraform im provisioning a debian 12 image on an openstack (not mine). it uses cloud-init to among other things create one "admin" user (with users:) and put an ed25519 pubkey into that user's authorized_keys (with ssh_authorized_keys: for that user). | 18:23 |
rawtaz | the weird thing is that while the instance is provisioned seemingly properly, it's up and has network, only like one third of the times i provision this sucker the authorized_keys are actually populated. which makes me unable to log in on the host in the other two out of three times. | 18:23 |
rawtaz | what makes it weird is that this exact setup is not changing at all between when it works and doesnt. the only thing i do between reproductions of the problem is to `terraform destroy` and then clean all terraform files, and then i start over with a new `terraform init` followed by apply. so its literally the exact same thing running every time. yet, sometimes cloud-init does not populate the authorized_keys xD | 18:25 |
rawtaz | if anyone recognizes any of these symptoms, feel free to let me know. im gonna try to change the image used to debian 11 instead, to see if i can isolate it to the openstack provider's debian 12 image. | 18:25 |
rawtaz | its just so weird. | 18:26 |
rawtaz | ive used this terraform set up with the exact same cloud-init configuration like a year ago, on the same openstack provider, and it worked fine every single time i provisioned an instance (which were a number of times). that suggests it might be debian 12 image used, but its also possible something else changed in their infra since then. | 18:27 |
rawtaz | around a year ago or whenever i ran this and it worked fine 100% of the time, cloud-init v. 20.4.1 was used, now it's v. 22.4.2 in case that matters. | 18:34 |
minimal | meena: I assume that's coming from their vendor stuff, what if you remove "scripts_vendor" from cloud.cfg's module list? ;-) | 18:35 |
meena | I've killed the machine already, but that would make sense. | 18:36 |
meena | it's just confusing that we log it as: 2023-11-02 18:14:38,648 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts) | 18:36 |
meena | cc_scripts_user.py | 18:36 |
minimal | rawtaz: have you checked the /var/log/cloud-init.log file when it doesn't work? or can you not SSH in to do that? ;-) Console access instead? | 18:37 |
rawtaz | minimal: i have not. im still doing some big picture isolating of low hanging fruit, e.g. making sure i can reproduce it and then trying the old image i know worked before. after that, i might try to get console access (i'll have to remove `ssh_pwauth: false` and set a password for that, effectively changing the cloud-init config, which kind of sucks). i do have console access but cant log in on it due to no password for root | 18:39 |
rawtaz | i do however have the logs for the provisioning of the instance, where i see most of the cloud-init and dmesg output. i guess there's a more specific cloud-init log though, i recall. | 18:40 |
rawtaz | just to be clear; the symptoms that make me think it's cloud-init not adding pubkey to authorized_keys are: 1) the instance log does not show the "Authorized keys from /home/admin/.ssh/authorized_keys for user admin" output when it doesnt work; and 2) i cant ssh to it, the different methods are exhausted when i try. | 18:41 |
rawtaz | question: im of course interested in /var/log/cloud-init.log and /var/log/cloud-init-output.log - is there a setting to enable more verbose debugging in them? | 18:43 |
minimal | rawtaz: for cloud-init.log that's controlled by /etc/cloud/cloud.cfg.d/05_logging.cfg | 18:46 |
rawtaz | and that is not something i can change with user-data, right? | 18:47 |
minimal | basically in "[logger_cloudinit]" section you want level=DEBUG | 18:47 |
rawtaz | (i presume it's set in the image that the openstack provider loads when provisioning the instance | 18:47 |
minimal | hmm, not sure. Once you get in check it anyway as it might have debugging on already | 18:48 |
rawtaz | yeah. thanks :) | 18:49 |
minimal | are you referring to "ssh_authorized_keys:" in top-level of user-data or "ssh_authorized_keys:" inside a "users:" section? | 18:51 |
blackboxsw | rawtaz: by default cloud-init's log level should be set to DEBUG in most images. And the logs are very very noisy. For network config, openstack cloud-init will typically get network config from network_data.json, it'd be good to 'egrep "Applying network|network_data" /var/log/cloud-init.log' on a failed system as it's possible to see that network_data.json was provided by the openstack API and that cloud-init applied it. | 18:52 |
rawtaz | minimal: the latter, under a user in users: | 18:52 |
rawtaz | blackboxsw: that's good news :D | 18:53 |
minimal | rawtaz: then you'd want to search the cloud-init.log file for "Running module users_groups" and then in that section, when it handles the relevant user(s), you should see "Writing to /home/XYZ/ssh/authorized_keys - wb: [600] 99 bytes" | 18:56 |
minimal | s/ssh/.ssh/ | 18:56 |
rawtaz | awesome. i will look for that if i can get myself into the instance after it shows the problem. thank you | 18:57 |
blackboxsw | rawtaz: if you are seeing highly repeatable failure on openstack that seems to alternate on a predictable frequency (as in every other launch or every third launch) it's possible that there's a misconfiguration issue in some of the openstack nodes participating in the nova service back plane resulting in invalid data from that service every time round-robin load balancing talks to the "broken" service node. | 19:04 |
rawtaz | hmm, good point. but even if that is the case, wouldnt my provisionings compete with other users on the same openstack platform's provisionings, i mean in the round-robin? such that i can never know if i hit the first, second or third round, so to speak, as others interfere all the time. | 19:08 |
blackboxsw | rawtaz: good pt. wasn't sure how public/private this openstack deployment is for you. | 21:05 |
rawtaz | it's one that is used by many customers. i have no way of knowing when or whether other customers provision at the same time, but i would presume that over a couple of hours i can hardly be the only one. | 21:06 |
rawtaz | ive been doing provision after provision now, every time the exact same way so the only difference is time (which varies by at most 30 seconds between each try). | 21:08 |
rawtaz | i am unable to reproduce it over six attempts on the debian 11 image with cloud-init 20.4.1 but can reproduce it on the debian 12 image with cloud-init 22.4.2 | 21:09 |
rawtaz | the bad news is that i have not been able to reproduce it with a slightly altered user-data which sets a password for root so i could get to the logs :( | 21:09 |
rawtaz | BEWM! i think i got a reproduction on the debian 12 and with ability to log in (assuming cloud-init correctly set that part of the user-data) | 21:19 |
rawtaz | considering the fact that the exact same user-data can result in both success and failure, i'd say that suggests theres something in the openstack. but heck. very inconsistent | 21:20 |
blackboxsw | yes agreed, if you are using the same base image, it's either a timing thing with network setup or a timing thing where openstack isn't providing the right network-config info to the VM launched | 21:21 |
rawtaz | grr. i cant login with the user that's in my users: in the user-data, neither on the console of the instance nor using ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password .. | 21:30 |
rawtaz | i wonder if it's even created. i might have to see if i can put a root user password in the user-data at a higher level than users: (in case users: isnt processed at all) | 21:31 |
rawtaz | i think i found something that might be very relevant. in the failed provisionings, i have "cloud-init[546]: 2023-11-02 16:09:00,377 - cc_final_message.py[WARNING]: Used fallback datasource" right after the "Cloud-init v. 22.4.2 finished" line. | 21:43 |
minimal | yeah that doesn't seem good | 21:44 |
rawtaz | this warning is not present when provisioning succeeds, and what is described about it at https://cloudinit.readthedocs.io/en/latest/reference/datasources/fallback.html is in line with the symptoms im seeing (parts or all of user-data not being used at all). | 21:44 |
blackboxsw | ok yes definitely. `cloud-init status --long` or `cloud-id` should confirm that you are DatasourceNone. So something didn't detect openstack for some reason | 21:44 |
blackboxsw | and `cat /run/cloud-init/ds-identify.log` should tell you whether OpenStack datasource was detected as a maybe or match on the failed runs. | 21:45 |
rawtaz | wow, spot on there blackboxsw. the failed ones have Datasource DataSourceNone and the successful ones have Datasource DataSourceOpenStack | 21:45 |
blackboxsw | I expect there's a possibility of URL GET retry expiration in /var/log/cloud-init.log resulting in some message like OpenStack not detected | 21:46 |
rawtaz | the thing is i have no idea how i could possibly get into the system. im not aware of a default login/password in this debian image they provision :D | 21:46 |
rawtaz | i think you may be right.. i think that with this information the next step would be to talk with the hosting provider about it. is there anything else you think i should do/gather before that? | 21:47 |
rawtaz | (without being able to log in to the failed systems) | 21:47 |
blackboxsw | rawtaz: you could override default root user with something like this in user-data https://dpaste.com/HLYJM7YZA | 21:55 |
blackboxsw | hrm..... n/m yet the prob is that your openstack datasource isn't being detected so user-data isn't passed through nevermind .... thinking again | 21:56 |
rawtaz | blackboxsw: the problem seems to be that user-data isnt processed at all :D | 21:56 |
rawtaz | im writing them a mail and will talk with them tomorrow. i have a good feeling that theyll be able to resolve it now that we identified such a specific message :) | 21:57 |
rawtaz | it could very well be a round-robin thing, that i cant see a pattern in due to other customers rotating it as well | 21:57 |
blackboxsw | sounds good, right, I can't see how you'd otherwise be able to debug it if Openstack isn't detected on some launches unless you created your own snapshot image that defined DataSourceNone configuration in /etc/cloud/cloud.cfg.d/90-myds.cfg which provided the cloud-config I mentioned above. Then at least you'd have set the default password if DataSourceNone got detected. Example https://pastebin.mozilla.org/cx8Xiem3 | 22:04 |
blackboxsw | #do_not_use_in_production as it sets a default passwd for root :) | 22:05 |
rawtaz | yeah that would have been good. but its too much work, i'd rather they fix the issue for real | 22:07 |
rawtaz | haha nice :) | 22:07 |
blackboxsw | yep, not your job to sort the openstack config issues | 22:07 |
rawtaz | thank you. really, for all the input and help. it was very valuable! | 22:09 |
rawtaz | both of you :) | 22:09 |
rawtaz | bam, mail done | 22:13 |
rawtaz | thats about nine hours of my life i will never get back :/ | 22:13 |
rawtaz | i had to do other work today, bah | 22:13 |
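
The check rawtaz did by eye on the instance console log (fallback-datasource warning present, and which datasource was reported) can be roughly scripted. This is only a sketch: the warning text and the "Datasource DataSourceNone" / "Datasource DataSourceOpenStack" wording come from the log above, while the function names and the regex are assumptions made for illustration, not part of cloud-init.

```python
import re
from typing import Iterable, Optional

# Warning text quoted in the log above (emitted by cc_final_message.py).
FALLBACK_MARKER = "Used fallback datasource"

# The "Datasource DataSource..." wording is taken from the log above;
# the regex itself is an assumption of this sketch.
DATASOURCE_RE = re.compile(r"Datasource (DataSource\w+)")


def used_fallback(lines: Iterable[str]) -> bool:
    """Return True if any line carries the fallback-datasource warning."""
    return any(FALLBACK_MARKER in line for line in lines)


def reported_datasource(lines: Iterable[str]) -> Optional[str]:
    """Return the last datasource class name mentioned, or None if absent."""
    found = None
    for line in lines:
        match = DATASOURCE_RE.search(line)
        if match:
            found = match.group(1)
    return found
```

To use it, read the saved console output (or /var/log/cloud-init-output.log, if you can get at it) with `open(path).read().splitlines()` and pass the list to both functions. A failed launch as described above would be expected to give `True` and `"DataSourceNone"`.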
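
For the autoinstall problem earlier in the log, dbungert's point (keys such as source, ssh, storage and updates sitting at the top level instead of under `autoinstall`) can be checked mechanically once the user-data parses as YAML. A minimal sketch, assuming the config is already loaded into a dict: the key list holds only the four keys named in the log and is not a complete list of autoinstall keys, and the function names are hypothetical. Note that a top-level `ssh` key can also be legitimate cloud-config, so a hit is a hint, not proof.

```python
from typing import Any, Dict, List

# Keys named in the log above as belonging under "autoinstall".
# Assumption: deliberately incomplete; extend it from the autoinstall docs.
AUTOINSTALL_HINT_KEYS = ("source", "ssh", "storage", "updates")


def has_autoinstall_section(config: Dict[str, Any]) -> bool:
    """Return True if the config has a top-level 'autoinstall' mapping."""
    return isinstance(config.get("autoinstall"), dict)


def misplaced_keys(config: Dict[str, Any]) -> List[str]:
    """List top-level keys that probably belong under 'autoinstall'."""
    return [key for key in AUTOINSTALL_HINT_KEYS if key in config]


# Example input: 'storage' and 'ssh' were left at the top level.
example = {
    "autoinstall": {"version": 1},
    "storage": {"layout": {"name": "lvm"}},
    "ssh": {"install-server": True},
}
print(has_autoinstall_section(example), misplaced_keys(example))
```

The dict can come from `yaml.safe_load(open("your-userdata.yaml"))`, as in blackboxsw's one-liner (this needs PyYAML). If that call raises, fix the YAML syntax first, since this check only makes sense on a file that parses.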