[00:00] hrm... ok seeing we get through cloud-init modules:config stages which means the datasource succeeded
[00:01] hrm have to step away a bit. sorry for the moment dojordan_ will check it out
[00:01] I'll have something on this later
[00:01] sorry I should have tested this again this morning
[00:01] no worries
[00:02] one thing would be great if you could do, would be to change logging level so we can see more info on the serial port
[00:07] * blackboxsw clicks enable boot diagnostics logging in the UI and clicks reboot on this instance
[00:19] and bailing for dinner
[00:37] same problem on artful
[01:20] smoser: just pushed a merge proposal into bionic for today. I need a bit more time to triage what gives on Azure :/
[01:20] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/336513
[01:20] ok I'm out for the night. gotta do the bedtime routine with the kiddos. more on azure first thing in my morning
[01:21] thanks for all the help, sounds good
[01:47] blackboxsw: fudge
[01:47] https://jenkins.ubuntu.com/server/job/cloud-init-ci/725/console
[01:47] :-(
[01:47] * smoser fail
[01:58] i'm fixing and pushing. http://paste.ubuntu.com/26448094/
[01:58] rharper, powersj blackboxsw if you're still around, to disagree or +1 that.
[01:58] looking
[01:58] running tox + centos build && git push upstream HEAD
[01:59] looks sane
[01:59] thanks
[02:17] btw, it's a royal pain to launch multi-subnet/ip instances via the console; also it would be of great help if the ec2 docs would tell you what the format of the instance data is, for example local-ipv4s is a list of some sort of ipv4 addresses; but is it comma separated, newline, space?
I can't find any examples with my google-fu so trying to get an instance up to check
[02:46] *finally*
[02:46] vpc is timeconsuming
[03:00] hab
[03:00] bah
[03:04] resubmitting the merge proposal
[03:07] with a new snapshot from master
[03:09] ok, have crude ec2 network metadata to v1 config
[03:09] * rharper calls it a night
[03:14] nice
[03:18] ok new MP against bionic up. thanks for the fix smoser
[03:19] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/336514
=== shardy is now known as shardy_afk
=== shardy_afk is now known as shardy
=== Guest28399 is now known as mgagne
[15:33] blackboxsw: on azure...
[15:33] you there?
[15:34] dojordan_: west coasters.
[15:36] (they stay up late.. https://finance.yahoo.com/news/exclusive-fitbits-6-billion-nights-sleep-data-reveals-us-110058417.html )
[16:03] here
[16:03] :)
[16:19] ok azure triage time
=== hrybacki is now known as hrybacki_mtg
[17:05] here now @blackboxsw
[17:06] good to know you come to work at a reasonable time like the rest of us :) I'm walking through failure path again, as smoser surmised it's likely the cdrom disappearing on us before I cloud-init clean --reboot.... so I'm adding logs etc now and going through that to confirm
[17:07] * blackboxsw wasn't sure yet why this seemed to work with tip of master though too.
[17:07] hmm, interesting idea. doesn't waagent copy the ovf_env.xml off of the cd ?
[17:09] FWIW we remove the CD as soon as we get a provisioning message
[17:44] question, shouldn't we be keeping around the ovf-file before rebooting?
[17:44] i think clean --reboot deletes it
[17:45] err, nvm, it lives in /var/lib/waagent/ovf-env.xml
[17:58] @blackboxsw, same problem on xenial
[18:00] dojordan_: i had asked blackboxsw to edit /etc/cloud/cloud.cfg.d/05_logging.cfg and change the console logging from WARN to DEBUG
[18:01] and then collect console log
[18:01] are you able to easily do that too ?
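The 18:00 request (console logging from WARN to DEBUG) can be illustrated with a minimal Python sketch. It uses only the standard `logging` module and shows the split discussed in this log: the console handler filters at one level while the log file always receives DEBUG. The logger name and the `build_logger` helper are hypothetical, and this is not the content of cloud-init's `05_logging.cfg`.

```python
import io
import logging


def build_logger(console_level, console_stream, file_stream):
    """Return a logger with a level-filtered console handler and a DEBUG file handler."""
    logger = logging.getLogger("sketch.console_vs_file")  # hypothetical name
    logger.handlers.clear()          # allow repeated calls in one process
    logger.setLevel(logging.DEBUG)   # the logger itself passes everything on
    logger.propagate = False

    console = logging.StreamHandler(console_stream)
    console.setLevel(console_level)  # WARNING by default, DEBUG when triaging

    logfile = logging.StreamHandler(file_stream)
    logfile.setLevel(logging.DEBUG)  # the log file always gets debug output

    formatter = logging.Formatter("%(levelname)s: %(message)s")
    for handler in (console, logfile):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    console, logfile = io.StringIO(), io.StringIO()
    log = build_logger(logging.WARNING, console, logfile)
    log.debug("polling metadata service")
    log.warning("datasource not found")
    print("console:", console.getvalue(), end="")
    print("file:", logfile.getvalue(), end="")
```

Passing `logging.DEBUG` as `console_level` corresponds to the change requested for triage; the file handler is unaffected either way.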
[18:01] yeah
[18:02] dojordan_: for boot diagnostics
[18:02] which storage account type do you need ?
[18:03] * blackboxsw is already mid reboot/test on my bionic instance with debug console logs enabled, an azure storage account created and
[18:03] boot logs enabled
[18:04] ubuntu@40.70.46.88
[18:04] ssh-import-id smoser ?
[18:04] checking boot logs now to make sure cloud-init reported correctly on this last clean boot
[18:04] already done for dojordan and smoser
[18:04] i'm in byobu term
[18:05] permission denied
[18:05] in
[18:05] added again
[18:05] must've typo'd
[18:06] denied
[18:06] try again
[18:07] cool
[18:08] ok let's see here..... checking azure cli now to make sure I could see boot logs
[18:08] before rebooting
[18:08] worst case i can always get them :)
[18:09] az vm boot-diagnostics get-boot-log --ids /subscriptions/12aad61c-6de4-4e53-a6c6-5aff52a83777/resourceGroups/SRUGRP10/providers/Microsoft.Compute/virtualMachines/my-b1
[18:09] 'ascii' codec can't decode byte 0xe2 in position 40610: ordinal not in range(128)
[18:09] hrm oops az cli
[18:09] checking UI
[18:09] serial log in UI is working for me
[18:09] ok
[18:10] old log
[18:10] http:pastebin.ubuntu.com/26453160
[18:10] http://pastebin.ubuntu.com/26453160
[18:11] blackboxsw: 'ordinal not in range'
[18:11] ?
[18:11] yeah az cli couldn't decode the boot logs on the machine
[18:11] is that because az is trying to .decode() the console log ?
[18:11] yeah
[18:11] :-(
[18:11] so something to file against azure cli when I dig into it :/
[18:11] but UI works
[18:11] ugh, i'll make a bug report
[18:12] thanks dojordan_
[18:12] lemme get az cli version
[18:12] can you pastebin the ui logs?
[18:12] http://paste.ubuntu.com/26453179/
[18:12] dojordan_: you're in good company. this week, we've hit.
[18:12] https://github.com/lxc/pylxd/issues/268
[18:12] and
[18:12] https://github.com/boto/botocore/issues/1351
[18:12] dojordan_: ui logs is http://pastebin.ubuntu.com/26453160
[18:13] second boot?
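The `'ascii' codec can't decode byte 0xe2` error at 18:09 is the generic Python failure when bytes containing non-ASCII characters are decoded as ASCII. The sketch below reproduces that class of error on made-up sample bytes and shows one tolerant alternative. `decode_console_log` is a hypothetical helper; nothing here is taken from the azure cli source.

```python
def decode_console_log(raw: bytes) -> str:
    """Decode console bytes as UTF-8, substituting U+FFFD for undecodable bytes."""
    return raw.decode("utf-8", errors="replace")


# Made-up sample: curly quotes encode to bytes starting with 0xe2 in UTF-8.
sample = "boot \u2018ok\u2019".encode("utf-8")

if __name__ == "__main__":
    try:
        sample.decode("ascii")
    except UnicodeDecodeError as err:
        print("strict ascii decode failed:", err)
    print(decode_console_log(sample))
```

Using `errors="replace"` trades exactness for robustness: a serial log with stray bytes still prints, with the bad bytes shown as replacement characters.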
[18:13] blackboxsw: yeah, go for it.
[18:13] dojordan_: smoser 2nd rebooting now
[18:13] ok
=== hrybacki_mtg is now known as hrybacki
[18:16] hrm any way to show in cli what power state is on node
[18:16] let me see
[18:18] dojordan_: ok it's looping
[18:18] just got logs smoser dojordan_
[18:18] looping on reprovisiondata
[18:18] copying now
[18:19] new boot log http://pastebin.ubuntu.com/26453225
[18:20] I'm looking now
[18:20] yeah it's looping on 404 from reprovisioning
=== shardy is now known as shardy_afk
[18:21] so, something triggered that poll which shouldn't have
[18:21] DataSourceAzure.py[INFO]: Creating a marker file to poll imds
[18:21] yup
[18:21] well, that part seems like it is functioning as designed.
[18:21] hahah
[18:21] :)
[18:21] dojordan_: logging seems extremely verbose if you're expecting this to sit up for 24 hours before use
[18:22] but we won't log debug by default right?
[18:22] debug does go to log file, but not to console
[18:22] looks like < 1k/second. but that'd add up.
[18:23] but thats not the issue.
[18:23] why did we get into the imds
[18:23] so cfg.PreprovisionedVm == True
[18:24] something in _extract_preprovisioned_vm_setting returns True
[18:24] we need to look over that ovf file again
[18:24] I think
[18:24] my guess is the refactoring broke something. the weird thing is it should have been covered by ut
[18:24] yeah I thought so too
[18:25] im re reading my code now and no idea...
[18:26] I'm starting up a 2nd vm now and will run _extract_... on the doc
[18:27] smart
[18:28] we didnt see any of those debugs though...
[18:29] blackboxsw: smoser@52.151.23.91
[18:29] if you want
[18:29] take it
[18:30] 40.79.65.171
[18:30] as well
[18:30] 40.79.65.161 rather
[18:31] false
[18:31] ok,,,, so that should've been interpreted as false
[18:32] oh no
[18:32] bool("false") is true
[18:32] ahhahhha
[18:32] ohhh right
[18:32] didn't translate from string type
[18:33] i'll push a fix
[18:33] thanks for all the help
[18:33] cheers.
gotta go pickup a kiddo from school
[18:33] see ya in a bit
[19:37] @smoser, @blackboxsw, I pushed a fix, and added another UT that would have caught it. Testing now in azure.
[19:47] my thoughts on removing the verbose logging: maybe just log a byte every request or something. Also, do we have a log level that goes to the console by default?
[19:58] dojordan_: warning level is configured to the console by default.
[20:04] * blackboxsw wonders about us adding a param in a subsequent branch to url_helper.readurl(quiet=(False|True)) then callers handling retries outside of that could turn down the volume of logs
[20:10] testing your latest branch now too
[20:18] same here, *fingers crossed*
[20:21] i got permission denied using password auth but at least the ECDSA host key changed on me
[20:23] ssh auth shouldnt be affected. if you get there, it really should let you in
[20:23] nothing would have deleted your keys
[20:23] password would have been redacted in the ovf-env.xml, not sure what that changes
[20:24] I know it's a nit, but changing the log message Start polling IMDS from debug -> warning feels like it really shouldn't be a warning level log
[20:25] maybe I'm wrong (I know you are probably just trying to get it to show up in default console log configuration)
[20:26] right, im open to other options, but it would be nice to get to the console
[20:27] i can understand wanting to see something on the console (for azure platform perspective)
[20:27] but it kind of stinks from the users' perspective.
[20:27] they have a right to expect WARN in the logs to mean "something went wrong"
[20:28] but here nothing in their control actually went wrong.
[20:28] true...
[20:28] im fine reverting it now that we found this bug
[20:28] yeah. i think that is best for now.
[20:28] dojordan_: one more thing while you are in there.
[20:28] i have said many times i think python logging lacks level granularity
[20:28] and cloud-init usage of what *is* there is bad.
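The bug found at 18:32 comes down to Python treating every non-empty string as truthy, so `bool("false")` evaluates to `True`. A minimal sketch of the pitfall and of explicit string parsing follows. `parse_bool` and its set of accepted strings are hypothetical stand-ins for illustration; cloud-init's own helper is the `util.translate_bool` mentioned in this log, and its exact behaviour is not reproduced here.

```python
# Strings treated as true; this particular set is an assumption for the sketch.
TRUE_STRINGS = frozenset({"true", "1", "on", "yes"})


def parse_bool(value, default=False):
    """Interpret a config value (bool, string, or None) as a boolean."""
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    return str(value).strip().lower() in TRUE_STRINGS


if __name__ == "__main__":
    raw = "false"  # the text read from the XML element, always a string
    print(bool(raw))        # non-empty string, so truthy
    print(parse_bool(raw))  # parsed by content instead
```

A unit test that feeds the literal string `"false"` through the extraction path is the kind of check that catches this, which matches the extra UT described at 19:37.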
[20:29] there's a util.translate_bool that might be of use in checking that truthy value from ovf file
[20:29] it seems to me that this should qualify as INFO level
[20:29] and at some point maybe a concerted effort could get INFO to the console
[20:33] btw smoser and dojordan_ success ubuntu@40.79.65.161
[20:33] sweet!
[20:33] what distro?
[20:34] dojordan_: bionic, running through xenial now
[20:34] Y
[20:37] cool, I got back in on xenial too
[20:38] just pushed those two changes (correct log level, and util.translate_bool)
[20:47] great, xenial looks good for me too.
[20:48] \o/
[20:50] ok, I'll land this when ci completes its vote dojordan_
[20:51] thanks!
[20:57] thanks for "dotting the i's and crossing the t's"
[20:59] smoser: As touched on in previous discussion platform.linux_distribution() is deprecated in upstream Python and as of version 3.7 is expected to go away, in 3.6 on SUSE it returns an empty tuple, thus useless
[20:59] I take it in Ubuntu you guys patched Python
[21:00] anyway, I think we should make a decision if we expand the dependencies to python-distro or if cloud-init gets its own function to determine the distribution
[21:00] thoughts?
[21:01] https://github.com/nir0s/distro#distro---a-linux-os-platform-information-api
[21:01] hm.
[21:02] i think i'd just want to build my own.
[21:02] s/my/own/
[21:02] s/my/our/
[21:03] i dont want an external dependency for something as seemingly simple as "figure out if you are on ubuntu, suse, redhat, ...".
[21:03] :-(
[21:03] fair enough, something like this?
[21:04] if os.path.exists('/etc/os-release'):
[21:04]     use it and determine the distro
[21:04] else:
[21:04]     try:
[21:05]         platform.linux_distribution()
[21:05]     except:
[21:05]         ......
[21:05] Sound reasonable?
[21:05] yeah i guess.
i'd also like to let the packager easily just set it
[21:08] well that's the other option, just punt and make the person running setup set the distro then we save the code altogether
[21:12] well, i think i'd like it to do the right thing, but if the logic that is there doesnt "do the right thing", then let the packager set it.
[21:13] i want trunk to "just work" though
[21:13] OK, I'll see what I can come up with
[21:24] https://bugs.launchpad.net/cloud-init/+bug/1745235
[21:24] Ubuntu bug 1745235 in cloud-init "distribution detection" [Undecided,New]
[23:15] @smoser and @blackboxsw, thank you for all your help landing this PR. When will the nightly azure images contain these changes?
[23:15] heh, was going to ping you that I just landed it :)
[23:15] should be in bionic tomorrow
[23:16] I'm thinking we will probably SRU in February.... so xenial, artful would have it in our next SRU
[23:21] bionic will work for me :)
[23:22] dojordan_: oopsie, sorry I need to propose for merging into bionic
[23:22] I'll put up another merge proposal tonight. we can probably land that tomorrow and it'll be published friday
[23:22] just landing robjo's btrfs branch too
[23:22] gotcha. Is the bionic branch just a delayed mirror of master?
[23:24] dojordan_: yeah the way we structure bionic publishing is just to mirror all content from master tip
[23:26] for SRUs into xenial, zesty artful releases we take a snapshot of tip as well and if some significant behavior change requires attention to retain backward compatibility we carry a small patch to retain behavior in xenial.
[23:28] since bionic is not officially in feature freeze until March 2018, any change in behavior of cloud-init is given the go-ahead, so snapshots are easy https://wiki.ubuntu.com/BionicBeaver/ReleaseSchedule
[23:29] SRUs into ubuntu series that are 'stable/released' require a bit more work on our end with testing/verification
[23:29] https://wiki.ubuntu.com/CloudinitUpdates for our SRU process (TMI I know)
[23:30] got it, this explains a lot. (not TMI :) )
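The distribution-detection idea sketched in chat at 21:04 (read `/etc/os-release`, with a way for the packager to set the value) could look roughly like the following. It is a sketch under stated assumptions, not the code that went into cloud-init: `parse_os_release`, `detect_distro`, the `override` parameter and the `"unknown"` fallback are all hypothetical names and choices. The `platform.linux_distribution()` fallback from the chat is left out because that call is absent from current Python releases.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        try:
            parts = shlex.split(value)  # handles "quoted values" and bare words
        except ValueError:              # unbalanced quotes: keep the raw text
            parts = [value]
        info[key] = parts[0] if parts else ""
    return info


def detect_distro(path="/etc/os-release", override=None):
    """Return the distro ID, preferring an explicit packager override."""
    if override:
        return override
    try:
        with open(path, encoding="utf-8") as handle:
            return parse_os_release(handle.read()).get("ID", "unknown")
    except OSError:
        return "unknown"


if __name__ == "__main__":
    print(detect_distro())
```

The `override` argument mirrors the request to let the packager set the distro when the detection logic does not do the right thing, while the default path keeps trunk working without configuration.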