=== shardy is now known as shardy_mtg
=== shardy_mtg is now known as shardy
[15:54] smoser: rharper: Does https://bugs.launchpad.net/cloud-images/+bug/1740176 look at all familiar as a cloud-init bug?
[15:54] Launchpad bug 1740176 in cloud-images "Disksize is only 2GB, expected in 10GB" [Undecided,Confirmed]
[15:54] We haven't triaged it yet, but thought it might look like something you know how to handle.
[16:07] Odd_Bloke: did you recreate that?
[16:16] smoser: I think Tribaal did.
[16:17] well, some user reported when using the artful Vagrant image
[16:17] and I could reproduce it
[16:17] Tribaal: Uppercase T?
[16:18] dpb1: says the guy with a "1" appended to his nick :)
[16:18] touche!
[16:18] dpb1: I think lowercase was already taken
[16:19] good ol irc
=== Tribaal is now known as tribaal
[16:19] there! :)
[16:19] hi tribaal
[16:19] welcome
[16:27] tribaal: yo. we were just talking about that bug
[16:28] oh?
[16:28] smoser: thought it might be related to https://bugs.launchpad.net/cloud-images/+bug/1726818
[16:28] Launchpad bug 1726818 in linux (Ubuntu) "vagrant artful64 box filesystem too small" [High,In progress]
[16:28] and happy new year BTW
[16:28] yeah, i marked as a dupe. it certainly smells that way.
[16:28] that looks like a total dupe indeed
[16:29] tribaal: it would be good to know if this is present in bionic
[16:29] smoser: give me a sec, should be easy to try
[16:30] as it really needs to be fixed in bionic.
[16:30] +100
[16:31] tribaal and then, it looks like there is a 4.14 in proposed.
[16:31] i suspect that this is probably still present in bionic in the 4.13 that is in the release pocket
[16:31] but has a chance of being fixed in -proposed 4.14
[16:31] https://launchpad.net/ubuntu/+source/linux
[16:35] smoser: indeed, bionic is affected as well (well, as of the 20180101 image)
[16:38] smoser: I'll enable -proposed in the machine and reboot - could you remind me what to nuke to make cloud-init think it's running for the first time?
/var/lib/cloud/*?
[16:39] tribaal: cloud-init clean
[16:39] smoser: TIL! Sweet
[16:42] tribaal: i'd wonder though if the image might be 'dirty' at that point
[16:42] i dont know how easy it is to supply your own image to vagrant
[16:42] but you might be able to modify
[16:42] https://github.com/cloud-init/qa-scripts/blob/master/scripts/get-proposed-cloudimg
[16:42] or basically do what it does
[16:43] better to create yourself a "clean" image
[16:43] smoser: yeah, I could just build a vagrant image with -proposed enabled but that takes much longer
[16:43] blackboxsw:
[16:43] root@b2:~# cloud-init clean
[16:43] ERROR: Could not remove instance: Cannot call rmtree on a symbolic link
[16:43] boo!
[16:43] will work up a fix
[16:43] (and I'm almost EOD)
[16:43] exits non-zero too, and that is clearly not an error.
[16:44] tribaal: well, essentially i suspect you may be able to do
[16:45] mount-image-callback your.orig.image --system-resolvconf -- /bin/bash
[16:45] then inside, just enable proposed, apt-get update, apt-get install linux-image (or whatever package that is)
[16:45] then exit
[16:50] the dirty way didn't produce the clear case, so I'll have to do something like this, yes (or just build an image with -proposed enabled)
[17:01] smoser: how'd you reproduce that cloud-init clean. I launched a bionic container, by default it doesn't hit this symlink error
[17:02] I assumed bionic because your instance was named b2
[17:02] in either case, I can fix it easily enough. but wondered how we got there
[17:19] blackboxsw: https://paste.ubuntu.com/26314129/
[17:20] +1 powersj I have a fix, just adding unit tests, will try to reproduce here too
[17:27] blackboxsw: hm.
[17:28] i just fresh launched bionic-daily container
[17:31] csmith@uptown:~$ lxc --version
[17:31] 2.0.11
[17:31] heh.
powersj
[17:31] xenial
[17:31] for the loss
[17:32] except I wouldn't expect that to make a difference in this case :\
[17:32] right? should only be the version of cloud-init in the image
[17:32] one would think. I'm testing from zesty
[17:33] hm.
[17:34] https://pastebin.ubuntu.com/26314191/
[17:34] yeah from zesty, still no problem on my side
[17:34] anyway we can easily test for link and unlink if needed
[17:35] this is weird
[17:35] but not quite sure why that shows up in some cases
[17:35] i reproduced again, but then not
[17:36] 299e803c9fe1 | no | ubuntu 18.04 LTS amd64 (daily) (20180101)
[17:36] that's the bionic image I used
[17:36] http://paste.ubuntu.com/26314202/
[17:38] https://pastebin.ubuntu.com/26314219/
[17:38] seemingly the same, but I get success (I would've expected it to always fail
[17:38] on instance symlink
[17:38] yeah. hmm.
[17:39] I'm adding a debug print
[17:40] blackboxsw: i am guessing.
[17:40] but i suspect that listdir()
[17:40] listdir('.')
[17:40] is not any specific order
[17:41] and that when it does instances first its ok but when instance first it is not
[17:41] or reverse
[17:43] yeah, I could have sorted() the dir list and then we would've always seen it. I bet because is_dir returns false when a symlink target is already broken
[17:43] http://paste.ubuntu.com/26314255/
[17:43] yeah that's the fix I have
[17:44] same one. just wanted to know why
[17:44] sorted is arbitrary
[17:44] it seems dirlist is indeterminate apparently. as you see the issue only sometimes
[17:44] yeah, its not guaranteed sorted.
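A minimal Python sketch of the ordering problem discussed above. It uses only the standard library rather than cloud-init's own helpers, and the names remove_children and demo are made up for illustration. os.path.isdir() follows symlinks, so a link to a live directory looks like a directory and shutil.rmtree() refuses it, while a link whose target is already gone does not. Checking os.path.islink() first makes the outcome independent of the order os.listdir() happens to return.

```python
import os
import shutil
import tempfile


def remove_children(parent):
    """Delete everything directly under parent, whatever order listdir yields."""
    for name in os.listdir(parent):  # order is arbitrary, not sorted
        path = os.path.join(parent, name)
        # Test for a symlink before isdir(): isdir() follows links, and
        # rmtree() raises "Cannot call rmtree on a symbolic link".
        if os.path.islink(path) or not os.path.isdir(path):
            os.unlink(path)
        else:
            shutil.rmtree(path)


def demo():
    """Recreate an 'instance' -> 'instances/<id>' layout and clean it."""
    root = tempfile.mkdtemp()
    target = os.path.join(root, "instances", "i-abc")  # hypothetical instance id
    os.makedirs(target)
    os.symlink(target, os.path.join(root, "instance"))
    remove_children(root)
    remaining = os.listdir(root)
    os.rmdir(root)
    return remaining
```

Sorting the listing would only hide the problem for these two names; a link that happens to sort after its target would still be handled by luck rather than by design.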
[17:45] it is just traversing the dirent
[17:45] and even if it was, its arbitrary that 'instance' would sort before 'instances'
[17:47] https://pastebin.ubuntu.com/26314266/
[17:48] I would've thought sorting would have put instance before instances too, but yeah the dirent iterator isn't sorted
[17:48] ok anyway pushing the fix and unit test
[17:49] I wouldn't have thought that even a sorted list would have 'instances' > 'instance'
[17:49] but maybe that's arbitrary too as you point out
[17:50] sorry was typing on a different keyboard.
[17:56] blackboxsw: sorting would fix it i think, but just seems arbitrary from the perspective of it will start to fail again if we had a link named zz to aa
[17:58] right, yeah it was a fragile fix to sort (and would have been wrong)
[17:58] because it ignored the problem (that we weren't handling symlinks)
[17:58] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/335671 has the fix
[17:58] didn't realize I was doing something a bit different than your suggestion.
[17:58] utils util.is_link instead
[17:58] s/utils/using/
[18:13] @blackboxsw - happy new year! is there anything blocking checking in my PR to master? https://code.launchpad.net/~dojordan/cloud-init/+git/cloud-init/+ref/azure-preprovisioning
[18:17] dojordan: I think there was a side discussion I had with smoser that we might hit an issue with systemd unit timeouts. if we attempt to block indefinitely in a polling loop in cloud-init's unit, systemd might time out at 5 minutes??? which could cause the behavior you are looking for to fail.
[18:18] I'm not sure about the 5 minute auto-timeout in systemd, lemme find a reference to see if I can dig up a doc on it
[18:18] I've tested in 16.04 with polling for much longer than that, and systemd didn't kill cloud init or anything
[18:19] what is the service name?
[18:21] cloud-init.service I believe
[18:23] doug@dojordandev:~$ systemctl show cloud-init.service -p TimeoutStopUSec TimeoutStopUSec=infinity
[18:23] thats why it worked :)
[18:24] or cloud-init.target ahh there you go
[18:25] ooops
[18:25] ahh I mean
[18:26] looks like 'we' (cloud-init) don't explicitly set that timeout to infinity, but me OS-specific setting
[18:26] ohh wait
[18:26] TimeoutSec=0
[18:27] that infinity timeout was on the azure 16.04 LTS image
[18:27] looks like we set that for the systemd/cloud-final.service.tmpl
[18:27] got it
[18:28] ok this might not really be an 'issue' then, though it *may* be worth us explicitly configuring that infinity timeout on azure images... I'm not certain.
[18:28] is 0 == infinity?
[18:30] * blackboxsw thinks, as TimeoutSec == setting for both TimeoutStart and TimeoutStop
[18:30] but trying to confirm
[18:30] sd_notify(3)).
[18:30] TimeoutSec=
[18:30] A shorthand for configuring both TimeoutStartSec= and TimeoutStopSec= to the specifie
[18:30] from https://www.freedesktop.org/software/systemd/man/systemd.service.html
[18:31] dojordan: i think there is still general issues with timeouts
[18:31] though I don't see a reference that "0" == 'infinity' for Timeout(Stop/Start)Sec
[18:32] it just says "Pass "infinity" to disable the timeout logic." on the timeoutstopsec
[18:32] hm..
[18:32] but are we currently doing that...
[18:32] but i dont know how other things in boot handle it
[18:32] so cloud-init-local or cloud-init.service may have TimeoutSec set correctly
[18:33] maybe i'm wrong.
[18:33] doug@dojordandev:~$ systemctl show cloud-init.local -p TimeoutStopUSec TimeoutStopUSec=1min 30s
[18:33] but i swear i've seen boot just go on when i didn't think it should.
[18:33] so local is showing 1:30, but service is showing infinite
[18:34] dojordan: well you typo'd there.
[18:34] cloud-init.local is not a service
[18:34] cloud-init-local.service
[18:34] d'oh....
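The mistyped unit name above still produced a timeout value, which is easy to misread as a real setting. A small Python sketch of one way to guard against that, assuming `systemctl show` prints Key=Value lines and reports LoadState=not-found for units that do not exist; the helper names parse_show_output and unit_stop_timeout are hypothetical, not part of cloud-init or systemd.

```python
import subprocess


def parse_show_output(text):
    """Turn the Key=Value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not Key=Value
            props[key] = value
    return props


def unit_stop_timeout(unit):
    """Return TimeoutStopUSec for unit, refusing names systemd did not load."""
    result = subprocess.run(
        ["systemctl", "show", unit, "-p", "LoadState", "-p", "TimeoutStopUSec"],
        check=True,
        capture_output=True,
        text=True,
    )
    props = parse_show_output(result.stdout)
    if props.get("LoadState") == "not-found":
        # A mistyped name would otherwise just echo the manager defaults.
        raise ValueError("no such unit: %s" % unit)
    return props.get("TimeoutStopUSec")
```

Calling unit_stop_timeout("cloud-init-local.service") needs a host running systemd; the parser can be exercised anywhere.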
[18:35] "cloud-init-local.service" == TimeoutStopUSec=infinity
[18:42] dojordan: so how does /var/lib/waagent/poll_imds get written?
[18:42] newer ubuntu and cloud-init i think do not rely on waagent at all
[18:42] and i'm not really interested in adding such a dependency back
[18:42] there is no dep on waagent
[18:43] we can change the path - it should probably be in the instance directory
[18:43] what creates it ?
[18:43] the azure data source
[18:43] oh. ok. yeah. i see that. the waagent threw me off.
[18:44] also, I confirmed why the timeout is infinity - since we are using type=oneshot the timeout is disabled
[18:44] i suspect that 0 did mean infinity at some point
[18:45] and probably still does
[18:45] REPROVISION_MARKER_FILE, why do we need that?
[18:46] the marker file (/var/lib/waagent/poll_imds) is needed in case the VM reboots for whatever reason before it is reused by a customer
[18:46] rather than internal state or something.
[18:46] we report ready to the fabric which means we detach the provisioning ISO
[18:46] why would it reboot?
[18:46] hardware, software updates, etc
[18:46] underlying platform
[18:47] and we don't write the Ovf since we don't want to persist any azure specific data since the real ovf will come from the customer
[18:47] we will be in this polling loop for a while
[18:47] and if there is an unexpected reboot we want to keep polling when the vm comes back up
[18:49] it seems odd that the platform would choose to reboot such a machine
[18:49] in azure, all of our VMs are backed by remote storage. so when hardware issues occur, we simply move the VM to a new machine since the data is persisted and reboot it
[18:51] by data i mean the os vhd
[18:52] seems like maybe you could just kill machines in this state. as they're not owned by anyone. while maybe everything "should work" if you just kill a machine while booting and then re-start it, i suspect that in reality there are lots of issues. but thats not really important here.
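The marker-file idea described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Azure data source code: poll_for_reprovision, should_resume_polling, the fetch_data callable and the max_polls bound are all hypothetical, and a real loop would sleep between attempts and talk to the metadata service.

```python
import os
import tempfile


def should_resume_polling(marker_path):
    """True if an earlier boot started polling and never finished."""
    return os.path.exists(marker_path)


def poll_for_reprovision(marker_path, fetch_data, max_polls):
    """Poll fetch_data() until it returns something, tracking state on disk.

    The marker is written before the first poll so that an unexpected
    reboot leaves evidence behind; it is removed only on success.
    """
    with open(marker_path, "w") as marker:
        marker.write("polling\n")
    for _ in range(max_polls):
        data = fetch_data()  # stand-in for the real metadata request
        if data is not None:
            os.unlink(marker_path)
            return data
    return None  # still waiting; the marker stays for the next boot
```

On the next boot, a check of should_resume_polling() decides whether to go straight back into the loop instead of treating the boot as a fresh provisioning run.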
[18:55] smoser: true, but that would require some rearchitecting on a different level which is not going to happen any time soon...
[18:58] the same behavior is true today for all VMs. if they reboot before cloud-init finishes we just move them
[19:30] @smoser these are all valid points. the underlying platform is not perfect, and there are cases today where we don't get an ACK from our remote storage layer, then the VM will be busted. I think the important thing about the marker file is that it allows us to keep the pre provisioned vms around for longer which enables us to have a higher hit rate for reuse and therefore increase boot performance. the availability won't be any w
[19:37] dojordan: i responded on mp there.
[19:37] sorry for taking so long to take a look at your proposal
[19:37] dojordan: please don't take offense. over all, you've done a good job.
[19:38] no worries, appreciate the feedback. Will address the PR comments later today
[20:03] pushed a couple changes to the review-mps script in qa-scripts repo for landing branches
[20:03] slowly building it into something useful/working
[20:03] thx for the review btw. landed
[20:07] heya guys, im doing a really ugly hack with write_files but been getting lots of trouble getting write_files to work at all, i have a hard time writing two files as well. first example https://hastebin.com/otubucuqin.pl , second example https://hastebin.com/egoxufopab.js
[20:08] like the simplest example with only a path + content works... other than that i have a really hard time getting it to execute
[20:09] is my syntax way off?
[20:10] You might find it easier just to base64 it.
[20:11] ok
[20:11] so - encoding: b64 ?
[20:11] and then just content: |
[20:11] One sec.
[20:12] write_files:
[20:12] - encoding: b64
[20:12] content: T3JpZ2luOiByZXBvLnFiaXMuY28KTGFiZWw6IHJlcG8ucWJpcy5jbwpDb2RlbmFtZTogeGVuaWFsCkFyY2hpdGVjdHVyZXM6IGkzODYgYW1kNjQgc291cmNlCkNvbXBvbmVudHM6IG1haW4KRGVzY3JpcHRpb246IHFiaXMgcmVwbwpTaWduV2l0aDogZGVmYXVsdCAK
[20:12] path: /tmp/distributions
[20:12] owner: root:root
[20:12] permissions: '0644'
[20:12] Sorry that may not have come through very well.
[20:12] no worries
[20:13] so i have to encode it then
[20:13] https://paste.ubuntu.com/26315006/
[20:13] Yes base64 -w0
[20:13] Copy/paste
[20:13] alright
[20:16] I don’t think your second example will work.
[20:17] http://cloudinit.readthedocs.io/en/latest/topics/examples.html
[20:17] It’s very picky about where you put “-” and spaces, etc.
[20:23] aye
[20:26] doesn't seem to like multiple files at all :P
[20:26] i got it to work once or twice
[20:26] this isn't working either
[20:26] :(
[20:29] ivve: i suspect that you have general yaml issue.
[20:30] aye, i keep getting [ 18.299202] cloud-init[861]: 2018-01-03 20:29:00,205 - __init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: '#cloud_config...'
[20:30] oh. yeah.
[20:30] cloud-init will ignore that
[20:30] if it does not start with '#cloud-config'
[20:30] it does
[20:31] or is not declared as cloud-config with multipart
[20:31] just add '#cloud-config' as first line.
[20:31] oh shit
[20:31] _ not -
[20:31] :D
[20:31] oh man
[20:31] ah. yeah. funny.
[20:31] i used a heat template first and just shortcutted
[20:32] added # and removed the :
[20:32] jeez what an idiot i am
[20:33] ivve: whenever someone shows me something like the hastebin there, the first thing i do is use 'yaml-dump'
[20:33] http://paste.ubuntu.com/26315078/
[20:33] json output is much more clear and identifies errors to a human more clearly.
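A short Python sketch tying together the two points from this exchange: base64-encoding the file body so YAML quoting stops mattering, and making sure the user-data begins with the exact '#cloud-config' line. The function names build_user_data and has_cloud_config_header are hypothetical, the indentation is one common layout for write_files, and the header check is a simplified stand-in for whatever cloud-init does internally.

```python
import base64


def build_user_data(path, content, owner="root:root", permissions="0644"):
    """Return cloud-config text with one base64-encoded write_files entry."""
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    lines = [
        "#cloud-config",  # hyphen, not underscore
        "write_files:",
        "  - encoding: b64",
        # base64 never contains quotes or backslashes, so quoting is safe
        '    content: "%s"' % encoded,
        "    path: %s" % path,
        "    owner: %s" % owner,
        "    permissions: '%s'" % permissions,
    ]
    return "\n".join(lines) + "\n"


def has_cloud_config_header(user_data):
    """Check that the first line is exactly '#cloud-config'."""
    lines = user_data.splitlines()
    return bool(lines) and lines[0].strip() == "#cloud-config"
```

Running has_cloud_config_header() on a template before launching catches the '#cloud_config' slip without waiting for a boot log.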
[20:34] (you didn't mess up that way, this is just fyi)
[20:34] aye its a good pointer, thanks
[20:35] however i can't read json even if my life depended on it
[20:35] well it works now
[20:35] i guess i will be encoding stuff now
[20:35] it helps writing stupid expect scripts :P
[20:36] Use IntelliJ
[20:36] It’s a life saver.
[20:38] And it has a vi/vim mode too which is nice.
[20:41] stack completed, music to my ears :P
[20:41] thanks a bunch guys
[20:42] while it's a little late in the process, your machine with cloud-init installed can run 'cloud-init devel schema -c your-configfile.yaml' to validate the yaml https://pastebin.ubuntu.com/26315122/
[20:42] :)
[20:43] it at least gives you a quick once over on the yaml file once you've discovered something didn't work as you expected
[20:44] Is that in mainline now?
[20:44] yep. should be in xenial and greater
[20:44] and on trunk
[20:44] Cool.
[20:45] aye its a good pointer as well and i did think about it but never came to that point since i just added a file and stuff stopped working
[20:45] needs a lot of work as it'll eventually support all attributes of each cloud-config module and --annotate
[20:45] to report specific errors
[20:45] runcmd: also doesn't like " or ;
[20:45] or : for that matter
[20:47] my new year's resolution is to make "cloud-init schema" a first-class citizen with support for reporting schema errors in all 54 cloud-config modules
[20:47] ivve: https://paste.ubuntu.com/26315137/
[20:47] Someone here helped me with that… :-)
[20:47] we'll see if that holds (hopefully better than the "exercising daily" new year's resolution)
[20:48] Yay exercise.
[20:49] I highly recommend: https://stronglifts.com/5x5/ Only 3 or 4 days a week. No two hours at the gym. :-)
[20:50] ah yes ofc you can type it that way to get it right
[20:50] however as you are pointing out help is needed to write it and read it :P
[20:50] Well that gives you some syntax to pattern off of.
[20:50] Now that I have that I can generally manage on my own.
[20:51] i just preferred using write_files to write the script and runcmd: - bash/expect /path/to/script
[20:51] its just a hack for a demo, nothing proper
[20:51] i'd use ansible for proper stuff