=== simpoir|afk is now known as simpoir
=== nacc_ is now known as nacc
[15:54] blackboxsw, seen this - https://www.dropbox.com/s/tqao6tbumfb0vbh/Screenshot%202018-04-03%2010.47.30.png?dl=0 before?
[15:56] Beret: if this is NUCs, typically I see that when disks start going bad?
[15:57] it is
[15:57] I figured
[15:57] ok
[15:57] I haven't yet. was looking over other bugs saw andreas hit a bug related to that failure path, but it was zfs
[17:46] Hello all
[17:46] I can't seem to find a particular answer for cloud init - can I ask it here?
[17:47] Is there a way to tell write_files module to wait until the users module is finished?
[17:47] hexorg: ask away, questions/discussion is always welcome in this channel
[17:47] if someone doesn't know now, maybe others will be able later
[17:48] thanks :)
[17:48] I'm trying to write_files into a newly created user folder, but cloud-init seems to run write_files before users
[17:48] as a result, write files fails with no such directory
[17:50] hexorg, generally, ordering of module sequence is not user- (#cloud-config) configurable, but the order in which modules are run is defined in /etc/cloud/cloud.cfg for each stage: cloud_init_modules (run at init stage), cloud_config_modules (run in modules-config stage), cloud_final_modules (run last)
[17:50] let me see if I can answer the specific question or if I have to pass.
[17:53] Yeah I have run into that... Hacky way of dealing with it was to write files to a temp directory then move them into place after user is created.
[17:54] Yeah ok. Just making sure I'm not missing some more direct way
[17:54] hexorg: right, write-files is run in init stage which happens in the init-network stage of cloud-init boot per http://cloudinit.readthedocs.io/en/latest/topics/boot.html. And - users-groups
[17:54] lives by default at the end of that list because it might depend on files written by write_files.
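The temp-directory workaround described at 17:53 can be sketched as follows. This is only an illustration, not something posted in the channel: the user name, paths and helper name are hypothetical, the sketch relies on JSON being accepted wherever YAML is expected, and the chown step assumes the new user gets a group of the same name.

```python
import json


def staged_file_user_data(user, staged_path, final_path, content):
    """Render #cloud-config user-data that stages a file, then moves it.

    write_files runs before the user exists, so the file is first written
    to a path outside the home directory; runcmd runs in the final stage,
    after user creation, and moves it into place.
    """
    config = {
        "users": ["default", {"name": user}],
        "write_files": [
            {"path": staged_path, "content": content, "permissions": "0644"},
        ],
        "runcmd": [
            ["mv", staged_path, final_path],
            # Assumption: a group named after the user exists.
            ["chown", f"{user}:{user}", final_path],
        ],
    }
    # JSON is used here to avoid a third-party YAML dependency.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    print(staged_file_user_data(
        "alice", "/var/tmp/alice-bashrc", "/home/alice/.bashrc",
        "export EDITOR=vim\n"))
```

The staged path is deliberately outside /home so that write_files never needs a directory that the users module has not created yet.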
[17:55] you might also be able to run your user creation logic in runcmd which runs in the cloud-init final stage (after both write_files and user creation modules)
[17:55] which I think is what blkadder is referring to
[17:56] Understandable. Thanks!
[17:56] blackboxsw, Yep.
[17:57] blackboxsw: almost done with your fixes
[17:57] +1 rharper, I've got to get my branch in shape for dropping ifconfig
[18:40] * blackboxsw tests the last branch for SRU now
[18:40] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/342007
=== Raboo_ is now known as Raboo
[19:21] platform: lxd encountered error: 'Operation' object has no attribute 'description'
[19:21] blackboxsw: powersj thoughts?
[19:23] oh, I think that's the parsing of the cloud-init result/status json files
[19:24] that likely happens if it's not yet booted
[19:45] meh rharper I'm going to reject exception_cb branch, the raising of exceptions in principle makes sense, but the logic checking for exc.code doesn't seem to behave as expected on 404s (the httpcode isn't attached to the UrlError raised)
[19:45] ok
[19:46] so, I think more rework is needed there, and we'll need discussion on it
[19:46] I read that code multiple times, but really needed either more unittests or integration tests to validate exactly what behaviors we want
[19:46] I think we should have a series of unittests that cover the various expected behavior paths we need, and then run this against that
[19:50] yep, and the existing code I believe actually doesn't work right
[19:50] even before the rewrite. or the previous rewrite
[19:50] not a critical issue (as it'll ultimately retry more than it is supposed to) which costs time, not functionality
[19:51] but yeah something smells a bit there
[20:15] yep confirmed, that exception_cb refactor needs work, I confirmed that even the implementation smoser took was inconsistent.
after we SRU I'll put up a branch which adds unit test coverage to examine proper exception raising behavior from readurl.
[20:15] we can pay this "risk" cost on next SRU.
[20:16] in terms of having to retest on the affected clouds
[20:16] to make sure there isn't a regression
[20:16] ok I'm putting up bionic merge proposal now
[20:25] rharper: here's the proposal for syncing tip to bionic https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342605
[20:26] rharper: I'm putting together the SRU for xenial and artful now (should have the same content bump)
[20:35] xenial SRU: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342606
[20:39] rharper: artful SRU too https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342608
[20:39] blackboxsw: ok, reviewing
[20:39] there are the three release candidates
[20:39] thanks
[20:39] * blackboxsw ran the new-upstream-snapshot from qa-scripts https://github.com/cloud-init/qa-scripts/blob/master/scripts/new-upstream-snapshot
[20:43] meh my comments on smoser's branch are as follows rharper https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/342007
[20:44] Hi, I'm trying to disable ipv6 in an lxd container (ubuntu), I got cloud-init to write the sysctl for it
[20:44] sorry for the thrashing. only one minor diff is required in his implementation, but I'd feel better if we got some good unit test coverage on the function
[20:44] but I can't get it to restart systemd
[20:45] bjonnh: does lxd have a setting to do that?
[20:45] dpb1: NO…
[20:45] and they say "just set the sysctl"
[20:45] ipv6 is disabled on host
[20:46] bjonnh: restart systemd?
[20:46] but containers still get an ipv6 link-local
[20:46] nacc_: sorry restart a systemd service
[20:46] bjonnh: ah ok :)
[20:46] https://github.com/lxc/lxd/issues/3333
[20:46] oh
[20:46] - [systemctl, restart, systemd-sysctl]
[20:46] I think I had to put "" around systemd-sysctl
[20:46] see alberto's comment about disabling ipv6 in containers
[20:46] if that helps
[20:46] lxc network set lxdbr0 ipv6.address none
[20:47] is it possible to start something really early in cloud-init with my user conf?
[20:47] blackboxsw: I can't do that because I'm using my own bridge
[20:47] yes, what blackboxsw said
[20:47] (that doesn't have ipv6…)
[20:47] so lxc complains that it cannot manage my device
[20:47] bjonnh: you want this globally or per container?
[20:48] globally
[20:48] I don't have anything ipv6 here
[20:49] blackboxsw: approved push to ubuntu/devel, I got the same you did
[20:50] bjonnh: so, can't you just reconfigure your bridge to not have anything ipv6?
[20:50] dpb1: that's my point… It doesn'…
[20:50] rharper: will push to ubuntu/devel and see if we can get an upload there.
[20:50] bjonnh: sorry, don't follow that one
[20:51] I don't think you can disable the kernel ipv6 setting from within an unpriv container
[20:52] the host has ipv6 disabled
[20:52] with what setting? are you ignoring RAs?
[20:52] net.ipv6.conf.vlanbr2.disable_ipv6 = 1
[20:52] should I do
[20:52] net.ipv6.conf.vlanbr2.accept_ra = 0
[20:52] too?
[20:52] yes
[20:52] oh
[20:53] that will prevent any RAs from showing up on your interfaces
[20:54] * dpb1 hesitates to ask why the need to disable ipv6 on this host :)
[20:54] When this value is changed from 0 to 1 (IPv6 is being disabled),
[20:54] it will dynamically delete all address on the given interface.
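The two per-interface settings discussed above can be rendered into a sysctl.d-style fragment with a few lines of Python. This is a rough sketch only: the helper name is hypothetical, vlanbr2 is simply the interface named in the channel, and interface names containing dots are rejected because they need a different key syntax that this sketch does not handle.

```python
def ipv6_off_sysctl(interface):
    """Return sysctl lines that disable IPv6 and ignore RAs on one interface."""
    if "." in interface:
        # Dotted names (e.g. VLAN sub-interfaces) are not handled here.
        raise ValueError("interface names containing '.' are not supported")
    settings = (
        ("disable_ipv6", 1),  # drop existing IPv6 addresses on the interface
        ("accept_ra", 0),     # ignore router advertisements
    )
    return "".join(
        f"net.ipv6.conf.{interface}.{key} = {value}\n"
        for key, value in settings
    )


if __name__ == "__main__":
    print(ipv6_off_sysctl("vlanbr2"), end="")
```

The output is meant to be dropped into a file under /etc/sysctl.d/ on the machine where the setting should apply; whether that is the host or each container is exactly the open question in this conversation.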
[20:55] I suspect that at the time it's set, it drops addrs, but if you accept RAs then new ones can come in
[20:55] dpb1: because I have nothing ipv6 and the update of packages throws me: Cannot initiate the connection to archive.ubuntu.com:80 (2001:67c:1562::16). - connect (101: Network is unreachable) [IP: 2001:67c:1562::16 80]
[20:55] and waits for a second then switch to the ipv6
[20:55] ipv4 sorry
[20:55] bjonnh: yet the router is advertising it?
[20:56] maybe dnsmasq is doing something on its side
[20:56] …
[20:56] that wouldn't totally shock me :/
[20:58] powersj: pylxd issue we've seen before? https://pastebin.ubuntu.com/p/VZHHpHD4WN/
[20:59] per ci build https://jenkins.ubuntu.com/server/job/cloud-init-ci/968/console
[20:59] blackboxsw: yes usually that's when pylxd is out of sync with lxd
[21:00] :\ so hopefully that isn't the case
[21:05] possible given the release push on friday I suppose
[21:05] blackboxsw: what's the xenial and artful sru bug numbers ?
[21:06] new-upstream-snapshot said something to me about that
[21:07] hrm, your xenial branch didn't have the changelog update ?
[21:07] nor the artful one
[21:08] rharper: the SRU bug is https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1759406
[21:08] Ubuntu bug 1759406 in cloud-init (Ubuntu) "sru cloud-init (17.2-35-gf576b2a2-0ubuntu1~16.04.1 update to 18.2-0ubuntu1~16.04.1)" [Medium,Confirmed]
[21:08] hrm, checking xenial
[21:08] blackboxsw: shouldn't we see a changelog diff between your branch and origin/ubuntu/xenial ?
[21:08] like we did for ubuntu/devel ?
[21:08] rharper: line 100 of the visual diff at https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342608
[21:09] * rharper is blind
[21:09] yes
[21:09] confirmed blind
[21:09] * rharper continues
[21:09] | eth0 | True | fe80::216:3eff:fed9:8c65/64 | still gets a link-local
[21:09] router has ipv6 fully disabled
[21:10] the changelog diff between xenial and devel debian/changelogs should only have minor diffs in package version numbers & maybe the new-upstream-snapshot which hasn't been removed from bionic
[21:12] * blackboxsw is still fumbling around with why pylxd is complaining (as I thought we pinned it in tox integration-requirements.txt)
[21:13] bjonnh: hrm, I'm not quite sure on the ipv6 container at the moment, maybe somebody else has a clue there.
[21:13] blackboxsw: pylxd probably didn't change, but lxd could have
[21:13] so I'm able to disable it but this happens after the package upgrade
[21:13] that cloud-init does
[21:13] blackboxsw: ah yes... lxd 3.0 is now installed
[21:13] that happened yesterday
[21:15] blackboxsw: https://github.com/lxc/pylxd/issues/284
[21:16] bjonnh: honestly, I'd ask about this in #lxcontainers. it's weird to me that you are having to try to workaround this in cloud-init on each instance
[21:16] bjonnh: you should do it on the host
[21:18] blackboxsw: xenial is the same, though you put xenial-proposed in the release ? is that what we normally do ?
[21:18] well I'm using a bridge
[21:18] so it is an instance by instance problem
[21:18] blackboxsw: same for artful
[21:19] (it is not the lxd bridge, it is a bridge over a vlan on its specific subnet)
[21:19] bjonnh: and your link-local ipv6 interrupts apt ?
[21:19] inside the instances yes
[21:19] it slows them down
[21:19] it does it only during startup
[21:19] after that I'm able to set the required sysctls
[21:20] so it stops allowing ipv6
[21:20] I'm surprised, I've no ipv6 available here but I wouldn't think the ipv6 addr for the archive is reachable via the link-local, wouldn't think it would try
[21:20] me neither…
[21:20] I've never seen that…
[21:21] nor I
[21:21] so something is special about this setup I think
[21:25] rharper: for each release xenial and artful we should run dch --release -D artful-proposed or xenial-proposed for the debian/changelog to match the former released stream in debian/changelog
[21:25] at least as the final step prior to the upload
[21:25] blackboxsw: ok, I wasn't sure
[21:25] so I have always dch --release -D artful-proposed or xenial-proposed instead of UNRELEASED
[21:25] I do the dch release
[21:26] it was whether it should have -proposed or not
[21:26] it seems (to me) strange to put a pocket value into the changelog when it's going to get copied over into the archive
[21:26] yeah we decided in changelog we want to leave it all as -proposed to indicate when we started performing SRUs for a given stream
[21:26] but maybe there's some backend magic that fiddles that value in the change log
[21:26] ok
[21:26] because any changelog entries before the first SRU would have the base 'xenial|artful'
[21:27] yeah, makes sense
[21:28] yeah nothing seems to fiddle with it post-release: https://pastebin.ubuntu.com/p/b2639pyZtx/ that's from apt-get changelog on xenial
[21:28] xenial-proposed still listed in there
[21:28] was wondering whether it'd get scrubbed
[21:29] hehe
[21:30] * rharper relocates back home
[21:37] yeah that pylxd traceback started happening between Apr 2, 2018 8:27 PM and Apr 3, 2018 7:11 PM
[21:37] and looks like it affects rharper's ntp branch too
[21:38] ok so I feel good this isn't related to the branches I put up against ubuntu/devel|artful|xenial
[21:38] but need
to fix ci
[21:38] trying to reproduce the problem locally
[21:44] blackboxsw: you can also hop on the CI box
[21:45] it'll be faster.... locally on my xenial box, no error. trying on my other box now not seeing it either
[21:45] will do
[21:45] * blackboxsw digs up the doc
[21:45] blackboxsw: lxd version?
[21:45] 2.0.11
[21:45] need 3.0 ;)
[21:46] yep need the snap looks like
[21:46] or bionic ;)
[21:47] snap == faster path to the failure I'm expecting
[21:47] :O0
[21:47] yeah
[21:47] much faster
[21:52] powersj: blackboxsw: is ci back up now? we did some backend storage work for lxd
[21:53] rharper: it is, but it appears we did get a lxd 3.0 upgrade last night
[21:53] powersj: rharper I can't ssh as ubuntu to ci
[21:53] blackboxsw: jenkins@
[21:53] blackboxsw: I'll import you to ubuntu as well
[21:53] blackboxsw: your lp name ?
[21:54] I don't see you in either key files
[21:54] ssh-import-id chad.smith
[21:54] ok, in as ubuntu
[21:54] thx
[21:54] * blackboxsw can take over the world now
[21:54] thx
[21:54] and jenkins
[21:55] what's the pylxd trace back ?
[21:55] rharper: https://pastebin.ubuntu.com/p/VZHHpHD4WN/
[21:55] so, lxd pushed 3.0 into the stable branch ?
[21:55] that doesn't seem right
[21:55] snap info lxd
[21:56] so I thought that was related to the cloud-init result.json but that's really pylxd ?
[21:56] yep
[21:57] we can switch to 2.0 track
[21:57] For the LXD snap, 3 tracks are provided:
[21:57] latest (latest LXD feature release, currently 3.0)
[21:57] 2.0 (previous LTS release)
[21:57] 3.0 (current LTS release)
[21:58] I'm fine with moving to 2.0 temporarily, especially if you are trying to get a release out
[21:58] if they're not going to release pylxd in step with the base, then we need to run behind tip
[21:58] and dpb1 I'd like to raise this as an issue with the lxd team
[21:58] we continually get broken every single time they change
[21:58] well...
pylxd did get updated to fix things
[21:58] we just hard code the version
[21:58] but not *before*
[21:58] the release
[21:58] it should block a release
[21:58] it was updated before
[21:58] month ago or so
[21:58] so, then I'm confused
[21:58] oh, it's not packaged with lxd ?
[21:58] correct
[21:59] but there is a dependency there
[21:59] that's still crappy
[21:59] it is
[21:59] frustrating even
[21:59] I suspect this is one of those snap not-yet-solved thingys
[21:59] smoser and I chatted about getting rid of pylxd at sprint
[21:59] it's supposed to be stand alone ?
[21:59] yeah
[22:00] yes, that ^
[22:00] dpb1: that said, openstack has to have this problem as well
[22:00] they're not going to switch to a cli anytime soon
[22:00] one should be able to express dependencies between snaps, or the snap (lxd) would need to provide the pylxd bindings in the snap
[22:01] powersj: so we switch back to 2.0 channel or can we bump the pylxd or do we have to change the ci call ?
[22:02] either a) switch back to 2.0 to fix things quickly and move on or b) bump pylxd (which we will have to do eventually anyway)
[22:02] yes please, let's focus on the practical. we can corner stgraber at the sprint
[22:02] * blackboxsw just reproduced the issue on jenkins workspace
[22:02] ok
[22:02] I'd prefer to update tox.ini to use a newer pylxd
[22:03] that way we keep using lxd 3.0 and move on
[22:03] +1
[22:03] and can talk about this at later date
[22:03] I'm updating now to test
[22:04] hrm just updating to tip of github/lxc/lxd isn't cutting it.
lemme do a tox -r -e to make sure it actually pulled in latest
[22:05] yeah good idea to blow away .tox
[22:05] or do that
[22:05] urg
[22:06] one more commit to master =P
[22:06] yeah will have to respin on that
[22:06] so tomorrow for SRU
[22:07] I'll have the branch queued and landed in tip tonight with powersj blessing, then we can do the dance on bionic artful xenial tomorrow
[22:08] note grabbing tip of pylxd hits another traceback that'll need a tiny tweak to integration tests :)
[22:08] paste?
[22:08] enroute
[22:08] heh version 3 :)
[22:09] https://pastebin.ubuntu.com/p/8WtSTRKCTh/
[22:09] ooooo yes
[22:09] the logging
[22:09] there'll never be a v. 3 :)
[22:09] well the issue we were having with v1 and v2 should be fixed in v3
[22:09] which is why that is there
[22:10] it has to do with console logging with the lxd snap
[22:10] wow smoser and I messed up there :)
[22:10] oi
[22:10] are we sure we don't want to just revert to 2.0
[22:10] str has no attribute startwith
[22:10] and sort this lxd/pylxd/ci mess out later ?
[22:11] that's typo
[22:11] :)
[22:11] a tiny little s
[22:11] unless blackboxsw typo'd irc
[22:11] nope official committed typo
[22:11] =/
[22:11] how'd flake8 not get that ?
[22:11] good pt
[22:11] or pylint
[22:11] flake8 look at cloud_tests?
[22:11] or ignore that dir
[22:12] it should look at tests
[22:12] nope both flake and pylint look at tests, right
[22:12] cloudinit/ tests/ tools/
[22:12] which has unit and cloud tests
[22:12] what file ?
[22:13] tox.ini
[22:13] failure was in tests/cloud_tests/platforms/lxd/instance.py", line 213, in _has_proper_console_support
[22:13] next failure: powersj: rharper: https://pastebin.ubuntu.com/p/nykVTpBQrh/
[22:13] I found it, I meant the startwith
[22:13] it's in instance.py
[22:13] my flake8 says something about local variable e
[22:14] so maybe it's a lint issue
[22:14] to speed up iterations, I'm running tox -r -e citest -- run --verbose --os-name xenial --test modules/apt_configure_sources_list.yaml --platform lxd
[22:14] lxc is not operational on torkoal
[22:14] $ lxc list
[22:14] Error: Get http://unix.socket/1.0: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory
[22:14] ok, bbiab
[22:14] well that could cause problems ;)
[22:15] shocking that flake8 and pylint don't care
[22:15] powersj: group/permissions errors?
[22:16] hmm that socket file doesn't exist
[22:21] blackboxsw: try now
[22:22] fwiw looked at https://github.com/lxc/lxd/issues/4245
[22:24] powersj: yep good find
[22:25] ... runs fine now with tox and cloud_test patch
[22:25] getting patch together
[22:26] http://paste.ubuntu.com/p/sVNf2nKCnS/
[22:26] setting pin now
[22:29] ok pin works http://paste.ubuntu.com/p/DQ799wxc4K/
[22:29] +1
[22:33] powersj: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342617
[22:33] what did you do?
[22:33] to fix it?
[22:35] powersj: fixed the world. I just changed pinned version and fixed cloud_tests
[22:35] powersj: did you sudo snap refresh lxd ?
[22:35] per that issue?
[22:35] blackboxsw: I did a sudo snap refresh lxd and a sudo snap restart lxd
[22:36] lxc list sat there for 2mins and then the world worked
[22:36] blackboxsw: you left in a debug statement
[22:36] blackboxsw: in cloudinit/url_helper.py
[22:37] bah powersj I had uncommitted changes in that branch that I pulled in unknowingly... repushing in 2 mins
[22:39] powersj: force pushed.
https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342617
[22:40] +1'ed
[22:46] so no idea why lint or flake didn't find it ?
[22:50] thanks powersj, yeah just awaiting completion of https://jenkins.ubuntu.com/server/job/cloud-init-ci/970/
[22:50] so strange
[22:50] and looks good
[22:50] blackboxsw: yea ship it
=== nacc_ is now known as nacc
[22:52] ok landed
[22:53] will repropose branches xenial|artful|devel tonight
[22:53] but need to make some dinner at the moment
[22:55] something about our .pylintrc in cloud-init blocks it
[22:55] if I put a simple test into a different dir, then I get
[22:55] Module test
[22:55] E: 2,31: Instance of 'str' has no 'startwith' member (no-member)
[22:56] interesting since we specifically allow errors
[22:56] yeah, haven't tracked down the line yet
[22:58] * blackboxsw repushed https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/342605
[22:58] for bionic release
[22:58] no, that's not it
[22:58] something else in the structure, I removed .pylintrc and didn't find it either
[23:02] hrm, so, info is a dict (load_yaml), then we have two gets, which return a value from the dict, which it cannot know
[23:02] so, if you str(dver) in there
[23:02] then pylint finds it
[23:02] just force pushed xenial and artful branches
[23:02] need to await CI on them
[23:02] -> dinner
[23:02] but, our .pylintrc still isn't happy with that
[23:10] oh man
[23:10] pylint just does a regex on the source file
[23:11] the http.client and m_.* have pylint ignore that file
[23:16] meh we should improve/limit that ignore if we can
[23:17] I don't know what to do about that
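The startwith typo and the explanation given at 23:02 can be illustrated with a small Python sketch. It is not the code from tests/cloud_tests/platforms/lxd/instance.py: the function names and dict keys are hypothetical stand-ins. The point is that a value pulled out of a plain dict has no statically known type, so the misspelled method only fails at runtime, whereas wrapping it in str() gives a checker something to reason about.

```python
def version_is_3_typo(info):
    """Buggy variant: 'startwith' is not a str method."""
    # The type of dver is unknown to static analysis because it comes
    # out of two chained dict lookups (hypothetical keys).
    dver = info.get("environment", {}).get("server_version")
    return dver.startwith("3")  # AttributeError when dver is a str


def version_is_3(info):
    """Fixed variant: correct method name and an explicit str() cast."""
    dver = str(info.get("environment", {}).get("server_version", ""))
    return dver.startswith("3")


if __name__ == "__main__":
    sample = {"environment": {"server_version": "3.0.0"}}
    try:
        version_is_3_typo(sample)
    except AttributeError as exc:
        print("typo variant failed:", exc)
    print("fixed variant:", version_is_3(sample))
```

A unit test that actually calls the function with a string version would have caught the typo regardless of what the linters are configured to ignore.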