[18:06] magicalChicken: still getting too many open files https://paste.ubuntu.com/23771548/
[18:06] is this more in line with what you were seeing?
[18:15] powersj: yeah, especially the first ones there where the stacktrace is from inside pylxd
[18:16] magicalChicken: ok, how can I help here?
[18:16] I guess switching back to polling inside the instance lets the test suite get further before it hits it
[18:16] powersj: pylxd is broken, there isn't really anything that can be done here
[18:16] I'm going to look into the pylxd bug sometime today or tomorrow I guess and see if I might be able to patch it
[18:16] ok - should I fall back to pylxd 2.1?
[18:17] I'll have to modify the test suite again to get it working with 2.1, the response from execute() is different
[18:17] ok, then let's leave it at 2.2 and get pylxd fixed
[18:17] We'd also lose all of the error handling for setup image
[18:18] Yeah, fixing pylxd shouldn't be too difficult. I've read through the code a couple of times before, when I was starting on the test suite
[18:18] The pylxd bug will probably cause issues for other projects as well at some point, so it's good to fix it
[18:19] magicalChicken: ok, sounds like a plan. I am going to send you a list of scenarios that I am using today with no issues and what I am using to test your merge. Hopefully that helps
[18:20] Sure
[18:20] Raising the open file limit might buy the test suite enough space, but it probably isn't needed
[18:21] powersj: I've also traced the last centos issue down to a systemd bug in the centos lxd images, so the test suite is doing the right thing there
[18:21] oh nice!
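(An aside on the open-file-limit workaround floated above: a minimal stdlib-only sketch of bumping RLIMIT_NOFILE from inside a Python test runner as a stopgap while the pylxd descriptor leak is unfixed. The function name and default value are illustrative, not part of the test suite.)

    import resource

    def raise_nofile_limit(target=4096):
        """Best-effort bump of the open-file limit for this process.

        A stopgap for descriptor leaks (e.g. a websocket opened per
        execute() call and never closed); the real fix belongs in pylxd.
        """
        soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # An unprivileged process may raise its soft limit up to the hard limit.
        if hard != resource.RLIM_INFINITY:
            target = min(target, hard)
        if target > soft:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        return resource.getrlimit(resource.RLIMIT_NOFILE)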
[18:48] powersj: magicalChicken: question on using the integration tests... in the conf yaml which supplies the cloud-config, I've got a specific IP and hostname that I'd like to reference... in the test data collections there aren't any variable references; I just need to make sure I copy the value into the collect script?
[18:49] rharper: you can access the cloud-config provided to the test from the test script with self.cloud_config
[18:50] oh, wait, you mean the collect scripts themselves
[18:50] I haven't added in any variables there, it's definitely doable though
[18:51] so I have user-data with server: 192.168.12.23; I was thinking I'd set a variable, but that's probably too much trouble
[18:51] if I didn't use shell to run the collect program, I could use python instead and import the user-data configuration to fetch the values
[18:51] Most of the current collect scripts are like that
[18:52] The script itself just grabs a file or the output of a command, then the python verifier script checks if it's correct
[18:53] In the python verify scripts there are some helper utilities as well to read through the cloud-config the test ran with
[18:54] ah, yeah, that's fair
[18:54] I forgot things were spread apart
[18:54] I figured this way it's easier to debug the verification, because downloading the images and all is slow
[18:56] hrm, most of the collect scripts don't redirect output to a file... I suppose that's what confused me; most of them run a test (grep $this $that) and return zero or non-zero
[18:57] ah, I see, output is stored via the script name
[18:57] I don't actually have the test suite looking at the return code from a collect script
[18:57] sorry
[18:57] oh, yeah, stdout is saved
[18:57] * rharper is coming up to speed on adding one
[18:57] or rather modifying one
[18:57] There's also a 'create' command that can give you a template
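(For reference, the collect/verify split described above looks roughly like this. A minimal sketch of a verifier in the tests/cloud_tests style; it assumes the base class exposes get_data_file() and self.cloud_config as described in this conversation, and the module/class names are illustrative.)

    # hypothetical verifier, e.g. tests/cloud_tests/testcases/modules/ntp_servers.py
    from tests.cloud_tests.testcases import base

    class TestNtpServers(base.CloudTestCase):
        """Verify the collected ntp.conf against the test's cloud-config."""

        def test_ntp_conf_servers(self):
            # stdout of the 'ntp_conf_servers' collect script, saved under
            # the script's name
            ntp_conf = self.get_data_file('ntp_conf_servers')
            # self.cloud_config is the user-data the test ran with, so the
            # expected servers need not be hard-coded in the verifier
            for server in self.cloud_config.get('ntp', {}).get('servers', []):
                self.assertIn('server %s' % server, ntp_conf)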
[21:08] magicalChicken: powersj: I added a test_ method to one of the ntp_server ones... what's the top-level command I'd use to kick off, say, just the 'ntp_servers' integration test?
[21:09] python3 -m tests.cloud_tests run -v -n xenial -t tests/cloud_tests/configs/modules/user_groups.yaml
[21:09] is an example of running on a xenial based one with the user_groups test
[21:10] sweet
[21:10] * rharper fires away and watches the world burn
[21:10] :)
[21:15] http://paste.ubuntu.com/23772570/
[21:15] powersj: magicalChicken ^^
[21:16] rharper: what version of pylxd? `pip show pylxd`
[21:18] pip?
[21:18] % apt-cache policy python3-pylxd
[21:18] python3-pylxd:
[21:18]   Installed: 2.0.5-0ubuntu1.1
[21:18] rharper: also, which version of the test suite?
[21:18] I'm on xenial
[21:18] magicalChicken: how do I know?
[21:18] I branched cloud-init from trunk last week
[21:18] for this bug fix
[21:19] I haven't tested the version in trunk
[21:19] what's running on jenkins?
[21:19] also, what's in trunk will only work with pylxd 2.1
[21:19] the version in trunk
[21:19] In my experience only 2.1.x of pylxd has worked sufficiently well.
[21:19] boo
[21:19] smoser: ^^
[21:19] there's a bug in pylxd 2.2
[21:19] the version based on 2.2 is much better, but can't be used until that bug is fixed
[21:20] do you have a cloud-init-testing ppa with 2.1.x for xenial?
[21:20] * rharper isn't going to do the pip thing here
[21:20] rharper: that's probably a good call, pip has messed up my system pretty bad
[21:20] hehe
[21:20] there isn't a ppa afaik
[21:20] The jenkins slaves have to use pip to grab a few things, so grabbing pylxd wasn't too big of a deal, but I agree a ppa would be nice...
[21:20] the version in the repos is 2.0, which is also incompatible
[21:20] powersj: cheater =P
[21:21] a ppa would be good
[21:21] yeah
[21:21] rharper: I know :P
[21:21] is 2.1 in yakkety or zesty?
[21:21] we can just copy from the archive into any old ppa
[21:21] rharper: it's in yakkety
[21:21] * rharper rmadisons
[21:21] cool
[21:21] yeah, y and z are 2.1
[21:22] the problem with 2.1 is there's no way for me to do error checking at all during setup, that's why the switch to 2.2 was needed
[21:24] sure
[21:30] wait...
[21:30] we should run the tests in a tox virtual env
[21:30] powersj, and I talked about that.
[21:30] then we can get the pylxd that we want/need
[21:31] smoser: no longer possible with the newer version of the test suite as it requires shell access
[21:31] >
[21:31] ?
[21:31] smoser: I couldn't get support for other distros going without it
[21:31] I don't follow that.
[21:31] util.subp() doesn't work inside a tox env, right?
[21:32] why wouldn't it?
[21:32] isn't tox inside a chroot?
[21:32] smoser: I'm fine with tox; but I'd like a jenkins-runner-like top-level entry point so I don't have to figure out how to call it
[21:34] tox is not a chroot, no.
[21:34] smoser: oh, nvm, sorry
[21:34] smoser: tests should run fine inside tox then
[21:34] I can write a wrapper script
[21:34] tox kind of complains in some cases if you run a command (through it directly) that is not in the virtual env
[21:34] but you'd run 'tox -e integration-tests'
[21:34] or something.
[21:35] the issue with this...
[21:35] well, we have a whole host of arguments we need to pass
[21:35] is that pylxd requires C interfaces
[21:35] which sucks
[21:35] oh, yeah, that does suck
[21:35] meaning 'pip install pylxd' compiles code with gcc (and requires python-dev and the like)
[21:35] I remember when we were trying to test curtin with python parted
[21:36] I'm almost considering just writing a wrapper around the lxc command line then
[21:36] Since the current hold-up on the test suite is a pylxd bug
[21:36] And the previous one was also a pylxd bug
[21:37] I have one question...
[21:37] when I saw powersj running this... I have to run this more locally...
[21:37] each of the shells into a container took like 1 second to run
[21:37] or more
[21:37] blame pylxd :)
[21:37] so collecting things was painful to watch
[21:37] it creates a websocket interface then shuts it down on each execute() call
[21:37] if I blame pylxd, then I'm quite open to pitching pylxd
[21:37] yeah, there's no reason that should take that long
[21:38] but that doesn't seem like it really should be that slow.
[21:38] smoser: after upgrading to a newer version of pylxd that fixed the auth issue it is far faster
[21:38] powersj: newer version being?
[21:38] recall my updates about 50 mins -> 12 mins (or something of that magnitude)
[21:38] I'm not sure why it is, but compare instance.execute() to 'lxc exec cloud_test_... "cat /var/log/cloud-init.log"'
[21:38] 2.1.3
[21:38] or newer
[21:39] powersj: that's interesting, maybe that's part of why tests were taking so much longer on jenkins than on my laptop for a while
[21:39] hm..
[21:39] $ time sh -c 'for i in "$@"; do lxc exec x1 /bin/true; done' -- $(seq 1 10)
[21:39] real    0m1.705s
[21:39] user    0m0.240s
[21:39] sys     0m0.116s
[21:39] that plus the zfs backend :)
[21:39] powersj: haha, yeah, that definitely helped
[21:39] yeah - everything is running as fast as you were seeing now, so speed isn't an issue imo
[21:39] the current devel version should be even faster, image download times are less than half what they were before
[21:40] magicalChicken, yeah, your instance.execute() is what I'm asking about. above, there I did that in ~0.17 seconds each
[21:40] (which is still slow)
[21:40] but I recall that being like 3 seconds when watching it on powersj's machine
[21:40] powersj: magicalChicken: if I want to pip install python3-pylxd, which version string should I use?
[21:40] if those are much more similar with the lxc cmdline, then I'd say we keep pylxd
[21:40] and maybe we should anyway
[21:40] rharper, really... don't do that :)
[21:40] just use tox or a virtualenv to get you what you need
[21:41] rharper: for the version of the test suite in master, 2.1.3
[21:41] but yeah, pip is terrible
[21:41] I have 4 or 5 different versions of every python library installed on my system
[21:41] smoser: all my instance.execute() method does is call pylxd's execute()
[21:41] maybe a diff to tox.ini with a cloud-test group with the right deps?
[21:42] I am thinking that dropping pylxd may be good just so it doesn't break the test suite again
[21:44] I don't know
[21:44] pylxd does keep the code clean though
[21:45] yeah... and honestly it shouldn't be that bad to bring up a web socket
[21:46] so it *shouldn't* be that slow
[21:46] and we happen to work with the people who write it :)
[21:46] I don't like that I have to have gcc for it in a virtual env...
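(A minimal sketch of the 'wrapper around the lxc command line' idea discussed here: replace pylxd's execute() with a subprocess call. The function name and return shape are illustrative, mimicking what the test suite wants back from instance.execute(); it assumes lxc exec propagates the command's exit status.)

    import subprocess

    def lxc_execute(container, command):
        """Run a command in a container via the lxc CLI instead of pylxd.

        Returns (exit_code, stdout, stderr), avoiding pylxd's per-call
        websocket setup (and its descriptor leak).
        """
        proc = subprocess.Popen(
            ['lxc', 'exec', container, '--'] + list(command),
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = proc.communicate()
        return proc.returncode, out.decode(), err.decode()

    # e.g.: lxc_execute('x1', ['cat', '/var/log/cloud-init.log'])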
[21:46] I think the speed issue is mostly fixed
[21:46] the current issue is https://github.com/lxc/pylxd/issues/209
[21:47] it's leaking file descriptors, so the test suite can't get through a full run without running out
[21:47] well, we can ulimit -n for now
[21:47] and the speed thing should be fixable for sure...
[21:47] smoser: there's a half-finished fix for that bug, I just need to get it to pass tests and ping someone to pull it
[21:47] and we can/should improve our collection and such to do fewer execs
[21:48] as on other arches, those will be more expensive
[21:48] ie, ssh as the transport
[21:49] I think 1 exec per collect script isn't too unreasonable
[21:50] it may be possible to reduce the number of default collect scripts though
[21:50] sure, it's possible.
[21:50] we will just need to work things differently.
[21:50] but that's fine
[21:50] not a big deal now
[21:51] Yeah, on kvm execute() should be pretty fast as well
[21:51] it's only when we get to remote instances that it'll be slow
[22:09] http://paste.ubuntu.com/23772854/
[22:09] smoser: powersj: magicalChicken: that lets me run it under tox on my xenial host
[22:10] * rharper will figure out how to pass the specific test as args and just include the main run command eventually
[22:10] rharper: nice
[22:10] rharper: default behavior if -t is not specified is to just run everything
[22:11] right
[22:11] rharper: sweet... btw is that list of dependencies just a cut and paste of everything you had? Can we just specify pylxd and still run?
[22:11] that was the list of deps from apt-cache show python3-pylxd
[22:11] it's from the xenial testenv
[22:11] ok!
[22:12] and then the last 5 or so, below pylxd, are package deps
[22:13] 2017-01-09 16:08:06,209 - tests.cloud_tests - DEBUG - running collect script: ntp_conf_servers
[22:13] 2017-01-09 16:08:06,373 - tests.cloud_tests - DEBUG - running collect script: cloud-init-output.log
[22:13] 2017-01-09 16:08:06,534 - tests.cloud_tests - DEBUG - running collect script: cloud-init-version
[22:13] smoser, I'm seeing sub-second collects on 2.1.3 pylxd
[22:13] also on the zfs backend; I think that looks reasonable, but I'll wait until we see the whole thing run
[22:14] rharper: on my system it's usually ~8-10 seconds per test case, of which ~6 seconds is booting the system
[22:17] yeah
[22:18] why do we delete the base image each time?
[22:18] that's somewhat annoying since I already use ubuntu-daily:xenial
[22:18] rharper: to save disk space
[22:18] had it on my system
[22:19] there's a modified version of the image that the tests actually run from, which should be deleted
[22:19] right
[22:19] I could set it to leave the base image behind
[22:19] but the base images could stay
[22:19] sorta like sync-images in curtin
[22:19] the issue I run into is I only have 2G given to zfs on my system
[22:19] ideally we'd have a sync stage, and a run stage which uses what's present
[22:19] so I can only have 1 image at a time
[22:19] but yeah, the test suite doesn't need to do that
[22:20] * rharper hands magicalChicken an external SSD disk
[22:20] lol
[22:20] I have an external 1T disk, but no power cable :(
[22:20] but ssd?
[22:20] No, not ssd
[22:20] this one is usb3, 128G ssd, should be helpful
[22:20] powered via the usb bus, which is nice
[22:21] nice
[22:21] I'll pass it along at the next meet-up if you like
[22:21] serious? thanks :)
[22:21] that'll actually help a lot
[22:21] yeah
[22:22] I can add in an image-sync command that'll download every image available
[22:23] s/available/needed
[22:28] yeah
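(A rough sketch of what such an image-sync stage could look like: copy the needed daily images into the local store once, and let runs start containers from the cached, unmodified bases. The release list and alias scheme are illustrative, not the test suite's actual interface.)

    import subprocess

    # releases the suite would need base images for (illustrative list)
    RELEASES = ['xenial', 'yakkety', 'zesty']

    def sync_images(releases=RELEASES):
        """Cache unmodified base images locally so test runs don't
        re-download (and later delete) them every time.

        A real version would skip images already present in local:.
        """
        for release in releases:
            subprocess.check_call([
                'lxc', 'image', 'copy',
                'ubuntu-daily:%s' % release, 'local:',
                '--alias', 'cloud-test-base-%s' % release])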
[22:28] so when using tox -e foo, it injects a cloud-init-0.7.9.zip... how and when is that made? ie, how do I know that it includes what's committed or what's uncommitted?
[22:29] powersj: ^^
[22:29] wondering if the cloud-init that's running during the cloud_tests is my modified version or something else
[22:32] rharper: the cloud-init in the cloud-tests is either from a repo, a ppa, or a .deb
[22:32] rharper: unless you specify a specific deb to inject... what he said :)
[22:32] the test suite doesn't modify the cloud-init itself
[22:32] but how does it get injected into tox?
[22:33] it's not tox... it's the lxc image
[22:33] hrm
[22:33] don't we have a curtin-like pack where we package up what's in-tree and inject it?
[22:33] rharper: Oh, the version that gets copied into tox is probably just a clone of the current version
[22:33] rharper: but I don't think it's used
[22:34] There isn't a pack equivalent, just because the way cloud-init gets built and what version of python it uses all depends on the release
[22:34] so, I know at least when we run the tox command, we're using the right bits, since I see a change in the server values
[22:34] but when the cloud_tests runner launches an lxc container, it starts with the base image, and then doesn't it inject the current git tree?
[22:35] It can install the current git tree if you want it to, with a setup script
[22:35] or are we not yet to using mount-image-callback lxd:container-name?
[22:35] well, if I'm developing a new test, how do I validate it?
[22:35] rharper: the way setup works right now is you can execute a script or use one of the setup args
[22:35] rharper: I typically build the deb and specify it
[22:36] powersj: urg, really?
[22:36] rharper: I normally use --ppa "ppa:cloud-init-dev/daily"
[22:36] that's a long round-trip
[22:36] magicalChicken: well, you were creating tests for existing code
[22:36] rharper: you don't have to build a new deb just to test a new test case
[22:36] I'm fixing a bug
[22:36] so I need to run my fixed code; would prefer not to have a ppa build in the iteration loop, no?
[22:36] I'd just push to a ppa
[22:36] Oh
[22:37] -1 =P
[22:37] I can set a script up to do a build automatically
[22:37] I get the need for fast turnaround, but these aren't unit tests, so I guess I can accept some time required. Maybe a full ppa is too much then?
[22:37] so, we're doing this via tox; let me figure out what cloud-init-0.7.9.zip contains
[22:37] I think with tox, that zip will be the current contents of cloudinit/
[22:37] if that's the current tree bits, then it's a matter of having the runner pick that up and inject it into the snapshot
[22:37] Since that's used for unittests as well
[22:38] magicalChicken: I hope so
[22:38] yeah
[22:38] that makes sense
[22:38] * rharper checks .tox
[22:38] rharper: the tricky part is automatically building though
[22:38] I've never managed to build cloud-init on my system
[22:38] looks like the zip is installed but not kept
[22:39] Although I guess using setup.py rather than building a deb would work
[22:39] rharper: If you modify a file then run tox -e flake8, it sees the change immediately
[22:39] So I think it's just the current tree
[22:39] yeah
[22:41] ok, yeah, so pip has the tree installed; but we don't yet have a way of putting the tree into the container
[22:41] rharper: I can add that to setup_image
[22:41] magicalChicken: so with the proper deps on the host, packages/bddeb should create a deb from the current tree... you're saying that doesn't work for you?
[22:41] I've not managed to get bddeb working yet
[22:41] possibly due to your pip cruft; on a clean setup (like in a container) it should work
[22:42] Yeah, it works fine in a ppa, so that should work
[22:42] so, we could 1) copy the source into the container and then build cloud-init via packages/bddeb
[22:42] then inject that deb for the snapshot
[22:42] That should work fine, assuming that bddeb works
[22:42] it should, and if we run bddeb in a container, that should make it repeatable
[22:43] The other option is setup.py install, but that could be messy
[22:43] it's probably pretty close, though it will miss any pre/post stuff that dpkg did (which I don't think is that much)
[22:44] Yeah, I think dpkg mainly just does systemctl enable cloud-init
[22:44] I guess if I add the image sync feature then firing up containers is fast
[22:44] So there could be a 'build container' that runs before the rest of the tests start
[22:44] If the test suite is told to use the current tree
[22:45] right, I'd definitely sync the images needed
[22:46] That should be pretty straightforward, I guess the general policy could be to always leave unmodified versions of images around
[22:46] then you could copy the src into a build container; once it's built and installed, snapshot that; use that snapshot as the base for each test
[22:47] we might need to purge the build-deps and other things (that's going to add some startup cost)
[22:47] but I think for the developer path, that's not bad
[22:47] The build container could even be separate from the test containers
[22:47] for the normal ci path, pointing at a PPA or repo is fine
[22:47] right
[22:47] Just build, copy the deb back to the host, then nuke it
[22:47] that's also possible
[22:48] build once, copy out, then run your normal 'use this cloud-init.deb'
[22:48] Yeah, for ci it should work well to just use a ppa
[22:48] cool
[22:48] That would also make it possible to just build 1 deb for all ubuntu images too
[22:48] yeah
[22:48] it's python
[22:48] Only issue is trusty would need py2
[22:49] bddeb takes flags
[22:49] but trusty can just be run separately
[22:49] for python2 or python3
[22:49] you could run the build twice
[22:49] and generate the 2 and 3 versions, copy out, etc
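(A sketch of the build-once, copy-out flow discussed above, driving lxc from Python. The container name, paths, dependency list, and deb filename handling are all illustrative assumptions; it presumes the source tarball unpacks to cloud-init/ and that packages/bddeb drops a cloud-init_*.deb at the top of the tree.)

    import subprocess

    def build_deb_in_container(src_tarball, container='cloud-test-build'):
        """Build a cloud-init .deb from the local tree in a clean
        container, pull it back to the host, then nuke the container."""
        def exec_in(*cmd):
            subprocess.check_call(['lxc', 'exec', container, '--'] + list(cmd))

        subprocess.check_call(['lxc', 'launch', 'ubuntu-daily:xenial', container])
        # (a real version would wait here for the container network to come up)
        subprocess.check_call(
            ['lxc', 'file', 'push', src_tarball, container + '/root/src.tar.gz'])
        exec_in('tar', '-C', '/root', '-xzf', '/root/src.tar.gz')
        # illustrative subset of the build deps bddeb needs
        exec_in('apt-get', '-q', 'update')
        exec_in('apt-get', '-qy', 'install', 'devscripts', 'debhelper',
                'dh-systemd', 'python3-setuptools')
        # build, then give the deb a predictable name for the pull below
        exec_in('sh', '-c',
                'cd /root/cloud-init && ./packages/bddeb && '
                'mv cloud-init_*.deb /root/cloud-init.deb')
        subprocess.check_call(
            ['lxc', 'file', 'pull', container + '/root/cloud-init.deb', '.'])
        subprocess.check_call(['lxc', 'delete', '--force', container])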
[22:49] Simplest way might be just to add a new command to the test suite to build a deb in a clean container
[22:50] and then a script that does that, then runs the test suite with the given args
[22:50] sure; do we have a way of chaining these commands? right now, I just call the 'run' command
[22:51] You can tell 'run' to use multiple distros in 1 go
[22:52] like 'run -n xenial -n yakkety -n zesty -n stretch'
[22:52] also multiple tests and multiple platforms
[23:34] rharper: thanks for the tox starter :) here is a simpler version that works for me and an example of passing in arguments https://paste.ubuntu.com/23773398/
[23:36] now, as you suggested, if we combine that with building the local checkout in a 'build' container, you should have fast tests for easier development ;)
[23:37] powersj: nice, I'll pull that + rharper's tox config into the devel branch
[23:38] magicalChicken: cool, really there are only 2 differences: 1) you only need pylxd specified to run, if we want to lock down other requirements we can, and 2) the default arguments
[23:38] I'm going to try getting the build container + 'run from current tree' script put together tonight
[23:38] in mine, {posargs:-n xenial} basically means run with -n xenial unless you get something else, and in that case run that
[23:38] ok
[23:39] Right, yeah, passing in args is definitely nice there
[23:39] I'm actually surprised tox worked so well... I could swear you and I tried it and it failed terribly
[23:39] I'll try to base that off of the pylxd 2.1.3 version so that it doesn't have to wait on pylxd being fixed
[23:39] powersj: I'm pretty sure I had tried it before too
[23:39] I may have just done something wrong with the environment though
[23:40] magicalChicken: any concerns about doing this for when we add the KVM backend?
[23:40] starting to look at that is what I hope to do tomorrow
[23:40] hmm
[23:41] make sure we don't back ourselves into a corner now
[23:41] well, it's not going to affect the actual cli
[23:41] (you can think about it) ;)
[23:41] so we can always just switch back
[23:41] We're going to have to have root permission to modify the images for kvm
[23:42] Although I may be able to rig something up where that happens inside of lxd :)
[23:42] lol
[23:42] I have wanted to do that for a while, so vmtests can stop needing root
[23:43] Something like a fuse mount of a directory inside lxd where the image has been mount-image-callback'd