[01:25] rharper, ?
[01:25] here
[01:25] what is user_data.sh ? context ?
[01:26] above, tox -e citest == tox-venv citest
[01:26] maas bug
[01:26] Bug 1644229
[01:27] Read from http://10.2.0.50/MAAS/metadata/2012-03-01/user-data (200,
[01:27] 18443b) after 1 attempts
[01:27] util.py[DEBUG]: Writing to /var/lib/cloud/instance/scripts/user_data.sh - wb: [448] 13399 bytes
[01:27] I was wondering if maas always sends a "user_data.sh" during deployment, or if that's custom user-data from the maas user
[01:28] the bug is that the user_data.sh script doesn't exit 0; so cloud-init says the deployment failed
[01:36] i'm not sure what would write user_data.sh
[01:37] how do i open a x-7z ?
[01:42] smoser: 7z program, it has args like tar
[01:43] it doesn't do directories right though
[01:46] smoser: p7zip-full
[01:47] then 7z x
[01:47] rharper, i really dont know what wrote that.
[01:48] that too is obnoxious
[01:48] but anyway... i dont know what wrote that file.
[01:48] cloud-init in user-data is getting a multi-part input
[01:48] ok, I don't think it's "normal"
[01:48] and one of the parts is a file named user_data.sh
[01:48] right
[01:49] i dont see it in maas though
[01:49] right
[01:49] I suspect it's something they added for debugging or something but that's the cause of the failure (the script)
[01:49] if they remove it, things should work.
[01:49] ok. there it is
[01:49] generate_user_data
[01:50] in maas?
[01:50] yeah
[01:50] what data does it pull in ?
[01:51] looks like power info for one
[01:51] hrm, but that's config, not a script
[01:52] no. its a mime multipart
[01:52] one part x-shellscript
[01:52] which gets put in there
[01:52] i really would like to get the contents of /var/lib/cloud/
[01:52] that'd be sufficient
[01:52] ok
[01:55] its weird that it doesnt output anything
[01:55] it just exits fail
[01:56] i have to run
[04:02] powersj, rharper: 'run tests on current code' functionality is done, PR is at:
[04:02] https://code.launchpad.net/~wesley-wiedenmeier/cloud-init/+git/cloud-init/+merge/314496
=== shardy is now known as shardy_lunch
=== shardy_lunch is now known as shardy
=== rangerpbzzzz is now known as rangerpb
[14:50] magicalChicken: sweet!
[15:54] magicalChicken: thanks! will test it out later this morning
[16:52] magicalChicken: fyi - I like the previous layout of having an inheritance model with common interfaces and I wish to continue using that model. My hope was to determine the flow and methods we use inside the KVM interface.
[16:54] The issue we may have is the model we use to interact with a VM versus a container. With the container we can execute commands directly, with the VM we would need SSH or turn the system off and mount.
[17:00] powersj: An additional setup option to set a root password and enable ssh could be used, something like setup_image.backdoor
[17:00] Then we could just ssh in with those credentials for everything
[17:01] Since the vm would be running on our local host, or at least on the same LAN there wouldn't be much cost to ssh
[17:02] The execute method could still get stdout and exit code over ssh
[17:02] magicalChicken: as long as smoser is fine with us modifying the image like that, ok :) I was trying to avoid modifying the image in a way that may affect a test. However, when we add additional platforms like the clouds, we will have to add SSH anyway it seems.
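As an aside on the multi-part user-data discussed earlier in this log: MAAS hands cloud-init a MIME multipart document, and a text/x-shellscript part is what cloud-init writes under /var/lib/cloud/instance/scripts/ and runs. A minimal sketch of how such a payload is put together (the filename and script body are illustrative only, not taken from MAAS's generate_user_data):

    # Illustration only: a multi-part user-data payload with one
    # x-shellscript part, similar in shape to what cloud-init unpacked
    # above.  The filename and script body are made up for the example.
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    script = "#!/bin/sh\necho 'doing extra deployment work'\nexit 0\n"

    user_data = MIMEMultipart()
    part = MIMEText(script, _subtype='x-shellscript')
    part.add_header('Content-Disposition',
                    'attachment; filename="user_data.sh"')
    user_data.attach(part)

    # cloud-init writes x-shellscript parts into
    # /var/lib/cloud/instance/scripts/ and runs them; per the bug above,
    # a script that does not exit 0 gets the deployment reported as failed.
    print(user_data.as_string())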
[17:02] I think smoser is right with exec;
[17:02] I think we want to use exec over whatever transport
[17:02] and the substrate layer should do what it needs to enable exec (remote ssh, if needed)
[17:02] Yeah, I think for everything other than lxd, ssh is the easiest way to handle exec
[17:03] ok
[17:03] I think its also a fair assumption that after a certain timeout, if the system isn't accessible over the network, it can just be stopped
[17:04] For kvm, we could just send SIGKILL to qemu for shutdown
[17:04] why not use a trailing collect script to shutdown like we do in vmtest ?
[17:05] at least in powersj's proposal, we're injecting the collect scripts anyhow; no reason not to also have a boilerplate to shutdown the instance at the end of collection
[17:05] the platform objects are used with a context manager that shuts down using instance.shutdown()
[17:05] i don't think it makes sense to inject collect scripts
[17:05] * smoser has to read
[17:06] magicalChicken: that's fair (no inject) if we're settled on exec
[17:06] push_file() can be implemented with execute
[17:06] yeah, so run_script() is just push_file() + execute('/bin/bash file')
[17:06] yeah it sounds like using exec means using ssh over injecting files + running + shutdown + pull files
[17:07] well, pull_files and then shutdown ?
[17:07] pull_files() can be done over ssh as well
[17:07] the alternative is using a secondary disk image which can be accessed directly offline
[17:07] and shutdown is just execute('shutdown')
[17:07] ala vmtests 'collect disk'
[17:08] that would require collect behavior to change based on platform though
[17:08] it wouldn't be too bad, just for part of collect, but its still more work
[17:09] sure but that's why we're having platform abstraction ?
[17:09] the abstraction was originally just over how we get to the point where we have something we can call execute() on
[17:10] it may also be nice to have ssh set up so we can get in for debugging
[17:11] I think file pull via execute is fine; we can avoid adding the secondary disk if possible;
[17:12] I would like a 'keep the instance on failure' flag like we have in vmtest
[17:12] specifically for live inspection
[17:12] in which case, it could emit the ssh info to connect
[17:12] Yeah, 'keep instance on failure' would be nice
[17:12] I've been meaning to do that for lxd as well
[17:13] And have it configurable via cmdline
[17:13] Yeah, there could be a debug message with ssh info during setup
[17:15] magicalChicken: rharper: sounds like the current model then is to get image, modify image by adding ssh backdoor, for each test exec will run the commands specified via ssh and output collected then, not when turned off.
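A rough sketch of the layering settled on above, assuming execute() is the only transport-specific primitive and everything else is built on it; the class and helper names are illustrative, not the actual test-suite code:

    # Sketch only: the real platform/instance classes in the integration
    # test suite may differ; execute() is the one transport-specific call.
    import base64


    class Instance(object):
        """Base instance; lxd/kvm/cloud subclasses provide execute()."""

        def execute(self, command, stdin=None):
            """Run command (a list of args) as root; return (out, err, rc)."""
            raise NotImplementedError

        def push_file(self, local_path, remote_path):
            # push_file() implemented with execute(): ship the bytes over
            # stdin, so no second transport channel is needed.
            with open(local_path, 'rb') as stream:
                data = base64.b64encode(stream.read())
            return self.execute(
                ['sh', '-c', 'base64 -d > "$1"', 'push', remote_path],
                stdin=data)

        def run_script(self, script_text):
            # run_script() is just push_file() + execute('/bin/bash file').
            target = '/tmp/test-script.sh'  # illustrative location
            self.execute(['sh', '-c', 'cat > "$1"', 'push', target],
                         stdin=script_text.encode())
            return self.execute(['/bin/bash', target])

        def shutdown(self):
            # and shutdown is just execute('shutdown')
            return self.execute(['shutdown', '-h', 'now'])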
[17:15] Also, for the most part, run_script() is the only consumer of execute(), since pull_file() is only used by 'bddeb' and push_file() is only used if running with --deb or --rom
[17:16] powersj: Yeah, I think it makes sense to do it like that
[17:16] For the image modification and setup, images.execute() could just be a call through to mount-image-callback
[17:17] there's possibly a missing step of uploading the modified image
[17:17] on clouds, we'll need to create the backdoored image, upload it, then use the uploaded image
[17:17] but at least for the 'local' case lxd and kvm, mic is fine
[17:17] There's a snapshot object in between image modification and launching instances
[17:18] It should work fine for the snapshot to represent a remote image that can be launched right away
[17:18] So the upload could happen during snapshot.__init__
[17:21] i'm pretty happy with the backscroll there. :)
[17:22] magicalChicken, yes, snapshot was intended to do the upload... basically that takes an "image" and turns it into something that can be started.
[17:22] wrt ssh... do we have a test that disables ssh ?
[17:22] if we do not, then at the moment we can punt on backdooring the image
[17:22] and just use port 22
[17:23] default root password might differ between distros though
[17:23] well, we're not going to go in as root.
[17:23] well, maybe you would
[17:23] don't believe we have a disable ssh test at the moment. just a number of ssh key generation tests
[17:23] but if you backdoor, you just add a user that can sudo
[17:23] right, shutdown could be a 'sudo shutdown'
[17:24] i don't think any of the collect scripts really need to be root
[17:24] well, execute() assumes root
[17:24] it should at least
[17:24] it could just always sudo then
[17:24] that can be done over ssh easily enough
[17:24] it takes cmd as a list so just ['sudo'] + cmd should work fine
[17:25] almost
[17:25] :)
[17:25] http://smoser.brickies.net/git/?p=snippits.git;a=blob;f=bash/ssh-attach;hb=HEAD
[17:25] more context on that basic path at
[17:25] https://gist.github.com/smoser/88a5a77ab0debf268b945d46314ea447
[17:26] Oh, that's really nice
[17:26] So it doesn't flatten everything into 1 cmd
[17:26] s/cmd/arg
[17:27] well, yeah. the wrapper unflattens
[17:28] if we're doing ssh, we probably do want to use a python library for it...
[17:28] paramiko
[17:28] but the command execution wrapper business still can be managed.
[17:29] Yeah, that should be cleaner than using subp
[17:30] The new img_conf format in the devel version of the test can automatically apply certain setup options to images on certain platforms, so that handles enabling backdoor on kvm
[17:30] There's a lot of general cleanup in that branch other than just enabling debian/centos, so its probably best to build this off of there
[17:33] magicalChicken: didn't see any comments from you on image sourcing. rharper suggested reusing curtin's sync. smoser any comments there?
[17:34] powersj: We'd need to have something url based as well for other distros, but that could be files thrown into the same directory with a separate index for them
[17:34] I think curtin image sync makes sense
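For the ssh path discussed a few lines up (paramiko plus a ['sudo'] + cmd prefix), something like the following would do; host, user and key are placeholders, and a real version would want the argument-preserving wrapper from smoser's ssh-attach snippet rather than naive flattening:

    # Sketch of execute() over ssh with sudo; host/user/key come from
    # wherever the kvm platform records the backdoored image's credentials.
    try:
        from shlex import quote          # python 3
    except ImportError:
        from pipes import quote          # python 2

    import paramiko


    def ssh_execute(host, user, keyfile, cmd):
        """Run cmd (a list of args) as root via sudo; return (out, err, rc)."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=user, key_filename=keyfile)
        try:
            # execute() assumes root, so always prepend sudo.
            full_cmd = ['sudo'] + list(cmd)
            # Flattening to a single string loses argument boundaries;
            # this is what the ssh-attach wrapper handles more carefully.
            _, stdout, stderr = client.exec_command(
                ' '.join(quote(arg) for arg in full_cmd))
            out = stdout.read()
            err = stderr.read()
            return out, err, stdout.channel.recv_exit_status()
        finally:
            client.close()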
[17:34] powersj, well, we want to sync yes.
[17:34] but curtin syncs maas images
[17:34] what we don't have AFAIK, is any streams data for other distro images
[17:35] which require a kernel external
[17:35] no boot loader
[17:35] it may be we need a sync per 'distro'
[17:35] rharper: that's what i meant with having a raw download url as well
[17:35] well, that's not what I mean
[17:35] oh
[17:35] streams publishes where the URL is
[17:35] well... http://smoser.brickies.net/image-streams/
[17:35] just for ubuntu though
[17:35] ie, if they're building new ones, you can't just blindly wget $URL which may not get you what you want
[17:36] magicalChicken: even for ubuntu, we use the streams data to figure out what we want from what's available
[17:36] so that is just manually maintained and updated
[17:36] rharper: I thought at least for debian there was a kind of 'latest build' url
[17:36] smoser: sneaky =)
[17:36] but, it works. and we could do something similar
[17:36] magicalChicken: right but that's still not what we want
[17:36] rharper: right o
[17:36] if you can only recreate on a specific image or release; you sort of want history
[17:37] yeah that makes sense
[17:37] but: http://smoser.brickies.net/image-streams/ would work
[17:37] yeah
[17:37] thats actually pretty nice
[17:37] we do want to at least be able to easily see that something changed between two runs
[17:38] my images are currently synced into serverstack
[17:38] nice
[17:38] would be nice to see if there are other sources of published images that are newer
[17:38] like AMIs ?
[17:38] surely there are newer centos7 images
[17:38] smoser: in general do you want to host the pulling of images ?
[17:39] so.. in my design i really just pushed this all off to the "platform"
[17:39] we might need to mirror that service to prodstack
[17:39] i forget what i called it, but essentially you ask the platform to get you an image that you can modify
[17:39] yeah
[17:39] platform.get_me_an_image("ubuntu/foo")
[17:39] I think that's a good abstraction
[17:39] since we'll need to poke at each substrate differently
[17:39] smoser: that had to be modified a bit
[17:40] there's basically an image config with information about how to locate the image
[17:40] which can be different on each platform
[17:40] so 'xenial' -> 'os=ubuntu release=xenial arch=amd64'
[17:40] thats more an 'alias' than a 'config'
[17:41] theres more information there as well
[17:41] * smoser looks
[17:41] like timeouts for stuff and setup options that may be required
[17:41] don't look at master, its broken
[17:41] look in wesley-wiedenmeier/cloud-init:integration-testing
[17:42] That's the main reason I want to base the kvm development off of the current version of the tests, the new img_conf format is much cleaner
[17:43] hm... i'll read some. i'm not convinced :)
[17:43] smoser: the version in master is pretty bad, it shouldn't be used
[17:44] but there is always going to have to be a kind of alias system, since we want the same os_name to refer to the same release on every platform
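One possible shape for that alias/config layer, purely as an illustration (the real img_conf format lives in the integration-testing branch and differs in detail): a short release name maps to attributes that locate the image plus per-platform settings such as boot timeouts and setup_image options:

    # Hypothetical img_conf-style data; keys and values are illustrative.
    RELEASES = {
        'xenial': {
            'default': {'os': 'ubuntu', 'release': 'xenial',
                        'arch': 'amd64', 'boot_timeout': 120},
            'lxd': {'alias': 'ubuntu/xenial'},
            'kvm': {'mirror_url': 'http://example.invalid/image-streams/',
                    'setup_image': ['backdoor']},
            # a cloud platform entry might instead carry an image id and a
            # longer boot_timeout
        },
    }


    def load_os_config(platform_name, os_name):
        """Merge per-release defaults with the platform-specific pieces."""
        conf = dict(RELEASES[os_name]['default'])
        conf.update(RELEASES[os_name].get(platform_name, {}))
        return conf


    # getting an image is then roughly:
    #   image = platform.get_image(load_os_config('lxd', 'xenial'))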
[17:45] i still dont follow really.
[17:45] so whether the identification info is in 1 place or inside the platform or inside releases.yaml doesn't really matter, its the same thing
[17:45] smoser: each release has a name, so 'xenial', or 'stretch' or 'centos70'
[17:45] and the new img_conf maps that name and the platform name to all config needed to locate and use that image on that platform
[17:46] so that platform.get_image() can just be passed config.load_os_config('platform_name', 'image_name')
[17:46] so saying xenial in a run of the integration tests knows which AMI to pick on AWS or what lxc command line option to use or what simplestreams command to run to get it for kvm
[17:52] i agree that something needs to translate 'ubuntu/xenial' into an image, and that takes some additional information.
[17:52] a.) 'ubuntu' is the os, and 'xenial' means 16.04 ...
[17:53] b.) where to get this image or create access to it (get an ami or use lxc or ... )
[17:53] i think though, that i really consider the details of that to be a platform thing
[17:53] possibly even configurable through the platform
[17:54] smoser: it is configurable for each platform separately
[17:54] in the new format
[17:54] the main reason for having the image config and per-platform image location information together
[17:54] is that some of the image config may change based on the platform
[17:55] i.e. the timeout for booting xenial on lxd is not necessarily the same as the timeout for booting xenial on aws
[17:55] or setup_image options that are enabled by default are not the same for all platforms
[17:56] the actual implementation of downloading the image is handled by the platform object, the img_conf information is just used by that
[17:57] thats fine and makes sense. but you've made the 'Platform.get_image()' much less easily usable
[17:57] you can't do anything without a platform
[17:57] getting an image is just platform.get_image(config.load_os_config('lxd', 'xenial'))
[17:57] so having that platform thing be easily usable is important.
[17:58] I could also do the config.load_os_config silently inside the platform
[17:58] so it could be platform.get_image('xenial')
[17:58] or even change os names to be in the 'ubuntu/xenial' format
[17:59] i dont have strong feelings on 'ubuntu/xenial'. it is obvious what that means to you and me ('xenial').
[17:59] but it is not so obvious if you just use '7'
[17:59] rather than centos/7
[18:00] the centos ones are 'centos70' and 'centos66' rn
[18:00] i think having a delimiter there makes sense.
[18:00] but... meh. not all that important.
[18:01] you're right in that something has to take that string and make sense of it.
[18:01] i think we're mostly in agreement.
[18:01] yeah, the delimiter might look nicer
[18:01] the name is just used as a dictionary key, so its pretty easy to replace
[18:03] its ok if we have some alias thing that turns one into another.
[18:07] for now, i think we should just not bother with non-ubuntu on kvm. dont halt yourself on that. we'll find images, and then we'll enable other os there.
[18:08] that makes sense
[18:08] if the sstreams mirror is per distro, it would be no trouble to add another one once a source is found
[18:09] There's also a spreadsheet I have going to track all this at:
[18:09] https://docs.google.com/spreadsheets/d/1DAzBlh-wk-rv-WRjllNRG6nnHtAmD0EFBLEYtu8weII/edit#gid=0
[18:12] something that comes to my mind...
[18:12] the kvm 'platform'
[18:13] lxd is a cloud platform, because it handles metadata for us (and puts that stuff into /var/lib/cloud/seed/)
[18:13] kvm is not a cloud platform
[18:14] kvm+NoCloud is analogous to lxd in that sense.
[18:14] i think when we say 'kvm', we're really meaning "kvm+NoCloud" and even then probably kvm+NoCloud-attachedDisk (versus seeding noCloud).
[18:15] yeah, i think kvm + seed disk makes the most sense to do
[18:15] since that's used by some cloud setups
[18:21] well, seed disk differs.
[18:21] no cloud to my knowledge uses NoCloud other than uvt
[18:21] ConfigDrive is different
[18:22] how do i deal with root...
[18:23] we could try basing this on ConfigDrive then
[18:23] to test openstack support
[18:32] i think nocloud is fine and probably best now. configdrive is a bit more entailed.
[18:32] and we can get that easily enough from a real-ish openstack
[18:32] https://gist.github.com/smoser/b32bb1c33564d1d46971cd9ded2e8477
[18:32] magicalChicken, ^ that is a failsafe ssh that i had set up.
[18:33] i think there are some bugs in it, but it is a starting point
[18:33] and https://code.launchpad.net/~smoser/+junk/backdoor-image
[18:33] smoser: nice, that looks pretty easy to use
[18:33] that backdoors an image, adding a user that can sudo
[18:34] it might be nice to hook in the failsafe root console too
[18:34] for debugging
[18:35] yeah, I could add that as a setup_image option
[18:35] you add that, and then hit 'alt-f2' and 'enter' and root prompt
[18:35] :)
[18:35] smoser: nice, would be good for vmtests too
[20:21] magicalChicken: when I check out your branch I need to create a tag before I build it looks like
[20:22] powersj: the integration-testing-invocation-cleanup branch?
[20:22] do you run something like `git tag -a 0.7.9 -m "my test"`
[20:22] yeah
[20:22] no
[20:22] it should just work
[20:22] it commits inside the build container, so it doesn't mess with the main repo
[20:22] well running the tox citest_run fails because git describe fails
[20:22] I'm not sure why that's happening
[20:22] is git describe failing inside the build or on the host?
[20:23] Is this the proper way to check out your branch?
[20:23] git clone -b integration-testing-invocation-cleanup https://git.launchpad.net/~wesley-wiedenmeier/cloud-init
[20:23] host
[20:23] I'm not even sure what would be calling git describe in the host other than tox
[20:23] well that is where it fails
[20:23] and the zip built by tox isn't used for anything really, it shouldn't affect anything
[20:24] does tox -e flake8 work?
[20:24] here is an example of what I was doing but via jenkins: https://jenkins.ubuntu.com/server/job/cloud-init-citest-run/1/console
[20:25] and no flake8 or just 'tox' doesn't work until I create a tag with 0.7.9 in it
[20:25] its failing before cloud_tests are even called
[20:25] something broke in the git clone
[20:25] after cloning I don't have any tags from your branch
[20:26] ?
[20:26] why
[20:26] no idea but git tag -l shows nothing
[20:26] Let me try to clone in a clean environment, I have no idea how that would happen
[20:27] (this is where my git foo is lacking)
[20:33] powersj: I'm seeing the same issue with git describe from cloning with that url
[20:33] what repo (i can try and help resolve the git side, at least)?
[20:34] nacc: ~wesley-wiedenmeier/cloud-init:integration-testing-invocation-cleanup
[20:34] nacc: I think it must be something to do with cloning via https on launchpad, because using my ssh key it works
[20:35] magicalChicken: hrm, i see no tags over in your repo?
[20:35] https://git.launchpad.net/~wesley-wiedenmeier/cloud-init/refs
[20:36] seems to only list branches?
[20:36] nacc: that is really strange, i see tags on my working copy
[20:37] magicalChicken: with a fresh clone? let me also try locally
[20:37] maybe I'm only seeing the tags from my upstream remote and they didn't get pulled in
[20:37] I'm going to try with a fresh clone again, maybe my repo just doesn't have tags at all
[20:38] magicalChicken: it's possible you only pushed your branches and not tags by refspec? (or with --tags)
[20:39] nacc: I might have, would 'git push --tags' resolve?
[20:39] magicalChicken: presuming that's what you want to do (push all your local tags) (and you might need to specify a remote, depending on your git configuration for that repository)
[20:40] nacc: I have my repo set as default remote, I think it worked
[20:40] There's the same tags as upstream at https://git.launchpad.net/~wesley-wiedenmeier/cloud-init/refs now
[20:40] I must have just messed up when I set my repo up originally
[20:40] yep i see tags now
[20:41] nacc: thanks for the help, I'm still not great with git
[20:42] powersj: clone + describe is working now
[20:42] magicalChicken: ok
[20:42] nacc: thank you!
[20:42] http://paste.ubuntu.com/23783539/
[20:42] magicalChicken: np! i think by default, unless you specify a push refspec in your git config, `git push` only pushes your current branch (see `man git-push` for the defaults)
[20:43] i guess that makes sense as default behavior since you may have tags just for your own reference
[20:43] yep
[20:51] powersj, you dont have the tags locally
[20:51] nacc knows that sort of stuff
[20:52] powersj should be relying on the upstream tag
[20:52] smoser: yeah I believe nacc got us all sorted out now :)
[20:52] ah. i see.
[20:52] no more little hack
[20:53] magicalChicken: https://paste.ubuntu.com/23783582/ on my laptop things timed out, on jenkins it is running just dandy :\
[20:54] jenkins run so far: https://jenkins.ubuntu.com/server/job/cloud-init-citest-run/2/consoleText
[20:54] is there a way to triage where it is getting stuck or slowing down?
[20:57] this is all running on old pylxd so stuff may be failing silently
[20:57] looks like jenkins run is working perfectly though
[20:57] yeah
[20:57] I'm not sure what caused timeout on your laptop
[20:57] hmm
[20:57] looks like it failed before the first instance ever booted
[20:57] I will disconnect from VPN and make sure that isn't killing me again
[20:57] you may want to try increasing timeout a bit for bddeb
[20:58] i thought about adding a flag to adjust it but it didn't seem needed on a decent internet connection
[20:58] because the initial boot for bddeb installs devscripts and that has a ton of deps
[20:59] also, looking at this, I should have used run_stage for the tree_* commands, since it tried to go and do the actual run even though build failed
[20:59] I'll switch to using that real quick, it'll be cleaner too
[20:59] ok and which timeout should I bump?
[20:59] xenial boot timeout
[20:59] ok
[21:00] ah ok so the generic timeout for a release
[21:00] with the old img_conf there's only one
[21:00] ah that's right
[21:01] it takes ~80 seconds for me to do initial boot including installing devscripts for bddeb, so I could see it taking 120s if you're on vpn
[21:05] yep it took just under 3 mins
[21:05] :(
[21:06] that's way slower than it should be, but i guess its just network speed
[21:06] I don't have the best connection when I'm in WA
[21:06] the problem is devscripts has py2 deps
[21:06] so tons of stuff gets pulled in
[21:07] nothing you can really do about it.
[21:07] you cant test a deb without building a deb
[21:07] its fine for jenkins since the servers have good network
[21:07] still not really ok.
[21:07] its still a *ton* of io
[21:08] but, dont know what we can really do about it.
[21:08] I think the bddeb/tree_run paths are really only for testing stuff in local branches anyway
[21:08] bddeb could probably use dpkg-buildpackage
[21:08] Jenkins can just build 1 deb and use it for all tests
[21:08] which has a bunch less
[21:08] yeah, debuild may be overkill
[21:08] well, it can also just use the daily build ppa
[21:09] and not build anything
[21:09] yeah, that's probably the cleanest way to do it
[21:09] ppa support works well, I pretty much only use that for local testing
[21:14] magicalChicken: so the use case for this was a local developer (e.g. rharper) creating a test and wanting to try it out without needing a whole build env.
[21:14] smoser: in general, when I'm working on a fix that adds a test-case change, it'd be nice to be able to push the current tree into the image and run that; like we have with curtin; a close second is building out of tree, which is what magicalChicken was doing; at least for me, that's a useful workflow for iterating on code/testcases
[21:14] ^^^ that :P
[21:14] powersj: well, waiting on ppa build sucks
[21:15] right, so help you
[21:15] so waiting for this is a small price to pay versus a ppa
[21:15] and the followup for magicalChicken was that his package/bddeb didn't work so do it in a "clean" environment
[21:15] rharper, of course it is.
[21:15] I think it would be nice if we had a pack equivalent since that's even faster but
[21:15] i wasnt saying that it wasnt.
[21:15] Yeah, my python installation is most likely the cause
[21:15] smoser: ah, sorry
[21:16] we could add an "install from trunk"
[21:16] but then you end up building yourself a package manager or all the stuff that is being put into the package already.
[21:17] The bddeb route can run completely (bddeb + 1 test case) in 3 minutes for me, which isn't too bad
[21:17] s/3/4
[21:17] Once we're preserving images as well instead of downloading each time it'll be closer to 2, so I think that's fast enough
[21:23] magicalChicken: and we re-use the base-image-download + inject bddeb
[21:23] so, we shouldn't pay that cost more than once per base-image
[21:23] right ?
[21:24] rharper: yeah, we'd just keep all the base images downloaded
[21:24] then make a copy (which uses zfs copy on write) for the snapshot+instance used to build the deb in and the snapshot+instances for tests
[21:24] so only 1 download, and possibly none if we already had an up to date image from running tests before
[21:25] y
[21:35] magicalChicken: https://paste.ubuntu.com/23783735/ a timing example for you
[21:36] powersj: that's pretty slow, but 4 minutes of that were downloading images
[21:36] let me hop off vpn and try again brb
[21:38] I just ran with 'tox -e citest_run -- -n xenial -t modules/final_message' and got 'real 5m5.283s' for time
[21:38] not much better, but a bit
[21:49] well that made no difference
[21:49] probably limit is local isp, not the vpn then
[21:50] I'm working on config for keeping images right now, going to cherry pick new img_conf format out of devel branch back to version in master, add it on there, then rebase bddeb on that
[21:50] that'll be the biggest speed increase possible
[21:52] magicalChicken: ok I'm about to comment on the merge with the tests I have run so far
[21:53] cool
=== rangerpb is now known as rangerpbzzzz
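On the image re-use point above, a minimal sketch of a local cache that pays the download cost once per base image; the paths, naming scheme and downloader hook are hypothetical:

    # Sketch: keep one downloaded base image per (platform, release, serial)
    # and only fetch again when the mirror publishes a newer serial.
    import os


    def get_base_image(cache_dir, platform, release, serial, download):
        """Return the path of a cached base image, downloading at most once."""
        path = os.path.join(
            cache_dir, '%s-%s-%s.img' % (platform, release, serial))
        if not os.path.exists(path):
            if not os.path.isdir(cache_dir):
                os.makedirs(cache_dir)
            download(release, serial, path)  # hypothetical downloader hook
        # per-test instances are then cheap copy-on-write clones of this
        # image (zfs/lxd snapshots for lxd; qcow2 backing files would be
        # the kvm analogue), so the download cost is paid once per image.
        return path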