[06:11] ok, will have a branch up tomorrow on the SRU: https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/fix-device-path-from-cmdline-regression. Just need to complete a test, and the download rates I'm getting right now are painful (56K/b), so I'm leaving the download running til morning
[15:29] blackboxsw: is there a bug associated with that fix?
[15:30] yep dpb1, I'll dig it up and tie it in.
[15:30] https://bugs.launchpad.net/cloud-init/+bug/1725067
[15:30] Ubuntu bug 1725067 in cloud-init "cloud-init resizefs fails when booting with root=PARTUUID=" [High,Triaged]
[15:30] dpb1: ^
[15:30] the fix will land today
[15:33] oh, cool
[15:33] ah, ok, so caught during proposed testing
[15:33] ?
[15:43] rharper: http://paste.ubuntu.com/25802105/
[15:43] and blackboxsw
[15:43] that *does* pass tests.
[15:43] k
[15:43] we could ideally add some metadata to the patching that would make that work.
[15:44] would make the "block device" return for a stat or something.
[15:44] but that's getting fancy
[15:49] * rharper likes fancy
[16:11] dpb1: yep, caught before SRU, so it's where we want to catch it
[16:12] well, we'd like to catch that with unit tests/integration tests
[16:12] but, before release is better than the alternative
[16:14] blackboxsw: can it be added to a unit test?
[16:15] s/added to/covered by/
[16:15] dpb1: I added a unit test that gets most of it. I have time to add a more concise unit test
[16:18] yeah, my work that introduced this regression was to restructure the code so it was a bit easier to unit test (as it didn't have much in the way of testing). I just didn't add enough tests to validate this case. more to come
[16:18] * dpb1 nods
[16:38] blackboxsw: i think you forgot to push on fix-device-path-from-cmdline-regression
[16:38] https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/332634
[16:39] oops, pushed 129beef2..445ee933
[16:40] was debating adding one more unit test for the happy-path handle(), but I think I'll add that in a separate merge proposal as it'll be some more rework
[16:40] I want a test that shows we are properly attempting to resize a disk by uuid
[16:40] not just the test that detects the disk/by-uuid from the commandline
[16:41] but I think we don't have to block the SRU for this additional test, since we are manually covering that
[16:41] what do you think smoser
[16:49] i'm fine with your thinking there.
[16:49] rharper: smoser: if i have a cloud-config snippet in my default lxd profile for user.vendor-data a la http://paste.ubuntu.com/25802522/ does that mean no ubuntu user gets created (that is the behavior I am seeing)
[16:50] nacc: you're declaring the list of users to add
[16:50] so yes, you're overriding the default user
[16:50] rharper: ok
[16:50] rharper: I think I must have misunderstood that :)
[16:50] rharper: is there an easy way for me to just modify the existing users?
[16:51] are you just wanting to import your keys?
[16:51] you can use top-level ssh_import_keys: [list of keys here]
[16:51] which will get imported into the default user (ubuntu)
[16:51] rharper: ah ok
[16:51] rharper: but there is not a top-level ssh-import-ids?
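[Aside: a minimal sketch of the two cloud-config behaviors rharper describes above. This is illustrative only, not the contents of the linked pastes; the user name and Launchpad ID are placeholders.]

    #cloud-config
    # Top-level ssh_import_id: keys are imported for the OS default user
    # (e.g. "ubuntu" on Ubuntu images); the default user is still created.
    ssh_import_id:
      - lp-someuser          # placeholder Launchpad ID

    # By contrast, declaring a users: list replaces the default user, so
    # only "someuser" would exist and no "ubuntu" user would be created:
    # users:
    #   - name: someuser
    #     ssh_import_id: [lp-someuser]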
[16:51] ssh_import_id: raharper
[16:52] ok, adding a backlog bug for unit test coverage of resizefs
[16:52] rharper: ok, thanks
[16:55] ok, filed https://bugs.launchpad.net/cloud-init/+bug/1726489
[16:55] Ubuntu bug 1726489 in cloud-init "Need more happy path unit test coverage of cc_resizefs" [Medium,Triaged]
[16:56] we need it by 17.2 to avoid any repeated regressions
[16:56] (if we were touching the resizefs module)
[17:04] rharper: does that go in user-data or vendor-data?
[17:04] nacc: either will work
[17:04] rharper: hrm, ok
[17:04] nacc: generally you're a user
[17:05] things that lxd (or a cloud vendor) would set by default go into vendor-data; but for your own use/scripts/etc. user-data is fine
[17:05] rharper: ack
[17:05] http://paste.ubuntu.com/25802644/
[17:05] they get merged following a very well documented process.
[17:05] nacc: that's usually what I put in my user-data configs for vms and such
[17:05] powersj: https://jenkins.ubuntu.com/server/job/cloud-init-ci/
[17:06] why oh why. we've gotten so much less reliable recently. :-(
[17:06] dpb1: well, it's well documented but not always what folks expect;
[17:06] rharper: :)
[17:06] grr, FAIL: failed to install dependencies with read-dependencies ret=1
[17:06] yes, the documentation is there; the testing thing is what we need. :)
[17:06] smoser: it is largely due to the MAAS tests
[17:06] e.g. centos
[17:06] maybe we put in a retry on the yum package installs?
[17:06] err, i.e. centos
[17:06] yes, but why did those get less reliable?
[17:07] we tried to make them more reliable with the no-fastest-mirror
[17:07] smoser: well, we dropped the fastest mirror plugin (which we thought was the cause of our pains).
[17:07] until it's not
[17:07] rharper: hrm, it hasn't added my user's key -- but i'll debug it a bit
[17:07] when the "fast" mirror was fast
[17:07] heh, right, exactly
[17:07] it helped
[17:07] yes, that was committed on the 19th; even before then things were getting worse
[17:08] nacc: are you launching new instances or modifying an existing one?
[17:08] rharper: new instances
[17:08] yes, let's see a cloud-init.log
[17:08] rharper: i'm assuming it would write the appropriate key into .ssh/authorized_keys?
[17:08] yes, under ubuntu
[17:09] rharper: one sec
[17:09] http://paste.ubuntu.com/25802666/
[17:10] for this profile: http://paste.ubuntu.com/25802674/
[17:11] nacc: you can also just reference 'default' in your list.
[17:11] http://paste.ubuntu.com/25802680/ i think does what you want
[17:11] you can use that as vendor-data. user-data would override it.
[17:11] smoser: where default is a special string that means keep whatever the default user is?
[17:12] nacc: I think you want to change your '-' to '_' in the ssh_import_id key; at least that's what's in mine;
[17:13] nacc: yes. default is special, means "the OS-configured default user"
[17:14] rharper: it would be odd for the top-level field to be different from the users field, though?
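[Aside: a sketch of the "reference 'default' in your list" approach smoser suggests above, with placeholder names; this is not the contents of paste 25802680. Note the '_' spelling of ssh_import_id that rharper recommends.]

    #cloud-config
    users:
      - default                        # keep the OS-configured default user
      - name: someuser                 # placeholder additional user
        ssh_import_id: [lp-someuser]   # '_' form; '-' is only accepted inside users: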
[17:14] or is that a YAML quirk
[17:14] i'll try it
[17:14] rharper: (you can see in smoser's paste ssh-import-id is used there)
[17:17] rharper: that worked
[17:17] fwiw, I had smoser's variant before and c&p'd it up
[17:17] that's why i had '-' instead of '_'
[17:17] not sure if that's a bug or not :)
[17:19] one could file a bug and ask for '-' instead of '_', but I believe the manual/rtd shows '_'
[17:19] rharper: oh, it's ok -- just may want to update smoser :-P
[17:20] lol
[17:20] ironic
[17:20] :)
[17:22] rharper: nacc: both are accepted there in the users:
[17:22] generally speaking i think we prefer '_'.
[17:22] that's strange
[17:23] the ssh_import_id config module only checks '_'
[17:23] but the users dictionary kind of fronts "useradd" (or adduser, not sure which).
[17:23] so the users module is "helping" out?
[17:23] smoser: yeah, something feels odd
[17:23] 'normalize_users' is helping out, rharper
[17:23] smoser: well, that it does work sometimes
[17:23] err... normalize_users_groups
[17:23] what "does work sometimes"?
[17:24] smoser: '-' works in users but not at the top level
[17:24] (that is, ssh_import_id is the only thing that works at the top level)
[17:24] for ssh_import_id at least
[17:25] smoser: blackboxsw: what if we ran `yum makecache` with retries so we had the metadata cache on hand
[17:27] https://paste.ubuntu.com/25802800/
[17:27] rharper: smoser: do you want me to file a bug? i don't think it's a big deal, but it feels inconsistent, at least
[17:28] nacc: well, you can file a bug. i think probably the "fix" is to not document 'ssh-import-id' in the users/groups case.
[17:28] but only document the '_' varieties of all of those.
[17:28] smoser: right
[17:29] but do you really want to accept undocumented stuff?
[17:29] don't have a choice. :)
[17:29] heh
[17:29] at least not without a legacy warning period.
[17:29] right
[17:29] sure
[17:35] powersj: so what is the failure path that we're seeing?
[17:36] smoser: on my sampling of failures it is generally the "Cannot retrieve metalink for repository: epel/x86_64" or something similar.
[17:36] why would 'makecache' succeed when just doing an install does not? i'd think they'd hit the same resources.
[17:37] and i'd also think we want to change the 'install' to be yum -C install
[17:37] They do. I suggested it as it might be easier to put a retry around that one command, but maybe it doesn't matter
[17:37] otherwise we'd still hit the network :)
[17:39] blackboxsw: you started 2 minutes after me
[17:39] (i was 430, you are 431), which i think is testing the same thing.
[17:39] we can let them both run and have double the chance of success :)
[17:43] gambler
[17:44] smoser: yeah, oopsie, wanted to hope for success in either case
[17:47] i think you're ahead of me. or it's a dead heat
[17:48] it took you 90 seconds less to checkout
[17:52] powersj: not a bad idea on yum makecache. maybe we could do that part daily, and just use cache-only during yum commands?
[17:52] --cacheonly I mean
[17:52] i think the cache is local to the container that we're running in
[17:53] oh, daily doesn't help i don't think. but yes, if not paired with '--cacheonly' on the install i think it does nothing.
[17:54] yeah, looks like it; we'd have to push /var/cache/yum into the container if we wanted to leverage a daily cache update
[17:54] mmm, bind mount read-only
[17:55] it'd save the time spent refreshing the cache every test run
[17:56] and maybe even cut out some of the yum install errors
[17:59] hrm, kinda big: 45M /var/cache/yum/
[18:03] testing http://paste.ubuntu.com/25803004/
[18:05] lol@MAYBE_RELIABLE_YUM_INSTALL
[18:05] not bad at all :)
[18:05] jenkins #430 FAIL: failed to install dependencies with read-dependencies ret=1
[18:06] all hopes rest on the lone survivor... 431
[18:07] smoser: with that paste we may want to re-enable the fastest mirror plugin in tools/run-centos then, right?
[18:07] so we allow yum to use the fastest mirror and retry if makecache fails
[18:07] or use modulo to randomly enable/disable the fast mirror
[18:08] sorta want to just ship a USB disk up there and leave it attached
[18:10] if you are behind a proxy == don't use fastest mirror
[18:13] except when you do want it because it works faster
[18:13] per https://wiki.centos.org/PackageManagement/Yum/FastestMirror
[18:13] blackboxsw, powersj had pointed at some doc that said don't use fastest mirror if you have a proxy
[18:13] right
[18:13] of course, that makes a lot of sense to document, instead of just DTRT!
[18:13] except disabling it hasn't measurably improved things
[18:13] so, that's great, but it wasn't the core problem AFAICT
[18:14] hah, smoser on docs instead of DTRT
[18:14] file a bug :
[18:14] :)
[18:14] lol
[18:15] how could the fastest-mirror plugin possibly look at environment variables like http_proxy :)
[18:17] well, proxies are clearly slower than the fastest mirror
[18:19] nice comments on the blog post smoser, didn't notice we could do that
[18:19] ... comment inline, that is
[18:21] the 'comment' button.
[18:21] i wouldn't have gone looking, but you said something like "you all comment on it"
[18:21] and i assumed you meant there was something like that :)
[18:25] Error Downloading Packages:
[18:25] 1:perl-Module-Pluggable-3.90-144.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6/base/packages/perl-Module-Pluggable-3.90-144.el6.x86_64.rpm from base
[18:25] powersj: ^
[18:26] shan't we summon the yum mirror masters for help here?
[18:26] http://paste.ubuntu.com/25803096/
[18:27] i don't understand what makecache does
[18:28] smoser: your -C option means to use the rpm itself from the local cache... that's not what we want
[18:28] my hope was to wrap around makecache to get the actual repo files, as that seems to always be the part that fails, and understand if we have a mirror issue or a network issue.
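[Aside: the paste being tested above is not reproduced here; the following is only an illustrative sketch of the retry-wrapper idea (a MAYBE_RELIABLE_YUM_INSTALL-style helper). The function name, retry count, and sleep interval are arbitrary, and the installed package is a placeholder.]

    #!/bin/sh
    # Illustrative retry wrapper around yum, since transient metalink/mirror
    # failures are the most common error seen in these CI runs.
    retry_yum() {
        tries=10
        naplen=5
        n=1
        while :; do
            yum --assumeyes "$@" && return 0
            rc=$?
            echo ":: yum $1 failed [$rc] ($n/$tries); sleeping $naplen." >&2
            [ "$n" -ge "$tries" ] && return "$rc"
            n=$((n + 1))
            sleep "$naplen"
        done
    }

    # refresh repo metadata first, then install test dependencies
    retry_yum makecache
    retry_yum install python-nose    # placeholder; a real run would pass the full list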
[18:28] and by repo files I mean the package metadata info
[18:29] the problem is that we don't have a way to *pull* in everything
[18:29] my understanding of makecache is similar to apt update
[18:29] first
[18:29] "does not download or update any *headers* unless it has to to perform the requested action"
[18:29] (emphasis added)
[18:29] that is, those packages aren't going to be in the cache, so we can't use -C until we have a copy of those packages
[18:29] the makecache is for metadata only
[18:30] and yum install -C means "don't pull from repos, only from yum-cached rpms"
[18:30] so we've got a chicken/egg issue
[18:30] we need at least one successful run of the install with the yum conf keepcache=1
[18:30] but i'm missing something, clearly
[18:30] then we can yum makecache
[18:30] and then use yum install -C for all further runs
[18:30] because it looks to me like there is *ALWAYS* a chicken/egg there.
[18:31] yum install -C is equivalent to apt install --reinstall (but without downloading)
[18:33] I think we need to create a lambda-function microservice to host a yum repo for the CI run
[18:33] this is insane.
[18:33] it's not unrealistic to kick it out for now
[18:34] if the repos are unreliable, it's not a helpful test
[19:21] https://jenkins.ubuntu.com/server/job/cloud-init-ci/431/ looks hung
[19:25] yeeeppp
[19:28] don't kill anything yet
[19:29] cloud-test-ubuntu-xenial-snapshot-ojta1rm2ous4xibxcp1hrklcwhutd | FROZEN
[19:29] that's expected, right? the snapshot?
[19:30] yes
[19:31] should the zfs file be on the nvme...
[19:32] pid 405 is stopped
[19:32] it's a script
[19:32] running
[19:32] python3 -m tests.cloud_tests run --verbose --os-name xenial --test modules/apt_configure_sources_list.yaml --test modules/ntp_servers --test modules/set_password_list --test modules/user_groups --deb cloud-init_*_all.deb
[19:33] that's the integration test
[19:33] two tests are still "running"
[19:33] cloud-test-ubuntu-xenial-image-modification-1l8nht7xdh4kgp0n5aq and cloud-test-ubuntu-xenial-modules-set-password-list-gs123zqh6npu
[19:33] * rharper execs into them
[19:34] it completed
[19:34] but the harness isn't seeing the result
[19:35] does it ssh into it? or exec a cat of the result.json file?
[19:35] same for both
[19:35] they completed, no errors
[19:35] I was under the assumption jenkins waits for the return code of the process
[19:36] what stops the container?
[19:36] don't we poll for result.json in the lxd instance?
[19:36] jenkins is fine; its script spawned the above python3 -m tests.cloud_tests run command
[19:36] it ran 4 containers
[19:36] 2 of which completed and exited; these two are remaining, but idle
[19:37] cloud-init inside has completed successfully
[19:37] but the harness "watching" each of those container runs has somehow given up on its "exit" condition
[19:39] let's see if it likes me
[19:39] https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666
[19:42] rharper: from the jenkins log, I see 3 tests were launched and collected from. At this point, based on output alone, I would expect it is stuck/hung trying to get result.json
[19:42] The next thing it should be doing is launching a 4th container for the 4th test
[19:42] powersj: right, but where is the code in the cloud_tests where it attempts to get the result?
[19:42] collect.py
[19:42] what calls collect?
[19:43] run_funcs.py
[19:43] so instance.run_script
[19:47] hrm, where do we import pylxd?
[19:47] I see platform/lxd.py
[19:49] rharper: that should be it
[19:51] blackboxsw: shall we say screw jenkins for now?
[19:52] wrt fix-device-path...
[19:52] * blackboxsw was just writing up a retrospective document. that CI breakage on our test service has been painful, to say the least
[19:52] smoser: I think we need to move on (as we are manually testing this)
[19:52] and now have a little bit better unit test coverage
[19:53] we'll add the manual logs for deploying proposed for this bug fix we are pulling in
[19:55] powersj: well, I don't really see anything here; we don't have a way to get pylxd to log its commands? Like the execute?
[19:55] rharper: I'm not familiar enough with pylxd
[19:56] smoser: getting closer: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again
[19:56] :: failed [1] (1/10). sleeping 5.
[19:57] hrm, it's looping. so it looks like it's making it farther on attempt 2
[19:58] nice smoser: Complete!
[19:58] Installing deps: python-configobj python-oauthlib python-contextlib2 python-coverage python-httpretty python-six PyYAML python-jsonpatch python-jinja2 python-jsonschema python-setuptools python-requests python-unittest2 python-mock python-nose e2fsprogs iproute net-tools procps rsyslog shadow-utils sudo python-devel python-setuptools make sudo tar python-tox
[19:58] hooray!
[19:58] maaan
[19:58] that's ridiculous
[19:58] ridonkulous
[19:59] powersj: I'm thinking that we likely want a timeout on those executes, with a retry
[19:59] blackboxsw: fwiw, sudo is listed twice
[19:59] the container still works via the normal lxc client, so something's wonky with pylxd
[19:59] rharper: is it possible we are missing a raise in the run_script case?
[19:59] blackboxsw: there are no raises afaict
[19:59] so if pylxd doesn't raise on its failure, we just sit
[20:00] still smells like a pylxd issue
[20:00] ok, gotcha.
[20:00] rharper: agreed
[20:00] ohh, true nacc
[20:00] we can tweak the deps script to use set(pkg_list) to ensure uniques
[20:00] blackboxsw: yeah, so 'sudo' is in the COMMON list and then added manually
[20:00] blackboxsw: not sure it matters, but it probably implies there's not a uniq-like operation
[20:00] blackboxsw: yeah
[20:01] yeah
[20:01] wait. how did sudo get in there twice
[20:01] one can never have enough sudo
[20:02] ironically, i just figured out that my git-ubuntu job failure was due to not having sudo installed in the LXD image :)
[20:02] synchronicity between squads achieved!
[20:02] heh, smoser, I think it's probably pulling in deps from packages/pkg-deps.json
[20:02] hahaha nacc #achievementunlocked
[20:03] hm.. yeah
[20:06] smoser: officially approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666
[20:06] can you land it, and I can rebase and pull it into my branch
[20:07] blackboxsw: http://paste.ubuntu.com/25803766/
[20:07] that'd fix the duplicate sudo, which is probably "who cares"
[20:07] so i'm fine to ignore it for now.
[20:07] and so would your 'sort'
[20:08] err... set()
[20:08] landing now
[20:09] smoser: yeah, i don't think it's fatal, but given the output, it's nice to be able to parse it and not be surprised :)
[20:12] blackboxsw: it's in.
[20:12] blackboxsw: you should rebase on upstream/master and then push a change, and let's see if you get a happy face.
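[Aside: returning to the duplicated 'sudo' dependency noted above -- the paste is not reproduced here, but an order-preserving de-duplication of the package list, the set()-style fix being discussed, could look roughly like this hypothetical sketch; it is not the actual read-dependencies code.]

    # Hypothetical sketch: drop duplicate package names while keeping the
    # original order (a plain set() would lose the ordering).
    def dedupe_packages(pkg_list):
        seen = set()
        unique = []
        for pkg in pkg_list:
            if pkg not in seen:
                seen.add(pkg)
                unique.append(pkg)
        return unique

    print(dedupe_packages(["python-six", "sudo", "make", "sudo", "tar"]))
    # -> ['python-six', 'sudo', 'make', 'tar']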
[20:19] ok smoser, sorry, missed the paste. I was thinking set.union, but that looks just as 'noisy' as the for loop
[20:19] rebasing
[20:21] pushed
[20:21] rebuilding
[20:23] https://jenkins.ubuntu.com/server/job/cloud-init-ci/433/console
[20:38] blackboxsw: land?
[20:38] i approved.
[20:38] smoser: it just completed, I'm waiting on tox
[20:38] and will push
[20:38] \o/
[20:39] yeah, almost
[20:39] then you can do MPs for x, y, z
[20:39] nacc: we still do not have a 'bb', right?
[20:39] i.e., no archive open
[20:40] smoser: right
[20:40] :-(
[20:40] smoser: you can open tasks though
[20:40] sure.
[20:40] smoser: they are just against an unopened archive :)
[20:40] smoser: pushed
[20:40] afaik, no name yet anyways
[20:40] yeah. ok. so i guess we just upload a new upstream snapshot, blackboxsw, to x, y, z
[20:40] ok smoser, time to sync on that upstream
[20:40] just want to dot some i's and cross some t's
[20:40] I'm in the hangout
[20:41] k
[21:34] blackboxsw: uploaded x, z, a
[21:34] can't do x
[21:34] oops
[21:34] tomorrow we need to
[21:34] make a recipe build for artful
[21:34] and /me is a pumpkin now
[21:34] thx smoser
[21:34] see ya
[21:34] will start testing x, z