[06:11] <blackboxsw> ok will have a branch up tomorrow on the SRU https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/fix-device-path-from-cmdline-regression. Just need to complete a test and the download rates I'm getting right now are painful (56Kb/s) so I'm leaving the download running till morning
[15:29] <dpb1> blackboxsw: is there a bug associated with that fix?
[15:30] <blackboxsw> yep dpb1 I'll dig it up and tie it in.
[15:30] <blackboxsw> https://bugs.launchpad.net/cloud-init/+bug/1725067
[15:30] <blackboxsw> dpb1: ^
[15:30] <blackboxsw> the fix will land today
[15:33] <dpb1> oh, cool
[15:33] <dpb1> ah, ok, so caught during proposed testing
[15:33] <dpb1> ?
[15:43] <smoser> rharper: http://paste.ubuntu.com/25802105/
[15:43] <smoser> and blackboxsw
[15:43] <smoser> that *does* pass tests.
[15:43] <rharper> k
[15:43] <smoser> we could ideally add some metadata to the patching that would make that work.
[15:44] <smoser> would make the "block device" return for a stat or something.
[15:44] <smoser> but thats getting fancy
[15:49]  * rharper likes fancy
[16:11] <blackboxsw> dpb1: yep caught before SRU, so it's where we want to catch it
[16:12] <blackboxsw> well, we'd like to catch that with unit tests/integration tests
[16:12] <blackboxsw> but, before release is better than the alternative
[16:14] <dpb1> blackboxsw: can it be added to a unit test?
[16:15] <dpb1> s/added to/covered by/
[16:15] <blackboxsw> dpb1: I added a unit test that gets most of it. I have time to add a bit more concise unit test
[16:18] <blackboxsw> yeah, my work that introduced this regression was to restructure the code so it was a bit easier to unit test (as it didn't have much in the way of testing). I just didn't add enough tests to validate this case. more to come
[16:18]  * dpb1 nods
[16:38] <smoser> blackboxsw: i think you forgot to push on fix-device-paht-from-cmdline-regression
[16:38] <smoser> https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/332634
[16:39] <blackboxsw> oops pushed 129beef2..445ee933
[16:40] <blackboxsw> was debating about adding one more unit test for happy path handle(), but I think I'll add that in a separate merge proposal as it'll be some more rework
[16:40] <blackboxsw> I want a test that shows we are properly attempting to resize a disk by uuid
[16:40] <blackboxsw> not just the test that detects the disk/by-uuid from the commandline
[16:41] <blackboxsw> but I think we don't have to block the SRU for this additional test, since we are manually covering that
[16:41] <blackboxsw> what do you think smoser
[16:49] <smoser> i'm fine with your thinking there.
[16:49] <nacc> rharper: smoser: if i have a cloud-config snippet in my default lxd profile for user.vendor-data a la http://paste.ubuntu.com/25802522/ does that mean no ubuntu user gets created? (that is the behavior I am seeing)
[16:50] <rharper> nacc: you're declaring the list of users to add
[16:50] <rharper> so yes, you're overriding the default user
[16:50] <nacc> rharper: ok
[16:50] <nacc> rharper: I think I must have misunderstood that :)
[16:50] <nacc> rharper: is there an easy way for me to just modify the existing users?
[16:51] <rharper> are you just wanting to import your keys?
[16:51] <rharper> you can use top-level ssh_import_keys: [list of keys here]
[16:51] <rharper> which will get imported into the default user (Ubuntu)
[16:51] <nacc> rharper: ah ok
[16:51] <nacc> rharper: but there is not a top-level ssh-import-ids?
[16:51] <rharper> ssh_import_id: raharper
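For reference, the top-level form rharper describes would look like this as cloud-config (a minimal sketch; `raharper` is just the launchpad id used in the channel):

```yaml
#cloud-config
# top-level ssh_import_id applies to the default user (ubuntu)
ssh_import_id: [raharper]
```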
[16:52] <blackboxsw> ok adding a backlog bug for unit test coverage of resizefs
[16:52] <nacc> rharper: ok, thanks
[16:55] <blackboxsw> ok filed https://bugs.launchpad.net/cloud-init/+bug/1726489
[16:56] <blackboxsw> we need it by 17.2 to avoid any repeated regressions
[16:56] <blackboxsw> (if we were touching resizefs module)
[17:04] <nacc> rharper: does that go in user-data or vendor-data?
[17:04] <rharper> nacc: either will work
[17:04] <nacc> rharper: hrm ok
[17:04] <rharper> nacc: generally you're a user
[17:05] <rharper> things that lxd (or a cloud vendor would set by default) would go into vendor-data; but for your use/scripts/etc user-data is fine
[17:05] <nacc> rharper: ack
[17:05] <rharper> http://paste.ubuntu.com/25802644/
[17:05] <dpb1> they get merged following a very well documented process.
[17:05] <rharper> nacc: that's usually what I put in my user-data configs for vms and such
[17:05] <smoser> powersj: https://jenkins.ubuntu.com/server/job/cloud-init-ci/
[17:06] <smoser> why  oh why. we've gotten so much less reliable recently. :-(
[17:06] <rharper> dpb1: well, it's well documented but not always what folks expect;
[17:06] <dpb1> rharper: :)
[17:06] <blackboxsw> grr FAIL: failed to install dependencies with read-dependencies ret=1
[17:06] <dpb1> yes, the documentation is there, the testing thing is what we need. :)
[17:06] <powersj> smoser: it is largely due to the MAAS tests
[17:06] <powersj> e.g. centos
[17:06] <blackboxsw> maybe we put in a retry on the yum package installs?
[17:06] <powersj> err i.e. centos
[17:06] <smoser> yes, but why did those get less reliable ?
[17:07] <smoser> we tried to make them more reliable with the no-fastest-mirror
[17:07] <blackboxsw> smoser: well we dropped the fastest mirror plugin (which we thought was the cause of our pains).
[17:07] <rharper> until its not
[17:07] <nacc> rharper: hrm, it hasn't added my user's key -- but i'll debug it a bit
[17:07] <rharper> when the "fast" mirror was fast
[17:07] <blackboxsw> heh, right exactly
[17:07] <rharper> it helped
[17:07] <powersj> yes, that was committed on the 19th, even before then things were getting worse
[17:08] <rharper> nacc: are you launching new instances or modifying an existing one ?
[17:08] <nacc> rharper: new instances
[17:08] <rharper> yes, let's see a cloud-init.log
[17:08] <nacc> rharper: i'm assuming it would write the appropriate key into .ssh/authorized_keys?
[17:08] <rharper> yes, under ubuntu
[17:09] <nacc> rharper: one sec
[17:09] <nacc> http://paste.ubuntu.com/25802666/
[17:10] <nacc> for this profile: http://paste.ubuntu.com/25802674/
[17:11] <smoser> nacc: you can also just reference 'default' in your list.
[17:11] <smoser> http://paste.ubuntu.com/25802680/ i think does what you want
[17:11] <smoser> you can use that as vendor-data. user-data would override it.
[17:11] <nacc> smoser: where default is a special string that means keep whatever the default user is?
[17:12] <rharper> nacc: I think you want to change your '-' to '_' in the ssh_import_id key; at least that's what's in mine;
[17:13] <smoser> nacc: yes. default is special, means "the os configured default user"
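A hedged sketch of the `default` usage smoser describes (the actual paste contents aren't in the log, and the extra user entry here is hypothetical):

```yaml
#cloud-config
users:
  - default                # keep the OS-configured default user
  - name: example-user     # hypothetical additional user
    ssh_import_id: [example-lp-id]
```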
[17:14] <nacc> rharper: that would be odd that the top level field is different than the users field?
[17:14] <nacc> or is that a yaml quirk
[17:14] <nacc> i'll try it
[17:14] <nacc> rharper: (you can see in smoser's paste ssh-import-id is used there)
[17:17] <nacc> rharper: that worked
[17:17] <nacc> fwiw, I had smoser's variant before and c&p it up
[17:17] <nacc> that's why i had '-' instead of '_'
[17:17] <nacc> not sure if that's a bug or not :)
[17:19] <rharper> one could file a bug and ask for - instead of _ but I believe the manual/rtd shows _
[17:19] <nacc> rharper: oh it's ok -- just may want to update smoser :-P
[17:20] <rharper> lol
[17:20] <rharper> ironic
[17:20] <nacc> :)
[17:22] <smoser> rharper: nacc both are accepted there in the users:
[17:22] <smoser> generally speaking i think we prefer '_' .
[17:22] <rharper> that's strange
[17:23] <rharper> the ssh_import_id config module only checks _
[17:23] <smoser> but the user's dictionary kind of fronts "useradd" (or adduser, not sure which).
[17:23] <rharper> so users module is "helping" out ?
[17:23] <nacc> smoser: yeah, something feels odd
[17:23] <smoser> 'normalize_user's is helping out, rharper
[17:23] <nacc> smoser: well that it does work sometimes
[17:23] <smoser> err... normalize_user_groups
[17:23] <smoser> what "does work sometimes" ?
[17:24] <nacc> smoser: '-' works in users but not at the top level
[17:24] <nacc> (that is, ssh_import_id is the only thing that works at the top level)
[17:24] <rharper> for ssh_import_id at least
[17:25] <powersj> smoser: blackboxsw: what if we ran `yum makecache` with retries so we had the metadata cache on hand
[17:27] <powersj> https://paste.ubuntu.com/25802800/
[17:27] <nacc> rharper: smoser: do you want me to file a bug? i don't think it's a big deal, but it feels inconsistent, at least
[17:28] <smoser> nacc: well, you can file a bug. i think probably the "fix" is to not document 'ssh-import-id' in the users/groups case.
[17:28] <smoser> but only document the '_' varieties of all of those.
[17:28] <nacc> smoser: right
[17:29] <nacc> but do you really want to accept undocumented stuff?
[17:29] <smoser> dont have a choice. :)
[17:29] <nacc> heh
[17:29] <smoser> at least not without a legacy warning period.
[17:29] <dpb1> right
[17:29] <nacc> sure
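The normalization smoser points at (`normalize_users_groups`) is why both spellings work inside `users:`; a rough, hypothetical sketch of the dash/underscore tolerance, not cloud-init's actual implementation:

```python
def normalize_keys(cfg):
    """Accept both 'ssh-import-id' and 'ssh_import_id' style keys by
    canonicalizing dashes to underscores (hypothetical sketch)."""
    return {key.replace("-", "_"): value for key, value in cfg.items()}
```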
[17:35] <smoser> powersj: so what is the failure path that we're seeing ?
[17:36] <powersj> smoser: on my sampling of failures it is generally the "Cannot retrieve metalink for repository: epel/x86_64" or something similar.
[17:36] <smoser> why would 'makecache' succeed when just doing an install would not? i'd think they'd hit the same resources.
[17:37] <smoser> and i'd also think we want to change the 'install' to be yum -C install
[17:37] <powersj> They do, I suggested it as it might be easier to put a retry around that one command, but maybe it doesn't matter
[17:37] <smoser> otherwise we'd still hit the network :)
[17:39] <smoser> blackboxsw: you started 2 minutes after me
[17:39] <smoser> (i was 430, you are 431), which i think are testing the same thing.
[17:39] <smoser> we can let them both run and have double the chance of success :)
[17:43] <rharper> gambler
[17:44] <blackboxsw> smoser: yeah, oopsie; wanted to hope for success in either case
[17:47] <smoser> i think you're ahead of me. or it's a dead heat
[17:48] <smoser> it took you 90 seconds less to checkout
[17:52] <blackboxsw> powersj: not a bad idea on yum makecache. maybe we could do that part daily, and just use cache-only during yum commands?
[17:52] <blackboxsw> --cacheonly I mean
[17:52] <smoser> i think the cache is local to the container that we're running in
[17:53] <smoser> so daily doesn't help, i don't think. but yes, if not paired with '--cacheonly' on the install i think it does nothing.
[17:54] <blackboxsw> yeah, looks like it, we'd have to push /var/cache/yum into the container if we wanted to leverage a daily cache update
[17:54] <rharper> mmm, bind mount read-only
[17:55] <blackboxsw> it'd save time trying to refresh cache every test run
[17:56] <blackboxsw> and maybe even cut out some of the yum install errors
[17:59] <blackboxsw> hrm, kinda big: 45M /var/cache/yum/
[18:03] <smoser> testing http://paste.ubuntu.com/25803004/
[18:05] <powersj> lol@MAYBE_RELIABLE_YUM_INSTALL
[18:05] <blackboxsw> not bad at all :)
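The paste itself isn't in the log, but a retry wrapper matching the `:: failed [1] (1/10). sleeping 5.` output seen later might look roughly like this (the function and variable names here are guesses, not the actual tools/run-centos code):

```shell
# Hypothetical sketch: retry a flaky `yum install` a few times before giving up.
maybe_reliable_yum_install() {
    local tries=${TRIES:-10} naplen=${NAPLEN:-5} n rc
    for ((n = 1; n <= tries; n++)); do
        yum install --assumeyes "$@" && return 0
        rc=$?
        echo ":: failed [$rc] ($n/$tries). sleeping $naplen." >&2
        sleep "$naplen"
    done
    return "$rc"
}
```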
[18:05] <blackboxsw> jenkins #430 FAIL: failed to install dependencies with read-dependencies ret=1
[18:06] <blackboxsw> all hopes rest on the lone survivor... 431
[18:07] <blackboxsw> smoser: with that paste we may want to re-enable the fastest-mirror plugin in tools/run-centos then, right?
[18:07] <blackboxsw> so we allow yum to use fastest mirror and retry if makecache fails
[18:07] <rharper> or use modulo to randomly enable/disable the fast mirror
[18:08] <rharper> sorta want to just ship a USB disk up there and leave it attached
[18:10] <powersj> if you are behind a proxy == don't use fastest mirror
[18:13] <rharper> except when you do want it because it works faster
[18:13] <powersj> per https://wiki.centos.org/PackageManagement/Yum/FastestMirror
[18:13] <smoser> blackboxsw, powersj had pointed at some doc that said dont use fastest mirror if you have a proxy
[18:13] <rharper> right
[18:13] <smoser> of course, that makes a lot of sense to document, instead of just DTRT!
[18:13] <rharper> except disabling it hasn't measurably improved things
[18:13] <rharper> so, that's great but it wasn't the core problem AFAICT
[18:14] <blackboxsw> hah smoser on docs instead of DTRT
[18:14] <blackboxsw> file a bug :
[18:14] <blackboxsw> :)
[18:14] <rharper> lol
[18:15] <blackboxsw> how could the fastest mirror plugin possibly look at environment variables like http_proxy :)
[18:17] <rharper> well, proxies are clearly slower than the fastest mirror
[18:19] <blackboxsw> nice comments on the blog post smoser, didn't notice we could do that
[18:19] <blackboxsw> ... comment inline, that is
[18:21] <smoser> the 'comment' button.
[18:21] <smoser> i wouldn't have gone looking, but you said something like "you all comment on it"
[18:21] <smoser> and i assumed you meant there was something like that :)
[18:25] <smoser> Error Downloading Packages:
[18:25] <smoser>   1:perl-Module-Pluggable-3.90-144.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6/base/packages/perl-Module-Pluggable-3.90-144.el6.x86_64.rpm from base
[18:25] <smoser> powersj: ^
[18:26] <rharper> shall we summon the yum mirror masters for help here?
[18:26] <smoser> http://paste.ubuntu.com/25803096/
[18:27] <smoser> i dont understand what makecache does
[18:28] <powersj> smoser: your -C option means to use the deb itself from the local cache.. that's not what we want
[18:28] <powersj> my hope was to wrap around makecache to get the actual repo files as that seems to always be the part that fails and understand if we have a mirror issue or a network issue.
[18:28] <powersj> and by repo files I mean the package metadata info
[18:29] <rharper> the problem is that we don't have a way to *pull* in everything
[18:29] <powersj> my understanding of makecache is similar to apt update
[18:29] <rharper> first
[18:29] <smoser> "does not download or update any *headers*  unless it has to to perform the requested action"
[18:29] <smoser> (emphasis added)
[18:29] <rharper> that is, those packages aren't going to be in the cache, so we can't use -C until we have a copy of those packages
[18:29] <rharper> the makecache is for metadata only
[18:30] <rharper> and yum install -C means (don't pull from repos, only from yum-cached rpms)
[18:30] <rharper> so we've a chicken/egg issue
[18:30] <rharper> we need at least one successful run of the install with yum.conf keepcache=1
[18:30] <smoser> but i'm missing something clearly
[18:30] <rharper> then we can yum makecache
[18:30] <rharper> and then use yum install -C for all further runs
[18:30] <smoser> because it looks to me like there is *ALWAYS* a chicken/egg there.
[18:31] <rharper> yum install -C is equivalent to apt install --reinstall (but without downloading)
[18:33] <rharper> I think we need to create a lambda-function microservice to host a yum repo for the ci run
[18:33] <smoser> this is insane.
[18:33] <rharper> it's not unrealistic to kick it out for now
[18:34] <rharper> if the repos are unreliable, it's not a helpful test
[19:21] <blackboxsw> https://jenkins.ubuntu.com/server/job/cloud-init-ci/431/ looks hung
[19:25] <powersj> yeeeppp
[19:28] <rharper> don't kill anything yet
[19:29] <rharper> cloud-test-ubuntu-xenial-snapshot-ojta1rm2ous4xibxcp1hrklcwhutd | FROZEN
[19:29] <rharper> that's expected, right? the snapshot ?
[19:30] <powersj> yes
[19:31] <powersj> should the zfs file be on the nvme...
[19:32] <rharper> pid 405 is stopped
[19:32] <rharper> it's a script
[19:32] <rharper> running
[19:32] <rharper> python3 -m tests.cloud_tests run --verbose --os-name xenial --test modules/apt_configure_sources_list.yaml --test modules/ntp_servers --test modules/set_password_list --test modules/user_groups --deb cloud-init_*_all.deb
[19:33] <powersj> that's the integration test
[19:33] <rharper> two tests are still "running"
[19:33] <rharper> cloud-test-ubuntu-xenial-image-modification-1l8nht7xdh4kgp0n5aq and cloud-test-ubuntu-xenial-modules-set-password-list-gs123zqh6npu
[19:33]  * rharper execs into them
[19:34] <rharper> it completed
[19:34] <rharper> but the harness isn't seeing the result
[19:35] <rharper> does it ssh into it? or exec cat the result.json file ?
[19:35] <rharper> same for both
[19:35] <rharper> they completed, no errors
[19:35] <powersj> I was under the assumption jenkins waits for the return code of the process
[19:36] <rharper> what stops the container ?
[19:36] <rharper> don't we poll fore result.json in the lxd instance ?
[19:36] <rharper> jenkins is fine, its script spawned the above python3 -m tests.cloud_tests run command
[19:36] <rharper> it ran 4 containers
[19:36] <rharper> 2 of which are completed and exited, these two are remaining, but idle
[19:37] <rharper> cloud-init inside has completed successfully
[19:37] <rharper> but the harness "watching" each of those container runs has somehow given up on its "exit" condition
[19:39] <smoser> lets see if it likes me
[19:39] <smoser>  https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666
[19:42] <powersj> rharper: from the jenkins log, I see 3 tests were launched and collected from. At this point, based on output alone, I would expect it is stuck/hung trying to get result.json
[19:42] <powersj> The next thing it should be doing is launching a 4th container for the 4th test
[19:42] <rharper> powersj: right, but where is the code in the cloud_tests where it attempts to get the result ?
[19:42] <powersj> collect.py
[19:42] <rharper> what calls collect ?
[19:43] <powersj> run_funcs.py
[19:43] <rharper> so instance.run_script
[19:47] <rharper> hrm, where do we import pylxd ?
[19:47] <rharper> I see platform/lxd.py
[19:49] <powersj> rharper: that should be it
[19:51] <smoser> blackboxsw: shall we say screw jenkins for now ?
[19:52] <smoser> wrt to fix-device-path...
[19:52]  * blackboxsw was just writing up a retrospective document. that CI breakage on our test service has been painful to say the least
[19:52] <blackboxsw> smoser: I think we need to move on (as we are manually testing this)
[19:52] <blackboxsw> and now have a little bit better unit test coverage
[19:53] <blackboxsw> we'll add the manual logs for deploying proposed for this bug fix we are pulling in
[19:55] <rharper> powersj: well, I don't really see anything here; we don't have a way to get pylxd to log its commands ? like the execute calls ?
[19:55] <powersj> rharper: I'm not familiar enough with pylxd
[19:56] <blackboxsw> smoser: getting closer: Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again
[19:56] <blackboxsw> :: failed [1] (1/10). sleeping 5.
[19:57] <blackboxsw> hrm it's looping. so it looks like it's making it farther on attempt 2
[19:58] <blackboxsw> nice smoser Complete!
[19:58] <blackboxsw> Installing deps: python-configobj python-oauthlib python-contextlib2 python-coverage python-httpretty python-six PyYAML python-jsonpatch python-jinja2 python-jsonschema python-setuptools python-requests python-unittest2 python-mock python-nose e2fsprogs iproute net-tools procps rsyslog shadow-utils sudo python-devel python-setuptools make sudo tar python-tox
[19:58] <smoser> hooray!
[19:58] <blackboxsw> maaan
[19:58] <blackboxsw> that's ridiculous
[19:58] <smoser> ridonkulous
[19:59] <rharper> powersj: I'm thinking that we likely want a timeout on those executes with a retry
[19:59] <nacc> blackboxsw: fwiw, sudo is listed twice
[19:59] <rharper> the container still works via the normal lxc client, so something's wonky with pylxd
[19:59] <blackboxsw> rharper: is it possible we are missing a raise in the runscript case?
[19:59] <rharper> blackboxsw: there are no raises afaict
[19:59] <rharper> so if pylxd doesn't raise on its failure, we just sit
[20:00] <rharper> still smells like a pylxd issue
[20:00] <blackboxsw> ok gotcha.
[20:00] <powersj> rharper: agreed
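rharper's suggestion (a timeout plus retry around the pylxd execute calls) is generic enough to sketch; everything below, names included, is hypothetical and not the cloud_tests or pylxd API:

```python
import time

def retry_with_timeout(fn, timeout=300.0, interval=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Call fn() until it succeeds or `timeout` seconds elapse;
    re-raise the last error if we give up (hypothetical sketch)."""
    deadline = clock() + timeout
    while True:
        try:
            return fn()
        except Exception:
            if clock() >= deadline:
                raise
            sleep(interval)
```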
[20:00] <blackboxsw> ohh true nacc
[20:00] <blackboxsw> we can tweak the deps script to use set([pkg_list]) to ensure uniques
[20:00] <smoser> blackboxsw: yeah, so 'sudo' is in the COMMON and then manually
[20:00] <nacc> blackboxsw: not sure it matters, but it probably implies there's not a uniq-like operation
[20:00] <nacc> blackboxsw: yeah
[20:01] <smoser> yeah
[20:01] <smoser> wait. how did sudo get in there twice
[20:01] <rharper> one can never have enough sudo
[20:02] <nacc> ironically, i just figured out that my git-ubuntu job failure was due to not having sudo installed in the LXD image :)
[20:02] <nacc> synchronicity between squads achieved!
[20:02] <blackboxsw> heh smoser I think it's probably pulling in deps from packages/pkg-deps.json
[20:02] <blackboxsw> hahaha nacc #achievementunlocked
[20:03] <smoser> hm.. yeah
[20:06] <blackboxsw> smoser officially approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666
[20:06] <blackboxsw> can you land and I can rebase and pull it into my branch
[20:07] <smoser> blackboxsw: http://paste.ubuntu.com/25803766/
[20:07] <smoser> that'd fix the double sudo, which is probably "who cares"
[20:07] <smoser> so i'm fine to ignore it for now.
[20:07] <smoser> and so would your 'sort'
[20:08] <smoser> err... set()
[20:08] <smoser> landing now
[20:09] <nacc> smoser: yeah i don't think it's fatal, but given the output, it's nice to be able to parse it and not be surprised :)
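The set() fix blackboxsw mentions loses ordering; an order-preserving dedup (a sketch — the real read-dependencies script may differ) is just as short:

```python
def dedup(pkgs):
    """Drop repeated package names (e.g. the double 'sudo') while
    keeping the original install order."""
    seen = set()
    return [p for p in pkgs if not (p in seen or seen.add(p))]
```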
[20:12] <smoser> blackboxsw: its in.
[20:12] <smoser> blackboxsw: you should rebase on upstream/master and then push a change and lets see if you get a happy face.
[20:19] <blackboxsw> ok smoser, sorry, missed the paste; I was thinking set unions, but that looks just as 'noisy' as the for loop
[20:19] <blackboxsw> rebasing
[20:21] <blackboxsw> pushed
[20:21] <blackboxsw> rebuilding
[20:23] <blackboxsw> https://jenkins.ubuntu.com/server/job/cloud-init-ci/433/console
[20:38] <smoser> blackboxsw: land ?
[20:38] <smoser> i approved.
[20:38] <blackboxsw> smoser: it just completed, I'm waiting on tox
[20:38] <blackboxsw> and will push
[20:38] <smoser> \o/
[20:39] <blackboxsw> yeah almost
[20:39] <smoser> then you can do mps for x, y, z
[20:39] <smoser> nacc: we still do not have a 'bb', right ?
[20:39] <smoser> ie, no archive open
[20:40] <nacc> smoser: right
[20:40] <smoser> :-(
[20:40] <nacc> smoser: you can open tasks though
[20:40] <smoser> sure.
[20:40] <nacc> smoser: they are just against an unopened archive :)
[20:40] <blackboxsw> smoser: pushed
[20:40] <nacc> afaik, no name yet anyways
[20:40] <smoser> yeah. ok. so i guess we just upload new upstream snapshot, blackboxsw to x, y, z
[20:40] <blackboxsw> ok smoser time to sync on that upstream
[20:40] <blackboxsw> just want to dot some i's and cross t's
[20:40] <blackboxsw> I'm in hangout
[20:41] <smoser> k
[21:34] <smoser> blackboxsw: uploaded x, z, a
[21:34] <blackboxsw> can't do x
[21:34] <blackboxsw> oops
[21:34] <smoser> tomorrow we need to
[21:34] <smoser> make a recipe build for artful
[21:34] <smoser> and /me is pumpkin now
[21:34] <blackboxsw> thx smoser
[21:34] <blackboxsw> see ya
[21:34] <blackboxsw> will start testing x,z