/srv/irclogs.ubuntu.com/2017/10/23/#cloud-init.txt

blackboxswok will have a branch up tomorrow on the SRU https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/fix-device-path-from-cmdline-regression. Just need to complete a test and the download rates I'm getting right now are painful (56K/b) so I'm leaving the download running til morning06:11
dpb1blackboxsw: is there a bug associated with that fix?15:29
blackboxswyep dpb1 I'll dig it up and tie it in.15:30
blackboxswhttps://bugs.launchpad.net/cloud-init/+bug/172506715:30
ubot5Ubuntu bug 1725067 in cloud-init "cloud-init resizefs fails when booting with root=PARTUUID=" [High,Triaged]15:30
blackboxswdpb1: ^15:30
blackboxswthe fix will land today15:30
dpb1oh, cool15:33
dpb1ah, ok, so caught during proposed testing15:33
dpb1?15:33
smoserrharper: http://paste.ubuntu.com/25802105/15:43
smoserand blackboxsw15:43
smoserthat *does* pass tests.15:43
rharperk15:43
smoserideally we could add some metadata to the patching that would make that work.15:43
smoserwould make the "block device" return for a stat or something.15:44
smoserbut thats getting fancy15:44
* rharper likes fancy15:49
blackboxswdpb1: yep caught before SRU, so it's where we want to catch it16:11
blackboxswwell, we'd like to catch that with unit tests/integration tests16:12
blackboxswbut, before release is better than the alternative16:12
dpb1blackboxsw: can it be added to a unit test?16:14
dpb1s/added to/covered by/16:15
blackboxswdpb1: I added a unit test that gets most of it. I have time to add a bit more concise unit test16:15
blackboxswyeah, my work that introduced this regression was to restructure the code so it was a bit easier to unit test (as it didn't have much in the way of testing). I just didn't add enough tests to validate this case. more to come16:18
* dpb1 nods16:18
smoserblackboxsw: i think you forgot to push on fix-device-path-from-cmdline-regression16:38
smoserhttps://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/33263416:38
blackboxswoops pushed 129beef2..445ee93316:39
blackboxswwas debating about adding one more unit test for happy path handle(), but I think I'll add that in a separate merge proposal as it'll be some more rework16:40
blackboxswI want a test that shows we are properly attempting to resize a disk by uuid16:40
blackboxswnot just the test that detects the disk/by-uuid from the commandline16:40
blackboxswbut I think we don't have to block the SRU for this additional test, since we are manually covering that16:41
blackboxswwhat do you think smoser16:41
smoseri'm fine with  your thinking there.16:49
naccrharper: smoser: if i have a cloud-config snippet in my default lxd profile for user.vendor-data a la http://paste.ubuntu.com/25802522/ does that mean no ubuntu user gets created (that is the behavior I am seeing)16:49
rharpernacc: you're declaring the list of users to add16:50
rharperso yes, you're overriding the default user16:50
naccrharper: ok16:50
naccrharper: I think I must have misunderstood that :)16:50
naccrharper: is there an easy way for me to just modify the existing users?16:50
rharperare you just wanting to import your keys?16:51
rharperyou can use top-level ssh_import_keys: [list of keys here]16:51
rharperwhich will get imported into the default user (Ubuntu)16:51
naccrharper: ah ok16:51
naccrharper: but there is not a top-level ssh-import-ids?16:51
rharperssh_import_id: raharper16:51
blackboxswok adding a backlog bug for unit test coverage of resizefs16:52
naccrharper: ok, thanks16:52
blackboxswok filed https://bugs.launchpad.net/cloud-init/+bug/172648916:55
ubot5Ubuntu bug 1726489 in cloud-init "Need more happy path unit test coverage of cc_resizefs " [Medium,Triaged]16:55
blackboxswwe need it by 17.2 to avoid any repeated regressions16:56
blackboxsw(if we were touching resizefs module)16:56
naccrharper: does that go in user-data or vendor-data?17:04
rharpernacc: either will work17:04
naccrharper: hrm ok17:04
rharpernacc: generally you're a user17:04
rharperthings that lxd (or a cloud vendor would set by default) would go into vendor-data; but for your use/scripts/etc user-data is fine17:05
naccrharper: ack17:05
rharperhttp://paste.ubuntu.com/25802644/17:05
dpb1they get merged following a very well documented process.17:05
rharpernacc: that's usually what I put in my user-data configs for vms and such17:05
smoserpowersj: https://jenkins.ubuntu.com/server/job/cloud-init-ci/17:05
smoserwhy  oh why. we've gotten so much less reliable recently. :-(17:06
rharperdpb1: well, it's well documented but not always what folks expect;17:06
dpb1rharper: :)17:06
blackboxswgrr FAIL: failed to install dependencies with read-dependencies ret=117:06
dpb1yes, the documentation is there, the testing thing is what we need. :)17:06
powersjsmoser: it is largely due to the MAAS tests17:06
powersje.g. centos17:06
blackboxswmaybe we put in a retry on the yum package installs?17:06
powersjerr i.e. centos17:06
smoseryes, but why did those get less reliable ?17:06
smoserwe tried to make them more reliable with the no-fastest-mirror17:07
blackboxswsmoser: well we dropped the fastest mirror plugin (which we thought was the cause of our pains).17:07
rharperuntil its not17:07
naccrharper: hrm, it's not added my user's key -- but i'll debug it a bit17:07
rharperwhen the "fast" mirror was fast17:07
blackboxswheh, right exactly17:07
rharperit helped17:07
powersjyes, that was committed on the 19th, even before then things were getting worse17:07
rharpernacc: are you launching new instances or modifying an existing one ?17:08
naccrharper: new instances17:08
rharperyes, let's see a cloud-init.log17:08
naccrharper: i'm assuming it would write the appropriate key into .ssh/authorized_keys?17:08
rharperyes, under ubuntu17:08
naccrharper: one sec17:09
nacchttp://paste.ubuntu.com/25802666/17:09
naccfor this profile: http://paste.ubuntu.com/25802674/17:10
smosernacc: you can also just reference 'default' in your list.17:11
smoserhttp://paste.ubuntu.com/25802680/ i think does what you want17:11
smoseryou can use that as vendor-data. user-data would override it.17:11
naccsmoser: where default is a special string that means keep whatever the default user is?17:11
rharpernacc: I think you want to change your '-' to '_' in the ssh_import_id key; at least that's what's in mine;17:12
smosernacc: yes. default is special, means "the os configured default user"17:13
naccrharper: that would be odd that the top level field is different than the users field?17:14
naccor is that a yaml quirk17:14
nacci'll try it17:14
naccrharper: (you can see in smoser's paste ssh-import-id is used there)17:14
naccrharper: that worked17:17
naccfwiw, I had smoser's variant before and c&p it up17:17
naccthat's why i had '-' instead of '_'17:17
naccnot sure if that's a bug or not :)17:17
rharperone could file a bug and ask for - instead of _ but I believe the manual/rtd shows _17:19
naccrharper: oh it's ok -- just may want to update smoser :-P17:19
rharperlol17:20
rharperironic17:20
nacc:)17:20
smoserrharper: nacc both are accepted there in the users:17:22
smosergenerally speaking i think we prefer '_' .17:22
rharperthat's strange17:22
rharperthe ssh_import_id config module only checks _17:23
smoserbut the user's dictionary kindof fronts "useradd" (or adduser, not sure which).17:23
rharperso users module is "helping" out ?17:23
naccsmoser: yeah, something feels odd17:23
smoser'normalize_user's is helping out, rharper17:23
naccsmoser: well that it does work sometimes17:23
smosererr... normalize_user_groups17:23
smoserwhat "does work sometimes" ?17:23
naccsmoser: '-' works in users but not at the top level17:24
nacc(that is, ssh_import_id is the only thing that works at the top level)17:24
rharperfor ssh_import_id at least17:24
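A minimal sketch of the inconsistency discussed above: a normalization step that rewrites '-' to '_' in the keys of a users: entry makes both spellings work there, while a top-level lookup that only checks the '_' spelling ignores the dashed key. This is not cloud-init's actual normalize_users_groups code; normalize_keys and the sample dictionaries are hypothetical.

```python
def normalize_keys(entry):
    """Return a copy of a user entry with '-' in key names replaced by '_'."""
    return {key.replace("-", "_"): value for key, value in entry.items()}


# A users: entry written with the dashed spelling (hypothetical sample data).
user_entry = {"name": "demo", "ssh-import-id": ["raharper"]}
normalized = normalize_keys(user_entry)

# A top-level lookup that only checks the underscore spelling, so a dashed
# top-level key is not found and falls back to the empty default.
top_level = {"ssh-import-id": ["raharper"]}
top_level_ids = top_level.get("ssh_import_id", [])
```

With this shape, the dashed key inside the user entry ends up as ssh_import_id, while the dashed top-level key yields no ids at all, which matches the behavior nacc reported.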
powersjsmoser: blackboxsw: what if we ran `yum makecache` with retries so we had the metadata cache on hand17:25
powersjhttps://paste.ubuntu.com/25802800/17:27
naccrharper: smoser: do you want me to file a bug? i don't think it's a big deal, but it feels inconsistent, at least17:27
smosernacc: well, you can file a bug. i think probably the "fix" is to not document 'ssh-import-id' in the users/groups case.17:28
smoserbut only document the '_' varieties of all of those.17:28
naccsmoser: right17:28
naccbut do you really want to accept undocumented stuff?17:29
smoserdont have a choice. :)17:29
naccheh17:29
smoserat least not without a legacy warning period.17:29
dpb1right17:29
naccsure17:29
smoserpowersj: so what is the failure path that we're seeing ?17:35
powersjsmoser: on my sampling of failures it is generally the "Cannot retrieve metalink for repository: epel/x86_64" or something similar.17:36
smoserwhy would 'makecache' succeed when just doing an install does not ? i'd think they'd hit the same resources.17:36
smoserand i'd also think we want to change the 'install' to be yum -C install17:37
powersjThey do, I suggested it as it might be easier to put a retry around that one command, but maybe it doesn't matter17:37
smoserother wise we'd still hit the network :)17:37
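A minimal sketch of the "retry around one command" idea, assuming a fixed number of attempts with a fixed sleep between them. run_with_retries, its parameters, and the default values are hypothetical and are not taken from the CI scripts; the runner and sleep arguments exist only so the loop can be exercised without touching the network.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=10, delay=5,
                     runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return the last exit code."""
    ret = 1
    for attempt in range(1, attempts + 1):
        ret = runner(cmd)
        if ret == 0:
            return ret
        # Only sleep when another attempt will follow.
        if attempt < attempts:
            sleep(delay)
    return ret


# Example call (not executed here): run_with_retries(["yum", "makecache"])
```

As smoser notes, retrying makecache only helps if the later install stops hitting the network, so a wrapper like this would probably need to cover the install command as well.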
smoserblackboxsw: you started 2 minutes after me17:39
smoser(i was 430, you are 431). which i think testing same thing.17:39
smoserwe can let them both run and have double the chance of success :)17:39
rharpergambler17:43
blackboxswsmoser: yeah oopsie, yeah wanted to hope for success in either case17:44
smoseri think you're ahead of me. or it's a dead heat17:47
smoserit took you 90 seconds less to checkout17:48
blackboxswpowersj: not a bad idea on yum makecache. maybe we could do that part daily, and just use cache-only during yum commands?17:52
blackboxsw--cacheonly I mean17:52
smoseri think the cache is local to the container that we're running in17:52
smoserso daily doesn't help i don't think. but  yes, if not paired with '--cacheonly' on the install i think it does nothing.17:53
blackboxswyeah, looks like it, we'd have to push /var/cache/yum into the container if we wanted to leverage a daily cache update17:54
rharpermmm, bind mount read-only17:54
blackboxswit'd save time trying to refresh cache every test run17:55
blackboxswand maybe even cut out some of the yum install errors17:56
blackboxswhrm, kinda big: 45M /var/cache/yum/17:59
smosertesting http://paste.ubuntu.com/25803004/18:03
powersjlol@MAYBE_RELIABLE_YUM_INSTALL18:05
blackboxswnot bad at all :)18:05
blackboxswjenkins #430 FAIL: failed to install dependencies with read-dependencies ret=118:05
blackboxswall hopes rest on the lone survivor... 43118:06
blackboxswsmoser: with that paste we may want to re-enable fastest mirror plugin per tools/run-centos then right?18:07
blackboxswso we allow yum to use fastest mirror and retry if makecache fails18:07
rharperor use modulo to randomly enable/disable the fast mirror18:07
rharpersorta want to just ship a USB disk up there and leave it attached18:08
powersjif you are behind a proxy == don't use fastest mirror18:10
rharperexcept when you do want it because it works faster18:13
powersjper https://wiki.centos.org/PackageManagement/Yum/FastestMirror18:13
smoserblackboxsw, powersj had pointed at some doc that said dont use fastest mirror if you have a proxy18:13
rharperright18:13
smoserof course, that makes a lot of sense to document, instead of just DTRT!18:13
rharperexcept disabling it hasn't measurably improved things18:13
rharperso, that's great but it wasn't the core problem AFAICT18:13
blackboxswhah smoser on docs instead of DTRT18:14
blackboxswfile a bug :18:14
blackboxsw:)18:14
rharperlol18:14
blackboxswhow could the fastest mirror plugin possibly look at environment variables like http_proxy :)18:15
rharperwell, proxies are clearly slower than the fastest mirror18:17
blackboxswnice comments on the blog post smoser, didn't notice we could do that18:19
blackboxsw... comment inlin ethat is18:19
blackboxswinline even18:19
smoserthe 'comment' button.18:21
smoseri wouldn't have gone looking, but you said something like "you all comment on it"18:21
smoserand i assumed you meant there was something like that :)18:21
smoserError Downloading Packages:18:25
smoser  1:perl-Module-Pluggable-3.90-144.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6/base/packages/perl-Module-Pluggable-3.90-144.el6.x86_64.rpm from base18:25
smoserpowersj: ^18:25
rharpershant we summon yum mirror masters for help here?18:26
smoserhttp://paste.ubuntu.com/25803096/18:26
smoseri dont understand what makecache does18:27
powersjsmoser: your -C option means to use the deb itself from the local cache.. that's not what we want18:28
powersjmy hope was to wrap around makecache to get the actual repo files as that seems to always be the part that fails and understand if we have a mirror issue or a network issue.18:28
powersjand by repo files I mean the package metadata info18:28
rharperthe problem is that we don't have a way to *pull* in everything18:29
powersjmy understanding of makecache is similar to apt update18:29
rharperfirst18:29
smoser"does not download or update any *headers*  unless it has to to perform the requested action"18:29
smoser(emphaisis added)18:29
rharperthat is, those packages aren't going to be in the cache, so we can't use -C until we have a copy of those packages18:29
rharperthe makecache is for metadata only18:29
rharperand the yum install -C means (don't pull from repos only from yum cached rpms)18:30
rharperso we've a chicken/egg issue18:30
rharperwe need at least one successful run of the install with keepcache=1 set in yum.conf18:30
smoserbut i'm missing something clearly18:30
rharperthen we can yum makecache18:30
rharperand then use yum install -C for all further runs18:30
smoserbecause it looks to me like there is *ALWAYS* a chicken/egg there.18:30
rharperyum install -C is equivalent to apt install --reinstall (but without downloading)18:31
rharperI think we need to create a lambda function microservice to host a yum repo for the ci run18:33
smoserthis is insane.18:33
rharperit's not unrealistic to kick it out for now18:33
rharperif the repos are unreliable, it's not a helpful test18:34
blackboxswhttps://jenkins.ubuntu.com/server/job/cloud-init-ci/431/ looks hung19:21
powersjyeeeppp19:25
rharperdon't kill anything yet19:28
rharpercloud-test-ubuntu-xenial-snapshot-ojta1rm2ous4xibxcp1hrklcwhutd | FROZEN19:29
rharperthat's expected, right? the snapshot ?19:29
powersjyes19:30
powersjshould the zfs file be on the nvme...19:31
rharperpid 405 is stopped19:32
rharperit's a script19:32
rharperrunning19:32
rharperpython3 -m tests.cloud_tests run --verbose --os-name xenial --test modules/apt_configure_sources_list.yaml --test modules/ntp_servers --test modules/set_password_list --test modules/user_groups --deb cloud-init_*_all.deb19:32
powersjthat's the integration test19:33
rharpertwo tests are still "running"19:33
rharpercloud-test-ubuntu-xenial-image-modification-1l8nht7xdh4kgp0n5aq and cloud-test-ubuntu-xenial-modules-set-password-list-gs123zqh6npu19:33
* rharper execs into them19:33
rharperit completed19:34
rharperbut the harness isn't seeing the result19:34
rharperdoes it ssh into it? or exec cat the result.json file ?19:35
rharpersame for both19:35
rharperthey completed, no errors19:35
powersjI was under the assumption jenkins waits for the return code of the process19:35
rharperwhat stops the container ?19:36
rharperdon't we poll for result.json in the lxd instance ?19:36
rharperjenkins is fine, its script spawned the above python3 -m tests.cloud_tests run command19:36
rharperit ran 4 containers19:36
rharper2 of which are completed and exited, these two are remaining, but idle19:36
rharpercloud-init inside has completed successfully19:37
rharperbut the harness "watching" each of those container runs has somehow given up on its "exit" condition19:37
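A minimal sketch of a bounded wait for a completion marker such as result.json, so that a watcher fails loudly instead of idling forever. This is not the cloud_tests harness code; wait_for, its defaults, and the injectable clock and sleep arguments are hypothetical.

```python
import time


def wait_for(check, timeout=300, interval=5,
             clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns a truthy value or timeout seconds pass."""
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        # Give up once the deadline has passed rather than polling forever.
        if clock() >= deadline:
            raise TimeoutError("condition not met within %s seconds" % timeout)
        sleep(interval)
```

The check callable would be whatever the harness uses to see that a container has finished, for example a function that tries to read the result file and returns its contents.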
smoserlets see if it likes me19:39
smoser https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/33266619:39
powersjrharper: from the jenkins log, I see 3 tests were launched and collected from. At this point, based on output alone, I would expect it is stuck/hung trying to get result.json19:42
powersjThe next thing it should be doing is launching an 4th container for the 4th test19:42
rharperpowersj: right, but where is the code in the cloud_tests where it attempts to get the result ?19:42
powersjcollect.py19:42
rharperwhat calls collect ?19:42
powersjrun_funcs.py19:43
rharperso instance.run_script19:43
rharperhrm, where do we import pylxd ?19:47
rharperI see platform/lxd.py19:47
powersjrharper: that should be it19:49
smoserblackboxsw: shall we say screw jenkins for now ?19:51
smoserwrt to fix-device-path...19:52
* blackboxsw was just writing up a retrospective document. that CI breakage on our test service has been painful to say the least19:52
blackboxswsmoser: I think we need to move on (as we are manually testing this)19:52
blackboxswand now have a little bit better unit test coverage19:52
blackboxswwe'll add the manual logs for deploying proposed for this bug fix we are pulling in19:53
rharperpowersj: well, I don't really see anything here;  we don't have a way to get pylxd to log its commands ? Like the execute ?19:55
powersjrharper: I'm not familiar enough with pylxd19:55
blackboxswsmoser: getting closer Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again19:56
blackboxsw:: failed [1] (1/10). sleeping 5.19:56
blackboxswhrm it's looping. so it looks like it's making it farther on attempt 219:57
blackboxswnice smoser Complete!19:58
blackboxswInstalling deps: python-configobj python-oauthlib python-contextlib2 python-coverage python-httpretty python-six PyYAML python-jsonpatch python-jinja2 python-jsonschema python-setuptools python-requests python-unittest2 python-mock python-nose e2fsprogs iproute net-tools procps rsyslog shadow-utils sudo python-devel python-setuptools make sudo tar python-tox19:58
smoserhoray!19:58
blackboxswmaaan19:58
blackboxswthat's ridiculous19:58
smoserridonkulous19:58
rharperpowersj: I'm thinking that we likely want a timeout on those executes with a retry19:59
naccblackboxsw: fwiw, sudo is listed twice19:59
rharperthe container still works via the normal lxc client, so something's wonky with pylxd19:59
blackboxswrharper: is it possible we are missing a raise in the runscript case?19:59
rharperblackboxsw: there are no raises afaict19:59
rharperso if pylxd doesn't raise on its failure, we just sit19:59
rharperstill smells like a pylxd issue20:00
blackboxswok gotcha.20:00
powersjrharper: agreed20:00
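A minimal sketch of putting a timeout around a call that may never return, as rharper suggests for the container executes. It uses only the standard library and makes no assumption about pylxd's API; call_with_timeout and ExecTimeout are hypothetical names.

```python
import threading


class ExecTimeout(Exception):
    """Raised when the wrapped call does not finish in time."""


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a daemon thread and give up after timeout seconds."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The worker is abandoned, not cancelled; it keeps running in the
        # background until it returns or the process exits.
        raise ExecTimeout("call did not finish within %s seconds" % timeout)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

A caller could catch ExecTimeout and retry the execute. Note that this only stops the harness from waiting; it does not clean up whatever the hung call was doing.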
blackboxswohh true nacc20:00
blackboxswwe can tweak the deps script to use set([pkg_list]) to ensure uniques20:00
smoserblackboxsw: yeah, so 'sudo' is in the COMMON and then manually20:00
naccblackboxsw: not sure it matters, but it probably implies there's not a uniq-like operation20:00
naccblackboxsw: yeah20:00
smoseryeah20:01
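A minimal sketch of de-duplicating a package list while keeping first-seen order, which a plain set() would not preserve. unique_packages and the sample lists are hypothetical and are not taken from the read-dependencies script.

```python
def unique_packages(*package_lists):
    """Merge package lists, dropping repeats but keeping first-seen order."""
    seen = set()
    ordered = []
    for packages in package_lists:
        for pkg in packages:
            if pkg not in seen:
                seen.add(pkg)
                ordered.append(pkg)
    return ordered


# Hypothetical inputs: a common list plus a per-distro list that repeats sudo.
common = ["make", "sudo", "tar"]
extra = ["e2fsprogs", "sudo", "python-tox"]
deps = unique_packages(common, extra)
```

Keeping the order stable makes the "Installing deps:" line easier to compare between runs.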
smoserwait. how did sudo get in there twice20:01
rharperone can never have enough sudo20:01
naccironically, i just figured out that my git-ubuntu job failure was to not having sudo installed in the LXD image :)20:02
naccsynchronicity between squads achieved!20:02
blackboxswheh smoser I think it's probably pulling in deps from packages/pkg-deps.json20:02
blackboxswhahaha nacc #achievementunlocked20:02
smoserhm.. yeah20:03
blackboxswsmoser officially approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/33266620:06
blackboxswcan you land and I can rebase and pull it into my branch20:06
smoserblackboxsw: http://paste.ubuntu.com/25803766/20:07
smoserthat'd fix the sudo twice, which is probably "who cares"20:07
smoserso i'm fine to ignore it for now.20:07
smoserand so would your 'sort'20:07
smosererr... set()20:08
smoserlanding now20:08
naccsmoser: yeah i don't think it's fatal, but given the output, it's nice to be able to parse it and not be surprised :)20:09
smoserblackboxsw: its in.20:12
smoserblackboxsw: you should rebase on upstream/master and then push a change and lets see if you get a happy face.20:12
blackboxswok smoser , sorry missed the paste I was thinking set.unions, but that looks just as 'noisy' as the for loop20:19
blackboxswrebasing20:19
blackboxswpushed20:21
blackboxswrebuilding20:21
blackboxswhttps://jenkins.ubuntu.com/server/job/cloud-init-ci/433/console20:23
smoserblackboxsw: land ?20:38
smoseri approved.20:38
blackboxswsmoser: it just completed, I'm waiting on tox20:38
blackboxswand will push20:38
smoser\o/20:38
blackboxswyeah almost20:39
smoserthen you can do mps for x, y, z20:39
smosernacc: we still do not have a 'bb', right ?20:39
smoserie, no archive open20:39
naccsmoser: right20:40
smoser:-(20:40
naccsmoser: you can open tasks though20:40
smosersure.20:40
naccsmoser: they are just against an unopened archive :)20:40
blackboxswsmoser: pushed20:40
naccafaik, no name yet anyways20:40
smoseryeah. ok. so i guess we just upload new upstream snapshot, blackboxsw to x, y, z20:40
blackboxswok smoser time to sync on that upstream20:40
blackboxswjust want to dot some i's and cross t's20:40
blackboxswI'm in hangout20:40
smoserk20:41
smoserblackboxsw: uploaded x, z, a21:34
blackboxswcan't do x21:34
blackboxswoops21:34
smosertomorrow we need to21:34
smosermake a recipe build for artful21:34
smoserand /me is pumpkin now21:34
blackboxswthx smoser21:34
blackboxswsee ya21:34
blackboxswwill start testing x,z21:34
