/srv/irclogs.ubuntu.com/2017/10/23/#cloud-init.txt

blackboxsw	ok will have a branch up tomorrow on the SRU https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+ref/fix-device-path-from-cmdline-regression. Just need to complete a test and the download rates I'm getting right now are painful (56K/b) so I'm leaving the download running til morning	06:11
dpb1	blackboxsw: is there a bug associated with that fix?	15:29
blackboxsw	yep dpb1 I'll dig it up and tie it in.	15:30
blackboxsw	https://bugs.launchpad.net/cloud-init/+bug/1725067	15:30
ubot5	Ubuntu bug 1725067 in cloud-init "cloud-init resizefs fails when booting with root=PARTUUID=" [High,Triaged]	15:30
blackboxsw	dpb1: ^	15:30
blackboxsw	the fix will land today	15:30
dpb1	oh, cool	15:33
dpb1	ah, ok, so caught during proposed testing	15:33
dpb1	?	15:33
smoser	rharper: http://paste.ubuntu.com/25802105/	15:43
smoser	and blackboxsw	15:43
smoser	that does pass tests.	15:43
rharper	k	15:43
smoser	we could ideally make some metadata to the patching that would make that work.	15:43
smoser	would make the "block device" return for a stat or something.	15:44
smoser	but thats getting fancy	15:44
* rharper likes fancy		15:49
blackboxsw	dpb1: yep caught before SRU, so it's where we want to catch it	16:11
blackboxsw	well, we'd like to catch that with unit tests/integration tests	16:12
blackboxsw	but, before release is better than the alternative	16:12
dpb1	blackboxsw: can it be added to a unit test?	16:14
dpb1	s/added to/covered by/	16:15
blackboxsw	dpb1: I added a unit test that gets most of it. I have time to add a bit more concise unit test	16:15
blackboxsw	yeah, my work that introduced this regression was to restructure the code so it was a bit easier to unit test (as it didn't have much in the way of testing). I just didn't add enough tests to validate this case. more to come	16:18
* dpb1 nods		16:18
smoser	blackboxsw: i think you forgot to push on fix-device-path-from-cmdline-regression	16:38
smoser	https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/332634	16:38
blackboxsw	oops pushed 129beef2..445ee933	16:39
blackboxsw	was debating about adding one more unit test for happy path handle(), but I think I'll add that in a separate merge proposal as it'll be some more rework	16:40
blackboxsw	I want a test that shows we are properly attempting to resize a disk by uuid	16:40
blackboxsw	not just the test that detects the disk/by-uuid from the commandline	16:40
blackboxsw	but I think we don't have to block the SRU for this additional test, since we are manually covering that	16:41
blackboxsw	what do you think smoser	16:41
smoser	i'm fine with your thinking there.	16:49
nacc	rharper: smoser: if i have a cloud-config snippet in my default lxd profile for user.vendor-data a la http://paste.ubuntu.com/25802522/ does that mean no ubuntu user gets created (that is the behavior I am seeing)	16:49
rharper	nacc: you're declaring the list of users to add	16:50
rharper	so yes, you're overriding the default user	16:50
nacc	rharper: ok	16:50
nacc	rharper: I think I must have misunderstood that :)	16:50
nacc	rharper: is there an easy way for me to just modify the xisting users?	16:50
nacc	*existing	16:50
rharper	are you just wanting to import your keys?	16:51
rharper	you can use top-level ssh_import_keys: [list of keys here]	16:51
rharper	which will get imported into the default user (Ubuntu)	16:51
nacc	rharper: ah ok	16:51
nacc	rharper: but there is not a top-level ssh-import-ids?	16:51
rharper	ssh_import_id: raharper	16:51
blackboxsw	ok adding a backlog bug for unit test coverage of resizefs	16:52
nacc	rharper: ok, thanks	16:52
blackboxsw	ok filed https://bugs.launchpad.net/cloud-init/+bug/1726489	16:55
ubot5	Ubuntu bug 1726489 in cloud-init "Need more happy path unit test coverage of cc_resizefs " [Medium,Triaged]	16:55
blackboxsw	we need it by 17.2 to avoid any repeated regressions	16:56
blackboxsw	(if we were touching resizefs module)	16:56
nacc	rharper: does that go in user-data or vendor-data?	17:04
rharper	nacc: either will work	17:04
nacc	rharper: hrm ok	17:04
rharper	nacc: generally you're a user	17:04
rharper	things that lxd (or a cloud vendor would set by default) would go into vendor-data; but for your use/scripts/etc user-data is fine	17:05
nacc	rharper: ack	17:05
rharper	http://paste.ubuntu.com/25802644/	17:05
dpb1	they get merged following a very well documented process.	17:05
rharper	nacc: that's usually what I put in my user-data configs for vms and such	17:05
smoser	powersj: https://jenkins.ubuntu.com/server/job/cloud-init-ci/	17:05
smoser	why oh why. we've gotten so much less reliable recently. :-(	17:06
rharper	dpb1: well, it's well documented but not always what folks expect;	17:06
dpb1	rharper: :)	17:06
blackboxsw	grr FAIL: failed to install dependencies with read-dependencies ret=1	17:06
dpb1	yes, the documentation is there, the testing thing is what we need. :)	17:06
powersj	smoser: it is largely due to the MAAS tests	17:06
powersj	e.g. centos	17:06
blackboxsw	maybe we put in a retry on the yum package installs?	17:06
powersj	err i.e. centos	17:06
smoser	yes, but why did those get less reliable ?	17:06
smoser	we tried to make them more reliable with the no-fastest-mirror	17:07
blackboxsw	smoser: well we dropped the fastest mirror plugin (which we thought was the cause of our pains).	17:07
rharper	until its not	17:07
nacc	rharper: hrm, it's not added my user's key -- but i'll debug it a bit	17:07
rharper	when the "fast" mirror was fast	17:07
blackboxsw	heh, right exactly	17:07
rharper	it helped	17:07
powersj	yes, that was committed on the 19th, even before then things were getting worse	17:07
rharper	nacc: are you launching new instances or modifying an existing one ?	17:08
nacc	rharper: new instances	17:08
rharper	yes, let's see a cloud-init.log	17:08
nacc	rharper: i'm assuming it would write the appropriate key into .ssh/authorized_keys?	17:08
rharper	yes, under ubuntu	17:08
nacc	rharper: one sec	17:09
nacc	http://paste.ubuntu.com/25802666/	17:09
nacc	for this profile: http://paste.ubuntu.com/25802674/	17:10
smoser	nacc: you can also just reference 'default' in your list.	17:11
smoser	http://paste.ubuntu.com/25802680/ i think does what you want	17:11
smoser	you can use that as vendor-data. user-data would override it.	17:11
nacc	smoser: where default is a special string that means keep whatever the default user is?	17:11
rharper	nacc: I think you want to change your '-' to '_' in the ssh_import_id key; at least that's what's in mine;	17:12
smoser	nacc: yes. default is special, means "the os configured default user"	17:13
nacc	rharper: that would be odd that the top level field is different than the users field?	17:14
nacc	or is that a yaml quirk	17:14
nacc	i'll try it	17:14
nacc	rharper: (you can see in smoser's paste ssh-import-id is used there)	17:14
nacc	rharper: that worked	17:17
nacc	fwiw, I had smoser's variant before and c&p it up	17:17
nacc	that's why i had '-' instead of '_'	17:17
nacc	not sure if that's a bug or not :)	17:17
rharper	one could file a bug and ask for - instead of _ but I believe the manual/rtd shows _	17:19
nacc	rharper: oh it's ok -- just may want to update smoser :-P	17:19
rharper	lol	17:20
rharper	ironic	17:20
nacc	:)	17:20
smoser	rharper: nacc both are accepted there in the users:	17:22
smoser	generally speaking i think we prefer '_' .	17:22
rharper	that's strange	17:22
rharper	the ssh_import_id config module only checks _	17:23
smoser	but the user's dictionary kindof fronts "useradd" (or adduser, not sure which).	17:23
rharper	so users module is "helping" out ?	17:23
nacc	smoser: yeah, something feels odd	17:23
smoser	'normalize_user's is helping out, rharper	17:23
nacc	smoser: well that it does work sometimes	17:23
smoser	err... normalize_user_groups	17:23
smoser	what "does work sometimes" ?	17:23
nacc	smoser: '-' works in users but not at the top level	17:24
nacc	(that is, ssh_import_id is the only thing that works at the top level)	17:24
rharper	for ssh_import_id at least	17:24
powersj	smoser: blackboxsw: what if we ran `yum makecache` with retries so we had the metadata cache on hand	17:25
powersj	https://paste.ubuntu.com/25802800/	17:27
nacc	rharper: smoser: do you want me to file a bug? i don't think it's a big deal, but it feels inconsistent, at least	17:27
smoser	nacc: well, you can file a bug. i think probably the "fix" is to not document 'ssh-import-id' in the users/groups case.	17:28
smoser	but only document the '_' varieties of all of those.	17:28
nacc	smoser: right	17:28
nacc	but do you really want to accept undocumented stuff?	17:29
smoser	dont have a choice. :)	17:29
nacc	heh	17:29
smoser	at least not without a legacy warning period.	17:29
dpb1	right	17:29
nacc	sure	17:29
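The '-' versus '_' behaviour discussed above can be illustrated with a minimal sketch. This is not cloud-init's actual normalize_users_groups code; the helper name normalize_keys and the sample user entry are hypothetical, and the sketch only shows the general idea of accepting both spellings inside a users: entry by mapping them onto the '_' form.

```python
def normalize_keys(user_cfg):
    """Return a copy of a user dict with '-' in key names replaced by '_'.

    Hypothetical helper for illustration only; values are left untouched.
    """
    return {key.replace("-", "_"): value for key, value in user_cfg.items()}


# Hypothetical users: entry written with the '-' spelling.
entry = {"name": "demo", "ssh-import-id": ["example-id"]}
normalized = normalize_keys(entry)
```

A top-level key that is looked up by its exact name, as rharper describes for ssh_import_id, would not benefit from this kind of normalization unless it were applied there too.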
smoser	powersj: so what is the failure path that we're seeing ?	17:35
powersj	smoser: on my sampling of failures it is generally the "Cannot retrieve metalink for repository: epel/x86_64" or something similar.	17:36
smoser	why would 'makecache' succeed when just doing an install not ? i'd think they'd hit the same resources.	17:36
smoser	and i'd also think we want to change the 'install' to be yum -C install	17:37
powersj	They do, I suggested it as it might be easier to put a retry around that one command, but maybe it doesn't matter	17:37
smoser	other wise we'd still hit the network :)	17:37
smoser	blackboxsw: you started 2 minutes after me	17:39
smoser	(i was 430, you are 431). which i think testing same thing.	17:39
smoser	we can let them both run and have double the chance of success :)	17:39
rharper	gambler	17:43
blackboxsw	smoser: yeah oopsie, yeah wanted to hope for success in either case	17:44
smoser	i think you're ahead of me. or its a dead heat	17:47
smoser	it took you 90 seconds less to checkout	17:48
blackboxsw	powersj: not a bad idea on yum makecache. maybe we could do that part daily, and just use cache-only during yum commands?	17:52
blackboxsw	--cacheonly I mean	17:52
smoser	i think the cache is local to the container that we're running in	17:52
smoser	so daily doesn't help i don't think. but yes, if not paired with '--cacheonly' on the install i think it does nothing.	17:53
blackboxsw	yeah, looks like it, we'd have to push /var/cache/yum into the container if we wanted to leverage a daily cache update	17:54
rharper	mmm, bind mount read-only	17:54
blackboxsw	it'd save time trying to refresh cache every test run	17:55
blackboxsw	and maybe even cut out some of the yum install errors	17:56
blackboxsw	hrm, kinda big 45M/var/cache/yum/	17:59
smoser	testing http://paste.ubuntu.com/25803004/	18:03
powersj	lol@MAYBE_RELIABLE_YUM_INSTALL	18:05
blackboxsw	not bad at all :)	18:05
blackboxsw	jenkins #430 FAIL: failed to install dependencies with read-dependencies ret=1	18:05
blackboxsw	all hopes rest on the lone survivor... 431	18:06
blackboxsw	smoser: with that paste we may want to re-enable fastest mirror plugin per tools/run-centos then right?	18:07
blackboxsw	so we allow yum to use fastest mirror and retry if makecache fails	18:07
rharper	or use modulo to randomly enable/disable the fast mirror	18:07
rharper	sorta want to just ship a USB disk up there and leave it attached	18:08
powersj	if you are behind a proxy == don't use fastest mirror	18:10
rharper	except when you do want it because it works faster	18:13
powersj	per https://wiki.centos.org/PackageManagement/Yum/FastestMirror	18:13
smoser	blackboxsw, powersj had pointed at some doc that said dont use fastest mirror if you have a proxy	18:13
rharper	right	18:13
smoser	of course, that makes a lot of sense to document, instead of just DTRT!	18:13
rharper	except disabling it hasn't measurably improved things	18:13
rharper	so, that's great but it wasn't the core problem AFAICT	18:13
blackboxsw	hah smoser on docs instead of DTRT	18:14
blackboxsw	file a bug :	18:14
blackboxsw	:)	18:14
rharper	lol	18:14
blackboxsw	how could the fastest mirror plugin possibly look at environment variables like http_proxy :)	18:15
rharper	well, proxies are clearly slower than the fastest mirror	18:17
blackboxsw	nice comments on the blog post smoser, didn't notice we could do that	18:19
blackboxsw	... comment inlin ethat is	18:19
blackboxsw	inline even	18:19
smoser	the 'comment' button.	18:21
smoser	i wouldn't have gone looking, but you said something like "you all comment on it"	18:21
smoser	and i assumed you meant there was something like that :)	18:21
smoser	Error Downloading Packages:	18:25
smoser	1:perl-Module-Pluggable-3.90-144.el6.x86_64: Caching enabled but no local cache of /var/cache/yum/x86_64/6/base/packages/perl-Module-Pluggable-3.90-144.el6.x86_64.rpm from base	18:25
smoser	powersj: ^	18:25
rharper	shant we summon yum mirror masters for help here?	18:26
smoser	http://paste.ubuntu.com/25803096/	18:26
smoser	i dont understand what makecache does	18:27
powersj	smoser: your -C option means to use the deb itself from the local cache.. that's not what we want	18:28
powersj	my hope was to wrap around makecache to get the actual repo files as that seems to always be the part that fails and understand if we have a mirror issue or a network issue.	18:28
powersj	and by repo files I mean the package metadata info	18:28
rharper	the problem is that we don't have a way to pull in everything	18:29
powersj	my understanding of makecache is similar to apt update	18:29
rharper	first	18:29
smoser	"does not download or update any headers unless it has to to perform the requested action"	18:29
smoser	(emphasis added)	18:29
rharper	that is, those packages aren't going to be in the cache, so we can't use -C until we have a copy of those packages	18:29
rharper	the makecache is for metadata only	18:29
rharper	and the yum install -C means (don't pull from repos only from yum cached rpms)	18:30
rharper	so we've a chicken/egg issue	18:30
rharper	we need at least one successful run of the install with yum conf keepcache=1	18:30
smoser	but i'm missing something clearly	18:30
rharper	then we can yum makecache	18:30
rharper	and then use yum install -C for all further runs	18:30
smoser	because it looks to me like there is ALWAYS a chicken/egg there.	18:30
rharper	yum install -C is equivalent to apt install --reinstall (but without downloading)	18:31
rharper	I think we need to create a lambda function microservice to host a yum repo for the ci run	18:33
smoser	this is insane.	18:33
rharper	it's not unrealistic to kick it out for now	18:33
rharper	if the repos are unreliable, it's not a helpful test	18:34
blackboxsw	https://jenkins.ubuntu.com/server/job/cloud-init-ci/431/ looks hung	19:21
powersj	yeeeppp	19:25
rharper	don't kill anything yet	19:28
rharper	cloud-test-ubuntu-xenial-snapshot-ojta1rm2ous4xibxcp1hrklcwhutd | FROZEN	19:29
rharper	that's expected, right? the snapshot ?	19:29
powersj	yes	19:30
powersj	should the zfs file be on the nvme...	19:31
rharper	pid 405 is stopped	19:32
rharper	it's a script	19:32
rharper	running	19:32
rharper	python3 -m tests.cloud_tests run --verbose --os-name xenial --test modules/apt_configure_sources_list.yaml --test modules/ntp_servers --test modules/set_password_list --test modules/user_groups --deb cloud-init_*_all.deb	19:32
powersj	that's the integration test	19:33
rharper	two tests are still "running"	19:33
rharper	cloud-test-ubuntu-xenial-image-modification-1l8nht7xdh4kgp0n5aq and cloud-test-ubuntu-xenial-modules-set-password-list-gs123zqh6npu	19:33
* rharper execs into them		19:33
rharper	it completed	19:34
rharper	but the harness isn't seeing the result	19:34
rharper	does it ssh into it? or exec cat the result.json file ?	19:35
rharper	same for both	19:35
rharper	they completed, no errors	19:35
powersj	I was under the assumption jenkins waits for the return code of the process	19:35
rharper	what stops the container ?	19:36
rharper	don't we poll for result.json in the lxd instance ?	19:36
rharper	jenkins is fine, its script spawned the above python3 -m tests.cloud_tests run command	19:36
rharper	it ran 4 containers	19:36
rharper	2 of which are completed and exited, these two are remaining, but idle	19:36
rharper	cloud-init inside has completed successfully	19:37
rharper	but the harness "watching" each of those container runs has somehow given up on its "exit" condition	19:37
smoser	lets see if it likes me	19:39
smoser	https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666	19:39
powersj	rharper: from the jenkins log, I see 3 tests were launched and collected from. At this point, based on output alone, I would expect it is stuck/hung trying to get result.json	19:42
powersj	The next thing it should be doing is launching a 4th container for the 4th test	19:42
rharper	powersj: right, but where is the code in the cloud_tests where it attempts to get the result ?	19:42
powersj	collect.py	19:42
rharper	what calls collect ?	19:42
powersj	run_funcs.py	19:43
rharper	so instance.run_script	19:43
rharper	hrm, where do we import pylxd ?	19:47
rharper	I see platform/lxd.py	19:47
powersj	rharper: that should be it	19:49
smoser	blackboxsw: shall we say screw jenkins for now ?	19:51
smoser	wrt to fix-device-path...	19:52
* blackboxsw was just writing up a retrospective document. that CI breakage on our test service has been painful to say the least		19:52
blackboxsw	smoser: I think we need to move on (as we are manually testing this)	19:52
blackboxsw	and now have a little bit better unit test coverage	19:52
blackboxsw	we'll add the manual logs for deploying proposed for this bug fix we are pulling in	19:53
rharper	powersj: well, I don't really see anything here; we don't have a way to get pylxd to log its commands ? Like the execute ?	19:55
powersj	rharper: I'm not familiar enough with pylxd	19:55
blackboxsw	smoser: getting closer Cannot retrieve metalink for repository: epel/x86_64. Please verify its path and try again	19:56
blackboxsw	:: failed [1] (1/10). sleeping 5.	19:56
blackboxsw	hrm it's looping. so it looks like it's making it farther on attempt 2	19:57
blackboxsw	nice smoser Complete!	19:58
blackboxsw	Installing deps: python-configobj python-oauthlib python-contextlib2 python-coverage python-httpretty python-six PyYAML python-jsonpatch python-jinja2 python-jsonschema python-setuptools python-requests python-unittest2 python-mock python-nose e2fsprogs iproute net-tools procps rsyslog shadow-utils sudo python-devel python-setuptools make sudo tar python-tox	19:58
smoser	horay!	19:58
blackboxsw	maaan	19:58
blackboxsw	that's ridiculous	19:58
smoser	ridonkulous	19:58
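The retry behaviour visible in the output above ("failed [1] (1/10). sleeping 5.") can be sketched roughly as follows. This is only an illustration of a retry-with-sleep loop, not the contents of smoser's paste or of tools/run-centos; the function name run_with_retries and its parameters are assumptions made for the example.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=10, delay=5,
                     runner=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0 or attempts run out; return the last exit code.

    runner and sleep are injectable so the loop can be exercised without
    touching the network or actually waiting.
    """
    ret = 1
    for attempt in range(1, attempts + 1):
        ret = runner(cmd)
        if ret == 0:
            return 0
        if attempt < attempts:
            # Mirror the style of message seen in the CI log.
            print(":: failed [%d] (%d/%d). sleeping %d."
                  % (ret, attempt, attempts, delay))
            sleep(delay)
    return ret
```

A call such as run_with_retries(["yum", "makecache"]) would then retry the metadata fetch up to ten times. Whether the retry belongs around makecache or around the install itself is the open question in the discussion above.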
rharper	powersj: I'm thinking that we likely want a timeout on those executes with a retry	19:59
nacc	blackboxsw: fwiw, sudo is listed twice	19:59
rharper	the container still works via the normal lxc client, so something's wonky with pylxd	19:59
blackboxsw	rharper: is it possible we are missing a raise in the runscript case?	19:59
rharper	blackboxsw: there are no raises afaict	19:59
rharper	so if pylxd doesn't raise on its failure, we just sit	19:59
rharper	still smells like a pylxd issue	20:00
blackboxsw	ok gotcha.	20:00
powersj	rharper: agreed	20:00
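rharper's suggestion of a timeout with a retry around the executes could look something like the sketch below. It does not use pylxd at all; call_with_timeout, execute_with_retry and ExecuteTimeout are hypothetical names, and the wrapped callable stands in for whatever call the harness makes. The sketch assumes a hung call can simply be abandoned in a daemon thread, which may not be acceptable in a real harness.

```python
import threading


class ExecuteTimeout(Exception):
    """Raised when the wrapped call does not finish within the timeout."""


def call_with_timeout(func, timeout):
    """Run func() in a daemon thread and give up after timeout seconds."""
    result = {}

    def target():
        try:
            result["value"] = func()
        except Exception as exc:  # hand the error back to the caller
            result["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise ExecuteTimeout("call did not finish in %s seconds" % timeout)
    if "error" in result:
        raise result["error"]
    return result["value"]


def execute_with_retry(func, timeout, attempts=3):
    """Retry func() when it times out; attempts must be at least 1."""
    last_error = None
    for _ in range(attempts):
        try:
            return call_with_timeout(func, timeout)
        except ExecuteTimeout as exc:
            last_error = exc
    raise last_error
```

With a wrapper like this the harness would fail loudly after the last attempt instead of sitting idle, which at least turns a hang into a visible error.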
blackboxsw	ohh true nacc	20:00
blackboxsw	we can tweak the deps script to use set([pkg_list]) to ensure uniques	20:00
smoser	blackboxsw: yeah, so 'sudo' is in the COMMON and then manually	20:00
nacc	blackboxsw: not sure it matters, but it probably implies there's not a uniq-like operation	20:00
nacc	blackboxsw: yeah	20:00
smoser	yeah	20:01
smoser	wait. how did sudo get in there twice	20:01
rharper	one can never have enough sudo	20:01
nacc	ironically, i just figured out that my git-ubuntu job failure was to not having sudo installed in the LXD image :)	20:02
nacc	synchronicity between squads achieved!	20:02
blackboxsw	heh smoser I think it's probably pulling in deps from packages/pkg-deps.json	20:02
blackboxsw	hahaha nacc #achievementunlocked	20:02
smoser	hm.. yeah	20:03
blackboxsw	smoser officially approved https://code.launchpad.net/~smoser/cloud-init/+git/cloud-init/+merge/332666	20:06
blackboxsw	can you land and I can rebase and pull it into my branch	20:06
smoser	blackboxsw: http://paste.ubuntu.com/25803766/	20:07
smoser	that'd fix the sudo twice, which is probably "who cares"	20:07
smoser	so i'm fine to ignore it for now.	20:07
smoser	and so would your 'sort'	20:07
smoser	err... set()	20:08
smoser	landing now	20:08
nacc	smoser: yeah i don't think it's fatal, but given the output, it's nice to be able to parse it and not be surprised :)	20:09
smoser	blackboxsw: its in.	20:12
smoser	blackboxsw: you should rebase on upstream/master and then push a change and lets see if you get a happy face.	20:12
blackboxsw	ok smoser , sorry missed the paste I was thinking set.unions, but that looks just as 'noisy' as the for loop	20:19
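The duplicate "sudo" in the dependency list comes from merging more than one package list without a uniq-like step. A minimal sketch of an order-preserving merge is shown here; it is not the code from smoser's paste or from the read-dependencies script, and unique_packages plus the sample lists are made up for illustration.

```python
def unique_packages(*package_lists):
    """Merge package lists, keeping first-seen order and dropping repeats."""
    seen = set()
    ordered = []
    for pkgs in package_lists:
        for pkg in pkgs:
            if pkg not in seen:
                seen.add(pkg)
                ordered.append(pkg)
    return ordered


# Hypothetical inputs: a common list plus distro-specific extras.
common = ["sudo", "tar", "make"]
extras = ["python-devel", "sudo", "python-tox"]
merged = unique_packages(common, extras)
```

A plain set() also removes the duplicate but loses ordering, which matters if the printed "Installing deps" line is meant to be stable between runs.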
blackboxsw	rebasing	20:19
blackboxsw	pushed	20:21
blackboxsw	rebuilding	20:21
blackboxsw	https://jenkins.ubuntu.com/server/job/cloud-init-ci/433/console	20:23
smoser	blackboxsw: land ?	20:38
smoser	i approved.	20:38
blackboxsw	smoser: it just completed, I'm waiting on tox	20:38
blackboxsw	and will push	20:38
smoser	\o/	20:38
blackboxsw	yeah almost	20:39
smoser	then you can do mps for x, y, z	20:39
smoser	nacc: we still do not have a 'bb', right ?	20:39
smoser	ie, no archive open	20:39
nacc	smoser: right	20:40
smoser	:-(	20:40
nacc	smoser: you can open tasks though	20:40
smoser	sure.	20:40
nacc	smoser: they are just against an unopened archive :)	20:40
blackboxsw	smoser: pushed	20:40
nacc	afaik, no name yet anyways	20:40
smoser	yeah. ok. so i guess we just upload new upstream snapshot, blackboxsw to x, y, z	20:40
blackboxsw	ok smoser time to sync on that upstream	20:40
blackboxsw	just want to dot some i's and cross t's	20:40
blackboxsw	I'm in hangout	20:40
smoser	k	20:41
smoser	blackboxsw: uploaded x, z, a	21:34
blackboxsw	can't do x	21:34
blackboxsw	oops	21:34
smoser	tomorrow we need to	21:34
smoser	make a recipe build for artful	21:34
smoser	and /me is pumpkin now	21:34
blackboxsw	thx smoser	21:34
blackboxsw	see ya	21:34
blackboxsw	will start testing x,z	21:34
