/srv/irclogs.ubuntu.com/2015/11/05/#juju-dev.txt

menn0thumper, axw or wallyworld: review pls  http://reviews.vapour.ws/r/3073/00:10
menn0this makes a feature test i'm writing a lot cleaner00:11
thumpermenn0: done00:13
* thumper goes to walk the dog in the sun00:13
menn0thumper: thanks00:13
davecheneyOH WOW00:40
davecheneythe maas provider demands yaml.v200:40
davecheneybut also calls a utility function which requires yaml.v100:41
davecheneymgz_: thumper http://reviews.vapour.ws/r/3074/01:35
davecheneyplease view01:35
davecheneyshould break the arguments about which version of yaml.v2 juju/utils can use01:35
davecheneyalso, spot the WTF in that pach01:36
davecheneyalso, spot the WTF in that patch01:36
=== natefinch-afk is now known as natefinch
natefinchdavecheney: heh, looks familiar: https://github.com/natefinch/atomic/blob/master/atomic.go#L1701:56
natefinchdavecheney:  looks like I should steal some safeguards from yours, though.01:57
perrito666mm, the chmod wont work in windows, but since it is not tested it wont break anything either01:58
natefinchperrito666: chmod is just a noop on windows01:58
perrito666natefinch: not entirely01:58
perrito666there is an implementation in go that does something stupid01:58
perrito666it changes fs properties for the file but these are ignored by windows01:59
natefincheffectively a noop :)01:59
perrito666natefinch: well the fact that those things can be changed means that something is using them02:00
perrito666I have no clue what that is02:00
perrito666most likely windows 3 :p02:00
thumpermenn0, davecheney, axw: team meeting02:02
natefinchdavecheney: seems like this must be a bug: http://reviews.vapour.ws/r/3074/#comment1913302:07
davecheneycoming02:33
davecheneyoh02:33
davecheneyi'm 33 mins late02:33
davecheneynatefinch: thanks for your comments02:36
davecheneyPTAL02:37
natefinchdavecheney: we're still talking on the hangout btw02:37
davecheneyok, coming02:37
natefinchdavecheney: tests?02:49
natefinchthumper: Looking at that retry package, seems like it would be useful to be able to encapsulate call args into a value, and then be able to call value.Call(somefunc) .... so you could use the same retry semantics with any number of different functions.  Also it then doesn't hide the function you're retrying inside a huge list of args.02:52
natefinchthumper: also you should call the package sisyphus02:53
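
A minimal sketch of natefinch's suggestion at 02:52, written in Python for brevity rather than Go. `RetryPolicy`, its fields and `call` are hypothetical names, not the API of thumper's retry package. The retry semantics live in one value that can be reused with any function, and the sleep function is injectable so tests do not have to wait.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical value holding retry semantics, reusable across functions."""
    attempts: int                                # total number of tries
    delay: float                                 # seconds to wait between tries
    sleep: Callable[[float], None] = time.sleep  # injectable for tests

    def __post_init__(self):
        if self.attempts < 1:
            raise ValueError("attempts must be at least 1")

    def call(self, func: Callable[[], T]) -> T:
        """Run func until it succeeds or attempts run out, then re-raise the last error."""
        last_error = None
        for attempt in range(self.attempts):
            try:
                return func()
            except Exception as err:
                last_error = err
                # Only wait if another attempt will follow.
                if attempt < self.attempts - 1:
                    self.sleep(self.delay)
        raise last_error
```

The same policy value can then wrap unrelated functions, for example `policy.call(connect)` and `policy.call(fetch)`, without the function being buried in a long argument list.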
davecheneynatefinch: this is only temporary02:56
davecheneyonce everyone is at yaml.v2 i'll be deleting those forks02:56
davecheneymy goal is not to move code from other packages into juju02:56
davecheneyin fact the opposite02:56
davecheneybut this yaml.v2 dep keeps fucking that plan up02:56
cheryljwhat's the difference between stateaddresses and apiaddresses in agent.conf?03:08
cheryljthumper, menn0 ^^ ?03:10
menn0cherylj: iirc stateaddresses has the mongodb server addresses and apiaddresses is the API server addresses03:11
* menn0 checks something03:12
thumpernatefinch: interesting idea03:13
menn0cherylj: yep, that's right03:13
thumpernatefinch: and pretty easy to implement03:13
menn0cherylj: really stateaddresses shouldn't be there except on the state/controller servers, but it is03:14
cheryljok, thanks, menn003:14
menn0cherylj: it's probably a historical vestige - non state agents used to connect directly to mongo for some things03:14
thumpermenn0: could I get you to cast your eyes over http://reviews.vapour.ws/r/3072/ ?03:15
thumpercherylj: menn0 is right in his musings :)03:15
menn0cherylj: related tidbit: a lot of (probably older) code uses "state" to mean "the mongodb server". not at all confusing.03:18
menn0thumper: looking03:18
cheryljha, ok :)03:18
natefinchthumper: also, if you're going to make retry a standalone package, you gotta move clock into a standalone package03:25
thumpernatefinch: axw is moving the clock out in his one03:26
natefinchthumper: awesome03:26
thumpernatefinch, axw: although perhaps we should have a common top level one...03:26
thumperaxw: perhaps juju/clock ...03:26
axwnatefinch: FYI, this is the branch talked about on the call: https://github.com/axw/juju-time03:26
axwthumper: I was thinking juju/time/clock, but *shrug*03:26
natefinchthumper: yeah. thats what I meant, just juju/clock03:27
thumperaxw: I was thinking more to have a common parent for the scheduler one too, rather than the clock with the scheduler03:27
axwthumper: yes, definitely not in the one package03:28
axwwe have enough utils already :)03:28
thumperI was thinking not in the same repo03:28
thumperagreed03:28
natefinchthumper: btw, I think the error handling in your retry code could be improved by following davecheney's advice to assert behavior: http://dave.cheney.net/2014/12/24/inspecting-errors   .. so like have a 'Last() error' method on the error types, etc.03:34
* thumper looks03:35
menn0thumper: well that sucked. review done. lots of stuff missed.03:35
thumpermenn0: :(03:35
menn0thumper: the review process sucked... not the PR03:35
thumpermenn0: it isn't complete...03:35
thumpergeez03:35
thumperthere is another one following...03:35
thumperI think03:35
thumperat least03:35
* thumper looks03:35
menn0thumper: there's lots of test names that still say System and bits of help text missed and a few other things03:36
thumperok03:36
menn0thumper: I basically just did a search in my browser for "system" looked to see if it came up on the right side of the diff :)03:37
menn0thumper: also "server" comes up a bit03:37
* thumper sighs03:37
menn0not sure if you're fixing those03:37
thumperso much stuff to change03:37
natefinchI can't believe we're spending so much time just to change the names of things, instead of, like, implementing features.03:42
natefinchoh yeah, hey, I had a 1.26-alpha1.1 server running, and tried to interact with it via a client built from master, and was getting this error message printing out with a lot of my commands: 2015/11/04 21:27:00 warning: discarding cookies in invalid format (error: json: cannot unmarshal object into Go value of type []cookiejar.entry)03:44
natefinchdavecheney: what's wrong with this picture? https://github.com/juju/persistent-cookiejar/blob/master/jar.go#L1003:47
davecheneynatefinch: the type isn't public ?03:50
natefinchdavecheney: repo name is "persistent-cookiejar" :/03:50
natefinchpackage name is "cookiejar"03:50
davecheneyoh for fucks sake03:50
* davecheney throws something03:51
natefinchalso, using that package seems to break client-server compatibility03:51
natefinchI saw this:03:51
natefinch$ juju destroy-environment local -y03:51
natefinchERROR cannot connect to API: cannot load cookies: json: cannot unmarshal object into Go value of type []cookiejar.entry03:51
natefinchwhen my client was a different version than my server... presumably one of them was using the old cookiejar and one the new03:52
* davecheney bursts into tears03:52
davecheneynatefinch: i'm convinced we've passed more than 100% technical debt to gdp03:53
natefinchdavecheney: time to rewrite juju in rust03:53
davecheneythat wasn't the exact point I was trying to make ...03:55
natefinchdavecheney: well, we're working on tech debt at oakland, right?  One week should just about do it.03:57
davecheneyi'll be late04:05
* natefinch calls it a night to go read his new Go programming book04:07
=== akhavr1 is now known as akhavr
thumperdavecheney: https://github.com/howbazaar/clock-proposed04:28
thumperdavecheney: I know you like non-util named packages :)04:28
thumperdavecheney: to become github.com/juju/clock04:28
thumperaxw: ^^04:28
thumperadded in the testing clock from juju/juju/testing/clock.go04:28
thumperand added a few tests04:29
thumperand type assertions in the test file04:29
axwthumper: LGTM, but I'd like it if you renamed clock/testing to clock/clocktest04:32
axwthumper: saves aliasing everywhere04:33
thumperclock/testclock?04:33
axwthumper: I was thinking clocktest as in testing things related to the clock package, not as in a package that contains a TestClock04:33
thumperI agree with renaming it04:33
axwthumper: much like net/http/httptest04:34
thumperah.. ok04:34
thumperhappy to follow precedent04:34
axwthumper: still not really sure about having a whole repo to its own. we're going to want other time-related things which would be nice to group together04:36
axwthumper: e.g. delay functions for retry/scheduler/backoff thing that bogdan is working04:36
thumperaxw: let's ask fwereade, given tech-lead hat :)04:37
axwthumper: I'm thinking they'd live in juju/time/delays or something like that... and then having clock over by itself feels a bit odd04:37
axwthumper: SGTM04:37
davecheney\o/ all praise the clock04:43
davecheney15:32 < axw> thumper: LGTM, but I'd like it if you renamed clock/testing to clock/clocktest04:43
davecheney^ yes, a million times, yes04:43
thumperI'm just asking our architect and TLs about guidance around separate repo for clock, or one for time that includes the scheduler that axw has04:44
thumperdavecheney: the clock-proposed repo now has that package renamed04:44
davecheneythumper: also if it's going to be a juju project, it needs a cute name04:44
davecheneywhat about clocky ? https://upload.wikimedia.org/wikipedia/commons/4/4b/Clocky_almond_panorama1680.jpg04:44
thumper:)04:45
thumperaxw: got the link to your time proposed repo?04:46
thumperaxw: I'll include it in the email04:46
axwthumper: https://github.com/axw/juju-time04:46
thumperta04:46
menn0thumper: here's juju-dumplogs. I'm just doing the /usr/local/bin symlink now.04:47
menn0http://reviews.vapour.ws/r/3075/04:47
thumpermenn0: I'll continue reviewing shortly, got to go and drop off kids to guides/show04:50
thumperbbs04:50
menn0thumper: np04:51
jammorning all07:54
dimiterndooferlad, hey, can you check if you open http://imgur.com/a/ky3cl please?09:13
fwereadedimitern, nice09:13
dooferladdimitern: yep09:14
dimiternfwereade, dooferlad, thanks :)09:14
dimiternwasn't sure I need to actually sign up for imgur to "public" the album - never used it, and their UI is confusing09:14
dooferladdimitern: I have an environment in EC2 that has had its agent-state-info stuck in "Request limit exceeded" since last night.09:20
dooferladdimitern: any thoughts?09:20
dimiterndooferlad, hmm - is it the shared account?09:21
dooferladdimitern: https://console.aws.amazon.com/trustedadvisor/home?#/dashboard says I am below the service limits09:21
dooferladso I expect this is a cached response :-|09:22
dimiterndooferlad, if it's the shared account, I can have a look09:23
dooferladdimitern: it is shared, in eu-central09:23
dimiterndooferlad, ok, looking09:23
dooferladdimitern: and, the fun part is, I only have 3 machines running after I did an ensure-availability -n 3 and had a charm on machine 1.09:24
dooferladdimitern: so it looks like Juju tried to add another machine then gave up09:25
dooferladdimitern: http://pastebin.ubuntu.com/13111330/09:25
dimiterndooferlad, that smells like the instancepoller getting overexcited09:27
dimiternand polling more often than needed09:27
dooferladdimitern: even though we are, at the moment, under limit?09:27
dooferladdimitern: ignore that comment, I was looking at service limits, not request limits09:28
dooferladdimitern: we definitely need exponential backoff in our EC2 API with automatic retries. Is it supposed to have it already?09:33
dimiterndooferlad, IIRC we already have that09:33
dimiterndooferlad, or maybe it was just for the instancepoller09:33
dooferladdimitern: it shouldn't be related to the bug I am looking at anyway since the customer is using MAAS.09:35
dimiterndooferlad, yeah09:35
dimiterndooferlad, can you try destroy-machine --force on those with the error and add new ones?09:37
dooferladdimitern: possibly, but it isn't important right now. Was just wondering about a quick fix.09:38
dooferladdimitern: just seemed odd09:38
dimiterndooferlad, indeed - before destroying the env, it might be useful to get the machine-0.log for some insight09:39
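
A rough sketch of the exponential backoff with automatic retries that dooferlad asks about at 09:33, in Python. It is not Juju's EC2 code: `RequestLimitExceeded`, `backoff_delays` and `call_with_backoff` are hypothetical names, and the jitter is an optional extra added here for illustration.

```python
import random
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class RequestLimitExceeded(Exception):
    """Hypothetical stand-in for a provider rate-limit error."""


def backoff_delays(base: float, factor: float, cap: float, attempts: int,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one delay per retry: base * factor**n, never above cap.

    If rng is given, each delay is drawn uniformly from [0, delay] (jitter)
    so that many clients do not retry in lock step.
    """
    for n in range(attempts):
        delay = min(cap, base * factor ** n)
        yield rng.uniform(0, delay) if rng is not None else delay


def call_with_backoff(func: Callable[[], T], delays: Iterable[float],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, sleeping for the next delay after each rate-limit error."""
    for delay in delays:
        try:
            return func()
        except RequestLimitExceeded:
            sleep(delay)
    # Delays exhausted: one final attempt, letting any error propagate.
    return func()
```

Only the rate-limit error is retried in this sketch, so other failures surface immediately instead of being hidden behind a growing wait.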
jamfrobware: sorry I missed standup. Was still meeting with Mark. are you guys still chatting?10:37
frobwarejam: yep10:38
frobwarejam: into topics "catacomb and rate limiting"10:38
jamk. using the restroom and I'll be right there10:39
fwereadejam, it's a `.Kill(nil)`, so that's even easier :)11:06
* perrito666 applies for a visa for the first time in around 20 years11:10
perrito666axw: you have a pretty strict migration policy :p11:10
lazypowerdimitern - Help me out for a sec. Whats the name of your juju-core team?12:31
rick_h__lazypower: sapphire12:31
lazypowerah! thank you12:31
lazypoweris there a chart/roster somewhere?12:31
lazypowerIt'll help when blogging and pointing credit arrows :D12:32
rick_h__lazypower: honestly I use the canonical directory12:32
lazypowerAh, allright12:32
rick_h__lazypower: if you look up dimiter it shows "Juju Core - Sapphire" as his team12:32
rick_h__lazypower: I'm sure there's others but just what I tend to use when I forget.12:32
dimiternlazypower, hey :)12:33
perrito666it would be really nice to have an api for the directory so I can write a plugin for my irc client12:33
rick_h__perrito666: come on, web scrapers ftw! :P12:34
lazypowerperrito666 - automate away the pain. Embedded profiles for IRC12:34
jamperrito666: do you know about mup?12:40
perrito666jam: I do I would like my client to show me faces on hover though :)13:31
jamperrito666: would be good13:32
cheryljfrankban: I see that you're reverting your update for crypto.  Was it updated originally for a particular reason?13:48
mupBug #1513466 opened: Different behavior on ServiceDeploy with Config/ConfigYAML <juju-core:New> <https://launchpad.net/bugs/1513466>13:50
mupBug #1513468 opened: imports github.com/juju/juju/workload/api/internal/client: use of internal package not allowed <blocker> <ci> <regression> <testing> <wily> <juju-core:New> <juju-core 1.25:Fix Released> <https://launchpad.net/bugs/1513468>13:51
frankbancherylj: I don't remember specific reasons, probably just ended up there because it was updated in my GOPATH by some other project14:05
cheryljfrankban: ok, thanks!14:05
frankbancherylj: should I merge it?14:06
cheryljfrankban: did your local tests pass?14:06
frankbancherylj: I always have some intermittent failures locally, but they seem not related to the downgrade, CI will tell us I guess14:07
* cherylj curses intermittent failures14:07
cheryljfrankban: yes, please merge14:07
frankbancherylj: done, how do we know if this fixes armhf?14:08
cheryljfrankban: I don't think we have any way to test that without requesting images be built for 1.26-alpha 114:09
frankbancherylj: ok, does merging this automatically unblock master?14:10
cheryljfrankban: no, we'll need to verify that the fix worked first14:10
perrito666someone gave more machine to CI? curses are coming faster14:11
frankbancherylj: Does not match ['fixes-1513236'], I wrote fixes-1513236 in the pr comment, what else should I do?14:11
perrito666frankban: $$fixes-1513236$$ should do the trick14:11
cheryljfrankban:  use "$$fixes-1513236$$"14:11
cheryljrather than $$merge$$14:11
frankbanok done14:12
cheryljthanks, frankban!14:14
abentleyfrankban: If master is blessed, your bug will automatically be marked fix-released.  If there are no other blockers, that will unblock master.14:40
frankbanabentley: sounds good14:40
frankbancherylj: downgrade branch landed14:45
abentleyfrankban: master bec300366 now testing.14:47
mupBug #1513492 opened: add-machine with vsphere triggers machine-0: panic: juju home hasn't been initialized <add-machine> <panic> <vsphere> <juju-core:Triaged> <https://launchpad.net/bugs/1513492>15:00
cory_fuWith the release of Juju 1.25, we seem to be seeing shorter idle times on the agent-state, possibly due to update-status hook being called more frequently.  Was there a specific change related to that in 1.25?15:00
perrito666bbl lunch15:01
=== jog_ is now known as jog
perrito666cory_fu: update status hook will call before entering idle status15:01
perrito666but the added time there might make idle time shorter as a consequence15:01
perrito666fwereade: correct me if that changed15:01
perrito666now yes, bbl15:02
fwereadeperrito666, cory_fu: yes, I wouldn't expect a non-errored agent to be idle for longer than 5 mins (I think that's the update-status period)15:03
fwereadecory_fu, but if it never gets close to that there might be something up15:03
cory_fuI'm not sure if I understand.  The issue I'm running up against is that I'm waiting for a 30s idle period and am not seeing it within a 30min window.  This seems to mostly happen on Azure15:04
cory_fuWhich is notoriously slow for these deploys, so that may be a factor.15:05
tvansteenburghcory_fu: i have examples on other clouds too15:05
cory_futvansteenburgh: But it's not 100% consistent on other clouds?15:05
tvansteenburghcory_fu: azure is notable b/c it hasn't passed on 1.25 at all. hp on the other hand, has passed at least twice :P15:06
katcowwitzel3: standup?15:07
wwitzel3katco: trying .. :/15:08
tvansteenburghfwereade, cory_fu: seems to me that agent-status.since should not be updated unless the value of agent-status.current actually changes15:25
fwereadetvansteenburgh, right, but it executes a hook every 5 minutes15:25
fwereadetvansteenburgh, is workload-status also shortened?15:25
fwereadetvansteenburgh, I would expect ~5mins of idle, separated by (likely) sub-second blips of executing15:26
tvansteenburghfwereade: no, in our examples, workload-status.since is much older - 8 to 15 minutes older in the one i'm looking at15:27
fwereadetvansteenburgh, cool, I think that's what I'd expect15:27
fwereadetvansteenburgh, is it unhelpful to you?15:27
tvansteenburghfwereade: i'm not sure. we've been using agent-status.since to determine when an environment had "settled" - all agents idle for 30 seconds. that worked well prior to 1.25, but now we see most deployments never reaching that settled state. trying to figure out what changed15:30
frankbancherylj, abentley: bec30036 failed, errors don't seem related though15:30
fwereadetvansteenburgh, well, the update-status thing is the clear proximate cause, but ultimately I think it's incorrect to depend on agent status as a proxy for environment stability15:31
fwereadetvansteenburgh, that should be what workload-status is for -- assuming the charm implements it15:32
abentleyfrankban: no, but it does seem like a legit failure.  The same test failed for the last test of master.15:32
tvansteenburghfwereade: yeah, fair enough. we were trying to accommodate the charms that don't, but i see your point15:34
fwereadetvansteenburgh, cool -- forwarded you a mail where I go into a bit more detail, in case it's relevant :)15:35
tvansteenburghfwereade: thanks!15:35
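
A minimal sketch of the "settled" check discussed above, in Python, following fwereade's point that workload-status rather than agent-status should signal stability. The dict layout and the function names are hypothetical; only the `agent-status` / `workload-status` fields with `current` and `since` come from the conversation. The fallback for charms that never set a workload status assumes they report `unknown`, which is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def stable_since(unit: dict) -> datetime:
    """Pick the timestamp that best reflects when the unit last changed state."""
    workload = unit["workload-status"]
    # Assumption: a charm that never sets its status reports "unknown".
    if workload["current"] != "unknown":
        return workload["since"]
    # Fallback for such charms. Note agent-status.since is reset by every
    # update-status hook run, so this can only see short quiet periods.
    return unit["agent-status"]["since"]


def is_settled(units: list, now: datetime,
               quiet: timedelta = timedelta(seconds=30)) -> bool:
    """True when every unit has been unchanged for at least `quiet`."""
    return all(now - stable_since(unit) >= quiet for unit in units)
```

With this shape, a periodic update-status run that briefly flips the agent to executing no longer resets the clock for charms that do report a workload status.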
frankbanabentley: how do we check that https://bugs.launchpad.net/juju-core/+bug/1513236 is fixed?15:36
mupBug #1513236: Cannot build trusty armhf with go1.2 on from master <armhf> <blocker> <go1.2> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513236>15:36
katcodavecheney: did you have a golang issue or something to justify unblocking master for bug 1513236 ?15:36
katcodavecheney: (just now seeing your message) i'd like to update that bug with justification before untagging it as a blocker15:37
abentleyfrankban: Regardless of whether that bug is fixed, we don't want to unblock until we get a bless.15:37
frankbanabentley: I don't question that15:38
abentleyfrankban: That's tricky, because we don't build armhf as part of testing, because we don't have suitable hardware to test with.  I'd talk to sinzui.15:40
sinzuiabentley: mgz: I believe we could crossbuild it, which would catch the error.15:41
natefinchabentley, frankban, sinzui : note that davecheney said that this was very likely an upstream go 1.2 bug that is not likely to be fixed anytime soon, possibly ever15:41
sinzuiabentley: mgz: I was also thinking of asking for armhf hardware, but mgz might be able to prove the case for us with this chromebook15:42
sinzuinatefinch: yeah, Go has moved on to newer versions15:42
cheryljhey dimitern, is there any update on bug 1483879?  I know the fix isn't trivial, but the bug was brought up in the cross team call and I wanted to make sure it was still in progress...15:50
mupBug #1483879: MAAS provider: terminate-machine --force or destroy-environment don't DHCP release container IPs <bug-squad> <destroy-machine> <landscape> <maas-provider> <sts> <juju-core:Triaged> <juju-core 1.24:Triaged> <juju-core 1.25:In Progress by dimitern> <https://launchpad.net/bugs/1483879>15:50
dimiterncherylj, it still is - I'm close to proposing 1.24 fix (needed to fix my maas setup to test it properly, which I finished a couple of hours ago)15:52
cheryljdimitern: great, thanks!  I'll put an update in the bug that it's close for 1.24.15:52
cheryljdimitern: will it be difficult to forward port?15:52
dimiterncherylj, the 1.24 fix is the most difficult, as it needs some cherry-picking from 1.2515:53
cheryljah, ok15:53
dimiterncherylj, the 1.25 and master forward ports  should be much simpler15:53
natefinchwwitzel3, ericsnow: fyi, a few tests failed on the lxd build tags merge, I can fix easily, but just letting you know.15:58
ericsnownatefinch: k15:58
wwitzel3natefinch: ty15:59
natefinchalexisb, katco:  I notice the "roomie" thing on the oakland spreadsheet says N/A for everyone.  Does that mean we all get our own rooms?16:01
katconatefinch: that is my understanding16:02
natefinchkatco: awesome :)16:02
katconatefinch: yes, as an introvert that makes me extremely happy16:02
katconatefinch: i.e. i'll actually have a place i can recharge16:03
* dooferlad is a happy introvert as well at this news16:03
natefinchkatco: yeah, totally understand that16:03
natefinchkatco: I'm generally fine with sharing a room... right up until the actual sleeping part... then I really just want my own room, thankyouverymuch.16:04
katconatefinch: it's pretty disastrous for me as i feel like i have to be "on" for 24/7 for an entire week16:05
wwitzel3I usually wake up at 7am and don't go to bed until 3am during sprints .. they should really do a timeshare thing, probably save money16:07
katcowwitzel3: 1:1?16:07
wwitzel3I also feed off the energy of introverts, so that helps16:08
natefinchwwitzel3: rofl16:08
katcohaha16:08
lazypower> I feed off he energy of introverts16:16
lazypowerstrange place to join the conversation, but knowing wwitzel3 i'm not surprised...16:16
wwitzel3lazypower: :)16:21
natefinchno wonder wwitzel3 likes programming so much... neverending supply of food.16:22
frobwarecherylj, would you have time to try a fix for https://bugs.launchpad.net/juju-core/+bug/141262116:31
mupBug #1412621: replica set EMPTYCONFIG MAAS bootstrap <adoption> <bootstrap> <bug-squad> <charmers> <cpec> <cpp> <maas-provider> <mongodb> <oil> <juju-core:Triaged by frobware> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1412621>16:31
natefinchman, prices for flights went up like 25% from yesterday afternoon :/16:36
perrito666natefinch: nearing christmas16:36
natefinchperrito666: I think we just crossed the "30 days from flight day" mark16:37
perrito666ah, ok, that makes me panic16:37
perrito666my before trip todo is especially long16:38
ericsnownatefinch: I found a couple typos in your build constraints patch (left a review)17:03
natefinchericsnow: thanks!17:06
ericsnownatefinch: np17:06
ericsnownatefinch: found it while rebasing my patches on yours :)17:06
natefinchericsnow: "how did this compile before?"  exactly my question17:08
ericsnownatefinch: I don't think it did :/17:08
natefinchericsnow: there were two compiler errors before I even changed anything.  My guess is that they were because of a merge/rebase17:10
ericsnownatefinch: yep17:10
natefinchericsnow: fixed those spots btw17:10
ericsnownatefinch: thanks17:11
natefinchericsnow: the only thing left is some provisioner tests that are failing17:11
ericsnownatefinch: weird17:12
ericsnownatefinch: look for instance.LXD in those tests17:12
mupBug #1513552 opened: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>17:12
natefinchthe letters lxd don't even exist in files in this directory :/17:12
natefinchericsnow: it's suspicious because it's in container_initialisation_test.go .. which implies it is our fault (maybe we're being punished for the UK spelling in the filename)17:15
perrito666bbll17:39
* perrito666 goes to curse ha screaming and will be back later17:40
cheryljnatefinch: can you take a look at bug 1513552?  It looks like some of your recent commits are causing widespread CI failures17:41
mupBug #1513552: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>17:41
ericsnowalexisb: sorry, zillow killed my browser17:48
alexisblol17:48
alexisb:)17:48
cheryljcould anyone tell me when we would see SECURE_STATESERVER_CONNECTION: "false" in agent.conf rather than "true"?18:11
natefinchcherylj: ok looking18:38
natefinchhmm... many of those failures mention deployer... seems suspicious19:12
natefinchsinzui: I wonder if deployer is doing something different than juju-core... because I can deploy charms just fine with juju deploy19:13
natefinchsinzui: re: that "cannot assign unit" problem19:13
natefinchkatco: FYI, spending some non-trivial time looking into bug 151355219:14
mupBug #1513552: master cannot deploy charms <blocker> <charm> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1513552>19:14
sinzuinatefinch: the deployer jobs that stand up openstack report the same problem19:15
katconatefinch: yep thx for the heads-up19:15
sinzuinatefinch: eg: http://reports.vapour.ws/releases/3270/job/OS-deployer/attempt/50219:15
sinzuinatefinch: the quickstart error is also the same (though reported differently) http://reports.vapour.ws/releases/3270/job/aws-quickstart-bundle/attempt/128219:16
sinzui^ landscape bundle19:17
natefinchsinzui: do we have tests that just use juju deploy?19:17
natefinch(not being snarky, real question... want to make sure I'm right in thinking it's deployer etc)19:18
sinzuinatefinch: to deploy a bundle? not yet19:18
natefinchsinzui: no, to deploy a charm19:19
sinzuinatefinch: many tens of tests deploy a charm19:20
natefinchsinzui: and those all pass?  So it's just deployer and quickstart?19:20
sinzuinatefinch: no. http://reports.vapour.ws/releases/3270 clearly shows "deploy", "deployer", and "quickstart" are broken on all substrates, all series, all archs19:21
sinzuinatefinch: this shows every test that failed in the two revs that were tested http://reports.vapour.ws/releases/issue/563b902c749a563ed218b6cf19:21
sinzuinatefinch: I think I see the issue :)19:26
sinzuinatefinch: nm, I am looking at stale data19:26
cheryljaww, I had gotten my hopes up19:27
natefinchman I wish we didn't redact the API parameters19:48
ericsnowfwereade: isn't it gloomy in the catacombs?19:55
ericsnowfwereade: I suppose they have a certain charm :)19:55
fwereadeericsnow, terribly glum, yeah :)19:55
thumpermenn0: test failure: [LOG] 0:00.206 ERROR juju.apiserver debug-log handler error: tailer stopped: tailable cursor requested on non capped collection20:00
wwitzel3katco ericsnow natefinch ping20:00
katcowwitzel3: brt20:02
natefinchsinzui: I bet this is a race condition where the script is calling expose or something before the unit has a machine assigned, and that's causing us to go down a codepath we didn't previously go down, because unit creation and assignment happened in lock step.20:10
natefinchsinzui: because a simple deploy definitely still works20:10
sinzuinatefinch: maybe. the deploy_stack.py copies a lot of deployment where expose is called immediately after deploy and add-relation. I think deployer though deferred that operation until relations were added, and relations were deferred until all units were up20:13
sinzuinatefinch: http://reports.vapour.ws/releases/3270/job/aws-deploy-trusty-amd64/attempt/2476 does show that deploy, add-relation, and export were all called within seconds of each other20:15
katcowwitzel3: ericsnow natefinch sorry, almost there20:17
natefinchsinzui: definitely looks like expose and/or add-relation is the problem, I can repro if I do juju deploy wordpress && juju add-relation wordpress mysql && juju expose wordpress20:23
thumperfwereade: ping20:23
fwereadethumper, pong20:23
thumperfwereade: can we chat in about 10min?20:23
fwereadethumper, sure20:24
fwereadenatefinch, I think you're right20:24
fwereadenatefinch, is that the firewaller falling over?20:25
natefinchfwereade: sorry, in a meeting20:30
fwereadenatefinch, np20:32
thumperfwereade: https://plus.google.com/hangouts/_/canonical.com/chat?authuser=120:37
cheryljwallyworld: you around?21:26
wallyworldcherylj: sorta21:27
cheryljwallyworld: could you ping me when you get a few minutes?  I need some help with a bug21:27
wallyworldsure, give me 10 mins21:27
cheryljsounds good, thanks21:27
=== sinzui_ is now known as sinzui
wallyworldcherylj: which bug?21:37
cheryljhttps://bugs.launchpad.net/juju-core/+bug/151278221:37
mupBug #1512782: wget cert issues causing failure to create containers <cloud-installer> <juju-core:Triaged> <https://launchpad.net/bugs/1512782>21:37
cheryljThey're doing some weird stuff with their lxcbr0 and nested kvm / lxc21:38
wallyworldcherylj: give me a minute to read bug info; that wget/lxc stuff hasn't changed in ages so it's likely to be specific to their setup21:39
cheryljThe last update is probably the most useful21:40
cheryljwallyworld: this line in the machine-0.log also looks really weird to me:  DEBUG juju.worker.certupdater certupdater.go:191 new addresses [localhost juju-apiserver juju-mongodb anything 10.0.7.1]21:41
wallyworldcherylj: that's normal - we use those hard coded names plus machine IP addresses in the CA cert SAN21:42
cheryljok21:42
wallyworldcherylj: what may be the issue though is that we add the IP address of juju managed machines to the cert SAN - that IP address has to be the source of where the wget comes from21:43
wallyworldif they are using some weird network setup the cert addresses won't match21:43
cheryljwallyworld: I think that's the case21:43
wallyworldhmmm21:44
wallyworldoff hand i'm not sure how to solve that21:44
cheryljthe logs for the lxc-create show they're getting the image through 10.0.3.45, but the only ip in the certupdater is the 10.0.7.121:44
wallyworldyup21:44
menn0thumper: wrt to that the juju-run/juju-dumplogs symlink issue, I've had a better idea21:44
wallyworldthe cert updater works off listening to the addresses juju records for the machines21:44
thumpermenn0: yeah?21:44
menn0thumper: make the symlinks relative to the cmd.Context's Dir21:45
thumperwill that always work?21:45
wallyworldcherylj: so we need to look at the address updater worker to see how to make it recognise and report the correct addresses21:45
menn0thumper: in production this will be "/", but in tests (when using testing.RunCommand) it'll be a temp directory21:45
wallyworldcherylj: off hand, i'd need to look into how all that works21:45
thumpermenn0: have you checked that production it is "/" ?21:46
menn0thumper: yep, I just checked that21:46
thumpermenn0: also consider the current local provider21:46
menn0thumper: what's different with the local provider?21:46
thumpermenn0: it is just "special"21:46
thumperdatadir is ~/.juju/<env-name>/21:46
wallyworldcherylj: actually, looking at the logs, that 10.0.3.45 address is the state server address21:47
menn0thumper: yep, that's not related to this. the source of the symlink doesn't change. it's just where the symlink gets put.21:47
wallyworldi'm talking about the machine on which the container is being created21:47
thumpermenn0: k21:47
cheryljwallyworld: yeah...21:47
menn0thumper: i'll make the change now as a separate PR so you can have a look21:47
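
A small sketch of menn0's idea from 21:45, in Python rather than Go: the symlink location is resolved relative to a base directory, which would be "/" in production and a temp directory in tests. `ensure_symlink` is a hypothetical helper, not the actual Juju change, and the target path used in the usage note is made up for illustration.

```python
import os


def ensure_symlink(context_dir: str, link_path: str, target: str) -> str:
    """Create (or replace) a symlink at link_path, resolved under context_dir.

    Only the place where the link is created moves with context_dir;
    the target it points at is left unchanged.
    """
    link = os.path.join(context_dir, link_path.lstrip("/"))
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # Replace a stale link so repeated runs are idempotent.
    if os.path.islink(link):
        os.remove(link)
    os.symlink(target, link)
    return link
```

A test can pass a temporary directory as `context_dir` and then assert on `os.readlink`, for example `ensure_symlink(tmp, "/usr/local/bin/juju-dumplogs", "/path/to/jujud")`, without touching the real `/usr/local/bin`.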
wallyworldcherylj: so we need to ensure the CA cert records in its SAN list the IP addresses of all worker machines, the addresses from which wget requests originate21:48
wallyworldthat means we need juju to record the correct thing in the machine address field21:49
wallyworldso we need the machine agent (i think it's the agent) to report the correct addresses21:50
wallyworldnot sure how smart this all is with different network setups21:50
cheryljwallyworld: I see that it picks up 10.0.3.45 as a machine address.  Not sure why the certupdater didn't include that one21:51
wallyworldcherylj: that's the state server address21:51
wallyworldwe need to record the machine address from which the wget originates21:51
wallyworldcherylj: so we need each machine to correctly report to juju its address21:52
wallyworldor addresses21:52
wallyworldthose are stored in the machine addresses field in state21:52
wallyworldthat's what the cert updater listens to21:53
wallyworldi think it's the machine agent on each machine which reports those addresses21:53
cheryljwallyworld:  you mean like this?  INFO juju.worker.machiner machiner.go:100 setting addresses for machine-0 to ["local-machine:127.0.0.1" "local-cloud:192.168.122.1" "local-cloud:10.0.3.45" "local-machine:::1"]21:53
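When checking logs like the line above, it can help to split the address list into scope and value pairs and test for a given IP. The sketch below assumes the list has exactly the shape shown in that log line (space-separated, quoted "scope:address" entries inside square brackets); the type and function names are hypothetical and not part of juju.

```go
package main

import (
	"fmt"
	"strings"
)

// scopedAddr is one entry from the machiner log, e.g. "local-cloud:10.0.3.45".
type scopedAddr struct {
	Scope string
	Value string
}

// parseAddresses parses a list formatted like the machiner log output.
// It splits each entry on the first colon only, so IPv6 values such as
// "::1" are kept intact.
func parseAddresses(list string) []scopedAddr {
	list = strings.Trim(strings.TrimSpace(list), "[]")
	var out []scopedAddr
	for _, field := range strings.Fields(list) {
		field = strings.Trim(field, `"`)
		parts := strings.SplitN(field, ":", 2)
		if len(parts) != 2 {
			continue // skip anything that is not scope:address
		}
		out = append(out, scopedAddr{Scope: parts[0], Value: parts[1]})
	}
	return out
}

// hasAddress reports whether ip appears among the parsed addresses.
func hasAddress(addrs []scopedAddr, ip string) bool {
	for _, a := range addrs {
		if a.Value == ip {
			return true
		}
	}
	return false
}

func main() {
	list := `["local-machine:127.0.0.1" "local-cloud:192.168.122.1" "local-cloud:10.0.3.45" "local-machine:::1"]`
	addrs := parseAddresses(list)
	for _, a := range addrs {
		fmt.Printf("%-14s %s\n", a.Scope, a.Value)
	}
	fmt.Println("has 10.0.3.45:", hasAddress(addrs, "10.0.3.45"))
}
```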
wallyworldso my guess from memory is that we need to look at how the machine agent queries its host addresses21:53
wallyworldcherylj: are they running the lxc containers on machine 0?21:54
wallyworldi was assuming there'd be a machine 1 or 2 or whatever21:55
cheryljwallyworld: no, I think they're doing it on machine-121:56
wallyworldi think you said they were nesting lxc inside kvm?21:56
wallyworldcherylj: right ok. so whatever the source IP address of the wget request is has to be recorded in the SAN list21:56
wallyworldcherylj: that source address is the machine hosting the lxc containers21:57
wallyworldwhich is typically machine 1's address. or it needs to be the kvm address if lxc inside kvm21:57
wallyworldcherylj: so there needs to be a line in the logs like the above but which says setting addresses for machine-1 ....21:58
cheryljwallyworld: okay, I see what you mean now21:58
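The mismatch discussed above can be reproduced in isolation. The sketch below builds a throwaway self-signed certificate whose SAN list holds only 10.0.7.1, then asks Go's crypto/x509 whether it is valid for each of the two addresses mentioned in the logs. It is a standalone illustration under those assumptions, not juju's certupdater code; certWithIPs and covers are hypothetical helpers.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// certWithIPs returns a short-lived self-signed certificate whose
// SAN list contains exactly the given IP addresses.
func certWithIPs(ips ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-server"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
	}
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP %q", s)
		}
		tmpl.IPAddresses = append(tmpl.IPAddresses, ip)
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// covers reports whether the certificate's SAN list matches ip.
// For IP strings, VerifyHostname compares against cert.IPAddresses.
func covers(cert *x509.Certificate, ip string) bool {
	return cert.VerifyHostname(ip) == nil
}

func main() {
	cert, err := certWithIPs("10.0.7.1")
	if err != nil {
		fmt.Println("could not build certificate:", err)
		return
	}
	for _, ip := range []string{"10.0.7.1", "10.0.3.45"} {
		fmt.Printf("%s covered: %v\n", ip, covers(cert, ip))
	}
}
```

A TLS client performs the same comparison against the address it dials, which is why an address missing from the SAN list causes the image download to fail certificate verification.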
wallyworldcherylj: i have to relocate, will be afk for 20 minutes21:59
cheryljwallyworld: sure, np21:59
ericsnownatefinch: check out http://reviews.vapour.ws/r/3080/22:24
menn0thumper: this approach is working out pretty nicely22:26
thumpermenn0: awesome22:26
thumpermenn0: did the no tail bit land in master?22:26
thumpermenn0: it may help me fix this problem :)22:26
menn0thumper: yep, that's landed22:27
menn0man there's lots of tests that use cmd.DefaultContext that should probably be using cmdtesting.Context22:29
* menn0 ignores for now22:29
perrito666and then you froze22:32
perrito666wwitzel3:22:32
perrito666I meant wallyworld22:32
axw_perrito666: what do you need a visa for?22:58
perrito666axw_: enter australia22:59
axw_perrito666: just for meetings? :/22:59
perrito666axw_: I got an e-visa23:02
wallyworldperrito666: we're here now23:16
davechen1yhttps://github.com/docker/docker/pull/1770023:27
davechen1y^ docker have removed support for lxc23:27
mupBug #1513659 opened: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>23:31
mupBug #1513659 changed: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>23:37
mupBug #1513659 opened: 1.24.6 fails to bootstrap with "ERROR juju.cmd supercommand.go:430 upgrade in progress - Juju functionality is limited" <cloud-installer> <juju-core:New> <https://launchpad.net/bugs/1513659>23:40
davechen1ymwhudson: ding ding ding.23:44
* mwhudson vibrates tunefully23:45
mwhudsondavechen1y: eh?23:45
davechen1yyour ppc64 observation23:47
ericsnowkatco: am I okay landing my fix of natefinch's patch?  http://reviews.vapour.ws/r/3080/23:48
mwhudsondavechen1y: ah heh23:48
ericsnowkatco: I'd like to land those LXD patches today23:49
mwhudsondavechen1y: can you re-run test_shared on arm64 pls?23:49
mwhudsondavechen1y: that was pretty mystical debugging from russ23:49
davechen1yhe is the master of psychic debugging23:50
davechen1ymwhudson: arm64 -- will do23:50
mwhudsonta23:51
