#juju-dev 2012-04-16
<fwereade_> mornings
<TheMue> morning wrtp
<wrtp> TheMue: hiya
<TheMue> wrtp: would you like to review a checkin?
<TheMue> wrtp: it's the https://codereview.appspot.com/6011047/
<wrtp> TheMue: ok, will have a look
<TheMue> wrtp: great, thx
<wrtp> TheMue: review delivered
<TheMue> wrtp: ah, fine, thx
<niemeyer> Morning everybody
<TheMue> niemeyer: moin
<niemeyer> TheMue: Heya
<wrtp> niemeyer: yo!
<wrtp> niemeyer, TheMue, fwereade__: are we having a meeting now?
<TheMue> wrtp: i'm prepared
<niemeyer> wrtp: Heya
<niemeyer> wrtp, TheMue, fwereade__: Let's do it then
<fwereade__> cool
<wrtp> niemeyer: you doing the invites?
<niemeyer> wrtp: Yep
<fwereade__> niemeyer, it was there for a moment; was empty on join, and then disappeared
<niemeyer> fwereade__: I'm online with Frank already
<wrtp> niemeyer: i haven't seen anything yet
<niemeyer> fwereade__: Try looking at my profile
<niemeyer> wrtp: Ditto
<fwereade__> niemeyer, still nothing :/
<wrtp> niemeyer: i am
<wrtp> niemeyer: canonical profile or your normal one?
<niemeyer> wrtp: My normal one
<niemeyer> wrtp, fwereade__: Sent the invite once more
<wrtp> niemeyer: assuming this is the right one (https://plus.google.com/107994348420168435683/posts) i still see nothing
<wrtp> niemeyer: ah, found it by going direct to Hangouts
<TheMue> niemeyer: shall i start here https://wiki.canonical.com/UbuntuEngineering/Server/Squads/Juju ? it's a very empty page so far
<niemeyer> TheMue: No, start here please: juju.ubuntu.com/Roadmap
<TheMue> niemeyer: ok, thx, will do so
<niemeyer> TheMue: Or here, even: juju.ubuntu.com/GoPort
<TheMue> niemeyer: yip, and i'll see how it can be kept easily maintainable
<TheMue> niemeyer: pls don't get me wrong, i don't want a huge doc, i only want to make sure that all of us can quickly get an overview of what's already done, what we're doing and what's left until 12.10
<niemeyer> TheMue: Sure, please make sure it stays up-to-date, though
<niemeyer> TheMue: I've seen many such documents become a snapshot of their creation time
<TheMue> niemeyer: oh yes, i know what you mean and i share your headaches. but i've also often seen how important such a simple view could be (a kind of kanban view)
<niemeyer> TheMue: Yep, as long as you have a plan on how to maintain that up-to-date, sounds great
<TheMue> niemeyer: i would do it as a part of our weeklies
<TheMue> niemeyer: as you said, with those three (scrum) statements: what's done, what's planned, what's blocking
<TheMue> niemeyer: and that could be immediately aligned with the little doc
<TheMue> niemeyer: thankfully the risk of new feature requests is relatively low (porting the python code is pretty clear)
<TheMue> niemeyer: i've seen enough projects starve because they pick up more things to do over time than they can handle
<TheMue> niemeyer: btw, the branch i told you about for review is https://codereview.appspot.com/6011047/. roger already did a review, so with your comments it should be done fast (at least i hope so *smile*)
<niemeyer> TheMue: Sounds good, will have a look, possibly after lunch since Ale has a sharp timing today
<TheMue> niemeyer: enjoy your meal
<niemeyer> TheMue: Thanks
<TheMue> wrtp: thx for review again, i think i'll give a changed update() a try and check how it'll look
<wrtp> TheMue: np
<fwereade__> need to be off sharpish today; so, good evenings all!
<wrtp> fwereade__: have a good evening, cheerio
<robbiew> wrtp: ping...just noticed you submitted a nat'l holiday for May 5th (monday), however we'll be at UDS that week...were you planning on swapping that later?
<wrtp> robbiew: yes, if that's ok - i did it before i knew when uds was, i think.
<wrtp> robbiew: i don't quite know what the policy is w.r.t. national holidays during company events
<robbiew> wrtp: cool..yeah, just swap it for sometime before or later
<niemeyer> fwereade__: Cheers man
<TheMue> robbiew: oh, i just forgot that national holidays have to be submitted. we've got one on May 1st
<TheMue> fwereade__: have a nice evening
<robbiew> TheMue: it's no biggie
<TheMue> robbiew: but it has been a good reminder. ;)
<robbiew> :)
<TheMue> robbiew: i did a little press work again: http://www.pressebox.de/pressemeldungen/canonical/boxid/500212 (this time for AWSOME)
<robbiew> nice
<niemeyer> TheMue: As a hint, there are 4 different branches on that one code review
<niemeyer> TheMue: Changes to the watcher package, and three different watcher types
<niemeyer> TheMue: If you want to collaborate with the review queue issue, a great way is to avoid aggregating such changes in a single large change set
<niemeyer> TheMue: Helps you, helps the reviewers
<TheMue> niemeyer: yip, started smaller but it grew too fast. will control it better next time
<wrtp> TheMue: you should probably know that "yip" in english usually means the bark of a small dog :-)
<niemeyer> TheMue: In fact, can you please break it down?
<niemeyer> TheMue: I won't be able to provide you a full review in time for you to do anything with it today.. breaking this into smaller chunks will be time well spent today
<wrtp> niemeyer: one review delivered
<niemeyer> wrtp: Right, he can easily address those points while breaking it down into smaller chunks
<niemeyer> wrtp: TheMue was very concerned today with the review queue size, and I agree.. let's start some action towards making it more manageable
<wrtp> niemeyer: i was talking about a review of your code :-)
<niemeyer> wrtp: Oh, woohay!
<niemeyer> wrtp: Thanks ;-)
<niemeyer> Oh, crap, meeting time
<wrtp> niemeyer: FWIW i thought TheMue's review wasn't too bad.
<niemeyer> robbiew: are you joining too?
<wrtp> niemeyer: it consisted of related changes, easy enough to understand together
<robbiew> niemeyer: it's cancelled
 * wrtp has to go. see y'all tomorrow
<niemeyer> robbiew: Oh?
<robbiew> apparently no one thought it was important to tell us
<robbiew> lol
<robbiew> most folks are at ODS
<niemeyer> Heh :(
<TheMue> niemeyer: will see how i can split it. but for today i'm off.
<wrtp> niemeyer: rather than caching the key->keyid map, might it be better to have a Counter method on store that returned an object that could be used to inc and dec a particular counter?
<TheMue> wrtp: thanks for the yip-hint
<niemeyer> TheMue: If you're concerned with the review queue size, smaller branches are a *huge* help..
<wrtp> TheMue: np. "yup" is more conventional BTW.
<TheMue> wrtp: hehe, ok
<niemeyer> wrtp: I don't understand what you mean
<niemeyer> wrtp: How's one thing related to the other?
<wrtp> niemeyer: take it back, i'm not sure it helps
<wrtp> niemeyer: i thought it would be easy to push the cache back into the store server. but it's dynamically creating keys all the time, so it won't help.
<wrtp> anyway, gotta go
<niemeyer> wrtp: Cheers
<niemeyer> It's not clear to me why an ExistenceWatcher is watching the content too..
<niemeyer> Also, what happens if the content changes?
<niemeyer> hazmat: ping
<hazmat> niemeyer, pong
<niemeyer> hazmat: Yo
<niemeyer> hazmat: Was just trying to remember what was that non-expected case of watchers in zookeeper
<hazmat> niemeyer, how so.. exist watches track all events
<niemeyer> hazmat: is it existsw that watches for content changes too?
<hazmat> yes
<niemeyer> hazmat: Aha, ok, cheers
<hazmat> the only significant difference between them is that you can set an exists watch on a non-existent node
<hazmat> them = get vs exists
<niemeyer> hazmat: Ah, right
<niemeyer> hazmat: What about the opposite.. do you recall if a getw gets a signal on removal?
<niemeyer> TheMue: Btw, I think ExistenceWatcher should be merged on ContentWatcher..
<hazmat> niemeyer, it does
<hazmat> switching sessions/rooms bbiam
<niemeyer> mthaddon: ping
<m_3> hey gang... FYI
<m_3> 12:59 <SpamapS> Ok, folks PRECISE is the new dev focus for charms.
<m_3> 13:00 <SpamapS> charm promulgate will now promulgate to precise
<m_3> 13:00 <SpamapS> lp:charms/* is now precise
<m_3> 13:00 <SpamapS> all branches have been copied forward
<m_3> 13:00 <SpamapS> oneiric, may ye rest in stability.. aaaahhhmmeeen
#juju-dev 2012-04-17
<TheMue> morning all
<fwereade_> heya TheMue
<wrtp> fwereade_, TheMue: yo!
<TheMue> fwereade_: moin
<TheMue> wrtp: moin
<TheMue> lunchtime
<niemeyer> Morning all
<TheMue> niemeyer: moin
<niemeyer> TheMue: Heya
<wrtp> niemeyer: hiya
<hazmat> g'morning
<fwereade_> wtf, it's hailing
<niemeyer> wrtp: Any chances of a few additional reviews? You've reviewed the tips but the pre-reqs are blocking them
<fwereade_> this is not why I moved to the med
<wrtp> niemeyer: i thought i'd reviewed 'em all.
<niemeyer> wrtp: Nope, there are still pending ones in the queue
<wrtp> niemeyer: one mo, i'll try and remember where the queue page is again... :-)
<niemeyer> wrtp: Meanwhile, I'll code a quick one
<niemeyer> wrtp: To disable stats so that performance checks do not affect them
<wrtp> sometimes i want a forward link as well as a backward link, so i can easily find the next review to do...
<wrtp> niemeyer: done, i think. let me know if i've missed any
<niemeyer> wrtp: You have, the one I'm pushing *right now*! ;-)
<wrtp> :-)
<niemeyer> wrtp: Should be a breeze, though
<niemeyer> wrtp: https://codereview.appspot.com/6061045
<wrtp> niemeyer: wouldn't it be better to have stats enabled by default?
<wrtp> niemeyer: i.e. disablestats=1 rather than stats!=0
<niemeyer> wrtp: It is enabled by default.. isn't it?
<wrtp> niemeyer: it needs the "stats" field in the form to be set, no?
<niemeyer> wrtp: Nope
<niemeyer> wrtp: It needs it to not be set to zero
<niemeyer> wrtp: Note that tests continue passing
<wrtp> doh
<wrtp> niemeyer: LGTM
<niemeyer> wrtp: Cheers!
<wrtp> niemeyer: i still don't see why you can't just use a blank user name in the key BTW
<niemeyer> wrtp: A blank string is a valid token
<wrtp> rather than having those if statements strewn everywhere
<wrtp> niemeyer: why's that a problem?
<niemeyer> wrtp: It's a problem because there's no point in having to match it
<niemeyer> wrtp: The data should be foo:bar, not foo:bar::
<niemeyer> wrtp: Just like the URL is cs:oneiric/hadoop, not cs:~/oneiric/hadoop
<andrewsmedina> wrtp: hi
<wrtp> andrewsmedina: hiya
<andrewsmedina> wrtp: everything ok?
<wrtp> andrewsmedina: good thanks
<andrewsmedina> wrtp: I ran the ec2 (go port) tests here on my machine, but they are failing
<andrewsmedina> wrtp: Do I need to do something to run these tests?
<wrtp> andrewsmedina: please run with -gocheck.vv and send me the output
<wrtp> niemeyer: is "cs:~/oneiric/hadoop" valid?
<niemeyer> wrtp: No, it's not.. that was the point
<andrewsmedina> wrtp: http://paste.ubuntu.com/933966/
<andrewsmedina> wrtp: Does it need the AWS envs set?
<wrtp> andrewsmedina: you need to set up an amazon AWS account and set your $AWS_ variables appropriately
<wrtp> niemeyer: how visible are the keys to users?
<wrtp> niemeyer: i think that having all charm keys of the form series:name:user: even when user is blank should be ok, if it's not generally visible.
<wrtp> niemeyer: it would remove quite a few lines of code, at any rate :-)
<niemeyer> wrtp: It doesn't matter.. let's please not be lazy.
<niemeyer> wrtp: The data is oneiric:hadoop.. not oneiric:hadoop::
<wrtp> niemeyer: i don't think it's lazy, i think it's appropriate canonicalisation
<niemeyer> wrtp: You're trying to save a couple of lines of code.. I can put that in a function if it bothers you so much, but it shouldn't affect the desired data format that we want
<niemeyer> wrtp: The proper data format is oneiric:hadoop, not oneiric:hadoop:
<andrewsmedina> wrtp: is this for local tests?
<fwereade_> niemeyer, re https://code.launchpad.net/%7Eniemeyer/juju/go-store-blitz-key/+merge/102230 ...I don't quite get what performance is being tested there, it doesn't seem like it'd hit anything very significant?
<niemeyer> fwereade_: wrtp misunderstood the point of that key, and that's of course because I've been extremely vague about it
<niemeyer> fwereade_: This is just a validation key to tell the service we own this site
<fwereade_> niemeyer, ha! ok, that suddenly makes lots more sense
<wrtp> andrewsmedina: ah, they shouldn't be... that might be a bug. in the meantime, try setting the AWS_ variables to something random and see if it works
<wrtp> niemeyer: ah! i thought it was used directly as part of a performance test...
<niemeyer> wrtp: No, the goal is to test the real endpoints
<wrtp> niemeyer: cool, i understand now. i wondered about the strange form of the URL.
<niemeyer> wrtp: That's why we need stats=0 as well
<wrtp> andrewsmedina: yes, it's a bug. will fix. but for the time being, just do export AWS_ACCESS_KEY_ID=x; export AWS_SECRET_ACCESS_KEY=x; before running the tests
<fwereade_> niemeyer, hmm, it seems like stats=0 is a choice between skewing the stats and skewing the performance data
<fwereade_> niemeyer, and while obviously we don't want to skew the stats, it doesn't feel quite right
<niemeyer> fwereade_: I agree, obviously we don't want to skew the stats :-)
<niemeyer> fwereade_: Hmm.. we can put the IncCounter run in the background, actually
<niemeyer> fwereade_: Which means it won't affect the response timing either way
<andrewsmedina> wrtp: ok!
<fwereade_> niemeyer, well, it's still load on the server, isn't it, even if it doesn't necessarily directly affect the response time of that query? I'd be happiest if we could switch it to perform a real write to a different place
<fwereade_> niemeyer, but that'll potentially have different characteristics too, so maybe I'm heading up the garden path here
<wrtp> niemeyer: how about defining a function like this and using it throughout: http://paste.ubuntu.com/933992/
<niemeyer> <niemeyer> wrtp: You're trying to save a couple of lines of code.. I can put that in a function if it bothers you so much, (...)
<niemeyer> wrtp: Yes :)
<wrtp> niemeyer: it would make me happier, yeah. that "if curl.User" test is in quite a few places.
<niemeyer> wrtp: Sure, will do it
<niemeyer> fwereade_: That feels a bit over the top for the level of traffic we currently have :-)
<fwereade_> niemeyer, yeah, garden path :)
<niemeyer> fwereade_: I'm most interested in knowing which ballpark we're in, and measuring that every once in a while to see how it's going
<niemeyer> fwereade_: It's far from being any kind of issue that would require such care
<fwereade_> niemeyer, indeed
<niemeyer> fwereade_: I appreciate your input though, for real
<wrtp> niemeyer: i think i agree with fwereade_ that we should in general test the usual path (including stats gathering). but it's useful to be able to turn off stats too, so that we can see how much overhead they impose.
<niemeyer> fwereade_: I will move these calls to the background, and put that with the change wrtp just requested
<fwereade_> niemeyer, perfect, thanks
<wrtp> niemeyer: sounds good
<niemeyer> fwereade_, wrtp: https://codereview.appspot.com/6063043
<wrtp> niemeyer: chunk mismatch :-(
<wrtp> niemeyer: oh, it's working now
<wrtp> oh no it's not
<fwereade_> niemeyer, LGTM
<niemeyer> What the heck..
<fwereade_> (I see no mismatches...)
<niemeyer> Well.. wrtp: https://codereview.appspot.com/6063043/patch/1/3
<niemeyer> fwereade_: You're probably not looking at the side-by-side
<wrtp> niemeyer: yeah, that works fine for me
<fwereade_> niemeyer, that is true
<niemeyer> Hmm.. I'll have to add a trick to the test to stabilize it
<wrtp> niemeyer: i'd use the same function in the test.
<niemeyer> wrtp: I prefer to see it in the test
<niemeyer> wrtp: Most of them are constants
<wrtp> niemeyer: ok
<wrtp> niemeyer: i'm slightly concerned about the number of background IncCounters there may end up being. depends on the traffic i guess.
<niemeyer> wrtp: They're necessarily proportional to the number of background requests being handled
<niemeyer> wrtp: If incrementing a counter is more expensive than handling the request, something is wrong
<wrtp> niemeyer: ok, if you're sure about that.
<niemeyer> wrtp: Well, it's more that I can't imagine how either of those statements could possibly be false.. :-)
<wrtp> niemeyer: only if the IncCounters aren't completing in a timely way and something's hammering the server with a cheap request.
<niemeyer> wrtp: That's addressed by the second point
<wrtp> niemeyer: indeed.
<niemeyer> Lunch time.. back in a bit to finish merging/applying reviews
 * niemeyer waves
 * wrtp waves back.
<fwereade_> gn all, happy evenings
<niemeyer> fwereade_: Cheers!
<niemeyer> wrtp: I think I forgot a couple of renames you requested, but still have them in mind. Will submit them at the end of the pipeline.
<wrtp> niemeyer: i was just trying to go back and have a look at the reviews, but my gmail seems to be down.
<wrtp> niemeyer: still down. let me know here of anything you'd like me to respond to...
<niemeyer> wrtp: I think it's all good in general.. I've agreed to most things you've said
<wrtp> niemeyer: could you paste a link to the CL please...
<niemeyer> wrtp: I actually didn't miss the review points I mention above either.. they were in the caching branch, which is still not in
<niemeyer> wrtp: Which one of the 7-8 ones? :-)
<wrtp> niemeyer: the most recent one (it has links to the previous ones)
<niemeyer> wrtp: https://code.launchpad.net/~niemeyer/juju/go-store-stats-bg/+merge/102316
<wrtp> i'm sure i sent a LGTM for that one, oh well
<niemeyer> wrtp: You did on IRC
<wrtp> ah, must've forgotten to do it on codereview
<wrtp> just the cache one to go
<niemeyer> Oh, I think I missed what fwereade_ meant in one of his comments
<niemeyer> TheMue: ping
<TheMue> niemeyer: pong
<niemeyer> TheMue: Heya
<niemeyer> TheMue: Just wanted to see how the branch is coming
<TheMue> niemeyer: hi, today i have a really split-up day
<niemeyer> TheMue:  Did you read my follow up comments yesterday?
<TheMue> niemeyer: yes, and the changed ChangeWatcher is now in testing and will then go in
<niemeyer> TheMue: Super.. did it feel good while doing the changes?
<TheMue> niemeyer: it would have been better if the GetW() watch also accepted ZNONODE and could later be used for newly created nodes
<TheMue> niemeyer: but i think i found a way
<niemeyer> TheMue: Cool
<TheMue> niemeyer: it now also does a re-get if a first get runs into a ZNONODE, does an ExistsW() for the needed watch, and retries when stat is not nil (aka somebody has created the node exactly between the get and the exists)
<niemeyer> TheMue: Cool
<niemeyer> TheMue: Does it do a re-exists too, if the node disappears? :-)
<TheMue> yip
<TheMue> niemeyer: eh, yup ;)
<niemeyer> TheMue: Hehe :)
<niemeyer> TheMue: Super
<TheMue> niemeyer: it's a loop waiting for the correct criteria
<niemeyer> TheMue: Makes sense
<niemeyer> TheMue: It's awesome to have that well abstracted away in that compact form
<TheMue> niemeyer: test looks good
<andrewsmedina> wrtp: I'm thinking of moving cloudinit into the environs package, because it's used by all providers, ec2, local etc
<wrtp> andrewsmedina: is it used for the local provider?
<andrewsmedina> wrtp: some things, like jujuOrigin and a few others
<wrtp> andrewsmedina: i was certainly thinking of moving it into the environs package at some point, but i didn't think it was quite time yet
<andrewsmedina> wrtp: I'm writing the local provider
<wrtp> andrewsmedina: does the local provider actually execute the cloudinit init script?
<wrtp> andrewsmedina: i was under the impression that the local environ would execute the agents directly.
<wrtp> andrewsmedina: (but i'm fuzzy in this area :-])
<wrtp> andrewsmedina: i'm going to have to go very shortly BTW, sorry.
<andrewsmedina> wrtp: ok, can I talk with you about it later?
<wrtp> andrewsmedina: that would be good. any time tomorrow is fine. i'm online from about 0800UTC
 * wrtp always finds crypto auth errors to be a pain to diagnose.
<andrewsmedina> wrtp: ok
<wrtp> niemeyer: as you might guess, i'm actually testing the ssh tunnel stuff. not far now :-)
<wrtp> "sshd: Failed publickey for rog from 127.0.0.1 port 45416 ssh2" ... but why has it failed?
<wrtp> time to stop for the day. see y'all tomorrow.
<wrtp> andrewsmedina: speak tomorrow, i hope
<TheMue> niemeyer: So, changed watcher is in at https://codereview.appspot.com/6059044
<andrewsmedina> wrtp: ok
<andrewsmedina> wrtp: thanks
<niemeyer> TheMue: Within that loop, I believe the eventType is a red herring
<TheMue> niemeyer: could you please explain?
<niemeyer> TheMue: Sure
<niemeyer> TheMue: We used to consider a removal as an error, so we stopped immediately with the event type hint
<niemeyer> TheMue: Now that's not an error anymore
<niemeyer> TheMue: so we don't really care about what is the *last* event that we got.
<niemeyer> TheMue: We care about our ability to get the state *now*
<niemeyer> TheMue: So you can drop the usage of that detail entirely, I believe, and have a cleaner algorithm with the same semantics
<TheMue> niemeyer: it's a change after a hint from rog. he had a problem with my first approach always trying a GetW() first. his idea was to skip it if we enter update() with a last EVENT_DELETED
<TheMue> niemeyer: i've done without it before and it worked fine, so i can easily remove it
<TheMue> niemeyer: i'll do so and check it in again. or shall i wait until your review?
<niemeyer> TheMue: Hmm
<niemeyer> TheMue: So the idea is to optimize it so that we try to get the current state right first..
<niemeyer> TheMue: We can simplify it then..
<niemeyer> TheMue: By moving the "exists bool" into the parameter
<niemeyer> TheMue: This should get rid of most of the extra logic handling the event type
<niemeyer> TheMue: We also don't need a watch variable, btw.. we already have one in scope called nextWatch
<TheMue> niemeyer: ok, think I've got your ideas, will try it
<niemeyer> TheMue: As a trivial note, the comments at lines 126 and 133 are replicating the code below them exactly
<niemeyer> TheMue: That's pretty much it as far as the code is concerned.. it looks nice
<niemeyer> TheMue: Will check the tests once you propose again
<niemeyer> Don't expect anything surprising there..
<TheMue> niemeyer: fine. and I'll just fetch an evening beer and can then continue. *lol*
<niemeyer> TheMue: Sounds like a good plan.. :)
<TheMue> niemeyer: yeah
<niemeyer> and I'm *almost* at the end of the pipeline
<niemeyer> Or the graph, more precisely :)
<TheMue> niemeyer: just put some code at the end of the pipeline *smile*
<niemeyer> mthaddon: ping
<TheMue> so, off for today
<niemeyer> I'm taking a break..
#juju-dev 2012-04-18
<TheMue> morning
<fwereade__> heya TheMue
<TheMue> fwereade__: moin
<wrtp> TheMue, fwereade__: mornin'
<TheMue> wrtp: heya
<wrtp> hmm, why is compiz continually burning 84% cpu. i think i may reboot.
<fwereade__> heya wrtp
<wrtp> hmm, turned out restarting chrome helped a lot.
<TheMue> wrtp: yes, sometimes chrome goes wild
<wrtp> anyone here knowledgable about ssh?
<TheMue> wrtp: only as a user. but i'm listening.
<wrtp> TheMue: here's a transcript of an invocation of ssh: http://paste.ubuntu.com/935399/
<wrtp> TheMue: it exits immediately with no error and i can't work out why
<TheMue> wrtp: *click*
<wrtp> TheMue: the debug lines are output from the -v option (debug3 is more verbose than debug1)
<TheMue> wrtp: ok
<wrtp> TheMue: i *think* all the key_read errors are spurious
<TheMue> wrtp: you checked the file? does it exist, does it have proper line endings?
<wrtp> TheMue: it looks fine.
<wrtp> TheMue: http://paste.ubuntu.com/935404/
<wrtp> (i don't care about publishing this particular private key...)
<TheMue> wrtp: hehe
<TheMue> wrtp: hmm, i would now load it into an editor and use 'save as …' after checking the settings for line endings and encoding
<wrtp> TheMue: my editor is always utf-8 and always displays carriage returns
<wrtp> TheMue: and it looks like the key read succeeded: debug1: read PEM private key done: type RSA
<TheMue> wrtp: one moment, will compare something
<TheMue> wrtp: hm, sh.., looks good so far
<TheMue> wrtp: which line are you referring to?
<wrtp> the one with that value :-)
<wrtp> TheMue: line 123
<TheMue> wrtp: thx
<wrtp> TheMue: ah, i *think* i see it
<wrtp> TheMue: i'm not sure you can run the ssh client without executing a command
<TheMue> wrtp: line 235?
<TheMue> wrtp: here it is closed.
<wrtp> TheMue: line 141
<wrtp> TheMue: it shouldn't be trying to enter an interactive session.
<wrtp> TheMue: i just want a port forwarder
<TheMue> wrtp: but as an interactive session it should stay open, shouldn't it?
<TheMue> wrtp: even if you don't want that here.
<wrtp> TheMue: except that for security reasons i've disabled commands in the sshd
<wrtp> TheMue: by specifying an empty command to be run
<TheMue> wrtp: ah, ok
<wrtp> TheMue: maybe if i make it sleep for a while instead
<wrtp> jeeze
<wrtp> this is such a hack
<wrtp> TheMue: yes, it looks like that's the problem.
<wrtp> TheMue: why should i need to do anything about commands... i just want a port forwarder!
<TheMue> wrtp: I'm looking for that right now. Last time I needed it is long ago. ;)
<wrtp> sigh. the test passes now.
<wrtp> TheMue: thanks for looking at it with me. it helped.
<TheMue> wrtp: np
<wrtp> TheMue: the other really broken thing is that you can't get verbose debug messages out of sshd without making it exit after one connection only...
<TheMue> wrtp: does http://stackoverflow.com/questions/7533661/how-to-log-ssh-debug-info help?
<wrtp> TheMue: no - that's talking about ssh, not sshd
<TheMue> wrtp: argh, yes
<niemeyer> Heya!
<niemeyer> Passported now :)
<niemeyer> Maybe I'll even go to UDS!
<TheMue> niemeyer: moin
<TheMue> niemeyer: you had troubles with your passport?
<niemeyer> TheMue: No, it just expired
<niemeyer> TheMue: I had to get a new one, but it's always a bit concerning given how frequently we move around
<TheMue> niemeyer: ah, ok. mine is still valid, but i needed an ESTA.
<TheMue> niemeyer: how long is it valid in Brazil?
<TheMue> niemeyer: here in Germany it's 10 years.
<niemeyer> TheMue: 5 years only
<niemeyer> TheMue: I was just complaining to the guy about that :)
<TheMue> niemeyer: hehe
<niemeyer> TheMue: At least it's significantly less troublesome to renew it than back in Curitiba
<TheMue> niemeyer: btw, I made some proposals, one only a one-liner and two adding two watchers to unit.
<niemeyer> TheMue: Superb.. I'll put myself to review that right away
<niemeyer> Actually, let me just check if Tom is around
<niemeyer> mthaddon: ping
<andrewsmedina> morning
<andrewsmedina> :D
<TheMue> niemeyer: ok, i started but have to leave early today. my daughter has her birthday and guests will come soon.
<mthaddon> niemeyer: pings with content get bonus points and save time :)
<TheMue> niemeyer: so i'll see your feedback tomorrow.
<niemeyer> mthaddon: we need to deploy an update to the store
<niemeyer> TheMue: Ah, cool, won't rush then.. you'll have the update by tomorrow for sure
<TheMue> niemeyer: great, thank you
<mthaddon> niemeyer: ok, can you send in an RT with the details and I can follow up?
<niemeyer> mthaddon: Sure
<mthaddon> niemeyer: cool, thx - if you can let me know the number once you have it I can rush it through if it's a priority
<niemeyer> mthaddon: and the number is...
<niemeyer> mthaddon: #52281
<mthaddon> cool, thx
<mthaddon> niemeyer: I was kind of hoping you could include the details of what revnos you would expect to see for each of the parts of the code in the RT so I could cross check that I'm rolling out what you want me to
<niemeyer> mthaddon: Ok, sorry
<niemeyer> mthaddon: It's tip, but I can point you to what tip means for sure
<mthaddon> niemeyer: https://pastebin.canonical.com/64480/ is that what you're looking for?
<niemeyer> mthaddon: yes, that looks right
<mthaddon> cool, will go ahead with that
<niemeyer> mthaddon: I'll be more careful next time
<mthaddon> np, first time we've done this, so just getting the process sorted
<niemeyer> mthaddon: and this is the first time we experience the "schema-upgrades-less" deployment
<mthaddon> indeed!
<niemeyer> mthaddon: I'll get lunch.. back in a bit in case you need me
<mthaddon> k, thx
<mthaddon> niemeyer: let me know when you're back - doesn't seem to be starting with the new code, and nothing obvious in the logs
<mthaddon> niemeyer: ah, I think it's because we've switched to usage: charmd <config path> - didn't realise that was a backwards incompatible change
<mthaddon> niemeyer: have reverted to previous code for now, will make sure to get that updated (and have a few questions for you related to that)
<niemeyer> mthaddon: Hmm.. that was done a while back, by your request
<niemeyer> mthaddon: I thought it was even live already
<mthaddon> niemeyer: it was, but I hadn't yet implemented the changes because I had some questions about the set up
<niemeyer> mthaddon: No problem
<niemeyer> mthaddon: How can I help?
<mthaddon> niemeyer: so I'd like the config that we use to live outside the code tree so that we can include the password in it for production without having that in the branch - there are a few ways we can do this... either just a completely separate config file, or some kind of inheritance where we just include the values that differ from dev in that config file - what do you think?
<niemeyer> mthaddon: I'd just have a file completely outside the tree
<mthaddon> niemeyer: ok cool - and it looks like charmd accepts the full path to the config as an option?
<niemeyer> mthaddon: Yeah
<mthaddon> great, thx
<niemeyer> mthaddon: Both, actually
<niemeyer> mthaddon: You can use the same config for both
<niemeyer> mthaddon: Despite the fact that one of them will ignore one of the options
<mthaddon> niemeyer: cool cool
<mthaddon> niemeyer: and what's the format of including a username and password for the mongo-url? just $host:$port $username:$password ?
<niemeyer> mthaddon: You can use something like this: mongodb://$user:$password@$host/juju
<niemeyer> mthaddon: you'll need a port there as well, if it's non-default
<niemeyer> mthaddon: The "juju" at the end is the database we use
<mthaddon> the example has "mongo-url: localhost:60017"
<niemeyer> mthaddon: That works too :)
<mthaddon> ah okay, so it'd be "mongo-url: mongodb://$user:$password@$host:$port/juju"?
<niemeyer> mthaddon: Right
<mthaddon> k, thx
<niemeyer> mthaddon: If you're curious, http://godoc.labix.org/mgo#Dial
 * mthaddon nods
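Pulling this exchange together, the production config file living outside the tree might look roughly like the fragment below. Only the mongo-url key, the mongodb:// URL shape, and the /juju database name are confirmed above; the credentials, host, and port are placeholders, and any other keys charmd accepts are not shown.

```yaml
# Hypothetical charmd config sketch, kept outside the code tree so the
# production password never lands in the branch.
mongo-url: mongodb://$user:$password@$host:$port/juju
```

As noted in the discussion, the port (and the user:password@ part) are only needed when they differ from the defaults.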
<mthaddon> niemeyer: it'll likely be tomorrow before I can get these changes in place and do the deploy - is that okay?
<niemeyer> mthaddon: Yeah, should be fine
<mthaddon> thx, deployments should be much quicker normally
<wrtp> niemeyer, fwereade__: finally, working ssh forwarder tests. https://codereview.appspot.com/5970053/
<niemeyer> wrtp: Super!
<niemeyer> wrtp: We need to get our first commands working soon
<wrtp> niemeyer: yeah, definitely.
<niemeyer> wrtp: Will be a lot more exciting when we can start seeing results for real
<wrtp> niemeyer: what command might come first do you think?
<niemeyer> wrtp: bootstrap and status are the obvious pair
<wrtp> niemeyer: bootstrap already works
<niemeyer> wrtp: Well, kind of
<wrtp> niemeyer: oh, ok, it doesn't do the ssh connect... but then it wouldn't need to anyway
<wrtp> niemeyer: next up is making the ssh dial work to zk.
<niemeyer> wrtp: It needs to do more.. we need to figure a way to put commands in the store so we can initialize the ssh
<niemeyer> Erm, initialize the zk state
<wrtp> niemeyer: i'm not sure what you mean. are you suggesting we do things differently from the python implementation?
<wrtp> ah...
<wrtp> you mean get go to do the initialisation
<wrtp> sure
<niemeyer> wrtp: Yeah, basically that
<niemeyer> wrtp: and so on.. so that we can see results from things being written
<wrtp> ok. well i'm still going to get the ssh working properly first.
<niemeyer> wrtp: Oh yeah, that's a pre-req for everything else
<wrtp> then i'll start mashing the cloudinit stuff
<wrtp> anyway, it's a good time to stop for the day. i can detect twitchiness downstairs :-)
<wrtp> niemeyer: i anticipate lots of remarks on ssh_test.go. i found it quite hard work... i'd appreciate it if you could make sure it works on your computer too, as it might easily be sensitive to ssh version differences.
<niemeyer> wrtp: Sounds great, I'll have some careful experimentation with it
<niemeyer> wrtp: Thanks for the effort there
<niemeyer> fwereade_: Heya
<niemeyer> fwereade_: Have you had a chance to sync up with hazmat?
<hazmat> fwereade_, sync up re?
<hazmat> it's demo run-around time over here
<hazmat> bcsaller, can you come out on thursday for charm school? it starts at 5:30 in the marina room
<bcsaller> hazmat: as long as thats not the am ;)
<hazmat> bcsaller, fortunately not :-)
#juju-dev 2012-04-19
<wrtp> mornin' al
<wrtp> l
<wrtp> where *is* al?
<fwereade_> wrtp, over the hills and far away ;p
<fwereade_> TheMue, were there meant to be some prereqs on your CLs? Just did one, moved on to the next, and saw the same changes...
<wrtp> fwereade_ is back in Go-land, yay!
<fwereade_> wrtp, yeah, and I can almost remember what the hell I'm doing ;)
<wrtp> fwereade_: bonus
<TheMue> fwereade_, wrtp: moin
<wrtp> TheMue: yo!
<fwereade_> TheMue, heyhey
<TheMue> fwereade_: I would like you to not do something regarding those watchers. It's a sequence of branches based on each other. So I have to realize them bottom-up.
<TheMue> fwereade_: Base is the enhancement of the content watcher (will go in again in a few moments)
<fwereade_> TheMue, ah, ok, sorry; I hope the comments on the one I did might be useful anyway
<TheMue> fwereade_: Sure they will. Which one is it? I've not yet scanned them all.
<fwereade_> TheMue, sorry, got called away; it's https://codereview.appspot.com/6059047/
<TheMue> fwereade_: So, read it, thx. You also prefer "treaten" instead of "treated"? *smile*
<fwereade_> TheMue, it's clearly superior ;p
<fwereade_> TheMue, but the tyranny of the majority wins
<TheMue> fwereade_: Hehe, so maybe I should keep it.
<TheMue> fwereade_: The name "loop" for the goroutines is a result of the general discussion about watches here. In my own code I typically call it backend()
<TheMue> fwereade_: The idea of moving all watches into their own file sounds good. I already have a watch_test.go
<TheMue> fwereade_: unit.go is already quite big.
<TheMue> fwereade_: What I don't get is the "force upgrade". Could you please explain?
<niemeyer> Morning all
<fwereade_> TheMue, heh, I wasn't closely involved, but:
<fwereade_> (heya niemeyer)
<fwereade_> TheMue, a requirement that units be upgradeable in a non-started state came up from somewhere
<fwereade_> TheMue, you can now pass --force to juju upgrade-charm
<fwereade_> TheMue, and the UA interprets that as "upgrade regardless of unit state"
<TheMue> niemeyer: moin
<fwereade_> TheMue, you'd have to take a look at the code to see exactly how it's done
<fwereade_> TheMue, lp:934350
<TheMue> fwereade_: ok, thx for the hint
<niemeyer> The point of upgrades in a non-started state comes from the fact that upgrades are often used to fix the problem
<TheMue> niemeyer: Good to know. Regarding the state I've so far only tested for the existence of its node in ZK.
<TheMue> niemeyer: Could you take a look at http://paste.ubuntu.com/936632/? I found a compromise between your approach and mine to keep the watcher test code compact.
<niemeyer> TheMue: The approach I suggested isn't really mine.. it's a very common way to structure tests
<niemeyer> TheMue: http://code.google.com/p/go-wiki/wiki/TableDrivenTests
<niemeyer> TheMue: IMO, this is still cleaner
<niemeyer> TheMue: I'm not as religious as other people are.. some actually go overboard and attempt to use this scheme even when it's not really adequate for the test at hand, or could be simplified if done differently
<niemeyer> TheMue: Your case, though, is a pretty legitimate table
<TheMue> niemeyer: OK, so I'll adopt them, even if I don't find them more readable.
<niemeyer> TheMue: Let me try to explain why it feels more readable then..
<niemeyer> TheMue: You actually iterate over a list of possibilities with some logic
<niemeyer> TheMue: Instead of having the list of possibilities as a first-class entity and having the logic iterate over that list, the test is inverted
<TheMue> niemeyer: Yes, here a table is ok, but later I also have to test for timeouts and closed channels. And the code needed there is, due to the structure, almost a duplication of the test code above. That is IMHO needless and cluttering.
<niemeyer> TheMue: It has a function with logic displaced, and then manually iterates over the list of possibilities
<niemeyer> TheMue: The table may be richer
<TheMue> niemeyer: Using an anonymous struct, yes.
<niemeyer> TheMue: If you're iterating over a list of possibilities, it's still easier to read through if it is organized as a sequence of facts than if everything is mixed up with a stream of code
<niemeyer> TheMue: It's a bit like having a higher level language to describe the test
<TheMue> niemeyer: How about the call of watcher.Stop() in between?
<niemeyer> TheMue: Again, I'm not religious about it, and in many cases this makes things worse.. this looks like a good chance to use it, though
<niemeyer> TheMue: Maybe that call should be out of the iteration logic?
<TheMue> niemeyer: That's what I meant above. There's iteration, then only one err assert, then again a test for timeout. The way I've tried it reads simply top-down (ok, you have to know the three arguments of the assertChange()).
<niemeyer> TheMue: The fact you have a couple of lines afterwards isn't justification to not do an iteration as an actual iteration
<niemeyer> TheMue: Note in my suggestion that the code within the loop is also simpler
<niemeyer> TheMue: and so is the test after it
<niemeyer> TheMue: The logic in the paste instead shoves a true/false pair that is unchanged over the whole iteration, and makes the loop more complex
<TheMue> niemeyer: Yes, the iteration for the (just) 4 values is fine, but later I need two receives with timeout. Now a one-liner, then 4 to 5.
<niemeyer> TheMue: Those are handled in the suggestion I sent you
<niemeyer> TheMue: With something as trivial as
<niemeyer> select {
<niemeyer> case <-time.After(N):
<niemeyer>     t.Fatalf("boom")
<niemeyer> }
<niemeyer> TheMue: That's much better than assert(false)
<TheMue> niemeyer: The new approach uses Fatal to show the timeout better. And still it's more code (and that twice)
<niemeyer> TheMue: It uses fatal in a remote place that is disconnected from the actual logic because we've forced a loop into a sequence of equivalent function calls
<TheMue> niemeyer: But let my try the table approach and then we'll see which one looks cleaner.
<niemeyer> TheMue: Thanks
<niemeyer> TheMue: As a general comment, more code isn't necessarily bad.. there are other factors that have to be taken into account
<TheMue> niemeyer: Yes, readability. And from my perspective the approach using a little helper is more readable.
<niemeyer> TheMue: Ok.. I disagree, but I respect that
<niemeyer> TheMue: Perhaps one of the reasons I feel different is that you have a comment there playing the role of code that should be readable
<niemeyer> 	// Changes() has to be closed.
<niemeyer> 	assertChange(false, nil, false)
<niemeyer> select {
<niemeyer> case ...:
<niemeyer>     c.Fatalf("Changes channel not closed")
<niemeyer> }
<niemeyer> ..
<niemeyer> You need the comment because assertChange(false, nil, false), by itself, says nothing
<TheMue> niemeyer: Hehe, yes, got me. In Smalltalk I wrote "self assertChange: … isClosed: true hasTimeout: false."
<TheMue> niemeyer: Here my former lang is more expressive.
<niemeyer> TheMue: Indeed.. one might do the same with Python. The question is whether that'd be good or not. In the specific case we're debating, I still prefer dropping one layer because the helper function is unnecessary.
<TheMue> niemeyer: Maybe it's unnecessary, yes. But many unnecessary things are useful. *smile*
<niemeyer> In other words, isClosed and hasTimeout are both layers that one must look elsewhere to understand.
<niemeyer> They don't pull their weight.
<fwereade_> early lunch, bbs
<TheMue> niemeyer: Here is the new one: http://paste.ubuntu.com/936664/
<niemeyer> TheMue: This comment can go away:
<niemeyer> 	// Receive the four changes create, content change,
<niemeyer> 	// delete and create again.
<niemeyer> TheMue: It's explicit now..
<niemeyer> TheMue: This one too, for the same reason:
<niemeyer> 	// No more changes, expect a timeout.
<niemeyer> TheMue: Same about the last one..
<TheMue> niemeyer: So, in one sentence: just remove all comments in the method.
<niemeyer> TheMue: Sorry, I was reading through it, but yes.. this is the case.. no comments that duplicate the code please
<TheMue> niemeyer: I would at least for the two last selects prefer to explain what they are concentrating on. E.g. why it makes sense to have an empty case <-timeAfter(). A kind of c.Assert(ImHappyBecauseImCalled).
<niemeyer> TheMue: It makes sense to have an empty time.After because we expect it to happen
<niemeyer> TheMue: We don't need comments for that
<TheMue> niemeyer: OK, so let's hope that a new maintainer in several months will get it as easily too.
<niemeyer> TheMue: If he doesn't get that easily, he shouldn't be working on juju
<niemeyer> (or she)
<TheMue> niemeyer: *lol*
<TheMue> niemeyer: OK, good argument.
<wrtp> TheMue: one advantage of structuring this as a table is that it makes it easy to have different timeouts for the expected-timeout vs not-expected-timeout case.
<niemeyer> TheMue: As a counterexample, this is the kind of comment that makes sense to have in the test:
<niemeyer> TheMue: http://paste.ubuntu.com/936679/
<wrtp> TheMue: when you don't expect a timeout, it can be much longer because it doesn't interfere with the normal progress of the test
<niemeyer> TheMue: Comments are not about what the code is doing.. they are about what we can't see
<wrtp> TheMue: the final timeout should be shorter.
<niemeyer> wrtp++
<TheMue> wrtp: The timeout could be an argument of a helper too.
<niemeyer> ROTFL
<niemeyer> I rest my case..
<wrtp> lol too
<TheMue> So at least one success. ;)
<TheMue> Lunchtime ...
<niemeyer> TheMue: Enjoy
<TheMue> niemeyer: so,  https://codereview.appspot.com/6059044 is in
<niemeyer> TheMue: LGTM, thanks!
<TheMue> niemeyer: Fine, thx. Will submit and merge it to go on with the next one. It's the ConfigWatcher in state.go. What do you think about William's idea of moving all watchers of the state package to a file named watchers.go?
<niemeyer> TheMue: Sounds fine, but please do it in a separate CL, or we'll lose all comments now
<TheMue> niemeyer: OK, will move them after all are in after an LGTM.
<mthaddon> niemeyer: got all the changes prepped for moving to config based, and with mongodb auth in place - may be a short service interruption while the changes are applied (will let you know as and when)
<niemeyer> mthaddon: Sounds great
<niemeyer> mthaddon: Thanks
<mthaddon> np
<mthaddon> niemeyer: ok, config updates being applied now - may be service interruption for a bit, will let you know when done
<niemeyer> mthaddon: Cheers
<mthaddon> niemeyer: the appserver processes are running, but not responding on port 8080 to our nagios checks
<TheMue> niemeyer: here's some code diving before ocean diving -> https://codereview.appspot.com/6056047
<mthaddon> niemeyer: ah, I think this is because I've set api-addr: localhost:8080
<mthaddon> niemeyer: to listen on all interfaces, what should I set that to? 0.0.0.0:8080 ?
<niemeyer> mthaddon: Yeah, that'll work
<mthaddon> k, thx
<niemeyer> TheMue: done :)
<TheMue> niemeyer: Hmm, no notification here. Looks good so that I can move on?
<niemeyer> TheMue: Please check it out
<niemeyer> TheMue: I forgot to submit somehow
<niemeyer> TheMue: It LGTM, but there's something to change
<fwereade_> niemeyer, I think I need some guidance re response on https://codereview.appspot.com/5756054/
<niemeyer> fwereade_: Sounds good.. I'll have some breakfast and we can discuss it
<niemeyer> fwereade_: Just replied to the thread in the ML again, btw
<fwereade_> niemeyer, sweet, tyvm
<niemeyer> mthaddon: All good there? Can I get some breakfast? :)
<TheMue> niemeyer: OK, thx. Will change that one point and then submit it.
<mthaddon> niemeyer: yep, all looking good thx
<mthaddon> niemeyer: go eat!
<niemeyer> mthaddon: Cheers! Will run tests later
<wrtp> niemeyer: you need to tag the tomb package (or remove all the existing tags)
<niemeyer> wrtp: Hmm, I believe the tags there are right
<wrtp> niemeyer: i don't see a go1 tag
<wrtp> niemeyer: i did go get -u and everything broke
<niemeyer> wrtp: There isn't one.. that shouldn't be a problem, I believe
<wrtp> niemeyer: try it
<wrtp> niemeyer: it went back to release 12, i believe
<niemeyer> [niemeyer@gopher ~]% GOPATH=$PWD/gopath go get launchpad.net/tomb
<niemeyer> [niemeyer@gopher ~]%
<wrtp> get -u
<niemeyer> [niemeyer@gopher ~]% mkdir gopath
<wrtp> and then try to build juju/go
<wrtp> niemeyer: BTW i'm seeing a store test failure: http://paste.ubuntu.com/936825/
<niemeyer> Will have to increase the timing then.. there's a race there due to the background counting
<niemeyer> wrtp: Can you please try removing your local tomb package?
<niemeyer> wrtp: The tags are right
<wrtp> niemeyer: that worked. maybe the go pull process didn't update the tags properly.
<wrtp> gotta go to lunch
<niemeyer> wrtp: That's what I imagined
<niemeyer> mthaddon: Oh, ouch.. forgot a detail
<niemeyer> mthaddon: Can you please update the nagios check with stats=0?
<niemeyer> mthaddon: Btw, did you really intend to hit the store about twice a second with the nagios check?
<niemeyer> fwereade_: ping
<fwereade_> niemeyer, pong
<niemeyer> fwereade_: Heya
<niemeyer> fwereade_: So..
<niemeyer> fwereade_: Quick experiment..
<niemeyer> fwereade_: Don't look at the code for a moment
<fwereade_> niemeyer, ok :)
<niemeyer> fwereade_: What has to happen for something to actually be logged?
<niemeyer> fwereade_: In terms of the command setup
<fwereade_> niemeyer, someone to call ctx.InitLogging?
<niemeyer> fwereade_: More..
<fwereade_> niemeyer, the flags to have been set such that InitLogging does something?
<niemeyer> fwereade_: More..
<fwereade_> niemeyer, someone else to call log.Printf/Debugf?
<niemeyer> fwereade_: That too.
<niemeyer> fwereade_: The log commands have to be provided, the command has to call InitLog.. Printf has to be called.. SetsLog has to be true..
<fwereade_> niemeyer, ok, but the breakage occurs at the point where someone calls InitLog
<niemeyer> fwereade_: In your latest comment, you demonstrate a long chain of recommendations
<fwereade_> niemeyer, that's the point at which the host process's logging setup is potentially borked
<niemeyer> fwereade_: but all of them are actually going towards the same goal.. this is too simple a problem for the maze being put in place
<niemeyer> fwereade_: I'm personally lost in that maze already.. I don't know why we have InitLog anymore, for example
<niemeyer> fwereade_: I'm sure there's a good reason.. it's just left my consciousness
<niemeyer> fwereade_: I'd appreciate it if we could simplify what's there, rather than introducing additional flags that put logging behind yet another yes/no flag
<fwereade_> niemeyer, it's a little bit lost in the fog of time for me too, but I seem to recall feeling that the code was confirming it was a good idea when the pile of imports in supercommand dropped significantly
<fwereade_> niemeyer, I'm all about the simplicity, even if I don't always zero in on it perfectly without help ;)
<niemeyer> fwereade_: Well, we're all guilty of displeasing the simplicity gods frequently :)
<fwereade_> niemeyer, it does seem that logging and subcommand selection are essentially orthogonal problems that are only implemented in the same place because it has hitherto been convenient
<fwereade_> niemeyer, and we're now in a situation where the one does not imply the other, and that therefore they should be broken up somehow
<fwereade_> niemeyer, a whole new type feels a bit heavyweight; a flag and a bunch of ifs (I agree) feel a bit ugly
<niemeyer> fwereade_: I think a new type that handles the part of the problem of a supercommand preparing a subcommand might actually simplify things
<niemeyer> fwereade_: and make that logic more readable
<fwereade_> niemeyer, awesome; then I'll try that direction
<niemeyer> fwereade_: It feels clever, and "saves a type" (?), but at the same time this isn't the first "Wait, what?" moment we reach with that logic
<fwereade_> niemeyer, I was worried you might see it as wanton code raviolation
<niemeyer> fwereade_: It may be time to at least check how that'd look like
<niemeyer> fwereade_: I suspect it will be simple by itself, and will also make the supercommand implementation more obvious
<fwereade_> niemeyer, yeah, I think so too
<niemeyer> fwereade_: We basically need just a wrapper that handles half of what the supercommand does today, if I understand it well
<mthaddon> niemeyer: that's haproxy hitting it to confirm the service is up or not - is there a different URL we can use for that?
<fwereade_> niemeyer, exactly; thanks for the guidance :)
<niemeyer> mthaddon: Cool.. I'm not too concerned about the traffic by itself
 * fwereade_ is rather proud of "raviolation"
<niemeyer> mthaddon: At least not yet.. it's good that something is exercising a bit of load there
<mthaddon> niemeyer: so in terms of updating the nagios check - you want stats=0 used how?
<niemeyer> mthaddon: I just find it unfortunate that we didn't *actually* have 8000+ people interested in Jenkins in the last hour! ;-)
<mthaddon> heh
<niemeyer> mthaddon: This is an http query parameter
<mthaddon> niemeyer: and still for the /charm/oneiric/jenkins URL?
<niemeyer> mthaddon: stats=0 will disable the stats collection
<niemeyer> mthaddon: Yeah
<mthaddon> niemeyer: and we should update haproxy to use the same?
<niemeyer> mthaddon: Yeah, please
<niemeyer> 8.3k and counting, literally
<mthaddon> k, thx
<niemeyer> mthaddon: Then, let's clean up stats to avoid the bogus spike please
<mthaddon> ok, I'll let you know when that's done and you can talk me through cleaning up the stats
<niemeyer> TheMue: Any other branches for review?
<TheMue> niemeyer: I'm doing the one with the NeedsUpgradeWatcher.
<TheMue> niemeyer: You mentioned refactoring the test for the ConfigWatcher too. Here I have to think a bit. The ConfigNode has no public fields, only access methods.
<TheMue> niemeyer: So I'm thinking about how best to test it via tables.
<niemeyer> TheMue: Thanks, no worries.. just wanted to make sure I wasn't missing something
<TheMue> niemeyer: Not yet, I'll notify you.
<wrtp> TheMue: i just got a test failure: http://paste.ubuntu.com/936882/
<TheMue> wrtp: Oh, strange, here it has been ok (from inside the editor and from commandline just before submit).
<wrtp> TheMue: i'll try pulling and testing again
<wrtp> TheMue: i tried again (without pulling) and it succeeded.
<TheMue> wrtp: Uuuh, even more strange.
<wrtp> TheMue: it looks to me like that code is racy
<wrtp> oh, maybe not
<TheMue> wrtp: In the method I use no goroutine
<wrtp> i wonder what value got received.
<wrtp> TheMue: WatchConfig starts a goroutine, no?
<TheMue> wrtp: Yes. I meant inside the test to start changing values later.
<wrtp> TheMue: i was thinking that we don't know exactly when the channel is closed. but it doesn't (shouldn't?) matter in fact.
<wrtp> TheMue: slightly concerning, but it seems to pass consistently now.
<TheMue> wrtp: Needs to get warm. *smile*
<TheMue> wrtp: I'm currently in that test file refactoring the tests. So please wait a moment if possible, then I can check it too.
<TheMue> wrtp: For the tests you could also take and print the received change. There shouldn't be any after the error. So I think the value will be nil, but the channel is still open.
<wrtp> TheMue: yes, the test could print the received value
<TheMue> wrtp: The 'illegal content' error, to be more specific.
<TheMue> wrtp: Otherwise with the race condition it's hard to say "Hey, we know the channel will close soon, but not exactly when.". *sigh*
<wrtp> TheMue: i don't think there's a race condition
<wrtp> TheMue: if a value is received, there's an error
<wrtp> TheMue: (because the watcher has sent a value when it shouldn't have)
<TheMue> wrtp: I've got an idea, but I'm not yet sure. I'll take a look during test refactoring.
<niemeyer> mthaddon: Just ran a quick test with the store, btw..
<niemeyer> mthaddon: Can you please mail me logs when you have a moment?
<mthaddon> niemeyer: which logs are you interested in? there's very little in charmd.log
<niemeyer> mthaddon: That was the one.. if it has little, that's good
<mthaddon> niemeyer: nothing in there since March
<niemeyer> mthaddon: Super
<niemeyer> mthaddon: Hmm.. can you do me a quick favor?
<niemeyer> mthaddon: How many processors do we have in the frontend servers?
<mthaddon> niemeyer: each one is 8 cores
<niemeyer> mthaddon: What else runs on them?
<mthaddon> niemeyer: they both run the full stack, apache/haproxy/appserver/mongodb
<niemeyer> mthaddon: Super
<niemeyer> mthaddon: Can you please export an env var
<niemeyer> mthaddon: For charmd, specifically
<niemeyer> mthaddon: export GOMAXPROCS=4
<mthaddon> niemeyer: like this? https://pastebin.canonical.com/64567/
<niemeyer> mthaddon: Yeah, thanks
<niemeyer> mthaddon: I mean, I think..
<niemeyer> mthaddon: Not versed in the syntax there
<mthaddon> sure, I'll test it first, just checking that looks right in the upstarts script really
<niemeyer> mthaddon: Yeah
<niemeyer> mthaddon: This will allow the runtime to spread the goroutines across more cores
<mthaddon> k
<niemeyer> mthaddon: Any luck there? Just want to run another test before we reset the data
<mthaddon> niemeyer: just filtering through our puppet infrastructure atm - will let you know when it's been applied (currently pending a review from another member of IS)
<niemeyer> mthaddon: Wow, cool
<mthaddon> but it "dry-ran" clean, so shouldn't be too long
<niemeyer> mthaddon: I'm heading out for lunch.. I'd appreciate it if we could deploy those changes and do the cleanup still today so that we can start collecting stats for real
<mthaddon> niemeyer: sure
<TheMue> niemeyer: https://codereview.appspot.com/6050053/ is in.
<wrtp> just gonna reboot.
<mthaddon> niemeyer: changes all applied - let me know what needs to be done to clean up the stats
<rog> niemeyer: you said "the ssh identity is configurable" but it's not configurable from environments.yaml. should it be?
<fwereade_> gn all, might be back a bit later
<niemeyer> mthaddon: Can start with: use juju; db.stat.counters.count()
<niemeyer> fwereade_: Night!
<mthaddon> niemeyer: 288
<niemeyer> mthaddon: Nice
<niemeyer> Well below the 40k+ actual counts made
<niemeyer> mthaddon: So we just have to clean that up
<niemeyer> db.stat.counters.remove()
<mthaddon> ok, done
<niemeyer> mthaddon: Thanks!
<mthaddon> that's it?
<niemeyer> mthaddon: We're good then
<niemeyer> mthaddon: Yeah
<mthaddon> sweet, that was easy
<niemeyer> mthaddon: Wait
<niemeyer> mthaddon: Almost.. there's something wrong..
<niemeyer> mthaddon: Something is still poking at the URL without the stats=0 setting
<mthaddon> hmm, I wonder if haproxy strips the query string...
<niemeyer> mthaddon: It shouldn't, at least
<niemeyer> mthaddon: 200+ since reset already
<mthaddon> must be haproxy then - I'll take a look - may need to quote the URL or something
<niemeyer> mthaddon: Is the 0 after = a zero?
<niemeyer> mthaddon: I know it's obvious, sorry, but just to be sure
<mthaddon> yeah, is definitely a zero
<TheMue> so, off for today, bank appointment. have a nice evening.
<mthaddon> niemeyer: how are you checking the stats, just so I can confirm when I try a few possible fixes?
<niemeyer> TheMue: Cheers
<niemeyer> mthaddon: Sent
<mthaddon> thx
<niemeyer> mthaddon: Meanwhile, I'm checking to see if there's a bug in the code.. I had tests, but maybe the tests are broken
<TheMue> niemeyer: you've seen, next proposal is in and tomorrow I'll continue.
<mthaddon> k
<niemeyer> TheMue: Sounds great, thanks
<TheMue> niemeyer: np, and have fun at diving
<niemeyer> TheMue: Cheers man.. will try to provide you some good feedback so you can move on tomorrow
<TheMue> niemeyer: thx, bye
<mthaddon> hmm, according to http://code.google.com/p/haproxy-docs/wiki/httpchk, query strings are permitted, and they don't say anything about needing to quote them
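For reference, the httpchk documentation linked above does accept a query string directly in the check URI; a hedged sketch of the relevant haproxy lines (backend and server names invented):

```
backend charm-store
    option httpchk GET /charm/oneiric/jenkins?stats=0
    server store1 127.0.0.1:8080 check
```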
<niemeyer> mthaddon: I believe there's a bug in the code.. the test is probably not good enough
<mthaddon> ah, okay
<niemeyer> mthaddon: Will fix it and ping you
<niemeyer> mthaddon: (even if you're not around)
<mthaddon> niemeyer: cool cool - will pick up the ping in the morning if it's past my EOD and can do a rollout whenever
<niemeyer> mthaddon: Sounds great.. please repeat the remove after the deployment
<mthaddon> yep yep
<niemeyer> mthaddon: So we can get meaningful numbers from then on already
 * mthaddon nods
<niemeyer> mthaddon: Thanks for your help today
<mthaddon> np
<niemeyer> Hah.. found the test bug.. the URL being pinged is wrong, which means we don't need stats=0 to not have counters.. :(
 * niemeyer fixes it
<rog> [LOG] 38.27388 JUJU state: ssh error (will retry: true): ssh: Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-17-virtual i686)
<rog> ha ha
<niemeyer> rog: :-)
<rog> niemeyer: i can't see how the python version stops the ssh session from going into interactive mode
<niemeyer> rog: That's what one of the flags do
<rog> niemeyer: i'm using all the same flags, i think
<rog> niemeyer: for the record: "ssh" "-T" "-o" "StrictHostKeyChecking no" "-o" "PasswordAuthentication no" "-L" "localhost:37722:localhost:2181" "-p" "22" "ubuntu@ec2-174-129-123-190.compute-1.amazonaws.com"
<niemeyer> rog: -N
<rog> niemeyer: ah, thanks. the python version doesn't seem to use that flag though. strange.
<niemeyer> rog: Maybe it's doing the unthinkable.. :-)
<rog> niemeyer: maybe it's ignoring stdout.
<niemeyer> rog: Right, maybe it *is* sitting idle with a shell.. which would be quite funny
<rog> niemeyer: that would probably explain it.
<rog> niemeyer: yes, i think that's what must be happening. lol.
<rog> niemeyer: funny, adding that option triggered a "Permissions 0664 for '/home/rog/src/go/src/launchpad.net/juju/go/state/sshtest/id_rsa' are too open." error. ssh *is* baroque.
<niemeyer> rog: Woah
<niemeyer> rog: Vhat geev
 * rog hopes that bzr respects permissions.
<niemeyer> rog: Uh oh.. it doesn't
<niemeyer> rog: +x only
<rog> guess i'll add a chmod then
<niemeyer> rog: Yeah, the charm tests do something like that already
<niemeyer> rog: on init(), IIRC
<niemeyer> rog: Would you mind having a quick look at this: https://codereview.appspot.com/6072046
<rog> sure.
<rog> niemeyer: BTW all tests just passed, using the ssh connect.
<niemeyer> rog: Woohay
<rog> which is highly satisfactory!
<niemeyer> rog: I can imagine
<niemeyer> I'm excited myself
<niemeyer> This is a critical roadblock
<niemeyer> rog: Btw, I'll reduce the 5 back to 2.. that wasn't the issue
<rog> niemeyer: ECONTEXT
<niemeyer> rog: The CL
<rog> oh yeah, i see
<niemeyer> rog: I've changed the timing there to attempt to cause an error, but I'll put that back
<rog> :3 lol
<niemeyer> rog: Seriously..
<rog> sorry i missed that!
<niemeyer> rog: The best compilers wouldn't catch that..
<niemeyer> rog: Sorry that I wrote it!
<niemeyer> :-)
<rog> that's why we have reviewers. i failed!
<niemeyer> rog: What's funny is that this is *exactly* a case where there was a non-obvious bug on the other side
<rog> niemeyer: i think the basic problem was that the test was testing for the *absence* of something.
<niemeyer> A pathetic test when we most need it
<rog> niemeyer: (which it still does AFAICS)
<niemeyer> rog: True
<niemeyer> rog: yeah, but it works now
<rog> lol
<rog> until something changes
<niemeyer> rog: Well.. there's nothing we can do about *that* :-)
<rog> niemeyer: the test should really try an action that does something, test that the something succeeded, *and* check that no stats were taken
<niemeyer> rog: That's exactly what it does right now
<niemeyer> rog: Well, it's not testing that it succeeded
<niemeyer> rog: That's a good idea
<rog> niemeyer: yeah
<rog> niemeyer: how come the GET of /charms/xxx was succeeding, BTW?
<niemeyer> rog: It wasn't..
<rog> oh yeah, that was just the NewRequest error check
<rog> so... that's my feedback
<niemeyer> rog: Cheers.. already on the way
<niemeyer> rog: Updated
<rog> niemeyer: LGTM
<niemeyer> rog: Cheers
<niemeyer> rog: Was a good hint, thanks
<rog> niemeyer: pleasure
<niemeyer> mthaddon: Code is good to go on 122..
<niemeyer> Taking a short break
<rogpeppe> niemeyer: https://codereview.appspot.com/6074044/
<niemeyer> rogpeppe: Cool, I'm sitting down on them now
<rogpeppe> niemeyer: lovely, thanks
<niemeyer> rogpeppe: re. the "just in case of what", that was to close the channel
<niemeyer> rogpeppe: errorc
<rogpeppe> niemeyer: why would closing it be a good thing?
<niemeyer> rogpeppe: You're right, looking at how it's used there isn't a good reason indeed
<niemeyer> rogpeppe: Should -N be added to the list of args?
<rogpeppe> it is... i thought!
<rogpeppe> oh, of course, that's in the next CL
<rogpeppe> can we leave it until the next CL?
<rogpeppe> it doesn't affect the tests
<rogpeppe> niemeyer: ^
<niemeyer> rogpeppe: Yeah, that's fine
<niemeyer> rogpeppe: Going through the testing right now, btw
<rogpeppe> niemeyer: cool.
<niemeyer> rogpeppe: Btw, looking at the issue people are having with the stock ssh package, our decision seems to be paying off
<rogpeppe> niemeyer: yeah.
<rogpeppe> niemeyer: plus, i hope we can move away from ssh entirely before long
<niemeyer> rogpeppe: Indeed
<rogpeppe> niemeyer: i'll be very happy when i delete this code :-)
<niemeyer> rogpeppe: Me too :-)
<niemeyer> rogpeppe: In bed last night I had this crazy thought that replacing zookeeper with mongodb might not be so hard and might make certain things easier on the transition to an HTTPS API
<rogpeppe> niemeyer: does mongodb do non-polling notifications?
<niemeyer> rogpeppe: Yeah, there's a little-used trick for that, with capped collections
<niemeyer> rogpeppe: One can tail a capped collection, basically
<rogpeppe> niemeyer: have you got a reference?
<rogpeppe> i'm not at all familiar with mongodb
<niemeyer> rogpeppe: http://godoc.labix.org/mgo#Query.Tail
<niemeyer> I hope that exists.. /me clicks :)
<niemeyer> Yeah, it does
<rogpeppe> hmm, sounds a bit like a hack to me
<niemeyer> rogpeppe: Wow, that was quick :)
<rogpeppe> niemeyer: seems like we'd be building functionality on top of a feature that really wasn't designed to be used that way
<niemeyer> rogpeppe: I don't know what you mean.. tail was created exactly for that use case
<rogpeppe> "Capped collections are not shard-able". is that a problem?
<rogpeppe> niemeyer: perhaps it doesn't mean what i think it means
<niemeyer> rogpeppe: No.. we'd definitely not use shards
<niemeyer> rogpeppe: shard != replica set
<rogpeppe> niemeyer: ah
<rogpeppe> niemeyer: so you could have a capped collection with a size of 1 and it would be roughly equivalent to a zk watch?
<niemeyer> rogpeppe: We'd probably use something like 4 or 8k collections, and watch on specific patterns
<niemeyer> rogpeppe: I *think* I can recreate exactly the same API we have in the zookeeper package on top of mongodb with some ease
<rogpeppe> niemeyer: but... what would we gain?
<niemeyer> rogpeppe: Quite a few things
<niemeyer> - Non-RAM storage.. means we can drop the use of S3 for several things
<rogpeppe> niemeyer: that would indeed be nice.
<niemeyer> - SSL and authentication
<niemeyer> - Automatic join and departure of cluster members (zk doesn't do that!)
<rogpeppe> niemeyer: that would be irrelevant if we moved to an HTTPS API
<niemeyer> rogpeppe: Not entirely.. we can move to HTTPS step by step..
<niemeyer> rogpeppe: We could switch over to mongodb first, without an HTTPS api, just by reimplementing the zk package
<niemeyer> rogpeppe: This is an easy switch over.. no changes to the state package
<rogpeppe> niemeyer: so you think you can create a "mongo-zk" package that looks almost exactly like the zk package?
<niemeyer> rogpeppe: Right.. at least almost-exactly-enough to our purposes
<rogpeppe> niemeyer: that would be quite a cool thing
<niemeyer> rogpeppe: So very low risk
<niemeyer> Another one:
<niemeyer> - Look ma', no Java!
<niemeyer> :)
<rogpeppe> niemeyer: now yer talkin'!
<rogpeppe> niemeyer: is your mongodb client pure go?
<niemeyer> rogpeppe: Yeah
<rogpeppe> niemeyer: another benefit then: no cgo!
<niemeyer> Yep!
<niemeyer> Although that hasn't been much of an issue in practice
<niemeyer> (Java is)
<rogpeppe> niemeyer: so... you'd use a collection for each zk node?
<niemeyer> rogpeppe: No.. a single collection to the whole FS structure, indexed on path
<rogpeppe> niemeyer: ah, the mgo query defines the watch
<niemeyer> rogpeppe: Right!
<niemeyer> rogpeppe: and the watched collection would be a separate one
<niemeyer> rogpeppe: fs.data and fs.events for example
<rogpeppe> niemeyer: right
<rogpeppe> niemeyer: how do you choose a good cap size?
<rogpeppe> niemeyer: for fs.events, that is
<niemeyer> rogpeppe: In practice, given what watches mean for us today, pretty much anything non-absurdly small would do
<niemeyer> rogpeppe: They must be large enough for the application to "breathe" and still not miss events
<rogpeppe> niemeyer: yeah, because historical events don't matter of course.
<niemeyer> rogpeppe: We could pick something like 1M for example, and be pretty sure to never worry except perhaps on absurdly large cases
<niemeyer> rogpeppe: Right
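The watch-emulation pattern being discussed — a bounded, insertion-ordered event log that watchers tail from a known position — can be illustrated without a live MongoDB by standing an in-memory ring buffer in for the capped collection. The real scheme would use mgo's Query.Tail on an fs.events collection; every name below is made up for illustration:

```go
package main

import "fmt"

// Event mirrors the (path, operation) tuples proposed for fs.events.
type Event struct {
	Path string
	Op   string
}

// CappedLog is an in-memory stand-in for a MongoDB capped collection:
// a fixed-size, insertion-ordered buffer that silently discards the
// oldest entries once full.
type CappedLog struct {
	cap    int
	events []Event
	seq    int // total events ever appended
}

func NewCappedLog(capacity int) *CappedLog {
	return &CappedLog{cap: capacity}
}

func (l *CappedLog) Append(e Event) {
	l.events = append(l.events, e)
	if len(l.events) > l.cap {
		l.events = l.events[1:] // oldest entry falls off, as in a capped collection
	}
	l.seq++
}

// TailFrom returns the events recorded at or after sequence number from,
// in insertion order, plus the next sequence number to resume from --
// roughly what a tailable cursor gives a watcher.
func (l *CappedLog) TailFrom(from int) ([]Event, int) {
	oldest := l.seq - len(l.events)
	if from < oldest {
		from = oldest // the watcher fell behind and missed events
	}
	return l.events[from-oldest:], l.seq
}

func main() {
	log := NewCappedLog(4)
	log.Append(Event{"/topology", "set"})
	log.Append(Event{"/units/u-0", "create"})

	evs, next := log.TailFrom(0)
	for _, e := range evs {
		fmt.Println(e.Path, e.Op)
	}

	log.Append(Event{"/units/u-0", "delete"})
	evs, _ = log.TailFrom(next) // only the new event is delivered
	fmt.Println(len(evs), evs[0].Op)
}
```

The cap-size question maps directly onto this sketch: the buffer only has to be large enough that a watcher catches up before its unread events fall off the front.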
<niemeyer> rogpeppe: Funny enough, we might actually avoid the polling we're introducing in the presence package with this mechanism
<niemeyer> rogpeppe: Since the watching is less ephemeral than with zk proper
<rogpeppe> niemeyer: i don't *think* so
<rogpeppe> niemeyer: because we want to know when a client goes away
<rogpeppe> niemeyer: so we *want* some kind of ephemerality
<rogpeppe> niemeyer: which is what the presence package gives us
<niemeyer> rogpeppe: Yeah, indeed.. we still need to update the note
<niemeyer> nod
<niemeyer> node
<rogpeppe> niemeyer: but at least we *have* moved away from dependence on ephemeral nodes
<rogpeppe> niemeyer: which is something you couldn't easily emulate, i think.
<rogpeppe> niemeyer: so your event collection is just a set of (path, operation) tuples, right?
<rogpeppe> niemeyer: and Tail is guaranteed to return events in order?
<niemeyer> rogpeppe: Yeah
<niemeyer> (to both)
<niemeyer> rogpeppe: capped collections + tail was implemented to support the operation log of MongoDB itself
<niemeyer> rogpeppe: Replication is based on that
<niemeyer> rogpeppe: So we can be pretty comfortable that they'll pay attention to the feature working well
<rogpeppe> niemeyer: of course, another plus point - we can use mongodb to implement juju logging
<niemeyer> rogpeppe: True
<niemeyer> rogpeppe: Not only juju, actually.. we might stream log for all the machines
<rogpeppe> niemeyer: definitely. i've been wanting to do that.
<niemeyer> Although, we should be careful with that
<rogpeppe> niemeyer: indeed - we could swamp the network
<rogpeppe> niemeyer: but for debugging purposes, i think it's crucial
<niemeyer> rogpeppe: I wouldn't say the network, but the mongodb master for sure
<rogpeppe> niemeyer: ah
<rogpeppe> i'd be sorely tempted to streamline the "zk" API when redoing it
<niemeyer> rogpeppe: Well, that's trivial to do after we're done migrating
<niemeyer> rogpeppe: I mean, I'd be happy to have it improved too
<niemeyer> rogpeppe: But would try to mimic it precisely first
<niemeyer> rogpeppe: So we can play with the idea in a riskless fashion until we're certain we want to flip
<niemeyer> rogpeppe: After flipping, we have no great attachment to it
<niemeyer> rogpeppe: Review delivered!
<rogpeppe> niemeyer: yeah, you're right
<rogpeppe> niemeyer: brilliant, thanks!
<rogpeppe> niemeyer: i expected more nasties there. thanks a lot. will fix tests in the morning. time to go now.
<rogpeppe> niemeyer: i expect to see a working version of mongo-zk by then :-)
<niemeyer> rogpeppe: Haha :)
<niemeyer> rogpeppe: Given the problem, I quite like the approach
<niemeyer> rogpeppe: I'd prefer to not have any of it, as you would
<niemeyer> rogpeppe: Let's just try to make that stuff solid meanwhile, and as clean as possible
<niemeyer> rogpeppe: In that sense, your branch feels like a win. Thanks.
#juju-dev 2012-04-20
<wrtp> fwereade_, TheMue: morning
<fwereade_> heya wrtp
<TheMue> wrtp, fwereade_ : moin
<wrtp> fwereade_: i'd quite like a chat about deploying Go juju some time
<fwereade_> wrtp, sounds good; 5 mins?
<wrtp> fwereade_: cool
<fwereade_> wrtp, invite out
<wrtp> fwereade_: ah, actually i can't do a verbal chat now, 'cos i'll wake up the sleeping angel
<fwereade_> wrtp, np, ready all the same :)
<fwereade_> wrtp, (you mean I got dressed for nothing? :p)
<wrtp> fwereade_: lol
<wrtp> fwereade_: so...
<wrtp> fwereade_: as far as i can see we've got two different problems (or maybe three)
<wrtp> fwereade_: 1) how do we deploy an appropriate binary at development time
<wrtp> fwereade_: 2) how do we deploy binaries that aren't for the platform we're currently on
<wrtp> 3) (?) how do we make sure that we're deploying the right version?
<fwereade_> wrtp, yeah, I think all 3 are real, but niemeyer's plan seems to cover (3) pretty well
<wrtp> 1) is difficult unless we can assume that we're only testing on a compatible platform to the dev platform, which maybe isn't a bad assumption
<wrtp> fwereade_: remind me of niemeyer's plan again
<fwereade_> wrtp, I think that assumption is the right one for dev mode at least to start out
<fwereade_> wrtp, the versioning thread I was being obtuse in yesterday
<fwereade_> wrtp, not sure I can summarise it very well
<wrtp> fwereade_: on the mailing list?
<fwereade_> wrtp, yeah
<wrtp> fwereade_: ah, i totally missed that; "Bootstrap scheme for Go port"
<fwereade_> wrtp, but basically the initial email covers it, with the addition of the odd-number-anywhere-denotes-dev-version bit
 * wrtp is reading
<wrtp> fwereade_: by "semantic versioning" he's referring to this: http://semver.org/ ?
<fwereade_> wrtp, yes
<fwereade_> wrtp, the odd-number dev versions break what's suggested there AFAICT, but it's pretty clear all the same
<wrtp> fwereade_: isn't the odd-number proposal using odd numbers for the same kind of thing that "pre-release" versions are used in the original sem. ver. spec?
<fwereade_> wrtp, it is indeed, so it's pretty clearly comprehensible
<wrtp> fwereade_: so... why not just use prerelease versions so we'll be fully semver compatible?
<fwereade_> wrtp, but it's still an exception that is not obvious if you just say "we're using semver"
<fwereade_> wrtp, well, say we've released 4.0.0
<fwereade_> wrtp, for testing major upgrades of dev versions sanely we'll want to be using 5.x.x
<wrtp> fwereade_: 5.x.x-alpha.0 ?
<fwereade_> wrtp, but assuming 2 breaking changes in a dev cycle, we don't want to make users go 4->8 next release, it would be weird
<wrtp> fwereade_: i don't see the problem. we can do 5.x.x-alpha.0, 5.x.x-alpha.1 etc etc
<wrtp> 5.0.0-alpha.0, 5.0.0-alpha.1 actually
<fwereade_> wrtp, "Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API. It MAY include minor and patch level changes. Patch and minor version MUST be reset to 0 when major version is incremented."
<fwereade_> wrtp, I think we break it one way or another whatever we do
<wrtp> fwereade_: i'm not sure that pre-release versions need to be strictly compatible in the same way
<fwereade_> wrtp, but if we're using the version for upgrade/compatibility logic we need to be able to have sane versions when we're testing that we can actually upgrade, don't we?
<fwereade_> wrtp, it's really just extending the definition of pre-release
<fwereade_> wrtp, maybe I'm wrong and the existing provisions satisfy it
<wrtp> fwereade_: am just trying to sort it out in my head :-)
<fwereade_> wrtp, I just recoil a little from worrying about them in the context of the upgrade logic
<wrtp> fwereade_: i still think i don't see the problem. we can use exactly the same logic that's described on the semver page.
<wrtp> fwereade_: but i'd suggest that in "development mode" the "pre-release" logic switches to giving pre-release a higher precedence than the normal version.
<wrtp> fwereade_: as it is, the comparison logic with odd numbers will be weird.
<wrtp> s/as it is/as proposed/
<fwereade_> wrtp, the core of the problem is that we can't signal a backwards-incompatible change between two differentr pre-release versions with the same major version... can we?
<fwereade_> wrtp, how's it weird? I *think* the logic will be just the same
<fwereade_> wrtp, we just don't release odd-numbered versions to the canonical location
<wrtp> fwereade_: ah, i'd missed that twist
<wrtp> fwereade_: i'm not sure why we'd need to signal a backwards-incompatible change between two different pre-release versions
<wrtp> fwereade_: we don't care about breaking dev deployments
<fwereade_> wrtp, yeah, but to fulfil semver we ought to
<fwereade_> wrtp, ...right?
<fwereade_> wrtp, I don't see it carving out an incompatibility exception for pre-release versions, but maybe I'm just blind
<wrtp> fwereade_: i'm not sure. it doesn't talk about prerelease versions much. and why would you have a prerelease version if it wasn't to find problems and fix them *before* releasing the one that does it "right".
<wrtp> ?
<wrtp> i think if we use pre-release versions in this way, we don't need two repositories and we can just use the normal semver logic
 * fwereade_ thinks
<wrtp> in fact, we don't even need a "development mode" switch, i realise, because we can just up the client version to, say 5.0.0, which is satisfied by 5.0.0-alpha1 until 5.0.0 real version is released.
<wrtp> fwereade_: semver doesn't talk about any compatibility requirements *between* pre-release versions AFAICS
<fwereade_> wrtp, hm, don't we need a dev repo for that all the same? otherwise we have to have a speculative version bump every time we start a new cycle
<wrtp> fwereade_: that sounds right to me
<fwereade_> wrtp, the speculative version bump?
<wrtp> fwereade_: yeah
<fwereade_> wrtp, not convinced
<wrtp> fwereade_: but if it turns out the version bump isn't needed, we can release as a lower version.
<fwereade_> wrtp, I think we should be shooting for releases that *are* backward compatible in general
<wrtp> fwereade_: definitely
<fwereade_> wrtp, that feels like it *also* breaks semver
<wrtp> hmm, yeah, probably.
<wrtp> fwereade_: but won't we have to increment speculatively with the odd-numbering thing too?
<fwereade_> wrtp, hmm
<fwereade_> wrtp, yeah :(
<wrtp> fwereade_: i don't think i mind the speculative versioning. i think it maps quite well to what we're actually doing - building a version that is planned to become a real version at some point.
<wrtp> fwereade_: and if we decide the version is wrong, we can delete the wrongly-speculatively-versioned prerelease versions
<fwereade_> wrtp, wait a mo, with the odd-numbered-versions we *don't* have to
<fwereade_> wrtp, release 4.0.0; dev work starts on 4.0.1; we release 4.0.2 from latest state of 4.0.1 when we're ready
<wrtp> fwereade_: no?
<fwereade_> wrtp, we only have to bump the major version at the point we introduce something that actually is incompatible
<wrtp> fwereade_: how's that different from: release 4.0.0; dev work starts on 4.0.1-alpha.0; we release 4.0.1 from latest pre-release version of 4.0.1 ?
<fwereade_> wrtp, ok, I think you're right :)
<fwereade_> wrtp, don't think we can avoid the separate repos though
<wrtp> fwereade_: why's that?
<fwereade_> wrtp, because 4.0.1-alpha.0 will appear to be a valid version for 4.0.0 to deploy on the server
<fwereade_> wrtp, I think we may be making too much of this really -- the public repo should only have the stuff we've actually released
<fwereade_> wrtp, so we need private repos when we're testing
<wrtp> fwereade_: that's probably true
<wrtp> fwereade_: so the public repo is the dev repo without the prerelease versions
<wrtp> fwereade_: alternatively the comparison logic could be configurable to make it not consider pre-release versions.
<fwereade_> wrtp, yeah, that just crossed my mind too
<fwereade_> wrtp, ok, tell you what, we can get around all this by spinning up parallel universes to do dev work in and just merge the timelines later
<fwereade_> wrtp, I'll go find a physicist ;)
<wrtp> SGTM
<wrtp> fwereade_: might get a few weird effects when we get conflicts tho'
<fwereade_> wrtp, more seriously, I hadn't really been thinking about a central dev repo
<fwereade_> wrtp, details details ;)
<fwereade_> wrtp, as devs the only versions we care about are released ones and whatever we're working on at the moment
<fwereade_> wrtp, there may be a preview repo somewhere for other people to test against, with the no-backward-compatibility-guarantees exception
<wrtp> fwereade_: there's another issue if devs are sharing the same repo: you'll have to explicitly set your client version number to your "own" pre-release version, otherwise you'll get someone else's (unless yours happens to compare greater than others')
<fwereade_> wrtp, yeah, indeed, hence my private-repo assumptions
<wrtp> fwereade_: i think it could work ok though.
<wrtp> fwereade_: even with a public repo
<wrtp> fwereade_: we could accept an environment variable "JUJU_PRERELEASE" which would give the prerelease version that you're working on.
<wrtp> e.g. JUJU-PRERELEASE=rog.0
<fwereade_> wrtp, I'd really prefer to have the repo location be the only moving part
<wrtp> fwereade_: yeah, you're probably right.
<TheMue> re
<wrtp> fwereade_: http://gopkgdoc.appspot.com/pkg/launchpad.net/~rogpeppe/+junk/version
<wrtp> fwereade_: that implements the "must be at latest version to use pre-release versions" rule, as well as the rest of the semantic versioning rules as far as i could make them out.
<fwereade_> wrtp, that seems pretty neat really
<fwereade_> wrtp, let's see what niemeyer thinks :)
<wrtp> fwereade_: cool. yeah, let's see.
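The comparison logic under discussion is spelled out at semver.org. A minimal sketch of a Less method implementing the pre-release precedence rules — field names here are illustrative, not the actual version package's:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Version holds the semver.org components under discussion.
type Version struct {
	Major, Minor, Patch int
	PreRelease          string // e.g. "alpha.1"; empty means a normal release
}

// Less reports whether v has lower precedence than w per semver.org:
// numeric components compare first; a pre-release sorts below the
// corresponding normal release; pre-release identifiers then compare
// one dot-separated field at a time.
func (v Version) Less(w Version) bool {
	if v.Major != w.Major {
		return v.Major < w.Major
	}
	if v.Minor != w.Minor {
		return v.Minor < w.Minor
	}
	if v.Patch != w.Patch {
		return v.Patch < w.Patch
	}
	switch {
	case v.PreRelease == w.PreRelease:
		return false
	case v.PreRelease == "": // a normal release outranks any of its pre-releases
		return false
	case w.PreRelease == "":
		return true
	}
	vids := strings.Split(v.PreRelease, ".")
	wids := strings.Split(w.PreRelease, ".")
	for i := 0; i < len(vids) && i < len(wids); i++ {
		if vids[i] == wids[i] {
			continue
		}
		vn, verr := strconv.Atoi(vids[i])
		wn, werr := strconv.Atoi(wids[i])
		switch {
		case verr == nil && werr == nil:
			return vn < wn // both numeric
		case verr == nil:
			return true // numeric identifiers sort below alphanumeric ones
		case werr == nil:
			return false
		default:
			return vids[i] < wids[i]
		}
	}
	return len(vids) < len(wids) // fewer identifiers means lower precedence
}

func main() {
	a := Version{4, 0, 1, "alpha.0"}
	b := Version{4, 0, 1, "alpha.1"}
	rel := Version{4, 0, 1, ""}
	fmt.Println(a.Less(b), b.Less(rel), rel.Less(a)) // true true false
}
```

Under these rules the "release 4.0.0; dev on 4.0.1-alpha.N; release 4.0.1" flow from earlier in the discussion orders exactly as intended.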
<wrtp> fwereade_: oh yes, i forgot, gustavo's not around today
<TheMue> wrtp: Could you please add more doc for the structs fields? Especially the semantics of prerelease and build and how they are handled in Less().
<TheMue> fwereade_: time for a talk about "force upgrade" for units?
<fwereade_> TheMue, ofc :)
<fwereade_> TheMue, I'm no expert but I'll help if I can
<fwereade_> TheMue, re wrtp's stuff, http://semver.org/ should cover most of it
<TheMue> fwereade_: The current watcher implementation only observes if the node in ZK is created. And you wrote a comment that this isn't enough.
<TheMue> fwereade_: Oh, after a quick look an interesting doc. Thx, will read it later.
<fwereade_> TheMue, juju/agents/unit.py:197 is probably a good place to start looking
<TheMue> fwereade_: Will take a look.
<fwereade_> TheMue, basically people want to be able to upgrade broken units
<fwereade_> TheMue, and the default implementation only does an upgrade if it's in the "running" state
<wrtp> TheMue: what fwereade_ says.
<TheMue> fwereade_: IC, so there already has been an initial error implementing NeedsUpgrade() and SetNeedsUpgrade() without viewing the node content
<wrtp> TheMue: i could probably describe the correspondence of the fields with the version components as described at semver.org, i guess
<fwereade_> TheMue, I'm not sure it was an error at the time, the --force flag is relatively recent
<TheMue> fwereade_: Is this a newer change or already longer in code?
<TheMue> fwereade_: Ah, ic, this is an explain (you say so?).
<fwereade_> TheMue, sorry, cannot parse
<wrtp> TheMue: it was a very quick coding job this morning in response to a discussion with fwereade_ - it's not fully documented yet...
<TheMue> wrtp: I only wanted to understand those two fields better, but the doc mentioned above already looks good.
<wrtp> TheMue: the interaction between prerelease and build doesn't really describe the interaction between those two fields, unfortunately.
<TheMue> fwereade_: Argh, yes, hadn't the word. Should read: this is an explanation.
<TheMue> fwereade_: So I will handle the content now too.
<fwereade_> TheMue, cool, thanks (it was merged less than a month ago, so it was before I switched back to python, and I'm sure I remember seeing the first upgrade stuff in go while I was still on go)
<TheMue> fwereade_: I've got to thank you for the hint. Otherwise it would have been wrong from the beginning.
<fwereade_> TheMue, I think this is one of the biggest risks we face tbh
<fwereade_> TheMue, and being the only person on the team with semi-current knowledge of the python makes me a little nervous, but I'll do my best ;)
<TheMue> fwereade_: You're the official Knowledge Transfer Agent. *lol*
<fwereade_> haha :)
<wrtp> fwereade_: do you have a preference for what kind of provider storage we should use for juju binaries?
<wrtp> fwereade_: perhaps we should just use S3 on EC2
<fwereade_> wrtp, not really; I know there's been quiet talk of decoupling storage provider from machine provider but that feels like a distraction at this point
<fwereade_> wrtp, so, yeah, whatever storage we already have for the appropriate provider
<wrtp> fwereade_: i wonder if Environ.Bootstrap should take the client version as an argument, or if the environs package should fetch the version number from somewhere on its own.
<wrtp> fwereade_: my inclination is towards the former
<fwereade_> wrtp, likewise, I think
<fwereade_> wrtp, this may not be python but explicit still beats implicit in general ;)
<wrtp> fwereade_: cool. and then i guess i'd store the location of the binaries in the state, so agents can pull them out using Environ.GetFile
<fwereade_> wrtp, ...I *think* so but most of my cycles are still concentrated on figuring out what the hell I was oing a months ago in the followup branches I hadn't proposed yet
<wrtp> fwereade_: :-)
<wrtp> fwereade_: that can be educational...
<fwereade_> wrtp, also on not being able to type, it seems
<wrtp> fwereade_: what's the most universally available command-line tool for downloading from a web server? wget? (i realise that the code to download the binaries from S3 cannot be written in Go itself... doh!)
<fwereade_> wrtp, ha, I guess so :)
<fwereade_> wrtp, but really you can install what you want in the cloud-init, can't you?
<wrtp> fwereade_: how do i get the cloud-init to install a binary from S3?
<fwereade_> wrtp, (not that I'm advocating a separate juju-getter package or anything :p)
<wrtp> fwereade_: sure, i could apt-get another package. but if a couple of lines of shell will do it, i think that'd be better.
<fwereade_> wrtp, you can run arbitrary scripts, so... however you want ;p
<fwereade_> wrtp, indeed
<fwereade_> wrtp, wget a signed URL seems sensible
<wrtp> fwereade_: you mean a URL containing signed content?
<fwereade_> wrtp, I was just thinking something like juju/providers/ec2/files.py:40
<wrtp> fwereade_: yeah, i was planning to provide that
<wrtp> fwereade_: another question: how can i tell what tools are in a basic ubuntu release? (for instance, i'm wondering if unzip is there by default. the less apt-gets the better, i think)
<fwereade_> wrtp, hmm, I have no idea I'm afraid :(
<wrtp> fwereade_: guess i'll just try it and see :-)
<fwereade_> wrtp, sounds good :)
<wrtp> fwereade_: i *thought* the useful thing about zip vs tar is that it's supported within Go. but tar is also supported, so i think that's probably the better option.
<wrtp> fwereade_: and tar and gzip will most definitely be part of the stock distro
<fwereade_> wrtp, so I would assume ;)
<wrtp> fwereade_: they darn well should be
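Both halves of that pipeline are indeed in Go's standard library; a self-contained round trip through archive/tar and compress/gzip, with file names that are purely illustrative:

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// packTools writes a single-file .tgz into memory -- the shape a tools
// tarball might take.
func packTools(name string, data []byte) ([]byte, error) {
	var buf bytes.Buffer
	gz := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gz)
	hdr := &tar.Header{Name: name, Mode: 0755, Size: int64(len(data))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write(data); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gz.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpackFirst reads back the first entry, which is all an agent
// fetching a single binary needs.
func unpackFirst(tgz []byte) (string, []byte, error) {
	gz, err := gzip.NewReader(bytes.NewReader(tgz))
	if err != nil {
		return "", nil, err
	}
	tr := tar.NewReader(gz)
	hdr, err := tr.Next()
	if err != nil {
		return "", nil, err
	}
	data, err := io.ReadAll(tr)
	return hdr.Name, data, err
}

func main() {
	tgz, err := packTools("jujud", []byte("fake binary"))
	if err != nil {
		panic(err)
	}
	name, data, err := unpackFirst(tgz)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, string(data)) // jujud fake binary
}
```

On the instance side the same archive unpacks with stock `tar xzf`, which is the point of preferring tar+gzip over zip here.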
<wrtp> fwereade_: i guess we should probably sign the content too.
<fwereade_> wrtp, probably sensible
<wrtp> fwereade_: yet another thing: have you got an opinion on where we should put the binaries?
<fwereade_> wrtp, not really :)
<wrtp> fwereade_: /tmp then :-)
<wrtp> fwereade_, TheMue: https://codereview.appspot.com/6081044
<fwereade_> wrtp, LGTM
<wrtp> fwereade, TheMue: https://codereview.appspot.com/6082044
<wrtp> i should probably change it to use gocheck though, i suppose
<wrtp> fwereade, TheMue: right, that's me for the week. see ya monday. have a great weekend!
<TheMue> wrtp: Have a nice weekend. gocheck idea is good. I'm already off, only looking at those two last proposals.
<wrtp> TheMue: one reason for not using gocheck is that it's a highly independent package. i quite like not depending on very much, and the testing package works fine here.
<TheMue> wrtp: Yes, you don't need it. It's only that a different maintainer later may ask himself why testing here is different than for other packages.
<wrtp> TheMue: the tomb package doesn't use gocheck, for example
<TheMue> wrtp: It's an external package, not inside the project.
<wrtp> TheMue: yeah. this might become external too.
<TheMue> wrtp: But indeed, you don't need it so far.
<TheMue> wrtp: In this case it's ok.
<fwereade> wrtp, basically LGTM, don't worry about it until monday :)
<wrtp> fwereade: minor versions can add features
<TheMue> So, last proposal done, have a nice weekend.
#juju-dev 2013-04-15
<davecheney> thumper: I think you need to repropose https://codereview.appspot.com/8701043/
<davecheney> it talks about a StringSet type
<thumper> ok...
<davecheney> but one does not exist in trunk
 * thumper proposes
<bigjools> I had no idea you loved me
<thumper> I also tweaked a few bits in the stringset file, to remove some ", _" bits
<thumper> when iterating over the dict
<thumper> bigjools: I do love you, but you are married already
<bigjools> and I have baggage
<davecheney> pick me pick me
<davecheney> i only have carry on
<thumper> davecheney: updated
<davecheney> thumper: responded
<davecheney> very nice
<davecheney> just a few things
<thumper> ta
<thumper> I'll look after I've made some lunch :)
<thumper> thinking bacon, cheese and egg toasty
<davecheney> kk
<davecheney> m_3: ping
<thumper> davecheney: it isn't possible to provide an interface for a structure that'll support iterating using range is there?
<davecheney> thumper: no sadly not, user types cannot implement whatever range wants
<thumper> :(
<thumper> davecheney: by using NewStrings in the union, intersection, difference methods, means we can directly poke the implementation map knowing that it has been initialized
<thumper> davecheney: otherwise we'd need to use the Add function that has the check every time
<davecheney> thumper: fair enough, ignore that comment
<thumper> ok
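The trade-off above — Add guarding against an uninitialized map on every call, while Union and Intersection build their result via NewStrings and can poke the map directly knowing it's initialized — looks roughly like this (a sketch, not the actual stringset code under review):

```go
package main

import "fmt"

// Strings is a map-backed string set. A user type can't satisfy range
// directly, so the internal map is what gets iterated.
type Strings struct {
	values map[string]bool
}

// NewStrings always initializes the map, so code that builds sets
// through it can write to values without a nil check.
func NewStrings(initial ...string) Strings {
	s := Strings{values: make(map[string]bool)}
	for _, v := range initial {
		s.values[v] = true
	}
	return s
}

// Add tolerates a zero-value set, paying a nil check on every call...
func (s *Strings) Add(v string) {
	if s.values == nil {
		s.values = make(map[string]bool)
	}
	s.values[v] = true
}

// ...whereas Union constructs its result with NewStrings and pokes the
// map directly.
func (s Strings) Union(other Strings) Strings {
	result := NewStrings()
	for v := range s.values {
		result.values[v] = true
	}
	for v := range other.values {
		result.values[v] = true
	}
	return result
}

func (s Strings) Intersection(other Strings) Strings {
	result := NewStrings()
	for v := range s.values {
		if other.values[v] {
			result.values[v] = true
		}
	}
	return result
}

func (s Strings) Contains(v string) bool { return s.values[v] }
func (s Strings) Size() int              { return len(s.values) }

func main() {
	a := NewStrings("one", "two")
	b := NewStrings("two", "three")
	fmt.Println(a.Union(b).Size(), a.Intersection(b).Contains("two")) // 3 true
}
```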
 * thumper continues...
<thumper> davecheney: does the juju team have a pgp key-pair for signing packages?
<thumper> davecheney: I'm thinking about the tool signing process
<thumper> davecheney: for uploading tools, and for verification
<thumper> I'm messing around with go.crypto trying to see how it works
<thumper> in particular, how we could verify keys
<davecheney> thumper: ~juju does not have keys, to the best of my knowledge
<thumper> davecheney: I've just created a 4096 bit RSA gpg key here for juju-dev
<thumper> davecheney: for my experimentation
<thumper> perhaps we could use that...
<thumper> if this works
<davecheney> sure
<davecheney> it's just entropy
<thumper> davecheney: I'm now getting go people following me on twitter...
<davecheney> this is a bad thing ?
<thumper> not necessarily
<thumper> maybe if I tweet enough, we'll get generics :)
<thumper> perhaps it is time for me to write my inaugural go blog post
<davecheney> gotta do it once a year, just like changing your jocks
<thumper> :)
 * thumper is looking for examples of someone actually using go.crypto/openpgp
<davecheney> jamespage_: ping
<davecheney> mramm said that you had mongo for quantal available
<davecheney> can you hook me up ?
<jamespage> davecheney, I've requested the backports for 12.04 and 12.10
<jamespage> ppa:james-page/mongodb2.2 contains backports for 12.04 and 12.10
<davecheney> jamespage: fantastic
<jamespage> suggest that those packages are copied to some sort of official PPA until the packages land in backports
<davecheney> ok, checking it out
<davecheney> we have ~juju/experimental as the PPA which is injected into cloudinit for P and Q machines when they are bootstrapped
 * thumper off to take the kids ice-skating...
<thumper> davecheney: just in case you know people working on go.crypto : http://stackoverflow.com/questions/16007695/verifying-a-signature-using-go-crypto-openpgp
<davecheney> jamespage: https://launchpad.net/~juju/+archive/experimental/+packages
<davecheney> is it possible to bump the build version for quantal ?
<davecheney> I need a 0ubuntu2
<davecheney> unless it is possible to delete from a PPA (which I assume is not possible)
<jam> wallyworld_: welcome back, I see you escaped the lions, but did you do so all in one piece? :)
<wallyworld_> jam: thanks. yes, all in one piece. they are quite placid, unless you are a zebra etc
<wallyworld_> jam: bloody brilliant. i apt-upgraded this morning and now it appears no usb devices can be seen :-(
<jam> wallyworld_: is your keyboard and mouse PS2 then?
<jam> that is pretty awful
<wallyworld_> jam: no, i'm referring to my headphones, and also checked with a memory stick
<wallyworld_> just now, preparing for meeting
<jam> sure
<wallyworld_> let me reboot. works for windows :-(
<jam> try another port perhaps?
<jam> :)
<wallyworld_> done that
<jam> wallyworld_: saw you for a moment there
<wallyworld_> yeah, usual mumble shittiness
<wallyworld_> still trying to make it happy
<jam> wallyworld_: I'm fine doing google chat for our 1:1's if it works better for you. It is just the standup where we can't all get along.
<wallyworld_> i'll try once more
<mramm> davecheney, thumper: hi
<davecheney> mramm: could you please expand on your previous reply
<davecheney> should I stop or continue ?
<mramm> keep working on what you are working on
<davecheney> mramm: understood
 * davecheney goes back to compiling
<mramm> I'm saying that they will be working on whatever needs working on to get stuff into the release
<davecheney> understood
<wallyworld_> jam: mumble died, sec
<jam> wallyworld_: I heard you for a second
<jam> but apparently you don't hear me
<rogpeppe> mornin' all
<rogpeppe> fwereade_: ping
<fwereade_> rogpeppe, heyhey
<rogpeppe> fwereade_: fancy a chat?
<fwereade_> rogpeppe, sure, would you start one please?
<rogpeppe> fwereade_: https://plus.google.com/hangouts/_/a978de2c1b769c0250011c76024919656376b1f8?authuser=0&hl=en-GB
<aero1> Hi all! I've been working on https://github.com/AeroNotix/hpcloud before learning about juju. Would my help be worthwhile in juju itself or do we follow different goals?
<aero1> HPCloud has some extensions to OpenStack such as RDMS etc, I have bindings to those.
<jam> aero1: do you mean 'hpcloud' vs "goose' (the openstack bindings?)
<aero1> sure
<jam> juju is more about being able to deploy ubuntu packages into the cloud.
<aero1> oh ok
<davecheney> Uploading mongodb-clients_2.2.4-0ubuntu2_amd64.deb: 2645k/19783k
<aero1> so, goose.
<davecheney> whoo
<davecheney> finally
<davecheney> only taken all day
<jam> davecheney: grats
<jam> goose is meant to be reasonably generic, I don't think we have a problem with having some extensions as well.
<davecheney> jam: this should fix the problem bootstrapping quantal nodes
<jam> aero1: but it certainly is focused on being access to any openstack deployment.
 * fwereade_ cheers at davecheney
<jam> davecheney: so the ppa had non-ssl?
<davecheney> jam: nope, bigjools fucked it up :)
<davecheney> we had a fight with LP, and lost
<bigjools> bollocks did I
<jam> bigjools: well, as long as we can blame you, it doesn't matter if you did :)
<davecheney> bigjools: https://bugs.launchpad.net/juju-core/+bug/1168196
<davecheney> the web doesn't lie
<davecheney> something is broken, you probably did it
<bigjools> sure
<bigjools> the executable had ssl options on it.  NEXT.
<davecheney> good, i'm glad we're all clear on that
<davecheney> bigjools: jam Successfully uploaded packages.
<davecheney> but it hasn't shown up on the PPA page yet
<davecheney> how long is the delay ?
<jam> davecheney: if those are source packages, you get queued into the builders, and it depends how busy they are.
<jam> Generally a couple hours.
<davecheney> this was a binary
<davecheney> i spent most of the day building it
<jam> davecheney: personally, I didn't think you could upload binaries to PPAs. I thought LP had to build it.
<davecheney> http://paste.ubuntu.com/5709833/
<davecheney> where did it go ?
<davecheney> i have to go and make dinner
<jam> davecheney: I believe the deb handler will silently ignore files it doesn't like, to avoid being an attack vector.
<davecheney> if anyone knows where that package has gone, please let me know
<davecheney> maybe i need to push my gpg key to LP
<davecheney> is there a command for that ?
<jam> davecheney: I'll ask around on Launchpad, but I'm pretty sure the default is that you can only upload source packages to LP
<davecheney> i'm following the instructions mims sent me
<davecheney> which produced the current ppa's we have in experimental
<jam> bigjools: can you confirm?
<jam> davecheney: I thought there was a <10min time for the archive to poll the new uploads before it will see them.
<jam> But that should be reasonably fast.
<jam> so we might just be waiting there.
<jam> It sounds like you should get an email
<davecheney> it's awesome that the instructions for doing this on LP are completely out of date
<davecheney> i think there was a problem with my gpg key
<davecheney> i suspect LP just threw away that upload
<davecheney> lucky(~/pbuilder/quantal_result) % dput ppa:juju/experimental mongodb_2.2.4-0ubuntu2_amd64.changes
<davecheney> Package has already been uploaded to ppa on ppa.launchpad.net
<davecheney> Nothing more to do for mongodb_2.2.4-0ubuntu2_amd64.changes
<davecheney> you mean I have to build it again !
<davecheney> FUCK
<jam> davecheney: I don't see any gpg keys here: https://launchpad.net/~dave-cheney
<jam> davecheney: so you probably do need to at least add that bit.
<jam> If it is saying "already uploaded" it may or may not still be in the queue of things to finish.
<jam> davecheney: if/when you get back, wgrant and stevenk would be interested in talking to you about it in #launchpad-dev
<danilos> davecheney, hi, I am still having test failures on raring 64bit with system mongodb: http://paste.ubuntu.com/5709879/ (if I simply change my path to first include the extracted 2.2.0 tarball binaries, then all the tests pass)
<TheMue> lunchtime
<dimitern> fwereade_: ping
<fwereade_> dimitern, pong
<dimitern> fwereade_: just run the partial implementation of openstack instance selection past you
<fwereade_> dimitern, cool, g+?
<dimitern> fwereade_: i'll start one
<dimitern> fwereade_: https://plus.google.com/hangouts/_/d16686e011ab34034d07f07641d351a34b0c6c9d?authuser=0&hl=en
<dimitern> fwereade_: https://codereview.appspot.com/8753044
<rogpeppe> fwereade_: just looking at tools.ReadList - given that we no longer fetch just the tools with a given major version from the provider, i don't think there's any particular reason, AFAICS, to have majorVersion as an argument any more. i *think* that's the only reason for ErrNoTools to exist.
<fwereade_> rogpeppe, hmm, I think it still has value, on the basis that you can only ever be legitimately interested in a single major version at a time
<fwereade_> rogpeppe, otherwise you need to remember to do a filter step everywhere that uses it (and filter needs a new field to do that)
<rogpeppe> fwereade_: i'm not sure - it seems like pre-guessing the uses it might be useful for. we might want to know which major versions are supported in an environment. and in our particular case, we want to know about other major versions so we can know if there are any tools.
<rogpeppe> fwereade_: yeah, filter would need another field
<rogpeppe> fwereade_: but that seems more intuitive, i think
<fwereade_> rogpeppe, I don't think it's ever legitimate for the CLI to upgrade juju to a non-matching major version
<rogpeppe> fwereade_: that might not be why we're calling ReadList
<rogpeppe> fwereade_: it's a utility function
<rogpeppe> fwereade_: one review, BTW: https://codereview.appspot.com/8727044/
<fwereade_> rogpeppe, tyvm
<fwereade_> rogpeppe, feels a bit speculative to me tbh
<fwereade_> rogpeppe, not saying that use case will never show up, just that it's not here now
<rogpeppe> fwereade_: i like to pare down functions to their minimum if possible. streamlining increases the possibility for elegant generality in the future
<rogpeppe> fwereade_: it feels like we're putting a special case in for this one scenario
<fwereade_> rogpeppe, well, actually, I'm preserving behaviour that, on consideration, does what we need right now
<fwereade_> rogpeppe, there was a definite temptation to take that on as well but... we wouldn't use it
<rogpeppe> fwereade_: i'm not suggesting you change the behaviour. just that i know that i added the major-version argument specifically so that we could fetch only tools that matched it. we're no longer doing that, so i think it could go. and that simplifies other stuff too, i think. but if you think it's worth it, fine.
<fwereade_> rogpeppe, I think that as it stands its benefits outweigh its costs... that's not to say that everything I've done over the last few days couldn't be done much better, ofc
<fwereade_> rogpeppe, believe it or not, I've been trying to keep this stuff focused
<fwereade_> rogpeppe, I just keep bumping up against weird little interactions
<rogpeppe> fwereade_: i guess i don't see the benefits, but i believe you!
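The filter step fwereade alludes to, which every caller would need if ReadList dropped its majorVersion argument, can be pictured with a minimal Go sketch. `Binary` and `filterMajor` here are illustrative stand-ins, not the real juju-core tools/version API:

```go
package main

import "fmt"

// Binary is a simplified stand-in for juju-core's version.Binary.
type Binary struct {
	Major, Minor int
	Series, Arch string
}

// filterMajor keeps only the tools whose major version matches — the
// explicit filter step callers would otherwise have to remember to do
// everywhere if ReadList returned all major versions unfiltered.
func filterMajor(list []Binary, major int) []Binary {
	var out []Binary
	for _, b := range list {
		if b.Major == major {
			out = append(out, b)
		}
	}
	return out
}

func main() {
	all := []Binary{
		{1, 10, "precise", "amd64"},
		{2, 0, "precise", "amd64"},
		{1, 11, "quantal", "amd64"},
	}
	fmt.Println(filterMajor(all, 1))
}
```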
<jam> dimitern: poke
<fwereade> rogpeppe, I see what you mean about the name/effect of FindBootstrapTools but I'm not sure about the alternative
<rogpeppe> fwereade: it was just one thought
<rogpeppe> fwereade: the fact that you didn't actually test that side effect is quite indicative to me that the name isn't right
<fwereade> rogpeppe, yeah, that is pretty embarrassing
<rogpeppe> fwereade: i think that making the side effect the primary purpose feels good to me, but i'm totally unsure about the right name for it
<rogpeppe> fwereade: BTW, what's the reasoning behind the "usefulVersion" name i've seen in a few places?
<fwereade> rogpeppe, "it means FFS this test depends on something, let's fake it up and move on"
<rogpeppe> fwereade: ah
<rogpeppe> fwereade: perhaps "requiredVersion" might be more helpful?
<fwereade> rogpeppe, +1, thanks
<rogpeppe> fwereade: and perhaps a comment saying what actually requires it, for future reference
<fwereade> rogpeppe, yeah, sounds reasonable
 * fwereade now has to figure them all out again ;p
 * fwereade wonders whether that'll maybe learn him
<rogpeppe> fwereade: :-)
<fwereade> wtf, the provisioner doesn't remove dead machines?
<fwereade> gaah
 * fwereade knows what he's doing *now* then
 * dimitern lunch
<jam> fwereade: going to lunch with dimitern to drink away your sorrow ? :)
<rogpeppe> fwereade: have you tested any of your recent branches live?
<fwereade> rogpeppe, yes, they appear to all do as they should
<fwereade> rogpeppe, I'll be doing a final one after whatever total set of changes land
<fwereade> rogpeppe, (land in my branches, I mean, not land in trunk)
<rogpeppe> fwereade: cool. i saw the APIInfo being erased in FinishMachineConfig and thought "huh?" but i see now why it's happening
<rogpeppe> fwereade: i've got a suggestion in my review, just coming up
<fwereade> rogpeppe, yeah, that bit's still a bit squirrely but it's better than before, I'm pretty sure
<rogpeppe> fwereade: i hope you like my suggestion, which hopefully is better still.
<fwereade> rogpeppe, cool
<fwereade> rogpeppe, I look forward to it
<rogpeppe> fwereade: i've published my comments so far: https://codereview.appspot.com/8726044/
<fwereade> rogpeppe, tyvm
<fwereade> rogpeppe, I felt like changes to MachineConfig itself were out of scope here... the changes I made will, I think, make it easier to fix MachineConfig in the future
<fwereade> rogpeppe, but there's something smarter struggling to get out, I think, I'm just not quite sure what it is yet
<rogpeppe> fwereade: hmm, i see now that you haven't changed environs/cloudinit at all
<fwereade> rogpeppe, yeah, I was just drawing duplicated behaviour together
<rogpeppe> fwereade: i really don't like FinishMachineConfig though - it feels highly squirrely
<rogpeppe> fwereade: although...
<fwereade> rogpeppe, it's a single squirrel instead of a... <looks up collective nouns...> dray, or scurry, thereof
<rogpeppe> fwereade: the basic idea of using lots of info from the config.Config to inform our cloudinit script seems dead right
<fwereade> rogpeppe, I don't claim it's anything more than that
<fwereade> rogpeppe, yeah, I'm confident it's a good direction, but only a first step
<rogpeppe> fwereade: it's just that now for any potential provider-writer, it is totally non-obvious which fields in the MachineConfig should be filled out be whom.
<rogpeppe> s/be whom/by whom/
<fwereade> rogpeppe, agreed, but MachineConfig has pretty solid validation, so I'm not too bothered there
<rogpeppe> fwereade: that's not the point!
<fwereade> rogpeppe, it all had to be cargo-culted from ec2 for the existing ones anyway, really
<rogpeppe> fwereade: it still does
<rogpeppe> fwereade: but the amount of code copied will be smaller, which is good
<fwereade> rogpeppe, precisely my point, no worse than before ;p
<rogpeppe> fwereade: i'm not entirely sure.
<rogpeppe> fwereade: you're probably right.
<rogpeppe> fwereade: but it doesn't feel like a proper simplification
<rogpeppe> fwereade: and there's definitely one to be made.
<fwereade> rogpeppe, totally agreed
<fwereade> rogpeppe, I was trying to do that before atlanta but got too tangled
<rogpeppe> fwereade: basically the MachineConfig fields are parameters to cloudinit.Configure and now half of them are redundant
<fwereade> rogpeppe, well, none are *redundant* yet, but they're plainly due for the chopping block
<rogpeppe> fwereade: they are redundant, i think - it would be wrong if a provider didn't call FinishMachineConfig immediately before Configure.
<rogpeppe> fwereade: here's a suggestion:
<rogpeppe> fwereade: don't remove any fields from MachineConfig
<rogpeppe> fwereade: but...
<rogpeppe> fwereade: move FinishMachineConfig into cloudinit.Configure
<rogpeppe> fwereade: and mark the fields that it fills out as deprecated in the MachineConfig
<rogpeppe> fwereade: i would be much happier with that
<rogpeppe> fwereade: and the providers would be simpler still.
<rogpeppe> fwereade: and they wouldn't need to change when the fields go
<fwereade> rogpeppe, I really just want to avoid touching environs/cloudinit in this pass through the code
<rogpeppe> fwereade: ok, then please leave a TODO comment on FinishMachineConfig.
<fwereade> rogpeppe, they were generated in a disturbing fashion; now they are generated in a slightly less disturbing one; that's all :)
<fwereade> rogpeppe, sgtm
<jam> wallyworld: poke
<rogpeppe> fwereade: rest of comments published
<rogpeppe> fwereade: interestingly, sync-tools is probably a place where we *do* want to do a ReadList across major versions. there's no particular reason why we only want to copy tools with a single major version AFAICS.
<TheMue> fwereade: do i get it right that regarding the units and the relations "only" the relation-errors are interesting?
<fwereade> rogpeppe, my view is that a given CLI major version can and will, for now, only interact with tools sharing a major version
<fwereade> TheMue, we don't really have relation errors
<rogpeppe> fwereade: even to the extent of refusing to copy tools for other major versions? i think that's a bit gratuitous - we see them but we can't touch them.
<fwereade> rogpeppe, we can't deploy them either, so...
<rogpeppe> fwereade: someone else might be able to though
<fwereade> rogpeppe, then they can run sync-tools?
<rogpeppe> fwereade: maybe they want to use a public bucket?
<fwereade> rogpeppe, when I hear a user complaining that they need to run two different major versions of juju out of the same bucket, but *don't* have CLI tools for all the major versions they are running, I will ask the user how they plan to run the alternate major versions without suitable CLI tools
<rogpeppe> fwereade: it means that if you're an admin and want to copy several versions into a public bucket, you have to have a separate juju binary for each major version, and run it that many times.
<TheMue> fwereade: aha?!? that's the only stuff generated out of the relations and the relation service map in Python I can find. *wonder*
<rogpeppe> fwereade: the point is that you can use sync-tools for many users
<fwereade> TheMue, I'm sure we had a chat about relation errors
<rogpeppe> fwereade: if there wasn't a --public flag i'd tend to agree
<fwereade> TheMue, does it not check for unit presence in the relation? I'm pretty sure it does
<fwereade> rogpeppe, I see the public flag as a convenience for users who want their tools shared across a few of their own environments at this stage
<fwereade> rogpeppe, it's probably worth revisiting when we get another major version though :)
<fwereade> TheMue, coming up with a sensible plan (and justification for it) is your main job here I think
<TheMue> fwereade: _process_unit passes the relations and the service map to _process_unit_relations, and there below unit only relation errors are generated as output
<fwereade> TheMue, that's frickin' awesome news
<fwereade> TheMue, nothing to do for units
<fwereade> TheMue, yay!
<TheMue> fwereade: afaics yes. i'm checking the tests, but there also only relation-errors are mentioned
<fwereade> TheMue, (incidentally, this informs my suspicion of specs -- there's a python doc explaining in detail why they had to make a backward-incompatible change to unit relation status output, describing relation-errors plus a bunch of other things -- and I didn't check the code for that bit)
<fwereade> TheMue, it now emerges it isn't there, I agree
<TheMue> fwereade: *phew* ;)
<rogpeppe> fwereade: you've got another review: https://codereview.appspot.com/8748046/
<TheMue> fwereade: btw, the change after your last review is proposed again
<rogpeppe> fwereade: i'm wondering whether to put juju-wait in its own repo, or make a repo for several like-minded tools. juju-utils, perhaps?
<dimitern> rogpeppe: we already discussed having something like lp:juju-core-tools, there's even a card for that
<rogpeppe> dimitern: ha. too late!
<rogpeppe> dimitern: https://launchpad.net/juju-utils
<dimitern> rogpeppe: :) the card can be changed as well
<fwereade> rogpeppe, lovely, thanks
<dimitern> rogpeppe: it's in Blue backlog btw
<rogpeppe> fwereade, mramm: https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083
<rvba> fwereade: here it is: https://code.launchpad.net/~rvb/juju-core/fix-maas-pvd-conf/+merge/158938
<rogpeppe> lunch
<fwereade> rvba, thanks
<TheMue> fwereade: thx for review. sorting is IMHO only interesting for testing, so you can exactly test against the expected value
<fwereade> TheMue, it's the dupes, not the sorting
<fwereade> TheMue, I kinda feel that if there's a dupe it's probably meant to be there
<fwereade> TheMue, or even if it represents total crack, we should show it
<TheMue> fwereade: ok, so i'll change it into a sorted slice again and we'll see what happens during the audit
<TheMue> fwereade: ;)
<fwereade> TheMue, well, I was suggesting we keep it out of paranoia but flag it with a TODO for later investigation
<fwereade> rvba, hey, I have a theory re mysql/mediawiki: was mediawiki in an error state trying to leave the relation?
<rvba> fwereade: you mean after I ran "juju destroy mysql"?
<fwereade> rvba, yeah
<dimitern> TheMue: how about peer relations?
<rvba> I don't know but it happens that I have this setup in the lab right now http://paste.ubuntu.com/5710574/ so I can try to destroy mysql and have a look…
<TheMue> dimitern: just seen your review, thanks, will add.
<fwereade> rvba, I would be interested to see what happens, yeah, because I *do* have a local tweak to the code in mine... that I *think* is actually not necessary
<fwereade> dimitern, TheMue: good point re peers
<fwereade> TheMue, should just need a test; but also, subordinates; but please land this one as-is (just a peer relation test should be fine)
<dimitern> TheMue: yeah, peer relations are simple enough to make it a separate test case; this one is already complex enough
<TheMue> fwereade, dimitern: sure, won't make it larger
<fwereade> rvba, ah, hell, it has finally clicked
<fwereade> rvba, you *do* need my code change
<rvba> fwereade: here is the state I'm in now, with links to all the logs. http://paste.ubuntu.com/5710599/
<dimitern> fwereade: 3 reviewed, 2 to go
<fwereade> rvba, but even with that, if mediawiki has a hook error on mysql's departure, it won't drop its reference to that relation until it's resolved
<fwereade> rvba, and that will keep the service alive
<fwereade> rvba, nothing to do with your unit situation
<dimitern> fwereade: if you don't mind, I'll wait for your https://codereview.appspot.com/8726044/ to land first, and then I'll land my openstack constraints branch
<fwereade> dimitern, +1, tyvm
<rvba> fwereade: right.
<fwereade> rvba, ok, sorry for crack suggestion just now, but I think I know (1) what's wrong and (2) how to fix it, and will hopefully nail (3) how to test it before too long
<rvba> dimitern: Thanks for the review of lp:~rvb/juju-core/fix-maas-pvd-conf.  Would you mind landing it for me please?  lbox refuses to cooperate with me.
<dimitern> rvba: sure, I'll pull your branch and land it for you, if you're fine with that
<rvba> dimitern: that would be great, thanks!
<rvba> fwereade: cool… once you have a branch ready, I can help with the real-world testing if you want.
<fwereade> rvba, I've approved https://code.launchpad.net/~rvb/juju-core/fix-maas-pvd-conf/+merge/158938 -- I think it's a trivial
<fwereade> rvba, go ahead and land it :)
<dimitern> fwereade, rvba: I'll land it for him
<dimitern> fwereade: (problems with lbox)
<dimitern> rvba: here it is: https://codereview.appspot.com/8772043
<fwereade> dimitern, ofc, thanks
<dimitern> rvba: take a look if it's ok, I'll submit it
<dimitern> fwereade: set the old MP to Rejected because of the above
<fwereade> dimitern, sgtm, just comment it
<dimitern> fwereade: yeah, did and linked the new CL in the old MP
<fwereade> dimitern, <3
<rvba> dimitern: looks good, thanks for doing this.
<dimitern> rvba: np, lbox is a bitch at first, I know :)
<rvba> dimitern: if that error rings a bell: http://paste.ubuntu.com/5710470/, please do advise :)
<dimitern> rvba: rings a bell, but not sure how. I had to patch lbox locally to use it, I can send you a diff to see if it works for you
<rvba> dimitern: that would be great… I was also thinking about manually hacking lbox's inferPushURL() method…
<dimitern> rvba: actually, it's not lbox - it's lpad, which lbox uses to communicate with lp
<dimitern> rvba: http://paste.ubuntu.com/5710712/
<rvba> dimitern: ta
<dimitern> rvba: apply this (well, change dimitern to match your lp id) and see if it helps - you can try a simple branch and "lbox propose -wip"
<rvba> dimitern: will do, thanks again.
<dimitern> rvba: just a reminder, after patching, run "cd $GOPATH/src/launchpad.net/lbox && go install"
<fwereade> dimitern, rvba: https://codereview.appspot.com/8748048
<fwereade> rvba, confirmation that it works in your exact scenario would be appreciated, but I'm pretty confident
<fwereade> bbiab
<rvba> fwereade: dimitern: I'm testing this in the lab right now…
<dimitern> fwereade: LGTM
<rogpeppe> fwereade: is there a ticket for the python/go environments.yaml incompatibility issue?
<rogpeppe> fwereade: i'm having difficulty reproducing the problem
<rogpeppe> mgz: ^
<dimitern> danilos: I see you have a kanban account already
<dimitern> rogpeppe: is a defer statement function-scoped or block-scoped?
<rvba> fwereade: doesn't look like your fix worked: http://paste.ubuntu.com/5710819/
<rogpeppe> dimitern: the former
<dimitern> rogpeppe: I thought so, although seeing it in an if block within a function raised an eyebrow
<rogpeppe> dimitern: that's just fine
<rogpeppe> dimitern: can be in a for loop too...
<dimitern> rogpeppe: ah, cool - although potentially messy (for i=1-1000 defer something each time..)
<dimitern> rvba: I can see the unit is removed, but the service seems still there
<rvba> dimitern: yep
<dimitern> rvba: that wasn't the case before, right? both the unit and the service were in the status output
<rvba> dimitern: indeed (that was before the fix: http://paste.ubuntu.com/5710599/)
<dimitern> fwereade , rvba: I think the fix should include removeOps in the amended case, like we're doing a bit further up in the "if s.doc.Life == Dying && s.doc.RelationCount == 0 && s.doc.UnitCount == 1" case
<dimitern> rvba: removeOps is responsible for removing the service itself; the change is in removeUnitOps, which only removes the unit (except in the aforementioned case)
<mgz> rogpeppe: there is a ticket, that might have been overly general, then closed when a specific issue was fixed
<dimitern> rvba: added comment to the proposal
<rogpeppe> mgz: do you know of a specific problem?
<rogpeppe> mgz: as far as i can tell, go juju will parse unknown environment attributes without complaint
<rogpeppe> mgz: but i'm pretty sure that's actually not the case, and that this really is a genuine issue
<rogpeppe> dimitern: you chatted to ian about this - do you know anything about the problem?
<dimitern> rogpeppe: it should ignore unknown keys (like "placement: local"), but barf on known keys whose type was changed
<dimitern> rogpeppe: we didn't discuss specifics yet, and from the kanban chart I can see danilos actually picked that one now
<rogpeppe> dimitern: ah, do you mean we want a *single environment entry* to work with both go and python juju?
<dimitern> rogpeppe: i'm sure that's not what we need - not for a bootstrapped env for sure
<dimitern> rogpeppe: but for a fresh one, perhaps
<rogpeppe> dimitern: i don't know. seems like a somewhat dubious requirement to me.
<rogpeppe> dimitern: if that's really what the requirement is
<dimitern> rogpeppe: we'll never make existing py-juju envs work with go-juju, but at least trying to do something with an existing py-env.yaml should fail gracefully, I think
<rogpeppe> dimitern: i'm not sure what you mean by "fail gracefully" there
<dimitern> rogpeppe: like producing a meaningful error
<dimitern> rogpeppe: can we detect a py env with certainty from the yaml only?
<rogpeppe> dimitern: only with dodgy heuristics, probably
<rogpeppe> dimitern: basically, i can't see this as a high priority task
<dimitern> rogpeppe: i think the general idea, after having py/go juju co-installable, is to detect incompatible yaml and report it early
<rogpeppe> dimitern: but i'd really like to talk to someone for whom this is a priority
<rogpeppe> dimitern: we do that
<dimitern> rogpeppe: how does it look?
<rogpeppe> dimitern: you get an error like:
<rogpeppe> error: placement: expected nothing, got "345"
<dimitern> rogpeppe: and how is this helpful?
<dimitern> rogpeppe: for the user
<rogpeppe> dimitern: it's detecting incompatible yaml and reporting it early...
<dimitern> rogpeppe: but not in a way that's meaningful for the user
<rogpeppe> dimitern: the error message could probably do with some work, it's true
<rogpeppe> dimitern: i'd like us to work on stuff that is important functionally as we've only got hours remaining
<dimitern> rogpeppe: put yourself in the user's shoes - the error should provide a hint what's actually wrong and how to fix it
<rogpeppe> dimitern: i agree
<rogpeppe> dimitern: but i wouldn't put this at the top of the list of things that need solving *now*
<dimitern> rogpeppe: +1, not sure how and why this is high prio
<rogpeppe> dimitern: also, i think that the whole idea behind all the JUJU_HOME work was so that we would not have to share the environments.yaml file!
<rogpeppe> dimitern: and that added loads of complexity to the system
<rogpeppe> dimitern: and now we're saying that it's not enough
<dimitern> rogpeppe: we discussed this on the standup; jam said he suggested having a different name/path for go-env.yaml, but it was rejected
<rogpeppe> dimitern: i also think that's the wrong solution
<rogpeppe> dimitern: a) we *can* currently use the same environments.yaml, AFAIK (but you can't have the same environment entry for both go and py juju)
<rogpeppe> dimitern: b) we have JUJU_HOME
<dimitern> rogpeppe: agreed, on both points (except for the more helpful error msg)
<rogpeppe> m_3, hazmat: do you know anything more about this issue, by any chance?
<rogpeppe> dimitern: which ticket on the kanban board are you referring to?
<dimitern> rogpeppe: "environments.yaml should cope with pyjuju and gojuju environments"
<dimitern> rogpeppe: the description seems to shed some light on the actual issue as well
<fwereade> rvba, working as intended according to that paste
<rogpeppe> dimitern: the problem is that the "best compromise" is exactly what we do now
<rogpeppe> dimitern: i fixed the issue, was writing tests, and then realised that
<fwereade> rvba, the errored mediawiki is keeping a ref to the relation, which is keeping a ref to the service
<dimitern> fwereade: ah, ok then
<rogpeppe> dimitern: AFAICS the description "juju-core will fail to start if there are any unrecognized keys" is not true
<fwereade> rvba, if you resolve mediawiki/0, mysql should go away
<fwereade> gtg again
<dimitern> rogpeppe: it's still dubious imho
<rogpeppe> dimitern: what's still dubious?
<rogpeppe> dimitern: our error messages?
<dimitern> rogpeppe: that's for sure, but the "fail to start" as well
<rogpeppe> dimitern: we don't fail to start if there are any unrecognised keys in environments.yaml
<dimitern> rogpeppe: start=? bootstrap? connect? return an error on any cmd?
<rogpeppe> dimitern: any of the above
<dimitern> rogpeppe: so it seems we're good then
<rogpeppe> dimitern: when the unrecognised keys aren't part of the chosen environment
<rogpeppe> dimitern: exactly
<rogpeppe> dimitern: but perhaps i'm wrong and there's a subtle issue i've missed
<rogpeppe> dimitern: which is why i want to speak to someone that's experienced the issue
<rogpeppe> dimitern: before dismissing the ticket
<dimitern> rogpeppe: it's probably worth a @juju-dev post
<rogpeppe> dimitern: yeah
<dimitern> eod for me
<dimitern> good night all!
<rogpeppe> fwereade: does this LGTY in its new location? https://codereview.appspot.com/8565044/
<m_3> rogpeppe: hey
<rogpeppe> m_3: hiya
<m_3> did you get the answer you needed about shared envs?
<rogpeppe> m_3: not really
<m_3> 'origin: ppa' is usually the biggest difference atm
<rogpeppe> m_3: do you want to be able to use the same environment entry in both go and py juju?
<rogpeppe> m_3: i'm not entirely sure whether that's a goal worth striving for
<m_3> rogpeppe: yeah, that'd be the preference, but not too high of a priority
<m_3> rogpeppe: let's say goju envs and pyju envs in the same file
<m_3> rogpeppe: but not the same envs
<rogpeppe> m_3: we can do that currently
<rogpeppe> m_3: unless you know something that i don't...
<m_3> i.e., we don't want to promote people trying to use the same environment
 * rogpeppe realises that m_3 knows *loads* of things that he doesn't
<rogpeppe> m_3: yeah, that's what i'm thinking
<rogpeppe> m_3: so by my thinking there's nothing that currently needs to be solved
<m_3> lemme check to see if I can use them together
<rogpeppe> m_3: thanks
<m_3> I've been using dedicated environments.yaml files to date
<m_3> until package changes (with update-alternatives) lands
<rogpeppe> m_3: BTW if you have some moments free, i'd very much appreciate your thoughts on the new juju-wait command, which i *think* is just about as we discussed: https://codereview.appspot.com/8565044/
<m_3> rogpeppe: looks like they can coexist in the same file
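The layout m_3 confirms works — separate py-juju and go-juju entries in one environments.yaml, never sharing a single entry — would look roughly like this. This is an illustrative sketch only; the key names are from memory of that era, not an authoritative reference:

```yaml
# One file, two environments: each CLI only touches its own entry.
environments:
  py-prod:            # used only by python juju
    type: ec2
    control-bucket: py-prod-bucket
    juju-origin: ppa  # the "origin: ppa" difference m_3 mentions
  go-prod:            # used only by go juju
    type: ec2
    control-bucket: go-prod-bucket
    admin-secret: 0123456789abcdef
```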
<rogpeppe> m_3: cool
<rogpeppe> m_3: you can try out the juju-wait cmd by doing: go get launchpad.net/~rogpeppe/juju-utils/000-juju-wait/cmd/juju-wait
<m_3> no need to change anything atm then
<rogpeppe> m_3: great!
<rogpeppe> m_3: although you probably want to delete $GOPATH/src/launchpad.net/~rogpeppe afterwards...
<m_3> rogpeppe: will that change enough that I should tear the env down before go getting?
<rogpeppe> m_3: nope
<rogpeppe> m_3: it should just work
<m_3> ack
<m_3> I'll try it in scale stuff now then
<rogpeppe> m_3: brill
<m_3> gotta rebuild tho?
<rogpeppe> m_3: nope
<rogpeppe> m_3: that one command should have made "juju-wait" available
<rogpeppe> m_3: assuming $GOPATH/bin is in your PATH
<m_3> rocking
<rogpeppe> m_3: although...
<rogpeppe> m_3: it depends what revno of juju-core you're on
<rogpeppe> m_3: and what version you've bootstrapped
<rogpeppe> m_3: if you've bootstrapped 1127 or later, you should be good
<m_3> yup
<rogpeppe> fwereade: ping
<fwereade> rogpeppe, pong
<m_3> sorry.... suuuuper-latent
<fwereade> rogpeppe, supper is just here
<rogpeppe> fwereade: ok, back later?
<fwereade> rogpeppe, definitely
<rogpeppe> fwereade: cool, ping us then
 * rogpeppe smells curry odours drifting up the stairs
<m_3> rogpeppe: http://paste.ubuntu.com/5711150/
<rogpeppe> m_3: ah!
<rogpeppe> m_3: trivially fixed
<rogpeppe> m_3: try go get -u launchpad.net/~rogpeppe/juju-utils/000-juju-wait/cmd/juju-wait
<rogpeppe> m_3: and try the command again
<rogpeppe> m_3: i had forgotten to import all the providers
<rogpeppe> m_3: it's an interesting point actually - we probably want to make it easier for commands to import all providers.
<m_3> rogpeppe: trying
<rogpeppe> m_3: any joy?
<m_3> rogpeppe: still waiting
<m_3> rogpeppe: it seems to be waiting on hadoop-slave/99
<m_3> so now I'm just waiting for provisioning
<m_3> might have to do something with secgroups
<m_3> not sure atm
<rogpeppe> m_3: a good first test is to juju-wait for something that you already know is in the state you're waiting for
<m_3> lemme try it on a lower number :)
<rogpeppe> m_3: you can do it without interrupting the other one
<m_3> rogpeppe: seems to be beautiful
<rogpeppe> m_3: there's a known provisioning problem at the moment - if the provisioner gets a temporary error, it just marks the machine with the error and never retries
<rogpeppe> m_3: excellent!
<fwereade> rogpeppe, ping
<rogpeppe> fwereade: pong
<fwereade> rogpeppe, more-or-less back
<rogpeppe> fwereade: i'm just implementing service config settings in the AllWatcher
<rogpeppe> fwereade: i just wanted to run my plan by you in case it's crack
<rogpeppe> fwereade: i'll watch the settings collection; if i see a service settings change, i look at the current service info that i've got stored; if the settings are for a different url, then i ignore the change, otherwise i set it.
<rogpeppe> fwereade: when i see a new service, i fetch the settings too
<rogpeppe> fwereade: basically a similar approach to the constraints
<rogpeppe> fwereade: does that sound more or less right?
<fwereade> rogpeppe, hmm, have you considered per-charm settings?
<rogpeppe> fwereade: that's the "if the settings are for a different url" part
<fwereade> rogpeppe, sorry, I did not read as closely as I should have: yes, that sounds perfectly correct
<rogpeppe> fwereade: great
<fwereade> rogpeppe, service existence should guarantee settings existence for its charm url
<rogpeppe> fwereade: i'm not sure
<rogpeppe> fwereade: i don't think there's an ordering guarantee on transaction ops
<fwereade> rogpeppe, they execute in the order passed, AIUI
<rogpeppe> fwereade: i'd like to know that for sure. i thought transactions with ops in different collections could execute concurrently.
<rogpeppe> fwereade: i've been assuming there's no such guarantee
<rogpeppe> fwereade: it makes my life easier if there is such a guarantee, but i'd like chapter and verse from the txn package before i change things accordingly.
<fwereade> rogpeppe, this is one where I fall back to argument from authority rather than experience -- niemeyer said they execute in order, and I believe him
<niemeyer> fwereade, rogpeppe: you can trust it.. it'd be a very weak system otherwise
<niemeyer> fwereade, rogpeppe: Silly example: a bank transfer might temporarily have more total money than exists
<rogpeppe> niemeyer: ok. it might be worth adjusting the documentation for txn.Runner.Run then.
<rogpeppe> niemeyer: it doesn't mention ordering at all
<rogpeppe> niemeyer: "
<rogpeppe> Operations across documents are not atomically applied, but are
<rogpeppe>     guaranteed to be eventually all applied or all aborted
<rogpeppe> "
<rogpeppe> niemeyer: i'd read that as "no ordering guaranteed"
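The ordering guarantee niemeyer confirms can be pictured with a toy runner. `op` and `apply` are simplified stand-ins for mgo/txn's richer API, included only to illustrate why fwereade can rely on settings existing before the service that refers to them:

```go
package main

import "fmt"

// op is a minimal stand-in for a transaction operation: insert a
// document with the given id into the given collection.
type op struct{ collection, id string }

// apply executes ops strictly in the order passed. Because the settings
// insert precedes the service insert, no observer can ever see the
// service without its settings document already existing.
func apply(db map[string][]string, ops []op) {
	for _, o := range ops {
		db[o.collection] = append(db[o.collection], o.id)
	}
}

func main() {
	db := map[string][]string{}
	apply(db, []op{
		{"settings", "s#wordpress#cs:precise/wordpress-1"}, // created first
		{"services", "wordpress"},                          // refers to the settings doc
	})
	fmt.Println(db["settings"], db["services"])
}
```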
<rogpeppe> niemeyer: BTW in benchmarks, i'm only seeing 20-30 transaction-based operations per second. is there some way we can speed that up, or is that to be expected?
<niemeyer> rogpeppe: It's significantly faster than that..
<niemeyer> rogpeppe: IIRC, txn can pump about ~250 operations per second
<niemeyer> rogpeppe: Of course, that depends on the operations performed
<TheRealMue> so, time to stop, cu tomorrow
<niemeyer> rogpeppe: In fact, it's actually better than this
<niemeyer> rogpeppe: My original tests were showing about 200 *transactions* per second
<niemeyer> rogpeppe: So that was at least double that in operations, or about 500 operations per second
<niemeyer> rogpeppe: If transactions increase the number of operations, naturally that will impact the volume of *transactions* you get
<m_3> hey, can somebody help me turn off secgroup per machine?... just temporarily for scale testing
<rogpeppe> niemeyer: there's a benchmark in state that i'd be interested in trying to improve
<rogpeppe> niemeyer: BenchmarkAddUnit
<niemeyer> rogpeppe: Sure, so.. hmm.. improve it..? :)
<rogpeppe> niemeyer: i can call AddUnit about 47 times a second
<rogpeppe> niemeyer: (as of just now)
<m_3> I'll let dave check this for profiling, but just wanna get past this hurdle
<rogpeppe> niemeyer: it's true that AddUnit is reasonably complex (it does a fetch, then a transaction), but it's not unrepresentative of the kinds of things we'll want to do
<rogpeppe> niemeyer: and adding 10000 units is something we do want to do
<niemeyer> rogpeppe: Sure
<niemeyer> rogpeppe: +1 :)
 * rogpeppe goes for something to eat
<rogpeppe> back in a bit
<fwereade> m_3, firewall-mode: global, I *think*
<fwereade> m_3, you definitely need a fresh environment for that though
<fwereade> m_3, I don't know how the firewaller will react to a mode switch but it is very unlikely to be pretty
<fwereade> m_3: yeah, firewall-mode: global should do it
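In environments.yaml terms, the switch fwereade suggests (on a fresh environment) is a single extra key. An illustrative fragment, assuming an ec2-style entry:

```yaml
environments:
  scale-test:
    type: ec2
    firewall-mode: global  # one shared security group instead of one per machine
```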
<rogpeppe> back
<m_3> fwereade: got it thanks... that got us past the hump
<fwereade> m_3, sweet
<m_3> next hump is ram limits on the acct :(
<m_3> rogpeppe: wait working like a champ
<rogpeppe> m_3: that's very good to know!
<rogpeppe> m_3: will submit soon. then it'll be go get launchpad.net/juju-utils/cmd/juju-wait
<rogpeppe> fwereade: oh darn, i've forgotten the other wrinkle - watching the settings collection doesn't give you values which are defaulted. i guess i'll need to keep all the charm metadata around and merge each time.
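The merge rogpeppe describes — overlaying explicitly-set values on the charm's config defaults, because the settings document alone omits defaulted values — is a simple map-over-map operation. The names here are illustrative stand-ins, not the juju-core API:

```go
package main

import "fmt"

// mergeSettings overlays explicitly-set values on top of the charm's
// config defaults. The watcher would keep the charm metadata around and
// redo this merge whenever the settings document changes.
func mergeSettings(defaults, set map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(defaults))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range set {
		out[k] = v // explicit settings win over defaults
	}
	return out
}

func main() {
	defaults := map[string]interface{}{"port": 80, "name": "wiki"}
	set := map[string]interface{}{"port": 8080}
	fmt.Println(mergeSettings(defaults, set))
}
```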
<thumper> morning
<rogpeppe> thumper: hiya
<thumper> hi rogpeppe
<rogpeppe> thumper: i'd appreciate a review of this branch, if you have a moment: https://codereview.appspot.com/8761045/
<rogpeppe> m_3: juju-wait has landed
<m_3> rogpeppe: ack, thanks!
<rogpeppe> m_3: np
<rogpeppe> m_3: is it working?
<rogpeppe> m_3: (i mean juju-core in general, really)
<thumper> rogpeppe: sure, just looking at one for fwereade_
<rogpeppe> thumper: np
<m_3> rogpeppe: we're waiting to get past some more acct limits
<m_3> but great so far
<rogpeppe> m_3: that's good to hear
 * thumper goes to make coffee and bagel
<m_3> rogpeppe: we're gonna do the wed charmschool with juju-core
<m_3> I think
<m_3> barring anything we run into tomorrow
<rogpeppe> m_3: that will be interesting
<m_3> I'll do a dry-run tomorrow to decide
<rogpeppe> right, that's me done and dusted
<rogpeppe> g'night all
<thumper> night
#juju-dev 2013-04-16
<davecheney> m_3: ping
<davecheney> bigjools: LP keeps eating my package
<davecheney> is there any log of what or why ?
<davecheney> hang on
<davecheney> LP says I have no pgp keys registered ...
<m_3> davecheney:
<m_3> yo
<m_3> davecheney: so good news... just about to spin 200 nodes
<m_3> davecheney: btw, we got approval for 2k as soon as hp catches up
<fwereade> m_3, cool, has anything fallen over yet?
<m_3> davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game
<m_3> (afaik)
<m_3> fwereade: nope, only at 100 atm
<m_3> fwereade: have 200-node answers shortly
<davecheney> m_3: "10:24 < m_3> davecheney: btw, no log whatsoever... just email half an hour later saying it failed... til then, guessing game"
<davecheney> ^ what does this mean ?
<m_3> davecheney: lemme know when you can play... I'm just bouncing things around atm, but plan to hand it to you in an hour or two
<fwereade> m_3, excellent
<davecheney> m_3: soon
<m_3> davecheney: oh, sorry, that was in response to package uploads
<davecheney> just getting fucked by pgp and launchpad at the moment
<m_3> davecheney: ack
<m_3> feel your pain
<davecheney> best I can tell, it is just throwing away my upload because my pgp keys were wrong
<davecheney> m_3: what is the url of the host ?
<davecheney> i'll shoulder surf
<m_3> fwereade: just the sensitivity to rate limiting... makes this soooo much more pleasant than before
<m_3> davecheney: same as before... /me looks
<m_3> ubuntu@15.185.162.247
<m_3> davecheney:
<m_3> ^^
<m_3> davecheney: `tmux attach`
<m_3> davecheney: sorry, can't do voice atm
<bigjools> davecheney: https://answers.launchpad.net/launchpad/+faq/227
<davecheney> m_3: that is fine
<davecheney> bigjools: ack
<davecheney> bigjools: * If the upload is signed, even if it gets rejected by packaging-inconsistencies, you should receive an email explaining the reasons within 5 minutes.
<davecheney> ^ never happens
<fwereade> davecheney, you might have a particular interest in https://codereview.appspot.com/8786043 because it hits the provisioner
 * davecheney looks
<fwereade> rogpeppe, if you're on, and/or thumper, ^^
<bigjools> davecheney: "You probably have not signed the upload, or have not signed it with a GPG key registered for your Launchpad account"
<thumper> fwereade: s'up?
<fwereade> thumper, https://codereview.appspot.com/8786043
<davecheney> m_3: turned off all that debug shit
<davecheney> m_3: purdy
<m_3> davecheney: totally want an ncurses ui
<m_3> like htop
<m_3> juju-top
<thumper> fwereade: I'll look when I'm done with the current train of thought
<m_3> jcastro says "hi"
<fwereade> thumper, lovely, thanks
<thumper> m_3: where is jcastro?
<fwereade> hi jcastro
<m_3> crap latency killing us
<m_3> openstack devel summit
<m_3> davecheney: can you ctrl-c that tail?
<m_3> nm
<m_3> now it's a waiting game
<m_3> davecheney: http://15.185.169.172:50070/
<m_3> "Live Nodes"
<m_3> that's when they show up from the relation
<davecheney> 52 ... not bad
<m_3> coming up nicely
<m_3> davecheney: feel free to turn on the tail when you want... just turn it off when you're done cause it clogs up my pipes
<davecheney> m_3: i followed your package build instructions
<m_3> :)
<m_3> davecheney: and?
<davecheney> but LP is shitty at me because it has produced a mixed upload
<davecheney> contains both src and bin
<m_3> working? or just stuck on dput and lp?
<m_3> oh, right
<m_3> so the pbuilder-dist stuff is _only_ to test it out
<davecheney> riiight
<m_3> when it comes time to dput it to lp... just use the debuild
<m_3> davecheney: I think the last email in the chain of three or so I sent the other day has all you need
<davecheney> that might be where I am going wrong
<davecheney> i hav been working off the first
<m_3> davecheney: yeah, sorry
<davecheney> s'ok
<davecheney> its not your fault
<m_3> davecheney: that's the dev process... build and test
<m_3> there's probably a way to just upload the source bits to lp
<m_3> but shit I don't know
<m_3> davecheney: so I'm currently planning on _starting_ a terasort once the 197 slaves are up
<m_3> won't let that one finish or run for too long
<m_3> once that's working, then I'll turn it all over to you
<davecheney> m_3: ok, what are the rules about shutting it down ?
<davecheney> we're paying for this right ?
<m_3> play at will... current limits to 200, but that might bump to 2000 as early as a few hours
<m_3> davecheney: we're paying yes
<m_3> davecheney: just destroy it when you're not actively testing something
<davecheney>  7863 root      20   0 1035m 317m    0 S   25 15.8   2:28.72 mongod
<davecheney>  7892 syslog    20   0  331m 1748 1212 S    4  0.1   0:34.75 rsyslogd
<davecheney>  7903 root      20   0  676m 118m 6712 S    1  5.9   0:13.78 jujud
<davecheney> top three processes on the bootstrap machine
<davecheney> fwereade: we have to turn down all the document logging bullshit
<davecheney> rsyslog is nearly the top process on the bootstrap machine
<fwereade> davecheney, dammit, I just wish we had slightly more sophisticated logging so we could trun that stuff on when we need it
<davecheney> juju-goscale2-machine-0:2013/04/16 00:29:34 DEBUG state/watcher: got request: watcher.reqWatch{key:watcher.watchKey{c:"machines", id:interface {}(nil)}, info:watcher.watchInfo{ch:(chan<- watcher.Change)(0xf840220a50), revno:0}}
<davecheney> ^ i'm sure we do not need this crap
<fwereade> davecheney, I actually use it somewhat regularly... it has useful information buried in amongst the spam
<fwereade> davecheney, however
<fwereade> davecheney, it *is* fricking ridiculous
<davecheney> fwereade: i've seen in other places
<davecheney> DEBUG2 and TRACE
<davecheney> i think the watcher stuff could be classed as TRACE
<fwereade> davecheney, yeah, that sounds reasonable, but we don't have any useful filtering gubbins regardless
<davecheney> m_3: looks pretty decent to me
<davecheney> mongo is taking a pounding
<davecheney> but the jujud process is basically idle (although it may be blocking on mongo)
<davecheney> m_3: actually at the 200'th node is the most important time
<fwereade> davecheney, however, so long as it's not *too* difficult to turn it back on I would trivial LGTM something that turned off the watcher stuff
<davecheney> every new machine in the environment adds a worker which is racing to complete any outstanding transaction
<davecheney> so the more workers, the bigger the race
<davecheney> this is lower case race, for those watching at home
<fwereade> davecheney, I would consider "s/false/true/ somewhere and upload new tools" to be not *too* difficult
<fwereade> davecheney, yeah, I have been wondering about how those would end up
<davecheney> fwereade: yeah, we can hack it for load testing
<fwereade> davecheney, although it's not *any* outstanding transaction
<davecheney> fwereade: really ?
<fwereade> davecheney, yeah, just one that's blocking one it wants to make
<davecheney> ohhh, so if you are not actively waiting on a transaction to complete
<davecheney> you don't participate
<davecheney> that makes it a lot better
<fwereade> davecheney, however certain documents are much too popularly written
<davecheney> m_3: i think some of the delay in juju status is too many round trips
<fwereade> davecheney, I *suspect* that contention for the service document of whatever has lots of units is the real killer
<fwereade> davecheney, I would be very interested to know how 1x200 looks vs 10x20
<davecheney> fwereade: understood
<davecheney> good test
<m_3> fwereade: yup, that sounds like a decent next step... easy to gen multiple smaller named clusters
<m_3> fwereade: launchpad id?
<fwereade> m_3, I am fwereade, I think
<m_3> davecheney: whooops wtf was that?
<m_3> strace
<davecheney> trying to figure out where all the time is going
<m_3> oh, the '-v'
<m_3> ack
<davecheney> there is a large block where status is waiting for the other side to return some data
<davecheney> actually, let me try something
<m_3> k
<davecheney> m_3: in theory I should be able to scp the .juju from the control machine, then use JUJU_HOME=... juju status
<davecheney> to run from my machine
<m_3> davecheney: we didn't inject your keys
<davecheney> lucky(/tmp) % JUJU_HOME=/tmp/.juju juju status -v
<davecheney> 2013/04/16 11:09:59 INFO JUJU:juju:status environs/openstack: opening environment "goscale2"
<m_3> into the environment... lemme check
<davecheney> 2013/04/16 11:10:02 INFO JUJU:juju:status environs/openstack: waiting for DNS name(s) of state server instances [1500421]
<davecheney> i only need the outer machine
<davecheney> fwereade: that is a win for JUJU_HOME
<m_3> nope, only the outer machine's keys are in that env
<davecheney> you can just grab the .juju for another environment
<davecheney> then use JUJU_HOME=... juju $SUBCOMMAND
<davecheney> m_3: very very very slow on my host
<davecheney> i suspect a lot of round trips
<fwereade> davecheney, shame not to share caches though
<davecheney> fwereade: what do we not cache ?
<fwereade> davecheney, I think that `juju switch` thing might have some mileage
<fwereade> davecheney, charms mainly
<davecheney> fwereade: i remain -1 on that proposal
<fwereade> davecheney, that might be it actually
<davecheney> for the reasons stated
<fwereade> davecheney, yeah, I'll keep it to the list, it just made me think of it
<m_3> davecheney: also... in az2 of hp so west US prob
<m_3> davecheney: the "outer" machine is local to that az
<davecheney> m_3: ahh, need -f
<davecheney> basically just too many round trips
<davecheney> some multiple of the number of machines and services
<m_3> ack
<davecheney> dunno, i think on balance that is better than the topology node
<m_3> still got a few danglers...
<davecheney> i say start, you've got 95% of the machines reporting in
<m_3> really need to adjust the numbers tho :)
<m_3> haha
<m_3> lemme bump them up so something a little more appropriate for that cluster
<davecheney> fwereade: we have a lot of machine agents restarting
<m_3> fwereade: your keys are there btw
<fwereade> m_3, cool, thanks
<davecheney> fwereade: http://paste.ubuntu.com/5711961/
<davecheney> why does the machine agent keep reconnecting to state
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1169378
<davecheney> i guess there is no _mup_ 'cos Linode got hacked
<m_3> davecheney: I'm gonna go grab food
<m_3> davecheney: you can just let the job run or not
<m_3> davecheney: easiest is to just destroy-environment
<davecheney> m_3: lets tear it down
<m_3> davecheney: ok
<davecheney> some good results already
<davecheney> we just need the all-machines.log from the 0 machine
<davecheney> that is all we need
<m_3> davecheney: I'm out feel free to do whatever
<davecheney> ok will do and destroy
<m_3> davecheney: I'll try to bump up to 2k tomorrow
<davecheney> fwereade: I would like to add a 'starting $CMD' log message
<fwereade> davecheney, thanks
<fwereade> davecheney, +1 to that
<davecheney> we're making a connection to state every few seconds per worker
<davecheney> so two per machine
<davecheney> but no error lines ...
<fwereade> davecheney, actually, there's a log.Noticef("agent starting")
<fwereade> davecheney, I don't think the actual process is bouncing
<davecheney> fwereade: right, so the agent isn't restarting
<davecheney> but the job is rerunning
<davecheney> so something is killing the Tomb
<davecheney> ubuntu@juju-goscale2-machine-27:~$ head  /var/log/juju/unit-hadoop-slave-25.log
<davecheney> 2013/04/16 00:36:52 NOTICE agent starting
<davecheney> indeed there is a process restart message
<davecheney> ubuntu@juju-goscale2-machine-27:~$ grep -c starting /var/log/juju/unit-hadoop-slave-25.log
<davecheney> 13
<fwereade> davecheney, ok, but those dials are happening every 30s
<fwereade> davecheney, I bet it is mgo
<davecheney> that fucking anti feature
<fwereade> davecheney, we pass that dial func in
<fwereade> davecheney, I imagine it is checking all the addresses in the cluster
<davecheney> fwereade: m_3: i have the all-machines log, i'm turning off the 200 machine environment
<fwereade> davecheney, cool
<davecheney> juju-goscale2-machine-0:2013/04/16 00:33:33 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<davecheney> juju-goscale2-machine-0:2013/04/16 00:35:52 ERROR worker/provisioner: cannot start instance for machine "28": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<davecheney> juju-goscale2-machine-0:2013/04/16 00:36:08 ERROR worker/provisioner: cannot start instance for machine "30": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<davecheney> juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine "82": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<davecheney> juju-goscale2-machine-0:2013/04/16 00:46:55 ERROR worker/provisioner: cannot start instance for machine "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<davecheney> m_3: this is why those machines didn't come up
<davecheney> i think I have a patch for that logging snafu
<davecheney> interesting
<davecheney> destroy-environment blocks on hpcloud
<davecheney> on ec2, it's fire and forget
<davecheney> fwereade: ubuntu@juju-hpgoctrl2-machine-0:~$ juju destroy-environment -v
<davecheney> 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: opening environment "goscale2"
<davecheney> 2013/04/16 01:36:39 INFO JUJU:juju:destroy-environment environs/openstack: destroying environment "goscale2"
<davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$
<davecheney> do we need a DEBUG or INFO "command finished"
<davecheney> so we can tell how long the command runs for ?
<thumper> would be nice
<davecheney> i'll raise a ticket
<davecheney> lucky(~) % bzcat all-machines-201304016.log.bz2  | wc -l
<davecheney> 1548384
<davecheney> lucky(~) % bzcat all-machines-201304016.log.bz2  | grep -c 'watcher: got'
<davecheney> 1023345
<davecheney> 66% of all log lines are 'watcher got such and such'
<fwereade> davecheney, +1
<davecheney> fwereade: card raised
<fwereade> thumper, https://codereview.appspot.com/8663045/ has a couple of extra comments and surprisingly few actual changes
<davecheney> the whole log file, 200 machines, compressed to 5mb
<davecheney> sooooo much duplication
<fwereade> davecheney, I had a vague thought in mind that it might compress quite nicely, yeah, especially considering every one of those messages is sent to every machine
<davecheney> yeah, it might be a low blow
<davecheney> those log lines contain exactly the kind of duplication bz2 loves
<thumper> davecheney: I have a var foo [20]byte
<thumper> davecheney: and I want a string of that...
<thumper> but string(foo) doesn't work
<thumper> what does?
<davecheney> string(foo[:])
<davecheney> gotta slice the array first
<thumper> ta
<thumper> davecheney: can strings contain embedded nulls?
<davecheney> thumper: yes
<davecheney> strings (and slices) know their length
<davecheney> they don't rely on \0
<thumper> davecheney: what is the best way to compare to byte slices?
<davecheney> reflect.DeepEqual(slice, slice) is the simplest
<thumper> davecheney: can I assign a byte array to a byte slice?
<thumper> and will it do what I expect?
<davecheney> thumper: yes
<davecheney> the array backs the slice
<thumper> thought so...
 * thumper pokes some more
<thumper> fucking channel magic...
<thumper> if this works, fair dinkum, it'll be a miracle
<thumper> hah, well the first bit worked...
<thumper> heh, it worked
<thumper> colour me surprised...
 * thumper fears review comments on this one...
<thumper> but proposing anyway
<thumper> Rietveld: https://codereview.appspot.com/8602046 for a file system lock implementation using lock directories
 * thumper sighs
<thumper> realised I missed a test for Unlock, but it can wait as I have to make dinner now...
<bigjools> nice one thumper
<thumper> thanks bigjools
<thumper> maybe it'll even get through review without changing too much :)
<bigjools> thumper: it's the sort of thing that should be in Go's core
<thumper> :)
<thumper> yeah, but it isn't in python either
<thumper> that is why bzrlib implemented one
 * thumper moves into the kitchen
<thumper> ciao
<rogpeppe> mornin' all
<rvba> fwereade: Hi… if it's the intended behaviour, then fine… I was troubled because pyJuju behaves differently: http://paste.ubuntu.com/5712470/.
<fwereade> rvba, yeah, pyjuju doesn't have lifecycle management
<rvba> fwereade: all right then… I'll just make sure that it works as expected if I run "resolve mediawiki/0" as you advised.
<fwereade> rvba, yeah, if that doesn't work there's a problem
<fwereade> rvba, it did work for me though :)
<fwereade> TheMue, dimitern, rogpeppe: morning all btw
<rogpeppe> fwereade: hiya
<rogpeppe> fwereade, dimitern: i'd appreciate a review of this, if poss. the gui people are wanting to use it.
<fwereade> rogpeppe, allwatcher service config?
<rogpeppe> fwereade: yup
<TheMue> fwereade: heya, already woke up? seen a 4am comment by you.
<TheMue> rogpeppe, dimitern: good morning to you too
<fwereade> TheMue, just a short nap ;p
<rvba> fwereade: by "resolving" I suppose you mean removing the (broken) relation right?
<fwereade> rvba, yeah
<fwereade> rvba, `juju resolved mediawiki/0`
<TheMue> fwereade: take care for yourself
<fwereade> TheMue, I'm ok, thanks, but I think I will be unilaterally declaring a couple of swap days next week ;p
<TheMue> fwereade: yeah, sgtm
<TheMue> fwereade: we need you in the long term
<rvba> fwereade: it does not seem to fix the problem here: http://paste.ubuntu.com/5712542/
<fwereade> TheMue, I am reasonably well attuned to my own burnout signs, right now the psychologically healthy thing is to Get Things Done ;p
<fwereade> rvba, I don't see a `juju resolved mediawiki/0` in there
<fwereade> rvba, I see a destroy-relation, which would be silently ignored because the relation's already dying
<TheMue> fwereade: i've been in a similar flow once, but w/o any burnout signs my health struck back overnight. that's why i care.
<rvba> fwereade: ah right, that's what I was missing (sorry, I'm still used to py juju). With that it worked fine!
<fwereade> rvba, sweet
<rvba> fwereade: tyvm :)
<TheMue> dimitern: you had a few comments on https://codereview.appspot.com/8705043. could you please take a new look?
<fwereade> TheMue, btw, how's juju-deploy looking? in terms of what status it checks for?
<TheMue> dimitern: i think it's all covered now.
<fwereade> rvba, fwiw quite a lot of the lifecycle stuff is covered in some detail in the stuff under doc/
<TheMue> fwereade: will start now after i just had proposed the latest changes. so far i only did a quick scan into how it is configured, but not how it is working.
<rvba> fwereade: ok, I'll have a look.
<rvba> ta
<fwereade> rvba, it's generally aimed at developers and might clarify a few things
<fwereade> rvba, start with the glossary, terms in there are used without explanation elsewhere
<rvba> fwereade: another question: I terminated all the machines, they were successfully released (I see that on the MAAS server), but they still show up in "juju status".  Is that normal? http://paste.ubuntu.com/5712552/
<fwereade> rvba, that's in review :/
<rvba> fwereade: all right then :)
<rvba> Thanks.
<fwereade> rogpeppe, reviewed
<rogpeppe> fwereade: thanks
<fwereade> rogpeppe, fwiw parts of https://codereview.appspot.com/8786043/ might make you happy :)
<fwereade> rogpeppe, I actually got a physical tingle from hitting `d`
 * rogpeppe is very happy to see those big blocks of red
<wallyworld_> jam: hi, did my email make sense?
<jam> wallyworld_: I understood it, still trying to sort out if I agree with it. Also, William has a patch that changes things around.
<wallyworld_> ok, np
<wallyworld_> i can explain a bit more in the standup if required
<rogpeppe> fwereade: i think tim got as far as the "info0" name and threw his hands up in disgust
<fwereade> rogpeppe, without context, it is a pretty bad name ;)
<rogpeppe> fwereade: the context is all there to see...
<fwereade> rogpeppe, there's quite a lot of assumed knowledge that you have to just kinda pick up by osmosis though
<rogpeppe> fwereade: yeah
<fwereade> rogpeppe, reading the docs helps
<fwereade> rogpeppe, but I suspect that really you need to read them, forget them, hit the code in anger a bit, and then read them again, at which point things may start clicking
<fwereade> rogpeppe, I have found that is often my pattern
<rogpeppe> fwereade: BTW i thought about using the Map method, but honestly we are already knee deep in knowledge about the settings and i prefer to avoid generating unnecessary garbage; maybe i should just avoid all use of the Settings object and fetch directly into the map like GetAll does
<rogpeppe> fwereade: yeah
<rogpeppe> fwereade: the Go docs, you mean?
<fwereade> rogpeppe, most large systems I have to assimilate tbh
<fwereade> rogpeppe, it's in the nature of technical documentation
<rogpeppe> fwereade: yeah
<rogpeppe> fwereade: it doesn't make sense until you start trying to do something with it
<fwereade> rogpeppe, every sentence is important but the importance of some cannot be readily grasped on a first read through
<jam> wallyworld_: interestingly, if you set "public_bucket_url" it also fails to sync-tools --public
<jam> Gives an Unauthenticated error.
<jam> so if you *don't* set it, then it goes via the swift and existing client (I guess).
<jam> If you do set it
<jam> then it does a different unauthed connection
<jam> ?
<wallyworld_> jam: i got it to work by commenting out the FindTools code which looked at the private bucket
<wallyworld_> i set public-bucket-url and it just looked at that and didn't attempt to open the private bucket
<jam> wallyworld_: fwereade's patch changes that around a lot, though it still looks at the private bucket (to see if there are tools there causing it to ignore the public bucket)
<wallyworld_> sure, but that patch should allow control-bucket to be ""
<fwereade> rogpeppe, I argued for keeping the error in https://codereview.appspot.com/8748046/ - let me know what you think
<jam> I believe his patch changes it to only look at the pub bucket of the source (good), but still look at pub and private when --public is set.
<wallyworld_> jam: it should do that but allow control bucket to be ""
<wallyworld_> and ignore it if not specified
<jam> fwereade: well offhand it would fix a bug if you just didn't search the private bucket at all.
<wallyworld_> so that we can set up and env for just a public bucket
<wallyworld_> for the shared swift account
<fwereade> jam, wallyworld_: https://codereview.appspot.com/8726044/ and https://codereview.appspot.com/8748046/ are the relevant CLs
<fwereade> jam, wallyworld_: as I recall we agreed in atlanta that any private tools should exclude all public ones from consideration
<wallyworld_> fwereade: yes, but if an account only has a public bucket defined, we should allow for that
<jam> fwereade: the downside to that is just not working at all, but I think the argument was with dev versions you don't expect it to work
<rogpeppe> fwereade: looking
<jam> fwereade: so the specific bug is a bit involved. 1) our shared HP account only has object store (no compute), 2) in Goose when you search the private bucket it checks that you have compute access.
<wallyworld_> fwereade: so the current HP Cloud shared public bucket should be able to be set up and work just to provide tools etc, and no private bucket is needed, since it's just a tools repository
<jam> so that it can give a nicer error message than falling over and failing later.
<fwereade> jam, wallyworld_: I'm not convinced an environment without a control-bucket is meaningful
<jam> fwereade: so again, the hp shared tools account isn't useful
<wallyworld_> fwereade: jam: the reason it checks for compute is that a single openstack client is used to access all server resources - swift and compute
<jam> it is a storage for a public bucket
<jam> no compute means you can't run juju there
<jam> but that is fine
<fwereade> jam, wallyworld_: ISTM it would be easiest to have a public-tools env with the control-bucket set to the other envs' public-bucket
<jam> you just want to store files
<jam> fwereade: you need the creds
<jam> to write to the bucket
<rogpeppe> jam, wallyworld: if the public bucket is "", doesn't the provider just return an EmptyStorage?
<wallyworld_> rogpeppe: yes, but the issue is the private bucket
<jam> rogpeppe: public-bucket vs public-bucket-url I believe
<rogpeppe> wallyworld_: sorry, i meant the private bucket
<wallyworld_> fwereade: it's like the s3 public bucket - we just want a place to get tools from, not run juju
<wallyworld_> rogpeppe: for openstack, it currently assumes control bucket must be specified
<rogpeppe> wallyworld_: "it" being which piece of code, sorry?
<fwereade> damn sorry bbiab
<wallyworld_> rogpeppe: that's an implementation decision that needs to be changed if we want to allow public bucket only envs to be specified
<wallyworld_> for openstack
<wallyworld_> rogpeppe: the SetConfig() for the openstack provider
<rogpeppe> wallyworld_: ah, so it's an openstack provider issue
<wallyworld_> yes, an implementation decision that control bucket is expected
<wallyworld_> since juju won't work without one
<wallyworld_> but if we want sync-tools to work with just a public bucket, we need to change that
<jam> wallyworld_, rogpeppe: so there isn't a default config for control-bucket, so you have to specify one
<jam> and I don't know what s3Unlocked.Bucket("") does
<wallyworld_> jam: the default is "" but the code assumes it is specified
<wallyworld_> for openstack
<wallyworld_> since juju needs it
<rogpeppe> jam: that would be easy to change - nothing outside the provider-specific code knows about the control-bucket setting AFAIK
<jam> wallyworld_: for ec2, there is no default, so you have to specify something.
<wallyworld_> jam: effectively, that's the same for openstack
<jam> but I don't know what "" does for a bucket.
<wallyworld_> since it dies if it is ""
<wallyworld_> but for sync-tools, we just want an env that specifies a public bucket to copy to
<wallyworld_> and not require a control bucket
<jam> wallyworld_: technically both from and to, but I cheat with "juju-dist" as the private source bucket.
<jam> since that overlaps with the actual public bucket (I believe)
<wallyworld_> yes, the public bucket for tools assumes juju-dist
<wallyworld_> rogpeppe: yes, only the provider knows about the control bucket, so it is easy to change
<rogpeppe> wallyworld_: cool
<davecheney> rogpeppe: can you please try bootstrapping a quantal state server again
<davecheney> i believe the problem is fixed
<wallyworld_> rogpeppe: the issue came up cause the account where the "standard" hp cloud public bucket was created only had swift enabled, not compute. but we don't need compute for that since it's just a tools repository, but the provider code needs to be tweaked to allow that
<rogpeppe> davecheney: great!
<rogpeppe> davecheney: you'd probably be best asking someone that's actually running quantal though
<davecheney> rogpeppe: who reported the issue that you reported to me ?
<davecheney> rogpeppe: if it's not convenient
<davecheney> don't sweat it
<jam> davecheney: yay, you got https://launchpad.net/~juju/+archive/experimental sorted out?
<rogpeppe> davecheney: it might've been benji
<davecheney> i'll bootstrap a machine after din dins
<davecheney> jam: yeah, turns out there is an amount of foul language that can solve any problem
<jam> davecheney: I can imagine that level is pretty high
<rogpeppe> davecheney: i think using default-series=quantal should bootstrap a quantal node
<davecheney> rogpeppe: indeed, i'm well versed in hacking that crap
<rogpeppe> davecheney: :-)
<davecheney> jam: rogpeppe i have heard from sources that a backport of 2.2.4 is in the works
<davecheney> so we may not have to live with this hack for too long
<TheMue> *: python freaks to the front. what does the machine = machine = in machine = machine = status["machines"][m_id]["dns-name"] mean?
<fwereade> TheMue, er, file/line please?
<TheMue> fwereade: one moment
<TheMue> fwereade: http://bazaar.launchpad.net/~gandelman-a/juju-deployer/trunk/view/head:/utils.py#L88
<fwereade> TheMue, I think it's just a typo, equivalent to machine = machines[...]
<fwereade> TheMue, er, you know what I mean
<fwereade> it's getting harder to read python these days without refactoring it to go in my head
<TheMue> fwereade: that's how i interpreted it too, just a typo. ;)
<fwereade> btw, can I get a review from somebody on https://codereview.appspot.com/8786043/ please?
<fwereade> it unfucks some fairly critical behaviour
<rogpeppe> fwereade: looking
<rogpeppe> fwereade: replied to earlier review also, BTW
<fwereade> rogpeppe, tyvm
<TheMue> fwereade: you've got a review
 * TheMue found another nice py statement he has to think twice about. looks like a list of sets is created by a post-positioned for loop. 
<davecheney> ooh, some sneaky sod has introduced another dependency on the build
<davecheney> TheMue: rogpeppe today I found a great use for JUJU_HOME
<rogpeppe> davecheney: oh yes?
<davecheney> scp over the ~/.juju of another environment
<rogpeppe> davecheney: what's the new dep?
<davecheney> JUJU_HOME=/tmp/.juju juju status << you see their environment
<davecheney> rogpeppe: maas
<davecheney> it's a build dep on environs/maas
<davecheney> but I don't think it is part of the jujud deps
<rogpeppe> davecheney: ah yes. i didn't actually notice when that went in
<rogpeppe> davecheney: it should be
<rogpeppe> davecheney: otherwise jujud won't work on maas
<davecheney> well, then they haven't updated the check
<rogpeppe> davecheney: that's a nice use for JUJU_HOME
<TheMue> davecheney: nice
<davecheney> var expectedProviders = []string{ "ec2", "openstack",
<davecheney> }
 * rogpeppe still misses plan 9: bind /n/remote/usr/rog/.juju $home/.juju; juju status
<rogpeppe> davecheney: yup, that should be there
<rogpeppe> davecheney: i hadn't seen environs/all before
<rogpeppe> davecheney: i was just wanting to do something like that
<rogpeppe> davecheney: to be honest, the expectedProviders check should probably be a test in environs/all
<davecheney> rogpeppe: no, absolutely not
<davecheney> you can duplicate it there if you like
<davecheney> but it must be part of the cmd/juju/main_test
<davecheney> otherwise we'll just fuck ourselves like we did in Atlanta when a transitive dep changed
<rogpeppe> davecheney: did we have environs/all back then?
<davecheney> no
<davecheney> i will still oppose any move to move that check
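The guard davecheney is defending can be sketched as below. In the real tree the providers register themselves via package init() side effects, so the test's job is to fail loudly in cmd/juju if a transitive dependency change silently drops one; the registry map and helper here are stand-ins, not juju-core's actual API:

```go
package main

import "fmt"

// registry stands in for the provider registration that the real
// environs packages perform via init() side effects when imported.
var registry = map[string]bool{"ec2": true, "openstack": true}

// missingProviders returns the expected providers that never
// registered, i.e. whose packages were not linked into the binary.
func missingProviders(expected []string) []string {
	var missing []string
	for _, name := range expected {
		if !registry[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// "maas" is deliberately absent from the stand-in registry above,
	// modelling the transitive-dep breakage being discussed.
	expected := []string{"ec2", "openstack", "maas"}
	fmt.Println(missingProviders(expected))
}
```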
<TheMue> lunchtime, bbiab
<davecheney> lucky(~/src/launchpad.net/juju-core) % juju bootstrap -v --upload-tools
<davecheney> 2013/04/16 20:37:11 INFO environs/ec2: opening environment "ap-southeast-2"
<davecheney> 2013/04/16 20:37:14 INFO environs/tools: built 1.9.14-quantal-amd64 (2299kB)
<davecheney> 2013/04/16 20:37:14 INFO environs/tools: uploading 1.9.14-quantal-amd64
<davecheney> 2013/04/16 20:37:55 INFO environs/ec2: bootstrapping environment "ap-southeast-2"
<davecheney> 2013/04/16 20:38:00 ERROR command failed: environment is already bootstrapped
<davecheney> when did the bootstrapped check move to after the upload tools ?
<rogpeppe> davecheney: fwereade's been doing quite a bit of work in that area
<davecheney> indeed
<davecheney> rogpeppe: https://canonical.leankit.com/Boards/View/103148069/104826393
<davecheney> 66% of our logging goes in watcher debugging messages
<rogpeppe> davecheney: yeah
<rogpeppe> davecheney: it was even worse
<davecheney> rogpeppe: this was a 200 node hadoop instance
<davecheney> 20% cpu to mongo
<davecheney> 16% cpu to rsyslog
<rogpeppe> davecheney: (most of the messages *were* saying "i just saw nothing")
<davecheney> 1-2% for jujud on the bootstrap machine
<rogpeppe> davecheney: i'm surprised about that error. uploadTools shouldn't make the provider-state object in the control bucket
<davecheney> Get:7 http://ppa.launchpad.net/juju/experimental/ubuntu/ quantal/main mongodb-clients amd64 1:2.2.4-0ubuntu3 [20.3 MB]
<davecheney> fuck yea
<rogpeppe> davecheney: that's just 'cos jujud's blocked by mongod, probably
<davecheney> wut ?
<rogpeppe> davecheney: the 1-2% for jujud
<davecheney> oh, yeah, i suspect jujud could use more cpu
<davecheney> but was blocked by mongo
<rogpeppe> davecheney: yup
<davecheney> we are super chatty
<rogpeppe> davecheney: yes
<rogpeppe> davecheney: we should turn log level to info by default
<davecheney> rogpeppe: +100
<rogpeppe> davecheney: and pass through --debug only if the environment is bootstrapped with --debug
<davecheney> + another 100
<rogpeppe> davecheney: and then (not right now) allow dynamic changing of debug level
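The proposal sketched above — info-level logging by default, with the level switchable at runtime — could look roughly like this. The level names and `logf` are illustrative only, not juju's actual logging API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Log levels, lowest to highest verbosity (illustrative names).
const (
	levelError int32 = iota
	levelInfo
	levelDebug
)

// currentLevel can be swapped at runtime, which is what "dynamic
// changing of debug level" would require.
var currentLevel atomic.Int32

// logf emits the message only if it is at or below the current level,
// and reports whether it was emitted.
func logf(level int32, format string, args ...interface{}) bool {
	if level > currentLevel.Load() {
		return false // e.g. DEBUG watcher chatter suppressed at INFO
	}
	fmt.Printf(format+"\n", args...)
	return true
}

func main() {
	currentLevel.Store(levelInfo) // the proposed default
	logf(levelDebug, "watcher: saw nothing")   // dropped at INFO
	logf(levelInfo, "started machine %d", 200) // shown
	currentLevel.Store(levelDebug) // flip debugging on live
	logf(levelDebug, "now visible")
}
```

With a default of info, the "66% of logging is watcher debug messages" problem measured on the 200-node run simply disappears for ordinary deployments.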
<rogpeppe> davecheney: ah, i see the problem with your bootstrap
<davecheney> so, I've overwritten the tools the environment (may) have been using, then failed
<rogpeppe> davecheney: it's that you shouldn't try to upload tools if the environment is already bootstrapped
<rogpeppe> davecheney: right?
<davecheney> correct
<davecheney> but it looks like the check happens too late now
<rogpeppe> davecheney: i wonder if we should have an Environ.PrepareForBootstrap method
<rogpeppe> davecheney: which will return an error if it's already bootstrapped
<rogpeppe> davecheney: or actually, just "Prepare"
<rogpeppe> davecheney: then the environment could create the control bucket and put "pending" (or something) inside the provider-state object, so that something else can't bootstrap while we're uploading tools
<davecheney> rogpeppe: that sounds like an old bug, "don't go bootstrappin' twice"
<rogpeppe> davecheney: it would be nice if bootstrap could be race-free
<rogpeppe> davecheney: and i'd prefer to design our API such that it's actually possible for a provider to do that
<fwereade> rogpeppe, responded again... I think it must be that there's a use case I'm not seeing
<fwereade> davecheney, rogpeppe: fwiw upload-tools moved to command-time a while ago
<rogpeppe> fwereade: do you see dave's issue though?
<fwereade> davecheney, rogpeppe: coincidentally and not deliberately my pipeline always uploads unique build numbers and so shouldn't overwrite
<rogpeppe> fwereade: if i call juju bootstrap, it shouldn't upload the tools, *then* check that the env is not already bootstrapped
<fwereade> rogpeppe, sure, but you argued very firmly against an IsBootstrapped method when I suggested it a while back...
<rogpeppe> fwereade: yes, and i still think it's wrong, hence my Prepare suggestion above.
<fwereade> rogpeppe, so Prepare would upload the tools?
<rogpeppe> fwereade: no, Prepare would check that the control-bucket doesn't exist and create it otherwise (and do anything else necessary to make it possible to use the environment's Storage)
<fwereade> rogpeppe, that feels to me exactly as racy in effect as an IsBootstrapped
<rogpeppe> fwereade: not quite, because currently there's a very large window (the amount of time it takes to upload the tools) for the race
<rogpeppe> fwereade: and if a provider does have access to an atomic operation, then it's easy to make it non-racy
<rogpeppe> fwereade: whereas IsBootstrapped is *inherently* racy
<fwereade> rogpeppe, and the providers you're aware of with atomic check-and-set operations we could use that way are..?
<rogpeppe> fwereade: it's trivially conceivable.
<rogpeppe> fwereade: i imagine that amazon provides such a thing if we look hard enough
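The race-free `Prepare` idea above boils down to an atomic put-if-absent on the provider-state object: whoever writes the "pending" marker first owns the bootstrap. A minimal sketch, with an in-memory store standing in for whatever atomic primitive a real provider exposes (all names hypothetical):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errAlreadyBootstrapped = errors.New("environment is already bootstrapped")

// store simulates provider storage that offers an atomic
// put-if-absent; real clouds vary in whether they can provide this.
type store struct {
	mu   sync.Mutex
	data map[string][]byte
}

// putIfAbsent writes the object only if nobody else has already:
// the atomic primitive a race-free Prepare needs.
func (s *store) putIfAbsent(key string, val []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.data[key]; ok {
		return errAlreadyBootstrapped
	}
	s.data[key] = val
	return nil
}

// prepare claims the environment by writing a "pending" marker into
// provider-state before any tools are uploaded.
func prepare(s *store) error {
	return s.putIfAbsent("provider-state", []byte("pending"))
}

func main() {
	s := &store{data: make(map[string][]byte)}
	fmt.Println(prepare(s)) // first claim succeeds
	fmt.Println(prepare(s)) // second claim is rejected
}
```

Under this scheme the tools upload happens only after the claim succeeds, closing the large window dave hit where upload runs first and the "already bootstrapped" error arrives minutes later.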
<davecheney> https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#
<davecheney> release notes for 1.9.14
<davecheney> gonna be tappin' y'all for input if you touched the card
<fwereade> rogpeppe, afaict dave's case would be fixed with a check for ErrNoTools before first upload, while the fancy anti-race stuff is restricted to a very specific set of users that aren't, I think, very common
<fwereade> rogpeppe, ie those sharing environs that they all promiscuously start up and shut down
<fwereade> rogpeppe, I submit that if you want to treat environs that way, you get your own ;)
<rogpeppe> fwereade: in general we try to make all operations safe in a concurrent environment. the fact that aws makes it hard to do so doesn't mean that we don't want to do it
<fwereade> rogpeppe, describe to me the set of customers you expect to be impacted by this
<fwereade> rogpeppe, it's not the hardness, it's the utility
<rogpeppe> fwereade: i could ask the same about set-environ
<fwereade> rogpeppe, that is one of our explicit stated goals for the release
<fwereade> rogpeppe, what alternative functionality do you have in mind?
<rogpeppe> fwereade: i mean - why do we go to so much bother to make it safe to use concurrently?
<fwereade> rogpeppe, we don't, it's pitiful horsecrap
<rogpeppe> fwereade: when only a "very specific set" of users will be concurrently setting environment settings
<fwereade> rogpeppe, and I don't care too much about that because the multiple-admins story is still in the future
<rogpeppe> fwereade: that's what i think about concurrent bootstrap
<fwereade> rogpeppe, but that set of people is still way larger than the set of people who will ever be impacted by concurrent bootstrap issues
<rogpeppe> fwereade: i have no idea
<rogpeppe> fwereade: i don't know how we can
<rogpeppe> fwereade: i just want to make a tool that works reliably
<fwereade> rogpeppe, *any* multi-admin situation opens the possibility of concurrent env modification
<rogpeppe> fwereade: same could be said for bootstrap, i think
<davecheney> dimitern: with machine errors in status, is there anything to add to the release notes about it ?
<dimitern> davecheney: something about nonce provisioning perhaps?
<davecheney> dimitern: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#
<fwereade> rogpeppe, a strict subset of those involves concurrent bootstraps, because I promise I will at least once create an environment and then give the details to someone else after it's bootstrapped
<davecheney> would you be able to write a line or two about what that means for the customer ?
<dimitern> davecheney: cheers
<davecheney> TheMue: do you have anything to add to the release notes for JUJU_ENV_UUID ?
<davecheney> fwereade: with "unused machines will not be reused", is there anything for the customers to know about this in the release notes
<fwereade> davecheney, possibly, yes -- "automatic machine reuse has been disabled for now; similar effects can be more reliably obtained by using the "--force-machine" option with `juju deploy` and `juju add-unit`, which duplicates the action of jitsu deploy-to"?
<davecheney> fwereade: roger
<davecheney> fwereade: this is because we can't really guarantee what state a previous charm will leave the machine in, correct?
<dimitern> davecheney: I don't think I can explain nonced provisioning in a meaningful way to the end user, without revealing how bad it used to be :)
<fwereade> davecheney, yeah
<TheMue> davecheney: only that this variable is supported now inside the hooks
<davecheney> dimitern: understood, don't mention the war
<TheMue> dimitern: thx for your feedback
<jam> danilos: ping for mumble
<dimitern> TheMue: np, I just think splitting the test table doesn't give much benefit, and duplicates a bit of code
<TheMue> dimitern: it helped me during testing ;) but i'll keep the optimization in mind for later
<fwereade> well, yay!
<fwereade> latest tools code all still seems to work
<fwereade> agents quietly ignore failed upgrades with missing tools, and then handle the ones they have tools for
<fwereade> the provisioner barfs if it tries to start a new machine with no tools available, and (probably) sets the error on the machine
<dimitern> fwereade: \o/
<fwereade> but we can't see it because of (1) a status bug: that a missing instance-id causes us to skip checking for machine errors (whoops)
<fwereade> and (2), sometimes, another status bug, wherein any error examining one machine causes the *whole* machines dictionary to be replaced with some "status error: cannot find instance id for machine 3" nonsense
<fwereade> 1) is a big deal I think because it means we *don't* get display of provisioning errors
<fwereade> 2) is less so, but still a bit crap, because if there's a 2-minute delay on new instances showing up in ec2, as there seemed to be today, it means you lose all machine status info, not just the missing ones
<dimitern> fwereade: when do you expect to merge the tools stuff?
<fwereade> dimitern, I need to look back through and figure out what has/hasn't been reviewed
<TheMue> fwereade: i shared a doc with my juju-deploy notes with you. one thing we don't cover are subordinates
<fwereade> TheMue, great, thanks, what is going to hurt us worst?
<TheMue> fwereade: i have to do another crosscheck against our code but it looks as if we are mostly clean, only subordinates are missing 100%
<dimitern> fwereade: because the chain of dependency just got longer - i'm waiting on you and wallyworld_ is waiting on me for the openstack constraints flavor/image picking
<dimitern> fwereade: and I think we should have a short discussion
<rogpeppe> dimitern: i need another LGTM on this, if you want to have a look: https://codereview.appspot.com/8761045
<fwereade> TheMue, that is excellent news -- I wonder a little about the error states
 * dimitern looking
<rogpeppe> dimitern: ta!
<fwereade> TheMue, do you think you can get subordinates done today?
<TheMue> fwereade: have to check what it means exactly. the output below services and the units is changed.
<TheMue> fwereade: let me take a deeper look
<fwereade> TheMue, ISTM they are additions, not changes, to what we produce; and that state supplies all the necessary info
<dimitern> rogpeppe: reviewed
<rogpeppe> dimitern: thanks!
<TheMue> fwereade: yes, that's my first impression too
<fwereade> rogpeppe, how would you feel about EnsureAgentVersion for FindBootstrapTools?
<rogpeppe> fwereade: much better.
<fwereade> rogpeppe, I think I have a better followup but structure is strictly more pressing at this point :)
<rogpeppe> fwereade: i understand :-)
<fwereade> then, rogpeppe and dimitern, I think it comes down to the sync-tools stuff
<danilos> jam: hi, sorry, I sent an email that I won't be able to make a stand-up today; sorry again
<rogpeppe> fwereade: i still feel quite strongly about the force-version semantics. have you been able to fix that?
<rogpeppe> fwereade: i've got another possible solution there actually, simpler than the function argument.
<fwereade> rogpeppe, I'm afraid not -- like MachineConfig, it's one of the boundaries I am not keen to cross lest this pipeline explode further
 * rogpeppe 's heart sinks a bit
<fwereade> rogpeppe, I *am* very much keen to discuss and implement how I could do all this more cleanly
<fwereade> rogpeppe, and indeed to fix up the building, because I think it's important
<rogpeppe> fwereade: i just feel that this semantic is breaking the very thing you're trying hard to fix
<rogpeppe> fwereade: and it will rebound on us 10 fold
<fwereade> rogpeppe, it is breaking a single case AFAICT: we won't automatically explode when compiling one major version of the tools with another CLI
<fwereade> rogpeppe, when we fix it, it's a simple "--upload-tools now respects source version as far as possible" line, and basically nobody is affected but us
<rogpeppe> fwereade: it's breaking juju status
<fwereade> rogpeppe, huh?
<rogpeppe> fwereade: we won't be able to tell what versions the agents are running
<rogpeppe> fwereade: so an extremely useful diagnostic tool becomes useless
<fwereade> rogpeppe, because we will have forgotten what's in our source tree?
<rogpeppe> fwereade: because the version an agent reports in the status won't have any necessary connection with the version of the code that the agent is actually running
<fwereade> rogpeppe, they *already don't*
<rogpeppe> fwereade: they do if you haven't used upgrade-juju
<rogpeppe> fwereade: and that's a bug in upgrade-juju that i would very much like to fix
<fwereade> rogpeppe, I would too
<rogpeppe> fwereade: rather than *breaking it further*
<fwereade> rogpeppe, but I insist we upload tools consistently across bootstrap and upgrade-juju
<rogpeppe> fwereade: i'm convinced it would be just as easy to fix UploadTools to do the right thing
<fwereade> rogpeppe, it would be easy to fix it *badly*
<fwereade> rogpeppe, and that would make it harder to fix it well, and get some sort of clear tools-on-disk abstraction going
<rogpeppe> fwereade: arguably. but the scope is very limited. and the externally visible behaviour is really important here.
<rogpeppe> fwereade: i really don't believe it would make it harder to fix well
<rogpeppe> fwereade: we're talking about 10 lines of non-test code here
<fwereade> rogpeppe, which people get used to, and make little tweaks assuming, and next thing you know it's another 200-line diff to unpick it all
<fwereade> 2000
<rogpeppe> fwereade: UploadTools is not used everywhere
<rogpeppe> fwereade: and i don't believe it will be
<fwereade> rogpeppe, it's only a matter of time before someone realises that it's crazy to have two implementations of it, and adds a func that calls it to envtesting
<fwereade> rogpeppe, tentacles!
<rogpeppe> fwereade: why two implementations?
<fwereade> rogpeppe, because of UploadFakeTools which does roughly the same thing
<fwereade> rogpeppe, itself factored out of a range of tool-uploading tests in some prereq
<rogpeppe> fwereade: i don't want to support juju users with this misfeature in
<fwereade> rogpeppe, dev version == not supported
<fwereade> rogpeppe, upload-tools == dev version
<rogpeppe> fwereade: like we won't actually be supporting developers...
<rogpeppe> fwereade: please tell me: why is this whole pipeline of changes important?
<rogpeppe> fwereade: i mean, important enough that we're desperately trying to get it in before the deadline
<fwereade> rogpeppe, because our tools-picking was close to random, and it was wantonly fucking over developers, and I have no confidence that the implementation that fucks over devlopers will not also fuck over users
<fwereade> rogpeppe, because there were 3 distinct live implementations of tools-picking, each of which was wrong, and probably in the same way, but I'm not confident of that either
<fwereade> rogpeppe, I believe it is absolutely critical that we are as *predictable* as possible
<rogpeppe> fwereade: that's why i believe we should be able to predict the agent version from the version of the agent we're uploading
<rogpeppe> fwereade: otherwise developers will continue to be wantonly fucked over
<fwereade> rogpeppe, "oh yeah, sometimes the wrong tools get chosen, I forget the details" inspires much less confidence than "developer tools are always uploaded with the cli version plus a unique build number, we're on it, see lp:1168754"
<fwereade> rogpeppe, which we will have to fix imminently anyway
<rogpeppe> fwereade: it was actually "tools are chosen from the public bucket if you haven't uploaded a version with the right series". which is a fairly similar statement
<rogpeppe> fwereade: at least this change will fix the default case.
<fwereade> rogpeppe, but you cannot in any way characterise what those tools will be
<rogpeppe> fwereade: but when someone comes to us and says "my environment is stuffed" and we want to find out what version they're running, we'll have to tell them to ssh to a machine, remove the force-version file and call jujud version again
<fwereade> rogpeppe, we'll say "what's the version in your $GOPATH"?
<rogpeppe> fwereade: that may bear no resemblance to the version they bootstrapped with last week
<rogpeppe> fwereade: also, it's the version in your PATH that is the important thing
<rogpeppe> fwereade: and that's part of the point.
<fwereade> rogpeppe, I don't follow: that's what they're *reported as*, not what they *are*
<rogpeppe> fwereade: oh i see. who knows whether they're still using the same branch?
<fwereade> rogpeppe, they should if they're playing with sharp tools?
<fwereade> rogpeppe, also, builds with the same exact version will always have been built from the same source
<fwereade> rogpeppe, which is a pretty useful guarantee
<fwereade> rogpeppe, x.x.x.1 was built from 1.10.2; x.x.x.2 was built from 1.11.7; upgrade, downgrade, dump one set of tools and see what happens
<fwereade> rogpeppe, you might even want to build 2 versions of the cli to check that each can interact with each nicely
<fwereade> rogpeppe, and that's really all you need, I think, to do sensible upgrade behaviour checking as a developer
<fwereade> hazmat, ping
<fwereade> does anyone have ~15s for my most trivial review ever? https://codereview.appspot.com/8688044
<TheMue> fwereade: done
<rogpeppe> fwereade: i really don't think this is so bad: lp:~rogpeppe/juju-core/fwereade-do-not-lie
<rogpeppe> fwereade: it would need a little more test coverage around Upload, but i would be much happier with it done like this.
<fwereade> rogpeppe, it's injecting a little snippet of custom logic in between steps 1 and 2 of three distinct separate operations -- it is taking things that are tightly coupled and could be profitably separated (if only so we could test the blasted things) and making them *more* coupled
<fwereade> rogpeppe, and as soon as we're signing builds it will become more so
<rogpeppe> fwereade: i agree, but it fixes a real issue without undue perturbation to the code
<fwereade> rogpeppe, I think this is where we differ
<rogpeppe> fwereade: and causes several big "THIS IS WRONG" comments to be unnecessary
<rogpeppe> fwereade: it's not a 1000 line diff
<rogpeppe> fwereade: kanban?
<fwereade> rogpeppe, ah yeah
<rogpeppe> mramm: ^
<mramm> rogpeppe: yea, be there in a minute
<rogpeppe> saved by a "declared and not used" error once again
<rogpeppe> niemeyer: hiya!
<niemeyer> rogpeppe: Yo
<rogpeppe> fwereade: could you please take another look at this before i submit? https://codereview.appspot.com/8761045
<fwereade> rogpeppe, lgtm, nice
<rogpeppe> fwereade: thanks
<fwereade> I'll be back to do a submit-burst a bit later, need a quick rest
<rogpeppe> dimitern, fwereade, TheMue: trivial? https://codereview.appspot.com/8664047
<fwereade> rogpeppe, LGTM trivial with quibbles left to your judgment
<fwereade> and I really am off for a bit now
<mramm> How goes everything?
<rogpeppe> just about to leave
<rogpeppe> fwereade: trivial? https://codereview.appspot.com/8658045
<mramm> Many more items in the release notes: https://docs.google.com/a/canonical.com/document/d/1zj8zs5SUTvKAcnLlLiaXOalMp07zInJz1fN7w1OTDLo/edit#
<mramm> I just took things from the kanban board, and wrote them up.
<mramm> A few of them may have been available in 1.9.13 but were not announced then.
<rogpeppe> fwereade: there's a very simple reason why we don't see logs from the unit agent
<rogpeppe> fwereade: it's just not implemented
<rogpeppe> fwereade: no time to do it today i'm afraid
<rogpeppe> time to go
<rogpeppe> see y'all tomorrow!
<rogpeppe> mramm: thanks for that - quite a substantial list!
<mramm> rogpeppe: agreed
<mramm> I also got the force-machine stuff merged
<mramm> so that part of the release notes is now true ;)
<rogpeppe> mramm: cool
<rogpeppe> mramm: has it been tested live?
<rogpeppe> actually, i really am leaving :-)
<kapil_> so the global firewall mode, still is adding entries per machine..
<kapil_> into a global sec group, which still runs into size limits
<kapil_> it's actually a smaller size limit than the number of groups
<mgz> ha
<mgz> well, that's fixable
<mgz> but... shouldn't dupes be rejected anyway?
<mgz> ie, I add a rule saying allow tcp 80 to 0.0.0.0/0
<mgz> if I then try to add that rule again, I get back an error from the api saying it's already got that
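The tolerant behaviour mgz describes — treating the cloud API's "already exists" response as success rather than an error — could be sketched like this. `addRule` and `ensureRule` are hypothetical stand-ins for the provider API, and the error-string match is illustrative; a real client would check the API's error code.

```go
package main

import (
	"fmt"
	"strings"
)

// rules stands in for the provider's security-group state; addRule
// mimics an API that rejects duplicate rules, as described above.
var rules = map[string]bool{}

func addRule(rule string) error {
	if rules[rule] {
		return fmt.Errorf("rule %q already exists", rule)
	}
	rules[rule] = true
	return nil
}

// ensureRule makes opening the same port twice a no-op rather than a
// failure, by swallowing the duplicate-rule response.
func ensureRule(rule string) error {
	err := addRule(rule)
	if err != nil && strings.Contains(err.Error(), "already exists") {
		return nil // rule is present, which is all we wanted
	}
	return err
}

func main() {
	fmt.Println(ensureRule("allow tcp 80 to 0.0.0.0/0"))
	fmt.Println(ensureRule("allow tcp 80 to 0.0.0.0/0")) // idempotent
}
```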
<m_3> hazmat: juju-goscale2-machine-0:2013/04/16 00:46:25 ERROR worker/provisioner: cannot start instance for machine
<kapil_> mgz, if they're differentiating on address then they would be distinct
<m_3> hazmat: "85": cannot set up groups: failed to create a rule for the security group with id: %!s(*int=<nil>)
<kapil_> the ostack provider ensureGroups looks sane
<kapil_> hmm
<m_3> hazmat: ubuntu@15.185.162.247
<mgz> m_3: can you ssh-import-id gz too please?
<kapil_> mgz, we're in the middle of performing an experiment, so read only observation pls unless coordinated
<mgz> indeed.
<m_3> mgz: added
<mgz> ta.
<kapil_> fwereade, if we're not reusing, we should probably also be destroying during destroy-svc
<mgz> I only see two ports opening in the log in home
<mgz> ...so, is it just lack of group cleanup between runs?
<kapil_> mgz, looks sane
<kapil_> we're only opening port on the master which is single instance
<kapil_> perhaps it was accidental expose of the hadoop slave
<mgz> it's probably just the code not being tolerant of the api "already got that" response and yeah, a double open
<mgz> the error is weird though, not what I'd expect
<m_3> mgz: you want anything set up before we kick off a bigger run?
<hazmat> mgz, i wonder if we're getting different error strings causing a value mismatch on the duplicate group detection
<hazmat> mgz, where you at..
<hazmat> mgz, i'd like to pair on this.. the variation in errors is a bit high, it looks like some rate limiting is missing on flavor listing
<bac> with juju-core (r1164) i'm seeing juju commands failing rather than queueing up.  for instance if i bootstrap and then deploy in a script the deploy fails with "error: no instances found".  very non-juju.  anyone else seen it?
<bac> this: http://pastebin.ubuntu.com/5714170/
<mgz> hazmat: sorry, just missed you before lunch, I'm in B113 right now, we could meet up somewhere to poke this
<thumper> morning
<thumper> bac: not seen it, but not played much
<thumper> bac: I agree not very juju :)
<bac> thumper: it was suggested i clean out my buckets.  haven't gotten to try that yet.
<thumper> bac: I don't think that buckets should have anything to do with that...
<mgz> what exactly are you deploying on?
<TheMue> thumper: morning
<mgz> what you need to debug this is to run the list command on your underlying cloud and see what the instances are up to
<mgz> you can see that kind of behaviour if, for instance, the instance went to the error state
<m_3> mgz: http://paste.ubuntu.com/5714448/
<m_3> mgz: I'm gonna bring up 200 and then add some incrementally
<mgz> m_3: ace
<mgz> 20 security group rules is pretty tight
<mgz> default and the environ group will take about 10 just on their own
<m_3> mgz: we can just go ahead and bump that up a bit
<mgz> it wouldn't hurt
<m_3> mgz: didn't realize we were going to be adding that many rules
<m_3> is that because we're in global mode?
<m_3> mgz: we're not going to nest any security groups right?
<mgz> we'll add rules to the global group for everything that opens ports
<mgz> m_3: session done now, coming to find you
<m_3> mgz: booth
<thumper> rogpeppe: don't suppose you are around?
<thumper> hmm... just after midnight
<thumper> perhaps not...
<thumper> hi wallyworld
<thumper> wallyworld: how was the holiday?
<wallyworld> g'day
<wallyworld> farking awesome
<wallyworld> can't wait to go back
<mgz> no getting eaten by lion...
<wallyworld> no, i am a fast runner
<wallyworld> mgz: how's ODS?
<wallyworld> thumper: i like your Set stuff - i really lament Go's lack of collections and associated standard things like Array.contains etc - there's so much boilerplate in our business logic where all this is done by hand each time :-(
<thumper> :)
<mgz> wallyworld: but writing a loop is so easy
<thumper> wallyworld: yeah
<wallyworld> seems like for every 100 lines of code, 50% is not business logic at all
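The map-backed set Go makes everyone hand-roll looks roughly like this (a sketch of the general pattern, not thumper's actual change):

```go
package main

import "fmt"

// Set is the usual map-with-empty-struct-values string set; the
// struct{} values cost no memory, only the keys matter.
type Set map[string]struct{}

// Add inserts a value; adding an existing value is a no-op.
func (s Set) Add(v string) { s[v] = struct{}{} }

// Contains reports whether the value is in the set.
func (s Set) Contains(v string) bool { _, ok := s[v]; return ok }

func main() {
	s := Set{}
	s.Add("ec2")
	s.Add("ec2") // duplicates collapse
	fmt.Println(s.Contains("ec2"), s.Contains("maas"), len(s))
}
```

It is exactly this five-line dance, repeated in every package, that the complaint above is about.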
<thumper> mgz: don't make me hurt you
<mgz> m_3: we're still getting the mongo timeout thing every minute or so
<mgz> all seems to be from one machine, so that might just have something duff with networking
<thumper> mgz: is mramm there with you?
<mgz> he's within yelling distance somewhere
<thumper> mramm: oh hai... I'm guessing that we won't have a one-on-one call this week
<mramm> thumper: I was not planning on doing one on ones with everybody
<TheMue> so, 1st part of subordinates in status, time to go to bed.
<mramm> but I can sneak away from meetings to do some if they are helpful (on a case by case basis)
<TheMue> have a good night all
<mramm> TheMue: thanks!
<mramm> TheMue: good work.
<TheMue> mramm: yw, and thanks.
<thumper> mramm: nothing urgent, I talked with fwereade about work
<mgz> so, machine 7 just never arrived at a good state: <http://paste.ubuntu.com/5714587/>
<m_3> mgz: lemme know if we should bounce
<davecheney> m_3: rog committed a fix overnight to reduce the amount of logging spam
<davecheney> so that should cause less rsyslog load on the bootstrap node
<mgz> filed bug 1169773
#juju-dev 2013-04-17
<gary_poster> if there's a review of https://codereview.appspot.com/8811043/ we could get it in under the wire
<gary_poster> which would be nice :-)
<gary_poster> and it is very small
 * davecheney looks
<davecheney> gary_poster: this diff is dirty
<davecheney> there is some bzr .THIS crap in there
<gary_poster> davecheney, I was assuming that it was a clean up
<gary_poster> davecheney, might a repush clean it up?
<gary_poster> Makyo, ^^^ could you try a repush?
<gary_poster> davecheney, thank you for looking
<davecheney> actually, this removes a mistaken checkin
<davecheney> let me publish my review, there is some more dirtiness
<gary_poster> ok thanks
<davecheney> hazmat: this was your fault!
<davecheney> done
<davecheney> thumper: https://codereview.appspot.com/8811043/
<davecheney> can you give the gui guys a second LGTM
<davecheney> gary_poster: or you could just commit it
<davecheney> if time is a factor
<gary_poster> davecheney, thanks much.  I'll check what is possible,
 * thumper looks
<thumper> done
 * thumper off to the gym shortly for some sparring
<davecheney> gary_poster: Makyo you've got two, make those changes and fire at will
<gary_poster> awesome thanks very much davecheney and thumper
<davecheney> does anyone have a link to last week's hangout agenda ?
<davecheney> m_3: ping
<mgz> okay, have the other bug down as well
<m_3> davecheney: hey
<m_3> mgz: we're stuck
<davecheney> m_3: how's it going ?
<m_3> mgz: I'm leaving it be atm
<m_3> davecheney: seems hung up on something
<davecheney> hang on, changing hosts
<davecheney> m_3: tell me where it hurts
<m_3> davecheney: we brought up 200 earlier to check on adding incremental nodes via nova cli
<mgz> m_3: I saw it's not increasing, but didn't see anything in the log
<m_3> davecheney: mgz played a bit
<m_3> davecheney: then added another 200 (juju add-unit hadoop-slave -n200)
<mgz> I have two of the three issues I've seen tracked down
<m_3> davecheney: it added 15 more and hung
 * m_3 gotta relocate... getting kicked out of the expo area
<davecheney> mgz: i've seen an issue where the PA will get into a loop if there are problems creating security groups
<m_3> back on after while... I'll leave it up for y'all
<davecheney> m_3: mgz: i suggest destroying this instance
<davecheney> bzr pull to rev 1164
<davecheney> then trying again
<davecheney> 2013/04/16 23:41:06 ERROR provisioning worker: failed to GET object provider-state from container goscale2-1
<davecheney> caused by: https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/provider-state%!(EXTRA string=failed executing the request)
<davecheney> i think i have a fix for these formatting errors
<davecheney> mgz: i'd like to destroy this environment and start again with a fresh build
<mgz> okay, we could probably do with fixing a couple of the dumb issues too
<davecheney> rev 1164 cuts down on 66% of the log spam
<mgz> that particular one is probably api flood issue again
<davecheney> i'll grab the all machines log
<davecheney> then blow away the environment
<davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ scp 15.185.160.145:/var/log/juju/all-machines.log .
<davecheney> all-machines.log                                                            20%  141MB   5.6MB/s   01:38 ETA
<davecheney> that is a great network hp
<mgz> filed bug 1169778
<mgz> davecheney: compress first :)
<davecheney> mgz: sure
<davecheney> but 5.4Mb/s is shit
<davecheney> my pandaboard can do better than that
<davecheney> mgz: it's nice to know you can resume juju destroy-environment
<davecheney> mgz: m_3 we have 7 stray nova instances after destroy-environment
<davecheney> hmm
<davecheney> actually only 2
<davecheney> never mind
<mgz> one was mine, that I killed, and one is the manage one?
<davecheney> fixed it
<davecheney> wc -l was the wrong thing to use
<davecheney> mgz: m_3 deploying 200 hadoop-slave units
<davecheney> 100% of the time is being consumed waiting on hp to respond to new instance requests
<m_3> davecheney: hey
<m_3> davecheney: how's it going?
<m_3> davecheney: btw, you never want to use 'hadoop-slave' the charm... (long story)
<davecheney> m_3: 2013/04/17 01:52:25 NOTICE worker/provisioner: started machine 200 as instance 1509985
<m_3> davecheney: make sure you're doing `juju deploy hadoop hadoop-slave -n200`
<davecheney> m_3: sorry, i was looking in the history, that is all I could find
<m_3> ack
<davecheney> nm, we can destroy the instance
<m_3> always ran from bin/hadoop-stack
<davecheney> ok, let me nuke this environment
<davecheney> (it worked fine btw
<davecheney> rev 1164 is much less chatty
<m_3> davecheney: did it get above 200?
<m_3> davecheney: we have perms to go to 2000
<davecheney> just did -n200
<m_3> were just having probs right near the 200 mark
<davecheney> i'll bootstrap a fresh env
<davecheney> do bin/hadoop-stack
<m_3> I did a `-n1997` before
<davecheney> yeep!
<davecheney> that'll take hours
<m_3> did you get mgz's changes in?
<davecheney> no, but the logging change went in
<m_3> yup... about 8hrs or so
<davecheney> so we can actually see what is going on
<m_3> ah, cool
<m_3> wanna do 300?
<davecheney> ok
<m_3> or shall we go big
<davecheney> 500 should see us throught to drinks
<davecheney> how is ODS ?
<davecheney> is the marketing completely off tap?
<m_3> dude
<davecheney> hookers and blow ?
<m_3> it's pretty over the top
<m_3> haha
<m_3> yeah, all hookers and blow
<m_3> supposed to meet kapil in the lobby for dinner in a bit
<davecheney> actually, card tables, laptops and tshirts
<m_3> yeah :(
<m_3> better than couch, popcorn, and underwear at least
<davecheney> m_3: i don't need to go to a conference for that
<davecheney> m_3: we're good to go
<davecheney> hit it with the old -n300
<davecheney> hmm, that isn't good
<davecheney> let me drive for a second
<davecheney> ok, still piss farting around
<davecheney> the last bootstrap was much faster
<m_3> ack
<davecheney> still scrolling in apt
<davecheney>      ──cloud-init-cfg──┬─add-apt-reposit
<davecheney>      │                 └─sh───tee
<davecheney> wow
<davecheney> that is slow
<davecheney> this instance may fail to bootstrap
<m_3> might just kill it
<davecheney> do it
<davecheney> oh wait
<davecheney> yeah, that is really slow
<davecheney> nuke it
<m_3> once that one's up just run bin/hadoop-stack
<m_3> I'll check in after dinner if that's cool with you?
<davecheney> m_3: ok, is that configured for -n300 ?
<m_3> maybe you can catch what's going wrong near the 200 mark
<davecheney> totally cool
<m_3> yup
<davecheney> you go have dinner with K
<m_3> awesome thanks man... ttyl
<m_3> oh, and let's archive the logs in ~/arch/<date>
<davecheney> sure, i've been scp'ing them onto my machine
<davecheney> will fix
<m_3> at least to keep some sort of record of what we're doing
<m_3> thanks
<davecheney> actually, i'll put it in google drive
<m_3> I'm still rsyncing the whole control node's /home/ubuntu offsite daily
<m_3> perfect
<davecheney> probably isn't a bad idea
<davecheney> make sure you use rsync -z
<m_3> (-azvP)
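The daily offsite sync m_3 describes might look something like this; the host name and destination path are placeholders, not the real control node:

```shell
# -a archive mode (recursion, permissions, times), -z compress over the
# wire, -v verbose, -P keep partial transfers and show progress.
# "control-node" and the destination path are placeholders.
rsync -azvP ubuntu@control-node:/home/ubuntu/ ~/backups/goscale2/$(date +%F)/
```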
<m_3> cool... /me food
<m_3> thanks man
<davecheney> m_3: also, status now shows relations
<davecheney> 2013/04/17 02:13:05 DEBUG environs/openstack: openstack user data; 2708 bytes
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/6"
<thumper> \o/
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/6"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/5"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/5"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/6"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/6"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/5"
<davecheney> 2013/04/17 02:13:05 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/5"
<davecheney> if this is an error, why does the provisioner not stop
<thumper> davecheney: that doesn't look good
<davecheney> ?
<davecheney> 2013/04/17 02:12:29 INFO environs/openstack: starting machine 14 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
<davecheney> 2013/04/17 02:12:29 DEBUG environs/openstack: openstack user data; 2710 bytes
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/1"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/1"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#provider#hadoop-master/0"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#provider#hadoop-master/0"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/1"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/1"
<davecheney> 2013/04/17 02:12:40 ERROR settings for entity with unrecognized key "r#0#provider#hadoop-master/0"
<davecheney> 2013/04/17 02
<davecheney> starts to screw up here
<davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers%!(EXTRA string=failed executing the request)
<davecheney> i really want to fix this stupid formatting error
<davecheney> shit shit shit
<davecheney> the 1.9.14 release cannot seem to download any tools
<davecheney> however when I build the release from source
<davecheney> it can !
<davecheney> 2013/04/17 02:50:21 DEBUG checking tools 1.9.14-precise-amd64
<davecheney> 2013/04/17 02:50:21 INFO environs/openstack: starting machine 298 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
<davecheney> 2013/04/17 02:50:21 ERROR worker/provisioner: cannot start instance for machine "298": cannot find image satisfying constraints: failed to get list of flavours
<davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors%!(EXTRA string=failed executing the request)
<davecheney> caused by: Get https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors: lookup az-2.region-a.geo-1.compute.hpcloudsvc.com: Temporary failure in name resolution
<davecheney> i'm having a lot of trouble validating the 1.9.14 release
<davecheney> is anyone in a position to install 1.9.14 from ppa:juju/devel
<davecheney> and try
<davecheney> (don't forget to make sure you're calling /usr/bin/juju, not ~/bin/juju)
<fwereade> hazmat, if you're there, I don't think we necessarily want to destroy machines
<fwereade> hazmat, I suspect the deploy-ubuntu-to-prime, force-machine-to-deploy (anti-)pattern will be popularly used
<fwereade> davecheney, thumper: if you're there, can I get a trivial on https://codereview.appspot.com/8815043 please?
<davecheney> fwereade: i'll take a look
<davecheney> fwereade: i'm very concerned that 1.9.14 is stillborn
<davecheney> i'm thinking of proposing we move to daily point releases
<davecheney> so m_3 and I stop using the development chain
<fwereade> davecheney, I don't think it'll build, given this stuff
<fwereade> davecheney, it's an old test file that somehow crept in
<davecheney> wut ?
<fwereade> davecheney, at a casual glance it looks like it slipped in from the force-machine branch
<davecheney> this is the second turd to fall out of that branch
<fwereade> davecheney, although it's not quite clear how
<davecheney> without being rude, 4 people looked at that branch
<fwereade> davecheney, ouch :(
<davecheney> and nobody spotted the .THIS file there
<davecheney> fwereade: LGTM, please submit
<fwereade> davecheney, happening right now
<fwereade> davecheney, looks like that file slipped in somewhere between 1 and 5 days ago after the main round of reviewing
<davecheney> so that means the submitter did not run the tests before submitting
<davecheney> do we need to make another addition to .lbox.check ?
<fwereade> I never found out what happened to the bot proposal but that also seems to be stillborn
<fwereade> in its absence, maybe we do
<thumper> hi fwereade
<thumper> what are you doing up?
<fwereade> thumper, fell asleep early last night, back up now to merge a whole pile of things and do some reviews
<davecheney> fwereade: we can't have a bot til the tests pass reliably
<davecheney> that was my understanding
<thumper> fwereade: oh, I have four reviews up?
 * thumper goes to drink the coffee
<fwereade> davecheney, if we can't have a bot without reliable tests but we can't commit reliably without a bot *something* needs to change
<davecheney> fwereade: i think we can at least do a test compile of the tests
<davecheney> that would have caught this error
<davecheney> but it does seem a bit futile
<davecheney> we're like the TSA
<davecheney> running after the horse
<fwereade> davecheney, oh! how do we compile tests without running them?
<fwereade> davecheney, actually, don't tell me, I might start doing it ;p
<davecheney> fwereade: i'm sure there is a way
<davecheney> (cd /tmp ; go test -c $PKG) will do it
<fwereade> davecheney, emphatic +1 then
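The compile-only check davecheney sketches with `(cd /tmp ; go test -c $PKG)` could be wrapped in a small script like the following; the package loop and temp-dir handling are illustrative, not what juju-core's `.lbox.check` actually did:

```shell
# Compile each package's tests without running them, so a stray file that
# breaks the build (like the .THIS merge leftover) fails fast.
# Run from the root of a Go source tree.
set -e
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
for pkg in $(go list ./...); do
    (cd "$tmpdir" && go test -c "$pkg")
done
echo "all tests compile"
```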
<davecheney> fwereade: http://paste.ubuntu.com/5714987/
<davecheney> can anyone else test the 1.9.14 release ?
<davecheney> the top invocation is from the package
<davecheney> the bottom is the same source
<davecheney> 2013/04/17 02:50:17 INFO environs/openstack: starting machine 207 in "goscale2" running tools version "1.9.14-precise-amd64" from "https://region-a.geo-1.objects.hpcloudsvc.com/v1/17031369947864/goscale2-1/tools/juju-1.9.14-precise-amd64.tgz"
<davecheney> 2013/04/17 02:50:17 ERROR worker/provisioner: cannot start instance for machine "207": cannot find image satisfying constraints: failed to get list of flavours
<davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors%!(EXTRA string=failed executing the request)
<davecheney> caused by: Get https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/flavors: lookup az-2.region-a.geo-1.compute.hpcloudsvc.com: Temporary failure in name resolution
<davecheney> after 200 machines, we lose the ability to resolve the name of the hp endpoint
<fwereade> davecheney, hmm, I seem to be bootstrapping ok in us-east-1 with 1.9.14 from the package
<thumper> fwereade: so is it like 6:30am for you now?
<fwereade> thumper, yeah
<thumper> fwereade: cool, not even mind-numbingly early
<thumper> fwereade: don't suppose you have time for a quick hangout?
<fwereade> as predicted I fell asleep early and woke up early
<fwereade> thumper, sure
<davecheney> fwereade: i cannot bootstrap into ap-southeast-1 or 2 with that release
<thumper> fwereade: I'll start
<davecheney> but it works perfectly using the source
 * davecheney has had a gutfull of unreliable cloud providers
<davecheney> ubuntu@juju-hpgoctrl2-machine-0:~$ juju destroy-environment
<davecheney> error: failed to delete server with serverId: 1510323
<davecheney> caused by: https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/servers/1510323%!(EXTRA string=failed executing the request)
<davecheney> caused by: remote error: handshake failure
<fwereade> davecheney, ap-southeast-2 is... well, appearing to bootstrap anyway
<fwereade> davecheney, but acting a bit weird, I will let you know how I get on
<fwereade> davecheney, ok, the good news is that the bootstrap node comes up and works fine, the bad news is that the CLI will not acknowledge its existence
<davecheney> fwereade: -v ?
<davecheney> did it bootstrap the wrong tools
<davecheney> ?
<fwereade> 2013/04/17 06:55:37 INFO environs/ec2: waiting for DNS name(s) of state server instances [i-79c80744]
<fwereade> 2013/04/17 06:55:42 ERROR command failed: no instances found
<fwereade> error: no instances found
<davecheney> eep
<fwereade> davecheney, the instance definitely exists and it's definitely got the right tools
<davecheney> fwereade: i'm getting these errors constantly during the load tests
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/2"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/2"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/0"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/0"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/2"
<fwereade> davecheney, so that is good anyway
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/2"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#0#requirer#hadoop-slave/0"
<davecheney> 2013/04/17 04:57:50 ERROR settings for entity with unrecognized key "r#1#requirer#hadoop-slave/0"
<fwereade> davecheney, ah ffs there is no way those should be logged
<fwereade> davecheney, they're not errors, it's just the allwatcher getting its knickers in a twist because it's watching things it doesn't know about
<fwereade> davecheney, btw, we seem to be waiting only a very few seconds for the DNS name (apse2 issues), which is surprising to me
<davecheney> fwereade: oh, so that is the api server
<fwereade> davecheney, yeah
<davecheney> i was wondering why provisioning didn't land in a screaming heap
<davecheney> fwereade: i'll log a bug
<fwereade> davecheney, they're not even really errors -- it's just logging and stopping processing for those docs
<fwereade> davecheney, cheers
<davecheney> i'll log two bugs
<davecheney> can I get a review on https://codereview.appspot.com/8818043/
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1169825
<fwereade> davecheney, what happened to errors.Newf (if anything)?
<davecheney> nothing, just people were not calling it correctly
<davecheney> URL was not a format specifier
<davecheney> it was arg 1
 * fwereade sees now
<fwereade> davecheney, LGTM
<davecheney> http://paste.ubuntu.com/5715085/
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1169826
<TheMue> morning
<TheMue> fwereade: seen my part 1? during sleep I found a by far simpler approach. will continue after breakfast.
<jam1> afternoon wallyworld, how's it going today?
<fwereade> TheMue, great news, I'm afraid I fell asleep last night ;p
<wallyworld> jam1: going well thanks, just sorting out a test isolation issue and i will send a mp for my work
<TheMue> fwereade: no wonder after your night before
<jam> fwereade: second LGTM on https://codereview.appspot.com/8818043/
<jam> sorry, I meant davecheney ^^
<jam> bigjools, jtv: Is the maas provider ready to be exposed in jujud/juju client?
 * jtv is not here
<bigjools> jam: what is the effect of that?  Sorry, I have no context on it.
<jam> bigjools: if you have it enabled, people can configure it in ~/.juju/environments.yaml and try to bootstrap against a maas service.
<bigjools> as far as I know it all works, so if there's something missing I don't know why it's needed
<jam> If it is ready for people to do so, then we should expose it.
<jam> (which it has been)
<jam> But we also want to configure jujud to require it
<bigjools> jam: we can already bootstrap with it, I don;t understand
<jam> so we don't accidentally lose it later.
<jam> bigjools: I was just confirming that the code has gotten to the "working" state. I hadn't heard a specific update to where you guys were.
<jam> great.
<davecheney> jam: bzr: ERROR: Cannot lock LockDir(chroot-72851216:///%2Bbranch/goose/.bzr/branch/lock): Transport operation not possible: readonly transport
<davecheney> I cannot submit that mp
<jam> davecheney: so I would say LGTM trivial on exposing it.
<bigjools> jam: ok :)  yes it should all work
<davecheney> any ideas ?
<jam> davecheney: you can't use 'lbox submit' for goose. You mark the MP as approved (make sure it has a commit message), and then goose bot will run the tests and land it for you.
<jam> I can do it if you need help.
<davecheney> ta
<m_3> davecheney: how's it going?
<jam> davecheney: so on here: https://code.launchpad.net/~dave-cheney/goose/001-fmt-error/+merge/159287
<davecheney> m_3: hey
<jam> there is a "Set commit message" which you can just copy the description if you are happy with it.
<jam> and then mark it Approved.
<davecheney> sent you an email with the sitrep
<davecheney> basically can't get above 228 machines
<davecheney> i suspect we're running out of ipv4 allocations
<m_3> ouch
<davecheney> and it is being reported in a strange way
<m_3> actually, that's awesome... this is the shit we're doing this to flush  out
<davecheney> the way the dns address of the keystone endpoint stops resolving is both
<davecheney> a. repeatable
<davecheney> b. fucked up
 * m_3 reading mail
<m_3> davecheney: we can see our ip quotas... one sec
<davecheney> 228 is close enough to 254 that it makes me go hmmm
<davecheney> m_3: ahh
<davecheney> interesting
<davecheney> 228 isn't near any of those numbers
<davecheney> jam: now what happens ?
<m_3> davecheney: hmmmm.... thought that showed ip allocations too
 * m_3 rtfm-ing
<davecheney> jam: is the bot broken ?
<m_3> davecheney: I'll dig through the acct setup with antonio tomorrow
<m_3> davecheney: jorge and I are talking tomorrow, so won't get to this til later in my daytime
<jam> davecheney: usually it takes about 5 min for it to wake up and run the test suite
<jam> but if it fails it *should* report a failure to Launchpad and unmark the proposal as pending
 * davecheney thinks its dead
<m_3> davecheney: thanks for the help... /me beddy-bye
<davecheney> m_3: no worries mate
<jam> davecheney: from what I can tell 'canonistack' thinks the machine is running, but I'm getting 0 response from port 22
<davecheney> jam: i remember a discussion a while ago where you mentioned that you weren't able to contact the machine
<davecheney> but it appeared to be doing stuff, so whateva
<wallyworld> jam: i'm off to soccer, but here's a wip mp. dimiter was interested to see it. i can't land till his work goes in since i need to account for whatever changes he has made. i also need to land a small goose change
<wallyworld> https://code.launchpad.net/~wallyworld/juju-core/openstack-image-lookup/+merge/159301
<wallyworld> the covering letter also needs to be expanded, but i am late
 * TheMue is happy, the simpler approach works fine. now part two, adding the subordinate unit information.
<rogpeppe2> mornin' all
<TheMue> rogpeppe2: morning
<rogpeppe2> in case my earlier email didn't get through, my phone line is down currently
<rogpeppe2> i'm connected through my mobile currently, but i don't know how reliable it will be
<rogpeppe2> trivial branch that fixes #1169825 ; review appreciated please: https://codereview.appspot.com/8821043
<TheMue> YEEEEEAH!
 * TheMue does the subordinate dance ...
<rogpeppe2> fwereade: i'm after a review of this fairly trivial branch, please. i think it's important to get in before the release. https://codereview.appspot.com/8823043
<fwereade> rogpeppe2, ok, I'll chat about s3 in a mo
<rogpeppe2> fwereade: ok
<fwereade> rogpeppe2, nice, LGTM
<rogpeppe2> fwereade: trivial?
<TheMue> fwereade: you've got a proposal :)
<fwereade> rogpeppe2, yeah, I think so
<rogpeppe2> fwereade: cool, thanks
<fwereade> TheMue, cheers
 * rogpeppe2 wonders how long it'll take to run lbox submit through the mobile data connection
<TheMue> fwereade: it's https://codereview.appspot.com/8824043/
<fwereade> TheMue, cheers
<fwereade> rogpeppe, I am somewhat ambivalent about changing Storage.Put
<rogpeppe> fwereade: i think that's going to be a necessary thing if we ever want retries to work.
<fwereade> rogpeppe, I would be much happier if it took a Reader and tried to ReadSeeker it
<rogpeppe> fwereade: i wondered about that, but i was very glad i didn't
<fwereade> rogpeppe, oh yes?
<rogpeppe> fwereade: because the static type caught lots of places that i would not otherwise have found
<rogpeppe> fwereade: it's not as though we can stream data without having it all first, either
<rogpeppe> fwereade: because we need to know the length
<fwereade> rogpeppe, it is not unheard of for content lengths to be declared
<rogpeppe> fwereade: that is true, i suppose
<fwereade> rogpeppe, if they are, then great,but even if not I don't really think we should force all clients to buffer everything
<rogpeppe> fwereade: but in all the cases we care about, we do want retries to work
<rogpeppe> fwereade: and they won't work on a straight io.Reader
<fwereade> rogpeppe, maybe it's our own responsibility to buffer though
<rogpeppe> fwereade: i think so
<fwereade> rogpeppe, ha, different "our" I suspect
<rogpeppe> fwereade: ah, maybe so :-)
<rogpeppe> fwereade: i really don't think that Put should do the buffering itself
<rogpeppe> fwereade: if that's what you were thinking of
<fwereade> rogpeppe, yeah, I definitely see that side of it too
<fwereade> rogpeppe, but I'm not sure it's a very strong objection -- convince me?
<rogpeppe> fwereade: in many of the places, we already have the data in a file
<fwereade> rogpeppe, in which case, yay, easy ReadSeeker
<rogpeppe> fwereade: buffering in Put would force us to make another copy of that before doing the Put
<fwereade> rogpeppe, I'm only suggesting we should buffer ourselves if we can't ReadSeeker it
<rogpeppe> fwereade: oh i see, you mean "if it doesn't implement ReadSeeker, then pull everything into a buffer" ?
<fwereade> rogpeppe, more ideally a temp file
<rogpeppe> fwereade: i don't really see the point.
<rogpeppe> fwereade: in all the places we have, it's trivial to make a ReadSeeker
<rogpeppe> fwereade: and i much prefer to avoid dynamic type checking magic where reasonable
<fwereade> rogpeppe, so that we can inject the reliability as close to the problematic bit as possible, and so we don't push a new restriction into the API that I'm not sure is justified
<rogpeppe> fwereade: you're saying we shouldn't force all clients to buffer everything, but we're going to buffer everything anyway, right?
<rogpeppe> fwereade: for the rare (currently non-existent) case where the client doesn't have a seeker, it's easy for them to make one.
<rogpeppe> fwereade: and know about the trade-offs involved
<rogpeppe> fwereade: as i said, having the static type caught lots of cases where i could trivially use a readseeker but didn't. if i hadn't made that change, we'd be creating a temp file for each of those.
<fwereade> rogpeppe, or maybe we'd be making use of the length param to decide whether or not to create one ;)
<rogpeppe> fwereade: that seems to me like lots of heuristics and extra code where currently we don't need any
<rogpeppe> fwereade: if we find lots of places that want to use an io.Reader, we can easily make a func PutBuffered(s Storage, name string, r io.Reader, length int64) error
<fwereade> rogpeppe, isn't that because you're just pushing the responsibility onto the client?
<rogpeppe> fwereade: so it's obvious to the caller that we're going to be creating a temporary file
<rogpeppe> fwereade: sure. but creating a temporary file is something i think it's worth the client be aware of
<rogpeppe> fwereade: and it means we don't have to write more code now
<fwereade> rogpeppe, possible compromise: at the point where you'd seek, just error out if it's not a ReadSeeker?
<rogpeppe> fwereade: that's worse, i think
<fwereade> rogpeppe, automatic reliability in the cases we already have, no external code changes, no magic buffering
<fwereade> rogpeppe, I guess there's something I'm missing
<fwereade> rogpeppe, (ok, maybe we should check first to know where to seek back to)
<rogpeppe> fwereade: except that i want automatic reliability in all cases, and it's not hard
<rogpeppe> fwereade: i don't really see why there's a problem with requiring ReadSeeker. it loses no generality
<rogpeppe> fwereade: yes, that is also an issue with just dynamically type converting to ReadSeeker
<rogpeppe> fwereade: we'd have to say "*if* this is a ReadSeeker, it must be positioned at the start"
<rogpeppe> fwereade: that's ok i guess, but i'd much prefer to just require a ReadSeeker
<fwereade> rogpeppe, the issue is that it's a restrictive interface change to all the providers to implement a change that's ec2-specific
<fwereade> rogpeppe, I feel like it's ec2 driving the interface, not the other way round
<rogpeppe> fwereade: it's not necessarily ec2-specific. it applies to any provider where the request might encounter a transient failure
<fwereade> rogpeppe, it is in practice ec2-specific... and any other provider that implements Put-reliability will need to make similar internal changes to those in ec2 regardless
<rogpeppe> fwereade: i agree. but if you've just got an io.Reader, you *cannot* retry a request - so regardless of the provider implementation, they'll want some way of rewinding the stream.
<rogpeppe> fwereade: so we're going to force all providers that want to retry to implement their own temp-file buffering stuff
<rogpeppe> fwereade: i'd prefer to keep that cleverness outside of the individual providers
<fwereade> rogpeppe, the tradeoff is between asking a limited number of providers to do so, and asking every client to do so
<fwereade> rogpeppe, our perspective on this may skewed because we are focused on writing one client, and we have 3+ providers to think about
<rogpeppe> fwereade: if it's just a difference between s.Put(name, r, len) and environs.PutBuffered(s, name, r, len) then i don't see the issue
<rogpeppe> fwereade: then the temp-file buffering is implemented in a single place, once and for all
<rogpeppe> fwereade: (and currently we don't even need it, because there's not a single place that's really streaming)
<fwereade> rogpeppe, the question I am trying to get clarity on is: purely at the level of the interface, why is it better to have a ReadSeeker than a Reader?
<rogpeppe> fwereade: because a Reader allows streaming only, and here we can't stream
<fwereade> rogpeppe, we can stream, it's just not necessarily so reliable
<rogpeppe> fwereade: and that's a good thing?
<fwereade> rogpeppe, considering the capabilities of the various providers, I think it is all we can reasonably guarantee
<rogpeppe> fwereade: we can guarantee to make available to a provider some data that they can retry on. i think that's a nice thing to do. it makes the provider's job easier.
<rogpeppe> fwereade: your solution is asking me to put more intelligence and code in the provider, where actually we want to move as much as possible *outside* the provider
<fwereade> rogpeppe, *we* can, yes -- but I thought we weren't writing the interfaces just for our own convenience?
<fwereade> rogpeppe, I don't know how you are going to put S3-specific buffering outside the provider
<rogpeppe> fwereade: as far as i'm concerned, it's the job of that interface to make it as easy as possible to write a provider
<fwereade> rogpeppe, the provider should absolutely contain *provider-specific* intelligence and code
<rogpeppe> fwereade: for clients, we can easily write convenience functions that wrap it
<rogpeppe> fwereade: agreed. but buffering the reader is not provider-specific. i'm not sure what you mean by "S3-specific buffering".
<rogpeppe> fwereade: another possibility is to pass a function (func() io.Reader) that can be called multiple times to get the source.
<rogpeppe> fwereade: that would make it easier for a client to do streaming.
<fwereade> rogpeppe, does every provider have to implement this retry stuff?
<fwereade> rogpeppe, it seems it is a provider-specific enhancement
<rogpeppe> fwereade: it's a good question. i suspect it's not unlikely.
<rogpeppe> fwereade: the point is that any provider that *does* want to implement retry *must* implement the same buffering code.
<rogpeppe> fwereade: and that client code must work against any provider
<rogpeppe> fwereade: i suppose your argument could be: for providers that *don't* implement retry, we'll be doing more work than necessary.
<rogpeppe> fwereade: assuming we actually have some client code that wants to stream.
<fwereade> rogpeppe, well, we already did, even if it was just a smattering of test code
<rogpeppe> fwereade: really?
<rogpeppe> fwereade: i thought the changes were just changing bytes.Buffer to bytes.Reader in the main
<rogpeppe> fwereade: no extra buffering required
<fwereade> rogpeppe, my argument is really that *most* providers don't retry, and that even if we get to a point where most *do* I would rather have a new provider able to implement the interface to the full more easily
<rogpeppe> fwereade: huh? this doesn't make it harder in the slightest for a new provider
<fwereade> rogpeppe, rather than make Put-retries a fundamental requirement for a provider, I would like them to be an optional enhancement
<rogpeppe> fwereade: they are
<rogpeppe> fwereade: a provider can totally ignore the fact that it's a ReadSeeker
<fwereade> rogpeppe, if an Environ is always required to retry, ReadSeeker makes sense
<fwereade> rogpeppe, Storage sorry
<rogpeppe> fwereade: i don't see any requirement - it makes it *possible* for a provider to retry, but by no means mandatory
<fwereade> rogpeppe, a Reader in the interface implies potentially-unreliable streaming, a ReadSeeker tenacious persistence
<fwereade> rogpeppe, if the lowest common denominator is unreliable streaming, that is what we should advertise
<rogpeppe> fwereade: i don't see any difference in reliability between a Reader and a ReadSeeker
<rogpeppe> fwereade: they both implement io.Reader
<rogpeppe> fwereade: a seeker has no more reliability guarantees
<fwereade> rogpeppe, if any, or all, providers want to get smarter and buffer, then they can do so and everyone who uses them gets to see the benefits transparently
<fwereade> rogpeppe, I'm talking about the reliability of the operation, not its data source
<rogpeppe> fwereade: but we can implement this behaviour once and for all providers
<rogpeppe> fwereade: another possibility would be to hoist the retry logic outside the providers, but i think it's too inherently provider-specific
<rogpeppe> fwereade: and even then, we'd need a stream we could rewind
<fwereade> rogpeppe, how about EnsureSeeker(Reader) (ReadSeeker, error), and fall back to only trying once?
<rogpeppe> fwereade: who calls that?
<fwereade> rogpeppe, any provider that wants to add retry logic
<fwereade> rogpeppe, they just have to be prepared for it to fail and fall back to non-retrying
<rogpeppe> fwereade: what would EnsureSeeker do other than .(io.ReadSeeker) ?
<fwereade> rogpeppe, one day, maybe, buffer to a file if that fails
<fwereade> rogpeppe, I think that's a cleaner separation of concerns
<rogpeppe> fwereade: i'd find it preferable for a client to explicitly disable retrying (for instance by passing in a ReadSeeker where the Seek always fails)
<rogpeppe> fwereade: then a client has to make a positive action to *disable* reliability, rather than the other way around
<fwereade> rogpeppe, except in the cases where it isn't reliable
<rogpeppe> fwereade: func ErrorReadSeeker(r io.Reader) io.ReadSeeker
<rogpeppe> fwereade: sure, some providers just aren't reliable. but when we use a reliable provider, we are assured of reliability
<rogpeppe> fwereade: rather than, some time in the future "oh, that's failing because i forgot to use a bytes.Reader rather than a bytes.Buffer"
<rogpeppe> fwereade: and this question is currently academic - we have a seekable reader in every single case, trivially.
<fwereade> TheMue, https://codereview.appspot.com/8824043/ reviewed
<TheMue> fwereade: cheers
<fwereade> rogpeppe, if not all providers are reliable, why would we imply they are?
<rogpeppe> fwereade: we are not. we are implying that a provider *may* be reliable.
<fwereade> rogpeppe, and placing the burden for supporting reliability on the *client*
<rogpeppe> fwereade: ... which is trivial!
<fwereade> rogpeppe, despite the fact that it may not actually be present
<rogpeppe> fwereade: there is no burden
<rogpeppe> fwereade: honestly - how does it actually burden the client?
<fwereade> rogpeppe, you just suggested ErrorReadSeeker
<fwereade> rogpeppe, which STM to be boilerplate you expect every client to use if they don't know the provenance of their Reader
<fwereade> rogpeppe, which, honestly, I do not think they should have to care about
<rogpeppe> fwereade: yes - i'd like each client to be fully aware when they might not get reliability against an otherwise bulletproof provider
<rogpeppe> fwereade: but currently there are *no* clients that would use it
<fwereade> rogpeppe, IIRC Put returns an error
<fwereade> rogpeppe, under-promise, over-deliver ;)
<rogpeppe> fwereade: this isn't promising anything, really.
<rogpeppe> fwereade:
<rogpeppe> 1) this does not require any providers to do anything different
<rogpeppe> 2) this enables a provider to implement retry easily
<rogpeppe> 3) there is no significant impact on how easy it would be to write a client that uses the Storage.
<rogpeppe> i really can't see a down side
<fwereade> rogpeppe, *every* provider needs to Read; *some* providers may be able to be smarter if they can also Seek
<rogpeppe> fwereade: sure.
<fwereade> rogpeppe, this seems to be exactly why we can do `rs, ok := r.(io.ReadSeeker)`
<rogpeppe> fwereade: and *every* client needs to work with *every* provider
<rogpeppe> fwereade: so what the client provides will enable those *some* providers to be smarter.
<rogpeppe> fwereade: if there was a significant difficulty for clients to provide a Seeker, i'd agree with you.
<rogpeppe> fwereade: but really, there's not AFAICS.
<fwereade> rogpeppe, whether or not a given Reader is also a Seeker is not something clients should have to think about
<rogpeppe> fwereade: it really is - a client should be aware that by using a reader that's not a seeker it is making the operations more unreliable on some providers.
<rogpeppe> fwereade: and we can let the compiler tell us that
<rogpeppe> fwereade: i think that's awesome
 * rogpeppe really doesn't like dynamic type checks where they're easily avoided.
<fwereade> rogpeppe, you're proposing that introducing a magic implicit reliability switch will make people's lives *easier*?
<fwereade> rogpeppe, the client should not get to affect that choice
<rogpeppe> fwereade: it's not magic. doing a dynamic type check under the hood to see if the client *might* have turned on that switch *is* magic.
<fwereade> rogpeppe, how does it affect the client, except to cause fewer errors?
<rogpeppe> fwereade: fewer errors is a huge thing
<fwereade> rogpeppe, yes
<fwereade> rogpeppe, and a good thing
<fwereade> rogpeppe, that the client doesn't have to think about
<rogpeppe> fwereade: i'm glad we agree on something :-|
<rogpeppe> fwereade: indeed - the io.ReadSeeker contract implies that they get reliability for free
<rogpeppe> fwereade: and is trivial for all current (and probably most future) clients to provide
<rogpeppe> fwereade: that's really the crux - if that's not true, then i think you're right.
<fwereade> rogpeppe, saying "you have to provide a readseeker, and hence reliability, but it might not be used" is not better than saying "just give us a Reader -- if it's also a ReadSeeker, you might see some benefits on some providers"
<rogpeppe> fwereade: i think it is
<rogpeppe> fwereade: because it means that the compiler tells us where things might be unreliable.
<rogpeppe> fwereade: so we *know* that in all current cases, you *will* see benefits on the providers that can use it
<rogpeppe> fwereade: and when making this change, i was surprised by the places that it told me
<rogpeppe> fwereade: and they were all trivially changeable to make work
<dimitern> jam: you around?
<rogpeppe> fwereade: i think that's a concrete benefit
<fwereade> rogpeppe, you'll have to explain further
<rogpeppe> fwereade: so, when i changed the signature, i ran the build
<rogpeppe> fwereade: and it said, in various places "that argument does not implement io.ReadSeeker"
<rogpeppe> fwereade: and i went there
<fwereade> rogpeppe, so, it doesn't get to be reliable on ec2, just like it doesn't everywhere
<rogpeppe> fwereade: and, lo! it was using bytes.Buffer not bytes.Reader
<wallyworld> dimitern: https://code.launchpad.net/~wallyworld/juju-core/openstack-image-lookup/+merge/159301
<jam> dimitern: I'm back
<rogpeppe> fwereade: i don't understand that
<fwereade> rogpeppe, the interface should reflect the minimum set of required capabilities
<fwereade> rogpeppe, if I've just got a Reader I *know* it's potentially unrepeatable anyway
<fwereade> rogpeppe, you're asking me to turn it into a ReadSeeker that you might not bother to use
<fwereade> rogpeppe, how does it help?
<fwereade> rogpeppe, OTOH transparently making use of reliability, or injecting it ourselves if it's really critical, frees the client to solve their own problem
<rogpeppe> fwereade: we require that code to run against every provider. we *know* that when we run with the ec2 provider that the ReadSeeker *will* be used (assuming an operation fails in a transient way)
<fwereade> rogpeppe, why do all the clients for all the other providers have to care?
<rogpeppe> fwereade: there is no client "for another provider". a client is for *all providers*
<fwereade> rogpeppe, you have been talking like the client knows its using ec2
<rogpeppe> fwereade: we are indeed writing code that we know will be used against ec2 (and all the other providers too)
<fwereade> rogpeppe, the *all providers* context should only make it more starkly clear that we are messing with a perfectly good interface so we can show off a special feature of a single provider
<rogpeppe> fwereade: on the contrary, i think that the ec2 requirement shows the inadequacy of the interface
<rogpeppe> fwereade: and i'm sure there will be other providers that will want to retry a Put too.
<rogpeppe> fwereade: by making the change now, we make it easy for them to do so
<fwereade> rogpeppe, AFAICT the only "inadequate" thing about the interface is that it requires that you do about the most innocuous possible type check
<fwereade> rogpeppe, in order to add *provider-specific* functionality
<fwereade> rogpeppe, ReadSeeker makes no sense unless reliability-vs-transient-errors is in the contract of the Storage type
<rogpeppe> fwereade: the inadequate thing is that it's not obvious to people using that interface that if they *happen* to pass a ReadSeeker in, somehow things will magically get more reliable on some providers.
<fwereade> rogpeppe, but you seemed to be saying that people would usually be using ReadSeekers anyway
<rogpeppe> fwereade: as far as i'm concerned, the contract of the Storage type is "do whatever you can to try and fulfil these requests"
<rogpeppe> fwereade: not really - i'm saying that people would usually be using some that could trivially *be* a ReadSeeker
<fwereade> rogpeppe, so you're breaking the contracts of maas and openstack, because they don't do that
<rogpeppe> fwereade: not at all
<rogpeppe> fwereade: perhaps they don't get transient errors
<fwereade> rogpeppe, then they *definitely* don't need ReadSeekers!
<rogpeppe> fwereade: that's fine. all they need is the reader bit then.
<rogpeppe> fwereade: there's no requirement to use the seeker part
<fwereade> rogpeppe, then why would we require that it be supplied?
<rogpeppe> fwereade: ec2 won't use the seeker part unless something fails with a certain kind of error
<rogpeppe> fwereade: because thing client needs to supply a reader for *all* providers!
<rogpeppe> s/thing/the/
<fwereade> rogpeppe, and it does
<fwereade> rogpeppe, and ec2 would make use of it if you would just make use of the language feature designed explicitly for that purpose
<fwereade> rogpeppe, type casts and type switches were not just put in for fun
<rogpeppe> fwereade: *and* if clients actually passed in a readseeker
<fwereade> rogpeppe, we don't get to control our inputs and probably nor do they
<fwereade> rogpeppe, *they* then have to advertise ReadSeeker
<rogpeppe> fwereade: when would it ever *not* be appropriate for a client to pass in a ReadSeeker?
<fwereade> rogpeppe, all the places you changed, for a start
<rogpeppe> fwereade: in that branch, there are various places where the changed code does "bytes.NewReader(buf.Bytes())". that code is totally non-obvious if the argument to Put is just a Reader. but it has significant runtime implications.
<fwereade> rogpeppe, being?
<rogpeppe> fwereade: that when running against ec2 the operation will be less reliable if you don't
<rogpeppe> fwereade: essentially we want all our client code to work as well as possible against all providers. that means that the client code should work as well as possible against *at least* the ec2 provider. so *all* clients should provide the thing that makes ec2 work well - i.e. a ReadSeeker. given that, why not make the interface appropriate to that?
 * rogpeppe thinks it might be possible to phrase most of that in mathematical notation
<wallyworld> fwereade: are you free sometime soon for a meeting with me and dimitern ?
<wallyworld> soon = next 30 minutes ?
<dimitern> fwereade: about openstack constraints/images/flavors selection
<fwereade> wallyworld, dimitern: sure, I can do now
<dimitern> fwereade: ok, I'll start a g+
<wallyworld> ok
<dimitern> https://plus.google.com/hangouts/_/edc5333a9131548ea93258b2c5c90f6a9ef4af15?authuser=0&hl=en << fwereade, wallyworld
<wallyworld> trying, not connecting
<dimitern> wallyworld: william was having problems with sound
<jam> wallyworld: as a side thing, when you know your holidays, try to put them on the juju team calendar. I think I did it correctly for you this time.
<wallyworld> jam: i did add them
<wallyworld> the week before i left
<jam> wallyworld: I mean the ones you just applied for national holidays in April and June
<wallyworld> np sorry
<jam> april 25th june 10th I think
<rogpeppe> fwereade: "Why aren't we doing this at the end of cmd.Main? We already get a log message on error."
<rogpeppe> fwereade: that's the way i tried to do it first
<rogpeppe> fwereade: but unfortunately it breaks the jujuc log command
<rogpeppe> fwereade: we really don't want jujuc log to say "command finished" every time it runs.
<rogpeppe> jam: "
<rogpeppe> Some
<rogpeppe> platforms allow renaming an empty dir over another empty directory.
<rogpeppe> "
<rogpeppe> really?
<rogpeppe> i thought the whole point of using directories in fslock was that that wasn't the case
<rogpeppe> fwereade: if that *is* the case, then fslock is broken AFAICS
 * dimitern bbiab
<TheMue> fwereade: the propose is in again
 * TheMue is at lunch
<fwereade> rogpeppe, I've not looked at fslock properly yet
<rogpeppe> fwereade: ok
<rogpeppe> fwereade: a review from you would be good before it goes in
<fwereade> rogpeppe, re jujuc: hmm, I'm not so sure jujuc logging is entirely a misfeature
<fwereade> rogpeppe, if anything I'd say "good point, add start logging to jujuc commands"
<fwereade> rogpeppe, I would expect that in the eventual case that would be logged at a lower level than that specified for juju-log
<rogpeppe> fwereade: the problem is the jujuc *log* command. do we really want that to produce two lines of log for every line the user intended to log?
<rogpeppe> fwereade: i thought that the level was one of the arguments to juju-log
<rogpeppe> fwereade: well, one that we currently discard, i'm aware
<rogpeppe> fwereade: but given that we have those levels now, i think it probably should support that
<fwereade> rogpeppe, it is one of the arguments to juju-log, and ofc it should support that
<fwereade> rogpeppe, but that's the level at which the supplied message should be logged, and it's independent of the level at which we log command completion
<rogpeppe> fwereade: so if i do juju-log --level=debug foo, do i really want to see two lines at that level in the log?
<rogpeppe> fwereade: if you think that's ok, then i'll go with logging in cmd.Main. i didn't think it was, which was why i changed it.
<fwereade> rogpeppe, yes, I think so, because if we unfuck juju-log we will also, I imagine, badge its output clearly such that we can filter that stuff easily
<fwereade> rogpeppe, might be wrong
<rogpeppe> fwereade: i'm just imagining a scenario where the user is producing *lots* of output using juju-log. we just doubled the number of lines produced
<rogpeppe> fwereade: and log file size is actually a significant issue for us.
<fwereade> rogpeppe, how much would we save if we dropped the logging in state.Open except (1) before the dial func and (2) on errors inside the dial func?
<fwereade> rogpeppe, saving 240 lines/agent/hour will probably leave us some wiggle room :)
<rogpeppe> fwereade: i don't know. i have thrown away the last huge log file i acquired
<fwereade> rogpeppe, not as much help as silencing the txn spam, I agree
<rogpeppe> fwereade: i just think it's a bit weird that a command whose sole purpose is to produce a line of log output actually produces two.
<rogpeppe> fwereade: well, probably more still actually, as the uniter probably logs when it gets a jujuc command execution request
<rogpeppe> fwereade: i think the changes to logging in state.Open sound reasonable.
<mramm> morning all (well early my morning)
<rogpeppe> mramm: hiya
<fwereade> rogpeppe, I think that the general "it's good to know what commands we run" principle *probably* beats the (otherwise clearly sane) juju-log consideration
<rogpeppe> mramm: i won't be able to join the kanban meeting today as my broadband connection is out
<rogpeppe> fwereade: i think i agree. but the "finished" message perhaps doesn't
<fwereade> rogpeppe, sorry, the agent finished messages?
<mramm> bummer that
<rogpeppe> fwereade: the "finished" message in cmd.Main
<rogpeppe> mramm: you never know, the engineers might turn up (chortle, chortle)
<fwereade> rogpeppe, I don't see one
<fwereade> rogpeppe, I see "command failed"
<rogpeppe> fwereade: that's the one you're suggesting
<rogpeppe> fwereade: "Why aren't we doing this at the end of cmd.Main?"
<fwereade> rogpeppe, and I just suggested an alternative: in SuperCommand
<rogpeppe> fwereade: orly? i don't think i've seen that
<fwereade> rogpeppe, or... apparently I didn't
<fwereade> rogpeppe, the insanity peppers must be kicking in
<fwereade> rogpeppe, or maybe I'll discover it in the wrong buffer somewhere in an hour
<rogpeppe> :-)
 * fwereade pokes himself with something sharp
<rogpeppe> fwereade: that happens to me too often
<rogpeppe> fwereade: why would doing it in SuperCommand help? doesn't jujuc use SuperCommand?
<fwereade> rogpeppe, I *think* niemeyer talked me out of it, let me check
<rogpeppe> fwereade: you're right, i don't think it does
<rogpeppe> pwd
<rogpeppe> fwereade: still seems a bit iffy to me. we're logging in SuperCommand because use of SuperCommand *happens* to correspond exactly with the commands we want to log finished messages for.
<fwereade> rogpeppe, feels a bit icky but maybe not too much so
<fwereade> rogpeppe, because SuperCommand at least does know about logs
<fwereade> rogpeppe, and in general I don't think we should expect either main.Main *or* cmd.Main to have logging configured
<rogpeppe> fwereade: and juju doesn't?
<fwereade> rogpeppe, not much point logging if you don't have a target IMO
<rogpeppe> fwereade: true, but no harm either, no?
<fwereade> rogpeppe, message that looks like it should be printed but actually isn't
<rogpeppe> fwereade: true of all log messages...
<fwereade> rogpeppe, I think that, all other things being equal, it is better to write a given message when you know that logging setup has occurred, and to write important messages elsewhere if you can't be sure
<fwereade> rogpeppe, but that is kinda a derail
<fwereade> rogpeppe, how do you feel about dropping it from cmd.Main and logging success/failure in SuperCommand?
<fwereade> rogpeppe, independent of reasons I might like it ;)
<rogpeppe> fwereade: that seems reasonable to me. i'd forgotten that logging was so tightly bound up with SuperCommand.
<fwereade> TheMue, https://codereview.appspot.com/8824043/ LGTM
<TheMue> fwereade: yeah, just read it, thanks.
<fwereade> dimitern, rogpeppe, jam, anyone: a second look at that would be handy; I am not entirely happy about some of the code but it produces the correct output and that's the critical thing at this point
<TheMue> fwereade: the deleting of the units of subordinate service is done this way in py too. the data is used before to add the unit info to the principal units.
<rogpeppe> fwereade, TheMue: looking
<fwereade> rogpeppe, https://codereview.appspot.com/8821043/ LGTM trivial
<rogpeppe> fwereade: ta!
 * rogpeppe is trying hard to grok the code in processServices
<TheMue> rogpeppe: maybe i should move those loops into two preprocessing funcs with descriptive names
<rogpeppe> istm that it should be possible to get the type system working for us there, rather than fighting it in every line
<TheMue> rogpeppe: it's only a mix of the collected data into the output
<TheMue> rogpeppe: especially the lower loop is exactly how it is done today in py
<rogpeppe> TheMue: the difficulty i have is that we're spending a lot of code dynamically type casting stuff that perhaps we can already know the types of.
<TheMue> rogpeppe: but this map[string]interface{} indeed complicates it *sigh*
<TheMue> rogpeppe: yes, i'm not happy about it too
<TheMue> rogpeppe: but regarding the feature freeze i think we should change this afterwards
<rogpeppe> TheMue: at the least it could do with some comments so we know what "post-processing the subordinates" is actually doing.
<TheMue> rogpeppe: will add them
<rogpeppe> TheMue: what do subFromMap and subToMap do?
<TheMue> rogpeppe: will find better names. the subToMap had its name from collecting the service names for the output "subordinate-to". ;)
<rogpeppe> TheMue: what's are the keys and what to the keys map to?
<rogpeppe> s/to the/do the/
<TheMue> rogpeppe: they do hangout ;)
<rogpeppe> TheMue: given that the data types are so dynamic, we really need more comments
<TheMue> rogpeppe: i'll add comments and change the names
<rogpeppe> hmm, the link box says it's got a link, though the phone's still dead. will try broadband again.
<rogpeppe1> woo!
<mramm> https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083
<rogpeppe1> fwereade: hmm, i'm not sure if it's working
<rogpeppe1> very limited bandwidth!
<rogpeppe1> dimitern: i saw you for a moment...
<TheMue> fwereade: is it ok to move the status in with your comments handled and improvements in readability and comments?
<fwereade> TheMue, has anyone else at least glanced at it?
<fwereade> TheMue, I care a lot more about output than about readability though :)
<TheMue> fwereade: roger is looking, and i could ask dimitern
<fwereade> TheMue, but yes, you have my LGTM, just get someone else's too :)
<TheMue> fwereade: ok, will do
<rogpeppe1> interesting, bandwidth tester indicates 1Mb download (not too bad) but 52Kb upload. not surprising that the hangout worked ok when i turned off my vid and microphone
<dimitern> TheMue: you really hate merging conflicts it seems ;)
<TheMue> dimitern: yeah, really, i do
<TheMue> dimitern: and also i got my first lgtm ;)
<dimitern> TheMue: no worries, it'll be easier for me as well once you land your stuff
<TheMue> dimitern: great. had a chance to take a look at it?
<dimitern> TheMue: just a brief glance
<TheMue> dimitern: can i charm you a lgtm? ;)
<dimitern> TheMue: in a bit, just looking through cards and bugs to make sure we're in sync
<TheMue> dimitern: just wait, a new propose will fly in in a few moments
<dimitern> TheMue: sure, np
<TheMue> fwereade: ping
<fwereade> TheMue, pong
<TheMue> fwereade: your comment in the status_test, line 842, how do you think it shall work?
<fwereade> dimitern, rogpeppe1, TheMue: trivial if anyone wants it: https://codereview.appspot.com/8768045
<TheMue> fwereade: i'm lost with this comment.
<dimitern> fwereade: LGTM, trivial
<TheMue> fwereade: LGTM
<fwereade> thanks guys
<TheMue> fwereade: i thought i followed your first idea, but it seems that this has been wrong
<fwereade> TheMue, I mean that adding a relation is a prereq of adding a subordinate, and that won't (shouldn't ;)) work if you add a subordinate of service S to more than one unit of service T
<fwereade> TheMue, it's a good start
<fwereade> TheMue, but relateServices already exists and shouldn't be duplicated
<fwereade> TheMue, while adding the subordinate is an operation on an existing unit and an existing service
<TheMue> fwereade: ah, ic
<fwereade> popping out for a mo, bbiab
<benji> gary_poster/rick_h_: yeah, I'm looking at it
<gary_poster> benji, hi channel jumper
<benji> pff
<TheMue> dimitern: https://codereview.appspot.com/8824043/ is back in again
<dimitern> TheMue: I'm on it
<dimitern> TheMue: you got LGTM from me
<dimitern> TheMue: please go ahead an land it, so I can continue on mine
<TheMue> dimitern: thx, will do
<TheMue> anyone experiencing the same bootstrap_test problems like me? i just merged the trunk and since then the test fails
<dimitern> TheMue: what's your series?
<TheMue> precise
<dimitern> TheMue: i'm on quantal - just got latest trunk and running tests now; haven't seen this issue before though
<fwereade> TheMue, if you're seeing https://bugs.launchpad.net/juju-core/+bug/1169826 then I need help tracking it down please :)
<TheMue> fwereade: hmm, no, i don't get a panic
<dimitern> I also didn't get a panic, but got this error in environs/openstack: http://paste.ubuntu.com/5716314/
<TheMue> fwereade: it's in bootstrap_test line 105
<TheMue> fwereade: c.Check(urls, HasLen, len(test.uploads))
<TheMue> fwereade: if i get it right it's one url instead of the expected two
<fwereade> TheMue, expected/actual? which test?
<dimitern> and i can see the cmd/juju tests almost doubled in running time the past 2 weeks
<TheMue> fwereade: cmd/juju/bootstrap_test.go line 105
<fwereade> dimitern, that's because we're actually testing stuff
<fwereade> TheMue, that's a table test, the line number tells me very little
<dimitern> fwereade: that's good then :)
<fwereade> dimitern, ha, I can repro yours (one of them anyway) if I edit /etc/lsb-release
<TheMue> fwereade: does "test 10: --upload-tools always bumps build number" help you more?
<fwereade> TheMue, helpful, tyvm
<fwereade> TheMue, dimitern, sorry I have to take a break :/ back soon
<TheMue> fwereade: as my CL doesn't cover this, is it ok to still submit it?
<dimitern> fwereade: that's awesome, I'll remember this trick to change current series
<dimitern> TheMue: submitting onto a broken trunk is a no no
<hazmat> ods keynote livestream fwiw http://openstackportland2013livestream.eventbrite.com/
<TheMue> dimitern: ok, so status has to wait
<dimitern> hazmat: it says it's ended, but I can't see a link to the video..
<hazmat> dimitern, my bad.. better link.. http://www.openstack.org/
<dimitern> hazmat: cool, 10x!
<hazmat> dimitern, mark's keynote is in +1hr 15m
<dimitern> TheMue: just dump the whole error output + logs and paste it please
<dimitern> hazmat: isn't it streaming live now? I can see it has started..
<hazmat> dimitern, the keynotes have started, but there are some other ones scheduled first
<dimitern> hazmat: ah, ok, cheers
<dimitern> fwereade: I can reproduce the error I posted above consistently
<TheMue> dimitern, fwereade: http://paste.ubuntu.com/5716425/
<dimitern> TheMue: cheers
 * TheMue has to take a break
<dimitern> mark's keynote is starting now
<rogpeppe1> dimitern: link?
<dimitern> rogpeppe1: http://www.openstack.org/home/Video/
<rogpeppe1> dimitern: is it really not possible to find out a service's subordinate or principal without looking at its units?
<dimitern> rogpeppe1: I don't think you can
<rogpeppe1> dimitern: seems a bit weird. so if there are no units, there's no subordinate-principal relationship between two units?
<rogpeppe1> s/two units?/two services?/
<dimitern> rogpeppe1: you can take a look at the service's charm as well
<rogpeppe1> dimitern: how does that help?
<rogpeppe1> dimitern: presumably you actually need to look at the relations
<dimitern> rogpeppe1: if the service is running a subordinate charm, then all its units are/will be subordinates
<rogpeppe1> dimitern: yes, but for status we need to find out a *service's* principal service, irrespective of units, no?
<dimitern> rogpeppe1: well, from the list of relation a subordinate service is participating in then
<dimitern> rogpeppe1: relations*
<rogpeppe1> dimitern: yup. i think that if we see that there's a relation between two services and scope is container, then we know
<dimitern> rogpeppe1: makes sense, yeah
<rogpeppe1> niemeyer: is there any way of telling goyaml to ignore a field when marshalling? (unexported fields seem to cause an error)
<niemeyer> rogpeppe1: unexported fields shouldn't cause errors
<niemeyer> rogpeppe1: Example?
<rogpeppe1> niemeyer: one mo
<rogpeppe1> niemeyer: http://paste.ubuntu.com/5716622/
<rogpeppe1> niemeyer: one mo, perhaps i'm running an old version of goyaml
<rogpeppe1> niemeyer: when i run that code, i see "error: YAML error: reflect.Value.Interface: cannot return value obtained from unexported field or method"
<niemeyer> rogpeppe1: Haven't made any changes about that
<niemeyer> rogpeppe1: Will test
<niemeyer> rogpeppe1: That's a bug
<niemeyer> rogpeppe1: Will fix
<rogpeppe1> niemeyer: ah, ok
<rogpeppe1> niemeyer: there's another bug i just found too
<niemeyer> Surprising to see that broken
<rogpeppe1> niemeyer: i haven't looked into the source, but it seems like it might be hashing types by name somewhere
<niemeyer> I'd have guessed someone would have reported by now
<niemeyer> rogpeppe1: Can't tell what that means
<rogpeppe1> niemeyer: because if you use two function-scoped types of the same name, goyaml gets confused
<niemeyer> rogpeppe1: Sorry, I still have no idea about what you mean.. an example helps
<rogpeppe1> niemeyer: ok, here's an example (slightly bigger than it could be, but it's the stage i got to when i realised i knew what the problem was):
<rogpeppe1> niemeyer: http://paste.ubuntu.com/5716642/
 * rogpeppe1 has a very dodgy network connection currently
<rogpeppe1> niemeyer: search for "noMethods" in that code
<niemeyer> rogpeppe1: Ok?
<niemeyer> rogpeppe1?
<rogpeppe1> niemeyer: sorry, i'm just finding the output
<rogpeppe1> niemeyer: ok, this is a slightly smaller example: http://paste.ubuntu.com/5716665/ and this is its output: http://paste.ubuntu.com/5716667/
<rogpeppe1> niemeyer: the fields showing under exposed-service/0 are from the serviceStatus type, not from the unitStatus type as they should be
<rogpeppe1> niemeyer: for example "charm" is showing the field from slot 1 of the unitStatus struct
<niemeyer> rogpeppe1: It looks correct given the code
<niemeyer> func (s serviceStatus) GetYAML() (tag string, value interface{}) {
<niemeyer> 	type noMethods serviceStatus
<niemeyer> 	return "", noMethods(s)
<niemeyer> }
<niemeyer> rogpeppe1: serviceStatus *is* returning a serviceStatus
<rogpeppe1> niemeyer:
<rogpeppe1> yuip
<rogpeppe1> niemeyer: and unitStatus is returning a unitStatus
<niemeyer> rogpeppe1: Yeah, and how will it ever get into a unitStatus?
<rogpeppe1> niemeyer: but goyaml thinks that the unitStatus returned (well actually a noMethods value) is a serviceStatus
<niemeyer> rogpeppe1: let me run the code here rather than guessing.. hold on
<niemeyer> rogpeppe1: Yeah, there's a bg
<niemeyer> bug
<rogpeppe1> niemeyer: indeed :-)
<niemeyer> rogpeppe1: I actually improved that logic a long time ago in bson, and forgot to implement it in goyaml
<niemeyer> rogpeppe1: Easy to fix, though
<rogpeppe1> niemeyer: cool
<niemeyer> rogpeppe1: Both fixed and pushed.
<rogpeppe1> niemeyer: woah!
<rogpeppe1> niemeyer: nice one!
<niemeyer> rogpeppe1: I already had the context for the latter issue, and the former was a silly mistake
<rogpeppe1> niemeyer: at some point, it would be nice for goyaml to support `goyaml:"-"` like json does
<niemeyer> rogpeppe1: Done and pushed. Next? :)
 * rogpeppe1 can't think of any more at the moment :-)
<niemeyer> rogpeppe1: Cool, thanks for the reports
<rogpeppe1> niemeyer: np. i was also surprised by the unexported field error, BTW.
<niemeyer> rogpeppe1: I found a better naming for the flusher stuff, btw, which made me happier as well.
<niemeyer> rogpeppe1: v2 is coming
<rogpeppe1> niemeyer: great!
<rogpeppe1> niemeyer: new name is?
<niemeyer> rogpeppe1: type Task interface { Run; Kill }
<rogpeppe1> niemeyer: perfect
<rogpeppe1> niemeyer: that maps much better to how i came to understand it
<niemeyer> rogpeppe1: Yeah, it maps better to the overall problem indeed.. and, funny enough, it was the previous name
<niemeyer> rogpeppe1: But there were several refactorings after that which made things flow back and forth
<rogpeppe1> niemeyer: i know how that goes...
<niemeyer> rogpeppe1: The current design was finally sound, but the naming wasn't perfect anymore
<rogpeppe1> niemeyer: overall it's a nice small package. i really like it.
<niemeyer> rogpeppe1: Nice, I'm glad to hear it
 * TheRealMue goes to bed now, good night
<thumper> morning
<fwereade> thumper, heyhey
<thumper> hi fwereade
<fwereade> thumper, I WIPped most of your fslock branches but there's only one serious question at the root of it, and then a feature we'd all like that will hopefully not hurt too much ("message" or similar on acquire)
<thumper> fwereade: ok, what is the one serious question?
<fwereade> thumper, the latter will enable units to reliably break their own locks, which I think is a Good Thing
<fwereade> thumper, if some FSs can rename over empty dirs, shouldn't be we breating the held file before moving the dir into place?
<fwereade> thumper, or does that break something I'm not seeing?
<fwereade> s/breating/creating/
<thumper> actually that makes a lot of sense, and yes, I'm +1 on that
<fwereade> thumper, sweet
<mgz> dumb go question, I have n *int, how do I do (n != nil ? *n : 0) idiomatically?
<mgz> this is to fill in a bunch of struct fields, so I really don't want a three line if block per one...
<fwereade> mgz, func valueOrZero(*int) int, I suspect
<fwereade> mgz, nothing neater springs to mind
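fwereade's suggestion is indeed the idiomatic answer: Go has no ternary operator, so the nil check goes in a small helper. A minimal version:

```go
package main

import "fmt"

// valueOrZero dereferences n if non-nil, otherwise returns 0 —
// the closest idiomatic Go gets to (n != nil ? *n : 0).
func valueOrZero(n *int) int {
	if n == nil {
		return 0
	}
	return *n
}

func main() {
	v := 42
	fmt.Println(valueOrZero(&v), valueOrZero(nil)) // 42 0
}
```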
<thumper> fwereade: reading your review comment, how would you confirm that you didn't take a lock while someone else had it?  when running in a go routine, repeated?
<fwereade> thumper, first thought, may not be very smart, send a bunch of "acquired" and "released" thingies down a channel from n goroutines and check that you always get true/false/true/false/true/false
<fwereade> thumper, can't think of anything obviously wrong with it though
<fwereade> thumper, acq1/rel1/acq3/rel3/acq3/rel3/acq1/rel1 etc
 * thumper nods
<thumper> I get the feeling that I should smash the pipes together, and have a single proposal...
<fwereade> thumper, I'd be fine with that I think
<fwereade> thumper, (oh, I would also really like tests for the uniter)
<thumper> fwereade: yeah, read that, will look into it
<thumper> fwereade: is there any way to nicely simulate a crash during hook execution?
<thumper> fwereade: actually, we can trivially have a test that has a lock for that uniter
<fwereade> thumper, not nicely -- I think best to create the lock dir really
<thumper> fwereade: and we can check when we create one, it unlocks it
<fwereade> thumper, exactly
 * thumper nods
<thumper> fwereade: also, I don't like the message being the held file...
<thumper> fwereade: feels wrong some how
<thumper> fwereade: how do I check I own it? before unlocking
<thumper> if we use message?
<thumper> I'll update the code to use atomic writes for message
<fwereade> thumper, message doesn't have to be the held file, you can create the two independently before the copy
<thumper> and have an expected message on Lock
<thumper> fwereade: jam was suggesting to use the message for held
<thumper> just saying I'm -1 on that
<fwereade> thumper, yeah, I rather like the nonce in held
<fwereade> thumper, there is I think a meaningful distinction between "did this identity create the lock" and "did this specific chunk of memory create the lock"
<fwereade> thumper, so, yeah, I don't think it's necessary to conflate the two
 * thumper nods
<thumper> fwereade: one problem with logging every time you can't get a lock
<thumper> fwereade: when trying, is that retries are every second (by default)
<thumper> fwereade: and hook execution could be a while
<fwereade> thumper, ok, that's too many
<thumper> in the order of minutes
<thumper> fwereade: how about writing it the first time
<fwereade> thumper, (not too many checks, just too many messages)
<fwereade> thumper, first time is fine by me
<thumper> fwereade: and each subsequent time the message changes?
<fwereade> thumper, ah, yeah, nice
<thumper> should give an obvious, not too invasive trail
<fwereade> thumper, and *maybe* one every 120 times through the loop or something?
<thumper> fwereade: I'd rather use time than a loop counter
<thumper> although
<thumper> hmm...
<thumper> synchronization is hard
 * thumper recalls his talk on "multithreading 101"
<fwereade> thumper, follow your heart :)
<thumper> gave that talk at a conference 7 years ago
<thumper> I thought I knew heaps about it, in the end, I learned so much researching my talk it scared me
<thumper> :)
<fwereade> haha
<fwereade> I know enough to be scared in general
<fwereade> I have learned a lot over the years but somehow the amount I know remains characterizable by "enough to be scared"
<thumper> I'm busy downloading sabdfl's openstack keynote in the background
<thumper> youtube keeps killing it watching live
<fwereade> it's on youtube, I'm listening to it
<fwereade> ah it's only fallen over once for me
<thumper> I had a huge amount of fun writing some lock free algorithms
<rogpeppe1> thumper: i might be tempted to use atomic.AddInt32 to check that two things aren't holding a lock at the same time
<thumper> after the third time it fell over, I decided o download it
<thumper> rogpeppe1: can you explain more?
<thumper> I don't quite get it
<thumper> oh, so increment on lock
<thumper> and decrement on unlock?
<rogpeppe1> thumper: yup
<thumper> and assert values
<rogpeppe1> thumper: and after the increment, you look at the value returned
<rogpeppe1> thumper: if >1 you've fucked up
 * thumper nods
 * fwereade expected rogpeppe1 to show up and explain how to do it better , cheers :)
<thumper> sounds like a plan
<rogpeppe1> thumper: also, i'd change the retry interval to zero when doing that test
<rogpeppe1> thumper: all the better to stress with
<thumper> rogpeppe1: ok, sounds good
<rogpeppe1> thumper: what platform are we worried about not having O_EXCL, BTW?
<thumper> rogpeppe1: I don't have an explicit reason, just went with what bzrlib had because I had talked to lifeless a lot about it, and it is convenient to add informational messages to
<rogpeppe1> thumper: ok. messages would be just as convenient with a file though, i think, no?
<thumper> I'm not familiar with O_EXCL
<thumper> but I trusted the giant who came before me :)
<rogpeppe1> thumper: :-)
<rogpeppe1> thumper: bzr probably needs to run on way more platforms than us, but i suppose it can't harm
 * thumper nods
<rogpeppe1> thumper: i'm just slightly concerned about EBS network traffic though, when we could really do it all locally.
<thumper> what EBS network traffic?
<rogpeppe1> thumper: doesn't disk storage on amazon instances with EBS go across the network?
<thumper> well, we are storing in the agent datadir, where it puts tools
<thumper> we could do it anywhere
<thumper> /var/run/juju would make sense to me
<rogpeppe1> thumper: /tmp ?
<thumper> I would assume some of these would be local
<thumper> no, /tmp is fugly
<thumper> /var/run was made for this reason
 * rogpeppe1 has never heard of /var/run
<thumper> AFAIK
<fwereade> thumper, btw, I forget, do you definitely create the temp dirs on the same filesystem for actual atomic renames?
<thumper> fwereade: I was assuming that /tmp and the juju data dir were on the same filesystem...
<thumper> could create a temp filename in the lock dir to be sure
<fwereade> thumper, +1
<thumper> but I thought that was a bit messy
<fwereade> thumper, subdir with invalid lock name?
<thumper> .nonce or something
<thumper> where nonce is the hex nonce
 * thumper will work something out
 * thumper goes to make a coffee and toast
<fwereade> rogpeppe1, thumper: can I get a quick look at https://codereview.appspot.com/8804044/ please?
 * thumper looks
<fwereade> rogpeppe1, thumper: I promise I was using quantal for some of my testing across the pipeline, but clearly not quite as much as I thought I had
<thumper> fwereade: given, with trivial
<fwereade> rogpeppe1, thumper: the only other quantal complaints I am aware of are (1) verified transient, fixed by completing pipeline and (2) cannot repro at all -- that's a *weird* error in UpgradeJujuSuite that I propose to ignore until I can squeeze more details out of themue
<fwereade> thumper, tyvm
<rogpeppe1> fwereade: looking
<thumper> np
<rogpeppe1> fwereade: PTAL  https://codereview.appspot.com/8842043
<rogpeppe1> thumper: would appreciate it if you could take a look too
 * thumper looks
<rogpeppe1> fwereade: reviewed
 * rogpeppe1 has to go to bed very soon
<thumper> rogpeppe1: +1
<rogpeppe1> thumper: thanks
<thumper> rogpeppe1: personally I wouldn't have had map[string]map[string]interface{}, but instead had a few other structs instead of just reusing *state bits, but that's just me
<thumper> and your approach is equally valid
<thumper> so no review comment on that :)
<rogpeppe1> thumper: that's about my maximum tolerance
<rogpeppe1> thumper: thanks for forbearing :-)
<thumper> np
<rogpeppe1> thumper: actually, i *don't* have map[string]interface{} - i don't use interface{} at all and i'm really happy about that
<thumper> actually you are right...
<thumper> I typed from memory, misremembering
<rogpeppe1> thumper: what really prompted this branch was the million occurrences of .(sometype) and thinking "i have no idea where this might panic or if it's valid"
<thumper> :)
<fwereade> rogpeppe1, LGTM just trivials
<thumper> for me, I would have had the units map inside a struct with the service state
<thumper> that's all
<rogpeppe1> fwereade: oh sorry, i've just submitted
<thumper> the rest looks really good
<fwereade> rogpeppe1, meh, fix them tomorrow
<rogpeppe1> fwereade: i saw your earlier LGTM
<fwereade> rogpeppe1, huge win regardless
<fwereade> rogpeppe1, yep, np at all
<rogpeppe1> fwereade: i generally agree with your remarks. putting StatusError alongside other fields would be wonderful and mean that the json/goyaml custom marshallers could go
<fwereade> rogpeppe1, no worries tonight though
<rogpeppe1> fwereade: i just tried to be ultra-compatible because i didn't know what the constraints were
<fwereade> rogpeppe1, +1
<fwereade> rogpeppe1, in general adding fields is fine
<fwereade> rogpeppe1, existing stuff we're trying to be a bit more careful about
<rogpeppe1> fwereade: right, it's that time of day. actually way past. and if it's that time for me, then i suspect it might be for you too :-)
<rogpeppe1> thumper, fwereade: g'night
<fwereade> SLEEP IS FOR THE WEAK
<thumper> rogpeppe1: night
<fwereade> rogpeppe1, gn :)
 * thumper is writing the lock stress test
<davecheney> m_3: ping
<mgz> davecheney: I can go find him if it's anything urgent
<thumper> ah poos
<davecheney> mgz: nah, just wanted to talk about load testing stuffs
 * thumper facepalms
<wallyworld_> thumper: how's the face?
<thumper> covered
<wallyworld_> what exasperated you?
<thumper> $ go test
<thumper> throw: all goroutines are asleep - deadlock!
<wallyworld_> \o/
 * thumper sighs c.Check(state, Equals, 1)
<thumper> ... obtained int32 = 1
<thumper> ... expected int = 1
<wallyworld_> lol
<thumper> davecheney: is there a nice way to get a go routine to release its time segment?
<thumper> davecheney: I want to say, go run something else for a bit
<thumper> davecheney: will time.Sleep(0) do that?
 * thumper recalls it does in other languages
<davecheney> that will do it
<davecheney> runtime.Gosched()
 * thumper goes to the manual
<davecheney> godoc runtime Goschel
<thumper> that's what I want, awesome
<davecheney> godoc runtime Gosched
<davecheney> sleep will do the same thing
<thumper> hmm...
<davecheney> thumper: sorry, massive lag in australia
<thumper> stress test now fails
<thumper> :(
<thumper> it shouldn't
 * thumper goes to read some more
<davecheney> also try with GOMAXPROCS=8 (your n cores) go test
<davecheney> will probably have a similar effect
<thumper> heh, passes with max procs
<thumper> fails without
<thumper> failed that time
<thumper> bugger
<thumper> debugging this is going to be a PITA
<davecheney> thumper: tried the race detector
<davecheney> ?
<davecheney> if you have a branch, I can test it for you
<thumper> what is the race detector?
<davecheney> it's a feature on go 1.1
<davecheney> it is the same thread sanitiser that is available in clang 3.2 / gcc 4.8
<davecheney> thumper: http://tip.golang.org/doc/articles/race_detector.html
<thumper>  davecheney: lp:~thumper/juju-core/fslock-mashup has a failing stress test in utils/fslock
<thumper> passed with 1 and 2 concurrent locks, failed with 3
<thumper> that is why it currently says 3
<thumper> I want 10 in the end...
<davecheney> thumper: two secs
<davecheney> mgz: is mramm around ?
<mgz> not near me currently, could either be at the booth or in a meeting
<davecheney> mgz: nm
<davecheney> thumper: sorry, got distracted by submitting my own branch
#juju-dev 2013-04-18
 * thumper wonders if os.Rename is stable under stress
<thumper> it is the only thing that would be causing this to fail imo
<thumper> ah fuk
<davecheney> ??
<thumper> I wasn't checking errors
 * thumper has a screed of them
<davecheney> bzzzt
<thumper> directory not empty...
<thumper> that is an error I didn't expect
<davecheney> is there is a .turd in there ?
<davecheney> or an editor file, or something ?
<thumper>     c.Check(err, IsNil)
<thumper> ... value *os.SyscallError = &os.SyscallError{Syscall:"readdirent", Err:0x2} ("readdirent: no such file or directory")
<thumper> huh?
<thumper> this is on Unlock
<thumper> ah.. I think I know what this is...
<thumper> maybe
<thumper> \o/
<thumper> stress test passes now
 * thumper ups the stress
<thumper> davecheney: what is a reasonable amount of stress in your opinion?
<thumper> 3*100 showed the problem, which is now fixed
<bigjools> obvious joke is obvious
<thumper> had to make unlock atomic at fs level too
<thumper> and rename returned more errors than just ErrExists
<thumper> which caught me out
<thumper> it comes down to time
<thumper> 1000 iterations with 10 concurrent locks takes about 2.5 seconds
<thumper> wow, or 7s without the max procs
<thumper> I really don't want 7s of time added to the test :(
<thumper> 200 and 10 is 1.5s, which is bearable
<thumper> just
<thumper> ok, that has taken longer than I wanted...
<thumper> but I'm off for lunch
<thumper> which is really heading into town to go to the supermarket and buy the new device CD
<bigjools> davecheney: does juju core have any kind of automated integration testing?
<davecheney> bigjools: i think the best answer to that is the charm testing harness that m_3 has built
<davecheney> 2013/04/18 01:06:45 INFO environs/openstack: started instance "1517935"
<davecheney> 2013/04/18 01:06:45 NOTICE worker/provisioner: started machine 45 as instance 1517935
<davecheney> 2013/04/18 01:06:45 INFO worker/provisioner: found machine "46" pending provisioning
<davecheney> 2013/04/18 01:06:45 INFO worker/provisioner: found machine "47" pending provisioning
<davecheney> ^ we need to log when the PA reloads
<bigjools> ok thanks
<jtv> Hi there bigjools
<bigjools> jtv: you're doing an awesome impression of someone who has the week off :)
<jtv> I get the hint
<jtv> I'll be off later, through a region with spotty GSM coverage let alone internet.
<jtv> I just wanted to pop online for a moment, and then had a long fight with bluetooth tethering in Raring.
<jtv> (I checked out of the resort earlier this morning)
<jtv> Apart from bluetooth tethering still not working except once just after installation, raring is working out pretty well so far.
<jtv> Why does the Ubuntu Software Center now have a big A on it?
<bigjools> jtv: you can re-enable virtual desktops in settings btw
<jtv> Yeah, already did thanks.
<jtv> I tried to say it on IRC yesterday, but I think my network connection was in a bit of a limbo state at that point.
<bigjools> it seemed so!
<jtv> Oh, gotta go
<jtv> You may not get this message because my IRC ping time just now was 44 seconds.
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1170176
<thumper> that's nasty
<davecheney> thumper: i think a nil in instance is being stored in that map
<davecheney> checking now
<davecheney> thumper: ubuntu@juju-hpgoctrl2-machine-0:~$ juju bootstrap -v --upload-tools
<davecheney> 2013/04/18 01:57:18 INFO environs/openstack: opening environment "goscale2"
<davecheney> 2013/04/18 01:57:22 INFO environs/tools: built 1.9.15.1-precise-amd64 (2193kB)
<davecheney> why does upload tools append a build number to the tool ?
<thumper> davecheney: I don't know
<davecheney> that kind of sucks
<davecheney> i wanted to use those numbers
<thumper> ask fwereade
<davecheney> thumper: https://bugs.launchpad.net/juju-core/+bug/1170176/comments/1
<thumper> hmm, that would explain it :)
<thumper> jam: hi there, I've munged all my fslock branches into one, and addressed all the comments (I think)
<thumper> jam: spent most of the day writing tests actually :)
<thumper> and fixing the fallout when something failed...
<davecheney>   "17":
<davecheney>     instance-id: "1520273"
<davecheney>     dns-name: 15.185.165.35
<davecheney>     agent-version: 1.9.15.1
<davecheney>     agent-state: started
<davecheney>   "18":
<davecheney>     instance-id: "1520275"
<davecheney>     dns-name: 15.185.165.81
<davecheney>     agent-version: 1.9.15.1
<davecheney>     agent-state: down
<davecheney>     agent-state-info: (started)
<davecheney> whut ?
 * davecheney away til 18:00h
 * thumper just realised that the meeting is not in 15 minutes
<danilos> jam: heya, let me know when you can pair up
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: dimitern | Bugs: 2 Critical, 61 High - https://bugs.launchpad.net/juju-core/
<dimitern> morning all
<rogpeppe1> dimitern: hiya
<rogpeppe1> mornin' everyone else too!
<rogpeppe1> darn internet is still down
<dimitern> rogpeppe1: hey, what happened to status?
<rogpeppe1> dimitern: sorry, it changed
<dimitern> rogpeppe1: I like it :)
<rogpeppe1> dimitern: glad to hear it :-)
<TheMue> rogpeppe1: me too, even i wondered this morning
<TheMue> rogpeppe1: i know ctx as abbrev for context, so i wondered here too. the context argument for the command execution btw has the name ctx ;)
<rogpeppe1> TheMue: now not a single occurrence of .(someType) or interface{} in sight?
<rogpeppe1> TheMue: yeah, i'd forgotten that. maybe i should change ctxt to ctx to fit in
<rogpeppe1> TheMue: i hadn't even realised that i'd got "ctx" and "ctxt" in the same function scope...
<TheMue> rogpeppe1: would be nice. the introduction of a context (or a similar type) as brace for all status operations would have been the next refactoring step, as well as getting rid of this pseudo generic structure
<TheMue> rogpeppe1: but i thought due to other needs with respect to the code freeze we could do this later
<rogpeppe1> TheMue: i figured it was easier to do the refactoring than to get the subord and relation stuff to work nicely
<rogpeppe1> TheMue: i'm afraid i found processService entirely opaque. and i think there was a problem with it too.
<rogpeppe1> thumper: i don't think you've pushed your latest changes to https://codereview.appspot.com/8602046/
<rogpeppe1> TheMue: oh, i see, you've made the changes in the final branch only
<rogpeppe1> oops
<rogpeppe1> thumper: ^
<TheMue> rogpeppe1: which changes do you refer to?
<TheMue> rogpeppe1: ah, you meant thumper
<rogpeppe1> TheMue: yeah, sorry, tab malfunction :-)
<TheMue> rogpeppe1: i know that, happens often to me with ro<tab>bbiew, *shit* ^H^H^H^H^Hg<tab> *lol*
 * TheMue just got the order to write an article about rust. will be interesting to see the differences to go
<davecheney> 'sup ?
<fwereade> davecheney, heyhey, sorry late
<fwereade> davecheney, it appends a build number so that uploaded tools always have dev versions
<davecheney> fwereade: right
<davecheney> but we never made any use of the concept of dev version
<davecheney> so now it just sticks out there like a sore thumb
<davecheney> fwereade: the reason I wish to complain is I need to use the build version, as I will discuss in the hangout in T-10
<fwereade> davecheney, ok, cool
<davecheney> fwereade: https://codereview.appspot.com/8648045/
<davecheney> looking for a 2nd lgtm
 * fwereade looks
<davecheney> this turned up today in load testing
<fwereade> davecheney, I don't think that's correct
<fwereade> davecheney, why hide state info if the instance is missing?
<fwereade> davecheney, (this is not to say that panicking is correct either)
<davecheney> fwereade: ok, that is fine, but this is just making, https://codereview.appspot.com/8842043 work
<davecheney> i'm not changing the behavior
<davecheney> https://docs.google.com/a/canonical.com/document/d/1bSiicbYOV25fq73dZqXz738OU5l96QjkY3mqnXFdLlE/edit
<thumper> hi rogpeppe1
<rogpeppe1> thumper: hiya
<thumper> rogpeppe1: yes, changes all in fslock-mashup
<dimitern> rogpeppe1: thanks for doing this
<rogpeppe1> thumper: am just reviewing. a few more comments on the fslock code after some pondering.
<fwereade> well, it's making it *slightly* less broken, and mea culpa for not spotting that in the review yesterday, but it's not really following intent
<dimitern> where did the g+ link from the even go?
<dimitern> event
<dimitern> can anybody send me the link?
<dimitern> fwereade, rogpeppe1: ?
<dimitern> davecheney, thumper: link please?
<rogpeppe1> dimitern: https://plus.google.com/hangouts/_/calendar/bWFyay5yYW1tLWNocmlzdGVuc2VuQGNhbm9uaWNhbC5jb20.gdt9rkp5uspih9n3db6b95kccc
<dimitern> rogpeppe1: cheers
<rogpeppe1> dimitern: i'm probably not going to be able to make it.
<rogpeppe1> dimitern: will try through my phone connection, but i have my doubts
<davecheney> 1:05 AM
<davecheney> Thursday, April 18, 2013 (PDT)
<davecheney> Time in Portland, OR, USA
<fwereade> um
<fwereade> do we have any expectation that the packaged version will work? we don't know what the critical differences were in 1.9.14
<fwereade> (other than that they shouldn't exist)
<fwereade> jam, rogpeppe2, dimitern: ^
<rogpeppe2> fwereade: we should try the packaged version and see whether we see the same problems
<dimitern> fwereade: not sure i understand your question
<rogpeppe2> fwereade: it looked like the problem was client-side, so shouldn't be too hard to diagnose
 * fwereade is trying to remember what the hell he saw going wrong when he tried ap-southeast-2
<fwereade> ok, rogpeppe2 and dimitern, you should not be worrying about this
<fwereade> you have code to land :)
<dimitern> fwereade: I'll try with the ppa and one random region
<dimitern> fwereade: (after I land my stuff)
<fwereade> dimitern, awesome
<fwereade> TheMue, would you pick a region, let us know what it is, and get bootstrapping from the PPA please?
 * TheMue just prepares a test image with ppa
<fwereade> TheMue, AFAICT us-east-1 works
<TheMue> fwereade: hehe, just wrote when you sent
<fwereade> TheMue, except... hmm, no, I think that maybe even that does not
<rvba> Hi guys, I just put up for review a branch which adds constraints support in the MAAS provider: https://codereview.appspot.com/8842045/
<rvba> Please have a look.
<dimitern> rvba: i'm glad to see you got lbox working!
<fwereade> rvba, cool
<rvba> dimitern: yeah, I must admit the problem was my fault, wrong bzr config.
<dimitern> rvba: please share your findings with the rest of the red squad, so they can set it up too :)
<rvba> dimitern: already done :)
<dimitern> rvba: great, thanks!
<fwereade> rvba, I will take a look at it but can I please ask you not to land anything today, while we try to handle the release frenzy
<rogpeppe2> fwereade: what do think about reporting agent-state=pending when instance-state==pending or missing?
<rvba> fwereade: sure, no problem.
<fwereade> rogpeppe2, I'd rather not pretend we can give an instance-state when we can't
<rogpeppe2> fwereade: oops, sorry, i meant instance-id not instance-state
<fwereade> rogpeppe2, the list appeared sanguine about the prospect of (temporarily) missing agent-state
<dimitern> rvba: I'll review it shortly
<rvba> ta
<rogpeppe2> fwereade: the current tests assume no agent-state when instance-id is pending or missing
<fwereade> rogpeppe2, while an instance id is pending I'm fine with reporting only machine series
<fwereade> rogpeppe2, missing is a different matter
<rogpeppe2> fwereade: my changes are pushing towards reporting it always
<rogpeppe2> fwereade: otherwise i have to special-case
<rogpeppe2> fwereade: which seems kinda unnecessary.
<fwereade> rogpeppe2, that's probably simplest -- the downside is that not-yet-provisioned machines are nicely visually distinct today
<fwereade> rogpeppe2, and this would work against that
<rogpeppe2> fwereade: yeah, maybe i'll make pending the only special case
<fwereade> rogpeppe2, when you say special-case... ISTM that it's just one branch in one place
<rogpeppe2> fwereade: sure
<fwereade> rogpeppe2, if it has tentacles that's a different matter
<fwereade> rogpeppe2, +1 on early exit on pending
<rogpeppe2> fwereade: every if statement doubles the number of reachable states :-)
<fwereade> rogpeppe2, but I thought dimitern was doing that?
<rogpeppe2> fwereade: it's actually "if id==pending {agentstate=""}.
<rogpeppe2> fwereade: it meshed too closely with what i was doing already
<dimitern> fwereade: we agreed rogpeppe2 would take life and I'll do series
<fwereade> rogpeppe2, agreed, but we are trying to report on a very large number of states ;p
<fwereade> dimitern, rogpeppe2: isn't that just asking for conflicts?
<rogpeppe2> fwereade: i was already mucking with processMachine
<fwereade> rogpeppe2, you have one to land and one to write and get reviewed already though
<rogpeppe2> fwereade: one to land?
<fwereade> rogpeppe2, https://codereview.appspot.com/8821043/ ?
<dimitern> rogpeppe2: cmd logging
<fwereade> rogpeppe2, that one on top
<rogpeppe2> fwereade: oh yeah; will do
<fwereade> dimitern, it's a significant reduction in logspam when the allwatcher's running, been approved for a day or 2
<fwereade> rogpeppe2, dimitern: regardless, rogpeppe2 is messing with exactly the method dimitern needs to change
<dimitern> fwereade: what's that?
<fwereade> dimitern, processMachine
<rogpeppe2> fwereade: that is true
<fwereade> rogpeppe2, I do not think this work is sanely parallelisable
<dimitern> the changes shouldn't conflict (or not badly)
<rogpeppe2> fwereade: the tests are the main part of the work
<rogpeppe2> fwereade: i don't care about conflicts in the code - it's all trivial
<fwereade> rogpeppe2, so dimitern needs to modify every one of your tests, and also your code
<rogpeppe2> fwereade: hmm. dimitern shall i pass over my WIP to you?
<fwereade> rogpeppe2, I appreciated the ninja-rewrite last night very much but I think this is decidedly less convenient tbh
<rogpeppe2> fwereade: i think that all the changes we agreed to do will clash
<dimitern> rogpeppe2: I'm proposing mine now
<rogpeppe2> dimitern: cool
<fwereade> rogpeppe2, dimitern: cool, I think that is a much cleaner direction to merge
<fwereade> rogpeppe2, dimitern: objections withdrawn
<fwereade> rogpeppe2, please focus on the other branches while waiting on dimitern's to land though
<rogpeppe2> fwereade: will do
<rogpeppe2> fwereade: only one branch, right?
<fwereade> rogpeppe2, cheers -- well, 290 to land if not already done, command output to propose with the logging just in SuperCommand
<rogpeppe2> fwereade: ah, thanks for reminding of that one. i should've made a list!
<dimitern> I'm having trouble authenticating on rietveld while proposing :( tried 10 times, sign out/in from the web site works, I authorized the app (again), still no joy - probably related to the recent google apps / ubuntu sso change?
<fwereade> dimitern, np, it's pretty easy to review on LP
<fwereade> dimitern, you have an LGTM, but considering the current circumstances I'm not keen to call it a trivial
<fwereade> rogpeppe2, would you glance at https://code.launchpad.net/~dimitern/juju-core/034-status-shows-machine-series/+merge/159589 briefly please?
<dimitern> fwereade: sure
<fwereade> TheMue, fwiw my current findings are that eu-west-1 works perfectly but I *think* the us-east-1 issues were down to ec2 not us
<rogpeppe2> fwereade: looking
<rogpeppe2> fwereade, dimitern: i might remove the omitempty, 'cos if we do have a blank series for some reason, we'll want to know
<dimitern> rogpeppe2: ok
<dimitern> rogpeppe2: can you pull my branch and merge it for me please?
<fwereade> rogpeppe2, you can't create a machine without a series, and you can't set a machine's series once it's created
<rogpeppe2> fwereade: so it can never be empty?
<fwereade> rogpeppe2, yeah, I think that is a guarantee that state makes
<fwereade> rogpeppe2, hence no ,bool or ,error
<rogpeppe2> fwereade: in which case we don't need the omitempty, right?
<dimitern> rogpeppe2: i'm still struggling to get lbox/lpad working - auth failing
<fwereade> rogpeppe2, ha, that is true
<rogpeppe2> fwereade: i don't care much though
<rogpeppe2> dimitern: ok, i'll merge it for you
<fwereade> rogpeppe2, AFAICT omitempty is entirely academic
<dimitern> rogpeppe2: tyvm
<fwereade> rogpeppe2, follow your heart
<rogpeppe2> fwereade: yeah. i might leave it there for consistency
<fwereade> rogpeppe2, +1
<fwereade> TheMue, ok, yes, us-east-1 problem confirmed as an unhappily-timed connection loss to s3 causing apparent lack of tools
<fwereade> TheMue, where are you looking?
<rogpeppe2> fwereade: Put needs to retry :-)
<fwereade> rogpeppe2, it's List actually
<fwereade> ;
<rogpeppe2> fwereade: hmm. i thought List did
<dimitern> wtf is this: 2013/04/18 12:48:49 RIETVELD 0xf840000150 client.Get returned (*http.Response)(nil), &url.Error{Op:"Get", URL:"http://example.com/marker", Err:(*errors.errorString)(0xf8400a26d0)}
<TheMue> fwereade: wanted to look at us-east-1, but will now choose a different one. had troubles with my test image. :(
<fwereade> TheMue, what's this test image?
<fwereade> TheMue, you can just install from the ppa, can't you?
<TheMue> fwereade: have an extra vm for it
<dimitern> Get http://example.com/marker: redirect blocked ???
<dimitern> reported by net/http/client.Get
<fwereade> dimitern, have you ever visited example.com?
<fwereade> dimitern, it's a placeholder basically
<dimitern> fwereade: i *know* what it is, but why is lbox misbehaving?
<dimitern> fwereade: interestingly, searching for that message in google gave me a #juju-dev log where I complained about the same thing on 2012/11/21 :)
<dimitern> http://irclogs.ubuntu.com/2012/11/21/%23juju-dev.txt
<fwereade> dimitern, I was taking the use of example.com to be evidence of lbox hitting the crack pipe pretty hard today, but I don't know *why*
<fwereade> dimitern, does nuking your various relevant .files  and reauthing help?
<dimitern> fwereade: tried that already
<fwereade> dimitern, sorry, out of ideas then :(
<dimitern> fwereade: yeah... drawing knowledge from my earlier self in that conversation - it seems it's a go 1.0.3 issue, which was fixed on tip, and I need to rebuild lbox with go tip
<rogpeppe2> cmd/juju tests almost take 3 minutes!
<fwereade> FFS this is altogether too eventual for my liking
<fwereade> and I need food
<fwereade> rogpeppe2, I'm aware of those tests, they are for now the price we pay for coverage
<fwereade> rogpeppe2, I will be trying to figure out what the hell is taking so long very soon
<rogpeppe2> fwereade: me too :-)
<fwereade> but for now, lunch is an absolute necessity
<fwereade> maybe the instance will have shown up next hour
<fwereade> dimitern, rogpeppe2: if you can find someone else to review everything before I return, I would encourage one of you to set the build number and follow up on my juju-dev email
<fwereade> dimitern, rogpeppe2: but please, whoever does that, make sure it works live ;p
<fwereade> dimitern, rogpeppe2: if not I'll bbiab
<rogpeppe2> fwereade: ok; i'm currently doing the manual diff thing on DeepEqual output
<dimitern> it worked!
<dimitern> so, for the record: goetveld is broken without a patch from wallyworld_ (https://code.launchpad.net/~wallyworld/goetveld/auth-cookie-fix/+merge/147585)
<dimitern> now it works with go 1.0.3, the "redirect blocked" issue is gone and I can use it normally
<rogpeppe2> everything crashes
<wallyworld_> jam: dimitern: ping
 * rogpeppe2 thanks thumper for passing on the nm-applet hack
<dimitern> I cannot bootstrap on any region with 1.9.14 from the ppa: http://paste.ubuntu.com/5718528/
<dimitern> fwereade: any idea?
<TheMue> currently it looks as i can bootstrap but status doesn't return :(
<TheMue> ah, now, a pending machine 0
<dimitern> TheMue: how did you manage? with the ppa version and just "juju bootstrap" on ec2?
<dimitern> TheMue: no --upload-tools or --series, right?
<dimitern> rogpeppe2: you managed to get my branch?
<rogpeppe2> dimitern: my phone went down, and i've run into unexpected difficulties with the Life change.
<rogpeppe2> dimitern: will submit your branch now
<TheMue> dimitern: with the ppa version
<TheMue> dimitern: and a pure juju bootstrap
<dimitern> TheMue: which region?
<dimitern> TheMue: ah, you're running precise!
<TheMue> dimitern: i wanted to start from west to east, so us-east-1 now
<TheMue> dimitern: yes, precise
<dimitern> so it's not working on quantal
<rogpeppe2> dimitern: i'm trying to figure out the correct logic for processAgent
<rogpeppe2> dimitern: i can't convince myself that it's currently right, and the tests aren't great
<rogpeppe2> dimitern: i've been writing out a truth table
<dimitern> rogpeppe2: I see
<rogpeppe2> TheMue: i wonder if you could talk me through the logic in processAgent (it was processStatus)
<dimitern> rogpeppe2: the idea is to have Life(), AgentAlive(), AgentTools() and Status() for entities that support it - units and machines
<dimitern> rogpeppe2: and process them similarly
<rogpeppe2> i can't quite get my head around this condition: status != params.StatusPending && !agentAlive && !entityDead
<rogpeppe2> dimitern: i realise that
<dimitern> rogpeppe2: I can help with that
<rogpeppe2> dimitern: it's just that under *some* conditions the agentAlive status is lost
<dimitern> rogpeppe2: this is used to determine if the agent is down
<dimitern> rogpeppe2: it's down if the machine is alive, but the agent is not and the status is not pending (i.e. provisioned and started)
<TheMue> rogpeppe2: never touched processAgent(), sorry
<rogpeppe2> TheMue: it's the same logic you wrote in processStatus
<TheMue> rogpeppe2: one moment, have to open then code
<rogpeppe2> TheMue: it's ok, i think i'm there
<dimitern> rogpeppe2: I wrote that actually
<rogpeppe2> dimitern: ah, ok
<dimitern> rogpeppe2: see above, does it make sense?
<rogpeppe2> dimitern: it's all those double-negatives makes me see boggle-eyed
<dimitern> rogpeppe2: simple logic :)
<rogpeppe2> dimitern: yeah. i have a better intuitive grasp when it's 	if !(status == params.StatusPending || agentAlive || entityDead) {
<dimitern> rogpeppe2: change it, if you think it'll be more readable
<dimitern> rogpeppe2: as long as it's the same logic
<rogpeppe2> dimitern: i'm not sure. it helped me, but probably only 'cos i'd been staring at it the other way
<rogpeppe2> dimitern: standard boolean transformation
<dimitern> rogpeppe2: yeah
<rogpeppe2> dimitern: i can't remember the name of the rule though
<rogpeppe2> dimitern: i think there's no point in calling AgentAlive if the status is pending
<dimitern> rogpeppe2: yeah
<dimitern> rogpeppe2: correct
<dimitern> http://paste.ubuntu.com/5718578/ - still cannot bootstrap from the ppa (tried both us-east-1 and eu-west-1)
<fwereade> dimitern, 2013/04/18 14:00:46 ERROR command failed: cannot find tools: use of closed network connection
<fwereade> dimitern, I have been seeng that sometimes, but not always
<fwereade> dimitern, I don't *think* it's us
<TheMue> dimitern: i now have a problem with us-west-2 when creating the s3 control bucket (conflicting condition)
<TheMue> dimitern: us-east-1 worked fine
<fwereade> TheMue, s3 bucket names are global
<rogpeppe2> dimitern: i've reworked the code a little (the logic should still be the same though) http://paste.ubuntu.com/5718581/
<dimitern> fwereade: yeah, the issue before (with uncommented public-bucket) was different (no compatible tools found)
<fwereade> TheMue, you need a new name
<dimitern> rogpeppe2: looks good
<TheMue> fwereade: even if the former is destroyed?
<rogpeppe2> dimitern: at least my small brain can wrap itself around it now :-)
<dimitern> rogpeppe2: :)
<fwereade> TheMue, I don't recall that ever working, no
<fwereade> TheMue, dimitern: I have successfully bootstrapped and deployed in both us-east-1 and eu-west-1 with the ppa
<TheMue> fwereade: hmm, so when testing between east and southeast i seem to have switched my buckets. can't remember, but it looks like.
<dimitern> fwereade: you're running precise, maybe that's why
<TheMue> fwereade: and i successfully bootstrapped us-east-1
<fwereade> TheMue, dimitern: I have "successfully bootstrapped" in ap-southeast-2, as in I have a bootstrap instance running, but it's been running for an hour and I'm still unable to get the instance
<TheMue> fwereade: do we have an issue or doc to collect the test results
<fwereade> TheMue, creating an issue now
<TheMue> fwereade: thx, +1
 * rogpeppe2 hates the spot-the-difference competition when a DeepEqual fails: http://paste.ubuntu.com/5718597/
<TheMue> rogpeppe2: hehe, i know that from my tests. i then replaced those map entries by \n and did some manual sorting, so it gets easier
<rogpeppe2> current mostly-manual solution: Edit ,|gofmt Edit ,x/"}/c/",\n}/ Edit ,x/,/a/\n/ Edit ,x/{./v/}/x/{/a/\n/
<rogpeppe2> :-)
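(Editorial note: the one-line dumps pasted above can be made diffable without editor gymnastics. A throwaway helper, not juju-core code, that renders both values as indented JSON so a failed reflect.DeepEqual becomes a line-by-line diff:)

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// diffString returns "" when got and want are deeply equal; otherwise it
// renders both as indented JSON, one entry per line, so the differing
// entries jump out instead of hiding in a single long line.
func diffString(got, want interface{}) string {
	if reflect.DeepEqual(got, want) {
		return ""
	}
	g, _ := json.MarshalIndent(got, "", "  ")
	w, _ := json.MarshalIndent(want, "", "  ")
	return fmt.Sprintf("got:\n%s\nwant:\n%s\n", g, w)
}

func main() {
	got := map[string]string{"a": "1", "b": "2"}
	want := map[string]string{"a": "1", "b": "3"}
	fmt.Print(diffString(got, want))
}
```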
<fwereade> TheMue,  dimitern: https://bugs.launchpad.net/juju-core/+bug/1170326
<fwereade> dimitern, rogpeppe2: btw how is progress? should I be reviewing, supporting, etc?
<dimitern> fwereade: I'm done - mine has landed
<fwereade> dimitern, awesome
<rogpeppe2> fwereade: just working out the right way to fix the tests. at least i'm convinced the logic is good now.
<fwereade> rogpeppe2, ?/3
<fwereade> rogpeppe2, excellent
<rogpeppe2> fwereade: i've submitted the log changes
<fwereade> rogpeppe2, cool
<rogpeppe2> fwereade: not the finished ones though
<rogpeppe2> fwereade: i think status life is 100x more important
<fwereade> rogpeppe2, well, I did originally ask dimitern to do it because (1) he'd already started and (2) you had 2 other branches to do
<fwereade> rogpeppe2, and now he is kicking his heels
<rogpeppe2> fwereade: yeah, sorry, i thought life was trivial to do along with what i was doing anyway
<fwereade> rogpeppe2, I didn't think you were actually doing anything with status
<rogpeppe2> fwereade: for some reason i thought i was
<fwereade> rogpeppe2, we agreed and minuted otherwise
<fwereade> rogpeppe2, https://docs.google.com/a/canonical.com/document/d/1bSiicbYOV25fq73dZqXz738OU5l96QjkY3mqnXFdLlE/edit#
<fwereade> rogpeppe2, but hey ho
<rogpeppe2> fwereade: ah, i think it was because i was already half way through some changes when that was minuted
<dimitern> so, with tip I'm able to bootstrap with default region and public-bucket (commented out) (no --upload-tools or --series), with default-series: precise
<dimitern> but not with 1.9.14 from the ppa
<fwereade> dimitern, what's the error?
<dimitern> fwereade: the same - use of closed network connection
<fwereade> dimitern, and it's meaningless to compare tip and 1.9.14, I think
<fwereade> hey!
<fwereade> has goamz updated recently?
<dimitern> so 1.9.14-amd64-quantal1 is confirmed broken
<dimitern> fwereade: it has i think
<fwereade> GAAAH
<fwereade> magic dependency updates FTL
<rogpeppe2> right, tests pass.
<rogpeppe2> the difference between agent-state=pending and instance-id=pending is subtle
<fwereade> hmm, no goamz changes since march it seems
<rogpeppe2> fwereade: you might be stuck
<rogpeppe2> fwereade: try removing the goamz directory and go getting again
<fwereade> rogpeppe2, I went to look on launchpad ;)
<fwereade> rogpeppe2, but hmm maybe I hadn't actually updated since before then?
<fwereade> rogpeppe2, but no
<fwereade> rogpeppe2, we've done releases that didn't exhibit these issues, right?
<rogpeppe2> fwereade: you're right
<rogpeppe2> fwereade: 35 is my latest revno
<rogpeppe2> fwereade, dimitern: https://codereview.appspot.com/8852043
<rogpeppe2> one known issue - the summary in the test is wrong; fixing
<benji> the current state is that when the user clicks "Add" on a charm page the charm details will disappear and the left sidebar will stay visible and the service configuration panel will be displayed at the right
<benji> in full-screen mode we will switch to sidebar mode and go to the same state
<benji> I don't know if Rick is communicating with Jovan or not
<rogpeppe2> fwereade, dimitern: now proposed with that fixed
<dimitern> rogpeppe2: reviewed
<dimitern> fwereade: I updated bug 1170326
<dimitern> _mup_: wtf?
<dimitern> https://bugs.launchpad.net/juju-core/+bug/1170326
<fwereade> rogpeppe2, reviewed
<fwereade> TheMue, how do you "install" in juju?
<TheMue> fwereade: typo, deployed a service and waited until it is started
<TheMue> fwereade: currently in us-west-1
<rogpeppe2> fwereade: what instance id would we set?
<rogpeppe2> fwereade: when there's no instance
<fwereade> rogpeppe2, the one in state?
<fwereade> rogpeppe2, I don't understand why you'd ever call instance.Id()
<TheMue> hmm, sadly can't edit, will comment it after current test.
<rogpeppe2> fwereade: excellent point
<TheMue> but looks good so far, mysql is pending
<fwereade> TheMue, are you watching the provisioner logs?
<rogpeppe2> fwereade: FWIW this is an old problem - the logic there hasn't changed
<fwereade> rogpeppe2, it was also stuff that I'd figured out with dimitern before you took it over unilaterally
<TheMue> fwereade: not as long as the commands and status tell me it's ok. shall i look for something special?
<rogpeppe2> fwereade: v sorry about that
 * dimitern lunch
<fwereade> rogpeppe2, no worries, it happens, I'm just a bit confused that it did when I thought I'd been extra clear -- but it takes at least 2 to experience a communication problem ;)
<rogpeppe2> fwereade: is there ever a case that we can have an alive agent when the entity status is pending?
<dimitern> rogpeppe2: not really, with the nonced provisioning changes, even if this happens briefly, the agent will commit suicide soon after starting (even before setting AgentAlive I think)
<rogpeppe2> dimitern: that's what i think. i was asking because of the review comment about that, so thought perhaps fwereade had some more useful input there.
<rogpeppe2> fwereade, dimitern: PTAL https://codereview.appspot.com/8852043
<dimitern> rogpeppe2: changing status is *not* what an agent does first - it sets itself as alive first
<rogpeppe2> dimitern: interesting
<rogpeppe2> dimitern: perhaps it should be the other way around
<dimitern> rogpeppe2: take a look at both uniter/modes and machiner
<dimitern> rogpeppe2: how so?
<rogpeppe2> dimitern: it's an early indication of liveness
<rogpeppe2> dimitern: we save round trips in status
<rvba> dimitern: thanks for the review (MAAS provider constraints branch)!  I see you guys are busy, just ping me when it's ok for me to land this.
<rogpeppe2> dimitern: it's a once-and-for-all "i have started running!" - then the liveness status can change over time
<dimitern> rvba: please don't - we're about to release and it can land after that
<dimitern> rvba: it's a bit of a mess anyway, let's not complicate it
<rvba> dimitern: sure, I will wait until you guys tell me it's good to go.
<dimitern> rvba: cheers!
<dimitern> rogpeppe2: i don't think it's an early indication of liveness
<dimitern> rogpeppe2: setagentalive is that indication, not the status change
<dimitern> rogpeppe2: it might seem so only from the command's perspective
<rogpeppe2> dimitern: it's an indication that it got there anyway.
<rogpeppe2> dimitern: if we set the status, then die immediately, then we at least see that it got that far
<dimitern> rogpeppe2: how is that useful?
<dimitern> rogpeppe2: if it dies, the status will be incorrect anyway
<dimitern> rogpeppe2: the agent has to be alive to set the status
<rogpeppe2> dimitern: if it dies, we'll print "down" for the status
<rogpeppe2> dimitern: so there's no difference there
<dimitern> rogpeppe2: "down" doesn't actually mean "oops i crashed while starting"
<rogpeppe2> dimitern: it means "i crashed"
<dimitern> rogpeppe2: it means "i was running ok, entity was started, then something went wrong and i died"
<rogpeppe2> dimitern: for me, it means that the agent started running and then stopped working for some reason
<dimitern> rogpeppe2: the significant distinction here is, the entity went into a started state before "down" being meaningful
<rogpeppe2> dimitern: and a good (the only) indication we have that an agent started running is that it set its status
<dimitern> rogpeppe2: exactly, in addition to being alive as well
<dimitern> rogpeppe2: if the agent sets status to "started" and then sets itself alive, we'll see "down" a lot more often, and it will be a lie
<dimitern> rogpeppe2: it's a very brief moment, i agree, but it would still be a lie
<rogpeppe2> dimitern: hmm, good point
<dimitern> rogpeppe2: that's the point in setting the status after setting the agent to alive
<rogpeppe2> dimitern: perhaps we should have a "pending but alive" status
<dimitern> rogpeppe2: how?
<dimitern> rogpeppe2: agent live and status are tightly linked ("agent-state" is what we use for status)
<fwereade> rogpeppe2, dimitern: if we did, I think it'd probably be "running"
<rogpeppe2> fwereade: that's a good idea
<fwereade> rogpeppe2, dimitern: but I'm not sure
<dimitern> fwereade: we'll be departing from py-juju compatibility a bit if we do this
<rogpeppe2> fwereade: https://codereview.appspot.com/8658045/
<rogpeppe2> dimitern: ^
<fwereade> dimitern, not much tbh -- it's an extra step in unit status, and a bit closer to python for the machine -- although "running" will still not be a terminal state for a machine
<fwereade> rogpeppe2, cheers
<rogpeppe2> fwereade: still waiting on https://codereview.appspot.com/8852043/ too
<dimitern> rogpeppe2: why remove Noticef("agent starting") ?
<dimitern> fwereade: well, in that case we can have "agent-state": "running" when status is pending and the agent is alive
<rogpeppe2> dimitern: which file?
<dimitern> fwereade: but then we'll have "agent-state": "started" for most of the time
<dimitern> rogpeppe2: see the comments inline
<fwereade> dimitern, yeah, matching the unit agent
<rogpeppe2> dimitern: ah, the *agent exiting* messages
<rogpeppe2> dimitern: they're now redundant
<dimitern> rogpeppe2: how so?
<fwereade> dimitern, I *think* it is more important to impose consistency here by messing with the less-interesting-to-observe status
<rogpeppe2> dimitern: as the messages are printed by supercommand
<rogpeppe2> dimitern: which is the point of the CL
<rogpeppe2> dimitern: no point in printing the info twice, i think
<dimitern> rogpeppe2: I thought the point was to report "command completed successfully" for cli commands, not agents
<rogpeppe2> dimitern: i did that originally, but fwereade suggested the supercommand change
<rogpeppe2> dimitern: and given that we print it *anyway* for agents, why not?
<dimitern> fwereade: yeah, but it's slightly confusing to have "running" (for a short while) and "started" otherwise
<fwereade> dimitern, yeah
<fwereade> dimitern, maybe "starting" would be better
<fwereade> dimitern, anyway I don't think that's one for today
<dimitern> rogpeppe2: I don't think these two are related - cli commands report success as a courtesy to the user; agents log stuff which is greppable by admins/etc.
<dimitern> fwereade: +1 for "starting"
<fwereade> wait, did we not do life for services/relations?
<rogpeppe2> dimitern: they're both logging the same thing, no?
<dimitern> fwereade: and I also agree it's better postponed for after today
<rogpeppe2> dimitern: given that the exit status is now logged by supercommand, what's the reason for the Noticef in the juju commands?
<dimitern> rogpeppe2: unless i'm on crack cli commands use stdout/err to report these things, not the logging infrastructure
<fwereade> dimitern, you're on crack ;p
<fwereade> dimitern, about one CLI command does
<dimitern> :)
<rogpeppe2> +1 :-)
<fwereade> dimitern, they all *should* but that's not important enough for now
<rogpeppe2> dimitern: this CL is entirely about log messages
<dimitern> so, with this change - if I run "juju somecommand" will I see "command completed successfully" on the console after it finished?
<fwereade> rogpeppe2, the status one's looking good given what you've done
<rogpeppe2> dimitern: no
<rogpeppe2> dimitern: only if you use --verbose
<fwereade> rogpeppe2, but there's no life for services/relations that I can see
<dimitern> rogpeppe2: so that's what I was thinking
<rogpeppe2> fwereade: ah
<dimitern> rogpeppe2: can we add the stdout/err message like this as well please?
<rogpeppe2> dimitern: no
<rogpeppe2> dimitern: :-)
<rogpeppe2> dimitern: i don't think a command should print this stuff by default
<rogpeppe2> dimitern: it's noise
<dimitern> it's nice and reassuring
<dimitern> but, fine
<rogpeppe2> dimitern: so is the next shell prompt :-)
<fwereade> rogpeppe2, LGTM with the extra life fields
<rogpeppe2> dimitern: if there's an error, that *will* be printed
<dimitern> rogpeppe2: fair enough
<dimitern> rogpeppe2: LGTM then
<rogpeppe2> dimitern: thanks
<rogpeppe2> dimitern: i'll leave "completed successfully" for another day if that's ok
<dimitern> rogpeppe2: series is still omitempty - should it be left like this?
<rogpeppe2> dimitern: yeah, i decided it was fine
<dimitern> rogpeppe2: you mean the wording or the stdout output?
<rogpeppe2> dimitern: the wording
<rogpeppe2> dimitern: i don't want to spend another 10 minute round-trip
<dimitern> rogpeppe2: a command always finishes, even when it fails
<rogpeppe2> dimitern: yeah, i'm +1 on the change, but i don't think it's that important right now
<dimitern> rogpeppe2: ok, if we're absolutely rushing things, fine
<rogpeppe2> dimitern: meeting at 3
<dimitern> rogpeppe2: so once you land these 2 we're done?
<fwereade> god, meeting
<fwereade> I'm going to lie down for 20 mins
<fwereade> see you then
<dimitern> who is gonna do the release following dave's process?
<rogpeppe2> fwereade: how do you suggest we show relation life status?
<rogpeppe2> fwereade: currently relations are
<rogpeppe2> 	Relations     map[string][]string   `json:"relations,omitempty" yaml:"relations,omitempty"`
<rogpeppe2> fwereade: no struct for a life field
<dimitern> rogpeppe2: the same way?
<rogpeppe2> dimitern: how do you mean?
<dimitern> rogpeppe2: having relationStatus instead of string?
<rogpeppe2> dimitern: that will break compatibility, no?
<dimitern> rogpeppe2: i think so, yeah
<dimitern> fwereade: when you're back comment on this one please
<dimitern> rogpeppe2: well, we can always add it in parentheses after it :)
<rogpeppe2> dimitern: bad idea
<dimitern> rogpeppe2: best compromise I think
<rogpeppe2> dimitern: that breaks scripts horribly
<rogpeppe2> dimitern: i'd add another field, RelationLife map[string]state.Life
<rogpeppe2> dimitern: or something like that
<dimitern> rogpeppe2: yeah, should work, at the expense of extra output size, but meh..
<rogpeppe2> dimitern: we'd only include dead and dying ones
<dimitern> rogpeppe2: good point!
<dimitern> rogpeppe2: were we doing that in python?
<dimitern> rogpeppe2: reporting not alive relations
<rogpeppe2> dimitern: i'm not sure there was such a concept in the python
<dimitern> rogpeppe2: really? oh, well
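(Editorial note: rogpeppe2's suggestion above, a separate life map that stays empty in the common all-alive case, can be sketched with plain struct tags. Field and tag names here are assumptions for illustration, not juju-core's real ones; `omitempty` makes the new key vanish when no relation is dying or dead, preserving existing output:)

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Life stands in for state.Life from the discussion.
type Life string

// serviceStatus keeps the existing Relations map untouched for
// compatibility and adds RelationLife, populated only for dying/dead
// relations, so scripts parsing the common case see no change.
type serviceStatus struct {
	Relations    map[string][]string `json:"relations,omitempty"`
	RelationLife map[string]Life     `json:"relation-life,omitempty"`
}

func render(s serviceStatus) string {
	out, _ := json.Marshal(s)
	return string(out)
}

func main() {
	alive := serviceStatus{Relations: map[string][]string{"db": {"mysql"}}}
	fmt.Println(render(alive)) // no relation-life key at all

	dying := alive
	dying.RelationLife = map[string]Life{"db": "dying"}
	fmt.Println(render(dying)) // relation-life appears only now
}
```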
<rogpeppe2> dimitern: quick: remind me of a good way to get a service that's in a dying state?
<dimitern> rogpeppe2: st.Service(id) should work
<rogpeppe2> dimitern: no, i mean, to create a service and then put it into a dying state
<dimitern> rogpeppe2: ah, Destroy()
<rogpeppe2> dimitern: doesn't that just remove the service if there's not something keeping it around
<rogpeppe2> ?
<dimitern> rogpeppe2: yeah, you'll need to add a subordinate at least
<rogpeppe2> dimitern: i think i probably need to add a unit
<rogpeppe2> dimitern: but i'm not sure i can call Destroy if there's a unit
 * rogpeppe2 thinks sometimes that we should have a way of recreating a given desired State rather than going through all the steps necessary to arrive at it
<dimitern> rogpeppe2: take a look at preventUnitDestroyRemove
<rogpeppe2> dimitern: thank you
<rogpeppe2> dimitern: perfect!
<rogpeppe2> dimitern: oh, no
<rogpeppe2> dimitern: that's the unit not the service
<dimitern> rogpeppe2: hmm..
<rogpeppe2> dimitern: actually, maybe i can destroy it even if it has units
<dimitern> rogpeppe2: if it has 1 unit and no relations, destroy() will remove it, otherwise it'll set it to dying
<rogpeppe2> dimitern: so i need a relation i guess
<dimitern> rogpeppe2: looks that way - at least according to some of the tests that are faking a relation to test destroy()
<dimitern> rogpeppe2: TestDestroyStillHasUnits
<rogpeppe2> dimitern: interesting. i think that might be wrong actually.
<dimitern> rogpeppe2: which one?
<rogpeppe2> dimitern: if i destroy a service before it's been provisioned i think it should go away and all its units too
<dimitern> rogpeppe2: provisioning applies to machines, not services, right?
<rogpeppe2> dimitern: yeah
<dimitern> rogpeppe2: I can't deploy a service with no units with the cli
<rogpeppe2> dimitern: but it looks like i can't destroy a service until the units i created on it have started and stopped
<dimitern> rogpeppe2: well, adding a unit assumes you want it started
<rogpeppe2> dimitern: if i do {juju deploy wordpress; juju destroy-service wordpress} i shouldn't have to wait 10 minutes for the machine to come up
<rvba> fwereade: I know you're busy right now and this is definitely not urgent but when you have time, could you please see what you have to say about Jeroen's comment here: https://code.launchpad.net/~maas-maintainers/juju-core/maas-provider-skeleton/+merge/157025/comments/347752
<rogpeppe2> dimitern: yeah, but people change their minds
<dimitern> rogpeppe2: people should rtfm
<rogpeppe2> dimitern: and there's little more annoying than a service that doesn't do what it could do
<dimitern> :)
<rogpeppe2> dimitern: i think we could do a better job here - the manual says "you can't stop what you've started, even though it's quite possible to do so"
<dimitern> rogpeppe2: they can change their mind, they just have to wait for the action they issued before destroying
<rogpeppe2> dimitern: yeah. we could do better there
<dimitern> rogpeppe2: possibly yeah, but that's not the only place, i assure you :)
<rogpeppe2> dimitern: indeed :-)
<dimitern> rogpeppe2: it could be worth adding a wishlish bug?
<dimitern> wishlist
<rogpeppe2> dimitern: just a bug would do
<dimitern> kanban meeting guys?
<dimitern> rogpeppe2, TheMue: kanban?
<TheMue> ouch, yes
<fwereade> rvba, responded
<rvba> fwereade: ta
<rogpeppe2> lunch
 * TheMue has to step out in a few moments for dinner in a restaurant, younger daughter has her 17th birthday today
<rogpeppe2> TheMue: have fun
<rogpeppe2> live tests passed against trunk for me (except the usual StopInstances failure)
<TheMue> rogpeppe2: thx, we'll have. and daddy is allowed to pay. :)
<rogpeppe3> anyone here know the easiest way to script install setuptools ?
<rogpeppe3> the best i've got currently is:
<rogpeppe3> 	wget -o setuptools.egg http://pypi.python.org/packages/2.7/s/setuptools/setuptools-0.6c11-py2.7.egg#md5=fe1f997bc722265116870bc7919059ea
<rogpeppe3> 	sh *.egg
<rogpeppe3> which seems a bit arbitrary
<rogpeppe3> this is in a charm, BTW
<rogpeppe> here's my current juju environment which i'm using to try out some stuff: http://paste.ubuntu.com/5719234/
<rogpeppe> i've done upgrade-charm a few times
<rogpeppe> dimitern: ^
<rogpeppe> dimitern: seems to be working well!
<dimitern> rogpeppe: good to hear! :)
<rogpeppe> dimitern: this was the script i used to set things up: http://paste.ubuntu.com/5719240/
<dimitern> rogpeppe:  status looks nicer as well
<rogpeppe> dimitern: yeah, it's good to see it working
<rogpeppe> dimitern: hmm, actually i'm not entirely sure it is working
<dimitern> rogpeppe: i can see the hook failed
<rogpeppe> dimitern: actually i don't think it did
<rogpeppe> dimitern: oh, it did
<rogpeppe> dimitern: i ssh'd in to the one machine that it didn't fail on!
<rogpeppe> dimitern: two "juju resolved" invocations later and it's all running
<dimitern> rogpeppe: nice!
<rogpeppe> dimitern: yeah it feels really good to just play with it a bit
<rogpeppe> dimitern: just found a bug in juju get though
<rogpeppe> % juju get logging
<rogpeppe> error: constraints do not apply to subordinate services
<dimitern> rogpeppe: oh?
<rogpeppe> interrresting error
<dimitern> rogpeppe: indeed
<rogpeppe> dimitern: it only happens when doing juju-get on the subord
<rogpeppe> dimitern: ha, found it i think
<rogpeppe> dimitern: yup
<rogpeppe> dimitern: fixed.
<rogpeppe> dimitern: am sorely tempted to push the fix :-)
<dimitern> rogpeppe: what was it?
<rogpeppe> dimitern: in statecmd.ServiceGet, it calls svc.Constraints without checking if the service is principal or not
<rogpeppe> dimitern: personally i'd be tempted to make Service.Constraints return a zero constraints if the service is subord
<rogpeppe> dimitern: rather than an error
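(Editorial note: the fix being weighed can be contrasted in a few lines. Constraints and Service here are simplified stand-ins for the juju-core types, with names assumed for illustration; strictConstraints models the behaviour that broke `juju get` on a subordinate, lenientConstraints the suggested zero-value alternative:)

```go
package main

import (
	"errors"
	"fmt"
)

// Constraints is a stand-in for juju's constraints value.
type Constraints struct{ CpuCores uint64 }

type Service struct {
	subordinate bool
	cons        Constraints
}

// strictConstraints errors for subordinates, as described above.
func (s *Service) strictConstraints() (Constraints, error) {
	if s.subordinate {
		return Constraints{}, errors.New("constraints do not apply to subordinate services")
	}
	return s.cons, nil
}

// lenientConstraints simply reports zero constraints for a subordinate,
// so callers like ServiceGet need no special case.
func (s *Service) lenientConstraints() Constraints {
	if s.subordinate {
		return Constraints{}
	}
	return s.cons
}

func main() {
	sub := &Service{subordinate: true}
	if _, err := sub.strictConstraints(); err != nil {
		fmt.Println("strict:", err)
	}
	fmt.Println("lenient:", sub.lenientConstraints())
}
```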
<rogpeppe> dimitern: hmm, another bug in juju get
<rogpeppe> dimitern: it doesn't appear to print default values
<dimitern> rogpeppe: hmm.. well, bugs will appear anyway :) good that we have some time now to actually test it and find them
<rogpeppe> dimitern: definitely
<rogpeppe> dimitern: and none of these are show-stoppers
<dimitern> rogpeppe: yeah
<rogpeppe> hmm, i thought juju get was supposed to work now
<rogpeppe> right, that's me done
<rogpeppe> g'night all!
<thumper> hmm... forgot to close irc last night
<thumper> oh well,
<thumper> morning
<mgz> mornin'
<dimitern> morning mgz, thumper
<dimitern> ;)
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: danilos | Bugs: 2 Critical, 61 High - https://bugs.launchpad.net/juju-core/
<thumper> dimitern: why start log messages with lower case?
<fwereade> thumper, convention
<thumper> oh hai fwereade
<fwereade> thumper, heyhey
<thumper> fwereade: good timing,
<thumper> fwereade: https://codereview.appspot.com/8849043/ just being updated
<fwereade> thumper, cool
<thumper> fwereade: hmm...
<thumper> $ juju bootstrap
<thumper> error: cannot find tools: use of closed network connection
<thumper> $ juju version
<thumper> 1.9.14-raring-amd64
<thumper> that is the package version
<thumper> fwereade: I don't suppose you could help me test the serialization?
<thumper> fwereade: like how to make some fake charms and fake subordinates that just take time :)
<thumper> fwereade: try and force contention
<fwereade> thumper, hum, that raring thing is not nice
<fwereade> thumper, we have been unable to adequately characterise it
<thumper> running with -v
<thumper> also, if I use the raring mongodb tests fail, with tarball, it works
<thumper> 2013/04/19 11:32:36 ERROR command failed: cannot find tools: Get https://s3.amazonaws.com/juju-c54985419ee80c98531550e15fdcc6a8/?prefix=tools%2Fjuju-1.&delimiter=&marker=: remote error: handshake failure
<fwereade> thumper, AFAIWCT it is coming out of s3 somehow, very much more in some regions than others, and possibly varying by client series
<fwereade> thumper, handshake failures appear to Just Happen
<fwereade> thumper, I know this is shit
<fwereade> thumper, but IME they have never progressed beyond a mild annoyance
<fwereade> thumper, huh, was the mongo package used in 1.9.14?
 * thumper tries again then
<thumper> I think the terminal running juju is using the packaged mongo too
<thumper> no, using tar ball
<fwereade> thumper, that shouldn't be hitting mongo except via mgo
<fwereade> thumper, the tests will just use whatever's on your path
 * thumper nods
<thumper> got connection closed that time
<thumper> 2013/04/19 11:36:26 ERROR command failed: cannot find tools: use of closed network connection
 * thumper runs off local version
<fwereade> thumper, assuming for now that you never see handshake failures, do you ever see anything other than closed connections to s3?
<thumper> not package
<thumper> I get it with trunk too
<fwereade> !
<thumper> I can't bootstrap at all
<fwereade> ok we have a reproducible case with actual source
<fwereade> hallelujah
<fwereade> may I ask you to log the hell out of what is happening there?
<thumper> why does our logging not give us file and line numbers?
 * thumper switches to trunk, and tries again
<thumper> 2013/04/19 11:39:31 INFO environs: reading tools with major version 1
<thumper> 2013/04/19 11:39:37 INFO environs: falling back to public bucket
<thumper> 2013/04/19 11:39:37 ERROR command failed: use of closed network connection
<thumper> from tip of trunk
<thumper> where do I start logging?
<thumper> must be in the tools search right?
<fwereade> thumper, sorry got distracted
<fwereade> thumper, ok that is interesting
<fwereade> thumper, I think we might want to delve into goamz/s3 and log in some detail
<thumper> environs/tools.go line 26 fails
<fwereade> thumper, what region are you in?
<fwereade> thumper, that'd be a Storage.List(), right?
<thumper> the default
<thumper> fwereade: right
<thumper> how do you format a bool for %s ?
<fwereade> thumper, %v usually works
<fwereade> thumper, there might be something I ought to use instead
<fwereade> thumper, I'm sorry, but I think I have to sleep :(
<thumper> fwereade: np, I'll keep digging
<fwereade> thumper, the absolutely most valuable thing you can do is to mail that, indeed
<fwereade> thumper, nail that
<thumper> heh
<thumper> funny typo
<fwereade> thumper, and possibly to just try deploying to ap-southeast-2 in case you're "lucky" enough to encounter difficulties finding instances from ids
<thumper> ok, I'll try that now
<fwereade> thumper, I will finish your review first though
<fwereade> mramm, thumper seems to be able to repro one of the elusive issues against a source build
<thumper> fwereade: changing region made no difference
<thumper> fwereade: although unhelpfully, the region you are using isn't logged anywhere
<fwereade> mramm, in the light of mgz's comments re time, I think that we should not be releasing right now, but I would dearly appreciate clarity re the precise situation to which he alludes
<fwereade> thumper, ha
<mramm> fwereade: I have been in customer meetings all afternoon, so I'll try to check in on that now
<fwereade> thumper, I will gladly LGTM just about anything that improves our logging
<thumper> hmm, how am I supposed to log from within goamz??
<fwereade> thumper, I know we produce a lot but we're cutting down the useless ones and so we have space for ones that might be more useful
 * thumper does printf
<fwereade> thumper, what else does core/log import? I thought it should be doable directly without cycles
<fwereade> thumper, (not saying it's nice, just expedient)
<thumper> actually, should probably be able to import it
#juju-dev 2013-04-19
<mramm> fwereade: I'll look into the details, but am not able to get clarity on that right at this moment because people are not around
<fwereade> mramm, np, I'm heading off to sleep soon
<mramm> fwereade: understood
<mramm> take care of yourself -- it's late your time!
<fwereade> mramm, cheers :)
<thumper> ah ffs
<davecheney> m_3: ping
<thumper> davecheney: I have bootstrap problems with raring with the 1.9.14 package, and tip of trunk
<thumper> davecheney: I'm in dive mode, but breaking for lunch now
<thumper> as in diving in with logging to try and figure out wtf is going on
<davecheney> thumper-afk: ok, i'm doing the same
<davecheney> it could be that we have too many tools in the public bucket
<davecheney> in fact, that is probably it
<davecheney> release mode uses best fit, so it iterates over the tools
<davecheney> dev mode uses exact fit, so it's a straight hit and run
<fwereade> thumper-afk, davecheney: sleep
<fwereade> (me sleep, not you)
<davecheney> go
<m_3> davecheney: pong
<davecheney> m_3: wazzup ?
<m_3> back home
<davecheney> right
<davecheney> i'm going to make one more attempt to figure out what is going on with hp cloud
<davecheney> i'm going to allocate 100 machines
<davecheney> manually remove their public addresses
<davecheney> another 100
<davecheney> etc
<davecheney> etc
<davecheney> see if that gets us over the 2^8 hump
<m_3> yeah, did you ever try it from _outside_?
<m_3> that's what I was gonna do next
<davecheney> m_3: i was able to launch 100 extra instances from inside via the nova command
<m_3> basically spin up what we have from outside of hp
<m_3> but I've seen this problem before, a bunch
<m_3> intermittent inability to resolve the endpoint urls
<davecheney> maybe it will work from outside
<davecheney> it's like when we spin up too many machines inside an openstack tennant
<m_3> this kills charmtests against hp sometimes (w/ juju-0.6)
<davecheney> it runs out of ip's to respond _to_ dns queries
<m_3> I really don't understand it
<davecheney> i can sort of explain it
<m_3> couldn't work with it yesterday cause of the talk
<davecheney> given the ways that each tennant probably has the _same_ 10/8 address space
<m_3> dude, there were >300 people in our talk yesterday!
<davecheney> so there is probably shit-tons of nat going on
<m_3> they mostly stayed
<davecheney> m_3: i saw the photo
<davecheney> that is fucking amazing !?!
<m_3> but /8's frickin huge
<davecheney> sure, but it's the same 10/8 for each customer
<m_3> you're thinking that's natted out using a limited pool of outsides?
<davecheney> just like your router is using 192.168.0/16
<davecheney> my suspicion is because we're asking for so many public addresses, we're sort of choking off our own air supply
<davecheney> anyway, going to explore that theory today
<m_3> hmmm... it really looks like the same as when I hit it from ec2
<davecheney> m_3: is this at all related to the stuff mgz was saying about security groups and screwing ourselves by making too many stupid requests ?
<m_3> dude... dunno
<m_3> I plan to sort of wrap my head around all of this again tomorrow
<davecheney> roger
<davecheney> you've got to uncompress from ODS
 * davecheney waves his wand
<m_3> I'll start from scratch and grok what you and mgz have done this week
<davecheney> "thou shall never have to say cloud again"
<m_3> haha
<m_3> yes
<davecheney> "i've got a cloud in my pants, and everyone is invited"
 * m_3 groan
<davecheney> too soon ?
<m_3> :)
<davecheney> m_3: fyi, just bringing the code on juju-hpgoctrl2-machine-0
<davecheney> up to date with the overnight changes
<m_3> davecheney: awesome... thanks man
<davecheney> m_3: no worries, we fixed some good issues this week
<davecheney> the deploy logs from the load test are much cleaner
<davecheney> they actually tell you what is going on
<davecheney> status is now usable while you are doing a big deploy
<davecheney> etc
<davecheney> also, when hp cloud wants to, it is nearly twice as fast to bring up an instance than ec2
<davecheney> which doesn't suck
<davecheney> 2013/04/19 01:59:49 ERROR worker/provisioner: cannot start instance for machine "16": cannot set up groups: failed to create a rule for the security group with id: <nil>
<davecheney> caused by: Maximum number of attempts (3) reached sending request to https://az-2.region-a.geo-1.compute.hpcloudsvc.com/v1.1/17031369947864/os-security-group-rules
<thumper-afk> davecheney: &http.Response{Status:"200 OK", StatusCode:200, Proto:"HTTP/1.1", ProtoMajor:1, ProtoMinor:1, Header:http.Header{"X-Amz-Request-Id":[]string{"90D56A3D6895C07E"}, "X-Amz-Id-2":[]string{"6lThSRAi5lMeq9oe8oSeibO7fjvZQLjgKGYG0Gs7vRMBZrQ6Z0xVlIfyILAoWO4A"}, "Date":[]string{"Fri, 19 Apr 2013 02:38:03 GMT"}, "Content-Type":[]string{"application/xml"}, "Server":[]string{"AmazonS3"}}, Body:(*http.bodyEOFSignal)(0xf84022ec60), ContentLength:-1, TransferEncoding:[]string{"chunked"}, Close:false, Trailer:http.Header(nil), Request:(*http.Request)(0xf8403ac600)}
<thumper-afk> davecheney: this is the response from the http request inside goamz/s3 for the request to list the public bucket
<thumper-afk> seems like body is: Body:(*http.bodyEOFSignal)(0xf84022ec60)
<thumper-afk> ContentLength:-1,
<thumper> hate it when I forget to reset the nick
<davecheney> that it because it is chunked
<davecheney> TransferEncoding:[]string{"chunked"},
<davecheney> length is unknown from the server
<thumper> ok, what does that mean?
<davecheney> rfc 2616 chunked transfer encoing
<davecheney> encoding
<davecheney> it's not a problem
<thumper> ok
<davecheney> its a way of sending the http body without having to specify the length first
<davecheney> very common if you are streaming a response
<thumper> davecheney: this is the line that is failing: err = xml.NewDecoder(hresp.Body).Decode(resp)
<thumper> davecheney: I wonder if we have moved into a chunked response now, which the decoder can't handle due to number of tools...
 * thumper writes up findings to the list
<davecheney> thumper: nah, it just gets a Reader
<davecheney> what implements the reader is not important
<thumper> davecheney: so what would it be then?
 * thumper has put the heating up
<thumper> davecheney: ping
<rogpeppe> mornin' all
<davecheney> m_3: https://bugs.launchpad.net/juju-core/+bug/1170595
<davecheney> bingo
<davecheney> this is why we're having problems in load test
<davecheney> 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"Kill", PkgPath:"", Type:(*reflect.commonType)(0x7468a8), Func:reflect.Value{typ:(*reflect.commonType)(0x7468a8), val:(unsafe.Pointer)(0x4d6359), flag:0x130}, Index:4}
<davecheney> 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"requireAgent", PkgPath:"launchpad.net/juju-core/state/apiserver", Type:(*reflect.commonType)(0x767768), Func:reflect.Value{typ:(*reflect.commonType)(0x767768), val:(unsafe.Pointer)(0x4d63e7), flag:0x131}, Index:8}
<davecheney> 2013/04/19 07:07:20 INFO rpc: discarding obtainer method reflect.Method{Name:"requireClient", PkgPath:"launchpad.net/juju-core/state/apiserver", Type:(*reflect.commonType)(0x767768), Func:reflect.Value{typ:(*reflect.commonType)(0x767768), val:(unsafe.Pointer)(0x4d64ac), flag:0x131}, Index:9}
<davecheney> ^ rogpeppe1 is this a problem ?
<rogpeppe1> davecheney: no
<davecheney> was spotted on pa restart
<rogpeppe1> davecheney: that's expected behaviour
<rogpeppe1> davecheney: the warnings are useful when developing
<rogpeppe1> davecheney: i know they're annoying otherwise
<davecheney> ok, nm
<rogpeppe1> davecheney: i guess we should probably move those methods off the rpc root object to stifle the warnings.
<davecheney> rogpeppe1: if they aren't bugs then I wouldn't worry about it for the moment
<rogpeppe1> davecheney: do you know if anything's happened about 1.10 yet?
<rogpeppe1> davecheney: 'cos i have a couple of minor bugs (i already have the fixes for them) that it would be great to sort out if there was a moment or two more.
<rogpeppe1> davecheney: it seems nobody has ever used juju get.
<davecheney> rogpeppe1: yeah, i saw that bug
<davecheney> i think you are right
<davecheney> no one ever did use it
<rogpeppe1> davecheney: i had a fun time yesterday starting up a juju env, making some weirdish relations, upgrading charms, resolving hooks, etc
<rogpeppe1> davecheney: it actually seemed to work pretty well
<davecheney> rogpeppe1: i don't doubt that, we have excellent charm compatibility
<rogpeppe1> davecheney: i've just had an idea for a way to make it easy to write little charms that exercise particular functionality; trying to knock something together today
<davecheney> jujud   8613 root    1w   REG  253,1    71378 131869 /var/log/juju/machine-0.log
<davecheney> jujud   8613 root    2w   REG  253,1    71378 131869 /var/log/juju/machine-0.log
<davecheney> jujud   8613 root    3r   CHR    1,9      0t0   5786 /dev/urandom
<davecheney> jujud   8613 root    4w   REG  253,1    71378 131869 /var/log/juju/machine-0.log
<davecheney> do I even want to ask why we have 3 fd's pointing to the same log file ...
<davecheney> rogpeppe1: sweet
<rogpeppe1> davecheney: stdout and stderr are expected
<rogpeppe1> davecheney: not sure about 4
<davecheney> that isn't as important as this: # lsof -p $(pgrep jujud) | grep -c ESTABLISHED
<davecheney> 129
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1170595
<davecheney> that is why we can't provision more than about 200 machines in a run
<rogpeppe1> oops
<rogpeppe1> have you found the source of the leak?
<davecheney> looking now
<davecheney> shouldn't take long
<davecheney> given the number of times this problem turns up
<davecheney> i'm smacking myself it wasn't the first thing I looked for
<rogpeppe1> davecheney: this was the status from one of yesterday's environments http://paste.ubuntu.com/5719234/
<rogpeppe1> davecheney: note the interesting relationship between mongo and logging there
 * davecheney isn't quite sure what is wrong there
<davecheney> are they circular ?
<rogpeppe1> davecheney: nope
<rogpeppe1> davecheney: there's nothing wrong
<rogpeppe1> davecheney: it's just quite cool that you can do it
<rogpeppe1> davecheney: basically, logging requires mongo to store its logs. but we also want to store the log files produced by mongo itself, so the logger is subordinate to mongo as well as being related to it.
<davecheney> ok
<rogpeppe1> davecheney: i set it up deliberately like that, thinking it might not work
<rogpeppe1> davecheney: but it seems to work fine (at least on the surface! i haven't *actually* looked at the logs in mongo)
<rogpeppe1> anyone know if there's an easy way for a charm to find out its service name?
<davecheney> you'd think that would be straight forward
<rogpeppe1> currently the only thing i can think of is `pwd | sed blahblah`
<rogpeppe1> which is a hack
<davecheney> it isn't a config property ?
<rogpeppe1> davecheney: no
 * davecheney gives up
<rogpeppe1> davecheney: i'm not sure it should be a config property
<davecheney> maybe i used the wrong word
<davecheney> setting might be appropriate
<rogpeppe1> davecheney: it could easily be an env var though
<rogpeppe1> davecheney: settings can change
<rogpeppe1> davecheney: this is immutable
<davecheney> again i'm using the wrong word
<davecheney> surely we have a class of settings which are immutable
<rogpeppe1> davecheney: ah, ok
<rogpeppe1> davecheney: i don't *think* so
<rogpeppe1> davecheney: there might be a special case for public-address i suppose
<rogpeppe1> davecheney: ah, but that's relation setting anyway
<rogpeppe1> davecheney: currently service settings map exactly to the config defined in the charm
<rogpeppe1> davecheney: which seems good to me
<rogpeppe1> davecheney: i think just an env var JUJU_SERVICE to go along with JUJU_ENV_UUID would be good
<davecheney> i agree
<davecheney> sounds like something very useful
<rogpeppe1> davecheney: and i'd add JUJU_SERVERS too
<rogpeppe1> davecheney: yeah, it's very useful because it's an easy and predictable disambiguation mechanism
<rogpeppe1> davecheney: so i can create a directory that has a predictable name but is guaranteed not to clash with similar names chosen by other colocated charms
<rogpeppe1> davecheney: JUJU_SERVICE is a one-line change
<rogpeppe1> davecheney, fwereade, dimitern: do you know if trunk is still frozen?
<fwereade> rogpeppe1, davecheney, dimitern: I have heard nothing from mramm re the deadline ambiguity alluded to by mgz
<fwereade> rogpeppe1, davecheney, dimitern: whoops, I did actually, had missed that mail
<rogpeppe> fwereade: to you only? i don't think i saw anything
<fwereade> rogpeppe, davecheney, dimitern: I think we should revert the 1.10 version for now
<rogpeppe> fwereade: ok - what's the situation?
<fwereade> rogpeppe, apparently the *real* deadline is EOD monday
<rogpeppe> fwereade: oh, that's great! i'll propose a couple of bug fixes then, if that's ok.
<fwereade> rogpeppe, so I think we should revert the version and keep going on low-risk/high-impact bugfixes for today at least
<rogpeppe> fwereade: 1130149 and 1170425 are both easy and worth doing
<rogpeppe> #1130149
<fwereade> rogpeppe, although, tbh, today at *most* also applies ;p
<rogpeppe> lp#1130149
<rogpeppe> fwereade: agreed entirely
<rogpeppe> fwereade: BTW what do you think about a $JUJU_SERVICE env var?
<rogpeppe> fwereade: so a charm can know what service it's running as
<fwereade> rogpeppe, use case?
<rogpeppe> fwereade: to go along with JUJU_ENV_UUID
<rogpeppe> fwereade: it gives an easy way for a charm to create a predictable directory that won't clash
<rogpeppe> fwereade: also it provides a reliable way for a knowledgeable charm to find the unit config (although tbh i think we should provide JUJU_SERVERS or something like that instead of needing to do that)
<fwereade> rogpeppe, I think service name is too coarse, and you really want unit name
<fwereade> rogpeppe, sorry, what's the unit config?
<rogpeppe> fwereade: the uniter agent config
<rogpeppe> fwereade: yeah, unit name would be good
<rogpeppe> fwereade: currently you *can* find it out, but only by mangling pwd, which is dreary and nasty.
<rogpeppe> fwereade: mind you i'm not sure it's currently possible to have two units of the same service in the same container, is it?
<fwereade> rogpeppe, yeah, but hitting the agent conf at all is dreary and nasty -- we should be explicitly making the API server addresses available if hooks need them
<fwereade> rogpeppe, nothing stopping you doing that
<rogpeppe> fwereade: yeah, i think we should; but i think the unit name is useful info too.
<fwereade> rogpeppe, JUJU_UNIT_NAME is already there, isn't it?
<rogpeppe> fwereade: for my particular use case, i'm wanted to write a charm that made it easy to test pwd
<rogpeppe> fwereade: ah, i missed that
<fwereade> rogpeppe, sorry, I'm being slow, test what about pwd?
<rogpeppe> mistype!
<fwereade> haha
<rogpeppe> fwereade: for my particular use case, i'm wanting to write a charm that made it easy to test aspects of charm behaviour
<rogpeppe> fwereade: $JUJU_UNIT_NAME is great
<fwereade> rogpeppe, sweet
<rogpeppe> fwereade: although perhaps $JUJU_SERVICE might be useful too, i dunno
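For what it's worth, deriving the service name from $JUJU_UNIT_NAME is a one-liner, since unit names follow the "service/N" convention; a hypothetical sketch (not juju-core code):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// serviceName derives the service name from a unit name of the form
// "service/0", the convention juju uses for $JUJU_UNIT_NAME.
func serviceName(unit string) string {
	if i := strings.LastIndex(unit, "/"); i >= 0 {
		return unit[:i]
	}
	return unit
}

func main() {
	// Inside a hook, JUJU_UNIT_NAME would be e.g. "mysql/0".
	unit := os.Getenv("JUJU_UNIT_NAME")
	fmt.Println(serviceName(unit))
}
```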
<fwereade> rogpeppe, I *am* wondering about the juju gui charm though
<rogpeppe> fwereade: yeah
<rogpeppe> fwereade: i really think we should provide server address info
<fwereade> rogpeppe, I'm not sure the juju gui should be bound to the juju that deployed it
<rogpeppe> fwereade: ah, that's an interesting point
<fwereade> rogpeppe, I suspect that API information should just be service config
<fwereade> rogpeppe, even if it's a little less convenient to set it up
<rogpeppe> fwereade: the problem with that is that in a HA world that info changes
<rogpeppe> fwereade: i could see that it might be good to allow both ways actually
<rogpeppe> fwereade: use the local server unless a config option is set
<fwereade> rogpeppe, maybe, I need to think about this for a bit
<rogpeppe> fwereade: then we can potentially have something that watches some environment and makes config changes when the set of server addresses changes
<fwereade> rogpeppe, it kinda feels like the same old service-output problem
<rogpeppe> fwereade: anyway, i don't think there's a good reason to make it hard for a charm to access its own API server
<rogpeppe> fwereade: ?
<rogpeppe> fwereade: which problem was that?
<fwereade> rogpeppe, that we'd kinda like to be able to get information back out of services
<rogpeppe> fwereade: ah yes. we really really do
<fwereade> rogpeppe, it should ideally always be possible to deploy a service with default configuration and have it work nicely
<rogpeppe> fwereade: i think that's one of the most crucial missing juju features. that and allowing a charm to change things asynchronously.
<rogpeppe> fwereade: i agree.
<fwereade> rogpeppe, in the case of a password a default password is painfully insecure, and generating one on the fly should be perfectly possible, but there's no way to get it out
<fwereade> rogpeppe, for the async stuff you mean juju-run basically?
<rogpeppe> fwereade: yeah
<fwereade> rogpeppe, agreed on both points
<fwereade> rogpeppe, anyway, those bugs
<fwereade> rogpeppe, 1130149, +100
<rogpeppe> fwereade: because currently there's no way for a unit to *say* anything other than in response to something else
<fwereade> rogpeppe, 1170425, I'll take quite a lot of convincing
<fwereade> rogpeppe, yep, definitely
<rogpeppe> fwereade: are you suggesting that juju get shouldn't work on a subordinate service?
<fwereade> rogpeppe, I'm suggesting that calling Constraints on a subordinate service is DIW
<rogpeppe> fwereade: did i suggest otherwise?
<fwereade> rogpeppe, last night, I think you did ;p
<rogpeppe> fwereade: ah, i didn't know you'd seen that :-)
<rogpeppe> fwereade: i knew you'd be -1 on that suggestion
<fwereade> rogpeppe, so long as it's done by skipping the Constraints call I'm fine, I guess, but I'm a bit surprised that the gui always wants to get constraints alongside config
<rogpeppe> fwereade: i can't really see a down side, but there y'go.
<rogpeppe> fwereade: it gets all the service info in one call
<rogpeppe> fwereade: the fix i made just tested IsPrincipal
<fwereade> rogpeppe, ok, that's fair enough in the current context
<davecheney> wow, hp cloud is so much faster than ec2
<rogpeppe> davecheney: it wouldn't take much :-)
<davecheney> bootstraps take < 2 mins on hp cloud
<rogpeppe> fwereade: your password use-case is an interesting one.
<rogpeppe> fwereade: and highlights one particular issue with getting stuff out of a service - can the service somehow choose a "shared" value that all units agree on, or can you just see a set of values for each unit? i think probably just the latter actually.
<fwereade> rogpeppe, that's just a matter of exposing stuff we already have, so it would certainly be simpler
<rogpeppe> fwereade: in a way you could think of the service config as the relation settings of the juju client
<rogpeppe> fwereade: so a similar model could apply - a charm could run config-set to set its own config settings that could be seen by the client.
<fwereade> rogpeppe, that is a *very* nice way of looking at it
<fwereade> rogpeppe, but it does bring up interesting race possibilities, I think
<rogpeppe> fwereade: really? each unit would have its own set of config settings
<fwereade> rogpeppe, if I deploy 3 units of something, which one gets to pick the output password for the service administrator?
<rogpeppe> fwereade: they all pick their own passwords
<fwereade> rogpeppe, I don't think it necessarily makes sense at a unit level but go on
<rogpeppe> fwereade: as a client i have to choose which unit to get the password from
<rogpeppe> fwereade: that's why the relation analogy is nice - with relations, there's one group of settings for each unit, and each unit can set its *own* settings, but can only read the remote settings.
<rogpeppe> fwereade: if you have a service where all units must agree on a password, they can work it out together and present a unified front
<rogpeppe> fwereade: doing leader election perhaps through a peer relation
<rogpeppe> fwereade: shared read-write settings are a no-no i think
<davecheney> https://codereview.appspot.com/8668048
<davecheney> ^ fixes openstack connection leak
<rogpeppe> davecheney: yay!
<davecheney> doing a 300 node test now
<davecheney> it's not leaking
<davecheney> so sayeth lsof
<davecheney> but i'll leave it running and get some dinner
<rogpeppe> davecheney: i'm not sure the fix is quite right
<rogpeppe> davecheney: it can still potentially leak, i think
<davecheney> rogpeppe: oh really ?
<rogpeppe> davecheney: if retryAfter == 0 we leak
<davecheney> oh fuck, i didn't see all those stupid returns
<davecheney> right, will fix some more
<rogpeppe> davecheney: i'd be tempted to put it into its own function
<davecheney> rogpeppe: ohhh
<davecheney> i have many many refactors to this package
<rogpeppe> davecheney: with a deferred "if err != nil { resp.Close() }"
<davecheney> rogpeppe: the body closing is all over the shop in that package
<davecheney> i have a branch for fixing that as well
 * rogpeppe is not greatly surprised
<davecheney> PTAL https://codereview.appspot.com/8668048
<rogpeppe> davecheney: i think that's wrong too, probably
<davecheney> well fuck
<davecheney> where do you think it goes ?
<rogpeppe> davecheney: does nothing read the resp body returned from sendRateLimitedRequest ?
<rogpeppe> oops sorry!
<rogpeppe> you're good, i think
<davecheney> cool
<davecheney> it is hard to understand when it is read and not read
<davecheney> and there are other potential places where the connection can leak
<davecheney> check out client.BinaryRequest
<davecheney> i've patched all those in my other branch, but they didn't appear to be the problem
<rogpeppe> davecheney: LGTM
<davecheney> no rush on the review, I have something similar bodged into the load testing machine
<davecheney> and it's doing the job
<fwereade> davecheney, LGTM also
<davecheney> fwereade: what is the story with patches to trunk ?
<davecheney> yes ? no ? please ? maybe ?
<fwereade> davecheney, I'm going to revert the version right now
<davecheney> ok
<fwereade> davecheney, low-risk/high-value changes to trunk are fine for today I think
<davecheney> AHHH SHIT
<davecheney> this is to goose
<davecheney> and jon's bot is fucked
<fwereade> davecheney, hell-damn -- I think the juju-core revert still stands
<fwereade> davecheney, rogpeppe: https://codereview.appspot.com/8855044
<fwereade> davecheney, but jam's not around today is he?
<fwereade> dimitern, do you know how we can land goose fixes ATM?
 * davecheney grumbles about things 
<rogpeppe> fwereade: LGTM trivial
<jam> fwereade: I'm definitely not here and responding to davecheney's request
<jam> definitely not right now.
<davecheney> jam: good to know
<davecheney> or not
<davecheney> i think
<fwereade> jam, well, that is very lazy and irresponsible of you ;P
<fwereade> jam, tyvm
<davecheney> fwereade: LGTM, just commit it
<fwereade> jam, sorry, I thought this was your day off
<fwereade> davecheney, don't worry, already happening ;)
<davecheney> :)
<jam> fwereade: it is
<jam> which is why I'm definitely *not* doing it exactly right now.
<jam> and it should be done in as long as it takes to confirm it doesn't break juju-core's test suite
<davecheney> jam: thanks for fixing gz's one as well
<davecheney> to opine for a second
<davecheney> the http package is trial by fire for everyone
<davecheney> surely there must be a better way to write a http client that doesn't maim anyone who touches it
<jam> davecheney: so why doesn't gc close the resp.Body stuff? Or it does, but may take a while. Or it doesn't because underlying it all is a shared http connection that keeps a reference?
<davecheney> jam: there is no finaliser on the response body
<davecheney> this is part of the connection reuse logic
<davecheney> a very questionable decision
<davecheney> eventually, if every reference to the response, and hence the net.Conn, was freed
<davecheney> the finaliser on the fd would close it
<davecheney> but because of the way the connection reuse logic works, a response (and hence the body) is 'checked out' until you close it
<rvba> fwereade: can I land the maas provider constraints stuff today?
<fwereade> rvba, ...honestly I can't think why not, if it works, let me go review that right away
<fwereade> rvba, I don't think it's likely to be destabilizing
<rvba> fwereade: it should be pretty safe
<fwereade> rvba, but I seem to be being dense, because I don't see a review
<fwereade> rvba, MP
<rvba> fwereade: https://codereview.appspot.com/8842045/
<fwereade> rogpeppe, btw, are you planning to look at both those bugs you linked before?
<rvba> fwereade: it has been reviewed by dimitern already.
<rogpeppe> fwereade: yeah, i'm doing them
<fwereade> rogpeppe, <3
<fwereade> rvba, we try to have 2 reviews (except for the truly trivial), may I take a quick look before I approve?
<rvba> fwereade: sure, please do.
<fwereade> rvba, that's approved
<fwereade> rvba, tyvm
<rvba> fwereade: ta
<fwereade> rvba, I am dense, I found it in LP and reviewed there
<fwereade> rvba, close enough
<rogpeppe> fwereade: BTW the old "// Breaks compatibility with py/juju" comment in statecmd/get.go - do you know anything more about that? i'm presuming that py juju printed the actual value and the compatibility breakage is just because we're returning null
<davecheney> rogpeppe: i think I wrote that
<rogpeppe> davecheney: do you remember what the issue was?
<fwereade> rogpeppe, I'm afraid I'm almost 100% ignorant of get, but, well, we should avoid compatibility breaks where possible
<davecheney> rogpeppe: this was probably lisbon II
<davecheney> and gustavo said do it this way
<rogpeppe> fwereade: agreed totally.
<davecheney> i think it was something he felt was an improvement over python
<rogpeppe> davecheney: hmm, interesting
<rogpeppe> davecheney: surely the fact that it never prints default values isn't right though...
<davecheney> rogpeppe: it was certainly this issue surrounding the difficulty in differentiating between the default value
<davecheney> and a value which was set, but set to the default
<davecheney> http://paste.ubuntu.com/5721158/
<davecheney> ^ i've broken HP Cloud, where is my medal
<rogpeppe> davecheney: yeah. maybe py juju didn't have the "default" bool
<davecheney> from memory
<rvba> fwereade: the change to the MAAS provider is merged now.  Make sure you have the last version of the gomaasapi lib otherwise some tests in environs/maas will fail.
<davecheney> it was the issue of telling the default value, ie, nothing set, from the value which was set, but was set to the default
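The distinction dave describes — "never set" versus "explicitly set to a value that happens to equal the default" — comes down to tracking key presence rather than comparing values. A minimal sketch (types and names hypothetical, not juju-core's actual config code):

```go
package main

import "fmt"

// Config distinguishes unset keys from explicitly set ones by
// presence: a key absent from settings is at its default, even if a
// user later sets it to the same value as that default.
type Config struct {
	defaults map[string]string
	settings map[string]string // only explicitly set keys appear here
}

// Get returns the effective value and whether it came from the default.
func (c Config) Get(key string) (value string, isDefault bool) {
	if v, ok := c.settings[key]; ok {
		return v, false
	}
	return c.defaults[key], true
}

func main() {
	cfg := Config{
		defaults: map[string]string{"port": "80"},
		settings: map[string]string{},
	}
	v, d := cfg.Get("port")
	fmt.Println(v, d) // unset: default applies

	cfg.settings["port"] = "80" // explicitly set, same value as default
	v, d = cfg.Get("port")
	fmt.Println(v, d) // same value, but no longer a default
}
```

On upgrade-charm, this presence rule gives the behaviour fwereade states later: unset keys track the new defaults, while a key explicitly set to the old default stays put.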
<fwereade> rvba, cool, thanks
<fwereade> davecheney, rogpeppe: isn't it important that we differentiate between those cases?
<rogpeppe> fwereade: yeah, i'm thinking that
<rogpeppe> fwereade: i think our DeepEquals there is wrong
<davecheney> fwereade: i think so as well
<davecheney> it is a tricky problem
<davecheney> and gets orders of magnitude more complicated when you consider that upgrade charm
<davecheney> may supply a default value where one was previously set
<fwereade> davecheney, the upgrade-charm logic is that values left default change to new defaults; values set and coincidentally matching the old defaults should not change
<rogpeppe> fwereade: that seems right to me
<rogpeppe> fwereade: which would indicate that if the service config has an entry, then default == false
<fwereade> rogpeppe, agreed
<rogpeppe> fwereade: so no need for an equality comparison
<fwereade> rogpeppe, and in that case we just poke in the value from the charm default, and I think we're done there
<davecheney> fwereade: doesn't that mean if I set a config value, then upgrade my charm, then unset that config value, I may find that the default value then makes it look like nothing happened ?
<fwereade> rogpeppe, +1
<rogpeppe> davecheney: wouldn't that be the correct behaviour
<rogpeppe> ?
<davecheney> rogpeppe: i'm trying not to make a judgement here, at the time this problem sounded NP hard
<rogpeppe> davecheney: because in fact the value of that setting from the charm's pov has not changed
<davecheney> rogpeppe: i'm talking about the human using the tool
<davecheney> defaults appear to work for the charm, not the user
<rogpeppe> davecheney: ah, the user *would* see that something changed
<rogpeppe> davecheney: they'd see the "default" attribute switch to true
<rogpeppe> davecheney: although the "value" entry would stay unchanged
<davecheney> rogpeppe: it is clear I'm talking out my rectum
<davecheney> i don't have anything more useful to add at this point :)
<rogpeppe> davecheney: well you *are* down under
<rogpeppe> :-)
<davecheney> rogpeppe: fuckit, everything is upside down here
<davecheney> my favorite part of doing juju destroy-environment is the way all the ssh connections to your HP tenant stall
<davecheney> i'm sure they are doing some network reconfiguration as machines leave your tenant vlan
<fwereade> davecheney, btw, do you know the latest status of the mongodb in raring?
<fwereade> rogpeppe, davecheney, dimitern: has anyone been able to bootstrap onto raring today?
<dimitern> fwereade: haven't tried on raring
<rogpeppe> fwereade: i haven't tried - will do
<TheMue> fwereade: just doing it with quantal, but not yet raring (have just updated my test image to quantal, raring will follow)
<fwereade> TheMue, you've been testing bootstrap *to* not just bootstrap *from*, though, right?
<TheMue> fwereade: yep, set in environments.yaml, i only wanted to have a matching image ;)
<fwereade> TheMue, so you've tested bootstrap to precise/quantal/raring from precise in a bunch of ec2 regions?
<TheMue> fwereade: from quantal
<fwereade> TheMue, I has a confuse, I thought you only just started working with the quantal image?
<TheMue> fwereade: yesterday i tested precise from precise, now i want to test quantal from quantal
<fwereade> TheMue, ok, so everything in 1.9.14 works to/from precise? or not?
<TheMue> fwereade: yes, for me with a clean image (no dev stuff lying around) and installed juju from the ppa e'thing worked fine
<TheMue> fwereade: now it complains, charm not found
<TheMue> fwereade: but bootstrap worked
<jam> allenap: I just posted the reason why 'go build' still really needs you to be in GOPATH: https://code.launchpad.net/~jtv/juju-core/makefile/+merge/158640
<jam> Let me know if I can clarify anything for you.
<davecheney> fwereade: re mongo in raring
<davecheney> as far as I am concerned, it works
<fwereade> TheMue, ofc it complains charm not found, there are hardly any charms for quantal ;p -- you'll need to deploy precise/mysql (or env-set default-series=precise, deploy mysql)
<fwereade> davecheney, I guess you talked to thumper about it last night? he seemed to be having problems iirc
<davecheney> TheMue: yes, the only series which is worth bootstrapping is precise
<davecheney> there are precious few charms for Q and R
<davecheney> not even the ubuntu charm
<fwereade> davecheney, well, there should in theory be no reason not to bootstrap into other series
<davecheney> sure, you can bootstrap into Q
<davecheney> then deploy cs:precise/mysql
<fwereade> davecheney, but raring does seem to be acting pretty weird
<fwereade> davecheney, hey, btw, what would go wrong if we did start building everything for i386 as well?
<davecheney> fwereade: no, we can start doing that straight away
<davecheney> oh, one thing
<davecheney> if I bootstrap from a 386 machine
<davecheney> are the tools going to look for 386 or amd64 versions ?
<fwereade> davecheney, they should default to amd64
<davecheney> if the arch is clamped to amd64, then there will be no problem
<TheMue> fwereade, davecheney: feels somehow funny, bootstrapping quantal and deploying precise
<davecheney> good, then there will be no problem
<fwereade> davecheney, client series should not affect chosen tools
<davecheney> TheMue: i agree, i think cross series environments are the work of the devil
<davecheney> fwereade: cool, i've adjusted the recipes to build amd64 and 386
<fwereade> davecheney, brilliant -- and the mongodb one too?
<davecheney> that is already done
<fwereade> davecheney, if it's not a problem that will enable developers to upload-tools from i386
<davecheney> and remember, you don't need mongo on the client
<fwereade> davecheney, <3
<davecheney> only on the server, and as we discussed that will always be amd64
<fwereade> davecheney, I had thought that was due to actual problems with i386?
<davecheney> fwereade: no, their problem is apt-get install juju-core on a 386 machine is a noop
<fwereade> davecheney, not being able to upload tools from i386 is I think also a problem, but I agree it is not a critical one that should seriously delay us
<davecheney> fwereade: you can run 386 tools on amd64
<davecheney> but the version won't match so the bootstrap won't work
<davecheney> if the arch on the uploaded tools were clamped to amd64, this would solve that problem
<dimitern> i'm installing a fresh vbox with raring daily to try bootstrapping
<fwereade> davecheney, heh, I had thought that should work, someone assured me it wouldn't and I never tried it
<fwereade> davecheney, but I think there's something I still don't get: what is the problem with running i386 servers?
<fwereade> davecheney, it seems like all our tools and dependencies *can* be built for i386
<davecheney> fwereade: there are two problems
<davecheney> 1. we don't have any released tools for 386 (that is being fixed, we build them from the packages in PPA)
<davecheney> 2. for most of the ec2 machines, amd64 is the default
<davecheney> the t1.micro is the only machine that runs 386
<davecheney> so the answer to both is, it should work, but we haven't tried
<davecheney> mainly because upload tools was such an arse before you fixed it to follow series
<fwereade> davecheney, also m1.small, m1.medium, c1.medium
<davecheney> m1.small is amd64
<davecheney> anyway, it doesn't matter
<davecheney> we can fix it
<davecheney> it was just never a priority before
<davecheney> https://code.launchpad.net/~dave-cheney/+recipe/juju-core-daily
<davecheney> doing a test build now
<fwereade> davecheney, http://aws.amazon.com/ec2/instance-types/ says otherwise
<fwereade> davecheney, i386 is not my highest priority but it'd be great to have it as a possibility
<davecheney> fwereade: not really interested in arguing about this, the m1.small we always bootstrap for the state server is amd64
<davecheney> and this argument is impinging on my personal dislike for i386
<davecheney> let me get back to you when I have the build recipe straightened out
<fwereade> davecheney, sorry, I wasn't trying to argue with you -- but yeah, I am not helping your productivity
<davecheney> np probs
<davecheney> i don't want to argue about this -- it's trivial
<davecheney> one thing, i don't think the tests pass on 386
<davecheney> certainly not for all series
<davecheney> because we don't have the right ec2 cloud data service fixtures
<fwereade> davecheney, hmm, that's interesting, sounds like it may partially be a test isolation issue though
<rogpeppe> davecheney, fwereade: my raring bootstrap failed
<fwereade> davecheney, excluding upload-tools, client arch should not matter
<fwereade> rogpeppe, Processing triggers for ureadahead ... forever?
<rogpeppe> fwereade: just looking
 * rogpeppe typed "juju looking" there initially
<davecheney> rogpeppe: occupational hazard
<rogpeppe> fwereade: yup
<rogpeppe> fwereade: wtf is ureadahead?
<fwereade> rogpeppe, it speeds up boot stuff AIUI
<rogpeppe> lol
<fwereade> rogpeppe, I have *no* idea what is going on there or why it changed
<rogpeppe> fwereade: just looking at the ps output. what is whoopsie? http://paste.ubuntu.com/5721320/
<davecheney> jesus fuck people
<davecheney> can everyone stop asking about mongo/386
 * davecheney was referring to the mailing list
<davecheney> which trailed the IRC channel by 20 mins
<rogpeppe> fwereade: and the pstree output which makes it clearer http://paste.ubuntu.com/5721323/
<TheMue> lunchtime
<davecheney> whoopsie is the thing that catches any SIGSEGV's and sends a report to ubuntu
<davecheney> rogpeppe: what does /var/log/cloud-init-output.log say
<davecheney> i bet it couldn't find the tools
<rogpeppe> davecheney: i don't think it got that far
<davecheney> or possibly there was an error in bootstrapping, and your set -xe change caused bootstrapping to quit before running its full course
<fwereade> davecheney, that's just a wget, but it gets stuck just after installing mongo
<davecheney> rogpeppe: it installed mongo, it has done at least some of cloud init
<rogpeppe> davecheney: http://paste.ubuntu.com/5721330/
<rogpeppe> davecheney: it's "processing triggers for ureadahead"
<davecheney> sounds like a bug in raring
<rogpeppe> davecheney: indeed
<davecheney> i couldn't get it to install in a vm on tuesday
<dimitern> fwereade: danilos reports successful bootstrap and all on raring us-east-1
<fwereade> dimitern, normally I would be happy about that sort of news
<fwereade> dimitern, today I just WTF even harder
<davecheney> fwereade: different regions may have different versions of raring
<dimitern> fwereade: once my raring vbox finally installs, i'd be trying out some other regions
<fwereade> davecheney, ah, yes, maybe they haven't been updated anywhere else yet
<danilos> dimitern, fwereade: package version I am using: http://pastebin.ubuntu.com/5721342/
<fwereade> danilos, if it's still up, would you let me know the AMI you're running?
<danilos> dimitern, juju status: http://pastebin.ubuntu.com/5721346/
<danilos> fwereade, sure, looking
<dimitern> wallyworld_: will you be joining us on mumble?
<danilos> fwereade, ami-d0f89fb9
<fwereade> danilos, golly
<fwereade> danilos, I wonder how that one got there
<fwereade> danilos, ah, ok, that's bootstrapping into precise from raring
<davecheney> fwereade: each ami differs per region
<rogpeppe> fwereade, dimitern, davecheney: https://codereview.appspot.com/8851045
<fwereade> davecheney, yeah, I was surprised that it wasn't a raring AMI
<fwereade> davecheney, then I realised default-series
<davecheney> boom!
<rogpeppe> fwereade: it's nice that i can now trivially start a raring bootstrap instance
<rogpeppe> fwereade: that CL fixes those bugs with juju get BTW
<fwereade> rogpeppe, yeah, just a shame that it doesn't work ;p
<rogpeppe> fwereade: not our fault :-)
<rogpeppe> fwereade: apart from "we" is really canonical, so of course it's our fault...
<rogpeppe> danilos: as on call reviewer, could i ask for a review of https://codereview.appspot.com/8851045 please?
<rogpeppe> time for lunch
<dimitern> fwereade: so ap-southeast-1 gives the "use of closed network connection" R->P
<danilos> dimitern, fwereade: with ap-southeast-1 region it fails: http://pastebin.ubuntu.com/5721370/
<fwereade> dimitern, danilos: I think that is the chunked encoding business that tim found
<dimitern> fwereade: so how to go about fixing it?
<fwereade> dimitern, danilos: would one of you try hacking up goamz/s3/s3.go to ReadAll of the response before trying to decode the XML, and see whether that helps?
<fwereade> dimitern, danilos: it's in List IIRC
<dimitern> fwereade: I can take a look, but let me first reproduce it
<fwereade> rogpeppe, the raring bootstrap failure is resolved by dropping set -xe
<fwereade> rogpeppe, ie the cannot bootstrap *onto* raring, vs *from* raring that dimitern's poking at
<dimitern> fwereade: i.e. removing set -xe allows you to bootstrap to raring?
<fwereade> dimitern, *onto* not *from*
<fwereade> dimitern, yes
<rogpeppe> fwereade: that's odd - is the scope of that set -xe greater than the cloudinit scripts that we use?
<dimitern> fwereade: what's the use - there are no charms for raring?
<fwereade> rogpeppe, I reckon the ureadahead is connected
<fwereade> dimitern, charms don't tend to run on the bootstrap machine anyway
<rogpeppe> fwereade: i'm surprised that any of our scripts have run by that stage, including the set -xe
<rogpeppe> fwereade: i'd have expected to see some output
<rogpeppe> fwereade: from the initial mkdir at any rate
<fwereade> rogpeppe, yeah, it makes little sense
<rogpeppe> fwereade: i'm just having a look at the cloud-init sources
<fwereade> rogpeppe, bah, status output ordering has changed
<rogpeppe> fwereade: hmm, where's that an issue?
<fwereade> rogpeppe, seems a bit arbitrary, surprised my eye
<fwereade> rogpeppe, not saying it's significant to automatic consumers of that data
<rogpeppe> fwereade: ok. it's trivial to fix. do you want alphabetic ordering again?
<rogpeppe> fwereade: i hadn't realised it was an issue, sorry
<fwereade> rogpeppe, yeah, would be nice, was just starting to do it myself
<fwereade> rogpeppe, np, nor had I
<fwereade> rogpeppe, stick with what you're doing, I'll propose in a mo
<rogpeppe> fwereade: if you could do it, that would be great. please leave Err at the top. otherwise, just pipe the struct fields through sort, and the tests should remain identical
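[The alphabetical ordering rogpeppe describes, with Err pinned to the top, might look something like this minimal sketch; sortedKeys and the field names here are hypothetical illustrations, not the actual juju-core status code.]

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns field names alphabetically, except that "Err",
// when present, stays at the top of the output.
func sortedKeys(fields map[string]interface{}) []string {
	var keys []string
	for k := range fields {
		if k != "Err" {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	if _, ok := fields["Err"]; ok {
		keys = append([]string{"Err"}, keys...)
	}
	return keys
}

func main() {
	fields := map[string]interface{}{
		"DNSName":      "ec2-1-2-3-4.compute.amazonaws.com",
		"AgentVersion": "1.9.14",
		"Err":          nil,
	}
	fmt.Println(sortedKeys(fields)) // [Err AgentVersion DNSName]
}
```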
<dimitern> rogpeppe: reviewed
<rogpeppe> dimitern: ta!
<fwereade> rvba, ping
<rvba> fwereade: pong
<fwereade> rvba, I think I just answered my own question actually but it could probably use some discussion
<TheMue> rogpeppe: and another review
<rogpeppe> TheMue: thanks
<fwereade> rvba, sorry, trying to marshal my thoughts
<rogpeppe> fwereade: i'd like your input too if that's ok.
<fwereade> rogpeppe, sorry, on which?
<rogpeppe> fwereade: on https://codereview.appspot.com/8851045
<rogpeppe> fwereade: 'cos it's a last-minute change that may well be crackful :-)
<fwereade> rog, fuck, never sent my comment
<rogpeppe> fwereade: ah, np
<fwereade> rogpeppe, I think we always want to output the actual value
<rogpeppe> fwereade: we always do, i think
<rogpeppe> fwereade: or do you mean that it should print nil when it's unset?
<fwereade> rogpeppe, it should print the actual value
<fwereade> rogpeppe, whatever the default happens to be, or maybe nil if there's no default
<rogpeppe> fwereade: doesn't it do that?
<fwereade> + "outlook": map[string]interface{}{
<fwereade> + "description": "No default outlook.",
<fwereade> + "type": "string",
<fwereade> + "default": true,
<fwereade> + },
<rogpeppe> fwereade: in that case there is no default
<rogpeppe> fwereade: i thought that omission was better than saying "nil" explicitly
<rogpeppe> fwereade: then the value is *always* of the correct type
<fwereade> rogpeppe, I thought we were aiming for compatibility
<fwereade> rogpeppe, python always outputs a value, I think
 * rogpeppe looks back at the python code
<TheMue> fwereade: so "None" would be the correct value?
<fwereade> TheMue, well, nil I think
<TheMue> fwereade: nil in Py is None, isn't it?
<rogpeppe> fwereade: ok; i don't like it, but i accept the compatibility argument.
<wallyworld_> fwereade: dimitern: i've finally finished the openstack constraints work. i got bitten badly by a stupid Go gotcha regarding for loop variables; took me ages to find the cause of my test failures but it's finally done.
<wallyworld_> https://codereview.appspot.com/8816045
<fwereade> TheMue, ah sorry I thought you meant the string "None"
<dimitern> wallyworld_: awesome
<fwereade> wallyworld_, excellent
<rogpeppe> wallyworld_: using a loop variable in a closure, by any chance? :-)
<TheMue> fwereade: ok, nil or None, how is it represented in yaml?
<wallyworld_> rogpeppe: i was assigning the address of the for loop variable to another variable
<fwereade> TheMue, IIRC it's nil
<TheMue> fwereade: I thought it would be the string None
<fwereade> TheMue, sorry, "null"
<rogpeppe> wallyworld_: ah yes. in a for range, presumably. i argued strongly that it should be in its own scope, but failed to persuade.
<TheMue> fwereade: yep, just found it on yaml.org, null
<wallyworld_> rogpeppe: yes, that is it. i am disappointed that Go behaves like that. it is so unintuitive and no other language suffers from that
<rogpeppe> wallyworld_: C behaves like that
<rogpeppe> wallyworld_: and C++
<dimitern> wallyworld_: have you live tested this on both ec2 and canonistack or hp?
<TheMue> rogpeppe: :D
<wallyworld_> rogpeppe: hmm. i've never been bitten by the issue in those languages
<wallyworld_> dimitern: not yet. i wanted feedback in parallel with that testing. it works with the doubles etc.
<rogpeppe> wallyworld_: that's because one generally doesn't take the address of local variables. but i've been bitten by that kind of thing many times in C.
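[The gotcha wallyworld_ hit boils down to the range variable being one reused location, so every stored address ends up pointing at the last element; copying into a fresh variable inside the loop body fixes it. A standalone sketch with hypothetical names, not juju-core code. Note that Go 1.22 later made range variables per-iteration, so only the fixed variant's behaviour is stable across versions.]

```go
package main

import "fmt"

// collectBuggy stores the address of the range variable itself; on
// the Go of this era (pre-1.22) every pointer ends up aimed at the
// same reused location, i.e. at the final element.
func collectBuggy(nums []int) []*int {
	var ptrs []*int
	for _, n := range nums {
		ptrs = append(ptrs, &n)
	}
	return ptrs
}

// collectFixed copies into a variable scoped to the loop body, so
// each iteration gets its own location.
func collectFixed(nums []int) []*int {
	var ptrs []*int
	for _, n := range nums {
		n := n // fresh variable per iteration
		ptrs = append(ptrs, &n)
	}
	return ptrs
}

func main() {
	for _, p := range collectFixed([]int{1, 2, 3}) {
		fmt.Print(*p, " ") // 1 2 3
	}
	fmt.Println()
}
```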
<wallyworld_> fwereade: dimitern: you guys are most familiar with the required logic, so if you could look closely that would be great. not straight away, but at your convenience
<dimitern> wallyworld_: sure, i'll look into it
<fwereade> wallyworld_, cheers
<wallyworld_> dimitern: there still needs to be a followup branch to rework some of the default image id stuff used in the live tests. but this branch is waaaaaay big enough already
<dimitern> wallyworld_: what would be the nett gains from the follow-up?
<danilos> wallyworld_, hey, you should be sleeping or drinking right now, not putting out huge branches while I am OCR! :)
<dimitern> wallyworld_: considering the release is nigh, etc.
<wallyworld_> danilos: sorry, i got back from soccer and really want to get this stuff done
<wallyworld_> dimitern: the followup branch simply removes the need to specify default instance type and image id for the live tests
<danilos> wallyworld_, no worries, it's going to be an interesting exercise for me, I am sure others will review it much faster though :)
<wallyworld_> dimitern:  i thought i had already missed the deadline
<dimitern> wallyworld_: ok, so istm we can postpone the follow-up post release on monday?
<wallyworld_> danilos: it's a very big branch sorry. but a lot is deleted and/or moved code
<dimitern> wallyworld_: the abs deadline is now eod monday
<danilos> wallyworld_, yeah, I can see that (and you said as much in the MP)
<wallyworld_> dimitern: yes, we can postpone, although the followup will just be test changes
<dimitern> wallyworld_: sweet
<wallyworld_> dimitern: hopefully when i do the live tests with this everything will work ok, otherwise i'll need to tweak a bit
<rogpeppe> fwereade: PTAL https://codereview.appspot.com/8851045/
<wallyworld_> danilos: the idea is that the logic used to live in ec2, but should be common to ec2 and openstack etc. the only ec2 and openstack specific bits is the logic to select what instance types to consider and where the image metadata comes from
<rogpeppe> wallyworld_: i'm wondering if the moved logic would work better in its own package
<TheMue> gna, bootstrap and later destroy work, but status and scp of any log don't (local: quantal / remote: raring)
<rogpeppe> wallyworld_: (this is only after a tiny peek BTW)
<rogpeppe> wallyworld_: perhaps environs/instances ?
<wallyworld_> rogpeppe: i'd have no objection to that
<fwereade> rogpeppe, LGTM with one tedious request
<rogpeppe> wallyworld_: i'd just like to try to avoid cluttering environs with lots of logic that isn't truly universal to all providers
<wallyworld_> sure, np
<rogpeppe> fwereade: ooo kkkk
<fwereade> TheMue, when you say "bootstrap works", did you log in and check for a running agent?
<fwereade> rogpeppe, wallyworld_: +1 on environs/instances
<allenap> jam: That was a useful explanation, thank you :)
<rogpeppe> fwereade, wallyworld_: actually, how about environs/instance and environs/image ?
<wallyworld_> dimitern: fwereade: danilos: one thing i forgot to add but will do so before landing is logging when the fallback instance choosing logic is invoked so the user knows their chosen instance type is not being used, but a "best guess" is
<rogpeppe> fwereade: what's your objection to "  if s, ok := serviceCfg[k]; ok {" BTW?
<dimitern> wallyworld_: oh yeah, sgtm, thanks
<fwereade> rogpeppe, that was danilos
<rogpeppe> fwereade: oh yeah
<rogpeppe> fwereade: sorry
<fwereade> rogpeppe, np
<wallyworld_> rogpeppe: i think that's a bit too far? the logic is conceptually about choosing an instance to bootstrap so it sort of all belongs together
<wallyworld_> s/bootstrap/run/
<rogpeppe> wallyworld_: ok, seems reasonable. i'd keep it singular though probably, though YMMV
 * wallyworld_ goes to get an alcoholic drink :-) or three
<dimitern> wallyworld_: have phun :)
<TheMue> fwereade: will do
<fwereade> rogpeppe, dimitern: https://codereview.appspot.com/8834047 (trivial probably)
<danilos> fwereade, is that just sorting entries in a struct?
<rogpeppe> fwereade: LGTM
<dimitern> fwereade: me too
<rogpeppe> fwereade: trivial
<fwereade> danilos, yep
<fwereade> cheers guys
<danilos> fwereade, I was going to LGTM, but I am too late I suppose :)
<danilos> anyway all, I am OCR, so feel free to ping me for reviews
<fwereade> danilos, if you do it quickly you'll beat the submit ;p
<danilos> fwereade, heh, nah, you've got 3 already, that's plenty enough ;)
<rogpeppe> fwereade: i think i'll just delete that "breaks compatibility" comment
<dimitern> danilos: when you're getting into the code and all is new, don't hesitate to review even stuff that has 2 LGTMs; asking questions always helps
<rogpeppe> fwereade: i agree with danilos' remark
<danilos> dimitern, sure thing
<fwereade> rogpeppe, sorry, which?
<rogpeppe> fwereade: the one:
<rogpeppe> 			// This breaks compatibility with py/juju, which will set
<rogpeppe> 			// default to whether the value matches, not whether
<rogpeppe> 			// it is set in the service confguration.
<fwereade> rogpeppe, that we ought to collect that stuff for the release notes? yeah :/
<rogpeppe> fwereade: that is true
<rogpeppe> fwereade: i suppose i should email dave
<fwereade> rogpeppe, stick it in a Done card maybe
<fwereade> dimitern, any interesting results with ReadAll?
<dimitern> fwereade: not really
<dimitern> fwereade: my raring vm is misbehaving still
<fwereade> dimitern, so ReadAll gets a closed connection?
<fwereade> dimitern, bah, ok
<fwereade> danilos, are you set up on raring atm?
<danilos> fwereade, yeah
<dimitern> fwereade: i'll let you know if i make a breakthrough
<fwereade> danilos, ok, can I ask you to investigate the public-tools issue please?
<fwereade> danilos, you'll want to try bootstrapping without --upload-tools
<fwereade> danilos, and observing a "cannot find tools: connection closed" or something
<danilos> fwereade, sure, on any region specifically?
<fwereade> danilos, have you seen those at all?
<danilos> fwereade, seen it with ap-southeast-1, not with the default us-east-1
<fwereade> danilos, or have you just not been bootstrapping without --upload-tools?
<danilos> fwereade, I haven't tried with --upload-tools, no
<fwereade> danilos, ok, cool, I seem to see it all the time in ap-southeast-2, but wherever you can repro it reliably
<fwereade> danilos, we don't want --upload-tools here I think
<danilos> fwereade, right, understood
<fwereade> danilos, if you look in goamz/s3/s3.go
<danilos> fwereade, should I use 1.9.14 package or trunk?
<fwereade> danilos, in the List method IIRC
<fwereade> danilos, trunk please
<fwereade> danilos, not worried about actually bootstrapping successfully, just listing the tools ok
<fwereade> danilos, there's a line with xml.NewDecoder(hresp.Body) or something
<fwereade> danilos, try to ReadAll the  body into a buffer and see whether we can decode that ok
<rogpeppe> fwereade: FWIW the next lines after "Processing triggers for ureadahead ..." are:
<rogpeppe> Setting up multiarch-support (2.17-0ubuntu5) ...
<rogpeppe> (Reading database ... 52136 files and directories currently installed.)
<danilos> fwereade, yeah, that's in S3.run()
<fwereade> rogpeppe, sorry, ECONTEXT -- this is bootstrapping into raring from tip?
<fwereade> rogpeppe, maybe I just never gave the ureadahead trigger long enough?
<rogpeppe> fwereade: into raring from precise
<rogpeppe> fwereade: you did - mine is still there.
<rogpeppe> fwereade: after some hours
<fwereade> rogpeppe, yeah, I thought I'd given it long enough... but I never saw anything after the ureadahead triggers
<fwereade> rogpeppe, am I blithering?
<rogpeppe> fwereade: only if i am
<rogpeppe> fwereade: so probably :-)
<fwereade> rogpeppe, ISTM that dropping `set -xe` makes it all work, can you confirm/deny?
<rogpeppe> fwereade: i will try
<rogpeppe> fwereade: submitting the config get branch first
<fwereade> rogpeppe, go for it
<rogpeppe> fwereade: done
<fwereade> rogpeppe, ok, hum, now it seems not to work any more
<fwereade> rogpeppe, which is sort of good, because it was obviously an insane "fix"
<rogpeppe> fwereade: yup. although i could kind of imagine a way in which it might possibly have been a fix in some moderately insane kind of way
<fwereade> rogpeppe, but... now I don't know at all what is going on :(
<rogpeppe> fwereade: it really looks like a raring problem
<rogpeppe> fwereade: you know, i think cloud-init isn't hung up there - i think it's probably finished
<rogpeppe> fwereade: maybe the runcmd thing doesn't work in raring
<rogpeppe> fwereade: but... surely cloud-init can't be broken that badly?
<fwereade> rogpeppe, hmm, that is surprising to me
<fwereade> rvba, ok, I know my question now
<rogpeppe> fwereade: i can't currently see anything in ps alxw that says "python" or "cloud"
<rogpeppe> fwereade: and the final line in cloudinit.log is "Apr 19 10:58:16 ip-10-4-50-223 [CLOUDINIT] cloud-init[DEBUG]: Ran 18 modules with 0 failures"
<fwereade> rvba, once the node is acquired, how do we find out what arch it has?
<danilos> fwereade, does log.Printf have a length limit?
<fwereade> danilos, not that I am *aware* of, but...
<fwereade> rogpeppe, huh, that is most upsetting
<danilos> fwereade, it's cut-off XML that I get
<danilos> fwereade, can't pastebin it since it thinks it's PHP or other web scripts
<fwereade> danilos, bah
<fwereade> danilos, this works: http://paste.ubuntu.com/5721579/ ..?
<rogpeppe> fwereade: the set -xe can't be anything to do with it - the scripts really are in their own #!/bin/sh file.
<danilos> fwereade, http://people.canonical.com/~danilo/list-xml.txt
<fwereade> danilos, cheers
<danilos> fwereade, in general it does, but pastebin heuristic is probably bad here ;)
<danilos> fwereade, I'll try this with a region I had it working with just to make sure it's not a Printf problem
<rogpeppe> hmm 7882 bytes; i wonder if that's an 8K block with a 310 byte html header or something
<danilos> fwereade, without the region set it works, but the output is much shorter: http://pastebin.ubuntu.com/5721586/
<fwereade> rogpeppe, are you aware of any magic necessary to deal with chunked transfer?
<danilos> doing any of this from gdb is not very useful it seems :/
<rogpeppe> fwereade: there is definitely magic in that area, but whether it's relevant here i dunno
<danilos> fwereade, I'll take a peek at Content-Length as well
<rogpeppe> danilos: that's actually longer (8138 bytes)
<fwereade> danilos, tyvm
<danilos> rogpeppe, is it? it seemed shorter ;)
<rogpeppe> danilos: istm that a truncation at 8192 might be possible
<rogpeppe> danilos: wc is your friend :-)
<danilos> rogpeppe, that would require me to get out of gdb (which is not a bad idea, considering how "useful" it is :))
 * rogpeppe doesn't like gdb much
<rogpeppe> danilos: does this only happen with the released binary?
<danilos> rogpeppe, nope, I am using trunk now
<danilos> rogpeppe, and region ap-southeast-1
<rogpeppe> danilos: and you can reproduce the issue? fantastic.
<danilos> rogpeppe, yeah, it seems so
<rogpeppe> danilos: i'm just trying myself, from the raring instance that failed to bootstrap correctly :-)
<danilos> rogpeppe, fwereade: ContentLength is -1 fwiw (indicates "unknown" according to http://godoc.org/net/http#Response)
<danilos> rogpeppe, cool
<rvba> fwereade: something like this should work: http://paste.ubuntu.com/5721626/
<rogpeppe> blasted goyaml requires gcc :-)
<fwereade> rvba, ok, cool -- and are the possible values "amd64", "i386", "arm"?
<fwereade> rvba, or should there be a translation layer?
 * rogpeppe finally has a juju binary built on raring
<rvba> fwereade: "i386" / "amd64" / "armhf/highbank"
<rvba> fwereade: wait, no: 'i386/generic' / 'amd64/generic' / 'armhf/highbank'
<fwereade> rvba, this is I guess the point at which I need to start understanding something about arm ;p
<fwereade> rvba, anyway the thing on my mind is the arbitrary tools choice
<fwereade> rvba, I guess you only have amd64 machines available currently?
<rvba> amd64 and arm machines actually.
<rvba> (in the MAAS lab)
<fwereade> rvba, hmm, how is it that we never accidentally pick an arm machine?
<rvba> fwereade: when I test things in the lab (with Go juju), I disable the arm nodes.
<fwereade> rvba, (the issue is just that acquireNode doesn't pay attention to possibleTools, and it ought to be constraining the arch of the machine chosen to one we have tools for)
<fwereade> rvba, I guess we can always just loop over acquires until we get one we have tools for
<fwereade> rvba, but that feels a bit crap
<rvba> It does.
<rogpeppe> danilos: right, i've replicated the same problem
<fwereade> rvba, is there any way to say "a machine with one of these architectures"?
<fwereade> rvba, because possibleTools.Arches() will give you the necessary input for that
<rvba> fwereade: that's already what the constraints do IIRC.
<rvba> Oh, you mean one of these archs as opposed to just this arch right?
<fwereade> rvba, yeah
<rvba> let me check…
<fwereade> rvba, actually maybe we don't want exactly that
<danilos> rogpeppe, it seems to fail at reading "https://s3.amazonaws.com/juju-dist/?delimiter=&prefix=tools%2Fjuju-&marker=" for me, which comes in just fine in a browser
<rvba> fwereade: this would require a change in MAAS.  Right now, it expects one or zero value for the architecture constraint.
<rogpeppe> danilos: i'm just trying to see if i get the same failure when compiling against go tip
<fwereade> rvba, ok I think I know what we should do then
<fwereade> rvba, of the arches from possibleTools (which have already taken constraints into account), sort by preference, and construct a fresh constraints for each arch
<fwereade> rvba, if we can't acquire a node matching the first arch, try the others in order before giving up
<fwereade> rvba, sane?
<rvba> fwereade: sounds sensible
<fwereade> oh blast meeting
<mramm> https://plus.google.com/hangouts/_/539f4239bf2fd8f454b789d64cd7307166bc9083
<danilos> rogpeppe, now I am getting the same even without the region set: http://pastebin.ubuntu.com/5721653/ (I am not sure if it's related to reusing control-buckets or not since that's in the URLs it tries to get a list off)
<rvba> fwereade: changing the maas side to accept a list of arch constraints is simpler though, and completely backward compatible.
<rogpeppe> fwereade: was just kicked off
<fwereade> rvba, will have to think about that, not sure if there's some sort of preference ordering we can/should assume
<danilos> rogpeppe, fwiw, I am printing URLs it sends requests to in there
<rogpeppe> danilos: in a call currently
<danilos> rogpeppe, sure, I suppose you won't need me anymore if you can reproduce it yourself anyway ;)
<rogpeppe> danilos: you may well find a more promising line of enquiry
<danilos> rogpeppe, oh, this was probably failing because I consumed the entire response body with ReadAll()
<rogpeppe> danilos: sorry, what was probably failing?
<danilos> rogpeppe, never mind, brain blip
<danilos> rogpeppe, fwereade: however, reusing the same HTTP connection (re-enabling keep-alive by commenting out "Close:true" in the request) made it work for me with ap-southeast-1, or at least not fail in the same spot
<danilos> rogpeppe, fwereade: so it seems that amazon decides to kill off connections on some zones earlier than on others
<hazmat> so what i was saying is that juju doesn't specify a key pair when launching an instance
<hazmat> an aws key pair that is
<danilos> rogpeppe, fwereade: yeah, this patch was sufficient to get latest juju core to work even with ap-southeast-1 for me: http://pastebin.ubuntu.com/5721681/; juju status at http://pastebin.ubuntu.com/5721680/
<fwereade> danilos, yay!
<danilos> fwereade, rogpeppe: should I leave that with you guys since I am not sure what I could be breaking by switching to keep-alive HTTP? :)
<dimitern> danilos: awesome!
<rogpeppe> danilos: good work
<mramm> rogpeppe: can you land that one?
<fwereade> danilos, I'm not sure we know either, but I think rogpeppe has access to goamz
<rogpeppe> dimitern: i want to find out *why* it makes that difference
<mramm> rogpeppe: I think you have goamz commit
<fwereade> rogpeppe, ofc
<danilos> rogpeppe, I assume it's amazon deciding to kill off if you do 10-15 HTTP requests in separate connections in space of 1-2s
<danilos> rogpeppe, perhaps a firewall setting on their side to defend against DoS or bad API clients or similar...
<rogpeppe> danilos: i suppose so; seems weird. why from raring only?
<danilos> rogpeppe, good point, I don't know :)
<fwereade> rogpeppe, not from raring only I think actually
<rogpeppe> danilos: no? it works ok from precise
<fwereade> rogpeppe, it looked for a while as if it were
<danilos> rogpeppe, fwereade: want me to try something else before I destroy-environment?
<fwereade> rogpeppe, not always and not for everybody
<rogpeppe> fwereade: the s3 package has, i think, had quite a lot of use.
<rogpeppe> fwereade: ah!
<fwereade> rogpeppe, but our public-bucket has only recently started bumping up against 8k of tools info
<rogpeppe> i think it's probably an old go bug
<rogpeppe> fixed in tip
<fwereade> rogpeppe, that sounds encouraging
<rogpeppe> because i just compiled against go 1.1 beta and it works
<fwereade> rogpeppe, that is less encouraging because I have no desire whatsoever to switch language version
<fwereade> feck
<rogpeppe> i just successfully kicked off an instance from a raring-built juju exe
<rogpeppe> fwereade: yeah i know
<fwereade> rogpeppe, dimitern, mramm: what's the worst thing that could happen if we just trash everything in the bucket older than, say, 1.9.10?
<rogpeppe> fwereade: kittens die?
<rogpeppe> fwereade: do it!
<fwereade> rogpeppe, dimitern, mramm: if it is, as it seems it may be, size-related, that would give us some breathing room
<fwereade> rogpeppe, shit, I don't think I have keys for that bucket
<rogpeppe> fwereade: i think i might have
<TheMue> strange, even ssh-keygen for the host doesn't help
<dimitern> fwereade: go for it
<fwereade> TheMue, which authorized-keys are you using?
<rogpeppe> fwereade: i just PM'd you some that might work
<mramm> I'm fine with trashing old tools now
<TheMue> fwereade: i've done a ssh-keygen -R ec2-... for the dns name
<mramm> as long as fwereade thinks it makes sense ;)
<rogpeppe> TheMue: this is what i do to ssh in to an instance: ssh -i $home/.ec2/rog.pem ubuntu@$1
<fwereade> mramm, I *think* it does, but I am drawing a blank on what fricking tools I should be using
<dimitern> so bootstrapping from Q to R gives me the cloud-init-output.log up to "processing triggers for ureadahead"
<fwereade> dimitern, and apparently no scripts run, right?
<mramm> it would be good to make copies of the older tools somewhere before deleting them all
<dimitern> fwereade: how can I tell - mongo seems installed and running
<mramm> that said, I *doubt* we've hit the maximum size limit on s3
<fwereade> dimitern, anything starting with "juju" in /etc/init?
<dimitern> fwereade: but status is failing with "2013/04/19 16:37:05 ERROR state: connection failed, paused for 2s: dial tcp 54.216.30.85:37017: connection refused"
<fwereade> mramm, just an 8k chunk size for that particular response
<fwereade> dimitern, we shouldn't be starting that mongo at all actually
<mramm> ahh
<fwereade> dimitern, it should be "juju-db" or something
<dimitern> fwereade: no juju* in /etc/init/
<dimitern> fwereade: wanna see the full c-i-o.log?
<TheMue> dimitern: that's what i got too when doing juju status
<fwereade> dimitern, I've seen it plenty of times
<fwereade> dimitern, TheMue: there is no controversy about what happens, we just want to figure out why ;p
<dimitern> fwereade: what should it look like when it works?
<mramm> fwereade: I'm fine with moving the old ones out, and testing to see if that makes a difference
<dimitern> fwereade: I mean, what fails to run exactly?
<fwereade> dimitern, AFAICT, none of the scripts we set ourselves
<fwereade> dimitern, it's just the packages
<fwereade> dimitern, hmm, maybe poke around in the actual userdata from the metadata service
<fwereade> dimitern, just to verify that we do have sane input data
<fwereade> dimitern, not that I really doubt it
<fwereade> dimitern, the stuff we want to run is all in environs/cloudinit/cloudinit.go
<dimitern> fwereade: i'll look
<fwereade> dimitern, there's an addScripts func that's called about 100 times :)
<hazmat> dimitern, ec2metadata is installed for looking at the raw metadata service
<hazmat> also in /var/lib/cloud/instance/user-data.txt
<fwereade> dimitern, if the userdata has stuff that looks familiar from there, we can start getting serious about blaming cloudinit ;p
<hazmat> dimitern, it's not particularly important.. but because juju doesn't specify an amz keypair name, it's not actually installed. amz does not install a default key pair on instances unless one is specified. it's cloudinit running and dropping in the key that's working. it can be seen
<hazmat> fwereade, unlikely
 * hazmat is trying it out on raring too
<fwereade> hazmat, well, indeed, so hopefully we *will* be finding that we have fucked up our input data somehow
<hazmat> fwereade, what was the fix for 2013/04/19 07:43:31 ERROR command failed: use of closed network connection
<hazmat> just apply the pastebin patch?
<fwereade> hazmat, I think so -- but I haven't actually verified that one myself
<dimitern> there are 2 files in /var/lib/cloud/instance with similar names: user-data.txt (9616) and user-data.txt.i (60748)
<hazmat> i'm not able to bootstrap on ec2 atm  because of it..
<hazmat> dimitern, the first one
<hazmat> oh
<hazmat> its compressed with juju-core
<dimitern> hazmat: what's the .i one?
<dimitern> hazmat: it looks weird - like a mail message dump
<dimitern> there's a bug reported about raring ignoring runcmd in cloudinit: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1158724
<hazmat> ouch
<rogpeppe> rofl
<rogpeppe> that might just have something to do with it
<fwereade> well, fuck me
<dimitern> and several others related: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1103881 (marked as dupe of the one above)
<rogpeppe> dimitern: nice one
<dimitern> so raring's cloudinit's fucked then
<hazmat> dimitern, thats not correct
<hazmat> dimitern, you can see it trying to run in the cloudinit output
<hazmat> its getting confused by the cert stuff
<hazmat> cloud init output .. http://paste.ubuntu.com/5721757/
<rogpeppe> hazmat: it doesn't get that far for me
<hazmat> rogpeppe, this is with fresh trunk
<dimitern> hazmat: in my case it never got that far
<hazmat> and upload-tools
<hazmat> without upload-tools i can't bootstrap because of the closed network conn error
<rogpeppe> hazmat: it stops just after line 176 on your paste there
<dimitern> hazmat: try danilos's patch above
<rogpeppe> hazmat: with a "Processing triggers for ureadahead ..." line
<dimitern> rogpeppe: exactly like here
<rogpeppe> hazmat: the "closed network conn" error is, i'm pretty sure, a go1.0.2 bug. it's fixed in tip. it may even be fixed in 1.0.3 - i'll just try that.
<danilos> rogpeppe, fwereade: fwiw, I've tested go 1.0.3 from https://launchpad.net/~gophers/+archive/go/+build/3851809 and based on test case from http://code.google.com/p/go/issues/detail?id=4704 I've created my own at http://pastebin.ubuntu.com/5721759/ which consistently fails with raring go 1.0.2 and succeeds with this 1.0.3 from niemeyer's PPA
<danilos> rogpeppe, just as you were saying that :)
<hazmat> we should really have 1.0.3 in raring..
<rogpeppe> hazmat: +100
<rogpeppe> danilos: ok, so it was fixed in 1.0.3, the actual go release
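[A minimal reproduction sketch of the failure mode being chased, assuming (per the discussion) it bites on chunked responses past the 8K mark: stream a chunked body well beyond 8 KiB and check every byte arrives. On a fixed toolchain this reports the full length; the patched Ubuntu go 1.0.2 package truncated it. The helper name is invented for illustration; danilos's pastebin is the authoritative test case.]

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// readChunked serves n KiB as a chunked response (flushing after each
// write forces chunked transfer encoding, since no Content-Length is
// known) and returns how many bytes the client managed to read.
func readChunked(n int) (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		f := w.(http.Flusher)
		chunk := make([]byte, 1024)
		for i := 0; i < n; i++ {
			w.Write(chunk)
			f.Flush()
		}
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	return len(data), err
}

func main() {
	n, err := readChunked(16) // 16 KiB, well past the 8 KiB boundary
	fmt.Println(n, err)
}
```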
<fwereade> aw hell, 1.0.2 -> 1.0.3 is a theoretically unscary sort of change I guess
<dimitern> fwereade: right now before the release of raring? :)
<fwereade> but there is always the famous difference between theory and practice
<rogpeppe> fwereade: the only problem is 1.0.3 does actually have a bug that affects http retries
<hazmat> Daviey, arosales, can you have someone push golang 1.0.3 for raring?
<rogpeppe> fwereade: i don't *think* it affects us, but niemeyer knows much more
<fwereade> rogpeppe, where does that hit us?
<fwereade> rogpeppe, ah ok
<hazmat> rogpeppe, so both versions are busted?
<rogpeppe> hazmat: there have been shitloads of bugs fixed since 1.0.3
<hazmat> rogpeppe, and no 1.0.4 in sight?
<mramm> but only in trunk :/
<mramm> 1.1 is coming
<mramm> soon and very soon
<rogpeppe> hazmat: 1.1 is on feature freeze now
<fwereade> TheMue, any luck so far?
<niemeyer> danilos, rogpeppe: Probably related to issue 4914
<TheMue> fwereade: just trying from a different machine
<mramm> so for this one, can we just add the workaround?
<niemeyer> It's not about 1.0.2 vs 1.0.3.. it's about a patch someone cowboyed on the Debian package
<rogpeppe> niemeyer: ah
<mramm> changing go versions at this point will be counterproductive I think
<fwereade> mramm, agreed
<hazmat> arosales, Daviey: pls ignore - 1.0.3 is apparently broken and we have workarounds for 1.0.2
<niemeyer> and AFAIK it's still there, despite me trying to find a new maintainer for the package on the ML
<hazmat> niemeyer, oh.. can we yank that patch
<mramm> niemeyer: that is no good
<niemeyer> if there are any proud Debian package maintainers around, that'd be a great time :)
<niemeyer> hazmat: We should really update to 1.0.3 instead
<rogpeppe> niemeyer: there's that problem with 1.0.3 that you encountered. is that going to a problem for us?
<niemeyer> rogpeppe: That was about rietveld, IIRC
<niemeyer> rogpeppe: We don't have to build lbox with 1.0.3
<rogpeppe> niemeyer: that's what i thought
<rogpeppe> niemeyer: but you might have used similar techniques in goamz, i thought
<niemeyer> rogpeppe: I don't *think* so..
<rogpeppe> niemeyer: if you haven't then we're all good :-)
<mramm> what was the bug though?   Have we tested on 1.0.3?  Are we going to hit it somewhere else?
<niemeyer> rogpeppe: The missing feature in 1.0.3 is the ability to break redirections
<niemeyer> rogpeppe: Which we use with Rietveld to catch a cookie in-flight
<dimitern> niemeyer: yeah, that's why I had to patch lpad for lbox to work for me on 1.0.3
<niemeyer> rogpeppe: 1.0.3 bogusly broke the ability to do that with the http package
<rogpeppe> niemeyer: yeah, i had vague recollections of that
<fwereade> rogpeppe, mramm, dimitern, hazmat: I can confirm that cloudinit does the right thing if we switch off apt upgrade :/
 * rogpeppe wishes niemeyer had pushed harder at the time for a patch to 1.0.3
<dimitern> fwereade: i'll try
<niemeyer> rogpeppe: We survived fine, though
<rogpeppe> fwereade: do we need apt upgrade?
<rogpeppe> niemeyer: true. it's been itchy at times though
<fwereade> rogpeppe, well, it seems in general like a sensible thing to do
<dimitern> fwereade: so just comment out this line: 	c.SetAptUpgrade(true)
<fwereade> dimitern, that's all I did
<fwereade> rogpeppe, maybe it is less important just after a series has been released though ;p
<niemeyer> mramm,hazmat: It's worse than that..
<niemeyer> There's nothing to import from Debian either
<hazmat> fwereade,  interesting.. for some reason it works for me.. in terms of getting to runcmd (us-east-1, trunk w/ upload-tools) http://paste.ubuntu.com/5721791/
<hazmat> one odd thing in that cloud-init .. its adding in the experimental ppa
<fwereade> hazmat, ISTM you're bootstrapping into precise
<hazmat> maybe that's  for mongodb
<dimitern> fwereade: i can confirm it works for me with that line commented out
<hazmat> fwereade, doh.. if that's the default then yes
<fwereade> hazmat, which it will if you don't specify otherwise -- that's what default-series now defaults to
<mramm> right
<hazmat> i thought the default was the client series
<mramm> awesome
<fwereade> hazmat, it was, but 99% of charms are for precise
<fwereade> hazmat, the ideal would maybe be to separate bootstrap-series from default-series
<fwereade> hazmat, but in practice people who want to `juju deploy wordpress` want precise as a default
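For anyone following along, the setting under discussion is `default-series` in environments.yaml; a minimal illustration (the environment name is made up for the example):

```
environments:
  sample:
    type: ec2
    # used for the bootstrap machine and as the fallback charm series;
    # since this release it defaults to precise (the LTS), not the
    # series the client machine happens to run
    default-series: precise
```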
<rogpeppe> now that we've isolated the issues, i'm going for the rest of my lunch break :-)
<fwereade> rogpeppe, enjoy
<fwereade> rogpeppe, tyvm
<mramm> fwereade: but I think this is fine
* ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: - | Bugs: 2 Critical, 61 High - https://bugs.launchpad.net/juju-core/
<dimitern> fwereade: we can add a script calling apt-get upgrade at the end maybe?
<hazmat> fwereade, makes sense.. sorry for the confusion
<fwereade> dimitern, ha, yes, we could -- that's nice
<mramm> and I think it would even be fine to *always* start the bootstrap machine on the LTS
<dimitern> fwereade: i'll try it out and if it works will propose it
<mramm> dimitern: sounds like a good move
<mramm> niemeyer: do you know who has keys to the public tools bucket in amazon?
<fwereade> mramm, I think I would prefer to stick with the existing behaviour if we can get it working right
<niemeyer> mramm: I was hoping that only David would, but I think he gave it to someone else as well
<mramm> fwereade: agreed
<niemeyer> mramm: i have them as well, obviously, as I created the bucket
<niemeyer> mramm: I mean, not his keys, but access to the bucket
<mramm> the underlying upgrade issue is: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1124384
<hazmat> that's a dup of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1103881
<mramm> niemeyer: can you share permissions on the bucket to fwereade?
<dimitern> so adding "apt-get upgrade -y" at the end of scripts seems to work
<fwereade> dimitern, I don't think it should be at the end really, better to have it at the beginning
<dimitern> fwereade: the main problem is upstart, if we upgrade it too early, the same issue will happen
<dimitern> fwereade: and i don't think it matters when, all relevant stuff will be restarted anyway after
<fwereade> dimitern, unless it's still running while a charm tries to use apt itself
<fwereade> dimitern, academic for now but a potential landmine all the same
<dimitern> fwereade: I can not do it at all, if you think it's riskier to have it
<dimitern> fwereade: just leave the commented out part, but remember this will affect all series, not just raring
<fwereade> dimitern, I think we should definitely be special-casing it for raring
<fwereade> dimitern, if cfg.Tools.Series == "raring"
<arosales> hazmat, ack
<dimitern> fwereade: ah, good point, will do
<fwereade> dimitern, and in other cases just do a normal SetAptUpgrade
<dimitern> fwereade: alas cfg.Series is unknown
<fwereade> dimitern, look up
<fwereade> dimitern, cfg.Tools.Series
<dimitern> fwereade: there's cfg.Tools.Series instead - should I use that?
<dimitern> fwereade: ok
<fwereade> dimitern, that's what I said to begin with ;)
<dimitern> fwereade: sorry :)
<fwereade> no worries :)
<dimitern> fwereade: switching too fast between sessions
<fwereade> dimitern, I know the feeling ;P
 * danilos is off for the day and week, enjoy it everyone
<dimitern> danilos: happy weekend! and thanks for debugging!
<fwereade> danilos, enjoy, and thank you very much
<danilos> cheers
<TheMue> fwereade: just a short intermediate update, it looks like i've got a ssh config problem. :( i can only reach one of my two private hosts, not the other. have to look why :/
<fwereade> dimitern, ok, so, doing it at the end is probably ok
<TheMue> fwereade: so it's no wonder i can't reach any ec2 host
<dimitern> fwereade: i already did it in the beginning and testing now
<fwereade> dimitern, well if that works I think it'd be best
<fwereade> dimitern, but you made a good point ;)
<dimitern> fwereade: cheers
<fwereade> TheMue, which authorized-keys are you using?
<fwereade> TheMue, surely you can usually ssh to your machines, right?
<dimitern> it'll be a bitch to test it - i have to clone a bunch of cloudinit outputs and force series to raring just for these 2 lines of code
<fwereade> dimitern, shouldn't be *too* bad though
<TheMue> fwereade: to my machine?
<TheMue> fwereade: ah, wrong read
<fwereade> TheMue, when you run juju, can you usually `juju ssh 0`?
<TheMue> fwereade: it has been possible, but not now anymore. so i tried two private hosts. the one is working, the other not. i'm puzzled.
<dimitern> aaaaand it works!
 * fwereade cheers at dimitern
<TheMue> dimitern: applause
<fwereade> TheMue, ok, we *know* it's not going to work on raring without dimitern's fix
<TheMue> fwereade: great
<fwereade> TheMue, none of the juju commands will
<fwereade> TheMue, because no state servers or agents or anything are started
<TheMue> fwereade: but i just also tested precise, just to make sure that this is not the reason, and my box still fails
<TheMue> fwereade: running without a state server is a bit, hmm, useless :D
<fwereade> TheMue, ok, so, for the 3rd time of asking, which public key are you authorizing?
<fwereade> TheMue, and can you ssh to it directly if you `ssh -i appropriate-private-key ubuntu@blahblahblah`?
<TheMue> fwereade: i've tried with the private key in my .ssh folder
<TheMue> fwereade: i've got to admit i've never directly connected to ec2 before, never needed it
<TheMue> fwereade: and that missing experience now is my problem
<fwereade> TheMue, do you maybe have a strange authorized-keys, or authorized-keys-path, configured?
<dimitern> TheMue: can you please go here: https://portal.aws.amazon.com/gp/aws/securityCredentials
<TheMue> fwereade: can't remember, but i'll take a deeper look. normally i use everything as standardized as possible
<dimitern> TheMue: authenticate beforehand, then go to Key Pairs tab
<dimitern> TheMue: create a new key pair, download the private key, save it to your ~/.ssh/, chmod it to 600, then add the snippet I pasted in the kanban meeting into your ~/.ssh/config
<dimitern> TheMue: after that it should work and you will be able to ssh without problems
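The snippet dimitern pasted is not reproduced in the log; a plausible reconstruction of that kind of ~/.ssh/config entry (the key filename and host pattern are assumptions, not the original paste):

```
# match EC2 public DNS names and use one dedicated key for all of them
Host *.compute.amazonaws.com
    User ubuntu
    IdentityFile ~/.ssh/juju-ec2.pem
    StrictHostKeyChecking no
```

With something like this in place, `ssh ec2-1-2-3-4.eu-west-1.compute.amazonaws.com` works without `-i` or a username.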
<TheMue> dimitern: i'm doing
<TheMue> dimitern: could you please paste that snippet again?
<fwereade> dimitern, what happens to you if you comment out that line from your config and just `ssh ubuntu@blah`?
<dimitern> fwereade: without the ssh config like this it fails (pubkey auth error), I have to use ssh ubuntu@blah -i ~/path/to/key
<fwereade> dimitern, so it doesn't automatically pick the right key? do you have loads of them set up or something? :)
<dimitern> fwereade: because i hate typing on the console more than i should, i added the ssh config to save me some typing, that way i can do just ssh blah and it works, as long as the dns name ends with .compute.amazonaws.com
<dimitern> fwereade: i have like 10 keys in there
<fwereade> dimitern, so, hmm, maybe we are picking a first choice that ssh doesn't?
<dimitern> fwereade: most likely, yeah
<fwereade> dimitern, ok, cool, that makes sense
<TheMue> hmm, error changes, now i have a timeout. but eu-west-1 is slow the whole day.
<dimitern> TheMue: it usually takes me 2-3 mins to connect with ssh, after a successful bootstrap on eu-west-1
<TheMue> dimitern: bootstrap has been a longer time ago ;)
<dimitern> fwereade, rvba: I'm seeing test failures related to maas in trunk now - any clue?
<fwereade> dimitern, update maaslib
<fwereade> dimitern, or whatever it's called
<dimitern> fwereade: ok, cheers
<dimitern> rogpeppe: what about danilos's fix to goamz?
<rogpeppe> dimitern: it's the wrong fix
<rogpeppe> dimitern: we really need to use a non-broken version of go
<rogpeppe> dimitern: i think that danilos' fix will probably break other things
<dimitern> rogpeppe: but won't the fix help us with the current release at least?
<rogpeppe> dimitern: there's a good reason why Close is set to true
<dimitern> rogpeppe: not really - HTTP/1.1 + Keep-Alive has been around since forever now
<dimitern> rogpeppe: if go implementation is crack, that might be a reason
<rogpeppe> dimitern: i don't think you can reuse connections to an S3 server.
<rogpeppe> dimitern: i may be wrong - niemeyer will know why Close is true there.
 * rogpeppe wonders if there's any chance of go1.0.3 going into raring
<dimitern> rogpeppe: you can, but up to 100 reqs on the same connection, according to the official docs
<dimitern> rogpeppe: https://forums.aws.amazon.com/thread.jspa?threadID=91402
<dimitern> rogpeppe: we should ask Daviey and/or arosales perhaps?
 * arosales reads backscroll
<dimitern> arosales: basically what's the chance of including go 1.0.3 instead of 1.0.2 in raring?
<arosales> dimitern, that's definitely an ubuntu dev uploader question
<arosales> dimitern, but what is the delta?
<rogpeppe> dimitern: so if we don't set Close, then it will die randomly after maybe 10 requests
<rogpeppe> arosales: do you mean how big is the difference between 1.0.2 and 1.0.3 ?
<arosales> rogpeppe, correct
<rogpeppe> arosales: there's no difference really apart from bugs fixed
<rogpeppe> arosales: that's not *strictly* true, but for our purposes it is
<arosales> rogpeppe, how big are the bug's patches?
<dimitern> fwereade, rogpeppe: https://codereview.appspot.com/8648047 - raring fix
<rogpeppe> arosales: for this particular bug?
<arosales> rogpeppe, for the bug fixes between 1.0.2 and 1.0.3
<arosales> rogpeppe, the main issue to the package will be the delta in the changes.
<dimitern> arosales: and 1.0.3 has been mainstream for quite some time now (months)
<arosales> reason I am asking
<arosales> dimitern, gotcha I am trying to just grasp the package delta from .2 to .3 to give better input on the package upload question
<dimitern> rogpeppe: can you prepare a delta easily?
<arosales> given an ubuntu dev with upload rights, such as Daviey, would need to weigh in. But I think he would have similar questions.
<niemeyer> arosales: http://code.google.com/p/go/source/list?name=release-branch.go1
<dimitern> arosales: I assume you ask for the diff between the 1.0.2 and 1.0.3 releases?
<rogpeppe> arosales: the delta in the go source tree between 1.0.2 and 1.0.3 is 22484 lines
<arosales> niemeyer, thanks
<arosales> dimitern, yes
<fwereade> niemeyer, that bucket is still yours, right?
<niemeyer> fwereade: It is
<fwereade> niemeyer, because I think that if we just delete all the tools older than, say, 1.9.10 (a month ago) we will cut the XML down comfortably below 8k
<rogpeppe> arosales: well, that's the context diff anyway
<arosales> dimitern, mramm, do you know if Daviey had sponsored the upload yet?
<arosales> rogpeppe, gotcha
<niemeyer> fwereade: XML?
<dimitern> arosales: not really, no
<fwereade> niemeyer, and buy ourselves some breathing room without messing with either last-minute cowboy hacks to x3, or changing the platform we build on
<mramm> arosales: not yet I don't think
<fwereade> niemeyer, it only started happening with the last release
<fwereade> niemeyer, AFAIWCT the relevant code has not changed
<niemeyer> fwereade: Sorry, I'm out of context
<fwereade> niemeyer, but the response that gives us trouble got to ~8k at that point
<fwereade> niemeyer, ah sorry
<fwereade> niemeyer, when we list the juju-dist bucket, we get this "connection closed" error when trying to decode the XML
<rogpeppe> niemeyer: the XML in the LIST response
<arosales> mramm, ok so then it may not be as big of an issue to get .3 uploaded over .2. The next question would be stability/testing which I am guessing is better in .3
<fwereade> niemeyer, and if we ReadAll to see what we get before trying to decode, we see that it cuts off suspiciously close to 8k
<rogpeppe> arosales: .3 has generally been used considerably more than .2 by my understanding
<niemeyer> fwereade: I see
<fwereade> niemeyer, if we set Close: false on the request we do get all the data, but I'm not sure what other consequences might be lurking there
<dimitern> fwereade: limit on s3 connection reuse, for one
<dimitern> fwereade: https://forums.aws.amazon.com/thread.jspa?threadID=91402
<fwereade> niemeyer, IMO source hacks, and platform changes, are both much riskier and more potentially destabilizing than just trashing some old tools
<niemeyer> arosales: Some of these patches will need to be yanked as well
<niemeyer> arosales: The offending one, mainly
<niemeyer> arosales: But possibly others
<niemeyer> fwereade: I don't mind trashing the old tools, but I disagree with the overall principle
<niemeyer> fwereade: This is putting smoke around the actual problem
<arosales> niemeyer, ok and that could be a SRU if needed too
<fwereade> niemeyer, I'm not holding this up as a good solution :)
<niemeyer> arosales: Right, very much think so
<fwereade> niemeyer, I am proposing it as the least risky way to give ourselves the breathing space to resolve the actual problem
<niemeyer> arosales: There's some useful background here too: http://code.google.com/p/go/issues/detail?id=4914
<niemeyer> arosales: Which describes how the bogus patch came to get into the package, and never leave for whatever reason
<niemeyer> arosales: The Debian package is quite poorly maintained right now
<arosales> niemeyer, thanks for additional info
<niemeyer> fwereade: Understood.. I'm saying it doesn't sound less risky
<niemeyer> fwereade: Saying "we think it breaks around 8k so let's reduce the payload" is a total guess, and doesn't address or describe the real cause of the issue
<fwereade> niemeyer, we could be screwed tomorrow by s3 sending smaller chunks, you mean? or something more subtle?
<dimitern> bump: https://codereview.appspot.com/8648047/
<niemeyer> fwereade: This is the real bug: http://code.google.com/p/go/issues/detail?id=4914
<niemeyer> fwereade: If it's not addressed, the bug is still there
<fwereade> niemeyer, agreed
<fwereade> niemeyer, but I am pretty sure that switching go version is riskier, and hacking goamz is... *slightly* hackier
<fwereade> niemeyer, smarter solutions accepted with joy and gratitude, ofc
<rogpeppe> fwereade: i don't think switching go version is risky
 * fwereade raises an eyebrow
<rogpeppe> fwereade: we've always been testing with different go versions
<rogpeppe> fwereade: i'm pretty sure we're robust in that regard
 * dimitern everybody seems to be ignoring my fix, and i thought we were in a hurry
<mramm> rogpeppe: have we been testing with 1.0.3?
<niemeyer> fwereade: I think not fixing the bug is riskier than fixing it
<mramm> I thought we tested with 1.0.2 and tip mostly
<niemeyer> fwereade: In either case, I'll remove the old tools as requested after lunch
<niemeyer> brb
<rogpeppe> mramm: i've been testing with 1.0.3 and tip interchangeably
<mramm> can we patch just that bug and release 1.0.2.1
<mramm> ?
<dimitern> mramm: i'm only using 1.0.3.
<mramm> dimitern: rogpeppe: sounds like we are testing
<mramm> good deal
<dimitern> the only issue i had with 1.0.3 is the "redirect blocked" error with lbox/lpad
<fwereade> dimitern, sent a couple of comments
<fwereade> rogpeppe, I am concerned that just running the tests, and maybe a simple env or two, is not enough to say it's not risky
<rogpeppe> fwereade: i've bootstrapped --upload-tools and deployed etc
<fwereade> rogpeppe, but perhaps I mischaracterise the effort you have been putting into this
<rogpeppe> fwereade: and also a lot on tip, which is considerably different again, and we work there fine
<fwereade> rogpeppe, I just don't think that we have time to reasonably verify a change of that sort
<dimitern> fwereade: i did test both cases - there was a raring specific test already, which i changed, and the other is non-raring specific
<fwereade> dimitern, sweet, sorry
<fwereade> dimitern, that's much nicer than I feared then
<mramm> I think upgrading go in the archive is unlikely
<dimitern> fwereade: surprisingly, me too :)
<mramm> given timeline
<fwereade> dimitern, ok, LGTM with trivial rearrangement of apt-related settings
<dimitern> fwereade: cheers
<mramm> upgrading juju after feature-freeze is one thing
<mramm> it is an application
<fwereade> mramm, regardless of timeline I have never upgraded a framework version, let alone a language version, without encountering... surprises
<mramm> but tools like go really *should* be frozen
<mramm> fwereade: understood
<dimitern> mramm: it's not like any other project in the archive is using go, right?
<mramm> dimitern: I don't know
<dimitern> mramm: and the users would likely want the latest stable go version, which was released 12 sept 2012
<mramm> dimitern: if we wanted to do that, it would have been good
<dimitern> mramm: it can be checked trivially by finding packages that depend on it
<mramm> dimitern: but doing it after feature freeze, and then after final freeze -- that is not the time
<dimitern> rogpeppe: can you take a look too please? https://codereview.appspot.com/8648047/
<fwereade> mramm, dimitern, niemeyer, rogpeppe, et al: I believe the safest path is to stick with 1.0.2, paper over the problems, and get onto 1.0.3 as soon as we can after the release, so we can get an update that works properly into our users' hands ASAP after that. I am well aware that if this fucks up, it is on my head
<mramm> I agree that those processes are there to serve users, so if nobody else uses go from the archive.....
<mramm> dimitern: also it is not just about packages in the archive, it's about applications our users build with the "released version" of go
<fwereade> but I have done the last-minute-cowboy thing in the past, and it has not had the success ratio I might have hoped for
 * mramm admits that most go users are probably not using a packaged version of go
<rogpeppe> fwereade: tbh i think it's ridiculous that raring isn't shipping with the latest version of go anyway
<dimitern> mramm: yeah, my point was the users will likely want a better version of go with more fixes (if they're not already using it by manually upgrading the one in the archive)
<fwereade> so I do not believe that I can in good conscience approve this change
<fwereade> rogpeppe, agreed, but IMO orthogonal
<mramm> dimitern: I doubt that matters to many users
<dimitern> we're pulling straws here..
<mramm> rogpeppe: that is very true, but we can't fix that anymore
<dimitern> ("grasping at" apparently) :)
<rogpeppe> dimitern: i agree with you
<rogpeppe> dimitern: and i think that juju has had so little live testing that it makes no significant difference at this stage
<rogpeppe> fwereade: ^
<dimitern> rogpeppe: +10
<rogpeppe> fwereade: we're gonna hoof loads of bugs out anyway
<rogpeppe> fwereade: and at least we won't be doing it against an old and buggy version of Go
<niemeyer> fwereade: I don't think it's orthogonal.. the version of Go in the archive is *broken*
<niemeyer> fwereade: and juju is being directly affected by it
<niemeyer> fwereade: Fixing this isn't cowboying.. it's using the freeze process for what it's meant to be used for
<fwereade> rogpeppe, niemeyer: I cannot recall a single case in which a significant framework or library upgrade has been free of unpleasant surprises
<fwereade> rogpeppe, niemeyer: my experience overwhelmingly directs me to do these just *after* a release, not just *before*
<niemeyer> It's so ironic that I'm involved in fixing this now, when my request to be the package maintainer in Ubuntu was declined
<niemeyer> fwereade: I can recall many of those
<rogpeppe> fwereade: we have already tested against this upgrade many times
<niemeyer> fwereade: In the case of minor release updates
<niemeyer> fwereade: Patch release updates, in fact
<niemeyer> fwereade: Happens all the time in Ubuntu
<niemeyer> fwereade: It's 1.0.2 to 1.0.3.. it's not 1.0 to 1.1, or 1.0 to 2.0
<fwereade> niemeyer, maybe I do the golang guys a disservice -- I probably do -- but even if I had perfect knowledge of the consequences, I'm concerned that it's impractical
<niemeyer> fwereade: I can't address your psychological feelings I'm afraid
<fwereade> niemeyer, and, sure, it happens all the time in ubuntu; but the professionally paranoid stick to LTSs for that reason
<niemeyer> fwereade: It happens in LTSs as well
<niemeyer> fwereade: That's why LTSs exist, in fact
<fwereade> of a series of unappealing options, I choose the one that least perturbs all the other things we don't know we depend upon
<fwereade> and I must now be away, lest marital strife come to pass
<fwereade> I'm sorry to disappoint
<rogpeppe> fwereade: what's your solution?
<niemeyer> I have no idea, but apparently he doesn't want to see the real bug fixed
<fwereade> rogpeppe, fix it ASAP *after* the release
<fwereade> niemeyer, please do not mischaracterise my position like that
<dimitern> rogpeppe: ping
<niemeyer> fwereade: I'm not, by all means
<rogpeppe> fwereade: what do you think is the worst can happen?
<niemeyer> fwereade: I want to see the bug fixed in raring
<rogpeppe> dimitern: pong
<niemeyer> fwereade: You're suggesting we don't do that
<niemeyer> fwereade: It's as simple as that, I think
<dimitern> rogpeppe: sorry being a pest - https://codereview.appspot.com/
<dimitern> rogpeppe: *for being*
<fwereade> niemeyer, I don't think it's remotely acceptable to swap out the go version *after* final freeze
<fwereade> niemeyer, if it wasn't a big enough deal before, it's surely not now
<rogpeppe> dimitern: is there a CL number i should be interested in?
<niemeyer> fwereade: What's final freeze for, if we can't fix real bugs in that period?
<niemeyer> fwereade: Huh.. okay
<dimitern> rogpeppe:  :) oops - https://codereview.appspot.com/8648047/
 * niemeyer moves on to other things then
<fwereade> niemeyer, I *wish* we had encountered this 2 weeks ago
<fwereade> niemeyer, but we did not
<fwereade> niemeyer, thank you for your forbearance
<rogpeppe> fwereade: so what test are you worried might fail?
<rogpeppe> fwereade: if our live tests pass, we can deploy and add relations etc, all with go1.0.3, what's not tested there that we have tested with the other stuff?
<niemeyer> rogpeppe: Should we mention that we actually use tip?
 * niemeyer ducks
<rogpeppe> niemeyer: i just might have already mentioned that a few times
<rogpeppe> fwereade: tip is hundred times as different as 1.0.3 from 1.0.2 and it works fine
<rogpeppe> fwereade: i really think your paranoia is unwarranted here
<rogpeppe> fwereade: because noone is going to use this version in production *anyway*
<rogpeppe> dimitern: LGTM assuming live tests pass against raring
<dimitern> rogpeppe: yeah, tested 3 times
<dimitern> rogpeppe: with default-series: raring
<rogpeppe> dimitern: cool
<TheMue> dimitern: great
<dimitern> TheMue: did you manage to fix your ssh issues?
<TheMue> dimitern: i'm in progress, cleaning up a bit during this evening.
 * dimitern this is ridiculous! i have to reboot..
<TheMue> dimitern: can't believe *lol*
<dimitern> TheMue: for the past 2-3h whatever I do the machine freezes for a second about every 5s or so
<rogpeppe> right, i'm done here
<dimitern> well, not everywhere, but mostly in emacs and terminal
<rogpeppe> see y'all monday
<rogpeppe> sunny evening, yay!
<dimitern> rogpeppe: have a good weekend!
<rogpeppe> dimitern: and you
<TheMue> rogpeppe: have a nice weekend
<rogpeppe> TheMue: and you too
<rogpeppe> fwereade: and you also
<TheMue> rogpeppe: thx
<dimitern> yeah.. i'm off as well
<dimitern> see you guys and take care
<TheMue> dimitern: have a nice weekend too
<dimitern> TheMue: same to you :)
<TheMue> dimitern: thx
<mramm2> internet here is pretty terrible
<mramm2> so IRC is totally unreliable
#juju-dev 2013-04-21
<Daviey> awd: wassuo?
<thumper> morning
<bigjools> morning
<thumper> hi bigjools
<bigjools> how's tricks thumper
<thumper> bigjools: a little frustrating... debugging stuff
<thumper> in a live environment
<thumper> so messing with the ec2 console
<thumper> and trying to bootstrap in sydney :)
<thumper> although, pretty happy with this week
<thumper> wednesday and thursday off
<thumper> wednesday so I can go the aerosmith concert
<thumper> thursday is a public holiday
<bigjools> thumper: yeah thu is hol here too
<bigjools> nice to get to see Aerosmith - I wanted to go down to Sydney on Saturday to see them and Van Halen at Stonefest, alas my son inconveniently had a birthday
<thumper> haha
<thumper> I've not been to a big band concert like this
<thumper> seen exponents in a london walkabout
<thumper> and sting at the royal albert hall
<thumper> but not a rock concert before
<thumper> so it'll be interesting
<thumper> there are like three warm up acts
<thumper> so about 6 hours of music
<thumper> just so people are clear, one of the things I think about when punching a heavy bag is "don't use fucking one character variable names!"
<bigjools> thumper: lol
<bigjools> extend that to one-char struct names :)
<thumper> bigjools: it seems to help :)
 * thumper twitches
 * thumper takes a deep breath
<thumper> 20 line function has m, p, k, g, and ps
<wallyworld> thumper: it's like Sesame St - this function was brought to you by the letters "m, p, k, g" :-D
<thumper> thanks wallyworld
<thumper> that makes things so much better :)
<wallyworld> pleased to help :-)
#juju-dev 2014-04-14
<thumper> davecheney: does this bug still happen? https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1304167
<_mup_> Bug #1304167: syntax error, trusty beta-2 cloud image <apparmor (Ubuntu):Confirmed> <https://launchpad.net/bugs/1304167>
<wallyworld> thumper: maybe you were smoking weed when you wrote the email
<thumper> davecheney: seems like a quite major bug if so
<thumper> wallyworld: nah...
<thumper> although I am wondering if it would help
<wallyworld> couldn't hurt :-)
<thumper> ha
<davecheney> thumper: would it be possible for you to log
<davecheney> "%T", err
<thumper> davecheney: sure
<davecheney> thanks
<davecheney> thumper: yes, the bug is still open
<davecheney> it has screwed LXC on any platform that uses apparmor
<thumper> :-(
<davecheney> thumper: when you run the destroy-environment, you're not in that directory are you
<davecheney> ie; mkdir /tmp/t
<davecheney> cd /tmp/t
<davecheney> rmdir /tmp/t
<thumper> davecheney: no
<davecheney> ok, just checking
<davecheney> http://gcc.gnu.org/releases.html
<davecheney> gcc 4.9 released
<davecheney> but not really
<thumper> if you destroy too close to bootstrap, you don't get it
<davecheney> thumper: hmm ok
<thumper> oh...
<thumper> I think I know what it could be...
<davecheney> thumper: hold pls
<thumper> when we kill the machine agent with pkill
<thumper> it cleans up after itself
<thumper> we then have a race
<davecheney> thumper: right, so things are racing on the directory listing
<thumper> the agent is trying to remove some files
<thumper> and then so does the destroy command
<davecheney> http://golang.org/src/pkg/os/error_unix.go
<davecheney> so is the agent removing ~/.juju/local ?
<davecheney> ie it's not a file
<davecheney> but the top level directory itself ?
<davecheney> so os.RemoveAll goes to remove ~/.juju/local
<davecheney> and the whole thing has been deleted already ?
<thumper> not all of it...
<thumper> but some of it
<thumper> oh...
<thumper> yeah, sometimes all of it
<thumper> yeah...
<thumper> it does
<thumper> *os.SyscallError
<thumper> they are racing to remove the datadir
<davecheney> thumper: ok, that should be possible to make a repro
<davecheney> i'll do that while i'm waiting for gccgo to compile
<thumper> davecheney: what do you think should happen?
<davecheney> 10:12 < thumper> *os.SyscallError
<davecheney> ^ is that %T ?
<thumper> yeah
<davecheney> cheeky bugger
<davecheney> thumper: leave it with me
<davecheney> raise an issue maybe
<davecheney> i need to make a repro
<thumper> davecheney: you see it as a golang bug?
<davecheney> thumper: it won't fit through http://golang.org/src/pkg/os/error_unix.go
<davecheney> http://play.golang.org/p/mp5i8GFL47
 * davecheney goes to find out where that os.SysclalError is coming from
<davecheney> thumper: for the moment you're going to have to code around it
<davecheney> this won't be fixed in 1.2
<davecheney> dir_unix.go
<davecheney> 41:                             return names, NewSyscallError("readdirent", errno)
<davecheney> this is where it's coming from
 * davecheney feels very depressed
<davecheney> it's just bugs, bugs, and more bugs
<thumper> davecheney: I'll work around it
<thumper> davecheney: we already ignore errors from two other things that we are racing with
<davecheney> thumper: i'll get a repro quick smart
<davecheney> i can see where it happens
<waigani> morning davecheney.
<waigani> davecheney: when I run make check on vm I get the following: http://pastebin.ubuntu.com/7246968
<waigani> any hints?
<waigani> thumper: wip on jujud isolation: https://codereview.appspot.com/87130045
<waigani> thumper: cmd/juju and environs/bootstrap are now passing
<waigani> environs/sync is going to take a bit more thought
<waigani> and right now I'm too hungry to think
<davecheney> waigani: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754
<_mup_> Bug #1304754: gccgo on ppc64el using split stacks when not supported <ppc64el> <trusty> <gccgo-4.9 (Ubuntu):Confirmed> <https://launchpad.net/bugs/1304754>
<waigani> davecheney: reading
<davecheney> waigani: short version
<davecheney> downgrading to an older kernel works around the problem
<davecheney> but isn't a fix
<waigani> davecheney: yep, thanks
<waigani> I neeeeed food. bbl
<davecheney> thumper: if err, ok := err.(*os.SyscallError); ok { if os.IsNotExist(err.Err) }
<davecheney> or something
<thumper> axw: just saw your answer too
<thumper> axw: however the error that is being returned isn't os.IsNotExist
<thumper> axw: as the race is being caught elsewhere
<axw> thumper: ah, maybe in the Readdir then
<axw> anyway, there's definitely a race, and you should ignore it I think
<davecheney> thumper: lucky(~/devel/issue) % go run issue.go
<davecheney> 2014/04/14 10:58:58 creating temporary directories rooted at "/tmp/issue015782153"
<davecheney> 2014/04/14 10:58:59 preparing workers
<davecheney> 2014/04/14 10:58:59 release the swarm
<davecheney> 2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"
<thumper> ah... read-dir-int
<davecheney> 2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"
<davecheney> 2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"
<davecheney> 2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"
<thumper> not re-addir-int
<davecheney> thumper: raising an issue
<thumper> axw: yeah, that's it
<thumper> I couldn't parse the smashedtogetherwords
<mwhudson> heh i finally have results for waigani and he's gone
<mwhudson> but i think his problem was actually the "things randomly die on ppc" bug...
<mwhudson> things all in all don't look too bad on arm64 actually
<davecheney> thumper: https://code.google.com/p/go/issues/detail?id=7776&thanks=7776&ts=1397437695
<thumper> mwhudson: \o/
<mwhudson> not actually good
<mwhudson> just not terrible
<davecheney> mwhudson: /usr/include/features.h:374:25: fatal error: sys/cdefs.h: No such file or directory
<davecheney> any suggestions which package contains this header
<mwhudson> uh, no, looks basic though
<mwhudson> hm
<mwhudson> dpkg -S sez libc6-dev-i386
<mwhudson> which seems a bit random
<davecheney> % dpkg -S /usr/include/sys/cdefs.h
<davecheney> libc6-dev-i386: /usr/include/sys/cdefs.h
<davecheney> yeah
<mwhudson> ah
<mwhudson> um
<davecheney> mwhudson: this is compiling gcc 4.9
<mwhudson> "real" libc6-dev installs it to /usr/include/$triplet/sys/cdefs.h
<mwhudson> davecheney: from upstream or the deb?
<davecheney> mwhudson: upstream
<davecheney> mwhudson: our deb produces broken binaries
<mwhudson> davecheney: on powerpc64 i assume?
<mwhudson> um, that sounds like something doko should know about :)
<mwhudson> is this the split stack thing?
<davecheney> yup
<mwhudson> i guess libc6-dev-i386 must be some kind of pre-multiarch thing
 * davecheney tries patching in some of the arguments from /usr/bin/gcc -v
<mwhudson> davecheney: "dpkg --listfiles libc6-dev | grep cdefs.h" on your platform?
<davecheney> $ dpkg --listfiles libc6-dev | grep cdefs.h
<davecheney> /usr/include/powerpc64le-linux-gnu/sys/cdefs.h
<davecheney> maybe ./configure got the triplet wrong
<davecheney> well, i was wondering why this was such a good compile box
<davecheney> clock           : 4284.000000MHz
<davecheney> ziiing
<davecheney> gcc, just keep adding flags until it compiles
<davecheney> nope
<davecheney> still broke
<davecheney> fuck this
<davecheney> i'm using symlinks
<davecheney> wow. such multiarch
<davecheney> mwhudson: ok, here is what I think
<davecheney> gccgo on ppc is correctly detecting that split stacks are not supported
<davecheney> and using the default 'large' stack model
<davecheney> but .. the stack is still too small
<davecheney> i'm bt'ing in gdb and at stack frame 1475 with no end in sight
<mwhudson> haha
<mwhudson> ok
<mwhudson> so stack overflow?
<mwhudson> hmm
<davecheney> make that stack frame 3,300
<mwhudson> is this on the altstack?  i.e. while handling a signal?
<davecheney> so, in summary, gccgo doesn't give a clean indication when you fall off the end of the stack
<davecheney> mwhudson: nope, with split stacks disabled
<davecheney> you get a c style stack per goroutine
<mwhudson> davecheney: that's not what i mean
<mwhudson> sure
<mwhudson> but signals are handled on a different stack again
<mwhudson> (sigaltstack and all that)
<mwhudson> i think those stacks are smaller?
<mwhudson> anyways
<davecheney> mwhudson: i'm going to say, conditionally, yes
<davecheney> mwhudson: the sig handler gets a SEGV
<mwhudson> davecheney: it's easy ish to make the stacks bigger i think
<davecheney> and it blames the topmost stack frame for hitting a nil
<mwhudson> i found the code that was allocating them
<davecheney> when actually all it did was call a function
<mwhudson> yeah, well, if you fall off the end of the stack it's certainly going to break
<davecheney> mwhudson: are you adding -fsplit-stack on aarch64 ?
<mwhudson> davecheney: no
<davecheney> shit, 5,000 stack frames
<davecheney> how in gods name could juju use so much stack ...
<mwhudson> could this "just" be application infinite recursion for some reason?
<mwhudson> or does the backtrace look reasonable?
<davecheney> mwhudson: the latter
<davecheney> maybe a dozen frames
<davecheney> this is going to be an 8mb stack
<davecheney> 18,000 stack frames
<davecheney> #31380 0x000000001000522c in main.count ()
<davecheney> #31381 0x0000000010005854 in main.main ()
<_mup_> Bug #31381: POMsgSet.active_texts assumes POFile.pluralforms is an int <lp-translations> <oops> <Launchpad itself:Fix Released by matsubara> <https://launchpad.net/bugs/31381>
<_mup_> Bug #31380: source package sort by version doesn't cope with invalid version numbers <lp-foundations> <oops> <Launchpad itself:Fix Released by kiko> <https://launchpad.net/bugs/31380>
<mwhudson> that doesn't sound reasonable
<mwhudson> lolmup
<davecheney> #-1
<mwhudson> although, eh, i guess it works well enough on platforms that do have split stacks
<davecheney> mwhudson: most gccgo developers are on amd64
<davecheney> when I say most
<davecheney> i mean
<mwhudson> all 1 of them?
<davecheney> everyone except you and me and some neckbeard using mips
<mwhudson> strange this doesn't happen on arm64 though
 * davecheney goes to talk to ian taylor
<davecheney> mwhudson: gccgo src/test/peano.go
<davecheney> ./a.out
<mwhudson> i wouldn't have thought that stack frames would be much bigger on that
<mwhudson> well yes, that fails on arm64 too
<davecheney> i wonder if it is unrelated
<davecheney> that gives a straight segfault
<davecheney> and the go handler doesn't catch it
<davecheney> i wonder if we're barking up the wrong tree
<davecheney> mwhudson: i'm thinking these are two different issues
<davecheney> [492932.974051] a.out[25065]: bad frame in setup_rt_frame: 000000c20ffaf0e0 nip 0000000010004e0c lr 00000000100051fc
<davecheney> ^ this is what running off the stack looks like
<davecheney> note nip
<davecheney> [2028013.988376] jujud[400]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
<davecheney> ^ this is what a juju segfault on a bad kernel looks like
<davecheney> nip and lr are 0
<davecheney> something branched to 0 and nuked the lr for good measure
<mwhudson> well, once you have a disagreement over whether a bit of memory is stack or not, it's not exactly predictable what happens next
<davecheney> true
<davecheney> but why is the ip 0
<davecheney> both cases this is unmapped memory
<mwhudson> because something stomped over the link register on the stack, so it branched to lala land when trying to do a procedure return?
<mwhudson> i don't know the ppc abi but i certainly saw that sort of thing a lot on arm64
<davecheney> mwhudson: anything with an LR is probably going to act the same
<mwhudson> also
<davecheney> mwhudson: ok, so if we're not running off the end of the stack
<davecheney> and i'm pretty sure we're not
<davecheney> then why does the kernel page size affect the result
<davecheney> $ pmap -x 969
<davecheney> 969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
<davecheney> Address           Kbytes     RSS   Dirty Mode  Mapping
<davecheney> total kB               0       0       0
<davecheney> ---------------- ------- ------- -------
<davecheney> well, thanks
<davecheney> thumper: juju status returns 0 if there are hook errors
<davecheney> axw: sorry, maybe this question is best addressed to you
<axw> is that a problem?
<davecheney> axw: dunno
<davecheney> depends what we've promised status will do
<davecheney> i know that people want to be able to say 'is this environment ok'
<davecheney> $ pmap -x 969
<davecheney> 969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
<davecheney> Address           Kbytes     RSS   Dirty Mode  Mapping
<davecheney> sorry
<davecheney> ---------------- ------- ------- -------
<davecheney> $ pmap -x 969
<davecheney> 969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
<davecheney> Address           Kbytes     RSS   Dirty Mode  Mapping
<davecheney> oh for fuck's sake
<davecheney> ---------------- ------- ------- -------
<davecheney> total kB               0       0       0
<davecheney>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
<davecheney>   969 root      20   0 1413376 515456  19136 S   9.6  6.2   0:18.51 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
<axw> yeah I can see the use case, but AFAIK it always just returned 0
<davecheney> axw: i think this might be related
<davecheney> heavy use of the api server causes RES to rise
<davecheney> oh god
<davecheney> i hate everything
<davecheney> upstart isn't logging the stderr of jujud-machine-0
<davecheney> :cry: SIGQUIT doesn't do what I think on gccgo
<thumper> wallyworld: hangout died
<thumper> wallyworld, axw, waigani: I figured I was done anyway :-)
<axw> thumper: will take a look at your CL after I finish up on this HA thing
<thumper> axw: ack
<thumper> axw: I first read that as "hating"
<axw> heh
<thumper> made me chuckle
 * thumper goes for a brief lie down before his head explodes
<waigani> wallyworld: I found the mockable BuildToolsTarball, what was the other one? bundleTools?
<wallyworld> yeah BundleTools
<wallyworld> in environs/tools
<waigani> that isn't mockable?
<waigani> environs/tools/build.go:205
<wallyworld> you just need to introduce a var
<wallyworld> make the method lower case
<wallyworld> make the var upper case
<waigani> ah sure, make it mockable - no problem
<davecheney> mwhudson: https://bugs.launchpad.net/juju-core/+bug/1307282
<_mup_> Bug #1307282: cmd/jujud: gccgo api server consumes ~500mb of ram on machine-0 <gccgo> <ppc64el> <juju-core:Triaged> <https://launchpad.net/bugs/1307282>
<davecheney> ERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)
<davecheney> ERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)
<davecheney> did this get fixed ?
<davecheney> waigani: can you send me `uname -a` from your vm ?
<waigani> davecheney: Linux winton-09 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
<davecheney> waigani: interesting
<davecheney> i'm trying a -24 kernel and I can't get it to crash
<davecheney> waigani: did you just upgrade to that kernel ?
<waigani> hmmm
<davecheney> waigani: uptime
<waigani> davecheney:  03:27:40 up  1:09,  2 users,  load average: 0.00, 0.01, 0.05
<waigani> I did a restart, to see if that helped at all
<waigani> ran make check after, same problem
<davecheney> waigani: ok
<davecheney> thanks, that makes it concrete
<davecheney> dmesg
<davecheney> ^^
<waigani> davecheney: http://pastebin.ubuntu.com/7247924/
<davecheney> waigani: ta
<davecheney> i should have said
<davecheney> dmesg | tail
<davecheney> waigani: could I ask you to check again
<waigani> davecheney: http://pastebin.ubuntu.com/7247927/
<davecheney> sorry
<davecheney> the test
<davecheney> not the dmesg
<waigani> ah right
<davecheney> what i'm looking for is a line like
<davecheney> (no worries, this was my fault)
<davecheney> 11:54 < davecheney> [2028013.988376] jujud[400]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 0000000000000000
<davecheney> ^ should see something like this
<waigani> okay, I'll paste when done and keep an eye out for a line like that
<davecheney> waigani: can you ssh-import-id dave-cheney on your vm
<davecheney> so I can stooge around your /var/log/
<davecheney> and see what kernel you were running before reboot
<waigani> davecheney: already done, your public key is on the vm
<davecheney> danke
<davecheney> waigani: i have a theory that -24 kernel fixes the issue
<davecheney> it's not much of a theory atm
<waigani> davecheney: http://pastebin.ubuntu.com/7247954/
<waigani> davecheney: I have a theory that I did something stupid
<waigani> not so much a theory as a constant axiom
<davecheney> waigani: ubuntu@winton-09:/var/log$ grep '\-generic' dmesg.0 dmesg
<davecheney> dmesg.0:[    0.000000] Linux version 3.13.0-20-generic (buildd@denneed04) (gcc version 4.8.2 (Ubuntu 4.8.2-17ubuntu1) ) #42-Ubuntu SMP Fri Mar 28 09:55:49 UTC 2014 (Ubuntu 3.13.0-20.42-generic 3.13.7)
<davecheney> dmesg.0:[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-3.13.0-20-generic root=UUID=30486aa4-f767-4397-ab88-dd0e02e66651 ro console=hvc0 earlyprintk
<davecheney> dmesg:[    0.000000] Linux version 3.13.0-24-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 (Ubuntu 3.13.0-24.46-generic 3.13.9)
<davecheney> dmesg:[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-3.13.0-24-generic root=UUID=30486aa4-f767-4397-ab88-dd0e02e66651 ro console=hvc0 earlyprintk
<davecheney> looks like you were running -20, then you got -24 when you rebooted
<davecheney> waigani: dmesg ?
<waigani> davecheney: http://pastebin.ubuntu.com/7247958/
<waigani> sorry, is that what you meant?
<davecheney> waigani: yup
<davecheney> interesting
<davecheney> all previous panics of this class leave a message in dmesg
<davecheney> ok, there could be two unrelated issues
<davecheney> waigani: could you log a bug for http://pastebin.ubuntu.com/7247954/
<davecheney> tag it gccgo ppc64el
<waigani> davecheney: yep, gladly :)
<davecheney> waigani: ta
<waigani> davecheney: I'll just double check that I have not done something stupid in the code. It *should* be latest trunk
<davecheney> waigani: nah
<davecheney> this isn't you
<davecheney> the panic is happening in /usr/bin/go
<davecheney> if you want to investigate
<davecheney> apt-get source gccgo-go
<waigani> right, that is what stumps me
<davecheney> then have a look at that line in build.go
<davecheney> waigani: i ran into that about a week ago
<davecheney> that was when the floor fell out from under me
<waigani> lol
<waigani> yep, I know that one
<waigani> davecheney: https://bugs.launchpad.net/juju-core/+bug/1307289
<_mup_> Bug #1307289: Go panics when running tests on ppc64 <gccgo> <ppc64el> <juju-core:New> <https://launchpad.net/bugs/1307289>
<davecheney> waigani: jolly good
<davecheney> axw: ERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)
<davecheney> ERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)
<davecheney> ^ did this get fixed recently
<davecheney> or should I log a bug
<thumper> axw: do you really think that two filtering methods is better than one with a bool?
<thumper> axw: I'll write it and look at the diff
<axw> thumper: I really do. With that approach you can see without a doubt that nothing can change the behaviour at runtime; with the bool you need to ensure that nothing changes it
<thumper> ok
<axw> davecheney: wallyworld fixed that already I think
<davecheney> axw: right
<davecheney> this is 1.17.8 (ish)
<axw> yeah, fixed in 1.18.1 I believe
<wallyworld> yeah, fixed in trunk
<davecheney> i think I saw a branch last week
<davecheney> right o
<thumper> axw: like this http://paste.ubuntu.com/7247988/ ?
<axw> thumper: yup
<axw> thumper: comment on countedFilterLine needs fixing
<davecheney> ping jam ?
<jam> davecheney: /wave
<davecheney> jam: i think we're eating the elephant from different ends
<davecheney> wrt to the api server memory usage
<jam> I'm not sure I understand
<jam> thumper: I'm around whenever you would like to hangout
<davecheney> jam: ok
<davecheney> in trying to trace down the panics i'm seeing
<davecheney> i've sort of discovered just how much memory jujud consumes
<davecheney> it's horrific
<jam> my initial results showed about 0.5MB per agent, which wasn't great, but wasn't terrible. but when something gets into a bad situation, I see memory spike terribly
<davecheney> jam: gccgo
<davecheney> it's more like 250mb per agent
<davecheney> two agents per machine
<davecheney> at a minimum
<jam> wow...
<davecheney> it's complicated
<jam> that's way different
<davecheney> gccgo when not using split stacks
<davecheney> allocates an 8 mb stack from the heap
<davecheney> so that puts the heap under a lot of pressure
<davecheney> even if large amounts of that 8mb stack remain uncomitted
<jam> yeah, 8MB per goroutine would be really bad for how much we use it
<davecheney> i'm also seeing strange things that make me think that when a client disconnects
<jam> so I think we have a bug that if a client disconnects in a bad way, it cascades into causing an APIServer restart, but I haven't tracked down the exact issues yet.
<davecheney> we're not releasing all the server side resources used by the client
<davecheney> in my test
<davecheney> 3 machines
<davecheney> on the manual provider
<jam> It might just be that it leaves resources behind, right
<davecheney> killing the agents on the service units
<davecheney> causes memory usage to almost double
<davecheney> with gc and 8k stacks, you won't feel a few leaked goroutines
<davecheney> with 8mb stacks
<davecheney> yup, you'll feel it
<davecheney> $ grep -c goroutine /tmp/out
<davecheney> 247
<davecheney> ^ starts at 169 for 4 agents
<davecheney> after a few restarts of the agents we're up to 247
<thumper> jam: ok, with you in 1m
<davecheney> axw: http://paste.ubuntu.com/7248220/
<davecheney> i don't get it
<davecheney> i did destroy-machine as requested
<davecheney> the agents are stopped
<davecheney> but I can't destroy the environment
<axw> davecheney: umm
<axw> davecheney: if they never disappear from state, seems that's a bug. but you can do destroy-machine --force to clean up manually
<davecheney> axw: right
<jam> thumper: the connection seems to have died
<thumper> jam: google tells me my connectivity is experiencing issues
<davecheney> axw: --force doesn't give me any love
<axw> davecheney: did it return an error or anything?
<axw> or just silence?
<davecheney> silence
<axw> davecheney: the provisioner should remove the machine from state when it's dead... it's entirely possible that someone has changed the provisioner so that it doesn't work with manual anymore
<axw> we need a "no provider left behind" act
 * davecheney reaches for rm 
<axw> davecheney: destroy-environment --force should work as a last resort, if all the machines really are cleaned up
<davecheney> ok, some good news, 3.13.0-24 may fix the issue
<davecheney> oh
<davecheney> nope
<davecheney> hmm
<davecheney> hard to tell
<davecheney> need more information
<rogpeppe> mornin' all
<jam> morning rogpeppe
<davecheney> 'moin
<rogpeppe> jam: hiya
<rogpeppe> davecheney: yo!
<axw> morning rogpeppe
<rogpeppe> axw: hiya
<axw> rogpeppe: landed the EnsureAvailability MP. is there something else you'd like me to look at now?
<davecheney> evening
<rogpeppe> axw: there is one thing that would be awesome if we could do
<rogpeppe> axw: currently we can't upgrade to a HA environment
<axw> ok
<rogpeppe> axw: because there is no mongo user configured on the admin database
<rogpeppe> axw: we need to change EnsureMongoServer to add one
<rogpeppe> axw: (if necessary)
<axw> rogpeppe: I guess there's a tonne of other things that need to be done for upgrades too, though? like rewriting mongo scripts? or has nate done that already?
<rogpeppe> axw: the mongo upstart script is already written when necessary (well, actually, it's been disabled for the moment, pending this)
<axw> I see
<axw> ok, I will take a look
<rogpeppe> axw: to add the admin user, while the service is stopped, we need to start the mongod in non-authenticated mode
<rogpeppe> axw: then add the admin user in that mode
<axw> thanks
<rogpeppe> axw: before tearing mongod down again and starting it up normally
<rogpeppe> axw: i did manually verify that that does actually work, but i'm afraid i can't remember the exact steps i used
<wallyworld> rogpeppe: hiya, i have a reflection question for you if you have a moment
<rogpeppe> wallyworld: sure
<wallyworld> i have a reflect.Value
<wallyworld> i want to create a nil value pointer
<wallyworld> eg reflect.ValueOf((*string)(nil))
<wallyworld> if it were for a *string
<wallyworld> but i want to do it dynamically
<wallyworld> reflect.New(val.Type().Elem()) gives me a pointer to a zero value
<wallyworld> but i want a pointer to nil that i can use with value.Set()
<wallyworld> make sense?
<rogpeppe> wallyworld: what would the code in normal Go look like? use T for the type of the value
<wallyworld> var foo *T
<wallyworld> foo = nil
<wallyworld> foo is a field of a struct
<wallyworld> i have it working using a switch on the field Kind and using reflect.ValueOf((*string)(nil))
<wallyworld> but i want to do it without that
<rogpeppe> wallyworld: so you want a nil value of the same type as a pointer to the type of the field?
<wallyworld> yeah, i think so, so that a call to value.Set() works
<rogpeppe> wallyworld: do you want to actually set the value of the field in the struct?
<wallyworld> yep
<rogpeppe> wallyworld: i don't think you want a pointer, in that case
<wallyworld> reflect.ValueOf(*mystruct).Elem().FieldByName(fieldName) is what i use to get the value
<rogpeppe> wallyworld: right, well you can just call Set on the result of that
<wallyworld> so if val is the result of the above
<wallyworld> i call Set() yes
<wallyworld> but i can't find out what to pass to Set()
<rogpeppe> wallyworld: a reflect.Value of the same type as the field...
<wallyworld> reflect.New(val.Type().Elem())  gives a pointer to "" for example
<wallyworld> i want to do it dynamically
<rogpeppe> wallyworld: are you just trying to set the field to nil?
<wallyworld> yes
<wallyworld> i thought i'd need value.Set()
<rogpeppe> wallyworld: val := reflect.ValueOf(mystructptr).Elem().FieldByName(fieldName); val.Set(reflect.Zero(val.Type()))
<jam> dimitern: morning. We can do a 1:1 if you would like, though officially that's natefinch's responsibility now.
<wallyworld> but reflect.Zero() gives me "" doesn't it?
<wallyworld> rogpeppe: ah, it seems to have worked
<wallyworld> thank you. for some reason i was thinking reflect.Zero() would give me the wrong thing
<jam> wallyworld: you did "reflect.New(val.Type().Elem())"
<jam> note that Elem is an element of the pointed to type
<jam> vs
<jam> reflect.New(val.Type())
<jam> val.Type() is a pointer, val.Type().Elem() is the actual object
<jam> and the Zero of a pointer is nil
<jam> the Zero of a string is ""
<wallyworld> ah ffs, stupid mistake, thanks
<jam> (11:18:25 AM) wallyworld: reflect.New(val.Type().Elem())  gives a pointer to "" for example
<rogpeppe> yeah, New is exactly equivalent to the language primitive "new"
<dimitern> jam, oh is that so
<dimitern> jam, well, i can join the regular meeting?
<jam1> fwereade: looks like we made our N^2 problem with CharmURL worse in 1.18 because of the changes to Upgrade now watching the machine's agent version.
<jam1> This one may not matter *quite* as much in practice, if you aren't deploying multiple units to machines.
<jam1> But in my sim tests, we wake up the Upgrader even more often than we wake up the CharmURL
<fwereade> jam1, ha
<fwereade> jam1, yeah, I think we write something extra to the machine doc now -- dimitern, do I recall correctly?
<fwereade> dimitern, btw can we please undo those errors changes? I added a note to the review but it was already landed ofc
<jam1> fwereade: well, we also wake up every 15 min because the instance poller claims the machine has a new address
<fwereade> jam1, yeah, indeed
<dimitern> fwereade, I'm working on that now as a follow-up
<fwereade> jam1, I cannot figure out how to schedule those sorts of fixes though -- unless we carve out X% of time for paying down tech debt and classify it as that
<jam1> fwereade: well, if we have a client that wants us to scale to 10000 units, we can bill them for it, as well
<jam1> fwereade: ATM, I'm mostly focused on "this is where we're at"
<fwereade> jam1, I guess :)
<fwereade> jam1, clarity on that front is indeed helpful
<jam1> fwereade: "juju status" with 10k machines actually is doing ok performance wise, but nobody wants 10,000 lines of output
<fwereade> jam1, indeed
<jam1> so there are quite a few things that would need tweaking to scale to that level
<jam1> fwereade: though for *testing* purposes, the N^2 stuff bites me in the ass a lot. 'juju add-unit" to add another 100 units each to 19 machines takes: 200s, 400s, 1200s, 2800s, and I'll let you know when it finishes seconds.
<rogpeppe> jam1: 1-1?
<fwereade> jam1, yeah -- I kinda feel like those sorts of issues are... they should work properly *now*
<jam1> rogpeppe: I just need to switch machines, 1 sec
<fwereade> jam1, but, ehh, prioritisation :/
<jam> dimitern: so it looks like Canonical admin got it backwards, it's actually you on my team and roger's on nate's team.
<jam> dimitern: so I think everyone is still on the same standup for now
<dimitern> jam, what team am i supposed to be on?
<jam> dimitern: so looking at Alexis's email about Nate and Ian, you're on my team
<dimitern> jam, yeah, I thought so
<rogpeppe> jam: you've frozen...
<jam> rogpeppe: I got logged out of my google account somehow
<jam> end of month?
<rogpeppe> jam: perhaps
<perrito666> morning
<jam> mgz: 1:1? (just running to the restroom myself)
<mgz> sure, I'll wait for you there
<mgz> ...the hangout, not the restroom
<waigani> wallyworld: I can get TestUpgradeJujuWithRealUpload to pass by patching sync.BuildToolsTarball but not when I patch envtools.BundleTools
<waigani> wallyworld: here is my attempt at mocking out bundleTools: http://pastebin.ubuntu.com/7248910/
<wallyworld> what is the error?
<waigani> wallyworld: ... and http://pastebin.ubuntu.com/7248930/
<waigani> wallyworld: error uploading tools: no tools uploaded
<wallyworld> waigani: why is the bundle tools mock uploading tools as well?
<wallyworld> it shouldn't be doing that
<waigani> wallyworld: good question! I just read the logic, let me give that another go ...
<wallyworld> that is my guess as to what the error is, as there would be no metadata or anything
<waigani> wallyworld: I basically ripped the logic out of BuildToolsTarball
<wallyworld> upload tools needs the tarball and also metadata
<waigani> wallyworld: right, let me try again
<jamespage> evilnickveitch, the links on https://juju.ubuntu.com/docs/ looked foobared to me - are you aware?
<evilnickveitch> jamespage, ooh. they were working yesterday. let me have a look
<perrito666> fwereade: morning, are you around?
<evilnickveitch> jamespage, hmm. seem to be working for me - was there a particular page or link that wasn't working for you?
<jamespage> evilnickveitch, the links on the lhs of the page don't appear for me
<evilnickveitch> jamespage, the links are pasted in by a bit of javascript at the end of the page
<evilnickveitch> so either the js isn't loading
<evilnickveitch> because something is messed up on that page, or the page isn't loaded
<jamespage> hmm
<evilnickveitch> have you tried refreshing etc?
<evilnickveitch> are you sure page has finished loading? some external assets take a while to load sometimes, and the link JS is right at the end
<TheMue> evilnickveitch: quick test here on FF shows no links either
<TheMue> evilnickveitch: jamespage is right
<evilnickveitch> TheMue, jamespage okay, I guess mine was fetching from cache. i will check into it
<evilnickveitch> TheMue, jamespage okay, I found the problem, some wonky HTML which prevents the rest of the page loading, it's only on the front page, the others should work fine
<evilnickveitch> I will fix it ASAP
<TheMue> evilnickveitch: Great, thanks.
<mgz> evilnickveitch: do you not validate? :P
<evilnickveitch> mgz, it was the stupid linter that caused the problem :P
<mgz> evilnickveitch: :D
<fwereade> perrito666, sorry, I completely missed yu there
<perrito666> fwereade: happens :)
<perrito666> fwereade: still missing the transaction hooks tests but https://codereview.appspot.com/86430043 I did ignore some of your comments because they broke functionality :) but I am willing to re-try once I make sure this goes the right way (although my assert is either broken or blowing up on an existing error that was not being discovered, because I am failing 5 tests)
<fwereade> perrito666, cheers, I'll take a look
<waigani> wallyworld: I exported tools.archive: http://pastebin.ubuntu.com/7249092 (tests pass now)
<wallyworld> great, in standup, will look later
<waigani> ah
<wallyworld> had a quick look, looks nice and simple
<wallyworld> like i'd hoped
<waigani> yeah, just hope it's okay that I've made Archive public - adding noise to the API?
<waigani> anyway, I'll leave it for the review
<fwereade> perrito666, rogpeppe has a deepcopy package that may help with cloning
<rogpeppe> fwereade, perrito666: it doesn't work any more
<fwereade> rogpeppe, bah
<perrito666> fwereade: ah, might be much better than the by-hand copy I am doing ther...
<perrito666> rogpeppe: :(
<rogpeppe> it was trying to be too clever
<rogpeppe> perrito666: what are you copying?
<perrito666> rogpeppe: units and machines
<rogpeppe> perrito666: why?
<rogpeppe> perrito666: is it just for testing?
<perrito666> rogpeppe: sorry I was listening on the other side :) no, not just for testing
<perrito666> trying to get a copy that ensures it won't change while I am working on it in certain circumstances (I am just making a method of something previously done by hand)
<rogpeppe> perrito666: what are you actually trying to do?
<fwereade> rogpeppe, clone state.Machine/Unit -- I commented that it'd be nice to do it properly
<rogpeppe> fwereade: ah
<fwereade> rogpeppe, there are a few places we do it in varyingly hackish ways iirc
<evilnickveitch> TheMue, jamespage docs should be working now
<jamespage> evilnickveitch, ta - next question - do release notes get published on /docs ?
<evilnickveitch> jamespage, very good question - not as yet, but I do have a branch that will add them to the reference section. At least for the ones I can find
<evilnickveitch> Check back after 7.30pm
<jamespage> evilnickveitch, its something that ceph does quite well upstream
<evilnickveitch> jamespage, cool, I will check out what they do. I was just intending to dump them all in newest first order with an index of links at the top
<rogpeppe> fwereade, perrito666: two thoughts: 1) we could probably avoid doing a deep copy of the machineDoc, as we don't allow mutation of pieces inside its components
<fwereade> rogpeppe, I suspect that statement is only mostly accurate
<rogpeppe> fwereade, perrito666: 2) if we decided to, it would be easy (but not greatly efficient) to clone by serialising/deserialising through bson
<fwereade> rogpeppe, perrito666: ha, I could live with that
<rogpeppe> fwereade: tbh i think it's reasonable to have methods that return mutable values with a stipulation that you should not modify the contents
<rogpeppe> fwereade: (i presume you're thinking about the Jobs method here)
<rogpeppe> fwereade: if we did that, then Clone could be ultra cheap
<mgz> rogpeppe: got around to finishing the last few test failures: https://codereview.appspot.com/87540043
<rogpeppe> mgz: thanks. looking.
<rogpeppe> mgz: LGTM
<mgz> rogpeppe: thanks!
<rogpeppe> oops, upgrade-juju seems to have killed its own environment
 * rogpeppe hates it when that happens
<rogpeppe> hmm, this is the second time this morning i've had a live bootstrap fail with this error:
<rogpeppe> 2014-04-14 11:53:05 ERROR juju.cmd supercommand.go:299 cannot write file "tools/releases/juju-1.19.0.1-precise-amd64.tgz" to control bucket: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
<fwereade> rogpeppe, perrito666: if the clone were an internal method with "do not modify result" I'd be fine
<fwereade> rogpeppe, perrito666: if it's exported there's just way too much opportunity to screw up at a distance
<rogpeppe> fwereade: i'm thinking of the Jobs method only
<fwereade> rogpeppe, does Jobs not copy? it should ;p
 * perrito666 sees another ball coming his way :p
<rogpeppe> fwereade: well, if Jobs copies, then why do we need a deep copy of the machine doc?
<fwereade> rogpeppe, plenty of methods mutate bits and pieces of state
<rogpeppe> fwereade: (it doesn't, BTW, but it probably should)
<rogpeppe> fwereade: which methods mutate stuff that's pointed to by the machine doc, rather than machine doc fields themselves?
<fwereade> rogpeppe, and considering current cases misses the point; things will change and if we expose this functionality without insulating the objects frm one another we *will* screw it up
<rogpeppe> fwereade: if we make all methods return copies of the underlying data, what is there to screw up?
<rogpeppe> fwereade: AFAICS this should be fine: func (m *Machine) Clone() *Machine { m1 := *m; return &m1}
<fwereade> rogpeppe, various methods write to the document on success
<rogpeppe> fwereade: that's fine
<rogpeppe> fwereade: the document is stored in the Machine as a value type. as long as none of our Machine methods change things that are pointed to by things in the machine doc, we're ok
<fwereade> rogpeppe, we can be sure that none of them will ever change, say, a slice on the document?
<rogpeppe> fwereade: that's not a hard invariant to maintain (it's local)
<rogpeppe> fwereade: we can be sure they don't now, and it's not hard to verify that in the future
<fwereade> rogpeppe, my experience is that it's a very difficult invariant to maintain, even with a team of ultra-smart people half the size of this one
<rogpeppe> fwereade: i think it's better than adding to memory pressure and writing a bunch more code that needs to be maintained every time a field is changed.
<fwereade> rogpeppe, if we're exporting a Clone method, that clone method must deep-copy the data
<fwereade> rogpeppe, if it's not exported I'm willing to be a bit laxer
<fwereade> rogpeppe, not because it won't screw us, because it *will*
<fwereade> rogpeppe, but because at least the scope of the weirdness will be small enough that we'll have a chance of dealing with it
<rogpeppe> fwereade: tbh, i would prefer us to make Machine etc immutable
<rogpeppe> fwereade: i don't think we gain much by having methods mutate our local idea of state
<fwereade> rogpeppe, that's probably a reasonable position, especially considering current usage, but it's not really on the table at the moment
<fwereade> rogpeppe, in terms of potentially fiddly changes, errgo has a much bigger payoff ;p
 * fwereade needs to go to the airport, hadn't realised he was flying so early
 * fwereade will say hi again this evening if he can
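The shallow-vs-deep copy concern in the thread above can be sketched in a few lines. This is a hypothetical, simplified stand-in for state.Machine and its machineDoc (the real types have many more fields); it just demonstrates why rogpeppe's one-line `Clone` aliases slice-typed fields, and what the insulated version fwereade wants looks like. The bson round-trip alternative rogpeppe mentions would achieve the same insulation at higher cost.

```go
package main

import "fmt"

// machineDoc is a hypothetical, simplified stand-in for the real
// machineDoc discussed above.
type machineDoc struct {
	Id   string
	Jobs []string
}

type Machine struct {
	doc machineDoc
}

// Clone makes a shallow copy: the Jobs slice header is copied, but
// both copies still share the same backing array.
func (m *Machine) Clone() *Machine {
	m1 := *m
	return &m1
}

// CloneDeep also copies the slice contents, insulating the two
// values from each other.
func (m *Machine) CloneDeep() *Machine {
	m1 := *m
	m1.doc.Jobs = append([]string(nil), m.doc.Jobs...)
	return &m1
}

func main() {
	m := &Machine{doc: machineDoc{Id: "0", Jobs: []string{"host-units"}}}
	shallow := m.Clone()
	deep := m.CloneDeep()
	// Mutating the original's slice leaks into the shallow copy...
	m.doc.Jobs[0] = "manage-environ"
	fmt.Println(shallow.doc.Jobs[0]) // manage-environ
	// ...but not into the deep copy.
	fmt.Println(deep.doc.Jobs[0]) // host-units
}
```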
<natefinch> rogpeppe: gotta help with my daughter for a bit, probably be 45-60 mins
<rogpeppe> fwereade: where are you going?
<rogpeppe> natefinch: ok
<perrito666> rogpeppe: well have to wait until he returns to know :p
<rogpeppe> ha, i have a machine where provisioning failed (amazon says "Server.InternalError: Internal error on launch") but i can't call retry-provisioning because the machine isn't in an error state
<dimitern> rogpeppe, mgz, errors package improvements - https://codereview.appspot.com/87560043 - it's a bit big, but most changes are renames
<mgz> ...scary
<rogpeppe> dimitern: the Suffix field looks like it's not used - is it?
<rogpeppe> dimitern: similarly ArgsConstructor doesn't appear to be used
<dimitern> rogpeppe, it's used in tests only
<rogpeppe> dimitern: right. i'm not sure we need to pollute the production code with test-specific functionality.
<dimitern> rogpeppe, allErrors is unexported - how does it pollute?
<rogpeppe> dimitern: it makes the code more complex
<dimitern> rogpeppe, so you're saying let's have 2 almost identical []struct{} defined - one for testing, the other for production?
<rogpeppe> dimitern: i don't think you need the table at all in the production code - i'm just writing up a suggestion
<dimitern> rogpeppe, if it stays like this there's less chance of forgetting to add a new error type to allErrors and have it tested
<dimitern> rogpeppe, ok, thanks
<sinzui> jamespage, Do you know who I can show Bug #1305280 to get an apparmor issue addressed?
<_mup_> Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32  <armhf> <local-provider> <lxc> <packaging> <juju-core:Triaged> <apparmor (Ubuntu):New> <https://launchpad.net/bugs/1305280>
<jam> hi sinzui, I had some CI things I wanted to work with you on
<sinzui> hi jam
<jam> sinzui: specifically, looking at the log files, you're using "juju-1.18.0" to do "scp"
<jam> which is known broken for you
<jam> and we released 1.18.1 with that specific fix for you
<jam> though beyond that, "juju scp" always requires the API server to be functioning, which is what is breaking in the "upgrade" test
<jam> so it might be nice if we tried to use raw "scp" if we can.
<jam> either try to raw scp first, or try "juju scp" first and fall back
<jam> sinzui: we can get the API IP address from the environment/foo.jenv file
<sinzui> jam, yes, abentley and I discussed the fallback
<jam> (If we've ever connected successfully, we'll be caching the value there, and we'd like to get to the point where we cache it at the end of bootstrap)
<jam> sinzui: I'm trying to debug the local bootstrap problem. I haven't reproduced it yet, but I'm currently on Trusty
<sinzui> jam, and we can also update to 1.18.1 today
<jam> so I have to fire up a Precise instance first
<sinzui> jam, interesting. http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/aws-upgrade-trusty/ shows the trusty upgrade fails in parallel to precise
 * sinzui starts update and upgrade
<sinzui> jam, I can re-run an upgrade test for the cloud and series of your choice
<jam> sinzui: but that is not local
<sinzui> 1.18.1 is installed
<sinzui> now
<jam> I'm just starting with trying to fix the local-deploy issue
<jam> so I can try to not fire up a remote machine just to debug upgrade
<jam> sinzui: the main question about local right now is that probably the version of mongod running on precise is different from trusty
<jam> so while I think we also have an upgrade bug
<jam> It might be that bootstrap is failing because trusty has 2.4.9 (which works for us), and Precise is running 2.4.6 or something
<jam> I just realized my test won't work, as local under LXC doesn't work
<rogpeppe> dimitern: reviewed (kinda)
<dimitern> rogpeppe, cheers, I have the next one for you btw :) - it's tiny https://codereview.appspot.com/87470044
<rogpeppe> dimitern: i agree that the mgo/txn docs could be clearer, BTW
<dimitern> rogpeppe, no doubt about it
<rogpeppe> dimitern: LGTM
<dimitern> rogpeppe, ta!
<dimitern> rogpeppe, updated https://codereview.appspot.com/87560043 - it's nicer now I think
<rogpeppe> axw: ha, it seems that 7 maximum parallel try attempts is way too small for real world API dialling
<rogpeppe> BTW I now have a functional environment where I destroyed the bootstrap instance
 * axw tries to remember why it's 7
<axw> cool :)
 * dimitern will bbiab (1h)
<rogpeppe> axw: in my environment with 3 state servers, i see 21 addresses cached in the .jenv file...
<axw> rogpeppe: I guess I was thinking one per state server, but we will need more for each address type...
<rogpeppe> axw: and because each dial attempt takes ages to time out, we don't get to try the second valid address until it has.
<rogpeppe> axw: i'm tempted to just allow unlimited concurrent dials
<axw> rogpeppe: how do you have 21 addresses for 3 state servers?
<axw> why so many?
<rogpeppe> axw: http://paste.ubuntu.com/7249818/
<rogpeppe> axw: machine-local addresses, ipv6 addresses, etc
<axw> rogpeppe: we are ignoring the machine-local ones, right?
<mgz> that number should get filtered a little, yeah
<axw> I should know the answer to this :)
<rogpeppe> axw: no, not in api dial, because we don't currently store the metadata in the jenv
<rogpeppe> axw: that needs to be fixed
<axw> ah right, yeah
<axw> rogpeppe: so really, I think we'd have 2*state-server for both public and internal, if we had that
<rogpeppe> still, the point remains that you probably want to try dialling all your api server addresses at once, because sod's law says that the one you don't try is the only one that works
<axw> true. there's the private-inside-private scenario to cater for
<rogpeppe> axw: probably 4, because DNS-name vs numeric
<axw> rogpeppe: I was thinking public IP & name, but yes we do need to try private too
<rogpeppe> axw: yup
<jam> jamespage: sinzui: do we know why cloud-archive:tools only has juju-1.16.3 ?
<jamespage> jam: it's called a blocked SRU
<jamespage> can't get into cloud-tools before you go into saucy
<axw> rogpeppe: I guess we can do unlimited... if we get in trouble, we could try a more complicated initially-short but expanding timeout
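The "dial all addresses at once, first success wins" approach rogpeppe and axw settle on above can be sketched as follows. This is a hedged illustration, not the real api.Open: `dialOne` is a hypothetical stand-in for a single API dial attempt, and the address list is made up.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// dialOne is a hypothetical stand-in for a single API dial attempt.
// Here only one address works; the others block briefly and fail,
// simulating slow timeouts.
func dialOne(addr string) (string, error) {
	if addr != "10.0.0.2:17070" {
		time.Sleep(50 * time.Millisecond)
		return "", errors.New("connection timed out")
	}
	return "conn to " + addr, nil
}

// dialAll dials every address concurrently (no parallelism cap) and
// returns the first successful connection, or the last error if all
// attempts fail.
func dialAll(addrs []string) (string, error) {
	type result struct {
		conn string
		err  error
	}
	// Buffered so late goroutines can finish without a receiver.
	results := make(chan result, len(addrs))
	for _, addr := range addrs {
		go func(addr string) {
			conn, err := dialOne(addr)
			results <- result{conn, err}
		}(addr)
	}
	var lastErr error
	for range addrs {
		r := <-results
		if r.err == nil {
			return r.conn, nil
		}
		lastErr = r.err
	}
	return "", lastErr
}

func main() {
	conn, err := dialAll([]string{"10.0.0.1:17070", "10.0.0.2:17070", "127.0.0.1:17070"})
	fmt.Println(conn, err) // the winning connection, without waiting for the slow dials
}
```

With a fixed cap of 7 parallel attempts and 21 cached addresses, a run of slow-failing addresses delays the one that works; dialling them all at once sidesteps that, at the cost of more simultaneous connections.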
<jamespage> jam: we don't have an MRE yet so I have to detail how to test every bug in full - see bug 1277526
<_mup_> Bug #1277526: [SRU] juju-core 1.16.6 point release tracker <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Saucy):In Progress by james-page> <juju-core (Ubuntu Trusty):Fix Released> <https://launchpad.net/bugs/1277526>
<jam> jamespage: ouch. Going from 1.16.6 => 1.18.X is going to be a massive PITA for that.
<jamespage> jam: there is no SRU for 1.16.6 -> 1.18.x
<jam> jamespage: I realize there isn't (yet), but wouldn't the plan be to have the stable version of Juju in cloud-archive:tools ?
<jamespage> jam: that happens when cloud-archive:tools gets superseded/replaced by the trusty version
<jamespage> jam: actually - while I'm thinking about this - how does backup/restore work on 1.16.6?
<jamespage> I see the update-bootstrap-node stuff mgz did in the bug list
<jam> jamespage: AFAIK it works through all the 1.16's because that is what we wrote it against.
<jamespage> jam: but for 1.16.x there is no backup or restore plugin?
<jam> jamespage: I think it was added in 1.16.5 ?
<jamespage> really?
<jam> jamespage: we added it for CTS
<mgz> yeah, it was a bit of a fudge for minor version
<natefinch> rogpeppe: hey, sorry, that took a lot longer than expected, obviously.
<rogpeppe> natefinch: i'm just about to go to lunch
<natefinch> rogpeppe: ok, where are we right now?
<rogpeppe> natefinch: i've had a mostly-success with my integrated branch
<rogpeppe> natefinch: two things we need to fix: agent.Config.StateInfo needs to return localhost always
<rogpeppe> natefinch: api.Open should try all addresses concurrently
<rogpeppe> natefinch: oh, and one other one line fix
<rogpeppe> natefinch: APIWorker needs to fetch agent config again after dialling
<natefinch> rogpeppe: ok, I can start working on those.  Should I branch off your branch or just do that in a new branch off trunk?
<rogpeppe> natefinch: i'd just do new branches off trunk
<rogpeppe> natefinch: they're all trivial
<natefinch> rogpeppe: yep, ok
<jam> jamespage: "juju-local" doesn't seem to depend on rsyslog-gnutls
<jam> ah, maybe it does now, but upgrade didn't do it?
<jam> weird
<jamespage> it does
<jam> jamespage: I thought I did apt-get upgrade, but I had to "apt-get install juju-local" again to get it
<jam> jamespage: anyway, it looks like 1.18.1 does depend on it, so thanks for that, sorry about the confusion
<jamespage> np
<jam> sinzui: I'm unable to reproduce the "local bootstrap" failure with trunk and cloud-archive:tools version of mongo (2.4.6)
<jam> I see the replicaSet line, but it doesn't fail
<sinzui> jam, I don't know which bug you are working on. The lxc bug I know of is about apparmor: bug 1305280
<_mup_> Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32  <armhf> <local-provider> <lxc> <packaging> <juju-core:Invalid> <apparmor (Ubuntu):New> <https://launchpad.net/bugs/1305280>
<jam> sinzui: https://bugs.launchpad.net/juju-core/+bug/1306212
<_mup_> Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>
<jam> sinzui: since I can't reproduce that right now, I'm switching to https://bugs.launchpad.net/juju-core/+bug/1307450
<_mup_> Bug #1307450: upgrading from 1.18.1 to 1.19 (trunk) fails (API server stops responding) <ci> <juju-core:Triaged by jameinel> <https://launchpad.net/bugs/1307450>
<sinzui> jam: please do
<jam> sinzui: so offhand, we have a different bug, which is that "juju upgrade-juju --upload-tools" doesn't end up putting the tools where the agents can find them. :(
<sinzui> damn
<jam> sinzui: it looks like it uploads the tools, but doesn't make it readable
<sinzui> jam, We would be happy if local-provider honours tools-metadata-url. We want to set it to a testing stream since local has to use streams to get tools for different series
<sinzui> jam, but I won't redirect you for delivering the fastest fix
<jam> sinzui: well this is testing "juju-1.19.0 upgrade-juju --upload-tools"
<jam> which should be working, but something isn't right
<alexisb> morning all (and good evening)
<jam> sinzui: sorry I couldn't get any farther on this, but I have to EOD
<jam> wallyworld wanted to pick it up in the morning
<sinzui> Thank you for your time jam
<jam> sinzui: and I think he was the one who did the changes to "upload-tools" so he probably has better insight there
<natefinch> alexisb: morning alexis  (I think the convention is just to use the greeting relative to your own time zone... everyone knows what you mean :)
<jam> morning alexisb
<jam> you're up awfully early
<jam> sinzui: launchpad Q. If I have sensitive all-machines.log, can I upload it as a private attachment?
<sinzui> jam No private attachment :(
<jam> sinzui: fortunately VIM can global search for the secrets and replace them with XXX without too much trouble
<rogpeppe> axw: ping
<rogpeppe> alexisb: hiya
<jam> sinzui: hmm... It looks like "juju bootstrap" started creating i386 instances, and you can't "upgrade-juju --upload-tools" with a 64-bit version
<jam> it will let you, but it can't find the i386 tools (for obvious reasons)
<axw> rogpeppe: hey
<alexisb> jam, not that early 8am for me
<jam> sinzui: can you check if your 1.18.1 bootstrapped instances are i386 ?
<jam> it was for me
<jam> which is also a bug
<rogpeppe> axw: the existing code doesn't seem to mention juju-mongodb
<rogpeppe> axw: do you know how we should tell if it's available?
<rogpeppe> axw: (looking at your comments on https://codereview.appspot.com/86920043 )
<jam> alexisb: well, you were on a bit earlier, but I did the math wrong. 11 hours makes you 1 hour closer, not 1 hour farther away
<axw> rogpeppe: right. no, I don't. I guess it just hasn't been done yet - so that can be TODO
<jam> I thought it was 5:30 ish
<rogpeppe> axw: ok, cool
<axw> rogpeppe: this upgrade thing is a massive PITA. may take me a little while yet to come up with a nice solution
<rogpeppe> axw: where do the main difficulties lie?
<axw> rogpeppe: upgrade steps require API server & state, API server dies when state gets bounced
<rogpeppe> axw: don't do it in upgrade steps
<rogpeppe> axw: do it in EnsureMongo
<jam> sinzui: so I have a bit more I can try to go on tomorrow, or *maybe* later tonight depending on how things go.
<rogpeppe> axw: where we're already stopping and restarting the service
<sinzui> jam, okay. I am still looking for the arch that was used
<axw> rogpeppe: I *think* there's a problem then that server.pem may not exist
<axw> err
<axw> maybe not that one
<axw> there was another file that was created on upgrade
<jam> sinzui: I think we have a 1.18.2 Critical bug that 1.18.X no longer prefers amd64
<rogpeppe> axw: EnsureMongoServer is responsible for writing out the files that mongo requires, so we *should* be ok, i think
<axw> rogpeppe: anyway. I did start down that path... I'll keep looking into it tomorrow
<axw> ok
<rogpeppe> axw: thanks a lot
<jam> I *think* someone commented that it was because of PPC/ARM64 enablement
<jam> (we can't force amd64, so we let the cloud tell us what to use, but that means if both i386 and amd64 are available we now do i386, when we should do amd64 if possible)
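The arch-selection policy jam describes (take what the cloud offers, but prefer amd64 when it is among the choices) can be sketched like this. `chooseArch` is a hypothetical helper, not the actual juju-core code; it only illustrates the fix for the i386-by-default regression being discussed.

```go
package main

import "fmt"

// chooseArch picks an instance architecture from those the cloud
// offers, preferring amd64 when available, otherwise falling back to
// whatever is offered (hypothetical sketch of the policy discussed).
func chooseArch(available []string) string {
	for _, a := range available {
		if a == "amd64" {
			return a
		}
	}
	if len(available) > 0 {
		return available[0]
	}
	return ""
}

func main() {
	fmt.Println(chooseArch([]string{"i386", "amd64"})) // amd64, even if i386 is listed first
	fmt.Println(chooseArch([]string{"armhf"}))         // armhf: no forcing, the cloud decides
}
```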
<axw> sleepy time.. night all
<jam> sinzui: I do believe you can force it with: juju bootstrap --constraints="arch=amd64"
<rogpeppe> axw: BTW the reason for moving InitiateMongoServer into peergrouper is...
<jam> and now, I really must go spend time with my family :)
<rogpeppe> too late!
<sinzui> jam. CI has started a new round of tests. These will use 1.18.1. I will watch them for arch mismatches
<rogpeppe> natefinch: ping
<natefinch> rogpeppe: hi
<rogpeppe> natefinch: hangout?
<natefinch> rogpeppe: sure
<rogpeppe> natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-core?authuser=1
<rogpeppe> could someone have a look at this please? we've addressed comments, but it still needs a LGTM and it's a major blocker for HA. https://codereview.appspot.com/86920043/
<sinzui> jam: I don't see an arch mismatch deploying 1.18.1. CI doesn't use upload-tools when deploying stable (since upload-tools is officially a developer feature)
 * sinzui tries locally
<natefinch> dimitern, mgz, jam, ping on the review above that roger posted
<rogpeppe> trivial review anyone? https://codereview.appspot.com/87560044
<rogpeppe> dimitern, mgz, jam: ^
<dimitern> rogpeppe, looking
<rogpeppe> dimitern: ta!
<dimitern> rogpeppe, i'd swap you for https://codereview.appspot.com/87560043 :)
<rogpeppe> dimitern: will do, after i've finished investigating this issue
<dimitern> rogpeppe, sure, np - just reminding
<dimitern> rogpeppe, LGTM
<rogpeppe> dimitern: we really really need a review of https://codereview.appspot.com/86920043/ if you could muster the energy for it
<rogpeppe> dimitern: but thanks for that review too :-)
<dimitern> rogpeppe, looking that one as well
<rogpeppe> dimitern: much appreciated
<jam> rogpeppe: on https://codereview.appspot.com/87560044/ is there something about direct State destruction that we lose with your patch?
<rogpeppe> jam: no
<rogpeppe> jam: AFAIK
<rogpeppe> jam: we only connect to the API if we don't use --force, and in that case we really want to use the usual API connection methods
<dimitern> rogpeppe, natefinch, that HA CL LGTM with some trivials
<rogpeppe> dimitern: thanks muchly
<dimitern> rogpeppe, i'll poke you again about https://codereview.appspot.com/87560043 though :) (last time for today)
<rogpeppe> dimitern: ok, will look now :-)
<dimitern> rogpeppe, tyvm!
<rogpeppe> dimitern: the only comment i might have would be that it might be more idiomatic to have the error types themselves as pointer types, embedding wrapper as a value
<rogpeppe> dimitern: in fact, i think that's definitely worth doing
<rogpeppe> dimitern: because it means that %#v will work better on errors
<rogpeppe> dimitern: so: type notFound struct {wrapper}
<rogpeppe> dimitern: and func (*notFound) new( etc
<dimitern> rogpeppe, ok, that sgtm
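The pattern rogpeppe sketches above (error types used as pointers, embedding the wrapper as a value, so `%#v` shows the concrete type) might look like the following. This is a hedged, minimal reconstruction: `wrapper`, `notFound`, `NotFoundf` and `IsNotFound` are simplified stand-ins for whatever the real errors package defines.

```go
package main

import "fmt"

// wrapper is a minimal stand-in for the shared error wrapper type
// being discussed (hypothetical simplified version).
type wrapper struct {
	msg string
}

func (w *wrapper) Error() string { return w.msg }

// notFound embeds wrapper by value; the error itself is always used
// as *notFound, so type assertions and %#v see the concrete type.
type notFound struct {
	wrapper
}

// NotFoundf constructs a *notFound error (hypothetical constructor).
func NotFoundf(format string, args ...interface{}) error {
	return &notFound{wrapper{fmt.Sprintf(format+" not found", args...)}}
}

// IsNotFound is the matching satisfier predicate.
func IsNotFound(err error) bool {
	_, ok := err.(*notFound)
	return ok
}

func main() {
	err := NotFoundf("machine %q", "42")
	fmt.Println(err)             // machine "42" not found
	fmt.Println(IsNotFound(err)) // true
}
```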
<dimitern> rogpeppe, did I see LGTM as well? :)
<rogpeppe> dimitern: i really think those tests could use sorting out
<dimitern> rogpeppe, which ones?
<rogpeppe> dimitern: i've been struggling to understand the logic
<rogpeppe> dimitern: errors_test.go
<rogpeppe> dimitern: after some effort, i think i've managed to tease out a suggestion
<dimitern> rogpeppe, for each error in allErrors I add like 20ish cases
<dimitern> rogpeppe, I didn't want to repeat the same tests for all types and possibly miss something along the way
<rogpeppe> dimitern: i know, but the logic is quite a bit more complex than it needs to be
<rogpeppe> dimitern: lines 180 to 190 are really hard to follow
<rogpeppe> dimitern: and the errorSatisfier type doesn't seem to be doing much any more
<dimitern> rogpeppe, I confess I kept it only for the String() method
<rogpeppe> dimitern: yeah, it feels like a weird holdover
<rogpeppe> s/holdover/relic/
<rogpeppe> dimitern: and you don't even need the String method for what you're using it for
<dimitern> rogpeppe, I need a way to compare 2 satisfiers (== or !=) and i can't do it with func pointers it seems
<rogpeppe> dimitern: you could have two nested loops over allErrors
<dimitern> rogpeppe, isn't that worse than using reflect?
<rogpeppe> dimitern: then you just need to compare indexes (or perhaps pointers if you prefer)
<rogpeppe> dimitern: it's certainly simpler
<rogpeppe> dimitern: so i think it's better
<dimitern> rogpeppe, but I have test.satisfier and allErrors[i].satisfier
<rogpeppe> dimitern: you don't need test.satisfier
<dimitern> rogpeppe, I can't just compare them and the indexes don't matter
<rogpeppe> dimitern: the only reason you have that is that you're mixing in nil satisfier tests
<rogpeppe> dimitern: they don't really fit, and they complicate all the logic
<dimitern> rogpeppe, hmm..
<dimitern> rogpeppe, I guess I can make a separate set of tests + loop in another test case for nils
<rogpeppe> dimitern: i'd move the contextf tests into their own function too
<rogpeppe> dimitern: it's really a totally independent function
<dimitern> rogpeppe, but it needs to loop over allErrors as well
<dimitern> rogpeppe, ok, can be done separately I agree
<rogpeppe> dimitern: not necessarily
<rogpeppe> dimitern: its logic is independent of allErrors
<rogpeppe> dimitern: you do need to check that each error implements the newer interface, but that's easy to check statically
<dimitern> rogpeppe, the origin of this CL is the behavior of ErrorContextf - I need to check each error type is preserved
<rogpeppe> dimitern: fair enough. but that's a very simple test and loop over allErrors.
<dimitern> rogpeppe, yeah, but that's an implementation detail that you, as a user of Contextf, don't need to know
<dimitern> rogpeppe, exactly
<jam> natefinch: https://codereview.appspot.com/87570043/ <= log the version of mongo as we create the upstart job
<dimitern> rogpeppe, ok, I appreciate your comments and will look at it a bit later or tomorrow
 * dimitern reached eod
<rogpeppe> dimitern: np, sorry for the push-back.
<dimitern> rogpeppe, not to worry - it was useful :)
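A side note on dimitern's point earlier that two satisfiers "can't be compared with func pointers": Go func values are only comparable to nil, but their underlying code pointers can be compared via reflect. A minimal sketch (`isNotFound` and `isUnauthorized` are hypothetical stand-ins for the satisfier predicates):

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical stand-ins for two error "satisfier" predicates.
func isNotFound(err error) bool     { return false }
func isUnauthorized(err error) bool { return false }

// sameFunc reports whether two func values refer to the same
// function. `a == b` would not compile (func values are comparable
// only to nil), but reflect exposes the code pointer.
func sameFunc(a, b func(error) bool) bool {
	return reflect.ValueOf(a).Pointer() == reflect.ValueOf(b).Pointer()
}

func main() {
	fmt.Println(sameFunc(isNotFound, isNotFound))     // true
	fmt.Println(sameFunc(isNotFound, isUnauthorized)) // false
}
```

rogpeppe's suggestion of nested loops over allErrors with index comparison avoids reflect entirely, which is why he argues it is simpler.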
<jam> sinzui: just in case it wasn't clear, "juju upgrade-juju --upload-tools" failed because bootstrap picked an i386, but upload-tools can only upload the amd64 that I'm running.
<jam> so it was a combination of bug #1304407
<_mup_> Bug #1304407: juju bootstrap defaults to i386 <amd64> <apport-bug> <ec2-images> <metadata> <trusty> <juju-core:Triaged> <juju-core 1.18:Triaged> <juju-core (Ubuntu):New> <https://launchpad.net/bugs/1304407>
<jam> and bug #1282869
<_mup_> Bug #1282869: juju bootstrap --upload-tools does not honor the arch of the machine being created <bootstrap> <constraints> <ppc64el> <upload-tools> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1282869>
<sinzui> o O (clue x 4)
<jam> sinzui: so I'm going to try it again and see if I can reproduce the failing to upgrade (for the right reason)
<jam> sinzui: though it looks like bug #1282869 isn't quite complete, as we fixed "bootstrap" but not "upgrade-juju"
<_mup_> Bug #1282869: juju bootstrap --upload-tools does not honor the arch of the machine being created <bootstrap> <constraints> <ppc64el> <upload-tools> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1282869>
<jam> sinzui: I reproduced the "cannot upgrade to 1.19.0" bug: 2014-04-14 18:11:40 ERROR juju runner.go:220 worker: exited "state": cannot log in to admin database as "machine-0": unauthorized mongo access: auth fails
<jam> natefinch: ^^
<jam> rogpeppe: if you're still around, found the upgrade bug
<jam> specifically, 1.19.0 always tries to login to the "admin" db
<rogpeppe> jam: really? cool.
<jam> but on an upgrade, it doesn't have rights as machine-0
<rogpeppe> jam: oh of course, dammit
<jam> rogpeppe: so... do we back out logging into admin, do we make it "try but be ok if it fails" ?
<jam> rogpeppe: if we aren't going to do the full "upgrade support for HA" then we need to put in hacks
<rogpeppe> jam: i think we've got to do the latter
<rogpeppe> jam: otherwise HA won't work even when not upgraded
<jam> rogpeppe: so out of curiosity, why are we doing "admin := session.DB(AdminUser)" I realize the name of the db is "admin" but that shouldn't be AdminUser should it?
<rogpeppe> jam: hmm
<jam> rogpeppe: it is just that we're using "admin" as the name of the user as well as the name of the DB
<jam> mostly just a constant that "works" but isn't actually the right named constant
<rogpeppe> jam: yeah, it does seem odd
<jam> rogpeppe: k, the other Database names are just hard-coded strings in the function, so I'll follow suit for clarity
<rogpeppe> jam: sgtm
<rogpeppe> jam: personally i like hard-coded strings anyway - i think they're often clearer
<rogpeppe> jam: DB(AdminUser) does seem wrong to me. i don't know what i was thinking.
<jam> rogpeppe: is there an obvious way to remove an agent from admin? (I'd like to add a test that we come up ok when we can't access 'admin' as we'd run into after upgrade)
<rogpeppe> pwd
<jam> afaict we don't do anything with the "admin" db we just logged into
<jam> at least not directly
<jam> the other DB objects in that func are put into the State object
<rogpeppe> jam: st.db.SessionDB("admin").RemoveUser(AdminUser)
<jam> rogpeppe: thanks
<jam> well, in this case "RemoveUser(info.Tag)" aka "machine-0"
<rogpeppe> jam: yeah
<rogpeppe> jam: no, we don't do anything with the admin db
<rogpeppe> jam: but we do need access to it for manipulating the replica set
<jam> rogpeppe: right, it allows you to call particular functions *if* you're logged in
<rogpeppe> jam: yeah
<jam> side-effect is on Mongo side
<jam> rogpeppe: presumably we also need to change State.setMongoPassword to allow for AddUser on the "admin" table to fail?
<jam> or we shouldn't ever be creating one of those
<jam> since we can't be in HA we shouldn't be adding any machines that would want to
<rogpeppe> jam: yeah
<rogpeppe> jam: we should change EnsureAvailability to fail if we're not in replica set mode
<rogpeppe> jam: that way people can't get themselves into a nasty twist
<jam> rogpeppe: &mgo.LastError{Err:"not authorized to remove from admin.system.users",
<rogpeppe> jam: hmm
<rogpeppe> jam: i suppose it might have removed the user anyway
<jam> that is after trying to do:
<jam> 	adminDB := s.state.db.Session.DB("admin")
<jam> 	password := testing.FakeConfig()["admin-secret"].(string)
<jam> 	adminDB.Login(AdminUser, password)
<jam> so theoretically ensuring that I'm admin, though I need to check the err code
<jam> auth fails ...
<jam> rogpeppe: from what I can tell, TestingInitialize returns a State object that isn't actually logged into the Admin db
<jam> TestingInfo doesn't have a password
<jam> rogpeppe: what is *really* strange is that SetMongoPassword was perfectly happy, which *should* be setting the password in "admin"
<jam> so you are authed to add people, but not remove them?
<rogpeppe> jam: it does seem odd
<rogpeppe> jam: mongo has some weird semantics sometimes
<jam> rogpeppe: so I haven't figured out the password for "admin", but I have found that if I call Machine.SetMongoPassword() I can then log into "admin" as "machine-0" with the password I just gave it, and then use *those* credentials to remove the "machine-0" user.
<jam> WTWTTWW WTF?
<natefinch> lol mongo is wacky
<jam> natefinch: yeah, so $CURRENT_USER can add admins, but can't remove them, but you can add one, log in as it, and then do whatever-you-want
<jam> natefinch: apparently the model changed in mongo 2.6: http://docs.mongodb.org/manual/reference/method/db.addUser/
<jam> rogpeppe: from what I can tell, calling adminDB.RemoveUser("machine-0") removes it completely, and not just from admin
<rogpeppe> jam: ha
<rogpeppe> jam: so i guess you'll have to remove it, then add it back to the ones you want it on
<jam> rogpeppe: actually, looks like I was just screwing up the password, so I need to try again
<jam> finally, failing in the way I wanted
<jam> and now success
<rogpeppe> jam: so mongo wasn't being weird at all, in fact?
<jam> rogpeppe: well, I still have to log in as the agent I just created to delete it
<jam> that is still weird-as-fuck
<rogpeppe> jam: ah yes
<jam> but removing it from admin only removes it from admin
<sinzui> I think we want a juju-local-kvm package to sort out kvm deps. juju-local is lxc-centric
<natefinch> rogpeppe: lp:~natefinch/juju-core/043-localstateinfo
<jam> natefinch: rogpeppe: It makes me wonder if we couldn't just add ourselves if we weren't in admin to start with ...
<rogpeppe> jam: i *think* i tried that
<rogpeppe> jam: but try it anyway
<jam> rogpeppe: natefinch: sinzui: https://code.launchpad.net/~jameinel/juju-core/soft-login-failure-1307450/+merge/215742
<jam> I was able to reproduce the upgrade failure with the local provider
<jam> and that patch lets it get further
<rogpeppe> jam: codereview?
<sinzui> \o/
<jam> rogpeppe: lbox is thinking about it
<jam> sinzui: not that there won't be any other bugs, but the first one I think I got
<jam> rogpeppe: weird, still thinking
<jam> rogpeppe: https://codereview.appspot.com/87730043
<jam> rogpeppe: natefinch: I'm off to sleep, unfortunately, so if it needs tweaks, I'm sure curtis would appreciate you picking it up from here.
<rogpeppe> jam: i like "haven't implemented bug #xxxx" - sounds like we want to implement a bug...
<jam> or you can point wallyworld at it when he gets up
<natefinch> jam: cool
<rogpeppe> jam: thanks
<rogpeppe> natefinch: FWIW, the last remaining diffs that we haven't already got branches in progress for: http://paste.ubuntu.com/7251459/
<natefinch> rogpeppe: wow, that's awesome
<bac> sinzui: so swift remains dead to us.  jenkins for charmworld uses juju to update to the newly blessed code, so staging is now stuck and useless.
<sinzui> bac: yep
 * bac is sad
<sinzui> bac: our only option at this time would be to replace the stack under our personal credentials, but we also need different public IPs
<bac> sinzui: why the last part?
<bac> sorry, that was cryptic, sinzui why do we need new public IPs?  because we can't wrest them away from the current assignees?
<sinzui> bac: public IPs are not shareable or transferable between accounts
<bac> hi thumper
<thumper> o/
<bac> sinzui: oh.  can they be revoked from orange and given to us?
<sinzui> bac: if we wanted to preserve the current IPs we need to revoke them, then hope we get the same ones when we allocate new IPs
<bac> oi goi oi
<thumper> sinzui: I'm going to test bootstrapping on precise
<bac> or, dios mio as they say here
<thumper> sinzui: I have a precise machine here
<sinzui> bac: my success rate is 25% in my attempts to get an IP I had in another account
<thumper> what is our ppa for precise stuff?
<bac> sinzui: so, what's another RT to update dns in the grand scheme?
<waigani> morning all
<bac> sinzui: so it looks like i need to push a change directly to production without running on staging first.  guess i'll wait until the morning.
<sinzui> thumper, while you slept I worked out how to use debug-log
<thumper> sinzui: okay...
<sinzui> bac: yes, let's wait till the morning. I can think about how to make the machine do an update like the charm would
<sinzui> thumper, I will ping you when I would like your review.
<thumper> sinzui: oh... for docs...
<thumper> yeah, this documentation thing still slips by me ...
<thumper> sinzui: we should get a summary of the help doc into the actual command line help
<sinzui> thumper, I agree. Maybe I will make that a topic for vegas
<thumper> sinzui: for the local bootstrap test on precise
<thumper> sinzui: what is the minimum I need to install?
<sinzui> thumper, CI uses real precise + juju + juju-local
<thumper> sinzui: juju and juju-local from where?
<thumper> sinzui: also, do you know which compiler?
<sinzui> thumper, any recent. I have juju 19 and juju-local 1.18.1. I haven't changed the last package in a while
<thumper> sinzui: as I may need to build additional logging
<sinzui> thumper, good question
<thumper> sinzui: let me be clear, the precise box I have currently has no juju deps at all
 * sinzui looks
<thumper> sinzui: I'm assuming there is a ppa
<sinzui> $ apt-cache madison golang-go
<sinzui>  golang-go | 2:1.1.2-2ubuntu1~ctools1 | http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/cloud-tools/main amd64 Packages
<sinzui> thumper, and if you want very close matches to packages I can offer this...but I assure you I haven't changed local packaging since 1.18.0
<sinzui> http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/publish-revision/ws/tmp.fQ6PU5ZxX5/
<thumper> ah ha
<thumper> I have precise-updates/cloud-tools in apt
 * thumper installs juju-local
<thumper> sinzui: it seems weird to me that jam was able to boot trunk on aws but CI was not
<sinzui> thumper, I think you have that reversed
<thumper> ah... wat?
<sinzui> thumper, http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/
<sinzui> CI can deploy fine
<thumper> sinzui: what is the current state of the local provider CI tests?
<thumper> which one is that
<sinzui> jam reports that deploy is using the wrong tools. I have not seen that personally or in CI
<thumper> local-deploy is red
<sinzui> it has been broken for a few days. it is not "technically" aws as we have done this on canonistack too
<rogpeppe> hmm, this is a regrettable error when the machine in question is down (actually, its instance has been destroyed): 2014-04-14 21:32:16 ERROR juju.cmd supercommand.go:299 some agents have not upgraded to the current environment version 1.19.0.3: machine-0
<thumper> sinzui: how come the precise-updates cloud tools doesn't have 1.18?
<rogpeppe> i think there should probably be a way to force that
<rogpeppe> thumper: hiya
<thumper> or perhaps another question would be
<thumper> why doesn't my machine see it?
<thumper> hi rogpeppe
<thumper> rogpeppe: I'm wondering if the 'regrettable error' is an understatement for something?
<rogpeppe> thumper: well, it means that the environment is now broken - i cannot upgrade it
<sinzui> thumper, politics
<rogpeppe> thumper: it is an understatement, yeah
<thumper> rogpeppe: I suppose an error message that says "you're borked, sucks to be you" wouldn't be appreciated
<rogpeppe> thumper: at least then i'd know it was deliberate...
<sinzui> thumper, Ubuntu rejected 1.16.4 (they consider backup and restore a feature). jamespage is still trying to get 1.16.6 into archive for precise to ensure they can upgrade, then go to 1.18.0
<rogpeppe> thumper: it's an interesting situation actually, because usually i'd be able to do destroy-machine --force, but in this case the machine in question is a state server
<sinzui> thumper, we have never said users can upgrade from 1.16.3 to 1.18.0
<thumper> sinzui: even in the cloud-tools?
<sinzui> It's not our repo
 * rogpeppe creates a bug
<sinzui> thumper, I talked with a few people today about it. There is a chance 1.18.1 will become official in the archive when trusty is released and customers cannot upgrade to it
<rogpeppe> hmm, actually maybe it's just a bug for me at this moment
<thumper> sinzui: aargh... that is terrible
<thumper> sinzui: ok, can confirm that 1.19.0 bootstraps the local provider on my precise machine
<thumper> r2626
<thumper> which I can see fails on CI
<thumper> sinzui: so the big question now becomes, what is different?
<sinzui> thumper, well.
<sinzui> what changed in lp:juju-core r2593
<sinzui> thumper, when CI slows I can run the deploy with --debug
<thumper> sinzui: that was when the machine agent became responsible for setting up the mongo upstart script
<thumper> sinzui: can you capture the mongo logs from the CI machine?
<thumper> I wonder if this is the crash that dave had reported
<thumper> sinzui: https://bugs.launchpad.net/juju-core/+bug/1306536
<_mup_> Bug #1306536: replicaset: mongodb crashes during test <juju-core:Triaged> <https://launchpad.net/bugs/1306536>
<sinzui> bugger, CI is trying the local job more than 5 times
<sinzui> thumper, I think it is related since the logs report it http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/local-deploy/1174/console
<thumper> sinzui: there is also the mongo log file
<thumper> sinzui: /var/log/upstart/juju-db-tim-local.log is my file
<thumper> sinzui: replace <tim> with the ci user, and <local> with the env name
<sinzui> thumper, noted.
<thumper> sinzui: that way we'll get any extra crash info
<sinzui> the local upgrade test is still playing so I cannot start the deploy test
<thumper> ack
<thumper> surely if the upgrade test is running, then the local provider bootstraps?
<thumper> or is it taking a long time to fail?
<thumper> sinzui: perhaps also worth noting that my precise machine is running i386
<sinzui> thumper, 1.18.1 is good. We can bootstrap with stable, we cannot upgrade to unstable
<sinzui> We are amd64
<thumper> sinzui: http://paste.ubuntu.com/7252245/
<sinzui> I can bootstrap now
<thumper> sinzui: where?
<sinzui> On the CI machine
<thumper> ?!
<thumper> what changed?
<sinzui> thumper, this is the log of my bootstrap attempt https://pastebin.canonical.com/108508/
<sinzui> thumper, I didn't mean CI could pass bootstrap. I meant that the env was free for me to bootstrap
<sinzui> thumper, I didn't get logs in a local dir or juju-jenkins-local
<sinzui> thumper, maybe this config offends you: https://pastebin.canonical.com/108509/
 * thumper looks
<thumper> what is test-mode?
<thumper> what is bootstrap-timeout in?
<rogpeppe> oops, ensure-availability shouldn't have done *that*
<sinzui> thumper, this is mongodb-server https://pastebin.canonical.com/108510/
<thumper> sinzui: log?
<sinzui> thumper, test-mode tells the charm store to not count the deployment
<sinzui> no logs
<sinzui> ^
<sinzui> thumper, bootstrap failures don't seem to ever leave logs
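For reference, the two options thumper asks about above live in the environment's configuration; a minimal sketch of where they would sit (values and surrounding keys hypothetical — only `test-mode` and `bootstrap-timeout` come from the conversation, and bootstrap-timeout is in seconds):

```yaml
# hypothetical excerpt from an environments.yaml entry
local:
    type: local
    # tells the charm store not to count these deployments
    test-mode: true
    # how long bootstrap waits for the state server, in seconds
    bootstrap-timeout: 600
```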
<thumper> sinzui: same mongo version
<sinzui> hmm
<sinzui> thumper, I can try to tail something in another terminal while I bootstrap
<thumper> sinzui: can I log into that machine?
<sinzui> sure
<sinzui> thumper, ssh -i ./cloud-city/staging-juju-rsa jenkins@54.84.137.170
<thumper> sinzui: I don't have that identity
<sinzui> thumper, the key  is in lp:~sinzui/+junk/cloud-city
<sinzui> which is shared with you
 * thumper looks
<sinzui> That also has the env for everything we test
<thumper> sinzui: I'm in
<sinzui> thumper, export GOPATH=/var/lib/jenkins/jobs/local-deploy/workspace/extracted-bin/
<sinzui> thumper, export JUJU_HOME=~/cloud-city
 * rogpeppe has an environment that seems reasonably HA
<thumper> rogpeppe: \o/
<rogpeppe> thumper: there are still... strangenesses
<rogpeppe> thumper: but still, i destroyed the bootstrap instance and everything carried on much as usual
<wallyworld> thumper: sinzui: i am going to land john's recent branch "soft-login-failure-1307450" which fixes an issue preventing upgrade from 1.18 to .19 from working
<sinzui> rogpeppe, send me a brief summary of how you made it HA via the command line. I think I can reuse the backup restore test to instrument a failure of a machine. I expect with HA, juju status still works after the failure
<sinzui> \o/ wallyworld
<rogpeppe> sinzui: the requisite branches haven't landed yet
<wallyworld> sinzui: well, i'm going by the description - there may be other issues :-)
<rogpeppe> sinzui: there's one which isn't ready to be proposed yet
<rogpeppe> sinzui: i can push the branch that i'm testing, if you like
<rogpeppe> sinzui: essentially i did this: http://paste.ubuntu.com/7252375/
<sinzui> rogpeppe, no rush. I am busy preparing for a release and trying to get juju 1.16.6 in the cloud archive
<sinzui> rogpeppe, excellent. as I hoped
 * rogpeppe grinds to a halt
<rogpeppe> g'night all
<waigani> night rogpeppe
<rogpeppe> waigani: ttfn
<waigani> congrats on HA
<rogpeppe> it's not there yet!
<waigani> congrats on *almost* HA ;)
<sinzui> thumper, Can you read my debug-log draft at https://docs.google.com/a/canonical.com/document/d/1BXYrLC78H3H9Cv4e_4XMcZ3mAkTcp6nx4v1wdN650jw/edit
<hazmat> why does local provider try to reverse dns on the ip addresses..
 * hazmat wonders how he got dns-name: 176.52.236.23.bc.googleusercontent.com.
<hazmat> ha.. yummy!
<hazmat> rogpeppe, sinzui  if you want an additional tester for that.. send me some instructions
<sinzui> hazmat, thank you
<hazmat> smoser, you ever seen cloudinit on trusty hang..  i'm in a container.. and the last output is http://pastebin.ubuntu.com/7252494/  but its blocking the rest of the container startup (ssh, etc).
<wallyworld> sinzui: john's branch landed at r2627 so hopefully that might help the upgrade tests pass. we'll see i guess
<smoser> hazmat, can you turn cloud-init debug on.
<smoser> and get paste.
<smoser> hazmat, in /etc/cloud/cloud.cfg.d/05_logging.cfg just turn 'handler_consoleHandler' to be
<smoser> level=DEBUG
<smoser> rather than
<smoser> level=WARNING
<smoser> you should see lots more output.
<smoser> not sure how you ran that though
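The change smoser describes is a one-word edit to the console handler block; it looks roughly like this (excerpt reconstructed from memory — exact surrounding keys vary by cloud-init version):

```ini
# /etc/cloud/cloud.cfg.d/05_logging.cfg (excerpt)
[handler_consoleHandler]
class=StreamHandler
# was level=WARNING; DEBUG makes cloud-init report each stage on the console
level=DEBUG
formatter=arg0Formatter
args=(sys.stderr,)
```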
<sinzui> thumper, I think CI will start testing in 15 minutes. Do you want me to disable the local tests so that you can use the env as you like?
<thumper> sinzui: yeah, for now would be good
<thumper> just otp with alexisb
<alexisb> yes sinzui I was distracting thumper, I am done he is all yours now
<sinzui> thumper, local is all yours. Say when you are done so that I can re-enable the test
<thumper> sinzui: ok
<thumper> in poking around now
<thumper> sinzui: um...
<thumper> sinzui: hangout?
<hazmat> smoser, ack
<hazmat> smoser, it was an old version of trusty i was updating.
<hazmat> i'll see if i can reproduce and log
<sinzui> thumper, I can 40 minutes. My children want dinner
<thumper> ok
<thumper> sinzui: can in 40 minutes?
<thumper> or only for 40 minutes
<thumper> :)
#juju-dev 2014-04-15
<hazmat> smoser, interesting.. coreos guys rewrote cloudinit in go..
<smoser> i hadnt' seen that.
<hazmat> smoser, its very limited subset and assumes coreos /systemd https://github.com/coreos/coreos-cloudinit
<hazmat> its a bit much for them to call it  cloudinit... its almost zero feature set overlap
<perrito666> did anyone see fwereade after this am? (and when I say AM I mean GMT-3 AM)
<davecheney> perrito666: its unusual to see him online at this time
<perrito666> davecheney: I know, he just said that he was taking a plane and returning later and then I got disconnected
<davecheney> perrito666: ok, you probably know more than i then
<perrito666> heh tx davecheney
<hazmat> hmm.. odd /bin/sh: 1: exec: /var/lib/juju/tools/unit-mysql-0/jujud: not found
<sinzui> hazmat, looks like the last message in juju-ci-machine-0's log. Jujud just disappeared 2 weeks ago. Since that machine is the gateway into the ppc testing, we left it where it was
<sinzui> thumper, I can hangout now
<hazmat> sinzui, its odd its there.. the issue is deployer/simple.go
<hazmat> it removes the symlink on failure, but afaics that method never failed, the last line is install the upstart job, and the job is present on disk.
<thumper> sinzui: just munching
<thumper> with you shortly
 * sinzui watches ci
<hazmat> sinzui, ie its resolvable with sudo ln -s /var/lib/juju/tools/1.18.1-precise-amd64/  /var/lib/juju/tools/unit-owncloud-0
<hazmat> hmm.. its as though the removeOnErr was firing
<hazmat> even on success
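The cleanup pattern hazmat is describing can be sketched like this (helper and function names are hypothetical, modelled on the deployer/simple.go behaviour discussed above — an undo hook that should fire only on error, and can misbehave if the named error return is ever shadowed):

```go
package main

import (
	"errors"
	"fmt"
)

// removeOnErr mimics the deployer's cleanup hook: undo a step only
// when the surrounding function is returning a non-nil error.
func removeOnErr(err *error, undo func()) {
	if *err != nil {
		undo()
	}
}

// installUnit creates a (pretend) tools symlink and then installs a
// (pretend) upstart job; on failure the symlink should be removed.
func installUnit(upstartFails bool) (linkPresent bool, err error) {
	linkPresent = true // stands in for: ln -s /var/lib/juju/tools/... unit-x-0
	defer removeOnErr(&err, func() { linkPresent = false })
	if upstartFails {
		return linkPresent, errors.New("upstart install failed")
	}
	// Note: err must be the named return; if an inner block shadows
	// it with :=, the deferred cleanup sees the wrong value and can
	// fire (or not fire) unexpectedly.
	return linkPresent, nil
}

func main() {
	ok, _ := installUnit(false)
	bad, _ := installUnit(true)
	fmt.Println(ok, bad)
}
```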
 * sinzui nods
<thumper> sinzui: https://plus.google.com/hangouts/_/76cpik697jvk5a93b3md4vcuc8?hl=en
<sinzui> wallyworld, jam: looks like all the upgrade test are indeed fixed. I disabled the local-upgrade test for thumper. I will retest when I have the time or when the next rev lands
<wallyworld> \o/
<thumper> sinzui: do local upgrade and local deploy run on the same machine?
<thumper> sinzui: can't hear you
<wallyworld> sinzui: so if thumper actually pulls his finger out, we could release 1.19.0 real soon now?
<hazmat> deployer worker is a bit strange .. does it use a tombstone to communicate back to the runner?
<hazmat> thumper, when you have a moment i'd like to chat as well..
<thumper> hazmat: ack
<wallyworld> hazmat: the deployer worker is similar to most others, it is created by machine agent but wrapping it inside a worker.NewSimpleWorker
<hazmat> wallyworld, ah. thanks
<wallyworld> np. that worker stuff still confuses me each time i have to re-read the code
<hazmat> the pattern is a bit different
<hazmat> trying to figure out why i'd get 2014-04-15 00:00:42 INFO juju runner.go:262 worker: start "1-container-watcher"  .. when there are no containers.. basically my manual provider + lxc seems a bit busted with 1.18
<hazmat> also trying to figure out if on a simpleworker erroring, if the runner will just ignore it and move on.
<hazmat> with no log
<hazmat> the nutshell being deploy workloads gets that jujud not found
 * hazmat instruments
<thumper> hazmat: whazzup?
<hazmat> thumper, trying to debug 1.18 with lxc + manual
<hazmat> thumper, mostly in the backlog
<sinzui> Wow.
<sinzui> abentley replaced the mysql + wordpress charms with dummy charms that instrument and report what juju is up to. They have taken 2-4 minutes off all the tests
<sinzui> Azure deploy in under 20 minutes
<sinzui> AWS is almost as fast as HP Cloud
<davecheney> sinzui: \o/
<waigani> wallyworld: should I patch envtools.BundleTools in a test suite e.g. coretesting? Or should I copy the mocked function to each package that is failing and patch there?
<waigani> wallyworld: it's just that there seem to be a lot of tests that are all affected/fixed by this patch
<wallyworld> use s.PatchValue
<waigani> wallyworld: yep I am
<waigani> but should I do it in a more generic suite?
<wallyworld> so if the failures are clustered in a particular suite, you can use that in SetUpTest
<wallyworld> not sure it's worth doing a fixture for a one liner
<waigani> wallyworld: that is what I'm doing now, but already I've done that in about 4 packages, with more to go
<waigani> wallyworld: oh okay, you mean just patch in each individual test?
<wallyworld> possibly, depends on where the failures are
<waigani> okay, I'll do it the verbose way and we can cleanup in review if needed
<wallyworld> but if the failures are in a manageable number of suites, doing the patch in SetUpTest makes sense
<waigani> okay
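The pattern wallyworld is suggesting — swap the package-level function variable in SetUpTest and restore it afterwards — looks roughly like this (a simplified stand-in: `patchValue` mimics the shape of juju's testing PatchValue helper, and `BundleTools` here is a mock of the real envtools.BundleTools, not its actual signature):

```go
package main

import "fmt"

// BundleTools stands in for envtools.BundleTools: a package-level
// function variable that tests can swap out.
var BundleTools = func() (string, error) {
	return "built real tools", nil
}

// patchValue replaces *dst with v and returns a restore func, the
// way a suite's PatchValue registers a TearDownTest restoration.
func patchValue[T any](dst *T, v T) (restore func()) {
	old := *dst
	*dst = v
	return func() { *dst = old }
}

func main() {
	// What SetUpTest would do:
	restore := patchValue(&BundleTools, func() (string, error) {
		return "fake tools for tests", nil
	})
	s, _ := BundleTools()
	fmt.Println(s) // the patched implementation runs

	// What TearDownTest would do:
	restore()
	s, _ = BundleTools()
	fmt.Println(s) // the original is back
}
```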
<thumper> what the actual fuck!
<sinzui> wallyworld, CI hates the unit-tests on precise. Have you seen these tests fail consistently in pairs before? http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/617/console
<sinzui> ^ The last three runs on different precise instances have the same failure
<thumper> sinzui: I have some binaries copying to the machine
<wallyworld> sinzui: i haven't seen those. and one of them, TestOpenStateWithoutAdmin, is the test added in the branch i landed for john to make upgrades work
<sinzui> thank you thumper.
<wallyworld> so it seems there's a mongo/precise issue
<wallyworld> thumper: were you running some tests in a precise vm?
<thumper> wallyworld: I have a real life precise machine
<thumper> wallyworld: that it works fine
<thumper> on
<thumper> I've hooked up loggo to the mgo internals logging
<thumper> so we can get internal mongo logging out of the bootstrap command
<thumper> uploading some binaries now
<wallyworld> hmm. so what's different on jenkins then to cause the tests to fail
<thumper> not sure
<thumper> same version of mongo
<thumper> my desktop is i386
<thumper> ci is amd64
<thumper> that is all I can come up with so far
<wallyworld> if that is the cause then we're doomed
<thumper> :-)
<thumper> FSVO doomed
<wallyworld> yeah :-)
<thumper> the error is that something inside mgo is explicitly closing the socket
<thumper> when we ask to set up the replica set
<wallyworld> thumper: so, one thing it could be - HA added an admin db
<thumper> hence the desire for mor logging
<thumper> wallyworld: my binaries work locally
<thumper> and copying up
<thumper> if that is the case
<thumper> and my binaries work
<wallyworld> and the recently added test which i reference above tests that we can ignore unuath access to that db
<thumper> it could be that
 * thumper nods
<wallyworld> and that test fails
<thumper> still copying that file
 * thumper waits...
 * wallyworld waits too....
<thumper> and here I was wanting to sleep
<thumper> not feeling too flash
<wallyworld> :-(
<hazmat> thumper, sinzui fwiw.  my issue was user error around series. i have trusty containers but had registered them as precise, machine agent deployed fine, unit agents didn't like it though. unsupported usage mode.
<thumper> haha
<hazmat> thumper, conceivably the same happens when you dist-upgrade a machine
<sinzui> thumper, wallyworld: the machines that run the unit tests are amd64 m1.larges for precise and trusty. We know 95% of users deploy to amd64
<thumper> hmm...
<thumper> sinzui: right...
<sinzui> we saw numbers that showed a very small number were i386, we assume those are clients, not services
 * thumper nods
<thumper> wallyworld: can I get you to try the aws reproduction?
<thumper> wallyworld: are you busy with anything else?
<wallyworld> i am but i can
<wallyworld> what's up with aws?
<thumper> just trying to replicate the issues that we are seeing on CI with the local provider not bootstrapping
<thumper> it works on trusty for me
<thumper> and precise/i386
<thumper> but we should check real precise amd64
<wallyworld> ok, so you want to spin up an aws precise amd64 and try there
<thumper> right
<wallyworld> okey dokey
<thumper> install juju / juju-local
<wallyworld> yarp
<thumper> probably need to copy local 1.19 binaries
<thumper> to avoid building on aws
<wallyworld> right
<thumper> ugh...
<thumper> man I'm confused
<thumper> wallyworld: sinzui: using my extra logging http://paste.ubuntu.com/7253010/
<thumper> so not a recent fix issue
<wallyworld> thumper: we should just disable the replica set stuff
<wallyworld> it has broken so much
<thumper> perhaps worth doing for the local provider at least
<thumper> we are never going to want HA on local
<thumper> it makes no sense
<sinzui> closed explicitly? That's like the computer says no
<thumper> sinzui: ack
 * thumper has a call now
<sinzui> axw, Is there any more I should say about azure availability sets? https://docs.google.com/a/canonical.com/document/d/1BXYrLC78H3H9Cv4e_4XMcZ3mAkTcp6nx4v1wdN650jw/edit
<axw> sinzui: otp
<wallyworld> thumper: sinzui: i'm going to test this patch to disable the mongo replicaset setup for local provider https://pastebin.canonical.com/108522/
<wallyworld> this should revert local bootstrap to be closer to how it was prior to HA stuff being added
<wallyworld> and hence it should remove the error in thumper's log above hopefully
<axw> sinzui: can I have permissions to add comments?
<thumper> sinzui: this line is a bit suspect 2014-04-15 02:20:44 DEBUG mgo server.go:297 Ping for 127.0.0.1:37019 is 15000 ms
<thumper> sinzui: locally I have 0ms
<sinzui> sorry axw I gave all canonical write access as I intended
<axw> sinzui: ta
 * sinzui looks in /etc
<axw> sinzui: availability-sets-enabled=true by default; I'll update the notes
<thumper> wallyworld: that patch is wrong
<wallyworld> i know
<wallyworld> found that out
<wallyworld> doing it differently
<thumper> wallyworld: jujud/bootstrap.go line 165, return there if local
<wallyworld> yep
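The quick fix thumper points at — an early return in jujud/bootstrap.go when the provider is local — can be sketched like this (function name hypothetical; only the "skip replica-set setup for the local provider" decision comes from the conversation):

```go
package main

import "fmt"

// replicaSetNeeded decides whether bootstrap should initiate a mongo
// replica set. The local provider gets a plain single mongod: HA makes
// no sense there, and the replica-set initiation is what has been
// failing on precise.
func replicaSetNeeded(providerType string) bool {
	return providerType != "local"
}

func main() {
	fmt.Println(replicaSetNeeded("local"), replicaSetNeeded("ec2"))
}
```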
<axw> sinzui: I updated the azure section, would you mind reading over it to see if it makes sense to you?
<sinzui> Thank you axw. Looks great
<thumper> sinzui: wallyworld, axw: bootstrap failure with debug mgo logs: http://paste.ubuntu.com/7253155/
<thumper> sinzui: I don't know enough to be able to interpret the errors
<thumper> sinzui: perhaps we need gustavo for it
<sinzui> thanks for playing thumper
<wallyworld> sinzui: can you re-enable local provider tests in CI? i will do a branch to try and fix it and then when landed CI can tell us if it works
<thumper> sinzui: I'm done with the machine now
<sinzui> I will re-enable the tests
<wallyworld> thanks
<wallyworld> let's see if the next branch i land works
<sinzui> thumper, wallyworld . I think you had decided to disable HA on local...and how would I do HA with local...Does that other machine get proper access to my local machine that probably has died with me at the keyboard
<thumper> sinzui: you wouldn't do HA with the local provider
<thumper> :)
<wallyworld> sinzui: we are trying to set up replicaset and other stuff which is just failing with local and for 1.19 t least, i can't see why we would want that
<sinzui> :)
<wallyworld> so to get 1.19 out, we can disable and think about it later
<sinzui> wallyworld, really, I don't think we ever need to offer HA for local provider.
<wallyworld> maybe for testing
<wallyworld> but i agree with you
<wallyworld> i was being cautious in case others were attached to the idea
<wallyworld> axw: this should make local provider happy again on trunk https://codereview.appspot.com/87830044
<axw> wallyworld: was afk, looking now
<wallyworld> ta
<axw> wallyworld: reviewed
<wallyworld> ta
<wallyworld> axw: everyone hates that we use local provider checks in jujud
<wallyworld> been a todo for a while to fix
<axw> yeah, I kind of wish we didn't have to disable replicasets at all though
<axw> I know they're not needed, but if they just worked it would be nice to not have a separate code path
<wallyworld> axw: yeah. we could for 1.19.1, but we need 1.19 out the door and HA still isn't quite ready anyway
<wallyworld> it is indeed a bandaid. nate added another last week also
<axw> wallyworld: yep, understood
<wallyworld> makes me sad too though
<sinzui> wallyworld, Your hack solved local. The last probable issue is the broken unit tests for precise. I reported bug 1307836
<_mup_> Bug #1307836: Ci unititests fail on precise <ci> <precise> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1307836>
<wallyworld> sinzui: yeah, i just saw that but didn't think you'd be awake
<sinzui> I don't want to be awake
<wallyworld> i didn't realise we still had the precise issue :-(
<wallyworld> i'll look at the logs
<wallyworld> hopefully we'll have some good news when you wake up
<sinzui> wallyworld, azure-upgrade hasn't passed yet. It may not because azure is unwell this hour. We don't need to worry about a failure for azure. I can ask for a retest when the cloud is better
<wallyworld> righto
 * sinzui finds pillow
<wallyworld> good night
<davecheney>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
<davecheney>  7718 ubuntu    20   0 2513408 1.564g  25152 S  45.2 19.6   2:41.51 juju.test
<davecheney> memory usage for Go tests is out of control
<wallyworld> jam1: you online?
<jam1> morning wallyworld
<jam1> I am
<wallyworld> g'day
<wallyworld> jam1: so with your branch, and one i did, CI is happy for upgrades
<wallyworld> but
<wallyworld> a couple of tests fail under precise
<wallyworld> there's the one you added for your branch, plus TestInitializeStateFailsSecondTime
<jam1> wallyworld: links to failing tests ?
<wallyworld> the error says that a connection to mongo is unauth
<wallyworld> http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/621/consoleFull
<jam1> wallyworld: and are you able to see the local provider fail with replica set stuff, because neither Tim nor I could reproduce it.
<wallyworld> yeah, i saw it
<wallyworld> and fixed
<wallyworld> i had to disable HA for local provider
<jam1> and while we don't have to have replica set local, I'd prefer consistency and the ability to test out HA locally if we could
<wallyworld> sure
<wallyworld> but to get 1.19 out the door i went for a quick fix
<wallyworld> which we can revisit in 1.19.1
<wallyworld> curtis was ok with that
<jam1> wallyworld: so I certainly had a WTF about why I was able to create a machine in "admin" but not able to delete it without logging in as the admin I just created.
<jam1> wallyworld: so it seems like some versions of Mongo don't have that security hole
<jam1> but I can't figure out how to log in as an actual admin, but I can try digging into the TestInitialize stuff a bit more for my test.
<wallyworld> so we are using a different mongo on precise vs trusty?
<jam1> wallyworld: 2.4.6 vs 2.4.9
<wallyworld> ok, i didn't realise that
<jam1> Trusty is the one that lets you do WTF stuff.
<wallyworld> :-(
<wallyworld> there are 2 failing tests
<wallyworld> maybe more, i seem to recall previous logs showing more
<wallyworld> but the latest run had 2 failures only
<wallyworld> the other one was TestInitializeStateFailsSecondTime
<wallyworld> jam1: i gotta run to an appointment soon, but will check back when i return. if we can get this sorted, we can at least release 1.19.0 asap and deal with the workarounds for 1.19.1
<jam1> wallyworld: is your code landed?
<wallyworld> yep
<jam1> k
<wallyworld> happy to revert it if we can find a fix
<jam1> I'll pick it up
<wallyworld> thanks, i can look also but only found out about the precise tests just before and sadly i gotta duck out
<jam1> hmm... LP failing to load for me right now
<davecheney> wallyworld: CI is running an ancient version of mongo
<davecheney> that won't help
<jam1> davecheney: sinzui: I would think we should run mongo 2.4.6 which is the one you get from the cloud-archive:tools
<davecheney> jam1: agreed
<jam1> davecheney: are they running 2.2.4 from the PPA?
<davecheney> jam1: good point, 2.0 was all that shipped in precise
<jam1> I'm just trying to find a way to reproduce, and I thought there was a 2.4.0 out there for a while, but I can't find it
<jam1> and it isn't clear *what* version they are running.
<davecheney> jam1: Get:40 http://ppa.launchpad.net/juju/stable/ubuntu/ precise/main mongodb-clients amd64 1:2.2.4-0ubuntu1~ubuntu12.04.1~juju1 [20.1 MB]
<davecheney> Get:41 http://ppa.launchpad.net/juju/stable/ubuntu/ precise/main mongodb-server amd64 1:2.2.4-0ubuntu1~ubuntu12.04.1~juju1 [5,135 kB]
<davecheney> this is our fault
<davecheney> remember that old ppa
<jam1> yep, thanks for pointing me to it
<jam1> well, I can at least test with it.
<davecheney> so, that isn't the cloud archive
<davecheney> :emoji concerned face
<jam1> At one point we probably wanted to maintain compat with 2.2.4, but I'm not *as* concerned with it anymore.
<davecheney> 2.2.4 never shipped in any main archive
<davecheney> i don't think we have a duty of compatibility
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1307289/comments/1
<davecheney> if anyone cares
<davecheney> btw, go test ./cmd/juju{,d}
<davecheney> takes an age because the test setup is constantly recompiling the tools
<davecheney> why are the cmd/juju tests calling out to bzr ?
<davecheney>  FAIL: publish_test.go:75: PublishSuite.SetUpTest
<davecheney> publish_test.go:86:
<davecheney>     c.Assert(err, gc.IsNil)
<davecheney> ... value *errors.errorString = &errors.errorString{s:"error running \"bzr init\": exec: \"bzr\": executable file not found in $PATH"} ("error running \"bzr init\": exec: \"bzr\": executable file not found in $PATH")
<davecheney> what is this shit ?
<rogpeppe> mornin' all
<davecheney> https://bugs.launchpad.net/juju-core/+bug/1307865
<davecheney> this seems like an obvious failure
<davecheney> why does it only happen sporadically ?
<rogpeppe> davecheney: that's been the case for over a year (tests running bzr)
<davecheney> rogpeppe: fair enough
<rogpeppe> davecheney: i agree, that does seem odd
<jam1> rogpeppe: do we have thoughts on how we would have a Provider work that didn't have storage? I know we don't particularly prefer the HTTP Storage stuff that we have.
<rogpeppe> jam1: we'd need to provide something to the provider that enabled it to fetch tools from the mongo-based storage
<jam1> rogpeppe: so we'd have to do away with "provider-state" file as well, right?
<rogpeppe> jam1: other than that, i don't think providers rely much on storage, do they?
<jam1> rogpeppe: we use it for charms
<rogpeppe> jam1: so... provider-state is *supposed* to be an implementation detail of a given provider
<jam1> sure
<jam1> it is in the "common code" path, but you wouldn't have to use it/could make that part optional
<rogpeppe> jam1: we don't really rely on it much these days
<jam1> rogpeppe: we'd want bootstrap to cache the API creds and then we rely on it very little
<jam1> you'd lose the fallback path
<rogpeppe> jam1: yeah, and we don't want to lose that entirely
<rogpeppe> jam1: for a provider-state replacement, i'd like to see the fallback path factored out of the providers entirely
<jam1> well, it only works because there is a "known location" we can look in that is reasonably reliable. If a cloud doesn't provide its own storage, then any other location is just guesswork
<jam1> anyway, switching machines now
<rogpeppe> jam1: ok
<rogpeppe> axw: looking at http://paste.ubuntu.com/7252280/, in the first status machines 3 and 4 are up AFAICS.
<rogpeppe> axw: and that's the status that i am presuming that ensure-availability was acting on
<axw> rogpeppe: in the first one, yes, but how do you know when they went down?
<axw> rogpeppe: my point was it could have changed since you did "juju status"
<rogpeppe> axw: there was a very short time between the first status and calling ensure-availability. i don't see any particular reason for it to have gone down in that time period, although of course i can't be absolutely sure
<axw> right, that's why I asked about the log. I'm really only guessing
<rogpeppe> axw: luckily i still have all the machines up, so i can check the log
<axw> rogpeppe: I see no reason why the agent would have gone down after calling ensure-availability either
<axw> cool
<rogpeppe> axw: it would necessarily go down after calling ensure-availability, because mongo reconfigures itself and agents get thrown out
<axw> rogpeppe: for *all* machines? not just the shunned ones?
<rogpeppe> axw: yeah
<rogpeppe> axw: we could really do with some logging in ensure-availability to give us some insight into why it's making the decisions it is
<axw> yeah, fair enough
<rogpeppe> axw: here's the relevant log: http://paste.ubuntu.com/7252375/
<rogpeppe> axw: the relevant EnsureAvailability call is the second one, i think
<rogpeppe> axw: it's surprising that the connection goes down so quickly after that call
<axw> rogpeppe: wrong pastebin?
<rogpeppe> axw: ha, yes: http://paste.ubuntu.com/7253848/
<axw> rogpeppe: machine-3's API workers have dialled to machine-0's API server ...
<axw> rogpeppe: not saying that's the cause, but it's strange I think
<rogpeppe> axw: that's not ideal, but it's understandable
<rogpeppe> axw: one change i want to make is to make every environ manager machine dial the API server only on its own machine
<axw> yep
<jam> axw: rogpeppe: right, we originally only wrote "localhost" into the agent.conf. I think the bug is that the connection caching logic is overwriting that ?
<rogpeppe> jam: yeah - each agent watches the api addresses and caches them
<jam> rogpeppe: I thought when we spec'd the work we were going to explicitly skip overwriting when the agents were "localhost"
<rogpeppe> jam: but also, the first API address received by a new agent is not going to be localhost
<jam> rogpeppe: well, the thing that monitors it could just do if self.IsMaster() => localhost
<rogpeppe> jam: i don't remember that explicitly
<jam> or not run the address poller if IsMaster
<jam> sorry
<jam> IsManager
<jam> not Master
<rogpeppe> jam: i don't think it's IsMaster - i think it's is-environ-manager
<rogpeppe> jam: right
<rogpeppe> jam: i've been thinking about whether to run the address poller if we're an environ manager
<rogpeppe> s/poller/watcher/
<rogpeppe> jam: my general feeling is that it is probably worth it anyway
<rogpeppe> jam: because machines can lose their environment manager status
<rogpeppe> jam: even though we don't fully support that yet
<jam> rogpeppe: won't they get bounced under that circumstance?
<jam> anyway, we can either simplify it by what we write in agent.conf, or we could detect that we are IsManager and if so force localhost at api.Open time.
<rogpeppe> jam: they'll get bounced, but if they do we want them to know where the other API hosts are
<rogpeppe> jam: i was thinking of going for your latter option above
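[editor's note: the "force localhost at api.Open time" option discussed above could look roughly like this minimal sketch. It is not juju's actual code; the names apiAddrs, isManager, cached and apiPort are illustrative assumptions.]

```go
package main

import "fmt"

// apiAddrs picks the API addresses to dial. An environ manager always
// runs an API server on its own machine, so it can ignore the cached
// (watched) addresses and dial localhost directly.
func apiAddrs(isManager bool, cached []string, apiPort int) []string {
	if isManager {
		return []string{fmt.Sprintf("localhost:%d", apiPort)}
	}
	return cached
}

func main() {
	fmt.Println(apiAddrs(true, []string{"10.0.0.1:17070"}, 17070))
	fmt.Println(apiAddrs(false, []string{"10.0.0.1:17070"}, 17070))
}
```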
<axw> rogpeppe: I can't really see much from the logs, I'm afraid. there is one interesting thing: "dialled mongo successfully" just after FullStatus and before EnsureAvailability
<rogpeppe> axw: i couldn't glean much from them either
<rogpeppe> axw: i'm just doing a branch that adds some logging to EnsureAvailability
<rogpeppe> axw: then i'll try the live tests again to see if i can see what's going on
<axw> rogpeppe: any idea why agent-state shows up as "down" just after I bootstrap? should FullStatus be forcing a resynchronisation of state?
<rogpeppe> axw: i think it's because the presence data hasn't caught up
<axw> rogpeppe: oh. I wonder if that's it? FullStatus may be reporting wrong agent state in your test too
<rogpeppe> axw: we should definitely look into that
<rogpeppe> axw: i think that FullStatus probably sees the same agent state that the ensure availability function is seeing
<axw> rogpeppe: yeah, true
<axw> rogpeppe: https://codereview.appspot.com/88030043
<rogpeppe> axw: nice one! looking.
<axw> jam: I've reverted your change from last night that eats admin login errors; this CL adds machine-0 to the admin db if it isn't there already
<jam> axw: any chance that we could get the port from mongo rather than passing it in?
<axw> rogpeppe: this is just the bare minimum, will follow up with maybeInitiateMongoServer, etc.
<axw> jam: can do, but it requires parsing and I thought it may as well get passed in since it's already known to the caller
<jam> axw: well we can have mongo start on port "0" and dynamically allocate, rather than our current ugly hack of allocating a port, and then closing it and hoping we don't race.
<axw> jam: I assume you are referring to the EnsureAdminUserParams.Port field
<jam> axw: if it is clumsy to parse, then we can pass it in.
<axw> oh I see what you mean
<axw> umm. dunno. I will take a look
<jam> we *can* just start on port 37017, but that means other goroutines will also think that mongo is up, and for noauth stuff, we really want as little as possible to connect to it.
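[editor's note: the "allocate a port, then close it and hope we don't race" hack jam refers to is the standard listen-on-port-zero trick, sketched below. The race is that another process may bind the port between Close and mongod starting.]

```go
package main

import (
	"fmt"
	"net"
)

// allocatePort asks the kernel for a free TCP port by listening on
// port 0, reads the assigned port back, then releases it. The port is
// free at return time, but nothing prevents another process from
// grabbing it before mongod binds it -- hence the race.
func allocatePort() (int, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer l.Close()
	return l.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	port, err := allocatePort()
	if err != nil {
		panic(err)
	}
	fmt.Println(port > 0)
}
```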
<jam> axw: I always get thrown off by "upstart.NewService" because *no we don't want to create a new upstart service*
<jam> but that is just "create a new memory representation of an upstart service"
<axw> jam: heh yeah, it is a misleading name
<jam> axw: I'm not sure why upstart specifically throws me off.
<jam> as I certainly know the pattern.
<jam> axw: can "defer cmd.Process.Kill()" do bad things if the process has already died ?
<jam> axw: is it possible to do EnsureAdminUser as an upgrade step rather than doing it on all boots?
<axw> jam: if the pid got reused very quickly, yes I think so
<jam> axw: I'm not particularly worried about PID reuse that fast
<axw> jam: not really feasible as an upgrade step, as they require an API connection
<jam> I'm more wondering about a panic because the PID didn't exist
<axw> then there's all sorts of horrible interactions with workers dying and restarting all the others, etc.
<axw> jam: I'm pretty certain it's safe, but I'll double check
<wallyworld> jam: hi, any update on the precise tests failures?
<axw> jam: late Kill does not cause a panic
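[editor's note: axw's claim that a late Kill is safe can be checked with a small sketch: Kill on an already-reaped process returns an error ("process already finished") rather than panicking.]

```go
package main

import (
	"fmt"
	"os/exec"
)

// killAfterExit starts a short-lived process, waits for it to exit,
// then calls Kill on it and returns Kill's error. This mirrors the
// "defer cmd.Process.Kill()" question above.
func killAfterExit() error {
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	cmd.Wait() // reap the process; it is now finished
	return cmd.Process.Kill()
}

func main() {
	// A late Kill yields an error, not a panic.
	fmt.Println(killAfterExit() != nil)
}
```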
<jam> wallyworld: they pass with mongo 2.4.6 from cloud-archive:tools, they fail with 2.2.4 from ppa:juju/stable
<jam> on all machines that matter we use cloud-archive:tools
<jam> wallyworld: so CI should be using that one
<wallyworld> great, so we can look to release 1.19
<jam> wallyworld: and axw has a patch that replaces my change anyway.
<jam> wallyworld: the replicaset failure isn't one that I could reproduce...
<jam> since it is flaky
<wallyworld> hmmm. i hate those
<wallyworld> CI could reproduce it
<jam> wallyworld: it is *possible* we just need to wait longer, but I hate those as well :)
<axw> jam: this is what happens if you try to use "--port 0" in mongod: http://paste.ubuntu.com/7254007/
<jam> axw: bleh.... ok
<jam> I don't think we want to use the "default mongo port of 27017" so we might as well use our own since we know we just stopped the service
<rogpeppe> axw: reviewed
<axw> thanks
<rogpeppe> jam: using info.StatePort seems right to me (at least in production).
<jam> rogpeppe: for "bring this up in single user mode so we can poke at secrets and then restart it" I'd prefer it was more hidden than that, but I can live with StatePort being good-enough.
<rogpeppe> jam: if there's someone sitting on localhost waiting for the fraction of a second during which we add the admin user, i think the user is probably not going to be happy anyway
<rogpeppe> jam: note that the vulnerability is *only* to processes running on the local machine
<rogpeppe> jam: and if there are untrusted processes running on the bootstrap machine, they're in trouble anyway
<jam> rogpeppe: I'm actually more worried about the other goroutines in the existing process waking up, connecting, thinking to do work, and then getting shut down again.
<jam> rogpeppe: more from a cleanliness than a "omg we broke security" perspective
<rogpeppe> jam: what goroutines would those be?
<jam> rogpeppe: so this is more about "lets not force ourselves to think critically about everything we are doing and be extra careful that we never run something we thought we weren't". Vs "just don't expose something we don't want exposed so we can trust nothing can be connected to it."
<rogpeppe> jam: AFAIK there are only two goroutines that connect to the state - the StateWorker (which we're in, and which hasn't started anything yet) and the upgrader (which requires an API connection, which we can't have yet because the StateWorker hasn't come up yet)
<rogpeppe> jam: even if we *are* allowed to connect to the mongo, i don't think we can do anything nasty accidentally
<rogpeppe> jam: well, i suppose we could if were malicious
<axw> rogpeppe: I tested by upgrading from 1.18.1. that's good enough right?
<rogpeppe> axw: i think so, yeah
<waigani> wallyworld: branch is up: https://codereview.appspot.com/87130045 :)
<waigani> lbox didn't update the description on codereview, but did on lp??
<waigani> anyway, bedtime for me.
<waigani> night all
<natefinch> morning all
<jam> morning natefinch
<rogpeppe> axw: https://codereview.appspot.com/88080043
<rogpeppe> axw: a bit of a refactoring of EnsureAvailability - hope you approve
<wallyworld> waigani: ok
<axw> rogpeppe: cooking dinner, will take a look a bit later
<rogpeppe> jam, natefinch, mgz: review of above would be appreciated
<natefinch> rogpeppe: sure
<rogpeppe> natefinch: have you pushed your latest revision of 041-moremongo ?
<rogpeppe> natefinch: (i want to merge it with trunk, but i don't want us to duplicate that work, as wallyworld's recent changes produce fairly nasty conflicts)
<wallyworld> rogpeppe: if you can fix local provider, feel free to revert my work
<wallyworld> i only landed it to get 1.19 out the door
<wallyworld> and local provider + HA (mongo replicaets) = fail :-(
<rogpeppe> wallyworld: it seemed to work ok for me actually
<wallyworld> not for me or CI sadly
<natefinch> rogpeppe: it's pushed now
<rogpeppe> wallyworld: how did it fail?
<wallyworld> CI has been broken for days
<wallyworld> mongo didn't start
<rogpeppe> pwd
<wallyworld> hence machine agent didn't come up
<rogpeppe> wallyworld: what was the error from mongo?
<jamespage> sinzui, I think I just got an ack to use 1.16.6 via SRU to support the MRE for juju-core
<wallyworld> um, can't recall exactly, it will be in the CI logs
<jamespage> sinzui, I'll push forwards getting it into proposed this week
<wallyworld> my local dir is now blown away
<rogpeppe> wallyworld: np, just interested
<wallyworld> sorry, i should have taken better notes
<wallyworld> rogpeppe: i think that there wasn't much in the mongo logs from memory, tim had to enable extra logging
<wallyworld> he was debugging why stuff failed on precise
<wallyworld> but we know now that's due to 2.2.x vs 2.4.x
<natefinch> rogpeppe: are there tests for that EnsureAvailability code?
<rogpeppe> natefinch: yes
<natefinch> rogpeppe:  cool
<rogpeppe> natefinch: the semantics are unaffected, so the tests remain the same
<natefinch> rogpeppe:  awesome, that's what I figured.
<axw> rogpeppe: reviewed. thanks, it's a little clearer now
<rogpeppe> axw: thanks a lot
<jam> wallyworld: rogpeppe: The error I saw in CI was when Initiate went to do a replicaSet operation, it would get an Explicitly Closed message.
<jam> Note, though, that CI has been testing with mongo 2.2.4 for quite some time.
<jam> (and still is today, AFAIK, though I'm trying to push to get them to upgrade)
<rogpeppe> jam: interesting
<jam> rogpeppe: https://bugs.launchpad.net/juju-core/+bug/1306212
<_mup_> Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>
<wallyworld> yes, i do recall that was one of the errors
<jam> 2014-04-10 04:57:43 INFO juju.replicaset replicaset.go:36 Initiating replicaset with config replicaset.Config{Name:"juju", Version:1, Members:[]replicaset.Member{replicaset.Member{Id:1, Address:"10.0.3.1:37019", Arbiter:(*bool)(nil), BuildIndexes:(*bool)(nil), Hidden:(*bool)(nil), Priority:(*float64)(nil), Tags:map[string]string(nil), SlaveDelay:(*time.Duration)(nil), Votes:(*int)(nil)}}} 2014-04-10 04:58:18 ERROR juju.cmd supercommand.go:299 cannot initiat
<jam> rogpeppe: natefinch: I wrote this patch https://code.launchpad.net/~jameinel/juju-core/log-mongo-version/+merge/215656 to help us debug that sort of thing if anyone wants to review it
<wallyworld> although i'm running 2.4.9 locally and still had issues
<jam> wallyworld: interesting, as neither myself nor tim were able to reproduce it
<jam> and I tried 2.4.9 on Trusty and 2.4.6 on Precise
<jam> local bootstrap always just worked
<rogpeppe> natefinch: i've merged trunk now - you can pull from lp:~rogpeppe/juju-core/natefinch-041-moremongo
<wallyworld> all i know is that it didn't work, and then i disabled --replSet in the upstart script and it worked
<jam> though... hmmm. I did run into godeps issues once, so it is possible juju bootstrap wasn't actually the trunk I thought it was.
<wallyworld> and that also then fixed CI
<natefinch> jam: I think I've seen the explicitly closed bug once or twice.
<jam> natefinch: CI has apparently been seeing it reliably for 4+ days
<jam> wallyworld: CI passed local-deploy in r 2628 http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/local-deploy/
<jam> and now even with axw's 2629 patch
<natefinch> jam: google brings up a 2012 convo with gustavo about it where the culprit seemed to be load on mongo, but not definitively.  We should mention it to him
<jam> natefinch: given this is during bootstrap, there should be 0 load on mongo
<wallyworld> jam: 2628 was my patch to make it work
<jam> wallyworld: certainly, just mentioning CI saw it and was happy again
<wallyworld> don't worry, i was watching it :-)
<jam> natefinch: so I just checked the previous 6 attempts, and all of them failed with replica set: Closed explicitly.
<natefinch> rogpeppe: thanks
<jam> natefinch: note that 2.2.4 failed other tests that use TestInitiate
<natefinch> jam: the important part was: we should talk to Gustavo
<natefinch> (where we probably means me :)
<jam> with being unable to handle admin logins.
<natefinch> jam: interesting
<natefinch> jam: any chance we can abandon 2.2.4?
 * natefinch loves dropping support for things
<jam> natefinch: hopefully. It shouldn't be used in the field. It is only in our ppa, which means only Quantal gets it.
<jam> http://docs.mongodb.org/v2.4/reference/method/db.addUser/
<jam> says it was changed in 2.4
<jam> and is superseded in 2.6 by createUser
<jam> natefinch: and the mgo docs say we should be using UpsertUser: http://godoc.org/labix.org/v2/mgo#Database.UpsertUser
<mgz> we drop quantal support... tomorrow
<jam> natefinch: seems like mongo's security model is unstable across 2.2/2.4/2.6 which doesn't bode very well for us managing compatibility
<mgz> no, end of the week
<jam> mgz: well it wouldn't be hard to just put 2.4.6 into ppa:juju/stable
<jam> regardless of Q
<natefinch> jam: that seems wise
<jam> mgz: that would also "fix" CI, because they seem to install it from the PPA as well
<mgz> well, we'd have upgrade questions like that
<mgz> but yeah
<jam> jamespage: ^^ is it possible to get 2.4.6 into ppa:juju/stable ?
<natefinch> rogpeppe: if you want to work on that moremongo branch, I can try to get that localhost stateinfo branch in a testable state.
<rogpeppe> natefinch: ok
<rogpeppe> natefinch: what more needs to be done in the moremongo branch?
<jamespage> jam: context?
<jam> jamespage: CI and Quantal users will install MongoDB from ppa:juju/stable, but it is currently 2.2.4 which is "really old" now.
<jam> So if we could just grab the one in cloud-archive:tools (2.4.6) it would make our lives more consistent.
<jam> I believe that is the version in Saucy, and Trusty has 2.4.9
<jamespage> jam: I've pushed a no-change backport of 2.4.6 for 12.04 and 12.10 into https://launchpad.net/~james-page/+archive/juju-stable-testing
<jamespage> just to see if it works
<jamespage> I have a suspicion that its not a no-change backport
<jam> jamespage: we only really need it for P for the CI guys
<jam> since Q is going EOL
<jam> jamespage: we can potentially just point them at cloud-archive:tools if it is a problem
<jamespage> jam: that might be better
<jamespage> jam: that way they will get the best mongodb that has been released with ubuntu
<jam> jamespage: well, we need them to be testing against the version that we'll be installing more than "just the best", but given that we install from there ourselves it seems to fit.
<jam> There is a question about U
<jam> given that it won't be in cloud-tools
<jam> so we may have to do another PPA trick
<jam> natefinch: the recent failure of rogpeppe's branch in TestAddRemoveSet is interesting. It seems to be spinning on: attempting Set got error: replSetReconfig command must be sent to the current replica set primary.
<jam> context: https://code.launchpad.net/~rogpeppe/juju-core/548-destroy-environment-fix/+merge/215697
<jamespage> jam: what about U?
<jam> jamespage: in the version after Trusty, how do we install the "best for U". For Q we had to use the ppa:juju/stable because P was the only thing in cloud-archive:tools
<jam> which then got out of date
<jam> We didn't have to for S because the "best" was the same thing as in cloud-archive:tools
<jamespage> jam: the best for U will be in U
<jam> jamespage: well, it wasn't in Q
<jam> and when V comes out, it may no longer be the best for U, right?
<jamespage> jam: that's probably because 18 months ago this was all foobar
<jamespage> go juju did not exist in any meaningful way
<jam> jamespage: sure. I can just see that 2.6 is released upstream, and we may encounter another "when do we get 2.6 in Ubuntu" where the threshold is at an inconvenient point
<jamespage> jam: you must maintain 2.4 compat as that's what's in 14.04
<rogpeppe> jam, natefinch, mgz: how about this? http://paste.ubuntu.com/7254781/
<mgz> rogpeppe: seems reasonable
<mgz> I prefer interpreted values to raw dumps of fields in status
<mgz> as it's the funny mix between for-machines markup and textual output for the user
<natefinch> rogpeppe: when does a machine get into n, n?
<rogpeppe> natefinch: when it's deactivated by the peergrouper worker
<natefinch> but why would that happen?
<rogpeppe> natefinch: ok, here's an example:
<rogpeppe> natefinch: we have an active server (wantvote, hasvote)
<rogpeppe> natefinch: it dies
<rogpeppe> natefinch: we run ensure-availability
<rogpeppe> natefinch: which sees that the machine is inactive, and marks it as !wantsvote
<rogpeppe> natefinch: the peergrouper worker sees that the machine no longer wants the vote, and removes its vote
<rogpeppe> natefinch: and sets its hasvote status to n
<rogpeppe> natefinch: so our machine now has status (!wantsvote, !hasvote)
<rogpeppe> natefinch: if we then run ensureavailability again, that machine is now a candidate for having its environ-manager status removed
<rogpeppe> natefinch: alternatively, the machine might come back up again
<natefinch> I see, so hasvote is actual replicaset status, and wants vote is what we want the replicaset status to be
<rogpeppe> natefinch: yes
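[editor's note: rogpeppe's walkthrough above can be condensed into a toy model: WantsVote is the desired voting status set by ensure-availability, HasVote is the actual status applied later by the peergrouper worker. Names and functions below are illustrative, not juju's real types.]

```go
package main

import "fmt"

// machine carries the two flags discussed above.
type machine struct {
	WantsVote, HasVote bool
}

// ensureAvailabilitySeesDead models ensure-availability noticing an
// inactive machine and marking it !wantsvote.
func ensureAvailabilitySeesDead(m *machine) { m.WantsVote = false }

// peergrouperSync models the peergrouper worker reconciling the
// actual vote with the desired one.
func peergrouperSync(m *machine) { m.HasVote = m.WantsVote }

func main() {
	m := &machine{WantsVote: true, HasVote: true} // active server
	ensureAvailabilitySeesDead(m)                 // machine died, vote removal pending
	fmt.Println(m.WantsVote, m.HasVote)
	peergrouperSync(m) // worker catches up; machine is now a demotion candidate
	fmt.Println(m.WantsVote, m.HasVote)
}
```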
<natefinch> sorry gotta run, forgot it's tuesday
<jam> mgz: so the branch up for review (which is approved) actually has the errors as a prereq
<mgz> jam: yeah, I was sure there was something like that
<rogpeppe> natefinch: i've dealt with a bunch more conflicts merging trunk and pushed the result: ~rogpeppe/juju-core/natefinch-041-moremongo
<rogpeppe> natefinch: ping
 * rogpeppe goes for lunch
<natefinch> rogpeppe: sorry, just got back
<axw> rogpeppe natefinch: I'll continue looking at HA upgrade - upstart rewriting and MaybeInitiateMongoServer in the machine agent. Let me know if there's anything else I should look at
<natefinch> axw: that seems like a good thing to do for now.  the rewriting should work as-is, once we remove the line that bypasses it
<axw> natefinch: it doesn't quite, because the replset needs to be initiated too
<axw> natefinch: and that's slightly complicated because that requires the internal addresses from the environment
<natefinch> axw: you should be able to get the addresses off the instance and pass it into SelectPeerAddress, and get the right one.  That's what jujud/bootstrap.go does.  Should work in the agent, too, I'd think
<axw> natefinch: yep, the only problem is getting the Environ. the bootstrap agent gets a complete environ config handed to it; the machine agent needs to go to state
<axw> natefinch: anyway, I will continue on with that. if you think of something else I can look at next, maybe just send me an email
<natefinch> axw: will do, and thanks
<axw> nps
<rogpeppe> natefinch: that's ok
<rogpeppe> natefinch: how's localstateinfo coming along?
<rogpeppe> mgz, jam, natefinch: trivial (two line) code review anyone? fixes a sporadic test failure. https://codereview.appspot.com/88130044
<natefinch> rogpeppe: haven't gotten far this morning.  My wife should be back any minute to take the baby off my hands, which will make things go faster
<rogpeppe> natefinch: k
<jam> rogpeppe: shouldn't there be an associated test sort of change ?
<natefinch> rogpeppe: how does that change fix the test failure?
<rogpeppe> jam: the reason for the fix is a test failure
<rogpeppe> jam: i can add another test, i guess
<natefinch> ideally a test that otherwise always fails :)
<jam> rogpeppe: so this is that sometimes during teardown we would hit this and then not restart because it was the wrong type ?
<rogpeppe> jam: the test failure was this: http://paste.ubuntu.com/7255340/
<rogpeppe> jam: i'm actually not quite sure why it is sporadic
<natefinch> I see, we always expect it to be errterminateagent, but we were masking that along with other failures
<rogpeppe> natefinch: yes
<natefinch> rogpeppe: how does the defer interact with locally scoped err variables inside if statements etc?
<natefinch> maybe that's the problem?  It's modifying the outside err, but we're returning a different one
<rogpeppe> natefinch: the return value is assigned to before returning
<natefinch> ahgh right
<rogpeppe> natefinch: from http://golang.org/ref/spec#Return_statements: "A "return" statement that specifies results sets the result parameters before any deferred functions are executed."
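[editor's note: the spec rule rogpeppe quotes is what makes the deferred-error pattern work: with a named result, the return statement sets the result parameter first, so the deferred function can observe and rewrite it. A minimal sketch:]

```go
package main

import (
	"errors"
	"fmt"
)

var errSentinel = errors.New("sentinel")

// f has a named result err, so the deferred function sees the value
// set by the return statement and may replace it -- here it wraps
// every error except the sentinel, mirroring the masking discussion.
func f() (err error) {
	defer func() {
		if err != nil && !errors.Is(err, errSentinel) {
			err = fmt.Errorf("wrapped: %w", err)
		}
	}()
	return errors.New("boom") // err is assigned before the defer runs
}

func main() {
	fmt.Println(f()) // prints "wrapped: boom"
}
```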
<jam> rogpeppe: so it looks like you only run into it if you get ErrTerminate before init actually finishes
<rogpeppe> jam: i'm not sure why the unit isn't always dead for this test on entry to Uniter.init
<rogpeppe> jam: tbh i don't want to take up the rest of my afternoon grokking the uniter tests - i'll leave this alone until i have some time.
<rogpeppe> jam: (i agree that it indicates a lacking test in this area)
<jam> rogpeppe: so LGTM for the change, though it does raise the question that if we wrapped errors without dropping context it might have worked as  well :)
<rogpeppe> jam: yeah, i know
<rogpeppe> jam: but i'd much prefer it if we have a wrap function that explicitly declares the errors that can pass back
<rogpeppe> jam: then we can actually see what classifiable errors the function might be returning
<rogpeppe> jam: there are 9 possible returned error branches in that function - it's much easier to modify the function if you know which of those might be relied on for specific errors
<natefinch> rogpeppe: that would be pretty useful in a defer statement, since it would then be right next to the function definition, as well.
<rogpeppe> natefinch: perhaps
<rogpeppe> natefinch: tbh i'm not keen on ErrorContextf in general
<rogpeppe> natefinch: it just adds context that the caller already knows
<natefinch> rogpeppe: yes, I wouldn't have it change the message, just filter the types.  I don't want to have to troll through the code in a function to figure out what errors it can return
<rogpeppe> natefinch: the doc comment should state what errors it can return
<rogpeppe> natefinch: and i'd put a return errgo.Wrap(err) on each error return
<rogpeppe> natefinch: (errgo.Wrap(err, errgo.Is(worker.ErrTerminateAgent) for paths where we care about the specific error)
<rogpeppe> natefinch: i know what you mean about having the filter near the top of the function though
<natefinch> rogpeppe: btw for want/hasvote, what about : non-member, pending-removal, pending-add, member?  I feel like inactive and active sound too ephemeral, like it could change at any minute, when in fact, it's likely to be a very stable state.   But maybe I'm over thinking it.
<jam> natefinch: fwiw I like your terms better
<rogpeppe> natefinch: those terms aren't actually right, unfortunately.
<rogpeppe> natefinch: there's no way of currently telling if a machine's mongo is a member of the replica set
<rogpeppe> natefinch: even if a machine has WantVote=false, HasVote=false, it may still be a member
<rogpeppe> natefinch: basically, every state server machine will be a member unless it's down
<rogpeppe> natefinch: how about "activated" and "deactivated" instead of "active" and "inactive" ?
<natefinch> rogpeppe: isn't the intended purpose that those with y/y that they're in the replicaset?  I guess if it doesn't reflect the replicaset, what does it reflect?
<rogpeppe> natefinch: the intended purpose of those with y/y is that they are *voting* members of the replica set
<natefinch> I see
<rogpeppe> natefinch: we can have any number of non-voting members
<rogpeppe> natefinch: (and that's important)
<natefinch> member-status: non-voting, pending-unvote, pending-vote, voting?    I know unvote is not a word, but pending-non-voting is too long and confusing.
<sinzui> jamespage, I sent a reply about 1.16.4.
<jamespage> sinzui, so the backup/restore bits are not actually in the 1.16 branch?
<sinzui> jamespage, no backup
<jamespage> hmm
<natefinch> rogpeppe: check my last msg
<sinzui> restore aka update-bootstrap worked for customers who had the bash script
<sinzui> jamespage, by not installing juju-update-bootstrap, I think we can show that no new code was introduced to the system
<rogpeppe> natefinch: "not voting", "adding vote", "removing vote", "voting" ?
<perrito666> jamespage: sinzui I assigned myself https://bugs.launchpad.net/juju-core/+bug/1305780?comments=all just fyi
<_mup_> Bug #1305780: juju-backup command fails against trusty bootstrap node <backup-restore> <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1305780>
<natefinch> rogpeppe: sure, that's good
<rogpeppe> natefinch: although i'm not entirely happy with the vote/voting difference
<rogpeppe> natefinch: how about: "no vote", "adding vote", "removing vote", "has vote" ?
<sinzui> jam, wallyworld: the precise unit tests are now running with mongo from ctools. I have a set of failures. They are different from before. CI is automatically retesting. I am hopeful
<natefinch> rogpeppe: "voting, pending removal" "not voting, pending add"?  That makes it a little more clear that even though the machine is not going to have the vote in a little bit, it actually still does right now
<natefinch> (and vice versa)
<rogpeppe> natefinch: i think that it's reasonable to assume that if something says "removing x" that x is currently there to be removed
<rogpeppe> natefinch: likewise for adding
<natefinch> rogpeppe: fair enough
<rogpeppe> ha, i've just discovered that if you call any of the Patch functions in SetUpSuite, the teardown functions never get called.
<mgz> rogpeppe: heh, yeah, another reason teardown is generally dangerous
<hazmat> jam, thanks for the scale testing reports
<rogpeppe> mgz: i think we should change CleanUpSuite so it just works if you do a suite-level patch
<natefinch> rogpeppe: whoa, really?  I assumed they'd do the right thing
<rogpeppe> natefinch: uh uh
<natefinch> rogpeppe, mgz: yes, definitely.   Totally unintuitive otherwise
<rogpeppe> i won't do it right now, but i'll raise a bug
<rogpeppe> where are we supposed to raise bugs for github.com/juju/testing ?
<rogpeppe> on the github page, or on juju-core?
<natefinch> last I heard we were keeping bugs on launchpad
<natefinch> (not my idea)
<rogpeppe> natefinch: done. https://bugs.launchpad.net/juju-core/+bug/1308101
<_mup_> Bug #1308101: juju/testing: suite-level Patch never gets restored <juju-core:New> <https://launchpad.net/bugs/1308101>
<natefinch> rogpeppe: I very well may have made that mistake myself recently.
<rogpeppe> natefinch: that was what caused me to investigate
<rogpeppe> natefinch: i knew it was an error, but i thought it would get torn down at the end of the first test
<rogpeppe> natefinch: i wondered how it was working at all
<natefinch> rogpeppe: yeah, of the two likely behaviors, never getting torn down is definitely the worse of the two
<natefinch> rogpeppe: but also the one least likely to be obvious
<rogpeppe> natefinch: yup
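[editor's note: the Patch helpers under discussion boil down to save-and-restore of a package value; the bug is that the restore closure registered in SetUpSuite never runs. A framework-free sketch of the underlying mechanism (names are illustrative):]

```go
package main

import "fmt"

var timeout = 5 // package-level value a test might patch

// patch swaps in a new value and returns a restore func. The trap
// above is registering restore at per-test scope when the patch was
// made at suite scope, so it never runs (or runs too early).
func patch(target *int, value int) (restore func()) {
	old := *target
	*target = value
	return func() { *target = old }
}

func main() {
	restore := patch(&timeout, 1)
	fmt.Println(timeout) // patched value
	restore()
	fmt.Println(timeout) // original restored
}
```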
<alexisb> jam, sinzui any news on the bootstrap issue? https://bugs.launchpad.net/juju-core/+bug/1306212
<_mup_> Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>
<sinzui> alexisb, thumper and wallyworld landed a hack to remove HA from local to make tests pass...
<sinzui> alexisb, I think devs hope to fix the real bug...
<natefinch> alexisb, sinzui:  jam looked into it some, and it may be due to an old version of mongo (2.2.x) that we don't really need to support anyway... cloud archive has 2.4.6 I believe, which may solve the problem
<sinzui> natefinch, already updated the test
<alexisb> sinzui, natefinch: can we add those updates to the bug?
<sinzui> natefinch, This is the current run with mongo from ctools: http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/633/console
<alexisb> sinzui, any other critical bugs blocking the 19.0 release?
<natefinch> sinzui: that looks good
<sinzui> alexisb, I don't think of the HA removal from local as a hack. It is insane to attempt HA for a local env. I may close the bug instead of deferring it to the next release
<natefinch> sinzui: but that's with wallyworld's hack, right?  We'll need to remove that hack at some point, and it would be good not to use an old version of mongo anyway.
<sinzui> natefinch, there is a previous run with failure, but different failures than before. CI chose to retest assuming the usual intermittent failure
<alexisb> sinzui, understood
<natefinch> sinzui: the devs discussed it this morning.  Having HA on local may not actually give "HA", but it can be useful to show how juju works with HA, like you can kill a container and watch juju recover, re-ensure and watch a new one spin up, etc
<sinzui> natefinch, wallyworld's hack was about not trying to do HA setup for local.
<natefinch> sinzui: it's basically just like the rest of local.... it's not actually *useful* for much other than demos and getting your feet wet.... but it's really useful for that.
<sinzui> natefinch, alexisb unit tests are all pass
<sinzui> \0/
<sinzui> natefinch, alexisb azure is ill today and the azure tests failed. I am asking for a retest. the current revision will probably be blessed for release today
<alexisb> sinzui, awesome!
<natefinch> sinzui: what version of mongo is that running?
 * natefinch doesn't know what ctools means
 * sinzui reads the test log
<natefinch> sinzui: ahh, I see, I missed it somehow, looks like 2.4.6
<sinzui> natefinch,  1:2.4.6-0ubuntu5~ctools1
<natefinch> sinzui: happy
<jamespage> sinzui, I'm not averse to introducing a new feature - the plugins are well isolated but afaict it's not complete in the codebase
<jamespage> sinzui, if that is the case then I agree not shipping the update-bootstrap plugin does make sense
<jamespage> otherwise afaict I have no real way of providing backup/restore to 1.16 users right?
<sinzui> jamespage, They aren't complete since we know that a script is needed to get tar and mongodump to do the right thing
<jamespage> sinzui, OK - I'll drop it then
<natefinch> sinzui: is that the version of mongo we were running before wallyworld's hack?  I'd like to know if the version of mongo is the deciding factor
<sinzui> natefinch, for the unittests. that version of mongo is the fact
<sinzui> natefinch, for the local deploy, wallyworld's hack was the fix
<sinzui> and jam's fix for upgrades fixed all upgrades
<natefinch> ok
<natefinch> oh right, it was the version of mongo for upgrades that was changing how we add/remove users.
<sinzui> natefinch, CI has mongo from ctools though. all I did for test for the test harness was ensure that precise hosts add the same PPA as CI itself
<natefinch> sinzui: ok, I thought someone had mentioned this morning that CI adds the juju/stable PPA for mongo, but I may have misunderstood or they may have been wrong
<sinzui> natefinch, I added the juju stable ppa, ensured the ctools archive is added, and then manually installed it before we run make install-dependencies
<natefinch> sinzui: I believe you know what your tools are running better than some random juju dev :)
<natefinch> (and my memory thereof)
<sinzui> Azure is very ill.
<sinzui> The best I can do is manually retest hoping that I catch azure whe it is better
<natefinch> poor azure
<rogpeppe> natefinch: ping
<natefinch> rogpeppe: yo
<rogpeppe> natefinch: i've just pushed lp:~rogpeppe/juju-core/natefinch-041-moremongo/
<rogpeppe> natefinch: all tests pass
<rogpeppe> natefinch: could you pull it and re-propose -wip, please?
<natefinch> rogpeppe: sure
<rogpeppe> natefinch: i guess i could make a new CL, but it seems nicer to use the current one
<natefinch> rogpeppe: yeah
<natefinch> rogpeppe: wiping
<natefinch> rogpeppe: that should be wip-ing
<natefinch> rogpeppe: done
<rogpeppe> natefinch: ta
<rogpeppe> natefinch: i've pushed again. could you pull and then do a proper propose, please?
<rogpeppe> natefinch: and then i think we can land it
<natefinch> rogpeppe: sure
<natefinch> rogpeppe: one sec, running tests on the other branch
<jam1> natefinch: sinzui: so I don't know if changing mongo would have made CI happy without wallyworld's hack. wallyworld was the only one who has reproduced the replicaset Closed failure, and he did so on trusty running 2.4.9, so it seems like it *could* still be necessary.
<jam1> natefinch: the concern is that the code is actually not different in Local, so if it is failing there it *should* be failing elsewhere
<jam1> and maybe we just aren't seeing it yet
<natefinch> jam1: yep, also a good reason not to have local be different
<jam1> natefinch: I believe my stance on local HA is: it doesn't provide actual HA, it is good for demos, I like a common codebase, but I'm willing to slip it if we have to.
<sinzui> jam1 I think you are forgetting the unit test failed days before upgrade failed and days before local deploy failed.
<natefinch> jam1: yeah.  perfect is the enemy of good
<mgz> so, I fixed the test suite hang... but still don't understand why it actually did that.
<sinzui> I went to sleep with unit tests and azure failing (the latter is azure, not code)
<rogpeppe> i cannot get this darn branch to pass tests in the bot: https://code.launchpad.net/~rogpeppe/juju-core/548-destroy-environment-fix/+merge/215697
<rogpeppe> it's failed on replicaset.TestAddRemoveSet three times in a row now, and the changes it makes cannot have anything to do with that
<rogpeppe> let's try just one more time
<sinzui> jam, I changed the db used in http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/ #632 and got a pass in #633. The errors were different between dbs
<jam1> sinzui: so TestInitialize failing was the mongo version problem. The fact that adding --replicaSet to the mongo startup line caused local to always fail with Closed is surprising, but might be a race/load issue that local triggers that others don't.
<jam1> natefinch: a thought, we know it takes a little while for replicaset to recover
<jam1> IIRC, mongo defaults to having a 15s timeout
<jam1> natefinch: is it possible that Load pushes local over the 15s timeout?
<rogpeppe> natefinch: could you push your 043-localstateinfo branch please, so I can try to get the final branch ready for proposal?
<sinzui> jam1 I changed the db for unit tests because it didn't match the db used by local/CI; the one from ctools is the only one I will trust now for precise
<jam1> sinzui: right, the only place we actually use juju:ppa/stable is for Quantal and Raring
<natefinch> jam: yes, possible. Mongo can be sporadically really slow
<jam1> I forgot about R
<natefinch> rogpeppe: reproposed moremongo
<jam1> but I don't think 2.4 landed until Saucy
<rogpeppe> natefinch: thanks
<rogpeppe> natefinch: i'll approve it
<sinzui> jam1, the makefile disagrees with your statement
<mgz> R is no longer supported
<sinzui> jam1 the make files doesn't know about ctools
<jam1> sinzui: so what I mean is, when you go "juju bootstrap" and Q and R is the target, we add the juju ppa
<jam1> sinzui: ctools doesn't have Q and R builds
<jam1> only P
<natefinch> rogpeppe:  pushed
<rogpeppe> natefinch: ta
<jam1> sinzui: but as mgz points out, Q is almost dead, and R is dead, so we can punt
<rogpeppe> natefinch: how's it going, BTW?
<sinzui> jam1, good, because I don't test with r (obsolete) and q (obsolete in 3 days)
<jam1> sinzui: otherwise the better fix is to get 2.4.6 into the ppa
<sinzui> jam1, +1
<sinzui> jam1, I was not aware the versions were different until this morning
<jam1> sinzui: I wasn't that aware either
<jam1> sinzui: I just saw the failures and davecheney pointed out CI was using an "old" version, which I tracked down
<sinzui> I saw it too, but I haven't gotten enough sleep to see how the old version was selected
<jam1> sinzui: it is nice to see so much blue on http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/
<sinzui> jam1. I see CI has started the next revision while I was trying to get lucky with azure.
<natefinch> rogpeppe: I think most tests pass on 043-localstateinfo now.  saw some failures from state, but they looked sporadic, haven't checked them out yet
<rogpeppe> natefinch: cool. "most" ?
<mgz> rogpeppe: can I request a re-look at https://codereview.appspot.com/87540043
<rogpeppe> mgz: looking
<mgz> rogpeppe: I don't like that that fixed the hang... pretty sure it means the test is relying on actually dialing 0.1.2.3 and that failing
<natefinch> rogpeppe: I didn't let the worker tests finish because I was impatient and they were taking forever, so possible there are failures there too
<natefinch> rogpeppe:  state just passed for me
<rogpeppe> natefinch: cool
<sinzui> jam1, everyone. This is a first,  lp:juju-core r2630 passed 7 minutes after CI cursed the rev because I was forcing the retest of azure.
<rogpeppe> mgz: the code in AddStateServerMachine is kinda kooky
<sinzui> jam1. I will start preparation for the release while the current rev is tested. I will use the new rev if it gets a natural blessing
<mgz> AddStateServerMachine should probably just be removed
<rogpeppe> mgz: probably.
<mgz> it's not a very useful or used helper
<mgz> and its doc comment is wonky
<mgz> I probably shouldn't have touched it, as stuff passes without poking there
<mgz> but it came up in my grep
<natefinch> rogpeppe:  worker tests pass except for a peergrouper test - workerJujuConnSuite.TestStartStop got cannot get replica set status: cannot get replica set status: not running with --replSet
<rogpeppe> it should probably be changed to SetStateServerAddresses
<rogpeppe> mgz: as that's the reason it was originally put there
<mgz> right, something like that
<rogpeppe> mgz: but let's just land it and think about that stuff later
<mgz> I'll try it on the bot
<rogpeppe> mgz: bot is chewing on it...
<natefinch> rogpeppe: peergrouper failure was sporadic too, somehow.  All tests pass on that branch.
<sinzui> alexisb, I will release in a few hours. CI is testing a new rev that I expect to pass without intervention. I can release the previous rev which passed with extra retests
<alexisb> sinzui, awesome, thank you very much!
<mattyw> folks, has anyone seen this error before when trying to deploy a local charm? juju resumer.go:68 worker/resumer: cannot resume transactions: not okForStorage
<rogpeppe> natefinch: any chance we might get it landed today?
<rogpeppe> natefinch: i'm needing to stop earlier today
<alexisb> go rogpeppe go! :)
<rogpeppe> alexisb: :-)
 * rogpeppe has spent at least 50% of today dealing with merge conflicts
<natefinch> rogpeppe: yes
<rogpeppe> natefinch: cool
<rogpeppe> natefinch: BTW the bot is running 041-moremongo tests right now... fingers crossed
<rogpeppe> natefinch: i've just realised that it would be quite a bit nicer to have a func stateInfoFromServingInfo(info params.StateServingInfo) *state.Info, and just delete agent.Config.StateInfo
<natefinch> rogpeppe: yeah
<rogpeppe> natefinch: n'er mind, we'll plough on, i think.
<natefinch> rogpeppe: that was my thinking :)
<natefinch> awww, I think my old company finally cancelled my MSDN subscription
<rogpeppe> natefinch: i'm needing to stop in 20 mins or so. any chance of that branch being proposed before then?
<natefinch> rogpeppe: https://codereview.appspot.com/88200043
<rogpeppe> natefinch: marvellous :-)
<natefinch> :)
<rogpeppe> natefinch: reviewed
<natefinch> rogpeppe: btw, I had to rename params to attrParams because params is a package that needed to get used in the same function
<rogpeppe> natefinch: i know
<natefinch> rogpeppe: oh, I misunderstood the comment, ok
<rogpeppe> natefinch: i just suggested standard capitalisation
<natefinch> rogpeppe: yep, cool
<natefinch> rogpeppe: why is test set password not correct anymore?  It still does that, I think?
<rogpeppe> natefinch: oh, i probably missed it
<rogpeppe> natefinch: you're right, i did
<natefinch> rogpeppe: cool
<rogpeppe> natefinch: 41-moremongo is merged...
<natefinch> rogpeppe: awesome
<natefinch> 43-localstateinfo should be being merged now
<natefinch> rogpeppe: what's left?
<rogpeppe> natefinch: it's actually retrying mgz's apiaddresses_use_hostport
<rogpeppe> natefinch: i'm trying to get tests passing on my final integration branch
<natefinch> rogpeppe: nice
<rogpeppe> natefinch: currently failing because of the StateInfo changes (somehow we have a StateServingInfo with a 0 StatePort)
<rogpeppe> natefinch: would you be able to take it over from me for the rest of the day
<rogpeppe> ?
<natefinch> rogpeppe: yeah definitely
<rogpeppe> natefinch: it needs a test that peergrouper is called (i'm already mocking out peergrouper.New)
<natefinch> rogpeppe: what's the branch name?
<rogpeppe> natefinch: i haven't pushed it yet, one mo
<rogpeppe> natefinch: bzr push --remember lp:~rogpeppe/juju-core/540-enable-HA
<rogpeppe> natefinch: there are some debugging relics in there that need to be removed too
<rogpeppe> natefinch: in particular, revno 2355 (cmd/jujud: print voting and jobs status of machines) needs to be reverted and proposed separately as discussed in the standup
<natefinch> rogpeppe: ok
<rogpeppe> natefinch: i'd prioritise the other branches though
<rogpeppe> natefinch: i have to go now
<rogpeppe> g'night all
<natefinch> rogpeppe: g'night
<sinzui> natefinch, Do you have a moment to review https://codereview.appspot.com/88170045
<natefinch> sinzui: done
<sinzui> thank you natefinch
<BradCrittenden> sinzui: would you have a moment for a google hangout?
<sinzui> bac: yes
<bac> sinzui: cool.  let me set one up and invite you in a couple of minutes
<bac> sinzui: https://plus.google.com/hangouts/_/canonical.com/daily-standup
<davecheney> good morning worker ants
<perrito666> davecheney: my window says otherwise
<davecheney> perrito666: one of us is wrong
<davecheney> i'll roshambo you for it
 * perrito666 turns a very strong light on outside and says good morning to davecheney 
<davecheney> perrito666: it helps with the jet lag
<perrito666> davecheney: I traveled under 20km today, I don't have that much jetlag :p
#juju-dev 2014-04-16
<wallyworld> davecheney: hiya, who do we assign the static linking bug to in order to get the packaging fixed?
<davecheney> wallyworld: its weird
<davecheney> it should already be fixed
<davecheney> we don't do it for ppc
<davecheney> i don't see why arm64 is different
<davecheney> try jamespage or sinzui
<wallyworld> ok, ta
<wallyworld> i think that's the only issue for arm in 1.19 which is great
<mwhudson> what is this issue?
<davecheney> mwhudson: tools for arm64 are built against a dynamic libgo.so.5
<mwhudson> ah
<davecheney> so don't run if you have got that library installed on the target
<mwhudson> and this isn't the case for ppc?
<mwhudson> and both build tools with "go build"?
<davecheney> mwhudson: i know
<davecheney> that is what is whack
<davecheney> lemmie apt-get source
<mwhudson> it does seem a bit unlikely
<davecheney> golang_archs:= amd64 i386 armhf
<davecheney> ifeq (,$(filter $(DEB_HOST_ARCH), $(golang_archs)))
<davecheney> # NOTE(james-page) statically link libgo for the jujud binary for gccgo
<davecheney> # this allows the binary to be re-cut for upstream tool distribution and
<davecheney> # mimics the behaviour of the golang gc compiler.
<davecheney> JUJUD_FLAGS:= -gccgoflags -static-libgo
<davecheney> endif
<davecheney> is this a double negative ?
<davecheney> honestly we can just always pass that flag
<davecheney> it does no harm if you don't compile with gccgo
<davecheney> wallyworld: you can assign that issue to me if you like
<davecheney> i'll figure out where the packaging branch is and propose a fix
<wallyworld> ok, thanks :-)
<wallyworld> axw: azure doesn't support a root-disk constraint does it?
<axw> wallyworld: what's that constraint do again? specifies the size?
<wallyworld> yeah
<axw> wallyworld: checking. I think you specify the disk size.
<axw> wallyworld: actually it does: http://msdn.microsoft.com/en-us/library/azure/dn197896.aspx
<wallyworld> ok, cool. thanks
<axw> wallyworld: although I've just looked, and they're all 127GB
<axw> wallyworld: could change in the future I guess.
<wallyworld> yeah, ok. i am looking to allow providers to specify constraints that are unsupported
<wallyworld> so we can log a warning or perhaps even error if users attempt to use them
<wallyworld> i'll leave root-disk alone for now for azure
 * davecheney plays the 'where is the package source branch' game
<davecheney> wallyworld: i can't find the packaging branch
<wallyworld> ok, we can ask james or curtis
<davecheney> i can make a diff, attach it to the issue
<wallyworld> ok, feel free to assign back to me and i can follow up
<davecheney> done
<davecheney> no thumper ?
<waigani> davecheney: thumper is sick
<jam> morning all
<axw> morning jam
<waigani> morning jam
<davecheney> booh
<jam> waigani: that's a shame about thumper, especially since he's on vacation next week as well. I hope he gets better.
<waigani> yeah true, we better make the most of him tomorrow!
 * davecheney is on vacation in just over 24 hours
<waigani> Hopefully I'll get my current branch landed before I go
<davecheney> axw: do you know offhand if inside juju we set any signal handlers ?
<axw> davecheney: we do indeed
<axw> davecheney: why?
<davecheney> axw: right, that is what is causing juju to explode on ppc
<jam1> davecheney: we catch SIGINT during bootstrap, and IIRC SIGABRT to signal we should shut down and clean up after ourselves
<axw> davecheney: ? :(
<davecheney> axw: nearly got a working test case
<davecheney> axw: signal handling on ppc64el 64k kernels appears broke
<axw> yep, what jam1 said
<axw> wonderful
<davecheney> axw: we can fix this
<axw> ok, cool.
<davecheney> axw: whereabouts do we setup signal handlers
<davecheney> so I can crib the exact code
<axw> davecheney: worker/terminationworker is one
<axw> davecheney: cmd/cmd.go is another
<davecheney> right
<davecheney> thanks
<jam1> guys, Trunk is broken for bootstrap
<jam1> it isn't installing mongodb on EC2.
<jam1> I can see it work at juju-1.19.0, but tip is just broken
<vladk> good morning
<jam1> morning vladk
<jam1> so I worked out the bootstrap stuff. 1.19.1 expects to "apt-get install mongodb-server" during "jujud bootstrap-state" while 1.19.0 expects it to be installed during the "juju bootstrap" and cloud-init portion of the client.
<jam1> so juju-1.19.1 trying to bootstrap jujud-1.19.0 is broken
<vladk> jam1: currently the only way to bring up networks is to use:
<vladk> juju deploy --networks ... --exclude-networks
<vladk> I would move these options to --constraints option,
<vladk> so it will be possible to add networks on 'juju bootstrap' and 'juju add-machine', too.
<vladk> Also, I would always bring up all networks on MaaS nodes.
<jam1> vladk: we explicitly don't want them as constraints, because they act differently.
<jam1> with existing constraints, you can change them after deploying one instance
<jam1> and it has effects only on new instances
<jam1> but for networks
<jam1> that doesn't work well
<jam1> so we are intentionally modeling them differently.
<jam1> we do want to add it to "juju add-machine"
<jam1> I believe mgz has a card for doing so.
<jam1> I'm not sure about bootstrap, it is certainly something to discuss
<mattyw> davecheney, ping?
<vladk> jam1: what is your opinion about always activating all networks on InstanceSetup (not only when --networks specified)
<davecheney> mattyw: pong
<jam1> vladk: so getting the list of what networks should be available on the given machine, and setting them all up, even if we didn't supply --networks listing that network?
<jam1> vladk: I'm probably happier if we set up everything rather than only the ones the user supplied
<jam1> as then if you want to deploy another service, in say a container, then we know that we do have that network
<jam1> now, when we have the NetworkWorker that can do dynamic setup of networks
<jam1> it matters less
<jam1> because then we can just set up the minimum, and then add ones that we need later.
<jam1> I thought we were starting all by default.
<vladk> jam1: current dimitern's implementation is to setup them all, if --networks was specified, and none otherwise
<vladk> I suggest we set them all up on add-machine and bootstrap too
<jam1> vladk: I think we want to set them up even if you didn't supply --network, because they are (essentially) a property of the machine, not a property of the service we deployed.
<jam1> We *do* want to support a bit more SDN, where we can add and remove networks dynamically
<jam1> but until we get there, we should be setting them up
<rogpeppe> mornin' all
<axw> rogpeppe: good morning, here's a review ;)   https://codereview.appspot.com/88350043/
<rogpeppe> axw: hiya
<rogpeppe> axw: looking. (as i try to repro jam1's tip bootstrap failure)
<jam1> rogpeppe: I don't know if you saw the whole thread, but just doing "juju bootstrap" without --upload-tools is broken in trunk
<jam1> because juju-1.19.1 client expects jujud to install mongo
<jam1> but while jujud-1.19.1 will do it, 1.19.0 will not
<rogpeppe> jam1: ah, without --upload-tools
<rogpeppe> jam1: i thought we always tried to pick an identical version to bootstrap with
<axw> nope, there was a thread about it recently on juju-dev
<rogpeppe> jam1: how much do we need to preserve compatibility between dev versions?
<jam1> rogpeppe: well it also means that trunk cannot bootstrap stable (1.18) but I think we've said you always bootstrap matching Major.Minor
<jam1> we've talked about, but not implemented, Major.Minor.Patch
<jam1> certainly CI would prefer that, and some users have been surprised it wasn't the case.
<jam1> rogpeppe: I believe we can break compatibility between dev releases, but we shouldn't do so unless we have strong reason to.
<rogpeppe> jam1: i think there's a reasonable reason to here
<rogpeppe> jam1: it cleans up cloudinit a lot to move the mongo stuff out of there
<rogpeppe> jam1: if you're using tip, you should basically always use upload-tools
<rogpeppe> axw: looks like you haven't recently merged trunk into that branch
<axw> rogpeppe: probably not. things moved around a bit, but they're otherwise not that much different are they?
<rogpeppe> axw: i think that's worth doing, as you will unfortunately encounter a few conflicts, and i'd prefer to review with them resolved
<rogpeppe> axw: EnsureMongoServer has changed quite a bit
<axw> rogpeppe: okay no worries
<rogpeppe> hmm, params.MachineStatus and params.MachineInfo could really do with some reconciliation
<rogpeppe> each one has info that the other does not
<wallyworld> jamespage: hi, you around?
<jamespage> wallyworld, I am
<wallyworld> jamespage: looks like there's a packaging bug for arm bug 1308263
<_mup_> Bug #1308263: /var/lib/juju/tools/1.19.0.1-trusty-arm64/jujud: error while loading shared libraries: libgo.so.5: cannot open shared object file: No such file or directory <hs-arm64> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1308263>
<wallyworld> the wrong compiler options are being used
<wallyworld> are you able to fix the packaging branch? dave cheney looked but couldn't find it
<jamespage> wallyworld, I disagree
<jamespage> wallyworld, I'll look now
<jamespage> wallyworld, https://launchpadlibrarian.net/172662879/buildlog_ubuntu-trusty-arm64.juju-core_1.18.1-0ubuntu1_UPLOADING.txt.gz
<jamespage> that's the build log for arm64
<jamespage> you can quite clearly see that static-libgo is being used
<wallyworld> ok
<wallyworld> jamespage: i'll talk to dave tomorrow
<jamespage> wallyworld, this looks like a problem with whatever branch is used for building PPA packages
<jamespage> wallyworld, this is not a problem in the distro packages
<wallyworld> ok
<wallyworld> thanks for looking
<jamespage> wallyworld, let me dig a bit
<wallyworld> ok, i'm about to go out for dinner so much appreciated
<jamespage> sinzui, where do you keep the packaging you use for the PPA builds?
<wallyworld> davecheney: hey, you just joined, i asked jamespage about bug 1308263
<_mup_> Bug #1308263: /var/lib/juju/tools/1.19.0.1-trusty-arm64/jujud: error while loading shared libraries: libgo.so.5: cannot open shared object file: No such file or directory <hs-arm64> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1308263>
<wallyworld> the build logs show the correct options are being used
<wallyworld> davecheney: https://launchpadlibrarian.net/172662879/buildlog_ubuntu-trusty-arm64.juju-core_1.18.1-0ubuntu1_UPLOADING.txt.gz
<wallyworld> so more investigation needed
<wallyworld> but i notice that log is for 1.18
<wallyworld> it is 1.19 which has the issue
<wallyworld> jamespage: could there be a 1.19 build issue? as opposed to 1.18.1?
<wallyworld> and something else has just occurred to me - the bug says 1.19.0.1 which implies they used upload-tools
<jamespage> wallyworld, no its the packaging
<jamespage> wallyworld, it makes no attempt to use static-libgo
<wallyworld> and upload-tools doesn't use those compile options
<wallyworld> so i think there is also a juju bug - we need to use the correct compile options for upload-tools on arm
<davecheney> jamespage: wallyworld oh
<davecheney> i know what happened
<davecheney> oh
<davecheney> no
<davecheney> hang on
<davecheney> only jujud is built with a static lib go
<davecheney> the other cmds aren't
<davecheney> but their packaging depends on libgo.deb
<davecheney> as an install dependency
<davecheney> jamespage: wallyworld one thing, the build recipe is a bit weird, it says only use -gccgoflags -static-libgo for armhf,386 and amd64
<davecheney> so, not for ppc/arm64
<davecheney> jamespage: will you accept my patch to unconditionally pass that flag ?
<davecheney> it's harmless if you are using the gc toolchain
<davecheney> the option is just ignored
<jamespage> davecheney, there is not a bug in the ubuntu packaging in distro - it's working exactly as designed
<jamespage> davecheney, I don't know where the packaging for the ppa's comes from - sinzui would know best
 * wallyworld has a dinner appointment, sorry gotta run
<jamespage> I don't think its done from a recipe any longer
<davecheney> jamespage: ok
<davecheney> we'll keep looking
<davecheney> jam: ubuntu@winton-06:/var/lib/juju$ sudo du -sh *
<davecheney> 16K     agents
<davecheney> 1.2G    db
<davecheney> juju keeps doing this
<davecheney> the mongodb goes ape shit and consumes gigs of space
<perrito666> morning
<davecheney> perrito666: you again!
<davecheney> and now you can't even tell what time it is
<jam1> morning perrito666
<raywang> hello guys,  do you know how to disable the debug info from juju debug-log?  it's too much info to sort out useful info
<perrito666> davecheney: to be honest its still night :p
<jam1> raywang: you can change the default logging with "juju help logging"
<jam1> though I thought Debug was disabled by default
<jam1> unless you did "juju bootstrap --debug"
 * jam1 switches machines
<raywang> jam1, ok, let me check it, thanks :)
<raywang> jam1, well, there is no juju help logging..
<natefinch> rogpeppe: sorry I didn't email about your branch.  I had trouble figuring out what state it was in.  It seemed to be out of date with trunk and with a bunch of conflicts, so I don't know what was going on there.
<rogpeppe> natefinch: ok
<jam1> raywang: there is in juju-1.18, that might be new since 1.16
<jam1> yep
<raywang> jam1, yeah, i'm using 1.16
<natefinch> jam1: we should put that in help topics or something, it doesn't show up anywhere under juju help or juju help topics
<raywang> jam1, just wondering how to disable the debug mode in juju debug-log output :)
<rogpeppe> simple review anyone? (code movement only - finally removing state/statecmd): https://codereview.appspot.com/88130047
<natefinch> rogpeppe: I can look
<vladk> mgz: please, take a look: https://codereview.appspot.com/88380044
<natefinch> rogpeppe:  lgtm'd
<rogpeppe> natefinch: ta
<mgz> vladk: sure
<jam> who's a man gotta threaten to get a trivial review: https://codereview.appspot.com/87570043/
<natefinch> jam:  looking
<jam> natefinch: while you're there: https://code.launchpad.net/~jameinel/juju-core/api-endpoints-from-cache-1268470/+merge/216058
<jam> that one's a bit less trivial
<perrito666> jam: you got a regex matching \n$ is that correct?
<mgz> it's fine, oddly
<jam> perrito666: "." doesn't match "\n"
<jam> so you have to do (.|\n) for multiline
<perrito666> ah, didn't have that one, sweet
<natefinch> jstandup
<jam> rogpeppe: wallyworld: standup ?
<jam> rogpeppe: so 1.18.0 *doesn't* return HostPorts, so we do have to maintain compatibility there.
<rogpeppe> jam: yup - but i think that because we always add the dialled address, that it will work ok even then
<rogpeppe> pwd
<jam> pwd to you too :)
<rogpeppe> :-)
<rogpeppe> trivial code review anyone? https://codereview.appspot.com/88430044
<rogpeppe> natefinch, jam, mgz: ^
<natefinch> rogpeppe: looking
<jam> rogpeppe: LGTM
<natefinch> rogpeppe: when does it happen that we get a stateport of 0?
<rogpeppe> natefinch: when the state is created
<rogpeppe> jam: thanks
<natefinch> man, I would pay a lot of money to stop being sick.  Damn cold has only gotten worse for like 8 days now.
<rogpeppe> wallyworld: ping
 * rogpeppe goes for lunch
<rogpeppe> natefinch: hangout?
<wallyworld> rogpeppe: hiya
<rogpeppe> wallyworld: yo!
<rogpeppe> wallyworld: i was just looking at jujud.MachineWithCharmsSuite
<wallyworld> ok
<rogpeppe> wallyworld: and wondering whether it might be possible to integrate TestManageEnvironRunsCharmRevisionUpdater with MachineSuite
<wallyworld> could be. i don't recall offhand that test suite
<wallyworld> let me look
<rogpeppe> wallyworld: it looks awkward though because of the relationship with charmrevisionupdater/testing/CharmSuite
<rogpeppe> wallyworld: i *think* that the reason it was separated was that both suites embed JujuConnSuite
<wallyworld> i must admit i don't recall
<rogpeppe> wallyworld: ok, fair enough
<rogpeppe> wallyworld: it all seems a bit twisty and it had your name on it, so thought i may as well ask :-)
<wallyworld> i'd have to grok the code again
<wallyworld> i wonder if i added the code from scratch or just refactored
<wallyworld> but feel free to change whatever needs fixing
<rogpeppe> wallyworld: ta
<rogpeppe> wallyworld: looks like it was probably your code from scratch, but i can't be sure
<wallyworld> could be. i'll have to re-read it to remember what was done
<wallyworld> but i'm not emotionally attached to it so feel free to dip in and change stuff
<natefinch> rogpeppe: sorry, busy with the kids for the next 30-45 minutes.
<rogpeppe> natefinch: ok
<sinzui> jamespage, The trusty package was made with lp:~juju-qa/juju-core/devel-packaging , The other series were made with lp:~juju-qa/juju-core/devel-mongodb-packaging
<jamespage> wallyworld, ^^
<wallyworld> jamespage: ok, so that might explain that bug?
<jamespage> wallyworld, those branches don't match the d/rules in trusty no
<jamespage> wallyworld, but the PPA's currently only build for armhf and x86 anyway
<wallyworld> not arm64?
<jamespage> wallyworld, not as far as I am aware of
<wallyworld> hmmm. that needs to be fixed i guess
 * wallyworld knows little about packaging
<jam> rogpeppe: rotate connection to front: https://codereview.appspot.com/88470043
<rogpeppe> jam: thanks, will look in a mo
<jam> I'm off, but I'm likely to stop by in the evening.
<natefinch> rogpeppe: back now
<rogpeppe> natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-core?authuser=1
<rogpeppe> mgz, natefinch, jam, wallyworld, axw: trivial review anyone? (just updating dependencies.tsv) https://codereview.appspot.com/88160049
<mgz> rogpeppe: looking
<mgz> wait, what happened to loggo...
<rogpeppe> mgz: good question - someone never added it, i guess
<mgz> no, it was there
<mgz> I bet the 1.18 merge lost it
<mgz> nope, still there on that...
<mgz> wait, it's there in trunk
<mgz> ah, I see
<mgz> we have it as github.com/juju/loggo
<mgz> rogpeppe: you added it again as github.com/loggo/loggo
<mgz> which is actually the right location?
<rogpeppe> mgz: ah - i guess somewhere in the code is using the old path
<rogpeppe> mgz: that needs to be fixed
<mgz> state/apiserver/usermanager/usermanager.go
<mgz> rogpeppe: can you just fix that in this branch?
<rogpeppe> mgz: am doing
<mgz> (I can do a separate one, but you'll need to remove the extra loggo dep line anyway)
<mgz> rogpeppe: star
<rogpeppe> mgz: review? https://codereview.appspot.com/88490043
<rogpeppe> natefinch: https://codereview.appspot.com/88490043
<mgz> wow, no pleaselook comment
<mgz> okay, that CharmSuite change bends my knowledge of go
<rogpeppe> natefinch: lp:~rogpeppe/juju-core/540-enable-HA/
<rogpeppe> mgz: :-)
<rogpeppe> mgz: what are you having difficulty with?
<mgz> what implications does a struct having another... what's even the word... inherited, bar the convenience of having names direct on the object?
<natefinch> embedded
<mgz> there's no auto initialise of anything anyway, is there anything else?
<mgz> embedded, thanks.
<rogpeppe> mgz: an embedded struct is just a field
<natefinch> direct access to the methods
<rogpeppe> mgz: and the type that embeds it gets all its methods
<mgz> okay, just that. thanks!
<rogpeppe> mgz: it's just the convenience of having names direct on the object, that's it
<natefinch> the suites are special because of gocheck doing reflection to look for methods called SetupSuite etc
<rogpeppe> natefinch: that's true, but it's always possible to define your own SetupSuite and make it call something else, so the embedding thing is still just a convenience
<natefinch> rogpeppe: right
<axw> rogpeppe: I just reproposed the HA upgrade branch; ignore it please, I've broken something
<rogpeppe> axw: sure
<rogpeppe> mgz: i've updated the dependencies.tsv branch: https://codereview.appspot.com/88160049
<mgz> rogpeppe: consider it stamped
<rogpeppe> mgz: ta
<natefinch> mgz: how do I branch someone else's branch into a colocated bzr branch on my local machine?
<rogpeppe> natefinch:
<rogpeppe> 553-stateservinginfo-validity
<rogpeppe> 555-jujud-charmsuite-use-commonmachinesuite
<mgz> natefinch: native colo?
<natefinch> mgz: yeah, I use bzr switch between colocated branches
<mgz> `bzr branch lp:~USER/PROJ/BRANCH co:BRANCH`
<natefinch> awesome, thanks
<mgz> with --switch if you want to switch to it with one command
<natefinch> mgz: wow, that is abnormally really really really slow  (compared to normally branching)
<mgz> natefinch: ...might not be doing what you want?
<mgz> I'd only expect slow if the branches had no common history
<natefinch> mgz: actually, I just did it in the wrong directory, which probably confused the hell out of BZR
<mgz> or if you missed the co: so it was creating a new branch in a subdir
<mgz> natefinch: right... not too harmful though
<mgz> I generally just ctrl+c if I notice it's doing too much work, generally something like that
<mgz> the problem of colo working, harder to keep track of where you are
 * rogpeppe is off for the day. should make it for the meeting tomorrow, but probably not earlier than that
<mgz> rogpeppe: later!
 * rogpeppe has a wedding to go to...
<natefinch> rogpeppe: good luck, hope she's worth it ;)
<perrito666> sinzui: hello, the comments on https://bugs.launchpad.net/juju-core/+bug/1305780?comments=all are you testing with the same backup/restore test I used for the other restore bug?
<_mup_> Bug #1305780: juju-backup command fails against trusty bootstrap node <backup-restore> <juju-core:Triaged by hduran-8> <https://launchpad.net/bugs/1305780>
<hazmat> anyone around to help debug a user issue.. he's on 1.18.. there's a panic in his machine-0.log (amd64)  http://paste.ubuntu.com/7262521/
<hazmat> also discussing with him #juju
<natefinch> hazmat: I'm around
<hazmat> natefinch, basically he can't connect to his state server from client.. more interestingly is the panic in the state server log
<hazmat> in that pastebin
<perrito666> hazmat: did he/she try the kvm module suggestion from the error? (yes, I know, stupid question but has to be asked :) )
<natefinch> hazmat: looking
<hazmat> perrito666, not sure that its related
<hazmat> perrito666, natefinch what does look interesting is the ip he's trying to connect to is discovered by juju after the state server has bound all ip addresses. shouldn't matter though as he's also restarted it by hand
<perrito666> hazmat: most likely not, but can cause extra entropy in the output :s
<hazmat> 192.168.1.2
<hazmat> perrito666, there's lot of output entropy
<hazmat> perrito666, natefinch but the panic in that pastebin link above is a there is a separate issue
 * hazmat grammar fails
<natefinch> hazmat: the panic is interesting.  implies we called the stateworker multiple times...
<natefinch> hazmat: I think the panic is an effect, not a cause
<hazmat> natefinch, the cause issue he's having is networking connectivity i suspect.. the panic  is a separate unrelated issue which also needs looking at
<natefinch> hazmat: yep, ok
<perrito666> anyone seen this one before? (error: no "trusty" images in az-1.region-a.geo-1 with arches [amd64 armhf i386])
<natefinch> hmm no
<perrito666> :( life
<natefinch> man I hate tests that test huge parts of the system all at once.  They're such a gigantic time sink whenever anything changes.
<perrito666> natefinch: tell me about it :p
<jam1> hazmat: the close of closed channel might have come up with the Upgrader code having access to call EnsureStateWorker at the same time that the normal StateWorker would start up from the APIWorker. At least ISTR someone patching the Upgrader logic to deal with double startup. I thought that was in 1.18, though.
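jam1's theory above is a "close of closed channel" panic caused by two code paths (the Upgrader and the APIWorker) both trying to start or stop the state worker. A minimal sketch of the usual guard, using sync.Once so the second call is harmless; all names here are hypothetical, not juju's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// stateWorker simulates a worker that signals shutdown by closing a channel.
type stateWorker struct {
	done     chan struct{}
	stopOnce sync.Once
}

func newStateWorker() *stateWorker {
	return &stateWorker{done: make(chan struct{})}
}

// Stop closes done exactly once; a bare close(w.done) reached from two
// call sites would panic with "close of closed channel".
func (w *stateWorker) Stop() {
	w.stopOnce.Do(func() { close(w.done) })
}

func main() {
	w := newStateWorker()
	w.Stop()
	w.Stop() // safe: sync.Once suppresses the second close
	<-w.done // does not block once the channel is closed
	fmt.Println("stopped")
}
```

The same double-start hazard exists for any "ensure X is running" entry point reachable from more than one worker, which is why patching the Upgrader logic (as jam1 recalls) would matter.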
#juju-dev 2014-04-17
<davechen1y> waigani: /o
<davechen1y> o/
<waigani> davechen1y: hiya
<davechen1y> waigani: whatya working on ?
<davechen1y> at the 11th hour, i'm not quite sure what I should work on
<davechen1y> sinzui: there won't be another release into trusty at this stage, right ?
<waigani> davechen1y: adding tests for the mock functions BundleTools etc
<davechen1y> waigani: SGTM
<davechen1y> maybe i'll look at some of those race detector failures
<waigani> davechen1y: writing tests forces me to actually understand what the functions are meant to do and their consequences
<waigani> davechen1y: so I've been going a bit slow as I rethink / re-understand things
<sinzui> davechen1y, I doubt it. We might rebuild 1.18.1 if there are linking issues. I think there will be a 1.18.2 before we do 1.20. I expect 1.18.2 to get accepted into trusty
<waigani> my IRC has been playing up, just noticed that I was not connected all morning
<davechen1y> sinzui: it was not clear what the state of the linking issue was
<davechen1y> is arm64 the only build affected ?
<sinzui> I don't know. jamespage asked about my packaging branches. We can rebuild to fix bugs easily
<davechen1y> sinzui: can you point me to the package branch
<davechen1y> please
<sinzui> davechen1y, , The trusty package was made with lp:~juju-qa/juju-core/devel-packaging , The other series were made with lp:~juju-qa/juju-core/devel-mongodb-packaging
<davechen1y> sinzui: ta
<davechen1y> /win13
<thumper> hmm...
 * thumper seems to have a test failure on trunk
<davechen1y> thumper: what do you see
<thumper> davechen1y: bootstrap test failure
 * davechen1y has many test failures
 * thumper is running them all
<davechen1y> god damn store errors
<davechen1y> is that branch ever going to be merged ?
<davechen1y> $ rm -rf store/
<davechen1y> fixed
<thumper> heh
<thumper> I haven't double checked dependencies for a while
<thumper> should do that
<thumper> menn0: oh hai
<thumper> davechen1y: does the -exec flag for 'go test' in 1.3 allow for different test runners?
<davechen1y> thumper: possibly
<davechen1y> actually no
<davechen1y> it was added so we could insert the nacl loader
<davechen1y> so go run x.go
<davechen1y> compiles x.go
<davechen1y> then rather than calling ./x
<davechen1y> it calls $(thing you execed) ./x
<davechen1y> thumper: what do you mean by test runner
<thumper> I mean an alternative way to run and format outputs
<thumper> for example, being able to output subunit format
<thumper> so you can then hook into things like testrepository
<thumper> davechen1y, waigani, axw, wallyworld: BootstrapSuite.TestTest in cmd/juju fails for me on trunk  paste.ubuntu.com/7265017/ can anyone else reproduce?
<axw> thumper: heh, proposing a fix now :)
<thumper> could be a precise vs trusty test failure
<thumper> axw: ah cool
<thumper> thanks
<davechen1y> thumper: what the hell
<davechen1y> never seen that before
<axw> nps, I guess it was a package update that triggered it
<thumper> davechen1y: we upload different tools for different versions
<axw> davechen1y: what's your "distro-info --lts" say?
<thumper> davechen1y: could be an error from that
<axw> if it says trusty, then it will (should?) break the test
<axw> thumper: https://codereview.appspot.com/88730043
<davechen1y> $ distro-info --lts
<davechen1y> trusty
<davechen1y> axw: my tests are running
<davechen1y> ... they take a while
<axw> davechen1y: go test -gochech.v -gocheck.f BootstrapSuite.TestTest launchpad.net/juju-core/cmd/juju
<axw> gocheck.v*
<davechen1y> $ go test -gocheck.v -gocheck.f BootstrapSuite.TestTest launchpad.net/juju-core/cmd/juju
<davechen1y> OK: 0 passed
<davechen1y> PASS
<davechen1y> ok      launchpad.net/juju-core 0.079s
<davechen1y> oh
<davechen1y> 0 tests ran :)
<axw> ehh
<davechen1y> oh cmd/juju and cmd/jujud don't pass on ppc
<davechen1y> they take > 600 seconds
<axw> ah, heh :)
<menn0> thumper: g'day. just lurking :)
<thumper> menn0: I'm just writing you an email :)
<davechen1y> machine_test.go:849: c.Assert(err, gc.IsNil)
<davechen1y> ... value *errors.errorString = &errors.errorString{s:"cannot assign unit \"s0/0\" to machine 1: machine \"1\" cannot host units"} ("cannot assign unit \"s0/0\" to machine 1: machine \"1\" cannot host units")
<davechen1y> I get this failure all the time
<menn0> thumper: ok cool
<davechen1y> axw: $ go test -gocheck.v -gocheck.f BootstrapSuite launchpad.net/juju-core/cmd/juju
<davechen1y> OK: 0 passed
<davechen1y> i don't think that is the test you mean
<davechen1y> PASS
<davechen1y> ok      launchpad.net/juju-core 0.077s
<axw> davechen1y: lemme check what I actually did
<axw> davechen1y: sorry, package needs to come directly after "go test"
 * davechen1y would like to remove all the use of version.Current in our test
<davechen1y> what we _actually_ mean is version.Current.Binary
<davechen1y> ie, make tools of the current development version
<davechen1y> but the use of the larger version.Current means the series of the host that is running the tests leaks into the test
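davechen1y's complaint is that version.Current carries the host's series, so expectations built from it differ between precise and trusty hosts. A toy illustration of the leak and the fix (pinning the series in the test); the Binary struct and toolsName here are invented for the example, only loosely mirroring juju's version package:

```go
package main

import "fmt"

// Binary mirrors the shape of juju's version information: the agent
// version number plus the series and arch it was built for.
type Binary struct {
	Number string
	Series string
	Arch   string
}

// Current stands in for version.Current; in juju it is initialised from
// the host, which is how the host's series leaks into test expectations.
var Current = Binary{Number: "1.19.1", Series: hostSeries(), Arch: "amd64"}

func hostSeries() string {
	// On a real host this would be read from the OS release files.
	return "trusty"
}

// toolsName is the kind of value a test might assert on; built from
// Current directly, the expected string changes with the host.
func toolsName(b Binary) string {
	return fmt.Sprintf("%s-%s-%s", b.Number, b.Series, b.Arch)
}

func main() {
	// A test that pins the series is host-independent:
	pinned := Current
	pinned.Series = "precise"
	fmt.Println(toolsName(pinned))
}
```

This is the gist of "what we _actually_ mean is version.Current.Binary": use only the fields the test really depends on.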
<axw> davechen1y: but actually I just ran from cmd/juju
<axw> davechen1y: and now I'm not sure specifying it on the command line works
 * axw stops trying to be helpful
<axw> thumper: I know why it broke now, if you care: distro-info returns different values depending on the date
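The BootstrapSuite breakage came from shelling out to distro-info, whose answer changes with the date. One common remedy, sketched here with hypothetical names, is to expose the lookup as a patchable variable so tests pin the value instead of inheriting whatever the host reports that day:

```go
package main

import "fmt"

// latestLTS stands in for the call that shells out to `distro-info --lts`.
// Making it a package variable lets tests substitute a fixed answer.
var latestLTS = func() string { return "trusty" }

// defaultSeries is the code under test: it derives a default from the
// host's notion of the latest LTS.
func defaultSeries() string {
	return latestLTS()
}

func main() {
	// A test would patch latestLTS rather than depend on the host:
	restore := latestLTS
	latestLTS = func() string { return "precise" }
	fmt.Println(defaultSeries())
	latestLTS = restore
}
```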
<axw> wallyworld_: do you have an SSD?
<thumper> axw: wallyworld_ is having irc connection fun
<axw> so it seems
<thumper> on a hangout with him now
<thumper> the hangout is fine
<axw> ok
<wallyworld_> axw: my irc is shit today :-(
<axw> okey dokey
<axw> wallyworld_: I will ask you about it in the hangout
<wallyworld_> ok
<wallyworld> davechen1y: hi, what was the final outcome of the discussion about bug 1308263? is there something for james to fix or do we need to engage someone else?
<_mup_> Bug #1308263: /var/lib/juju/tools/1.19.0.1-trusty-arm64/jujud: error while loading shared libraries: libgo.so.5: cannot open shared object file: No such file or directory <hs-arm64> <packaging> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1308263>
<davechen1y> wallyworld: finger pointing mainly
<wallyworld> :-(
<davechen1y> nobody knows who owns the package branch
<davechen1y> when we find that
<davechen1y> apply the diff I included in the issue and Bob's your uncle
<wallyworld> what's the branch url?
<davechen1y> wallyworld: i'm afraid i don't know
<wallyworld> ok, i'll find someone to annoy about it
<thumper> *big sad face*
<thumper> more intermittent failures in cmd/juju and cmd/jujud
<davechen1y> thumper: http://paste.ubuntu.com/7265483/
<davechen1y> my hammer
 * thumper does the lbox wait
<thumper>  a not very interesting branch: https://codereview.appspot.com/88800043
<davechen1y> thumper: LGTM
<davechen1y> nice cleanup
<thumper> davechen1y: ta
<davechen1y>                 instance, ok := instance.(*azureInstance)
<davechen1y>                 if !ok {
<davechen1y>                         continue
<davechen1y> should we really be ignoring that someone has passed instances from one provider to another ?
 * axw tries to remember why that's there
<axw> davechen1y: no I don't think we need that actually, that was my mistake
<axw> davechen1y: I *think* that I was thinking manual instances would come through, but they won't
<davechen1y> axw: we can return an error from that method
<axw> davechen1y: I don't think there's any need to check at all
<axw> nothing should be passing in foreign instances
<davechen1y> axw: ok, i'll just remove the 2 arg check
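The snippet under review silently skips instances that fail the type assertion. A hedged sketch of the alternative davechen1y raises, returning an error when a foreign provider's instance is passed in; the Instance interface and helper here are simplified stand-ins, not the azure provider's real code:

```go
package main

import "fmt"

// Instance is a cut-down stand-in for juju's instance.Instance interface.
type Instance interface {
	Id() string
}

type azureInstance struct{ id string }

func (a *azureInstance) Id() string { return a.id }

type fakeInstance struct{ id string }

func (f *fakeInstance) Id() string { return f.id }

// azureIds narrows a slice of Instances to azure ones, returning an
// error instead of silently skipping foreign instances, which is the
// behaviour being questioned in the review.
func azureIds(instances []Instance) ([]string, error) {
	ids := make([]string, 0, len(instances))
	for _, inst := range instances {
		azInst, ok := inst.(*azureInstance)
		if !ok {
			return nil, fmt.Errorf("expected *azureInstance, got %T", inst)
		}
		ids = append(ids, azInst.Id())
	}
	return ids, nil
}

func main() {
	_, err := azureIds([]Instance{&azureInstance{"a1"}, &fakeInstance{"f1"}})
	fmt.Println(err)
}
```

In the end the check was simply dropped, on the grounds that nothing should ever pass foreign instances in; the error-returning form is the defensive middle ground.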
<jam1> thumper: LGTM, though there was a 'debugf' helper, that should probably be Tracef rather than Debugf, because it had extra suppression
<jam1> thumper: and pinger/watcher stuff is likely to be quite noisy otherwise
<thumper> jam1: ah.. good point
<jam1> axw: are you actually working on bug #1247232 ?
<_mup_> Bug #1247232: Juju client deploys agent newer than itself <canonical-is> <ci> <deploy> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1247232>
<jam1> axw: also, your Kanban cards seem out of sync with reality. "provider-driven machine/unit assignment policy" seems to actually be landed.
<axw> jam1: not really, there's more work to be done to support ec2 AZs
<axw> jam1: yes just started working on #1247232
<_mup_> Bug #1247232: Juju client deploys agent newer than itself <canonical-is> <ci> <deploy> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1247232>
<jam1> axw: it isn't a huge deal, but it would be nice if you could add cards when you pick things up, since it gives visibility on my team to all the great stuff that you're getting done.
<jam1> axw: and great about #1247232, really looking forward to that one
<_mup_> Bug #1247232: Juju client deploys agent newer than itself <canonical-is> <ci> <deploy> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1247232>
<axw> jam1: sure, I try to - bad memory :(
<axw> jam1: would've been good to do it before 1.18 I guess, but it shouldn't be too difficult to maintain bootstrap compat within the 1.18 series
<jam1> axw: yeah, honestly breaking bootstrap compatibility in a stable release would really make me question our guidelines of what "stable" is
<dimitern> 218609
<dimitern> ahem, morning :)
<jam1> morning dimitern, welcome back, did you have a good break?
<dimitern> jam1, hey, yeah, only the weather is shitty :)
<jam1> dimitern: were you diving?
<dimitern> jam1, ah, no - it's too cold for that yet
<dimitern> jam1, i'm showing william & co around
<rogpeppe> mornin' all
 * rogpeppe wishes trusty hadn't halved his battery life
<dimitern> hey rogpeppe
<rogpeppe> dimitern: hiya
<dimitern> rogpeppe, wow i haven't experienced such a change in battery life
<dimitern> with trusty
<rogpeppe> dimitern: i just opened the laptop and it predicts 3.40h of battery life. previously it would predict 8-10 hours
<vladk> jam1, dimitern, rogpeppe: morning
<rogpeppe> vladk: hiya
<jam1> morning vladk
<jam1> rogpeppe: the question is whether the prediction is wrong or the usage is wrong :)
<rogpeppe> jam1: usually the prediction overestimates somewhat
<jam1> rogpeppe: I can change your indicator to *predict* 5 days if you like :)
<vladk> dimitern: I'm going to start work on NetworkerWorker
<rogpeppe> jam1: i certainly used to be able to use it for almost the entirety of a transatlantic flight. i suspect that won't be possible any more.
<jam1> rogpeppe: you could always just boot into the terminal prompt... runlevel 2, IIRC
<rogpeppe> jam1: not great if i want to watch videos, though :-)
<jam1> rogpeppe: http://www.youtube.com/watch?v=0nRPoS2WDJA
<jam1> Text mode Quake
<rogpeppe> ha ha
<jam1> IIRC, it was an OpenGL driver to text
<jam1> but in Unity everything is just a GL texture anyway, right?
<dimitern> hey vladk
<dimitern> vladk, that sgtm
<dimitern> rogpeppe, re ErrorContextf - it's actually a good thing that now errors.Contextf will preserve known error types
<rogpeppe> dimitern: i'm afraid i don't agree
<vladk> dimitern: could you explain shortly the purpose of NetworkerWorker
<dimitern> rogpeppe, why is that?
<rogpeppe> dimitern: it can actually open up possible bugs
<dimitern> rogpeppe, I haven't seen any failures
<dimitern> rogpeppe, and william agrees on the changes btw :)
<rogpeppe> dimitern: because error types (particularly not-found errors) are used to mean many different things
<rogpeppe> dimitern: our error paths are not that well tested
<dimitern> vladk, it's a simple worker that on startup gets the networks the machine is supposed to be on and does what we're currently doing in cloudinit
<dimitern> rogpeppe, this is a gradual improvement on that as well
<rogpeppe> dimitern: if i have a function that uses errors.Contextf that has 10 possible error return paths, currently I know that it doesn't matter if one of them might return a not-found error for some unrelated reason - it won't cause the function itself to return a not-found error
<rogpeppe> dimitern: the direction i'm trying to move towards with errgo is to explicitly mention what error types might pass through
<dimitern> rogpeppe, some users of ECf are actually checking for a specific error before returning the prefixed error, so that it's not lost
<rogpeppe> dimitern: except that it is adding more possible typed errors to the return
<dimitern> rogpeppe, the errgo direction is the way to go I agree, but until we have that, this CL makes things slightly better
<rogpeppe> dimitern: i actually don't think so
<dimitern> rogpeppe, can we agree to disagree? :)
<rogpeppe> dimitern: it changes every single place where errors were masked by ErrorContextf into a place where the error type can pass through
<rogpeppe> dimitern: that invalidates one of the key transformations that my errgo gofix module puts in place
<dimitern> rogpeppe, masking errors with ECf is wrong imo
<rogpeppe> dimitern: i think it's usually right
<dimitern> rogpeppe, well then that means the assumption can be improved, no?
<dimitern> vladk, and in order to do that, we'll need a Networker API facade, similarly to Provisioner API for example
<rogpeppe> dimitern: by making errorcontextf preserve type in all circumstances, it's moving further away from where we want to be (explicitly mentioning what errors pass through)
<rogpeppe> dimitern: which is why i'd like to see two versions of it
<rogpeppe> dimitern: (or perhaps one with a filter function)
<dimitern> rogpeppe, i'm afraid i don't get it still
<dimitern> rogpeppe, what's stopping us from giving a list of errors to pass through ECf later?
<vladk> dimitern, please, take a look https://codereview.appspot.com/88380044/
<rogpeppe> dimitern: because then we have to audit each place to see what errors need to be preserved
<dimitern> vladk, looking
<rogpeppe> dimitern: currently we *know* that those places that use ErrorContextf do not preserve the error type
<dimitern> rogpeppe, won't we need to do that anyway?
<rogpeppe> dimitern: not in places that use ErrorContextf
<rogpeppe> dimitern: which is why i'm not keen on the change - it adds more work to do
<axw> rogpeppe: if battery life has gone down, check if your video drivers are getting loaded correctly - in the past I've had that, and it turned out my driver failed and my CPU was busy software rendering everything
<dimitern> rogpeppe, expand a bit on what more work it creates?
<rogpeppe> axw: hmm, where would i check that?
<axw> rogpeppe: glxinfo, I forget what to grep for though
<rogpeppe> dimitern: it means that we need to audit all those places that previously we would not
<axw> rogpeppe: (I think) glxinfo | grep "OpenGL renderer string"
<rogpeppe> axw: i guess i need to apt-get that
<rogpeppe> hmm, no glxinfo
 * rogpeppe googles it
<axw> anyone know how I can find the revno for 1.18.0? seems there's no tag
<axw> oh.. because we branched
<axw> never mind
<rogpeppe> axw: i can't see any obvious errors in this
<rogpeppe> name of display: :0
<rogpeppe> display: :0  screen: 0
<rogpeppe> direct rendering: Yes
<rogpeppe> server glx vendor string: SGI
<dimitern> rogpeppe, and none of the packages changed in any way since the last audit?
<rogpeppe> http://paste.ubuntu.com/7266522/
<rogpeppe> dimitern: ?
<dimitern> rogpeppe, my point is we need to do the audit anyway
<dimitern> again
<axw> rogpeppe: yeah, looks like it's not that.
<rogpeppe> dimitern: when we use ErrorContextf, there's no need currently to do an audit
<rogpeppe> dimitern: because we *know* that the error type is not preserved
<rogpeppe> dimitern: by making ErrorContextf preserve the type, it adds the need to audit all those places
<dimitern> vladk, reviewed
<dimitern> rogpeppe, that's not true
<dimitern> rogpeppe, in all cases
<rogpeppe> dimitern: no?
<dimitern> rogpeppe, i'll give you an example, just a sec
<dimitern> rogpeppe, there are a few cases where ECf is not called when a specific error needs to be returned
<dimitern> rogpeppe, because it masks the error
<dimitern> rogpeppe, like in service.SetConstraints
<rogpeppe> dimitern: indeed. places that don't use ECf need to be audited
<rogpeppe> dimitern: but my point is that places that *do* use ECf do not need to be audited
<rogpeppe> dimitern: (currently)
<dimitern> rogpeppe, so you're saying "let's keep the old inflexible behavior, so that when we start using errgo it will all get better, rather than incrementally improving bits of the error handling"
<rogpeppe> dimitern: i'm saying keep the masking in the places where we're already masking
<rogpeppe> dimitern: because masking is actually good (although less flexible, as you say - we do need a way of avoiding the masking when we need to)
<rogpeppe> dimitern: so i'd much prefer that we explicitly change a few places where we really know that we want to pass through the type
<rogpeppe> dimitern: rather than changing everything wholesale
<dimitern> rogpeppe, "masking is good" is the root cause of my disagreement here - it's not good, it's actually bad, because even when you have specific error types you need to pass through and have them annotated for the user's sake, you can't do that and have to resort to trickery like "is it ErrX? ok, pass it through unannotated; otherwise just mask it"
<dimitern> rogpeppe, but just so that I'm not the unyielding bad guy here, i'll take your suggestion and have an errors.Maskf working as before and Contextf as it is now, and use the latter for networks/interfaces
<dimitern> ;)
<rogpeppe> dimitern: thanks a lot
<rogpeppe> dimitern: BTW re the trickery - that's why errgo.Mask takes a function that allows it to choose which errors to mask
<rogpeppe> dimitern: i'm not that keen on ErrorContextf in general - it adds the same info to all returned errors, where we actually usually want to add specific info about what error path has failed
<dimitern> rogpeppe, i'm just saying that not having proper/smarter error handling is bad, and blocking any improvements to the existing error handling so that once we have errgo everything will be perfect is not ok
<rogpeppe> dimitern: i certainly don't want to block improvements to the existing error handling
<rogpeppe> dimitern: but i don't want to move a lot of it in the wrong direction either
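The whole disagreement is between an annotation helper that masks the underlying error type (so unrelated typed errors cannot leak through a function's ten return paths) and one that preserves it (so a known type survives annotation). A small illustration of both behaviours; the helpers are invented for the example, and it uses modern Go's errors.Is and %w, which postdate this discussion (errgo's Mask with a filter function played this role at the time):

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("not found")

func isNotFound(err error) bool { return errors.Is(err, errNotFound) }

// maskf annotates err and deliberately drops its type, so a not-found
// error from an unrelated path cannot leak out of the caller
// (rogpeppe's position: masking is usually right).
func maskf(err error, msg string) error {
	return fmt.Errorf("%s: %s", msg, err)
}

// contextf annotates err while preserving its type, the behaviour of
// the modified errors.Contextf under discussion (dimitern's position).
func contextf(err error, msg string) error {
	return fmt.Errorf("%s: %w", msg, err)
}

func main() {
	fmt.Println(isNotFound(maskf(errNotFound, "reading config")))    // masked away
	fmt.Println(isNotFound(contextf(errNotFound, "reading config"))) // preserved
}
```

The compromise they reach, keeping both helpers and choosing per call site, corresponds exactly to having both functions available.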
<axw> rogpeppe: when you're free, I reproposed the HA upgrade changes: https://codereview.appspot.com/88790043/
<rogpeppe> axw: cool
<axw> rogpeppe: I missed a bunch of things before. this version generates the shared secret and sets it in agent config and state on upgrade
<axw> rogpeppe: there was missing code in agent config serialisation for shared secret, so I added that
<rogpeppe> axw: lovely stuff, thanks
<rogpeppe> axw: looking
<axw> cheers
<axw> btw if anyone gets weird errors when testing cmd/juju (specifically in BootstrapSuite), then pull trunk
<axw> the LTS release date triggered a test failure
<dimitern> rogpeppe, changing all error types to embed wrapper instead of *wrapper forces me to implement func (e *errorType) Error() on each one, and all of them are the same
<rogpeppe> dimitern: i don't think so
<rogpeppe> dimitern: define func (*wrapper) Error() string
<rogpeppe> dimitern: but embed wrapper by value
<rogpeppe> dimitern: and it'll just work
<dimitern> rogpeppe, ah, sorry - i needed to change the Is<type> type assertions to use *type
<rogpeppe> dimitern: yes, definitely
<rogpeppe> dimitern: that's a good thing
<rogpeppe> dimitern: error types by convention are almost always pointers
<dimitern> rogpeppe, right
<axw> jam1: if you have any time for a review, here's a CL for the bootstrap tools match: https://codereview.appspot.com/88840043/
<natefinch> morning all
<axw> morning natefinch
<jam1> axw: so one thing I hit recently, is using trunk to bootstrap
<jam1> because 1.19.0 is publicly available, it uses that, even though local is 1.19.1
<jam1> have you tried that?
<jam1> axw: I'm concerned it will bootstrap 1.19.1 and then immediately downgrade to 1.19.0 which seems a bit odd
<axw> jam1: if you don't --upload, it'll try but warn
<axw> (try to use 1.19.0 that is)
<axw> jam1: it's based on whatever tools it finds in the tools source
<axw> agent-version is set to the most recent tools matching major/minor that exist in the tools source
<axw> jam1: I'll add another test with patch > what's available, so it's more obvious
<jam1> axw: so where is it setting AgentVersion, as I don't see that in your patch
<jam1> perhaps it was already being done elsewhere
<axw> jam1: already done, in SetBootstrapTools
<jam1> axw: so LGTM though I'd like to see the new chunk of code in its own function
<jam1> it seems tightly focused, so lets put it somewhere focused
<jam1> something like "findExactToolMatch" or somesuch
<axw> jam1: sure, will do
<axw> thanks
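axw describes agent-version being set to the most recent tools in the tools source matching the client's major/minor. A toy version of that selection; newestMatching is a hypothetical name for illustration, not jam's suggested findExactToolMatch or juju's SetBootstrapTools:

```go
package main

import (
	"fmt"
	"sort"
)

// version is a simplified stand-in for juju's version.Number.
type version struct{ Major, Minor, Patch int }

func (v version) String() string {
	return fmt.Sprintf("%d.%d.%d", v.Major, v.Minor, v.Patch)
}

func (v version) less(w version) bool {
	if v.Major != w.Major {
		return v.Major < w.Major
	}
	if v.Minor != w.Minor {
		return v.Minor < w.Minor
	}
	return v.Patch < w.Patch
}

// newestMatching picks the most recent available tools version with the
// same major/minor as the client, mirroring the described behaviour.
// The bool reports whether any candidate matched.
func newestMatching(client version, available []version) (version, bool) {
	var candidates []version
	for _, v := range available {
		if v.Major == client.Major && v.Minor == client.Minor {
			candidates = append(candidates, v)
		}
	}
	if len(candidates) == 0 {
		return version{}, false
	}
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].less(candidates[j])
	})
	return candidates[len(candidates)-1], true
}

func main() {
	avail := []version{{1, 19, 0}, {1, 18, 2}, {1, 19, 1}}
	best, _ := newestMatching(version{1, 19, 1}, avail)
	fmt.Println(best)
}
```

Note this is also where jam's downgrade concern comes from: if only 1.19.0 is published, a 1.19.1 client without --upload-tools ends up selecting the older version.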
<jam1> axw: so "APIEndpointFor" I could say APIEndpointFromName, but I really don't like to call it APIEndpoint when it isn't the same signature as the existing APIEndpoint functions.
<axw> jam1: I think FromName would be preferable, to keep in line with NewConn*
<jam1> axw: can you look at https://code.launchpad.net/~jameinel/juju-core/api-caches-connected-to-1308487/+merge/216091 while your there?
<axw> jam1: sure
<jam1> axw: alternatively APIEndpointForEnv() ?
<axw> jam1: that reads better than For. your choice, but IMHO I think it would be better to be consistent with NewConn/NewAPI*FromName
<jam1> axw: well, we got rid of FromName for NewKeyManagerClient and NewUserManagerClient
<jam1> because FromName was considered ugly
<axw> heh :)
<jam1> but this doesn't actually return a client-like thing so I can't use that pattern
<axw> jam1: LGTM on the API caching one
<axw> erm
<axw> sliding
<axw> whatever :)
<natefinch> meeting everybody
<jam1> switching machines
<jam> perrito666 mgz standup
<wallyworld> jam: my connection dropped out earlier when you were talking about upload tools issues. Do you have bug numbers you can point me at?
<jam> wallyworld: https://bugs.launchpad.net/juju-core/+bug/1307643
<_mup_> Bug #1307643: juju upgrade-juju --upload-tools does not honor the arch <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1307643>
<jam> wallyworld: that one?
<wallyworld> ta
<wallyworld> yeah
<wallyworld> was there one more?
<jam> that one links to: https://bugs.launchpad.net/bugs/1304407
<_mup_> Bug #1304407: juju bootstrap defaults to i386 <amd64> <apport-bug> <ec2-images> <metadata> <trusty> <juju-core:Triaged> <juju-core 1.18:Triaged> <juju-core (Ubuntu):Triaged> <https://launchpad.net/bugs/1304407>
<jam> and https://bugs.launchpad.net/bugs/1282869
<_mup_> Bug #1282869: juju bootstrap --upload-tools does not honor the arch of the machine being created <bootstrap> <constraints> <ppc64el> <upload-tools> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1282869>
<wallyworld> that's the one
<jam> wallyworld: right, you roughly fixed bootstrap, but upgrade-juju doesn't have the same check
<wallyworld> hmmm, well that sucks :-(
<wallyworld> i guess we'd want that in 1.18.2
<jam> yeah, I think so
<rogpeppe> i'm a bit sad that the default-series inference when deploying charms is done client-side
<rogpeppe> it means that juju deploy needs to do an additional EnvironmentGet, which seems wrong, especially as that's an API call we may very well want to restrict access to in the future
<rogpeppe> wallyworld: https://bugs.launchpad.net/juju-core/+bug/1308966
<_mup_> Bug #1308966: default-series choice is inconsistent <juju-core:New> <https://launchpad.net/bugs/1308966>
<wallyworld> great, thank you
<rogpeppe> afk for 40 mins or so
<jam> rogpeppe: so the way it is supposed to work is that it bootstraps based on the version of the charm
<jam> so if you "juju deploy mysql" and there is a "trusty" version of it, it will create trusty targets.
<jam> But maybe there is a second check based on what tools are available?
<jam> (also, if there is only a precise version of mysql, then it deploys that, and you can force it with "juju deploy precise/mysql")
<jam> and *that* resolution of "what does 'mysql' mean" is supposed to be done by the charm store, proxied by the API Server
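jam's description of series resolution can be sketched roughly as follows. This is only an illustration of the stated behaviour (an explicit series in the charm URL wins, otherwise prefer a trusty branch of the charm if the store has one, falling back to precise), not the charm store's actual algorithm:

```go
package main

import "fmt"

// resolveSeries picks the series for a charm, given an optional explicit
// series from the URL (e.g. "juju deploy precise/mysql") and the series
// the store has branches for. Names and fallback are illustrative.
func resolveSeries(requested string, storeSeries []string) string {
	if requested != "" {
		return requested // explicit series always wins
	}
	for _, s := range storeSeries {
		if s == "trusty" {
			return "trusty" // prefer the newer series when available
		}
	}
	return "precise"
}

func main() {
	fmt.Println(resolveSeries("", []string{"precise", "trusty"}))
	fmt.Println(resolveSeries("precise", []string{"precise", "trusty"}))
}
```

rogpeppe's complaint still stands either way: doing this inference client-side forces an extra EnvironmentGet, whereas proxying it through the API server would not.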
 * jam1 switches back to other machine
<evilnickveitch> jamespage, sorry, didn't see that earlier. Hopefully i will have another draft today!
<jamespage> evilnickveitch, cool - great
<natefinch> rogpeppe: how goes?
<rogpeppe> natefinch: just doing a live test now
<rogpeppe> natefinch: seems to be working
<rogpeppe> natefinch: it's really hard to see what's going on without some visibility into the state server status though
<rogpeppe> natefinch: would you be able to run up a CL that adds the state server status as we discussed?
<natefinch> rogpeppe: yeah, that's a good idea.  I haven't actually looked at the status code before, so pointers to the right spot would help speed things up.
<rogpeppe> natefinch: here's a branch containing what i did before: lp:~rogpeppe/juju-core/556-status-show-stateserver
<rogpeppe> natefinch: i *think* that we can keep the API interface the same (Jobs, WantsVote, HasVote) because we're already exposing Jobs in the API.
<natefinch> rogpeppe: ok
<rogpeppe> natefinch: but you'll need to change machineStatus in cmd/juju to hold the new string field
<rogpeppe> natefinch: and change formatMachine to set it, based on the jobs/wantsvote/hasvote tuple
<natefinch> rogpeppe: cool. doing
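rogpeppe asks for a new string field on machineStatus, set by formatMachine from the jobs/wantsvote/hasvote tuple. A hypothetical sketch of what that formatting might compute; the field names and status strings here are invented, not juju's actual output:

```go
package main

import "fmt"

// machineInfo holds the fields the API already exposes for a machine;
// names are illustrative, not juju's actual types.
type machineInfo struct {
	ManagesEnviron bool // machine has JobManageEnviron
	WantsVote      bool
	HasVote        bool
}

// stateServerStatus derives a human-readable state-server string from
// the jobs/wantsvote/hasvote tuple, as discussed for `juju status`.
// Non-state-server machines get no annotation at all.
func stateServerStatus(m machineInfo) string {
	if !m.ManagesEnviron {
		return ""
	}
	switch {
	case m.WantsVote && m.HasVote:
		return "state server (voting)"
	case m.WantsVote:
		return "state server (vote pending)"
	default:
		return "state server (no vote)"
	}
}

func main() {
	fmt.Println(stateServerStatus(machineInfo{ManagesEnviron: true, WantsVote: true, HasVote: true}))
}
```

The "vote pending" case is what makes HA transitions visible: a machine that wants the vote but has not yet been granted it by the peergrouper.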
<dimitern> rogpeppe, I've updated https://codereview.appspot.com/87560043/ and it should be good to land now, can you take a look please?
<rogpeppe> dimitern: looking
<wallyworld> jamespage: hi, i need to ask you about bug 1308263 again. i'm not sure of where you and dave got to talking about it. is it just a case of updating the packaging branch with dave's patch? the patch seems to remove a note with your name on it and makes static linking unconditional. i don't know anything about packaging. is there a branch for which a merge proposal can be put up?
<_mup_> Bug #1308263: /var/lib/juju/tools/1.19.0.1-trusty-arm64/jujud: error while loading shared libraries: libgo.so.5: cannot open shared object file: No such file or directory <hs-arm64> <packaging> <juju-core:Triaged by wallyworld> <https://launchpad.net/bugs/1308263>
<jamespage> wallyworld, I really don't have anything todo with the PPA packaging
<wallyworld> oh, i didn't realise it was for a ppa
<sinzui> wallyworld, he speaks the truth
<jamespage> but I would suspect that the debian/rules file needs syncing from lp:ubuntu/trusty/juju-core
<jamespage> sinzui, fwiw I don't intend to track the devel releases next cycle
<jamespage> :-)
<sinzui> \o/
<jamespage> thought that might make you happy :-)
<sinzui> jamespage, Since I cannot build an arm package and test it, I hesitated to take the changes and do a rebuild. And the PPA doesn't make arm64/ppc
<jamespage> sinzui, indeed
<sinzui> jamespage, a package rule sync is sane at least. I will find some victims
<wallyworld> jamespage: am i looking in the right url? https://code.launchpad.net/~ubuntu-branches/ubuntu/trusty/juju-core  i get a 404
<jamespage> wallyworld, gah - not quite released yet
<jamespage> lp:ubuntu/juju-core
<natefinch> FFS,  I wish bzr would just f'ing warn me when I'm about to switch branches and have local changes.
<wallyworld> jamespage: right, so i can see the debian/rules file in lp:ubuntu/juju-core needs to have dave's patch applied. is that something you do?
<jamespage> wallyworld, the debian/rules file in that branch works just fine
<jamespage> if you take that it will dtrt on ppc64el and arm64
<natefinch> can anyone else bootstrap trunk?
<natefinch> (in local)
<wallyworld> jamespage: so dave's patch which removes the conditional and simply enables static linking everywhere is wrong?
<jamespage> wallyworld, its not required
<natefinch> rogpeppe, jam, dimitern, mgz, perrito666: can you guys bootstrap local on trunk?  Mine doesn't seem to be starting mongo for some reason
<wallyworld> hmmm. why was the issue seen then?
<jamespage> wallyworld, the rules file right now only applies the static linking for builds using gccgo
<rogpeppe> natefinch: i haven't tried recently
<jamespage> wallyworld, I have no idea
<jamespage> if this was a problem all of the 1.18.x builds in the archive would be broken as well - and they are not
<wallyworld> ok. i'll ask dave to clarify. but he is away till las vegas i think
<wallyworld> we do need to get this sorted before then though
<wallyworld> so we can release 1.19.1
<wallyworld> jamespage: actually, this line: golang_archs:= amd64 i386 armhf      why is arm64 not listed?
<wallyworld> could that have something to do with it?
<jamespage> wallyworld, because its not a golang arch
<jamespage> its a gccgo arch
<jamespage> golang == gc
<wallyworld> rightio
<jamespage> wallyworld, i suspect the confusion is around the filter
<jamespage> that kicks in the logic if the arch is not in that list
<jamespage> i.e. gccgo
<wallyworld> is there any harm in just using the flags all the time?
<jamespage> wallyworld, probably not but I can't change it for trusty now
<wallyworld> ok. that means folks on arm64 will need to get their juju from a ppa then i guess
<rogpeppe> natefinch: here are the last few HA changes, BTW: https://codereview.appspot.com/88860043
<natefinch> rogpeppe: looking
<rogpeppe> natefinch: i'm currently factoring out the peergrouper changes into their own CL with tests
<rogpeppe> natefinch: so it's not quite ready for review
<rogpeppe> natefinch: but it's good to see :-)
<natefinch> rogpeppe: very cool
<natefinch> rogpeppe: if I could bootstrap, I could test out the status change I made.  But I can't even get trunk to bootstrap for some reason
<rogpeppe> natefinch: on ec2, or local?
<natefinch> rogpeppe: local.  haven't tried ec2 yet
<rogpeppe> natefinch: ah.
<rogpeppe> natefinch: let me know if you find the reason
<rogpeppe> natefinch: i'm not going to get distracted right now :-)
<natefinch> rogpeppe: good, keep focused.  The end is in sight :)
<alexisb> rogpeppe, do you need to skip our 1x1 to stay focused
<rogpeppe> alexisb: let's do it anyway. we can keep it short.
<alexisb> sounds good
<wallyworld> sinzui: i found the rules file for the ppa and it seems quite different to the one used for packaging into the distro, and it seems to not use any go build flags at all. it seems that rules file needs to be updated?
<sinzui> wallyworld, that is exactly what I will do. I need a victim to test it
<wallyworld> sinzui: dannf is your man
<sinzui> wallyworld, I cannot build arm64...
<wallyworld> sinzui: dannf can, and is the one who raised the bug so i'm sure he would be fine to help test a fix
<sinzui> yep
<wallyworld> sinzui: do you mind if i assign the bug to you?
<wallyworld> or would you prefer it stays with me?
<sinzui> go ahead wallyworld
<sinzui> I can make the changes in a few hours
<wallyworld> ok. it took me way too long to figure out who to nag to get it fixed. i owe jamespage an apology for pestering him about it
<wallyworld> packaging is not my strong point :-(
 * hazmat stares at ouija board to estimate the time/cost of  a new provider in core
<alexisb> Hi all, can someone tell me if juju-core has done any work for MaaS name constraints?
<sinzui> rogpeppe, Juju CI has a HA test. We will turn it on when you tell me you want the feature tested
<dimitern> rogpeppe, LGTM? :)
<rogpeppe> dimitern: oh, sorry, totally got distracted staring at that test code again.
<rogpeppe> dimitern: going back to it :-)
<alexisb> wallyworld, see my q above
<wallyworld> alexisb: yes, it has a constraints attribute called "tags"
<wallyworld> which i believe are used to match with maas names
<wallyworld> but i've not used them myself
<alexisb> landscape is asking if it it is "available"
<alexisb> I am trying to figure out how to answer them :)
<wallyworld> it should be
<dimitern> rogpeppe, cheers
<alexisb> when did it land?
<wallyworld> a while back i think
<sinzui> FAILURE lp:juju-core r2644 broke local upgrades http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/local-upgrade/1099/console
<wallyworld> dimitern: ^^^^^^^^^^^^^^^^^^^^ can you clarify when maas name supported landed?
<wallyworld> i think it was for 1.16?
<sinzui> ^ abentley is looking to how to change the test. We will report a bug if we think Juju really has a regression
 * rogpeppe breathes a sigh of relief that r2644 isn't his.
<wallyworld> rogpeppe: do you know when maas name support (via the tags constraint) landed? 1.16?
<rogpeppe> wallyworld: sorry, i have no idea
<dimitern> wallyworld, i'm not sure - have to look at the blame log
<wallyworld> ok, np
<wallyworld> dimitern: np. but we *do* support it, right?
<natefinch> alexisb: I think what the deal is, IIRC, is that tags doesn't actually match names, unless you give each maas machine a tag equal to its name
<natefinch> alexisb: and I think that's why the people using maas don't really like it, because it's kind of a PITA
<natefinch> alexisb: I wrote the tags code a long time ago (October, maybe?  Definitely before 2014)
<alexisb> wallyworld, I am taking over the cross-team meeting but I wasn't given a leader code
<dimitern> wallyworld, I'm not sure what that failure has to do with maas agent name?
<wallyworld> dimitern: no, separate question :-)
<dimitern> wallyworld, support for maas name was added long ago
<wallyworld> dimitern: alexisb was asking to tell the landscape folks
<dimitern> wallyworld, but not sure who or when
<wallyworld> ok, thanks, i thought it was added a while back, yeah
<natefinch> dimitern: I thought we didn't support constraint by name, thought they had to create a tag with the maas name... or maybe I missed the time where we fixed that
<dimitern> natefinch, ok, sorry I don't really follow
<dimitern> natefinch, the name thing was something maas api needed from some point on
<dimitern> natefinch, i don't think it's a constraint, or if it is, it's maas specific
<wallyworld> dimitern: natefinch: there's a tags constraint which was done for maas IIRC
<wallyworld> which matches with nate's recollection: you need to add tags to your maas instances i think
 * wallyworld -> bed, then 4 days off for easter :-D
<natefinch> wallyworld: night!
<rogpeppe> dimitern: reviewed
<dimitern> rogpeppe, thanks!
<natefinch> sonofa
<natefinch> $ which juju
<natefinch> /usr/bin/juju
<natefinch> well there was like 4 hours wasted
<natefinch> gah... I hate it when tests are so DRY'd out that I can't figure where the actual data is that I need to change to make them pass :/
<natefinch> rogpeppe: how do I fix the status tests so they expect the new status item I added?  it's 2000 lines and the assertion that fails is in the middle somewhere
<rogpeppe> natefinch: i'm afraid you'll have to work that out...
<natefinch> and OMG, whose idea was it to use single letter type names?
<natefinch> M and L my ass
 * natefinch is grumpy
<natefinch> rogpeppe: btw, the status change shows a single bootstrap node stays in "adding vote" perpetually (has vote:n, wants vote: y).  Is that correct?
<rogpeppe> natefinch: that seems wrong, and i didn't observe that
<rogpeppe> natefinch: except... perhaps it needs to peergrouper worker enabled for that to work
<natefinch> rogpeppe: possible
<rogpeppe> natefinch: so, yeah, that's probably right in your branch
<rogpeppe> natefinch: i suppose we could change the state initialisation to set machine 0's HasVote to true
<natefinch> gah, fixing these tests is going to take like 10 times as long as actually writing the code
<hazmat> so how would you guys feel about a provisioner that took 3-4 hrs for a machine..
<natefinch> hazmat: so, azure?
<hazmat> i ask cause that's roughly the time for a baremetal machine on softlayer
<natefinch> hazmat: do they have to go build the machine from parts first?
<hazmat> natefinch, interestingly enough.. after we did all that work for availability set on azure.. there's a new azure tier thats cheaper and sans az and auto load balance.
<hazmat> natefinch, i dunno.. rack and stack via robots i'd assume ;-)
<natefinch> hazmat: it would be an interesting test of our timeouts.... I don't really know how we'd do in such a situation.
<natefinch> I gotta run to pick up my daughter from preschool
<natefinch> I'll be back in probably 45 minutes
<hazmat> natefinch, well for bootstrap we'd use a cloudinstance which would take 3m
<rogpeppe> natefinch, mgz, dimitern: a necessary peergrouper worker fix prior to final HA proposal, review appreciated: https://codereview.appspot.com/88000050
<mgz> rogpeppe: looking
<rogpeppe> mgz: ta!
<dimitern> rogpeppe, looking as well
<rogpeppe> dimitern: ta!
<dimitern> rogpeppe, reviewed
<rogpeppe> dimitern: brill, thanks
<mgz> rogpeppe: just reading through the last test, everything so far makes sense to me
<rogpeppe> mgz: great, thanks
<mgz> heh, mini language for statuses
<mgz> rogpeppe: lgtm. that watch machines bit at the end is slightly hairy, but I guess timeout for failure cases is fine
<rogpeppe> mgz: ta. better than polling i think.
<mgz> only one machine will actually change, right? (or at least change significantly)
<mgz> watching all of them is just for sanity
<rogpeppe> mgz: yeah
<rogpeppe> mgz, dimitern, natefinch: cherry on the cake: https://codereview.appspot.com/87860047/
<rogpeppe> when that lands, i think we can finally say that HA is "in"
<mgz> can say HA! to HA
<dimitern> rogpeppe, will look in a bity
 * rogpeppe tries another live test
<dimitern> bit
<rogpeppe> dimitern: thanks
<natefinch> rogpeppe: looking
<rogpeppe> natefinch: ta!
<natefinch> rogpeppe: lgtm'd
<rogpeppe> natefinch: thanks!
<rogpeppe> mgz: any chance you could have a once-over? this is an important step...
<mgz> rogpeppe: I'll try
<mgz> rogpeppe: seems fine
<rogpeppe> mgz: thanks for looking
<rogpeppe> mgz: i'll land it then. i just encountered a live failure mode, but i think it's better in, where people can play with it.
<rogpeppe> mgz: i agree about the random set of changes - it's just the dross after everything else has been parcelled up nicely...
<mgz> :)
<rogpeppe> natefinch: how's the status stuff coming along?
<rogpeppe> natefinch: i was just debugging an issue that could really have benefitted from it...
<natefinch> rogpeppe: trying to fix all the tests.. The code was easy, the tests are a PITA, because I don't get a line number of where the failure actually is
<natefinch> rogpeppe: so I have to infer what data it was using to test
<rogpeppe> natefinch: hmm
<rogpeppe> natefinch: is that the problem where gocheck doesn't print the whole stack?
<rogpeppe> natefinch: if so, that's bugged me for years
<natefinch> rogpeppe: it's more that we're using table-ish driven tests... without any unique identifier on the row we're testing (at least as far as I can tell)
<rogpeppe> natefinch: in which case, please add an identifier
<natefinch> rogpeppe: that's a good idea
<rogpeppe> natefinch: in the meantime, could you push your branch so that i can use it to debug my stuff? :-)
<natefinch> rogpeppe: sure
<natefinch> rogpeppe: lp:~natefinch/juju-core/rogpeppe-556-status-show-stateserver
<natefinch> rogpeppe: I was wrong, there is an identifier, it's just logged nondescriptly like 50 lines before the test failure output (due to all the log output)
<rogpeppe> natefinch: ah. maybe make it more obvious by adding a newline then?
<natefinch> rogpeppe: yeah
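The fix rogpeppe suggests — give each row of a table-driven test a unique identifier and surface it with the failure — might look like the following. The checker here is plain Go rather than gocheck, and the `statusCase` type and sample statuses are made up for illustration.

```go
package main

import "fmt"

// statusCase is one row of a table-driven test. The "about" field is the
// unique identifier: when an assertion fails in the middle of a huge test
// file, the failure message names the row instead of just a line number.
type statusCase struct {
	about              string // printed on failure to identify the row
	hasVote, wantsVote bool
	want               string
}

// checkCases runs every row against the function under test and collects
// a descriptive message for each mismatch.
func checkCases(cases []statusCase, status func(has, wants bool) string) []string {
	var failures []string
	for i, tc := range cases {
		if got := status(tc.hasVote, tc.wantsVote); got != tc.want {
			failures = append(failures,
				fmt.Sprintf("test %d (%s): got %q, want %q", i, tc.about, got, tc.want))
		}
	}
	return failures
}

func main() {
	// A deliberately incomplete status function, to show a failure message.
	status := func(has, wants bool) string {
		if has {
			return "has vote"
		}
		return "no vote"
	}
	cases := []statusCase{
		{about: "voting member", hasVote: true, wantsVote: true, want: "has vote"},
		{about: "member being added", hasVote: false, wantsVote: true, want: "adding vote"},
	}
	for _, f := range checkCases(cases, status) {
		fmt.Println(f) // prints: test 1 (member being added): got "no vote", want "adding vote"
	}
}
```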
<rogpeppe> HA has landed
<rogpeppe> woo
<natefinch> wooo!
<rogpeppe> natefinch: i thought we were going to go with a single string representation of the server status
<rogpeppe> natefinch: as we discussed yesterday
<natefinch> rogpeppe: sorry, I thought those were the strings we had agreed on... I tried to get them from chat history, but might have grabbed a non-final proposal
<rogpeppe> natefinch: hmm, maybe i've used the wrong branch
<natefinch> rogpeppe: the ones I put in were, indeed, multi-word; honestly, I thought it was odd that we had decided on that after you mentioned wanting single-word statuses
<rogpeppe> natefinch: i'm still not sure :-)
<natefinch> rogpeppe: we can just hyphenate them, if you want
<rogpeppe> natefinch: hmm, looks like i was using the wrong branch
<natefinch> rogpeppe: this is what I have:
<natefinch> 	switch {
<natefinch> 	case hasVote && wantsVote:
<natefinch> 		s = "has vote"
<natefinch> 	case hasVote && !wantsVote:
<natefinch> 		s = "removing vote"
<natefinch> 	case !hasVote && wantsVote:
<natefinch> 		s = "adding vote"
<rogpeppe> natefinch: if that saves them from being quoted by yaml, i reckon that's maybe goo
<natefinch> 	case !hasVote && !wantsVote:
<rogpeppe> d
<natefinch> 		s = "no vote"
<natefinch> 	}
<natefinch> rogpeppe: good point
<rogpeppe> natefinch: i still see the jobs field in there though
<rogpeppe> natefinch: on balance i think perhaps the hyphen will be better
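Putting the snippet natefinch pasted together with the hyphenation decision, the status function might end up looking like this — a sketch using hyphenated single-token values so YAML doesn't need to quote them (the exact final strings in juju may differ):

```go
package main

import "fmt"

// memberStatus renders a state server's replica-set voting status as a
// single hyphenated token, based on whether the machine currently has a
// vote and whether the peergrouper wants it to have one.
func memberStatus(hasVote, wantsVote bool) string {
	switch {
	case hasVote && wantsVote:
		return "has-vote"
	case hasVote && !wantsVote:
		return "removing-vote"
	case !hasVote && wantsVote:
		return "adding-vote"
	default: // !hasVote && !wantsVote
		return "no-vote"
	}
}

func main() {
	// A freshly added state server wants a vote but doesn't have one yet.
	fmt.Println(memberStatus(false, true)) // prints: adding-vote
}
```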
<natefinch> rogpeppe: I thought you had wanted the jobs field?
<rogpeppe> natefinch: no, i don't think it's necessary in the human-readable status
<natefinch> rogpeppe: ok, my misunderstanding, I'll remove it
<rogpeppe> natefinch: the existence of member-status implies that it's an environ manager
<natefinch> rogpeppe: right, I was thinking that, and pretty much everything can host units
<rogpeppe> natefinch: i'm wondering whether "state-server-member-status" might be a slightly more obvious name
<rogpeppe> natefinch: (long, but more self-explanatory that it's referring to a state server)
<natefinch> rogpeppe: if we had more than one state-specific status I'd suggest indenting them all under a "state-server" heading
<natefinch> rogpeppe: but for now, I guess an extra-long name is ok
<rogpeppe> natefinch: one other tiny thing: perhaps put the member status at the start of the struct, so it's more obvious when reading the status output
<rogpeppe> yay! i just successfully deleted the bootstrap machine entirely and i've still got a working environment
<natefinch> nice
<rogpeppe> it does take ages for mongo to recover
<rogpeppe> sinzui: you can try enabling the HA test now...
<rogpeppe> sinzui: although i think you'll probably want to wait for natefinch's status change so you can gate the test's actions on state server status changes
<rogpeppe> s/status change/status change CL/
<natefinch> rogpeppe: https://codereview.appspot.com/87760057/
<rogpeppe> natefinch: BTW having seen the voting status in action, it reads very nicely
<natefinch> rogpeppe: yeah, it's definitely nice to have
<rogpeppe> alexisb: HA has landed and seems to work (to a first approximation anyway :-])
<rogpeppe> natefinch: LGTM
<rogpeppe> natefinch: ideally there would be some tests for different statuses, but i'm not overly picky
<natefinch> rogpeppe: I presume the hasvote wantsvote stuff is tested on the server, and testing the tiny functionality of the thing that translates those into strings would just require duplicating the function itself.   I guess testing that the API returns the right data could be useful.
<rogpeppe> natefinch: well, you wouldn't strictly need to duplicate the function itself
<rogpeppe> natefinch: as i said, i don't mind too much otherwise i wouldn't've LGTM'd
<alexisb> sweetness rogpeppe !!!!!
<alexisb> well done team!
<rogpeppe> alexisb: thanks
<bac> sinzui: how would you like to do a charmworld review for old time's sake?
<sinzui> okay bac
<bac> sinzui: sweet.  https://codereview.appspot.com/88980043
<bac> it's a mishmash of drive-bys on my way to doing a real branch.  but it solves some very annoying problems.
<sinzui> bac: LGTM
<sinzui> thank you for modernising the interpolation too
<rogpeppe> natefinch: failed test...
<natefinch> rogpeppe: yeah, I forgot the api server changed, too, so I hadn't run those tests.  Fixing now
<rogpeppe> natefinch: np
<rogpeppe> natefinch: i'm just about to send an email to juju-dev@lists.ubuntu.com mentioning the HA changes. it mentions the status. i'll postpone the email until your branch lands
<rogpeppe> right, that's me done
<natefinch> rogpeppe: fixed, re-approving
<natefinch> rogpeppe: good work on the HA stuff.
<rogpeppe> natefinch: you too
<rogpeppe> see y'all in denver or las vegas!
<natefinch> rogpeppe: I swept up the few scraps you left in your wake.  But glad it's done
<natefinch> rogpeppe: have a good trip!
<rogpeppe> natefinch: i will do my best
<rogpeppe> natefinch: hopefully i won't discover my passport's gone missing :-)
<rogpeppe> g'night all
<natefinch> rogpeppe: haha
<bac> thanks sinzui.  a modern interpolation is a happy interpolation.
<lazyPower> is there an upper limit to the number of machines you can utilize on the local provider?
<natefinch> lazyPower: generally RAM is the limiting factor.
<natefinch> lazyPower: there's no hard coded limit
<lazyPower> ok thats what i was looking for. thanks natefinch
<natefinch> lazyPower: welcome
<stokachu> so juju will copy my apt proxy information from my host system in a local provider install
<stokachu> but if i set my host system to localhost:8000 then the deployed instances in the local provider wont work
<stokachu> as the lxc device is 10.0.3.1
<stokachu> is that expected?
<natefinch> stokachu: that sounds like something we haven't tested, but I'm not 100% sure.
<stokachu> i guess _should_ juju be copying an existing apt-proxy conf file from the host system to the deployed
<natefinch> stokachu: well, so for local, your host system *is* the bootstrap node
<stokachu> right, so, if my host system has an apt proxy conf directive pointing to http://localhost:8000 then thats whats in the deployed node
<stokachu> however localhost:8000 on the deployed node doesn't work
<stokachu> should be 10.0.3.1 for the lxc device or whatever network bridge is defined
<stokachu> i could be totally off on this
<natefinch> That's probably not something we really support. You're welcome to file a bug and people who know that area of the code better than I do will look at it :)
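The mismatch stokachu describes — a proxy configured as `localhost:8000` on the host being unreachable from inside a container — could in principle be handled by rewriting the proxy host to the bridge address before copying the config into the container. The following is a hypothetical illustration of that fix-up, not what juju actually does; the default lxcbr0 address `10.0.3.1` comes from the conversation above.

```go
package main

import (
	"fmt"
	"net/url"
)

// rewriteProxyForContainer maps a host-local proxy URL to one reachable
// from inside an LXC container by swapping localhost for the bridge
// address, preserving any port. Non-local proxies pass through unchanged.
func rewriteProxyForContainer(proxy, bridgeAddr string) (string, error) {
	u, err := url.Parse(proxy)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "localhost" || u.Hostname() == "127.0.0.1" {
		host := bridgeAddr
		if p := u.Port(); p != "" {
			host = bridgeAddr + ":" + p
		}
		u.Host = host
	}
	return u.String(), nil
}

func main() {
	out, _ := rewriteProxyForContainer("http://localhost:8000", "10.0.3.1")
	fmt.Println(out) // prints: http://10.0.3.1:8000
}
```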
<natefinch> It's EOD for me, and the boss is getting antsy, so I gotta run
<psivaa>  hello, could i know how to workaround 'src/launchpad.net/juju-core/utils/ssh/ssh_gocrypto.go:84: undefined: ssh.ClientConn' when running 'go install -v launchpad.net/juju-core/...' pls?
<psivaa> go get -u -v launchpad.net/juju-core/... gives the following logs: http://pastebin.ubuntu.com/7270917/
<lazyPower> psivaa: they may be EOD'd/about to EOD. normal hours are 9am - 5pm EST
<psivaa> lazyPower: ack, my bad in fact. was supposed to query this earlier, but got held up fixing some issues in my system. thx . i'll pick it up when someone answers
<lazyPower> ack. I just wanted you to know they will see it, but it may be a bit before you get a response.
<psivaa> ack, thank you :)
#juju-dev 2014-04-18
 * arosales doesn't see wallyworld around . . .
<jam1> hmm... CI is unhappy
<jam1> It looks like AMZ is just out of instances
<jam1> but "hp upgrade" is failing
<jam1> and has been since r2644
<jam1> I wonder if axw tested upgrade with the bootstrap patch version change
<jam1> I see this in the logs, which looks worrying: machine-2: 2014-04-17 12:56:26 INFO juju.worker.apiaddressupdater apiaddressupdater.go:58 API addresses updated to []
<jam1> also weird, all-machines.log only shows machine-2 getting the updated tools. Nothing about machine-0 even noticing that it wanted them.
<jam1> I do wish we could run in --debug mode...
<jam1> I wonder if we could log what API's are being called in Info mode, even if we don't log all of the details we would in Debug mode.
<jam1> anyway, upgrade is borked... :(
<vladk> jam1: good morning
<dimitern> vladk, morning
<dimitern> vladk, are you working today?
<dimitern> vladk, jam is usually off on fridays (swapping them with sundays)
<dimitern> mgz, you around today?
 * dimitern will desperately need reviewers today :/
<vladk> dimitern: morning, I'm working, and you?
<dimitern> vladk, yes - there was some misunderstanding on my part - i thought i had public holiday on monday and decided to take it, but it turned out it's today.. meh no big deal
<vladk> dimitern, could you take a look https://codereview.appspot.com/88380044/
<dimitern> vladk, looking
<dimitern> vladk, reviewed
<vladk> dimitern: thanks
<vladk> why do you setupNetworks only if config.HasNetworks?
<vladk> I think they should setup always, so they setup also on bootstrap and add-machine commands.
<dimitern> vladk, eventually we'll do that, but for now the requirement is to set them up only when specified explicitly when deploying
<vladk> dimitern: I talked about this with jam, his opinion:
<vladk> I'm probably happier if we set up everything rather than only the ones the user supplied
<vladk> as then if you want to deploy another service, in say a container, then we know that we do have that network
<vladk> now, when we have the NetworkWorker that can do dynamic setup of networks
<vladk> it matters less
<vladk> because then we can just set up the minimum, and then add ones that we need later.
<vladk> I thought we were starting all by default.
<dimitern> vladk, exactly, the worker will give us that
<dimitern> vladk, but remember we're doing MVP now, so we're taking some shortcuts
<voidspace> morning all
<voidspace> rogpeppe: ping
 * rogpeppe is not really here
<voidspace> It's a UK bank holiday today and Monday
<voidspace> rogpeppe: that's what I was checking
<rogpeppe> voidspace: indeed it is
 * voidspace would like to not really be here as well
<voidspace> rogpeppe: so I was just checking in
<rogpeppe> voidspace: i'm just sorting out insurance and packing before going away tomorrow...
<voidspace> rogpeppe: happy good friday
<rogpeppe> voidspace: you too
<voidspace> rogpeppe: have a good weekend
<rogpeppe> voidspace: you'll be happy to head HA has now landed...
<rogpeppe> s/head/hear/
<voidspace> rogpeppe: I just saw some emails
<voidspace> rogpeppe: awesome
<voidspace> rogpeppe: ah, looks like you're going on a proper holiday
<rogpeppe> voidspace: have a go - see if you can make it work...
<voidspace> rogpeppe: enjoy
<rogpeppe> voidspace: i am!
<voidspace> rogpeppe: will do, I'll try and break it :-)
<jam1> vladk: dimitern: I'm "off" today, but if you need something you can ping me.
<rogpeppe> voidspace: taking advantage of colorado mountain stuff
<voidspace> rogpeppe: ah, of course
<voidspace> gophercon
<jam1> Upgrade is broken, so I might give it a poke, as we can't do any sort of release with that
<voidspace> rogpeppe: see you in vegas then
<rogpeppe> voidspace: up
<rogpeppe> yup
<rogpeppe> voidspace: aye
<jam1> hi voidspace, welcome back
<jam1> (well, welcome back to IRC at least :)
<voidspace> jam1: hi, and thanks
<jam1> voidspace: are you back in the UK?
<voidspace> jam1: yep
<voidspace> jam1: for a week at least
<jam1> voidspace: lucky you to get to fly trans atlantic every other week
<voidspace> jam1: I'm waiting to see how bad the jetlag is
<voidspace> it usually lasts me a week
<jam1> voidspace: just don't change your TZ for this week
<voidspace> so I should recover just in time
<jam1> wake up 6 hours late
<voidspace> jam1: hah, I did consider it
<voidspace> jam1: my daughter has other ideas
<jam1> voidspace: I thought you liked to sleep in and start late anyway
<voidspace> hah, normally I do
<voidspace> Brett Cannon (Python core dev) will be looking for work soon, and has Go experience (by the way)
 * voidspace subtly changing topic away from my sleeping habits
<voidspace> he's an excellent dev, hopefully we have a slot for him when he becomes available
<jam1> voidspace: no such luck mr sleepy. I think I've actually met Brett at a pycon a few years ago. Is he the one who was doing importlib stuff?
<voidspace> jam1: yep, currently a googler - great guy
<jam1> voidspace: if he's looking, you should get his name in to Alexis, I think our slots are filling up pretty quickly.
<voidspace> jam1: he's not looking just yet - but planning a move in the next few months
<voidspace> he has to wait a bit longer for his options to vest, so I don't think we can tempt him into an early leave
<voidspace> he'll get in touch with me though, so we'll see
<jam1> voidspace: so "a few months" is certainly long enough for things to change. But at least atm the head count should all be filled by then (I think)
<dimitern> jam1, ah, alright then
<jam1> dimitern: since you and fwereade are hanging out, can you poke him about Manifest-charm-deployer ? I'm pretty sure I LGTM'd it, and it would be good to have in the next release
<fwereade> jam1, I'm catching up on email at the moment, and I'll want to run a fresh live test against reality with the latest code, but I'll land that today
<jam1> fwereade: sounds good.
<jam1> fwereade: as for "user" all the other files were explicitly checked with ft.File("name") preserveUser is checking the same thing but doesn't *look* the same as the previous N checks.
<jam1> so ignore me
<jam1> but I missed it because it wasn't matching the pattern
<fwereade> jam1, yeah, I worried vaguely that it was less obvious, but thought I'd prefer to stick with the var than dupe the definition
<fwereade> jam1, maybe I should be putting them allin vars, but that felt inconvenient
<jam1> fwereade: at this point, we've spent too long discussing it vs just landing it :)
<fwereade> jam1, quite so :)
<jam1> fwereade: do you have a take on the "juju bootstrap" should always be exactly pinned discussion?
<jam1> I feel like the discussion has gotten into bickering, and I'm trying to keep it productive.
<jam1> I feel like we haven't really come to a consensus
<jam1> so I don't want to actually change our behavior without having that.
<jam1> But I don't want to come across as just being petulant or defensive.
<jam1> I think abentley does have some points we should consider, but I also want us to come up with a strong consensus as I'd rather have consistency in this area, rather than doing it X for 2 releases and then changing our minds again.
<fwereade> jam1, yeah, just catching up and pondering
<jam1> fwereade: anyway, I'd appreciate more input in the thread, as I feel like more comments from me isn't productive anymore.
<jam1> rogpeppe: if you're still here: https://bugs.launchpad.net/juju-core/+bug/1309444
<_mup_> Bug #1309444: peergrouper spins in local/upgraded environment <ha> <logging> <juju-core:Triaged> <https://launchpad.net/bugs/1309444>
<jam1> local provider doesn't support --replicaset (yet?) so the peergrouper just bounces endlessly
<fwereade> jam1, do you remember who's been working on the precise/trusty lxc issues?
<jam1> and I *think* upgraded environments will do the same (today)
<jam1> fwereade: do you have an issue in particular?
<rogpeppe> jam1: oops, the peergrouper worker should be disabled for local environments
<rogpeppe> jam1: upgraded environments might be ok if axw's branch has landed
<jam1> rogpeppe: is it sufficient for it to see "not in replicaset mode" and just exit gracefully?
<rogpeppe> jam1: it could check the replica set status and see that there are no members
<rogpeppe> jam1: that would be somewhat more graceful
<jam1> rogpeppe: well this ends up in the log 2x:
<jam1> 2014-04-18 09:45:41 ERROR juju.worker.peergrouper worker.go:137 peergrouper loop terminated: cannot get replica set status: cannot get replica set status: not running with --replSet
<jam1> 2014-04-18 09:45:41 ERROR juju.worker runner.go:218 exited "peergrouper": cannot get replica set status: cannot get replica set status: not running with --repl
<jam1> thats a lot of not-getting the replica set status :)
<rogpeppe> jam1: it can't just exit though - otherwise it'll be restarted (we should perhaps fix that so it's possible for a worker to exit without being restarted)
<jam1> rogpeppe: I thought we had a way for workers to exit with "I'm finished now"
<rogpeppe> jam1: i don't think so, but we may do
<fwereade> jam1, see #juju-gui just now
<rogpeppe> jam1: i always thought that just exiting with a nil error should be enough
<fwereade> rogpeppe, jam1, they were meant to not be restarted if they return nil
<fwereade> rogpeppe, jam1, not sure what happened if that never landed, I thought we rediscussed that exact issue a few weeks ago
<rogpeppe> fwereade: yeah, we should do that
<rogpeppe> fwereade: (if we don't already)
 * rogpeppe is really gone now
<jam1> rogpeppe: fwereade: "if workerInfo.start == nil { // The worker has been deliberately stopped"
<rogpeppe> jam1: ah, that's cool then
<fwereade> excellent
<wwitzel3> hello
<natefinch> mgz, perrito666, dimitern, fwereade: staup?
<dimitern> natefinch, coming
<natefinch> standup that is
<dimitern> fwereade, mgz, vladk|offline, natefinch, i'd appreciate a review on this critical bug fix https://codereview.appspot.com/89260044
<dimitern> sinzui, when you're about to release 1.19.1, please add this to the release notes https://bugs.launchpad.net/juju-core/+bug/1307513/comments/1
<_mup_> Bug #1307513: Support multiple (physical & virtual) network interfaces with the same MAC address on the same machine <tech-debt> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1307513>
<sinzui> dimitern, Fab!
<sinzui> Thank you very much dimitern
<dimitern> sinzui, :) np
<natefinch> sinzui: I have a fix for this bug, but I don't think I actually know the area of the code well enough to be confident that it's the right fix.  It sort of looks like it should never have worked before:  https://bugs.launchpad.net/juju-core/+bug/1304407
<_mup_> Bug #1304407: juju bootstrap defaults to i386 <amd64> <apport-bug> <ec2-images> <metadata> <trusty> <juju-core:Triaged> <juju-core 1.18:Triaged> <juju-core (Ubuntu):Triaged> <https://launchpad.net/bugs/1304407>
<sinzui> natefinch, I think other rules that forced the local arch were in play
<natefinch> sinzui: could be.  It looks like the code that picks the image gets a list of matching ones back (one for amd64 and one for 386) and then just takes whichever is first
<sinzui> ouch
<sinzui> natefinch, Isn't the real issue with that bug is that we think amd64 is preferred either because AWS prefers it or because we see our local arch as the preference?
<sinzui> natefinch, would setting a large mem constraint also force selection of amd64? (all the i386 instances have small memory)
<natefinch> sinzui: what I was seeing was that we were passing in the constraints the user had defined (in this case, no constraints), and then filtering the list of images down to the cheapest ones, which leaves m1.small, and there's two versions, 386 and amd64.  Since there was more than one that matched what the user wanted we just picked the first one.  I don't know how it was being restricted to local arch before.
<natefinch> sinzui: what my change does is that if there's more than one image that matches what the user requested, it prefers to choose the one with the same arch as the local machine
<natefinch> sinzui: but if such a thing doesn't exist, it just picks whatever is first in the list
<sinzui> natefinch, I agree with your solution. I suppose for many people, the arch is not important so long as the service works
<natefinch> sinzui: right.  If it were up to me, I'd probably just default to always choosing amd64... it's generally the default these days anyway, and matching the dinky old laptop someone is using to run the client on is not very intuitive to me.... but I'm not sure if other people had a specific reason for matching the local machine
<sinzui> natefinch, I agree. I suspect the surprise was we expect the more powerful /better arch to be selected
<natefinch> sinzui: I can send a quick email to the list about it.  either way is trivial to code.  I'd think most people would presume 64 bit is better, all things being equal.
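The selection rule natefinch describes for bug #1304407 — among images matching the user's constraints, prefer one whose architecture matches a preferred arch (the client's, via `runtime.GOARCH`, or simply amd64), else fall back to the first match — can be sketched as follows. The `Image` type and fields are made up for illustration; juju's real image metadata types are richer.

```go
package main

import (
	"fmt"
	"runtime"
)

// Image is a stand-in for an image-metadata match.
type Image struct {
	ID   string
	Arch string
}

// pickImage returns the first match with the preferred architecture, or
// the first match overall if none has it. The second return is false
// only when there are no matches at all.
func pickImage(matches []Image, preferredArch string) (Image, bool) {
	if len(matches) == 0 {
		return Image{}, false
	}
	for _, img := range matches {
		if img.Arch == preferredArch {
			return img, true
		}
	}
	return matches[0], true // no arch match: take whichever is first
}

func main() {
	// Two m1.small-class images match the (empty) constraints.
	matches := []Image{{"ami-386", "i386"}, {"ami-amd64", "amd64"}}
	img, ok := pickImage(matches, runtime.GOARCH)
	fmt.Println(img.ID, ok)
}
```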
<sinzui> yep
<natefinch> sinzui: any idea on how to reproduce the upgrading issue?   I just went 1.18->trunk without a hitch
<sinzui> natefinch, Your units upgraded?
<natefinch> sinzui: yep   just standard wordpress/mysql
<natefinch> sinzui: but 1.19 now
<natefinch> (1.19.1.1)
<sinzui> The tests all set tool-metadata-url to the testing streams
<sinzui> CI is republishing tools now. in 15 minutes there will be tools that are trunk
<natefinch> sinzui: I did use --upload-tools, that probably skews things
<sinzui> Yes, users are not supposed to use that
<sinzui> I cannot set tools on joyent because several of the libs used by precise are too old
<sinzui> and the machines are not allowed to get deps from anywhere other than Lp
 * sinzui ponders giving up for the day
<natefinch> sinzui: I'll take another look without upload-tools
<natefinch> sinzui: how do I get it to upgrade without if 1.19 hasn't been released
<sinzui> set tools-metadata-url to one of the testing streams
<sinzui> natefinch, which cloud are you using
<natefinch> sinzui: aws
<sinzui> natefinch, juju-dist.s3.amazonaws.com/testing/tools
<sinzui> hmm, publication of the latest rev is stalled though
 * sinzui looks
<natefinch> juju status
<natefinch> hehh
<natefinch> man I hate that we have environments.yaml and the jenvs
<natefinch> I always go edit the environments.yaml first and wonder why it doesn't do anything
<natefinch> sinzui: I can't make tools-metadata-url work.  I put it in the correct jenv, but I still get no upgrades available
<sinzui> natefinch, This is what I have for aws: http://pastebin.ubuntu.com/7278665/
<natefinch> sinzui: maybe the problem is that I changed it after I bootstrapped
<sinzui> I already reported that bug :)
<sinzui> natefinch, I think it cannot be changed if it was ever set
<sinzui> But when not set, you can set it once
<natefinch> sinzui: is there more to setting it than just editing the jenv?
<sinzui> natefinch, I would prefer to run the tests by bootstrapping with the released stream,  then change tools-metadata-url to use the testing stream
<sinzui> natefinch, I used juju set-env tools-metadata-url=https://juju-dist.s3.amazonaws.com/testing/tools
<sinzui> It works for my joyent env which didn't have that key set
<natefinch> sinzui: ahh, that worked
<sinzui> natefinch, oh was that key set in the env before?
<sinzui> I want to update the bug with your experience
<natefinch> sinzui: no, it wasn't set before
<natefinch> sinzui: I just thought I could edit the jenv directly, but that doesn't seem to work
<sinzui> the jenv is just the pre-state used to bootstrap the env.
<natefinch> sinzui: I thought that was the environments.yaml :/
<sinzui> I think there is a bug reported asking that juju warn when the jenv doesn't match the env
<natefinch> sinzui: I guess that's the pre-pre-state
<sinzui> :)
<natefinch> sinzui: anyway, my upgrade worked fine
<sinzui> natefinch, looky http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/hp-upgrade/1090/console
<sinzui> That just happened in CI
<sinzui> What are the latest revs?
<sinzui> joyent is testing upgrade now
<natefinch> sorry, not sure what you're asking about latest revs
<sinzui> joyent just passed
<sinzui> natefinch, r2655 works
<sinzui> dimitern's branch doesn't look related, but it has a positive effect
<sinzui> natefinch, local just passed
<sinzui> azure and aws are testing now
<sinzui> and you effectively did the aws test this hour
<natefinch> sinzui: yeah, that's cool
<natefinch> man..... I really don't get how launchpad is organized.  How do I just get a list of commits to trunk? It shouldn't be that hard to find
<natefinch> ahh, I see..  I can't click on trunk, because that's a "Series"
<sinzui> I like the qbzr extension locally
<sinzui> Lp lists the last 10 commits to the branch
<natefinch> sinzui: huh, never occurred to me
<sinzui> the branch is owned by gobot
<sinzui> https://code.launchpad.net/~go-bot/juju-core/trunk
<sinzui> I know that since I need to explicitly be that bot to tag the branch.
<natefinch> sinzui: right
<natefinch> sinzui: Andrew made a commit this morning that looks like it might have been more likely to fix things.  At least it mentioned upgrade changes.
<natefinch> sinzui: 2654
<sinzui> I think so too, reading the log, but the hp, joyent, and local upgrade tests failed with that specific rev.
<natefinch> sinzui: weird
<natefinch> sinzui: well, EOD for me regardless.  Glad it seems to be upgrading now, whatever the reason
<sinzui> Have a nice weekend natefinch
<natefinch> sinzui: you too
#juju-dev 2014-04-19
<bodie_> does anyone know offhand how to discard a shelved changeset?
<bodie_> (using bzr)
#juju-dev 2015-04-13
<jw4> clear
<jw4> :)
<jam> hey jw4, are you in the plenary room now?
<jw4> jam, I think so
<jw4> I'm sitting next to marcoceppi_
<jw4> :)
<jam> :). Richard Waagner Saal should be plenary. Just saying hi. I'm not in Nuremberg this week, so I figured I'd hang out as much as I can to say hi on IRC
<jw4> cool, I'm sorry to miss seeing you jam
<jam> same. I was looking forward to it, but my wife also has an important work conference this week, and I won the last time we had a conflict
<jam> rick_h_: ping, I'm curious to hear a summary of your Juju API discussion in 2 hrs.
<rick_h_> jam: ok
<rick_h_> jam:  https://docs.google.com/document/d/1G0GeXJ2qGpl8xYDOmud20lQwumIoUu47hUR-Sk7H0Ig/edit#heading=h.hkd4am63fnj6 is the notes for the pre-session stuff
<jam> rick_h_: thx
<mgz> rogpeppe: https://github.com/juju/xml/pull/4
<mgz> 'want resolved mode "", got "no-hooks"; still waiting'
<mup> Bug #1443432 was opened: 1.23b4 i/o timeout, LXC instances can't talk to state server <cloud-archive> <landscape> <juju-core:New> <https://launchpad.net/bugs/1443432>
<mup> Bug #1443440 was opened: 1.23-beta4 sporadically fails autotests <juju-core:New> <https://launchpad.net/bugs/1443440>
<mgz> mwhudson: I'm wondering if I can remove the workaround for bug 1425788 - but I'm assuming it's basically bug 1381671 so semi-randomly appears?
<mup> Bug #1425788: multiple definition of http.HandlerFunc <ci> <gccgo> <regression> <test-failure> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1425788>
<mup> Bug #1381671: reboot tests fail to build on gccgo <ci> <gccgo> <patch> <reboot> <regression> <test-failure> <juju-core:Fix Released by gz> <gcc-4.9 (Ubuntu):Invalid> <gccgo-5 (Ubuntu):Fix Released> <gccgo-go (Ubuntu):In Progress by mwhudson> <gcc-4.9 (Ubuntu Trusty):Invalid> <gccgo-5 (Ubuntu
<mup> Trusty):Invalid> <gccgo-go (Ubuntu Trusty):New> <gcc-4.9 (Ubuntu Utopic):Invalid> <gccgo-5 (Ubuntu Utopic):Invalid> <gccgo-go (Ubuntu Utopic):New> <https://launchpad.net/bugs/1381671>
<mgz> so, just because I'm not hitting it now, probably doesn't mean it's safe to build that file, till we're using a gccgo build with the fix
<mwhudson> mgz: sounds plausible
<mwhudson> mgz: i think yes
<mup> Bug #1443432 changed: 1.23b4 i/o timeout, LXC instances can't talk to state server <cloud-archive> <landscape> <juju-core:New> <https://launchpad.net/bugs/1443432>
<jw4> marcoceppi_: oi... have a sec?
<jw4> marcoceppi_: how do we generate a new charm stub?
<davecheney> natefinch:  curl https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz | tar xz
<davecheney> natefinch:  curl https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz | tar -C /usr/local -xz
<marcoceppi_> jw4: read the docs ;)
<mup> Bug #1439535 changed: 1.23-beta websocket incompatibility <api> <deployer> <regression> <juju-core:Invalid> <python-jujuclient:Incomplete> <https://launchpad.net/bugs/1439535>
<mup> Bug #1443541 was opened: juju 1.23b4 vivid panic: runtime error: invalid memory address or nil pointer dereference <openstack> <uosci> <juju-core:New> <https://launchpad.net/bugs/1443541>
#juju-dev 2015-04-14
<mup> Bug #1376906 changed: arm* agents cannot call home, but amd64 can <armhf> <hs-arm64> <hs-arm64-maas-juju> <hs-moonshot-maas-juju> <maas> <maas-provider> <network> <juju-core:Expired> <https://launchpad.net/bugs/1376906>
<perrito666> marcoceppi_:  around?
<marcoceppi_> perrito666: kind of, what's up?
<perrito666> marcoceppi_: priv
<perrito666> marcoceppi_: I am pretty sure you forgot your "Ill come back in half an hour" promise to cloudbase
<marcoceppi_> perrito666: I'm talking to design, I'll swing by in 15 mins... shhhhhh
<perrito666> marcoceppi_: Seeing is believing.
<jamespage> wallyworld_, hey - I know you are all sprinting but https://bugs.launchpad.net/juju-core/+bug/1443541 is causing us pain - we can't verify openstack on vivid right now
<mup> Bug #1443541: juju 1.23b4 vivid panic: runtime error: invalid memory address or nil pointer dereference <openstack> <uosci> <juju-core:Triaged by ericsnowcurrently> <https://launchpad.net/bugs/1443541>
<jamespage> and I have a rc to upload - but would like to have tested it first :-)
<wallyworld_> jamespage: yeah, that one is on our radar - the guy to fix it just arrived from pycon, so hopefully we'll get something done today
<wallyworld_> jamespage: there's also this vivid issue we're looking at, bug 1443440
<mup> Bug #1443440: 1.23-beta4 sporadically fails autotests <local-provider> <mongodb> <systemd> <ubuntu-engineering> <vivid> <juju-core:Triaged by ericsnowcurrently> <https://launchpad.net/bugs/1443440>
<jamespage> wallyworld_, awesome
<jamespage> wallyworld_, glad to hear its in hand - just flagging it up :-)
<wallyworld_> i'll keep you informed
<jamespage> wallyworld_, thanks
<wallyworld_> jamespage: yeah, sorry about delays, travel etc got in the way
<jamespage> wallyworld_, np - completely understand
<jw4> OCR, forward port of change that landed on 1.23 recently http://reviews.vapour.ws/r/1423/
<perrito666> lately everyone is becoming really lazy with PR descriptions
<jw4> perrito666, updated per your feedback:  http://reviews.vapour.ws/r/1423/
<ericsnow> jamespage: FYI, I'm working on those two vivid juju bugs
<ericsnow> jamespage: I'll keep you posted on the progress
<jw4> perrito666, ta
<mup> Bug #1443904 was opened: Apache-licensed code has been borrowed with violation of license requirements <juju-core:New> <https://launchpad.net/bugs/1443904>
<perrito666> gsamfira_: ^^ that has your name all over it :p
<perrito666> literally
<coreycb> I'm trying to force a config-changed hook to fire from within an action.  is there a better way to do that than just calling the function directly?
<perrito666> jw4: ^^
<jw4> hmmm; unfortunately I don't think that will work very well right now coreycb
<jw4> the issue is that Actions and Hooks are in the same Uniter loop
<jw4> this is something that has come up before often; I want some time with fwereade to plan a solution
<coreycb> jw4, ok, thanks for the info
<jw4> coreycb, :-/
<mgz> gsamfira_: http://www.dangermouse.net/esoteric/whenever.html
<jw4> coreycb, fwereade says just invoke the hook script directly rather than trying to get the hook to run
<jw4> ../../hooks/config-changed.py
<jw4> or something like that
<jw4> coreycb, or better yet refactor the hook and action code so that common functionality is available to both
<coreycb> jw4, ok thanks.  I had some success with a few charms by just executing our decorated config-changed function, but then I ran into an error and chalked it up to some missing juju state, but perhaps I can just do that.
<jw4> coreycb, hopefully... let me know how it goes
<coreycb> jw4, ok
<voidspace> dooferlad: https://github.com/juju/juju/compare/1.23...voidspace:1.23-environ-subnets.diff
<mup> Bug #1443942 was opened: SNAT for externally routed traffic should be only for EC2 and for subnets in the VPC <juju-core:Triaged by dooferlad> <https://launchpad.net/bugs/1443942>
<dooferlad> voidspace/TheMue: Could I get a review? http://reviews.vapour.ws/r/1428/
<arosales> http://reports.vapour.ws/latest-bundle-and-charm-results
<thumper> voidspace, tasdomas here is the branch lp:~thumper/charms/trusty/postregresql/actions
<ericsnow> jamespage: FYI, those two vivid bugs have fixes committed
<ericsnow> jamespage: let me know if there are any other problems like that
<mup> Bug #1444037 was opened: juju-core 1.22.1 is not packaged in Ubuntu <block-proposed> <juju-core:New> <https://launchpad.net/bugs/1444037>
<redelmann> Hi, is charmstore down? or im having some internal problems?
<mup> Bug #1444066 was opened: First set-env for a var always results in a WARNING <juju-core:New> <https://launchpad.net/bugs/1444066>
<mbruzek> Hello gui guys.  What the heck is this? https://jujucharms.com/requires/ice-cream
<mbruzek> brownies, raspberry sauce, marshmallows ??
<mbruzek> benji ^
<benji> we were hungry
<benji> (it was for an [ill-conceived] demo)
#juju-dev 2015-04-15
<jw4> perrito666: OCR PTAL http://reviews.vapour.ws/r/1434/
<perrito666> jw4: I am so no OCR, not even in my own tz :p but ill take a look
<perrito666> oh axw_ already stamped it
<jw4> perrito666: you're a gem - thanks axw_
<perrito666> jw4: contrary to what you said, there is no actions doc
<jw4> perrito666: one typo bugfix and you get all antagonistic with me
<perrito666> lol
<jw4> perrito666: https://jujucharms.com/docs/1.20/actions
<jw4> perrito666: better link: https://jujucharms.com/docs/stable/actions
<perrito666> jw4: I was for some reason looking for that in github.com/juju/juju/docs
<perrito666> jw4: so, whenever I want to action get, I need to do agivenparam.name
<perrito666> ?
<jw4> perrito666: no, the .name was just an example to show how to retrieve nested values
<perrito666> that is a misleading example, we should add some more info there
<perrito666> btw, very useful docs, cheers
<jw4> perrito666: congrats to bodie_ who did most of the docs
<jw4> I did mostly state server, API's and back end stuff in the uniter, etc.
<jw4> so I'm a little fuzzy on the command line, action-set, action-get, etc.
<mup> Bug #1444333 was opened: subordinate relation removed still shows on the subordinate side <juju-core:New> <https://launchpad.net/bugs/1444333>
<mup> Bug #1444333 changed: subordinate relation removed still shows on the subordinate side <remove-relation> <subordinate> <juju-core:New> <https://launchpad.net/bugs/1444333>
<mup> Bug #1444353 was opened: juju backups/restore does not support local provider, but doesn't warn user <juju-core:New> <https://launchpad.net/bugs/1444353>
<mup> Bug #1444354 was opened: juju backups includes previous backups in saved file <juju-core:New> <https://launchpad.net/bugs/1444354>
<dimitern> jam, ping
<dimitern> jam, we're in the same hangout btw
<mattyw> gsamfira_, ping?
<gsamfira_> mattyw: pong
<mattyw> gsamfira_, hey there, have you ever tried installing vim with chocolatey? I've followed the instructions on your wiki and thought I'd try installing vim at the end and it doesn't work -  but I don't understand the error. Just wondered if you'd tried it?
<mattyw> gsamfira_, this is what I get: vim : The term 'vim' is not recognized as the name of a cmdlet
<gsamfira_> ahh yes
<perrito666> mattyw: you are not embracing the proper spirit man
<perrito666> use notepad
<gsamfira_> mattyw: windows has this bad habit of installing apps in its own folder
<gsamfira_> mattyw: those folders are rarely in $PATH
<gsamfira_> mattyw: so you end up having to type out the absolute path to it
<mattyw> gsamfira_, there's a vim74 in Program Files x86 that looks like it - but that gives me the same problem
<mattyw> gsamfira_, although maybe my powershell skills are failing me
<mattyw> perrito666, is this the start of a notepad <-> vim flamewar ;)
<gsamfira_> &mattyw:  'C:\Program Files (x86)\vim\vim74\vim.exe'
<gsamfira_> try: & 'C:\Program Files (x86)\vim\vim74\vim.exe' C:\test.txt
<gsamfira_> ampersand is important if running from powershell
<mattyw> gsamfira_, ah yes - vim74 is a folder - I feel like an idiot!
<perrito666> mattyw: it may very well be, we just need someone to take notepad's end
<mattyw> gsamfira_, got it - thanks very much
<mattyw> perrito666, lolz
<perrito666> mattyw: here gsamfira showed me (he is afk btw) that a nice way to do things is to create a bin folder, add it to your path and then add a lot of files called yourcommand.cmd with a call to the full path inside
<perrito666> and that puts the command in the path
<perrito666> isn't it super easy and user friendly?
 * perrito666 rejoices while running the testsuite in windows to fix it
<jw4> notepad rulez
<mattyw> perrito666, I used to do that in my old windows days - just trying to remember it all - it's all hidden at the back of my brain
<perrito666> mattyw: in my old windows days I used desktop icons
<perrito666> I was never a power hacker of windows
<mattyw> perrito666, same - but I had the custom folder with my own bins
 * perrito666 thinks that he should now install vim and add it to his bin folder
<gsamfira_> mattyw: thats how the brain protects itself from traumatic memories. It buries memories deep in the back of your mind
<perrito666> just like windows does with binaries, apparently
<mattyw> perrito666, haha
<jamespage> ericsnow, hey - I'm struggling to build juju from source right now to testout your fixes
<jamespage> go install -v github.com/juju/juju/...
<jamespage> ericsnow, http://paste.ubuntu.com/10826372/
<jamespage> any ideas?
<jw4> jamespage: looks like you may need to run godeps
<jw4> go install launchpad.net/godeps
<jw4> pushd github.com/juju/juju && godeps -u dependencies.tsv
<thumper> jw4: do you know about the Makefile?
<jw4> ah
<jw4> yesh
<jw4> jamespage: make install-dependencies
<jw4> thumper: but I don't think that runs godeps does it?
<thumper> if you have export JUJU_MAKE_GODEPS=true
<jw4> ooooh
<thumper> then make check runs godeps then the tests
<thumper> o/ jamespage
 * jw4 needs to read the docs again now that it's not all new
<jamespage> hey thumper
<jw4> ... to me
<perrito666> wallyworld: I am up here, next to the elevator
<jam> wallyworld: looking at your status feedback
<jam> does that mean "juju status > out.txt" is going to switch to UTC vs "juju status" is going to use localtime?
<wallyworld> perrito666: i'm already in mark's room
<wallyworld> jam: yes
<jam> wallyworld: sounds bad to me
<wallyworld> was a strong preference from IS
<perrito666> k just fetch me out after you finish
<jam> having the output change because I try to save it
<jam> is bad
<jam> now, having a flag, or "-o" or something else would be semi ok
<wallyworld> jam: i understand and made the same arguments
<wallyworld> but eventually caved in
<jam> wallyworld: so I think we push back/elevate this to someone else
<wallyworld> ok
<wallyworld> i'll add a --utc flag
<wallyworld> to the status command
<wallyworld> jam: would you be happy with a --utc flag?
<jam> wallyworld: I'm fine with it
<wallyworld> ok, thanks, i'll do that
<jam> I can understand the "use UTC if I'm scripting/documenting this for posterity"
<wallyworld> i do agree with you
<jam> but doing "juju status"
<jam> hmm
<jam> "juju status | vim"
<jam> and I get something different
<wallyworld> jam: actually, this is for status-history
<wallyworld> and with status
<wallyworld> it is only for the Since timestamp
<perrito666> I am +1 that the same command should give consistent output
<wallyworld> in the yaml
<perrito666> wallyworld: ^
<mwhudson> sinzui: can you run grep-dctrl -FBuild-Depends golang-go  -sPackage /var/lib/apt/lists/*Sources on your trusty system?
<sinzui> mwhudson, https://pastebin.canonical.com/129629/
<mwhudson> sinzui: thats a bit more than just docker :/
<benji> yeah, the DWIM when piping status has bitten me before
<perrito666> ericsnow: https://bugs.launchpad.net/juju-core/+bug/1444354
<mup> Bug #1444354: juju backups includes previous backups in saved file <backup-restore> <juju-core:Triaged> <https://launchpad.net/bugs/1444354>
<perrito666> :) enjoy
<jw4> mattyw: did your 'cannot retrieve meter status' fix get pushed to master yet, or just 1.23?
<mattyw> jw4, there was definitely a PR for it - tasdomas is the guy to speak to
<jw4> mattyw: k, thanks - I just triggered the bug again just upgrading from 1.24 to 1.24
<mattyw> jw4, it did land in master, can you send a paste of the error?
<jw4> mattyw: http://paste.ubuntu.com/10827323/
<mattyw> tasdomas, can you take a quick look ^^ I thought this would have been fixed by monday's change
<tasdomas> mattyw, looking
<mattyw> tasdomas, thanks very much
<tasdomas> jw4, the PR landed in master this morning
<mgz> jw4 - action hero, on now
<mwhudson> cherylj, mattyw, wallyworld_: icon time https://plus.google.com/+GustavoNiemeyer/posts/UpjgWHbpjCU
<mattyw> mwhudson, awesome
<cherylj> mwhudson: I'm on it!
<mattyw> cherylj, mwhudson I'm working on the amulet tests at the moment, hope to get them done by the end of our charm comp session tomorrow
<jw4> mgz: aww :)
<tasdomas> jw4, seems to be working fine for me
<jw4> hmm; upgrade from 1.24 to 1.24?
<jw4> I'll try again in a clean env
<tasdomas> jw4, yes
<jw4> basically I deployed from master... deployed a charm... modified juju.... did juju upgrade-juju --upload-tools
<mwhudson> i ran my ec2 out of disk space testing my fixes :-)
<tasdomas> jw4, did the same - git checkout master; git checkout HEAD^; [build,bootstrap,deploy];git checkout master;[build, upgrade]
<jw4> tasdomas: okay - thanks - I'll verify that it wasn't an issue with my env.
<tasdomas> jw4, do you have the gitref you were upgrading from ?
<jw4> tasdomas: aa1532b
<jw4> tasdomas: one step I didn't see in your upgrade... go install ./... before juju upgrade
<tasdomas> jw4, yeah, that's what I mean when I said "build"
<jw4> kk
<tasdomas> jw4, did an upgrade from aa1532b to HEAD, did not see any problems
<tasdomas> jw4, if you manage to reproduce it, do ping or email or find me
<mup> Bug #1444037 changed: juju-core 1.22.1 is not packaged in Ubuntu <block-proposed> <packaging> <juju-core (Ubuntu):Triaged by strikov> <juju-core (Ubuntu Trusty):New for strikov> <https://launchpad.net/bugs/1444037>
<mup> Bug #1444576 was opened: Skipped TestUpgradeSteps* in cmd/jujud/agent/upgrade_test.go <skipped-test> <test-failure> <juju-core:Triaged> <juju-core 1.23:Triaged> <https://launchpad.net/bugs/1444576>
<mup> Bug #1434555 changed: ppc64el unit test timeout <blocks-release> <ci> <ppc64el> <regression> <unit-tests> <juju-core:Fix Released by wallyworld> <juju-core 1.23:Fix Released by wallyworld> <https://launchpad.net/bugs/1434555>
<mup> Bug #1434555 was opened: ppc64el unit test timeout <blocks-release> <ci> <ppc64el> <regression> <unit-tests> <juju-core:Fix Released by wallyworld> <juju-core 1.23:Fix Released by wallyworld> <https://launchpad.net/bugs/1434555>
<mup> Bug #1434555 changed: ppc64el unit test timeout <blocks-release> <ci> <ppc64el> <regression> <unit-tests> <juju-core:Fix Released by wallyworld> <juju-core 1.23:Fix Released by wallyworld> <https://launchpad.net/bugs/1434555>
<natefinch> marcoceppi_: you around?
<natefinch> or lazyPower_ ?
<marcoceppi_> natefinch: whats up?
<natefinch> marcoceppi_: looking at the mongodb charm.... does  it only provide username, port, and hostname in its rels as a database?
<marcoceppi_> natefinch: probably. its an old ass charm
<natefinch> Am I supposed to be able to find out what it sets from its documentation?
<marcoceppi_> natefinch: in a perfect world yes. in the near future: yes, at this moment nothing like this exists
<natefinch> marcoceppi_: fair enough
<marcoceppi_> natefinch: it actually looks like it just sets hostname, port, type, and replset
<marcoceppi_> authenticated access was experimental at the time
<natefinch> marcoceppi_: heh
#juju-dev 2015-04-16
<axw> jam: alexis says she's looking for someone to open the room, so will (may?) be late
<jam> axw: thanks, I was looking around for someone to find out what was up
<jam> late from now, so clearly will be late :)
<axw> jam: I'm done drafting the resources spec (apart from addressing comments and so on). Will be meeting with Mark later today. If you have any time this morning, would be great to get your feedback
<axw> no worries if you're busy
<jam> axw: is that Draft or Draft deprecated or… which one ?
<axw> jam: this one: https://docs.google.com/document/d/1-MCcHFQJ1lNc0vdkqcaPk6MYxiSKactYcmb-cC7oUPU/edit?disco=AAAAAJHXNzg
<jw4> tasdomas: I must have had some weird env issue yesterday... the upgrade worked fine for me today...
<tasdomas> jw4, that's good to hear
<wallyworld> jam: ping
<jam> wallyworld: pong
<Makyo> thumper, https://gist.github.com/makyo/d74cce8ed85eba05b4cb Here's some dumps from juju-bundlelib in a few different formats.  LMK if they're what you're looking for, but now that I've found out how to dump in any format, it's easy to get whatever eg https://github.com/makyo/juju-bundlelib/blob/dsl-bundleformat/dump-plaintext.py
<thumper> Makyo: thanks
<wallyworld> jam: hey, i've added a thought bubble to the api errors doc as to how this work might be surfaced to the user, could you take a look? i've had little time to do it and have to rush off to a 1.23 meeting but can check back a little later
<jam> k
<wallyworld> ty
<wallyworld> will check back when i can
<perrito666> Hey I saw a couple of sticker pads around yesterday, anyone has a spare one for the meta button? I would like to make it more accurate since I dont windows :p
<DaveJ__> Hi Guys, I was wondering if there was any update on CentOS support for JuJu ?
<dimitern> jam, me and voidspace are here in the room
<jam> dimitern: k
<lazyPower> o/ core, has anyone tried dhx on 1.23-beta4?
<lazyPower> i'm getting a weird coredump issue coming from subprocess
<perrito666> DaveJ__: there is a proposed patch right now, actually
<dimitern> jam, so i've updated the slide for 1.25 to mention accepting positive and negative spaces, as well as subnet tags in constraints
<lazyPower> This appears to be introduced only in beta4, i can confirm this works on beta3, and 1.22 as expected.
<DaveJ__> perrito666: Thanks - how close to being ready is it ?
<DaveJ__> perrito666: Is it something I could take an try out ?
<perrito666> DaveJ__: can you repeat the question to aznashwan
<perrito666> ?
<DaveJ__> perrito666: Sure
<DaveJ__> aznashwan:  Hi aznashwan, do you have details of the patch to support CentOS based charms on JuJu ?
<lazyPower> https://bugs.launchpad.net/juju-core/+bug/1444861
<mup> Bug #1444861: Juju 1.23-beta4 introduces ssh key bug when used w/ DHX <dhx> <juju-core:New> <https://launchpad.net/bugs/1444861>
<aznashwan> DaveJ__: currently, we are finalizing the review process and (I sincerely hope), we will get it merged into the current tip of master
<DaveJ__> aznashwan: Thanks.  I'd be keen to try it out as soon as I can get my hands on it
<aznashwan> DaveJ__: we've done the support for bootstrapping and charm deployment on CentOS (we don't actually have a CentOS charm yet, we just use trusty's mysql one for testing)
<aznashwan> DaveJ__: the hooks run, so once an actual CentOS charm is written I see no reason why it should not work
<DaveJ__> aznashwan:  That sounds promising.    Will it be possible to deploy the patch to an existing environment?  I'm just wondering if I can go ahead and start some prep, so I can try this out as soon as it's ready
<Makyo> thumper, http://paste.ubuntu.com/10825990/
<thumper> Makyo: thanks again
<mgz> gsamfira: I also deleted a whole bunch of tmpfiles and did this: <http://paste.ubuntu.com/10831779/>
<mup> Bug #1444861 was opened: Juju 1.23-beta4 introduces ssh key bug when used w/ DHX <dhx> <juju-core:New> <https://launchpad.net/bugs/1444861>
<perrito666> gsamfira: Is it possible that, besides fixing charm to close the charm before deleting it, you forgot to update dependencies.tsv
<perrito666> ?
<gsamfira> perrito666: I will be proposing a new branch that fixes tests again and updates dependencies.tsv in one go
<mattyw> cherylj, ping?
<cherylj> mattyw: what up?
<mup> Bug #1444912 was opened: /var/lib/juju gone after 1.18->1.20 upgrade and manual edit of agent.conf <juju-core:New> <https://launchpad.net/bugs/1444912>
<mup> Bug #1444912 changed: /var/lib/juju gone after 1.18->1.20 upgrade and manual edit of agent.conf <juju-core:New> <https://launchpad.net/bugs/1444912>
<mup> Bug #1444912 was opened: /var/lib/juju gone after 1.18->1.20 upgrade and manual edit of agent.conf <juju-core:New> <https://launchpad.net/bugs/1444912>
<mup> Bug #1333682 was opened: upgrading 1.18 to 1.19 breaks agent.conf <panic> <upgrade-juju> <juju-core:Confirmed for wallyworld> <https://launchpad.net/bugs/1333682>
<mup> Bug #1333682 changed: upgrading 1.18 to 1.19 breaks agent.conf <panic> <upgrade-juju> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1333682>
<mup> Bug #1445053 was opened: Failed to bootstrap local provider after interrupting <local-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1445053>
<mup> Bug #1445063 was opened: addressable containers cannot resolve non-FQDN in maas <addressability> <kvm> <lxc> <maas-provider> <network> <oil> <openstack> <uosci> <juju-core:Triaged> <juju-core 1.23:In Progress by dimitern> <https://launchpad.net/bugs/1445063>
<mup> Bug #1445066 was opened: 'juju action do' should have a --wait option <juju-core:New> <https://launchpad.net/bugs/1445066>
<jw4> thumper: leave me alone!
<thumper> jw4: simple, stop getting alerts on action
<thumper> any action line?
<thumper> how's this action?
<jw4> thumper: :-p
<thumper> jw4: another action bug coming for you...
 * thumper waits action
<jw4> thumper: yay
<jw4> thumper: I love bugs
<jw4> thumper: thanks for caring
<jw4> thumper: please keep them coming
<thumper> jw4: in that case, I have about 900 open juju bugs for you
<jw4> thumper: you need to look harder
<dimitern> a quick review for bug 1445063 anyone? https://github.com/juju/juju/pull/2090
<mup> Bug #1445063: addressable containers cannot resolve non-FQDN in maas <addressability> <kvm> <lxc> <maas-provider> <network> <oil> <openstack> <uosci> <juju-core:Triaged> <juju-core 1.23:In Progress by dimitern> <https://launchpad.net/bugs/1445063>
<dimitern> that is - fix for that bug
<dimitern> voidspace, ^^
<thumper> dimitern: done
<thumper> dimitern: rb didn't pick it up
<thumper> dimitern: so did it on github
<dimitern> thumper, sweet thanks!
<mup> Bug #1445078 was opened: 'juju action fetch' should allow output of a single result <juju-core:New> <https://launchpad.net/bugs/1445078>
<mup> Bug #1445093 was opened: local provider breaks when scaling up units <juju-core:New> <https://launchpad.net/bugs/1445093>
<mup> Bug #1445146 was opened: juju run fails after upgrade to 1.23-beta4.1 <juju-core:New> <https://launchpad.net/bugs/1445146>
<mup> Bug #1445146 changed: juju run fails after upgrade to 1.23-beta4.1 <juju-core:New> <https://launchpad.net/bugs/1445146>
<mup> Bug #1445146 was opened: juju run fails after upgrade to 1.23-beta4.1 <juju-core:New> <https://launchpad.net/bugs/1445146>
<mup> Bug #1445174 was opened: bootstrap suprious log: WARNING no architecture was specified, acquiring an arbitrary node <landscape> <juju-core:New> <https://launchpad.net/bugs/1445174>
<mup> Bug #1445186 was opened: 1.23, search domain missing from lxc resolv.conf <cloud-installer> <landscape> <juju-core:New> <https://launchpad.net/bugs/1445186>
<mup> Bug #1445186 changed: 1.23, search domain missing from lxc resolv.conf <cloud-installer> <landscape> <juju-core:New> <https://launchpad.net/bugs/1445186>
#juju-dev 2015-04-17
<mattyw> davecheney, lesson learned - never eat
<mup> Bug #1445338 was opened: Win builds fail: cloudconfig/userdatacfg.go:65: undefined: unixConfigure <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1445338>
<mup> Bug #1445338 changed: Win builds fail: cloudconfig/userdatacfg.go:65: undefined: unixConfigure <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1445338>
<mup> Bug #1445338 was opened: Win builds fail: cloudconfig/userdatacfg.go:65: undefined: unixConfigure <ci> <regression> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1445338>
<mup> Bug #1445369 was opened: Juju core freaks if /etc/os-release is not present <juju-core:New> <https://launchpad.net/bugs/1445369>
<mup> Bug #1445369 changed: Juju core freaks if /etc/os-release is not present <juju-core:New> <https://launchpad.net/bugs/1445369>
<mup> Bug #1445369 was opened: Juju core freaks if /etc/os-release is not present <juju-core:New> <https://launchpad.net/bugs/1445369>
<mup> Bug #1445146 changed: juju run fails after upgrade to 1.23-beta4.1 <juju-core:Invalid> <https://launchpad.net/bugs/1445146>
<wallyworld_> jam: looks like irc dropped out - the maas guys are in another meeting, did you want to chat about the error stuff?
<jam> wallyworld_: I'm happy to chat if you'd like
<jam> I'm in the hangout
<wallyworld_> jam: ok, give me a sec and i'll change rooms
<mup> Bug #1445369 changed: Juju core freaks if /etc/os-release is not present <juju-core:Invalid> <https://launchpad.net/bugs/1445369>
<mup> Bug #1444537 was opened: Log files from units deployed in lxc containers are shared on the physical node <logging> <lxc> <juju-core:Triaged> <https://launchpad.net/bugs/1444537>
<mup> Bug #1444537 changed: Log files from units deployed in lxc containers are shared on the physical node <logging> <lxc> <juju-core:Triaged> <https://launchpad.net/bugs/1444537>
<mgz> menn0: I still need to amend that branch so one sec on review
<menn0> mgz: kk
<mup> Bug #1444537 was opened: Log files from units deployed in lxc containers are shared on the physical node <logging> <lxc> <juju-core:Triaged> <https://launchpad.net/bugs/1444537>
<mgz> menn0: it's okay, just wanted to run the unit test on vivid, they're good
<jam> mgz: menn0: would it be reasonable to add a logger.Debugf to that code you landed ?
<mgz> jam: yup, totally.
<jam> (generally if you are suppressing an error it would be good to log it at least)
<menn0> mgz:  ship it
<menn0> jam: you mean regarding upstart detection?
<jam> menn0: when you get the error that /sbin/initctl isn't there, just log at Debugf level
<jam> then if we get it for weird reasons
<jam> we can enable debug logging
<mgz> he means a bit like we added when trying to debug, dump the actual error back from exec.Command
<jam> and see what the error we're getting is
<menn0> jam: yep, that's a good idea
<mgz> I shall add now
<menn0> as long as it's just at debug
<jam> mgz: menn0: absolutely. But anytime you "add reporting to figure out what's going on", that's a good sign we may want a logger.Debugf for future use
<menn0> for sure
<mgz> logger.Debugf("exec %q failed: %v", initctlPath, err)
<mgz> maybe?
<mgz> anything else?
<jam> mgz: looks good to me
<menn0> maybe %#v so we see the field names for the error
<mgz> okay, done
<jam> mgz: can you test the output to confirm?
<menn0> mgz: sorry that should be %+v
<jam> (hard code a different /sbin/init, or make a test fail)
<menn0> talk about backseat coding... :)
<jam> menn0: well, %+v is just the field names for auto format, %#v is the go syntax which often includes field names
<menn0> jam, mgz: whatever works
<mgz> I pushed the log statement, will land shortly
<mup> Bug #1445338 changed: Win builds fail: cloudconfig/userdatacfg.go:65: undefined: unixConfigure <ci> <regression> <windows> <juju-core:Fix Released by gabriel-samfira> <https://launchpad.net/bugs/1445338>
<mgz> going to need re-check review
<mgz> for test junk I missed
<mgz> okay, have pushed new test-fixing revision
<mup> Bug #1445658 was opened: juju fills logs with attempts to do work on implicitly removed containers <juju-core:New> <https://launchpad.net/bugs/1445658>
#juju-dev 2016-04-18
<menn0> grrrr... an interface that mimics bits of State using the same names but slightly different method signatures
<menn0> what a great idea :)
<menn0> wallyworld: sorry, another question. how does a machine agent get into the "Started" state? where is that set?
<wallyworld> menn0: from memory, that happens when the agent comes up on the node and "phones home" via the presence api
<wallyworld> until then the status is pending/allocating
<menn0> wallyworld: with this issue I'm looking at, it seems that the peergrouper isn't even noticing the new controller nodes so isn't adding them to the replicaset
<wallyworld> or it may be a direct status call, not sure
<menn0> wallyworld: and the only reason that that can happen (really) is if the machine doesn't get to "started"
<wallyworld> you mean not noticing the status change?
<wallyworld> is there a watcher for status?
<menn0> wallyworld: no it can't be that
<menn0> wallyworld: the peergrouper polls so even if the watcher wasn't working any changes still get noticed
<menn0> wallyworld: I can see the peergrouper running regularly in the logs, but the other machines aren't picked up
<wallyworld> when enable ha is run, the machine collection does get the new machine docs
<menn0> wallyworld: those machines do connect to the API on machine-0 even when their own mongodb instance isn't part of the replicaset yet
<wallyworld> what is the peergrouper polling for?
<menn0> I think it polls in case the replicaset changes underneath it (in mongodb)
<wallyworld> sorry, i meant what data is it polling?
<menn0> there are watchers so that it reacts straight away when something changes in state
<menn0> and the polling is there to catch any changes in mongodb
<menn0> the only way the peergrouper can be not reacting to the new machines is if they don't get to "started"
<wallyworld> so it watches for machine status changes
<menn0> not really
<menn0> it watches for controller changes as well as waking up periodically
<wallyworld> that's what "started" is isn't it? a machine agent status
<menn0> and when it wakes up due to either a poll or a controller change
<menn0> it checks the controller machines against the mongodb replicaset config
<menn0> and updates things if required
<menn0> but controller machines are only considered if they're "started"
<menn0> yes machine agent status
<menn0> obtained via (state.)Machine.Status()
<wallyworld> so, can we see in the logs if that SetStatus() api call is being made
 * menn0 checks
<wallyworld> by the new agent when it comes up via jujud
<menn0> wallyworld: they are being made:
<menn0> 2016-04-14 21:01:33 DEBUG juju.apiserver apiserver.go:291 <- [475] machine-1 {"RequestId":67,"Type":"Machiner","Version":1,"Request":"SetStatus","Params":"'params redacted'"}
<menn0> 2016-04-14 21:01:33 DEBUG juju.apiserver apiserver.go:305 -> [475] machine-1 324.478971ms {"RequestId":67,"Response":"'body redacted'"} Machiner[""].SetStatus
<menn0> don't know what the status is being set to but the calls are there
 * menn0 checks the machine-1 logs at that time
<wallyworld> if you run with trace you'll see the params
<wallyworld> it sounds like maybe the watcher is suspect?
<menn0> wallyworld: I can't repro the problem reliably, it's intermittent
<wallyworld> awesome
<menn0> so I'm stuck with the DEBUG logs from the CI failures
<menn0> I think the machine was set to started
<menn0> 2016-04-14 21:01:33 INFO juju.worker.machiner machiner.go:105 "machine-1" started
<wallyworld> sounds like it
<menn0> so why the hell isn't the peergrouper noticing the machine.....
<wallyworld> so we need to know why the watcher is not firing or why the peergrouper doesn't see that
<wallyworld> exactly
<menn0> if the logs were at TRACE I'd see why in the peergrouper logs
<wallyworld> we can ask QA i guess
<menn0> wallyworld: I've just simulated a machine not setting its status to started (by hacking the machiner to not set the status for a specific tag) and I see exactly the same log output in the machine-N.logs and the mongodb logs.
<menn0> wallyworld: so that's the likely intermediate cause
<menn0> wallyworld: now to figure out why the machine isn't "started" when the peergrouper looks
<wallyworld> menn0: hmmm, interesting. maybe there's a network/routing issue at play?
<menn0> wallyworld: it can't be that because we see the new machines contact the API on the bootstrap node
<menn0> we even see it make the SetStatus call
<menn0> wallyworld: my guess is that something is setting the status to another value after the machiner sets it to started
<menn0> wallyworld: is that possible to your knowledge?
<wallyworld> not ottomh
<wallyworld> but could be i guess
<wallyworld> the logs should show extra SetStatus calls
<menn0> wallyworld: there aren't any
<wallyworld> menn0: could there be a legit issue with the watcher firing?
<wallyworld> it does get there eventually right?
<wallyworld> but after a long time
<menn0> wallyworld: well it polls every minute
<wallyworld> that is true
<wallyworld> so wtf
<menn0> wallyworld: so even if the watcher doesn't fire the peergrouper still sees any changes at least once a minute
<wallyworld> yes
<menn0> wallyworld: the other way the peergrouper might not be seeing a controller machine is if it's not in the controllerinfo doc
<menn0> wallyworld: but that seems less likely based on my read of the code
<wallyworld> yeah, i'm not overly familiar with the controllerInfo state code
<menn0> wallyworld: the code that updates it looks solid... it's either going to work or the whole txn which adds to it and adds the machine docs fails
<menn0> and the machine docs are clearly being added
<wallyworld> menn0: could it be one of those corner cases we've hit before with txn etc?
<menn0> wallyworld: maybe... I don't think so
<menn0> i've just noticed something else in the peergrouper code... digging some more
<wallyworld> ok
<menn0> wallyworld: I just noticed that the list of controller machines and machine status is only updated/checked based on the watcher
<menn0> wallyworld: and the code to do that is horrid
<wallyworld> menn0: you saying the poll every minute relies on the watcher firing?
<mup> Bug #1571476 opened: "juju register" stores password on disk <juju-core:Triaged> <https://launchpad.net/bugs/1571476>
<mup> Bug #1571477 opened: juju 1.25.3: juju-run symlink to tmpdir <landscape> <juju-core:New> <https://launchpad.net/bugs/1571477>
<wallyworld> to update the machine list to poll
<mup> Bug #1571478 opened: juju login/register should only ask for password once <juju-core:Triaged> <https://launchpad.net/bugs/1571478>
<menn0> wallyworld: no the poll happens every minute regardless
<wallyworld> right
<wallyworld> but the list to poll
<wallyworld> comes frm the watcher?
<menn0> wallyworld: and during that poll the latest status of the mongo replicaset is updated
<menn0> wallyworld: but the controller data from state is only refreshed based on the watchers
<menn0> wallyworld: and the way it's done screams of data race to me
<wallyworld> seems like it :-(
<menn0> there's a separate goroutine which passes a method on itself over a channel to the main peergrouper goroutine
<wallyworld> i wonder why we poll at all?
<menn0> the peergrouper receives the method, calls it
<menn0> which then goes and modifies stuff back on the main watcher again
<wallyworld> sounds like a good refactoring is in order
<menn0> yep, I might try that
<menn0> but first I'm going to land some simple logging changes so that we can get more info when it fails in CI
<wallyworld> sgtm
<menn0> wallyworld: thanks for being a sounding board all day... it helps a lot having to explain what i'm seeing
<wallyworld> tis ok, i didn't do much
<wallyworld> or anything really :-)
<wallyworld> axw: any chance of a small review? https://github.com/juju/bundlechanges/pull/22
<axw> wallyworld: sure
<axw> wallyworld: a little confused by the second paragraph in the description - I don't see any change relating to series override
<axw> wallyworld: is that a change to be made in juju/juju?
<wallyworld> axw: a test failed - the bundle said trusty but the charm in the bundle said precise. the test expected trusty, and i changed it to precise
<wallyworld> actually, that may have been me adding trusty first up to the test
<wallyworld> and then having to change to precise
<wallyworld> so maybe ignore that
<axw> wallyworld: yeah, there are only additions in the code and tests
<wallyworld> yeah, i just checked too, so i think i just forgot what i changed/added
<axw> wallyworld: LGTM, please drop that para before merging to avoid confusing anyone else :p
<wallyworld> yep :-)
<wallyworld> ty
<menn0> wallyworld: peergrouper logging changes: http://reviews.vapour.ws/r/4623/
<menn0> back soon
<wallyworld> awesome, will look in a sec
<wallyworld> axw: here's the other half of that fix, only a small change http://reviews.vapour.ws/r/4624/
<axw> wallyworld: reviewed
<wallyworld> ty
<wallyworld> axw: yeah, i'll need to rework the common function. technically it is "user" specified (not charm specified) but i take the point about the message
<wallyworld> well, i guess not really user specified
<axw> wallyworld: no, I don't think so. in the original use of that function, it was user-specified because the user specifies with --series on the command line
<axw> here they're just doing "juju deploy some-bundle"
<wallyworld> yeah, i reconsidered my position :-)
<axw> cool
<frobware_> jam: didn't get anywhere with tests (re RB). Was testing some stuff that dooferlad was proposing.
<dimitern> frobware: morning
<dimitern> frobware: I have 2 PRs up for review, more coming later
<dimitern> http://reviews.vapour.ws/r/4614/ and https://github.com/juju/gomaasapi/pull/42
<dimitern> frobware: ping
<frobware> dimitern: morning
<dimitern> frobware: morning :)
<dimitern> frobware: have you seen the links I pasted above?
<dimitern> (still not quite sure ERC isn't shitting itself and appearing connected when it isn't)
<frobware> just about to look, but have a 1:1 with jam in a minute
<dimitern> frobware: sure, np
<jam> frobware: didn't like what I had to say?
<dimitern> wallyworld: ping
<dimitern> or axw ?
<axw> dimitern: heya, what's up?
<dimitern> I'm wondering why juju list-machines (or status for that matter) does not display containers
<axw> dimitern: erm, it doesn't? no idea
<axw> I haven't used containers in ages
<dimitern> axw: ok, np :)
<dimitern> I think, unless it's on purpose, it should be a bug
<mup> Bug #1571545 opened: juju status with default tabular format or juju list-machines does not show containers <observability> <status> <juju-core:New> <https://launchpad.net/bugs/1571545>
<hoenir> why do some test files on windows, like fork/exec, fail? I know the fork syscall doesn't exist on windows, but we should try to detect the specific platform and then run just the specific tests
<hoenir> am I right?
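For what hoenir is asking: Go projects usually keep platform-specific code (and its tests) out of the build entirely rather than detecting the platform at runtime, via filename suffixes or build tags; `runtime.GOOS` covers per-test skips. A sketch — the helper name is illustrative, not juju's:

```go
package main

import (
	"fmt"
	"runtime"
)

// Platform-specific files are excluded at compile time: a file named
// upstart_linux.go only builds on Linux, foo_windows_test.go only on
// Windows. Build-tag comments work too ("// +build !windows" on older
// Go releases, "//go:build !windows" on modern ones). For a single
// test, checking runtime.GOOS and calling t.Skip is the usual escape:

// skipOnWindows reports whether a Unix-only test should be skipped for
// the given GOOS value.
func skipOnWindows(goos string) bool {
	return goos == "windows"
}

func main() {
	fmt.Println(skipOnWindows("windows")) // true
	fmt.Println(skipOnWindows(runtime.GOOS))
}
```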
<fwereade_> dimitern, frobware, voidspace: sorry, dropped accidentally, but won't have much to contribute to topic beyond "I endorse the removal of hacks"
<wallyworld> dimitern: hey, sorry was afk
<dimitern> wallyworld: np - it's rather late anyway
<frobware> dimitern: re ntpdate: https://bugs.launchpad.net/bugs/1564397
<mup> Bug #1564397: MAAS provider bridge script deletes /etc/network/if-up.d/ntpdate during bootstrap <bootstrap> <network> <juju-core:Triaged> <https://launchpad.net/bugs/1564397>
<dimitern> wallyworld: check out https://launchpad.net/bugs/1571545
<mup> Bug #1571545: juju status with default tabular format or juju list-machines does not show containers <observability> <status> <juju-core:New> <https://launchpad.net/bugs/1571545>
<wallyworld> dimitern: i do have a question for you - i started to look at removing the Network attr from deploy service and it was a very deep rabbit hole and many 100s of lines of code that i started to delete and then i noticed i started to overlap with one of your existing prs so backed off
<wallyworld> dimitern: i am hoping all that network stuff - including collections, params structs etc - can all be deleted for 2.0 rc1
<dimitern> wallyworld: ah, well - yeah, it's gnarly but we'll get there and drop it soonish I hope
<wallyworld> dimitern: yeah, it needs to be done before 2.0 final
<dimitern> it's no longer used and can't influence the code path anymore
<wallyworld> that status thing - it may not have ever included containers, not sure, but seems like a bug
<wallyworld> dimitern: but the api does expose it etc, so we need to drop that bit at least
<dimitern> it's fine to drop the CLI argument (if still there)
<dimitern> i.e. just hide it until it can be dropped
<wallyworld> dimitern: the model migration stuff also references the obsolete collection(s)
<wallyworld> it would be best just to drop the whole lot; we will delete a couple of 1000 lines of code i think
<dimitern> wallyworld: ah, those I *think* are safer to remove now
<dimitern> wallyworld: let me have a look today how much we can drop
<dimitern> wallyworld: networksC is now only referenced by the opened ports, but it should be easy to move that to spaces instead
<wallyworld> dimitern: ok, let me know how you get on. i deleted a shit tonne of stuff from state, apis, params etc before i started to hit the address allocation feature flag stuff
<wallyworld> so i stopped
<dimitern> wallyworld: yeah, I really wanted to drop the whole thing, but it was suggested as safer to drop it on maas only for now
<wallyworld> and yeah, the ports thing i wasn't sure about
<wallyworld> dimitern: not sure 100%, but i think the status thing was by design - [Machines] just really does mean machines and not containers
<wallyworld> to keep it not too verbose and to fit on a screen
<dimitern> well containers *are* machines :)
<wallyworld> not sure if we should keep it like that and add a --with-containers arg
<wallyworld> i'm just guessing
<wallyworld> what the rationale may have been
<wallyworld> but yeah, i agree with you
<dimitern> it will be nice to not have to go through a pile of  yaml just to get what addresses containers have
<wallyworld> agreed
<dimitern> frobware, voidspace, babbageclunk: friendly review poke :) https://github.com/juju/gomaasapi/pull/42 http://reviews.vapour.ws/r/4614/ http://reviews.vapour.ws/r/4626/
 * dimitern steps out for a while
<babbageclunk> dimitern: reviewed the first two - the third's getting pretty far away from anything I know about, so might take me a bit longer.
<jamespage> erm
<mup> Bug #1571593 opened: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593>
<mup> Bug #1571593 changed: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593>
<mup> Bug #1571593 opened: lxd bootstrap fails with unhelpful 'invalid config: no addresses match' <juju-core:New> <https://launchpad.net/bugs/1571593>
<babbageclunk> frobware: I had to destroy that ZNC service - it was hogging my nick here! But I still couldn't work out if it was listening on any local ports.
<frobware> babbageclunk: shame
<dimitern> cheers babbageclunk!
<babbageclunk> dimitern: :) oops, forgot to ping you when I'd done them!
<dimitern> babbageclunk: np, I've just got back anyway
<babbageclunk> frobware: I'll give it another go later on.
<voidspace> dimitern: did you get your reviews done?
<dimitern> voidspace: yeah, most of them - I'd appreciate a look on the last one though: http://reviews.vapour.ws/r/4626/
<voidspace> dimitern: I'll swap it: http://reviews.vapour.ws/r/4629/
<dimitern> voidspace: sure thing
<voidspace> dimitern: I like the logging changes :-)
<voidspace> dimitern: I have no new issues to add to the reviews already there
<dimitern> :) I'm sure *everybody* does heh
 * voidspace lunches
<dimitern> voidspace: ta
<voidspace> babbageclunk: I'll pick up something new after lunch
<voidspace> babbageclunk: we're nearly there!
<dimitern> voidspace: reviewed
<babbageclunk> voidspace - around? Want to pick your brains about how devicename and hardware id are set.
<dimitern> frobware, voidspace, babbageclunk: guys, I still need an approval on http://reviews.vapour.ws/r/4626/ - please, have a look
<dimitern> frobware: you know what? /etc/network/if-up.d/ntpdate is missing on xenial - it's only there on trusty
<dimitern> (well, maybe also in more recent non-LTS *releases)
<frobware> dimitern: is that because ntpdate is not in the base image on xenial?
<dimitern> frobware: I suspect so - it's not on my machine after upgrading to xenial, nor is it on freshly deployed xenial maas nodes with the most recent images
<frobware> dimitern: but technically it could come back (i.e., later versions add it (again))
<mgz> charms can also install packages.
<mgz> pretty sure neither ntp nor ntpdate has ever been part of the base server image
<dimitern> true, but I can confirm ntpdate is there on trusty and not there on xenial images
<dimitern> even without juju in the picture
<dimitern> frobware: it could, but the code handles that transparently
<fwereade_> voidspace, state/machine.go:1262 seems like it might not be quite right -- surely the preferred address should be allowed to change if it's no longer one of the know addresses?
<dimitern> (chmod -f -x .. and later chmod -f +x ..)
<frobware> dimitern: do we fail on chmod or just ignore?
<dimitern> chmod -f does not fail when the file is missing
<frobware> dimitern: do we fail if chmod fails?
<frobware> dimitern: ty
<frobware> dimitern: and chmod -f is supported as an option in precise?
<dimitern> frobware: unfortunately I could see a bunch of ntpdate still hanging around with the chmod -x patch, as it seems ifup calls `/bin/sh /etc/network/if-up.d/ntpdate`
<frobware> dimitern: yay
<dimitern> frobware: we *could* try using ifup --no-scripts, but that seems more dangerous
<frobware> dimitern: ooh. interesting.
<dimitern> frobware: I'll do some experiments to see
<frobware> dimitern: we *should* try this. the scripts will have run once for curtin's ENI, and they will run on every reboot. Just not whilst we're replacing stuff.
<frobware> dimitern: at face value that seem ok
<dimitern> frobware: that's an excellent point (which I keep forgetting about)
<frobware> dimitern: is --no-scripts supported in precise? :)
<dimitern> frobware: will try precise as well
<frobware> dimitern: fwiw, I don't think '--no-loopback' is supported in precise
<frobware> dimitern: nope, not supported.
<frobware> dimitern: http://pastebin.ubuntu.com/15912945/
<dimitern> frobware: unfortunately --no-scripts does not work even on xenial
<dimitern> frobware: that is, it works, but since one of the scripts is `bridge`, the bridges are not configured ok
<frobware> dimitern: what does 'bridge' mean here?
<dimitern> frobware: /etc/network/if-pre-up.d/bridge -> /lib/bridge-utils/ifupdown.sh*
<frobware> dimitern: which doesn't exist on precise... let me look elsewhere
<dimitern> frobware: it should be there if bridge-utils is installed
<fwereade_> voidspace, ignore me
<dimitern> babbageclunk, frobware: how about `verifySubnetAliveUnlessMissing(cidr) error` ? it will still return no error if cidr does not match an existing subnet
<dimitern> would that be easier to follow?
<natefinch> lol
<babbageclunk> dimitern: a bit, although it's still got too many clauses in the name
<natefinch> if you have to make a function into a sentence, you probably need more than one function
<dimitern> what's wrong with descriptive names?
<dimitern> or that's just not how real go devs roll :D
<babbageclunk> dimitern: hang on, I'm typing up what I mean
<babbageclunk> dimitern: nothing wrong with descriptive names, it's that the thing shouldn't be one function if its name has to be too descriptive.
<frobware> dimitern, natefinch: fwiw, that was my original concern in the PR
<natefinch> generally if you need a name that specific, it means you're tying the implementation of that function too tightly into what your consumer needs.  Just split it into two functions that do two simple things
<dimitern> ok, a better option will be I think to return a concrete error in the case the subnet does not exist, so it can be verified where needed
<natefinch> granted, I don't know what that function does per se, but it seems something like if subnetExists(cidr) { return verifyAlive(cidr) } is probably clearer and the individual functions are more reusable
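natefinch's two-function split, combined with the sentinel-error idea dimitern floats just above, sketches out roughly as below. The `subnets` map and every name here stand in for juju's real state code and are assumptions for illustration only:

```go
package main

import (
	"errors"
	"fmt"
)

// subnets maps CIDR -> alive, standing in for real state lookups.
var subnets = map[string]bool{
	"10.0.0.0/24": true,  // exists, alive
	"10.0.1.0/24": false, // exists, dead
}

// errNotAlive is the concrete error callers can check for.
var errNotAlive = errors.New("subnet is not alive")

func subnetExists(cidr string) bool {
	_, ok := subnets[cidr]
	return ok
}

func verifyAlive(cidr string) error {
	if !subnets[cidr] {
		return errNotAlive
	}
	return nil
}

// verifySubnetAliveUnlessMissing composes the two simple functions:
// a missing subnet is not an error, a dead one is.
func verifySubnetAliveUnlessMissing(cidr string) error {
	if subnetExists(cidr) {
		return verifyAlive(cidr)
	}
	return nil
}

func main() {
	fmt.Println(verifySubnetAliveUnlessMissing("10.0.0.0/24")) // <nil>
	fmt.Println(verifySubnetAliveUnlessMissing("10.0.1.0/24")) // subnet is not alive
	fmt.Println(verifySubnetAliveUnlessMissing("10.0.9.0/24")) // <nil> (missing: ignored)
}
```

Each small function is reusable on its own, and a caller that does care about missing subnets can test `subnetExists` directly instead of needing yet another long-named variant.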
<voidspace> fwereade_: ok, I will ignore you
<voidspace> babbageclunk: you there?
<babbageclunk> voidspace: yup, just typing up something
<babbageclunk> dimitern: https://pastebin.canonical.com/154572/
<babbageclunk> dimitern: Maybe?
<babbageclunk> voidspace: yup?
<voidspace> babbageclunk: do you still have questions?
<babbageclunk> voidspace: yes!
<babbageclunk> voidspace: hangout?
<voidspace> babbageclunk: why do you want to ask about new device name?
<natefinch> anyone up for a review of a 2.0 bug?  http://reviews.vapour.ws/r/4616/diff/#
<voidspace> babbageclunk: sure
<dimitern> babbageclunk: I'm trying out a similar approach, will update the PR soon with it
<frobware> jam: I think the bits we need to expose to apply NICs to the container is: SetContainerConfig(container, key, value string)
<babbageclunk> dimitern: cool
<babbageclunk> dimitern: can I pick your brains about /list
<babbageclunk> dimitern: oops, that is not what I meant to type
<dimitern> babbageclunk: yeah? :)
<babbageclunk> dimitern: what I meant to type was: provider/maas/volumes.go
<dimitern> babbageclunk: I'm not *that* familiar with it, but I'll help with what I can
<dimitern> babbageclunk: HO?
<babbageclunk> dimitern: ok thanks - voidspace was not much help!
<babbageclunk> dimitern: yup
<dimitern> frobware, babbageclunk: updated http://reviews.vapour.ws/r/4626/diff/ can you have another look please?
<voidspace> dimitern: looking
<babbageclunk> Gah, X keeps crashing on me. :(
<voidspace> dimitern: I'm landing my branch - the only issue you opened was invalid and your other two comments I addressed
<voidspace> dimitern: (you suggested changing map[string]bool to set.Strings but the bool has significance, it isn't just a set)
<dimitern> voidspace: sure, sounds good
<voidspace> dimitern: the map tracks which subnets we actually found (true/false)
<voidspace> dimitern: cool
<dimitern> voidspace: we should bump deps.tsv for gomaasapi at some point as well
<voidspace> dimitern: it's been bumped whenever needed
<voidspace> dimitern: last time was on Friday
<dimitern> voidspace: ok
<voidspace> dimitern: I don't think anything has been done since then that needs updating
<dimitern> voidspace: my PR that fixes fetching VLANs with a null name
<dimitern> (landed earlier)
<voidspace> dimitern: ah, cool
<voidspace> dimitern: want me to do just that and propose it?
<voidspace> dimitern: I just knocked one more thing off the maas2 list and was about to tackle the next
<frobware> dimitern: looking
<dimitern> voidspace: well, as it's not a blocker for your maas I can do it later tonight or tomorrow
<voidspace> dimitern: ok
<dimitern> frobware: thanks!
<dimitern> babbageclunk: updated/simplified http://reviews.vapour.ws/r/4626/diff/ (you dropped and missed this I think)
<voidspace> dimitern: LGTM on your branch
<dimitern> voidspace: ta!
<mup> Bug #1571687 opened: Azure-arm leaves machine-0 from the admin model behind <azure-provider> <ci> <destroy-controller> <jujuqa> <repeatability> <juju-core:Triaged> <https://launchpad.net/bugs/1571687>
<voidspace> mgz: ping
<katco`> ericsnow: standup time
<mgz> voidspace: yo
<babbageclunk> dimitern: looking now
<dimitern> thanks babbageclunk
<voidspace> mgz: just emailed you
<voidspace> mgz: I thought it was better to do as an email anyway
<voidspace> mgz: we'd like all the MAAS CI tests duplicating for MAAS 2.0 please :-)
<mgz> voidspace: we have a card for it
<voidspace> mgz: ah, awesome
<voidspace> mgz: we're very near to needing it
<voidspace> mgz: anytime tomorrow will be fine ;-)
<voidspace> :-P
<mgz> what's not working on master at present?
<voidspace> mgz: we don't add machine tags in instance characteristics (in progress)
<voidspace> mgz: instance.volumes unimplemented (in progress)
<voidspace> mgz: all container support not done yet (a couple of days work probably)
<voidspace> mgz: a network interface function that is implemented but not wired in
<voidspace> mgz: (that's trivial)
<mgz> hm, so the basic deploy test should work, but the bundle ones probably won't quite yet
<voidspace> mgz: yep
<voidspace> mgz: but it will only be a handful of days which is why I'm pinging now
<mgz> thanks :)
<voidspace> mgz: and thanks to you sir
<natefinch> cherylj: I commented on https://bugs.launchpad.net/juju-core/+bug/1531444 ... Maybe I'm missing some context, but it seems like it's probably not super critical
<mup> Bug #1531444: azure: add public mapping of series->Publisher:Offering:SKU <juju-core:Triaged> <https://launchpad.net/bugs/1531444>
<cherylj> natefinch: it was marked as critical as it impacted our ability to publish centos / windows in streams for azure
<babbageclunk> Is there a protocol for asking for help in canonical #maas?
<babbageclunk> Someone particular I should ask?
<cherylj> babbageclunk: I usually ask roaksoax, or mpontillo
<natefinch> cherylj: yes, but if it's only when a new version of windows comes out... we need to update core for that anyway (which, admittedly is horrible and bad, but it is the state of the code AFAIK)
<voidspace> babbageclunk: roaksoax has promised to help us
<dimitern> babbageclunk: allenap, mpontillo, roaksoax, blake_r
<mgz> it's not a very lively channel
<mgz> but gavin is our timezone, and sometimes in hitting range of me (allenap)
<cherylj> natefinch: thanks for checking it out, I'll bring it up again today to better understand the blockage
<babbageclunk> voidspace: or maybe he *vowed* to help us?
<babbageclunk> Ok, thanks
<voidspace> babbageclunk: uhm, maybe I guess...
<babbageclunk> voidspace: It would be more dramatic.
<voidspace> babbageclunk: it certainly would be
<babbageclunk> voidspace: Never promise when you could vow
<voidspace> babbageclunk: heh, sound life advice there
<voidspace> babbageclunk: frobware: dimitern: a really difficult one http://reviews.vapour.ws/r/4630/
<dimitern> voidspace: looking
<frobware> dimitern: does it need a test where there are no tags?
<dimitern> voidspace: :) LGTM
<voidspace> dimitern: thanks
<dimitern> frobware: that's up to gomaasapi I think - it should handle the lack of tags as an empty slice (or nil)
<dimitern> voidspace: ^^
<voidspace> dimitern: frobware: yep, it will just be an empty slice
<frobware> ok
<voidspace> dimitern: frobware: calling gomaasapi gives us a struct with a Tags member - so either there's something there or there isn't. It doesn't matter.
<voidspace> or rather, an interface with a Tags method
<dimitern> voidspace: cool
<voidspace> frobware: thanks
<dimitern> voidspace: also I doubt you have any tags on your vmaas vms - otherwise that would've been noticed earlier :)
<dimitern> (I mean if it's a panic or something nasty like that)
<voidspace> indeed
<voidspace> babbageclunk: don't forget to update the status doc - ta!
<dimitern> voidspace, babbageclunk, frobware: http://reports.vapour.ws/releases/3899/job/run-unit-tests-race/attempt/1338#highlight it might be worth running provider/maas tests with '-race' a few times to find and fix that
<voidspace> dimitern: I've added it as a TODO on the status doc
<dimitern> voidspace: +1
<frobware> voidspace, babbageclunk, dimitern: reminder - no rick call today
<dimitern> frobware: ok
 * dimitern bbl
<babbageclunk> dimitern, voidspace: whoa, that data race is weird - can someone explain it to me a bit?
<mup> Bug #1519877 changed: 'juju help' Provider information is out of date <juju-core:Invalid> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1519877>
<voidspace> babbageclunk: I looked at it, went "whoa" and stopped looking at it
<voidspace> babbageclunk: as you might guess, the race detector detects possible race conditions between goroutines
<voidspace> babbageclunk: so it shouldn't be *too* hard to work out
<babbageclunk> voidspace: Ah, I think I get it - it's the fact we store the filename in GetFile.
<babbageclunk> voidspace: In fakeController.
<voidspace> babbageclunk: if you think you can fix it then awesome
<voidspace> babbageclunk: ah, right
<voidspace> sounds likely
<babbageclunk> voidspace: do you think I should put locking on all of the fakeController methods that store state on the controller for later?
<babbageclunk> voidspace: may as well, right?
<voidspace> babbageclunk: if it's not too much work
<voidspace> babbageclunk: I don't really like "just in case" code
<voidspace> babbageclunk: but locking is perhaps an exception
<babbageclunk> It's only a few methods
<mup> Bug #1571737 opened: Race is mass provider storage <ci> <maas-provider> <race-condition> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1571737>
<cherylj> babbageclunk, voidspace, so who gets bug 1571737?  :)
<mup> Bug #1571737: Race is mass provider storage <ci> <maas-provider> <race-condition> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1571737>
<babbageclunk> cherylj: me, me!
<cherylj> we have a winner!
<babbageclunk> cherylj: fixing it now
<cherylj> yay!
<cherylj> thank you, babbageclunk :)
<babbageclunk> cherylj: :)
<cherylj> babbageclunk: what's your lp ID?
<babbageclunk> cherylj: hmm, good question - checking now
<babbageclunk> cherylj: 2-xtian
<cherylj> yeah, I never would've guessed that
<cherylj> thanks, babbageclunk :)
<frobware> me neither
<frobware> and I'm sure you told me this a few weeks ago
<babbageclunk> voidspace, dimitern, frobware: review my data race fix please? http://reviews.vapour.ws/r/4631/
<voidspace> babbageclunk: LGTM
<babbageclunk> voidspace: sweet. What's the protocol for closing bugs? Will anything update it automatically on merge if I put a tag on the PR, or do I just close it manually?
<voidspace> babbageclunk: assign it to yourself, mark it in progress
<voidspace> babbageclunk: then once the fix lands mark it fix committed
<voidspace> babbageclunk: QA are responsible for marking it fix released (effectively closing it)
<voidspace> babbageclunk: I don't *think* there's anything auto here
<mgz> our release process does auto-fix-released bugs targetted at the milestone
<voidspace> mgz: cool
<voidspace> babbageclunk: you should probably target the bug at the latest 2.0 beta/rc or whatever the latest is then
<mgz> yeah, rc1
<mup> Bug #1570035 changed: Race in api/watcher/watcher.go <ci> <race-condition> <regression> <test-failure> <juju-core:Fix Released by natefinch> <https://launchpad.net/bugs/1570035>
<mup> Bug #1570994 changed: deploy fails to download updated local charm <juju-core:New> <https://launchpad.net/bugs/1570994>
<babbageclunk> When you say target it at 2.0-rc1 - is that the milestone? (It's already set to that.)
<voidspace> babbageclunk: yes
<babbageclunk> Ok, I've marked it as in-progress, and when the merge passes (as I'm sure it will with no flaky tests!) I'll change it to fix-committed.
<voidspace> babbageclunk: don't forget status doc
<babbageclunk> voidspace: haven't!
<voidspace> babbageclunk: :-p
<babbageclunk> voidspace: no, I mean I haven't updated it ;)
<voidspace> hah
<voidspace> I know you haven't
<babbageclunk> voidspace: but I will.
<voidspace> ok
<voidspace> you do that
<babbageclunk> voidspace: ding!
<mup> Bug # changed: 1556113, 1556146, 1556180, 1558901
<natefinch> ericsnow, katco`: if you're looking to break up your day, you could review my bugfix from last week: http://reviews.vapour.ws/r/4616/
<ericsnow> natefinch: will take a look in a bit
<katco> natefinch: same
<mup> Bug #1571783 opened: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783>
<mup> Bug #1571783 changed: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783>
<mup> Bug #1571783 opened: Windows unit tests cannot setup under go 1.6 <ci> <go1.6> <jujuqa> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1571783>
<natefinch> I really wish juju status would print out the controller name and model name I'm looking at
<perrito666> natefinch: open a bug
<mup> Bug #1571792 opened: Juju status should show controller and model names <juju-core:New> <https://launchpad.net/bugs/1571792>
<natefinch> cmars: I'm looking at https://bugs.launchpad.net/juju-core/+bug/1566130  but I can't reproduce it with a trivial install hook that just does an exit 1... do you still have a good repro?
<mup> Bug #1566130: awaiting error resolution for "install" hook <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1566130>
<cmars> natefinch, try pulling cs:~cmars/gogs and introducing an install hook error in reactive/gogs.py
<redir> anyone have a minute to rubber duck something with me?
<cmars> natefinch, maybe raise Exception("foo") in there
<natefinch> cmars: ok, I'll give it a try, thanks
<cmars> natefinch, then, "fix" it, do `juju upgrade-charm gogs --force-units`
<cmars> natefinch, then possibly juju resolved --retry gogs/0
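Pulling cmars' steps together, a hedged repro sketch for bug 1566130 (the local charm path and the exact hook edit are illustrative, not from the bug report):

```shell
# Repro sketch for bug 1566130, consolidating the steps above.
# Assumes a bootstrapped juju 2.0 controller.
#
# 1. Obtain the cs:~cmars/gogs charm source locally and make the install
#    hook fail, e.g. raise Exception("foo") in reactive/gogs.py.
# 2. Deploy the broken charm and wait for the install hook error:
juju deploy ./gogs
juju status
# 3. "Fix" the hook, then force the upgrade and retry the unit:
juju upgrade-charm gogs --force-units
juju resolved --retry gogs/0
# The bug: the unit can remain stuck at "awaiting error resolution for
# 'install' hook" even after the retry.
```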
<redir> katco: who would I ping about zseries information?
<mup> Bug #1556155 changed: worker/periodicworker data race <race-condition> <juju-core:Fix Released> <https://launchpad.net/bugs/1556155>
<mup> Bug #1570219 changed: juju2 openstack provider setting default network <canonical-bootstack> <network> <openstack-provider> <juju-core:Fix Released> <https://launchpad.net/bugs/1570219>
<katco> redir: sec
<redir> np
<katco> redir: i answered on internal network in case you missed it
<redir> I did
<natefinch> oh weird, we automatically retry failed hooks now?
<natefinch> wallyworld: you around?
<mgz> natefinch: yeah, see dev thread in jan, from message from bogdan in nov
<mgz> did his followup changes all get reviewed?
<mgz> I saw at least one hanging around for a while
<natefinch> Oh yeah, I remember that thread now
<natefinch> mgz: no idea about reviews
<natefinch> mgz: was wondering if maybe the retry code had something to do with the bug I'm looking at: https://bugs.launchpad.net/juju-core/+bug/1566130
<mup> Bug #1566130: awaiting error resolution for "install" hook <juju-core:Triaged by natefinch> <https://launchpad.net/bugs/1566130>
<mgz> natefinch: seems possible at least - don't have a tighter revision window to check for you I'm afraid, as we don't have tests for actually borked charms.
<natefinch> *nod*
<mgz> should at least have something exercising resolved --retry and the like, wouldn't be that hard to add.
<wallyworld> natefinch: sorta
<natefinch> wallyworld: np, got an answer elsewhere
<redir_afk> I'll be back later this eve, but not sure what time yet.
<katco> redir_afk: gl
<mup> Bug #1571831 opened: TxnPrunerSuite.TestPrunes intermittent test failure <juju-core:New> <https://launchpad.net/bugs/1571831>
<mup> Bug #1571832 opened: Respect the full tools list on InstanceConfig when building userdata config script. <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1571832>
<natefinch> cmars: that install bug... have you been able to reproduce it with latest master?  I tried a few different ways of having install fail, to no avail.
<cmars> natefinch, a very recent 2.0rc1 master. i'll try in a few min & demonstrate
<natefinch> cmars: thanks
<cmars> natefinch, updated the bug with really precise instructions. confirmed it's still there with latest master. the trick is to upgrade-charm --force-units after you get the hook error
<mup> Bug #1571855 opened: User lacking model write access confronted with unhelpful message <docteam> <juju-core:New> <https://launchpad.net/bugs/1571855>
<mup> Bug #1571861 opened: juju upgrade-charm requires --switch for local charms <juju-core:New> <https://launchpad.net/bugs/1571861>
<axw> wallyworld: I'm thinking of removing the --generate flag from change-user-password. it currently doesn't tell you what it generated, so not very helpful; and really, you should just use a password manager if you want that
<wallyworld> axw: sgtm i think
#juju-dev 2016-04-19
<mup> Bug #1571901 opened: "juju change-user-password --generate" is unhelpful <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1571901>
<wallyworld> cmars: a really small one http://reviews.vapour.ws/r/4635/
<wallyworld> if you have a chance
<cmars> wallyworld, sure, looking
<cmars> wallyworld, awesome, thanks!
<cmars> LGTM
<wallyworld> thanks for raising the bug
 * menn0 loves the lxd provider ... so much easier to test HA
<natefinch> menn0: yeah, the lxd provider is awesome. I love how non-special it is.
<mup> Bug #1571914 opened: github.com/juju/juju/cmd/jujud unit tests fail if xenial is the LTS <juju-core:Triaged> <https://launchpad.net/bugs/1571914>
<mup> Bug #1571916 opened: github.com/juju/juju/environs unit tests fail if xenial is the LTS <juju-core:Triaged> <https://launchpad.net/bugs/1571916>
<mup> Bug #1571917 opened: github.com/juju/juju/juju unit tests fail if xenial is the LTS  <juju-core:Triaged> <https://launchpad.net/bugs/1571917>
<menn0> wallyworld or axw: big peergrouper refactoring/cleanup http://reviews.vapour.ws/r/4636/
<wallyworld> menn0: does it fix the issue?
<menn0> wallyworld: it might... it can't be reproduced easily (except in CI it seems)
<wallyworld> ok
<axw> menn0: will look shortly
<menn0> wallyworld: there were certainly some suspicious racy looking areas which I've eliminated
<wallyworld> code will be much better regardless
<menn0> wallyworld: and these changes are very worthwhile regardless... there were some truly horrible bits
<wallyworld> so it seemed
<mup> Bug #1567161 changed: juju2 beta3, cannot download charm, failed to download, 400 <deploy> <landscape> <mongodb> <juju-core:Incomplete> <https://launchpad.net/bugs/1567161>
<mup> Bug #1568176 changed: charm deployment requests invalid revision number <charms> <juju-core:New> <https://launchpad.net/bugs/1568176>
<natefinch> why is there destroy controller, model, relation, service, unit, user.... but no destroy-machine?
<menn0> wallyworld: thanks... I already fixed the typo in diff 2
<menn0> :)
<wallyworld> menn0: changes look much nicer than what was there
<menn0> wallyworld: by tomorrow we should know if they helped
<wallyworld> indeed
<wallyworld> natefinch: destroy is going away to be replaced by remove
<wallyworld> for machines, relations, services, units, users
<natefinch> wallyworld: I'm fine with that as long as it's consistent.  Given we're at the code freeze, bugs-only phase.... there's not a lot of "going to be" left :)
<wallyworld> natefinch: we will make changes as necessary till 2.0 is as it needs to be
<wallyworld> the CLI work is still in progress
<wallyworld> natefinch: all the terminology has been thoroughly reviewed at the highest level and agreed to
<wallyworld> so that is what is being implemented
<natefinch> wallyworld: understood
<wallyworld> :-)
<mup> Bug #1571923 opened: destroy-machine should exist in 2.0 <juju-core:New> <https://launchpad.net/bugs/1571923>
<mwhudson> hi any tips on how to debug manual provider bootstrap failure?
<mwhudson> i just get this:
<mwhudson> 2016-04-19 03:39:14 ERROR juju.provider.manual provider.go:31 initializing ubuntu user: subprocess encountered error code 1 (Connection to 10.0.2.15 closed.)
<mwhudson> hm, how do i get trace output?
<cherylj> natefinch: do you know if you left any containers running on the CI arm machine?
<mwhudson> cherylj: was it you who had some problem with ssh terminating early and the manual provider a few months ago?
<natefinch> cherylj: I don't know but I can check
<natefinch> cherylj: I had, just cleaned them up
<mwhudson> oh
<mwhudson> nm
<mup> Bug #1571932 opened: github.com/juju/juju/provider/dummy unit tests fail if xenial is the LTS  <juju-core:Triaged> <https://launchpad.net/bugs/1571932>
<mup> Bug #1571933 opened: github.com/juju/juju/worker/provisioner unit tests fail if xenial is the LTS <juju-core:Triaged> <https://launchpad.net/bugs/1571933>
<mup> Bug #1571947 opened: bootstrap --upload-tools fails with "cannot start bootstrap instance: missing tools URL" <juju-core:Triaged> <https://launchpad.net/bugs/1571947>
<dimitern> katco: ping?
<dimitern> katco: probably too late for you, so I'll pick up bug 1567676 if you don't mind
<mup> Bug #1567676: windows: networker tries to update invalid device and blocks machiner from working <windows> <juju-core:In Progress by cox-katherine-e> <https://launchpad.net/bugs/1567676>
<mup> Bug #1571982 opened: Centos7 machines fail to run cloud-init on Azure <juju-core:Triaged> <https://launchpad.net/bugs/1571982>
<mwhudson> anyone here know much about how the lxd provider works?
<frobware> dimitern: ping, 1:1?
<dimitern> frobware: oops, omw - sorry
<mup> Bug #1572022 opened: status randomly shows one of the peer relations, not both <juju-core:New> <https://launchpad.net/bugs/1572022>
<fwereade> perrito666, please ping me when you have a bit of time to chat about restores, I want to validate my assumptions
<mwhudson> mgz: awake again already?
<voidspace> babbageclunk: dimitern: frobware: easy one to start the day http://reviews.vapour.ws/r/4638/
<dimitern> frobware, voidspace: I can confirm MAAS 2.0 imposes the same restriction (for both devices and machines) on physical interfaces (can only be linked to untagged vlans)
<dimitern> here's the python script converted to py3 and finally working: http://paste.ubuntu.com/15927539/
<dimitern> and here's a paste of the output showing the issue: http://paste.ubuntu.com/15927550/
<TheMue> morning btw
<dimitern> updated the script to make it easier to switch maas versions: http://paste.ubuntu.com/15927766/
<dimitern> frobware, voidspace: fyi, filed bug 1572070
<mup> Bug #1572070: MAAS 2.0 cannot link physical device interfaces to tagged vlans, breaking juju 2.0 multi-NIC containers <juju> <MAAS:New> <https://launchpad.net/bugs/1572070>
<axw> wallyworld: https://bugs.launchpad.net/juju-core/+bug/1571832  -- what do you think about changing the tools storage to combine controller+model catalogies (model overlaying), so any tools added to the controller are available in all models?
<mup> Bug #1571832: Respect the full tools list on InstanceConfig when building userdata config script. <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1571832>
<axw> catalogues*
<wallyworld> axw: i think that makes sense, since model tools are sort of related to those of the host controller anyway
<wallyworld> in that there's a version dependency
<axw> wallyworld: ok, I'll see what I can do later on then
<axw> azure is buggered on master atm, fixing that first
<wallyworld> ok
<dimitern> frobware, voidspace: http://reviews.vapour.ws/r/4639/ fixes bug 1567676, please have a look when you have a moment
<mup> Bug #1567676: windows: networker tries to update invalid device and blocks machiner from working <windows> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1567676>
<frobware> dimitern: looking
<dimitern> frobware: ta!
<frobware> dimitern: done
<dimitern> frobware: thanks!
<mup> Bug #1571082 changed: autopkgtest lxd provider tests fail for 2.0 <jujuqa> <lxd-provider> <packaging> <juju-core:Fix Released> <juju-core (Ubuntu):Fix Released> <https://launchpad.net/bugs/1571082>
<dimitern> frobware: updated http://reviews.vapour.ws/r/4639/ btw
<frobware> dimitern: looking
<voidspace> dimitern: looking
<frobware> dimitern: what do we consider invalid now?
<dimitern> frobware: what's considered invalid hasn't changed, just how we handle it
<frobware> dimitern: I guess my concern is should we simply drop the validation and the respective tests?
<frobware> dimitern: it seems a little counter intuitive to validate and then ignore
<dimitern> frobware: we could, but I'd prefer to keep it at least for a while - easier to catch issues once we start creating bridges from juju
<dimitern> frobware: or anything else on the machine that needs to use those device names (ip commands, etc.)
<frobware> dimitern: if it is invalid shouldn't we encode it in a way that makes it safe to go into the db?
<voidspace> dimitern: if invalid names are not invalid it does seem weird - especially logging at Warning level
<voidspace> dimitern: assuming we're sure that's the right thing to do LGTM
<voidspace> dimitern: but like frobware, if there's no such thing as an invalid name then why even check?
<voidspace> dimitern: I leave it in your hands though
<dimitern> voidspace: cheers
<voidspace> dimitern: when will your branch removing AddressAllocation for MAAS  land?
<voidspace> dimitern: as it impacts my devices work
<dimitern> voidspace, frobware: let me give you an example - on a windows machine, "Local Network Connection #2" is a valid name, but trying to save this into state goes via the API server which runs on Ubuntu, and has different criteria for valid names (much more restrictive)
<dimitern> voidspace: it landed yesterday
<voidspace> dimitern: I understand why we need to allow invalid names
<mup> Bug #1572102 opened: Juju could indicate if the LXD image is out-of-sync with upstream <lxd> <juju-core:Triaged> <https://launchpad.net/bugs/1572102>
<voidspace> dimitern: what I don't understand is why we still need to check if they're invalid
<voidspace> dimitern: ah, so it has landed!
<dimitern> frobware: it's still safe to store it in mongo, no encoding needed
<voidspace> dimitern: I saw that the AddressAllocation flag was still used - but didn't see that it was just to emit a warning
<voidspace> cool
<dimitern> frobware, voidspace: ok, I think you convinced me to drop the tests and just keep the rest
<frobware> dimitern: so let's drop the tests - they don't test anything we rely on
<axw> ashipika: picking on you since you're the only OCR that's online. would you mind taking a look at https://github.com/juju/juju/pull/5215 if you have time?
<ashipika> axw: my pleasure
<dimitern> frobware: done, do you want a last look before I hit the button? :)
<frobware> dimitern: looking
<frobware> dimitern: go for it
<axw> sinzui: in case you see the azure issue in CI before my fix lands, I'm pretty sure it's covered by https://bugs.launchpad.net/juju-core/+bug/1571947, and fixed by the branch linked
<mup> Bug #1571947: bootstrap --upload-tools fails with "cannot start bootstrap instance: missing tools URL" <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1571947>
 * dimitern hits the button
<ashipika> axw: is it worth adding a note describing why the URL checking was reverted?
<axw> ashipika: yeah I guess. I'll create a tech-debt bug and link it
<ashipika> axw: ack
<dimitern> axw: great job! tyvm
<dimitern> now with that fix I can actually do some work with juju :)
<axw> dimitern: :)
<ashipika> axw: ship it
<axw> ashipika: thanks
<mup> Bug #1572116 opened: cloudconfig/instancecfg: SetTools should ensure URLs are set <juju-core:Triaged> <https://launchpad.net/bugs/1572116>
<sinzui> thank you axw
<katco> dimitern: pong, hey
<dimitern> katco: hey, np - just fixed bug 1567676 and as you were assigned to it wanted to check before I started
<mup> Bug #1567676: windows: networker tries to update invalid device and blocks machiner from working <windows> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1567676>
<katco> dimitern: feel free to take it, i was spinning getting vmaas set up
<katco> dimitern: but please keep me updated on it. it looked strange to me, like an impossible code path. what's your idea?
<dimitern> katco: it's fixed now and landed
<katco> dimitern: what was it?
<dimitern> katco: the seemingly impossible path actually turned out to be quite valid - a windows machine calling the api on ubuntu, which in turn tries to validate the api args as linux interface names :)
<mup> Bug #1572145 opened: kvmProvisionerSuite.TestContainerStartedAndStopped no event arrived <ci> <intermittent-failure> <ppc64el> <regression> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1572145>
<katco> dimitern: ahhhh!
<dimitern> :)
<katco> dimitern: that makes me feel better haha
<katco> dimitern: i was wondering if perhaps GOOS was being locked during compiled-time or something
<dimitern> katco: fortunately not :) thanks for looking into it though, I know the code is somewhat gnarly and hard to follow at places, but it's improving already
<perrito666> bbl ~1h
<mup> Bug #1572159 opened: juju deploy: cannot specify resource revisions <juju-core:New> <https://launchpad.net/bugs/1572159>
<babbageclunk> I keep having the urge to all-caps JUJU when I mention it in text.
<babbageclunk> (Also when I say it aloud, although to a lesser extent.)
<babbageclunk> That's normal, right?
<dimitern> :) depends on the locale
<cherylj> heh
<dimitern> frobware, voidspace, babbageclunk: one seriously complicated review guys: http://reviews.vapour.ws/r/4641/ :)
<babbageclunk> dimitern: SOUNDS LIKE A CHALLENGE!
<dimitern> :)
<mup> Bug #1569898 changed: cmd/pprof: sporadic test failure <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1569898>
<mup> Bug #1571737 changed: Race in maas provider storage <ci> <maas-provider> <race-condition> <regression> <juju-core:Fix Released by 2-xtian> <https://launchpad.net/bugs/1571737>
<ericsnow> rogpeppe: FYI, I've commented on bug #1572159
<mup> Bug #1572159: juju deploy: cannot specify resource revisions <juju-core:New> <https://launchpad.net/bugs/1572159>
<rogpeppe> ericsnow: thanks, i just responded
<rogpeppe> ericsnow: and it worked for me
<ericsnow> rogpeppe: great! :)
<rogpeppe> ericsnow: out of interest, where in the code are the revnos detected?
<ericsnow> rogpeppe: resource/resourceadapters/deploy.go
<ericsnow> rogpeppe: via handleResources() in cmd/juju/service/deploy.go
<babbageclunk> dimitern, voidspace, frobware: Something for review - http://reviews.vapour.ws/r/4642/
<dimitern> babbageclunk: almost done
<dimitern> babbageclunk: reviewed
<babbageclunk> dimitern, frobware: Thanks!
<babbageclunk> voidspace: don't forget to update the document! :)
<voidspace> babbageclunk: hah, true enough :-)
<voidspace> I won't
<babbageclunk> voidspace: I'm going to replace the locks on fakeController with channels, then should I look at devices?
<voidspace> babbageclunk: I've started on devices, but mostly just understanding (or not) what's there
<voidspace> babbageclunk: we can share the work
<voidspace> babbageclunk: I'm doing AllocateContainerAddresses
<babbageclunk> voidspace: ok cool. I'll do this and then grab another one
<voidspace> cool
<mup> Bug #1572237 opened: juju rc1 loses agents during a lxd deploy <juju-core:New> <https://launchpad.net/bugs/1572237>
<perrito666> well juju tests with 1.6 in windows are broken with a big B
<natefinch> Well, there's a dumb bug
<natefinch> upgrade-charm barfs if you include the trailing slash for the path for a local charm
<natefinch> huzzah, fixed my bug
<mup> Bug #1570759 changed: apt-get install juju does not install /usr/bin/juju <packaging> <xenial> <One Hundred Papercuts:Confirmed> <juju-core:Fix Released by gz> <juju-core (Ubuntu):Fix Released> <https://launchpad.net/bugs/1570759>
<thumper> morning folks
 * thumper trawls through emails
<thumper> sinzui: what's the status of a maas 2 ci test?
<katco> thumper: morning
<sinzui> thumper: we have a machine ready for us to install it. Installation will start tomorrow
<thumper> sinzui: ok. will be good to see this get some further testing
<thumper> alexisb: morning
<lazyPower> cherylj - ran into that same issue i bumped my head on last week and got enough to give you a proper bug this time around https://bugs.launchpad.net/juju-core/+bug/1572312
<mup> Bug #1572312: Juju fails to deploy allocate node due to bad http response: 400 Bad Request <juju-core:New> <https://launchpad.net/bugs/1572312>
<cherylj> lazyPower: good news!  menn0 has already fixed that issue :)
<lazyPower> oh yeah?
<cherylj> lazyPower: yeah, see bug 1569054
<mup> Bug #1569054: GridFS namespace breaks charm and tools deduping across models <juju-core:Fix Committed by menno.smits> <https://launchpad.net/bugs/1569054>
<lazyPower> Nice! That makes so much sense now why it was intermittent
<wallyworld> cmars: do you have 5 min?
<cmars> wallyworld, sure
<wallyworld> cmars: https://plus.google.com/hangouts/_/canonical.com/tanzanite-stand
<cmars> the question is, what is op doing?
<bdx> wtf
<mup> Bug #1511537 changed: Failed to load cookies EOF <ci> <intermittent-failure> <test-failure> <juju-core:Fix Released> <https://launchpad.net/bugs/1511537>
<perrito666> that was.... weird
<perrito666> kiwiirc is a big spam gateway
<wallyworld> axw: standup?
<wallyworld> natefinch: "Welcome to Juju 2.0" <-- should we use CurrentVersion to print the real version, not always 2.0
<natefinch> wallyworld: might as well
<natefinch> wallyworld: done
<wallyworld> ty, looking
<natefinch> damn conflicts
#juju-dev 2016-04-20
<natefinch> rebased and fixed conflicts
<wallyworld> natefinch: is it https://jujucharms.com/docs/2.0/introducing-2 or https://jujucharms.com/docs/stable/introducing-2
 * natefinch is doing pretty well for having a sleeping toddler in his lap :)
<natefinch> wallyworld: evilnick said stable
<wallyworld> ok
<natefinch> The 'works now and later' bit is a slight problem. We could use the URL:
<natefinch> https://jujucharms.com/docs/stable/introducing-2
<natefinch> I can create that page in the current stable docs and redirect to /devel/
<natefinch> When we release the 2.0 docs in a few days, it will be at that URL anyhow.
<natefinch> (that was from him)
<wallyworld> natefinch: i have asked some questions in the PR about the tests
<wallyworld> eg we should be using PatchExecutable I think
<natefinch> wallyworld: I didn't even know we had a patch executable
<wallyworld> we do :-)
<wallyworld> very useful
<natefinch> oh god it writes OS dependent scripts.  Ouch
<natefinch> see, that's the nice thing about using the Go executable that we already have, it's already cross-platform
<wallyworld> but the tests are very messy
<natefinch> wallyworld: maybe we could address these problems for beta 6?
<wallyworld> hmmm, i really don't like the tests as they are
<wallyworld> i guess we could
<natefinch> wallyworld: well, I think the better solution might be to simply not run juju-1 and print 1.x all the time
<natefinch> wallyworld: and that would clean up the production code and the tests
<wallyworld> natefinch: but actually, why can't the os.Exec patch just write to stdout without running anything?
<natefinch> wallyworld: I could patch out the whole call to run anything, it's true.  My way lets us get a little closer to testing everything, but it's not truly necessary.  we could have a runExec(path string, args ...string) ([]byte, error) and patch that out
<wallyworld> natefinch: yes, something like that. i am reluctant to just print 1.x if rick has asked for the real version
<wallyworld> natefinch: but also, PatchExecutable is used extensively in juju so why do something inconsistent here?
<natefinch> wallyworld: 'cause patch executable is horrible ;)  But mostly, I didn't know it existed.
<natefinch> wallyworld: also, patch executable doesn't let us verify we're sending the right arguments, AFAICT
<wallyworld> there's a PatchExecutableEchoArgs
<natefinch> OMG, that whole file is awful hackery
<mup> Bug #1572350 opened: juju status shows incorrect message after switching to a controller with no model <docteam> <juju-core:New> <https://launchpad.net/bugs/1572350>
<mup> Bug #1572353 opened: mismatch at [0].Tag.id: unequal; obtained "2"; expected "1" <ci> <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1572353>
<natefinch> wallyworld: I can use that if that's what you think is best.
<wallyworld> natefinch: if you used PatchExcutableEchoArgs, you could easily verify that "juju version" is called right, and return the expected version to std out in only a few lines of code
<wallyworld> it doesn't really matter if we consider PatchExecutable good code or not, it's in our toolbox and is what juju uses and it works
<natefinch> wallyworld: that's fine.  I'll change my code to use it.
<wallyworld> ty :-)
<wallyworld> PatchExecutableEchoArgs i *think* will work ok
<wallyworld> for what we need here
<natefinch> wallyworld: ok
<mup> Bug #1550586 changed: github.com/juju/juju/tools/lxdclient_test constant overflows int <ci> <i386> <lxd> <regression> <juju-core:Fix Released> <https://launchpad.net/bugs/1550586>
<mup> Bug #1572355 opened: provider/maas/volumes_test.go constant overflows int on 386 <ci> <i386> <regression> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1572355>
<natefinch> wallyworld: I'm not sure we can verify the args *and* produce output using what's there. It looks like it's one or the other
<wallyworld> natefinch: yeah, just looked at the code, it seems so
<natefinch> wallyworld: I actually have to go for a while... my toddler's awake, and I have to take care of him.  Maybe we can address the tests in beta 6?  They're not *wrong* per se, even if they're not consistent with the rest of the code.
<natefinch> wallyworld: I have time later to work on them, just not sure how ASAP this should be
<natefinch> wallyworld: might be a couple hours
<wallyworld> hmmm, ok, we need to get this landed asap. i was hoping for some other cleanup also but i guess that can be done in a followup
<wallyworld> natefinch: i've hit merge
<natefinch> wallyworld: cool, thanks for the review and the merge.  i'll be back on later, we can talk about further cleanup.
<wallyworld> ok
<menn0> cherylj, wallyworld, thumper: I'm seeing a lot of this in failed CI runs from the last 24 hours:
<menn0> ERROR juju.state.unit unit.go:720 unit ubuntu/0 cannot get assigned machine: unit "ubuntu/0" is not assigned to a machine
<menn0> This is happening when jobs are just deploying the ubuntu charm
<menn0> I'm seeing it in functional-ha-recovery and functional-ha-backup-restore
<menn0> it's just in the test setup before it tries to enable HA and do other things
<menn0> hmmm actually it might just happen when Status is called and a unit is being deployed...
<sinzui> menn0: yes. We see that in the restore tests that claim success but actually failed https://bugs.launchpad.net/juju-core/+bug/1569467
<mup> Bug #1569467: juju restore-backup does not complete properly <backup-restore> <ci> <regression> <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1569467>
 * sinzui just purges 5 instances from us-east-1 left behind by today's tests
<thumper> menn0: hmm...
<thumper> just status related?
<menn0> sinzui: a lot of the log files for the failures also appear to be truncated ... activity in the console logs goes on for much longer than what's in the machine logs
<sinzui> that is odd. I've never seen truncated logs before
 * menn0 hopes it isn't that juju that stops logging
<menn0> I'll do some more digging but things look pretty unhappy right now
<wallyworld> thumper: ping
<thumper> wallyworld: hey
<wallyworld> for your maas BlockSize types, you should use uint64
<wallyworld> we do that elsewhere in storage
<wallyworld> and not that it matters now, because we have just agreed to drop i386, but it fails on i386
<wallyworld> due to int overflow
<wallyworld> but to be consistent with juju/storage/BlockDevice
<wallyworld> would be good
<wallyworld> so gomaasapi.BlockDevice could be updated
<wallyworld> thumper: you ok with that?
<thumper> meh
<thumper> can do
<wallyworld> thumper: it will avoid casts when copying data
<thumper> wallyworld: otp right now, so distracted
<wallyworld> sure np
<menn0> sinzui, thumper, cherylj, wallyworld: ok I'm less concerned about the failures now. it looks like we've been running into quota limits
<menn0> "cannot run instances: Your quota allows for 0 more running instances"
<cherylj> fun
<sinzui> menn0: yes the restore tests are always leaving unit/0 behind because the machine is not known by the restored controller
<menn0> I believe sinzui just cleaned up some orphaned instances
<sinzui> and I did just clean us-east-1 an hour ago
<menn0> sinzui, cherylj, wallyworld: because of the quota issues I still can't be sure if my peergrouper changes have helped
<menn0> I'll keep an eye on the runs
<mup> Bug #1543770 changed: Juju 2.0alpha1 does not assign a proper netmask for LXC containers <cpe-critsit> <cpe-sa> <dhcp> <lxc> <maas> <network> <juju-core:Fix Released> <https://launchpad.net/bugs/1543770>
<thumper> wallyworld: the main reason I didn't was due to schema.ForceInt returning an int
<thumper> and that is the only reason
<thumper> could easily do it inside gomaasapi
<wallyworld> ok
<wallyworld> thumper: fwiw, iirc, all sizes in juju/storage are uint64 so it will make your life easier later
 * thumper nods
<axw> wallyworld: can you please take a look at https://github.com/go-amz/amz/pull/67 when you have a moment
<wallyworld> sure
<wallyworld> sorry, i forgot
<axw> wallyworld: np, juju PR hasn't had a proper review yet anyway
<perrito666> yay go 1.6 made http.Client default transport no longer support  file:// protocol
<perrito666> at least in windows :p
<perrito666> or changed the way it handles file :p
<wallyworld> axw: lgtm, i assume we will set the new sg tags elsewhere
<axw> wallyworld: yes, in the juju branch
<axw> ta
<perrito666> well with the pleasure of having fixed this ill go to sleep see you all tomorrow, cheers
<wallyworld> perrito666: ty. do you have a reference for the change in behaviour? i wonder why they did it just on windows
<perrito666> wallyworld: so what changed (just confirmed through testing) is the kind of url handled
<mup> Bug # changed: 1379396, 1456757, 1481133, 1494743, 1534610, 1534619, 1534632, 1534637, 1544853, 1550817, 1550821, 1572355
<perrito666> Windows is different from linux in the necessity to have a drive letter in the path
<wallyworld> perrito666: ok, i'll look for something in the release notes
<wallyworld> they must have documented it if the behaviour changed
<perrito666> Tomorrow ill hit the code/release notes to see why \\localhost\c$ is no longer accepted and what is now the alternative
<perrito666> By default, if the drive letter is omitted c is used and it works
<perrito666> Now ill need to figure what is the new way of specifying the drive
<perrito666> C u all tomorrow
<mup> Bug # changed: 1469807, 1534620, 1545045, 1552021, 1559381, 1559382, 1559704
<mup> Bug # opened: 1469807, 1534620, 1545045, 1552021, 1559381, 1559382, 1559704
<mup> Bug # changed: 1469807, 1534620, 1545045, 1552021, 1559381, 1559382, 1559704
<mup> Bug # changed: 1290920, 1421315, 1421621, 1436863, 1455627, 1457575, 1460683, 1488576, 1532831, 1559715, 1568069
<mup> Bug #1458588 opened: cannot add charm to storage: io timeout <ci> <intermittent-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1458588>
<mup> Bug #1572382 opened: Can not add credential to user-defined OpenStack provider in juju 2.0 <openstack-provider> <juju-core:New> <https://launchpad.net/bugs/1572382>
<natefinch> wallyworld: 1:1?
<wallyworld> natefinch: sure, otp, one sec
<natefinch> wallyworld: np
<mup> Bug # changed: 1260187, 1490656, 1498010, 1498084, 1498175, 1502935, 1514451, 1521220, 1528975
<wallyworld> natefinch: next bug for your to do list maybe, bug 1567518
<mup> Bug #1567518: Payload commands don't work <juju-core:Triaged> <https://launchpad.net/bugs/1567518>
<wallyworld> after other stuff is wrapped up
<natefinch> wallyworld: yeah, I'd been trying to stick to the red ones, but I'm more than happy to fix that one.. I'm sure it must be something dumb.  I had looked at it briefly before, but didn't immediately see the problem.
<wallyworld> ty, will be good to get that fixed
<wallyworld> thumper: there's a test breakage affecting CI in a maas2 test, i will skip it for now to unblock beta5
<wallyworld> TestInstanceVolumesMAAS2
<thumper> wallyworld: ok
<wallyworld> bug 1572353
<mup> Bug #1572353: mismatch at [0].Tag.id: unequal; obtained "2"; expected "1" <ci> <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1572353>
<thumper> ta
 * thumper looks
<mup> Bug #1391066 changed: worker/uniter: FAIL: filter_test.go:449: FilterSuite.TestConfigAndAddressEventsDiscarded <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1391066>
<mup> Bug #1409739 changed: UniterSuite.TestActionEvent failing intermittently <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1409739>
<mup> Bug #1426394 changed: TestConfigEvents random failure <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1426394>
<thumper> wallyworld: it is a dict ordering bug
<wallyworld> thumper: samecontents may fix it
<thumper> aye
 * thumper looks
<wallyworld> but i didn't want to just assume that order didn't matter
<wallyworld> without looking a bit deeper
<wallyworld> thumper: it is a slice isn't it?
<wallyworld> []storage.Volume
<wallyworld> hence my concern about ordering
<thumper> yes, but it is coming in from python
<thumper> which may not have strict ordering
<thumper> let me look deeper
<thumper> you can skip it now
<wallyworld> ok
<thumper> I can unskip for the landing
 * thumper reads the tests
<thumper> wallyworld: yep
<thumper> the source iterates over a map
<thumper> and appends into a slice
<thumper> so ordering is not defined
<wallyworld> ah, the python source
<thumper> no
<thumper> the test creates a dict
<thumper> no, map
 * thumper tries to replicate the test failure locally
<wallyworld> oh right ok, i didn't look too closely at the test
<thumper> got it
<thumper> got a fix
<thumper> and tested
<wallyworld> thumper: if you push your fix, i'll stop the landing job
<thumper> wallyworld: have you seen dave's stress script
<thumper> ?
<thumper> wallyworld: pushing now
<wallyworld> no, don't think so
<wallyworld> if i have i forgot
<mup> Bug #1391066 opened: worker/uniter: FAIL: filter_test.go:449: FilterSuite.TestConfigAndAddressEventsDiscarded <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1391066>
<mup> Bug #1409739 opened: UniterSuite.TestActionEvent failing intermittently <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Triaged> <https://launchpad.net/bugs/1409739>
<mup> Bug #1426394 opened: TestConfigEvents random failure <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1426394>
<thumper> http://paste.ubuntu.com/15940883/
<thumper> wallyworld: http://reviews.vapour.ws/r/4649/
<wallyworld> looking
<wallyworld> i'll have to keep that script handy
<thumper> wallyworld: I have it in ~/bin/ as 'stress.sh'
<thumper> it compiles and links once and runs repeatedly
<thumper> very handy for intermittent failures
<wallyworld> indeed
<wallyworld> thumper: +1, i'll cancel my job
<thumper> k
<thumper> I've asked the bot to merge it
<thumper> it has been accepted
<mup> Bug # changed: 1391066, 1409739, 1411818, 1426394, 1432654, 1436495, 1440199, 1440205, 1440213, 1440219, 1451104, 1456726
<mup> Bug # changed: 1456763, 1459337, 1461578, 1463047, 1463135, 1469777, 1471004, 1471030, 1471308
<mup> Bug #1456728 opened: UniterSuite.TearDownTest fails <ci> <test-failure> <juju-core:Triaged> <juju-core 1.22:Triaged> <https://launchpad.net/bugs/1456728>
<mup> Bug #1457092 opened: InvalidInstanceID.NotFound <bootstrap> <ci> <intermittent-failure> <juju-core:Triaged> <juju-core 1.22:Triaged> <juju-core 1.24:Triaged> <https://launchpad.net/bugs/1457092>
<mup> Bug #1458992 opened: Uploading tools: closed explicitly <ci> <intermittent-failure> <juju-core:Incomplete> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1458992>
<menn0> thumper, axw or wallyworld: https://github.com/juju/juju/pull/5225
<wallyworld> looking
<wallyworld> menn0: nice pickup
<menn0> wallyworld: I wasn't able to repro, but it became obvious when I looked right at the bottom of the race detector output and saw where one of the goroutines had been created (i.e. in a different test)
<wallyworld> ack
<thumper> wallyworld, menn0: http://reviews.vapour.ws/r/4651/
<menn0> thumper: looking
 * wallyworld hugs thumper
<thumper> menn0: I really wanted to change schema.ForceInt to return int64 instead of int but that was out of scope for this change
<thumper> I really hate that schema.Int returns int64
<thumper> but schema.ForceInt is just int
<wallyworld> +100
<thumper> wallyworld: lp bug 1572353 fix committed
<mup> Bug #1572353: mismatch at [0].Tag.id: unequal; obtained "2"; expected "1" <ci> <regression> <test-failure> <juju-core:Fix Committed by thumper> <https://launchpad.net/bugs/1572353>
<wallyworld> i saw :-)
<wallyworld> ty
<thumper> np
<menn0> thumper: you've only tested for numbers around +/-42
<menn0> thumper: I think you need to test for all numbers
<menn0> sort that out will you :-p
<thumper> heh
 * menn0 jokes
<menn0> thumper: ship it
<thumper> for i := 0; i < math.MAX_INT; i++ { check fmt.Sprint(i)... }
<menn0> thumper: what about all floating point numbers too :)
<thumper> I did notice a missed code path
<menn0> and all possible strings
<thumper> for negative float values
 * thumper will add it
<menn0> thumper: what's that?
<thumper> -42.5
<thumper> the conditional in the float case statement
<thumper> that checks for v < 0
<menn0> ah right... a missed test not implementation
<menn0> you had me looking in the implementation and that looked right
<thumper> yeah, just the test
<thumper> sinzui: does the merge bot do juju/schema?
<natefinch> thumper: easiest way to find out, look at previous commits
<thumper> doesn't look like it
<sinzui> thumper: the bot only runs make build and make test
<redir_> looks like next is back
<mup> Bug # changed: 1463399, 1463826, 1464671, 1466525, 1468357, 1468369, 1469196, 1471775, 1479889, 1479942, 1484303, 1484308, 1485013, 1490653, 1494749, 1494754, 1494765,
<mup> 1494870, 1494876, 1494887, 1494894, 1494938, 1496472, 1502127, 1502149, 1502153, 1502154, 1507637, 1507644, 1534623, 1559313, 1560061, 1560192, 1565827
<thumper> wallyworld: https://github.com/juju/gomaasapi/pull/43
<mup> Bug # changed: 1535328, 1557264, 1559299, 1559305, 1565831, 1566450, 1566452
<mup> Bug # opened: 1535328, 1557264, 1559299, 1559305, 1565831, 1566450, 1566452
<mup> Bug # changed: 1384233, 1436407, 1457124, 1458992, 1461965, 1461968, 1461969, 1462409, 1462415, 1467556, 1510129, 1532232, 1535328, 1557264, 1559299, 1559305, 1565831, 1566450, 1566452
<mup> Bug #1558803 changed: Manual deploy on ppc64el wants wrong package and agents <ci> <manual-provider> <juju-core:Fix Released> <https://launchpad.net/bugs/1558803>
<axw> wallyworld: probably not till later given the meeting's on soon, but if you can take a look later it'd be much appreciated: https://github.com/juju/juju/pull/5226
<wallyworld> sure
<mup> Bug #1558803 opened: Manual deploy on ppc64el wants wrong package and agents <ci> <manual-provider> <juju-core:Fix Released> <https://launchpad.net/bugs/1558803>
<mup> Bug #1558803 changed: Manual deploy on ppc64el wants wrong package and agents <ci> <manual-provider> <juju-core:Fix Released> <https://launchpad.net/bugs/1558803>
<hoenir> In order to know the version for windows, I need to look at the CurrentVersion registry value, but there are some windows versions with the same SKU, for example desktop and server. Where can I look to know if I'm running a windows server edition or a desktop one, other than looking at ProductName? Is there any integer or byte value that will give me this answer?
<voidspace> "nonpersisting-pamala" is a slightly worrying name for a maas node
<voidspace> I'd quite like it to be persisting thank you very much
<babbageclunk> axw: ping?
<axw> babbageclunk: pong, in a meeting atm
<babbageclunk> axw: ok - let me know when you're free? Just want to confirm stuff before I go off half-cocked again. :)
<axw> babbageclunk: :) no worries, will do
<dimitern> voidspace, babbageclunk, frobware: please take a look http://reviews.vapour.ws/r/4653/
<voidspace> dimitern: ah, but have you tested on MAAS 2?
<dimitern> voidspace: not yet, wasn't sure if it'll work, but will run a test now
<voidspace> dimitern: should do
<voidspace> dimitern: anyway, straightforward - looks good
<dimitern> voidspace: thanks, will let you know if it works on 2.0 shortly
<axw> babbageclunk: sorry, finished now
<babbageclunk> axw: great, thanks!
<babbageclunk> axw: I'm already putting device.Name() into the VolumeAttachmentInfo.DeviceName, I think - should I change that to the full /dev/disk/by-dname/ path?
<babbageclunk> awx: or is it better with the device name.
<babbageclunk> gah, typo'd your name
<axw> babbageclunk: is /dev/disk/by-dname *always* available?
<babbageclunk> axw: yeah, from what blake_r was saying yesterday.
<axw> babbageclunk: also, is it really "by-dname"? not by-name?
<axw> ok
 * axw looks at MAAS docs and sees that it really is
<babbageclunk> axw: by-dname
<voidspace> babbageclunk: so, CreateDevice only lets us create a device with one interface - and AllocateContainerAddresses wants to be able to create devices with multiple interfaces on different subnets
<voidspace> babbageclunk: so it can't just be an "all in one call" unfortunately
<voidspace> babbageclunk: still not too bad
<babbageclunk> axw: do we also want to extract the hardware id from the id_path if it's a by-id one?
<axw> babbageclunk: typically speaking, a device name (e.g. /dev/sdb) is not guaranteed to be persistent
<voidspace> dimitern: do we really need to be able to create containers (devices) with multiple nics on different subnets?
<axw> babbageclunk: we need something unique and persistent for the lifetime of the volume's attachment to the machine
<axw> babbageclunk: i.e. it must not change if the machine is restarted
<babbageclunk> axw: ok
<axw> babbageclunk: for MAAS on KVM, the device names should not change
<axw> babbageclunk: I can't speak with any authority about by-dname, but it looks to me like it's just the same as the device name
<axw> babbageclunk: whereas "by-id" should be persistent
<babbageclunk> axw: ok - so then I should do something like what's there already (in the MAAS 1 version) - use the by-id path to get the hardware id if available
<axw> babbageclunk: that's my feeling, but if there's one thing that's always guaranteed to be available, and guaranteed to be persistent - then use that
<babbageclunk> axw: and if not set the device name on the attachment info
<axw> babbageclunk: yup. doesn't hurt to set the DeviceName field also; HardwareId will be used in preference if it's set
<babbageclunk> axw: Well, that was what I understood from blake_r
<babbageclunk> axw: (about by-dname)
<axw> babbageclunk: so if the by-dname value is always persistent, then you could just put that into DeviceLink
<axw> babbageclunk: and forget about HardwareId and DeviceName
<axw> babbageclunk: on the machine, we have a worker that inspects udev to get the device links, so we can match against the one you provide in VolumeInfo
<babbageclunk> axw: ah, ok - I hadn't clicked DeviceLink was a different field on the attachment info - my brain was autocorrecting it to DeviceName.
<axw> VolumeAttachmentInfo*
<axw> babbageclunk: yeah, DeviceName is just "sdb", whereas DeviceLink can be any of the /dev/disk/by-... links
<babbageclunk> axw: Ok - that sounds like the right thing to do - just DeviceLink on the attachment info.
<babbageclunk> axw:
<babbageclunk> Oops
<axw> babbageclunk: we needed that for GCE, because on there you're expected to use one of those paths to uniquely/persistently identify disks
<axw> sounds good
<babbageclunk> axw: awesome - thanks for helping me muddle through all this.
<axw> babbageclunk: not at all :)
<axw> babbageclunk: gotta go now, have a nice day
<babbageclunk> axw: you too!
<babbageclunk> voidspace: yeah, I guess convenience calls cost flexibility.
<babbageclunk> voidspace: could we reasonably extend the gomaasapi one to take more info?
<babbageclunk> voidspace: or does it make more sense to just do it in the provider with the lower-level bits?
<voidspace> babbageclunk: well, I'm trying to verify if maas supports devices with interfaces on different subnets
<voidspace> babbageclunk: it looks like the CreateDevice call only supports multiple interfaces on the same subnet
<voidspace> babbageclunk: so, assuming we *can* have multiple interfaces on different subnets, and do it as separate calls to add / link then there's not much value to it being in gomaasapi over juju
<voidspace> babbageclunk: we still have to maintain the code wherever it lives...
<voidspace> babbageclunk: I need to setup a maas that I can test this on though
<babbageclunk> voidspace: yeah, makes sense
<TheMue> morning
<wallyworld> axw: reviewed, have a question about retry
<mup> Bug #1555391 changed: MachineSuite failed during unit test <2.0-count> <ci> <intermittent-failure> <test-failure> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1555391>
<mup> Bug #1544849 changed: unit-test loop with juju.worker.uniter.remotestate retry hook timer triggered <2.0-count> <ci> <intermittent-failure> <ppc64el> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1544849>
<mup> Bug #1545055 changed: TestManageModelRunsUndertaker timed out <2.0-count> <ci> <intermittent-failure> <juju-core:Fix Released> <https://launchpad.net/bugs/1545055>
<mup> Bug #1544849 opened: unit-test loop with juju.worker.uniter.remotestate retry hook timer triggered <2.0-count> <ci> <intermittent-failure> <ppc64el> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1544849>
<mup> Bug #1545055 opened: TestManageModelRunsUndertaker timed out <2.0-count> <ci> <intermittent-failure> <juju-core:Fix Released> <https://launchpad.net/bugs/1545055>
<mup> Bug # changed: 1544849, 1545055, 1567676, 1571947, 1572353
<mup> Bug #1567676 opened: windows: networker tries to update invalid device and blocks machiner from working <windows> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1567676>
<mup> Bug #1571947 opened: bootstrap --upload-tools fails with "cannot start bootstrap instance: missing tools URL" <azure-provider> <bootstrap> <ci> <regression> <upload-tools> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1571947>
<mup> Bug #1572353 opened: mismatch at [0].Tag.id: unequal; obtained "2"; expected "1" <ci> <regression> <test-failure> <juju-core:Fix Released by thumper> <https://launchpad.net/bugs/1572353>
<mup> Bug #1567676 changed: windows: networker tries to update invalid device and blocks machiner from working <windows> <juju-core:Fix Released by dimitern> <https://launchpad.net/bugs/1567676>
<mup> Bug #1571947 changed: bootstrap --upload-tools fails with "cannot start bootstrap instance: missing tools URL" <azure-provider> <bootstrap> <ci> <regression> <upload-tools> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1571947>
<mup> Bug #1572353 changed: mismatch at [0].Tag.id: unequal; obtained "2"; expected "1" <ci> <regression> <test-failure> <juju-core:Fix Released by thumper> <https://launchpad.net/bugs/1572353>
<babbageclunk> voidspace, dimitern, frobware: reviews please? http://reviews.vapour.ws/r/4655/
<dimitern> babbageclunk: looking
<babbageclunk> (That's a nice quick one)
<babbageclunk> voidspace, dimitern, frobware: Also this one (a bit more involved, but just tests): http://reviews.vapour.ws/r/4654/
<dimitern> babbageclunk: no need to bump gomaasapi rev to use Path() ?
<babbageclunk> voidspace: man, journald keeps hanging/crashing on me
<babbageclunk> dimitern: no, it was already there - I didn't end up needing to add HardwareId
<dimitern> babbageclunk: ok, first one LGTM
<babbageclunk> dimitern: sweet, thanks!
<dimitern> babbageclunk: second one also LGTM
<dimitern> frobware, voidspace, dooferlad, babbageclunk: guys, please take a look at http://reviews.vapour.ws/r/4656/ - first step towards cleaning up a *lot* of legacy code
<mup> Bug # changed: 1460882, 1503992, 1534627, 1541482, 1551779, 1554700, 1554705, 1555694, 1557380, 1558078, 1561315, 1563932, 1563939, 1563942, 1563950, 1563956, 1564017, 1564395, 1564515, 1566332, 1566362, 1566367, 1566369, 1566431, 1567170, 1567719, 1567721, 1567722, 1567724, 1567726, 1567728,
<mup> 1567730, 1567732, 1567734, 1567925, 1568669, 1568848, 1568862, 1568925, 1569054, 1569086, 1569386, 1569490, 1569652, 1569654, 1569914, 1569948, 1570473, 1570654, 1571254, 1571476, 1571478, 1571861, 1571901
<mup> Bug #1572585 opened: juju ssh spawns many juju and ssh processes <juju-core:New> <https://launchpad.net/bugs/1572585>
<dimitern> let it rain ! :D
<dimitern> bugs galore
<voidspace> reviews galore too
<mup> Bug #1572585 changed: juju ssh spawns many juju and ssh processes <juju-core:New> <https://launchpad.net/bugs/1572585>
<mup> Bug # opened: 1460882, 1503992, 1534627, 1541482, 1551779, 1554700, 1554705, 1555694, 1557380, 1558078, 1561315, 1563932, 1563939, 1563942, 1563950, 1563956, 1564017, 1564395, 1564515, 1566332, 1566362, 1566367, 1566369, 1566431, 1567170, 1567719, 1567721, 1567722, 1567724, 1567726, 1567728,
<mup> 1567730, 1567732, 1567734, 1567925, 1568669, 1568848, 1568862, 1568925, 1569054, 1569086, 1569386, 1569490, 1569652, 1569654, 1569914, 1569948, 1570473, 1570654, 1571254, 1571476, 1571478, 1571861, 1571901
<mup> Bug # changed: 1460882, 1503992, 1534627, 1541482, 1551779, 1554700, 1554705, 1555694, 1557380, 1558078, 1561315, 1563932, 1563939, 1563942, 1563950, 1563956, 1564017, 1564395, 1564515, 1566332, 1566362, 1566367, 1566369, 1566431, 1567170, 1567719, 1567721, 1567722, 1567724, 1567726, 1567728,
<mup> 1567730, 1567732, 1567734, 1567925, 1568669, 1568848, 1568862, 1568925, 1569054, 1569086, 1569386, 1569490, 1569652, 1569654, 1569914, 1569948, 1570473, 1570654, 1571254, 1571476, 1571478, 1571861, 1571901
<mup> Bug #1572585 opened: juju ssh spawns many juju and ssh processes <juju-core:New> <https://launchpad.net/bugs/1572585>
<babbageclunk> dimitern: thanks for reviews!
<babbageclunk> dimitern: looking at yours now
<dimitern> babbageclunk: cheers!
<rogpeppe> is there any way to destroy a model within a controller without connecting to the model's API?
<rogpeppe> (other than destroying the controller and all its models)
<voidspace> dimitern: wow, that PR must have been annoying to do
<voidspace> dimitern: working through it as well
<dimitern> voidspace: it was frustrating, I'm glad it's done and verified to work :)
<dimitern> if I never see any of that code again, it will be too soon
<voidspace> dimitern: there's a use of NewSubnetTag in there which can panic
<dimitern> voidspace: I've replied to your comment
<voidspace> dimitern: so it might panic in the future, but as it's unlikely at the moment it's ok?
<dimitern> voidspace: ok, fair enough - AIUI you suggest not to use `subnetID != ""` but names.IsValidSubnet..() instead?
<voidspace> dimitern: isn't there a way of creating a tag that will error instead of panic - those panic'ing New functions are annoying
<voidspace> dimitern: don't you want to return an error if you get a non-empty-but-invalid subnet id?
<voidspace> dimitern: i.e. leave the empty check, but check IsValid as well
<dimitern> voidspace: the only way I can think of is doing the same check New..() does before it panics
<voidspace> dimitern: yeah, very irritating
<dimitern> voidspace: ok, will change to check for errors etc.
<voidspace> dimitern: as these are stored in state do you need a migration?
<dimitern> voidspace: IIRC migrations are not yet working
<dimitern> but thumper can know for sure
<voidspace> dimitern: ok, but will you remember we need one when they are - or are we just not worrying at all about upgrades?
<babbageclunk> voidspace: what should I be working on now?
<voidspace> babbageclunk: well, the only remaining task is: Evaluate all tests in environ_whitebox_test.go and see if any ought to be ported to MAAS 2
<voidspace> babbageclunk: that should be fun...
<dimitern> voidspace: I'll send a mail to thumper/menn0 to clarify that; upgrades from pre-2.0 are not yet allowed AIUI
<voidspace> babbageclunk: with the exception of device related tests of course
<babbageclunk> voidspace: ok, I guess I'll start looking at that!
<voidspace> babbageclunk: that should be fun, sorry
<voidspace> babbageclunk: I'm nearly there on devices
<babbageclunk> dimitern: What were you suggesting should be cached on the maas environ this morning? Can't remember.
<dimitern> babbageclunk: the apiversion check now always done in SetConfig
<babbageclunk> dimitern: So if ecfg.maasServer() & .maasOAuth() haven't changed and env.apiVersion isn't zero then skip the version detection?
<dimitern> babbageclunk: let me have a look at how it is now
<mup> Bug #1564670 changed: After dist upgrade Juju 1.X should still be the default <packaging> <juju-core:Fix Released> <juju-core (Ubuntu):Fix Released> <https://launchpad.net/bugs/1564670>
<mup> Bug #1572634 opened: help text for juju logout needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1572634>
<mup> Bug #1572637 opened: help text for juju change-user-password needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1572637>
<dimitern> babbageclunk: hmm.. so what I thought is just verifying whether maasEnviron.apiVersion is empty and only then doing the version check
<dimitern> babbageclunk: it could be done with a guard mutex, like SupportedArchitectures()
<mup> Bug #1572637 changed: help text for juju change-user-password needs improving <helpdocs> <juju-core:New> <https://launchpad.net/bugs/1572637>
<babbageclunk> dimitern: ok - will that be a problem if the maas server url changes? Is that something that even makes sense?
<dimitern> babbageclunk: it cannot change for an existing model
<babbageclunk> dimitern: ok
<babbageclunk> dimitern: There's already a guard mutex in SetConfig - seems like that would cover this, right?
<babbageclunk> dimitern: so really this is just an early return if env.apiVersion is already set.
<babbageclunk> dimitern: is it por
<babbageclunk> gah
<dooferlad> frobware: I am not seeing the bridge script run at all on a precise node. Odd.
<dimitern> babbageclunk: that could work as well, yeah
<babbageclunk> dimitern: is it possible for someone to upgrade their maas from 1.9 to 2.0 underneath a Juju model?
<babbageclunk> dimitern: (sorry if these are crazy questions)
<frobware> dooferlad: odd indeed.
<dimitern> babbageclunk: it's possible - one of the many ways to shoot yourself in the foot :)
<dooferlad> frobware: looks like the file isn't added to the file system.
<frobware> dooferlad: odd. :) Because I have tried this in the past as that's where I ran into the --<option> incompatibilities.
<dimitern> babbageclunk: but we're not trying to handle this case - for none of the providers AFAICS
<babbageclunk> dimitern: ok
<dooferlad> frobware: http://paste.ubuntu.com/15951618/
<dooferlad> I put in the "Done bridging..." message. If the script is found you also get a "Bridging..." message.
<frobware> dooferlad: line 13, unlucky for some...
<dooferlad> frobware: indeed
<frobware> dooferlad: is this one the first boot, or second?
<dooferlad> frobware: not sure.
<frobware> dooferlad: if it's the second, then it's to be expected, as we `trap 'rm -f /tmp/add-juju-bridge.py'` so that it doesn't run again.
<dooferlad> frobware: that is the only mention of it in /var/log/cloud-init-output.log
<dooferlad> frobware: is that overwritten at each boot?
<frobware> dooferlad: dunno for sure
<dooferlad> frobware: there is only one interfaces file and it hasn't been bridged, which is why I went looking.
<frobware> dooferlad: I have no nodes atm (tis deliberate)...
<mup> Bug #1572382 changed: Can not add credential to user-defined OpenStack provider in juju 2.0 <openstack-provider> <juju-core:Invalid> <https://launchpad.net/bugs/1572382>
<mup> Bug #1572637 opened: help text for juju change-user-password needs improving <helpdocs> <juju-core:Triaged> <https://launchpad.net/bugs/1572637>
<mup> Bug #1572382 opened: Can not add credential to user-defined OpenStack provider in juju 2.0 <openstack-provider> <juju-core:Invalid> <https://launchpad.net/bugs/1572382>
<mup> Bug #1572382 changed: Can not add credential to user-defined OpenStack provider in juju 2.0 <openstack-provider> <juju-core:Invalid> <https://launchpad.net/bugs/1572382>
<mup> Bug #1386219 changed: Usability: adding a large number of units takes too long <deploy> <performance> <scale-testing> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1386219>
<mup> Bug #1386219 opened: Usability: adding a large number of units takes too long <deploy> <performance> <scale-testing> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1386219>
<mup> Bug #1386219 changed: Usability: adding a large number of units takes too long <deploy> <performance> <scale-testing> <usability> <juju-core:Triaged> <https://launchpad.net/bugs/1386219>
<babbageclunk> voidspace: did you forget what day it is?
<voidspace> babbageclunk: yes
<babbageclunk> voidspace: :)
<voidspace> babbageclunk: hah, no I just copied and pasted and forgot to change it
<natefinch> sinzui, mgz: what's our 2.0 upgrade plan for non-ubuntu users?  Just realized the fix I made for bug #1564622 probably should be ubuntu-only
<mup> Bug #1564622: Suggest juju1 upon first use of juju2 if there is an existing JUJU_HOME dir <juju-release-support> <juju-core:In Progress by natefinch> <https://launchpad.net/bugs/1564622>
<natefinch> cherylj: ^
<sinzui> natefinch: Just install the new package. it replaces the old package in the case of windows clients. Centos and osx packages are just dirs in the $PATH.
<natefinch> sinzui: uh, that's a problem for windows, isn't it?  If someone wants to run them side by side.
<sinzui> natefinch: could be
<natefinch> sinzui: it's probably trivial to get 2.0 to install side by side with 1.x, probably just need to change the directory name we install to, and probably the installer UUID.
<sinzui> natefinch: I think inno setup allows you to choose an alternate location.
<natefinch> sinzui: well, yes, but if the AppId is the same as the old appId, it'll remove the old one.  I think we want to change the AppId to a new UUID, so it won't remove the old one
<natefinch> sinzui: and probably change the app name to "Juju 2.0" (which doesn't change the executable name, just how it shows up in add/remove programs and the start menu)
<sinzui> natefinch: I will report the bug to get this done
<natefinch> sinzui: while we're in there, we should change the URL from juju.ubuntu.com to jujucharms.com
<sinzui> yes
<natefinch> sinzui: how about homebrew? I presume we have to change something there to get them side by side.
<natefinch> sinzui: my knowledge of homebrew doesn't extend much past knowledge of its existence.
<sinzui> natefinch: We could propose that and they may accept it
<natefinch> sinzui: I think we should.  If we don't have backwards or forwards compatibility, people need to be able to run side by side.
<redir_afk> bbiab ~30 minutes
<redir> natefinch, ericsnow: review please http://reviews.vapour.ws/r/4661/
<ericsnow> redir: k
<redir> ericsnow: when you have a minute of course
<cherylj> fwereade: if you're still around:  https://bugs.launchpad.net/juju-core/+bug/1572237/comments/5
<mup> Bug #1572237: juju rc1 loses agents during a lxd deploy <lxd-provider> <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1572237>
<mup> Bug #1572695 opened: Manual add-machine creates additional broken machine <ci> <intermittent-failure> <manual-provider> <precise> <juju-core:Triaged> <https://launchpad.net/bugs/1572695>
<alexisb> cherylj, I think he is out for the rest of the day
<TheMue> heya cherylj and alexisb
 * TheMue is still in with one eye while coding in his hotel room
<perrito666> wow utils tests are seriously not windows compatible
<TheMue> perrito666: maybe windows is not unit test compatible
<natefinch> TheMue: if so, only because we made it so :/
<TheMue> natefinch: did anybody ever expect our bash on win? ;)
<natefinch> TheMue: right? I think that's really cool.  I think someone said they tried ubuntu's juju in windows' bash, and it didn't work, but I didn't see any more details.
<TheMue> natefinch: juju simply is cool
<TheMue> natefinch: I'm currently working with ansible. no real environment/provider abstraction, no hiding of operating systems, more scripted approach, and no stuff like hooks or actions
<perrito666> wow, hold that thought, the tests dont pass on linux either :|
<TheMue> perrito666: ouch
<perrito666> something wrong with my repo... git, I hate you
<mup> Bug #1572695 changed: Manual add-machine creates additional broken machine <ci> <intermittent-failure> <manual-provider> <precise> <juju-core:Triaged> <https://launchpad.net/bugs/1572695>
<perrito666> could anyone run utils test suite for utils master and let me know if they have a success?
<mup> Bug #1572695 opened: Manual add-machine creates additional broken machine <ci> <intermittent-failure> <manual-provider> <precise> <juju-core:Triaged> <https://launchpad.net/bugs/1572695>
<cherylj> perrito666: I get "http: TLS handshake error from 127.0.0.1:56350: remote error: bad certificate"
<perrito666> what version of go?
<cherylj> go 1.5
<cherylj> I'm on wily still
<perrito666> mm, anyone with 1.6?
<cherylj> heh
<perrito666> cherylj: tx, I believe the error lies in the go version
<perrito666> as it is in exec.Exec
<cherylj> I see
<cherylj> I'm not advanced enough
<cherylj> :P
<TheMue> Oh, tomorrow is 16.04 release? Wow, thought it would be next week.
<natefinch> perrito666: have you run hodeps?
<natefinch> perrito666: godeps
<perrito666> yup
<perrito666> packaging/manager/manager_test.go:271: too few values in struct initializer
<perrito666> packaging/manager/utils_test.go:50: too few values in struct initializer
<perrito666> packaging/manager/utils_test.go:88: too few values in struct initializer
<natefinch> perrito666: lol, yep, failures
 * perrito666 picardfacepalms
<perrito666> so, are we now 1.6 people?
<natefinch> we are
<natefinch> officially
<perrito666> good so if I change the tests we become 1.6 people only :)
<natefinch> yep
<natefinch> perrito666: looks like 1.6 changed exit error: the returned ExitError now holds a prefix and suffix
<natefinch> perrito666: oh, that doesn't quite apply... but I'm sure it's just because they added a field to exiterror
<perrito666> natefinch: is it? I dont see that in the api doc
<perrito666> https://golang.org/pkg/os/exec/#ExitError
<perrito666> yes, they most likely did
<natefinch> perrito666: sorry, I should copy more: the returned ExitError now holds a prefix and suffix (currently 32 kB) of the failed command's standard error output
<perrito666> I am guessing its stderr
<natefinch> perrito666: which is the Stderr field
<natefinch> perrito666: the backwards compatibility guarantee only holds if we're careful to use keyed fields... i.e., if we'd made that declaration ExitError{ProcessState: &state} then it would still compile
<perrito666> yep
<perrito666> we should always do that
<natefinch> yep.  my editor warns me when there's code that doesn't do that
<perrito666> I really don't get why someone wouldn't; it isn't like you are charged by the character
<natefinch> from golint or govet, I forget which
<perrito666> sinzui: bugs against juju/utils should go in juju-core launchpad too?
<sinzui> perrito666: Yes
<mup> Bug #1572700 opened: Support co-installable wind clients <windows> <juju-core:Triaged> <juju-release-tools:Triaged> <https://launchpad.net/bugs/1572700>
<natefinch> ahh.. it's go vet. We run that on the merge bot for juju core... I'm guessing we're not on juju/utils for some reason
<natefinch> sinzui, mgz: ^ shouldn't we run go vet for the merge bot for juju utils?  The code that breaks the current unit tests in 1.6 should have failed go vet even in earlier versions of go.
<perrito666> there you go, windows tests on utils are now an issue
<sinzui> natefinch: ah, very nice to know. I will discuss this with mgz. I think we want this for all juju/* projects, right?
<natefinch> sinzui: definitely. It detects bugs and places which might become bugs later
<natefinch> (like this)
<perrito666> redir: hey, saw your comment about the tests currently in place; reviewboard messed everything up
<perrito666> could you please check https://github.com/juju/utils/pull/205/files ?
<perrito666> also I'll need a second pair of eyes on that. natefinch? it's a short one https://github.com/juju/utils/pull/205/files
<mgz> yeah, we maybe want to throw in vet with some settings as a standard part of the gating
<natefinch> perrito666:  kk
 * perrito666 rips his shirt and screams windows gating <in the voice of kirk yelling KHAN>
<redir> perrito666: claro!
<redir> looking
<redir> mgz: +1
<natefinch> perrito666: hm... go vet fails 102 times for utils...
<natefinch> perrito666: but at least that gets us to compilation
<perrito666> natefinch: it was not me :p
<perrito666> but yeah,  no pre push hook for utils
<mup> Bug #1572700 changed: Support co-installable wind clients <windows> <juju-core:Triaged> <juju-release-tools:Triaged> <https://launchpad.net/bugs/1572700>
<perrito666> natefinch: so do you or do I send the mail scolding everyone on the use of utils?
<redir> perrito666: Looking at the GH PR I have a couple other questions, mostly because my windows knowledge consists of cobwebs and dust.
<perrito666> redir: shoot
<natefinch> perrito666, mgz, sinzui: go vet actually fails on juju/juju too... I bet we're eliding the check for composite literals, since there are 1025 failures of that type in juju/juju
<redir> perrito666: is the use of that file url on windows always for powershell?
<sinzui> ouch
<mgz> natefinch: yeah, or still on an older vet version
<redir> perrito666: does it need to have triple slash? file:///c:/foo/bar
<natefinch> mgz: ouch
<perrito666> redir: no it doesnt
<natefinch> redir: the triple slash comes from having an empty drive letter, so file:///foo/bar or file://c:/foo/bar
<redir> perrito666: natefinch OK, tx. LGTM then.
<natefinch> perrito666: lgtm
 * perrito666 $$merge$$s and then wonders if the bot is there :p
<mup> Bug #1572700 opened: Support co-installable wind clients <windows> <juju-core:Triaged> <juju-release-tools:Triaged> <https://launchpad.net/bugs/1572700>
<natefinch> mgz: why are we on an old version of go vet?  Just to avoid some of the checks?
<mup> Bug #1572703 opened: utils tests are broken in windows. <juju-core:New> <https://launchpad.net/bugs/1572703>
<mup> Bug #1572707 opened: 2.0b5: panic when running juju register <juju-release-support> <landscape> <juju-core:Triaged> <https://launchpad.net/bugs/1572707>
 * perrito666 runs the whole test suite for just one deps change and cries over his tea
<natefinch> perrito666: I usually just do a test compile to make sure everything compiles, on the assumption that subtle behavior won't have changed (and if it does, CI will catch that).
<natefinch> perrito666: go test ./... -run=XXX
<perrito666> natefinch: I never know when changing utils
<alexisb> redir, do you know if there are plans, or have been plans, for updating help text for the spaces commands?
<redir> alexisb: not off the top of my  head, but I'll search LP for something
<alexisb> redir, k
<alexisb> thanks
<redir> and ask pmatulis
<thumper> morning
<thumper> voidspace: if you appear online, would love a quick chat catchup to see if there is something I can do to help with the container networking WIP
<TheMue> morning thumper
<thumper> o/ TheMue
<perrito666> one line change has conflicts... I really must have been someone bad in my past life
<natefinch> is it in dependencies.tsv?
<perrito666> yes
<natefinch> figured
<natefinch> we should really rename that to conflicts.tsv
<natefinch> perrito666: I started working on an auto-conflict resolution tool for dependencies.tsv.  It's really totally mechanical most of the time
<perrito666> could anyone review my one line change? http://reviews.vapour.ws/r/4666/
<natefinch> perrito666: I was trying to think up something snarky to say but... nah.  Ship it.
<voidspace> thumper: hey, hi
<voidspace> thumper: I need deviceInterfaceInfo porting to maas 2 - which is pretty trivial
<voidspace> thumper: then I need to write tests
<voidspace> thumper: and get my KVM properly configured with multiple subnets and machines with multiple nics
<voidspace> thumper: I failed to do that today
<thumper> voidspace: anything I can help with, or shall I do sprint prep
<voidspace> thumper: so if you can get a network config for virt-manager that gives me multiple subnets that would be awesome
<voidspace> thumper: but otherwise, no  - I think we're good
<voidspace> thumper: should be ready to land tomorrow - and manually tested on Friday probably
<thumper> \o/
<voidspace> thumper: I also need to setup my hardware maas
<voidspace> thumper: then it's just bug hunting :-)
<voidspace> thumper: thanks for all your help, it's been fun
<thumper> cheers
<thumper> it has been good
<mup> Bug #1572736 opened: Cannot see 'status' from two different controllers <juju-core:New> <https://launchpad.net/bugs/1572736>
<arosales> cherylj: I am guessing juju status has a controller flag I missed?
<arosales> sorry, saw your comment now.
<cherylj> :)
<arosales> cherylj: be nice to have that in "juju help status" :-)
<mup> Bug #1572736 changed: Cannot see 'status' from two different controllers <juju-core:Invalid> <https://launchpad.net/bugs/1572736>
<mup> Bug #1572741 opened: list-controllers and list-models doesn't list cloud <juju-core:New> <https://launchpad.net/bugs/1572741>
<mup> Bug #1572746 opened: juju help status missing controller syntax <juju-core:New> <https://launchpad.net/bugs/1572746>
<thumper> um... wat?
<thumper> arosales: what do you mean "status from different controllers" ?
<thumper> you get status on a model, not a controller
<redir> brb reboot
<arosales> thumper: I was looking for the syntax "juju status -m <controller>:<model>"
<arosales> thumper: specifically I wanted to watch status on two different models on two different controllers
<thumper> ah
<arosales> but help only specified -m, thus I opened up 1572746
<arosales> while I am bugging here is the juju 2.0 equivalent for "juju action fetch <action-id>" ?
<arosales> I suspected it would be behind show-action-output, but that only shows completed status not data returned
<mgz> arosales: that sounds like a bug
<arosales> mgz: ok, I'll open a bug on it
<bogdanteleaga> arosales: what output are you getting?
<arosales> bogdanteleaga: the action output
<arosales> bogdanteleaga: juju action fetch <action-id>
<arosales> was the 1.25 syntax to get more data on the action ran
<arosales> but I don't see a "fetch" equivalent in 2.0 :-/
<bogdanteleaga> yeah, I meant what does that show exactly?
<bogdanteleaga> it's just been renamed
<arosales> bogdanteleaga: It varies by service
<arosales> bogdanteleaga: ah great, whats the rename?
<arosales> I thought it may be behind "show-action-output" and I didn't see any other commands in juju help commands that indicated it may be a different command
<bogdanteleaga> arosales, yeah, it should be show-action-output
<bogdanteleaga> I was curious if you also get the enqueued, starting, completed times, or only the status
<arosales> bogdanteleaga: http://paste.ubuntu.com/15957254/ is what I have thus far
<arosales> I get status, and completed times
<arosales> I just need "fetch" for additional details
<bogdanteleaga> arosales, interesting, could you also try running juju run with some command
<bogdanteleaga> and then looking at the actions?
<bogdanteleaga> it should also queue actions
<arosales> bogdanteleaga: do you want that run with an action or just a shell command?
<bogdanteleaga> arosales, shell command should be fine
<arosales> ok
<bogdanteleaga> see if juju run shows something different from show-action-status afterwards
<mgz> like, arosales, just file the bug with the command + output you expect from 1.25 and what you do and get instead with 2.0
<arosales> mgz: will do, it just seemed bogdanteleaga thought this was just a rename
<mgz> arosales: the bug is it's intended to just be a rename, if it's different then...
<arosales> I agree its a bug
<arosales> I would be surprised to see it was a rename as I searched through help output
<bogdanteleaga> arosales, can you point me to a simple charm that has some actions defined?
<mgz> beta4 has under help commands "show-action-output     alias for 'action fetch'"
<arosales> bogdanteleaga: per your request http://paste.ubuntu.com/15957322/
<arosales> I don't connect "run" with "actions" though
<bogdanteleaga> arosales, for example https://paste.ubuntu.com/15957293/
<arosales> bogdanteleaga: see my previous pasebin @ http://paste.ubuntu.com/15957254/
<arosales> charm and actions in there
<bogdanteleaga> arosales, not sure if my machine'll handle a big bundle :P
<arosales> 5 units, not too bad
<bogdanteleaga> I'll make a simple one
<wallyworld> perrito666: axw: redir: can we have standup now to avoid a clash with my next meeting?
<axw> wallyworld: sure, I'll be there in a couple minutes
<arosales> mgz: in beta5 I see output in help such as  "show-action-status      show results of all actions filtered by optional ID prefix"
<mgz> arosales: things change fast :)
<arosales> mgz: but will confirm 1.25 fetch output is different and file a bug if so
<arosales> mgz: just trying to test your latest bits :-)
<mgz> arosales: https://github.com/juju/juju/pull/5142 is the change
<arosales> does alternatives work with beta5 to go back to juju 1.25 or should I specifically just prefix the binary path for juju 1.25?
<mgz> arosales: you can use the juju-1 name
<arosales> mgz: thanks
<perrito666> wallyworld: oops, missed your msg, still valid?
<wallyworld> perrito666: yep
<mup> Bug #1572772 opened: URLsSuite.TestImageMetadataURL paths fail on windows <ci> <regression> <test-failure> <unit-tests> <windows> <juju-core:Triaged> <https://launchpad.net/bugs/1572772>
<bogdanteleaga> arosales, https://paste.ubuntu.com/15957449/
<bogdanteleaga> are you sure the charm actually calls action-set?
<arosales> bogdanteleaga: I am confirming that namenode and resourcemanager do that now. If so, the results show in show-action-output
<perrito666> really? concatenating paths with + ? cmon people >:(
<perrito666> k ppl, lots of tests to run on windows, ill be afk, cheers
<mup> Bug #1571053 changed: container networking lxd 'Missing parent for bridged type nic' <ci> <lxd> <juju-core:Fix Released> <https://launchpad.net/bugs/1571053>
<mup> Bug #1572781 opened: tools info mismatch on arch with lxd containers <ci> <lxd> <juju-core:Triaged> <https://launchpad.net/bugs/1572781>
<mgz> hi OCR-antipode, simple review please: https://reviews.vapour.ws/r/4667/
<mgz> cmars: thanks :)
<cmars> mgz, lol i guess that name change didn't cascade into unix
<mwhudson> hah juju does not compile with go tip
<mgz> mwhudson: ^that change is for you
<mgz> axw: is there any chance bug 1572781 is a dupe/other symptom of bug 1571832
<mwhudson> mgz: <3
<mup> Bug #1572781: tools info mismatch on arch with lxd containers <ci> <lxd> <juju-core:Triaged> <https://launchpad.net/bugs/1572781>
<mup> Bug #1571832: Respect the full tools list on InstanceConfig when building userdata config script. <tech-debt> <juju-core:In Progress by axwalk> <https://launchpad.net/bugs/1571832>
<axw> mgz: le sigh. another symptom of 1571832
#juju-dev 2016-04-21
<alexisb> anyone else seeing new unit test failures on master?
<alexisb> something changed for me over the past hour or so
<thumper> fuck...
<thumper> cherylj: I'm going to beg forgiveness
<thumper> I'm doing a drive by
<thumper> during a bug fix
<thumper> because what I found is unneeded code
<thumper> wallyworld: ping
<wallyworld> wot
<wallyworld> it wasn't me
<thumper> wallyworld: do you recall how to have a JujuConnSuite test use a different user?
<wallyworld> it was someone else
<thumper> it was
<thumper> I've not even bothered to look who
<wallyworld> hmmm
<thumper> I don't care
<thumper> in particular
<wallyworld> was just messing with you
<thumper> I'm wanting to add a test for bug 1570594
<mup> Bug #1570594: read access to admin model allows grant <docteam> <juju-release-support> <juju-core:Triaged by thumper> <https://launchpad.net/bugs/1570594>
<wallyworld> i don't recall offhand, but i can look
<thumper> there are feature tests that the command is hooked up
<thumper> but I want to say "run this command as this user"
<thumper> seems like something we would want to do
<thumper> and I thought you may have done something like this (or axw)
<wallyworld> yeah. you can log in from the CLI. but for a test, i'd have to look
<thumper> wallyworld: in particular, I was thinking around the new local user / cred storage story
<wallyworld> thumper: i see another feature test that invokes "logout" and then "login", which will work, but there's a way to manipulate the suite directly i'm sure
<thumper> if I wanted to add somthing that would say "use this user / pwd combo" for the CLI command, how do we do it
<wallyworld> there's a current user in the yaml
<thumper> ew
<wallyworld> current-account
<thumper> we want a repo suite method that says "set this user as current"
<wallyworld> yes, i'm looking for that
<wallyworld> the other ways invoke setting up yaml that the cli would use
<thumper> that's gross
<thumper> we can do better
<wallyworld> eg create > 1 account on a controller with different access and set up one as current
<wallyworld> yes
<thumper> how about...
<thumper> ...
 * thumper thinks
 * thumper looks at the code some more
<wallyworld> thumper: i can't see any feature tests for grant/revoke even. so it may be something we've not added to the suite before
<thumper> it appears that the JujuConnSuite uses the default config storage (ie disk)
<wallyworld> thumper: what about OpenAPIAs()
<wallyworld> you get a client connection as the user you want
<thumper> featuretests/cmd_juju_model_test.go
<thumper> that is where the grant and revoke feature tests are
<thumper> it calls the cli
<thumper> from the outside
<thumper> so I need to set up the "current user"
<wallyworld> thumper: see TestCanPostWithLocalLogin
<thumper> wallyworld: wrong level of interaction
<wallyworld> so you want to set up the current user for a CLI call?
<thumper> yep
<wallyworld> juju switch-user i think
<wallyworld> ah that's gone
<thumper> nah... again wrong level
<thumper> I just want to set using the config manager bits
<thumper> as in: start of test create read only user, set that user as the current one
<wallyworld> you can set up an in memory config
<thumper> then invoke the cli
<wallyworld> one sec
<thumper> I'm sure we do this shit elsewhere
<thumper> just with all the changes around user cred storage, I'm not sure where to look any more
<thumper> it isn't where I last looked
<wallyworld> thumper: see createmodel_test
<wallyworld> in cmd/juju/controller
<wallyworld> the setuptest sets up an in memory config
<thumper> I don't want a memory config
<thumper> but at least I can see part of what I want there
<wallyworld> you want an on disk one in a fake home?
<thumper> where is the disk config?
<thumper> that is what the test currently uses
<wallyworld> jujuclient package
<thumper> because we don't have enough things with the name juju in it?
<thumper> geez
<wallyworld> NewFileClientStore()
<wallyworld> well that's what it is
<wallyworld> functionality for the Juju Client
<wallyworld> we could have called the package George
<thumper> heh
<thumper> it seems the JujuConnSuite has one already in ControllerStore
<thumper> so... what are the minimal steps I need to add a user/pwd pair to the current model for the store?
<wallyworld> thumper: controllerStore.UpdateAccount()
<wallyworld> that will create / update a named account (user)
<wallyworld> for a given controller
<thumper> k
<wallyworld> the AccountDetails struct has user/password plus a macaroon which is optional
<wallyworld> well, i think you can leave it off when testing
<wallyworld> thumper: the controllerStore on JujuConnSuite is right now just used to hold controller info when testing CLI command structs directly. if you want to add users etc it may be best to rename it
<wallyworld> thumper: although TestAddUserAndRegister uses the account bits
<wallyworld> in cmd_juju_register_test
<thumper> I think I have what I need...
<wallyworld> thumper: ok, np. if you get stuck, the above test shows how to setup NewAPIConnectionParams to create an api connection
 * thumper nods
<wallyworld> using the account details you added to the store
<cherylj> and so it was foretold
<cherylj> that tests would fail
<cherylj> on this, the 21st day of April
<cherylj> when thine distro-info revealed
<cherylj> that the lts
<cherylj> is
<cherylj> xenial
<cherylj> http://juju-ci.vapour.ws:8080/job/github-merge-juju/7506/console
<cherylj> and no merges shall merge
<mup> Bug #1572798 opened: list-clouds or show-cloud does not show default region <juju-core:New> <https://launchpad.net/bugs/1572798>
 * thumper headdesks
<cherylj> mgz, sinzui ping?
<sinzui> hi cherylj
<cherylj> hey hey sinzui
<cherylj> happy release day
<cherylj> nothing can merge now
<cherylj> :D
<cherylj> what can we do to get things to merge so we can fix the failing tests incrementally?
<sinzui> cherylj: 1. I open the last cherry soda.
<cherylj> can we hack the distro-info on whatever instance we provision?
<sinzui> cherylj: 2. Just after the test script injects angsty Antelope, We change Xenial to supported or devel
<cherylj> sinzui: 3 - we assign these bugs POST HASTE!
<cherylj> alright wallyworld and thumper, time to pony up some bodies to fix unit tests :)
<thumper> wallyworld: how do I set the newly added user as the default user?
<sinzui> cherylj: I think merges/*ix unit tests are fixable in 30 minutes. I hope the Windows unit tests don't need such a hack. I cannot think how they would think xenial is LTS
<wallyworld> cherylj: redir is already doing a whole bunch :-)
<wallyworld> cherylj: and perrito666 fixes the windows ones
<cherylj> sinzui: that would probably be the fallback default series, which has not been updated to be xenial yet
<sinzui> cherylj: , No, first open soda and order child to bring ice cream, then fix test
<cherylj> check for shellfish
<wallyworld> thumper: SetCurrentAccount()
<cherylj> since your kids seem bent on killing you
<thumper> wallyworld: yeah... that didn't work
<cherylj> wallyworld: there are 7 bugs for different packages with failing tests
<cherylj> and that's just in juju/juju
<cherylj> I didn't test the other repos
<thumper> wallyworld: perhaps a quick hangout for a screenshare and you can tell me what I'm doing wrong
<wallyworld> cherylj: the guys have been told to pick stuff from the board and/or ask you, i'll ask them to prioritise unit test fixes
<wallyworld> thumper: ok
<thumper> cherylj: it seems like the no matching tools one should be a quick fix
<thumper> TBH I'm surprised there weren't xenial tools added to the tests before
<cherylj> thumper: some might be solved by changing testing/environ.go:const FakeDefaultSeries = "trusty" to "xenial"
 * thumper nods
<cherylj> but not all
<thumper> wallyworld: I'm in the hangout
<wallyworld> cherylj: can we tag the cards with "unit test" so they are easily identified?
<wallyworld> there's a lot of cards on that board
<cherylj> wallyworld: sure.  These are the only critical cards in tech-debt
<mup> Bug #1572798 changed: list-clouds or show-cloud does not show default region <juju-core:Invalid> <https://launchpad.net/bugs/1572798>
<wallyworld> cherylj: they are the ones i've already asked redir to look at
<wallyworld> the LTS ones
<sinzui> cherylj: I think changing the born date to 2016-05-21 will work. DO you agree
 * cherylj checks locally
<cherylj> sinzui: yes that works
<sinzui> :)
<cherylj> if by "born date" you mean "release"
<cherylj> same thing
<mup> Bug #1572798 opened: list-clouds or show-cloud does not show default region <juju-core:Invalid> <https://launchpad.net/bugs/1572798>
<sinzui> cherylj: I will send an update to all the slaves. It will take about 20 minutes. We might see everything start to work then
<cherylj> tyvm, sinzui
 * redir is eod
 * thumper read that initially as redir is dead
<redir> hehe
<wallyworld> redir: i was just letting cherylj know we did assign  resources to fixing the LTS test issues and weren't ignoring her :-)
<wallyworld> redir: didn't expect you to do any more work past EOD
<redir> cherylj: wallyworld I'll have a qq about those tomorrow. I modified stuff to work with xenial and got all but 2 passing in cmd/jj/service
<redir> wallyworld: I'm not
<redir> :)
<cherylj> redir: I'll be here, just ping me
<redir> cool
<cherylj> redir: we should make sure it works with trusty too (as we're updating the merge bot to forget about xenial for now)
<cherylj> and hopefully we'll save ourselves pain for the next lts
<wallyworld> cherylj: it can't really work with both at once
<wallyworld> we need to choose
<redir> cherylj: I modified environs/config/config.go to say xenial
<cherylj> wallyworld: I think it would be a case of checking not against a hard coded series, but rather against the default series
<mup> Bug #1572798 changed: list-clouds or show-cloud does not show default region <juju-core:Invalid> <https://launchpad.net/bugs/1572798>
<redir> and updated tests. They worked (except 2), then reverted environs... and they still passed.
<wallyworld> cherylj: that's where I disagree - tests should use hard coded values
<wallyworld> cherylj: so that when stuff changes, we are forced to fix the tests
<cherylj> and we do this every two years
<cherylj> ok
<wallyworld> that's just IMHO
<wallyworld> others may disagree
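wallyworld's argument in miniature (a hypothetical sketch, not juju code): a hard-coded expectation fails loudly when the default changes, whereas comparing against the same constant the production code uses can never fail:

```go
package main

import "fmt"

// fallbackSeries stands in for juju's fallback default series constant
// (hypothetical; the real value lives in environs/config).
const fallbackSeries = "xenial"

func defaultSeries() string { return fallbackSeries }

func main() {
	// Hard-coded expectation: whoever bumps the default is forced to
	// revisit every test (and assumption) built on the old value.
	if got, want := defaultSeries(), "xenial"; got != want {
		fmt.Printf("FAIL: default series is %q, want %q\n", got, want)
		return
	}
	fmt.Println("ok")

	// The alternative, `if defaultSeries() != fallbackSeries`, is a
	// tautology: it passes for any value, so a wrong default slips by.
}
```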
<redir> but I think the multi series test charm code is looking for precise or trusty.
<wallyworld> redir: that's because the test charm data contains precise and trusty
<wallyworld> that data doesn't come from lts
<redir> I think it should use the value of the system the tests are running on, wallyworld, cherylj $.02:)
<wallyworld> but from the metadata.yaml for the test charm
<redir> wallyworld: I figured that but hadn't found where yet.
<redir> wallyworld: of course:|
<wallyworld> redir: testcharms/charm-repo
<redir> wallyworld: thanks
<wallyworld> redir: that charm metadata doesn't necessarily need to include xenial - it could, but not strictly necessary to get the current tests passing as that's not a root cause issue IIANM
<thumper> wallyworld: this is so terrible
<thumper> wallyworld: dumped the yaml, and there is "kontroller" and "local.kontroller"
<redir> wallyworld: if you have a minute i wouldn't mind you vetting what I am doing, before I do it all over the place...
<wallyworld> redir: sure, jump in to https://plus.google.com/hangouts/_/canonical.com/juju-core-team
<thumper> axw: it is your fault
<thumper> jujuclient/file.go line 541
<thumper> apparently we shouldn't have multiple users using one controller
<axw> thumper: you can have multiple users, just not logged in at once
<axw> thumper: from the same client
<sinzui> cherylj: The update is in place and I have requeued alexis's and mgz's merges
<cherylj> thanks, sinzui!
<thumper> axw: the code says no
<thumper> see comment on line 533
<axw> thumper: this is client-side. you can logout and login
<axw> thumper: the reason for the change was to avoid accidentally leaving your client logged into a controller
<thumper> all our juju conn suite tests use the admin model...
<axw> thumper: can you just remove the admin user and then add your new user?
<thumper> how?
<thumper> how do I programatically logout?
<sinzui> axw: you have accidentally volunteered to prove we can delay xenial's release by one month in the tests. I will replay your lp1571832-tools-layered merge if needed
<axw> thumper: RemoveAccount(controller, "admin@local")
<axw> sinzui: wat?
<mwhudson> how do you guys stop goimports deleting "gopkg.in/mgo.v2" all the time?
<cherylj> axw: nothing is merging because unit tests are failing now that xenial is the lts
<cherylj> axw: so we're hacking the merge bot to think that it's not release day for xenial
<sinzui> axw: the juju test suite fails if xenial is released, and it is in some timezones. I have a hack in place to prevent the tests from seeing xenial's release. Your branch got in before mgz's
<axw> cherylj sinzui: I'm confused, did I do something wrong? cancel my job if you need to
<cherylj> axw: no :)
<axw> okey dokey
<cherylj> axw: you're just our first victim to see if sinzui's hack is working
<sinzui> axw, you did nothing wrong, you just got into the queue first
<axw> ah, heh :)
 * axw squeaks like a guinea pig
<cherylj> sinzui: so are the i386 tests gone now?
<sinzui> cherylj: yes they are :)
<cherylj> yay!
<cherylj> because I'm fixing the arm64 tests at the expense of i386
<cherylj> :)
<axw> mwhudson: I've been meaning to hack goimports to be aware of context. there's a TODO in the code to prefer import paths that are used in other files in the same package IIRC
<axw> mwhudson: it's a PITA
<mwhudson> axw: grumble
<natefinch> mwhudson: is there a different mgo package that it's choosing?  What I do is just remove the incorrect package from my gopath
<mwhudson> natefinch: i don't think so
<natefinch> mwhudson: when is it deleting the import?  I sometimes get the problem where I manually add the import before I add the code, and then save, and goimports runs and removes the unused import. :/
<mwhudson> natefinch: on any save
<mwhudson> natefinch: i think it's because the package name is different from the last component of the import path
<mwhudson> but it's also possible i'm holding it wrong
<natefinch> mwhudson: hmm.. weird.  I've done a lot of work using gopkg.in import paths, and it seems to work ok
<mwhudson> hmm
<natefinch> mwhudson: just double checked and it adds the right import for mgo.v2 if it's missing, and won't remove it if it's there (and used).
<mwhudson> natefinch: huh, ok, good to know
<mwhudson> now why is it not working for me ? :(
<natefinch> is it an old version of goimports?  maybe there was a bug fix at some point
<mwhudson> i just rebuilt it
<mwhudson> although let's rebuild again just to be sure
<mwhudson> ok works now, my rebuild must not have actually rebuilt it ?!
<mwhudson> anyway
<mwhudson> sorry for the noise
<natefinch> glad it's working :)
<mwhudson> no wait, no it's not :(
<natefinch> aww crap
<mwhudson> natefinch: aah works outside of emacs
 * mwhudson updates things
<mwhudson> bah doesn't help
<thumper> fuck fuck fuckity fuck
 * thumper bitches some more
<cherylj> can I get a review?  http://reviews.vapour.ws/r/4669/
<natefinch> cherylj: can you slap a comment about the i386 about the fact that we don't support it anymore?  I wouldn't know that by looking at that code.
<natefinch> above the i386 that is
<natefinch> cherylj: otherwise, ship it :)
<sinzui> sorry axw my fix worked, but your merge had one failure in cloudinitSuite.TestWindowsCloudInit
<axw> sinzui: yeah, I forgot to update a test, just rescheduled it
<axw> sinzui: sorry for holding up your fix
<thumper> hmm...
<mwhudson> cherylj, natefinch: powerpc seems a pretty safe "juju will never ever support" arch
<thumper> oh fark...
 * thumper headdesks again
 * thumper headdesks again
 * thumper headdesks again
<natefinch> mwhudson: it's really just a string.  If we want, we could use arch=abcdef ... but I'm guessing that might fail elsewhere.... i386 is something we can parse as a valid arch that we don't support.
<cherylj> natefinch: yeah, that's the thing.  It has to be a valid arch to juju (as defined in juju/utils/arch), but not one we support
<cherylj> and it was decreed yesterday that we are not building for i386 anymore
<cherylj> yay
<natefinch> cherylj: it's honestly kind of funny that we even have arches defined that we don't support.  You'd think we could just remove those and then... problem solved.
<natefinch> weird, somehow half the tests in this suite aren't getting run
 * thumper headdesks
 * thumper headdesks
 * thumper headdesks
 * thumper headdesks
 * thumper headdesks
<mwhudson> thumper: having fun?
<thumper> mwhudson: no, why?
<mwhudson> thumper: just a suspicion i had
<thumper> mwhudson: you must be very astute
<natefinch> lol
<mwhudson> thumper: well i'm running juju tests so i have lots of time to be observant
<mwhudson> oh um, this doesn't seem like good news
<mwhudson> with juju-mongodb (i.e. 2.4):
<mwhudson> PASS: allwatcher_internal_test.go:421: allWatcherStateSuite.TestChangeAnnotations	0.309s
<mwhudson> with 3.2:
<mwhudson> PASS: allwatcher_internal_test.go:421: allWatcherStateSuite.TestChangeAnnotations	24.110s
<thumper> FARK!!!
<thumper> wow
<thumper> shit
 * thumper passes that to wallyworld
<natefinch> told you
<wallyworld> yep, horatio has highlighted that
<mwhudson> ah ok cool
<wallyworld> it sucks, we need to figure out why exactly
<thumper> wallyworld: http://reviews.vapour.ws/r/4670/
<wallyworld> ok
<thumper> wallyworld: it has a drive by cleanup I couldn't let lie
<thumper> wallyworld: the important bit was that sharing models currently only done by controller admins
<thumper> which is crazy
<thumper> but that's what we have
<thumper> but read only access to the controller model made you a controller admin
<thumper> which I've fixed
<thumper> in this branch
<wallyworld> good :-)
<wallyworld> thumper: so what happened to all that angst that the grant/revoke methods were on the wrong facade?
<thumper> that is a different issue
<thumper> that's still fucked
<thumper> but we are no more fucked than before
<thumper> but more unfucking needed
 * thumper is emailing...
<wallyworld> so the root issue here was a Find() was missing a condition
<thumper> wallyworld: for admins, yes
<thumper> we didn't have readonly before
<natefinch> oh FFS
<axw> wallyworld: https://github.com/juju/juju/pull/5244  -- fixes critical issue
<wallyworld> sure
<wallyworld> axw: oh, i thought we had that filtering where needed
<axw> wallyworld: so did I ...
<wallyworld> is this a regression?
<axw> wallyworld: apparently only for LXC
<axw> I don't know
<wallyworld> right ok, lxc in 1.25/2.0 worked
<wallyworld> but not lxd
<axw> wallyworld: not sure if we were ever doing the right thing for KVM, seems unlikely that we'd have changed it
<wallyworld> kvm never got much love
<wallyworld> on non amd64
<wallyworld> axw: i just asked for an extra test
<axw> wallyworld: ok, thanks
<dimitern> mgz: ping
<dimitern> frobware: ping
<TheMue> morning
<dimitern> TheMue: o/
<TheMue> voidspace: twittering about management experiences ;)
<voidspace> TheMue: :-)
<dooferlad> dimitern, frobware, voidspace: standup?
<voidspace> dooferlad: omw
<voidspace> dimitern: so on a maas controller with extra interfaces juju bootstrap fails due to a schema error in gomaasapi
<voidspace> dimitern: the way networks are returned fails thumper's json validation code :-)
<voidspace> dimitern: so a fix in gomaasapi needed
<voidspace> I'm looking :-)
<voidspace> good to find these things out I guess
<dimitern> voidspace: oh, what's the error?
<voidspace> dimitern: ERROR failed to bootstrap model: cannot start bootstrap instance: cannot run instances: cannot run instance: space 0: subnet 3: subnet 2.0 schema check failed: dns_servers: expected list, got nothing
<voidspace> dimitern: it looks like it can't get all the subnet info - which is possibly a problem on the configuration
<voidspace> dimitern: but the controller is running fine, so gomaasapi shouldn't refuse to talk to it
<dimitern> voidspace: ah! it already is so much better at figuring out how exactly maas api changed
<dimitern> voidspace: yeah, the dns_servers should be optional
<voidspace> dimitern: yeah, but the old approach was tolerant against schema changes like that ;-)
<voidspace> oh the good old days
<dimitern> :D
<frobware> dimitern, voidspace, dooferlad: morning - running late today :/
<dimitern> frobware: morning o/
<dimitern> fwereade: are you around?
<fwereade> dimitern, o/
<frobware> dimitern: have you commissioned a node in MAAS with xenial recently?
<dimitern> frobware: not since BOW IIRC
<dimitern> fwereade: do you have 10m or so for a quick HO?
<frobware> dimitern: I have no working nodes this morning... :(
<dimitern> fwereade: context: I'm almost done with refactoring state ports to associate them with subnets, rather than networks
<dimitern> frobware: I'll try recommissioning one of the nucs now, as I've upgraded to 2.0 beta3 this morning
<fwereade> dimitern, sure, joining juju-sapphire
<dimitern> fwereade: omw
<axw> frobware: thanks for the review. I don't think there's much point in tacking on the type to the name, since you'll never see the name out of the context of the type anyway
<axw> frobware: unless space awareness would change that ...
<axw> frobware: would private networks like these be exposed to the user?
<frobware> axw: my comment was more from a debugging standpoint should you want to identify these "things" separately.
<axw> frobware: ok then, I guess I can buy that. I'll change it tomorrow
<axw> thanks
<babbageclunk> voidspace: after all the twitter harrassment, you got the day wrong again ;)
<babbageclunk> voidspace: anything I can do to help with the bootstrap issue?
<voidspace> babbageclunk: I just haven't corrected it yet :-p
<voidspace> babbageclunk: I'm on it, thanks
<babbageclunk> voidspace: ok, I'll keep going through the tests <sigh>
<voidspace> babbageclunk: hah
<voidspace> babbageclunk: you could test my branch deploying a container in the standard case
<voidspace> babbageclunk: I can't test due to fixing gomaasapi
<babbageclunk> ok
<babbageclunk> voidspace: what's the standard case/
<babbageclunk> ?
<voidspace> babbageclunk: add a machine (used to be juju add-machine)
<voidspace> babbageclunk: then juju deploy wordpress --to lxc:1
<voidspace> or whatever the number of the new machine is
<voidspace> might be 0 if you've not switched to admin, not sure
<babbageclunk> voidspace: ok cool - trying that now
<voidspace> babbageclunk: standard case is without extra subnets / nics which is what I was attemptin
<voidspace> *attemptin
<voidspace> what the hell
<babbageclunk> voidspace: nice correction
<voidspace> *attempting
<voidspace> a failed attemp
<dimitern> frobware: that nuc I tried to recommission failed
<dimitern> frobware: telltale sign from the rackd.log: http://paste.ubuntu.com/15962619/
<frobware> dimitern: I'm totally broken too...
<dimitern> frobware: it's even reported: https://bugs.launchpad.net/maas/+bug/1389811
<mup> Bug #1389811: tgtadm: out of memory crash <crash> <oil> <MAAS:Triaged> <https://launchpad.net/bugs/1389811>
<babbageclunk> Man that is really annoying - sometimes my machine just freezes for about 2 minutes.
<babbageclunk> I think it's journald freezing.
<dimitern> babbageclunk: that used to happen quite often with all of my vmaas setups in lxc, all of which also bind-mount my $HOME inside the container
<frobware> dimitern: so I see another error too: 'sh initctl - not found'
<babbageclunk> dimitern: encrypted home?
<frobware> dimitern: I cannot persuade the UI to use trusty for commissioning
<dimitern> babbageclunk: what was observable is ssh starting to fail to connect; sudo systemctl restart logind resolves this for a while; haven't seen it happen recently though
<frobware> dimitern: does commissioning in 1.9 work?
<frobware> dimitern: should you have the inclination and bandwidth...
<dimitern> frobware: have you tried changing the release used for commissioning in the UI Settings tab?
<frobware> dimitern: yep, it won't let me. :/
<frobware> dimitern: I tried fiddling with maas $PROFILE boot-sources ...
<dimitern> frobware: perhaps it's by design - i.e. not allowing commissioning with earlier releases than the controller
<frobware> dimitern: and I removed 16.04 from my images, leaving 14.04 but I still cannot select 14.04
<babbageclunk> frobware, voidspace, dimitern: sometimes when I start my maas controller KVM the maas admin isn't accessible - apache's running but going to /MAAS just spins
<babbageclunk> frobware, voidspace, dimitern: any suggestions?
<frobware> babbageclunk: how much ram did you allocate for the maas server?
<babbageclunk> frobware: 1G
<frobware> babbageclunk: also, cd /var/log/maas and take a look
<frobware> babbageclunk: I did that for a while, but I think it needs a little more, particularly when re-importing the images
<dimitern> frobware: no idea why it doesn't allow you :/ - anything in /var/log/maas/ ?
<dimitern> frobware: 2GB seems to be recommended
<babbageclunk> frobware, dimitern: "Can't update service statuses, no RPC connection to region."
<babbageclunk> I can ping google, for example, so there's network connection out.
<voidspace> babbageclunk: check the log?
<babbageclunk> dimitern, frobware: ok, can I just bump it up in VMM, or do I need to recreate.
<babbageclunk> ?
<voidspace> babbageclunk: 1G may not be enough
<voidspace> babbageclunk: bumping it in the VM should be fine
<frobware> babbageclunk: power it down, bump the value via the virt-manager interface for the node
<babbageclunk> Found a promising sounding bug from the log messages.
<dimitern> babbageclunk: try sudo dpkg-reconfigure maas-region-controller, then maas-region-api, then maas-rack-controller (in that order)
<babbageclunk> ok, I can see messages in the rack controller log complaining the region isn't responding.
<babbageclunk> dimitern: ok, will do
<frobware> dimitern: so I reinstalled maas (gah!) and just added 14.04 images but that is not enough. the "deploy" dropdown is populated with 14.04 but the commissioning dropdown is empty. looks like 16.04 or bust.
<dimitern> frobware: I suspect you need to wipe /var/lib/maas/boot-resources/* and re-import the images to be sure it's clean
<frobware> dimitern: I did a clean VM install. :)
<dimitern> frobware: oh :/ I see
 * frobware productively heads for lunch :/
<babbageclunk> Ok, poking those helped - not sure which bit, presumably the reconfigures restarted things - thanks guys!
<frobware> dimitern: yep, import 16.04 gives me the only boot image.
<dimitern> frobware: fwiw I can see both trusty and xenial in the dropdown to choose release for commissioning
<frobware> dimitern: 1.9.1?
<dimitern> maybe upgrading from 1.9 works better than fresh installation
<dimitern> frobware: no, 2.0.0 (beta3+bzr4941)
 * frobware really heads for lunch
<babbageclunk> voidspace: still bootstrapping, haven't had a chance to add a container yet.
<voidspace> babbageclunk: heh, ok
<voidspace> babbageclunk: good news about your oyster card (etc)
<voidspace> babbageclunk: gives you hope for humanity
<babbageclunk> voidspace: yeah! And such a nice postcard as well"
<babbageclunk> !
<babbageclunk> voidspace: please no monitoring of typing speed
<voidspace> babbageclunk: ah, I missed that. Nice.
<voidspace> babbageclunk: I'll email you the keylogger
<voidspace> ;-)
<babbageclunk> voidspace: bootstrap seemed to have hung after "installing package: tmux"
<babbageclunk> voidspace: killed it and kicked it off again
<babbageclunk> voidspace: any way to get debug logging before the controller is bootstrapped?
<voidspace> babbageclunk: anything in the logs?
<voidspace> babbageclunk: installing to container or machine?
<babbageclunk> voidspace: locally or on the node?
<voidspace> babbageclunk: on the node
<voidspace> babbageclunk: installing directly to the node, or to a container on the node?
<babbageclunk> voidspace: I couldn't ssh in - where's the key?
<babbageclunk> voidspace: still bootstrapping, haven't gotten to the add-machine part.
<voidspace> babbageclunk: oh I see - as part of the bootstrap, I'd forgotten it installed tmux
<voidspace> babbageclunk: uhm, not sure - dimitern ^^^ (where's the key to ssh in)
<babbageclunk> voidspace: yeah, what's that about - just so it's there?
<dimitern> voidspace, babbageclunk: juju ssh keys are in ~/.local/share/juju/ssh/
<babbageclunk> dimitern: ah, ok - will try using those to snoop around if it gets "stuck" again.
<dimitern> you could try ssh -i <juju_rsa> ubuntu@<what-maas-shows-as-ip>
<dimitern> while it's still bootstrapping
<dimitern> and if it appears hung, might be just slow to upd / inst the packages; i.e. tail -f /var/log/cloud-init-output.log once in
<dimitern> babbageclunk: ^^
<babbageclunk> dimitern: awse, thanks
<babbageclunk> ok, looks like it's actually doing things with tools.
<babbageclunk> voidspace, dimitern: was there some breakage around upload-tools recently?
<voidspace> babbageclunk: there
<dimitern> babbageclunk: oh yeah - but it's fixed now (2 days ago sometime)
<voidspace> babbageclunk: oops, there was - but it should be fixed
<dimitern> babbageclunk: bootstrap --upload-tools failed almost right away for me before the fix
<dimitern> babbageclunk: or you mean the tools info mismatch (new thing since yesterday IIRC) ?
<dimitern> that happens for lxd containers due to some arch mismatch between host and container
<babbageclunk> dimitern: no, I was thinking of the one earlier this week.
<babbageclunk> dimitern, voidspace: nothing happening in the cloud-init output log after this:
<babbageclunk> 454a97ab1b6bc43e28bb149813edfeec31eceda09b86b5474cca31712528f163  /var/lib/juju/tools/2.0-rc1.1-xenial-amd64/tools.tar.gz
<dimitern> babbageclunk: anything in /v/l/syslog?
<babbageclunk> bootstrap still seems stalled after installing tmux (log says that was successful/already installed)
<voidspace> I'll try
<voidspace> babbageclunk: is this master or my branch
<babbageclunk> dimitern: nothing suspicious looking. last message is: Apr 21 11:54:06 undisabled-miya systemd[1]: Started Cleanup of Temporary Directories.
<babbageclunk> dimitern: (I presume that's in UTC)
<babbageclunk> voidspace: yours
<dimitern> babbageclunk: any zombie processes?
<voidspace> dammit, I can't bootstrap because of my maas configuration issue
<voidspace> will leave it for the moment, sorry
<babbageclunk> dimitern: not that I can see
<babbageclunk> dimitern: any other logs I should look for?
<dimitern> babbageclunk: can you paste the current /v/l/c-i-output.log ?
<ashipika> can anybody help? i'm getting: WARNING cannot delete default security group: cannot retrieve default security group: "juju-3884d5ab-1317-420a-8acc-fb8bceb7d70a": The security group 'juju-3884d5ab-1317-420a-8acc-fb8bceb7d70a' does not exist in default VPC 'vpc-f34a3496' (InvalidGroup.NotFound)
<ashipika> ERROR failed to bootstrap model: cannot start bootstrap instance: no "xenial" images in us-east-1 with arches [amd64]
<babbageclunk> dimitern: http://pastebin.ubuntu.com/15964229/
<babbageclunk> dimitern: hey, it just finished!
<voidspace> dimitern: babbageclunk: confirmed to work https://github.com/juju/gomaasapi/pull/44
<dimitern> babbageclunk: yeah, it looks like it's just slow - a lot of published upgrades to apply
<dimitern> voidspace: looking
<voidspace> dimitern: the error was "got nothing"
<dimitern> voidspace: on my maas 2 I can see `"dns_servers": []` for some of the subnets
<voidspace> dimitern: that was the case that was already tested and worked
<voidspace> dimitern: ah, indeed - it is null
<mup> Bug #1567296 changed: Plugin API fails with multiple juju binaries <juju-core:Fix Released> <https://launchpad.net/bugs/1567296>
<voidspace> dimitern: I will update and change to null instead
<voidspace> dimitern: test still passes
<dimitern> voidspace: thanks!
<voidspace> dimitern: pushed
<dimitern> voidspace: it passes because go's json package allows that :)
<voidspace> dimitern: the test uses the schema - so we're testing the schema not the json package
<voidspace> dimitern: unless you meant something else
<voidspace> the problem was the over strict schema
<voidspace> without my change to the schema the test fails
<voidspace> dimitern: thanks
<dimitern> voidspace: yeah, layers upon layers.. now LGTM though :)
<voidspace> lunch
<mup> Bug #1573020 opened: upgrade-charm --path does not recognize dot <juju-core:New> <https://launchpad.net/bugs/1573020>
<dimitern> frobware: I managed to fix my nucs and now commissioning passes ok
<frobware> dimitern: see my conversation on #maas
<dimitern> frobware: fwiw I struggled with bug 1389811 until now; added a comment on how I managed to resolve it
<mup> Bug #1389811: tgtadm: out of memory crash <crash> <oil> <MAAS:Triaged> <https://launchpad.net/bugs/1389811>
<dimitern> frobware: (reading #maas) - nice! good catch
<babbageclunk> voidspace, dimitern - trying to deploy to a container I get this error;
<babbageclunk> ERROR storing charm for URL "cs:trusty/ubuntu-7": cannot retrieve charm "cs:trusty/ubuntu-7": cannot get archive: Get https://api.jujucharms.com/charmstore/v5/trusty/ubuntu-7/archive?channel=stable: dial tcp: lookup api.jujucharms.com on 192.168.150.2:53: read udp 192.168.150.4:37104->192.168.150.2:53: i/o timeout
<dimitern> babbageclunk: yeah, no xenial versions of the charm yet
<dimitern> babbageclunk: but never worry :) try e.g. juju deploy ubuntu --series xenial --force --to lxd:0
<babbageclunk> Ok - trying that, thanks
<babbageclunk> dimitern: nope, same error.
<voidspace> babbageclunk: try deploying wordpress instead
<voidspace> maybe the same problem
<dimitern> yeah
<babbageclunk> dimitern: hmm, maybe it can't get to t'internet?
<babbageclunk> dimitern, voidspace: ok, trying that
<dimitern> babbageclunk: try with --debug?
<babbageclunk> all these fun new options I'm learning"
<babbageclunk> !
<dimitern> babbageclunk: well, I'd first try just add-machine lxd:0 to make sure that works, comes up with expected NICs, dns works, egress to internet works
<mup> Bug #1573055 opened: Latest modification to https://streams.canonical.com/juju/tools/streams/v1/com.ubuntu.juju-released-tools.json  makes juju bootstrap fail <bootstrap> <juju-core:New> <https://launchpad.net/bugs/1573055>
<dimitern> babbageclunk: once all the above works (at least a couple of times :), you're almost there
<babbageclunk> dimitern: looks like DNS from that machine isn't working.
<voidspace> This VLAN is the default VLAN in the fabric, can't delete. Can't delete this fabric it has VLANs attached.
<dimitern> babbageclunk: ah, well - what's in /etc/resolv.conf on the host and the container?
<dimitern> voidspace: first delete the vlans
<babbageclunk> dimitern: resolv.conf looks right, points to the maas controller
<dimitern> babbageclunk: can you ping the dns IP from the container?
<babbageclunk> Hang on, I'm not in the container yet - only have a machine.
<voidspace> dimitern: can't delete the VLAN it's the default in the fabric.
<dimitern> voidspace: what are you trying to do? get rid of extraneous fabric?
<dimitern> babbageclunk: what does sudo lxc list show?
<babbageclunk> Hmm, didn't have DNS forwarding turned on in my MAAS 2 controller. Added that and now pinging things on the internet works.
<voidspace> dimitern: to see if I can bootstrap if I go back to the previous configuration :-/
<voidspace> because right now it won't bootstrap
<babbageclunk> Don't quite understand how the bootstrap worked given that.
<voidspace> deleting the NICs from the VM worked
<voidspace> well, got rid of the fabrics anyway
<dimitern> good :)
<voidspace> no, they're still there - but not listed on the controller any more - which may be enough
<mup> Bug #1573055 changed: Latest modification to https://streams.canonical.com/juju/tools/streams/v1/com.ubuntu.juju-released-tools.json  makes juju bootstrap fail <bootstrap> <juju-core:Invalid> <https://launchpad.net/bugs/1573055>
<babbageclunk> voidspace, dimitern: Ok, from juju debug-log, I'm getting this error:
<babbageclunk> voidspace, dimitern: actually, these errors: http://pastebin.ubuntu.com/15966208/
<voidspace> babbageclunk: try lxc instead of lxd - I don't trust lxd :-)
<voidspace> (yet)
<babbageclunk> voidspace: ok
<voidspace> babbageclunk: ah, there's an interesting error though
<voidspace> schema check failed
<babbageclunk> voidspace: also, this bit seems cromulent: device 2.0 schema check failed: interface_set
<voidspace> looks like interface_set needs to be optional too
<babbageclunk> voidspace: snap
<babbageclunk> ok, trying lxc instead
<voidspace> babbageclunk: see this PR and do the same for interface set in gomaasapi https://github.com/juju/gomaasapi/pull/44
<dimitern> babbageclunk: it can be useful to try what juju does from the maas cli?
<voidspace> to look at interface_set, "machines read" should be enough
<voidspace> and find the place with null :-/
<babbageclunk> dimitern: yeah, that sounds good
<mup> Bug #1573055 opened: Latest modification to https://streams.canonical.com/juju/tools/streams/v1/com.ubuntu.juju-released-tools.json  makes juju bootstrap fail <bootstrap> <juju-core:Invalid> <https://launchpad.net/bugs/1573055>
<dimitern> babbageclunk: see here (if you haven't) http://paste.ubuntu.com/15966260/
<voidspace> *phew* bootstrap is slow but it worked this time
<voidspace> well, is getting further anyway
<babbageclunk> dimitern: I haven't - what is that?
<dimitern> babbageclunk: that's my test-bed script which I used as PoC how to do the multi-nic devices
<dimitern> it's fairly messy, opinionated, and not portable I'm afraid
<dimitern> but it implements the necessary API calls in the right order
<babbageclunk> dimitern: useful as a starting point, thanks!
<babbageclunk> voidspace: deploying to lxc is working well.
<voidspace> babbageclunk: cool, that means AllocateContainerAddresses maybe basically works...
<babbageclunk> although I still see the same schema error in the logs.
<babbageclunk> kind of neat that it keeps going in spite of that.
<babbageclunk> Ok, I'll fix that in the gomaasapi schema
<dimitern> babbageclunk, voidspace: the schema error like any other issues during PrepareContainerInterfaceInfo() are simply ignored
<babbageclunk> dimitern: nice
<dimitern> so to truly verify whether it worked, you need at minimum: 1) be able to ssh to the host, 2) sudo lxc exec juju-machine-0-lxd-0 bash, 3) ping bbc.co.uk, 4) ip route show, 5) cat /e/n/i.d/00-juju.cfg
<dimitern> also helps to check sudo lxc list - it should show multiple NICs there
<mup> Bug #1573055 changed: Latest modification to https://streams.canonical.com/juju/tools/streams/v1/com.ubuntu.juju-released-tools.json  makes juju bootstrap fail <bootstrap> <juju-core:New> <https://launchpad.net/bugs/1573055>
<babbageclunk> dimitern: is that to me?
<dimitern> babbageclunk: yeah - just sharing how I verify usually
<babbageclunk> I couldn't deploy to an lxd - it failed with the simplestreams error.
<babbageclunk> What would the equivalent be for lxc?
<dimitern> babbageclunk: ah, sorry - well, lxc should work just the same (only slower to come up due to the initial cloning, etc.)
<dimitern> babbageclunk: so juju add-machine lxc:0
<babbageclunk> dimitern: How do I list LXC containers? The container doesn't show up on the host node under lxc list.
<dimitern> babbageclunk: sudo lxc-ls -f
<babbageclunk> dimitern: thanks
<voidspace> deploying to lxc I get this in the logs (i.e. it doesn't work):
<voidspace> machine-0: 2016-04-21 14:50:24 WARNING juju.provisioner lxc-broker.go:115 failed to prepare container "0/lxc/0" network config: unexpected: ServerError: 500 INTERNAL SERVER ERROR (Subnet matching query does not exist.)
<dimitern> voidspace: it might help setting the logging-config to '<root>=TRACE' to get (a lot more) context
<voidspace> yeah
<babbageclunk> dimitern: ok, and using lxc-attach bash I can get into the container. I can ping bbc, and there are routes (not exactly sure what they should be). There's no 00-juju.cfg in /e/n/i.d (which will henceforth be known as enid).
<voidspace> enid :-)
<babbageclunk> But this isn't a multi-nic situation, so maybe that's to be expected.
<dimitern> babbageclunk: lxc-attach might be misleading btw
<dimitern> babbageclunk: I'd use 'sudo lxc-attach -n juju-machine-0-lxc-0 -e bash' to be sure you're in the container ns and not the host
<dimitern> babbageclunk: check /var/log/cloud-init-output.log while inside the container for any issues
<katco> ericsnow: standup time
<babbageclunk> dimitern: all looks pretty good - at least, there's a nice looking finished message
<dimitern> babbageclunk: from cloud-init?
<babbageclunk> dimitern: yup
<babbageclunk> Cloud-init v. 0.7.5 finished at Thu, 21 Apr 2016 14:40:19 +0000. Datasource DataSourceNoCloudNet [seed=/var/lib/cloud/seed/nocloud-net][dsmode=net].  Up 12.0 seconds
<dimitern> babbageclunk: so it seems to be working inside the lxc?
<dimitern> babbageclunk, voidspace, fwereade: updated http://reviews.vapour.ws/r/4656/ and I'd appreciate one last look
<dimitern> I suppose it's to be expected to start seeing "expected series 'trusty', obtained 'xenial'" test failures on master now
<dimitern> more of those: adding new machine to host unit "django/0": cannot assign unit "django/0" to machine 0: series does not match
<mup> Bug #1573099 opened: Juju Register not test / script friendly; requires console interaction <juju-core:New> <https://launchpad.net/bugs/1573099>
<dimitern> babbageclunk, voidspace, fwereade: any objections to land http://reviews.vapour.ws/r/4656/ ?
<fwereade> dimitern, I'm afraid I can't devote proper attention to it today :(
<babbageclunk> dimitern: I've already said I love it! Unfortunately I'm not valid yet. :(
<dimitern> fwereade: np
<dimitern> babbageclunk: cheers - so we're waiting on voidspace and/or frobware :)
<frobware> dimitern: looking now
<dimitern> frobware: thanks!
<frobware> dimitern: if you are using spaces don't charms implicitly express which subnet they refer to? (I was reading the commit message)
<dimitern> frobware: yeah, but charms see addresses as well
<dimitern> frobware: and they're more important for opening ports than spaces (if the charm even cares)
<dimitern> frobware: did that make sense ? :)
<frobware> dimitern: I got sidetracked with the diff
<dimitern> frobware: ah, yeah it's a bit big, but at least makes things more consistent around migrations
<frobware> dimitern: why did we go from s/0:juju-public/0:/ ?
<dimitern> frobware: juju-public was the hardcoded name of the supposedly default public network, now it's an empty subnet CIDR instead
<frobware> dimitern: so 0: represents the CIDR?
<dimitern> frobware: it can be also non-empty (tests verify that), but no user(charmer)-facing way to do it
<babbageclunk> voidspace, frobware, dimitern: review please? https://github.com/juju/gomaasapi/pull/45
<dimitern> frobware: the colon separates machine ids and subnet cidrs, so the 0 is the machine id
<dimitern> babbageclunk: looking
<frobware> dimitern: ok... wasn't obvious to me
<dimitern> frobware: it's explained in the doc comment of state.WatchOpenedPorts
<dimitern> oops, now I see that comment needs updating as well
<dimitern> babbageclunk: actually, when can we have a device without interface_set ?
<babbageclunk> dimitern: That's what I see in my MAAS now
<dimitern> babbageclunk: I can't see that happening using the maas cli manually: http://paste.ubuntu.com/15967504/
<dimitern> babbageclunk: can you try that on your maas (the parent needs to match one of your nodes ofc)
<babbageclunk> http://pastebin.ubuntu.com/15967528/
<babbageclunk> oh, sorry - I didn't do the create
<babbageclunk> hang on
<dimitern> babbageclunk: ah! the warning might be causing that
<dimitern> babbageclunk: try 'maas refresh' to update the api spec cached by the maas cli client
<babbageclunk> dimitern: ok, did that - still no interface_set
<dimitern> babbageclunk: try 'devices create' instead of 'devices read'
<dimitern> might be different
<babbageclunk> dimitern: just tried that, still the same.
<babbageclunk> man, I love the name generator: frondescent-rosalie
<dimitern> :)
<dimitern> babbageclunk: hmm - what does 'maas maas2 version read -d' return?
<babbageclunk> although, I'm still getting the warning
<babbageclunk> http://paste.ubuntu.com/15967646/
<dimitern> babbageclunk: 'maas refresh' won't work (at least it used to be that way) if any of the profiles' urls you have ('maas list') are inaccessible
<babbageclunk> dimitern: only the one in there
<dimitern> babbageclunk: uuh - rather old
<dimitern> babbageclunk: 2.0.0 alpha3+bzr4810
<babbageclunk> dimitern: true - shall I upgrade and try again?
<dimitern> babbageclunk: please do, at least we've confirmed the response format between alpha3 and beta3 is different
<babbageclunk> dimitern: okeydoke, running now
<dimitern> babbageclunk: are you using the packages in the experimental3 ppa?
<babbageclunk> dimitern: I don't remember adding that - how'd I check - look in sources?
<dimitern> babbageclunk: apt-cache madison maas
<dimitern> it what I use
<babbageclunk> dimitern: madison is a weird subcommand name
<dimitern> babbageclunk: it had some good story behind it as well, but can't remember now :)
<dimitern> babbageclunk: you can: sudo add-apt-repository ppa:maas-maintainers/experimental3, then update and dist-upgrade
<dimitern> frobware: review poke (sorry to be a pest)
<frobware> dimitern: still looking at it
<dimitern> frobware: ok, take your time
<dimitern> frobware: I'll rebase the follow-up onto it and keep going for now
<frobware> dimitern: I'm almost done
<dimitern> frobware: sweet!
<babbageclunk> dimitern - my maas version still reports the old version - is there a command to restart the various daemons at one fell swoop?
<frobware> babbageclunk: reboot
<frobware> babbageclunk: it's all virtual, it's all quick
<babbageclunk> frobware: yeah - I guess that would work just as well
<babbageclunk> frobware: just feels a bit windowsy
 * frobware blushes
<babbageclunk> :(
<babbageclunk> Oops :)
<dimitern> babbageclunk: yeah, I usually reboot to ensure kernel updates etc. take effect
<dimitern> (with dist-upgrade esp.)
<frobware> dimitern: ah, always with dist-upgrade. always.
<frobware> dimitern: reviewed
<dimitern> frobware: thanks!
<babbageclunk> dimitern: now maas-cli is back to hanging. Do I need to do the dpkg-reconfigure every time?
<babbageclunk> Or is that because of the upgrade?
<frobware> dimitern, voidspace, babbageclunk: hopefully an easy one: http://reviews.vapour.ws/r/4673/
<dimitern> babbageclunk: what do you mean? it doesn't terminate?
<babbageclunk> dimitern: yeah - I run maas maas2 version read -d and have to ^C it
<babbageclunk> service maas-rackd status complains about the region controller being missing.
<dimitern> babbageclunk: that's not a good sign :/ how about the web ui?
<frobware> babbageclunk: re 'madison' - http://unix.stackexchange.com/questions/276037/why-apt-madison
<babbageclunk> service maas-regiond status says active (exited) with nothing in the log.
<dimitern> babbageclunk: hmm yeah - it seems dpkg-reconf might be in order
<babbageclunk> dimitern: Already tried, no dice
<babbageclunk> dimitern: I'mma reboot again
<dimitern> frobware: LGTM
<dimitern> babbageclunk: which package did you reconf?
<babbageclunk> dimitern: maas-region-controller and maas-rack-controller
<frobware> dimitern, babbageclunk, voidspace: need to drop early today...
<babbageclunk> Is there a way to make review board ignore any whitespace changes? gofmt changing the gaps for blocks of assignments makes it really hard to see what changed.
<voidspace> ericsnow: ^^^
<babbageclunk> Ooh, release drinks
<ericsnow> babbageclunk: there's a "Hide whitespace changes" toggle in the diff view, on the right below the list of files at the top
<ericsnow> babbageclunk: is that what you're talking about?
<mgz> weeelll... I now have a new lxd container failure
<mgz> working through them
<dimitern> babbageclunk: maas-region-api IIRC is the new name of the maas-region-controller? anyway - try reconf'g that
<dimitern> babbageclunk: :/ release issues I guess - sorry for suggesting the upgrade :(
<dimitern> babbageclunk: in any case, at least I verified on a working beta3 MAAS we have interface_set, and on your alpha3 we don't, so it's good to be flexible about that in gomaasapi for the time being; so lgtm on your https://github.com/juju/gomaasapi/pull/45
<bogdanteleaga> do we have the custom signed image stream thing merged?
<mup> Bug #1565044 changed: s390x unit tests fail because not tools for arch <ci> <jujuqa> <s390x> <test-failure> <juju-core:Fix Released by reedobrien> <https://launchpad.net/bugs/1565044>
<mup> Bug #1573136 opened: kill-controller is stuck <juju-core:New> <https://launchpad.net/bugs/1573136>
<dimitern> alexisb: ping
<alexisb> dimitern, pong
<alexisb> whats up?
<mup> Bug #1573148 opened: Juju terms language confusing for locally-deployed charms <juju-core:New> <https://launchpad.net/bugs/1573148>
<mup> Bug #1573149 opened: 'failed to ensure LXD image' creating LXD container <ci> <lxd> <juju-core:In Progress by tycho-s> <https://launchpad.net/bugs/1573149>
<perrito666> Going afk for a moment
<voidspace> how do I set the log level in juju 2, now there's no set-env
<natefinch> voidspace: set-model-config
<natefinch> voidspace: took me a while to figure that out too
<voidspace> natefinch: ok, thanks. set-model-config *what*?
<voidspace> ah, no
<voidspace> natefinch: never mind
<voidspace> thanks :-)
<natefinch> welcome :)
<ericsnow> natefinch: PTAL: http://reviews.vapour.ws/r/4675/
<natefinch> ericsnow: will do
<ericsnow> natefinch: ta
<natefinch> ericsnow: http://reviews.vapour.ws/r/4676/
<ericsnow> natefinch: k
 * perrito666 tries to discover why tomb is suddenly not working in go1.6 in windows
<natefinch> ericsnow: btw, an example of using the functionality added in that PR is here: http://reviews.vapour.ws/r/4677/
<mup> Bug #1557052 changed: Can't bootstrap a trusty controller: no registered provider for "lxd" <2.0-count> <go1.2> <lxd> <trusty> <juju-core:Fix Released> <https://launchpad.net/bugs/1557052>
<mup> Bug #1571131 changed: juju add-ssh-key $(cat ~/.ssh/a-key.pub) needs quoting in helptext <helpdocs> <juju-core:Fix Released by reedobrien> <https://launchpad.net/bugs/1571131>
<ericsnow> natefinch: FYI, I left a review of your core patch
<ericsnow> natefinch: I don't think we need that testing patch if we change the approach in core
<ericsnow> natefinch: (though it probably stands on its own merits)
<natefinch> ericsnow: cool, thanks
<ericsnow> natefinch: np
<perrito666> rogpeppe1: have a moment?
<mup> Bug #1573259 opened: Non-admin users unable to share models <regression> <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1573259>
 * thumper sighs
<thumper> I'm going to have to work out how to get the lxd provider working again...
<thumper> abentley: do you know of the git command that relates to the bzr one of merge-preview?
<abentley> thumper: No, sorry.  Still a noob with git.
<thumper> hmm..
<thumper> I want to see the diff of what my merge would be against master
<thumper> so not what is on master that isn't on mine
<thumper> just what is unmerged
<thumper> should be easy right?
<ericsnow> thumper: make sure you apt-get lxd to get the latest and then run "sudo dpkg-reconfigure -p medium lxd" to set up the LXD bridge
<ericsnow> thumper: see https://github.com/juju/docs/pull/998/files
<thumper> ericsnow: ta
<ericsnow> thumper: np :)
 * thumper is looking at code and getting a sinking feeling
<ericsnow> thumper: had to do it today so it was fresh in my mind
 * thumper breathes a sigh of relief
<thumper> seems I don't need to worry about that...
 * thumper goes back to details
 * thumper chuckles
<thumper> "lxc finger"
<thumper> I'll give it the finger
<perrito666> axw_: ping me when you are awake and around?
<abentley> thumper: So merge --preview actually does the merge, so it'll show conflicts and stuff.  The equivalent git command, if it exists, would also need to do the merge.  "diff -r submit:" just performs a diff against well-chosen revisions and is more likely to have a git analogue.
<abentley> thumper: I think you can do the equivalent of "diff -r submit:" with "git diff $(git merge-base master HEAD)"
<thumper> hmm...
<thumper> ta
<redir> ericsnow: got a minute?
<ericsnow> redir: sure
<redir> HO?
 * perrito666 no longer has a webcam but still waves when finishing a conversation......
<wallyworld> redir: i was just going to ping you about eric's PR? can i join the hangout too?
<axw_> perrito666: I'm awake and around
<wallyworld> ericsnow: redir: are you guys hanging out somewhere?
<ericsnow> wallyworld: moonstone
<thumper> BOOM!!!
<thumper> that was my head-cannon
 * thumper cries in the corner
<mup> Bug #1573286 opened: juju/mongo: oplog tests fail with mongod 3.2 <juju-core:New> <https://launchpad.net/bugs/1573286>
<perrito666> axw_: nevermind, I sorted it out
<perrito666> tx a lot
<axw_> perrito666: okey dokey
<thumper> bugger....
<thumper> I really should do this in two branches
<thumper> but I won't
<thumper> poo
<mwhudson> is there some funky way i can jack up the mgo logging when running juju tests? menn0?
<menn0> mwhudson: I don't know off the top of my head but I know approximately where the change would need to be made
 * menn0 looks
<mwhudson> close enough
<menn0> mwhudson: the base suite which manages the mongo instances used in tests is in here: github.com/juju/testing/mgo.go
<mwhudson> i have found some time.Sleep(100 * time.Millisecond)s in mgo and i am suspicious
<menn0> mwhudson: you'll probably want to mess with the args passed to mongo in MgoInstance.run()
<mwhudson> menn0: ah no i mean mgo's own logging, mgo.SetOutput and stuff
<menn0> ahhh
 * menn0 confused mgo and mongo
<mwhudson> i guess i can jam something in there as well
<menn0> right
<menn0> mwhudson: all the tests that use mongodb (and therefore mgo) are based on that suite
<menn0> mwhudson: so that suite is probably still the place to mess with mgo's logging
 * menn0 looks at those sleeps
<mup> Bug #1573294 opened: state tests run 100x slower with mongodb3.2 <juju-core:New> <https://launchpad.net/bugs/1573294>
<menn0> mwhudson: wow there are a bunch of sleeps throughout mgo
<mwhudson> menn0: enterprissssssssssssssssssssssssssse
<mwhudson> hm no doesn't seem to be a sleep in mgo
<mwhudson> menn0: so now i guess i am interested in jacking up mongo's logging :-)
<mwhudson> menn0: and actually seeing it, i guess
<menn0> mwhudson: pass --logpath /some/where and --logappend I guess?
<menn0> in MgoInstance.run()
<mwhudson> menn0: ah haha MgoInstance.run() waits for a log message though :(
<menn0> mwhudson: ah crap...
<menn0> mwhudson: then you might have to monkey with the log capture stuff in the suite
<mwhudson> yeah
<mwhudson> oh but it looks like the mongo log output is Tracef-ed
<mwhudson> menn0: so can i set the log level to trace when running the tests somehow?
<menn0> mwhudson: which logs? juju's?
<mwhudson> yea
<mwhudson> i guess i can just jam a logger.SetLevel(loggo.TRACE) in
#juju-dev 2016-04-22
<mwhudson> menn0: do we have any friends at mongodb inc or who really know mongo details inside out?
<menn0> mwhudson: I think niemeyer might know some people there, not sure otherwise
<mwhudson> menn0: ok
<mwhudson> to what is by now a general lack of surprise, switching to the mmapv1 storage engine makes the test much faster
<menn0> mwhudson: doh ... so maybe wiredtiger sucks
<mwhudson> yeah it's a bit odd
<menn0> or maybe it's something about the way we use it
<mwhudson> i wonder if there's some tuneable
<mwhudson> this is why i would like to talk to someone who really knows their shit
<menn0> mwhudson: start with niemeyer I guess
<mwhudson> niemeyer: hello hello
<mwhudson> huh i wonder about using https://docs.mongodb.org/manual/core/inmemory/ for tests one day
<mwhudson> oh, enterprise only
 * mwhudson is now reading spinlock code in wiredtiger...
<redir> until tomorrow juju-dev
<mwhudson> hee hee
<mwhudson> After this operation, 1,356 MB of additional disk space will be used.
<mwhudson> (installing juju-mongodb3.2-dbgsym)
<niemeyer> mwhudson: heya
<niemeyer> I'm on my phone so apologies in advance if the communication breaks up
<mwhudson> niemeyer: ok, just wondering if you know a good person to poke at mongo inc
<mwhudson> niemeyer: we have an issue where index creation with wiredtiger takes ~100ms vs ~1ms with mmapv1, makes some tests veeeerrrry slow
<mwhudson> niemeyer: also it's 100ms of wall clock time but <<100ms of cpu time so i suspect it's waiting on a lock or something
<niemeyer> mwhudson: Let me find a bug #.. Just a sec
<niemeyer> mwhudson: https://jira.mongodb.org/browse/SERVER-21198
<niemeyer> mwhudson: Very unfortunate indeed, and I could not find much support to actually fix the underlying problem
<niemeyer> mwhudson: I haven't yet finished testing with libeatmydata
<mwhudson> niemeyer: huh, let's try turning --nojournal off
<niemeyer> mwhudson: You can't if --config is used, but you probably don't need that
<niemeyer> (mgo does as it needs to ensure replicas work well too)
<mwhudson> hm seems to help
<mwhudson> of course i have so much debugging crud patched into my tree that its hard to tell
<niemeyer> mwhudson: If you find something interesting please do let me know.. I'm struggling with the same issue.. Still using mmapv1 on CI due to that
<mwhudson> niemeyer: will do
<mwhudson> i should actually read the whole ticket too...
<axw_> thumper: why should a non-controller-administrator model owner not be allowed to grant access to their model to someone else?
<thumper> axw_: this is what I'm fixing
<axw_> thumper: didn't you change it to that tho? was that temporary?
<thumper> axw_: no, it was done that way when grant/revoke were added
<thumper> in error
<axw_> thumper: ah ok, I'm looking at the wrong function
<axw_> thumper: so what will it be? only model owners can share?
<thumper> anyone with write access to the model can share
<thumper> and admins
<axw_> thumper: hmm ok. I guess if you give someone write access, then it makes sense that they can delegate that to someone else. ok, thanks
<axw_> thumper: just clarifying some docs
<thumper> np
<mwhudson> niemeyer: blah, 3.2 is still a LOT slower even without --nojournal
<mwhudson> but it certainly does help
<mwhudson> 3.2/nojournal: 20s 3.2/no-nojournal: 6.5s 2.4/nojournal: 0.3s 2.4/no-nojournal: 0.8s
<mwhudson> and 3.2 with mmapv1 is ~the same as 2.4, this is mmapv1 vs WT, nothing else afaict
<mwhudson> huh mongod doesn't start with eatmydata for some reason
<mwhudson> menn0: btw i also filed https://bugs.launchpad.net/juju-core/+bug/1573286
<mup> Bug #1573286: juju/mongo: oplog tests fail with mongod 3.2 <juju-core:New> <https://launchpad.net/bugs/1573286>
<menn0> mwhudson: yep I saw that
<menn0> mwhudson: thanks
<menn0> mwhudson: I'll jump on that once this SSH host key work is done
<mwhudson> cool
<menn0> axw_: do you happen to know off the top of your head what happens with "juju ssh" and windows machines?
<menn0> do we explicitly block it or will it just fail because there's no SSH server?
<axw_> menn0: windows client?
<menn0> windows workload machine
<axw_> menn0: I'm pretty sure it'll still try to connect
<axw_> (and fail)
<menn0> axw_: ok thanks
<axw_> menn0: I'm bootstrapping azure, I can try it if you like
<menn0> axw_: it's ok, it doesn't matter too much, more curious than anything
<axw_> okey dokey
<mup> Bug #1571687 changed: Azure-arm leaves machine-0 from the admin model behind <azure-provider> <ci> <destroy-controller> <jujuqa> <repeatability> <juju-core:Invalid> <https://launchpad.net/bugs/1571687>
<mup> Bug #1573365 opened: Can't bootstrap --upload-tools on trusty <juju-core:New> <https://launchpad.net/bugs/1573365>
<mup> Bug #1573365 changed: Can't bootstrap --upload-tools on trusty <juju-core:New> <https://launchpad.net/bugs/1573365>
<mup> Bug #1573382 opened: only one user can create local users <juju-core:In Progress by thumper> <https://launchpad.net/bugs/1573382>
 * thumper goes for a wander
 * menn0 is done
<mup> Bug #1572707 changed: 2.0b5: panic when running juju register <juju-release-support> <landscape> <juju-core:Invalid by axwalk> <https://launchpad.net/bugs/1572707>
<mup> Bug #1573407 opened: juju show-user/list-users do not check authorisation <juju-core:Triaged> <https://launchpad.net/bugs/1573407>
<mup> Bug #1573410 opened: trusty juju 1.25.5 having issues deploying xenial lxc containers <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1573410>
<rogpeppe2> perrito666: i do now :)
<dimitern> I figured out how to trick the unit tests into thinking trusty is still the latest lts :)
<dimitern> mv /usr/bin/distro-info /usr/bin/distro-info-org && ln -s /usr/bin/fake-distro-info /usr/bin/distro-info
<dimitern> fake-distro-info is just `#!/bin/bash\necho trusty`
<dimitern> and make check passed for the first time since yesterday, just like on a good ol' trusty machine \o/
<dimitern> babbageclunk: morning
<dimitern> babbageclunk: any luck with the maas upgrade yesterday?
<babbageclunk> dimitern: morning!
<babbageclunk> dimitern: Well, it came back up when I restarted the vm this morning, but it's still reporting the wrong version.
<dimitern> babbageclunk: via the CLI ?
<babbageclunk> I'm doing the update/dist-upgrade dance now.
<babbageclunk> yup
<babbageclunk> I got a bit confused by the fact that apt doesn't have a dist-upgrade command, so I just did an upgrade (which definitely mentioned maas packages).
<dimitern> babbageclunk: the vm where the maas contoller runs is xenial, right?
<babbageclunk> do I still need to do a apt-get dist-upgrade as well?
<babbageclunk> dimitern: yes
<dimitern> babbageclunk: sudo apt dist-upgrade works for me btwq
<dimitern> btw even
<babbageclunk> dimitern: tsk, doesn't appear in the help or the man page! Presumably there for backwards-familiarity?
<dimitern> babbageclunk: ha! you're correct - it doesn't appear in the help, but bash completion seems to suggest it.. I guess it's aliased to full-upgrade
<babbageclunk> dimitern: I don't understand the difference - reading on askubunut
<babbageclunk> uhbuntu
<dimitern> babbageclunk: full-upgrade vs dist-upgrade ? it appears to do the same thing on my machine at least :)
<babbageclunk> dimitern: sweet, that's emitting lots of messages about upgrading the maas installation
<babbageclunk> dimitern: well, migrating it
<babbageclunk> dimitern: It was the difference between upgrade and dist/full-upgrade I didn't grok
<babbageclunk> but now I think I do!
<babbageclunk> ha, didn't realise maas was django
<dimitern> heh I still don't get the difference between full- vs dist-upgrade
<babbageclunk> dimitern: yay, version is upgraded now - between that and bumping up the ram on the controller everything seems good now.
<babbageclunk> dimitern: thanks for the help!
<dimitern> babbageclunk: awesome! I'm glad it's ok, as I was already feeling bad for suggesting it :)
<babbageclunk> dimitern: :) would have been straightforward if I was a bit more familiar with all of the tools.
<dimitern> babbageclunk: well, with my experience maas upgrades are rarely a walk in the park :D although it seems to be improving with each release
<babbageclunk> dimitern: ooh, I wonder if that fixes the bug with the power settings getting cleared when the machine gets allocated via the API.
<dimitern> babbageclunk: hopefully - they've made changes to the power handling, esp. around what's considered 'fatal error' (not to be retried)
<voidspace> babbageclunk: dimitern: frobware: dooferlad: you updated your main machines to xenial yet?
<babbageclunk> nope
<dooferlad> voidspace: noooooo
<voidspace> :-)
<voidspace> dooferlad: I thought you probably wouldn't have
<dimitern> voidspace: yeah
<voidspace> dimitern: hah, cool
<voidspace> I might wait a few days, I don't generally wait very long
<dimitern> voidspace: I was eager to try maas2 2 weeks ago so I did
<rogpeppe> a trivial change to the lxd provider, review appreciated please: http://reviews.vapour.ws/r/4682/
<voidspace> rogpeppe: LGTM
<rogpeppe> voidspace: ta!
<dimitern> rogpeppe: wait a sec please
<frobware> voidspace: yes, for about 2 months now
<dimitern> rogpeppe: why panic?
<rogpeppe> dimitern: we've had this conversation before
<dimitern> rogpeppe: I guess so, but can you remind me? :)
<rogpeppe> dimitern: config.Schema is deterministic
<rogpeppe> dimitern: and configSchema is a global variable that never changes
<rogpeppe> dimitern: and we've run a test that calls environprovider.Schema
 * dimitern has another look
<rogpeppe> dimitern: so if config.Schema returns an error, there is some serious problem
<rogpeppe> dimitern: the interface contract for Schema doesn't return an error (and why should it? a provider should be able to return its own schema without errors)
<dimitern> rogpeppe: right, got it
<simonklb> hi, I've experienced some problems with the hostname not being set in /etc/hosts when using Juju with LXD as the provider
<rogpeppe> dimitern: this code is just the same in the other providers
<simonklb> this is on 2.0 beta4
<dimitern> rogpeppe: :) thanks for explaining
<simonklb> I saw this: https://github.com/lxc/lxd/issues/1759
<simonklb> would it be up to Juju to set this up correctly?
<rogpeppe> jamespage: did that work OK for you?
<simonklb> jamespage: I saw that you were involved here: https://bugs.launchpad.net/charms/+source/ceph/+bug/1365671
<mup> Bug #1365671: juju + maas provider: ceph-mon doesn't listen on localhost and /etc/hosts points fqdn to 127.0.1.1 <cloud-installer> <landscape> <openstack> <ceph (Juju Charms Collection):Won't Fix> <https://launchpad.net/bugs/1365671>
<simonklb> Could you please clarify how this works now in Juju 2.0?
<simonklb> I'm trying to setup the hbase charm on a local LXD environment but it complains about the hostname not being set in /etc/hosts
<jamespage> simonklb, tbh I don't know how that would work in a LXD environment - the original bug was with MAAS doing some config it did not need to do.
<simonklb> jamespage: do you know how cloud-init is configured? Is that something you can do with Juju?
<jamespage> simonklb, juju manages all of that complexity
<jamespage> simonklb, this charm? https://jujucharms.com/hbase/
<simonklb> jamespage: yes!
<simonklb> from that bug-thread you were in it said that MAAS was configured with manage_etc_hosts localhost - so is cloud-init configured per provider somehow?
<simonklb> it seems like manage_etc_hosts is unset with the LXD only setup
<jamespage> simonklb, it's certainly not the same with every provider...
<jamespage> cory_fu, hey - do we have a more recent hbase charm than https://jujucharms.com/hbase/
<jamespage> I wrote that for 12.04 and I suspect it needs to be removed from the charmstore....
<jamespage> its certainly not been tested recently by the looks of things...
<jamespage> dammit need to stop doing ...
<jamespage> gnuoy is going to start fining me about that
<gnuoy> I am...
<jamespage> my fingers do it on auto
<simonklb> jamespage: a more recent hbase would be even better, but it might be useful to know how to configure cloud-init with Juju (if it's possible)
<jamespage> simonklb, juju intentionally abstracts that away from the end user
<jamespage> there may be configuration knobs per provider that change cloud-init behaviour, but nothing that will allow you to touch it directly...
<simonklb> jamespage: right, seems weird that they wouldn't make cloud-init set up /etc/hosts though
<jamespage> "they" are right here so someone will be able to answer that definitively...
<jamespage> ;-)
<simonklb> haha yea, unfortunately the timezones make it so that my work day is almost over when they wake up :)
<simonklb> at least the americans
<Garyx> Is the MAAS 2.0 support still wip?
<Garyx> Just tried a new xenial MAAS setup and it gives me a runtime panic when trying to bootstrap
<Garyx> Xenial repository seems to use Beta4
<simonklb> jamespage: is it possible to run the hbase charm in hbase standalone mode?
<mup> Bug #1571687 opened: Azure-arm leaves machine-0 from the admin model behind <azure-provider> <ci> <destroy-controller> <jujuqa> <repeatability> <juju-core:Incomplete> <https://launchpad.net/bugs/1571687>
<frobware> dimitern, voidspace, babbageclunk|run, dooferlad: going dark for a bit. need to get back to a working graphical login. my dist-upgrade didn't go so well.
<dimitern> frobware: ok, best of luck then :)
<frobware> I can't get my monitor out of 640x480... :(
<voidspace> Garyx: latest master bootstrap should work, but containers aren't yet supported
<voidspace> Garyx: so yes, WIP and you need to set the MAAS2 feature flag
<voidspace> Garyx: it shouldn't panic though - although older versions did
<voidspace> Garyx: so it's probably an older version you have (I hope)
<cory_fu> jamespage: We have https://jujucharms.com/u/bigdata-dev/apache-hbase/ which is a bit newer...
<cory_fu> But I don't think it's really been tested much...
<cory_fu> HBase hasn't been a priority for us...
<jamespage> cory_fu, probably more than my 4 year old bash version....
<jamespage> py-juju tastic...
<cory_fu> :)
<simonklb> jamespage: I actually got your hbase bash charm working :D
<jamespage> simonklb, wow
<jamespage> simonklb, I struggled to remember I actually wrote it until I dropped into the code...
<jamespage> must remember to leave more bug comments as mental notes for +4 years time...
<jamespage> 1804
<jamespage> golly
<jamespage> oh no 2004
<jamespage> apparently I can't add up either...
<simonklb> haha, I had to modify it a bit to get hbase to work in standalone mode, but its running at least!
<simonklb> for some reason I have an issue pinging the machine using its domain name, pinging other machines by hostname works, but not the one running hbase
<simonklb> the only difference is that the hbase machine is running precise but the others are running trusty
<simonklb> is this a known problem?
<Garyx> voidspace: happened on beta5 and beta4 for me :/
<Garyx> voidspace: I set the featur flag so that it would try to bootstrap.
<babbageclunk> ok, this is getting ridiculous - I'm going to have to spend some time working out why my computer keeps freezing.
<ericsnow> katco: are we having our retrospective or just a normal standup today?
<katco> ericsnow: standup i think... no need for a retro yet. although we should schedule one for resources before too long
<katco> ericsnow: sorry, forgot to take it off the calendar
<ericsnow> katco: np :)
<simonklb> I checked /var/lib/misc/dnsmasq.lxdbr0.leases and saw that the hostnames are not registered properly
<simonklb> it's not depending on the series though, scratch that, same problem with machines running trusty
<simonklb> one machine registered ubuntu * and another just
<simonklb> * *
<simonklb> the ones that I'm able to ping obviously registered the hostname correctly, i.e. something like "juju-01f074e8-8788-44d6-87a3-a1861bd1e906-machine-0 *"
<simonklb> any dnsmasq gurus here? :)
<simonklb> looks like the machines register themselves with a different hostname first and then try to register again with the real hostname
<simonklb> is this some kind of race?
<marcoceppi> can we not bootstrap remote lxd servers?
<simonklb> saw this came up years ago as well https://bugs.launchpad.net/ubuntu/+source/maas/+bug/1043121
<mup> Bug #1043121: deployed node cannot be looked up with dnsmasq on MAAS <amd64> <apport-bug> <precise> <maas (Ubuntu):Won't Fix> <maas (Ubuntu Precise):Won't Fix> <https://launchpad.net/bugs/1043121>
<simonklb> but for MAAS
<simonklb> I'm hitting this with local LXD
<natefinch> marcoceppi: we don't support remote LXD currently
<katco> ericsnow: standup time
<mup> Bug #1573659 opened: Panic in when bootstrapping MAAS 2.0 <maas-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1573659>
<dimitern> frobware, voidspace, dooferlad, babbageclunk: guys, I'd appreciate a review on this: http://reviews.vapour.ws/r/4685/
<babbageclunk> voidspace, dooferlad, dimitern - I think we had this discussion last week, but no Juju-spaces sync meeting anymore? I should delete the appointment.
<dimitern> babbageclunk: ooh.. I forgot, but yeah I guess no call today as well
<babbageclunk> dimitern: oh, but should I keep the appointment in my calendar?
<dimitern> babbageclunk: delete it, we will likely reschedule anyway once it starts regularly again
<babbageclunk> ok
<babbageclunk> dimitern: wow, that's a lot of scrolling
<dimitern> babbageclunk: hopefully not too hard to follow - tried to mostly remove stuff and resist the occasional urge to refactor :)
<babbageclunk> dimitern: yeah, it looks like just deletia. (And comment rewrapping.)
<dimitern> babbageclunk: thanks!
<mup> Bug #1573665 opened: Drop special-casing of precise once we no longer support it. <juju-core:New> <https://launchpad.net/bugs/1573665>
<mup> Bug #1573668 opened: bootstrap should really be add-controller <docteam> <juju-core:New> <https://launchpad.net/bugs/1573668>
<dimitern> voidspace, dooferlad, frobware: ping
<dooferlad> dimitern: pong
<dimitern> dooferlad: here it is http://reviews.vapour.ws/r/4685/ :)
<dooferlad> dimitern: will take a look in a moment.
<dimitern> dooferlad: ta!
<dimitern> babbageclunk: btw that panic when bootstrapping maas2 seems to be when a machine has at least 1 nic with mode=auto and no subnet
<babbageclunk> dimitern: ok - can I reproduce that in vmaas?
<dimitern> babbageclunk: yeah, should be easy - just add a second nic to a node, recommission it, and try to bootstrap +maas2 ff on and --to my-node-hostname
<dimitern> babbageclunk: this might help: http://paste.ubuntu.com/15983759/
<babbageclunk> dimitern: thanks - I was still trying to unpack it. +maas2? ff on?
<dimitern> :) sorry - too concise
<dimitern> babbageclunk: with the maas2 feature flag enabled
<babbageclunk> dimitern: Ha! should have known that.
<alexisb> fwereade, ping
<babbageclunk> dimitern: ok - I'll try to chase it down now.
<dimitern> babbageclunk: thanks! cherylj filed a bug 1573659
<mup> Bug #1573659: Panic in when bootstrapping MAAS 2.0 <maas-provider> <juju-core:Triaged> <https://launchpad.net/bugs/1573659>
<dimitern> babbageclunk: some logs/context here as well - https://github.com/juju/juju/issues/5259#issuecomment-213466954 (the telltale sign of a link without a subnet - mode=auto - is this log message: `interfaces.go:274 interface "eno4" has no address`, just before the panic)
<voidspace> dimitern: another monster review from dimitern
<voidspace> dimitern: grabbing coffee first
<babbageclunk> dimitern: hmm, I added another nic and recommissioned, but now it doesn't show any network for the node. Do I need to add both of them?
<dimitern> voidspace: it's mostly removals :)
<dimitern> babbageclunk: doesn't show *any* interfaces after commissioning?
<babbageclunk> dimitern: nope
<dimitern> babbageclunk: can you paste the output of 'maas .. interfaces read <node-id>' ?
<perrito666> anyone else experiencing tests not passing for xenial?
<babbageclunk> dimitern: hang on, I tried removing it and recommissioning to see what I saw then.
<dimitern> perrito666: you're on xenial?
<dimitern> perrito666: I managed to get them passing by moving /usr/bin/distro-info out of the way and replacing it with a bash script that simply outputs "trusty"
<perrito666> dimitern: that is..... no, I won't do that, tests need fixing
<dimitern> perrito666: yeah, I tried that for 2h yesterday and decided I have enough other stuff to do :)
<perrito666> dimitern: ok, that gives me an idea of the order of magnitude
<perrito666> but, this is also something that broke recently
<babbageclunk> dimitern: should I add the nic on a different network?
<dimitern> babbageclunk: just leave it unconfigured
<alexisb> fwereade, I am going to have cherylj hand off the quick fix for bug 1572237
<mup> Bug #1572237: juju rc1 loses agents during a lxd deploy <lxd-provider> <juju-core:Triaged by fwereade> <https://launchpad.net/bugs/1572237>
<alexisb> unless you speak now
<alexisb> we will need to try and get something in today
<babbageclunk> dimitern: ah, ok - that's probably what I got wrong.
<cherylj> ericsnow, katco ping?
<ericsnow> cherylj: hi
<babbageclunk> dimitern: (sorry to keep bugging you)
<cherylj> hey ericsnow, that bug alexisb just mentioned is super critical and we need someone to see if there's a quick fix we can do for it.
<babbageclunk> dimitern: how do I leave it unconfigured?
<ericsnow> cherylj: I'll take a look
<cherylj> ericsnow: want to do a quick HO to see if you would be comfortable taking it?
<dimitern> babbageclunk: leave both dropdowns (subnet and address) Unconfigured?
<babbageclunk> dimitern: just leave out the bridge name?
<dimitern> babbageclunk: what bridge?
<ericsnow> cherylj: only if you think it would help :)
<babbageclunk> dimitern: the dropdowns are network source and device model.
<cherylj> ericsnow: I looked through the bug again, and fwereade did put in his update about a possible solution, so just let me know if you have questions
<babbageclunk> dimitern: hang on - dropdowns in VMM or MAAS ui? I'm talking about the VMM ui
<ericsnow> cherylj: k
<cherylj> I couldn't remember if he had updated the bug with that info or not
<dimitern> babbageclunk: ah, sorry
<dimitern> babbageclunk: in the virt-manager connect both NICs to the same bridge (maas-19-int in my case)
<babbageclunk> dimitern: Ok, now I've got two interfaces, one unconfigured.
<babbageclunk> dimitern: Does this look right? http://pastebin.ubuntu.com/15984913/
<perrito666> cherylj: https://github.com/juju/juju/pull/5262
<dimitern> babbageclunk: great, so bootstrapping with --upload-tools --to <node-name> should (hopefully) reproduce the panic
<perrito666> as small as it looks, that fixes most of the remaining windows tests
<dimitern> babbageclunk: yeah - notice how the second (ens9) is mode=link_up only, no subnet
<babbageclunk> dimitern: ok, go it!
<babbageclunk> dimitern: duh, got
<dimitern> babbageclunk: nice!
<babbageclunk> dimitern: I mean, the panic at least.
<dimitern> babbageclunk: passing also --config=logging-config='<root>=TRACE' combined with some additional trace logging (if needed) around maas2Interfaces should show the cause
<cherylj> perrito666: did you live test on windows?
<babbageclunk> dimitern: ok, thanks - I can see where it is, although it's odd - there's a nil check immediately before it.
<babbageclunk> dimitern: ok, digging a bit more.
<dimitern> babbageclunk: thanks for looking into that
<dimitern> babbageclunk: nil checks on an interface value might be misleading btw
<dimitern> babbageclunk: the infamous double nil dereference
<babbageclunk> dimitern: Ok - that's almost certainly the problem.
<dimitern> (usually concerns errors only, but..)
<babbageclunk> dimitern: I've heard of that but hadn't ever seen it in the wild before!
<perrito666> cherylj: I did, I actually am fixing those on windows
<cherylj> perrito666: nice :)
<katco> cherylj: wow sorry i completely missed your ping... did you get what you needed?
<babbageclunk> dimitern: hmm. It sounds like the right fix for that is to make the subnet methods handle a nil receiver.
<dimitern> babbageclunk: here's some info on that if you're interested: http://devs.cloudimmunity.com/gotchas-and-common-mistakes-in-go-golang/index.html#nil_in_nil_in_vals
<cherylj> katco: yes, thanks :)
<katco> cherylj: k, sorry about that
<katco> ericsnow: ta
<cherylj> katco: no worries
<babbageclunk> dimitern: given that I can't cast to the real underlying type.
<babbageclunk> dimitern: since it's private in another package.
<babbageclunk> dimitern: and reflection seems like the wrong thing to use here.
<dimitern> babbageclunk: well, if the gomaasapi.Subnet interface's implementation uses non-pointer receivers (as they just access the data already populated in the unexported fields)
<perrito666> cherylj: I need to suffer to fix those in order to feel the same that windows feels :p
<cherylj> haha
<babbageclunk> dimitern: no, no value receivers.
<voidspace> dimitern: that's a lot of code removed!
<voidspace> dimitern: still reading
<dimitern> voidspace: :) hopefully more to come soon
<babbageclunk> dimitern: were you halfway through a thought?
<voidspace> dimitern: heh
<voidspace> dimitern: I'm on page 3
<voidspace> of 76 apparently...
<voidspace> possibly a slight exaggeration
<dimitern> voidspace: might be easier to look the diff on github instead
<babbageclunk> dimitern: Hmm - I could add an IsNil method to gomaasapi.Subnet. Is that idiomatic?
<dimitern> babbageclunk: I wanted to also remove a few other things, but that would have made it harder to follow
<voidspace> dimitern: appreciated
<dimitern> babbageclunk: nope, how about just changing gomaasapi.subnet's methods to use non-pointer receivers?
<babbageclunk> dimitern: oh, I see. Does that work? Trying now in the playground. If so I'll do that.
<dimitern> babbageclunk: for example: http://play.golang.org/p/pAHQEQEunJ
<babbageclunk> dimitern: How does that help? They both panic.
<babbageclunk> dimitern: oh, hang on - I think I'm misreading
<dimitern> babbageclunk: that might be a better example actually: http://paste.ubuntu.com/15985771/
<dimitern> babbageclunk: it's not quite there yet, but it's a start
<babbageclunk> dimitern: not quite analogous - they're both == nil.
<babbageclunk> dimitern: but the problem is that the subnet's nil - do you mean do the same thing to the links?
<dimitern> babbageclunk: yeah, that's one option
<babbageclunk> dimitern: why does that help? you've just changed the point at which the address gets taken - does that change how the interface gets constructed?
<babbageclunk> dimitern: sorry if I'm being dense.
<dimitern> babbageclunk: basically change the whole chain interface->link->subnet->vlan to ensure no dangling pointers
<babbageclunk> dimitern: ok, I think I see
<dimitern> babbageclunk: or, alternatively make sure any method that returns an interface does check the pointer field != nil before
<babbageclunk> dimitern: I think it has to be that, right?
<dimitern> babbageclunk: the crux of the problem is that getters-returning-interfaces should also return either an error or at least a "exists" bool flag
<babbageclunk> So interface.Subnet() shouldn't return Subnet(nil), it should return nil
<dimitern> babbageclunk: that won't help though
<babbageclunk> Oh, but can it do that? It doesn't return a *Subnet.
<dimitern> babbageclunk: as the compiler will silently wrap the nil in a non-nil interface instance
<babbageclunk> dimitern: ok, that's what I thought.
<dimitern> babbageclunk: e.g. if link.Subnet() returned (Subnet, bool) instead of just Subnet, and use the bool to indicate whether l.subnet == nil..
<babbageclunk> dimitern: Ok, so then I think you're right
<cherylj> perrito666: can we count on c: always being the root?
<dimitern> it will be a lot more natural to the caller to check first the flag and only then try to use the subnet interface
<perrito666> cherylj: I was clear on "just do this for tests and fake paths" :)
<perrito666> cherylj: otherwise no, you cant
<perrito666> cherylj: there is no "root" on windows, or at least is not such a clear concept
<babbageclunk> dimitern: Or the other way would be to have IsNil methods on all of those interfaces. But I think you're right, the other way is probably more Go-y.
<cherylj> perrito666: okay, thanks.   :)
<perrito666> cherylj: ask as much as you like I am really happy to share the windowsness
<babbageclunk> dimitern: that's kind of a nasty gotcha.
<dimitern> babbageclunk: you can check for if subnet != nil, assuming 'subnet' is an interface, coming from a func that returns either the implementation or a plain nil
<dimitern> babbageclunk: and because this happens a lot with errors, that's why we have jc.ErrorIsNil vs gc.IsNil
<voidspace> dimitern: LGTM
<dimitern> voidspace: \o/ tyvm
<voidspace> dimitern: nice work, thank you
<dimitern> dooferlad: I'll wait for your review as well - will check back in a hour or so
<babbageclunk> dimitern: but there's no way we can change our interface to do that, right?
<dimitern> voidspace: cheers
<dimitern> (well  - it was a pleasure to see old mistakes go away)
<babbageclunk> dimitern: If the implementation methods returned (eg) *vlan instead of VLAN would that help?
<dimitern> babbageclunk: yeah, but that train has long left :)
<babbageclunk> dimitern: because the interfaces would still return VLAN
<dimitern> babbageclunk: another reason why you should take interface arguments but return concrete (pod) types
<babbageclunk> dimitern: *interface methods I mean
<babbageclunk> dimitern: hard to do testing with that, though.
<dimitern> babbageclunk: I need to leave now, but I'm sure you can figure it out
<dimitern> :)
<babbageclunk> dimitern: same!
<babbageclunk> dimitern: Thanks for all the help - have a lovely weekend!
 * babbageclunk isn't as sure he can figure it out.
<dimitern> changing a few methods to return (interface-result, bool-flag) should be easy (and also to test)
<dimitern> babbageclunk: cheers! likewise ;)
<babbageclunk> yup yup
<mup> Bug #1573681 opened: conflict when managing LXD storage with Juju 2.0 <kanban-cross-team> <landscape> <juju-core:New> <Juju Charms Collection:New> <https://launchpad.net/bugs/1573681>
<redir> ericsnow: you wanted to review this WIP?
<ericsnow> redir: sure
<ericsnow> redir: it might be a little while as I'm working on a priority bug
<redir> ericsnow: http://reviews.vapour.ws/r/4689/
<redir> np
<redir> I'll keep chugging on the other LTS things.
<redir> They need to land all at once. But incremental reviews would make it much easier.
<redir> ericsnow: ^
<ericsnow> redir: k
<mup> Bug #1573741 opened: 2.0-rc1 cannot deploy in azure <azure-provider> <blocker> <ci> <deploy> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1573741>
<mup> Bug #1573742 opened: Newly provisioned machines start off at the controller's version. <juju-core:New> <https://launchpad.net/bugs/1573742>
<redir_lunch> does juju still use zookeeper?
<katco> redir: it does not
<redir> katco: that was my understanding. Just see comments referencing making sure it is configured to start...
<katco> redir: where is that?
<redir> provider/ec2 tests
<redir> katco: ^
<katco> redir: if you're nearby, create a small patch to remove those comments
<redir> katco: I am working on that test so I'll do it in the drive by
<katco> redir: nice, good find :)
<katco> redir: and ta
<redir> katco: I think I'll have a qq here in this test.
<redir> katco: It is testing the cloud-init stuff which will be significantly different in xenial due to the change to systemd
<redir> katco: wondering if we need separate tests for xenial and pre-xenial
<katco> redir: i didn't want to get into it this morning, but the fact that our tests don't work on different releases is a huge smell to me
<katco> redir: i.e. unit tests really shouldn't be affected by what release you're on
<redir> katco: agreed.
<redir> ericsnow: resubmitted a PR that has all passing tests rebased on master. RB: http://reviews.vapour.ws/r/4691/
<ericsnow> redir: k
<redir> or if there's anyone else that wants to review that ^^ feel free
<alexisb> ericsnow is busy
<alexisb> saving the world
<redir> alexisb: yeah, that's why I said or anyone else. ericsnow has a full plate
<redir> and it is late on friday, so the reviewer pool is shallow
<alexisb> redir, I would do it for you but I am not an official reviewer so you would need another +1 besides me
<redir> alexisb: understood:)
 * redir goes to look at the board
<mgz> ha!
<mgz> redir: (looking, btw)
<mgz> one of the ec2 local server suite tests actually had 'zookeeper' in a comment
<mgz> so not only was it ported verbatim from pyjuju and not corrected, no one looked at it since
<alexisb> heh
<redir> mgz: tx
<mgz> not wild about the way these bundle tests are written, but I guess it's 2 years before we have to edit all these strings again
<mgz> the alternative is I guess faking out the series selection stuff at the suite level,
<redir> mgz: you are not alone
<redir> 2.1!
<redir> :)
<mgz> redir: and the ec2 bootstrap test is just a fork of the existing one in two, so I won't nitpick it
<redir> mgz yeah that could have been smarter
<mgz> redir: review'd
<alexisb> mgz, thank you especially given it is nearly  your saturday
<mgz> alexisb: it is totally saturday :)
<alexisb> mgz, are you UTC or UTC+1
<mgz> In summer time so... +1 atm?
<alexisb> :) well in that case, dude! go enjoy your weekend
<mgz> hey, I've got 9 hours till saturday morning squash
<perrito666> mgz: here we call this friday night, and spend it drinking
<mgz> perrito666: what's the bootstrap wait loop change about in your windows tests pr?
<mgz> it looks reasonable, just not sure how it's connected
<perrito666> a race condition
<mgz> perrito666: review'd
<perrito666> mgz: in go 1.6, in windows, it will sometimes end without returning lastErr because of the select having hc.closed firing first
<perrito666> mgz: why this happens only in windows, beats me
<perrito666> it should be happening in linux too
<mgz> perrito666: some socket implementation detail? I'd expect that to be the case on all platforms.
<perrito666> mgz: could be
<perrito666> in any case, the logic was begging for that race to happen
<perrito666> I wish it was easier to track though :p
<redir> mgz: commented on your comment.
<mgz> redir: don't see it? hit publish maybe?
<redir> mmm yup I was muted on RB
<redir> mgz: the tests fail if they don't match up.
<mgz> redir: right, I expect that to be the case
<redir> so they can't be unique
<mgz> the question is do we care to fix them so the tests actually end up with exactly one image
<redir> or they are unique in export_test
<mgz> which they know what it is
<redir> they do end up with one of the images in export_test
<mgz> but, if it makes sense in the context of the test, feel free to drip the issue
<redir> depending on what constraints are applied
<redir> OK
<redir> dropping
<redir> or dripping
<redir> as it were
<redir> tx
<mgz> perrito666: you win funny find/replace error of the day
<perrito666> mgz: lol, tx
<mgz> perrito666: `git diff 5d2acdb0^..5d2acdb0 -- environs/config/config.go` - spot the funny mistake
<perrito666> mgz: could you pastebin? I am in an odd repo status
<mgz> perrito666: I'll fix and make you review
<perrito666> aghh logger
<mgz> perrito666: :P
<perrito666> (I had a second repo)
<perrito666> I suck
<redir> oh, are we on next again?
<redir> or is master just closed for a while
<mgz> we're not on next
<redir> well yeah we prolly don't want to land that thing you just reviewed
<mgz> alexisb: are we keeping master blocked just for eric's fix?
<redir> are the test slaves setup to have distro-info --lts return trusty?
<alexisb> mgz, it is blocked for the azure bug
<alexisb> but yeah no other commits atm please
<mgz> redir: yeah, we moved our setup in time
<mgz> so, we have another month of trusty as the default
<redir> mgz OK because that PR you reviewed is to fix it to work when --lts returns xenial.
<redir> mgz: which means it will prolly break on your time shifted slaves
<mgz> redir: we can coordinate that monday
 * redir is unclear on the transition process
<mgz> redir: probably just want me or curtis to revert the hack as your branch lands
<redir> hopefully it still merges cleanly and passes then:o
<redir> OK. I'll follow up with whomever is around then.
<redir> Good luck at squash in 8 hours
<mgz> I played quite well on two hours sleep last week, though... not generally a good idea
<redir> indeed
#juju-dev 2016-04-23
<mgz> perrito666: http://reviews.vapour.ws/r/4693/
<redir> later juju-dev i am eow
<mup> Bug #1573410 changed: trusty juju 1.25.5 having issues deploying xenial lxc containers <canonical-bootstack> <juju-core:Won't Fix> <https://launchpad.net/bugs/1573410>
<mup> Bug #1574076 opened: juju package should suggest juju-1.25 not juju-core <juju-core:New> <juju (Ubuntu):New> <https://launchpad.net/bugs/1574076>
#juju-dev 2016-04-24
<axw> cherylj: around?
<mup> Bug #1573741 changed: 2.0-rc1 cannot deploy in azure <azure-provider> <blocker> <ci> <deploy> <regression> <juju-core:Fix Released by cherylj> <https://launchpad.net/bugs/1573741>
<mup> Bug #1573741 opened: 2.0-rc1 cannot deploy in azure <azure-provider> <blocker> <ci> <deploy> <regression> <juju-core:Fix Released by cherylj> <https://launchpad.net/bugs/1573741>
<mup> Bug #1572145 changed: kvmProvisionerSuite.TestContainerStartedAndStopped no event arrived <ci> <ppc64el> <regression> <test-failure> <unit-tests> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1572145>
<mup> Bug #1573741 changed: 2.0-rc1 cannot deploy in azure <azure-provider> <blocker> <ci> <deploy> <regression> <juju-core:Fix Released by cherylj> <https://launchpad.net/bugs/1573741>
<mup> Bug #1572145 opened: kvmProvisionerSuite.TestContainerStartedAndStopped no event arrived <ci> <ppc64el> <regression> <test-failure> <unit-tests> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1572145>
<mup> Bug #1564503 changed: arm64 unit tests have never passed <arm64> <jujuqa> <tech-debt> <unit-tests> <juju-core:Fix Released by cherylj> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1564503>
<mup> Bug #1572145 changed: kvmProvisionerSuite.TestContainerStartedAndStopped no event arrived <ci> <ppc64el> <regression> <test-failure> <unit-tests> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1572145>
<mup> Bug #1564503 opened: arm64 unit tests have never passed <arm64> <jujuqa> <tech-debt> <unit-tests> <juju-core:Fix Released by cherylj> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1564503>
<mup> Bug # changed: 1300519, 1307282, 1323562, 1389494, 1389495, 1545040, 1564503
<mup> Bug #1574272 opened: Juju agent uninstalls itself while adding machine <juju-core:New> <https://launchpad.net/bugs/1574272>
<mup> Bug #1095592 changed: jujud machine TestManageEnviron is very slow <intermittent-failure> <tech-debt> <test-failure> <juju-core:Fix Released> <https://launchpad.net/bugs/1095592>
<mup> Bug #1556171 changed: uniterV0Suite.SetUpTest connection reset <centos> <go1.5> <wily> <juju-core:Fix Released> <https://launchpad.net/bugs/1556171>
<mup> Bug #1559730 changed: TestMongoErrorNoCommonSpace timed out <intermittent-failure> <network> <unit-tests> <juju-core:Fix Released> <https://launchpad.net/bugs/1559730>
<mup> Bug #1457148 opened: TestOpenStateFails fails <ci> <intermittent-failure> <test-failure> <juju-core:Incomplete> <juju-core 1.25:Triaged> <https://launchpad.net/bugs/1457148>
<mup> Bug #1460685 opened: TestUniterSubordinates fails <ci> <intermittent-failure> <test-failure> <juju-core:Incomplete> <https://launchpad.net/bugs/1460685>
<mup> Bug # changed: 1458741, 1461561, 1463661, 1465157, 1467362, 1494734, 1496161
<mup> Bug # changed: 1218997, 1372437, 1384275, 1391046, 1413067, 1427501, 1455490, 1458588, 1539116, 1543235, 1545050, 1552804, 1556157, 1560757
<mup> Bug #1433589 opened: failed to initialize mongo admin user: cannot set admin password: Closed explicitly <bootstrap> <intermittent-failure> <mongodb> <juju-core:Triaged> <juju-core 1.24:Won't Fix> <https://launchpad.net/bugs/1433589>
<mup> Bug # changed: 1363002, 1439112, 1494729, 1542520
<mup> Bug # changed: 1064263, 1227931, 1230306, 1233497, 1299580, 1304530, 1316709, 1329154, 1331025, 1366627, 1367651, 1383116, 1387388, 1403151, 1426118, 1439813, 1444353, 1451487, 1453644, 1470526, 1478051, 1492058, 1492088
<alexisb> anyone on the core team around?
#juju-dev 2017-04-17
<babbageclunk> wallyworld: ping?
<wallyworld> hey
<babbageclunk> wallyworld: how was Easter? Eat lots of eggs?
<wallyworld> babbageclunk: sadly no, but i am going to the shop to get some soon as i need a chocolate fix
<wallyworld> i was too naughty for the easter bunny
<babbageclunk> wallyworld: discount eggs are the best
<babbageclunk> wallyworld: so my test for the IP address changing was weird - the remote firewaller change was doing the right thing, but it looks like the firewaller is hanging when it tries to actually open the ports.
<babbageclunk> :(
<babbageclunk> Chasing it now.
<wallyworld> babbageclunk: oh :-( was that gce?
<babbageclunk> wallyworld: yup yup
<wallyworld> babbageclunk: i'll be back soon with chocolate, so if you get stuck, i can help then
<babbageclunk> wallyworld: cool thanks
#juju-dev 2017-04-18
 * babbageclunk goes for a walk
<axw> jam: FYI, https://azure.microsoft.com/en-au/updates/ga-multiple-ips-per-nic/ & https://azure.microsoft.com/en-au/updates/dual-nic-support/
<rick_h> axw: oooooh
<axw> rick_h: :)
<axw> rick_h: also, did you see there's SR-IOV support, so you can have moar faster mellanox networks
<axw> rick_h: still in preview tho
<rick_h> axw: no, I didn't. I get the summary emails from GCE and AWS but not from Azure. I need to find that I guess.
<axw> rick_h: I'm not subscribed, I just stumbled across it. not sure where to subscribe to these sorts of things
<rick_h> axw: https://cloud.google.com/newsletter/ and https://aws.amazon.com/new/ (via rss) are the things I follow I think
<axw> rick_h: thanks. i'm on the google one already, prob should read aws too
 * rick_h tries https://azure.microsoft.com/en-us/community/newsletter/subscribe/ to see if that'll send news
<babbageclunk> wallyworld, axw - I'm assuming we don't rely on the name of the firewalls created by GCE to be the hash of the rules, it's just how we make them unique. Does that sound right?
<wallyworld> babbageclunk: that's my understanding, but to change the algorithm would require upgrade steps
<wallyworld> maybe
<wallyworld> or at least compatibility code
<axw> babbageclunk: no idea sorry
<axw> that's what I would expect though
<babbageclunk> wallyworld: I'm not changing the way the name's generated, just not updating it when we change the rules it contains (since it looks like that's not supported). The code finds firewalls by rules, not by relying on the name, so I don't *think* we'll need any upgrade steps.
<wallyworld> that sounds ok
<babbageclunk> wallyworld: easiest review ever: https://github.com/juju/juju/pull/7246
<wallyworld> ok
<babbageclunk> I mean, I guess I have seen 1-line PRs, so it could technically be easier
<wallyworld> babbageclunk: sorry, was just otp. don't we need to rename the rule with the new hash so it can be found next time?
<wallyworld> or so that the name matches the hash
<babbageclunk> wallyworld: I don't think so - we don't find it by name, otherwise we wouldn't have been able to find existingFirewallName. You can see above in OpenPorts that it finds it by CIDR and/or port.
<wallyworld> babbageclunk: ok, something is bothering me about having the name start out as a hash of the ports and then not being the hash. what happens if a new sec group is created with the old rules, there will be a name clash?
<babbageclunk> wallyworld: hmm, I see what you mean.
<babbageclunk> wallyworld: so in that case maybe we need to delete the firewall and add a new one? Although I guess that could kill connections. :(
<wallyworld> babbageclunk: could do. can we rename a firewall?
<babbageclunk> wallyworld: oh right - maybe as a separate call from updating its rules.
<babbageclunk> wallyworld: reading some docs
<wallyworld> ok
<babbageclunk> wallyworld: I can't find any way to rename one.
<wallyworld> babbageclunk: create new one and then delete old?
<wallyworld> babbageclunk: or maybe just use a uuid type thing to generate a unique name up front and not tie to rules hash
<babbageclunk> wallyworld: yeah, I was just about to say that
<babbageclunk> wallyworld: ugh, using a random number requires stitching errors through and making the tests handle nondeterministic results. Do you think I should do the add and then delete thing instead? Feels a bit more fragile though
<wallyworld> babbageclunk: you can mock out the name generation for the tests using a func that doesn't error
<wallyworld> and that returns a deterministic result
<babbageclunk> wallyworld: yeah, but there are are higher-level tests that also (incidentally) check the name because they're looking at calls.
<babbageclunk> wallyworld: oh, you mean reassign the function in export_test?
<wallyworld> something like that, or pass in as a dep, so the test can control the name generated
<babbageclunk> wallyworld: yeah, I was passing it in as a dep (since I've seen people object to the rebinding), but it means that I need to pass it down multiple levels
<babbageclunk> wallyworld: tempted to make the tests not look at the random bit of the name.
<wallyworld> babbageclunk: sgtm
<axw> jam: teeny review please? https://github.com/juju/gomaasapi/pull/69
<jam> axw: I assume that information was always in the API response, even for older versions of MAAS?
<axw> jam: yeah, and we use it when connecting to MAAS 1.x
<jam> sgtm
<wallyworld> axw: no rush, if you get a chance at some point https://github.com/juju/juju/pull/7239
<axw> wallyworld: sure. pool leak dude is here, bbs
<wallyworld> ouch, hope it's not too bad
<axw> wallyworld: pool has been losing water for months, just recently got to 3-4cm each day. my regular pool guy was hopeless, had a few shots at finding the leak
<axw> wallyworld: so called up a specialist and he found it in 30 minutes...
<axw> wallyworld: haven't seen the invoice yet, hopefully that's not too special
<wpk> $1 for hitting it with a hammer, $999 for knowing where to hit
<axw> wpk: yeah, pretty much. except the guy who didn't solve the problem still charged $1000 (figuratively, not really that much)
<wallyworld> axw: was it in a connecting pipe? or the pool itself?
<anastasiamac> wallyworld: wow \o/ m just curious if it's fixed :D regardless of where it is.. altho the location would probably determine the price of work..
<axw> wallyworld anastasiamac: bottom of the pool. he swam down and filled the crack with some type of glue/filler
<anastasiamac> axw: niiice :) how old is the pool? some builders have extended warranties...
<axw> anastasiamac: good question/idea. I don't know - I'll check later
 * babbageclunk is popping out, back in a bit
#juju-dev 2017-04-19
<wallyworld> axw: quick one? https://github.com/juju/juju/pull/7249
<axw> wallyworld: looking
<axw> wallyworld hml: shouldn't we have v2-unstable while we work out the breaking API changes we want to make?
<axw> on goose
<wallyworld> axw: i'm sorta ambivalent about that - pita to change the imports again, plus we are really the only ones using it
<axw> wallyworld: true enough
<axw> ok
<wallyworld> axw: and we *still* have charm.v6-unstable et al in core :-/
<axw> indeed
<hml> axw: I did add a note to the readme.md that the branch was experimental for now and to use v1 for stable.  perhaps unstable would have been better phrasing
<axw> hml: okey dokey
<hml> axw: just have to remember to change the readme.md at some point.  :-)
<axw> hml: on other repos we use -unstable suffix to indicate that the API may break, then get rid of the suffix when we're happy with it
<axw> hml: but since nobody else is using this, it should be fine
<axw> just so long as it doesn't go on for too long
<hml> axw: agreed
<wallyworld> babbageclunk: i know you're having fun with GCE, but here's a review for when you need to give your brain a rest https://github.com/juju/juju/pull/7251
<babbageclunk> wallyworld: looking
<wallyworld> yay, ty
<wallyworld> sorry
<thumper> we doing the tech board now?
<jam> thumper: I'm going to show up
<axw> thumper wallyworld jam menn0: I need to eat, so won't be coming
<menn0> thumper, jam: my brain feels full but I'm ok to join
<jam> menn0: we're in there, but we'll try to keep it lighter
<babbageclunk> wallyworld: lgtm'd
<wallyworld> yay, ty
<babbageclunk> wallyworld: ping? Or are you in a call?
<wallyworld> in a call, finished soon
<wallyworld> babbageclunk: hey
<babbageclunk> wallyworld: hey, sorry - are IngressRules being sorted somewhere before being hashed to form a key?
<babbageclunk> wallyworld: I couldn't see it, but it seems like that would be needed.
<wallyworld> for gce provider? not sure tbh. it's been ages since i looked at that code
<wallyworld> i think they are yes
<wallyworld> probs in the firewaller
<wallyworld> i'm sure there's a bespoke sort function
<wallyworld> babbageclunk: yep. network.SortIngressRules
<babbageclunk> wallyworld: I mean the cidrs within the rukes
<babbageclunk> rukes
<babbageclunk> gah
<babbageclunk> rules
<wallyworld> oh, not sure
<wallyworld> other providers don't need that sorting i don't think
<wallyworld> so it would be gce specific if it were there
<babbageclunk> wallyworld: Ok, I'll check again and add it if I don't find it. (I just mean so that if there's one rule for tcp:80 from (1.2.3.0/24 and 2.3.4.0/24) and another for tcp:3306 from (2.3.4.0/24 and 1.2.3.0/24) they get combined correctly. I'll add a test for it anyway.)
<wallyworld> babbageclunk: yeah, that scenario should be handled, i thought it was. the code to do that is indeed in the gce section somewhere. i recall seeing it
<wallyworld> but add a test for sure if there's not coverage
<wallyworld> there's also a bunch of tests
<wallyworld> for various scenarios
<axw> wallyworld: something dodgy about max-status-history-age, see http://juju-ci.vapour.ws/job/github-check-merge-juju/781/artifact/artifacts/trusty.log/*view*/
<wallyworld> axw: i fixed that failure locally and once tests passed, pushed before landing
<wallyworld> how did that get through i wonder
<wallyworld> i'll look, see if latest landing still has it
<rogpeppe> is anyone around that might be able to give this a review, please? It's been waiting for 9 days now. https://github.com/juju/juju/pull/7222
<rogpeppe> wallyworld, axw: ^
<axw> rogpeppe: not right now sorry, but will tomorrow morning if nobody else gets to it first
<rogpeppe> axw: do you know if there are any juju-core devs in non-antipodean timezones any more now?
<axw> rogpeppe: jam, hml, and externalreality
<rogpeppe> axw: ok, that's good to know, thanks
<menn0> rogpeppe: reviewed +1
 * menn0 isn't actually working right now but saw rog's plea :)
<menn0> rogpeppe: in future, email me if you're having trouble getting a review
<rogpeppe> menn0: ❤
<rogpeppe> menn0: i've been away for a week
<menn0> rogpeppe: ah right
<rogpeppe> menn0: so it wasn't too much of an issue
<menn0> rogpeppe: I had started to look at that PR last week but must have gotten distracted
<rogpeppe> menn0: what's the magic string to get a test CI run on a PR, BTW?
<menn0> rogpeppe: $$merge$$ will run all the tests, !!build!! I believe to request a pre-merge check of the merge tests
<menn0> rogpeppe: that's supposed to happen automatically though
<menn0> at first submission at least
<rogpeppe> menn0: ah, it didn't seem to on the above PR for one
<menn0> rogpeppe: it seems unreliable (or maybe user specific?)
<rogpeppe> menn0: i think i prefer it to be explicit anyway
<rogpeppe> menn0: like the golang.org one
<bdx_> @team, https://bugs.launchpad.net/juju/+bug/1684143
<meetingology> bdx_: Error: "team," is not a valid command.
<mup> Bug #1684143: applications deployed to lxd on aws instances failing <juju:New> <https://launchpad.net/bugs/1684143>
<rogpeppe> ongoing juju command-level mocking simplifications https://github.com/juju/juju/pull/7254
<rogpeppe> anyone around for a review of this? ^
<lazyPower> wallyworld: rogpeppe - question. If i'm having an issue with resource-get, and getting zero output on the controller, are just the "how to reproduce" instructions valid enough?
<lazyPower> i don't want to file a bug that won't help anyone, but at the same time i cannot deploy any of the older k8s bundles to test an upgrade path scenario as resource-get just hangs indefinitely.
<rogpeppe> lazyPower: if it hangs indefinitely, sending SIGQUIT (or doing ctrl-\) can be useful to show where it's hung
<lazyPower> i just filed https://bugs.launchpad.net/juju/+bug/1684242 -- will redeploy and give that a go
<mup> Bug #1684242: resource-get hangs indefinitely on older k8s bundles <juju:New> <https://launchpad.net/bugs/1684242>
<lazyPower> updated, thanks for the detail rogpeppe
<wallyworld> babbageclunk: you find the GCE code to handle the CIDR/port aggregation?
<babbageclunk> wallyworld: yeah, but I can't see it doing any sorting of CIDRs
<babbageclunk> wallyworld: got time for a hangout?
<wallyworld> ok, seems like something that needs to be fixed i guess
<wallyworld> sure
<wallyworld> standup
#juju-dev 2017-04-20
<axw> wallyworld: I have a few fixes for vsphere in https://github.com/juju/juju/pull/7256, I'd appreciate a review at some point so I can land it for the beta
<cmars> anyone up for a fairly mechanical review? https://github.com/juju/juju/pull/7253
<cmars> pm me if you have any questions
<mup> Bug #1682827 changed: Bootstrap on OpenStack Cloud fails <juju:New> <https://launchpad.net/bugs/1682827>
<axw> wallyworld: so you ok with me landing that MAAS PR then?
<wallyworld> yeah
<wallyworld> we can followup over the next few days and try and get it validated
<cmars> axw, thanks for the review
<axw> cmars: np
<wallyworld> axw: the vsphere pr is reviewed
<axw> wallyworld: thanks
<wallyworld> hopefully that will be the last change for a while... :-)
<axw> wallyworld: I'm going to leave it at 2 sec to match what the govc tool does, OK?
<wallyworld> axw: sure, np. was just curious
<axw> wallyworld: no worries. thanks
<jam> axw: it doesn't seem right to disable the default cpu/cores stuff
<jam> axw: as you can supply '--constraints' if you really are constrained
<jam> the out-of-the-box result should be something that we feel is reasonable for a controller
<jam> I know we've had problems with <1GB mem in the past
<axw> jam: we already have controller-specific constraints in the provider/common package
<jam> axw: wouldn't that also be expected for any charm?
<jam> default cpu cores = 2 might be a bit much
<jam> maybe 1 core 1GB ?
<jam> but removing it entirely seems like you'll just run into "all these charms just don't seem to work"
<axw> jam: AFAIK, no other provider overrides CPU/mem in the provider code
<axw> jam: hmm, maybe the 1GB limit is in the provider, mixed in with image selection. seeing it in the azure provider now. I can put that in.
<jam> axw: I can't say as to where it exists, but some sort of sane "minimum default" seems useful.
<axw> jam: it's different because it'll vary by site. but yes, we should - I thought it was handled outside the provider, but looks like I was wrong
<jam> axw: could we just handle the "this is already off" error instead of pre-checking?
<jam> not sure if they give clear errors
<axw> jam: maybe, not sure if the destroy task will still be actioned if that happens. I'll look at optimising it later
<jam> axw: not a big deal, pre-check is always a little bit racy
<icey> juju is once again not setting JUJU_AVAILABILITY_ZONE in all cases: https://bugs.launchpad.net/juju/+bug/1684325
<mup> Bug #1684325: customize-failure-domain has no effect when ceph-mon is deployed in a container <juju:New> <https://launchpad.net/bugs/1684325>
<axw> jam: turns out there's a default of 1GB mem in the OVF anyway, and then vcenter may allocate more. since it's defined in the images and they could change, we should still check - but I'm going to defer until after my current work
<jam> axw: k
<wallyworld> jam: axw: i checked the code - the controller constraints are processed at bootstrap in withDefaultControllerConstraints() in environs/bootstrap. We honor any constraints the user may have passed in though. why do we think vSphere should be special and get to set its own constraints that sometimes fail?
<jam> wallyworld: juju should be defining default constraints for a controller, it doesn't have to come from vsphere
<wallyworld> right, that's what we do
<wallyworld> i was confused as to your pushback on the PR
<wallyworld> all that was done was special vSphere constraints were removed and we rely on the controller constraints already in place
<axw> wallyworld: the one piece I don't think we do in one location, is set default constraints for non-controller machines
<axw> wallyworld: e.g. setting 1GB mem. I think we might do that when selecting instance types, but that doesn't apply to vSphere
<wallyworld> right, there aren't any
<wallyworld> hmm, ok there could be a mem constraint
<wallyworld> not sure
<axw> wallyworld: anyway, as I said above the OVF says by default that it needs 1GB mem, 2 vCPUs
<axw> doesn't specify Hz
<wallyworld> so we are ok then
<axw> wallyworld: yes, but I think we should check the minimum memory, in case the images change, or images for other series/OS come along
<wallyworld> if that's what we do for other providers, sgtm
<axw> wallyworld: doesn't need to be done as a constraint, just check the ImportSpec that gets generated from the OVF
<axw> I'll do it later, working on higher priority things atm
<wallyworld> yep
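The bootstrap-time defaulting wallyworld describes can be sketched roughly like this. This is a simplified stand-in for the real withDefaultControllerConstraints in environs/bootstrap; the types and the default value here are illustrative assumptions only:

```go
package main

import "fmt"

// Constraints is a cut-down stand-in for juju's constraints.Value,
// just enough to sketch the merge behaviour discussed above.
type Constraints struct {
	Mem      *uint64 // MiB
	CpuCores *uint64
}

// withDefaultControllerConstraints mirrors the idea described in the
// conversation: defaults are filled in only when the user supplied no
// value, so explicit --constraints always win.
func withDefaultControllerConstraints(cons Constraints) Constraints {
	if cons.Mem == nil {
		defaultMem := uint64(3584) // assumed default, illustrative only
		cons.Mem = &defaultMem
	}
	return cons
}

func main() {
	userMem := uint64(8192)
	merged := withDefaultControllerConstraints(Constraints{Mem: &userMem})
	fmt.Println(*merged.Mem) // user value preserved
	merged = withDefaultControllerConstraints(Constraints{})
	fmt.Println(*merged.Mem) // default applied
}
```

This is also why removing the vSphere-specific constraints is safe: the generic defaulting still applies to every provider at bootstrap.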
<wallyworld> axw: no rush since you have stuff happening; if you get a chance, but can wait till tomorrow, i have soccer in a bit anyway https://github.com/juju/juju/pull/7258
<axw> wallyworld: ok, probably a bit later - I keep context switching and losing where I'm at :)
<wallyworld> yeah, no worries, leave it go, just wanted to let you know it was there for later
<wallyworld> tomorrow is fine
<wallyworld> the gui doesn't even use it yet
<rogpeppe> axw, wallyworld, jam: any chance of a review of https://github.com/juju/juju/pull/7254?
<wallyworld> rogpeppe: i'm just heading out to soccer but will be back later this evening
<rogpeppe> wallyworld: k
<mup> Bug #1680582 changed: Agents losing connection to leader tracker <canonical-bootstack> <juju-core:Won't Fix> <https://launchpad.net/bugs/1680582>
<mup> Bug #1680582 opened: Agents losing connection to leader tracker <canonical-bootstack> <juju-core:Won't Fix> <https://launchpad.net/bugs/1680582>
<mup> Bug #1680582 changed: Agents losing connection to leader tracker <canonical-bootstack> <juju-core:Won't Fix> <https://launchpad.net/bugs/1680582>
<rogpeppe> here's a small addition to juju/utils, in case anyone cares to review it: https://github.com/juju/utils/pull/273
<jam> rogpeppe: I made some comments on it as well
<rogpeppe> jam: thanks
<rogpeppe> jam: except you're too late :)
<jam> rogpeppe: its never too late :)
<rogpeppe> jam: well, it's merging already
<jam> rogpeppe: sure, but you can always land a follow up
<rogpeppe> jam: i'm not convinced by the "no panic ever" thing.
<rogpeppe> jam: lots of our code can panic
<jam> rogpeppe: we've been bit several times with panic() in production code, and adding more doesn't make it better
<rogpeppe> jam: this is something that's easy to verify at the call site - like a number of panics that are in the go stdlib
<jam> rogpeppe: well, I'd guess that not all callers are going to be passing in consts
<rogpeppe> jam: more-or-less anything that takes a pointer as an argument can panic
<rogpeppe> jam: in practice, all callers are passing in a const for the "var=" part
<rogpeppe> jam: in this case, i think i prefer the panic to making it return an error
<rogpeppe> jam: FWIW i just used the existing tests from juju/environs/tools/build.go
<rogpeppe> jam: the implementation there didn't panic when not passed a "=" - it just silently got it wrong
<rogpeppe> jam: i guess it could just silently ignore a badly formatted argument
<jam> rogpeppe: if it is only ever used with const values, then I'm ~ok with a panic() but it feels like the type of helper that ends up folded into code that gets called from iterating over an array of env vars someone was asked to set
<rogpeppe> jam: ok, i'll make it silently ignore the bad value. i'd prefer that to adding an error return, because it's very convenient to use it inline currently.
<jam> thx
<rogpeppe> jam: these are the only calls for the moment, BTW: http://paste.ubuntu.com/24419916/
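A helper along the lines rogpeppe describes — applying "var=value" entries to an environment slice and silently skipping malformed ones instead of panicking — might look like this. The name and exact behaviour are assumptions for illustration, not the actual juju/utils code:

```go
package main

import (
	"fmt"
	"strings"
)

// setEnviron returns a copy of env with each "KEY=value" entry in
// updates applied, replacing an existing KEY or appending a new one.
// Entries with no "=" are silently ignored rather than panicking or
// returning an error, so the function stays convenient to use inline.
func setEnviron(env []string, updates ...string) []string {
	result := append([]string(nil), env...)
outer:
	for _, kv := range updates {
		eq := strings.Index(kv, "=")
		if eq <= 0 {
			// Badly formatted entry: ignore it silently.
			continue
		}
		prefix := kv[:eq+1]
		for i, old := range result {
			if strings.HasPrefix(old, prefix) {
				result[i] = kv
				continue outer
			}
		}
		result = append(result, kv)
	}
	return result
}

func main() {
	env := []string{"PATH=/usr/bin", "HOME=/home/ubuntu"}
	env = setEnviron(env, "PATH=/usr/local/bin", "GOPATH=/home/ubuntu/go", "broken")
	fmt.Println(strings.Join(env, " "))
}
```

Silently ignoring a bad entry trades an early, loud failure for inline convenience — the point jam is pushing back on, since callers iterating over user-supplied pairs would never notice the dropped entry.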
<rogpeppe> jam: thanks for the review of https://github.com/juju/juju/pull/7254
<rogpeppe> jam: here you go: https://github.com/juju/utils/pull/274
<jam> rogpeppe: lgtm
<rogpeppe> jam: thanks
<rogpeppe> jam: wanna approve it on the PR?
<jam> I did
<jam> well, I hit approve, apparently not submit review
<jam> rogpeppe: done
<rogpeppe> jam: thanks
<wpk> jam: what's the PR you wanted to get reviewed?
<jam> https://github.com/juju/juju/pull/7260
<zeestrat> Is there a way to use dynamic variables such as env variables when deploying bundles with the native juju deploy in 2.x?
<wpk> is there a doc on building a juju deb?
<rogpeppe> anyone have a clue what failed here? http://juju-ci.vapour.ws:8080/job/github-merge-juju-utils/178/console
<wpk> glitch in matrix
<wpk> watch out for people in black coats
<rogpeppe> wpk: :)
<rogpeppe> here's a small change to make the cmd tests pass under Go tip: https://github.com/juju/juju/pull/7261
<rogpeppe>  i'm looking for a second review of this if poss, please: https://github.com/juju/juju/pull/7261
<thumper> morning
#juju-dev 2017-04-21
<axw> wallyworld: I've just left a comment on your PR about a major issue, but still going through it
<wallyworld> ok, ty
<wallyworld> fair point
<wallyworld> veebers: that unit test that failed on CI works fine for me - no interactive auth failures
<wallyworld> i've got an up-to-date zesty
<wallyworld> axw: part of the issue is that passing in collections to watch from the apiserver facade breaks layering elsewhere. the collections are not exposed at that layer. i'll have to try and think of something else
<axw> wallyworld: you don't need to specify collections, just concepts
<axw> wallyworld: we still talk about application offers above the state layer do we not?
<axw> wallyworld: I'm not suggesting you pass in the collection names, but rather pass in a struct/bool saying whether or not to include "application offers"
<axw> wallyworld: and that should be decided at the apiserver, where we normally do the permission checking
<wallyworld> we do. it comes down to how to map that. does the struct have a single bool - watch everything except offers. or do all the things to watch need to be enumerated etc. i liked the perm approach because it was sort of role based
<wallyworld> i could start out with a bool i guess
<axw> wallyworld: my point is that we don't do role checking in the state layer. *that* is a layer violation
<wallyworld> that is true
<axw> wallyworld: IMO, a struct with a single bool just for application offers. everything else is on by default and can't be controlled, until/unless we need to do that
<wallyworld> but there's no easy way to map the concept of a role to what to do
<wallyworld> that works for now
<veebers> wallyworld: let me dig around, I'll have more questions for you in a bit :-)
<wallyworld> ok
<menn0> thumper: ping?
<thumper> pong
<menn0> thumper: actually, switching to onyx channel
<wallyworld> veebers: will be there soon, just in another meeting
<veebers> wallyworld: ack
<wallyworld> burton-aus: test!
<burton-aus> wallyworld Got it.
<thumper> hmm...
<thumper> if I have an api connection to a controller...
<thumper> how do I download some tools...
<thumper> ugh...
 * thumper will look Monday
<thumper> have a good weekend everyone
<wallyworld> jam: unless i am wrong, it looks like our bot runs unit tests with mongo 2.4.9
<wallyworld> it seems i am getting a landing error due to a 2.4.9 vs 3.2 thing. i need to re-test locally with 2.4.9 to be sure
<wallyworld> did you think we were still using 2.4.9?
<wallyworld> but i can't get the tests to fail locally (incl with --race). but the bot is not happy with some internal mongo error (or so it seems)
<jam> wallyworld: yes, everything on trusty uses 2.4
<jam> wallyworld: so you have to support both
<jam> wallyworld: as in, juju bootstrap --series=trusty *also* uses 2.4 (from what I understood)
<rogpeppe> a small fix to juju/utils to help it pass juju tests: https://github.com/juju/utils/pull/275
<rogpeppe> wpk: you might want to take a look at this ^
<anastasiamac> rogpeppe: lgtm
<rogpeppe> anastasiamac: ta!
<rogpeppe> anastasiamac: hi, BTW :)
<anastasiamac> rogpeppe: thnx for following it up :) hi \o/
<rogpeppe> anastasiamac: only trying to fix tests in juju-core that were broken when i updated utils...
<SimonKLB> i guess it is by design that you're not able to access relation data from "relation B" while youre in the context of "relation A" - but does anyone here have a viable solution to get around that?
<SimonKLB> in this case both relation A and B is related on the same interface (if that matters at all)
<SimonKLB> say you have 3 different data-sources and you want to grab the connection information of all 3 when configuring your charmed application
<jrwren> SimonKLB: AFAIK you can always access any relation data. What are you seeing which suggests that you cannot?
<SimonKLB> "error: permission denied"
<jrwren> from which command?
<SimonKLB> relation-get -r [id]
<jrwren> that is very surprising. I've never seen that. Maybe the relation does not actually exist?
<jrwren> does it show in relation-list -r [id] ?
<jrwren> is the [id] used one that is related to the unit for which the hook is running? does it show in relation-ids?
<SimonKLB> jrwren: yes i grabbed it from relation-ids
<SimonKLB> relation-list -r [id] works fine
<SimonKLB> shows the correct charm/unit-no
<SimonKLB> jrwren: http://paste.ubuntu.com/24426504/
<jrwren> that has got to be a bug or something very strange going on. It worked with one id but not another. What version of juju is this?
<jrwren> why does your relation-get not require a unit id?
<SimonKLB> jrwren: could it be that it gets it from an env?
<SimonKLB> juju version: 2.2-beta2-xenial-amd64
<jrwren> SimonKLB: not afaik, but this is the edge of my juju knowledge.
<SimonKLB> echo $JUJU_REMOTE_UNIT
<SimonKLB> charmscaler-metric-cpu/0
<SimonKLB> that is probably it
<wpk> rogpeppe: ok, I'll fix the PR for juju/juju change
<jrwren> ah. I see. interesting.
<wpk> rogpeppe: it didn't feel right tbh
<rogpeppe> wpk: i've changed it
<rogpeppe> wpk: another change was needed in juju-core because of the ordering change - that's landing in https://github.com/juju/juju/pull/7254
<jrwren> SimonKLB: yup, beyond my knowledge.
<SimonKLB> that's ok! anyone else?
<jrwren> SimonKLB: one last thought, is it a container-scoped relation?
<SimonKLB> jrwren: nope, global
<jrwren> SimonKLB: ok, thanks. It shall be interesting to learn of the resolution.
<SimonKLB> me too :D
<wpk> rogpeppe: I mean for https://github.com/juju/juju/pull/7162
<wpk> rogpeppe: I'm waiting with this merge as it can really break stuff
<wpk> rogpeppe: (as it makes proxy settings truly global, at least on Xenial)
<rogpeppe> wpk: i just made a comment on that PR
<wpk> rogpeppe: thanks
<bdx> I'm wondering how/why machine spaces constraints are all of a sudden being passed through to the lxd containers on them ..... https://bugs.launchpad.net/juju/+bug/1684143/comments/10
<mup> Bug #1684143: applications deployed to lxd on aws instances failing <juju:New> <https://launchpad.net/bugs/1684143>
<bdx> I hadn't experienced this up until this week .... all previous lxd deploys to instances in the past had never experienced this issue
<bdx> I have lxd on top of aws instance deploys that I sent off all last week, and for all of time prior to hitting this this week
<balloons> hml, externalreality, someone want to have a glance at https://github.com/juju/juju/pull/7257?
<mup> Bug #1685382 opened: max_user_instances in inotify runs low on juju-db units - version 1.25.6 <canonical-bootstack> <juju-core:New> <https://launchpad.net/bugs/1685382>
#juju-dev 2017-04-22
<bdx> machine constraints passed through to containers ..... is this new in 2.1?
#juju-dev 2017-04-23
<rick_h> bdx: there was new stuff in 2.1 I believe to have containers respect constraints. e.g. you could limit a lxd container to 2gb of ram
<rick_h> bdx: and it'll use the lxd tools to enforce that
<bdx> rick_h: yeah, but also inherit the constraints from the host?
<bdx> :(
<bdx> rick_h: which won't work with spaces for any provider other than MAAS
<wallyworld> babbageclunk: the 1:1 meeting got messed up - i've re-added it to the calendar if you're free
<babbageclunk> wallyworld: oh yeah - let's do it!
<wallyworld> babbageclunk: something's come up - just have to pop out for a bit, so if you ping me for a review and I don't answer straight away, it's not that i'm ignoring you....
<babbageclunk> wallyworld: ok. I mean, that's what you'd say if you were ignoring me tho. :)
<wallyworld> yep :-)
<bdx> someone make my juju deployed lxd applications not inherit their hosts spaces constraints by default on aws plssss
<thumper> wallyworld: hey
<thumper> wallyworld: if you are in the all watcher code, are you able to address the current critical bug there?
<thumper> well... poo
 * thumper goes to dig
<babbageclunk> wallyworld: my computer died, but all better now - take a look at this? https://github.com/juju/juju/pull/7246
#juju-dev 2018-04-16
<veebers> Can I get a review on https://github.com/juju/juju/pull/8599 please :-)
<wallyworld> looking
<wallyworld> veebers: lgtm
<wallyworld> kelvinliu: don't forget to $$merge$$ your PR once approved
<veebers> wallyworld: cheers
<kelvinliu_> yup, i am doing it now, thx Ian
<thumper> anastasiamac: I just realised I have a clash around our 1:1, I have a physio appt
<thumper> anastasiamac: can we do it an hour later?
<anastasiamac> thumper: sounds good
<thumper> thanks
<wallyworld> kelvinliu: i am seeing a hook execution error in the mysql charm when adding a relation to gitlab. it's related to some python executed by the reactive framework and looks like an incorrect file path is being passed in. so something has changed which needs to be fixed
<wallyworld> kelvinliu: this is the log from the mysql unit pod. you'll see the error in there https://pastebin.ubuntu.com/p/zmgH3Hxkvh/
<wallyworld> would be good to know if you're seeing the same issue
<anastasiamac> an easy review PTAL, adding bionic to utils - https://github.com/juju/utils/pull/299
<kelvinliu_> i just recreated the k8s core controllers and models, but an install hook failed on a worker node and did not retry. So I deleted the whole model and controller and am doing it again. the cluster is stabilizing now
<wallyworld> kelvinliu: it's coming from this reactive helper def juju_version():
<wallyworld> the path for jujud is different
<wallyworld> so i need to make an upstream change to account for that
<wallyworld> for caas agents, it's /var/lib/juju/tools
<wallyworld> not /var/lib/juju/tools/machine-*
<wallyworld> anastasiamac: lgtm, ty
<anastasiamac> wallyworld: \o/ phenomenal! thnx
<kelvinliu_> u mean in charms.reactive ?
<wallyworld> kelvinliu: yeah, in the core package
 * thumper afk for a bit
<kelvinliu_> https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/hookenv.py#L1042
<kelvinliu_> i can work on this
<wallyworld> kelvinliu: that's upstream. the best way to test I think is to add a replacement function to the hookenv.py file in the reactive base layer for caas. that function in there *should* be used in preference to the upstream core one. we can get things working and then publish changes to the caas base layer. it will take a bit longer to get stuff upstream
<kelvinliu_> sure, agreed
<vino> what is the clean way to get the controller destroyed ? i started the bootstrap but then i gave sigkill :(
<vino> destroy gives trouble
<wallyworld_> kill-controller
<wallyworld_> i normally do "juju kill-controller -t 0 -y <name>"
<vino> ok. thx. i didnt know this and was trying some other force option with destroy
<vino> which doesn't exist
<veebers> wallyworld_: d'oh, could have sworn I ran the whole suite, I updated https://github.com/juju/juju/pull/8599 with a unit test fix (I realise now I squashed it so it'll be harder to see the test changes :-\)
<anastasiamac> wallyworld_: i replied but u seemed to have bowed out of Canonical channels.... so dunno if u saw...
<wallyworld> vino: vpn messed with irc, did you see my reply above?
<vino> kill-controller ?
<vino> yes
<wallyworld> great
<vino> thx
<thumper> babbageclunk: got a few minutes to talk about the engine worker dependency issues?
<babbageclunk> thumper: yup yup! Would welcome more ideas
<babbageclunk> in 1:1?
<babbageclunk> thumper: ?
<babbageclunk> I guess he went away
<wallyworld> babbageclunk: 12 line PR?
<wallyworld> https://github.com/juju/juju/pull/8601
<wallyworld> kelvinliu: the above fixes the mysql/gitlab issue ^^^^^
<babbageclunk> wallyworld: with pleasure!
<wallyworld> yay!
<wallyworld> until we get upstream charmhelpers fixed
 * wallyworld afk for a meeting for a bit
 * thumper takes a deep breath
 * thumper gets back to tests for migrationmaster facade
<jam> wallyworld: thumper: /wave. I don't think I have anything pressing to bring up, though I'm curious how the meeting on Friday went.
<thumper> jam: which one?
<jam> btw, are we supposed to be switching to #juju?
<jam> thumper: mark
<thumper> yes
<thumper> jam: it went really well
<thumper> jam: I want to sort the mailing lists out first, and was going to announce irc changes when that was done
<jam> sure
<thumper> but got a bit caught up trying to finish something off
 * vino in induction prg
<wallyworld> vino: there's a go fmt error in the juju run pr that landed: cmd/juju/commands/run_test.go:95:30: missing ',' before newline in composite literal
<wallyworld> kelvinliu: i just pushed a new version of the operator image to dockerhub which adds a symlink to the jujud that the python charm helper code expects. we will still need to fix the python though
<wallyworld> but things will work again with this change
<wallyworld> we'll need to add the fixed python to the caas base layer next and once that's verified, push an upstream change
<vino> hi wallyworld
<vino> i am looking into it.
<wallyworld> hey
<wallyworld> thank you
<vino> not just that i found another issue as well.
<vino> juju run is broken because of the commit we made this morning it wont work without explicitly specifying a timeout value
<kelvinliu_> thx Ian.
<vino> wallyworld:  PR#8603 fix for run command
<wallyworld> looking
<vino> thank u
<wallyworld> vino: lgtm
<admcleod_> anyone here have experience with a mixed provider model, e.g. MAAS + manually provisioned machines?
<rick_h_> admcleod_: what are you looking for?
<admcleod_> someone that has experience with that
<rick_h_> admcleod_: in what way? I mean I know several folks have used it for certain things
<admcleod_> i have a controller in maas, ive added an s390x lpar (it has to be a manually provisioned machine)
<rick_h_> k
<admcleod_> and im having issues with deploying bundles to it. im in the process of trying different scenarios to create a bug
<admcleod_> but someone may know something very explicit already
<rick_h_> this hit the mailing list right? /me thought he saw it earlier today
<admcleod_> i didnt post it there
<rick_h_> ah sorry, the main juju channel
<admcleod_> yep :)
<rick_h_> admcleod_: so if you add a machine for the non-s390x and then use the --map-machines where 0=0,1=1 (assuming 0 is the lpar per the other convo) that fails to work?
<rick_h_> hmm, one thing I've not tested out is if the map-machines will allow you to only map some of the machines in the bundle to existing numbers in the model
<admcleod_> well. im using --map-machines=existing, because if its ALL manual provider, that works fine
<admcleod_> but what im seeing so far is that it wont use the existing machine
<admcleod_> i have 12 different scenarios that im trying out now to add to a bug, e.g., bundle with/without machines, with/without constraints, with/without map machines
<rick_h_> right, but if you're map machine existing but you only have one machine manually added I'm not sure how that's supposed to work right
<rick_h_> constraints are ignored if a placement directive is in play
<rick_h_> you've overriden any of that
<rick_h_> and map machines is just a placement directive is my understanding
<admcleod_> well, that may be what its supposed to do
<admcleod_> but im testing it regardless
<rick_h_> there's no sense saying "constrain to a machine with 4g of ram and put it on machine 2"
<rick_h_> k
<admcleod_> sure. that makes no sense.
<admcleod_> neither does 'map-machines=existing' .. oh ill ask for another machine from maas then
<rick_h_> right
<rick_h_> so I would expect that before the bundle deploy you'd have the existing machines setup, you'd map number by number (or they have to match the same number in the bundle to the number in the model) and it should work?
<rick_h_> but you're saying that it fails to work properly in that case?
<admcleod_> well, imagine a scenario where i want 3 machines for 3 magpie units (the real scenario is openstack with different hypervisor architectures)
<admcleod_> so i have to add the s390x machine before i deploy the bundle
<admcleod_> but its a maas controller for the amd64 nodes
<admcleod_> so i only manually add 1 machine
<admcleod_> then what i would expect is if i deploy the bundle with --map-machines=existing, it would use the manually provisioned machine and request 2 more from maas
<rick_h_> so I'd ask you to not use existing, but instead use a number match for that one machine and see if it works
<admcleod_> ill try that now
<admcleod_> but why should existing not work?
<admcleod_> does that imply 'all must exist or i will ignore you'?
<rick_h_> so I'm nervous about what "existing" does as it's assuming use the numbers of the machines in the bundles against the same numbers in the model
<thumper> morning
<rick_h_> this is what I mean in that I've not tested a partial deploy like that where only some machines are available in the model vs not
<admcleod_> ok, let me tell you what happens with the number...
<rick_h_> thumper: here can tell you the what why after that :)
<thumper> balloons: shall we use the QA sync for the 1:1 you can't make tomorrow?
<admcleod_> rick_h_: didnt work, pastebin coming
<rick_h_> admcleod_: kk
 * thumper awaits pastebin
<admcleod_> wait wrong bundle, have to do that again
<rick_h_> ah ok
<thumper> admcleod_: can you make sure we have the status showing machines, and the bundle?
<admcleod_> thumper: yep
<admcleod_> alright, so if i specify the machines in the bundle correctly, then both map-machines=existing and map-machines=#=# work ok. so thats ok
<rick_h_> admcleod_: ok cool, that makes me feel a little better
<admcleod_> so map-machines=existing explicitly requires machine definitions and placement directives?
<rick_h_> admcleod_: that's what I'm not sure what exactly existing does. I've not used it much tbh. If I map I go all mappy
<admcleod_> i was hoping i could specify a constraint, e.g. arch, and that would map to existing machines
<rick_h_> admcleod_: yea, but since mapping is a placement, constraints get ignored
<admcleod_> alright
<admcleod_> so then i cant use constraints with manually added machines?
<rick_h_> admcleod_: I don't think there's a place where that works out because you're back to saying "constraint this" but at the same time it's about matching machine numbers.
<admcleod_> well. if i was only using MAAS for machines, i could ignore defining machines explicitly, and use arch constraints
<rick_h_> admcleod_: right
<rick_h_> "let the provider deal with it"
<admcleod_> but i cant do that with a manually added machine
<veebers> Morning all
<rick_h_> but when you go with manual machines and mapping you've taken the provider out of the picture
<admcleod_> is that a 'feature' or a bug?
<admcleod_> well forget mapping then
<rick_h_> hmm, if you don't do any mapping and all the machine were added to the model (e.g. you weren't asking juju to provider for some things and not for others) I'm not sure tbh
<admcleod_> right so i guess "juju deploy" talks directly to the provider and doesnt check manually provisioned machines first
<rick_h_> right
<admcleod_> that is incredibly annoying
<thumper> admcleod_: existing effectively replaces 0=0,1=1,2=2...
<admcleod_> thumper: right
<thumper> explicit mapping overrides existing
<thumper> you can use explicit mapping with and without existing
<admcleod_> alright. so i can sort of work around my issue with mapping and machine specification and placement
<thumper> admcleod_: deploy of bundles does check the existing model first
<admcleod_> thumper: ah well it doesnt seem like it does
<thumper> no... it does
<thumper> I'd like a concrete example that shows it isn't working
<admcleod_> thumper: ok, without machine definitions, with a manually added machine, with constraints
<rick_h_> thumper: so if he manually adds a machine of a unique arch into the model and then deploys the bundle and expects juju to match the machine based on arch constraints juju will not pick up the existing machine
<rick_h_> thumper: because constraints = provider and the machine is not provider based (manaully added)
<thumper> juju doesn't do any of that
<thumper> use existing is *only* for numbered placement
<admcleod_> ok
<admcleod_> so forget map-machines
<admcleod_> how about just using constraints
<rick_h_> thumper: that's what we're saying. If you juju add-model; juju add-machine ssh.....(s390x); juju deploy openstack-bundle-with-one-s390x constraint machines;
<admcleod_> thats what im trying to prove
<rick_h_> it doesn't work that way
<thumper> no... it doesn't work that way
<thumper> at all
<thumper> juju deploying a single app with constraints will never choose an existing machine
<thumper> unless you use --to
<thumper> which points to a machine
<rick_h_> thumper: right, and then --to negates any constraint checks at that point.
<thumper> this is outside bundles even
<thumper> agreed
<admcleod_> ok so thats my answer then
<admcleod_> i cant use constraints
<rick_h_> thumper: that's what we're establishing with admcleod_, he was hoping that to make his bundle work his only change to the deploy was to add the machine that matched the constraints manually.
<admcleod_> and just before i bothered: https://pastebin.canonical.com/p/BYj5p3hfzT/
<admcleod_> right because we have some bundles that do rely on constraints to provision multiple architectures without using --to
<rick_h_> thanks for clarifying admcleod_
<thumper> yeah, that isn't going to work with the current implementation
<thumper> you can specify multiple architectures, but the provider must provide those
<admcleod_> thumper: alrighty
<thumper> can your maas also have a s390x pod?
<admcleod_> not right now
<admcleod_> ill use --map-machines and explicit machines
<admcleod_> it wouldve just been nice to be able to only use constraints
 * thumper nods
<thumper> I understand
<thumper> babbageclunk: ping
<admcleod_> i mean if juju knows the arch of the manually provided machine... would it be that hard to implement? or would that cause more problems than it would solve
<thumper> I think it would cause more problems than it would solve without a comprehensive review of behaviour
<admcleod_> right. i guess being explicit is the safest way
<balloons> Morning thumper
<admcleod_> alright thanks for not making me do that 24 times
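The --map-machines semantics thumper spells out above — "existing" is shorthand for the identity mapping 0=0,1=1,2=2,..., with explicit bundle=model pairs taking precedence — can be sketched as follows. This is an illustrative parser, not juju's actual implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// parseMapMachines resolves a --map-machines spec against the machines
// declared in a bundle. "existing" maps every bundle machine N to model
// machine N; explicit "bundle=model" entries override that identity
// mapping regardless of the order they appear in.
func parseMapMachines(spec string, bundleMachines []string) (map[string]string, error) {
	mapping := make(map[string]string)
	for _, token := range strings.Split(spec, ",") {
		token = strings.TrimSpace(token)
		if token == "existing" {
			// Identity mapping, but never clobber an explicit pair.
			for _, m := range bundleMachines {
				if _, ok := mapping[m]; !ok {
					mapping[m] = m
				}
			}
			continue
		}
		parts := strings.SplitN(token, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid entry %q", token)
		}
		mapping[parts[0]] = parts[1]
	}
	return mapping, nil
}

func main() {
	m, err := parseMapMachines("existing,2=7", []string{"0", "1", "2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(m["0"], m["1"], m["2"])
}
```

Note that this is pure number-to-number placement: as discussed above, constraints play no part in the match, which is why a manually added machine is never picked up via an arch constraint alone.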
<thumper> babbageclunk: morning
<thumper> ugh
<thumper> that was meant for balloons
<veebers> thumper: woah, getting a bit passive aggressive with your ping there ;-)
<balloons> Sure we can chat now
<thumper> balloons: I've jumped in our 1:1 HO
<babbageclunk> thumper: morning!
<babbageclunk> (also everyone else)
<thumper> babbageclunk: I have something I'd like you to look at before our 1:1
<babbageclunk> thumper: ok
<thumper> babbageclunk: https://github.com/juju/juju/pull/8604
<babbageclunk> (Oh which is at 10, not 9:30, right? I keep forgetting.)
<babbageclunk> looking now
<thumper> babbageclunk: yes 10
<admcleod_> hrm. so should i also expect networking issues with this "mixed provider" model? i normally dont have a problem deploying to lxd on these manual machines, but in this scenario im getting "cannot get subnet" - so, presumably the MAAS provider has issues with the subnet used on the manually deployed machine?
<thumper> babbageclunk: I'm in our 1:1 HO now if you can start early
<babbageclunk> thumper: ok - not done with the review yet though (although probably also won't be by 10 either so <shrug>)
<thumper> admcleod_: there is always opportunities for networking issues between a normal provider and manual
<thumper> babbageclunk: I want to talk through that change a bit too
<admcleod_> thumper: alright well ill log a bug for that one, not sure its much of a blocker
#juju-dev 2018-04-17
<wallyworld> vino: did you have a minute, i just looked at juju run pr, quick hangout?
<vino_> yes
<vino_> wallyworld: yes
<wallyworld> vino: see you in standup
<wallyworld> anastasiamac: small PR if you have a chance - needed for 2.4 beta 1 https://github.com/juju/juju/pull/8613
<anastasiamac> wallyworld: nws, m about to pr up the comment u've asked and will need a review too :D
<anastasiamac> babbageclunk: veebers: could u please TAL https://github.com/juju/juju/pull/8614 does it make sense to u coming in cold into the context?
<babbageclunk> anastasiamac: will take a look soon
<wallyworld> anastasiamac: will do just let me know the PR
<anastasiamac> wallyworld: https://github.com/juju/juju/pull/8614 :)
<wallyworld> looking
<wallyworld> anastasiamac: lgtm, ty
 * anastasiamac does a fast-landing dance \o/
<babbageclunk> oops, sorry anastasiamac
<babbageclunk> Oh, should have looked first before holding off!
<anastasiamac> babbageclunk: u r sorry i dance? :D
<babbageclunk> No, I just should have paused what I was doing!
<anastasiamac> babbageclunk: nws, i just needed someone to tell me that i made sense :D wallyworld is good at keeping my english straight :D PR is heading for landing now
<anastasiamac> babbageclunk: no no, it was not worth interrupting anyone... was just a review ;D
<babbageclunk> Well I'll do better next time.
<veebers> sorry anastasiamac was at lunch :-\
<anastasiamac> veebers: all good!
 * thumper goes to make a coffee
<veebers> wallyworld: when you have a spare moment if you could ack the test addition I made since your approval: https://github.com/juju/juju/pull/8599
<wallyworld> sure, sorry i totally forgot
<veebers> nw
<wallyworld> veebers: just a minor tweak
<veebers> gah, my illusion of perfection is broken ;-) I'll fix it up and merge
<wallyworld> anastasiamac: thanks for review, i missed those other errors, i've updated PR with a new commit
<veebers> wallyworld: remind me the command that shows charm hooks run?
<veebers> hah, that's almost not english :-)
<wallyworld> juju show-status-log
<veebers> awesome thanks
<veebers> wallyworld: is install hook retried by default perchance?
<wallyworld> veebers: yah, all hooks are retried i think 3 times
<veebers> wallyworld: ah right cheers, I'll update the test charm to take that into account :-)
<wallyworld> yeah sorry should have mentioned that
<veebers> nw, no biggie
<wallyworld> anastasiamac: i just noticed you already approved but given the changes i made for errors are non-trivial, would appreciate another look; they are all in the 2nd commit
<anastasiamac> wallyworld: k, looking now :D
<wallyworld> awesome, ty
<wallyworld> yay, thanks for review
<thumper> babbageclunk: got a minute, I'm a little confused
<babbageclunk> thumper: sure
<thumper> babbageclunk: jumping in the HO
<anastasiamac> wallyworld: that was really good !!! loved how u consolidated all these errors. once i saw it, it totally made sense
<wallyworld> yeah, we do tend to scatter shite everywhere
<wallyworld> was good to clean it up
<wallyworld> thanks for noticing in the PR
<wallyworld> was late when i did it :-)
<anastasiamac> \o/
<veebers> It appears that juju is attempting to re-try the install hook at least 8 times. Am I doing something odd in my charm to trigger this?
<veebers> all my charm is doing is in the install hook increment the attempt count (file stored in tmp) and setting: "status-set blocked "Install hook failed on purpose." then exit 1
<veebers> wallyworld: oh you're back :-) I was just complaining: It appears that juju is attempting to re-try the install hook at least 8 times. Am I doing something odd in my charm to trigger this?
<veebers> all my charm is doing is in the install hook increment the attempt count (file stored in tmp) and setting: "status-set blocked "Install hook failed on purpose." then exit 1
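The fail-on-purpose install hook veebers describes might look like the sketch below (hedged: the counter path and script name are made up, and `status-set` is a Juju hook tool that only exists inside a hook context, so it is guarded here):

```shell
# Hypothetical install hook: count attempts in a file, set blocked status, fail.
cat > install-hook.sh <<'EOF'
#!/bin/bash
count_file="${COUNT_FILE:-/tmp/install-attempts}"   # made-up path, overridable
attempts=$(( $(cat "$count_file" 2>/dev/null || echo 0) + 1 ))
echo "$attempts" > "$count_file"
# status-set is only available inside a real Juju hook context, so guard it.
if command -v status-set >/dev/null 2>&1; then
    status-set blocked "Install hook failed on purpose."
fi
exit 1
EOF
chmod +x install-hook.sh
```

Since the hook always exits 1, only `juju resolved --no-retry <unit>` can move the unit on; a plain `juju resolved` re-runs the hook, which fails again.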
<wallyworld> yeah, using vpn drops irc :-(
<wallyworld> i don't think you're doing anything amiss
<wallyworld> i think juju retries 3 times
<wallyworld> maybe just exit 1 and see what happens though that shouldn't make a difference
<thumper> wallyworld: it retries more than 3 times
<wallyworld> really? :-(
<wallyworld> i wonder why
<wallyworld> even 3 seems too many
<wallyworld> i had recalled that we settled on 3 but obviously i'm wrong
<veebers> it's tried 10 times now ^_^
<wallyworld> i'm sure in my testing it didn't do that
<wallyworld> it retried what i could have sworn was 3 times and then gave up
<wallyworld> and left the unit in error state
<veebers> wallyworld: if your testing was just exit 1 always you might not have seen it, but because I'm waiting to make sure it stops trying to ensure my resolve is the thing that fixed it I see it happen
<wallyworld> but could be wrong there too
<wallyworld> if you always exit 1 then resolve will be the thing that moves it on
 * thumper thrashes his computer somewhat
<veebers> wallyworld: right, I can resolve --no-retry to move it on, but I can't ensure that resolve will re-run the hook when requested (although it seems perhaps that's not needed as it's *always* retrying)
<thumper> babbageclunk: got 5 minutes?
<thumper> babbageclunk: I'm invoking the live demo demons
<thumper> babbageclunk: demonstrating as yet untried code
<babbageclunk> sure
<wallyworld> veebers: yeah, the code will need to be read to see what's going on. i can't recall off hand
<wallyworld> --no-retry would be a good start
<veebers> wallyworld: ack. I'll proceed with a functional test for no-retry. If I get nosey / have time I might try figure out what the code is doing there too
<wallyworld> sgtm
 * thumper jumps back in HO
<veebers> wallyworld: I'm confused, my juju 2.3.5 resolve works as expected, passing --no-retry skips the install hook and goes onto the start hook
<thumper> babbageclunk: I realise there was one more space where I haven't hooked it up :)
<babbageclunk> thumper: I mean, I guess that had to have been the problem
<thumper> :)
<babbageclunk> thumper: so it's working now with quick status changes?
<thumper> shtill threading
<thumper> babbageclunk: got it using the new code, but all coming back as unknown not alive
<thumper> so all machines showing down
 * thumper sighs
<babbageclunk> doh
<thumper> added more logging
<thumper> ah FFS
 * thumper knows
<thumper> babbageclunk: I think I have it this time...
<babbageclunk> by George, I think he's got it!
<thumper> nope
<thumper> got logic backwards
<thumper> trying again
<thumper> this is the culmination of around two years of effort
<thumper> I'll be celebrating when this is in
<wallyworld> thumper: sent email about upgrade test for 2.3.6 - short answer NFI, can't reproduce
<thumper> BOOM!!!
<thumper> working
<thumper> wallyworld: ok
<thumper> babbageclunk: wanna see it before I run to get Maia
<thumper> ?
<thumper> too late
<thumper> gotta go
<thumper> I'll be back on later tonight to polish and put up for review
<thumper> wallyworld: this is new presence hooked up in status
<wallyworld> oh joy
<thumper> with user updatable controller feature flags
<wallyworld> faaark, nice
<thumper> wallyworld: jump on a HO
<thumper> I want to show someone
<wallyworld> righto
<wallyworld> lol
<wallyworld> vino_: go fmt is still sad. you'll need to do a quick fix for run_test.go
<vino_> that same line 95 ?
<thumper> wallyworld: in our 1:1
<vino_> ok let me look
<wallyworld> vino_: there's about 4 errors
<babbageclunk> thumper: oops, sorry
<vino_> i couldn't get this githook fix in my setup.
<vino_> huh..
<anastasiamac> thumper: wallyworld: PTAL licencing for 2.3 - https://github.com/juju/juju/pull/8615
<wallyworld> ok
<wallyworld> jeez, 31 files
<anastasiamac> wallyworld: but mostly tests :D
<anastasiamac> 31? m seeing 25...
<wallyworld> maybe i can't count
<veebers> wallyworld: your 'resolve --retry' fix, did that go in for 2.3.5 or 2.3.6?
<wallyworld> 2.3.6
<veebers> wallyworld: so with 2.3.5 we should see that 'resolve --no-retry' does in fact try the errored hook? I'm not seeing that, but I'll double down now and confirm that
<wallyworld> yeah, i think 2.3.5 has the issue but hmmm, not sure now
<vino_> wallyworld: run_test.go - 4 errors correct ?
<wallyworld> sounds right yeah, that's what my go fmt said i think
<vino_> line: 105, 110, 119
<wallyworld> something like that, don't have the output anymore
<vino_> no worries.
<vino_> wallyworld: done. plz verify
<wallyworld> looking
<wallyworld> vino_: i thought there were four errors, https://pastebin.ubuntu.com/p/b3DM8qm4m3/
<kelvinliu_> wallyworld: i still got the same problem. so after I run `juju deploy ./mysql`, I got this message immediately, after unit status changed to `allocating`.
<vino_> can u see that ?
<wallyworld> which message?
<vino_> next to @line104
<wallyworld> lookin again at pr
<wallyworld> vino_: ah, looks ok. the pr showed 2 changes together
<wallyworld> sorry, looks good, will approve
<vino_> yea.
<vino_> but i shd get this githook and few other things fixed sooner..
<vino_> no sorry required :)
<wallyworld> yeah, see if you can get it working
<wallyworld> also, our landing bot should not accept prs with these errors
<wallyworld> we need to fix that
<wallyworld> kelvinliu_: which message are you seeing?
<kelvinliu_> wallyworld: some logs are here https://pastebin.ubuntu.com/p/D9sjnXvY4K/
<wallyworld> kelvinliu_: i can't see anything obvious. does juju status show an incremented version number to indicate the upgrade went ok? ie 2.4-beta1.2 or something like that
<wallyworld> you could try creating a new model and trying again
<kelvinliu_> wallyworld: the controller upgrade went ok, and tried it a few times, now I got 4 testcaas models
<wallyworld> you should see in the operator logs that the charm reactive code is calling set-pod-spec
<wallyworld> this triggers juju to create the unit
<wallyworld> which then creates the mysql pod
<kelvinliu_> wallyworld: no, it's not there
<wallyworld> ok, let me run up a k8s cluster and see what my logging says
<kelvinliu_> wallyworld: so after `juju-unit  executing    running start hook` then `workload   unknown`, then `juju-unit  idle`
<wallyworld> i can't recall the exact order off hand, i'll see when my k8s cluster starts
<kelvinliu_> wallyworld: thx
<kelvinliu_> wallyworld: or do u have a couple of minutes for a screen share session?
<wallyworld> kelvinliu_: i won't be much help until i run up my k8s bundle to compare
<kelvinliu_> wallyworld: ok, sure, we will wait for the cluster up and running first
<wallyworld> kelvinliu_: i just started a k8s cluster and deployed mysql without any problem
<wallyworld> did you want to do a hangout?
<kelvinliu_> wallyworld: yes, plz
<wallyworld> see you in standup
<kelvinliu_> yup
<vino_> have a quick question.. with only a snap install, will the service files be in /var/lib/juju/init ?
<vino_> i dont see in /var/lib/juju/tool
<wallyworld> vino_: snap install of juju is just the client
<wallyworld> when you bootstrap, it pulls down the agent binary and creates the systemd files on the vm
<wallyworld> so you need to bootstrap and ssh into the controller machine to see what was done
<kelvinliu_> wallyworld: just raised a bug for the remove-relation failure on apps running k8s -> https://bugs.launchpad.net/juju/+bug/1764649 and also created a card on caas trello board plz let me know if any questions. thx
<mup> Bug #1764649: juju caas remove-relation does not work <juju:New for wallyworld> <https://launchpad.net/bugs/1764649>
<wallyworld> great ty
<vino_> wallyworld: sorry i was a bit away for my tea.
<vino_> yes i did that.
<vino_> and i am there
<wallyworld> vino_: great ok, so the service files - you should see a symlink from files /etc/systemd/system to their actual location under /var/lib/juju/init
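The layout wallyworld describes can be illustrated with plain symlinks (purely illustrative: this builds the structure under a temp dir rather than inspecting a real controller machine, and the machine/service names are made up):

```shell
# Recreate the described layout under a temp root: the real service file lives
# under var/lib/juju/init, and etc/systemd/system holds a symlink to it.
root=$(mktemp -d)
mkdir -p "$root/var/lib/juju/init/jujud-machine-0" "$root/etc/systemd/system"
touch "$root/var/lib/juju/init/jujud-machine-0/jujud-machine-0.service"
ln -s "$root/var/lib/juju/init/jujud-machine-0/jujud-machine-0.service" \
      "$root/etc/systemd/system/jujud-machine-0.service"
# On a real controller the equivalent check would be:
#   ls -l /etc/systemd/system | grep juju
readlink "$root/etc/systemd/system/jujud-machine-0.service"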
<vino_> :) yes.
<vino_> i did ssh to instance but was looking in another terminal didnt realize :p
<wallyworld> ah no worries
<thumper> wallyworld: still around?
<wallyworld> thumper: sorta
<wallyworld> soccer soon
<thumper> wallyworld: how soon?
<thumper> just getting PR ready
<wallyworld> 15
<wallyworld> maybe 0
<wallyworld> 20
<thumper> long enough
<wallyworld> ok
<thumper> poo, need to remove some critical logging...
<thumper> wallyworld: https://github.com/juju/juju/pull/8617
<thumper> wallyworld: did you want to get on a hangout to talk through?
<thumper> smaller than the penultimate branch
<wallyworld> thumper: can do
 * thumper jumps in HO
<thumper> phew...
 * thumper waits for the merge
<thumper> manadart: ping
<manadart> thumper: Pong. On a call with jam.
<thumper> manadart: hey, was just wondering if you had an ETA on removing the horrible lxd patches we have
<manadart> thumper: Not yet, but I can peel off this intermittent failure I am working on and round up such an estimate.
<thumper> heh... no, we're good
<thumper> what is the intermittent failures?
<manadart> https://bugs.launchpad.net/juju/+bug/1753418
<mup> Bug #1753418: intermittent failure in kvmProvisionerSuite.TestKVMProvisionerObservesConfigChanges <intermittent-failure> <test-failure> <juju:In Progress by manadart> <https://launchpad.net/bugs/1753418>
<thumper> night all
<manadart> G'night.
<hml> gâmorning o/
<rick_h_> morning hml
<thumper> morning
<hml> morning thumper
<veebers> Morning o/
<thumper> wallyworld: so... no power nor internet at home now
<thumper> my laptop battery has several hours I guess
<thumper> and tethered to my phone
<thumper> they are replacing a damaged pole outside my house, so have turned off power, but we never got notified
<thumper> from 9 till 3
<wallyworld> thumper: oh that's what happened
<thumper> yeah, I was a little surprised, but you should see the kids
<thumper> they are like, I can't charge my phone?
<thumper> can't watch netflix?
<thumper> what am I going to do?
<hml> ha!
<wallyworld> lol
<wallyworld> thumper: so no release call :-) i thought it might be good to chat about that upgrade issue even though nix is away
<thumper> wallyworld: let's do a voice only hangout
<thumper> to save my data
<wallyworld> ok
<veebers> wallyworld: I'm scratching my head on this one, re: resolve command: every build I've tried it works as expected, --no-retry skips the failing install hook and continues.
<veebers> I tried 2.3.5 (snap installed), I tried the commit before your one that fixed the issue
<babbageclunk> ha, if you're benchmarking something where you're appending to logs, make sure part of the thing isn't copying all of the logs so your benchmark gets progressively slower.
<wallyworld> veebers: i don't have an explanation for that off hand, will need to investigate
<wallyworld> thumper: we can fill out much of the 2.4 release notes but the enable-ha/remove-machine and ha space sections are todo (john/joe) as are model owner changes (you). i think we may need a day to get these sorted
<thumper> yeah
<wallyworld> so we maybe should take a view tomorrow after forcing folks to fill out notes today :-)
<wallyworld> we can do 2.4 and 2.3.6 a day apart
<wallyworld> different people do each
<vino> hey Wallyworld: morning! fixed the format issues u have mentioned.
<wallyworld> great ty
<wallyworld> looking at pr
<wallyworld> lgtm
<wallyworld> babbageclunk: ready!
<wallyworld> still in same hangout
<babbageclunk> ok
#juju-dev 2018-04-18
<wallyworld> veebers: reviewed, a couple of questions, see what you think
 * veebers looks
<veebers> wallyworld: have responded, essentially they're all good points I'll make the change. I did respond re: the status-log question though as it needs further comment.
<babbageclunk> wallyworld: that seems to work fine for the ifPrimaryController! Unfortunately the other use of the singular worker is in the model manifolds - it's not so simple to get rid of that one without actually implementing leases, or else running a set of raft workers in each model. :(
<wallyworld> veebers: sorry, was eating, looking now
<wallyworld> babbageclunk: well that sucks. those model workers ultimately run on a controller. i wonder if there's a way to gate their startup in the engine definition somehow. not sure off hand
<wallyworld> veebers: makes sense, thank you, left a couple of comments
<babbageclunk> wallyworld: Yeah, I think we could feed the state of the raft-leadership flag from the machine manifolds into the model manifolds, but then that would have the effect of making all those workers run on the machine that was the raft leader, which I don't know that we'd want.
<wallyworld> babbageclunk: that may be true. i guess we want them spread around a bit
<wallyworld> babbageclunk: maybe use a separate named raft fsm for that :-)
<wallyworld> or just use the lease fsm or something
<babbageclunk> Yeah, it seems like just exposing the lease store would do it.
<veebers> wallyworld: hah no worries; eating is way more important than my PR ^_^ have made the suggested changes and all pushed up.
<veebers> wallyworld: you mentioned a site that had details re: tomb.v2 -> tomb.v3, is that just github and release notes or is there something more substantial?
<wallyworld> veebers: v1->v2. not sure off hand, i know there's a home page somewhere
<wallyworld> maybe just gh
<veebers> wallyworld: ack, I'll have a look. cheers
<wallyworld> veebers: bug 1745031 is fix committed now right?
<mup> Bug #1745031: gce add credentials "Enter file" absolute path msg improvment <juju:In Progress by veebers> <https://launchpad.net/bugs/1745031>
<veebers> wallyworld: aye it is, I'll update status
<wallyworld> ta, and move card
<veebers> card is done, forgot to update bug :0
<wallyworld> veebers: tiny one https://github.com/juju/juju/pull/8623
 * veebers looks
<veebers> wallyworld: LGMT, have made a suggestion on it
<wallyworld> ok, ty
<wallyworld> veebers: would $PROJECT_DIR work?
<wallyworld> i should just try it
<veebers> wallyworld: I don't think so, because it's using PROJECT which is a relative path, really just the project name
<wallyworld> veebers: i just tried it and it seemed to work.       $make -C /home/ian/juju/go/src/github.com/juju/juju pre-check
<wallyworld> where i was in some other dir
<thumper> wallyworld: have you confirmed that make check fails if go vet isn't happy?
<wallyworld> yes of course!
<veebers> wallyworld: oh cool, ah right go list is probably smart about what the project is and knows GOPATH etc.
 * thumper is now in the coffee shop up the road
<wallyworld> veebers: yeah i think so
<veebers> thumper: if the house wasn't a total madhouse I would invite you up so you could consume our power and internet
<thumper> wallyworld: have you got a link to the release notes for 2.4?
<thumper> veebers: that's fine, getting coffee while I'm here, have the kids too
<wallyworld> thumper: https://docs.google.com/document/d/1kz1NGMEeHdmSMs3PSxsDVej_tR4bjyAIcRWyx0UwohE/edit
<vino> wallyworld: have a min
<wallyworld> sure
<wallyworld> standup ho
<vino> hangout
<vino> ok.
<veebers> babbageclunk: you seen or used magithub? (https://github.com/vermiculus/magithub)
<veebers> wallyworld: to confirm, for the resolve test do you still want to use show-status-log to confirm hook execution?
<wallyworld> veebers: i reckon that could maybe be overkill?
<wallyworld> seems the charm behaviour helps verify it's all correct
<veebers> wallyworld: agreed, sorry was a bit confused by PR comments. It seems you were asking that again. I'll leave as is using charm, no log check
<wallyworld> veebers: i  may not have been clear
<wallyworld> +1 to using charm
<veebers> mean, I'll merge now then :-)
<babbageclunk> veebers: ooh, no I have not
<kelvinliu_> wallyworld: I think bug  https://bugs.launchpad.net/juju/+bug/1764649 is not a juju bug but a `interface-mysql` bug
<mup> Bug #1764649: juju caas remove-relation does not work <juju:New for wallyworld> <https://launchpad.net/bugs/1764649>
<thumper> wallyworld: there was something we were talking about recently that we wanted to add to charms metadata.yaml, do you recall what it was?
<thumper> we have a request to add some charm-version info
<thumper> but if we were going to modify metadata.yaml I'd like to get both changes in at once
<wallyworld> thumper: hmmm, yeah i do recall something, i think it was a version field for when charms supported application level data bags
<thumper> what would that version be used for?
<wallyworld> kelvinliu_: i don't think the mysql charm for IAAS clouds has the same issue? or does it?
<thumper> the charm-version I'm thinking about relates to charm releases
<thumper> like adding "pike" for keystone
<thumper> or something
<wallyworld> thumper: so a v2 charm would support X whereas V1 would not. that's different to your version
<wallyworld> maybe we need a capabilities list
<wallyworld> like series
<wallyworld> a charm supports "app data bags" and ....
<thumper> hmm...
<thumper> I know there was one where we wanted to add lxd devices to pass through etc
<kelvinliu_> wallyworld: remoteUnitName will be empty for non relation hook or relation-broken hook from juju side. So in `relation-broken hook` handler, we can't access `self.conversation()` unless the `scope` of the `Relation class in provides.py` set to `scopes.GLOBAL`
<wallyworld> thumper: yeah, that rings a bell too
<thumper> wallyworld: I'm going to make a quick doc for charm metadata updates
<thumper> so we can have one place...
<wallyworld> kelvinliu_: oh, i see, i didn't realise remote unit name was empty for relation-departed
<kelvinliu_> wallyworld: but the scope of `interface-mysql` is set to `scopes.SERVICE`
<wallyworld> thumper: yup
<wallyworld> kelvinliu_: i wasn't sure what it was set to. i think service scope has different behaviour in getting the conversation
<kelvinliu_> kelvinliu_: `remoteUnitName` is empty for `relation-broken`, `relation-departed` is fine.
<kelvinliu_> wallyworld:`remoteUnitName` is empty for `relation-broken`, `relation-departed` is fine.
<wallyworld> hmmm, i wonder why that is. i might need to go and read the uniter code
<wallyworld> there must be a reason that someone had, but it seems the mysql charm expects it
<wallyworld> to not be empty
<kelvinliu_> wallyworld: i guess, when we got `relation-broken` event triggered, we may lose the remote unit context in this case. So the context of remote unit will not be guaranteed to be there.
<wallyworld> kelvinliu_: just saw this in some doc: The "relation-broken" hook is not specific to any unit, and always runs once when the local unit is ready to depart the relation itself
<wallyworld> i have to check code, but it does seem relation-broken hook does not set remote unit
<wallyworld> hence the charm interface layer is wrong
<wallyworld> i'm quickly testing on an iaas model to confirm
<kelvinliu_> wallyworld: yes, I changed `'{provides:mysql}-relation-{broken,departed}'`  to `'{provides:mysql}-relation-departed'`, and `juju resolved mysql/0` is resolved successfully
<wallyworld> kelvinliu_: so it seems we don't even want to run the relation broken hook
<wallyworld> we would look to do a PR against the mysql interface layer if we 100% confirm the right thing is to skip the broken hook
<kelvinliu_> wallyworld: if do need to run `relation-broken`, we have to parse `scope` manually to get the `correct conversation`.
<wallyworld> kelvinliu_: so i checked the standard mysql charm. it is not a reactive charm. it uses relation broken to revoke all privileges for the user
<wallyworld> we should check bugs filed against the reactive interface layer to see if its a known issue
<kelvinliu_> wallyworld: yes, I am looking
<wallyworld> jam: forgot - you and manadart need to update 2.4 beta1 release notes for tomorrow's release
<jam> wallyworld: ack
<wallyworld> kelvinliu_: how goes it with mysql interface layer. do you think we could do a patch and fork the code?
<manadart> wallyworld: Ack.
<kelvinliu_> wallyworld: to make it working, we will just make the handler to listen `departed` event. I just read all the other interface repos, I only see `relation-broken` handlers if the scope sets to `GLOBAL`. from what I read from the code, `remote unit name` is required for non `GLOBAL scope.`
<kelvinliu_> wallyworld: we probably should set the scope to `GLOBAL` to handle the `relation-broken`.
<wallyworld> kelvinliu_: sounds right i think
<wallyworld> kelvinliu_: the author of the layer is cory who works for canonical. if you could do a patch and put up a PR, we can get him to look at it
<wallyworld> raise a bug and link the PR to the bug
<wallyworld> https://github.com/johnsca/juju-relation-mysql
<kelvinliu_> wallyworld: I saw you set the scope to `GLOBAL` actually. But it seems the interface include in `layer.yaml` did not use the local interface version at `./deps/interface/juju-relation-mysql`. I am reading the docs to see how to fix this
<wallyworld> i'm not overly familiar with the detail of the reactive layers here
<wallyworld> at EOD today, you could raise a bug with as much detail as has been discovered in the bug and cory should see that when he starts
<wallyworld> he's in US timezone
<kelvinliu_> wallyworld: i was reading this https://github.com/juju-solutions/interface-mysql-root    not sure how did `charm` know which one to download 'interface:juju-relation-mysql'
<kelvinliu_> wallyworld: yes, I will do that, thx
<wallyworld> kelvinliu_: yeah, that's not the one the charm uses. maybe that one is more suitable
<wallyworld> i'd need to look
<wallyworld> the charm knows which one to use because it's in a layers file, i'll look it up
<wallyworld> kelvinliu_: see layers.yaml
<kelvinliu_> where did u get this ./deps/interface/juju-relation-mysql from?
<wallyworld> layers.yaml
<wallyworld> in the root dir of the mysql charm
<wallyworld> charm build uses this layers file
<wallyworld>  ./deps/interface/juju-relation-mysql is put there by charm build
<wallyworld> based on layers.yaml
<kelvinliu_> ic
<wallyworld> kelvinliu_: that mysql-root one is not what we want
<wallyworld> it's for the admin db
<kelvinliu_> wallyworld: in the `requires.py`, the scope is set to `GLOBAL`. don't know why they are different.
<wallyworld> kelvinliu_: that's for the requires side
<wallyworld> kelvinliu_: sorry, had someone at door. the bit that is failing here is on the providers side, the mysql end
<kelvinliu_> wallyworld: would you give more details?
<wallyworld> relations have a requires end and a provides end
<wallyworld> mysql defines an interface that is the provides end
<wallyworld> the logic for that is in provides.py
<wallyworld> you can look at the charm metadata.yaml file to see the endpoints
<kelvinliu_> ah, ic. so the requires.py -> gitlab, provides.py -> mysql, in this case
<wallyworld> yup, that's it
<kelvinliu_> more clear now, thx
<wallyworld> and the reactive framework plugs in the requires vs provides bit of that mysql interface layer
<kelvinliu_> ic. so the problem is provides/mysql can't get the remote name(`gitlab`) to get the correct conversation when the scope sets to `service` for event `relation-broken`.
<kelvinliu_> wallyworld: created this issue here, https://github.com/johnsca/juju-relation-mysql/issues/3   plz let me know if you have more questions, thx
<wallyworld> kelvinliu_: bug report looks ok, i'll just make a small correction - "operator" is a caas specific term which is not widely in use
<wallyworld> kelvinliu_: i can't edit, can you change operator to "uniter hook runner"
<kelvinliu_> wallyworld: updated, thx for the correction.
<wallyworld> great ty
<wallyworld> kelvinliu_: next steps include adding CI test coverage for caas deployments
<kelvinliu_> wallyworld: yup, should I create a card on A team board for this?
<wallyworld> kelvinliu_: yes please
<wallyworld> kelvinliu_: we can talk more tomorrow with chris on how to get started. all the CI code is in the source tree
<wallyworld> in the acceptancetests dir
<kelvinliu_> wallyworld: ok, thx. let's talk this tmr morning
<wallyworld> kelvinliu_: sounds good
<manadart> I have added release notes for the new controller config options for spaces. Subject to editorial intervention :)
<veebers> Morning o/
#juju-dev 2018-04-19
<vino> WallyWorld : have a min...
<wallyworld> sure
<vino> hangout
<vino> ?
<wallyworld> yup
<wallyworld> thumper: https://github.com/juju/juju/pull/8626 :-) pretty please
<thumper> not small is it
<wallyworld> thumper: it's 99 go generated
<wallyworld> 99%
<wallyworld> ignore those bits
<wallyworld> babbageclunk: got a few minutes?
<babbageclunk> wallyworld: sure
<wallyworld> 1:1
<wallyworld> vino: free now
<vino> yes .
<wallyworld> manadart: FYI https://github.com/juju/juju/pull/8626
<manadart> wallyworld: Nice.
<wallyworld> thumper: you coming to team standup?
<wallyworld> jam: also?
<jam> brt
<wallyworld> vinu: back now, did you want a hangout?
<vinu> sure
<vinu> plz gimme 5 mins
<wallyworld> ok
<vinu> i am checking this.
<vinu> excellent. it all works fine.
<vinu> thanks Ian
<vinu> can we hangout ?
* ChanServ changed the topic of #juju-dev to: https://jujucharms.com | for most conversations, see #juju
<wallyworld> vinu: hey, sorry, missed your message
<wallyworld> now ok?
<vinu> yes.
<vinu> i am here.
<wallyworld> see you in standup ho
<hml> balloons: eatmydata is not my friend.
<balloons> hml, ohh.. no dice
<balloons> hml, I wouldn't want you to expend much effort as optimizing it really isn't required atm
<hml> balloons: lots of errors with adding the libary to /etc/ld.so.preload
<hml> and running eatmydata "sudo do-release-upgrade"... gave me no time savings
<hml> balloons: just tried a few things and gave up
<balloons> hml, gotcha. Worth a try at least
<hml> balloons: maybe if someone has used it before...
<veebers> Morning all o/
<admcleod_> so i can deploy ppc64el/arm64/amd64 fine, from an amd64 machine, using maas provider, juju 2.3.2, but 2.4-beta2 nope. bug 1765524. is there any kind of workaround i can try?
<mup> Bug #1765524: ERROR failed to bootstrap model: cannot use agent built for ARCH_X using a machine running on ARCH_Y <juju:New> <https://launchpad.net/bugs/1765524>
<thumper> admcleod_: weird
<admcleod_> i try
<admcleod_> getting full debug for hml
<hml> thumper: i'm almost EOD if you want some fun today.  :-)
<thumper> hml: awesome
<thumper> admcleod_: what about 2.3.6?
<admcleod_> hml: thumper bug updated
<admcleod_> ill try 2.3.6, sec
<thumper> thanks
<thumper> oh...
<thumper> I know what this is
<admcleod_> ruh roh
<thumper> 2.3.2 has agents built for the other architectures, and they are available in simple streams
<thumper> 2.3-beta2 isn't released
<admcleod_> ahhh
<thumper> so is just being built locally
<thumper> so just the local arch is used
<thumper> sorry 2.4-beta2
<thumper> since 2.4-beta1 is now released and in streams
<admcleod_> erm
<thumper> you should be able to use that
<admcleod_> 2.3.6 didnt work
 * thumper thinks
<thumper> I recall some bug around architectures...
<admcleod_> maybe ths is some config im specifying in one fo these files
<admcleod_> nope
<veebers> thumper: I would have assumed the same, diff arches need to use agents from streams.
<thumper> personally I would have expected 2.3.6 to work
<admcleod_> it seems reasonable, maybe 2.3.6 aren't in streams either?
<admcleod_> cos some builders havent run?
<thumper> I'm pretty sure that we check before announcing...
<admcleod_> trying beta1
<admcleod_> beta1 nope
<thumper> 22:14:16 ERROR juju.cmd.juju.commands bootstrap.go:535 failed to bootstrap model: cannot use agent built for "ppc64el" using a machine running on "amd64"
<veebers> 2.3.6 should definitely be in streams. beta1 won't be in released, proposed stream for beta1
<thumper> looks like the machine that was created was amd64
<admcleod_> ill see if i can see maas doing anything
<thumper> admcleod_: you would need to set agent-stream=proposed for the beta to find the agents
<admcleod_> k
<thumper> admcleod_: can we just see if we can get 2.3.6 working?
<admcleod_> ok
<thumper> admcleod_: let's add the flag to keep broken instances so we can see what we actually get
<admcleod_> k
<admcleod_> ok 2.3.6 does work, i made a mistake
<thumper> ah...
<thumper> good
<thumper> perhaps it is just the streams then
<thumper> try adding the agent-stream for 2.4-beta1 bootstrap
<admcleod_> yep sec
<admcleod_> how do i set agent stream
 * thumper looks
<thumper> model defaults I think
<veebers> admcleod_, thumper: --config agent-stream=proposed --model-default agent-stream=proposed
<admcleod_> ah
<admcleod_> proposed no, devel yes
<veebers> admcleod_: doh sorry yeah beta1 is in devel
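Putting the above together, bootstrapping an unreleased beta needs agent-stream pointed at wherever its agents actually live (a sketch; the cloud name is a placeholder, and as found above the beta1 agents were in `devel`, not `proposed`):

```shell
# Hypothetical bootstrap: set the agent stream on the controller model and as
# the default for new models, so agents for other architectures can be found.
juju bootstrap my-maas-cloud \
    --config agent-stream=devel \
    --model-default agent-stream=devel
```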
<admcleod_> thanks!
<admcleod_> alright, ill update my bug then
<admcleod_> dont suppose we could have a slightly more informative error message? ;)
<thumper> babbageclunk: got a few minutes to chat?
<babbageclunk> thumper: I've got standup in 4 mins
<babbageclunk> after that?
<thumper> I'll be heading to the gym shortly after that
<thumper> perhaps after lunch?
<babbageclunk> yup
<babbageclunk> thumper: ping me then?
<thumper> babbageclunk: ack
<admcleod_> thumper: so the reason i was trying to get 2.4-beta2 going was to see if the lxd network issues have been resolved. now i have failure to launch with 'no associated target operation'
<admcleod_> seen that?
<thumper> no
<admcleod_> alright, bug coming a bit later
#juju-dev 2018-04-20
<admcleod_> thumper: https://bugs.launchpad.net/juju/+bug/1765571
<mup> Bug #1765571: lxd container fails to launch on bionic host: No associated target operation <juju:New> <https://launchpad.net/bugs/1765571>
<admcleod_> thumper: im going to EOD but ill leave those envs up so if you want me to try anything/get more info let me know
<thumper> admcleod_: ack
<thumper> babbageclunk: ping
<babbageclunk> thumper pingback - good timing!
<babbageclunk> in 1:1?
<thumper> ack
<thumper> wallyworld: can you reply to Junien on https://bugs.launchpad.net/bugs/1762979 ?
<mup> Bug #1762979: juju resolve --no-retry behaviour is inverted <juju:Fix Committed by wallyworld> <juju 2.3:Fix Released by wallyworld> <https://launchpad.net/bugs/1762979>
<wallyworld> looking
<veebers> babbageclunk (thumper?) I found go-guru-hl-identifier-mode, I find it useful you may too (highlights identifier under point).
<babbageclunk> veebers: might try it, thanks!
<veebers> I enabled with "(add-hook 'go-mode-hook #'go-guru-hl-identifier-mode)", I tweaked the face a little too '(go-guru-hl-identifier-face ((t (:inherit highlight :slant italic))))
<babbageclunk> I like it! It feels a bit inconsistent though - it highlights instances of time (in time.Duration) but not Duration, and not a different package. Maybe it's just if that name (time) happens to have been used as a local somewhere in the file?
<wallyworld> thumper: :-) pretty please? https://github.com/juju/juju/pull/8634
 * thumper looks
<wallyworld> need for CI etc
<thumper> wallyworld: got a sec to chat about this?
<wallyworld> sure
<thumper> 1:1
<kelvinliu> hi wallyworld: can I have ur 5mins on hangout?
<wallyworld> kelvinliu: just talking to chris, give me
<wallyworld> 5
<wallyworld> unless you want to join standup hangout
<kelvinliu> wallyworld: sure, I will just join soon. thx
<wallyworld> kelvinliu: i *need* coffee, let me go make one and i'll ping you in 5
<kelvinliu> wallyworld: yup, no problem, thx
<babbageclunk> thumper: retrying that benchmarking on aws machines I can get between 340 - 400 updates per second. Weirdly, tweaking that parameter I was talking about didn't seem to make much difference.
<thumper> hmm... ok
<wallyworld> kelvinliu: free now, see you in hangout?
<kelvinliu> wallyworld: yup
 * thumper sighs
<thumper> looks like gofmt is sad on develop
<thumper> how?
<veebers> did wallyworld's change land for make check?
<thumper> type DBusAPIFactory = func() (DBusAPI, error)
<thumper> there is an = there
<thumper> service/systemd/service.go
<thumper> go test passes in there
<thumper> but the prepush hook fails
<kelvinliu> wallyworld: a quick one https://github.com/wallyworld/caas/pull/2 thx
<thumper> https://github.com/juju/juju/commit/4185a33acc39527276de908c2492f6286d87f0d1
<thumper> funny, but Joe's change was just renaming
<wallyworld> kelvinliu: thks
<thumper> the type alias is from before
<thumper> perhaps this is a new check?
<veebers> thumper: which go version are you using?
<thumper> 1.10.1
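The line gofmt flagged uses Go's alias-declaration syntax (valid since Go 1.9). A minimal sketch of the distinction being discussed, with a stand-in DBusAPI interface rather than juju's actual one:

```go
package main

import "fmt"

// DBusAPI is a stand-in interface for this sketch, not juju's real type.
type DBusAPI interface {
	Close() error
}

// Alias declaration (note the '='): DBusAPIFactory *is* the type
// func() (DBusAPI, error), not a new named type. gofmt accepts this form.
type DBusAPIFactory = func() (DBusAPI, error)

// NewFactory returns a trivial factory to show the alias in use.
func NewFactory() DBusAPIFactory {
	return func() (DBusAPI, error) { return nil, nil }
}

func main() {
	var f DBusAPIFactory = NewFactory()
	api, err := f()
	fmt.Println(api == nil, err == nil) // prints: true true
}
```

If a hook or linter rejects that line, the likely cause is an older toolchain (pre-1.9) or a stricter check, rather than the alias itself.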
<veebers> I see the prechecks are happening (this run from ~17 hours ago: http://ci.jujucharms.com/job/github-check-merge-juju/963/console)
<wallyworld> kelvinliu: it includes your bundle changes, they will need to be backed out
<wallyworld> kelvinliu: also build dirs like  charms/gitlab/deps/interface/pgsql
<wallyworld> need to not be pushed
<kelvinliu> wallyworld: use INTERFACE_PATH=/path-to-ur-local-interfaces-dir to use a local interfaces dir
<wallyworld> kelvinliu: thanks, yeah, i already had that set as it turns out
<kelvinliu> wallyworld: ah, ok, I will gitignore deps
<kelvinliu> wallyworld: is the deps required or not
<wallyworld> no
<wallyworld> can be removed from PR
<wallyworld> is generated on build AFAIK
<wallyworld> kelvinliu: looks good with a few comments to fix
<thumper> wallyworld: https://github.com/juju/juju/pull/8635
<kelvinliu> wallyworld: made changes for the comments. thx
<wallyworld> kelvinliu: where does ##MODEL## come from?
<wallyworld> i've not seen that before
<kelvinliu> wallyworld: in the $profile
<wallyworld> the bash profile?
<wallyworld> oh wait
<wallyworld> i see
<kelvinliu> wallyworld: yes
<wallyworld> thanks, haven't noticed that before
<kelvinliu> wallyworld: np :)
<wallyworld> kelvinliu: looks good, i'll merge when the mysql layer change gets merged so as not to break existing users
<kelvinliu> wallyworld: yeah, thx
<axino> wallyworld: hi
<wallyworld> hey
<axino> wallyworld: so afaik, "juju resolved" has been, by default, retrying _by default_ since 2.x (1.x default was to not retry)
<axino> wallyworld: and you're saying in the bug that most 2.x versions didn't retry if one simply ran "juju resolved" ?
<wallyworld> yeah, in 2.x the behaviour was meant to be swapped
<wallyworld> 2.x requires --no-retry to not retry
<wallyworld> 2.x retries by default
<wallyworld> or is supposed to
<wallyworld> but i was testing and found it didn't
<thumper> wallyworld: if you are happy with the proxy rework branch, just add the merge comment, otherwise I'll address anything monday morning
<thumper> laters peeps
<wallyworld> thumper: looking
<wallyworld> axino: and i looked at the code and saw a bug
<axino> wallyworld: I'm surprised no one saw it in 1+ years
<wallyworld> axino: me too. apparently tim was talking to xav from bootstack and when the bug was mentioned, xav said something like "oh, i was wondering why it never seemed to work properly"
<axino> heh
<wallyworld> better late than never for a fix i guess
<axino> oh yeah definitely
<axino> wallyworld: and this is an agent thing right ? not a client thing ?
<wallyworld> yeah, if i recall correctly
<wallyworld> CLI was modified but agent wasn't
<wallyworld> modified to the reversed behaviour
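The mismatch described above (client updated to the new retry-by-default semantics, agent still reading the flag with the old polarity) can be sketched as follows. All names here are hypothetical, purely to illustrate the bug class; this is not juju's actual code:

```go
package main

import "fmt"

// resolveMode is what a 2.x client intends: retry hooks unless --no-retry.
func resolveMode(noRetry bool) string {
	if noRetry {
		return "no-retry"
	}
	return "retry"
}

// legacyAgentMode models the old 1.x polarity: retry only when asked to.
func legacyAgentMode(retry bool) string {
	if retry {
		return "retry"
	}
	return "no-retry"
}

func main() {
	// Client default (no flag passed, so false)...
	intent := resolveMode(false)
	// ...but an agent still interpreting the raw boolean with 1.x
	// polarity inverts the intended default.
	got := legacyAgentMode(false)
	fmt.Println(intent, got) // prints: retry no-retry
}
```

The fix is to update both ends to agree on what the transmitted boolean means, which matches the "CLI was modified but agent wasn't" diagnosis above.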
<wallyworld> kelvinliu: see this error? https://pastebin.ubuntu.com/p/MF3wRbZWFp/
<wallyworld> the mysql charm relation joined isn't happy
<kelvinliu> wallyworld: ah, sorry. I changed the code inside the container by exec'ing into it, but forgot to change it in the repo.
<wallyworld> ok, np, just push the change :-)
<kelvinliu> wallyworld: done.
<wallyworld> thanks, looking
<kelvinliu> wallyworld: np
<wallyworld> kelvinliu: it seems then that the doc string is wrong in the mysql layer
<wallyworld> they leave off the items() also
<kelvinliu> wallyworld: yes, I was about to add comment to the PR
<wallyworld> great!
<wallyworld> just waiting for gitlab to start now so i can see the result
<wallyworld> kelvinliu: there is indeed a problem - gitlab can't talk to mysql, gets auth error it seems
<kelvinliu> wallyworld: just found one issue with operators. it seems the operator was deployed via the k8s pod api, so no `deployment` resource was created. So k8s will not be able to manage it, e.g. for recovery.
<wallyworld> you sure? i get deployments
<wallyworld> $ kubectl -n caas get deployments
<wallyworld> NAME          DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
<wallyworld> juju-gitlab   1         1         1            1           6m
<wallyworld> juju-mysql    1         1         1            1           8m
<kelvinliu> that's the deployments for app pod
<kelvinliu> i mean deployment for operator pod
<wallyworld> oh right yes, deployment for operator pod not done yet
<wallyworld> there's a todo in the code
<kelvinliu> ah, ic. I just deleted the gitlab operator pod, so it's gone
<wallyworld> kelvinliu: so if mysql password is being correctly passed to gitlab docker image, but gitlab can't connect, that implies an issue in provides.py on the mysql side i think
<wallyworld> you'll have to remove-application from juju and deploy again
<kelvinliu> yup.
<wallyworld> kelvinliu: actually, it looks like an issue with the gitlab image https://pastebin.ubuntu.com/p/2yxHnJffqT/
<wallyworld> we use latest, i might try an older one
<kelvinliu> wallyworld: yes, i got the same error.
<kelvinliu> wallyworld: but latest image was pushed 10 days ago
<wallyworld> kelvinliu: i tried an older version 10.6.4, same error :-(
<kelvinliu> wallyworld: i guess it's related with the db access
<wallyworld> it is but we need to find out why
<wallyworld> will have to try with the old mysql layer
<wallyworld> kelvinliu: solved!
<wallyworld> it was new mysql version 8
<wallyworld> this works
<wallyworld> juju deploy ./mysql --config mysql_image=mysql/mysql-server:5.7
<wallyworld> i will push a change to the charm to use that tag
<kelvinliu> wallyworld: ah mysql had a recent push.
<wallyworld> yeah
<kelvinliu> wallyworld: would be better to lock the version tag
<wallyworld> kelvinliu: that's what i'm going to do - change the tag in the charm config
<wallyworld> config.yaml has the default image path
<wallyworld> in my example above, i overrode the default when deploying
<kelvinliu> wallyworld: ah, ic. the charm default actually should be latest, then override it from the cmd line when we deploy. probably we can just add the current compatible versions to the README?
<anastasiamac> kelvinliu: wallyworld would not have seen ur message, he logged out just before it.
<kelvinliu> anastasiamac: yeah, just saw that. thx for the reminder. haha
<anastasiamac> nws :)
<anastasiamac> m guessing u r not using tab to autocomplete the nick :D
<kelvinliu> ah, how did u know? i just ctl+c  ^-^
<wallyworld> kelvinli_: i've pushed changes to pin the image version. you could rebase your branch
<wallyworld> kelvinli_: left some more comments in the PR
<kelvinliu> wallyworld, looking the comments now
<wallyworld> kelvinli_: thanks for changes! perhaps a comment that the -o directory can be anywhere convenient; doesn't have to necessarily be ~/.local/shared/charms
<kelvinliu> wallyworld, yeah, agreed.
<wallyworld> \o/
<kelvinliu> Have a good weekend  :-)
<balloons> Good morning
<mup> Bug #1729930 changed: juju.state.leadership manager.go:72 stopping leadership manager with error: state changing too quickly; try again soon <sts> <juju-core:Fix Released by axwalk> <https://launchpad.net/bugs/1729930>
<hml>  balloons: regarding the prehook checks not working on first push of new branch:  https://paste.ubuntu.com/p/jV84x29pZm/
<hml> balloons: perhaps because master is now not used?  guessing in the dark
<balloons> Ah, yea we should remove master and staging
<balloons> hml, did you try a quick fix of looking in develop?
<hml> balloons: sure
<balloons> I would rather we just added the meta linter instead of that old script
<hml> balloons:  i did a simple change of master to develop and still fails… something else is wrong.. not a git master
<balloons> Gotcha. Sorry it's a pain
#juju-dev 2018-04-21
#juju-dev 2018-04-22
<veebers> Morning all o/
<thumper> wallyworld: got 5 min to talk 2.3.6 upgrade issue?
<thumper> wallyworld: nm, let's chat this afternoon
<wallyworld> thumper: i'm here
<thumper> wallyworld: ok
<thumper> wallyworld: where?
<wallyworld> 1:1
<thumper> ack
