/srv/irclogs.ubuntu.com/2014/09/09/#juju-dev.txt

sinzuiHi davecheney00:20
davecheneysinzui: s'ok, wallyworld_ answered my question00:22
wwitzel3ericsnow: ok, done with dinner, taking a look at 70800:45
thumperwaigani_: I'm being bitten by the missing envuser now too00:50
thumperwaigani_: because I changed the 'add service' code to look for them :-|00:50
thumperwaigani_: how far away are you?00:50
waigani_thumper: just about done00:50
thumpercoolio00:50
waigani_thumper: I didn't see your messages but I implemented exactly what you suggested - even the param name :)00:51
thumperwaigani_: I didn't say them on the PR, just here00:51
waigani_thumper: currently if you pass in creator: "eric" to MakeUser it does not create a local user for eric01:02
thumperwaigani_: I think that is fine for now01:02
waigani_thumper: this now fails when you try to specify eric as the creator of the environuser01:02
thumperwaigani_: again, probably ok01:03
thumperwaigani_: fall back to the environment owner01:03
thumperif not specified01:03
waigani_thumper: you mean if doesn't exist as a local user?01:04
waigani_because it is being specified in the params01:04
thumperI mean that if you pass a value in explicitly to the factory, it is expected to work01:04
thumperif you haven't set it up right, then it is the test's fault01:05
thumpernot the factory01:05
waigani_thumper: right, so if you pass in "eric" as the creator you should have created "eric" as a local user?01:05
thumperyes, that is what I'm saying01:05
waigani_got it, I'll update the test in that case01:06
waigani_thumper: do we still need factory.MakeEnvUser ?01:15
thumperyes01:16
thumperwaigani_: there will be cases where we want an envuser, but they are not local01:16
thumperall users are local users01:16
waigani_right, of course01:17
waigani_thumper: just fixing up all the call sites now, there are a few01:21
thumperwaigani_: here is one for your TODO list:01:27
thumperfunc (s *Service) GetOwnerTag() string {01:27
thumperfrom state/service.og01:27
thumpers/og/go/01:27
thumperplease make it return a names.UserTag01:28
thumpercoffee time01:29
waigani_thumper: okay01:29
waigani_thumper: should the user now have a func to get the envuser?01:30
thumperwaigani_: I don't think so01:37
waigani_thumper: https://github.com/juju/juju/pull/70201:37
* thumper looks01:42
thumperwaigani_: one change and one question01:46
waigani_thumper: api.Open does not return a NotFound err01:47
waigani_I tried to satisfy and it failed01:47
thumperis that error from api.Open?01:47
thumperwaigani_: that's fine, that is why I asked :)01:47
waigani_right01:47
thumperalthough...01:48
thumperby the time it hits api.Open01:48
thumperit should be "permission denied"01:48
thumperand nothing else01:48
* thumper adds another comment01:49
waigani_thumper: why perm denied? I'm giving the user perms in the test01:49
thumperwaigani_: no, in the general case01:49
thumperand you aren't giving the user perms, you are explicitly testing that they can't get in01:50
thumperthe error result should be "permission denied"01:50
waigani_right, because it's more info than you should share to say that the user is not found01:52
thumperright01:54
axw_oops, sorry wallyworld_, thought I had updated my blobstore when I added it to dependencies.tsv...02:03
wallyworld_no worries02:03
thumperdavecheney: I have rockne-02 up with the deb locally02:09
thumperdavecheney: but I can't remember how to install a deb02:10
thumperanyone?02:10
bcsallersudo dpkg -i package.deb02:12
thumperbcsaller: ta02:13
davecheneythumper: how the mighty have fallen02:15
thumperdavecheney: I don't claim to be mighty with dpkg02:15
thumpernor have ever02:15
thumperhmm...02:17
thumperjuju bootstrap tells me port 37017 is in use02:17
thumperhow can I get netstat to tell me if this is true02:17
thumperI did  'netstat -a'02:17
thumperbut that didn't show the port in use02:17
thumperam I missing something?02:17
thumperactually, I see it now02:18
thumperhmm...02:18
thumperhow to find out the process?02:18
thumperdavecheney: if I can bootstrap with the 1.18.4 deb specified with the local provider, and do status, is that verified fixed?02:22
thumpermwhudson: hey, around?02:23
mwhudsonthumper: yes02:23
davecheneythumper: yes, i think so02:24
thumperdavecheney: cool02:26
davecheneythumper: are you sure rockne doesn't have 64k pages ?02:56
davecheneyif you hit it with the apt-get upgrade hammer02:56
davecheneyit will be running 64k pages02:56
thumperdavecheney: yes, looked02:56
davecheneywelp, shitter02:56
thumperdavecheney: I just did upgrade, and not dist-upgrade02:57
thumperyou want me to try that?02:57
davecheneynope02:57
davecheneyuname -a02:57
* thumper is sshing in again02:57
davecheneygetconf PAGESIZE02:57
thumperLinux rockne-02 3.13.0-18-generic #38-Ubuntu SMP Mon Mar 17 21:41:16 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux02:57
thumperah...02:58
thumperwat02:58
* thumper did that just before02:58
thumperbut got a different result02:58
thumperubuntu@rockne-02:~$ getconf PAGESIZE02:58
thumper6553602:58
davecheneytextbook definition of insanity02:58
thumperit was 4096 when I looked just before02:58
davecheneymaybe that was your own host02:58
thumperperhaps02:59
davecheneyi reckon it's not an issue02:59
davecheneyyou did the test right02:59
* thumper is bootstrapping again02:59
thumperwhere did it fail last time?03:00
* thumper tags verification-done03:01
davecheneyonce any juju process had been running for  > 5 mins03:01
davecheneyjuju ssh some unit03:01
davecheneywait for 20 mins03:01
davecheneyno crash, all good03:01
thumperoh, it has to run for some time?03:02
thumperhmm03:02
* thumper bootstraps it and waits03:02
davecheneyyup, the bug is when the scavenger runs, it will try to munmap(2) an area of memory that isn't a multiple of the page size03:04
davecheneythis shows up on agents03:04
davecheneyand using juju ssh as the juju ssh parent process just sits there quietly03:04
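The page-size point above can be illustrated with a small Go sketch. It only checks whether a length is a multiple of a given page size and rounds it up if not; it is not the Go runtime's scavenger code, and the helper names isPageAligned and roundUp are made up for illustration. A length that is fine with 4096-byte pages stops being a whole number of pages on a 65536-byte-page kernel like the one on rockne-02.

```go
package main

import (
	"fmt"
	"os"
)

// isPageAligned reports whether n is a multiple of pageSize.
// pageSize is assumed to be a power of two (4096 and 65536 both are).
func isPageAligned(n, pageSize uintptr) bool {
	return n&(pageSize-1) == 0
}

// roundUp rounds n up to the next multiple of pageSize.
// pageSize is assumed to be a power of two.
func roundUp(n, pageSize uintptr) uintptr {
	return (n + pageSize - 1) &^ (pageSize - 1)
}

func main() {
	// os.Getpagesize reports the page size of the machine this runs on,
	// the same value that getconf PAGESIZE prints.
	pageSize := uintptr(os.Getpagesize())

	// Three 4k pages: aligned for 4096-byte pages, not for 65536-byte pages.
	length := uintptr(3 * 4096)

	fmt.Printf("page size: %d\n", pageSize)
	fmt.Printf("length %d aligned: %v, rounded up: %d\n",
		length, isPageAligned(length, pageSize), roundUp(length, pageSize))
}
```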
menn0thumper: PTAL https://github.com/juju/juju/pull/70903:14
davecheneythumper: double underpants, check out dmesg03:16
davecheneymake sure there are no oddball kernel messages there03:16
davecheneythat's the canonical check03:16
thumperdavecheney: well, machine 0 has been up over 15 minutes03:17
thumperhad 'watch juju status' running03:17
davecheneynup, that won't show it03:18
davecheneyjuju status only runs for a few seconds03:18
davecheneyso either the jujud daemons crash03:18
thumperdmesg seems fine03:18
davecheneylook, it's fixed03:19
davecheneyit's been fixed for months03:19
davecheneyif you use the right compiler03:19
* thumper has marked the bug as verified03:19
davecheneyjob done03:19
davecheneynext03:19
bcsallerany thoughts on why I can run lxc containers from inside a docker container but that the local provider fails to dial the state server on bootstrap?03:26
davecheneybcsaller: sounds like the networking is all fucked up03:27
bcsallerdavecheney: I was able to lxc-create/start etc. I manually brought up the lxcbr0 in the container and that seemed to work in the raw lxc case. w/o the bridge bootstrap was failing much sooner03:29
bcsallerso it still might be, but I'm not sure that it is03:29
davecheneybcsaller: what addresses and networks do the various components have ?03:34
bcsallerdavecheney: juju.state open.go:101 connection failed, will retry: dial tcp 127.0.0.1:37017: connection refused03:35
bcsalleris the failure I'm seeing x10003:35
bcsallerso it's not getting very far I think03:35
bcsallerI put lxcbr0 on 10.0.4.103:36
bcsallerand eth0 in the container is a 172. address03:37
thumpermenn0: did you figure out why your test was passing when you didn't think it should?03:39
bcsallerdavecheney: eh, looks like there still might be some issues with the lxc-container networking as well, so I'll keep debugging the setup03:40
=== allenap_ is now known as allenap
=== psivaa_ is now known as psivaa
waigani__thumper_: do we need to handle the error from ParseUserTag? s.doc.OwnerTag is guaranteed to be in the right format, right?04:12
=== _thumper_ is now known as thumper
* thumper thinks of how to best handle this...04:13
thumperwaigani_: as much as I find it a little frustrating, I think the only real approach is to return (names.UserTag, error)04:15
thumperand handle the error in the places where we need to04:15
thumperwhich is exactly one place04:16
waigani_thumper: yep04:16
thumperwe shouldn't ever get an error04:16
thumperbut I'd rather return an error that may one day be real04:16
thumperthan panic04:16
waigani_yeah, for sure04:16
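A minimal Go sketch of the approach agreed above: the getter returns a tag and an error instead of panicking on a malformed stored value. UserTag and parseUserTag here are simplified, hypothetical stand-ins for names.UserTag and names.ParseUserTag, and serviceDoc is reduced to the one field under discussion; the real juju types differ.

```go
package main

import (
	"fmt"
	"strings"
)

// UserTag is a simplified, hypothetical stand-in for names.UserTag.
type UserTag struct{ name string }

// String renders the tag in "user-<name>" form.
func (t UserTag) String() string { return "user-" + t.name }

// parseUserTag is a simplified stand-in for names.ParseUserTag.
// It accepts only strings of the form "user-<name>" with a non-empty name.
func parseUserTag(s string) (UserTag, error) {
	name := strings.TrimPrefix(s, "user-")
	if name == s || name == "" {
		return UserTag{}, fmt.Errorf("%q is not a valid user tag", s)
	}
	return UserTag{name: name}, nil
}

// serviceDoc keeps only the stored owner string discussed in the log.
type serviceDoc struct{ OwnerTag string }

// Service is a cut-down stand-in for the state Service type.
type Service struct{ doc serviceDoc }

// GetOwnerTag returns the parsed owner tag. The error should never occur
// for well-formed documents, but it is returned rather than panicking.
func (s *Service) GetOwnerTag() (UserTag, error) {
	return parseUserTag(s.doc.OwnerTag)
}

func main() {
	s := &Service{doc: serviceDoc{OwnerTag: "user-admin"}}
	tag, err := s.GetOwnerTag()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tag)
}
```

Callers that cannot do anything useful with the error can pass it up unchanged; as noted in the discussion, only one call site needs to handle it.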
davecheneywaigani_: what line ?04:30
waigani_davecheney: https://github.com/juju/juju/pull/71304:31
waigani_davecheney: state/service.go:62804:31
davecheneywaigani_: is it too late to not call the document OwnerTag04:34
davecheney'cos it's not04:34
waigani_davecheney: nop, what would you like it called?04:35
davecheneyanything, as long as it doesn't end with Tag04:35
davecheneythere are two reasons for this04:35
davecheney1. the data in there is not in tag string format04:35
davecheney2. william has decreed that tags shall not be stored in the database04:35
waigani_GetOwner ?04:36
waigani_thumper: ^?04:36
davecheneysgtm04:36
thumperdavecheney: unfortunately it is indeed a string version of a tag04:37
thumperdavecheney: and I think that 2. is flexible if it refers to a generic entity04:37
thumperbut in this case it certainly doesn't04:37
thumperit is only a user04:37
davecheneyok, if it is a tag04:38
thumperso it is a little more complicated04:38
davecheneythen it should be called OwnerTag and it must be passed through ParseUserTag04:38
thumperthere was the suggestion to remove it all together04:38
thumperand clean it up04:39
davecheneythumper: fair enough04:39
davecheneyi don't know the background04:39
davecheneyjust eating what's in front of me04:39
thumperit was an early attempt to deal with permissions04:39
* thumper nods04:39
davecheneys/eating/digesting04:39
thumperwaigani_: this is turning out to be much more of a PITA than I wanted04:42
* thumper is considering the whole kill it approach04:42
waigani_thumper: doing last round of testing04:42
thumpernuke it from orbit04:42
thumperit is the only way to be sure04:42
waigani_thumper: you want me to drop the branch?04:43
thumperwaigani_: I04:44
thumperugh04:44
thumperI'm thinking we may be throwing good effort after bad04:44
thumperand we should perhaps just clean up the mess04:44
waigani_ooooh04:44
thumperrather than pushing it into a nice pile in the corner04:44
thumperI'd like to clarify with fwereade04:45
thumperwaigani_: however, removing it has more changes04:45
thumperas all the deploy helpers now take a service owner04:45
thumperthat we would no longer need04:45
waigani_thumper: I'm just about done with this, shall I finish it off and push it up for reference if nothing else?04:45
thumperwaigani_: if you like, and we should get input from fwereade04:46
thumperwaigani_: don't spend too much more on it though04:46
waigani_understood04:46
thumperwaigani_: instead look at auditing the user manager functions that we have04:46
waigani_thumper: okay, where should I start with that?04:47
thumperwaigani_: look at what functions are implemented,04:47
thumpercompare CLI, api client, api server04:47
thumperand state04:47
thumperand look at strings vs. tag usage04:47
waigani_ah right, got it04:47
thumperI know there isn't consistency, but I want to know how inconsistent we are04:48
=== Guest9121 is now known as wallyworld
wallyworldaxw_: can you connect to cloud-images.ubuntu.com  ?04:50
axw_wallyworld: yep04:50
wallyworldsigh, i can't :-(04:50
menn0thumper: sorry, just saw this... yes I figured out why that test was passing - the test setup was wrong so it was passing for the wrong reason04:55
thumpermenn0: ok, in which case you should be good to go04:56
menn0menn0: cheers04:56
wallyworldaxw_: can you run "juju metadata validate-images" for me to look up a precise image id on ec2, since i can't access cloud-images05:05
wallyworldseems there's a routing issue :0(05:05
axw_sure05:05
wallyworldaxw_: ah, got connectivity again05:06
axw_okey dokey05:06
=== urulama-afk is now known as urulama
wallyworldaxw_: it appears there's a problem with trunk - i bootstrap with default-series=precise and machine 0 comes up ok. i deploy a charm, and machine 1 can't start: "no matching tools available"05:21
axw_hrm05:21
axw_I'll take a look05:21
axw_wallyworld: which provider?05:21
wallyworldok, ta05:21
wallyworldaws05:21
axw_and are you doing --upload-tools?05:21
wallyworldyep05:22
axw_hm, weird. ok05:22
wallyworldand also --upload-series=precise,trusty05:22
axw_that shouldn't do anything anymore05:22
wallyworldi'm running from a utopic client05:22
wallyworldthought so, just did it in case05:22
axw_you should get a deprecation warning for --upload-series... you did right?05:23
wallyworldyeah, i did05:23
axw_ok. I'll try and repro in a sec05:23
axw_wallyworld: what did you try to deploy?05:36
axw_ubuntu?05:36
wallyworldmysql05:36
axw_you didn't specify series?05:36
wallyworldno05:37
axw_k05:37
wallyworld  "1":05:37
wallyworld    agent-state-info: no matching tools available05:37
wallyworld    instance-id: pending05:37
wallyworld    series: precise05:37
axw_wallyworld: just worked for me... :(05:38
axw_wallyworld: can you check cloud-init-output.log on machine-0 for lines saying "Adding tools"05:38
wallyworldok, i'll try again a bit later and try and reproduce05:38
wallyworldi may have destroyed, i'll check05:38
axw_wallyworld: oh I have an idea what it might be05:39
wallyworldok05:39
axw_if you uploaded, then your uploaded tools will have series=utopic.. does our code know about utopic already?05:39
axw_actually, probably does...05:39
wallyworldshould do, but i wanted precise tools05:40
axw_wallyworld: yeah, what happens is the CLI uploads the tools it can build, and the bootstrap machine explodes them into each of the series of the same OS05:40
axw_by "the tools it can build" I mean the local series05:40
axw_hrm, actually it should be the series of the bootstrap machine not the local machine... will have to check it's doing the right thing05:41
wallyworldchecking machine-0, the only tools entry in cloud-init-output is 3b20f9692616c75f4df7326aed49efcfe520cbdeddeb39b8e19a59696e2975f8  /var/lib/juju/tools/1.21-alpha1.1-precise-amd64/tools.tar.gz05:42
axw_wallyworld: nothing saying "Adding tools"05:42
axw_?05:43
wallyworldnot that i can see05:43
axw_ok... can you please cat /var/lib/juju/tools/1.21-alpha1.1-precise-amd64/downloaded-tools.txt05:43
wallyworld{"version":"1.21-alpha1.1-precise-amd64","url":"file:///tmp/juju-tools260863187/tools/releases/juju-1.21-alpha1.1-utopic-amd64.tgz","sha256":"3b20f9692616c75f4df7326aed49efcfe520cbdeddeb39b8e19a59696e2975f8","size":8198295}05:44
wallyworldah look05:45
wallyworldutopic05:45
axw_right, that's a bug05:45
axw_thanks05:45
wallyworldyet machine 0 is precise05:45
axw_yeah, that URL is wrong and precise doesn't know about utopic, so it doesn't know it's Ubuntu05:46
axw_wallyworld: just live testing a fix now, do you want a patch while I write a unit test?05:59
wallyworldaxw_: it's ok, i have been able to test what i needed05:59
axw_cool05:59
wallyworldmongo syslog is being spammed :-(05:59
wallyworldi've reduced it, but it's still logging regularly about authenticating a user06:00
axw_hmm, actually that URL shouldn't make a difference, only the version should. hrrmmm.06:01
axw_I'll try faking my series06:01
axw_wallyworld: can you please review https://github.com/juju/utils/pull/2806:26
* axw_ checks OCR06:27
axw_asleeping06:27
axw_if you're too busy, I can wait06:27
axw_master is not happy with the apt retries though06:28
=== kwmonroe_ is now known as kwmonroe
axw_wallyworld: I can't reproduce the issue. I've forced my local series to utopic, still nothing. That URL doesn't matter, I was misremembering what it was used for06:53
axw_bootstrapped ec2 with default-series=precise, and deployed mysql with no issue06:54
tasdomasmorning07:02
tasdomasdimitern, ping?07:02
wallyworldaxw_: hmmmm, ok. i'll try again a bit later07:04
dimiterntasdomas, hey07:05
tasdomasdimitern, you pinged me yesterday - was afk at that moment07:05
axw_wallyworld: CI doesn't look particularly happy either, though.07:05
wallyworldaxw_: looks like the upgrade jobs at first glance07:06
dimiterntasdomas, yes, it was about the port ranges work, we'll be inheriting from you :)07:06
tasdomasdimitern, right - I'm addressing fwereade's comments as we speak07:06
dimiterntasdomas, can you give me a quick status update?07:07
tasdomasdimitern, fixing up the PR (https://github.com/juju/juju/pull/517)07:07
tasdomasdimitern, it's a large PR, fwereade requested that it be split up into smaller ones, unfortunately I won't be able to do that07:08
dimiterntasdomas, right, so how much time do you need?07:08
tasdomasdimitern, to finish fixing the PR?07:09
dimiterntasdomas, I can perhaps take over and finish it if you don't have the time?07:09
dimiterntasdomas, I heard your team is focusing on other things now07:10
tasdomasdimitern, that would be great07:10
tasdomasdimitern, I'll finish what I am working on at the moment07:11
tasdomasdimitern, do you want to have a hangout to discuss the port ranges work? Or do you want a small write-up on what's been done and what still needs to be done?07:11
dimiterntasdomas, ok, cool, I'll have a look to remember what's what and how to continue07:11
dimiterntasdomas, what works better for you?07:12
tasdomasdimitern, ok, ping me if you have any questions07:12
tasdomasdimitern, it doesn't really make a difference for me07:12
tasdomasdimitern, whatever works best for you07:12
dimiterntasdomas, ok, then I'd rather have the writeup summary, as I'm doing like 3 things now :)07:13
tasdomasdimitern, ok - you'll have it by lunch time (2-3 hours)07:14
dimiterntasdomas, thanks!07:14
tasdomasdimitern, no, thank you07:14
tasdomasdimitern, also, I've updated the PR https://github.com/juju/juju/pull/667 - when you have a sec, could you take a look?07:15
dimiterntasdomas, sure, looking07:15
dimiterntasdomas, LGTM07:33
tasdomasdimitern, thanks - I'll update the error message before landing07:33
dimiterntasdomas, sweet!07:34
axw_wallyworld: I have charms deploying without provider storage :)   needs some polishing and more testing before I can propose anything07:44
axw_also upgrade steps required this time07:45
TheMuemorning08:01
dimiternTheMue, morning08:07
TheMuedimitern: regarding the last comment yesterday: yes, the suite is running twice, once for v0 and once for v1, during the first run the test for a function introduced with v1 is skipped08:09
TheMuedimitern: this way it's easy to check if v1 doesn't break compatibility to v008:09
dimiternTheMue, yeah, I've seen this, but doesn't that seem awkward way of running the tests?08:09
dimiternTheMue, how is that better than having 2 separate v0- and v1-only suites?08:10
TheMuedimitern: I thought about it, but then you 1st need a base test you can embed into the real ones, and then 2nd you have one for v0 and one for v1 with almost the same content, in my case only one additional test. that's lots of redundant code08:12
TheMuedimitern: because each new version has to ensure that it doesn't break existing functionality08:12
dimiternTheMue, ok, that sounds good to me08:17
TheMuedimitern: yeah, spent some time yesterday how to organize it best and to see, where the lowest dependencies exist08:18
dimiternTheMue, cheers08:19
TheMuejam: would also like to discuss it with you, mast of API versioning :D08:20
TheMues/mast/master/08:23
fwereadeTheMue, well, new versions are surely there *because* we want to break existing functionality -- when things don't change, yes, you get a duplicated test; but when they do I think it will be very hard to adapt that style of test08:30
fwereadeTheMue, I understand where you're coming from08:30
fwereadeTheMue, might it make most sense to have per-method suites? so then you can run the same per-method suite against multiple versions, hopefully minimising duplication without falling into a situation where adding a new version involves adding a new layer of special-casing to an over-general full-facade suite?08:33
TheMuefwereade: yes, I simply want to ensure that all functions of a former version work like before while those which are added or changed surely behave different08:33
TheMuefwereade: could you please expand a bit?08:34
TheMuefwereade: did you take a look into my proposal?08:34
fwereadeTheMue, so, the concern is that having a single full suite with one special-case for one new method is defining the direction we'll take in the future08:34
TheMuesimply to synchronize better08:34
fwereadeTheMue, next method will be another special case08:34
fwereadeTheMue, and then next version there's a change in functionality for some method08:35
fwereadeTheMue, and whoever implements it will... add another special case08:35
fwereadeTheMue, and *very soon indeed* it will become straight-up impossible to understand what's happening in this single godlike test suite that actually tests slightly different things for all the api versions08:36
jamhazmat: I'm assuming you succeeded in building tokumx, but I've been struggling a bit. Did you grab their source control branches? What version? And did you use cmake or scons, as it looks like they want to switch to cmake (mongo itself uses scons), but I keep running into errors trying to build 1.5.008:36
fwereadeTheMue, I haven't seen the code we're talking about, though, I'm just going by what you said above08:36
TheMuefwereade: please take a look here: https://github.com/TheMue/juju/blob/capability-detection-for-networker/apiserver/machine/machiner_test.go08:38
TheMuefwereade: and I would like to see an outline of a per-method suite. this term sadly doesn't tell me a lot. ;)08:39
jamTheMue: a "Suite" object for each method, rather than one "Suite" for each Facade08:39
fwereadeTheMue, you have a suite that tests all the methods, but special-cases some of them08:39
fwereadeTheMue, I'm suggesting having lots of suites, defining our expectations of the behaviour of a single method each08:39
fwereadeTheMue, and registering explicitly only the tests we actually want to run08:40
TheMuejam: ah, thanks08:40
fwereadeTheMue, rather than mixing the what-to-test in with the how-to-test08:40
* TheMue tries to imagine how the code base will look like for a number of methods that are robust over time. 08:44
TheMueso a v0 test would be embedded into a v1 test and so on, and only when it breaks, e.g. at v7, a new implementation would be made?08:45
TheMuemy goal is a good compromise of test reusage and flexibility for changes over time.08:46
fwereadeTheMue, I'd rather avoid embedding anything at all anywhere really08:46
fwereadeTheMue, I'm imagining there'd be a TestGetMachines suite, which gets set up to run its tests against v1 of the API08:47
TheMueso let's say we have 5 suites for a v0, I add a new method, now have e.g. 6 suites for v1, and then in v2 I add two more and change one ...08:47
fwereadeTheMue, and all the other suites test against both v1 and v008:48
TheMuefwereade: no embedding, code duplication instead?08:48
fwereadeTheMue, where did I suggest we duplicate code?08:48
TheMuefwereade: that's why I ask08:48
fwereadeTheMue, you write one suite, that is capable of testing that some method implementation acts as expected08:49
TheMuefwereade: simply to get better aware of your thoughts ;)08:49
fwereadeTheMue, you then feed all the facade versions that you expect to have that behaviour into that suite08:49
fwereadeTheMue, so adding a new version is a matter of adding the new version to the suite for each method it still uses08:49
fwereadeTheMue, new method? new suite, targeting just that facade08:50
TheMuefwereade: ok, that's what I'm doing (when I get your word right), but for the whole suite with more than one method to test08:50
fwereadeTheMue, yes08:50
TheMueaaaaaaah08:50
fwereadeTheMue, I just want more granularity08:50
TheMuefwereade: instead of using the skipping or evel switches based on the version number inside the tests08:51
fwereadeTheMue, I think (particularly for the bigger facades) full-facade suites will become unmanageable really alarmingly fast08:51
TheMuefwereade: sounds cool08:51
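The per-method suite idea can be sketched in plain Go as follows. This is only an illustration of the shape fwereade describes, not the gocheck-based suites in the juju tree: the facade types, the Life and GetMachines method signatures, and the check functions are all hypothetical. Each check states the expectations for one method, and the registration lists name exactly which facade versions it runs against, so no test has to skip itself based on a version number.

```go
package main

import "fmt"

// machinerV0 and machinerV1 are hypothetical facade versions.
type machinerV0 struct{}

func (machinerV0) Life(id string) string { return "alive" }

type machinerV1 struct{}

func (machinerV1) Life(id string) string { return "alive" }

// GetMachines exists only from v1 onwards in this sketch.
func (machinerV1) GetMachines(ids ...string) []string { return ids }

// One small interface per method under test.
type lifer interface{ Life(id string) string }
type machineGetter interface{ GetMachines(ids ...string) []string }

// checkLife is the per-method "suite" for Life.
func checkLife(f lifer) error {
	if got := f.Life("0"); got != "alive" {
		return fmt.Errorf("Life(0) = %q, want %q", got, "alive")
	}
	return nil
}

// checkGetMachines is the per-method "suite" for GetMachines.
func checkGetMachines(f machineGetter) error {
	if got := f.GetMachines("0", "1"); len(got) != 2 {
		return fmt.Errorf("GetMachines returned %d results, want 2", len(got))
	}
	return nil
}

// runAll registers each check against the versions expected to pass it.
// Adding a facade version means adding it to the relevant lists.
func runAll() []error {
	var errs []error
	for _, f := range []lifer{machinerV0{}, machinerV1{}} {
		if err := checkLife(f); err != nil {
			errs = append(errs, err)
		}
	}
	for _, f := range []machineGetter{machinerV1{}} {
		if err := checkGetMachines(f); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}

func main() {
	fmt.Println("failures:", len(runAll()))
}
```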
fwereadeTheMue, on a separate note, what does Machiner need GetMachines for?08:53
fwereadeTheMue, ah, whether something's on a manual provider? what do we use that for?08:55
jamfwereade: IIRC for stuff that was on the Agent API but doesn't do anything for Unit agents, and thus is a Machiner responsibility08:55
TheMuefwereade: the whole branch is about the need for a safe networker. and here we needed the information if a machine is provisioned manually. first approach has been to retrieve the information extra, as it isn't needed so often.08:55
fwereadejam, TheMue: then that is *definitely* not a machiner responsibility -- the machiner doesn't start the networker08:56
TheMuefwereade: but review and discussion feedback has been to not make an extra call, so I changed the way we retrieve a machine info on the client side of the API08:56
fwereadeTheMue, jam: this feels like it should be a job, as communicated by the agent api, rather than tacking it onto an unrelated purpose-specific facade08:58
fwereadeTheMue, jam: am I confused about something?08:58
jamfwereade: so, previously there was an API on Agent that was giving you the Life of the entity you wanted, and a bunch of other Machine related stuff that didn't make sense for Unit agents.08:58
TheMuefwereade: what is the task of the machiner API?08:58
jamfwereade: GetEntity IIRC, looking08:58
TheMuefwereade: naive, by taking the term "Machiner" I would expect machine related API calls, like retrieving information about a machine08:59
fwereadeTheMue, set the machine to dead once it's marked as dying, and shut down08:59
fwereadeTheMue, it also sends network addresses once on startup which is a bit yucky08:59
fwereadeTheMue, the facades are all worker-specific09:00
fwereadeTheMue, they should be exactly what's needed for a remote worker to fulfil its (ideally *single*) responsibility09:00
TheMuefwereade: here's my problem from a maintenance perspective. wanting to do something related to machines it always pulls me to the term "Machine" or "Machiner", but never to something called "Agent"09:02
jamfwereade: so today we have AgentGetEntitiesResult which has 1 field that is actually shared, and then 2 fields that aren't meaningful for Unit agents, we would have been adding a 3rd. It felt better to split that out for Machine-Agent specific responsibilities.09:02
jamI see your point that Machiner is the worker, not the Machine-Agent api09:02
jambut do we have a Facade for just machine agents (vs all agents in general), do we want one? Is it just better to pull it out of Agent.GetEntities and make it something Agent.GetMachineDetails sort of thing?09:03
jamfwereade: ^^09:03
fwereadejam, TheMue: IMO the separate existence of unit agents is the anomaly -- making the agent api more machine-agenty doesn't seem to me to be a particularly major issue, because it echoes where we want to go anyway09:03
TheMuejam, fwereade: so maybe there's a need for two facades: "Machiner"/"MachineWorker" and "Machine"09:03
fwereadeTheMue, I don't think so09:03
fwereadeTheMue, what's the worker that uses "Machine"09:03
fwereade?09:03
TheMuefwereade: are only worker using the API?09:04
fwereadeTheMue, and agents; and external clients; but essentially, yes09:04
fwereadeTheMue, and an agent is almost a special case of a worker09:04
fwereadeTheMue, it's the "worker" that starts other workers09:05
fwereadeTheMue, and what we have hitherto done is (1) use the Jobs to figure out what to start09:05
fwereadeTheMue, or (2) pull hacky shite out of the agent config instead09:05
fwereadeTheMue, the latter is not good09:06
jamfwereade: so there is currently a bunch of code in api/agent/state.go that claims to be talking about an "Entity" but has stuff like "Entity.Jobs()" which returns []params.MachineJobs09:06
jamwhich doesn't fit very well on a generic "Entity" object.09:06
fwereadejam, agreed, that's not nice09:06
jamfwereade: I think the sentiment was lets pull it into something for Machine agents, and it got put over on Machiner. I think I'm in agreement that it shouldn't go there, but where *should* it go09:07
TheMuefwereade: ok, maybe here's my mistake, as to me the API is for more than just the worker. it's an API. and if I want to talk about machines I need somewhere to talk to.09:07
fwereadejam, IMO making the agent code more machine-agenty is a far lesser sin09:07
jamTheMue: the Facades design is about 1 Facade per worker09:07
jamso it isn't talking about Machines09:07
jamit is more that *if* your Worker needs to know about Machines then your corresponding Facade will have a Machines API call09:08
jamTheMue: eg, we won't have "juju" the CLI client talking to the Machine facade.09:08
fwereadeTheMue, if there's functionality that two separate facades need, you implement it separately from both, and embed (or passthrough if there's a different method name)09:08
fwereadeTheMue, the individual facades control auth, and if their functionality is unique it's generally in there too09:09
TheMuefwereade: ic, thanks09:09
fwereadeTheMue, shared implementations are in apiserver/common, and need a GetAuthFunc (supplied by the facade) to determine how they can be called09:09
* TheMue is astonished how this turns. looks like an almost pushed PR needs larger changes again. already had an LGTM and the change of the test code only has been to check how to better test the API :D09:12
TheMuefwereade: so in my case you would place that GetMachine() at the agent API?09:13
fwereadeTheMue, sorry, back: I'm wondering why we are not expressing an agent's responsibilities with *jobs*09:21
fwereadeTheMue, that's what they're for after all09:21
fwereadeTheMue, this feels like just another case of exposing inappropriate information to the agents09:22
TheMuefwereade: ok, fine for me, but my use case is: I need information about a machine09:23
fwereadeTheMue, why is it ok for the agent to know what sort of provider it's running on?09:23
TheMuefwereade: I need to know if a machine has been provisioned manually, because then always a safe networker is needed09:23
TheMuefwereade: we don't talk about the provider, but the machine09:24
=== rvba` is now known as rvba
TheMuefwereade: e.g. a manually provisioned machine on ec209:24
fwereadeTheMue, I thought you were asking about its provider type09:24
TheMuefwereade: or openstack09:24
fwereadeTheMue, -> you know about providers in the machine agent09:24
TheMuefwereade: sorry, expressed myself badly, no09:24
fwereadeTheMue, -> you are breaking layering09:24
fwereadeTheMue, surely the agent should know *nothing* about why or how it was provisioned09:25
TheMuefwereade: *sigh*09:25
fwereadeTheMue, I'm sorry to architect-tantrum at you09:26
fwereadeTheMue, but09:26
fwereadeTheMue, we have jobs, which we're meant to use09:26
TheMuefwereade: *rofl* no problem09:26
fwereadeTheMue, we have dirty hacks that get around jobs, that we kinda had to do because we "designed" the system without an api layer, and were hamstrung by compatibility09:26
TheMuefwereade: so, dear architect, what's your idea for determining if a safe or "non-safe" networker has to be used?09:27
fwereadeTheMue, we introduce new jobs, and use those to determine what workers to run09:27
fwereadeTheMue, the bad-but-once-acceptable way to do it is the explicit checking based on provider type and/or machine id (that we have still not managed to excise from jujud)09:28
fwereadeTheMue, the right way to do it now is to get rid of *all* those special cases, and use the fact that we can now change the api meaningfully to express the set of responsibilities that a machine agent can have, or not have09:29
TheMuefwereade: the idea has been to let the providers decide by implementing it as an environment capability09:29
fwereadeTheMue, sure, but that happens somewhere in the api server, and the machine agent shouldn't know or care09:30
TheMuefwereade: otherwise if this is a kind of job decided by the API server, then for each new provider implementation the server side API possibly has to be changed too. do I understand you right?09:31
wwitzel3davecheney: thanks for the review09:31
TheMuefwereade: because this also is a breaking for me, the idea of clean provider interfaces so that provider implementations can be plugged in and exchanged09:32
fwereadeTheMue, the two reasonable approaches I can see are (1) new job, that the MA uses to start appropriate workers; or (2) putting it in the Networker facade, such that the client side knows whether to run "safely" or not09:32
fwereadeTheMue, would you expand a little on what you expect to change there?09:32
davecheneywwitzel3: np09:32
fwereadeTheMue, isn't it still just a matter of the provider exposing whether you can safely mess with network interfaces on its machines?09:33
fwereadeTheMue, but we use that to figure out the machine jobs09:33
fwereadeTheMue, and we do that in a component that's allowed to know about providers09:33
fwereadeTheMue, then we express it to the agents in a form that's easy for the agents to consume09:34
fwereadeTheMue, which may or may not match the underlying internal data model09:34
TheMuefwereade: hmm, maybe I lost you here09:34
fwereadeTheMue, would you explain what change to the provider interfaces you're worried about?09:35
TheMuefwereade: nothing on the provider interfaces, only that the Agent API has to know about the existing providers and what they need to decide whether they need a safe or non-safe networker (I hate this term ;) )09:36
jamTheMue: so I think for what *we're* trying to accomplish, having a JobManageNetworks would be perfectly appropriate for deciding what kind of Networker we want to run09:37
jamand whether that Job gets added can be based on whether the machine was manually provisioned.09:37
fwereadeTheMue, maybe it's the agent api, maybe it's done at the state level09:37
jamTheMue: so what we care about is whether we should be managing /etc/network/interfaces09:37
fwereadeTheMue, all I care about at this point is that we not leak that information onto the agents themselves09:37
TheMuejam: to decide it we need to know which provider, which machine (bootstrap or not) and if it is manually provisioned09:37
jamTheMue: but that can be done at provisioning time, rather than when the agent is starting up09:38
fwereadeTheMue, but you *cannot know those things in the agent* if you care about coupling and layering and the consequences of ignoring those considerations09:38
jamTheMue: so we remove all of the special case inside the code, and just have it told by the thing that actually knew that information originally.09:38
* fwereade brb, don't stop talking09:39
jamTheMue: at least, AIUI, I also think we should bring dimitern in on this conversation.09:40
TheMuefwereade: if the API allows to retrieve the needed information (in a  generic way, GetMachines is also used instead of the old way to retrieve information about a machine on the client side) we can provide all needed information09:40
jamBut the idea is that when you want to ask the question "should X run in Y circumstance" that question can still be asked, we just need to ask it earlier and record it as whether or not an agent will be assigned a Job09:40
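A minimal sketch of the job-based layering argued for above, assuming simplified stand-in types. The names `MachineJob`, `JobHostUnits`, `JobManageNetworking`, `jobsFor` and `workersFor` are illustrative only and are not juju's actual API. The point it shows: the component that is allowed to know about providers and manual provisioning makes the decision once and records it as a job, and the agent only maps jobs to workers.

```go
package main

import "fmt"

// MachineJob is a hypothetical stand-in for a job assigned to a machine agent.
type MachineJob string

const (
	JobHostUnits        MachineJob = "JobHostUnits"        // illustrative
	JobManageNetworking MachineJob = "JobManageNetworking" // illustrative
)

// jobsFor runs server-side at provisioning time. The boolean stands for
// whatever the server concludes from provider, machine and manual
// provisioning; the agent never sees those inputs.
func jobsFor(safeToManageNetworking bool) []MachineJob {
	jobs := []MachineJob{JobHostUnits}
	if safeToManageNetworking {
		jobs = append(jobs, JobManageNetworking)
	}
	return jobs
}

// workersFor is the agent side: it knows nothing about providers and
// simply maps the jobs it was given to worker names.
func workersFor(jobs []MachineJob) []string {
	networker := "safe-networker" // default when nothing grants more
	workers := []string{}
	for _, job := range jobs {
		switch job {
		case JobHostUnits:
			workers = append(workers, "deployer")
		case JobManageNetworking:
			networker = "networker"
		default:
			// Unknown jobs are skipped in this sketch.
		}
	}
	return append(workers, networker)
}

func main() {
	fmt.Println(workersFor(jobsFor(false))) // e.g. a manually provisioned machine
	fmt.Println(workersFor(jobsFor(true)))
}
```

With this shape, adding a new provider changes only the inputs to `jobsFor`; the agent code stays the same.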
dimiterni'm here09:41
* dimitern reads a lot of scrollback09:41
TheMuedimitern: followed this interesting discussion? ;)09:41
jamTheMue: so for example, ContainerType is also a bad API09:42
jaminstead, it should be a JobRunLXCProvisioner09:42
dimiternTheMue, nope I'm afraid, I'm trying to write a manual procedure for making an addressable container in ec2 and maas09:42
jamor something along those lines.09:42
TheMuejam: there maybe already several ones, yes09:42
TheMuejam: maybe my thoughts of an API, what I understand as an API, are a bit naive09:43
jamTheMue: at least as I am "channelling my inner fwereade" the idea is that we can look at the questions we're asking, and figure out if they are appropriate or whether someone else should just be giving the answer.09:43
jamTheMue: it isn't so much about API vs not API09:43
jambut what questions should be asked and who is responsible for knowing the answer.09:43
dimiternfwereade, TheMue, jam: my concerns align pretty well with "<fwereade> TheMue, and *very soon indeed* it will become straight-up impossible to understand what's happening..."09:44
* jam has to go take the dog out before it gets messy, brb09:44
TheMuejam: this discussion has been in the beginning, the whole change has been about adding an environment capability implemented by the providers to decide, which networker to use09:44
tasdomasdimitern, I've shared a doc with you09:46
TheMuedimitern: we're not talking about testing anymore, more about responsibilities09:46
tasdomasdimitern, and pushed my latest changes to the port ranges PR09:47
TheMuedimitern: what information is retrieved from where so that the thing currently implemented as environment capability can decide which networker to start09:48
wallyworldaxw_: just got back from soccer; say message; niiiice09:48
wallyworldsaw09:48
TheMuedimitern: or if the networker can decide it internally by communicating with the Agent API which then decides based on provider, machine id, and manual provisioning which one to take09:49
TheMuedimitern: so (1) passing information to client/worker and decide there or (2) passing information to according API and decide there?09:49
fwereadeTheMue, dimitern, jam: cmd/jujud/machine.go:50709:50
axw_wallyworld: I'm rewinding a bit to improve things, but it shouldn't be too far off09:50
wallyworldok09:50
fwereade// TODO(axw) 2013-09-24 bug #122950709:50
fwereade// Make another job to enable storage.09:50
fwereade// There's nothing special about this.09:50
mupBug #1229507: create a machine job for machines/environments that provide local storage <local-provider> <manual-provider> <tech-debt> <juju-core:Triaged> <https://launchpad.net/bugs/1229507>09:50
* axw_ slinks into the shadows09:51
fwereadeTheMue, dimitern, jam: the other place we do it is in deciding whether to start the authentication worker09:51
fwereadeTheMue, dimitern, jam: I *think* those are the existing dependencies in jujud09:52
fwereadeaxw_, you couldn't do it then, we still had to worry about sending jobs that agents didn't understand09:52
axw_ah yes09:52
fwereadeaxw_, I think we're fine now, because we can implement a new Jobs method that can send more values09:53
fwereadeaxw_, and be sure that nobody's going to call it without being prepared09:53
dimiterntasdomas, thanks, I got it, will look a bit later09:53
* dimitern is still catching up to the current discussion.. 09:54
fwereadedimitern, short version: <architect-tantrum>the agents must not know about providers! (oh, and we shouldn't jam agent methods onto the machiner)</architect-tantrum>09:55
dimiternfwereade, I'm +100 for this09:55
TheMuefwereade: inside the PR the agents DON'T know about the provider09:56
TheMuefwereade: they simply delegate the decision to the current provider by using an environment capability09:56
dimiternfwereade, I mean agents not knowing about providers, but capabilities implemented by providers and checked by the agent?09:56
TheMuedimitern: yes, as we discussed, this is how it works inside the PR09:57
TheMuedimitern: but I don't need to tell you, you know it :)09:57
fwereadeTheMue, you have added an IsManual field to the api09:58
fwereadeTheMue, that is *explicit* information about the provider type, exposed to the agent09:58
TheMuefwereade: please, no, not the provider09:58
fwereadeTheMue, the agent now needs to care about what it means for something to be a manual provider09:58
TheMuefwereade: it's about if it is provisioned manually, even in ec2, openstack, azure ...09:58
dimiternfwereade, not exactly, IsManual is about a machine being manually provisioned or not09:59
TheMuefwereade: it's not about the manual provider, definitely not09:59
fwereadeTheMue, dimitern: what provisions manual machines?09:59
dimiternTheMue, well, it kinda is09:59
axw_a manual machine can technically be in a non-manual provider environment09:59
dimiternfwereade, but it's the property of a machine, isn't it?09:59
fwereadeaxw_, right, but that machine's provider is not, say, ec210:00
axw_it is at the moment, because we don't have per-machine providers10:00
axw_we have a per-environment one10:00
fwereadeaxw_, dimitern, TheMue: we weren't able to explicitly tag machines with their provider, it's true10:00
fwereadeaxw_, dimitern, TheMue: remind me, how do we prevent the provisioner trying to do things with those machines?10:01
hazmatjam, i built it in a trusty cloud container10:01
axw_fwereade: it doesn't know about those instances, so it leaves them alone10:01
dimiternfwereade, they are already provisioned perhaps?10:01
dimiterni.e. have instance id10:01
fwereadeaxw_, dimitern, TheMue: hmm, makes sense -- and when they die?10:01
hazmatjam, i can give you my binary, i think i have instructions somewhere as well extracted from my bash history10:01
TheMuefwereade: maybe again a leak of information on my side. what is the intention of Machine.IsManual in state?10:02
hazmatjam, http://paste.ubuntu.com/8298506/10:02
hazmatjam, re build recipe10:02
axw_fwereade: I don't understand the question10:02
dimiternfwereade, they are destroyed as usual, but how the provisioner doesn't reap them you mean?10:03
axw_the provisioner doesn't destroy those instances, because they're not things under its management10:03
axw_again because it doesn't know about the instance IDs10:03
fwereadeaxw_, ah ok, we ask the provider for instances with X ids, we get back errpartialinstances and ignore the missing ones?10:04
axw_fwereade: I'm afraid my memory on specifics is a bit hazy10:05
hazmatjam, and my binary (w ssl) is @ https://www.dropbox.com/s/dbcrgahxxyt8buv/tokumx-1.5.0-linux-x86_64.tgz?dl=010:05
axw_fwereade: but that sounds about right.10:05
axw_fwereade: TBH, I think a job is just as applicable to manually provisioned machines as it is to manual provider type10:05
hazmatjam, i'd give it a go with the compile again using the build recipe (lxc container)10:05
hazmator clean env10:06
axw_I haven't looked at the PR in question though10:06
jamhazmat: I thought I was using exactly the same thing, but I'm getting: http://paste.ubuntu.com/8298512/10:07
hazmatjam, sorry can't help more than that atm, in the middle of a sprint and pair programming10:08
dimiternfwereade, so to summarize my point, having IsManual on the machiner facade is a *good* thing I think, it's not provider-specific; this allows us to define the capability across providers; my only contention was with the way this is tested wrt api versions10:08
TheMuefwereade: the returned IsManual talks about juju/state/machine.go:270. will this function only return true when we use a manual provisioner?10:08
jamhazmat: np10:10
TheMuedimitern: yeah, here fwereade had a good idea for test that will go in next (when we solved the current topic). per-method suites running for the respective versions. so no huge suite and no skipping or branching inside10:10
jamI'll try it again10:10
TheMuedimitern: I like it10:10
jamhazmat: thanks for the pointers10:10
dimiternTheMue, ah, good - reading scrollback again to get context10:11
TheMuedimitern: yes, it's a good approach, especially when API change more and more over time10:13
jamdimitern: so I certainly think from a "what type of networker do we run" it fits better on a Jobs basis.10:14
dimiternjam, thinking about it now, yes it indeed does work better as a job, but there's one caveat10:15
dimiternjam, it's not just about the networker, that's why I keep forgetting to ask TheMue to rename the capability to "RequiresSafeNetworking" perhaps, to point out that it applies to all networking (incl. what we do in cloud-init on maas)10:16
TheMuejam: so not the the-provider-knows-it-best-approach, but the a-central-instance-called-api-knows-it-best-approach?10:16
TheMuejam: here I dislike the idea, that the logic on the server-side has to know about the provider10:17
TheMuejam: in (1) we retrieve information from the server-side and let the provider (capability) decide, in (2) we send information to the server-side and make the decision there10:18
TheMuejam: in both solutions information from one side is passed to the other side10:19
voidspaceHmmm... it's looking increasingly like we're stuffed with ipv6 support without some help from mgo10:19
dimiternTheMue, the api *definitely* knows what provider is used btw10:19
TheMuejam: and as an old friend of bottom-up I like it more to pass information from the server to the client than vice versa10:19
TheMuedimitern: yes, but is this good?10:20
dimiternvoidspace, no luck finding a workaround for ipv6 format to pass as arg?10:20
jamTheMue: my point is that at the time you do provisioning you determine whether it is safe to control networking on that machine, and then record that information as a Job10:20
dimiternTheMue, good or bad, it's unavoidable10:20
voidspacedimitern: mgo keeps the cluster addresses from the addresses we pass in10:20
TheMuedimitern: hehe, maybe I'm just too old-school bottom-up10:21
voidspacedimitern: and mgo requires ipv6 addresses in one format (the correct one) and mongo requires another format10:21
voidspacedimitern: so neither works, and as far as I can tell so far I can't work around it at the level above10:21
voidspacedimitern: still digging into exactly how mgo gets the cluster addresses10:21
voidspacebut it's not simple code10:21
dimiternvoidspace, so does it seem like a bug in mgo?10:21
TheMuejam: hmm, decision on client, storage as Job, retrieval when needed? did I get you right? that sounds like a clean approach to me10:22
jamdimitern: well probably a bug in Mongo that mgo needs to work around10:22
voidspacedimitern: not *really*10:22
voidspacedimitern: it's a bug in mongo that mgo makes it impossible to work around :-)10:22
jamTheMue: well, it would still be mostly determined inside the API Server (I believe), as that's the thing where you are saying "add this manually provisioned machine to your list of machines to control"10:23
voidspacedimitern: mgo requires the *right format* (because it just passes addresses through to net.Dial functions)10:23
voidspacedimitern: but mongo can't work with them10:23
voidspaceI guess no-one is using mgo with ipv610:23
jamvoidspace: so we write our own "dial" functions, so we can patch them as needed, can't we?10:23
jamvoidspace: I haven't seen this particular bug that you're describing, is it in the traceback?10:23
voidspacejam: mgo code calls net.DialTimeout - from the Go standard librarty10:24
voidspacejam: I worked out why the ipv6 test fails sometimes10:24
voidspace*library10:24
voidspacejam: so a "different Dial function"10:24
voidspacejam: can we patch the Go standard library?10:24
voidspacejam: if calling Set causes a primary renegotiation (doesn't seem to happen every time)10:24
voidspacejam: then mgo calls syncServers10:24
voidspacejam: this uses net.DialTimeout to check it can reach servers10:25
jamvoidspace: line 114 of mongo/open.go10:25
TheMuejam: it is determined indirectly by adding more information than today, but the decision itself is done on the client side based on this information10:25
jamwe define our "what do you use to call mongo"10:25
voidspacejam: this is a call to net.DialTimeout inside mgo10:25
voidspaceand is different from the Dial functions we use10:25
voidspacejam: mgo is using it to check that cluster members are up10:25
jamvoidspace: k, so arguably the mgo bug is (a) that it isn't using our dial function10:25
jambecause we're using TLS anyway, so Dial would really fail10:26
jamI guess you could connect to the port10:26
voidspaceit doesn't fail10:26
voidspaceyeah, it's just a connect10:26
jambut you couldn't talk to MongoD there10:26
voidspacenet.DialTimeout requires ipv6 addreses to be in the form [::1]:port10:26
voidspacewith square brackets10:26
voidspaceand if you don't have them the dial fails with "too many colons in address"10:26
voidspacebut mgo discards the actual error and just reports "no reachable servers"10:27
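The failure described above can be reproduced without mgo or a running mongod. The sketch below uses `net.SplitHostPort` from the Go standard library, which applies the `host:port` parsing rules that the dial functions rely on; `checkDialAddr` is a hypothetical helper and 37017 is just an example port.

```go
package main

import (
	"fmt"
	"net"
)

// checkDialAddr reports whether addr is in the host:port form that the
// net package accepts. It only parses; it never opens a connection.
func checkDialAddr(addr string) error {
	_, _, err := net.SplitHostPort(addr)
	return err
}

func main() {
	for _, addr := range []string{"[::1]:37017", "::1:37017"} {
		if err := checkDialAddr(addr); err != nil {
			// The unbracketed form is rejected with a
			// "too many colons in address" error.
			fmt.Printf("%-14s rejected: %v\n", addr, err)
			continue
		}
		fmt.Printf("%-14s ok\n", addr)
	}
}
```

Logging the returned error, rather than collapsing it into a generic "no reachable servers", is what makes the cause visible.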
voidspacejam: however due to the mongo bug we discovered a while ago, we can't start an ipv6 replicaset unless we use the address form *without* square brackets10:27
jamvoidspace: so this is line 399 of mgo.v2/cluster.go ?10:28
jam"dial with UDP and a 10s timeout "?10:28
voidspacejam: yep10:28
voidspacejam: I added an extra log line there and you see the error...10:28
jamvoidspace: so I think the fix is that we tweak the "getKnownAddrs" code to handle the mongo ipv6 badness10:29
jamso change cluster.getKnownAddrs10:29
jamso that if it sees an address like "fe08::1:12345"10:29
voidspacejam: or even just resolveAddr10:29
jamit knows to call that "[fe08::1]:12345"10:29
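A minimal sketch of the translation jam describes, using only the Go standard library. `bracketIPv6` is a hypothetical helper, not a function in mgo or juju. It assumes every address carries a port, because the unbracketed form is otherwise ambiguous: a trailing group of four or fewer hex digits could be read either as a port or as the last group of the address.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// bracketIPv6 converts a mongo-style "fe08::1:12345" address into the
// "[fe08::1]:12345" form that Go's net package expects. Addresses that
// already parse as host:port are returned unchanged.
// Assumption: the text after the last colon is always a port.
func bracketIPv6(addr string) string {
	if _, _, err := net.SplitHostPort(addr); err == nil {
		return addr // hostname, IPv4 or already-bracketed IPv6
	}
	i := strings.LastIndex(addr, ":")
	if i < 0 {
		return addr // no port at all; leave it alone
	}
	host, port := addr[:i], addr[i+1:]
	if !strings.Contains(host, ":") || net.ParseIP(host) == nil {
		return addr // not an IPv6 literal; do not guess
	}
	return net.JoinHostPort(host, port) // adds the square brackets
}

func main() {
	for _, addr := range []string{"fe08::1:12345", "[fe08::1]:12345", "localhost:37017"} {
		fmt.Printf("%s -> %s\n", addr, bracketIPv6(addr))
	}
}
```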
jamvoidspace: I'd rather have real addresses in memory as much as possible10:29
voidspacejam: we have to be careful that "fixed addresses" don't leak back to mongo10:29
jamand only translate at the exact "talking to mongo" boundary10:30
voidspacebecause mongo can't parse them in that format10:30
jamvoidspace: sure, but I'd still rather have a very clear "this is where we're translating for mongo" and it should live in mgo, and we should get rid of our hack-arounds in juju10:30
voidspacejam: but it's largely serialised configs we're sending10:31
jamvoidspace: certainly you would agree that "mgo" should be where it knows the details of how Mongo works10:31
voidspacejam: so "fixing at the boundary" means de-serialising and re-serialising10:31
jamvoidspace: all of the replicaset code was intended to live inside mgo10:31
jamvoidspace: natefinch implemented in Juju as a prototype to see it working10:31
jamwith the intent that it migrates into mgo proper10:31
voidspacejam: we serialise replicaset configs and just call session.Run w10:31
jamonce the API seemed to be appropriate and working10:31
jamvoidspace: so for some amount we can have it in "replicaset" as that is "logically" mgo code10:32
voidspacejam: yes, but mgo passes serialised data straight through10:32
voidspaceour replicaset code can strip out the square brackets from member addresses - and add them back in10:32
jamvoidspace: so if my last lines weren't clear10:32
jam"it should be in mgo, but 'replicaset' can be treated as mgo code"10:33
voidspacebut that isn't sufficient because getKnownAddrs needs to change too10:34
jamvoidspace: this is one of those cases where if we had separated out the "struct for serialization" from the "struct in memory that you use to get stuff done" it would be clearer.10:34
voidspaceright, but then mgo would need specific code for every possible mongo command10:34
jamvoidspace: so userSeeds, dynaSeeds, servers.Slice should likely all already have real [dead:beef::1] addresses.10:34
voidspaceby "should", you mean "need fixing"?10:35
jamvoidspace: as in "logically should be done", and probably needs a patch, yes.10:35
jamvoidspace: at least, if I was doing the code, I would want our in-memory representation to hold "correct" values, and translate at the boundary10:36
jamlike you do for UTF8 / Unicode / byte strings / user encodings.10:36
voidspaceso long as there's no way for an "unfixed" address to leak10:36
jamvoidspace: it will, but you can treat that as a bug10:37
voidspaceand as mgo is a low level driver that basically allows us to send whatever to mgo, it's very hard to guarantee that10:37
jamjust like user encodings often leak all over the place10:37
jamvoidspace: so as you can always call session.Run sure, there are cases where the user has to do the work, but mgo should be the abstraction over 90% of that.10:37
voidspacewell, yes - and we can try and patch our code everywhere we find holes and fix all the bugs as we find them10:38
voidspaceor we have one function to do the fix and we speak mongo native addresses everywhere else10:38
jamvoidspace: I don't think we want to think in terms of Mongo bad-ipv6 addresses10:38
jamas then *those* leak in our code10:38
jamas they have already done here10:38
jamand we know those are bad formats10:38
jamI'd rather have good addresses leak10:38
jamthan bad ones10:38
voidspaceheh10:39
voidspacewell, I don't disagree10:39
jamvoidspace: hence why you try to make what you keep around "correct" as much as possible, because at best it exposes bugs in other people's stuff when you accidentally hand them the right thing.10:39
voidspacejust that a general fix *really* means a layer over session.Run and deserialising and checking all commands10:39
jamvoidspace: I don't think we have to fix session.Run10:39
jamyou don't really fix "exec.Command"10:39
voidspaceand we *still* need to fix mgo as well *anyway*10:39
voidspaceyou have to layer over the top of it10:40
voidspacefixing the replicaset functions to translate at the boundary is easy enough10:40
voidspaceso I'll go down this path10:40
jamvoidspace: so my grep for "\.Run(" only points to state.Presence10:41
jamand replicaset10:41
jamand *replicaset* is meant to be in mgo eventually, so it is allowed and must be made correct.10:41
jamand state.Presence is doing something that mgo actually added support for10:42
jamwell, maybe it didn't expose it10:42
jamThere is a "cluster.isMaster" function10:42
jamthat calls session.Run("ismaster") which is what we are duplicating in our code.10:42
jamvoidspace: so again, think of the "replicaset/" directory as though it should live in mgo.v210:43
voidspacejam: sure, that's not the issue10:43
jamand I think you can see the layering that I'm proposing.10:43
voidspacejam: we *know* that fixing resolveAddr solves the immediate problem without risk of leaking "wrong" addresses10:43
jamcalling session.Run from user code means 'mgo' isn't doing its job10:43
jamvoidspace: maybe, but I think it is still the wrong fix10:43
voidspacejam: the fix you're suggesting is a lot more work *and* a much higher risk of introducing tricky bugs10:43
jamvoidspace: it means we maybe sort of sometimes think in terms of almost IPv6 addresses in memory.10:44
voidspacethat we could be playing whack-a-mole with in production for our customers10:44
voidspaceand requiring new releases of mgo to solve, so out of the team's hands for actually delivering a fix10:44
jamvoidspace: resolveAddr is a bugfix to mgo10:45
jamI think that's an invalid argument10:45
voidspacejam: we do one bugfix instead of n10:45
jamvoidspace: I don't think we have N10:45
voidspacewhere n is potentially unbounded :-)10:45
jamwe know that today nobody is using IPv6 with mgo and mongo10:45
jambecause it doesn't work10:45
voidspaceright10:45
jamvoidspace: I really think you're overstating it10:46
jamhaving correct addresses in memory *is the fix*10:46
TheMuejam, voidspace: hangout?10:46
voidspacejam: I guess in terms of encoding, you're suggesting mixing encoded / decoded - with mojibake risk10:46
voidspaceI suggesting we stay decoded...10:46
jamand while MGO does allow you to poke at the internals of the DB, it isn't *how user code is meant to look*10:46
voidspace*encoded10:46
voidspacedammit10:46
voidspacegetting my metaphors wrong10:46
jamvoidspace: TheMue: joining10:46
voidspacejam: I may well be overstating it10:47
voidspacejam: I'll talk to Gustavo about it10:47
voidspaceTheMue: dimitern: neither my mic nor camera are working10:48
jamdimitern: I had to plug in my headphones, and I think the sound settings are wrong, brb10:48
TheMuejam: each time you're frozen10:49
jamTheMue: strange, as I can follow you guys just fine10:49
jamTheMue: dimitern: voidspace: k, I'll type to respond to you guys10:51
jambut I can follow you without problem10:51
jamdimitern: so what's up with you today10:51
jamdimitern: feel free to run the meeting since people can't follow me well10:52
voidspacemy webcam works fine10:52
voidspace"cheese" uses it no problem10:52
voidspaceit's chrome10:52
voidspace*grrr*10:52
TheMuevoidspace: hehe, I almost never use chrome anymore, but the fox10:52
jamdimitern: do you agree that the Networker worker shouldn't be deciding what mode to be run in, but it should be a Job ?10:57
=== mup_ is now known as mup
dimiternjam, the networker does not decide on its own, it's started in either safe mode or not10:58
jamTheMue: not that grouping for Facades10:58
jamdimitern: sure but it makes sense (to me) that the Agent doesn't decide its tasks, but is given them10:59
jamand the logic of whether that task should be run is determined elsewhere.10:59
jamand encoded as "Jobs"10:59
dimiternjam, but using a job works for me, except that little quirk about disabling cloud-init scripts for maas10:59
jamdimitern: so there is a bit of duplicating logic, but only because we want to get rid of the cloud-init step anyway11:00
voidspacesorry, rejoining11:00
jamdimitern: we should still do bridging in the Networker, IMO11:00
jamdimitern: *today* what we have been doing in cloud-init should be done in the Networker11:01
jamdimitern: irrespective of the new MaaS api11:01
jamdimitern: we can do the same logic we have today11:02
jamwhich is "always bridge eth0"11:02
jambut grow into better logic11:02
jamvoidspace: so they only go in via "replicaSet"11:06
jamvoidspace: so the issue is that server.Addr still has the bad ipv6 address11:07
jamvoidspace: so from what I can see server.Addr is the one that we pass to newServer11:08
jamso cluster.server() is the other place that is setting it11:09
jamvoidspace: and that is being called by spawnSync11:09
jamwhich got the result of resolveAddr11:09
jamand got that addr11:09
jamultimately from an IsMaster call11:10
jamwhich is again ReplicaSet related, and we should be able to patch it at that level11:10
voidspacejam: which newServer?11:11
jamvoidspace: so I think we can patch line 140 of cluster.go to know it needs to translate back11:11
jammgo/server.go newServer11:11
jamAdd a check there11:11
jamthat we don't have an invalid IPv6 address11:11
voidspaceah, we have a newServer too11:11
voidspacewhich doesn't take an address11:11
voidspaceI've done a pull on my mgo so I can look at the latest version11:12
voidspaceso our lines aren't matching up11:12
voidspacelet me go back11:12
* jam goes to pick up my son11:12
jamvoidspace: so looking at 'master'11:25
jammgo.v2/server.go newServer11:25
jamis where Addr seems to be getting set11:25
jam(I didn't find another spot)11:25
jamthat seems to be called from cluster.go 394 "server()" and the addr is passed in11:25
jamvoidspace: and that is only being called by line cluster.go 45711:26
jamthat addr comes from the call to spawnSync11:26
jamwhich gets it from knownAddrs or from a hosts list11:26
jamhosts is from the result of syncServer, which gets it from a results object11:27
jamwhich is the result of calling ismaster11:27
jamgetKnownAddrs doesn't talk to mongo, but just pulls together all of the objects it already has in memory11:27
jamso I'm reasonably comfortable11:27
jamsaying the patch could be:11:28
jama) add a trap in server.go newServer that doesn't allow Addr to be an invalid IPv6 address (can probably use net.ParseAddr for that)11:28
jamb) fix cluster.go line 136 isMaster call to call, fill the result object, and then fix the result object to have valid addresses11:28
jamc) fix our replicaset/ package to do similar things11:29
jamc-i) we duplicate IsMaster, so we need to duplicate the fix11:29
jamc-ii) CurrentConfig probably needs fixing11:30
jamc-iii) Not sure about CurrentStatus, but probably11:30
jamc-iv) And Initiate and applyRelSetConfig would need fixing11:31
jamthough probably that is a helper that takes a Config11:31
jammungeIpv6Addresses(*Config)11:32
jam1voidspace: my machine just locked up. It is working well enough to lock the screen, but all text entry fields are not working.11:51
jam1voidspace:11:51
jam1so I don't know if you got my earlier message and whether it made sense11:51
mattywfwereade, ping?11:54
fwereademattyw, pong11:54
jam1fwereade: so if we add a JobManageNetworking, that requires an API bump, doesn't it?12:09
perrito666late good morning12:14
fwereadejam1, yeah, I think it does12:15
fwereadejam1, we don't really want to confuse old clients12:15
fwereadejam1, even if they would probably handle it with a minor logged whine12:15
jam1mmmmm, wine12:15
fwereade:)12:15
jam1fwereade: I think they would actually just casually discard it because the checks I see have an empty "default:" section.12:16
jam1but yes12:17
jam1I'm fine saying that it must be a new API when the set of values can change12:17
fwereadejam1, ah, I thought I remembered them logging an "unknown job" -- but indeed, I think we agree anyway ;)12:17
wallyworldjam1: i ran an ensure-availability test with my mongo login changes - the new state servers appeared to correctly start and juju status shows everything is ok. all-machines log looks ok too12:17
jam1wallyworld: are we sure that clusterAdmin is respected with 2.4 ?12:17
jam1because I'm sure machine-0 is the one that is setting up the replicaset12:18
wallyworldjam1: this is on a trusty state server12:18
wallyworldwhich is mongo 2.4.9 i believe12:18
wallyworldthe changes are only a band-aid anyway :-(12:19
wallyworldi can't see a way to turn it off12:19
jam1wallyworld: so at this point it seems like we'd have to dig into the mongo code and figure out why it is emitting the warning, and I'm guessing it is a bug in mongo.12:21
wallyworldjam1: yeah, i did link 2 very similar bugs in the juju-core bug report12:21
wallyworldmongo bugs that is12:21
wallyworldany they are marked as targetted at 2.712:21
wallyworldand12:21
wallyworldso i can't see any fix coming for 2.412:22
jam1wallyworld: I agree that mongo won't fix it, I'm guessing it isn't something we can fix ourselves, unless we can do some post-config on syslog12:22
voidspacejam1: hey12:29
jam1voidspace: heya12:29
voidspacejam1: sorry, missed your messages12:29
voidspacejam1: pretty sure I saw all your messages12:30
jam1voidspace: k. does it sound reasonable ?12:30
voidspacejam1: yep12:30
voidspaceCurrentConfig and CurrentStatus definitely need fixing, plus Add and Set12:30
jam1I think Add/Set end up using the same apply helper12:31
voidspaceor maybe just fixing applyRelSetConfig (which I've renamed applyReplSetConfig because it annoyed me) would do Add and Set automatically12:31
voidspaceright12:31
voidspacethey take a config which has Members and it's Members that needs fixing12:31
jam1voidspace: so I'd rather not mutate Members, but instead use an internal munged Members to pass on12:32
voidspacejam1: yep12:32
jam1voidspace: though that depends on whether you get a Members or a *Members12:32
voidspacejam1: although I think we create the config12:32
jam1voidspace: then it can just be the config thing that we mutate12:33
voidspacejam1: it's internal12:33
jam1which I think was my "mungeIPv6Addresses(*config)" suggestion12:33
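A sketch of the "munge a copy, not the caller's Members" idea discussed above. The `Config` and `Member` types are simplified stand-ins and `mungedForMongo` is a hypothetical name; the real replicaset package may differ. The detail worth noting is that assigning a Go struct copies its fields, but a slice field still shares its backing array with the original, so the members have to be copied explicitly.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Member and Config are simplified, hypothetical stand-ins for the
// replica set configuration types.
type Member struct {
	Id      int
	Address string
}

type Config struct {
	Name    string
	Members []Member
}

// unbracket turns "[::1]:37017" into the "::1:37017" form that, per the
// discussion, mongo needs. Other addresses are returned unchanged.
func unbracket(addr string) string {
	host, port, err := net.SplitHostPort(addr)
	if err != nil || !strings.Contains(host, ":") {
		return addr
	}
	return host + ":" + port
}

// mungedForMongo returns a copy of cfg with member addresses translated
// for mongo. The caller's cfg, including its Members slice, is untouched.
func mungedForMongo(cfg Config) Config {
	out := cfg // copies the struct; out.Members still aliases cfg.Members
	out.Members = make([]Member, len(cfg.Members))
	for i, m := range cfg.Members { // m is a copy of each element
		m.Address = unbracket(m.Address)
		out.Members[i] = m
	}
	return out
}

func main() {
	cfg := Config{Name: "juju", Members: []Member{{Id: 1, Address: "[::1]:37017"}}}
	sent := mungedForMongo(cfg)
	fmt.Println("in memory:", cfg.Members[0].Address)
	fmt.Println("to mongo: ", sent.Members[0].Address)
}
```

Taking `Config` by value rather than `*Config` is a choice made for this sketch; a pointer-based variant would need the same explicit slice copy to avoid changing what the caller sees.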
voidspacejam1: the replicaset changes are easy enough12:33
voidspaceit's the mgo ones that are more funky, but you've done a lot of the work tracing it for me12:33
jam1voidspace: certainly you have to confirm with gustavo for the mgo ones, but I think they're straightforward and limited in scope12:34
voidspacecool12:34
jam1and stick well to the "translate at the point that is known to give/need bad information"12:35
voidspaceso long as that doesn't proliferate too far12:35
jam1well, it is all the stuff that talks about the replicaset config, I think12:36
perrito666ericsnow: ping me when you are back please13:05
rogpeppehas anyone here used the juju publish command?13:09
natefinchrogpeppe: I didn't even realize it already existed13:14
=== jheroux_away is now known as jheroux
ericsnowperrito666: I'm here13:39
ericsnowperrito666: let me guess, you have another PR you want me to "accidentally" merge <wink>13:42
perrito666ericsnow: mm, so you are the go to guy for those things :p13:42
* perrito666 makes a note13:42
perrito666nah, I wanted to make sure that with what is merged I can already work on restore integration to your code13:43
ericsnowperrito666: yep, the only missing parts are the high-level abstraction and the API server facade13:44
ericsnowperrito666: neither should have any relationship with the restore implementation13:44
perrito666ericsnow: did you pr the API server facade13:45
perrito666?13:45
ericsnowperrito666: it depends on 708, which is up for review right now13:45
perrito666it has been lgtmd, hasnt it?13:46
ericsnowperrito666: needs sign-off from wwitzel3's review mentor (or a full reviewer)13:49
perrito666natefinch: standup?14:04
natefinchperrito666: oops, coming14:08
voidspacenatefinch: is there a standard trick for a "right split" on strings, given there's no strings.SplitRight function?14:15
voidspacenatefinch: other than reverse, split, reverse again14:15
natefinchvoidspace: use strings.LastIndex?14:17
voidspacenatefinch: ah cool, that will do nicely14:18
voidspacethanks14:18
voidspacenatefinch: and is there a function to split a string at an index point?14:20
natefinchvoidspace: foo, bar := baz[:x], baz[x:]14:22
voidspacenatefinch: thanks14:22
voidspacenice and easy14:22
voidspaceas it was easy, I assumed Go didn't support it...14:22
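[Editor's sketch] The two suggestions above (strings.LastIndex plus slicing at the index) combine into a small "right split" helper. `splitRight` is a hypothetical name chosen for illustration; it is not a standard library function.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRight splits s around the last occurrence of sep.
// ok is false (and head is the whole string) when sep does not occur.
func splitRight(s, sep string) (head, tail string, ok bool) {
	i := strings.LastIndex(s, sep)
	if i < 0 {
		return s, "", false
	}
	// Slice at the index: everything before sep, everything after it.
	return s[:i], s[i+len(sep):], true
}

func main() {
	host, port, ok := splitRight("::1:1234", ":")
	fmt.Println(host, port, ok)
}
```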
=== urulama is now known as urulama-afk
ericsnownatefinch: could you change the juju org github OAuth app URL to "https://reviews.vapour.ws/oauth/"?14:32
natefinchericsnow: sure14:33
natefinchericsnow: done, though that was only a case-change from OAuth to oath14:34
natefinchoauth that is14:34
ericsnownatefinch: ah, cool14:35
ericsnownatefinch: that URL still won't work until I get SSL working, but I can wait to switch "apps" until then14:36
ericsnownatefinch: right now it's using the app I registered on my own github account, which obviously is only a short-term solution14:36
voidspacenatefinch: ping, I'd like to ask a couple of questions if you don't mind14:38
voidspacenatefinch: I need to ensure I'm working on a copy of a struct and I don't know this area of Go well enough to know if I already am or not14:39
voidspacenatefinch: (because I want a mutated copy of the struct but don't want the caller to see the change)14:39
voidspaceI haven't actually asked the question yet. I don't expect you to know just from that...14:39
voidspacenatefinch: http://pastebin.ubuntu.com/8300392/14:41
voidspacenatefinch: just constructing my own version for play.golang.org to find out...14:42
natefinchvoidspace: the first rule of Go is that everything is passed by value14:43
voidspaceright14:43
voidspaceexcept slices14:43
voidspaceand therefore maybe iterating over a slice14:43
voidspaceand the *call* is constructing a slice too14:43
natefinchvoidspace: nope, they're passed by value, it's just that the value is a pointer to an array14:43
natefinchvoidspace: sorry, brb14:43
voidspacepass by value where the value is a pointer is what python does14:43
voidspacewhich never copies14:43
voidspaceso that doesn't elucidate...14:44
=== Ursinha is now known as Ursinha-afk
voidspacenatefinch: trying it with play.golang.org shows me it's a copy14:49
voidspacenatefinch: I *assume* it's the call and not the iteration that copies (?)14:49
voidspacealthough I can test that as well14:50
voidspacenope, the iteration copies too14:50
voidspaceunless you have a slice of pointers I guess14:50
natefinchvoidspace: back14:53
voidspacenatefinch: so the iteration definitely returns a copy14:54
voidspacenatefinch: and so does the call14:54
natefincheverything is always a copy unless you're dereferencing a pointer.  The trick is that slice[0] is dereferencing a pointer14:54
voidspaceright, but iterating over the slice isn't14:54
natefinchvoidspace: correct14:54
voidspacebut slice[0] = foo14:54
voidspaceis that creating a pointer14:54
voidspaceI guess it must be14:54
natefinchslice[0] is dereferencing the pointer to the backing slice and setting its value to foo14:55
voidspacenatefinch: that's clear to me now, thanks14:55
natefinchvoidspace: cool14:55
voidspacenatefinch: I needed to be sure I had a copy because I want to mutate the value14:55
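[Editor's sketch] A minimal program along the lines of the play.golang.org experiment described above. The `Member` type and function names are made up for illustration. It shows that the range variable is a copy of each element, while assigning through `ms[i]` writes into the backing array the caller also sees.

```go
package main

import "fmt"

// Member is a hypothetical element type used only for this illustration.
type Member struct {
	Address string
}

// mutateViaRange tries to change every element through the range
// variable. m is a copy of each element, so the slice is unchanged.
func mutateViaRange(ms []Member) {
	for _, m := range ms {
		m.Address = "changed"
		_ = m // the modified copy is discarded at the end of the iteration
	}
}

// mutateViaIndex assigns through ms[i], which writes into the backing
// array shared with the caller's slice.
func mutateViaIndex(ms []Member) {
	for i := range ms {
		ms[i].Address = "changed"
	}
}

func main() {
	members := []Member{{Address: "a"}, {Address: "b"}}
	mutateViaRange(members)
	fmt.Println(members[0].Address) // a
	mutateViaIndex(members)
	fmt.Println(members[0].Address) // changed
}
```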
voidspacenatefinch: in replicasets we now have "good ipv6 addresses" and "bad ipv6 addresses"14:56
natefinchahrh14:56
natefinchinteresting14:56
voidspacenatefinch: we always want to use good addresses, but mongo only works with bad ones14:56
voidspacenatefinch: the bug causing the ipv6 replicaset test to be unreliable is due to the fact that we *have* to use the format "::1:1234" for mongo14:56
voidspacenatefinch: but mgo calls net.Dial(addr) when it does syncServers14:57
voidspacenatefinch: and for net.Dial(addr) you *must* use the form [::1]:123414:57
natefinchvoidspace: well that's a kick in the pants14:57
voidspacenatefinch: so the test would pass if we didn't cause a syncServers and would fail if we did14:57
voidspacenatefinch: which seems to be random :-)14:57
voidspacenatefinch: it needs a fix in mgo14:57
voidspacenatefinch: but we're going to ensure that in replicaset (i.e. our side of the code) we only use and see the "good format"14:58
voidspacei.e. *with* square brackets14:58
perrito666voidspace: can we not use a struct with renderers for the different formats?14:58
voidspaceperrito666: the address is in the serialised bson14:58
voidspaceperrito666: and mgo stores its own concept of server addresses14:58
voidspaceperrito666: so no14:58
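[Editor's sketch] The two address formats discussed above can be converted with the standard net package. `toMongoAddr` and `toDialAddr` are hypothetical helper names, not the functions that ended up in juju or mgo. The unbracketed form is ambiguous for a bare IPv6 address without a port, so `toDialAddr` assumes the last colon always separates the port.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// toMongoAddr turns a dialable address such as "[::1]:1234" into the
// unbracketed "::1:1234" form that the log says mongo requires.
func toMongoAddr(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", err
	}
	return host + ":" + port, nil
}

// toDialAddr turns "::1:1234" back into "[::1]:1234", the form that
// net.Dial accepts. It assumes the text after the last colon is the port.
func toDialAddr(addr string) string {
	i := strings.LastIndex(addr, ":")
	if i < 0 {
		return addr
	}
	host, port := strings.Trim(addr[:i], "[]"), addr[i+1:]
	// JoinHostPort adds brackets only when the host contains a colon.
	return net.JoinHostPort(host, port)
}

func main() {
	bad, err := toMongoAddr("[::1]:1234")
	fmt.Println(bad, err)
	fmt.Println(toDialAddr(bad))
}
```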
=== Ursinha-afk is now known as Ursinha
* perrito666 tries lite ide15:07
natefinchperrito666: it's ok.  It makes using gdb less painful, but it's still not great15:08
perrito666natefinch: to be honest I usually only use the code navigation features on ides15:08
natefinchperrito666: ahh, the only reason I tried lite ide was the gdb integration.  As an editor it's kinda meh15:11
* TheMue came back to the good old vim after trying Sublime text for some time15:15
* katco coughs. https://www.youtube.com/watch?v=DubEaS0lMqE15:16
perrito666TheMue: I always go back to vim15:17
perrito666but every now and then I need to take a stroll out of my comfort zone to remind myself why I use vim15:17
natefinchI technically have atom installed... I started it up once..... but haven't really played with it15:18
TheMueperrito666: hehe, good argument15:19
perrito666katco: trust me, RMS dressed as a saint is the opposite of good marketing for your editor15:19
katcoperrito666: tongue in cheek :p i don't try to market my editor haha15:20
voidspacekatco: I want to compare a value to a set of possible values15:21
voidspacekatco: is there anything more elegant than15:21
voidspace(entry.State == PrimaryState || entry.State == SecondaryState || entry.State == ArbiterState)15:21
katcovoidspace: switch entry.State { case PrimaryState,SecondaryState,ArbiterState: myfunc()}15:23
voidspacekatco: cool, thanks15:24
katcovoidspace: any time :)15:25
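[Editor's sketch] The multi-value `case` suggested above, written out as a compilable example. `MemberState`, its constants and `isReady` are simplified stand-ins defined here for illustration, not the real replicaset declarations.

```go
package main

import "fmt"

// MemberState is a hypothetical stand-in for the replica set member state.
type MemberState int

const (
	StartupState MemberState = iota
	PrimaryState
	SecondaryState
	RecoveringState
	ArbiterState
)

// isReady reports whether state is one of the three accepted values.
// A single case with a comma-separated list replaces the chain of
// "== a || == b || == c" comparisons.
func isReady(state MemberState) bool {
	switch state {
	case PrimaryState, SecondaryState, ArbiterState:
		return true
	}
	return false
}

func main() {
	fmt.Println(isReady(PrimaryState), isReady(RecoveringState))
}
```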
perrito666mm, why no editor offers a package navigation instead of file navigation15:31
TheMueperrito666: vim with tagbar allows a kind of, at least inside the current file as scope. here you can navigate over types, fields, functions etc15:32
perrito666TheMue: yup, so far I got, but what I meant is the way to navigate in the packages of a project15:33
TheMueperrito666: yes, I know, but sadly here I don't have a better answer yet15:35
perrito666TheMue: I am trying to mod ninja-ide to support go, I presume that I will eventually get there and will be able to navigate packages15:35
* TheMue still wants his old Smalltalk platforms back *sniff*15:35
TheMueperrito666: write a good vim plugin, I'll use it15:36
perrito666TheMue: I prefer to cut my fingers, vim plugin lang is awful15:36
TheMueperrito666: I've got at least my own little plugin giving me the most important commands at my fingertips15:37
TheMueperrito666: vimscript isn't nice, yes, but it works. but afaik you can use python too15:37
TheMueperrito666: or lua?15:37
alexisbfwereade, ping15:42
fwereadealexisb, heyhey15:44
fwereadealexisb, sorry about yesterday, public holiday15:44
alexisbfwereade, yep, I saw that post my pong15:46
alexisbping15:46
alexisbI would like to meet today if possible15:46
fwereadealexisb, do you have a couple of minutes now? otherwise it will need to be later15:47
alexisbare you free post the actions call?15:47
fwereadealexisb, I'm catching up with bodie now because I need to be away at 615:48
alexisbfwereade, what time are you available later?15:48
fwereadealexisb, to be safe, let's say 3 hours from now on the hour?15:50
fwereadealexisb, hope it'll be quicker15:50
fwereadealexisb, but probably better to have an actual time15:50
alexisbfwereade, ack, 3 hours is fine15:50
alexisbI am not in a hurry but would like to catch you this week before you are out15:50
fwereadealexisb, yeah, it's been a while, I meant to come to our 1:1 yesterday but then was out and completely forgot15:54
alexisbfwereade, no worries at all15:55
alexisbI sent an invite and I am flexible if that doesnt work15:55
ericsnownatefinch: I totally spaced our 1-on-116:03
ericsnownatefinch: you have time later?16:03
natefinchericsnow: heh it's ok, I was busy any way.  later is fine16:03
ericsnownatefinch: when is good for you?16:03
natefinchericsnow: pretty much any time except for the next hour16:05
ericsnownatefinch: let's go in 2 hours then16:05
natefinchericsnow: cool16:06
rogpeppefwereade, jam: i see that ian has changed the blobstore to use sha384, which is great. i wonder what you think about using sha256 instead (a minor change now) so that the hashes match the current hashes used for local charm caching. this would make migration considerably more straightforward.16:23
rogpeppewallyworld: ^16:24
wwitzel3woo, back standing16:56
wwitzel3that was a crappy two weeks of sitting :(16:57
perrito666wwitzel3: sweet you unpacked the legs finally?16:58
perrito666:p16:58
wwitzel3:)16:59
wwitzel3they arrived after standup16:59
perrito666natefinch: wow, it really sucks as an editor, when hitting ctrl+s if there is nothing to edit it will just write s17:24
perrito666s/edit/save17:24
natefinchperrito666: wow, I hadn't noticed that17:25
perrito666they are doing a Qt pattern for key handling which should not be used in this case lol17:26
voidspacenatefinch: can you start a strategy multiple times?17:30
natefinchvoidspace:  I don't know17:31
voidspacenatefinch: heh, me neither17:32
voidspacenatefinch: guess I'm about to find out17:32
voidspacenatefinch: well, it either worked or succeeded on first attempt so didn't need to check for HasNext17:32
natefinchericsnow:18:06
ericsnownatefinch: coming18:06
voidspaceright EOD18:24
ericsnownatefinch: https://github.com/juju/juju/pull/70818:24
ericsnowcmars: could you take a look at https://github.com/juju/juju/pull/708?18:51
natefinchericsnow: I'm currently reviewing, btw, but certainly welcome more eyes.18:55
ericsnownatefinch: sorry, I thought you were running an errand18:55
natefinchericsnow: doesn19:00
natefinchericsnow: I was just going to the freezer, not the store :)  I can see the confusion, though.19:01
ericsnownatefinch: :)19:01
cmarsericsnow, back from lunch, will take a look soon19:01
perrito666uhh ice cream19:01
ericsnowcmars: thanks19:01
cmarsmattyw, fwereade restarting chrome for a hangout19:01
natefinchperrito666: well, my wife *is* pregnant :)19:01
mattywcmars, ack19:02
perrito666natefinch: so you got all the possible combos?19:02
natefinchperrito666: heh19:02
natefinchperrito666: nah, just chocolate.19:02
jcw4rick_h_: I sent an email to the juju-dev list asking for use cases around Juju Actions19:10
rick_h_jcw4: awesome19:11
jcw4rick_h_: per our discussion last week I'm particularly interested in the GUI perspective... thanks :)19:11
rick_h_jcw4: will do, we've got a couple of charms in progress we'd use actions for19:12
jcw4rick_h_: perfect!19:12
* cmars is looking at ericsnow's backup PR19:17
cmarsericsnow, i'm not familiar with the backup story in juju, but i'll try to pick it up from context in the code. anything else that might be helpful (bugs, docs, etc)?19:18
ericsnowcmars: https://github.com/juju/juju/blob/master/doc/backup_and_restore.txt19:18
cmarssweet, thanks19:18
ericsnowcmars: basically that PR is the barrier between the backups implementation and the rest of juju19:19
* perrito666 said he would buy one of those new phones and suddenly there is a horde of floss advocates comparing him to every possible traitor in history19:22
perrito666it's a good thing I said nothing about the watch thing19:22
rick_h_jcw4: so this doc you linked seems more about the actions api vs examples of 'actions a charm would implement'19:40
rick_h_jcw4: which are you kind of looking for?19:40
perrito666ericsnow: you got lgtmd, meeerge19:40
perrito666:p I want to see the next pr19:41
ericsnowperrito666: there's a CI blocker19:41
perrito666...19:41
perrito666life19:41
ericsnowperrito666: in the meantime, you can already review the next patch at https://github.com/ericsnowcurrently/juju/pull/419:52
ericsnowcmars: would you mind reviewing the next PR instead ^^^19:54
cmarsericsnow, sure19:54
ericsnowcmars: thanks19:54
perrito666ericsnow: wait, in none of those I see the actual backups command arent you missing one pr?19:56
jcw4rick_h_: right now we're trying to nail down the API, but examples of actions a charm would implement are valuable too.19:57
ericsnowperrito666: that PR exposes the Create method on the new Backups facade19:57
rick_h_jcw4: ok, will make some space for our notes/such and you can pull it in as you need.19:58
jcw4muchas gracias!19:58
ericsnowsinzui: how soon do you think we'll know on the re-testing for bug #1366802?20:04
mupBug #1366802: juju.-gui fails with a config-changed error when used under juju 1.21alpha <ci> <regression> <juju-core:Triaged> <juju-gui:Incomplete> <https://launchpad.net/bugs/1366802>20:04
sinzuiericsnow, It is fixed, but there is a more catastrophic regression being reported now20:11
pafounettehi20:12
ericsnowsinzui: lovely20:12
pafounette"juju bootstrap" doesn't work because  juju seems to compare non-localized-error-string against localized-error-string ...  https://github.com/juju/juju/blob/master/environs/sshstorage/storage.go#L25420:14
pafounetteso why not using the errno ?20:15
=== ChanServ changed the topic of #juju-dev to: https://juju.ubuntu.com | On-call reviewer: see calendar | Open critical bugs: 1367431
sinzuinatefinch, can you get someone looking at bug 1367431?20:16
mupBug #1367431: Juju upgrade times out, never completes <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1367431>20:16
natefinchpafounette: ug, that's some ugly code20:22
pafounettenatefinch, yeah :) and I can't "bootstrap" just because my hosts don't return english error messages20:23
natefinchpafounette: that's definitely a bug we need to fix.  My apologies for the problems it's causing.20:23
pafounettenatefinch, thanks :)20:24
thumpermenn0: morning21:13
thumpermenn0: not sure if it is related to the bug natefinch just passed on21:13
thumpermenn0: but I consistently get a jujud upgrade test failure locally21:13
thumpermenn0: could be related to my changes though21:13
thumpermenn0: want me to check master?21:13
menn0thumper: unit test or running a manual upgrade?21:13
thumperunit test21:13
menn0interesting21:14
thumperbut I'm not sure if I even have your branch merged in actually21:14
* thumper checks21:14
thumpermenn0: actually, I don't21:14
thumperso it won't be that21:14
menn0ok21:14
menn0do you have the failure details handy. I might still be able to figure out what's happening.21:15
thumpermenn0: sure, will pastebin21:15
ericsnowthumper: wasn't there discussion somewhere about supporting sub-commands for juju sub-commands?21:15
perrito666menn0: do you think that my branch that was merged yesterday can be the culprit?21:15
menn0menn0: I haven't looked at the CI failures yet but given the timing it seems probable21:15
thumpermenn0: http://paste.ubuntu.com/8303344/21:16
thumperericsnow: yes21:16
thumperericsnow: I'm trying to get direction from sabdfl21:16
thumperericsnow: we use it sometimes now21:16
ericsnowperrito666: wow, that just keeps on giving21:16
thumperericsnow: but specs that have been put forward recently get sent back with "use top level commands"21:17
ericsnowthumper: I'm adding a new juju backups command that will have its own subcommands21:17
perrito666menn0: do remember that a couple of steps were reverted there and you had a comment on how the order of some steps were reverted in order21:17
thumperericsnow: fwereade and I (and some others) do prefer subcommands, so we're trying to get definitive feedback21:17
ericsnowthumper: I'd rather not have to roll my own support for that if I can avoid it :)21:17
thumperericsnow: there are examples already in our code21:18
ericsnowthumper: which commands?21:18
thumperericsnow: look at the user command21:18
menn0perrito666: yep21:18
ericsnowthumper: thanks21:18
thumperericsnow: although it is disabled on the release branch21:18
thumperas we mess around with the api21:18
ericsnowthumper: no worries21:19
menn0perrito666: sorry, I misunderstood the first thing you said. I think it's more likely that it's my big upgrade sync branch, not yours.21:19
menn0perrito666: but I don't know much at this stage.21:19
menn0thumper: is that the only test that's failing?21:19
perrito666menn0: I am eod but count me in for additional help21:19
menn0perrito666: thanks21:19
thumpermenn0: yeah21:20
thumpermenn0: I'm looking at it now21:20
menn0perrito666: I think once I'm able to reproduce the problem locally I should be able to get it sorted pretty quickly.21:20
menn0thumper: that test is one of the older ones (from before my time) although the code it's running has certainly changed plenty recently21:21
menn0thumper: it's quite strange that it's just that test failing21:21
* thumper nods21:21
menn0thumper: definitely try with master on your machine as a first check21:21
thumpermenn0: ping me if you need any help with this bug21:23
menn0thumper: will do thanks21:23
* thumper pulls master21:23
ericsnowcmars: FYI, I've added you as an admin on reviewboard21:27
cmarsericsnow, sweet, thanks21:27
thumpermenn0: I've grabbed an updated master, and currently the tests are stuck on cmd/jujud21:35
thumperI'm guessing they may time out later...21:35
thumpermenn0: but this could be handy for reproducibility21:36
menn0thumper: I'll try current master on my machine21:36
thumpermenn0: I'm wondering if this is related to a change from axw where the tools are now in the environ storage21:37
thumpermenn0: or gridfs or wherever it went21:37
menn0thumper: could be. curtis wrote on the ticket for the CI blocker that axw's merge for that seemed to be where things broke.21:38
thumpermenn0: this one https://github.com/juju/juju/pull/70021:38
menn0thumper: the jujud upgrade tests on master pass on my machine21:39
menn0thumper: trying all the cmd/jujud tests now21:39
thumperfailed here21:39
menn0thumper: wonderful :(21:40
thumper3 failed21:40
=== jheroux is now known as jheroux_away
thumpermenn0: confirm your tip hash?21:42
menn0thumper: 203a10db796649043a1162df35d6cf96a14b479821:42
thumperwhich is pull #681 merge?21:42
thumperif so, we have the same version21:42
* thumper reruns tests21:43
menn0thumper: that's the one21:43
menn0thumper: all cmd/jujud tests pass btw21:43
thumperseems like a race condition then21:43
menn0thumper: so there's something environment at play21:43
thumperI got three failures21:43
thumperperhaps21:43
menn0environmental I mean21:43
thumpereither environmental or racy21:43
thumpermenn0: run the tests five times21:44
menn0thumper: will do21:44
menn0thumper: i've dug into the CI failure logs a bit for the "local upgrade on trusty" job21:44
menn0thumper: machine-0 upgrades fine, machine-1 upgrades fine (but there's an rsyslog issue) and machine-2 doesn't upgrade because it can't download the tools21:45
menn0thumper: so it's looking more likely that it's the tools in gridfs change that's causing the CI issues21:45
thumperhmm...21:46
thumperwhy can't machine-2 get the tools?21:46
thumperha21:46
thumpernot environmental21:46
thumperpass that time here21:46
menn0this gets repeated over and over:21:46
thumperwell that sucks21:46
menn02014-09-09 19:26:38 INFO juju.worker.upgrader upgrader.go:167 fetching tools from "https://10.0.1.1:17070/environment/558e5fc8-f707-45d6-8066-0698e5ac2e4e/tools/1.21-alpha1.1-trusty-amd64"21:46
menn02014-09-09 19:26:38 INFO juju.utils http.go:66 hostname SSL verification disabled21:46
menn02014-09-09 19:26:41 ERROR juju.worker.upgrader upgrader.go:157 failed to fetch tools from "https://10.0.1.1:17070/environment/558e5fc8-f707-45d6-8066-0698e5ac2e4e/tools/1.21-alpha1.1-trusty-amd64": bad HTTP response: 400 Bad Request21:46
menn0in the machine-2 logs21:46
* menn0 goes to run those unit tests again21:47
thumperruns twice21:54
thumperthen failed in one place21:54
thumpermachine_test.go:701:21:55
thumperthrough machine_test.go:909:21:55
menn0thumper: I've just run all the cmd/jujud tests 5 times without failure21:57
thumpertry again?21:57
wallyworldthumper: menn0: just getting up to speed, anything I can do?21:58
menn0wallyworld: we have 2 problems, possibly related21:58
thumperwallyworld: the gridfs tools patch is causing CI errors21:58
wallyworldthumper: intermittent?21:58
thumperwallyworld: but I also get race conditions in cmd/jujud tests21:58
thumperwallyworld: seems to pass on only one architecture21:58
wallyworldhmmmm, ok21:58
menn0wallyworld: almost all the upgrade related CI tests are failing21:59
wallyworldi'll start looking at CI, will likely make more progress once andrew comes online21:59
menn0wallyworld: I've been looking at the logs for the CI failures, particularly local provider upgrades on trusty21:59
menn0wallyworld: there's 3 machines in the env. machine-0 and machine-1 upgrade fine22:00
menn0wallyworld: but machine-2 can't download the tools22:00
wallyworldinteresting22:00
menn0wallyworld: even though machine-1 is downloading from the same URL22:00
menn0wallyworld: at about the same time22:00
wallyworld:-(22:00
menn0wallyworld: bad HTTP response: 400 Bad Request22:00
wallyworldso not a 40422:01
wallyworldor a 50022:01
menn0nope22:01
menn0the server thinks the client is sending a bad request22:01
wallyworldand yet it will be the same request for machine 1 or 222:02
menn0wallyworld: indeed!22:02
menn0wallyworld: that's what's strange22:02
wallyworldawesome22:02
menn0wallyworld, thumper: I'm going to try and repro the CI failure locally22:03
wallyworldi'll start digging as well, just need a coffee first22:03
menn0wallyworld: thumper: and if that pans out, try ripping out axw's change22:03
menn0wallyworld, thumper: I thought it was going to be related to my big upgrade sync merge but it's really not looking like that now22:04
wallyworldmenn0: you can leave it to me to look if you want to get back to other tungs22:04
wallyworldthings22:04
menn0wallyworld: that might make sense22:05
wallyworldno use all of us being tied up22:05
menn0wallyworld: oh and another thing22:05
menn0wallyworld: a possible other problem I noticed in the CI failure logs22:05
wallyworldyou sound like Columbo22:05
menn0wallyworld: after machine-1 upgraded (successfully) the rsyslog worker was borked22:06
menn02014-09-09 19:17:32 INFO juju.worker runner.go:261 start "rsyslog"22:06
menn02014-09-09 19:17:32 ERROR juju.worker runner.go:219 exited "rsyslog": x509: cannot validate certificate for 10.0.1.1 because it doesn't contain any IP SANs22:06
menn02014-09-09 19:17:32 INFO juju.worker runner.go:253 restarting "rsyslog" in 3s22:06
menn0machine-0 was fine after upgrade22:06
wallyworldsounds unrelated22:06
menn0and machine-2 didn't manage to upgrade22:06
wallyworldi have no idea what an IP SAN is22:06
menn0wallyworld: yep I think it's unrelated but is yet another thing to sort out22:06
wallyworldyeah :-(22:06
menn0no doubt related to the recent work in this area22:06
wallyworldyup22:07
menn0I don't know what an IP SAN is either22:07
thumpercmars: I'm going to stand you up today22:07
thumpercmars: next week?22:07
cmarsthumper, no prob22:07
wallyworldmenn0: thanks for looking22:07
menn0I could guess what a SAN IP is but that makes no sense in terms of the rsyslog worker :)22:07
cmarswallyworld, IP SAN = subjectAltName22:08
cmarsyou have to use a different x509 field to issue a cert for an IP addr22:08
wallyworldok22:08
menn0maybe we should change the logs to say subjectAltName instead of SAN22:09
wallyworldhopefully the network guys know how to fix22:09
cmarsmenn0, i think that message comes from crypto/tls22:09
menn0thumper: do you want to try running the jujud unit tests with andrew's change removed?22:09
cmarsor crypto/x50922:09
menn0cmars: right22:09
menn0cmars: so not so easy to change22:09
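[Editor's sketch] To illustrate the subjectAltName point above: Go's crypto/x509 only accepts a certificate for an IP address if that IP is listed in the certificate's IP SANs (the `IPAddresses` field). `selfSigned` below is a hypothetical helper for demonstration only; it is not how juju generates its certificates.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSigned creates a throwaway self-signed certificate whose IP SANs
// are the given addresses (none if ips is empty) and returns it parsed.
func selfSigned(ips ...string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "example-server"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
	}
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			return nil, fmt.Errorf("invalid IP %q", s)
		}
		// IPAddresses is encoded as IP entries in subjectAltName.
		tmpl.IPAddresses = append(tmpl.IPAddresses, ip)
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	without, _ := selfSigned()
	with, _ := selfSigned("10.0.1.1")
	fmt.Println(without.VerifyHostname("10.0.1.1")) // an error about missing IP SANs
	fmt.Println(with.VerifyHostname("10.0.1.1"))    // <nil>
}
```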
thumpermenn0: will do, otp just now22:10
menn0thumper: or indeed mine22:10
menn0thumper: kk22:10
sinzuiwallyworld, katco does safe-mode get converted to provisioner-harvest-mode during upgrades?22:16
wallyworldsinzui: yes22:16
wallyworldalthough the default is different22:16
sinzuithank you wallyworld. so long as the value set transitions to the new scheme, I don't need to document madness22:17
sinzuiThat is the only good news I have had today22:17
wallyworld:-(22:17
wallyworldCI is not happy with trunk22:18
thumperwallyworld: my frustration is around the intermittent failures I get on upgrade22:18
thumperwallyworld: but I believe they may be due to mongo not starting22:18
wallyworldyeah :-(22:18
thumperwallyworld: and may well under the covers be the standard mongo failures22:18
thumperhard to get logging for that.22:18
wallyworldwell awesome22:18
* thumper sees where extra logging could go22:19
wallyworldsinzui: i haven't checked - are you guys setting up a test run on jenkins using mongo 2.622:19
sinzuinot yet wallyworld22:19
wallyworldok22:19
wwitzel3I'm looking at this ticket https://bugs.launchpad.net/juju-core/+bug/1365623, is there any reason we can't just add a --force to juju run and skip the acuireHookLock step? Is there more to it than that? Or should that work?22:27
mupBug #1365623: juju run with option to bypass hook queue  <feature> <juju-core:Triaged> <https://launchpad.net/bugs/1365623>22:27
thumperwwitzel3: seems fine, as long as it only works at the machine level, not charm22:31
thumperbugger22:43
* thumper drums fingers while waiting for the jujud tests22:48
thumperfive good in a row22:48
* thumper has messed with logging22:48
thumperthe more I mess with the tests, the more inclined I am to write my juju-test plugin22:48
wallyworldthumper: i had a *very* brief look earlier, and it seems jujud represents the vast majority of our intermittent failures now22:50
wallyworlda bit early to tell for sure22:50
wallyworldi'm very tempted to remove the test retry on landing22:51
thumperI'm going to poke a bit longer22:51
thumper+1 on that22:51
wallyworldok, i'll jfdi22:51
thumpergah22:53
thumpertests aren't failing now22:53
wallyworldthumper: sinzui: i have removed the --retry flag from the landing tests. let's see how that pans out22:55
sinzuiwallyworld, thank you, I think you are taking a courageous step22:58
katcowallyworld: EOD, sent you an email w/ latest22:58
axw_wallyworld: any insight into the tools upgrading errors?22:59
axw_I did test upgrade, all worked... :/23:00
wwitzel3thumper: ok, thanks23:02
wallyworldsinzui: we can easily revert, but i hope the landing tests will pass much more often now23:03
wallyworldkatco: thank you, have a good evening23:03
wallyworldaxw_: not yet sadly23:03
katcowallyworld: thanks, have a good day wallyworld and axw_ (and everyone just coming on)23:04
axw_cheers katco, good night23:11
wallyworldaxw_:  there's a difference between 1.20 and 1.21 - the tools fetching in 1.20 uses utils.GetHTTPClient(hostnameVerification), whereas 1.21 uses utils.GetNonValidatingHTTPClient()23:31
wallyworldmaybe that could explain the 40023:31
wallyworldwhen 1.20 is trying to fetch the new tools23:32
wallyworldjust a guess, but i can't see anything else to go on23:32
axw_wallyworld: that's intentional; we can't validate the API server for HTTPS23:33
axw_I just tested (again) upgrading 1.20.7 to 1.21-alpha123:34
axw_testing on ec2 now23:34
axw_worked on local23:34
wallyworldaxw_: but 1.20 is talking to the state server http endpoint to get the tools23:34
wallyworldusing the validating client and https23:35
axw_wallyworld: ah yes, but the 1.21 API server always tells the client to disable verification23:35
axw_there's an API call that is used first to find the URL, and a flag to decide whether validation is done23:35
wallyworldok, so we are sure hostnameVerification=false23:35
axw_pretty sure we'd see something other than 400 if validation failed anyway23:36
axw_fairly, I'll double check23:36
wallyworldyeah, i'm just clutching at straws a bit23:36
axw_wallyworld: yep, in apiserver/common/tools.go, there's a TODO to remove the flag in 1.2223:37
wallyworldok23:37
wallyworldaxw_: the logs seem to show that the only tools fetch that succeeds is the one to get them from http://juju-dist.s3.amazonaws.com23:39
axw_waigani: I'm pretty sure the "blank space before comment" rule only applies when it follows another code block.23:39
wallyworldit seems all of the calls to the state server http fail23:39
axw_wallyworld: yeah...23:39
waiganiaxw_: oh really?23:40
waiganiaxw_: so directly after a func sig is okay?23:41
wallyworldi gotta agree with axw_ here, waigani23:41
wallyworldtoo much whitespace is horrible23:41
axw_so you can see where one logical set of operations begins and another ends23:41
waiganiwallyworld, axw_: so what exactly is the rule, as I've been reviewed to add whitespace before comments23:41
axw_waigani: the only times I've been told that (before) is when there's a bunch of fields together in a struct, and no space between them/comments23:42
waiganiaxw_: that makes sense, I'll go with that for now23:42
wallyworldwhen declaring an interface, you need whitespace before each doc comment for the methods23:42
axw_cheers23:42
wallyworldalso when folling a code block23:42
wallyworldfollowing23:43
thumperdavecheney: comment added, FWIW, but no vanguard to poke23:43
thumperwallyworld: I've added some changes to the jujud tests, and now I can't get any failures23:43
wallyworldgood right?23:43
thumperwallyworld: so I guess that is good, but ever so slightly concerning23:43
thumperwallyworld: yeah...23:43
thumperI'll propose23:43
wallyworldthumper: what sort of changes?23:44
thumperafter lunch...23:44
wallyworldok23:44
axw_wallyworld: bah, upgrade on ec2 worked too :|23:44
thumperjujud gained a setup logging method that did the lumberjack stuff23:44
thumperthat replaced the default logger23:44
thumpernow we use the default logger in the tests to send things through c.Log23:44
wallyworldthumper: that's concerning that a logging change affects stuff like that23:44
thumperso I mocked it out for the tests23:44
thumperwallyworld: yes... that is why I said concerning23:45
wallyworldi see now that i have the info :-)23:45
thumperwhat I was trying to do was to capture the logging output23:45
thumperinstead of having the tests write it to a file23:45
wallyworldheisenburg :-)23:45
thumpernow I could no longer reproduce23:45
thumperyeah23:45
thumperI'll submit a patch after lunch23:45
wallyworldok23:45
thumpera fuck it23:45
thumperI'll do it now23:45
thumperthen it may be landed by lunch23:45
menn0axw_: do you want me to assign bug 1367431 to you? I've added some detail about what we know so far.23:46
mupBug #1367431: Juju upgrade times out, never completes <ci> <regression> <upgrade-juju> <juju-core:Triaged> <https://launchpad.net/bugs/1367431>23:46
axw_menn0: sure, thanks23:47
wallyworldaxw_: it seems that the sendError(400) used by the tools http server is not also logging the error passed to it, so we are blind as to the root cause23:49
wallyworldmaybe we need to add extra logging23:49
axw_yep23:49
wallyworldlet's do that and land it as a fixes-blah23:50
wallyworldthen we can see the errors23:50
axw_will get onto it23:50
wallyworldok23:50
thumperwallyworld: https://github.com/juju/juju/pull/71723:50
thumperwallyworld: lets do this...23:50
wallyworldlooking23:50
menn0axw_: done23:51
axw_thanks23:51
wallyworldaxw_: i think we should log the error server side in the sendError(), as well as when it is received client side23:52
axw_wallyworld: changing client won't help atm, as it's old code. I will update server23:53
menn0axw_: this may not be relevant but perrito666 merged some changes to API server login handling yesterday. that's about restricting API calls during restore but it may have inadvertently caused what you're seeing.23:53
axw_menn0: maybe, though I've pulled master and can't repro yet23:53
thumperwallyworld: if you are happy, please add the merge flags, I'm going to lunch23:53
wallyworldthumper: will do23:53
thumperta muchly23:54
wallyworldthumper: will need to wait till landings unblocked though23:54
axw_wallyworld: actually there is a slightly useful error message that narrows it down a bit23:58
axw_"bad HTTP response" means the API server failed to find the tools locally, and failed to find them remotely23:58
wallyworldpaste it?23:58
wallyworldaxw_: i could see from that (400) where in the code it is being generated, but not exactly why23:59
perrito666menn0: that was not supposed to be merged so you might revert it without asking too23:59
wallyworldhence the need for extra logging23:59
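[Editor's sketch] The extra server-side logging proposed above might look something like this. The signature of juju's real `sendError` is not shown in the log, so the version here, along with `recordSendError`, is an assumption made purely for illustration: the error is logged before the HTTP response is written, so a bare 400 on the client side can be matched to a cause in the server log.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// sendError logs err on the server before writing the HTTP error
// response. Hypothetical signature; not juju's actual helper.
func sendError(logger *log.Logger, w http.ResponseWriter, status int, err error) {
	logger.Printf("returning HTTP %d: %v", status, err)
	http.Error(w, err.Error(), status)
}

// recordSendError exercises sendError against an in-memory recorder and
// returns the status code written and the text that was logged.
func recordSendError(status int, msg string) (int, string) {
	var buf bytes.Buffer
	logger := log.New(&buf, "", 0)
	rec := httptest.NewRecorder()
	sendError(logger, rec, status, errors.New(msg))
	return rec.Code, buf.String()
}

func main() {
	code, logged := recordSendError(http.StatusBadRequest, "tools not found")
	fmt.Printf("%d %q\n", code, logged)
}
```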
