/srv/irclogs.ubuntu.com/2013/09/26/#juju-dev.txt

=== gary_poster is now known as gary_poster|away
* thumper misses list comprehension in go02:23
wallyworld_thumper: mr ocr, i have a branch which hooks up simplestreams mirrors support for tools https://codereview.appspot.com/1395204303:02
wallyworld_for fuck sake, our landing bot has been shot down03:07
wallyworld_shut03:07
wallyworld_ah maintenance i think03:07
thumperhi wallyworld_03:13
wallyworld_hi03:13
thumperI'll look shortly03:13
wallyworld_i hope canonistack is back soon03:13
wallyworld_np03:13
bradmwallyworld_: I thought it was back already?03:20
wallyworld_bradm: when i nova list it says the instances are shutdown03:20
wallyworld_+--------------------------------------+----------------------+---------+-------------------------+03:20
wallyworld_| ID                                   | Name                 | Status  | Networks                |03:20
wallyworld_+--------------------------------------+----------------------+---------+-------------------------+03:20
wallyworld_| 4829b364-72ad-4ee7-a21c-3ba640f28854 | juju-gobot-machine-0 | SHUTOFF | canonistack=10.55.32.55 |03:20
wallyworld_| 97a7c226-a195-4014-9df5-c998bba3a491 | juju-gobot-machine-3 | SHUTOFF | canonistack=10.55.32.52 |03:20
wallyworld_+--------------------------------------+----------------------+---------+-------------------------+03:20
bradmwallyworld_: yeah, the compute node being rebooted will do that03:21
wallyworld_bradm: it would not have been in the procedures to restart stuff that was running?03:21
bradmwallyworld_: I wasn't directly involved, but it would seem not to be the case03:22
wallyworld_:-(03:22
wallyworld_this is the second time our instances have been broken :-(03:22
bradmyou can't just power it on?03:22
wallyworld_i'm not sure how03:22
wallyworld_i assume there's a nova command03:23
wallyworld_i'll take a look03:23
bradmnova start <id>03:23
wallyworld_yeah, trying that now03:23
wallyworld_bradm: back running, seems quicker perhaps03:24
bradmwallyworld_: probably, there's likely hardly anyone else's instances going :)03:25
wallyworld_\o/03:25
bradmthe compute nodes are pretty beefy machines03:26
bradmthey're just being overcommitted by a lot03:26
bradmI'll chase up what happened internally, the announcements did say things would be restarted03:29
bradmbut that definitely appears not to be the case, or at least not consistently03:29
wallyworld_thanks :-)03:31
thumperwallyworld_: something is wrong with the gobot03:33
thumperno mongod03:33
wallyworld_:-(03:33
wallyworld_i'm not familiar with how it is set up sadly03:34
thumperwe need more monday gods03:34
thumpermon-god03:34
wallyworld_yeah03:34
wallyworld_although stopping and starting should not have affected it you'd think03:34
bradmfwiw with my dinky little juju test env on lcy02 the reboot didn't break it, its back up and going03:36
thumperoh good03:37
wallyworld_thumper: i had a quick look - mongod is in /usr/local/bin and /usr/local/bin is in the path so i'm not sure03:38
* thumper -> haircut03:41
hazmatthumper, testing saucy local fwiw03:50
hazmatthumper, is there a particular version of interest? trunk i assume?03:52
hazmatthumper, fails for me.. although it looks like a different issue, namely the upstart job needs a wait between dropping an upstart template to disk and starting it, until inotify triggers and registers it with upstart.04:16
bradmthis is very interesting, a default juju bootstrap on lcy02 fails, since the instance type isn't big enough04:32
bradmmongodb shuts itself down saying there's not enough space04:34
wallyworld_bradm: that started happening about a week ago for some reason, i think folks are looking into it04:53
bradmwallyworld_: I can tell you why04:53
bradmwallyworld_: the default instance is a m1.tiny, which has a 2G /04:54
wallyworld_ok :-)04:54
wallyworld_juju used to be ok in 1G04:54
wallyworld_or even 51204:54
bradmwallyworld_: I just bootstrapped with more, mongodb alone uses 3G04:54
bradmwallyworld_: its disk thats the issue, not memory04:54
wallyworld_serious? the landing bot bootstrap machine used to be a 512M instance04:54
wallyworld_ah disk04:54
wallyworld_i thought you were talking about ram04:55
wallyworld_still, juju should not pick tiny on canonistack04:55
bradmI can't say why mongodb suddenly wants all your disk, but that seems to be the case04:55
bradmbootstrap it with a m1.tiny and you'll see, check the logs in /var/log/mongodb04:56
bradmit pretty clearly says it needs more disk04:56
wallyworld_there's some issue in how juju is choosing the instance04:56
wallyworld_it used to work. it should be picking small04:56
wallyworld_i'm not sure of the current status though but it is being looked at04:56
bradmyeah, not sure where it changed, but that's the fix, to bootstrap with constraints that give you a bigger disk04:56
bradmis that whats happening with your gobot?  needs more disk for mongodb?04:59
bradmI wonder if mongodb should be using the smallfiles option05:01
wallyworld_bradm: could be, but it was running fine before the shutdown05:02
bradmwallyworld_: /var/log/mongodb/mongodb.log should make it pretty clear05:04
wallyworld_bradm: yeah, true. i'm tied up trying to get some coding finished, but i'll look soon05:05
bradmwallyworld_: cool, I can do some more testing myself once I've gotten this charm done05:15
wallyworld_bradm: ok. i'm flat out right now as i'm off from tomorrow for a week and am trying to get everything done before i go. i'll hopefully be able to look a bit later05:16
bradmwallyworld_: actually, I'm off next week too :)05:17
wallyworld_\o/05:17
wallyworld_going anywhere?05:17
bradmyeah, my parents have taken our son for a holiday, we're driving up there to pick him up and spend some time with them05:18
bradmits one of our first times (outside of hospital) that we've been away from him, its interesting05:18
wallyworld_how old is he?05:19
bradm605:19
wallyworld_yeah, we didn't spend time away from our son for a few years also05:19
bradmthere are medical issues with him too, so we're probably a bit more protective than normal05:20
wallyworld_yeah, i can understand that05:20
bradmhe had 2 open heart surgeries before he was 505:20
wallyworld_wow05:25
wallyworld_glad he's ok05:25
bradmyeah, he's pretty good given what he's had to go thru05:25
bradmhow about you?  going anywhere interesting?05:25
wallyworld_hervey bay to watch whales, then to fraser island for a few days05:26
wallyworld_looking forward to it05:26
bradmahh, nice - I've been to hervey bay whale watching before, lots of fun05:27
wallyworld_yeah, me too about 10 years ago05:28
wallyworld_with kid #1. now with kid #205:28
bradmwe're starting to think along those lines for holidays as the boy gets older, he might actually get a bit more out of it05:29
wallyworld_yep. we took kid #1 to nz when he was 4 and he remembers nothing. what a waste05:29
bradmit'd be pointless for us before now, we always seemed to spend a good portion of the year with him in and out of hospitals05:30
wallyworld_that's a shame, i hope he gets well asap05:32
bradmhe's been really good this year05:32
bradmusually a flu would mean a trip to hospital, this year so far things have been good05:32
wallyworld_\o/05:34
bradmohh, there's two mongodb running in my juju env05:35
bradmand its the non juju one taking up all the space05:35
bradmthe juju started one has --smallfiles, the other one doesn't05:37
wallyworld_ah05:38
=== thumper is now known as thumper-afk
jamwallyworld_: as I'm working through some other things, I came across this question. How does "juju bootstrap --upload-tools" work today w/ openstack. Doesn't it put the tools in your private bucket, which should *not* be world readable?06:13
wallyworld_jam: yes06:13
wallyworld_it puts them in private bucket06:13
jamwallyworld_: right, and both cloud-init and Upgrader just use a "wget" to get the tools06:14
jamno Auth06:14
wallyworld_jam: bot is down since the canonistack maintenance. i haven't had a chance to look deeply, but running tests says it can't find mongod in path, but mongod is in /usr/local/bin and that dir is in the path from what i can see06:14
wallyworld_jam: it uses a temp url06:15
wallyworld_which is publicly readable06:15
wallyworld_jam: i've tested bootstrapping with upload tools and simplestreams and it works fine06:15
wallyworld_unless i've missed something06:16
wallyworld_jam: when getting the tools URL, it does a storage.URL() which for environs storage returns a url from which anyone can read06:17
jamwallyworld_: we don't have temp urls on canonistack, IIRC. I'm worried we're actually making our private containers world readable06:17
davecheneywallyworld_: sounds like the bot is using the old tarball06:17
davecheneyit should use mongodb from ppa:juju/stable06:17
jamwallyworld_: when working it out originally, we decided it was ok that the "public-bucket-url" had to be world readable06:17
jamI don't expect it to be any different with tools-url06:18
jambut I'm seriously suspecting that we should be able to "juju bootstrap --upload-tools" on Canonistack06:18
wallyworld_jam: i'll have to check but it all seemed to work ok06:18
wallyworld_upload tools is now automatic06:18
jamwallyworld_: you mean sync-tools ?06:18
wallyworld_no, upload06:18
jamagain, I think if it *is* working, we have a security hole06:18
wallyworld_i don't recall explicitly setting permissions on the control bucket06:19
jamwallyworld_: "swift stat $PRIVATE-BUCKET" has ".r:*,.rlistings"06:21
jamwallyworld_:  :(06:21
wallyworld_hmmm. the tool stuff doesn't set that i'm pretty sure06:21
jamwallyworld_: I don't know *who* is setting it, but it is wrong, and it means private tools won't work when we "fix" it.06:21
jamwallyworld_: I think the "auto-upload-tools" stuff creates a bucket and sets it world readable06:22
wallyworld_i'll have to check06:22
wallyworld_jam:06:23
wallyworld_containerName: ecfg.controlBucket(),06:23
wallyworld_// this is possibly just a hack - if the ACL is swift.Private,06:23
wallyworld_// the machine won't be able to get the tools (401 error)06:23
wallyworld_containerACL: swift.PublicRead,06:23
wallyworld_this was put in in january06:23
wallyworld_by dimiter i think06:23
jamwallyworld_: with nobody realizing "you can't get the tools, but you're exposing all of your secrets to the world" ?06:23
jamI'm not 100% sure what goes in the private bucket06:24
jamas I don't think we put creds there.06:24
jamSo it *might* be ok06:24
wallyworld_we put the state file there06:24
jamwallyworld_: which is just the IP address, right?06:24
wallyworld_i'd have to check but i don't think creds go there06:24
wallyworld_i *think* so06:24
jamwallyworld_: I think the only actually private thing is potentially private charms06:29
jamAs I'm pretty sure we put the charm data in there06:30
jamhowever06:30
jamthat *also* needs to be accessible via "wget" because of how we removed Environ creds from the Uniter agents.06:30
wallyworld_so it could be worse i guess06:30
jamfwereade: I need to chat with you about this.06:31
jamwallyworld_, fwereade: security bug #123127806:31
wallyworld_ok06:31
jam(mup won't find it because it is private)06:31
fwereadewallyworld_, jam, reading back06:32
jamWe just need a discussion, because there is certainly a "vulnerability vs not working at all" that we have to sort through.06:32
jamfwereade: G+ might be appropriate06:32
jamwallyworld_: bigjools seems to be enjoying himself without you so far :)06:33
fwereadejam, well, I'm here, if the sight of a dressing gown will not be damaging to your sensibilities06:33
wallyworld_jam: how do you know?06:33
jamwallyworld_: he's posting pics of the great barrier reef on G+06:33
wallyworld_fwereade: my eyes, my poor, poor eyes06:33
jamfwereade: started: https://plus.google.com/hangouts/_/26fdcf993421ca83a1cf0b1a3ddd35772695e49306:33
wallyworld_jam: ah ok. that social networking thing i ignore06:33
jamfwereade: you could just turn the camera off :)06:34
davecheneyhttps://code.launchpad.net/~dave-cheney/juju-core/158-lp-1210407/+merge/18767506:44
davecheneyaxw: thanks for your review, see my comments06:56
axwlooking06:56
axwdavecheney: will lgtm, just curious about this: "we don't reboot machines"  -- it doesn't work?07:04
axwI get your point though - it doesn't really matter07:05
davecheneyaxw: if you reboot a machine it gets a new ephemeral ip07:05
davecheneyand at that point, nothing works07:05
davecheneyaxw: why do you say twice ?07:09
davecheney"I get your point. It just feels wrong to do it twice when it only ought07:09
davecheneyto be done once. But, given that's not really possible... LGTM.07:09
davecheney"07:09
axwdavecheney: *if* it were able to reboot07:09
axwit's idempotent though, so doesn't matter.07:10
davecheneyaxw: fair point07:10
davecheneyalso, bootcmd http://cloudinit.readthedocs.org/en/latest/topics/examples.html07:11
davecheneydoes what runcmd does07:11
davecheneyit has the same firstboot properties07:11
axwdavecheney: ok, then the doc comment on juju-core/cloudinit/Config.AddBootCmd is wrong :)07:12
davecheneyaxw: ok, i'll fix that in a followup07:12
axwdavecheney: in that page, the comment for bootcmd has a hidden gem: " * bootcmd will run on every boot"07:13
davecheneyurgh07:14
davecheneyoh well07:14
davecheneycare factor, quite small07:14
davecheneythis may fix the azure disk suckage07:14
davecheneybut while reading that page07:15
davecheneywhere does it say runcmd is only run once ?07:15
* axw shrugs07:15
davecheneyi'm glad we've arrived at this place07:16
axwit says it in juju-core, but that's maybe not authoritative07:16
davecheneyaxw, best I can tell, we've never tried07:17
davecheneyeveryone knows rebooting an ec2 instance will screw it07:17
axwno worries, it's not a big deal07:18
axwfwereade: when you have a moment, would you mind expanding on your comments here? https://codereview.appspot.com/13832045/07:18
fwereadeaxw, sure07:19
axwI've changed the authentication stuff around a bit to allow HTTP GETs & HTTPS PUTs; wanted to know what you meant first, though, in case I was expending too much effort on this...07:19
fwereadeaxw, I was just ruminating that *if* cert distribution proves to be some sort of hassle (mainly because the CLI still needs direct storage access for deploy/upgrade-charm) we *could* use ssh storage for the manual provider and filesystem storage for the local one, because the clients that need write access should already have the information needed to set up the appropriate Storage types07:21
axwah, right07:22
axwfwereade: does anything other than CLI need write access?07:22
fwereadeaxw, the API server itself may do07:22
fwereadeaxw, but (assuming non-HA, anyway) that's doable via the filesystem07:23
axwfwereade: ok. it can write directly, given it's local, so that's fine07:23
axwyep07:23
axwok, so yes I did expend too much effort07:23
axwoh well07:23
fwereadeaxw, well, you expended it too early, at least07:23
axw:)07:23
fwereadeaxw, but no real harm done, I think07:23
fwereadeaxw, we would ideally like to not depend on provider storage at all but that's not an immediate plan07:25
axwok07:26
fwereadeaxw, what we will need to do soon, though, is start exposing storage access via the API, so that an API-only CLI can still upload charms from local repos07:27
axwfwereade: also, when you're not busy, would you please look at my latest replies on these two: https://code.launchpad.net/~axwalk/+activereviews07:27
axwfwereade: I was wondering if/when storage would be API based07:27
fwereadeaxw, will do07:27
axwthanks07:28
axwfwereade: actually, my changes to httpstorage aren't for naught07:34
axwthey'll allow GETs to not require a self-signed cert07:34
axwforgot that important bit :)07:34
fwereadeaxw, I don't think they are, indeed07:34
axwfwereade: I mean, changes I haven't pushed yet07:34
fwereadeaxw, ah right -- cool then :)07:34
axwI've been changing things today07:34
fwereadeaxw, https://codereview.appspot.com/13632046/ LGTM07:38
axwfwereade: thanks. the error is tested in jujutest/livetests07:39
fwereadeaxw, I was just thinking of direct tests for New/Is07:39
axwfwereade: ok, then no. I'll add some before landing07:40
fwereadeaxw, cheers07:40
axweh that package has no tests... time to add some07:42
rogpeppe1mornin' all07:48
axwmorning rogpeppe107:59
rogpeppe1axw: hiy07:59
rogpeppe1a08:00
rogpeppe1:-)08:00
fwereadeaxw, https://codereview.appspot.com/13255051/ nearly LGTM, take a look and let me know your thoughts08:04
fwereaderogpeppe1, morning08:04
rogpeppe1fwereade: yo!08:04
axwfwereade: thanks, reading08:04
axwfwereade: replying now, but yeah, there is currently no handling of destruction for bootstrap nodes08:12
axwthe others can be destroyed as usual08:12
axwI wasn't really sure where to draw the line with "null" :)08:13
rogpeppe1fwereade: i don't quite understand this comment: https://codereview.appspot.com/13912043/diff/1/environs/configstore/disk.go#newcode13408:24
rvbaHi jam, hi mgz… would any of you have time to talk about what seems to be a serious bug in the MAAS provider (bug 1229275)?08:25
rogpeppe1fwereade: i *think* the only time we add attributes is when we call Prepare08:25
_mup_Bug #1229275: juju destroy-environment also destroys nodes that are not controlled by juju <maas (Ubuntu):New> <https://launchpad.net/bugs/1229275>08:25
rogpeppe1rvba: oops!08:25
mgzrvba: yup, but I'll need to get on a bus in a sec08:25
fwereaderogpeppe1, it's not really actionable, especially in the light of our later discussions08:27
rogpeppe1fwereade: ok, cool08:27
fwereaderogpeppe1, prepare chooses bootstrap-state and writes it; bootstrap uses exactly that08:27
rogpeppe1fwereade: yup08:27
fwereaderogpeppe1, it may involve some light massage of bootstrap responsibilities vs prepare responsibilities08:27
fwereaderogpeppe1, but nbd08:27
rvbaSo basically, juju destroys all the instances it gets back from the provider's instances() method, and that is basically all the instances.08:27
fwereadervba, that looks like a critical to me08:28
rvbaCritical indeed.08:28
fwereadervba, how does the maas provider mark instances as controlled by itself?08:28
rogpeppe1rvba: the provider's Instances method should not be returning instances it didn't itself create08:28
rvbafwereade: it doesn't08:28
rvbarogpeppe1: that's the problem indeed.08:28
rogpeppe1rvba: the other providers take care to avoid that08:28
fwereadervba, well, crap -- as someone maasy, how would you recommend we do so?08:29
rvbafwereade: if this needs to be addressed on the MAAS side, then the easiest way is probably to set a tag on the nodes.  A tag identifying the juju environment.08:30
rvbaOut of curiosity, how do the other providers do it?08:31
mgzrvba: either by looking at the security groups or the name attached to the instances, I believe08:32
mgzinstances that env controls are given names juju-ENVNAME-*08:33
rvbaRight… that's how the Azure provider works too now that I think of it.08:36
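The name-based ownership check mgz describes can be sketched roughly as below. This is only an illustration of the "juju-ENVNAME-*" convention mentioned in the log; the Instance type and the ownedBy helper are hypothetical and are not taken from any juju provider.

```go
package main

import (
	"fmt"
	"strings"
)

// Instance is a hypothetical, minimal stand-in for a provider instance.
type Instance struct {
	ID   string
	Name string
}

// ownedBy returns only the instances whose name starts with
// "juju-<envName>-", i.e. the naming convention mentioned in the log.
func ownedBy(envName string, all []Instance) []Instance {
	prefix := "juju-" + envName + "-"
	var owned []Instance
	for _, inst := range all {
		if strings.HasPrefix(inst.Name, prefix) {
			owned = append(owned, inst)
		}
	}
	return owned
}

func main() {
	all := []Instance{
		{ID: "1", Name: "juju-gobot-machine-0"},
		{ID: "2", Name: "juju-gobot-machine-3"},
		{ID: "3", Name: "someone-elses-server"},
	}
	// Only the two juju-gobot-* instances are listed.
	for _, inst := range ownedBy("gobot", all) {
		fmt.Println(inst.ID, inst.Name)
	}
}
```

A prefix match like this is only as good as the environment name: two users sharing a name and credentials, or an environment named "gobot-2" next to one named "gobot", would be treated as one environment by this check.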
fwereademgz, rvba: fwiw envname is bad08:38
fwereademgz, rvba: long-term, envname can only ever be a local alias for the actual environment uuid08:39
mgzit's all a little dodgy, but I don't like the alternatives much08:39
fwereademgz, rvba: and we've already had problems with two people using the same env name and same provider credentials08:39
fwereadeit's easy to say "don't do that then"08:40
mgzwell, we should check for that on bootstrap and blow up08:40
mgzright, I need to get on bus08:40
fwereadebut that's not as helpful as designing things such that we don't have to do so in the first place08:40
fwereadervba, I need to take a break for a bit, but... actually just a mo08:41
fwereadervba, how does juju not destroy those other instances first?08:41
fwereadervba, the provisioner will be asking for all instances and culling those it doesn't recognise08:42
fwereadervba, so *starting* a juju environment should also kill everything else08:42
fwereadervba, as should upgrading it08:42
fwereadervba, do you know if that's the case?08:42
rvbafwereade: I just tested it, that's not what happens.08:43
rvba(I'm testing with the latest trunk)08:43
fwereadervba, ok, that's weird08:44
fwereadervba, if it's not culling unknown instances it implies that actually AllInstances is reporting the right ones08:45
rvbafwereade: that should happen during bootstrap right?08:46
jamfwereade: well, you also have to run "juju status" first or it can't poll the Provider at all yet08:48
jamwe've had many "run status and everything dies" bugs :)08:48
rvbafwereade: I simply tested running "juju bootstrap", is the culling supposed to happen there or later, for instance when the bootstrap node comes up?08:50
fwereadervba, jam speaks truth, you need to connect once before the bad things will happen08:50
fwereadervba, it'll happen when the provisioner starts running08:50
rvbaOkay, testing that now.08:50
rvba(node is installing)08:50
fwereadervba, which will happen just after the first command that connects08:51
fwereadervba, cheers08:51
fwereadeaxw, responded, let me know what you think of the Destroy error question08:51
fwereadebbiab08:51
axwfwereade: yes sorry, I agree Destroy should return an error for now08:53
rvbafwereade: you were right, culling did happen.08:56
fwereadervba, well, ok, the good thing here is we don't have to worry about backward compatibility then, because nothing (sensible) we do can make the situation any worse09:02
fwereadeaxw, this right here ^ is a reason for an EnvironUpgrader that acts directly on the environment (independent of the should-it-hit-state discussion)09:05
* axw reads back09:06
fwereadeaxw, short version: maas instances are not tied back to their environment, and getting instances from maas gets *all* instances, not all instances in the *environment*09:07
* axw nods09:07
axwand destroys them all09:07
axwfwereade: where's the link to EnvironUpgrader?09:07
axwI didn't get much sleep last night, so a little slower than usual today09:08
fwereadeaxw, sorry, we were chatting about it in the state-upgrades thread09:08
fwereadeaxw, your contention was that it should connect to state09:09
fwereadeaxw, I think that's the wrong way round09:09
fwereadeaxw, *but* that adding an optional upgrade method to environ might be a good idea for other reasons09:10
axwfwereade: for example, so you could add a tag to the maas nodes that you control?09:10
fwereadeaxw, exactly so :)09:10
fwereadeaxw, or indeed so we could correct the envname problem (above) for the other providers09:11
axwfwereade: your latest reply clarifies things for me, and yes, much nicer to not manipulate state from environ09:11
fwereadeaxw, great09:11
axwfwereade: I've updated https://codereview.appspot.com/13255051/09:15
axwokay if I handle Destroy properly in a followup?09:16
fwereadeaxw, absolutely09:19
fwereadeaxw, LGTM09:20
axwthanks09:20
axwfwereade: I'll get the last of the httpstorage stuff in next, then get onto Destroy09:20
fwereadeaxw, perfect, tyvm09:21
axwfwereade: and then Prechecker wireup09:21
fwereadegreat -- that one's going to be a bit interesting, I think, we should plan how we get it in there ahead of time09:22
* fwereade bbiab again, see you all at the meeting09:22
axwme too, I need a break. bbl09:23
=== thumper-afk is now known as thumper
thumperrogpeppe1: I've realized that I really don't like mornings09:48
rogpeppe1 thumper: that's taken you a while :-)09:49
rogpeppe1thumper: i've realised that i forgot (again!) about our chat last night09:49
thumperMy head just isn't in it that early09:49
thumperI should go check the agenda09:50
jamhttps://bugs.launchpad.net/bugs/1229275 is that actually Critical ?10:36
_mup_Bug #1229275: juju destroy-environment also destroys nodes that are not controlled by juju <juju-core:Triaged> <maas (Ubuntu):Triaged> <https://launchpad.net/bugs/1229275>10:36
jamseems High at best10:36
jamespecially given "nobody is assigned to it"10:36
dimiternfwereade, there it is https://codereview.appspot.com/13963043 - first part, the secrets blanking will follow10:58
jamdimitern: the other way around11:02
=== gary_poster|away is now known as gary_poster
fwereadedimitern, would you take a really quick look at https://bugs.launchpad.net/juju-core/+bug/1229286 ? it feels somewhat likely to be unitery11:18
_mup_Bug #1229286: debug-log and boolean options are broken in trunk <juju-core:New> <https://launchpad.net/bugs/1229286>11:18
dimiternfwereade, looking11:19
fwereadedimitern, the config bits specifically11:19
fwereadedimitern, may be helpful to confer with TheMue, he was touching config recently11:20
dimiternfwereade, I haven't tried juju set when live testing the api uniter11:20
dimiternfwereade, just debug-hooks and relation-set/get11:21
fwereadedimitern, yeah, I should have thought of that11:21
fwereadedimitern, in fact the stuff you're doing is as critical as this regardless11:21
fwereadeTheMue, is there any likelihood you'll be able to look into it this pm?11:21
TheMuefwereade: yep, will do11:22
TheMuefwereade: lunch in a few moments, but then11:23
dimiternfwereade, did debug-log show the hooks output before?11:23
dimiternfrankban, hey11:23
fwereadeTheMue, cool, thanks, please just verify what's happening with set vs config-changed11:23
dimiternfrankban, about that bug ^^11:24
dimiternfrankban, have you tried using debug-hooks instead?11:24
frankbandimitern: no11:24
fwereadedimitern, frankban: re logging you need to enable that logging in env config now11:24
fwereadedimitern, frankban: thumper knows exactly11:24
dimiternfrankban, debug-hooks will show you if config-changed got fired11:25
frankbandimitern: as I mentioned in the bug description, I am pretty sure that config-changed is called11:25
thumperdimitern, frankban: it is due to logging changes that were made recently to make things more "productiony"11:25
thumperbootstrap with --debug11:25
thumperor --log-config=<root>=DEBUG11:26
frankbanthumper: cool, good to know11:26
thumperor whatever you want11:26
thumperthis log config then propagates to all the agents11:26
frankbanthumper: so, by default, hooks output is not displayed in the debug log, correct?11:26
dimiternah, good to know11:26
thumpercan be updated using "juju set-env log-config=blah"11:26
thumperfrankban: correct11:26
thumperonly warning and errors11:26
thumperused to be debug for everything11:27
thumperI'll write an email for juju-dev tomorrow to explain the changes11:27
thumperand hooks11:27
thumpernot juju hooks11:27
thumperbut how to do other logging stuff11:27
frankbandimitern: so, the real bug is about boolean options: it seems they are always set to false11:28
frankbanthumper: thanks for the clarification11:28
thumpernp11:29
dimiternfrankban, hmm.. TheMue, can this be relevant to your recent config changes?11:29
TheMuedimitern: should not, only empty settings have been touched11:30
TheMuedimitern: will take a look after lunch11:30
* TheMue => lunch11:30
dimiternfwereade, did you have a chance to look at https://codereview.appspot.com/13963043 ?11:35
fwereadedimitern, been in meetings I'm afraid, i'll try to fit it in before I go for lunch11:35
dimiternfwereade, ok11:36
fwereadedimitern, did we not have an implementation for Upgrader that swapped out 127.0.0.1?11:41
fwereadedimitern, er, Deployer11:41
dimiternfwereade, that's from there11:41
dimiternfwereade, it's not swapping anything11:42
dimiternfwereade, and it actually works like proposed - live tested on ec211:42
fwereadedimitern, I see, ok, no quibbles with what we're doing11:42
fwereadedimitern, but would you please pull the common implementation of those methods out into a common type we can embed, like the other shared functionality?11:43
fwereadedimitern, I can live with that as an *immediate* followup11:43
dimiternfwereade, even though it's going away as soon as we have machine addresses?11:44
fwereadedimitern, we're still going to need to do the same thing in the same two places, aren't we?11:44
dimiternfwereade, I'll do it in this CL, not too much to do I think11:45
fwereadedimitern, we'd just stop using an environ to do so, surely11:45
fwereadedimitern, that's even better :)11:45
fwereadethanks11:47
* fwereade quick lunch11:48
jamdimitern: https://codereview.appspot.com/13964043/ looks pretty much the same as the one you set back to WIP and were going to resubmit. Did you mark the wrong one?11:52
jamhttps://code.launchpad.net/~dimitern/juju-core/145-apiserver-provisioner-blank-secrets/+merge/187577 looks just like https://code.launchpad.net/~dimitern/juju-core/147-apiprovisioner-blank-env-secrets/+merge/18773811:52
jamdimitern: maybe you meant to reject https://code.launchpad.net/~dimitern/juju-core/146-apiprovisioner-addresses/+merge/187719 ?11:53
dimiternjam, no, it has almost the same description and diff, but different prereq11:55
gary_posterTheMue, when you get back would like to know how https://bugs.launchpad.net/juju-core/+bug/1224568 is doing12:38
_mup_Bug #1224568: Improve hook error reporting <juju-core:In Progress by themue> <https://launchpad.net/bugs/1224568>12:38
TheMuegary_poster: it's almost done, one smaller CL is missing. after investigating the problem of frankban i'll continue (tests are missing)12:39
gary_posterawesome thanks TheMue @12:40
gary_poster!12:40
TheMuefrankban: ping12:52
frankbanTheMue: pong12:55
TheMuefrankban: the boolean value, how is it configured?12:55
frankbanTheMue: I saw every boolean values set to False, both if they are true by default (in config.yaml) and when they are set to True using "juju set". Hope that answers your question12:57
TheMuefrankban: the setting makes me wonder, there has been a change in handling nil values when a default is set13:00
TheMuefrankban: the change happened with rev 180013:00
frankbanTheMue: it is possible, I saw this problem in trunk, but it works as usual reverting to 175013:01
frankbanTheMue: the bug includes instructions to dupe, I'd ensure this is not something wrong in my local configuration before investigating13:02
TheMuefrankban: so if 1799 would be ok and 1800 not we've got it ;)13:02
TheMuefrankban: the change has been to omit nil values if default is set. and this may be interpreted as false13:04
frankbanTheMue: the weird thing is that it seems the value is False in the hooks execution even when you explicitly set an option to true (and the default is false)13:05
TheMuefrankban: are you still on 1750 or back on trunk13:05
frankbanTheMue: 175013:05
TheMuefrankban: the hook execution part is strange13:06
TheMuefrankban: take a look at http://bazaar.launchpad.net/~go-bot/juju-core/trunk/revision/1800, get.go line 52 (the rest are tests)13:06
jamTheMue: so rev 1800 has "if option.Default != nil { info["value"] = option.Default" which seems to be the only change. Otherwise we leave value untouched.13:07
TheMuefrankban: yes, exactly13:07
TheMuefrankban: before that change the map contains the key "value", only with a value nil13:08
TheMuefrankban: so if a quick hack on your 1750 to behave here like 1800 shows the same errors, that shows it's a shitty CL :/13:10
frankbanTheMue: so you duped?13:11
TheMuefrankban: yes, i would revert it then13:11
TheMuefrankban: but it would help me if you make that quick hack test to be sure that this is the correct concluion13:12
TheMueconclusion13:12
frankbanTheMue: are you sure the problem is there? AFAICT ServiceGet works correctly (the correct values are shown, e.g. in the GUI, and the GUI takes that information using the API)13:16
TheMuefrankban: no, i'm not sure, that's so far the only change i've found regarding config later than 175013:17
fwereadeTheMue, there's another biiig one13:17
fwereadeTheMue, uniter working via API13:17
TheMuefrankban: so you see the correct values in GUI? fine13:18
frankbanTheMue: yes13:18
TheMuefrankban: ok, will investigate there (uniter)13:18
dimiternfwereade, updated https://codereview.appspot.com/1396304313:24
fwereadedimitern, cheers13:25
fwereadedimitern, nice and clean, LGTM13:28
dimiternfwereade, thanks13:29
fwereadedimitern, remind me what else is on your plate after that one? the blanking?13:29
mgzgot a lead on our memory/tiny booting issues, bug 1227425 may be related13:30
_mup_Bug #1227425: Cloud images do not need apt-xapian-index <bot-comment> <cloud-images-build> <ubuntu-cloud-images> <Ubuntu:New> <https://launchpad.net/bugs/1227425>13:30
fwereadeTheMue, ah-ha13:30
fwereadeTheMue, a true boolean is being reported to the uniter as ""13:31
dimiternfwereade, I realized we no longer need StateAddresses() and APIAddresses() on agent.Config, so I'll remove these as well13:31
fwereadedimitern, nice13:31
fwereadedimitern, thanks13:31
TheMuefwereade: i'm currently digging in the uniter13:31
TheMuefwereade: where are you13:31
fwereadeTheMue, add a boolean to testing/repo/series/wordpress/config.yaml13:32
fwereadeTheMue, find the uses of assertYaml in uniter_test.go13:33
fwereadeshit13:38
fwereadeconfig data is getting squeezed through map[string]string and we didn't spot because we didn't have tests involving non-string config settings at the sharp end13:38
rogpeppe1a small MP that might speed up tests slightly: https://codereview.appspot.com/13968043/13:39
frankbanTheMue: revno 1800 works well fwiw. trying 1845 now13:39
TheMuefwereade: testing it, just had to change something in my test code ;)13:40
TheMuefrankban: aha13:40
fwereadeTheMue, frankban, dimitern: state/apiserver/uniter/uniter.go:50913:40
fwereadeTheMue, frankban, dimitern: those are not relation settings and are most definitely not a map[string]string13:40
fwereadeTheMue, frankban, dimitern: this is critical13:41
frankbanso sval, _ := v.(string) is killing booleans?13:41
dimiternfwereade, hmm13:42
dimiternfwereade, ok, so we need map[string]interface{} there?13:42
fwereadedimitern, yeah13:42
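A minimal sketch of why `sval, _ := v.(string)` loses booleans: a failed type assertion with the ok value discarded yields the zero value, which for a string is "". The `flatten` helper below is hypothetical and only mimics the kind of conversion being discussed; it is not the code at state/apiserver/uniter/uniter.go.

```go
package main

import "fmt"

// flatten is a hypothetical stand-in for the lossy conversion: every value
// is squeezed through a string type assertion, so non-strings become "".
func flatten(settings map[string]interface{}) map[string]string {
	out := make(map[string]string, len(settings))
	for k, v := range settings {
		sval, _ := v.(string) // ok is discarded; a bool or int yields ""
		out[k] = sval
	}
	return out
}

func main() {
	in := map[string]interface{}{"title": "blog", "debug": true, "port": 8040}
	out := flatten(in)
	// The string survives, the bool and the int are both reported as "".
	fmt.Printf("%q %q %q\n", out["title"], out["debug"], out["port"])
}
```

Returning map[string]interface{} instead, as suggested above, would avoid the assertion altogether.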
TheMuewow13:43
fwereadedimitern, the confusing range of configgy/settingsy types with their selection of arbitrarily different rules is deeply depressing to me13:43
dimiternfwereade, if it's only that, it's easy enough to fix the API13:43
fwereadedimitern, bad luck for getting caught up in it (and I probably reviewed it too :/)13:44
fwereadedimitern, I believe so13:44
fwereadedimitern, we did release with the uniter api active, didn't we?13:44
dimiternfwereade, we did13:45
fwereadedimitern, still, upgrading the return type won't actually hurt13:45
fwereadedimitern, or will it13:45
fwereadedimitern, what happens if we try to deserialize a map[string]interface{} with mixed values into a map[string]string?13:45
dimiternfwereade, it ignores non-strings?13:46
fwereadedimitern, that'd be nice, and I think it might, but we should check13:46
dimiternfwereade, I mean - non-strings get empty string values13:47
fwereadedimitern, that would mean behaviour wouldn't change13:47
dimiternfwereade, I can do a CL that changes the result of ConfigSettings() to params.ConfigResults (new type - like SettingsResults, but with params.Config instead)13:48
fwereadedimitern, can we give them explicit ConfigSettingsResults and RelationSettingsResults names please?13:49
fwereadedimitern, and name the types they use ConfigSettings and RelationSettings13:49
dimiternfwereade, well, ConfigResult is used by the provisioner actually, for environ config result13:49
rogpeppe1dimitern, fwereade, TheMue, natefinch, mgz, jam: environment file extension: anyone want to weigh in? https://codereview.appspot.com/1396904313:50
dimiternfwereade, we can change these, but that means even more api incompatibility13:50
fwereadedimitern, type names are arbitrary, aren't they? where's the incompatibility?13:50
fwereadedimitern, field names are a problem13:50
dimiternfwereade, protocol on-the-wire might change?13:50
dimiternfwereade, or not, ok13:51
fwereadedimitern, if they suck we just have to eat it up and hope we learn from our mistakes :)13:51
dimiternfwereade, next CL will be about that then13:51
fwereadedimitern, I think it's even more important than the secret-masking tbh13:51
fwereadedimitern, this is a pretty devastating regression13:52
TheMuerogpeppe1: reviewed13:53
dimiternfwereade, I'm done with the provisioner for now - submitted the first for landing, the second one is next, and while waiting I'll tend to the uniter13:53
* fwereade throws flowers before dimitern's path13:54
natefinchI like jenv because if we decide we don't like yaml anymore, we can put something else in there.  I do sorta have a hatred for prefixing things with j, just due to an inordinate amount of time exposed to java crap13:54
* natefinch isn't bitter though...13:55
dimiternfwereade, (if we ask Captain Hindsight for advice it'll be:) we would've caught this if we had tests for non-string settings13:55
fwereadethank you, Captain Hindsight!13:55
fwereadedimitern, perfectly correct13:56
dimiternfwereade, so I'll look about adding some13:56
fwereadedimitern, stick to local unit tests for the bit you change, for now, please -- I consider this critical and don't want to release with it *again* ;p13:58
fwereadedimitern, changing the uniter tests to exercise it may be noisy13:58
fwereadedimitern, they must ofc be done but they'll delay landing the fix13:59
dimiternfwereade, ok13:59
fwereadedimitern, that said, hmm, how do we test in the api?14:00
smoserhey14:01
smoserlooking at https://codereview.appspot.com/13962043/14:01
fwereadedimitern, if we use wordpress' config settings14:01
smoserrather than disabling certificate checking ...14:01
smoserwouldn't it be better to add the certificates ?14:01
smoserit seems juju would know them.14:01
smosercloud-init has config that explicitly allows adding certificates that should then be accepted.14:02
smoserhazmat, ^ ?14:02
fwereadejam, smoser makes an interesting point ^14:02
fwereadedimitern, anyway: if we are using wordpress as the "standard" testing charm14:03
smoserhttp://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/doc/examples/cloud-config-ca-certs.txt14:03
dimiternfwereade, I have some simple charms I can use14:04
fwereadedimitern, we should probably just add all config types to it and so gently encourage people testing to actually check them all14:04
fwereadedimitern, you may find that the uniter is tightly coiled around the fake wordpress charm14:04
fwereadedimitern, but, eh, that's the next branch anyway, I'll stop distracting you14:05
fwereadesmoser, I think the encompassing issue may be that some clouds don't even have certs configured14:06
smoseris that possible ?14:06
smoserignorance being exposed....14:06
smoserbut when i go to some https site with firefox14:07
smoserit says "Hey, this doesn't look right".  You want to get the certificate and trust it ?14:07
fwereadesmoser, I have only second-hand "knowledge", inferred from the conversations of those who know more than me14:07
smosercan't juju client just do the "get the certificate" bit. and then launch instances with that.14:07
fwereademgz, IIRC you were doing ugly things to induce certificate errors recently I recall -- did I misunderstand your saying you'd been removing the certs temporarily and things had still worked?14:08
dimiternfwereade, the fix is done, testing now14:34
mgzfwereade: jam had done testing along those lines, but only for the client side so far I think (as it's harder to screw up the certs on a booted node and check that works)14:38
jammgz, fwereade: I ssh'd into the node and messed up the certs for testing the patch I proposed.14:43
jamfwereade, smoser: While I like adding the functionality to allow a new known cert, I don't think it has the same user impact14:43
jambecause digging up the cert and adding it to the config is far more complex than just shoving a "false" in there when you are testing.14:43
jamso I'd be happy to add support for custom certs14:43
jambut I think we still need the "disable" ability14:44
smoserjam, not necessarily14:45
smosersee my comment about firefox above14:45
smoserfirefox basically allows me to say 'false' for checking of that server. and it does the rest.14:45
smoseri've actually done this once before on a project for explicitly this reason. i figured out how firefox did what it does... how it gets the certificate and did that. and inserted that certificate.14:46
smoseri do see the point about this being "testing" and that https is likely only used without certificates on "test" scenarios.14:47
rvbamgz: just one question about the tag solution: if you upgrade a juju deployment that was created before we used the tags and then use a version of juju which uses the tags to filter out machines, your deployment will be broken.  What's the policy to solve that kind of upgrade problems in juju?14:48
mgzhm, good question14:50
mgzthat would be the case with either solution14:50
rvbaTrue.14:51
mgzwe could use compat code that detected the hey, no tag named after our environment, assume old behaviour of all machines are ours14:51
mgzbut that may not be the best way14:52
fwereadervba, mgz: we are getting closer to sanity for upgrades, but there's little so far14:52
rvbamgz: that seems like the only solution14:52
fwereadervba, I was tending towards mgz's suggestion myself... it's bad but I don't see alternatives14:52
rvbaWell, another solution is to have juju detect that there is no tag, and then create it and attach all the nodes it knows about to it.14:53
mgzwe'd need to be doubly sure that destroy-environment *twice* wouldn't then go and delete all maas nodes anyway14:54
fwereadervba, mgz, tag only the machines that have instance ids assigned in state?14:54
mgzbecause hey, the second time there's no tag named after our env, so everything must be ours, so wipe it...14:54
rvbafwereade: yes14:54
rvbamgz: the second time, no machine id will be stored, so no machine removed.14:55
fwereadervba, mgz: that can't happen automatically within the environment though14:55
mgzit seems an easy enough disaster to avoid14:56
fwereadervba, mgz: yeah -- axw has a lot on his plate right now but he seems enthusiastic about doing the long-overdue upgrade stuff in the near future14:57
rvbamgz: maybe the first solution (explicitly supporting the old behavior) is simpler after all.15:00
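A rough sketch of the compat fallback and of the destroy-twice hazard raised above. The `Node` type, the `ownedNodes` function and the tag name are all hypothetical; nothing here is the MAAS provider's real API.

```go
package main

import "fmt"

// Node is a hypothetical, cut-down view of a MAAS node.
type Node struct {
	ID   string
	Tags []string
}

func hasTag(n Node, tag string) bool {
	for _, t := range n.Tags {
		if t == tag {
			return true
		}
	}
	return false
}

// ownedNodes returns the nodes tagged for the environment. If no node
// carries the tag, it assumes a pre-tag deployment and falls back to the
// old behaviour of treating every node as ours.
func ownedNodes(nodes []Node, envTag string) []Node {
	var tagged []Node
	for _, n := range nodes {
		if hasTag(n, envTag) {
			tagged = append(tagged, n)
		}
	}
	if len(tagged) == 0 {
		return nodes // legacy fallback: this is the dangerous branch
	}
	return tagged
}

func main() {
	nodes := []Node{{ID: "a", Tags: []string{"env-test"}}, {ID: "b"}}
	first := ownedNodes(nodes, "env-test")

	// After the first destroy removed node "a", no tagged node is left,
	// so a second destroy would claim someone else's node "b".
	second := ownedNodes([]Node{{ID: "b"}}, "env-test")
	fmt.Println(first[0].ID, second[0].ID)
}
```

Any real fix would need an extra signal (for example a recorded "tags in use" marker) so that the fallback cannot fire after a tagged environment has been destroyed.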
rvbafwereade: out of curiosity, why doesn't juju itself keeps track of the machines it owns?15:02
rvbakeep*15:02
fwereadervba, it does -- but Destroy is entirely internal to the environment, which is itself expected to keep track of its own machines and differentiate between those in and out of the environment15:03
fwereadervba, it would indeed be possible to have written it such that juju had to specify all the instances it knew about15:04
dimiternanyone seen this local provider error: http://paste.ubuntu.com/6159055/15:04
fwereadervba, but I think that would make it very hard for juju to effectively reap instances that it needed to itself15:04
dimiternit used to work fine a week ago15:04
dimiternloaded invalid environment configuration: storage-port: expected int, got 804015:05
fwereadedimitern, that looks kinda like an int has been inappropriately coerced to a string somewhere, doesn't it15:06
rvbafwereade: I don't want to bother you with that, but I don't really understand.  If juju has the list of all the machine it owns, it can pass it to the environment when destroying it.15:06
rvbamachines*15:06
dimiternfwereade, it does15:06
rvbaBut that's not the way it works now so we have to fix the MAAS provider anyway :).15:07
fwereadervba, if we start an instance but fail to record it against a machine, we want to automatically trash that instance15:07
rvbafwereade: hum, I see.15:08
fwereadervba, I will try to make the situation clearer than it currently is in the writing-a-provider doc I'm working on15:08
rvbaCool15:08
rogpeppe1mgz: what's the status of the VPC-only bug?15:17
hazmatmgz, if you read the bug report, it states in the description how to get enabled with that on an existing account15:22
hazmathttps://bugs.launchpad.net/juju-core/+bug/122186815:22
_mup_Bug #1221868: juju broken with ec2 and default vpc <juju-core:Confirmed for gz> <https://launchpad.net/bugs/1221868>15:22
hazmatit took about 2 biz days15:22
dimiternfwereade, ping15:26
fwereadedimitern, pong15:26
dimiternfwereade, how do you suggest to live test that thing? so far I tried ec2 live testing and calling juju set svc flag=True, calls config-changed in a debug hooks session and config-get shows it as expected15:27
fwereadedimitern, that sounds solid15:27
fwereadedimitern, but that local provider thing is really alarming15:27
dimiternfwereade, I'll check on trunk to see if it's my branch or it's broken15:28
fwereadedimitern, thanks15:28
TheMueah, tests pass15:31
dimiternfwereade, same effect in trunk15:37
natefinchargs... couple annoying bugs in goyaml..... unmarshaling "" into a *string makes the string nil  (not an empty string), and unmarshalling [] into a slice gives you a nil slice (not an empty slice).  PITA15:37
dimiternfwereade, so the local provider was broken earlier15:37
* fwereade freaks out at dimitern but wants to chat to nate for a moment15:37
fwereadenatefinch, that's annoying15:38
natefinchfwereade: yeah, we already had one workaround in constraints15:38
fwereadenatefinch, I'm sure there was a similar bug with goyaml in the past15:38
natefinchfwereade: yeah, we had  to set up a whole SetYAML method because the containertype was getting unmarshaled as nil instead of empty.15:40
fwereadenatefinch, ouch -- do you know if there's a goyaml bug for that?15:40
dimiterncan someone else try bootstrapping a local environment from trunk and deploying anything, to see if all-machines.log shows this error http://paste.ubuntu.com/6159055/15:40
natefinchfwereade: didn't look like it when I perused the bug list (only 13 bugs listed)15:41
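To illustrate why the nil-versus-empty distinction matters in Go at all, here is a small sketch using the standard encoding/json package rather than goyaml, so it does not reproduce the goyaml behaviour described above; it only shows that the two values are observably different once serialized.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode returns the JSON encoding of a string slice.
func encode(s []string) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	var nilSlice []string     // declared but never allocated
	emptySlice := []string{} // allocated, zero length

	fmt.Println(nilSlice == nil, encode(nilSlice))     // true null
	fmt.Println(emptySlice == nil, encode(emptySlice)) // false []
}
```

A decoder that collapses both cases into nil therefore loses information that a later encode step cannot recover.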
dimiternTheMue, rogpeppe1, jam, mgz  ^^ ?15:44
dimiternand please make sure you did go install . in cmd/juju and jujud/, and use --upload-tools on bootstrap15:45
mgzhazmat: thanks, I'm just not certain I want to do that on the shared bzr account, how disruptive was it for you?15:49
dimiternfwereade, there's the fix https://codereview.appspot.com/1390804415:53
hazmatmgz, seamless, just pick a region you're not using15:54
hazmatmgz, you have to clear out ec2 resources in that region (ie no running instances, also good to clear out groups)15:54
mgzah, that does seem good15:54
hazmatmgz, so i take it then there hasn't been any progress on this? we really need it for 1.16..15:55
hazmati ran into two users last week, who couldn't use juju on ec2..15:55
natefinchfwereade: now there are bugs15:56
fwereadenatefinch, thanks15:56
dimiternok, so no one wants to try to reproduce the local provider issue, i'm filing a bug15:58
sinzuifwereade, do you have a revision that you want to release as 1.15.0?16:01
fwereadesinzui, I am very worried that I do not, because dimitern's problem seems pretty critical to me16:02
sinzuifwereade, okay. That's fine. Is there a bug I can track16:03
fwereadesinzui, dimitern is filing it as we speak16:03
sinzuifab. Thank you.16:04
dimiternfwereade, sinzui: there it is bug 123154316:07
_mup_Bug #1231543: upgrader startup failure with local provider <juju-core:New> <https://launchpad.net/bugs/1231543>16:07
sinzuiThank you dimitern16:09
fwereadedimitern, would you please mark that critical and start investigating? TheMue, are you on something else or can you assist reproing?16:10
dimiternfwereade, it's filed as critical16:10
dimiternfwereade, and I'm looking at it16:10
dimiternfwereade, the uniter fix is proposed already16:10
fwereadedimitern, you anticipate my micromanagement with aplomb and panache16:10
fwereadedimitern, I'm about to LGTM it I think16:11
fwereadedimitern, yep, LGTM, just one tweak needed16:12
dimiternfwereade, ok, will tend to it afterwards16:12
TheMuefwereade: can do tomorrow morning, have to reactivate the matching VM (not enough space anymore on disk)16:13
TheMuefwereade: currently I'm fighting with a called but non-existing constructor *sigh*16:15
* TheMue still will propose now, so the changes can be reviewed16:16
* fwereade is taking a short family break but will return anon16:18
TheMueshit, propose will not work with the missing function :(16:19
TheMuedimitern: i'll start to setup my testing vm now16:21
TheMuedimitern: will you note any findings in the issue so that i can support you after setup later16:22
TheMuecu later16:24
dimiternTheMue, so far I tested it happens in trunk and r1885, will go further16:24
rogpeppe1dimitern, mgz, jam, natefinch: next stage in environment info storage, reviews appreciated please: https://codereview.appspot.com/1397004316:26
rogpeppe1fwereade: ^16:26
rogpeppe1dimitern: ping16:30
dimiternok, so it doesn't happen as far as r1844, going back up16:30
dimiternrogpeppe1, pong16:30
dimiternrogpeppe1, I'm up to my elbows into the local provider atm16:30
rogpeppe1dimitern: i'm just wondering about API connections and how they can find the API addresses to store locally16:30
dimiternrogpeppe1, expand a bit please16:31
rogpeppe1dimitern: so, the plan is that when we make an API connection, we find out the current set of API addresses and store that locally in a .jenv file16:31
dimiternrogpeppe1, how about if they change after that?16:31
rogpeppe1dimitern: we refresh the cache each time we connect16:32
rogpeppe1dimitern: and fall back to environ config info if the connection fails16:32
dimiternrogpeppe1, sgtm16:32
rogpeppe1dimitern: but we need to find out the current set of API addresses so we can store them16:32
rogpeppe1dimitern: and i'm thinking of an API call that's available to anyone that can access the API that returns them16:33
jamrogpeppe1: it could be returned from Login16:33
dimiternrogpeppe1, so like a Login call16:33
rogpeppe1jam: that's an interesting idea16:34
rogpeppe1jam: i quite like that actually.16:34
rogpeppe1jam: then api.Open can cache it, so it can be retrieved by a later call16:34
rogpeppe1jam: so we don't have to change the type sig16:34
jamsomething like that, yeah16:34
rogpeppe1ah, there's a problem, i think16:35
rogpeppe1jam: i *think* that State.APIAddresses just returns the same IP addresses that mongo peers use to talk to each other16:36
rogpeppe1jam: which probably won't be public IP addresses16:36
dimiternrogpeppe1, they aren't16:36
rogpeppe1damn. i guess i'll need to fix that first16:36
dimiternrogpeppe1, but with the addresser stuff coming up it might not be needed16:37
dimiternmachine addressability16:37
rogpeppe1dimitern: go on... how does that help?16:37
dimiternrogpeppe1, machines will know their own addresses (public, private, all)16:37
rogpeppe1dimitern: go on16:38
dimiternrogpeppe1, and you can query state for them, and there will be a worker to update them as needed16:38
dimiternrogpeppe1, mgz is working on that I think for some time16:38
rogpeppe1dimitern: so to find the API addresses, you do a search for all machines with ManageState, then query their addresses?16:38
rogpeppe1s/ManageState/JobManageState/16:39
dimiternrogpeppe1, yes16:39
dimiternrogpeppe1, and for other potential new jobs we have16:39
rogpeppe1dimitern: that seems somewhat inefficient. wouldn't it be a linear scan?16:39
dimiternrogpeppe1, who needs to know?16:40
rogpeppe1dimitern: it'll happen every time someone connects to the API16:40
dimiternrogpeppe1, and currently it happens thorough the StateInfo16:40
rogpeppe1dimitern: i was thinking that we'd have a doc in mongo which held the API addresses, then some agent would maintain that16:40
dimiternthrough16:40
dimiternrogpeppe1, that might be an addition to the addressability stuff, or even orthogonal to it16:41
rogpeppe1dimitern: i think it's orthogonal, yes16:41
rogpeppe1hmm, how does a machine's public address get filled in now? by the provisioner, i guess16:42
mgzrogpeppe1: that's the idea16:42
mgznot sure what you mean by "linear scan" though16:43
rogpeppe1mgz: well, if i want to find out the addresses of all machines that are state servers, how should i do it?16:43
dimiternrogpeppe1, not really16:43
dimiternrogpeppe1, the unit's addresses are set by the uniter, but the machine addresses are taken from the environment16:44
mgzquery out machines that have the stateserver bit set in mongo, and pull the address?16:44
dimiternrogpeppe1, by the provisioner, but it doesn't set them anywhere yet16:44
rogpeppe1mgz: won't that be a linear scan through all machines?16:44
mgzhaving a seperate table with addresses of state servers doesn't *sound* faster to me16:44
mgzbut is also perfectly possible, it's just a denormalisation16:45
dimiternfwereade, I found the culprit - the issue in bug 1231543 starts to happen in r187716:46
_mup_Bug #1231543: upgrader startup failure with local provider <juju-core:New> <https://launchpad.net/bugs/1231543>16:46
rogpeppe1mgz: to me it sounds like one fetch of a document in a single document collection, versus a scan through potentially many hundreds16:46
rogpeppe1mgz: but... i think that for the time being it's probably fine16:47
rogpeppe1mgz: storing the addresses separately is an optimisation really.16:47
rogpeppe1dimitern: hmm, so the uniter API has PublicAddress and SetPublicAddress. is there any particular reason for that?16:48
dimiternrogpeppe1, the uniter sets these on startup16:49
rogpeppe1dimitern: what i mean is: why have the PublicAddress method if it's only there to pass its result to SetPublicAddress?16:50
rogpeppe1dimitern: (which also gives a compromised uniter the potential freedom to muck with its reported public address, something you probably don't want)16:51
dimiternrogpeppe1, the uniter needs both to set public/private addresses of a unit, and to read them16:51
rogpeppe1dimitern: why's that?16:52
dimiternrogpeppe1, the addresses shouldn't be on a unit at all - they should be on a machine, but that's that16:52
rogpeppe1dimitern: i'm wondering about an API call, say Start, which informs the API that the uniter has started16:52
dimiternrogpeppe1, because public-address is one of the relation settings set automatically when entering scope for example16:52
rogpeppe1dimitern: ah, good point, so we need PublicAddress16:53
dimiternrogpeppe1, the API very well knows when the unit agent connects, and starts a pinger now16:53
rogpeppe1dimitern: in that case, that's probably the moment that the public and private addresses should be set16:53
dimiternrogpeppe1, perhaps, if we're not using a separate worker for that16:54
dimiternrogpeppe1, and setting them on the machine, not on the unit16:54
rogpeppe1dimitern: yeah16:54
rogpeppe1dimitern: but the point is that we could remove that stuff from ModeInit, i think16:55
rogpeppe1dimitern: hmm, except not right now of course16:55
rogpeppe1dimitern: because it really does get the public address from the provider16:56
rogpeppe1dimitern: ok, ignore my stupidity16:56
mgzI've added an explanation to bug 1227533 about our memory woes the last week16:56
_mup_Bug #1227533: Juju fails to bootstrap if memory is lower than 1GB <juju-core:Triaged> <https://launchpad.net/bugs/1227533>16:56
mgznow I must depart, farewell!16:56
rogpeppe1mgz: one mo, please?16:56
mgzone mo while I close things :)16:57
dimiternrogpeppe1, there's a todo about it in mode init16:57
rogpeppe1mgz: kapil was asking about the status of the VPC-only bug...16:57
rogpeppe1dimitern: yeah, i understand that now :-)16:57
dimiternrogpeppe1,  ...and a few other places, and there's the tech-debt bug 120537116:58
_mup_Bug #1205371: state.Addresses and APIAddresses need better implementation <juju-core:In Progress by gz> <https://launchpad.net/bugs/1205371>16:58
rogpeppe1dimitern: hmm, so there's no way of finding out a machine's public address currently unless it has a unit on it?16:58
mgzrogpeppe1: it's the next on my list, but haven't started yet, saw his comments earlier16:58
rogpeppe1mgz: ok, cool16:58
mgzwill tackle the registration stuff at least tomorrow16:58
mgzokay, now must fly16:58
* dimitern is totally puzzled how r1877 could lead to that local provider issue17:00
rogpeppe1i'd love a review of  https://codereview.appspot.com/13970043/ if anyone has a little time17:22
natefinchrogpeppe1: I can take that17:22
rogpeppe1natefinch: ta muchly17:23
rogpeppe1natefinch:17:23
fwereadedimitern, thanks, I will meditate upon 187717:33
fwereadedimitern, "The simplestreams tools metadata includes a sha256..."?17:34
natefinchrogpeppe1: what's the difference between  done := make(chan struct{})17:35
natefinch go func() { info.BootstrapConfig(); done <- struct{}{} }()17:35
natefinch<-done17:35
natefinchand just calling info.BootstrapConfig() in the current goroutine?  They both just block waiting for bootstrapconfig to finish, right?17:35
rogpeppe1natefinch: ha, there is a subtle difference, but it's just a debugging remnant17:35
rogpeppe1natefinch: i'll revert it17:35
rogpeppe1natefinch: 2 points if you can tell me why i did it :-)17:35
natefinchrogpeppe1: if you had a panic in bootstrap config it would make the call stack a lot shorter17:36
rogpeppe1natefinch: close17:36
natefinchrogpeppe1: could be something to do with the scheduler, but that seems too subtle to matter17:38
rogpeppe1natefinch: nah17:38
rogpeppe1natefinch: it's to do with gocheck17:38
rogpeppe1natefinch: if you panic, then gocheck catches it and distorts things17:38
rogpeppe1natefinch: so by panicking in a goroutine you get a much cleaner idea of what's going on at that moment17:38
natefinchahh ok17:39
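The trick above can be sketched as follows. A deferred recover only catches panics raised in its own goroutine, so a panic inside the spawned goroutine is not intercepted by a test framework's recover in the caller and instead crashes the process with that goroutine's own stack. `runInGoroutine` is a hypothetical helper name; the body follows the snippet quoted in the chat.

```go
package main

import "fmt"

// runInGoroutine runs f in a new goroutine and blocks until it returns.
// If f panics, the caller's deferred recover (for example the one a test
// runner installs) cannot catch it, because recover is per-goroutine.
func runInGoroutine(f func()) {
	done := make(chan struct{})
	go func() {
		f()
		done <- struct{}{} // signal completion to the waiting caller
	}()
	<-done
}

func main() {
	ran := false
	runInGoroutine(func() { ran = true })
	// The channel receive orders the write to ran before this read.
	fmt.Println(ran)
}
```

For the non-panicking case this behaves exactly like calling f directly, which is why it is only useful as a temporary debugging aid.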
natefinchrogpeppe1: I presume you'll take out the log messages in there as well17:40
rogpeppe1natefinch: yes17:40
natefinchk\17:40
natefinchrogpeppe1: btw, is "erewhemos" someone misspelling "somewhere" backwards, or something that actually makes more sense?17:42
rogpeppe1natefinch: the former :-)17:42
dimiternrogpeppe1, sweet! i'll remember that trick next time i'm fighting tests panic17:42
natefinchrogpeppe1: ha, ok.  I thought so, but you never know17:43
rogpeppe1natefinch: just a nonsense name that's unlikely to be confused with anything in the production code17:43
fwereadenatefinch, I'm sorry about that, there was a satirical work by samuel butler called "erewhon" which is not *quite* "nowhere" backwards17:43
fwereadenatefinch, it seemed like a good idea at the time17:43
dimiternfwereade, yes that's what i found so far17:43
fwereadedimitern, just to be crystal clear: 1876 works, 1877 does not?17:44
rogpeppe1fwereade: we're in the *distopia* right?17:44
rogpeppe1dystopia, sorry17:44
natefinchfwereade: haha, ok. not up on my Victorian authors17:44
fwereaderogpeppe1, heh17:44
dimiternfwereade, that's what I see, but I'll double check, just a minute17:44
dimiternfwereade, indeed17:49
dimiternfwereade, and the error now makes sense 2013-09-26 17:48:00 ERROR juju runner.go:211 worker: exited "upgrader": cannot set agent tools for machine 0: empty size or checksum17:49
dimiternfwereade, but, interestingly the coercing error is not there in 187717:51
rogpeppe1natefinch: still waiting for that review, BTW :-)18:02
natefinchrogpeppe1: still doing it. Had to stop in the middle for a little bit.  Almost done :)18:03
rogpeppe1natefinch: np18:03
dimiternfwereade, so the other error starts to show in my r1884 that switches to api provisioner18:08
rogpeppe1fwereade: do you know what stage mgz is at with the addressing stuff?18:13
natefinchrogpeppe1: done18:13
rogpeppe1fwereade: i just started hacking up the publisher/addresser worker, then realised that he might already have done/nearly done it18:13
rogpeppe1natefinch: thanks18:13
fwereaderogpeppe1, I'm afraid I do not actually know, i was kinda expecting a CL from him today18:18
rogpeppe1fwereade: i need that, or something like it, to cache the API addresses18:19
fwereadedimitern, ah ok18:19
fwereadedimitern, so the upgrader thing appears to be a problem18:19
rogpeppe1fwereade: this is the sketch of the code i just wrote: http://paste.ubuntu.com/6159815/18:19
dimiternfwereade, yeah18:19
rogpeppe1fwereade: oops, this is better: http://paste.ubuntu.com/6159817/18:20
fwereadedimitern, I thought all we were meant to be setting was a version, not a whole tools18:20
dimiternfwereade, and the other thing - it doesn't seem to be an int coerced to string, it's an int - I debugged so far as to say the provisionerAPI returns the correct map[string]interface{} in worker/WaitForEnviron18:21
fwereadedimitern, oh, ffs, is it possibly a json problem? definitely an int and not a float?18:21
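A sketch of the float suspicion raised here, assuming the config travels through Go's standard encoding/json into a generic map (whether that is what actually happens to storage-port is not confirmed in this log): JSON numbers decoded into interface{} come back as float64, so a later check for int fails even though the value is still 8040. `roundTrip` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// roundTrip marshals a config map to JSON and decodes it back into a
// generic map, roughly as an RPC layer might.
func roundTrip(cfg map[string]interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTrip(map[string]interface{}{"storage-port": 8040})
	if err != nil {
		panic(err)
	}
	_, isInt := out["storage-port"].(int)
	_, isFloat := out["storage-port"].(float64)
	// The value went in as int and came back as float64.
	fmt.Printf("%T isInt=%v isFloat=%v\n", out["storage-port"], isInt, isFloat)
}
```

That would be consistent with an error of the shape "expected int, got 8040", where the printed value looks like an int but has a different dynamic type.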
fwereaderogpeppe1, sorry, I have only skim-read it, but I think it may well have overlap18:23
dimiternfwereade, trying to see exactly what now18:23
rogpeppe1fwereade: yeah, if he's doing an addresser worker, it almost certainly will18:24
rogpeppe1fwereade: well, i'll keep it around in case18:24
dimiternany idea why this error? ERROR juju.provider.local environ.go:482 could not install machine agent service: exec ["start" "juju-agent-dimitern-local"]: exit status 1 (start: Job is already running: juju-agent-dimitern-local)18:24
rogpeppe1time to stop for the day18:25
fwereadedimitern, aw hell, that really should be fixed for 1.16 too, we don't seem to shut down local envs cleanly18:26
dimiternfwereade, hmm - we *are* stopping them, but the upstart job remained and it thought "because it's there, it must be running"18:27
fwereadedimitern, looks like we're calling StopAndRemove though18:28
dimiternfwereade, hmm.. it gets deeper18:30
dimiternfwereade, so now the upstart job hangs18:30
dimiternfwereade, that's why the bootstrap doesn't complete and I terminated it18:30
rogpeppe1g'night all18:31
rogpeppe1might be back later, actually18:31
fwereaderogpeppe1, see you soon18:31
fwereadedimitern, "cannot install, already running" seems to imply that it really was running18:32
fwereadedimitern, and was thus not properly cleaned up18:32
dimiternfwereade, believe me, ps xa | grep juju was the first thing I did - no results, even as root18:33
dimiternfwereade, just the upstart job was there18:33
fwereadedimitern, very strange18:33
dimiternfwereade, so the mongo hangs at bootstrap18:38
dimiternfwereade, and that fails the whole thing18:38
dimiternfwereade, it's indeed running now, and the error is correct18:38
fwereadedimitern, ok, so we have *some* sort of poorly characterized local provider cleanup problem18:39
dimiternfwereade, and even upstart believes jujud job is running18:39
dimiternfwereade, and I can't see it18:39
fwereadedimitern, and a clear current issue: that we're recording full agent tools including hashes for no clear reason, when all we really care about is the binary version they're running18:40
fwereadedimitern, concur>18:40
dimiternfwereade, not sure I get you there18:42
fwereadedimitern, so the problem seems to be that we're setting *tools* on the agent, rather than just setting the binary version which is all anyone cares about AFAIK18:43
fwereadedimitern, and we can't set tools because we didn't record the hash we downloaded and verified18:44
fwereadedimitern, and it seems a bit pointless to report it back to juju when juju told it to us in the first place18:44
dimiternfwereade, yes, that seems likely18:45
dimiternfwereade, I have to stop though.. lest my head explodes :/18:48
fwereadedimitern, no worries at all, you are already above and beyond18:48
fwereadedimitern, is there a specific bug for the tools issue?18:49
dimiternfwereade, don't know18:49
dimiternfwereade, I added the one for the upgrader, but this seems unrelated18:49
fwereadedimitern, the upgrader was what I meant by the tools issue18:50
dimiternfwereade, bs, actually the upgrader error is about tools, the other errors were different18:50
dimiternfwereade, :)18:50
fwereadedimitern, I think there is one for screwy local-env destruction18:50
dimiternfwereade, maybe18:51
rogpeppe1fwereade: the point of setting tools on the agent was so that it was possible to make available that information in the status, so you could know exactly what s/w was running on each machine19:25
fwereaderogpeppe1, ok, so we *should* have to record and write into the tools dirs the hashes of the original tarballs?19:27
rogpeppe1fwereade: yes19:27
fwereaderogpeppe1, I don't really see how that helps anyone19:29
rogpeppe1fwereade: when debugging stuff it means you have an unambiguous record of what is being run where, which i *think* could be very useful at times19:30
rogpeppe1fwereade: for reproducibility and diagnosis of difficult issues in a highly distributed environment19:30
rogpeppe1fwereade: and i don't really see why it should be a hard thing to do, though i haven't read through the discussion above, so i don't know what the current issue is19:31
fwereaderogpeppe1, it looks like we're barfing when calling SetAgentTools because the tools in state now demand a hash19:32
rogpeppe1fwereade: and you can't have a Tools with an empty hash?19:32
fwereaderogpeppe1, apparently not19:32
fwereaderogpeppe1, it seems to be demanding that if there's a URL, there must be a size and checksum19:33
rogpeppe1fwereade: oh yes, checkToolsValidity19:34
fwereaderogpeppe1, but not barfing if there's no URL19:34
fwereaderogpeppe1, when I *thought* we always wrote a URL19:34
fwereaderogpeppe1, but ofc do not necessarily have the original tgz available and so can't always manage size/hash19:35
fwereaderogpeppe1, (not that we do, even when we do, AFAIK -- maybe that changed somewhere?)19:35
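[Editor's sketch] The rule described above (a tools record with a URL must also carry a size and checksum, while a record with no URL is accepted) could look roughly like the following. The Tools struct, its field names, and validateTools are hypothetical stand-ins for illustration, not the actual juju-core checkToolsValidity code.

```go
package main

import (
	"errors"
	"fmt"
)

// Tools is a hypothetical record of agent tools; field names are assumptions.
type Tools struct {
	Version string
	URL     string
	Size    int64
	SHA256  string
}

// validateTools is a hypothetical check mirroring the rule in the chat:
// no URL means nothing further is required, but a URL demands both a
// size and a checksum.
func validateTools(t Tools) error {
	if t.URL == "" {
		return nil
	}
	if t.Size == 0 || t.SHA256 == "" {
		return errors.New("tools with a URL must have size and checksum")
	}
	return nil
}

func main() {
	cases := []Tools{
		{Version: "1.15.0"},                                             // version only
		{Version: "1.15.0", URL: "https://example.invalid/tools.tgz"},   // URL, no hash
		{Version: "1.15.0", URL: "https://example.invalid/tools.tgz", Size: 42, SHA256: "abc"},
	}
	for _, t := range cases {
		fmt.Printf("url=%q -> %v\n", t.URL, validateTools(t))
	}
}
```

Under this rule, an agent that reports only its version passes, which matches the observation that the check does not complain when the URL is absent.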
rogpeppefwereade: sorry, computer just crashed19:42
rogpeppefwereade: last thing i was was "it seems to be demanding that if there's a URL, there must be a size and checksum"19:43
rogpeppes/was/saw was/19:43
natefinchsigh.... goyaml doesn't differentiate between nil slices and empty slices :/20:26
=== sidnei` is now known as sidnei
wallyworldfwereade: hiya, saw the email about the error, i can take a look22:22
fwereadewallyworld, tyvm22:22
wallyworldany clues to get me started? i see a few comments in the bug22:22
wallyworldcould it be related to the env split up?22:22
thumpergrr22:25
thumperI have the upgrader constantly bouncing22:25
thumperany one else noticed?22:25
thumperwallyworld: fwereade: ??? http://paste.ubuntu.com/6160651/22:25
fwereadethumper, https://bugs.launchpad.net/juju-core/+bug/123154322:26
_mup_Bug #1231543: upgrader startup failure with local provider <regression> <juju-core:In Progress by dimitern> <https://launchpad.net/bugs/1231543>22:26
fwereadethumper, wallyworld is looking at it now dimitern has I think stopped22:26
wallyworldthumper: that error looks like tools checksum is failing to be calculated22:26
thumperkk22:27
thumperI'm trying to chase the lxc issues22:27
wallyworldfwereade: thumper's error message mentions checksums, whereas bug says something about ports22:27
fwereadewallyworld, that is also a problem22:27
wallyworldyeah, so  issues \o/22:28
wallyworld222:28
fwereadewallyworld, but the tools checksum is easier to get a handle on and isolate22:28
wallyworldthe tools one is my fault22:28
wallyworldif i can't easily find it i can just disable the checksum check for now22:28
fwereadewallyworld, so do we now write out size/sha256 into the tools dir when we unbundle?22:28
wallyworldwe do22:28
wallyworldbut for some reason the checksum is not getting passed down the api22:29
fwereadewallyworld, I bet we just miss it in the local provider then22:29
fwereadewallyworld, or is it happening everywhere?22:29
wallyworldit could be that the tools are being read from the old place which means no checksum22:29
fwereadewallyworld, although, hmm, yeah exactly22:29
wallyworldfwereade: i tested bootstrapping on ec2, hp etc with the new stuff and it works22:29
fwereadewallyworld, I'm a little sceptical about the value of recording all that in state anyway22:29
fwereadewallyworld, cool22:30
wallyworldfwereade: we recorded the url in state, from which a tools struct is made. and that tools struct is used to find a tools tarball. so it needs the checksum22:31
fwereadewallyworld, we only ever call SetAgentTools in code that has already been extracted from the tarball in question22:32
wallyworldfwereade: i'll have to re-read the code - what do we use the agent tools stored in state for? the tools info from SetAgentTools?22:33
fwereadewallyworld, not much22:33
thumperfwereade: we should get around to fixing the tools for the local provider22:34
wallyworldso i could drop the checksum requirement. i thought it was needed somewhere, can't recall though22:34
fwereadewallyworld, that said, minimal changes good, I am not encouraging you to rewrite and would most favour a simple tweak to the local provider that made sure it wrote its tools dir properly22:34
thumperrather than the upload-tools malarky we do now22:34
fwereadethumper, oh, god, yes we should22:34
thumperfwereade: however I'm not sure what the best way is22:34
fwereadethumper, I'm quite sure we can harmonise it with all the simplestreams stuff22:35
thumperI hope so22:35
wallyworldfwereade: when you say "not much" - is there a simple explanation of why we store the tools url and version in state?22:35
fwereadewallyworld, the version we need for status22:36
fwereadewallyworld, series is duplicated, a machine should already know its own series22:36
wallyworldwhy the url?22:36
fwereadewallyworld, and for that matter arch should always be in hardware characteristics too22:36
wallyworlddo we ever use the url to fetch tools?22:37
wallyworldif not, i can drop the need for insisting on checksum22:37
fwereadewallyworld, I was asking rogpeppe -- I hope I am not mischaracterising him to say that it's there just in case it turns out to be useful one day22:37
fwereadewallyworld, SetAgentTools is, as far as I'm aware, purely a record of what the agent reports itself to be running22:38
wallyworldwell22:38
fwereadewallyworld, url and checksum and size are not, I think, exposed anywhere22:38
wallyworldnot sure i agree with recording all that extra info just to report a version22:38
fwereadewallyworld, all that detail in (once) state.Tools would have been great if we'd ever stored an environment's available tools in state22:39
wallyworldfwereade: would you object if i zero out url and checksum in set agent tools22:39
wallyworldif we have a url and not the checksum, that is not something we should encourage22:40
fwereadewallyworld, because then we could just grab the tools for a particular machine with a trivial query, get the url and size and checksum, and hand them straight over22:40
fwereadewallyworld, well22:40
wallyworldor i could find out why checksum is missing22:40
fwereadewallyworld, the url really just indicates "this is where we got them from"22:40
wallyworldok, i'll see how it pans out. for the release, where we need something done, it may just be easier to drop the mandatory checksum requirement22:41
wallyworldand fix next week22:41
fwereadewallyworld, indeed, if that's what it comes to then so be it22:41
wallyworldcause the other issue sounds more tricky22:42
thumperwallyworld, fwereade: I'll look at the port int issue22:42
fwereadethumper, <322:42
thumperwallyworld: if you want to tackle the checksum thing22:42
wallyworldyes indeed22:42
thumperheh, interesting,22:43
wallyworldfwereade: i'm also part way through ripping out all legacy tools support - that will need to be landed after 1.15 when all clouds have had simplestreams tools uploaded by the release team22:43
thumperI can see from the rpc logging that the value is being sent through as an int22:43
* thumper digs22:43
thumperwhat the actual fuck...22:44
fwereadewallyworld, awesome news22:45
fwereadethumper, that sounds less awesome22:45
* thumper just digging22:45
* wallyworld needs a coffee22:45
wallyworldthumper: how do i reproduce  your issue?22:56
thumperwallyworld: all I did was bootstrap the local provider22:56
wallyworldok22:56
thumperI did try to deploy some things22:56
thumperbefore I checked the logs22:56
thumperso not entirely sure22:56
wallyworldnp thanks22:56
thumperbut I feel just bootstrap is enough22:56
thumperI also feel that my problem may be shadowing yours22:57
wallyworldshould be easy to find then hopefully22:57
thumperso you might not get yours fixed22:57
thumperuntil mine is22:57
wallyworldlet's find out22:57
thumperhmm...23:07
thumperI think I know what it is, but it is weird23:07
thumperand not sure why it hasn't broken before this23:07
thumperif it is what I think it is23:07
* fwereade wants to watch, but is going to bed instead23:08
fwereadegn all23:08
wallyworldfwereade: night23:10
wallyworldthumper: i found the spot where SetAgentTools was passing in incomplete tools23:10
thumperwallyworld: cool, I've found out where the validate is failing, but unsure as to why23:11
wallyworldbut i'm not sure i have the size and checksum info at that point to pass in also23:11
wallyworldit really is just passing in a version wrapped in a tools struct which seems silly23:11
wallyworldthumper: ah, actually i think when local provider starts up, the tools hack it uses might not be recording the checksum etc, so when that info is read back later, it is missing23:14
* wallyworld is guessing23:14
thumperhow do I get the type of something printed out?23:14
wallyworld%T23:14
wallyworldfmt.Println("%T", thing)23:14
wallyworldPrintf23:14
thumperstabby!!!!!!!!!!!!!!!!!!123:16
thumpererror used to be :  storage-port: expected int, got 804023:16
thumperadded type info23:16
thumperguess what?23:16
thumperstorage-port: expected int, got float64(8040)23:16
thumperthis is why it is failing23:16
thumperFFS23:16
thumperis it because json serialization only has float64?23:17
thumperhow do we fix this in a non sucky way?23:17
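[Editor's sketch] The guess above is correct for Go's standard library: when encoding/json decodes into an interface{} value, every JSON number becomes a float64. The minimal program below demonstrates this with the %T verb mentioned earlier. The helper name decodePort and the JSON document are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodePort (hypothetical helper) unmarshals a JSON object into a generic
// map and returns the "storage-port" value exactly as encoding/json made it.
func decodePort(doc string) (interface{}, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &cfg); err != nil {
		return nil, err
	}
	return cfg["storage-port"], nil
}

func main() {
	v, err := decodePort(`{"storage-port": 8040}`)
	if err != nil {
		panic(err)
	}
	// %T prints the dynamic type, %v the value.
	fmt.Printf("storage-port: %T(%v)\n", v, v) // prints "storage-port: float64(8040)"
}
```

Decoding into a struct with an int field, or using json.Decoder with UseNumber, avoids the float64 conversion, but neither applies when configuration travels as a generic map.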
* thumper wonders how the api port is handled23:18
* thumper digs23:18
thumperstabby stabby23:18
thumperthe difference is:23:18
thumperschema.Int23:18
thumpervs23:18
thumperschema.ForceInt23:18
thumperguess which is which?23:18
thumperhuh?23:20
thumperI change it now I get a panic23:20
wallyworldthumper: you need a custom json demarshaller i think23:22
wallyworldfor the struct23:22
thumperno, found it23:22
thumperyou wouldn't believe it if I told you23:22
thumperwell, you might23:22
thumperschema.Int -> int6423:22
thumperschema.ForceInt -> int23:23
wallyworldwtf23:23
thumperok, that fixes it23:23
wallyworld\o/23:23
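[Editor's sketch] The distinction found here can be illustrated with two hand-written checkers: a strict one that accepts only integer types and returns int64, and a coercing one that also accepts float64 (as produced by generic JSON decoding) and returns int. strictInt and forceInt are hypothetical stand-ins, not the juju schema package's implementation; they only mirror the return types reported in the chat.

```go
package main

import (
	"fmt"
)

// strictInt (hypothetical) accepts only Go integer types and returns int64.
func strictInt(v interface{}) (int64, error) {
	switch n := v.(type) {
	case int:
		return int64(n), nil
	case int64:
		return n, nil
	}
	return 0, fmt.Errorf("expected int, got %T(%v)", v, v)
}

// forceInt (hypothetical) also accepts float64, truncating any fraction,
// and returns a plain int.
func forceInt(v interface{}) (int, error) {
	switch n := v.(type) {
	case int:
		return n, nil
	case int64:
		return int(n), nil
	case float64:
		return int(n), nil
	}
	return 0, fmt.Errorf("expected number, got %T(%v)", v, v)
}

func main() {
	var fromJSON interface{} = float64(8040)

	if _, err := strictInt(fromJSON); err != nil {
		fmt.Println("strict:", err)
	}

	port, err := forceInt(fromJSON)
	if err != nil {
		panic(err)
	}
	// Callers must assert the type the checker actually returns. Asserting
	// int64 on an int value fails, and panics if the ok form is not used.
	var checked interface{} = port
	_, isInt64 := checked.(int64)
	_, isInt := checked.(int)
	fmt.Println("forced:", port, "int64?", isInt64, "int?", isInt)
}
```

This also suggests why switching checkers alone could cause a panic: code written against an int64 result has to be updated to expect int.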
wallyworldthumper: save me some time - can you point me to where the local provider does its tools hacky thing to find the tools to bundle23:24
thumperit does the default --upload-tools bit23:24
thumperwhat do you mean exactly?23:24
wallyworldfor some reason, the tools struct passed to bootstrap is (i think) missing the checksum info23:25
wallyworldi need to find out how that is happening23:25
wallyworldjust working backwards to find it23:27
thumperprobably the possible tools created by the upload-tools stuff23:28
thumperat a guess23:28
* thumper proposes a couple of branches23:35
wallyworldthumper: found it, fixed, testing23:36
thumperhttps://codereview.appspot.com/14005043/  is just logging tweaks23:37
wallyworldthe local environ did not implement CustomToolsSource interface23:37
wallyworldso it did not find tools using simplestreams, and defaulted to legacy23:37
wallyworldwhich means no checksums23:37
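[Editor's sketch] The failure mode described here is a common one with optional interfaces in Go: lookup code type-asserts for an extra interface and silently falls back when the concrete type does not implement it. All names below (Environ, ToolsSourceProvider, findToolsPath, and the two environ types) are hypothetical and only illustrate the pattern, not juju-core's actual CustomToolsSource API.

```go
package main

import "fmt"

// Environ is a hypothetical minimal environment interface.
type Environ interface {
	Name() string
}

// ToolsSourceProvider is a hypothetical optional interface an environment
// may implement to advertise where simplestreams tools metadata lives.
type ToolsSourceProvider interface {
	ToolsSources() []string
}

type cloudEnviron struct{}

func (cloudEnviron) Name() string { return "cloud" }
func (cloudEnviron) ToolsSources() []string {
	return []string{"https://example.invalid/tools"}
}

// localEnviron does not implement ToolsSourceProvider.
type localEnviron struct{}

func (localEnviron) Name() string { return "local" }

// findToolsPath reports which lookup path would be taken for env.
func findToolsPath(env Environ) string {
	if p, ok := env.(ToolsSourceProvider); ok && len(p.ToolsSources()) > 0 {
		return "simplestreams (with checksums)"
	}
	// Silent fallback: nothing signals that the optional interface is missing.
	return "legacy (no checksums)"
}

func main() {
	for _, env := range []Environ{cloudEnviron{}, localEnviron{}} {
		fmt.Printf("%s: %s\n", env.Name(), findToolsPath(env))
	}
}
```

Because the fallback is silent, the missing method only shows up later as missing data, which is consistent with the checksum error surfacing far from its cause.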
thumperhttps://codereview.appspot.com/14006043 is the fix for the config23:38
thumperah23:38
* thumper goes to set commit messages in prep23:38
* thumper waits for review23:39
thumperalmost time for lunch23:39
wallyworldthumper: done with one comment23:40
thumperadded a little context23:42
thumperwallyworld: the new test failed with the expected same error output to the log file23:43
thumperchanged the schema, and all good \o/23:43
wallyworldyay23:43
wallyworldthumper: i'll be proposing a fix soon, maybe you can look after lunch23:43
thumperok23:44
* thumper is heading into town to lunch with veebers23:44
thumperwallyworld: once you review the actual fix, you can approve it23:44
thumperI'm hoping you won't find any issue23:44
* thumper -> lunch23:45
wallyworldok23:45
