[00:00] davecheney: is there a particular bug you'd like me to look at next or should I just grab one from the list?
[00:11] wallyworld_, thumper: does any of this look familiar and do you have an advice? http://pastebin.ubuntu.com/7224152/
[00:13] sinzui: and juju works fine on hp cloud?
[00:16] wallyworld_, yes
[00:16] hmmm
[00:17] wallyworld_, I just confirmed that both configs use the same keys
[00:18] sinzui: in the past, where container permissions have been wrong, the container has been created but subsequent reads failed. here we can't even create the container
[00:18] it does seem to imply a canonistack swift issue
[00:18] wallyworld_, noted. and in the past creation fails were race conditions. this say auth failure
[00:19] sinzui: have you tried the other region?
[00:20] wallyworld_, no
[00:20] sometimes that can work
[00:20] lyc01 vs lyc02
[00:20] wallyworld_, but I just checked the canonistack dashboard for /both/ accounts. The container view shows an error
[00:21] hmmm, ok
[00:21] and no joy asking in #is?
[00:22] wallyworld_, they officially defer to canonical support. I opened a ticket there 10 hours ago and no one will talk to me
[00:22] :-(
[00:23] I am tempted to send an email notifying canonistack that it will be desupported. Without working accounts, I cannot deliver the next juju to it
[00:24] agreed
[00:24] we do need an openstack deployment to test against though :-( besides hp cloud
[00:25] thumper: this fixes bug 1304132 and also removes the log noise from the critical bug alexis emailed about https://codereview.appspot.com/85770043
[00:25] <_mup_> Bug #1304132: nasty worrying output using local provider
[00:53] arosales: hazmat email sent
[00:53] with ppc segfault informatoin
[00:54] * arosales looks
[00:55] davecheney, k.. trying fresh on new ppc8..
orange box #2 vanquished
[00:55] hazmat: that would be a good data point, i only have access to wolfe and winton, which are power7
[00:56] hazmat: if you see panics on your power8 host, you should revert to that kernel I specified
[00:56] davecheney, my p7 host has been good wolfe-02.. 3.13-08
[00:57] trying on stilton-5
[00:57] hazmat: yes, that is the working kernel
[00:57] it is pre the switch to 64k
[00:57] pages
[00:57] dfc: so 3.13.0-08.28 is what we need correct?
[00:57] hazmat: are you running -08.28?
[00:57] the .28 isn't the important bit
[00:58] the -08 -18, -23 is
[00:58] i've included a link to the old kernel in the archive
[00:58] one moment.. switching tracks off maas
[01:01] dfc, gotcha, avoid -18 and -23
[01:01] davecheney, so my p7 is -> 3.13.0-8-generic
[01:01] davecheney, my p8 is -> 3.13.0-23-generic
[01:06] hazmat: right
[01:12] davecheney, so.. theory being that's an okay version? .. i'm gonna test and find out either way
[01:13] hazmat: i've tested -8, -18 and -23
[01:14] only -8, which was pre the 4k page switch can run juju stabally
[01:14] the other kernels radomly kill juju proceses with SEGVs
[01:14] solid
[01:15] davecheney, cool, thanks for tracking that down.. apparently i lucked into having at least one good demo p machine
[01:15] hazmat: yeah me too
[01:15] winton-02 is ooooooooooold
[01:15] so it was running a very old kernel
[01:15] but thumper bodie and timv hit problems
[01:17] oh man, 64k pages kill the gccgo runtime?
[01:17] somehow that's easy to believe
[01:18] mwhudson: yup
[01:18] mwhudson: tell me your thoughts
[01:18] its signal related
[01:19] somehow an invlaid signal is generated, or created, or just pops into existance
[01:19] the powerpc/kernel/signal_64.c doesn't know how to handle it, so It calls force_sigsegv
[01:19] and the userland thinks it has hit a nil pointer exception and panics
[01:19] davecheney: well i think malloc.goc has a #define PAGE_BITS 12 in it
[01:19] o
[01:19] h
[01:19] that sounds pretty messed up
[01:19] mwhudson: but why should that matter
[01:20] 12 is < 16
[01:20] davecheney: dunno
[01:20] but is a multiple
[01:20] all that happens is if you call mmap(0, 4096) you get a 64k allocation
[01:20] mwhudson: i'm logging this all in a bug now
[01:20] then i have a juju test to fix
[01:21] then i'll try to create a smaller reproduction case
[01:21] there is additional debugging in that file
[01:21] but it appears to be turned off in this build
[01:21] maybe spinning a new kernel with it enabled is the next step
[01:22] wallyworld_: back from the gym now
[01:22] davecheney: an invalid signal number is generated?
[01:22] wow
[01:22] what is the userspace doing when this signal arrives?
[01:22] chlling
[01:22] thumper: ok. i have 2 fixes for that critical bug https://codereview.appspot.com/85770043 and https://codereview.appspot.com/85750045
[01:23] so it's an async signal?
[01:23] not sure if more work is needed
[01:23] [18519.444748] jujud[19277]: bad frame in setup_rt_frame:
[01:23] 0000000000000000 nip 0000000000000000 lr 0000000000000000
[01:23] [18519.673632] init: juju-agent-ubuntu-local main process (19220)
[01:23] killed by SEGV signal
[01:23] [18519.673651] init: juju-agent-ubuntu-local main process ended, respawning
[01:23] wallyworld_: so what what going wrong?
[01:23] thumper: i'll get those landed and will have to either test or ask axw if there's anything else obvious that needs looking at
[01:24] thumper: 2 things 1.
instance poller noise due to it not ignoring unprovisioned machines
[01:24] +1 for that
[01:24] 2. bad schema def for storage-port config attr on manual provider causing provisioner startup to fail
[01:24] due to json serialisation issue
[01:24] float64 vs int and all that
[01:25] so those 2 fixes i did just by looking at logs
[01:25] i had a look at the code to see if i could relate the fixes to the actual observed issue, but didn't get far enough
[01:26] so i figured we could fire up some arm instances and test and/or ask axw for input when he comes online
[01:26] I am online
[01:26] what input do you need?
[01:26] \o/
[01:26] I have LGTM'd your two fixes
[01:26] axw: bug 1302205
[01:26] <_mup_> Bug #1302205: manual provisioned systems stuck in pending on arm64
[01:26] ok :-)
[01:26] i am not sure if my fixes are sufficient
[01:27] they are needed, but is there more to be done
[01:27] davecheney, confirmed btw re 23.. panic while doing nothing detected in the log
[01:27] hrm
[01:27] have you seen similar issues when developing the manual provider?
[01:27] wallyworld_: nope
[01:28] or maybe we just need to test with the fixes
[01:28] could yet be an arm issue i guess
[01:28] davecheney: can you run my test program from https://sourceware.org/bugzilla/show_bug.cgi?id=16629 ?
[01:28] looking at the logs now...
[01:28] hazmat: i've even seen /usr/bin/go panic while running tests
[01:28] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754
[01:28] <_mup_> Bug #1304754: gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels
[01:28] certainly that storage-port issue is pretty fatal
[01:28] arosales: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754
[01:28] <_mup_> Bug #1304754: gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels
[01:28] wallyworld_: ah, the "close of closed channel" is something rogpeppe brought up last night
[01:29] there's a bug in cmd/jujud
[01:29] not sure if he fixed it yet...
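Note on the "float64 vs int" remark at [01:24]: the log does not show the actual schema code, so the following is only a minimal sketch of the general Go behaviour that such a mismatch usually comes from. When `encoding/json` decodes into an `interface{}`, every JSON number arrives as a `float64`, so a checker that insists on an `int` will reject a value that has been round-tripped through JSON. The function name `decodePort` and the port value are made up for illustration; only the `storage-port` key comes from the conversation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodePort decodes a JSON config document into a generic map and reports
// the dynamic Go type of the "storage-port" value along with its value as int.
// (Hypothetical helper for illustration; not juju code.)
func decodePort(doc string) (string, int, error) {
	var cfg map[string]interface{}
	if err := json.Unmarshal([]byte(doc), &cfg); err != nil {
		return "", 0, err
	}
	v := cfg["storage-port"]
	typeName := fmt.Sprintf("%T", v)
	// encoding/json stores JSON numbers as float64 when the target is interface{},
	// so an int-only check would fail here and an explicit conversion is needed.
	f, ok := v.(float64)
	if !ok {
		return typeName, 0, fmt.Errorf("storage-port: unexpected type %s", typeName)
	}
	return typeName, int(f), nil
}

func main() {
	typeName, port, err := decodePort(`{"storage-port": 8040}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(typeName, port) // prints "float64 8040"
}
```

A schema that accepts both `int` and `float64` for integer fields (or decodes with `json.Decoder.UseNumber`) avoids the problem; which of these the real fix used is not stated in the log.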
[01:29] (although i can't see why 64k pages would matter here)
[01:29] axw: is that in the machine 0 log attached to the bug?
[01:29] wallyworld_: yeah
[01:29] let me look
[01:29] wallyworld_: https://codereview.appspot.com/85450044/
[01:29] wallyworld_: I broke the machine agent when I allowed upgrade steps to get a state connection
[01:30] davecheney: also, it would sure be nice to follow the execution of handle_rt_signal64 with gdb
[01:30] axw: does that mp fix the close channel issue?
[01:31] mwhudson: way above my pay grade
[01:31] i'm not even qualified for pointer arythmetic
[01:31] wallyworld_: yeah
[01:31] davecheney: looks like the latest in the archies is -23
[01:31] yup
[01:31] axw: great. so i'll land my branches and we can re-test i guess
[01:32] wallyworld_: sgtm
[01:32] wallyworld_: it would be nice to silence "cannot get instance info for instance "manual:10.0.128.7": no instances found" too, but it's not critical
[01:33] axw: i haven't look into that on yet - what's the cause?
[01:33] i wonder if there is an arm64 kernel with 64k pages i can try with
[01:33] wallyworld_: manually provisioned machines are not managed by the provider - they just should not be polled
[01:33] mwhudson: thta would be a good test
[01:34] i tried to test using gccgo/amd64
[01:34] axw: in that case i'll add some code to my first branch
[01:34] but lxc was all fucked on amd64 yesterday
[01:34] do both fixes in one go
[01:34] wallyworld_: there's a state.Machine.IsManual method that'll help there
[01:34] i don't know anything about legacy architectures like amd64
[01:38] davecheney: do you have link hand to the matching initrd to the -28 .deb you pointed at?
[01:41] arosales, so i removed the other kernels .. sudo upgrade-grub.. currently doing shutdown -r now .. to see if it worked ;-)
[01:41] arosales_, removed via pkgs that is
=== arosales_ is now known as arosales
[01:42] where's this -28 kernel at, I don't see it in proposed?
[01:42] jcastro, its on the machines that barf.. ls /boot
[01:42] jcastro: o/
[01:42] jcastro: I have a version of debug-log on my machine that works with the local provider
[01:42] arosales, don't do what i just suggested it.. it doesn't like that ;-)
[01:42] thumper, hey! we made a plugin, heh
[01:43] yeah, but it doesn't do filtering
[01:43] * thumper guesses
[01:43] oooh
[01:44] or replay or exclude/include by unit/machine or channel
[01:44] jcastro, we have parity..
[01:45] * hazmat sheds a tear
[01:45] for debug-log
[01:45] heh
[01:45] thumper, also, one thing we should talk about
[01:45] arosales: not -28
[01:45] is the debug-hooks <-> retry --resolved thing makes me cry
[01:45] hazmat, ack :-)
[01:45] you want uname -a
[01:45] Linux winton-02 3.13.0-8-generic #28-Ubuntu SMP Mon Feb 17 08:22:39 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux
[01:45] oh _8_
[01:45] wget https://launchpad.net/ubuntu/+source/linux/3.13.0-8.28/+build/5602341/+files/linux-image-3.13.0-8-generic_3.13.0-8.28_ppc64el.deb
[01:45] not 28
[01:46] davecheney, so the _only_ fix is to revert to -08?
[01:46] jcastro: i'll forward you the email
[01:46] davecheney, do I need an initrd for that?
[01:46] arosales, atm yes
[01:46] arosales: the only workaround I have at this time
[01:46] jcastro: no
[01:46] davecheney, thanks!
[01:46] jcastro: I don't understand
[01:46] so I do debug-hooks
[01:47] and in order to be able to fire off a hook to debug it
[01:47] also, know this
[01:47] I can only make one person happy at a time
[01:47] I need to open a new terminal and do resolved --retry
[01:47] it isn't your turn
[01:47] it is hazmat's
[01:47] <3
[01:47] local log will keep me happy. :D
[01:48] yeah, I'm open to looking to fix it...
[01:48] davecheney, I am confused in your email you state, "workarounds: you should install this kernel
[01:48] wget https://launchpad.net/ubuntu/+source/linux/3.13.0-8.28/+build/5602341/+files/linux-image-3.13.0-8-generic_3.13.0-8.28_ppc64el.deb"
[01:49] davecheney, ah I should have said -8.28 not -28
[01:49] davecheney, gotcha
[01:49] with is a revert
[01:49] thumper, and all of cts :-)
[01:49] sorry long day
[01:49] * arosales better just grab some dinner
[01:50] arosales: yup, we're also lucky that -28.8 isn't a think
[01:51] both those numbers appear to be increasing
[01:51] s/think/thing
[02:08] c.Assert(err, gc.IsNil)
[02:08] ... value *mgo.QueryError = &mgo.QueryError{Code:16149, Message:"exception: cannot run map reduce without the js engine", Assertion:false} ("exception: cannot run map reduce without the js engine")
[02:08] store tests are failing again
[02:09] i thought that the store tests wouldn't run unless we passed a flag ?
[02:09] cmars: didn't you fix this ?
[02:09] davecheney, thought so, yes. is this trunk or 1.18?
[02:09] cmars: trunk
[02:09] hmm
[02:10] davecheney, which test is it? is there a file & line #?
[02:10] cmars: please hold
[02:11] go test launchpad.net/juju-core/store 2>&1 | pastebinit
[02:11] Failed to contact the server: [Errno socket error] [Errno socket error] timed out
[02:11] oh for fucks sake
[02:11] does nothing work today ?
[02:13] thumper: what is the env var to lower logging ?
[02:13] JUJU_LOG= ?
[02:14] here's mine: JUJU_LOGGING_CONFIG==INFO; juju.container=TRACE; juju.provisioner=TRACE
[02:14] that is ready by bootstrap
[02:14] ta
[02:14] hnn, that isn't it
[02:14] rog had a different one
[02:14] a flag to testing
[02:15] -juju.log WARNING
[02:17] cmars: http://paste.ubuntu.com/7224466/
[02:17] ok, thanks.
looking
[02:19] ta
[02:22] davecheney, i thought it had landed, but it hasn't
[02:22] https://code.launchpad.net/~cmars/juju-core/cs-mongo-tests/+merge/213563
[02:23] i think we can land it, if CI will support running the store tests with full mongodb tests
[02:25] davecheney, what do you think? will you take that as an action item?
[02:25] cmars: no
[02:25] ok :)
[02:25] i cannot take that as an action item
[02:26] i'll follow up w/curtis tmw then
[02:26] cool
[02:26] cheers
[03:19] arosales: do you have any doc or otherwise that tells me what i need to do get access to some arm vms to test a fix for that bug
[03:27] wallyworld_: i can help w/ that
[03:27] \o/
[03:27] wallyworld_: do you have an account on batuan?
[03:27] that's the gateway into our network - if you don't, you can ask for one in #is
[03:27] yes, since i have logged onto power vms previously
[03:27] sweet
[03:28] i think you know it's bug 1302205
[03:28] <_mup_> Bug #1302205: manual provisioned systems stuck in pending on arm64
[03:28] right
[03:28] i don't think it's specifcally an arm issue
[03:28] but want to test on arm anyway
[03:28] yep - just a sec... wonder if someone just took our host down
[03:28] there have been 3 branches which landed today or last night which should hopefully fix it
[03:34] wallyworld_: yeah, looks like its in use debugging an unrelated issue. i'll send you an e-mail with access info and ping you (or have someone ping you) when its ready
[03:34] great thanks :-)
[03:36] davecheney: how can I make sure that the bufio.Scanner doesn't consume too much?
[03:36] davecheney: I have an io.ReadCloser
[03:36] davecheney: and I want to read up to the first new line, and no more
[03:36] the Scanner when I call scan reads 4k
[03:36] which consumes way more than I want
[03:45] wallyworld_: e-mail sent - system isn't quite ready yet, stay tuned..
[03:45] ok
[03:51] dannf, thanks re wallyworld arm access question :-)
[03:53] np, very happy to have help w/ it :) two days and i've just managed to learn how to kinda read go :/
[04:06] wallyworld_: ok, have at it
[04:06] dannf: awesome, firing up ssh now
[04:07] dannf: "ssh 10.229.41.200" should work right?
[04:07] it should, but its not working for me either - lemme ask
[04:40] is that because the ssh config can't work out where to proxy through?
[04:42] thumper: fyi I filed a merge request for the changes you suggested yesterday about isolating the git tests
[04:42] jcw4: awesome, I saw that in my inbox
[04:42] jcw4: I'll take a look once I submit this change :-)
=== vladk|offline is now known as vladk
[04:43] thumper: thanks; no rush... just excited about contributing ;)
[04:46] wallyworld_: fyi instance type constraint branch has conflicts
[04:46] yes it does, fixed loclly
[04:46] still wip
[04:47] jcw4: did you push your changes?
[04:47] yes; to a new branch
[04:47] the last one was too messy
[04:47] :-)
[04:48] jcw4: ok, there is a resubmit option on the RHS of the merge proposal page, that includes a "start over"
[04:48] which would have marked the old as superseded
[04:48] but that's OK, I'll just reject the old one.
[04:48] I see. Thanks
[04:50] jcw4: what happens when you change the LC_ALL to "C" ?
[04:50] I was planning on doing that after running all the tests without it.
[04:50] after they all passed I was too excited and forgot
[04:50] testing now
[04:52] thumper: worker/uniter/charm/... tests passed
[04:52] I'll push that change too?
[04:52] move that patch env into the base git test suite
[04:52] with the other env patches
[04:53] jcw4: then you can delete the SetUpTest for GitDirSuite
[04:53] cool, right
[04:53] as it won't be doing anything
[04:53] then yes, push that
[05:03] thumper: the LoggingSuite TearDownTest(c) needs to be called in the GitSuite TearDownTest?
I'd add that back in if necessary
[05:03] jcw4: if the only line of the tear down is to upcall the tear down, then you can just delete it
[05:03] thumper: okay; that's all there is. tx
[05:15] * thumper EODs
[05:20] wallyworld_: I've responded to your comments, but I'm now looking at HA
[05:20] np
[05:21] just wanted to get some thoughts down
[05:21] i'm stuck on otherthings also
[05:33] hi all, I am using 1.16.6 stable juju-core to bootstrap an Openstack Havana cloud, but fails with can't find index.json
[05:34] it seems that juju is trying to find the meta file in the path /streams/ but swift has tools/streams/
[05:41] morning all
[05:57] fwereade, can you take a look at this please ? https://codereview.appspot.com/85220044/
[06:51] dimitern: howdy! How's the vlan work coming along?
[06:51] hey bigjools
[06:52] bigjools, i'm in the final steps - cloudinit scripts that bring up network interfaces
[06:53] bigjools, vladk is working on a few extensions to gomaasapi to allow us to unit test the new api calls
[06:53] bigjools, capabilities; lshw dump of a node; networks?op=list_connected_macs
[06:54] dimitern: nice, all going to make it for the release? And any issues with maas I need to know about?
[06:54] bigjools, but all these were live tested on my local maas using daily builds ppa
[06:55] bigjools, we're aiming for feature completeness by friday, but should be ready before that
[06:55] excellent
[06:56] bigjools, bug 1303617 hit me after a recent upgrade and i can no longer use the fast installer (fails at boot and doesn't recover), which is slow and tedious
[06:56] <_mup_> Bug #1303617: pc-grub install path broken in curtin
[06:56] dimitern: weird, I did a fast install today and it was fine
[06:56] hmm Fix Released - i'll try it now
[06:57] bigjools, we have a few wishlist items for the maas api
[06:58] bigjools, like the ability to see networks + connected macs in one place (either in GET node/system_id or in GET networks/(all))
[06:59] dimitern: please file bugs
[06:59] bigjools, will do
[06:59] I will triage them as wishlist and we'll put them on the stack
[06:59] bigjools, otherwise now we need to do several api calls at startinstance time to get all we need
[06:59] ok we can optimise that
[07:10] cmars: I'm not sure why your test failed, but it would seem that we could tell the landing bot to always run the mongojs tests
[07:10] cmars: though because of that, I'd actually rather have the CI tests disable it, rather than disabled by default.
[07:11] experience has shown that ENV vars play nicer with go test than flags, because flags are only valid per package, and "go test ./..." tries to pass all flags to all packages.
[07:11] bigjools, filed bug 1304857
[07:11] <_mup_> Bug #1304857: API should report networks and connected macs in the response of a single node
[07:12] jam, fancy a review ? https://codereview.appspot.com/85220044/
[07:15] bigjools, another question re gomaasapi - how do you feel about adding high level wrappers around common APIs? like having AcquireNode method that the provider calls, rather than constructing a URL internally? Other similar examples are ListNetworks, ListNodes, etc.?
=== vladk is now known as vladk|offline
[07:20] dimitern: I can't remember much about gomaasapi
[07:22] bigjools, :) well, that was just a thought
[07:22] dimitern: one of the other guys will have an opinion I'm sure
[07:23] bigjools, we'll ask them for reviews when changes are proposed
=== BjornT_ is now known as BjornT
[07:30] dimitern, I'm getting progressively more nervous about NetworkName vs making it clear that it's a provider-specific id like instance.Id
[07:31] fwereade, can't we say yes, it's provider specific, but it's also used by juju to identify the network internally?
[07:31] dimitern, it will be, indeed
[07:31] dimitern, but we're going to want network names as well
[07:31] fwereade, ok, how are we going to make it clearer?
[07:32] dimitern, when openstack gives us network abcdef638746328756865198, and we call that NetworkName, what field will we use for the "private" name users will want to use
[07:32] dimitern, or "my_network" or whatever
[07:33] fwereade, openstack has labels for networks just the same
[07:33] dimitern, and so does every provider ever?
[07:33] dimitern, that's quite the prediction ;p
[07:33] fwereade, i can't say that :P
[07:34] fwereade, so tell me how to alleviate your nervousness about it? :)
[07:34] dimitern, call it NetworkId :)
[07:35] dimitern, you know -- we have machine ids, and instance ids, and they are not the same
[07:35] dimitern, (and machine ids are machine names really, but hysterical raisins)
[07:36] fwereade, so, basically change it everywhere from NetworkName to NetworkId ?
[07:36] fwereade, I need a follow-up for that
[07:36] dimitern, I'm more concerned about the API
[07:37] fwereade, you're thinking about network tags?
[07:37] dimitern, and that the terminology that's hard to change be consistent with what we expect to do
[07:37] dimitern, well, that was my first thought
[07:37] dimitern, but then I realised that converting these names into tags would be completely wrong
[07:37] mornin' all
[07:37] morning rogpeppe
[07:37] dimitern, because they're provider vocabulary, not juju vocabulary
[07:37] dimitern: hiya
[07:38] fwereade, ok, so then what?
[07:38] fwereade, i'm trying to follow but can't see what's needed
[07:38] axw: ping
[07:38] dimitern, although -- wait, don't you use tags in the client api? I think we should...
[07:38] rogpeppe: pong
[07:39] fwereade, we use tags everywhere in the api
[07:39] fwereade, but not for networks
[07:39] axw: about removing JobManageEnviron:
[07:39] * fwereade grumbles
[07:39] axw: the reason we don't want to remove JobManageEnviron from a voting state server is that when a machine hasn't got JobManageEnviron we allow it to be removed
[07:40] dimitern, we don't identify machines in the client API by provider-specific instance id, and we shouldn't identify networks that way either
[07:40] axw: and if that happens we could break the invariant that we only ever have an odd number of voting state servers
[07:40] axw: or rather, and odd number of state servers that *want* to vote
[07:40] fwereade, i agree, but the only way we can deal with networks so far is if we get them from the provider
[07:40] fwereade, at provisioning time
[07:40] s/and odd/an odd/
[07:41] dimitern, we *can* impose a requirement that network names match provider ids exactly, this is MVP after all
[07:41] fwereade, i guess you're suggesting to require the user to add any networks to juju before being able to deploy with them
[07:42] fwereade, and i can see how this is the way we wanna go eventually, but not for nwo
[07:42] now
[07:42] dimitern, well, mid-term, yes -- I'd expect --networks params to be validated
[07:42] rogpeppe: okay
[07:42] rogpeppe: lots to take in here,
still figuring out how all the voting bits work.
[07:43] dimitern, short-term, I want us to be clear on the distinction between juju vocabulary over the client API (tags) and provider vocabulary over the internal API (network ids)
[07:43] fwereade, so let's make a plan - i land this last CL and make another one for s/NetworkName/NetworkId/ throughout, and then do the cloudinit stuff
[07:43] axw: thanks for taking a look. feel free to ask about whatever doesn't seem to make sense.
[07:43] dimitern, internally I'm fine saying that network name == network id (for mvp at least)
[07:43] dimitern, am I helping?
[07:43] fwereade, how can we be clear about this? in the docs? where?
[07:43] fwereade: it would be nice if network ids were distinguishable from machine ids and unit ids (which are both currently distinguishable from each other)
[07:44] fwereade: and service names, of course
[07:44] fwereade, yeah, that is how it's gonna be for now - we call it networkId, but we mean maas-specific name
[07:44] rogpeppe, not really gonna happen, that's why we have tags
[07:44] fwereade: yeah, fair enough
[07:44] rogpeppe, although *probably* units/machines will be safe, but services won't ;p
[07:45] fwereade: yeah
[07:45] dimitern, so, in the Setprovisioned bits: it's a provider-specific network id, not a tag
[07:46] dimitern, in the Client-facing IncludeNetworks/ExcludeNetworks bits, we should be using tags
[07:46] fwereade, yes, you mean better doc comment
[07:46] dimitern, internally we can just strip off the "network-" prefix and keep going mapping 1:1 with provider-specific network ids
[07:46] fwereade, so juju deploy --networks=net1,net2 which goes over the API as network-net1, network-net2
[07:47] fwereade, and for include/excludeNetworks in state we still use the ids, not tags as usual
[07:48] dimitern, yeah, exactly -- and for now the stripped names have to map to intrnal provider ids, but we keep them distinct so it doesn't become confusing when we have to change over
later
[07:48] fwereade, ok, got it
[07:48] dimitern, inside state you can even stick with a single field in the document doing both duties... but be very clear that the _id field is for the *juju* name, not the provider name
[07:49] fwereade, better comments, ok
[07:49] dimitern, brilliant, thanks
[07:49] fwereade, i'll try to remember all that :) will propose it some time later today
[07:49] dimitern, I'm going through the CL now in case you hadn't realised ;p -- some more naming quibbles but otherwise looking sound I think
[07:50] fwereade, great!
[07:52] dimitern, reviewed
[07:53] rogpeppe, btw, do you think you might have a spare cycle to look at https://bugs.launchpad.net/bugs/1303735 today? it looks a bit like something you might know about
[07:53] <_mup_> Bug #1303735: private-address change to internal bridge post juju-upgrade
[07:53] fwereade: looking
[07:53] axw, did you see https://bugs.launchpad.net/bugs/1303583 ?
[07:53] <_mup_> Bug #1303583: provider/azure: new test failure
[07:54] fwereade: I have, but haven't had time to look into it yet
[07:54] axw, np, just wanted to make sure it was on your radar
[07:54] fwereade, ta
[07:58] fwereade: the issue is quite obscure to me - i'm can't see the exact problem that's being reported there
[07:59] rogpeppe, AIUI it's a change in behaviour -- jamespage will be able to make it clear I think?
[08:00] fwereade: right. it would be nice to know what's the expected behaviour there and how the reported logs differ
[08:00] rogpeppe, I upgrade nova-compute nodes (which had the correct private-address) and the private-address switches to be the ip address of the internal bridge virbr0
[08:00] jamespage: where can i see the result of that in the status? (or the logs?)
[08:00] rogpeppe, in the bug report
[08:00] jamespage: yeah, i was looking at the bug report
[08:01] rogpeppe, the dns-name of all the nodes are the same
[08:01] #err
[08:01] jamespage: the status doesn't seem to show private addresses
[08:01] rogpeppe, OK - public-address then
[08:01] jamespage: ah, dns-name, sorry
[08:01] rogpeppe, whatever happened it was wrong
[08:01] jamespage: right, the public address. that really confused me.
[08:01] rogpeppe, I'm not sure about the private-address tbh
[08:02] rogpeppe, titled changed
[08:02] jamespage: thanks
=== vladk|offline is now known as vladk
[08:16] fwereade, how about s/SetProvisionedWithNetworks/ProvisionInstance/ ?
[08:18] jamespage: can you find out what addresses nova returns for the instance ids?
[08:18] rogpeppe, not right now
[08:18] jamespage: ok
[08:18] but I can look again later
[08:19] jamespage: i'm suspecting that nova is returning the libvirt bridge address as one of the addresses for an instance, and our logic happens to be picking it out
[08:19] rogpeppe, hmm
[08:20] rogpeppe, nova has no knowledge of that afaik
[08:20] as in there is no agent in the instance that would let it know
[08:20] jamespage: hmm
[08:25] jamespage: ah, i see where it comes from
[08:31] axw: i think this issue (#1303735) is to do with worker/machiner - setMachineAddresses is setting the libvirt bridge address without marking it as NetworkMachineLocal
[08:31] <_mup_> Bug #1303735: public-address change to internal bridge post juju-upgrade
[08:31] jamespage: so, i know what the issue is, but i'm not yet sure of the right way to fix it
[08:37] rogpeppe: that would suggest that the openstack provider doesn't have any cloud-local addresses
[08:37] is that expected?
[08:38] rogpeppe: and you're right of course, it's not setting them to local - how would it know to do that?
[08:38] axw: yeah
[08:39] axw: i don't think it means the provider doesn't have any cloud-local addresses, as we're looking for public addresses here
[08:39] rogpeppe: sorry, misread the bug
[08:39] rogpeppe: I thought it was private
[08:39] axw: it seems like state.mergedAddresses doesn't preserve ordering, which is perhaps a pity
[08:40] jamespage: it would still be useful to see what addresses nova is returning for the instances
[08:42] axw: i'm thinking that it might be possible for a machine to know which interfaces are private, but it might be quite os-specific
[08:42] rogpeppe: ISTM that the best thing we could do is to prefer cloud-local over unknown
[08:43] rogpeppe: indeed
[08:43] axw: when asking for a public address?
[08:43] (quite os specific)
[08:43] axw: that seems wrong to me
[08:43] rogpeppe: yeah, if there's no public address
[08:43] is it less wrong to choose an unknown address that might be private (like this)?
[08:43] axw: another possibility is to strictly order Machine.Addresses before Machine.MachineAddresses
[08:43] the right thing of course is to classify things properly
[08:44] rogpeppe: looking at instance.SelectPublicAddress, that won't work - it chooses the last cloud-local/unknown in the list
[08:45] which is different to internal, for some reason
[08:45] axw: that's definitely wrong if so
[08:45] rogpeppe: perhaps it just needs to change to be like internal
[08:46] axw: yes
[08:46] (and preserve order)
[08:46] axw: in fact, the implementation of internalAddressIndex and publicAddressIndex should probably be merged
[09:10] jamespage: when you have a moment, would you be able to run this go program on one of the openstack nodes in the juju env that exhibits this problem? http://play.golang.org/p/GH0261EIHH
[09:11] axw: i'm thinking we might be able to make some deductions from the interface name
[09:13] rogpeppe, OK - lemme finish up the upgrade testing I'm doing and I'll try again
[09:36] morning all
[09:36] rogpeppe: you around?
[09:36] natefinch: yup
[09:36] natefinch: just doing a review. will be with you shortly.
[09:36] rogpeppe: sure
[09:42] rogpeppe: sorry was afk. I suppose that would be better than what we have now
[09:43] axw: preserving order, you mean?
[09:43] rogpeppe: deducing classification
[09:43] axw: yeah
[09:44] axw: we should preserve order too, i think, so the addresses are in predictable order. currently we're shuffling them randomly, which isn't great
[09:44] rogpeppe, axw: you guys talking about the sort.Stable address problem with replicaset addresses?
[09:44] rogpeppe: provider addresses should certainly come before machine, but otherwise I think relying on order is a mistake
[09:45] natefinch: no, something else entirely - choosing public addresses when there are only unknown/cloud-local
[09:45] natefinch: no, we're tallking about #1303735
[09:45] <_mup_> Bug #1303735: public-address change to internal bridge post juju-upgrade
[09:45] axw: mgz says that order is important
[09:45] ahh ok
[09:46] rogpeppe: if addresses really do have a priority, then I think that should be explicit
[09:46] axw: and that a provider can return preferred addresses by putting them earlier in the addresses slice
[09:46] ordering in a slice seems pretty subtle, easy to break
[09:46] axw: that's the current design, FWIW
[09:47] yeah, I get that - it needs to be fixed - just whining :)
[09:47] type AddressByPriority []Address
[09:47] now it's explicit
[09:48] natefinch: i think it's reasonable as is, actually.
[09:48] rogpeppe: we should probably document that order is important on instance.Instance.Addresses
[09:49] axw: it's not too hard to take care to preserve order. it would be nice if there was a function to help with merging address slices in the instance package
[09:49] axw: definitely
[09:51] rogpeppe: I'm not a huge fan of relying on order of a generic slice.
I guess we very rarely pass it around outside the provider, and if the provider interface makes it clear the order matters, then that's probably ok. [09:51] natefinch: we pass it around a lot actually [09:51] natefinch: i don't really see the problem - slices are inherently ordered [09:53] rogpeppe: yes, but that order usually doesn't matter. And it's not clear it matters when some random function gets a list of addresses deep in the bowels of the code. [09:53] natefinch: huh? that order often/usually does matter! [09:53] I presume we got into this mess because we didn't realize the order of the slice matters [09:53] natefinch: e.g. []byte [09:54] natefinch: we definitely need to document that more [09:54] natefinch: but i think it's reasonable to have a convention that []Address is ordered [09:55] natefinch: otherwise we'd end up adding some kind of a priority field which would actually make things considerably harder [09:59] rogpeppe: I'm just not a fan of preventing bugs by following conventions that are likely only written down in one place in a huge codebase. But I agree making the providers return a different type would be a hassle. [10:00] natefinch: it's not just making the providers return a different type - it's coordinating priorities. do you have some global definition of address priority levels? what do you do when you combine address from two different sources? [10:00] natefinch: all those issues fall out naturally if you assume that ordering matters in a slice [10:01] natefinch: we should definitely write down in a couple of places that order is significant [10:01] rogpeppe: I don't want to continue to argue it, since it's just stopping us from actually doing anything, but I think the answer is non-trivial no matter what we do. [10:01] natefinch: i don't think it's too hard actually. just preserve order when combining addresses. [10:02] mgz: have you nova booted an instance on hp cloud manually and then attempted to ssh into it? 
i've had no luck getting in via ssh [10:02] rogpeppe: I guess I don't know how to preserve order when merging two slices unless you know how they were sorted in the first place. [10:03] natefinch: trivial answer: just concatenate the slices [10:03] axw: I just got a "session already closed" panic on the bot. Doesn't your patch fix that? [10:04] my patch? [10:04] axw: the one that untwines StateWorker and APIWorker [10:04] jam: rogpeppe fixed a channel closed one [10:04] natefinch: more sophisticated answer: delete items in the second slice that exist in the first slice before concatenating them [10:04] jam: link? [10:05] jam: nm, found it [10:05] rogpeppe: how do you know the ones in the second slice are lower priority than all the ones in the first slice? [10:05] axw: heres' a link to the failure: https://code.launchpad.net/~jameinel/juju-core/go-vet-cleanup/+merge/214911 [10:05] jam: yeah, my patch wasn't for a "session already closed" error [10:05] jam: that looks different [10:05] natefinch: you make that decision [10:05] natefinch: based on the origin of each slice [10:06] jam: it may well be related to my patch though [10:06] jam: i'll have a look [10:07] axw: I thought you had a comment in IRC about breaking the machine agent because of the multiple connections during upgrade, which might be related, but maybe not directly. [10:07] this, in particular, looks like a Watcher that is trying to finish something while the connections are cleaning up. [10:08] jam: I did, and rogpeppe fixed it... I don't think it is related, but maybe rog will have a better idea [10:10] axw: rogpeppe: looking at state/watcher/watcher.go it looks like it could be a race condition. If we triggered tomb.Dying but also got the timeout in time.After(period), the w.needSync will be checked without looking at tomb.Dying [10:11] hmm.. 
alternatively, on first entering the function, you also set needSync, but haven't looked at Dying yet (AFAICT) [10:11] jam: i don't think that should matter [10:12] the traceback says that it was happening in New() [10:12] jam: until the watcher's tomb is Dead, it's entitled to do anything it likes [10:12] though it doesn't go above that. [10:12] jam: i think it must be that we're not closing things down properly [10:12] rogpeppe: sure, it looks like we might have gotten a closed session while we were doing something else, and we're closing it concurrently with creating something new.. ? [10:14] rogpeppe: anyway, don't look too deeply on this, I was just trying to push out some of wwitzel's in-progress stuff while he was gone [10:14] it isn't critical work [10:17] btw, rogpeppe: to land HA, we need to rework the sort.Stable of addresses. sort.Stable is go 1.2, and we only require go 1.1.2 right now [10:18] natefinch: why do you need to stable sort? [10:18] natefinch: right - i saw that. all that selectPreferredStateServerAddress logic is about to go anyway [10:19] natefinch: i didn't suggest taking it out because i didn't want to perturb the branch any more [10:19] natefinch: i'd just delete all of that and use mongo.SelectPeerAddress instead [10:19] axw: we used a stable sort to preserve address order [10:20] rogpeppe: right, we just have to take it out since the bot can't compile it [10:20] rogpeppe, OK - this is from 12.04 - http://paste.ubuntu.com/7225600/ [10:20] rogpeppe: yeah, just wondering what part of the address is being ignored for the sort.Sort not to be good enough [10:20] rogpeppe, however I think I saw the issue on 14.04 nodes - so doing it there as well. [10:21] jamespage: oh, one mo. i didn't include some crucial info. 
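Editor's sketch of the merge rogpeppe describes at 10:03-10:04 (concatenate the slices, dropping items from the second that already exist in the first). This is a minimal illustration, not juju-core code: the `Address` type and `mergeAddresses` name are hypothetical stand-ins, and it assumes the caller decides which source comes first.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for juju's address type; only the
// fields needed for the illustration are modelled.
type Address struct {
	Value string
	Scope string // assumed labels such as "public", "cloud-local", "unknown"
}

// mergeAddresses returns the addresses of first followed by those of second,
// keeping the original relative order. An address whose Value has already
// been seen is dropped, so the earliest occurrence wins.
func mergeAddresses(first, second []Address) []Address {
	seen := make(map[string]bool, len(first)+len(second))
	merged := make([]Address, 0, len(first)+len(second))
	for _, list := range [][]Address{first, second} {
		for _, a := range list {
			if seen[a.Value] {
				continue
			}
			seen[a.Value] = true
			merged = append(merged, a)
		}
	}
	return merged
}

func main() {
	// Provider-reported addresses are passed first, so they keep priority.
	provider := []Address{
		{Value: "10.0.0.5", Scope: "cloud-local"},
		{Value: "203.0.113.7", Scope: "public"},
	}
	machine := []Address{
		{Value: "192.168.122.1", Scope: "unknown"},
		{Value: "10.0.0.5", Scope: "unknown"},
	}
	for _, a := range mergeAddresses(provider, machine) {
		fmt.Println(a.Value, a.Scope)
	}
}
```

Because no sorting is involved, this approach does not need sort.Stable, which matters given the go 1.1.2 constraint mentioned at 10:17.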
[10:21] cos if they're equal and we're considering all fields, surely we don't care [10:24] jamespage: this is more useful: http://play.golang.org/p/mmy9KhUy9T [10:24] axw: we weren't comparing all fields [10:37] wallyworld_: yeah, you need to add your ssh key either through cloud-init or via nova though [10:37] mgz: i tried via nova using keypair-add [10:37] right, with that... it didn't work? [10:37] i used the --pub-key option [10:38] yeah, didn't work [10:38] mgz: i'm trying to test the latest fixes to the manual provider that landed today [10:38] rogpeppe, http://paste.ubuntu.com/7225650/ [10:38] you can use `nova console-log` to see what's up if you supplied any cloud init bits [10:39] mgz: didn't supply any cloud init bits, was just assuming keypair-add would work [10:39] console log seemed to show some random key being used [10:39] not mine [10:39] odd [10:39] jamespage: thanks [10:40] wallyworld_: ah, [10:40] jamespage, mgz: do you think it would be reasonable to pattern match on the interface name to determine the class of address? (e.g. if it matches virbr* then assume it's machine-local) [10:40] did you actually use `nova boot --key-name MYKEY` ? [10:40] yep [10:40] jamespage: i don't know how predictable interface names are in linux [10:40] okay, I'm out of ideas then :P [10:40] mgz: the same name as i used for keypair-add [10:41] :-( [10:41] wallyworld_: try supplying a key with cloud-init instead [10:41] rogpeppe, hmm [10:41] 's a bit more work but should be fine [10:41] mgz: point me to some doc to tell me what to do? [10:41] sec [10:41] or i can try with lxc i guess [10:42] jamespage: because i believe there are cases where we really do want to get the addresses off the local machine interfaces. but that's hard if we can't tell which ones are machine-local. [10:45] wallyworld_: basically, make a text file with `#cloud-config\nssh_authorized_keys:\n - ssh-rsa .... 
blah@blah\n` [10:45] rogpeppe, you can't safely make that assumption "if it matches virbr* then assume it's machine-local" [10:46] see doc/examples/cloud-config-ssh-keys.txt in lp:cloud-init for an example [10:46] ta, ok [10:46] then you can supply that file straight as --user-data to boot [10:46] (no need to gzip as it's so small) [10:46] ok, i'll try that [10:47] rogpeppe, is it possible to limit juju to querying interfaces it's been told about or created itself? [10:47] rogpeppe, whitelist rather than blacklist [10:47] jam, standup? [10:48] jamespage: we'll nearly start doing that with maas now I think, as dimitern has started getting the network interfaces from the lshw that maas provides [10:48] we should probably do something similar when we grow better networking support in other clouds [10:49] mgz, jamespage, we definitely will do that for other clouds, gradually as juju networking support grows [10:50] fwereade, updated and tested https://codereview.appspot.com/85220044/ - should be good to land [10:50] dimitern, cheers [11:07] fwereade, I've made the small change you asked for - just added a test and a small fix - happy for me to land it? https://codereview.appspot.com/83060049/ [11:17] dimitern: sorry I missed the ping. I completely spaced off the standup, and was on my other laptop. [11:18] jam1, we're still there, you can join if you like :) [11:32] mattyw, if I LGTMed with fixes you don't need to ask, but you can always ask for another review if you're not sure [11:32] fwereade, ok, I just added the test - and a fix I found while writing it so I'll approve it then, thanks [11:33] mattyw, cool [11:39] fwereade, dimitern: https://code.launchpad.net/~jameinel/juju-core/1.18-refuse-downgrade-1299802/+merge/214878 needs a review [11:40] jam1, looking [11:42] dimitern: thanks [11:47] jam1, LGTM [12:12] mgz, jamespage: i wonder if we could just add only addresses from eth* interfaces for the time being. 
that would probably cover the case that we care about most currently. [12:13] natefinch: hangout? [12:13] rogpeppe: is it expected we'll want to have non-voting replicaset members? is that why we have NoVote/WantsVote? or is that specifically for handling inaccessible members? [12:14] axw: yes - if a machine goes down, we don't know that it might just come back up again in a few moments, so we don't want to just destroy it or remove it immediately [12:15] axw: so we just mark it so that it doesn't want the vote [12:15] axw: also, we can have a machine with WantsVote=false and HasVote=true [12:17] ok [12:17] rogpeppe: sure [12:17] axw: our main invariant is that the number of machines that *want* the vote must always be odd, and similarly the number of machines in the replica set configuration that *have* the vote must always be odd. [12:17] natefinch: one mo, i've just been called to lunch [12:17] ok [12:17] * rogpeppe lunches [12:18] * natefinch breakfasts [12:21] * perrito666 snacks after breakfast [12:21] we really need more names for eating occasions [12:21] * axw pats his belly full of pizza [12:22] perrito666: brunch is the breakfast + lunch meal [12:22] second breakfast is the hobbit one, (along with elevensies (sp?)) [12:22] heh [12:22] jam1: I am in the hobbit one [12:22] I intend to lunch too [12:22] (and honestly,I might also eat something near eleven not that you mention it) [12:23] perrito666: http://www.moviemistakes.com/film1778/quotes [12:23] so, breakfast, second breakfast, elevensies, Lunch, Luncheon, Afternoon tea, dinner, supper, I'm not sure if there are more [12:24] that pretty much covers my day :) [12:25] rogpeppe, natefinch: so how close are we to having a "juju ensure-state-availability" that we can play with ? 
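Editor's sketch of the invariant rogpeppe states at 12:17: the number of machines that want the vote must be odd, and the number that have the vote must be odd. The `Machine` struct and `checkVoteInvariant` function are hypothetical names for illustration; only the WantsVote/HasVote flags come from the discussion.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified view of a state server machine.
type Machine struct {
	ID        string
	WantsVote bool
	HasVote   bool
}

// checkVoteInvariant reports an error unless both the number of machines
// that want the vote and the number that currently have it are odd.
func checkVoteInvariant(machines []Machine) error {
	wants, has := 0, 0
	for _, m := range machines {
		if m.WantsVote {
			wants++
		}
		if m.HasVote {
			has++
		}
	}
	if wants%2 != 1 {
		return fmt.Errorf("%d machines want the vote; expected an odd number", wants)
	}
	if has%2 != 1 {
		return fmt.Errorf("%d machines have the vote; expected an odd number", has)
	}
	return nil
}

func main() {
	machines := []Machine{
		{ID: "0", WantsVote: true, HasVote: true},
		{ID: "1", WantsVote: true, HasVote: true},
		// Machine 2 has gone down: it no longer wants the vote but still has it.
		{ID: "2", WantsVote: false, HasVote: true},
	}
	fmt.Println(checkVoteInvariant(machines))
}
```

In the example, two machines want the vote, so the check fails until another machine's vote status is changed to restore an odd count.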
[12:37] jam1: i've got a branch that seems to work [12:38] jam1: but it needs more tests [12:48] rogpeppe: natefinch: I just noticed that we thought EnsureMongo could probably land (and be polished from there) yesterday, but it is still up for review. [12:49] at least the comment yesterday was "if I get enough time before the kids wake up", which probably didn't happen, but certainly afterwards... ? [12:49] jam1: it's landing very soon [12:49] jam1: it used a go 1.2 feature which meant it couldn't land as was [12:49] rogpeppe: if that wasn't said weeks ago, I would trust you :) [12:49] rogpeppe: what was that? (I wasn't particularly aware of 1.2 incompatibilities) [12:50] jam1: it used sort.Stable, which is a go1.2 addition [12:50] ah [12:50] jam1: it's been LGTM'd [12:51] is the landing bot awake? [12:52] mattyw: it landed my stuff 10 min ago [12:52] but I'll check it [12:52] mattyw: do you have something that it isn't noticing? [12:53] jam1, https://code.launchpad.net/~mattyw/juju-core/deploy-with-user-name/+merge/213962 [12:53] jam, I guess there might be a queue? [12:53] mattyw: you don't have a commit message set [12:53] so the bot ignores it [12:53] jam, ah - of course, thanks [12:53] mattyw: I copied your description [12:54] jam1, that's great thanks very much [12:54] jam, I'll try to remember for next time [12:58] fwereade, poke re https://codereview.appspot.com/85220044/ [13:02] natefinch: i've got a dentist's appointment now. back in 30 mins [13:02] mattyw: I can see the bot is processing your request. [13:02] Note that we've had some intermittent failures with "Session already closed". If you see that, you can resubmit. 
[13:03] jam1, ok thanks [13:04] Hi jam, fwereade : I think this bug is describing unsupported behaviour or lxc nested in kvm: https://bugs.launchpad.net/juju-core/+bug/1304530 [13:04] <_mup_> Bug #1304530: nested lxc's within a kvm machine are not accessible [13:05] sinzui: yeah, that's likely just a case of no one having tried it yet [13:06] the local provider is already pretty crazy when it comes to addressing without adding nested containers in [13:08] mgz, I think stokachu has done something like that and it required esoteric magis to work [13:08] if you manually fiddle with the network setup you you could probably make it work [13:08] it's not something we're looking to support for trusty though [13:09] mgz, CI hates trunk https://bugs.launchpad.net/juju-core/+bug/1305047 [13:09] <_mup_> Bug #1305047: Unit tests fail on lp:juju-core r2588 [13:10] sinzui: that's rogpeppe's bug [13:10] rogpeppe: have you got a bug number for it? [13:12] Ah, silly me, stokachu is the reporter of the bug. So I think he has reached the dead end that thumper predicted [13:23] dimitern, rereviewed [13:27] fwereade, thanks [13:29] mgz, sinzui: i tried and failed to reproduce that problem [13:31] :( [13:32] sinzui: interestingly panic is in a different test to the one that jam saw [13:33] s/panic/that panic/ [13:35] rogpeppe, CI will run the tests 5 times before giving up. It tried for many revs and did many fails [13:35] rogpeppe, But Vi just got a pass http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/ [13:36] rogpeppe, trusty is has the same bad record, but its passes happen in a better order to make CI happy: http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-trusty/ [13:37] sinzui: it seems like a pretty easy to hit race fail [13:37] landing bot has jackpotted a number of times [13:37] rogpeppe: you have dentist appointments that only take 30 minutes? Damn, mine always take like an hour. 
[13:37] (and that's not including time to get there) [13:39] natefinch: the actual appointment was just a checkup - 10 minutes only; and the dentist is only a couple of minutes bike ride away [13:39] rogpeppe: that's cool. [13:41] hmm, i just saw another (probably unrelated) panic when testing [13:42] http://paste.ubuntu.com/7226267/ [13:42] rogpeppe: is the change you suspect just revertable? [13:43] mgz: probably [13:43] mgz: i'd like to know what's going on though [13:43] if CI can hit the error this reliably, should be pretty easy to confirm blame or not [13:50] rogpeppe, does the same code get used in the MAAS provider? when using LXC containers, brX is also valid [13:51] jamespage: yes, the same code gets used in the MAAS provider [13:51] jamespage: perhaps we need provider-specific code to run in the client to get the addresses [13:51] rogpeppe, so in the MAAS provider the IP address is assigned to the bridge, not the physical interface [13:51] assuming LXC or KVM containers have been created [13:52] whitelisting eth* and br* might work OK [13:52] that said I've seen emX style entries as well with biosdevname [13:52] jamespage: i'm not familiar with the details of what scenarios we really need the machine-local address discovery. [13:52] fwereade, mgz: ^ [13:53] s/discovery/discovery for/ [13:53] jamespage: I'm suspecious of anything just running on the machines themselves [13:53] mgz, rogpeppe: do we know what this local discovery step is used for? [13:54] jamespage: i'm not sure. i'm guessing there are some places that we can't use the provider for discovery [13:54] jamespage: perhaps this is something that's there for manual provisioning only [13:56] jamespage: this is all re bug 1303735 right? 
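Editor's sketch of the whitelist idea floated above (only take addresses from interfaces whose names match known prefixes, rather than blacklisting virbr*). The prefix list is an assumption assembled from names mentioned in the discussion (eth*, br*, and emX from biosdevname); as jamespage notes, interface names are not a safe classifier, so treat this as an illustration only.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// allowedPrefixes is a hypothetical whitelist; the entries come from the
// interface names mentioned in the chat, not from juju-core.
var allowedPrefixes = []string{"eth", "br", "em"}

// isWhitelisted reports whether the interface name starts with one of the
// allowed prefixes. Names such as virbr0 or lxcbr0 do not match.
func isWhitelisted(name string) bool {
	for _, p := range allowedPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		fmt.Println("cannot list interfaces:", err)
		return
	}
	for _, iface := range ifaces {
		if !isWhitelisted(iface.Name) {
			continue
		}
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, a := range addrs {
			fmt.Println(iface.Name, a.String())
		}
	}
}
```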
[13:56] <_mup_> Bug #1303735: public-address change to internal bridge post juju-upgrade [13:57] mgz, yes === hatch__ is now known as hatch [14:27] rogpeppe, jamespage: it's the blasted local provider that needs local address discovery [14:27] ah, interesting, there's a race in the test between agent config and the apiaddressupdater [14:27] rogpeppe, jamespage: I can't immediately recall whether we have lxc-ls --fancy everywhere we might need it, which *could* let us work around that [14:27] rogpeppe, jujud tests? [14:28] fwereade: yeah [14:28] rogpeppe, those things suck [14:28] ;) [14:28] fwereade: it's only a race because we don't set the APIHostPorts in the state [14:28] fwereade: so the apiaddressupdater starts and immediately assigns no addresses to the agent config [14:29] fwereade: the test only works if the APIWorker grabs the APIInfo before it does that [14:29] fwereade: we should have valid APIHostPorts in the state, then there shouldn't be a problem [14:30] rogpeppe, right, makes sense [14:30] fwereade: i wondered about having an EnvironProvider method that allows us to ask a provider for local addresses [14:31] fwereade: then the local provider could implement it, but the other providers could just return nothing [14:32] fwereade: (but it would potentially allow us to move away from using the instancepoller for some providers, if we wanted to - hazmat thinks that's a good idea) [14:33] rogpeppe, honestly I think it's a matter of tuning the instancepoller more than it is a matter of dropping it [14:33] rogpeppe, we want to keep track of instance status as well [14:33] rogpeppe, I think there's something else [14:33] rogpeppe, oh, yeah, instance networks [14:34] fwereade: regardless, having a way for providers to add locally-sources addresses to a machine seems like a reasonable idea [14:34] rogpeppe, yeah, I wouldn't object to making the MachineAddresses stuff smarter [14:34] fwereade, network ids and tags https://codereview.appspot.com/86010044 when 
you can take a look [14:35] fwereade: FWIW MachineAddresses isn't a great name - it doesn't really say why Machine.Addresses is different from Machine.MachineAddresses... [14:35] rogpeppe, agreed, it's an awful name [14:36] fwereade: LocalAddresses? [14:36] fwereade: LocallySourcedAddresses? [14:36] rogpeppe, the semantic payload there is not ideal either [14:36] (i don't like either of those, BTW) [14:36] rogpeppe, the latter is probably best [14:37] anyone know why I'm getting this when I run juju? WARNING unknown config field "proxy-ssh" [14:37] (best of a bad bunch) [14:37] fwereade: AgentProvidedAddresses ? [14:39] natefinch, hmm, that seems odd -- it's something axw added for azure, but I'm not sure where the error comes from [14:39] fwereade: I don't have proxy-ssh in my environments.yaml anywhere [14:43] fwereade: I think blowing away my old environments.yaml and making a new one helped [14:44] dimitern: I did lbox propose for gomaasapi, but no codereview on appspot was created, only on LP [14:44] vladk, did lbox give you any errors? [14:44] natefinch, would you take a few moments to dig into it sometime today please? many users will have old environments.yamls... [14:45] dimitern: no, it just printed a link to LP: [14:45] Proposal: https://code.launchpad.net/~klyachin/gomaasapi/101-testserver-extensions/+merge/214961 [14:45] vladk, you could try running it again [14:46] fwereade: yeah, once HA lands, I'll be able to actually work on other things [14:46] fwereade: which will be as soon as I can run a few tests [14:47] * fwereade cheers at natefinch [14:48] fwereade: this also seems to have fixed other problems I had been experiencing. 
Definitely worth investigating [14:49] (luckily I kept around the old environments.yaml [14:51] I wish we had an environment variable we could set that would effectively add --debug to every juju command line call === alexlist is now known as alexlist` [15:03] dimitern: I specified -cr explicitly: https://codereview.appspot.com/86070043 [15:03] vladk, ah, you know - gomaasapi doesn't have .lbox.check in the root dir I think [15:04] vladk, in juju-core we have .lbox containing the default args for lbox: "propose -cr -for lp:juju-core" [15:04] vladk: you can always rerun lbox propose as many times as you want, so that's fine [15:04] I always do `lbox propose -cr -v` out of long standing habit [15:07] fwereade, are you guys aiming for a 1.18.1 release prior to the end of tomorrow? [15:07] (which co-incidentally is when Final Freeze kicks in) [15:08] jamespage: we appear to have no fixed bugs in 1.18.1 [15:08] so I'd guess not. [15:09] gah, I still can't deploy stuff locally [15:09] mgz, well we have 24 hrs until tomorrow eod :-) [15:10] jamespage: yeah, but all the bugs look hard... ;_; [15:15] jamespage, it's not impossible: I'm working on the first; I will ask axw to look at the second overnight; just asked for cmars' comments re third; not sure about 4th, I'll ask an australian to take a look; 5th apears unreproable [15:15] fwereade, OK - thanks [15:16] ah hah..... lxc-ls seems broken. I bet that's my problem [15:16] fwereade, its not impossible to get a point release in after tomorrow [15:16] jamespage, sure, but I prefer to be a good citizen where practical [15:18] natefinch: i'm just proposing a branch to fix one of the machine agent panics. perhaps we could join up to move the HA stuff forward after that? [15:19] fwereade, updated https://codereview.appspot.com/85220044/ once more [15:19] rogpeppe: sure. I fixed the port problem with the initiate address and removed the testing panics you mentioned in the review. 
It works on amazon, but I seem to have an LXC problem on my local host, so I'm apt-getting and will reboot after to see if that fixes anything [15:19] fwereade, i really want to land that and https://codereview.appspot.com/86010044/ today if i can [15:21] this fixes a cmd/jujud test crash: https://codereview.appspot.com/86080043/ [15:21] fwereade, mgz, dimitern, natefinch: review appreciated [15:22] unfortunately it's not the one that people have been seeing on the 'bot and in CI [15:23] ha, typed in the wrong id [15:23] but I should actually review dimitern's branch, which is where I ended up :P [15:24] brb, gonna reboot now that I have upgraded, see if that fixes my lxc problems [15:29] rogpeppe, reviewed [15:29] dimitern: ta! [15:30] dimitern: in general i prefer to use a literal - if i use "nothing", then i have to check its value. i don't mind the slightly greater verbosity. [15:31] dimitern: at some point in the future i hope to see a "zero" builtin in Go that acts like nil except it represents the zero value for any type. [15:32] rogpeppe, I know you don't mind :) It's just my opinion [15:32] rogpeppe, yeah, that will be very handy [15:33] dimitern: FWIW, i think the code with naive literals reads slightly more easily - it's more directly obvious what the code is doing. [15:34] dimitern, https://codereview.appspot.com/86010044/ reviewed [15:38] fwereade, ta! [15:38] dimitern, you might not be so happy when you read it, we may need to discuss, I fear I have been unclear [15:39] rogpeppe, the first thing i'm doing when reading unfamiliar code and see a var/type/etc. i don't get, i immediately hit M-. in emacs, which invokes godef on the symbol and voila! [15:39] dimitern: i'm usually reading the code in codereview... 
[15:40] dimitern: but without the nothing declaration there's no need for any second look - it's immediately obvious on first scan [15:40] dimitern: which is why i prefer it more direct like that [15:42] dimitern, the other one LGTM with trivials [15:42] fwereade, thanks, still reading the last review :) [15:43] fwereade, i can't really impose restrictions on what juju deems a valid network id until it's provider specific, can I ? [15:44] rogpeppe: lucky 13? https://codereview.appspot.com/72500043/ addressed the things you mentioned in the last review, and it tests ok live on local and amazon [15:44] dimitern, bugger, ofc you're right [15:44] dimitern, TODO it with a short explanation of why we're so lax? or, hmm, ask the maas guys what their restrictions on net names are? [15:45] dimitern, and slavishly copy those? :) [15:45] fwereade, and re params.Network having both Tag and Id as you suggest - I can do that and for now make sure both match always [15:45] fwereade, (except for the tag prefix ofc) [15:46] fwereade, I'll just look at the maas source [15:47] fwereade, re tags/names/ids - we can have in state and in the api + params all three and make tags work with names and keep name=id for now [15:53] dimitern: I got LGTM from rvba. Could you give me the next task? [15:54] vladk, sorry, I have a few comments for you review [15:54] vladk, will submit in a minute [15:54] dimitern, yeah -- params.Network is just saying here's the net with this juju name, and this is what the provider calls it [15:54] dimitern, sounds like we're aligned [15:54] dimitern, thanks [15:56] fwereade, yep, thanks, will propose a bit later, if you're still here will ping you again :) [15:58] vladk, reviewed [15:59] fwereade: your comments on dimitern's proposal confuse me [15:59] mgz, ha :) [15:59] mgz, it is perfectly possible that I am missing something [15:59] natefinch: get it approved! [16:00] mgz, would you expand a little? 
[16:00] it seemed like we were deriving the tags from the cloud provider stuff, hence the no restrictions bar != "", rather than from being named by the use [16:00] *user [16:01] tag=what juju calls the network, id=what the cloud calls the network, name=junked due to being ambiguous [16:01] rogpeppe: it's going. I *just* merged and fixed a conflict, so it should just work. fingers crossed. [16:01] natefinch: hangout? [16:01] (and label=optional friendly id for network also from the cloud) [16:02] I guess your review is saying we *should* be providing a way for the user to specify a tag, tied to a given id [16:03] but without some juju cli network commands, I don't see how we add that [16:06] fwereade: ^does that make sense of my confusion? [16:07] mgz, tag is purely API-level, it's not what juju calls things [16:07] mgz, for now network.ProviderId == network.Name in juju (both state and api) and tags are created from names [16:07] fwereade: ugh, the name inside the tag then [16:08] having tag be the magic decoration bit is annoying [16:08] mgz, we *will* have cli network commands, and I want what we have to day to fit in with what we will need to do soon [16:09] mgz, for now, maas is the only thing that has networks, and there's a perfect mapping between provider id and user name [16:09] fwereade: so, dimitern's version seems to do that, by autonaming the... names inside the tags, and placing no restrictions on them [16:09] so they can become user-specified later [16:11] mgz, but it also conflates names and ids in several places -- and if we don't make the kind of data clear now we will have the devil of a time once we have a name that does not match an id [16:11] fwereade: well the conflating I saw was from that autonaming business... maybe there's some other bits I missed? 
[16:12] mgz, I wasn't clear about not just renaming Name to Id, but having both and using name for juju stuff and id for provider stuff [16:12] mgz, it seemed to me that it was using Id instead of name across the board [16:13] okay, so we just need to be picky as hell about the naming... which will still be confusing even if we are due to too many things, too few names for names... [16:17] rogpeppe: sorry, stepped out to get lunch. I can hangout now, yeah [16:17] natefinch: ok, cool [16:17] natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-core?authuser=1 === JoshStrobl__ is now known as JoshStrobl [16:19] natefinch: lp:~rogpeppe/juju-core/540-enable-HA [16:21] :332 [16:22] \o/ The proposal to merge lp:~natefinch/juju-core/030-MA-HA into lp:juju-core has been updated. Status: Approved => Merged [16:24] natefinch, sweetness! [16:24] finally finally finally [16:24] natefinch: if you get time want to chat about that for a couple of min [16:25] rick_h_: how urgent is it? My day is pretty slammed [16:25] natefinch: not at all, completely when you've got time [16:25] and the time can even be 'let's catch up in vegas' [16:25] ha, ok [16:25] just a heads up gui wants to catch up on HA to see what we can/should do from our end so we can have a plan [16:28] rick_h_: good enough... I'll shoot you an email about it. We're not quite done with it, but this was a huge chunk that had taken way too long to get in [16:28] natefinch: rgr, thanks [16:39] hey guys, something in the test suite is now creating a directory and turning it into 666 [16:39] which means it is not executable or writeable [16:39] which means the test suite is failing to clean it up [16:39] a lot of: /tmp/jctest.LpP/gocheck-5577006791947779410/27/some-file [16:39] files [16:40] fwereade: is that one of your FT tests ? 
[16:40] jam1: there's a lot of tests that use 0666 [16:41] natefinch: sure, but you don't normally change a *DIR* to 0666 [16:41] jam, hmm, I didn't *think* I did that, but I can't swear to it [16:41] they need 7 to be able to read the content [16:41] jam1: oh, directory, right [16:42] jam1: Tcharm/repo_test.go has a couple of those [16:42] s/Tcharm/charm. [16:43] jam1, hmm, I do have an 0644 in there, drivebying it now [16:43] fwereade: sorry, it is 444 read only [16:43] 6 would be rw [16:43] jam, none of them I think [16:44] ah-ha! yes I do [16:44] bugger [16:44] well, it is all test number 27 ... ): [16:44] :) [16:44] not that *that* part helps [16:45] fwereade: TestRemovedCreateFailure [16:45] TestDirCreateFailure [16:45] fwereade: so I think just adding a Chmod(0777) so we can clean up afterwards would be nice [16:46] jam1, yeah, deferred chmods back to 0777 [16:46] I don't think it is causing the test suite to *fail*, but it is preventing rm-rf from cleaning up after itself [16:46] jam1, yep [16:46] jam1, sorry about that [16:46] fwereade: np, I only noticed because the test suite is failing for other reasons [16:46] and that shows up in the log [16:47] dimitern's last patch just failed, some of which looks transient, and some which looks like an error message changed. [16:47] but at the *end* of that, it says "I couldn't clean up" [16:48] but hey, root can do anything it wants... [16:48] :) [16:53] trivial code review anyone? https://codereview.appspot.com/85970044 [16:54] jam1: btw, did you see EnsureMongoServer finally landed? [16:54] natefinch: I didn't. YAY \o/ [16:54] right? :) Super psyched [17:01] rogpeppe: https://codereview.appspot.com/85970044 [17:04] natefinch: lgtm [17:16] natefinch, woooot! [17:19] fwereade: thanks :) [17:20] * natefinch has to see a man about some bees. 
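Editor's sketch of the "deferred chmod back to 0777" fix discussed at 16:45-16:46. It assumes a test that makes a directory read-only (0444, as jam1 describes) to provoke a create failure; the helper names `withReadOnlyDir` and `runDemo` are hypothetical and this is not the actual juju-core test. Requires Go 1.16 or later for os.MkdirTemp and os.WriteFile.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// withReadOnlyDir sets dir to mode 0444 while fn runs and restores 0777
// afterwards, so that a later recursive remove can clean the directory up.
func withReadOnlyDir(dir string, fn func()) error {
	if err := os.Chmod(dir, 0444); err != nil {
		return err
	}
	// Restore permissions even if fn panics.
	defer os.Chmod(dir, 0777)
	fn()
	return nil
}

// runDemo creates a temporary tree, tries to create a file inside a
// read-only directory, then removes the whole tree. It returns the error
// from the create attempt and the error from the cleanup.
func runDemo() (createErr, cleanupErr error) {
	root, err := os.MkdirTemp("", "chmod-demo")
	if err != nil {
		return nil, err
	}
	dir := filepath.Join(root, "locked")
	if err := os.Mkdir(dir, 0777); err != nil {
		return nil, err
	}
	if err := os.WriteFile(filepath.Join(dir, "some-file"), []byte("x"), 0644); err != nil {
		return nil, err
	}
	err = withReadOnlyDir(dir, func() {
		f, err := os.Create(filepath.Join(dir, "new-file"))
		if err == nil {
			// Running as root bypasses the permission check.
			f.Close()
		}
		createErr = err
	})
	if err != nil {
		return createErr, err
	}
	return createErr, os.RemoveAll(root)
}

func main() {
	createErr, cleanupErr := runDemo()
	fmt.Println("create inside read-only dir:", createErr)
	fmt.Println("cleanup:", cleanupErr)
}
```

Without the deferred chmod, the leftover directory cannot be traversed by a non-root user, which matches the uncleaned /tmp/jctest paths reported at 16:39.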
=== natefinch is now known as natefinch-afk [17:24] * rogpeppe is done for the day [17:24] might make it back in later for a little bit [17:24] g'night all [17:37] * fwereade needs to go out for a while, would appreciate looks at https://codereview.appspot.com/85670046 [17:44] natefinch-afk: you forgot to set a commit message when you proposed your merge, I'll do it for you === vladk is now known as vladk|offline [18:06] proposal for LP: #1303880 up, PTAL https://codereview.appspot.com/86130043 [18:06] <_mup_> Bug #1303880: Juju 1.18.0, can not deploy local charms without series [18:11] jam1, can you take a look at ^^ [18:19] cmars, I need to update the bug and note that setting the default-series in the env is also a solution if you are opposed to typing the series when you deploy a charm [18:20] cmars, I am taking the regression tag off. Now that we know the affected users are the edge cases we talked about. I think the solution is to show the right error message [18:23] sinzui, that's a much easier fix :) please note the desired error message in the bug [18:24] I see you included lucid, but juju and charms don't run on it [18:24] cmars, I will think of a message right now [18:41] cmars, https://bugs.launchpad.net/juju-core/+bug/1303880/comments/6 [18:41] <_mup_> Bug #1303880: Juju 1.18.0, can not deploy local charms without series [18:42] though I see I meant to write "release notes" in that commment [19:44] sinzui, updated my proposal. can someone take a look, its much smaller now :) https://codereview.appspot.com/86160043 [19:45] cmars, I am not a reviewer but that size is nice. [19:45] :) [19:45] where did the US go? [19:48] perrito666, can you review cmars's branch? === mwhudson- is now known as mwhudson [21:15] anyone up for doing a review? https://codereview.appspot.com/86200043/ [21:22] sinzui: have you tried using staging-tools or their kin lately? 
[21:23] bac, I don't even know what they are [21:24] sinzui: you created the branch :) [21:24] https://code.launchpad.net/~ce-orange-squad/charmworld/staging-tools [21:25] bac :) I have forgotten much [21:25] oh [21:25] bac. [21:25] you probably care about the rt a report today [21:26] yes, maybe. does it involve access to canonistack post hb? [21:26] bac: I used those tools several times a week. orangesquad and juju-qa cannot use swift. Juju is unusable [21:26] bac: nova is fine [21:27] sinzui: i'm seeing canonical-sshuttle dying, not being able to connect to canonistack [21:27] it all worked the last time i tried [21:27] * sinzui tries [21:28] bac, I am connected, but what did I connect to because it looks empty [21:29] bac: I think my jenv is bad. I am told the env is not bootstrapped [21:30] sinzui: you think you're on staging? [21:56] davecheney: bugger... it seems like godeps doesn't update the hg branches [22:01] ah, no it does, it just doesn't say that it does [22:10] trivial review to just update the go.net library: https://codereview.appspot.com/86250043 [22:12] Has juju ssh to a machine number (juju ssh 0) been fixed yet for LXC? [22:15] cory_fu: what do you mean? [22:15] cory_fu: for the local provider? [22:15] cory_fu: yes it works, except for machine 0 as that is the host [22:15] unless your host actually has sshd running [22:17] cmars: how goes https://bugs.launchpad.net/juju-core/+bug/1303880 [22:17] <_mup_> Bug #1303880: Juju 1.18.0, can not deploy local charms without series [22:18] thumper, i have a proposal, PTAL, https://codereview.appspot.com/86160043/ [22:18] ack [22:19] wallyworld_: just to clarify - the branch you linked fixed that error, but wasn't the root cause, correct? 
just wondering if there's a fix i can/should verify [22:24] dannf: perhaps more context would help :-) [22:26] thumper: LP: #1302205 [22:27] <_mup_> Bug #1302205: manual provisioned systems stuck in pending on arm64 [22:27] dannf: it was my understanding that the fixes that wallyworld had were to fix the root cause [22:27] dannf: are you still having issues? [22:27] I know that wallyworld was trying to test yesterday [22:27] but not sure on the final progress [22:27] dannf: hi, i just logged on again after networking issues so can't see that backscroll, can i help? [22:28] wallyworld: yeah - just curious if the branches you linked were root cause - i.e., if it is worth me retesting w/ them [22:29] dannf: i committed fixes, but had trouble testing because i have to run from trunk to test and so can't use simplestreams to get the tools and building juju from source on the arm vms just hung [22:29] so i couldn't get tools built to test with [22:30] wallyworld: did you try building on the nova host? i've built there many times w/o a problem [22:30] i installed gcc-go and couldn't get outgoing access to launchpad or github so just copied my source tarball across [22:30] yeah, i think i build on the nova host [22:31] i can try again [22:31] actually [22:31] i could get outgoing access via wget [22:31] but go get just hung [22:31] so i couldn't get the juju source in the normal way [22:31] via vcs [22:31] though building in the vms *should* work - if not, probably a bug [22:32] a fairly trivial review if anyone wants to take a look: https://codereview.appspot.com/85600044 [22:32] wallyworld: we can ask is to open access for us to certain things. surprised lp access was blocked [22:33] dannf: i could wget to launchpad but "go get launchpad.net/juju-core" failed [22:33] or hung [22:33] ah - go get... 
never used that before [22:33] thumper, thanks [22:33] so i just copied across the source [22:33] i'll investigate that and at least get a bug filed if needed [22:33] dannf: go get uses bzr behind the scenes [22:33] dannf: i have a meeting now but will ping back when done [22:34] ack [22:35] sinzui, fix for LP: #1303880 is landing in trunk. do you need it proposed to any branches? [22:35] <_mup_> Bug #1303880: Juju 1.18.0, can not deploy local charms without series [22:36] cmars, yes please lp:juju-core/1.18 [22:37] wallyworld: seems to be working for me. but slow. and no output. i just see the directory growing in size [22:40] wallyworld: trivial review for you, although since it is needed on two branches, lbox isn't good at handling it [22:41] https://code.launchpad.net/~thumper/juju-core/update-websocket-lib/+merge/215057 [22:41] and https://code.launchpad.net/~thumper/juju-core/update-websocket-lib/+merge/215046 [22:43] thumper: otp, will look soon [22:43] ack [22:55] cmars: I have approved the other branch [22:55] cmars: although I did realise that there aren't any tests for the new error message [22:55] cmars: is it hard to add one? [22:56] cmars: also, lbox doesn't like submitting the same branch to multiple targets [22:56] cmars: it is a bit too dumb [23:13] thumper, i'll propose a test case for deploying local without series. might be after dinner [23:13] cmars: ack [23:18] wallyworld: and go get seems to have completed (/home/ubuntu/dannf/go) [23:19] dannf: great :-) still in meeting, will check back soon [23:19] wallyworld: np; i need to start up the grill, so responses will be latent [23:43] arosales: hazmat ping [23:45] arosales: hazmat do you have time for a quick G+ to talk about the demo [23:48] davecheney: hello [23:48] arosales: hazmat lets take this to #eco [23:49] ok