waigani_ | davecheney: is there a particular bug you'd like me to look at next or should I just grab one from the list? | 00:00 |
---|---|---|
sinzui | wallyworld_, thumper: does any of this look familiar and do you have any advice? http://pastebin.ubuntu.com/7224152/ | 00:11 |
wallyworld_ | sinzui: and juju works fine on hp cloud? | 00:13 |
sinzui | wallyworld_, yes | 00:16 |
wallyworld_ | hmmm | 00:16 |
sinzui | wallyworld_, I just confirmed that both configs use the same keys | 00:17 |
wallyworld_ | sinzui: in the past, where container permissions have been wrong, the container has been created but subsequent reads failed. here we can't even create the container | 00:18 |
wallyworld_ | it does seem to imply a canonistack swift issue | 00:18 |
sinzui | wallyworld_, noted. and in the past creation fails were race conditions. this says auth failure | 00:18 |
wallyworld_ | sinzui: have you tried the other region? | 00:19 |
sinzui | wallyworld_, no | 00:20 |
wallyworld_ | sometimes that can work | 00:20 |
wallyworld_ | lyc01 vs lyc02 | 00:20 |
sinzui | wallyworld_, but I just checked the canonistack dashboard for /both/ accounts. The container view shows an error | 00:20 |
wallyworld_ | hmmm, ok | 00:21 |
wallyworld_ | and no joy asking in #is? | 00:21 |
sinzui | wallyworld_, they officially defer to canonical support. I opened a ticket there 10 hours ago and no one will talk to me | 00:22 |
wallyworld_ | :-( | 00:22 |
sinzui | I am tempted to send an email notifying canonistack that it will be desupported. Without working accounts, I cannot deliver the next juju to it | 00:23 |
wallyworld_ | agreed | 00:24 |
wallyworld_ | we do need an openstack deployment to test against though :-( besides hp cloud | 00:24 |
wallyworld_ | thumper: this fixes bug 1304132 and also removes the log noise from the critical bug alexis emailed about https://codereview.appspot.com/85770043 | 00:25 |
_mup_ | Bug #1304132: nasty worrying output using local provider <ppc64el> <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1304132> | 00:25 |
davecheney | arosales: hazmat email sent | 00:53 |
davecheney | with ppc segfault information | 00:53 |
* arosales looks | 00:54 | |
hazmat | davecheney, k.. trying fresh on new ppc8.. orange box #2 vanquished | 00:55 |
davecheney | hazmat: that would be a good data point, i only have access to wolfe and winton, which are power7 | 00:55 |
davecheney | hazmat: if you see panics on your power8 host, you should revert to that kernel I specified | 00:56 |
hazmat | davecheney, my p7 host has been good wolfe-02.. 3.13-08 | 00:56 |
hazmat | trying on stilton-5 | 00:57 |
davecheney | hazmat: yes, that is the working kernel | 00:57 |
davecheney | it is pre the switch to 64k | 00:57 |
davecheney | pages | 00:57 |
arosales | dfc: so 3.13.0-08.28 is what we need correct? | 00:57 |
arosales | hazmat: are you running -08.28? | 00:57 |
davecheney | the .28 isn't the important bit | 00:57 |
davecheney | the -08 -18, -23 is | 00:58 |
davecheney | i've included a link to the old kernel in the archive | 00:58 |
hazmat | one moment.. switching tracks off maas | 00:58 |
arosales | dfc, gotcha, avoid -18 and -23 | 01:01 |
hazmat | davecheney, so my p7 is -> 3.13.0-8-generic | 01:01 |
hazmat | davecheney, my p8 is -> 3.13.0-23-generic | 01:01 |
davecheney | hazmat: right | 01:06 |
hazmat | davecheney, so.. theory being that's an okay version? .. i'm gonna test and find out either way | 01:12 |
davecheney | hazmat: i've tested -8, -18 and -23 | 01:13 |
davecheney | only -8, which was pre the 64k page switch, can run juju stably | 01:14 |
davecheney | the other kernels randomly kill juju processes with SEGVs | 01:14 |
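The kernel findings above (only the -8 ABI ran juju stably; -18 and -23 did not) can be turned into a check on the `uname -r` release string. This is a minimal Go sketch, not anything from juju itself: `kernelABI` and `knownGoodABI` are hypothetical names, and the single known-good value reflects only the three kernels tested in this log.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// knownGoodABI is the only ABI number reported as stable in this log.
// It is an assumption drawn from the conversation, not a general rule.
const knownGoodABI = 8

// kernelABI extracts the ABI number from a release string such as
// "3.13.0-8-generic" (the output of `uname -r`).
func kernelABI(release string) (int, error) {
	parts := strings.Split(release, "-")
	if len(parts) < 2 {
		return 0, fmt.Errorf("no ABI number in %q", release)
	}
	return strconv.Atoi(parts[1])
}

func main() {
	for _, rel := range []string{"3.13.0-8-generic", "3.13.0-23-generic"} {
		abi, err := kernelABI(rel)
		if err != nil {
			fmt.Println("cannot parse:", err)
			continue
		}
		fmt.Printf("%s: ABI %d, known good: %v\n", rel, abi, abi == knownGoodABI)
	}
}
```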
hazmat | solid | 01:14 |
hazmat | davecheney, cool, thanks for tracking that down.. apparently i lucked into having at least one good demo p machine | 01:15 |
davecheney | hazmat: yeah me too | 01:15 |
davecheney | winton-02 is ooooooooooold | 01:15 |
davecheney | so it was running a very old kernel | 01:15 |
davecheney | but thumper bodie and timv hit problems | 01:15 |
mwhudson | oh man, 64k pages kill the gccgo runtime? | 01:17 |
mwhudson | somehow that's easy to believe | 01:17 |
davecheney | mwhudson: yup | 01:18 |
davecheney | mwhudson: tell me your thoughts | 01:18 |
davecheney | it's signal related | 01:18 |
davecheney | somehow an invalid signal is generated, or created, or just pops into existence | 01:19 |
davecheney | the powerpc/kernel/signal_64.c doesn't know how to handle it, so it calls force_sigsegv | 01:19 |
davecheney | and the userland thinks it has hit a nil pointer exception and panics | 01:19 |
mwhudson | davecheney: well i think malloc.goc has a #define PAGE_BITS 12 in it | 01:19 |
mwhudson | o | 01:19 |
mwhudson | h | 01:19 |
mwhudson | that sounds pretty messed up | 01:19 |
davecheney | mwhudson: but why should that matter | 01:19 |
davecheney | 12 is < 16 | 01:20 |
mwhudson | davecheney: dunno | 01:20 |
davecheney | but is a multiple | 01:20 |
davecheney | all that happens is if you call mmap(0, 4096) you get a 64k allocation | 01:20 |
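The page-size point above comes down to rounding: a request sized for 4k pages (PAGE_BITS 12) is rounded up to one whole page on a kernel using 64k pages, and 64k is an exact multiple of 4k. A small Go sketch of that arithmetic follows; `roundUpToPage` is a hypothetical helper for illustration and is not taken from the gccgo runtime or the kernel.

```go
package main

import "fmt"

// roundUpToPage rounds n up to the next multiple of pageSize.
// pageSize must be a power of two.
func roundUpToPage(n, pageSize uintptr) uintptr {
	return (n + pageSize - 1) &^ (pageSize - 1)
}

func main() {
	const runtimePage = 1 << 12 // 4k, matching PAGE_BITS 12
	const kernelPage = 1 << 16  // 64k

	// A 4k request on a 4k-page kernel stays 4k.
	fmt.Println(roundUpToPage(runtimePage, 1<<12))
	// The same request on a 64k-page kernel occupies a full 64k page.
	fmt.Println(roundUpToPage(runtimePage, kernelPage))
	// 64k is a whole multiple of 4k.
	fmt.Println(kernelPage%runtimePage == 0)
}
```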
davecheney | mwhudson: i'm logging this all in a bug now | 01:20 |
davecheney | then i have a juju test to fix | 01:20 |
davecheney | then i'll try to create a smaller reproduction case | 01:21 |
davecheney | there is additional debugging in that file | 01:21 |
davecheney | but it appears to be turned off in this build | 01:21 |
davecheney | maybe spinning a new kernel with it enabled is the next step | 01:21 |
thumper | wallyworld_: back from the gym now | 01:22 |
mwhudson | davecheney: an invalid signal number is generated? | 01:22 |
mwhudson | wow | 01:22 |
mwhudson | what is the userspace doing when this signal arrives? | 01:22 |
davecheney | chilling | 01:22 |
wallyworld_ | thumper: ok. i have 2 fixes for that critical bug https://codereview.appspot.com/85770043 and https://codereview.appspot.com/85750045 | 01:22 |
mwhudson | so it's an async signal? | 01:23 |
wallyworld_ | not sure if more work is needed | 01:23 |
davecheney | [18519.444748] jujud[19277]: bad frame in setup_rt_frame: | 01:23 |
davecheney | 0000000000000000 nip 0000000000000000 lr 0000000000000000 | 01:23 |
davecheney | [18519.673632] init: juju-agent-ubuntu-local main process (19220) | 01:23 |
davecheney | killed by SEGV signal | 01:23 |
davecheney | [18519.673651] init: juju-agent-ubuntu-local main process ended, respawning | 01:23 |
thumper | wallyworld_: so what was going wrong? | 01:23 |
wallyworld_ | thumper: i'll get those landed and will have to either test or ask axw if there's anything else obvious that needs looking at | 01:23 |
wallyworld_ | thumper: 2 things 1. instance poller noise due to it not ignoring unprovisioned machines | 01:24 |
thumper | +1 for that | 01:24 |
wallyworld_ | 2. bad schema def for storage-port config attr on manual provider causing provisioner startup to fail | 01:24 |
wallyworld_ | due to json serialisation issue | 01:24 |
wallyworld_ | float64 vs int and all that | 01:24 |
wallyworld_ | so those 2 fixes i did just by looking at logs | 01:25 |
wallyworld_ | i had a look at the code to see if i could relate the fixes to the actual observed issue, but didn't get far enough | 01:25 |
wallyworld_ | so i figured we could fire up some arm instances and test and/or ask axw for input when he comes online | 01:26 |
axw | I am online | 01:26 |
axw | what input do you need? | 01:26 |
wallyworld_ | \o/ | 01:26 |
axw | I have LGTM'd your two fixes | 01:26 |
wallyworld_ | axw: bug 1302205 | 01:26 |
_mup_ | Bug #1302205: manual provisioned systems stuck in pending on arm64 <add-machine> <hs-arm64> <manual-provider> <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1302205> | 01:26 |
wallyworld_ | ok :-) | 01:26 |
wallyworld_ | i am not sure if my fixes are sufficient | 01:26 |
wallyworld_ | they are needed, but is there more to be done | 01:27 |
hazmat | davecheney, confirmed btw re 23.. panic while doing nothing detected in the log | 01:27 |
axw | hrm | 01:27 |
wallyworld_ | have you seen similar issues when developing the manual provider? | 01:27 |
axw | wallyworld_: nope | 01:27 |
wallyworld_ | or maybe we just need to test with the fixes | 01:28 |
wallyworld_ | could yet be an arm issue i guess | 01:28 |
mwhudson | davecheney: can you run my test program from https://sourceware.org/bugzilla/show_bug.cgi?id=16629 ? | 01:28 |
axw | looking at the logs now... | 01:28 |
davecheney | hazmat: i've even seen /usr/bin/go panic while running tests | 01:28 |
davecheney | https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754 | 01:28 |
_mup_ | Bug #1304754: gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels <linux (Ubuntu):New> <https://launchpad.net/bugs/1304754> | 01:28 |
wallyworld_ | certainly that storage-port issue is pretty fatal | 01:28 |
davecheney | arosales: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1304754 | 01:28 |
_mup_ | Bug #1304754: gccgo compiled binaries are killed by SEGV on 64k ppc64el kernels <linux (Ubuntu):New> <https://launchpad.net/bugs/1304754> | 01:28 |
axw | wallyworld_: ah, the "close of closed channel" is something rogpeppe brought up last night | 01:28 |
axw | there's a bug in cmd/jujud | 01:29 |
axw | not sure if he fixed it yet... | 01:29 |
mwhudson | (although i can't see why 64k pages would matter here) | 01:29 |
wallyworld_ | axw: is that in the machine 0 log attached to the bug? | 01:29 |
axw | wallyworld_: yeah | 01:29 |
wallyworld_ | let me look | 01:29 |
axw | wallyworld_: https://codereview.appspot.com/85450044/ | 01:29 |
axw | wallyworld_: I broke the machine agent when I allowed upgrade steps to get a state connection | 01:29 |
mwhudson | davecheney: also, it would sure be nice to follow the execution of handle_rt_signal64 with gdb | 01:30 |
wallyworld_ | axw: does that mp fix the close channel issue? | 01:30 |
davecheney | mwhudson: way above my pay grade | 01:31 |
davecheney | i'm not even qualified for pointer arithmetic | 01:31 |
axw | wallyworld_: yeah | 01:31 |
arosales | davecheney: looks like the latest in the archive is -23 | 01:31 |
davecheney | yup | 01:31 |
wallyworld_ | axw: great. so i'll land my branches and we can re-test i guess | 01:31 |
axw | wallyworld_: sgtm | 01:32 |
axw | wallyworld_: it would be nice to silence "cannot get instance info for instance "manual:10.0.128.7": no instances found" too, but it's not critical | 01:32 |
wallyworld_ | axw: i haven't looked into that one yet - what's the cause? | 01:33 |
mwhudson | i wonder if there is an arm64 kernel with 64k pages i can try with | 01:33 |
axw | wallyworld_: manually provisioned machines are not managed by the provider - they just should not be polled | 01:33 |
davecheney | mwhudson: that would be a good test | 01:33 |
davecheney | i tried to test using gccgo/amd64 | 01:34 |
wallyworld_ | axw: in that case i'll add some code to my first branch | 01:34 |
davecheney | but lxc was all fucked on amd64 yesterday | 01:34 |
wallyworld_ | do both fixes in one go | 01:34 |
axw | wallyworld_: there's a state.Machine.IsManual method that'll help there | 01:34 |
mwhudson | i don't know anything about legacy architectures like amd64 | 01:34 |
arosales | davecheney: do you have a link handy to the matching initrd for the -28 .deb you pointed at? | 01:38 |
hazmat | arosales, so i removed the other kernels .. sudo upgrade-grub.. currently doing shutdown -r now .. to see if it worked ;-) | 01:41 |
hazmat | arosales_, removed via pkgs that is | 01:41 |
=== arosales_ is now known as arosales | ||
jcastro | where's this -28 kernel at, I don't see it in proposed? | 01:42 |
hazmat | jcastro, its on the machines that barf.. ls /boot | 01:42 |
thumper | jcastro: o/ | 01:42 |
thumper | jcastro: I have a version of debug-log on my machine that works with the local provider | 01:42 |
hazmat | arosales, don't do what i just suggested.. it doesn't like that ;-) | 01:42 |
jcastro | thumper, hey! we made a plugin, heh | 01:42 |
thumper | yeah, but it doesn't do filtering | 01:43 |
* thumper guesses | 01:43 | |
jcastro | oooh | 01:43 |
hazmat | or replay or exclude/include by unit/machine or channel | 01:44 |
hazmat | jcastro, we have parity.. | 01:44 |
* hazmat sheds a tear | 01:45 | |
hazmat | for debug-log | 01:45 |
jcastro | heh | 01:45 |
jcastro | thumper, also, one thing we should talk about | 01:45 |
davecheney | arosales: not -28 | 01:45 |
jcastro | is the debug-hooks <-> resolved --retry thing, which makes me cry | 01:45 |
arosales | hazmat, ack :-) | 01:45 |
davecheney | you want uname -a | 01:45 |
davecheney | Linux winton-02 3.13.0-8-generic #28-Ubuntu SMP Mon Feb 17 08:22:39 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux | 01:45 |
jcastro | oh _8_ | 01:45 |
davecheney | wget https://launchpad.net/ubuntu/+source/linux/3.13.0-8.28/+build/5602341/+files/linux-image-3.13.0-8-generic_3.13.0-8.28_ppc64el.deb | 01:45 |
jcastro | not 28 | 01:45 |
arosales | davecheney, so the _only_ fix is to revert to -08? | 01:46 |
davecheney | jcastro: i'll forward you the email | 01:46 |
jcastro | davecheney, do I need an initrd for that? | 01:46 |
hazmat | arosales, atm yes | 01:46 |
davecheney | arosales: the only workaround I have at this time | 01:46 |
davecheney | jcastro: no | 01:46 |
jcastro | davecheney, thanks! | 01:46 |
thumper | jcastro: I don't understand | 01:46 |
jcastro | so I do debug-hooks | 01:46 |
jcastro | and in order to be able to fire off a hook to debug it | 01:47 |
thumper | also, know this | 01:47 |
thumper | I can only make one person happy at a time | 01:47 |
jcastro | I need to open a new terminal and do resolved --retry | 01:47 |
thumper | it isn't your turn | 01:47 |
thumper | it is hazmat's | 01:47 |
jcastro | <3 | 01:47 |
jcastro | local log will keep me happy. :D | 01:47 |
thumper | yeah, I'm open to looking to fix it... | 01:48 |
arosales | davecheney, I am confused in your email you state, "workarounds: you should install this kernel | 01:48 |
arosales | wget https://launchpad.net/ubuntu/+source/linux/3.13.0-8.28/+build/5602341/+files/linux-image-3.13.0-8-generic_3.13.0-8.28_ppc64el.deb" | 01:48 |
arosales | davecheney, ah I should have said -8.28 not -28 | 01:49 |
arosales | davecheney, gotcha | 01:49 |
arosales | which is a revert | 01:49 |
hazmat | thumper, and all of cts :-) | 01:49 |
arosales | sorry long day | 01:49 |
* arosales better just grab some dinner | 01:49 | |
davecheney | arosales: yup, we're also lucky that -28.8 isn't a think | 01:50 |
davecheney | both those numbers appear to be increasing | 01:51 |
davecheney | s/think/thing | 01:51 |
davecheney | c.Assert(err, gc.IsNil) | 02:08 |
davecheney | ... value *mgo.QueryError = &mgo.QueryError{Code:16149, Message:"exception: cannot run map reduce without the js engine", Assertion:false} ("exception: cannot run map reduce without the js engine") | 02:08 |
davecheney | store tests are failing again | 02:08 |
davecheney | i thought that the store tests wouldn't run unless we passed a flag ? | 02:09 |
davecheney | cmars: didn't you fix this ? | 02:09 |
cmars | davecheney, thought so, yes. is this trunk or 1.18? | 02:09 |
davecheney | cmars: trunk | 02:09 |
cmars | hmm | 02:09 |
cmars | davecheney, which test is it? is there a file & line #? | 02:10 |
davecheney | cmars: please hold | 02:10 |
davecheney | go test launchpad.net/juju-core/store 2>&1 | pastebinit | 02:11 |
davecheney | Failed to contact the server: [Errno socket error] [Errno socket error] timed out | 02:11 |
davecheney | oh for fucks sake | 02:11 |
davecheney | does nothing work today ? | 02:11 |
davecheney | thumper: what is the env var to lower logging ? | 02:13 |
davecheney | JUJU_LOG= ? | 02:13 |
thumper | here's mine: JUJU_LOGGING_CONFIG=<root>=INFO; juju.container=TRACE; juju.provisioner=TRACE | 02:14 |
thumper | that is read by bootstrap | 02:14 |
davecheney | ta | 02:14 |
davecheney | hnn, that isn't it | 02:14 |
davecheney | rog had a different one | 02:14 |
davecheney | a flag to testing | 02:14 |
davecheney | -juju.log WARNING | 02:15 |
davecheney | cmars: http://paste.ubuntu.com/7224466/ | 02:17 |
cmars | ok, thanks. looking | 02:17 |
davecheney | ta | 02:19 |
cmars | davecheney, i thought it had landed, but it hasn't | 02:22 |
cmars | https://code.launchpad.net/~cmars/juju-core/cs-mongo-tests/+merge/213563 | 02:22 |
cmars | i think we can land it, if CI will support running the store tests with full mongodb tests | 02:23 |
cmars | davecheney, what do you think? will you take that as an action item? | 02:25 |
davecheney | cmars: no | 02:25 |
cmars | ok :) | 02:25 |
davecheney | i cannot take that as an action item | 02:25 |
cmars | i'll follow up w/curtis tmw then | 02:26 |
davecheney | cool | 02:26 |
cmars | cheers | 02:26 |
wallyworld_ | arosales: do you have any doc or otherwise that tells me what i need to do to get access to some arm vms to test a fix for that bug | 03:19 |
dannf | wallyworld_: i can help w/ that | 03:27 |
wallyworld_ | \o/ | 03:27 |
dannf | wallyworld_: do you have an account on batuan? | 03:27 |
dannf | that's the gateway into our network - if you don't, you can ask for one in #is | 03:27 |
wallyworld_ | yes, since i have logged onto power vms previously | 03:27 |
dannf | sweet | 03:27 |
wallyworld_ | i think you know it's bug 1302205 | 03:28 |
_mup_ | Bug #1302205: manual provisioned systems stuck in pending on arm64 <add-machine> <hs-arm64> <manual-provider> <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1302205> | 03:28 |
dannf | right | 03:28 |
wallyworld_ | i don't think it's specifically an arm issue | 03:28 |
wallyworld_ | but want to test on arm anyway | 03:28 |
dannf | yep - just a sec... wonder if someone just took our host down | 03:28 |
wallyworld_ | there have been 3 branches which landed today or last night which should hopefully fix it | 03:28 |
dannf | wallyworld_: yeah, looks like it's in use debugging an unrelated issue. i'll send you an e-mail with access info and ping you (or have someone ping you) when it's ready | 03:34 |
wallyworld_ | great thanks :-) | 03:34 |
thumper | davecheney: how can I make sure that the bufio.Scanner doesn't consume too much? | 03:36 |
thumper | davecheney: I have an io.ReadCloser | 03:36 |
thumper | davecheney: and I want to read up to the first new line, and no more | 03:36 |
thumper | the Scanner when I call scan reads 4k | 03:36 |
thumper | which consumes way more than I want | 03:36 |
dannf | wallyworld_: e-mail sent - system isn't quite ready yet, stay tuned.. | 03:45 |
wallyworld_ | ok | 03:45 |
arosales | dannf, thanks re wallyworld arm access question :-) | 03:51 |
dannf | np, very happy to have help w/ it :) two days and i've just managed to learn how to kinda read go :/ | 03:53 |
dannf | wallyworld_: ok, have at it | 04:06 |
wallyworld_ | dannf: awesome, firing up ssh now | 04:06 |
wallyworld_ | dannf: "ssh 10.229.41.200" should work right? | 04:07 |
dannf | it should, but it's not working for me either - lemme ask | 04:07 |
thumper | is that because the ssh config can't work out where to proxy through? | 04:40 |
jcw4 | thumper: fyi I filed a merge request for the changes you suggested yesterday about isolating the git tests | 04:42 |
thumper | jcw4: awesome, I saw that in my inbox | 04:42 |
thumper | jcw4: I'll take a look once I submit this change :-) | 04:42 |
=== vladk|offline is now known as vladk | ||
jcw4 | thumper: thanks; no rush... just excited about contributing ;) | 04:43 |
thumper | wallyworld_: fyi instance type constraint branch has conflicts | 04:46 |
wallyworld_ | yes it does, fixed locally | 04:46 |
wallyworld_ | still wip | 04:46 |
thumper | jcw4: did you push your changes? | 04:47 |
jcw4 | yes; to a new branch | 04:47 |
jcw4 | the last one was too messy | 04:47 |
thumper | :-) | 04:47 |
thumper | jcw4: ok, there is a resubmit option on the RHS of the merge proposal page, that includes a "start over" | 04:48 |
thumper | which would have marked the old as superseded | 04:48 |
thumper | but that's OK, I'll just reject the old one. | 04:48 |
jcw4 | I see. Thanks | 04:48 |
thumper | jcw4: what happens when you change the LC_ALL to "C" ? | 04:50 |
jcw4 | I was planning on doing that after running all the tests without it. | 04:50 |
jcw4 | after they all passed I was too excited and forgot | 04:50 |
jcw4 | testing now | 04:50 |
jcw4 | thumper: worker/uniter/charm/... tests passed | 04:52 |
jcw4 | I'll push that change too? | 04:52 |
thumper | move that patch env into the base git test suite | 04:52 |
thumper | with the other env patches | 04:52 |
thumper | jcw4: then you can delete the SetUpTest for GitDirSuite | 04:53 |
jcw4 | cool, right | 04:53 |
thumper | as it won't be doing anything | 04:53 |
thumper | then yes, push that | 04:53 |
jcw4 | thumper: the LoggingSuite TearDownTest(c) needs to be called in the GitSuite TearDownTest? I'd add that back in if necessary | 05:03 |
thumper | jcw4: if the only line of the tear down is to upcall the tear down, then you can just delete it | 05:03 |
jcw4 | thumper: okay; that's all there is. tx | 05:03 |
* thumper EODs | 05:15 | |
axw | wallyworld_: I've responded to your comments, but I'm now looking at HA | 05:20 |
wallyworld_ | np | 05:20 |
wallyworld_ | just wanted to get some thoughts down | 05:21 |
wallyworld_ | i'm stuck on other things also | 05:21 |
yaguang | hi all, I am using 1.16.6 stable juju-core to bootstrap an Openstack Havana cloud, but fails with can't find index.json | 05:33 |
yaguang | it seems that juju is trying to find the meta file in the path /streams/ but swift has tools/streams/ | 05:34 |
dimitern | morning all | 05:41 |
dimitern | fwereade, can you take a look at this please ? https://codereview.appspot.com/85220044/ | 05:57 |
bigjools | dimitern: howdy! How's the vlan work coming along? | 06:51 |
dimitern | hey bigjools | 06:51 |
dimitern | bigjools, i'm in the final steps - cloudinit scripts that bring up network interfaces | 06:52 |
dimitern | bigjools, vladk is working on a few extensions to gomaasapi to allow us to unit test the new api calls | 06:53 |
dimitern | bigjools, capabilities; lshw dump of a node; networks?op=list_connected_macs | 06:53 |
bigjools | dimitern: nice, all going to make it for the release? And any issues with maas I need to know about? | 06:54 |
dimitern | bigjools, but all these were live tested on my local maas using daily builds ppa | 06:54 |
dimitern | bigjools, we're aiming for feature completeness by friday, but should be ready before that | 06:55 |
bigjools | excellent | 06:55 |
dimitern | bigjools, bug 1303617 hit me after a recent upgrade and i can no longer use the fast installer (fails at boot and doesn't recover), which is slow and tedious | 06:56 |
_mup_ | Bug #1303617: pc-grub install path broken in curtin <landscape> <curtin:Fix Released by smoser> <curtin (Ubuntu):Fix Released> <https://launchpad.net/bugs/1303617> | 06:56 |
bigjools | dimitern: weird, I did a fast install today and it was fine | 06:56 |
dimitern | hmm Fix Released - i'll try it now | 06:56 |
dimitern | bigjools, we have a few wishlist items for the maas api | 06:57 |
dimitern | bigjools, like the ability to see networks + connected macs in one place (either in GET node/system_id or in GET networks/(all)) | 06:58 |
bigjools | dimitern: please file bugs | 06:59 |
dimitern | bigjools, will do | 06:59 |
bigjools | I will triage them as wishlist and we'll put them on the stack | 06:59 |
dimitern | bigjools, otherwise now we need to do several api calls at startinstance time to get all we need | 06:59 |
bigjools | ok we can optimise that | 06:59 |
jam | cmars: I'm not sure why your test failed, but it would seem that we could tell the landing bot to always run the mongojs tests | 07:10 |
jam | cmars: though because of that, I'd actually rather have the CI tests disable it, rather than disabled by default. | 07:10 |
jam | experience has shown that ENV vars play nicer with go test than flags, because flags are only valid per package, and "go test ./..." tries to pass all flags to all packages. | 07:11 |
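The env-var approach described above can be sketched as a small predicate that test code consults before running the JS-dependent store tests. The variable name `EXAMPLE_SKIP_MONGOJS` and the helper `skipMongoJS` are hypothetical, chosen only for this illustration; the tests run by default and are switched off only when the variable is set, matching the preference stated here.

```go
package main

import (
	"fmt"
	"os"
)

// skipMongoJSEnv is a hypothetical environment variable name used only in
// this sketch.
const skipMongoJSEnv = "EXAMPLE_SKIP_MONGOJS"

// skipMongoJS reports whether tests that need the mongo JS engine should be
// skipped. The getenv parameter makes the check easy to exercise without
// touching the real environment.
func skipMongoJS(getenv func(string) string) bool {
	return getenv(skipMongoJSEnv) == "1"
}

func main() {
	// Unlike a custom flag, an environment variable is visible to every
	// package's test binary when running `go test ./...`.
	fmt.Println("skip mongo JS tests:", skipMongoJS(os.Getenv))
}
```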
dimitern | bigjools, filed bug 1304857 | 07:11 |
_mup_ | Bug #1304857: API should report networks and connected macs in the response of a single node <api> <MAAS:New> <https://launchpad.net/bugs/1304857> | 07:11 |
dimitern | jam, fancy a review ? https://codereview.appspot.com/85220044/ | 07:12 |
dimitern | bigjools, another question re gomaasapi - how do you feel about adding high level wrappers around common APIs? like having AcquireNode method that the provider calls, rather than constructing a URL internally? Other similar examples are ListNetworks, ListNodes, etc.? | 07:15 |
=== vladk is now known as vladk|offline | ||
bigjools | dimitern: I can't remember much about gomaasapi | 07:20 |
dimitern | bigjools, :) well, that was just a thought | 07:22 |
bigjools | dimitern: one of the other guys will have an opinion I'm sure | 07:22 |
dimitern | bigjools, we'll ask them for reviews when changes are proposed | 07:23 |
=== BjornT_ is now known as BjornT | ||
fwereade | dimitern, I'm getting progressively more nervous about NetworkName vs making it clear that it's a provider-specific id like instance.Id | 07:30 |
dimitern | fwereade, can't we say yes, it's provider specific, but it's also used by juju to identify the network internally? | 07:31 |
fwereade | dimitern, it will be, indeed | 07:31 |
fwereade | dimitern, but we're going to want network names as well | 07:31 |
dimitern | fwereade, ok, how are we going to make it clearer? | 07:31 |
fwereade | dimitern, when openstack gives us network abcdef638746328756865198, and we call that NetworkName, what field will we use for the "private" name users will want to use | 07:32 |
fwereade | dimitern, or "my_network" or whatever | 07:32 |
dimitern | fwereade, openstack has labels for networks just the same | 07:33 |
fwereade | dimitern, and so does every provider ever? | 07:33 |
fwereade | dimitern, that's quite the prediction ;p | 07:33 |
dimitern | fwereade, i can't say that :P | 07:33 |
dimitern | fwereade, so tell me how to alleviate your nervousness about it? :) | 07:34 |
fwereade | dimitern, call it NetworkId :) | 07:34 |
fwereade | dimitern, you know -- we have machine ids, and instance ids, and they are not the same | 07:35 |
fwereade | dimitern, (and machine ids are machine names really, but hysterical raisins) | 07:35 |
dimitern | fwereade, so, basically change it everywhere from NetworkName to NetworkId ? | 07:36 |
dimitern | fwereade, I need a follow-up for that | 07:36 |
fwereade | dimitern, I'm more concerned about the API | 07:36 |
dimitern | fwereade, you're thinking about network tags? | 07:37 |
fwereade | dimitern, and that the terminology that's hard to change be consistent with what we expect to do | 07:37 |
fwereade | dimitern, well, that was my first thought | 07:37 |
fwereade | dimitern, but then I realised that converting these names into tags would be completely wrong | 07:37 |
rogpeppe | mornin' all | 07:37 |
dimitern | morning rogpeppe | 07:37 |
fwereade | dimitern, because they're provider vocabulary, not juju vocabulary | 07:37 |
rogpeppe | dimitern: hiya | 07:37 |
dimitern | fwereade, ok, so then what? | 07:38 |
dimitern | fwereade, i'm trying to follow but can't see what's needed | 07:38 |
rogpeppe | axw: ping | 07:38 |
fwereade | dimitern, although -- wait, don't you use tags in the client api? I think we should... | 07:38 |
axw | rogpeppe: pong | 07:38 |
dimitern | fwereade, we use tags everywhere in the api | 07:39 |
dimitern | fwereade, but not for networks | 07:39 |
rogpeppe | axw: about removing JobManageEnviron: | 07:39 |
* fwereade grumbles | 07:39 | |
rogpeppe | axw: the reason we don't want to remove JobManageEnviron from a voting state server is that when a machine hasn't got JobManageEnviron we allow it to be removed | 07:39 |
fwereade | dimitern, we don't identify machines in the client API by provider-specific instance id, and we shouldn't identify networks that way either | 07:40 |
rogpeppe | axw: and if that happens we could break the invariant that we only ever have an odd number of voting state servers | 07:40 |
rogpeppe | axw: or rather, and odd number of state servers that *want* to vote | 07:40 |
dimitern | fwereade, i agree, but the only way we can deal with networks so far is if we get them from the provider | 07:40 |
dimitern | fwereade, at provisioning time | 07:40 |
rogpeppe | s/and odd/an odd/ | 07:40 |
fwereade | dimitern, we *can* impose a requirement that network names match provider ids exactly, this is MVP after all | 07:41 |
dimitern | fwereade, i guess you're suggesting to require the user to add any networks to juju before being able to deploy with them | 07:41 |
dimitern | fwereade, and i can see how this is the way we wanna go eventually, but not for nwo | 07:42 |
dimitern | now | 07:42 |
fwereade | dimitern, well, mid-term, yes -- I'd expect --networks params to be validated | 07:42 |
axw | rogpeppe: okay | 07:42 |
axw | rogpeppe: lots to take in here, still figuring out how all the voting bits work. | 07:42 |
fwereade | dimitern, short-term, I want us to be clear on the distinction between juju vocabulary over the client API (tags) and provider vocabulary over the internal API (network ids) | 07:43 |
dimitern | fwereade, so let's make a plan - i land this last CL and make another one for s/NetworkName/NetworkId/ throughout, and then do the cloudinit stuff | 07:43 |
rogpeppe | axw: thanks for taking a look. feel free to ask about whatever doesn't seem to make sense. | 07:43 |
fwereade | dimitern, internally I'm fine saying that network name == network id (for mvp at least) | 07:43 |
fwereade | dimitern, am I helping? | 07:43 |
dimitern | fwereade, how can we be clear about this? in the docs? where? | 07:43 |
rogpeppe | fwereade: it would be nice if network ids were distinguishable from machine ids and unit ids (which are both currently distinguishable from each other) | 07:43 |
rogpeppe | fwereade: and service names, of course | 07:44 |
dimitern | fwereade, yeah, that is how it's gonna be for now - we call it networkId, but we mean maas-specific name | 07:44 |
fwereade | rogpeppe, not really gonna happen, that's why we have tags | 07:44 |
rogpeppe | fwereade: yeah, fair enough | 07:44 |
fwereade | rogpeppe, although *probably* units/machines will be safe, but services won't ;p | 07:44 |
rogpeppe | fwereade: yeah | 07:45 |
fwereade | dimitern, so, in the SetProvisioned bits: it's a provider-specific network id, not a tag | 07:45 |
fwereade | dimitern, in the Client-facing IncludeNetworks/ExcludeNetworks bits, we should be using tags | 07:46 |
dimitern | fwereade, yes, you mean better doc comment | 07:46 |
fwereade | dimitern, internally we can just strip off the "network-" prefix and keep going mapping 1:1 with provider-specific network ids | 07:46 |
dimitern | fwereade, so juju deploy --networks=net1,net2 which goes over the API as network-net1, network-net2 | 07:46 |
dimitern | fwereade, and for include/excludeNetworks in state we still use the ids, not tags as usual | 07:47 |
fwereade | dimitern, yeah, exactly -- and for now the stripped names have to map to internal provider ids, but we keep them distinct so it doesn't become confusing when we have to change over later | 07:48 |
dimitern | fwereade, ok, got it | 07:48 |
fwereade | dimitern, inside state you can even stick with a single field in the document doing both duties... but be very clear that the _id field is for the *juju* name, not the provider name | 07:48 |
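The tag convention agreed above (names travel over the client API as `network-<name>`, and the prefix is stripped internally, mapping 1:1 to provider network ids for now) can be sketched in Go as follows. `networkTag` and `networkNameFromTag` are hypothetical helper names for illustration, not functions confirmed to exist in juju-core.

```go
package main

import (
	"fmt"
	"strings"
)

const networkTagPrefix = "network-"

// networkTag builds the client-facing tag for a juju network name.
func networkTag(name string) string {
	return networkTagPrefix + name
}

// networkNameFromTag strips the tag prefix and returns the juju network
// name. Under the MVP assumption in this discussion, that name is also the
// provider-specific network id.
func networkNameFromTag(tag string) (string, error) {
	name := strings.TrimPrefix(tag, networkTagPrefix)
	if name == tag || name == "" {
		return "", fmt.Errorf("%q is not a valid network tag", tag)
	}
	return name, nil
}

func main() {
	// juju deploy --networks=net1,net2 would send these tags over the API.
	for _, n := range []string{"net1", "net2"} {
		tag := networkTag(n)
		name, err := networkNameFromTag(tag)
		fmt.Println(tag, name, err)
	}
}
```

Keeping the conversion in one place means the 1:1 mapping can later be replaced by a lookup without changing callers.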
dimitern | fwereade, better comments, ok | 07:49 |
fwereade | dimitern, brilliant, thanks | 07:49 |
dimitern | fwereade, i'll try to remember all that :) will propose it some time later today | 07:49 |
fwereade | dimitern, I'm going through the CL now in case you hadn't realised ;p -- some more naming quibbles but otherwise looking sound I think | 07:49 |
dimitern | fwereade, great! | 07:50 |
fwereade | dimitern, reviewed | 07:52 |
fwereade | rogpeppe, btw, do you think you might have a spare cycle to look at https://bugs.launchpad.net/bugs/1303735 today? it looks a bit like something you might know about | 07:53 |
_mup_ | Bug #1303735: private-address change to internal bridge post juju-upgrade <openstack-provider> <juju-core:Triaged> <juju-core 1.18:Triaged> <https://launchpad.net/bugs/1303735> | 07:53 |
rogpeppe | fwereade: looking | 07:53 |
fwereade | axw, did you see https://bugs.launchpad.net/bugs/1303583 ? | 07:53 |
_mup_ | Bug #1303583: provider/azure: new test failure <gccgo> <juju-core:Triaged> <https://launchpad.net/bugs/1303583> | 07:53 |
axw | fwereade: I have, but haven't had time to look into it yet | 07:54 |
fwereade | axw, np, just wanted to make sure it was on your radar | 07:54 |
dimitern | fwereade, ta | 07:54 |
rogpeppe | fwereade: the issue is quite obscure to me - i can't see the exact problem that's being reported there | 07:58 |
fwereade | rogpeppe, AIUI it's a change in behaviour -- jamespage will be able to make it clear I think? | 07:59 |
rogpeppe | fwereade: right. it would be nice to know what's the expected behaviour there and how the reported logs differ | 08:00 |
jamespage | rogpeppe, I upgrade nova-compute nodes (which had the correct private-address) and the private-address switches to be the ip address of the internal bridge virbr0 | 08:00 |
rogpeppe | jamespage: where can i see the result of that in the status? (or the logs?) | 08:00 |
jamespage | rogpeppe, in the bug report | 08:00 |
rogpeppe | jamespage: yeah, i was looking at the bug report | 08:00 |
jamespage | rogpeppe, the dns-name of all the nodes are the same | 08:01 |
jamespage | #err | 08:01 |
rogpeppe | jamespage: the status doesn't seem to show private addresses | 08:01 |
jamespage | rogpeppe, OK - public-address then | 08:01 |
rogpeppe | jamespage: ah, dns-name, sorry | 08:01 |
jamespage | rogpeppe, whatever happened it was wrong | 08:01 |
rogpeppe | jamespage: right, the public address. that really confused me. | 08:01 |
jamespage | rogpeppe, I'm not sure about the private-address tbh | 08:01 |
jamespage | rogpeppe, title changed | 08:02 |
rogpeppe | jamespage: thanks | 08:02 |
=== vladk|offline is now known as vladk | ||
dimitern | fwereade, how about s/SetProvisionedWithNetworks/ProvisionInstance/ ? | 08:16 |
rogpeppe | jamespage: can you find out what addresses nova returns for the instance ids? | 08:18 |
jamespage | rogpeppe, not right now | 08:18 |
rogpeppe | jamespage: ok | 08:18 |
jamespage | but I can look again later | 08:18 |
rogpeppe | jamespage: i'm suspecting that nova is returning the libvirt bridge address as one of the addresses for an instance, and our logic happens to be picking it out | 08:19 |
jamespage | rogpeppe, hmm | 08:19 |
jamespage | rogpeppe, nova has no knowledge of that afaik | 08:20 |
jamespage | as in there is no agent in the instance that would let it know | 08:20 |
rogpeppe | jamespage: hmm | 08:20 |
rogpeppe | jamespage: ah, i see where it comes from | 08:25 |
rogpeppe | axw: i think this issue (#1303735) is to do with worker/machiner - setMachineAddresses is setting the libvirt bridge address without marking it as NetworkMachineLocal | 08:31 |
_mup_ | Bug #1303735: public-address change to internal bridge post juju-upgrade <openstack-provider> <juju-core:Triaged> <juju-core 1.18:Triaged> <https://launchpad.net/bugs/1303735> | 08:31 |
rogpeppe | jamespage: so, i know what the issue is, but i'm not yet sure of the right way to fix it | 08:31 |
axw | rogpeppe: that would suggest that the openstack provider doesn't have any cloud-local addresses | 08:37 |
axw | is that expected? | 08:37 |
axw | rogpeppe: and you're right of course, it's not setting them to local - how would it know to do that? | 08:38 |
rogpeppe | axw: yeah | 08:38 |
rogpeppe | axw: i don't think it means the provider doesn't have any cloud-local addresses, as we're looking for public addresses here | 08:39 |
axw | rogpeppe: sorry, misread the bug | 08:39 |
axw | rogpeppe: I thought it was private | 08:39 |
rogpeppe | axw: it seems like state.mergedAddresses doesn't preserve ordering, which is perhaps a pity | 08:39 |
rogpeppe | jamespage: it would still be useful to see what addresses nova is returning for the instances | 08:40 |
rogpeppe | axw: i'm thinking that it might be possible for a machine to know which interfaces are private, but it might be quite os-specific | 08:42 |
axw | rogpeppe: ISTM that the best thing we could do is to prefer cloud-local over unknown | 08:42 |
axw | rogpeppe: indeed | 08:43 |
rogpeppe | axw: when asking for a public address? | 08:43 |
axw | (quite os specific) | 08:43 |
rogpeppe | axw: that seems wrong to me | 08:43 |
axw | rogpeppe: yeah, if there's no public address | 08:43 |
axw | is it less wrong to choose an unknown address that might be private (like this)? | 08:43 |
rogpeppe | axw: another possibility is to strictly order Machine.Addresses before Machine.MachineAddresses | 08:43 |
axw | the right thing of course is to classify things properly | 08:43 |
axw | rogpeppe: looking at instance.SelectPublicAddress, that won't work - it chooses the last cloud-local/unknown in the list | 08:44 |
axw | which is different to internal, for some reason | 08:45 |
rogpeppe | axw: that's definitely wrong if so | 08:45 |
axw | rogpeppe: perhaps it just needs to change to be like internal | 08:45 |
rogpeppe | axw: yes | 08:46 |
axw | (and preserve order) | 08:46 |
rogpeppe | axw: in fact, the implementation of internalAddressIndex and publicAddressIndex should probably be merged | 08:46 |
rogpeppe | jamespage: when you have a moment, would you be able to run this go program on one of the openstack nodes in the juju env that exhibits this problem? http://play.golang.org/p/GH0261EIHH | 09:10 |
rogpeppe | axw: i'm thinking we might be able to make some deductions from the interface name | 09:11 |
jamespage | rogpeppe, OK - lemme finish up the upgrade testing I'm doing and I'll try again | 09:13 |
natefinch | morning all | 09:36 |
natefinch | rogpeppe: you around? | 09:36 |
rogpeppe | natefinch: yup | 09:36 |
rogpeppe | natefinch: just doing a review. will be with you shortly. | 09:36 |
natefinch | rogpeppe: sure | 09:36 |
axw | rogpeppe: sorry was afk. I suppose that would be better than what we have now | 09:42 |
rogpeppe | axw: preserving order, you mean? | 09:43 |
axw | rogpeppe: deducing classification | 09:43 |
rogpeppe | axw: yeah | 09:43 |
rogpeppe | axw: we should preserve order too, i think, so the addresses are in predictable order. currently we're shuffling them randomly, which isn't great | 09:44 |
natefinch | rogpeppe, axw: you guys talking about the sort.Stable address problem with replicaset addresses? | 09:44 |
axw | rogpeppe: provider addresses should certainly come before machine, but otherwise I think relying on order is a mistake | 09:44 |
axw | natefinch: no, something else entirely - choosing public addresses when there are only unknown/cloud-local | 09:45 |
rogpeppe | natefinch: no, we're talking about #1303735 | 09:45 |
_mup_ | Bug #1303735: public-address change to internal bridge post juju-upgrade <openstack-provider> <juju-core:Triaged> <juju-core 1.18:Triaged> <https://launchpad.net/bugs/1303735> | 09:45 |
rogpeppe | axw: mgz says that order is important | 09:45 |
natefinch | ahh ok | 09:45 |
axw | rogpeppe: if addresses really do have a priority, then I think that should be explicit | 09:46 |
rogpeppe | axw: and that a provider can return preferred addresses by putting them earlier in the addresses slice | 09:46 |
axw | ordering in a slice seems pretty subtle, easy to break | 09:46 |
rogpeppe | axw: that's the current design, FWIW | 09:46 |
axw | yeah, I get that - it needs to be fixed - just whining :) | 09:47 |
natefinch | type AddressByPriority []Address | 09:47 |
natefinch | now it's explicit | 09:47 |
rogpeppe | natefinch: i think it's reasonable as is, actually. | 09:48 |
axw | rogpeppe: we should probably document that order is important on instance.Instance.Addresses | 09:48 |
rogpeppe | axw: it's not too hard to take care to preserve order. it would be nice if there was a function to help with merging address slices in the instance package | 09:49 |
rogpeppe | axw: definitely | 09:49 |
natefinch | rogpeppe: I'm not a huge fan of relying on order of a generic slice. I guess we very rarely pass it around outside the provider, and if the provider interface makes it clear the order matters, then that's probably ok. | 09:51 |
rogpeppe | natefinch: we pass it around a lot actually | 09:51 |
rogpeppe | natefinch: i don't really see the problem - slices are inherently ordered | 09:51 |
natefinch | rogpeppe: yes, but that order usually doesn't matter. And it's not clear it matters when some random function gets a list of addresses deep in the bowels of the code. | 09:53 |
rogpeppe | natefinch: huh? that order often/usually does matter! | 09:53 |
natefinch | I presume we got into this mess because we didn't realize the order of the slice matters | 09:53 |
rogpeppe | natefinch: e.g. []byte | 09:53 |
rogpeppe | natefinch: we definitely need to document that more | 09:54 |
rogpeppe | natefinch: but i think it's reasonable to have a convention that []Address is ordered | 09:54 |
rogpeppe | natefinch: otherwise we'd end up adding some kind of a priority field which would actually make things considerably harder | 09:55 |
natefinch | rogpeppe: I'm just not a fan of preventing bugs by following conventions that are likely only written down in one place in a huge codebase. But I agree making the providers return a different type would be a hassle. | 09:59 |
rogpeppe | natefinch: it's not just making the providers return a different type - it's coordinating priorities. do you have some global definition of address priority levels? what do you do when you combine address from two different sources? | 10:00 |
rogpeppe | natefinch: all those issues fall out naturally if you assume that ordering matters in a slice | 10:00 |
rogpeppe | natefinch: we should definitely write down in a couple of places that order is significant | 10:01 |
natefinch | rogpeppe: I don't want to continue to argue it, since it's just stopping us from actually doing anything, but I think the answer is non-trivial no matter what we do. | 10:01 |
rogpeppe | natefinch: i don't think it's too hard actually. just preserve order when combining addresses. | 10:01 |
wallyworld_ | mgz: have you nova booted an instance on hp cloud manually and then attempted to ssh into it? i've had no luck getting in via ssh | 10:02 |
natefinch | rogpeppe: I guess I don't know how to preserve order when merging two slices unless you know how they were sorted in the first place. | 10:02 |
rogpeppe | natefinch: trivial answer: just concatenate the slices | 10:03 |
jam | axw: I just got a "session already closed" panic on the bot. Doesn't your patch fix that? | 10:03 |
axw | my patch? | 10:04 |
jam | axw: the one that untwines StateWorker and APIWorker | 10:04 |
axw | jam: rogpeppe fixed a channel closed one | 10:04 |
rogpeppe | natefinch: more sophisticated answer: delete items in the second slice that exist in the first slice before concatenating them | 10:04 |
axw | jam: link? | 10:04 |
axw | jam: nm, found it | 10:05 |
natefinch | rogpeppe: how do you know the ones in the second slice are lower priority than all the ones in the first slice? | 10:05 |
jam | axw: here's a link to the failure: https://code.launchpad.net/~jameinel/juju-core/go-vet-cleanup/+merge/214911 | 10:05 |
rogpeppe | jam: yeah, my patch wasn't for a "session already closed" error | 10:05 |
axw | jam: that looks different | 10:05 |
rogpeppe | natefinch: you make that decision | 10:05 |
rogpeppe | natefinch: based on the origin of each slice | 10:05 |
rogpeppe | jam: it may well be related to my patch though | 10:06 |
rogpeppe | jam: i'll have a look | 10:06 |
jam | axw: I thought you had a comment in IRC about breaking the machine agent because of the multiple connections during upgrade, which might be related, but maybe not directly. | 10:07 |
jam | this, in particular, looks like a Watcher that is trying to finish something while the connections are cleaning up. | 10:07 |
axw | jam: I did, and rogpeppe fixed it... I don't think it is related, but maybe rog will have a better idea | 10:08 |
jam | axw: rogpeppe: looking at state/watcher/watcher.go it looks like it could be a race condition. If we triggered tomb.Dying but also got the timeout in time.After(period), the w.needSync will be checked without looking at tomb.Dying | 10:10 |
jam | hmm.. alternatively, on first entering the function, you also set needSync, but haven't looked at Dying yet (AFAICT) | 10:11 |
rogpeppe | jam: i don't think that should matter | 10:11 |
jam | the traceback says that it was happening in New() | 10:12 |
rogpeppe | jam: until the watcher's tomb is Dead, it's entitled to do anything it likes | 10:12 |
jam | though it doesn't go above that. | 10:12 |
rogpeppe | jam: i think it must be that we're not closing things down properly | 10:12 |
jam | rogpeppe: sure, it looks like we might have gotten a closed session while we were doing something else, and we're closing it concurrently with creating something new.. ? | 10:12 |
jam | rogpeppe: anyway, don't look too deeply on this, I was just trying to push out some of wwitzel's in-progress stuff while he was gone | 10:14 |
jam | it isn't critical work | 10:14 |
natefinch | btw, rogpeppe: to land HA, we need to rework the sort.Stable of addresses. sort.Stable is go 1.2, and we only require go 1.1.2 right now | 10:17 |
axw | natefinch: why do you need to stable sort? | 10:18 |
rogpeppe | natefinch: right - i saw that. all that selectPreferredStateServerAddress logic is about to go anyway | 10:18 |
rogpeppe | natefinch: i didn't suggest taking it out because i didn't want to perturb the branch any more | 10:19 |
rogpeppe | natefinch: i'd just delete all of that and use mongo.SelectPeerAddress instead | 10:19 |
rogpeppe | axw: we used a stable sort to preserve address order | 10:19 |
natefinch | rogpeppe: right, we just have to take it out since the bot can't compile it | 10:20 |
jamespage | rogpeppe, OK - this is from 12.04 - http://paste.ubuntu.com/7225600/ | 10:20 |
axw | rogpeppe: yeah, just wondering what part of the address is being ignored for the sort.Sort not to be good enough | 10:20 |
jamespage | rogpeppe, however I think I saw the issue on 14.04 nodes - so doing it there as well. | 10:20 |
rogpeppe | jamespage: oh, one mo. i didn't include some crucial info. | 10:21 |
axw | cos if they're equal and we're considering all fields, surely we don't care | 10:21 |
rogpeppe | jamespage: this is more useful: http://play.golang.org/p/mmy9KhUy9T | 10:24 |
rogpeppe | axw: we weren't comparing all fields | 10:24 |
mgz | wallyworld_: yeah, you need to add your ssh key either through cloud-init or via nova though | 10:37 |
wallyworld_ | mgz: i tried via nova using keypair-add | 10:37 |
mgz | right, with that... it didn't work? | 10:37 |
wallyworld_ | i used the --pub-key option | 10:37 |
wallyworld_ | yeah, didn't work | 10:38 |
wallyworld_ | mgz: i'm trying to test the latest fixes to the manual provider that landed today | 10:38 |
jamespage | rogpeppe, http://paste.ubuntu.com/7225650/ | 10:38 |
mgz | you can use `nova console-log` to see what's up if you supplied any cloud init bits | 10:38 |
wallyworld_ | mgz: didn't supply any cloud init bits, was just assuming keypair-add would work | 10:39 |
wallyworld_ | console log seemed to show some random key being used | 10:39 |
wallyworld_ | not mine | 10:39 |
mgz | odd | 10:39 |
rogpeppe | jamespage: thanks | 10:39 |
mgz | wallyworld_: ah, | 10:40 |
rogpeppe | jamespage, mgz: do you think it would be reasonable to pattern match on the interface name to determine the class of address? (e.g. if it matches virbr* then assume it's machine-local) | 10:40 |
mgz | did you actually use `nova boot --key-name MYKEY` ? | 10:40 |
wallyworld_ | yep | 10:40 |
rogpeppe | jamespage: i don't know how predictable interface names are in linux | 10:40 |
mgz | okay, I'm out of ideas then :P | 10:40 |
wallyworld_ | mgz: the same name as i used for keypair-add | 10:40 |
wallyworld_ | :-( | 10:41 |
mgz | wallyworld_: try supplying a key with cloud-init instead | 10:41 |
jamespage | rogpeppe, hmm | 10:41 |
mgz | 's a bit more work but should be fine | 10:41 |
wallyworld_ | mgz: point me to some doc to tell me what to do? | 10:41 |
mgz | sec | 10:41 |
wallyworld_ | or i can try with lxc i guess | 10:41 |
rogpeppe | jamespage: because i believe there are cases where we really do want to get the addresses off the local machine interfaces. but that's hard if we can't tell which ones are machine-local. | 10:42 |
mgz | wallyworld_: basically, make a text file with `#cloud-config\nssh_authorized_keys:\n - ssh-rsa .... blah@blah\n` | 10:45 |
jamespage | rogpeppe, you can't safely make that assumption "if it matches virbr* then assume it's machine-local" | 10:45 |
mgz | see doc/examples/cloud-config-ssh-keys.txt in lp:cloud-init for an example | 10:46 |
wallyworld_ | ta, ok | 10:46 |
mgz | then you can supply that file straight as --user-data to boot | 10:46 |
mgz | (no need to gzip as it's so small) | 10:46 |
wallyworld_ | ok, i'll try that | 10:46 |
jamespage | rogpeppe, is it possible to limit juju to querying interfaces it's been told about or created itself? | 10:47 |
jamespage | rogpeppe, whitelist rather than blacklist | 10:47 |
dimitern | jam, standup? | 10:47 |
mgz | jamespage: we'll nearly start doing that with maas now I think, as dimitern has started getting the network interfaces from the lshw that maas provides | 10:48 |
mgz | we should probably do something similar when we grow better networking support in other clouds | 10:48 |
dimitern | mgz, jamespage, we definitely will do that for other clouds, gradually as juju networking support grows | 10:49 |
dimitern | fwereade, updated and tested https://codereview.appspot.com/85220044/ - should be good to land | 10:50 |
fwereade | dimitern, cheers | 10:50 |
mattyw | fwereade, I've made the small change you asked for - just added a test and a small fix - happy for me to land it? https://codereview.appspot.com/83060049/ | 11:07 |
jam1 | dimitern: sorry I missed the ping. I completely spaced off the standup, and was on my other laptop. | 11:17 |
dimitern | jam1, we're still there, you can join if you like :) | 11:18 |
fwereade | mattyw, if I LGTMed with fixes you don't need to ask, but you can always ask for another review if you're not sure | 11:32 |
mattyw | fwereade, ok, I just added the test - and a fix I found while writing it so I'll approve it then, thanks | 11:32 |
fwereade | mattyw, cool | 11:33 |
jam1 | fwereade, dimitern: https://code.launchpad.net/~jameinel/juju-core/1.18-refuse-downgrade-1299802/+merge/214878 needs a review | 11:39 |
dimitern | jam1, looking | 11:40 |
jam1 | dimitern: thanks | 11:42 |
dimitern | jam1, LGTM | 11:47 |
rogpeppe | mgz, jamespage: i wonder if we could just add only addresses from eth* interfaces for the time being. that would probably cover the case that we care about most currently. | 12:12 |
rogpeppe | natefinch: hangout? | 12:13 |
axw | rogpeppe: is it expected we'll want to have non-voting replicaset members? is that why we have NoVote/WantsVote? or is that specifically for handling inaccessible members? | 12:13 |
rogpeppe | axw: yes - if a machine goes down, we don't know that it might just come back up again in a few moments, so we don't want to just destroy it or remove it immediately | 12:14 |
rogpeppe | axw: so we just mark it so that it doesn't want the vote | 12:15 |
rogpeppe | axw: also, we can have a machine with WantsVote=false and HasVote=true | 12:15 |
axw | ok | 12:17 |
natefinch | rogpeppe: sure | 12:17 |
rogpeppe | axw: our main invariant is that the number of machines that *want* the vote must always be odd, and similarly the number of machines in the replica set configuration that *have* the vote must always be odd. | 12:17 |
rogpeppe | natefinch: one mo, i've just been called to lunch | 12:17 |
natefinch | ok | 12:17 |
* rogpeppe lunches | 12:17 | |
* natefinch breakfasts | 12:18 | |
* perrito666 snacks after breakfast | 12:21 | |
perrito666 | we really need more names for eating occasions | 12:21 |
* axw pats his belly full of pizza | 12:21 | |
jam1 | perrito666: brunch is the breakfast + lunch meal | 12:22 |
jam1 | second breakfast is the hobbit one, (along with elevensies (sp?)) | 12:22 |
axw | heh | 12:22 |
perrito666 | jam1: I am in the hobbit one | 12:22 |
perrito666 | I intend to lunch too | 12:22 |
perrito666 | (and honestly, I might also eat something near eleven now that you mention it) | 12:22 |
jam1 | perrito666: http://www.moviemistakes.com/film1778/quotes | 12:23 |
jam1 | so, breakfast, second breakfast, elevensies, Lunch, Luncheon, Afternoon tea, dinner, supper, I'm not sure if there are more | 12:23 |
perrito666 | that pretty much covers my day :) | 12:24 |
jam1 | rogpeppe, natefinch: so how close are we to having a "juju ensure-state-availability" that we can play with ? | 12:25 |
rogpeppe | jam1: i've got a branch that seems to work | 12:37 |
rogpeppe | jam1: but it needs more tests | 12:38 |
jam1 | rogpeppe: natefinch: I just noticed that we thought EnsureMongo could probably land (and be polished from there) yesterday, but it is still up for review. | 12:48 |
jam1 | at least the comment yesterday was "if I get enough time before the kids wake up", which probably didn't happen, but certainly afterwards... ? | 12:49 |
rogpeppe | jam1: it's landing very soon | 12:49 |
rogpeppe | jam1: it used a go 1.2 feature which meant it couldn't land as was | 12:49 |
jam1 | rogpeppe: if that wasn't said weeks ago, I would trust you :) | 12:49 |
jam1 | rogpeppe: what was that? (I wasn't particularly aware of 1.2 incompatibilities) | 12:49 |
rogpeppe | jam1: it used sort.Stable, which is a go1.2 addition | 12:50 |
jam1 | ah | 12:50 |
rogpeppe | jam1: it's been LGTM'd | 12:50 |
mattyw | is the landing bot awake? | 12:51 |
jam1 | mattyw: it landed my stuff 10 min ago | 12:52 |
jam1 | but I'll check it | 12:52 |
jam1 | mattyw: do you have something that it isn't noticing? | 12:52 |
mattyw | jam1, https://code.launchpad.net/~mattyw/juju-core/deploy-with-user-name/+merge/213962 | 12:53 |
mattyw | jam, I guess there might be a queue? | 12:53 |
jam1 | mattyw: you don't have a commit message set | 12:53 |
jam1 | so the bot ignores it | 12:53 |
mattyw | jam, ah - of course, thanks | 12:53 |
jam1 | mattyw: I copied your description | 12:53 |
mattyw | jam1, that's great thanks very much | 12:54 |
mattyw | jam, I'll try to remember for next time | 12:54 |
dimitern | fwereade, poke re https://codereview.appspot.com/85220044/ | 12:58 |
rogpeppe | natefinch: i've got a dentist's appointment now. back in 30 mins | 13:02 |
jam1 | mattyw: I can see the bot is processing your request. | 13:02 |
jam1 | Note that we've had some intermittent failures with "Session already closed". If you see that, you can resubmit. | 13:02 |
mattyw | jam1, ok thanks | 13:03 |
sinzui | Hi jam, fwereade : I think this bug is describing unsupported behaviour of lxc nested in kvm: https://bugs.launchpad.net/juju-core/+bug/1304530 | 13:04 |
_mup_ | Bug #1304530: nested lxc's within a kvm machine are not accessible <addressability> <cloud-installer> <kvm> <local-provider> <lxc> <juju-core:New> <https://launchpad.net/bugs/1304530> | 13:04 |
mgz | sinzui: yeah, that's likely just a case of no one having tried it yet | 13:05 |
mgz | the local provider is already pretty crazy when it comes to addressing without adding nested containers in | 13:06 |
sinzui | mgz, I think stokachu has done something like that and it required esoteric magic to work | 13:08 |
mgz | if you manually fiddle with the network setup you could probably make it work | 13:08 |
mgz | it's not something we're looking to support for trusty though | 13:08 |
sinzui | mgz, CI hates trunk https://bugs.launchpad.net/juju-core/+bug/1305047 | 13:09 |
_mup_ | Bug #1305047: Unit tests fail on lp:juju-core r2588 <regression> <test-failure> <juju-core:Triaged> <https://launchpad.net/bugs/1305047> | 13:09 |
mgz | sinzui: that's rogpeppe's bug | 13:10 |
mgz | rogpeppe: have you got a bug number for it? | 13:10 |
sinzui | Ah, silly me, stokachu is the reporter of the bug. So I think he has reached the dead end that thumper predicted | 13:12 |
fwereade | dimitern, rereviewed | 13:23 |
dimitern | fwereade, thanks | 13:27 |
rogpeppe | mgz, sinzui: i tried and failed to reproduce that problem | 13:29 |
sinzui | :( | 13:31 |
rogpeppe | sinzui: interestingly panic is in a different test to the one that jam saw | 13:32 |
rogpeppe | s/panic/that panic/ | 13:33 |
sinzui | rogpeppe, CI will run the tests 5 times before giving up. It tried for many revs and did many fails | 13:35 |
sinzui | rogpeppe, But CI just got a pass http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-precise/ | 13:35 |
sinzui | rogpeppe, trusty has the same bad record, but its passes happen in a better order to make CI happy: http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/run-unit-tests-amd64-trusty/ | 13:36 |
mgz | sinzui: it seems like a pretty easy to hit race fail | 13:37 |
mgz | landing bot has jackpotted a number of times | 13:37 |
natefinch | rogpeppe: you have dentist appointments that only take 30 minutes? Damn, mine always take like an hour. | 13:37 |
natefinch | (and that's not including time to get there) | 13:37 |
rogpeppe | natefinch: the actual appointment was just a checkup - 10 minutes only; and the dentist is only a couple of minutes bike ride away | 13:39 |
natefinch | rogpeppe: that's cool. | 13:39 |
rogpeppe | hmm, i just saw another (probably unrelated) panic when testing | 13:41 |
rogpeppe | http://paste.ubuntu.com/7226267/ | 13:42 |
mgz | rogpeppe: is the change you suspect just revertable? | 13:42 |
rogpeppe | mgz: probably | 13:43 |
rogpeppe | mgz: i'd like to know what's going on though | 13:43 |
mgz | if CI can hit the error this reliably, should be pretty easy to confirm blame or not | 13:43 |
jamespage | rogpeppe, does the same code get used in the MAAS provider? when using LXC containers, brX is also valid | 13:50 |
rogpeppe | jamespage: yes, the same code gets used in the MAAS provider | 13:51 |
rogpeppe | jamespage: perhaps we need provider-specific code to run in the client to get the addresses | 13:51 |
jamespage | rogpeppe, so in the MAAS provider the IP address is assigned to the bridge, not the physical interface | 13:51 |
jamespage | assuming LXC or KVM containers have been created | 13:51 |
jamespage | whitelisting eth* and br* might work OK | 13:52 |
jamespage | that said I've seen emX style entries as well with biosdevname | 13:52 |
rogpeppe | jamespage: i'm not familiar with the details of what scenarios we really need the machine-local address discovery. | 13:52 |
rogpeppe | fwereade, mgz: ^ | 13:52 |
rogpeppe | s/discovery/discovery for/ | 13:53 |
mgz | jamespage: I'm suspicious of anything just running on the machines themselves | 13:53 |
jamespage | mgz, rogpeppe: do we know what this local discovery step is used for? | 13:53 |
rogpeppe | jamespage: i'm not sure. i'm guessing there are some places that we can't use the provider for discovery | 13:54 |
rogpeppe | jamespage: perhaps this is something that's there for manual provisioning only | 13:54 |
mgz | jamespage: this is all re bug 1303735 right? | 13:56 |
_mup_ | Bug #1303735: public-address change to internal bridge post juju-upgrade <openstack-provider> <juju-core:Triaged> <juju-core 1.18:Triaged> <https://launchpad.net/bugs/1303735> | 13:56 |
jamespage | mgz, yes | 13:57 |
=== hatch__ is now known as hatch | ||
fwereade | rogpeppe, jamespage: it's the blasted local provider that needs local address discovery | 14:27 |
rogpeppe | ah, interesting, there's a race in the test between agent config and the apiaddressupdater | 14:27 |
fwereade | rogpeppe, jamespage: I can't immediately recall whether we have lxc-ls --fancy everywhere we might need it, which *could* let us work around that | 14:27 |
fwereade | rogpeppe, jujud tests? | 14:27 |
rogpeppe | fwereade: yeah | 14:28 |
fwereade | rogpeppe, those things suck | 14:28 |
fwereade | ;) | 14:28 |
rogpeppe | fwereade: it's only a race because we don't set the APIHostPorts in the state | 14:28 |
rogpeppe | fwereade: so the apiaddressupdater starts and immediately assigns no addresses to the agent config | 14:28 |
rogpeppe | fwereade: the test only works if the APIWorker grabs the APIInfo before it does that | 14:29 |
rogpeppe | fwereade: we should have valid APIHostPorts in the state, then there shouldn't be a problem | 14:29 |
fwereade | rogpeppe, right, makes sense | 14:30 |
rogpeppe | fwereade: i wondered about having an EnvironProvider method that allows us to ask a provider for local addresses | 14:30 |
rogpeppe | fwereade: then the local provider could implement it, but the other providers could just return nothing | 14:31 |
rogpeppe | fwereade: (but it would potentially allow us to move away from using the instancepoller for some providers, if we wanted to - hazmat thinks that's a good idea) | 14:32 |
fwereade | rogpeppe, honestly I think it's a matter of tuning the instancepoller more than it is a matter of dropping it | 14:33 |
fwereade | rogpeppe, we want to keep track of instance status as well | 14:33 |
fwereade | rogpeppe, I think there's something else | 14:33 |
fwereade | rogpeppe, oh, yeah, instance networks | 14:33 |
rogpeppe | fwereade: regardless, having a way for providers to add locally-sourced addresses to a machine seems like a reasonable idea | 14:34 |
fwereade | rogpeppe, yeah, I wouldn't object to making the MachineAddresses stuff smarter | 14:34 |
dimitern | fwereade, network ids and tags https://codereview.appspot.com/86010044 when you can take a look | 14:34 |
rogpeppe | fwereade: FWIW MachineAddresses isn't a great name - it doesn't really say why Machine.MachineAddresses is different from Machine.Addresses... | 14:35 |
fwereade | rogpeppe, agreed, it's an awful name | 14:35 |
rogpeppe | fwereade: LocalAddresses? | 14:36 |
rogpeppe | fwereade: LocallySourcedAddresses? | 14:36 |
fwereade | rogpeppe, the semantic payload there is not ideal either | 14:36 |
rogpeppe | (i don't like either of those, BTW) | 14:36 |
fwereade | rogpeppe, the latter is probably best | 14:36 |
natefinch | anyone know why I'm getting this when I run juju? WARNING unknown config field "proxy-ssh" | 14:37 |
fwereade | (best of a bad bunch) | 14:37 |
rogpeppe | fwereade: AgentProvidedAddresses ? | 14:37 |
fwereade | natefinch, hmm, that seems odd -- it's something axw added for azure, but I'm not sure where the error comes from | 14:39 |
natefinch | fwereade: I don't have proxy-ssh in my environments.yaml anywhere | 14:39 |
natefinch | fwereade: I think blowing away my old environments.yaml and making a new one helped | 14:43 |
vladk | dimitern: I did lbox propose for gomaasapi, but no codereview on appspot was created, only on LP | 14:44 |
dimitern | vladk, did lbox give you any errors? | 14:44 |
fwereade | natefinch, would you take a few moments to dig into it sometime today please? many users will have old environments.yamls... | 14:44 |
vladk | dimitern: no, it just printed a link to LP: | 14:45 |
vladk | Proposal: https://code.launchpad.net/~klyachin/gomaasapi/101-testserver-extensions/+merge/214961 | 14:45 |
dimitern | vladk, you could try running it again | 14:45 |
natefinch | fwereade: yeah, once HA lands, I'll be able to actually work on other things | 14:46 |
natefinch | fwereade: which will be as soon as I can run a few tests | 14:46 |
* fwereade cheers at natefinch | 14:47 | |
natefinch | fwereade: this also seems to have fixed other problems I had been experiencing. Definitely worth investigating | 14:48 |
natefinch | (luckily I kept around the old environments.yaml) | 14:49 |
natefinch | I wish we had an environment variable we could set that would effectively add --debug to every juju command line call | 14:51 |
=== alexlist is now known as alexlist` | ||
vladk | dimitern: I specified -cr explicitly: https://codereview.appspot.com/86070043 | 15:03 |
dimitern | vladk, ah, you know - gomaasapi doesn't have .lbox.check in the root dir I think | 15:03 |
dimitern | vladk, in juju-core we have .lbox containing the default args for lbox: "propose -cr -for lp:juju-core" | 15:04 |
mgz | vladk: you can always rerun lbox propose as many times as you want, so that's fine | 15:04 |
mgz | I always do `lbox propose -cr -v` out of long standing habit | 15:04 |
jamespage | fwereade, are you guys aiming for a 1.18.1 release prior to the end of tomorrow? | 15:07 |
jamespage | (which co-incidentally is when Final Freeze kicks in) | 15:07 |
mgz | jamespage: we appear to have no fixed bugs in 1.18.1 | 15:08 |
mgz | so I'd guess not. | 15:08 |
natefinch | gah, I still can't deploy stuff locally | 15:09 |
jamespage | mgz, well we have 24 hrs until tomorrow eod :-) | 15:09 |
mgz | jamespage: yeah, but all the bugs look hard... ;_; | 15:10 |
fwereade | jamespage, it's not impossible: I'm working on the first; I will ask axw to look at the second overnight; just asked for cmars' comments re third; not sure about 4th, I'll ask an australian to take a look; 5th appears unreproducible | 15:15 |
jamespage | fwereade, OK - thanks | 15:15 |
natefinch | ah hah..... lxc-ls seems broken. I bet that's my problem | 15:16 |
jamespage | fwereade, it's not impossible to get a point release in after tomorrow | 15:16 |
fwereade | jamespage, sure, but I prefer to be a good citizen where practical | 15:16 |
rogpeppe | natefinch: i'm just proposing a branch to fix one of the machine agent panics. perhaps we could join up to move the HA stuff forward after that? | 15:18 |
dimitern | fwereade, updated https://codereview.appspot.com/85220044/ once more | 15:19 |
natefinch | rogpeppe: sure. I fixed the port problem with the initiate address and removed the testing panics you mentioned in the review. It works on amazon, but I seem to have an LXC problem on my local host, so I'm apt-getting and will reboot after to see if that fixes anything | 15:19 |
dimitern | fwereade, i really want to land that and https://codereview.appspot.com/86010044/ today if i can | 15:19 |
rogpeppe | this fixes a cmd/jujud test crash: https://codereview.appspot.com/86080043/ | 15:21 |
rogpeppe | fwereade, mgz, dimitern, natefinch: review appreciated | 15:21 |
rogpeppe | unfortunately it's not the one that people have been seeing on the 'bot and in CI | 15:22 |
mgz | ha, typed in the wrong id | 15:23 |
mgz | but I should actually review dimitern's branch, which is where I ended up :P | 15:23 |
natefinch | brb, gonna reboot now that I have upgraded, see if that fixes my lxc problems | 15:24 |
dimitern | rogpeppe, reviewed | 15:29 |
rogpeppe | dimitern: ta! | 15:29 |
rogpeppe | dimitern: in general i prefer to use a literal - if i use "nothing", then i have to check its value. i don't mind the slightly greater verbosity. | 15:30 |
rogpeppe | dimitern: at some point in the future i hope to see a "zero" builtin in Go that acts like nil except it represents the zero value for any type. | 15:31 |
dimitern | rogpeppe, I know you don't mind :) It's just my opinion | 15:32 |
dimitern | rogpeppe, yeah, that will be very handy | 15:32 |
rogpeppe | dimitern: FWIW, i think the code with naive literals reads slightly more easily - it's more directly obvious what the code is doing. | 15:33 |
fwereade | dimitern, https://codereview.appspot.com/86010044/ reviewed | 15:34 |
dimitern | fwereade, ta! | 15:38 |
fwereade | dimitern, you might not be so happy when you read it, we may need to discuss, I fear I have been unclear | 15:38 |
dimitern | rogpeppe, the first thing i'm doing when reading unfamiliar code and see a var/type/etc. i don't get, i immediately hit M-. in emacs, which invokes godef on the symbol and voila! | 15:39 |
rogpeppe | dimitern: i'm usually reading the code in codereview... | 15:39 |
rogpeppe | dimitern: but without the nothing declaration there's no need for any second look - it's immediately obvious on first scan | 15:40 |
rogpeppe | dimitern: which is why i prefer it more direct like that | 15:40 |
fwereade | dimitern, the other one LGTM with trivials | 15:42 |
dimitern | fwereade, thanks, still reading the last review :) | 15:42 |
dimitern | fwereade, i can't really impose restrictions on what juju deems a valid network id until it's provider specific, can I ? | 15:43 |
natefinch | rogpeppe: lucky 13? https://codereview.appspot.com/72500043/ addressed the things you mentioned in the last review, and it tests ok live on local and amazon | 15:44 |
fwereade | dimitern, bugger, ofc you're right | 15:44 |
fwereade | dimitern, TODO it with a short explanation of why we're so lax? or, hmm, ask the maas guys what their restrictions on net names are? | 15:44 |
fwereade | dimitern, and slavishly copy those? :) | 15:45 |
dimitern | fwereade, and re params.Network having both Tag and Id as you suggest - I can do that and for now make sure both match always | 15:45 |
dimitern | fwereade, (except for the tag prefix ofc) | 15:45 |
dimitern | fwereade, I'll just look at the maas source | 15:46 |
dimitern | fwereade, re tags/names/ids - we can have in state and in the api + params all three and make tags work with names and keep name=id for now | 15:47 |
vladk | dimitern: I got LGTM from rvba. Could you give me the next task? | 15:53 |
dimitern | vladk, sorry, I have a few comments for your review | 15:54 |
dimitern | vladk, will submit in a minute | 15:54 |
fwereade | dimitern, yeah -- params.Network is just saying here's the net with this juju name, and this is what the provider calls it | 15:54 |
fwereade | dimitern, sounds like we're aligned | 15:54 |
fwereade | dimitern, thanks | 15:54 |
dimitern | fwereade, yep, thanks, will propose a bit later, if you're still here will ping you again :) | 15:56 |
dimitern | vladk, reviewed | 15:58 |
mgz | fwereade: your comments on dimitern's proposal confuse me | 15:59 |
fwereade | mgz, ha :) | 15:59 |
fwereade | mgz, it is perfectly possible that I am missing something | 15:59 |
rogpeppe | natefinch: get it approved! | 15:59 |
fwereade | mgz, would you expand a little? | 16:00 |
mgz | it seemed like we were deriving the tags from the cloud provider stuff, hence the no restrictions bar != "", rather than from being named by the use | 16:00 |
mgz | *user | 16:00 |
mgz | tag=what juju calls the network, id=what the cloud calls the network, name=junked due to being ambiguous | 16:01 |
natefinch | rogpeppe: it's going. I *just* merged and fixed a conflict, so it should just work. fingers crossed. | 16:01 |
rogpeppe | natefinch: hangout? | 16:01 |
mgz | (and label=optional friendly id for network also from the cloud) | 16:01 |
mgz | I guess your review is saying we *should* be providing a way for the user to specify a tag, tied to a given id | 16:02 |
mgz | but without some juju cli network commands, I don't see how we add that | 16:03 |
mgz | fwereade: ^does that make sense of my confusion? | 16:06 |
fwereade | mgz, tag is purely API-level, it's not what juju calls things | 16:07 |
dimitern | mgz, for now network.ProviderId == network.Name in juju (both state and api) and tags are created from names | 16:07 |
mgz | fwereade: ugh, the name inside the tag then | 16:07 |
mgz | having tag be the magic decoration bit is annoying | 16:08 |
fwereade | mgz, we *will* have cli network commands, and I want what we have today to fit in with what we will need to do soon | 16:08 |
fwereade | mgz, for now, maas is the only thing that has networks, and there's a perfect mapping between provider id and user name | 16:09 |
mgz | fwereade: so, dimitern's version seems to do that, by autonaming the... names inside the tags, and placing no restrictions on them | 16:09 |
mgz | so they can become user-specified later | 16:09 |
fwereade | mgz, but it also conflates names and ids in several places -- and if we don't make the kind of data clear now we will have the devil of a time once we have a name that does not match an id | 16:11 |
mgz | fwereade: well the conflating I saw was from that autonaming business... maybe there's some other bits I missed? | 16:11 |
dimitern | mgz, I wasn't clear about not just renaming Name to Id, but having both and using name for juju stuff and id for provider stuff | 16:12 |
fwereade | mgz, it seemed to me that it was using Id instead of name across the board | 16:12 |
mgz | okay, so we just need to be picky as hell about the naming... which will still be confusing even if we are due to too many things, too few names for names... | 16:13 |
natefinch | rogpeppe: sorry, stepped out to get lunch. I can hangout now, yeah | 16:17 |
rogpeppe | natefinch: ok, cool | 16:17 |
rogpeppe | natefinch: https://plus.google.com/hangouts/_/canonical.com/juju-core?authuser=1 | 16:17 |
=== JoshStrobl__ is now known as JoshStrobl | ||
rogpeppe | natefinch: lp:~rogpeppe/juju-core/540-enable-HA | 16:19 |
vladk | :332 | 16:21 |
natefinch | \o/ The proposal to merge lp:~natefinch/juju-core/030-MA-HA into lp:juju-core has been updated. Status: Approved => Merged | 16:22 |
alexisb | natefinch, sweetness! | 16:24 |
natefinch | finally finally finally | 16:24 |
rick_h_ | natefinch: if you get time want to chat about that for a couple of min | 16:24 |
natefinch | rick_h_: how urgent is it? My day is pretty slammed | 16:25 |
rick_h_ | natefinch: not at all, completely when you've got time | 16:25 |
rick_h_ | and the time can even be 'let's catch up in vegas' | 16:25 |
natefinch | ha, ok | 16:25 |
rick_h_ | just a heads up gui wants to catch up on HA to see what we can/should do from our end so we can have a plan | 16:25 |
natefinch | rick_h_: good enough... I'll shoot you an email about it. We're not quite done with it, but this was a huge chunk that had taken way too long to get in | 16:28 |
rick_h_ | natefinch: rgr, thanks | 16:28 |
jam1 | hey guys, something in the test suite is now creating a directory and turning it into 666 | 16:39 |
jam1 | which means it is not executable or writeable | 16:39 |
jam1 | which means the test suite is failing to clean it up | 16:39 |
jam1 | a lot of: /tmp/jctest.LpP/gocheck-5577006791947779410/27/some-file | 16:39 |
jam1 | files | 16:39 |
jam1 | fwereade: is that one of your FT tests ? | 16:40 |
natefinch | jam1: there's a lot of tests that use 0666 | 16:40 |
jam1 | natefinch: sure ,but you don't normally change a *DIR* to 0666 | 16:41 |
fwereade | jam, hmm, I didn't *think* I did that, but I can't swear to it | 16:41 |
jam1 | they need 7 to be able to read the content | 16:41 |
natefinch | jam1: oh, directory, right | 16:41 |
natefinch | jam1: Tcharm/repo_test.go has a couple of those | 16:42 |
natefinch | s/Tcharm/charm. | 16:42 |
fwereade | jam1, hmm, I do have an 0644 in there, drivebying it now | 16:43 |
jam1 | fwereade: sorry, it is 444 read only | 16:43 |
jam1 | 6 would be rw | 16:43 |
fwereade | jam, none of them I think | 16:43 |
fwereade | ah-ha! yes I do | 16:44 |
fwereade | bugger | 16:44 |
jam1 | well, it is all test number 27 ... ): | 16:44 |
jam1 | :) | 16:44 |
jam1 | not that *that* part helps | 16:44 |
jam1 | fwereade: TestRemovedCreateFailure | 16:45 |
jam1 | TestDirCreateFailure | 16:45 |
jam1 | fwereade: so I think just adding a Chmod(777) so we can clean up afterwards would be nice | 16:45 |
fwereade | jam1, yeah, deferred chmods back to 0777 | 16:46 |
jam1 | I don't think it is causing the test suite to *fail*, but it is preventing rm-rf from cleaning up after itself | 16:46 |
fwereade | jam1, yep | 16:46 |
fwereade | jam1, sorry about that | 16:46 |
jam1 | fwereade: np, I only noticed because the test suite is failing for other reasons | 16:46 |
jam1 | and that shows up in the log | 16:46 |
jam1 | dimiter's last patch just failed, some of which looks transient, and some which looks like an error message changed. | 16:47 |
jam1 | but at the *end* of that, it says "I couldn't clean up" | 16:47 |
jam1 | but hey, root can do anything it wants... | 16:48 |
jam1 | :) | 16:48 |
natefinch | trivial code review anyone? https://codereview.appspot.com/85970044 | 16:53 |
natefinch | jam1: btw, did you see EnsureMongoServer finally landed? | 16:54 |
jam1 | natefinch: I didn't. YAY \o/ | 16:54 |
natefinch | right? :) Super psyched | 16:54 |
natefinch | rogpeppe: https://codereview.appspot.com/85970044 | 17:01 |
jam1 | natefinch: lgtm | 17:04 |
fwereade | natefinch, woooot! | 17:16 |
natefinch | fwereade: thanks :) | 17:19 |
* natefinch has to see a man about some bees. | 17:20 | |
=== natefinch is now known as natefinch-afk | ||
* rogpeppe is done for the day | 17:24 | |
rogpeppe | might make it back in later for a little bit | 17:24 |
rogpeppe | g'night all | 17:24 |
* fwereade needs to go out for a while, would appreciate looks at https://codereview.appspot.com/85670046 | 17:37 | |
jam1 | natefinch-afk: you forgot to set a commit message when you proposed your merge, I'll do it for you | 17:44 |
=== vladk is now known as vladk|offline | ||
cmars | proposal for LP: #1303880 up, PTAL https://codereview.appspot.com/86130043 | 18:06 |
_mup_ | Bug #1303880: Juju 1.18.0, can not deploy local charms without series <charms> <regression> <series> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1303880> | 18:06 |
cmars | jam1, can you take a look at ^^ | 18:11 |
sinzui | cmars, I need to update the bug and note that setting the default-series in the env is also a solution if you are opposed to typing the series when you deploy a charm | 18:19 |
sinzui | cmars, I am taking the regression tag off. Now that we know the affected users are the edge cases we talked about. I think the solution is to show the right error message | 18:20 |
cmars | sinzui, that's a much easier fix :) please note the desired error message in the bug | 18:23 |
sinzui | I see you included lucid, but juju and charms don't run on it | 18:24 |
sinzui | cmars, I will think of a message right now | 18:24 |
sinzui | cmars, https://bugs.launchpad.net/juju-core/+bug/1303880/comments/6 | 18:41 |
_mup_ | Bug #1303880: Juju 1.18.0, can not deploy local charms without series <charms> <series> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1303880> | 18:41 |
sinzui | though I see I meant to write "release notes" in that comment | 18:42 |
cmars | sinzui, updated my proposal. can someone take a look, it's much smaller now :) https://codereview.appspot.com/86160043 | 19:44 |
sinzui | cmars, I am not a reviewer but that size is nice. | 19:45 |
cmars | :) | 19:45 |
sinzui | where did the US go? | 19:45 |
sinzui | perrito666, can you review cmars's branch? | 19:48 |
=== mwhudson- is now known as mwhudson | ||
rogpeppe1 | anyone up for doing a review? https://codereview.appspot.com/86200043/ | 21:15 |
bac | sinzui: have you tried using staging-tools or their kin lately? | 21:22 |
sinzui | bac, I don't even know what they are | 21:23 |
bac | sinzui: you created the branch :) | 21:24 |
bac | https://code.launchpad.net/~ce-orange-squad/charmworld/staging-tools | 21:24 |
sinzui | bac :) I have forgotten much | 21:25 |
sinzui | oh | 21:25 |
sinzui | bac. | 21:25 |
sinzui | you probably care about the rt I reported today | 21:25 |
bac | yes, maybe. does it involve access to canonistack post hb? | 21:26 |
sinzui | bac: I used those tools several times a week. orangesquad and juju-qa cannot use swift. Juju is unusable | 21:26 |
sinzui | bac: nova is fine | 21:26 |
bac | sinzui: i'm seeing canonical-sshuttle dying, not being able to connect to canonistack | 21:27 |
bac | it all worked the last time i tried | 21:27 |
* sinzui tries | 21:27 | |
sinzui | bac, I am connected, but what did I connect to because it looks empty | 21:28 |
sinzui | bac: I think my jenv is bad. I am told the env is not bootstrapped | 21:29 |
bac | sinzui: you think you're on staging? | 21:30 |
thumper | davecheney: bugger... it seems like godeps doesn't update the hg branches | 21:56 |
thumper | ah, no it does, it just doesn't say that it does | 22:01 |
thumper | trivial review to just update the go.net library: https://codereview.appspot.com/86250043 | 22:10 |
cory_fu | Has juju ssh to a machine number (juju ssh 0) been fixed yet for LXC? | 22:12 |
thumper | cory_fu: what do you mean? | 22:15 |
thumper | cory_fu: for the local provider? | 22:15 |
thumper | cory_fu: yes it works, except for machine 0 as that is the host | 22:15 |
thumper | unless your host actually has sshd running | 22:15 |
thumper | cmars: how goes https://bugs.launchpad.net/juju-core/+bug/1303880 | 22:17 |
_mup_ | Bug #1303880: Juju 1.18.0, can not deploy local charms without series <charms> <series> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1303880> | 22:17 |
cmars | thumper, i have a proposal, PTAL, https://codereview.appspot.com/86160043/ | 22:18 |
thumper | ack | 22:18 |
dannf | wallyworld_: just to clarify - the branch you linked fixed that error, but wasn't the root cause, correct? just wondering if there's a fix i can/should verify | 22:19 |
thumper | dannf: perhaps more context would help :-) | 22:24 |
dannf | thumper: LP: #1302205 | 22:26 |
_mup_ | Bug #1302205: manual provisioned systems stuck in pending on arm64 <add-machine> <hs-arm64> <manual-provider> <juju-core:In Progress by wallyworld> <https://launchpad.net/bugs/1302205> | 22:27 |
thumper | dannf: it was my understanding that the fixes that wallyworld had were to fix the root cause | 22:27 |
thumper | dannf: are you still having issues? | 22:27 |
thumper | I know that wallyworld was trying to test yesterday | 22:27 |
thumper | but not sure on the final progress | 22:27 |
wallyworld | dannf: hi, i just logged on again after networking issues so can't see that backscroll, can i help? | 22:27 |
dannf | wallyworld: yeah - just curious if the branches you linked were root cause - i.e., if it is worth me retesting w/ them | 22:28 |
wallyworld | dannf: i committed fixes, but had trouble testing because i have to run from trunk to test and so can't use simplestreams to get the tools and building juju from source on the arm vms just hung | 22:29 |
wallyworld | so i couldn't get tools built to test with | 22:29 |
dannf | wallyworld: did you try building on the nova host? i've built there many times w/o a problem | 22:30 |
wallyworld | i installed gcc-go and couldn't get outgoing access to launchpad or github so just copied my source tarball across | 22:30 |
wallyworld | yeah, i think i built on the nova host | 22:30 |
wallyworld | i can try again | 22:31 |
wallyworld | actually | 22:31 |
wallyworld | i could get outgoing access via wget | 22:31 |
wallyworld | but go get just hung | 22:31 |
wallyworld | so i couldn't get the juju source in the normal way | 22:31 |
wallyworld | via vcs | 22:31 |
dannf | though building in the vms *should* work - if not, probably a bug | 22:31 |
rogpeppe1 | a fairly trivial review if anyone wants to take a look: https://codereview.appspot.com/85600044 | 22:32 |
dannf | wallyworld: we can ask is to open access for us to certain things. surprised lp access was blocked | 22:32 |
wallyworld | dannf: i could wget to launchpad but "go get launchpad.net/juju-core" failed | 22:33 |
wallyworld | or hung | 22:33 |
dannf | ah - go get... never used that before | 22:33 |
cmars | thumper, thanks | 22:33 |
wallyworld | so i just copied across the source | 22:33 |
dannf | i'll investigate that and at least get a bug filed if needed | 22:33 |
wallyworld | dannf: go get uses bzr behind the scenes | 22:33 |
wallyworld | dannf: i have a meeting now but will ping back when done | 22:33 |
dannf | ack | 22:34 |
cmars | sinzui, fix for LP: #1303880 is landing in trunk. do you need it proposed to any branches? | 22:35 |
_mup_ | Bug #1303880: Juju 1.18.0, can not deploy local charms without series <charms> <series> <juju-core:In Progress by cmars> <https://launchpad.net/bugs/1303880> | 22:35 |
sinzui | cmars, yes please lp:juju-core/1.18 | 22:36 |
dannf | wallyworld: seems to be working for me. but slow. and no output. i just see the directory growing in size | 22:37 |
thumper | wallyworld: trivial review for you, although since it is needed on two branches, lbox isn't good at handling it | 22:40 |
thumper | https://code.launchpad.net/~thumper/juju-core/update-websocket-lib/+merge/215057 | 22:41 |
thumper | and https://code.launchpad.net/~thumper/juju-core/update-websocket-lib/+merge/215046 | 22:41 |
wallyworld | thumper: otp, will look soon | 22:43 |
thumper | ack | 22:43 |
thumper | cmars: I have approved the other branch | 22:55 |
thumper | cmars: although I did realise that there aren't any tests for the new error message | 22:55 |
thumper | cmars: is it hard to add one? | 22:55 |
thumper | cmars: also, lbox doesn't like submitting the same branch to multiple targets | 22:56 |
thumper | cmars: it is a bit too dumb | 22:56 |
cmars | thumper, i'll propose a test case for deploying local without series. might be after dinner | 23:13 |
thumper | cmars: ack | 23:13 |
dannf | wallyworld: and go get seems to have completed (/home/ubuntu/dannf/go) | 23:18 |
wallyworld | dannf: great :-) still in meeting, will check back soon | 23:19 |
dannf | wallyworld: np; i need to start up the grill, so responses will be latent | 23:19 |
davecheney | arosales: hazmat ping | 23:43 |
davecheney | arosales: hazmat do you have time for a quick G+ to talk about the demo | 23:45 |
arosales | davecheney: hello | 23:48 |
davecheney | arosales: hazmat lets take this to #eco | 23:48 |
arosales | ok | 23:49 |