nick | message | time |
---|---|---|
davecheney | thumper: nagging nag, did you talk to smoser about power64 ? | 00:00 |
ericsnow | wallyworld_: it's left over from the original backup script | 00:00 |
ericsnow | wallyworld_: I'm in the process of sorting out all those hard-coded paths | 00:00 |
wallyworld_ | ok | 00:01 |
wallyworld_ | we must have been lucky this hasn't failed then | 00:01 |
wallyworld_ | in an HA environment | 00:01 |
katco | wallyworld_: for your perusal: http://reviews.vapour.ws/r/519/ | 00:06 |
wallyworld_ | rightio | 00:06 |
ericsnow | wallyworld_: right | 00:07 |
thumper | davecheney: yes, been escelated | 00:11 |
thumper | for some spelling of that | 00:11 |
axw | wallyworld_: seen the Azure bug? | 00:16 |
wallyworld_ | axw: yeah, otp, but i was going to ask you to pick it up. i trusted kapil to have tested it but obviously not :-( | 00:17 |
axw | will do | 00:17 |
wallyworld_ | ty | 00:17 |
thumper | davecheney: I noticed that the juju/utils package tests panic with gccgo | 00:24 |
thumper | davecheney: but no idea why | 00:24 |
thumper | I couldn't grok the panic | 00:24 |
thumper | subpackages are fine, just the main one | 00:25 |
thumper | wallyworld_: do you know which juju projects have the autolander? | 00:28 |
menn0 | ericsnow: it's pretty awful that everything is hardcoded to be able machine-0 but if that's going to get resolved soon then I guess that's ok | 00:31 |
davecheney | thumper: thanks | 00:31 |
menn0 | s/able/about/ | 00:31 |
davecheney | thumper: paste.ubuntu ? | 00:31 |
* thumper | looks for it again | 00:32 |
ericsnow | menn0: agreed | 00:32 |
thumper | davecheney: http://paste.ubuntu.com/9351357/ | 00:33 |
thumper | ericsnow: do you know which juju projects are auto-landed? | 00:34 |
ericsnow | thumper: you talking about the bot or for reviewboard? | 00:35 |
thumper | ericsnow: bot | 00:35 |
thumper | ericsnow: although knowing about reviewboard would also help | 00:36 |
ericsnow | thumper: RB is currently just core and utils | 00:36 |
thumper | ericsnow: it seemed that juju/cmd didn't have the rb hookup | 00:36 |
davecheney | thumper: sorry, my internets shat themselves | 00:36 |
thumper | ericsnow: ah... | 00:36 |
davecheney | you were saying ? | 00:36 |
thumper | davecheney: http://paste.ubuntu.com/9351357/ | 00:36 |
davecheney | ta | 00:37 |
thumper | I think I'll just use the github magic merge button for juju/utils | 00:37 |
davecheney | thumper: first thought is you have the wrong gccgo | 00:37 |
thumper | davecheney: I've not changed it... | 00:37 |
davecheney | pretty much everyone has the wrong gccgo | 00:37 |
ericsnow | thumper: for the bot I think it's just core | 00:37 |
davecheney | thumper: do this | 00:37 |
davecheney | go test -c $PKG | 00:37 |
davecheney | gdb --args ./$PKG.test | 00:37 |
davecheney | r | 00:37 |
thumper | davecheney: can I use '.' for $PKG? | 00:38 |
davecheney | yes | 00:38 |
davecheney | so if you're in juju/cmd | 00:38 |
davecheney | the bin will be called | 00:38 |
davecheney | ./cmd.test | 00:38 |
thumper | davecheney: but also need to specify the compiler right? | 00:38 |
davecheney | yes | 00:38 |
ericsnow | wallyworld_, menn0: fix landed for 1398448 | 00:38 |
davecheney | -c basically says "don't throw away the test binary afterwards, and give it a predictable name" | 00:39 |
menn0 | ericsnow: awesome | 00:39 |
thumper | davecheney: all passed with gdb | 00:39 |
menn0 | ericsnow: it'll take a while to filter down to the relevant CI jobs | 00:39 |
ericsnow | menn0: for that "unable to get DB names" issue, looks like the mgo session dropped or something (hence EOF) | 00:40 |
davecheney | thumper: so here is a fun thing | 00:40 |
davecheney | gccgo development has moved on since 4.9.2 | 00:40 |
ericsnow | menn0: I'm working on making it at least a little more robust | 00:41 |
davecheney | what are out changes of getting gccgo 5.0 backported to trusty ? | 00:41 |
davecheney | s/out changes/our chances/ | 00:41 |
thumper | nfi, but we can ask | 00:41 |
davecheney | basically we're going to have to stick with the trunk of gccgo if we want to have any support from upstream | 00:42 |
menn0 | ericsnow: sounds good | 00:43 |
davecheney | thumper: this is on a branch ? | 00:44 |
davecheney | i'll try to repro | 00:44 |
thumper | davecheney: master | 00:44 |
davecheney | ok, that makes it easy | 00:44 |
davecheney | github.com/juju/cmd ? | 00:44 |
thumper | davecheney: no, utils | 00:46 |
* davecheney | smacks forehead | 00:47 |
davecheney | http://paste.ubuntu.com/9351435/ | 00:48 |
davecheney | thumper: ummm | 00:48 |
davecheney | what happened here | 00:48 |
davecheney | oh, wait | 00:48 |
davecheney | sorry, local problem | 00:48 |
wallyworld_ | ericsnow: ty, sorry just got out of meeting | 00:51 |
ericsnow | wallyworld_: no worries | 00:51 |
wallyworld_ | thumper: no, many are supposed to, you mean on github or lp? | 00:52 |
wallyworld_ | i think the tarmac bot has died | 00:52 |
thumper | wallyworld_: I just don't know which ones | 00:52 |
thumper | wallyworld_: my general approach is to try $$merge$$ and if nothing happens for a few minutes, do it manually | 00:52 |
wallyworld_ | we need to follow up there, it's fallen into a hole | 00:53 |
davecheney | thumper: ok, repro pretty easy | 00:53 |
thumper | wallyworld_: github lander not lp | 00:53 |
davecheney | % env GOMAXPROCS=42 ./utils.test | 00:54 |
davecheney | Segmentation fault (core dumped) | 00:54 |
davecheney | wheeee | 00:54 |
thumper | davecheney: is the fix as obvious? | 00:55 |
davecheney | now I should be able to get it to shit itself under gdb | 00:56 |
davecheney | urgh, looks like a gc bug | 00:57 |
davecheney | or a crash in libunwind | 00:57 |
davecheney | http://paste.ubuntu.com/9351492/ | 00:59 |
thumper | interesting | 01:00 |
davecheney | thumper: what release are you running ? | 01:01 |
thumper | trusty | 01:01 |
davecheney | same | 01:01 |
* thumper | afk for a bit | 01:02 |
mwhudson | davecheney: oh, i've seen that one | 01:07 |
mwhudson | davecheney: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64001 | 01:08 |
davecheney | mwhudson: do you want me to work on a repro ? | 01:10 |
davecheney | it's related to the http -> tls -> asn code path | 01:10 |
mwhudson | davecheney: a small-ish repro would be great, yes | 01:10 |
mwhudson | davecheney: yeah, but the crash seems dementedly unrelated to that | 01:10 |
davecheney | you're right about the stack split | 01:10 |
davecheney | that paste above is pretty clear where it's happening | 01:11 |
mwhudson | yeah, but not why | 01:11 |
davecheney | something is making libunwind shit itself | 01:11 |
mwhudson | well no | 01:11 |
davecheney | ? | 01:11 |
mwhudson | at least not in my case | 01:11 |
mwhudson | it's in the __morestack splitting code | 01:12 |
mwhudson | and $rsp is bogus | 01:12 |
mwhudson | sometimes unaligned, sometimes unmapped | 01:12 |
mwhudson | i guess if you're really lucky it's valid but pointing at some random part of the heap so you get random corruption | 01:12 |
davecheney | http://paste.ubuntu.com/9351539/ | 01:12 |
davecheney | mwhudson can you do this | 01:13 |
mwhudson | i saw at least three distinct failure modes fwiw | 01:13 |
davecheney | env GOGC=1 gdb --args go get github.com/lxc/lxd | 01:13 |
mwhudson | that was one of them | 01:13 |
davecheney | yeah, i've seen some others | 01:13 |
davecheney | i think the https one is the easiest to make a repro | 01:13 |
mwhudson | it worked once... | 01:15 |
mwhudson | but then | 01:15 |
mwhudson | Program received signal SIGBUS, Bus error. | 01:15 |
mwhudson | __morestack () at ../../../src/libgcc/config/i386/morestack.S:529 | 01:15 |
davecheney | oh | 01:15 |
davecheney | interesting | 01:15 |
mwhudson | (gdb) p $rsp | 01:15 |
mwhudson | $1 = (void *) 0xffffedf45360 | 01:15 |
davecheney | blergh | 01:16 |
davecheney | mwhudson: ok, i'll try to make a standalone repro that blows up | 01:16 |
davecheney | hopefully ian can figure out what is failing | 01:16 |
davecheney | 'cos it's way above my ken | 01:16 |
mwhudson | yeah | 01:17 |
mwhudson | i spent an hour or so poking at it when i was in austin and got ~nowhere | 01:17 |
wallyworld_ | katco: standup? | 01:19 |
davecheney | mwhudson: do you know if 4.9.2 is available in a trusty-backports ? | 01:19 |
katco | wallyworld_: shoot sorry brt | 01:19 |
mwhudson | davecheney: i do not know | 01:19 |
davecheney | poop | 01:19 |
davecheney | mwhudson: http://paste.ubuntu.com/9351595/ | 01:27 |
davecheney | worlds smallest repro | 01:27 |
davecheney | mwhudson: will you be my mule and put this stuff on the gcc bugzilla ? | 01:27 |
thumper | menn0: updated https://github.com/juju/cmd/pull/10/files | 01:34 |
thumper | davecheney: that is pretty small | 01:34 |
* thumper | rushes to get the cmd branch in before anastasiamac | 01:36 |
mwhudson | davecheney: ! | 01:42 |
axw | wallyworld_: sorry I was wrong, D1 fails for me too. I put in a rubbish name and saw "D1", but that's hard-coded in D1 | 01:42 |
axw | err hard coded in Juju | 01:43 |
mwhudson | davecheney: sure, will en-bugzilla | 01:44 |
wallyworld_ | axw: no worries, glad it's reproducible | 01:47 |
menn0 | thumper: looking | 01:50 |
davecheney | mwhudson: so that one liner can blow up simply | 01:50 |
davecheney | i'm trying to dissect it down to the bit that confuses the gc | 01:50 |
davecheney | oh | 01:50 |
davecheney | i know what it could be | 01:51 |
davecheney | asn1 probably looks like a random field of pointers to the gc | 01:51 |
mwhudson | oh | 01:54 |
mwhudson | it fails with GOGC=off though? | 01:54 |
davecheney | should do | 01:55 |
davecheney | repro is super fiddly | 01:55 |
davecheney | oh, interestingly | 01:56 |
davecheney | that explodes as well | 01:56 |
davecheney | which means you were right about the stack split | 01:56 |
anastasiamac | thumper: u r making me cry but it's k - whoever has the last laugh and all.... :-P | 02:00 |
menn0 | thumper: done. there's a few doc issues but otherwise LGTM. | 02:00 |
menn0 | thumper: note that some of the comments are directly against the last commit which Github makes a little tricky to get to. | 02:01 |
thumper | menn0: ta | 02:06 |
thumper | ah... didn't fix the docstrings on those alias methods | 02:07 |
thumper | bugger | 02:07 |
davecheney | anastasiamac: i like your attitude to code review | 02:09 |
davecheney | don't get mad, get even | 02:09 |
anastasiamac | davecheney: i get mad at drivers not coders ;-) | 02:10 |
anastasiamac | davecheney: plus thumper is scary | 02:10 |
anastasiamac | davecheney: wouldn't want to get mad at him... | 02:10 |
thumper | pfft | 02:11 |
davecheney | yeah, he's fluffy | 02:11 |
davecheney | like a kitten | 02:11 |
thumper | fluffy now with the facial fuzz | 02:11 |
axw | wallyworld_: https://code.launchpad.net/~axwalk/gwacl/rolesizes-fix-dg-names/+merge/243478 please | 02:16 |
wallyworld_ | looking | 02:16 |
wallyworld_ | axw: where are the Aliases used? | 02:20 |
axw | wallyworld_: nowhere yet, they'll be used in Juju | 02:20 |
axw | wallyworld_: I'll extend environs/instances/InstanceType to have an "Aliases []string" field | 02:20 |
wallyworld_ | axw: ok, do we have the ExtraLarge etc ones the right way around? | 02:21 |
axw | wallyworld_: yes | 02:21 |
wallyworld_ | rightio | 02:21 |
axw | wallyworld_: from Azure: | 02:21 |
axw | 2014-12-02 11:37:29 ERROR juju.cmd supercommand.go:323 failed to bootstrap environment: cannot start bootstrap instance: POST request failed: BadRequest - Value 'D1' specified for parameter 'RoleSize' is invalid. Allowed values are 'ExtraSmall,Small,Medium,Large,ExtraLarge,A5,A6,A7,A8,A9,Basic_A0,Basic_A1,Basic_A2,Basic_A3,Basic_A4,Standard_D1,Standard_D2,Standard_D3,Standard_D4,Standard_D11,Standard_D12,Standard_D13,Standard_D14'. (http code 400: Bad Request) | 02:21 |
wallyworld_ | ok, and the same for the G ones I guess | 02:21 |
wallyworld_ | those names look like a dog's breakfast | 02:22 |
thumper | waigani_: very complex review ... not! http://reviews.vapour.ws/r/575/ | 02:23 |
thumper | wallyworld_: is the landing bot unblocked? | 02:24 |
wallyworld_ | should be | 02:24 |
wallyworld_ | for master? | 02:24 |
wallyworld_ | i landed a fix last night | 02:24 |
thumper | jw4: FAIL: action_test.go:144: actionSuite.TestFindActionTagsByPrefix | 02:25 |
thumper | jw4: intermittent? | 02:25 |
mwhudson | davecheney: __morestack is called a _lot_ | 02:29 |
mwhudson | it turns out | 02:29 |
thumper | WTH!!! | 02:33 |
thumper | that action test fails for me all the time... | 02:34 |
axw | wallyworld_: I always forget. the bot does gwacl? | 02:36 |
wallyworld_ | axw: i merged by hand, i think tarmac is dead | 02:36 |
axw | ok | 02:36 |
* thumper | throws his hands up and leaves the office | 02:38 |
thumper | how the fuck did this test land? | 02:40 |
thumper | menn0: state/action.go:278 | 02:41 |
thumper | menn0: and apiserver/action/action_test.go:144 | 02:42 |
davecheney | mwhudson: yeah, which is odd | 02:42 |
davecheney | 'cos there is no escape analysis in gccgo | 02:42 |
davecheney | so there should be _less_ stack pressure | 02:42 |
* thumper | heads to the 'uper duper market | 02:43 |
mwhudson | davecheney: so i was wrong in my initial bug comment, __generic_morestack is returning junk | 02:44 |
mwhudson | davecheney: also, it really doesn't fail with my random gccgo tip build | 02:44 |
axw | wallyworld_: D1 fails to provision still, but it's a different error now (Compute.OverconstrainedAllocationRequest) | 02:56 |
axw | this may take a little while... | 02:57 |
wallyworld_ | balls | 02:57 |
wallyworld_ | sure that's not just a transient azure snafu? | 02:58 |
axw | wallyworld_: I get the same error on West US and Southeast Asia, for both D1 and D2 multiple times | 02:58 |
axw | trying through the management console now | 02:58 |
wallyworld_ | hmmm | 02:59 |
thumper | menn0: ping | 03:35 |
thumper | jw4: ping | 03:40 |
thumper | wallyworld_: can I get you to test something for me please? | 03:41 |
wallyworld_ | sure | 03:41 |
thumper | wallyworld_: on master, run the tests in apiserver/actions plz? | 03:41 |
thumper | I get a fail that I can't see how it landed | 03:41 |
thumper | it is me | 03:43 |
thumper | grr | 03:43 |
thumper | fuckity fuck fuck | 03:43 |
thumper | I'm bringing in jw4's utils change... | 03:43 |
wallyworld_ | ok, just gotta shelve, sec | 03:44 |
wallyworld_ | OK: 8 passed | 03:44 |
wallyworld_ | PASS | 03:44 |
wallyworld_ | ok github.com/juju/juju/apiserver/action 5.622s | 03:44 |
wallyworld_ | thumper: ^^^^ | 03:44 |
thumper | ta | 03:45 |
wallyworld_ | let me check that my master is up to date, i think it is | 03:45 |
thumper | it is | 03:45 |
thumper | I know what it is | 03:45 |
wallyworld_ | kk | 03:45 |
menn0 | thumper: sorry, I was out giving programming lessons | 03:55 |
thumper | menn0: to? | 03:55 |
menn0 | thumper: still need me to do something? | 03:56 |
menn0 | thumper: a friend's son (he's 11 but very keen) | 03:56 |
* thumper | nods | 03:56 |
menn0 | doing an hour a week with him | 03:56 |
thumper | menn0: got a minute to hangout? | 03:56 |
menn0 | thumper: yep | 03:56 |
thumper | menn0: standup hangout | 03:56 |
menn0 | thumper: there now | 03:57 |
anastasiamac | 4421 lines in one file :-( | 04:25 |
anastasiamac | painful... | 04:26 |
wallyworld_ | jam1: ping | 05:33 |
bradm | davecheney: hey | 05:41 |
davecheney | bradm: hey | 05:42 |
bradm | davecheney: this might be easier than going back and forth via RT :) | 05:42 |
davecheney | bradm: yeah | 05:43 |
bradm | davecheney: so I checked the squid config on batuan, you did not have github on it. | 05:43 |
bradm | davecheney: but you do now. :) | 05:43 |
davecheney | excellent | 05:43 |
davecheney | thanks | 05:43 |
davecheney | i'll try now | 05:43 |
bradm | davecheney: give it a go, all those things you listed should be working. | 05:43 |
davecheney | bradm: what do I do about getting mercurial on rugby ? | 05:43 |
bradm | davecheney: is the packaged version ok? | 05:43 |
davecheney | for <reasons> juju uses all three of git, bzr and hg | 05:44 |
davecheney | yup | 05:44 |
bradm | davecheney: do you need it in a chroot? or just in the base OS? | 05:44 |
davecheney | base os | 05:44 |
davecheney | don't care about that chroot stuff | 05:44 |
davecheney | in this case i'm just a luser | 05:44 |
bradm | you put your hg in your bzr in your git? then you should cvs it, and then rcs that. | 05:44 |
bradm | you'd never lose anything then. | 05:44 |
davecheney | thanks for the tip | 05:45 |
davecheney | did i mention we also use more than one code review system | 05:45 |
* davecheney | stabs self in face | 05:45 |
bradm | davecheney: mercurial package is installed. | 05:46 |
davecheney | danka | 05:47 |
bradm | davecheney: how's that look? need anything else to get you going? | 05:48 |
davecheney | squid.internal gives some public ip | 05:50 |
davecheney | can I check the proxy settings with you ? | 05:50 |
bradm | sure | 05:50 |
bradm | what are you using? | 05:51 |
davecheney | http_proxy=http://squid.internal:8123 | 05:51 |
davecheney | https_proxy=$http_proxy | 05:51 |
davecheney | export http_proxy https_proxy | 05:51 |
bradm | try http_proxy=http://batuan.canonical.com:3128 | 05:52 |
bradm | and the https_proxy bit too | 05:52 |
bradm | squid.internal is mostly used by UK hosts, I only mentioned it because I wasn't sure what you had | 05:53 |
davecheney | bradm: thanks | 05:53 |
davecheney | all working now | 05:53 |
bradm | davecheney: perfect! let us know if you have any further issues | 05:53 |
davecheney | bradm: dfc@rugby:~$ gcc | 05:54 |
davecheney | The program 'gcc' is currently not installed. To run 'gcc' please ask your administrator to install the package 'gcc' | 05:54 |
davecheney | sorry, i didn't even think to check this | 05:55 |
davecheney | could I get build-essential and gdb pls | 05:55 |
bradm | davecheney: running | 05:55 |
bradm | davecheney: done. | 05:57 |
davecheney | thanks | 05:57 |
* axw | pulls hair out | 06:05 |
axw | wallyworld_: now I'm finding that some locations don't have some role sizes. gotta filter that out too... | 06:05 |
axw | le sigh | 06:05 |
wallyworld_ | ffs | 06:10 |
wallyworld_ | fwereade_: if you did have time, here's the work to generate the server cert on each state server. i've removed the cert from state entirely. http://reviews.vapour.ws/r/552/ | 06:48 |
axw | wallyworld_: https://code.launchpad.net/~axwalk/gwacl/listlocations/+merge/243495 - another small one, please | 07:39 |
wallyworld_ | looking | 07:41 |
TheMue | morning | 08:27 |
TheMue | jam1: a few seconds, missed to install the plugin | 09:00 |
TheMue | jam1: dimitern: so, in the hangout | 09:01 |
dimitern | TheMue, what hangout? | 09:03 |
TheMue | dimitern: I thought you would be part of the 1:1:1 ;) | 09:04 |
dimitern | TheMue, ah, I though yours was yesterday | 09:04 |
dimitern | thought | 09:05 |
TheMue | dimitern: jam1 moved it due to his holiday | 09:05 |
dimitern | TheMue, I don't have the link - can you add me to the guests? | 09:05 |
TheMue | dimitern: yep | 09:05 |
TheMue | dimitern: just invited you | 09:06 |
dimitern | TheMue, yeah, thanks - hmm.. it came to my phone though | 09:07 |
TheMue | dimitern: *lol* that's the magic of the google cloud | 09:07 |
voidspace | dimitern: when you get a chance old boy | 09:14 |
voidspace | dimitern: http://reviews.vapour.ws/r/564/ | 09:14 |
dimitern | voidspace, sure, will have a look shortly | 09:17 |
voidspace | dimitern: "PickNewAddress" (or whatever we call it). Subnet method or State method? | 09:33 |
voidspace | dimitern: I think Subnet method. State has a big enough API already. | 09:34 |
voidspace | coffee - brb | 09:34 |
jam1 | dimitern: TheMue: standup ? | 10:02 |
dimitern | jam1, omw | 10:02 |
TheMue | jam1: coming, just finished 1:1 | 10:02 |
rogpeppe | voidspace: ping | 10:09 |
perrito666 | good morning | 10:12 |
wallyworld_ | jam1: after your standup, can you ping me back? i need to ask about bug 1397376 | 10:13 |
mup | Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1397376> | 10:13 |
jam1 | wallyworld_: we'll want to make sure dimitern is in that conversation, as there seems to be some strong disagreement about whether things should be talking DNS names or IP addresses. | 10:14 |
wallyworld_ | sure | 10:14 |
wallyworld_ | customers want ip addresses i think | 10:14 |
wallyworld_ | and there are claims only returning one address is a regression | 10:15 |
wallyworld_ | since it used to return multiple | 10:15 |
jam1 | wallyworld_: IIRC from the maas discussion, the IP address can change while the DNS name would stay consistent | 10:15 |
wallyworld_ | and the api is called api-endpoints after all | 10:15 |
jam1 | so the MaaS guys asked us to talk in terms of DNS names | 10:15 |
wallyworld_ | oh, i see | 10:15 |
axw | rogpeppe: hey. would there be any reason for charm.Meta to have bson tags, if we duplicate the structure in state and put bson tags there? | 10:16 |
jam1 | wallyworld_: api-endpoints did return only 1, then it started returning 2, then we reverted back to 1 | 10:16 |
perrito666 | davecheney: did you mean http://golang.org/pkg/os/#Create ? in your email about chmod? | 10:16 |
dimitern | wallyworld_, jam1, I can bring this up at today's maas cross team call | 10:16 |
rogpeppe | axw: yeah, i think we're storing it directly in mongo in the charm store | 10:17 |
axw | ok | 10:17 |
wallyworld_ | dimitern: that would be good. i fear (perhaps unnecessarily) that other providers would work better with ip addresses | 10:17 |
wallyworld_ | eg openstack autopilot | 10:18 |
wallyworld_ | see comment #8 | 10:18 |
wallyworld_ | so it seems we want one behaviour for maas, and another for openstack | 10:18 |
wallyworld_ | but if the provider gives us the correct info for machine addresses, it all should just work | 10:19 |
dimitern | wallyworld_, afaik maas did not return dns names before due to a bug - it was supposed to | 10:19 |
wallyworld_ | and we need the preferred address a 0 | 10:19 |
wallyworld_ | as | 10:19 |
wallyworld_ | as per comment #9 | 10:19 |
wallyworld_ | dimitern: so it seems then that with a maas that works correctly, the bug becomes a matter of ensuring that the preferred ip address is in Addresses[0] so that's the one printed | 10:20 |
dimitern | wallyworld_, or we can just try to resolve dns names before saving | 10:22 |
wallyworld_ | for maas | 10:22 |
jam1 | dimitern: wallyworld_: so I believe the bug *here* is that MaaS guys are saying "use the DNS name" but the Autopilot guys are saying "but I don't want to have to add MaaS as my DNS source" | 10:22 |
jam1 | It makes some sense for inside the MaaS cloud as everything is run by MaaS, but for client machines | 10:22 |
jam1 | they are fairly likely to just know the MaaS endpoint, and not be configured to talk to "foo.maas" | 10:22 |
wallyworld_ | client machines would want ip address i think | 10:23 |
jam1 | wallyworld_: AIUI, api-info is intended to supersede api-endpoints | 10:23 |
jam1 | as it can give information on stuff like CA Cert | 10:24 |
wallyworld_ | that could well be true, i'm not across the detail on this bit of the system, hence asking you guys :-) | 10:24 |
wallyworld_ | but | 10:24 |
wallyworld_ | we do need to consider backwards compatibility, no? | 10:24 |
wallyworld_ | dimitern: jam1: so can i leave this bug in your guys' capable hands? :-) | 10:26 |
dimitern | wallyworld_, sure thing :) | 10:26 |
wallyworld_ | ty :-) \o/ | 10:27 |
wallyworld_ | dimitern: andrew is working the azure bug - it got complicated, so he has to add in extra apis to query location as not all locations support all the role sizes :-( | 10:28 |
dimitern | wallyworld_, sweet! I underestimated that one badly | 10:28 |
jam1 | dimitern: I'm just making coffee, but I'll be at the next meeting. | 10:28 |
wallyworld_ | dimitern: me too. i thought that the gwacl changes made by kapil were all good to go, bad assumption :-) | 10:29 |
wallyworld_ | as it turned out | 10:29 |
wallyworld_ | azure is complicated | 10:29 |
dimitern | yeah, and often broken as well | 10:30 |
wallyworld_ | yep, through no fault of ours it seems many times | 10:30 |
dimitern | fwereade_, jamespage, gnuoy, networking call? | 10:30 |
jamespage | dimitern, yes - sorry - still in stockholm | 10:31 |
bac | hi axw, you still around? | 10:34 |
axw | bac: heya, yes I am | 10:36 |
bac | axw: good, i know it is late for you. i saw a problem with azure yesterday that i wanted to tell you about. | 10:37 |
axw | bac: is it this one? https://bugs.launchpad.net/juju-core/1.21/+bug/1398406 | 10:37 |
mup | Bug #1398406: Azure provider attempts to deploy with unsupported "D1" RoleSize <azure-provider> <bootstrap> <ci> <regression> <juju-core 1.21:In Progress by axwalk> <https://launchpad.net/bugs/1398406> | 10:37 |
bac | axw: no. | 10:37 |
axw | okey dokey | 10:37 |
voidspace | rogpeppe: pong | 10:38 |
voidspace | rogpeppe: sorry, only just seen your ping | 10:38 |
rogpeppe | voidspace: np | 10:38 |
bac | axw: it involved having two environments up with the same credentials but different storage for the state servers. i used destroy-environment on one and it took down both. | 10:38 |
rogpeppe | voidspace: how much do you know about USSO? | 10:38 |
axw | eep | 10:38 |
voidspace | rogpeppe: well, I've never heard of the acronym | 10:38 |
bac | axw: i have the remnants on azure but have not been able to create a minimal reproduction of the problem | 10:38 |
voidspace | rogpeppe: so not a good start... | 10:39 |
rogpeppe | voidspace: just saw your name on the identityprovider source code and thought you might be able to help us... | 10:39 |
bac | axw: i was using 1.20.12-utopic-amd64. | 10:39 |
axw | bac: I suspect azure is not using the env UUID to separate things... I will see if I can figure it out. did you raise a bug already? | 10:39 |
voidspace | rogpeppe: heh, I worked a lot on identityprovider - but not actually on the identity protocols in the end | 10:39 |
bac | axw: i did not since i could not reproduce | 10:39 |
voidspace | rogpeppe: and I don't recall ever hearing USSO, so it might have been added after I left | 10:39 |
axw | fair enough | 10:39 |
rogpeppe | voidspace: ubuntu single sign on | 10:39 |
axw | bac: thanks, I'll take a look at the code to see if I can think up a repro | 10:40 |
bac | axw: the problem was quite costly as it destroyed our CI environment | 10:40 |
axw | :( | 10:40 |
voidspace | rogpeppe: ah... | 10:40 |
bac | axw: if you'd like to look at the jenv files or azure storage i've kept them | 10:40 |
voidspace | rogpeppe: oh, it depends on the question then | 10:40 |
voidspace | rogpeppe: it's been a while though... | 10:40 |
axw | bac: that would be good to have | 10:40 |
bac | axw: and i'm happy to file a bug | 10:40 |
rogpeppe | voidspace: perhaps you could join us in a hangout for a few moments? | 10:40 |
voidspace | rogpeppe: sure | 10:41 |
bac | axw: do you have access to chinstrap? i'd like to not sanitize the jenv files so i don't want to attach them to the bug report | 10:41 |
axw | bac: yes I do | 10:41 |
bac | axw: cool, i'll put it there and reference it. | 10:41 |
bac | axw: thanks and good night | 10:42 |
axw | bac: thanks | 10:42 |
axw | (and sorry about your CI env :() | 10:43 |
axw | PTAL, I tweaked the isLimitedRoleSize code a bit | 10:58 |
axw | err | 10:58 |
axw | wallyworld_: ^^ | 10:58 |
wallyworld_ | looking | 10:58 |
wallyworld_ | axw: just a typo, might be good for one more live test before landing, just to be sure with the tweaks | 11:01 |
axw | wallyworld_: thanks. sure | 11:02 |
* axw | dinners first | 11:03 |
wallyworld_ | np, ty | 11:03 |
dimitern | voidspace, whew.. no meetings for a while, so back to your review | 11:12 |
voidspace | dimitern: cool, thanks | 11:16 |
dimitern | voidspace, reviewed, please ping me if something is unclear | 11:34 |
voidspace | dimitern: sure, thanks | 12:05 |
perrito666 | Is there a Create version that supports permissions? | 12:22 |
perrito666 | what exactly will _posix tag match? | 12:43 |
perrito666 | build* tag | 12:43 |
wallyworld_ | fwereade_: if you did get a chance or the inclination to look at http://reviews.vapour.ws/r/552/ that would be great, but you're busy so i'll bother someone else tomorrow if needed | 12:56 |
fwereade_ | wallyworld_, thanks, I'll try | 12:57 |
wallyworld_ | sure, no hassle if not | 12:57 |
axw | wallyworld_: the Azure fixes still need backporting to 1.21, I'll do that tomorrow | 13:28 |
wallyworld_ | axw: sure, np | 13:28 |
wallyworld_ | thanks for fixing | 13:28 |
bac | axw: i filed bug 1398820, not sure if you saw it | 13:30 |
mup | Bug #1398820: destroying Azure environment took down other environment <juju-core:New> <https://launchpad.net/bugs/1398820> | 13:30 |
mbruzek1 | wallyworld_ ping | 14:35 |
bac | sinzui: thanks for the insight on that bug. i'm trying to reproduce with azure and azure-1 now. | 14:36 |
anastasiamac | mbruzek1: it's 00.35am where wallyworld is :) | 14:36 |
wallyworld_ | but sadly he is here | 14:37 |
mbruzek1 | ping retracted go to sleep Ian | 14:37 |
wallyworld_ | mbruzek1: how can i help? | 14:37 |
anastasiamac | wallyworld_: OMG!! | 14:37 |
wallyworld_ | like you can talk :-) | 14:37 |
sinzui | bac: abentley and I are trying to remember the issue we found about a year ago. | 14:37 |
wallyworld_ | same time for you | 14:37 |
anastasiamac | wallyworld_: I had 89 comments on my tiny PR... how can I sleep? | 14:37 |
wallyworld_ | all/mostly trivial | 14:38 |
anastasiamac | wallyworld_: true. in fact, I was hoping u could review tomorrow in hopes of landing before ur holiday :) | 14:38 |
abentley | sinzui, bac: https://bugs.launchpad.net/juju-core/+bug/1257481 | 14:38 |
mup | Bug #1257481: juju destroy-environment destroys other environments <ci> <destroy-environment> <juju-core:Fix Released by jameinel> <juju-core 1.16:Fix Released by jameinel> <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Saucy):New> <https://launchpad.net/bugs/1257481> | 14:38 |
wallyworld_ | anastasiamac: sure | 14:39 |
anastasiamac | wallyworld_: thnx ;-) m EOD-ing now | 14:40 |
wallyworld_ | as well you should | 14:40 |
bac | thanks abentley. | 14:40 |
anastasiamac | wallyworld_: u2 | 14:40 |
mattyw | sinzui, landing is blocked on 1398448 - but it looks like a fix has landed for it? | 14:40 |
* sinzui | checks test | 14:42 |
sinzui | mattyw, ha ha, that bug is indeed fix released, but is replaced by another bug 1398837 | 14:44 |
mup | Bug #1398837: cannot extract configuration from backup file: "var/lib/juju/agents/machine-0/agent.conf <backup-restore> <ci> <regression> <juju-core:Triaged> <https://launchpad.net/bugs/1398837> | 14:44 |
mattyw | sinzui, so still blocked - but for a different reason? | 14:45 |
sinzui | mattyw, yes, sorry. | 14:45 |
mattyw | sinzui, no problem - is someone working on that bug? | 14:48 |
sinzui | mattyw, not yet, it was reported when abentley was checking on the other blocking bug. I am sending out an email about the many hot bugs targeted to 3 milestones | 14:49 |
mattyw | sinzui, ok - I'm not volunteering as I'm going to be out for a few days - but just wanted to make sure they weren't being ignored | 14:50 |
mattyw | sinzui, so I'm not being helpful - but I'm being supportive ;) | 14:50 |
sinzui | understood Makyo | 14:50 |
sinzui | understood mattyw | 14:50 |
natefinch | ooh ooh, can I be unhelpful but supportive, too? ;) | 14:53 |
mattyw | natefinch, I've taken that job, you'll have to be helpful but unsupportive | 14:56 |
mattyw | or you can be unhelpful and unsupportive | 14:57 |
natefinch | I can't believe you fools! here's the fix. | 14:57 |
perrito666 | natefinch: /ignore sinzui ? | 14:57 |
natefinch | I didn't even know /ignore was a thing... | 14:58 |
* natefinch is an IRC n00b | 14:58 | |
perrito666 | odd, you seem to have the age to have been young when IRC was a thing | 14:58 |
* perrito666 hides | 14:58 | |
natefinch | thou dost injure me, perrito666 | 14:59 |
natefinch | I used IRC back in the late 90's, and figured everyone had moved on since then. But, you know, linux people seem to pine for 1999 for some reason. | 15:00 |
perrito666 | natefinch: you do realize that if I had not spent so much time watching TV and movies, that joke would have been completely lost on non-native English speakers | 15:00 |
dimitern | wallyworld_, jam1, the dns issue with api-endpoints for maas is not maas-related, it stems most likely from the changes in address selection logic (i.e. prefer hostnames to public/cloud-local IPs - the latter being most common with maas) | 15:08 |
katco | sinzui: ping | 15:17 |
sinzui | hi katco | 15:17 |
katco | sinzui: good morning :) | 15:17 |
katco | sinzui: i'd like to provide andreas with some binaries for tip of 1.21 | 15:17 |
katco | sinzui: what's the best-practice for doing so? i want to make sure he's testing what will eventually be b4 | 15:18 |
=== kadams54 is now known as kadams54-away | ||
sinzui | katco, http://juju-ci.vapour.ws:8080/job/publish-revision/1254/ lists the binaries we built and tested | 15:19 |
sinzui | katco, but there are no streams for these | 15:19 |
katco | sinzui: that's probably _perfect_ | 15:19 |
katco | sinzui: i haven't wrapped my head around simple streams yet, but i suspect he could use these binaries to update his existing testing environment? | 15:20 |
ericsnow | perrito666: FYI, that blocker is due to the hard-coded "machine-0" in the old restore :P | 15:21 |
sinzui | katco, --upload-tools is the only option. | 15:21 |
katco | sinzui: hm. he can probably build a local environment if pressed. | 15:22 |
sinzui | katco, our testing streams are volatile, they are master at the moment. they were 1.21-beta4 for about 3 hours when those packages were made | 15:22 |
sinzui | katco, yes, the local env will be fine he just needs the state-server and two services | 15:23 |
katco | sinzui: i'm pretty confident in that use-case. but i want to make sure it works for his more complex example | 15:23 |
perrito666 | ericsnow: well, i told you that there was no guarantee of compatibility with the new backup | 15:23 |
ericsnow | perrito666: I'm expecting that the new restore will break in the same way (under HA) | 15:24 |
bac | sinzui: i am able to reproduce bug 1398820 and have added more information. | 15:24 |
mup | Bug #1398820: destroying Azure environment took down other environment <azure-provider> <destroy-environment> <juju-core:Triaged> <https://launchpad.net/bugs/1398820> | 15:24 |
perrito666 | ericsnow: most likely, but it is waaaay easier to fix | 15:24 |
ericsnow | perrito666: easier to test at least :) | 15:25 |
bac | sinzui: due to documented loss of user data (the blunder cost us 10 engineer hours and denied 12 engineers the use of our landing automation for five hours) i'd suggest it be marked critical. a work-around after-the-fact is not very useful. | 15:26 |
perrito666 | ericsnow: indeed | 15:26 |
sinzui | thank you bac | 15:26 |
voidspace | dimitern: one of your suggestions | 15:31 |
voidspace | dimitern: adding the following to the comment for the State.IPAddress method "with the given value." | 15:31 |
voidspace | dimitern: you don't think it's entirely obvious that the one returned will represent the value you pass in? | 15:31 |
dimitern | voidspace, yeah, I really meant to say "missing full-stop" | 15:32 |
voidspace | dimitern: heh, you got me on that one. | 15:32 |
voidspace | dimitern: cool | 15:32 |
dimitern | voidspace, in general doc comments should be proper sentences when possible I think | 15:33 |
voidspace | dimitern: ok, I've never been quite sure. | 15:33 |
voidspace | dimitern: I'll stick to that from now on. Easy enough. | 15:33 |
dimitern | voidspace, cheers | 15:33 |
alexisb | dimitern, ping | 15:39 |
dimitern | alexisb, hey | 15:40 |
dimitern | alexisb, just replying to your mail btw | 15:40 |
alexisb | hey there dimitern | 15:40 |
alexisb | nws, no rush on that | 15:40 |
alexisb | I saw some of your irc chatter earlier; are you working this bug: | 15:41 |
alexisb | https://bugs.launchpad.net/juju-core/+bug/1397376 | 15:41 |
mup | Bug #1397376: maas provider: 1.21b3 removes ip from api-endpoints <api> <cloud-installer> <fallout> <landscape> <maas-provider> <juju-core:Triaged> <juju-core 1.21:Triaged> <https://launchpad.net/bugs/1397376> | 15:41 |
alexisb | ?? | 15:41 |
dimitern | alexisb, I did investigate the issue - it seems like a juju-specific regression (if you can call it that, since it was never documented nor claimed anywhere api-endpoints should return IPs instead of hostnames) | 15:42 |
alexisb | dimitern, given it is currently marked as critical and blocking the 1.21 release we need to decide on a path forward and get it resolved | 15:43 |
dimitern | alexisb, there are several possible solutions | 15:44 |
dimitern | alexisb, and I prefer to have the same one across all providers, i.e. always return a usable IP as first endpoint (if we only have a hostname try resolving it first) | 15:45 |
dimitern | alexisb, I'll add a comment to the bug | 15:45 |
alexisb | dimitern, thank you | 15:45 |
dimitern | alexisb, and propose a fix + backports | 15:46 |
alexisb | make sure to make it clear if you are looking for a response from the bug committer, etc | 15:46 |
dimitern | sure | 15:47 |
=== jog_ is now known as jog | ||
voidspace | dimitern: instead of enforcing that InterfaceId or MachineId can only be set once | 15:49 |
voidspace | dimitern: how about we only do it once? | 15:50 |
voidspace | dimitern: if we do that then the check is unnecessary. If we *need* to do it more than once then the code is actually a hindrance. | 15:50 |
voidspace | dimitern: so at *best* the code is useless... IMO | 15:50 |
* fwereade_ swears at the jujud tests and kicks things for a bit | 15:51 | |
dimitern | voidspace, how can you guarantee it's only once | 15:51 |
voidspace | dimitern: by only doing it in one place | 15:51 |
voidspace | dimitern: by understanding the code | 15:51 |
voidspace | dimitern: if we attempt to do it more than once and it fails juju will have problems due to a code path that stops | 15:51 |
voidspace | dimitern: so we need to do that *anyway* | 15:51 |
voidspace | dimitern: I can add an assert, it's easy enough | 15:52 |
voidspace | dimitern: I'm just not convinced it's very useful, and may actually be the opposite of useful (at which point we just take it out again) | 15:52 |
alexisb | fwereade_, I see you are having a good evening ;) | 15:52 |
fwereade_ | alexisb, a delight, as always :) | 15:52 |
alexisb | :) | 15:54 |
=== kadams54-away is now known as kadams54 | ||
dimitern | voidspace, ok, let's think about it for a bit | 15:58 |
dimitern | voidspace, the reason we have similar "hard states", e.g. once a machine is provisioned it can't be "unprovisioned", is because these steps happen at different times and more importantly in different workers | 16:00 |
voidspace | dimitern: right | 16:01 |
voidspace | dimitern: in our case an IP address will be requested *for* a machine | 16:01 |
voidspace | dimitern: so nothing else will attempt to use it | 16:01 |
voidspace | dimitern: and once allocated the MachineId will be set | 16:01 |
voidspace | dimitern: so there's no use case for changing it, but nor is it possible that something will try | 16:02 |
dimitern | voidspace, yeah, it seems reasonable to bind setting state to allocated with setting the machine id | 16:02 |
dimitern | voidspace, how about having a AllocateTo(machineId, interfaceId) method ? | 16:03 |
voidspace | dimitern: instead of individual setters - ok. | 16:04 |
voidspace | dimitern: to add the asserts I need to get rid of omitempty it would seem | 16:04 |
voidspace | I *would* need to get rid of omitempty | 16:04 |
dimitern | voidspace, it asserts the state is unknown and machine / interface ids are empty before setting state to allocated + mid +iid | 16:04 |
dimitern | voidspace, yeah - omitempty should go on both ids | 16:05 |
voidspace | dimitern: so you still want the asserts... | 16:05 |
voidspace | to protect us against something that can never happen... :-p | 16:05 |
voidspace | I should sprinkle anti-polar-bear dust around the code as well just in case | 16:06 |
dimitern | voidspace, imagine multiple workers trying AllocateTo() in parallel | 16:06 |
voidspace | dimitern: but they'll be given different ip addresses | 16:06 |
dimitern | voidspace, true | 16:06 |
voidspace | technically there's a race between fetching the full set of existing addresses and generating a new one | 16:07 |
dimitern | voidspace, ok, at least let's have an error when it's already allocated | 16:07 |
voidspace | ok :-) | 16:07 |
dimitern | voidspace, indeed, that's why we need to assert the txn-revno of ipaddressesC hasn't changed since we fetched them :) - good point | 16:08 |
voidspace | a fair compromise and that will protect against the race | 16:08 |
voidspace | we should do that in the code that generates the ip address | 16:08 |
dimitern | voidspace, but that's relevant to the picking algorithm | 16:08 |
dimitern | voidspace, yep | 16:08 |
katco | voidspace, dimitern: not sure if this is how fwereade_ intended this to be used, but the new leasing stuff will provide a sort of "environment mutex" | 16:09 |
katco | where you could say, "i'm mucking in the ip addresses, give me that lease" | 16:10 |
dimitern | voidspace, so AllocateTo() returns nil on success or - say ErrAlreadyAllocated with message "IP "1.2.3.4" is already allocated to machine "1"" | 16:10 |
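A sketch of the proposed AllocateTo, assuming an in-memory document: it refuses to allocate unless the state is unknown and both ids are empty, and returns an already-allocated error with the message format dimitern suggests. The ipAddress type, the state constants and alreadyAllocatedError are hypothetical stand-ins; in state these checks would presumably be transaction assertions rather than if statements.

```go
package main

import "fmt"

// addressState mirrors the states mentioned in the discussion.
type addressState string

const (
	stateUnknown     addressState = "unknown"
	stateAllocated   addressState = "allocated"
	stateUnavailable addressState = "unavailable"
)

// ipAddress is a hypothetical, in-memory version of an ipaddresses document.
type ipAddress struct {
	Value       string
	State       addressState
	MachineId   string
	InterfaceId string
}

// alreadyAllocatedError plays the role of the proposed ErrAlreadyAllocated.
type alreadyAllocatedError struct {
	value, machineId string
}

func (e *alreadyAllocatedError) Error() string {
	return fmt.Sprintf("IP %q is already allocated to machine %q", e.value, e.machineId)
}

// AllocateTo sets the state to allocated together with both ids, after
// checking that the state is unknown and neither id is set yet.
func (a *ipAddress) AllocateTo(machineId, interfaceId string) error {
	if a.State == stateAllocated {
		return &alreadyAllocatedError{value: a.Value, machineId: a.MachineId}
	}
	if a.State != stateUnknown || a.MachineId != "" || a.InterfaceId != "" {
		return fmt.Errorf("IP %q cannot be allocated in state %q", a.Value, a.State)
	}
	a.State = stateAllocated
	a.MachineId = machineId
	a.InterfaceId = interfaceId
	return nil
}

func main() {
	addr := &ipAddress{Value: "1.2.3.4", State: stateUnknown}
	fmt.Println(addr.AllocateTo("1", "eth0")) // <nil>
	fmt.Println(addr.AllocateTo("2", "eth0")) // IP "1.2.3.4" is already allocated to machine "1"
}
```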
katco | and then when others would like to know about ip addresses they could grab that lease | 16:10 |
dimitern | katco, oh GIL ? :) | 16:10 |
dimitern | or we can call it JIL instead :D | 16:10 |
katco | not familiar with that term | 16:10 |
dimitern | katco, good to know | 16:10 |
fwereade_ | katco, I suspect that will not be a very happy pairing of components | 16:10 |
dimitern | katco, that's python's global interpreter lock | 16:11 |
katco | ahh ok | 16:11 |
=== tvan-afk is now known as tvansteenburgh | ||
katco | fwereade_: thanks for chiming in :) i retract my suggestion! :p | 16:11 |
fwereade_ | katco, no worries :) | 16:11 |
fwereade_ | dimitern, voidspace: I'm not really up to date with what you're discussing, btw | 16:12 |
fwereade_ | dimitern, voidspace: but txn-revno is a very big hammer to be using in general | 16:13 |
dimitern | voidspace, and the other method can be SetUnavailable(value) - returning some (doesn't have to be typed) error when the address is already unavailable | 16:13 |
dimitern | fwereade_, we're discussing how to do the address allocation algorithm for containers | 16:14 |
dimitern | fwereade_, creating a new placeholder doc in the ipaddresses collection with state "unknown" (similarly to local charm uploads), then allocating it (or failing) and changing the state to "allocated" or "unavailable" depending on the result | 16:15 |
perrito666 | natefinch: stand up? | 16:16 |
voidspace | dimitern: using txn-revno would also prevent *another* address being allocated | 16:16 |
natefinch | perrito666: momentarily | 16:16 |
dimitern | voidspace, correct | 16:17 |
fwereade_ | voidspace, dimitern: I'm probably being dense, but txn-revno on what doc exactly? | 16:17 |
voidspace | fwereade_: ipaddresses - a new one | 16:17 |
dimitern | fwereade_, voidspace, on the new ipaddresses collection Michael is doing now | 16:17 |
fwereade_ | voidspace, so the documents in that collection will be one-to-one with machines? | 16:18 |
dimitern | fwereade_, it will contain all machine addresses juju knows about | 16:18 |
fwereade_ | dimitern, not just one doc with all addresses, surely? | 16:18 |
dimitern | fwereade_, yes - machines and network interfaces | 16:18 |
fwereade_ | dimitern, that will be a hell of a bottleneck | 16:19 |
dimitern | fwereade_, wow, that hadn't occurred to me *lol* | 16:19 |
fwereade_ | dimitern, even without txn-revno, all writes will be serialized | 16:19 |
dimitern | fwereade_, right, so txn-revno is no good then | 16:20 |
dimitern | fwereade_, we can't just fetch all addresses, pick a random one in the same range, not yet existing, then create it asserting the collection hasn't changed | 16:21 |
katco | fwereade_: forgive my ignorance, but doesn't this really sound like an environmental mutex would be helpful? what am i missing? | 16:22 |
dimitern | katco, how is that mutex supposed to work? | 16:22 |
fwereade_ | dimitern, voidspace: I am most likely misunderstanding the problem -- but I'm not following who needs to allocate these addresses? | 16:23 |
fwereade_ | dimitern, voidspace: surely in the general case we are asking the provider for new addresses, aren't we? | 16:23 |
voidspace | fwereade_: no | 16:23 |
katco | dimitern: well, the lease service is a fifo stack; 1 exists per state machine, and only 1 person can own a lease for a namespace at a time | 16:23 |
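To make the "environment mutex" idea concrete, here is a hypothetical lease manager in which at most one owner holds a namespace at a time. It is not juju's lease service API; expiry and any queuing of waiters are left out.

```go
package main

import (
	"fmt"
	"sync"
)

// leaseManager is a hypothetical sketch: at most one owner per namespace.
type leaseManager struct {
	mu     sync.Mutex
	owners map[string]string // namespace -> current owner
}

func newLeaseManager() *leaseManager {
	return &leaseManager{owners: make(map[string]string)}
}

// Claim grants the lease if it is free or already held by owner.
func (m *leaseManager) Claim(namespace, owner string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if cur, ok := m.owners[namespace]; ok && cur != owner {
		return fmt.Errorf("lease %q is held by %q", namespace, cur)
	}
	m.owners[namespace] = owner
	return nil
}

// Release frees the lease, but only for its current owner.
func (m *leaseManager) Release(namespace, owner string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if cur, ok := m.owners[namespace]; !ok || cur != owner {
		return fmt.Errorf("%q does not hold lease %q", owner, namespace)
	}
	delete(m.owners, namespace)
	return nil
}

func main() {
	m := newLeaseManager()
	fmt.Println(m.Claim("ipaddresses", "provisioner-0"))
	fmt.Println(m.Claim("ipaddresses", "provisioner-1"))
	fmt.Println(m.Release("ipaddresses", "provisioner-0"))
	fmt.Println(m.Claim("ipaddresses", "provisioner-1"))
}
```

As fwereade_ notes right after, this may not pair well with address allocation; the sketch only illustrates the mutual-exclusion idea.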
voidspace | fwereade_: for "reasons" | 16:24 |
voidspace | fwereade_: let me remember specifically | 16:24 |
dimitern | fwereade_, no, we're asking for a specific address | 16:24 |
voidspace | fwereade_: with ec2 we can request they give us a new address - but they don't tell us what it is | 16:24 |
voidspace | fwereade_: so we then have to look on the network interface and find "the new address" | 16:24 |
voidspace | fwereade_: and if we do that in parallel (adding several machines to an interface) we have race conditions working out which one is for which machine | 16:25 |
dimitern | fwereade_, yeah - ec2 is quite unhelpful and most providers allow you to request a specific address to allocate | 16:25 |
voidspace | fwereade_: but both ec2 and maas (and openstack but we only initially care about ec2 and maas) | 16:25 |
fwereade_ | voidspace, dimitern: oh, what fun | 16:25 |
voidspace | fwereade_: will allow us to pick an address and request that specific one | 16:25 |
voidspace | fwereade_: so we're generating and storing allocated addresses | 16:25 |
dimitern | fwereade_, and keeping track which machine uses them, or marking them as unavailable (i.e. something else uses it, we'll try another but remember this) | 16:26 |
fwereade_ | voidspace, dimitern: ok, so, to try to restate in my own words | 16:26 |
fwereade_ | voidspace, dimitern: an example would be: you have machine N, with a bunch of containers N/lxc/A,B,C | 16:27 |
fwereade_ | voidspace, dimitern: something running in the agent for machine N observes that there are 3 new addresses available, and needs to assign them to the appropriate containers? | 16:28 |
fwereade_ | voidspace, dimitern: or have I completely misunderstood the context? | 16:28 |
* fwereade_ suspects he completely has | 16:29 | |
voidspace | fwereade_: it's more that we notice there are three new containers, we need to create three new addresses | 16:30 |
voidspace | fwereade_: more likely we will try to create one address three times | 16:30 |
dimitern | fwereade_, yeah - that thing will be the container provisioner most likely, but it won't notice the new addresses, it will request them and allocate them to each container, so that the address updater (after this happens) can see the machine has 4 IPs (as seen via the cloud api) but only 1 IP is for the machine (already allocated) the rest are for the containers | 16:30 |
voidspace | fwereade_: if we aren't picking the addresses ourselves, but we have a worker that requests an address and then has to work out what the new one is | 16:30 |
voidspace | fwereade_: if that happens three times in parallel - when the worker checks the network interface and sees three new addresses, how does it know which is the "new one" | 16:31 |
fwereade_ | dimitern, that cannot be the container provisioner, can it? | 16:31 |
fwereade_ | dimitern, we don't want to give it the cloud credentials necessary to ask for the addresses | 16:31 |
dimitern | fwereade_, it won't talk to the cloud directly, obviously the api will be used | 16:32 |
dimitern | :) | 16:32 |
fwereade_ | dimitern, ok cool :) | 16:32 |
fwereade_ | dimitern, not 100% sure it's the container provisioner then? but, ehh, that's an irrelevant detail at this point I guess | 16:33 |
dimitern | voidspace, fwereade_, so effectively: the apiserver will be creating ipaddresses docs, calling the cloud to allocate them and assigning them to their machineid | 16:33 |
dimitern | voidspace, fwereade_, that's a detail yes, we'll get there | 16:33 |
fwereade_ | dimitern, so, what we have at the moment is that the machiner calls SetMachineAddresses with all the addresses it knows about -- and presumably we'll keep on doing roughly the same thing? just doing so more often, and not in the machiner? | 16:37 |
marcoceppi | bac: building amulet now | 16:38 |
marcoceppi | will be released in about 20 mins | 16:38 |
bac | thanks marcoceppi | 16:38 |
dimitern | fwereade_, IIRC SetMachineAddresses discovers what's on the machine, not via the cloud api | 16:39 |
fwereade_ | dimitern, yes | 16:39 |
dimitern | fwereade_, so it will be the same | 16:39 |
dimitern | fwereade_, the address updater is problematic and has to be fixed carefully to separate container addresses from host addresses when refreshing the instance info from the cloud | 16:41 |
fwereade_ | dimitern, and we also get a stream of SetAddresses~s coming from the instanceupdater as well | 16:41 |
fwereade_ | dimitern, how *can* we separate them? what's the difference between a provider address that was allocated for a container, and one that was allocated for the host machine? | 16:42 |
dimitern | fwereade_, yeah - which in turn also calls SetAddresses at the end via the apiserver | 16:43 |
katco | sinzui: we have confirmation that 1397995 is fixed in v1.21. discussions are ongoing about expected behavior, but that's just a spec thing, not a bug. | 16:43 |
dimitern | fwereade_, ha! there's the beauty of it :) | 16:43 |
sinzui | katco, \o/ | 16:43 |
katco | sinzui: tyvm for your efforts and assistance. you did a great job! :) | 16:43 |
dimitern | fwereade_, because we pre-allocate the address we know what it is and for which machine id (or container) it's allocated before the container starts | 16:44 |
fwereade_ | dimitern, ok, so we (could) have a provider id for each address that comes in via the instanceupdater? | 16:44 |
dimitern | fwereade_, I think the least painful way to resolve all these is to change state.Machine.SetAddresses() internally to do the separation considering what's in ipaddressesC | 16:45 |
dimitern | fwereade_, by provider id you mean instance id in this case? | 16:45 |
fwereade_ | dimitern, no? I think I mean a provider id for the address | 16:46 |
dimitern | fwereade_, there's no specific id - the IP itself is the key - at least in maas, openstack and ec2 | 16:46 |
fwereade_ | dimitern, instance id is not a problem, is it? we know which *host* machine already because it's 1:1 with instances | 16:46 |
fwereade_ | dimitern, I am suspicious that there is an implicit assumption that those addresses won't change | 16:47 |
fwereade_ | dimitern, which does not gel with what I understand ec2 in particular is likely to do to you if you suspend your instances for a bit | 16:48 |
dimitern | fwereade_, it depends on how the instance was started (i.e. the termination behavior) - we don't do suspend/resume ourselves | 16:51 |
fwereade_ | dimitern, ok, I am still questioning the assumption that addresses won't change | 16:52 |
fwereade_ | dimitern, eg when an AZ falls over for a bit I would like it if we were able to absorb that when everything was running again | 16:53 |
voidspace | evilnickveitch: ping | 16:59 |
evilnickveitch | voidspace, hey | 17:00 |
voidspace | evilnickveitch: hey | 17:00 |
voidspace | evilnickveitch: you just sent me an email about juju docs | 17:00 |
voidspace | evilnickveitch: I suspect it was intended for someone else... | 17:00 |
evilnickveitch | voidspace, oops - sorry! I am blaming gmail autocomplete | 17:01 |
voidspace | evilnickveitch: :-) | 17:01 |
dimitern | fwereade_, the address won't change *at will* | 17:03 |
dimitern | fwereade_, it will happen due to some specific user action - i.e. stopping an instance and restarting it perhaps - juju should be able to detect this and know how to handle it | 17:04 |
dimitern | fwereade_, an address that was allocated (reserved, in-use, whatever) cannot be reused by any other node on the same subnet (except for some weird clustering/balancing/etc. on L2) | 17:06 |
dimitern | fwereade_, a machine can get a new (allocated) address and become unreachable under the old one, but the old one won't just appear on some machine (as long as the machine is up) | 17:07 |
fwereade_ | dimitern, no arguments there... the problem STM to be that i-abc123 can reasonably return 2 non-overlapping sets of IPs in two consecutive requests to the provider | 17:08 |
fwereade_ | dimitern, and if we don't have some notion of what the underlying provider id of a given IP is, we can't cleanly map the first set onto the second | 17:08 |
fwereade_ | dimitern, (that said, I believe that the underlying provider ids do not necessarily exist -- I'm not saying you should find them, but I am fixating on the nasty edge cases that I think I can see) | 17:09 |
fwereade_ | dimitern, in general it's not outside the bounds of probability that we might see, for a single machine | 17:10 |
dimitern | fwereade_, I understand the problem and appreciate you bringing it up - I'll make sure we do have tests for such cases | 17:10 |
fwereade_ | dimitern, SA: a; SMA: b; SA: c, d; SMA: e,f | 17:11 |
fwereade_ | dimitern, cool | 17:11 |
fwereade_ | dimitern, have I just been massively derailing you then? :( | 17:11 |
dimitern | fwereade_, shouldn't we prefer addresses discovered on the machine rather than the ones coming via the cloud api? | 17:12 |
fwereade_ | dimitern, it still sort of feels like the problem you're talking about is "how do we reconcile these streams of addresses from two different sources" | 17:12 |
fwereade_ | dimitern, hmm | 17:12 |
fwereade_ | dimitern, I *think* the cloud api is more likely to give us usable addresses, isn't it? | 17:12 |
fwereade_ | dimitern, like how in address-get we want to return the advertise address rather than the bind address? | 17:13 |
dimitern | fwereade_, no, it's very useful as I did have much time to think in detail about how to solve the 2 address sources problem | 17:13 |
fwereade_ | dimitern, I assert <wave hands vigorously> that what the cloud tells us is more likely to correspond to the addresses which will cause the traffic to be delivered to the right place | 17:13 |
dimitern | fwereade_, well, if the cloud *thinks* node A uses IP1, but eth0 on A says IP2, the latter will be usable (all other things equal - i.e. no restrictive firewalling/routing for IP1 vs IP2) | 17:14 |
dimitern | fwereade_, yeah | 17:14 |
fwereade_ | dimitern, my concern is exactly that all other things will not be equal | 17:15 |
dimitern | fwereade_, at some point we'll need a way to actually *verify* a given node's IP is usable (inbound and out) | 17:16 |
fwereade_ | dimitern, eg when you expose something to the public internet, you need to advertise the public address that will get the packets to the right place, which is not necessarily the one you're binding to at all | 17:16 |
fwereade_ | dimitern, that's a nice-to-have where it's practical | 17:16 |
dimitern | fwereade_, until then I guess trusting the cloud makes more sense | 17:16 |
fwereade_ | dimitern, but I'm not sure it's something we can depend on in all cases | 17:17 |
fwereade_ | dimitern, in particular | 17:17 |
fwereade_ | dimitern, these thorny sorts of questions | 17:17 |
fwereade_ | dimitern, are why I'm keen that we continue to record the stuff that gets set via SA/SMA unchanged | 17:17 |
dimitern | fwereade_, the "just-in-" case :) | 17:18 |
fwereade_ | dimitern, and we recalculate our best guess at reality, taking *both* of those things into account, in response to a change in either | 17:18 |
voidspace | dimitern: when you have a minute (shouldn't take any longer) can you check the implementation of AllocateTo | 17:19 |
voidspace | dimitern: http://reviews.vapour.ws/r/564/ | 17:19 |
voidspace | dimitern: all issues resolved | 17:19 |
dimitern | fwereade_, right, but the ips known to be allocated should be used when needed, while the SA/SMA ones only when they change | 17:19 |
dimitern | voidspace, sure, will look shortly | 17:19 |
dimitern | fwereade_, (the IPs in ipaddressesC I mean) | 17:20 |
fwereade_ | dimitern, true -- the providers that *do* tell us what IPs they've allocated should probably be trusted over and above the SA/SMA ones | 17:20 |
dimitern | fwereade_, what if the cloud says node A's IP is now b (used to be a)? Should we also ensure SMA reports b as well? (I mean reconfigure the interface on the machine) | 17:22 |
fwereade_ | dimitern, I have literally no idea, I'm afraid :( | 17:23 |
fwereade_ | dimitern, I suspect "it depends" but I don't really know what on | 17:24 |
dimitern | fwereade_, and also how often/when this is possible | 17:24 |
dimitern | voidspace, reviewed | 17:38 |
dimitern | voidspace, I'm ok with moving subnet method tests from state_test.go into a new subnets_test.go as a follow-up | 17:38 |
alexisb | dimitern, you joining the networking call? | 18:01 |
voidspace | dimitern: ok | 18:06 |
voidspace | dimitern: thanks, still a few things to fix | 18:07 |
voidspace | dimitern: do we *need* DeepEquals for comparing structs? | 18:08 |
voidspace | dimitern: I wasn't sure | 18:08 |
natefinch | equals will work for simple structs that don't have, for example, maps or slices in them | 18:09 |
voidspace | natefinch: and if it doesn't work it should complain, right? | 18:10 |
voidspace | natefinch: as it worked I didn't feel the need to use DeepEquals | 18:10 |
natefinch | yep | 18:10 |
voidspace | cool | 18:10 |
voidspace | natefinch: so no reason to prefer jc.DeepEquals over gc.Equals if it's not required? | 18:11 |
natefinch | voidspace: there's probably no reason not to use DeepEquals, the difference in speed is going to be negligible..... trying to think if there might be a reason why Equals would be preferred... | 18:15 |
voidspace | natefinch: heh, your answer is the opposite of what I asked :-) | 18:18 |
natefinch | voidspace: my point is, jc.DeepEquals should always work, so there may be no reason to ever use gc.Equals unless you're doing a very large number of comparisons. If one way always works, and one way usually works... why not just always use the way that always works? | 18:23 |
voidspace | natefinch: because I already have a way that works and I need a reason to swap... | 18:24 |
voidspace | natefinch: I understand what you're saying though | 18:25 |
natefinch | voidspace: if it works, it works, don't muck with it ;) | 18:26 |
voidspace | my thoughts exactly :-p | 18:26 |
voidspace | I'll start using jc.DeepEquals just to not have to ever have this conversation again... | 18:26 |
natefinch | haha | 18:27 |
dimitern | voidspace, it's good to use DeepEquals for structs, otherwise you might end up comparing .String() or .GoString() values | 18:27 |
voidspace | dimitern: ah, so there is a reason | 18:27 |
voidspace | dimitern: when would that happen? | 18:27 |
dimitern | voidspace, it depends on how the Equals checker is implemented I guess | 18:32 |
dimitern | voidspace, can't really say without experimenting a bit | 18:32 |
voidspace | ah | 18:32 |
dimitern | voidspace, I tend to use DeepEquals (the jc version, not the gc one as it prints more comprehensive error messages with deeply nested structs / long maps, etc.) | 18:34 |
dimitern | voidspace, ..for anything other than a simple type | 18:34 |
voidspace | dimitern: better failure messages is a good reason to prefer it | 18:34 |
dimitern | voidspace, but maybe it's just me being paranoid :) | 18:35 |
voidspace | dimitern: when adding a new test file is there anything else I need to do? | 18:35 |
voidspace | dimitern: it doesn't look like my tests are being run | 18:35 |
voidspace | as in, I inject a failure and nothing fails | 18:35 |
dimitern | voidspace, yeah :) | 18:35 |
dimitern | voidspace, register the suite | 18:35 |
voidspace | gc.Suite(...) | 18:36 |
voidspace | or something else? | 18:36 |
dimitern | var _ = gc.Suite(&machineSuite{}) | 18:36 |
voidspace | that's there | 18:36 |
=== kadams54 is now known as kadams54-away | ||
dimitern | voidspace, and no tests run still? | 18:37 |
voidspace | doesn't look like it | 18:37 |
voidspace | package is state_test, file is state/subnet_test.go | 18:37 |
=== kadams54-away is now known as kadams54 | ||
voidspace | hmmm... maybe it should be subnets_test.go | 18:38 |
voidspace | dimitern: does each test file need to match a source file? | 18:38 |
dimitern | voidspace, yeah, but that shouldn't matter | 18:39 |
voidspace | dimitern: you're on late | 18:39 |
dimitern | voidspace, btw - see this http://paste.ubuntu.com/9356541/ | 18:39 |
voidspace | dimitern: fair enough :-) | 18:39 |
voidspace | hmmm... but the ipaddresses_test is definitely being run | 18:40 |
dimitern | voidspace, even though I added a .GoString() method on network.HostPort to shorten the output considerably (from network.HostPort{Value: 1.2.3.4, Port:42, ...} to 1.2.3.4:42) | 18:40 |
dimitern | voidspace, hmm.. let me see if something else was needed | 18:40 |
dimitern | voidspace, I'm on since 7am :) | 18:41 |
dimitern | really need to stop soon | 18:41 |
dimitern | voidspace, can you paste your complete subnets_test.go ? | 18:42 |
voidspace | dimitern: http://pastebin.ubuntu.com/9356566/ | 18:42 |
voidspace | dimitern: there's a deliberate error in assertSubnet inside TestAddSubnet (first test) | 18:43 |
voidspace | that should fail | 18:43 |
dimitern | voidspace, you don't need to embed ConnSuite, do you? | 18:44 |
voidspace | dimitern: StateSuite does, which is what I copied for this and IPAddressSuite | 18:44 |
voidspace | if we can get rid of it then cool | 18:44 |
dimitern | voidspace, hmm.. no sorry - you need it | 18:44 |
voidspace | we need s.State | 18:44 |
dimitern | voidspace, yeah | 18:45 |
voidspace | I've renamed to subnets_test.go | 18:45 |
voidspace | that didn't help | 18:45 |
dimitern | voidspace, why do you have that s.policy.GetConstraintsValidator in SetUpTest? | 18:45 |
dimitern | voidspace, HA! found it | 18:46 |
dimitern | voidspace, a single character is needed :) | 18:46 |
voidspace | go on | 18:47 |
dimitern | voidspace, var _ = gc.Suite(&SubnetSuite{}) - instead of var _ = gc.Suite(SubnetSuite{}) - all methods have pointer receivers | 18:47 |
voidspace | gah | 18:47 |
voidspace | of course | 18:47 |
voidspace | it registers an empty one | 18:47 |
voidspace | thanks | 18:47 |
dimitern | no worries | 18:47 |
dimitern | i'll be off then :) | 18:47 |
voidspace | sorry | 18:47 |
voidspace | dimitern: g'night | 18:48 |
dimitern | voidspace, g'night | 18:48 |
voidspace | I'm off too | 18:58 |
voidspace | g'night all | 18:58 |
ericsnow | natefinch: could you take a look at http://reviews.vapour.ws/r/577/ | 19:50 |
ericsnow | natefinch: (and http://reviews.vapour.ws/r/573/ too) :) | 19:50 |
ericsnow | natefinch: that first one fixes the blocker | 19:50 |
natefinch | ericsnow: cool, looking | 19:50 |
thumper | jw4: ping | 19:57 |
thumper | jw4: I actually have a branch that fixes one of the actions bits https://github.com/juju/juju/pull/1261/files | 19:57 |
thumper | jw4: may clash with your dependency update too | 19:57 |
thumper | jw4: interesting that both our solutions were identical - completely identical :-) | 20:01 |
natefinch | ericsnow: a couple small concerns, but nothing too hard to fix. | 20:12 |
ericsnow | natefinch: k | 20:12 |
=== kadams54 is now known as kadams54-away | ||
ericsnow | natefinch: I've addressed your comments | 20:35 |
natefinch | ericsnow: looking | 20:38 |
ericsnow | natefinch: keep in mind that cmd/plugins/juju-restore/restore.go is going away, so it probably isn't worth nitpicking there :) For that file I just want to be sure I have the logic right since there isn't any simple way to test it | 20:40 |
thumper | ericsnow: I did a review too, nothing to add | 20:41 |
thumper | ericsnow: I'd wait for natefinch's LGTM before landing though | 20:41 |
ericsnow | thumper: thanks :) | 20:41 |
ericsnow | thumper: will do | 20:41 |
natefinch | ericsnow: I just gave it a shipit :) | 20:41 |
perrito666 | \o/ | 20:41 |
ericsnow | natefinch: thanks | 20:42 |
* thumper awaits an unblocked laner | 20:42 | |
thumper | lander | 20:42 |
natefinch | thumper: while you're waiting, do you have some time to talk about provider configuration? I'm porting fwereade_'s skeleton provider from 10-months-ago juju to today-juju, and some of the internals bug me. He thought you might have some insight. | 20:43 |
thumper | sure | 20:43 |
thumper | natefinch: make a hangout | 20:43 |
natefinch | thumper: https://plus.google.com/hangouts/_/canonical.com/moonstone?authuser=1 | 20:45 |
ericsnow | davecheney: I was just reading through your functional options talk (very interesting) | 21:00 |
ericsnow | davecheney: do you think there are any spots that could stand to benefit from functional options in juju? | 21:00 |
ericsnow | davecheney: would it be a good approach for pulling in the different paths that need to be backed up (I think mattyw was hinting at this a while back)? | 21:02 |
davecheney | ericsnow: sure | 21:02 |
davecheney | i think there are a few places we could use it | 21:02 |
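Here is a rough sketch of how the functional options pattern from davecheney's talk could gather the paths to back up. Every name in it (`backupConfig`, `Option`, `WithRootDir`, `WithPaths`, `newBackupConfig`, and the example paths) is hypothetical and does not come from juju's code. It illustrates the pattern and is not a proposal for the real backups API:

```go
package main

import "fmt"

// backupConfig is a hypothetical settings struct for one backup run.
type backupConfig struct {
	rootDir string
	paths   []string
}

// Option mutates a backupConfig; callers pass any number of them.
type Option func(*backupConfig)

// WithRootDir sets a hypothetical base directory for the backed-up paths.
func WithRootDir(dir string) Option {
	return func(c *backupConfig) { c.rootDir = dir }
}

// WithPaths appends extra paths to include in the backup.
func WithPaths(paths ...string) Option {
	return func(c *backupConfig) { c.paths = append(c.paths, paths...) }
}

// newBackupConfig starts from defaults and applies each option in order.
func newBackupConfig(opts ...Option) *backupConfig {
	c := &backupConfig{rootDir: "/"}
	for _, opt := range opts {
		opt(c)
	}
	return c
}

func main() {
	c := newBackupConfig(
		WithRootDir("/tmp/example"),
		WithPaths("etc/example.conf", "var/lib/example"),
	)
	fmt.Println(c.rootDir, c.paths)
}
```

With this design, adding another source of paths means adding another `Option`. The signature of `newBackupConfig` stays the same, and existing callers keep working.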
jw4 | thumper: thanks! | 21:09 |
davecheney | ok, time for breakfast | 21:11 |
davecheney | then, REVIEWING! | 21:11 |
thumper | ericsnow: did you see that your branch failed early? | 21:21 |
ericsnow | thumper: yeah, fixing it now | 21:21 |
thumper | kk | 21:21 |
* thumper wants to land stuff | 21:21 | |
thumper | anastasiamac: I'm reviewing your mega branch | 21:52 |
thumper | anastasiamac: done | 21:58 |
ericsnow | anyone know how to force a CI test to run (functional-ha-backup-restore, specifically)? | 22:02 |
ericsnow | thumper: my fix landed but I hesitate to mark the bug as committed until the CI test (functional-ha-backup-restore) runs | 22:02 |
thumper | ericsnow: I think they happen automatically | 22:03 |
jw4 | fix committed won't clear CI til it's marked as fix released | 22:03 |
thumper | ericsnow: and marking the bug fix committed doesn't release the bot | 22:03 |
ericsnow | thumper: right | 22:03 |
ericsnow | thumper, jw4: thanks for clarifying | 22:03 |
perrito666 | interesting, Windows does not run updates as an atomic process | 22:42 |
* perrito666 has an install running on a machine next to him that installed 170 packages, failed on the 171st and just rolled back the whole thing... all of this took 4 hours | 22:43 | |
jw4 | thumper: https://bugs.launchpad.net/juju-core/+bugs?field.searchtext=&orderby=-importance&field.status%3Alist=NEW&field.status%3Alist=CONFIRMED&field.status%3Alist=TRIAGED&field.status%3Alist=INPROGRESS&field.status%3Alist=FIXCOMMITTED&field.status%3Alist=INCOMPLETE_WITH_RESPONSE&field.status%3Alist=INCOMPLETE_WITHOUT_RESPONSE&field.importance%3Alist=CRITICAL&field.tag=ci+regression+&field.tags_combinator=ALL | 22:44 |
jw4 | thumper: when that list is empty CI will be unblocked | 22:44 |
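The rule jw4 describes can be restated as a predicate: CI stays blocked while any bug matches the query in that search link. The sketch below uses a hypothetical `bug` struct rather than Launchpad's real API objects. The status, importance, and tag values are copied from the query string. `FIXRELEASED` appears only as an example of a status outside the open set:

```go
package main

import "fmt"

// bug is a hypothetical, trimmed-down view of a Launchpad bug task.
type bug struct {
	status     string
	importance string
	tags       []string
}

// openStatuses mirrors the status filter in the search link. "FIXCOMMITTED"
// is included, so a committed fix still blocks CI until it is released.
var openStatuses = map[string]bool{
	"NEW": true, "CONFIRMED": true, "TRIAGED": true, "INPROGRESS": true,
	"FIXCOMMITTED": true, "INCOMPLETE_WITH_RESPONSE": true,
	"INCOMPLETE_WITHOUT_RESPONSE": true,
}

func hasTag(b bug, tag string) bool {
	for _, t := range b.tags {
		if t == tag {
			return true
		}
	}
	return false
}

// ciBlocked reports whether any bug is CRITICAL, still open, and tagged with
// both "ci" and "regression" (the query uses tags_combinator=ALL).
func ciBlocked(bugs []bug) bool {
	for _, b := range bugs {
		if b.importance == "CRITICAL" && openStatuses[b.status] &&
			hasTag(b, "ci") && hasTag(b, "regression") {
			return true
		}
	}
	return false
}

func main() {
	bugs := []bug{{status: "FIXCOMMITTED", importance: "CRITICAL", tags: []string{"ci", "regression"}}}
	fmt.Println(ciBlocked(bugs)) // true: Fix Committed still blocks
	bugs[0].status = "FIXRELEASED"
	fmt.Println(ciBlocked(bugs)) // false
}
```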
* thumper sighs | 22:44 | |
* thumper waits | 22:44 | |
jw4 | thumper: I personally use https://api.launchpad.net/devel/juju-core?ws.op=searchTasks&status%3Alist=Triaged&status%3Alist=In+Progress&status%3Alist=Fix+Committed&importance%3Alist=Critical&tags%3Alist=regression&tags%3Alist=ci&tags_combinator=All | 22:44 |
thumper | impatiently | 22:44 |
thumper | handy link | 22:45 |
* jw4 has had code to land for a week now | 22:45 | |
perrito666 | davecheney: do you feel answered by my email? I was not sure if you answered the review with an email or added a review and then deleted it | 23:16 |
perrito666 | natefinch: fwereade_ ericsnow you should all have a mail with the shared draft for b&r specs | 23:17 |
=== kadams54-away is now known as kadams54 | ||
=== kadams54 is now known as kadams54-away | ||
anastasiamac | thumper: thnx for the review!!! | 23:31 |
wallyworld_ | davecheney: i think you're ocr? you able to take a look at http://reviews.vapour.ws/r/552/ for me? | 23:31 |
=== kadams54-away is now known as kadams54 | ||
davecheney | wallyworld_: looking | 23:54 |
davecheney | perrito666: could you summarise the state of play for me | 23:55 |
wallyworld_ | ty | 23:56 |
davecheney | i'm unclear whether there is a problem, or what it is | 23:56 |
davecheney | wallyworld_: on a scale of 1 to "don't be picky dave", what is the importance of this PR? | 23:58 |
wallyworld_ | davecheney: i would like to land but if there are issues, record them | 23:58 |
davecheney | understood | 23:58 |
davecheney | wallyworld_: URGH params.StateServingInfo | 23:59 |
davecheney | that fucking type | 23:59 |
davecheney | such a mess | 23:59 |
Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!