/srv/irclogs.ubuntu.com/2014/04/14/#juju-dev.txt

thumperdavecheney: does this bug still happen? https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/130416700:02
_mup_Bug #1304167: syntax error, trusty beta-2 cloud image <apparmor (Ubuntu):Confirmed> <https://launchpad.net/bugs/1304167>00:02
wallyworldthumper: maybe you were smoking weed when you wrote the email00:02
thumperdavecheney: seems like a quite major bug if so00:02
thumperwallyworld: nah...00:02
thumperalthough I am wondering if it would help00:02
wallyworldcouldn't hurt :-)00:02
thumperha00:02
davecheneythumper: would it be possible for you to log00:03
davecheney"%T", err00:04
thumperdavecheney: sure00:04
davecheneythanks00:04
davecheneythumper: yes, the bug is still open00:04
davecheneyit has screwed LXC on any platform that uses apparmor00:04
thumper:-(00:05
davecheneythumper: when you run the destroy-environment, you're not in that directory are you00:05
davecheneyie; mkdir /tmp/t00:06
davecheneycd /tmp/t00:06
davecheneyrmdir /tmp/t00:06
thumperdavecheney: no00:06
davecheneyok, just checking00:06
davecheneyhttp://gcc.gnu.org/releases.html00:07
davecheneygcc 4.9 released00:07
davecheneybut not really00:07
thumperif you destroy too close to bootstrap, you don't get it00:07
davecheneythumper: hmm ok00:08
thumperoh...00:08
thumperI think I know what it could be...00:08
davecheneythumper: hold pls00:09
thumperwhen we kill the machine agent with pkill00:09
thumperit cleans up after itself00:09
thumperwe then have a race00:09
davecheneythumper: right, so things are racing on the directory listing00:09
thumperthe agent is trying to remove some files00:09
thumperand then so does the destroy command00:10
davecheneyhttp://golang.org/src/pkg/os/error_unix.go00:10
davecheneyso is the agent removing ~/.juju/local ?00:10
davecheneyie it's not a file00:10
davecheneybut the top level directory itself ?00:10
davecheneyso os.RemoveAll goes to remove ~/.juju/local00:11
davecheneyand the whole thing has been deleted already ?00:11
thumpernot all of it...00:11
thumperbut some of it00:11
thumperoh...00:11
thumperyeah, sometimes all of it00:11
thumperyeah...00:11
thumperit does00:11
thumper*os.SyscallError00:12
thumperthey are racing to remove the datadir00:12
davecheneythumper: ok, that should be possible to make a repro00:13
davecheneyi'll do that while i'm waiting for gccgo to compile00:13
thumperdavecheney: what do you think should happen?00:13
davecheney10:12 < thumper> *os.SyscallError00:13
davecheney^ is that %T ?00:13
thumperyeah00:13
davecheneycheeky bugger00:14
davecheneythumper: leave it with me00:14
davecheneyraise an issue maybe00:14
davecheneyi need to make a repro00:14
thumperdavecheney: you see it as a golang bug?00:14
davecheneythumper: it won't fit through http://golang.org/src/pkg/os/error_unix.go00:15
davecheneyhttp://play.golang.org/p/mp5i8GFL4700:16
* davecheney goes to find out where that os.SyscallError is coming from00:16
davecheneythumper: for the moment you're going to have to code around it00:18
davecheneythis won't be fixed in 1.200:18
davecheneydir_unix.go00:18
davecheney41:                             return names, NewSyscallError("readdirent", errno)00:18
davecheneythis is where it's coming from00:18
* davecheney feels very depressed00:19
davecheneyit's just bugs, bugs, and more bugs00:19
thumperdavecheney: I'll work around it00:20
thumperdavecheney: we already ignore errors from two other things that we are racing with00:21
davecheneythumper: i'll get a repro quick smart00:30
davecheneyi can see where it happens00:30
waiganimorning davecheney.00:37
waiganidavecheney: when I run make check on vm I get the following: http://pastebin.ubuntu.com/724696800:37
waiganiany hints?00:37
waiganithumper: wip on jujud isolation: https://codereview.appspot.com/8713004500:38
waiganithumper: cmd/juju and environs/bootstrap are now passing00:38
waiganienvirons/sync is going to take a bit more thought00:39
waiganiand right now I'm too hungry to think00:39
davecheneywaigani: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/130475400:39
_mup_Bug #1304754: gccgo on ppc64el using split stacks when not supported <ppc64el> <trusty> <gccgo-4.9 (Ubuntu):Confirmed> <https://launchpad.net/bugs/1304754>00:39
waiganidavecheney: reading00:40
davecheneywaigani: short version00:43
davecheneydowngrading to an older kernel works around the problem00:43
davecheneybut isn't a fix00:43
waiganidavecheney: yep, thanks00:43
waiganiI neeeeed food. bbl00:44
davecheneythumper: if err, ok := err.(*os.SyscallError); ok { if os.IsNotExist(err.Err) { /* ignore */ } }00:46
davecheneyor something00:46
thumperaxw: just saw your answer too00:56
thumperaxw: however the error that is being returned isn't os.IsNotExist00:57
thumperaxw: as the race is being caught elsewhere00:57
axwthumper: ah, maybe in the Readdir then00:59
axwanyway, there's definitely a race, and you should ignore it I think00:59
davecheneythumper: lucky(~/devel/issue) % go run issue.go00:59
davecheney2014/04/14 10:58:58 creating temporary directories rooted at "/tmp/issue015782153"00:59
davecheney2014/04/14 10:58:59 preparing workers00:59
davecheney2014/04/14 10:58:59 release the swarm00:59
davecheney2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"00:59
thumperah... read-dir-int00:59
davecheney2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"00:59
davecheney2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"00:59
davecheney2014/04/14 10:58:59 unexpected error: *os.SyscallError, "readdirent: no such file or directory"00:59
thumpernot re-addir-int00:59
davecheneythumper: raising an issue00:59
thumperaxw: yeah, that's it00:59
thumperI couldn't parse the smashedtogetherwords01:00
mwhudsonheh i finally have results for waigani and he's gone01:05
mwhudsonbut i think his problem was actually the "things randomly die on ppc" bug...01:05
mwhudsonthings all in all don't look too bad on arm64 actually01:06
davecheneythumper: https://code.google.com/p/go/issues/detail?id=7776&thanks=7776&ts=139743769501:08
thumpermwhudson: \o/01:09
mwhudsonnot actually good01:09
mwhudsonjust not terrible01:09
davecheneymwhudson: /usr/include/features.h:374:25: fatal error: sys/cdefs.h: No such file or directory01:10
davecheneyany suggestions which package contains this header01:10
mwhudsonuh, no, looks basic though01:10
mwhudsonhm01:10
mwhudsondpkg -S sez libc6-dev-i38601:11
mwhudsonwhich seems a bit random01:11
davecheney% dpkg -S /usr/include/sys/cdefs.h01:11
davecheneylibc6-dev-i386: /usr/include/sys/cdefs.h01:11
davecheneyyeah01:11
mwhudsonah01:11
mwhudsonum01:11
davecheneymwhudson: this is compiling gcc 4.901:11
mwhudson"real" libc6-dev installs it to /usr/include/$triplet/sys/cdefs.h01:12
mwhudsondavecheney: from upstream or the deb?01:12
davecheneymwhudson: upstream01:12
davecheneymwhudson: our deb produces broken binaries01:12
mwhudsondavecheney: on powerpc64 i assume?01:13
mwhudsonum, that sounds like something doko should know about :)01:13
mwhudsonis this the split stack thing?01:13
davecheneyyup01:13
mwhudsoni guess libc6-dev-i386 must be some kind of pre-multiarch thing01:14
* davecheney tries patching in some of the arguments from /usr/bin/gcc -v01:14
mwhudsondavecheney: "dpkg --listfiles libc6-dev | grep cdefs.h" on your platform?01:14
davecheney$ dpkg --listfiles libc6-dev | grep cdefs.h01:15
davecheney/usr/include/powerpc64le-linux-gnu/sys/cdefs.h01:15
davecheneymaybe ./configure got the triplet wrong01:15
davecheneywell, i was wondering why this was such a good compile box01:16
davecheneyclock           : 4284.000000MHz01:16
davecheneyziiing01:16
davecheneygcc, just keep adding flags until it compiles01:21
davecheneynope01:22
davecheneystill broke01:22
davecheneyfuck this01:22
davecheneyi'm using symlinks01:22
davecheneywow. such multiarch01:34
davecheneymwhudson: ok, here is what I think01:40
davecheneygccgo on ppc is correctly detecting that split stacks are not supported01:40
davecheneyand using the default 'large' stack model01:40
davecheneybut .. the stack is still too small01:40
davecheneyi'm bt'in in gdb and at stack frame 1475 with no end in sight01:41
mwhudsonhaha01:42
mwhudsonok01:42
mwhudsonso stack overflow?01:42
mwhudsonhmm01:42
davecheneymake that stack frame 3,30001:42
mwhudsonis this on the altstack?  i.e. while handling a signal?01:42
davecheneyso, in summary, gccgo doesn't give a clean indication when you fall off the end of the stack01:42
davecheneymwhudson: nope, with split stacks disabled01:42
davecheneyyou get a c style stack per goroutine01:42
mwhudsondavecheney: that's not what i mean01:42
mwhudsonsure01:42
mwhudsonbut signals are handled on a different stack again01:43
mwhudson(sigaltstack and all that)01:43
mwhudsoni think those stacks are smaller?01:43
mwhudsonanyways01:43
davecheneymwhudson: i'm going to say, conditionally, yes01:43
davecheneymwhudson: the sig handler gets a SEGV01:43
mwhudsondavecheney: it's easy ish to make the stacks bigger i think01:43
davecheneyand it blames the topmost stack frame for hitting a nil01:43
mwhudsoni found the code that was allocating them01:44
davecheneywhen actually all it did was call a function01:44
mwhudsonyeah, well, if you fall off the end of the stack it's certainly going to break01:44
davecheneymwhudson: are you adding -fsplit-stack on aarch64 ?01:44
mwhudsondavecheney: no01:44
davecheneyshit, 5,000 stack frames01:44
davecheneyhow in gods name could juju use so much stack ...01:45
mwhudsoncould this "just" be application infinite recursion for some reason?01:45
mwhudsonor does the backtrace look reasonable?01:45
davecheneymwhudson: the latter01:46
davecheneymaybe a dozen frames01:46
davecheneythis is going to be an 8mb stack01:46
davecheney18,000 stack frames01:46
davecheney#31380 0x000000001000522c in main.count ()01:47
davecheney#31381 0x0000000010005854 in main.main ()01:47
_mup_Bug #31381: POMsgSet.active_texts assumes POFile.pluralforms is an int <lp-translations> <oops> <Launchpad itself:Fix Released by matsubara> <https://launchpad.net/bugs/31381>01:47
_mup_Bug #31380: source package sort by version doesn't cope with invalid version numbers <lp-foundations> <oops> <Launchpad itself:Fix Released by kiko> <https://launchpad.net/bugs/31380>01:47
mwhudsonthat doesn't sound reasonable01:47
mwhudsonlolmup01:47
davecheney#-101:47
mwhudsonalthough, eh, i guess it works well enough on platforms that do have split stacks01:47
davecheneymwhudson: most gccgo developers are on amd6401:48
davecheneywhen I say most01:48
davecheneyi mean01:48
mwhudsonall 1 of them?01:48
davecheneyeveryone except you and me and some neckbeard using mips01:48
mwhudsonstrange this doesn't happen on arm64 though01:49
* davecheney goes to talk to ian taylor01:49
davecheneymwhudson: gccgo src/test/peano.go01:49
davecheney./a.out01:49
mwhudsoni wouldn't have thought that stack frames would be much bigger on that01:49
mwhudsonwell yes, that fails on arm64 too01:49
davecheneyi wonder if it is unrelated01:49
davecheneythat gives a straight segfault01:49
davecheneyand the go handler doesn't catch it01:49
davecheneyi wonder if we're barking up the wrong tree01:49
davecheneymwhudson: i'm thinking these are two different issues01:53
davecheney[492932.974051] a.out[25065]: bad frame in setup_rt_frame: 000000c20ffaf0e0 nip 0000000010004e0c lr 00000000100051fc01:53
davecheney^ this is what running off the stack looks like01:53
davecheneynote nip01:53
davecheney[2028013.988376] jujud[400]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 000000000000000001:54
davecheney^ this is what a juju segfault on a bad kernel looks like01:54
davecheneynip and lr are 001:54
davecheneysomething branched to 0 and nuked the lr for good measure01:54
mwhudsonwell, once you have a disagreement over whether a bit of memory is stack or not, it's not exactly predictable what happens next01:55
davecheneytrue01:55
davecheneybut why is the ip 001:55
davecheneyboth cases this is unmapped memory01:55
mwhudsonbecause something stomped over the link register on the stack, so it branched to lala land when trying to do a procedure return?01:56
mwhudsoni don't know the ppc abi but i certainly saw that sort of thing a lot on arm6401:56
davecheneymwhudson: anything with a LR is probably going to act the same01:56
mwhudsonalso01:57
davecheneymwhudson: ok, so if we're not running off the end of the stack01:57
davecheneyand i'm pretty sure we're not01:57
davecheneythen why does the size of the kernel page size affect the result01:57
davecheney$ pmap -x 96902:25
davecheney969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug02:25
davecheneyAddress           Kbytes     RSS   Dirty Mode  Mapping02:25
davecheneytotal kB               0       0       002:25
davecheney---------------- ------- ------- -------02:25
davecheneywell, thanks02:25
davecheneythumper: juju status returns 0 if there are hook errors02:28
davecheneyaxw: sorry, maybe this question is best addressed to you02:28
axwis that a problem?02:30
davecheneyaxw: dunno02:30
davecheneydepends what we've promised status will do02:30
davecheneyi know that people want to be able to say 'is this environment ok'02:30
davecheney$ pmap -x 96902:30
davecheney969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug02:30
davecheneyAddress           Kbytes     RSS   Dirty Mode  Mapping02:30
davecheneysorry02:30
davecheney---------------- ------- ------- -------02:31
davecheney$ pmap -x 96902:31
davecheney969:   /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug02:31
davecheneyAddress           Kbytes     RSS   Dirty Mode  Mapping02:31
davecheneyoh for fucks sake02:31
davecheney---------------- ------- ------- -------02:31
davecheneytotal kB               0       0       002:31
davecheney  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND02:31
davecheney  969 root      20   0 1413376 515456  19136 S   9.6  6.2   0:18.51 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug02:31
axwyeah I can see the use case, but AFAIK it always just returned 002:31
davecheneyaxw: i think this might be related02:31
davecheneyheavy use of the api server causes RES to rise02:31
davecheneyoh god02:36
davecheneyi hate everything02:36
davecheneyupstart isn't logging the stderr of jujud-machine-002:36
davecheney:cry: SIGQUIT doesn't do what I think on gccgo02:41
thumperwallyworld: hangout died03:06
thumperwallyworld, axw, waigani: I figured I was done anyway :-)03:06
axwthumper: will take a look at your CL after I finish up on this HA thing03:06
thumperaxw: ack03:06
thumperaxw: I first read that as "hating"03:07
axwheh03:07
thumpermade me chuckle03:07
* thumper goes for a brief lie down before his head explodes03:07
waiganiwallyworld: I found the mockable BuildToolsTarball, what was the other one? bundleTools?03:13
wallyworldyeah BundleTools03:13
wallyworldin environs/tools03:13
waiganithat isn't mockable?03:13
waiganienviron/tools/build.go:20503:14
wallyworldyou just need to introduce a var03:14
wallyworldmake the method lower case03:14
wallyworldmake the var upper case03:14
waiganiah sure, make it mockable - no problem03:15
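The "introduce a var" pattern described above can be sketched roughly as follows. This is an illustration only: the signature of `bundleTools`, the `upload` caller and the `patchBundleTools` helper are all assumptions made for the example and do not reflect the real code in environs/tools.

```go
package main

import "fmt"

// bundleTools is the real implementation, now lower case (unexported).
// The signature here is invented for the sketch.
func bundleTools(dest string) (string, error) {
	return "real tools bundled into " + dest, nil
}

// BundleTools is the upper-case package variable that callers use.
// Tests can point it at a fake for the duration of a test.
var BundleTools = bundleTools

// upload stands in for any caller that goes through the variable.
func upload(dest string) (string, error) {
	return BundleTools(dest)
}

// patchBundleTools swaps in a fake and returns a function that restores
// the original. (Hypothetical test helper.)
func patchBundleTools(fake func(string) (string, error)) (restore func()) {
	orig := BundleTools
	BundleTools = fake
	return func() { BundleTools = orig }
}

func main() {
	restore := patchBundleTools(func(dest string) (string, error) {
		return "fake tools for " + dest, nil
	})
	out, _ := upload("/tmp/x")
	fmt.Println(out)

	restore()
	out, _ = upload("/tmp/x")
	fmt.Println(out)
}
```

Because callers only ever see the variable, production code is unchanged while a test can replace the behaviour and put it back afterwards.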
davecheneymwhudson: https://bugs.launchpad.net/juju-core/+bug/130728203:19
_mup_Bug #1307282: cmd/jujud: gccgo api server consumes ~500mb of ram on machine-0 <gccgo> <ppc64el> <juju-core:Triaged> <https://launchpad.net/bugs/1307282>03:19
davecheneyERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)03:22
davecheneydid this get fixed ?03:22
davecheneywaigani: can you send me `uname -a` from your vm ?03:24
waiganidavecheney: Linux winton-09 3.13.0-24-generic #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 ppc64le ppc64le ppc64le GNU/Linux03:24
davecheneywaigani: interesting03:26
davecheneyi'm trying a -24 kernel and I can't get it to crash03:26
davecheneywaigani: did you just upgrade to that kernel ?03:26
waiganihmmm03:26
davecheneywaigani: uptime03:27
waiganidavecheney:  03:27:40 up  1:09,  2 users,  load average: 0.00, 0.01, 0.0503:27
waiganiI did a restart, to see if that helped at all03:28
waiganiran make check after, same problem03:28
davecheneywaigani: ok03:28
davecheneythanks, that makes it concrete03:28
davecheneydmesg03:28
davecheney^^03:28
waiganidavecheney: http://pastebin.ubuntu.com/7247924/03:29
davecheneywaigani: ta03:29
davecheneyi should have said03:29
davecheneydmesg | tail03:29
davecheneywaigani: could I ask you to check again03:29
waiganidavecheney: http://pastebin.ubuntu.com/7247927/03:30
davecheneysorry03:30
davecheneythe test03:30
davecheneynot the dmesg03:30
waiganiah right03:30
davecheneywhat i'm looking for is a line like03:30
davecheney(no worries, this was my fault)03:30
davecheney11:54 < davecheney> [2028013.988376] jujud[400]: bad frame in setup_rt_frame: 0000000000000000 nip 0000000000000000 lr 000000000000000003:30
davecheney^ should see something like this03:30
waiganiokay, I'll paste when done and keep an eye out for a line like that03:31
davecheneywaigani: can you ssh-import-id dave-cheney on your vm03:39
davecheneyso I can stooge around your /var/log/03:39
davecheneyand see what kernel you were running before reboot03:39
waiganidavecheney: already done, your public key is on the vm03:40
davecheneydanka03:41
davecheneywaigani: i have a theory that -24 kernel fixes the issue03:41
davecheneyit's not much of a theory atm03:41
waiganidavecheney: http://pastebin.ubuntu.com/7247954/03:41
waiganidavecheney: I have a theory that I did something stupid03:42
waiganinot so much a theory as a constant axiom03:42
davecheneywaigani: ubuntu@winton-09:/var/log$ grep '\-generic' dmesg.0 dmesg03:43
davecheneydmesg.0:[    0.000000] Linux version 3.13.0-20-generic (buildd@denneed04) (gcc version 4.8.2 (Ubuntu 4.8.2-17ubuntu1) ) #42-Ubuntu SMP Fri Mar 28 09:55:49 UTC 2014 (Ubuntu 3.13.0-20.42-generic 3.13.7)03:43
davecheneydmesg.0:[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-3.13.0-20-generic root=UUID=30486aa4-f767-4397-ab88-dd0e02e66651 ro console=hvc0 earlyprintk03:43
davecheneydmesg:[    0.000000] Linux version 3.13.0-24-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #46-Ubuntu SMP Thu Apr 10 19:09:21 UTC 2014 (Ubuntu 3.13.0-24.46-generic 3.13.9)03:43
davecheneydmesg:[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-3.13.0-24-generic root=UUID=30486aa4-f767-4397-ab88-dd0e02e66651 ro console=hvc0 earlyprintk03:43
davecheneylooks like you were running -20, then you got -24 when you rebooted03:44
davecheneywaigani: dmesg ?03:44
waiganidavecheney: http://pastebin.ubuntu.com/7247958/03:45
waiganisorry, is that what you meant?03:45
davecheneywaigani: yup03:45
davecheneyinteresting03:45
davecheneyall previous panics of this class leave a message in dmesg03:45
davecheneyok, there could be two unrelated issues03:46
davecheneywaigani: could you log a bug for http://pastebin.ubuntu.com/7247954/03:46
davecheneytag it gccgo ppc64el03:46
waiganidavecheney: yep, gladly :)03:46
davecheneywaigani: ta03:47
waiganidavecheney: I'll just double check that I have not done something stupid in the code. It *should* be latest trunk03:47
davecheneywaigani: nah03:47
davecheneythis isn't you03:47
davecheneythe panic is happening in /usr/bin/go03:47
davecheneyif you want to investigate03:47
davecheneyapt-get source gccgo-go03:48
waiganiright, that is what stumps me03:48
davecheneythen have a look at that line in build.go03:48
davecheneywaigani: i ran into that about a week ago03:48
davecheneythat was when the floor fell out from under me03:48
waiganilol03:48
waiganiyep, I know that one03:48
waiganidavecheney: https://bugs.launchpad.net/juju-core/+bug/130728903:53
_mup_Bug #1307289: Go panics when running tests on ppc64 <gccgo> <ppc64el> <juju-core:New> <https://launchpad.net/bugs/1307289>03:53
davecheneywaigani: jolly good03:53
davecheneyaxw: ERROR loaded invalid environment configuration: storage-port: expected int, got float64(8040)03:57
davecheney^ did this get fixed recently03:57
davecheneyor should I log a bug03:57
thumperaxw: do you really think that two filtering methods is better than one with a bool?03:58
thumperaxw: I'll write it and look at the diff03:58
axwthumper: I really do. With that approach you can see without a doubt that nothing can change the behaviour at runtime; with the bool you need to ensure that nothing changes it03:59
thumperok04:00
axwdavecheney: wallyworld fixed that already I think04:01
davecheneyaxw: right04:01
davecheneythis is 1.17.8 (ish)04:01
axwyeah, fixed in 1.18.1 I believe04:01
wallyworldyeah, fixed in trunk04:01
davecheneyi think I saw a branch last week04:01
davecheneyright o04:01
thumperaxw: like this http://paste.ubuntu.com/7247988/ ?04:04
axwthumper: yup04:05
axwthumper: comment on countedFilterLine needs fixing04:05
davecheneyping jam ?04:49
jamdavecheney: /wave04:50
davecheneyjam: i think we're eating the elephant from different ends04:50
davecheneywrt to the api server memory usage04:50
jamI'm not sure I understand04:51
jamthumper: I'm around whenever you would like to hangout04:52
davecheneyjam: ok04:52
davecheneyin trying to trace down the panics i'm seeing04:52
davecheneyi've sort of discovered just how much memory jujud consumes04:52
davecheneyit's horrific04:52
jammy initial results showed about 0.5MB per agent, which wasn't great, but wasn't terrible. but when something gets into a bad situation, I see memory spike terribly04:53
davecheneyjam: gccgo04:53
davecheneyit's more like 250mb per agent04:54
davecheneytwo agents per machine04:54
davecheneyat a minimum04:54
jamwow...04:54
davecheneyits complicated04:54
jamthat's way different04:54
davecheneygccgo when not using split stacks04:54
davecheneyallocates an 8 mb stack from the heap04:54
davecheneyso that puts the heap under a lot of pressure04:54
davecheneyeven if large amounts of that 8mb stack remain uncommitted04:55
jamyeah, 8MB per goroutine would be really bad for how much we use it04:55
davecheneyi'm also seeing strange things that make me think when a client disconnects04:55
jamso I think we have a bug that if a client disconnects in a bad way, it cascades into causing an APIServer restart, but I haven't tracked down the exact issues yet.04:56
davecheneywe're not releasing all the server side resources used by the client04:56
davecheneyin my test04:56
davecheney3 machines04:56
davecheneyon the manual provider04:56
jamIt might just be that it leaves resources behind, right04:56
davecheneykilling the agents on the service units04:56
davecheneycauses memory usage to almost double04:56
davecheneywith gc and 8k stacks, you won't feel a few leaked goroutines04:57
davecheneywith 8mb stacks04:57
davecheneyyup, you'll feel it04:57
davecheney$ grep -c goroutine /tmp/out04:58
davecheney24704:58
davecheney^ starts at 169 for 4 agents04:58
davecheneyafter a few restarts of the agents we're up to 24704:58
thumperjam: ok, with you in 1m04:59
davecheneyaxw: http://paste.ubuntu.com/7248220/05:43
davecheneyi don't get it05:43
davecheneyi did destroy-machine as requiested05:43
davecheneythe agents are stopped05:43
davecheneybut I can't destroy the environment05:43
axwdavecheney: umm05:48
axwdavecheney: if they never disappear from state, seems that's a bug. but you can do destroy-machine --force to clean up manually05:49
davecheneyaxw: right05:50
jamthumper: the connection seems to have died05:51
thumperjam: google tells me my connectivity is experiencing issues05:52
=== vladk|offline is now known as vladk
davecheneyaxw: --force doesn't give me any love05:57
axwdavecheney: did it return an error or anything?06:00
axwor just silence?06:00
davecheneysilence06:02
axwdavecheney: the provisioner should remove the machine from state when it's dead... it's entirely possible that someone has changed the provisioner so that it doesn't work with manual anymore06:02
axwwe need a "no provider left behind" act06:03
* davecheney reaches for rm 06:04
axwdavecheney: destroy-environment --force should work as a last resort, if all the machines really are cleaned up06:04
davecheneyok, some good news, 3.13.0-24 may fix the issue06:25
davecheneyoh06:26
davecheneynope06:26
davecheneyhmm06:26
davecheneyhard to tell06:26
davecheneyneed more information06:26
=== rogpeppe1 is now known as rogpeppe
rogpeppemornin' all06:43
jammorning rogpeppe06:43
davecheney'moin06:43
rogpeppejam: hiya06:44
rogpeppedavecheney: yo!06:44
axwmorning rogpeppe06:55
rogpeppeaxw: hiya06:55
axwrogpeppe: landed the EnsureAvailability MP. is there something else you'd like me to look at now?06:55
davecheneyevening06:56
rogpeppeaxw: there is one thing that would be awesome if we could do06:56
rogpeppeaxw: currently we can't upgrade to a HA environment06:56
axwok06:57
rogpeppeaxw: because there is no mongo user configured on the admin database06:57
rogpeppeaxw: we need to change EnsureMongoServer to add one06:57
rogpeppeaxw: (if necessary)06:57
axwrogpeppe: I guess there's a tonne of other things that need to be done for upgrades too, though? like rewriting mongo scripts? or has nate done that already?06:58
rogpeppeaxw: the mongo upstart script is already written when necessary (well, actually, it's been disabled for the moment, pending this)06:59
axwI see06:59
axwok, I will take a look06:59
rogpeppeaxw: to add the admin user, while the service is stopped, we need to start the mongod in non-authenticated mode06:59
rogpeppeaxw: then add the admin user in that mode06:59
axwthanks07:00
rogpeppeaxw: before tearing mongod down again and starting it up normally07:00
rogpeppeaxw: i did manually verify that that does actually work, but i'm afraid i can't remember the exact steps i used07:00
wallyworldrogpeppe: hiya, i have a reflection question for you if you have a moment07:11
rogpeppewallyworld: sure07:11
wallyworldi have a reflect.Value07:12
wallyworldi want to create a nil value pointer07:12
wallyworldeg reflect.ValueOf((*string)(nil))07:12
wallyworldif it were for a *string07:12
wallyworldbut i want to do it dynamically07:12
wallyworldreflect.New(val.Type().Elem()) gives me a pointer to a zero value07:13
wallyworldbut i want a pointer to nil that i can use with value.Set()07:13
wallyworldmake sense?07:13
rogpeppewallyworld: what would the code in normal Go look like? use T for the type of the value07:13
wallyworldvar foo *T07:14
wallyworldfoo = nil07:14
wallyworldfoo is a field of a struct07:14
wallyworldi have it working using a switch on the field Kind and using reflect.ValueOf((*string)(nil))07:15
wallyworldbut i want to do it without that07:15
rogpeppewallyworld: so you want a nil value of the same type as a pointer to the type of the field?07:15
wallyworldyeah, i think so, so that a call to value.Set() works07:16
rogpeppewallyworld: do you want to actually set the value of the field in the struct?07:16
wallyworldyep07:16
rogpeppewallyworld: i don't think you want a pointer, in that case07:17
wallyworldreflect.ValueOf(*mystruct).Elem().FieldByName(fieldName) is what i use to get the value07:17
rogpeppewallyworld: right, well you can just call Set on the result of that07:17
wallyworldso if val is the result of the above07:17
wallyworldi call Set() yes07:17
wallyworldbut i can't find out what to pass to Set()07:17
rogpeppewallyworld: a reflect.Value of the same type as the field...07:18
wallyworldreflect.New(val.Type().Elem())  gives a pointer to "" for example07:18
wallyworldi want to do it dynamically07:18
rogpeppewallyworld: are you just trying to set the field to nil?07:18
wallyworldyes07:18
wallyworldi thought i'd need value.Set()07:19
rogpeppewallyworld: val := reflect.ValueOf(mystructptr).Elem().FieldByName(fieldName); val.Set(reflect.Zero(val.Type()))07:20
jamdimitern: morning. We can do a 1:1 if you would like, though officially that's natefinch's responsibility now.07:20
wallyworldbut reflect.Zero() gives me "" doesn't it?07:20
wallyworldrogpeppe: ah, it seems to have worked07:21
wallyworldthank you. for some reason i was thinking reflect.Zero() would give me the wrong thing07:21
jamwallyworld: you did "reflect.New(val.Type().Elem())"07:22
jamnote that Elem is an element of the pointed to type07:22
jamvs07:22
jamreflect.New(val.Type())07:22
jamval.Type() is a pointer, val.Type().Elem() is the actual object07:22
jamand the Zero of a pointer is nil07:22
jamthe Zero of a string is ""07:22
wallyworldah ffs, stupid mistake, thanks07:22
jam(11:18:25 AM) wallyworld: reflect.New(val.Type().Elem())  gives a pointer to "" for example07:22
rogpeppeyeah, New is exactly equivalent to the language primitive "new"07:24
dimiternjam, oh is that so07:30
dimiternjam, well, i can join the regular meeting?07:30
jam1fwereade: looks like we made our N^2 problem with CharmURL worse in 1.18 because of the changes to Upgrade now watching the machine's agent version.08:23
jam1This one may not matter *quite* as much in practice, if you aren't deploying multiple units to machines.08:23
jam1But in my sim tests, we wake up the Upgrader even more often than we wake up the CharmURL08:24
fwereadejam1, ha08:53
fwereadejam1, yeah, I think we write something extra to the machine doc now -- dimitern, do I recall correctly?08:54
fwereadedimitern, btw can we please undo those errors changes? I added a note to the review but it was already landed ofc08:54
jam1fwereade: well, we also wake up every 15 min because the instance poller claims the machine has a new address08:54
fwereadejam1, yeah, indeed08:55
dimiternfwereade, I'm working on that now as a follow-up08:55
fwereadejam1, I cannot figure out how to schedule those sorts of fixes though -- unless we carve out X% of time for paying down tech debt and classify it as that08:56
jam1fwereade: well, if we have a client that wants us to scale to 10000 units, we can bill them for it, as well08:56
jam1fwereade: ATM, I'm mostly focused on "this is where we're at"08:56
fwereadejam1, I guess :)08:58
fwereadejam1, clarity on that front is indeed helpful08:58
jam1fwereade: "juju status" with 10k machines actually is doing ok performance wise, but nobody wants 10,000 lines of output08:59
fwereadejam1, indeed08:59
jam1so there are quite a few things that would need tweaking to scale to that level08:59
jam1fwereade: though for *testing* purposes, the N^2 stuff bites me in the ass a lot. 'juju add-unit" to add another 100 units each to 19 machines takes: 200s, 400s, 1200s, 2800s, and I'll let you know when it finishes seconds.09:00
rogpeppejam1: 1-1?09:01
fwereadejam1, yeah -- I kinda feel like those sorts of issues are... they should work properly *now*09:01
jam1rogpeppe: I just need to switch machines, 1 sec09:01
fwereadejam1, but, ehh, prioritisation :/09:02
jamdimitern: so it looks like Canonical admin got it backwards, its actually you on my team and roger's on nate's team.09:03
jamdimitern: so I think everyone is still on the same standup for now09:04
dimiternjam, what team am i supposed to be on?09:06
jamdimitern: so looking at Alexis's email about Nate and Ian, you're on my team09:10
dimiternjam, yeah, I thought so09:12
rogpeppejam: you've frozen...09:56
jamrogpeppe: I got logged out of my google account somehow09:56
jamend of month?09:56
rogpeppejam: perhaps09:56
perrito666morning09:58
jammgz: 1:1? (just running to the restroom myself)10:00
mgzsure, I'll wait for you there10:00
mgz...the hangout, not the restroom10:01
waiganiwallyworld: I can get TestUpgradeJujuWithRealUpload to pass by patching sync.BuildToolsTarball but not when I patch envtools.BundleTools10:07
waiganiwallyworld: here is my attempt at mocking out bundleTools: http://pastebin.ubuntu.com/7248910/10:08
wallyworldwhat is the error?10:08
waiganiwallyworld: ... and http://pastebin.ubuntu.com/7248930/10:08
waiganiwallyworld: error uploading tools: no tools uploaded10:08
wallyworldwaigani: why is the bundle tools mock uploading tools as well?10:12
wallyworldit shouldn't be doing that10:12
waiganiwallyworld: good question! I just read the logic, let me give that another go ...10:13
wallyworldthat is my guess as to what the error is, as there would be no metadata or anything10:13
waiganiwallyworld: I basically ripped the logic out of BuildToolsTarball10:13
wallyworldupload tools needs the tarball and also metadata10:13
waiganiwallyworld: right, let me try again10:14
jamespageevilnickveitch, the links on https://juju.ubuntu.com/docs/ looked foobared to me - are you aware?10:30
evilnickveitchjamespage, ooh. they were working yesterday. let me have a look10:30
perrito666fwereade: morning, are you around?10:31
evilnickveitchjamespage, hmm. seem to be working for me - was there a particular page or link that wasn't working for you?10:32
jamespageevilnickveitch, the links on the lhs of the page don't appear for me10:33
evilnickveitchjamespage, the links are pasted in by a bit of javascript at the end of the page10:33
evilnickveitchso either the js isn't loading10:33
evilnickveitchbecause something is messed up on that page, or the page isn't loaded10:34
jamespagehmm10:34
evilnickveitchhave you tried refreshing etc?10:34
evilnickveitchare you sure page has finished loading? some external assets take a while to load sometimes, and the link JS is right at the end10:35
TheMueevilnickveitch: quick test here on FF shows no links either10:37
TheMueevilnickveitch: jamespage is right10:37
evilnickveitchTheMue, jamespage okay, I guess mine was fetching from cache. i will check into it10:38
evilnickveitchTheMue, jamespage okay, I found the problem, some wonky HTML which prevents the rest of the page loading, it's only on the front page, the others should work fine10:43
evilnickveitchI will fix it ASAP10:43
TheMueevilnickveitch: Great, thanks.10:44
mgzevilnickveitch: do you not validate? :P10:45
evilnickveitchmgz, it was the stupid linter that caused the problem :P10:46
mgzevilnickveitch: :D10:46
fwereadeperrito666, sorry, I completely missed you there10:47
perrito666fwereade: happens :)10:55
perrito666fwereade: still missing the transaction hooks tests but https://codereview.appspot.com/86430043 I did ignore some of your comments because they broke functionality :) but I am willing to re-try once I make sure this goes the right way (although my assert is either broken or surfacing an existing error that was not being discovered, because I am failing 5 tests) https://codereview.appspot.com/8643004310:57
fwereadeperrito666, cheers, I'll take a look10:58
waiganiwallyworld: I exported tools.archive: http://pastebin.ubuntu.com/7249092 (tests pass now)10:59
wallyworldgreat, in standup, will look later10:59
waiganiah11:00
wallyworldhad a quick look, looks nice and simple11:00
wallyworldlike i'd hoped11:00
waiganiyeah, just hope it's okay that I've made Archive public - adding noise to the API?11:01
waiganianyway, I'll leave it for the review11:01
fwereadeperrito666, rogpeppe has a deepcopy package that may help with cloning11:04
rogpeppefwereade, perrito666: it doesn't work any more11:04
fwereaderogpeppe, bah11:04
perrito666fwereade: ah, might be much better than the by-hand copy I am doing there...11:04
perrito666rogpeppe: :(11:04
rogpeppeit was trying to be too clever11:04
rogpeppeperrito666: what are you copying?11:05
perrito666rogpeppe: units and machines11:05
rogpeppeperrito666: why?11:05
rogpeppeperrito666: is it just for testing?11:06
perrito666rogpeppe: sorry I was listening on the other side :) no, not just for testing11:06
perrito666trying to get a copy that is guaranteed not to change while I am working on it in certain circumstances (I am just making a method of something previously done by hand)11:07
rogpeppeperrito666: what are you actually trying to do?11:27
fwereaderogpeppe, clone state.Machine/Unit -- I commented that it'd be nice to do it properly11:28
rogpeppefwereade: ah11:28
fwereaderogpeppe, there are a few places we do it in varyingly hackish ways iirc11:28
evilnickveitchTheMue, jamespage docs should be working now11:29
jamespageevilnickveitch, ta - next question - do release notes get published on /docs ?11:29
evilnickveitchjamespage, very good question - not as yet, but I do have a branch that will add them to the reference section. At least for the ones I can find11:30
evilnickveitchCheck back after 7.30pm11:30
jamespageevilnickveitch, its something that ceph does quite well upstream11:31
evilnickveitchjamespage, cool, I will check out what they do. I was just intending to dump them all in newest first order with an index of links at the top11:31
rogpeppefwereade, perrito666: two thoughts: 1) we could probably avoid doing a deep copy of the machineDoc, as we don't allow mutation of pieces inside its components11:32
fwereaderogpeppe, I suspect that statement is only mostly accurate11:33
rogpeppefwereade, perrito666: 2) if we decided to, it would be easy (but not greatly efficient) to clone by serialising/deserialising through bson11:33
fwereaderogpeppe, perrito666: ha, I could live with that11:34
rogpeppefwereade: tbh i think it's reasonable to have methods that return mutable values with a stipulation that you should not modify the contents11:35
rogpeppefwereade: (i presume you're thinking about the Jobs method here)11:36
rogpeppefwereade: if we did that, then Clone could be ultra cheap11:38
mgzrogpeppe: got around to finishing the last few test failures: https://codereview.appspot.com/8754004311:41
rogpeppemgz: thanks. looking.11:42
rogpeppemgz: LGTM11:43
mgzrogpeppe: thanks!11:43
rogpeppeoops, upgrade-juju seems to have killed its own environment11:48
* rogpeppe hates it when that happens11:49
rogpeppehmm, this is the second time this morning i've had a live bootstrap fail with this error:11:53
rogpeppe2014-04-14 11:53:05 ERROR juju.cmd supercommand.go:299 cannot write file "tools/releases/juju-1.19.0.1-precise-amd64.tgz" to control bucket: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.11:53
=== psivaa is now known as psivaa-lunch
fwereaderogpeppe, perrito666: if the clone were an internal method with "do not modify result" I'd be fine12:15
fwereaderogpeppe, perrito666: if it's exported there's just way too much opportunity to screw up at a distance12:16
rogpeppefwereade: i'm thinking of the Jobs method only12:16
fwereaderogpeppe, does Jobs not copy? it should ;p12:16
* perrito666 sees another ball coming his way :p12:17
rogpeppefwereade: well, if Jobs copies, then why do we need a deep copy of the machine doc?12:17
fwereaderogpeppe, plenty of methods mutate bits and pieces of state12:17
rogpeppefwereade: (it doesn't, BTW, but it probably should)12:17
rogpeppefwereade: which methods mutate stuff that's pointed to by the machine doc, rather than machine doc fields themselves?12:18
fwereaderogpeppe, and considering current cases misses the point; things will change and if we expose this functionality without insulating the objects from one another we *will* screw it up12:18
rogpeppefwereade: if we make all methods return copies of the underlying data, what is there to screw up?12:19
rogpeppefwereade: AFAICS this should be fine: func (m *Machine) Clone() *Machine { m1 := *m; return &m1}12:22
fwereaderogpeppe, various methods write to the document on success12:23
rogpeppefwereade: that's fine12:23
rogpeppefwereade: the document is stored in the Machine as a value type. as long as none of our Machine methods change things that are pointed to by things in the machine doc, we're ok12:24
fwereaderogpeppe, we can be sure that none of them will ever change, say, a slice on the document?12:24
rogpeppefwereade: that's not a hard invariant to maintain (it's local)12:24
rogpeppefwereade: we can be sure they don't now, and it's not hard to verify that in the future12:24
fwereaderogpeppe, my experience is that it's a very difficult invariant to maintain, even with a team of ultra-smart people half the size of this one12:25
rogpeppefwereade: i think it's better than adding to memory pressure and writing a bunch more code that needs to be maintained every time a field is changed.12:25
fwereaderogpeppe, if we're exporting a Clone method, that clone method must deep-copy the data12:26
fwereaderogpeppe, if it's not exported I'm willing to be a bit laxer12:26
fwereaderogpeppe, not because it won't screw us, because it *will*12:27
fwereaderogpeppe, but because at least the scope of the weirdness will be small enough that we'll have a chance of dealing with it12:27
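The aliasing concern in the Clone discussion above can be sketched in a few lines. This is only an illustration under stated assumptions: `machineDoc`, its fields, the job strings, and the `ShallowClone`/`DeepClone` names are hypothetical stand-ins, not the actual juju state types. The struct copy suggested earlier is safe for plain fields, but a slice inside the document still shares its backing array with the original.

```go
package main

import "fmt"

// machineDoc is a hypothetical stand-in for the document held by a Machine.
type machineDoc struct {
	ID   string
	Jobs []string
}

// Machine stores its document by value, as described in the discussion.
type Machine struct {
	doc machineDoc
}

// ShallowClone copies the struct only; doc.Jobs still shares its backing
// array with the original.
func (m *Machine) ShallowClone() *Machine {
	m1 := *m
	return &m1
}

// DeepClone additionally copies the slice, so the clone is independent.
func (m *Machine) DeepClone() *Machine {
	m1 := *m
	m1.doc.Jobs = append([]string(nil), m.doc.Jobs...)
	return &m1
}

func main() {
	// The job names here are illustrative strings only.
	orig := &Machine{doc: machineDoc{ID: "0", Jobs: []string{"host-units"}}}
	shallow := orig.ShallowClone()
	deep := orig.DeepClone()

	// An in-place write to a slice element leaks into the shallow clone.
	orig.doc.Jobs[0] = "manage-environ"

	fmt.Println(shallow.doc.Jobs[0]) // manage-environ
	fmt.Println(deep.doc.Jobs[0])    // host-units
}
```

The shallow version stays correct only while no method ever writes through a slice, map, or pointer reachable from the document, which is the invariant being debated.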
rogpeppefwereade: tbh, i would prefer us to make Machine etc immutable12:27
rogpeppefwereade: i don't think we gain much by having methods mutate our local idea of state12:27
fwereaderogpeppe, that's probably a reasonable position, especially considering current usage, but it's not really on the table at the moment12:29
fwereaderogpeppe, in terms of potentially fiddly changes, errgo has a much bigger payoff ;p12:29
* fwereade needs to go to the airport, hadn't realised he was flying so early12:29
* fwereade will say hi again this evening if he can12:29
natefinchrogpeppe: gotta help with my daughter for a bit, probably be 45-60 mins12:29
rogpeppefwereade: where are you going?12:30
rogpeppenatefinch: ok12:30
perrito666rogpeppe: we'll have to wait until he returns to know :p12:30
rogpeppeha, i have a machine where provisioning failed (amazon says "Server.InternalError: Internal error on launch") but i can't call retry-provisioning because the machine isn't in an error state12:45
dimiternrogpeppe, mgz, errors package improvements - https://codereview.appspot.com/87560043 - it's a bit big, but most changes are renames12:52
mgz...scary12:53
rogpeppedimitern: the Suffix field looks like it's not used - is it?12:56
rogpeppedimitern: similarly ArgsConstructor doesn't appear to be used13:01
dimiternrogpeppe, it's used in tests only13:01
rogpeppedimitern: right. i'm not sure we need to pollute the production code with test-specific functionality.13:02
dimiternrogpeppe, allErrors is unexported - how does it pollute?13:02
rogpeppedimitern: it makes the code more complex13:02
dimiternrogpeppe, so you're saying let's have 2 almost identical []struct{} defined - one for testing, the other for production?13:03
rogpeppedimitern: i don't think you need the table at all in the production code - i'm just writing up a suggestion13:03
dimiternrogpeppe, if it stays like this there's less chance of forgetting to add a new error type to allErrors and have it tested13:04
dimiternrogpeppe, ok, thanks13:04
sinzuijamespage, Do you know who I can show Bug #1305280 to get an apparmor issue addressed?13:06
_mup_Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32  <armhf> <local-provider> <lxc> <packaging> <juju-core:Triaged> <apparmor (Ubuntu):New> <https://launchpad.net/bugs/1305280>13:06
jamhi sinzui, I had some CI things I wanted to work with you on13:07
sinzuihi jam13:07
jamsinzui: specifically, looking at the log files, you're using "juju-1.18.0" to do "scp"13:08
jamwhich is known broken for you13:08
jamand we released 1.18.1 with that specific fix for you13:08
jamthough beyond that, "juju scp" always requires the API server to be functioning, which is what is breaking in the "upgrade" test13:08
jamso it might be nice if we tried to use raw "scp" if we can.13:08
jameither try to raw scp first, or try "juju scp" first and fall back13:08
jamsinzui: we can get the API IP address from the environment/foo.jenv file13:09
sinzuijam, yes, abentley and I discussed the fallback13:09
jam(If we've ever connected successfully, we'll be caching the value there, and we'd like to get to the point where we cache it at the end of bootstrap)13:09
jamsinzui: I'm trying to debug the local bootstrap problem. I haven't reproduced it yet, but I'm currently on Trusty13:10
sinzuijam, and we can also update to 1.18.1 today13:10
jamso I have to fire up a Precise instance first13:10
=== psivaa-lunch is now known as psivaa
sinzuijam, interesting. http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/aws-upgrade-trusty/ shows the trusty upgrade fails in parallel with precise13:11
* sinzui starts update and upgrade13:12
sinzuijam, I can re-run an upgrade test for the cloud and series of your choice13:14
jamsinzui: but that is not local13:14
sinzui1.18.1 is installed13:14
sinzuinow13:15
jamI'm just starting with trying to fix the local-deploy issue13:15
jamso I can try to not fire up a remote machine just to debug upgrade13:15
jamsinzui: the main question about local right now is that probably the version of mongod running on precise is different from trusty13:18
jamso while I think we also have an upgrade bug13:18
jamIt might be that bootstrap is failing because trusty has 2.4.9 (which works for us), and Precise is running 2.4.6 or something13:18
jamI just realized my test won't work, as local under LXC doesn't work13:19
rogpeppedimitern: reviewed (kinda)13:21
dimiternrogpeppe, cheers, I have the next one for you btw :) - it's tiny https://codereview.appspot.com/8747004413:22
rogpeppedimitern: i agree that the mgo/txn docs could be clearer, BTW13:23
dimiternrogpeppe, no doubt about it13:24
rogpeppedimitern: LGTM13:24
dimiternrogpeppe, ta!13:25
dimiternrogpeppe, updated https://codereview.appspot.com/87560043 - it's nicer now I think13:44
rogpeppeaxw: ha, it seems that 7 maximum parallel try attempts is way too small for real world API dialling13:44
rogpeppeBTW I now have a functional environment where I destroyed the bootstrap instance13:45
* axw tries to remember why it's 713:45
axwcool :)13:45
* dimitern will bbiab (1h)13:46
rogpeppeaxw: in my environment with 3 state servers, i see 21 addresses cached in the .jenv file...13:47
axwrogpeppe: I guess I was thinking one per state server, but we will need more for each address type...13:47
rogpeppeaxw: and because each dial attempt takes ages to time out, we don't get to try the second valid address until it has.13:47
rogpeppeaxw: i'm tempted to just allow unlimited concurrent dials13:48
axwrogpeppe: how do you have 21 addresses for 3 state servers?13:49
axwwhy so many?13:49
rogpeppeaxw: http://paste.ubuntu.com/7249818/13:49
rogpeppeaxw: machine-local addresses, ipv6 addresses, etc13:49
axwrogpeppe: we are ignoring the machine-local ones, right?13:50
mgzthat number should get filtered a little, yeah13:50
axwI should know the answer to this :)13:50
rogpeppeaxw: no, not in api dial, because we don't currently store the metadata in the jenv13:50
rogpeppeaxw: that needs to be fixed13:50
axwah right, yeah13:50
axwrogpeppe: so really, I think we'd have 2*state-server for both public and internal, if we had that13:51
rogpeppestill, the point remains that you probably want to try dialling all your api server addresses at once, because sod's law says that the one you don't try is the only one that works13:51
axwtrue. there's the private-inside-private scenario to cater for13:52
rogpeppeaxw: probably 4, because DNS-name vs numeric13:52
axwrogpeppe: I was thinking public IP & name, but yes we do need to try private too13:52
rogpeppeaxw: yup13:52
jamjamespage: sinzui: do we know why cloud-archive:tools only has juju-1.16.3 ?13:52
jamespagejam: its called a blocked SRU13:53
jamespagecan't get into cloud-tools before you go into saucy13:53
axwrogpeppe: I guess we can do unlimited... if we get in trouble, we could try a more complicated initially-short but expanding timeout13:54
jamespagejam: we don't have an MRE yet so I have to detail how to test every bug in full - see bug 127752613:55
_mup_Bug #1277526: [SRU] juju-core 1.16.6 point release tracker <juju-core (Ubuntu):Fix Released> <juju-core (Ubuntu Saucy):In Progress by james-page> <juju-core (Ubuntu Trusty):Fix Released> <https://launchpad.net/bugs/1277526>13:55
jamjamespage: ouch. Going from 1.16.6 => 1.18.X is going to be a massive PITA for that.13:55
jamespagejam: there is no SRU for 1.16.6 -> 1.18.x13:55
jamjamespage: I realize there isn't (yet), but wouldn't the plan be to have the stable version of Juju in cloud-archive:tools ?13:56
jamespagejam: that happens when cloud-archive:tools gets superseded by the trusty version13:56
jamespagesuperseded/replaced13:56
jamespagejam: actually - while I'm thinking about this - how does backup/restore work on 1.16.6?13:57
jamespageI see the update-bootstrap-node stuff mgz did in the bug list13:57
jamjamespage: AFAIK it works through all the 1.16's because that is what we wrote it against.13:57
jamespagejam: but for 1.16.x there is no backup or restore plugin?13:58
jamjamespage: I think it was added in 1.16.5 ?13:58
jamespagereally?13:58
jamjamespage: we added it for CTS13:58
mgzyeah, it was a bit of a fudge for minor version13:59
natefinchrogpeppe: hey, sorry, that took a lot longer than expected, obviously.14:02
rogpeppenatefinch: i'm just about to go to lunch14:02
natefinchrogpeppe: ok, where are we right now?14:03
rogpeppenatefinch: i've had a mostly-success with my integrated branch14:03
rogpeppenatefinch: two things we need to fix: agent.Config.StateInfo needs to return localhost always14:04
rogpeppenatefinch: api.Open should try all addresses concurrently14:04
rogpeppenatefinch: oh, and one other one line fix14:04
rogpeppenatefinch: APIWorker needs to fetch agent config again after dialling14:05
natefinchrogpeppe: ok, I can start working on those.  Should I branch off your branch or just do that in a new branch off trunk?14:06
rogpeppenatefinch: i'd just do new branches off trunk14:06
rogpeppenatefinch: they're all trivial14:07
natefinchrogpeppe: yep, ok14:07
jamjamespage: "juju-local" doesn't seem to depend on rsyslog-gnutls14:15
jamah, maybe it does now, but upgrade didn't do it?14:16
jamweird14:16
jamespageit does14:16
jamjamespage: I thought I did apt-get upgrade, but I had to "apt-get install juju-local" again to get it14:16
jamjamespage: anyway, it looks like 1.18.1 does depend on it, so thanks for that, sorry about the confusion14:17
jamespagenp14:17
jamsinzui: I'm unable to reproduce the "local bootstrap" failure with trunk and cloud-archive:tools version of mongo (2.4.6)14:26
jamI see the replicaSet line, but it doesn't fail14:26
sinzuijam, I don't know which bug you are working on. The lxc bug I know of is about apparmor: bug 130528014:32
_mup_Bug #1305280: juju command get_cgroup fails when creating new machines, local provider arm32  <armhf> <local-provider> <lxc> <packaging> <juju-core:Invalid> <apparmor (Ubuntu):New> <https://launchpad.net/bugs/1305280>14:32
jamsinzui: https://bugs.launchpad.net/juju-core/+bug/130621214:32
_mup_Bug #1306212: juju bootstrap fails with local provider <bootstrap> <ci> <local-provider> <regression> <juju-core:In Progress by jameinel> <https://launchpad.net/bugs/1306212>14:32
jamsinzui: since I can't reproduce that right now, I'm switching to https://bugs.launchpad.net/juju-core/+bug/130745014:33
_mup_Bug #1307450: upgrading from 1.18.1 to 1.19 (trunk) fails (API server stops responding) <ci> <juju-core:Triaged by jameinel> <https://launchpad.net/bugs/1307450>14:33
sinzuijam: please do14:34
jamsinzui: so offhand, we have a different bug, which is that "juju upgrade-juju --upload-tools" doesn't end up putting the tools where the agents can find them. :(14:38
sinzuidamn14:39
jamsinzui: it looks like it uploads the tools, but doesn't make it readable14:40
sinzuijam, We would be happy if local-provider honoured tools-metadata-url. We want to set it to a testing stream since local has to use streams to get tools for different series14:40
sinzuijam, but I won't redirect you for delivering the fastest fix14:40
jamsinzui: well this is testing "juju-1.19.0 upgrade-juju --upload-tools"14:41
jamwhich should be working, but something isn't right14:41
alexisbmorning all (and good evening)14:43
jamsinzui: sorry I couldn't get any farther on this, but I have to EOD14:44
jamwallyworld wanted to pick it up in the morning14:44
sinzuiThank you for your time jam14:44
jamsinzui: and I think he was the one who did the changes to "upload-tools" so he probably has better insight there14:44
natefinchalexisb: morning alexis  (I think the convention is just to use the greeting relative to your own time zone... everyone knows what you mean :)14:47
jammorning alexisb14:47
jamyou're up awfully early14:48
jamsinzui: launchpad Q. If I have sensitive all-machines.log, can I upload it as a private attachment?14:48
sinzuijam No private attachment :(14:52
jamsinzui: fortunately VIM can global search for the secrets and replace them with XXX without too much trouble14:53
rogpeppeaxw: ping14:57
rogpeppealexisb: hiya14:57
jamsinzui: hmm... It looks like "juju bootstrap" started creating i386 instances, and you can't "upgrade-juju --upload-tools" with a 64-bit version14:58
jamit will let you, but it can't find the i386 tools (for obvious reasons)14:58
axwrogpeppe: hey14:58
alexisbjam, not that early 8am for me14:59
jamsinzui: can you check if your 1.18.1 bootstrapped instances are i386 ?14:59
jamit was for me14:59
jamwhich is also a bug14:59
rogpeppeaxw: the existing code doesn't seem to mention juju-mongodb14:59
rogpeppeaxw: do you know how we should tell if it's available?14:59
rogpeppeaxw: (looking at your comments on https://codereview.appspot.com/86920043 )14:59
jamalexisb: well, you were on a bit earlier, but I did the math wrong. 11 hours makes you 1 hour closer, not 1 hour farther away15:00
axwrogpeppe: right. no, I don't. I guess it just hasn't been done yet - so that can be TODO15:00
jamI thought it was 5:30 ish15:00
rogpeppeaxw: ok, cool15:00
axwrogpeppe: this upgrade thing is a massive PITA. may take me a little while yet to come up with a nice solution15:01
rogpeppeaxw: where do the main difficulties lie?15:01
axwrogpeppe: upgrade steps require API server & state, API server dies when state gets bounced15:02
rogpeppeaxw: don't do it in upgrade steps15:02
rogpeppeaxw: do it in EnsureMongo15:02
jamsinzui: so I have a bit more I can try to go on tomorrow, or *maybe* later tonight depending on how things go.15:02
rogpeppeaxw: where we're already stopping and restarting the service15:02
sinzuijam, okay. I am still looking for  the arch that was used15:03
axwrogpeppe: I *think* there's a problem then that server.pem may not exist15:03
axwerr15:03
axwmaybe not that one15:03
axwthere was another file that was created on upgrade15:03
jamsinzui: I think we have a 1.18.2 Critical bug that 1.18.X no longer prefers amd6415:03
rogpeppeaxw: EnsureMongoServer is responsible for writing out the files that mongo requires, so we *should* be ok, i think15:03
axwrogpeppe: anyway. I did start down that path... I'll keep looking into it tomorrow15:03
axwok15:04
rogpeppeaxw: thanks a lot15:04
jamI *think* someone commented that it was because of PPC/ARM64 enablement15:04
jam(we can't force amd64, so we let the cloud tell us what to use, but that means if both i386 and amd64 are available we now do i386, when we should do amd64 if possible)15:04
axwsleepy time.. night all15:04
jamsinzui: I do believe you can force it with: juju bootstrap --constraints="arch=amd64"15:04
rogpeppeaxw: BTW the reason for moving InitiateMongoServer into peergrouper is...15:05
jamand now, I really must go spend time with my family :)15:05
rogpeppetoo late!15:05
sinzuijam. CI has started a new round of tests. These will use 1.18.1. I will watch them for arch mismatches15:09
rogpeppenatefinch: ping15:11
natefinchrogpeppe: hi15:11
rogpeppenatefinch: hangout?15:12
natefinchrogpeppe: sure15:12
rogpeppenatefinch: https://plus.google.com/hangouts/_/canonical.com/juju-core?authuser=115:12
rogpeppecould someone have a look at this please? we've addressed comments, but it still needs a LGTM and it's a major blocker for HA. https://codereview.appspot.com/86920043/15:39
sinzuijam: I don't see an arch mismatch deploying 1.18.1. CI doesn't use upload-tools when deploying stable (since upload-tools is officially a developer feature)15:41
* sinzui tries locally15:41
natefinchdimitern, mgz, jam, ping on the review above that roger posted15:41
rogpeppetrivial review anyone? https://codereview.appspot.com/8756004415:57
rogpeppedimitern, mgz, jam: ^15:58
dimiternrogpeppe, looking16:03
rogpeppedimitern: ta!16:03
dimiternrogpeppe, i'd swap you for https://codereview.appspot.com/87560043 :)16:04
rogpeppedimitern: will do, after i've finished investigating this issue16:05
dimiternrogpeppe, sure, np - just reminding16:06
dimiternrogpeppe, LGTM16:06
rogpeppedimitern: we really really need a review of https://codereview.appspot.com/86920043/ if you could muster the energy for it16:07
rogpeppedimitern: but thanks for that review too :-)16:07
dimiternrogpeppe, looking at that one as well16:08
rogpeppedimitern: much appreciated16:08
jamrogpeppe: on https://codereview.appspot.com/87560044/ is there something about direct State destruction that we lose with your patch?16:33
rogpeppejam: no16:33
rogpeppejam: AFIK16:33
rogpeppeAFAIK16:33
rogpeppejam: we only connect to the API if we don't use --force, and in that case we really want to use the usual API connection methods16:36
dimiternrogpeppe, natefinch, that HA CL LGTM with some trivials16:53
rogpeppedimitern: thanks muchlu16:54
rogpeppey16:54
dimiternrogpeppe, i'll poke you again about https://codereview.appspot.com/87560043 though :) (last time for today)16:54
rogpeppedimitern: ok, will look now :-)16:54
dimiternrogpeppe, tyvm!16:55
rogpeppedimitern: the only comment i might have would be that it might be more idiomatic to have the error types themselves as pointer types, embedding wrapper as a value16:58
rogpeppedimitern: in fact, i think that's definitely worth doing16:59
rogpeppedimitern: because it means that %#v will work better on errors16:59
rogpeppedimitern: so: type notFound struct {wrapper}16:59
rogpeppedimitern: and func (*notFound) new( etc16:59
dimiternrogpeppe, ok, that sgtm17:44
dimiternrogpeppe, did I see LGTM as well? :)17:45
rogpeppedimitern: i really think those tests could use sorting out17:45
dimiternrogpeppe, which ones?17:45
rogpeppedimitern: i've been struggling to understand the logic17:45
rogpeppedimitern: errors_test.go17:45
rogpeppedimitern: after some effort, i think i've managed to tease out a suggestion17:45
dimiternrogpeppe, for each error in allErrors I add like 20ish cases17:45
dimiternrogpeppe, I didn't want to repeat the same tests for all types and possibly miss something along the way17:46
rogpeppedimitern: i know, but the logic is quite a bit more complex than it needs to be17:46
rogpeppedimitern: lines 180 to 190 are really hard to follow17:46
rogpeppedimitern: and the errorSatisfier type doesn't seem to be doing much any more17:47
dimiternrogpeppe, I confess I kept it only for the String() method17:47
rogpeppedimitern: yeah, it feels like a weird holdover17:48
rogpeppes/holdover/relic/17:48
rogpeppedimitern: and you don't even need the String method for what you're using it for17:48
dimiternrogpeppe, I need a way to compare 2 satisfiers (== or !=) and i can't do it with func pointers it seems17:49
rogpeppedimitern: you could have two nested loops over allErrors17:49
dimiternrogpeppe, isn't that worse than using reflect?17:50
rogpeppedimitern: then you just need to compare indexes (or perhaps pointers if you prefer)17:50
rogpeppedimitern: it's certainly simpler17:50
rogpeppedimitern: so i think it's better17:50
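The nested-loop approach suggested above can be sketched like this. The table layout and the names `errorInfo`, `construct`, `satisfier` and `checkSatisfiers` are assumptions made for illustration; the real allErrors table in the CL may be shaped differently. Each satisfier is checked against each constructed error, and a match is expected only when the two indexes are equal, so no function values need to be compared.

```go
package main

import (
	"errors"
	"fmt"
)

// Two hypothetical error kinds used only for this illustration.
var (
	errNotFound     = errors.New("not found")
	errUnauthorized = errors.New("unauthorized")
)

// errorInfo pairs a constructor with the predicate that should recognise it.
type errorInfo struct {
	name      string
	construct func() error
	satisfier func(error) bool
}

var allErrors = []errorInfo{
	{"not found",
		func() error { return errNotFound },
		func(err error) bool { return err == errNotFound }},
	{"unauthorized",
		func() error { return errUnauthorized },
		func(err error) bool { return err == errUnauthorized }},
}

// checkSatisfiers runs every satisfier against every constructed error and
// reports each case where the result differs from "indexes are equal".
func checkSatisfiers(table []errorInfo) []string {
	var failures []string
	for i, made := range table {
		err := made.construct()
		for j, checked := range table {
			want := i == j
			if got := checked.satisfier(err); got != want {
				failures = append(failures, fmt.Sprintf(
					"satisfier %q on error %q: got %v, want %v",
					checked.name, made.name, got, want))
			}
		}
	}
	return failures
}

func main() {
	fmt.Println(len(checkSatisfiers(allErrors))) // 0
}
```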
dimiternrogpeppe, but I have test.satisfier and allErrors[i].satisfier17:51
rogpeppedimitern: you don't need test.satisfier17:51
dimiternrogpeppe, I can't just compare them and the indexes don't matter17:51
rogpeppedimitern: the only reason you have that is that you're mixing in nil satisfier tests17:51
rogpeppedimitern: they don't really fit, and they complicate all the logic17:51
dimiternrogpeppe, hmm..17:51
dimiternrogpeppe, I guess I can make a separate set of tests + loop in another test case for nils17:52
rogpeppedimitern: i'd move the contextf tests into their own function too17:52
rogpeppedimitern: it's really a totally independent function17:52
dimiternrogpeppe, but it needs to loop over allErrors as well17:52
dimiternrogpeppe, ok, can be done separately I agree17:53
rogpeppedimitern: not necessarily17:53
rogpeppedimitern: its logic is independent of allErrors17:53
rogpeppedimitern: you do need to check that each error implements the newer interface, but that's easy to check statically17:53
dimiternrogpeppe, the origin of this CL is the behavior of ErrorContextf - I need to check each error type is preserved17:54
rogpeppedimitern: fair enough. but that's a very simple test and loop over allErrors.17:54
dimiternrogpeppe, yeah, but that's an implementation detail that you, as a user of Contextf, don't need to know17:54
dimiternrogpeppe, exactly17:55
jamnatefinch: https://codereview.appspot.com/87570043/ <= log the version of mongo as we create the upstart job17:55
dimiternrogpeppe, ok, I appreciate your comments and will look at it a bit later or tomorrow17:56
* dimitern reached eod17:56
rogpeppedimitern: np, sorry for the push-back.17:56
dimiternrogpeppe, not to worry - it was useful :)17:57
jamsinzui: just in case it wasn't clear, "juju upgrade-juju --upload-tools" failed because bootstrap picked an i386, but upload-tools can only upload the amd64 that I'm running.18:05
jamso it was a combination of bug #130440718:05
_mup_Bug #1304407: juju bootstrap defaults to i386 <amd64> <apport-bug> <ec2-images> <metadata> <trusty> <juju-core:Triaged> <juju-core 1.18:Triaged> <juju-core (Ubuntu):New> <https://launchpad.net/bugs/1304407>18:05
jamand bug #128286918:06
_mup_Bug #1282869: juju bootstrap --upload-tools does not honor the arch of the machine being created <bootstrap> <constraints> <ppc64el> <upload-tools> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1282869>18:06
sinzuio O (clue x 4)18:06
jamsinzui: so I'm going to try it again and see if I can reproduce the failing to upgrade (for the right reason)18:07
jamsinzui: though it looks like bug #1282869 isn't quite complete, as we fixed "bootstrap" but not "upgrade-juju"18:07
_mup_Bug #1282869: juju bootstrap --upload-tools does not honor the arch of the machine being created <bootstrap> <constraints> <ppc64el> <upload-tools> <juju-core:Fix Released by wallyworld> <https://launchpad.net/bugs/1282869>18:07
jamsinzui: I reproduced the "cannot upgrade to 1.19.0" bug: 2014-04-14 18:11:40 ERROR juju runner.go:220 worker: exited "state": cannot log in to admin database as "machine-0": unauthorized mongo access: auth fails18:12
jamnatefinch: ^^18:12
jamrogpeppe: if you're still around, found the upgrade bug18:26
jamspecifically, 1.19.0 always tries to login to the "admin" db18:26
rogpeppejam: really? cool.18:26
jambut on an upgrade, it doesn't have rights as machine-018:26
rogpeppejam: oh of course, dammit18:26
jamrogpeppe: so... do we back out logging into admin, do we make it "try but be ok if it fails" ?18:27
jamrogpeppe: if we aren't going to do the full "upgrade support for HA" then we need to put in hacks18:27
rogpeppejam: i think we've got to do the latter18:27
rogpeppejam: otherwise HA won't work even when not upgraded18:27
jamrogpeppe: so out of curiosity, why are we doing "admin := session.DB(AdminUser)" I realize the name of the db is "admin" but that shouldn't be AdminUser should it?18:28
rogpeppejam: hmm18:28
jamrogpeppe: it is just that we're using "admin", the name of the user, as the name of the DB18:29
jammostly just a constant that "works" but isn't actually the right named constant18:29
rogpeppejam: yeah, it does seem odd18:30
jamrogpeppe: k, the other Database names are just hard-coded strings in the function, so I'll follow suit for clarity18:31
rogpeppejam: sgtm18:33
rogpeppejam: personally i like hard-coded strings anyway - i think they're often clearer18:33
rogpeppejam: DB(AdminUser) does seem wrong to me. i don't know what i was thinking.18:35
jamrogpeppe: is there an obvious way to remove an agent from admin? (I'd like to add a test that we come up ok when we can't access 'admin' as we'd run into after upgrade)18:36
rogpeppepwd18:37
jamafaict we don't do anything with the "admin" db we just logged into18:37
jamat least not directly18:37
jamthe other DB objects in that func are put into the State object18:37
rogpeppejam: st.db.SessionDB("admin").RemoveUser(AdminUser)18:37
jamrogpeppe: thanks18:38
jamwell, in this case "RemoveUser(info.Tag)" aka ("machine-0")18:38
rogpeppejam: yeah18:38
rogpeppejam: no, we don't do anything with the admin db18:38
rogpeppejam: but we do need access to it for manipulating the replica set18:39
jamrogpeppe: right, it allows you to call particular functions *if* you're logged in18:39
rogpeppejam: yeah18:39
jamside-effect is on Mongo side18:39
jamrogpeppe: presumably we also need to change State.setMongoPassword to allow for AddUser on the "admin" table to fail?18:41
jamor we shouldn't ever be creating one of those18:41
jamsince we can't be in HA we shouldn't be adding any machines that would want to18:42
rogpeppejam: yeah18:42
rogpeppejam: we should change EnsureAvailability to fail if we're not in replica set mode18:43
rogpeppejam: that way people can't get themselves into a nasty twist18:43
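[Editor's note] The guard rogpeppe proposes for EnsureAvailability might look something like the following Go sketch. Everything here is an assumption made for illustration: the `environment` type, its fields, the method signature, and the error text are hypothetical and are not taken from juju-core.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReplicaSet is a hypothetical error value for this sketch.
var errNotReplicaSet = errors.New("cannot ensure availability: mongo is not in replica set mode")

// environment is a hypothetical, much simplified view of the state.
type environment struct {
	replicaSet   bool
	stateServers int
}

// ensureAvailability refuses to add state servers unless replica set mode
// is on, so an upgraded non-HA environment cannot be put into a half-HA state.
func (e *environment) ensureAvailability(n int) error {
	if !e.replicaSet {
		return errNotReplicaSet
	}
	if n > e.stateServers {
		e.stateServers = n
	}
	return nil
}

func main() {
	upgraded := &environment{replicaSet: false, stateServers: 1}
	fmt.Println(upgraded.ensureAvailability(3))
}
```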
jamrogpeppe: &mgo.LastError{Err:"not authorized to remove from admin.system.users",18:57
rogpeppejam: hmm18:57
rogpeppejam: i suppose it might have removed the user anyway18:57
jamthat is after trying to do:18:58
jamadminDB := s.state.db.Session.DB("admin")18:58
jampassword := testing.FakeConfig()["admin-secret"].(string)18:58
jamadminDB.Login(AdminUser, password)18:58
jamso theoretically ensuring that I'm admin, though I need to check the err code18:58
jamauth fails ...18:58
jamrogpeppe: from what I can tell, TestingInitialize returns State object that isn't actually logged into the Admin db18:59
jamTestingInfo doesn't have a password18:59
jamrogpeppe: what is *really* strange is that SetMongoPassword was perfectly happy, which *should* be setting the password in "admin"18:59
jamso you are authed to add people, but not remove them?19:00
rogpeppejam: it does seem odd19:00
rogpeppejam: mongo has some weird semantics sometimes19:00
jamrogpeppe: so I haven't figured out the password for "admin", but I have found that if I call Machine.SetMongoPassword() I can then log into "admin" as "machine-0" with the password I just gave it, and then use *those* credentials to remove the "machine-0" user.19:09
jamWTWTTWW WTF?19:09
natefinchlol mongo is wacky19:09
jamnatefinch: yeah, so $CURRENT_USER can add admins, but can't remove them, but you can add one, log in as it, and then do whatever-you-want19:10
jamnatefinch: apparently the model changed in mongo 2.6: http://docs.mongodb.org/manual/reference/method/db.addUser/19:13
jamrogpeppe: from what I can tell, calling adminDB.RemoveUser("machine-0") removes it completely, and not just from admin19:13
rogpeppejam: ha19:14
rogpeppejam: so i guess you'll have to remove it, then add it back to the ones you want it on19:14
jamrogpeppe: actually, looks like I was just screwing up the password, so I need to try again19:15
jamfinally, failing in the way I wanted19:16
jamand now success19:16
rogpeppejam: so mongo wasn't being weird at all, in fact?19:16
jamrogpeppe: well, I still have to log in as the agent I just created to delete it19:17
jamthat is still weird-as-fuck19:17
rogpeppejam: ah yes19:17
jambut removing it from admin only removes it from admin19:17
sinzuiI think we want a juju-local-kvm package to sort out kvm deps. juju-local is lxc centric19:22
natefinchrogpeppe: lp:~natefinch/juju-core/043-localstateinfo19:26
jamnatefinch: rogpeppe: It makes me wonder if we couldn't just add ourselves if we weren't in admin to start with ...19:31
rogpeppejam: i *think* i tried that19:32
rogpeppejam: but try it anyway19:32
jamrogpeppe: natefinch: sinzui: https://code.launchpad.net/~jameinel/juju-core/soft-login-failure-1307450/+merge/21574219:34
jamI was able to reproduce the upgrade failure with the local provider19:34
jamand that patch lets it get further19:34
rogpeppejam: codereview?19:34
sinzui\o/19:34
jamrogpeppe: lbox is thinking about it19:34
jamsinzui: not that there won't be any other bugs, but the first one I think I got19:35
jamrogpeppe: weird, still thinking19:35
jamrogpeppe: https://codereview.appspot.com/8773004319:35
jamrogpeppe: natefinch: I'm off to sleep, unfortunately, so if it needs tweaks, I'm sure curtis would appreciate you picking it up from here.19:36
rogpeppejam: i like "haven't implemented bug #xxxx" - sounds like we want to implement a bug...19:36
jamor you can point wallyworld at it when he gets up19:36
natefinch jam:cool19:36
rogpeppejam: thanks19:36
rogpeppenatefinch: FWIW, the last remaining diffs that we haven't already got branches in progress for: http://paste.ubuntu.com/7251459/19:41
natefinch rogpeppe: wow, that's awesome19:41
bacsinzui: so swift remains dead to us.  jenkins for charmworld uses juju to update to the newly blessed code, so staging is now stuck and useless.21:09
=== marrusl is now known as marrusl_afk
sinzuibac: yep21:09
* bac is sad21:09
sinzuibac: our only option at this time would be to replace the stack under our personal credentials, but we also need different public IPs21:10
bacsinzui: why the last part?21:10
bacsorry, that was cryptic, sinzui why do we need new public IPs?  because we can't wrest them away from the current assignees?21:11
sinzuibac: public IPs are not shareable or transferable between accounts21:11
bachi thumper21:11
thumpero/21:11
bacsinzui: oh.  can they be revoked from orange and given to us?21:11
sinzuibac: if we wanted to preserve the current IPs we need to revoke them, then hope we get the same ones when we allocate new ips21:12
bacoi goi oi21:12
thumpersinzui: I'm going to test bootstrapping on precise21:12
bacor, dios mio as they say here21:12
thumpersinzui: I have a precise machine here21:12
sinzuibac: my success rate is 25% in my attempts to get an IP I had in another account21:12
thumperwhat is our ppa for precise stuff?21:12
bacsinzui: so, what's another RT to update dns in the grand scheme?21:13
waiganimorning all21:14
bacsinzui: so it looks like i need to push a change directly to production without running on staging first.  guess i'll wait until the morning.21:14
sinzuithumper, while you slept I worked out how to use debug-log21:15
thumpersinzui: okay...21:16
sinzuibac: yes, let's wait till the morning. I can think about how to make the machine do an update like the charm would21:16
sinzuithumper, I will ping you when I would like your review.21:16
thumpersinzui: oh... for docs...21:17
thumperyeah, this documentation thing still slips by me ...21:17
thumpersinzui: we should get a summary of the help doc into the actual command line help21:17
sinzuithumper, I agree. Maybe I will make that a topic for vegas21:18
thumpersinzui: for the local bootstrap test on precise21:24
thumpersinzui: what is the minimum I need to install?21:24
sinzuithumper, CI uses real precise + juju + juju-local21:25
thumpersinzui: juju and juju-local from where?21:25
thumpersinzui: also, do you know which compiler?21:26
sinzuithumper, any recent. I have juju 19 and juju-local 1.18.1. I haven't changed the last package in a while21:26
thumpersinzui: as I may need to build additional logging21:26
sinzuithumper, good question21:27
thumpersinzui: let me be clear, the precise box I have currently has no juju deps at all21:27
* sinzui looks21:27
thumpersinzui: I'm assuming there is a ppa21:27
sinzui$ apt-cache madison golang-go21:28
sinzui golang-go | 2:1.1.2-2ubuntu1~ctools1 | http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/cloud-tools/main amd64 Packages21:28
sinzuithumper, and if you want very close matches to packages I can offer this... but I assure you I haven't changed local packaging since 1.18.021:29
sinzuihttp://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/publish-revision/ws/tmp.fQ6PU5ZxX5/21:29
thumperah ha21:29
thumperI have precise-updates/cloud-tools in apt21:29
* thumper installs juju-local21:29
thumpersinzui: it seems weird to me that jam was able to boot trunk on aws but CI was not21:30
sinzuithumper, I think you have that reversed21:30
thumperah... wat?21:31
sinzuithumper, http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/21:31
sinzuiCI can deploy fine21:31
thumpersinzui: what is the current state of the local provider CI tests?21:31
thumperwhich one is that21:31
sinzuijam reports that deploy is using the wrong tools. I have not seen that personally or in CI21:31
thumperlocal-deploy is red21:31
sinzuiit has been broken for a few days. it is not "technically" aws as we have done this on canonistack too21:32
rogpeppehmm, this is a regrettable error when the machine in question is down (actually, its instance has been destroyed): 2014-04-14 21:32:16 ERROR juju.cmd supercommand.go:299 some agents have not upgraded to the current environment version 1.19.0.3: machine-021:34
thumpersinzui: how come the precise-updates cloud tools doesn't have 1.18?21:35
rogpeppei think there should probably be a way to force that21:35
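[Editor's note] The check behind the error above, plus the kind of override rogpeppe is asking for, could be sketched in Go as follows. This is an illustration only: `checkAgentsUpgraded` and its `skip` parameter are hypothetical names, and the real juju-core check may be structured quite differently. Only the error wording is copied from the log line.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkAgentsUpgraded reports an error naming every agent whose version
// differs from envVersion. Agents listed in skip (for example, machines
// known to be destroyed) are ignored; this is the hypothetical "force".
func checkAgentsUpgraded(agents map[string]string, envVersion string, skip map[string]bool) error {
	var behind []string
	for tag, version := range agents {
		if version != envVersion && !skip[tag] {
			behind = append(behind, tag)
		}
	}
	if len(behind) == 0 {
		return nil
	}
	sort.Strings(behind) // map iteration order is random; keep output stable
	return fmt.Errorf("some agents have not upgraded to the current environment version %s: %s",
		envVersion, strings.Join(behind, ", "))
}

func main() {
	agents := map[string]string{"machine-0": "1.18.1", "machine-1": "1.19.0.3"}
	fmt.Println(checkAgentsUpgraded(agents, "1.19.0.3", nil))
	fmt.Println(checkAgentsUpgraded(agents, "1.19.0.3", map[string]bool{"machine-0": true}))
}
```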
rogpeppethumper: hiya21:35
thumperor perhaps another question would be21:35
thumperwhy doesn't my machine see it?21:36
thumperhi rogpeppe21:36
thumperrogpeppe: I'm wondering if the 'regrettable error' is an understatement for something?21:36
rogpeppethumper: well, it means that the environment is now broken - i cannot upgrade it21:36
sinzuithumper, politics21:37
rogpeppethumper: it is an understatement, yeah21:37
thumperrogpeppe: I suppose an error message that says "you're borked, sucks to be you" wouldn't be appreciated21:37
rogpeppethumper: at least then i'd know it was deliberate...21:37
sinzuithumper, Ubuntu rejected 1.16.4 (they consider backup and restore a feature). jamespage is still trying to get 1.16.6 into archive for precise to ensure they can upgrade, then go to 1.18.021:38
rogpeppethumper: it's an interesting situation actually, because usually i'd be able to do destroy-machine --force, but in this case the machine in question is a state server21:38
sinzuithumper, we have never said users can upgrade from 1.16.3 to 1.18.021:38
thumpersinzui: even in the cloud-tools?21:38
sinzuiIt's not our repo21:38
* rogpeppe creates a bug21:39
sinzuithumper, I talked with a few people today about it. There is a chance 1.18.1 will become official in the archive when trusty is released and customers cannot upgrade to it21:40
rogpeppehmm, actually maybe it's just a bug for me at this moment21:40
thumpersinzui: aargh... that is terrible21:40
=== vladk is now known as vladk|offline
thumpersinzui: ok, can confirm that 1.19.0 bootstraps the local provider on my precise machine22:01
thumperr262622:01
thumperwhich I can see fails on CI22:01
thumpersinzui: so the big question now becomes, what is different?22:02
sinzuithumper, well.22:03
sinzuiwhat changed in lp:juju-core r259322:03
sinzuithumper, when CI slows I can run the deploy with --debug22:03
thumpersinzui: that was when the machine agent became responsible for setting up the mongo upstart script22:04
thumpersinzui: can you capture the mongo logs from the CI machine?22:05
thumperI wonder if this is the crash that dave had reported22:05
thumpersinzui: https://bugs.launchpad.net/juju-core/+bug/130653622:06
_mup_Bug #1306536: replicaset: mongodb crashes during test <juju-core:Triaged> <https://launchpad.net/bugs/1306536>22:06
sinzuibugger, CI is trying the local job more than 5 times22:07
sinzuithumper, I think it is related since the logs report it http://ec2-54-84-137-170.compute-1.amazonaws.com:8080/job/local-deploy/1174/console22:07
thumpersinzui: there is also the mongo log file22:10
thumpersinzui: /var/log/upstart/juju-db-tim-local.log is my file22:11
thumpersinzui: replace <tim> with the ci user, and <local> with the env name22:11
sinzuithumper, noted.22:11
thumpersinzui: that way we'll get any extra crash info22:11
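[Editor's note] The naming pattern thumper describes for the mongo upstart log can be written as a small Go helper. This is a sketch based only on the single example path in the log; the function name `mongoUpstartLogPath` is hypothetical, and the pattern is assumed to hold for other users and environment names.

```go
package main

import "fmt"

// mongoUpstartLogPath builds the upstart log path for the local provider's
// mongo service, following the juju-db-<user>-<env> pattern seen in the log.
func mongoUpstartLogPath(user, env string) string {
	return fmt.Sprintf("/var/log/upstart/juju-db-%s-%s.log", user, env)
}

func main() {
	// thumper's own file, as quoted above.
	fmt.Println(mongoUpstartLogPath("tim", "local"))
}
```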
sinzuithe local upgrade test is still playing so I cannot start the deploy test22:12
thumperack22:12
thumpersurely if the upgrade test is running, then the local provider bootstraps?22:12
thumperor is it taking a long time to fail?22:12
thumpersinzui: perhaps also worth noting that my precise machine is running i38622:13
sinzuithumper, 1.18.1 is good. We can bootstrap with stable, we cannot upgrade to unstable22:13
sinzuiWe are amd6422:13
thumpersinzui: http://paste.ubuntu.com/7252245/22:16
sinzuiI can bootstrap now22:16
thumpersinzui: where?22:17
sinzuiOn the CI machine22:17
thumper?!22:17
thumperwhat changed?22:17
sinzuithumper, this is the log of my bootstrap attempt https://pastebin.canonical.com/108508/22:20
sinzuithumper, I didn't mean CI could pass bootstrap. I meant that the env was free for me to bootstrap22:20
sinzuithumper, I didn't get logs in a local dir or juju-jenkins-local22:23
sinzuithumper, maybe this config offends you: https://pastebin.canonical.com/108509/22:24
* thumper looks22:24
thumperwhat is test-mode?22:24
thumperwhat is bootstrap-timeout in?22:25
rogpeppeoops, ensure-availability shouldn't have done *that*22:25
sinzuithumper, this is mongodb-server https://pastebin.canonical.com/108510/22:26
thumpersinzui: log?22:26
sinzuithumper, test-mode tells the charm store to not count the deployment22:26
sinzuino logs22:26
sinzui^22:26
sinzuithumper, bootstrap failures don't seem to ever leave logs22:27
thumpersinzui: same mongo version22:27
sinzuihmm22:27
sinzuithumper, I can try to tail something in another terminal while I bootstrap22:27
thumpersinzui: can I log into that machine?22:28
sinzuisure22:29
sinzuithumper, ssh -i ./cloud-city/staging-juju-rsa jenkins@54.84.137.17022:30
thumpersinzui: I don't have that identity22:30
sinzuithumper, the key  is in lp:~sinzui/+junk/cloud-city22:30
sinzuiwhich is shared with you22:30
* thumper looks22:30
sinzuiThat also has the env for everything we test22:31
thumpersinzui: I'm in22:32
sinzuithumper, export GOPATH=/var/lib/jenkins/jobs/local-deploy/workspace/extracted-bin/22:33
sinzuithumper, export JUJU_HOME=~/cloud-city22:33
* rogpeppe has an environment that seems reasonably HA22:42
thumperrogpeppe: \o/22:44
rogpeppethumper: there are still... strangenesses22:44
rogpeppethumper: but still, i destroyed the bootstrap instance and everything carried on much as usual22:45
wallyworldthumper: sinzui: i am going to land john's recent branch "soft-login-failure-1307450" which fixes an issue preventing upgrade from 1.18 to .19 from working22:45
sinzuirogpeppe, send me a brief summary of how you made it HA via the command line. I think I can reuse the backup restore test to instrument a failure of a machine. I expect with HA, juju status still works after the failure22:46
sinzui\o/ wallyworld22:46
rogpeppesinzui: the requisite branches haven't landed yet22:46
wallyworldsinzui: well, i'm going by the description - there may be other issues :-)22:46
rogpeppesinzui: there's one which isn't ready to be proposed yet22:46
rogpeppesinzui: i can push the branch that i'm testing, if you like22:47
rogpeppesinzui: essentially i did this: http://paste.ubuntu.com/7252375/22:49
sinzuirogpeppe, no rush. I am busy preparing for a release and trying to get juju 1.16.6 in the cloud archive22:49
sinzuirogpeppe, excellent. as I hoped22:49
* rogpeppe grinds to a halt22:55
rogpeppeg'night all22:55
waiganinight rogpeppe22:55
rogpeppewaigani: ttfn22:56
waiganicongrats on HA22:56
rogpeppeit's not there yet!22:56
waiganicongrats on *almost* HA ;)22:57
sinzuithumper, Can you read my debug-log draft at https://docs.google.com/a/canonical.com/document/d/1BXYrLC78H3H9Cv4e_4XMcZ3mAkTcp6nx4v1wdN650jw/edit22:58
=== marrusl_afk is now known as marrusl
hazmatwhy does local provider try to reverse dns on the ip addresses..23:17
* hazmat wonders how he got dns-name: 176.52.236.23.bc.googleusercontent.com.23:18
=== mjs2 is now known as mjs0
hazmatha.. yummy!23:20
hazmatrogpeppe, sinzui  if you want an additional tester for that.. send me some instructions23:20
sinzuihazmat, thank you23:24
hazmatsmoser, you ever seen cloudinit on trusty hang..  i'm in a container.. and the last output is http://pastebin.ubuntu.com/7252494/  but it's blocking the rest of the container startup (ssh, etc).23:27
wallyworldsinzui: john's branch landed at r2627 so hopefully that might help the upgrade tests pass. we'll see i guess23:36
smoserhazmat, can you turn cloud-init debug on.23:40
smoserand get paste.23:40
smoserhazmat, in /etc/cloud/cloud.cfg.d/05_logging.cfg just turn 'handler_consoleHandler' to be23:41
smoserlevel=DEBUG23:41
smoserrather than23:41
smoserlevel=WARNING23:41
smoseryou should see lots more output.23:41
smosernot sure how you ran that though23:42
sinzuithumper, I think CI will start testing in 15 minutes. Do you want me to disable the local tests so that you can use the env as you like?23:46
thumpersinzui: yeah, for now would be good23:46
thumperjust otp with alexisb23:46
alexisbyes sinzui I was distracting thumper, I am done he is all yours now23:48
sinzuithumper, local is all yours. Say when you are done so that I can re-enable the test23:48
thumpersinzui: ok23:48
thumperin poking around now23:48
thumpersinzui: um...23:52
thumpersinzui: hangout?23:52
hazmatsmoser, ack23:52
hazmatsmoser, it was an old version of trusty i was updating.23:52
hazmati'll see if i can reproduce and log23:53
sinzuithumper, I can 40 minutes. My children want dinner23:53
thumperok23:54
thumpersinzui: can in 40 minutes?23:54
thumperor only for 40 minutes23:54
thumper:)23:54
