[00:07] thumper: yup, this environment is now broken [00:07] it doesn't respond to any cli commmands [00:16] fwereade: still there? [00:43] davecheney: hmm... interesting [00:45] perrito666: I'd expect not, fwereade is probably sleeping, and if he isn't, he should be [00:49] thumper: normally i'd be expecting hte unit agents to be freaking out and restarting like crazy [00:49] but they are all connected, and just sitting there [00:49] weird [00:50] thumper: do you want to take a look [00:50] sure [00:51] what is your lp id name [00:51] ie, ssh-copy-id $WHO [00:51] thumper [00:51] :-) [00:51] ubuntu@winton-02:~/charms/trusty$ ssh-copy-id thumper [00:51] /usr/bin/ssh-copy-id: ERROR: No identities found [00:51] maybe i'm doing it wrong [00:52] yup [00:52] i was [00:52] thumper: machine is winton-02 [00:52] Hostname 10.245.67.2 [00:52] copy your .ssh/config stanza and replace the hostname [00:54] * thumper sshes in [00:57] davecheney: something here is lying [00:57] davecheney: I can status [00:57] and it tells me that the machine agent for 0 is down [00:57] which it isn't [00:57] becaues it responded to status [00:57] :-) [00:58] davecheney: you rebooted about an hour ago? [00:58] the three lxc machines are all showing as started [00:59] davecheney: juju ssh 1 works, and both the machine and unit agent are running according to upstart [01:05] thumper: yeah, but they don't do anything [01:05] i did juju remove-unit mysql/1 [01:05] and it's still there [01:05] * thumper is reading logs [01:05] it's liek status is jammed at some point in the past [01:06] 1016 juju status [01:06] 1017 juju remove-unit mysql/1 [01:06] 1018 juju remove-machine 3 [01:06] 1019 juju status [01:06] ^ did jack === marco-traveling is now known as marcoceppi [01:11] thumper: try to do stuff with that enviornment [01:11] hmm... [01:22] davecheney: none of the watchers are firing [01:32] urk [01:32] but they are poll driven, right ? [01:36] kinda [01:42] thumper: being told I have to go to the shops to get food for our family [01:42] afk for a bit [02:04] waigani, wallyworld_: axw has power issues, will be online later [02:04] ok [02:04] power as in electricity, not ppc64 [02:04] davecheney: hey [02:04] what's that concept that's not bundles but is bundles [02:04] but is like bundles [02:05] stacks [02:05] thanks thumper [02:05] np [02:06] marcoceppi: I had to turn aufs off by default with lxc-clone [02:06] marcoceppi: too many weird edge-cases [02:06] marcoceppi: it is however, btrfs aware [02:06] thumper: yeah, saw the release, but it will still use lxc-clone [02:06] so will use fast snapshots [02:06] woo [02:06] yes, still use lxc-clone by default [02:06] but lots of i/o to create a machine as it copies ~800M [02:07] * thumper goes to make a coffee [02:27] agent-state on new machines (except 0) is stuck on pending [02:27] trying to ssh into machine one fails: [02:27] ERROR machine "1" has no public address [02:28] all-machines log logs this error: [02:28] ERROR juju runner.go:220 worker: exited "environ-provisioner": no state server machines with addresses found [02:30] waigani: how long did you wait? [02:30] lxc-ls and "uvt-kvm list" list no containers [02:30] waigani: if you haven't done things before, it takes a while [02:30] waigani: you need to do 'sudo lxc-ls' [02:30] thumper 10min? [02:30] 'sudo lxc-ls --fancy' [02:30] still not up [02:30] ah, I'll try that [02:30] is it downloading the image? 
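A quick aside on what waigani is checking here: the local provider's containers are created by root, so a plain lxc-ls shows nothing, and the very first bootstrap on a machine also has to download and cache an ubuntu cloud image, which can take a long while on a slow link. Something like the following shows both (the cache path is from memory and depends on the lxc template version, so treat it as an assumption):

    sudo lxc-ls --fancy                  # containers only show up with sudo; --fancy adds state and IP columns
    sudo du -sh /var/cache/lxc/cloud-*   # keeps growing while the cloud image download is still in progress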
[02:31] if you haven't run the local provider before, it is downloading the cloud image [02:31] ah, there is a lot of network activity [02:31] tail the log [02:31] all-machines ? [02:31] sure [02:32] thumper: tailing and seeing activity - cheers [02:57] thumper: back [02:57] are you still using winton-02 ? [02:57] davecheney: I have wallyworld_ and waigani looking in to replicating this locally while I finish off a much needed patch [02:57] no, I'm out of winton-02 [02:57] thumper: ok [02:57] can I manually destroy that environmet [02:58] or do you guys stil need it [03:00] davecheney: no, kill it [03:00] I think we have enough info to work from [03:00] kk [03:01] thumper, davecheney: lxc-start/stop machine 1 -agent-status showed started/down as expected. Restarted my machine, machine 0 agent-status: started. [03:01] waigani_: which mongo are you using ? [03:02] waigani_: are you on trusty yet? [03:02] no :( (busted) [03:02] waigani_: recommendation: spin up an environment on ec2 [03:02] waigani_: can I get you to cause a change to the machine that has started up? [03:02] deploy cs:ubuntu/trusty [03:02] and use that to test [03:02] davecheney: MongoDB shell version: 2.4.6 [03:02] waigani_: just to confirm that the bits are hooked up [03:03] waigani_: dpkg -l | grep mongo [03:03] davecheney: 1:2.4.6-0ubuntu5 [03:04] waigani_: wrong version [03:04] you neet juju-mongodb [03:05] ah, how do I get that? [03:05] waigani_: 1. use trusty [03:06] hehe, okay [03:06] 2. sudo apt-get install juju-mongodb [03:06] tried that [03:06] could not find it [03:06] waigani_: it wont if you aren't using trusty [03:06] it may be available in the cloud archive if you are using a cloud image on hp cloud or ec2 [03:06] oh these are steps, not options [03:06] right [03:07] okay, so I need to update to trusty to debug [03:07] waigani_: i'd recommend deploying the ubuntu charm on a cloud [03:07] it's faster and less likely to ruin your afternoon debugging upgrde programs [03:07] problems [03:08] okay, do we have ec2 creds we can use? [03:08] waigani_: i can ask for a new ppc vm for you [03:08] probably take longer than today [03:09] then you can debug the problem at the source [03:09] davecheney: :D [03:09] waigani_: someone (not me) can give you the hp cloud credentials [03:09] that might be another solution [03:09] hp, okay [03:09] thumper: ? [03:09] i only say ec2 because I *KNOW* they have working trusty images [03:09] hp cloud, less certain [03:09] * thumper has no hp clould stuff [03:10] wallyworld_: ? [03:10] yes? [03:10] cummon folks, we're developing a tool to managed public clouds, and nobody has the credentials to test on the clouds ? [03:10] * wallyworld_ reads backscroll [03:10] wallyworld_: do you have HP creds you can share with waigani_ [03:11] waigani_: you need to be on trusty to get the juju-mongodb [03:11] hehe [03:11] i do. [03:11] okay, I need to update to trusty anyway... [03:12] davecheney: except it's my own user name with my password [03:12] best to get a new account added by asking antonio [03:13] he has a master account he can create sub accounts from i believe [03:20] i think antonio is still online [03:36] sorry, I was off downloading trusty, what is antonio's nic? [03:37] arosales: [03:39] msg hi arosales, I'm part of tim's team. I'm told you are the keeper of cloud credentials. Would I be able to get some for ec2 please? 
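For the record, the check davecheney asked for and the fix boil down to this on a trusty box (the mongod path is how the juju-mongodb package lays itself out, so double-check it on your system):

    dpkg -l | grep -E 'juju-mongodb|mongodb-server'   # 1:2.4.6-0ubuntu5 above is the distro mongodb, not juju's
    sudo apt-get install juju-mongodb                 # only packaged from trusty onwards
    /usr/lib/juju/bin/mongod --version                # installed off to the side, so it never collides with a system mongod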
[03:39] * davecheney sadtrombine [03:39] doh forgot the / [03:39] waigani_: good thing you didn't include your payment details [03:39] lol - facepalm === vladk|offline is now known as vladk [03:57] thumper: on trusty, i can restart via juju run reboot and also via lxc-stop machine 1 and it shows and down and then started again and config changes seem to be propagated [03:57] i haven't retried rebooting the host yet [03:57] wallyworld_: the test case is [03:57] 1. use trusty [03:57] tick [03:57] 2. use juju-mongodb [03:57] 3. reboot [03:58] ok, 2 out of 3 ain't bad [03:58] i'll reboot [03:58] good bye cruel world [03:58] * davecheney plays revelry [03:58] wallyworld_: you're going to win! hmph [04:02] davecheney: well that didn't go so well, no bootstrap agents restarted after reboot [04:02] well [04:02] they are running but juju status fails [04:03] can't connect to state api port [04:04] actually, machine 1 agents are running but not machine 0 [04:09] wallyworld: so the containers came back up correctly, but not the host agent. What is "service juju-agent-wallyworld-local status" say? [04:10] as well as "service juju-db-wallyworld-local status" [04:10] jam: i checked those, db was running, agent was not [04:10] i started agent by hand [04:10] but stats still fails, loking into it [04:10] wallyworld: hopefully there is something in machine-0.log ? [04:11] waigani_: hey [04:11] nope :-( [04:11] hello [04:11] waigani_: sorry tab fail [04:11] was curious about wallyworld's status [04:12] wallyworld: http://askubuntu.com/questions/207143/how-to-diagnose-upstart-errors says "/var/log/upstart/JOBNAME.log" [04:12] ah found it [04:12] * thumper goes back to write more tests [04:12] wallyworld: found the failure [04:12] or the log file [04:12] mine has: /bin/sh: 1: /bin/sh: cannot create /var/log/juju-jameinel-local/machine-0.log: Directory nonexistent [04:12] the upstart script has the wrong log dir [04:12] which doesn't bode well, I don't know why it would be trying to log there. [04:12] so the job fails [04:13] WTF? [04:13] mine also says [04:13] /home/ian/jujulocal/tools/machine-0/jujud machine --data-dir '/home/ian/jujulocal' --machine-id 0 --debug >> /var/log/juju-ian-local/machine-0.log [04:13] sad trombone [04:14] and /var/log/juju-ian-local doesn't exist? [04:14] nope [04:14] i have a ~/jujulocal/log [04:15] which is where the logs were written before i rebooted [04:15] so its just the upstart script that is wrong [04:15] ah... [04:15] I know what it is [04:15] the upstart script is being rewritten [04:15] with the wrong log dir [04:15] yep :-) [04:15] i fucking knew it [04:15] it shouldn't use the log dir from the agent [04:15] as that is wrong [04:15] wallyworld, thumper: I see the same thing in my upstart, it is trying to redirect to /var/log/$STUFF but we should only be writing to /home/jameinel/.juju/local/$STUFF [04:15] it isn't what is used by the local provider [04:16] I have to finish this branch [04:16] thumper: you mean it is using the $DIR that is bind mounted inside the LXC ? [04:16] well, machine-0 can't be bind mounted [04:16] jam, wallyworld: hangout? 
and I can explain what I think is the problem [04:16] then I can go back to work [04:17] sure, i think i can find it anyway [04:17] but let's talk to be sure [04:17] thumper: technically, I'm not working and I have to go have breakfast, but feel free to chat with wallyworld [04:17] haha [04:17] kk [04:17] thumper: https://plus.google.com/hangouts/_/calendar/am9obi5tZWluZWxAY2Fub25pY2FsLmNvbQ.mf0d8r5pfb44m16v9b2n5i29ig?authuser=1 [04:17] calendar? [04:25] * thumper puts head back down [04:27] davecheney: sooo, there is one showstopper - juju machine agent update script is wrong for local provider, so after a reboot, no machine 0 agent for you. that doesn't explain everything you are seeing perhaps, but that's the focus right now to be fixed [04:45] wallyworld: https://codereview.appspot.com/78660043 [04:45] looking [04:45] thumper: i've fixed the issue, i think. gotta write a test and test live [04:45] ugh [04:45] I've missed a test [04:45] wallyworld: ack, and very cool, thanks [04:45] * wallyworld will wait for test to be added [04:47] lboxing now [04:47] * thumper waits... [04:48] thumper: did you now destroy env for local doesn't remove the upstrat scripts? [04:48] know [04:48] wallyworld: it should [04:48] well, it didn't just now [04:48] that's a bug... [04:48] yeah [04:49] i'll file 2 bugs [04:49] it should use the same mechanism that manual provider uses [04:49] to remove the script [04:49] it is possible that manual agent removal is broken too [04:49] wallyworld: test updated and pushed [04:49] 1 bug for upstart script creation for local, one for removal [04:49] looking [04:49] wallyworld: the only drive-by there, is a change of an existing job [04:49] from "host units" to "all" [04:49] ok [04:50] as the lock dir is needed on all machines [04:50] * thumper thinks [04:50] oh... ick [04:50] * thumper thinks some more and looks at code [04:51] ugh [04:51] I remember why we did it... but it is icky [04:51] the only machine that is not "host units" is a local provider state machine [04:52] as normal state machines also host units [04:52] so we don't try to run it on machine 0 for local [04:52] which is right [04:53] hmm... [04:53] actually doesn't hurt [04:53] because it uses agent dir [04:54] and it only tries to chown if there exists a /home/ubuntu [04:54] so I'd rather keep it as "all machines" as it better describes what is intended [04:56] sounds reasonable [04:59] thumper: my fix works and we now have a valid all-machines-log symlink :-D [05:00] it wasn't valid before [05:00] so looking good, Vern [05:01] thumper: sadly, i changed the code and no tests failed :-( [05:01] :( [05:02] possible to write a test to save us next time? [05:02] yep, that's the plan [05:02] i always write a test when fixing a bug [05:02] that's because you're AWESOME! [05:02] * wallyworld blushes [05:04] thumper: that destoy env thing - i used --force cause machine agent wasn't running. it left behind upstart scripts as well as mongo process etc. so not really a show stopper i guess [05:04] wallyworld: we already have a bug for making --force clean up more [05:04] lets just make sure we do it next week [05:04] yep [05:05] * thumper has hit EOW [05:05] I've approved that branch and hope it lands [05:06] later folks... === vladk is now known as vladk|away === vladk|away is now known as vladk [05:26] waigani_, hello [05:26] sorry I missed your ping [05:26] arosales: hello :) [05:27] I was looking for some ec2 creds. Are you the right person to talk to? 
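For anyone retracing the trail jam and wallyworld just followed, it amounts to the following on the host (the ian-local names use the juju-agent-&lt;user&gt;-&lt;env&gt; pattern quoted above; substitute your own user and environment):

    sudo status juju-agent-ian-local                     # upstart shows the machine agent job as stopped
    sudo tail /var/log/upstart/juju-agent-ian-local.log  # upstart's capture of the job output, containing the
                                                         #   "cannot create .../machine-0.log: Directory nonexistent" error
    cat /etc/init/juju-agent-ian-local.conf              # the generated job, with the >> redirect into the missing /var/log/juju-ian-local/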
[05:27] waigani_: hp man, hp [05:28] arosales: s/ec2/hp [05:30] waigani_, hp I can do let me get those to you. [05:35] ok, lxc-clone: true is like, the best thing EVAR. You guys have made me a non-trival % more productive. 2 months before review time too! Thanks! :D [05:45] no power and no coffee makes axw go something something === vladk is now known as vladk|away [05:47] sigh, gotta restart - hotplugging display link doesn't seem to work since too well since I upgraded === vladk|away is now known as vladk [05:58] axw: what's going on? [05:58] waigani_: with hotplugging? [05:59] thumper said you lost power or something today? [05:59] waigani_: oh yeah, power outage all morning [05:59] ha, that sucks for you [06:00] pen and paper coding... [06:00] I had like 10 minutes left on my laptop before it came back on [06:00] oh nice [06:00] couldn't do much without coffee though ;p [06:00] haha [06:01] caffeine dependence is the price for flavoursome mornings === vladk is now known as vladk|offline [07:58] fwereade: i'm off to soccer. here's a small mp that fixes a critical 1.17.7 issue to do with local provider logging and upstart config https://codereview.appspot.com/78730043 [08:05] wallyworld: what makes you think it's incorrect? thumper found that rsyslog has an apparmor profile that only allows it to write in /var/log/... [08:05] * axw plays with it [08:05] axw: the symlink was wrong [08:06] the branch fixes it [08:06] ok, I'll take a look. [08:06] the symlink in ~/.juju/local/log didn't point to /var/log/... [08:07] the upstart file was also wrong so that a reboot didn't restart the local machine agent [08:07] * wallyworld -> soccer [08:07] later [08:15] mornin' all [08:20] morning === vladk|offline is now known as vladk [08:34] good morning [08:42] morning vladk [09:05] waigani: I've already done the local provider Destroy vs. broken environments (except for one last thing which I'm doing now), so reassigning the card to myself [09:06] kk [09:35] hello [09:36] * davecheney waves to wwitzel3 [09:38] hey wwitzel3 [09:43] davecheney: did you got to the bottom of why local on ppc doesn't survive reboot? [09:56] axw: not yet [09:56] it's hard to reproduce the problem [09:57] it might be juju-mongodb [10:03] davecheney: ok, just wondering if it was connected to the bug wallyworld raised [10:04] (#1295501) [10:04] <_mup_> Bug #1295501: local provider upstart script broken [10:24] mgz, hey [10:25] mgz, how is it going with the state changes? [10:27] good morning [10:27] fwereade, hey [10:28] dimitern, heyhey [10:28] fwereade, re state changes for vlans [10:28] dimitern, yeah [10:28] fwereade, you're thinking of having 2 new collections - serviceNetworks and machineNetworks? [10:28] fwereade, or the latter will just be a couple of fields in the machine doc? [10:28] dimitern, yeah -- serviceNetworks needs NoNetworks, machineNetworks just needs Networks [10:28] morning perrito666 [10:29] dimitern, I'm generally against extending entity documents [10:29] dimitern, we have a history of screwing up watcher behaviour by doing so [10:29] fwereade, why just networks for machines? 
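The reboot test being asked about is essentially this, run on the host carrying the local provider (environment name is whatever yours is called):

    sudo reboot
    # once the host is back, upstart should have restarted the juju jobs on its own
    sudo initctl list | grep juju    # expect juju-db-<user>-<env> and juju-agent-<user>-<env> to be running
    juju status -e local             # fails with a connection error if machine 0's agent or mongo didn't come back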
[10:29] fwereade: turns out the rsyslog thing has actually changed, apparently restore is broken, but since tests didn't pass before we did not know [10:29] fwereade, we should list both included and excluded ones i think [10:29] axw: thanks for the mail :) [10:29] dimitern, because the machine stuff is the record of reality, while the service stuff is the specification [10:30] dimitern, it's like hardware characteristics vs constraints [10:30] fwereade, so to get both we need to fetch the service's excluded networks and use the machine's included networks [10:31] perrito666: no worries [10:31] dimitern, ah-ha, thank you, something has crystallised in my mind [10:31] fwereade, yeah? [10:31] dimitern, so, looking forward, we'll want to be able to add machines with net/nonet specifications [10:31] fwereade, not so forward even [10:32] fwereade, i thought that was one of the basic features we're aiming to have for maas [10:32] dimitern, at the moment we take that info purely from the assigned units [10:32] dimitern, we kinda elided that in favour of servce-only [10:33] dimitern, but the forces in play are actually the same as for constraints [10:33] dimitern, are you familiar with machine constraints? [10:33] fwereade, hmm.. but how about the networker worker - where will it record what networks it started/not started for a machine? and where to get which ones to process in the first place? the service? [10:33] fwereade, not that much [10:34] dimitern, ok, so when we create a new machine for a unit to live on, we record the constraints in play (env/service combination) and subsequently use those when provisioning the machine [10:35] dimitern, this is a bit different to the model we thought we'd have for networks, but it shouldn't be, I think [10:35] fwereade, yeah - so we compute the effective set and save it with the machine [10:35] dimitern, exactly [10:35] dimitern, same deal [10:35] dimitern, and this means that it's trivial to create a machine without units, but with net/nonet specification, and store that directly [10:35] dimitern, so in fact my "call it serviceNetworks" thing on mgz's review was wrong [10:36] fwereade, why would you do that? [10:36] fwereade, net/nonet spec should always go with a service [10:36] fwereade, well... except in case "i know what i'm doing, just give me a machine like that" [10:36] dimitern, people do sometimes like to create machines ahead of time and leave them idle, even if they know what they will want to do with them in future [10:36] dimitern, yeah [10:37] fwereade, why was your comment about serviceNetworks wrong? [10:37] dimitern, because we need to store net/nonet data for both services and machines [10:38] fwereade, true [10:38] dimitern, (mgz, perrito666): and this means we want globalKey ids, not serviceName keys [10:38] fwereade, what will the key be? either serviceName or machineId ? [10:38] dimitern, (mgz, perrito666): and *then* subsequently we want to store, separately, what-the-machine-actually-got, analogous to HardwareCharacteristics [10:38] dimitern, it should be the entity's globalKey [10:39] dimitern, like constraints/settings/any other collection that assoicates with multiple entity types [10:40] fwereade, i see [10:40] fwereade, sgtm [10:41] fwereade, i'll look some more into constraints / hardwarecharacteristics [10:46] dimitern, fwereade, mgz: standup? [10:46] wallyworld: ^ [10:47] jam, ^^ [10:48] oh, he's probably off today right [10:52] hey, can I bzr switch to a tag? 
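As an aside, the constraints analogy fwereade keeps returning to is already visible from the CLI, and it is roughly what the networks spec is meant to mirror: a specification stored against the service (or against a unit-less machine created ahead of time), with what the machine actually got recorded separately. The flags below are from memory of the 1.17/1.18 CLI, so treat them as a sketch:

    juju set-constraints --service mysql mem=4G           # the specification, stored against the service
    juju add-machine --constraints "mem=8G cpu-cores=2"   # or stored directly against a machine created ahead of time
    juju get-constraints mysql                            # reads the spec back; what the machine actually got is
                                                          #   reported separately as its hardware characteristics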
[11:14] natefinch: I merged in trunk and fixed some conflicts on my copy of 030-MA-HA and I'm pushing that up now [11:15] natefinch: I need a copy of your latest fixes though, I don't have the environment fixes [11:22] rogpeppe: grabbing coffee, stretching my limbs and returning === vladk is now known as vladk|lunch [11:26] natefinch: actually you should probably merge your stuff with trunk before pushing it up, since you will actually have all the changes and I'll just have to merge again when I pull your branch [11:27] wwitzel3: ok === psivaa_ is now known as psivaa [11:54] axw: you still around? [12:05] wallyworld: I am now [12:05] axw: the reason the upstart script failed is because the output file is is being redirected to doesn't exist [12:06] jam had the same problem [12:06] wallyworld: why doesn't it exist? [12:06] cause local provider creates a log file elsewhere [12:06] not in /var/lib/juju [12:06] /var/log [12:07] my changes fix the issue and also repair the brokem symlink [12:07] wallyworld: see my comment in launchpad - it's not broken on my system. so there's something more at play here [12:07] ie all-machines.log -> /var/log/juju-ian-local/all-machines.log [12:07] wallyworld, it /var/log/juju- full path: /log/all-machines.log -> /var/log/juju-ian-local/all-machines.log [12:08] the above was broken [12:08] wallyworld: machine-0.log is written into ~/.juju/local/log. all-machines.log is written into /var/log/juju- [12:08] wallyworld: and symlinks in either direction [12:09] yes [12:09] mgz, i'll take over your lp:~gz/juju-core/networks_state_doc branch to finish it off and land it, cause it's blocking 2 of my other branches [12:09] i agree with the first bit [12:10] rsyslog writes to /var/log/juju-blah/all-machines.log [12:10] dimitern: it's done, I'll land [12:10] the symlink in juju local log points to that [12:10] mgz, ah, good morning :) [12:11] before my changes the symlink pointed somewhere that didn't exist (can't recall where now) [12:11] well that is not the case on my machine [12:11] and my changes also fix the broken upstart script [12:11] I don't know what you mean about the upstart script being broken [12:11] wonder why it's differnet for you? it was broken for me john dave [12:11] that is also not broken on my machine [12:12] dimitern: hm, I need an lgtm still [12:12] the upstart script is broken because it redirects output to a non existant log file [12:12] hence it fails [12:12] and machine agent doesn't start [12:12] mgz, dimitern: can you scroll back? I diuscussed some stuff that needs to be a touch different with dimitern [12:12] I don't know, but we need to get to the bottom of it. because the change you proposed will break in another way when we do the agent.conf LogDir->rsyslog worker change [12:12] mgz, can you propose your changes from last reviews, i'll take a look [12:13] mgz, if you would land that joyent branch instead I would be most grateful [12:13] wallyworld: right, so the same issue in both cases [12:13] wallyworld: i.e. the root cause is that the log file doesn't exist [12:13] mgz, yeah, and look at the scrollback [12:13] wallyworld: sounds suspiciously like permissions. did you change root-dir at all? [12:13] k [12:13] not that i know of [12:14] morning all [12:14] morning [12:14] fwereade: ah, okay [12:14] axw: i'm not recalling the rsyslog worker change you mention [12:14] wallyworld: is this on your machine, or on ppc? [12:14] my machine [12:14] er... 
so, I'm not sure how to untie it from servie [12:14] wallyworld: the bit about propagating MachineConfig.LogDir -> agent.Config -> worker/rsyslog [12:14] i think it failed for dave on ppc, not sure [12:14] wallyworld: atm worker/rsyslog hard codes /var/log/juju- [12:15] ok, i didn't realise we were doing that [12:15] wallyworld: it will need to, to support debug-log on local [12:15] makes sense to change it [12:15] axw, actually it's worse - it hardcodes agent.DefalultLogDir, which is /var/log/juju/ [12:16] dimitern: yeah, I think log/syslog tacks on the namespace [12:16] * axw checks [12:16] so, weird that it works for some but not others it seems [12:17] mgz, i have some ideas and we discussed the way forward with fwereade, so if you're willing, just propose what you have so far and I can take it over and finish it, so you can do the joyent change [12:17] fwereade: in fact, apart from undoing hte renaming I've done, I'm not sure what you want, it already uses the same keys as constraints etc [12:17] axw: so this was the upstart bit that was wrong [12:17] /home/ian/jujulocal/tools/machine-0/jujud machine --data-dir '/home/ian/jujulocal' --machine-id 0 --debug >> /var/log/juju-ian-local/machine-0.log [12:17] machine 0 log should not be in /var/log [12:18] it is in juju local log [12:18] wallyworld, it's not, it's symlinked there from local log dir [12:18] that's what my change does [12:18] wallyworld: right, but there's a symlink... [12:18] not on the broken systems [12:18] the only symlink i had/have is the all machines one [12:18] which was also wrong [12:19] wallyworld, and it really should be in /var/log/juju-/, because rsyslog have access only to /var/log/ [12:19] wallyworld: so you have stuff going to /home/ian/jujulocal rather than /home/ian/.juju/local, which suggests root-dir has been changed [12:19] wallyworld: I'll see if that has anyhting to do with it [12:19] yep [12:19] i did change root dir [12:19] but john didn't [12:19] and it broke for him the same way [12:20] wallyworld, was there 1.16->1.17 upgrade involved with this local env my any chance? [12:20] no [12:20] s/my/by/ [12:20] i just started a local provider from turnk [12:20] ah, ok [12:20] fwereade: the joyent storage branch has conflicts [12:21] wallyworld: still works when I do that... [12:21] hmmm === vladk|lunch is now known as vladk [12:21] * wallyworld shrugs [12:21] dimitern: axw: i just got back from soccer and need to go eat. i'll bbiab [12:21] the bigger issue is I'm not sure if dstroppa wants the move of gojyent in or not [12:21] question [12:21] juju-mongodb is broken on 14.04 [12:21] wallyworld: nps, I'll keep investigating [12:21] I'm thinking of building from source just so I can make use of my workstation [12:22] I guess I just try to land as is, and we can fiddle with deps later [12:22] do I need to remove juju-mongo from my path until things are cleared up? [12:23] rather, juju is broken wrt juju-mongo... [12:24] fwereade: I am unable to replicate bug 1294776 on my local MAAS even after upgrading the provider and node to 14.04 [12:24] <_mup_> Bug #1294776: No debug-log with MAAS, 14.04 and juju-core 1.17.5 [12:24] jamespage: ^ [12:24] dimitern: btw, worker/rsyslog does do the appending of namespace to logdir (at the bottom of newRsyslogConfigHandler - a little bit non-obvious) [12:25] wwitzel3, I see that on openstack as well btw [12:25] just updated that bug [12:27] jamespage: did you see my comment? 
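The checks axw is about to ask for amount to something like this on the affected machine (the exact rsyslog.d file name juju writes is from memory, hence the grep rather than a full path):

    dpkg -l rsyslog-gnutls           # juju forwards agent logs over TLS, so this must be installed
    ls /etc/rsyslog.d/ | grep juju   # the snippet juju writes to ship logs to the state server
    sudo ls -l /var/log/juju/        # on a normal (non-local) provider the aggregated all-machines.log lands here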
[12:27] axw, it does have rsyslog-gnutls installed [12:28] (the paste was from 1.17.6 openstack deploy) [12:28] jamespage: thanks, hadn't seen the paste [12:29] axw, only just did it :-) [12:29] jamespage: and the ls -l please, if you didn't see that [12:29] * jamespage is testing 1.17.6 prior to asking the release team for an ack on an upload [12:30] axw, that paste is foobar [12:30] (the apt-cache policy one) [12:30] wrong box [12:30] heh, so I see [12:30] axw, pasted right one this time! [12:31] hrm, ok, doesn't really shed any light unfortunately [12:33] wallyworld: when you get back, can you please destroy your env, remove any /var/log/juju*, remove ~/jujulocal, and bootstrap with --debug? [12:34] then pastebin that and ~/jujulocal/log/cloud-init-output.log [12:34] wallyworld: just gonna dist-upgrade and see if that's it [12:34] jamespage: thanks for the update [12:35] wwitzel3, np - if you want me to poke anything else just ping me here - don't always look at bugs straight away [12:40] wallyworld: no difference for me after dist-upgrading, so it's not a new policy... [12:40] axw: trying again now, just deleting stuff [12:44] axw: deleting all that stuff and trying again, seems to have fixed it. there's now a /var/log/juju-ian-local/machine-0.log symlink and the all machines one is correct [12:44] must have been left over root stuff [12:44] hmm [12:44] stange [12:45] but bad that we didn't error [12:45] strange even [12:45] i think we need to do a local 1.16 or 1.17 set up and destroy and then try again with 1.18 [12:45] I'd like to repro... there's obviously something that needs to be fixed [12:45] ok [12:46] I'll try that in a bit [12:46] ok, i'm a bot too weary to do much more tonight [12:46] bit [12:46] nps, I will dig in and let you know how I get on [12:47] ok, i'll update the bug [12:52] confirming juju-core works on 14.04 with source build of mongodb [12:57] wallyworld, axw, btw I have this handy cleanup-juju script to obliterate mercilessly any remnants of a local environment: http://paste.ubuntu.com/7130453/ (call as sudo cleanup-juju and change localenv to your envname - preferably unique for pgrep's sake, and obviously dimitern to your username) [12:58] cool, will be handy thanks [12:59] sometime you need to run it twice in a row if the agents/mongodb are doing stuff to clean up all [12:59] thanks dimitern [13:00] * dimitern thinks it's about time for another "snippet" type blog post [13:16] gah, damn criss-crosses [13:17] I think I;m screwed [13:19] lunch [13:21] mgz: if you have a moment at some point, i'd really like to find out why the 'bot keeps getting "lock held" messages, which break the config-changed hook [13:21] mgz: sample message: Unable to obtain lock held by go-bot@bazaar.launchpad.net on taotie (process #21172), acquired 4 seconds ago. [13:21] <_mup_> Bug #21172: libnotify pops balloons on top of fullscreen window (e.g. screensaver ) [13:21] rogpeppe: because it's doing things [13:21] we talked about this the other day, you can manually take the lock on the machine, but it should mostly be harmless [13:24] fwereade: so, I;m being partly hosed on the joyent storage landing because ian/daniele landed some of the history, without the changes, as an unrelated bug fix in r2401 [13:24] which is requiring some major unpicking [13:24] mgz, ouch [13:25] mgz, btw, anything significant in the call yesterday? 
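dimitern's paste isn't reproduced here, but the sort of scorched-earth cleanup it performs looks roughly like the sketch below; it is only a sketch, and as he says it relies on the environment name being unique enough for pkill -f to be safe:

    #!/bin/sh
    ENV=localenv                            # change to your local environment's name
    sudo pkill -f "jujud.*$ENV"             # machine and unit agents (matched via their --data-dir)
    sudo pkill -f "mongod.*$ENV"            # the environment's mongod (matched via its --dbpath)
    for c in /var/lib/lxc/*"$ENV"*; do      # containers are named <user>-<env>-machine-N
        [ -d "$c" ] && sudo lxc-destroy -f -n "$(basename "$c")"
    done
    sudo rm -f /etc/init/juju-*"$ENV"*.conf # leftover upstart jobs
    sudo rm -rf /var/log/juju-*"$ENV"*      # rsyslog's copy of the logs
    rm -rf ~/.juju/"$ENV"                   # the environment's data dir, at its default location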
[13:25] no, didn't happen as neither antonio or the other joyent guy could make it [13:25] wanted to catch up with daniele but missed him === psivaa is now known as psivaa-afk [13:40] jamespage, do you have any advice for Bug #1295609 [13:40] <_mup_> Bug #1295609: Unable to bootstrap local provider, missing juju-mongodb dependency [13:41] I was just hit by that after the last update - had to install juju-mongodb to be able to bootstrap [13:42] dimitern, I was misguided. I restored the previous packaging rule believing juju *preferred* juju-mongodb, but would fall back to mongodb-server [13:42] Without fallback we need to create different packaging rules for the series. [13:43] sinzui, yeah seems so [13:43] well, maybe jamespage has a clever idea [13:59] sinzui, I've switched to have only juju-mongodb in the packaging to enforce the dependency [13:59] but I've not got a release team ack on that yet [14:00] wwitzel3: I made the fix William suggested in the meeting and I merged from main, rerunning the tests now [14:00] jamespage, you did that for just the trusty packaging? [14:00] sinzui, yes [14:00] natefinch: great, is that pushed up? [14:00] sinzui, I've not got an ack for the upload just yet [14:00] hmm, in my case mongosb-server would be listed as removable, but I shouldn't do that if I have an env up [14:01] sinzui, obviously that breaks on older releases so its not a clean backport [14:01] wwitzel3: the fix william suggested is (though there's one syntax error that I missed somehow.... but it'll be obvious). After the merge I'm getting compile errors in the build, so I'm figuring those out now [14:01] sinzui, indeed it would [14:01] jamespage, I can update the packaging in a few hours for trusty users of the ppa. [14:02] sinzui, OK _ hoping for an ack on the 1.17.6 upload [14:02] I tested on local, maas and openstack OK [14:03] jamespage, oh, I was pondering using this on HP cloud to create a 5 node maas for Juju CI http://manage.jujucharms.com/~virtual-maasers/precise/virtual-maas [14:03] jamespage, Do you think it would work? [14:03] sinzui, ah - we might have a good plan on that [14:03] sinzui, rharper in my team has hacked together a openstack integration for MAAS [14:04] sinzui, so he can have MAAS control openstack instances that netboot and install like regular nodes [14:04] jamespage, He has been updating http://manage.jujucharms.com/~canonical-ci/precise/virtual-maas [14:05] sinzui, yeah - he started looking at that stuff [14:05] sinzui, and then he and smoser hacked on this approach as well [14:05] lemme ping him [14:05] sinzui, the nice thing about this is we could just charm it all up and deploy multiple different series on serverstack as test environments [14:06] sinzui, hp cloud doesn't expose nuetron does it? [14:06] the limiting factor for all of this is that you have to get a second network from the primary. [14:06] because you're not going to successfully run a dhcp server on your primary interface. [14:07] rogpeppe: what's with this error? It's not the normal syntax error string... there's no filename or anything: src/launchpad.net/juju-core/cmd/juju$ go test [14:07] # testmain [14:07] launchpad.net/juju-core/testing/testbase.PatchEnvPathPrepend(0): not defined [14:07] type.launchpad.net/juju-core/testing/testbase.Restorer(0): not defined [14:07] FAIL launchpad.net/juju-core/cmd/juju [build failed] [14:07] it will work on canonistack or server stack, but i could'nt find a public cloud that exposed both neutron and used kvm. 
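natefinch's "not defined" failures above are the classic symptom of stale compiled packages under $GOPATH/pkg after a big merge — the "delete the pkg dir" fix axw passes on. The blunt version:

    # remove the stale archives and rebuild from source; the pkg subdirectory name depends on your platform
    rm -rf "$GOPATH"/pkg/*/launchpad.net/juju-core
    go install launchpad.net/juju-core/...
    go test launchpad.net/juju-core/cmd/juju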
[14:07] perhaps you could make this work on xen hvm (rackspace). [14:07] smoser, yeah, the network has been a factor in other things I need to test. [14:08] natefinch: waigani was getting that, had to delete the pkg dir [14:08] axw: I thought it might be something like that. Thanks. [14:08] nps [14:09] but what rharper has done is really REALLY cool. [14:09] if we could get a public cloud that had kvm and neutron, then anyone with a credit card could trivially deploy maas [14:09] and it tests pretty much all of maas (from pxe boot to power control) [14:11] smoser, :/ so I probably need to wait, but with some research I might find a cloud that will work [14:11] smoser, I am not familiar with joyent's setup, might that be viable when they are available? [14:12] i went looking a few days ago. [14:12] and didn't find anything really. [14:12] the closest to having hte necessary pieces are EC2 and rackspace that i know of [14:12] and both of those are xen. [14:12] which adds a wrinkle. [14:12] and you can't specify '--kernel' on EC2, so thats another wrinkel [14:13] thank you very much smoser. At least know the criteria to pull this off. Not next week though as I hoped [14:13] * sinzui copies conversation [14:14] on a "full openstack" like you get on canonistack or server stack, its doable. [14:14] and really very cool to see. [14:24] mgz, you've got a review on https://codereview.appspot.com/77270046/ [14:24] jamespage: sinzui re:virtual-maas charm; I'm currently using the non-juju mode of it; I've got an update to setup-maas and pulled in a few scripts (partially based on the maas-to-libvirt-tools) to automate the creation/registration of the nodes [14:25] rharper, great news. That would be more helpful to me [14:26] rharper, I hesitate to run a 5 node vmaas on canonistack. I moved most of juju-ci off that cloud because we were using most of the resources [14:28] sinzui: I've got a deadline I'm working toward next week, but after that, I can push a branch against virtual-maas charm with the changes for review -- I have a branch, but not pushed against upstream maas; though I'm currently applying the nova power type patch in-line in virtual-maas charm since we're working with 1.4.X maas out of the cloud-archive === psivaa-afk is now known as psivaa [14:29] * sinzui nods [14:31] wwitzel3: well, frig, the fakeuploadtools thing was working before I merged from main :/ [14:33] natefinch: I'm in a similar boat, I can't bootstrap a 14.04 node on maas with trunk .. it was working before I pulled [14:33] wwitzel3: I just pushed up a merge from main. [14:33] wwitzel3: it doesn't fix anything, but at least we'll both be broken with the same code [14:34] natefinch: k, will pull it down .. is the fakeuploadtools broken in a test as well? or just when trying to use local provider? [14:39] wwitzel3: the fake upload tools stuff is not related to the local provider. There seemed to be some confusion about that in the standup. The local provider is broken. What was fixed yesterday was just being able to run the bootstrap tests we'd been working on. Of course, now those tests are broken again [14:41] natefinch: ok, that was my fault [14:42] ah I finally caught the bug :D [14:42] rogpeppe: have a sec? [14:42] natefinch: I will take a peak at these tests, see what I see [14:49] wwitzel3: there's a panic in bootstrapsuite not being able to find the tools , even though the first thing we do is upload the tools. I tested that before I merged from main and it worked. 
I don't know why it's different now [14:49] rogpeppe: part one of removing things we don't need to do now that we have synchronous bootstrap :) [15:39] natefinch: tried to do a bisect for more details, but sadly it doesn't know how to step through the commits that make up the merge, so it just tells me that r2288 is bad and gives me the diff between that and r2286 .. lol, not helpful [15:46] dimitern: on your review, I can revert the naming easily enough, but I think I need your help understanding some state sublties [15:46] if you have a mo to walk me through some things [15:52] dimitern: er, you porbably missed that, I'll requery [15:53] mgz, yep, my internet started acting funny [16:04] natefinch: fixed the BootstrapSuite [16:04] wwitzel3: wow sweet. What did you do? [16:05] natefinch: we were uploading the tools to newly created stor and then never doing anything with it. So I changed it to just upalod the tools to env.Storage() [16:06] natefinch: pushed [16:06] natefinch: that was a lie .. I can't type my password right .. pushing now [16:08] natefinch: ok, you can grab it lp:~wwitzel3/juju-core/030-MA-HA [16:08] wwitzel3: you shouldn't need to type your password every time.... I haven't typed mine in months... not even sure what my password is, to be honest [16:08] natefinch: i know, i know, I just haven't setup an agent yet [16:09] natefinch: I type mine every time, default ubuntu server install lacks an agent and I am too lazy to set up one [16:11] can't you just get it to use your launchpad key? maybe I set up an agent and it's been too long, I don't know [16:12] natefinch: I get a decent linux work machine this afternoon :) Ill take the time to set up my workspace correctly [16:13] natefinch: my lp key has a passphrase, I'm not typing my lp password [16:13] wwitzel3: oh, I see. Yeah, I don't think I put a passphrase on mine. Too lazy. [16:14] natefinch: ah, yes, same here than wwitzel3 [16:14] natefinch: we are differents kind of lazy [16:14] I only use my LP key for lp, so no one could do anything with it except commit as me ;) [16:15] wwitzel3: thanks for the fix. I must have messed it up when I converted to using upload tools, I swear it worked at one point, but maybe I'm just crazy :) [16:15] natefinch: could of been an artifact of the merge [16:15] eval `ssh-agent`; ssh-add [16:15] is not very hard... [16:16] * perrito666 crafts a particularly egregious commit that angers all of the most violent devs... using natefinch key [16:16] amazon restored [16:16] restore missing state server [16:16] fwereade: :D it worked [16:17] mgz: thanks, done :) [16:17] * fwereade cheers at perrito666 [16:18] natefinch: ok, so now it is just local provider? [16:31] dimitern, fwereade: state networks branch should be good to go, you may just want to look over last changes/comments [16:42] wwitzel3: sorry, had to go for a bit for some family stuff. back now. Yes, just local provider [16:42] dimitrin: do you have a moment to look over https://codereview.appspot.com/78660045/ ? [16:42] dimitern: do you have a moment to look over https://codereview.appspot.com/78660045/ ? [16:47] wwitzel3: well, I found the source of one bug, possibly several... 
environs/cloudinit had its own constant for the mongo service name [16:48] vladk, reviewed [16:48] wwitzel3: er environs/cloudinit.go [16:48] and this is why you only define constants in *one* place [16:49] wwitzel3: local provider was also creating its own mongo service name [16:50] wwitzel3: in provider/local/environ.go [16:50] wwitzel3: I haven't fixed anything yet, just finding some problems so far [17:01] natefinch: ok wil start poking around from there [17:14] wwitzel3: I'm working on basically removing all reference to MongoServiceName everywhere, no one needs to know it except the thing creating the upstart service. I added a RemoveService() function to the mongo package to replace one place where we were using the name outside that package [17:22] hey, could anyone with bash fu take a look at https://codereview.appspot.com/78870043 ? [17:22] thank you [17:28] perrito666: I do not fulfill the requirement :) [17:28] mgz: my fault, I just committed go code and yet I request a bash dev to check it :p [17:30] perrito666: sorry, my bash is possibly worse than my spanish, and I haven't taken spanish in 20 years. [17:30] we need scott or someone :) [17:30] natefinch: well I havent taken spanish in about 13 yrs and I am pretty good at it :p [17:31] living is south america is cheating [17:31] then again, living in the US should be cheating too [17:31] *in [17:31] perrito666: I can ask where the bathroom is, and that's about it. If bash involves more than calling commands and optionally piping into other commands, I'm out [17:32] well `find . -iname "bathroom"` [17:33] I never understood why they didn't write the command so I could just say find bathroom like a normal person. [17:33] or possibly find bathroom . [17:34] perrito666: looking [17:34] natefinch: normal people dont use find to find out where the bathroom is [17:34] perrito666: I've never been accused of being normal :) [17:52] perrito666: reviewed [17:52] natefinch: bash is a ridiculous shell [17:53] rogpeppe: lovely thank you [17:54] rogpeppe: this is why I don't write shell scripts. Why write in some wacky language when you can write in a real language? [17:54] natefinch: 'cos the basic shell syntax is almost perfect for human use [17:55] natefinch: and pipes are awesome [17:55] rogpeppe: yes, just humans who are not suitable for bash [17:55] rogpeppe: [[ $agent = unit-* ]] && [ -d "$agent/state/relations" ] does not do the same as my test [17:55] rogpeppe: the only part of the shell syntax that seems at all logical is pipes [17:55] some people practically only use bash [17:55] i only use rc [17:55] relations will exist, its the folders under it that wont [17:55] bodie_: some people sleep on beds of nails, too. [17:55] I worked at digitalocean with a guy from IBM who pretty much only knew Bash, but muddled along with other languages when necessary [17:56] i tried to reason with him [17:56] :( [17:56] perrito666: i think my test is equivalent to yours [17:56] perrito666: (your test only checked that ls could read the relations directory, AFAICS) [17:56] pretty much any time I need a conditional statement, I drop into a real language (python or go, pretty much, and my python's getting rusty) [17:57] who needs conditionals when you can have one-line perl maps from hell? heh [17:57] bodie_: I also never write regexes unless someone is twisting my arm [17:58] bodie_: which pretty much puts perl off limits, which is fine with me [17:58] rogpeppe: nope, ls -A folder/ lists all inside it excepting . and .. 
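The point rogpeppe is making is easy to see with an empty directory (unit-mysql-0 below is just an illustration of the agent-dir layout, and the sed expression is a placeholder):

    mkdir -p unit-mysql-0/state/relations
    # ls exits 0 on an empty directory, so "if ls .../relations/" passes even when there is nothing in it
    ls -A unit-mysql-0/state/relations && echo "original test passes here"
    # the suggested alternatives only fire when there really is something to edit
    ls -d unit-mysql-0/state/relations/*/* 2>/dev/null || echo "glob matched nothing"
    find unit-mysql-0/state/relations -type f | xargs -r sed -i 's/old/new/'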
[17:58] haha, avoiding regex is always good. [17:58] perrito666: yes; if the relations directory is empty, your test will succeed [17:59] perrito666: it's possible you want if ls -d $agent/state/relations/*/* 2> /dev/null; then [18:01] perrito666: or maybe go the same route as the other one and do: find $agent/state/relations -type f | xargs sed -i ... [18:01] yeah find is more elegant in that case too, thank you [18:12] voidspace: https://codereview.appspot.com/78890043 === vladk is now known as vladk|offline [18:28] EOW [18:28] g'night all, have a good weekend [18:38] wwitzel3: ug, accidentally deleted one extra line when I was removing references to MongoServiceName, and it took forever to figure out what was making things blow up in weird ways. [18:48] review appreciated, if anyone has some time to spare: https://codereview.appspot.com/78890043/ [18:49] and that's me for the week [18:49] happy weekends all! [18:50] rogpeppe: happy weekend [18:53] natefinch: I just got back from lunch, I dug a little bit on the local provider stuff before but no break throughs [19:04] wwitzel3: yeah, me either. I managed to break a lot of tests by changing the MongoServiceName stuff (mostly I think it's just that I need to mock out the new RemoveService()) method, since I'm getting access denied errors) [19:34] wwitzel3: so, it looks like it's hanging on the line where we create transaction log collection. [19:34] natefinch: ok [19:34] natefinch: we had a problem there before too, before we were initializing the replicaset properly [19:35] natefinch: probably related somehow [19:35] wwitzel3: yeah, that's my thought, I'm looking now to see if maybe the local provider is somehow missing that step === vladk|offline is now known as vladk [19:56] hi natefinch wwitzel3 , last juju devs awake. is rsyslog-gnutls a hard dependency for juju-local? [19:56] I ask because it does not exist for arm64 === vladk is now known as vladk|offline [20:00] wwitzel3: figured out at least part of it. THe logic in ensure mongo server was bailing out before it actually initiated the service [20:01] sinzui: with not great confidence, but based on my initial poking around, it looks like yes, it is a hard dep for juju-local [20:01] natefinch: I thought we told it to wait there? [20:02] natefinch: well, I guess .. can we tell it to wait there? :) [20:02] wwitzel3: if it's already installed, but not running, we just run it, but we don't check to see if it's initiated. I'm guessing the local provider is installing the service early or something [20:02] natefinch: ohhh [20:03] natefinch: good find [20:03] wwitzel3, natefinch I am also looking into packaging/publishing issues. The package might be available but the test images cannot see it [20:04] sinzui: we definitely try to install the gnutils, but I have no idea how critical their use is. Definitely things currently will break if it's not there [20:05] natefinch, I suspect the problem is ec2/ami. I don't think any of the universe packages are being seen [20:06] sinzui: ahh, huh. I have no idea if that's normal or not. [20:06] natefinch, nothing about the arm64 image is normal [20:07] hah [20:19] wwitzel3: I pushed up my code. it doesn't actually fix things yet, but I think it's better. Mongo doesn't seem to like localhost as its hostname in the replicaset, which is something I remember from when I was twiddling with replicasets locally earlier. 
I think it needs to use the local machine's hostname [20:20] natefinch: yeah you can only used localhost/127.0.0.1 if all of the members of the replicaset use localhost [20:21] wwitzel3: well, you'd think a replicaset of one would be ok with localhost then [20:21] natefinch: in theory it should be [20:22] natefinch: http://docs.mongodb.org/manual/reference/replica-configuration/#local.system.replset.members[n].host [20:23] wwitzel3: 2014-03-21 20:00:21 ERROR juju.cmd supercommand.go:300 failed to initiate mongo replicaset: couldn't initiate : can't find self in the replset config my port: 37017 [20:23] wwitzel3: I think I remember there needs to be a command line flag set for it to accept localhost [20:25] natefinch: can you see what host it is trying ti use? is it for sure trying to use localhost? [20:27] wwitzel3: double checked, definitely is "localhost" [20:27] natefinch: ok [20:33] natefinch: looks like the bind_ip would have to be 127.0.0.1 for the localhost in replicaset to work. [20:33] natefinch: so you're right, we need to use the actual machine hostname [20:35] wwitzel3: I tried setting the bind_IP to 127.0.0.1 too [20:36] natefinch: well that *should* have worked [20:37] natefinch: I was able to do it on my local machine that way anyway, which means squat [20:37] wwitzel3: the trick looks to be that you need to include the port in the hostname when you use localhost [20:37] wwitzel3: then bootstrap finishes on locval [20:37] natefinch: ahh, I did do that [20:38] wwitzel3: we were just passing in the hostname to initiate [20:38] natefinch: nice so local is worky now too? [20:38] wwitzel3: sort of, I hard coded adding the port to see if it would work, so I have to find a way in real code to do it. But at least we know what the fix is [20:40] wwitzel3: and bind_ip can stay 0.0.0.0, that works fine [20:41] wwitzel3: actually, not that hard. if address == "localhost", append :port. [20:42] natefinch: :) [20:46] wwitzel3: gah, bootstrap works but then juju can't connect to the API, so like juju status returns connection refused [20:46] wwitzel3: pushed, at least [20:47] natefinch: ok, I will take stab at it for a bit here [20:55] EOD for me. Have a good weekend everyone. [21:00] same here [23:43] Hello everyone, I just uploaded a fix for the password logging bug. I would appreciate if anyone would review it. https://code.launchpad.net/~jwharshaw/juju-core/fixlogbuild/+merge/211655
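The fix natefinch describes can be sanity-checked by hand against the local provider's mongod. Treat the invocation below as a sketch: 37017 and the "juju" replica set name are the local defaults of the time, and depending on how the juju-db job was started you may also need --ssl and admin credentials:

    # a bare "localhost" makes mongod look for itself on the default port 27017, so it refuses:
    mongo --port 37017 --eval 'rs.initiate({_id: "juju", members: [{_id: 0, host: "localhost"}]})'
    #   -> couldn't initiate : can't find self in the replset config
    # spelling out the port, which is what the code change appends, lets initiation succeed:
    mongo --port 37017 --eval 'rs.initiate({_id: "juju", members: [{_id: 0, host: "localhost:37017"}]})'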