[00:16] ok ruty ruty here how could I check the Juju syslog again on the command line ?
[00:26] Teranet: juju debug-log
[00:27] thx
[01:12] lazyPower: yea
=== scuttlemonkey is now known as scuttle|afk
[07:14] is there a way to stop the agent executing the charm upgrade process?
[07:14] I think I managed to create an infinite loop for my charm and now it is just processing the upgrade and I would like to fix that :)
[07:17] Apparently by killing the juju agent process which is handling the charm-upgrade
[07:25] Good morning Juju world
=== frankban|afk is now known as frankban
[08:20] huh. Nobody has written a passwordless ssh layer yet. I thought this would be required more often?
[08:40] what do you mean?
[11:22] hi, I'm using the canonical-kubernetes bundle charms and I noticed that the default Grafana/InfluxDB packaged in it does not seem to erase graphs from deleted Pods.. do you know where I can do some cleaning?
[11:22] automatic cleaning, I mean, if that's possible :)
[11:31] Zic: sounds like an action feature request. I can see the desire to keep some data around for historical context, but definitely also see the desire to clean stuff out.
[11:32] is there a way to react (set state) when a metric value is <> something?
[11:32] Zic: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues please file an issue here
[11:40] rick_h: thanks, I will
[11:40] rick_h: I also noticed a strange behaviour, don't know if it's a bug or normal: I don't see my newly created namespaces in Grafana's namespace list (dashboard: Pods), but if I enter the name of my namespace manually, it works and displays the proper data
[11:41] is it up to me to add the namespace to this drop-down list? or should it be auto-completed?
[11:43] the Pods drop-down list is auto-filled from what is running in the k8s cluster
[11:43] but the namespace list only contains the default and kube-system namespaces :o
[11:52] Zic: the grafana container is the one run from upstream gcr.io, we don't do much munging for that, that I'm aware of, but most of the k8s team is US based so it's early there
[11:55] ok, if the charms don't manage this area, I will try to poke them on their Slack :) thanks anyway
[12:27] marcoceppi_: what's the best place to get the kubectl for the bundle?
[12:27] marcoceppi_: have the core bundle on my maas and wanting to tinker around with it
[12:29] rick_h: read the readme ;)
[12:30] rick_h: https://jujucharms.com/kubernetes-core/#yui_3_17_1_2_1485260992490_80
[12:30] marcoceppi_: oh....I was looking at the readme and it had me get the config, I missed that it had me get the binary as well
[12:31] marcoceppi_: my bad, I'm reading the readme I swear :P
[12:31] rick_h: yeah, conjure-up makes it nice because it does that automatically, but for manual deploys we don't really have a way to `juju download` or fetch something
[12:31] rick_h: in the very near future you'll just snap install kubectl
[12:31] marcoceppi_: yea all good. I thought it was a snap but wasn't sure where/etc
[12:32] marcoceppi_: ty for the poke on looking harder
[12:32] bwuhahaha Kubernetes master is running at https://10.0.0.160:6443
[12:33] \o/
=== rogpeppe1 is now known as rogpeppe
[13:35] hellooooo from bluefin... bit weird sat at a desk in this place
[14:07] magicaltrout: oi, what are you up to in bluefin?
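For readers retracing the kubectl thread above (12:27-12:32): the kubernetes-core readme had you copy the admin kubeconfig (and, at the time, the kubectl binary) off the master unit by hand. A rough sketch of that flow, assuming the default kubernetes-master application name; the exact paths belong to the readme, not this log:

    mkdir -p ~/.kube
    # pull the admin kubeconfig generated on the master down to the local machine
    juju scp kubernetes-master/0:config ~/.kube/config
    # the snap marcoceppi_ mentions is the easier way to get the client itself
    sudo snap install kubectl --classic
    kubectl cluster-info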
=== scuttle|afk is now known as scuttlemonkey
[14:15] marcoceppi_: went to talk to Tom Calaway about joint marketing stuff for Spicule & Juju in 2017
[14:15] magicaltrout: awesome
[14:15] have a few hours to kill before the DC/OS meetup
[14:15] so sitting around taking up space
[14:16] well, enjoy o/
[14:40] marcoceppi_: can you dig out that big half pull request thing that existed for getting Centos support in Reactive?
[14:41] I want to improve the DC/OS charms, but I also want them to run on Centos because that's what they support, and not my half-bodged Ubuntu support
[14:42] i'm doing their office hours in a couple of weeks and it'd be a nice thing to discuss alongside generic Juju support
[14:44] Oh yeah, bogdan sent that in didn't he?
[14:44] dunno :)
[14:45] i think it withered on the vine, we left some feedback and it wasn't circled back.
[14:45] clearly i could write old school charms, but the hooks make me so sad :)
[14:45] but thats from memory so its likely incorrect
[14:45] i dont blame you magicaltrout, not one bit
[14:45] no its likely correct, marco showed me the PR ages ago and I said I'd take a look
[14:45] then didn't
[14:45] but i didn't really have a use case 6 months ago
[14:46] magicaltrout: it landed a while ago
[14:51] landed as in merged, or landed as in got pushed as a PR a while ago and is now very stale? marcoceppi_
[14:52] i suspect the latter, which is fine, I was just going to take the spirit of the commit and figure out what needs changing, refactoring and implementing
=== dannf` is now known as dannf
[15:22] magicaltrout: it's landed and released
[15:24] magicaltrout: we don't have centos base layers, or anything there, but charm-helpers is present with basic centos support: http://bazaar.launchpad.net/~charm-helpers/charm-helpers/devel/files/head:/charmhelpers/core/host_factory/
[15:25] oh nice
[15:25] interesting!
[15:25] magicaltrout - see? I told ya my brain was probably incorrect
[15:25] * lazyPower mutters something about scumbag brain
[15:27] alright then
[15:27] so i can take a stab at creating a centos base layer
[15:31] that would make the Mesos guys and the NASA guys happy
[15:31] * magicaltrout greps around in the code to see whats going on
[15:41] magicaltrout: yeah, I think we need to rename basic to ubuntu, tbh. Having a centos layer adds complexity; cory_fu and I chatted briefly about having the notion of a special "base" layer where layer:ubuntu and layer:centos wouldn't be compatible
[15:41] magicaltrout: then again, snaps will basically save us from ever needing to worry about distros again
[15:45] true that
[15:45] looking forward to snap integration
[15:46] i've cloned layer-basic locally, i'll create a layer-basic-centos to prevent clashes
[15:46] we have a snap layer that stub wrote, I know ryebot and Cynerva have taken a run at it a few times for kubernetes stuff
[15:46] i'll have to check that out when i get time... i'd still get the "does it run on centos" question though ;)
[15:47] SA's are a picky bunch... why can't they just get on with it?
[15:47] I don't care if it doesn't fit their nagios templates ;)
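For anyone who wants to see what "basic centos support" in charm-helpers looks like before attempting a centos base layer, the host_factory tree linked above can be pulled down and inspected; a quick sketch, where the lp: shortcut and the per-distro file names are assumptions rather than facts from this log:

    bzr branch lp:charm-helpers
    ls charm-helpers/charmhelpers/core/host_factory/
    # expect per-distro modules along the lines of ubuntu.py and centos.py,
    # which charmhelpers.core.host dispatches to based on the running distro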
[16:33] marcoceppi_: to save me digging all over the place
[16:33] do you know where you define the stuff that ubuntu needs to install when creating a new machine to run the charms, python3 etc
[16:35] bootstrap_charm_deps
[16:35] think i found it
[16:49] magicaltrout: yeah, it's in lib/layer/basic.py of the basic layer IIRC
[16:51] aye boss
[17:10] http://askubuntu.com/questions/875716/juju-localhost-lxd
[17:11] any ideas on this one?
[17:14] well
[17:14] you don't do intermodel relations yet
[17:14] so thats not going to fly very far currently
=== frankban is now known as frankban|afk
[17:39] magicaltrout - sounds like they want 2 controllers, so two completely isolated juju planes on a single host
[17:39] i'm not sure if we've even piloted that use case as a POC, as multi-model controllers removed a lot of the concerns here
[17:42] lazyPower: dustin swung by and asked about the DC/OS stuff, said he'd be happy to do some collaboration and pitch some talks at Mesoscon and stuff, now I'm off up to the Mesos meetup in London to go talk to the Mesosphere guys, maybe one day someone will offer to write the code as well! ;)
[17:42] magicaltrout - i'd love to do that integration work *with* you
[17:42] anyway the LXD poc is coming along slowly
[17:42] i think there's a good overlap in changes in k8s and mesos charms that will make it simple
[17:43] once we have something that semi works i'll get you guys and the mesosphere guys looped in to fill in the gaps
[17:43] you're just plugging in api ip's and what not, i think they are both rest based no?
[17:43] and some dinky config flags to replace the scheduler too, dont let me forget those important nuggets
[17:43] as a complete oversimplification of the problem domain ^
[17:44] you've lost me now
[17:44] what are you asking?
[17:44] oh were you not talking about k8s/mesosphere integration?
[17:45] lxd -> Mesos <-juju
[17:45] ^
[17:45] -
[17:45] k8s
[17:45] oic
[17:46] yeah i missed the mark there
[17:46] well the DCOS chaps are certainly interested in getting LXD into Mesos which would be a win because then juju can bootstrap against mesos
[17:46] you could run K8S on Juju within Mesos ;)
[17:47] and on your side Dustin says he's interested in getting Juju and DC/OS playing nicely which includes the LXD stuff
[17:48] but! when that starts coming together
[17:48] i shall be stealing your flannel network and stuff
[17:50] not that i have a clue how to monetize such a platform, but in my head juju-managed mesos and juju-bootstrapped mesos sounds pretty sweet
[17:52] i concur
[17:52] maybe the monetization is the support of such a platform and not the tech itself
[17:52] not that anybody we know does that *whistles*
[18:34] kwmonroe: if you have a sec I am seeing some curious behavior with spark via the hadoop-spark bundle + zeppelin
[18:34] kwmonroe: doesn't seem like sparkpi is running successfully per the action output http://paste.ubuntu.com/23859116/ I also don't see the job in the spark job history server @ http://54.187.77.213:8080/
[18:39] I do see "2017-01-24 17:28:22 INFO sparkpi calculating pi" in the /var/log/spark-unit logs though . . .
[18:44] arosales: i'm not surprised at the lack of spark info in the spark job history server.. you're probably in yarn-client mode, which means the job will log to the resourcemanager (expose and check http://rm-ip:8088 to be sure)
[18:45] arosales: in your pastebin, search for "3.14", you'll see it.
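A sketch of the "expose and check rm-ip:8088" step kwmonroe suggests above, assuming the hadoop-spark bundle names its YARN application resourcemanager (check juju status for the real name in a given deployment):

    juju expose resourcemanager
    juju status resourcemanager        # note the unit's public address
    # then browse to http://<public-address>:8088 for the YARN applications view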
[18:45] granted, that action output needs some love.. no sense in having all the yarn junk in there
[18:47] kwmonroe: I do see jobs @ http://54.187.77.213:18080/ though
[18:47] and it looks like pagerank ran ok, just issues with sparkpi
[18:49] arosales: i see 3 sparkpi jobs at your link... did you run some after the pagerank that are no longer showing up?
[18:49] I did
[18:50] kwmonroe: resourcemanager = http://54.187.130.194:8088/cluster
[18:52] arosales: spark 8080 is the cluster view which (i think) will only show jobs when spark is running in spark mode (that is, not yarn-* mode). spark 18080 is the history server that will show jobs that ran in yarn mode -- yarn 8088 also shows these.
[18:52] kwmonroe: so it looks like it ~is~ running, just the raw output for the sparkpi action may need a little tuning?
[18:53] kwmonroe: sorry, and to your previous question if I am _not_ seeing jobs I submitted --- I think I am seeing all the jobs @ http://54.187.77.213:18080/ and http://54.187.130.194:8088/cluster
[18:55] yeah arosales, from those URLs, it looks like you're seeing all the spark jobs (sparkpi + pagerank) on the spark history server (18080), and all the spark *and* yarn jobs (sparkpi + pagerank + tera*) on the RM (8088)
[18:56] kwmonroe: was http://paste.ubuntu.com/23859116/ the output you were expecting for sparkpi ?
[18:56] to your point "3.14" is in there, just amongst a lot of other data
[18:57] so arosales, i misspoke in my 1st reply. to be clear, you should see all spark jobs in the spark history server (18080) and resourcemanager job history (8088). you will *not* see job history in the spark cluster view (8080) while in yarn mode.
[18:57] kwmonroe: ack
[18:57] yup arosales, as long as it says "pi is roughly 3.1", we promulgate ;)
[18:57] because ovals are basically circles
[18:58] kwmonroe: I was just concerned about the 153 lines of output to tell me a circle is like an oval
[18:58] :) ack arosales, i'll see if we can clean up that raw output without hiding the meat.
[18:59] * arosales will submit a feature on the spark charm ;-)
[18:59] Just glad its working as expected
[18:59] kwmonroe: thanks for the help
[19:01] np arosales - thanks for the action feedback! if it ever returns "Pi is roughly a squircle", let me know asap.
[19:02] kwmonroe: I'll note that for the failure scenario
[19:04] marcoceppi_: You ready in about an hour ? (#Discourse)
[19:12] arosales: one more thing.. if you've still got the env, would you mind queueing multiple sparkpi/pagerank/whatever actions in a row? i noticed sometimes the yarn nodemanagers would go away if memory pressure got too high, and yarn would report multiple "lost nodes" at http://54.187.130.194:8088/cluster. so if you don't mind, kick off a teragen, sparkpi, and pagerank action at the same time so i can watch.
[19:13] iirc, sometimes the RM would wait for resources before firing off those jobs, but sometimes not. curious if that's easily recreatable.
[19:23] kwmonroe: gah, missed this message before I tore down :-/
[19:23] kwmonroe: easily reproducible though
[19:25] np arosales -- i've been meaning to get to the bottom of that. i'm like 25% sure we fixed it when we disabled the vmem pressure setting. i've got a deployment in my near future, so i'll check it out. just one of those things i keep meaning to try but keep forgetting until someone like you comes along.
[19:25] kwmonroe: np, spinning back up. I'll let you know what I find
[19:26] thanks!
[19:26] CoderEurope: yup!
[19:27] marcoceppi_: On here ? or do you want me to keep PM'ing you ?
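For reference, the sparkpi run being debugged above maps onto Juju 2.x actions roughly as follows; the spark/0 unit and sparkpi action names are taken from the conversation, and the id passed to show-action-output is whatever run-action prints back:

    juju run-action spark/0 sparkpi
    # -> Action queued with id: <uuid>
    juju show-action-output <uuid>    # full results, including the "Pi is roughly 3.1..." line
    juju show-action-status           # summary of queued/running/completed actions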
[19:28] Back in ten mins
[19:46] marcoceppi_: 15 to go ....... :)
[19:50] CoderEurope: heh, you're on a mission eh?
[19:54] rick_h: The weather is with us tonight ! https://www.youtube.com/watch?v=z3TNGDVOMA4&feature=youtu.be
[19:54] CoderEurope: oh man, did not see that coming
[19:55] rick_h: I have ordered one of these in my budget: http://amzn.eu/e93uNrY
[20:02] kwmonroe: ok got 20+ actions running @ http://54.202.97.95:8088/cluster from resourcemanager and spark
[20:04] marcoceppi_: you ready ?
[20:04] kwmonroe: ref of actions running = http://paste.ubuntu.com/23859496/
[20:07] I have a nose bleed - hang on marcoceppi_
[20:08] whoa holy actions arosales
[20:09] kwmonroe: wanted some mem pressure
[20:09] :-)
[20:09] marcoceppi_: Okay at the ready :D
[20:12] CoderEurope: yo, lets do this!
[20:14] kwmonroe: also in ref to the spark sparkpi action https://issues.apache.org/jira/browse/BIGTOP-2677 (low priority)
[20:28] +100 arosales! thanks for opening the jira.
[20:29] ugh, i missed an opportunity to +3.14
[20:29] kwmonroe fun fact, my apartment number is pi
[20:29] you must have a very long apt number lazyPower
[20:29] its a subset of pi, but pi all the same
[20:30] 3? close enough. r square it.
[20:30] :P
[20:36] kwmonroe: let me know if you need me to run any more tests on this hadoop/spark cluster
[20:37] so arosales, you've hit it. see your cluster UI http://54.202.97.95:8088/cluster.. 3 "lost nodes".
[20:38] * arosales looks
[20:38] arosales: what i'm waiting to see now is whether or not yarn will recover and process the last running job if/when the nodemgrs come back.
[20:38] ah http://54.202.97.95:8088/cluster/nodes/lost -- kind of hidden
[20:38] kwmonroe: ack, I'll let it run
[20:38] marcoceppi_: Just to keep you in the loop - my chromebook just crashed - 2 mins till re-surface on meet.jit.si
[20:47] arosales: what has happened here is that each of your nodemgrs is only capable of allocating 8gb of physical ram. when yarn schedules mappers or reducers (which take min 1gb) on a nodemgr that already has a few running, it can "lose" those nodes. i don't know the diff between an "unhealthy" node and a "lost" node, but i think the latter can come back once jobs complete. what's interesting to me is that you got through 19 of the 20+ jobs. surely that's not a coincidence.
[20:48] at 8gb per nodemgr and 3 nodemgrs we should expect to see around 24 jobs at least allocated, correct?
[20:50] arosales: i would expect 24 jobs *possible* because the min allocation to each nodemgr is 1gb. but jobs may specify mem constraints that differ from the min. at any rate, 73 of your 74 jobs have completed, which makes me think those nodes get lost and come back once they're freed up -- like "i'm busy, don't allocate to me anymore".
=== frankban|afk is now known as frankban
[20:51] arosales: and now i see 82 jobs. you're not giving this thing a break, are you :)
[20:52] i mean, "big data" is usually just a handful of word count jobs.. #amirite lazyPower?
[20:52] kwmonroe: just for my education, once a node is marked as lost does it return? Seems the three @ http://54.202.97.95:8088/cluster/nodes/lost haven't been "returned"
[20:52] trufacts
[20:53] but you were probably looking for the reaction: (ノ´・ω・)ノ ミ ┸━┸
[20:53] kwmonroe: I stopped submitting a bit ago, but the submitted jobs are at http://paste.ubuntu.com/23859496/
[20:53] marcoceppi_: Awesome and valuable work there ... high five o/
[20:53] \o
[20:55] * CoderEurope leaves till tomorrow - bye y'all.
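If you'd rather see the active/lost node accounting from the command line than the :8088 UI, YARN's own client can report it; a sketch, assuming the resourcemanager unit name and that the yarn CLI is configured on its path:

    juju ssh resourcemanager/0 'yarn node -list -all'
    # node states include RUNNING, UNHEALTHY and LOST, matching the counts on the cluster page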
[20:55] arosales: i don't know for sure (if lost nodes return). i think they do once they have > min phys mem available. i have hope. when we started this convo, you had 73 of 74 jobs completed with 3 nodes lost. now you have 89 of 90 jobs complete. i think that means that your job completion rate has lengthened, but they do appear to be completing eventually.. i suspect that's because a lost node comes back to grab another task.
[20:56] yeah arosales, for sure that's what's happening. the terasort job that was running with 3 lost nodes has completed. now 3 are still lost, but a new app (nnbench) is going.
[20:56] so all is well. your cluster is just kinda small for 90+ concurrent jobs :)
[20:58] kwmonroe: thanks for the info, and here with the hadoop-spark bundle we have 3 separate units hosting the slave (datanode and node manager), correct?
[20:58] correct
[20:59] and how do we map those units to the lost nodes (http://54.202.97.95:8088/cluster/nodes/lost)
[21:01] arosales: they are the same. your 3 lost nodes are the 3 slave units.
[21:02] so, what makes a node lost? a failed health check, or a violation of constraints (like < min memory available). the former means the nodemanager slave unit hasn't told the RM that it's available for jobs; the latter means it can't take jobs because it doesn't have enough resources available.
[21:02] that was what I thought, but then I was wondering where the 3 lost nodes (http://54.202.97.95:8088/cluster/nodes/lost) were in reference to units
[21:03] logically the 3 lost have different node addresses than the 3 active nodes
[21:03] negative arosales - the node address is the ip of the slave unit.
[21:04] yes, you are correct. Same address, different ports
[21:04] which makes me think we start a new process on the unit for a new node
[21:05] _if_ we started with 3
[21:05] yup arosales -- when the RM has a new task, it farms it out to the slaves / nodes and they spawn a new java (hi mbruzek), which opens a new port.
[21:05] yes Kevin?
[21:06] just java buddy
[21:06] OK
[21:06] kwmonroe: gotcha -- cool, thanks for the lesson here
[21:07] kwmonroe: no pending actions on the juju side -- all have completed
[21:08] ack arosales... now let's watch and see if the nodes come back in the cluster view now that the jobs are done.
[21:08] health checks happen every 10ish minutes
[21:08] which should bring them back if there are no jobs
[21:09] * arosales will eagerly await kwmonroe
=== alexisb is now known as alexisb-afk
[21:31] arosales: your cluster ui (:8088) shows 3 active and 3 lost nodes. i'm not sure how to reset the lost count without restarting yarn, or even if the 'lost' count is all that important given the expected nodes are 'active' now. i'll google around to try and learn more about 'lost'.
[21:35] kwmonroe: given we started with 3 and we have 3 active, I am not sure lost are reclaimable
[21:37] kwmonroe: it seems the cluster did try to keep the 3 "active", albeit starting a new "node" process on the 3 given units
[21:49] yeah arosales, it seems restarting yarn (sudo service hadoop-yarn-resourcemanager restart) resets the lost node count. maybe the lost node count is supposed to be indicative of how many times the yarn cluster was starved for resources. i really don't know. i wish you never asked me this on a public forum because now people know i'm ignorant (on this one thing).
[21:51] but thanks for running this with me -- like i said earlier, i had been meaning to get to the bottom of lost nodes. it's good to know the cluster can still do jobs (evinced by your 90+ jobs) even if nodes get reported as lost.
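The restart kwmonroe describes can be driven through juju as well; a sketch with the same caveat that it only resets the counter rather than explaining it, and with the resourcemanager unit name assumed:

    juju ssh resourcemanager/0 'sudo service hadoop-yarn-resourcemanager restart'
    # after the restart, the :8088 cluster page counts lost nodes from zero again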
[21:51] with which command has juju set been replaced in juju 2.x ?
[21:52] Teranet: juju config
[21:52] same syntax I hope, right ?
[21:53] juju config =. pretty sure it's the same, just s/config/set
[21:53] er, s/set/config
[21:59] kwmonroe: np, thanks for the info and for keeping up with my odd questions :-)
[22:00] I could have tried to look it up, but I was lazy and just pinged you
[22:05] ooo someone said my name, but not directed at me :D
[22:11] lazyPower: :-)
[22:59] he was balding... he was pink... they called him kevin....keeevin. He had a swimming pool, with questionable filtration... they called him Kevin Keeeeeevin or just kwmonroe
[23:00] kwmonroe: if i punt you chaps some big data chapters from my little book over the next couple of weeks can you give them the once over?
=== alexisb-afk is now known as alexisb
[23:08] magicaltrout: I was going to catch you in gent, but would love to look over your chapters. Would like to see what you have lined up for testing and layer creation
=== frankban is now known as frankban|afk
[23:10] not a great deal yet arosales :) I have some stuff sketched out for layer creation. I didn't do testing yet because a) kjackal_ suggested he might help out there and also I wanted to review what went down in the office hours the other day with the python stuff from bdx
[23:10] to try and aggregate examples
[23:11] i'll be certain to punt stuff your direction though to help suggest gaps
[23:11] i'm crowdsourcing knowledge ;)
[23:12] magicaltrout: tvansteenburgh https://gist.github.com/jamesbeedy/dad808872e5488b43cf3fa5d5f2db87c
[23:12] magicaltrout: would be good to catch you up on charm-ci in gent if not sooner
[23:13] errrg, tvansteenburgh was just giving me some pointers on that jenkins script I've been working on
[23:15] magicaltrout: but yes kjackal is also an excellent guy to help out with the charm ci bits as well
[23:15] bdx: someone on the book feedback stuff specifically asked for CI and testing examples
[23:16] so I figure we should probably solve that conundrum ;)
[23:27] Hey do we have anyone here who knows rabbitmq-server setup for a cluster ? I got 3 nodes but somehow they won't peer right.
[23:32] Log output and OpenStack overview : http://paste.ubuntu.com/23860574/
[23:35] Teranet: not sure what folks are around atm, but you also may have some luck posting in #openstack-charms
[23:36] thx will do
[23:36] wallyworld, the osx bug is on us indeed. You're free :-)
[23:37] yay, ty
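Two quick follow-ups on the tail of the log. The juju set to juju config rename keeps the same key=value syntax, and rabbitmq peering can be checked from a unit with rabbitmqctl. Both sketches use illustrative application and option names, not values taken from Teranet's deployment:

    # Juju 1.x
    juju set rabbitmq-server min-cluster-size=3
    # Juju 2.x equivalent
    juju config rabbitmq-server min-cluster-size=3
    # check whether the three nodes actually peered
    juju run --unit rabbitmq-server/0 'rabbitmqctl cluster_status'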