/srv/irclogs.ubuntu.com/2017/01/24/#juju.txt

Teranetok ruty ruty here how could I check the Juju syslog again on the command line ?00:16
lazyPowerTeranet juju debug-log00:26
Teranetthx00:27
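For readers following along, a minimal sketch of the command lazyPower points at; the filter flags and unit name are illustrative, not from the conversation:
    juju debug-log                                    # stream the model's consolidated log
    juju debug-log --replay --include unit-mysql-0    # replay from the start, one unit only (unit name hypothetical)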
bdxlazyPower: yea01:12
=== scuttlemonkey is now known as scuttle|afk
anrahis there a way to stop the agent executing the charm upgrade process?07:14
anrahI think i managed to create an infinite loop for my charm and I know it is just processing the upgrade and I would like to fix that :)07:14
anrahApparently by killing the juju agent process which is handling the charm-upgrade07:17
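A rough sketch of what anrah describes, assuming a Juju 2.x machine running systemd; the application name and unit number are placeholders, and stopping an agent mid-hook is very much a last resort:
    # on the machine hosting the stuck unit
    sudo systemctl stop jujud-unit-myapp-0     # unit agent handling upgrade-charm (name hypothetical)
    # fix the charm / push a new revision, then bring the agent back
    sudo systemctl start jujud-unit-myapp-0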
kjackalGood morning Juju world07:25
=== frankban|afk is now known as frankban
stubhuh. Nobody has written a passwordless ssh layer yet. I thought this would be required more often?08:20
anrahwhat do you mean?08:40
Zichi, I'm using canonical-kubernetes bundle charms and I noticed that the default Grafana/InfluxDB packaged in it does not seem to erase graphs from deleted Pods.. do you know where can I do some cleaning?11:22
Zicautomatic-cleaning, I mean, if that's possible :)11:22
rick_hZic: sounds like an action feature request. I can see the desire to keep some data around for historical context, but definitely also see the desire to clean stuff out.11:31
anrahis there a way to react (set state) when metric value is <> something?11:32
rick_hZic: https://github.com/juju-solutions/bundle-canonical-kubernetes/issues please file an issue here11:32
Zicrick_h: thanks, I will11:40
Zicrick_h: I also noticed a strange behaviour, don't know if it's a bug or normal: I don't see my newly created namespaces in Grafana's namespace list (dashboard: Pods), but if I enter the name of my namespace manually, it works and displays the proper data11:40
Zicis it up to me to add namespace in this drop-down list? or it should be auto-completed?11:41
Zicfor Pods drop-down list, it's auto-filled from what is running in the k8s cluster11:43
Zicbut the namespace list only contains the default and kube-system namespaces :o11:43
marcoceppi_Zic: the grafana container is the one run from upstream gcr.io, we don't do much munging for that, that I'm aware of, but most of the k8s team is US based so it's early there11:52
Zicok, if the charms does not manage this area, I will try to poke them at their Slack :) thanks anyway11:55
rick_hmarcoceppi_: what's the best place to get the kubectl for the bundle?12:27
rick_hmarcoceppi_: have the core bundle on my maas and wanting to tinker around with it12:27
marcoceppi_rick_h: read the readme ;)12:29
marcoceppi_rick_h: https://jujucharms.com/kubernetes-core/ 12:30
rick_hmarcoceppi_: oh....I was looking at the readme and it had me get the config, i missed it had me get the binary as well12:30
rick_hmarcoceppi_: my bad, I'm reading the readme I swear :P12:31
marcoceppi_rick_h: yeah, conjure-up makes it nice because it does that automatically, but for manual deploys we don't really have a way to `juju download` or fetch something12:31
marcoceppi_rick_h: in the very near future you'll just snap install kubectl12:31
rick_hmarcoceppi_: yea all good. I thought it was a snap but wasn't sure where/etc12:31
rick_hmarcoceppi_: ty for the poke on looking harder12:32
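For reference, the README steps rick_h is working through look roughly like this; the unit name and paths follow the kubernetes-core README of the time and may have changed since:
    mkdir -p ~/.kube
    juju scp kubernetes-master/0:config ~/.kube/config   # fetch the cluster credentials
    sudo snap install kubectl --classic                  # the snap marcoceppi_ mentions
    kubectl cluster-info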
rick_hbwuhahaha Kubernetes master is running at https://10.0.0.160:6443 12:32
marcoceppi_\o/12:33
=== rogpeppe1 is now known as rogpeppe
magicaltrouthellooooo from bluefin... bit weird sat at a desk in this place13:35
marcoceppi_magicaltrout: oi, what are you up to in bluefin?14:07
=== scuttle|afk is now known as scuttlemonkey
magicaltroutmarcoceppi_: went to talk to Tom Calaway about joint marketing stuff for Spicule & Juju in 201714:15
marcoceppi_magicaltrout: awesome14:15
magicaltrouthave a few hours to kill before DC/OS meetup14:15
magicaltroutso sitting around taking up space14:15
marcoceppi_well, enjoy o/14:16
magicaltroutmarcoceppi_: can you dig out that big half pull request thing that existed for getting Centos support in Reactive?14:40
magicaltroutI want to improve the DC/OS charms, but I also want them to run on Centos because that's what they support and not my half-bodged Ubuntu support14:41
magicaltrouti'm doing their office hours in a couple of weeks and it'd be a nice thing to discuss alongside generic Juju support14:42
lazyPowerOh yeah, bogdan sent that in didn't he?14:44
magicaltroutdunno :)14:44
lazyPoweri think it withered on the vine, we left some feedback and it wasn't circled back.14:45
magicaltroutclearly i could write old school charms, but the hooks make me so sad :)14:45
lazyPowerbut thats from memory so its likely incorrect14:45
lazyPoweri dont blame you magicaltrout, not one bit14:45
magicaltroutno its likely correct, marco showed me the PR ages ago and I said I'd take a look14:45
magicaltroutthen didn't14:45
magicaltroutbut i didn't really have a use case 6 months ago14:45
marcoceppi_magicaltrout: it landed a while ago14:46
magicaltroutlanded as in merged or landed as in got pushed as a PR a while ago and is now very stale? marcoceppi_14:51
magicaltrouti suspect the latter, which is fine, I was just going to take the spirit of the commit and figure out what needs changing, refactoring and implementing14:52
=== dannf` is now known as dannf
marcoceppi_magicaltrout: it's landed and released15:22
marcoceppi_magicaltrout: we don't have centos base layers, or anything there, but charm-helpers is present with basic centos support: http://bazaar.launchpad.net/~charm-helpers/charm-helpers/devel/files/head:/charmhelpers/core/host_factory/15:24
lazyPoweroh nice15:25
magicaltroutinteresting!15:25
lazyPowermagicaltrout - see? I told ya my brain was probably incorrect15:25
* lazyPower mutters something about scumbag brain15:25
magicaltroutalright then15:27
magicaltroutso i can take a stab at creating a centos base layer15:27
magicaltroutthat would make the Mesos guys and the NASA guys happy15:31
* magicaltrout greps around in the code to see whats going on15:31
marcoceppi_magicaltrout: yeah, I think we need to rename basic to ubuntu, tbh. Having a centos layer adds complexity, cory_fu and I chatted briefly about having the notion of a special "base" layer where layer:ubuntu and layer:centos wouldn't be compatible15:41
marcoceppi_magicaltrout: then again, snaps will basically save us from ever needing to worry about distros again15:41
magicaltrouttrue that15:45
magicaltroutlooking forward to snap integration15:45
magicaltrouti've cloned layer-basic locally, i'll create a layer-basic-centos to prevent clases15:46
magicaltroutclashes15:46
marcoceppi_we have a snap layer that stub wrote, I know ryebot and Cynerva have taken a run at it a few times for kubernetes stuff15:46
magicaltrouti'll have to check that out when i get time... i'd still get the "does it run on centos" question though ;)15:46
magicaltroutSA's are a picky bunch... why can't they just get on with it?15:47
magicaltroutI don't care if it doesn't fit their nagios templates ;)15:47
magicaltroutmarcoceppi_: to save me digging all over the place16:33
magicaltroutdo you know where you define stuff that ubuntu needs to install when creating a new machine to run the charms, python3 etc16:33
magicaltroutbootstrap_charm_deps16:35
magicaltroutthink i found it16:35
marcoceppi_magicaltrout: yeah, it's in lib/layer/basic.py of the basic layer IIRC16:49
magicaltroutaye boos16:51
magicaltroutboss16:51
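Since magicaltrout already has layer-basic cloned, the quickest way to see what gets installed at bootstrap is just to grep for the function he found (a sketch; the exact path inside the clone may differ slightly from marcoceppi_'s from-memory lib/layer/basic.py):
    grep -rn "bootstrap_charm_deps" layer-basic/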
jcastrohttp://askubuntu.com/questions/875716/juju-localhost-lxd17:10
jcastroany ideas on this one?17:11
magicaltroutwell17:14
magicaltroutyou don't do intermodel relations yet17:14
magicaltroutso thats not going to fly very far currently17:14
=== frankban is now known as frankban|afk
lazyPowermagicaltrout - sounds like they want 2 controllers, so two completely isolated juju planes on a single host17:39
lazyPoweri'm not sure if we've even piloted that use case as a POC, as multi-model controllers removed a lot of the concerns here17:39
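For context, the isolated-controllers scenario lazyPower describes would look roughly like this on a single LXD host (untried here, just the shape of it):
    juju bootstrap localhost controller-a     # first controller, with its own models and agents
    juju bootstrap localhost controller-b     # second, fully separate controller on the same LXD
    juju switch controller-b
    # the multi-model alternative lazyPower alludes to: one controller, many models
    juju add-model staging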
magicaltroutlazyPower: dustin swung by and asked about the DC/OS stuff, said he'd be happy to do some collaboration and pitch some talks at Mesoscon and stuff, now I'm off up to the Mesos meetup in London to go talk to the Mesosphere guys, maybe one day someone will offer to write the code as well! ;)17:42
lazyPowermagicaltrout - i'd love to do that integration work *with* you17:42
magicaltroutanyway the LXD poc is coming along slowly17:42
lazyPoweri think there's a good overlap in changes in k8s and mesos charms that will make it simple17:42
magicaltroutonce we have something that semi works i'll get you guys and the mesosphere guys looped in to fill in the gaps17:42
lazyPoweryou're just plugging in api ip's and what not, i think they are both rest based no?17:43
lazyPowerand some dinky config flags to replace the scheduler too, dont let me forget those important nuggets17:43
lazyPoweras a complete oversimplification of the problem domain ^17:43
magicaltroutyou've lost me now17:44
magicaltroutwhat are you asking?17:44
lazyPoweroh were you not talking about k8s/mesosphere integration?17:44
magicaltroutlxd -> Mesos <-juju17:45
magicaltrout      ^17:45
magicaltrout      -17:45
magicaltrout     k8s17:45
lazyPoweroic17:45
lazyPoweryeah i missed the mark there17:46
magicaltroutwell the DCOS chaps are certainly interested in getting LXD into Mesos which would be a win because then juju can bootstrap against mesos17:46
magicaltroutyou could run K8S on Juju within Mesos ;)17:46
magicaltroutand on your side Dustin says he's interested in getting Juju and DC/OS playing nicely which includes the LXD stuff17:47
magicaltroutbut! when that starts coming together17:48
magicaltrouti shall be stealing your flannel network and stuff17:48
magicaltroutnot that i have a clue how to monetize such a platform, but in my head juju managed mesos and juju bootstrapped mesos sounds pretty sweet17:50
lazyPoweri concur17:52
lazyPowermaybe the monetization is the support of such a platform and not the tech itself17:52
lazyPowernot that anybody we know does that *whistles*17:52
arosaleskwmonroe: if you have a sec I am seeing some curious behavior with spark via the hadoop-spark bundle + zeppelin18:34
arosaleskwmonroe: doesn't seem like sparkpi is running successfully per the action output http://paste.ubuntu.com/23859116/  I also don't see the job in the spark job history server @ http://54.187.77.213:8080/18:34
arosalesI do see "2017-01-24 17:28:22 INFO sparkpi calculating pi" in the /var/log/spark-unit logs though .  . .18:39
kwmonroearosales: i'm not surprised at the lack of spark info in the spark job history server.. you're probably in yarn-client mode, which means the job will log to the resourcemanager (expose and check http://rm-ip:8088 to be sure)18:44
kwmonroearosales: in your pastebin, search for "3.14", you'll see it.18:45
kwmonroegranted, that action output needs some love.. no sense in having all the yarn junk in there18:45
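For anyone reproducing this, the action run and the UI exposure kwmonroe suggests look roughly like the following; the application names (spark, resourcemanager) are assumed from the hadoop-spark bundle:
    juju run-action spark/0 sparkpi            # queues the action, prints an action id
    juju show-action-output <action-id>        # the output arosales pasted
    juju expose resourcemanager                # then browse http://<rm-ip>:8088
    juju expose spark                          # spark UIs on 8080 / 18080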
arosaleskwmonroe: I do see jobs @ http://54.187.77.213:18080/ though18:47
arosalesand it looks like pagerank ran ok, just issues with sparkpi18:47
kwmonroearosales: i see 3 sparkpi jobs at your link... did you run some after the pagerank that are no longer showing up?18:49
arosalesI did18:49
arosaleskwmonroe: resourcemanager = http://54.187.130.194:8088/cluster18:50
kwmonroearosales: spark 8080 is the cluster view which (i think) will only show jobs when spark is running in spark mode (that is, not yarn-* mode).  spark 18080 is the history server that will show jobs that ran in yarn mode -- yarn 8088 also shows these.18:52
arosaleskwmonroe: so looks like it ~is~ running, just the raw output for the sparkpi action may need a little tuning?18:52
arosaleskwmonroe: sorry, and to your previous question if I am _not_ seeing jobs I submitted --- I think I am seeing all the jobs @ http://54.187.77.213:18080/ and http://54.187.130.194:8088/cluster18:53
kwmonroeyeah arosales, from those URLs, it looks like you're seeing all the spark jobs (sparkpi + pagerank) on the spark history server (18080), and all the spark *and* yarn jobs (sparkpi + pagerank + tera*) on the RM (8088)18:55
arosaleskwmonroe: was http://paste.ubuntu.com/23859116/ the output you were expecting for sparkpi ?18:56
arosalesto your point "3.14" is in there just amongst a lot of other data18:56
kwmonroeso arosales, i misspoke in my 1st reply.  to be clear, you should see all spark jobs in the spark history server (18080) and resourcemanager job history (8088).  you will *not* see job history in the spark cluster view (8080) while in yarn mode.18:57
arosaleskwmonroe: ack18:57
kwmonroeyup arosales, as long as it says "pi is roughly 3.1", we promulgate ;)18:57
kwmonroebecause ovals are basically circles18:57
arosaleskwmonroe: I was just concerned about the 153 lines of output to tell me a circle is like an oval18:58
kwmonroe:)  ack arosales, i'll see if we can clean up that raw output without hiding the meat.18:58
* arosales will submit a feature on the spark charm ;-)18:59
arosalesJust glad its working as expected18:59
arosaleskwmonroe: thanks for the help18:59
kwmonroenp arosales - thanks for the action feedback!  if it ever returns "Pi is roughly a squircle", let me know asap.19:01
arosaleskwmonroe: I'll note that for the failure scenario19:02
CoderEuropemarcoceppi_: You ready in about an hour ? (#Discourse)19:04
kwmonroearosales: one more thing.. if you've still got the env, would you mind queueing multiple sparkpi/pagerank/whatever actions in a row?  i noticed sometimes the yarn nodemanagers would go away if memory pressure got too high, and yarn would report multiple "lost nodes" at http://54.187.130.194:8088/cluster.   so if you don't mind, kick off a teragen, sparkpi, and pagerank action at the same time so i can watch.19:12
kwmonroeiirc, sometimes the RM would wait for resources before firing off those jobs, but sometimes not.  curious if that's easily recreatable.19:13
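A quick way to queue a batch like kwmonroe asks for; the action and unit names are assumed from the bundle, adjust to what the charm actually exposes:
    for i in 1 2 3; do
        juju run-action spark/0 sparkpi
        juju run-action spark/0 pagerank
    done
    juju show-action-status                    # watch them queue and complete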
arosaleskwmonroe: gah, missed this message before I tore down :-/19:23
arosaleskwmonroe: easily reproducible though19:23
kwmonroenp arosales -- i've been meaning to get to the bottom of that.  i'm like 25% sure we fixed it when we disabled the vmem pressure setting.  i've got a deployment in my near future, so i'll check it out.  just one of those things i keep meaning to try but keep forgetting until someone like you comes along.19:25
arosaleskwmonroe: np, spinning back up. I'll let you know what I find19:25
kwmonroethanks!19:26
marcoceppi_CoderEurope: yup!19:26
CoderEuropemarcoceppi_: On here ? or do you want me to keep PM'ing you ?19:27
CoderEuropeBack in ten mins19:28
CoderEuropemarcoceppi_: 15 to go ....... :)19:46
rick_hCoderEurope: heh, you're on a mission eh?19:50
CoderEuroperick_h: The weather is with us tonight ! https://www.youtube.com/watch?v=z3TNGDVOMA4&feature=youtu.be19:54
rick_hCoderEurope: oh man, did not see that coming19:54
CoderEuroperick_h: I have ordered one of these in my budget: http://amzn.eu/e93uNrY19:55
arosaleskwmonroe: ok got 20+ actions running @ http://54.202.97.95:8088/cluster  from resourcemanager and spark20:02
CoderEuropemarcoceppi_: you ready ?20:04
arosaleskwmonroe: ref of actions running = http://paste.ubuntu.com/23859496/20:04
CoderEuropeI have a nose bleed - hangon marcoceppi_20:07
rick_hwhoa holy actions arosales20:08
arosaleskwmonroe: wanted some mem pressure20:09
arosales:-)20:09
CoderEuropemarcoceppi_: Okay at the ready :D20:09
marcoceppi_CoderEurope: yo, lets do this!20:12
arosaleskwmonroe: also in ref to spark sparkpi action  https://issues.apache.org/jira/browse/BIGTOP-2677  (low priority)20:14
kwmonroe+100 arosales!  thanks for opening the jira.20:28
kwmonroeugh, i missed an opportunity to +3.1420:29
lazyPowerkwmonroe fun fact, my apartment number is pi20:29
kwmonroeyou must have a very long apt number lazyPower20:29
lazyPowerits a subset of pi, but pi all the same20:29
kwmonroe3?  close enough.  r square it.20:30
lazyPower:P20:30
arosaleskwmonroe: let me know if you need me to run any more tests on this hadoop/spark cluster20:36
kwmonroeso arosales, you've hit it.  see your cluster UI http://54.202.97.95:8088/cluster.. 3 "lost nodes".20:37
* arosales looks20:38
kwmonroearosales: what i'm waiting to see now is whether or not yarn will recover and process the last running job if/when the nodemgrs come back.20:38
arosalesah http://54.202.97.95:8088/cluster/nodes/lost  -- kind of hidden20:38
arosaleskwmonroe: ack, I"ll let it run20:38
CoderEuropemarcoceppi_: Just to keep you in the loop - my chromebook just crashed - 2 mins till re-surface on meet.jit.si20:38
kwmonroearosales: what has happened here is that each of your nodemgrs is only capable of allocating 8gb of physical ram.  when yarn schedules mappers or reducers (which take min 1gb) on a nodemgr that already has a few running, it can "lose" those nodes.  i don't know the diff between an "unhealthy" node and a "lost" node, but i think the latter can come back once jobs complete.  what's interesting to me is that you got through 19 of the 20+ jobs.  surely that's not a coincidence.20:47
arosalesat 8gb per nodemgr and 3 nodemgrs we should expect to see around 24 jobs at least allocated, correct?20:48
kwmonroearosales: i would expect 24 jobs *possible* because the min allocation to each nodemgr is 1gb.  but jobs may specify mem constraints that differ from the min.  at any rate, 73 of your 74 jobs have completed, which makes me think those nodes get lost and come back once they're freed up -- like "i'm busy, don't allocate to me anymore".20:50
=== frankban|afk is now known as frankban
kwmonroearosales: and now i see 82 jobs.  you're not giving this thing a break, are you :)20:51
kwmonroei mean, "big data" is usually just a handful of word count jobs.. #amirite lazyPower?20:52
arosaleskwmonroe: just for my education once a node is marked as lost does it return?  Seems the three @ http://54.202.97.95:8088/cluster/nodes/lost haven't been "returned"20:52
lazyPowertrufacts20:52
lazyPowerbut you were probably looking for the reaction: (ノ´・ω・)ノ ミ ┸━┸20:53
arosaleskwmonroe: I stopped submitting a bit ago, but the submitted jobs are at http://paste.ubuntu.com/23859496/20:53
CoderEuropemarcoceppi_: Awesome and valuable work there ... high five o/20:53
marcoceppi_\o20:53
* CoderEurope leaves till tomorrow - bye y'all.20:55
kwmonroearosales: i don't know for sure (if lost nodes return).  i think they do once they have > min phys mem available.  i have hope.  when we started this convo, you had 73 of 74 jobs completed with 3 nodes lost.  now you have 89 of 90 jobs complete.  i think that means that your job completion time has lengthened, but they do appear to be completing eventually.. i suspect that's because a lost node comes back to grab another task.20:55
kwmonroeyeah arosales, for sure that's what's happening.  the terasort job that was running with 3 lost nodes has completed.  now 3 are still lost, but a new app (nnbench) is going.20:56
kwmonroeso all is well.  your cluster is just kinda small for 90+ concurrent jobs :)20:56
arosaleskwmonroe: thanks for the info, and here with the hadoop-spark bundle we have 3 separate units hosting the slave (datanode and node manager), correct?20:58
kwmonroecorrect20:58
arosalesand how do we map those units to the lost nodes (http://54.202.97.95:8088/cluster/nodes/lost)20:59
kwmonroearosales: they are the same.  your 3 lost nodes are the 3 slave units.21:01
kwmonroeso, what makes a node lost?  a failed health check, or a violation of constraints (like < min memory available).  the former means the nodemanager slave unit hasn't told the RM that it's available for jobs; the latter means it can't take jobs because it doesn't have enough resources available.21:02
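For later readers, the node states kwmonroe describes can be listed straight from yarn on the resourcemanager unit (unit name assumed from the bundle):
    juju run --unit resourcemanager/0 'yarn node -list -all'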
arosalesthat was what I thought, but then I was wondering where the 3 lost nodes (http://54.202.97.95:8088/cluster/nodes/lost) were in reference to units21:02
arosaleslogically the 3 lost have different node addresses than the 3 active nodes21:03
kwmonroenegative arosales - the node address is the ip of the slave unit.21:03
arosalesyes, you are correct. Same address different ports21:04
arosaleswhich makes me think we start a new process on the unit for a new node21:04
arosales_if_ we started with 321:05
kwmonroeyup arosales -- when the RM has a new task, it farms it out to the slaves / nodes and they spawn a new java (hi mbruzek), which opens a new port.21:05
mbruzekyes Kevin?21:05
kwmonroejust java buddy21:06
mbruzekOK21:06
arosaleskwmonroe: gotcha -- cool thanks for the lesson here21:06
arosaleskwmonroe: no pending actions on the juju side -- all have completed21:07
kwmonroeack arosales... now let's watch and see if the nodes come back in the cluster view now that the jobs are done.21:08
kwmonroehealth checks happen every 10ish minutes21:08
kwmonroewhich should bring them back if there are no jobs21:08
* arosales will eagerly await kwmonroe21:09
=== alexisb is now known as alexisb-afk
kwmonroearosales: your cluster ui (:8088) shows 3 active and 3 lost nodes.  i'm not sure how to reset the lost count without restarting yarn, or even if the 'lost' count is all that important given the expected nodes are 'active' now.  i'll google around to try and learn more about 'lost'.21:31
arosaleskwmonroe: given we started with 3 and we have 3 active, I am not sure the lost ones are reclaimable21:35
arosaleskwmonroe: it seems the cluster did try to keep the 3 "active" albeit starting a new "node" process on the 3 given units21:37
kwmonroeyeah arosales, it seems restarting yarn (sudo service hadoop-yarn-resourcemanager restart) resets the lost node count.  maybe the lost node count is supposed to be indicative of how many times the yarn cluster was starved for resources.  i really don't know.  i wish you never asked me this on a public forum because now people know i'm ignorant (on this one thing).21:49
kwmonroebut thanks for running this with me -- like i said earlier, i had been meaning to get to the bottom of lost nodes.  it's good to know the cluster can still do jobs (evinced by your 90+ jobs) even if nodes get reported as lost.21:51
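The restart kwmonroe mentions above, run through juju rather than ssh-ing in (unit name assumed):
    juju run --unit resourcemanager/0 'sudo service hadoop-yarn-resourcemanager restart'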
Teranetwhich command has juju set been replaced with in juju 2.x?21:51
kwmonroeTeranet: juju config21:52
Teranetso the syntax is the same I hope, right?21:52
kwmonroejuju config <app> <key>=<value>.  pretty sure it's the same, just s/config/set21:53
kwmonroeer, s/set/config21:53
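Spelled out, the 1.x to 2.x rename kwmonroe describes; application, key and value are placeholders:
    juju set <application> <key>=<value>       # Juju 1.x
    juju config <application> <key>=<value>    # Juju 2.x equivalent
    juju config <application> <key>            # 2.x: read a single value back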
arosaleskwmonroe: np, thanks for the info and keeping with my odd questions :-)21:59
arosalesI could have tried to look it up, but I was lazy and just pinged you22:00
lazyPowerooo someone said my name, but not directed at me :D22:05
arosaleslazyPower: :-)22:11
magicaltrouthe was balding... he was pink... they called him kevin....keeevin. He had a swimming pool, with questionable filtration... they called him Kevin Keeeeeevin or just kwmonroe22:59
magicaltroutkwmonroe: if i punt you chaps some big data chapters from my little book over the next couple of weeks can you give them the once over?23:00
=== alexisb-afk is now known as alexisb
arosalesmagicaltrout: I was going to catch you in gent, but would love to look over your chapters. Would like to see what you have lined up for testing and layer creation23:08
=== frankban is now known as frankban|afk
magicaltroutnot a great deal yet arosales :) I have some stuff sketched out for layer creation. I didn't do testing yet because a) kjackal_ suggested he might help out there and also I wanted to review what went down in the office hours the other day with the python stuff from bdx23:10
magicaltroutto try and aggregate examples23:10
magicaltrouti'll be certain to punt stuff your direction though to help suggest gaps23:11
magicaltrouti'm crowdsourcing knowledge ;)23:11
bdxmagicaltrout: tvansteenburgh https://gist.github.com/jamesbeedy/dad808872e5488b43cf3fa5d5f2db87c23:12
arosalesmagicaltrout: would be good to catch you up on charm-ci in gent if not sooner23:12
bdxerrrg, tvansteenburgh was just giving me some pointers on that jenkins script I've been working on23:13
arosalesmagicaltrout: but yes kjackal is also an excellent guy to help out with the charm ci bits as well23:15
magicaltroutbdx: someone on the book feedback stuff specifically asked for CI and testing examples23:15
magicaltroutso I figure we should probably solve that conundrum ;)23:16
TeranetHey do we have anyone here who knows rabbitmq-server setup for a cluster ? I got 3 nodes but somehow they won't peer right.23:27
TeranetLog output and OpenSTACK overview : http://paste.ubuntu.com/23860574/23:32
arosalesTeranet: not sure what folks are around atm, but you also may have some luck posting in #openstack-charms23:35
Teranetthx will do23:36
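A couple of quick checks that often narrow down rabbitmq peering trouble before taking it to #openstack-charms; the min-cluster-size option name is from memory, so verify it against the charm's config listing:
    juju run --unit rabbitmq-server/0 'rabbitmqctl cluster_status'   # does rabbit itself see the peers?
    juju config rabbitmq-server min-cluster-size=3                   # charm waits for this many units before clustering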
balloonswallyworld, the osx bug is on us indeed. You're free :-)23:36
wallyworldyay, ty23:37
