SpamapS | hazmat: right, any juju after "reboot" support cannot spawn nodes for a juju before reboot support. | 01:39 |
---|---|---|
SpamapS | hazmat: so distro support is incompatible in oneiric | 01:40 |
=== almaisan-away is now known as al-maisan | ||
vladas | hello, can someone help me? | 11:52 |
vladas | I'm trying to deploy openstack on vm's and got weird error. | 11:53 |
vladas | I have deployed all charms successfully, except nova-volume and openstack-dashboard | 11:56 |
vladas | I'm getting these errors: | 11:56 |
vladas | Error processing 'cs:precise/nova-volume': entry not found | 11:56 |
vladas | Error processing 'cs:precise/openstack-dashboard': entry not found | 11:57 |
vladas | Any ideas? | 11:57 |
hazmat | vladas, those charms are not in the store | 12:01 |
hazmat | vladas, they need to be deployed from bzr | 12:01 |
vladas | Oh, i see. Thanks, will try that. | 12:04 |
=== al-maisan is now known as almaisan-away | ||
=== vladas is now known as hitexlt | ||
=== carif_ is now known as carif | ||
=== zyga is now known as zyga-food | ||
=== zyga-food is now known as zyga-food] | ||
=== zyga-food] is now known as zyga-food | ||
=== zyga-food is now known as zyga | ||
=== almaisan-away is now known as al-maisan | ||
=== al-maisan is now known as almaisan-away | ||
=== mrevell_ is now known as mrevell | ||
SpamapS | gooooooooooood morning juju town! | 15:34 |
mramm | gooood morning to you! (sorry for the reduction in O's) | 15:35 |
SpamapS | Its like an echo.. it can't sustain.. ;) | 15:36 |
marcoceppi | gd morning | 16:09 |
SpamapS | marcoceppi: done with wordpress? Are you done yet? How about now? done..? | 16:11 |
SpamapS | ;) | 16:11 |
marcoceppi | no yes! nope not yet | 16:11 |
marcoceppi | it's in bzr now though | 16:11 |
SpamapS | Ugh that reminds me I need to finish nagging the last of the un-maintained charms | 16:12 |
koolheadd17 | SpamapS, met someone at cls12 who is working on newest owncloud charm | 16:13 |
SpamapS | koolheadd17: oh? | 16:13 |
SpamapS | koolheadd17: did you hand it off to them or they just wanted to make it better? | 16:14 |
koolheadd17 | SpamapS, he wanted to make it better | 16:14 |
SpamapS | jimbaker: jitsu --help looks *amazing* now btw. THANK YOU | 16:14 |
SpamapS | koolheadd17: *awesome* | 16:14 |
koolheadd17 | he is working on integrating nfs based support too i guess bkerensa knows him | 16:15 |
SpamapS | cool | 16:21 |
koolheadd17 | i have few charms in baking state though :) | 16:22 |
SpamapS | cool | 16:22 |
negronjl | 'morning all | 16:22 |
SpamapS | I'm working on making nagios more awesome | 16:23 |
koolheadd17 | SpamapS, waoo :) | 16:23 |
SpamapS | and then hopefully collectd too | 16:23 |
koolheadd17 | SpamapS, is someone working on Monit? | 16:23 |
=== salgado is now known as salgado-lunch | ||
SpamapS | koolheadd17: https://bugs.launchpad.net/charms/ ... look there | 16:23 |
koolheadd17 | k | 16:24 |
koolheadd17 | SpamapS, so jcastro google doc list and all is primitive now | 16:24 |
SpamapS | koolheadd17: it has been dead for months now | 16:26 |
SpamapS | it should have been deleted if it hasn't been already | 16:26 |
koolheadd17 | ejat, around | 16:30 |
ejat | koolheadd17 : yups .. | 16:30 |
koolheadd17 | ejat, are you coming to china for openstack asia pacific event | 16:30 |
ejat | hmmm not sure yet … | 16:31 |
ejat | how about u ? confirm ? | 16:31 |
koolheadd17 | not yet, need to talk to my manager on the same and visa might be difficult | 16:31 |
koolheadd17 | ejat, you should definitely go there | 16:31 |
* ejat in the midst of changing employment … so could not give a say yet .. but hopefully … | 16:32 | |
koolheadd17 | ejat, ol | 16:33 |
koolheadd17 | k | 16:33 |
ejat | SpamapS : is there any tutorial to deploy OS using juju after installing via MAAS ? | 16:33 |
* ejat means for production environment .. | 16:34 | |
ejat | or how to wiki .. . | 16:34 |
SpamapS | ejat: yes there's something on wiki.ubuntu.com | 16:39 |
ejat | https://help.ubuntu.com/community/UbuntuCloudInfrastructure <-- ok thanks | 16:43 |
jimbaker | SpamapS, thanks. over at oscon with m_3 working on demo stuff | 16:55 |
SpamapS | jimbaker: woot | 16:55 |
bkerensa | m_3: so the dependency should not be tight according to the upstream author so I removed the ppa portion | 17:04 |
hazmat | jimbaker, m_3 rackspace is a fail btw | 17:05 |
hazmat | jimbaker, m_3 they implement their own version of credential passing | 17:05 |
hazmat | instead of keystone v2, even fixing that there are some other issues | 17:05 |
hazmat | jimbaker, m_3 i'd go with ec2 or hpcloud | 17:05 |
m_3 | hazmat: I love you man... `jitsu export | jitsu import -enewenv -` worked | 17:06 |
m_3 | bkerensa: thanks | 17:07 |
=== salgado-lunch is now known as salgado | ||
negronjl | SpamapS: got a chance to look further re: haproxy-overhauled ? | 17:22 |
SpamapS | negronjl: I want to wrap up this nagios thing I did, and then I'll dive back into it | 17:28 |
negronjl | SpamapS: you hate me :) | 17:29 |
SpamapS | hazmat: wtf?! no keystone? | 17:29 |
SpamapS | negronjl: because you're beautiful | 17:37 |
_mup_ | Bug #1025382 was filed: Add a generic constraint "persistent-root" <juju:New> < https://launchpad.net/bugs/1025382 > | 17:44 |
hazmat | SpamapS, its a bastardized keystone | 18:06 |
hazmat | SpamapS, mostly its just different credential passing | 18:07 |
SpamapS | hazmat: ahh the fun of apache licensing | 18:07 |
hazmat | SpamapS, openstack is a kit for building your own snowflakes ;-) | 18:07 |
SpamapS | "Lets create a single place to dump code that nobody runs" | 18:07 |
SpamapS | btw.. 'juju status | ccze -A' is purty | 18:07 |
SpamapS | we should like, build in --color to juju cli | 18:08 |
hazmat | SpamapS, don't even get me started on how broken rspace is about their client tools.. nova client has had broken packages for months, and their swift client doesn't work with their custom keystone either. | 18:08 |
SpamapS | 'started' is green, 'error' is red.. | 18:08 |
hazmat | nice | 18:09 |
* hazmat should file a bug on status improvements | 18:09 | |
SpamapS | hostnames are blue | 18:09 |
SpamapS | and known things are purple.. which is weird.. nrpe and mysql are purple.. but nagios is not | 18:10 |
SpamapS | wow | 18:11 |
SpamapS | jitsu export is.. like, everything I've been looking for | 18:11 |
SpamapS | ^5's hazmat | 18:11 |
hazmat | SpamapS, :-) | 18:11 |
SpamapS | That should be like, feature #1 to go into goju | 18:11 |
hazmat | SpamapS, i haven't pimped it out to folks yet, cause people hates features.. | 18:11 |
hazmat | but perhaps you guys can | 18:11 |
SpamapS | Its enough to prove the concept is valid | 18:11 |
SpamapS | we can move forward from there once we have a permanent home for the feature :) | 18:12 |
hazmat | SpamapS, this is pretty much the exact syntax from the austin/lxc sprint ml discussion | 18:12 |
hazmat | bcsaller, is the repairs branch ready for review? | 18:41 |
bcsaller | hazmat: since we are not supporting restarts I was just testing changes that use start-ephemeral to see if its faster for people | 18:42 |
hazmat | bcsaller, pls stop adding additional features to the bug fix, its much, much more important to release the fix for the original bug that add extraneous things. | 18:42 |
hazmat | s/than | 18:42 |
hazmat | moreover most of things are inappropriate for an SRU | 18:43 |
hazmat | and at the very least need to have separate bugs/documentation | 18:44 |
bcsaller | hey, before that bug you were the one that had me working on this ;) | 18:44 |
hazmat | bcsaller, i wanted a fix for the bug, not for any of the other things that are in that branch | 18:44 |
hazmat | some of those things can definitely go into an SRU, but they're still more appropriate and better documented with separate branches and bugs. | 18:45 |
hazmat | and specifically for the download feedback, that was in the context of informing of downloads when doing cloud-images, not tying the existing ma into desktop-notifications. | 18:46 |
hazmat | feels much more like pending stuff that's been discussed and you decided to put into this branch, but it was never part of the expectation for the bug in question | 18:47 |
SpamapS | a good SRU btw, would be to *just* remove the 'start on' from the upstart jobs | 18:48 |
* hazmat nods | 18:48 | |
SpamapS | so you can land whatever fix you want | 18:49 |
SpamapS | Once its fixed in quantal, I can do a much simpler patch for SRU :) | 18:49 |
hazmat | actually for the upstart jobs, its better to just not have them if it doesn't survive restart and just fork the relevant processes. | 18:52 |
hazmat | rather than dropping files into sys dirs that are irrelevant | 18:52 |
SpamapS | yes that would be great | 18:53 |
SpamapS | tho its nice to be able to easily restart them | 18:54 |
SpamapS | hazmat: upstart has a notion of user jobs | 18:54 |
SpamapS | but I think they're off by default IIRC | 18:54 |
SpamapS | hrm, getting lots of tracebacks when running relation-get -r$relid in an upgrade-charm hook | 19:33 |
SpamapS | http://paste.ubuntu.com/1095430/ | 19:34 |
SpamapS | Ok, I *think* I have working nagios+nrpe in a generic fashion... | 19:38 |
SpamapS | to the point where this is all thats needed to add monitoring to wordpress: | 19:39 |
SpamapS | http://paste.ubuntu.com/1095441/ | 19:39 |
james_w | cool | 19:41 |
SpamapS | james_w: I haven't forgotten custom nagios plugin support either :) | 19:41 |
james_w | :-) | 19:42 |
SpamapS | james_w: lp:~clint-fewbar/charms/precise/nrpe/trunk ... but.. I'd wait for my README updates and blog post.. there's a *LOT* there :) | 19:42 |
SpamapS | time for lunch | 19:43 |
=== salgado is now known as salgado-afk | ||
dpb___ | is there a limit to the number of lxc clients that juju could effectively start? | 21:37 |
SpamapS | dpb___: ZooKeeper keeps everything in RAM.. and each one needs at least 400MB of disk space... | 21:40 |
SpamapS | dpb___: so, yes, your box will crumble well before any logical limits are reached | 21:40 |
dpb___ | oh, nice. | 21:41 |
SpamapS | dpb___: I've done 7 at a time before.. it made my SSD feel slow. :) | 21:42 |
SpamapS | All that dpkg :-P | 21:42 |
dpb___ | ya, I just tried to spin up 10 and its zookeeper is throwing errors in debug-log | 21:42 |
dpb___ | new one for me: 2012-07-16 22:08:28,265 ERROR Invalid Remote Path provider-state | 22:09 |
imbrandon | SpamapS: even the local ZK keeps a full FS and not just pointers to it ? so ... in LXC a single NRPE deploy would be 1.2GB of disk just to store the charm | 22:17 |
imbrandon | ... that seems wrong. | 22:18 |
dpb___ | ok, for future reference, data-dir cannot do ~ substitution. :) | 22:18 |
imbrandon | ( also seems wrong there is 400MB of data in the charm ... ) | 22:18 |
SpamapS | imbrandon: no it doesn't keep *EVERYTHING* in it | 22:22 |
imbrandon | actually more than 2.4GB ... 400 in the local:repo ~/.juju/charmname 400 more and then ZK bootstrap lxc has 400mb on the zk fs and then it deploys to another lxc that adds 400 in /var/lib/juju/service/* and finally another in /mnt or wherever it's running from the final 400mb... | 22:22 |
SpamapS | imbrandon: it keeps everything that is in the ZK tree in RAM | 22:22 |
imbrandon | right i'm just saying a 400mb charm is 2.4gb deployed in LXC | 22:22 |
imbrandon | seems ... off | 22:22 |
imbrandon | SpamapS: btw got me a shiny new fileserver powered on and in the spare bedroom ( my in-home DC , lol ) | 22:25 |
imbrandon | now i have enough disks spun up and in the right manner that i can migrate things from 4 other machines I have around doing various things all to that one + any minor crons etc they ran, and I've already purchased an rsync.net account just to offsite backup that one box GREATLY simplifying my @home setup that has become a frankenstein over the years ... one server, running minimal services with 12tb of raided storage and an offsite backup ready to g | 22:28 |
imbrandon | BUT the major win out of the whole thing , and what caused me to finally do it this weekend ? heheh in-house MAAS with OpenStack on 5 nodes ( got a lil laptop to use as the controller ) with Gigabit speeds | 22:29 |
imbrandon | woot woot | 22:30 |
imbrandon | :) | 22:30 |
imbrandon | hoping by Wed or so I'll have everything copied into place and verified etc etc + the boxes reprovisioned fo sum fun hehh | 22:31 |
imbrandon | ( i keep off site backups of family members data too so i cant afford to goof it up , lol ) | 22:31 |
SpamapS | imbrandon: where do you get 2.4gb from 400MB? | 23:06 |
SpamapS | imbrandon: you want CoW for the charm on local? Thats a bit far reaching. ;) | 23:06 |
SpamapS | imbrandon: some day.. not today | 23:06 |
imbrandon | SpamapS: yea , i know its not fixable ( well not in the current context other things need to fall into place first ) | 23:08 |
imbrandon | just had no real strong feeling about more data in the charm than config templates until just now tho | 23:09 |
imbrandon | and i realized that | 23:09 |
SpamapS | 400Mb would be a bit weird | 23:09 |
imbrandon | i mean i didn't like it but was like meh, now it seems like a very very bad idea | 23:09 |
imbrandon | and yea 2.4 | 23:09 |
imbrandon | count it with me , maybe i'm wrong ... | 23:10 |
SpamapS | 400 in the charm.. 400 into the file storage.. 400 on the disk laid down.. thats 1.2G not 2.4 | 23:10 |
imbrandon | ok so you have /var/lib/charm/400.mb then you juju bootstrap and it copies it to ~/.juju/400.mb and then again to the lxc zk/400.mb then you "juju deploy charm" and zk copies it to new node at /var/lib/charm/400.mb and then the hooks fire and unpack it to /mnt/400.mb | 23:11 |
imbrandon | whats that add up ... all on your laptop ... but even on ec2 thats a bit extreme | 23:12 |
imbrandon | oh, and there is also a cached copy in s3 as well ... | 23:12 |
SpamapS | imbrandon: I don't believe it copies it to ~/.juju if its a local: | 23:15 |
SpamapS | imbrandon: and it never copies the charm into zk | 23:15 |
SpamapS | zk is structure only | 23:15 |
SpamapS | imbrandon: /var/lib/charm also doesn't exist :) | 23:15 |
imbrandon | ahh ok so i was mixing up my zk/400.mb with s3/400.mb | 23:15 |
imbrandon | and thats where i deploy local: form | 23:16 |
imbrandon | from* so just used it in my head | 23:16 |
* imbrandon echo "JUJU_REPOSITORY=/var/lib/charms/precise/" >> /etc/profile.d/juju-charms | 23:17 | |
imbrandon | :) | 23:17 |
SpamapS | Hrm, bug in subordinates | 23:18 |
imbrandon | nice clean place outside of my ~/Projects/juju/charms/ to let "charm getall" live and not mix unintentionally with the ones in Projects i'm actively developing :) | 23:19 |
SpamapS | if I destroy the primary service.. I never see a depart for the unit of the subordinate | 23:19 |
imbrandon | hrm | 23:21 |
imbrandon | race ? | 23:21 |
SpamapS | something like that | 23:22 |
SpamapS | 2012-07-16 16:17:03,362 unit:nrpe/5: statemachine DEBUG: relationworkflowstate: transition complete depart (state departed) {} | 23:22 |
imbrandon | i never took the time to dig into pyjuju enough to be effective even trying to look since i'm just concentrating on other things waiting for go | 23:22 |
SpamapS | Thats the depart from the sub<->primary relationship | 23:22 |
imbrandon | ok ... | 23:23 |
SpamapS | so its possible it just commits hara-kiri and never cleans anything else up | 23:23 |
imbrandon | so thats right ... | 23:23 |
imbrandon | oh well sure , well kinda | 23:23 |
SpamapS | it should be departing all of its non-primary relations first | 23:23 |
imbrandon | isnt that the point of its async call backs is it can do it anytime, so it is right? its just sloppy | 23:23 |
imbrandon | like i said not dug enough into this bit to be really informed | 23:24 |
imbrandon | just thought thats how it was tho | 23:24 |
SpamapS | Yeah unless one of your callbacks does its own little self-suicide instead of telling the reactor to exit when its all done | 23:24 |
imbrandon | right, so its a bug but not quite the same kind | 23:24 |
SpamapS | this is all speculation | 23:24 |
imbrandon | just a bad implementation not accounting for any order | 23:25 |
SpamapS | I have a reproducible problem | 23:25 |
imbrandon | right right | 23:25 |
SpamapS | which causes my nagios stuff to never clean up after itself :-/ | 23:25 |
imbrandon | just trying to help ya play strawman :) | 23:25 |
SpamapS | which sucks | 23:25 |
imbrandon | but you destroyed it | 23:25 |
imbrandon | why care ? | 23:25 |
SpamapS | because I've been able to do it without regenerating the whole nagios config every time | 23:25 |
SpamapS | I destroyed it | 23:25 |
SpamapS | so now I need to remove it from the nagios configs | 23:25 |
imbrandon | oh THAT hook isnt going ? | 23:25 |
SpamapS | thats the hook that isn't going | 23:26 |
imbrandon | but i thought that they all fire just late ... | 23:26 |
imbrandon | or ... ok one sec let me re-read that above | 23:26 |
imbrandon | confused myself i think lol | 23:26 |
SpamapS | there's a relationship between nrpe<->nagios .. and one between wordpress<->nagios .. and when I destroy wordpress, nrpe/X is gone.. but it never departs from the nagios<->nrpe relationship | 23:26 |
imbrandon | ummm | 23:27 |
imbrandon | it shouldn't | 23:27 |
imbrandon | oh wait ... | 23:27 |
imbrandon | it /should/ | 23:27 |
imbrandon | but didn't you tell me that before that might happen | 23:28 |
imbrandon | with something i did in the 1st newrelic ones | 23:28 |
_mup_ | Bug #1025478 was filed: destroying a primary service does not cause subordinate to depart <juju:New> < https://launchpad.net/bugs/1025478 > | 23:28 |
imbrandon | one* | 23:28 |
SpamapS | yes but in this case the relation hasn't been broken.. nrpe and nagios are still related.. just a *unit* departed | 23:28 |
imbrandon | right, i dident care cuz i dont work without it | 23:29 |
imbrandon | you could still try to be working ... if they both reported to the same nrpe ... but honestly thats the incorrect way to deploy nagios i was taught | 23:29 |
imbrandon | here let me run through a line of how i was shown to deploy nagios when first learning about it at GSI long long ago and this is from memory but i think will not be affected by the bug even if the bug needs fixed | 23:31 |
imbrandon | ok soooo | 23:31 |
imbrandon | service, we say a simple html app on apache | 23:31 |
SpamapS | is_principal = not (yield self._service.is_subordinate()) | 23:31 |
SpamapS | I just love inlineCallbacks :-P | 23:31 |
imbrandon | easy one check on port 80 nrpe checking it | 23:31 |
imbrandon | i actually do ... specifically recursive ones :( | 23:32 |
imbrandon | heheh | 23:32 |
imbrandon | anyhow so you got one nrpe on 80 | 23:32 |
SpamapS | imbrandon: sorry wtf are you getting on about nagios? | 23:32 |
imbrandon | you run 3 nagios in your scenario | 23:32 |
imbrandon | in the right way | 23:32 |
imbrandon | like how jenkins is run | 23:32 |
imbrandon | jenkins.qa.ubuntu.com dont run any jobs, other jenkins report their results into it | 23:33 |
SpamapS | so, in my world, NRPE is for checking things local to the box, and everything else the nagios server does direct against the host address. | 23:33 |
SpamapS | and then nsca goes the other way.. pushes things back from server -> nagios | 23:34 |
SpamapS | imbrandon: if you're talking about scaling out nagios.. we're not there yet. Lets just *configure* nagios first. | 23:34 |
imbrandon | nrpe local -> to one nagios per service --> one more nagios for whole environment | 23:35 |
SpamapS | yeah | 23:35 |
SpamapS | slow your roll | 23:35 |
SpamapS | thats later | 23:35 |
imbrandon | yea but if you do that then bug dont matter | 23:35 |
SpamapS | yes | 23:35 |
imbrandon | is what i was pointing out | 23:35 |
SpamapS | yes it does | 23:35 |
SpamapS | because I'd still be monitoring a now deceased box | 23:35 |
imbrandon | no because both services die then | 23:35 |
lifeless | what do I need to do to push the charm review forward ? | 23:35 |
SpamapS | You're going to run *ALL OF NAGIOS* on the box?! | 23:35 |
imbrandon | no , well thats how i said it but not in reality | 23:36 |
lifeless | I'm reminded of the perl joke here. | 23:36 |
imbrandon | it was for easy explaining | 23:36 |
SpamapS | lifeless: m_3 is supposed to pilot tomorrow, but I suspect he'll be busy prepping for OSCON demo's .. so I might take his spot | 23:36 |
SpamapS | imbrandon: ok, so you're going to run a nagios for every service? | 23:37 |
imbrandon | but you end up with nrpe plugins all over , maybe even many on one box , then one nagios where ever per service name its related to if it gets another relation name then it fires up another daemon of nagios , and then both report to a 3rd | 23:37 |
bkerensa | imbrandon: hi | 23:38 |
imbrandon | maybe even on that same box as well but that 3rd will be the one the customer "reads" | 23:38 |
bkerensa | SpamapS: he is very busy preparing ^ | 23:38 |
imbrandon | SpamapS: not just every service but every service relation, but they can all still just be on one "nagios" machine physically | 23:39 |
imbrandon | heya bkerensa , hows it goin | 23:39 |
SpamapS | imbrandon: so if I have 40 services, I have 40 /usr/sbin/nagios running? All due respect, but that sounds bat@!$% crazy | 23:39 |
SpamapS | imbrandon: nagios is perfectly capable of scaling one nagios daemon up to thousands of service checks | 23:40 |
imbrandon | lifeless: i'll have some time tonight and tomorrow as well, i'll review it and minimum give ya feed back if I think its too complex for me to +1 alone | 23:40 |
imbrandon | :) | 23:40 |
bkerensa | imbrandon: nothing much just hanging with all the juju folks except for SpamapS :D | 23:40 |
imbrandon | SpamapS: yea and no , and in yes 40 and no its the norm, or at least how i've actually seen nagios deployed realworld | 23:41 |
SpamapS | imbrandon: also for this bug.. I have a workaround.. which is to just trash anything that belongs to the primary service even if the sub relationship still thinks it should be there. | 23:41 |
imbrandon | but seriously 40 services in one env ? | 23:41 |
SpamapS | imbrandon: even 5.. I see no reason to do that. | 23:41 |
imbrandon | sure, nagios is designed just exactly for that and its light | 23:41 |
elmo | imbrandon: I've never seen nagios deployed that way or even heard anyone doing it that way, FWIW | 23:41 |
SpamapS | imbrandon: and it would still suffer the same problem, because the nrpe relationship would still be left dangling | 23:41 |
elmo | (until now) | 23:41 |
imbrandon | its like built into the setup that one is normally a "collector" and gathers the others | 23:42 |
imbrandon | SpamapS: but would only break one instance of the daemon | 23:42 |
imbrandon | that dont matter at that point anyhow | 23:42 |
SpamapS | imbrandon: yes thats one thing nagios can do, but thats for aggregating several monitoring boxes into one.. not for "the norm" | 23:42 |
imbrandon | sorry i dont follow what you mean there | 23:43 |
imbrandon | heya elmo :) | 23:43 |
SpamapS | I could see an issue where nagios would be heavily single threaded and need more processes to scale up onto one box with lots of cores.. | 23:43 |
SpamapS | but nagios is pretty well written and is almost always just waiting on slow network stuff and disk | 23:43 |
elmo | nagios already spawns processes to distribute check load | 23:44 |
SpamapS | Yeah, I've seen pretty moderate boxes keep up just fine with thousands of checks defined and running at pretty regular intervals | 23:44 |
imbrandon | elmo: yea thats what i'm trying to unsuccessfully explain that its very common to break nagios up like that | 23:44 |
imbrandon | or other arbitrary ways | 23:44 |
imbrandon | SpamapS: yea i know for fact there is a dell 2650 in the basement reporting about 10k checks :) [ its my brother in laws machine heh ] | 23:45 |
imbrandon | and those are only like c2d 1.6ghz or something silly | 23:45 |
imbrandon | ( that dumb dude went nuts and has like one website checked like for real 8 ways or something , like tcp, then telnet then http 80 then text exists etc etc etc | 23:47 |
imbrandon | i think he was just bored or was trying to make it break ) | 23:47 |
imbrandon | heh | 23:47 |
* imbrandon sets up 2 checks for php/python/rails apps ( on each node direct ) one to pull a txt file from /health-check/plain.txt and make sure its 200 with body "OK" and one to pull /health-check/dynamic.php{or .py or .erb/.rb} to get a 200 and an "OK" body thats printed by the code | 23:50 | |
imbrandon | that should cover about any thing as far as if the server is working ( not taking something slips past lint and other checks at build and/or deploy and is a app error ) | 23:51 |
SpamapS | imbrandon: IMO the really important monitor is the one that verifies traffic is flowing | 23:52 |
SpamapS | imbrandon: artificial checks are great, but I want to know that requests are happening at normal levels | 23:53 |
imbrandon | right, thats a whole nother class of check , as well as the disk space one, dunno how many times i've seen a log fill the box and totally hose everything | 23:53 |
imbrandon | even if most all logs are on other partitions or something , someone fraks up and writes their own or logrotate dies or some crap | 23:54 |
imbrandon | never fails | 23:54 |
* imbrandon thinks damn near every log should be sent to /dev/null on prod machines anyhow with the exception of the authlog and dmesg for hardware failures etc | 23:55 | |
imbrandon | but all service/daemons supporting services/and apps should not need or really imho be forcefully sent to /dev/null in prod | 23:56 |
SpamapS | imbrandon: I think I'll stick with the "should be sent to a central logging host asynchronously" not /dev/null | 23:59 |
imbrandon | better ways to clicktrack or anything else they might provide, and you should have an identical setup in staging to debug issues, not similar, identical, that way if something is up its in a hardware log/syslog still cuz its hardware ( or more likely dell openmanage has already told nagios that a hdd has been spun and you just get the report from in the NoC HUD's and call dell to add another when they show up in their 4 hrs and it gets hotswapped | 23:59 |
SpamapS | imbrandon: if that host decides to devnull them thats fine, but I want to be able to see them | 23:59 |
imbrandon | oh i know , just ranting at this point | 23:59 |
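
The two-check health scheme imbrandon describes at 23:50 (a static /health-check/plain.txt and a dynamic /health-check/dynamic.*, each expected to return 200 with body "OK") could be sketched roughly as below. This is only an illustration, not anything posted in the channel: the function names, the `base_url` parameter and the default `php` extension are assumptions, and the only outside fact relied on is the standard Nagios plugin exit codes (0 = OK, 2 = CRITICAL).

```python
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def evaluate(status, body):
    """Return a Nagios exit code for one health-check response."""
    if status == 200 and body.strip() == "OK":
        return OK
    return CRITICAL


def check_url(url, timeout=10):
    """Fetch url and evaluate it; any network or HTTP error counts as CRITICAL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return evaluate(resp.status, body)
    except (urllib.error.URLError, OSError):
        return CRITICAL


def check_node(base_url, dynamic_ext="php"):
    """Run the static and the dynamic check against one node (hypothetical helper)."""
    static = check_url(base_url + "/health-check/plain.txt")
    dynamic = check_url(base_url + "/health-check/dynamic." + dynamic_ext)
    # The node is only healthy if both checks pass.
    return max(static, dynamic)
```

Run from the monitoring host against each node's address, the static check exercises the web server and the dynamic one exercises the language runtime. As SpamapS notes, this says nothing about whether real traffic is flowing.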
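
The colouring SpamapS describes around 18:08 ('started' green, 'error' red) can be approximated with a small filter. This is a hedged sketch of the idea only: it is not how ccze works and not an existing juju option, and the `colorize` name and the choice to match only the two words mentioned in the log are assumptions.

```python
import re

# ANSI SGR escape sequences: green, red, and reset.
GREEN, RED, RESET = "\033[32m", "\033[31m", "\033[0m"

# Only the two states mentioned in the discussion are handled.
STATE_COLORS = {"started": GREEN, "error": RED}
_STATE_RE = re.compile(r"\b(started|error)\b")


def colorize(text):
    """Wrap each known state word in its ANSI colour code."""
    def paint(match):
        word = match.group(1)
        return STATE_COLORS[word] + word + RESET
    return _STATE_RE.sub(paint, text)
```

Piping status text through `colorize` line by line would give a rough equivalent of the proposed `--color` behaviour; colouring hostnames, which ccze also does, is left out of the sketch.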