[00:00] <jsonperl_> cwill_at_work... does a system with high run q and low load indicate that it's I/O bound to you?
[00:01] <jsonperl_> cwillu_at_work, sorry
[00:01] <Patrickdk> iowait% is i/o bound
[00:01] <Patrickdk> you have 0% iowait
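Patrickdk's iowait% check can be read straight from /proc/stat, the same counters top and vmstat use. A minimal sketch, assuming a Linux /proc filesystem (field 6 of the aggregate "cpu" line is iowait jiffies, per proc(5)):

```shell
# Sample /proc/stat twice, one second apart, and compute iowait as a
# percentage of all CPU time between the samples.
cpu_sample() { awk '/^cpu / { print $2+$3+$4+$5+$6+$7+$8, $6 }' /proc/stat; }
set -- $(cpu_sample); t1=$1 w1=$2
sleep 1
set -- $(cpu_sample); t2=$1 w2=$2
dt=$((t2 - t1)); dw=$((w2 - w1))
pct=0
[ "$dt" -gt 0 ] && pct=$((100 * dw / dt))
echo "iowait over last second: ${pct}%"
```

A high run queue with near-zero iowait, as discussed above, points at CPU contention rather than I/O.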
[00:02] <jsonperl_> That post was of a healthy system... I was just trying to figure out how to interpret
[00:04] <jsonperl_> I logged some netstat numbers during the issue... I see some high send and recv Q, this may be something here
[00:06] <Patrickdk> that would likely be a nic driver issue, I would think
[00:06] <Patrickdk> do all your machines have the same driver/nic?
[00:07] <Patrickdk> been a long time since I attempted to diagnose or work on something at that level
[00:08] <jsonperl_> Yep every machine is identical
[00:08] <jsonperl_> Most are in LAST_ACK
[00:08] <Patrickdk> e1000e?
[00:09] <jsonperl_> but some ESTABLISHED
[00:09] <jsonperl_> like almost 200k in some queues
[00:09] <Patrickdk> it might just be the result of the issue, but it might be a cause
[00:09] <jsonperl_> the card?
[00:09] <jsonperl_> it's a broadcom… hang on
[00:09] <Patrickdk> not sure what to tell you to figure it out
[00:10] <jsonperl_> Broadcom 5720
[00:10] <jsonperl_> That queue size seems pretty unhealthy right?
[00:11] <Patrickdk> what do you get for ethtool -k eth0
[00:11] <jsonperl_> I don't have a currently borked system
[00:11] <Patrickdk> doesn't matter
[00:12] <jsonperl_> http://pastebin.com/ARwR2W6K
[00:12] <Patrickdk> but at least till someone has an idea how to diagnose this some more, I can at least throw you some things to see if they have any effect
[00:12] <Patrickdk> if they do, it's likely the cause; if not, just an effect
[00:13] <jsonperl_> Yea totally. You've been reallllly helpful
[00:13] <jsonperl_> This stuff is all fairly new to me
[00:13] <Patrickdk> give a try: ethtool -K rx off tx off sg off tso off gso off rso off rxvlan off txvlan off eth0
[00:13] <Patrickdk> maybe again for eth1 if you use it
[00:13] <Patrickdk> opps
[00:14] <Patrickdk> ethtool -K eth0 rx off tx off sg off tso off gso off rso off rxvlan off txvlan off
[00:14] <Patrickdk> I am not sure about the broadcoms, but I know the intel driver has gone back and forth on it working and not working
[00:14] <Patrickdk> my older intel ones, I had to disable a few of those, to make it work correctly
[00:15] <Patrickdk> this will cause higher cpu usage
[00:15] <Patrickdk> I doubt it will be enough for you to notice though
[00:15] <jsonperl_> so basically turning everything off
[00:15] <Patrickdk> yep
[00:15] <jsonperl_> any potential for badness here?
[00:15] <jsonperl_> besides cpu usage
[00:15] <Patrickdk> no
[00:15] <Patrickdk> the checksums just lower cpu usage
[00:16] <jsonperl_> But in your experience they gunk up the works sometimes?
[00:16] <Patrickdk> the rest mainly cause the nic and linux to move around 64k of data at a time, instead of one packet at a time
[00:16] <Patrickdk> gro sometimes, rxvlan on one of mine here at home
[00:16] <Patrickdk> tso I think I had an issue with on some too
[00:17] <Patrickdk> this system I am using now needs: ethtool -K eth0 rxvlan off tx off
[00:17] <Patrickdk> that leaves only gso turned on
[00:18] <Patrickdk> I forget about the tx on it; it doesn't support rxvlan, but the driver thinks it does
[00:18] <jsonperl_> It's a gigabit card, its speed is set at 100Mb... likely the network it's on
[00:19] <Patrickdk> oh, at 100mbit you will never see the increased cpu usage :)
[00:19] <jsonperl_> I'm wondering if I need to upgrade the network... I've never seen us go beyond maybe 15Mb though
[00:27] <jsonperl_> Seems like a lot of data in queue in LAST_ACK state indicates a problem with our code no?
[00:27] <jsonperl_> Basically the connection has been severed on their end, but we haven't gotten rid of it
[00:28] <jsonperl_> TCP:   458 (estab 50, closed 96, orphaned 63, synrecv 0, timewait 1/0), ports 0
[00:28] <jsonperl_> Lotta orphans
[00:29] <Patrickdk> dunno what a LAST_ACK is
[00:30] <jsonperl_> Right before the tcp connection closes
[00:30] <Patrickdk> oh, that is actually a state
[00:30] <Patrickdk> I never see those
[00:30] <jsonperl_> I've got a bunch... perhaps thats an issue
[00:30] <Patrickdk> na, normally that for me is TIME_WAIT, where the connection was closed, but not properly
[00:31] <sarnold> "The remote end has shut down, and the socket is closed. Waiting for acknowledgement."
[00:31] <Patrickdk> ya, sounds like you're sending it data, but it's not responding
[00:31] <Patrickdk> oh
[00:31] <sarnold> is there a funny firewall in the way preventing those packets?
[00:32] <jsonperl_> just iptables
[00:32] <Patrickdk> hmm, odd though, never seen them, just the FIN_WAIT TIME_WAIT mainly
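For tallying sockets per state like the LAST_ACK/TIME_WAIT discussion above, the hex state column of /proc/net/tcp can be counted directly. A hedged sketch; `tcp_state_summary` is a made-up helper name, and the state-code mapping is from the kernel's tcp_states.h:

```shell
# Count sockets per TCP state by parsing /proc/net/tcp-format input
# (column 4 is the state in hex; header lines are filtered out by the
# two-hex-digit pattern).
tcp_state_summary() {
    awk '$4 ~ /^0[0-9AB]$/ {
        n["01"]="ESTABLISHED"; n["02"]="SYN_SENT";   n["03"]="SYN_RECV"
        n["04"]="FIN_WAIT1";   n["05"]="FIN_WAIT2";  n["06"]="TIME_WAIT"
        n["07"]="CLOSE";       n["08"]="CLOSE_WAIT"; n["09"]="LAST_ACK"
        n["0A"]="LISTEN";      n["0B"]="CLOSING"
        c[n[$4]]++
    }
    END { for (s in c) print s, c[s] }'
}

cat /proc/net/tcp /proc/net/tcp6 2>/dev/null | tcp_state_summary
```

`ss -s` or `netstat -ant | awk '{print $6}' | sort | uniq -c` give similar summaries without the hex decoding.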
[00:34] <jsonperl_> Here so you have some idea what I'm looking at: http://pastebin.com/9RNzEbb9
[00:34] <jsonperl_> ips jiggled to protect the innocent :)
[00:36] <jsonperl_> I wonder if we're just shipping data to an "almost closed" socket, and filling up the tcp queue
[00:42] <sarnold> jsonperl_: the 'slabtop' utility ought to be able to show you if TCP is eating too much of your memory
[00:43] <jsonperl_> I'll check it out
[00:43] <jsonperl_> though memory utilization is quite good now
[00:43] <jsonperl_> (with a little help from my buddy PatrickDK)
[00:49] <jsonperl_> gotta head home... thanks folks, back later
[00:49] <sarnold> have fun :)
[01:11] <jsonperl> back for more!
[01:31] <jsonperl> PatrickDK
[01:32] <jsonperl> I just had a system flake out… I hit those networking settings live, and it seems to have fixed it?
[01:32] <jsonperl> (super super anecdotally)
[01:32] <Patrickdk> dunno :)
[01:32] <jsonperl> so your theory there is that there is a driver issue with the card?
[01:32] <Patrickdk> personally, I would put those on like 3 or so, and see
[01:33] <Patrickdk> well, driver or firmware
[01:33] <jsonperl> yea the whole "it didn't explode" thing is a really frustrating way to prove stuff :)
[01:33] <Patrickdk> more likely driver, but firmware could affect the drivers actions
[01:33] <jsonperl> so by turning all of that off, we reduce the load on the card essentially?
[01:33] <jsonperl> and let the os take care of stuff
[01:33] <Patrickdk> well, it puts the card into normal dumb mode basically
[01:34] <Patrickdk> instead of attempting to limit interrupts, and queue up requests and stuff
[01:34] <Patrickdk> and offloading some of the work
[01:34] <Patrickdk> it might be there is some kind of buffer overrun happening on the nic, causing the issue
[01:34] <Patrickdk> but I'm totally random guessing
[01:35] <jsonperl> me2
[01:35] <jsonperl> :D
[01:35] <Patrickdk> but now since that is off, nothing is really getting buffered
[01:35] <jsonperl> oh man, if this fixes the problem
[01:35] <Patrickdk> I have had issues with broadcom drivers before, but not on linux
[01:35] <Patrickdk> but then, I really have not used broadcom on linux so :)
[01:35] <jsonperl> I use what they rent me :)
[01:35] <jsonperl> (peer1 / serverbeach)
[01:37] <sarnold> jsonperl: was that an ethtool command that seems to be fixing it?
[01:37] <jsonperl> yep
[01:38] <jsonperl> ethtool -K eth0 rx off tx off sg off tso off gso off rxvlan off txvlan off
[01:38] <jsonperl> 'seems' being the operative word
[01:40] <Patrickdk> if you're really interested, start knocking one off at a time, till it acts up again :)
[01:41] <jsonperl> hahahaha
[01:41] <jsonperl> oh man, the fact that that's a reasonable thing to do kind of makes me ill :)
[01:42] <sarnold> :)
[01:43]  * Patrickdk bets on the tso or gso
[01:44] <jsonperl> im gonna turn everything off on all machines
[01:44] <Patrickdk> could be tx, but normally not
[01:44] <jsonperl> then i'll pull those on one of them
[01:44] <jsonperl> so do tso, gso, and tx in that order huh :)
[01:44] <Patrickdk> or, pull a different one per machine? :)
[01:44] <sarnold> yeah, I'm also suspicious of tso and gso
[01:44] <jsonperl> ahahha
[01:44] <sarnold> and it feels like 'sg' would be nice to have back
[01:44] <Patrickdk> I have no idea what sg is, never bothered by it before :)
[01:45] <sarnold> (at least I assume it means Scatter/Gather)
[01:45] <Patrickdk> it does
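Patrickdk's "knock one off at a time" bisection, sketched as a dry run that only prints the commands to run in each test window. The interface name eth0 and the ordering (tso and gso first, per the bets above) are assumptions; nothing is changed until you actually run a printed command:

```shell
# Re-enable one offload feature per test window (or a different one per
# machine, as suggested above) and watch whether the stall returns.
for feature in tso gso sg tx rx rxvlan txvlan; do
    cmd="ethtool -K eth0 $feature on"
    echo "next test window: $cmd"
done
```

Checking current state between windows with `ethtool -k eth0` (lowercase -k) confirms which features are actually on, since some drivers silently refuse changes.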
[01:46] <jsonperl> oh man, im excited
[01:46]  * Patrickdk locates a bed
[01:46] <jsonperl> I MAY BE ABLE TO SLEEP
[01:46] <sarnold> g'night :)
[01:46] <jsonperl> cya Patrick, thanks again
[01:52] <jsonperl> allright, all machines updated
[01:52] <jsonperl> now I wait :)
[01:52] <jsonperl> sarnold/Patrickdk, it makes sense those settings kick in live right?
[01:52] <jsonperl> no networking restart or anything
[01:53] <sarnold> jsonperl: right
[01:53] <jsonperl> good… because if it didn't that would disprove that it fixed it ;)
[07:27] <jamespage> yolanda, https://code.launchpad.net/~james-page/glance/sqlalchemy-bump/+merge/176613 if you are around :-)
[07:27] <jamespage> zul, ^^
[07:27] <yolanda> morning
[07:27] <jamespage> I'm gonna review all packages today
[07:27] <yolanda> great
[07:28] <jamespage> yolanda, morning!
[07:28] <yolanda> jamespage, bad news, since this branch is on ubuntu-server-dev, i don't have permissions
[07:30] <jamespage> yolanda, just need a review
[07:30] <jamespage> not a merge
[07:30] <jamespage> I'll do that myself
[07:30] <yolanda> jamespage, assign me as a reviewer
[07:30] <yolanda> otherwise i can't
[07:31] <yolanda> i don't have the permissions to "Request review"
[07:31] <jamespage> yolanda, dog
[07:31] <jamespage> doh rather
[07:31] <jamespage> yolanda, done
[07:33] <yolanda> ok, reviewed, i cannot change the main status anyway
[07:34] <jamespage> ack
[07:34] <jamespage> thanks
[10:25] <ThothCastel> how to check if ssh server is running?
[10:26] <mardraum> ThothCastel: service ssh status
[10:26] <greppy> ps -ef | grep ssh
[10:30] <ThothCastel> greppy: mardraum: thanks, it's running, however I am unable to connect to it via ssh :S
[10:31] <mardraum> what exactly happens? use pastebin if you must
[10:36] <jamespage> zul, when you start review needed please - https://code.launchpad.net/~james-page/neutron/fixup-h2/+merge/176650
[10:43] <cwillu_at_work> greppy, "sshd"
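cwillu_at_work's correction matters because `ps -ef | grep ssh` also matches the grep itself and any outbound ssh clients. A tighter check, sketched here; the `service ssh status` form fits the 12.04-era upstart setup discussed above:

```shell
# Match the daemon name exactly rather than substring-grepping "ssh";
# pgrep exits non-zero when nothing matches.
pgrep -x sshd >/dev/null && status="sshd is running" || status="sshd not running"
echo "$status"
# on 12.04 with upstart, "service ssh status" asks the init system directly
```

If sshd is running but unreachable, the next things to check are the listen address in sshd_config and any firewall in between.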
[11:05] <jamespage> zul, you might wanna take a look at the python-greenlet upload you did yesterday
[11:05] <jamespage> it blasted all of the python3 work that you did in the previous two ubuntu versions
[11:06] <jamespage> (which is why it's blocked in proposed right now)
[11:35] <zul> jamespage:  fuuuuuu
[11:38] <ikonia> zul: ?
[12:14] <streulma> hello, I can upgrade my kernel on Ubuntu Server 12.04, but when I reboot, the server doesn't boot and hangs; it's KVM virtualisation
[12:18] <jamespage> zul, hey - I also uploaded trivial fixes for keystone and glance autopkgtest failures
[12:19] <jamespage> I'm stuffing them into havana staging as well
[12:19] <zul> jamespage:  ack
[12:21] <jamespage> streulma, anything on the console?
[12:21] <streulma> there is at the moment a problem with the console, the isp upgraded to a new version of OnApp
[12:22] <streulma> but before I had the problem
[12:22] <streulma> it boots the kernel
[12:22] <streulma> and then hangs after keyboard...
[12:22] <streulma> before the services load
[13:40] <zul> jamespage:  http://people.canonical.com/~chucks/ca/
[13:42] <jamespage> zul, ceilometer?
[13:43] <zul> jamespage:  yep
[13:56] <jamespage> zul, why does simplejson need "     - Build for python 3.2 as well."
[13:56] <jamespage> I know precise has python 3.2
[13:57] <jamespage> but can't a generic fix be applied in saucy which makes it a no-change backport again?
[13:58] <zul> jamespage:  because it explicitly depends on python 3.3
[13:58] <jamespage> zul, +1 for msgpack-python
[13:59] <zul> jamespage:   python3-all-dev (>= 3.3.0-3) in the debian/control
[14:01] <jamespage> zul, ack
[14:01] <jamespage> reviewing now
[14:01] <zul> jamespage:  ill fix the saucy version
[14:01] <jamespage> zul, does it work with python3.2
[14:01] <zul> jamespage:  yeah
[14:01] <jamespage> just wondering if that's why the min-versions are specced
[14:02] <zul> nothing in the changelog
[14:03] <jamespage> zul, nope
[14:04] <jamespage> and it looks OK - maybe poke piotr in #debian-python on OFTC and see if there are any gotchas
[14:04] <zul> jamespage:  nope im not uploading it, i just noticed a bug
[14:04] <jamespage> zul, do we really need the new webtest?
[14:04] <jamespage> is 1.3.3 -> 1.3.4
[14:04] <jamespage> its rather
[14:04] <zul> jamespage:  im not sure, nack it please
[14:05] <zul> jamespage:  chuck@homer:~/pbuilder/precise_result$ dpkg -c python3-simplejson_3.3.0-2ubuntu1~cloud0_amd64.deb
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/doc/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/doc/python3-simplejson/
[14:05] <zul> -rw-r--r-- root/root      3160 2013-07-24 09:06 ./usr/share/doc/python3-simplejson/changelog.Debian.gz
[14:05] <jamespage> zul, -1
[14:05] <zul> -rw-r--r-- root/root      1645 2011-02-15 15:56 ./usr/share/doc/python3-simplejson/copyright
[14:05] <zul> chuck@homer:~/pbuilder/precise_result$ dpkg -c python-simplejson_3.3.0-2ubuntu1~cloud0_amd64.deb
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/
[14:05] <jamespage> \o/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/doc/
[14:05] <zul> drwxr-xr-x root/root         0 2013-07-24 09:14 ./usr/share/doc/python-simplejson/
[14:05] <zul> -rw-r--r-- root/root      7062 2013-05-01 16:01 ./usr/share/doc/python-simplejson/index.rst.gz
[14:05] <zul> -rw-r--r-- root/root      3160 2013-07-24 09:06 ./usr/share/doc/python-simplejson/changelog.Debian.gz
[14:05] <Pici> nice
[14:05] <zul> -rw-r--r-- root/root      1645 2011-02-15 15:56 ./usr/share/doc/python-simplejson/copyright
[14:05] <zul> shit!
[14:05] <Pici> it's zul, so I'll let it slide... this time ;)
[14:06]  * jamespage drowns in irc
[14:06] <zul> jamespage:  tests are not enabled in that package either
[14:19] <dranix> hi everyone
[14:19] <dranix> i need some help with ldap integration with packetfence
[14:19] <dranix> anyone has any idea how to go about doing this?
[14:40] <Monotoko> if I connect to an openvpn server in the office... should it not tunnel all my internet connection through it?
[14:40] <Monotoko> I have the same IP as before...
[14:42] <rbasak> Depends on how you have it configured.
[14:43] <Monotoko> rbasak, it was configured by my predecessor - where can I check?
[14:44] <rbasak> I don't recall, sorry. Check the docs for mentions of your default gateway. I think it's a client-side setting, but you can also configure the client to accept the server's settings and then configure it on the server (IIRC).
[14:45] <rbasak> Or may default route, rather than default gateway.
[14:45] <rbasak> maybe
[14:46] <Monotoko> lots of mention of bridging...
[14:47] <raub> Trivial question: how do I upgrade a kernel module that is in use? By in use it is module for raid controller but I am booting using a live CD
[14:47] <oozbooz> Monotoko: check your routing, is default route via VPN or your ISP?
[14:48] <oozbooz> command "ip  r sh"
[14:49] <oozbooz> usually, server pushed routes to the client, but client can overwrite it or do some other tricks w/out getting server involved
[14:49] <oozbooz> pushed=pushes
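oozbooz's `ip r sh` is short for `ip route show`; the same default-route information is also visible in /proc/net/route, where destination 00000000 marks the default entry. A hedged sketch (the gateway is left in the kernel's little-endian hex rather than decoded):

```shell
# Print the default route from /proc/net/route; column 1 is the interface,
# column 2 the destination, column 3 the gateway in little-endian hex.
awk 'NR > 1 && $2 == "00000000" { print "default route on " $1 ", gateway (hex) " $3 }' /proc/net/route
```

If the printed interface is a tun/tap device, the VPN is carrying all traffic; if it is the physical NIC, only pushed routes go through the tunnel.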
[14:50] <Monotoko> oozbooz, http://pastebin.com/65vcRqk7
[14:50] <Monotoko> I tried to remove the comment in the config here: ;push "redirect-gateway def1 bypass-dhcp"
[14:51] <Monotoko> however then the client wouldn't load anything
[14:51] <oozbooz> I assume 5.10.152.225 is your ISP GW
[14:51] <oozbooz> then your internet traffic should go over it
[14:51] <Monotoko> yeah, we have a /29 I believe
[14:51] <Monotoko> when I'm connected from outside the office
[14:51] <Monotoko> I want it to still use the office IP
[14:52] <oozbooz> use office IP for ... ?
[14:52] <oozbooz> you mean send your ALL traffic via the tunnel?
[14:52] <Monotoko> yeah - it's static - a lot of people who work here work from homes etc, with dynamic IP's
[14:53] <Monotoko> I'd rather they all used our network to make it easier to firewall the servers and not keep punching random holes in the FW
[14:55] <oozbooz> I don't get your last statement ...
[14:55] <oozbooz> usually, you only want the relevant traffic sent to your office via the tunnel,
[14:55] <oozbooz> rest of the stuff, they should use their ISP
[14:56] <oozbooz> why would you want them to download youtube videos using office bandwidth
[14:56] <rbasak> I'd say it depends. Road warriors might prefer everything to go via the office if they don't trust the connections they're using (coffee shops, hotels, etc)
[14:56] <Monotoko> oozbooz, we have a "cloud" provider off site that I need to give developers access to, and certain things that they can log into through the web browser but only from this IP
[14:57] <oozbooz> aha
[14:57] <oozbooz> 3rd party mess..
[14:58] <Monotoko> aye - obviously I need a static IP I can trust for that, so I'd rather tunnel everyone through our office network
[14:58] <oozbooz> well... you can create a new route that only traffic for cloud provider goes via the tunnel
[15:00] <Monotoko> hmm, what route would I be adding for that? route add 1.2.3.4 gw 5.10.152.227 eth0 ?
[15:01] <oozbooz> but, if you decide to divert all traffic, you will have to change routing rules on the server, that will be pushed to the client
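For oozbooz's suggestion of routing only the cloud provider through the tunnel, the usual OpenVPN approach is a pushed host route on the server rather than a hand-typed client-side `route add`. A config sketch; 203.0.113.10 and the tunnel gateway below are placeholder assumptions, not values from this log:

```
# server.conf fragment: send only the cloud provider's address via the tunnel
push "route 203.0.113.10 255.255.255.255"
# or, to send all client traffic through the office instead:
# push "redirect-gateway def1 bypass-dhcp"
```

A one-off client-side equivalent would be `ip route add 203.0.113.10/32 via 10.8.0.1 dev tun0`, where 10.8.0.1 is the assumed tunnel gateway (the net-tools form needs `-host`: `route add -host 203.0.113.10 gw 10.8.0.1`).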
[15:01] <oozbooz> which VPN server do you use
[15:01] <Monotoko> openvpn
[15:01] <zul> jamespage:  simplejson fixed locally ill upload to the regular archive and get it for the cloud archive as well
[15:02] <oozbooz> openvpn or openvpn-AS?
[15:02] <Monotoko> regular openvpn AFAIK
[15:02] <Monotoko> yeah
[15:03] <Monotoko> just checked with dpkg
[15:03] <oozbooz> ok, first, my advice is to upgrade to openvpn-AS - much easier to manage
[15:04] <oozbooz> there is an IRC channel "openvpn", you should confirm with them... but it shouldn't be difficult
[15:05] <Monotoko> cheers oozbooz
[15:06] <oozbooz> have fun
[15:07] <zul> jamespage:  http://people.canonical.com/~chucks/ca/
[15:14] <zul> smb: ping i was wondering if you could offer some insight on it https://launchpadlibrarian.net/145685953/buildlog_ubuntu-precise-amd64.xen_4.2.2-1ubuntu1~cloud0_FAILEDTOBUILD.txt.gz
[15:15] <smb> zul, maybe, let me read
[15:16] <zul> smb: this is on precise
[15:17] <smb> zul, Looks like the known problem of passing LDFLAGS in gcc format -Wl but don't we work around that
[15:18] <zul> smb: yeah seems to ignore that for some reason
[15:18] <smb> And why do you compile xen 4.2.2 on Precise?
[15:18] <smb> :-P
[15:18] <smb> Still have not cleared that MRE
[15:19] <smb> Actually I would not aim 4.2.2 immediately but 4.1.5... or .6 but anyway
[15:20] <jamespage> zul, +!
[15:20] <jamespage> +!
[15:20] <jamespage> +1 rather
[15:20] <zul> jamespage:  cool thanks
[15:21] <smb> zul, "LDFLAGS = $(shell dpkg-buildflags --get LDFLAGS|sed -e 's/-Wl,//g')" in debian/rules?
[15:21] <jamespage> zul, https://code.launchpad.net/~james-page/neutron/fixup-rootwrap-conf/+merge/176708
[15:21] <soahccc> I'm more the nginx kind of guy so what did I miss here? Installed apache, changed the port (so that it won't conflict with nginx), getting this netstat: "tcp    0     0 0.0.0.0:8080     0.0.0.0:*       LISTEN      0      185527477   18552/apache2" but it only reacts to local requests. There is no iptables rule... Any ideas, I'm desperate :(
[15:26] <smb> zul, Just out of curiosity is that the 4.2.2 version from current Saucy?
[15:26] <zul> yeah
[15:27] <smb> zul, Hm, so it has that line... but for some reason I vaguely remember something going wrong with something like this (but I believe that was another package)
[15:29] <smb> zul, Oh wait maybe because in P LDFLAGS is exported by the build system...
[15:29] <zul> hmm...interesting ill try it out
[15:30] <smb> zul, Is that LDFLAGS := instead of LDFLAGS =
[15:39] <smb> zul, Oh I think I can imagine what is going on: we do not set LDFLAGS at all by default in newer releases. So when compiling in S I did not notice none of them being used and setting LDFLAGS in debian/rules being useless
[15:39] <smb> But in P when they are set by default it fails...
[15:39] <zul> smb:  so disable it?
[15:40] <smb> zul, I'd probably try either an export in debian/rules or move the definition into debian/rules.real for a moment
[15:40] <zul> smb:  ok ill try that
[15:41] <smb> zul, And I need to make sure I really use those flags in the Xen 4.3 I am preparing
[15:41] <smb> for S that is
[15:43] <zul> smb:  when are you doing 4.3?
[15:43] <smb> zul, I am just about to think I got all pieces together. Testing it on my boxes
[15:44] <zul> smb: ok cool
[15:45] <med_> zul, a user asked me if there will be any quantum-> neutron renaming in raring or earlier (and similarly, anything before havana)
[15:45] <med_> my answer was "NO, but I'll check with zul"
[15:46] <zul> med: no, quantum in raring was quantum
[15:46] <med_> nod.
[15:47]  * med_ was pretty sure it was only a cease and desist not a "go undo the world"
[15:59] <roaksoax> Madkiss: howdy! have you looked into packaging dlm?
[16:00] <zul> smb:  nope neither worked
[16:01] <smb> zul, Hm, ok need to figure out how to modify it correctly for the actual compile. Seems the more recent releases just don't use any
[16:02] <smb> I mean it does not get passed in and fails because where we change it somehow does not replace the default of the system
[16:04] <smb> zul, Doing the export did break the build in the same way on S though... So maybe := is the second missing piece
[16:13] <smb> zul, having LDFLAGS= and export LDFLAGS both in rules.real seems to make the compile run longer (not finished yet)
[16:14] <zul> smb:  can i see a snippet your rules.real please?
[16:17] <Daviey> rbasak: BTW, merges.py won't work right now - until egress firewall is more relaxed.  Have raised RT
[16:20] <rbasak> Daviey: OK, thanks.
[16:20] <Daviey> roaksoax: Hey, does Openstack / Kombu support Rabbit Active/Active in Havana?
[16:20] <rbasak> I'll try and keep people.canonical.com/~rbasak/delta.py updated in the mean time, though note that I'm doing it manually.
[16:22] <roaksoax> Daviey: I haven't checked yet, sorry! I'm doing the whole upgrade process of the clustering tools, which is not as easy as syncing packages from debian
[16:24] <adam_g> Daviey, the issue wasn't active/active, it's the lack of any type of heartbeating support, so that the rpc layer (quickly) detects failure and migrates to a new server
[16:27] <jsonperl> Patrick, I still got the issue, but I think I'm getting closer
[16:28] <jsonperl> Patrickdk that is
[16:28] <jsonperl> Would a BUNCH of connections in CLOSE_WAIT stop up the tcp pipeline at some point?
[16:40] <zul> jamespage:  still around?
[16:44] <jamespage> zul, yes
[16:44] <zul> jamespage:  one more for you today http://people.canonical.com/~chucks/ca/
[16:45] <jamespage> zul, does that one build against the havana-staging PPA?
[16:45] <zul> jamespage:  just finished building
[16:45] <jamespage> zul, +1 then
[16:45] <zul> jamespage:  thanks
[16:47] <patdk-wk> jsonperl, if that is the case, a couple of issues could be the case
[16:47] <patdk-wk> open file handles?
[16:48] <patdk-wk> or just exhaustion of resources
[16:48] <patdk-wk> maybe look here, it seems to have an ok description of the sysctl's involved
[16:48] <patdk-wk> http://www.ufirsttech.com/content/linux-kernel-settings-related-tcp-connections-68
[16:48] <jsonperl> Awesome thanks
[16:49] <patdk-wk> normally there are several sysctls that need to be adjusted for any kind of high performance server
[16:49] <patdk-wk> especially when handling lots of connections
[16:49] <jsonperl> In this case it's actually a library i use to hit amazon s3
[16:49] <patdk-wk> don't think any of this would cause that single cpu usage issue though
[16:49] <jsonperl> which is the least often used connection i got
[16:49] <jsonperl> I think all of what we were seeing is a RESULT of connectivity issues
[16:50] <jsonperl> no players = no processing
[16:50] <patdk-wk> oh, that page uses proc, I normally do it via sysctl instead
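As patdk-wk notes, the /proc interface and sysctl are two views of the same knobs: every sysctl name maps 1:1 onto a /proc/sys path. A read-only sketch of a few that matter when half-closed connections pile up; which ones to change, and to what, is workload-dependent and not prescribed here:

```shell
# Read current values of some connection-related knobs; persist any changes
# via /etc/sysctl.conf and "sysctl -p" rather than echoing into /proc.
for knob in net/ipv4/tcp_fin_timeout net/ipv4/tcp_max_orphans net/core/somaxconn; do
    printf '%s = %s\n' "$(echo "$knob" | tr / .)" "$(cat "/proc/sys/$knob")"
done
```

tcp_max_orphans in particular bounds the orphaned sockets seen in the `ss` summary earlier; hitting it makes the kernel reset connections outright.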
[16:50] <jsonperl> I think the ethtool command to change stuff maybe reset the stuck connections?
[16:50] <patdk-wk> jsonperl, still :)
[16:50] <jsonperl> making it look fixed
[16:50] <patdk-wk> setup a ping
[16:50] <patdk-wk> see if you start missing, or get delayed pings
[16:51] <patdk-wk> if your running tcpdump on the server at the time too, watching just for icmp
[16:51] <jsonperl> ok, we use pingdom… that sufficient you think?
[16:51] <patdk-wk> you should be able to easily tell
[16:51] <jsonperl> I actually try tcp to the server every minute
[16:51] <patdk-wk> isn't that like once a minute?
[16:51] <jsonperl> yea
[16:51] <jsonperl> You're thinking more often?
[16:51] <patdk-wk> ya, I would go second, and watch delays
[16:51] <patdk-wk> you want to know how long it takes, you know it gets there ,and responds
[16:52] <patdk-wk> you want to know if it gets lost, or delayed
[16:52] <patdk-wk> well, tcp would get lost and retried
[16:52] <patdk-wk> but ping would just get lost
[16:52] <jsonperl> Any service you can recommend? or you just do it from another box
[16:52] <patdk-wk> I normally just do it from my home box
[16:52] <jsonperl> gotcha
[16:52] <patdk-wk> or a work computer
[16:52] <patdk-wk> not like ping uses much traffic
[16:53] <jsonperl> Doesn't feel very enterprisey :D
[16:53] <patdk-wk> now if you want to take it a step more, use mtr :)
[16:53] <patdk-wk> so you can see where the issue actually happens, if it's network related
[16:53] <jsonperl> It's not
[16:53] <jsonperl> this is my boxes
[16:54] <jsonperl> I wish it were somebody elses fault!
[16:54] <patdk-wk> no, if you think the issue was you aren't receiving the players traffic
[16:54] <patdk-wk> that would be network issue :)
[16:54] <patdk-wk> ping would easily show that
[16:54] <jsonperl> But I see the same issue cross machines, cross facilities
[16:54] <jsonperl> different parts of the US
[16:54] <jsonperl> same issue
[16:54] <patdk-wk> not likely then
[16:55] <patdk-wk> I really don't know where to go
[16:55] <patdk-wk> unless I actually get on it and dig around and maybe setup my own stuff to monitor it
[16:55] <jsonperl> I feel like i need to get rid of those orphaned connections
[16:55] <patdk-wk> but not even sure how good I could do that
[16:55] <jsonperl> Want a consulting job? :D
[16:55] <patdk-wk> I have enough of those :)
[16:55] <jsonperl> haha
[16:56] <jsonperl> But we're a super entertaining indie game company
[16:56] <jsonperl> like on the tv :D
[16:56] <jsonperl> So real quickly...
[16:57] <jsonperl> Do you believe it's possible that piling up of CLOSE_WAIT connections eventually can lead to connectivity issues in the tcp stack?
[16:57] <jsonperl> or am I going up the wrong road here
[16:57] <patdk-wk> it can, I doubt you're anywhere near that though
[16:57] <patdk-wk> I doubt you're even >5% of the limit
[16:58] <jsonperl> Does the OS limit per process?
[16:58] <patdk-wk> check ulimit for that
[16:58] <jsonperl> k
[16:58] <patdk-wk> remember, tcp connections are file handles, and count with open files
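Since sockets are file descriptors, the per-process ceiling patdk-wk points at is the open-files rlimit. A quick way to inspect it for the current shell, as a sketch:

```shell
# Each TCP connection consumes one fd, so the open-files rlimit caps
# concurrent connections per process.
ulimit -n                 # soft limit on open fds (sockets included)
ls /proc/$$/fd | wc -l    # fds this shell currently holds
```

For another process, `ls /proc/<pid>/fd | wc -l` against `cat /proc/<pid>/limits` shows how close it is to its own cap.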
[16:59] <jsonperl> So what seems like a clue to me is
[16:59] <jsonperl> Turning everything off with ethtools fixed "the glitch"
[16:59] <jsonperl> Temporarily
[17:00] <jsonperl> No question… went from "very borked" to normal the moment I changed the settings
[17:00] <patdk-wk> what kernel you running on these?
[17:01] <jsonperl> 3.2.0-38-generic-pae #61-Ubuntu SMP Tue Feb 19 12:39:51 UTC 2013 i686 i686 i386 GNU/Linux
[17:01] <patdk-wk> hmm, 32bit
[17:01] <patdk-wk> why not 64?
[17:02] <jsonperl> actually wait… that box is an oddball
[17:02] <jsonperl> the rest are 64
[17:02] <patdk-wk> :)
[17:02] <patdk-wk> using any dkms modules?
[17:02] <patdk-wk> I doubt you are
[17:02] <jsonperl> 32 was to save memory
[17:02] <jsonperl> these are the rest 3.2.0-49-generic #75-Ubuntu SMP Tue Jun 18 17:39:32 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
[17:03] <jsonperl> dkms? I donno what that is
[17:03] <patdk-wk> addon modules for the kernels
[17:03] <jsonperl> ah… can I list em?
[17:03] <jsonperl> it's pretty stock 12.04
[17:03] <patdk-wk> like, vmware drivers, xtables, ....
[17:03] <patdk-wk> nvidia
[17:03] <jsonperl> ah, i doubt
[17:03] <jsonperl> no video
[17:03] <patdk-wk> normally should show up in dpkg -l | grep dkms
[17:03] <jsonperl> no virtualization
[17:03] <jsonperl> nothin
[17:04] <patdk-wk> since I still believe it's a kernel issue
[17:04] <patdk-wk> might be worth giving a 3.8 kernel test on it
[17:04] <patdk-wk> though, on all my servers I haven't hit this issue, but then, I likely wouldn't have noticed either
[17:06] <patdk-wk> I'm using 3.8 on my firewall machines for the newer firewall stuff in it
[17:07] <patdk-wk> to install it, apt-get install linux-generic-lts-raring linux-tools-lts-raring
[17:07] <patdk-wk> then reboot
[17:07] <patdk-wk> you can always uninstall it too
[17:07] <jsonperl> Was just looking into that?
[17:08] <jsonperl> How do you downgrade?
[17:08] <patdk-wk> it's just a new grub kernel option
[17:08] <patdk-wk> just select a different one
[17:08] <patdk-wk> then once it's booted apt-get remove those two
[17:08] <patdk-wk> I had all kinds of dkms issues with it
[17:08] <patdk-wk> cause I needed both vmware and xtables dkms modules
[17:09] <jsonperl> Makes sense… That's why i like to stay 2 steps behind bleeeeeding edge
[17:09] <patdk-wk> I really wanted the bufferbloat stuff in 3.8 though :)
[17:09] <patdk-wk> for the firewall, and firewall needs xtables :)
[17:09] <patdk-wk> all my other machines are normal 64bit 12.04 though
[17:10] <patdk-wk> but I wonder if the issue you're having got fixed in the kernel already
[17:10] <patdk-wk> and there are a LOT of changelogs to read to find that out
[17:10] <patdk-wk> without just testing it
[17:11] <jsonperl> Or testing that it doesn't happen to explode
[17:11] <jsonperl> over a period of days :)
[17:12] <patdk-wk> I guess we could always setup an ice, and test it there :)
[17:12] <jsonperl> ice?
[17:12] <patdk-wk> http://en.wikipedia.org/wiki/In-circuit_emulator
[17:13] <patdk-wk> when you go there, it's not pretty
[17:14] <patdk-wk> I guess these days people would just use a vm
[17:15] <patdk-wk> but oldschool it was using an ice
[17:17] <jsonperl> gotcha… yep that's before my time!
[17:17] <jsonperl> mtr is cool
[17:17] <jsonperl> cept allll my packets are lost on the way to my server
[17:18] <jsonperl> Must be clipping all but the first
[17:21] <jsonperl1> whoops
[17:33] <jsonperl> btw I would be HAPPY to give you access to the box :)
[17:43] <jpds> jsonperl: Sounds like a dreadful idea from a security point-of-view.
[17:44] <jsonperl> haha
[17:44] <jsonperl> Truth
[18:42] <rizzuh> Hey guys. I want to install Redis on a 12.04 Azure Extra Small VM. It has only 768MB of RAM available. How can I find the RAM usage and what steps should I follow to minimize memory usage, so Redis can have the lion's share?
[18:43] <sarnold> rizzuh: measuring memory use is a bit complicated; 'free' will give you a very quick overview of free memory on the system, the -/+ buffers/cache line is probably most important summary of the summary..
[18:43] <sarnold> rizzuh: ps auxw or top (sorted with M), look for the highest RSS numbers, that's what's actually resident in RAM for those programs..
[18:44] <sarnold> rizzuh: but sometimes shared libraries take a pile, the 'smem' tool can help you find out which processes have which shared libraries loaded, and apportions to each of them a certain amount of the fault for the memory used by those shared libraries
[18:47] <rizzuh> sarnold, well ATM top shows 554478k free - if that isn't woefully inaccurate it's pretty good
[18:48] <sarnold> rizzuh: well, "free" is a funny thing. the kernel keeps some memory around, free, to handle spikes of allocations. but it tries to minimize the amount of free memory because free memory is wasted memory. :)
[18:48] <rizzuh> sarnold, ahh, sure, free as in not reserved by an app. If it's full of cache that ain't an issue.
[18:48] <sarnold> rizzuh: that's where the -/+ buffers/cache line comes in -- that includes memory that is currently being used for storing in ram copies of files but _could_ be thrown away under pressure
[18:49] <sarnold> rizzuh: *nod* *nod*
[18:55] <rizzuh> sarnold, that said, 500 MB RAM to use is good, but damn this thing is slow. Good that Redis doesn't need much processing power. It's taking a while to update a few apt packages.
[18:56] <sarnold> rizzuh: at least the amazon micro instances are very heavily penalized in much the same way.. not bad for slight spikes in a mostly-idle environment, but installing a few hundred packages is -painful-
[18:57] <jsonperl1> yea those micros
[18:57] <jsonperl1> i'm fairly sure they arbitrarily throttle you...
[18:57] <rizzuh> sarnold, yeah these are pretty much the same as AWS micro. 5 Mbit network as well, not great.
[18:57] <sarnold> if the azure storage can be moved among instances, it might even make sense to turn it off, attach to a good instance, upgrade, and move back to cheap again.. heh.
[18:57] <sarnold> rizzuh: 5MBit? wow!
[18:58] <rizzuh> The next one is small at $50 a month, with 1.5GB RAM and a dedicated core. Oh and 100 Mbit network or something like that.
[18:59] <rizzuh> But then through BizSpark we pay 33% less. "Pay", as we have $150 credit / dev, with production usage rights, so it's pretty good for the money :P
[19:50] <jsonperl> Patrickdk, so running simulators at a box… I'm able to REALLLLLY pile up on LAST_ACK state connections
[19:50] <jsonperl> Over about 20 minutes, I'm able to get to a count of 450 or so
[19:51] <patdk-wk> nice
[19:51] <jsonperl> Seems odd right?
[19:52] <patdk-wk> something isn't closing the connection correctly
[19:52] <patdk-wk> might just be normal for ios, no idea though
[20:02] <jsonperl> Our server was trying to "close a connection after writing remaining data"
[20:02] <jsonperl> I changed it to just close the connection, seems to fix that at least
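A pile-up like the one described above is easy to watch with a state tally; the lines below are hypothetical stand-ins for real `netstat -ant` output, which you'd pipe through the same awk on the live box:

```shell
# Fake netstat -ant lines (addresses/ports invented for illustration);
# the last field of each line is the TCP connection state.
sample='tcp        0      0 10.0.0.1:4000   10.0.0.9:51000   LAST_ACK
tcp        0      0 10.0.0.1:4000   10.0.0.10:51001  LAST_ACK
tcp        0      0 10.0.0.1:4000   10.0.0.11:51002  ESTABLISHED'
# Count connections per state, most common first.
echo "$sample" | awk '{print $NF}' | sort | uniq -c | sort -rn
```

Run repeatedly (e.g. under `watch`), a steadily growing LAST_ACK count points at one side never sending its final FIN/ACK, which matches the close-after-write behaviour mentioned above.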
[20:34] <jsonperl> sarnold: Ive dumped some dmesg output from blocked processes, but still unclear how to read it
[20:35] <hallyn> jdstrand: would adding AUDIT_WRITE to libvirtd apparmor policy be acceptable?
[20:41] <jdstrand> hallyn: usr.sbin.libvirtd?
[20:46] <hallyn> yes
[20:55] <jdstrand> hallyn: that's fine, libvirtd is not really confined anyway (the VMs it launches are)
[20:55] <jdstrand> hallyn: let me point you at a bug though
[20:55] <jdstrand> hallyn: actually, nm, you should be ok
[20:59] <hallyn> jdstrand: ok, thanks.  (i consider this ultra-low priority)
[20:59] <hallyn> zul: ^ if you happen to be merging libvirt soon-ish, we should toss that in i guess (there is an open bug requesting it)
[21:05] <jsonperl> netstat -s output… does anything here look overly concerning? http://pastebin.com/bnzEFRPh
[21:05] <thumper> hi hallyn
[21:05] <thumper> hallyn: thanks for the comprehensive email
[21:05] <thumper> it has me thinking...
[21:06] <thumper> hallyn: also, lxc-device isn't available in the precise lxc that we are limited to
[21:08] <sarnold> jsonperl: 10878 invalid SYN cookies received
[21:08] <sarnold> jsonperl: that seems steep.
[21:08] <jsonperl> take the system down steep
[21:08] <jsonperl> ?
[21:08] <sarnold> maybe it's normal on the internet now, but .. it'd be worth asking your host if you're under attack..
[21:08] <sarnold> jsonperl: what's this machine -do-?
[21:09] <jsonperl> serves a game via a persistent tcp connection to a bunch of users
[21:09] <jsonperl> at this time only about 50-100 concurrent on that machine
[21:09] <jsonperl> distributed amongst 14 servers on that machine
[21:10] <hallyn> thumper: are you actually limited to the stock precise lxc, or could you use lxc from the ubuntu-lxc ppa for precise?  AFAIUI you're using ppas anyway....  but in any case lxc-device is just a nicety, you do NOT need it :)
[21:11] <thumper> hallyn: possibly not necessarily limited to stock lxc
[21:11] <thumper> but I've not considered extra ppas
[21:11] <thumper> managed to not really need it at this stage
[21:12] <thumper> hallyn: this would be on every machine, and I don't think we install ppas on every machine
[21:12] <hallyn> thumper: well lxc-device itself isn't enough of a reason to switch to ppa i don't think
[21:12]  * thumper nods
[21:12] <thumper> I need to find someone who knows maas
[21:12] <thumper> to work out how to do the "gimmie a nic" thing
[21:12] <hallyn> thumper: is it acceptable to simply start up the container after getting the nic from <whatever hands it to you> ?
[21:12] <thumper> yes, I think we can do that
[21:13] <hallyn> cool, that'll be easiest
[21:13] <thumper> as long as the getting a nic doesn't take too long
[21:13] <thumper> < 10s would be ok I think
[21:13] <thumper> longer than that and we might need to work out something else
[21:13] <thumper> by something else
[21:13] <thumper> just a better work flow
[21:13] <jsonperl> sarnold: Any ideas for further investigation into the invalid syn cookies?
[21:14] <thumper> hallyn: I wish I knew about the "no network conf" bit to use the host
[21:14] <thumper> that would have been a good enough setting by default I think
[21:14] <jsonperl> an attack certainly could explain the very random connectivity issues we've seen
[21:14] <thumper> I need to consider the implications for the local provider
[21:14] <hallyn> thumper: i don't follow.  you mean lxc.network.empty ?
[21:14] <thumper> no, the number 2
[21:15] <thumper> no network entry
[21:15] <sarnold> jsonperl: syn packets tie up kernel memory; syn cookies are one way to try to avoid the worst of the kernel memory use. for some good background information, see http://lwn.net/Articles/277146/
[21:15] <sarnold> jsonperl: /etc/sysctl.conf has a configuration you can set to turn on syn cookies
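The sysctl.conf setting sarnold mentions is the standard Linux key for this (applied with `sudo sysctl -p`, or checked live with `sysctl net.ipv4.tcp_syncookies`):

```
# /etc/sysctl.conf — enable TCP SYN cookies (1 = on)
net.ipv4.tcp_syncookies = 1
```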
[21:15] <thumper> also I need to work out how to have a nice api to our internal providers, and how to handle that config with the containers
[21:16] <jsonperl> ok, thanks for the read
[21:16] <thumper> the brain is busy handling this with a background process :)
[21:16] <thumper> I think I almost have it :)
[21:16] <jsonperl> sarnold: if netstat is reporting invalid syn cookies, doesn't that mean they're on?
[21:17] <sarnold> jsonperl: maybe? :)
[21:26] <jsonperl> sarnold is that the only thing of concern that popped out at ya?
[21:28] <sarnold> jsonperl: the high connection counts made me wonder, but the use makes sense, hehe
[21:28] <jsonperl> Kids jumping in and out of the game
[21:28] <sarnold> sorry nothing just stands out to me ;(
[21:29] <jsonperl> worlds exist on one server on one machine, and they can "teleport" between them
[21:29] <jsonperl> haha ok :)
[21:31] <jsonperl> sarnold: good reading on syncookies thanks
[21:50] <thumper> hallyn: still around?
[21:50] <hallyn> thumper: yup
[21:51] <thumper> hallyn: thinking about number four, where we create a veth pair
[21:51] <thumper> hallyn: if the container hasn't been started, there is no network namespace right?
[21:51] <thumper> or is there?
[21:52] <hallyn> nope.
[21:52] <thumper> also, this "sudo lxc-unshare -s NETWORK -- /bin/bash" seems like it does something interesting I don't quite grok
[21:52] <hallyn> thumper: that's just doing the same thing as creating a container.
[21:52] <hallyn> it starts a task inside a new, private network ns
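What that looks like in practice can be sketched with plain unshare(1), which does the same namespace trick lxc-unshare wraps:

```shell
# Roughly what `lxc-unshare -s NETWORK -- /bin/bash` does: run a command
# in a brand-new network namespace.  Inside it there is only a fresh,
# downed loopback device until a nic is created in or moved into it.
# (-r maps root inside a user namespace so no sudo is needed here.)
unshare -rn ip -o link show
```

The single `lo` line in the output is the point: the new namespace starts empty, which is why a veth pair or a moved-in nic is needed before the container has any connectivity.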
[21:52] <hallyn> as for veth - if MAAS/openstack/ec2 will hand you a nic, then ignore veths
[21:53] <hallyn> lxc.network.type = veth will always create a new veth pair and attach the one end to lxc.network.link.
[21:53] <thumper> well openstack won't
[21:53] <thumper> ah, I was going to ask what the link bit was
[21:54] <thumper> hallyn: can I run my idea past you?
[21:54] <hallyn> so if you *were* going to use veth, which my feeling is you won't, then you would bridge whatever you get <handwaving> from openstack to br0, then say lxc.network.type = veth lxc.network.link=br0
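As a config-file fragment, hallyn's veth-plus-bridge suggestion would look like this in the container's config (bridge name br0 as in his example; these are the precise-era lxc.network.* keys):

```
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up
```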
[21:54] <hallyn> sure
[21:54] <thumper> hallyn: although #juju-dev might be better