[00:00] cwill_at_work... does a system with high run q and low load indicate that it's I/O bound to you?
[00:01] cwillu_at_work, sorry
[00:01] iowait% is i/o bound
[00:01] you have 0% iowait
[00:02] That post was of a healthy system... I was just trying to figure out how to interpret
[00:04] I logged some netstat numbers during the issue... I see some high send and recv Q, this may be something here
[00:06] that would likely be a nic driver issue, I would think
[00:06] do all your machines have the same driver/nic?
[00:07] been a long time since I attempted to diagnose or work on something at that level
[00:08] Yep every machine is identical
[00:08] Most are in LAST_ACK
[00:08] e1000e?
[00:09] but some ESTABLISHED
[00:09] like almost 200k in some queues
[00:09] it might just be the result of the issue, but it might be a cause
[00:09] the card?
[00:09] it's a broadcom hangom
[00:09] not sure what to tell you to figure it out
[00:10] Broadcom 5720
[00:10] That queue size seems pretty unhealthy right?
[00:11] what do you get for ethtool -k eth0
[00:11] I don't have a currently borked system
[00:11] doesn't matter
[00:12] http://pastebin.com/ARwR2W6K
[00:12] but at least till someone has an idea how to diagnose this some more, I can at least throw you some things to see if they have any effect
[00:12] if they do, it's likely the cause, if not, just an effect
[00:13] Yea totally. You've been reallllly helpful
[00:13] This stuff is all fairly new to me
[00:13] give a try: ethtool -K rx off tx off sg off tso off gso off rso off rxvlan off txvlan off eth0
[00:13] maybe again for eth1 if you use it
[00:13] oops
[00:14] ethtool -K eth0 rx off tx off sg off tso off gso off rso off rxvlan off txvlan off
[00:14] I am not sure about the broadcoms, but I know the intel driver has gone back and forth on it working and not working
[00:14] my older intel ones, I had to disable a few of those, to make it work correctly
[00:15] this will cause higher cpu usage
[00:15] I doubt it will be enough for you to notice though
[00:15] so basically turning everything off
[00:15] yep
[00:15] any potential for badness here?
[00:15] besides cpu usage
[00:15] no
[00:15] the checksums just lower cpu usage
[00:16] But in your experience they gunk up the works sometimes?
[00:16] the rest mainly cause the nic and linux to move around 64k of data at a time, instead of one packet at a time
[00:16] gro sometimes, rxvlan on one of mine here at home
[00:16] tso I think I had an issue with on some too
[00:17] this system I am using now needs: ethtool -K eth0 rxvlan off tx off
[00:17] that leaves only gso turned on
[00:18] forget about the tx on it, but it doesn't support rxvlan, but the driver thinks it does
[00:18] It's a gigabit card, its speed is set at 100Mb... likely the network it's on
[00:19] oh, at 100mbit you will never see the increased cpu usage :)
[00:19] I'm wondering if I need to upgrade the network... I've never seen us go beyond maybe 15Mb though
[00:27] Seems like a lot of data in queue in LAST_ACK state indicates a problem with our code no?
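Note: the oversized Recv-Q and Send-Q values mentioned above can be picked out of a socket listing with a small filter. This is only a sketch, assuming `ss -tan` style input (state, Recv-Q, Send-Q in the first three columns). The function name and the 100000-byte default are invented for illustration and do not come from the discussion.

```shell
# Hypothetical helper: print sockets whose Recv-Q or Send-Q exceeds a threshold.
# Expects `ss -tan` style input on stdin: State Recv-Q Send-Q Local Peer
big_queues() {
    threshold=${1:-100000}   # bytes; arbitrary default chosen for illustration
    # Skip the header line, then keep rows where either queue is above the threshold.
    awk -v t="$threshold" 'NR > 1 && ($2 + 0 > t || $3 + 0 > t)'
}

# Usage on a live system (not run here):
#   ss -tan | big_queues 100000
```

Logging this output periodically would show whether the large queues appear before the outage or only during it.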
[00:27] Basically the connection has been severed on their end, but we haven't gotten rid of it
[00:28] TCP: 458 (estab 50, closed 96, orphaned 63, synrecv 0, timewait 1/0), ports 0
[00:28] Lotta orphans
[00:29] dunno what a LAST_ACK is
[00:30] Right before the tcp connection closes
[00:30] oh, that is actually a state
[00:30] I never see those
[00:30] I've got a bunch... perhaps that's an issue
[00:30] na, normally that for me is TIME_WAIT, where the connection was closed, but not properly
[00:31] "The remote end has shut down, and the socket is closed. Waiting for acknowledgement."
[00:31] ya, sounds like you're sending it data, but it's not responding
[00:31] oh
[00:31] is there a funny firewall in the way preventing those packets?
[00:32] just iptables
[00:32] hmm, odd though, never seen them, just the FIN_WAIT TIME_WAIT mainly
[00:34] Here so you have some idea what I'm looking at: http://pastebin.com/9RNzEbb9
[00:34] ips jiggled to protect the innocent :)
[00:36] I wonder if we're just shipping data to an "almost closed" socket, and filling up the tcp queue
[00:42] jsonperl_: the 'slabtop' utility ought to be able to show you if TCP is eating too much of your memory
[00:43] I'll check it out
[00:43] though memory utilization is quite good now
[00:43] (with a little help from my buddy PatrickDK)
[00:49] gotta head home... thanks folks, back later
[00:49] have fun :)
[01:11] back for more!
[01:31] PatrickDK
[01:32] I just had a system flake out… I hit those networking settings live, and it seems to have fixed it?
[01:32] (super super anecdotally)
[01:32] dunno :)
[01:32] so your theory there is that there is a driver issue with the card?
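Note: the offload switches that were turned off earlier can be brought back one at a time to see which one matters. A rough sketch, assuming the interface is eth0 and using the feature keywords from the ethtool -K command given earlier, leaving out "rso", which does not appear to be a standard ethtool keyword. It only prints the commands and does not touch the NIC; the function name is invented here.

```shell
# Hypothetical dry-run helper: print one "re-enable" command per offload feature
# so each can be tried in isolation. Nothing is executed against the NIC.
offload_reenable_cmds() {
    iface=${1:-eth0}   # assumed interface name
    for feature in rx tx sg tso gso rxvlan txvlan; do
        printf 'ethtool -K %s %s on\n' "$iface" "$feature"
    done
}

# Usage (prints the commands; run one of them by hand as root, then watch the box):
#   offload_reenable_cmds eth0
```

Trying a single printed command per machine keeps the experiment easy to undo with the matching "off" setting.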
[01:32] personally, I would put those on like 3 or so, and see
[01:33] well, driver or firmware
[01:33] yea the whole "it didn't explode" thing is a really frustrating way to prove stuff :)
[01:33] more likely driver, but firmware could affect the driver's actions
[01:33] so by turning all of that off, we reduce the load on the card essentially?
[01:33] and let the os take care of stuff
[01:33] well, it puts the card into normal dumb mode basically
[01:34] instead of attempting to limit interrupts, and queue up requests and stuff
[01:34] and offloading some of the work
[01:34] it might be there is some kind of buffer overrun happening on the nic, causing the issue
[01:34] but I'm totally random guessing
[01:35] me2
[01:35] :D
[01:35] but now since that is off, nothing is really getting buffered
[01:35] oh man, if this fixes the problem
[01:35] I have had issues with broadcom drivers before, but not on linux
[01:35] but then, I really have not used broadcom on linux so :)
[01:35] I use what they rent me :)
[01:35] (peer1 / serverbeach)
[01:37] jsonperl: was that an ethtool command that seems to be fixing it?
[01:37] yep
[01:38] ethtool -K eth0 rx off tx off sg off tso off gso off rxvlan off txvlan off
[01:38] 'seems' being the operative word
[01:40] if you're really interested, start knocking one off at a time, till it acts up again :)
[01:41] hahahaha
[01:41] oh man, the fact that that's a reasonable thing to do kinda makes me ill :)
[01:42] :)
[01:43] * Patrickdk bets on the tso or gso
[01:44] im gonna turn everything off on all machines
[01:44] could be tx, but normally not
[01:44] then i'll pull those on one of them
[01:44] so do tso, gso, and tx in that order huh :)
[01:44] or, pull a different one per machine? :)
[01:44] yeah, I'm also suspicious of tso and gso
[01:44] ahahha
[01:44] and it feels like 'sg' would be nice to have back
[01:44] I have no idea what sg is, never bothered by it before :)
[01:45] (at least I assume it means Scatter/Gather)
[01:45] it does
[01:46] oh man, im excited
[01:46] * Patrickdk locates a bed
[01:46] I MAY BE ABLE TO SLEEP
[01:46] g'night :)
[01:46] cya Patrick, thanks again
[01:52] all right, all machines updated
[01:52] now I wait :)
[01:52] sarnold/Patrickdk, it makes sense those settings kick in live right?
[01:52] no networking restart or anything
[01:53] jsonperl: right
[01:53] good… because if it didn't that would disprove that it fixed it ;)
=== jtv2 is now known as jtv
=== smb` is now known as smb
[07:27] yolanda, https://code.launchpad.net/~james-page/glance/sqlalchemy-bump/+merge/176613 if you are around :-)
[07:27] zul, ^^
[07:27] morning
[07:27] I'm gonna review all packages today
[07:27] great
[07:28] yolanda, morning!
[07:28] jamespage, bad news, since this branch is on ubuntu-server-dev, i don't have permissions
[07:30] yolanda, just need a review
[07:30] not a merge
[07:30] I'll do that myself
[07:30] jamespage, assign me as a reviewer
[07:30] otherwise i can't
[07:31] i don't have the permissions to "Request review"
[07:31] yolanda, dog
[07:31] doh rather
[07:31] yolanda, done
[07:33] ok, reviewed, i cannot change the main status anyway
[07:34] ack
[07:34] thanks
=== Ursinha-afk is now known as Ursinha
=== racedo` is now known as racedo
=== tim___ is now known as vorpalbunny
=== LordOfTime is now known as LordOfTime|EC2
=== tedski- is now known as tedski
=== vorpalbunny is now known as thumper
=== Tribaal_ is now known as Tribaal
=== thumper is now known as thumper-afk
=== thumper-afk is now known as thumper
=== psivaa_ is now known as psivaa
[10:25] how to check if ssh server is running?
[10:26] ThothCastel: service ssh status
[10:26] ps -ef | grep ssh
[10:30] greppy: mardraum: thanks, it's running, however I am unable to connect to it via ssh :S
[10:31] what exactly happens? use pastebin if you must
[10:36] zul, when you start review needed please - https://code.launchpad.net/~james-page/neutron/fixup-h2/+merge/176650
[10:43] greppy, "sshd"
[11:05] zul, you might wanna take a look at the python-greenlet upload you did yesterday
[11:05] it blasted all of the python3 work that you did in the previous two ubuntu versions
[11:06] (which is why it's blocked in proposed right now)
[11:35] jamespage: fuuuuuu
[11:38] zul: ?
[12:14] hello, I can upgrade my kernel on Ubuntu Server 12.04, but when I reboot, the server doesn't boot and hangs, it's KVM virtualisation
[12:18] zul, hey - I also uploaded trivial fixes for keystone and glance autopkgtest failures
[12:19] I'm stuffing them into havana staging as well
[12:19] jamespage: ack
[12:21] streulma, anything on the console?
[12:21] there is at the moment a problem with console, the isp upgraded to new version of OnApp
[12:22] but before I had the problem
[12:22] it boots the kernel
[12:22] and then hangs after keyboard...
[12:22] before the services load
=== smb` is now known as smb
[13:40] jamespage: http://people.canonical.com/~chucks/ca/
[13:42] zul, ceilometer?
[13:43] jamespage: yep
[13:56] zul, why does simplejson need " - Build for python 3.2 as well."
[13:56] I know precise has python 3.2
[13:57] but can't a generic fix be applied in saucy which makes it a no-change backport again?
[13:58] jamespage: because it explicitly depends on python 3.3
[13:58] zul, +1 for msgpack-python
[13:59] jamespage: python3-all-dev (>= 3.3.0-3) in the debian/control
[14:01] zul, ack
[14:01] reviewing now
[14:01] jamespage: ill fix the saucy version
[14:01] zul, does it work with python3.2
[14:01] jamespage: yeah
[14:01] just wondering if that's why the min-versions are specced
[14:02] nothing in the changelog
[14:03] zul, nope
[14:04] and it looks OK - maybe poke piotr in #debian-python on OFTC and see if there are any gotchas
[14:04] jamespage: nope im not uploading it, i just noticed a bug
[14:04] zul, do we really need the new webtest?
[14:04] is 1.3.3 -> 1.3.4
[14:04] its rather
[14:04] jamespage: im not sure, nack it please
[14:05] jamespage: chuck@homer:~/pbuilder/precise_result$ dpkg -c python3-simplejson_3.3.0-2ubuntu1~cloud0_amd64.deb
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/doc/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/doc/python3-simplejson/
[14:05] -rw-r--r-- root/root 3160 2013-07-24 09:06 ./usr/share/doc/python3-simplejson/changelog.Debian.gz
[14:05] zul, -1
[14:05] -rw-r--r-- root/root 1645 2011-02-15 15:56 ./usr/share/doc/python3-simplejson/copyright
[14:05] chuck@homer:~/pbuilder/precise_result$ dpkg -c python-simplejson_3.3.0-2ubuntu1~cloud0_amd64.deb
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/
[14:05] \o/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/doc/
[14:05] drwxr-xr-x root/root 0 2013-07-24 09:14 ./usr/share/doc/python-simplejson/
[14:05] -rw-r--r-- root/root 7062 2013-05-01 16:01 ./usr/share/doc/python-simplejson/index.rst.gz
[14:05] -rw-r--r-- root/root 3160 2013-07-24 09:06 ./usr/share/doc/python-simplejson/changelog.Debian.gz
[14:05] nice
[14:05] -rw-r--r-- root/root 1645 2011-02-15 15:56 ./usr/share/doc/python-simplejson/copyright
[14:05] shit!
[14:05] its zul, so I'll let it slide... this time ;)
[14:06] * jamespage drowns in irc
[14:06] jamespage: tests are not enabled in that package either
=== mahmoh1 is now known as mahmoh
[14:19] hi everyone
[14:19] i need some help with ldap integration with packetfence
[14:19] anyone has any idea how to go about doing this?
[14:40] if I connect to an openvpn server in the office... should it not tunnel all my internet connection through it?
[14:40] I have the same IP as before...
[14:42] Depends on how you have it configured.
[14:43] rbasak, it was configured by my predecessor - where can I check?
[14:44] I don't recall, sorry. Check the docs for mentions of your default gateway. I think it's a client-side setting, but you can also configure the client to accept the server's settings and then configure it on the server (IIRC).
[14:45] Or may default route, rather than default gateway.
[14:45] maybe
[14:46] lots of mention of bridging...
[14:47] Trivial question: how do I upgrade a kernel module that is in use? By in use it is module for raid controller but I am booting using a live CD
[14:47] Monotoko: check your routing, is default route via VPN or your ISP?
[14:48] command "ip r sh"
[14:49] usually, server pushed routes to the client, but client can overwrite it or do some other tricks w/out getting server involved
[14:49] pushed=pushes
[14:50] oozbooz, http://pastebin.com/65vcRqk7
[14:50] I tried to remove the comment in the config here: ;push "redirect-gateway def1 bypass-dhcp"
[14:51] however then the client wouldn't load anything
[14:51] I assume 5.10.152.225 is your ISP GW
[14:51] then your internet traffic should go over it
[14:51] yeah, we have a /29 I believe
[14:51] when I'm connected from outside the office
[14:51] I want it to still use the office IP
[14:52] use office IP for ... ?
[14:52] you mean send ALL your traffic via the tunnel?
[14:52] yeah - it's static - a lot of people who work here work from homes etc, with dynamic IPs
[14:53] I'd rather they all used our network to make it easier to firewall the servers and not keep punching random holes in the FW
[14:55] I don't get your last statement ...
[14:55] usually, you want to send only relevant traffic to your office via the tunnel,
[14:55] rest of the stuff, they should use their ISP
[14:56] why would you want them to download youtube videos using office bandwidth
[14:56] I'd say it depends. Road warriors might prefer everything to go via the office if they don't trust the connections they're using (coffee shops, hotels, etc)
[14:56] oozbooz, we have a "cloud" provider off site that I need to give developers access to, and certain things that they can log into through the web browser but only from this IP
[14:57] aha
[14:57] 3rd party mess..
[14:58] aye - obviously I need a static IP I can trust for that, so I'd rather tunnel everyone through our office network
[14:58] well... you can create a new route so that only traffic for cloud provider goes via the tunnel
[15:00] hmm, what route would I be adding for that? route add 1.2.3.4 gw 5.10.152.227 eth0 ?
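Note: one way to express the "only the cloud provider goes via the tunnel" idea is a host route pointed at the tunnel device rather than at eth0. This is a sketch only: 1.2.3.4 is the placeholder address from the question, tun0 is an assumed device name (the output of "ip r sh" shows the real one), and the invented helper just prints the command instead of changing the routing table.

```shell
# Hypothetical helper: print the iproute2 command that would route a single
# host through the VPN tunnel device. The printed command needs root to apply.
tunnel_route_cmd() {
    host=$1
    dev=${2:-tun0}   # assumed tunnel device name
    printf 'ip route add %s/32 dev %s\n' "$host" "$dev"
}

# Usage (prints: ip route add 1.2.3.4/32 dev tun0):
#   tunnel_route_cmd 1.2.3.4
```

With OpenVPN the same effect is normally configured rather than added by hand, for example a `route 1.2.3.4 255.255.255.255` line in the client config or `push "route 1.2.3.4 255.255.255.255"` on the server; which of the two fits depends on the existing setup.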
[15:01] but, if you decide to divert all traffic, you will have to change routing rules on the server, that will be pushed to the client
[15:01] which VPN server do you use
[15:01] openvpn
[15:01] jamespage: simplejson fixed locally ill upload to the regular archive and get it for the cloud archive as well
[15:02] openvpn or openvpn-AS?
[15:02] regular openvpn AFAIK
[15:02] yeah
[15:03] just checked with dpkg
[15:03] ok, first my advice is to upgrade to openvpn-AS - much easier to manage
[15:04] there is IRC channel "openvpn", you should confirm with them... but it should not be difficult
[15:05] cheers oozbooz
[15:06] have fun
[15:07] jamespage: http://people.canonical.com/~chucks/ca/
=== pleia2_ is now known as pleia2
[15:14] smb: ping i was wondering if you could offer some insight on it https://launchpadlibrarian.net/145685953/buildlog_ubuntu-precise-amd64.xen_4.2.2-1ubuntu1~cloud0_FAILEDTOBUILD.txt.gz
[15:15] zul, maybe, let me read
[15:16] smb: this is on precise
[15:17] zul, Looks like the known problem of passing LDFLAGS in gcc format -Wl but don't we work around that
[15:18] smb: yeah seems to ignore that for some reason
[15:18] And why do you compile xen 4.2.2 on Precise?
[15:18] :-P
[15:18] Still have not cleared that MRE
[15:19] Actually I would not aim 4.2.2 immediately but 4.1.5... or .6 but anyway
[15:20] zul, +!
[15:20] +!
[15:20] +1 rather
[15:20] jamespage: cool thanks
[15:21] zul, "LDFLAGS = $(shell dpkg-buildflags --get LDFLAGS|sed -e 's/-Wl,//g')" in debian/rules?
[15:21] zul, https://code.launchpad.net/~james-page/neutron/fixup-rootwrap-conf/+merge/176708
[15:21] I'm more the nginx kind of guy so what did I miss here? Installed apache, changed port (so that it won't conflict with nginx), getting this netstat "tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 0 185527477 18552/apache2" but it just reacts to local requests. There is no iptables rule... Any ideas, I'm desperate :(
[15:26] zul, Just out of curiosity is that the 4.2.2 version from current Saucy?
[15:26] yeah
[15:27] zul, Hm, so it has that line... but for some reason I vaguely remember something going wrong with something like this (but I believe that was another package)
[15:29] zul, Oh wait maybe because in P LDFLAGS is exported by the build system...
[15:29] hmm...interesting ill try it out
[15:30] zul, Is that LDFLAGS := instead of LDFLAGS =
=== JonnyNomad_ is now known as JonnyNomad
[15:39] zul, Oh I think I can imagine what is going on: we do not set LDFLAGS at all by default in newer releases. So when compiling in S I did not notice none of them being used and setting LDFLAGS in debian/rules being useless
[15:39] But in P when they are set by default it fails...
[15:39] smb: so disable it?
[15:40] zul, I'd probably try either an export in debian/rules or move the definition into debian/rules.real for a moment
=== Catbuntu is now known as LexieGrey
[15:40] smb: ok ill try that
[15:41] zul, And I need to make sure I really use those flags in the Xen 4.3 I am preparing
[15:41] for S that is
[15:43] smb: when are you doing 4.3?
[15:43] zul, I am just about to think I got all pieces together. Testing it on my boxes
[15:44] smb: ok cool
[15:45] zul, a user asked me if there will be any quantum-> neutron renaming in raring or earlier (and similarly, anything before havana)
[15:45] my answer was "NO, but I'll check with zul"
[15:46] med: no quantum in raring was quantum
[15:46] nod.
[15:47] * med_ was pretty sure it was only a cease and desist not a "go undo the world"
[15:59] Madkiss: howdy! have you looked into packaging dlm?
[16:00] smb: nope neither worked
[16:01] zul, Hm, ok need to figure out how to modify it correctly for the actual compile. Seems the more recent releases just don't use any
[16:02] I mean it does not get passed in and fails because where we change it somehow does not replace the default of the system
[16:04] zul, Doing the export did break the build in the same way on S though... So maybe := is the second missing piece
[16:13] zul, having LDFLAGS= and export LDFLAGS both in rules.real seems to make the compile run longer (not finished yet)
[16:14] smb: can i see a snippet of your rules.real please?
[16:17] rbasak: BTW, merges.py won't work right now - until egress firewall is more relaxed. Have raised RT
[16:20] Daviey: OK, thanks.
[16:20] roaksoax: Hey, does Openstack / Kombu support Rabbit Active/Active in Havana?
[16:20] I'll try and keep people.canonical.com/~rbasak/delta.py updated in the mean time, though note that I'm doing it manually.
[16:22] Daviey: I haven't checked yet, sorry! I'm doing the whole upgrade process of the clustering tools, which is not as easy as syncing packages from debian
[16:24] Daviey, the issue wasn't active/active it's the lack of any type of heartbeating support, so that the rpc layer (quickly) detects failure and migrates to a new server
[16:27] Patrick, I still got the issue, but I think I'm getting closer
[16:28] Patrickdk that is
[16:28] Would a BUNCH of connections in CLOSE_WAIT stop up the tcp pipeline at some point?
[16:40] jamespage: still around?
[16:44] zul, yes
[16:44] jamespage: one more for you today http://people.canonical.com/~chucks/ca/
[16:45] zul, does that one build against the havana-staging PPA?
[16:45] jamespage: just finished building
[16:45] zul, +1 then
[16:45] jamespage: thanks
[16:47] jsonperl, if that is the case, a couple of issues could be the case
[16:47] open file handles?
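Note: both questions just raised, how many sockets sit in CLOSE_WAIT and whether open file handles are running out, can be checked with a few lines of shell. A minimal sketch, assuming Linux with /proc and `ss -tan` style input; the function names and the pid 1234 in the usage lines are invented for illustration. `ss` spells the state CLOSE-WAIT, while netstat prints CLOSE_WAIT.

```shell
# Hypothetical helpers; names are invented for this sketch.

# Tally sockets per TCP state from `ss -tan` style input on stdin.
tcp_state_counts() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Count open file descriptors of a process via /proc (Linux only).
fd_count() {
    ls "/proc/$1/fd" | wc -l | tr -d ' '
}

# Soft "Max open files" limit of a process, read from /proc/<pid>/limits.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Usage on a live system (not run here):
#   ss -tan | tcp_state_counts
#   echo "$(fd_count 1234) of $(fd_soft_limit 1234) descriptors in use"
```

Comparing the descriptor count against the soft limit over time would show whether piled-up connections ever come close to exhausting the process.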
[16:48] or just exhaustion of resources
[16:48] maybe look here, it seems to have an ok description of the sysctl's involved
[16:48] http://www.ufirsttech.com/content/linux-kernel-settings-related-tcp-connections-68
[16:48] Awesome thanks
[16:49] normally there are several sysctls that need to be adjusted for any kind of high performance server
[16:49] especially when handling lots of connections
[16:49] In this case it's actually a library i use to hit amazon s3
[16:49] don't think any of this would cause that single cpu usage issue though
[16:49] which is the least often used connection i got
[16:49] I think all of what we were seeing is a RESULT of connectivity issues
[16:50] no players = no processing
[16:50] oh, that page uses proc, I normally do it via sysctl instead
[16:50] I think the ethtool command to change stuff maybe reset the stuck connections?
[16:50] jsonperl, still :)
[16:50] making it look fixed
[16:50] setup a ping
[16:50] see if you start missing, or get delayed pings
[16:51] if you're running tcpdump on the server at the time too, watching just for icmp
[16:51] ok, we use pingdom… that sufficient you think?
[16:51] you should be able to easily tell
[16:51] I actually try tcp to the server every minute
[16:51] isn't that like once a minute?
[16:51] yea
[16:51] You're thinking more often?
[16:51] ya, I would go second, and watch delays
[16:51] you want to know how long it takes, you know it gets there, and responds
[16:52] you want to know if it gets lost, or delayed
[16:52] well, tcp would get lost and retried
[16:52] but ping would just get lost
[16:52] Any service you can recommend? or you just do it from another box
[16:52] I normally just do it from my home box
[16:52] gotcha
[16:52] or a work computer
[16:52] not like ping uses much traffic
[16:53] Doesn't feel very enterprisey :D
[16:53] now if you want to take it a step more, use mtr :)
[16:53] so you can see where the issue actually happens, if it's network related
[16:53] It's not
[16:53] this is my boxes
[16:54] I wish it were somebody else's fault!
[16:54] no, if you think the issue was you aren't receiving the players traffic
[16:54] that would be network issue :)
[16:54] ping would easily show that
[16:54] But I see the same issue cross machines, cross facilities
[16:54] different parts of the US
[16:54] same issue
[16:54] not likely then
[16:55] I really don't know where to go
[16:55] unless I actually get on it and dig around and maybe setup my own stuff to monitor it
[16:55] I feel like i need to get rid of those orphaned connections
[16:55] but not even sure how good I could do that
[16:55] Want a consulting job? :D
[16:55] I have enough of those :)
[16:55] haha
[16:56] But we're a super entertaining indie game company
[16:56] like on the tv :D
[16:56] So real quickly...
[16:57] Do you believe it's possible that piling up of CLOSE_WAIT connections eventually can lead to connectivity issues in the tcp stack?
[16:57] or am I going up the wrong road here
[16:57] it can, I doubt you're anywhere near that though
[16:57] I doubt you're even >5% of the limit
[16:58] Does the OS limit per process?
[16:58] check ulimit for that
[16:58] k
[16:58] remember, tcp connections are file handles, and count with open files
[16:59] So what seems like a clue to me is
[16:59] Turning everything off with ethtools fixed "the glitch"
[16:59] Temporarily
[17:00] No question… went from "very borked" to normal the moment I changed the settings
[17:00]