[00:07] who wants to help a noob debug a problem server from sysstat information :)
[00:08] !ask
[00:08] Please don't ask to ask a question, simply ask the question (all on ONE line and in the channel, so that others can read and follow it easily). If anyone knows the answer they will most likely reply. :-) See also !patience
[00:10] Patrickdk whats up!
[00:10] I'm back with data
[00:11] So I have a problem where my machine (many servers) crawls to a stop and stops functioning. CPU usage drops precipitously
[00:12] I do happen to see a spike in runq-sz during the same time, but thats the only interesting thing i've noticed
[00:12] !ask
[00:12] Please don't ask to ask a question, simply ask the question (all on ONE line and in the channel, so that others can read and follow it easily). If anyone knows the answer they will most likely reply. :-) See also !patience
[00:20] sysstat data can be found at http://pastebin.com/bgPJxqJ3
[00:20] the problem starts occuring after19:17:00
[00:24] defently not a swap issue
[00:24] no disk issue, no memory issue
[00:25] you will need to use strace to find the real issue
[00:25] it's a simple programming issue
[00:25] I would bet your hitting a mutex lock in the kernel
[00:26] likely to do when your freeing a bunch of memory at once
[00:26] maybe try using jemalloc?
[00:28] I would think that would show up under %sys cpu usage though
[00:28] maybe you just have a funny workload though, like packets from the network stop flowing for a second or two, so the cpu load drops off
[00:29] but if it's cause at that point in time, you killed a process, I would try jemalloc and see how it affects it
[00:29] won't hurt to try
[00:30] we do free a lot of memory when a "world" shuts down
[00:31] and when a single server reboots, it frees several "worlds" at the same time
[00:31] I'm not sure how jemalloc deals with freeing memory, it's more made to deal with fragmented small allocations and to be fast
[00:32] so it might help, cause it probably frees a large chunk, instead of the many smaller chunks your current malloc does
[00:32] hmm
[00:32] i can def look into integrating that
[00:32] no real intergration needed
[00:32] just install it, and set it
[00:32] how would i take advantage of it?
[00:32] google :)
[00:33] lots of people use it with different programs, like java, mysql, ...
[00:33] looking now
[00:34] what gives you that impression from the data btw
[00:34] nothing
[00:34] ok
[00:34] only that you said, you had a memory leak
[00:34] and you kill the program
[00:34] since nothing else looks to be an issue, defently not swap/disk
[00:35] so based on, you know how your thing works, is the only clue I'm going by
[00:35] okee
[00:35] I'm more used to having issues allocating, memory, than freeing it though
[00:35] the spiked runq-sz is unconcerning?
[00:36] well it restarts and everybody reconnects
[00:36] where is that?
[00:36] so it's a bit of both
[00:37] take a look starting here: 07:17:04 PM 17 462 2.74 4.45 4.64 0
[00:37] I don't know where here is
[00:37] what pastbin line
[00:37] It deallocates a lot of memory, and then quickly allocates a bunch of memory
[00:38] line 519, sorry
[00:38] ya, that is an issue
[00:38] so is that my guy
[00:38] 10-20 programs want to run, but they can't
[00:38] sounds like my problem
[00:39] why is loadavg-1 still at 1 though
[00:39] im learning all this as I go, i'll go look that up
[00:42] hmm, your using threads
[00:42] whats a typical cause for a high runq, or is it way too broad a topic?
[00:42] that is probably why it shows up that way
[00:42] probably a locking issue inside your program then
[00:42] we're built upon eventmachine... it defaults to a threadpool of 20 threads
[00:43] its strange that one process would bork the whole system though
[00:43] it's not
[00:43] think of it this way
[00:43] one thread does something, but it says, nothing else can do anything till it's done
[00:43] then cpu, and everything else will go low
[00:43] till it says, everything else can start again
[00:43] that is how mutex locks work
[00:44] but outside of the process?
[00:44] you need them, so your threads don't keep writing over each other
[00:44] who said this was outside?
[00:44] sure...
we def have some mutex locking goin on
[00:44] i have 14 processesing running
[00:44] and each on a single core at that (ruby)
[00:45] hmm, only see 5 normally wanting to run I guess
[00:45] I just am not all that sure how linux reports threads vs processes all the time, as I normally don't care
[00:46] well, it's over me
[00:46] me 2 :)
[00:46] I would still have to say a mutex lock, just would be more a kernel one then
[00:46] i have a lot of angry players :D
[00:46] ok, well thanks for the help Patrick
[00:46] strace should show exactly what is going on when it happens
[00:46] but still, if it's when a restart happens, and likely a memory issue
[00:46] give jemalloc a try
[00:47] even if it doesn't fix it, it might help otherwise
[00:47] that basically overrides the default malloc?
[00:47] oh, what is your tcp window size set to?
[00:47] yes
[00:47] I don't think it is closing all those tcp sessions
[00:48] looks like that happens fast
[00:48] see lines 935 to 937?
[00:48] do you have that for the earlier timeframe?
[00:49] oh I found it
[00:50] window_scaling?
[00:50] yep, whats up with those lines
[00:51] need me to pull any other data?
[00:51] well, people are saying those numbers are better
[00:51] but it just looks inversed
[00:51] <1 process running
[00:51] and context switchs dropping
[00:51] so expected
[00:51] showing same issue, in reverse
[00:52] oh wait
[00:52] maybe not
[00:53] what is your entropy?
[00:53] tcp settings?
[00:54] im not sure what you're asking
[00:54] do this
[00:54] watch -n 1 cat /proc/sys/kernel/random/entropy_avail
[00:54] and see what it shows
[00:55] and see what it says when your cpu goes to 0 again
[00:55] ha, may be a while
[00:55] what does it normally show?
[00:55] it's really sporadic
[00:56] 130-185
[00:56] that isn't good
[00:56] you want to keep it >2000
[00:56] what am i looking at
[00:56] how many random bits of info linux has ready to use
[00:56] it uses it for everything
[00:56] so if it drops too low, things can get held up
[00:57] like, forking/starting a program
[00:57] making a tcp connection
[00:57] hmm
[00:58] generally >500 is good, but I perfer some safety
[00:58] and around 100 is the lowest it allows it to get
[00:58] so i'm looking at another server i have with basically nothing running on it and it's solidly in the 140s
[00:59] well, if nothing is happening, it might not be causing enough random events to fill it
[00:59] the worst things for it, is vm's
[00:59] they never fill the entropy pool
[00:59] there is a TON happening on that other server
=== cmagina is now known as cmagina-away
[00:59] Like all processors working at at least 50%
[01:00] nothing running != 50% cpu load
[01:00] no the original one we were talking about
[01:00] The running game server
[01:00] with hundreds of players
[01:00] ya, the game server is low?
[01:00] correct
[01:01] quick fix to see if that is the issue
[01:01] apt-get install rng-tools
[01:01] and edit /etc/default/rng-tools to something like: http://pastebin.com/PLD39DN5
[01:02] that is *not* recommended, but it will keep the pool full, so you can tell if that corrects the issue or not
[01:03] those are the only two things I can say
[01:03] keep the entropy pool fuller, and try out jemalloc
[01:04] ok
[01:04] so you're no longer concerned about tcp issues?
[01:04] we do drop, and reconnect a bunch of connections
[01:04] a bunch = up to 50 at a time
[01:04] Is anyone familiar with adding drivers to CUPS samba shared printers? I seem to be having issues that I just can't get around, and I am thinking it is due to the drivers I am using, but I don't know of any suitable ones.
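
Editorial sketch: the entropy check suggested at 00:54 (`watch -n 1 cat /proc/sys/kernel/random/entropy_avail`) needs someone watching at the moment the CPU drops. A small logger that timestamps each reading might make it easier to line the values up with the sysstat data afterwards. The function names, the sample count and the labels are assumptions made for illustration; only the file path and the rough thresholds (above 500 generally fine, above 2000 preferred) come from the conversation.

```python
from pathlib import Path
import time

# Path quoted in the log at 00:54.
ENTROPY_PATH = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's entropy estimate as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def classify(bits, low=500, comfortable=2000):
    """Label a reading with the rough thresholds mentioned in the log.

    The labels themselves are hypothetical, chosen only for this sketch.
    """
    if bits is None:
        return "unknown"
    if bits < low:
        return "low"
    if bits < comfortable:
        return "ok"
    return "comfortable"


def monitor(samples=5, interval=1.0, path=ENTROPY_PATH):
    """Print one timestamped reading per interval so it can be matched against sar output."""
    for _ in range(samples):
        bits = read_entropy(path)
        print(time.strftime("%H:%M:%S"), bits, classify(bits))
        time.sleep(interval)
```

Calling `monitor(samples=3600)` would log roughly an hour of one-second readings; redirecting the output to a file keeps it around for comparison with the 19:17 timeframe discussed earlier.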
[01:06] I was following this: https://wiki.samba.org/index.php/Samba_as_a_print_server#Uploading_printer_drivers_for_Point.27n.27Print_driver_installation and that is where I am stuck at. I have the "Add" button, but, no matter what driver I choose, PS, PCL5, or PCL6 it fails.
[01:10] Patrickdk reading about jemalloc now, thanks for the suggestion
[01:28] Patrickdk libjemalloc1 ?
[05:51] cheers
[05:51] am I correct to assume that a direct upgrade from 10.04 to 12.04 is supported?
[05:54] aw. screw it let's just do it.
=== smb` is now known as smb
[07:52] /quit
[07:56] how to add domain mydomain.com search mydomain.com to resolv.conf in 12.04?
[07:57] saban, i put "dns-search mydomain.com" in /etc/network/interfaces
[08:22] Hi, I'm moving over a dreadful windows server setup to a more efficient linux-based machine. I have been able to move over everything but I'm having a slight issue on the gateway-part (I'm not very skilled in the networking-side of system administration). The new server is supposed to do DHCP, basic firewalling, on-the-fly virusscanning (if possible and efficient) and most prominently bandwidth throttling. For dhcp I can use dhcp3, to block some ports
[08:22] I can use iptables/netfilter and for the virusscanning part I should be fine with clamav, if my research is right. The only thing I'm having a hard time with is the bandwidth throttling. Can someone give me a few pointers or names to search for? Basically I want to limit the amount one MAC-address/IP can use to 50Kbps downstream, except for a select group of vital machines.
[08:51] hello
[08:51] a question ?
[08:52] i got list of server name on file sometime the same server name appear 2 times sometime appear only one time : i m looking for a command that can filter and count the number of the server in one time
[08:53] #ubuntu
[09:01] etyuio, what about sort |uniq|wc -l ?
[09:01] hi.
i have problem with my bind server here is the log http://pastebin.com/0WfxbHSR
[09:01] its a fresh install
[09:02] /etc/bind/named.conf: permission denied who is trying to start bind ?
[09:04] that work also melmoth sort | uniq
[09:04] wc -l what'doing ?
[09:05] counting the line
[09:05] unfortunately it not counting properly
[09:05] what is the difference between nl and wc -l ?
[09:06] i dont know nl
[09:06] sort file |uniq|wc -l and sort file |uniq|nl not having the same result
[09:07] normal or not ?
[09:07] yeah, nl is outputing the file content as well, wc -l just counting
[09:07] why i don't get the same result for both command ?
[09:08] im trying to run bind as root
[09:09] root@mail:/etc/bind# /etc/init.d/bind9 start
[09:09] then may be wrong permission on the file, or may be an apparmor thingy (wich i know nothing about)
[09:09] but the problem is about permission reading this file
[09:10] well its a fresh install. and first thing is i remove apparmor like i always do. i checked permissions on files and everything looks ok http://pastebin.com/YagmBZHj
[09:10] i got 3 different result
[09:10] uniq file | nl & sort file |uniq|nl & sort file |uniq|wc -l
[09:10] which one is correct ?
[09:16] etyuio: wc -l counts the newline character, while nl counts the number of lines in the file, I think
[09:17] but I think this is more of a question for #bash
[09:18] etyuio, you dont wanna use uniq on a unsorted list
[09:18] (if i unerstand correclty)
[09:18] ola melmoth
[09:18] *hola
[09:18] hola senior koolhead17
[09:24] yolanda, feedback on squid3 merge in MP
[09:24] ok
[09:40] jamespage, about the patches, i already did what you suggested
[09:40] but i keep finding the same problem
[09:57] jamespage: would you like me to look at removing dh_strip from bug 1200255 or do you want to work on that?
[09:57] Launchpad bug 1200255 in golang "go get ...
fails with SIGILL on armhf" [Undecided,Confirmed] https://launchpad.net/bugs/1200255
[09:57] rbasak, yes please - that would be helpful
[09:57] rbasak, reviewing zuls quantum->neutron rename
[09:57] OK
[09:57] blimey
[10:24] zul, whats the current plan re sqlalchemy?
[10:25] TeTeT: hello there
[10:26] koolhead17: hey koolhead, how's life in India?
[10:27] TeTeT: not bad at all. You tell me how is Germany treating you :D
[10:27] jamespage: hey there, what magic zul is planning with sqlalchemy :)
[10:28] koolhead17, no idea - but we need a plan for saucy
[10:28] everything is broken right now
[10:29] koolhead17: ready for vacation in August :)
[10:29] jamespage: I would love to see things in place in cloud archive 4 precise. We already had disucussion in mailinglist about basic install guide release dates and all
[10:30] koolhead17, Ca for havana might be OK-ish right now
[10:30] it still has the old sqlalchemy
[10:31] new one is stuck in proposed for saucy right now
[10:31] cool.
=== huats_ is now known as huats
[10:48] yolanda, how did you create the branch for the merge? still trying to figure out your patches issue
[10:49] jamespage, i merged from lp:ubuntu/squid3
[10:49] merged the debian one
[10:49] commited and pushed
[10:50] yolanda, meh - its gone away now
[10:50] jamespage, i retried again
[10:51] https://code.launchpad.net/~yolanda.robla/ubuntu/saucy/squid3/debian_merge/+merge/174968
[10:52] jamespage, but i'm not sure now about the diff output. It shows a diff with .quilt_patches, for example, and i see the same in my branch and in ubuntu/squid3
[10:53] yolanda, yeah - ignore that for the time being
[10:54] take a look at the new MP then, i also explained the changelog with more detail
[10:54] jamespage, so when i push a branch, it should be always pushed with the patches applied?
[10:54] yes
[10:54] ok, maybe was that then
[10:55] yolanda, hmm - can I suggest that you use the previous merge changelog in full
[10:55] jamespage, what do you mean?
[10:56] yolanda, 3.1.20-1ubuntu1 contains a full list of all delta between Debian/Ubuntu
[10:56] your current merge proposal does
[10:56] not
[10:56] ah, i paste the same contents?
[10:56] when I do a merge - I start with the previous merge contents and check it off
[10:56] to see if its still needed or not
[10:56] oh ok
[10:57] I also look at the subsequent changes in Ubuntu and detail those as well if still applicable
[10:57] so its really a copy/paste exercise in the changelog entry
[10:57] i paste them and i add my autopkg delta
[10:57] great to know
[10:57] yolanda, it makes it alot easier for a reviewer that way as well
[10:59] ok, i'll do it like that
[11:00] do you know about that entry? Add transitional dummy packages
[11:01] it doesn't apply i think, but what packages are these?
[11:02] yolanda, they where the squid and squid-common packages at the bottom of the ubuntu control file
[11:02] ok, the dropped ones
[11:02] they can be dropped now as the migration happened in 12.04
[11:03] yolanda, can you also sync the build-depends line with the Debian one - its different right now and does not need to be
[11:03] ok
[11:03] yolanda, you also managed to drop " - Added Suggests on winbindd for NTLM authentication"
[11:03] which was applied in Debian since last merge
[11:06] jamespage, should i tell something about that drop in my changelog? it comes from Debian directly
[11:07] yolanda, you should not drop it
[11:14] ok, pushed that
[11:14] difficult one
[12:04] yolanda, I fixed autopkgtest that was wrongly behaving with squid3. Now the testsuite terminates normally but is still failing on amd64
[12:11] jibel, which error?
[12:18] yolanda, AssertionError: Could not find "Directory" in test_ftp_proxy
[12:18] https://jenkins.qa.ubuntu.com/job/saucy-adt-squid3/ARCH=amd64,label=adt/25/console
[12:19] mm, i think it's a case problem.
The original test had only the check for "irectory"
[12:19] i'll fix it, i'm just building an MP for squid right now, so i'll integrate it
[12:31] jamespage: https://code.launchpad.net/~yolanda.robla/ubuntu/saucy/libnss-ldap/debian_merge/+merge/174993
[12:40] jamespage / zul: Looks like we need a newer kombu and amqp NEW packaged.
[12:41] Daviey: wtf?
[12:44] zul: see http://lists.openstack.org/pipermail/openstack-dev/2013-July/011452.html ?
[12:47] Daviey: thats old...http://paste.ubuntu.com/5880713/ ;)
[12:47] 2.5.12 is the latest and greatest
[12:48] zul: ca.archive.ubuntu.com looks whacky to me
[12:49] zul: it's not part of the cloud archive, tho right?
[12:49] Daviey: it is i think
[12:50] Daviey: it isnt
[13:07] hello there
[13:08] this is my file
[13:08] http://paste.ubuntu.com/5880753/
[13:09] okay?
[13:09] i would like add dotgouv at the end of the file
[13:09] what to do ?
[13:11] don't care about that file
[13:11] it is just an example
[13:11] ertuiu: use adduser to add users to your system.
[13:11] do you get my question ?
[13:12] i simply would like to add dotgouv at the end of each line
[13:12] simply
[13:12] If you just want to append text to the end of a file, you can do something like echo "Sample Text" >> /path/to/file
[13:14] you still not understand my question
[13:14] i would like to apend text to the end of each line
[13:14] oh. Sorry.
[13:14] not end of the file
[13:15] sed 's/$/words/g' /path/to/file (use -i to save it to the file instead of printing to stdout)
[13:19] you are absolutly correct
[13:19] work
[13:20] but the problem is there different type of data on that file
[13:20] according to the data i want to append the correct words
[13:20] how to do ?
[13:22] first do you get ?
Pici
[13:23] its getting much more complicated then
[13:23] you'll need to make a loop and check each line for the data
[13:23] and then rewrite it iwth the new word appeneded into a new file and replace it
[13:23] ask in #bash I guess
[13:24] this is my file http://paste.ubuntu.com/5880799/
[13:24] for example
[13:24] if serverone i want that it append .fr
[13:25] if servertwo i want that it append .com
[13:25] if serverthree i want that it append .de
[13:25] how to do ?
[13:26] ertuiu,i would say, learn bash+sef+awk or learn python.
[13:26] or perl (but it s so XXth century)
[13:29] perl rocks (tm)
[13:30] ertuiu: you want this? http://paste.ubuntu.com/5880817/
[13:31] ertuiu: python has grown more popular than perl the latest years, and is easy to learn - try that ;)
[13:45] jamespage: nova is almost building for me now
[13:47] yolanda, squid3 uploaded - thanks
[13:47] jamespage, even with the "Directory" patch?
[13:48] problems with case in ftp tests
[13:48] yolanda, I just pulled
[13:48] so it should be there, yes
[13:48] great
[13:48] there was a changelog entry for it
[13:49] i'm just continuing fixing merges, adding the lp bug and creating MP instead of debdiff
[13:56] yolanda, OK - https://launchpad.net/ubuntu/+source/squid3/3.3.4-1ubuntu1/+build/4799196
[13:56] so as expected squid3 went into dep-wait sate as libecap is not in main
[13:57] yolanda, so can you file the MIR bug for that please
[13:57] jamespage, i filed the bug for libecap already
[13:57] yolanda, wonderful?
[13:57] sorry - that should have been wonderful! bug ref?
[13:57] https://bugs.launchpad.net/ubuntu/+source/libecap/+bug/1200173
[13:57] Launchpad bug 1200173 in libecap "[MIR] libecap" [Undecided,New]
=== ffio_ is now known as _-_
=== _-_ is now known as Guest65645
=== Guest65645 is now known as ffio
[14:01] yolanda, brilliant - thanks!
[14:01] * jamespage sits back and lets yolanda get on with it
[14:02] jamespage, once it's assigned, what's the process for it? just wait?
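
Editorial sketch: the per-server suffix question from 13:24-13:25 (append `.fr` to serverone, `.com` to servertwo, `.de` to serverthree) follows the loop described at 13:23. The contents of the pasted file are not visible in the log, so this assumes one bare server name per line; the function names are hypothetical and this is not the solution from the 13:30 paste.

```python
# Mapping taken from the question at 13:24-13:25 in the log.
SUFFIXES = {"serverone": ".fr", "servertwo": ".com", "serverthree": ".de"}


def append_suffix(line, suffixes=SUFFIXES):
    """Append the suffix for a known server name; unknown lines are returned unchanged.

    Assumes the whole line (minus surrounding whitespace) is the server name.
    """
    name = line.strip()
    return name + suffixes.get(name, "")


def rewrite(lines, suffixes=SUFFIXES):
    """Apply append_suffix to every line and return the new list."""
    return [append_suffix(line, suffixes) for line in lines]
```

For example, `rewrite(["serverone", "servertwo", "other"])` returns `["serverone.fr", "servertwo.com", "other"]`. Writing the result to a new file and then replacing the original matches the approach suggested at 13:23.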
[14:02] yolanda, yep
=== cmagina-away is now known as cmagina
[14:27] on lucid, how can I change the default pastebin target?
[14:27] pastebinit
[14:28] man pastebinit ?
[14:28] didn't say anything about changing defaults
[14:28] hmm, it should talk about pastebinit.xml at the bottom
[14:29] though probably it is to old
[14:29] RoyK: Use the -b option, as follows: [ pastebinit -b http://paste.kde.org/ ] (just an example)
[14:29] RoyK: (I mean, who'd want headless server output in the KDE pastebin, but that's how it's done)
[14:29] ogra_: didn't say anything about that
[14:30] anyway - does this disk look healtyh to you? I'm not sure... http://paste.ubuntu.com/5881016/
[14:30] http://paste.ubuntu.com/5881017/
[14:30] thats the raring manpage
[14:30] i thought lucid was able to do that too
[14:30] might be wrong though
[14:32] ogra_: seems it works - thanks
[14:32] ogra_: probably just missing from the manual
[14:32] yeah
[14:32] blame stgraber
[14:32] :)
[14:32] hehe
[14:33] haha, yeah, don't count on me to keep man pages up to date, it's already a miracle there's one ;)
[14:33] anyway - what do you think of that disk? it's not a big problem if it crashes, it's in a mirror after all, but some of those counters were pretty wierd
=== cmagina is now known as cmagina-away
[14:42] jamespage: https://bugs.launchpad.net/nova/+bug/1201828
[14:42] Launchpad bug 1201828 in nova "Nova test suite with sqlalchemy >= 0.7.9" [High,New]
[15:14] jamespage: disabling dh_strip in golang fixes bug 1200255. My fix is https://launchpadlibrarian.net/145094227/saucy.debdiff. What do you think?
[15:14] Launchpad bug 1200255 in golang "go get ... fails with SIGILL on armhf" [Undecided,Confirmed] https://launchpad.net/bugs/1200255
[15:17] jamespage/roaksoax: https://code.launchpad.net/~zulcss/nova/nova-sqlalchemy/+merge/175042
[15:46] hallyn, You might be the man that has the secret runes to get the upstream git version of libvirt compiling on ubuntu.
What an annoying experience just to make sure some patches compile before submitting them... ;-P
[15:48] jamespage/hallyn: https://code.launchpad.net/~zulcss/cinder/cinder-ftbfs-j16/+merge/175050
[15:54] smb: zul has been doing the libvirt merges lately. but I assume you're saying latest upstream git has problems that 1.0.6 does not?
[15:56] hallyn, Seems to be known. At least I finally found a patch of yours to avoid tests breaking for gnutls and ignoring selinux seemed to make it do something without error at least.
[15:58] I suppose I will take this as "successful"...
[15:58] rbasak, does that do the trick?
[15:59] jamespage: yes, it seems to work.
[16:00] rbasak, well proof is in the pudding and everything :-)
[16:00] +1
[16:01] jamespage/roaksoax: https://code.launchpad.net/~zulcss/python-ceilometerclient/readme/+merge/175056
[16:01] jamespage: do you want me to upload that fix? Also, what is our interaction with Debian for golang? We're with -0ubuntu versioning?
[16:01] Are we upstream of them?
[16:02] rbasak, no
[16:02] 2:1.1.1-3ubuntu3
[16:02] we where for about two days
[16:03] and then 1.1.1-3 landed into unstable
[16:03] I'd pushed a couple of bugs back prior to that landing based on my initial testing
[16:03] rbasak, but I'd prefer it was uploaded and then fed back to debian - we can always re-sync
[16:04] zul: not sure why yo pointed that cinder merge to me?
[16:04] jamespage: OK - I'll upload and file a Debian bug.
[16:04] hallyn: dont you want to do a review
[16:04] smb: that very vaguely rings a bell (gnutls). do we not have it upstream?
[16:04] rbasak, Michael has been pretty responsive to my feedback
[16:04] zul: ok
[16:04] hallyn: misfire :)
[16:05] jamespage: sorry https://code.launchpad.net/~zulcss/cinder/cinder-ftbfs-j16/+merge/175050
[16:07] hallyn, The thread there seemed like them not wanting it everywhere but only on specific tests, you having done a lot of individual additions and then maybe just given up.
(http://web.archiveorange.com/archive/v/8XiUWvec9X8NdzlDXPtK)
[16:09] let me sing you a tune, ...
[16:09] (yeah sounds familiar)
[16:12] zul, Btw, as hallyn said you touched it last... (not completely true) and not sure you saw me earlier... I would be looking for a libvirt sponsor... :)
[16:13] smb: sure
[16:13] just point your stuff so i can get it
[16:14] zul, In the usual place in the dead tree... I mean chinstrap:~smb/4review
[16:15] smb: cool ill have a look this afternoon
[16:16] zul, cheers... dammit and I nearly missed the meeting again... :-P
[16:16] smb: its ok ;)
[16:16] zul, Yeah, saw I am still early enough :)
=== cmagina-away is now known as cmagina
[16:34] adam_g, roaksoax: do we make tests for all redux charms prior to proposing or not?
[16:37] jamespage, what tests are you talking about?
[16:37] adam_g, charm unit tests
[16:37] I think
[16:37] * jamespage shrugs
[16:37] just looking at you cinder stuff for steering on that
[16:38] jamespage, ive been writing them with the charm. i'd prefer we have them as a requirement of merging
[16:38] adam_g, I agree - but I think that makes EOW a strech esp as h2 is out thursday
[16:38] jamespage, oh, right
[16:39] jamespage, are we expecting to actually merge this stuff to upstream charms this week, or have them done + ready for review?
[16:39] jamespage, im worried if we punt on unittests now, they wont get done. :)
[16:39] nah - done and ready for review
[16:39] adam_g, I 100% agree
[16:40] jamespage, i kinda went overboard with tests in cinder, mostly as an exercise in TDD charming. i think at least test coverage of the basic relation cases should be doable.
[16:42] adam_g, I think once I get up to speed on approach they will come a bit faster (like they do in charm-helpers now)
[16:42] hi adam_g, pleasure to read you once again :)
=== cmagina is now known as cmagina-away
[16:45] Madkiss, o/
=== cmagina-away is now known as cmagina
[16:50] jamespage: btw alembic doesnt work with sqlalchemy 0.8 yet
[16:50] oh great
[16:51] jamespage: i do really don't mind having the unittests before merging them, the only problem that I see is that it will delay the charmwork. But Ideally, it would be great to get them out there for people to test too, so we can have some feedback and identify issues that we might have missed
[16:51] unit tests before merge; adam_g, roaksoax: how about we get them all staged ready for testing under ~openstack-charmers
[16:52] we can cross the board test with juju-core as well then early next week
[16:53] jamespage, sounds good. ill be looking at kapil's pyjuju + juju-core deployer work hopefully today and cwill set up a jenkins job to do deployment testing with specified charm branches + juju implmenetation
[16:56] adam_g, ack
[16:56] sounds good
[16:57] jamespage, also i need to get back to your reviews from last week wrt having this stuff actually land in lp:charm-helpers.
[17:00] adam_g: https://code.launchpad.net/~zulcss/nova/nova-sqlalchemy/+merge/175042
=== NomadJim_ is now known as NomadJim
[17:19] hey, I've just got an abuse report on our mail server from AOL... sent from a customer that isn't ours
[17:19] through our mail server
[17:19] slightly concerned...
[17:32] as you should be
[17:33] why are you forwarding/relaying spam?
=== Ursinha_ is now known as Ursinha
=== mlocher_ is now known as mlocher
=== LordOfTime is now known as LordOfTime|EC2
=== tedski- is now known as tedski
=== jkyle_ is now known as jkyle
=== andreas__ is now known as ahasenack
=== hodge is now known as Hodgestar
=== baffle_ is now known as baffle
=== matsubara is now known as matsubara-lunch
[18:35] adam_g: can you have a look at https://code.launchpad.net/~zulcss/nova/nova-sqlalchemy/+merge/175042
[18:36] adam_g: and https://code.launchpad.net/~zulcss/cinder/cinder-ftbfs-j16/+merge/175050
[18:45] zul, done. pleaes add some information to the patch header of that sqlalechmy patch
[18:46] zul, we have too many patches to tests with no context as to why we need them, and end up carrying them indefinitely
[18:46] adam_g: cool thanks
=== matsubara-lunch is now known as matsubara
[19:28] adam_g: http://people.canonical.com/~chucks/ca/
[19:29] Question: Having RAID1 on 12.04 I just found out that sdb is failed now. But `ls /dev/ | grep sdb` shows only sdb, but it should show sdb sdb1 sdb2. Anyone know what's going on?
[19:30] zul: can you review https://code.launchpad.net/~hopem/ubuntu/raring/python-eventlet/lp1199037 please?
[19:31] Daviey: is this for the SRU?
[19:31] zul: yes
[19:31] raring
[19:32] changelog needs some work
[19:32] zul: https://code.launchpad.net/~hopem/ubuntu/raring/python-eventlet/lp1199037/+merge/175107
[19:32] zul: I haven't looked.. :)
[19:32] Daviey: heh ok
[19:33] zul: if it's just a little bit of polish, can you fixer it up for dosaboy_ please?
[19:34] i don't really want to look at it until it's in the queue
[19:34] Daviey: sure i commented in the merge proposal
[19:35] * zul goes back to fixing neutron
[19:41] zul, +1
[19:42] adam_g: cool thanks
[19:42] zul, can you please start adding descriptions or commit messages to packaging merge proposals?
i'd like to start using tarmac locally to merge them for me, but it requires at least a description in the MP
[19:42] adam_g: sure
[19:52] adam_g: it always bugs me that it doesn't default to using the commit msg if none provided.
[19:53] Daviey: who would be good to ping about https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1201938 ?
[19:53] Launchpad bug 1201938 in libvirt "excessive memory use from libvirtd" [Undecided,Confirmed]
[19:53] Daviey, ya. well, the commit message in MP != the message(s) to bzr commit -m as i've recently learned
[19:54] Daviey: I just filed it, but having done so pleia2 immediately said she's run into it too, so it may have a nontrivial set of affected folks
[19:59] lifeless: Ugh, reproducer seems kinda abstract :)
[19:59] lifeless: Can you provide more data on what you are doing? Clearly pleia2 and yourself are doing something similar
[20:00] Seems to not affect every libvirt user, or we'd see more of it.
[20:01] Question: Can faulty power supply unit be cause of SATA HDD failure?
[20:01] lifeless: Interestingly, are you only using py3?
[20:01] Daviey: so I didn't realise I was seeing this for ages; what I saw was virt-manager going 'waah I lost my qemu:// connection'
[20:01] vlad_starkov: faulty power can cause all kinda whacky things.. but probably not.
[20:02] Daviey: then upstart would restart libvirt, and the kvm processes aren't affected
[20:02] Daviey: w.r.t. livbirt I have no idea; my own scripts are py2 still
[20:03] Daviey: I just got one of 2 hdds failure in RAID1. And I can't see smartctl output of the failed HDD.
[20:03] lifeless: Hmm, i am interested that the apport hooks failed.
[20:04] hallyn_: I suspect there isn't much we can do for bug 1201938 without more data.. but if you could take a quick look, that would be super..
(note, that the apport failed for some odd reason)
[20:04] Launchpad bug 1201938 in libvirt "excessive memory use from libvirtd" [Undecided,Confirmed] https://launchpad.net/bugs/1201938
[20:05] Daviey: sudo ls /var/crash/
[20:05] robertc@lifelesshp:~$
[20:05] Daviey: nada in there.
[20:06] How odd :/
[20:08] lifeless: have you seen this on >1 box?
[20:09] lifeless: could you do an 'apport-collect 1201938' and see if it'll at least post things like libvirt config and df?
[20:09] hallyn_: yes, one box for me and one-box for pleia2
[20:10] lifeless: What are you actually using libvirt for? Related to openstack?
[20:10] FWIW, just checked some nova-compute + 13.04 systems that been up and exercising libvirt for ~40 days, and they look fine: http://paste.ubuntu.com/5882018/
[20:11] lifeless: are you by chance using lots of rbd storage?
[20:13] Daviey: hallyn_ https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1201938/comments/7
[20:13] Launchpad bug 1201938 in libvirt "excessive memory use from libvirtd" [Undecided,New]
[20:13] (answering in the bug for future-us to read)
[20:13] thx
[20:15] lifeless: i'll go ahead and set up a reproducer. i do see a few patches upstream - most for memory leaks in error paths, but not all of them are obviously so.
[20:15] adam_g: you don't connect to those with virt-manager though right?
[20:16] I assume that's the trigger
[20:18] adam_g: hallyn_: it could be as simple as 'have a virt-manage qemu:// connection open for days'
[20:20] yup, just need to set up a raring vm to test in
[20:22] like there's a memory leak in the generic virnetclient.c.
[20:24] actuallly I guess its qemu+system:// or whatever.
[20:24] the default virt-manager connection
[20:32] pleia2: hallyn_ thinks it may be a memory leak in the daemon
[20:32] 08:22 < hallyn_> like there's a memory leak in the generic virnetclient.c.
[20:32] ah
=== medberry is now known as med_
=== unreal_ is now known as unreal
[21:10] what should be the problem?
http://pastebin.com/BQtEBC3N [21:10] it's not the right way to restart networking [21:10] it's been moved to an upstart job [21:16] huh what's the right way? [21:16] i've been doing this for a long time :P [21:16] use the "service" command [21:23] with 12.04 a lot of changes for networking... tnx [21:53] Patrickdk you around? [21:53] not anymore, since you did a privmsg [21:53] ha [21:54] i'm an irc n00b… i don't know etiquette, is that rude? [21:54] if you were not asked, yes [21:54] Here's some strace output from a server under load (simulated) http://pastebin.com/Sy9A6ZzH [21:55] I suspect the futex calls may be noise from the parent process…. Each process manages a threadpool of 20 worker threads, so maybe that's them waiting [21:55] Anything look interesting? [21:55] that does not look like an strace [21:55] strace -c -f -p [21:56] it is some profile [21:56] is there a preferred format? [21:57] the preferred format is something like, strace -p xxxx [21:57] the issue is, you need to show the relevant parts, as that is going to dump gigs of data [21:58] counts are not interesting huh? === maxb_ is now known as maxb [21:58] counts don't mean anything, except if you want to speed up your application [21:58] locate what is slowing it down [21:58] which I do [21:58] we are looking to fix a problem, not optimize it [21:59] but if we want to take what it says, and assume you did it correctly [21:59] futex, fast userspace mutex, issue [21:59] lots of errors right? [21:59] so, your threads are blocking [21:59] no [21:59] where are the errors? that only shows what was called [22:00] in the errors column [22:00] so you likely still have a mutex issue [22:00] no idea what the errors column means, and it probably doesn't mean anything [22:00] if futex has like a timeout option, it will return an error then [22:00] but that is *perfectly* normal [22:00] So you feel it's blocking just because of time spent? 
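For reference, the two strace modes being contrasted here could be sketched roughly as follows. `strace_cmd` is a hypothetical helper that only prints a command line, and PID 1234 is a placeholder. `-c`, `-f`, `-p`, `-tt`, `-T` and `-e trace=` are standard strace options, but narrowing the trace to futex calls is an assumption about what is worth looking at, not something the log settles.

```shell
#!/bin/sh
# Hypothetical helper (not from the log): print the strace command line for
# either the per-syscall summary that was pasted or a per-call trace.
strace_cmd() {
    mode="$1"
    pid="$2"
    case "$mode" in
        # -c: count calls, time and errors per syscall
        # -f: follow threads and forked children
        summary) echo "strace -c -f -p $pid" ;;
        # -tt: wall-clock timestamps, -T: time spent inside each call
        # -e trace=futex: show only futex calls (an assumption, see above)
        timed)   echo "strace -f -tt -T -e trace=futex -p $pid" ;;
        *)       echo "usage: strace_cmd summary|timed PID" >&2; return 1 ;;
    esac
}

# 1234 is a placeholder PID; the printed command would be run on the real one.
strace_cmd summary 1234
strace_cmd timed 1234
```

The summary form gives totals only. The per-call form shows which individual calls block and for how long, which is why it can produce very large output on a busy process.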
[22:01] that would definitely cause cpu load to go down [22:01] That may be standard for ruby [22:01] dunno, I don't know your application, I can only guess :) [22:03] i think the futexes are probably ok [22:03] it's likely the server waiting for a request [22:03] normally that is poll [22:03] it's a reactor [22:03] with a bunch of running threads, if that helps visualize it [22:04] not really, I don't do games :) [22:04] lotta web servers use reactors as well [22:04] normally programs are made in 2 ways [22:05] something single threaded, that works on a poll/select loop [22:06] though lots of those get broken into threads for the work these days, they pass off the handle [22:06] or a state engine, event based thing [22:06] you have threads and polls, so likely you're working in like an apache worker format [22:07] one process to handle connection setup, then it passes it off, or forks a thread to handle it [22:07] think of it as one single threaded loop [22:07] that just runs fast as hell and does no blocking IO [22:08] and it has a pool of worker threads that are pre-spawned that it passes async work to [22:08] with the result to be delivered back to the main reactor thread [22:08] hmm, the whole point of poll is to block [22:09] but the question still is, what is the futex (mutex lock) blocking on? [22:09] that is the only way to make it faster [22:09] but that profiling doesn't tell us [22:09] if that futex is a normal thing [22:09] or only happens during the *slow down* [22:10] that was a normal run [22:10] no slow down [22:10] which doesn't mean it isn't an issue... 
[22:10] it may just mean that it doesn't actually gunk up the works at this load level [22:11] ok, so we know what a normal profile looks like [22:11] now the issue, and it won't be easy [22:11] is to compare it to one that is having an issue [22:12] and to make sure the strace is on the right one, during that issue [22:13] i installed jemalloc on one of the machines btw, to see if that helps [22:14] i can't find a whole lot of info on it… seems like i just install it and go [22:14] no configuration needed [22:14] it has to be configured [22:14] lsof -n | grep jemalloc [22:15] nothin [22:15] then it's not using it [22:15] is there a configuration resource you can point me to? I didn't find anything [22:15] http://stackoverflow.com/questions/10946506/using-jemalloc-in-existing-huge-code [22:15] note the export LD_PRELOAD [22:16] add that to your programs startup script [22:16] ah, ok [22:16] dammit [22:16] now that's in prod… i cannot touch it :) [22:16] i'll give that a shot on the staging environment [22:18] thanks for the link [22:27] Patrickdk I'm not seeing the so in lib after the install [22:28] hmm? [22:28] never mind [22:28] i'm a dummy [22:28] :) [22:29] ya, whoever made that package for jemalloc did not do a good job [22:30] it installed at least [22:30] lets see if she runs [22:31] i can always just build it if it doesn't cut it [22:31] one more thing to add to build scripts :) [22:31] no, it's fine, it's just missing the .so symlink and stuff [22:32] ERROR: ld.so: object '/usr/lib/jemalloc.so.1' from LD_PRELOAD cannot be preloaded: ignored. 
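Note that ld.so only ignores a bad LD_PRELOAD entry, so the program still starts, just with its default malloc. A minimal sketch of a guard that could go in a startup script, assuming a hypothetical helper named `checked_lib` and a placeholder `./start_server` command (neither is from the log):

```shell
#!/bin/sh
# Hypothetical helper (not from the log): print the library path only if it
# is a readable regular file, so a wrong path is caught before startup
# instead of being silently ignored by ld.so.
checked_lib() {
    lib="$1"
    if [ -f "$lib" ] && [ -r "$lib" ]; then
        echo "$lib"
    else
        echo "cannot preload '$lib': no such readable file" >&2
        return 1
    fi
}

# Sketch of use; the path is the one from the error above and
# ./start_server is a placeholder for the real startup command.
# lib=$(checked_lib /usr/lib/jemalloc.so.1) || exit 1
# LD_PRELOAD="$lib" ./start_server
```

Setting LD_PRELOAD on the command line of the one program, rather than exporting it, keeps the preload from leaking into every other command the script runs.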
[22:32] so if it got upgraded, it would need manual fussing with again [22:32] wrong location [22:32] and it's not called that [22:32] /usr/lib/libjemalloc.so.1 [22:33] # ls /usr/lib/libjemalloc.so.1 => /usr/lib/libjemalloc.so.1 [22:34] well, that isn't what you posted up there [22:34] oh [22:34] that is totally true haha [22:35] awesome, she's a runnin [22:35] it does not segfault… so that's certainly something [22:35] you checked with lsof? [22:36] lsof [22:36] whoops [22:37] ruby 19356 root mem REG 252,0 108100 7344276 /usr/lib/libjemalloc.so.1 [22:38] sounds like whether memory allocation is or is not the problem [22:38] jemalloc makes more sense for these servers... [22:38] from the bit i've read about it [22:40] I've had some servers come to a grinding crawl lately, cause of that [22:40] but in my case at least, it was extremely high cpu usage [22:40] because of jemalloc huh? [22:41] and switching back to native fixed it? [22:41] no, cause of not using it [22:41] ah, i c [22:41] we do a tremendous amount of memory manipulation [22:41] from a lot of threads [22:41] what kind of servers are you managing? === Corey_ is now known as Corey [22:46] ok here goes… i put half of the servers on jemalloc [22:54] ok this resolvconf is... idk what they were thinking, but i want to edit resolv.conf like in the old days without resolvconf overwriting it. how to do it? i did resolvconf --disable-updates with no luck [22:55] heh? [22:55] just delete /etc/resolv.conf [22:55] and make your own [22:55] but why not just do it the correct way, using /etc/network/interfaces === arrrghhhAWAY is now known as arrrghhh [22:58] because the correct way keeps getting changed every release.. and i lost 1 hour just for networking. 
i did it with interfaces but got stuck on how to put domain and search in it :/ [23:00] if you lost time cause of it, it's cause you did not read the release notes, that came out almost 2 years ago [23:00] there is a very specific reason there are release notes, so people like you would read them, and NOT have issues [23:02] Patrickdk: you are right. [23:03] * Patrickdk has nothing to do with ubuntu [23:03] wow strace is HEAVY... [23:04] yes [23:08] i love that no matter how hard i try, i just cannot make a machine exhibit the problem [23:08] production only [23:08] that's normally how a locking issue works [23:08] it has to be timed just perfect [23:08] yep [23:09] i've written simulated players [23:09] that do just about EVERYTHING that a real player does [23:09] had an nfs kernel issue, it went in in aug [23:09] but not a single person had an issue till end of dec [23:09] i have all 8 procs PEGGED [23:09] i'd be fine with a rarely seen issue :) [23:10] no, at end of dec, I hit the issue multiple times a day [23:10] each time, the server would panic and reboot [23:33] aiight Patrickdk thanks for the help again, enough banging my head for now…