[09:00] <luckyduck> Hi. We're using 8.04 on a LAMP cluster. Yesterday at 12:24 the packet filters on all 3 wwwnodes (load-balanced) logged a dropped packet at the same time.
[09:00] <luckyduck> it looked like the conntrack ESTABLISHED/RELATED rule stopped working
[09:01] <luckyduck> however, the conntrack table was not reported as full at that time
[09:02] <luckyduck> i then inserted a rule which should allow the traffic (incoming reply packets from the MySQL server on port 3306).
[09:02] <luckyduck> there is also a central firewall which connects the different subnets for the wwwnodes and the MySQL servers (there are a few others, but they're not relevant)
[09:03] <luckyduck> it seems to have the same problem, but nothing gets logged and the conntrack table is also not the problem (it's not reported as full).
[09:04] <luckyduck> it seems that some packets simply get dropped without being logged
[09:05] <luckyduck> it doesn't happen all the time, it occurs in a "random" fashion
[09:05] <luckyduck> is there anything you can recommend to debug this?
[09:06] <luckyduck> the main problem is that everything worked fine for ~2-3 months
[09:06] <luckyduck> it started with the dropped packets on all 3 wwwnodes at the same time
[09:06] <Ng> at the exact same time?
[09:07] <luckyduck> yep, within 1 second of each other
[09:07] <luckyduck> 12:23:59 on two wwwnodes and 12:23:58 on the other
[09:08] <luckyduck> well, the www traffic is load-balanced; roughly the same amount of traffic is handled by each node
[09:08] <luckyduck> is it possibly related to a bug in netfilter or the kernel?
[09:09] <luckyduck> i can show you logs or anything else you need. 
[09:10] <luckyduck> we've found that at some point the MySQL connections initiated by the wwwnodes get dropped by the central firewall:
[09:10] <luckyduck> the wwwnode sends the SYN, and it is received on the VLAN interface of the firewall
[09:11] <luckyduck> but then it's not routed into the database subnet
[09:11] <luckyduck> nothing gets logged about dropped packets and, as i said, the conntrack table isn't full
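One way to back up "the conntrack table isn't full" with numbers, rather than relying on the absence of a "table full, dropping packet" kernel message, is to sample the entry count against the limit on each node and on the central firewall around the time of the drops. A minimal Python sketch, assuming the nf_conntrack sysctl paths named in the code (the exact paths depend on the kernel and on which conntrack module is loaded):

```python
from pathlib import Path

# Assumed paths: these exist on kernels using nf_conntrack; kernels still on
# the older ip_conntrack module expose /proc/sys/net/ipv4/netfilter/
# ip_conntrack_count and ip_conntrack_max instead.
COUNT_PATH = "/proc/sys/net/netfilter/nf_conntrack_count"
MAX_PATH = "/proc/sys/net/netfilter/nf_conntrack_max"


def conntrack_usage(count_text, max_text):
    """Return (entries, limit, fraction_used) from the two sysctl values."""
    entries = int(count_text.strip())
    limit = int(max_text.strip())
    return entries, limit, entries / limit


def read_conntrack_usage(count_path=COUNT_PATH, max_path=MAX_PATH):
    """Read the live counters from /proc and summarise them."""
    return conntrack_usage(Path(count_path).read_text(),
                           Path(max_path).read_text())


if __name__ == "__main__":
    entries, limit, used = read_conntrack_usage()
    print(f"conntrack: {entries}/{limit} entries ({used:.1%} used)")
```

Run periodically (e.g. from cron) so that a sample exists close to the next drop; a table that is nearly full but never quite full would still be worth knowing about.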
[10:39]  * apw_ is disconnected from his proxy ... history is ... history
[10:40] <smb> apw_, You need the instant video to rewind it
[10:41] <apw_> actually i need the [] button in Braid to run time backwards and try again
[10:42] <smb> apw_, Heh, one would wish to have that from time to time
[10:42] <smb> apw_, Ok, I forwarded a mail from hpa to the kernel-team list. It is about CPU_DEBUG
[10:43] <apw_> cool.  i saw the original so we may well want to drop it so we don't regress
[10:43] <smb> apw_, For us it seems to be only enabled in Lucid-amd64 as a module
[10:43] <apw_> less modules == happy apw
[10:43] <smb> But as it sounds, yes, we probably just want to get rid of it before release
[10:43]  * apw_ wonders just how slow his hard drive really is ... check out my tree damn it
[10:44] <smb> happy apw == less whip lashes in my back ;-P
[10:44] <apw_> i'd not assume that :)
[10:44] <apw_> _still_ checking out
[10:44] <smb> *ouch*
[10:44] <apw_> _finally_
[10:47] <apw_> it's that first time after reboot.... a world of pain
[11:05] <apw_> smb, i may have to go back to bed ... a wheel just fell off my chair and it tried to eat me
[11:05] <apw_> it's just not safe out here right now
[11:06] <smb> apw_, Nasty
[11:06] <apw_> yeah it's just _not_ my day
[11:07] <smb> You have to be careful when opening the fridge...
[11:08] <apw_> bah _now_ banshee is all mental, getting jammed on OSD ...
[11:08]  * apw_ cries
[11:08] <apw_> seems happy with katy perry but not kate nash wtf
[11:10] <smb> apw_, That surely is the new "policy enforcer" :-P
[11:10] <apw_> heh ... seems like it
[12:51]  * apw returns ... something is working at least
[12:52] <soren> apw: It'll pass.
[12:53] <apw> lol sadly true
[12:54]  * apw looks at the massive heap of updates for lucid ... 'stable' updates hrm
[12:55] <smb> Sometimes it feels like staple updates
[12:57] <apw> yeah ... luckily i am not under quite the review constraints you are ... for which i am thankful
[13:46] <apw> smb, well that kernel boots on my 10v at least, and doesn't eat my card, but it's not a direct mmc so ...
[13:47] <smb> apw, Sounds at least good enough to try it here again. With an unimportant card, though
[13:48] <apw> heh yeah ... am pushing the i386 build now, is that useful to test? i assume so
[13:49] <apw> smb, can you get to rookery? 
[13:49] <smb> apw, nope
[13:56] <apw> ahh sorted
[13:56] <soren> Could one of you guys take a quick look at bug 499520? Specifically the last comment (with attachment). I'm puzzled why 251192 kB of RAM is used by those processes.
[13:56] <ubot3> Malone bug 499520 in vm-builder "default uec-image requires at least 300 M of RAM to run - m1.small and c1.medium not needed by default" [High,New] https://launchpad.net/bugs/499520
[13:56] <soren> Adding all the processes' SZ and RSS comes to 27219 and 21608, respectively.
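The totals soren quotes come from adding up per-process columns. A small Python sketch of that step, assuming `ps -eo sz,rss` output (procps reports RSS in kB and SZ in pages). Note that summed RSS counts shared pages once per process, so it overstates rather than understates userspace use:

```python
import subprocess


def sum_ps_columns(ps_output):
    """Sum each numeric column of `ps -eo sz,rss`-style output.

    Returns a dict keyed by the header names, e.g. {"SZ": ..., "RSS": ...}.
    """
    lines = [line for line in ps_output.splitlines() if line.strip()]
    headers = lines[0].split()
    totals = {name: 0 for name in headers}
    for line in lines[1:]:
        for name, value in zip(headers, line.split()):
            totals[name] += int(value)
    return totals


if __name__ == "__main__":
    out = subprocess.run(["ps", "-eo", "sz,rss"], capture_output=True,
                         text=True, check=True).stdout
    # SZ is in pages, RSS in kB; summed RSS double-counts shared pages.
    print(sum_ps_columns(out))
```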
[13:59] <soren> To me, it looks like a kernel problem. I can't see where else all that memory could have gone.
[14:01] <apw> soren, define gone
[14:02] <apw> the machine there has 1.7gb of ram; just 'cause ram has stuff in it doesn't mean it would be needed and retained if the machine had 256MB of ram
[14:02] <soren> apw: I understand that.
[14:03] <apw> so what's our proof that 300mb is 'needed' to boot
[14:03] <soren> apw: That is, nevertheless, the case. People have attempted to run this on a 256 MB RAM box, and failed.
[14:03] <apw> do we have a boot log from such a failure?
[14:04] <apw> as the consumer is likely the one to get it in the eye when we run out
[14:04] <soren> apw: And even though the kernel usually makes smart decisions about this sort of thing (like you suggest), you can still look around and account for the current memory use. However, in this case, I cannot.
[14:04] <apw> how much can you account for in userspace
[14:04] <soren> apw: 27 MB.
[14:05] <soren> Sorry, no. 22 MB.
[14:06] <apw> and 206 is in the buffer cache
[14:06] <soren> apw: Sorry, it didn't fail to boot with only 256 MB of RAM, but it was swapping.
[14:06] <soren> apw: Yes.
[14:06] <soren> apw: Err... Sorry, what?
[14:07] <apw> doesn't the output of free tell us that 206 is in the buffer cache
[14:07] <apw> ie hanging about
[14:07] <apw> with near 2gb of ram we'd have a fair chunk for the kernel
[14:07] <apw> and we're only 23 short there
[14:07] <soren> You make a reasonably convincing argument.
[14:08] <soren> What significance do 3460 and 41492 have, then?
[14:08] <apw> /proc/meminfo has a lot of data on what's where too
[14:08] <soren> Sorry, phone call.
[14:08] <apw> not sure i've ever used the output of free to know exactly what its fields are, i tend to use /proc/meminfo
[14:09] <apw> Buffers:          422960 kB
[14:09] <apw> on my 4GB machine i have almost nothing but buffers ... wow
[14:09] <apw> doh misreading ...
[14:10] <apw> Cached:          2363964 kB
[14:10] <apw> impressive as there is only me on it
[14:10] <apw> soren, /proc/meminfo and /proc/slabinfo are worth collecting when it's up
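Following apw's suggestion, a /proc/meminfo snapshot is easy to turn into numbers for comparing a 256 MB box with a larger one. A minimal Python parsing sketch; the `page_cache_kb` helper is a hypothetical convenience, not a standard tool:

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def page_cache_kb(info):
    """Hypothetical helper: Buffers + Cached, in kB."""
    return info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    for field in ("MemTotal", "MemFree", "Buffers", "Cached", "Slab"):
        print(f"{field:>9}: {info.get(field, 0)} kB")
    print(f"buffers+cached: {page_cache_kb(info)} kB")
```

With apw's figures above (Buffers 422960 kB, Cached 2363964 kB), `page_cache_kb` gives 2786924 kB.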
[14:18] <soren> apw: I'll make a note of that.
[14:18] <soren> apw: Thanks for your feedback.
[14:18] <apw> soren, what we are here for :)
[14:21] <apw> smb, http://people.canonical.com/~apw/lucid/ <- i386 test build for your mmc machine
[14:22] <smb> apw, Ok, cool
[14:42] <bjf> **
[14:42] <bjf> ** Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[14:42] <bjf> **
[15:48] <mjeanson> could someone have a look at bug #508008? very straightforward, with a patch attached, but I'm not sure how to get this considered for a hardy update
[15:48] <ubot3> Malone bug 508008 in linux "[hardy] Bridging a 802.3ad bonded interface will break it" [Undecided,Confirmed] https://launchpad.net/bugs/508008
[15:57] <rtg> mjeanson, hmm, looks like an obvious fix
[15:58] <smb> mjeanson, You should probably forward the patch with a reference to the bug and a short description of the impact and where the fix comes from to kernel-team@lists.ubuntu.com
[15:59] <mjeanson> smb: will do, thanks
[16:08] <mjeanson> rtg was faster, thank you both for your help
[16:10] <rtg> mjeanson, bridging on servers is fairly important. I'm surprised I haven't noticed this before 'cause the bug is not strictly related to bonding. It appears that _any_ intermediate bridge segment could drop STP packets if STP is not enabled on that bridge host.
[16:12] <mjeanson> rtg, I was surprised to be the first to notice that, I thought bridge on bond was a fairly regular scenario for kvm hosts
[16:12] <rtg> mjeanson, well, kvm _was_ in its infancy for Hardy.
[16:13] <mjeanson> rtg, when should I expect a fixed kernel in the archive?
[16:14] <rtg> mjeanson, it'll go to -proposed first where it'll rattle around for the best part of the first quarter. 
[16:15] <rtg> mjeanson, smb could better answer
[16:16] <smb> It's a good summary. With the special exception of other security updates in the works, which take precedence over it
[16:18] <smb> For now this could probably get up to proposed sometime next week. The bug gets an update that it will be available, and if this gets verified (positive test comment on the proposed kernel in the bug) it usually moves into updates a week later
[16:20] <mjeanson> well thank you both again, I'm not used to such good feedback on IRC
[16:21] <rtg> timing is everything :)
[16:21] <smb> It might be beer time, which is bad for responsiveness. But we try our best. :)
[16:22] <rtg> dude, I just finished my oats for breakfast. beer time is a ways off yet
[16:23] <smb> It's all relative. Mine would in theory just be half an hour away...
[16:24] <smb> Just typing in the meeting with a bottle in one hand is kinda hard :)
[16:24] <JFo> heh
[16:25] <rtg> smb, clamp it between your knees
[16:26] <apw> smb, how is your SD card?
[16:26] <smb> apw, Still well, but that's more because I wanted the surrounding system to be up to date first, which is still ploughing along
[16:27] <apw> oh ok... wanted to get that testing before i upload ... let me know when you are there :)
[16:28] <smb> rtg, Heh yeah, should do that. I hope I never have to explain any strange muscle cramps resulting from that
[16:28] <rtg> smb, that would be TMI
[16:29] <smb> apw, I wish I could say. But I hope soon
[16:30] <smb> rtg, Ohh, I am sure we have heard worse. :)
[16:31] <rtg> apw, what kind of external USB drive are you carrying? I need to get something light for travel
[16:31] <apw> smb, you need a straw
[16:32] <smb> apw, Interesting, remember that bug we discussed yesterday about the mouse that has both absolute and relative axes? While trying to figure out what goes wrong, it seems all my mice show up in "xinput list" as type keyboard. Just with a relative mode.
[16:33] <smb> apw, And yes, the straw would solve the hand problem. Maybe with the side-effect of getting drunk quicker. Which again impacts typing. :-P
[16:33] <smb> rtg, You probably want to look for an external case for 2.5" drives
[16:33] <rtg> smb, externally powers and small case
[16:34] <rtg> powered*
[16:34] <smb> rtg, I got an IcyBox which is USB powered (with an option of using a second USB port for additional power)
[16:42] <apw> rtg, that one is a Maxtor one touch which seems to work pretty well
[16:43] <apw> actually it being separate is a bit annoying as it's like carrying a bag of worms when you have it attached
[16:43] <apw> but its better than nothing and better than lugging this heap round with me
[16:44] <rtg> apw, well, I mostly want it for evenings since I rarely do development on my laptop
[16:44] <apw> yeah i do use my netbook without it a lot when i am away
[16:49] <smb> apw, Update finished now. But unfortunately it now seems to stop dead in its boot. Next thing for me to try is nomodeset
[16:49] <apw> wtf
[16:53] <smb> apw, Was that nomodeset? At least it's not the kernel. The old one gets stuck at the same place
[16:53] <apw> nomodeset yes
[16:54] <apw> OH check thingy
[16:54] <apw> crap, those .so's for X
[16:54] <apw> smb, check the version of xserver-xorg-core
[16:54] <apw> and paste it
[16:55] <smb> apw, If I ever get a working console I will
[16:55] <apw> boot single perhaps
[16:55] <smb> Neither nomodeset, nor text worked yet
[16:56] <apw> pgraner`, where did we look for missing .so's for your xserver-xorg-core issue
[16:56] <apw> and smb you had it too ... could it be the same
[16:56] <smb> It finished "Starting crypto disks" but nothing after that.
[16:56] <bjf> ##
[16:56] <bjf> ## Ubuntu Kernel Team Meeting - in 5 minutes - #ubuntu-meeting
[16:56] <bjf> ##
[16:57] <apw> hrm
[16:57] <apw> so something else in userspace
[16:57] <apw> grrrrrr
[16:57] <smb> apw, I remember seeing the same in one bug report. But need to dig further to get somewhere
[16:58] <smb> Might be just on my installation too
[17:00] <smb> apw, Hm, whatever it was. At least I now dropped into the busybox
[17:23] <apw> jjohansen, so is that misreporting thing important enough to hold up my upload for? and disrupt you to get it ready? or shall we wait
[17:23] <jjohansen> no, upload away
[17:24] <jjohansen> the tool still works, it just can't combine the unconfined tasks together
[17:24] <apw> minor then ... fine
[17:24] <jjohansen> yep
[17:24] <apw> smb, so its more about you and whether you might be able to test ... sounds like you are in a heap
[17:25] <smb> Yeah. And it does not seem to me to be tied to a kernel
[17:25] <smb> So, any is as bad as the other to me
[17:27] <apw> smb, ok i just risked an update on my mini, so i should have the same 'rest of the crap' as you
[17:27] <apw> and i seem to be booting ok there
[17:30] <smb> It's a weird thing
[17:30] <apw> [jjohansen] clean up on pam_apparmor: TODO
[17:30] <apw> any idea when that might be planned to be done?
[17:31] <jjohansen> apw: hrmm, either this or next week, it is an entirely userspace task
[17:31] <smb> apw, fs is mounted. system reacts well to sysrq. it seems to fall into the idle of death on something
[17:31] <apw> so lucid-alpha-3 then?
[17:31] <jjohansen> apw: ys
[17:31] <jjohansen> s/ys/yes
[17:31] <apw> jjohansen, ta
[20:02] <lamont> WTF? Why does pinging a remote host go from 2 ms to 3500 ms when I do it via the openvpn tunnel?
[20:09] <lamont> meh.  nm
[21:16] <stgraber> I just confirmed bug 509808 as something that should be done in order to get full LXC support as described in the server-lucid-contextualization blueprint. I forgot to include that parameter in the initial bug report, so I would appreciate it being turned on, if possible, by alpha-3.
[21:16] <ubot3> Malone bug 509808 in linux "Enable user namespaces in Lucid server kernel" [Wishlist,Triaged] https://launchpad.net/bugs/509808
[21:16] <stgraber> Please tell me if you'd prefer I e-mail ubuntu-kernel about it too. Thanks
[21:41] <dhillon-v10> stgraber, hi there :) it would be a great idea to email the list as it might get more attention, but you should stick around for a little bit and others could probably help you more
[21:41] <dhillon-v10> ogasawara, ping regarding glibc()