[04:58] <bluefoxicy> I want oops insurance
[04:59] <bluefoxicy> when my kernel oopses I want it to file a bug and mail the full oops output to every kernel dev so I KNOW they're fixing it  :D
[05:30] <infinity> bluefoxicy: Every time I commit a fix for something, I want it instantly forced on every user's machine, and I want them to be forced to test it immediately.  (Also, we can't always get what we want)
[05:31] <bluefoxicy> infinity:  it was a joke
[05:31] <infinity> So was mine.  Sort of. ;)
[05:31] <bluefoxicy> infinity:  although instant testing would be nice ;)
[10:05] <dilinger> infinity: i'd like every user to accompany their bug reports w/ patches that solve the problem.  also, a pony.  i want a pony.
[10:09] <Mithrandir> dilinger: what colour?
[10:11] <dilinger> blue
[10:49] <dilinger> [  123.737130]  Call Trace: [ffffffff802e0e15]  schedule_timeout+0x25/0xd0
[10:49] <dilinger> [  123.737143]   [ffffffff8014b803]  prepare_to_wait+0x23/0x80
[10:49] <dilinger> [  123.737148]   [ffffffff802dc383]  unix_stream_recvmsg+0x2c3/0x5d0
[10:49] <dilinger> [  123.737155]   [ffffffff8014b550]  autoremove_wake_function+0x0/0x30
[10:50] <dilinger> hey look, it's an almost readable backtrace, just like i386 and sparc64
[11:19] <dilinger> ugh, is that intentional?
[11:19] <dilinger> dmesg output:
[11:19] <dilinger> [   82.375220]   [<ffffffff8019adc0>]  __pollwait+0x0/0xf0
[11:19] <dilinger> syslog (kern.log):
[11:19] <dilinger> Feb 24 05:15:42 throat kernel: [   82.375220]   [__pollwait+0/240]  __pollwait+0x0/0xf0
[12:16] <mjg59> BenC: Why the change to CONFIG_2GB?
[01:17] <doko> is it intended for a server setup that the hard disk spin down after activity? and if yes, that period seems to be too short
[01:31] <BenC> doko: that would be one of the startup scripts doing that, or it's your bios
[01:31] <BenC> mjg59: mainly to allow laptops with 1GB of ram to still be able to suspend/resume
[01:31] <doko> BenC: no, did start sometime in the dapper cycle
[01:32] <BenC> doko: has to be a startup script then...not sure which one though
[01:32] <mjg59> BenC: Eh?
[01:32] <mjg59> BenC: Suspend/resume should work with highmem...
[01:32] <BenC> mjg59: I've had reports that it doesn't, also read it somewhere
[01:32] <mjg59> BenC: How weird
[01:32] <mjg59> BenC: It has the side effect of breaking sbcl
[01:33] <BenC> yeah, it seems to be breaking more than it fixes
[01:33] <Mithrandir> when I get "recursive die() failure, output suppressed", that's bad, right?
[01:33] <fabbione> hey BenC 
[01:33] <BenC> hey fabbione
[01:33] <mjg59> BenC: Suspend to RAM or suspend to disk?
[01:33] <BenC> fabbione: I did silo last night, just forgot to upload new upstream tarball
[01:33] <Mithrandir> our current live CD seems exceedingly unhappy in vmware.
[01:33] <BenC> mjg59: can't recall, let me check
[01:34] <fabbione> BenC: please pull from my tree. changes: more sun4v love, config changes for sparc to move sunhme, sungem and esp from mod to inline to spare people a few errors in d-i
[01:34] <BenC> fabbione: those d-i problems were supposed to be fixed, I filed bugs :/
[01:35] <fabbione> BenC: i am talking about modprobe errors during install
[01:35] <BenC> fabbione: did you also make scsi/sd built-in as well?
[01:35] <fabbione> no i didn't
[01:35] <BenC> what sort of modprobe errors?
[01:35] <fabbione> only sunhme, sungem and esp
[01:35] <fabbione> it's problem with the drivers
[01:35] <fabbione> when the hw is not there, they exit badly
[01:35] <fabbione> and d-i catch the error and show a big fat red screen to the user
[01:35] <fabbione> = scary
[01:36] <fabbione> bu
[01:36] <fabbione> BUT
[01:36] <fabbione> if they are compiled in, hw is still recognized
[01:36] <fabbione> and there is no need to modprobe them
[01:36] <fabbione> = works
[01:36] <BenC> gotcha
[01:37] <fabbione> BenC: i also added the sparc ABI files for 16.22 and 16.23
[01:38] <fabbione> and removed the abi checker
[01:38] <fabbione> sorry i meant the sparc.ignore
[01:38] <BenC> ok
[01:38] <fabbione> now it takes me about 20 minutes to build :)
[01:39] <BenC> how's the DC sparc hw coming?
[01:40] <BenC> 20 minutes? my machine can do it in 10 :)
[01:41] <BenC> btw, someone is giving me an e3.5k
[01:41] <BenC> not sure where I'll put it, or why I need it, but it's free
[01:41] <infinity> mjg59: swsusp is the one that supposedly doesn't work with > 1GB of RAM (according to reports on user lists and forums, etc), though it's worked in the past on my laptop with 2GB of RAM..
[01:42] <fabbione> BenC: 20 minutes including udebs and without ccache?
[01:42] <fabbione> BenC: 20 is full dpkg-buildpackage :)
[01:42] <infinity> BenC: Don't suppose they want to spend ridiculous sums to ship it to Australia?
[01:42] <fabbione> BenC: the machines are installed in a rack.. i think elmo crashed before installing anything on them
[01:42] <BenC> fabbione: full udebs, but ccache, yes
[01:43] <fabbione> kill ccache :)
[01:43] <fabbione> and we can talk ;)
[01:43] <BenC> infinity: maybe I can sneak it through customs into germany in my duffle bag
[01:43] <infinity> BenC: You must have gotten a larger duffle bag since last we met...
[01:43] <BenC> I think It was 30 minutes without ccache
[01:43] <fabbione> Feb 24 04:40:30 sunrise udevd[2686] : get_netlink_msg: unable to receive kernel netlink message: No buffer space available
[01:44] <fabbione> HMM
[01:44] <fabbione> kernel or udev issue?
[01:44] <BenC> fabbione: AHA! someone else sees it as well
[01:44] <BenC> that's been fucking up my e3k for weeks now
[01:44] <infinity> Blame davem.
[01:44] <BenC> it has to be kernel, I haven't seen it anywhere else
[01:45] <BenC> someone the netlink foo is returning ENOBUFS but I can't see how
[01:45] <fabbione> BenC: AHHHHH
[01:45] <BenC> s/someone/somewhere/
[01:45] <fabbione> crap
[01:45] <fabbione> i will blame davem :)
[01:46] <infinity> Or, blame davem's girlfriend.  She's too distracting.
[01:46] <BenC> infinity: s/duffle bag/empty cargo hold/
[01:46] <fabbione> because it looks like udevplug is triggering that thing exactly when it needs to probe the network driver
[01:46] <BenC> fabbione: for me it occurs when it needs to load sd_mod, which means I have to manually bring the machine past initrd stage
[01:47] <fabbione> ok..
[01:47] <fabbione> so it's random..
[01:47] <fabbione> good to know
[01:47] <fabbione> david will love to fix that :)
[01:47] <BenC> not really, for me sd_mod is the first thing it is trying to do
[01:48] <fabbione> the netdriver is not the first
[01:48] <fabbione> it's way past that i think
[01:48] <fabbione> but i collected all the infos together
[01:48] <BenC> it's probably the first thing that requires a kernel event
[01:50] <fabbione> could be
[01:53] <fabbione> BenC: btw.. silo is up
[01:53] <fabbione> uploaded this morning
[02:11] <BenC> yeah, saw that
[02:12] <fabbione> cool
[02:12] <fabbione> hey Keybuk 
[02:12] <fabbione> i was just waiting for you
[02:12] <fabbione> Keybuk: http://people.ubuntu.com/~fabbione/sparc/
[02:12] <fabbione> this is the error i am getting on sparc
[02:13] <fabbione> there is lspci, the error in syslog and the strace of both udevd and udevplug
[02:13] <fabbione> each time i run udevplug i can reproduce the error
[02:13] <fabbione> what i see at each reboot is network not loaded (e1000)
[02:13] <fabbione> but BenC for instance has no sd_mod
[02:14] <fabbione> BenC: i also fixed hw-detect and Mithrandir did kbd-chooser
[02:15] <fabbione> the latter will make that annoying error disappear when you don't have a keyboard installed
[02:15] <BenC> so hw-detect should find my sbus devices now?
[02:15] <fabbione> BenC: only after you will upload the new kernel...
[02:15] <fabbione> that has esp built in
[02:15] <BenC> well that's no help, new kernel will make my sbus modules built-in :P
[02:16] <Keybuk> cute, that error message isn't listed in the recv() manpage
[02:16] <Keybuk> ENOBUFS
[02:16] <fabbione> BenC: i will look at hw-detect in debian and see what they have 
[02:16] <BenC> Keybuk: that error is from the netlink core
[02:16] <Keybuk> BenC: how do we avoid that error?
[02:16] <fabbione> BenC: otherwise yeah. i guess that's the solution :/
[02:16] <BenC> Keybuk: I couldn't figure out why it happened
[02:16] <BenC> netlink code is so confusing
[02:17] <Keybuk> could it be that the kernel overflowed the netlink buffer space
[02:17] <BenC> fabbione: they have a working libdetect that correctly descends secondary sbus busses :)
[02:17] <BenC> Keybuk: seems that it somehow does
[02:18] <fabbione> BenC: what package is that?
[02:19] <BenC> fabbione: detect or libdetect or something like that
[02:19] <BenC> from what I remember, we don't use the same thing they do to detect devices
[02:19] <fabbione> BenC: oh you mean discover?
[02:19] <BenC> yeah, that's it
[02:19] <fabbione> they did get rid of it
[02:20] <fabbione> and the fix for the double sbus was merged in breezy under my "heavy" pressure
[02:20] <Keybuk> BenC: what's BUFFER_SIZE in lib/kobject_uevent.c ?
[02:21] <Keybuk> (that's the size allowed for a single uevent)
[02:21] <BenC> Keybuk: no idea, but there is no ENOBUFS in that code, so I think it's elsewhere
[02:22] <Keybuk> it could be that it's trying to generate a uevent that's too big
[02:22] <Keybuk> or it could be that it's queuing too many uevents, and udevd isn't getting enough cpu time to slurp them all
[02:22] <BenC> the BUFFER_SIZE overrun case returns ENOMEM
[02:22] <fabbione> it's probably the latter
[02:22] <fabbione> too many events
[02:23] <BenC> net/netlink/genetlink.c:ctrl_build_msg() returns ENOBUFS
[02:24] <BenC> and so does net/netlink/af_netlink.c:netlink_overrun()
[02:24] <BenC> it's probably the second one generating the error
[02:24] <Keybuk> that function's called in a few places
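A note on the ENOBUFS under discussion: on Linux, when a netlink socket's receive buffer overflows, the kernel records the overrun on the socket and the next receive call fails with ENOBUFS, which matches udevd's "No buffer space available" message. Below is a minimal Python sketch of how a listener might tell that case apart from other errors. `FakeOverrunSocket` and `recv_uevent` are hypothetical names made up for illustration, not udev code, and the fake socket stands in for a real netlink socket so the example runs without kernel access.

```python
import errno


class FakeOverrunSocket:
    """Hypothetical stand-in for a netlink socket.

    The first recv() fails with ENOBUFS (as after a kernel-side overrun),
    later calls return a sample payload.
    """

    def __init__(self):
        self.calls = 0

    def recv(self, bufsize):
        self.calls += 1
        if self.calls == 1:
            raise OSError(errno.ENOBUFS, "No buffer space available")
        return b"add@/class/vc/vcsa\0"


def recv_uevent(sock, bufsize=2048):
    """Return (data, overrun). overrun=True means events were lost."""
    try:
        return sock.recv(bufsize), False
    except OSError as err:
        if err.errno == errno.ENOBUFS:
            # Messages were dropped; the socket itself is still usable.
            return None, True
        raise


sock = FakeOverrunSocket()
print(recv_uevent(sock))  # overrun reported, no data
print(recv_uevent(sock))  # next call delivers an event again
```

The point of the sketch is that ENOBUFS here signals lost events rather than a broken socket, so a listener can keep reading but cannot know which events it missed.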
[02:25] <fabbione> Keybuk: is it possible to build udevplug to wait let say half second between sending each event?
[02:25] <fabbione> that would exclude the "too many events at once"
[02:25] <Keybuk> fabbione: you'd be in the ten-minutes-to-boot area with that delay
[02:25] <fabbione> Keybuk: i don't need to run it at boot :)
[02:25] <fabbione> i can test it in userland
[02:25] <fabbione> i get the same error
[02:26] <fabbione> in both situation
[02:26] <fabbione> 10 minutes.. no problem ;)
[02:26] <Keybuk> run "udevplug -s" to test if that's the case
[02:26] <Keybuk> that waits for the previous event to be processed before sending the next
[02:26] <fabbione> what does -s do?
[02:26] <fabbione> ok
[02:26] <fabbione> sure i can do
[02:26] <fabbione> in a few minutes..
[02:26] <fabbione> i am enjoying this extremely cleaned up d-i
[02:27] <Keybuk> BenC: could be caused by nlmsg_new not returning a message, though I can't find that function
[02:27] <BenC> do_one_broadcast() also has a few failure points
[02:27] <Keybuk> static inline struct sk_buff *nlmsg_new(int size)
[02:27] <Keybuk> {
[02:27] <Keybuk>         return alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL);
[02:27] <Keybuk> }
[02:28] <BenC> netlink_broadcast_deliver() seems like a likely candidate
[02:29] <Keybuk> yeah, there's a whole bunch of them there
[02:29] <BenC> it tries to push one to the queue
[02:29] <Keybuk> if only we could tag it so we knew which one it was
[02:29] <BenC> if fabio's test doesn't prove anything I'll start sprinkling some printk's to find out where it is failing
[02:29] <Keybuk> ah, that pushes into an actual socket rcvbuf ... so if that was full, then it wouldn't fit
[02:29] <BenC> right
[02:30] <Keybuk> interesting that this has only affected sparc so far though
[02:30] <fabbione> hmmm
[02:30] <BenC> hey, can you increase udev's socket bufsize?
[02:30] <fabbione> Keybuk: it might be a cpu speed issue?
[02:30] <Keybuk> I'd've thought an amd64 would show up more
[02:30] <BenC> Keybuk: sparc64 is slower
[02:30] <Keybuk> ahh, of course, sparc is slower so udevd gets less time because the kernel is using it all
[02:30] <BenC> it's odd that it happens on a 6-way system though
[02:30] <Keybuk> BenC: remind me how to do that
[02:31] <BenC> my sparc64 has 6 cpu's and 6gigs of ram :/
[02:31] <Keybuk> it's a setsockopt isn't it?
[02:31] <fabbione> "mine" only 32 CPUs and 16GB of ram
[02:31] <Keybuk> fabbione: cat /proc/sys/net/core/rmem_max /proc/sys/net/core/rmem_default
[02:32] <fabbione> Keybuk: you will have to wait.. i am in the middle of testing parted.. just a few minutes
[02:32] <BenC> Keybuk: maybe setsockopt(), let me check
[02:33] <fabbione> rebooting now...
[02:33] <BenC> fabbione: maybe we have just too much memory/cpu and it's confusing udev :)
[02:33] <fabbione> BenC: possibly
[02:33] <fabbione> BenC: these udev hackers and their laptops
[02:33] <BenC> lol
[02:33] <BenC> Keybuk: IIRC, you can set the buffer to a local one
[02:35] <BenC> SO_RCVBUF maybe
[02:37] <Keybuk> it already sets that SO_RCVBUFFORCE thing
[02:38] <BenC> what does that do?
[02:39] <Keybuk> dunno
[02:40] <fabbione> re
[02:40] <Keybuk> ah
[02:40] <Keybuk> got it, sets the rcvbuf and forces it over the maximum if necessary
[02:41] <Keybuk>         const int buffersize = 16 * 1024 * 1024;
[02:41] <fabbione> Keybuk: ok.. now i am without network
[02:41] <Keybuk>         setsockopt(uevent_netlink_sock, SOL_SOCKET, SO_RCVBUFFORCE, &buffersize, sizeof(buffersize));
[02:41] <Keybuk> so that forces the rcvbuf of that socket to 16MB
[02:41] <fabbione> root@sunrise:~# cat /proc/sys/net/core/rmem_max /proc/sys/net/core/rmem_default
[02:41] <fabbione> 131071
[02:41] <fabbione> 124928
[02:41] <Keybuk> which is roughly 16,000 uevents
[02:41] <BenC> Keybuk: maybe that's broken on sparc
[02:42] <fabbione> BenC: or events are bigger?
[02:42] <Keybuk> events are fixed at 1024 in the kernel and in udev
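The numbers quoted so far can be checked with a little arithmetic. The sketch below assumes every uevent occupies exactly the 1024-byte buffer mentioned above and ignores the per-message bookkeeping the kernel also charges against the socket buffer, so the real capacity would be somewhat lower. `events_that_fit` is a hypothetical helper name.

```python
UEVENT_SIZE = 1024  # per-event size quoted in the log


def events_that_fit(rcvbuf_bytes, event_size=UEVENT_SIZE):
    """Upper bound on full-size events that fit in a receive buffer."""
    return rcvbuf_bytes // event_size


forced = 16 * 1024 * 1024  # size udevd requests via SO_RCVBUFFORCE
rmem_max = 131071          # /proc/sys/net/core/rmem_max on sunrise
rmem_default = 124928      # /proc/sys/net/core/rmem_default on sunrise

print(events_that_fit(forced))        # 16384
print(events_that_fit(rmem_max))      # 127
print(events_that_fit(rmem_default))  # 122
```

If the forced 16MB size did not take effect for some reason, the socket would be left with room for only a little over a hundred full-size events, which a burst of a few hundred could overflow. That is only a possibility consistent with the figures, not something the log confirms.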
[02:42] <Keybuk> fabbione: try "udevplug -s -v | tee events.txt" then count how many lines you get :)
[02:43] <fabbione> Keybuk: udevplug -s is running
[02:43] <Keybuk> ah, ok
[02:43] <fabbione> i can do that later again :)
[02:43] <fabbione> the error did always appear before..
[02:44] <fabbione> so one run more or less won't change my life
[02:45] <Keybuk> ok
[02:45] <BenC> Keybuk: are you checking the return value of setsockopt()?
[02:45] <Keybuk> see whether it appears this run first
[02:46] <Keybuk> BenC: no...
[02:46] <BenC> it may be failing
[02:46] <Keybuk> BenC: looking at the code, it can only fail with -EPERM
[02:47] <BenC> yeah, but it could also be something as stupid as a signed extension or 32/64 value that is getting junked and causing it to be set at minimum bufsiz
[02:47] <BenC> but that wouldn't error out
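One way to rule out a silently wrong buffer size is to read the value back after setting it. The sketch below is a Python illustration rather than udevd's C code, and it deliberately uses plain `SO_RCVBUF` on a UDP socket instead of `SO_RCVBUFFORCE` on a netlink socket, because the forced variant needs CAP_NET_ADMIN. With plain `SO_RCVBUF`, Linux silently caps the request at `rmem_max` and reports back double the value it stored, so the granted size can differ from the requested one without any error. `request_rcvbuf` is a hypothetical helper name.

```python
import socket

REQUESTED = 16 * 1024 * 1024  # same 16MB figure as in the log


def request_rcvbuf(sock, requested):
    """Ask for a receive buffer size; return (granted, error)."""
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
    except OSError as err:
        # Python raises where C would return -1 and set errno.
        return None, err
    # Read back what the kernel actually applied.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF), None


with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    granted, err = request_rcvbuf(sock, REQUESTED)
    if err is not None:
        print("setsockopt failed:", err)
    else:
        print("requested", REQUESTED, "granted", granted)
```

Comparing the two numbers would show whether a size mangled on the way into the kernel left the buffer near its minimum, which is the case raised above that "wouldn't error out".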
[02:47] <BenC> amd64 isn't doing this in 32-bit
[02:47] <fabbione> Keybuk: it looks like that the run did not generate the error, but it still doesn't bring up the network
[02:48] <fabbione> BenC: amd64 doesn't need memory to be aligned at 64 bit
[02:48] <fabbione> that can cause issues
[02:48] <fabbione> like it was with apt in breezy
[02:48] <BenC> no, I mean amd64 isn't pushing this through a compat layer
[02:48] <fabbione> yes i understand
[02:48] <fabbione> are we?
[02:48] <fabbione> yes
[02:48] <fabbione> i think..
[02:49] <fabbione> Keybuk: i have the events file..
[02:50] <Keybuk> fabbione: just "wc -l" it
[02:50] <fabbione> wc -l events.txt 
[02:50] <fabbione> 0 events.txt
[02:50] <zul> heyl
[02:50] <fabbione> ?
[02:50] <fabbione> hey zul 
[02:50] <BenC> looks like it just pushes it through as a compat, no translation
[02:51] <Keybuk> fabbione: you ran with -v ?
[02:51] <fabbione> udevplug -s -v | tee events.txt
[02:51] <Keybuk> weird
[02:51] <fabbione> exactly as you wrote it
[02:51] <Keybuk> dunno why that didn't give you anything
[02:51] <Keybuk> does it without the | tee ?
[02:51] <Keybuk> ie just udevplug -s -v ?
[02:51] <fabbione> yeah
[02:51] <BenC> Keybuk: doesn't give me anything on my amd64 box either
[02:52] <fabbione> udevplug -s -v | wc -l 
[02:52] <Keybuk> fabbione: is /sys mounted? :)
[02:52] <BenC> ah, are you in a chroot?
[02:52] <fabbione> Keybuk: you joking right? it's ubuntu running system 100%
[02:52] <fabbione> no chroot
[02:52] <Keybuk> find /sys -name uevent
[02:53] <fabbione> find /sys -name uevent | wc -l
[02:53] <fabbione> 730
[02:53] <Keybuk> BenC: it works just fine on mine, I get a huge number of /sys lines printed
[02:53] <BenC> I do now that /sys is mounted
[02:54] <BenC> I need a serial console to my sparc so I can test this stuff too
[02:54] <Keybuk> fabbione: what about just "udevplug -v" does that print anything?
[02:54] <BenC> I ran udevplug under linux32 in an i386 chroot on my amd64, which should use the same codepath as sparc, and it was just fine
[02:55] <fabbione> Keybuk: checking in a sec
[02:55] <Keybuk> BenC: yeah, I've just done the same
[02:55] <Keybuk> quest scott# time udevplug -v | wc -l
[02:55] <Keybuk> 773
[02:55] <Keybuk> udevplug -v  0.01s user 0.03s system 2% cpu 1.685 total
[02:55] <fabbione> hmm
[02:55] <fabbione> it's taking too long
[02:56] <Keybuk> fabbione: check you don't have an empty /dev/.udev/queue directory (just sudo rmdir it)
[02:56] <fabbione> Keybuk: yeah that's where i was sticking my nose :)
[02:56] <Keybuk> that's a common failure mode of previous udevd
[02:56] <fabbione> time udevplug -v | wc -l
[02:56] <fabbione> 735
[02:56] <fabbione> real    0m1.093s
[02:56] <fabbione> user    0m0.048s
[02:56] <fabbione> sys     0m0.164s
[02:56] <Keybuk> could also explain why -s doesn't work (it's also waiting for that to go away first, and just times out after three minutes)
[02:56] <Keybuk> ok
[02:56] <fabbione> so now
[02:57] <fabbione> let's try again the -s
[02:57] <Keybuk> well, you don't have any more events than my amd64, slightly less in fact
[02:57] <fabbione> i got the error in syslog
[02:57] <Keybuk> those should take only 735,000 bytes of memory to hold in the kernel
[02:57] <Keybuk> BenC: do you know of a way to find out the size of a socket from userspace?
[02:57] <Keybuk> lsof?
[02:58] <fabbione> hmmm this is interesting
[03:00] <fabbione> Keybuk: if i run udevplug -v -s
[03:00] <fabbione> it gets to /sys/class/vc/vcsa
[03:00] <fabbione>  /sys/devices/pci0000:02
[03:00] <fabbione> and it stalls there
[03:00] <Keybuk> probably aborts with SIGALRM :)
[03:01] <fabbione> no no
[03:01] <fabbione> udevplug is still running
[03:01] <fabbione> i had to ctrl+c
[03:01] <Keybuk> odd
[03:01] <Keybuk> what's in /dev/.udev/queue?
[03:01] <fabbione> no queue
[03:01] <Keybuk> hmm
[03:01] <Keybuk> strace it
[03:03] <fabbione> it's polling queue
[03:03] <fabbione> probably udev is still processing
[03:03] <Keybuk> does queue still exist?
[03:03] <fabbione> yes
[03:04] <fabbione> but it's empty
[03:04] <Keybuk> ah
[03:04] <Keybuk> did you rmdir it first?
[03:04] <fabbione> yes
[03:06] <fabbione> like i said
[03:06] <fabbione> udevplug did generate the queue
[03:06] <fabbione> and waiting for it to disappear
[03:06] <fabbione> but it's empty
[03:06] <BenC> Keybuk: getsockopt()
[03:06] <fabbione> Keybuk: udevd is doing nothing
[03:07] <Keybuk> that's weird
[03:07] <Keybuk> that suggests the kernel never gave the event to udevd
[03:07] <fabbione> well it did
[03:07] <Keybuk> can you run "udevmonitor -e" as well?
[03:07] <fabbione> otherwise i would have no / ;)
[03:07] <Keybuk> if it had given the event to udevd, udevd would have done something and removed the queue directory
[03:08] <fabbione> ok one sec..
[03:09] <fabbione> i am setting up a slightly better env
[03:09] <fabbione> like 20 xterm
[03:09] <Keybuk> udevmonitor should give you a UEVENT and UDEV for each things udevplug prints (with -s)
[03:10] <fabbione> root@sunrise:/dev/.udev# ls -asl
[03:10] <fabbione> total 0
[03:10] <fabbione> 0 drwxr-xr-x  4 root root    80 Feb 24 06:09 .
[03:10] <fabbione> 0 drwxr-xr-x 14 root root 13920 Feb 24 06:05 ..
[03:10] <fabbione> 0 drwxr-xr-x  2 root root   520 Feb 24  2006 db
[03:10] <fabbione> 0 drwxr-xr-x  2 root root    40 Feb 24 06:09 failed
[03:10] <fabbione> root@sunrise:~# udevmonitor -e
[03:10] <fabbione> udevmonitor prints the received event from the kernel [UEVENT] 
[03:10] <fabbione> and the event which udev sends out after rule processing [UDEV] 
[03:10] <fabbione> ok?
[03:10] <fabbione> do we agree that it is ok?
[03:10] <Keybuk> ok
[03:10] <fabbione> i can see the events
[03:11] <Keybuk> that's a good starting point
[03:11] <Keybuk> "udevplug -s -v" on that ... for each thing it prints you should see a UEVENT and then a UDEV for it
[03:11] <Keybuk> if you get no UEVENT, then that's bad
[03:11] <Keybuk> if you get a UEVENT and no UDEV, that's even worse
[03:11] <fabbione> UEVENT[1140790262.767691]  add@/class/vc/vcsa
[03:11] <fabbione> UDEV  [1140790262.770291]  add@/class/vc/vcsa
[03:11] <fabbione> ok?
[03:11] <Keybuk> ok
[03:11] <Keybuk> and udevplug printed /sys/class/vc/vcsa as well?
[03:12] <fabbione> it didn't get that far???
[03:12] <fabbione>  /sys/class/tty/tty4
[03:12] <fabbione> it stopped a few letters before
[03:12] <fabbione> make that 40
[03:12] <fabbione> it did skip 4
[03:12] <fabbione> meh 0
[03:13] <fabbione> there is a queue
[03:13] <fabbione> and it is empty
[03:13] <Keybuk> ok, what was the last thing udevplug printed?
[03:13] <fabbione> yes
[03:13] <Keybuk> what was?
[03:13] <fabbione>  /sys/class/tty/tty4
[03:13] <Keybuk> ok
[03:13] <fabbione> now
[03:13] <Keybuk> what was the last UEVENT/UDEV combo printed?
[03:14] <fabbione> ok one second dude
[03:14] <fabbione> when udevplug was at /sys/class/tty/tty4
[03:14] <fabbione> udevmonitor was printing the vcsa
[03:14] <Keybuk> that's a bit weird
[03:14] <fabbione> there was a queue and it was empty
[03:14] <fabbione> now one more thing
[03:14] <fabbione> i did remove the queue
[03:14] <fabbione> and saw it recreated
[03:14] <fabbione> more events did pass
[03:15] <fabbione> and udevplug did finish
[03:15] <Keybuk> (btw, worth noting that "udevplug -s" is not very well tested)
[03:15] <Keybuk> it could be just a general bug with it
[03:15] <fabbione> so ok.. what do you want me to test next?
[03:15] <Keybuk> hmm
[03:15] <Keybuk> so udevplug completed normally
[03:15] <Keybuk> and you didn't get that error
[03:15] <fabbione> nope
[03:16] <Keybuk> nope it didn't complete normally?
[03:16] <fabbione> but now i don't know if it would have loaded the module
[03:16] <fabbione> nope = no error
[03:16] <fabbione> i think the problem is here:
[03:16] <Keybuk> right, so this suggests that the netlink buffer doesn't overflow if you go slowly
[03:16] <fabbione> tty4 was way before than /sys/class/vc/vcsa
[03:17] <Keybuk> how do you know? :)
[03:17] <fabbione> (in the print from udevplug)
[03:17] <fabbione> the print order?
[03:17] <fabbione> now
[03:17] <fabbione> listen up
[03:17] <Keybuk> did udevplug never do vcsa before then?
[03:17] <fabbione> no it didn't
[03:17] <fabbione> it did it after
[03:17] <fabbione> if you let me :)
[03:18] <fabbione> udevplug was printing tty4 - udevmonitor was at vcsa
[03:18] <fabbione> the line after vcsa in udevplug is /sys/devices/pci0000:02
[03:18] <fabbione> the same where it was hanging a long time before
[03:18] <fabbione> now...
[03:19] <Keybuk> yeah, I get the same behaviour here (though udevplug actually prints that ... I suggest your stdout buffers weren't flushed <g>)
[03:19] <fabbione> could it be a bug in /sys parsing of that device?
[03:19] <Keybuk> this is just a "-s" bug
[03:20] <fabbione> next test...
[03:20] <fabbione> queue was never deleted tho
[03:20] <fabbione> if i run only udevplug
[03:20] <fabbione> i can see all the events and the error
[03:20] <fabbione> that happens very early
[03:20] <pappan> is there kernel debugging tool in ubuntu
[03:20] <fabbione> almost at the beginning
[03:21] <Keybuk> right, because udevd had finished processing the event, and was waiting for the next ... where udevplug had made the queue directory and then ended up waiting on it
[03:21] <pappan> i am facing a problem with reboot in my laptop
[03:21] <fabbione> but if i run it normally, the queue dir disappear
[03:22] <Keybuk> I'll have to debug that, but it's a reasonably safe race :)
[03:22] <Keybuk> so ignore that for now
[03:22] <fabbione> Keybuk: so do you think the bug is from udev itself?
[03:22] <Keybuk> fabbione: no, I think this is a kernel bug
[03:23] <fabbione> i am not really worried about the message itself
[03:23] <Keybuk> sending the events slowly seems to not produce ENOBUFS
[03:23] <Keybuk> sending them at normal speed produces it
[03:23] <Keybuk> so, BenC, can we get some printk()s to find out which -ENOBUFS that is?
[03:23] <fabbione> it's only annoying that it doesn't bring up the ethernet
[03:23] <fabbione> anyway i need to take off for a while now
[03:23] <fabbione> Keybuk: thanks a lot dude
[03:23] <BenC> Keybuk: yeah, let me get my sparc back up
[03:24] <fabbione> later guys
[03:25] <zul> toodles
[03:29] <Keybuk> damn, that -s bug is entirely consistent at the first pci device
[03:30] <Keybuk> tickle: uevent: '/sys/devices/pci0000:00/uevent'
[03:30] <Keybuk> make_queue: directory: '/dev/.udev/queue'
[03:30] <Keybuk> create_path: stat '/dev/.udev'
[03:30] <Keybuk> wait_for_queue: directory: '/dev/.udev/queue'
[03:30] <Keybuk> oh
[03:30] <Keybuk> because tickling /sys/devices/pci0000:00/uevent DOES NOTHING
[03:31] <Keybuk> BenC: kernel bug! kernel bug! kernel bug! :)
[03:31] <BenC> Keybuk: stop picking on me! :)
[03:32] <Keybuk> (this is irrelevant to the ENOBUFS error, it's just amusing to find more errors along the way to debugging that one)
[03:32] <BenC> I think ENOBUFS is a red herring
[03:33] <Keybuk> you do?
[03:33] <BenC> more than likely our problem is more related to uevents not getting where they should
[03:33] <Keybuk> yeah, but if the socket buffer is full, they won't get there
[03:34] <Keybuk> the fact that pci0000:00 doesn't generate a uevent is only important when using "-s" where it waits patiently for the event ... during normal booting it's irrelevant
[03:34] <BenC> I can't see generic sockets getting full...lots of things would be broken
[03:34] <Keybuk> aye
[03:34] <Keybuk> udevd used to do it quite regularly until they increased the size to 16MB
[03:34] <BenC> were the symptoms the same?
[03:35] <Keybuk> yeah, I think so
[03:35] <BenC> I just can't see 16k uevents occurring, even on a sparc :)
[03:35] <Keybuk> we know there's only 730 events
[03:35] <BenC> so filling it would take a lot of effort
[03:35] <Keybuk> which is less than my amd64
[03:35] <Keybuk> I'm wondering whether it's actually that there's an event bigger than 1K
[03:35] <Keybuk> an event is just an env buffer, after all
[03:36] <BenC> true...guess I can put some debug to see what the event size is
[03:36] <BenC> or just check for > 1k
[03:39] <Keybuk> I wonder ...
[03:39] <Keybuk> the fact udev wants the socket buffer to be 16MB is just a hint about how big it should never grow past
[03:39] <Keybuk> it doesn't mean it can actually grow that big, the kernel might not have any free memory to grab
[03:39] <Keybuk> so it may actually be effectively smaller than the 730K needed to do the job
[07:35] <shaya> anyone home?
[07:36] <shaya> just filed a bug, can help try to debug it if that would help?
[11:07] <cjb> Sorry, stupid question:  dmesg/lspci dumps for lkml, should they go inline or attached?
[11:07] <cjb> (The only examples I can find were all attached, so I wonder if there's some differing standard for dmesg as opposed to patches.)