[00:20] there's room for the usb end of things to be clever as well; like only sending drawing operations or doing delta coding in the driver
[00:21] if it were pushing new pixels for every displayed frame it'd be pretty gimped
[00:23] buffalo has a wireless usb one too
[00:28] -ECHAN? :)
=== lucent_ is now known as lucent
=== _LibertyZero is now known as LibertyZero
=== smb` is now known as smb
[08:28] * smb yawns
[08:28] * apw knows how smb feels
[08:29] * smb checks the clock and decides apw must be an imagination at this time of day
[10:04] smb, ping
[10:04] psurbhi, Yup, what's up?
[10:05] smb, hie
[10:05] wanted to discuss the nfs bug
[10:06] in case of Lucid, does not commit_inode() call WRITE rather than COMMIT?
[10:06] Sure. Well from my tests it does do a WRITE with UNSTABLE and then a COMMIT
[10:06] Which is valid nfs v3 behavior
[10:07] is this what you see for Lucid?
[10:08] The issue seems to be that the description for what O_SYNC does for a nfs filesystem does sound like it should switch back to the nfs v2 mode
[10:08] Which is to send every write with the FILE_SYNC flag set
[10:08] forcing the server to write out every packet before replying
[10:08] i thought that in Lucid, instead of a COMMIT a WRITE occurs
[10:08] while nfs v3 uses the commit packet to force a write out
[10:09] in the upstream code, nfs_file_fsync() eventually calls COMMIT procedure (at the client side)
[10:09] Not according to my log. And even from what I read from the bug report
[10:09] the bug report is for lucid
[10:09] for lucid i thought that a WRITE is called
[10:09] The bug report is about the fact that they use O_SYNC for the file open and still get the write/commit sequence
[10:10] ok..
[10:10] in case of upstream instead of the WRITE, COMMIT is called
[10:10] WRITE gets called in both cases but for the old mode it sets FILE_SYNC and for the new one it uses UNSTABLE
[10:10] and additionally in the new case the WRITE is followed by a commit
[10:10] my understanding is that in the new mode (in the upstream kernel) it calls only COMMIT if the sync flag is used
[10:11] No
[10:11] and since the data does get to the backing storage this should suffice
[10:11] You see a write, write reply and commit, commit reply when you look at the screenshot
[10:11] for upstream code or lucid?
[10:11] both
[10:11] for lucid, i cannot seem to see the commit
[10:12] and by tracing the upstream code i thought that nfs_file_fsync only calls COMMIT procedure
[10:12] no?
[10:12] no
[10:12] but the same nfs_file_fsync in lucid calls WRITE
[10:13] let me check again
[10:13] The writes use unstable (you cannot just call commit)
[10:13] commit does nothing but to indicate a writeout/sync to the server
[10:13] look at comment #10 in the private bug
[10:14] psurbhi, And that is exactly the same as I saw in my wireshark logs
[10:14] in the upstream code, i can see that nfs_file_fsync() -> nfs_commit() -> nfs_commit_list() -> nfs_commit_rpcsetup()
[10:14] which ultimately calls COMMIT procedure
[10:15] This is the function called for a sync
[10:15] And only does a sync
[10:15] correct
[10:15] The bug is about the fact that the write request(s) use UNSTABLE and not NFS_FILE_SYNC as flags to the write requests
[10:15] but this is the same function called for O_SYNC too
[10:16] so what i am saying is this
[10:16] that the semantics of O_SYNC say that the data needs to be backed on stable storage
[10:16] nfs specification does not say how to do that
[10:16] so a client is free to do it with a COMMIT
[10:16] or with a WRITE
[10:16] where argument stable is FILE_SYNC
[10:17] in case of upstream code this is done with COMMIT
[10:17] in case of Lucid
[10:17] nfs_do_fsync() ultimately calls WRITE
[10:17] and does not mark the stable arg as "FILE_SYNC"
[10:17] so i think the code works well in upstream code
[10:17] but not for lucid
[10:19] Well first. The wireshark logs look exactly the same for upstream and lucid. So I do not see where you get the wrong behavior for Lucid
[10:19] does the COMMIT arise because of writeback behavior in Lucid?
[10:19] rather than a synchronous behavior?
[10:19] Second the new mode is nfs v3 behavior and documentation about O_SYNC says it is supposed to switch back the way things are done to the mode nfs v2 used
[10:20] which is to mark every write with sync
[10:20] but the nfs specification does not say anything about it
[10:20] or no?
[10:21] i mean the rfc 1813
[10:21] The nfs spec says that a v3 client can use the new mode and the server responds in the new way. Or the old mode and the server responds in the old way. The spec does not directly say anything about O_SYNC there. That I found in some description about the file flags in Linux
[10:22] correct
[10:22] http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html#MOUNTOPTIONS
[10:22] In addition to the above definition of synchronous behavior, the client may explicitly insist on total synchronous behavior, regardless of the protocol, by opening all files with the O_SYNC option. In this case, all replies to client requests will wait until the data has hit the server's disk, regardless of the protocol used (meaning that, in NFS version 3, all requests will be NFS_FILE_SYNC requests, and will require that the Server returns this status). In that case, the performance of NFS Version 2 and NFS Version 3 will be virtually identical.
[10:24] i agree on the part of hitting the stable storage
[10:24] but that could be done with COMMIT too
[10:24] so i don't see why this should be a bug?
[10:24] meaning that, in NFS version 3, all requests will be NFS_FILE_SYNC requests
[10:25] which is not the case
[10:25] i think that NFS_FILE_SYNC is the return value?
[10:25] no, it's in the request as well
[10:25] and the description is when the whole filesystem is mounted with sync option
[10:25] somehow semantically it's the same
[10:25] as the data needs to be written to the backing storage
[10:26] when the file is opened with a sync
[10:26] O_SYNC flag
[10:26] and about the O_DIRECT
[10:26] nfs_file_direct_write() is called in this case
[10:26] the comment for this code says the following
[10:26] We avoid unnecessary page cache invalidations for normal cached
[10:26] readers of this file
[10:27] I am not sure this is a good thing
[10:27] also, the rpcs are often optimised depending on if data is found in the client cache
[10:27] in this case, since we bypass the client cache totally
[10:27] there is no opportunity to do this
[10:27] The sentence above talks about opening *files* (regardless of the mount option used)
[10:27] but the bug is not about the mount option
[10:27] it's about opening a file
[10:27] with O_SYNC
[10:28] so the filesystem might not have been mounted with sync flag
[10:28] And that using O_DIRECT changes it is rather lucky as the testcase only uses a short data size
[10:28] i think the test case was just to show us
[10:28] that the bug exists
[10:28] in generic terms
[10:28] it's not correct to use O_DIRECT
[10:28] when u need O_SYNC
[10:29] there could be multiple apps opening the same file at the client side
[10:29] or one app opening the file multiple times as well
[10:29] in such cases O_DIRECT for getting FILE_SYNC would not be the same
[10:30] And I just said that using O_DIRECT I can see that the client can use the other mode. Not necessarily that this always will result in the right behavior
[10:30] correct
[10:30] i agree on that
[10:30] The bug is about the fact that O_SYNC is described as changing the behavior
[10:30] but there was a comment about using this as a workaround
[10:30] regardless whether the mount option sync was used
[10:30] and i don't see this being a workaround
[10:30] smb, exactly
[10:30] so i am saying O_DIRECT cannot be a workaround
[10:31] and I feel that in upstream code nfs_file_fsync() is used for O_SYNC
[10:31] and this uses COMMIT to guarantee data sync on stable storage at server side
[10:31] so i think this cannot be an upstream bug
[10:31] but of course these are my thoughts
[10:31] if they help, then great..
[10:31] :)
[10:32] Well this might be a shortcoming of the testcase
[10:32] ok
[10:32] I can see a commit but this may result from the file getting closed
[10:32] exactly
[10:32] in case of lucid
[10:32] this is what i feel
[10:32] :)
[10:33] whereas in upstream, irrespective of close, if O_SYNC is used, then COMMIT will happen
[10:34] Ok, I see your point now
[10:34] ok, great :) thanks !!
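The two request sequences argued over above can be sketched with a toy model. This is not NFS client or kernel code: `ToyServer`, `v2_style` and `v3_style` are hypothetical names made up for illustration, and only the `stable_how` values (UNSTABLE, DATA_SYNC, FILE_SYNC) come from RFC 1813. The sketch only shows that both sequences leave the same data on stable storage while the requests on the wire differ, which is the point of disagreement in the discussion.

```python
from enum import IntEnum


class StableHow(IntEnum):
    """stable_how argument of the NFSv3 WRITE procedure (RFC 1813)."""
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


class ToyServer:
    """Hypothetical in-memory stand-in for an NFSv3 server (illustration only)."""

    def __init__(self):
        self.cache = []    # acknowledged, but not yet on stable storage
        self.disk = []     # on stable storage
        self.rpc_log = []  # procedures received, in order

    def write(self, data, stable):
        self.rpc_log.append(f"WRITE({stable.name})")
        if stable is StableHow.UNSTABLE:
            self.cache.append(data)   # server may reply before data is stable
        else:
            self.disk.append(data)    # toy simplification: stable before the reply

    def commit(self):
        self.rpc_log.append("COMMIT")
        self.disk.extend(self.cache)  # flush everything written UNSTABLE so far
        self.cache.clear()


def v2_style(server, chunks):
    """Every WRITE is flagged FILE_SYNC; no COMMIT is needed."""
    for chunk in chunks:
        server.write(chunk, StableHow.FILE_SYNC)


def v3_style(server, chunks):
    """WRITEs are UNSTABLE and a trailing COMMIT forces them out."""
    for chunk in chunks:
        server.write(chunk, StableHow.UNSTABLE)
    server.commit()


if __name__ == "__main__":
    for send in (v2_style, v3_style):
        server = ToyServer()
        send(server, [b"a", b"b"])
        print(send.__name__, server.rpc_log, server.disk)
```

In this model the only observable difference between the two modes is `rpc_log`, which corresponds to what shows up in a wireshark trace.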
[10:35] psurbhi, Though I think that leads us to two issues
[10:35] First, in lucid there might be a general problem of having not all the writes followed by a commit
[10:35] yessss
[10:35] :)
[10:36] in lucid fsync() too would never guarantee a COMMIT
[10:36] unless wbc->reclaim is set to 1
[10:36] But secondly and this would be an upstream problem still, is that O_SYNC should give nfs2 behavior, which means all the writes should use writes with NFS_FILE_SYNC as a flag
[10:36] ok, on this one i differ
[10:37] because of what happens eventually to the data
[10:37] and by nfs rfc
[10:37] so that the client should be free to send whatever it wants to get the data to the disk
[10:37] could be through WRITE with stable=FILE_SYNC or with COMMIT
[10:37] This is not about the data it is just : "ll requests will be NFS_FILE_SYNC requests"
[10:37] all
[10:37] which will be the data+meta data
[10:37] which will happen with the COMMIT
[10:38] smb, correct
[10:38] and this is what i felt at first too
[10:38] but then decided against it because of the end result
[10:38] IF all requests (including the write) are flagged with sync, you do not need the commit
[10:39] only write needs to be flagged with SYNC
[10:39] O_SYNC guarantees that data is backed to the stable storage
[10:39] how the client achieves it
[10:39] should not matter
[10:40] but i could be wrong here... maybe the semantics of the client matter
[10:40] i am not sure
[10:40] but to me really since the data gets backed up
[10:40] O_SYNC from the application's perspective is achieved
[10:41] psurbhi, yeah but that ignores the specific paragraph in the manual page which modified the behaviour ... right or wrong people should be able to assume the documented behaviour
[10:42] or it could be a documentation bug
[10:43] apw, which man page is this?
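From the application's side, the two ways of asking for durability look roughly like the sketch below. It assumes a Unix-like system where `os.O_SYNC` is available; the function names are made up for illustration, and the code says nothing about which RPCs an NFS client actually emits underneath, which is exactly the open question in the discussion.

```python
import os


def _write_all(fd, data):
    """os.write may write fewer bytes than asked, so loop until done."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_with_o_sync(path, data):
    """Open with O_SYNC: each write() returns only once the data is reported stable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)


def write_then_fsync(path, data):
    """Plain buffered writes followed by one explicit fsync()."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # returns only once the data is reported stable
    finally:
        os.close(fd)
```

Both functions return with the same file contents in place; an application can only tell them apart by timing or by looking at the traffic on the wire.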
[10:43] psurbhi, yeah it could, but advertised behaviour is often hard to change, even if it was a stupid thing to say originally :)
[10:43] agreed
[10:43] which man page is this really?
[10:43] * apw loses it, hang on
[10:44] also, i wonder why an application should look at the rpc flags? rather than the behavior?
[10:46] apw, I found that on http://www.faqs.org/docs/Linux-HOWTO/NFS-HOWTO.html#MOUNTOPTIONS
[10:46] smb, ok
[10:46] Not sure how official that is, but it may be a place the bug reporters looked at
[10:52] apw, smb, see you around then!
[10:52] o/
[10:53] psurbhi, sorry chatting about the issue on mumble
[10:53] ok
[10:58] you are of course welcome
[11:32] <_ruben> howdy, is there an "easy" way to capture kernel panics, as they tend to scroll off screen
[11:44] _ruben: there are some ways to capture crashes, a classic way is using serial console if you have a serial port on the machine
[11:45] for serial console, on the machine you want to capture the crash, you enable serial console with:
[11:45] <_ruben> it's a vm (vmware esxi), so a virtual serial cable might be an option
[11:45] console=tty console=ttyS0
[11:46] ah ok, so you can try this way, enable serial console on the guest, and then attach a minicom etc. on capture machine/host
[11:48] probably for vmware the setup is similar as 'Serial Console in VirtualBox' listed here: https://wiki.ubuntu.com/Kernel/KernelDebuggingTricks
[11:48] you may want to check that out
[11:50] <_ruben> seems i can bind a serial port to a ip:port in esxi
[11:51] <_ruben> if only it'd work
[12:05] * apw wanders to find a monitor ...
[12:12] <_ruben> ah nice .. got it to work and caught the oops
[12:13] <_ruben> wonder if this is a bird or kernel issue : http://pastebin.ubuntu.com/564390/
[12:15] <_ruben> i'd be inclined to think kernel, since quagga causes similar crashes
[12:30] <_ruben> hm, this one doesn't even mention bird: http://pastebin.ubuntu.com/564402/
[12:41] <_ruben> and then there's http://pastebin.ubuntu.com/564405/ .. everything's ipv6 related, but i don't see much (other) common factors
[13:01] _ruben: not much ideas on this, the kernel shouldn't oops, I'm downloading the debugging symbols from this kernel to check more
[13:02] (the ddeb package)
[13:08] <_ruben> herton: the oopses happen within a minute after i enable ospf in bird6 .. since ipv6 support is fairly new in bird, it could very well be a problem with bird(6) .. but even then, the kernel should be robust enough i'd say to handle userland "misbehaviour"
[13:09] _ruben: meanwhile, you can try to boot with the slub_debug kernel parameter and test again, see if you get a report/different trace from slub, just to check if it isn't some sort of memory corruption (double kfree, use after free, etc.) which makes code crash later, and with different stack traces
[13:13] <_ruben> herton: let me try that
=== artir is now known as afk|artir
[13:15] <_ruben> herton: wow .. adding that option made it crash before i had a chance to log in
[13:16] <_ruben> hm, that was an ipsec related one
[13:17] _ruben: did you get the oops/trace of this crash? probably it is what is causing the corruption/bug
[13:19] <_ruben> ah, that's quite likely actually .. part of my ipv6 stuff runs over an ipsec tunnel with ipv4 on the outside and ipv6 on the inside .. and using the klips stack at that, let's try it with netkey .. and yeah, i got the oops, will pastebin it in a sec
[13:20] <_ruben> http://pastebin.ubuntu.com/564425/
[13:32] _ruben: most likely ipsec module is broken, it isn't stock in the kernel, do you know where you got it from?
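For the serial-port-bound-to-ip:port setup _ruben mentions earlier, a capture script on another machine could look like the sketch below. It assumes the guest was booted with `console=ttyS0` as suggested above and that the hypervisor exposes the virtual serial port as a plain TCP listener; the function name and any host, port and log path passed to it are hypothetical, not taken from the log.

```python
import socket


def capture_console(host, port, log_path, timeout=None):
    """Append everything received from a TCP-exposed serial console to log_path.

    Returns the number of bytes captured once the peer closes the connection.
    """
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as conn, \
            open(log_path, "ab") as log:
        while True:
            chunk = conn.recv(4096)
            if not chunk:   # empty read: the other side closed the connection
                break
            log.write(chunk)
            log.flush()     # keep the file current so a trace is not lost in a buffer
            total += len(chunk)
    return total
```

Because the file is flushed after every chunk, an oops that scrolls off the guest's screen still ends up in the log even if the capture is interrupted afterwards.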
=== afk|artir is now known as artir
[13:33] I went to check but ipsec isn't in the debug package/kernel, so it should be an externally built module
[13:37] <_ruben> herton: it's part of openswan .. and there ipv6 is quite "new" as well .. i'll take it up with them .. currently running with netkey stack and so far no problems (but also no full functionality yet, config issues)
[13:37] <_ruben> herton: thanks for helping so far tho
[13:40] _ruben: no problem. and indeed then it's likely to be a bug in openswan
[13:59] <_ruben> now to figure out why this isn't working with the netkey stack .. grr
=== diwic is now known as diwic_afk
[14:44] apw: poke? :)
=== sconklin-gone is now known as sconklin
[14:45] I'm looking for volunteers for a ubuntu developer week session :D
=== herton is now known as herton_lunch
=== diwic_afk is now known as diwic
=== bjf[afk] is now known as bjf
[15:27] ##
[15:27] ## Ubuntu Kernel Team Meeting - Today @ 17:00 UTC - #ubuntu-meeting
[15:27] ## agenda: https://wiki.ubuntu.com/KernelTeam/Meeting
[15:27] ##
=== herton_lunch is now known as herton
=== diwic is now known as diwic_afk
=== artir is now known as afk|artir
[16:46] hello, I see a fix was posted for Gobi2k 3g radios in Kernel, (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554099?comments=all) but I can't use that Kernel due to a TPM-locked Kernel architecture.. and I was hoping the needed driver was available as a Module.. my other problem is I can't compile the module myself as until i fix 3g in ubuntu, my computer won't have internet.
[16:46] Launchpad bug 554099 in linux "Qualcomm Gobi 2000 3G (gobi_loader/qcserial) broken" [High,Fix released]
[16:59] ##
[16:59] ## Meeting starting now
[16:59] ##
[17:00] bjf, not yet
[17:00] bjf, I am here btw :)
[17:00] bjf, You may rush the server team out
[17:02] hello, I see a fix was posted for Gobi2k 3g radios in Kernel, (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554099?comments=all) but I can't use that Kernel due to a TPM-locked Kernel architecture.. and I was hoping the needed driver was available as a Module.. my other problem is I can't compile the module myself as until i fix 3g in ubuntu, my computer won't have internet.
[17:02] Launchpad bug 554099 in linux "Qualcomm Gobi 2000 3G (gobi_loader/qcserial) broken" [High,Fix released]
=== ogra is now known as Guest60731
[17:38] hello, I see a fix was posted for Gobi2k 3g radios in Kernel, (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/554099?comments=all) but I can't use that Kernel due to a TPM-locked Kernel architecture.. and I was hoping the needed driver was available as a Module.. my other problem is I can't compile the module myself as until i fix 3g in ubuntu, my computer won't have internet.
[17:38] Launchpad bug 554099 in linux "Qualcomm Gobi 2000 3G (gobi_loader/qcserial) broken" [High,Fix released]
=== ogra_ is now known as ogra__
=== ogra__ is now known as ogra_
[17:42] Gartral, This bug was there to track that there was a) no gobi_loader (program to load the firmware) and for Lucid the driver itself did not have support for those newer modems. So fix here means that for Lucid this was added to a backports-modules-wwan package and Maverick got a gobi_loader package. So first question would be on which release you are running?
[17:45] Actually the last status updates in that bug were caused by the upstream bug being closed as fixed in 2.6.35-rc1
[17:46] JFo, hey ... about ?
[17:46] apw, yeah mostly
[17:47] still suffering a bit of a headache
[17:47] we haven't talked for a while, probably about time we did
[17:47] not as bad as it was last night
[17:47] nasty
[17:47] k, let me get my headset... one sec.
[17:53] smb: 10.10 I stopped keeping track of the names
[17:54] and I have Gobi_loader but the card won't connect, it Sees Verizon Wireless, but the logs keep saying there's no carrier
[17:54] Gartral, Ok, that was Maverick. So in that the kernel driver should be ok
[17:54] ah ok
[17:54] smb: i'm not on the Ubuntu Kernel either, I'm on a chrome kernel.. as that's the only thing this hardware boots
[17:55] which, should work, as it does in ChromeOS
[17:55] Cannot speak from first hand experience but it felt like firmware issues were quite a source of trouble
=== bjf changed the topic of #ubuntu-kernel to: Home: https://wiki.ubuntu.com/Kernel/ || Natty Kernel Version: 2.6.38 || Ubuntu Kernel Team Meeting - February-15 - 17:00 UTC || If you have a question just ask, and do wait around for an answer!
[17:57] kees: About the issue we explored the other day: https://groups.google.com/forum/?pli=1#!topic/linux.kernel/MDIzfQMT3zU
[17:57] I believe to have it available as a modem would be a sign that the kernel driver is supporting it. But then every provider seemed to have slightly different firmware files. Which one needs to get from the right provider.
[18:00] I am afraid I would not know more to help. But the bug report definitely was not concerned with more than to make the modem detected by the kernel driver, not specific connection issues,
[18:01] niemeyer: cool
=== artir is now known as afk|artir
=== sforshee is now known as sforshee-lunch
[18:15] <- food. hopefully that will help the head a bit. biab
=== afk|artir is now known as artir
=== artir is now known as afk|artir
=== sforshee-lunch is now known as sforshee
=== afk|artir is now known as artir
=== ogra_ is now known as ogra
=== niemeyer_ is now known as niemeyer
=== Edgan_ is now known as Edgan
=== Quintasan_ is now known as Quintasan
[20:41] * jjohansen -> lunch
=== diwic_afk is now known as diwic
[21:16] i'm trying to install an older kernel through the kernel-ppa but not sure what package to install after adding the ppa. any ideas?
[21:32] So I see an updated kernel for dove in lucid and maverick, but no notification (saw it on the kernel-meeting minutes). It appears to be a security update, but I'm not sure. Does it need verification? There are no tags, and no one on my team was subscribed.
[21:36] sconklin, ^^
[21:37] GrueMaster, we'll try to send out email when the pocket-copy to -proposed happens, at least after we know it's happened
[21:37] JFo, bug #714719 looks to be another kernel drm bug. There's a kernel patch that needs tested.
[21:37] GrueMaster, apw: I don't know, Tim produced that kernel, and he's out today.
[21:37] Launchpad bug 714719 in xserver-xorg-video-intel "[i915gm] GPU lockup (ESR: 0x00000001 IPEHR: 0x02000004)" [High,Confirmed] https://launchpad.net/bugs/714719
[21:37] bryceh, thanks :)
[21:38] I meant I don't know whether verification is required. and what bjf said
[21:38] JFo, next action for that bug is probably to roll a kernel .deb with that patch for the user to test, but I'll leave it in your capable hands here out
[21:38] bjf: It sat in proposed since 2/4.
[21:38] k
[21:39] GrueMaster, you are correct
[21:40] All I ask is for more timely notification. For some reason it never made the maverick-changes or lucid-changes mailing lists.
[21:41] GrueMaster, we are not expecting you to do anything with it at this point, though I suppose a boot test would be nice
[21:41] GrueMaster, because these are happening as a pocket copy from a PPA to -proposed, i'm assuming this is bypassing the process that sends email to *-changes
[21:42] I guess I need to know what to look for when I do run across these to know I can ignore them.
[21:43] Otherwise I get blindsided, which is very disruptive.
[21:43] GrueMaster, we'll work on it
[21:43] :)
[21:44] GrueMaster: stuff in -proposed isn't announced to -changes because it isn't considered "published" yet.
[21:45] This confuses me then, as I get updates for omap/omap4 via that mailing list.
[21:53] JFo, bryceh got that one will do something tomorrow
[21:55] JFo, apw, the good news is as I'm grinding through all these gpu lockup bugs, I'm having decent luck identifying dupes. I think among these scores of bug reports there's only maybe 4-6 'real' bugs
[21:55] apw, still finding tons for that vesafb issue; bunch filed against maverick too
[21:56] bryceh, that is a bit of good news :)
[22:04] bryceh, sounds good, sub me to them all and get jfo to put them on the list
[22:05] and i'll spin them all tomorrow
[22:07] yep yep
[22:08] apw, will do
[22:11] GrueMaster: do you run the q-r-t tests when doing kernel testing? You might want to compare notes with hggdh
[22:19] kees: QRT? give me a hint (or a link).
[22:21] GrueMaster: one sec
[22:21] GrueMaster: I find https://wiki.ubuntu.com/KernelTeam/StableKernelMaintenance that mentions it. though there are more scripts.
[22:26] I did find lp:qa-regression-testing. Is that the same?
[22:29] GrueMaster: yup.
[22:30] Ok. Pulling. Thanks.
[22:30] We'll see how much of these work on armel.
[22:34] GrueMaster: it should work; I designed them to work. :) usually test-kernel*.py gets run, excepting -hardening, since that's for future work. and the aslr may take forever on arm, so you might want to skip that too.
[22:35] These platforms sit idle, so they can churn until they die.
[22:35] Current platforms can run overnight or on the weekend if needed.
[22:38] My goal is to have as much automated tests running as possible during down time (i.e. nights & weekends).
[22:42] GrueMaster: cool
[22:42] Only problem so far is my lack of time to learn python. I'm also frustrated that my son snagged my only python book.
[22:43] (kids - sheesh).
[22:44] bjf: sconklin: hi, do you recall the email I send ago about hardy netbook-lpia branch? anything suggested?
[22:44] s/ago//
[22:44] ikepanhc, did you send it to the mailing list or direct to us ?
[22:45] ikepanhc: no, when did you send it?
[22:45] bjf: direct to you
[22:45] bjf: about two weeks ago
[22:45] sconklin: ^^
[22:46] I'm looking for it - I don't see it in my inbox
[22:46] sconklin: ok, I will resend it :)
[22:46] ikepanhc, i see one from 10/15 of last year
[22:47] weird....
[22:47] ok, resent
[22:49] I got it
[22:49] ikepanhc, i got it as well
[22:49] :)
[22:50] sounds like smtp blocks me :(
[22:53] ikepanhc, i'm cloning it now
[22:54] bjf: thanks
[22:56] ikepanhc: I'm about to leave for the day - I'll look tomorrow unless Brad has it covered
[22:56] sconklin: ok, thanks
[22:56] ikepanhc, i should be able to get it reviewed today
[22:57] :)
=== sconklin is now known as sconklin-gone
[23:09] JFo, btw here's a list of the GPU lockup bugs on my radar - http://tinyurl.com/4t7gtyz
[23:09] JFo, in case you want to pre-emptively sub apw to any of them
[23:10] JFo, all the ESR: 0x00000001 ones I'm betting to be dupes but don't have evidence of that yet
[23:12] ok
[23:12] I'll have a look at them :)