=== fabbione [i=fabbione@gordian.fabbione.net] has joined #ubuntu-ports
=== Starting logfile irclogs/ubuntu-ports.log
=== ubuntulog [i=ubuntulo@ubuntu/bot/ubuntulog] has joined #ubuntu-ports
=== Topic for #ubuntu-ports: http://cdimage.ubuntu.com/ports/releases/dapper/beta/ | http://cdimage.ubuntu.com/ubuntu-server/ports/releases/dapper/beta/
=== Topic (#ubuntu-ports): set by fabbione at Thu Apr 20 19:26:35 2006
=== daq4th [n=darkness@netstation-005.cafe.zSeries.org] has joined #ubuntu-ports
=== alvaro_ [n=alvaro@200.122.148.115] has joined #ubuntu-ports
=== alvaro_ is now known as fede2
=== fede2 is now known as towel-day
=== braddr [i=java@209.189.198.126] has joined #ubuntu-ports
=== towel-day is now known as shinmen
=== ajmitch [n=ajmitch@203.89.166.123] has joined #ubuntu-ports
=== lamont [n=lamont@mib.fc.hp.com] has joined #ubuntu-ports
=== fabbione [i=fabbione@195.22.207.162] has joined #ubuntu-ports
[05:35] fabbione: have you ever built a linux/sparc kernel from solaris/sparc? I'm gonna try to see if I can gather some more debugging info. Dave doesn't believe he'll have time before I have to return the box.
[05:35] or from linux/x86, for that matter
[07:56] braddr: well i did from linux arch foo to sparc
[07:57] but not from solaris
[07:57] google for crosstoolchain
=== braddr nods. compiling from solaris is proving to be rather painful
[08:28] no wonder
[08:28] if you have a patch i can build a kernel for you
[08:28] it takes me only a few minutes here
[08:28] or something you want to play with
[08:29] well, I want to turn on all the debug options and probably add a bunch of printfs around the point of failure.. so the ability to iterate and build the net boot images on demand
[08:30] yeah i understand that
[08:30] hmm
[08:30] if you have a'linux box, just build the cross compile tools
[08:30] it's really easy to do
=== braddr nods, "just never have before. :)"
[08:31] braddr: there is a set of scripts that do that for you. it's one of the first hits on google :)
[08:39] building gcc and glibc.. this might just take a little while.
[08:42] yeah it doesn't take that long.. iirc it disables the test suite
[08:44] while this is running, if you feel like spinning a build with all the debug options turned on that might give something to go on. If I recall correctly, we were suspecting memory corruption, so CONFIG_DEBUG_SLAB, CONFIG_DEBUG_VM, CONFIG_DEBUG_PAGEALLOC, and maybe CONFIG_DEBUG_BOOTMEM might be interesting.
[08:44] yes.. i can do that
[08:45] have you seen other reports of boot problems or successes with 16 cpu models?
[08:45] not one :(
[08:45] but i did add you to the Release Notes
[08:45] that's unfortunate
[08:45] * SUN T2000 with 4 cores have been reported as not working.
[08:47] ok give me a sec and i will build for you
[08:47] more than a sec
[08:47] i need to power up everything
[08:48] no hurry.. the scripts are running to build the sparc64 tool chain
[08:54] braddr: i assume you did try already to boot the latest installer...
[08:54] I haven't tried anything since we last worked on this.. been sticking with solaris.
[08:55] can you try just in case?
[08:55] sure
[08:55] http://archive.ubuntu.com/ubuntu/dists/dapper/main/installer-sparc/current/images/sparc64/netboot/2.6/
[08:55] may never know it was "auto-fixed"
[08:56] stranger things have certainly happened
[08:56] fetching now
[08:56] yup
=== fabbione hates the console with all the his heart
=== braddr had forgotten how quiet it was when the beast is off.
[09:03] eehhe
[09:03] i had it turned on for the last few days together with the SAN
[09:03] to do test installs for release
[09:04] i can't hear my wife anymore.. and that's good... ;)
[09:04] not sure i will be able to hear anything anylonger..
[09:04] mine's been on for weeks
[09:04] no i power it off when i don't need it
[09:05] i did rebuild all of the 9000 pkgs of dapper in 36 hours flat on my T2000 :)
[09:05] with 2 of them you could make a port of a distro in a day
[09:06] a few additional messages in the boot log, but dies with essentially the same error, but no stacktrace
[09:06] ok
[09:07] before:
[09:07] [ 11.338121] NET: Registered protocol family 17 [ 11.648468] Badness in proc_get_inode at fs/proc/inode.c:157 [ 11.821338] Call Trace: [ 11.898036] [00000000004a1ee0] __lookup_hash+0xe0/0x140 [ 12.065575] [00000000004a5128] open_namei+0x128/0x640 [ 12.227346] [00000000004924c0] filp_open+0x20/0x60 [ 12.379717] [0000000000492658] do_sys_open+0x38/0xe0 [ 12.543989] [0000000000406a14] linux_sparc_syscall32+0x34/0x40 [ 12.739641] [0
=== braddr glares at the paste
=== fabbione looks at braddr
[09:08] well i remember that
[09:08] in the 'after' picture.. after the net: registered protocol familly 17 is some usb probing then it probes all 4 eth ports
[09:08] ending with:
[09:08] [ 14.590307] e1000: eth3: e1000_probe: Intel(R) PRO/1000 Network Connection
[09:08] Starting system log daemon: syslogd, klogd.
[09:08] [ 16.902270] SUN4V-DTLB: Error at TPC[41c400] , tl -1034272768
[09:08] [ 17.143287] SUN4V-DTLB: vaddr[ffffffffffff0000] ctx[0] pte[800007ffffff0743] error[2]
[09:08] HMMMM
[09:09] where you running slowlaris before rebooting in Linux?
[09:09] thought you might like the user space part. :)
[09:09] yes
[09:09] can you please poweroff -> poweron -> boot ?
[09:09] roger
[09:09] what was the openboot command to disable auto booting?
[09:09] set auto-boot false?
[09:10] like if i remember.... hold on :)
[09:10] setenv auto-boot false?
[09:11] looks like it's already set that way from last time.
[09:12] powering back on.. 10 minutes to info. :)
[09:12] take your time
[09:12] i am still installing kernel b-d
[09:13] the machine was in an interesting state today after a few tons of installs
[09:14] hrm.. looks like the crosstool build blew up while building the kernel
[09:14] that's no surprise.. the kernel config is crap there iirc
[09:15] I'll poke at the script and see how to disable building the kernel and glibc.. all I need is gcc
[09:19] Starting system log daemon: syslogd, klogd.
[09:19] Killed
[09:19] /build/buildd/cdebconf-0.97ubuntu3/src/debconf.c:135 (main): Cannot initialize debconf template database
[09:19] /build/buildd/cdebconf-0.97ubuntu3/src/debconf.c:135 (main): Cannot initialize debconf template database
[09:19] FATAL: Module usbkbd not found.
[09:19] FATAL: Module usbhid not found.
[09:19] FATAL: Module usbserial not found.
[09:19] /build/buildd/cdebconf-0.97ubuntu3/src/debconf.c:135 (main): Cannot initialize debconf template database
[09:20] the last line repeats
[09:20] ok
[09:20] it's that "killed" before
[09:20] but at least it looks less scary
[09:21] ok building the kernel now
[09:21] there isn't a SUN4V-DTLB part this time
[09:21] yes
[09:21] i can see that :)
[09:21] it's probably being catched by syslog or klogd
=== braddr nods, just confirming that I didn't leave it out.
[09:21] that's why i told you to poweroff
[09:22] sometimes slowlaris leave the CPU in an "interesting" unresettable state (for linux)
[09:23] i think somehow the main issue has been solved but there is still something fishy going on. otherwise you would have seen the same problem
[09:24] TBH if i was you i would try yet another reboot.. just to enjoy the 4 secs of silence between reboots
[09:24] no seriously.. i would like to see another reboot
[09:24] to see if the DTLB error comes up again
[09:24] sometimes it's just a matter of timing with syslog starting
[09:24] oh.. you wanted it w/o the powercycle?
[09:25] powercycle was perfectly fine
[09:25] just do another one now
[09:25] ok.. in progress.
[09:25] what we want to try to see if syslog is hiding it or not
[09:25] given that we can't access the logs once it crashes
[09:25] it might be interesting to see if two boots back to back w/o a powercycle gets the original error
[09:25] up to you
[09:26] the kernel here is almost done
[09:31] past post, into the linux boot now
[09:31] :)
[09:32] crosstool past a stupid linux source bug and into building binutils
[09:32] Starting system log daemon: syslogd, klogd.
[09:32] [ 10.285601] SUN4V-DTLB: Error at TPC[41c400] , tl -1034272768
[09:32] [ 10.521781] SUN4V-DTLB: vaddr[ffffffffffff0000] ctx[0] pte[800007ffffff0743] error[2]
[09:33] yup
[09:33] it was just hidden somewhere
[09:37] kernel is linking
[09:37] ccache populated :)
[09:38] gcc is buliding
[09:40] I presume I'll need to grab more bits to build a boot.img file..
=== braddr wanders back to google, the fount of knowledge
[09:41] oh to build that one.. yes
[09:41] i can give you the initrd
[09:41] all you have to do is to slam it at the bottom of the a.out image
[09:42] just cat the two together?
[09:42] do you have debian/ubuntu on the other linux box?
[09:42] debian,yes
[09:42] apt-get source debian-installer
[09:43] look for tftpboot.sh
[09:43] that's the script that creates the boot.img
[09:43] roger
[09:46] hmm i can tell you that even if we manage to boot this debug, you won't be able to install
[09:46] but i can workaround that with a lot of patience
[09:46] s'ok.. step one, figure out what's breaking
[09:46] yeah
[09:46] step two.. profit
[09:46] :)
[09:48] that tftpboot.sh script relies on some binaries not part of the installer.. I believe I recognize piggyback from the kernel build, not sure about elf2aout
[09:49] elf2out just converts a elf exec to a aout
[09:49] i am sure there are sources somewhere
[09:49] for sparc they are on sparc-utils pkg
=== braddr nods
=== braddr will tackle that after getting the crosscompiler built and the linux kernel itself built. :)
[09:58] i am uploading the image..
[10:01] http://people.ubuntu.com/~fabbione/braddr/
=== braddr wgets
=== braddr boot net's.
[10:05] probably should powercycle, but let's see if this gets anything interesting
[10:06] seems to stall here:
[10:06] Booting Linux...
[10:06] mem_init: Calling free_all_bootmem().
[10:07] interesting
[10:07] any thoughts before I start the 10 minute power cycle bootup?
[10:08] it might be just slow???
[10:08] been well over a minute now
=== braddr starts the cycle
[10:09] ok try again
[10:09] i will build one that doesn't trigger that
=== braddr will need to put a cap on this.. it's after 1am here.
[10:10] I'll go until 2
[10:11] ok
[10:11] #ifdef CONFIG_DEBUG_BOOTMEM
[10:11] prom_printf("mem_init: Calling free_all_bootmem().\n");
[10:11] #endif
[10:12] totalram_pages = num_physpages = free_all_bootmem() - 1;
[10:12] seems pretty harmless
[10:13] checking what that does exactly
[10:13] considering there's other printf's in the neighborhood, adding one more can't be a problem by itself.
[10:15] yeps
[10:15] slamming one immediatly after
[10:15] to se
[10:15] to see if that completes
[10:18] same stall.
=== braddr will give it several minutes
[10:19] i am almost done with the other printk
=== braddr nods.. that's what I was figuring would be the next boot. :)
[10:19] it takes a little bit to do things properly
[10:20] in terms of change -> track change -> build proper images -> create boot.img -> blabla
[10:21] screw the properly part. :)
[10:22] no i don't.. it's important for me that each bit it's always tested properly...
[10:22] we used to hack that way when we were adding the support to the kernel
[10:22] but we realized only later that at cleanup time we lost bits here and there
=== braddr eyes the crosstools build.. died again.
[10:22] duplicating work etc.
=== braddr nods.
[10:23] I tend to take a 2 pass approach.. hack and slash until the main bug goes away, then re-do things with better knowledge of the problem from the original source
[10:26] yeah but this is more building the image trick to be consistent
[10:26] the code is in git.. so it's easy for me to grab the diff and reapply it clean to the main truck
[10:26] trunk
[10:27] sparc64-unknown-linux-gnu-hello-static: ELF 64-bit MSB executable, SPARC V9, version 1 (SYSV), for GNU/Linux 2.4.3, statically linked, for GNU/Linux 2.4.3, not stripped
[10:27] yay
[10:27] nice
[10:31] uploading
[10:33] ok it's there
[10:33] same url
[10:33] getting...
[10:33] for what it's worth, still stalled there.
[10:34] #ifdef CONFIG_DEBUG_BOOTMEM
[10:34] prom_printf("mem_init: Calling free_all_bootmem().\n");
[10:34] #endif
[10:34] totalram_pages = num_physpages = free_all_bootmem() - 1;
[10:34] #ifdef CONFIG_DEBUG_BOOTMEM
[10:34] prom_printf("mem_init: done Calling free_all_bootmem().\n");
[10:34] #endif
[10:34] this should tell you if it goes any further
[10:34] the stuff that free_all_bootmem calls are woodoo to me
=== braddr consults his magic eight ball to see what it predicts.
[10:34] i predict it boots, print my stuff and hangs
[10:34] that'd be the A answer.
[10:35] if it goes further you should see:
[10:35] printk("Memory: %uk available (%ldk kernel code, %ldk data, %ldk init) [%016lx,%016lx] \n",
[10:36] wow.. break from sc isn't getting a prompt. powercycleing
[10:43] mem_init: done Calling free_all_bootmem().
[10:44] and it hangs...
=== fabbione sighs
[10:44] ok let's try without BOOMEM debugging
[10:44] at least the voodoo off in the boot mem freeing code can be avoided. :)
[10:44] yeah
[10:51] i should really poweron the SAN to do these builds
[10:51] spending ages on I/O on these slow SAS disks
[10:52] heh
[10:53] hi fabbione
[10:53] hi aj
[10:57] braddr: almost done...
[10:57] got sucked with a customer :/
[10:57] no problem.. reading the docs on kbuild to learn how to do cross compiles
[10:57] braddr: that's easy ;)
[10:58] either edit arch/sparc64/Makefile
[10:58] or export hmmm what envvar..
[10:58] HOSTCC and CC
[10:58] that should do
[11:00] and to trigger it as a cross build?
[11:00] make ARCH=sparc64 ...
[11:01] uploading new image
[11:01] this one without DEBUG_BOOTMEM
[11:01] sweet
[11:04] done
=== braddr wgets the same url
[11:09] Remapping the kernel... done.
[11:09] Booting Linux...
[11:09]
[11:09] how much ram do you have there?
[11:09] 8G
[11:10] ok letme try to boot that image here
[11:11] i also have 8GB
[11:12] at least we can exclude one thing
[11:14] this one stalls for me too
[11:14] so it's either scratching all the 8GB of ram very very slowly
[11:14] or it is simply broken
=== braddr votes the latter
[11:14] i can try removing the ram
[11:14] i will let it run for a bit while i get some lunch
[11:14] one of the boots had to have been sitting there for 10 minutes or so
[11:14] and try to come back with an image for tomorrow
[11:15] i have no idea how verbose these debugging things are
[11:15] i think i will disable all of them and enable one at a time to see what breaks
[11:15] it's problably easier
[11:15] but it requires a few tons of reboots
=== braddr nods.
[11:15] go get some sleep
[11:15] i hope to have an image by tomorrow or something
[11:16] I'm about to try the first kernel build
[11:16] ok :)
[11:16] but remember that netboot needs to be a.out or it won't work
=== fabbione -> food
[11:17] thanks for the help.. see ya in 20ish hours
[11:17] no problem.. sure
[11:17] i should be here :)
[11:17] well, that failed fast. :)
[11:34] fyi: make ARCH=sparc64 CROSS_COMPILE=sparc64-unknown-linux-gnu- vmlinux.aout
=== braddr grogs.
[04:15] morning
[04:18] hiya. before I went to sleep, I got a kernel build that worked enough to show the Booting linux... message
[04:18] oh nice
[04:19] i managed to add a couple of DEBUGGING options from none
[04:19] but i had to stop
[04:19] i am going to add more now
[04:19] I didn't get the kernel + initrd image built.. I was more anxious to see the kernel working. I'm gonna eat dinner, watch a bit of tv, then dive back in
[04:20] sure
[04:20] it will give me a little bit to build another couple of kernels
[04:20] you're trying to see what debugging options work and which ones cause your own box to die during bootup?
[04:21] yeps
[04:21] like we discussed yesterday
[04:21] just checking. They're all 'supposed' to work, I assume.. something buggy with either sparc in general or t2000 specifically, potentially?
[04:22] I'm still running 2.4 everywhere, so I'm a tad behind the times
[04:22] i assume they are all supposed to work individually
[04:22] not sure in big combos
[04:22] I see references to turning some of 'em on in various l-k threads from time to time
[04:23] I should probably do a build with a .config that matches the installer v 35 build
[04:23] yeps
[04:23] what sources are you using?
[04:24] vanilla or ubuntu?
[04:24] vanilla 2.6.17-rc5
[04:24] ok
[04:24] the config might work
[04:24] but a lot of drivers won't be there
[04:24] i am booting up now
[04:25] can give you a config in 2 minutes or so
[04:25] for a t2000, a lot of drivers aren't relevant.
[04:26] i know
[04:26] they are just there
[04:30] fyi.. a snippit from an email with my sun contact:
[04:30] > Thank you for your feedback. I have passed it along to the T2000 Try and Buy
[04:30] > Program people, as it is a new program, we appreciate the constructive
[04:30] > feedback. As a follow up to your first point, on the T2000 Linux is not
[04:30] > formally supported (yet) so any efforts Dave is making are best effort.
[04:31] the 'yet' part I like seeing.
[04:31] ehhe
[04:31] my reply basically said I/we weren't looking for support, just time.
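A minimal sketch of the "disable all of them and enable one at a time" approach discussed above, assuming the base kernel config uses the usual `CONFIG_FOO=y` / `# CONFIG_FOO is not set` lines. The helper name `enable_one_option` and the file names are hypothetical and come from neither the log nor the kernel tree. The option names are the ones mentioned in the conversation.

```shell
#!/bin/sh
# Hypothetical helper (not part of any real tool): copy a base kernel config
# and switch on exactly one option in the copy.
enable_one_option() {
    base=$1      # path to the base .config with the debug options off
    option=$2    # e.g. CONFIG_DEBUG_SLAB
    out=$3       # path of the config file to write

    # Drop any existing setting for the option, then append it as enabled.
    grep -v -e "^${option}=" -e "^# ${option} is not set" "$base" > "$out"
    printf '%s=y\n' "$option" >> "$out"
}

# Example use (file names are assumptions):
#   for opt in CONFIG_DEBUG_SLAB CONFIG_DEBUG_VM \
#              CONFIG_DEBUG_PAGEALLOC CONFIG_DEBUG_BOOTMEM; do
#       enable_one_option config.base "$opt" "config.$opt"
#   done
```

Each generated file would still need to be copied to .config and passed through something like `make ARCH=sparc64 oldconfig` before building, since an option may depend on others. Whether a given option hangs the boot can only be found by booting each image, as in the log.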
=== bradd1 [n=braddr@209.189.198.126] has joined #ubuntu-ports
=== bradd1 is now known as braddr
=== fabbione [i=fabbione@195.22.207.162] has joined #ubuntu-ports
=== gnu2it2 [n=kgrimm@cpe-069-134-161-191.nc.res.rr.com] has joined #ubuntu-ports
[07:01] little help please.. E: Malformed line 3 in source list /etc/apt/sources.list (dist parse)
[07:01] line 3 = deb http://ports.ubuntu.com/ubuntu-ports dapper
[07:02] i dont see what is malformed
[07:04] sure is quiet in here :)
[07:30] deb http://http.us.debian.org/debian/ testing main non-free contrib
[07:31] you're missing section entries
[07:31] (making up the term, not sure what it is officially)
=== braddr [n=braddr@209.189.198.126] has left #ubuntu-ports []
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports
[06:50] fabbione: hopefully you're still asleep, but for when you wake, not a single oops or other kernel message during the entire install except for during module loading, and there just a bunch of symbol size mismatch messages
[06:50] braddr: nice...
[06:50] braddr: did you try to boot the SMP kernel?
[06:50] not yet
[06:51] could you please?
[06:51] just try the default installed smp kernel?
[06:51] yeps
[06:51] ok.. I'll reboot after the rsync of the ubuntu.git tree is done
[06:51] thanks
[06:53] without the debug options, the race or whatever we were hitting is likely to hit, but it'll be interesting regardless.
[06:55] it might not
[06:57] rebooting
[06:59] take your time. i am kind of busy anyway
[06:59] no problem.. just munching on some pizza and catching up on some tv
[07:04] ncpus probed : 16
[07:04] ncpus active : 16
[07:06] :)
[07:06] now just play with it :)
[07:06] and see how bad it goes
[07:06] I much prefer reproducable, reliable, regular crashes.
[07:07] i might be able to get the config changes in if they are not too heavy
[07:07] in terms that we can work around it, but i need to verify how much heavier is the DEBUG_SLAB UP kernel
[07:07] well, this smp kernel has no config changes.. it's what came from the default install.
[07:07] yeps
[07:07] that means that it's only with UP kernel the problem
[07:08] or that the race condition hasn't triggered
[07:08] probably
[07:08] as i said.. play with it and see if you notice anything strange
[07:08] i have 24h to upload a kernel
[07:08] after that it will be a matter of custom installers
[07:09] I'll have it do a few make -j32 kernel compiles.. always good for a stress test
[07:09] -j512 is good too
[07:09] don't be shy
[07:09] the machine won't yell at you
[07:10] the disk might.. it's not one of the snappiest ever created.
[07:10] don't worry about that
[07:10] trust me :)
[07:10] davem didn't finf bugs on the kernel doing -j32
[07:10] i did when pushing to the edges of -j4096
[07:10] right.. make -j it is.
[07:10] yeah
[07:11] ok while you play.. i need to get ready to fly to London
[07:11] if it's gonna be stressed, let's throw it all at her
[07:11] whee
[07:11] i have an airplane to catch in a few hours
[07:11] stress her as much as you can
[07:11] the more the better
[07:12] while /bin/true; do make clean; make -j; done
[07:15] :)
[07:20] ouch.. aboutu 1.5G into swap, about 1200 processes all fighting for some time
[07:20] that's o
[07:20] k
[07:20] don't worry
[07:20] oh, I know.. as long as I don't run outta swap it's all good
[07:21] you won't
[07:21] make -j will make sure not to go over machine resources
[07:21] it's done on purpose
[07:21] yeah, looks like it's over the parallelism hump anyway
[07:22] ok gotta go for a bit now
[07:22] ttyl
[07:22] have fun in London
[07:22] nah still need to prepare the bag, shower and close here ;)
[07:22] another 2/3 hours
[07:22] just need to get started :D
[07:26] interesting that 2 make processes seem to be consuming 2 of the cores full time
[07:34] ooh.. an oops, and file-max limit reached
[07:37] max file is a known bug upstream
[07:37] log the OOPS
[07:37] already done
[07:37] the file-max is due to SLAB not releasing cleared fd fast enough
[07:38] the oops is in one of the make's that's spinning.. which makes some sense.
[07:38] wait, no it's not.
[07:42] fun.. the two make's aren't kill -9able
[07:44] add all to the logs :)
[07:44] oh hold on
[07:44] once you hit the file-max you need to reboot
[07:44] way ahead of you
[07:45] there is no proper way to recover from that
[07:45] the oops came before the first file-max
[07:46] evening
[07:46] hiya aj
[07:47] sounds like you're stressing the box nicely :)
[07:47] just following suggestions
=== ajmitch is still waiting to hear what gets sorted out with this one
[07:47] me too
[07:47] but progress has definitly been made
=== braddr reboots
[08:39] braddr: what's the url for the log again?
[08:39] http://www.puremagic.com/~braddr/t2000/boots.txt
[08:39] thanks
[08:40] drop the file for a blog + a list of all the past files (from a month or so ago)
[08:40] ?
[08:40] http://www.puremagic.com/~braddr/t2000/
[08:41] oh ok :)
[08:49] drivers/block/aoe/aoemain.o -- there's an Age of Empires game embedded in the kernel? :P
[08:49] hahah
[09:48] -> London
[09:48] later
=== fabbione [n=fabbione@george.kkhotels.co.uk] has joined #ubuntu-ports
=== fabbione [n=fabbione@george.kkhotels.co.uk] has joined #ubuntu-ports
[10:09] 32 cycles of make -j700 / make clean -- no oopses
=== braddr has restarted the loop with -j900
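A rough sketch of a bounded version of the `make clean` / `make -jN` stress loop used above, which reports how many cycles completed. It is not the loop that was actually run on the T2000: the function names `stress_cycles` and `count_oopses` and the `MAKE` override are hypothetical. Unlike the original endless loop, this one stops at the first failed build.

```shell
#!/bin/sh
# Hypothetical stress helpers; MAKE can be overridden and defaults to make.
MAKE=${MAKE:-make}

# Count kernel-log lines on stdin that mention an oops (case-insensitive).
count_oopses() {
    grep -c -i 'oops'
}

# Run "make clean; make -jJOBS" for a fixed number of cycles.
# Stops early and returns non-zero if either step fails.
stress_cycles() {
    cycles=$1    # how many clean/build rounds to run
    jobs=$2      # value passed to make -j
    i=0
    while [ "$i" -lt "$cycles" ]; do
        $MAKE clean || return 1
        $MAKE "-j$jobs" || return 1
        i=$((i + 1))
    done
    echo "$i cycles of make -j$jobs / make clean completed"
}
```

From a kernel source tree this might be run as `stress_cycles 32 700`, followed by `dmesg | count_oopses` to see whether the kernel logged anything along the way.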