braddr | sounds like a potential pain, testing wise. | 12:03 |
---|---|---|
fabbione | yes | 12:04 |
fabbione | indeed | 12:04 |
fabbione | but the first thing i will do is to test your combination | 12:04 |
fabbione | if i can reproduce it, i know what i need to tell people to do :) | 12:04 |
fabbione | because the point is that the UP kernel is well.. UP | 12:05 |
fabbione | it doesn't use anything of the SMP code if not to detect how many CPU's are available on the system | 12:05 |
fabbione | and that's done parsing the OBP devicetree | 12:05 |
fabbione | the test i did on your machine was to disable thread 0 of core 1 | 12:06 |
fabbione | (your booting CPU basically) | 12:06 |
fabbione | because your CPU has no core+ | 12:07 |
fabbione | core0 even | 12:07 |
fabbione | but the ALOM and OBP do some magic cpu virtual remapping | 12:07 |
fabbione | so even if in reality the code is running on core1 thread0, the kernel sees it as cpu0 | 12:07 |
fabbione | that might be buggy | 12:07 |
=== braddr wonders if he has a 6 core model where cpu 0 was flawed somehow so it was disabled and shipped as a 4 core. | ||
fabbione | braddr: all the cpu's are born as 8 cores | 12:09 |
braddr | ah | 12:09 |
fabbione | the 6/4 core models are 8 core where 8 - $x core did fail | 12:09 |
fabbione | so they get disabled in hw | 12:09 |
fabbione | if you do a showcomponent in the ALOM you can see it | 12:10 |
fabbione | the first block of MP/CPU0/.. are the core/thread set | 12:10 |
fabbione | the second block (8 entries) are the 4 dual bus for memory access | 12:10 |
fabbione | MB/CPU0/R0/D1 iirc | 12:10 |
=== braddr nods. | ||
fabbione | MB/CMP0/P0 | 12:11 |
fabbione | <- core0 thread0 | 12:11 |
braddr | so, barring remapping, cpu p0-3 and 20-31 were disabled | 12:11 |
fabbione | you don't have that | 12:11 |
fabbione | MB/CMP0/CH0/R0/D0 <- memory | 12:12 |
fabbione | basically yes. the 4 cores that did not pass the test have been disabled | 12:12 |
fabbione | that's also why (as you can see) there are "holes" | 12:12 |
fabbione | one thing that did puzzle me also was the ALOM POST process | 12:13 |
braddr | so, on your and david's systems, at least one of them has 0-3 disabled? | 12:14 |
fabbione | in both your machine and my machine i did disable the booting CPU | 12:14 |
fabbione | and the POST was still running on it.. | 12:14 |
fabbione | so that's what triggered a bell | 12:14 |
fabbione | nope.. on all our machines there is a core0/thread0 working | 12:14 |
fabbione | david has 1x8 and 1x6 | 12:15 |
fabbione | i have 1x6 and 2x8 | 12:15 |
fabbione | but all with core0 | 12:15 |
fabbione | yours is "special" | 12:15 |
=== braddr waits for the short bus to stop by | ||
fabbione | eheh | 12:15 |
fabbione | but we will figure it out | 12:15 |
fabbione | it might as well be a case we don't handle properly OR a combination of bugs | 12:16 |
fabbione | we will see soon hopefully | 12:16 |
braddr | it'll get sorted out, I have no doubts. | 12:16 |
fabbione | yepos | 12:16 |
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports | ||
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports | ||
braddr | hark.. is that the sound of a jet passing overhead or did someone just turn on my t2000? :P | 07:59 |
fabbione | morning braddr :) | 07:59 |
braddr | g'morning | 08:01 |
braddr | brb.. my windows desktop's networking is acting all screwy.. gonna reboot. | 08:02 |
=== braddr [n=braddr@209.189.198.126] has left #ubuntu-ports [] | ||
=== braddr [n=braddr@209.189.198.126] has joined #ubuntu-ports | ||
braddr | what's on the testing block for today/tonight? | 08:08 |
fabbione | david has a possible workaround for your problem, but i think he is actually eating to power up brain and brainstorm on a proper solution | 08:09 |
fabbione | i just woke up and i am going to test another fix on your box | 08:10 |
fabbione | "another possible fix" | 08:10 |
fabbione | boot.img-debian-sparc <- tsk ;) | 08:16 |
fabbione | debian doesn't support Niagara :D | 08:16 |
braddr | hey, fire me. :) | 08:17 |
braddr | notice the date on that file? | 08:18 |
fabbione | nah.. i was just teasing :P | 08:19 |
braddr | I know | 08:19 |
fabbione | these are the moments in which i wish gzip could fork on N cpu's | 08:30 |
braddr | heh | 08:30 |
braddr | yup.. lots of cpu's just don't help single threaded apps. | 08:30 |
fabbione | about to boot the new image.. | 08:39 |
fabbione | you can go in console read only mode if you like | 08:40 |
fabbione | boot.img-fabbione-9 <- | 08:40 |
=== braddr crosses his fingers. | ||
fabbione | oh well | 08:42 |
braddr | kablooie | 08:42 |
fabbione | crap | 08:43 |
braddr | shouldn't need to be that verbose.. just boot net mem=1024k should be enough | 08:45 |
fabbione | DOUBLE TYPO | 08:45 |
fabbione | go fabion! | 08:45 |
braddr | er.. 1024m | 08:45 |
fabbione | it depends on the tftpd you are using | 08:45 |
fabbione | tftp-hpa doesn't support broadcast | 08:46 |
fabbione | so you need to specify the tftp server address | 08:46 |
braddr | It's been working fine for me forever. :) | 08:46 |
fabbione | it doesn't here and never did | 08:46 |
fabbione | might be my anal firewall | 08:46 |
fabbione | THERE WE GO | 08:48 |
braddr | I wasn't watching, but just saw the screen reset. | 08:48 |
fabbione | i am at the language selector | 08:48 |
braddr | yup.. I see. :) | 08:49 |
fabbione | is there an extra disk i can use to test an install? | 08:49 |
fabbione | i would like to make sure that it doesn't crash later | 08:49 |
=== braddr tries to remember if he did anything interesting with disk1 beyond just do the install back when we had it working. | ||
braddr | drop to a shell and mount the large partition? | 08:50 |
braddr | I definitely don't have a spare disk, but I'm thinking that it'd be fine to reinstall on disk1 | 08:50 |
fabbione | ok.. give me a few minutes to get to the partitioner | 08:50 |
fabbione | otherwise we can't mount anything | 08:50 |
braddr | might be quicker to just boot that disk. :) | 08:51 |
braddr | on the other hand.. gotta get that far to do an install for real anyway | 08:51 |
fabbione | yeah i think i am going to check a couple of more things before we do the install | 08:52 |
fabbione | yeps.. ok it's reproducible | 08:55 |
fabbione | let's see what's the watermark for the issue to appear | 08:55 |
braddr | want me to take control for a sec and check that disk for wipe-ability? | 08:56 |
fabbione | in a few minutes | 08:56 |
fabbione | if that's ok with you | 08:56 |
braddr | take your time | 08:56 |
fabbione | thanks | 08:56 |
fabbione | this mem thing opens a completely new frontier :) | 08:57 |
braddr | the amount of memory seems to be a factor? Bizarre. | 08:57 |
fabbione | i have 8GB too | 08:57 |
fabbione | and it works | 08:57 |
fabbione | it might be a very complex combination at this point | 08:58 |
fabbione | 2g boots | 08:59 |
=== fabbione attempts 4 | ||
=== fabbione attempts 8 | ||
braddr | kablooie | 09:06 |
fabbione | yeps | 09:07 |
fabbione | that's good.. | 09:07 |
fabbione | it was expected | 09:07 |
braddr | if this goes well.. try 8G-1M or -1k even | 09:10 |
fabbione | yeah | 09:10 |
braddr | interesting | 09:13 |
fabbione | yes | 09:13 |
braddr | oh, sure, return to bisecting. | 09:15 |
fabbione | yeps | 09:15 |
fabbione | it might even be 7G+1m | 09:16 |
fabbione | hem | 09:16 |
=== fabbione hides in a corner | ||
braddr | stick with bisecting.. it won't take that many steps to figure out. | 09:18 |
fabbione | yeah i know | 09:18 |
fabbione | i forgot the m at the end | 09:18 |
braddr | and thus the upper bound has been changed. | 09:19 |
braddr | er, lower bound | 09:20 |
fabbione | yeps | 09:20 |
fabbione | except i keep forgetting the m :) | 09:20 |
braddr | what's it default to, bytes? | 09:20 |
fabbione | kb iirc | 09:21 |
braddr | 7680-8191.. | 09:21 |
fabbione | no worries.. that one is normal because i did interrupt the boot | 09:22 |
braddr | pick a number, any number.. where will the tip over point land. | 09:22 |
fabbione | 7936m <- 7.75G | 09:22 |
braddr | 7936-8192.. I'll put my money on 7999/8000 for no good reason. :) | 09:25 |
fabbione | 8064 now :) | 09:26 |
fabbione | it might even be bad ram | 09:26 |
fabbione | in UP used for something and it crashes | 09:26 |
fabbione | in SMP used by some init tables that are not used once init is done | 09:27 |
fabbione | that would be ironic | 09:27 |
braddr | that would be hilarious and certainly ironic. | 09:28 |
braddr | I certainly haven't done much that stresses memory to hell and back. | 09:28 |
fabbione | 8064 is good | 09:29 |
braddr | 8064-8191.. so much for my first pointless guess | 09:29 |
braddr | time for easier numbers.. 8100 | 09:29 |
fabbione | that would be.... 8064+64 | 09:30 |
fabbione | 8128 ? | 09:30 |
fabbione | yeah | 09:30 |
fabbione | i am pretty sure this will work too | 09:30 |
fabbione | last 32M :) | 09:34 |
braddr | 8128-8191, running out of midpoints. | 09:34 |
fabbione | for your blog -9 is just the dapper kernel from git with security fixes and stuff | 09:35 |
fabbione | -10 that i am about to boot after the bisect had DEBUG_BOOTMEM on | 09:35 |
braddr | on bellevue: ~braddr/.www/t2000/boots.txt -- I just made it world writable, so you can edit | 09:36 |
fabbione | nah i am not going to edit www please :) | 09:38 |
fabbione | so this one boots too | 09:38 |
braddr | well, you've been testing stuff when I'm not around, so it might be worth making notes. I just made that one file writable | 09:38 |
braddr | I'll add this block though | 09:39 |
fabbione | oh i didn't do much before.. | 09:39 |
fabbione | this is realtime debugging ;) | 09:39 |
fabbione | but i need a short break | 09:39 |
braddr | -9 git as of what date or id? | 09:40 |
braddr | -10 is -9 + debug_bootmem? | 09:41 |
braddr | anything to note about -8 and boot.img-dapper-orig? | 09:41 |
braddr | I think -8 was just current as of that day, but I'm not sure about that | 09:42 |
fabbione | -8 is old | 09:43 |
fabbione | -dapper-orig is just the actual dapper kernel. | 09:43 |
fabbione | -9 is just git as of yesterday | 09:43 |
fabbione | no special id | 09:43 |
fabbione | -10 is -9+DEBUG_BOOTMEM | 09:43 |
fabbione | i didn't boot -10 yet | 09:44 |
=== braddr nods. | ||
fabbione | we are still booting -9 | 09:44 |
braddr | I saw. Where are we with the memory boot ranges? | 09:44 |
fabbione | 8176 | 09:45 |
fabbione | booting now | 09:45 |
braddr | highest good ? | 09:45 |
fabbione | davem suspects that we are sharing a page with some firmware stuff and everything goes bad | 09:45 |
fabbione | that's the bisect | 09:45 |
fabbione | and it works | 09:46 |
fabbione | so it's lower bound now | 09:46 |
braddr | okey | 09:46 |
fabbione | so next is.... | 09:46 |
fabbione | 8182 | 09:46 |
fabbione | no | 09:46 |
fabbione | 8184 | 09:46 |
fabbione | there | 09:49 |
braddr | sorry.. not paying attention to the boot attempts, updating the blog of what I've been doing with the box and the testing you've been doing. | 09:50 |
fabbione | bah | 09:50 |
fabbione | 8184 does error | 09:51 |
fabbione | and i forgot m again | 09:51 |
fabbione | so lower mark 8176 and higher mark is 8184 | 09:51 |
braddr | a hah.. new top end point! | 09:51 |
fabbione | new top 8180 | 09:55 |
fabbione | 8176 - 8180 | 09:55 |
braddr | down to a 4m range | 09:55 |
=== fabbione tests 8178 | ||
fabbione | Check cable and try again | 09:57 |
fabbione | Link Down | 09:57 |
fabbione | Timed out waiting for Autonegotation to complete | 09:57 |
fabbione | what have you done? | 09:57 |
fabbione | it went up again.. brrr | 09:57 |
fabbione | scary | 09:57 |
braddr | odd.. nothing, but it retried and made it | 09:57 |
=== braddr checks bellevue's dmesg log | ||
braddr | nothing interesting on that side | 09:58 |
fabbione | ok 8178 is good | 09:59 |
fabbione | 8178 - 8180 | 09:59 |
braddr | last boot! | 09:59 |
fabbione | nope... | 09:59 |
fabbione | we need to get down to kb | 09:59 |
braddr | well, unless you wanna dip into the k's. | 09:59 |
braddr | heh | 09:59 |
fabbione | 8179 | 09:59 |
fabbione | well ideally we need to find the page that's making this issue | 10:00 |
=== braddr nods. | ||
fabbione | a boundary of 4k would do | 10:00 |
braddr | get the page and have the kernel dump the entire thing before the crash and if possible as part of the oops or whatever that die point is | 10:01 |
fabbione | the problem is that we can't dump the page | 10:01 |
fabbione | as soon as we access that page the hypervisor blocks the system | 10:02 |
fabbione | that's the error we are getting | 10:02 |
fabbione | we need to understand what is there and why | 10:02 |
braddr | oh | 10:02 |
braddr | slip the guy a bit of valium, he's obviously a little too high strung. | 10:02 |
fabbione | 8179 is good | 10:02 |
fabbione | so now.. some maths :) | 10:03 |
fabbione | 8375296 - 8376320 | 10:04 |
fabbione | range in kb | 10:04 |
braddr | 2 boots to establish an acceptable good and bad value? | 10:04 |
fabbione | we will bisect that now :) | 10:04 |
fabbione | it's just one mega | 10:04 |
fabbione | so now it is 8179+512k | 10:05 |
fabbione | i think we found it :) | 10:07 |
fabbione | wow.. it doesn't even break! | 10:08 |
braddr | for a definition of we that has you doing all the work. | 10:08 |
=== fabbione powersoff | ||
fabbione | so low mark is still 8375296 | 10:09 |
fabbione | highmark is 8375808 | 10:09 |
fabbione | let's try in the middle | 10:09 |
braddr | worth repeating the last try? | 10:09 |
fabbione | yes, but not now | 10:09 |
fabbione | i want to test lowmark+256k | 10:09 |
fabbione | if it still hangs, i want to test passing a known to work mem in kb | 10:10 |
=== braddr nods | ||
fabbione | we need to make 100% sure we are not hitting other bugs, like MB = 1000kb instead of 1024 | 10:10 |
=== braddr recalls suggesting that. :) | ||
fabbione | i am pretty sure it's done in 2^ values | 10:11 |
fabbione | but you may never know ;) | 10:11 |
braddr | all we need is a good boot to see it in the boot output | 10:11 |
=== braddr eyes it warily. | ||
fabbione | it's hanging again | 10:15 |
fabbione | waits a few secs.. | 10:15 |
braddr | wait.. is this the -10 kernel? | 10:15 |
fabbione | -9 | 10:15 |
fabbione | we didn't boot 10 at all | 10:15 |
braddr | ok.. just checking | 10:15 |
braddr | this is the symptom we used to see with bootmem I think. | 10:15 |
fabbione | yeps it hangs | 10:15 |
=== braddr checks his notes | ||
fabbione | probably because we are scraping the same page | 10:16 |
fabbione | bootmem does basically clean all mem allocation | 10:16 |
fabbione | oh neat | 10:16 |
fabbione | sc> reset doesn't do the trick without poweroff/poweron | 10:17 |
fabbione | now i am booting with 8179 in kbytes | 10:18 |
fabbione | just to make sure we are converting it properly | 10:18 |
braddr | hrm.. my notes are insufficient | 10:18 |
fabbione | and guess what.. we are not | 10:18 |
=== braddr kicks himself. | ||
braddr | try it in m again | 10:18 |
fabbione | yeps | 10:18 |
fabbione | actually | 10:19 |
braddr | I was seeing this sort of boot hang with my hand built kernels, and I don't have enough notes on the differences. I did get one of them to boot and I want to say that bootmem was one of the differences, but I'm not confident enough in that memory. | 10:19 |
fabbione | i think mem is in bytes.. | 10:19 |
fabbione | let me check | 10:19 |
braddr | it was definitely hanging to the point of needing to reset it | 10:20 |
braddr | just append a k | 10:20 |
braddr | remove the doubt | 10:20 |
fabbione | mem=nn[KMG] | 10:21 |
fabbione | it doesn't say the default | 10:21 |
fabbione | so i guess it's bytes | 10:21 |
fabbione | this is 8179m in k | 10:21 |
fabbione | score | 10:21 |
fabbione | so the last boots were just wrong | 10:23 |
=== fabbione resets the counters | ||
braddr | ok.. so there's insufficient checking for minimum memory somewhere in early bootup. :) | 10:23 |
fabbione | 8375296 - 8376320 | 10:23 |
fabbione | 8375296 (8179) = good | 10:23 |
fabbione | 8376320 (8180) = booting now.... | 10:24 |
=== braddr hopes, for our sanity's sake, it fails | ||
fabbione | math is not an opinion in these cases ;) | 10:26 |
braddr | sure sure.. but as I get older, I'm starting to believe more and more that hardware can be spiteful. | 10:26 |
fabbione | hehe | 10:26 |
fabbione | see.. boom | 10:26 |
braddr | yup | 10:27 |
fabbione | 8375808k (8179+512k) = booting now.. | 10:27 |
=== braddr wanders to the kitchen to see what he can find to eat and drink. I kinda forgot to do something about dinner. | ||
fabbione | ehhe | 10:28 |
fabbione | yeah i will spend MAX another hour debugging | 10:28 |
braddr | I was gonna grab a subway sandwich or something before they closed, but that was 90 minutes ago.. oops. | 10:28 |
fabbione | it's sunday, i am tired and i want to get ready for the F1 race :P | 10:28 |
fabbione | subway rocks! | 10:28 |
braddr | they're not bad.. good for the sandwich category. | 10:28 |
fabbione | yeps | 10:29 |
braddr | back shortly | 10:29 |
fabbione | sure | 10:29 |
fabbione | bang | 10:30 |
fabbione | new highmark :) | 10:30 |
fabbione | lowmark= 8375296 (8179) | 10:30 |
fabbione | highmark= 8375808k (8179+512k) | 10:30 |
=== fabbione bisects | ||
fabbione | 8375552 (8179+256k) = good | 10:34 |
fabbione | 8375680 (8179+384k) = ahhhh new abort | 10:38 |
fabbione | i think we got it | 10:38 |
fabbione | that one just stops booting | 10:39 |
fabbione | Booting Linux... | 10:39 |
fabbione | Program terminated | 10:39 |
fabbione | so it's hitting something | 10:39 |
fabbione | the limit seems to be 8375552 (8179+256k) = good | 10:42 |
fabbione | that + 128 = stops immediately | 10:43 |
fabbione | that + 64 = init the console and stops | 10:43 |
fabbione | clearly we don't catch the hypervisor error that early in the boot | 10:43 |
fabbione | bang bang | 10:44 |
fabbione | we are there | 10:44 |
fabbione | summary: | 10:50 |
fabbione | from mem=1024m to mem=8375552k (8179m+256k) is all good. | 10:50 |
fabbione | 8375552k + 32k = hangs hard at Booting Linux... | 10:50 |
fabbione | 8375552k + 64k = init the console and abort "Program terminated" | 10:50 |
fabbione | 8375552k + 128k = "Booting Linux...\nProgram terminated" | 10:50 |
fabbione | 8375552k + 256k = as above | 10:50 |
fabbione | 8375552k + 512k = boots but stops at the hypervisor error we knew about. | 10:50 |
fabbione | i am switching to -10 | 10:50 |
fabbione | http://people.ubuntu.com/~fabbione/t2k-memboot-results.txt | 11:00 |
=== braddr captures that in his log as well. | ||
fabbione | ok i am done for today :) | 11:11 |
braddr | good progress | 11:12 |
fabbione | i guess we are pretty close now | 11:12 |
braddr | I look forward to hearing what davidm has to say about it | 11:12 |
fabbione | yeah he went to sleep not too long ago | 11:13 |
fabbione | so it will be not before tomorrow | 11:13 |
=== braddr nods, "I should have, but luckily it's the weekend." | ||
fabbione | i think i will try booting the bootmem kernel on my machine to see the diff | 11:13 |
fabbione | but later :) | 11:13 |
braddr | have a good rest of the weekend | 11:14 |
fabbione | thanks | 11:14 |
fabbione | good night | 11:14 |
=== ChanServ [ChanServ@services.] has joined #ubuntu-ports | ||
=== ajmitch [n=ajmitch@203.89.166.123] has joined #ubuntu-ports | ||
=== jb-home [n=jbailey@modemcable139.249-203-24.mc.videotron.ca] has joined #ubuntu-ports | ||
=== ajmitch__ [n=ajmitch@203.89.166.123] has joined #ubuntu-ports | ||
=== ajmitch [n=ajmitch@203.89.166.123] has joined #ubuntu-ports |
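
The manual search for the failing `mem=` value in the log above is a plain bisection between a known-good and a known-bad setting. A minimal sketch of that procedure follows. The function name `bisect_boundary` and the `boots` callback are hypothetical; in the log each "call" was a manual netboot of the T2000. The simulated predicate uses the highest good value reported in the log (8375552k) as a stand-in for the real boundary, which the log did not narrow further, and it assumes the good/bad result is monotonic in the memory size.

```python
def bisect_boundary(good_kb, bad_kb, boots, step_kb=4):
    """Narrow the gap between a booting and a failing mem= value (in KiB).

    boots(kb) is a hypothetical callback that returns True when the
    machine boots with mem=<kb>k. Endpoints must be multiples of step_kb.
    """
    if good_kb % step_kb or bad_kb % step_kb or good_kb >= bad_kb:
        raise ValueError("endpoints must be ordered multiples of step_kb")
    trials = []
    while bad_kb - good_kb > step_kb:
        # Midpoint, rounded down to a step_kb boundary (4k as suggested in the log).
        mid_kb = (good_kb + bad_kb) // 2 // step_kb * step_kb
        ok = boots(mid_kb)
        trials.append((mid_kb, ok))
        if ok:
            good_kb = mid_kb   # new low mark
        else:
            bad_kb = mid_kb    # new high mark
    return good_kb, bad_kb, trials


# Simulation only: pretend everything up to the log's last good value boots.
ASSUMED_LAST_GOOD_KB = 8375552          # 8179m + 256k, from the log summary
simulated_boots = lambda kb: kb <= ASSUMED_LAST_GOOD_KB

low_mark = 8179 * 1024                  # 8375296k, known good
high_mark = 8180 * 1024                 # 8376320k, known bad
good, bad, trials = bisect_boundary(low_mark, high_mark, simulated_boots)
print(good, bad, len(trials))
```

Under these assumptions a 1 MiB window shrinks to a single 4k step in eight boots, which is why sticking with bisection was the cheaper option than guessing round numbers.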
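
Several boots in the log were wasted because the `m` suffix was forgotten and the bare number was taken as bytes. The sketch below illustrates the `mem=nn[KMG]` convention as it was worked out in the conversation: a suffix scales by a power of two, and no suffix means bytes. `parse_mem` is a hypothetical helper written for illustration, not the kernel's own parser.

```python
# Hypothetical helper mirroring the mem=nn[KMG] convention from the log.
SUFFIX_SHIFT = {"k": 10, "m": 20, "g": 30}


def parse_mem(value):
    """Return the size in bytes for a mem= argument such as '8179m'."""
    value = value.strip().lower()
    if value and value[-1] in SUFFIX_SHIFT:
        return int(value[:-1]) << SUFFIX_SHIFT[value[-1]]
    return int(value)  # no suffix: plain bytes


# 8179m and 8375296k name the same amount of memory.
assert parse_mem("8179m") == parse_mem("8375296k")

# Forgetting the suffix asks for 8179 bytes, far too little to boot.
print(parse_mem("8179"), parse_mem("8179m"))

# Offsets used during the bisection, in KiB above 8179m.
for kb in (8375552, 8375680, 8375808):
    print(kb, "= 8179m +", (parse_mem(f"{kb}k") - parse_mem("8179m")) // 1024, "k")
```

Expressing every trial value in `k` with an explicit suffix, as was done from 8375296k onward, removes the doubt about both the default unit and whether a megabyte is counted as 1024k.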