=== Ursinha_ is now known as Ursinha
[08:46] <`jpg> Hey guys, I am running into an issue with an image build script that is trying to install linux-image-generic-lts-vivid, it works most of the time but occasionally fails with:
[08:47] <`jpg> Depends: linux-image-extra-3.19.0-30-generic but it is not going to be installed, Recommends: thermald but it is not installable
[08:48] `jpg, where is it getting its packages from ?
[08:49] `jpg, as that implies wherever it is has mismatched linux-meta-lts-vivid and linux-lts-vivid package contents
[08:50] <`jpg> Here is my sources.list: https://gist.github.com/josephglanville/62b70cde524ca89d5423
[08:51] `jpg, well other than an out of sync mirror, that ought to not happen
[08:51] `jpg, the thermald complaint is perhaps indicative of something, as that should always exist at some version ?
[08:51] `jpg, do you have a fuller log of the build failing and particularly all of the output of the apt bits that fail
[08:52] <`jpg> Yeah, here is one from a few days ago: https://github.com/flynn/flynn/issues/1829
[08:54] <`jpg> Ok, maybe a month ago lol. I have one from today too: https://gist.github.com/josephglanville/6942421412c920d9c0af
[08:55] `jpg, and you do an apt-get update somewhere i assume ?
[08:56] <`jpg> Yep, it happens immediately before apt-get install
[08:56] `jpg, and no errors or anything from there, can you paste that bit too ?
[08:58] <`jpg> https://gist.github.com/josephglanville/65a1f4e3dfee65e6e71b
[08:58] `jpg, and this was "last night" ... exactly how long ago ?
[09:00] <`jpg> 2015-10-07 at 6:30 UTC
[09:00] henrix, when did -30 go out into the archive, it was 3-4 days ago right ?
[09:01] apw: yeah, on Monday iirc
[09:02] henrix, that was the emergency one right, and that wasn't an abi bumper i assume
[09:02] apw: correct (for both questions)
[09:03] guys, do we have any knowledge of incompatibilities b/w LSI's mpt2sas driver versions and the firmware of the adapter ?
[09:03] i.e. do we know if running mpt2sas version 18.00.00 on an adapter with newer f/w (20.00.00) can be an issue ?
[09:03] caribou, i don't think i have any specific knowledge no, sorry
[09:04] apw: no worry, I can't find anything relevant on their website either
[09:04] caribou, it is worth looking to see if the driver asks the version and changes behavior, which would imply they expect it to work
[09:05] apw: good idea, I will look into it
[09:07] `jpg, the timing of this is confusing, a -30 kernel and therefore that missing package would have been in the archive for many days at the time that failed, so it is hard to see how the mirror could be that out of sync for that long
[09:08] `jpg, around the time you did that a new kernel did drop into -proposed but your sources say you aren't using it
[09:08] <`jpg> apw indeed, especially considering I switched to archive.ubuntu.com to combat this exact issue.
[09:09] `jpg, also as it doesn't attempt to download anything nor fail doing so that implies you have this inconsistency in the local apt information (i think)
[09:09] `jpg, that linux-meta is updated and linux is not in that update. and that seems wack
[09:10] <`jpg> apw agreed. this happens on an ec2 instance that was previously using an ec2 provided ubuntu mirror before the script runs to switch it to the ubuntu archive and run apt-get update.
[09:10] <`jpg> apw are there further steps that should be taken when switching the apt sources on an already installed system?
[09:11] `jpg, apt-get update (which succeeds) should leave you in a good place
[09:11] `jpg, i think i would advise trying to get more information out of the instance when this fails (i have no concrete answers)
[09:12] <`jpg> apw yeah I will try to tell it not to terminate one so I can inspect it after the build fails.
[09:12] `jpg, an "apt-cache show 'linux*'" perhaps would be instructive on failure
[09:12] <`jpg> apw cheers, will do that
[09:13] `jpg, so we can tell if what you see is -29 or -31 in this case, or indeed none at all
[09:13] `jpg, this is the kind of instability one expects if using -proposed, which you are not, so that's very odd
[09:13] This is not a mirror issue.
[09:14] ^ which is also backed up by changing mirrors not having any effect
[09:14] The thermald bit confirms that, even if the -30 thing wasn't already obviously a red herring.
[09:14] yeah that ought to always be there, and it is an unversioned recommends
[09:14] <`jpg> Yeah that bit confuses me greatly.
[09:15] `jpg: Can I get a full console output of everything from update to failure, rather than snippets from different days?
[09:16] `jpg: And immediately after the failure, "apt-cache policy thermald" might be interesting, as well as "ls -l /var/lib/apt/lists"
[09:16] `jpg: But, I can tell you from a distance that it's not a broken mirror. It's a local breakage of some sort.
[09:16] And a rather creative one, I wager.
[09:16] <`jpg> infinity that is my conclusion too
[09:17] <`jpg> the scary part is very little happens, this script is all that is run against a clean ubuntu-cloud AMI: https://github.com/flynn/flynn/blob/master/util/packer/ubuntu-14.04/scripts/upgrade.sh
[09:21] <`jpg> infinity: here is a coherent log of the full process of that script running: https://gist.github.com/josephglanville/57d1d1e800fffd2eb1a2
[09:21] `jpg, how is that
[09:21] amazon-ebs: + [[ amazon-ebs ==
[09:22] occurring ... as the right hand side of that in the script is a constant string
[09:22] `jpg: What it looks like to me is that, somehow, you're losing state for the "trusty" release pocket.
[09:22] `jpg: So, you still have trusty-updates, but none of the packages it depends on.
[09:22] How or why that would happen is a bit of a mystery.
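The three diagnostics suggested above ("apt-cache show 'linux*'", "apt-cache policy thermald" and "ls -l /var/lib/apt/lists") could be bundled into one helper that the build script calls when the install step fails. This is only a sketch: the function name collect_apt_state, the output directory layout and the file names are assumptions made for illustration and do not come from the flynn script.

```shell
#!/bin/sh
# Hypothetical helper: snapshot apt's local state right after a failed install.
# The three commands are the ones suggested in the channel; the function name
# and the output file names are assumptions of this sketch.
collect_apt_state() {
    out_dir=$1
    lists_dir=${2:-/var/lib/apt/lists}

    mkdir -p "$out_dir" || return 1

    # Which versions and origins does apt believe exist for thermald?
    apt-cache policy thermald > "$out_dir/policy-thermald.txt" 2>&1 || true

    # Every linux* package record apt currently knows about.
    apt-cache show 'linux*' > "$out_dir/show-linux.txt" 2>&1 || true

    # Which index files were actually downloaded by apt-get update?
    ls -l "$lists_dir" > "$out_dir/lists.txt" 2>&1 || true
}

# Possible use in a build script (not executed here):
#   apt-get install -y linux-image-generic-lts-vivid || collect_apt_state /tmp/apt-debug
```

Each command is allowed to fail without aborting the helper, since the point is to capture whatever state exists on a broken instance before it is terminated.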
[09:23] <`jpg> yeah, especially considering the first line of sources.list is: deb http://archive.ubuntu.com/ubuntu trusty main universe
[09:23] Can you get a shell on one of these immediately after the failure?
[09:24] <`jpg> Not easily, I'm not able to reproduce reliably.
[09:25] and i would say that Get:14 in that log ought to be getting that info
[09:26] You'd think, right?
[09:26] infinity, should i be surprised it is not a Hit: ?
[09:26] Which is why I'm curious to see the state at failure.
[09:26] <`jpg> I will set it up so it keeps the EBS volumes for a while though.
[09:26] as it is a static thing ?
[09:27] apw: Depends on how the image is built. This is first boot, so /var/lib/apt might not be populated at all, to save space.
[09:27] infinity, there are other Hits in the list
[09:27] but not many
[09:28] apw: Only one. Which is, in itself, a bit weird.
[09:28] `jpg, did you say this was in an instance that was previously updated against the ec2 mirrors before this bit occurs ?
[09:28] And, indeed, might relate.
[09:28] Since it's the trusty Release sig that hits instead of getting.
[09:28] Which, if bogus, would mean trusty would be unauthenticated.
[09:28] And apt will refuse to offer packages from unauthenticated sources without forcing.
[09:28] But also, WTF, buttercup?
[09:29] I wonder if there's a proxy here playing nutty butters.
[09:29] <`jpg> apw it uses an ubuntu-cloud AMI, no updates happen or anything before this script runs
[09:29] <`jpg> It literally boots instance from AMI, then runs this over SSH.
[09:29] This may well be a subtle apt bug. But not one I've ever seen.
[09:30] <`jpg> infinity that would be just my luck heh, i have a habit of finding obscure failure modes. :(
[09:30] why are those three lines missing their right hand sides i wonder
[09:30] Get:13->15
[09:30] <`jpg> apw that is the formatter unfortunately
[09:30] `jpg: How often does this happen? Like, infrequent enough (1 in 1000, less?) to chalk up to a race somewhere, or is it more like 1 in 10?
[09:31] <`jpg> 1 in 10 or so.
[09:31] Okay. That's busted.
[09:31] And, one would think, reproducible.
[09:31] Unless there's also a transparent proxy muddying the waters on your behalf.
[09:31] `jpg, do you have a date for the previous one you mentioned
[09:32] i ask because this one occurred right about the time that the new -proposed was copying out
[09:32] which may be utter coincidence
[09:32] <`jpg> Hmm, I could probably get all dates of all failures since 2014 with a bit of grep magic.
[09:32] might be worth checking them against the publish dates for kernels, as 1/10 i assume means one about every 10 days or so
[09:32] apw: It's pure coincidence. The problem isn't proposed/updates at all, it's the release pocket. At least, for that failure.
[09:33] infinity, fair
[09:33] but getting dates would give us a better frequency rate too
[09:34] apw: Pulled my proxy config out, and trying in a loop to reproduce.
[09:34] <`jpg> https://gist.github.com/josephglanville/61c0bf8a9b17beeac54c
[09:34] apw: I'm kinda wanting to blame ec2 somehow.
[09:35] <`jpg> infinity it wouldn't surprise me, this only occurs in ec2.
[09:35] `jpg: And this runs how often? Daily?
[09:35] <`jpg> 2x a day, every 12 hours.
[09:35] Okay, yeah, that failure rate is insanely high, then.
[09:35] So, not a mirror pulse race, unless you're amazingly unlucky at cron.
[09:36] jpds: Do you have a throwaway instance in the exact same environment that you can experiment on?
[09:36] <`jpg> infinity i can make one
[09:37] <`jpg> It's just ami-c135f3aa in us-east-1
[09:37] # for i in `seq 1 1000`; do rm /var/lib/apt/lists/*Release* ; apt-get update 2>&1 | grep Release; done
[09:37] <`jpg> kk
[09:37] `jpg: ^-- The above might be interesting.
[09:38] I see Ubuntu-lts-4.2 tags in trusty's kernel tree; do we plan to release 4.2 kernels in Trusty ?
[09:38] caribou: Yes.
[09:38] caribou: lts-* kernels are backported all the way to the next LTS.
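The 09:37 one-liner prints the Release lines to the terminal only. A variant that keeps them in a logfile, with a marker per iteration so a bad run can be found afterwards, might look like the sketch below. The function names release_loop and apt_update, the UPDATE_CMD override (which lets the loop be dry-run against a stub) and the log format are all assumptions of this sketch, not something taken from the channel.

```shell
#!/bin/sh
# Sketch of the reproduction loop with a logfile. UPDATE_CMD may name a
# replacement command or shell function for dry runs (an assumption of this
# sketch); by default the real "apt-get update" is used.
apt_update() {
    apt-get update
}

release_loop() {
    runs=$1
    log=$2
    lists_dir=${3:-/var/lib/apt/lists}

    i=1
    while [ "$i" -le "$runs" ]; do
        # Force apt to re-fetch the Release files on every iteration.
        rm -f "$lists_dir"/*Release*
        echo "=== run $i ===" >> "$log"
        # Keep only the Release-related lines, as in the original one-liner.
        "${UPDATE_CMD:-apt_update}" 2>&1 | grep Release >> "$log" || true
        i=$((i + 1))
    done
}

# Possible use as root on a throwaway instance (not executed here):
#   release_loop 1000 /tmp/release-loop.log
```

Afterwards, something like "grep -c Hit /tmp/release-loop.log" would show whether any Release file was ever reported as a Hit rather than a Get, which is the oddity discussed above.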
[09:38] caribou: See, eg, linux-lts-trusty in precise.
[09:38] caribou, ^ that
[09:39] infinity: good, that's what I thought; I just couldn't locate the package. I guess it's not out yet
[09:39] caribou: It's not, cause wily's not out yet.
[09:39] caribou, no they are not normally published in trusty itself until wily releases
[09:39] caribou, they are being built in our PPA i believe
[09:40] apw: well, that sort of answers my previous mpt2sas modules question : 4.2 has 20.00.00 module in it
[09:40] * infinity retries his loop with, like, a logfile.
[09:40] infinity: apw: thanks for the details
[09:41] caribou: I'm not sure that answers anything regarding that question. :P
[09:41] caribou: Since we don't force people to install the shiny new kernel.
[09:42] infinity: true, but in this case, they want to know if an HBA running f/w 20.00.00 would work with the 18.00.00 driver which is in the 3.19-lts kernel
[09:42] caribou: Right, which I'm saying is still a relevant question. Cause we're not dropping the 3.19 kernel.
[09:42] Nor forcing people to install 4.2
[09:43] (Not to mention 3.13 and 3.16...)
[09:43] infinity: the question being : Can Ubuntu certify that a HBA running f/w 20.00.00 will work fine with the 18.00.00 driver from the 3.19 kernel
[09:43] caribou: I should hope so.
[09:44] caribou: Unless LSI are complete morons.
[09:44] infinity: I'm not able to identify any source of information saying Yay or Nay on this
[09:50] caribou, is someone asking if it will, or asking for paperwork to say it will, or has one which doesn't
[09:51] apw: Sounds like a customer is asking for certification reassurance.
[09:51] apw: 20.00.00 f/w fixes a problem for them so they want to be sure that it will not create other issues
[09:51] i am not sure it is ever possible to assert a negative like that
[09:51] now that I know that the 20.00.00 driver will soon be available, that should be enough
[09:52] caribou, it may be worth asking cert if they have certed that combination
[09:52] apw: good idea, will do
[09:55] caribou, apw - i'm not in certification anymore but i don't think we've ever certified individual components, only whole systems (officially)
[09:56] well, I think I found the answer in the RL of the 20.00.00 f/w
[09:56] brendand, yeah i was thinking a system with that combination indeed
[09:56] they state that the 20.00.00 f/w is supported on Ubuntu14.04LTS
[09:56] (3.13.0-24-generic)
[09:57] caribou, ok good
[09:57] and this one doesn't have the corresponding 20.00.00 module
[09:57] case closed!
[09:58] caribou, apw - http://www.ubuntu.com/certification/catalog/search/?query=HBA btw
[09:58] brendand: thanks!
[10:08] stgraber, tyhicks, we seem to have a lot of lxc test suite failures all of a sudden; hanging at the end of the symlink test
[10:09] (after that test is reported, so while presumably running lxc-test-ubuntu)
[10:09] apw: the LXC image server was down for 5 minutes earlier, you may have been unlucky and run right at that time :)
[10:10] apw: basically during the same time I was disconnected from IRC (rebooting my main server for kernel update)
[10:10] stgraber, would the ubuntu test be the only one affected ?
[10:10] (or the first one affected ?)
[10:11] well, depends on timing I guess, it only lasted 5min and the testsuite definitely takes more than 5min to run :)
[10:11] we have multiple failures, over i think a longer period than that which have the same stopping point
[10:11] apw: do you have the failure log somewhere?
[10:11] https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-trusty/trusty/amd64/l/lxc/20151007_094141@/log.gz
[10:12] the ubuntu test actually doesn't hit our image server, it does however do a full debootstrap and grab a cloud image, so if the archive is messed up somehow, that'd break it
[10:12] stgraber, that is just one instance, we have a couple of others too, and two in hang right now which will explode at the 5 hour mark
[10:13] I don't suppose someone can grab us a ps output from one of those stuck instances? since the test isn't actually failing, we're not getting the full console output that'd be needed to figure this out
[10:15] if I had to guess, I'd guess it's either debootstrap getting stuck on asking a question or something (which would be a critical bug in the archive) or the test runner's having a hard time reaching the cloud image server
[10:15] stgraber, lets go ask pitti
[10:16] apw: I'll do a manual run of the debootstrap side of things at home see if that's the issue
[10:19] stgraber: apw: fyi, i'm also trying to reproduce it here in a VM by running the lxc-test-ubuntu (which is the one that seems to be failing)
[10:19] stgraber, thanks
[10:19] henrix, nice
[10:21] hmm... i'm actually getting a failure too... it starts failing downloading a bunch of packages
[10:22] * henrix wonders if it could be a disk space prob, as there's only 2G left in that VM
[10:22] a debootstrap worked fine here
[10:23] actually running the whole lxc-test-ubuntu now
[10:29] lxc-test-ubuntu succeeded here
[10:31] ok, i'm rerunning (with the vivid kernel in -updates, which should pass)
[10:31] (i was actually running with an older kernel before)
[10:33] ah yeah, I'm running on wily but not with the proposed kernel
[10:34] I can't see why the other tests would pass if it actually was a kernel bug
[10:36] grrr! i just accidentally killed the VM i was running the test on!
[10:38] but it seemed to be running ok now, i didn't see any errors downloading packages
[10:40] `jpg: So, in an entertaining twist, we just rolled out an archive change that might accidentally fix your issue.
[10:41] <`jpg> infinity oh?
[10:43] `jpg: We just deployed InRelease (inline-signed Release files) support, which apt will favour in >= trusty.
[10:43] `jpg: So, if the problem was Release/Release.gpg disagreeing occasionally, that will go poof.
[10:43] <`jpg> infinity hah, neat
[10:44] `jpg: Do you have the log from your loop?
[10:44] `jpg: Curious if it ever hit the issue.
[10:56] <`jpg> infinity loop seems to have been stuck at Get:4 http://archive.ubuntu.com trusty-updates InRelease [64.4 kB]. I started it maybe 20 mins ago.
[10:57] <`jpg> Oh, it's moving. Just really slowly.
[11:00] stgraber: apw: ok, i confirm i was *not* able to reproduce with the vivid kernel currently on -updates. i'm not sure what happened before, with the error i had downloading packages
[11:00] henrix, ok so i guess we care if you can with the one in -proposed ...
[11:01] apw: yeah, i'm now upgrading into -proposed and re-running just to confirm
[11:04] `jpg: Uhm, yeah, seeing that here too.
[11:06] fwiw maybe bug 1503655
[11:06] bug 1503655 in linux (Ubuntu) "Kernel bug in eventpoll_release_file+0x46/0xa0 with 3.13.0-66.107" [Undecided,Incomplete] https://launchpad.net/bugs/1503655
[12:47] `jpg: So, good news and bad news. Good news, we fixed the mirrors. Bad news, we fixed them by reverting the change that I said would mask your problem. :P
[12:47] <`jpg> infinity haha
[12:47] <`jpg> Such is life. :P
[12:48] <`jpg> Do you think wiping out /var/lib/apt/lists/* before doing apt-get update is likely to work around the problem?
[12:48] smb, I've a similar stack dump on my Trusty laptop, re: bug #1503655. I went back to 3.13.0-65
[12:48] bug 1503655 in linux (Ubuntu) "Kernel bug in eventpoll_release_file+0x46/0xa0 with 3.13.0-66.107" [Undecided,Confirmed] https://launchpad.net/bugs/1503655
[12:48] `jpg: Depends on what the problem really is, but it might make it more obvious and debuggable? :)
[12:49] <`jpg> Ok, I think I might drop that in there for now then and add some debug code for failures and hopefully something shows up. :)
[12:50] `jpg: It could prove to be a heisenbug that goes away due to your attempts to debug it.
[12:50] `jpg: I suppose there could be worse results.
[12:51] <`jpg> infinity indeed. Right now I just need the builds to work reliably lol.
[12:52] rtg, yeah. seems henrix also had a similar one for vivid-proposed as well. we are digging into it (well henrix doing bisect)
[12:53] smb, I wonder if it was CVE-2015-2925 causing regressions
[12:54] rtg, your wily kernel is blowing the same way
[12:54] [ 22.977677] CPU: 3 PID: 2606 Comm: pulseaudio Tainted: P W OE 4.2.0-15-generic #17-Ubuntu
[12:54] [ 22.978082] [] ep_unregister_pollwait.isra.7+0x6c/0x90
[12:54] apw, then it's almost certainly CVE-2015-2925. everything else is unrelated
[12:54] so i assume whatever it is is in 4.2.x stable too
[12:55] rtg, hm, cannot find that number in git log...
[12:55] apw, oh, forgot about 4.2.3
[12:56] rtg, I am trying with reverting the aufs3 change
[12:57] smb, I assume that code isn't even run until you mount an overlay file system
[12:57] rtg, no these are in madvise
[12:57] rtg, right, somehow the title misled me first as well
[12:57] ah. maybe I'll just keep my mouth closed :)
[12:58] smb, that does look rather suspicious, i may have utterly spaced the backport there
[13:00] smb, yep they are UTTERLY wrong
[13:00] henrix, those aufs3 fixes i did are just wrong
[13:01] apw, hm, I am looking at the patch in the debian tracker... and cannot see that wrongness directly
[13:01] and they are so utterly and obviously wrong i don't know how i didn't notice
[13:01] smb, when the function changes it should change to (file) or (f)
[13:01] apw: yeah, i was just comparing them with debian and... they are different
[13:01] apw, oh!
[13:01] and wrong, so very wrong
[13:02] henrix, what do you want me to do, submit fixed ones ?
[13:02] Respin city today, then?
[13:02] apw, not that I saw that when reviewing
[13:02] when isn't it
[13:02] smb, i know, i didn't when reviewing either
[13:02] apw: I didn't catch up on backscroll, is this only in -proposed (and wily) kernels?
[13:02] only in proposed yes
[13:02] apw: whatever is easier for you: either a fix, or a revert+correct fix
[13:02] Kay, shiny. Yay for actually catching bugs in proposed.
[13:02] infinity, proposed and all
[13:03] infinity, yep adt blowing chunks, then smb blowing chunks, when my laptop :)
[13:03] It's nice when our process actually has the desired effect. :P
[13:03] henrix, i say lets revert and apply a new set
[13:04] apw: ack. want me to drive it? i can pick the new patch and submit it if you're busy
[13:04] apw: also, i guess i can abort the bisect, or would you like me to continue?
[13:04] (just to confirm... although that's probably it)
[13:04] henrix, up to you, but that is soooo broken that who knows what it would do
[13:05] henrix, is it easier if i clean up my mess ?
[13:05] apw: ok, so i'll abort and will test the fix before respinning
[13:05] henrix, sounds good
[13:06] apw, push a fix on wily as well
[13:06] rtg, will go there first indeed
[13:07] apw, once done, I'll rejigger and upload just that patch which will clobber the kernel in proposed.
[13:07] infinity, ^^
[13:10] rtg: No ABI bump, or do I need to go NBS-hunting in proposed?
[13:10] s/do I/will I/
[13:11] infinity, prolly no ABI bump, but I'll have to mess with things since I haven't downloaded ABIs in awhile.
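Earlier in this log (12:48), `jpg asks whether wiping /var/lib/apt/lists/* before apt-get update would work around the install failure, and plans to add debug output for failures. A minimal sketch of that idea follows. The function name fresh_update_and_install and the APT_GET override (used so the flow can be exercised against a stub) are hypothetical, and, as noted in the channel, it is not known whether this actually avoids the underlying problem.

```shell
#!/bin/sh
# Hypothetical wrapper: clear cached apt index files, update, install, and
# print some debug information if the install fails. APT_GET may point at a
# stub command or shell function for dry runs (an assumption of this sketch).
fresh_update_and_install() {
    pkg=$1
    lists_dir=${2:-/var/lib/apt/lists}
    apt=${APT_GET:-apt-get}

    # Remove cached index files so apt must fetch everything again.
    # Only regular files are removed, so the partial/ directory survives.
    find "$lists_dir" -maxdepth 1 -type f -exec rm -f {} +

    "$apt" update || return 1

    if ! "$apt" install -y "$pkg"; then
        echo "install of $pkg failed; index files present:" >&2
        ls -l "$lists_dir" >&2
        return 1
    fi
}

# Possible use as root (not executed here):
#   fresh_update_and_install linux-image-generic-lts-vivid
```

Listing the index directory on failure is meant to show whether the files for the plain "trusty" pocket are missing, which is the theory raised earlier in the discussion.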
[13:12] judicious application of ignore files
[13:12] rtg: Well, fetchabis should Just Work, no? The current kernel's in the archive on all arches.
[13:12] rtg: I see no reason to ignore, just use the machinery correctly. :P
[13:12] in this case I could
[13:13] rtg: The alternative, and always my favourite, is no new changelog entry.
[13:14] what a heinous thought
[13:15] we tend to not think of uploads in -proposed that don't make -updates, as mattering
[15:15] rtg: hey, quick questions: kernel module signing is only performed when -signed packages are installed, correct? also, if the bootloader fails to verify the kernel and falls back to uefi quirks disabled, does module verification still happen?
[15:17] jdstrand, as far as I know modules are still loaded, even if the signature fails (regardless of boot loader). I could be wrong, though. perhaps apw knows better ?
[15:18] I don't think we have strict enforcement enabled, but I'm gonna have to check.
[15:18] ok
[15:20] enforcing secure boot by default is being worked on by foundations (there will be a secure way to disable that), so we'll want to be able to enforce it for modules too at some point
[15:20] (and that point may be soonish-- we can discuss in the meeting later if needed)
[15:20] jdstrand, ack
[15:23] jdstrand, CONFIG_MODULE_SIG_FORCE=n - Reject unsigned modules or signed modules for which we don't have a key. Without this, such modules will simply taint the kernel.
[15:23] cool, thanks
[15:24] jdstrand, the kernel does bitch about it though
[15:24] * jdstrand nods
[15:24] * jdstrand is documenting the current situation
[15:32] the kernel provided in the dvd (vmliz.efi) is compiled with some specific parameters?
[15:32] *vmlinuz.efi
[15:36] Have you guys noticed bug #1503647? Seems to be an urgent issue.
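For the module signing question above (15:15 to 15:24), the build-time setting can be read from a kernel config file. The sketch below classifies a config as enforcing signatures, only tainting on unsigned modules, or not checking at all. The function name module_sig_status and its three output labels are assumptions of this sketch; the default path /boot/config-$(uname -r) is where Ubuntu kernels normally ship their config.

```shell
#!/bin/sh
# Sketch: report how a kernel config treats module signatures.
# The function name and the three labels printed are hypothetical.
module_sig_status() {
    config=${1:-/boot/config-$(uname -r)}

    if [ ! -r "$config" ]; then
        echo "cannot read $config" >&2
        return 1
    fi

    if grep -q '^CONFIG_MODULE_SIG_FORCE=y' "$config"; then
        # Unsigned modules, or modules signed with an unknown key, are rejected.
        echo enforced
    elif grep -q '^CONFIG_MODULE_SIG=y' "$config"; then
        # Signatures are checked, but a failure only taints the kernel.
        echo taint-only
    else
        echo disabled
    fi
}

# Possible use (not executed here):
#   module_sig_status
```

With CONFIG_MODULE_SIG_FORCE off, as quoted at 15:23, this would print taint-only on a kernel built with CONFIG_MODULE_SIG=y.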
[15:36] bug 1503647 in linux (Ubuntu) "System hangs with kernel 3.19.0-31" [Critical,Confirmed] https://launchpad.net/bugs/1503647
[15:40] GunnarHj: this could be a duplicate of bug #1503655
[15:40] bug 1503655 in linux (Ubuntu Wily) "Kernel bug in eventpoll_release_file+0x46/0xa0 with 3.13.0-66.107" [Undecided,Confirmed] https://launchpad.net/bugs/1503655
[15:41] GunnarHj: it's being fixed at the moment
[15:41] GunnarHj: i'll comment in the bug with a link to a test kernel
[15:42] henrix: Really? It's not the same kernel version.
[15:42] GunnarHj: yeah, but we found this issue in several kernels (3.13, 3.16, 3.19 and even 4.2)
[15:43] GunnarHj: the fix is already uploaded, should be in -proposed soon
[15:43] (assuming it's actually the same issue)
[15:43] henrix: Ok, I'm certainly not an expert on these things. ;) Thanks!
[15:51] someone can help me, please? (see above)
[16:34] i've recompiled the vmlinuz.efi, but i'm unable to boot the dvd in the uefi mode, i'm kicked in busybox shell (initramfs). In bios mode it boots fine. Any hints?