robru | jhodapp: that looks like a network error to me, feel free to retry | 00:21 |
---|---|---|
imgbot | === IMAGE 71 building (started: 20150120-02:10) === | 02:10 |
jhodapp | robru, cool | 02:14 |
imgbot | === IMAGE RTM 203 building (started: 20150120-03:10) === | 03:10 |
imgbot | === IMAGE 71 DONE (finished: 20150120-03:20) === | 03:20 |
imgbot | === changelog: http://people.canonical.com/~ogra/touch-image-stats/71.changes === | 03:20 |
=== chihchun_afk is now known as chihchun | ||
elopio | ping cihelp: somebody around to check some weird errors on the devices? | 04:01 |
elopio | https://jenkins.qa.ubuntu.com/job/generic-deb-autopilot-runner-vivid-mako/752/ | 04:02 |
elopio | this was not happening a couple of hours ago. Things fail on settle, and we are getting crashes. | 04:02 |
imgbot | === IMAGE RTM 203 DONE (finished: 20150120-04:20) === | 04:20 |
imgbot | === changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/203.changes === | 04:20 |
michi | ci-help: cloud-worker-09 is running ridiculously slowly, causing our tests to fail. Can someone please fix it or take it off-line? | 07:29 |
robru | michi, can't help you, but ping cihelp for a response (no hyphen) | 08:25 |
michi | robru: Ah, thanks! Sorry for that! | 08:25 |
robru | michi, you're welcome. Bedtime for me! | 08:26 |
michi | Sure. Thanks for the heads-up! | 08:26 |
vila | michi: which job are you seeing ? The history for cloud-worker-09 (http://s-jenkins.ubuntu-ci:8080/computer/cloud-worker-09/builds) doesn't seem that bad | 08:42 |
michi | vila: sec… | 08:42 |
michi | vila: Maybe it’s not limited to cloud-worker-09 | 08:44 |
michi | Basically, we are seeing totally ridiculous test failures. They can happen only if the machine runs like a dog. | 08:44 |
michi | As in less than 1/5th the speed of a Nexus 4 | 08:44 |
vila | michi: I need to start somewhere... We're still investigating why *some* workers suddenly become slow | 08:45 |
michi | vila: right. Let me find one of the silly failures. | 08:45 |
michi | vila: Here is one: http://s-jenkins.ubuntu-ci:8080/job/unity-scopes-api-vivid-amd64-ci/57/console | 08:47 |
michi | It’s *impossible* for test 8 to fail unless the machine is seriously ill. | 08:47 |
vila | michi: right, I'm not denying slow workers lead to failures ;) | 08:48 |
michi | OK. I just wanted to re-assure you that I’m not blaming you for something that’s our fault :) | 08:49 |
vila | michi: hehe, thanks ;) | 08:49 |
michi | Basically, when our tests fail, we see a pattern that’s occurred before. | 08:49 |
michi | Every now and then, a node on Jenkins goes really slow, and then tests blow up in random places. | 08:49 |
vila | michi: yeah, it's hard to track, so far workers 06, 08, 11, 12 and now 09 have been caught slacking... | 08:50 |
vila | and 03 too... | 08:50 |
vila | sometimes we can point to CPU or mem starvation, sometimes we look and all is well, a pain... | 08:51 |
michi | :( | 08:51 |
michi | How many jobs are on those nodes concurrently? | 08:51 |
vila | and 05... | 08:51 |
michi | Normally just one build at a time, I thought? | 08:51 |
vila | yes | 08:51 |
michi | So, if memory goes missing, that must be caused by the previous run of some build. | 08:52 |
michi | In other words, that would point at a really low level VM bug or some such. | 08:52 |
vila | michi: sorry, I was unclear, those are cloud instances, the starvation is at the cloud level | 08:52 |
vila | not at the level of the worker itself | 08:52 |
michi | Aha | 08:52 |
michi | So a whole bunch of nodes can slow down, presumably because the hardware for those VMs is under-provisioned. | 08:53 |
vila | michi: something like that... :-/ | 08:56 |
pstolowski | trainguards hey, we don't need landing-021 anymore, you can free this silo, thanks! | 09:00 |
sil2100 | pstolowski: ok, will do, thanks for the info! | 09:00 |
vila | michi: s-jenkins is behaving weirdly but it seems http://s-jenkins.ubuntu-ci:8080/job/unity-scopes-api-vivid-amd64-ci/62/consoleFull was successful... | 09:07 |
michi | vila: yes, occasionally one happens to work. | 09:07 |
* vila cries | 09:07 | |
michi | I feel with you :) | 09:08 |
* vila chases 04 instead | 09:10 | |
vila | michi: the plot goes more obscure, but I see weird things on cloud-worker-10: kern.log is full of lines like: Jan 20 08:01:33 juju-jenkins-stack-prod-machine-13 kernel: [3988674.042973] .mir_unit_tests[17386]: segfault at 0 ip 0000000001607241 sp 00007fff7a02f4d0 error 4 in .mir_unit_tests-uninstalled[400000+2996000] | 09:49 |
vila | michi: that's where http://s-jenkins.ubuntu-ci:8080/job/unity-scopes-api-vivid-amd64-ci/61 is running, swapping like mad (~2G) and apparently blocking job #62 from finishing (even if the console says it's successful) | 09:50 |
vila | michi: does that remotely ring any sort of bell ? | 09:51 |
michi | No. | 09:51 |
michi | mir has nothing to do with us. | 09:51 |
michi | If mir core dumps in some of its tests, that shouldn’t affect anything else on the machine though. | 09:52 |
michi | It’s just a core dump, after all. | 09:52 |
vila | isn't it ? :-/ | 09:52 |
vila | swap divided by 2 already 833504 used | 09:53 |
michi | vila: I’m very sure it’s not anything we are doing in unity-scopes-api :) | 09:55 |
vila | michi: hehe, lucky you, I wish I could be sure of *something* ;) | 09:56 |
vila | michi: #61 finished, tests appear to be successful yet the job is marked as failed, can you see why ? | 10:15 |
vila | michi: sry, the url is http://s-jenkins.ubuntu-ci:8080/job/unity-scopes-api-vivid-amd64-ci/61/consoleFull | 10:17 |
vila | oh, the last test was interrupted maybe ? | 10:18 |
oSoMoN | trainguards: can silo 16 be published, please? | 10:45 |
sil2100 | oSoMoN: doing! | 10:45 |
sil2100 | Ah, it's this ;) | 10:46 |
sil2100 | oSoMoN: +1, releasing | 10:47 |
oSoMoN | cheers! | 10:51 |
rhuddie | cihelp, is there a known problem with makos currently? We are seeing some strange failures in autopilot: https://jenkins.qa.ubuntu.com/job/generic-deb-autopilot-runner-vivid-mako/757/testReport/autopilot.tests.unit.test_introspection_search/ProxyObjectTests/test_find_matching_connections_attempts_multiple_times/ | 11:08 |
psivaa_ | rhuddie: let me take a look | 11:08 |
rhuddie | psivaa_, thank you | 11:09 |
psivaa_ | rhuddie: I remember seeing this traceback before: http://paste.ubuntu.com/9793526/ probably something for the autopilot experts to take a look at | 11:20 |
rhuddie | psivaa_, is it possible the mako might need reboot/refresh ? or some environment issue? | 11:23 |
rhuddie | as we were not able to reproduce this locally | 11:24 |
=== MacSlow is now known as MacSlow|lunch | ||
psivaa_ | rhuddie: the devices are rebooted anyway for each test (by using reboot-and-unlock.sh) and reflashed (the latest being image 71) | 11:29 |
psivaa_ | rhuddie: and i see image 71 does not have this issue | 11:29 |
psivaa_ | rhuddie: so this boils down to the autopilot version that *this test is trying to test | 11:29 |
psivaa_ | rhuddie: i.e. python3-autopilot_1.5.0+14.10.20140806bzr527pkg0vivid865+autopilot0_all.deb | 11:29 |
psivaa_ | rhuddie: so again, that's autopilot guys :) | 11:30 |
rhuddie | psivaa_, thanks | 11:31 |
psivaa_ | rhuddie: yw | 11:31 |
=== alan_g is now known as alan_g|afk | ||
=== alan_g|afk is now known as alan_g | ||
sil2100 | o/ | 12:26 |
sil2100 | bzoltan_: https://code.launchpad.net/~fboucault/ubuntu-ui-toolkit/activity_indicator_animator_rtm/+merge/246631 needs to be approved | 12:27 |
bzoltan_ | sil2100: this is embarrassing ... i am sorry | 12:28 |
alan_g | cihelp we're seeing a lot of "error: device not found" failures - e.g. https://jenkins.qa.ubuntu.com/job/mir-mediumtests-runner-mako/3999/console - can you help? | 12:46 |
satoris_ | ping trainguards, something seems to be broken: https://jenkins.qa.ubuntu.com/job/thumbnailer-vivid-amd64-ci/11/console | 12:50 |
boiko | trainguards: could you please check why the "Reconfigure Silo" of row 60 is not a link? I added some extra MRs there | 12:53 |
sil2100 | boiko: looking | 12:58 |
sil2100 | satoris_: that's more like a thing for cihelp ^ | 12:58 |
sil2100 | boiko: hm, looks like a link to me | 12:59 |
sil2100 | boiko: could you try to refresh the page? | 12:59 |
boiko | sil2100: sure, let me see | 13:00 |
boiko | sil2100: oh, refreshing the page did fix it, sorry for the noise, and thanks for looking into that :) | 13:00 |
=== alan_g is now known as alan_g|lunch | ||
satoris_ | ping cihelp, there seems to be something wonky with build machines: https://jenkins.qa.ubuntu.com/job/thumbnailer-vivid-amd64-ci/11/console | 13:01 |
jibel | Elleo, when do you plan to update silo 2 with the additional dep for autopilot and the bug fix you wanted to add? | 13:06 |
Elleo | jibel: they've just landed in trunk, I'm going to ping bill to update his rtm sync stuff for silo2 when he gets on in about an hour | 13:10 |
jibel | Elleo, OK, thanks | 13:10 |
=== chihchun is now known as chihchun_afk | ||
cprov | satoris_: let me check. | 13:18 |
satoris_ | thanks | 13:22 |
=== MacSlow|lunch is now known as MacSlow | ||
* sil2100 needs to jump out for lunch | 13:28 | |
sil2100 | brb | 13:28 |
cprov | satoris_: something is wrong with some cloud worker (I suspect), I will dig further | 13:33 |
=== alan_g|lunch is now known as alan_g | ||
=== chihchun_afk is now known as chihchun | ||
=== fginther changed the topic of #ubuntu-ci-eng to: Need a silo or CI Train support? ping trainguards | Need help with something else? ping fginther | Train Dashboard: http://bit.ly/1mDv1FS | QA Signoffs: http://bit.ly/1qMAKYd | Known Issues: - | ||
bfiller | sil2100: trying to rebuild ubuntu-keyboard in rtm 2 and keep getting this error: https://ci-train.ubuntu.com/job/ubuntu-rtm-landing-002-1-build/100/console | 14:27 |
sil2100 | bfiller: let me take a look | 14:27 |
=== chihchun is now known as chihchun_afk | ||
sil2100 | bfiller: ah, crap... ok, it seems robru didn't fix that yet | 14:28 |
bfiller | sil2100: any workaround? just need a resync of ubuntu-keyboard | 14:29 |
sil2100 | bfiller: this error pops up when you try to build/work with a package that hasn't been published yet for the given distribution... | 14:29 |
sil2100 | bfiller: let me try pushing the packages directly | 14:29 |
sil2100 | (at least the ones that cause problems) | 14:29 |
bfiller | sil2100: ok, thanks! | 14:29 |
=== fginther changed the topic of #ubuntu-ci-eng to: Need a silo or CI Train support? ping trainguards | Need help with something else? ping cihelp | Train Dashboard: http://bit.ly/1mDv1FS | QA Signoffs: http://bit.ly/1qMAKYd | Known Issues: - | ||
sil2100 | pete-woods: we need to get this approved: https://code.launchpad.net/~unity-api-team/unity-scopes-shell/fix-location-caching-vivid/+merge/246981 | 14:37 |
pete-woods | sil2100: even for vivid? | 14:38 |
pete-woods | oh, sorry the MR | 14:38 |
sil2100 | pete-woods: yeah, no one would want to release un-approved branches anywhere, right? ;) | 14:38 |
pete-woods | d'oh! | 14:38 |
pete-woods | have asked someone to look at it | 14:41 |
sil2100 | cjwatson: hey! Could you maybe copy myspell-hr to ubuntu-rtm? It's a main package so I have no power over it, and it will be required by a landing after it lands :) | 14:42 |
sil2100 | cjwatson: not sure if we can do a binary copy from utopic or not | 14:42 |
sil2100 | cjwatson: CI Train currently has issues handling new packages | 14:43 |
cjwatson | sil2100: done | 14:43 |
sil2100 | cjwatson: wow, that was fast, thanks \o/ | 14:44 |
ogra_ | plars, yo | 14:44 |
plars | ogra_: hi | 14:44 |
ogra_ | plars, so i need to land this adbd change that blocks on locked screen this week ... iirc there was an issue with the lab that you need a newer u-d-f which we didn't solve before the holidays ... do you know if that was solved now ? | 14:45 |
plars | ogra_: right, I pushed a MP on friday to make our stuff work with it and use a recovery image on krillin only (this is the only place where we'll have this problem right?) | 14:46 |
ogra_ | plars, err, no ... this is about adb | 14:47 |
plars | ogra_: A lot of us were out yesterday, so I'm going to try to get someone to review it and get it pushed in today | 14:47 |
plars | ogra_: oh, adb | 14:47 |
ogra_ | adb not accepting connections when the screen is locked | 14:47 |
plars | ogra_: sorry, refresh me... I thought this was about the recovery image update | 14:47 |
ogra_ | there was a u-d-f that puts the right override file in place you need to upgrade to | 14:47 |
ogra_ | not sure which u-d-f version that was, perhaps sergiusens recalls ? | 14:48 |
plars | ogra_: we're currently on 0.10-0ubuntu1 it seems, I'm not sure who updated it though. Could be landscape forced it on us again. We were on the latest previous one before that though | 14:50 |
plars | ogra_: when I looked on friday, we were on 0.4+15.04.20141125-0ubuntu1 | 14:50 |
plars | (of udf) | 14:50 |
sergiusens | ogra_: plars hah, we are at 0.13 now | 14:51 |
sergiusens | ogra_: plars --recovery was introduced on 0.11 iirc | 14:51 |
ogra_ | sergiusens, well, which version had the adbd lockscreen stuff ? | 14:51 |
plars | no, it's in .10 | 14:52 |
ogra_ | this isn't about recovery atm | 14:52 |
plars | Candidate: 0.10-0ubuntu1 is the latest I see for trusty | 14:52 |
sergiusens | ogra_: you want to make me navigate debian/changelog it feels :P | 14:52 |
ogra_ | well, that's what i'm doing here (on the vivid-changes ML though) | 14:53 |
ogra_ | i can't find any entry for this | 14:53 |
sergiusens | * ubuntu-device-flash: --developer-mode extended to now also inhibit | 14:53 |
sergiusens | adb disabling when the screen is locked | 14:53 |
sergiusens | (0.4+15.04.20141104.1-0ubuntu1) | 14:54 |
ogra_ | heh, i didn't go that far back :P | 14:54 |
sergiusens | ogra_: I know; so much changelog :-P | 14:54 |
sergiusens | ogra_: this was during the washington sprint | 14:54 |
sergiusens | I recall asking plars to test now :-P | 14:54 |
sil2100 | bfiller: ok, I'm re-syncing ubuntu-keyboard, let's see how it goes | 14:55 |
ogra_ | plars, well, if you are running 0.10 to provision the krillins i guess we're fine | 14:55 |
bfiller | sil2100: thanks | 14:55 |
sil2100 | bfiller: we might need to remove myspell-hr from the silo list later though, as it's already in the archives now | 14:55 |
bfiller | sil2100: that's fine | 14:56 |
plars | ogra_: I saw some chatter this morning about some device issues yesterday though, so I need to go check into what was going on with that, and if it's related. I'm in a meeting right now though. Let me get back to you on that.. | 14:56 |
ogra_ | plars, ok | 14:56 |
sil2100 | rsalveti: hey! If you have a moment, could you take a look at https://bugs.launchpad.net/ubuntu/+source/goget-ubuntu-touch/+bug/1412495 ? | 14:56 |
ubot5 | Launchpad bug 1412495 in goget-ubuntu-touch (Ubuntu) "ubuntu-emulator fails to start on Vivid" [Undecided,New] | 14:56 |
plars | ogra_: but if .10 is working then we should be ok, is that right? sergiusens? | 14:57 |
rsalveti | sil2100: sure | 14:57 |
sil2100 | Thanks :) | 14:57 |
ogra_ | plars, thats how i understand it, yes | 14:57 |
plars | and 0.4+15.04.20141125-0ubuntu1 is what we were running before for a long time I know | 14:58 |
sergiusens | plars: yes | 15:03 |
sergiusens | plars: you can go all the way to .13, .11 or .12 allows customization tarball overrides in case you plan on adding that as well | 15:04 |
renatu | sil2100, could you check why silo 000 is not updating the ppa packages | 15:38 |
renatu | sil2100, bfiller just pushed a new build but the ppa did not get updated | 15:39 |
sil2100 | renatu: hey! Let me take a look | 15:51 |
renatu | sil2100, thanks | 15:51 |
sil2100 | renatu: so, you want to rebuild sync-monitor in the PPA, right? | 15:52 |
renatu | yes | 15:52 |
sil2100 | renatu: the problem is that the build was started with 'watch-only' selected, which means 'don't do anything, just watch what's up in the PPA' | 15:53 |
sil2100 | renatu: let me rebuild it without that | 15:53 |
renatu | bfiller, ^^ | 15:53 |
renatu | sil2100, thanks | 15:54 |
sil2100 | renatu: yeah, it seems to be building now, yw | 15:54 |
renatu | sil2100, ^^ | 15:54 |
bfiller | renatu, sil2100: ok thanks, I guess I checked that by accident and didn't realize | 15:54 |
sil2100 | ogra_, jibel, davmor2, popey, brendand, robru: I need to skip today's evening meeting - you can still sync up on it if you want, but if there's anything important for me just leave me a ping on IRC | 16:06 |
popey | ok | 16:06 |
ogra_ | sil2100, i would like to skip today as well | 16:06 |
jibel | sil2100, OK | 16:07 |
ogra_ | (my evening is still full of stuff) | 16:07 |
ogra_ | (and it's surely depressing anyway, i guess brendand will just show off his new phone the whole meeting :P ) | 16:08 |
brendand | ogra_, :P | 16:08 |
jgdx | cihelp: Hello, I'm seeing a failure [1] on the u-s-s ci run for RTM on jenkins, but cannot reproduce that on my device. Any clue? :) [1] https://jenkins.qa.ubuntu.com/job/generic-deb-autopilot-runner-14.09-mako/14/testReport/junit/ubuntu_system_settings.tests.test_datetime/TimeDateTestCase/test_same_tz_selection/ | 16:13 |
jgdx | … using krillin | 16:14 |
rsalveti | sil2100: alright, triggering a new rtm build | 16:33 |
sil2100 | rsalveti: ok, thanks ;) | 16:33 |
imgbot | === IMAGE RTM 204 building (started: 20150120-16:40) === | 16:40 |
fginther | jgdx, the first thing that comes to mind is that jenkins is using a mako and not a krillin, hopefully that doesn't matter, but maybe it does? | 16:48 |
pstolowski | cihelp hello, i need help with silo 15 notoriously failing on powerpc with one of our tests... I've just increased the timeout in the test from 2 to 8 seconds and that didn't help... | 17:12 |
elopio | ping cihelp: can somebody please check the last run in this MP: | 17:21 |
elopio | https://code.launchpad.net/~canonical-platform-qa/autopilot/custom-assert-doc/+merge/246963 | 17:21 |
elopio | a couple of bad things there. One of the jobs failed, but jenkins still approved the review. | 17:21 |
elopio | and on the failed test, I see: error: device not found | 17:21 |
alecu | yes, please. | 17:22 |
alecu | that was quick, thanks! | 17:26 |
Ursinha | elopio: alan_g pointed me another job with device not found, plars said he would have a look | 17:26 |
elopio | Ursinha: ok, thanks. | 17:27 |
jgdx | fginther, maybe. Shouldn't though. | 17:38 |
plars | elopio: Ursinha: yeah, I'm looking at it right now. It seems a lot of devices failed and I'm trying to recover them | 17:38 |
elopio | plars: thanks. yesterday we saw a lot of weird crashes. | 17:39 |
plars | I did kill https://code.launchpad.net/~alan-griffiths/mir/MVC-introduce-default-controller-object/+merge/246924 on one which was clearly stuck, I'll restart in a moment after I get that device back up | 17:39 |
imgbot | === IMAGE RTM 204 DONE (finished: 20150120-17:50) === | 17:50 |
imgbot | === changelog: http://people.canonical.com/~ogra/touch-image-stats/rtm/204.changes === | 17:51 |
=== alan_g is now known as alan_g|EOD | ||
=== dpm is now known as dpm-afk | ||
pmcgowan | bfiller, om26er is silo 14 blocked on the thumbs fix? was hoping to have sd card done | 18:23 |
om26er | pmcgowan, kind of yes, I am not able to completely run the test plan due to thumbnail issue. | 18:24 |
bfiller | pmcgowan: yes it's blocked | 18:36 |
bfiller | pmcgowan: we might have to consider reverting thumbnailer, the fixes are not appearing to be trivial | 18:36 |
pmcgowan | bfiller, that's unfortunate although not sure what the thumbnailer change got us | 18:37 |
bfiller | pmcgowan: a slight performance increase first time they are being created, that is it | 18:38 |
pmcgowan | hmmm | 18:38 |
bfiller | pmcgowan: nerochiaro is looking at a gallery fix but there are a few things broken because of the change | 18:38 |
pmcgowan | bfiller, not sure I get why photo roll works but gallery doesnt, do we need more code sharing? | 18:39 |
bfiller | pmcgowan: yup | 18:39 |
bfiller | pmcgowan: and gallery is broken when doing rotation/cropping/editing which is not present in camera | 18:40 |
pmcgowan | I see, seems like we should revert then? | 18:40 |
bfiller | pmcgowan: the black image problem is fixed and was easy, but we are having issues with the others | 18:40 |
bfiller | pmcgowan: let's make a call tomorrow and see if we can get gallery fixed | 18:41 |
bfiller | if not or too risky I'd say revert | 18:41 |
pmcgowan | ok | 18:41 |
nerochiaro | bfiller: if thumbnailer only buys us some perf i would say revert, then when the rtm rush is past release the new image editor which will hopefully make things more maintainable for everyone | 18:42 |
nerochiaro | bfiller: (new image editor + improved photo image provider) | 18:42 |
nerochiaro | bfiller: then we can put back the thumbnailer changes | 18:43 |
robru | jgdx: hm, I just assigned you silo rtm 4 for your request on row 55, just be aware it conflicts with silo rtm 19. | 18:57 |
bfiller | popey: can you review the updated camera-app in the store so we can release it? | 19:04 |
popey | sure thing | 19:05 |
bfiller | popey: thanks | 19:05 |
popey | bfiller: done | 19:06 |
bfiller | popey: nice | 19:07 |
bfiller | pmcgowan: ^^^ new camera-app should be available soon, just approved in the store | 19:07 |
bfiller | Kaleo: ^^ | 19:07 |
popey | it's available now, I just updated my phone fwiw | 19:07 |
bfiller | even better | 19:08 |
pmcgowan | nice | 19:13 |
pmcgowan | hmm don't see it | 19:14 |
pmcgowan | popey, what version of click do you have | 19:15 |
popey | now? phablet@ubuntu-phablet:~$ click list | grep camera | 19:16 |
popey | com.ubuntu.camera 3.0.0.469 | 19:16 |
pmcgowan | why can't I see it in the store | 19:16 |
popey | nice new features! | 19:17 |
pmcgowan | popey, is 3.0.0.469 same as 3.0.0.latest? | 19:18 |
popey | did you manually install it at some point? | 19:18 |
popey | the version in the store is 3.0.0.469, the .latest suffix is common if you built your own (or someone built for you) in qtc | 19:19 |
pmcgowan | I didn't think so | 19:19 |
pmcgowan | must have | 19:19 |
pmcgowan | probably loaded the ppa and forgot | 19:19 |
Kaleo | bfiller: fuck yeah | 19:26 |
=== dpm-afk is now known as dpm | ||
=== dpm is now known as dpm-afk | ||
rsalveti | davmor2: pmcgowan: camera-app is fine with 172/mako | 19:53 |
rsalveti | feel free to update | 19:53 |
davmor2 | rsalveti: thanks | 19:53 |
pmcgowan | rsalveti, will do | 19:54 |
jgdx | robru, thanks. Noted. I'll wait | 20:50 |
robru | jgdx: ah you don't necessarily have to wait, you can build your silo if you want to start testing it now, you just have to be aware that whoever publishes first has to let the other person know to rebuild their silo after the first silo is merged. | 20:53 |
robru | dobey: https://ci-train.ubuntu.com/job/ubuntu-landing-021-2-publish/21/console need you to approve these merges | 20:58 |
jgdx | robru, ah, right. thanks | 21:00 |
robru | jgdx: you're welcome | 21:01 |
dobey | robru: oh right. oops. sorry about that, done | 21:02 |
robru | dobey: no worries, publishing | 21:04 |
=== pat_ is now known as Guest37854 | ||
=== Guest37854 is now known as pmcgowan | ||
michi | cihelp: I need help with failing builds on Jenkins-ci and in Silo 15. Anyone around? | 21:47 |
fginther | michi, I can help with jenkins-ci, what's the job? | 21:48 |
michi | Basically, we have builds and tests failing left right and center on Jenkins-ci. It’s happening because the build machines are ridiculously slow. | 21:48 |
michi | fginther: thanks! | 21:48 |
michi | Lots of jobs, and different nodes. | 21:48 |
michi | If you search through the scrollback, I chatted with vila yesterday, who was trying to help. | 21:48 |
michi | fginther: Here is one: https://jenkins.qa.ubuntu.com/job/unity-scopes-api-vivid-i386-ci/61/consoleFull | 21:49 |
michi | That’s one of many. | 21:49 |
michi | This time, the build was aborted because the compilation hadn’t finished after two hours. | 21:49 |
michi | We are also seeing tests failing randomly due to timeouts. | 21:49 |
michi | We also have a problem with Silo 15, on PPC only. | 21:50 |
michi | A test keeps timing out. | 21:50 |
michi | We have no way to diagnose or fix this, because none of us has a PPC machine. | 21:50 |
michi | Is it theoretically possible that the PPC run in the silo is affected by the same thing as the Jenkins-ci nodes? | 21:51 |
michi | This particular test has not failed in months, which makes us think that it may be an infrastructure issue. | 21:51 |
fginther | michi, I'll have a look at vila's notes and try to sort out if there is a job configuration change that would help (we had similar problems with mir that have been helped by similar changes) | 21:52 |
michi | Thanks! Anything you can do would be most appreciated. Things started going wrong either last Friday or Saturday. Prior to that, everything was fine. | 21:52 |
michi | What about the PPC issue? | 21:52 |
fginther | michi, the PPC build in silo 15 is done wholly in launchpad, which is on separate infrastructure from the jenkins ci infrastructure | 21:52 |
michi | OK, so it’s not the same thing then. | 21:53 |
michi | How can we get access to a PPC machine so we can work out what’s going wrong? | 21:53 |
fginther | michi, there may be a way through IS or maybe the foundations team to get access to a PPC system. I'll also ask a few others that might know of some machines | 21:55 |
michi | Thanks! Whom should I be talking to? | 21:55 |
dobey | ironically, it didn't fail on ppc64 | 21:57 |
michi | dobey: ? Are the 32 and 64 builds for PPC? | 22:01 |
tedg | trainguards, thanks for the silos! | 22:01 |
robru | tedg: ah, you're welcome | 22:02 |
dobey | michi: yeah, there's powerpc and ppc64el archs | 22:02 |
michi | Ah, I didn’t know that. | 22:02 |
dobey | michi: looking at the failure log, it looks like the test is expecting a timeout, but the "slow" server is returning a response in under 10 seconds? | 22:03 |
michi | Possible. I have no idea. | 22:04 |
michi | I’ll dig into the test code today and see what I can learn. It’s not my own code, so I’m not too familiar with it. | 22:04 |
dobey | ok. well that's what it looks like, just from the log anyway (expecting an exception, and a 200 OK in the log). | 22:04 |
michi | But, yes, if no exception arrives when one is expected, you’d think that response arrived when it shouldn’t have. Or it’s a race of some kind. | 22:05 |
dobey | michi: it could maybe be clock drift happening, and then being corrected by ntpd, while the tests are running, and might cause sleep() or such to skip out early | 22:06 |
dobey | just a possibility :) | 22:06 |
michi | dobey: Interesting thought. But I would expect that to be very rare. We are seeing the test failing repeatedly. And ntpd adjusts the time by slowly creeping it, rather than just setting it. | 22:07 |
dobey | i guess it would depend on the remaining capacity of the battery in the hardware, on how rare the drift would happen. | 22:10 |
dobey | i do find it quite odd that it only happens on powerpc though | 22:10 |
dobey | i gotta run though. later :) | 22:13 |
fginther | michi, FYI, we'll continue to look at the problem you raised and will try to have some improvements by EOD. I need to go afk for a few hours, but will pick it back up when I return | 23:38 |
michi | fginther: Thank you, much appreciate it! | 23:39 |
cjwatson | michi: Is there any pattern in the builders it's succeeded on in the past? (They're all pretty similar, but not quite identical, in ways that mostly don't matter) | 23:46 |
michi | cjwatson: I honestly don’t know. | 23:46 |
cjwatson | dobey: The builders it's failed on today are VMs :-) | 23:46 |
cjwatson | michi: You could look. | 23:46 |
michi | I suspect that some change last Friday or Saturday caused it. Up to then, things were working just fine. | 23:46 |
cjwatson | (So could I, but I'm not on my usual system right now.) | 23:46 |
michi | Branches that used to work started failing then. | 23:47 |
cjwatson | michi: I'm not aware of any changes. | 23:47 |
cjwatson | You could diff the build logs in case it's something inside the chroot. | 23:47 |
michi | cjwatson: I’ll go through the past half dozen failures or so | 23:47 |
cjwatson | (That would also tell you about kernel changes.) | 23:47 |
cjwatson | There's really not a lot else that could possibly affect anything, so diffing the build logs is a good place to start. | 23:48 |
michi | vila told me yesterday that some of the build nodes are swapping themselves into oblivion | 23:48 |
cjwatson | That's jenkins, not Launchpad. | 23:48 |
cjwatson | Totally different. | 23:48 |
michi | Ah | 23:48 |
michi | OK, you are talking about the silo failure | 23:48 |
cjwatson | Yes | 23:48 |
michi | I’m building in a chroot on PPC now, to see whether I can at least reproduce | 23:48 |
cjwatson | OK, good, you found porter-powerpc then | 23:48 |
cjwatson | But sorry, yes, I'm talking about the powerpc issues on the grounds that fewer people are usually able to respond to those so it's more worth helping. | 23:49 |
michi | I appreciate it! | 23:50 |
cjwatson | Of the last six successful builds on powerpc, they're evenly distributed among our three regular powerpc builders. | 23:51 |
cjwatson | So that rules out that theory. | 23:51 |
cjwatson | Start with https://launchpad.net/ubuntu/+source/unity-scopes-api/+publishinghistory and you can find the historically successful builds from there; easy enough to grab and diff build logs then. | 23:52 |
michi | Cool, thanks! | 23:52 |
cjwatson | infinity might know if the VM software has changed. | 23:52 |
cjwatson | (on denneed) | 23:52 |
cjwatson | We could rule that out by trying on sagari, but last I checked it hadn't come back since the power work in 3FP earlier today. | 23:53 |
infinity | cjwatson: Nothing on denneed has changed. | 23:57 |
michi | cjwatson: can’t reproduce in the build I did in the chroot. | 23:58 |
michi | tests are ticking over just fine. | 23:58 |
infinity | michi: You have a link to the PPA build that was failing? | 23:58 |
michi | All the failures we’ve seen on PPC relate to timeouts. Basically, it looks like the machine is super-busy or thrashing from the symptoms. | 23:59 |
michi | infinity: sec... | 23:59 |
michi | https://launchpadlibrarian.net/195403187/buildlog_ubuntu-vivid-powerpc.unity-scopes-api_0.6.11%2B15.04.20150120.4-0ubuntu1_FAILEDTOBUILD.txt.gz | 23:59 |
infinity | michi: Link to the build instead of the log would be more friendly. :P | 23:59 |
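
cjwatson's suggestion at 23:47 to diff the build logs of a passing and a failing powerpc build could be approached along these lines. This is only a minimal sketch: the helper names `read_log` and `diff_logs` are made up for illustration, the two log files are assumed to have been downloaded by hand, and nothing here talks to Launchpad itself.

```python
import difflib
import gzip
import sys
from pathlib import Path


def read_log(path):
    """Return the lines of a build log, transparently handling .gz files."""
    p = Path(path)
    opener = gzip.open if p.suffix == ".gz" else open
    with opener(p, "rt", encoding="utf-8", errors="replace") as fh:
        return fh.read().splitlines()


def diff_logs(good_lines, bad_lines, context=0):
    """Return unified-diff lines between a passing and a failing log."""
    return list(
        difflib.unified_diff(
            good_lines,
            bad_lines,
            fromfile="good",
            tofile="bad",
            n=context,      # 0 keeps only the changed lines
            lineterm="",    # input lines carry no trailing newline
        )
    )


# Usage (hypothetical file names): python difflogs.py good.txt.gz bad.txt.gz
if __name__ == "__main__" and len(sys.argv) == 3:
    for line in diff_logs(read_log(sys.argv[1]), read_log(sys.argv[2])):
        print(line)
```

Real build logs differ in timestamps and build durations on almost every line, so the raw diff is likely to be noisy; the interesting part is usually which package versions were installed into the chroot and which kernel the builder was running.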
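
The kern.log line vila quoted at 09:49 has a regular shape, so a worker's log can be scanned to see which processes are segfaulting and how often. The sketch below is an assumption-laden illustration: the regular expression is derived only from that one quoted line, and the function names and dictionary keys are invented here.

```python
import re
from collections import Counter

# The line quoted in the log above, used as sample input.
SAMPLE = (
    "Jan 20 08:01:33 juju-jenkins-stack-prod-machine-13 kernel: "
    "[3988674.042973] .mir_unit_tests[17386]: segfault at 0 "
    "ip 0000000001607241 sp 00007fff7a02f4d0 error 4 in "
    ".mir_unit_tests-uninstalled[400000+2996000]"
)

# Pattern inferred from the sample line only; other kernels may format it differently.
SEGFAULT_RE = re.compile(
    r"kernel: \[\s*[\d.]+\] (?P<proc>\S+)\[(?P<pid>\d+)\]: "
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
)


def parse_segfault(line):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "proc": m.group("proc"),
        "pid": int(m.group("pid")),
        "addr": int(m.group("addr"), 16),  # faulting address
        "ip": int(m.group("ip"), 16),      # instruction pointer
        "error": int(m.group("err")),      # page-fault error code
    }


def count_segfaults(lines):
    """Count segfault lines per process name."""
    counts = Counter()
    for line in lines:
        info = parse_segfault(line)
        if info is not None:
            counts[info["proc"]] += 1
    return counts
```

For the quoted line, the faulting address is 0, which usually indicates a null-pointer dereference, and on x86 an error code of 4 conventionally denotes a user-mode read of an unmapped page. That is consistent with michi's reading that a crashing mir test is a separate matter from the slow workers.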