[04:00] vorlon: did you only requeue the ones for focal, or everything? just seeing the ones in impish triggered for gcc-7/gcc-8 being in progress as well
[08:38] looks like the just-uploaded netplan.io is the latest thing to hit the missing riscv64 build problem. I guess what was said in #launchpad is the latest on this?
[08:39] RikMills: Yeah. I was wondering why the riscv build is missing..
[08:40] if you hadn't seen, for some reason the builders are not doing riscv64 builds for arch: linux-any
[08:50] okay.
[08:55] excuses is seeing these as missing builds, so blocking migration. so I hope we get a fix soonish :/
[09:11] -queuebot:#ubuntu-release- Unapproved: shim-signed (hirsute-proposed/main) [1.46 => 1.47] (core) (sync)
[09:11] -queuebot:#ubuntu-release- Unapproved: shim (hirsute-proposed/main) [15.4-0ubuntu1 => 15.4-0ubuntu2] (core) (sync)
[09:11] -queuebot:#ubuntu-release- Unapproved: shim-signed (groovy-proposed/main) [1.45 => 1.47] (core) (sync)
[09:11] -queuebot:#ubuntu-release- Unapproved: shim (groovy-proposed/main) [15+1552672080.a4a1fbe-0ubuntu2 => 15.4-0ubuntu2] (core) (sync)
[09:38] * enyc meows
[09:38] hrrm, hirsute stuff appearing in the -release channel .... does this channel remain 21.04/hirsute related for a while now...?
[10:16] enyc: all releases always appear here =)
[10:16] enyc: so you see mentions of SRUs for focal and bionic here too.
[10:18] xnox: aaah, i see.
[10:18] xnox: hrrm, new puzzle -- why did a huge list of precise changes all appear at once earlier?
[10:34] enyc: precise has had ESM for a few years now. However, that is now stopping. It looks like all the precise ESM updates are now being made public, such that they end up in the public old-releases.ubuntu.com archive as part of precise's complete sunset (end of ESM).
[10:35] because those who have paid for ESM in the past still need to be able to find sources and binaries forever.
[10:35] xnox: hrrm, that makes sense! =)
[10:35] which will then be archived onto tape and taken to cold offline storage too.
[12:13] vorlon, the i386-whitelist doesn't seem to apply to the nvidia 465 driver in this PPA (soon to be in the archive too) in Groovy and Focal (Hirsute and Bionic are covered): https://launchpad.net/~oem-solutions-group/+archive/ubuntu/nvidia-driver-staging/+packages
[13:43] doko, python3-lib2to3 in impish-proposed needs python3:any (>= 3.9.5-0~), i think a python3-defaults upload is missing
[13:43] doko, it makes systemd ftbfs
[13:44] ok, uploading
[13:45] hmm, better I soften the requirement
[13:50] yes, possibly
[14:00] Laney: did you just break autopkgtest.u.c?
[14:00] Laney: now it's back, but it just had a server error; going to read logs
[14:02] still takes 16s to load the page in chrome
[14:02] the / page
[14:03] but my curl needs 2s, so ...
[14:03] ah, now it got a 10s one too
[14:04] time to rewrite autopkgtest-web in Go :D
[14:46] heh
[14:46] don't accuse me >:(
[14:50] juliank: I bet some of the sql queries could be optimised, that's what it spends its time doing really
[14:50] or the db itself
[14:50] sqlite was a suboptimal choice maybe
[14:51] but worth profiling the queries
[15:18] -queuebot:#ubuntu-release- Unapproved: neutron (bionic-proposed/main) [2:12.1.1-0ubuntu4 => 2:12.1.1-0ubuntu7] (openstack, ubuntu-server)
[16:03] -queuebot:#ubuntu-release- Unapproved: neutron (bionic-proposed/main) [2:12.1.1-0ubuntu4 => 2:12.1.1-0ubuntu4.1] (openstack, ubuntu-server)
[16:04] i just uploaded this neutron; who keeps uploading the increasing version numbers?
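
A note on the neutron versions above: Debian/Ubuntu version ordering is why 2:12.1.1-0ubuntu4.1 is the conventional SRU-style choice -- it sorts above the 2:12.1.1-0ubuntu4 already in bionic but below the unreleased ubuntu5/6/7 uploads. A minimal sketch of checking this, assuming python3-apt is installed; apt_pkg implements the same comparison rules dpkg and the archive use:

    import apt_pkg

    apt_pkg.init()

    base = "2:12.1.1-0ubuntu4"    # version currently in bionic
    sru = "2:12.1.1-0ubuntu4.1"   # the minimally-incremented SRU upload
    devel = "2:12.1.1-0ubuntu7"   # the competing upload in the queue

    for a, b in [(base, sru), (sru, devel)]:
        rel = apt_pkg.version_compare(a, b)  # <0, 0 or >0, like strcmp
        print(a, "<" if rel < 0 else ">" if rel > 0 else "=", b)

    # prints:
    # 2:12.1.1-0ubuntu4 < 2:12.1.1-0ubuntu4.1
    # 2:12.1.1-0ubuntu4.1 < 2:12.1.1-0ubuntu7
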
[16:04] coreycb, are you uploading neutron, specifically the ...ubuntu7 version?
[16:09] coreycb, looks like your gpg key; are you ok if i ask for deletion of the ubuntu6 and ubuntu7 uploads, so they can review my ubuntu4.1 upload? it's properly versioned and includes the correct changelog in the changes file
[16:09] bdmurray, if you're on sru shift already today, could you delete the neutron uploads in bionic with versions ...ubuntu6 and ...ubuntu7, and review the upload in bionic with version ...ubuntu4.1
[16:10] the ubuntu4.1 and ubuntu7 uploads have the same patch content; the only diffs are the version suffix and that i put dep3 info into the added patches
[16:14] juliank, yes -- the (single, as far as I can tell?) query on the front page would benefit from the addition of an index on result(run_id desc)
[16:15] waveform: we can play with that!
[16:18] juliank, it makes a fair difference to the explain output: https://paste.ubuntu.com/p/3PCznCdnKF/
[16:19] waveform: I'll do a benchmark
[16:19] What I think though is that we need some output caching
[16:20] I don't know if Apache has that; I use nginx on my site and configure fastcgi output caching there
[16:20] I'm always rather wary of caching when I can wring a bit more out of the (poor, usually horribly abused!) database
[16:22] I don't know how many requests we get, but something like a 30s cache of /running and / could be useful
[16:23] Huh, but running just opens a json file and dumps it, why is it so slow
[16:24] how big's the json?
[16:25] hmm, it is using the built-in json parser too, so if it's particularly large that might be another cheap win (switching to simplejson or some such)
[16:26] looking forward to merge requests ;-)
[16:27] some well selected indexes would be great
[16:28] Laney, switching the json import I could certainly MP, but fiddling with the database structure ... I can happily suggest things but I've yet to figure out where its structure is defined (unless it's just publish-db?)
[16:29] I have code!
[16:31] waveform: init_db() in utils.py or publish-db depending on which table you're after (yeah ...)
[16:31] sounds like juliank is saving you though
[16:33] Laney: Should be it: https://code.launchpad.net/~juliank/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/402210
[16:33] Laney: eek, I pushed it to the main repo
[16:34] Laney: I have the wrong remote set up :(
[16:34] git revert!
[16:35] nah, it's ok, let's just try it
[16:36] Laney: :)
[16:36] fix that though 😬
[16:36] Laney: I have now :)
[16:36] * waveform peeks through his fingers to see if his inadvertently pushed suggestion blows up autopkgtest ...
[16:37] lol
[16:37] we have staging for testing such things
[16:37] usually not things directly pushed to the main branch though :p
[16:38] https://autopkgtest.staging.ubuntu.com/ I mean, not broken ...
[16:40] Laney: What changes is publish-db and download-{all-,}results really :D
[16:40] Laney: What do I need to do aside from pushing, build the charm locally and then deploy it from wendigo?
[16:41] juliank: I did it already, but https://autopkgtest-cloud.readthedocs.io/en/latest/deploying.html#update-the-code
[16:41] Laney: ah ok, I have to do the charm publishing locally and then run mojo run -m manifest-upgrade on wendigo, though, for future things?
[16:41] pushing to the charm store is the main bit which makes the manifest able to see it
[16:42] Laney: Can we auto-publish main with some git hook from launchpad?
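
For reference, a sketch of the index waveform suggests at [16:14]. The table and column names (result, run_id) are taken from the chat; the database path here is hypothetical, and the real schema lives in init_db()/publish-db as noted at [16:31]:

    import sqlite3

    con = sqlite3.connect("autopkgtest.db")  # hypothetical path

    # Idempotent DDL, safe to re-run on service start.
    con.execute(
        "CREATE INDEX IF NOT EXISTS result_run_id_idx ON result (run_id DESC)")

    # The query plan shows whether the index is used: look for
    # "USING INDEX result_run_id_idx" rather than a temp b-tree sort.
    for row in con.execute(
            "EXPLAIN QUERY PLAN "
            "SELECT * FROM result ORDER BY run_id DESC LIMIT 100"):
        print(row)
    con.close()
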
[16:42] Sounds dangerous, need to remove commit privileges and set up a merge bot :D
[16:43] we could lock it and require merge proposals
[16:43] I don't know how to set that latter part up though
[16:44] anyway, that is in production now!
[16:44] Or we make it publish tags, and tag when we want to publish, and continuously deploy main to staging
[16:44] * Laney whispers "github actions"
[16:45] hmm, is publish-db generally a bottleneck? Just reading through it, and there are quite a few changes I could suggest there too (would also depend on the version of python3 on the host though -- some stuff, like using the SQLite backup API to copy stuff instead of messing with flocks, would depend on it being >=3.7)
[16:45] It doesn't run in the web frontend thread, but it sure eats up like 40% CPU when it runs
[16:46] we're on focal there
[16:46] okay - I can certainly understand why. Any idea what python version is on the host?
[16:46] okay, that's good
[16:47] be nice to move off sqlite completely really
[16:47] Laney: Still have to create a sqlite db for publishing?
[16:47] Well, I guess we could go to another interchange format for britney and friends
[16:48] while I'm more of a postgres type, sqlite really is pretty capable -- you just need to treat it right :)
[16:48] Laney: I think download-results.service needs a restart to make it create the index, which mojo didn't do
[16:48] I'd run a NoSQL file based database :D
[16:48] * waveform shudders
[16:48] heh
[16:49] juliank: ah right, I thought browse.cgi called init_db too, do it
[16:50] two autopkgtest-webs, don't forget
[16:51] Laney: I think what we should do is split the result downloading and publishing onto its own server, and then download the db on the web workers; this would avoid the spikes in CPU usage every couple of minutes that drive request times up to 17s
[16:52] Laney, waveform: / now returns in like 0.8s best case, nice.
[16:52] could do
[16:53] Laney: or we switch to dqlite :D
[16:53] a cheap thing to do quickly would be to apply resource limits to it, but maybe waveform's fixes will sort it all out
[16:53] good to hear -- I'll run some experiments tonight. I'm reasonably confident some fairly big gains could be made in publish-db, though I'm still not clear on some bits of this architecture. Is there "one true" SQLite db (which stuff writes to) that gets periodically copied by publish-db for publishing?
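
On the simplejson idea floated at [16:25]: a quick benchmark sketch along these lines would settle whether swapping parsers is worth it. The file path is hypothetical, and simplejson (python3-simplejson) is not guaranteed to beat the stdlib parser on modern Pythons, which is exactly why measuring first helps:

    import json
    import timeit

    try:
        import simplejson
    except ImportError:
        simplejson = None

    PATH = "running.json"  # hypothetical path to the running-status file

    with open(PATH, encoding="utf-8") as f:
        raw = f.read()

    # Best of 5 repeats of 100 parses each.
    print("stdlib json:", min(timeit.repeat(lambda: json.loads(raw), number=100)))
    if simplejson:
        print("simplejson:",
              min(timeit.repeat(lambda: simplejson.loads(raw), number=100)))
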
[16:54] waveform: yes, well, each web server has its own db that they build independently and then copy
[16:54] yeah, there's a "read only" one
[16:54] it's basically building that
[16:56] waveform: So the JSON is 368K large
[16:57] at that size it's probably worth benchmarking the internal json vs simplejson, assuming adding dependencies like that isn't a major no-no
[16:58] Laney: I think what would help would be a second CPU, since we run at like 60% quite a lot
[16:58] Most overhead is cache-amqp really
[16:59] and amqp-status-collector and publish-db
[17:01] But yeah, CPU limits on background services would help too
[17:01] I guess we can either move the load off, or beef up the machines, or optimise the code
[17:01] well, any combination of those
[17:01] right, I'm off, see ya later
[17:02] ddstreet: I'm fine with my uploads getting rejected, let's just make sure the stable/queens git branch is updated
[17:02] let me know if you don't have access to push
[17:26] coreycb, so the pkg can match what's already in the ubuntu-openstack-dev git tree, we should probably ask for my upload to be rejected, but i think you also need to rebuild your upload using the -v param so that it includes the changelogs from ...ubuntu5 and ...ubuntu6, since those versions were never accepted
[17:26] either way is fine with me
[17:49] britney runs are not completing due to failure to fetch results. not had a successful run and update of excuses since ~11am
[17:49] juliank: is that related to what you were doing, or something else?
[17:50] no!
[17:50] RikMills: Well, 11am UTC?
[17:50] yep
[17:50] Right, no, I hadn't done anything before 15:00
[17:51] current excuses: Generated: 2021.05.04 10:42:04 +0000
[17:51] um, 14:00 :D
[17:51] That's when I saw the error on autopkgtest.ubuntu.com
[17:51] I'll leave britney stuff to Laney and sil2100, I don't have insight there
[17:51] or I guess someone in the US
[17:51] runs are crashing with things like: https://paste.ubuntu.com/p/n53fXB6PvN/
[17:52] hmm, works for me
[17:52] That url
[17:53] https://people.canonical.com/~ubuntu-archive/proposed-migration/log/impish/2021-05-04/17:26:13.log
[17:53] It fetches results fine for a while, then boom
[17:53] RikMills: ack, I see it in the haproxy log
[17:54] aha. anyway, as long as someone is now aware :)
[17:56] which your highlight should have taken care of ;)
[18:06] RikMills: Hmm, so this is just proxying to swift storage
[18:15] RikMills: I have forwarded that to IS
[18:15] juliank: cheers!
[18:58] RikMills: The error seems to be May 4 17:41:17 juju-4d1272-prod-proposed-migration-6 haproxy[3496355]: [WARNING] 123/174117 (3496355) : Server https_service/apache2-0-80 is DOWN, reason: Layer7 wrong status, code: 500, info: "Internal Server Error", check duration: 585ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[18:58] we should back off better, but I also think the proxying should move to haproxy; there's no need for it to have to hit the backends, I think
[18:58] Laney: ^
[18:58] juliank: both died, or what?
[18:59] that's half the point of load balancing them
[18:59] Laney: Yes, both die regularly on / with a database error (inconsistent image or whatever) and return 500
[18:59] 10.15.102.10 - - [04/May/2021:17:41:16 +0000] "HEAD / HTTP/1.0" 500 172 "-" "-"
[18:59] at the same time though
[18:59] sad
[19:00] Laney: I think it won't retry immediately
[19:00] Laney: Shifting the britney cron job back 5 minutes might fix it...
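
On "we should back off better" ([18:58]): a generic retry-with-exponential-backoff sketch for transient 5xx responses. This is an illustration of the idea only, not the actual britney/proposed-migration fetch code, and the example URL is hypothetical:

    import time
    import urllib.error
    import urllib.request

    def fetch_with_backoff(url, attempts=5, base_delay=1.0):
        """Retry transient 5xx responses with exponential backoff."""
        for attempt in range(attempts):
            try:
                with urllib.request.urlopen(url) as resp:
                    return resp.read()
            except urllib.error.HTTPError as e:
                # Give up on 4xx (permanent) or when out of retries.
                if e.code < 500 or attempt == attempts - 1:
                    raise
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...

    # e.g. fetch_with_backoff("https://autopkgtest.ubuntu.com/results/...")
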
[19:00] anyway, let's move it off the web nodes
[19:00] britney just runs whenever it can
[19:00] do you speak haproxy?
[19:01] Laney: I just read some docs where they did that, but no
[19:01] This is the error, fwiw: sqlite3.DatabaseError: database disk image is malformed: /var/lib/juju/agents/unit-autopkgtest-web-0/charm/webcontrol/browse.cgi
[19:02] Not sure why it prints the browse.cgi script as the database name
[19:02] what is it, a race with publish-db?
[19:02] bad bad bad
[19:02] Maybe?
[19:03] for haproxy, probably use_backend swift_server if { path_beg /results/ }
[19:03] and set up a backend
[19:03] let's fiddle on staging
[19:11] Laney: I didn't understand how the image can be inconsistent, but I guess now it might be a problem with the journal
[19:11] no?
[19:11] It shouldn't be, because the journal is a write-ahead one that gets merged with the db?
[19:12] and we rename the new db atomically
[19:12] how can that race?
[19:13] I don't get it, we should see either the old or the new one, as we do it atomically
[19:13] https://autopkgtest.staging.ubuntu.com/results/autopkgtest-impish/impish/armhf/g/gzip/20210428_154557_d7576@/log.gz <- that is proxying @ haproxy now
[19:13] now I need to translate from haproxy.cfg to the charm config
[19:13] There might not be a way to atomically copy the .db while the wal is being merged
[19:13] By copying the file
[19:14] waveform's changes to make it use the sqlite backup stuff should fix that, I suppose
[19:15] Laney: Or, we are supposed to open the r/w database with sqlite, acquire a shared lock using it, and _then_ we can copy the file
[19:17] so instead of the top, we should do
[19:17] con = sqlite3.connect(config["web"]["database"])
[19:17] bck = sqlite3.connect(target_new)
[19:17] with bck:
[19:17]     con.backup(bck, pages=1, progress=progress)
[19:18] con.close()
[19:18] That should fix the race, I suppose, if there is one
[19:19] Potentially we might want to use VACUUM INTO instead, which makes the published database smaller
[19:19] but costs cycles
[19:20] so con.execute("VACUUM INTO ?", (target_new,))
[19:20] um, VACUUM INTO?
[19:23] let's run a quick bench
[19:28] backup API makes more sense
[19:30] SURE does!
[19:34] Laney: They both have the same speed on our DB, though; but since we add data afterwards anyway, it makes sense to use the backup API
[19:41] Laney: https://code.launchpad.net/~juliank/autopkgtest-cloud/+git/autopkgtest-cloud/+merge/402216
[19:41] Laney: Also, backup is actually 1/3-50% faster
[19:44] Laney: We ought to do it incrementally with sleep, I guess, as https://www.sqlite.org/backup.html says, such that the lock can be released in between
[19:45] Ah, Python sleeps automatically for us
[19:46] Ah no, we need to specify pages=5 or something
[19:49] Now it does online backup
[19:49] with lock releasing, to not block other things from writing to it for 2s :D
[20:00] Testing now on staging, sort of
[20:04] -queuebot:#ubuntu-release- Unapproved: neutron (bionic-proposed/main) [2:12.1.1-0ubuntu4 => 2:12.1.1-0ubuntu7] (openstack, ubuntu-server)
[20:05] ddstreet: alright, I've uploaded a new version. thanks for catching that.
[20:06] please can someone reject all but the most recent upload of neutron (2:12.1.1-0ubuntu7) from the bionic unapproved queue? and if someone from the SRU team has cycles, we'd like to see if we can get a review of that last upload.
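
For reference, a self-contained version of the backup-API approach sketched at [19:17], using sqlite3.Connection.backup() (Python >= 3.7; focal ships 3.8). The paths are hypothetical, and this is a sketch of the idea rather than the actual merge proposal:

    import sqlite3

    SRC = "/path/to/autopkgtest.db"      # hypothetical: the live r/w database
    DST = "/path/to/autopkgtest.db.new"  # staged copy, renamed when done

    def progress(status, remaining, total):
        # Called after each batch of pages; useful for logging long backups.
        print(f"copied {total - remaining} of {total} pages")

    src = sqlite3.connect(SRC)
    dst = sqlite3.connect(DST)
    with dst:
        # Copy `pages` pages per step; CPython sleeps (0.25s here) between
        # busy steps so writers can take the lock -- the online backup with
        # lock releasing described at [19:49].
        src.backup(dst, pages=5, sleep=0.25, progress=progress)
    dst.close()
    src.close()
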
[20:09] juliank, yup -- the backup API is definitely the way to go -- don't worry too much about vacuuming; there are generally very few scenarios where it's actually worthwhile (although I'm not familiar enough with autopkgtest to say it definitely isn't one of those, my general rule of thumb is that it's not :)
[20:09] waveform: maybe you can review that MR, if juliank stole your work? :-)
[20:09] Laney, sure thing :)
[20:11] oh, while I'm here, could someone reject linux-firmware-raspi2 from focal-proposed -- was checking something earlier and realized I'd missed something when reviewing it last week (it shouldn't remove the Breaks line in d/control)
[20:11] Laney, waveform: I have run the current state of the branch successfully on the staging instance, fwiw
[20:12] waveform: Sorry for stealing your stuff :D
[20:12] juliank, no prob -- steal away :)
[20:13] waveform: I'm not sure if I want to increase #pages or decrease the sleep (python defaults to 250ms)
[20:16] Laney: Can I charm push this to the edge channel?
[20:17] Laney: I only curled the file and ran it manually :D
[20:17] But I want to play with pushing to a channel properly again
[20:17] but there's your change in there
[20:17] (I suppose)
[20:18] Laney: eek, no write access to the autopkgtest-web charm anyway
[20:18] one second
[20:18] ah yeah, you're not in ubuntu-release ;-)
[20:19] let me remember how to grant that
[20:19] * juliank wants to get britney running again before going to bed :D
[20:20] ok, that should work
[20:20] juliank: can you check out wip/haproxy-proxy too
[20:20] Laney: Shall I rebase on that?
[20:20] merge them locally I guess
[20:21] as you wish, I don't mind merge commits really
[20:21] juliank, the changes in the merge look sane, but I'm probably being horribly thick in not really understanding the *three* databases involved here (not that that's a change you've made -- I get that's always the way it's been, but I'm not grokking the purpose of them all)
[20:22] waveform: The r/o database has extra data on top of the r/w one for consumers, so we first copy the r/w one in, then the extra data from the old r/o one
[20:22] probably should open the old r/o one in ?mode=ro
[20:22] Laney: still can't push :(
[20:23] what
[20:23] Laney:
[20:23] $ charm push /tmp/charm-builds/autopkgtest-web cs:~ubuntu-release/autopkgtest-web
[20:23] ERROR cannot post archive: access denied for user "juliank"
[20:23] * juliank sad
[20:24] let me read the docs I guess
[20:24] juliank, brief note on pages/sleep on the backup API -- I might be tempted to bump the pages up a bit -- with the default page size (4KB) that'll mean it's copying 4MB then sleeping a quarter second, rinse'n'repeat. Given a typical SSD can manage quite a bit more than that, I'd be tempted to bump it up to doing ~1s worth of writing before its 1/4 sec pause (which is reasonable)
[20:25] Laney: Your change looks good
[20:25] waveform: Right, but it only sleeps if there are writers waiting?
[20:25] It certainly did not sleep for me, but I had no other connections open
[20:26] juliank, good question, don't know -- I would *hope* that's the case, but it's not a feature I'm terribly familiar with (one of those new things I've briefly played with and gone "cool!" but not dug into deeply yet)
[20:28] waveform: I'll raise it to 128k pages
[20:28] laney@dev> charm grant --acl write cs:\~ubuntu-release/autopkgtest-cloud-worker juliank
[20:28] laney@dev>
[20:28] I don't know
[20:29] wait, web not cloud-worker
[20:29] Laney: yeah, web :D
[20:29] ok, did that
[20:29] it returned 0 ...
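
Back-of-envelope numbers for the pages/sleep trade-off waveform describes at [20:24], assuming SQLite's default 4 KiB page size:

    # bytes copied per backup step before the (up to) 0.25s pause
    page_size = 4 * 1024  # SQLite default page size in bytes
    for pages in (5, 1024, 128_000):
        print(f"pages={pages:>7}: {pages * page_size / 2**20:8.2f} MiB per step")

    # pages=      5:     0.02 MiB per step
    # pages=   1024:     4.00 MiB per step
    # pages= 128000:   500.00 MiB per step
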
[20:29] waveform: Updated to 128k pages
[20:31] Laney: doesn't work yet, maybe there's a delay until it takes effect
[20:31] Laney: Anyhow, I manually tried the change by curling the file and running it
[20:32] will have to ask the juju people I guess :(
[20:32] * Laney joins #juju
[20:33] * juliank joins as well
[20:34] I merged my branch
[20:35] now /results/ is not hitting the backends
[20:35] hopefully that gets proposed-migration working
[20:36] Laney: possibly, want to get mine in too to make sure though
[20:36] :D
[20:36] yeah, for sure
[20:36] might have to sort the ACL stuff out tomorrow
[20:36] need to do duolingo and play animal crossing
[20:37] I need to go to sleep soon :/
[20:39] waveform: could you write me an ACK on the merge? :D
[20:42] late night infra fixing :)
[20:42] sure, just a mo
[20:43] done
[20:45] Laney: please just merge, publish, and deploy that :D
[20:45] charm playing on staging I can do tomorrow :D
[20:45] This worked last year (two years ago?)
[20:52] waveform: I looked for that pkg to reject, btw, and didn't see it
[20:53] coreycb: I'll do yours tomorrow if someone doesn't beat me to it
[20:53] nn!
[20:53] thanks Laney
[20:54] Laney, weird -- I can see it in the queue. Anyway, it can wait -- go have fun :)
[20:54] maybe I'm looking in the wrong queue
[20:54] share a link and i'll do it tomorrow too
[20:55] https://launchpad.net/ubuntu/focal/+queue?queue_state=1&queue_text=&memo=30&start=30 linux-firmware-raspi2, second from the top on that page (multiverse, misc)
[20:55] version 4-0ubuntu0~20.04.1
[20:56] released!
[20:56] ah, https://launchpad.net/ubuntu/focal/+queue?queue_state=1&queue_text=linux-firmware-raspi2 might be more useful! (doh)
[20:57] RikMills: We believe that britney should be working again now
[20:57] oh, it's not in -proposed yet -- that's why you can't see it. Anyway, doubly not to worry then!
[21:06] waveform: heh, publish-db with the backup API now takes 2 minutes instead of 30s, but the corruption is gone, so not super unhappy
[21:07] A bit odd maybe; maybe it needs a lot of retrying due to lots of writes
[21:15] that could well be the case -- it was always going to be longer than a straight file-copy anyway, and add a bit extra on top for any writes getting in the way -- that's not too bad. Can probably gain a bit back with some other optimizations -- will see if I can have another look tomorrow
[21:28] -queuebot:#ubuntu-release- Unapproved: pi-bluetooth (focal-proposed/multiverse) [0.1.10ubuntu6 => 0.1.15ubuntu0~20.04.1] (raspi)
[22:54] -queuebot:#ubuntu-release- New binary: pipewire [amd64] (impish-proposed/main) [0.3.26-1] (desktop-core)