=== jamesh_ is now known as jamesh
=== Foxhoundz is now known as BenderRodriguez
=== ledeni_ is now known as ledeni
[12:13] sil2100: hey! any chance you could take a look at neutron in the bionic unapproved queue? it has some critical bug fixes.
=== mhcerri is now known as mhcerri_
=== mhcerri_ is now known as mhcerri
[13:16] bdmurray, hey. Are those "[bionic/nautilus] Possible Regression" emails supposed to be sent daily? Also I think Trevinho responded to it, was it good enough or were there still concerns?
[14:00] seb128: Hi! The emails are supposed to be sent whenever a new regression is found, which could be daily. One of the emails I received talked about the crashes in general as a bunch rather than each one individually. As an example I'd prefer to know that crash 69d1bc is a memory error, while crash b9dfd02 will be fixed by the next SRU, etc.
[14:00] Trevinho, ^
[14:01] bdmurray, k, let's see if we can get what you need ... is there a place that shows all the report ids that are of a concern atm?
[14:01] I deleted some of the emails, I thought the content was the same
[14:01] https://people.canonical.com/~ubuntu-archive/phased-updates.html
[14:01] seb128: I've looked at all of them,
[14:02] personally the only fix I'm concerned about is https://code.launchpad.net/~3v1n0/ubuntu/+source/nautilus/+git/nautilus/+ref/ubuntu/bionic-fix-file-remote-type-search-crash
[14:02] and covered by that branch
[14:02] Trevinho, can you take those ~20 ids and do a list of "id: " to help bdmurray?
[14:02] Trevinho, seb128: or "id, id, id: " if there are some that are the same type
[14:03] most of them are like: "I don't know", actually, apart from the one mentioned above which might cause some of them, which I've marked as duplicated already
[14:03] Trevinho, I think we need a full summary in one email to help bdmurray to be confident if the update is fine or not
[14:03] "I don't know" doesn't sound like the phasing should be increased, unless it's an old crash.
[14:03] as for most of them I'm quite sure they happen as per other upstream changes and just changed the trace compared to what we had before, but nothing really concerning
[14:04] Trevinho, well "I don't know" doesn't give confidence it's not a regression?
[14:04] right
[14:04] well if you write that it's fine
[14:04] "not a new issue, different signature but the problem was already existing"
[14:05] these unknowns, more of them are related to a different trace I think, but most of them seem like memory errors unrelated to an actual change, but I can see if I can summarise it
[14:07] Trevinho, well, we just need a list and to show that we looked at all the reports and are confident the SRU is still fine
[14:07] seems you are confident
[14:07] but your reply didn't convince bdmurray
[14:08] seb128: yeah, but if we want to be better, it would be nice to reupload to fix #1795028
[14:08] so please provide it in a format that he can review/understand enough to be fine with it as we are
[14:08] Trevinho, ok, I can look at doing that ... do you think we should block the SRU until that one lands?
[14:08] as I think there might be crashes related to memory issues that might be caused by that too...
[14:08] ah
[14:08] maybe it's better.
[14:08] sounds like we do
[14:08] k, let me put it on my list to sponsor this week
[14:08] bdmurray, ^ we are going to do a follow-up SRU with a fix and then we'll let you know the status :)
[14:09] that branch is based on the `XubuntuCancel` one though, so let me know if you want me to rebase it on top of ubuntu/bionic instead
[14:09] sil2100: I'm going to finish up SRU reviewing packagekit that I started yesterday now, if that doesn't clash with you?
[14:10] seb128: ^
[14:10] seb128: got it, let me know if you want the SRU reviewed.
[14:10] Trevinho, yes please, let's not interlock those
[14:10] bdmurray, will do, thx
[14:10] ack
[14:40] rbasak: no problem, thanks for the heads-up
[14:48] bdmurray, oh, other topic, I mentioned it at the sprint, but it would be nice to remove 15.04/15.10 from the error tracker legend ... is there a bug tracker/place for such requests?
[14:53] seb128: https://bugs.launchpad.net/errors/
[14:53] bdmurray, thx
[14:56] bdmurray, https://bugs.launchpad.net/errors/+bug/1796107
[14:56] Launchpad bug 1796107 in Errors "Remove 15.04 and 15.10 from the graph/legend" [Undecided,New]
=== lool- is now known as lool
[14:59] seb128: got it, thanks
[15:12] oSoMoN, tdaitx: the libreoffice tests fail with OpenJDK 11. does it need just a rebuild?
[15:12] are these related at all?
[15:30] doko, not sure, I need to take a closer look at the error, but I'm stepping out now, can do later in the evening
[16:55] sil2100: did you manage to look at bug 1782031 please?
[16:55] bug 1782031 in openscap (Ubuntu Xenial) "[SRU][xenial] Enable SCE option and systemd probe in libopenscap8" [Undecided,In progress] https://launchpad.net/bugs/1782031
[16:55] From history:
[16:56] sil2100_: around? May I have a second SRU opinion on bug 1782031 please? Seems to me there may be a functional (surprising) change to users there if unknown/notchecked ends up going to fail.
[16:56] bug 1782031 in openscap (Ubuntu Xenial) "[SRU][xenial] Enable SCE option and systemd probe in libopenscap8" [Undecided,In progress] https://launchpad.net/bugs/1782031
[16:57] 13:53 bug 1782031 in openscap (Ubuntu Xenial) "[SRU][xenial] Enable SCE option and systemd probe in libopenscap8" [Undecided,In progress] https://launchpad.net/bugs/1782031
[17:41] oSoMoN: btw, hsqldb1.8.0 also breaks with openjdk-11 (System::runFinalizersOnExit() was removed) and libreoffice depends on it (it is the only dependency), do you know why? hsqldb 2.4 fixes this and we do have it in the archive, but it is not clear if it is a sane replacement for LO
[17:49] rbasak: looking at it now
[18:03] infinity: hey, for when you are around, do you have squid's apparmor profile enabled by any chance?
[18:04] ahasenack: How would I know?
[18:05] infinity: aa-status on the box, or ps faxwZ and check if the squid process is listed as "confined"
[18:05] infinity: I got logs from jibel that show apparmor denied messages right around the crash, so I'm thinking he did enable it
[18:06] ps faxwZ | grep squid | awk '{print $2}'
[18:06] (enforce)
[18:06] yep, enforce, sorry
[18:06] so it's enabled
[18:06] I didn't actively enable it. Surely it's a default thing?
[18:06] I didn't think it was
[18:06] I had to explicitly enable it in my test boxes/vms
[18:07] something to investigate, but it gives a hint about this bug
[18:07] I mean, this laptop install dates back to vivid, so I can never remember exactly what all I may have done to it, but I'm pretty sure my squid setup is just a side-effect of "apt-get install squid-deb-proxy" and editing the whitelists.
[18:07] noted
[18:07] now let's see what kind of DENIED messages you have related to squid
[18:07] So, maybe the AA profile isn't enabled by default *anymore*, but it was in the past, and the maintainer scripts (correctly) don't change the current setup on upgrade?
[18:08] lots of things
[18:08] could have happened.
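[Editor's note] The confinement check discussed above (`ps faxwZ`, then look at the security-label column) can be sketched as a small shell helper. This is illustrative only: `check_confinement` is a made-up name, and the sample input below mimics `ps faxwZ` output rather than coming from a real box.

```shell
#!/bin/sh
# Hypothetical helper: report the AppArmor mode (enforce/complain) of a
# process from `ps faxwZ`-style input, or "unconfined" if no label matches.
check_confinement() {
  awk -v proc="$1" '
    $0 ~ proc && $2 ~ /^\((enforce|complain)\)$/ {
      gsub(/[()]/, "", $2)   # "(enforce)" -> "enforce"
      print $2; found = 1
    }
    END { if (!found) print "unconfined" }'
}

# Sample input standing in for: ps faxwZ | check_confinement squid
printf '%s\n' \
  '/usr/sbin/squid (enforce)     1234 ?  Ss   0:00 /usr/sbin/squid' \
  'unconfined                    5678 ?  S    0:00 -bash' \
  | check_confinement squid
# prints: enforce
```

On a real machine you would pipe `ps faxwZ` straight into the helper; `aa-status` gives the same answer with less parsing.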
I will check that, but now I want to see if I can correlate the profile with the crash
[18:08] ahasenack: http://paste.ubuntu.com/p/PM8D6WdQD8/
[18:09] rbasak: hmmm, indeed an interesting case this is
[18:09] ahasenack: And I woke up to apport telling me of a new crash. Yay.
[18:10] [95825.651047] audit: type=1400 audit(1537596453.254:301): apparmor="DENIED" operation="connect" info="Failed name lookup - disconnected path" error=-13 profile="/usr/sbin/squid" name="run/dbus/system_bus_socket" pid=24740 comm="squid" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
[18:10] ok,
[18:10] there are two messages
[18:10] one about the net_admin capability
[18:10] and the above
[18:10] I've seen other fixes in apparmor profiles about this disconnected path issue
[18:10] SIGABRT is comm_openex(). That's the same one we were looking at before, right?
[18:10] yes, the result of the failed assert
[18:10] That does smell of something an AA denial could cause.
[18:11] let me get you a diff
[18:11] for the profile
[18:11] I'll also post it to the bug
[18:11] Kay.
[18:12] ahasenack: make sure the profile uses attach_disconnected
=== jdstrand_ is now known as jdstrand
[18:12] jdstrand_: yeah, it's not using it
[18:12] that will be my diff :)
[18:13] I'm also wondering about the net_admin capability
[18:13] but that has been in use since squid3 as far as I can tell
[18:13] early squid3
[18:13] * bind to any address for transparent proxying
[18:13] from man capabilities
[18:14] Yeah, squid seems like a solid net_admin consumer.
[18:14] * jdstrand nods
[18:20] rbasak: let me think about it and leave a comment tomorrow, I think I have a split opinion about this SRU
[18:22] infinity: apply this to /etc/apparmor.d/usr.sbin.squid: https://pastebin.ubuntu.com/p/R6Z84ZdsfP/
[18:22] then issue sudo apparmor_parser -r -T -W /etc/apparmor.d/usr.sbin.squid
[18:22] jdstrand: looks ok? ^
[18:23] sil2100: thanks
[18:23] jmbl: ^
[18:24] ahasenack: Applied.
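[Editor's note] The `attach_disconnected` change jdstrand suggests above looks roughly like this in AppArmor profile syntax. This is a sketch, not the shipped profile or ahasenack's actual diff (which is behind the pastebin link and on the bug):

```
# /etc/apparmor.d/usr.sbin.squid (sketch only)
# attach_disconnected lets AppArmor mediate accesses to "disconnected"
# paths such as run/dbus/system_bus_socket instead of rejecting them
# with "Failed name lookup - disconnected path".
/usr/sbin/squid flags=(attach_disconnected) {
  # ... existing rules unchanged ...
}
```

After editing, the profile is reloaded exactly as in the log: `sudo apparmor_parser -r -T -W /etc/apparmor.d/usr.sbin.squid`.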
[18:25] infinity: does dmesg show a profile_replace going on for squid and squidguard?
[18:26] ahasenack: Oct 4 12:24:10 nosferatu kernel: [1057755.099944] audit: type=1400 audit(1538677450.686:496): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/squid" pid=18164 comm="apparmor_parser"
[18:26] Oct 4 12:24:10 nosferatu kernel: [1057755.109369] audit: type=1400 audit(1538677450.695:497): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="/usr/sbin/squid//squidguard" pid=18164 comm="apparmor_parser"
[18:26] cool
[18:26] Erm, that.
[18:26] now we wait I guess
[18:26] I love waiting.
[18:26] :)
[18:27] you have been getting one crash per day basically?
[18:27] I wonder if logrotate triggers it
[18:27] I tried here, no dice
[18:27] Probably.
[18:27] ahasenack: lgtm
[18:27] jdstrand: thx
[18:29] Oct 04 00:00:14 nosferatu squid[32112]: assertion failed: comm.cc:428: "!isOpen(conn->fd)"
[18:29] Oct 04 00:00:15 nosferatu squid[10409]: Starting Squid Cache version 4.1 for x86_64-pc-linux-gnu...
[18:29] Oct 04 00:00:15 nosferatu squid[10409]: Service Name: squid
[18:29] Hrm, unless I logrotate at midnight now, that's not the trigger.
[18:29] ok
[18:29] is the crash always around that time?
[18:30] ahasenack: Not sure.
[18:30] ahasenack: Huh. Yep. Midnight every day.
[18:31] fascinating
[18:31] ahasenack: So maybe cron.daily moved? Isn't it meant to be 6am or something?
[18:31] Yeah, cron.daily and, thus, logrotate, is 6:25...
[18:31] 25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
[18:31] mine is that
[18:32] Except there's also a systemd timer for logrotate now?
[18:32] Fun.
[18:32] ahasenack, this machine has been upgraded since utopic, so it might be something that was set then removed
[18:32] do you see logrotate messages around that time?
[18:32] Not sure where I'd see them.
[18:33] Ahh, syslog apparently.
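[Editor's note] The "is the crash always around that time?" question can be answered by scanning syslog for the assertion. A minimal sketch (`when_asserted` is a made-up name; the `printf` just replays log lines like the ones pasted above, where on a real box you would feed in `/var/log/syslog`):

```shell
#!/bin/sh
# Print the timestamp of every comm.cc:428 assertion in syslog-style
# input, to check whether the crashes cluster around midnight.
when_asserted() {
  awk '/assertion failed: comm\.cc:428/ { print $1, $2, $3 }'
}

printf '%s\n' \
  'Oct 04 00:00:14 nosferatu squid[32112]: assertion failed: comm.cc:428: "!isOpen(conn->fd)"' \
  'Oct 04 00:00:15 nosferatu squid[10409]: Starting Squid Cache version 4.1 for x86_64-pc-linux-gnu...' \
  | when_asserted
# prints: Oct 04 00:00:14
```

Usage on a real machine would be `when_asserted < /var/log/syslog`; all hits at 00:00 is what pointed at the logrotate timer here.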
[18:33] And yes, logrotate is at midnight here.
[18:33] Thanks, systemd new world order, for changing everything. Love you.
[18:34] ahasenack: So yeah, I think it's fair to assume that the service restart/reload from logrotate is the trigger.
[18:34] can you run it manually with logrotate -f /etc/logrotate.conf? It will rotate your logs earlier than expected, if you care about that deeply
[18:35] ahasenack: I can get sciency about that in a few minutes, sure. Should probably unapply the apparmor patch first, confirm a few times that it's reproducible, then apply the patch, too.
[18:35] that's fine, whenever you can
[18:36] I'll post on the bug with the patch
[18:36] jibel: can you check if you have the squid apparmor profile enabled? Check with "ps faxwZ | grep squid", see if it's listed as enforced
[18:37] jibel: ah, sorry, just saw your bug update
[18:40] vorlon: My Technical Architect, is that a thing we should care about? logrotate moved to systemd timers (apparently) and now runs at midnight instead of the previously-expected cron.daily time.
[18:40] s/My/Mr/
[18:40] Didn't mean to imply you were a Fisher Price My First Architect.
[18:41] lol
[18:45] rbasak, sorry, was away from desk. ok thanks. I am happy to help answer any questions. Unfortunately, we need that functionality to ship a product. sighhhh :-)
[18:47] ahasenack: Oh, changing apparmor profiles doesn't apply to running processes, does it?
[18:48] * infinity suspects not.
[18:49] you need to apparmor_parser --reload /path/to/profile
[18:51] infinity: if the process is already running under the parser, it does with apparmor_parser -r (or --replace, not --reload though)
[18:52] infinity: if the process started outside of confinement, it needs to be restarted (see 'sudo aa-status')
[18:52] Kay, I did replace.
[18:52] So, this fix isn't a large enough hammer.
[18:52] https://paste.ubuntu.com/p/nN5HqbS7ZB/ <-- ahasenack
[18:52] I also tried a hard restart for kicks.
[18:52] Looks like it's happier about net_admin, but still whiney about dbus, and still asserting.
[18:53] I didn't handle dbus in that patch
[18:53] that log is a bit different though
[18:53] infinity: let's zero in on apparmor first, though. If you disable the apparmor profile for squid, does the crash disappear?
[18:54] ahasenack: Maybe! (how? sorry, I'm apparmor stupid)
[18:54] aa-complain /usr/sbin/squid I *think* is enough
[18:54] it should still log, but not actually deny
[18:54] not sure if restarts are needed, just give it a try
[18:54] argh. sorry about --reload. :( been dicking with systemd lately :(
[18:56] ahasenack: No dice: https://paste.ubuntu.com/p/4zm9dDKgk2/
[18:56] I should probably fix the config file it's complaining about too, but I can't imagine that being the issue. :P
[18:56] it's not, I get that too
[18:57] just no crash
[18:57] infinity: so is this enough to crash it? /usr/sbin/squid -k rotate
[18:58] ahasenack: Yup.
[18:58] hm
[18:58] ahasenack: So, I guess apparmor was a red herring (but clearing out all those sketchy DENYs still seems like a solid plan)
[18:58] yep
[18:59] now why can't I get it to crash
[18:59] That's a mystery I'm not sure I can solve.
[18:59] infinity: is that a host, or a container/vm? And amd64 I assume?
[18:59] ahasenack: amd64 bare metal.
[18:59] ok
[19:01] infinity: is this squid the version from the ppa, or 4.1 from cosmic?
[19:01] ahasenack: cosmic.
[19:01] ahasenack: Not against trying the PPA now that we have a consistent reproducer.
[19:01] infinity: can you try the ppa one, and then run the /usr/sbin/squid -k rotate command?
[19:01] ahasenack: URL to the PPA again?
[19:01] (or short name for apt-add...)
[19:02] infinity: add-apt-repository ppa:ci-train-ppa-service/3450
[19:03] Why does squid take so friggin' long to shut down?
[19:03] (longterm complaint, this isn't new)
[19:03] I know
[19:03] it waits 30s
[19:03] Err, wat?
[19:03] There's a sleep in there, it's not DOING anything?
[19:03] it's like a graceful shutdown, but it always does that, regardless of whether there are open connections or not
[19:04] That should be fixed.
[19:04] Oh, that's fun. Upgrading squid doesn't restart squid-deb-proxy.
[19:04] Probably also a longstanding bug, but ew.
[19:05] * infinity restarts manually.
[19:05] interesting
[19:05] /o\ surrounded by bugs
[19:06] I mean, squid can't be expected to know about *all* its potential rdeps, but the ones it does know about (cf: the apparmor profile knows of some), it should probably try to detect and restart.
[19:06] there are ways to link systemd units, maybe it could be done
[19:06] Or that.
[19:06] Oct 4 13:06:43 nosferatu squid[17502]: assertion failed: comm.cc:428: "!isOpen(conn->fd)"
[19:06] (with the new squid)
[19:06] ok
[19:06] So, no dice.
[19:07] thanks for your help
[19:07] I wonder if it's just a stupid assertion that needs to not? :P
[19:07] Like, exiting a loop there might be just as sane as DYING HORRIBLY.
[19:08] (note: I've not looked at the code at all, maybe it's the 1 in 100 times when assert() is used correctly)
[19:12] ahasenack: The other possibility is that this is really expected behaviour, and upstream just didn't think about people like us who trap all unclean exits as errors and whine about them.
[19:12] ahasenack: Note that this is the child process that's dying and respawning, not the master, AFAICT.
[19:13] ahasenack: So maybe that assert just needs to be an exit, and we can wash our hands of it.
[19:13] ahasenack: But an understanding of WTF is going on would be helpful to determine that.
[19:14] ahasenack: I make that assertion based on my master processes running since I started them, and the children having new start times after rotate.
[19:15] hm
[19:15] Also, this seems to not have anything to do with squid-deb-proxy, it's the non-deb version that we're testing and killing with squid -k rotate, it looks like.
[19:15] found an old bug, looking at it: https://bugs.squid-cache.org/show_bug.cgi?id=4796
[19:15] many comments
[19:17] ew. I have to admit I would have expected socket interaction with logfile rotating to have been sorted out twenty years ago.
[19:19] So many wheels to reinvent.
[19:19] ahasenack: Seems stalled on another round of review/commit, but the last patch looks plausibly not the worst?
[19:20] Oh, and the discussion moved to github... Somewhere.
[19:21] yeah, tracking
[19:26] infinity: about the long shutdown: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=898469
[19:26] Debian bug 898469 in squid "Squid waits on shutdown even though there are no active clients" [Normal,Open]
[19:27] been there since 3.5.x
[19:27] Also, misreading "FreeBSD" as "Fedora" and then seeing libc.so.7 in a backtrace was mildly terrifying for a split second.
[19:27] hehe
[19:27] OH NO REDHAT WHAT HAVE YOU DONE.
[19:27] saw someone with a plan B of using a restart for logrotate instead of "squid -k rotate"
[19:28] ahasenack: Which then runs into the long shutdown issue. :)
[19:29] true
[19:29] ahasenack: I wonder if it would be out of line to suggest that the Debian/Ubuntu packages should drop the default for that to 5s or something.
[19:29] yeah
[19:29] Since it can be jacked back up by the config file for people who actually want that.
[19:30] I mean, even 5s is too long. The real bug should be fixed upstream, but whee.
[19:30] (The part where it doesn't differentiate between active clients and *any open socket*, and thus always waits the max time)
[19:31] I assume this actually, hilariously, means that the log sockets we're currently asserting on are also responsible for the shutdown taking 30s.
[19:31] Not that fixing A will fix B.
[19:31] Just related code with two stupid bugs that should probably date and make hundreds of little bugs.
[19:32] debian should also be affected by the crash
[19:32] One would assume, yes.
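[Editor's note] The "plan B" mentioned above (having logrotate restart squid instead of calling `squid -k rotate`) would look something like this stanza. Paths and retention values are illustrative, not the shipped packaging, and as the conversation notes, this trades the crash for the long-shutdown wait:

```
/var/log/squid/*.log {
    daily
    rotate 7
    compress
    missingok
    postrotate
        # plan B: full restart instead of `squid -k rotate`
        systemctl restart squid > /dev/null 2>&1 || true
    endscript
}
```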
[19:32] * ahasenack reads through the bug one more time
[19:32] But they don't have a crash handler installed by default, so they're less likely to notice.
[19:32] As I pointed out, it's the *child* that dies and respawns, so there's no DoS or anything here.
[19:33] So without a crash handler, you'd have to be scouring logs to even notice it happened.
[19:33] * infinity grabs his crossbow and goes to hunt wild tacos.
[19:34] mm tacos
[20:17] got it in debian
[20:17] Oct 4 20:17:00 sid-squid4 squid[582]: assertion failed: comm.cc:428: "!isOpen(conn->fd)"
[20:17] not sure how yet, I ran -k rotate multiple times, sometimes with an open connection
[20:17] but getting there
[20:52] infinity: got a reproducer, and a one-line config change that explains why squid-deb-proxy doesn't crash by default, but squid does
[20:52] squid-deb-proxy has cache_dir specified, as does my home proxy. With that, it doesn't crash on squid -k rotate
[20:52] I'll update the upstream bug, and might file a debian one as well now that I have a simple reproducer
[21:07] ahasenack: Shiny. Well-sleuthed.
[21:08] ahasenack: So, the TLDR for me is that if I mask out the main service (as I wanted to do anyway), the bug goes away for me? :P
[21:08] ahasenack: (not at all a valid excuse to not fix it, obviously)
[21:08] infinity: yes
[21:08] or add cache_dir to squid.conf, for reasons
[21:09] Should it not have a baked-in default for that anyway, pointing to a dpkg-owned directory?
[21:09] if you want to confirm it
[21:09] squid-deb-proxy does the right thing
[21:09] Seems like a packaging bug tickling the upstream bug.
[21:09] squid.conf, I don't know why there is no default, probably because you have to specify a size
[21:09] the cache is only in ram then, without cache_dir, afaik
[21:09] A default config with no cache dir seems vaguely useless.
[21:10] let me see if there is an option where you don't have to specify the max size
[21:10] But maybe I'm not imagining some weird use of squid where you would be happy with a tiny RAM cache.
[21:11] well, it's a proxy and a cache
[21:11] two things in one
[21:11] the proxy bit can be used for access control
[21:12] There are other proxies that are better at that if you don't also want the caching, IMO.
[21:12] all cache_dir types require a size parameter, something hard to guess a good default for
[21:12] But fair point.
[21:12] #Default:
[21:12] # No disk cache. Store cache objects only in memory.
[21:12] I still think preconfiguring a cache dir (even if tiny) makes sense. But, again, still not an excuse for not fixing the upstream bug.
[21:12] after all, squid-deb-proxy does have a default cache dir
[21:13] I'd rather waste some arbitrary value (500M?) of everyone's disk than eat their RAM.
[21:13] If I had to pick a "sane" unpacked-but-not-user-configured state.
[21:13] # use a different dir than stock squid and default to 40G
[21:13] cache_dir aufs /var/cache/squid-deb-proxy 40000 16 256
[21:13] that's from squid-deb-proxy
[21:13] someone made the call
[21:13] infinity: yeah, esp. as a default configuration setting these days
[23:38] infinity: ah, poor merger that I am, I didn't notice the change to timers also changed the time it ran. I think that yes, we ought to move it back to running in the 6am window.
[23:51] vorlon: if you're around, there are some proposed hints: https://code.launchpad.net/%7Eubuntu-release/britney/hints-ubuntu/+activereviews
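[Editor's note] The cache_dir idea being debated above, as a concrete squid.conf sketch. The directive format is `cache_dir <type> <directory> <size-MB> <L1> <L2>`; the 500 MB figure is infinity's off-hand suggestion, and the directory is the usual Debian/Ubuntu one, so treat both as assumptions rather than a tested recommendation:

```
# sketch for /etc/squid/squid.conf: a small on-disk cache so that
# `squid -k rotate` does not exercise the memory-only code path that
# trips the comm.cc:428 assert (per ahasenack's reproducer above)
cache_dir aufs /var/spool/squid 500 16 256
```

Without any `cache_dir` line, squid's documented default is a memory-only cache, which is the configuration that crashed here.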