[05:35] <jamespage> mdeslaur: cosmic ceph packages also tested OK
[07:12] <zetheroo> on several (but not all) of our Ubuntu Server 18.04 installs the hostname resolution seems to intermittently fail and only 'service systemd-resolved restart' will get it working again. Any ideas as to what can cause this kind of behavior?
[07:22] <blackflow> zetheroo: can you give an example of how it fails, and what the systemd-resolve --status is  (did the upstream NS change?)
[07:24] <blackflow> meanwhile... due to resolved's stupid design not to obey the list of NS entries given, and many other quirks, my recommendation is always to drop systemd-resolved, esp. on prod servers where DNS config is static and must be consistently reliable (ie. no roaming, changing networks, etc...)
[07:25] <zetheroo> blackflow: when trying to ping a hostname you get this 'ping: hostname: Temporary failure in name resolution'
[07:25] <blackflow> there's dnsmasq, unbound, bind, powerdns... pick your poison if you need a local caching, recursive resolver. If not, just statically configure /etc/resolv.conf (and mask out systemd-resolved)
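A minimal sketch of the static setup blackflow describes, assuming Ubuntu 18.04 defaults; the nameserver and search domain are taken from the pastes later in this log, so substitute your own (these are root-only commands, not tested here):

```shell
# stop systemd-resolved and mask it so nothing re-enables it
systemctl disable --now systemd-resolved
systemctl mask systemd-resolved

# /etc/resolv.conf is normally a symlink into /run/systemd/resolve; replace it
rm /etc/resolv.conf
printf 'nameserver 192.168.81.9\nsearch mt.local\n' > /etc/resolv.conf
```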
[07:26] <TJ-> zetheroo: yes, the reason is that when resolved has a list of DNS servers and one is unreachable it'll move onto, and remain using, another in the list. If you've got 'private' DNS as well as public that can then break the ability to resolve public names, for example
[07:27] <TJ-> zetheroo: if you do have 'private' DNS (for LAN say) that should be set as on-link only so it isn't used globally
[07:29] <zetheroo> This is the resolv.conf and netplan config from two systems where this happens: https://paste.ubuntu.com/p/mDjnJwQSxm/
[07:30] <blackflow> zetheroo: ah you have .local . systemd-resolved won't work with those unless you have mDNS in your network
[07:32] <zetheroo> blackflow: wdym? DNS does work ... 90% of the time ... it just goes dead after some random time ... or something is fritzing it out
[07:32] <blackflow> systemd-resolved doesn't resolve .local names, and you have mt.local in search so I assumed you're querying for .local names?
[07:36] <TJ-> zetheroo: what does "systemd-resolve --status" show?
[07:37] <TJ-> The problem I see there is you've got both private and public DNS servers listed "addresses: [192.168.81.9, 1.1.1.1]"
[07:38] <zetheroo> https://paste.ubuntu.com/p/j9NjmcPFQ3/
[07:38] <TJ-> zetheroo: 1.1.1.1, if used, is NOT going to resolve .local addresses whereas 192.168.81.9 presumably will
[07:39] <TJ-> zetheroo: so, if at some point resolved cannot get a response from 192.168.81.9 it'll switch to 1.1.1.1 at which point .local names won't be resolved
[07:39] <TJ-> zetheroo: is that what is happening? do local names fail, or public names?
[07:39] <zetheroo> TJ-: ah, so if at any point the internal DNS (192.168.81.9) server doesn't work, resolve will use the next one (1.1.1.1) and from then on ignore the internal one?
[07:40] <zetheroo> TJ-: honestly I didn't try to reach an external hostname ... only internal ones.
[07:41] <zetheroo> blackflow: just doing 'ping hostname' resolves fine to 'hostname.mt.local'
[07:42] <zetheroo> blackflow: 'ping hostname.mt.local' also works fine
[07:43] <zetheroo> TJ-: systemd-resolve --status -> https://paste.ubuntu.com/p/j9NjmcPFQ3/
[07:46] <TJ-> zetheroo: I think the problem here is you're trying to use netplan to do something it is unable to do; what you need, from what I can see, is a systemd-resolved 'global' DNS server 1.1.1.1 set in resolved.conf (DNS=1.1.1.1) and then only have the 192.168.81.9 in the netplan config
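Roughly what TJ- is suggesting, as a config sketch: set `DNS=1.1.1.1` under the `[Resolve]` section of /etc/systemd/resolved.conf, and keep only the internal server in netplan. The interface name, host address, and gateway below are placeholders; only 192.168.81.9 and mt.local come from the pastes (untested, verify with `netplan try`):

```yaml
# /etc/netplan/01-netcfg.yaml -- only the internal DNS server per-link
network:
  version: 2
  ethernets:
    eth0:                             # hypothetical interface name
      dhcp4: no
      addresses: [192.168.81.20/24]   # placeholder host address
      gateway4: 192.168.81.1          # placeholder gateway
      nameservers:
        search: [mt.local]
        addresses: [192.168.81.9]
```

Apply with `netplan apply` and `systemctl restart systemd-resolved`.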
[07:47] <zetheroo> I guess we are using netplan like we were using interfaces.conf
[07:47] <blackflow> TJ-: wait, both are in the netplan config?
[07:48] <TJ-> zetheroo: that should help but if the internal DNS server is unreachable for reasons that is what you ought to focus on because if that is dying then there'll be no mt.local resolution anyhow
[07:49] <TJ-> zetheroo: DNS should be a highly available service with fast response times; unfortunately it's not given the respect it deserves in many LANs
[07:49] <TJ-> often bundled as a service with many others on some poor overloaded system :)
[07:49] <zetheroo> TJ-: ok, but what actually causes resolve to switch to the secondary DNS? is it a timeout? if so ... what time limit?
[07:49] <TJ-> zetheroo: connection timeout if I recall correctly
[07:50] <TJ-> zetheroo: remember that DNS uses UDP in the main, and the U stands for Unreliable :)  UDP can be dropped by routers under pressure
[07:54] <zetheroo> I'm just trying to get an idea of how critical the "drop" in DNS is for resolve - again we have dozens of Ubuntu systems, and this is only happening on 4.
[07:57] <TJ-> zetheroo: I'd check/monitor their network links
[07:59] <zetheroo> that's the thing ... 2 of the systems (the ones in the pastebin) are standalone hardware systems, and the other two are VMs on our virtualization, which are living with other Ubuntu VMs which don't do this ...
[07:59] <blackflow> (UDP is primarily used, but TCP must be allowed for requests and responses larger than single packet size)
[08:00] <zetheroo> but, OK, if we remove the external DNS address from the netplan config it should never ignore the internal DNS server entirely - right?
[09:15] <zetheroo> I normally use 'apt autoremove' to free up space in general, and it also frees up space in /boot by removing old kernels, but is there another way to specifically clean up /boot?
[09:16] <TJ-> zetheroo: clean up? or increase freespace?
[09:19] <TJ-> zetheroo: you could make the initrd.img smaller, /etc/initramfs-tools/initramfs.conf MODULES=dep
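TJ-'s suggestion as concrete steps, a sketch assuming the standard Ubuntu paths; `MODULES=dep` only includes modules the running system currently needs, so test a reboot before relying on it (root-only commands, not tested here):

```shell
# switch initramfs from MODULES=most to MODULES=dep
sed -i 's/^MODULES=most/MODULES=dep/' /etc/initramfs-tools/initramfs.conf

# regenerate the initrd for all installed kernels
update-initramfs -u -k all
```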
[09:20] <zetheroo> well it seems that even after the old kernels are removed there is still a bunch of files from those kernels in /boot
[09:20] <zetheroo> https://paste.ubuntu.com/p/rMRxjb55vy/
[09:21] <blackflow> zetheroo: you have to explicitly remove the 4.13 kernel (wth btw). autoremove only keeps current and current-1 version of currently (heh) running kernel
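A hedged sketch of explicitly removing an old kernel series like the 4.13 one blackflow mentions; always check `uname -r` first so you never purge the running kernel (root-only, adapt the version pattern):

```shell
# list installed kernel packages
dpkg -l 'linux-image-*' 'linux-headers-*' | awk '/^ii/ {print $2}'

# the currently running kernel -- do NOT purge this one
uname -r

# explicitly purge the old series (4.13 here, as in the paste)
apt purge 'linux-image-4.13.*' 'linux-headers-4.13.*'
apt autoremove --purge
```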
[11:09] <mdeslaur> jamespage: ack, thanks. xenial next?
[11:55] <supaman> I have an fstab entry to mount a smb share, when running mount -a I get that there is an error in the options to the smb mount, here are the options, can someone see the error? credentials=/root/.smbcredentials,iocharset=utf8,sec=ntlm
[11:58] <supaman> hmmm ... removing sec=ntlm fixes the problem
[12:01] <supaman> setting sec=ntlmssp works
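The working line would look roughly like this; server, share, and mountpoint are hypothetical, and the likely cause is that newer cifs kernel modules default to (and servers often require) NTLMSSP rather than plain NTLM:

```
# /etc/fstab -- //fileserver/share and /mnt/share are placeholders
//fileserver/share  /mnt/share  cifs  credentials=/root/.smbcredentials,iocharset=utf8,sec=ntlmssp  0  0
```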
[16:01] <kinghat> https://arstechnica.com/gadgets/2019/06/zfs-features-bugfixes-0-8-1/
[16:03] <mybalzitch> too bad kernel devs are working overtime to neuter ZFS
[16:20] <kinghat> why is that?
[17:10] <sarnold> kinghat: the last paragraph here describes the mood well https://marc.info/?l=linux-kernel&m=154714516832389
[17:12] <kinghat> so because of the GPL?
[17:16] <tomreyn> because of the CDDL
[17:29] <swills> i saw the author of that ars article give a pretty decent talk on zfs, part of which talked about how the GPL is unenforceable
[17:29] <swills> all the stuff with the SFC vs SFLC etc
[17:30] <lordcirth> While I understand that supporting CDDL is annoying, I find it odd that they don't get that ZFS is good and people want it, regardless of what Sun wanted.
[17:30] <lordcirth> If they want to make btrfs raid5/6 stable, I'd be ok with that, but it's just not there.
[17:30] <swills> i think they do get that, but they also get that they are between a rock and a hard place
[17:30] <swills> and facebook is doing a lot of work on btrfs
[17:32] <lordcirth> Btrfs is quite nice for root partitions (way easier than ZFS root on most distros, even Ubuntu) but I need raid5 in order to use it for a lot of use cases
[17:33] <swills> zfs on root is automatic on FreeBSD, fwiw
[17:33] <swills> but anyway, look at all the talk about btrfs in recent LWN coverage of LSFMM
[17:33] <lordcirth> I'm aware, but that's not really an option for a lot of systems
[17:34] <swills> so i dunno, i think ultimately what's going to happen is everyone is going to be told "btrfs is good enough, use it, not ZFS" and "if you want to use ZFS, don't expect it to get easier"
[17:35] <swills> but i could be wrong
[17:35] <kinghat> you have to set the trim flag with zfs, when setting up an ext4 fs on an ssd is it automagic?
[17:36] <tomreyn> there's an fstrim systemd timer
[17:36] <lordcirth> kinghat, iirc Ubuntu does scheduled trim, not instant trim
[17:36] <kinghat> is one better than the others?
[17:37] <kinghat> other*
[17:37] <lordcirth> scheduled is better, assuming you don't fill the entire drive with garbage before it can trigger
[17:37] <lordcirth> In general, at least
[17:38] <kinghat> oh wow, it's weekly.
[17:38] <tomreyn> the "discard" mount option can cause serious I/O problems with some SSD / NVMEs.
[17:39] <tomreyn> https://wiki.debian.org/SSDOptimization#WARNING
[17:40] <swills> fun fact, some drives are slow at trim, so turning it on can make your disk seem slower
[17:40] <swills> perhaps scheduled trim helps with that, i dunno
[17:41] <kinghat> so does that only run for the OS ssd or any attached ssds?
[17:41] <lordcirth> kinghat, well, it just runs "/sbin/fstrim -av"
[17:42] <tomreyn> https://wiki.archlinux.org/index.php/Solid_state_drive#Periodic_TRIM  "The service executes fstrim(8) on all mounted filesystems on devices that support the discard operation. "
[17:42] <lordcirth> And "man fstrim" says that "-a" is all supported mounted devices
[17:42] <kinghat> ah. would arch be applicable to ubuntu?
[17:43] <tomreyn> i think it's the same systemd service + timer
[17:44] <tomreyn> systemctl list-timers fstrim.timer
[17:45] <kinghat> ah
[17:47] <tomreyn> hmm the timer seems to lack randomization, always runs at 00:00.
[17:48] <tomreyn> if you want to review the timer + service: ls -l /lib/systemd/system/fstrim.*
[17:48] <kinghat> time is just a social construct tomreyn.
[17:49] <tomreyn> my point is you don't want all your servers to become I/O loaded at 00:00
[17:51] <tomreyn> RandomizedDelaySec= should be used
[17:51] <kinghat> agreed
[17:55] <lordcirth> Note that if you want to change a timer/service, do not edit the one in /lib. Copy it to the equivalent directory in /etc and edit that. The /etc one will override the one in /lib when read, but will not interfere with the packaged file.
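For the fstrim timer specifically, a drop-in snippet (rather than a full unit copy) is enough to add the randomization tomreyn mentions; a sketch, the delay value being an arbitrary choice:

```ini
# /etc/systemd/system/fstrim.timer.d/override.conf
[Timer]
RandomizedDelaySec=6h
```

Then run `systemctl daemon-reload` to pick it up.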
[17:59] <tomreyn> bug 1833593 filed
[18:02] <tomreyn> if some of you have access to I/O load logs across larger server farms, it'd be great to check (and comment) whether this has any noticeable impact.
[18:08] <lordcirth> Most of our servers have separate SSDs for root and data, so the root ones don't have much load. Maybe our DB servers?
[18:08] <lordcirth> Er, I meant, SSDs for root and HDDs for data.
[18:09] <lordcirth> It's Monday at local midnight? I will ask
[18:15] <sdeziel> lordcirth: 'systemctl edit $foo' lets you create an override snippet, add '--full' to it and it will do a full unit copy for you to edit
[18:15] <lordcirth> sdeziel, cool, didn't know that!
[18:15] <sdeziel> a bit like upstart's .override files but better
[19:56] <geard> exit
[23:18] <mybalzitch> Can I go from 18.10 to 18.04.2?
[23:19] <mybalzitch> I tried do-release-upgrade but it wants me to go to 19.xx
[23:20] <sarnold> you really can't go backwards
[23:21] <sarnold> individual packages might downgrade alright, once in a while, but the packages are packaged with the assumption that you only ever move forward along with the passage of time
[23:23] <mybalzitch> ok, I was trying to get zfs 0.8.1 with Jonathon F's packages, but it seems he only supports LTS releases, so I don't have an "easy" way to get it installed. I will have to do dkms myself I think
[23:24] <sarnold> oh dang. :/
[23:24] <sarnold> you could probably just build his packages locally
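A rough sketch of what sarnold suggests; the PPA's deb-src line must be enabled first, and the source package name (`zfs-linux` is the Ubuntu archive name) may differ in the PPA (root needed for build-dep, otherwise untested):

```shell
apt-get build-dep zfs-linux   # install build dependencies (package name may differ in the PPA)
apt-get source zfs-linux      # fetch and unpack the source package
cd zfs-linux-*/
dpkg-buildpackage -b -uc -us  # build unsigned binary .debs locally
```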
[23:32] <teward> wooooooooooow I feel like an imbecile today...
[23:32] <teward> i totally forgot to save a switch config so I thought my Ubuntu gateway machine in this one lab env was broken
[23:32] <teward> and ended up realizing after puttering with it for 2 hours that it was the switch
[23:32] <teward> >.<
[23:32] <teward> anyways... sarnold I'mma bother you again like I normally do :p
[23:32] <sarnold> teward: sounds like a problem best solved with a sandwich. or pizza. with beer.