[06:31] <cpaelzer> good morning
[07:13] <lordievader> Good morning
[11:05] <Slashman> hello, I think that I'm hitting a bug with the latest openjdk, I would like to rollback to the previous version, is there any way to do that ?
[11:17] <joelio> Slashman: within a point release or new version
[11:17] <joelio> but fundamentally, yes
[11:19] <Slashman> joelio: apt-cache policy only shows me 8u151-b12-0ubuntu0.16.04.2 or 8u77-b03-3ubuntu3
[11:19] <joelio> it may still be in your /var/cache/apt/archives dir
[11:20] <joelio> in which case you can dpkg -i it
[11:20] <joelio> I take it it's 8 series you need?
[11:28] <Slashman> joelio: no trace of it in /var/cache/apt/archives
[11:29] <Slashman> joelio: I used the oracle jdk instead, since any tar.gz can be downloaded from their website... it's a little sad that I have to switch to that to use a different version
[11:32] <joelio> the oracle jdk is available as a package too
[11:32] <joelio> https://launchpad.net/~webupd8team/+archive/ubuntu/java
[11:39] <joelio> Slashman: also, you probably have lost that specific version due to a security update
[11:39] <joelio> it's... java at the end of the day ;)
[11:41] <Slashman> I prefer to have several different java versions to test it, but it doesn't seem to come from the jvm in the end
[11:42] <Slashman> I have "fork: retry: Resource temporarily unavailable" with 40GB free ram and none of the limits that I know of breached
[11:43] <Slashman> the result is "java.lang.OutOfMemoryError: unable to create new native thread"
[11:44] <Slashman> with 40GB ram free and 100GB unused swap, it should be some kind of limit... but I don't see which
[12:11] <joelio> Slashman: you need to set it in the jvm options
[12:11] <joelio> as the heap is a value you set, depending on the resource needed
[12:12] <joelio> also be aware that the garbage collection mode changes when you set it above 4GB
[12:12] <joelio> but that will probably be your issue
[12:15] <joelio> Slashman: what's the application you're using in java (usually there is an /etc/default/{thing} that allows you to tune)
[12:15] <joelio> or in things like Elasticsearch there is a jvm.options file nowadays
[13:23] <faekjarz> Hi! [17.10 server / netplan] I want one of my NICs configured but booting into link DOWN. Where do i find the information / documentation to achieve this?
[13:26] <tomreyn> faekjarz: try asking in #netplan if you can't get help here
[13:27] <tomreyn> documentation should be in 'man 5 netplan' and online at http://people.canonical.com/~mtrudel/netplan/
[13:27] <tomreyn> ...according to https://wiki.ubuntu.com/Netplan
[13:28] <faekjarz> tomreyn: aye, i did already, but my pesky impatience ;)
[13:28] <tomreyn> this is the first time i heard of it, i assume it's fairly new
[13:32] <faekjarz> yes, i've found ~mtrudel/netplan already but ctrl+f link doesn't highlight what i'm looking for. Wrong keyword?
[13:35] <Slashman> joelio: that's not a jvm issue, the error happens even if I try to run "java -version"
[13:35] <Slashman> not a sizing issue I meant, it's production software, that's not a new service or anything
[13:37] <Slashman> joelio: you can see the error when trying to run "java -version" here: https://apaste.info/TIIb
[13:39] <joelio> Slashman: have you tuned the heap? otherwise you'll be running on the default.
[13:39] <joelio> err, if java -version is broken then I don't know, sounds like it's fubar
[13:40] <Slashman> some system limit is being reached, that's how I interpret it
[13:40] <Slashman> but nproc, nfile, etc are far from their limit
[13:42] <joelio> umm, you said you'd used oracle version?
[13:42] <joelio> vm_info: OpenJDK 64-Bit Server VM (25.151-b12) for linux-amd64 JRE (1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12), built on Oct 27 2017 21:59:02 by "buildd" with gcc 5.4.0 20160609
[13:43] <Slashman> I'm using openjdk atm
[13:43] <Slashman> 1.8.0_151 build 1.8.0_151-8u151-b12-0ubuntu0.16.04.2-b12
[13:43] <joelio> so that output is literally from a java -version?
[13:43] <joelio> (in the pastebin)
[13:43] <Slashman> https://apaste.info/mAn3
[13:43] <joelio> eh, you said it was failing
[13:44] <joelio> 13:35 < Slashman> joelio: that's not  a jvm issue, the error happens even if I try to run "java -version"
[13:44] <joelio> have you checked your heap space?
[13:44] <joelio> this is a fairly common thing to do
[13:44] <joelio> i.e. increase heap allocated to a java application
[13:44] <joelio> have to do it on stuff deployed here, as the 256MB default is pretty low
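The heap tuning joelio describes is usually a one-line change; a minimal sketch, assuming a Debian/Ubuntu-style /etc/default file (the file name and values here are illustrative, the -Xms/-Xmx flags are standard HotSpot options):

```shell
# Illustrative /etc/default/tomcat8 fragment -- path and sizes are examples.
# -Xms sets the initial heap, -Xmx the maximum the JVM may grow to.
JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx4g"
```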
[13:47] <Slashman> I'll save you some time, this is a corporate production server, the prod JVMs are tuned, the server had no issue, we serve thousands of connections per JVM, I'm trying to understand why we now have an issue where we get JVM errors and "java -version" doesn't work anymore, the problem happened this morning and we had to restart 3/7 JVMs, after restarting just one, the problem goes away, for a time
[13:48] <Slashman> so it works for a time and suddenly we see "java.lang.OutOfMemoryError: unable to create new native thread" in tomcat logs when there is available memory, both in the heap and on the os
[13:49] <Slashman> only solution is to restart tomcat at this point
[13:50] <Slashman> so, since this problem is "new", my guess is that we have reached some system limit, that is not the amount of available memory
[13:53] <Slashman> it may have been a bug in the new java version but I just confirmed that we have the issue on the previous one too, so that's not it
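"unable to create new native thread" with plenty of free memory usually points at a process/thread count limit rather than heap. A sketch of the usual places to check, assuming a systemd-based Ubuntu (the "tomcat" unit name is a placeholder):

```shell
# Limits that can cause fork()/pthread_create() failures despite free RAM:
ulimit -u                            # per-user max processes/threads
cat /proc/sys/kernel/pid_max         # system-wide PID limit
cat /proc/sys/kernel/threads-max     # system-wide thread limit
cat /proc/sys/vm/max_map_count       # each thread stack consumes a memory map
ps -eLf | wc -l                      # threads currently in use on the box
# per-unit task limit, if the service runs under systemd (unit name is a guess):
systemctl show tomcat -p TasksMax 2>/dev/null || true
```

Comparing these between the affected Ubuntu hosts and the unaffected Debian ones, as Slashman suggests below with sysctl, would narrow it down.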
[13:56] <joelio> Slashman: memtest?
[13:57] <joelio> or is this on multiple boxes?
[13:57] <Slashman> not possible right now, but the server's iDRAC doesn't report any ECC issue, I'll have to try it
[13:57] <Slashman> joelio: it happens on 2 servers, you're right, doesn't seem related to hardware
[13:58] <Slashman> maybe it's ubuntu related, I have some debian servers without the issue with the same config, I'll compare the sysctl values...
[13:59] <joelio> yea, also check process list output to see what it's been instantiated with, just to make sure those values are being set
[14:01] <Slashman> what do you mean by that?
[14:07] <joelio> the ps -ef / auxx output for the java process
[14:08] <joelio> check how it's been instantiated
[14:08] <joelio> maybe there is a subtle difference
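Checking how the JVMs were instantiated, as suggested, is just a matter of comparing the full command lines between hosts (sketch):

```shell
# Show the exact arguments each java process was started with;
# diff this output between the affected and unaffected servers.
ps -eo pid,args | grep '[j]ava' || echo "no java processes running"
```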
[14:12] <Slashman> oh, that's the exact same line except pid number ofc
[16:06] <drab> ikonia: sdeziel: fwiw, I got ldirectord working with 3 containers, 1 director, 2 real servers
[16:07] <drab> couple things are broken in the default pkg so took me a while, but it otherwise works very well
[16:07] <drab> systemd unit has a bad path and the pkg actually won't install at all
[16:07] <drab> as there's a race condition with the config file
[16:09] <sdeziel> drab: glad to hear that. I'd have to look into ldirectord as it's new to me
[16:09] <drab> there's still something I don't understand as far as networking goes tho, several of the howtos seemed to say I had to set the director as the default gw for the real servers, but I didn't
[16:09] <drab> I guess it seems more lightweight than haproxy/nginx for just pure tcp/udp connections
[16:09] <drab> and *a lot* simpler, which is all I need
[16:11] <sdeziel> drab: the default gw thing is related to the asymmetric routing we talked about the other day
[16:13] <sdeziel> drab: could you share your ldirectord config?
[16:14] <sdeziel> http://www.linuxvirtualserver.org/docs/ha/ultramonkey.html shows that it can "masq"uerade real servers so that could be why you got away without changing the default gw
[16:14] <drab> well, you'd think, yeah, but actually when you look at how things are set up I don't get it
[16:14] <drab> I don't do masq, I do "gate" which is direct routing
[16:15] <drab> the caveat in most howtos, for the real server to respond, is that a non arping interface needs to get the VIP
[16:15] <drab> so often lo:0 gets the VIP/32, that's how I've seen it in most howtos, and it makes sense
[16:15] <drab> box won't arp for that (you need to tweak sysctl), and it will still accept connections for that ip since it sees it as local
[16:16] <drab> responses will also originate from that ip since it's the obvious selection given that the request was received for it
[16:16] <drab> so it all seems to make sense and it works just fine, leaving me puzzled why I should be setting the gw to the director
[16:16] <drab> which several howtos I found mention
[16:17] <drab> altho they are all at least 5-7yrs old
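The real-server setup drab describes (VIP on lo, ARP suppressed via sysctl) can be sketched like this; the VIP 192.0.2.10 is a placeholder and the commands need root:

```shell
# LVS direct-routing real server: accept traffic for the VIP without ARPing for it.
ip addr add 192.0.2.10/32 dev lo label lo:0   # VIP on a non-ARPing interface
sysctl -w net.ipv4.conf.all.arp_ignore=1      # only answer ARP for the receiving iface
sysctl -w net.ipv4.conf.all.arp_announce=2    # never use the VIP as ARP source address
```

Since replies then originate from the VIP directly, the director never sees return traffic, which is consistent with not needing it as the default gateway in "gate" mode (the default-gw advice in old howtos applies to NAT/"masq" mode).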
[16:17] <sdeziel> yeah, everything in that space seems pretty dated documentation-wise
[16:18] <sdeziel> isn't ldirectord for http/https backends only though?
[16:19] <drab> well ldirectord actually really only checks that backends are alive, it doesn't even do any of the switching etc
[16:19] <drab> that's done in kernel by ip_vs, which you have to modprobe
[16:19] <drab> so technically you can just install ipvsadm and modprobe ip_vs and you're done
[16:19] <drab> as far as balancing goes
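drab's point that ipvsadm plus ip_vs alone handle the balancing looks roughly like this in practice (a sketch; the VIP and real-server addresses are placeholders, and the commands need root):

```shell
modprobe ip_vs
ipvsadm -A -t 192.0.2.10:80 -s rr                  # virtual service, round-robin
ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.21:80 -g    # real server, "gate"/direct routing
ipvsadm -a -t 192.0.2.10:80 -r 192.0.2.22:80 -g
ipvsadm -L -n                                      # inspect the current table
```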
[16:19] <sdeziel> OK so only the health checks, right
[16:20] <drab> but that won't give you any monitoring of the backends. the monitoring part seems to be http in many examples, but maybe not
[16:20] <sdeziel> so yeah, much lighter than a user space proxy like HAproxy/nginx
[16:20] <drab> also ldirectord is just one perl script... which well, it's perl, but it's a single script
[16:20] <drab> sdeziel: and it happens in kernel space
[16:21] <drab> in theory this could simply be plugged into nagios/icinga/whatever monitoring system
[16:21] <sdeziel> yeah, got that :)
[16:21] <drab> a nagios event_handler could run ipvsadm and remove the backend or something
[16:21] <drab> it would be trivial to implement
[16:21] <sdeziel> is your VIP actually movable between 2 or more boxes?
[16:22] <drab> I haven't tried that part yet, it's next on the list, testing one component at a time
[16:22] <drab> gonna give keepalived a shot
[16:23] <sdeziel> keepalived integrates nicely with IPVS
[16:23] <sdeziel> and keepalived can run whatever health check you want it to
[16:23] <sdeziel> no need to mess with nagios handlers
[16:23] <drab> I'm actually pretty happy to have figured this one out, because even in the case of exposing containers and whatnot this is now really straightforward, no iptables or other stuff
[16:24] <drab> seems I can just do straight ip_vs and fiddle with ipvsadm and I'm done
[16:24] <drab> set that up on the baremetal maybe and redirect to whatever containers on it at will, swapping things around in just a one-liner
[16:28] <drab> sdeziel: keepalived will take care of the VIP, not the realservers, for those you still need something else like ldirectord or nagios
[16:28] <drab> or whatever
[16:28] <drab> my point was, ldirectord isn't technically needed to get the balancing part going, I thought it was
[16:29] <sdeziel> http://manpages.ubuntu.com/manpages/xenial/man5/keepalived.conf.5.html see the LVS section
[16:36] <drab> yeah, looks like I was wrong, so maybe ipvs + keepalived is all that's needed to both manage the VIP on the directors and manage the real servers.
[16:36] <drab> that's great, one less component, thanks
[16:37] <sdeziel> ldirectord seems to be responsible for monitoring and tapping into ipvsadm whenever needed
[16:37] <sdeziel> keepalived on the other hand provides a wrapper on top of ipvsadm and also handles monitoring
[16:38] <drab> which keepalived seems to be capable of doing too, no? that's how I read that section
[16:38] <drab> right
[16:38] <sdeziel> yeah
[16:39] <sdeziel> so if you setup with keepalived it should give you all the features you need
[16:40] <sdeziel> and you won't have a SPOF anymore
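A minimal keepalived.conf combining the two roles discussed above, VRRP for the floating VIP and LVS health checks for the real servers; all addresses, the interface name, and the check timeouts are placeholders:

```
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    virtual_ipaddress {
        192.0.2.10/32        # the floating VIP
    }
}

virtual_server 192.0.2.10 80 {
    delay_loop 6
    lb_algo rr               # round-robin
    lb_kind DR               # direct routing, like ldirectord's "gate"
    protocol TCP
    real_server 192.0.2.21 80 {
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 192.0.2.22 80 {
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
```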
[16:53] <drab> yeah, that should be fine, what I'm most concerned about is the zfs snapshot part that comes after that
[16:54] <drab> I've redone the lxd hosts so that /var/lib/lxd is on zfs itself, that way I can send snapshots over to a backup host and have all containers set up in a single swoop
[16:55] <drab> but in that case they are going to have the same ips, so they need to be stopped until it's time
[16:56] <drab> now that I have lvs I'm wondering if instead I should have the containers on diff ips/not synced and just sync the data DS
[16:56] <drab> haven't thought that through quite yet
[16:57] <sdeziel> what do you mean same IPs? your 2 lxd hosts?
[16:57] <drab> the containers
[16:57] <drab> if I put them on zfs and send the snaps to the failover lxd server
[16:57] <drab> then all configs will be the same including mac and ip
[16:58] <drab> so if they come up I have a conflict
[16:58] <sdeziel> I'm assuming you'll "lxc copy" them, right?
[16:59] <sdeziel> but yeah, the same instance can/should be up only once
[16:59] <drab> I wasn't planning on it, I was planning on setting up zfs-backup-snapshot or something like that
[16:59] <drab> since all lxd is zfs backed up
[16:59] <drab> that way I don't have to treat lxd data differently from other data stored on zfs
[16:59] <sdeziel> unless you use the PPA/backports, I think there is no easy way to adopt a zfs
[17:00] <sdeziel> hence the suggestion of lxc copy
[17:00] <drab> how do you mean? adopt a zfs, that is
[17:00] <sdeziel> say you zfs send/receive the container's FS, the receiving lxd host won't be able to start it as is
[17:01] <drab> why not?
[17:01] <drab> I create a DS which I mount on /var/lib/lxd and then the default storage pool for lxc is a LXD DS. so both containers data and lxd config gets moved over to the receiving host
[17:02] <drab> by simply snapshotting everything and sending it
[17:02] <drab> all names and whatnot are consistent, the only diff is the ip of the lxd host and its hostname, that's about it
[17:02] <drab> am I missing something?
[17:04] <sdeziel> I guess this would work if you flip everything at once
[17:04] <sdeziel> but if you want to do it per container that's where you will need a different solution
[17:04] <drab> this also means that ldirectord/keepalived wouldn't have to change since ips would be the same, it's basically almost cloning the whole system, which is quite neat, the only issue is the turn on/off
[17:04] <drab> right
[17:05] <drab> that's what I'm debating, if I will corner myself... but at the same time this keeps it pretty simple and this is a charity, not a tech company
[17:05] <drab> I just want them to have a decent failover solution and data in a diff physical place
[17:07] <sdeziel> zfs send/receive should cut it then
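The whole-dataset replication drab describes boils down to a recursive snapshot plus send/receive; a sketch where the pool name, snapshot name, and backup host are placeholders, and the received containers must stay stopped as discussed above to avoid IP/MAC conflicts:

```shell
zfs snapshot -r tank/lxd@nightly                          # recursive snapshot of /var/lib/lxd
zfs send -R tank/lxd@nightly | ssh backuphost zfs receive -F tank/lxd
```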
[17:08] <drab> are you using sanoid/syncoid by any chance? seems one of the common solutions to deal with that stuff
[17:08] <sdeziel> drab: sanoid+syncoid might assist you with that
[17:08] <drab> :)
[17:09] <sdeziel> if you want to venture into new territories, you can take a look at DRBD (~RAID1 over the network). Pretty nice
[17:12] <drab> yeah there was a thread about that on the zfsonlinux ML, zfs + drbd, I think it's more than they need
[17:13] <drab> in fact simply telling them "if something happens turn off and keep off this machine" is possibly a very good place to be for them
[18:38] <drab> sdeziel: did you look at http://www.znapzend.org/ by any chance?
[18:38] <drab> or even the "official" https://github.com/zfsonlinux/zfs-auto-snapshot
[18:42] <sdeziel> drab: no, first time I hear about znap
[18:44] <drab> I like their README quite a bit: https://github.com/mikalsande/znap
[18:44] <drab> and it's all bash, not a perl guy
[18:45] <drab> only caveat is, the author says he only uses fbsd so it's only really tested there
[22:07] <hallyn> cpaelzer: why is https://launchpad.net/~ubuntu-virt/+archive/ubuntu/virt-daily-upstream disabled? :)
[22:17] <Pinkamena_D> How to find what is causing 'Device or resource busy'? I am trying to move /home so that I can overwrite it with something else. However, I get "device or Resource busy" . I am su to root and there is nothing in lsof | grep home  . my cwd is in /
[22:18] <drab> Pinkamena_D: is home on a diff drive?
[22:18] <Pinkamena_D> no
[22:18] <drab> an mv command is saying resource busy?
[22:18] <Pinkamena_D> yes
[22:21] <drab> whups, got disconnected
[22:21] <drab> don't know if my msgs went through
[22:21] <Pinkamena_D> no, did not get any >.>
[22:21] <drab> Pinkamena_D: I was asking, did you log in with ur user and then su to root?
[22:21] <Pinkamena_D> sorry about it
[22:22] <drab> could you login directly with root and try again?
[22:22] <Pinkamena_D> thats correct, root login is disabled
[22:22] <drab> if your user's home is in /home then you can see the problem
[22:23] <Pinkamena_D> there is no way to remove all handles so thats not an issue?
[22:23] <drab> there's probably a ref to it somewhere that's giving you the error even tho your user is technically not opening any file
[22:23] <drab> altho that's just my guess, I don't think I've personally run into that before
[22:25] <drab> Pinkamena_D: what you could do is to change your homedir temporarily
[22:25] <drab> so say mkdir /var/tmp/tempuser
[22:25] <drab> change your homedir to that, logout, log back in, try again
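For the "Device or resource busy" hunt above, a few checks beyond lsof | grep (a sketch; fuser comes from the psmisc package, and mv of a directory that is itself a mount point always fails with EBUSY, which would explain the error with no open files):

```shell
fuser -vm /home 2>/dev/null || true   # processes with files, cwd, or mmaps under /home
lsof +D /home 2>/dev/null | head      # per-file open handles (slow on large trees)
findmnt -T /home                      # shows which mount /home belongs to
```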