[07:45] <lotuspsychje> ask here and idle a bit Koopz volunteers might wake at US timezone here
[07:46] <Koopz> hmm... my workday might be over by then ;D
[07:46] <lotuspsychje> ask anyway, you never know
[07:48] <mIk3_08> Hi guys.... what is the status of Linux as a Server Operating System?
[07:50] <Koopz> i've got 2 users on my server which i want to move over to a new server. Do i just need to edit /etc/passwd, /etc/group and /etc/shadow, add their lines, restart and i'm done? (already copied their home directories over)
[07:50] <Koopz> or do i even need to reboot?
[07:51] <ducasse> mIk3_08: do you have an ubuntu question?
[07:52] <mIk3_08> none so far ducasse
[08:23] <Koopz> okay i got a package related question here, i just tried installing "cacti" which somehow ended up in apt also installing apache2 which i didn't want and shouldn't really have happened either
[08:24] <Koopz> this is actually my first time checking dependencies for packages so i may be misinterpreting this but does cacti actually depend on libapache2-mod-php?
[08:26] <blackflow> Koopz: it shouldn't. I think apache is just a default dep for "I need PHP!". If you installed php(-fpm) first, it should not call in apache
[08:27] <blackflow> Koopz: Uh, sorry, I mean web server, like nginx. if you have it installed first, it wouldn't call in apache for php deps. but you do need php-fpm
[08:27] <Koopz> i got php-fpm and nginx installed, that's why i'm asking
[08:28] <blackflow> oh and it still pulls in apache?
[08:28] <Koopz> https://gist.github.com/Koopzington/9f515c4146eacbe22914a7842c893923
[08:28] <Koopz> here's the map i got
[08:29] <Koopz> if i understand this thing correctly it wouldn't need libapache2-mod-php if i had javascript-common installed? ಠ_ಠ
[08:31] <blackflow> it shouldn't pull in apache if you have nginx installed. could be a bug in packaging. wouldn't be a first.
[08:33] <Koopz> there does seem to be some kind of hard linking between apache2 and cacti though. After i installed cacti i tried removing all apache2 packages afterwards and when i removed apache2-bin, cacti got uninstalled too
[08:35] <blackflow> the dep list says libapache2-mod-php OR php (virtual). so something else in that whole deplist must be pulling it in
[08:36] <Koopz> oh? does the pipe indicate the first option of the "OR"?
[08:36] <blackflow> Koopz: ah, wait....  it's a recommended package
[08:37] <blackflow> apt install --no-install-recommends cacti
[08:37] <blackflow> it's a bug if you ask me, it should be removed from cacti's recommended list as it's already a dependency pulled in via php (as alternative) and php is a must
[08:37] <jamespage> sahid: I'm about to start on the networking- and neutron- packages for release
[08:38] <blackflow> Koopz: also it's a bug that apache and nginx can coexist, as both would be attempted to start by default, and fail as both can't be listening on port 80
[08:38] <Koopz> http://koopz.rocks/s/2019-04-12_10-38-40.png
[08:39] <sahid> jamespage: ack, i'm building and testing nova
[08:39] <jamespage> sahid: fwiw if it really is just a version delta I'm not build testing prior to upload
[08:40] <blackflow> Koopz: yeah I was wrong. something else is pulling it in
[08:42] <Koopz> ah
[08:43] <sahid> jamespage: ah ok ok
[08:51] <Koopz> i figured out why apache2 was installed
[08:52] <Koopz> "php" wasn't marked as installed since i directly installed php-fpm
[08:54] <blackflow> Koopz: ah, so it must be explicitly installed, I see
[08:56] <Koopz> yeah i just avoided doing that since the last time i installed "php" apache2 was installed too
[08:56] <Koopz> it's safe to "install" it after installing fpm though
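The `|` Koopz asks about separates alternatives in a Debian `Depends:` field, and apt goes for the first alternative when none of them is already installed. A minimal sketch of that rule, assuming a dependency string shaped like the one discussed here (the sample string and the `first_alternatives` helper are illustrative, not taken from cacti's actual control file):

```shell
# Hypothetical helper: print the first (preferred) alternative of each
# comma-separated group in a Debian-style Depends string.
first_alternatives() {
    printf '%s\n' "$1" |
        tr ',' '\n' |
        sed -e 's/|.*//' -e 's/([^)]*)//g' \
            -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Illustrative input, not cacti's real dependency list.
first_alternatives "libapache2-mod-php | php, rrdtool (>= 1.4)"
```

To preview what apt would pull in without installing anything, `apt-get install -s cacti` runs a simulation.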
[09:04] <jamespage> sahid: ok just did neutron-vpnaas - how are you getting on?
[09:10] <jamespage> sahid: I'm going to restart from the bottom of the list and work upwards :-)
[09:10] <jamespage> well after a coffee
[09:37] <jamespage> sahid: up to placement
[09:41] <jamespage> sahid: ok just did openstack-trove
[09:42] <jamespage> sahid: that just leaves nova* and octavia*
[09:42] <jamespage> sahid: tinwood is cutting the nova-lxd release
[09:49] <sahid> jamespage: ok i had to be afk, i just pushed nova https://git.launchpad.net/~sahid-ferdjaoui/ubuntu/+source/nova/
[09:49] <sahid> and i'm taking care of octavia right now
[09:49] <sahid> (btw i still have tested nova)
[09:52] <sahid> jamespage: https://git.launchpad.net/~sahid-ferdjaoui/ubuntu/+source/octavia
[09:55] <jamespage> sahid: master branch of the nova repo has rc2?
[09:57] <sahid> jamespage: i looked at http://uca-tracker/stein_upstream_versions.html
[09:57] <jamespage> sahid: well release is out :-)
[09:58] <jamespage> I don't know how often that report updates
[09:58] <jamespage> sahid: gbp import-orig --uscan should pick the latest version
[09:58] <jamespage> in the stein series
[10:00] <sahid> jamespage: not sure i understand, did i make a mistake?
[10:01] <jamespage> sahid: how did you download the new tarballs?
[10:01] <sahid> jamespage: uscan --verbose --download-version "$version" --rename --timeout 60
[10:01] <sahid> gbp import-orig --no-interactive --merge-mode=replace ../${project}_${version}.orig.tar.gz
[10:01] <jamespage> sahid: "gbp import-orig --uscan" will do much the same in one command
[10:02] <jamespage> but will always pick the most recent version from tarballs.openstack.org (in this case)
[10:02] <jamespage> sahid: the debian/watch file is typically pinned to a major version series so it's safe
[10:02] <jamespage> i.e. you won't jump to train :-)
[10:04] <sahid> oh yes...
[10:04] <sahid> ok let me retry that
[10:04] <sahid> jamespage: ^
[10:04] <jamespage> sahid: +1
[10:16] <sahid> jamespage: https://code.launchpad.net/~sahid-ferdjaoui/ubuntu/+source/nova/+git/nova
[10:16] <sahid> sounds better?
[10:19] <jamespage> sahid: yep - processing now!
[10:19] <jamespage> sahid: do you want to do the same for octavia (and octavia-dashboard)
[10:20] <sahid> yes sure i'm working on octavia right now, i will do octavia-dashboard then
[10:25] <tinwood> jamespage, sahid, 19.0.0 nova-lxd is now tagged and pushed to gerrit
[10:25] <tinwood> sorry for delay; had to check a few things first
[10:25] <jamespage> tinwood: thanks!
[10:32] <jamespage> sahid: ok nova uploaded
[10:32] <sahid> jamespage: ack
[10:32] <jamespage> sahid: you do octavia-* I'll deal with nova-lxd
[10:32] <sahid> i'm working on octavia-dashboard but i have a issue with sphinx
[10:32] <sahid> when i execute gbp buildpackage -S -sa
[10:33] <jamespage> sahid: try with -d
[10:33] <sahid> :)
[10:34] <sahid> ok, all good: https://code.launchpad.net/~sahid-ferdjaoui/ubuntu/+source/octavia/+git/octavia https://code.launchpad.net/~sahid-ferdjaoui/ubuntu/+source/octavia-dashboard/+git/octavia-dashboard
[10:34] <jamespage> sahid: great
[10:43] <jamespage> sahid: ok both uploaded along with nova
[10:44] <jamespage> sahid: I'm just finishing off manila-ui and then I think we're all done
[10:44] <jamespage> sahid: most are wedged in the disco upload queue pending review by a member of the release team
[10:44] <jamespage> sahid: as we're in final freeze any seeded packages get reviewed
[10:44] <jamespage> sahid: but we have an exception so should all be ok
[10:50] <jamespage> sahid: release team just accepted all uploads I think so we're good
[10:50] <jamespage> time to build,backport and recheck
[10:50] <jamespage> (all automated :-))
[10:52] <DK2> i need to downgrade to 7.1.27-1+ubuntu16.04.1 from 7.1.28-1+ubuntu16.04.1, is there any possibility? in the repos i can only find 7.1.28, there's no older package anymore
[10:52] <DK2> PHP-Version
[10:54] <jamespage> sahid: if you need to check the upload queue - https://launchpad.net/ubuntu/disco/+queue?queue_state=1&queue_text=
[10:54] <jamespage> that's for disco - you can url hack for other releases :-)
[10:55] <sahid> jamespage: ack thanks :)
[10:55] <jamespage> sahid: so we have four in queue still - no need to chase yet :-)
[11:29] <tomreyn> DK2: why do you need to downgrade php to an earlier version, and where are these versions from anyways (not ubuntu)?
[11:30] <blackflow> !info php xenial
[11:31] <blackflow> !info php bionic
[11:32] <tomreyn> ppa:ondrej/php for xenial has 7.1.28-1+ubuntu16.04.1+deb.sury.org+3
[11:34] <tomreyn> https://www.php.net/ChangeLog-7.php#7.1.28 fixes two security vulnerabilities. you don't want to downgrade to a non-patched version.
[13:50] <foo> Can someone confirm: 0 */3 * * * /home/dev/sky/db-backups/autopgsqlbackup.sh - this runs at 0, 3, 6, 9, 12, 15, 18, 21 - right?
[13:53] <rypervenche> foo: Correct. 00:00, 03:00, 06:00, etc.
[13:54] <foo> rypervenche: thank you
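The `*/3` in the hour field starts at 0 and advances by the step while staying within 0-23. A small sketch that expands such a step (the `cron_hours` helper is hypothetical, written only to show the arithmetic):

```shell
# Hypothetical helper: list the hours matched by an "*/STEP" hour field.
cron_hours() {
    h=0
    out=""
    while [ "$h" -le 23 ]; do
        out="$out${out:+ }$h"   # append with a space once out is non-empty
        h=$((h + $1))
    done
    printf '%s\n' "$out"
}

cron_hours 3
```

With a step of 3 this prints `0 3 6 9 12 15 18 21`, matching the schedule confirmed above.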
[14:01] <yossarianuk> hi - I am trying to setup a KVM host using ubuntu 18.04 - I want to set up a bonded bridge with VLAN with netplan
[14:01] <yossarianuk> are there any examples any where ?
[14:02] <yossarianuk> I can't find one that has bond, bridge, and VLAN
[14:03] <yossarianuk> I have tried to attempt it - however the vlan isn't working - it may be due to needing config on the switch - I just wanted to make sure my config was correct
[14:03] <yossarianuk> You can see it here -> https://pastebin.com/uYx3u1NA
[14:04] <cyphermox> it's a little hard to read because it's set up for tests, but there's https://github.com/CanonicalLtd/netplan/blob/master/tests/integration/scenarios.py#L75
[14:04] <yossarianuk> Could some one look at the config (url above) and let me know if it looks like sane config ?
[14:05] <yossarianuk> I wasn't sure if I put vlan in the right place..
[14:05] <cyphermox> yes that looks fine
[14:06] <yossarianuk> cyphermox
[14:06] <foo> I want to be extra sure... https://paste.ofcode.org/4aCLsTBGcQSi9M5Z44wXn7 - OOM is killing stuff left and right, this is becoming a significant issue and I'm having trouble tackling it. Does anyone see anything telling in that paste?
[14:07] <foo> I think OOM can still kill off process X even though it's caused by Y, right?
[14:07] <yossarianuk> cyphermox: thanks - and thanks for the example... I notice that in the example you posted the vlan was added to the bridge interface - do I need to do that ?
[14:09] <cyphermox> yossarianuk: no; you set it up the way you like. we were just trying to mix and match things complicated enough to make it a good test
[14:09] <cyphermox> foo: it can kill any random thing asking for memory at the time; it doesn't have to be the process hogging things
[14:10] <sdeziel> foo: there is a selection process on what to kill during OOM
[14:10] <foo> cyphermox: thank you, thought that was the process.
[14:10] <foo> What's the best way to see what process is consuming the most amount of memory over time?
[14:12] <Ussat> vmstat
[14:12] <Ussat> man vmstat
[14:12] <Ussat> htop is good also
[14:13] <foo> Ussat: I was using atop but didn't see anything meaningful in there
[14:13] <foo> vmstat 1 shows me IF the system is swapping. I want to know the exact process sucking up most memory (if that's possible). I don't seem to see that specifically in the man page... but perhaps I missed it
[14:14] <foo> What's also strange... I haven't ever seen the system swap. Despite OOM killing stuff. Should I see some swappage?
[14:14] <foo> It's almost as if the system isn't set up to use swap
[14:15] <foo> (although it does exist I believe)
[14:15] <Ussat> look at htop
[14:15] <Ussat> I dont know any one command that will show that
[14:15] <foo> oh, actually. There is no swap. heh
[14:15] <foo> Swap:             0           0           0
[14:15] <foo> (from free -m)
[14:15] <foo> ... it's probably suggested to have 2GB swap on a system or such, right?
[14:15] <foo> ... to at least not have OOM kill off stuff
[14:15] <foo> I mean, that doesn't solve my core problem but I'll probably want to do that
[14:15] <foo> Ussat: thank you
[14:16] <foo> Do you suggest htop over atop? atop may be older
[14:16] <yossarianuk> you could try this
[14:16] <yossarianuk> ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
[14:16] <foo> Mem[||||||||||||||||||||||||||||||||||||||||||||||1.66G/1.95G]   Load average: 0.48 0.43 0.44
[14:16] <yossarianuk> that shows the memory of each process and sorts them
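A variant of the one-liner above, as a sketch: it sorts on `rss` (resident memory, in KiB) instead of `size`, on the assumption that resident memory is the figure of interest since that is what the OOM killer reports. The `fmt_rss` and `top_rss` names are made up here, and `--sort` assumes the procps `ps` that Ubuntu ships.

```shell
# Hypothetical helper: turn "RSS_KiB PID USER COMMAND..." lines on stdin
# into "MiB PID USER COMMAND..." lines.
fmt_rss() {
    awk '{ mib = $1 / 1024; $1 = ""; printf "%10.2f MiB%s\n", mib, $0 }'
}

# Hypothetical helper: show the N largest processes by resident memory
# (default 10). The "=" after each column name suppresses the header row.
top_rss() {
    ps -eo rss=,pid=,user=,args= --sort=-rss | head -n "${1:-10}" | fmt_rss
}

# Usage on a live system:
#   top_rss 5
```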
[14:17] <foo> yossarianuk: uh, thank you. |head of that... gives me some definite insight. This is helpful
[14:17] <yossarianuk> np
[14:17] <foo> yossarianuk: saving this nugget for future
[14:18] <Ussat> No kidding, saved here also
[14:18] <foo> yossarianuk++
[14:19] <foo> I'm thinking a polling script changed and is threading and sucking up resources.
[14:19] <foo> I'm tempted to run yossarianuk's command every minute with timestamp and log to file... |head
[14:19]  * foo does
[14:20] <foo> while [ 1 ] ; do date; ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' | head; echo -------------; sleep 60; done
[14:20] <foo> Not the most pleasant, but output that to a file... it'll do the trick
[14:22] <Ussat> quick, dirty but effective, and thats what counts
[14:22] <Ussat> yossarianuk, I know you did not just come up with that, thats impressive
[14:22] <Ussat> nice one
[14:23] <foo> haha
[14:24] <foo> Hmm, I wonder what this is: 111.37 Mb /usr/bin/lxcfs /var/lib/lxcfs/
[14:25] <foo> ... now if only I could force this issue and see it happen in real time. For now, I wait, and trust the logs.
[14:25] <foo> Also, can we agree that swap is generally a good idea? I'm a bit rusty in my admin but IIRC that's something I want.
[14:26] <Ussat> https://linuxcontainers.org/lxcfs/introduction/
[14:27] <cyphermox> foo, swap won't save you if something is leaking memory or getting to consume all that is available anyway
[14:27] <foo> Ussat: oh. thank you.
[14:27] <foo> cyphermox: aka. swap might just be eaten up too, correct?
[14:27] <Ussat> yup
[14:27] <cyphermox> yes
[14:28] <foo> True, but can we agree... having it available (after I resolve this issue) is generally a good idea?
[14:28] <cyphermox> swap is just "extra memory" on disk, that can be used to free up some RAM when context switching; but it's not a cure-all
[14:28] <foo> cyphermox: agreed
[14:28] <cyphermox> it generally will help
[14:28] <sdeziel> but it provides a nice space to push pages that are not currently in use
[14:29] <foo> agreed. ok, I'll look into that post-resolving this issue.
[14:29]  * foo waits on the sidelines with a fly swapper 
[14:29] <foo> swatter*
[14:29] <foo> Come on you memory hog, show yourself
[14:29] <cyphermox> foo: you could just create a swapfile
[14:30] <foo> cyphermox: I could, but I do want this issue to show itself... in the odd chance it doesn't consume all swap. I have htop going and while [ 1 ] ; do date; ps -eo size,pid,user,command --sort -size | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' | head; echo -------------; sleep 60; done >> /home/foo/mem-issue.txt
[14:32] <foo> actually, per htop, I can see my system currently at 1.72/2GB RAM consumption. It's "idling" there... meaning, just a little more requirement could cause a problem
[14:32] <foo> Do I understand that correctly? I know sometimes the system uses available ram for when it needs it and thus that's not an actual current utilization IIRC
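On the question of how much headroom is really left: the kernel publishes its own estimate of memory that can still be handed to new work without swapping, as the `MemAvailable` line in `/proc/meminfo` (the same figure `free` shows in its `available` column). A minimal sketch that reads it; the `mem_available_mib` name is made up for illustration:

```shell
# Hypothetical helper: read meminfo-formatted text on stdin and print
# the MemAvailable value converted from kB to whole MiB.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }'
}

# Usage on a live system:
#   mem_available_mib < /proc/meminfo
```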
[14:41] <foo> Great, it just happened - OOM murdered a process. Time to see what the culprit is.
[14:53]  * foo enables per-second logging for more accuracy, every minute not enough if something spikes within minute and gets killed off
[14:55] <JamesBenson> gbkersey: I've temporarily paused it.  I needed to get this Openstack deployment out.  I was hoping to use it for that, but I guess next round.  But I might ping you/community about it.  I bought these cards for all of our servers, r610,r710,r910.  So need to get them working!
[14:55] <foo> Anyone see anything strange here as it relates to memory usage? https://paste.ofcode.org/FDKFEpQt2e2ErWVXNb5Qrw
[14:55] <sdeziel> foo: you can look at some diagnostic that OOM-killer sends to dmesg, maybe that will tell you more about the culprit
[14:55] <foo> sdeziel: hmm, I was but didn't see anything obvious, will take another look - thank you
[14:56] <gbkersey> JamesBenson: hopefully you got the cards cheap... :)  I think I paid ~ $30 for mine...  Expensive thing was the 10G switch module for our 5406zl
[14:57] <JamesBenson> gbkersey: :-/ I think it was around $65 a pop... for 15 cards.
[14:58] <gbkersey> not that bad....  I bought a stack of fully populated R610/R710 for $100/box couldn't pass up the deal....
[14:59] <JamesBenson> We've been buying from servermonkey servers and parts.  RAM from memoryamerica (lifetime warranty)
[14:59] <gbkersey> I found that the Dell twinax cables that came with the boxes would not work with the HP switch - because the nvram in the SFP did not say it was an HP
[14:59] <gbkersey> ended up buying a bunch of clone HP cables off of ebay and those worked just find.
[14:59] <gbkersey> s/find/fine/
[14:59] <JamesBenson> We have the dell cables and 10g switch atm.  But we will need more cables, I found some on Amazon that should work
[15:00] <foo> In the event someone knows how to read OOM data better than me and can provide some pointers, here's the OOM / kernel info: https://paste.ofcode.org/ytez6sPUZXdQbUQGyY69WS - I wonder if I want to look for oom_score in output?
[15:00] <foo> sdeziel: ^
[15:00] <foo> Thank you!
[15:00]  * foo skims
[15:00] <JamesBenson> https://www.amazon.com/Cable-Matters-10GBASE-CU-Compatible-Supermicro/dp/B01DJL4LRE/ref=sr_1_4?keywords=SFP%2B%2Bcable&qid=1552935297&s=pc&sr=1-4&th=1
[15:00] <gbkersey> I bought the clones on ebay....  they were cheap.
[15:00] <JamesBenson> yeah, we can't buy from ebay... university
[15:00] <JamesBenson> too much of a pita.
[15:00] <foo> I mean, I guess it's possible this box just needs more memory...
[15:01] <foo> I could upgrade it to 4GB RAM. Actually, probably makes sense to add 2GB swap before doing that
[15:02] <sdeziel> foo: the way I read it, postgresql asked for more memory but none was available so OOM-killer started to look where to force reclaim some, the gunicorn process (27094) was selected as the best candidate to kill to free some RAM
[15:04] <blackflow> you mean the kernel dice rolled just at the PID to the chagrin of gunicorn :)
[15:04] <foo> sdeziel: thank you! Now, to help understand what you're seeing... you're basically looking at Apr 12 07:39:42 server kernel: [9534277.048613] postgres invoked oom-killer and then Apr 12 07:39:42 server kernel: [9534277.049074] Out of memory: Kill process 27094 (gunicorn) score 84 or sacrifice child - right?
[15:04] <sdeziel> foo: there are many invocations of the oom-killer in that paste, I only checked the first
[15:04] <sdeziel> foo: yes
[15:05] <foo> sdeziel: thank you. yeah, it looks like even sshd invoked oom killer.
[15:05] <gbkersey> JamesBenson: just be careful that the eeprom in the twinax matches your switch vendor especially if the switch is HP - the cards complain about the cable not being certified but they still work just fine.
[15:05] <sdeziel> foo: this first kill seems to have freed ~170mb of RAM
[15:05] <foo> sdeziel: ... which would lead me to believe just because postgres invoked oom-killer, it's not necessarily the main culprit... it simply couldn't find more memory available
[15:06] <sdeziel> foo: the process that wakes oom-killer isn't necessarily the culprit, it just happens to be one process needing some more memory, but the memory pressure is the result of every process taking some memory away from the kernel...
[15:06] <foo> sdeziel: I assume Apr 12 07:39:42 server kernel: [9534277.051857] Killed process 27094 (gunicorn) total-vm:391752kB, anon-rss:169656kB, file-rss:1164kB, shmem-rss:0kB - and specifically: anon-rss:169656kB is what you're seeing there. Thank you, this is helpful for me to do this myself next time.
[15:06] <sdeziel> foo: correct
[15:07] <sdeziel> foo: IIRC, the meaningful numbers/metrics are "*-rss"
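The kill lines can be pulled out of the kernel log mechanically. A minimal sketch, assuming the message format shown in foo's paste (it differs between kernel versions); `oom_kills` is a hypothetical helper name:

```shell
# Hypothetical helper: read kernel log text on stdin and print
# "PID NAME ANON_RSS_KB" for every "Killed process" line.
oom_kills() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p'
}

# Usage on a live system (either source of kernel messages works):
#   dmesg | oom_kills
#   journalctl -k | oom_kills
```

Fed the line quoted above, it prints `27094 gunicorn 169656`, i.e. roughly the 170 MB sdeziel mentions.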
[15:08] <blackflow> eh "culprit" ... how do you define one. postgres wanted more RAM, kernel killed gunicorn in response. postgres totally is the culprit for that oom.    the only way to properly control that is to resource-limit individual processes, but that's usually less than optimal usage of RAM
[15:08] <foo> sdeziel: right right. The next question is: A) does this server simply need more memory? or B) are some of the python processes ( per https://paste.ofcode.org/FDKFEpQt2e2ErWVXNb5Qrw ) simply taking "too much" memory. Yup https://stackoverflow.com/questions/18845857/what-does-anon-rss-and-total-vm-mean
[15:09] <blackflow> foo: python is notorious for not returning the RAM it's no longer using, back to the OS
[15:09] <sdeziel> foo: it depends. your paste doesn't show the PID so it's hard to know. I'd check if a given gunicorn process sees its memory bubbling over time
[15:09] <blackflow> we have some uwsgi apps that, for some requests, need to spike up RAM usage several times more than average. so we configure uwsgi to kill a running process when rss is larger than a set threshold
[15:10] <foo> blackflow: THANK you. I do have control over python code and can see about it... it's possible there is a python library causing an issue here
[15:10] <blackflow> (which happens after the request is done, this "killing" is a graceful shutdown-and-restart of the process)
[15:10] <sdeziel> uwsgi is also what I've used and I liked its flexibility
[15:10] <foo> What's interesting is I see this: [URGENT] set vm.overcommit_memory=2 in /etc/sysctl.conf and run sysctl -p to reload it. This will disable memory overcommitment and avoid postgresql killed by OOM killer. - from "/postgresqltuner.pl" - which leads me to believe... I might be able to set something to prevent oom getting invoked by python. Do ya'll generally suggest this?
[15:10] <blackflow> foo: no idea, you should analyze individual processes RAM usage and make decisions based on that
[15:11] <sdeziel> foo: that's a global flag so it would be less risky to do on a dedicated DB server which isn't the case of your box
[15:11] <blackflow> foo: no. overcommit is okay if used wisely. what you need is to resource-limit individual processes, so that OOM can't kill random processes
[15:12] <foo> One thing that's somewhat telling, per https://paste.ofcode.org/FDKFEpQt2e2ErWVXNb5Qrw - line 11... 45.53 Mb /home/dev/website.com/venv/bin/python3 /home/dev/website.com/venv/bin/gunicorn - that's a django-based app. The other gunicorn stuff is for another app... and that's all at ~150MB. Sure, it's a bigger app, but if I had more insight into which python libs were sucking up memory there... hmmm...
[15:12] <blackflow> but then only to find out what is frequently needing more than allocated, then act accordingly (eg, by adding more RAM, or by optimizing that process' RAM usage)
[15:12] <sdeziel> foo: you can probably do something more fine grain with systemd tuning how much RAM is given to gunicorn
[15:12] <foo> that might be helpful.
[15:12] <blackflow> which is the resource limiting that I'm talking about
[15:12] <sdeziel> yup
[15:12] <foo> sdeziel / blackflow  - thank you, I value some understanding here, appreciate your explanations.
[15:12] <blackflow> https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html
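As a sketch of what such a limit could look like: a drop-in using `MemoryMax=` from the page linked above. The 300M value and the `gunicorn.service` unit name are placeholders, and on systems still using the legacy cgroup v1 hierarchy the older `MemoryLimit=` directive may be the one that takes effect, so check the man page against the installed systemd version.

```shell
# Hypothetical helper: print a systemd drop-in that caps a service's memory.
# The limit (e.g. 300M) is an illustrative value, not a recommendation.
memory_dropin() {
    printf '[Service]\nMemoryMax=%s\n' "$1"
}

# Usage, assuming a unit called gunicorn.service (placeholder name):
#   sudo mkdir -p /etc/systemd/system/gunicorn.service.d
#   memory_dropin 300M | sudo tee /etc/systemd/system/gunicorn.service.d/memory.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart gunicorn.service
```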
[15:13] <gbkersey> JamesBenson: this is what I see on the server side with the twinax I'm using: Warning: Unqualified SFP+ module detected, Port 0 from OEM but the next line says - NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[15:13] <foo> Sounds like my options are A) resource control gunicorn or B) see if I can lessen RAM usage in gunicorn (so 3 processes aren't taking up 150MB) ... what I'm not sure of is how A) would affect the actual gunicorn process (eg. if it can't get all its memory, would it force gunicorn to complain? would it "slow down" performance for that process?)
[15:13] <blackflow> foo: just remember that setting these flat limits makes your RAM usage suboptimal. allowing one process to temporarily peak is not bad, as long as you have the peaks under control
[15:14] <blackflow> when they all start to peak at the same time, that's when you need more RAM .... or somehow bring down those peaks.
[15:14] <foo> aka. bring down those peaks = option A or B)
[15:14] <blackflow> foo: limiting a process will result in an error for that process only, when it requests more RAM and there isn't any
[15:14] <sdeziel> foo: or load balance the incoming requests between more servers
[15:15] <blackflow> foo: python is verbose about that: https://docs.python.org/3.6/library/exceptions.html#MemoryError
[15:15] <foo> blackflow: great, that's what I was not aware of - thank you for explaining
[15:15] <foo> sdeziel: this is such a low profile and low traffic app... this all randomly started which is what I'm wondering about
[15:17] <foo> blackflow: I actually have been seeing MemoryErrors, too. There are several things happening at once. Postgres generally is what complains about MemoryError. I've been trying to figure out root cause of this for about 3 months now. It randomly happened once in Feb, once in March, then about a dozen times this month. Not much has changed that I'm aware of. If anything, we switched from ubuntu 14.04 to
[15:17] <foo> 18.04 in Dec 2018
[15:18] <blackflow> foo: how many workers have you configured for the gunicorn app?
[15:21] <foo> blackflow: there's a few different processes. eg. gunicorn runs django for main site, then gunicorn runs for our own app (that has 3 different gunicorn instances). Here's one of the instances sky-admin which is taking up the most RAM: https://paste.ofcode.org/YsYRbcRnnsbXdrj6rZpd7f - 1 worker
[15:23] <blackflow> so there's no dynamic number of processes? something that, say, scales up with number of requests coming in?
[15:24] <blackflow> bottom line you're definitely out of RAM. Since that's a DO droplet, perhaps it'd be wise to upgrade it, and then run a thorough analysis of how much RAM each process peaks at without an error, and then decide how/what to limit and whether you'll want to downgrade the droplet again
[15:26] <foo> blackflow: thank you! that's what I'm leaning towards... and even before upgrading droplet, I think enabling 2GB SWAP probably makes sense (right now none is enabled). Agreed? This would allow me to troubleshoot this, do testing, without things getting killed in production by OOM
[15:27] <blackflow> partially, yes
[15:27] <foo> Actually, we probably had 2GB enabled swap on the old system before the 14.04 > 18.04 upgrade...
[15:27] <foo> That might have been why I never saw this
[15:27] <sdeziel> foo: you might want to look at zram/zswap
[15:27] <Ussat> I never run without a swap.
[15:27]  * foo checks https://linuxize.com/post/how-to-add-swap-space-on-ubuntu-18-04/
[15:27] <Ussat> on any of my systems
[15:27] <foo> Ussat: yeah... I think this was an oversight on my part
[15:27] <Ussat> NP, happens and easy to fix
[15:27] <foo> sdeziel: haven't heard of that, different than normal swapping I assume? Hmm, thanks
[15:28] <blackflow> I wouldn't recommend zram or zswap. that's like applying bandaid to a gaping wound.
[15:28] <Ussat> ^^
[15:28] <sdeziel> foo: I just learned that it's presumably being used by default in ChromeOS
[15:28] <foo> sdeziel: oh, interesting - thanks
[15:28] <blackflow> especially zswap is not swap at all, but memory compression of unused pages ---- that still occupy memory.
[15:28] <foo> blackflow / Ussat - appreciate your vote, thank you
[15:28] <sdeziel> but yeah, my first recommendation would be to use a plain swap(file) first
[15:29] <foo> sdeziel: appreciate it!
[15:29] <foo> Is there a generally "best practice" swap size to use? I generally use 2GB
[15:30] <Ussat> foo, thats a HOTLY debated topic. Generally all my systems have between 2-4
[15:30] <sdeziel> foo: there are various guidelines. hibernation requires >= RAM IIRC
[15:30] <Ussat> but it depends on system use, memory etc
[15:30] <Ussat> is this a physical or vm ?
[15:30] <sdeziel> foo: 2G sounds OK to me. If you end up swapping that much you'll definitely notice the performance hit
[15:31] <Ussat> and ya that
[15:31] <foo> Ussat: VM, digital ocean droplet
[15:31] <foo> Currently at 2GB
[15:31] <Ussat> some apps (looking at you oracle) require a big swap
[15:31] <foo> I can add 2GB swap... if I go over that, probably makes sense to increase droplet swap
[15:31] <Ussat> 2G swap should be fine
[15:31] <Ussat> and ya if you swap all 2G you will notice it
[15:32] <blackflow> something like Munin to monitor and graph over time RAM, swap usage, and other things, is very recommended too
[15:32] <Ussat> most of my work VM's have 2-4
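For reference, a sketch of the usual swap file steps, written as a function that only prints the commands so they can be reviewed first. It assumes a filesystem where `fallocate` produces a usable swap file (ext4 does); the `swapfile_plan` name and the `/swapfile` path are illustrative.

```shell
# Hypothetical helper: print (do not run) the commands that create and
# enable a swap file. Arguments: size (default 2G), path (default /swapfile).
swapfile_plan() {
    size=${1:-2G}
    file=${2:-/swapfile}
    cat <<EOF
fallocate -l $size $file
chmod 600 $file
mkswap $file
swapon $file
EOF
}

# Usage: review the output, then run the printed commands as root.
swapfile_plan 2G /swapfile
```

To keep the swap file across reboots, the conventional `/etc/fstab` entry is `/swapfile none swap sw 0 0`.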
[15:33]  * foo learns about Swappiness Value
[15:33] <foo> Sounds like next step here... given the OOM and reason for doing this... is to set up system monitoring and watching how often swap is used with some pretty graph or such, agree?
[15:33] <foo> oh, heh, I just read backlog ... blackflow is a step ahead of me (thank you)
[15:34] <Ussat> if it runs a java app you might also look ap heap size
[15:34] <Ussat> loot at
[15:34] <foo> Ussat: negative
[15:34] <Ussat> ok
[15:34] <Ussat> good
[15:34]  * Ussat is NOT a fan of java apps
[15:35] <Ussat> They are the bane of my existence
[15:36] <foo> Ussat, blackflow, sdeziel, cyphermox, yossarianuk - I suspect I'm good for a bit here. I am very grateful for your time/contribution to this, thank you! This may all come down to not having swap enabled post a deployment I did in December. Still curious what is causing this to happen so much this month, but at least now I can troubleshoot/investigate without being stressed on a tight timeframe and I
[15:36] <foo> know swap can cover for a bit. Thank ya'll.
[15:37]  * foo waves magic wand and grants you all access to use his nick in config and code / etc
[15:37] <blackflow> foo: re monitoring, yes. I'm a big fan of Munin we use on all our servers. But there's Zabbix and others too.
[15:38] <blackflow> foo: having swap enabled even when you don't have OOMs is wise. nowadays various bloatware products will have unused pages that can be swapped out and RAM left for apps to use. esp. Python
[15:38] <foo> blackflow: I used to use nagios back in the day. I heard good things about Zabbix and Munun... going to look into getting this set up now. My next question is... do I spin up another Digital Ocean Droplet to monitor... wait actually, I guess I could run zabbix/munun on the system itself
[15:38] <foo> s/munin/
[15:38] <Ussat> we still use nagios :)
[15:38] <Ussat> I just finished building a nagios server for core-team here
[15:39] <blackflow> munin is very light, it's a cron based master process with (mostly) perl based sensors, that creates static HTML+png pages and graphs, which yes can run locally, no need for a separate DO
[15:39] <foo> Ussat: rad!
[15:39] <Ussat> blackboxsw, I am trying to convince them to change to munin
[15:39] <Ussat> but "we have always...."
[15:40] <sdeziel> munin and nagios have different use cases unless munin now offers more than it did years ago
[15:40] <Ussat> it does
[15:40] <blackflow> Ussat: munin is very nice. we use it even to send us alerts though that's a bit suboptimal because it'll keep mailing every 5 minutes until the alert value is below the threshold.
[15:40] <sdeziel> however crappy NRPE is, it's pretty handy
[15:40] <blackflow> and custom plugins are dead easy to write, you can have them in anything. shell, perl, python, C, java, whatever.
[15:41] <Ussat> We are a mostly IBM shop so we have Tivoli monitoring for most things
[15:41] <sdeziel> NRPE checks are also trivial to write
[15:41] <blackflow> sdeziel: munin is primarily to graph things, but we use it for alerts too as it can do alerts on value thresholds.
[15:42] <sdeziel> blackflow: sounds like what netdata does which I'm more familiar with
[15:42] <Ussat> Most of my AIX stuff is monitored with Tivoli, and we are getting more and more linux into Tivoli monitoring
[15:42] <blackflow> sdeziel: I'm not familiar with netdata, sorry
[15:43] <sdeziel> blackflow: worth checking IMHO: https://my-netdata.io/#demosites
[15:44] <blackflow> sdeziel: huh real time streaming of data... interesting. sometimes I need that, and munin is limited to cron based invocations
[15:45] <sdeziel> blackflow: the way multiple sites are aggregated is pretty nice as well
[15:45] <sdeziel> it's decentralized by default and your browser is the one building the aggregated view
[15:47] <blackflow> I see.
[15:47] <foo> " < blackflow> and custom plugins are dead easy to write," - nice, I liked this about nagios... I wrote a few back in my day (~10 years ago)
[15:48] <Ussat> foo something you may look at for that situation is nmon for linux
[15:48] <Ussat> http://nmon.sourceforge.net/pmwiki.php
[15:48] <Ussat> full disclosure, I know the author
[15:48] <foo> I had someone recently suggest librenms.org - for discovery + monitoring. *shrug* Wasn't my call, but curious to see how it performs
[15:50] <foo> haha, just pulled up home pages of zabbix, munin, and grafana ... munin is the least pretty to look at. Which probably means it was built by techs who have solid tech and don't care about eye candy corporate/enterprise-y stuff... I could be mistaken, but fun thought
[15:50] <Ussat> I love that program
[15:52] <foo> Ussat: huh, thanks, nmon looks cool. /me saves
[15:52] <Ussat> its VERY extensive
[15:53] <Ussat> It was originally written for AIX and has been continuously improved...Nigel ported it to Linux recently but it's a GREAT tool
[15:53] <Ussat> I install it by default on all my builds
[15:55] <Ussat> Nigel is a performance specialist for IBM
[16:01] <neildugan> I have a boot on a zfs system.. recently I have been having a problem with doing a "apt dist-upgrade" ... I keep getting a "grub-probe: error: failed to get canonical path of `rpool/ROOT/ubuntu'." ... does anyone know how to fix this?
[16:02] <blackflow> neildugan: _boot_ or _root_ on ZFS?
[16:02] <blackflow> like /boot too?
[16:03] <neildugan> blackflow, both
[16:03] <neildugan> "grub-probe /" is the returning the error
[16:04] <blackflow> looks like an open issue   https://github.com/zfsonlinux/grub/issues/5
[16:04] <blackflow> personally I'm still under impression that grub ZFS support is not yet there.
[16:05] <blackflow> I run /boot separate on ext4 but that's primarily due to ZFS rootpool being LUKS'd
[16:12] <neildugan> blackflow, I wonder what changed recently to make this happen, though that is secondary to getting things working again
[16:14] <blackflow> neildugan: wouldn't know, really.
[16:16] <neildugan> blackflow, I tried a "grub-probe -vv /" I got a new error .. '/boot/grub/device.map': No such file or directory
[16:17] <neildugan> blackflow, should I generate one?
[16:17] <blackflow> neildugan: did you look at that bug report? there are some suggestions with env vars
[16:18] <neildugan> blackflow, yes I have, I have been reading it
[16:26] <neildugan> blackflow, I have found one that mentions zpool not being in the path... but on my system it is... I am reading further
[16:59] <neildugan> blackflow, thanks for the link, there are many options, I should find something that will work.
[19:39] <BrianBlaze> hey sarnold I have gotten the application running! So happy thanks for the link for mysql :)
[19:39] <BrianBlaze> it only took a day to make happen lol
[19:41] <sarnold> BrianBlaze: thanks for reporting back, it's great to hear you're up :)
[19:41] <BrianBlaze> I am so thankful to be on the latest version as it fixes a lot of issues we had
[19:41] <BrianBlaze> :)
[19:43] <sarnold> heh, given the fact that they wouldn't let you install on the newest mysql, somehow I'm not too surprised..
[19:43] <sarnold> even when it may be way better than it used to be, it still suggests a certain programming style :)