jfh | good morning | 05:54 |
---|---|---|
andrewc | jfh, good morning! | 07:49 |
andrewc | jfh, sorry to hear that you're having difficulty getting online :-( | 07:49 |
andrewc | jfh, as well as your mail, you could also try asking for assistance on the "canonical-sysadmin" channel on freenode... | 07:50 |
jfh | good morning andrewc - well, I hope I can figure that out soon ... seems to be an issue with the Sign-in through Canonical ... let's see ... | 07:51 |
jfh | good point - will try that, too | 07:51 |
zachman | :D | 09:59 |
cpaelzer | jfh: welcome to the dark side | 10:46 |
jamespage | o/ | 10:47 |
cpaelzer | hi jamespage | 10:47 |
cpaelzer | mihajlov: borntraeger: hi, I hope you have a good weekend soon, but first we have a question regarding libvirt/KVM/OpenStack on s390 | 10:49 |
cpaelzer | I hope it is one of the former two so we can get a quick solution without asking the OS guys :-) | 10:49 |
cpaelzer | jamespage, here in the channel, hit this bug just a few minutes ago | 10:50 |
cpaelzer | https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1564831 | 10:50 |
ubottu | Launchpad bug 1564831 in nova (Ubuntu) "s390x: error booting instance" [Undecided,New] | 10:50 |
borntraeger | cpaelzer, ? | 10:50 |
cpaelzer | mihajlov: borntraeger: and I wondered why you don't hit that with z/KVM+OS | 10:50 |
cpaelzer | the title is rather misleading IMHO | 10:50 |
cpaelzer | libvirtd[21610]: this function is not supported by the connection driver: cannot update guest CPU data for s390x architecture | 10:50 |
cpaelzer | borntraeger: mihajlov: that is closer to where things might start to break | 10:51 |
borntraeger | cpaelzer, I would assume that this is about the "not yet available" cpu model support | 10:51 |
borntraeger | cpaelzer, mihajlov : but there was a workaround in libvirt for that | 10:52 |
jamespage | http://libvirt.org/git/?p=libvirt.git;a=blobdiff;f=src/cpu/cpu_s390.c;h=23a7f9d8d38a00dc9c673d224f797cf8a17aa5d1;hp=f9d7e216aec847df321d7c7d3a050415ee8550fd;hb=59403018893cf2c4f9a6f5145e387cefbd44399a;hpb=b789db36ae1cb5a48986c3b9e3bfb64131367872 | 10:52 |
jamespage | looks relevant but we appear to have that in the libvirt version in xenial - just double checking | 10:52 |
jamespage | yah - confirmed in 1.3.1 | 10:54 |
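(A quick way to double-check which libvirt the compute host is actually running, assuming the stock Xenial packaging; the package names below are the Ubuntu ones.)

```sh
# List installed libvirt package versions and the version the daemon reports;
# the s390 cpu-driver stub linked above should be present in >= 1.3.1.
dpkg -l libvirt-bin libvirt0 | awk '/^ii/ {print $2, $3}'
virsh --connect qemu:///system version
```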
cpaelzer | jamespage: hmm - to be sure, is that OpenStack against libvirt/KVM? | 10:55 |
cpaelzer | or containers anywhere in between? | 10:55 |
borntraeger | cpaelzer, jamespage, mihajlov: it's certainly a message from libvirt | 10:55 |
borntraeger | cpaelzer, jamespage, mihajlov, but I have not seen it here | 10:56 |
cpaelzer | jamespage: could you identify the exact (api) call it made to trigger that? | 10:56 |
jamespage | cpaelzer, borntraeger: actually yes there is a container in the way here | 10:57 |
jamespage | I think that's the cause of the problem... | 10:57 |
jamespage | empty /proc/cpuinfo is not helping I suspect | 10:58 |
cpaelzer | jamespage: do you want to give it a try without containers, just with KVM? | 11:04 |
xnox | jamespage, is missing /proc/cpuinfo an lxc/lxd bug, given that it needs to emulate/whitelist/synthesise it or some such? | 11:06 |
jamespage | xnox, yes I think so | 11:06 |
jamespage | cpaelzer, not just yet | 11:07 |
xnox | jamespage, i guess a manual provider can be mixed into the thing... ? | 11:08 |
jamespage | xnox, figured out how to bind-mount the host's cpuinfo into the container... | 11:11 |
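(A sketch of that bind mount using the LXD client; the container name "nova-compute-0" is a placeholder, and in a juju-managed deployment the device would more likely go into a profile.)

```sh
# Bind-mount the host's /proc/cpuinfo over the empty one inside the container.
# "nova-compute-0" is a placeholder container name.
lxc config device add nova-compute-0 cpuinfo disk \
    source=/proc/cpuinfo path=/proc/cpuinfo
lxc exec nova-compute-0 -- head -n 5 /proc/cpuinfo   # verify it is now populated
```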
xnox | ^_^ | 11:11 |
jamespage | xnox, getting a lot of "Failed to allocate directory watch: Too many open files" | 11:11 |
jamespage | xnox, cpaelzer: lack of /proc/cpuinfo is a problem for LXD, but does not appear to be the cause of this... | 11:17 |
jamespage | xnox, cpaelzer: trying a trick to add the host machine to the deployment, but just hit the wall with the 2G root disk size... | 11:20 |
xnox | jamespage, what's your host? you should be able to activate e.g. additional drives and add them to the volume group. | 11:21 |
xnox | jamespage, btw i can reboot s1lp7 and give it to you as well; as an additional resource it should have a ~100GB rootfs. | 11:21 |
jamespage | xnox, my problem is that all of the control plane IP addresses are on the local bridge and not generally accessible... | 11:22 |
jamespage | 2016-04-01 11:21:49 INFO install E: Write error - write (28: No space left on device) | 11:22 |
jamespage | not unexpected... | 11:22 |
cpaelzer | xnox: does d-i in guided partitioning try to create a swap partition as big as memory? | 11:47 |
cpaelzer | xnox: james's disk on a 40G-memory system had the available ~41G split into 38.x G swap and 2G root | 11:47 |
cpaelzer | xnox: s390 is the land of small disks and (sometimes) a lot of memory | 11:48 |
cpaelzer | xnox: there should/could be a cap on the swap size | 11:48 |
xnox | cpaelzer, jamespage: this is a classic d-i/partman bug. there are no caps, just multiples. | 11:50 |
xnox | deactivate swap, remove it, enlarge partition, enlarge rootfs.... | 11:50 |
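(Roughly what that looks like on a default LVM-based install; the VG/LV names below are the usual installer defaults and this assumes an ext4 root, so adjust to the actual layout.)

```sh
# Reclaim the oversized swap LV and give the space back to the root filesystem.
swapoff /dev/ubuntu-vg/swap_1            # stop using the swap LV
sed -i '/swap_1/d' /etc/fstab            # drop the stale fstab entry
lvremove -y ubuntu-vg/swap_1             # return its space to the volume group
lvextend -l +100%FREE ubuntu-vg/root     # grow the root LV into the freed space
resize2fs /dev/ubuntu-vg/root            # grow the ext4 filesystem online
```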
cpaelzer | xnox: already done that | 11:50 |
cpaelzer | I just wanted to avoid the next one running into it | 11:50 |
cpaelzer | xnox: "classic" means the bug exists and is still open | 11:51 |
cpaelzer | ? | 11:51 |
xnox | https://bugs.launchpad.net/ubuntu/+source/partman-auto/+bug/1032322 | 11:51 |
ubottu | Launchpad bug 1032322 in partman-auto (Ubuntu) "Swap space allocation for large memory systems needs improvement" [Medium,Confirmed] | 11:51 |
cpaelzer | great, thanks | 11:51 |
xnox | first opened on 2012-08-02, but it has been around since forever, and typically reported by installer testers in e.g. qemu vms with things like | 11:52 |
xnox | "i gave it 16GB of ram and 8GB rootfs disk" | 11:52 |
xnox | thinking about it. | 11:53 |
xnox | cpaelzer, does it even make sense to have swap on lpar / z/VM? | 11:53 |
cpaelzer | xnox: don't get this started in a public channel | 11:53 |
cpaelzer | nooooo | 11:53 |
cpaelzer | you did it | 11:53 |
cpaelzer | this is like vim and emacs | 11:53 |
xnox | lpars should be big enough, and z/VM can over-commit x2 RAM | 11:53 |
cpaelzer | it can overcommit up to whatever you can accept performance-wise, and I've often seen 2-3x | 11:54 |
xnox | but on z/VM only, not on LPAR, right? | 11:54 |
cpaelzer | even on kvm it works reasonably well most of the time, although there could be some improvements | 11:54 |
cpaelzer | LPAR is only partitioning, no overcommit for memory | 11:54 |
xnox | thinking about it, maybe there should be a safeguard that e.g. swap cannot be more than 10% of total disk space, regardless of the sizing relative to RAM | 11:55 |
cpaelzer | IMHO the host should swap, not the guests | 11:55 |
xnox | or maybe the 200% should be from the smallest of (ram, disk) sizes | 11:55 |
cpaelzer | but there are quite a lot of cases where that alone is not the whole truth | 11:55 |
cpaelzer | I think I have seen some logic that groups into three categories by ram size | 11:55 |
cpaelzer | ram <2G, try 2*ram | 11:56 |
cpaelzer | else swap = ram size | 11:56 |
cpaelzer | but | 11:56 |
xnox | can one hibernate lpar & z/VM at all? because on servers, hibernate on emergency power shutdown is a poor man's choice for redundant power. | 11:56 |
cpaelzer | never go over 64G | 11:56 |
cpaelzer | and never go over x% of the disks | 11:56 |
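(The heuristic being described, written out as a sketch; the 2G cut-off, 64G cap and the 10% disk cap are the numbers from this discussion, not anything partman-auto implements today, and /dev/vda is a placeholder device.)

```sh
# Sketch of the proposed swap-sizing logic.
ram_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
disk_mb=$(( $(blockdev --getsize64 /dev/vda) / 1024 / 1024 ))

if [ "$ram_mb" -lt 2048 ]; then
    swap_mb=$(( ram_mb * 2 ))                 # small systems: try 2x RAM
else
    swap_mb=$ram_mb                           # otherwise: swap == RAM
fi
[ "$swap_mb" -gt 65536 ] && swap_mb=65536     # never go over 64G
cap_mb=$(( disk_mb / 10 ))                    # never go over ~10% of the disk
[ "$swap_mb" -gt "$cap_mb" ] && swap_mb=$cap_mb
echo "proposed swap: ${swap_mb} MiB"
```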
xnox | why 64G? why not 65G? why not 63G? | 11:56 |
cpaelzer | xnox: suspend and resume is implemented | 11:56 |
cpaelzer | arbitrary choice, like the old 2x, why not 1.8x | 11:57 |
xnox | suspend&resume is not hibernate&thaw. E.g. swap is not needed for suspend, as RAM remains powered/active. | 11:57 |
cpaelzer | if science people are involved we could suggest a smooth scaling formula no one would understand :-) | 11:57 |
xnox | 2x -> is reasonable for a good chance at hibernating when things have overcommitted ram. | 11:57 |
xnox | because one needs to dump all of ram to swap to hibernate, plus whatever got overcommitted/spilled over to swap. | 11:58 |
cpaelzer | ah you mean to disk | 11:58 |
xnox | yes, hibernate. | 11:58 |
cpaelzer | never cared too much about that; I'd have to check if that works as well | 11:58 |
cpaelzer | hca: ^^ ? | 11:58 |
xnox | is there hibernate on lpar / z/vm -> if not, i'll just remove swap from default recipes full stop, and people can install swapfile package to add swap. | 11:58 |
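(For the record, adding swap back by hand after a swap-less install is only a few commands; the 2G size and the /swapfile path below are arbitrary examples.)

```sh
# Create and enable a swap file, and make it persistent across reboots.
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab
```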
cpaelzer | wait for hca's answer | 11:59 |
cpaelzer | but then power failure is so low end | 11:59 |
cpaelzer | I mean most cpu calculations are done twice for quantum effects of random particles | 11:59 |
cpaelzer | power failure - pffff | 11:59 |
xnox | i think mainframe deployments have better power failure mode handling than other architectures. | 11:59 |
cpaelzer | well - eventually they have way better handling, but then this is (sadly) one of the areas where the business/finance people cut costs | 12:00 |
cpaelzer | it works the same without that battery pack, well then ... | 12:00 |
xnox | >_< | 12:01 |
xnox | cpaelzer, reading all the bug reports it's like "high memory system -> too large swap" and "swap not large enough to hibernate" | 12:12 |
xnox | the most reasonable comment is from superm1 | 12:12 |
xnox | https://bugs.launchpad.net/ubuntu/+source/partman-auto/+bug/576790 | 12:12 |
ubottu | Launchpad bug 576790 in partman-auto (Ubuntu) "Partman should support disabling swap in impractical scenarios" [Undecided,New] | 12:12 |
xnox | e.g. it should be possible to have a flag to essentially "skip swap" and calculate that for "impractical scenarios" e.g. RAM >> root disk (high memory system) | 12:13 |
xnox | with a threshold as to what a high memory system is | 12:13 |
xnox | and be able to preseed that key. | 12:13 |
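(The flag xnox is describing does not exist yet; what can already be preseeded today is a custom expert recipe without a swap stanza, plus suppressing the "continue without swap?" question. The recipe below is only an illustration with arbitrary sizes, assuming the usual d-i key names.)

```sh
# Append a swap-less recipe to a preseed file; sizes (min/priority/max, in MB) are examples.
cat >> preseed.cfg <<'EOF'
d-i partman-auto/expert_recipe string \
    root-only :: \
        1024 10000 -1 ext4 \
            $primary{ } $bootable{ } \
            method{ format } format{ } \
            use_filesystem{ } filesystem{ ext4 } \
            mountpoint{ / } .
d-i partman-basicfilesystems/no_swap boolean false
EOF
```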
xnox | imho "high-memory" is anything where RAM >> 10% of total disk space | 12:15 |
xnox | (specifically 10% of the /usr partition) | 12:15 |
xnox | well... no. | 12:15 |
* xnox needs to look at partman-auto to see if it has total disk size numbers available | 12:16 |
cpaelzer | I'm ok with almost any limit, as the hard part is creating the infrastructure, not defining the exact ratio/size of the limit | 12:35 |
jamespage | cpaelzer, borntraeger: OK so after looping around and re-deploying with the compute node directly on an LPAR running Ubuntu Xenial, I still see the same problem | 13:59 |
cpaelzer | jamespage: so you now run without containers, just KVM & OpenStack | 14:24 |
cpaelzer | ? | 14:24 |
jamespage | cpaelzer, well the control plane bits are still in containers but the hypervisor is not | 14:24 |
cpaelzer | ok | 14:24 |
cpaelzer | didn't xnox already say it worked for him? maybe he has the workaround you need | 14:25 |
xnox | not with the latest nova-generated libvirt config for our cloud image | 14:27 |
jamespage | xnox, yeah - I suspect this is a break in nova's use of libvirt, but not 100% sure yet... | 14:27 |
xnox | so we will need to debug the generated libvirt config i guess. | 14:28 |
xnox | jamespage, is the one you pasted on the bug report accurate? | 14:28 |
xnox | most recent | 14:28 |
jamespage | xnox, yes | 14:28 |
xnox | cool, i'll give it a poke in a few. | 14:29 |
xnox | need to finish a few things up, and have a call, and then will be able to look into it. | 14:29 |
jamespage | xnox, having a punt at setting the cpu-mode flags for nova to host-passthrough | 15:46 |
jamespage | xnox, we do the same for ppc64el | 15:46 |
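(On a hand-configured compute node that boils down to one nova.conf option; in a charm-based deployment the same thing would normally be set through the nova-compute charm config rather than edited directly. crudini is not installed by default.)

```sh
# Set the libvirt CPU mode that nova passes into the generated guest XML.
crudini --set /etc/nova/nova.conf libvirt cpu_mode host-passthrough
systemctl restart nova-compute
```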
jamespage | xnox, have you hit this "too many open files" warning/error on s390x? I think it's actually impacting my deployment | 15:47 |
jamespage | I see it on the host and in containers as well.. | 15:47 |
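(That systemd message usually points at the per-user inotify instance limit rather than the plain fd limit; raising the inotify sysctls on the host is a common mitigation when many containers share one kernel. The values below are illustrative, not tuned recommendations.)

```sh
# Raise inotify limits on the host; containers share these with the host kernel.
cat > /etc/sysctl.d/60-inotify.conf <<'EOF'
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 524288
EOF
sysctl --system   # reload sysctl configuration, including the new drop-in
```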
mpavone | Hi, I have added a comment to https://bugs.launchpad.net/ubuntu/+source/nova/+bug/1564831 regarding the instance not starting on s390x | 16:27 |
ubottu | Launchpad bug 1564831 in nova (Ubuntu) "s390x: error booting instance" [Undecided,New] | 16:27 |