[03:38] rbasak: https://paste.ubuntu.com/p/M4nvkWnD6n/
[03:39] powersj: https://code.launchpad.net/~nacc/usd-importer/+git/usd-importer/+merge/336877 please
[03:39] it should still fail, but with the same three errors as the above paste
[05:15] rbasak: you had mentioned before using dpkg-deb or something?
=== MJCDoffice is now known as MJCDoffice1
=== MJCDoffice1 is now known as MJCDoffice2
=== MJCDoffice2 is now known as MJCDoffice3
=== MJCDoffice3 is now known as MJCD_ULTIM8_
=== MJCD_ULTIM8_ is now known as MJCDoffice
[06:33] good morning
[06:46] kirkland: does the new ubuntu server installer still work with preseeding?
[07:04] Good morning
[09:31] jamespage: coreycb: can you trigger https://bugs.launchpad.net/designate-dashboard/+bug/1715417 to be pulled from artful into uca/pike, please?
[09:31] Launchpad bug 1715417 in Ubuntu Cloud Archive pike "Cannot view a zone in dashboard - 404 errors" [Medium,Fix committed]
[09:33] frickler: can do as soon as someone marks the version in uca pike/proposed as verified :-)
[09:36] nacc: try this: http://paste.ubuntu.com/p/7JgxcjV5w6/
[09:36] nacc: it stops building the .changes file. We could use dpkg-genchanges directly if we need it.
[09:37] nacc: and if there are any differences, well all tests still pass, so we can deal with that if and when we find it's insufficient.
[09:56] jamespage: ah, I missed that tag, will do in a bit, thx
=== _Jeepbeats is now known as Jeepbeats
=== Screedo_ is now known as Screedo
=== techmagus_ is now known as techmagus
=== led_ir23 is now known as led_ir22
=== lynxman_ is now known as lynxman
=== rmk` is now known as rmk
=== DalekSec_ is now known as DalekSec
[12:00] coreycb: ok I've done a few more tweaks on xtrabackup and imported and pushed the resulting work back to the main percona-xtrabackup repository
[12:00] coreycb: most of that time was a copyright audit :-)
[12:00] coreycb: all building in https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3160
[12:02] coreycb: I also did the same watch file and repack exclusions as I did for pxc-57
[12:02] limits the size further and helps with a load of compressed js inclusions
[12:06] frickler: well ceph 12.2.3 was an experience
[12:06] frickler: I was not expecting a boost version bump in a point release
[12:09] frickler: thanks for testing the pike proposed designate dashboard - promoting that now
[12:27] jamespage: coreycb: I was asked if/why pike seems to be out of date
[12:27] jamespage: coreycb: plenty of fixes in artful up to https://launchpad.net/ubuntu/+source/qemu/1:2.10+dfsg-0ubuntu3.5 are not seen in there
[12:27] is this in staging?
[12:28] jamespage: I think the main misunderstanding about ceph is that they do not use semantic versioning, but their version numbers look like they did
[12:28] yeah see it in staging
[12:28] any ETA for when those are released that I could share?
[13:04] frickler: that's a change in behaviour then - to-date point and patch releases have been just that
[13:05] cpaelzer: I'd like to get those clear today but need to liaise with coreycb first
[13:09] jamespage: cpaelzer: ok pike and ocata have been regression tested successfully as of 2/15
[13:10] jamespage: cpaelzer: pike should be ready to promote. it's already promoted for artful and regression tested. i want to do some more thorough testing for kilo/ocata though as i had to adjust patches to backport those.
[13:11] coreycb: ok actioning pike now
[13:15] coreycb: actually can you annotate the bug for the point releases with test results and update the tags :-)
[13:16] coreycb: btw https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3172/+packages is working good for me on a three unit bionic cluster
[13:17] jamespage: adding a new binary in 12.2.2 and severely changing default parameters already wasn't too nice either
[13:17] jamespage: all set bug1744882
[13:17] bug 1744882
[13:17] bug 1744882 in qemu (Ubuntu Bionic) "Add SPEC_CTRL and IBRS changes" [Undecided,Triaged] https://launchpad.net/bugs/1744882
[13:18] ack done
[13:18] jamespage: \o/ 5.7
[13:22] jamespage: iiuc the issue is that ceph calls every update to luminous 12.2.x, while 12.2.2 should really have been 12.3.0 and 12.2.3 == 12.4.0. there haven't been so drastic changes to jewel I think. but maybe they'll learn for 13.*
[13:25] smoser, i see why the package is not there =) it got dropped to universe.
[13:27] and ubuntu-meta builds with main only, asking for it to be promoted back into existence.
[13:33] hallyn: actually, it uses curtin and cloud-init!
[13:34] hallyn: which makes it basically the same as MAAS
[13:51] jamespage: looks like it's still py3 in xenial-proposed/queens
[13:55] erm
[14:05] tobasco: http://paste.ubuntu.com/p/sdHSXmrfCP/ exists; the default is still py3, but it's possible to use py2 via that package - it contains all of the required binaries and wsgi entry points
[14:13] coreycb: doing some sysbench against those pkgs - hanging together ok so far
[14:16] jamespage: awesome
[14:20] coreycb: ok upload those both to bionic
[14:21] coreycb: good work btw - a bit of a team effort which is good
[14:21] as we both now know :-)
[14:21] jamespage: woohoo \o/ thanks for all the help!
[14:39] xnox: bah. what do we need to do then ?
[14:48] jamespage: ah ty
[14:52] smoser, asked archive admin to repromote it back in. it is done and now waiting to publish.
[14:52] it's a "binary only movement to main"
[14:53] xnox: someone must have done it.
[14:53] http://paste.ubuntu.com/p/DJdPPxhzmW/
[14:53] uploading.
[14:55] cpaelzer: around ?
[14:55] can we discuss bug 1750780
[14:55] bug 1750780 in open-vm-tools (Debian) "Race with local file systems can make open-vm-tools fail to start" [Unknown,Incomplete] https://launchpad.net/bugs/1750780
[14:56] smoser, yeah =) it was not there like 40mins ago, so it must have just published. cool.
[14:56] smoser, once that publishes and migrates; i guess we need to respin images? then retry the tests? then things will migrate?
[14:58] i uploaded open-iscsi with the timeout yesterday, xnox
[14:58] i'm ok if you let stuff through. it really is at this moment "bad test"
[15:03] coreycb: OK I've sniffed queens in proposed sufficiently - promoting to -updates
[15:03] jamespage: ok
[15:10] smoser, ok, proposing a hint https://code.launchpad.net/~xnox/britney/open-iscsi-vs-broken-cloud-image/+merge/338560
[15:12] xnox: can you just suggest that it should be removed if there is a cloud image > 2018-02-22
[15:13] or more specifically with 'server' at 1.411 ?
[15:14] err.. ubuntu-server
[15:14] smoser, that's not enough, we need image with server at 1.411 which is hard to codify =) the hints are manual....
[15:14] sure. i'm saying write that in text
[15:14] smoser, however, the hint will self-become inactive, once the open-iscsi package in proposed passes the test and migrates.
[15:15] smoser, so in effect, it will self-drop automatically when everything becomes good.
[15:15] oh it will?
[15:15] smoser, note the hint is version specific.
[15:15] i didnt know there was any magic
[15:15] it's not an /all/ hint
[15:15] ah. you have the version of open-iscsi i see.
[15:15] at least still write in text
[15:15] if image has > ubuntu-server 1.411
[15:15] then this is not valid.
[15:29] anyone here know who to talk to about AMIs in cn-north-1? None of recent 16.04LTS images in cn-north-1 are bootable
[15:34] Odd_Bloke: ^
[15:38] irvingwashington: smoser: This is unexpected. We will take a look at this.
[15:39] irvingwashington: Which AMIs have you tried? (Do older ones work for you?)
[15:45] kirkland: so I can't use preseed?
[15:45] kirkland: let me put it more constructively - would love to see a blog post about how to direct it to setup a particular partitioning scheme :)
[15:46] (and maybe how to then do some setup afterward using cloud-init - that part is pretty clear to me but still would be good for a blog post)
[15:46] hallyn: indeed, that's a good question
[15:47] BTW there's another blog post / tutorial which would be good to see - how to write a simple replacement set of MAAS scripts to control the power controllers
[15:48] No such post out there right now, would be both useful and probably helpful to maas adoption
[15:51] thanks for the info coreycb and jamespage
[15:51] hallyn: well, "scripts" dont work. you can plug in a power controller to maas, but its integrated. python.
[15:51] not "scripts".
[15:52] smoser: I'm here
[15:52] smoser: in your standup hangout now if you want?
[15:53] k
[15:53] hallyn: https://gist.github.com/smoser/375123ef1ef098be23cc856a5772c5c8
[15:53] that describes how you can kind of test things and such.
[15:53] http://curtin.readthedocs.io/en/latest/
[15:53] jamespage: i'm going to try building pytest without the pypy-hypothesis BD and possibly patch that in ca-patches
[15:53] has information on curtin, the 'configuration types'
[15:54] are how to do storage layouts
[15:54] and then... there are many examples in tree of storage.
[15:54] hope that helps.
[15:54] smoser: ah I found your bug update - reading ...
[15:57] smoser: yeah yeah everyone keeps 'correcting' me on that - if it's interpreted it's a script in my lexicon :)
[15:59] smoser: cool gist - you should make it a blog post :)
[16:00] irvingwashington: I can reproduce the issue with the HVM instance-store AMI (ami-fc459891), but not with the HVM EBS AMI (ami-cc4499a1).
[16:00] I'll dig in to the instance-store issue.
[16:01] smoser: so the ubuntu installer is now based on curtin; if i do a pxe boot of a netboot image of bionic ubuntu server, can that curtin config file be a url?
[16:01] rbasak: are you ok if I pull that into my branch?
[16:02] hallyn: yes i should
[16:02] I assume if I did it 'by hand' i would boot a liveos and run curtin from there - that's fine, but presumably the installer ...
[16:04] jamespage: yeah no dice on dropping pypy-hypothesis. maybe we don't need to backport pytest.
[16:04] checking
[16:05] hallyn: *an* ubuntu installer is based on curtin
[16:06] subiquity
[16:06] which will be the primary offering for server
[16:06] you will still be able to get the alternate download.
[16:08] ah.
[16:08] still - will subiquity be able to take a url for curtin config?
[16:08] i *do* want to be able to use curtin :)
[16:09] i do not know that.
[16:09] ok :)
[16:09] did you reply to that thread
[16:09] what thread?
[16:10] I need to try out mailborder, if that can do a decent job with my spam i'll try and un-devnull my ubuntu mail
[16:11] nacc: go for it
[16:11] smoser: anyway don't take it the wrong way but i'm gonna try and push you guys to blog a bit more :)
[16:11] rbasak: thanks, building the snap locally
[16:12] I'm hoping in the next few weeks to play with the maas scripts (!) a bit and then i can blog about how to do it,
[16:12] but as i got pxe doing what i needed it's hard to justify the time
[16:12] :)
[16:13] nacc: I sort of intended you did that, as I can't easily test inside your snap environment, and didn't want to go further without checking that it actually does solve the in-snap problem.
[16:13] rbasak: ack, understood :)
[16:13] rbasak: can you also peek at https://git.launchpad.net/~nacc/usd-importer/commit/?id=3e6589aa6d2e4f53a5e76ecdb8fed2f12184f5e3
[16:13] rbasak: it's another obvious (to test in the snap, we need to adjust paths)
[16:13] rbasak: the tests still work locally, as well, but i want to know if there's a better way to do that massage
[16:14] hallyn: https://gist.github.com/smoser/9f9a2f521e13f3add8d45de00124c18d is related also
[16:14] hallyn: yes. agree.
[16:14] nacc: I need to look up the context. But immediate thought: maybe wrap Changelog.from_path in the tests?
[16:17] nacc: https://jenkins.ubuntu.com/server/job/git-ubuntu-ci-redux/7/console
[16:17] powersj: thanks
[16:18] smoser: so i see i need to start watching https://gist.github.com/smoser :)
[16:18] can i get an rss feed of those i wonder
[16:21] smoser: cool looks like i should update my ages-old uvt-kvm setup on my big host
[16:23] Odd_Bloke: we started with ami-fc459891 and worked our way backwards. Eventually gave up and went back to using the original 16.04 AMI we started from sometime in June of 2017
[16:23] irvingwashington: All of them HVM/instance-store?
[16:24] Odd_Bloke: yes, we didn't test any ebs AMIs
[16:24] hallyn: yeah. you should. :)
[16:24] i do mean to take a bunch of those and turn them into blogs.
[16:25] on github/blogs or whatever that is
[16:25] irvingwashington: OK, thanks for reporting the problem. :)
[16:25] we track 14.04 more closely and the recent images for that are fine. Only 16.04
[16:25] smoser.github.io
[16:25] Odd_Bloke: thanks for looking into this. Apologies for not reporting this sooner.
[16:27] nacc: http://paste.ubuntu.com/p/XtFrSQpbnZ/
[16:31] smoser: i thought you'd run your own static site fed by m4.
[16:31] disappointed
[16:50] rbasak: http://paste.ubuntu.com/p/QydVfYxr6b/
[16:50] rbasak: tests still fail, but differently
[17:05] nacc: I think that's a real bug.
[17:06] nacc: the patch I gave you worked locally, so perhaps a newer dpkg is more pedantic or something.
[17:06] nacc: the test needs updating to supply a version of '1-1' in the non-native case, instead of using the default '1' I think. Alternatively I need to fix SourceSpec or SourceFiles so if native is False then the version defaults to '1-1' instead of '1'.
[17:14] rbasak: ok, i can look at it
=== Epx998- is now known as Epx998
[17:48] rbasak: still there? had a quick question
[18:20] hello there just to confirm, 17.10 is not a version of ubuntu I should be installing in servers right? like Dell R440
[18:21] jair: unless you are wanting to preview 18.04 features, I would not
[18:21] dpb1: understand
[18:21] dpb1: we installed in a dell r440 because the perc 740p raid controller in 16.04 server was not supporting it
[18:22] jair: you could try the HWE kernel
[18:22] but now we have this weird issue the memory keeps growing every 19 hours and then the server crash
[18:22] Yep I got that tip but I am just trying to avoid re-install
[18:22] jair: https://wiki.ubuntu.com/Kernel/LTSEnablementStack
[18:22] ok
[18:22] well
[18:23] dpb1: we are having this issue > https://ibb.co/dmrfvx
[18:23] jair: please don't crosspost
[18:23] jair: i was already helping you in #ubuntu
[18:23] jair: it *sounds* like you have rogue processes, rather than a hardware problem, stock Ubuntu on its own is not going to consume that much memory 'every 19 hours', more likely something you've got running is trying to take that memory
[18:23] teward: it's THP, i'm fairly sure
[18:23] here is the output of meminfo > http://paste.debian.net/1011521
[18:23] teward: it's possible it's a rogue process doing the THP, but seems unlikely
[18:24] nacc: THP == ?
[18:24] i'm tired and uncaffeinated today :)
[18:24] teward: Transparent Huge Pages
[18:24] thank you
[18:24] * dpb1 backs away
[18:24] teward: they have a ton of memory allocated there
[18:24] you're right though THP is likely to be the problem
[18:24] nacc: my sincere apologies, I just noticed that ubuntu-server is where I should have been chatting
[18:24] 1G pages, which i believe are not swappable
[18:24] jair: ya, if you are already talking to nacc, you should just keep doing that
[18:24] whether it be rogue processes or not (but i just got here)
[18:24] dpb1: i presume you backed away because of the crosspost reason... or was it because I'm not caffeinated :P
[18:24] sorry all this is a server not a desktop
[18:25] teward: lol
[18:25] therefore I believe I should be chatting here the Debian team advise me that
[18:25] jair: is this Ubuntu or Debian?
[18:25] there *is* a difference
[18:25] (Just confirming)
[18:25] Ubuntu
[18:25] 17.10
[18:25] teward: here http://paste.debian.net/1011531
[18:26] nacc: and we confirmed THP is enabled on their environment?
[18:26] I already did what nacc told me about disabling THP
[18:26] whoops speaking of memory issues... *grabs another stick of RAM to throw in the hypervisor that is almost out of RAM, disappears for a short while*
[18:26] teward: yeah, madvise
[18:26] teward: it was http://paste.debian.net/1011527
[18:26] teward: i'm going off of their meminfo
[18:26] teward: I disabled it
[18:26] jair: did you reboot yet?
[18:27] nacc: the server is still running and the memory increasing
[18:27] I suspect it will crash soon
[18:27] jair: right, so reboot?
[18:27] jair: not sure why that's relevant?
[18:27] jair: i mean, we're trying to see if THP is what is causing your growing memory
[18:27] I will need to let it do it by itself.. :(
[18:27] jair: so disable it at the grub config
[18:27] jair: and reboot
[18:27] ^ this
[18:28] jair: i mean, i guess you're welcome to wait, but it doesn't tell us anything
[18:28] in and of itself
[18:28] jair: is this a production server?
[18:28] if it is you're better off rebooting *now* rather than letting it 'die' on its own
[18:28] nacc: understand but this is a prod router providing BGP to our organization I need to wait until reboot itself
[18:29] jair: and your organization can't have a short period of downtime for 'emergency maintenance'?
[18:29] dpb1: yes unfortunately I am trying to help the organization but I am not the main boss
[18:29] if *that* is the case you have a bigger issue than just THP being enabled
[18:29] teward: believe me it's complicated
[18:29] jair: dude, insert testing in production meme here. :)
[18:29] jair: i know what 'complicated' is, i'm an IT consultant for several businesses on my own, as well as employed by others directly, all in the IT role.
[18:30] guys I know and I can't agree more , but I am not the boss unfortunately
[18:30] but when things need emergency-fixed the companies tolerate a short period of downtime :P
[18:30] jair: so talk to the boss.
[18:30] jair: if it were me, I would reboot, always better to be in control of downtime
[18:30] ^ this
[18:30] only someone I don't know what they are thinking put a 17.10 in a prod server
[18:30] jair: yes, that is also a fail
[18:31] yep
[18:31] jair: wait, so your company is ok with random reboots
[18:31] jair: but not with planned reboots?
[18:31] ^ this is what i was saying
[18:31] if that is the case the company is fubar with policies,.
[18:31] yeah :)
[18:31] jair: take a page from me:
[18:31] ***talk to your boss*** about an emergency reboot
[18:32] they can probably tolerate 5 minutes of planned downtime vs. two hours as a result of a random crash
[18:32] *just saying*
[18:32] yes doing that now
[18:32] hold on please
[18:33] teward, I used that argument in a meeting with the IT management once.....they were concerned about giving me downtime.....I just sat back and said "No Problem, when it crashes we will fix the issue". I got my downtime
[18:34] +1
[18:34] Ussat: and that's usually my argument as well. And most of the outages I cause are no more than 20 minutes of downtime, and that part is usually when I'm just rebooting the SAN for emergency maintenance or have to reboot it to put drive expansion arrays on it
[18:34] so... :p
[18:36] for some reasons, I (wrongly?) assumed that THP was something you could disable live and the kernel would breakup the huge pages into multiple regular ones
[18:37] you can disable it but I dont thing that changes existing huge pages
[18:38] think
[18:38] sdeziel: yeah, you can disable it
[18:38] sdeziel: i think there might be a way to release the pages manually, but i don't know
[18:39] nacc: trippeh: testing it as we speak
[18:39] friends I am really sorry I am in Japan now and it is 3:39 am
[18:39] so far AnonHugePages reduced a little
[18:39] I will let the server crash not say anything and report if that change fixed the issue
[18:40] I should test if THP is OK for $job workloads again one of these days. I've usually been left disappointed.
[18:40] jair: sorry, i might be totally wrong
[18:40] nacc: o/
[18:40] jair: i was doing more research (it's been a while since i was libhuge maintainer :)
[18:40] jair: DirectMap1G is a reflection of the TLB status
[18:40] nacc: OK
[18:40] should I enable that back?
[18:41] jair: can you pastebin /proc/meminfo again?
[18:41] ok
[18:41] rbasak: have a few minutes for a HO?
[18:41] nacc: sorry, about to eat dinner
[18:41] rbasak: np, i think i found a few gotchas in source_builder
[18:41] i'm fixing them in my branch, but they'll need your review for sure :)
[18:41] nacc: OK
[18:41] rbasak: enjoy your evening!
[18:41] Thanks!
[18:43] jair: i think that means your kernel is using 1G pages for something
[18:44] nacc: here https://paste.ubuntu.com/p/MkkpnxySxR/
[18:45] jair: yeah, so iiuc, 80% of your system memory is being consumed by the kernel for its mappings
[18:46] sorry I got disconnected
[18:46] i can imagine a networking table using up a ton of space
[18:46] if the server is being heavily used
[18:46] nacc: so, should I enable back that setting?
[18:46] jair: do you see that DirectMap1G value increasing?
[18:46] jair: yeah, it won't have any effect
[18:46] OK
[18:46] jair: is that meminfo different than the last one you gave me?
[18:47] no increase
[18:47] I can compare
[18:47] hold on
[18:47] I will do a fdiff
[18:47] diff
[18:47] jair: i am diffing here
[18:48] jair: hrm, something ate ~400M of memory from the free
[18:49] jair: but i'm not seeing any equiv. growth
[18:49] nacc: here http://paste.debian.net/1011539
[18:50] jair: yes
[18:50] jair: your system is fully up to date?
[18:50] Yes
[18:50] nacc: here http://paste.debian.net/1011531
[18:51] so anonhugepages are not deaggregated
[18:52] sdeziel: good to know :)
[18:52] sdeziel: but it was a red herring/misapprehension on my part anyways
[18:52] changed back:
[18:52] http://paste.debian.net/1011531
[18:52] sorry
[18:52] jair: i'm trying to think of what might be happening
[18:52] it *seems* likely that something in kernel is reserving the memory
[18:52] and not freeing it or so
[18:52] jair: are new iptables rules being written constantly?
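The meminfo comparison that nacc and jair do by hand above could be sketched as follows. This is only a minimal sketch, assuming two saved copies of /proc/meminfo are available as text; the helper names `parse_meminfo` and `diff_meminfo` are hypothetical and not part of any tool mentioned in the log.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; counters such as HugePages_Total
    have no unit. Only the leading integer of each field is kept.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return values


def diff_meminfo(before, after):
    """Return {field: after - before} for fields whose value changed."""
    old, new = parse_meminfo(before), parse_meminfo(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] != old[k]}
```

Feeding it two snapshots taken some time apart would show which fields (for example MemAvailable or AnonHugePages) moved, and by how much.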
[18:53] I mean this > # cat /sys/kernel/mm/transparent_hugepage/enabled
[18:53] always [madvise] never
[18:53] nacc: nahh
[18:53] this is just doing routing from our ISP to our infrastructure
[18:53] we are using it as router
[18:54] jair: can you pastebin `cat /proc/mounts` ?
[18:54] the only reason I got from the guy who installed 17.10 in the R440 dell was because he could not install LTS server because it did not support the raid controller perc 740P
[18:54] he could not see the drives
[18:54] ok
[18:55] nacc: http://paste.debian.net/1011542
[18:56] jair: to be clear, MemFree being low is normal
[18:56] you want all your memory to be in use
[18:56] but MemAvailable decreasing is a bit odd
[18:56] right
[18:57] jair: do you have a capture of full console log when the system crashes?
[18:58] specifically, the *first* oom report
[18:58] nacc: yes
[18:58] nacc: let me pass it
[18:59] nacc: https://ibb.co/g65wkx
[18:59] there
[19:00] jair: there should be a bit more before that
[19:01] nacc: this is captured from the idrac IPMI tool
[19:01] nacc: I would have to record the screen or something
[19:01] nacc: perhaps dmesg?
[19:04] it's going to crash soon
[19:04] http://paste.debian.net/1011546
[19:04] jair: dmesg will be gone once you reboot
[19:04] hmm
[19:04] jair: you want to actually grab the console the whole time
[19:05] I see
[19:05] I wonder if that is possible in idrac ipmi
[19:05] jair: well, use a typescript and hook into a screen session or so?
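The `always [madvise] never` line pasted above marks the active THP mode with square brackets. A minimal sketch for reading it programmatically, assuming the sysfs path shown in the log; `active_thp_mode` and `read_thp_mode` are hypothetical helper names used only for illustration.

```python
import re


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs THP setting.

    The kernel lists all modes and wraps the selected one in brackets,
    e.g. "always [madvise] never". Returns None if no mode is bracketed.
    """
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_mode(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Read the current THP mode from sysfs (path as quoted in the log)."""
    with open(path) as f:
        return active_thp_mode(f.read())
```

For the line jair pasted, `active_thp_mode` would report `madvise`, which matches what nacc read off the earlier paste.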
[19:06] nacc I think we will install hwe kernel and install 16.04 that is what we are all advising
[19:06] I got this
[19:08] jair: well, i mean the hwe kernel on 16.04 is the same as the kernel on 17.10 right now
[19:08] afaik
[19:08] https://answers.launchpad.net/ubuntu-certification/+question/664756
[19:09] jair: i mean, yes you should use the 16.04 release anyways
[19:09] check the options Jeff Lane gave me
[19:09] yes i'm reading
[19:10] jair: when the oom killer runs, it emits a bunch of data about the state of memory
[19:10] including all pages currently allocated
[19:10] that is what you need to obtain to debug what is happening
[19:10] jair: i would guess you'll see the same issues with 16.04.3, but that would be a good thing to test
[19:11] Yep because Dell say in their website that 16.04 is supported
[19:12] well nacc Thank you so much for battling with me on this
[19:13] jair: yw
[19:16] my syslog is printing this: smbd.service: Got notification message from PID 12210, but reception is disabled.
[19:16] is this something to worry about or is it benign?
[19:17] bight night
[19:20] new vampire flick
[19:22] dpb1: hehe :)
[19:22] but with "bight", it'd be vampire pirates. it's a rope joke.
[19:26] heh
[19:45] powersj: ping
[20:20] nacc: back?
[20:21] powersj: yeah, sorry, power hiccup
[20:21] i think i figured out my issue
[20:21] ok :)
[20:21] powersj: codecoverage plugin to pytest creates a file
[20:21] yeah
[20:21] i want to avoid doing that with the self-test, since we don't know where we're running from
=== chat is now known as Guest64380
=== keithzg_ is now known as keithzg
=== Epx998- is now known as Epx998
[22:58] powersj: please rerun the new CI on https://code.launchpad.net/~nacc/usd-importer/+git/usd-importer/+merge/336877
[22:58] powersj: err, resubmitted so https://code.launchpad.net/~nacc/usd-importer/+git/usd-importer/+merge/338593
[22:59] rbasak: --^ fyi, please review that one
[23:00] nacc: https://jenkins.ubuntu.com/server/job/git-ubuntu-ci-redux/8/console
[23:00] powersj: thanks
[23:00] powersj: that should pass