=== Mollerz2 is now known as Mollerz
[04:18] Hi, does anyone know a good docker-compose service file for systemctl?
[04:18] I'm wanting to start docker after reboot, but I want to delay its restart until after a specific service has been started
[07:23] Good morning
[11:10] good morning
[11:14] 👋
[11:17] ?
=== denningsrogue6 is now known as denningsrogue
[15:14] hello, I'm tasked with "vacuuming" a dynamic web site to get a static copy of it, for reasons. wget --mirror does a good first pass but doesn't bring in the stuff that JS would cause a regular browser to fetch. Anyone got an idea to solve that?
[15:16] Unless you have access to the code and can just copy it, nope
[15:17] Ussat: I have access to the dynamic copy
[15:17] just copy the code, BUT, it won't just work if that's your goal
[15:18] I don't see how that helps me :/
[15:20] for example, there is one file with src="/core/assets/vendor/modernizr/modernizr.min.js?v=3.3.1". I could grab a copy from the FS but then I'd miss the "v=3.3.1" query string
[15:38] Right, there is no way of getting that because it's dynamic
[15:44] Ussat: OK, I guess I'm then left with two choices: a scripted headless browser to save the whole deal, or an aggressive caching proxy in front of it
[15:44] SO, what's the ultimate outcome you want?
[15:44] basically to survive a planned DDoS
[15:45] DDoS is a bit of an abuse of the term, this will be a flood of legitimate clients
[15:45] So, that's what backups are for
[15:46] you are approaching this wrong
[15:46] if legit clients take you down, this has nothing to do with that
[15:46] maybe I described it wrong because I'm not looking for backup.
I want a site that performs well under heavy load
[15:46] beef up your infra
[15:46] so you need a beefier infra then
[15:46] hence the idea to turn a dynamic monster into a static small site
[15:47] Not gonna happen
[15:47] only way to accomplish what you want is a beefier server
[15:48] sdeziel: one of the teams at my employer had a similar case to what you wanted. Without a full copy of the site it was a page-by-page identification and scrape.
[15:48] to get a static copy.
[15:48] and that was a heavily MANUAL process because of the JS, etc.
[15:48] if you have a copy of the code, DB, etc. that's what you'll need.
[15:48] teward: I'm betting on nginx fronting the dynamic site and doing aggressive caching of everything (HTML included)
[15:49] as for surviving a planned stress load, you need the beefier server
[15:49] sdeziel: trust me, that won't work very well
[15:49] NGINX doesn't really cache dynamic sites well because usually there are underlying headers, tags, etc. that indicate NOT to keep a cached copy
[15:49] we will have Cloudflare in front of that cache of ours
[15:50] your process is approaching this wrong
[15:50] > hence the idea to turn a dynamic monster into a static small site
[15:50] impossible and not going to happen
[15:50] you either need:
[15:50] hmm
[15:50] (1) larger infra to survive the load
[15:50] (2) a backup copy of the site in its entirety running independent from the stress load
[15:51] I was maybe too optimistic but I thought I could tell nginx to ignore caching hints and just do as told
[15:51] sdeziel: that may work for static contents but NOT for JS-generated stuff
[15:51] since JS and such is all client-side executed
[15:51] your entire premise and approach is backwards from what it needs to be
[15:51] sure there are ways to get "cached copies" of data
[15:51] but that won't be a replacement
[15:52] because for a cached copy to *work* basically **every single potential URL, request, content piece,
query string, argument, parameters, etc. needs to go through the proxy for it to work as a 'static copy'**
[15:52] I would suggest, for what you want, a hot-hot HA setup
[15:52] OK so two against my (naive) proposal... not feeling confident anymore
[15:52] and that's not really a thing you can (a) enumerate sanely, and (b) execute in a way that survives the DDoS
[15:52] i agree with Ussat
[15:53] hot-hot HA setup with a loadbalancer/failover mechanism to fail over to the alternate HA on overload
[15:53] ^^
[15:53] but keep in mind if your DDoS is hostname driven and not a single IP target, it will potentially explode there as well
[15:53] I would suggest two servers behind something like an F5 load balancer, as an example
[15:53] ^ something like that
[15:54] It's not trivial
[15:54] DB HA isn't trivial either
[15:54] Because you're talking shared FS in order to keep things in sync, and agreed teward
[15:54] right
[15:55] OK, I should NOT have said DDoS, it's just going to be a flood of legitimate clients
[15:55] sdeziel: ... which would bypass CF and then hit hard.
[15:55] sdeziel: same equal problem
[15:55] So, something like 2 servers behind an F5 to balance, that both talk to the same DB, but a beefy DB server
[15:55] that'd do what you're after, yes.
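The "two servers behind something like an F5" idea above could also be sketched with an open-source balancer such as HAProxy. This is only an illustrative sketch, not what the channel recommended verbatim; the backend names, addresses, and the /healthz check path are placeholders:

```
# Hypothetical HAProxy sketch: two web servers behind one balancer,
# with health checks so a failed backend stops receiving traffic.
frontend www
    bind *:443 ssl crt /etc/haproxy/site.pem
    default_backend webfarm

backend webfarm
    balance roundrobin
    option httpchk GET /healthz
    server web1 192.0.2.10:8080 check
    server web2 192.0.2.11:8080 check
```

As the channel notes, the load balancer only helps with web-tier capacity and failover; the shared DB behind it remains a single point of contention.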
[15:55] maybe 2 or 3 webservers to distribute the load
[15:55] teward: CF will take the bigger part of the hit assuming I'm crafting the right Cache-Control headers
[15:55] might not hurt to have a RO copy of the DB around either as a failover, so when you're under load you can serve a read-only copy
[15:56] (like StackOverflow/SE does)
[15:56] ^^
[15:56] sdeziel: you still have to accommodate the percentage of traffic that will slip past
[15:56] sdeziel: in which case you're still looking at the same problem, full-on DDoS or not
[15:56] You also need to realise nothing will be 100% bulletproof
[15:56] if your system can't handle the *load* of the legitimate clients that CF does *not* block, then you need larger resources
[15:56] and also what Ussat says
[15:56] no solution we could provide will be 100% bulletproof
[15:57] there'll be ways to still overload things
[15:57] yup
[15:57] and that doesn't account for simply flooding the pipe with requests and exhausting the connection/request queues, causing 503s or similar unavailability cases
[15:57] I've been told to expect "24k concurrent users", whatever concurrent means
[15:57] (or triggering a ton of 416s)
[15:57] concurrent means simultaneously connected and accessing
[15:57] it's basically a loadtest.
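For a sense of scale, "24k concurrent users" can be turned into a rough request rate with a back-of-envelope calculation. The 30-second think time here is an assumption for illustration, not a figure anyone in the channel gave:

```python
# Back-of-envelope sketch: translating "24k concurrent users" into a
# sustained request rate. The think time is an assumed value.
concurrent_users = 24_000
think_time_s = 30  # assumed: each user loads one page every 30 seconds
requests_per_s = concurrent_users / think_time_s
print(requests_per_s)  # 800.0
```

Even under that generous assumption, a sustained 800 req/s of fully dynamic Drupal page renders is far beyond a single unassisted server, which is why the cache-or-beefier-infra argument keeps recurring below.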
[15:57] on that basis
[15:58] I know what concurrent means but they are management people
[15:58] you have to assume you will have 24000 simultaneous requests for information
[15:58] sdeziel: without more information on the 'attack' you're going to be handling
[15:58] we're going to speculate
[15:58] so your options are:
[15:58] sdeziel, I would suggest, if you're putting something together for 24k concurrent users, IRC is NOT the venue for this. I suggest you have a professional or two architect this out
[15:58] let me describe it better
[15:58] i also agree with Ussat
[15:58] you will need a professional network architect involved
[15:58] because that's not trivial
[15:59] ^^
[15:59] the site in question will be announced on public TV so we expect many curious people to just look it up
[15:59] ya, get a pro involved, there is no way this can be worked out over IRC
[15:59] also, this will NOT be cheap
[16:00] ^^
[16:00] and i'm speaking as a professional network security guy here
[16:00] not just my Ubuntu hat today ;)
[16:00] alright, I appreciate the discussion
[16:01] sdeziel: the SIMPLEST solution is to temporarily put the site in RO mode and serve read-only copies of the site
[16:01] I'll have to evaluate my options quickly
[16:01] sdeziel: the SIMPLEST solution is to temporarily put the site in RO mode and serve read-only copies, let the DB handle things
[16:01] I used to manage an ecomm site that was very high traffic and ya, get some professionals to look this over
[16:01] but you still won't get the solution you're after
[16:01] There is no way to do this quickly
[16:01] you'll need professionals to design/spec this out
[16:01] not if you want it right
[16:01] I need a PoC by the EOD
[16:01] ^^ that
[16:01] and a prod-ready one on Dec 7th
[16:02] sdeziel: your bosses need to [CENSORED CENSORED CENSORED]
[16:02] sdeziel: Impossible
[16:02] a solution in 6 days is not feasible
[16:02] No chance by the 7th
[16:02] you need at LEAST 6 days to work with pros to spec
out what you need, what your goals are, research options, and get budget approvals to *purchase* any software you need
[16:02] not a chance
[16:02] 100% guarantee you will NOT get a solution by the 7th
[16:02] this isn't something you can put together last minute
[16:02] You will need at least 6 days to start to design this
[16:03] ^ that
[16:03] This is a 6-month project
[16:03] yup
[16:03] * sdeziel is on the phone with said boss
[16:03] sdeziel: orly? Put me on the line, I'll give your boss a few dozen words. *shot*
[16:03] Plus you need testing, LOTS of load testing
[16:03] (no seriously, it's bosses like this that send me off into "I Want To Smack Someone" mode)
[16:04] Having done something similar... ya, this is at least 6 months
[16:05] also, this is not going to be cheap
[16:07] I didn't explain the dynamic nature of the site: it's basically just an information page. There is no session or custom content served per user
[16:07] same content for everyone
[16:08] That doesn't change things really
[16:10] Ussat: in my view, that means it is really cache friendly
[16:11] and my (naive) idea is to move all the clients to hit CF instead of my backend
[16:11] sure it's cache friendly, but you still will need something beefy to 1) serve that and 2) handle all the connections, and at a minimum an HA solution to handle system failures
[16:12] I have minimal experience with CF, so I can not speak to that
[17:14] sdeziel: if the page is NOT a static content page, then it's not cache friendly
[17:15] teward: it's a Drupal page so it has the usual clutter but I still aim to cache it as aggressively as possible
[17:15] testing will confirm if I'm deluded
[17:16] we're going to go in a cyclical argument again
[17:16] but I don't have any other options I can think of with the time I am allocated
[17:16] sdeziel: you basically HAVE no options with the time you've been allocated
[17:16] just saying
[17:16] Dec 7th is soon enough
[17:16] the previous stuff we stated remains the
same.
[17:16] any solution you do now is a 'bandaid', and '42000 concurrent connections' aren't going to happen at the same exact moment
[17:16] understood, I just need to cope with the reality I'm facing...
[17:17] sdeziel: i can acquaint you with the reality instantly. give me your target domain/etc. and we can loadtest it *now* :P
[17:17] oh, the page content is still being worked on...
[17:17] reality is brutal like that sometimes
[17:54] sdeziel: content or not, you can just throw a plain page up :p
[17:54] and then loadtest it yourself via k6.io or one of the things I gave you
[17:54] show you a small 50-concurrent-user load test :p
[17:55] that notwithstanding, you can indicate to your boss if you wish that multiple individuals who have had to deal with this request (including an IT Security professional) have all advised that you're going to be limited in your capabilities with such a short deadline - protecting against DDoS / load is something done over many months, NOT six days
[17:55] but i rest my case on that
[18:17] sdeziel: i guess you could try to produce static content, which makes it scalable, and then host it in the cloud to scale.
[18:18] tomreyn: I went that way first, only to then realize that it's not super trivial to turn a site into a static version. wget --mirror is only good for stuff pulled from HTML, nothing fetched async by JS...
[18:20] tomreyn: for now, I'm taking the reverse proxy + aggressive caching of everything (HTML included) in order to leverage CF's "distributed cache"
[18:20] my goal is to only serve the first page load to warm Cloudflare's cache
[18:20] sdeziel: the stuff that's fetched async by JS would need to continue to be fetched async by JS. that's only client-side dynamics, not server side.
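The k6.io load test suggested above might look roughly like this minimal sketch. Note that k6 scripts run only under the `k6` CLI (`k6 run script.js`), not plain Node; the target URL, stage durations, and think time here are all placeholders:

```javascript
// Minimal k6 sketch of the "small 50-concurrent-user load test"
// mentioned above. URL and durations are assumptions.
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // ramp up to 50 virtual users
    { duration: '3m', target: 50 },  // hold steady
    { duration: '1m', target: 0 },   // ramp back down
  ],
};

export default function () {
  http.get('https://staging.example.com/');  // placeholder target
  sleep(1);                                  // per-VU think time
}
```

Ramping up gradually, rather than hitting full concurrency at once, matches the observation above that thousands of connections never arrive at the same exact moment.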
[18:20] sdeziel: to the extent I've read about this, it really helps to have designed the site with clearly separated dynamic vs static portions, so fully static stuff can be served from cache as-is with very little overhead, the dynamic things are restricted to a small portion of the site, and you can try to cache fragments of rendered pages for assembly by the servers before delivery to clients
[18:21] sdeziel: could you make more aggressive use of fragment caching in your application for those dynamic portions? could you strip some of the dynamic bits from pages that are 99.9% static and get them all the way to 100% static?
[18:21] sarnold: yes, unfortunately the content people wanted a Drupal with a zillion plugins
[18:22] sarnold: I intend to have a reverse proxy adding Cache-Control to everything that passes by it and ignore what the backend thinks should be cached or not
[18:22] this is essentially building a static copy that's populated 'live', on the first load
[18:23] and then only refresh it every few minutes
[18:23] in my test, this really leverages CF as I see no requests spilling over to my reverse proxy... I'm a bit more encouraged than earlier ;)
[18:25] sdeziel: does it matter if you wind up serving pages meant for me to your boss?
[18:25] about this src="/core/assets/vendor/modernizr/modernizr.min.js?v=3.3.1": this is really just a parameter passed to the javascript code that's evaluated by javascript running on the client. it doesn't involve additional work on the server if you allow all asset/... urls to be cached.
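The reverse-proxy plan described above (cache everything, override the backend's caching hints, refresh every few minutes) might look roughly like this in nginx. This is a hedged sketch, not a vetted config: the cache zone name, backend address, and the 15-minute TTL (taken from the client's "live modifications within 15 minutes" requirement mentioned later) are assumptions:

```
# Hypothetical nginx sketch of "cache everything, ignore the
# backend's caching hints" in front of a Drupal backend.
proxy_cache_path /var/cache/nginx keys_zone=drupal:10m max_size=1g inactive=1h;

server {
    listen 80;
    location / {
        proxy_pass http://127.0.0.1:8080;   # assumed Drupal backend

        proxy_cache drupal;
        # Ignore what the backend says about cacheability.
        proxy_ignore_headers Cache-Control Expires Set-Cookie;
        proxy_hide_header    Set-Cookie;
        proxy_cache_valid    200 15m;       # refresh every 15 minutes

        # Tell Cloudflare and browsers they may cache the result too.
        add_header Cache-Control "public, max-age=900";
    }
}
```

Hiding and ignoring Set-Cookie is what makes this safe only for a site that is genuinely identical for every visitor, as the log stresses; any per-user response would be served to the wrong people, exactly the "pages meant for me to your boss" concern raised above.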
[18:25] but if there's a single .php script in this url path then you break things
[18:26] sarnold: that's the point I was probably not clear on: the page is identical for every visitor, no cookie nor custom per-user stuff, just always the same
[18:26] tomreyn: the ?v=3.3.1 is meant for cache-busting purposes so I ought to keep it
[18:27] or that's how I understand it
[18:27] orrrrrrr strip it off and not bust the cache
[18:27] use last week's version of hypercard.js
[18:28] or whatever they published six seconds ago
[18:28] sarnold: the client wants to be able to push live modifications within 15 minutes...
[18:28] so that's the minimum Cache-Control max-age I'll set
[18:29] the query string doesn't bother CF, it can cope with it, its cache will simply grow bigger
[18:29] maybe look into https://tome.fyi/ for a drupal static site generator
[18:29] or CF can be told to ignore the query string altogether as you proposed
[18:30] tomreyn: interesting, thx for that!
=== ijohnson is now known as ijohnson|lunch
[18:34] you should really train devs to develop websites which produce static code wherever possible, though, and to serve dynamic content from a different (sub)domain.
[18:35] I just discovered they baked a search engine into the site... to search among the possible 3 results...
[18:38] yet another option for .js with query parameters is to use remotely (CDN) hosted scripts, making scalability someone else's problem. you can use https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity#Browser_compatibility to prevent script injection there.
[18:39] it's a way to solve a problem by adding twenty patches on top instead of solving it at the source, though
[18:41] tomreyn: my goal is to leverage Cloudflare for that and they are so far doing an amazing job at shielding me
[18:42] it's just a matter of serving the first page load when a client happens to go through a cold CF endpoint
[18:42] This still?
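"Telling the cache to ignore the query string", as proposed above for the `?v=3.3.1` cache buster, just means normalizing the cache key so variants collapse into one entry. A tiny illustration (the `cache_key` function is made up for this sketch, not a Cloudflare or nginx API):

```python
from urllib.parse import urlsplit, urlunsplit

def cache_key(url: str, ignore_query: bool = True) -> str:
    """Normalize a URL into a cache key, optionally dropping the query
    string so /x.js?v=3.3.1 and /x.js share a single cache entry."""
    parts = urlsplit(url)
    if ignore_query:
        parts = parts._replace(query="")
    return urlunsplit(parts)

print(cache_key("https://example.com/core/assets/vendor/modernizr/modernizr.min.js?v=3.3.1"))
# https://example.com/core/assets/vendor/modernizr/modernizr.min.js
```

The trade-off discussed in the log falls out of this directly: keep the query string in the key and the cache grows (one entry per `?v=`), or drop it and lose cache busting, so a freshly published asset version may keep serving the old copy until the TTL expires.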
[18:43] Ussat: yes, sorry for the noise today
[18:43] Not noise, not at all, but I don't think you are understanding: fundamentally, you have almost zero chance of doing what you want in that time frame
[18:44] I have managed projects of that scope... and it's about a 6-month project, if all goes well
[18:44] Ussat: I may be set up to fail but I've been told I have to try
[18:50] Well, best of luck, and yes, I honestly mean that
[18:51] I appreciate it and I'll be sure to report back post Dec 7th ;)
[18:52] will be fun either way it goes
[18:55] yes, good luck sdeziel :) one way or another december 8 will arrive :)
[19:12] yeah, and we'll have served a huge bunch of web pages, some 200 and some 50X. The ratio will be interesting ;)
[19:14] hehehehe
=== cpaelzer__ is now known as cpaelzer
[21:38] sdeziel: i would still STRONGLY suggest you test this yourself with load testers like I sent you
[21:38] because you will *need* to test this LONG before it's prod-ready
[21:38] with only 6 days that means "start testing today" (once a day)
[21:39] ... or have someone else strike you hard with it (I have a k6 instance I can toss at you if needed :P)
[21:39] point still stands, JS is client-side stuff, so
[21:40] if it does any type of query to the server for data, your attempt to cache will go into the hell that we warned you about
[21:41] teward: I just learned that the official domain will be communicated to us at the latest 3 hours before the official launch
[21:41] well... then you're SOL :P
[21:42] :)
[21:44] 3 hours
[21:47] three hours
[21:47] it'll be fun to hear their reactions when the domain name they intended to buy has already been bought because the whois site they used front-ran the domain purchase :)
[21:49] lol
[21:50] sdeziel: if you know the IP you can still throw salt at it. OR run k6 locally, i've got a script you can use, you just have to adjust the path(s) internally
=== misuto8 is now known as misuto
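On the closing "if you know the IP you can still throw salt at it" point: one way to exercise a server before its hostname exists in DNS is to pin the eventual name to the known IP on the client side, for example with curl's --resolve flag. The domain and IP below are placeholders:

```
# Placeholder name and IP: pin launch.example.com to 203.0.113.7 so the
# request reaches the known server with the expected Host header / SNI.
curl --resolve launch.example.com:443:203.0.113.7 \
     -sS -o /dev/null -w '%{http_code} %{time_total}s\n' \
     https://launch.example.com/
```

The same trick works for warming or testing a cache layer before launch, though with only three hours between the domain reveal and go-live, Cloudflare's own DNS and cache configuration would still have to be scripted in advance.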