/srv/irclogs.ubuntu.com/2020/12/01/#ubuntu-server.txt

=== Mollerz2 is now known as Mollerz
SupaYoshiHi, anyone know a good docker-compose service file for systemd?04:18
SupaYoshiI'm wanting to start docker after reboot. But I want to delay its restart till after a specific service has been started04:18
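A minimal sketch of the kind of unit SupaYoshi is asking about: a oneshot systemd service that brings a compose stack up after another unit has started. The unit name, working directory, and `some-other.service` are placeholders, not a tested setup.

```ini
# /etc/systemd/system/docker-compose-app.service  (hypothetical name and paths)
[Unit]
Description=docker-compose application stack
# Order startup after docker itself and after the service we must wait for
Requires=docker.service
After=docker.service some-other.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/myapp
ExecStart=/usr/bin/docker-compose up -d
ExecStop=/usr/bin/docker-compose down
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable docker-compose-app.service`. Note that `After=` only orders startup; if the stack must not start at all when the other service fails, add that service to `Requires=` as well.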
lordievaderGood morning07:23
SupaYoshigood morning11:10
lordievader👋11:14
SupaYoshi?11:17
=== denningsrogue6 is now known as denningsrogue
sdezielhello, I'm tasked with "vacuuming" a dynamic web site and getting a static copy of it for reasons. wget --mirror does a good first pass but doesn't bring the stuff that JS would cause a regular browser to fetch. Anyone got an idea to solve that?15:14
UssatUnless you have access to the code and can just copy it, nope15:16
sdezielUssat: I have access to the dynamic copy15:17
Ussatjust copy the code, BUT, it wont just work if thats your goal15:17
sdezielI don't see how that helps me :/15:18
sdezielfor example, there is one file with src="/core/assets/vendor/modernizr/modernizr.min.js?v=3.3.1" I could grab a copy from the FS but then I'd miss the "v=3.3.1" query string15:20
UssatRight, there is no way of getting that because its dynamic15:38
sdezielUssat: OK, I guess I'm then left with two choices: scripted headless browser to save the whole deal or an aggressive caching proxy in front of it15:44
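For the wget first pass mentioned above, a few extra flags pull in each page's requisites (CSS, images, fonts referenced from the HTML); anything fetched asynchronously by client-side JS is still missed, as discussed. The domain here is a placeholder:

```shell
# Hedged sketch of a fuller mirror run:
# --mirror           = -r -N -l inf --no-remove-listing
# --page-requisites  = also fetch CSS/JS/images each page references
# --convert-links    = rewrite links for local browsing
# --adjust-extension = save text/html pages with an .html suffix
wget --mirror --page-requisites --convert-links --adjust-extension \
     --no-parent https://example.com/
```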
UssatSO, what's the ultimate outcome you want ?15:44
sdezielbasically survive a planned DDoS15:44
sdezielDDoS is a bit of an abuse, this will be a flood of legitimate clients15:45
UssatSo, thats what backups are for15:45
Ussatyou are approaching this wrong15:46
Ussatif legit clients take you down, this has nothing to do with that15:46
sdezielmaybe I described it wrong because I'm not looking for backups. I want a site that performs well under heavy load15:46
Ussatbeef up your infra15:46
Ussatso you need a beefier infra then15:46
sdezielhence the idea to turn a dynamic monster into a static small site15:46
UssatNot gonna happen15:47
Ussatonly way to accomplish what you want is a beefier server15:47
tewardsdeziel: one of the teams at my employer had a similar case to what you wanted.  Without a full copy of the site it was a page-by-page identification and scrape.15:48
tewardto get a static copy.15:48
tewardand that was a heavily MANUAL process because of the JS, etc.15:48
tewardif you have a copy of the code, DB, etc. that's what you'll need.15:48
sdezielteward: I'm betting on nginx fronting the dynamic site and doing aggressive caching of everything (HTML included)15:48
tewardas for surviving a planned stress load, you need the beefier server15:49
tewardsdeziel: trust me that won't work very well15:49
tewarddynamic sites, NGINX doesn't really cache dynamic sites well because usually there's underlying headers, tags, etc. that indicate to NOT keep a cached copy15:49
sdezielwe will have Cloudflare in front of that cache of ours15:49
tewardyour process is approaching this wrong15:50
teward> hence the idea to turn a dynamic monster into a static small site15:50
tewardimpossible and not going to happen15:50
tewardyou either need:15:50
sdezielhmm15:50
teward(1) larger infra to survive the load15:50
teward(2) backup copy of the site in its entirety running independent from the stress load15:50
sdezielI was maybe too optimistic but I thought I could tell nginx to ignore caching hints and just do as told15:51
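nginx can indeed be told to disregard the backend's caching hints. A rough sketch of what sdeziel describes; the port, zone name, and TTLs are illustrative assumptions, not a tested config:

```nginx
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=sitecache:50m inactive=30m;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;   # assumed backend (Drupal)
        proxy_cache sitecache;
        proxy_cache_valid 200 301 302 15m;  # cache good responses, HTML included
        # Ignore the backend's anti-caching headers
        proxy_ignore_headers Cache-Control Expires Set-Cookie;
        proxy_hide_header Set-Cookie;       # cookies would fragment the cache
        # Serve stale copies while refreshing or when the backend is down
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503;
    }
}
```

As the replies warn, this only helps for server-rendered responses; requests issued by client-side JS still hit whatever URL they target.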
tewardsdeziel: that may work for static contents but NOT for JS-generated stuff15:51
tewardsince JS and such is all client side executed and such15:51
tewardyour entire premise and approach is backwards of what it needs to be15:51
tewardsure there's ways to get "cached copies" of data15:51
tewardbut that won't be a replacement15:51
tewardbecause for a cached copy to *work* basically **every single potential URL, request, content piece, query string, argument, parameters, etc. needs to go through the proxy for it to work as a 'static copy'**15:52
UssatI would suggest, for what you want, a hot-hot HA setup15:52
sdezielOK so two against my (naive) proposal... not feeling confident anymore15:52
tewardand that's not really a thing you can (a) enumerate sanely, and (b) execute in a way that survives the DDoS15:52
tewardi agree with Ussat15:52
tewardhot-hot HA setup with a loadbalancer/failover mechanism to fail over to the alternate HA on overload15:53
Ussat^^15:53
tewardbut keep in mind if your DDoS is hostname driven and not a single IP target, it will potentially explode there as well15:53
UssatI would suggest, two servers behind something like an F5 load balancer, as an example15:53
teward^ something like that15:53
UssatIt's not trivial15:54
tewardDB HA isn't trivial either15:54
UssatBecause, you're talking shared FS in order to keep things in sync, and agreed teward15:54
tewardright15:54
sdezielOK, should NOT have said DDoS, it's just going to be a flood of legitimate clients15:55
tewardsdeziel: ... which would bypass CF and then hit hard.15:55
tewardsdeziel: same equal problem15:55
UssatSo, something like, 2 servers behind an F5 to balance, that both talk to the same DB, but a beefy DB server15:55
tewardthat'd do what you're after, yes.15:55
Ussatmaybe 2 or 3 webservers to distribute the load15:55
sdezielteward: CF will take the bigger part of the hit assuming I'm crafting the right Cache-Control headers15:55
tewardmight not hurt to have a RO copy of the DB around either as a failover so when you're under load you can serve a readonly15:55
teward(like StackOverflow/SE does)15:56
Ussat^^15:56
tewardsdeziel: you still have to accommodate the percentage of traffic that will slip past15:56
tewardsdeziel: in which case you're still looking at the same problem, full on DDoS or not15:56
UssatYou also need to realise, nothing will be 100% bulletproof15:56
tewardif your system can't handle the *load* of the legitimate clients that CF does *not* block, then you need larger resources15:56
tewardand also what Ussat says15:56
tewardno solution we could provide will be 100% bulletproof15:56
tewardthere'll be ways to still overload things15:57
Ussatyup15:57
tewardand that doesn't account for simply flooding the pipe with requests and exhausting the connection/request queues causing 503s or similar unavailability cases15:57
sdezielI've been told to expect "24k concurrent users" whatever concurrent means15:57
teward(or triggering a ton of 416s)15:57
tewardconcurrent means simultaneously connected and accessing15:57
tewardit's basically a loadtest.15:57
tewardon that basis15:57
sdezielI know what concurrent means but they are management people15:58
tewardyou have to assume you will have 24000 simultaneous requests for information15:58
tewardsdeziel: without more information on the 'attack' you're going to be handling15:58
tewardwe're going to speculate15:58
tewardso your options are:15:58
Ussatsdeziel, I would suggest, if you're putting something together for 24k concurrent users, IRC is NOT the venue for this, I suggest you have a professional or two architect this out15:58
sdeziellet me describe it better15:58
tewardi also agree with Ussat15:58
tewardyou will need a professional network architect involved15:58
Ussatbecause that's not trivial15:58
teward^^15:59
sdezielthe site in question will be announced on public TV so we expect many curious visitors to just look it up15:59
Ussatya, get a pro involved, there is no way over IRC this can be worked out15:59
Ussatalso, this will NOT be cheap15:59
teward^^16:00
tewardand i'm speaking as a professional network security guy here16:00
tewardnot just my Ubuntu hat today ;)16:00
sdezielalright, I appreciate the discussion16:00
sdezielI'll have to evaluate my options quickly16:01
tewardsdeziel: the SIMPLEST solution is temporarily put the site in RO mode and serve read-only copies of the site, let the DB handle things16:01
UssatI used to manage an ecomm site that was very high traffic and ya, get some professionals to look this over16:01
tewardbut you still won't get the solution you're after16:01
UssatThere is no way to do this quickly16:01
tewardyou'll need professionals to design/spec this out16:01
Ussatnot if you want it right16:01
sdezielI need a PoC by EOD16:01
teward^^ that16:01
sdezieland a prod ready on Dec 7th16:01
tewardsdeziel: your bosses need to [CENSORED CENSORED CENSORED]16:02
tewardsdeziel: Impossible16:02
tewardsolution in 6 days is not feasible16:02
UssatNo chance by the 7th16:02
tewardyou need at LEAST 6 days to work with pros to spec out what you need, what your goals are, research options, and get budget approvals to *purchase* any software you need16:02
Ussatnot a chance16:02
teward100% guarantee you will NOT get a solution by the 7th16:02
tewardthis isn't something you can put together last minute16:02
UssatYou will need at least 6 days to start to design this16:02
teward^ that16:03
UssatThis is a 6 month project16:03
tewardyup16:03
* sdeziel is on the phone with said boss16:03
tewardsdeziel: orly?  Put me on the line, I'll give your boss a few dozen words.  *shot*16:03
UssatPlus you need testing, LOTS of load testing16:03
teward(no seriously it's bosses like this that send me off on "I Want To Smack Someone" mode)16:03
UssatHaving done something similar....ya this is at least 6 months16:04
Ussatalso, this is not going to be cheap16:05
sdezielI didn't explain the dynamic nature of the site, it's basically just an information page. There are no sessions or per-user content16:07
sdezielsame content for everyone16:07
UssatThat doesn't change things really16:08
sdezielUssat: in my view, that means it is really cache friendly16:10
sdezieland my (naive) idea is to move all the clients to hit CF instead of my backend16:11
Ussatsure it's cache friendly, but you still will need something beefy to 1) serve that and 2) handle all the connections, and at a minimum a HA solution to handle system failures16:11
UssatI have minimal experience with CF, so I can not speak to that16:12
tewardsdeziel: if the page is NOT a static content page, then it's not cache friendly17:14
sdezielteward: it's a Drupal page so it has the usual clutter but I still aim to cache it as aggressively as possible17:15
sdezieltesting will confirm if I'm deluded17:15
tewardwe're going to go in a cyclical argument again17:16
sdezielbut I don't have any other options I can think of with the time I am allocated17:16
tewardsdeziel: you basically HAVE no options with the time you've been allocated17:16
tewardjust saying17:16
sdezielDec 7th is soon enough17:16
tewardthe previous stuff we stated remains the same.17:16
tewardany solution you do now is a 'bandaid' and '24000 concurrent connections' aren't going to happen at the same exact moment17:16
sdezielunderstood, I just need to deal with the reality I'm facing...17:16
tewardsdeziel: i can acquaint you with the reality instantly.  give me your target domain/etc. and we can loadtest it *now* :P17:17
sdezieloh the page content is still being worked on...17:17
sdezielreality is brutal like that sometimes17:17
tewardsdeziel: content or not, you can just throw a plain page up :p17:54
tewardand then loadtest it yourself via k6.io or one of the things I gave you17:54
tewardshow you a small 50-concurrent-user load test :p17:54
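A k6 script along the lines teward describes might look like this; it needs the k6 runtime (`k6 run script.js`), and the target URL is a placeholder:

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

// 50 concurrent virtual users for 30 seconds
export const options = {
  vus: 50,
  duration: '30s',
};

export default function () {
  const res = http.get('https://example.com/'); // placeholder target
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1); // think time between iterations
}
```

k6 prints latency percentiles and the pass rate of the `check` in its end-of-run summary, which makes the 200-vs-50X ratio discussed later easy to read off.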
tewardthat notwithstanding, you can indicate to your boss if you wish that multiple individuals who have had to deal with this request (including an IT Security professional) have all advised that you're going to be limited in your capabilities with such a short deadline - protecting against DDoS / load is something done over many months NOT six days17:55
tewardbut i rest my case on that17:55
tomreynsdeziel: i guess you could try to produce static content, which makes it scalable. and then host it in the cloud to scale.18:17
sdezieltomreyn: I went that way first only then to realize that it's not super trivial to turn a site into a static version. wget --mirror is only good for stuff pulled from HTML, nothing fetched async by JS...18:18
sdezieltomreyn: for now, I'm taking the reverse proxy+aggressive caching of everything (HTML included) in order to leverage CF "distributed cache"18:20
sdezielmy goal is to only serve the first page load to warm Cloudflare's cache18:20
tomreynsdeziel: the stuff that's fetched async by JS would need to continue to be fetched async by JS. that's only client side dynamics, not server side.18:20
sarnoldsdeziel: to the extent I've read about this, it really helps to have designed the site to clearly separate dynamic vs static portions, so fully static stuff can be served from cache as-is with very little overhead, and the dynamic things are fairly restricted to a small portion of the site, and you can try to cache fragments of rendered pages for assembly by the servers before delivery to clients18:20
sarnoldsdeziel: could you make more aggressive use of fragment caching in your application for those dynamic portions? could you strip some of the dynamic bits from pages that are 99.9% static and get them all the way to 100% static?18:21
sdezielsarnold: yes, unfortunately the content people wanted Drupal with a zillion plugins18:21
sdezielsarnold: I intend to have a reverse proxy adding Cache-Control to everything that passes through it, ignoring what the backend thinks should be cached or not18:22
sdezielthis is essentially building a static copy that's populated 'live', on the first load18:22
sdezieland then only refresh it every few minutes18:23
sdezielin my test, this really leverages CF as I see no requests spilling to my reverse proxy ... I'm a bit more encouraged than earlier ;)18:23
sarnoldsdeziel: does it matter if you wind up serving pages meant for me to your boss?18:25
tomreynabout this     src="/core/assets/vendor/modernizr/modernizr.min.js?v=3.3.1"     this is really just a parameter passed to the javascript code that's evaluated by javascript running on the client. it doesn't involve additional work on the server if you allow all asset/... urls to be cached.18:25
tomreynbut if there's a single .php script in this url path then you break things18:25
sdezielsarnold: that's the point I was probably not clear, the page is identical for every visitor, no cookie nor custom user stuff, just always the same18:26
sdezieltomreyn: the ?v=3.3.1 is meant for cache-busting purposes so I oughta keep it18:26
sdezielor that's how I understand it18:27
sarnoldorrrrrrr strip it off and not bust the cache18:27
sarnolduse last week's version of hypercard.js18:27
sarnoldor whatever they published six seconds ago18:28
sdezielsarnold: the client wants to be able to push live modifications inside 15 minutes...18:28
sdezielso that's the minimum Cache-Control max-age I'll set18:28
sdezielthe query string doesn't bother CF, it can cope with it, its cache will simply grow bigger18:29
tomreynmaybe look into https://tome.fyi/ for a drupal static site generator18:29
sdezielor CF can be told to ignore the query string altogether as you proposed18:29
sdezieltomreyn: interesting, thx for that!18:30
=== ijohnson is now known as ijohnson|lunch
tomreynyou should really train devs to develop websites which produce static code wherever possible, though, and to serve dynamic content from a different (sub)domain.18:34
sdezielI just discovered they baked a search engine into the site ... to search among the possible 3 results....18:35
tomreynyet another option for .js with query parameters is to use remotely (CDN) hosted scripts, making scalability someone else's problem. you can use https://developer.mozilla.org/en-US/docs/Web/Security/Subresource_Integrity#Browser_compatibility to prevent script injection there.18:38
tomreynit's a way to solve a problem by adding twenty patches on top instead of solving it at the source, though18:39
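The Subresource Integrity approach tomreyn links looks like this in the page markup; the CDN URL and the hash are placeholders, not real values:

```html
<!-- The browser refuses to run the script if its digest doesn't match -->
<script src="https://cdn.example.com/modernizr.min.js?v=3.3.1"
        integrity="sha384-PLACEHOLDER_BASE64_DIGEST"
        crossorigin="anonymous"></script>
```

The digest is computed over the exact file contents, e.g. `openssl dgst -sha384 -binary modernizr.min.js | openssl base64 -A`, so any change to the hosted file breaks loading until the attribute is updated.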
sdezieltomreyn: my goal is to leverage Cloudflare for that and they are so far doing an amazing job at shielding me18:41
sdezielit's just a matter of serving the first page load when a client happens to go through a cold CF endpoint18:42
UssatThis still ?18:42
sdezielUssat: yes, sorry for the noise today18:43
UssatNot noise, not at all, but I don't think you are understanding, fundamentally, you have almost zero chance of doing what you want in that time frame18:43
UssatI have managed projects of that scope.....and it's about a 6 month project, if all goes well18:44
sdezielUssat: I may be set to fail but I've been told I have to try18:44
UssatWell, best of luck, and yes, I honestly mean that18:50
sdezielI appreciate it and I'll be sure to report back post Dec 7th ;)18:51
sdezielwill be fun either way it goes18:52
sarnoldyes, good luck sdeziel :) one way or another december 8 will arrive :)18:55
sdezielyeah and we'll have served a huge bunch of web pages, some 200s and some 50Xs. The ratio will be interesting ;)19:12
sarnoldhehehehe19:14
=== cpaelzer__ is now known as cpaelzer
tewardsdeziel: i would still STRONGLY suggest you test this yourself with load testers like I sent you21:38
tewardbecause you will *need* to test this LONG before it's prod-ready21:38
tewardwith only 6 days that means "start testing today" (once a day)21:38
teward... or have someone else strike you hard with it (I have a k6 instance I can toss at you if needed :P)21:39
tewardpoint still stands, JS is client side stuff so21:39
tewardif it does any type of query to the server for data your attempt to cache will go into the hell that we warned you about21:40
sdezielteward: I just learned that the official domain will be communicated to us at the latest 3 hours before the official launch21:41
tewardwell... then you're SOL :P21:41
sdeziel:)21:42
Ussat3 hours21:44
sarnoldthree hours21:47
sarnoldit'll be fun to hear their reactions when the domain name they intended to buy has been bought because the whois site they used front-ran the domain purchase :)21:47
tewardlol21:49
tewardsdeziel: if you know the IP you can still throw salt at it.  OR run k6 locally, i've got a script you can use you just have to adjust the path(s) internally21:50
=== misuto8 is now known as misuto

Generated by irclog2html.py 2.7 by Marius Gedminas - find it at mg.pov.lt!