=== polymorp- is now known as polymorphic
=== remolej4 is now known as remolej
[03:36] Hello everyone! Interesting problem I'm wrestling with: I have a user with 60TB of video data in two locations, one properly organized, the other location... not. The data within is reasonably certain to be the same. Doing a recopy would take a ridiculous amount of time. Just wondering if there's a way to script walking the "correct" system into a
[03:36] text file (a "map") and then, on the remote system, reading the "map" and recreating the original layout by simply moving inodes around
[03:38] Guest47: What you're trying to do is possible, but it will require a bit of programming/scripting.
[03:38] I probably can't write the script now, but what I'd do is something like this:
[03:39] 1. On the "good" machine, build a tree with the file and directory structure. Get a SHA256 hash of every file in the tree.
[03:39] 2. On the "bad" machine, create a similar tree, making the hashes.
[03:39] If there isn't an existing bit of code you all might know of, what are your thoughts on: "find $1 > map.txt" run on both systems, then a diff between the two maps. Finally, parse the diff file so that if correct-folder-A exists, mv file-a folder-A. If the folder doesn't exist, mkdir -p, then move... simplistic, I know :-/
[03:39] oh
[03:39] 3. Match the hashes up against each other.
[03:39] Then you can be sure what file is what and be able to move it where it goes.
[03:40] If the filenames are guaranteed to be unique *and* identical filenames on both systems indicate identical files, you can skip hashing and just use the filenames.
[03:40] Guest47: The way you're mentioning sounds fairly straightforward. Depending on how the data on the servers works, it might be that easy.
[03:41] I don't know for sure how exactly the data is going to behave though, so my suggestion should work in any event (so long as the videos are bit-for-bit identical).
[03:41] arraybolt3: thanks for the thoughts! I'm *hopeful* the file names are distinct, but without some kind of checksum, there's no way to be certain, right?
[03:41] True.
[03:41] And sha256sum'ing 60 TB of data is going to take *looooooong*.
[03:42] Is there something faster for checksumming than SHA? Is CRC32 still considered reasonable?
[03:42] If your code can cope with hash collisions, then something like a CRC or perhaps MD5 might be sufficient.
[03:42] But if you do that, you *have* to be ready for hash collisions.
[03:43] Guest47: I'd also check whether `rsync --fuzzy` would work for this. I don't know how flexible it is at finding similar files across directories.
[03:43] MD5 is too likely to generate hash collisions to just be blindly trusted, whereas SHA256 is so unlikely to have a hash collision that it should be able to be trusted without any further checking.
[03:43] (At least this is what I have been led to believe from my research.)
[03:43] (If I'm wrong, someone please let me know.)
[03:46] Guest47: If rsync --fuzzy can't be made to work, I'd benchmark the speed of SHA256 on your hardware, then benchmark the network connection, and use whichever one's faster.
[03:46] If you have a 1 Gbps connection between the two, then probably the hashing will be quicker. If you have 100 Gbps, then you might want to just reclone and call it a day.
[03:46] Odds are probably pretty low of a collision if comparing the simple sum and the filename... any collisions still found, I guess I'd just log
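A minimal sketch of the map-and-match approach discussed above, assuming GNU findutils/coreutils on both hosts; the directory and map-file names are illustrative, and the simple field splitting assumes paths contain no whitespace:

    # On the "good" machine: record one "HASH  relative/path" line per file
    cd /srv/videos-good                          # illustrative path
    find . -type f -print0 | xargs -0 b2sum > map_good.txt

    # On the "bad" machine (after copying map_good.txt over): hash the same way,
    # then join the two maps on the hash and move each file to its "good" path
    cd /srv/videos-bad                           # illustrative path
    find . -type f -print0 | xargs -0 b2sum > map_bad.txt
    sort -k1,1 map_good.txt > good.sorted
    sort -k1,1 map_bad.txt  > bad.sorted
    join -j 1 -o 2.2,1.2 good.sorted bad.sorted |
    while read -r current wanted; do
        mkdir -p "$(dirname "$wanted")"
        mv -n "$current" "$wanted"               # -n: never overwrite; leftovers can be logged
    done

Swapping b2sum for sha256sum (or for a cheaper size-only first pass) only changes the hashing step; the join/move logic stays the same.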
[03:47] Sadly, it's fully remote; one system is in LA, the other near San Diego
[03:47] Best one could hope for is 1 Gbps, but likely less
[03:48] Blah.
[03:48] Does copy time exceed programming/debugging time? It never ever does hahaha
[03:48] arraybolt3: on some hardware, sha512sum is ~2x faster than sha256sum
[03:48] sdeziel: Oh really? Didn't know that. That would be quite handy.
[03:49] What is the "sum" command hashing?
[03:49] That's a 16-bit checksum.
[03:49] You are going to have so many many many collisions with that.
[03:49] for 60TB I'd try the rsync approach of checking the metadata (size, name, etc.)
[03:50] comparing the size alone should give you a list of likely copies
[03:51] well... "man sum" seems to show the 16-bit sum and the file size, so that could be kind of useful (maybe?)
[03:51] sdeziel: Heh, what do you know, my system is one of the ones that can sha512 faster than it can sha256.
[03:51] arraybolt3: re sha512sum being faster, it's apparently due to operating on larger blocks (https://crypto.stackexchange.com/questions/26336/sha-512-faster-than-sha-256)
[03:51] (It also makes my system make an odd whining noise? That's creepy.)
[03:52] There's also blake2b.
[03:52] That one beats the pants off sha256 and sha512.
[03:53] brb, laundry. Before I go, can I just say that I really appreciate your input on this? Thanks arraybolt3 and sdeziel
[03:53] Guest47: Sure thing!
[03:53] Yeah, seeing the speed of b2sum, I'd use that. It makes long hashes so it will probably not have hash collisions, and it operates at mind-bending speeds for me - almost a full GiB/s when being fed /dev/zero.
[03:54] Guest47: np. Lastly, I wouldn't use `sum`, as you'd end up reading 60TB from the disks anyway... checking the size would pull way less from the filesystem and give you a good enough first indicator
[03:54] And piping it an ISO file, it's still way, way faster than SHA.
[03:54] I'd keep the heavy hash computation for files that are close in size, to disambiguate them
[03:58] (There's also b3sum, which appears to be even faster: https://github.com/BLAKE3-team/BLAKE3)
[04:41] 2.54s for b2sum vs 1.95s for sum on the same 225 MB file. Hilariously, sum took 1m33.99s on an 8.01 GB file; b2sum took 1m18.26s on the same file
[04:47] I'll take a whack at writing the utility to do this tomorrow. I'll drop by and share it with you guys when I finish, if you'd like
[04:50] Thanks! Good luck!
[05:35] Guest47: when doing such benchmarks, you should make sure they use the same cache. Either drop caches, or cache the whole file
=== xispita is now known as Guest4041
=== xispita_ is now known as xispita
[11:42] hey everyone... I have a server affected by this bug - https://bugs.launchpad.net/ubuntu/+source/cloud-initramfs-tools/+bug/1958260 - as the bug discussion points out, this only happens if the /lib/modules dir for your kernel version is empty in the initramfs - see this code: https://git.launchpad.net/cloud-initramfs-tools/tree/copymods/scripts/init-bottom/copymods#n52
[11:42] -ubottu:#ubuntu-server- Launchpad bug 1958260 in cloud-initramfs-tools (Ubuntu) "cloud-initramfs-copymods hides the full list of modules from the system" [High, Incomplete]
[11:44] unfortunately, since the server is now in this state, it seems difficult to get it unstuck. I would like to manually intervene, either in the initramfs or in this script that is in there, so that I can get my system back into a cleanly booting form...
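A hedged sketch of how one might inspect that state before intervening, using standard initramfs-tools commands; whether regenerating the initramfs is enough depends on how the machine got into this state:

    # What does the current initramfs ship for the running kernel?
    # (the pattern matches both lib/modules and usr/lib/modules layouts)
    lsinitramfs "/boot/initrd.img-$(uname -r)" | grep "modules/$(uname -r)/" | head

    # Compare with what the root filesystem actually has installed
    ls "/lib/modules/$(uname -r)" | head

    # Regenerate the initramfs for this kernel after any manual fix-up of the module tree
    sudo update-initramfs -u -k "$(uname -r)"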
[12:27] bryceh: this is an interesting git-ubuntu case: https://code.launchpad.net/~bryce/ubuntu/+source/nmap/+git/nmap/+merge/437077
[12:28] while it is my first time dealing with such a case, I suppose it is quite common...
[12:29] rbasak: when the patch gets uploaded, should we expect the current jammy-devel branch to be completely replaced by whatever is in bryceh's branch?
=== justache is now known as deliriumt
=== deliriumt is now known as justache
=== cpaelzer_ is now known as cpaelzer
=== sdeziel_ is now known as sdeziel
[14:13] athos: yes, the branch pointer will just be updated
[14:13] athos: https://bugs.launchpad.net/git-ubuntu/+bug/1852389
[14:13] -ubottu:#ubuntu-server- Launchpad bug 1852389 in git-ubuntu "Branch pointers do not follow deletions, breaking ubuntu/devel and such" [Wishlist, New]
=== otisolsen70_ is now known as otisolsen70
[17:21] good morning all. Attempting to set up apt-cacher-ng: the server is all set up, and now I'm adding auto-apt-proxy to our local nodes. I have configured the hostname/dnsdomainname correctly on the node, but when running auto-apt-proxy, nothing is returned on my first node (it does return on the apt-cacher-ng node)
[17:23] running nslookup with an SRV type for _apt_proxy._tcp.domain.name returns as expected
[17:27] baldpope: if I understand correctly, you have to install the auto-apt-proxy package on your machines to (eventually) have them use your apt-cacher-ng, right? Instead of using a package to auto-detect, wouldn't it be simpler to deploy an apt.conf snippet saying to use a proxy?
[17:27] `/etc/apt/apt.conf.d/01proxy`: `Acquire::http::proxy "http://apt-cacher-ng.domain.name:3142/";`
[17:29] looks like that would accomplish the same thing?
[17:32] thanks sdeziel
[18:45] athos, I don't know if it's common; this is the first time I've come across it myself
=== chris15 is now known as chris14
=== remolej4 is now known as remolej
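A minimal sketch of the static apt.conf approach sdeziel suggested above, assuming the cache really is reachable as apt-cacher-ng.domain.name (substitute the real hostname) and a stock apt on the nodes:

    # Drop a one-line proxy snippet on each node (hostname is illustrative)
    echo 'Acquire::http::proxy "http://apt-cacher-ng.domain.name:3142/";' \
        | sudo tee /etc/apt/apt.conf.d/01proxy

    # Confirm apt sees the proxy, then exercise the cache
    apt-config dump | grep -i 'Acquire::http::proxy'
    sudo apt update

Cache hits should then start showing up in apt-cacher-ng's logs on the cache host.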