Kage makes it stupid simple to archive websites before they disappear, and it has become my favorite read-it-later app


One of my biggest reasons for starting a home lab was my obsession with archiving data. After all, storing a few dozen TBs of data on the cloud isn’t feasible in the long term, as I’d end up shelling out hundreds of bucks in just a few years. Not to mention, relying on an external platform to store my painstakingly-collected media isn’t ideal from a privacy standpoint.

But aside from my movie collection, ripped ROM files, and ebook library, I also have smaller-sized files that I prefer to store on my local workstations. Web pages are one such medium, and between link rot corrupting my curated collection of posts and JavaScript-laden websites refusing to run well even after I save them, I was starting to run out of options for my website archival needs. Fortunately, I ran into Kage while doom-scrolling on GitHub the other day, and this tool has everything I need to store entire websites locally.



Saving pages the conventional way isn’t ideal for JavaScript-riddled websites

And it’s far too cumbersome when storing full-on websites

Saving a web page on Brave

Most web browsers ship with the ability to snapshot web pages and store them on a local machine. Unfortunately, unless a website is a simple collection of HTML pages, the built-in save functionality results in a jumbled collection of UI elements, because the saved copy can no longer reach the JavaScript code it depends on. And if the website in question goes down, those scripts disappear along with it, rendering the saved instance completely useless.

For folks who want to archive entire websites, saving every page manually is a grueling task, too. Even though bookmark managers provide a neat catalog of saved links, manually adding every possible web page to them would be a test of patience, especially for large websites with thousands of pages. Then there’s the problem of local navigation: these tools typically capture only a single page, which makes internal jumps really annoying on academic websites where I need to traverse dozens of pages to understand a specific topic.

Kage solves this archival problem with a neat workaround

And it lets me preview the saved pages by running a local server

Unlike the save facility on web browsers or bookmark managers, Kage provides a neat way to bypass the JavaScript limitation. Rather than just saving the contents of a URL, Kage first renders the entire page in a headless Chromium environment before caching everything under it. That way, the JavaScript content baked into the web page gets captured almost perfectly, though Kage also removes the event handlers and other scripts from the archived copy to get rid of the tracking functionality built into it and make the cached page safe for local browsing. The best part? Internal links, navbar buttons, and site maps work incredibly well on locally-served pages, so I can access all the website elements related to the archived page without manually searching for (and caching) them.
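To give a rough idea of what that clean-up step involves, here’s a minimal Python sketch that takes already-rendered HTML and drops the script elements and inline event handlers. This is not Kage’s actual code; the ScriptStripper class and strip_scripts function are hypothetical names I made up for illustration, and the sketch only relies on Python’s standard html.parser module.

```python
from html import escape
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and on* attributes.

    Hypothetical illustration only; comments in the input are discarded.
    """

    def __init__(self):
        # Keep character references as-is so they can be re-emitted verbatim.
        super().__init__(convert_charrefs=False)
        self.out = []
        self._skip_depth = 0  # > 0 while inside a <script> element

    def _render_attrs(self, attrs):
        kept = []
        for name, value in attrs:
            if name.startswith("on"):  # onclick, onload, onerror, ...
                continue
            if value is None:  # boolean attribute such as "disabled"
                kept.append(f" {name}")
            else:
                kept.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._skip_depth += 1
        elif self._skip_depth == 0:
            self.out.append(f"<{tag}{self._render_attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        if tag != "script" and self._skip_depth == 0:
            self.out.append(f"<{tag}{self._render_attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag == "script":
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self._skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self._skip_depth == 0:
            self.out.append(f"&#{name};")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")  # keeps <!DOCTYPE html>


def strip_scripts(markup):
    """Return markup without <script> elements or inline event handlers."""
    parser = ScriptStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


page = '<div onclick="track()"><a href="/posts/1">Post</a><script>alert(1)</script></div>'
print(strip_scripts(page))
```

The important bit is the order of operations: the page has to be rendered first, so that whatever the scripts generate already exists in the markup, and only then can the scripts themselves be thrown away.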

Another neat aspect of Kage is that it can deploy a server for its archived webpages. As such, I don’t need to stay on my local system just to view my websites. Kage can even turn the cached pages into the ZIM format, which can be easily accessed from the Kiwix app on smartphones. Likewise, Kage can convert an entire website into a self-contained binary, so I can open it directly from the File Explorer. Heck, tossing the --webview flag when building a binary lets Kage run my archived website using the underlying operating system’s WebView component, meaning I don’t even need a web browser to access it.

It’s great for archiving entire websites’ worth of content

With the right tweaks, Kage’s captured content doesn’t hog too much space, either

Packing a website via Kage

Remember how I mentioned that conventional bookmark managers are terrible for storing hundreds of pages? Well, Kage essentially saves full-on websites using a single command. For example, running kage clone followed by a website’s name forces the app to work its magic on every page associated with it. I’ve run it on a handful of websites (including our very own XDA-Developers), and I had to restart the process with the --max-pages 10000 flag. Otherwise, the app would render tens of thousands of pages in headless Chromium and cache them afterwards, a process that, despite being automated, would take several hours.

Kage also includes other useful flags to limit the archival process to specific pages. For example, the --scope-prefix flag, followed by something like /posts, forces Kage to only save the pages whose paths start with this term. Likewise, I’ve used the --exclude flag to prevent Kage from pulling in the useless components of documentation-heavy websites.
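If you’re wondering how these limits interact, here’s a small Python sketch of the kind of filtering a crawler could apply to its queue of discovered links. It’s purely illustrative: the in_scope and select_pages functions are hypothetical names of my own, and I’m assuming that excluded paths are matched by prefix, which may not be how Kage matches them.

```python
from urllib.parse import urlparse


def in_scope(url, site_host, scope_prefix="/", exclude=()):
    """Return True if url belongs to the site and passes both path filters."""
    parts = urlparse(url)
    if parts.netloc != site_host:
        return False  # link points to another website
    path = parts.path or "/"
    if not path.startswith(scope_prefix):
        return False  # outside the requested section, e.g. /posts
    # Assumption: exclusions are simple path prefixes.
    return not any(path.startswith(prefix) for prefix in exclude)


def select_pages(urls, site_host, scope_prefix="/", exclude=(), max_pages=10000):
    """Pick unique in-scope URLs in order, stopping at the page cap."""
    selected = []
    seen = set()
    for url in urls:
        if len(selected) >= max_pages:
            break  # cap reached, ignore everything else
        if url in seen or not in_scope(url, site_host, scope_prefix, exclude):
            continue
        seen.add(url)
        selected.append(url)
    return selected


discovered = [
    "https://example.com/posts/a",
    "https://example.com/about",
    "https://example.com/posts/drafts/x",
    "https://other.org/posts/b",
    "https://example.com/posts/a",
    "https://example.com/posts/c",
]
for page in select_pages(discovered, "example.com", "/posts", exclude=("/posts/drafts",)):
    print(page)
```

The page cap matters most on large sites, since every extra page means another round of headless rendering, while the scope and exclusion filters keep that budget from being spent on pages I don’t care about.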

In my experience, the websites I archive with Kage usually don’t occupy too much space. But the tool also supports the highly-compressed ZIM format, which further reduces the amount of HDD space my archived websites can hog.

But it’s not the only obscure archival tool in my arsenal

An archived video accessible on Jellyfin

Besides Kage, I’ve got a handful of data-hoarding tools that don’t get the same amount of respect as Jellyfin, Plex, and other popular media server apps. Pinchflat, for example, can download YouTube videos in bulk, making it great for archiving entire YouTube channels. I’ve also got a Blinko instance for managing my ever-growing note collection, and its support for LLMs lets me run RAG-based inference tasks on all the ideas I jot down in a hurry. On the bookmark side of things, I typically use the AI-powered Karakeep to house random web pages when I don’t want to use Kage to store the entire website.


