Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

@mgdigital · 1 year ago

Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

@[email protected] · 1 year ago

This looks really cool! How resource intensive is this? What sort of storage requirements are there for this to be a reasonably reliable method of acquiring media? I’m probably just gonna find out myself. I’ve recently fully switched over to usenet, but this could make torrents pretty compelling again.

@mgdigital · 1 year ago

Hi, and thanks!

As a priority I’d like to gather some more rigorous performance benchmarks, but I can give you some hand-wavey stats now: Bitmagnet is currently fluctuating between 2-10% CPU usage on my M2 Mac Mini, and is using ~120MB of memory having currently been running for around 48 hours. Overall, the GoLang implementation seems pretty efficient to me considering how much I know is going on in the background.

Disk space usage of the database- this will be highly dependent on 2 configuration options, the first of which I’ve only just added in the just-released version. Copied from the configuration page of the website:

dht_crawler.save_files (default: true): If true, file metadata from the DHT crawler will be saved to the database. This provides more rich information about a torrent, but will use a lot more disk space. If disk space is at a premium you may want to consider disabling this.
dht_crawler.save_pieces (default: false): If true, the DHT crawler will save the pieces bytes from the torrent metadata. The pieces take up quite a lot of space, and aren’t currently very useful, but they may be used by future features.

For me, 24 hours of crawling uses ~2.5GB of database disk space for metadata on the ~120k torrents it has discovered. Yep, that sounds like a lot, however 90% of that is taken up with the files metadata, and could have been saved by setting dht_crawler.save_files to false. In fact I may set this to false by default and allow users to opt-in to the full-fat torrent info.

I’ve also imported the entire RARBG backup (the SQLite one, see tutorial on the Bitmagnet website). This, along with all the associated metadata from TMDB, took around 4GB of database space, which seems quite acceptable considering it’s basically every movie and TV show. Note that this does NOT include the metadata on individual files as I described above.

A priority feature for me (detailed on website) is smart deletion - this would allow you to automatically discard a lot of data that can be automatically determined of no interest and therefore greatly reduce disk space demands.

@kautau · edit-2 1 year ago

As someone interested in Usenet, what’s the best provider and client to start with in your opinion?

@[email protected] · 1 year ago

I’ve been using easynews/nzbgeek/nzbget with an arr stack on debian and it’s worked well for me. I’m fairly new to usenet, so take this with a giant grain of salt.

@kautau · 1 year ago

Cool, thanks for the reply!

Kushan · 1 year ago

Sabnzbd is probably the best choice of download client, fyi.

CosmicApe · 1 year ago

Linux program names are fucking wild

@deafboy · 1 year ago

Running for 6 days, save_pieces: false

My database is currently 184 GB

Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

Introducing Bitmagnet: A self-hosted BitTorrent indexer, DHT crawler, content classifier and torrent search engine with web UI, GraphQL API and Servarr stack integration

Home

What is a DHT crawler?

Currently implemented features of Bitmagnet:

Interested?