Bittorrent Trackers go down a lot. If you’ve ever used torrents, you’ll know the frustration of one that won’t start, because the trackers are down. The reason for it will amaze you!
Ok, I thought I’d give a go at a buzzfeed-style headline, but the question is one that’s probably run through your mind at one time, and you probably will be a bit shocked/surprised at the answer. The answer simply put is…
Well, not just you, but you and a bunch of people who have gotten into lazy habits. Still, it’s something easily preventable, and something You can do something about.
You see, one of the leading causes of tracker instabilities is just one thing – Traffic. Take TPB for instance. Right now, on the front page it says
51.305.526 peers (39.251.829 seeders + 12.053.697 leechers) in 5.953.886 torrents.
The actual value may differ when you read this, but let’s just take that as a starting point, and round it down to 50 million.
The average bittorrent client updates to a tracker roughly every 30 minutes. It’s an approximation that helps keep the maths a little simpler, but also takes into account starting, stopping, etc. So don’t take these as the world’s most accurate figures, just some ballpark ones to work from.
So, 50M peers, updating every 30 mins. How many per second is that? Well, there are 1,800 seconds in that 30 minute period, so it’s 50,000,000 ÷ 1,800 ≈ 27,777.8 updates every second.
Sounds like a lot doesn’t it? And that’s just the AVERAGE. Something very popular comes out, and that figure will rise, but we’ll round it to a whole number, and again keep it simple.
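For anyone who wants to check the sum, here is a minimal sketch of it. The function name is made up for illustration; the 50 million peers and the 30 minute interval are the ballpark figures from above.

```python
def announces_per_second(peers: int, interval_seconds: int) -> float:
    """Average tracker updates per second if every peer announces once per interval."""
    return peers / interval_seconds


# Ballpark inputs from the text: 50 million peers, one update every 30 minutes.
rate = announces_per_second(50_000_000, 30 * 60)
print(f"{rate:,.1f} updates per second")  # prints "27,777.8 updates per second"
```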
Each TCP tracker request is (again to make things simple) about 100 bytes long. So there are 27,778 requests of 100 bytes hitting the tracker every second. That’s 2,777,800 bytes every second, or close to 2.8 megabytes (about 22 Mbit), each and every second, and that soon adds up. 2.8 megabytes/second average works out to about 7.25 terabytes of data over 30 days. That’s a lot of data.
Then we get to the responses, which is when the tracker sends data back to the client. We’ll assume 25 peers are being sent back, because there are always small swarms, which won’t have enough peers for the default 50. So we’re looking at maybe another 260 bytes per response, or 7.2MB/sec, taking the 30-day total to 18.6TB sent out, and ~25.8TB total.
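The 30-day totals can be reproduced with a few lines. This is only a sketch: the helper name is hypothetical, and it assumes decimal terabytes (10^12 bytes) and the rounded 27,778 requests per second. Because it doesn't round the per-second figure first, it lands within a fraction of a terabyte of the numbers quoted above rather than matching them to the last digit.

```python
SECONDS_IN_30_DAYS = 30 * 24 * 60 * 60  # 2,592,000


def thirty_day_terabytes(messages_per_second: float, bytes_per_message: int) -> float:
    """Traffic over 30 days in decimal terabytes (assumed: 1 TB = 10**12 bytes)."""
    bytes_per_second = messages_per_second * bytes_per_message
    return bytes_per_second * SECONDS_IN_30_DAYS / 1e12


rate = 27_778                                # requests per second, from the text
received = thirty_day_terabytes(rate, 100)   # ~100-byte TCP requests
sent = thirty_day_terabytes(rate, 260)       # ~260-byte responses carrying 25 peers
print(f"received {received:.2f} TB, sent {sent:.2f} TB, total {received + sent:.2f} TB")
```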
That is, of course, for TCP-based trackers. UDP ones are slightly different, since they use a send-receive-send-receive pattern, but overall it works out to 114 bytes sent, and 186 bytes returned, which at our loads means 3.1MB/sec received and 5.1MB/sec sent, giving 30-day totals of 8.2TB received and 13.4TB sent (21.6TB total).
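The 114 and 186 byte figures line up with the fixed-size messages of the UDP tracker protocol. The sketch below rebuilds them from the field layout as I understand it from BEP 15 (a connect exchange followed by an announce exchange, with IPv4 peers at 6 bytes each). The layout strings and names are my own rendering, not something spelled out in this post, so treat them as an assumption.

```python
import struct

# Field layouts as I read BEP 15 (assumption); ">" means big-endian, no padding.
CONNECT_REQUEST = ">QII"                  # protocol id, action, transaction id
CONNECT_RESPONSE = ">IIQ"                 # action, transaction id, connection id
ANNOUNCE_REQUEST = ">QII20s20sQQQIIIiH"   # connection id ... num_want, port
ANNOUNCE_RESPONSE_HEADER = ">IIIII"       # action, transaction id, interval, leechers, seeders
PEER_ENTRY = ">IH"                        # IPv4 address, port


def client_bytes() -> int:
    """Bytes the client sends for one connect + announce round."""
    return struct.calcsize(CONNECT_REQUEST) + struct.calcsize(ANNOUNCE_REQUEST)


def tracker_bytes(peers_returned: int) -> int:
    """Bytes the tracker sends back for one connect + announce round."""
    return (struct.calcsize(CONNECT_RESPONSE)
            + struct.calcsize(ANNOUNCE_RESPONSE_HEADER)
            + peers_returned * struct.calcsize(PEER_ENTRY))


rate = 27_778                  # rounds per second, from the text
seconds = 30 * 24 * 60 * 60    # 30 days
received_tb = rate * client_bytes() * seconds / 1e12
sent_tb = rate * tracker_bytes(25) * seconds / 1e12
print(client_bytes(), tracker_bytes(25))
print(f"{received_tb:.1f} TB received, {sent_tb:.1f} TB sent")
```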
And of course, none of this takes into account overheads, which add about 3% onto TCP traffic, and half as much for UDP traffic (a better example is here)
It sounds a lot, and it is; and what’s more, most of it is preventable.
But I don’t understand. Example?
In the case of the Pirate Bay, when a torrent is uploaded, it is actually modified. Not in a way that affects the hash, but through the addition of trackers. It used to be 3, now it’s 5, which is odd since I only had one tracker – the Pirate Party of Canada’s CaPT tracker – on the torrent of No Safe Harbor when I uploaded it. But now there are five if you get it via TPB, and they are:
So what’s going on then?
Well, let’s look at things as a progression. As you can see they’re numbered. So let’s start with tracker number 1.
Every client that runs this torrent will announce to this tracker. We all know that and understand that, right? It’s pretty obvious. So let’s take that as read, and move on.
Now, the second tracker is a bit more nuanced. In theory, every client should announce to this tracker as well, but the reality is a little different. See, not all clients support multiple trackers. If we assume that about 1% of all clients don’t, that means that on tracker 2, you will find listed 99% of the peers that are on tracker 1. What you won’t find, however, is anyone on tracker 2 that isn’t on tracker 1.
Trackers 3 to 5 are the same story as tracker 2 above. In fact, the same is true for any number of additional trackers.
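The subset argument is easy to see with sets. This is a toy model under the assumptions used above: every client announces to tracker 1, and a made-up 1% of clients (here, every hundredth peer id) can't handle more than one tracker. The peer ids and variable names are purely illustrative.

```python
# Toy swarm of 1,000 peers, identified by number (illustrative only).
peers = set(range(1, 1001))

# Assumed 1% of clients that only ever talk to the first tracker.
single_tracker_only = {p for p in peers if p % 100 == 0}

tracker1 = set(peers)                   # everyone announces here
tracker2 = peers - single_tracker_only  # only multi-tracker clients announce here

print(len(tracker1), len(tracker2))     # 1000 990
print(tracker2 <= tracker1)             # True: tracker 2 knows nobody new
print(len(tracker1 | tracker2))         # 1000: combining them gains nothing
```

Adding a third, fourth or fifth tracker in this model just produces more copies of `tracker2`.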
“Tracker 6” (Distributed Hash Table)
This was developed in 2005 to deal with the problem of unreliable trackers. At that time, the concept of multiple trackers was still not widespread, and not many clients handled them. DHT, based loosely on Kademlia, was the more robust solution. No matter how many trackers you put onto a torrent, they can all go down. DHT can’t, because it is not ‘one node’, but an amorphous ‘cloud’ of peers.
“Tracker” 7 (Peer EXchange, or PEX)
This is completely independent of the trackers. When you’ve connected to some peers, and have it active, it will swap peer info with connected peers. Basically, if you’re connected to peers W, X, Y, and Z, and Peer W is connected to A, B, and C, you’ll tell W about X, Y, and Z, and W will tell you about A, B, and C. It augments trackers, and means that even if there were two disparate swarms on a torrent (as can happen with DHT-only torrents, thanks to the incompatible Vuze and mainline systems), as soon as a single link is made (by a client that can handle both, for instance) the swarms start to intermingle.
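Here is the W, X, Y, Z example as a tiny sketch. It only models the idea of swapping peer lists between two connected peers; the function name and the dictionary of "who knows whom" are made up for illustration, and it says nothing about the actual PEX wire format.

```python
def pex_exchange(known: dict[str, set[str]], a: str, b: str) -> None:
    """Two connected peers tell each other about everyone else they know."""
    a_shares = known[a] - {b}   # a doesn't need to tell b about b
    b_shares = known[b] - {a}
    known[a] |= b_shares
    known[b] |= a_shares


# Who each peer is connected to before the exchange (example from the text).
known = {
    "you": {"W", "X", "Y", "Z"},
    "W": {"you", "A", "B", "C"},
}

pex_exchange(known, "you", "W")
print(sorted(known["you"]))  # ['A', 'B', 'C', 'W', 'X', 'Y', 'Z']
print(sorted(known["W"]))    # ['A', 'B', 'C', 'X', 'Y', 'Z', 'you']
```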
So what we’ve got is a tracker that has all the peers (1), a bunch of other trackers that have most of the same peers and no new/different ones (2-5), an alternative way to get peers (DHT, aka 6), and a way to keep all of them in one big group (PEX, aka 7). So guess which one you can do away with without causing any issues?
Yes, it’s the duplicate trackers.
But what if a tracker goes down? Well, if the first tracker goes down, then yes, second tracker saves the day! (Hurrah!) And 3-5 are still completely redundant and a waste (Bugger!). And if it’s tracker 2-5 that goes down, who cares? It’s not made any difference to the primary tracker. NOT ONE BIT. Even more fun comes when you realise that any client that supports multiple trackers tends to also support DHT, and so DOESN’T NEED multiple trackers anyway.
Going back to the example, people running the original torrent, with the CaPT tracker on it, and those getting the torrent from TPB are going to be two different swarms, aren’t they? Well, yes, and no. If that was all we knew, then yes. If, however, someone on both groups had DHT running, then they’re going to become one swarm, thanks to PEX. DHT saves the day again!
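One way to picture the two groups becoming one is as a connectivity problem: peers are nodes, known links are edges, and a "swarm" is a connected group. The sketch below counts groups with a small union-find. The peer names are invented, and a real swarm would then spread the new contacts around via PEX rather than through one static link.

```python
def count_swarms(peers, links):
    """Count connected groups of peers given (peer, peer) links."""
    parent = {p: p for p in peers}

    def find(p):
        # Follow parents to the group's representative, shortening the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in links:
        parent[find(a)] = find(b)   # merge the two groups

    return len({find(p) for p in peers})


capt_swarm = ["c1", "c2", "c3"]   # got the torrent with only the CaPT tracker
tpb_swarm = ["t1", "t2", "t3"]    # got the torrent via TPB
peers = capt_swarm + tpb_swarm
links = [("c1", "c2"), ("c2", "c3"), ("t1", "t2"), ("t2", "t3")]

print(count_swarms(peers, links))                   # 2 separate swarms
print(count_swarms(peers, links + [("c3", "t1")]))  # 1 once DHT makes a single link
```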
But the question has to be asked, just WHY do people keep adding more trackers to a torrent? The answer is simple, and comes from a misunderstanding of how a tracker works, and what’s being done.
Why are people doing this?
In the main, people add trackers for one of two reasons, either a) to find more peers, or b) to try and add some resiliency. So let’s address these.
More peers make it faster; so if you want more peers, you add more trackers, because you can’t get peers without trackers, right? No. Once you’ve got the appropriate number of peers for your connection, trying to add more will just make things slower. Or, you don’t think that’s all the peers there are, and there must be some that no-one knows of right?
Both of these are due to errors in understanding bittorrent. The first is a common fallacy. More peers does not mean more speed. The fastest setup is the one that matches your connection, and more peers will make you slower (thanks to the increased overhead). As far as adding more peers through adding trackers, remember this. Those same people, with the same torrent, already have the same trackers you do, so they’d have to delete the ones already in the torrent, and add the same one you have, for them to be new peers.
Not very likely is it? No.
Resiliency was the reason the ability to add multiple trackers to a torrent was created in the first place. So it clearly makes sense, right? No. Not really. In fact, the widespread use of it has caused more tracker downtimes than anything it’s prevented. As one tracker goes down, load on other, similar trackers (especially the 5 listed on TPB/KAT) stays pretty much the same, and may even increase slightly (as people hit ‘update tracker’ to see if they can get a down one to work). These then add up into more traffic, which can then take the working trackers down. It’s actually a vicious cycle where the misguided attempts to mitigate problems are the root cause of the problems.
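To put a rough number on the extra load, here is a sketch using the earlier assumptions: every client announces to the first tracker, and 99% of clients also announce to each additional one. The function name is made up, and the model ignores retries against dead trackers, which would only push the figure higher.

```python
def total_announce_rate(base_rate: float, trackers: int, multi_tracker_share: float) -> float:
    """Announces per second summed over every tracker listed on the torrent."""
    if trackers < 1:
        raise ValueError("a torrent in this model needs at least one tracker")
    extra = (trackers - 1) * multi_tracker_share * base_rate
    return base_rate + extra


base = 50_000_000 / 1800   # ~27,778 announces per second with a single tracker
for n in (1, 3, 5):
    total = total_announce_rate(base, n, 0.99)
    print(f"{n} tracker(s): {total:,.0f} announces/sec ({total / base:.2f}x)")
```

With five trackers the same swarm generates nearly five times the announce traffic, for no new peers.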
What’s more, in 2005 a better solution was released (Vuze first, Bittorrent Inc a few weeks later) for resilience. It’s DHT, and its means of resilience actually mirrors bittorrent for its swarm efficiency.
With a traditional download, the more people trying to download from a server, the slower it goes, until it’s eventually unresponsive, or at least very slow. Likewise a heavily loaded tracker starts to become less reliable the more popular it gets. Bittorrent shares the bandwidth of the download between many peers, and thus spreads the job between many. Take the initial seeder off after a short while and the downloads can still continue. DHT does the same with the job of tracker, sharing it between the thousands, meaning one or two or twenty nodes going offline won’t even be noticed.
Despite the fact it’s been around for more than 8 years at this point, people still often don’t understand DHT and what it is. You get advice from self-professed bittorrent ‘experts’ telling you that it has to be turned off for certain sites (usually their own activity-logging site where DHT undermines their revenue stream), etc. It’s so bad that there’s a whole page on TorrentFreak from 2009 debunking various DHT myths, and they STILL get repeated.
There are sometimes other reasons trotted out as well, and they’re equally fatuous. I once asked a certain release group’s representative why they had multiple trackers, and was told it was “so the torrent would start fast for people”. Thing is, it starts no faster with 5 or 50 trackers than it would with 1, and while DHT can be a bit slow with a few peers, DHT and a single tracker work quickly enough that most people can’t tell if the sourcing is DHT or trackers.
Another reason given was “wanting to show the most seeds and peers on sites”. Again, yet another ridiculous idea. Sites don’t scrape (that’s when they pull stats from trackers) multiple trackers, and then add them up; they pull from one, and use that, because otherwise you could turn 50 into 500 just by using 10 trackers. One tracker works just fine for this.
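The 50-into-500 trick is just double counting, as this small sketch shows. The peer names and the idea of each tracker holding a full copy of the swarm are illustrative assumptions.

```python
# One swarm of 50 peers, all announcing to 10 trackers (illustrative names).
swarm = {f"peer{i}" for i in range(50)}
tracker_views = [set(swarm) for _ in range(10)]

summed = sum(len(view) for view in tracker_views)   # what adding up scrapes would show
distinct = len(set().union(*tracker_views))         # how many peers really exist

print(summed, distinct)  # 500 50
```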
So what can be done?
The key thing is, stop adding trackers. Sites and groups like TPB need to start reducing tracker numbers as well – and TPB is a big problem, adding 5 trackers to every torrent. What makes it more ironic is that 18 months ago, TPB founder Fredrik Neij submitted proposals to reduce tracker loads, and save trackers money, and yet the TPB site is adding massively more to the load (and cost) of running a tracker than the problems he was dealing with.
The second thing is, turn on DHT if you haven’t already.
The whole thing about DHT is that it can find peers without them being locked on a specific tracker. If they are running the torrent, it can find them and link you to them. It also doesn’t go down. Every reason people have for running multiple trackers is quite literally something DHT does far better.
Sure, it seems like ‘everyone knows’ that adding more trackers is better, but everyone knew the Earth was at the center of the universe, everyone knew radium was good for what ails you, and everyone knew Alabama-Auburn was going to overtime.
It turns out, what everyone knows, is usually crap.
And now you know better.