11 February 2012

More on peer-to-peer blogging


I was musing a few days ago on how to do blogging if SOPA-like measures take out hosting providers for user content.

Aaron Davies in a comment suggests freenet. I'm not sure about that; because you don't choose at all what other content you're hosting, I would expect the whole system to drown in movie rips and porn. The bittorrent idea where the stuff which you help distribute is the stuff which you want to consume seems less vulnerable. alt.binaries didn't die because of copyright enforcement, it died because the copyright infringement made such large demands on capacity that it was not worth distributing.

Bear in mind that I'm not going "full paranoid" here: my threat scenario is not "the feds want to ban my blog", it's "Blogger and the like have so much difficulty complying with IP law that they're becoming painful and/or expensive to use".

In that circumstance, simply running wordpress or geeklog on my own machine is an option, but rather a crappy one in capacity and reliability terms. I've already looked into using a general web hosting provider, and I could move onto that for probably five quid a month, but I've again been put off by reliability issues. And in the threat scenario under consideration, third-party web hosting might be affected too.

But Davies in passing mentioned email. When I saw that I went "D'oh". I hadn't thought of using SMTP. I'd thought of NNTP, which I have a soft spot for¹, but rejected it. SMTP could well be the answer — like NNTP, it was designed for intermittent connections. Running mailman or something on your home PC is a lot simpler and safer than running wordpress. The beauty of it is that not even Hollywood can get email banned. And if they tried, all you need to keep dodging is a non-government-controlled DNS, which is something people are already working on.

You still need a published archive though; one that people can link to. But that can work over SMTP too, as a request-response daemon. Those were actually quite common before the web: you could get all sorts of information by sending the right one-line email to the right address.
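A rough sketch of what such a request-response daemon might do with each incoming message, using only Python's standard `email` package. The command names (`INDEX`, `GET`), the addresses and the `ARCHIVE` dictionary are all made up for illustration; receiving and sending the mail over SMTP is left out.

```python
from email.message import EmailMessage

# Hypothetical archive: post id -> body text.
ARCHIVE = {
    "2012-02-11-p2p": "More on peer-to-peer blogging ...",
}

def handle_request(request: EmailMessage) -> EmailMessage:
    """Build the reply to a one-line command carried in the Subject header."""
    words = (request["Subject"] or "").split()
    command = words[0].upper() if words else ""

    reply = EmailMessage()
    # A real daemon would validate the sender address before replying.
    reply["To"] = str(request["From"])
    reply["From"] = str(request["To"])

    if command == "INDEX":
        reply["Subject"] = "INDEX"
        reply.set_content("\n".join(sorted(ARCHIVE)))
    elif command == "GET" and len(words) == 2 and words[1] in ARCHIVE:
        reply["Subject"] = "POST " + words[1]
        reply.set_content(ARCHIVE[words[1]])
    else:
        reply["Subject"] = "ERROR"
        reply.set_content("Commands: INDEX, GET <post-id>")
    return reply

# Example: a reader asks for one post by id.
request = EmailMessage()
request["From"] = "reader@example.org"
request["To"] = "archive@example.org"
request["Subject"] = "GET 2012-02-11-p2p"
reply = handle_request(request)
```

A link to a post then amounts to an address plus a subject line, which is clumsier than a URL but needs nothing more than a mail account to follow.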

There were actually applications that ran over SMTP. One which lasted well into web days, and may even still exist here and there, was the diplomacy judge, for playing the board game Diplomacy over email.

Unmoderated comments would have to go under this scenario, whatever the technology, but moderated comments would be easy enough; the moderator would just forward acceptable comments onto the publication queue. Email clients in the days when mailing lists were very common were designed specifically to make following lists in this way easy (I remember mutt was much favoured for the purpose). Each list became a folder (by using procmail or the like), each post a thread, and each comment a reply. My own email is still set up that way, though I pretty much never look at the list folders any more; I think a couple of them are still being populated for things like development of the linux b43 wireless chipset driver.

The problem with using mail is spam. Everyone who wants to subscribe has to give me their email address — that's probably the biggest reason why the use of mailing lists declined; that and the impact of false positives from spam filtering.

If generic publishing networks drown in media, and mail drowns in spam, then some more private network is needed.

Requirements:

  •  Anyone can access posts, as easily as possible
  •  I only have to process posts from sources I've chosen

Our big advantage is that the actual storage and bandwidth needed for blogging are rounding error in a world of digital video.

Reliable access requires that there are multiple sources for posts, to compensate for the fact we're not running industrial data centres.

The obvious approach is that if I follow a blog, I mirror it. Someone wanting to read one of my posts can get it from my server, or from any of my regular readers' servers. That just leaves the normal P2P problems:

  • locating mirrors, in spite of dynamic IP assignment
  • traversing NAT gateways which don't allow incoming connections
  • authenticating content (which might have been spoofed by a mirror)


Authentication is trivial. There's no complex web of trust: each blog has an id, and that id is the public key that its posts are signed with. The first two are difficult, but have been solved by all the P2P networks.

Unlike some of them, we do want to persistently identify sources of data, so presumably each node regularly notifies the other nodes it knows of its current location. Possibly other already-existing p2p networks could be used for this advertisement function. There's a DoS vulnerability there with attackers spoofing location notifications, so probably the notifications have to be signed. I guess the node id is distinct from the blog id (blogs could move, nodes could originate more than one blog) so it's also a distinct key. Like a blog id, a node id essentially is the public key.

NAT traversal I'm not sure about. There's stuff like STUN and ICE which I haven't really dealt with.
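To make the authentication step concrete, here is a minimal sketch of a reader checking a post fetched from an untrusted mirror. Python's standard library has no public-key signatures, so this uses a Lamport one-time signature built from SHA-256 purely as a stand-in: a key pair can safely sign only one message, which is no good for a real blog, where something like Ed25519 from a crypto library would be the likelier choice. Because a Lamport public key is large, the sketch also assumes the blog id is a hash of the public key rather than the key itself. All the function names are invented for illustration.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    """Return (secret key, public key): 256 pairs of secrets and their hashes."""
    sk = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    pk = [[_h(s) for s in pair] for pair in sk]
    return sk, pk

def _bits(message: bytes):
    """The 256 bits of the message digest, most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, message: bytes):
    # Reveal one secret per digest bit. ONE-TIME: never reuse sk for another message.
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk, message: bytes, signature) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(signature[i]) == pk[i][bit]
               for i, bit in enumerate(_bits(message)))

def blog_id(pk) -> str:
    # Assumption for this sketch: the id is a fingerprint of the public key.
    return _h(b"".join(h for pair in pk for h in pair)).hex()

def accept_from_mirror(known_id: str, pk, post: bytes, signature) -> bool:
    """A reader trusts the blog id, not the mirror that served the data."""
    return blog_id(pk) == known_id and verify(pk, post, signature)

# The author signs a post; a mirror later serves it, honestly or otherwise.
sk, pk = keygen()
bid = blog_id(pk)
post = b"More on peer-to-peer blogging"
sig = sign(sk, post)
genuine_ok = accept_from_mirror(bid, pk, post, sig)
tampered_ok = accept_from_mirror(bid, pk, post + b" (edited by mirror)", sig)
```

Signed location notifications would work the same way, with the node key in place of the blog key and the node's current address as the message.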

Assuming we can map a persistent node id to an actual service interface of some kind, this is what it would have to provide:

  • List blogs that this is the authoritative source for
  • List blogs that this mirrors (also returning authoritative source)
  • List other known mirrors for a blog id
  • List posts by blog id (optional date ranges etc)
  • Retrieve posts by blog id and post id
  • Retrieve moderated comments by blog id and post id (optional comment serial range)
  • Retrieve posts and moderated comments modified since (seq num)
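As a way of pinning down the list above, here is an in-memory sketch of the read side of a node. The class and method names, and the idea of a single node-wide sequence number stamped on each post, are assumptions made for the example; comments, date ranges and the network transport are left out to keep it short.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    blog_id: str
    post_id: str
    body: str
    seq: int                 # bumped by the node whenever the post changes
    signature: bytes = b""   # placeholder; signing is out of scope here

@dataclass
class Node:
    authoritative: set = field(default_factory=set)  # blog ids originated here
    mirrored: dict = field(default_factory=dict)     # blog id -> authoritative node id
    mirrors: dict = field(default_factory=dict)      # blog id -> set of other node ids
    posts: dict = field(default_factory=dict)        # (blog id, post id) -> Post

    def list_authoritative(self):
        return sorted(self.authoritative)

    def list_mirrored(self):
        return dict(self.mirrored)

    def list_mirrors(self, blog_id):
        return sorted(self.mirrors.get(blog_id, set()))

    def list_posts(self, blog_id):
        return sorted(pid for (bid, pid) in self.posts if bid == blog_id)

    def get_post(self, blog_id, post_id):
        return self.posts.get((blog_id, post_id))

    def modified_since(self, seq):
        changed = [p for p in self.posts.values() if p.seq > seq]
        return sorted(changed, key=lambda p: p.seq)

# Example: a node that originates blogA and mirrors blogB from nodeX.
node = Node(authoritative={"blogA"}, mirrored={"blogB": "nodeX"})
node.posts[("blogA", "p1")] = Post("blogA", "p1", "first post", seq=1)
node.posts[("blogB", "p7")] = Post("blogB", "p7", "mirrored post", seq=2)
```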

The service is not authenticated, but posts and moderated blog comments are signed with the blog key. (Comments can optionally be signed with the commenter's key too, but a comment author's signature is distinguishable from a comment moderator's signature.)

The service owner can also

  • Create post
  • Add post to a blog
  • Edit post
  • Add a moderated comment to a blog
  • Check mirrored blogs for new posts & comments & mirror list updates
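The last of those operations could be sketched as a polling loop that remembers, per source node and blog, the highest sequence number it has seen and asks only for changes after that. The `fetch_since` callable, the tuple layout of a change and the fake remote log are all hypothetical; a real node would verify each signature before storing anything.

```python
def sync(store, cursors, source, blog_id, fetch_since):
    """Pull changes newer than our cursor from one source; return how many were applied."""
    key = (source, blog_id)
    last = cursors.get(key, 0)
    applied = 0
    for seq, post_id, body in fetch_since(blog_id, last):
        # Signature verification would go here, before the post is accepted.
        store[(blog_id, post_id)] = body
        last = max(last, seq)
        applied += 1
    cursors[key] = last
    return applied

# Stand-in for a remote node: a log of (seq, post id, body) changes.
REMOTE_LOG = [
    (1, "p1", "first"),
    (2, "p2", "second"),
    (3, "p1", "first, edited"),
]

def fake_fetch(blog_id, since):
    return [change for change in REMOTE_LOG if change[0] > since]

store, cursors = {}, {}
first_pass = sync(store, cursors, "nodeX", "blogA", fake_fetch)
second_pass = sync(store, cursors, "nodeX", "blogA", fake_fetch)
```

Keying the cursor on the source node as well as the blog matters if sequence numbers are local to each node, since two mirrors of the same blog would then number their changes differently.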

There's a case for mirroring linked posts on non-followed blogs: if I link to a post, I include it on my server so that whoever reads my post can read the linked one too. Ideally, there should be an http side to the service as well, so people outside the network can link to posts and see them if they have the good luck to catch the right server being available at the time. That all needs more thought.

¹When RSS was coming in, I argued that it was just reinventing NNTP and we ought to use that instead.

2 comments:

sconzey said...

DIASPORA* is probably what you're looking for. I was pleased to discover earlier this week it has survived the suicide of one of the founding developers.

James A. Donald said...

Email is insecure and designed for a trusting world where everyone is well behaved and everyone knows everyone else. We are going to need a fully militarized protocol, since it is going to come under state sponsored attack.

The use case should be optimized for politically incorrect posts apt to destroy the poster's career, massive copyright violation, and child porn. Might as well be hung for a sheep as a lamb. And maybe we could figure out a way to make it useful for tax evasion. Since everything is associated with public and private keys, can be used to transfer promises to pay.

Nodes shall be identified by their public keys and Zooko's triangle. The network addresses associated with a public key should be located with DHT. The network address should include NAT traversal.

A blog is identified by the blog name and the signature of the node which is the authoritative source for that blog, and analogously blog posts and approved comments. All comments have to be approved. In an unmoderated blog, they are approved by the spam filter.