Matt Hamer, founder of Attribyte, has been posting on mth·ology since 1999.

The Algorithm

I opened Slate’s article, Who Controls Your Facebook Feed, and chuckled when I got to the second paragraph.

…according to a closely guarded and constantly shifting formula, Facebook’s news feed algorithm ranks them all, in what it believes to be the precise order of how likely you are to find each post worthwhile. Most users will only ever see the top few hundred. No one outside Facebook knows for sure how it does this, and no one inside the company will tell you.

The Algorithm always ends up being a riddle, wrapped in a mystery, inside an enigma. Its controllers are almost never willing, and rarely able, to articulate exactly how its magic works. It is a secret agent with a license to kill a business based on what it decides to show or hide. An entire industry is devoted to empirically discovering and exploiting its biases, bugs and quirks.

Will Oremus does not bury the lede in this story, which probably reflects the sentiment of many Facebook users.

The algorithm’s rankings correspond to the user’s preferences “sometimes,” Facebook acknowledges, declining to get more specific.

To improve it, as we learn in the paragraphs that follow, Facebook is employing humans in the form of “feed quality panels” and “every news feed tweak must undergo a battery of tests among different types of audiences, and be judged on a variety of different metrics.” In short, to get better it needs human input. But what about me? Can I participate as a live “feed quality panel” of one?

Algorithm Control

Facebook is increasingly giving users the ability to fine-tune their own feeds—a level of control it had long resisted as onerous and unnecessary.

“Onerous and unnecessary?” I laughed again. I’ve been shouldering the burden of giving Facebook’s algorithm “hints” and I can’t tell if the result is better or not. Other than the heavy-handed options like “unfollow,” the rest feel like a placebo because there’s no feedback. If I ask to “see fewer posts like…” how can I judge the effect? I just have to wait and see. Why is this?

In addition to possible business reasons, there’s a technical one. When you view your feed on Facebook, a timeline on Twitter, or just about any other page on a site that has scaled to millions or billions of users, what you’re seeing was generated before you clicked. In developer-speak, the view was created at write-time rather than query-time. The upside is consistently fast, reliable page loads that require minimal on-demand processing. The downside is that Facebook (and now Twitter?) must divine what to show you based on heuristics, clicks, focus groups and sponsorships; hence, The Algorithm.
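To make the write-time versus query-time distinction concrete, here is a minimal Python sketch. It is a toy illustration only, not how Facebook or Twitter actually build feeds; the `FeedStore` class and all of its method names are hypothetical.

```python
from collections import defaultdict


class FeedStore:
    """Toy store contrasting write-time and query-time feed generation.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = []                    # every post as (seq, author, text)
        self.feeds = defaultdict(list)     # user -> feed built at write-time
        self._seq = 0

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author, text):
        self._seq += 1
        post = (self._seq, author, text)
        self.posts.append(post)
        # Write-time: copy the post into every follower's stored feed now.
        for user in self.followers[author]:
            self.feeds[user].append(post)

    def read_precomputed(self, user, limit=10):
        # Cheap read: the view already exists; return newest first.
        return self.feeds[user][-limit:][::-1]

    def read_on_demand(self, user, limit=10, rank=None):
        # Query-time: select and rank when asked, so a reader-supplied
        # rank function takes effect on the very next read.
        mine = [p for p in self.posts if p[1] in self.following[user]]
        mine.sort(key=rank or (lambda p: p[0]), reverse=True)
        return mine[:limit]
```

With the precomputed read, whatever ordering was chosen at publish time is what the reader gets. With the on-demand read, passing a different `rank` function changes the result immediately, which is the kind of feedback loop the "see fewer posts like…" option lacks, at the cost of doing the work on every page load.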

No Hubbub - Cloud PubSub

Last week, Google announced the beta release of Cloud PubSub. They state,

“We designed Google Cloud Pub/Sub to deliver real-time and reliable messaging, in one global, managed service that helps developers create simpler, more reliable, and more flexible applications. It’s been tested extensively, supporting critical applications like Google Cloud Monitoring and Snapchat’s new Discover feature.”

Push-based publish/subscribe over HTTP is something I’ve been interested in since Google helped draft the PubSubHubbub protocol five years ago.

From the abstract,

“We offer this spec in hopes that it fills a need or at least advances the state of the discussion in the pubsub space. Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today and its existence, more than just enabling the obvious lower latency feed readers, would enable many cool applications, most of which we can’t even imagine. But we’re looking forward to decentralized social networking.”

It’s a shame they chose a silly name and focused unnecessarily on the distribution of changes in “feeds” instead of using more general language. There’s no reason this protocol can’t be applied to the distribution of arbitrary messages. In fact, shortly after the first draft was published in 2010, I decided to build an implementation to use as a message distribution system for Attribyte. HTTPS with authentication, or spiped, allows messages to be distributed in near-real-time to services running inside the same rack or anywhere in the cloud.

So, is Cloud PubSub based on PubSubHubbub? No, but the terminology and mechanics of the “push” model are almost exactly the same. Publishers create “topics” on the service to which they publish messages. Applications subscribe to topics and receive published messages on a pre-configured HTTP endpoint.

How do the two compare?

| Feature | Cloud PubSub | PubSubHubbub |
| --- | --- | --- |
| Subscribe to topic? | Synchronous. JSON API or Google developer console used to configure the callback endpoint for the topic. | Multi-message asynchronous “dance” over HTTP allows subscribers to negotiate an endpoint for message reception with verification of intent. |
| Subscription lifetime? | Until explicitly canceled. | Specified lease time or explicitly canceled. |
| Receive callback messages? | “Webhook HTTPS.” I think this is jargon that simply means HTTPS POST. | HTTP/HTTPS POST. |
| Message content type? | JSON. Binary messages must be Base 64 encoded. | Arbitrary bytes! Because this protocol was originally created for distributing changes to Atom/RSS feeds, most people, I think, assume that it is only useful for distributing feed changes. |
| Arbitrary “attributes” associated with the message? | Embedded in the JSON message. | HTTP headers. |
| Subscriber acknowledges receipt? | “Success” HTTP response with configurable “ack deadline.” | “Success” HTTP response. Timeout is not part of the protocol, but obviously configurable. |
| Delivery guarantees? | Out-of-order delivery possible. At least one delivery. Retry on failure? Yes. | Not specified. Neither delivery order nor single delivery is guaranteed. Retry on failure? “Hubs SHOULD retry notifications repeatedly until successful (up to some reasonable maximum over a reasonable time period)” |
| Security? | HTTPS | HTTPS and per-subscriber, HMAC-based, “authenticated content distribution.” |
| Run your own server? | No – but there’s nothing that prevents building one with an identical API. | Yes. |
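The HMAC-based “authenticated content distribution” in the security row can be sketched from the subscriber's side. This assumes the hub signs the raw request body with the secret the subscriber supplied at subscription time and sends the result in an `X-Hub-Signature` header of the form `sha1=<hex digest>`; check the header name and digest algorithm against the spec version you target. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: bytes, body: bytes) -> str:
    """Compute the signature value a hub would send (assumed sha1=<hex> form)."""
    digest = hmac.new(secret, body, hashlib.sha1).hexdigest()
    return "sha1=" + digest


def verify_signature(secret: bytes, body: bytes, header_value) -> bool:
    """Return True only if the header matches the HMAC of the raw body.

    header_value is the X-Hub-Signature header as a string, or None if absent.
    """
    expected = sign_body(secret, body).encode("ascii")
    received = (header_value or "").encode("utf-8")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, received)
```

A subscriber would call `verify_signature` on the unparsed bytes of each POST before acting on the message, and ignore the content when the check fails.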

Ignore the subscription mechanics. A server that supports PubSubHubbub can easily support the “push” version of the Cloud PubSub API.

For either, here’s what you get for delivery guarantees: maybe ordered, possibly delivered more than once. So, not much! Even if you just need to feel confident that, “the subscriber finished processing the message,” you’ll have to rely on hub retry. To accomplish this, the subscriber must wait until processing is complete, however long that takes, to send the “OK” response back to the hub. Of course, if the processing is stalled or takes a long time, the hub may decide to close the socket before processing finishes. When this happens, be prepared to handle the same message when the hub attempts a retry! If it happens frequently, the hub is going to run out of resources spending time and connections waiting for the synchronous response. To avoid this (but not the potential for duplicate messages), Cloud PubSub supports a “pull” model that PubSubHubbub doesn’t.
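One way to be prepared for the same message arriving twice is to make the subscriber idempotent. The sketch below is a minimal in-memory version, assuming each message carries some stable identifier (a message ID where the hub supplies one, or a hash of the body otherwise). The `DedupingSubscriber` class and its parameters are hypothetical, and a real deployment would need a shared, persistent store rather than a per-process dictionary.

```python
from collections import OrderedDict


class DedupingSubscriber:
    """Acknowledge only after processing, and skip messages already handled."""

    def __init__(self, process, max_remembered=10000):
        self._process = process        # callable that does the real work
        self._seen = OrderedDict()     # message_id -> True, oldest first
        self._max = max_remembered

    def handle(self, message_id, body):
        """Return the HTTP status code to send back to the hub."""
        if message_id in self._seen:
            # Duplicate from a retry: acknowledge so the hub stops resending.
            return 200
        try:
            self._process(body)
        except Exception:
            # Non-success response asks the hub to retry later.
            return 500
        # Remember the ID only after processing finished successfully.
        self._seen[message_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # forget the oldest ID
        return 200
```

This does nothing about the hub giving up on a slow response; it only ensures that when the retry arrives, the work is not repeated.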

Publish-subscribe is a powerful tool, but not for the faint of heart. As Pat Helland says…

Messaging across loosely coupled partners is inherently an exercise in confusion and uncertainty.