privacy and tracking

I used to despise any software that tracked user behavior, and for the most part I still do. But I recognize that some businesses, online and off, genuinely need to track you in order to operate.

The big problem (and nobody treats it as such!) is that the technology for "aggregate" statistics isn't very practical yet. Right now, either the business gives up the ability to guarantee "unique" impressions (which leaves the system open to abuse), or users give up their privacy (which is the de facto reality on the web).

I think more research is needed to make this all work. Some of it may come from crypto, some from statistics. But right now, we have anonymous tracking cookies that would represent a huge invasion of privacy if they were ever connected with real user data.

Let's say you're in the business of selling ads online. In that case, you want to be able to tell the difference between one person pushing "reload" 1000 times and 1000 genuinely unique impressions or clickthroughs of your ad.

So right now, every ad network opts to give you a cookie that lasts more or less forever, and as it tracks your behavior, it builds up a log that is very personal indeed.
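To make the trade-off concrete, here is a minimal Python sketch of what a long-lived cookie buys the advertiser and what it costs the user. It is an illustration, not any ad network's actual code; the function name `summarize` and the `(cookie_id, url)` event shape are assumptions I made up for the example.

```python
from collections import defaultdict

def summarize(events):
    """Count raw and unique impressions from (cookie_id, url) pairs.

    Hypothetical event format: each request carries a persistent cookie ID.
    """
    history = defaultdict(list)
    for cookie_id, url in events:
        # The same structure that enables de-duplication is also a
        # per-person browsing log: this is the privacy cost.
        history[cookie_id].append(url)
    raw = sum(len(urls) for urls in history.values())
    unique = len(history)
    return raw, unique, dict(history)

# One visitor reloading 1000 times, plus two ordinary visitors.
events = [("cookie-A", "/ad/1")] * 1000 + [("cookie-B", "/ad/1"), ("cookie-C", "/ad/1")]
raw, unique, history = summarize(events)
print(raw, unique)
```

The unique count is trustworthy only because `history` ties every request to one identifier, and that same table is the personal log.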

Well, what are the alternatives?

There are several client-side alternatives. Some strip tracking information during a browsing session; others "relay" your traffic through peers in an attempt to randomize the browsing. The first kind is great for individuals, but in a wide enough deployment it wouldn't give any information back to the businesses that pay for a lot of services online. Again, either you have no tracking (and no revenue), or the "1000 reloads" abuse case becomes possible.

I'm intrigued by three potential solutions:
1. Randomly permuting user information (analogous to trading Ralphs club cards at the grocery store)
2. A trusted intermediary that guarantees user uniqueness but can see no actual data (public-key cryptography on the endpoints; the middleman only does authentication)
3. Statistics-based tracking that weights impressions by IP address and repetitive behavior

I believe that (1) holds the most promise if done fairly, but (3) is attractive from a research perspective.
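As a rough idea of what (3) could look like, here is a small Python sketch that discounts repeated impressions from the same IP address instead of identifying anyone. The function name `weighted_impressions`, the geometric `decay` parameter, and the made-up addresses are all my assumptions; a real scheme would need a much more careful statistical model.

```python
from collections import Counter

def weighted_impressions(ips, decay=0.5):
    """Estimate unique impressions by down-weighting repeats per IP.

    The n-th repeat from an address counts decay**n, so the first hit
    counts fully and a burst of reloads adds almost nothing.
    (Hypothetical weighting, chosen only for illustration.)
    """
    seen = Counter()
    total = 0.0
    for ip in ips:
        total += decay ** seen[ip]
        seen[ip] += 1
    return total

# One address reloading 1000 times vs. 1000 distinct addresses.
reloader = ["203.0.113.7"] * 1000
crowd = [f"10.0.{i // 250}.{i % 250}" for i in range(1000)]

print(round(weighted_impressions(reloader), 3))
print(weighted_impressions(crowd))
```

The reloader contributes roughly two impressions' worth in total, while the crowd counts in full. The obvious weakness is that many real people behind one shared address get under-counted, which is why this is an estimate rather than a guarantee.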

One significant problem is that the security cannot live purely on the client. If it could, session cookies would work fine, but they are too easily reset by someone attempting to scam the system. For instance, a system where you were allowed to "swap" keys with a random peer, without any central knowledge of the timing or the pairing, would be ideal, but you'd need some guarantee that nobody was doing this on every request.
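Here is one way the "swap, but not per-request" guarantee might be sketched in Python. It is a compromise rather than the ideal described above: a hypothetical server-side `SwapLedger` remembers that a key was swapped during an epoch, but never records who the partner was. The class, the `holders` mapping, and the notion of an "epoch" are all assumptions for illustration.

```python
class SwapLedger:
    """Allows each tracking key to change hands at most once per epoch.

    Only (key, epoch) markers are stored; the pairing itself is not.
    """

    def __init__(self):
        self._swapped = set()

    def try_swap(self, holders, a, b, epoch):
        """Swap the keys held by clients a and b if neither key has
        already been swapped in this epoch. Returns True on success."""
        key_a, key_b = holders[a], holders[b]
        if (key_a, epoch) in self._swapped or (key_b, epoch) in self._swapped:
            return False  # would allow per-request swapping, so refuse
        holders[a], holders[b] = key_b, key_a
        self._swapped.update({(key_a, epoch), (key_b, epoch)})
        return True

# Hypothetical clients and keys.
holders = {"alice": "k1", "bob": "k2", "carol": "k3"}
ledger = SwapLedger()

first = ledger.try_swap(holders, "alice", "bob", epoch=1)     # allowed
second = ledger.try_swap(holders, "alice", "carol", epoch=1)  # refused
third = ledger.try_swap(holders, "alice", "carol", epoch=2)   # new epoch
print(first, second, third, holders)
```

The set of keys in circulation never changes, so unique counts stay meaningful, while any long-running log attached to one key ends up describing a mix of people.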

Server-based solutions are better for this kind of thing, but they're not adequate either, because people don't want to entrust their privacy to a third party.
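The trust problem shows up clearly if you try to sketch the intermediary from (2). In the Python below, I swap the public-key scheme for a simpler keyed hash from the standard library (`hmac`), purely to show the separation of knowledge: the intermediary knows who you are but not what you view, and the ad server sees a pseudonym but not who you are. The names `issue_pseudonym` and `count_uniques`, the secret, and the epoch scheme are assumptions, and the encryption of the actual page data is left out.

```python
import hashlib
import hmac

def issue_pseudonym(secret: bytes, user_id: str, epoch: int) -> str:
    """Run by the (hypothetical) intermediary after authenticating user_id.

    The same user gets the same pseudonym within an epoch, and an
    unrelated one in the next epoch.
    """
    message = f"{user_id}|{epoch}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def count_uniques(pseudonyms):
    """Run by the ad server, which never sees user_id or the secret."""
    return len(set(pseudonyms))

secret = b"intermediary-only-secret"  # placeholder value for the example

# One user reloading 1000 times, plus two other users, all in epoch 1.
reloads = [issue_pseudonym(secret, "alice", epoch=1) for _ in range(1000)]
others = [issue_pseudonym(secret, name, epoch=1) for name in ("bob", "carol")]
print(count_uniques(reloads + others))
```

Everything here rests on the intermediary keeping its secret and not logging requests, which is exactly the kind of promise people are reluctant to take on faith.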

In effect, we need a system that is implemented server-side, but with client visibility into its workings. I don't know the solution, but I thought I'd write down my thoughts. Perhaps someone else does.
