Interesting paper presentation by Tim Roughgarden.
He gave a nice introduction to the recent literature on provably privacy-preserving mechanisms for publishing statistical summaries such as counts of rows from databases satisfying some property (e.g., income > 100,000). Suppose a mechanism computes the actual count, and then reports something possibly different (e.g., by adding noise). There is a definition of a p-privacy if, for every possible output (count), for any person (row) the ratio of the probability of that output with the row present to the probability of that output with the row omitted is always in the range [p, 1/p]. Intuitively, whatever the actual count, there’s not much revealed about whether any particular person has high income.
One technique that works for counts, LaPlace-p, is to add to the correct count +/- z, where probability of z is 1/2(-lnp)e^^(z/lnp). For any reported count, there’s some confidence interval around it, and the size of that confidence interval is independent of the count. Thus, for reported count 1, you can’t really tell whether the correct count is 1 or 0, and thus you can’t really tell whether a particular person has high income, *even if you have great side information about everyone else in the database*. On the other hand, if the reported count is 27,000, you still can’t tell much about any one person, but you can be pretty sure that the correct count is somewhere around 27,000.
Roughgarden’s paper is about how much value you can get from the best count function (in terms of some loss function comparing true result to reported result) while still preserving the p-privacy requirement. It turns out that a mechanism very much like LaPlace-p, but discretized, works to minimize the expected loss no matter what the user of the count’s priors are about the distribution of correct counts. It is in this sense that it is universal. This requires a little user-specific post-processing of the algorithm’s output, based on the user’s priors about the correct counts. For example, if the reported count is -1, we know that’s not the correct count; it must really be 0 or something positive, and you can back out from the report and the user’s prior beliefs to infer a belief distribution over correct counts.