Universally Utility-Maximizing Privacy Mechanisms

Interesting paper presentation by Tim Roughgarden.

He gave a nice introduction to the recent literature on provably privacy-preserving mechanisms for publishing statistical summaries, such as counts of rows in a database satisfying some property (e.g., income > 100,000). Suppose a mechanism computes the actual count, and then reports something possibly different (e.g., by adding noise). A mechanism satisfies p-privacy if, for every possible output (count) and for any person (row), the ratio of the probability of that output with the row present to the probability of that output with the row omitted is always in the range [p, 1/p]. Intuitively, whatever the actual count, there’s not much revealed about whether any particular person has high income.
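In symbols (my notation, not necessarily the paper’s): if M is the mechanism and D, D′ are databases differing in one row, p-privacy requires that for every output o,

p ≤ Pr[M(D) = o] / Pr[M(D′) = o] ≤ 1/p.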

One technique that works for counts, Laplace-p, is to add noise z to the correct count, where z is drawn from a two-sided distribution with density (1/2)(−ln p)·e^(|z| ln p), i.e., (1/2)(−ln p)·p^|z|. For any reported count, there’s some confidence interval around it, and the size of that confidence interval is independent of the count. Thus, for reported count 1, you can’t really tell whether the correct count is 1 or 0, and thus you can’t really tell whether a particular person has high income, *even if you have great side information about everyone else in the database*. On the other hand, if the reported count is 27,000, you still can’t tell much about any one person, but you can be pretty sure that the correct count is somewhere around 27,000.
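A minimal sketch of that noise-adding step (my code, not from the talk; assumes 0 < p < 1):

```python
import math
import random

def laplace_p_count(true_count, p):
    """Report true_count + z, where z has density
    (1/2)(-ln p) * p^|z|, i.e., Laplace noise with scale -1/ln(p)."""
    b = -1.0 / math.log(p)        # scale; log(p) < 0 for 0 < p < 1
    u = random.random() - 0.5     # uniform on (-0.5, 0.5)
    z = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + z
```

Shifting the true count by one row shifts the noise distribution by 1, which changes the density at any fixed output by a factor between p and 1/p, and that is exactly the p-privacy condition for counts.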

Roughgarden’s paper is about how much value you can get from the best count-reporting mechanism (in terms of some loss function comparing the true result to the reported result) while still preserving the p-privacy requirement. It turns out that a mechanism very much like Laplace-p, but discretized (the paper’s geometric mechanism), minimizes the expected loss no matter what priors the user of the count has about the distribution of correct counts. It is in this sense that it is universal. This requires a little user-specific post-processing of the algorithm’s output, based on the user’s priors about the correct counts. For example, if the reported count is -1, we know that’s not the correct count; it must really be 0 or something positive, and you can back out from the report and the user’s prior beliefs to infer a belief distribution over correct counts.
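Here is a sketch of what that post-processing could look like (my reconstruction; the function names and the toy prior are mine, not the paper’s):

```python
def geometric_likelihood(report, true_count, p):
    """Pr[report | true_count] under a two-sided geometric mechanism:
    proportional to p^|report - true_count|, normalizer (1-p)/(1+p)."""
    return (1.0 - p) / (1.0 + p) * p ** abs(report - true_count)

def posterior_over_counts(report, prior, p):
    """Combine the report with the user's prior (dict: count -> prob)
    via Bayes' rule into a belief distribution over true counts."""
    unnorm = {c: prior[c] * geometric_likelihood(report, c, p) for c in prior}
    total = sum(unnorm.values())
    return {c: w / total for c, w in unnorm.items()}

# A report of -1 can't be the true count, but it still shifts belief
# toward small counts: with a uniform prior on {0, 1, 2} and p = 0.5,
# the posterior is {0: 4/7, 1: 2/7, 2: 1/7}.
print(posterior_over_counts(-1, {0: 1/3, 1: 1/3, 2: 1/3}, p=0.5))
```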


Babaioff: Characterizing Truthful Multi-Armed Bandit Mechanisms

At the Economics of Search conference.

Moshe Babaioff presented an interesting paper.

Suppose that you’re conducting an auction for adwords, where you want to rank the bidders based on expected revenue in order to allocate slots and determine prices for slots based on bids. But suppose you don’t know what the clickthrough rate will be for the items.

In a multi-armed bandit model, there are multiple slot machines (bandits) and you have to decide which arms to pull. There is an explore/exploit tradeoff: you need to explore (experiment) to estimate the clickthrough rates, including some experimentation with those you have low estimates for, in case those estimates are wrong. But over time you switch to more exploitation, where you pull the arm with the highest expected value.

The new twist in this paper is that you want advertisers to truthfully reveal their valuation for a click. If clickthrough rates are known, you can set prices essentially using a second-price mechanism based on bid*clickthrough. But if you’re using a multi-armed bandit algorithm to determine clickthrough rates, the correct prices would depend on estimated clickthrough rates that you don’t necessarily see, because you don’t test them.
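For the known-clickthrough case, the single-slot version of that pricing rule might look like this (a sketch; real ad auctions differ in detail):

```python
def rank_and_price(bids, ctr):
    """'Second-price based on bid*clickthrough' for one slot: the
    winner maximizes bid * CTR and pays, per click, just enough
    that its bid would still have beaten the runner-up."""
    order = sorted(bids, key=lambda ad: bids[ad] * ctr[ad], reverse=True)
    winner, runner_up = order[0], order[1]
    price_per_click = bids[runner_up] * ctr[runner_up] / ctr[winner]
    return winner, price_per_click

# Example: B wins despite the lower bid (higher CTR) and pays 1.0
# per click, the bid at which it would just tie with A.
print(rank_and_price({"A": 2.0, "B": 1.5}, {"A": 0.05, "B": 0.10}))
```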

It’s a theory paper. They prove that, with the requirement that the mechanism induce truthful bidding, there’s always a pure exploration phase, where the selection of the winners can depend on previous clickthroughs but *not* on the bids; and then a pure exploitation phase, where the clickthroughs no longer affect the allocation of slots in the next round. The best multi-armed bandit algorithms without the truth-telling requirement don’t have that separation of phases. And, it turns out that the best algorithms without the truth-telling requirement have less “regret” relative to the best you could do if you magically knew the clickthrough rates at the beginning.
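To make the phase structure concrete, here is a toy sketch (mine, not the paper’s mechanism; in particular it omits the payment rule, which is where the truthfulness guarantee actually lives, and observe_click/show stand in for the live ad system):

```python
def explore_then_exploit(ads, bids, observe_click, show,
                         explore_rounds, total_rounds):
    """Exploration ignores the bids entirely; exploitation fixes the
    allocation using the estimated clickthrough rates and the bids."""
    clicks = {ad: 0 for ad in ads}
    shown = {ad: 0 for ad in ads}

    # Pure exploration phase: round-robin, bids never consulted.
    for t in range(explore_rounds):
        ad = ads[t % len(ads)]
        shown[ad] += 1
        clicks[ad] += observe_click(ad)   # 1 if clicked, else 0

    ctr = {ad: clicks[ad] / max(shown[ad], 1) for ad in ads}

    # Pure exploitation phase: clicks no longer change the allocation.
    winner = max(ads, key=lambda ad: bids[ad] * ctr[ad])
    for _ in range(explore_rounds, total_rounds):
        show(winner)
    return winner, ctr
```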

So now I’m curious what the best algorithms are without the truth-telling requirement. My guess is that they put more exploration into things that the best estimate so far has higher value for. We actually need to use an algorithm like this for the next version of our “conversation double pivots” work on drupal.org, where we’re going to dynamically change the set of recommended items based on a combination of a prior generated from recommender algorithms and actual observed clicks. But we don’t have any truthful-revelation requirement, so we should be able to use the standard algorithms.
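For what it’s worth, the standard algorithms do behave roughly that way. UCB1 (Auer, Cesa-Bianchi, and Fischer 2002), for example, always pulls the arm with the highest optimistic estimate, so arms that currently look good get tried more often:

```python
import math

def ucb1(n_arms, pull, rounds):
    """UCB1: pull the arm maximizing empirical mean plus an
    exploration bonus. pull(arm) is an assumed callback returning
    a reward in [0, 1] (e.g., 1 for a click on a recommended item)."""
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    for t in range(1, rounds + 1):
        if t <= n_arms:              # start by trying each arm once
            arm = t - 1
        else:
            arm = max(range(n_arms),
                      key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        counts[arm] += 1
        totals[arm] += pull(arm)
    return [totals[a] / max(counts[a], 1) for a in range(n_arms)]
```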


Rewards program for social networking activity

As part of the CommunityLab project, for the past five years I’ve been doing research related to incentives for participation in online communities. Now one of my colleagues, Yan Chen, is working with a startup company, urTurn, that has created a cross-platform rewards program. That is, you accumulate points for posting photos or making friend links in social network sites like Facebook and MySpace. Then you turn in the points for prizes.

I’m not quite sure what their business model will be (what do they get from having people accumulate points on their site?). But it will be interesting to see how motivating the points are for people, and how the company will prevent various attempts to game the system.

So, sign up, help Yan with her research (she has no financial stake in the company), and win valuable prizes!


How newsgroups refer to NetScan data

Reflections and Reactions to Social Accounting Meta-Data. Eric Gleave (U of Washington) and Marc Smith (Microsoft Research). At C&T.

In 18 months, there were about 5000 messages that explicitly referred to “netscan.research”. They analyzed/coded 952 of those messages.

Basic findings:

  • Half discuss groups; 80% of those link to the Netscan report card for the group, and 17% explicitly discuss the group’s “health”.
  • 22% discuss the message’s author, such as saying that the author is #1 in the group.
  • 31% discuss others, including their stats; 5% of these are “troll checks”.
  • 48% discuss the Netscan system itself.

Some discussion points:

  • Helpful for comparisons between competing groups on similar topics
  • Reduces costs of monitoring and sanctioning
  • Facilitates construction and maintenance of status
  • Identifies people who are trolls

Rhythms of social interaction at Facebook

Rhythms of social interaction: messaging within a massive online network. Scott A. Golder, Dennis M. Wilkinson and Bernardo A. Huberman (HP labs).

Scott Golder presenting at C&T. Log analysis of Facebook messaging patterns, from 496 North American universities.

The college weekend goes Friday noon to Sunday noon. Message traffic follows the same pattern Monday through Thursday; Friday morning looks the same as a Mon-Thurs morning, and Sunday afternoon/evening looks the same as Mon-Thurs. Saturday all day, plus Friday PM and Sunday AM, have much lower traffic.

45% of messages and pokes went to people at different schools. However, this percentage was much lower in the late night/early morning hours.

Perhaps the most surprising result is the seasonal variation in the percentage of messages that are within versus between schools. During vacations, the percentage of within-school messages increases! The authors give the plausible explanation that the messaging substitutes for the in-person communication between the same people that would occur when school is in session. This seems surprising to me, however, as I would have thought that the complementarity effect would be stronger: you send a poke or message to someone that you saw earlier today or expect to see later today. It would be interesting to see some future research that explores more directly the complementarity/substitution effects of various communication modalities with f2f meetings in everyday use.


Group Formation in Large Social Networks

L. Backstrom, D. Huttenlocher, J. Kleinberg and X. Lan. “Group Formation in Large Social Networks: Membership, Growth, and Evolution”, Proceedings of KDD 2006.

Datasets on membership in LiveJournal groups and explicit “friend” relationships; and on publishing in conferences and explicit citations between authors.

Question 1: How does the probability of joining a group depend on the friends who are already in it?
A: ‘The data suggest a “law of diminishing returns” at work, where having additional friends in a group has successively smaller effect but nonetheless continues to increase the chance of joining…’ But if a greater percentage of the friends are linked to each other, the probability of joining is even higher. They note that a “strength of weak ties” argument would predict the opposite of this finding (you’re more likely to find out new info from weak ties who don’t know each other). But I think decisions about joining require much more than just finding out about the community. (See the next blog entry on what makes people commit to/stay in a community.)

Question 2: Which communities will grow over time?
A: Here the characteristics provide a little less predictive power. One obvious predictor, given the result above: if there are a lot of people who have a lot of friends in the group, then the group will have larger growth in the next time period. Somewhat more puzzling is that the more three-person cliques in the group, the less the group grows. This could reflect that stagnant groups eventually develop more links among members, and hence more cliques.

Question 3: “given a set of overlapping communities, do topics tend to follow people, or do people tend to follow topics?”
A: More frequently, people active in a conference where a topic is hot start going to other conferences where the topic is already hot, rather than the transplantation of people causing the topic to become hot.


Kraut: Developing Commitment Through Conversation

Today I’m at the Communities and Technologies conference, at the workshop on studying interaction in online communities.

Bob Kraut is discussing some of the data analysis issues in his study in Usenet newsgroups of what independent variables predict whether a message would get responded to.

They first used machine-learning techniques to identify the signature of messages that contain a “self-introduction”. Then they used that as a regressor, along with some directly measurable variables like the use of first-person pronouns.
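A sketch of that two-stage setup with scikit-learn (my reconstruction of the shape of the analysis, not their actual code; all the variable names are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def fit_reply_model(train_texts, train_intro_labels,
                    all_texts, first_person_rate, got_reply):
    """Stage 1: learn a self-introduction detector from hand-labeled
    messages. Stage 2: use its score as one regressor, next to a
    directly measurable feature (rate of first-person pronouns),
    to predict whether a message got a response."""
    vec = TfidfVectorizer()
    intro_clf = LogisticRegression().fit(
        vec.fit_transform(train_texts), train_intro_labels)

    # Probability that each message contains a self-introduction.
    intro_score = intro_clf.predict_proba(vec.transform(all_texts))[:, 1]

    X = [[s, fp] for s, fp in zip(intro_score, first_person_rate)]
    return LogisticRegression().fit(X, got_reply)
```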

He and Moira Burke have a paper tomorrow where they did a controlled experiment. They found that the key ingredient is saying that you’re part of the community, not that you share the interest/condition around which the group has formed.
