How newsgroups refer to NetScan data

Reflections and Reactions to Social Accounting Meta-Data. Eric Gleave (U of Washington) and Marc Smith (Microsoft Research). At C&T.

In 18 months, there were about 5000 messages that explicitly referred to “netscan.research”. Analyzed/coded 952 messages.

Basic findings:

  • Half discuss groups. Of those, 80% link to the Netscan report card for the group, and 17% explicitly discuss the group’s “health”.
  • 22% discuss the message’s author, such as saying that the author is #1 in the group.
  • 31% discuss others, including their stats; 5% of these are “troll checks”
  • 48% discuss the Netscan system itself

Some discussion points:

  • Helpful for comparisons between competing groups on similar topics
  • Reduces costs of monitoring and sanctioning
  • Facilitates construction and maintenance of status
  • Identifies people who are trolls
Posted in Uncategorized | Leave a comment

Rhythms of social interaction at Facebook

Rhythms of social interaction: messaging within a massive online network. Scott A. Golder, Dennis M. Wilkinson and Bernardo A. Huberman (HP labs).

Scott Golder presenting at C&T. Log analysis of Facebook messaging patterns, from 496 North American universities.

The college weekend runs Friday noon to Sunday noon. Message traffic follows the same daily pattern Monday through Thursday. Friday morning looks like a Mon-Thurs morning, and Sunday afternoon/evening looks like a Mon-Thurs afternoon/evening. Traffic is much lower all day Saturday, plus Friday afternoon and Sunday morning.

45% of messages and pokes went to people at different schools. However, this percentage was much lower in the late night/early morning hours.

Perhaps the most surprising result is the seasonal variation in the percentage of messages that are within versus between schools. During vacations, the percentage of within-school messages increases! The authors give the plausible explanation that the messaging is substituting for in-person communication between the same people that would occur when school is in session. This seems surprising to me, however, as I would have thought that the complementarity effect would be stronger– you send a poke or message to someone that you saw earlier today or expect to see later today. It would be interesting to see some future research that explores more directly the complementarity/substitution effects of various communication modalities with f2f meetings in everyday use.


Group Formation in Large Social Networks

L. Backstrom, D. Huttenlocher, J. Kleinberg and X. Lan. “Group Formation in Large Social Networks: Membership, Growth, and Evolution”, Proceedings of KDD 2006.

Datasets on membership in LiveJournal groups and explicit “friend” relationships; and on publishing in conferences and explicit citations between authors.

Question 1: How does the probability of joining a group depend on the friends who are already in it?
A: ‘The data suggest a “law of diminishing returns” at work, where having additional friends in a group has successively smaller effect but nonetheless continues to increase the chance of joining…’ But if a greater percentage of the friends are linked to each other, the probability of joining is even higher. They suggest that a “strength of weak ties” argument would suggest the opposite of this finding (you’re more likely to find out new info from weak ties who don’t know each other). But I think decisions about joining require much more than just finding out about the community. (See next blog entry on what makes people commit to/stay in a community.)

Question 2: Which communities will grow over time?
A: Here the characteristics provide a little less predictive power. One obvious predictor, given the result above: if many people have a lot of friends in the group, the group will grow more in the next time period. Somewhat more puzzling is that the more three-person cliques a group contains, the less it grows. This could reflect that stagnant groups eventually develop more links among members, and hence more cliques.

Question 3: “given a set of overlapping communities, do topics tend to follow people, or do people tend to follow topics?”
A: More frequently, people active in a conference where a topic is hot start going to other conferences where the topic is already hot, rather than the transplantation of people causing the topic to become hot.


Kraut: Developing Commitment Through Conversation

Today I’m at the Communities and Technologies conference, at the workshop on studying interaction in online communities.

Bob Kraut is discussing some of the data analysis issues in his study of Usenet newsgroups, which asks what independent variables predict whether a message gets responded to.

They first used machine learning techniques to identify the signature of messages that contain a “self-introduction”. Then they used that as a regressor, along with some directly measurable variables like use of first-person pronouns.

He and Moira Burke have a paper tomorrow where they did a controlled experiment. They found that the key ingredient is saying that you’re part of the community, not that you share the interest/condition around which the group has formed.


Collusion-resistant, Incentive-compatible

At EC-07, Radu Jurca presented a paper extending work on eliciting honest ratings to consider situations where a set of players may collude to increase their payments for ratings. The setting is the same as that of my paper with Nolan Miller and Richard Zeckhauser, on the “The Peer Prediction Method“. That is, a set of raters are scored based on comparing their reports to the reports of other raters– there is no ultimate ground truth of whether the item is “good” that can be used to evaluate the raters.

Our paper showed that it is possible to construct payments that make honest reporting a Nash Equilibrium (i.e., the best thing to do if others are doing it) while creating an expected reward large enough to encourage the effort required for the raters to acquire a quality signal about the item. The technique is based on proper scoring rules, applied to the posterior distribution for a reference rater, computed from the prior distribution and the rater’s report. Jurca and Faltings consider whether it’s possible to make such incentive payments resistant to collusion (e.g., all raters agree to report that the item is good).
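The proper-scoring-rule construction can be sketched concretely. The numbers below (a binary item type, uniform prior, 0.8 signal accuracy) and the choice of the log scoring rule are my illustrative assumptions, not taken from either paper:

```python
import math

# Illustrative binary model: item type G or B, uniform prior;
# a rater's signal matches the true type with probability 0.8.
PRIOR_G = 0.5
ACC = 0.8

def posterior_g(report):
    """P(type = G | a rater reports 'g' or 'b'), assuming honest reporting."""
    like_g = ACC if report == "g" else 1 - ACC
    like_b = 1 - ACC if report == "g" else ACC
    return PRIOR_G * like_g / (PRIOR_G * like_g + (1 - PRIOR_G) * like_b)

def pred_ref_g(report):
    """Predictive prob. that a reference rater reports 'g', given my report."""
    pg = posterior_g(report)
    return pg * ACC + (1 - pg) * (1 - ACC)

def payment(report, ref_report):
    """Log scoring rule applied to the report-implied predictive distribution."""
    p = pred_ref_g(report)
    return math.log(p if ref_report == "g" else 1 - p)

def expected_pay(signal, report):
    """Expected payment when my true signal is `signal` and I report `report`."""
    truth = pred_ref_g(signal)  # true distribution of the reference report
    return truth * payment(report, "g") + (1 - truth) * payment(report, "b")
```

Because the log rule is strictly proper, `expected_pay("g", "g")` exceeds `expected_pay("g", "b")` (and symmetrically for a “b” signal), so honest reporting is a best response when the reference rater is honest.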

Interestingly, the authors find that it is useful to make incentive payments based on the ratings of more than one reference rater. Instead of just adding up the payments determined independently by each of their reports, which I assumed would be the most effective way to do it, the payments are tied to a count of the number of reference raters who report that the item is good. Consider, for example, if the implied probability distribution for each of the reference raters is that each will report “good” with probability 0.6. Then, the number who will report good follows a binomial distribution. By carefully choosing the points a rater gets for reporting “good” or “bad” when n other people report “good”, it is possible to rule out some forms of collusion. For example, with 10 raters and a prior probability distribution that each will report “good” with probability 0.5, it is easy to see that we can make the payoff be 0 when either none or all report “good”, yet make the payoff for 6 total “goods” when you report good be high enough that you will want to report “good” whenever you see it, if you think others will report honestly. Nolan Miller, Richard Zeckhauser and I had the basic intuition that we could punish all the raters if there was “more than the expected amount of agreement”. This fleshes out that intuition with a concrete way of setting the incentive payments.

The most interesting result in this paper comes in section 7, which considers “sybil attacks”. One person controls several raters, which I’ll refer to as sybils (split identities of the person). They each acquire a real signal. The person is trying to maximize the sum of the expected payoffs of the raters. The authors find that, depending on the particular prior distribution, if one or just a few reference raters is assumed to act honestly, the incentive payoffs can be constructed so that even if the rest of the raters are sybils controlled by a single entity, they cannot do better than to report the same number of “good” ratings as they actually perceived. The technique is a brute force approach (automated mechanism design) that just writes down each of the incentive compatibility constraints (for each possible number of good ratings perceived, the expected payoff given the distribution of ratings from the honest raters, is higher for honest reporting than for any false report) and then solves the linear programming problem to find the smallest expected payment subject to those constraints. It would be nice to get some stronger intuitions about what kind of payments will be selected by the brute force approach. That is, how is it leveraging the small number of honest raters to drive the colluding raters toward honest reporting? Still, I laud the authors for fine work in demonstrating that it generally is possible to resist such collusion, so long as they expect there to be a few honest raters around.
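The brute-force approach can be sketched in a much smaller setting than the paper’s: one strategic rater scored against three reference raters assumed honest, a binary type, and made-up signal probabilities. The program writes down the incentive-compatibility constraints and asks a linear programming solver for the cheapest payments satisfying them (this is my own toy version, not the paper’s exact program; it needs SciPy):

```python
from math import comb
from scipy.optimize import linprog

# Toy setup (assumed numbers): binary type, uniform prior, signal accuracy 0.8,
# one strategic rater scored against M = 3 honest reference raters.
ACC, M, EPS = 0.8, 3, 0.01

def p_k_given_signal(signal):
    """P(k of the M reference raters report 'good' | my signal)."""
    # With a uniform prior, P(type G | signal) equals the signal accuracy.
    post_g = ACC if signal == "g" else 1 - ACC
    out = []
    for k in range(M + 1):
        bin_g = comb(M, k) * ACC**k * (1 - ACC)**(M - k)
        bin_b = comb(M, k) * (1 - ACC)**k * ACC**(M - k)
        out.append(post_g * bin_g + (1 - post_g) * bin_b)
    return out

pg, pb = p_k_given_signal("g"), p_k_given_signal("b")

# Decision variables: tau(r, k) = payment for reporting r when k references
# say "good", flattened as x = [tau_g0..tau_g3, tau_b0..tau_b3].
# Objective: minimize the expected payment under honest reporting.
c = [0.5 * p for p in pg] + [0.5 * p for p in pb]

# IC constraints in A_ub @ x <= b_ub form:
#   signal g: E[tau_g] - E[tau_b] >= EPS
#   signal b: E[tau_b] - E[tau_g] >= EPS
A_ub = [[-p for p in pg] + list(pg),
        list(pb) + [-p for p in pb]]
b_ub = [-EPS, -EPS]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 8)
tau_g, tau_b = res.x[:4], res.x[4:]
```

The paper’s program is the same idea with many more constraints (one per possible misreport by the sybil coalition); solving it yields the minimal expected payment consistent with honesty.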

Radu Jurca and Boi Faltings, “Collusion-resistant, Incentive-compatible Feedback Payments”, Proceedings of ACM EC ’07, pp. 200-209.


Recommenders and Sales Diversity

At the EC ’07 conference, Kartik Hosanagar presented a paper modeling the impact of recommender systems on sales diversity. Do they contribute to a long tail, where lots of products get a few sales each, or do they reinforce blockbusters? The paper suggests the latter.

There are actually two effects that we should expect from recommenders. One is discovery– once one person discovers an item, some other people with similar tastes who would not have found that item do find it. The other is reinforcement– an item that many people have sampled will be more likely to get recommended.

The paper provides a simple two-item, two-player, two-urn model in section 4. Unfortunately, it begins with an assumption that both players have the same probabilities of choosing the two items in the absence of a recommender. Without diversity in what people choose when there is no recommender, the model doesn’t seem to capture the discovery effect of recommenders.

Section 5 seems to provide a more promising simulation framework. Consumers have different “ideal points” in the space, and thus are likely to select some distribution of items in the absence of a recommender. The recommender increases the salience of some items to people whose ideal points are a little farther away. Even here, however, it doesn’t quite seem to capture the phenomenon that the recommender makes salient an item that is in fact closer to the consumer’s ideal than what the consumer would have found on their own. It seems to me that you’d need a variant of the Hotelling model with a separate model of item salience that is not completely determined by the distance from the customer’s ideal. Things that are already blockbusters would be more likely to be noticed and chosen, even if farther from the customer’s ideal. That’s roughly how the recommender is modeled, but I think it needs to be applied to the base choice model, not just the effect of the recommender system.
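The variant I have in mind might look something like this deterministic sketch (entirely my own toy model, not the paper’s): consumers on a Hotelling line choose items in proportion to salience times closeness, and each period’s sales shares feed back into the next period’s salience:

```python
import math

# Toy Hotelling line with sales-driven salience. All numbers are illustrative.
ITEMS = [0.1, 0.5, 0.9]                    # item positions on the taste line
CONSUMERS = [i / 100 for i in range(101)]  # ideal points, uniform grid

def base_attraction(x, y, scale=0.2):
    """Closer items are more attractive, exponentially in distance."""
    return math.exp(-abs(x - y) / scale)

def market_shares(salience):
    """Each consumer splits choice in proportion to salience * attraction."""
    sales = [0.0] * len(ITEMS)
    for x in CONSUMERS:
        weights = [s * base_attraction(x, y) for s, y in zip(salience, ITEMS)]
        total = sum(weights)
        for i, w in enumerate(weights):
            sales[i] += w / total
    total_sales = sum(sales)
    return [s / total_sales for s in sales]

# Baseline: shares under equal salience (no feedback).
no_feedback = market_shares([1.0, 1.0, 1.0])

# Reinforcement: last period's sales shares become this period's salience.
shares = [1 / 3] * 3
for _ in range(20):
    shares = market_shares(shares)
```

With these numbers, the centrally located item starts out most popular, and the feedback loop then pushes its share above the equal-salience baseline: salience reinforcement concentrating sales, as in the paper’s blockbuster story.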

D. Fleder and K. Hosanagar, “Recommender Systems and Their Impact on Sales Diversity”, Proceedings of ACM EC ’07, pp. 192-199.


Peer Prediction Method with Reduced Payments

ACM EC 06. Jurca and Faltings

The work extends my “Peer Prediction” paper, written with Nolan Miller and Richard Zeckhauser, on eliciting honest reports by comparing raters’ reports to each other.

Automatically selects a scoring rule, with lower expected payments but still incentive compatible.

Has some mechanism for probabilistically filtering out unusual ratings. I’ll have to look at the paper to see the details of this.

Claims that the honest reporting equilibrium is evolutionarily stable, meaning that small coalitions can’t attack it. Again, I’ll have to take a look at this.


Collaborative Filtering with Privacy

ACM EC ’06. Presentation on privacy-preserving collaborative filtering.

Previous approaches:

  • secure multi-party computation to compute eigenvectors (Canny).
  • add noise to each rating

This paper shows that adding noise may not preserve as much privacy as you’d like. If the noise for each rating is a random draw from the same distribution, and if there is a finite set of possible ratings, then you can make a pretty good backward inference about what the original ratings were. The basic idea is…
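The notes above leave the basic idea unstated, but a generic illustration of the vulnerability (my own toy Bayes calculation, not the paper’s analysis) goes like this: suppose ratings lie on a 1-5 scale and the additive noise distribution is public. An attacker can then form a posterior over the true rating from the disguised value:

```python
# Toy illustration: ratings in {1..5}, uniform prior (an assumption),
# and a known additive noise distribution. Bayes' rule recovers a
# posterior over the true rating from one disguised value.
RATINGS = [1, 2, 3, 4, 5]
PRIOR = {r: 0.2 for r in RATINGS}      # assumed uniform prior over ratings
NOISE = {-1: 0.25, 0: 0.5, 1: 0.25}    # known additive noise distribution

def posterior(observed):
    """P(true rating | observed = rating + noise), by Bayes' rule."""
    scores = {r: PRIOR[r] * NOISE.get(observed - r, 0.0) for r in RATINGS}
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}

post = posterior(5)  # e.g. a disguised value of 5
```

Here an observed value of 5 already pins the true rating down to {4, 5}, with posterior 2/3 on 5 versus a prior of only 0.2, so the finite rating scale plus a known noise distribution leaks a lot.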

The solution in this paper is to have users add a variable amount of noise to their ratings, not the same draw for each item.

I haven’t had a chance to read the paper in detail yet, but it seems quite elegant. I hope I’ll be able to use it in my recommender systems course this fall, though the math may be too advanced.


Sponsored Search Auction Mechanisms

The current session has several papers on auction mechanisms for selling the ad placements displayed with sponsored search results.

Lahaie, analysis of alternative auction designs, including Yahoo and Google’s current mechanisms. Offers an overview of the design space.

Mahdian and Saberi, MSR. An online algorithm, meaning that you have to decide which advertiser gets each search without knowing how many more searches there will be. Based on picking a single price to charge all advertisers. I may be missing something, but the problem setup doesn’t seem to match real advertising allocation problems, and the solution seems to unnecessarily restrict to a fixed price for all advertisers, rather than the kinds of mechanisms in the previous and next papers.

Aggarwal, Google presentation, Aggarwal et al. The current mechanism: an advertiser makes a per-click dollar bid (for a particular search keyword). Google orders the bids based on bid*estimated-clickthrough-percentage. If you’re in slot j, you pay a rate based on the bid for slot j+1. This seems like it might be a nice generalization of the 2nd price auction mechanism, but it’s not– it’s not incentive-compatible. She presented the design for a new mechanism in which truthful bidding is best, assuming others are bidding truthfully. For some reason, she said you can’t use a VCG mechanism unless a “separability” condition holds. But the actual mechanism she presented is, I think, a VCG mechanism. Perhaps I’m missing something, or perhaps she has a more restricted idea of what a VCG mechanism is. The mechanism she presents is only incentive-compatible if there are no budget constraints that tie different auctions together, and no repeated-game effects where revealing your preferences today affects your opponents’ behavior in tomorrow’s auctions.
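The next-price rule, and the reason it is not incentive-compatible, can be sketched with made-up numbers (quality scores are assumed equal here, so ranking is by bid alone):

```python
def gsp_outcome(bids, clicks):
    """Rank bidders by bid; the winner of slot j pays slot j+1's bid per click.

    bids: dict bidder -> per-click bid; clicks: clicks per slot, best first.
    Returns dict bidder -> (slot, price_per_click). Equal quality scores assumed.
    """
    order = sorted(bids, key=bids.get, reverse=True)
    outcome = {}
    for slot in range(len(clicks)):
        price = bids[order[slot + 1]] if slot + 1 < len(order) else 0.0
        outcome[order[slot]] = (slot, price)
    return outcome

def utility(bidder, value, bids, clicks):
    """Per-click value minus price, times the slot's click volume (0 if no slot)."""
    outcome = gsp_outcome(bids, clicks)
    if bidder not in outcome:
        return 0.0
    slot, price = outcome[bidder]
    return (value - price) * clicks[slot]

CLICKS = [100, 80]                 # two slots, illustrative click volumes
RIVALS = {"b2": 9.0, "b3": 6.0}    # illustrative rival bids

truthful = utility("b1", 10.0, {"b1": 10.0, **RIVALS}, CLICKS)  # slot 0, pay 9
shaded = utility("b1", 10.0, {"b1": 7.0, **RIVALS}, CLICKS)     # slot 1, pay 6
```

Bidding truthfully earns b1 (10-9)*100 = 100, while shading to 7 earns (10-6)*80 = 320, so truthful bidding is not a dominant strategy under the next-price rule.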

Estimating click-through rates for ads, without actually paying the full cost of putting your ad up and measuring it. This estimate is useful for optimizing your bidding.


ACM EC 06: Fudenberg invited lecture

I’m at the ACM EC conference for the next couple days. Computer Science theory/algorithms/AI people looking at economic incentive issues.

This talk: “Stable Superstitions and Rational Steady State Learning”, given by Drew Fudenberg (joint work with Levine)

(These are scattered notes taken during the actual talk. If it seems to the reader that it’s getting at something interesting, you can probably get better intuitions about it, and more accurate characterization of results, from a paper, or a set of slides posted by Levine.)

Context: Learning in games. Anonymous random matching. Some history of previous papers that went too fast to capture.

“Self-confirming equilibrium”: less restrictive than Nash. No one can do better with “rational experimentation.” Nash requires people to know what would happen if they deviate.

Agents off the equilibrium path play infrequently, so they have much less incentive to experiment. Wrong beliefs one step off the equilibrium path can’t be stable, but wrong beliefs two steps off can.

Illustration: Hammurabi’s second law. The accused person is thrown in the river. If he lives, the accuser is killed; if he dies, the accuser gets his property.
Superstition: the guilty are more likely to drown than the innocent. This superstition is stable: accusers rarely get to find out, because if they believe it, they won’t accuse the innocent, and so they never test the belief.
Alternative superstition: the guilty will be struck by lightning. This superstition is not stable. Kids try petty crime and discover they’re not struck by lightning.
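A toy simulation (my own, not from the talk) of why the two superstitions differ: an agent gathers data about a punishment only by acting, so a belief strong enough to deter the action is never tested, while a belief weak enough to permit experimentation gets falsified:

```python
def run_belief(believed_punish_prob, punishment_is_real, rounds=1000):
    """An agent tests a 'this act is punished w.p. p' belief only when acting
    looks worthwhile; beliefs about untaken actions never receive data."""
    observations = []
    belief = believed_punish_prob
    for _ in range(rounds):
        expected_gain = 1.0 - belief * 10.0  # gain 1 if unpunished, lose 10 if punished
        if expected_gain <= 0:
            continue                         # never acts, so never observes
        punished = punishment_is_real        # deterministic toy world
        observations.append(punished)
        belief = sum(observations) / len(observations)
    return belief, len(observations)

# Drowning superstition: the accuser believes a false accusation is punished
# for sure, so never tries one -- the belief survives with zero observations.
drown_belief, drown_obs = run_belief(1.0, punishment_is_real=False)

# Lightning superstition: kids discount it enough to experiment; nothing
# happens, and the belief collapses to zero.
light_belief, light_obs = run_belief(0.05, punishment_is_real=False)
```

The deterring belief ends the run exactly where it started, with no data against it; the experimentable one is driven to zero, matching the stable/unstable contrast above.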

Rational Steady-State Learning
Agent’s decision problem: each agent in role i expects to play T times. The agent observes only the terminal node each time, and believes it faces a time-invariant distribution of opponents’ strategies. (This is wrong, but hopefully a reasonable model of how people would actually be thinking.) Steady states are where people play strategies that are optimal given the information they have from the previous rounds.

Results focus on characterizing steady states as T tends to infinity: most players have lots of observations of play (but only rational experimentation in those rounds of play), and there are few novices in the game in any round.

Asymptotic result for the Hammurabi case: there will be no crimes (in the limit of arbitrarily long lifetimes). With long but finite T, some crimes are committed, some false accusations take place, and people making false accusations learn that they work. But if there are few opportunities for being a witness, then there’s no rational interest in experimenting with false accusation, because you won’t get to do it very often even if you find out that false accusation works.

The model highlights the role of experimentation in determining when a superstition is likely to survive.

David Parkes: question about applications to Sponsored Search design– implications for encouraging experimentation or sharing information learned from experimentation.
