Research and Analysis of Tail Phenomena Symposium
August 20th, 2010 - Sunnyvale, CA
The last decade has witnessed the emergence of enormous-scale artifacts resulting from the independent action of hundreds of millions of people; for example, web repositories, social networks, mobile communication patterns, and consumption in "limitless" stores. The stochastic analysis of these processes often reveals heavy-tailed distributions, and understanding them has become a common challenge for a plethora of scientists: statisticians, physicists, sociologists, economists, computer scientists, etc. It is time for all of us to get together! To this end we invite you to participate in the first Research and Analysis of Tail Phenomena Symposium (RATS), which will explore the different computational, statistical, and modeling problems related to tail phenomena. The symposium will consist of a series of longer invited talks by experts in the field and a number of shorter contributed presentations. We particularly encourage summer interns at any of the Bay Area research centers to join us at the event.
Friday, August 20th, 2010
We will start with a video welcome by Chris Anderson (Wired), followed by a series of invited talks by Michael Mitzenmacher (Harvard), Aaron Clauset (Univ. of Colorado), Sharad Goel (Yahoo! Research, NY), Neel Sundaresan (eBay), Michael Schwarz (Yahoo! Research, CA) and Silvio Lattanzi (La Sapienza Univ. of Rome).
| 9:00 - 9:30 | Introduction, video welcome by Chris Anderson |
| 9:30 - 10:30 | Brief history of generative models by Michael Mitzenmacher [Slides] |
| 10:30 - 11:00 | |
| 11:00 - 12:00 | Power law distributions in empirical data by Aaron Clauset [Slides] |
| 12:00 - 1:00 | |
| 1:00 - 2:00 | Anatomy of the Long Tail by Sharad Goel [Slides] |
| 2:00 - 3:00 | Shapes and Shops: A Long Tail Commerce Network by Neel Sundaresan [Slides] |
| 3:00 - 3:30 | |
| 3:30 - 4:30 | Bayesian Decision Making under Extreme Uncertainty by Michael Schwarz [Slides (dvi), Slides (ppt)] |
| 4:30 - 5:00 | Models for the compressible web by Silvio Lattanzi [Slides] |
Building E, Rooms 9-10
700 First Avenue,
Sunnyvale, CA 94089
Speaker Biographies and Abstracts
- Chris Anderson is editor-in-chief of Wired, which has won a National Magazine Award for general excellence three times during his tenure. He wrote an article in the magazine entitled The Long Tail, which he expanded upon in the book The Long Tail: Why the Future of Business Is Selling Less of More (2006). He is also the founder and chairman of BookTour.com. His newest book, Free: The Future of a Radical Price, which examines the rise of pricing models that give products and services to customers for free, was released in 2009 by Hyperion.
- Title: A brief history of generative models for power law and lognormal distributions
Michael Mitzenmacher, Harvard University
Abstract: At some point in the past, I became interested in a debate over whether file size distributions are best modeled by a power law distribution or a lognormal distribution. In trying to learn enough about these distributions to settle the question, I found a rich and long history, spanning many fields. Indeed, several models from the computer science community for things like page links in the World Wide Web have antecedents in work on power law and lognormal distributions from decades ago. Here, I briefly survey some of this history, focusing on underlying generative models that lead to these distributions. One finding is that lognormal and power law distributions connect quite naturally, and hence, it is not surprising that lognormal distributions have arisen as a possible alternative to power law distributions across many fields. Based on this history, I suggest some high level issues for future research, particularly in computer science. I propose that there are five stages in power law research: observe, signify, model, validate, and control. I argue that the key research issues currently are in the areas of validation and control.
Michael Mitzenmacher is a Professor of Computer Science in the School of Engineering and Applied Sciences at Harvard University. Michael has authored or co-authored over 140 conference and journal publications on a variety of topics, including Internet algorithms, hashing, load-balancing, erasure codes, error-correcting codes, compression, bin-packing, and power laws. His work on low-density parity-check codes shared the 2002 IEEE Information Theory Society Best Paper Award and won the 2009 SIGCOMM Test of Time Award. His textbook on probabilistic techniques in computer science, co-written with Eli Upfal, was published in 2005 by Cambridge University Press.
Michael Mitzenmacher graduated summa cum laude with a degree in mathematics and computer science from Harvard in 1991. After studying math for a year in Cambridge, England, on the Churchill Scholarship, he obtained his Ph.D. in computer science at U.C. Berkeley in 1996. He then worked at Digital Systems Research Center until joining the Harvard faculty in 1999.
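As a loose illustration of what a "generative model" for these distributions can look like, the sketch below simulates one standard textbook example, a purely multiplicative process. It is not taken from the talk or its slides. The uniform growth factors, the step count, and all function names are assumptions chosen for illustration. Because the logarithm of the product is a sum of independent terms, the central limit theorem suggests the log-values should look roughly normal, so the values themselves look roughly lognormal.

```python
import math
import random


def multiplicative_process(steps, rng, low=0.5, high=1.5):
    """Return the value of x after `steps` random multiplicative updates.

    The uniform factor range [low, high] is an illustrative assumption.
    """
    x = 1.0
    for _ in range(steps):
        x *= rng.uniform(low, high)
    return x


rng = random.Random(1)  # fixed seed so the run is repeatable
logs = [math.log(multiplicative_process(200, rng)) for _ in range(5000)]

# Summary statistics of the log-values: if the values are close to
# lognormal, the logs should be close to normal, so skewness is near zero.
mean = sum(logs) / len(logs)
stdev = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))
skew = sum(((v - mean) / stdev) ** 3 for v in logs) / len(logs)

print(f"mean of log-values:     {mean:.3f}")
print(f"stdev of log-values:    {stdev:.3f}")
print(f"skewness of log-values: {skew:.3f}")
```

A skewness close to zero is only a rough sanity check, not a test of lognormality; a careful comparison against a power law would need the kind of validation the abstract calls for.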
- Title: Power-law distributions in empirical data
Aaron Clauset, University of Colorado at Boulder
Abstract: Power-law distributed quantities occur in many complex systems and have significant consequences both for understanding their underlying structure and for estimating the likelihood of large future events. Unfortunately, the detection and characterization of power laws is complicated by the large fluctuations that occur in the distribution's tail and by the difficulty of identifying the range over which power-law behavior holds. In this talk, I'll survey recent advances in statistical methods for discerning and quantifying power-law behavior in empirical data. As examples, I'll apply these methods to analyze several real-world data sets from a range of disciplines, such as the frequency of English words in text, the severity of terrorist attacks, and the degree distribution of the Internet, each of which has been conjectured to follow a power-law distribution. The results are that in some cases these conjectures are consistent with the data while in others the power law is ruled out. Time allowing, I'll comment briefly on the general utility of fitting and validating power laws and comparing them with other simple distributional models.
Aaron Clauset is an Assistant Professor of Computer Science at the University of Colorado at Boulder and a Fellow in the Colorado Initiative for Molecular Biotechnology. He holds a B.S. in Physics from Haverford College, a Ph.D. in Computer Science from the University of New Mexico, and was an Omidyar Fellow at the Santa Fe Institute. His research focuses on developing computational tools for detecting and characterizing patterns in large social, biological and technological data sets, and models for explaining the origin of these patterns. His research has spanned computer science, physics, chemistry, biology, applied mathematics and political science, and has been published in Nature, Science, Physical Review Letters, the Journal of the Association for Computing Machinery, and the Journal of Conflict Resolution.
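To give a flavor of the estimation problem described in the abstract above, here is a minimal sketch of the standard maximum-likelihood estimate of the exponent of a continuous power law, fitted to synthetic data. It is not the speaker's code. It assumes the lower cutoff `xmin` is already known, which sidesteps the hard part the abstract mentions (identifying the range over which power-law behavior holds). The function names and parameter values are hypothetical.

```python
import math
import random


def sample_power_law(alpha, xmin, n, rng):
    """Draw n samples from p(x) proportional to x**(-alpha), x >= xmin,
    using inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(data, xmin):
    """Maximum-likelihood estimate of alpha for the tail x >= xmin,
    together with its approximate standard error."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = sample_power_law(alpha=2.5, xmin=1.0, n=20000, rng=rng)
alpha_hat, std_err = fit_alpha(samples, xmin=1.0)
print(f"alpha_hat = {alpha_hat:.3f} +/- {std_err:.3f}")
```

Recovering the exponent from data that really is power-law distributed is the easy direction; it says nothing about whether a power law is a plausible model for a given empirical data set, which requires goodness-of-fit checks and comparison with alternatives such as the lognormal.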
- Title: Anatomy of the Long Tail: Ordinary People With Extraordinary Tastes
Sharad Goel, Yahoo! Research
Abstract: The success of "infinite-inventory" retailers such as Amazon and Netflix has been ascribed to a long tail phenomenon. While the majority of their inventory is not in high demand, in aggregate these "worst sellers," unavailable at limited-inventory competitors, generate a significant fraction of total revenue. The long tail phenomenon, however, is in principle consistent with two fundamentally different theories. The first hypothesis is that a majority of consumers consistently follow the crowds, with only a minority showing any interest in niche content; the second is that everyone is a bit eccentric, consuming both popular and specialty products. Examining user preferences for movies, music, web search, and web browsing, we find overwhelming support for the latter theory. Our findings thus suggest an additional factor in the success of infinite-inventory retailers: tail availability may boost head sales by offering consumers the convenience of one-stop shopping for both their mainstream and niche interests. Hence, the return-on-investment of niche products may go beyond direct revenue, extending to second-order gains associated with increased consumer satisfaction and repeat patronage. This is joint work with Andrei Broder, Evgeniy Gabrilovich, and Bo Pang.
Sharad Goel works in the Microeconomics and Social Systems group in Yahoo! Research. He is interested in empirical and theoretical problems at the intersection of computer science and the social sciences, particularly questions motivated by sociology and economics. Sharad received a PhD in Applied Mathematics from Cornell. Prior to joining Yahoo!, he was a research fellow in the math departments at Stanford and the University of Southern California.
- Title: Shapes and Shops: A Long Tail Commerce Network
Neel Sundaresan, eBay Research Labs
Abstract: In this talk we discuss a complex long tail commerce network: the eBay Marketplace. The asymmetric and preferential nature of buying and selling, in terms of structure, interactions, trust and reputation measures, and the evolution of these, has tremendous value in identifying business opportunities and building effective user applications. We will discuss the analysis of the macroscopic shapes and behaviors of the networks across the platform and also in specific marketplace categories. We will also discuss the dynamics of search experience, seller behavior, and reputation measures that are influenced by the long tail nature and other properties of this network.
Neel Sundaresan is a Senior Director and head of eBay Research Labs. His current areas of research interest include Social and Incentive Networks, Trust and Reputation Systems, and Machine Learning as applied to Recommender Systems, Classification, and Search. Prior to joining eBay he was a founder and CTO of a startup focused on multi-attribute fuzzy search and network CRM. Prior to this he was the head of the eMerging Internet Technologies group at the IBM Almaden Research Center. There he built the first XML-based Search Engine. He was one of the early leaders in building XML technologies including schema-aware compression algorithms, application component generators and pattern-match systems and compilers. He built the first RDF reference implementation as a W3C standard recommendation. He led research work in other areas like domain specific search engines, multi-modal interfaces and assistive technologies, semantic transcoding, web mining, query systems, and classification for semi-structured data. Prior to this he worked on C++ compiler and runtime systems for massively parallel machines and for shared memory systems and also on retargetable compilers, program translators and generators. He has over 50 research publications and over 55 patents to his credit. He has been a frequent speaker at several national and international technology conferences.
- Title: Bayesian Decision Making under Extreme Uncertainty
Michael Schwarz, Yahoo! Research
Abstract: Economic variables ranging from distributions of cities or firms by size to returns of financial assets tend to have very heavy tails. I will start with an overview of the role of heavy tails in finance and economics and proceed to an axiomatic model where beliefs about an unknown distribution can be defined as a measure on the space of probability density functions. I will introduce invariance restrictions on an agent's beliefs and present results that (1) characterize the set of beliefs that are invariant and (2) establish that updating invariant beliefs leads to a posterior probability density function whose tail asymptotically approaches $c / (x (\ln x)^n)$, where x is the value of a random variable and c and n are constants.
Michael Schwarz is a Principal Research Scientist at Yahoo! Research in Berkeley, CA. Prior to joining Yahoo! he was a National Fellow at the Hoover Institution at Stanford and a Robert Wood Johnson Foundation Scholar at UC Berkeley. He was on the faculty of the Harvard University Economics Department from 1999 to 2004.
He is also a member of the NBER. Dr. Schwarz specializes in economic theory and its applications to business decision making and public policy. He works on a wide range of topics that include auction theory, decision making under uncertainty, economics of standards, economics of drug procurement and Medicare part D, and microstructure of financial markets. His current work applies game theory to the market for Internet advertisement and marketplace design.
Dr. Schwarz's work has appeared in scholarly journals including AER and RAND, among others. His academic papers have received attention in the popular press. Business Week reported that as a result of the publication of a paper by Dr. Schwarz and two of his students, "Close-mouthed Google has opened up about AdWords since the three economists cracked its code" (March 6, 2006). Dr. Schwarz's work has been profiled in a number of other outlets, including the front page of the Wall Street Journal.
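A short worked calculation may help convey how heavy the tail mentioned in the abstract above is. It is not taken from the talk. It assumes the tail density is read as c divided by x times the n-th power of ln x, and that n > 1 and the lower limit a > 1; both are assumptions made here so that the integral converges.

```latex
% Normalization of the tail (substitute u = \ln x, so du = dx / x):
\int_a^\infty \frac{c}{x (\ln x)^n}\, dx
  = c \int_{\ln a}^\infty u^{-n}\, du
  = \frac{c}{(n-1)\,(\ln a)^{n-1}} , \qquad n > 1,\ a > 1 .

% Comparison with a power-law tail x^{-(1+\varepsilon)}, for any \varepsilon > 0:
\lim_{x \to \infty} \frac{x^{-(1+\varepsilon)}}{1 / \bigl(x (\ln x)^n\bigr)}
  = \lim_{x \to \infty} \frac{(\ln x)^n}{x^{\varepsilon}} = 0 .

% The mean does not exist:
\int_a^\infty x \cdot \frac{c}{x (\ln x)^n}\, dx
  = c \int_a^\infty \frac{dx}{(\ln x)^n} = \infty .
```

Under these assumptions the density is normalizable, yet it decays more slowly than any normalizable power law and has no finite mean.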
- Title: Models for the compressible web
Silvio Lattanzi, La Sapienza University of Rome, Yahoo! Research Intern
Abstract: Graphs resulting from human behavior have been intensely studied in the past decade. In this talk we take our understanding of behavioral graphs a step further by studying the compressibility of web graph models. We first show that an empirical property of web graphs, their compressibility, cannot be exhibited by well-known graph models. Then, inspired by the empirical evidence that the distribution of the lengths of edges of web graphs follows a power law distribution, we develop a new model for web graphs and we show that it does exhibit compressibility, in addition to previously modeled web graph properties.
Silvio Lattanzi received his bachelor's (2005) and master's (2007) degrees, both with highest honors, from the Computer Science department of Sapienza University of Rome. He is now a PhD student in Computer Science; his advisor is Alessandro Panconesi. Silvio joined Google Mountain View for two summer internships in 2008 and 2009, where his host was D. Sivakumar. From February to April 2010 he visited the University of Texas at Austin, where his host was Lorenzo Alvisi. He has just finished his internship at Yahoo! Research in Santa Clara, where his host was Ravi Kumar. Silvio's research interests are in social networks, information retrieval and randomized algorithms.