likely to become increasingly prevalent in data analysis oriented fields like applied statistics, machine learning, and
computer science. It is our belief that the SM and its ilk will
come to be seen as relatively simple building blocks for the
enormous and powerful hierarchical models of tomorrow.
Source code and usage examples for the SM are available
at http://www.sequencememoizer.com/. A lossless compressor built using the SM can be explored at http://www.
We wish to thank the Gatsby Charitable Foundation and
Columbia University for funding.
Frank Wood, Department of Statistics, Columbia University, New York.
Jan Gasthaus (j.gasthaus@gatsby.ucl.ac.uk), Gatsby Computational Neuroscience Unit, University College London, England.
Cédric Archambeau, Xerox Research Centre Europe, Grenoble, France.
Lancelot James, Department of Information, Systems, Business, Statistics and Operations Management, Hong Kong University of Science and Technology, Kowloon, Hong Kong.
Yee Whye Teh, Gatsby Computational Neuroscience Unit, University College London, England.
© 2011 ACM 0001-0782/11/0200 $10.00