structures, and that are backward
compatible, if not with the binaries,
with the programming model.
What are we doing now? A look at
the ISCA 2007 conference program
provides a good overview of the type
of research being done in our community. A survey of papers published
in that year reveals the following: 18
papers focused on multicore (eight on
core and memory design, six on transactional programming, four on on-chip interconnect); six on
single-core devices and/or applications; six on
special-purpose or streaming/media
devices; four on
power reduction; and three in
the general area of “beyond CMOS.”
Figure 3 extends this data out for
the last seven years of ISCA. This
data extends the work of Hill,45 who
tracked papers published in ISCA by
category from 1973–2001. That data
showed a precipitous rise and fall of
interest in multiprocessor research,
while data from the last seven years
depicts a renewed and vigorous multiprocessor research environment.
The Most Exciting Time
In my lifetime, this is the most exciting time for computer architecture
research; indeed, people far older and
wiser than me46 contend this is the
most exciting time for architecture
since the invention of the computer.
What makes it exciting is that architecture is in the unique position of being
at the center of the future of computer
science and the IT industry. Innovations in architecture will affect everything from education to determining
who the new winners and losers are in
the IT business. Central to this excitement for me, as an academic, is that there is
no clear way to proceed. Multicore
devices are being sold, and parts of the
software ecosystem will utilize them,
but the research and product space is
far more fluid and open to new ideas
now than ever before. Thus, while we
are central to the future directions of
computer science, we really lack a clear
vision for how to proceed. What could
be better than that?
References
1. Agarwal, V., Hrishikesh, M.S., Keckler, S.W., and Burger,
D. Clock rate versus IPC: The end of the road for
conventional microarchitectures. SIGARCH Comput.
Archit. News 28, 2 (2000), 248–259.
2. Austin, T.M. DIVA: A reliable substrate for deep
submicron microarchitecture design. Micro. 00 196,
1999.
3. Borch, E., Tune, E., Manne, S., and Emer, J. Loose loops
sink chips. In Proceedings of the Eighth International
Symposium on High-Performance Computer
Architecture (Feb. 2–6, 2002), 299–310.
4. Bracy, A., Prahlad, P., and Roth, A. Dataflow
mini-graphs: Amplifying superscalar capacity and
bandwidth. In Proceedings of the 37th Annual IEEE/
ACM International Symposium on Microarchitecture.
IEEE Computer Society, Washington, D.C., 2004,
18–29.
5. Brooks, D., Tiwari, V., and Martonosi, M. Wattch: A
framework for architectural-level power analysis and
optimizations. SIGARCH Comput. Archit. News 28, 2
(2000), 83–94.
6. Brooks, D.M., Bose, P., Schuster, S.E., Jacobson, H.,
Kudva, P.N., Buyuktosunoglu, A., Wellman, J.-D.,
Zyuban, V., Gupta, M., and Cook, P.W. Power-aware
microarchitecture: Design and modeling challenges
for next-generation microprocessors. IEEE Micro 20, 6
(2000), 26–44.
7. Burger, D., and Austin, T.M. The SimpleScalar tool set,
version 2.0. SIGARCH Comput. Archit. News 25, 3
(1997), 13–25.
8. Ceze, L., Tuck, J., Montesinos, P., and Torrellas, J.
BulkSC: Bulk enforcement of sequential consistency.
SIGARCH Comput. Archit. News 35, 2 (2007),
278–289.
9. Cristal, A., Ortega, D., Llosa, J., and Valero, M. Out-of-order commit processors. hpca, 00: 48, 2004.
10. Dagum, L., and Menon, R. OpenMP: An industry standard
API for shared-memory programming. Computational
Science and Engineering 5, 11 (Jan.–Mar. 1998), 46–55.
11. Draper, J., Chame, J., Hall, M., Steele, C., Barrett, T.,
LaCoss, J., Granacki, J., Shin, J., Chen, C., Kang, C.W.,
Kim, I., and Daglikoca, G. The architecture of the DIVA
processing-in-memory chip. In Proceedings of the 16th
International Conference on Supercomputing. ACM,
NY, 2002, 14–25.
12. Eden, A.N., and Mudge, T. The YAGS branch prediction
scheme. In Proceedings of the 31st Annual ACM/IEEE
International Symposium on Microarchitecture (Nov.
30–Dec. 2, 1998), 69–77.
13. Eeckhout, L., Stougie, B., Bosschere, K.D., and John,
L.K. Control flow modeling in statistical simulation
for accurate and efficient processor design studies.
SIGARCH Comput. Archit. News 32, 2 (2004), 350.
14. Ernst, D., Hamel, A., and Austin, T. Cyclone: A
broadcast-free dynamic instruction scheduler with
selective replay. In Proceedings of the 30th Annual
International Symposium on Computer Architecture
(June 9–11, 2003), 253–262.
15. Fisher, J.A. Very long instruction word architectures
and the ELI-512. In Proceedings of the 10th Annual
International Symposium on Computer Architecture
(Los Alamitos, CA, 1983). IEEE Computer Society
Press, 140–150.
16. Hill, M.D. Multiprocessors should support simple
memory-consistency models. IEEE Computer 31, 8
(1998), 28–34.
17. Hinton, G., Upton, M., Sager, D., Boggs, D., Carmean,
D., Roussel, P., Chappell, T., Fletcher, T., Milshtein, M.,
Sprague, M., Samaan, S., and Murray, R. A 0.18-μm
CMOS IA-32 processor with a 4-GHz integer execution
unit. IEEE Journal of Solid-State Circuits 36, 11 (Nov.
2001), 1617–1627.
18. Jouppi, N.P. Improving direct-mapped cache
performance by the addition of a small fully
associative cache and prefetch buffers. SIGARCH
Comput. Archit. News, 18, 3a (1990), 364–373.
19. Kang, Y., Huang, W., Yoo, S.-M., Keen, D., Ge, Z., Lam,
V., Torrellas, J., and Pattnaik, P. FlexRAM: Toward an
advanced intelligent memory system. ICCD 00:192,
1999.
20. Kogge, P., Sunaga, T., Miyataka, H., Kitamura, K.,
and Retter, E. Combined DRAM and logic chip for
massively parallel systems. arvlsi 0: 4, 1995.
21. Lam, M.S., and Wilson, R.P. Limits of control flow
on parallelism. In Proceedings of the 19th Annual
International Symposium on Computer Architecture.
ACM, NY, 1992, 46–57.
22. McNairy, C., and Soltis, D. Itanium 2 processor
microarchitecture. IEEE Micro 23, 2 (Mar.–Apr. 2003),
44–55.
23. Moore, G. Cramming more components onto
integrated circuits. Electronics (Apr. 1965), 114–117.
24. Mukherjee, S., Kontz, M., and Reinhardt, S. Detailed
design and evaluation of redundant multithreading
alternatives. In Proceedings of the 29th Annual
International Symposium on Computer Architecture
(2002), 99–110.
25. Mukherjee, S., Weaver, C., Emer, J., Reinhardt, S., and
Austin, T. A systematic methodology to compute
the architectural vulnerability factors for a high-performance microprocessor. In Proceedings of the
36th Annual IEEE/ACM International Symposium on
Microarchitecture (Dec. 3–5, 2003), 29–40.
26. Nagarajan, R., Sankaralingam, K., Burger, D., and
Keckler, S.W. A design space evaluation of grid
processor architectures. In Proceedings of the
34th Annual ACM/IEEE International Symposium
on Microarchitecture. IEEE Computer Society,
Washington, D.C., 2001, 40–51.
27. Nielsen, L.S., and Niessen, C. Low-power operation
using self-timed circuits and adaptive scaling of the
supply voltage. IEEE Trans. Very Large Scale Integr.
Syst., 2, 4 (1994), 391–397.
28. Oskin, M., Chong, F.T., and Farrens, M. HLS: Combining
statistical and symbolic simulation to guide
microprocessor designs. SIGARCH Comput. Archit.
News 28, 2 (2000), 71–82.
29. Oskin, M., Chong, F.T., and Sherwood, T. Active
pages: A computation model for intelligent memory.
SIGARCH Comput. Archit. News 26, 3 (1998),
192–203.
30. Pai, V.S., Ranganathan, P., Adve, S.V., and Harton, T. An
evaluation of memory consistency models for shared-memory systems with ILP processors. SIGPLAN
Notices 31, 9 (1996), 12–23.
31. Palacharla, S. Complexity-effective superscalar
processors. Ph.D. thesis, 1998.
32. Patterson, D., Anderson, T., Cardwell, N., Fromm, R.,
Keeton, K., Kozyrakis, C., Thomas, R., and Yelick, K. A
case for intelligent RAM. IEEE Micro 17, 2 (Mar.–Apr.
1997), 34–44.
33. Patterson, D., Keutzer, K., Asanovic, K., Yelick, K.,
and Bodik, R. The landscape of parallel computing
research: A view from Berkeley. 2007.
34. Pollack, F. Keynote: New microarchitecture challenges
in the coming generations of CMOS process
technologies, 1999.
35. Sherwood, T., Perelman, E., Hamerly, G., and Calder,
B. Automatically characterizing large scale program
behavior. In Proceedings of the 10th International
Conference on Architectural Support for Programming
Languages and Operating Systems. ACM, NY, 2002,
45–57.
36. Swanson, S., Michelson, K., Schwerin, A., and Oskin, M.
WaveScalar. In Proceedings of the 36th Annual IEEE/
ACM International Symposium on Microarchitecture.
IEEE Computer Society, Washington, D.C., 291.
37. Taylor, M. B., Kim, J., Miller, J., Wentzlaff, D., Ghodrat,
F., Greenwald, B., Hoffman, F., Johnson, P., Lee, J.-W.,
Lee, W., Ma, A., Saraf, A., Seneski, M., Shnidman, N.,
Strumpen, V., Frank, M., Amarasinghe, S., and Agarwal,
A. The Raw microprocessor: A computational fabric
for software circuits and general-purpose programs.
IEEE Micro 22, 2 (2002), 25–35.
38. Valero, M., Gonzalez, A., Topham, N.P., and Cruz,
C. Multiple-banked register file architectures. isca
00:316, 2000.
39. Vijaykumar, T., Pomeranz, I., and Cheng, K. Transient-fault recovery using simultaneous multithreading.
In Proceedings of the 29th Annual International
Symposium on Computer Architecture (2002), 87–98.
40. Wawrzynek, J., Patterson, D., Oskin, M., Lu, S.-L.,
Kozyrakis, C., Hoe, J.C., Chiou, D., and Asanović, K.
RAMP: Research accelerator for multiple processors.
IEEE Micro 27, 2 (2007), 46–57.
41. Wunderlich, R., Wenisch, T., Falsafi, B., and Hoe, J.
SMARTS: Accelerating microarchitecture simulation via
rigorous statistical sampling. In Proceedings of the
30th Annual International Symposium on Computer
Architecture (June 9–11, 2003), 84–95.
42. Ye, W., Vijaykrishnan, N., Kandemir, M., and Irwin, M.J.
The design and use of SimplePower: A cycle-accurate
energy estimation tool. In Proceedings of the 37th
Conference on Design Automation. ACM, NY (2000),
340–345.
43. http://www.news.com/2100-1006_3-6119618.html.
44. http://www.itrs.net/.
45. http://pages.cs.wisc.edu/~markhill/mp2001.html.
46. Personal communication with Burton Smith.
Mark Oskin (oskin@cs.washington.edu) is an associate
professor in the Department of Computer Science and
Engineering at the University of Washington, Seattle.