3D Network-on-Chip with on-Chip DRAM: An Empirical Analysis for Future Chip Multiprocessor
 AMD, "The amd opteron 6000 series platform," May 2010,
 L. Benini and G. D. Micheli, "Networks on chips: A new soc paradigm,"
IEEE Computer, vol. 35, no. 1, pp. 70-78, January 2002.
 S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz,
D. Finan, P. Iyer, A. Singh, T. Jacob, S. Jain, S. Venkataraman,
Y. Hoskote, and N. Borkar, "An 80-tile 1.28tflops network-on-chip in
65nm cmos," in Solid-State Circuits Conference, 2007. ISSCC 2007.
Digest of Technical Papers. IEEE International, Feb. 2007, pp. 98-589.
 Intel, "Single-chip cloud computer," May 2010,
 Intel, "Intel core i7-980x processor extreme edition," May 2010,
 Semiconductor Industry Association, "The international technology
roadmap for semiconductors (itrs)," 2007,
 B. M. Rogers, A. Krishna, G. B. Bell, K. Vu, X. Jiang, and Y. Solihin,
"Scaling the bandwidth wall: challenges in and avenues for cmp scaling,"
in Proceedings of the 36th annual international symposium on Computer
architecture, June 2009, pp. 371-382.
 A. Weldezion, Z. Lu, R. Weerasekera, and H. Tenhunen, "3-d memory
organization and performance analysis for multi-processor network-on-chip
architecture," in 3D System Integration, 2009. 3DIC 2009. IEEE
International Conference on, 28-30 2009, pp. 1-7.
 G. H. Loh, "3d-stacked memory architectures for multi-core processors,"
in ISCA '08: Proceedings of the 35th Annual International Symposium
on Computer Architecture. Washington, DC, USA: IEEE Computer
Society, 2008, pp. 453-464.
 D. Sylvester and K. Keutzer, "Getting to the bottom of deep submicron,"
in Computer-Aided Design, 1998. ICCAD 98. Digest of Technical
Papers. 1998 IEEE/ACM International Conference on, Nov 1998, pp.
 T. C. Xu, A. W. Yin, P. Liljeberg, and H. Tenhunen, "A study of 3d
network-on-chip design for data parallel h.264 coding," in Proceedings
of the 27th Norchip Conference, November 2009.
 G. L. Loi, B. Agrawal, N. Srivastava, S.-C. Lin, T. Sherwood, and
K. Banerjee, "A thermally-aware performance analysis of vertically
integrated (3-d) processor-memory hierarchy," in DAC '06: Proceedings
of the 43rd annual Design Automation Conference. New York, NY,
USA: ACM, 2006, pp. 991-996.
 M. Tremblay and S. Chaudhry, "A third-generation 65nm 16-core
32-thread plus 32-scout-thread cmt sparc processor," in ISSCC 2008,
February 2008, pp. 82-83.
 IBM, "Ibm power 7 processor," in Hot chips 2009, August 2009.
 S. Thoziyoor, N. Muralimanohar, J. H. Ahn, and N. P. Jouppi, "Cacti 5.1," HP
Labs, Tech. Rep. HPL-2008-20.
 University of Catania, "Noxim, an open network-on-chip simulator,"
 S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, "The
splash-2 programs: Characterization and methodological considerations," in
Proceedings of the 22nd International Symposium on Computer Architecture,
June 1995, pp. 24-36.
 P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg,
J. Hogberg, F. Larsson, A. Moestedt, and B. Werner, "Simics: A full
system simulation platform," Computer, vol. 35, no. 2, pp. 50-58,
 Intel, "Intel core i7 processor extreme edition and intel
core i7 processor datasheet, volume 1," December 2008,
 D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey,
M. Mattina, C.-C. Miao, J. Brown, and A. Agarwal, "On-chip interconnection
architecture of the tile processor," Micro, IEEE, vol. 27, no. 5,
pp. 15-31, Sept.-Oct. 2007.
 T. C. Xu, P. Liljeberg, and H. Tenhunen, "A study of through silicon
via impact to 3d network-on-chip design," in Proceedings of the 2010
International Conference on Electronics and Information Engineering
(ICEIE 2010), August 2010.
 HiTech Global, "Ddr 2 memory controller ip core for fpga and asic," June
 H. Sullivan and T. R. Bashkow, "A large scale, homogeneous, fully distributed
parallel machine," in Proceedings of the 4th annual symposium
on Computer architecture, March 1977, pp. 105-117.
 C. Kim, D. Burger, and S. W. Keckler, "An adaptive, non-uniform cache
structure for wire-delay dominated on-chip caches," in ACM SIGPLAN,
October 2002, pp. 211-222.
 A. Patel and K. Ghose, "Energy-efficient mesi cache coherence with
pro-active snoop filtering for multicore microprocessors," in Proceeding
of the thirteenth international symposium on Low power electronics and
design, August 2008, pp. 247-252.
 H.-S. Wang, X. Zhu, L.-S. Peh, and S. Malik, "Orion: a power-performance
simulator for interconnection networks," in Proceedings of
the 35th Annual IEEE/ACM International Symposium on Microarchitecture,
November 2002, pp. 294-305.