Project Description: goals and purpose of the project

INTRODUCTION

Current processors handle information more capably by using more than one core. Applications, however, have also grown and now need more memory. This trade-off makes improvements in system architecture important, and one of the areas under study is the memory hierarchy, where optimizing cache utilization is necessary to improve CPU throughput. This project looks at two aspects of the memory hierarchy: the cache replacement policy, and extending the write buffer technique to work for multicore processors.

Goal 1: Replacement Algorithm

The project involves creating a probabilistic cache block replacement algorithm (for use by the cache controller) that would increase prediction accuracy, increase the hit ratio, and in turn decrease memory access latency. The algorithm should determine which cache block to replace, and when, so that the probability that the data needed by the processor is in the cache remains relatively high at all times.
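
The idea of a probabilistic replacement policy can be illustrated with a small sketch. It is not the project's algorithm, which is not detailed in this report. It only assumes, for illustration, that the eviction probability of a block is proportional to 1 / (1 + hits), so that rarely used blocks are more likely to be replaced. The class name ProbabilisticSet, the hit counters, and the weighting are all hypothetical.

```python
import random


class ProbabilisticSet:
    """One cache set whose victim is chosen at random, weighted toward rarely hit blocks."""

    def __init__(self, ways=4, rng=None):
        self.ways = ways
        self.hits = {}  # tag -> number of hits since the block was filled
        self.rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is filled on a miss)."""
        if tag in self.hits:
            self.hits[tag] += 1
            return True
        if len(self.hits) >= self.ways:
            self._evict()
        self.hits[tag] = 0
        return False

    def _evict(self):
        tags = list(self.hits)
        # Assumed weighting (not from the project): probability ~ 1 / (1 + hits).
        weights = [1.0 / (1 + self.hits[t]) for t in tags]
        victim = self.rng.choices(tags, weights=weights, k=1)[0]
        del self.hits[victim]
```

A deterministic policy such as LRU always evicts the same block for a given access history; a weighted random choice like the one above trades that determinism for some resistance to pathological access patterns.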

Goal 2: Write Buffer Case Study

Cache optimization techniques designed for single-core processors are reconsidered for use in multicore processors/CMPs. In this part of the project we study the write buffer (WB).

Process used on the project

REPLACEMENT ALGORITHM

Technique used:
A technique is proposed that can maintain or improve the L2 hit rate and L1 access time while making better use of bandwidth. The technique aims to hold more information in the cache levels in order to minimize memory accesses.

WRITE BUFFER CASE STUDY

Technique used:
Our technique introduces level marking in WBs. Each WB is marked with 3 levels, and WBs are compared by level; we chose 3 levels to keep the hardware implementation simple. The number of entries that a WB can send to the next-level cache depends on the level of its entries, i.e., on the number of entries it holds. This means that a WB with entries in Level 3 is allowed to send more blocks than a WB with entries only in Level 1, and so it occupies the port for a longer period of time.
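
The level-marking idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the SESC implementation: the capacity of 12 entries, the split of the buffer into equal thirds, and the burst sizes of 1, 2, and 4 blocks per level are all hypothetical values chosen for the example.

```python
from collections import deque


class WriteBuffer:
    """Toy write buffer whose occupancy is mapped to one of three levels."""

    def __init__(self, capacity=12, burst_per_level=(1, 2, 4)):
        self.capacity = capacity
        # Blocks allowed per port grant at Level 1, 2, 3 (assumed values).
        self.burst_per_level = burst_per_level
        self.entries = deque()

    def push(self, block):
        """Add a block; return False if the buffer is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(block)
        return True

    def level(self):
        """Return 0 when empty, otherwise 1..3 by thirds of the capacity."""
        n = len(self.entries)
        if n == 0:
            return 0
        # Ceiling of 3 * n / capacity, clamped to 3.
        return min(3, -(-3 * n // self.capacity))

    def drain(self):
        """Send as many blocks as the current level permits and return them."""
        lvl = self.level()
        if lvl == 0:
            return []
        count = min(self.burst_per_level[lvl - 1], len(self.entries))
        return [self.entries.popleft() for _ in range(count)]
```

With these assumed values, a buffer holding 2 entries is at Level 1 and sends one block per grant, while a buffer holding 10 entries is at Level 3 and sends four, so the fuller buffer holds the port longer.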

In both techniques mentioned, we are using SPLASH-2 as our benchmark suite.

Conclusions

REPLACEMENT ALGORITHM

Results:
We analyzed pre-fetching techniques that can make better use of the bandwidth. The analysis is being carried out with the SESC simulator, with the aim of producing significant results demonstrating that pre-fetching techniques can reduce bandwidth utilization.

WRITE BUFFER CASE STUDY

Results:
The new technique has been implemented in the SESC simulator, and additional statistics have been added to better understand how the technique affects overall performance.

CONCLUSION

Benefits of this project:

This project introduced undergraduate students to computer architecture research.
Students learned teamwork.
Students learned technical writing and analysis of results.
Undergraduate students can indeed get involved in research on a topic as advanced as multicore processor design.

Websites Developed and Publications

Web pages developed:

http://www-ee.ccny.cuny.edu/www/web/mzahran/creu07/

Papers or posters at conferences:

Students are currently writing up the results in a paper to be submitted to a conference.

Project:		Toward Better Memory Hierarchy for Chip-Multiprocessor
Student Researchers:		Lina Cordero, Heba Gabre, Stephany Soria
Advisor:		Mohamed Zahran
Institution:		City College of City University of New York
Webpage:		http://ees2cy.engr.ccny.cuny.edu/www/web/mzahran/creu07/index.html