CRA-W

FINAL REPORT

Project: Processor Allocation to Independent Tasks in Low Dimensional Mesh Systems
Student Researchers: Erin B. Greening, Jessica (Tze-Yun) Lin
Advisors: Bonnie E. Melhart, Craig A. Morgenstern
Institution: Texas Christian University

Index

Goals and Purpose for the Project
Process Used in Completing the Research
Conclusions and Results Achieved

Our research project has been concerned with the allocation of processors to independent tasks in a two-dimensional mesh system. We have focused on heuristics to manage the tradeoff between the competing goals of optimizing for high processor utilization versus optimizing for low inter-task bandwidth contention.

Goals and purpose for the project

Our project goal was four-fold:
* to examine the performance of existing allocation strategies,
* to propose, develop, and test new allocation strategies to try to raise processor utilization while lowering task completion time,
* to determine how well new and existing strategies scale as the mesh and/or communication requirements increase, and
* to provide a realistic and successful research experience for undergraduate students in Computer Science.

The purpose of the project was to better understand the problems of such mesh systems and to contribute new strategies for allocation and scheduling that improve the overall system efficiency in the presence of communication between processors allocated to independent tasks.

Process used in completing the research

We began weekly meetings last fall semester. The advisors distributed a few essential papers and we discussed those together at first and made plans for the next weeks activities. We acquired the Procsimity simulator, developed at the University of Oregon, and began running experiments to duplicate the findings we had read about. This served to acquaint us with the workings of the simulator and helped us to understand the technical papers.

As we ran more experiments, we examined the simulation results and strategized about new allocation algorithms that might take advantage of the strong points of previous ones without the weak points they exhibited. For each new strategy, a series of experiments was designed to compare it to old strategies. As we began to consider the added problems of communication between the processors, we added experimental measurements for the effects of this to our simulation studies. We considered various messaging patterns and ways to simulate that would allow us to run the experiments efficiently. (Some experiments use several days of machine time to finish.)

Conclusions and results achieved

Our studies included comparisons of existing allocation strategies and a new strategy we developed that is a variation of the contiguous Fast Frame Sliding (FFS). Our Closest Shape (CS) strategy allows allocations the FFS would not by considering submeshs of a different size and shape than the request. For example, suppose a task requests a 3x5 submesh of 15 processors when there are no such submesh blocks available. If there are free submeshes of 2x8 or 4x4, FFS will not allocate, but CS will. None of the contiguous strategies can match the utilization when scattered (non-contiguous) allocation is allowed, however. Intuitively, utilization will be favored with scattered allocations if there is no communication between the processors.

Our further studies looked at the same strategies when communication is considered among the processors. Scattered allocations we studied included random and multiple-buddy allocation strategies. The Procsimity simulator allowed us to simulate various communication levels between the processors allocated to a specific task. Intuitively, high levels of communication will be handled better with contiguous allocations. It seems reasonable that one would want to at least limit the length of the path between communicating processors, though it is certainly possible to pass messages through as many other processors as it takes to deliver the message, including those allocated to other tasks. For our experiments we used wormhole routing, an XY-routing scheme that routes a message in the horizontal direction first and then in the vertical direction to its destination. The whole message follows along from processor to processor in a pipeline fashion.

We have simulated these strategies extensively for 16x16 and 32x32 meshes and have preliminary results for 64x64 mesh systems. In all cases, the utilization is better for the scattered allocation multiple-buddy. As the mesh size grows larger, the contiguous strategies improve their utilization for heavy traffic, while the scattered allocation maintains the same high utilization. Closest Shape has the highest of the contiguous strategies. The response time and finish time of contiguous vs. non-contiguous fluctuated depending on the severity of messaging traffic and the mesh size. The observed trend is that as the mesh system gets larger, there is little difference in performance between the best of the contiguous methods and the best of the non-contiguous methods. That is, contiguous methods can be competitive with non-contiguous methods. However, more studies are needed to determine the benefits and trade-offs of each type of allocation strategy. We hope to continue the studies with these strategies for the largest mesh systems that we can simulate. There is still much to explore for ways to improve the contiguous vs. non-contiguous allocations when processors require communication.

Our most useful outcomes are the new ideas for areas to consider with further heuristics and experimentation. The funding from the CREW program made it possible to get other funding from our university for equipment that we will be able to use solely for this project next year.

Jessica Lin is starting graduate school in Computer Science at the University of Texas at Arlington, while Erin Greening is starting her senior year of studies at TCU and plans to continue her involvement with this research project.