Provably Efficient Adaptive Scheduling for Parallel Jobs (2007)
He, Yuxiong, Hsu, Wen Jing, Leiserson, Charles E.
Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level...
Provably Efficient Adaptive Scheduling for Parallel Jobs (2007)
He, Yuxiong, Hsu, Wen Jing, Leiserson, Charles E.
Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level...
Programming with Exceptions in JCilk (2005)
Danaher, John S., Lee, I-Ting Angelina, Leiserson, Charles E.
JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values...
Programming with Exceptions in JCilk (2005)
Danaher, John S., Lee, I-Ting Angelina, Leiserson, Charles E.
JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values...
The JCilk Language for Multithreaded Computing (2005)
Danaher, John, Lee, I-Ting Angelina, Leiserson, Charles E.
JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values...
On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs (2004)
Bender, Michael A., Fineman, Jeremy T., Gilbert, Seth, Leiserson, Charles E.
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two...
On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs (2004)
Bender, Michael A., Fineman, Jeremy T., Gilbert, Seth, Leiserson, Charles E.
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two...
On-the-Fly Maintenance of Series-Parallel Relationships in Fork-Join Multithreaded Programs (2004)
Michael A. Bender, Jeremy T. Fineman, Seth Gilbert, Charles E. Leiserson
A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two...
Adversarial Analyses of Window Backoff Strategies for Simple Multiple-Access Channels (2003)
Bender, Michael A., Farach-Colton, Martin, He, Simai, Kuszmaul, Bradley C., Leiserson, Charles E.
Backoff strategies have typically been analyzed by making statistical assumptions on the distribution of problem inputs. Although these analyses have provided valuable insights into the efficacy of...
Hardware Transactional Memory (2003)
Lie, Sean, Asanovic, Krste, Kuszmaul, Bradley C., Leiserson, Charles E.
This work shows how hardware transactional memory (HTM) can be implemented to support transactions of arbitrarily large size, while ensuring that small transactions run efficiently. Our...
A Media Player for Use in Distance Education (2003)
Huang, Kai, Leiserson, Charles E., Sarmenta, Luis F.G.
We have developed a media player for use in distance education. The player can incorporate several time-indexed sources, including video, audio, PowerPoint, and text index. We have converted all the...
A Media Player for Use in Distance Education (2003)
Huang, Kai, Leiserson, Charles E., Sarmenta, Luis F.G.
We have developed a media player for use in distance education. The player can incorporate several time-indexed sources, including video, audio, PowerPoint, and text index. We have converted all the...
Hardware Transactional Memory (2003)
Lie, Sean, Asanovic, Krste, Kuszmaul, Bradley C., Leiserson, Charles E.
This work shows how hardware transactional memory (HTM) can be implemented to support transactions of arbitrarily large size, while ensuring that small transactions run efficiently. Our...
Adversarial Analyses of Window Backoff Strategies for Simple Multiple-Access Channels (2003)
Bender, Michael A., Farach-Colton, Martin, He, Simai, Kuszmaul, Bradley C., Leiserson, Charles E.
Backoff strategies have typically been analyzed by making statistical assumptions on the distribution of problem inputs. Although these analyses have provided valuable insights into the efficacy of...
Transactions Everywhere (2003)
Kuszmaul, Bradley C., Leiserson, Charles E.
Arguably, one of the biggest deterrants for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic...
Transactions Everywhere (2003)
Kuszmaul, Bradley C., Leiserson, Charles E.
Arguably, one of the biggest deterrants for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic...
Thomas H. Cormen, Charles E. Leiserson
This thesis explores several issues that arise in the design and implementation of virtual-memory systems for data-parallel computing.
The Cilk System for Parallel Multithreaded Computing (2002)
Although cost-effective parallel machines are now commercially available, the widespread use of parallel processing is still being held back, due mainly to the troublesome nature of parallel...
Cilk: An Efficient Multithreaded Runtime System (2002)
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and...
Systolic Arrays for (VLSI). (2002)
Kung,H. T., Leiserson,Charles E.
A systolic system is a network of processors which rhythmically compute and pass data through the system. Physiologists use the work 'systole' to refer to the rhythmically recurrent contraction of...
A Layout for the Shuffle-Exchange Network. (2002)
Hoey,Dan, Leiserson,Charles E.
This paper describes a technique for producing a VLSI layout of the shuffle-exchange graph. It is based on the layout procedure which lays out a graph by bisecting the graph, recursively laying out...
Optimal Placement for River Routing. (2002)
Leiserson,Charles E., Pinter,Roy Y.
Programs for integrated circuit layout typically have two phases: placement and routing. The router should produce as efficient a layout as possible, but of course the quality of the routing depends...
Wafer-Scale Integration of Systolic Arrays, (2002)
Leighton,Frank Thomson, Leiserson,Charles E.
VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an...
A Coherent VLSI Design Environment. (2002)
Glasser,Lance A., Leiserson,Charles E., Rivest,Ronald L.
This report covers the period from April 1, 1985 through September 30, 1985. The research discussed here is described in more detail in several published and unpublished reports cited. The CAD frame...
Randomized Routing on Fat-Trees, (2002)
Greenberg, Ronald I., Leiserson, Charles E.
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is...
Randomized Routing on Fat Trees. (2002)
Greenberg,Ronald I., Leiserson,Charles E.
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is...
Retiming Synchronous Circuitry. (2002)
Leiserson,Charles E., Saxe,James B.
This paper shows how the technique of retiming can be used to transform a given synchronous circuit into a more efficient circuit under a variety of different cost criteria. We model a circuit as a...
A Hyperconcentrator Switch for Routing Bit-Serial Messages (Extended Abstract), (2002)
Cormen,Thomas H., Leiserson,Charles E.
In highly parallel message routing networks, it is sometimes desirable to concentrate relatively few messages on many wires onto fewer wires. We have designed a VLSI chip for this purpose which is...
Communication-Efficient Parallel Graph Algorithms. (2002)
Leiserson,Charles E., Maggs,Bruce M.
Communication bandwidth is a resource ignored by most parallel random-access machine (PRAM) models. This paper shows that many graph problems can be solved in parallel, not only with polylogarithmic...
A Hyperconcentrator Switch for Routing Bit-Serial Messages. (2002)
Cormen,Thomas H., Leiserson,Charles E.
In highly parallel message routing networks, it is sometimes desirable to concentrate relatively few messages on many wires onto fewer wires. We have designed a VLSI chip for this purpose which is...
A Coherent VLSI Design Environment. (2002)
Dally, William J., Glasser, Lance A., Leiserson, Charles E.
The CAD frame Schema has been outfitted with a uniform protocol or aggregate data structures, which includes new control structures for examining and searching through the data structures. Circuit...
Using Cilk to Write Multiprocessor Chess Programs (2001)
Don Dailey, Charles E. Leiserson
This paper overviews the Cilk language, illustrating how Cilk supports the programming of parallel game-tree search and other chess mechanisms
Fast global synchronization provides simple, efficient solutions to many of the system problems of parallel computing. It achieves this by providing composition of both performance and correctness....
Algorithms for Data-Race Detection in Multithreaded Programs (2001)
Charles E. Leiserson, Arthur C. Smith
If two parallel threads access the same location and at least one of them performs a write, a race exists. The detection of races---a major problem in parallel debugging---is complicated by the...
On Retiming Synchronous Circuitry And Mixed-Integer Optimization (2001)
In this paper we investigate properties of retirning, a circuit transformation which preserves the behavior of the circuit as a whole. We present an algorithm which transforms a given combinational...
Portable High-Performance Programs (2001)
Charles E. Leiserson, Matteo Frigo
This dissertation discusses how to write computer programs that attain both high performance and portability, despite the fact that current computer systems have different degrees of parallelism,...
you design an efficient out-of-core iterative algorithm? These are the two questions answered in this thesis.
Cilk: An Efficient Multithreaded Runtime System (2001)
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and...
Executing Multithreaded Programs Efficiently (2001)
This thesis presents the theory, design, and implementation of Cilk (pronounced "silk") and Cilk-NOW. Cilk is a C-based language and portable runtime system for programming and executing...
Managing Storage for Multithreaded Computations (2001)
Robert D. Blumofe, Charles E. Leiserson
Multithreading has become a dominant paradigm in general purpose MIMD parallel computation. To execute a multithreaded computation on a parallel computer, a scheduler must order and allocate threads...
Debugging Multithreaded Programs that Incorporate User-Level Locking (2001)
Charles E. Leiserson, Arthur C. Smith, Andrew F. Stark
A multithreaded program with a bug may behave nondeterministically, and this nondeterminism typically makes the bug hard to localize. This thesis presents a debugging tool, the Nondeterminator-2,...
A new competitive analysis of randomized caching (Extended Abstract) (2000)
Ching Law, Charles E. Leiserson
We provide new competitive upper bounds on the performance of the memoryless, randomized caching algorithm RAND. Our bounds are expressed in terms of the inherent hit rate of the sequence of memory...
Algorithms for Data-Race Detection in Multithreaded Programs (1999)
Charles E. Leiserson, Arthur C. Smith, Guang-ien Cheng
If two parallel threads access the same location and at least one of them performs a write, a race exists. The detection of races---a major problem in parallel debugging---is complicated by the...
Cache-Oblivious Algorithms (1999)
Harald Prokop, Charles E. Leiserson, Arthur C. Smith
This thesis presents "cache-oblivious" algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache...
A Minicourse on Multithreaded Programming (1999)
Charles E. Leiserson, Harald Prokop
These notes contain two lectures that teach multithreaded algorithms using a Cilklike [7, 9, 11] model. These lectures were designed for the latter part of the MIT undergraduate class 6.046...
The Implementation of the Cilk-5 Multithreaded Language (1999)
Matteo Frigo, Charles E. Leiserson, Keith H. Randall
The fifth release of the multithreaded language Cilk uses a provably good "work-stealing" scheduling algorithm similar to the first system, but the language has been completely redesigned and the...
Cilk: An Efficient Multithreaded Runtime System (1999)
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and...
Optimixing Synchronous Systems. (1998)
Leiserson,Charles E., Saxe,John B.
The complexity of integrated-circuit chips produced today makes it feasible to build inexpensive, special-purpose subsystems that rapidly solve sophisticated problems on behalf of a general-purpose...
Wafer-Scale Integration of Systolic Arrays, (1998)
Leighton,Frank Thomson, Leiserson,Charles E.
VLSI technologies are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire...
A Space-Efficient Algorithm for Finding the Connected Components of Rectangles in the Plane. (1998)
Leiserson,Charles E., Phillips,Cynthia A.
An algorithm is presented for determining the connectivity of a set of N rectangles in the plane, a problem central to avoiding aliasing in VLSI design rule checkers. Previous algorithms for this...
Retiming Synchronous Circuitry. (1998)
Leiserson, Charles E., Saxe, James B.
This paper describes a circuit transformation called retiming in which registers are added at some points in a circuit and removed from others in such a way that he functional behavior of the circuit...
The Organization of Permutation Architectures with Bussed Interconnections. (1998)
Kilian, Joe, Kipnis, Shlomo, Leiserson, Charles E.
This paper explores the problem of efficiently permuting data stored in VLSI chips in accordance with a predetermined set of permutations. By connecting chips with shared bus interconnections, as...
The Organization of Permutation Architectures with Bussed Interconnections. (1998)
Kilian, Joe, Kipnis, Sholomo, Leiserson, Charles E.
This paper explores the problem of efficiently permuting data stored in VLSI chips in accordance with a predetermined set of permutations. By connecting chips with shared bus interconnections, as...
VLSI Theory and Parallel Supercomputing. (1998)
Since its inception, VLSI theory has expanded in many fruitful and interesting directions. One major branch is layout theory which studies the efficiency with which graphs can be embedded in the...
Partial contents: Micron-Scale Display Technology; Virtual Memory for Data-Parallel Computing; The Optimal Synthesis of VLSI Array Architectures from Algorithmic Descriptions; Clay-1: A Distributed...
A Timing Analysis of Level-Clocked Circuitry, (1998)
Ishii, Alexander T., Leiserson, Charles E.
This paper presents an algorithm for verifying proper timing in VLSI circuits where latches are controlled by the levels (high or low) of the controlling clocks rather than the transitions (edges) of...
Cilk: An Efficient Multithreaded Runtime System (1998)
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and...
Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers (1998)
Charles E. Leiserson, Satish Rao, Sivan Toledo
When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called "out-of-core" algorithm, which moves data between primary and secondary memories, must be...
Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers (1998)
Charles E. Leiserson, Satish Rao, Sivan Toledo
When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called "out-of-core" algorithm, which moves data between primary and secondary memories, must be...
Dag-Consistent Distributed Shared Memory (1998)
Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk...
Space-Efficient Scheduling of Multithreaded Computations (1998)
Robert D. Blumofe, Charles E. Leiserson
. This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor...
Programming Parallel Applications in Cilk (1997)
Charles E. Leiserson, Aske Plaat
Cilk (pronounced "silk") is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or...
Optimizing Two-Phase, Level-Clocked Circuitry (1997)
Alexander T. Ishii, Charles E. Leiserson, Marios C. Papaefthymiou
We investigate two strategies for reducing the clock period of a two-phase, level-clocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates...
Programming Parallel Applications in Cilk (1997)
Charles E. Leiserson, Aske Plaat
Cilk (pronounced "silk") is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or...
A Comparison of Sorting Algorithms for the Connection Machine CM-2 (1997)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and...
A Comparison of Sorting Algorithms for the Connection Machine CM-2 (1997)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and...
A Comparison of Sorting Algorithms for the Connection Machine CM-2 (1997)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have implemented three parallel sorting algorithms on the Connection Machine Supercomputer model CM-2: Batcher's bitonic sort, a parallel radix sort, and a sample sort similar to Reif and...
Randomized Routing on Fat-Trees (1996)
Ronald I. Greenberg, Charles E. Leiserson
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is...
Efficient Interconnection Schemes for VLSI and Parallel Computation (1996)
Charles E. Leiserson, Arthur C. Smith, Ronald I. Greenberg
This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first case, the goal is to construct efficient interconnection networks for...
A Hyperconcentrator Switch for Routing Bit-Serial Messages (1996)
Charles E. Leiserson, Thomas H. Cormen
In highly parallel message routing networks, it is sometimes desirable to concentrate relatively few messages on many wires onto fewer wires. We have designed a VLSI chip for this purpose which is...
A Hyperconcentrator Switch for Routing Bit-Serial Messages (1996)
Charles E. Leiserson, Thomas H. Cormen
In highly parallel message routing networks, it is sometimes desirable to concentrate relatively few messages on many wires onto fewer wires. We have designed a VLSI chip for this purpose which is...
An Experimental Analysis of Parallel Sorting Algorithms (1996)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating...
An Experimental Analysis of Parallel Sorting Algorithms (1996)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating...
An Experimental Analysis of Parallel Sorting Algorithms (1996)
Guy E. Blelloch, C. Greg Plaxton, Charles E. Leiserson, Stephen J. Smith, Bruce M. Maggs, Marco Zagha
We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a machine by enumerating...
Dag-Consistent Distributed Shared Memory (1996)
Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk...
An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms (1996)
Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall
In this paper, we analyze the performance of parallel multithreaded algorithms that use dag-consistent distributed shared memory. Specifically, we analyze execution time, page faults, and space...
Scheduling Multithreaded Computations by Work Stealing (1996)
Robert D. Blumofe, Charles E. Leiserson
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind...
Cilk: An Efficient Multithreaded Runtime System (1996)
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, Yuli Zhou
Cilk (pronounced "silk") is a C-based runtime system for multithreaded parallel programming. In this paper, we document the efficiency of the Cilk work-stealing scheduler, both empirically and...
Dag-Consistent Distributed Shared Memory (1996)
Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall
We introduce dag consistency, a relaxed consistency model for distributed shared memory which is suitable for multithreaded programming. We have implemented dag consistency in software for the Cilk...
Space-Efficient Scheduling of Multithreaded Computations (1996)
Robert D. Blumofe, Charles E. Leiserson
This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor...