DEC MasPar

MasPar Company History
MasPar Computer Corporation was formed in 1988 by a DEC vice-president, Jeff Kalb. The company retained an association with DEC. It was a fairly small company, with an installed base of around 30 machines and 100 staff. It produced a single range of SIMD (Single Instruction Multiple Data) machines, the MP-1 series, which consisted of five models. The MP-1 became commercially available in 1990. The range supported a UNIX operating system, C and Fortran compilers, an advanced graphical programming environment and other tools. The design grew out of work at DEC, where some researchers in Kalb's division had been building a supercomputer based on the Goodyear MPP (massively parallel processor) supercomputer. The DEC researchers enhanced the architecture by:

Making the processor elements 4-bit instead of 1-bit.
Increasing the connectivity of each processor element from 4 neighbors to 8.
Adding a global interconnect for all of the processing elements: a triple-redundant switch that was easier to implement than a full crossbar switch.

After Digital declined to commercialize the research project, Kalb decided to start a company to sell MPP mini supercomputers. In 1990, the first-generation product, the MP-1, was delivered. In 1992, the follow-on MP-2 was shipped. The company shipped more than 200 systems. There was no MP-3.

MasPar exited the computer hardware business in June 1996, halting all hardware development and transforming itself into a new data mining software company called NeoVista Software. NeoVista was acquired by Accrue Software in 1999, which in turn sold the division to JDA Software in 2001.

DEC's MasPar Offering
Digital Equipment Corporation (DEC), a major computer manufacturer, entered the massively parallel processing business by launching the MasPar Computer Corporation machine in DEC colours as the DECmpp 12000. DEC configured the DECmpp 12000 as a line of eight field-upgradeable models with from 1,024 to 16,384 processors. Peak performance in the full configuration was claimed at 26,000 MIPS and 1.3 GFLOPS. The machines came with a colour DECstation 5000 as the front end, and with Ultrix and DECnet software licenses. DEC also announced a new DECmpp Disk Array System that provided parallel access to large data files. It was made up of 720 MB disks that could be added four at a time, up to a maximum of 24 in a single cabinet, for a total storage of 11.2 GB. It offered a sustained I/O rate of 9 Mbytes per second.

MPP Architecture
MasPar was unique in being a manufacturer of SIMD supercomputers (as opposed to vector machines). In this approach, a collection of ALUs listens to a program broadcast from a central source. The ALUs can do their own data fetch, but are all under the control of a central Array Control Unit, and there is a single central clock. The emphasis is on communications efficiency and low latency. The MasPar architecture was designed to scale and to balance processing, memory, and communication.
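The broadcast model described above can be sketched in a few lines of Python. This is only an illustration of the general SIMD idea, not MasPar's instruction set: the `PE` class, the `broadcast` function and the per-PE `enabled` flag are hypothetical names, and the enable flag is an assumption borrowed from typical SIMD designs rather than something stated in this article.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PE:
    """One processing element: its own local data plus an enable flag (assumed)."""
    value: int
    enabled: bool = True


def broadcast(pes: List[PE], op: Callable[[int], int]) -> None:
    """The central control unit issues one operation; each enabled PE applies it to its own data."""
    for pe in pes:  # conceptually simultaneous; the loop only simulates lockstep execution
        if pe.enabled:
            pe.value = op(pe.value)


pes = [PE(v) for v in range(8)]

# Switch off the PEs that hold odd values, then broadcast "multiply by 10".
for pe in pes:
    pe.enabled = pe.value % 2 == 0
broadcast(pes, lambda x: x * 10)

result = [pe.value for pe in pes]
print(result)
```

Every PE sees the same instruction stream, so conditional behaviour comes from PEs sitting out an instruction rather than from branching independently.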

MasPar’s first design, the MP-1, was based directly on the research done at DEC. The MasPar MP-1 PE and the later binary-compatible MasPar MP-2 PE were based on custom CMOS chips, designed in-house and fabricated by various vendors such as Hewlett-Packard and Texas Instruments. The Array Control Unit (ACU) handled instruction fetch. It was a load-store architecture, and Harvard in a broad sense. The ACU implemented a microcoded instruction fetch, but achieved a RISC-like 1 instruction per clock. The arithmetic units, ALUs with data fetch capability, were implemented 32 to a chip. Each ALU was connected in a nearest-neighbour fashion to 8 others. The edge connections were brought off-chip, so the perimeters could be toroid-wrapped. Up to 16,384 units could be connected within the confines of a cabinet. A global router, essentially a crossbar switch, provided external I/O to the processor array.
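As a rough illustration of the 8-neighbour mesh with toroidal wrap-around, the sketch below computes the neighbours of a PE from its grid coordinates. The function name `neighbours` and the square 128 x 128 arrangement of 16,384 PEs are assumptions made for the example; the article does not give the actual grid shape or numbering.

```python
from typing import List, Tuple


def neighbours(row: int, col: int, rows: int, cols: int) -> List[Tuple[int, int]]:
    """Return the 8 nearest neighbours of (row, col) on a toroidally wrapped grid."""
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the PE itself
            # The modulo implements the wrap-around at the array edges.
            result.append(((row + dr) % rows, (col + dc) % cols))
    return result


# Assumed layout: 16,384 PEs as a 128 x 128 grid.
ROWS = COLS = 128
corner = neighbours(0, 0, ROWS, COLS)
print(corner)
```

With the wrap in place, a corner PE such as (0, 0) still has eight distinct neighbours, including (127, 127) on the opposite corner.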

Each PE contained a 4-bit ALU, a 1-bit logic unit, and a 64/16 (mantissa/exponent) unit for handling floating point. Each PE also had 48 32-bit registers. The PEs were designed as 32-bit RISC processors, which meant that, with the 4-bit ALU, any 32-bit ALU operation took at least 8 cycles. This was considered acceptable in an MPP-type system. Each custom VLSI CMOS MP-1 chip contained 32 individual PEs. The chips were made on a 1.6-micron process and contained 400,000 transistors. The clock speed was 12.5 MHz, which allowed the chip to be air cooled. A 1024-PE processor board (32 chips) dissipated only 50 watts.
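The "at least 8 cycles" figure follows from 32 / 4 = 8 passes through the 4-bit ALU. The sketch below shows the idea for addition, assuming one pass per 4-bit slice with a carry handed from slice to slice. The function `nibble_serial_add` is a hypothetical illustration, not MasPar's microcode.

```python
from typing import Tuple


def nibble_serial_add(a: int, b: int, width: int = 32, slice_bits: int = 4) -> Tuple[int, int]:
    """Add two unsigned `width`-bit values using a `slice_bits`-wide adder.

    Returns (sum modulo 2**width, number of ALU passes used).
    """
    mask = (1 << slice_bits) - 1
    carry = 0
    total = 0
    passes = 0
    for shift in range(0, width, slice_bits):
        # One ALU pass: add one slice of each operand plus the incoming carry.
        part = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        total |= (part & mask) << shift
        carry = part >> slice_bits
        passes += 1
    return total, passes


total, passes = nibble_serial_add(0x89ABCDEF, 0x12345678)
print(hex(total), passes)

# The same operation on a full 32-bit ALU needs a single pass.
_, wide_passes = nibble_serial_add(0x89ABCDEF, 0x12345678, slice_bits=32)
print(wide_passes)
```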

The MP-2 was fundamentally the same as the MP-1, with significant upgrades. The ALU was widened to a full 32 bits. This greatly increased performance, and did so with no change in code: the MP-2 was binary compatible with the MP-1. This compatibility was very important because most of the application software running on a MasPar was written by the clients themselves. A client could upgrade performance with no change to their software. The MP-1’s 48 32-bit registers were increased to 64 32-bit registers. The MP-2 chips were made on a 1-micron process and contained 950,000 transistors. They dissipated a similar amount of heat at the same 12.5 MHz clock.

Platform Specifications
Architecture	The computer consists of Processing Elements (PEs) connected in a 2-D lattice. The computer is driven by a VAX front-end workstation. High speed I/O devices can be attached, and direct access to the DEC memory bus is possible.
Processors	The PEs were custom designed by MasPar. They were RISC-like and grouped into clusters of 16 on the chips. Each cluster had the PE memories and connections to the communications network. Instructions were issued by the Array Control Unit, which was a RISC-like processor based on standard chips from Texas Instruments.
Operating System	The MasPar computers were front-ended by a host machine, commonly a VAX workstation. The computational engine of the MasPar did not have an operating system of its own. A DEC 3100 workstation running Ultrix (DEC's Unix) provided the programmer with an interface to the MasPar. The workstation provided users with a windowing programming environment, networking capability, I/O device access, etc. When MasPar programs were executed, the user process ran on the workstation while the parallel code was automatically passed to the DPU (Data Parallel Unit) for execution. Programs could be compiled and debugged on the workstation using MPPE (MasPar Programming Environment).
Programming Languages	MasPar licensed a version of the Fortran conversion package VAST-2 from Pacific-Sierra Research Corporation. This product converted scalar Fortran 77 source code to parallel MPF (MasPar Fortran) source. The conversion could also be done in reverse. Full IEEE single- and double-precision floating point was supported. A C compiler was also available.
Performance	1.2 GFLOPS (2.6 GIPS) for a 16,384 PE computer
Data Transfer	Nearest neighbour 18 GBytes/second for a 16,384 PE computer and 1300 MBytes/second using the global router.
Scalability	From 1,024 to 16,384 PEs.
Fault Tolerance	MasPar claimed a mean time between failures of over 8,000 hours. No fault-tolerant features were built in.
Application Fit	The machine was marketed as a Grand Challenge machine due to its high reliability. Typical applications were DNA sequence matching, weather prediction and image processing. Applications that perform the same calculations over thousands of data points (e.g. image pixel processing) are ideally suited to SIMD computers.

MasPar Topology
The MP-2 PE chip contained 32 processor elements, each a full 32-bit ALU with floating point, registers, and a barrel shifter. Only the instruction fetch feature was removed, and placed in the ACU. The PE design was literally replicated 32 times on the chip. The chip was designed to interface to DRAM, to other processor array chips, and to communication router chips.

Each ALU, called a PE slice, contained sixty-four 32-bit registers that were used for both integer and floating point. The registers were both bit and byte addressable. The floating point unit handled single-precision and double-precision arithmetic on IEEE format numbers. Each PE slice contained two registers for the data memory address and the data. Each PE also had two one-bit serial ports, one for inbound and one for outbound communication with its nearest neighbour. The direction of communication was controlled globally. There was no cache for the ALUs; none was required, because the memory interface operated at a speed commensurate with the ALU data accesses. The ALUs did not implement memory management for data memory, while the ACU used demand-paged virtual memory for the instruction memory. The PEs also had inbound and outbound paths to a global router for I/O. A broadcast port allowed a single instance of data to be "promoted" to parallel data. Alternatively, global data could be OR-ed into a scalar result.
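The two global operations mentioned above, promoting a scalar to parallel data and OR-ing parallel data down to a scalar, can be sketched as follows. The function names `promote` and `global_or` are hypothetical and the hardware path is not modelled; only the data movement is shown.

```python
from functools import reduce
from operator import or_
from typing import List


def promote(scalar: int, n_pes: int) -> List[int]:
    """Broadcast: copy one scalar into every PE, producing parallel data."""
    return [scalar] * n_pes


def global_or(parallel: List[int]) -> int:
    """Reduce: OR the values held by all PEs into a single scalar."""
    return reduce(or_, parallel, 0)


flags = promote(0, 8)   # every PE starts with a cleared flag word
flags[5] = 0b0100       # one PE sets a flag bit
any_flag = global_or(flags)
print(any_flag)
```

A reduction of this kind gives the control unit a cheap way to ask whether any PE has raised a given bit, without reading every PE individually.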

The serial links supported 1 Mbyte/second bit-serial communication that allowed coordinated register-register communication between processors. Each processor had its own local memory, implemented in DRAM. No internal memory was included on the processors. Microcoded instruction decode was used.

The 32 PEs on a chip were clustered into two groups sharing a common memory interface, or M-machine, for access. A global scoreboard kept track of memory and register usage. The path to memory was 16 bits wide. Both big- and little-endian formats were supported. Each processor had its own 64 KBytes of memory. Both direct and indirect data memory addressing were supported. The chip was implemented in 1.0-micrometer, two-level-metal CMOS, dissipated 0.8 watts and was packaged in a 208-pin PQFP. A relatively low clock rate of 12.5 MHz was used.
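The packaging hierarchy described in this article (16 PEs per cluster, 2 clusters per chip, 32 chips per board, 64 KBytes per PE) can be turned into a small sizing sketch. The linear PE numbering used by `locate` is purely hypothetical; the article does not say how PEs were actually numbered.

```python
from typing import Tuple

PES_PER_CLUSTER = 16    # 32 PEs per chip, in two clusters
CLUSTERS_PER_CHIP = 2
CHIPS_PER_BOARD = 32    # 1,024 PEs per board
KBYTES_PER_PE = 64


def locate(pe: int) -> Tuple[int, int, int, int]:
    """Split a linear PE number (hypothetical scheme) into (board, chip, cluster, slot)."""
    rest, slot = divmod(pe, PES_PER_CLUSTER)
    rest, cluster = divmod(rest, CLUSTERS_PER_CHIP)
    board, chip = divmod(rest, CHIPS_PER_BOARD)
    return board, chip, cluster, slot


def total_memory_mbytes(n_pes: int) -> float:
    """Aggregate PE memory in MBytes for a system of n_pes processors."""
    return n_pes * KBYTES_PER_PE / 1024


last = locate(16_383)                # last PE of a full 16-board system
mem = total_memory_mbytes(16_384)    # aggregate PE memory of that system
print(last, mem)
```

Under these figures, a full 16,384-PE system carries 1,024 MBytes of PE memory spread across 16 boards.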

MasPar MP-2 board: 1024 PEs on 32 PE chips, with 3 router chips and 192 DRAM chips. Source: cpushack.com

Sales
The MP-1 systems could have from 1,024 PEs (32 PE chips on one board) to 16,384 PEs, using 16 boards. By 1992 MasPar had sold over 130 MP-1 systems, at an initial price of $150,000 for the 1,024-PE system. In 1992 the MP-2 was announced.

The price of an MP-2 base system was $260,000, rising to $1.6 million for a fully equipped 16k system. The MP-1 continued to be sold at a reduced price of $75,000, which wasn’t much more than that of a high-end workstation at the time. DEC worked as a second source for MasPar, selling the systems as well as maintaining them. By 1996 demand for MPP supercomputers had dropped off and MasPar exited the hardware market. In the six years of sales, only a little over 200 systems were sold.

Guides

Document Name	Order Part No.	Publication Date	Domain
DECmpp12000/Sx Model 100 Hardware Service Manual	EK–DECAC–SM	September 1992	HW
DECmpp12000/Sx Model 100 Hardware Installation Guide	EK–DECAC–IG	September 1992	HW
DECmpp12000/Sx Model 100 Parallel Disk Array Reference Manual	EK–DECAB–RM	September 1992	HW
DECmpp12000/Sx Model 100 Parallel VME Reference Manual	EK–DECAB–PM	September 1992	HW

Guides

Document	Author	Publication Date	Publisher
The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer	John R. Nickolls	1990	IEEE
By using CMOS VLSI and replication of components effectively, massively parallel computers can achieve extraordinary performance at low cost. Key issues are how the processor and memory are partitioned and replicated and how interprocessor communication and I/O are accomplished. This paper describes the design and implementation of the MasPar MP-1, a general purpose massively parallel computer system that achieves peak computation rates beyond a billion floating point operations per second, yet it is priced as a minicomputer.

Sources:
The CPU Shack Museum has a good description of the MasPar architecture.

MANUEL DUARTE
