Tuesday, June 4, 2019

Shared Memory MIMD Architectures

Introduction to MIMD Architectures

MIMD (Multiple Instruction stream, Multiple Data stream) is a type of multiprocessor architecture in which several instruction cycles may be active at any given time, each one independently fetching instructions and operands into multiple processing units and operating on them concurrently. Put another way, a MIMD computer can process two or more independent sets of instructions simultaneously on two or more sets of data. Computers with multiple CPUs, or single CPUs with dual cores, are examples of MIMD architecture. Hyperthreading also yields a certain degree of MIMD performance. Contrast with SIMD.

In computing, MIMD is a technique employed to achieve parallelism. Machines using MIMD have a number of processors that function asynchronously and independently. At any time, different processors may be executing different instructions on different pieces of data.
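The idea of different instruction streams working on different data at the same time can be illustrated with a minimal Python sketch. It uses threads from the standard library as stand-ins for processors; the function names and sample data are made up for illustration. Note the assumption: in standard CPython builds the global interpreter lock usually keeps threads from running bytecode truly in parallel, so this shows the MIMD programming model rather than real hardware parallelism.

```python
import threading


def sum_of_squares(values, results):
    # Instruction stream 1: integer arithmetic over a list.
    results["sum_of_squares"] = sum(v * v for v in values)


def count_vowels(text, results):
    # Instruction stream 2: character classification over a string.
    results["vowels"] = sum(1 for ch in text.lower() if ch in "aeiou")


def run_mimd_demo():
    # Each worker runs a different function on a different piece of data.
    results = {}
    workers = [
        threading.Thread(target=sum_of_squares, args=([1, 2, 3, 4], results)),
        threading.Thread(target=count_vowels, args=("shared memory", results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # wait for both independent streams to finish
    return results


if __name__ == "__main__":
    print(run_mimd_demo())
```

The two workers never coordinate while running; they only meet at the final join, which mirrors the asynchronous, independent behavior described above.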
MIMD architectures may be used in a number of application areas such as computer-aided design/computer-aided manufacturing, simulation, modeling, and as communication switches. MIMD machines can be of either shared memory or distributed memory categories. These classifications are based on how MIMD processors access memory. Shared memory machines may be of the bus-based, extended, or hierarchical type. Distributed memory machines may have hypercube or mesh interconnection schemes.

Multiple Instruction Multiple Data

MIMD architectures have multiple processors that each execute an independent stream (sequence) of machine instructions. The processors execute these instructions using any accessible data rather than being forced to operate upon a single, shared data stream. Hence, at any given time, a MIMD system can be using as many different instruction streams and data streams as there are processors.

Although software processes executing on MIMD architectures can be synchronized by passing data among processors through an interconnection network, or by having processors examine data in a shared memory, the processors' autonomous execution makes MIMD architectures asynchronous machines.

Shared Memory Bus-based

MIMD machines with shared memory have processors which share a common, central memory. In the simplest form, all processors are attached to a bus which connects them to memory. This setup is called bus-based shared memory. Bus-based machines may have another bus that enables the processors to communicate directly with one another. This additional bus is used for synchronization among the processors. Bus-based shared memory MIMD machines can support only a small number of processors, because the processors contend for access to shared memory.
These machines may be incrementally expanded up to the point where there is too much contention on the bus.

Shared Memory Extended

MIMD machines with extended shared memory attempt to avoid or reduce the contention among processors for shared memory by subdividing the memory into a number of independent memory units. These memory units are connected to the processors by an interconnection network and are treated as a unified central memory. One type of interconnection network for this architecture is a crossbar switching network. In this scheme, N processors are linked to M memory units, which requires N times M switches. This is not an economically feasible setup for connecting a large number of processors.

Shared Memory Hierarchical

MIMD machines with hierarchical shared memory use a hierarchy of buses to give processors access to each other's memory. Processors on different boards may communicate through internodal buses. Buses support communication between boards. With this type of architecture, the machine may support over a thousand processors.

In computing, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Depending on context, programs may run on a single processor or on multiple separate processors.
Using memory for communication inside a single program, for example among its multiple threads, is generally not referred to as shared memory.

IN HARDWARE

In computer hardware, shared memory refers to a (typically) large block of random access memory that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system.

A shared memory system is relatively easy to program since all processors share a single view of data, and communication between processors can be as fast as memory accesses to the same location.

The issue with shared memory systems is that many CPUs need fast access to memory and will likely cache memory, which has two complications:

The CPU-to-memory connection becomes a bottleneck. Shared memory computers cannot scale very well. Most of them have ten or fewer processors.

Cache coherence. Whenever one cache is updated with information that may be used by other processors, the change needs to be reflected to the other processors; otherwise the different processors will be working with incoherent data (see cache coherence and memory coherence). Such coherence protocols can, when they work well, provide extremely high-performance access to shared information between multiple processors. On the other hand, they can sometimes become overloaded and become a bottleneck to performance.

The alternatives to shared memory are distributed memory and distributed shared memory, each having a similar set of issues. See also Non-Uniform Memory Access.

IN SOFTWARE

In computer software, shared memory is either:

A method of inter-process communication (IPC), i.e. a way of exchanging data between programs running at the same time. One process will create an area in RAM which other processes can access; or

A method of conserving memory space by directing accesses to what would ordinarily be copies of a piece of data to a single instance instead, by using virtual memory mappings or with explicit support of the program in question.
This is most often used for shared libraries and for Execute in Place.

Shared Memory MIMD Architectures

The distinguishing feature of shared memory systems is that, no matter how many memory blocks are used and how these memory blocks are connected to the processors, the address spaces of these memory blocks are unified into a global address space which is completely visible to all processors of the shared memory system. Issuing a certain memory address by any processor will access the same memory block location. However, according to the physical organization of the logically shared memory, two main types of shared memory system can be distinguished:

Physically shared memory systems
Virtual (or distributed) shared memory systems

In physically shared memory systems all memory blocks can be accessed uniformly by all processors. In distributed shared memory systems the memory blocks are physically distributed among the processors as local memory units.

The three main design issues in increasing the scalability of shared memory systems are:

Organization of memory
Design of interconnection networks
Design of cache coherence protocols

Cache Coherence

Cache memories are introduced into computers in order to bring data closer to the processor and hence to reduce memory latency. Caches are widely accepted and employed in uniprocessor systems. However, in multiprocessor machines several processors may require a copy of the same memory block. The maintenance of consistency among these copies raises the so-called cache coherence problem, which has three causes:

Sharing of writable data
Process migration
I/O activity

From the point of view of cache coherence, data structures can be divided into three classes:

Read-only data structures, which never cause any cache coherence problem.
They can be replicated and placed in any number of cache memory blocks without any problem.
Shared writable data structures, which are the main source of cache coherence problems.
Private writable data structures, which pose cache coherence problems only in the case of process migration.

There are several techniques to maintain cache coherence for the critical case, that is, shared writable data structures. The applied methods can be divided into two classes:

hardware-based protocols
software-based protocols

Software-based schemes usually introduce some restrictions on the cachability of data in order to prevent cache coherence problems.

Hardware-based Protocols

Hardware-based protocols provide general solutions to the problems of cache coherence without any restrictions on the cachability of data. The price of this approach is that shared memory systems must be extended with sophisticated hardware mechanisms to support cache coherence. Hardware-based protocols can be classified according to their memory update policy, cache coherence policy, and interconnection scheme. Two types of memory update policy are applied in multiprocessors: write-through and write-back. Cache coherence policy is divided into write-update policy and write-invalidate policy.

Hardware-based protocols can be further classified into three basic classes depending on the nature of the interconnection network applied in the shared memory system. If the network efficiently supports broadcasting, the so-called snoopy cache protocol can be advantageously exploited. This scheme is typically used in single bus-based shared memory systems where consistency commands (invalidate or update commands) are broadcast via the bus and each cache snoops on the bus for incoming consistency commands.
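The snoopy scheme just described can be sketched as a toy simulation. This is only an illustration under stated assumptions: it models a write-through memory update policy combined with a write-invalidate coherence policy, tracks just a valid/invalid state per cached block, and uses hypothetical class names (Bus, SnoopyCache) that do not come from any real system or library.

```python
class Bus:
    """Single shared bus: owns main memory and broadcasts consistency commands."""

    def __init__(self):
        self.memory = {}   # address -> value in main memory
        self.caches = []   # every cache attached to (snooping on) the bus

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast_invalidate(self, sender, address):
        # Every cache except the writer sees the invalidate command.
        for cache in self.caches:
            if cache is not sender:
                cache.snoop_invalidate(address)


class SnoopyCache:
    """Per-processor cache; a block present in `lines` counts as valid."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        self.misses = 0
        bus.attach(self)

    def read(self, address):
        if address not in self.lines:
            # Read miss: fetch the block from main memory over the bus.
            self.misses += 1
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        # Write-invalidate: other copies are invalidated before the update.
        self.bus.broadcast_invalidate(self, address)
        self.lines[address] = value
        # Write-through: main memory is updated on every write.
        self.bus.memory[address] = value

    def snoop_invalidate(self, address):
        # Drop the local copy so the next read misses and refetches.
        self.lines.pop(address, None)


def demo():
    bus = Bus()
    cache0, cache1 = SnoopyCache(bus), SnoopyCache(bus)
    cache0.write(0x10, 1)
    first = cache1.read(0x10)    # miss: loads the value written by cache0
    cache0.write(0x10, 2)        # invalidates cache1's copy
    second = cache1.read(0x10)   # miss again: sees the new value
    third = cache1.read(0x10)    # hit: copy is valid now
    return first, second, third, cache1.misses
```

In this sketch the second processor never reads a stale value, at the cost of one extra miss after each remote write. A write-update policy would instead push the new value into the other caches rather than dropping their copies.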
Large interconnection networks like multistage networks cannot support broadcasting efficiently, and therefore a mechanism is needed that can directly forward consistency commands to those caches that contain a copy of the updated data structure. For this purpose a directory must be maintained for each block of the shared memory to administer the actual location of blocks in the possible caches. This approach is called the directory scheme.

The third approach tries to avoid the application of the costly directory scheme but still provide high scalability. It proposes multiple-bus networks with the application of hierarchical cache coherence protocols that are generalized or extended versions of the single bus-based snoopy cache protocol.

In describing a cache coherence protocol the following definitions must be given:

Definition of possible states of blocks in caches, memories and directories.
Definition of commands to be performed at various read/write hit/miss actions.
Definition of state transitions in caches, memories and directories according to the commands.
Definition of transmission routes of commands among processors, caches, memories and directories.

Software-based Protocols

Although hardware-based protocols offer the fastest mechanism for maintaining cache consistency, they introduce significant extra hardware complexity, particularly in scalable multiprocessors. Software-based approaches represent a good and competitive compromise since they require nearly negligible hardware support and they can lead to the same small number of invalidation misses as the hardware-based protocols. All the software-based protocols rely on compiler assistance.

The compiler analyses the program and classifies the variables into four classes:

1. Read-only
2. Read-only for any number of processes and read-write for one process
3. Read-write for one process
4. Read-write for any number of processes

Read-only variables can be cached without restrictions.
Type 2 variables can be cached only for the processor where the read-write process runs. Since only one process uses type 3 variables, it is sufficient to cache them only for that process. Type 4 variables must not be cached in software-based schemes. Variables demonstrate different behavior in different program sections, and hence the program is usually divided into sections by the compiler and the variables are categorized independently in each section. Moreover, the compiler generates instructions that control the cache or access the cache explicitly, based on the classification of variables and code segmentation. Typically, at the end of each program section the caches must be invalidated to ensure that the variables are in a consistent state before starting a new section.

Shared memory systems can be divided into four main classes, of which two are covered here.

Uniform Memory Access (UMA) Machines

Contemporary uniform memory access machines are small-size single bus multiprocessors. Large UMA machines with hundreds of processors and a switching network were typical in the early design of scalable shared memory systems. Famous representatives of that class of multiprocessors are the Denelcor HEP and the NYU Ultracomputer. They introduced many innovative features in their design, some of which even today represent a significant milestone in parallel computer architectures. However, these early systems did not contain either cache memory or local main memory, which turned out to be necessary to achieve high performance in scalable shared memory systems.

Non-Uniform Memory Access (NUMA) Machines

Non-uniform memory access (NUMA) machines were designed to avoid the memory access bottleneck of UMA machines. The logically shared memory is physically distributed among the processing nodes of NUMA machines, leading to distributed shared memory architectures.
On one hand these parallel computers became extremely scalable, but on the other hand they are very sensitive to data allocation in local memories. Accessing a local memory segment of a node is much faster than accessing a remote memory segment. Not by chance, the structure and design of these machines resemble in many ways that of distributed memory multicomputers. The main difference is in the organization of the address space. In multiprocessors, a global address space is applied that is uniformly visible from each processor; that is, all processors can transparently access all memory locations. In multicomputers, the address space is replicated in the local memories of the processing elements. This difference in the address space of the memory is also reflected at the software level: distributed memory multicomputers are programmed on the basis of the message-passing paradigm, while NUMA machines are programmed on the basis of the global address space (shared memory) principle.

The problem of cache coherency does not appear in distributed memory multicomputers since the message-passing paradigm explicitly handles different copies of the same data structure in the form of independent messages. In the shared memory paradigm, multiple accesses to the same global data structure are possible and can be accelerated if local copies of the global data structure are maintained in local caches. However, hardware-supported cache consistency schemes are not introduced into NUMA machines. These systems can cache read-only code and data, as well as local data, but not shared modifiable data. This is the distinguishing feature between NUMA and CC-NUMA multiprocessors.
Accordingly, NUMA machines are closer to multicomputers than to other shared memory multiprocessors, while CC-NUMA machines look like real shared memory systems.

In NUMA machines, as in multicomputers, the main design issues are the organization of processor nodes, the interconnection network, and the possible techniques to reduce remote memory accesses. Two examples of NUMA machines are the Hector and the Cray T3D multiprocessor.

Sources used

www.wikipedia.com
http://www.developers.net/tsearch?searchkeys=MIMD+architecture
http://carbon.cudenver.edu/galaghba/mimd.html
http://www.docstoc.com/docs/2685241/Computer-Architecture-Introduction-to-MIMD-architectures
