Database Tech Blog: An Introduction to Teradata components

Teradata is a relational database management system. A Teradata system has many hardware and software components and contains one or more nodes. A node is a general-purpose processing unit under the control of a single operating system. The node is the basic building block of a Teradata system and is where the database processing occurs.

A node contains many hardware and software components, described below.

PDE
The PDE (Parallel Database Extensions) software layer runs above the operating system on each node. It was created by NCR to support the parallel environment.

System Disks
System disks are contained on the node and are used for the following:
• Operating system software
• Teradata software
• Application software
• System dump space
Teradata database tables are stored on disk arrays, not on the system disks.

Memory
The free memory pool is the collection of memory available to a node, and the vprocs on that node share it. A segment of memory is allocated to a vproc for its use, then returned to the pool for use by another vproc.
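To make the allocate-and-return cycle concrete, here is a minimal Python sketch. It is a toy model, not Teradata code: the `FreeMemoryPool` class, the vproc names, and the megabyte sizes are all hypothetical, and real PDE memory management is considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class FreeMemoryPool:
    """Toy model of a node's shared free memory pool (sizes in MB are hypothetical)."""

    free_mb: int
    allocations: dict = field(default_factory=dict)  # vproc name -> MB currently held

    def allocate(self, vproc: str, size_mb: int) -> None:
        # A vproc takes a segment of the shared pool for its own use.
        if size_mb > self.free_mb:
            raise MemoryError(f"{vproc}: asked for {size_mb} MB, only {self.free_mb} MB free")
        self.free_mb -= size_mb
        self.allocations[vproc] = self.allocations.get(vproc, 0) + size_mb

    def release(self, vproc: str) -> None:
        # The segment goes back to the pool, where another vproc can use it.
        self.free_mb += self.allocations.pop(vproc, 0)


pool = FreeMemoryPool(free_mb=1024)
pool.allocate("AMP 0", 600)
pool.release("AMP 0")
pool.allocate("PE 1", 900)  # fits only because AMP 0 returned its segment
print(pool.free_mb)  # 124
```

The point of the sketch is that the 900 MB request succeeds only after another vproc has handed its segment back to the shared pool.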

Application
An application is software that accesses the Teradata RDBMS. It can run on various platforms:
• Channel-attached client
• LAN-attached client
• Node

Vproc
A virtual processor (vproc) is a group of one or more software processes running under the operating system's multitasking environment:
• On UNIX, a vproc is a collection of software processes.
• On Windows, a vproc is a single software process.

The two types of Teradata vprocs are:
• AMP (Access Module Processor)
• PE (Parsing Engine)

When vprocs communicate, they use BYNET hardware (on MPP systems), BYNET software, and PDE. The BYNET hardware and software carry vproc messages to and from a particular node. Within a node, the BYNET and PDE software deliver messages to and from the participating vprocs.

PE
PEs (Parsing Engines) are vprocs that receive SQL requests from the client and break the requests into steps. The PEs send the steps to the AMPs and subsequently return the answer to the client.

AMP
AMPs (Access Module Processors) are virtual processors (vprocs) that receive steps from PEs (Parsing Engines) and perform database functions to retrieve or update data. Each AMP is associated with one virtual disk (vdisk), where the data is stored. An AMP manages only its own vdisk, not the vdisk of any other AMP.

Vdisk (Virtual Disk)
A vdisk is the logical disk space that is managed by an AMP. Depending on the configuration, a vdisk may not be contained on the node; however, it is managed by an AMP, which is always a part of the node.
The vdisk is made up of 1 to 64 pdisks (user slices in UNIX or partitions in Windows NT, whose size and configuration vary with the RAID level). The pdisks logically combine to form the AMP's vdisk: although an AMP can manage up to 64 pdisks, it controls only one vdisk.
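The ownership rules above (one vdisk per AMP, 1 to 64 pdisks per vdisk, no access to another AMP's vdisk) can be expressed as a small Python data model. This is only an illustrative sketch; the `Vdisk` and `Amp` classes and the pdisk names are hypothetical and do not correspond to any Teradata interface.

```python
from dataclasses import dataclass

MAX_PDISKS = 64  # limit given in the text


@dataclass(frozen=True)
class Vdisk:
    pdisks: tuple  # hypothetical pdisk names (UNIX slices or Windows partitions)

    def __post_init__(self):
        # A vdisk is built from 1 to 64 pdisks.
        if not 1 <= len(self.pdisks) <= MAX_PDISKS:
            raise ValueError(f"a vdisk needs 1 to {MAX_PDISKS} pdisks, got {len(self.pdisks)}")


@dataclass(frozen=True)
class Amp:
    amp_id: int
    vdisk: Vdisk  # exactly one vdisk per AMP

    def read(self, target: Vdisk) -> str:
        # An AMP only accesses its own vdisk, never another AMP's.
        if target is not self.vdisk:
            raise PermissionError(f"AMP {self.amp_id} does not manage that vdisk")
        return f"AMP {self.amp_id} reads {len(self.vdisk.pdisks)} pdisk(s)"


amp0 = Amp(0, Vdisk(("pdisk-a", "pdisk-b")))
amp1 = Amp(1, Vdisk(("pdisk-c",)))
print(amp0.read(amp0.vdisk))  # AMP 0 reads 2 pdisk(s)
try:
    amp0.read(amp1.vdisk)
except PermissionError as err:
    print(err)  # AMP 0 does not manage that vdisk
```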

Channel Driver
Channel driver software is the means of communication between the PEs and applications running on channel-attached (mainframe) clients.

Gateway
The Teradata Gateway software is the means of communication between the PEs and applications running on LAN-attached clients or on a node in the system.

BYNET
The BYNET (banyan network) is a combination of hardware and software that provides a high-speed interconnect between the nodes of a Teradata system. A dual-redundant, bidirectional, multistage network, the BYNET enables the nodes to communicate in a high-speed, loosely coupled fashion. It is based on banyan topology, a mathematically defined structure whose branches are reminiscent of a banyan tree.

The BYNET hardware and software handle the communication between the vprocs.
• Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables.
• Software: The BYNET software is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware.

SMP systems do not contain BYNET hardware. The PDE and BYNET software emulate BYNET activity in a single-node environment. The SMP implementation is sometimes called "boardless BYNET."

Teradata uses three types of messages, choosing whichever is most efficient for the work at hand:
• Broadcast: BYNET hardware delivers a message to all nodes in an MPP system; SMP systems simulate this with BYNET software. A single broadcast message consumes more BYNET bandwidth than a single point-to-point message, but a broadcast is sometimes the most efficient way to process a request.
• Point-to-Point: A single message to a single vproc is carried over BYNET hardware to the destination node and relayed to or from that vproc by PDE software. Point-to-point messages make the BYNET scalable: as the system grows, its capacity to handle point-to-point messages grows too.
• Multicast: A message can be sent to several vprocs at once. The BYNET hardware first sends a broadcast message to all nodes; the PDE software on each receiving node then determines which vprocs should receive the message and delivers it to them. Multicast gives the system yet another way to deliver messages with the best performance.
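The three delivery modes can be simulated in a few lines of Python to show which nodes a message reaches and which vprocs end up receiving it. This is a toy simulation under stated assumptions: the two-node layout, the vproc names, and the `broadcast`, `point_to_point`, and `multicast` functions are hypothetical and call no real BYNET or PDE interface.

```python
# Hypothetical two-node MPP layout: node name -> vprocs running on that node.
NODES = {
    "node1": ["PE 0", "AMP 0", "AMP 1"],
    "node2": ["PE 1", "AMP 2", "AMP 3"],
}


def node_of(vproc):
    # Find the node that hosts a given vproc.
    for node, vprocs in NODES.items():
        if vproc in vprocs:
            return node
    raise KeyError(vproc)


def broadcast(msg):
    # BYNET delivers to every node; PDE hands the message to every vproc there.
    nodes = list(NODES)
    return nodes, {v: msg for n in nodes for v in NODES[n]}


def point_to_point(msg, dest):
    # BYNET carries the message to one node; PDE relays it to the single target vproc.
    return [node_of(dest)], {dest: msg}


def multicast(msg, dests):
    # Hardware step: broadcast to all nodes.
    nodes = list(NODES)
    # PDE step on each receiving node: deliver only to the vprocs in dests.
    return nodes, {v: msg for n in nodes for v in NODES[n] if v in dests}


nodes, delivered = multicast("step 3", {"AMP 1", "AMP 2"})
print(nodes)              # ['node1', 'node2']
print(sorted(delivered))  # ['AMP 1', 'AMP 2']
```

Note how a multicast reaches every node, like a broadcast, but only the targeted vprocs receive it, because the filtering happens in the PDE layer on each node.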

A Teradata node requires three distinct pieces of software.

Operating System:
UNIX, Windows, or Linux.

The Parallel Database Extensions (PDE):
The Parallel Database Extensions (PDE) software layer was added to the operating system by NCR to support the parallel software environment.

Teradata RDBMS, aka Trusted Parallel Application (TPA):
A Trusted Parallel Application (TPA) uses PDE to implement virtual processors (vprocs). The Teradata RDBMS is classified as a TPA. The four components of the Teradata TPA are:
• AMP
• PE
• Channel Driver
• Teradata Gateway

Access Module Processor (AMP)
As described earlier, AMPs receive steps from PEs and perform database functions to retrieve or update data. The AMP software is made up of the AMP Worker Task functions, the file system, and the console utilities described below.
AMP Worker Task Functions
The AMP Worker Task (AWT) functions perform a number of operations, including:
• Locking tables to ensure data consistency.
• Executing AMP step operations such as select, insert, update, delete and sort.
• Joining tables as required.
• Executing end transaction steps as required to support multi-AMP operations.

AMP File System
The file system software accesses the data on the virtual disks. Each AMP uses the file system software to read from and write to the virtual disks.

AMP Console Utilities
The AMP software includes utilities to perform systems management functions such as:
• Configure and reconfigure the system
• Rebuild tables
• Reveal details about locks and space status

Parsing Engine (PE)
As described earlier, PEs receive SQL requests from the client, break the requests into steps, send the steps to the AMPs, and return the answer to the client. The PE software is made up of Session Control, the Parser/Optimizer, and the Dispatcher.

PE Session Control
When you log on to the Teradata RDBMS through your application, the session control software on the PE establishes that session. Session control also manages and terminates sessions on that PE.
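The session life cycle described above (establish, manage, terminate) can be modeled as a small per-PE session table. This Python sketch is illustrative only: the `SessionControl` class, the starting session number, and the user names are hypothetical, and it omits everything a real logon involves, such as authentication.

```python
import itertools


class SessionControl:
    """Toy session table for one PE; session numbers and user names are hypothetical."""

    def __init__(self, pe_name):
        self.pe_name = pe_name
        self._next_id = itertools.count(1000)
        self.sessions = {}  # session id -> user

    def logon(self, user):
        # Establish a session on this PE and hand back its id.
        session_id = next(self._next_id)
        self.sessions[session_id] = user
        return session_id

    def logoff(self, session_id):
        # Terminate the session; raises KeyError if this PE does not manage it.
        del self.sessions[session_id]


pe = SessionControl("PE 0")
s1 = pe.logon("alice")
s2 = pe.logon("bob")
pe.logoff(s1)
print(pe.sessions)  # {1001: 'bob'}
```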

PE Parser/Optimizer
The Parser interprets an incoming Teradata SQL request and checks the syntax. The Parser decomposes the request into AMP steps, using the Optimizer to determine the most efficient way to access the data on the virtual disks (vdisks).
The Optimizer develops the "least expensive" plan (in terms of time and system resources) to return the requested response set. Processing alternatives are evaluated, and the fastest alternative is chosen. The selected alternative is converted into executable steps to be performed by the AMPs. The Parser then sends the steps to the Dispatcher.
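At its simplest, cost-based plan selection means estimating a cost for each alternative and keeping the cheapest one. The Python sketch below shows only that final selection step. The `Plan` class, the step names, and the cost numbers are made up for illustration; Teradata's actual cost model is far more detailed and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    description: str
    steps: tuple           # AMP steps this plan would produce (hypothetical names)
    estimated_cost: float  # made-up combined time/resource estimate


def choose_plan(alternatives):
    # Evaluate every alternative and keep the one with the lowest estimated cost.
    if not alternatives:
        raise ValueError("no candidate plans")
    return min(alternatives, key=lambda p: p.estimated_cost)


candidates = [
    Plan("full-table scan on all AMPs", ("all-AMP retrieve", "sort"), 120.0),
    Plan("primary index access", ("single-AMP retrieve",), 3.5),
]
best = choose_plan(candidates)
print(best.description)  # primary index access
print(best.steps)        # ('single-AMP retrieve',)
```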

PE Dispatcher
The dispatcher is responsible for a number of tasks, depending on the operation it is performing:
• Processing Requests: Controls the sequence in which the steps are executed and passes the steps to the AMPs through the BYNET.
• Processing Responses: After the AMPs process the steps, the dispatcher builds a response message and sends the response back to the user.
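The two Dispatcher roles can be sketched as a loop that sends steps in order and then combines the AMPs' output into one response. In this Python sketch, `FakeAmp`, `dispatch`, and the step names are hypothetical. For simplicity it sends each step to the AMPs one after another, whereas a real system has the AMPs work in parallel; the sketch only illustrates step ordering and response building.

```python
class FakeAmp:
    """Hypothetical stand-in for an AMP that owns a few rows."""

    def __init__(self, amp_id, rows):
        self.amp_id = amp_id
        self.rows = rows
        self.executed = []  # steps this AMP has run, in order

    def run(self, step):
        self.executed.append(step)
        # Only the retrieve step returns data; other steps (e.g. locking) return nothing.
        return list(self.rows) if step == "retrieve" else []


def dispatch(steps, amps):
    # Processing requests: send each step, in sequence, to every AMP.
    partial_results = []
    for step in steps:
        for amp in amps:
            partial_results.extend(amp.run(step))
    # Processing responses: combine the AMPs' output into one response message.
    return {"row_count": len(partial_results), "rows": sorted(partial_results)}


amps = [FakeAmp(0, [(3, "c")]), FakeAmp(1, [(1, "a"), (2, "b")])]
response = dispatch(["lock table", "retrieve", "release lock"], amps)
print(response["row_count"])  # 3
print(response["rows"])       # [(1, 'a'), (2, 'b'), (3, 'c')]
print(amps[0].executed)       # ['lock table', 'retrieve', 'release lock']
```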

Channel Driver
Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node.

Teradata Gateway
Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node.

User Access
Clients communicate with the Teradata RDBMS in three ways:
• Channel
• Network
• Node

Channel
Communication from applications on the mainframe goes through the Channel Driver.
Traffic and communications between client applications on the mainframe and the Teradata database are managed by the Teradata Director Program (TDP) software.

Network
Communication from applications on a network-attached client goes through the Teradata Gateway.
Traffic and communication between network-attached client applications and the Teradata database are managed by either ODBC or CLIv2.

Node
If you install application software on a node, it will be treated like an application on a network-attached client. In other words, communications from applications on the node go through the Teradata Gateway.

To summarize, users can access data in the Teradata RDBMS through an application on both channel-attached and network-attached clients. Additionally, the node itself can act as a client. Teradata client software is installed on each client (channel-attached, network-attached, or node) and communicates with RDBMS software on the node.
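The routing rule in this section fits in a single function: channel-attached traffic enters through the Channel Driver, and both network-attached clients and applications on a node enter through the Teradata Gateway. The `ClientType` enum and `entry_point` function below are hypothetical names used only to illustrate that rule in Python.

```python
from enum import Enum


class ClientType(Enum):
    CHANNEL = "channel-attached (mainframe)"
    NETWORK = "network-attached"
    NODE = "application on a Teradata node"


def entry_point(client: ClientType) -> str:
    # Channel traffic enters through the Channel Driver.
    if client is ClientType.CHANNEL:
        return "Channel Driver"
    # Network-attached clients and node-resident applications use the Gateway.
    return "Teradata Gateway"


for c in ClientType:
    print(f"{c.value}: {entry_point(c)}")
```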

Primary Indexes
The Teradata Database distributes the data in each table across all AMPs on a system. Because each AMP manages its own vdisk, this effectively distributes the data across disks automatically. This is one of the major differences between Teradata and other RDBMSs.

Teradata assigns rows to AMPs based on the value of their primary index. The hash bucket, and hence the AMP on which a row is stored, is determined solely by the row hash of the row's primary index value. Every Teradata Database table must have a primary index.
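The path from primary index value to row hash to hash bucket to AMP can be sketched in Python. Several assumptions are made loudly here: `zlib.crc32` stands in for Teradata's own hashing algorithm (which this sketch does not reproduce), the bucket number is taken as the high-order 16 bits of the 32-bit hash, and the bucket-to-AMP map is a hypothetical round-robin assignment. The function names are also hypothetical.

```python
import zlib

NUM_BUCKET_BITS = 16  # assumption: bucket number = high-order 16 bits of a 32-bit hash


def row_hash(primary_index_value: str) -> int:
    # Stand-in for Teradata's hash function: CRC-32 yields an unsigned 32-bit value.
    return zlib.crc32(primary_index_value.encode("utf-8"))


def hash_bucket(rh: int) -> int:
    # Keep only the high-order bits of the 32-bit row hash.
    return rh >> (32 - NUM_BUCKET_BITS)


def build_hash_map(num_amps: int) -> list:
    # Hypothetical hash map: assign buckets to AMPs round-robin.
    return [bucket % num_amps for bucket in range(2 ** NUM_BUCKET_BITS)]


def amp_for(primary_index_value: str, hash_map: list) -> int:
    # The row's AMP depends only on its primary index value.
    return hash_map[hash_bucket(row_hash(primary_index_value))]


hash_map = build_hash_map(num_amps=4)
for customer_id in ["1001", "1002", "1003", "1001"]:
    print(customer_id, "-> AMP", amp_for(customer_id, hash_map))
```

Because the AMP is a pure function of the primary index value, the repeated "1001" always lands on the same AMP, while many distinct values spread across all four AMPs.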

Sunday, November 4, 2007