Current Trends in Distributed Databases
Current trends in distributed data management are centered on the Internet, in which petabytes of data can be managed in a scalable, dynamic, and reliable fashion. Two important areas in this direction are cloud computing and peer-to-peer databases.
1. Cloud Computing
Cloud computing is the paradigm of offering computer infrastructure, platforms, and software as services over the Internet. It offers significant economic advantages by limiting both up-front capital investment in computer infrastructure and the total cost of ownership. It has also introduced a new challenge: managing petabytes of data in a scalable fashion. Traditional database systems for managing enterprise data proved inadequate for this challenge, which has resulted in a major architectural revision. The Claremont report, written by a group of senior database researchers, envisions that future research in cloud computing will lead to new data management architectures and to new ways of combining structured and unstructured data, among other developments.
Partial failures and global synchronization were key performance bottlenecks of traditional database solutions. The key insight is that the hash-value (key-value) nature of the underlying data sets used by cloud-scale applications lends itself naturally to partitioning. For instance, search queries essentially involve a recursive process of mapping keywords to a set of related documents, which can benefit from such partitioning. Moreover, the partitions can be treated independently, which eliminates the need for a coordinated commit (such as two-phase commit) across partitions. Another problem with traditional DDBMSs was the lack of support for efficient dynamic partitioning of data, which limited scalability and resource utilization. Traditional systems also treated system metadata and application data alike, imposing on both the strict consistency and availability guarantees that the system data requires. Application data, however, has variable requirements on these characteristics, depending on its nature. For example, while a search engine can afford weaker consistency guarantees, an online text editor such as Google Docs, which allows concurrent users, has strict consistency requirements.
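The partitioning argument above can be illustrated with a minimal Python sketch, assuming a simple key-value store with a fixed number of partitions, used here as a keyword-to-documents index. The names `partition_for`, `Partition`, and `HashPartitionedStore` are hypothetical. Because every write is routed to exactly one partition, each partition commits on its own and no coordinated commit across partitions is needed. The sketch does not model dynamic repartitioning, which would require moving keys when the number of partitions changes.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted per
    process for strings and would route the same key differently across runs.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


@dataclass
class Partition:
    """One independently managed partition (hypothetical)."""
    pid: int
    rows: Dict[str, str] = field(default_factory=dict)

    def commit_put(self, key: str, value: str) -> None:
        # The write touches only this partition, so it commits locally;
        # no other partition has to vote on it.
        self.rows[key] = value


class HashPartitionedStore:
    """Routes each key to exactly one partition (hypothetical)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[Partition] = [Partition(i) for i in range(num_partitions)]

    def put(self, key: str, value: str) -> int:
        pid = partition_for(key, len(self.partitions))
        self.partitions[pid].commit_put(key, value)
        return pid

    def get(self, key: str) -> Optional[str]:
        pid = partition_for(key, len(self.partitions))
        return self.partitions[pid].rows.get(key)


# Usage: a tiny keyword -> documents index spread over four partitions.
store = HashPartitionedStore(num_partitions=4)
store.put("distributed", "doc1,doc7")
store.put("database", "doc1,doc3")
print(store.get("database"))  # doc1,doc3
```

A stable hash is the important design choice here: any client can compute a key's partition locally, so no central lookup is needed on the data path.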
The metadata of a distributed database system should be decoupled from its actual data in order to ensure scalability. This decoupling makes it possible to manage the actual data with innovative solutions that exploit its inherent suitability to partitioning, while traditional database solutions manage the critical system metadata. Since metadata is only a fraction of the total data set, it does not become a performance bottleneck. The single-object semantics of these implementations enables higher tolerance to the unavailability of certain sections of data: data is typically accessed one object at a time, in an atomic fashion. Hence, transaction support for such data need not be as stringent as for traditional databases.
A varied set of cloud services is available today, including application services (salesforce.com), storage services (Amazon Simple Storage Service, or Amazon S3), compute services (Google App Engine and Amazon Elastic Compute Cloud, or Amazon EC2), and data services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google’s Datastore). More and more data-centric applications are expected to leverage data services in the cloud. While most current cloud services are data-analysis intensive, business logic is expected to migrate to the cloud eventually. The key challenge in this migration will be to retain the scalability advantages while supporting the multiple-object semantics inherent to business logic. For a detailed treatment of cloud computing, refer to the relevant bibliographic references in this chapter’s Selected Bibliography.
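The decoupling of metadata from data, and the tolerance that single-object access gives to partial unavailability, can be sketched in Python under stated assumptions. `MetadataCatalog`, `DataNode`, and `Client` are hypothetical names. The catalog holds only the small system metadata (which data node owns which key range), and a lock stands in for the strictly consistent, traditional DBMS that would manage it in a real system. Data nodes serve single-object reads and writes only, so when one node goes down, objects on the other nodes remain available.

```python
import threading
from typing import Dict, List, Optional


class MetadataCatalog:
    """System metadata: which data node owns which key range (hypothetical).

    The lock is a stand-in for the strictly consistent, traditional DBMS
    that would manage this small metadata set in a real system.
    """

    def __init__(self, boundaries: List[str], node_names: List[str]) -> None:
        # boundaries[i] is the exclusive upper bound of range i; the last
        # range is open-ended, so there is one more node than boundaries.
        if len(node_names) != len(boundaries) + 1:
            raise ValueError("need exactly one more node than boundaries")
        self._lock = threading.Lock()
        self._boundaries = list(boundaries)
        self._nodes = list(node_names)

    def owner(self, key: str) -> str:
        with self._lock:
            for bound, node in zip(self._boundaries, self._nodes):
                if key < bound:
                    return node
            return self._nodes[-1]


class DataNode:
    """Application data; every operation touches exactly one object."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True
        self._objects: Dict[str, str] = {}
        self._lock = threading.Lock()

    def _check(self) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")

    def put(self, key: str, value: str) -> None:
        self._check()
        with self._lock:  # atomicity is per object only, not across objects
            self._objects[key] = value

    def get(self, key: str) -> Optional[str]:
        self._check()
        with self._lock:
            return self._objects.get(key)


class Client:
    """Consults the catalog, then talks to the owning data node directly."""

    def __init__(self, catalog: MetadataCatalog, nodes: Dict[str, DataNode]) -> None:
        self.catalog = catalog
        self.nodes = nodes

    def put(self, key: str, value: str) -> None:
        self.nodes[self.catalog.owner(key)].put(key, value)

    def get(self, key: str) -> Optional[str]:
        return self.nodes[self.catalog.owner(key)].get(key)


# Usage: keys below "m" live on node-a, the rest on node-b.
catalog = MetadataCatalog(boundaries=["m"], node_names=["node-a", "node-b"])
nodes = {"node-a": DataNode("node-a"), "node-b": DataNode("node-b")}
client = Client(catalog, nodes)
client.put("apple", "red")
client.put("zebra", "striped")

nodes["node-a"].up = False   # node-a becomes unavailable
print(client.get("zebra"))   # striped: objects on node-b are unaffected
```

Only the catalog needs strong guarantees, and it stays small; reading "apple" while node-a is down raises `ConnectionError`, but the failure is confined to that node's objects.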
2. Peer-to-Peer Database Systems
A peer-to-peer database system (PDBS) aims to integrate the advantages of P2P (peer-to-peer) computing, such as scalability, attack resilience, and self-organization, with the features of decentralized data management. Nodes are autonomous, and each is linked to only a small number of peers. A node is permitted to behave purely as a collection of files without offering a complete set of traditional DBMS functionality. While federated database systems (FDBSs) and multidatabase systems (MDBSs) mandate the existence of mappings between local and global federated schemas, PDBSs attempt to avoid a global schema by providing mappings between pairs of information sources. In a PDBS, each peer potentially models semantically related data in a manner different from other peers, so the task of constructing a central mediated schema can be very challenging. PDBSs aim to decentralize data sharing. Each peer has a schema associated with its domain-specific stored data. The PDBS constructs a semantic path of mappings between peer schemas. Using this path, a peer to which a query has been submitted can obtain information from any relevant peer connected through this path. In multidatabase systems, a separate global query processor is used, whereas in a P2P system a query is shipped from one peer to another until it is processed completely. A query submitted to a node may be forwarded to others based on the mapping graph of semantic paths. Edutella and Piazza are examples of PDBSs. Details of these systems can be found in the sources mentioned in this chapter’s Selected Bibliography.
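Query shipping along semantic paths can be sketched in Python, assuming that each pairwise mapping is a simple one-to-one renaming of attributes and that queries are single-attribute equality selections. `Peer`, `link`, and `query` are hypothetical names, and this is not how any particular system such as Edutella or Piazza works. There is no global schema: at each hop the query is translated through the mapping between two neighboring peers and then forwarded breadth-first along the mapping graph.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Row = Dict[str, str]


class Peer:
    """An autonomous peer that stores rows under its own local schema."""

    def __init__(self, name: str, rows: List[Row]) -> None:
        self.name = name
        self.rows = rows
        # neighbor name -> {my attribute: neighbor's attribute}
        self.mappings: Dict[str, Dict[str, str]] = {}

    def local_answer(self, attr: str, value: str) -> List[Row]:
        return [row for row in self.rows if row.get(attr) == value]


def link(a: Peer, b: Peer, a_to_b: Dict[str, str]) -> None:
    """Register a pairwise schema mapping in both directions.

    Assumes the mapping is a one-to-one attribute renaming, so it can be inverted.
    """
    a.mappings[b.name] = dict(a_to_b)
    b.mappings[a.name] = {v: k for k, v in a_to_b.items()}


def query(peers: Dict[str, Peer], start: str, attr: str, value: str) -> List[Tuple[str, Row]]:
    """Ship an equality selection from peer to peer along semantic paths (BFS)."""
    results: List[Tuple[str, Row]] = []
    visited = {start}
    frontier: Deque[Tuple[str, str]] = deque([(start, attr)])
    while frontier:
        name, local_attr = frontier.popleft()
        peer = peers[name]
        results.extend((name, row) for row in peer.local_answer(local_attr, value))
        for neighbor, mapping in peer.mappings.items():
            # Forward only if the attribute can be translated into the neighbor's schema.
            if neighbor not in visited and local_attr in mapping:
                visited.add(neighbor)
                frontier.append((neighbor, mapping[local_attr]))
    return results


# Usage: p1 and p3 share no mapping, but p3 is reachable through p2.
p1 = Peer("p1", [{"author": "Smith", "title": "doc-1"}])
p2 = Peer("p2", [{"writer": "Smith", "name": "doc-2"}])
p3 = Peer("p3", [{"creator": "Smith", "heading": "doc-3"},
                 {"creator": "Wong", "heading": "doc-4"}])
link(p1, p2, {"author": "writer", "title": "name"})
link(p2, p3, {"writer": "creator", "name": "heading"})
peers = {p.name: p for p in (p1, p2, p3)}

for peer_name, row in query(peers, "p1", "author", "Smith"):
    print(peer_name, row)
```

Submitting the query at p1 returns matching rows from p1, p2, and p3, even though p1 holds a mapping only to p2; each peer sees the query in its own vocabulary.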