A multidimensional index clusters entries so as to exploit "nearness" in multidimensional space.
Keeping track of entries and maintaining a balanced index structure presents a challenge! Consider entries:
<11, 80>, <12, 10>, <12, 20>, <13, 75>
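To make the challenge concrete, the minimal sketch below compares the sorted order that a one-dimensional index on the composite key would use with each entry's nearest neighbor in the plane. The Euclidean metric and the helper names `linear_order` and `nearest_neighbor` are assumptions made for illustration only.

```python
import math

# The four entries from the text, treated as points in 2-D space.
entries = [(11, 80), (12, 10), (12, 20), (13, 75)]

def linear_order(points):
    """Order in which a one-dimensional index on the composite key stores entries."""
    return sorted(points)

def nearest_neighbor(point, points):
    """Closest other entry by Euclidean distance (an assumed metric)."""
    others = [p for p in points if p != point]
    return min(others, key=lambda p: math.dist(point, p))

print(linear_order(entries))
for p in entries:
    print(p, "is nearest to", nearest_neighbor(p, entries))
```

In sorted order <11, 80> sits next to <12, 10>, although its nearest neighbor in space is <13, 75>; a multidimensional index would aim to keep <11, 80> with <13, 75> and <12, 10> with <12, 20>.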
Motivation for Multidimensional Indexes
Ø Spatial queries (GIS, CAD).
Find all hotels within a radius of 5 miles from the conference venue.
Find the city with population 500,000 or more that is nearest to Kalamazoo, MI.
Find all cities that lie on the Nile in Egypt.
Find all parts that touch the fuselage (in a plane design).
Ø Similarity queries (content-based retrieval).
Given a face, find the five most similar faces.
Ø Multidimensional range queries.
50 < age < 55 AND 80K < sal < 90K
Drawbacks
Ø An index based on spatial location needed.
One-dimensional indexes don't support multidimensional searching efficiently.
Hash indexes only support point queries; want to support range queries as well.
Must support inserts and deletes gracefully.
Ø Ideally, want to support non-point data as well (e.g., lines, shapes).
Ø The R-tree meets these requirements, and variants are widely used today.
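As a rough illustration of how a rectangle-based index in the spirit of the R-tree can answer the range query 50 < age < 55 AND 80K < sal < 90K, the sketch below prunes whole groups of points whose bounding rectangle misses the query box. The `Rect` class, the sample data, and the flat two-leaf layout are hypothetical; a real R-tree is a balanced multi-level structure, and its insertion, deletion, and node-split logic is omitted here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle over (age, sal)."""
    lo_age: float
    hi_age: float
    lo_sal: float
    hi_sal: float

    def overlaps(self, other: "Rect") -> bool:
        # Two rectangles overlap only if they overlap in every dimension.
        return (self.lo_age <= other.hi_age and other.lo_age <= self.hi_age
                and self.lo_sal <= other.hi_sal and other.lo_sal <= self.hi_sal)

def bounding_rect(points):
    """Smallest rectangle that contains all the given (age, sal) points."""
    ages = [a for a, _ in points]
    sals = [s for _, s in points]
    return Rect(min(ages), max(ages), min(sals), max(sals))

def range_query(leaves, query):
    """Return points strictly inside `query`, skipping leaves whose rectangle misses it."""
    hits = []
    for points in leaves:
        if not bounding_rect(points).overlaps(query):
            continue  # the whole leaf is pruned without inspecting its points
        hits.extend((a, s) for a, s in points
                    if query.lo_age < a < query.hi_age
                    and query.lo_sal < s < query.hi_sal)
    return hits

# Hypothetical leaf pages, each holding spatially close (age, sal) points.
leaves = [
    [(51, 82_000), (53, 88_000), (54, 95_000)],
    [(25, 40_000), (30, 45_000)],
]
query = Rect(50, 55, 80_000, 90_000)
print(range_query(leaves, query))  # [(51, 82000), (53, 88000)]
```

The second leaf is never scanned because its bounding rectangle does not overlap the query in the age dimension; the same overlap test also works when the stored objects are rectangles rather than points.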
Ø To provide such database functions as indexing and consistency, it is desirable to store multimedia data in a database
rather than storing them outside the database, in a file system.
Ø The database must handle large object representation.
Ø Similarity-based retrieval must be provided by special index structures.
Ø Must provide guaranteed steady retrieval rates for continuous-media data.
Multimedia Data Formats
Ø Store and transmit multimedia data in compressed form.
JPEG and GIF are the most widely used formats for image data.
The MPEG standards for video data use commonalities among a sequence of frames to achieve a greater degree of compression.
Ø MPEG-1 quality comparable to VHS video tape.
Stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB
Ø MPEG-2 designed for digital broadcast systems and digital video disks; negligible loss of video quality.
Compresses 1 minute of audio-video to approximately 17 MB.
Ø Several alternatives of audio encoding
MPEG-1 Layer 3 (MP3), RealAudio, WindowsMedia format, etc.
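The per-minute sizes quoted above can be turned into average bit rates with a little arithmetic. The sketch below assumes 1 MB = 10^6 bytes, which the text does not specify; the function name `bitrate_mbit_per_s` is made up for this example.

```python
def bitrate_mbit_per_s(megabytes_per_minute: float) -> float:
    """Average bit rate in Mbit/s, assuming 1 MB = 10**6 bytes (an assumption)."""
    bits_per_minute = megabytes_per_minute * 1_000_000 * 8
    return bits_per_minute / 60 / 1_000_000

# Sizes per minute of audio-video as quoted in the text.
for name, mb_per_min in [("MPEG-1", 12.5), ("MPEG-2", 17.0)]:
    print(f"{name}: {bitrate_mbit_per_s(mb_per_min):.2f} Mbit/s")
# MPEG-1: 1.67 Mbit/s
# MPEG-2: 2.27 Mbit/s
```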
Ø The most important types of continuous-media data are video and audio.
Ø Characterized by high data volumes and real-time information-delivery requirements.
Data must be delivered sufficiently fast that there are no gaps in the audio or video.
Data must be delivered at a rate that does not cause overflow of system buffers.
Synchronization among distinct data streams must be maintained: video of a person speaking must show lips moving synchronously with the audio.
Ø Video-on-demand systems deliver video from central video servers, across a network, to terminals
must guarantee end-to-end delivery rates
Ø Current video-on-demand servers are based on file systems; existing database systems do not meet real-time response requirements.
Ø Multimedia data are stored on several disks (RAID configuration), or on tertiary storage for less frequently accessed data.
Ø Head-end terminals - used to view multimedia data
PCs or TVs attached to a small, inexpensive computer called a set-top box.
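The two delivery constraints listed above (no gaps in playback, no overflow of system buffers) can be illustrated with a toy model of a client-side buffer. This is only a sketch under simplifying assumptions: time is divided into equal ticks, the player consumes a fixed amount per tick, and the function `simulate_buffer` and its parameters are hypothetical rather than part of any real video-on-demand system.

```python
def simulate_buffer(deliveries, consume_per_tick, capacity, initial=0):
    """Return (gap_ticks, overflow_ticks) for a simple playback buffer.

    deliveries       -- units of data arriving in each tick
    consume_per_tick -- units the player needs in each tick
    capacity         -- maximum units the buffer can hold
    """
    level = initial
    gaps, overflows = [], []
    for tick, arrived in enumerate(deliveries):
        level += arrived
        if level > capacity:
            overflows.append(tick)  # data arrived too fast; excess is dropped
            level = capacity
        if level < consume_per_tick:
            gaps.append(tick)       # data arrived too slowly; playback stalls
        else:
            level -= consume_per_tick
    return gaps, overflows

steady = [3, 3, 3, 3, 3]
bursty = [8, 0, 0, 3, 3]
print(simulate_buffer(steady, consume_per_tick=3, capacity=6))  # ([], [])
print(simulate_buffer(bursty, consume_per_tick=3, capacity=6))  # ([2], [0])
```

Both schedules deliver at least as much data in total as the player consumes, yet only the steady one avoids a gap and an overflow, which is why the delivery rate itself, and not just the total volume, has to be guaranteed.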
Examples of similarity-based retrieval
Ø Pictorial data: Two pictures or images that are slightly different as represented in the database may be considered the same by a user.
e.g., identify similar designs for registering a new trademark.
Ø Audio data: Speech-based user interfaces allow the user to give a command or identify a data item by speaking.
e.g., test user input against stored commands.
Ø Handwritten data: Identify a handwritten data item or command stored in the database.
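One common way to frame similarity-based retrieval is to represent each stored item as a feature vector and return the items closest to the query vector. The sketch below does this with a plain linear scan; the feature vectors, item names, and the choice of Euclidean distance are all assumptions for illustration, and how such vectors are extracted from images, audio, or handwriting is outside its scope.

```python
import math

def k_most_similar(query, database, k=5):
    """Return the names of the k items whose feature vectors are closest to `query`."""
    ranked = sorted(database.items(), key=lambda item: math.dist(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical 3-dimensional feature vectors for stored images.
database = {
    "img_a": (0.10, 0.80, 0.30),
    "img_b": (0.12, 0.79, 0.33),
    "img_c": (0.90, 0.10, 0.50),
    "img_d": (0.11, 0.82, 0.28),
}
print(k_most_similar((0.10, 0.80, 0.30), database, k=2))  # ['img_a', 'img_d']
```

A linear scan compares the query against every stored item, so it is only practical for small collections; this is the cost that special index structures for similarity-based retrieval are meant to avoid.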