II: RECOGNITION PROCESS AND QUERY OPTIMIZATION – part 3
Operations Basic Algebra Relational
To define the shape and boundaries database, a data model should include a set of operations to control the data. A collection of operations of the relational model is the standard relational algebra. These operations allow a user to determine the basis of the search request. Results of a search is a new relationship, where it may be formed from one or more relations. Therefore, algebraic operations generate new relationships that could be more useful to use algebraic operations are the same. Sequence of relational algebra operations forms a relation algebra expressions whose results are also in the form of a relation.
Relation algebra operations are generally divided into two groups. The first group, including a set of operations from a collection of mathematical theory that can be used for each relationship set into a set of tuples. A set of operations include UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT.
The second group consists of operations that are specially made for relational databases. Included in the second group is the SELECT, PROJECT and JOIN.
=> Operation SELECT
SELECT operation is used to select a subset of the tuple-tuple of a relation that satisfy the choice. In general, the SELECT operation is shown by:
s <condition> (R)
where (sigma) is used to establish SELECT and <condition> is a boolean expression specified in the attribute-attribute of the relation R. Keep in mind that R generally is an expression of relational algebra that the result is a relation. The simplest expression of the SELECT operation is a name of a relation database. Relations that results from the SELECT operation has the same attributes, as R. Boolean expression specified in <condition> formed from a number of clauses with the form as follows:
<attribute name> <comparison operator> <constant value>, or
<attribute name> <comparison operator> <attribute name>
where <attribute name> is the name of an attribute of R, <comparison operator> is generally one of the operators like “=”,”< “,”³”,”£”, and “¹” and <constant value> is a constant value of the attribute domain. Clauses may have different relationships with the boolean operators AND, OR, and NOT to form a common condition of choice.
For example, to select a tuples of all employees who also worked on the department 4 and produce a salary of more than $ 25,000 per year, or work in department 5 and generate $ 30,000 per year, then SELECT operations that can be specified are:
s (DNO = 4 AND Salary > 25 000) OR (DNO = 5 AND Salary > 30000) (EMPLOYEE)
Generally, the results of a SELECT operation can be determined as follows. <selection condition> used freely for each tuple T in R. This is done by replacing each occurrence of an attribute A, in the condition with the choice of its value in tuple t [Ai]. If the condition evaluated is true, then the tuple t will be selected. All of the selected tuples will appear on the results of SELECT operations.
SELECT operator is unary. Therefore, the SELECT operator is used for a single relation. Moreover, the selection operation is used for each tuple individually. In fact, the conditions of the option can not use more than one tuple. The number of tuples of the relation generated is always less than or equal to the number of tuples in R. Therefore, |sc (R)| £ |R| for each condition C. Part of the tuples chosen by a selection condition referred to as the selectivity of the condition.
=> Operation PROJECT
PROJECT operation is used to select certain columns of the table and discard the other columns that are not needed. In general, the operation indicated by PROJECT:
p <attribute list> (R)
where (pi) is the symbol used to illustrate operation PROJECT and <attribute list> is a list of attributes of the relation R. Keep in mind that R generally is an expression of relational algebra that the result is a relation, which in the simplest case R is the name of a relation database. Results of operations PROJECT-attribute is specified in <attribute list> and in the same order as seen in the list.
=> Operation JOIN
JOIN operation denoted by “?” and is used to combine relations tuples of two relations into single tuples. Generally PROJECT operation on two relations R (A1, A2, … An) and S (B1, B2, … Bm) are indicated by:
R ? <join condition> (S)
The results of JOIN is a relation Q with n + m attributes Q (A1, A2, … An, B1, B2, … Bm). Q has one tuple for each combination of tuples, one from R and one from S. In a JOIN, only the combinations of tuples that satisfy the join condition will look at the results. JOIN condition is determined by the attribute-attribute of the relations R and S, and evaluation for each combination of tuples. Forms of JOIN conditions in general are:
<condition> AND <condition> AND … AND <condition>
where each condition is a form of Ai q Bj . Ai is an attribute of R and Bj is an attribute of S. Ai and Bj have the same domain, and q (theta) is one of the comparison operators {<,³, ¹, £, >, =}. JOIN operation with a join condition commonly called a theta join.
=> Set of operations
Set of operations used to combine elements of two groups in a variety of ways, including UNION, INTERSECTION and DIFFERENCE SET. These operations are binary operations, each of which is used for two sets of relations. Here are definitions of UNION, INTERSECTION, and SET DIFFERENCE on two relations R and S.
- UNION: The results of this operation is indicated by R È S,which is relation tuples of all existing in R or S or both. Tuple duplicates eliminated.
- INTERSECTION: The results of this operation is indicated by R Ç S, which is a relation that represents all tuples in R and S.
- SET DIFFERENCE: The results of this operation is indicated by R – S, which is a relation that all tuples are in R but not in S.
PRODUCT Cartesian operation which is also commonly called the CROSS PRODUCT or CROSS JOIN is denoted by “X” which is also a collection of binary operation. This operation is used to combine tuples from two relations in a joint model. Generally, the results of R (A1, A2, …, An) X S (B1, B2, …, Bm) is a relation Q with n + m of attribute-attribute R (A1, A2, …, An, B1, B2, … , Bm) in the sequence. The results of the relation Q has one tuple for each tuples of conditions. One of R and one from S.
Algorithms for Running Query Operations
Various kinds of query operations such as Select, Join, Project, and set operations can be implemented using one or more access different routines. Access routines are algorithms-algorithms that are used to access and categorize data in a database.
The following will explain the algorithm-the algorithm used to perform query operations on a database system. Algorithms that will be described below, among others, the algorithm is an algorithm to implement the SELECT operation, the algorithms to incorporate JOIN operation, the algorithm to implement the PROJECT operations and algorithms to implement the set of operations.
=> Algorithm to Implement SELECT Operation
There are various methods to implement the SELECT operation. And implementation of SELECT operations depends on the existence and type indexer and conjunctive and disjunctive combination of search criteria.
Implementation of the SELECT operation is subdivided into two, namely the implementation for simple selection (simple choice) and implementation for complex selection (complex choice). The following will discuss the algorithms for implementation both options.
==> Method for the Simple Selection
A number of algorithms search allows to select records from a file. Search algorithms are also known as file scan because the algorithm is doing the reading records from a file to search and retrieve records that satisfy a selection condition. If the search algorithm involves the use of the index, then the search index is called the index scan.
The following will be given some examples of simple selection of data is retrieved from the files in the COMPANY database.
Example 1: Salary> 28000
Example 2: SSN = ’123456789′
Example 3: DNO = 5
Example 1 and example 2 is taken from the EMPLOYEE file and example 3 is taken from the DEPARTMENT file.
Simple selection of example 1, example 2 and example 3 can be implemented using the methods:
- Linear Search (brute force), to retrieve each record in the file and test if the values ??attribute of the record meets the selection conditions.
- Binary Search, if the selection condition involves an equality comparison on a key attribute on the files sequentially. Then the binary search is more efficient than linear search can be used. An example is the SSN = ’123456789′, if the SSN is a request attribute for EMPLOYEE file.
- Hash Index or Primary Key, if the selection condition involves an equality comparison on a primary key attribute with index (hash key). For example, SSN = ’123456789 ‘using the primary index to retrieve records. Note that this condition is most returns a single record.
- Use the primary index to retrieve multiple records, if the comparison condition is >,> =, <, or <= in a field with a primary key index. For example, DNUMBER > 5 use the index to find records that satisfy the matching condition equation (DNUMBER = 5), then restore all subsequent records in the order file. For conditions DNUMBER < 5, restore all previous records.
- Using a clustering index to retrieve multiple records, if the selection condition involves an equality comparison on a non-key attribute with a clustering index. For example, DNO = 5 for EMPLOYEE file to use the index to retrieve all records that meet the conditions.
- Secondary (B+ -tree) index on a comparison of the equation, to get back a record single if the indexer’s field is a key or to retrieve multiple records if the indexer’s field instead of a key and can also be used for comparisons includes >,> =, <, or <=.
==> Methods for Complex Selection
If a condition of the SELECT operation is a conjunction condition, which is composed of several simple conditions associated with AND logic.
The following will be given some examples of the complex selection of the data retrieved from COMPANY database.
Example 4: DNO = 5 AND Salary> 30000 AND SEX = ‘F’
Example 5: ESSN = ’123456789 ‘AND PNO = 10
Example 4 is taken from the EMPLOYEE file. While samples 5 taken from the WORKS_ON file. Complex selection of example 4 and example 5 can be implemented using the methods:
- Conjunctive selection. Using one of the methods of access from simple selection to retrieve records from the first matching criterion, then using a set of results to check the rest of the criteria.
- Composite Index. If two or more attributes are needed in conditions of equality in the conditions conjunctive and a combined index is available on a combination of fields. For example, if an index is formed on the composite key (ESSN, PNO) of the WORKS_ON file for ESSN = ’123456789′ AND PNO = 10, then the index can be used directly.
- Intersection Record Pointers. If the indices of the secondary (other access roads) is available on more than one field required in simple conditions in conjunctive conditions and if the indices including the record pointer 3, then each index can be used to retrieve a collection of pointer record satisfy the conditions of each. Intersection of the set-pointer record collection gives pointers conjunctive records that satisfy the condition, which is then used to retrieve the records directly.
If a single condition determines the choice as Example 1, Example 2 and Example 3, it has to do is check whether an access road is available on an attribute that is included in condition. If an access path available, then the method is suitable for the access road is used. In contrast, brute-force approach of linear search method can be used. Optimization of query to a SELECT operation is required for most conditions conjunctive choice if more than one attribute is involved in conditions that have an access road. Optimizer must choose an access road to get the records at least with the most efficient way to estimate the different prices and choose the method with an estimated price of the least.
At the time of the optimizer to choose between the conditions in a simple multiple choice conjunctive condition, then the optimizer specifically consider the selectivity of each condition. Selectivity (s) is defined as the ratio of the number of records (tuples) that meet the conditions for the number of records on file (relation). That is a number between 0 (zero) and 1 (one), which means the selectivity 0 (zero) means there are no records that satisfy the conditions and 1 (one) means that all records meet the conditions. Although the precise selectivity-selectivity of all conditions may not be available, estimates of selectivity-selectivity is often stored in the DBMS catalog and used by the optimizer. For example, for an equality condition on the key attribute of the relation r -tupleï r (R) is the number of tuples ï, where ï r (R) ï(R), s = 1 / in relation r (R). For an equality condition on an ï / i) / ï r (R) ïattribute with i distinct values, s is estimated by (ïr(R)ï/ i) /ïr(R)ï or 1/i with the assumption that the records were divided among distinct value. According to these assumptions, the records ïr(R)ï/ i will satisfy an equality condition on this attribute. Generally, the number of records that satisfy a condition of choice with selectivity s would be expected to be ïr(R)ï * s. Estimates smaller, this is the first condition requires the use of higher to get back the records.
Comparison to a condition conjunctive choice is a disjunctive condition (where the simple conditions connected by logical OR more than AND) are more difficult to be processed and optimized. An example is:
sDNO = 5 OR Salary > 30000 OR SEX = ‘F’
With such a condition, a simple optimization can be done, because the records that satisfy the conditions are disjunctive union (merging) from records that satisfy the conditions of the individual. In fact, if there is any one of these conditions do not have an access road, then forced to use brute force approach of linear search. Only if a road is in any condition that can be optimized with the choice of getting back the records that satisfy each condition or record his id, and then define the union operation to eliminate the same conditions.
A DBMS provides several methods that have been discussed previously and in particular several additional methods. Query optimizer should choose the one appropriate method to execute each operation in a SELECT query. This optimization using formulas to estimate prices. In the end, the optimizer will choose the access method to estimate the lowest price.
=> Algorithm for Implementing the JOIN Operation
JOIN operation is one of the many other operations that take in performing query processing. Algorithms are considered to form the JOIN operation is R ? a=b S, where a and b are attribute-attribute matching domains of R and S, respectively. A number of methods available to implement the JOIN operation is as follows:
- Nested Loop (brute force). For each record t in R (outer loop), get back the record s of S (inner loop) and do the tests for two records that satisfy the JOIN condition t [A] = s [B].
- Access Structures (Indexes) or single-loop join. If an index (hash key) are available for one or two attributes join (W of S), get back to each record t in R, once, at a time (single loop) and then using the access to retrieve all the records s of S that satisfy s [B] = t [A].
- Sort-Merge Join. If the records of the R and S are physically sorted by the value of the attribute-attribute join of A and B respectively, then the join can be implemented in many ways possible and efficient. Both files are read simultaneously in order the join attribute, matching the records that have the same values ??for A and B. If the files are not sequential, it will be sorted in advance by using external sorting.
- Hash Join. Records from the files R and S, both combined in the same hash file, use the functions in the same hash on the join attribute-attribute A of R and B of S as a hash key. At first, a single pass through the files with fewer records (exemplified by R) combines the record-the record into a hash file for those records are divided into R-bucket hash bucket. In the second stage, a single pass through the other files (S) then merge each record of his to check the appropriate bucket, and the record is combined with all the records that match of R in the bucket. This illustrates that the simplification of the smaller size of the two files is inserted into the bucket-bucket in memory right after the first stage.
=> Algorithm for Implementing PROJECT Operation and Set of Operation
A PROJECT operation p <attribute list> (R) will be implemented if <attribute list> including a key of the relation R, since in this case the result of the operation of the PROJECT will have tuples number the same as R, but only with the value for attributes in <attribute list> in each tuple. If <attribute list> not include a key of R, then the same tuples must be eliminated. Elimination is usually done by sorting the results of operations and then eliminate the same tuples, which appear in sequence after sorting. Hashing can also be used to eliminate records the same way each record is divided and inserted into a bucket in the hash file in memory. And then be checked again to determine whether the records are already in the bucket. If the record is entered is the same record, the record is not inserted into the bucket.
A set of operations such as UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT sometimes difficult to do. Usually, CARTESIAN PRODUCT operation RXS is more difficult to do as a result of this operation is a record for each combination of rows R and S. If R has n records and attribute j and S has m records and k attributes, then record the results of the second relation is n * m + k j records and attributes. Therefore, surgery should be avoided PRODUCT Cartesian and replaced with other operations are the same for query optimization.
Three collection of other operations such as UNION, INTERSECTION and SET DIFFERENCE is used only for relationships that match union, which has the same number attribute and domains attribute the same. Way that is usually used to carry out these operations is to use variations of the sort-merge techniques, namely two relations are sorted on the attribute and the same after reading sorted performed once in each relation that meets to produce results. For example the UNION operation can be done with (R È S), with scanning (reading) and merging (merging) the two files have the same sequence and if the same tuple is available in two relations, then selected only one tuple to be stored in result of the merger. For the INTERSECTION operation, R Ç S, then it will be stored on the results that combined, only for records that appear in both relations.
Hashing also be used to implement UNION, INTERSECTION and SET DIFFERENCE. One table is divided and the other is used to check the appropriate partition. S, then the first partition of R records to hash files.ÈFor example, to carry out R Then for each record S, examination to check if a similar record of R found in the bucket.


