Search to diminish the communication cost by producing

Search strategy or Inventory Algorithm is the most dominant
component of a query optimizer. In this section of paper, we tried to discover
the research on Query Optimization Techniques based on a number of Optimization
Algorithms used in Distributed Database Queries. SDD-1 was one of the early
Distributed Database System. SSD-1 was designed for slow wide area network and
made use of semi joins to diminish the communication cost by producing static
unchangeable query plans without considering the horizontal or vertical data
fragments of distributed database. One of the other Distributed Database System
was the R* system. it was used by DDB query optimizer.  It also generated static unchangeable query in
faster networks but they neither employed semi joins nor handled horizontal or
vertical fragmentation. One of the other Distributed Database System was
Distributed–INGRES. It was able to generate dynamic QEP at run-time on faster
communication networks. Semi Join for the reduction of query size was not used
but the system was able to handle horizontal fragmentation without replication.
Momentous amount of work has been carried out on producing optimal solutions
for Join Order of the query. As we know that the Randomized strategies
definitely reduce the cost of the query optimization but they have a constant
space overhead issues and Randomized strategies are slower than heuristics
strategies. On the other hand, there are deterministic Strategies. They
generate runtime Dynamic solutions but they have exponential time and space
complexity linked with them when the numbers of relations increase in the
distributed query. The next Algorithm is the Two-Phase Optimization Algorithm,
it is a combination of Iterative Improvement and Simulated Annealing. Two-Phase
algorithm performs a random walk along different solutions of search space, it produces
an optimal solution but increases the space overhead of query optimization. A
dynamic programming based solution procedure to diminish the sum of
communication cost and local processing cost by optimally determining Join
Order and Join Approaches i.e nested-loop or sort-merge and Join Sites was also
proposed by scientist. However, they expected data to be stored non-redundantly.
Chen and Yu proposed a heuristic algorithm. Heuristic Algorithm determined the
Join Order and Join Sites with hypothesis that file copies are pre-selected
when multiple copies exist. A lot of research focused on the reduction phase of
the distributed query processing and the objective is to invent a minimum cost
semi join sequence that fully reduces the files referenced by the query. This
is attained by applying join predicates in a query plan in order to reduce the
size of the intermediate query results consequently reducing the cost of the
operation. Scientist drove an algorithm based on Dynamic Programming that recognized
beneficial semi joins and resolute Join Order and Join Sites 18. Two new
concepts in the reduction phase of Distributed database: Gainful semi joins and
pure join attributes 19 was also projected. Mi Xifieng and Fan designed a new
algorithm. It based on the commonly used optimization algorithms. It
significantly reduces the amount of intermediate data and network communication
cost. It also improves the optimization efficiency 20. Since we observe that
because of large space complexity and complicated objective functions, Dynamic
Programming is not a feasible solution optimization of Distributed Queries.
Hence new set of techniques are now being discovered for solutions to Distributed
Queries. These techniques are Genetic Algorithm, Ant Colony Optimization
Algorithm and Hybrids of various Evolutionary Algorithms. Genetic Algorithms
are a family of Computational models encouraged by nature or Biological
Evolutions. John Holland discovered the concept of GA where randomly produced
solutions to a problem are assessed as chromosomes and these chromosomes are permitted
to produce new set of individuals or children with better characteristics through
crossover and mutations operators based on fitness function 23. The GA was
also able to diminish the cost of the distributed query tree. A novel
meta-heuristic algorithm which is suitable for problems associated to
Combinatorial Optimization is Ant Colony Algorithm (ACO). Like query
optimization in distributed database because of its characteristics like
intelligent search techniques, global optimization, robust, distributed
computing and ability to combine with other heuristics 26. Three Italian
scholars, Dorigo M, Colorni A and Maniezzo V first proposed ACO in 1992.Ant
Colony Algorithm is a Bionic Optimization Algorithm encouraged by Ants that
uses probabilistic technique for solving computational problems. Ant Colony
Algorithm was built on the mechanism of positive feedback, so it was very
robust, it provides intelligent search and it can be used for Global Optimization
Solutions 27. No doubt Ant Colony Optimization Algorithm have special
characteristics like distributed computing, robust nature and positive feedback
mechanism, but it also has some deficiencies:

a) The preliminary construction needed by ACO has no systematic
way of start-up.

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!


order now

b) The conjunction speed of ACO is lesser at the beginning
for there is only a little pheromone difference on the path at that time but
the conjunction speed rises towards optimum answer because of positive feedback
mechanism.

In 14, the scientist proposed a new GA based query
optimizer called NGA. NGA progresses the given QEP by considering Join Order,
Join Site and Semi join reducers. By using new values for mutation, crossovers
and outperformed Exhaustive Search, NGA was able to diminish the Local
Processing Cost as well as Network Communication Cost. The potential of Genetic
Algorithm to optimize distributed queries on the problem of fully reducing all
the tables in a tree structured data model query was also worked upon 21. The
Scientist was also proposed an Algorithm which is a combination algorithm of
Genetic Algorithm and Learning Automata 22 for making optimal Query Execution
Plans on the basis of the Join Order Execution and Join Site Selection in Distributed
Database. In 28, The author proposed an Algorithm which is a combination of
GA and Heuristics. It is used for solving Join Ordering Problems as Travelling
Salesman Problem in Large Scale Database. The computational experiments proved
it also to be a viable solution for Distributed Systems. To find out the
optimal Query Execution Plan and Join Order by reducing Query Execution Time
for Multi-Join Query Optimization in Database, Hybrid of Genetic Algorithm and
Best-Worst Ant Colony Algorithm was employed 24. The algorithm executed
positive feedback mechanism of ACO with global search ability of GA. Another
Hybrid of GA and ACO 30 was employed on Join Ordering problem in Databases
(only nested loop joins considered) by incapacitating the shortcomings of both
the algorithms. First the algorithm adopts Genetic Algorithm to give pheromone
to distribute and then it makes use of ant colony algorithm to give the
precision of the solution. The ability of Hybrid GA-ACO to search extensive
amplitude to answers for join queries in relational Database can be prolonged
to optimize the join queries in distributed database where the most important
challenge is to produce the best QEP for optimal results. Another Scientist Rho
et.al also proposed a Genetic Algorithm based solution method that quickly determine
optimal QEP 13. Identification of Copy which means redundancy of data, Identification
of Beneficial Semi joins, Selection of Join Site, Execution of Join Order, and
Local Processing Cost and Communication Cost included in this Model.

 

 

                     

 

                    
                     Hybrid
GA—ACO  

 

Tansel et.al. also proposed an Algorithm which is a
combination of Dynamic Programming and Ant Colony Optimization. It is known as DP-ACO
(Dynamic Programming- Ant Colony Optimization) Algorithm. It is used for the
optimization of multi way chain equijoin queries in Distributed Database
Environment.

As the size of the relations and number of joins
increases in the query, Hence Dynamic Programming suffers from long execution
times and very large memory requirements. It has been proved that DP-ACO is a feasible
solution by producing good execution plans with 15 ways join queries within
limited time and very limited memory space. Another advantage of DP-ACO is that
it can be easily adapted to existing query optimizers that commonly use DP based
algorithms.

Due to the use of the properties of Ant Colony
Algorithm and Particle Swarm Optimization, Hybrid algorithm is proposed to
solve the traveling salesman problems 28. The algorithm first adopts statistics
technique through this technique it gets several initial better solutions and
in accordance with them, gives information pheromone to distribute. Then it
makes use of the ant colony algorithm to get several solutions through information
pheromone accumulation and renewal. Lastly, through the use of across and
mutation operation of particle swarm optimization, the effective solutions are
obtained. Hence it has been proved that The Hybrid Algorithm of ACO-PSO is much
effective. With the increasing number of relations in a query, much use of
memory and processing is needed. Almost all Commercial Applications involve
data from various sites. DDBMS is now being used as a standard DBMS in all
commercial applications. The path designs the behaviour of ants is pragmatic to
direct the ants towards the unknown areas of search space and visit all the
nodes without knowing the graphic topology for production of optimal solutions
of distributed database queries. First these ants estimate the running times of
the execution plans of the given query and then provide rapid, high performance
and optimal results in a cost-effective manner. In Distributed Database
Management System, The Search strategy adopted by the Query Optimizer can help
to diminish the query execution time and the cost incurred on the query and
hence increases performance of a query by selecting the best Query Execution
Plan. It has been proved that the implementations of these probabilistic
algorithms generate viable solutions when the size of the query and the number
of joins in the query grows.