Search strategy or Inventory Algorithm is the most dominant

component of a query optimizer. In this section of paper, we tried to discover

the research on Query Optimization Techniques based on a number of Optimization

Algorithms used in Distributed Database Queries. SDD-1 was one of the early

Distributed Database System. SSD-1 was designed for slow wide area network and

made use of semi joins to diminish the communication cost by producing static

unchangeable query plans without considering the horizontal or vertical data

fragments of distributed database. One of the other Distributed Database System

was the R* system. it was used by DDB query optimizer. It also generated static unchangeable query in

faster networks but they neither employed semi joins nor handled horizontal or

vertical fragmentation. One of the other Distributed Database System was

Distributed–INGRES. It was able to generate dynamic QEP at run-time on faster

communication networks. Semi Join for the reduction of query size was not used

but the system was able to handle horizontal fragmentation without replication.

Momentous amount of work has been carried out on producing optimal solutions

for Join Order of the query. As we know that the Randomized strategies

definitely reduce the cost of the query optimization but they have a constant

space overhead issues and Randomized strategies are slower than heuristics

strategies. On the other hand, there are deterministic Strategies. They

generate runtime Dynamic solutions but they have exponential time and space

complexity linked with them when the numbers of relations increase in the

distributed query. The next Algorithm is the Two-Phase Optimization Algorithm,

it is a combination of Iterative Improvement and Simulated Annealing. Two-Phase

algorithm performs a random walk along different solutions of search space, it produces

an optimal solution but increases the space overhead of query optimization. A

dynamic programming based solution procedure to diminish the sum of

communication cost and local processing cost by optimally determining Join

Order and Join Approaches i.e nested-loop or sort-merge and Join Sites was also

proposed by scientist. However, they expected data to be stored non-redundantly.

Chen and Yu proposed a heuristic algorithm. Heuristic Algorithm determined the

Join Order and Join Sites with hypothesis that file copies are pre-selected

when multiple copies exist. A lot of research focused on the reduction phase of

the distributed query processing and the objective is to invent a minimum cost

semi join sequence that fully reduces the files referenced by the query. This

is attained by applying join predicates in a query plan in order to reduce the

size of the intermediate query results consequently reducing the cost of the

operation. Scientist drove an algorithm based on Dynamic Programming that recognized

beneficial semi joins and resolute Join Order and Join Sites 18. Two new

concepts in the reduction phase of Distributed database: Gainful semi joins and

pure join attributes 19 was also projected. Mi Xifieng and Fan designed a new

algorithm. It based on the commonly used optimization algorithms. It

significantly reduces the amount of intermediate data and network communication

cost. It also improves the optimization efficiency 20. Since we observe that

because of large space complexity and complicated objective functions, Dynamic

Programming is not a feasible solution optimization of Distributed Queries.

Hence new set of techniques are now being discovered for solutions to Distributed

Queries. These techniques are Genetic Algorithm, Ant Colony Optimization

Algorithm and Hybrids of various Evolutionary Algorithms. Genetic Algorithms

are a family of Computational models encouraged by nature or Biological

Evolutions. John Holland discovered the concept of GA where randomly produced

solutions to a problem are assessed as chromosomes and these chromosomes are permitted

to produce new set of individuals or children with better characteristics through

crossover and mutations operators based on fitness function 23. The GA was

also able to diminish the cost of the distributed query tree. A novel

meta-heuristic algorithm which is suitable for problems associated to

Combinatorial Optimization is Ant Colony Algorithm (ACO). Like query

optimization in distributed database because of its characteristics like

intelligent search techniques, global optimization, robust, distributed

computing and ability to combine with other heuristics 26. Three Italian

scholars, Dorigo M, Colorni A and Maniezzo V first proposed ACO in 1992.Ant

Colony Algorithm is a Bionic Optimization Algorithm encouraged by Ants that

uses probabilistic technique for solving computational problems. Ant Colony

Algorithm was built on the mechanism of positive feedback, so it was very

robust, it provides intelligent search and it can be used for Global Optimization

Solutions 27. No doubt Ant Colony Optimization Algorithm have special

characteristics like distributed computing, robust nature and positive feedback

mechanism, but it also has some deficiencies:

a) The preliminary construction needed by ACO has no systematic

way of start-up.

b) The conjunction speed of ACO is lesser at the beginning

for there is only a little pheromone difference on the path at that time but

the conjunction speed rises towards optimum answer because of positive feedback

mechanism.

In 14, the scientist proposed a new GA based query

optimizer called NGA. NGA progresses the given QEP by considering Join Order,

Join Site and Semi join reducers. By using new values for mutation, crossovers

and outperformed Exhaustive Search, NGA was able to diminish the Local

Processing Cost as well as Network Communication Cost. The potential of Genetic

Algorithm to optimize distributed queries on the problem of fully reducing all

the tables in a tree structured data model query was also worked upon 21. The

Scientist was also proposed an Algorithm which is a combination algorithm of

Genetic Algorithm and Learning Automata 22 for making optimal Query Execution

Plans on the basis of the Join Order Execution and Join Site Selection in Distributed

Database. In 28, The author proposed an Algorithm which is a combination of

GA and Heuristics. It is used for solving Join Ordering Problems as Travelling

Salesman Problem in Large Scale Database. The computational experiments proved

it also to be a viable solution for Distributed Systems. To find out the

optimal Query Execution Plan and Join Order by reducing Query Execution Time

for Multi-Join Query Optimization in Database, Hybrid of Genetic Algorithm and

Best-Worst Ant Colony Algorithm was employed 24. The algorithm executed

positive feedback mechanism of ACO with global search ability of GA. Another

Hybrid of GA and ACO 30 was employed on Join Ordering problem in Databases

(only nested loop joins considered) by incapacitating the shortcomings of both

the algorithms. First the algorithm adopts Genetic Algorithm to give pheromone

to distribute and then it makes use of ant colony algorithm to give the

precision of the solution. The ability of Hybrid GA-ACO to search extensive

amplitude to answers for join queries in relational Database can be prolonged

to optimize the join queries in distributed database where the most important

challenge is to produce the best QEP for optimal results. Another Scientist Rho

et.al also proposed a Genetic Algorithm based solution method that quickly determine

optimal QEP 13. Identification of Copy which means redundancy of data, Identification

of Beneficial Semi joins, Selection of Join Site, Execution of Join Order, and

Local Processing Cost and Communication Cost included in this Model.

Hybrid

GA—ACO

Tansel et.al. also proposed an Algorithm which is a

combination of Dynamic Programming and Ant Colony Optimization. It is known as DP-ACO

(Dynamic Programming- Ant Colony Optimization) Algorithm. It is used for the

optimization of multi way chain equijoin queries in Distributed Database

Environment.

As the size of the relations and number of joins

increases in the query, Hence Dynamic Programming suffers from long execution

times and very large memory requirements. It has been proved that DP-ACO is a feasible

solution by producing good execution plans with 15 ways join queries within

limited time and very limited memory space. Another advantage of DP-ACO is that

it can be easily adapted to existing query optimizers that commonly use DP based

algorithms.

Due to the use of the properties of Ant Colony

Algorithm and Particle Swarm Optimization, Hybrid algorithm is proposed to

solve the traveling salesman problems 28. The algorithm first adopts statistics

technique through this technique it gets several initial better solutions and

in accordance with them, gives information pheromone to distribute. Then it

makes use of the ant colony algorithm to get several solutions through information

pheromone accumulation and renewal. Lastly, through the use of across and

mutation operation of particle swarm optimization, the effective solutions are

obtained. Hence it has been proved that The Hybrid Algorithm of ACO-PSO is much

effective. With the increasing number of relations in a query, much use of

memory and processing is needed. Almost all Commercial Applications involve

data from various sites. DDBMS is now being used as a standard DBMS in all

commercial applications. The path designs the behaviour of ants is pragmatic to

direct the ants towards the unknown areas of search space and visit all the

nodes without knowing the graphic topology for production of optimal solutions

of distributed database queries. First these ants estimate the running times of

the execution plans of the given query and then provide rapid, high performance

and optimal results in a cost-effective manner. In Distributed Database

Management System, The Search strategy adopted by the Query Optimizer can help

to diminish the query execution time and the cost incurred on the query and

hence increases performance of a query by selecting the best Query Execution

Plan. It has been proved that the implementations of these probabilistic

algorithms generate viable solutions when the size of the query and the number

of joins in the query grows.