
Search user query
processing system that can interconvert the query between the given languages and then
pass it to a search engine so that it returns all available search results in
both languages. Following this premise, this research provides a
case study based on a pilot project developed using two sample languages,
Sindhi and English.
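The flow described above can be sketched as follows. This is a minimal illustration and not the pilot project's implementation: the lexicon `EN_TO_SD`, the functions `interconvert` and `bilingual_search`, and the `search` callback are all hypothetical names, and the two lexicon entries are illustrative only.

```python
# Hypothetical two-way lexicon; the entries are illustrative, not project data.
EN_TO_SD = {"book": "ڪتاب", "water": "پاڻي"}
SD_TO_EN = {sd: en for en, sd in EN_TO_SD.items()}


def interconvert(query):
    """Return (original query, query converted word by word to the other language).

    Words missing from the lexicon are passed through unchanged.
    """
    converted = []
    for token in query.lower().split():
        converted.append(EN_TO_SD.get(token) or SD_TO_EN.get(token) or token)
    return query, " ".join(converted)


def bilingual_search(query, search):
    """Run `search` (any callable: query -> list of results) on both query forms."""
    original, converted = interconvert(query)
    results = []
    # dict.fromkeys drops the duplicate when nothing could be converted.
    for q in dict.fromkeys([original, converted]):
        results.extend(search(q))
    return results
```

A real system would replace the word-by-word lookup with ontology-backed conversion and plug an actual search engine client in as `search`; the sketch only shows where the conversion step sits relative to the search call.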

 

1. Background


The purpose
of Semantic Web technologies (SWT) is to make web content understandable to
machines, so that they can classify and recognize the information available on the internet
as humans do (9). However, the flexible nature of XML-based technologies (RDF,
DLL, SPARQL etc.) has made them useful for information structuring and processing in
many other areas of information technology.

In recent
years the development of ontologies—explicit formal specifications of the terms
in the domain and relations among them (Gruber 1993)—has been moving from
the realm of Artificial-Intelligence laboratories to the desktops of domain
experts. Ontologies have become common on the World-Wide Web. The ontologies on
the Web range from large taxonomies categorizing Web sites (such as on Yahoo!)
to categorizations of products for sale and their features (such as on
Amazon.com). The WWW Consortium (W3C) is developing the Resource Description
Framework (Brickley and Guha 1999), a language for encoding knowledge on
Web pages to make it understandable to electronic agents searching for
information. The Defense Advanced Research Projects Agency (DARPA),
in conjunction with the W3C, is developing DARPA Agent Markup Language (DAML)
by extending RDF with more expressive constructs aimed at facilitating agent interaction
on the Web (Hendler and McGuinness 2000). Many disciplines now develop
standardized ontologies that domain experts can use to share and annotate
information in their fields. 

An ontology
defines a common vocabulary for researchers who need to share information in a
domain. It includes machine-interpretable definitions of basic concepts in the
domain and relations among them. Why would someone want to develop an ontology?
Some of the reasons are (Ontology Development 101):

•  To share common understanding of the structure of
information among people or software agents

•  To enable reuse of domain knowledge

•  To make domain assumptions explicit

•  To separate domain knowledge from the operational
knowledge

•  To analyze domain knowledge

Parallel corpora are a valuable resource for machine translation,
multilingual text retrieval, language education and other applications (BITS). A parallel
ontology can play the role of a parallel corpus when it is limited to
particular relations (Building a Bilingual Bio-Ontology Platform for Knowledge
Discovery).
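As a rough illustration of this idea, the sketch below stores each concept once with a label per language and then extracts aligned statements for one chosen relation. The `Concept` class, the `parallel_statements` function, the `is_a` relation name and the sample labels are assumptions made for this example; they are not taken from the pilot project or from the cited works.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    concept_id: str
    labels: dict                                    # language code -> label
    relations: list = field(default_factory=list)   # (relation name, target concept id)


# Illustrative entries only; the Sindhi labels are examples, not project data.
ONTOLOGY = {
    "animal": Concept("animal", {"en": "animal", "sd": "جانور"}),
    "cat": Concept("cat", {"en": "cat", "sd": "ٻلي"}, [("is_a", "animal")]),
}


def parallel_statements(ontology, relation, src="en", tgt="sd"):
    """Return aligned (source, target) statements restricted to one relation."""
    pairs = []
    for concept in ontology.values():
        for rel, target_id in concept.relations:
            if rel != relation:
                continue  # keep only the relation of interest
            target = ontology[target_id]
            pairs.append((
                (concept.labels[src], rel, target.labels[src]),
                (concept.labels[tgt], rel, target.labels[tgt]),
            ))
    return pairs
```

Because both labels hang off the same concept, every extracted statement is aligned by construction, which is what allows the ontology to stand in for a sentence-aligned corpus within the chosen relation.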

To the best of our knowledge, this research is the first
attempt to develop an ontological corpus for the Sindhi language. In fact, Sindhi
is one of the fortunate languages of the region, in that it is advancing in
different computational areas. The work already done can be categorized as
follows:

 

a- Unicode based: text processing

b- NLP: corpus based

c- Ontological: in other similar languages, e.g. Urdu

The majority of the work
done in Sindhi computing concerns word processing and Unicode-based Sindhi
typing systems (Majid Bhurgri's work). The next most popular area is dictionaries
(Unicode based bilingual Sindhi English dictionary; Phonetic based Sindhi spell
checker). A few, but foundational, efforts have been made in the field of natural language
processing (Towards Sindhi corpus construction; word tokenization model; Word
segmentation model for Sindhi).

Urdu, which
is syntactically and phonetically one of the closest languages to Sindhi, has
valuable work in the field of Semantic Web technologies. One work that is
considerably similar to this research is the semantic annotation model for web
documents based on ontologies. That work differs fundamentally from the
model presented here: it focuses on the semantics of information already available
on the web, whereas this research focuses on the semantics of information before
it interacts with the web.