Document search supported on an ontological indexing system created with MapReduce

  • Sonia Jaramillo Valbuena Universidad del Quindío
  • Jorge Mario Londoño Universidad Pontificia Bolivariana
Keywords: Apriori algorithm, MapReduce, ontology, matching, semantics.

Abstract

This paper presents a search system supported by an ontological indexing system. Our approach is based on lattice matching: the system takes as input a pruned lattice of ontological terms associated with the documents in a corpus, together with a query lattice, and matches the two. The matching process selects relevant documents and ranks them according to their closeness to the user's query. We have implemented the approach on top of MapReduce. The experimental results support the efficacy of the system, showing greater consistency between the results returned to users and the search domain. The results also show performance enhancements and improved accuracy. The test results are included at the end of the paper.
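The abstract does not spell out how the lattices are built or how closeness is measured, so the following is only a minimal in-memory sketch under stated assumptions. Each document is split into transactions (hypothetically, one set of ontology terms per sentence). An Apriori-style pass keeps the frequent term sets as the document's pruned lattice. The query lattice is taken to be every non-empty subset of the query terms, and `closeness` is a hypothetical size-weighted overlap. The `map_phase` and `reduce_phase` functions only imitate the MapReduce pattern in plain Python. They are not the Hadoop API and not the authors' implementation.

```python
from itertools import combinations


def apriori_lattice(transactions, min_support):
    """Frequent term sets (lattice nodes) with support >= min_support.

    transactions: list of sets of ontology terms (assumed: one set per sentence).
    Returns a set of frozensets; infrequent nodes and all their supersets are pruned.
    """
    n = len(transactions)
    if n == 0:
        return set()

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([term]) for t in transactions for term in t}
    level = {s for s in singletons if support(s) >= min_support}
    lattice = set(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-sets that have exactly k terms.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
        lattice |= level
        k += 1
    return lattice


def query_lattice(terms):
    """Assumed query lattice: all non-empty subsets of the query terms."""
    terms = sorted(set(terms))
    return {frozenset(c) for r in range(1, len(terms) + 1)
            for c in combinations(terms, r)}


def closeness(doc_lattice, q_lattice):
    """Hypothetical score: size-weighted share of query nodes present in the document lattice."""
    total = sum(len(node) for node in q_lattice)
    matched = sum(len(node) for node in q_lattice if node in doc_lattice)
    return matched / total if total else 0.0


def map_phase(doc_id, transactions, q_lattice, min_support):
    # Mapper stand-in: emits one (doc_id, score) pair per document.
    return doc_id, closeness(apriori_lattice(transactions, min_support), q_lattice)


def reduce_phase(pairs):
    # Reducer stand-in: drops non-matching documents and ranks the rest by score.
    hits = [(doc_id, score) for doc_id, score in pairs if score > 0]
    return sorted(hits, key=lambda p: (-p[1], p[0]))


def search(corpus, query_terms, min_support=0.5):
    q = query_lattice(query_terms)
    pairs = [map_phase(d, t, q, min_support) for d, t in corpus.items()]
    return reduce_phase(pairs)


# Toy corpus with hypothetical terms; each set is one "sentence" of a document.
corpus = {
    "doc1": [{"gene", "protein"}, {"gene", "protein", "cell"}, {"gene"}],
    "doc2": [{"cell", "membrane"}, {"cell"}],
    "doc3": [{"protein"}, {"mouse"}],
}
print(search(corpus, ["gene", "protein"]))  # [('doc1', 1.0), ('doc3', 0.25)]
```

Weighting each matched node by its size is one plausible choice: it rewards documents in which the query terms co-occur in the same pruned term set, not merely documents in which they appear separately. In an actual Hadoop job the mapper would run once per input split and the framework would shuffle the pairs to the reducer that performs the ranking.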

Author Biographies

Sonia Jaramillo Valbuena, Universidad del Quindío
Systems and Computing Engineer, M.Sc., doctoral student in Engineering at UPB, Associate Professor, Faculty of Engineering, Universidad del Quindío, Armenia, Colombia.
Jorge Mario Londoño, Universidad Pontificia Bolivariana
Electronic Engineer, PhD, Faculty of Computer and Telecommunications Engineering, Universidad Pontificia Bolivariana, Medellín, Colombia.

Languages:

es

How to Cite
Jaramillo Valbuena, S., & Londoño, J. M. (2014). Document search supported on an ontological indexing system created with MapReduce. Ciencia e Ingeniería Neogranadina, 24(2), 57–75. https://doi.org/10.18359/rcin.393
Published
2014-12-01
Section
ARTICLES