Document search supported on an ontological indexing system created with MapReduce
Abstract
This paper presents a search system built on an ontological indexing system. The approach is based on lattice matching: the system takes as input a query lattice and a pruned lattice of ontological terms associated with the documents in a corpus. The matching process selects the relevant documents and ranks them according to their closeness to the user's query. We have implemented the approach on top of MapReduce. The experimental results support the efficacy of the system: the results returned to users are more consistent with the search domain, and they also show improved performance and accuracy. The test results are included at the end of the paper.
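As a rough illustration of the select-and-rank idea in the abstract, the following minimal sketch runs a map step and a reduce step in plain Python. It is not the paper's implementation. It makes several assumptions: each document's pruned lattice and the query lattice are flattened into plain sets of ontological terms, Jaccard overlap stands in for the closeness measure (which the abstract does not specify), and the names `map_doc`, `reduce_counts`, `rank`, and the sample corpus are hypothetical.

```python
from collections import defaultdict


def map_doc(doc_id, doc_terms, query_terms):
    """Map step: emit one (doc_id, 1) pair per term shared with the query."""
    for _term in doc_terms & query_terms:
        yield doc_id, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per document."""
    totals = defaultdict(int)
    for doc_id, count in pairs:
        totals[doc_id] += count
    return totals


def rank(corpus, query_terms):
    """Select documents sharing at least one term and rank them by closeness."""
    pairs = (
        pair
        for doc_id, terms in corpus.items()
        for pair in map_doc(doc_id, terms, query_terms)
    )
    shared = reduce_counts(pairs)
    # Assumption: Jaccard overlap is used here as a stand-in closeness score.
    scores = {
        doc_id: n / len(corpus[doc_id] | query_terms)
        for doc_id, n in shared.items()
    }
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical corpus: document id -> set of ontological terms.
corpus = {
    "doc1": {"gene", "protein", "cell"},
    "doc2": {"gene", "mouse"},
    "doc3": {"galaxy", "star"},
}
ranking = rank(corpus, {"gene", "protein"})
```

In this toy corpus, `doc3` shares no term with the query and is therefore not selected, while `doc1` ranks above `doc2` because it overlaps more with the query terms. In a real MapReduce deployment the map and reduce functions would run across the cluster rather than in a single process.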