Rucont National Digital Resource: an interdisciplinary electronic library system (ELS) based on Kontekstum technology (total works: 610371)
First author: Sergienko
Authors: Muhammad Shan
Pages: 11
ID: 453732
Abstract: Natural language call routing is an important data analysis problem which can be applied in different domains, including the aerospace industry. This paper presents an investigation of collectives of term weighting methods for natural language call routing based on text classification. The main idea is that collectives of different term weighting methods can improve classification effectiveness with the same classification algorithm. Seven different unsupervised and supervised term weighting methods were tested and compared with each other for classification with k-NN. After that, different combinations of term weighting methods were formed as collectives. Two approaches for handling the collectives were considered: a meta-classifier based on rule induction and a majority vote procedure. The numerical experiments have shown that the best result is provided by the vote of all seven term weighting methods. This combination provides a significant increase in classification effectiveness in comparison with the most effective individual term weighting methods.
UDC: 004.93
Sergienko, Roman B. Topic Categorization Based on Collectives of Term Weighting Methods for Natural Language Call Routing / Roman B. Sergienko, Shan Muhammad // Journal of Siberian Federal University. Mathematics & Physics. 2016. No. 2. P. 107-117. URL: https://rucont.ru/efd/453732 (accessed: 21.04.2025)

Preview (excerpts from the work)

Journal of Siberian Federal University. Mathematics & Physics 2016, 9(2), 235–245
UDC 004.93

Topic Categorization Based on Collectives of Term Weighting Methods for Natural Language Call Routing

Roman B. Sergienko (roman.sergienko@uni-ulm.de), Muhammad Shan (muhammad.shan@uni-ulm.de), Wolfgang Minker (wolfgang.minker@uni-ulm.de)
Institute of Telecommunication Engineering, Ulm University, Albert-Einstein-Allee, 43, Ulm, 89081, Germany

Eugene S. Semenkin (eugenesemenkin@yandex.ru)
Informatics and Telecommunications Institute, Siberian State Aerospace University, Krasnoyarskiy Rabochiy, 31, Krasnoyarsk, 660037, Russia

Received 26.12.2015, received in revised form 11.01.2016, accepted 20.02.2016

© Siberian Federal University. All rights reserved

Natural language call routing is an important data analysis problem which can be applied in different domains, including the aerospace industry. This paper presents an investigation of collectives of term weighting methods for natural language call routing based on text classification. The main idea is that collectives of different term weighting methods can improve classification effectiveness with the same classification algorithm. Seven different unsupervised and supervised term weighting methods were tested and compared with each other for classification with k-NN. After that, different combinations of term weighting methods were formed as collectives. Two approaches for handling the collectives were considered: a meta-classifier based on rule induction and a majority vote procedure. The numerical experiments have shown that the best result is provided by the vote of all seven term weighting methods. <...>

<...> The first one is speech recognition of calls and the second one is topic categorization of users’ utterances for further routing. <...> Topic categorization of users’ utterances can also be useful for multi-domain spoken dialogue system design [2]. <...> In this work we treat call routing as an example of a text classification application. In the vector space model [3], text classification is considered as a machine learning problem. <...> The complexity of text categorization with a vector space model is compounded by the need to extract the numerical data from text information before applying machine learning algorithms <...>
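
The idea summarized in the excerpt (turn utterances into weighted term vectors, classify with k-NN, then let several term weighting methods vote) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the excerpt does not name the seven term weighting methods, so three common unsupervised schemes (binary, raw term frequency, smoothed TF-IDF) stand in for them; the training utterances, the labels `billing` and `support`, the use of cosine similarity, and the function names are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Toy utterances with hypothetical routing labels (illustrative data only).
TRAIN = [
    ("i want to pay my bill", "billing"),
    ("question about my bill amount", "billing"),
    ("my bill is too high", "billing"),
    ("my internet connection is down", "support"),
    ("the internet is not working", "support"),
    ("router keeps losing connection", "support"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberate simplification)."""
    return text.lower().split()


def idf_table(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def weight(tokens, scheme, idf):
    """Map tokens to a sparse vector under one term weighting scheme."""
    tf = Counter(t for t in tokens if t in idf)  # terms unseen in training are dropped
    if scheme == "binary":
        return {t: 1.0 for t in tf}
    if scheme == "tf":
        return {t: float(c) for t, c in tf.items()}
    if scheme == "tfidf":
        return {t: c * idf[t] for t, c in tf.items()}
    raise ValueError(f"unknown scheme: {scheme}")


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, scheme, k=3):
    """Classify one utterance with k-NN under a single weighting scheme."""
    docs = [tokenize(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    idf = idf_table(docs)
    q = weight(tokenize(query), scheme, idf)
    scored = sorted(
        ((cosine(q, weight(doc, scheme, idf)), label)
         for doc, label in zip(docs, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def collective_predict(query, schemes=("binary", "tf", "tfidf"), k=3):
    """Majority vote over the k-NN decisions of several weighting schemes."""
    ballots = Counter(knn_predict(query, scheme, k) for scheme in schemes)
    return ballots.most_common(1)[0][0]


if __name__ == "__main__":
    print(collective_predict("why is my bill so high"))
    print(collective_predict("internet connection keeps dropping"))
```

Using an odd number of voters keeps a two-class vote free of ties; with more classes, a tie-breaking rule would have to be chosen, which this sketch leaves to the insertion order of `Counter`.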