INTERNATIONAL JOURNAL OF NOVEL RESEARCH AND DEVELOPMENT International Peer Reviewed & Refereed Journals, Open Access Journal ISSN Approved Journal No: 2456-4184 | Impact factor: 8.76 | ESTD Year: 2016
Scholarly open access, peer-reviewed, and refereed journal; impact factor 8.76 (calculated by Google Scholar and Semantic Scholar); multidisciplinary; monthly; indexed in all major databases and metadata services; citation generator; Digital Object Identifier (DOI)
Abstract
Statistics Netherlands (CBS) is interested in using Natural Language Processing (NLP) to classify companies that are not included in the Community Innovation Survey (CIS), in order to obtain reliable data on the location of innovation activities. Various machine learning methods have been applied to this task in the past with favorable results. In recent years, growing attention has been paid to combining the predictions of multiple models. In this work, an ensemble approach is investigated for predicting innovative companies based on their website text. The stacking algorithm achieved the best accuracy of all the models, but at the cost of considerable training time. Depending on the seed and the random selection of the training data, stacking improved accuracy by up to 1%. The other ensemble algorithms presented did not improve accuracy compared to the best-performing individual model.
The main point of this chapter is to cover the data source and the web scraping process. In addition, the preprocessing steps and the mathematical representation of the words extracted from the websites are discussed.
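To make the idea of preprocessing website text and representing its words mathematically more concrete, the following is a minimal sketch only. It assumes a TF-IDF weighting with a smoothed inverse document frequency, a tiny English stop-word list, and two invented example pages; none of these choices are stated in the abstract, and the names `STOP_WORDS`, `tokenize`, and `tfidf_vectors` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real pipeline would use a
# list suited to the language of the scraped websites.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "we"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Invented example pages standing in for scraped website text.
pages = [
    "We develop new sensors and innovative software for robots.",
    "The bakery sells bread and cakes in the city.",
]
vocab, vectors = tfidf_vectors(pages)
```

Each page becomes a numeric vector over a shared vocabulary, which is the kind of input a text classifier or an ensemble of classifiers could be trained on.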
Keywords:
Web Data Source and Processing
Cite Article:
"Web Data Sources and Processing", International Journal of Novel Research and Development (www.ijnrd.org), ISSN: 2456-4184, Vol. 9, Issue 4, page no. f396-f403, April 2024. Available: http://www.ijnrd.org/papers/IJNRD2404550.pdf