A Multilayered-and-Randomized Latent Factor Model for High-Dimensional and Sparse Matrices
Yuan, Ye; He, Qiang; Luo, Xin; Shang, Mingsheng
2022-06-01
Abstract: Efficiently extracting useful knowledge from a high-dimensional and sparse (HiDS) matrix is critical for many big data-related applications. A latent factor (LF) model is widely adopted to address this problem, and it commonly relies on an iterative learning algorithm such as stochastic gradient descent. However, such an algorithm typically needs many iterations to converge, resulting in considerable time cost on large-scale datasets. Accelerating an LF model's training process without accuracy loss therefore becomes a vital issue. To address it, this study proposes a multilayered-and-randomized latent factor (MLF) model. Its main idea is two-fold: a) adopting randomized learning to train LFs, which implements a 'one-iteration' training process that saves time; and b) adopting the principle of a generally multilayered structure, as in a deep forest or a multilayered extreme learning machine, to structure its LFs, thereby enhancing its representation learning ability. Empirical studies on six HiDS matrices from real applications demonstrate that, compared with state-of-the-art LF models, an MLF model achieves significantly higher computational efficiency with satisfactory prediction accuracy. It has the potential to handle LF analysis on a large-scale HiDS matrix with real-time requirements.
Keywords: Computational modeling; Sparse matrices; Big Data; Data models; Stochastic processes; Training; Software algorithms; Big data; latent factor analysis; generally multilayered structure; deep forest; multilayered extreme learning machine; randomized-learning; high-dimensional and sparse matrix; stochastic gradient descent; randomized model
DOI: 10.1109/TBDATA.2020.2988778
Journal: IEEE TRANSACTIONS ON BIG DATA
ISSN: 2332-7790
Volume: 8; Issue: 3; Pages: 784-794
Corresponding authors: Luo, Xin (luoxin21@cigit.ac.cn); Shang, Mingsheng (msshang@cigit.ac.cn)
Indexed in: SCI
WOS accession number: WOS:000795107500016
Language: English