Ranking (recommendation)¶
CLiMF¶
CLiMF (Collaborative Less-is-More Filtering) is used in scenarios with binary relevance data. Hence, it’s focused on improving top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).
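To make the metric concrete, MRR averages the reciprocal of the rank at which each user's first relevant item appears in the recommended list. A minimal sketch in plain NumPy (the function name and inputs here are illustrative, not part of the Orange API):

```python
import numpy as np

def mean_reciprocal_rank(ranked_items, relevant_items):
    """Average 1/rank of the first relevant item per user (0 if no hit)."""
    rr = []
    for ranked, relevant in zip(ranked_items, relevant_items):
        rank = next((pos + 1 for pos, item in enumerate(ranked)
                     if item in relevant), None)
        rr.append(1.0 / rank if rank is not None else 0.0)
    return float(np.mean(rr))

# Two users: first relevant hit at rank 1 and rank 2 -> MRR = (1 + 0.5) / 2
mrr = mean_reciprocal_rank(
    ranked_items=[[3, 1, 2], [5, 4, 6]],
    relevant_items=[{3}, {4}],
)
print(mrr)  # 0.75
```

Because only the first relevant item per user counts, MRR rewards models that place at least one good item near the top, which is exactly the top-k scenario CLiMF targets.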
Following a technique similar to other iterative approaches, the two low-rank matrices are randomly initialized and then optimized with respect to a ranking-based training loss.
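As a reference, the objective CLiMF maximizes is a smooth lower bound of MRR; following the original CLiMF paper (Shi et al., 2012), it can be written roughly as

$$
F(U, V) = \sum_{i=1}^{M} \sum_{j=1}^{N} Y_{ij} \left[ \ln g\!\left(U_i^{\top} V_j\right) + \sum_{k=1}^{N} \ln\!\left(1 - Y_{ik}\, g\!\left(U_k^{\top} V_j - U_i^{\top} V_j\right)\right) \right] - \frac{\lambda}{2} \left( \lVert U \rVert^2 + \lVert V \rVert^2 \right)
$$

where $Y_{ij}$ is the binary relevance of item $j$ for user $i$, $g(x) = 1/(1 + e^{-x})$ is the logistic function, and $\lambda$ is the regularization weight (the `lmbda` parameter below). This formula is reconstructed from the CLiMF paper rather than taken from this page, so treat it as a reference sketch.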
Note: Orange3 currently does not support ranking operations. Therefore, this model can be used neither in cross-validation nor in the prediction module available in Orange3.
Example¶
import Orange
import numpy as np
from orangecontrib.recommendation import CLiMFLearner

# Load data
data = Orange.data.Table('epinions_train.tab')

# Train recommender
learner = CLiMFLearner(num_factors=10, num_iter=10, learning_rate=0.0001, lmbda=0.001)
recommender = learner(data)

# Load test dataset
testdata = Orange.data.Table('epinions_test.tab')

# Sample users
num_users = len(recommender.U)
num_samples = min(num_users, 1000)  # max. number of users to sample
users_sampled = np.random.choice(np.arange(num_users), num_samples)

# Compute Mean Reciprocal Rank (MRR)
mrr, _ = recommender.compute_mrr(data=testdata, users=users_sampled)
print('MRR: %.4f' % mrr)

>>>
MRR: 0.3975
class orangecontrib.recommendation.CLiMFLearner(num_factors=5, num_iter=25, learning_rate=0.0001, lmbda=0.001, preprocessors=None, optimizer=None, verbose=False, random_state=None, callback=None)[source]¶

CLiMF: Collaborative Less-is-More Filtering Matrix Factorization
This model uses stochastic gradient descent to find two low-rank matrices: user-feature matrix and item-feature matrix.
CLiMF is a matrix factorization for scenarios with binary relevance data when only a few (k) items are recommended to individual users. It improves top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).
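The ranking itself comes from the two factor matrices: a user's score for every item is the dot product of the user's latent vector with that item's latent vector, and items are recommended in descending score order. A sketch with toy matrices (`U` and `V` here are random stand-ins for the learned `recommender.U` and `recommender.V`):

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 5))   # 4 users  x 5 latent factors
V = rng.normal(size=(6, 5))   # 6 items  x 5 latent factors

# Score of every item for every user: shape (4, 6)
scores = U @ V.T

# Top-3 item indices per user, highest score first
top_k = np.argsort(-scores, axis=1)[:, :3]
print(top_k.shape)  # (4, 3)
```

Only the order of the top-k scores matters for MRR, which is why CLiMF optimizes a ranking objective instead of the squared reconstruction error used by rating-prediction factorizations.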
- Attributes:
- num_factors: int, optional
- The number of latent factors.
- num_iter: int, optional
- The number of passes over the training data (aka epochs).
- learning_rate: float, optional
- The learning rate controlling the size of update steps (general).
- lmbda: float, optional
- Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
- optimizer: Optimizer, optional
- Set the optimizer for SGD. If None (default), classical SGD will be applied.
- verbose: boolean or int, optional
- Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1), or a higher integer for more detailed output.
- random_state: int, optional
- Set the seed for the numpy random generator, so it makes the random numbers predictable. This is a debugging feature.
- callback: callable, optional