Ranking (recommendation)

CLiMF

CLiMF (Collaborative Less-is-More Filtering) is designed for scenarios with binary relevance data. It focuses on improving top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).

As in other iterative approaches, the two low-rank matrices are randomly initialized and then optimized with stochastic gradient descent on the following objective, where Y is the binary relevance matrix and g is the logistic function:

\[F(U,V) = \sum_{i=1}^{M}\sum_{j=1}^{N} Y_{ij}\left[\ln g(U_i^T V_j) + \sum_{k=1}^{N}\ln\left(1 - Y_{ik}\, g(U_i^T V_k - U_i^T V_j)\right)\right] - \frac{\lambda}{2}\left(\left\| U \right\|^2 + \left\| V \right\|^2\right)\]
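
To make the notation concrete, the objective can be evaluated directly in NumPy. The sketch below is illustrative only (it is not the add-on's internal implementation) and assumes a toy binary relevance matrix Y with M users as rows and N items as columns:

import numpy as np

def g(x):
    # Logistic (sigmoid) function
    return 1.0 / (1.0 + np.exp(-x))

def climf_objective(U, V, Y, lmbda):
    # F(U, V): the smoothed lower bound of the MRR defined above
    F = 0.0
    M, N = Y.shape
    for i in range(M):
        scores = V @ U[i]  # predicted relevance U_i^T V_j for every item j
        for j in range(N):
            if Y[i, j] == 0:
                continue
            F += np.log(g(scores[j]))
            F += np.sum(np.log(1.0 - Y[i] * g(scores - scores[j])))
    # L2 regularization on both factor matrices
    F -= lmbda / 2.0 * (np.sum(U ** 2) + np.sum(V ** 2))
    return F

# Toy data: 3 users, 4 items, 2 latent factors
rng = np.random.RandomState(0)
Y = np.array([[1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 1, 0, 1]], dtype=float)
U, V = rng.rand(3, 2), rng.rand(4, 2)
print('F(U, V) = %.4f' % climf_objective(U, V, Y, lmbda=0.001))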

Note: Orange3 currently does not support ranking operations. Therefore, this model can be used neither in cross-validation nor in the prediction module available in Orange3.

Example

 import Orange
 import numpy as np
 from orangecontrib.recommendation import CLiMFLearner

 # Load data
 data = Orange.data.Table('epinions_train.tab')

 # Train recommender
 learner = CLiMFLearner(num_factors=10, num_iter=10, learning_rate=0.0001, lmbda=0.001)
 recommender = learner(data)

 # Load test dataset
 testdata = Orange.data.Table('epinions_test.tab')

 # Sample users
 num_users = len(recommender.U)
 num_samples = min(num_users, 1000)  # max. number to sample
 users_sampled = np.random.choice(np.arange(num_users), num_samples)

 # Compute Mean Reciprocal Rank (MRR)
 mrr, _ = recommender.compute_mrr(data=testdata, users=users_sampled)
 print('MRR: %.4f' % mrr)
 >>>
 MRR: 0.3975
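
Since Orange3 provides no ranking operations, the actual top-k recommendations have to be produced manually from the learned factors. Continuing the example above, the sketch below assumes that the fitted model also exposes the item-feature matrix as recommender.V (recommender.U is already used for sampling users); the predicted relevance of item j for user u is simply the dot product of their factor vectors.

# Rank all items for one of the sampled users by predicted relevance
user = users_sampled[0]
scores = recommender.V @ recommender.U[user]   # one score per item
top_k = np.argsort(scores)[::-1][:5]           # indices of the 5 highest-ranked items
print('Top-5 items for user %d: %s' % (user, top_k))
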
class orangecontrib.recommendation.CLiMFLearner(num_factors=5, num_iter=25, learning_rate=0.0001, lmbda=0.001, preprocessors=None, optimizer=None, verbose=False, random_state=None, callback=None)

CLiMF: Collaborative Less-is-More Filtering Matrix Factorization

This model uses stochastic gradient descent to find two low-rank matrices: a user-feature matrix and an item-feature matrix.

CLiMF is a matrix factorization method for scenarios with binary relevance data in which only a few (k) items are recommended to individual users. It improves top-k recommendations through ranking by directly maximizing the Mean Reciprocal Rank (MRR).

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints information about the process according to the verbosity level. Values: False (verbose=0), True (verbose=1) and any higher integer for more detail.
random_state: int, optional
Set the seed for the numpy random generator, so it makes the random numbers predictable. This is a debugging feature.
callback: callable
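
The constructor parameters above can also be used to make training reproducible and to follow its progress. A minimal sketch using only the documented arguments:

from orangecontrib.recommendation import CLiMFLearner

# Fix the seed so the random initialization of the factor matrices (and hence
# the result) is repeatable, and print progress information during training.
learner = CLiMFLearner(num_factors=5, num_iter=25, learning_rate=0.0001,
                       lmbda=0.001, random_state=42, verbose=True)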

fit_storage(data)

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.