Rating (recommendation)

BRISMF

BRISMF (Biased Regularized Incremental Simultaneous Matrix Factorization) is a factorization-based algorithm for large-scale recommendation systems.

The basic idea is to factorize a very sparse rating matrix into two low-rank matrices that represent the user and item factors. The factors are found iteratively by minimizing a loss function.

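The predicted rating of user u for item i combines the global mean μ, the user and item biases b_u and b_i, and the inner product of the item and user factor vectors q_i and p_u: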

\[\hat { r }_{ ui }=\mu +b_{ u }+b_{ i }+{ q }_{ i }^{ T }{ p }_{ u }\]
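
To make the prediction rule above concrete, here is a minimal, dependency-free sketch that evaluates it for a single user/item pair. The function name `predict_rating` and all numeric values are assumptions chosen for illustration; they are not taken from the library.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Return mu + b_u + b_i + q_i^T p_u for one (user, item) pair."""
    interaction = sum(q * p for q, p in zip(q_i, p_u))
    return mu + b_u + b_i + interaction


# Hypothetical parameter values, for illustration only
mu = 3.5                # global mean rating
b_u, b_i = 0.2, -0.1    # user and item biases
p_u = [0.5, 1.0]        # user factor vector
q_i = [1.0, 0.5]        # item factor vector

r_hat = predict_rating(mu, b_u, b_i, p_u, q_i)
print(round(r_hat, 4))
```

The biases shift the global mean, and the inner product adds the user-specific affinity for the item.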

The parameters are learned by minimizing the regularized squared error over the set k of known ratings:

\[\min_{p_*, q_*, b_*} \sum_{(u,i) \in k} \left( r_{ui} - \mu - b_u - b_i - q_i^T p_u \right)^2 + \lambda \left( b_u^2 + b_i^2 + \left\| p_u \right\|^2 + \left\| q_i \right\|^2 \right)\]
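
The objective above is typically minimized with stochastic gradient descent. The following sketch shows the textbook update for a single known rating, with the constant factor of the gradient absorbed into the learning rate. It is an illustration under those assumptions, not necessarily the exact update performed by the library; the function name `sgd_step` and the sample values are hypothetical.

```python
def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, learning_rate=0.07, lmbda=0.1):
    """One simultaneous SGD update of the biases and factors for one rating."""
    dot = sum(q * p for q, p in zip(q_i, p_u))
    err = r_ui - (mu + b_u + b_i + dot)  # prediction error before the update

    # All updates use the old parameter values (simultaneous update)
    new_b_u = b_u + learning_rate * (err - lmbda * b_u)
    new_b_i = b_i + learning_rate * (err - lmbda * b_i)
    new_p_u = [p + learning_rate * (err * q - lmbda * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + learning_rate * (err * p - lmbda * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i, err


# Hypothetical starting point: one observed rating of 4.0
mu = 3.5
b_u, b_i = 0.0, 0.0
p_u, q_i = [0.1, 0.1], [0.1, 0.1]

b_u, b_i, p_u, q_i, err_before = sgd_step(4.0, mu, b_u, b_i, p_u, q_i)
_, _, _, _, err_after = sgd_step(4.0, mu, b_u, b_i, p_u, q_i)
print(abs(err_after) < abs(err_before))
```

Each step moves the parameters toward a smaller error, while the `lmbda` terms pull them back toward zero to limit overfitting.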

Example

import Orange
from orangecontrib.recommendation import BRISMFLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = BRISMFLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
>>>
[ 3.79505151  3.75096513  1.293013 ]
class orangecontrib.recommendation.BRISMFLearner(num_factors=5, num_iter=25, learning_rate=0.07, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, min_rating=None, max_rating=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)

BRISMF: Biased Regularized Incremental Simultaneous Matrix Factorization

This model uses stochastic gradient descent to find two low-rank matrices: user-feature matrix and item-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints progress information according to the verbosity level. Accepted values: False (same as verbose=0), True (same as verbose=1), or an integer level.
random_state: int, optional
Sets the seed for the NumPy random generator so that the random numbers are reproducible. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.
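
The min_rating and max_rating attributes described above bound the predictions. A minimal sketch of such bounding, assuming plain truncation to the given interval (the helper `bound_prediction` is hypothetical and not part of the library):

```python
def bound_prediction(value, min_rating=None, max_rating=None):
    """Clip a prediction to [min_rating, max_rating]; None leaves that side open."""
    if min_rating is not None:
        value = max(value, min_rating)
    if max_rating is not None:
        value = min(value, max_rating)
    return value


raw = [0.4, 3.2, 5.7]  # hypothetical raw predictions
bounded = [bound_prediction(v, min_rating=1.0, max_rating=5.0) for v in raw]
unbounded = [bound_prediction(v) for v in raw]
print(bounded)
```

With both bounds left at None, the predictions pass through unchanged.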

SVD++

SVD++ is a matrix factorization model that makes use of implicit feedback information.

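The predicted rating of user u for item i extends the biased model with the implicit-feedback vectors y_j of the items in N(u), the set of items for which user u has provided feedback: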

\[\hat { r }_{ ui } = \mu + b_u + b_i + \left(p_u + \frac{1}{\sqrt{|N(u)|}}\sum_{j\in N(u)} y_j \right)^T q_i\]
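
The SVD++ prediction rule above can be sketched in a few lines of plain Python. The function name `predict_svdpp` and the sample values are assumptions made for illustration; `y_rows` stands for the y_j vectors of the items in N(u).

```python
import math


def predict_svdpp(mu, b_u, b_i, p_u, q_i, y_rows):
    """Return mu + b_u + b_i + q_i^T (p_u + |N(u)|^(-1/2) * sum of y_j)."""
    user_vec = list(p_u)
    if y_rows:  # skip the implicit term when N(u) is empty
        norm = 1.0 / math.sqrt(len(y_rows))
        for f in range(len(user_vec)):
            user_vec[f] += norm * sum(y[f] for y in y_rows)
    return mu + b_u + b_i + sum(q * v for q, v in zip(q_i, user_vec))


# Hypothetical values: the user has given feedback on four items
mu, b_u, b_i = 3.0, 0.5, 0.0
p_u = [0.5, 0.0]
q_i = [0.5, 0.5]
y_rows = [[0.25, 0.25]] * 4

r_hat = predict_svdpp(mu, b_u, b_i, p_u, q_i, y_rows)
print(r_hat)
```

When N(u) is empty the sketch falls back to the plain biased prediction.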

The parameters are learned by minimizing the regularized squared error over the set k of known ratings:

\[\begin{split} \min_{p_*, q_*, y_*, b_*} &\sum_{(u,i) \in k} \left( r_{ui} - \mu - b_u - b_i - q_i^T \left( p_u + \frac{1}{\sqrt{|N(u)|}} \sum_{j \in N(u)} y_j \right) \right)^2 \\ &+ \lambda \left( b_u^2 + b_i^2 + \left\| p_u \right\|^2 + \left\| q_i \right\|^2 + \sum_{j \in N(u)} \left\| y_j \right\|^2 \right) \end{split}\]

Example

import Orange
from orangecontrib.recommendation import SVDPlusPlusLearner

# Load data and train the model
data = Orange.data.Table('movielens100k.tab')
learner = SVDPlusPlusLearner(num_factors=15, num_iter=25, learning_rate=0.07, lmbda=0.1)
recommender = learner(data)

# Make predictions
prediction = recommender(data[:3])
print(prediction)
class orangecontrib.recommendation.SVDPlusPlusLearner(num_factors=5, num_iter=25, learning_rate=0.01, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, min_rating=None, max_rating=None, feedback=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)

SVD++ matrix factorization

This model uses stochastic gradient descent to find three low-rank matrices: user-feature matrix, item-feature matrix and feedback-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
feedback: Orange.data.Table
Implicit feedback information. If None (default), implicit information will be inferred from the ratings (e.g., an item that has been rated counts as an item seen).
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints progress information according to the verbosity level. Accepted values: False (same as verbose=0), True (same as verbose=1), or an integer level.
random_state: int, optional
Sets the seed for the NumPy random generator so that the random numbers are reproducible. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.
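
The feedback attribute above notes that, when no feedback table is given, implicit information is inferred from the ratings. A minimal sketch of one way to do this, assuming ratings arrive as (user, item, rating) triples; the helper name and the sample data are hypothetical, and the library's internal representation may differ.

```python
from collections import defaultdict


def implicit_feedback_from_ratings(ratings):
    """Map each user to the set of items they rated (rated means seen)."""
    seen = defaultdict(set)
    for user, item, _rating in ratings:
        seen[user].add(item)
    return dict(seen)


# Hypothetical rating triples
ratings = [("ann", "m1", 4.0), ("ann", "m2", 2.5), ("bob", "m1", 5.0)]
N = implicit_feedback_from_ratings(ratings)
print(sorted(N["ann"]))
```

The rating values themselves are ignored here; only the fact that a rating exists is used.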

TrustSVD

TrustSVD is a trust-based matrix factorization model that extends SVD++ with social trust information.

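The predicted rating of user u for item i adds a trust term to the SVD++ rule, where I_u is the set of items rated by user u and T_u is the set of users trusted by user u: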

\[\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} \left( p_u + \left| I_u \right|^{-\frac{1}{2}} \sum_{j \in I_u} y_j + \left| T_u \right|^{-\frac{1}{2}} \sum_{v \in T_u} w_v \right)\]

The parameters are learned by minimizing the following regularized squared error, which penalizes both the rating error and the trust prediction error:

\[\begin{split} \mathcal{L} &= \frac{1}{2} \sum_u \sum_{j \in I_u} \left( \hat{r}_{u,j} - r_{u,j} \right)^2 + \frac{\lambda_t}{2} \sum_u \sum_{v \in T_u} \left( \hat{t}_{u,v} - t_{u,v} \right)^2 \\ &+ \frac{\lambda}{2} \sum_u \left| I_u \right|^{-\frac{1}{2}} b_u^2 + \frac{\lambda}{2} \sum_j \left| U_j \right|^{-\frac{1}{2}} b_j^2 \\ &+ \sum_u \left( \frac{\lambda}{2} \left| I_u \right|^{-\frac{1}{2}} + \frac{\lambda_t}{2} \left| T_u \right|^{-\frac{1}{2}} \right) \left\| p_u \right\|_F^2 \\ &+ \frac{\lambda}{2} \sum_j \left| U_j \right|^{-\frac{1}{2}} \left\| q_j \right\|_F^2 + \frac{\lambda}{2} \sum_i \left| U_i \right|^{-\frac{1}{2}} \left\| y_i \right\|_F^2 \\ &+ \frac{\lambda}{2} \sum_v \left| T_v^+ \right|^{-\frac{1}{2}} \left\| w_v \right\|_F^2 \end{split}\]

Example

import Orange
from orangecontrib.recommendation import TrustSVDLearner

# Load the ratings and the trust network, then train the model
ratings = Orange.data.Table('filmtrust/ratings.tab')
trust = Orange.data.Table('filmtrust/trust.tab')
learner = TrustSVDLearner(num_factors=15, num_iter=25, learning_rate=0.07,
                          lmbda=0.1, social_lmbda=0.05, trust=trust)
recommender = learner(ratings)

# Make predictions for the first three rating instances
prediction = recommender(ratings[:3])
print(prediction)
class orangecontrib.recommendation.TrustSVDLearner(num_factors=5, num_iter=25, learning_rate=0.07, bias_learning_rate=None, lmbda=0.1, bias_lmbda=None, social_lmbda=0.05, min_rating=None, max_rating=None, trust=None, optimizer=None, preprocessors=None, verbose=False, random_state=None, callback=None)

Trust-based matrix factorization

This model uses stochastic gradient descent to find four low-rank matrices: user-feature matrix, item-feature matrix, feedback-feature matrix and trustee-feature matrix.

Attributes:
num_factors: int, optional
The number of latent factors.
num_iter: int, optional
The number of passes over the training data (aka epochs).
learning_rate: float, optional
The learning rate controlling the size of update steps (general).
bias_learning_rate: float, optional
The learning rate controlling the size of the bias update steps. If None (default), bias_learning_rate = learning_rate
lmbda: float, optional
Controls the importance of the regularization term (general). Avoids overfitting by penalizing the magnitudes of the parameters.
bias_lmbda: float, optional
Controls the importance of the bias regularization term. If None (default), bias_lmbda = lmbda
social_lmbda: float, optional
Controls the importance of the trust regularization term.
min_rating: float, optional
Defines the lower bound for the predictions. If None (default), ratings won’t be bounded.
max_rating: float, optional
Defines the upper bound for the predictions. If None (default), ratings won’t be bounded.
feedback: Orange.data.Table
Implicit feedback information. If None (default), implicit information will be inferred from the ratings (e.g., an item that has been rated counts as an item seen).
trust: Orange.data.Table
Social trust information.
optimizer: Optimizer, optional
Set the optimizer for SGD. If None (default), classical SGD will be applied.
verbose: boolean or int, optional
Prints progress information according to the verbosity level. Accepted values: False (same as verbose=0), True (same as verbose=1), or an integer level.
random_state: int, optional
Sets the seed for the NumPy random generator so that the random numbers are reproducible. This is a debugging feature.
callback: callable
Method that receives the current iteration as an argument.
fit_storage(data)

Fit the model according to the given training data.

Args:
data: Orange.data.Table
Returns:
self: object
Returns self.