Models¶
Instantiating and Training a Collie Model
Collie provides several state-of-the-art recommendation model architectures, both non-hybrid and hybrid, depending on whether you would like to incorporate metadata directly into the model.
Since Collie utilizes PyTorch Lightning for model training, all models, by default:
Are compatible with CPU, GPU, multi-GPU, and TPU training
Allow for 16-bit precision
Integrate with common external loggers
Allow for extensive predefined and custom training callbacks
Are flexible with minimal boilerplate code
While each model’s API differs slightly, the training procedure for each model will generally look like:
from collie.model import CollieTrainer, MatrixFactorizationModel
# assume you have ``interactions`` already defined and ready-to-go
model = MatrixFactorizationModel(interactions)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()
# now, ``model`` is ready to be used for inference, evaluation, etc.
model.save_model('model.pkl')
When we have side-data about items, this can be incorporated directly into the loss function of the model. For details on this, see Losses.
Hybrid Collie models allow incorporating side-data about items and/or users directly into the model. For an in-depth example of this, see Tutorials.
Creating a Custom Architecture
Collie not only houses incredible pre-defined architectures, but was built with customization in mind. All Collie recommendation models are built as subclasses of the BasePipeline model, inheriting common loss calculation functions and model training boilerplate. This allows for a nice balance between flexibility and fast iteration.
While any method can be overridden with more architecture-specific implementations, at the bare minimum, each additional model must override:
``_setup_model`` - Model architecture initialization

``forward`` - Model step that accepts a batch of data of form ``(users, items), negative_items`` and outputs a recommendation score for each item
If we wanted to create a custom model that performed a barebones matrix factorization calculation, in Collie, this would be implemented as:
import torch

from collie.model import BasePipeline, CollieTrainer, ScaledEmbedding
from collie.utils import get_init_arguments


class SimpleModel(BasePipeline):
    def __init__(self, train, val, embedding_dim):
        """
        Initialize a simple model that is a subclass of ``BasePipeline``.

        Parameters
        ----------
        train: ``collie.interactions`` object
        val: ``collie.interactions`` object
        embedding_dim: int
            Number of latent factors to use for user and item embeddings

        """
        super().__init__(**get_init_arguments())

    def _setup_model(self, **kwargs):
        """Method for building model internals that rely on the data passed in."""
        self.user_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_users,
                                               embedding_dim=self.hparams.embedding_dim)
        self.item_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_items,
                                               embedding_dim=self.hparams.embedding_dim)

    def forward(self, users, items):
        """
        Forward pass through the model.

        Parameters
        ----------
        users: tensor, 1-d
            Array of user indices
        items: tensor, 1-d
            Array of item indices

        Returns
        -------
        preds: tensor, 1-d
            Predicted scores

        """
        return torch.mul(
            self.user_embeddings(users), self.item_embeddings(items)
        ).sum(dim=1)


# assume you have ``train`` and ``val`` already defined and ready-to-go
model = SimpleModel(train, val, embedding_dim=10)

trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.eval()

# now, ``model`` is ready to be used for inference, evaluation, etc.

model.save_model('model.pkl')
See the source code for the ``BasePipeline`` in Model Templates below for the calling order of each class method, as well as initialization details for optimizers, schedulers, and more.
Standard Models¶
Matrix Factorization Model¶
- class collie.model.MatrixFactorizationModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', bias_optimizer: Optional[Union[str, torch.optim.Optimizer]] = 'sgd', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for the matrix factorization model.

``MatrixFactorizationModel`` models have an embedding layer for both users and items which are dot-producted together to output a single float ranking value.

Collie adds a twist onto this incredibly popular framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the ``bias_lr`` and ``bias_optimizer`` input arguments for implementation details.

All ``MatrixFactorizationModel`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, MatrixFactorizationModel

model = MatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
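For example, a sketch of opting into the separate bias optimizer behavior explicitly (the hyperparameter values here are illustrative, and ``train`` is assumed to be an ``Interactions`` object already defined):

from collie.model import CollieTrainer, MatrixFactorizationModel

# embeddings learn with a faster Adam optimizer while the bias terms
# crawl along with a slower SGD optimizer, as described above
model = MatrixFactorizationModel(
    train=train,
    embedding_dim=30,
    lr=1e-3,               # embedding learning rate
    optimizer='adam',      # embedding optimizer
    bias_lr=1e-2,          # bias-term learning rate
    bias_optimizer='sgd',  # bias-term optimizer
)
trainer = CollieTrainer(model)
trainer.fit(model)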
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

embedding_dim (int) – Number of latent factors to use for user and item embeddings

dropout_p (float) – Probability of dropout

sparse (bool) – Whether or not to treat embeddings as sparse tensors. If ``True``, cannot use weight decay on the optimizer

lr (float) – Model learning rate

bias_lr (float) – Bias terms learning rate. If 'infer', will set equal to ``lr``

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adagrad'`` (for ``torch.optim.Adagrad``)

``'adam'`` (for ``torch.optim.Adam``)

``'sparse_adam'`` (for ``torch.optim.SparseAdam``)

bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as ``optimizer``, with the addition of ``infer``, which will set the optimizer equal to ``optimizer``. If ``bias_optimizer`` is ``None``, only a single optimizer will be created for all model parameters

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss (a sketch follows this list). For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

y_range (tuple) – Specify as ``(min, max)`` to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of ``min`` and ``max``

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
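To illustrate the callable ``loss`` option described above, here is a minimal sketch of a custom implicit loss. The function name and margin value are assumptions for illustration, not part of the library:

import torch

def custom_hinge_loss(positive_scores, negative_scores, **kwargs):
    # a simple margin-based loss over positive and negative predictions;
    # the extra keyword arguments (num_items, positive_items, negative_items,
    # metadata, metadata_weights) are accepted but ignored in this sketch
    return torch.clamp(1.0 - positive_scores + negative_scores, min=0).mean()

model = MatrixFactorizationModel(train=train, val=val, loss=custom_hinge_loss)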
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model.
Simple matrix factorization for a single user and item looks like:
``prediction = (user_embedding * item_embedding) + user_bias + item_bias``
If dropout is added, it is applied to the two embeddings and not the biases.
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
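As a quick sketch of calling ``forward`` for inference after training (the use of ``model.hparams.num_items`` follows the custom-model example earlier on this page; the user ID is illustrative):

import torch

# score every item for a single user and take the ten highest-ranked items
user_id = 0
item_ids = torch.arange(model.hparams.num_items)
user_ids = torch.full_like(item_ids, user_id)

with torch.no_grad():
    scores = model(user_ids, item_ids)

top_ten_items = scores.topk(k=10).indices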
Multilayer Perceptron Matrix Factorization Model¶
- class collie.model.MLPMatrixFactorizationModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, embedding_dim: int = 30, num_layers: int = 3, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', bias_optimizer: Optional[Union[str, torch.optim.Optimizer]] = 'sgd', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for the matrix factorization model with MLP layers instead of a final dot product (like in ``MatrixFactorizationModel``).

``MLPMatrixFactorizationModel`` models have an embedding layer for both users and items, which are concatenated and sent through an MLP to output a single float ranking value.

All ``MLPMatrixFactorizationModel`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, MLPMatrixFactorizationModel

model = MLPMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MLPMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

embedding_dim (int) – Number of latent factors to use for user and item embeddings

num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))`` (a worked illustration follows this list)

dropout_p (float) – Probability of dropout on the linear layers

lr (float) – Model learning rate

bias_lr (float) – Bias terms learning rate. If 'infer', will set equal to ``lr``

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adam'`` (for ``torch.optim.Adam``)

bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as ``optimizer``, with the addition of ``infer``, which will set the optimizer equal to ``optimizer``. If ``bias_optimizer`` is ``None``, only a single optimizer will be created for all model parameters

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

y_range (tuple) – Specify as ``(min, max)`` to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of ``min`` and ``max``

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
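As a quick illustration of the ``num_layers`` sizing formula above (a sketch only; the assumption here is that ``current_layer_number`` counts up from 1):

embedding_dim, num_layers = 30, 3

# input dimension of each successive MLP layer, per the formula above
input_dims = [
    embedding_dim * (2 ** (num_layers - current_layer_number))
    for current_layer_number in range(1, num_layers + 1)
]
print(input_dims)  # [120, 60, 30]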
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model, roughly:
``prediction = MLP(concatenate(user_embedding, item_embedding)) + user_bias + item_bias``
If dropout is added, it is applied to the two embeddings and not the biases.
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
Nonlinear Embeddings Matrix Factorization Model¶
- class collie.model.NonlinearMatrixFactorizationModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, user_embedding_dim: int = 60, item_embedding_dim: int = 60, user_dense_layers_dims: List[float] = [48, 32], item_dense_layers_dims: List[float] = [48, 32], embedding_dropout_p: float = 0.0, dense_dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', bias_optimizer: Optional[Union[str, torch.optim.Optimizer]] = 'sgd', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for a nonlinear matrix factorization model.

``NonlinearMatrixFactorizationModel`` models have an embedding layer for users and items. These are sent through separate dense networks, which output more refined embeddings that are then dot-producted for a single float ranking / rating.

Collie adds a twist onto this novel framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the ``bias_lr`` and ``bias_optimizer`` input arguments for implementation details.

All ``NonlinearMatrixFactorizationModel`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, NonlinearMatrixFactorizationModel

model = NonlinearMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NonlinearMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

user_embedding_dim (int) – Number of latent factors to use for user embeddings

item_embedding_dim (int) – Number of latent factors to use for item embeddings

user_dense_layers_dims (list) – List of linear layer dimensions to apply to the user embedding, starting with the dimension directly following ``user_embedding_dim`` (see the sketch after this list)

item_dense_layers_dims (list) – List of linear layer dimensions to apply to the item embedding, starting with the dimension directly following ``item_embedding_dim``

embedding_dropout_p (float) – Probability of dropout on the embedding layers

dense_dropout_p (float) – Probability of dropout on the dense layers

lr (float) – Model learning rate

bias_lr (float) – Bias terms learning rate. If 'infer', will set equal to ``lr``

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adam'`` (for ``torch.optim.Adam``)

bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as ``optimizer``, with the addition of ``infer``, which will set the optimizer equal to ``optimizer``. If ``bias_optimizer`` is ``None``, only a single optimizer will be created for all model parameters

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

y_range (tuple) – Specify as ``(min, max)`` to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of ``min`` and ``max``

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
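With the default arguments above, the dense networks form the towers sketched below (a hedged illustration of the defaults, not additional API):

from collie.model import NonlinearMatrixFactorizationModel

# user tower: 60-dim embedding -> 48 -> 32; the item tower mirrors it,
# and the two 32-dim outputs are dot-producted into a single score
model = NonlinearMatrixFactorizationModel(
    train=train,
    user_embedding_dim=60,
    user_dense_layers_dims=[48, 32],
    item_embedding_dim=60,
    item_dense_layers_dims=[48, 32],
)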
Collaborative Metric Learning Model¶
- class collie.model.CollaborativeMetricLearningModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, embedding_dim: int = 30, sparse: bool = False, lr: float = 0.001, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for the collaborative metric learning model.

``CollaborativeMetricLearningModel`` models have an embedding layer for both users and items. A single float prediction is retrieved by taking the pairwise distance between the two embeddings.

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1803.00202.pdf [1]

All ``CollaborativeMetricLearningModel`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollaborativeMetricLearningModel, CollieTrainer

model = CollaborativeMetricLearningModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = CollaborativeMetricLearningModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

embedding_dim (int) – Number of latent factors to use for user and item embeddings

sparse (bool) – Whether or not to treat embeddings as sparse tensors. If ``True``, cannot use weight decay on the optimizer

lr (float) – Model learning rate

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adagrad'`` (for ``torch.optim.Adagrad``)

``'adam'`` (for ``torch.optim.Adam``)

``'sparse_adam'`` (for ``torch.optim.SparseAdam``)

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

y_range (tuple) – Specify as ``(min, max)`` to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of ``min`` and ``max``

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
References
- [1] Campo, Miguel, et al. “Collaborative Metric Learning Recommendation System: Application to Theatrical Movie Releases.” ArXiv.org, 1 Mar. 2018, arxiv.org/abs/1803.00202.
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model, equivalent to:
``prediction = pairwise_distance(user_embedding, item_embedding)``
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
Neural Collaborative Filtering (NeuCF)¶
- class collie.model.NeuralCollaborativeFiltering(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: Optional[Union[str, Callable[[torch.tensor], torch.tensor]]] = None, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for a neural matrix factorization model.

``NeuralCollaborativeFiltering`` models combine a collaborative filtering and multilayer perceptron network in a single, unified model. The model consists of two sections: the first is a simple matrix factorization that calculates a score by multiplying together user and item embeddings (lookups through an embedding table); the second is an MLP network fed embeddings from a second set of embedding tables (one for users, one for items). Both output vectors are combined and sent through a final MLP layer before returning a single recommendation score.

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1708.05031.pdf [2]

All ``NeuralCollaborativeFiltering`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, NeuralCollaborativeFiltering

model = NeuralCollaborativeFiltering(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NeuralCollaborativeFiltering(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula ``embedding_dim * (2 ** (num_layers - 1))``

num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))``

final_layer (str or function) – Final layer activation function (a sketch follows this list). Available string options include:

``'sigmoid'``

``'relu'``

``'leaky_relu'``

dropout_p (float) – Probability of dropout on the MLP layers

lr (float) – Model learning rate

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adam'`` (for ``torch.optim.Adam``)

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
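For instance, a hedged sketch pinning the output between 0 and 1 with a sigmoid final layer (the hyperparameter values are illustrative):

from collie.model import CollieTrainer, NeuralCollaborativeFiltering

# with these values, the MF embedding table is 8-dim and the MLP embedding
# table is 8 * (2 ** (3 - 1)) = 32-dim, per the formula above
model = NeuralCollaborativeFiltering(
    train=train,
    embedding_dim=8,
    num_layers=3,
    final_layer='sigmoid',
)
trainer = CollieTrainer(model)
trainer.fit(model)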
References
- [2] He, Xiangnan, et al. “Neural Collaborative Filtering.” Proceedings of the 26th International Conference on World Wide Web, 1 Apr. 2017, dl.acm.org/doi/10.1145/3038912.3052569.
Deep Factorization Machine (DeepFM)¶
- class collie.model.DeepFM(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: Optional[Union[str, Callable[..., Any]]] = None, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', bias_optimizer: Optional[Union[str, torch.optim.Optimizer]] = 'sgd', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for a deep factorization machine model.

``DeepFM`` models combine a shallow factorization machine and a deep multilayer perceptron network in a single, unified model. The model consists of embedding tables for users and items, and model output is the sum of 1) the factorization machine output of both embeddings (shallow) and 2) the MLP output for the concatenation of both embeddings (deep).

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1703.04247.pdf [3]

All ``DeepFM`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, DeepFM

model = DeepFM(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = DeepFM(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula ``embedding_dim * (2 ** (num_layers - 1))``

num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))``

final_layer (str or function) – Final layer activation function. Available string options include:

``'sigmoid'``

``'relu'``

``'leaky_relu'``

dropout_p (float) – Probability of dropout

lr (float) – Model learning rate

bias_lr (float) – Bias terms learning rate. If 'infer', will set equal to ``lr``

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adam'`` (for ``torch.optim.Adam``)

bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as ``optimizer``, with the addition of ``infer``, which will set the optimizer equal to ``optimizer``. If ``bias_optimizer`` is ``None``, only a single optimizer will be created for all model parameters

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
References
- [3] Guo, Huifeng, et al. “DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction.” ArXiv.org, 13 Mar. 2017, arxiv.org/abs/1703.04247.
Hybrid Models¶
Hybrid Pretrained Matrix Factorization Model¶
- class collie.model.HybridPretrainedModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, item_metadata: Optional[Union[torch.tensor, pd.DataFrame, np.array]] = None, user_metadata: Optional[Union[torch.tensor, pd.DataFrame, np.array]] = None, trained_model: Optional[MatrixFactorizationModel] = None, item_metadata_layers_dims: Optional[List[int]] = None, user_metadata_layers_dims: Optional[List[int]] = None, combined_layers_dims: List[int] = [128, 64, 32], freeze_embeddings: bool = True, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, torch.optim.Optimizer] = 'adam', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases: ``BasePipeline``

Training pipeline for a hybrid recommendation model using a pre-trained matrix factorization model as its base.

``HybridPretrainedModel`` models contain dense layers that process item and/or user metadata, concatenate this embedding with the user and item embeddings copied from a trained ``MatrixFactorizationModel``, and send this concatenated embedding through more dense layers to output a single float ranking / rating. We add both user and item biases to this score before returning. This is the same architecture as the ``HybridModel``, but we are using the embeddings from a pre-trained model rather than training them up ourselves.

All ``HybridPretrainedModel`` instances are subclasses of the ``LightningModule`` class provided by PyTorch Lightning. This means that to train a model, you will need a ``collie.model.CollieTrainer`` object, but the model can be saved and loaded without this ``Trainer`` instance. Example usage may look like:

from collie.model import CollieTrainer, HybridPretrainedModel, MatrixFactorizationModel

# instantiate and fit a ``MatrixFactorizationModel`` as expected
mf_model = MatrixFactorizationModel(train=train)
mf_trainer = CollieTrainer(mf_model)
mf_trainer.fit(mf_model)

hybrid_model = HybridPretrainedModel(train=train,
                                     item_metadata=item_metadata,
                                     user_metadata=user_metadata,
                                     trained_model=mf_model)
hybrid_trainer = CollieTrainer(hybrid_model)
hybrid_trainer.fit(hybrid_model)
hybrid_model.eval()

# do evaluation as normal with ``hybrid_model``

hybrid_model.save_model(path='model')
new_hybrid_model = HybridPretrainedModel(load_model_path='model')

# do evaluation as normal with ``new_hybrid_model``
- Parameters

train (``collie.interactions`` object) – Data loader for training data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=True``

val (``collie.interactions`` object) – Data loader for validation data. If an ``Interactions`` object is supplied, an ``InteractionsDataLoader`` will automatically be instantiated with ``shuffle=False``

item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item’s metadata should be available when indexing a row by an item ID (see the sketch after this list)

user_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the user metadata should be (num_users x metadata_features), and each user’s metadata should be available when indexing a row by a user ID

trained_model (``collie.model.MatrixFactorizationModel``) – Previously-trained ``MatrixFactorizationModel`` to extract embeddings from

item_metadata_layers_dims (list) – List of linear layer dimensions to apply to the item metadata only, starting with the dimension directly following ``metadata_features`` and ending with the dimension to concatenate with the item embeddings

user_metadata_layers_dims (list) – List of linear layer dimensions to apply to the user metadata only, starting with the dimension directly following ``metadata_features`` and ending with the dimension to concatenate with the user embeddings

combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of ``item_embeddings + metadata_features`` and ending with the dimension just before the final linear layer, which outputs dimension 1

freeze_embeddings (bool) – When initializing the model, whether or not to freeze ``trained_model``’s embeddings

dropout_p (float) – Probability of dropout

lr (float) – Model learning rate

lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

optimizer (torch.optim or str) – If a string, one of the following supported optimizers:

``'sgd'`` (for ``torch.optim.SGD``)

``'adam'`` (for ``torch.optim.Adam``)

loss (function or str) – If a string, one of the following implemented losses:

``'bpr'`` / ``'adaptive_bpr'`` (implicit data)

``'hinge'`` / ``'adaptive_hinge'`` (implicit data)

``'warp'`` (implicit data)

``'mse'`` (explicit data)

``'mae'`` (explicit data)

For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the losses above will automatically be used (except for WARP loss, which is only adaptive by nature).

If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

metadata_for_loss_weights (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:

a 100% match if it’s the same item,

a 50% match if it’s a different item with the same genre and same director,

a 30% match if it’s a different item with the same genre and different director,

a 20% match if it’s a different item with a different genre and same director,

a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, will initialize model as normal

map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
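A minimal sketch of the expected metadata shapes (the dimensions and random values here are purely illustrative):

import numpy as np
import pandas as pd

num_items, num_users, metadata_features = 1000, 500, 12

# row ``i`` holds the metadata for item ID ``i`` -> shape (num_items, metadata_features)
item_metadata = pd.DataFrame(np.random.rand(num_items, metadata_features))

# row ``i`` holds the metadata for user ID ``i`` -> shape (num_users, metadata_features)
user_metadata = pd.DataFrame(np.random.rand(num_users, metadata_features))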
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model.
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
- load_from_hybrid_model(hybrid_model) → None [source]¶
Copy hyperparameters and state dictionary from an existing ``HybridPretrainedModel`` instance.

This is particularly useful for creating another PyTorch Lightning trainer object to fine-tune copied-over embeddings from a ``MatrixFactorizationModel`` instance.

- Parameters

hybrid_model (``collie.model.HybridPretrainedModel``) – HybridPretrainedModel containing hyperparameters and state dictionary to copy over
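A hedged usage sketch (the variable names are assumptions carried over from the example above):

# create a fresh model with the same data and pre-trained base, then copy
# hyperparameters and weights over from the already-trained hybrid model
new_hybrid_model = HybridPretrainedModel(train=train,
                                         item_metadata=item_metadata,
                                         user_metadata=user_metadata,
                                         trained_model=mf_model)
new_hybrid_model.load_from_hybrid_model(hybrid_model)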
- save_model(path: Union[str, Path] = 'data/model', overwrite: bool = False) → None [source]¶
Save the model’s state dictionary, hyperparameters, and user and/or item metadata.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding it here:

To properly save and load a model requires the ``Trainer`` object, meaning that all deployed models would require Lightning to run the model, which is not actually needed for inference.

In the v0.8.4 release, loading a model back in leads to a ``RuntimeError`` where it is unable to load in weights.

- Parameters

path (str or Path) – Directory path to save model and data files

overwrite (bool) – Whether or not to overwrite existing data
Multi-Stage Models¶
Cold Start Matrix Factorization Model¶
- class collie.model.ColdStartModel(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, item_buckets: Optional[Iterable[int]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, item_buckets_stage_lr: float = 0.001, no_buckets_stage_lr: float = 0.001, lr_scheduler_func: Optional[torch.optim.lr_scheduler._LRScheduler] = functools.partial(torch.optim.lr_scheduler.ReduceLROnPlateau, patience=1, verbose=False), weight_decay: float = 0.0, item_buckets_stage_optimizer: Union[str, torch.optim.Optimizer] = 'adam', no_buckets_stage_optimizer: Union[str, torch.optim.Optimizer] = 'adam', loss: Union[str, Callable[..., torch.tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Bases:
MultiStagePipeline
Training pipeline for a matrix factorization model optimized for the cold-start problem.
Many recommendation models suffer from the cold start problem, when a model is unable to provide adequate recommendations for a new item until enough users have interacted with it. But, if users only interact with recommended items, the item will never be recommended, and thus the model will never improve recommendations for this item.
The ColdStartModel attempts to bypass this by limiting the item space down to "item buckets", training a model on this as the item space, then expanding out to all items. During this expansion, the learned embedding of each bucket is copied over to each corresponding item, providing a smarter initialization than a random one for both existing and new items. Now, when we have a new item, we can use its bucket embedding as an initialization into the model.
The stages in a ColdStartModel are, in order:
item_buckets
Matrix factorization with item embeddings and bias terms bucketed by the item_buckets argument. Unlike in the next stage, many items may map onto a single bucket, and these items will share the same embedding and bias representation. The model should learn user preferences for buckets in this stage.
no_buckets
Standard matrix factorization, as in MatrixFactorizationModel. However, upon advancing to this stage, the item embeddings are initialized with their bucketed embedding values (and likewise for biases). Not only does this provide a better initialization than a random one, it also allows new items to be incorporated into the model without training by using their item bucket embedding and bias terms at prediction time.
Note that the cold start problem exists for new users as well, but this functionality will be added to this model in a future version.
All ColdStartModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means that, to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:
from collie.model import ColdStartModel, CollieTrainer

# instantiate and fit a ``ColdStartModel`` as expected
model = ColdStartModel(train=train, item_buckets=item_buckets)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``no_buckets``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

# get item-item recommendations for a new item by using the bucket ID, Z
similar_items = model.item_bucket_item_similarity(item_bucket_id=Z)

model.save_model(filename='model.pth')
new_model = ColdStartModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
- Parameters
train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True
val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False
item_buckets (torch.tensor, 1-d) –
An ordered iterable containing the bucket ID for each item ID. For example, if you have five films and are going to bucket by primary genre, and your data looks like:
Item ID: 0, Genre ID: 1
Item ID: 1, Genre ID: 0
Item ID: 2, Genre ID: 2
Item ID: 3, Genre ID: 2
Item ID: 4, Genre ID: 1
Then item_buckets would be: [1, 0, 2, 2, 1]
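A small construction sketch for this argument, assuming a hypothetical pandas DataFrame with one primary genre ID per item:
import pandas as pd

# hypothetical metadata: one primary genre ID per item
df = pd.DataFrame({'item_id': [0, 1, 2, 3, 4],
                   'genre_id': [1, 0, 2, 2, 1]})

# order by item ID so that position ``i`` holds the bucket for item ``i``
item_buckets = df.sort_values('item_id')['genre_id'].tolist()

assert item_buckets == [1, 0, 2, 2, 1]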
embedding_dim (int) – Number of latent factors to use for user and item embeddings
dropout_p (float) – Probability of dropout
item_buckets_stage_lr (float) – Learning rate for user parameters and item bucket parameters optimized during the item_buckets stage
no_buckets_stage_lr (float) – Learning rate for user parameters and item parameters optimized during the no_buckets stage
item_buckets_stage_optimizer (torch.optim or str) – Optimizer used for user parameters and item bucket parameters optimized during the item_buckets stage. If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adam' (for torch.optim.Adam)
no_buckets_stage_optimizer (torch.optim or str) – Optimizer used for user parameters and item parameters optimized during the no_buckets stage. If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adam' (for torch.optim.Adam)
lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting
weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits
loss (function or str) –
If a string, one of the following implemented losses:
'bpr' / 'adaptive_bpr' (implicit data)
'hinge' / 'adaptive_hinge' (implicit data)
'warp' (implicit data)
'mse' (explicit data)
'mae' (explicit data)
For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).
If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order. A hedged custom-loss sketch appears after this parameter list.
metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)
metadata_for_loss_weights (dict) –
Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:
a 100% match if it's the same item,
a 50% match if it’s a different item with the same genre and same director,
a 30% match if it’s a different item with the same genre and different director,
a 20% match if it’s a different item with a different genre and same director,
a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit
load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal
map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary
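A minimal sketch of a custom implicit loss matching the callable signature described under loss above; the hinge margin here is an illustrative choice, not a library default:
import torch

def custom_hinge_loss(positive_scores, negative_scores, **kwargs):
    # ``kwargs`` may include ``num_items``, ``positive_items``,
    # ``negative_items``, ``metadata``, and ``metadata_weights`` (unused here)
    margin = 1.0
    return torch.clamp(negative_scores - positive_scores + margin, min=0).mean()

# model = ColdStartModel(train=train, item_buckets=item_buckets, loss=custom_hinge_loss)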
Notes
The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating, saving, and loading models.
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model.
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
- item_bucket_item_similarity(item_bucket_id: int) Series [source]¶
Get most similar item indices to an item bucket by cosine similarity.
Cosine similarity is computed with item and item bucket embeddings from a trained model.
- Parameters
item_bucket_id (int) –
- Returns
sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID
- Return type
pd.Series
Hybrid Matrix Factorization Model¶
- class collie.model.HybridModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, user_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, embedding_dim: int = 30, item_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, user_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, combined_layers_dims: ~typing.List[int] = [128, 64, 32], dropout_p: float = 0.0, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, metadata_only_stage_lr: float = 0.001, all_stage_lr: float = 0.0001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=False), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', metadata_only_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', all_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]¶
Bases:
MultiStagePipeline
Training pipeline for a multi-stage hybrid recommendation model.
HybridModel models contain dense layers that process item and/or user metadata, concatenate this embedding with the user and item embeddings, and send the concatenated embedding through more dense layers to output a single float ranking / rating. We add both user and item biases to this score before returning. This is the same architecture as the HybridPretrainedModel, but here we train the embeddings ourselves rather than pulling them from a pre-trained model.
The stages in a HybridModel depend on whether both item and user metadata are used. For the full model, they are, in order:
matrix_factorization
Matrix factorization exactly as in MatrixFactorizationModel. In this stage, metadata is NOT incorporated into the model.
metadata_only
User and item embedding terms are frozen, and the MLP layers for the metadata (if specified) and for the combined embedding-metadata data are optimized.
all
Embedding and MLP layers are all optimized together, including those for metadata.
All HybridModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means that, to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:
from collie.model import CollieTrainer, HybridModel

# instantiate and fit a ``HybridModel`` as expected
model = HybridModel(train=train,
                    item_metadata=item_metadata,
                    user_metadata=user_metadata)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``metadata_only``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

# train for Y more epochs on the next stage, ``all``
trainer.max_epochs += Y
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

model.save_model(path='model')
new_model = HybridModel(load_model_path='model')

# do evaluation as normal with ``new_model``
- Parameters
train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True
val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False
item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item's metadata should be available when indexing a row by an item ID
user_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the user metadata should be (num_users x metadata_features), and each user's metadata should be available when indexing a row by a user ID. A construction sketch appears after this parameter list
embedding_dim (int) – Number of latent factors to use for user and item embeddings
item_metadata_layers_dims (list) – List of linear layer dimensions to apply to the item metadata only, starting with the dimension directly following item_metadata_features and ending with the dimension to concatenate with the item embeddings
user_metadata_layers_dims (list) – List of linear layer dimensions to apply to the user metadata only, starting with the dimension directly following user_metadata_features and ending with the dimension to concatenate with the user embeddings
combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of item_embeddings + metadata_features and ending with the dimension before the final linear layer to dimension 1
dropout_p (float) – Probability of dropout
metadata_only_stage_lr (float) – Learning rate for metadata and combined layers optimized during the metadata_only stage
all_stage_lr (float) – Learning rate for all model parameters optimized during the all stage
lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting
weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits
optimizer (torch.optim or str) –
Optimizer used for embeddings and bias terms (if bias_optimizer is None) during the matrix_factorization stage. If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adam' (for torch.optim.Adam)
metadata_only_stage_optimizer (torch.optim or str) –
Optimizer used for metadata and combined layers during the metadata_only stage. If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adam' (for torch.optim.Adam)
all_stage_optimizer (torch.optim or str) –
Optimizer used for all model parameters during the all stage. If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adam' (for torch.optim.Adam)
loss (function or str) –
If a string, one of the following implemented losses:
'bpr' / 'adaptive_bpr' (implicit data)
'hinge' / 'adaptive_hinge' (implicit data)
'warp' (implicit data)
'mse' (explicit data)
'mae' (explicit data)
For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).
If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.
metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)
metadata_for_loss_weights (dict) –
Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:
a 100% match if it's the same item,
a 50% match if it’s a different item with the same genre and same director,
a 30% match if it’s a different item with the same genre and different director,
a 20% match if it’s a different item with a different genre and same director,
a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit
load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal
map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary
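A quick construction sketch for the metadata arguments above; the shapes and random values are made up for illustration:
import numpy as np

num_items, num_users = 100, 500

# rows are indexed by item / user ID; columns are metadata features
item_metadata = np.random.rand(num_items, 4)   # (num_items x metadata_features)
user_metadata = np.random.rand(num_users, 3)   # (num_users x metadata_features)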
Notes
The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating, saving, and loading models.
- forward(users: tensor, items: tensor) → tensor [source]¶
Forward pass through the model.
- Parameters
users (tensor, 1-d) – Array of user indices
items (tensor, 1-d) – Array of item indices
- Returns
preds – Predicted ratings or rankings
- Return type
tensor, 1-d
- save_model(path: Union[str, Path] = 'data/model', overwrite: bool = False) None [source]¶
Save the model’s state dictionary, hyperparameters, and user and/or item metadata.
While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding it here:
Properly saving and loading a model requires the Trainer object, meaning that all deployed models would require Lightning to run the model, which is not actually needed for inference.
In the v0.8.4 release, loading a model back in leads to a RuntimeError from being unable to load in the weights.
- Parameters
path (str or Path) – Directory path to save model and data files
overwrite (bool) – Whether or not to overwrite existing data
Trainers¶
PyTorch Lightning Trainer¶
- class collie.model.CollieTrainer(model: Module, max_epochs: int = 10, benchmark: bool = True, deterministic: bool = True, **kwargs)[source]¶
Bases:
Trainer
Helper wrapper class around PyTorch Lightning's Trainer class.
Specifically, this wrapper:
Checks if a model has a validation dataset passed in (under the val_loader attribute) and, if not, sets num_sanity_val_steps to 0 and check_val_every_n_epoch to sys.maxsize.
Checks if a GPU is available and, if gpus is None, sets gpus = -1.
See the pytorch_lightning.Trainer documentation for more details at: https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api
Compared with CollieMinimalTrainer, PyTorch Lightning's Trainer offers more flexibility and room for exploration (callbacks, automatic Lightning optimizations, etc.), at the cost of a higher training time, which is especially true for larger models. We recommend starting all model exploration with this CollieTrainer, finding a set of hyperparameters that work for your training job, then using those in the simpler but faster CollieMinimalTrainer.
- Parameters
model (collie.model.BasePipeline) – Initialized Collie model
max_epochs (int) – Stop training once this number of epochs is reached
benchmark (bool) – If set to True, enables cudnn.benchmark
deterministic (bool) – If set to True, enables cudnn.deterministic
**kwargs (keyword arguments) – Additional keyword arguments to be sent to the Trainer class: https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api
The original pytorch_lightning.Trainer docstring follows:
Customize every aspect of training via flags.
Args:
- accelerator: Supports passing different accelerator types ("cpu", "gpu", "tpu", "ipu", "hpu", "mps", "auto") as well as custom accelerator instances.
- accumulate_grad_batches: Accumulates grads every k batches or as set up in the dict. Default: None.
- amp_backend: The mixed precision backend to use ("native" or "apex"). Default: 'native'.
  Deprecated since version v1.9: Setting amp_backend inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0. This argument was only relevant for apex, which is being removed.
- amp_level: The optimization level to use (O1, O2, etc.). By default it will be set to "O2" if amp_backend is set to "apex".
  Deprecated since version v1.8: Setting amp_level inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0.
- auto_lr_find: If set to True, will make trainer.tune() run a learning rate finder, trying to optimize initial learning for faster convergence. The trainer.tune() method will set the suggested learning rate in self.lr or self.learning_rate in the LightningModule. To use a different key, set a string instead of True with the key name. Default: False.
- auto_scale_batch_size: If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory. The result will be stored in self.batch_size in the LightningModule or LightningDataModule depending on your setup. Additionally, can be set to either "power", which estimates the batch size through a power search, or "binsearch", which estimates the batch size through a binary search. Default: False.
- auto_select_gpus: If enabled and gpus or devices is an integer, pick available gpus automatically. This is especially useful when GPUs are configured to be in "exclusive mode", such that only one process at a time can access them. Default: False.
  Deprecated since version v1.9: auto_select_gpus has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function find_usable_cuda_devices() instead.
- benchmark: The value (True or False) to set torch.backends.cudnn.benchmark to. The value for torch.backends.cudnn.benchmark set in the current session will be used (False if not manually set). If deterministic is set to True, this will default to False. Override to manually set a different value. Default: None.
- callbacks: Add a callback or list of callbacks. Default: None.
- enable_checkpointing: If True, enable checkpointing. It will configure a default ModelCheckpoint callback if there is no user-defined ModelCheckpoint in callbacks. Default: True.
- check_val_every_n_epoch: Perform a validation loop after every N training epochs. If None, validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.
- default_root_dir: Default path for logs and weights when no logger/ckpt_callback passed. Default: os.getcwd(). Can be remote file paths such as s3://mybucket/path or 'hdfs://path/'.
- detect_anomaly: Enable anomaly detection for the autograd engine. Default: False.
- deterministic: If True, sets whether PyTorch operations must use deterministic algorithms. Set to "warn" to use deterministic algorithms whenever possible, throwing warnings on operations that don't support deterministic mode (requires PyTorch 1.11+). If not set, defaults to False. Default: None.
- devices: Will be mapped to either gpus, tpu_cores, num_processes or ipus, based on the accelerator type.
- fast_dev_run: Runs n (if set to n, an int) or 1 (if set to True) batch(es) of train, val and test to find any bugs (i.e. a sort of unit test). Default: False.
- gpus: Number of GPUs to train on (int) or which GPUs to train on (list or str), applied per node. Default: None.
  Deprecated since version v1.7: gpus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='gpu' and devices=x instead.
- gradient_clip_val: The value at which to clip gradients. Passing gradient_clip_val=None disables gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before. Default: None.
- gradient_clip_algorithm: The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value" to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".
- limit_train_batches: How much of training dataset to check (float = fraction, int = num_batches). Default: 1.0.
- limit_val_batches: How much of validation dataset to check (float = fraction, int = num_batches). Default: 1.0.
- limit_test_batches: How much of test dataset to check (float = fraction, int = num_batches). Default: 1.0.
- limit_predict_batches: How much of prediction dataset to check (float = fraction, int = num_batches). Default: 1.0.
- logger: Logger (or iterable collection of loggers) for experiment tracking. A True value uses the default TensorBoardLogger if it is installed, otherwise CSVLogger. False will disable logging. If multiple loggers are provided, local files (checkpoints, profiler traces, etc.) are saved in the log_dir of the first logger. Default: True.
- log_every_n_steps: How often to log within steps. Default: 50.
- enable_progress_bar: Whether to enable the progress bar by default. Default: True.
- profiler: To profile individual steps during training and assist in identifying bottlenecks. Default: None.
- overfit_batches: Overfit a fraction of training/validation data (float) or a set number of batches (int). Default: 0.0.
- plugins: Plugins allow modification of core behavior like ddp and amp, and enable custom lightning plugins. Default: None.
- precision: Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16). Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: 32.
- max_epochs: Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.
- min_epochs: Force training for at least this many epochs. Disabled by default (None).
- max_steps: Stop training after this number of steps. Disabled by default (-1). If max_steps = -1 and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.
- min_steps: Force training for at least this number of steps. Disabled by default (None).
- max_time: Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes, seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.
- num_nodes: Number of GPU nodes for distributed training. Default: 1.
- num_processes: Number of processes for distributed training with accelerator="cpu". Default: 1.
  Deprecated since version v1.7: num_processes has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='cpu' and devices=x instead.
- num_sanity_val_steps: Sanity check runs n validation batches before starting the training routine. Set it to -1 to run all batches in all validation dataloaders. Default: 2.
- reload_dataloaders_every_n_epochs: Set to a non-negative integer to reload dataloaders every n epochs. Default: 0.
- replace_sampler_ddp: Explicitly enables or disables sampler replacement. If not specified, this will be toggled automatically when DDP is used. By default it will add shuffle=True for the train sampler and shuffle=False for the val/test sampler. If you want to customize it, you can set replace_sampler_ddp=False and add your own distributed sampler.
- resume_from_checkpoint: Path/URL of the checkpoint from which training is resumed. If there is no checkpoint file at the path, an exception is raised. If resuming from a mid-epoch checkpoint, training will start from the beginning of the next epoch.
  Deprecated since version v1.5: resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Please pass the path to Trainer.fit(..., ckpt_path=...) instead.
- strategy: Supports different training strategies with aliases as well as custom strategies. Default: None.
- sync_batchnorm: Synchronize batch norm layers between process groups/whole world. Default: False.
- tpu_cores: How many TPU cores to train on (1 or 8) / Single TPU to train on (1). Default: None.
  Deprecated since version v1.7: tpu_cores has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='tpu' and devices=x instead.
- ipus: How many IPUs to train on. Default: None.
  Deprecated since version v1.7: ipus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='ipu' and devices=x instead.
- track_grad_norm: -1 for no tracking. Otherwise tracks that p-norm. May be set to 'inf' for the infinity-norm. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before logging them. Default: -1.
- val_check_interval: How often to check the validation set. Pass a float in the range [0.0, 1.0] to check after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.
- enable_model_summary: Whether to enable model summarization by default. Default: True.
- move_metrics_to_cpu: Whether to force internal logged metrics to be moved to cpu. This can save some gpu memory, but can make training slower. Use with attention. Default: False.
- multiple_trainloader_mode: How to loop over the datasets when there are multiple train loaders. In 'max_size_cycle' mode, the trainer ends one epoch when the largest dataset is traversed, and smaller datasets reload when running out of their data. In 'min_size' mode, all the datasets reload when reaching the minimum length of datasets. Default: "max_size_cycle".
- inference_mode: Whether to use torch.inference_mode() or torch.no_grad() during evaluation (validate / test / predict).
- property checkpoint_callback: Optional[Checkpoint]¶
The first ModelCheckpoint callback in the Trainer.callbacks list, or None if it doesn't exist.
- property checkpoint_callbacks: List[Checkpoint]¶
A list of all instances of
ModelCheckpoint
found in the Trainer.callbacks list.
- property ckpt_path: Optional[str]¶
Set to the path/URL of a checkpoint loaded via fit(), validate(), test(), or predict(). None otherwise.
- property current_epoch: int¶
The current epoch, updated after the epoch end hooks are run.
- property default_root_dir: str¶
The default location to save artifacts of loggers, checkpoints etc.
It is used as a fallback if logger or checkpoint callback do not define specific save paths.
- property device_ids: List[int]¶
List of device indexes per node.
- property early_stopping_callback: Optional[EarlyStopping]¶
The first EarlyStopping callback in the Trainer.callbacks list, or None if it doesn't exist.
- property early_stopping_callbacks: List[EarlyStopping]¶
A list of all instances of
EarlyStopping
found in the Trainer.callbacks list.
- property enable_validation: bool¶
Check if we should run validation during training.
- property estimated_stepping_batches: Union[int, float]¶
Estimated stepping batches for the complete training inferred from DataLoaders, gradient accumulation factor and distributed setup.
Examples:
def configure_optimizers(self):
    optimizer = ...
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=1e-3, total_steps=self.trainer.estimated_stepping_batches
    )
    return [optimizer], [scheduler]
- fit(model: LightningModule, train_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], Sequence[Sequence[DataLoader]], Sequence[Dict[str, DataLoader]], Dict[str, DataLoader], Dict[str, Dict[str, DataLoader]], Dict[str, Sequence[DataLoader]], LightningDataModule]] = None, val_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, datamodule: Optional[LightningDataModule] = None, ckpt_path: Optional[str] = None) None ¶
Runs the full optimization routine.
- Parameters
model – Model to fit.
train_dataloaders – A collection of torch.utils.data.DataLoader or a LightningDataModule specifying training samples. In the case of multiple dataloaders, please see the PyTorch Lightning documentation.
val_dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
ckpt_path – Path/URL of the checkpoint from which training is resumed. Could also be one of two special keywords, "last" and "hpc". If there is no checkpoint file at the path, an exception is raised. If resuming from a mid-epoch checkpoint, training will start from the beginning of the next epoch.
datamodule – An instance of LightningDataModule.
- property global_step: int¶
The number of optimizer steps taken (does not reset each epoch).
This includes multiple optimizers and TBPTT steps (if enabled).
- property is_last_batch: bool¶
Whether trainer is executing the last batch.
- property max_epochs¶
Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.
- property model: Optional[Module]¶
The LightningModule, but possibly wrapped into DataParallel or DistributedDataParallel.
To access the pure LightningModule, use
lightning_module()
instead.
- property num_devices: int¶
Number of devices the trainer uses per node.
- predict(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, datamodule: Optional[LightningDataModule] = None, return_predictions: Optional[bool] = None, ckpt_path: Optional[str] = None) Optional[Union[List[Any], List[List[Any]]]] ¶
Run inference on your data. This will call the model forward function to compute predictions. Useful to perform distributed and batched predictions. Logging is disabled in the predict hooks.
- Parameters
model – The model to predict with.
dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying prediction samples.
datamodule – The datamodule with a predict_dataloader method that returns one or more dataloaders.
return_predictions – Whether to return predictions. True by default except when an accelerator that spawns processes is used (not supported).
ckpt_path – Either "best", "last", "hpc", or the path to the checkpoint you wish to predict with. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.
- Returns
Returns a list of dictionaries, one for each provided dataloader containing their respective predictions.
See Lightning inference section for more.
- property prediction_writer_callbacks: List[BasePredictionWriter]¶
A list of all instances of
BasePredictionWriter
found in the Trainer.callbacks list.
- property progress_bar_callback: Optional[ProgressBarBase]¶
An instance of ProgressBarBase found in the Trainer.callbacks list, or None if one doesn't exist.
- reset_predict_dataloader(model: Optional[LightningModule] = None) None ¶
Resets the predict dataloader and determines the number of batches.
- Parameters
model – The
LightningModule
if called outside of the trainer scope.
- reset_test_dataloader(model: Optional[LightningModule] = None) None ¶
Resets the test dataloader and determines the number of batches.
- Parameters
model – The
LightningModule
if called outside of the trainer scope.
- reset_train_dataloader(model: Optional[LightningModule] = None) None ¶
Resets the train dataloader and initialises required variables (number of batches, when to validate, etc.).
- Parameters
model – The
LightningModule
if calling this outside of the trainer scope.
- reset_val_dataloader(model: Optional[LightningModule] = None) None ¶
Resets the validation dataloader and determines the number of batches.
- Parameters
model – The
LightningModule
if called outside of the trainer scope.
- save_checkpoint(filepath: Union[str, Path], weights_only: bool = False, storage_options: Optional[Any] = None) None ¶
Runs routine to create a checkpoint.
- Parameters
filepath – Path where checkpoint is saved.
weights_only – If True, will only save the model weights.
storage_options – Parameter for how to save to storage, passed to the CheckpointIO plugin.
- test(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, ckpt_path: Optional[str] = None, verbose: bool = True, datamodule: Optional[LightningDataModule] = None) List[Dict[str, float]] ¶
Perform one evaluation epoch over the test set. It’s separated from fit to make sure you never run on your test set until you want to.
- Parameters
model – The model to test.
dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying test samples.
ckpt_path – Either "best", "last", "hpc", or the path to the checkpoint you wish to test. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.
verbose – If True, prints the test results.
datamodule – An instance of LightningDataModule.
- Returns
List of dictionaries with metrics logged during the test phase, e.g., in model- or callback hooks like test_step(), test_epoch_end(), etc. The length of the list corresponds to the number of test dataloaders used.
- tune(model: LightningModule, train_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], Sequence[Sequence[DataLoader]], Sequence[Dict[str, DataLoader]], Dict[str, DataLoader], Dict[str, Dict[str, DataLoader]], Dict[str, Sequence[DataLoader]], LightningDataModule]] = None, val_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, datamodule: Optional[LightningDataModule] = None, scale_batch_size_kwargs: Optional[Dict[str, Any]] = None, lr_find_kwargs: Optional[Dict[str, Any]] = None, method: Literal['fit', 'validate', 'test', 'predict'] = 'fit') _TunerResult ¶
Runs routines to tune hyperparameters before training.
- Parameters
model – Model to tune.
train_dataloaders – A collection of torch.utils.data.DataLoader or a LightningDataModule specifying training samples. In the case of multiple dataloaders, please see the PyTorch Lightning documentation.
val_dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying validation samples.
dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying val/test/predict samples used for running the tuner on validation/testing/prediction.
datamodule – An instance of LightningDataModule.
scale_batch_size_kwargs – Arguments for scale_batch_size()
lr_find_kwargs – Arguments for lr_find()
method – Method to run the tuner on. It can be any of ("fit", "validate", "test", "predict").
- validate(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, ckpt_path: Optional[str] = None, verbose: bool = True, datamodule: Optional[LightningDataModule] = None) List[Dict[str, float]] ¶
Perform one evaluation epoch over the validation set.
- Parameters
model – The model to validate.
dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying validation samples.
ckpt_path – Either "best", "last", "hpc", or the path to the checkpoint you wish to validate. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.
verbose – If True, prints the validation results.
datamodule – An instance of LightningDataModule.
- Returns
List of dictionaries with metrics logged during the validation phase, e.g., in model- or callback hooks like validation_step(), validation_epoch_end(), etc. The length of the list corresponds to the number of validation dataloaders used.
Non-PyTorch Lightning Trainer¶
- class collie.model.CollieMinimalTrainer(model: BasePipeline, max_epochs: int = 10, gpus: Optional[Union[bool, int]] = None, logger: Optional[Logger] = None, early_stopping_patience: Optional[int] = 3, log_every_n_steps: int = 50, flush_logs_every_n_steps: int = 100, enable_model_summary: bool = True, weights_summary: Optional[str] = None, detect_anomaly: bool = False, terminate_on_nan: Optional[bool] = None, benchmark: bool = True, deterministic: bool = True, progress_bar_refresh_rate: Optional[int] = None, verbosity: Union[bool, int] = True)[source]¶
A more manual implementation of PyTorch Lightning's Trainer class, attempting to port over the most commonly used Trainer arguments into a training loop with more transparency and faster training times.
Through extensive experimentation, we found that PyTorch Lightning's Trainer was training Collie models about 25% slower than the more manual, typical PyTorch training loop boilerplate. Thus, we created the CollieMinimalTrainer, which shares a similar API to PyTorch Lightning's Trainer object (both in instantiation and in usage), with a standard PyTorch training loop in its place.
While PyTorch Lightning's Trainer offers more flexibility and customization through its additional Trainer arguments and callbacks, we designed this class as a way to train a model in production, where we might be more focused on faster training times and less on the hyperparameter tuning and R&D for which one might instead opt to use PyTorch Lightning's Trainer class.
Note that the arguments the CollieMinimalTrainer accepts are slightly different than the ones the CollieTrainer accepts, and defaults are also not guaranteed to be equal as the two libraries evolve. Notable changes are:
If gpus > 1, only a single GPU will be used and any other GPUs will remain unused. Multi-GPU training is not supported in CollieMinimalTrainer at this time.
logger == True has no meaning in CollieMinimalTrainer - a default logger will NOT be created if set to True.
There is no way to pass in callbacks at this time. Instead, we implement the most-used ones during training here, manually, in favor of greater speed over customization. To use early stopping, set early_stopping_patience to an integer other than None.
from collie.model import CollieMinimalTrainer, MatrixFactorizationModel

# notice how similar the usage is to the standard ``CollieTrainer``
model = MatrixFactorizationModel(train=train)
trainer = CollieMinimalTrainer(model)
trainer.fit(model)
Model results should NOT be significantly different whether trained with CollieTrainer or CollieMinimalTrainer.
If there's an argument you would like to see added to CollieMinimalTrainer that is present in CollieTrainer and used during productionalized model training, make an Issue or a PR on GitHub!
- Parameters
model (collie.model.BasePipeline) – Initialized Collie model
max_epochs (int) – Stop training once this number of epochs is reached
gpus (bool or int) – Whether to train on the GPU (gpus == True or gpus > 0) or the CPU
logger (LightningLoggerBase) – Logger for experiment tracking. Set logger = None or logger = False to disable logging
early_stopping_patience (int) – Number of epochs of patience to have without any improvement in loss before stopping training early. Validation epoch loss will be used if there is a validation DataLoader present, else training epoch loss will be used. Set early_stopping_patience = None or early_stopping_patience = False to disable early stopping
log_every_n_steps (int) – How often to log within steps, if logger is enabled
flush_logs_every_n_steps (int) – How often to flush logs to disk, if logger is enabled
enable_model_summary (bool) – Whether to enable or disable the model summarization
weights_summary (str) – Deprecated, replaced with enable_model_summary. Prints a summary of the weights when training begins
detect_anomaly (bool) –
Context manager that enables anomaly detection for the autograd engine. This does two things:
Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function.
Any backward computation that generates a "nan" value will raise an error.
Warning: This mode should be enabled only for debugging as the different tests will slow down your program execution.
terminate_on_nan (bool) – Deprecated, replaced with detect_anomaly. If set to True, will terminate training (by raising a ValueError) at the end of each training batch if any of the parameters or the loss are NaN or +/- infinity
benchmark (bool) – If set to True, enables cudnn.benchmark
deterministic (bool) – If set to True, enables cudnn.deterministic
progress_bar_refresh_rate (int) – How often to refresh the progress bar (in steps), if verbosity > 0
verbosity (Union[bool, int]) –
How verbose to be in training:
0 disables all printouts, including weights_summary
1 prints weights_summary (if applicable) and epoch losses
2 prints weights_summary (if applicable), epoch losses, and progress bars
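For instance, a configuration sketch enabling early stopping and epoch-loss printouts only; the values here are illustrative, and model is assumed to be an initialized Collie model:
from collie.model import CollieMinimalTrainer

# stop after 3 epochs without loss improvement; print epoch losses only
trainer = CollieMinimalTrainer(model,
                               max_epochs=50,
                               early_stopping_patience=3,
                               verbosity=1)
trainer.fit(model)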
- fit(model: BasePipeline) None [source]¶
Runs the full optimization routine.
- Parameters
model (collie.model.BasePipeline) – Initialized Collie model
- property max_epochs¶
Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.
Model Templates¶
Base Collie Pipeline Template¶
- class collie.model.BasePipeline(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, ExplicitInteractions, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, ExplicitInteractions, Interactions, InteractionsDataLoader]] = None, lr: float = 0.001, lr_scheduler_func: Optional[_LRScheduler] = None, weight_decay: float = 0.0, optimizer: Union[str, Optimizer] = 'adam', loss: Union[str, Callable[[...], tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)[source]¶
Bases:
LightningModule
Base Pipeline model architectures to inherit from.
All subclasses MUST at least override the following methods:
_setup_model - Set up the model architecture
forward - Forward pass through a model
For item_item_similarity to work properly, all subclasses should also implement:
_get_item_embeddings - Returns item embeddings from the model on the device
For user_user_similarity to work properly, all subclasses should also implement:
_get_user_embeddings - Returns user embeddings from the model on the device
- Parameters
train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True
val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False
lr (float) – Model learning rate
lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting
weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits
optimizer (torch.optim or str) –
If a string, one of the following supported optimizers:
'sgd' (for torch.optim.SGD)
'adagrad' (for torch.optim.Adagrad)
'adam' (for torch.optim.Adam)
'sparse_adam' (for torch.optim.SparseAdam)
loss (function or str) –
If a string, one of the following implemented losses:
'bpr' / 'adaptive_bpr' (implicit data)
'hinge' / 'adaptive_hinge' (implicit data)
'warp' (implicit data)
'mse' (explicit data)
'mae' (explicit data)
For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).
If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.
metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)
metadata_for_loss_weights (dict) –
Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:
a 100% match if it's the same item,
a 50% match if it’s a different item with the same genre and same director,
a 30% match if it’s a different item with the same genre and different director,
a 20% match if it’s a different item with a different genre and same director,
a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit
load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal
map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary
**kwargs (keyword arguments) – All keyword arguments will be saved to self.hparams by default
- calculate_loss(batch: Union[Tuple[Tuple[tensor, tensor], tensor], Tuple[tensor, tensor, tensor]]) tensor [source]¶
Given a batch of data, calculate the loss value.
Note that the data type (implicit or explicit) will be determined by the structure of the batch sent to this method. See the table below for expected data types:
__getitem__ Format | Expected Meaning                          | Model Type
((X, Y), Z)        | ((user IDs, item IDs), negative item IDs) | Implicit
(X, Y, Z)          | (user IDs, item IDs, ratings)             | Explicit
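As an illustrative sketch of the two batch structures, with made-up tensor values:
import torch

# implicit batch: ((user IDs, item IDs), negative item IDs)
implicit_batch = ((torch.tensor([0, 1]), torch.tensor([3, 7])),
                  torch.tensor([5, 2]))

# explicit batch: (user IDs, item IDs, ratings)
explicit_batch = (torch.tensor([0, 1]),
                  torch.tensor([3, 7]),
                  torch.tensor([4.0, 2.0]))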
- configure_optimizers() Union[Tuple[List[Optimizer], List[Optimizer]], Tuple[Optimizer, Optimizer], Optimizer] [source]¶
Configure optimizers and learning rate schedulers to use in optimization.
This method will be called after setup.
If self.bias_optimizer is None, only a single optimizer will be returned. If there is a non-None class attribute for bias_optimizer, two optimizers will be created: one for all layers with the name 'bias' in them, and another for all other model parameters. The bias optimizer will be set with the same parameters as optimizer, with the exception of the learning rate, which will be set to self.hparams.bias_lr.
- abstract forward(users: tensor, items: tensor) → tensor [source]¶
forward
should be implemented in all subclasses.
- get_item_predictions(user_id: int = 0, unseen_items_only: bool = False, sort_values: bool = True) Series [source]¶
Get predicted rankings/ratings for all items for a given user_id.
This method cannot be called for datasets stored in HDF5InteractionsDataLoader, since data in this DataLoader is read in dynamically.
- Parameters
user_id (int) –
unseen_items_only (bool) – Filter preds to only show predictions for unseen items not present in the training or validation datasets for that user_id. Note this requires both train_loader and val_loader to be 1) class-level attributes in the model and 2) DataLoaders with Interactions at their core (not HDF5Interactions). If you are loading in a model, these two attributes will need to be set manually, since datasets are NOT saved when saving the model
sort_values (bool) – Whether to sort recommendations by descending prediction probability or not
- Returns
preds – Sorted values as predicted ratings for each item in the dataset with the index being the item ID
- Return type
pd.Series
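For example, a quick inference sketch; user ID 42 is a hypothetical value, and a trained model with its dataloaders still attached is assumed:
# top-10 recommendations for user 42, excluding items they have already seen
preds = model.get_item_predictions(user_id=42, unseen_items_only=True)
top_10_item_ids = preds.index[:10]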
- get_user_predictions(item_id: int = 0, unseen_users_only: bool = False, sort_values: bool = True) Series [source]¶
User counterpart to the get_item_predictions method.
Get predicted rankings/ratings for all users for a given item_id.
This method cannot be called for datasets stored in HDF5InteractionsDataLoader, since data in this DataLoader is read in dynamically.
- Parameters
item_id (int) –
unseen_users_only (bool) – Filter preds to only show predictions for unseen users not present in the training or validation datasets for that item_id. Note this requires both train_loader and val_loader to be 1) class-level attributes in the model and 2) DataLoaders with Interactions at their core (not HDF5Interactions). If you are loading in a model, these two attributes will need to be set manually, since datasets are NOT saved when saving the model
sort_values (bool) – Whether to sort recommendations by descending prediction probability
- Returns
preds – Sorted values as predicted ratings for each user in the dataset with the index being the user ID
- Return type
pd.Series
- item_item_similarity(item_id: int) Series [source]¶
Get most similar item indices by cosine similarity.
Cosine similarity is computed with item embeddings from a trained model.
- Parameters
item_id (int) –
- Returns
sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID
- Return type
pd.Series
Note
Returned array is unfiltered, so the first element, being the most similar item, will always be the item itself.
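A short usage sketch; item ID 7 is a hypothetical value, and we skip position 0 since it is the query item itself:
# ten most similar items to item 7, excluding the item itself
similar = model.item_item_similarity(item_id=7)
top_10_similar_item_ids = similar.index[1:11]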
- on_fit_start() None [source]¶
Method that runs at the very beginning of the fit process.
This method will be called after
configure_optimizers
.
- save_model(filename: Union[str, Path] = 'model.pth') None [source]¶
Save the model’s state dictionary and hyperparameters.
While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding it here:
We only want to save the underlying PyTorch model (and not the Trainer object) so we don't have to require PyTorch Lightning as a dependency when deploying a model.
In the v0.8.4 release, loading a model back in leads to a RuntimeError from being unable to load in the weights.
- Parameters
filename (str or Path) – Filepath for the state dictionary to be saved at, ending in '.pth'
- train_dataloader() Union[ApproximateNegativeSamplingInteractionsDataLoader, InteractionsDataLoader] [source]¶
Method that sets up training data as a PyTorch DataLoader.
This method will be called after
on_fit_start
.
- training_epoch_end(outputs: Union[List[float], List[List[float]]]) None [source]¶
Method that contains a callback for logic to run after the training epoch ends.
This method will be called after
training_step
.
- training_step(batch: Tuple[Tuple[tensor, tensor], tensor], batch_idx: int, optimizer_idx: Optional[int] = None) tensor [source]¶
Method that contains logic for what happens inside the training loop.
This method will be called after
train_dataloader
.
- user_user_similarity(user_id: int) Series [source]¶
User counterpart to the item_item_similarity method.
Get most similar user indices by cosine similarity.
Cosine similarity is computed with user embeddings from a trained model.
- Parameters
user_id (int) –
- Returns
sim_score_idxs – Sorted values as cosine similarity for each user in the dataset with the index being the user ID
- Return type
pd.Series
Note
Returned array is unfiltered, so the first element, being the most similar user, will always be the seed user themself.
- val_dataloader() Union[ApproximateNegativeSamplingInteractionsDataLoader, InteractionsDataLoader] [source]¶
Method that sets up validation data as a PyTorch DataLoader.
This method will be called after
training_step
.
Base Collie Multi-Stage Pipeline Template¶
- class collie.model.MultiStagePipeline(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, lr_scheduler_func: Optional[_LRScheduler] = None, weight_decay: float = 0.0, optimizer_config_list: Optional[List[Dict[str, Union[float, List[str], str]]]] = None, loss: Union[str, Callable[[...], tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)[source]¶
Bases:
BasePipeline
Multi-stage pipeline model architectures to inherit from.
This model template is intended for models that train in distinct stages, with a different optimizer optimizing each step. This allows model components to be optimized with a set order in mind, rather than all at once, as in the BasePipeline.
Generally, multi-stage models will have a training protocol like:
from collie.model import CollieTrainer, SomeMultiStageModel

model = SomeMultiStageModel(train=train)
trainer = CollieTrainer(model)

# fit stage 1
trainer.fit(model)

# fit stage 2
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# fit stage 3
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# ... and so on, until...

model.eval()
Just like with BasePipeline, all subclasses MUST at least override the following methods:
_setup_model - Set up the model architecture
forward - Forward pass through a model
For item_item_similarity to work properly, all subclasses should also implement:
_get_item_embeddings - Returns item embeddings from the model
- Parameters
train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True
val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False
lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting
weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits
optimizer_config_list (list of dict) –
List of dictionaries containing the optimizer configurations for each stage's optimizer(s). Each dictionary must contain the following keys:
lr: float
Learning rate for the optimizer
optimizer: torch.optim or str
parameter_prefix_list: List[str]
List of string prefixes corresponding to the model components that should be optimized with this optimizer
stage: str
Name of the stage
This list must be ordered with the intended progression of stages; an illustrative configuration is sketched after this parameter list.
loss (function or str) –
If a string, one of the following implemented losses:
- ``'bpr'`` / ``'adaptive_bpr'`` (implicit data)
- ``'hinge'`` / ``'adaptive_hinge'`` (implicit data)
- ``'warp'`` (implicit data)
- ``'mse'`` (explicit data)
- ``'mae'`` (explicit data)
For implicit data, if ``train.num_negative_samples > 1``, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).
If a callable is passed, that function will be used for calculating the loss (a sketch of this interface follows the Notes below). For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments, passed in this order, are ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and ``metadata_weights``. For explicit models, the only two arguments passed in will be the predicted and actual rating values, in order.
metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in ``metadata_weights``. Values should be a ``torch.tensor`` of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)
metadata_for_loss_weights (dict) –
Keys should be strings identifying each metadata type that match keys in ``metadata``. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values ``<= 1``. e.g. If ``metadata_for_loss_weights = {'genre': .3, 'director': .2}``, then an item is:
- a 100% match if it’s the same item,
- a 50% match if it’s a different item with the same genre and the same director,
- a 30% match if it’s a different item with the same genre and a different director,
- a 20% match if it’s a different item with a different genre and the same director,
- a 0% match if it’s a different item with a different genre and a different director, which is equivalent to the loss without any partial credit
load_model_path (str or Path) – To load a previously-saved model for inference, pass in the path to the output of the ``model.save_model()`` method. Note that datasets and optimizers will NOT be restored. If ``None``, the model will be initialized as normal
map_location (str or torch.device) – If ``load_model_path`` is provided, device specifying how to remap storage locations when ``torch.load``-ing the state dictionary
**kwargs (keyword arguments) – All keyword arguments will be saved to ``self.hparams`` by default
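As referenced in the ``optimizer_config_list`` description above, a minimal sketch of a two-stage configuration might look like the following. The stage names and parameter prefixes here are hypothetical; they must match the stages and component attribute names a concrete model actually defines:

optimizer_config_list = [
    {
        'lr': 1e-3,
        'optimizer': 'adam',
        # hypothetical prefixes - these must match attribute names in the model
        'parameter_prefix_list': ['user_embeddings', 'item_embeddings'],
        'stage': 'embeddings',  # hypothetical stage name
    },
    {
        'lr': 1e-4,
        'optimizer': 'adam',
        'parameter_prefix_list': ['metadata_layer'],
        'stage': 'metadata',  # hypothetical stage name
    },
]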
Notes
With each call of ``trainer.fit``, the optimizer and learning rate scheduler state will reset.
When loading a multi-stage model in, the stage will be set to the last possible stage. This stage may have a different ``forward`` calculation than other stages.
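To make the callable ``loss`` interface described above concrete, a minimal sketch of a custom implicit loss follows. The function name and margin are illustrative rather than part of the library, and the extra keyword arguments are accepted but ignored here:

import torch

def my_custom_hinge_loss(positive_scores, negative_scores, **kwargs):
    # positive_scores: predictions for observed user-item interactions
    # negative_scores: predictions for sampled negative items
    # ``kwargs`` absorbs ``num_items``, ``positive_items``, ``negative_items``,
    # ``metadata``, and ``metadata_weights``, which this sketch does not use
    return torch.clamp(negative_scores - positive_scores + 1.0, min=0).mean()

# assume ``train`` is already defined, with ``SomeMultiStageModel`` standing in
# for a concrete subclass, as in the training protocol example above
model = SomeMultiStageModel(train=train, loss=my_custom_hinge_loss)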
- configure_optimizers() Union[Tuple[List[Optimizer], List[Optimizer]], Tuple[Optimizer, Optimizer], Optimizer] [source]¶
Configure optimizers and learning rate schedulers to use in optimization.
This method will be called after setup.
Creates an optimizer and learning rate scheduler for each configuration dictionary in ``self.hparams.optimizer_config_list``.
- optimizer_step(epoch: Optional[int] = None, batch_idx: Optional[int] = None, optimizer: Optional[Optimizer] = None, optimizer_idx: Optional[int] = None, optimizer_closure: Optional[Callable[[...], Any]] = None, **kwargs) None [source]¶
Overriding Lightning’s optimizer step function to only step the optimizer associated with the relevant stage.
See here for more details: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#optimizer-step
- Parameters
epoch (int) – Current epoch
batch_idx (int) – Index of current batch
optimizer (torch.optim.Optimizer) – A PyTorch optimizer
optimizer_idx (int) – If you used multiple optimizers, this indexes into that list
optimizer_closure (Callable) – Closure for all optimizers
Layers¶
Scaled Embedding¶
- class collie.model.ScaledEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)[source]¶
Bases: ``Embedding``
Embedding layer that initializes its values to use a truncated normal distribution.
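Since ``ScaledEmbedding`` subclasses ``torch.nn.Embedding``, it can be used as a drop-in replacement. A quick usage sketch (the dimensions here are arbitrary):

import torch

from collie.model import ScaledEmbedding

# same constructor signature as ``torch.nn.Embedding``
user_embeddings = ScaledEmbedding(num_embeddings=1000, embedding_dim=30)
user_vectors = user_embeddings(torch.tensor([0, 7, 42]))  # shape: (3, 30)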
Zero Embedding¶
- class collie.model.ZeroEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)[source]¶
Bases: ``Embedding``
Embedding layer with weights zeroed-out.
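One common use for a zeroed-out embedding is a learnable per-user or per-item bias term that starts at zero. A brief sketch (the dimensions here are arbitrary):

import torch

from collie.model import ZeroEmbedding

item_biases = ZeroEmbedding(num_embeddings=1000, embedding_dim=1)
print(item_biases(torch.tensor([0, 1])))  # all zeros before any training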