Models

Instantiating and Training a Collie Model

Collie provides several state-of-the-art recommendation model architectures, both non-hybrid and hybrid, depending on whether you would like to directly incorporate metadata into the model.

Since Collie utilizes PyTorch Lightning for model training, all models, by default:

  • Are compatible with CPU, GPU, multi-GPU, and TPU training

  • Allow for 16-bit precision

  • Integrate with common external loggers

  • Allow for extensive predefined and custom training callbacks

  • Are flexible with minimal boilerplate code

While each model’s API differs slightly, the training procedure will generally look like:

from collie.model import CollieTrainer, MatrixFactorizationModel


# assume you have ``interactions`` already defined and ready-to-go

model = MatrixFactorizationModel(interactions)

trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# now, ``model`` is ready to be used for inference, evaluation, etc.

model.save_model('model.pkl')
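
Since ``CollieTrainer`` builds on PyTorch Lightning's ``Trainer``, the features listed above (GPU training, 16-bit precision, loggers, callbacks) are generally enabled through standard ``Trainer`` keyword arguments. A minimal sketch (argument names such as ``gpus`` and ``precision`` follow PyTorch Lightning and may differ across Lightning versions; the monitored metric name is an assumption):

from pytorch_lightning.callbacks import EarlyStopping

from collie.model import CollieTrainer, MatrixFactorizationModel


# assume ``interactions`` is already defined and ready-to-go
model = MatrixFactorizationModel(train=interactions)

trainer = CollieTrainer(
    model,
    max_epochs=10,
    gpus=1,        # single-GPU training (older Lightning API; newer versions use ``accelerator``/``devices``)
    precision=16,  # 16-bit precision
    callbacks=[EarlyStopping(monitor='val_loss_epoch', patience=2)],  # metric name is an assumption
)
trainer.fit(model)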

When we have side-data about items, this can be incorporated directly into the loss function of the model. For details on this, see Losses.
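
For example, giving partial credit in the loss to different items that share a genre might be configured as in the sketch below, using the ``metadata_for_loss`` and ``metadata_for_loss_weights`` arguments documented later on this page. ``item_genres`` is a hypothetical tensor you would build from your own metadata, and ``interactions`` is assumed to be defined:

import torch

from collie.model import MatrixFactorizationModel


# hypothetical categorical metadata: one genre ID per item, shape (num_items x 1)
item_genres = torch.tensor([[0], [2], [1], [1], [0], [2]])

model = MatrixFactorizationModel(
    train=interactions,
    metadata_for_loss={'genre': item_genres},
    metadata_for_loss_weights={'genre': 0.4},  # a same-genre, different-item pair counts as a 40% match
)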

Hybrid Collie models allow incorporating side-data about items and/or users directly into the model. For an in-depth example of this, see Tutorials.

Creating a Custom Architecture

Collie not only houses incredible pre-defined architectures, but was built with customization in mind. All Collie recommendation models are built as subclasses of the BasePipeline model, inheriting common loss calculation functions and model training boilerplate. This allows for a nice balance between both flexibility and faster iteration.

While any method can be overridden with more architecture-specific implementations, at the bare minimum, each additional model must override:

  • _setup_model - Model architecture initialization

  • forward - Model step that accepts a batch of data of the form ``(users, items), negative_items`` and outputs a recommendation score for each item

If we wanted to create a custom model that performed a barebones matrix factorization calculation, in Collie, this would be implemented as:

import torch

from collie.model import BasePipeline, CollieTrainer, ScaledEmbedding
from collie.utils import get_init_arguments


class SimpleModel(BasePipeline):
    def __init__(self, train, val, embedding_dim):
        """
        Initialize a simple model that is a subclass of ``BasePipeline``.

        Parameters
        ----------
        train: ``collie.interactions`` object
        val: ``collie.interactions`` object
        embedding_dim: int
            Number of latent factors to use for user and item embeddings

        """
        super().__init__(**get_init_arguments())

    def _setup_model(self, **kwargs):
        """Method for building model internals that rely on the data passed in."""
        self.user_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_users,
                                               embedding_dim=self.hparams.embedding_dim)
        self.item_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_items,
                                               embedding_dim=self.hparams.embedding_dim)

    def forward(self, users, items):
        """
        Forward pass through the model.

        Parameters
        ----------
        users: tensor, 1-d
            Array of user indices
        items: tensor, 1-d
            Array of item indices

        Returns
        -------
        preds: tensor, 1-d
            Predicted scores

        """
        return torch.mul(
            self.user_embeddings(users), self.item_embeddings(items)
        ).sum(axis=1)


# assume you have ``train`` and ``val`` already defined and ready-to-go

model = SimpleModel(train, val, embedding_dim=10)

trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.eval()

# now, ``model`` is ready to be used for inference, evaluation, etc.

model.save_model('model.pkl')

See the source code for the BasePipeline in Model Templates below for the calling order of each class method as well as initialization details for optimizers, schedulers, and more.

Standard Models

Matrix Factorization Model

class collie.model.MatrixFactorizationModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, y_range: ~typing.Optional[~typing.Tuple[float, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for the matrix factorization model.

MatrixFactorizationModel models have an embedding layer for both users and items which are dot-producted together to output a single float ranking value.

Collie adds a twist to this incredibly popular framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the bias_lr and bias_optimizer input arguments for implementation details.
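
For example, keeping Adam for the embeddings while moving the bias terms with plain SGD might look like the following sketch, using the ``bias_lr`` and ``bias_optimizer`` arguments documented below (the values shown are the defaults; ``train`` and ``val`` are assumed to be defined):

from collie.model import CollieTrainer, MatrixFactorizationModel


model = MatrixFactorizationModel(
    train=train,
    val=val,
    lr=1e-3,
    optimizer='adam',      # embeddings are updated with Adam ...
    bias_lr=1e-2,
    bias_optimizer='sgd',  # ... while bias terms use a separate, slower SGD optimizer
)

trainer = CollieTrainer(model)
trainer.fit(model)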

All MatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, MatrixFactorizationModel

model = MatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • dropout_p (float) – Probability of dropout

  • sparse (bool) – Whether or not to treat embeddings as sparse tensors. If True, cannot use weight decay on the optimizer

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order. A sketch of a custom loss callable appears at the end of this section.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Simple matrix factorization for a single user and item looks like:

``prediction = (user_embedding * item_embedding) + user_bias + item_bias``

If dropout is added, it is applied to the two embeddings and not the biases.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d
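
As noted in the ``loss`` parameter documentation above, a callable loss for an implicit model receives the positive and negative predictions as its first two arguments, plus several keyword arguments. A minimal sketch of such a callable (the function name is illustrative, and this assumes ``train.num_negative_samples == 1`` so both score tensors share the same shape):

import torch

from collie.model import MatrixFactorizationModel


def custom_hinge_loss(positive_scores, negative_scores, **kwargs):
    # ``num_items``, ``positive_items``, ``negative_items``, ``metadata``, and
    # ``metadata_weights`` arrive via ``kwargs`` and are ignored here
    return torch.clamp(1.0 - (positive_scores - negative_scores), min=0.0).mean()


# assume ``train`` is already defined and ready-to-go
model = MatrixFactorizationModel(train=train, loss=custom_hinge_loss)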

Multilayer Perceptron Matrix Factorization Model

class collie.model.MLPMatrixFactorizationModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, num_layers: int = 3, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, y_range: ~typing.Optional[~typing.Tuple[float, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for the matrix factorization model with MLP layers instead of a final dot product (like in MatrixFactorizationModel).

MLPMatrixFactorizationModel models have an embedding layer for both users and items, which are concatenated and sent through an MLP to output a single float ranking value.

All MLPMatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, MLPMatrixFactorizationModel

model = MLPMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MLPMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))``

  • dropout_p (float) – Probability of dropout on the linear layers

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model, roughly:

`prediction = MLP(concatenate(user_embedding, item_embedding)) + user_bias + item_bias`

If dropout is added, it is applied to the two embeddings and not the biases.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Nonlinear Embeddings Matrix Factorization Model

class collie.model.NonlinearMatrixFactorizationModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, user_embedding_dim: int = 60, item_embedding_dim: int = 60, user_dense_layers_dims: ~typing.List[float] = [48, 32], item_dense_layers_dims: ~typing.List[float] = [48, 32], embedding_dropout_p: float = 0.0, dense_dropout_p: float = 0.0, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, y_range: ~typing.Optional[~typing.Tuple[float, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for a nonlinear matrix factorization model.

NonlinearMatrixFactorizationModel models have an embedding layer for users and items. These are sent through separate dense networks, which output more refined embeddings, which are then dot producted for a single float ranking / rating.

Collie adds a twist to this framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the bias_lr and bias_optimizer input arguments for implementation details.

All NonlinearMatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, NonlinearMatrixFactorizationModel

model = NonlinearMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NonlinearMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • user_embedding_dim (int) – Number of latent factors to use for user embeddings

  • item_embedding_dim (int) – Number of latent factors to use for item embeddings

  • user_dense_layers_dims (list) – List of linear layer dimensions to apply to the user embedding, starting with the dimension directly following user_embedding_dim

  • item_dense_layers_dims (list) – List of linear layer dimensions to apply to the item embedding, starting with the dimension directly following item_embedding_dim

  • embedding_dropout_p (float) – Probability of dropout on the embedding layers

  • dense_dropout_p (float) – Probability of dropout on the dense layers

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Collaborative Metric Learning Model

class collie.model.CollaborativeMetricLearningModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, sparse: bool = False, lr: float = 0.001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, y_range: ~typing.Optional[~typing.Tuple[float, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for the collaborative metric learning model.

CollaborativeMetricLearningModel models have an embedding layer for both users and items. A single float prediction is retrieved by taking the pairwise distance between the two embeddings.

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1803.00202.pdf [1]

All CollaborativeMetricLearningModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollaborativeMetricLearningModel, CollieTrainer

model = CollaborativeMetricLearningModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = CollaborativeMetricLearningModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • sparse (bool) – Whether or not to treat embeddings as sparse tensors. If True, cannot use weight decay on the optimizer

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[1] Campo, Miguel, et al. “Collaborative Metric Learning Recommendation System: Application to Theatrical Movie Releases.” ArXiv.org, 1 Mar. 2018, arxiv.org/abs/1803.00202.

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model, equivalent to:

`prediction = pairwise_distance(user_embedding, item_embedding)`

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Neural Collaborative Filtering (NeuCF)

class collie.model.NeuralCollaborativeFiltering(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: ~typing.Optional[~typing.Union[str, ~typing.Callable[[~torch._VariableFunctionsClass.tensor], ~torch._VariableFunctionsClass.tensor]]] = None, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for a neural matrix factorization model.

NeuralCollaborativeFiltering models combine collaborative filtering and a multilayer perceptron network in a single, unified model. The model consists of two sections: the first is a simple matrix factorization that calculates a score by multiplying together user and item embeddings (lookups through an embedding table); the second is an MLP network fed by a second set of embedding tables (one for users, one for items). Both output vectors are combined and sent through a final MLP layer before returning a single recommendation score.

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1708.05031.pdf [2]

All NeuralCollaborativeFiltering instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, NeuralCollaborativeFiltering

model = NeuralCollaborativeFiltering(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NeuralCollaborativeFiltering(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula embedding_dim * (2 ** (num_layers - 1))

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))``

  • final_layer (str or function) –

    Final layer activation function. Available string options include:

    • 'sigmoid'

    • 'relu'

    • 'leaky_relu'

  • dropout_p (float) – Probability of dropout on the MLP layers

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[2] He, Xiangnan, et al. “Neural Collaborative Filtering.” Proceedings of the 26th International Conference on World Wide Web, 1 Apr. 2017, dl.acm.org/doi/10.1145/3038912.3052569.

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Deep Factorization Machine (DeepFM)

class collie.model.DeepFM(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: ~typing.Optional[~typing.Union[str, ~typing.Callable[[...], ~typing.Any]]] = None, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for a deep factorization model.

DeepFM models combine a shallow factorization machine and a deep multilayer perceptron network in a single, unified model. The model consists of embedding tables for users and items, and model output is the sum of 1) factorization machine output of both embeddings (shallow) and 2) MLP output for the concatenation of both embeddings (deep).

The implementation here is meant to mimic its original implementation as specified here: https://arxiv.org/pdf/1703.04247.pdf [3]

All DeepFM instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, DeepFM

model = DeepFM(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = DeepFM(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula embedding_dim * (2 ** (num_layers - 1))

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula ``embedding_dim * (2 ** (num_layers - current_layer_number))``

  • final_layer (str or function) –

    Final layer activation function. Available string options include:

    • 'sigmoid'

    • 'relu'

    • 'leaky_relu'

  • dropout_p (float) – Probability of dropout

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[3] Guo, Huifeng, et al. “DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction.” ArXiv.org, 13 Mar. 2017, arxiv.org/abs/1703.04247.

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Hybrid Models

Hybrid Pretrained Matrix Factorization Model

class collie.model.HybridPretrainedModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, user_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, trained_model: ~typing.Optional[~collie.model.matrix_factorization.MatrixFactorizationModel] = None, item_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, user_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, combined_layers_dims: ~typing.List[int] = [128, 64, 32], freeze_embeddings: bool = True, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: BasePipeline

Training pipeline for a hybrid recommendation model using a pre-trained matrix factorization model as its base.

HybridPretrainedModel models contain dense layers that process item and/or user metadata, concatenate this embedding with the user and item embeddings copied from a trained MatrixFactorizationModel, and send this concatenated embedding through more dense layers to output a single float ranking / rating. We add both user and item biases to this score before returning. This is the same architecture as the HybridModel, but we are using the embeddings from a pre-trained model rather than training them up ourselves.

All HybridPretrainedModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, HybridPretrainedModel, MatrixFactorizationModel

# instantiate and fit a ``MatrixFactorizationModel`` as expected
mf_model = MatrixFactorizationModel(train=train)
mf_trainer = CollieTrainer(mf_model)
mf_trainer.fit(mf_model)

hybrid_model = HybridPretrainedModel(train=train,
                                     item_metadata=item_metadata,
                                     user_metadata=user_metadata,
                                     trained_model=mf_model)
hybrid_trainer = CollieTrainer(hybrid_model)
hybrid_trainer.fit(hybrid_model)
hybrid_model.eval()

# do evaluation as normal with ``hybrid_model``

hybrid_model.save_model(path='model')
new_hybrid_model = HybridPretrainedModel(load_model_path='model')

# do evaluation as normal with ``new_hybrid_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item’s metadata should be available when indexing a row by an item ID

  • user_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the user metadata should be (num_users x metadata_features), and each user’s metadata should be available when indexing a row by a user ID

  • trained_model (collie.model.MatrixFactorizationModel) – Previously trained MatrixFactorizationModel model to extract embeddings from

  • item_metadata_layers_dims (list) – List of linear layer dimensions to apply to the item metadata only, starting with the dimension directly following metadata_features and ending with the dimension to concatenate with the item embeddings

  • user_metadata_layers_dims (list) – List of linear layer dimensions to apply to the user metadata only, starting with the dimension directly following metadata_features and ending with the dimension to concatenate with the user embeddings

  • combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of item_embeddings + metadata_features and ending with the dimension before the final linear layer to dimension 1

  • freeze_embeddings (bool) – When initializing the model, whether or not to freeze trained_model’s embeddings

  • dropout_p (float) – Probability of dropout

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

freeze_embeddings() None[source]

Remove gradient requirement from the embeddings.

load_from_hybrid_model(hybrid_model) None[source]

Copy hyperparameters and state dictionary from an existing HybridPretrainedModel instance.

This is particularly useful for creating another PyTorch Lightning trainer object to fine-tune copied-over embeddings from a MatrixFactorizationModel instance.

Parameters

hybrid_model (collie.model.HybridPretrainedModel) – HybridPretrainedModel containing hyperparameters and state dictionary to copy over

save_model(path: Union[str, Path] = 'data/model', overwrite: bool = False) None[source]

Save the model’s state dictionary, hyperparameters, and user and/or item metadata.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. Properly saving and loading a model through PyTorch Lightning requires the Trainer object, meaning all deployed models would also require Lightning just to run, even though it is not actually needed for inference.

  2. In the v0.8.4 release, loading a saved model back in leads to a RuntimeError about being unable to load in the weights.

Parameters
  • path (str or Path) – Directory path to save model and data files

  • overwrite (bool) – Whether or not to overwrite existing data

unfreeze_embeddings() None[source]

Require gradients for the embeddings.
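A common pattern with ``freeze_embeddings`` / ``unfreeze_embeddings`` is to first train only the layers on top of the copied-over embeddings, then unfreeze everything for fine-tuning. A rough sketch, assuming ``model`` and its data are already set up (the epoch counts are arbitrary):

from collie.model import CollieTrainer

# stage 1: keep the copied-over embeddings fixed while the new layers train
model.freeze_embeddings()
trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)

# stage 2: unfreeze the embeddings and fine-tune everything together
model.unfreeze_embeddings()
trainer.max_epochs += 5
trainer.fit(model)

model.eval()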

Multi-Stage Models

Cold Start Matrix Factorization Model

class collie.model.ColdStartModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_buckets: ~typing.Optional[~typing.Iterable[int]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, item_buckets_stage_lr: float = 0.001, no_buckets_stage_lr: float = 0.001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=False), weight_decay: float = 0.0, item_buckets_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', no_buckets_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: MultiStagePipeline

Training pipeline for a matrix factorization model optimized for the cold-start problem.

Many recommendation models suffer from the cold-start problem, in which a model is unable to provide adequate recommendations for a new item until enough users have interacted with it. But if users only interact with recommended items, the new item will never be recommended, and thus the model will never improve its recommendations for that item.

The ColdStartModel attempts to bypass this by collapsing the item space down to “item buckets”, training a model with buckets as the item space, then expanding back out to all items. During this expansion, the learned embeddings of each bucket are copied over to each corresponding item, providing a smarter initialization than a random one for both existing and new items. Now, when we have a new item, we can use its bucket embedding as its initialization in the model.

The stages in a ColdStartModel are, in order:

  1. item_buckets

    Matrix factorization with item embeddings and bias terms bucketed by the item_buckets argument. Unlike in the next stage, many items may map onto a single bucket, and those items will share the same embedding and bias representation. The model should learn user preferences for buckets in this stage.

  2. no_buckets

    Standard matrix factorization as we do in MatrixFactorizationModel. However, upon advancing to this stage, the item embeddings are initialized with their bucketed embedding value (and likewise for biases). Not only does this provide a better initialization than a random one, but it also allows new items to be incorporated into the model without retraining by using their item bucket embedding and bias terms at prediction time.

Note that the cold start problem exists for new users as well, but this functionality will be added to this model in a future version.

All ColdStartModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import ColdStartModel, CollieTrainer

# instantiate and fit a ``ColdStartModel`` as expected
model = ColdStartModel(train=train, item_buckets=item_buckets)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``no_buckets``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

# get item-item recommendations for a new item by using the bucket ID, Z
similar_items = model.item_bucket_item_similarity(item_bucket_id=Z)

model.save_model(filename='model.pth')
new_model = ColdStartModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_buckets (torch.tensor, 1-d) –

    An ordered iterable containing the bucket ID for each item ID. For example, if you have five films and are going to bucket by primary genre, and your data looks like:

    • Item ID: 0, Genre ID: 1

    • Item ID: 1, Genre ID: 0

    • Item ID: 2, Genre ID: 2

    • Item ID: 3, Genre ID: 2

    • Item ID: 4, Genre ID: 1

    Then item_buckets would be: [1, 0, 2, 2, 1]. A sketch of deriving this from an item metadata DataFrame follows the Notes section below.

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • dropout_p (float) – Probability of dropout

  • item_buckets_stage_lr (float) – Learning rate for user parameters and item bucket parameters optimized during the item_buckets stage

  • no_buckets_stage_lr (float) – Learning rate for user parameters and item parameters optimized during the no_buckets stage

  • item_buckets_stage_optimizer (torch.optim or str) –

    Optimizer used for user parameters and item bucket parameters optimized during the item_buckets stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • no_buckets_stage_optimizer (torch.optim or str) –

    Optimizer used for user parameters and item parameters optimized during the no_buckets stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

Notes

The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating, saving, and loading models.
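As referenced in the ``item_buckets`` parameter above, the bucket assignments for the five-film genre example could be derived from an item metadata DataFrame. A minimal sketch, assuming ``train`` is already defined and using hypothetical column names:

import pandas as pd
import torch

from collie.model import ColdStartModel

# hypothetical item metadata, one row per item ID 0..4
item_df = pd.DataFrame({'item_id': [0, 1, 2, 3, 4],
                        'primary_genre_id': [1, 0, 2, 2, 1]})

# bucket each item by its primary genre; the order must follow item ID order
item_buckets = torch.tensor(
    item_df.sort_values('item_id')['primary_genre_id'].to_numpy()
)

model = ColdStartModel(train=train, item_buckets=item_buckets)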

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

item_bucket_item_similarity(item_bucket_id: int) Series[source]

Get the most similar item indices to an item bucket by cosine similarity.

Cosine similarity is computed with item and item bucket embeddings from a trained model.

Parameters

item_bucket_id (int) –

Returns

sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID

Return type

pd.Series

set_stage(stage: str) None[source]

Set the stage for the model.

Hybrid Matrix Factorization Model

class collie.model.HybridModel(train: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: ~typing.Optional[~typing.Union[~collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, ~collie.interactions.datasets.Interactions, ~collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, user_metadata: ~typing.Optional[~typing.Union[~torch._VariableFunctionsClass.tensor, ~pandas.core.frame.DataFrame, ~numpy.array]] = None, embedding_dim: int = 30, item_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, user_metadata_layers_dims: ~typing.Optional[~typing.List[int]] = None, combined_layers_dims: ~typing.List[int] = [128, 64, 32], dropout_p: float = 0.0, lr: float = 0.001, bias_lr: ~typing.Optional[~typing.Union[float, str]] = 0.01, metadata_only_stage_lr: float = 0.001, all_stage_lr: float = 0.0001, lr_scheduler_func: ~typing.Optional[~torch.optim.lr_scheduler._LRScheduler] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=False), weight_decay: float = 0.0, optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', bias_optimizer: ~typing.Optional[~typing.Union[str, ~torch.optim.optimizer.Optimizer]] = 'sgd', metadata_only_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', all_stage_optimizer: ~typing.Union[str, ~torch.optim.optimizer.Optimizer] = 'adam', loss: ~typing.Union[str, ~typing.Callable[[...], ~torch._VariableFunctionsClass.tensor]] = 'hinge', metadata_for_loss: ~typing.Optional[~typing.Dict[str, ~torch._VariableFunctionsClass.tensor]] = None, metadata_for_loss_weights: ~typing.Optional[~typing.Dict[str, float]] = None, load_model_path: ~typing.Optional[str] = None, map_location: ~typing.Optional[str] = None)[source]

Bases: MultiStagePipeline

Training pipeline for a multi-stage hybrid recommendation model.

A HybridModel contains dense layers that process item and/or user metadata, concatenates the resulting representation with the user and item embeddings, and sends this concatenated embedding through more dense layers to output a single float ranking / rating. Both user and item biases are added to this score before it is returned. This is the same architecture as the HybridPretrainedModel, but here the embeddings are trained as part of the model rather than pulled from a pre-trained model.

The stages in a HybridModel depend on whether both item and user metadata is used. For the full model, they are, in order:

  1. matrix_factorization

    Matrix factorization exactly as we do in MatrixFactorizationModel. In this stage, metadata is NOT incorporated into the model.

  2. metadata_only

    User and item embedding terms are frozen, and the MLP layers for the metadata (if specified) and for the combined embedding-metadata representation are optimized.

  3. all

    Embedding and MLP layers are all optimized together, including those for metadata.

All HybridModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, HybridModel

# instantiate and fit a ``HybridModel`` as expected
model = HybridModel(train=train,
                    item_metadata=item_metadata,
                    user_metadata=user_metadata)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``metadata_only``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

# train for Y more epochs on the next stage, ``all``
trainer.max_epochs += Y
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

model.save_model(path='model')
new_model = HybridModel(load_model_path='model')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item’s metadata should be available when indexing a row by an item ID. A construction sketch for both metadata tensors follows the Notes section below

  • user_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the user metadata should be (num_users x metadata_features), and each user’s metadata should be available when indexing a row by a user ID

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • item_metadata_layers_dims (list) – List of linear layer dimensions to apply to the item metadata only, starting with the dimension directly following item_metadata_features and ending with the dimension to concatenate with the item embeddings

  • user_metadata_layers_dims (list) – List of linear layer dimensions to apply to the user metadata only, starting with the dimension directly following user_metadata_features and ending with the dimension to concatenate with the user embeddings

  • combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of item_embeddings + metadata_features and ending with the dimension before the final linear layer to dimension 1

  • dropout_p (float) – Probability of dropout

  • metadata_only_stage_lr (float) – Learning rate for metadata and combined layers optimized during the metadata_only stage

  • all_stage_lr (float) – Learning rate for all model parameters optimized during the all stage

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    Optimizer used for embeddings and bias terms (if bias_optimizer is None) during the matrix_factorization stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • metadata_only_stage_optimizer (torch.optim or str) –

    Optimizer used for metadata and combined layers during the metadata_only stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • all_stage_optimizer (torch.optim or str) –

    Optimizer used for all model parameters during the all stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

Notes

The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating, saving, and loading models.
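As referenced in the ``item_metadata`` / ``user_metadata`` parameters above, a minimal sketch of preparing the metadata tensors follows. All feature values here are made up for illustration; in practice these features would typically be scaled or encoded first, and ``train`` is assumed to already be defined:

import torch

from collie.model import HybridModel

# hypothetical dense item features, row ``i`` describing item ID ``i``
item_metadata = torch.tensor([[0., 1., 0.2],
                              [1., 0., 0.9],
                              [0., 1., 0.5],
                              [1., 1., 0.1]])  # shape: (num_items=4 x 3 features)

# hypothetical dense user features, row ``u`` describing user ID ``u``
user_metadata = torch.tensor([[25., 1.],
                              [41., 0.],
                              [33., 1.]])      # shape: (num_users=3 x 2 features)

model = HybridModel(train=train,
                    item_metadata=item_metadata,
                    user_metadata=user_metadata)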

forward(users: tensor, items: tensor) tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

save_model(path: Union[str, Path] = 'data/model', overwrite: bool = False) None[source]

Save the model’s state dictionary, hyperparameters, and user and/or item metadata.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. Properly saving and loading a model through PyTorch Lightning requires the Trainer object, meaning all deployed models would also require Lightning just to run, even though it is not actually needed for inference.

  2. In the v0.8.4 release, loading a saved model back in leads to a RuntimeError about being unable to load in the weights.

Parameters
  • path (str or Path) – Directory path to save model and data files

  • overwrite (bool) – Whether or not to overwrite existing data

Trainers

PyTorch Lightning Trainer

class collie.model.CollieTrainer(model: Module, max_epochs: int = 10, benchmark: bool = True, deterministic: bool = True, **kwargs)[source]

Bases: Trainer

Helper wrapper class around PyTorch Lightning’s Trainer class.

Specifically, this wrapper:

  • Checks if a model has a validation dataset passed in (under the val_loader attribute) and, if not, sets num_sanity_val_steps to 0 and check_val_every_n_epoch to sys.maxsize.

  • Checks if a GPU is available and, if gpus is None, sets gpus = -1.

See pytorch_lightning.Trainer documentation for more details at: https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api

Compared with CollieMinimalTrainer, PyTorch Lightning’s Trainer offers more flexibility and room for exploration, at the cost of longer training times (especially for larger models). We recommend starting all model exploration with this CollieTrainer (callbacks, automatic Lightning optimizations, etc.), finding a set of hyperparameters that work for your training job, and then using those in the simpler but faster CollieMinimalTrainer.
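Since CollieTrainer subclasses the Lightning Trainer, any of the Trainer keyword arguments documented below can be passed straight through. A hedged sketch of a more configured trainer; the epoch count, 16-bit precision, and logger directory are illustrative choices rather than Collie defaults, and ``train`` / ``val`` are assumed to already be defined:

from pytorch_lightning.loggers import TensorBoardLogger

from collie.model import CollieTrainer, MatrixFactorizationModel

model = MatrixFactorizationModel(train=train, val=val)

# standard ``pytorch_lightning.Trainer`` keyword arguments are forwarded as-is
trainer = CollieTrainer(model,
                        max_epochs=20,
                        precision=16,
                        logger=TensorBoardLogger(save_dir='lightning_logs'))
trainer.fit(model)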

Parameters
  • model (collie.model.BasePipeline) – Initialized Collie model

  • max_epochs (int) – Stop training once this number of epochs is reached

  • benchmark (bool) – If set to True, enables cudnn.benchmark

  • deterministic (bool) – If set to True, enables cudnn.deterministic

  • **kwargs (keyword arguments) – Additional keyword arguments to be sent to the Trainer class: https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api

  The original pytorch_lightning.Trainer docstring follows:

    Customize every aspect of training via flags.

    Args:

    accelerator: Supports passing different accelerator types (“cpu”, “gpu”, “tpu”, “ipu”, “hpu”, “mps”, “auto”)

    as well as custom accelerator instances.

    accumulate_grad_batches: Accumulates grads every k batches or as set up in the dict.

    Default: None.

    amp_backend: The mixed precision backend to use (“native” or “apex”).

    Default: 'native'.

    Deprecated since version v1.9: Setting amp_backend inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0. This argument was only relevant for apex which is being removed.

    amp_level: The optimization level to use (O1, O2, etc…). By default it will be set to “O2”

    if amp_backend is set to “apex”.

    Deprecated since version v1.8: Setting amp_level inside the Trainer is deprecated in v1.8.0 and will be removed in v2.0.0.

    auto_lr_find: If set to True, will make trainer.tune() run a learning rate finder,

    trying to optimize initial learning for faster convergence. trainer.tune() method will set the suggested learning rate in self.lr or self.learning_rate in the LightningModule. To use a different key set a string instead of True with the key name. Default: False.

    auto_scale_batch_size: If set to True, will initially run a batch size

    finder trying to find the largest batch size that fits into memory. The result will be stored in self.batch_size in the LightningModule or LightningDataModule depending on your setup. Additionally, can be set to either power that estimates the batch size through a power search or binsearch that estimates the batch size through a binary search. Default: False.

    auto_select_gpus: If enabled and gpus or devices is an integer, pick available

    gpus automatically. This is especially useful when GPUs are configured to be in “exclusive mode”, such that only one process at a time can access them. Default: False.

    Deprecated since version v1.9: auto_select_gpus has been deprecated in v1.9.0 and will be removed in v2.0.0. Please use the function find_usable_cuda_devices() instead.

    benchmark: The value (True or False) to set torch.backends.cudnn.benchmark to.

    The value for torch.backends.cudnn.benchmark set in the current session will be used (False if not manually set). If :paramref:`~pytorch_lightning.trainer.Trainer.deterministic` is set to True, this will default to False. Override to manually set a different value. Default: None.

    callbacks: Add a callback or list of callbacks.

    Default: None.

    enable_checkpointing: If True, enable checkpointing.

    It will configure a default ModelCheckpoint callback if there is no user-defined ModelCheckpoint in :paramref:`~pytorch_lightning.trainer.trainer.Trainer.callbacks`. Default: True.

    check_val_every_n_epoch: Perform a validation loop after every N training epochs. If None,

    validation will be done solely based on the number of training batches, requiring val_check_interval to be an integer value. Default: 1.

    default_root_dir: Default path for logs and weights when no logger/ckpt_callback passed.

    Default: os.getcwd(). Can be remote file paths such as s3://mybucket/path or ‘hdfs://path/’

    detect_anomaly: Enable anomaly detection for the autograd engine.

    Default: False.

    deterministic: If True, sets whether PyTorch operations must use deterministic algorithms.

    Set to "warn" to use deterministic algorithms whenever possible, throwing warnings on operations that don’t support deterministic mode (requires PyTorch 1.11+). If not set, defaults to False. Default: None.

    devices: Will be mapped to either gpus, tpu_cores, num_processes or ipus,

    based on the accelerator type.

    fast_dev_run: Runs n if set to n (int) else 1 if set to True batch(es)

    of train, val and test to find any bugs (ie: a sort of unit test). Default: False.

    gpus: Number of GPUs to train on (int) or which GPUs to train on (list or str) applied per node

    Default: None.

    Deprecated since version v1.7: gpus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='gpu' and devices=x instead.

    gradient_clip_val: The value at which to clip gradients. Passing gradient_clip_val=None disables

    gradient clipping. If using Automatic Mixed Precision (AMP), the gradients will be unscaled before. Default: None.

    gradient_clip_algorithm: The gradient clipping algorithm to use. Pass gradient_clip_algorithm="value"

    to clip by value, and gradient_clip_algorithm="norm" to clip by norm. By default it will be set to "norm".

    limit_train_batches: How much of training dataset to check (float = fraction, int = num_batches).

    Default: 1.0.

    limit_val_batches: How much of validation dataset to check (float = fraction, int = num_batches).

    Default: 1.0.

    limit_test_batches: How much of test dataset to check (float = fraction, int = num_batches).

    Default: 1.0.

    limit_predict_batches: How much of prediction dataset to check (float = fraction, int = num_batches).

    Default: 1.0.

    logger: Logger (or iterable collection of loggers) for experiment tracking. A True value uses

    the default TensorBoardLogger if it is installed, otherwise CSVLogger. False will disable logging. If multiple loggers are provided, local files (checkpoints, profiler traces, etc.) are saved in the log_dir of the first logger. Default: True.

    log_every_n_steps: How often to log within steps.

    Default: 50.

    enable_progress_bar: Whether to enable the progress bar by default.

    Default: True.

    profiler: To profile individual steps during training and assist in identifying bottlenecks.

    Default: None.

    overfit_batches: Overfit a fraction of training/validation data (float) or a set number of batches (int).

    Default: 0.0.

    plugins: Plugins allow modification of core behavior like ddp and amp, and enable custom lightning plugins.

    Default: None.

    precision: Double precision (64), full precision (32), half precision (16) or bfloat16 precision (bf16).

    Can be used on CPU, GPU, TPUs, HPUs or IPUs. Default: 32.

    max_epochs: Stop training once this number of epochs is reached. Disabled by default (None).

    If both max_epochs and max_steps are not specified, defaults to max_epochs = 1000. To enable infinite training, set max_epochs = -1.

    min_epochs: Force training for at least these many epochs. Disabled by default (None).

    max_steps: Stop training after this number of steps. Disabled by default (-1). If max_steps = -1

    and max_epochs = None, will default to max_epochs = 1000. To enable infinite training, set max_epochs to -1.

    min_steps: Force training for at least these number of steps. Disabled by default (None).

    max_time: Stop training after this amount of time has passed. Disabled by default (None).

    The time duration can be specified in the format DD:HH:MM:SS (days, hours, minutes, seconds), as a datetime.timedelta, or a dictionary with keys that will be passed to datetime.timedelta.

    num_nodes: Number of GPU nodes for distributed training.

    Default: 1.

    num_processes: Number of processes for distributed training with accelerator="cpu".

    Default: 1.

    Deprecated since version v1.7: num_processes has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='cpu' and devices=x instead.

    num_sanity_val_steps: Sanity check runs n validation batches before starting the training routine.

    Set it to -1 to run all batches in all validation dataloaders. Default: 2.

    reload_dataloaders_every_n_epochs: Set to a non-negative integer to reload dataloaders every n epochs.

    Default: 0.

    replace_sampler_ddp: Explicitly enables or disables sampler replacement. If not specified this

    will be toggled automatically when DDP is used. By default it will add shuffle=True for train sampler and shuffle=False for val/test sampler. If you want to customize it, you can set replace_sampler_ddp=False and add your own distributed sampler.

    resume_from_checkpoint: Path/URL of the checkpoint from which training is resumed. If there is

    no checkpoint file at the path, an exception is raised. If resuming from mid-epoch checkpoint, training will start from the beginning of the next epoch.

    Deprecated since version v1.5: resume_from_checkpoint is deprecated in v1.5 and will be removed in v2.0. Please pass the path to Trainer.fit(..., ckpt_path=...) instead.

    strategy: Supports different training strategies with aliases

    as well custom strategies. Default: None.

    sync_batchnorm: Synchronize batch norm layers between process groups/whole world.

    Default: False.

    tpu_cores: How many TPU cores to train on (1 or 8) / Single TPU to train on (1)

    Default: None.

    Deprecated since version v1.7: tpu_cores has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='tpu' and devices=x instead.

    ipus: How many IPUs to train on.

    Default: None.

    Deprecated since version v1.7: ipus has been deprecated in v1.7 and will be removed in v2.0. Please use accelerator='ipu' and devices=x instead.

    track_grad_norm: -1 no tracking. Otherwise tracks that p-norm. May be set to ‘inf’ infinity-norm. If using

    Automatic Mixed Precision (AMP), the gradients will be unscaled before logging them. Default: -1.

    val_check_interval: How often to check the validation set. Pass a float in the range [0.0, 1.0] to check

    after a fraction of the training epoch. Pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or during iteration-based training. Default: 1.0.

    enable_model_summary: Whether to enable model summarization by default.

    Default: True.

    move_metrics_to_cpu: Whether to force internal logged metrics to be moved to cpu.

    This can save some gpu memory, but can make training slower. Use with attention. Default: False.

    multiple_trainloader_mode: How to loop over the datasets when there are multiple train loaders.

    In ‘max_size_cycle’ mode, the trainer ends one epoch when the largest dataset is traversed, and smaller datasets reload when running out of their data. In ‘min_size’ mode, all the datasets reload when reaching the minimum length of datasets. Default: "max_size_cycle".

    inference_mode: Whether to use torch.inference_mode() or torch.no_grad() during

    evaluation (validate/test/predict).

property checkpoint_callback: Optional[Checkpoint]

The first ModelCheckpoint callback in the Trainer.callbacks list, or None if it doesn’t exist.

property checkpoint_callbacks: List[Checkpoint]

A list of all instances of ModelCheckpoint found in the Trainer.callbacks list.

property ckpt_path: Optional[str]

Set to the path/URL of a checkpoint loaded via fit(), validate(), test(), or predict(). None otherwise.

property current_epoch: int

The current epoch, updated after the epoch end hooks are run.

property default_root_dir: str

The default location to save artifacts of loggers, checkpoints etc.

It is used as a fallback if logger or checkpoint callback do not define specific save paths.

property device_ids: List[int]

List of device indexes per node.

property early_stopping_callback: Optional[EarlyStopping]

The first EarlyStopping callback in the Trainer.callbacks list, or None if it doesn’t exist.

property early_stopping_callbacks: List[EarlyStopping]

A list of all instances of EarlyStopping found in the Trainer.callbacks list.

property enable_validation: bool

Check if we should run validation during training.

property estimated_stepping_batches: Union[int, float]

Estimated stepping batches for the complete training inferred from DataLoaders, gradient accumulation factor and distributed setup.

Examples:

def configure_optimizers(self):
    optimizer = ...
    scheduler = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=1e-3, total_steps=self.trainer.estimated_stepping_batches
    )
    return [optimizer], [scheduler]
fit(model: LightningModule, train_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], Sequence[Sequence[DataLoader]], Sequence[Dict[str, DataLoader]], Dict[str, DataLoader], Dict[str, Dict[str, DataLoader]], Dict[str, Sequence[DataLoader]], LightningDataModule]] = None, val_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, datamodule: Optional[LightningDataModule] = None, ckpt_path: Optional[str] = None) None

Runs the full optimization routine.

Parameters
  • model – Model to fit.

  • train_dataloaders – A collection of torch.utils.data.DataLoader or a LightningDataModule specifying training samples. In the case of multiple dataloaders, please see this section.

  • val_dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

  • ckpt_path – Path/URL of the checkpoint from which training is resumed. Could also be one of two special keywords "last" and "hpc". If there is no checkpoint file at the path, an exception is raised. If resuming from mid-epoch checkpoint, training will start from the beginning of the next epoch.

  • datamodule – An instance of LightningDataModule.

property global_step: int

The number of optimizer steps taken (does not reset each epoch).

This includes multiple optimizers and TBPTT steps (if enabled).

property is_last_batch: bool

Whether trainer is executing the last batch.

property max_epochs

Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.

property model: Optional[Module]

The LightningModule, but possibly wrapped into DataParallel or DistributedDataParallel.

To access the pure LightningModule, use lightning_module() instead.

property num_devices: int

Number of devices the trainer uses per node.

predict(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, datamodule: Optional[LightningDataModule] = None, return_predictions: Optional[bool] = None, ckpt_path: Optional[str] = None) Optional[Union[List[Any], List[List[Any]]]]

Run inference on your data. This will call the model forward function to compute predictions. Useful to perform distributed and batched predictions. Logging is disabled in the predict hooks.

Parameters
  • model – The model to predict with.

  • dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying prediction samples.

  • datamodule – The datamodule with a predict_dataloader method that returns one or more dataloaders.

  • return_predictions – Whether to return predictions. True by default except when an accelerator that spawns processes is used (not supported).

  • ckpt_path – Either "best", "last", "hpc" or path to the checkpoint you wish to predict. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.

Returns

Returns a list of dictionaries, one for each provided dataloader containing their respective predictions.

See Lightning inference section for more.

property prediction_writer_callbacks: List[BasePredictionWriter]

A list of all instances of BasePredictionWriter found in the Trainer.callbacks list.

property progress_bar_callback: Optional[ProgressBarBase]

An instance of ProgressBarBase found in the Trainer.callbacks list, or None if one doesn’t exist.

reset_predict_dataloader(model: Optional[LightningModule] = None) None

Resets the predict dataloader and determines the number of batches.

Parameters

model – The LightningModule if called outside of the trainer scope.

reset_test_dataloader(model: Optional[LightningModule] = None) None

Resets the test dataloader and determines the number of batches.

Parameters

model – The LightningModule if called outside of the trainer scope.

reset_train_dataloader(model: Optional[LightningModule] = None) None

Resets the train dataloader and initialises required variables (number of batches, when to validate, etc.).

Parameters

model – The LightningModule if calling this outside of the trainer scope.

reset_val_dataloader(model: Optional[LightningModule] = None) None

Resets the validation dataloader and determines the number of batches.

Parameters

model – The LightningModule if called outside of the trainer scope.

save_checkpoint(filepath: Union[str, Path], weights_only: bool = False, storage_options: Optional[Any] = None) None

Runs routine to create a checkpoint.

Parameters
  • filepath – Path where checkpoint is saved.

  • weights_only – If True, will only save the model weights.

  • storage_options – parameter for how to save to storage, passed to CheckpointIO plugin

test(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, ckpt_path: Optional[str] = None, verbose: bool = True, datamodule: Optional[LightningDataModule] = None) List[Dict[str, float]]

Perform one evaluation epoch over the test set. It’s separated from fit to make sure you never run on your test set until you want to.

Parameters
  • model – The model to test.

  • dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying test samples.

  • ckpt_path – Either "best", "last", "hpc" or path to the checkpoint you wish to test. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.

  • verbose – If True, prints the test results.

  • datamodule – An instance of LightningDataModule.

Returns

List of dictionaries with metrics logged during the test phase, e.g., in model- or callback hooks like test_step(), test_epoch_end(), etc. The length of the list corresponds to the number of test dataloaders used.

tune(model: LightningModule, train_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], Sequence[Sequence[DataLoader]], Sequence[Dict[str, DataLoader]], Dict[str, DataLoader], Dict[str, Dict[str, DataLoader]], Dict[str, Sequence[DataLoader]], LightningDataModule]] = None, val_dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader]]] = None, datamodule: Optional[LightningDataModule] = None, scale_batch_size_kwargs: Optional[Dict[str, Any]] = None, lr_find_kwargs: Optional[Dict[str, Any]] = None, method: Literal['fit', 'validate', 'test', 'predict'] = 'fit') _TunerResult

Runs routines to tune hyperparameters before training.

Parameters
  • model – Model to tune.

  • train_dataloaders – A collection of torch.utils.data.DataLoader or a LightningDataModule specifying training samples. In the case of multiple dataloaders, please see this section.

  • val_dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying validation samples.

  • dataloaders – A torch.utils.data.DataLoader or a sequence of them specifying val/test/predict samples used for running tuner on validation/testing/prediction.

  • datamodule – An instance of LightningDataModule.

  • scale_batch_size_kwargs – Arguments for scale_batch_size()

  • lr_find_kwargs – Arguments for lr_find()

  • method – Method to run tuner on. It can be any of ("fit", "validate", "test", "predict").

validate(model: Optional[LightningModule] = None, dataloaders: Optional[Union[DataLoader, Sequence[DataLoader], LightningDataModule]] = None, ckpt_path: Optional[str] = None, verbose: bool = True, datamodule: Optional[LightningDataModule] = None) List[Dict[str, float]]

Perform one evaluation epoch over the validation set.

Parameters
  • model – The model to validate.

  • dataloaders – A torch.utils.data.DataLoader or a sequence of them, or a LightningDataModule specifying validation samples.

  • ckpt_path – Either "best", "last", "hpc" or path to the checkpoint you wish to validate. If None and the model instance was passed, use the current weights. Otherwise, the best model checkpoint from the previous trainer.fit call will be loaded if a checkpoint callback is configured.

  • verbose – If True, prints the validation results.

  • datamodule – An instance of LightningDataModule.

Returns

List of dictionaries with metrics logged during the validation phase, e.g., in model- or callback hooks like validation_step(), validation_epoch_end(), etc. The length of the list corresponds to the number of validation dataloaders used.

Non-PyTorch Lightning Trainer

class collie.model.CollieMinimalTrainer(model: BasePipeline, max_epochs: int = 10, gpus: Optional[Union[bool, int]] = None, logger: Optional[Logger] = None, early_stopping_patience: Optional[int] = 3, log_every_n_steps: int = 50, flush_logs_every_n_steps: int = 100, enable_model_summary: bool = True, weights_summary: Optional[str] = None, detect_anomaly: bool = False, terminate_on_nan: Optional[bool] = None, benchmark: bool = True, deterministic: bool = True, progress_bar_refresh_rate: Optional[int] = None, verbosity: Union[bool, int] = True)[source]

A more manual implementation of PyTorch Lightning’s Trainer class, attempting to port over the most commonly used Trainer arguments into a training loop with more transparency and faster training times.

Through extensive experimentation, we found that PyTorch Lightning’s Trainer was training Collie models about 25% slower than the more manual, typical PyTorch training loop boilerplate. Thus, we created the CollieMinimalTrainer, which shares a similar API to PyTorch Lightning’s Trainer object (both in instantiation and in usage), with a standard PyTorch training loop in its place.

While PyTorch Lightning’s Trainer offers more flexibility and customization through its additional Trainer arguments and callbacks, this class is designed as a way to train a model in production, where the focus is more on faster training times and less on hyperparameter tuning and R&D, for which one might instead opt to use PyTorch Lightning’s Trainer class.

Note that the arguments CollieMinimalTrainer accepts differ slightly from the ones CollieTrainer accepts, and defaults are also not guaranteed to stay equal as the two libraries evolve. Notable changes are:

  • If gpus > 1, only a single GPU will be used and any other GPUs will remain unused. Multi-GPU training is not supported in CollieMinimalTrainer at this time.

  • logger == True has no meaning in CollieMinimalTrainer - a default logger will NOT be created if set to True.

  • There is no way to pass in callbacks at this time. Instead, the most commonly used callbacks are implemented here manually, in favor of greater speed over customization. To use early stopping, set early_stopping_patience to an integer other than None.

from collie.model import CollieMinimalTrainer, MatrixFactorizationModel


# notice how similar the usage is to the standard ``CollieTrainer``
model = MatrixFactorizationModel(train=train)
trainer = CollieMinimalTrainer(model)
trainer.fit(model)
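A slightly more configured sketch, using only arguments from the signature above (the specific values are illustrative):

# train on the GPU if one is available, stop early after three epochs without
# improvement in loss, and limit printouts to the summary and epoch losses
trainer = CollieMinimalTrainer(model,
                               max_epochs=25,
                               gpus=True,
                               early_stopping_patience=3,
                               verbosity=1)
trainer.fit(model)
model.eval()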

Model results should NOT be significantly different whether trained with CollieTrainer or CollieMinimalTrainer.

If there is an argument present in CollieTrainer that you would like to see added to CollieMinimalTrainer for productionized model training, open an Issue or a PR on GitHub!

Parameters
  • model (collie.model.BasePipeline) – Initialized Collie model

  • max_epochs (int) – Stop training once this number of epochs is reached

  • gpus (bool or int) – Whether to train on the GPU (gpus == True or gpus > 0) or the CPU

  • logger (LightningLoggerBase) – Logger for experiment tracking. Set logger = None or logger = False to disable logging

  • early_stopping_patience (int) – Number of epochs of patience to have without any improvement in loss before stopping training early. Validation epoch loss will be used if there is a validation DataLoader present, else training epoch loss will be used. Set early_stopping_patience = None or early_stopping_patience = False to disable early stopping

  • log_every_n_steps (int) – How often to log within steps, if logger is enabled

  • flush_logs_every_n_steps (int) – How often to flush logs to disk, if logger is enabled

  • enable_model_summary (bool) – Whether to enable or disable the model summarization

  • weights_summary (str) – Deprecated, replaced with enable_model_summary. Prints summary of the weights when training begins

  • detect_anomaly (bool) –

    Context manager that enables anomaly detection for the autograd engine. This does two things:

    • Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function.

    • Any backward computation that generates a “nan” value will raise an error.

    Warning: This mode should be enabled only for debugging as the different tests will slow down your program execution.

  • terminate_on_nan (bool) – Deprecated, replaced with detect_anomaly. If set to True, will terminate training (by raising a ValueError) at the end of each training batch, if any of the parameters or the loss are NaN or +/- infinity

  • benchmark (bool) – If set to True, enables cudnn.benchmark

  • deterministic (bool) – If set to True, enables cudnn.deterministic

  • progress_bar_refresh_rate (int) – How often to refresh progress bar (in steps), if verbosity > 0

  • verbosity (Union[bool, int]) –

    How verbose to be in training.

    • 0 disables all printouts, including weights_summary

    • 1 prints weights_summary (if applicable) and epoch losses

    • 2 prints weights_summary (if applicable), epoch losses, and progress bars

fit(model: BasePipeline) None[source]

Runs the full optimization routine.

Parameters

model (collie.model.BasePipeline) – Initialized Collie model

property max_epochs

Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.

Model Templates

Base Collie Pipeline Template

class collie.model.BasePipeline(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, ExplicitInteractions, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, ExplicitInteractions, Interactions, InteractionsDataLoader]] = None, lr: float = 0.001, lr_scheduler_func: Optional[_LRScheduler] = None, weight_decay: float = 0.0, optimizer: Union[str, Optimizer] = 'adam', loss: Union[str, Callable[[...], tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)[source]

Bases: LightningModule

Base Pipeline model architectures to inherit from.

All subclasses MUST at least override the following methods:

  • _setup_model - Set up the model architecture

  • forward - Forward pass through a model

For item_item_similarity to work properly, all subclasses should also implement:

  • _get_item_embeddings - Returns item embeddings from the model on the device

For user_user_similarity to work properly, all subclasses should also implement:

  • _get_user_embeddings - Returns user embeddings from the model on the device

Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

  • **kwargs (keyword arguments) – All keyword arguments will be saved to self.hparams by default

calculate_loss(batch: Union[Tuple[Tuple[tensor, tensor], tensor], Tuple[tensor, tensor, tensor]]) tensor[source]

Given a batch of data, calculate the loss value.

Note that the data type (implicit or explicit) will be determined by the structure of the batch sent to this method. See the table below for expected data types:

__getitem__ Format    Expected Meaning                              Model Type
((X, Y), Z)           ((user IDs, item IDs), negative item IDs)     Implicit
(X, Y, Z)             (user IDs, item IDs, ratings)                 Explicit
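A brief sketch of the two batch structures described in the table, with made-up tensors (in practice these batches come from the Collie DataLoaders rather than being built by hand):

import torch

# implicit batch: ((user IDs, item IDs), negative item IDs)
implicit_batch = ((torch.tensor([0, 1]), torch.tensor([10, 11])),
                  torch.tensor([3, 7]))

# explicit batch: (user IDs, item IDs, ratings)
explicit_batch = (torch.tensor([0, 1]), torch.tensor([10, 11]),
                  torch.tensor([4.0, 2.5]))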

configure_optimizers() Union[Tuple[List[Optimizer], List[Optimizer]], Tuple[Optimizer, Optimizer], Optimizer][source]

Configure optimizers and learning rate schedulers to use in optimization.

This method will be called after setup.

If self.bias_optimizer is None, only a single optimizer will be returned. If there is a non-None class attribute for bias_optimizer, two optimizers will be created: one for all layers with the name ‘bias’ in them, and another for all other model parameters. The bias optimizer will be set with the same parameters as optimizer with the exception of the learning rate, which will be set to self.hparams.bias_lr.

abstract forward(users: tensor, items: tensor) tensor[source]

forward should be implemented in all subclasses.

get_item_predictions(user_id: int = 0, unseen_items_only: bool = False, sort_values: bool = True) Series[source]

Get predicted rankings/ratings for all items for a given user_id.

This method cannot be called for datasets stored in HDF5InteractionsDataLoader since data in this DataLoader is read in dynamically.

Parameters
  • user_id (int) –

  • unseen_items_only (bool) – Filter preds to only show predictions of unseen items not present in the training or validation datasets for that user_id. Note this requires both train_loader and val_loader to be 1) class-level attributes in the model and 2) DataLoaders with Interactions at their core (not HDF5Interactions). If you are loading in a model, these two attributes will need to be set manually, since datasets are NOT saved when saving the model

  • sort_values (bool) – Whether to sort recommendations by descending prediction probability or not

Returns

preds – Sorted values as predicted ratings for each item in the dataset with the index being the item ID

Return type

pd.Series
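A quick usage sketch, assuming ``model`` is trained and still has its ``train_loader`` / ``val_loader`` attributes attached (the user ID is arbitrary):

# top 10 unseen recommendations for user 42, highest predicted score first
preds = model.get_item_predictions(user_id=42,
                                    unseen_items_only=True,
                                    sort_values=True)
print(preds.head(10))  # pd.Series of predictions indexed by item ID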

get_user_predictions(item_id: int = 0, unseen_users_only: bool = False, sort_values: bool = True) Series[source]

User counterpart to get_item_predictions method.

Get predicted rankings/ratings for all users for a given item_id.

This method cannot be called for datasets stored in HDF5InteractionsDataLoader since data in this DataLoader is read in dynamically.

Parameters
  • item_id (int) –

  • unseen_users_only (bool) – Filter preds to only show predictions of unseen users not present in the training or validation datasets for that item_id. Note this requires both train_loader and val_loader to be 1) class-level attributes in the model and 2) DataLoaders with Interactions at their core (not HDF5Interactions). If you are loading in a model, these two attributes will need to be set manually, since datasets are NOT saved when saving the model

  • sort_values (bool) – Whether to sort recommendations by descending prediction probability

Returns

preds – Sorted values as predicted ratings for each user in the dataset with the index being the user ID

Return type

pd.Series

item_item_similarity(item_id: int) Series[source]

Get most similar item indices by cosine similarity.

Cosine similarity is computed with item embeddings from a trained model.

Parameters

item_id (int) – ID of the item to retrieve the most similar items for

Returns

sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID

Return type

pd.Series

Note

Returned array is unfiltered, so the first element, being the most similar item, will always be the item itself.
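
A usage sketch, assuming ``model`` is a trained Collie model (the first element is the queried item itself, so it is skipped below):

similar_items = model.item_item_similarity(item_id=123)

# skip the first element (the queried item) and keep the five most similar items
top_five_similar = similar_items.iloc[1:6]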

on_fit_start() None[source]

Method that runs at the very beginning of the fit process.

This method will be called after configure_optimizers.

save_model(filename: Union[str, Path] = 'model.pth') None[source]

Save the model’s state dictionary and hyperparameters.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. We only want to save the underlying PyTorch model (and not the Trainer object) so we don’t have to require PyTorch Lightning as a dependency when deploying a model.

  2. In the PyTorch Lightning v0.8.4 release, loading a saved model back in led to a RuntimeError stating the weights could not be loaded.

Parameters

filename (str or Path) – Filepath at which to save the state dictionary, ending in ‘.pth’
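
A sketch of saving a trained model and loading it back in later for inference (note that datasets and optimizers are not restored):

from collie.model import MatrixFactorizationModel

model.save_model('model.pth')

# later, e.g. in an inference environment
loaded_model = MatrixFactorizationModel(load_model_path='model.pth')
loaded_model.eval()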

train_dataloader() Union[ApproximateNegativeSamplingInteractionsDataLoader, InteractionsDataLoader][source]

Method that sets up training data as a PyTorch DataLoader.

This method will be called after on_fit_start.

training_epoch_end(outputs: Union[List[float], List[List[float]]]) None[source]

Method that contains a callback for logic to run after the training epoch ends.

This method will be called after training_step.

training_step(batch: Tuple[Tuple[tensor, tensor], tensor], batch_idx: int, optimizer_idx: Optional[int] = None) tensor[source]

Method that contains logic for what happens inside the training loop.

This method will be called after train_dataloader.

user_user_similarity(user_id: int) Series[source]

User counterpart to item_item_similarity method.

Get most similar user indices by cosine similarity.

Cosine similarity is computed with user embeddings from a trained model.

Parameters

user_id (int) – ID of the user to retrieve the most similar users for

Returns

sim_score_idxs – Sorted values as cosine similarity for each user in the dataset with the index being the user ID

Return type

pd.Series

Note

Returned array is unfiltered, so the first element, being the most similar user, will always be the seed user themself.

val_dataloader() Union[ApproximateNegativeSamplingInteractionsDataLoader, InteractionsDataLoader][source]

Method that sets up validation data as a PyTorch DataLoader.

This method will be called after training_step.

validation_epoch_end(outputs: List[float]) None[source]

Method that contains a callback for logic to run after the validation epoch ends.

This method will be called after validation_step.

validation_step(batch: Tuple[Tuple[tensor, tensor], tensor], batch_idx: int, optimizer_idx: Optional[int] = None) tensor[source]

Method that contains logic for what happens inside the validation loop.

This method will be called after val_dataloader.

Base Collie Multi-Stage Pipeline Template

class collie.model.MultiStagePipeline(train: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, val: Optional[Union[ApproximateNegativeSamplingInteractionsDataLoader, Interactions, InteractionsDataLoader]] = None, lr_scheduler_func: Optional[_LRScheduler] = None, weight_decay: float = 0.0, optimizer_config_list: Optional[List[Dict[str, Union[float, List[str], str]]]] = None, loss: Union[str, Callable[[...], tensor]] = 'hinge', metadata_for_loss: Optional[Dict[str, tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)[source]

Bases: BasePipeline

Multi-stage pipeline model architectures to inherit from.

This model template is intended for models that train in distinct stages, with a different optimizer optimizing each step. This allows model components to be optimized with a set order in mind, rather than all at once, such as with the BasePipeline.

Generally, multi-stage models will have a training protocol like:

from collie.model import CollieTrainer, SomeMultiStageModel

model = SomeMultiStageModel(train=train)
trainer = CollieTrainer(model)

# fit stage 1
trainer.fit(model)

# fit stage 2
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# fit stage 3
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# ... and so on, until...

model.eval()

Just like with BasePipeline, all subclasses MUST at least override the following methods:

  • _setup_model - Set up the model architecture

  • forward - Forward pass through a model

For item_item_similarity to work properly, all subclasses should also implement:

  • _get_item_embeddings - Returns item embeddings from the model

Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer_config_list (list of dict) –

    List of dictionaries containing the optimizer configurations for each stage’s optimizer(s). Each dictionary must contain the following keys:

    • lr: float

      Learning rate for the optimizer

    • optimizer: torch.optim or str

      Optimizer (or the string name of an optimizer) to use for this stage

    • parameter_prefix_list: List[str]

      List of string prefixes corresponding to the model components that should be optimized with this optimizer

    • stage: str

      Name of the stage

    This list must be ordered with the intended progression of stages; see the example sketched after the notes below.

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_for_loss_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata_for_loss. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

  • **kwargs (keyword arguments) – All keyword arguments will be saved to self.hparams by default

Notes

  • With each call of trainer.fit, the optimizer and learning rate scheduler state will reset.

  • When loading a multi-stage model in, the state will be set to the last possible state. This state may have a different forward calculation than other states.
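
A hypothetical optimizer_config_list sketch for a two-stage model (the stage names and parameter prefixes below are illustrative only and depend on the subclass's architecture):

optimizer_config_list = [
    {
        'lr': 1e-3,
        'optimizer': 'adam',
        'parameter_prefix_list': ['item_embeddings', 'item_biases'],
        'stage': 'stage_1',
    },
    {
        'lr': 1e-4,
        'optimizer': 'adam',
        'parameter_prefix_list': ['user_embeddings', 'user_biases'],
        'stage': 'stage_2',
    },
]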

advance_stage() None[source]

Advance the stage to the next one in self.hparams.stage_list.

configure_optimizers() Union[Tuple[List[Optimizer], List[Optimizer]], Tuple[Optimizer, Optimizer], Optimizer][source]

Configure optimizers and learning rate schedulers to use in optimization.

This method will be called after setup.

Creates an optimizer and learning rate scheduler for each configuration dictionary in self.hparams.optimizer_config_list.

optimizer_step(epoch: Optional[int] = None, batch_idx: Optional[int] = None, optimizer: Optional[Optimizer] = None, optimizer_idx: Optional[int] = None, optimizer_closure: Optional[Callable[[...], Any]] = None, **kwargs) None[source]

Overriding Lightning’s optimizer step function to only step the optimizer associated with the relevant stage.

See here for more details: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#optimizer-step

Parameters
  • epoch (int) – Current epoch

  • batch_idx (int) – Index of current batch

  • optimizer (torch.optim.Optimizer) – A PyTorch optimizer

  • optimizer_idx (int) – If you used multiple optimizers, this indexes into that list

  • optimizer_closure (Callable) – Closure for all optimizers

set_stage(stage: str) None[source]

Set the model to the desired stage.
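
A usage sketch, assuming the model defines a stage named 'stage_2' in its self.hparams.stage_list (stage names are defined by each subclass):

# jump directly to a later stage instead of advancing one stage at a time
model.set_stage('stage_2')
trainer.fit(model)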

Layers

Scaled Embedding

class collie.model.ScaledEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)[source]

Bases: Embedding

Embedding layer that initializes its weights using a truncated normal distribution.

reset_parameters() None[source]

Overriding default reset_parameters method.

Zero Embedding

class collie.model.ZeroEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[Tensor] = None, device=None, dtype=None)[source]

Bases: Embedding

Embedding layer with weights zeroed-out.

reset_parameters() None[source]

Overriding default reset_parameters method.
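
A minimal sketch of instantiating both layers, e.g. within a custom model's _setup_model (the sizes below are illustrative):

from collie.model import ScaledEmbedding, ZeroEmbedding

# ScaledEmbedding draws its initial weights from a truncated normal distribution,
# while ZeroEmbedding starts with all-zero weights, which makes it a natural
# choice for bias terms
item_embeddings = ScaledEmbedding(num_embeddings=1000, embedding_dim=30)
item_biases = ZeroEmbedding(num_embeddings=1000, embedding_dim=1)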