Models

Instantiating and Training a Collie Model

Collie provides several state-of-the-art recommendation model architectures, both non-hybrid and hybrid, depending on whether you would like to directly incorporate metadata into the model.

Since Collie utilizes PyTorch Lightning for model training, all models, by default:

  • Are compatible with CPU, GPU, multi-GPU, and TPU training

  • Allow for 16-bit precision

  • Integrate with common external loggers

  • Allow for extensive predefined and custom training callbacks

  • Are flexible with minimal boilerplate code

While each model’s API differs slightly, the general training procedure for each model looks like:

from collie.model import CollieTrainer, MatrixFactorizationModel


# assume you have ``interactions`` already defined and ready-to-go

model = MatrixFactorizationModel(interactions)

trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# now, ``model`` is ready to be used for inference, evaluation, etc.

model.save_model('model.pkl')
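
Once in eval mode, scores come from a plain forward pass through the model. A minimal sketch (the user and item IDs below are hypothetical):

import torch

# score items 0-4 for user 0 -- hypothetical IDs, purely for illustration
users = torch.tensor([0, 0, 0, 0, 0])
items = torch.tensor([0, 1, 2, 3, 4])

with torch.no_grad():
    scores = model(users, items)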

When we have side-data about items, this can be incorporated directly into the loss function of the model. For details on this, see Losses.
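
For instance, a hedged sketch, assuming ``genres`` is a ``(num_items x 1)`` tensor of categorical genre IDs:

model = MatrixFactorizationModel(
    train=interactions,
    metadata_for_loss={'genre': genres},
    metadata_for_loss_weights={'genre': 0.4},
)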

Hybrid Collie models also allow incorporating this side-data directly into the model. For an in-depth example of this, see Tutorials.

Creating a Custom Architecture

Collie not only houses incredible pre-defined architectures, but was also built with customization in mind. All Collie recommendation models are built as subclasses of the BasePipeline model, inheriting common loss calculation functions and model training boilerplate. This allows for a nice balance between flexibility and faster iteration.

While any method can be overridden with more architecture-specific implementations, at the bare minimum, each additional model must override:

  • _setup_model - Model architecture initialization

  • forward - Model step that accepts a batch of data of the form ((users, items), negative_items) and outputs a recommendation score for each item

If we wanted to create a custom model that performs a barebones matrix factorization calculation, it could be implemented in Collie as:

import torch

from collie.model import BasePipeline, CollieTrainer, ScaledEmbedding
from collie.utils import get_init_arguments


class SimpleModel(BasePipeline):
    def __init__(self, train, val, embedding_dim):
        """
        Initialize a simple model that is a subclass of ``BasePipeline``.

        Parameters
        ----------
        train: ``collie.interactions`` object
        val: ``collie.interactions`` object
        embedding_dim: int
            Number of latent factors to use for user and item embeddings

        """
        super().__init__(**get_init_arguments())

    def _setup_model(self, **kwargs):
        """Method for building model internals that rely on the data passed in."""
        self.user_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_users,
                                               embedding_dim=self.hparams.embedding_dim)
        self.item_embeddings = ScaledEmbedding(num_embeddings=self.hparams.num_items,
                                               embedding_dim=self.hparams.embedding_dim)

    def forward(self, users, items):
        """
        Forward pass through the model.

        Parameters
        ----------
        users: tensor, 1-d
            Array of user indices
        items: tensor, 1-d
            Array of item indices

        Returns
        -------
        preds: tensor, 1-d
            Predicted scores

        """
        return torch.mul(
            self.user_embeddings(users), self.item_embeddings(items)
        ).sum(dim=1)


# assume you have ``train`` and ``val`` already defined and ready-to-go

model = SimpleModel(train, val, embedding_dim=10)

trainer = CollieTrainer(model, max_epochs=10)
trainer.fit(model)
model.eval()

# now, ``model`` is ready to be used for inference, evaluation, etc.

model.save_model('model.pkl')

See the source code for the BasePipeline in Model Templates below for the calling order of each class method as well as initialization details for optimizers, schedulers, and more.

Standard Models

Matrix Factorization Model

class collie.model.MatrixFactorizationModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', bias_optimizer: Optional[Union[str, Callable]] = 'sgd', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for the matrix factorization model.

MatrixFactorizationModel models have an embedding layer for both users and items; the two embeddings are dot-producted together to output a single float ranking value.

Collie adds a twist to this incredibly popular framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the bias_lr and bias_optimizer input arguments for implementation details.
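
For example, the following configuration (spelling out the documented defaults) pairs a faster Adam optimizer for the embeddings with a slower SGD optimizer for the bias terms:

model = MatrixFactorizationModel(
    train=train,
    optimizer='adam',
    lr=1e-3,
    bias_optimizer='sgd',
    bias_lr=1e-2,
)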

All MatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, MatrixFactorizationModel

model = MatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • dropout_p (float) – Probability of dropout

  • sparse (bool) – Whether or not to treat embeddings as sparse tensors. If True, cannot use weight decay on the optimizer

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order. A sketch of such a callable is shown after this parameter list.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary
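
As a hedged illustration of the callable loss option described above, a barebones implicit hinge-style loss might look like the following (the extra keyword arguments are accepted but unused here):

import torch

def custom_hinge_loss(positive_predictions, negative_predictions, **kwargs):
    # ``kwargs`` may include ``num_items``, ``positive_items``,
    # ``negative_items``, ``metadata``, and ``metadata_weights``
    return torch.clamp(
        negative_predictions - positive_predictions + 1, min=0
    ).mean()

model = MatrixFactorizationModel(train=train, loss=custom_hinge_loss)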

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model.

Simple matrix factorization for a single user and item looks like:

``prediction = (user_embedding * item_embedding) + user_bias + item_bias``

If dropout is added, it is applied to the two embeddings and not the biases.
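
In PyTorch terms, this is roughly the following sketch (the variable names are illustrative, not Collie's exact internals):

# elementwise product summed over the embedding dimension, plus biases
preds = (user_embedding * item_embedding).sum(dim=1) + user_bias + item_bias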

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Multilayer Perceptron Matrix Factorization Model

class collie.model.MLPMatrixFactorizationModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, num_layers: int = 3, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', bias_optimizer: Optional[Union[str, Callable]] = 'sgd', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for the matrix factorization model with MLP layers instead of a final dot product (like in MatrixFactorizationModel).

MLPMatrixFactorizationModel models have an embedding layer for both users and items, which are concatenated and sent through an MLP to output a single float ranking value.

All MLPMatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, MLPMatrixFactorizationModel

model = MLPMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = MLPMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula embedding_dim * (2 ** (num_layers - current_layer_number)) (e.g. with embedding_dim = 30 and num_layers = 3, the layer input dimensions are 120, 60, and 30)

  • dropout_p (float) – Probability of dropout on the linear layers

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model, roughly:

`prediction = MLP(concatenate(user_embedding, item_embedding)) + user_bias + item_bias`

If dropout is added, it is applied to the two embeddings and not the biases.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Nonlinear Embeddings Matrix Factorization Model

class collie.model.NonlinearMatrixFactorizationModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, user_embedding_dim: int = 60, item_embedding_dim: int = 60, user_dense_layers_dims: List[float] = [48, 32], item_dense_layers_dims: List[float] = [48, 32], embedding_dropout_p: float = 0.0, dense_dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', bias_optimizer: Optional[Union[str, Callable]] = 'sgd', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for a nonlinear matrix factorization model.

NonlinearMatrixFactorizationModel models have an embedding layer for users and items. These are sent through separate dense networks that output more refined embeddings, which are then dot-producted to produce a single float ranking / rating.
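
Schematically, the forward computation resembles this sketch (the function and variable names are illustrative assumptions, not Collie's internals):

import torch

def nonlinear_mf_score_sketch(user_embedding, item_embedding,
                              user_dense_net, item_dense_net):
    """Illustrative: refine each embedding with its own dense network, then dot."""
    user_refined = user_dense_net(user_embedding)
    item_refined = item_dense_net(item_embedding)
    # dot product of the refined embeddings yields a single score per pair
    return (user_refined * item_refined).sum(dim=1)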

Collie adds a twist to this novel framework by allowing separate optimizers for embeddings and bias terms. With larger datasets and multiple epochs of training, a model might incorrectly learn to only optimize the bias terms for a quicker path towards a local loss minimum, essentially memorizing how popular each item is. By using a separate, slower optimizer for the bias terms (like Stochastic Gradient Descent), the model must prioritize optimizing the embeddings for meaningful, more varied recommendations, leading to a model that is able to achieve a much lower loss. See the documentation below for the bias_lr and bias_optimizer input arguments for implementation details.

All NonlinearMatrixFactorizationModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, NonlinearMatrixFactorizationModel

model = NonlinearMatrixFactorizationModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NonlinearMatrixFactorizationModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • user_embedding_dim (int) – Number of latent factors to use for user embeddings

  • item_embedding_dim (int) – Number of latent factors to use for item embeddings

  • user_dense_layers_dims (list) – List of linear layer dimensions to apply to the user embedding, starting with the dimension directly following user_embedding_dim

  • item_dense_layers_dims (list) – List of linear layer dimensions to apply to the item embedding, starting with the dimension directly following item_embedding_dim

  • embedding_dropout_p (float) – Probability of dropout on the embedding layers

  • dense_dropout_p (float) – Probability of dropout on the dense layers

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Collaborative Metric Learning Model

class collie.model.CollaborativeMetricLearningModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 30, sparse: bool = False, lr: float = 0.001, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, y_range: Optional[Tuple[float, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for the collaborative metric learning model.

CollaborativeMetricLearningModel models have an embedding layer for both users and items. A single float prediction is retrieved by taking the pairwise distance between the two embeddings.

The implementation here is meant to mimic the original implementation as specified here: https://arxiv.org/pdf/1803.00202.pdf [1]

All CollaborativeMetricLearningModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollaborativeMetricLearningModel, CollieTrainer

model = CollaborativeMetricLearningModel(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = CollaborativeMetricLearningModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • sparse (bool) – Whether or not to treat embeddings as sparse tensors. If True, cannot use weight decay on the optimizer

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • y_range (tuple) – Specify as (min, max) to apply a sigmoid layer to the output score of the model to get predicted ratings within the range of min and max

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[1] Campo, Miguel, et al. “Collaborative Metric Learning Recommendation System: Application to Theatrical Movie Releases.” ArXiv.org, 1 Mar. 2018, arxiv.org/abs/1803.00202.

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model, equivalent to:

`prediction = pairwise_distance(user_embedding, item_embedding)`
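
In PyTorch terms, this might be computed roughly as below (a sketch; the embedding variables are illustrative assumptions):

import torch

# ``user_embedding`` and ``item_embedding`` are (batch x embedding_dim) tensors
preds = torch.nn.functional.pairwise_distance(user_embedding, item_embedding)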

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Neural Collaborative Filtering (NeuCF)

class collie.model.NeuralCollaborativeFiltering(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: Optional[Union[str, Callable]] = None, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for a neural matrix factorization model.

NeuralCollaborativeFiltering models combine a collaborative filtering and a multilayer perceptron network in a single, unified model. The model consists of two sections: the first is a simple matrix factorization that calculates a score by multiplying together user and item embeddings (lookups through an embedding table); the second is an MLP network that feeds embeddings from a second set of embedding tables (one for users, one for items) through dense layers. Both output vectors are combined and sent through a final MLP layer before returning a single recommendation score.
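
Schematically, a hedged sketch of that two-branch combination (layer names and sizes here are illustrative, not Collie's exact internals):

import torch
from torch import nn

class NCFSketch(nn.Module):
    """Illustrative two-branch NCF combination, not Collie's implementation."""
    def __init__(self, num_users, num_items, embedding_dim=8):
        super().__init__()
        # one set of embeddings for the matrix factorization branch...
        self.user_mf = nn.Embedding(num_users, embedding_dim)
        self.item_mf = nn.Embedding(num_items, embedding_dim)
        # ...and a second set for the MLP branch
        self.user_mlp = nn.Embedding(num_users, embedding_dim)
        self.item_mlp = nn.Embedding(num_items, embedding_dim)
        self.mlp = nn.Sequential(nn.Linear(2 * embedding_dim, embedding_dim), nn.ReLU())
        self.final = nn.Linear(2 * embedding_dim, 1)

    def forward(self, users, items):
        mf_vector = self.user_mf(users) * self.item_mf(items)
        mlp_vector = self.mlp(torch.cat([self.user_mlp(users), self.item_mlp(items)], dim=1))
        # combine both branch outputs and reduce to a single score
        return self.final(torch.cat([mf_vector, mlp_vector], dim=1)).squeeze(1)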

The implementation here is meant to mimic the original implementation as specified here: https://arxiv.org/pdf/1708.05031.pdf [2]

All NeuralCollaborativeFiltering instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, NeuralCollaborativeFiltering

model = NeuralCollaborativeFiltering(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = NeuralCollaborativeFiltering(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula embedding_dim * (2 ** (num_layers - 1))

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula embedding_dim * (2 ** (num_layers - current_layer_number))

  • final_layer (str or function) –

    Final layer activation function. Available string options include:

    • 'sigmoid'

    • 'relu'

    • 'leaky_relu'

  • dropout_p (float) – Probability of dropout on the MLP layers

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[2] He, Xiangnan, et al. “Neural Collaborative Filtering.” Proceedings of the 26th International Conference on World Wide Web, 1 Apr. 2017, dl.acm.org/doi/10.1145/3038912.3052569.

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Deep Factorization Machine (DeepFM)

class collie.model.DeepFM(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, embedding_dim: int = 8, num_layers: int = 3, final_layer: Optional[Union[str, Callable]] = None, dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', bias_optimizer: Optional[Union[str, Callable]] = 'sgd', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for a deep factorization model.

DeepFM models combine a shallow factorization machine and a deep multilayer perceptron network in a single, unified model. The model consists of embedding tables for users and items, and the model output is the sum of 1) the factorization machine output of both embeddings (shallow) and 2) the MLP output for the concatenation of both embeddings (deep).
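
In rough pseudo-PyTorch, a hedged sketch of that sum (the argument names are illustrative assumptions):

import torch

def deepfm_score_sketch(user_embedding, item_embedding, mlp):
    """Illustrative DeepFM-style score: shallow FM term plus deep MLP term."""
    # shallow factorization-machine interaction between the two embeddings
    fm_term = (user_embedding * item_embedding).sum(dim=1)
    # deep term: concatenated embeddings through an MLP ending in one unit
    deep_term = mlp(torch.cat([user_embedding, item_embedding], dim=1)).squeeze(1)
    return fm_term + deep_term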

The implementation here is meant to mimic the original implementation as specified here: https://arxiv.org/pdf/1703.04247.pdf [3]

All DeepFM instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, DeepFM

model = DeepFM(train=train)
trainer = CollieTrainer(model)
trainer.fit(model)
model.eval()

# do evaluation as normal with ``model``

model.save_model(filename='model.pth')
new_model = DeepFM(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • embedding_dim (int) – Number of latent factors to use for the matrix factorization embedding table. For the MLP embedding table, the dimensionality will be calculated with the formula embedding_dim * (2 ** (num_layers - 1))

  • num_layers (int) – Number of MLP layers to apply. Each MLP layer will have its input dimension calculated with the formula embedding_dim * (2 ** (num_layers - current_layer_number))

  • final_layer (str or function) –

    Final layer activation function. Available string options include:

    • 'sigmoid'

    • 'relu'

    • 'leaky_relu'

  • dropout_p (float) – Probability of dropout

  • lr (float) – Model learning rate

  • bias_lr (float) – Bias terms learning rate. If ‘infer’, will set equal to lr

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • bias_optimizer (torch.optim or str) – Optimizer for the bias terms. This supports the same string options as optimizer, with the addition of infer, which will set the optimizer equal to optimizer. If bias_optimizer is None, only a single optimizer will be created for all model parameters

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

References

[3] Guo, Huifeng, et al. “DeepFM: A Factorization-Machine Based Neural Network for CTR Prediction.” ArXiv.org, 13 Mar. 2017, arxiv.org/abs/1703.04247.

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

Hybrid Models

Hybrid Pretrained Matrix Factorization Model

class collie.model.HybridPretrainedModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_metadata: Optional[Union[torch.tensor, pandas.core.frame.DataFrame, numpy.array]] = None, trained_model: Optional[collie.model.matrix_factorization.MatrixFactorizationModel] = None, metadata_layers_dims: Optional[List[int]] = None, combined_layers_dims: List[int] = [128, 64, 32], freeze_embeddings: bool = True, dropout_p: float = 0.0, lr: float = 0.001, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=True), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.base_pipeline.BasePipeline

Training pipeline for a hybrid recommendation model using a pre-trained matrix factorization model as its base.

HybridPretrainedModel models contain dense layers that process item metadata, concatenate this embedding with the user and item embeddings copied from a trained MatrixFactorizationModel, and send this concatenated embedding through more dense layers to output a single float ranking / rating. We add both user and item biases to this score before returning. This is the same architecture as the HybridModel, but using the embeddings from a pre-trained model rather than training them ourselves.

All HybridPretrainedModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, HybridPretrainedModel, MatrixFactorizationModel

# instantiate and fit a ``MatrixFactorizationModel`` as expected
mf_model = MatrixFactorizationModel(train=train)
mf_trainer = CollieTrainer(mf_model)
mf_trainer.fit(mf_model)

hybrid_model = HybridPretrainedModel(train=train,
                                     item_metadata=item_metadata,
                                     trained_model=mf_model)
hybrid_trainer = CollieTrainer(hybrid_model)
hybrid_trainer.fit(hybrid_model)
hybrid_model.eval()

# do evaluation as normal with ``hybrid_model``

hybrid_model.save_model(path='model')
new_hybrid_model = HybridPretrainedModel(load_model_path='model')

# do evaluation as normal with ``new_hybrid_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item’s metadata should be available when indexing a row by an item ID

  • trained_model (collie.model.MatrixFactorizationModel) – Previously trained MatrixFactorizationModel model to extract embeddings from

  • metadata_layers_dims (list) – List of linear layer dimensions to apply to the metadata only, starting with the dimension directly following metadata_features and ending with the dimension to concatenate with the item embeddings

  • combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of item_embeddings + metadata_features and ending with the dimension before the final linear layer to dimension 1

  • freeze_embeddings (bool) – When initializing the model, whether or not to freeze trained_model’s embeddings

  • dropout_p (float) – Probability of dropout

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the loss above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

forward(users: torch.tensor, items: torch.tensor) → torch.tensor[source]

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

freeze_embeddings() → None[source]

Remove gradient requirement from the embeddings.

load_from_hybrid_model(hybrid_model) → None[source]

Copy hyperparameters and state dictionary from an existing HybridPretrainedModel instance.

This is particularly useful for creating another PyTorch Lightning trainer object to fine-tune copied-over embeddings from a MatrixFactorizationModel instance.

Parameters

hybrid_model (collie.model.HybridPretrainedModel) – HybridPretrainedModel containing hyperparameters and state dictionary to copy over

save_model(path: Union[str, pathlib.Path] = 'data/model', overwrite: bool = False) → None[source]

Save the model’s state dictionary, hyperparameters, and item metadata.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. Properly saving and loading a model requires the Trainer object, meaning that all deployed models would require Lightning to run, which is not actually needed for inference.

  2. In the v0.8.4 release, loading a model back in leads to a RuntimeError about being unable to load in weights.

Parameters
  • path (str or Path) – Directory path to save model and data files

  • overwrite (bool) – Whether or not to overwrite existing data

unfreeze_embeddings() → None[source]

Require gradients for the embeddings.

Multi-Stage Models

Cold Start Matrix Factorization Model

class collie.model.ColdStartModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_buckets: Optional[Iterable[int]] = None, embedding_dim: int = 30, dropout_p: float = 0.0, sparse: bool = False, item_buckets_stage_lr: float = 0.001, no_buckets_stage_lr: float = 0.001, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=False), weight_decay: float = 0.0, item_buckets_stage_optimizer: Union[str, Callable] = 'adam', no_buckets_stage_optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)[source]

Bases: collie.model.base.multi_stage_pipeline.MultiStagePipeline

Training pipeline for a matrix factorization model optimized for the cold-start problem.

Many recommendation models suffer from the cold start problem, when a model is unable to provide adequate recommendations for a new item until enough users have interacted with it. But, if users only interact with recommended items, the item will never be recommended, and thus the model will never improve recommendations for this item.

The ColdStartModel attempts to bypass this by first limiting the item space down to “item buckets”, training a model with buckets as the item space, then expanding back out to all items. During this expansion, each bucket’s learned embedding is copied over to its corresponding items, providing a smarter initialization than a random one for both existing and new items. Now, when we have a new item, we can use its bucket’s embedding as an initialization in the model.
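Conceptually, this expansion step amounts to indexing the learned bucket embeddings by each item’s bucket ID. A minimal sketch under assumed names (``bucket_embeddings`` learned in the first stage, ``item_buckets`` mapping item IDs to bucket IDs); this is illustrative, not Collie’s exact internals:

import torch

# hypothetical setup: 5 items, 3 buckets, 30-dimensional embeddings
item_buckets = torch.tensor([1, 0, 2, 2, 1])
bucket_embeddings = torch.nn.Embedding(num_embeddings=3, embedding_dim=30)  # learned in stage 1
item_embeddings = torch.nn.Embedding(num_embeddings=5, embedding_dim=30)

with torch.no_grad():
    # each item starts from its bucket's learned representation
    item_embeddings.weight.copy_(bucket_embeddings.weight[item_buckets])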

The stages in a ColdStartModel are, in order:

  1. item_buckets

    Matrix factorization with item embeddings and bias terms bucketed by the item_buckets argument. Unlike in the next stage, many items may map to a single bucket, and these items will share the same embedding and bias representation. The model should learn user preferences for buckets in this stage.

  2. no_buckets

    Standard matrix factorization as we do in MatrixFactorizationModel. However, upon advancing to this stage, the item embeddings are initialized with their bucketed embedding value (and same for biases). Not only does this provide better initialization than random, but allows new items to be incorporated into the model without training by using their item bucket embedding and bias terms at prediction time.

Note that the cold start problem exists for new users as well, but this functionality will be added to this model in a future version.

All ColdStartModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import ColdStartModel, CollieTrainer

# instantiate and fit a ``ColdStartModel`` as expected
model = ColdStartModel(train=train, item_buckets=item_buckets)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``no_buckets``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

# get item-item recommendations for a new item by using the bucket ID, Z
similar_items = model.item_bucket_item_similarity(item_bucket_id=Z)

model.save_model(filename='model.pth')
new_model = ColdStartModel(load_model_path='model.pth')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_buckets (torch.tensor, 1-d) –

    An ordered iterable containing the bucket ID for each item ID. For example, if you have five films and are going to bucket by primary genre, and your data looks like:

    • Item ID: 0, Genre ID: 1

    • Item ID: 1, Genre ID: 0

    • Item ID: 2, Genre ID: 2

    • Item ID: 3, Genre ID: 2

    • Item ID: 4, Genre ID: 1

    Then item_buckets would be: [1, 0, 2, 2, 1]

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • dropout_p (float) – Probability of dropout

  • item_buckets_stage_lr (float) – Learning rate for user parameters and item bucket parameters optimized during the item_buckets stage

  • no_buckets_stage_lr (float) – Learning rate for user parameters and item parameters optimized during the no_buckets stage

  • item_buckets_stage_optimizer (torch.optim or str) –

    Optimizer used for user parameters and item bucket parameters optimized during the item_buckets stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • no_buckets_stage_optimizer (torch.optim or str) –

    Optimizer used for user parameters and item parameters optimized during the no_buckets stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

Notes

The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating models and when saving and loading them.

forward(users: torch.tensor, items: torch.tensor) → torch.tensor

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

item_bucket_item_similarity(item_bucket_id: int) → pandas.core.series.Series

Get most similar item indices to an item bucket by cosine similarity.

Cosine similarity is computed with item and item bucket embeddings from a trained model.

Parameters

item_bucket_id (int) –

Returns

sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID

Return type

pd.Series
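For example, hypothetical usage for a new item assigned to bucket ID 2 might look like:

# most similar existing items to an illustrative bucket ID
sim_scores = model.item_bucket_item_similarity(item_bucket_id=2)
top_five_item_ids = sim_scores.index[:5]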

set_stage(stage: str) → None

Set the stage for the model.

Hybrid Matrix Factorization Model

class collie.model.HybridModel(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, item_metadata: Optional[Union[torch.tensor, pandas.core.frame.DataFrame, numpy.array]] = None, embedding_dim: int = 30, metadata_layers_dims: Optional[List[int]] = None, combined_layers_dims: List[int] = [128, 64, 32], dropout_p: float = 0.0, lr: float = 0.001, bias_lr: Optional[Union[float, str]] = 0.01, metadata_only_stage_lr: float = 0.001, all_stage_lr: float = 0.0001, lr_scheduler_func: Optional[Callable] = functools.partial(<class 'torch.optim.lr_scheduler.ReduceLROnPlateau'>, patience=1, verbose=False), weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', bias_optimizer: Optional[Union[str, Callable]] = 'sgd', metadata_only_stage_optimizer: Union[str, Callable] = 'adam', all_stage_optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None)

Bases: collie.model.base.multi_stage_pipeline.MultiStagePipeline

Training pipeline for a multi-stage hybrid recommendation model.

HybridModel models contain dense layers that process item metadata, concatenate the resulting metadata embedding with user and item embeddings, and send this concatenated embedding through more dense layers to output a single float ranking / rating. We add both user and item biases to this score before returning. This is the same architecture as the HybridPretrainedModel, but here we train the embeddings ourselves rather than pulling them from a pre-trained model.
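A schematic sketch of that forward pass is below; the layer and variable names are illustrative, not Collie’s exact internals:

import torch

def hybrid_forward_sketch(user_embedding, item_embedding, item_metadata,
                          metadata_layers, combined_layers, user_bias, item_bias):
    # optional MLP over the raw item metadata
    metadata_embedding = item_metadata
    for layer in metadata_layers:
        metadata_embedding = torch.relu(layer(metadata_embedding))

    # concatenate user embedding, item embedding, and metadata embedding
    combined = torch.cat([user_embedding, item_embedding, metadata_embedding], dim=1)
    for layer in combined_layers[:-1]:
        combined = torch.relu(layer(combined))

    # final linear layer maps to a single float, then biases are added
    score = combined_layers[-1](combined).squeeze(1)
    return score + user_bias + item_bias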

The stages in a HybridModel are, in order:

  1. matrix_factorization

    Matrix factorization exactly as we do in MatrixFactorizationModel. In this stage, metadata is NOT incorporated into the model.

  2. metadata_only

    User and item embedding terms are frozen, and the MLP layers for the metadata (if specified) and combined embedding-metadata data are optimized.

  3. all

    Embedding and MLP layers are all optimized together, including those for metadata.

All HybridModel instances are subclasses of the LightningModule class provided by PyTorch Lightning. This means to train a model, you will need a collie.model.CollieTrainer object, but the model can be saved and loaded without this Trainer instance. Example usage may look like:

from collie.model import CollieTrainer, HybridModel

# instantiate and fit a ``HybridModel`` as expected
model = HybridModel(train=train, item_metadata=item_metadata)
trainer = CollieTrainer(model)
trainer.fit(model)

# train for X more epochs on the next stage, ``metadata_only``
trainer.max_epochs += X
model.advance_stage()
trainer.fit(model)

# train for Y more epochs on the next stage, ``all``
trainer.max_epochs += Y
model.advance_stage()
trainer.fit(model)

model.eval()

# do evaluation as normal with ``model``

model.save_model(path='model')
new_model = HybridModel(load_model_path='model')

# do evaluation as normal with ``new_model``
Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • item_metadata (torch.tensor, pd.DataFrame, or np.array, 2-dimensional) – The shape of the item metadata should be (num_items x metadata_features), and each item’s metadata should be available when indexing a row by an item ID

  • embedding_dim (int) – Number of latent factors to use for user and item embeddings

  • metadata_layers_dims (list) – List of linear layer dimensions to apply to the metadata only, starting with the dimension directly following metadata_features and ending with the dimension to concatenate with the item embeddings

  • combined_layers_dims (list) – List of linear layer dimensions to apply to the concatenated item embeddings and item metadata, starting with the dimension directly following the shape of item_embeddings + metadata_features and ending with the dimension before the final linear layer to dimension 1

  • dropout_p (float) – Probability of dropout

  • metadata_only_stage_lr (float) – Learning rate for metadata and combined layers optimized during the metadata_only stage

  • all_stage_lr (float) – Learning rate for all model parameters optimized during the all stage

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    Optimizer used for embeddings and bias terms (if bias_optimizer is None) during the matrix_factorization stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • metadata_only_stage_optimizer (torch.optim or str) –

    Optimizer used for metadata and combined layers during the metadata_only stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • all_stage_optimizer (torch.optim or str) –

    Optimizer used for all model parameters during the all stage. If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adam' (for torch.optim.Adam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

Notes

The forward calculation will differ depending on the stage that is set. Keep this in mind when evaluating models and when saving and loading them.

forward(users: torch.tensor, items: torch.tensor) → torch.tensor

Forward pass through the model.

Parameters
  • users (tensor, 1-d) – Array of user indices

  • items (tensor, 1-d) – Array of item indices

Returns

preds – Predicted ratings or rankings

Return type

tensor, 1-d

save_model(path: Union[str, pathlib.Path] = 'data/model', overwrite: bool = False) → None

Save the model’s state dictionary, hyperparameters, and item metadata.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. To properly save and load a model requires the Trainer object, meaning that all deployed models will require Lightning to run the model, which is not actually needed for inference.

  2. In the v0.8.4 release, loading a model back in raises a RuntimeError stating it is unable to load in the weights.

Parameters
  • path (str or Path) – Directory path to save model and data files

  • overwrite (bool) – Whether or not to overwrite existing data

Trainers

PyTorch Lightning Trainer

class collie.model.CollieTrainer(model: torch.nn.modules.module.Module, max_epochs: int = 10, benchmark: bool = True, deterministic: bool = True, **kwargs)

Bases: pytorch_lightning.trainer.trainer.Trainer

Helper wrapper class around PyTorch Lightning’s Trainer class.

Specifically, this wrapper:

  • Checks if a model has a validation dataset passed in (under the val_loader attribute) and, if not, sets num_sanity_val_steps to 0 and check_val_every_n_epoch to sys.maxsize.

  • Checks if a GPU is available and, if gpus is None, sets gpus = -1.

See pytorch_lightning.Trainer documentation for more details at: https://pytorch-lightning.readthedocs.io/en/latest/common/trainer.html#trainer-class-api
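A rough sketch of the two checks above (illustrative only, assuming ``model`` and the Trainer keyword arguments ``kwargs`` are in scope; not Collie’s exact implementation):

import sys

import torch

# if the model has no validation data, effectively disable validation
if getattr(model, 'val_loader', None) is None:
    kwargs['num_sanity_val_steps'] = 0
    kwargs['check_val_every_n_epoch'] = sys.maxsize

# default to all available GPUs when none are specified
if kwargs.get('gpus') is None and torch.cuda.is_available():
    kwargs['gpus'] = -1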

Compared with CollieMinimalTrainer, PyTorch Lightning’s Trainer offers more flexibility and room for exploration (callbacks, automatic Lightning optimizations, etc.), at the cost of higher training time, especially for larger models. We recommend starting all model exploration with this CollieTrainer, finding a set of hyperparameters that works for your training job, then using those hyperparameters with the simpler but faster CollieMinimalTrainer.

Parameters
  • model (torch.nn.modules.module.Module) – Initialized Collie model

  • max_epochs (int) – Stop training once this number of epochs is reached

  • benchmark (bool) – If set to True, enables cudnn.benchmark

  • deterministic (bool) – If set to True, enables cudnn.deterministic

  • **kwargs (keyword arguments) – Additional keyword arguments passed through to PyTorch Lightning’s Trainer class

property checkpoint_callback: Optional[pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint]

The first ModelCheckpoint callback in the Trainer.callbacks list, or None if it doesn’t exist.

property checkpoint_callbacks: List[pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint]

A list of all instances of ModelCheckpoint found in the Trainer.callbacks list.

configure_schedulers(schedulers: list, monitor: Optional[str], is_manual_optimization: bool) → List[Dict[str, Any]]

Convert each scheduler into dict structure with relevant information

configure_sharded_model(model: pytorch_lightning.core.lightning.LightningModule) → None

Called at the beginning of fit (train + validate), validate, test, predict, or tune.

property default_root_dir: str

The default location to save artifacts of loggers, checkpoints etc. It is used as a fallback if logger or checkpoint callback do not define specific save paths.

property disable_validation: bool

Check if validation is disabled during training.

property early_stopping_callback: Optional[pytorch_lightning.callbacks.early_stopping.EarlyStopping]

The first EarlyStopping callback in the Trainer.callbacks list, or None if it doesn’t exist.

property early_stopping_callbacks: List[pytorch_lightning.callbacks.early_stopping.EarlyStopping]

A list of all instances of EarlyStopping found in the Trainer.callbacks list.

property enable_validation: bool

Check if we should run validation during training.

fit(model: pytorch_lightning.core.lightning.LightningModule, train_dataloader: Optional[Any] = None, val_dataloaders: Optional[Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]]] = None, datamodule: Optional[pytorch_lightning.core.datamodule.LightningDataModule] = None) → None

Runs the full optimization routine.

Parameters
  • model – Model to fit.

  • train_dataloader – Either a single PyTorch DataLoader or a collection of these (list, dict, nested lists and dicts). In the case of multiple dataloaders, please see the PyTorch Lightning documentation on multiple dataloaders

  • val_dataloaders – Either a single Pytorch Dataloader or a list of them, specifying validation samples. If the model has a predefined val_dataloaders method this will be skipped

  • datamodule – An instance of LightningDataModule.

classmethod get_deprecated_arg_names() → List

Returns a list with deprecated Trainer arguments.

property max_epochs

Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.

property model: torch.nn.modules.module.Module

The LightningModule, but possibly wrapped into DataParallel or DistributedDataParallel. To access the pure LightningModule, use lightning_module() instead.

on_after_backward()

Called after loss.backward() and before optimizers do anything.

on_batch_end()

Called when the training batch ends.

on_batch_start()

Called when the training batch begins.

on_before_accelerator_backend_setup(model: pytorch_lightning.core.lightning.LightningModule) → None

Called at the beginning of fit (train + validate), validate, test, predict, or tune.

on_before_zero_grad(optimizer)

Called after optimizer.step() and before optimizer.zero_grad().

on_epoch_end()

Called when either of train/val/test epoch ends.

on_epoch_start()

Called when either of train/val/test epoch begins.

on_fit_end()

Called when fit ends.

on_fit_start()

Called when fit begins.

on_init_end()

Called when the trainer initialization ends, model has not yet been set.

on_init_start()

Called when the trainer initialization begins, model has not yet been set.

on_keyboard_interrupt()

Called when the training is interrupted by KeyboardInterrupt.

on_load_checkpoint(checkpoint)

Called when loading a model checkpoint.

on_predict_batch_end(outputs: Union[torch.Tensor, Dict[str, Any]], batch: Any, batch_idx: int, dataloader_idx: int) → None

Called when the predict batch ends.

on_predict_batch_start(batch: Any, batch_idx: int, dataloader_idx: int) → None

Called when the predict batch begins.

on_predict_end() → None

Called when predict ends.

on_predict_epoch_end(outputs: List[Any]) → None

Called when the epoch ends.

on_predict_epoch_start() → None

Called when the epoch begins.

on_predict_start() → None

Called when predict begins.

on_pretrain_routine_end() → None

Called when the pre-train routine ends.

on_pretrain_routine_start() → None

Called when the pre-train routine begins.

on_sanity_check_end()

Called when the validation sanity check ends.

on_sanity_check_start()

Called when the validation sanity check starts.

on_save_checkpoint(checkpoint: Dict[str, Any]) → Dict[Type, dict]

Called when saving a model checkpoint.

on_test_batch_end(outputs: Union[torch.Tensor, Dict[str, Any]], batch, batch_idx, dataloader_idx)

Called when the test batch ends.

on_test_batch_start(batch, batch_idx, dataloader_idx)

Called when the test batch begins.

on_test_end()

Called when the test ends.

on_test_epoch_end()

Called when the test epoch ends.

on_test_epoch_start()

Called when the epoch begins.

on_test_start()

Called when the test begins.

on_train_batch_end(outputs: Union[torch.Tensor, Dict[str, Any]], batch, batch_idx, dataloader_idx)

Called when the training batch ends.

on_train_batch_start(batch, batch_idx, dataloader_idx)

Called when the training batch begins.

on_train_end()

Called when the train ends.

on_train_epoch_end(outputs: List[Union[torch.Tensor, Dict[str, Any]]])

Called when the epoch ends.

Parameters

outputs – List of outputs on each train epoch

on_train_epoch_start()

Called when the epoch begins.

on_train_start()

Called when the train begins.

on_validation_batch_end(outputs: Union[torch.Tensor, Dict[str, Any]], batch, batch_idx, dataloader_idx)

Called when the validation batch ends.

on_validation_batch_start(batch, batch_idx, dataloader_idx)

Called when the validation batch begins.

on_validation_end()

Called when the validation loop ends.

on_validation_epoch_end()

Called when the validation epoch ends.

on_validation_epoch_start()

Called when the epoch begins.

on_validation_start()

Called when the validation loop begins.

predict(model: Optional[pytorch_lightning.core.lightning.LightningModule] = None, dataloaders: Optional[Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]]] = None, datamodule: Optional[pytorch_lightning.core.datamodule.LightningDataModule] = None, return_predictions: Optional[bool] = None) → Optional[Union[List[Any], List[List[Any]]]]

Separates from fit to make sure you never run on your predictions set until you want to. This will call the model forward function to compute predictions.

Parameters
  • model – The model to predict with.

  • dataloaders – Either a single PyTorch DataLoader or a list of them, specifying inference samples.

  • datamodule – The datamodule with a predict_dataloader method that returns one or more dataloaders.

  • return_predictions – Whether to return predictions. True by default except when an accelerator that spawns processes is used (not supported).

Returns

Returns a list of dictionaries, one for each provided dataloader containing their respective predictions.

property prediction_writer_callbacks: List[pytorch_lightning.callbacks.prediction_writer.BasePredictionWriter]

A list of all instances of BasePredictionWriter found in the Trainer.callbacks list.

property progress_bar_dict: dict

Read-only for progress bar metrics.

request_dataloader(model: pytorch_lightning.core.lightning.LightningModule, stage: str) → torch.utils.data.dataloader.DataLoader

Handles downloading data in the GPU or TPU case.

Parameters
  • model – The current LightningModule

  • stage – The stage for which the dataloader is requested

Returns

The dataloader

reset_predict_dataloader(model) → None

Resets the predict dataloader and determines the number of batches.

Parameters

model – The current LightningModule

reset_test_dataloader(model) → None

Resets the test dataloader and determines the number of batches.

Parameters

model – The current LightningModule

reset_train_dataloader(model: pytorch_lightning.core.lightning.LightningModule) → None

Resets the train dataloader and initialises required variables (number of batches, when to validate, etc.).

Parameters

model – The current LightningModule

reset_val_dataloader(model: pytorch_lightning.core.lightning.LightningModule) → None

Resets the validation dataloader and determines the number of batches.

Parameters

model – The current LightningModule

setup(model: pytorch_lightning.core.lightning.LightningModule, stage: Optional[str]) → None

Called at the beginning of fit (train + validate), validate, test, predict, or tune.

teardown(stage: Optional[str] = None) → None

Called at the end of fit (train + validate), validate, test, predict, or tune.

test(model: Optional[pytorch_lightning.core.lightning.LightningModule] = None, test_dataloaders: Optional[Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]]] = None, ckpt_path: Optional[str] = 'best', verbose: bool = True, datamodule: Optional[pytorch_lightning.core.datamodule.LightningDataModule] = None) → List[Dict[str, float]]

Perform one evaluation epoch over the test set. It’s separated from fit to make sure you never run on your test set until you want to.

Parameters
  • model – The model to test.

  • test_dataloaders – Either a single PyTorch DataLoader or a list of them, specifying test samples.

  • ckpt_path – Either best or path to the checkpoint you wish to test. If None, use the current weights of the model. When the model is given as argument, this parameter will not apply.

  • verbose – If True, prints the test results.

  • datamodule – An instance of LightningDataModule.

Returns

Returns a list of dictionaries, one for each test dataloader containing their respective metrics.

tune(model: pytorch_lightning.core.lightning.LightningModule, train_dataloader: Optional[torch.utils.data.dataloader.DataLoader] = None, val_dataloaders: Optional[Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]]] = None, datamodule: Optional[pytorch_lightning.core.datamodule.LightningDataModule] = None, scale_batch_size_kwargs: Optional[Dict[str, Any]] = None, lr_find_kwargs: Optional[Dict[str, Any]] = None) → Dict[str, Optional[Union[int, pytorch_lightning.tuner.lr_finder._LRFinder]]]

Runs routines to tune hyperparameters before training.

Parameters
  • model – Model to tune.

  • train_dataloader – A Pytorch DataLoader with training samples. If the model has a predefined train_dataloader method this will be skipped.

  • val_dataloaders – Either a single Pytorch Dataloader or a list of them, specifying validation samples. If the model has a predefined val_dataloaders method this will be skipped

  • datamodule – An instance of LightningDataModule.

  • scale_batch_size_kwargs – Arguments for scale_batch_size()

  • lr_find_kwargs – Arguments for lr_find()

validate(model: Optional[pytorch_lightning.core.lightning.LightningModule] = None, val_dataloaders: Optional[Union[torch.utils.data.dataloader.DataLoader, List[torch.utils.data.dataloader.DataLoader]]] = None, ckpt_path: Optional[str] = 'best', verbose: bool = True, datamodule: Optional[pytorch_lightning.core.datamodule.LightningDataModule] = None) → List[Dict[str, float]]

Perform one evaluation epoch over the validation set.

Parameters
  • model – The model to validate.

  • val_dataloaders – Either a single PyTorch DataLoader or a list of them, specifying validation samples.

  • ckpt_path – Either best or path to the checkpoint you wish to validate. If None, use the current weights of the model. When the model is given as argument, this parameter will not apply.

  • verbose – If True, prints the validation results.

  • datamodule – An instance of LightningDataModule.

Returns

The dictionary with final validation results returned by validation_epoch_end. If validation_epoch_end is not defined, the output is a list of the dictionaries returned by validation_step.

property weights_save_path: str

The default root location to save weights (checkpoints), e.g., when the ModelCheckpoint does not define a file path.

Non-PyTorch Lightning Trainer

class collie.model.CollieMinimalTrainer(model: collie.model.base.base_pipeline.BasePipeline, max_epochs: int = 10, gpus: Optional[Union[bool, int]] = None, logger: Optional[pytorch_lightning.loggers.base.LightningLoggerBase] = None, early_stopping_patience: Optional[int] = 3, log_every_n_steps: int = 50, flush_logs_every_n_steps: int = 100, weights_summary: Optional[str] = 'top', terminate_on_nan: bool = False, benchmark: bool = True, deterministic: bool = True, progress_bar_refresh_rate: Optional[int] = None, verbosity: Union[bool, int] = True)

A more manual implementation of PyTorch Lightning’s Trainer class, attempting to port over the most commonly used Trainer arguments into a training loop with more transparency and faster training times.

Through extensive experimentation, we found that PyTorch Lightning’s Trainer was training Collie models about 25% slower than the more manual, typical PyTorch training loop boilerplate. Thus, we created the CollieMinimalTrainer, which shares a similar API to PyTorch Lightning’s Trainer object (both in instantiation and in usage), with a standard PyTorch training loop in its place.

While PyTorch Lightning’s Trainer offers more flexibility and customization through additional Trainer arguments and callbacks, we designed this class as a way to train a model in production, where faster training times matter more than hyperparameter tuning and R&D; for the latter, one might instead opt for PyTorch Lightning’s Trainer class.
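For reference, the kind of standard PyTorch training loop that CollieMinimalTrainer reimplements looks roughly like the sketch below, where ``train_loader``, ``loss_fn``, and ``max_epochs`` are placeholder names rather than Collie’s API:

import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(max_epochs):
    model.train()
    # implicit-feedback batch format: ((user IDs, item IDs), negative item IDs)
    for (users, items), negative_items in train_loader:
        optimizer.zero_grad()
        positive_preds = model(users, items)
        negative_preds = model(users, negative_items)
        loss = loss_fn(positive_preds, negative_preds)
        loss.backward()
        optimizer.step()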

Note that the arguments CollieMinimalTrainer accepts differ slightly from those CollieTrainer accepts, and defaults are not guaranteed to remain equal as the two libraries evolve. Notable changes are:

  • If gpus > 1, only a single GPU will be used and any other GPUs will remain unused. Multi-GPU training is not supported in CollieMinimalTrainer at this time.

  • logger == True has no meaning in CollieMinimalTrainer - a default logger will NOT be created if set to True.

  • There is no way to pass in callbacks at this time. Instead, we implement the most commonly used callbacks here manually, favoring greater speed over customization. To use early stopping, set early_stopping_patience to an integer other than None.

from collie.model import CollieMinimalTrainer, MatrixFactorizationModel


# notice how similar the usage is to the standard ``CollieTrainer``
model = MatrixFactorizationModel(train=train)
trainer = CollieMinimalTrainer(model)
trainer.fit(model)

Model results should NOT be significantly different whether trained with CollieTrainer or CollieMinimalTrainer.

If there’s an argument present in CollieTrainer that you would like to see added to CollieMinimalTrainer for productionized model training, make an Issue or a PR in GitHub!

Parameters
  • model (collie.model.BasePipeline) – Initialized Collie model

  • max_epochs (int) – Stop training once this number of epochs is reached

  • gpus (bool or int) – Whether to train on the GPU (gpus == True or gpus > 0) or the CPU

  • logger (LightningLoggerBase) – Logger for experiment tracking. Set logger = None or logger = False to disable logging

  • early_stopping_patience (int) – Number of epochs of patience to have without any improvement in loss before stopping training early. Validation epoch loss will be used if there is a validation DataLoader present, else training epoch loss will be used. Set early_stopping_patience = None or early_stopping_patience = False to disable early stopping

  • log_every_n_steps (int) – How often to log within steps, if logger is enabled

  • flush_logs_every_n_steps (int) – How often to flush logs to disk, if logger is enabled

  • weights_summary (str) – Prints a summary of the weights when training begins

  • terminate_on_nan (bool) – If set to True, will terminate training (by raising a ValueError) at the end of each training batch, if any of the parameters or the loss are NaN or +/- infinity

  • benchmark (bool) – If set to True, enables cudnn.benchmark

  • deterministic (bool) – If set to True, enables cudnn.deterministic

  • progress_bar_refresh_rate (int) – How often to refresh progress bar (in steps), if verbosity > 0

  • verbosity (Union[bool, int]) –

    How verbose to be in training.

    • 0 disables all printouts, including weights_summary

    • 1 prints weights_summary (if applicable) and epoch losses

    • 2 prints weights_summary (if applicable), epoch losses, and progress bars

fit(model: collie.model.base.base_pipeline.BasePipeline) → None

Runs the full optimization routine.

Parameters

model (collie.model.BasePipeline) – Initialized Collie model

property max_epochs

Property that just returns max_epochs, included only so we can have a setter for it without an AttributeError.

Model Templates

Base Collie Pipeline Template

class collie.model.BasePipeline(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.ExplicitInteractions, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.ExplicitInteractions, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, lr: float = 0.001, lr_scheduler_func: Optional[Callable] = None, weight_decay: float = 0.0, optimizer: Union[str, Callable] = 'adam', loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)

Bases: pytorch_lightning.core.lightning.LightningModule

Base Pipeline model architectures to inherit from.

All subclasses MUST at least override the following methods:

  • _setup_model - Set up the model architecture

  • forward - Forward pass through a model

For item_item_similarity to work properly, all subclasses should also implement:

  • _get_item_embeddings - Returns item embeddings from the model on the device

Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • lr (float) – Model learning rate

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer (torch.optim or str) –

    If a string, one of the following supported optimizers:

    • 'sgd' (for torch.optim.SGD)

    • 'adagrad' (for torch.optim.Adagrad)

    • 'adam' (for torch.optim.Adam)

    • 'sparse_adam' (for torch.optim.SparseAdam)

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

  • **kwargs (keyword arguments) – All keyword arguments will be saved to self.hparams by default

calculate_loss(batch: Union[Tuple[Tuple[torch.tensor, torch.tensor], torch.tensor], Tuple[torch.tensor, torch.tensor, torch.tensor]]) → torch.tensor

Given a batch of data, calculate the loss value.

Note that the data type (implicit or explicit) will be determined by the structure of the batch sent to this method. See the table below for expected data types:

__getitem__ Format | Expected Meaning                          | Model Type
------------------ | ----------------------------------------- | ----------
((X, Y), Z)        | ((user IDs, item IDs), negative item IDs) | Implicit
(X, Y, Z)          | (user IDs, item IDs, ratings)             | Explicit
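For illustration, batches matching the two formats above might look like the following (all values are made up):

import torch

# implicit format: ((user IDs, item IDs), negative item IDs)
implicit_batch = (
    (torch.tensor([0, 1]), torch.tensor([10, 11])),
    torch.tensor([42, 43]),
)

# explicit format: (user IDs, item IDs, ratings)
explicit_batch = (
    torch.tensor([0, 1]),
    torch.tensor([10, 11]),
    torch.tensor([4.0, 2.5]),
)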

configure_optimizers() → Union[Tuple[List[Callable], List[Callable]], Tuple[Callable, Callable], Callable]

Configure optimizers and learning rate schedulers to use in optimization.

This method will be called after setup.

If self.bias_optimizer is None, only a single optimizer will be returned. If there is a non-None class attribute for bias_optimizer, two optimizers will be created: one for all layers with the name ‘bias’ in them, and another for all other model parameters. The bias optimizer will be set with the same parameters as optimizer with the exception of the learning rate, which will be set to self.hparams.bias_lr.
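A minimal sketch of that parameter split (assuming a ``model`` with ``lr`` and ``bias_lr`` hyperparameters and an Adam/SGD pairing; not Collie’s exact code):

import torch

bias_params = [param for name, param in model.named_parameters() if 'bias' in name]
other_params = [param for name, param in model.named_parameters() if 'bias' not in name]

optimizer = torch.optim.Adam(other_params, lr=model.hparams.lr)
bias_optimizer = torch.optim.SGD(bias_params, lr=model.hparams.bias_lr)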

abstract forward(users: torch.tensor, items: torch.tensor) → torch.tensor

forward should be implemented in all subclasses.

get_item_predictions(user_id: int = 0, unseen_items_only: bool = False, sort_values: bool = True) → pandas.core.series.Series

Get predicted rankings/ratings for all items for a given user_id.

This method cannot be called for datasets stored in HDF5InteractionsDataLoader since data in this DataLoader is read in dynamically.

Parameters
  • user_id (int) –

  • unseen_items_only (bool) – Filter preds to only show predictions of unseen items not present in the training or validation datasets for that user_id. Note this requires both train_loader and val_loader to be 1) class-level attributes in the model and 2) DataLoaders with Interactions at their core (not HDF5Interactions). If you are loading in a model, these two attributes will need to be set manually, since datasets are NOT saved when saving the model

  • sort_values (bool) – Whether to sort recommendations by descending prediction probability or not

Returns

preds – Sorted values as predicted ratings for each item in the dataset with the index being the item ID

Return type

pd.Series
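Hypothetical usage:

# top ten unseen-item recommendations for an illustrative user ID
preds = model.get_item_predictions(user_id=42, unseen_items_only=True)
top_ten_item_ids = preds.index[:10]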

item_item_similarity(item_id: int) → pandas.core.series.Series

Get most similar item indices by cosine similarity.

Cosine similarity is computed with item embeddings from a trained model.

Parameters

item_id (int) –

Returns

sim_score_idxs – Sorted values as cosine similarity for each item in the dataset with the index being the item ID

Return type

pd.Series

Note

Returned array is unfiltered, so the first element, being the most similar item, will always be the item itself.
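Since the first element is the item itself, hypothetical usage to get the five most similar other items might look like:

sim_scores = model.item_item_similarity(item_id=7)  # illustrative item ID
top_five_similar_item_ids = sim_scores.index[1:6]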

save_model(filename: Union[str, pathlib.Path] = 'model.pth') → None

Save the model’s state dictionary and hyperparameters.

While PyTorch Lightning offers a way to save and load models, there are two main reasons for overriding these:

  1. We only want to save the underlying PyTorch model (and not the Trainer object) so we don’t have to require PyTorch Lightning as a dependency when deploying a model.

  2. In the v0.8.4 release, loading a model back in raises a RuntimeError stating it is unable to load in the weights.

Parameters

filename (str or Path) – Filepath for the state dictionary to be saved to, ending in ‘.pth’

train_dataloader() → Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.dataloaders.InteractionsDataLoader]

Method that sets up training data as a PyTorch DataLoader.

This method will be called after configure_optimizers.

training_epoch_end(outputs: Union[List[float], List[List[float]]]) → None

Method that contains a callback for logic to run after the training epoch ends.

This method will be called after training_step.

training_step(batch: Tuple[Tuple[torch.tensor, torch.tensor], torch.tensor], batch_idx: int, optimizer_idx: Optional[int] = None) → torch.tensor

Method that contains logic for what happens inside the training loop.

This method will be called after train_dataloader.

val_dataloader() → Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.dataloaders.InteractionsDataLoader]

Method that sets up validation data as a PyTorch DataLoader.

This method will be called after training_step.

validation_epoch_end(outputs: List[float]) → None

Method that contains a callback for logic to run after the validation epoch ends.

This method will be called after validation_step.

validation_step(batch: Tuple[Tuple[torch.tensor, torch.tensor], torch.tensor], batch_idx: int, optimizer_idx: Optional[int] = None) → torch.tensor

Method that contains logic for what happens inside the validation loop.

This method will be called after val_dataloader.

Base Collie Multi-Stage Pipeline Template

class collie.model.MultiStagePipeline(train: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, val: Optional[Union[collie.interactions.dataloaders.ApproximateNegativeSamplingInteractionsDataLoader, collie.interactions.datasets.Interactions, collie.interactions.dataloaders.InteractionsDataLoader]] = None, lr_scheduler_func: Optional[Callable] = None, weight_decay: float = 0.0, optimizer_config_list: Optional[List[Dict[str, Union[float, List[str], str]]]] = None, loss: Union[str, Callable] = 'hinge', metadata_for_loss: Optional[Dict[str, torch.tensor]] = None, metadata_for_loss_weights: Optional[Dict[str, float]] = None, load_model_path: Optional[str] = None, map_location: Optional[str] = None, **kwargs)

Bases: collie.model.base.base_pipeline.BasePipeline

Multi-stage pipeline model architectures to inherit from.

This model template is intended for models that train in distinct stages, with a different optimizer optimizing each stage. This allows model components to be optimized in a set order, rather than all at once, as with the BasePipeline.

Generally, multi-stage models will have a training protocol like:

from collie.model import CollieTrainer, SomeMultiStageModel

model = SomeMultiStageModel(train=train)
trainer = CollieTrainer(model)

# fit stage 1
trainer.fit(model)

# fit stage 2
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# fit stage 3
trainer.max_epochs += 10
model.advance_stage()
trainer.fit(model)

# ... and so on, until...

model.eval()

Just like with BasePipeline, all subclasses MUST at least override the following methods:

  • _setup_model - Set up the model architecture

  • forward - Forward pass through a model

For item_item_similarity to work properly, all subclasses should also implement:

  • _get_item_embeddings - Returns item embeddings from the model

Parameters
  • train (collie.interactions object) – Data loader for training data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=True

  • val (collie.interactions object) – Data loader for validation data. If an Interactions object is supplied, an InteractionsDataLoader will automatically be instantiated with shuffle=False

  • lr_scheduler_func (torch.optim.lr_scheduler) – Learning rate scheduler to use during fitting

  • weight_decay (float) – Weight decay passed to the optimizer, if optimizer permits

  • optimizer_config_list (list of dict) –

    List of dictionaries containing the optimizer configurations for each stage’s optimizer(s). Each dictionary must contain the following keys:

    • lr: float

      Learning rate for the optimizer

    • optimizer: torch.optim or str

    • parameter_prefix_list: List[str]

      List of string prefixes corresponding to the model components that should be optimized with this optimizer

    • stage: str

      Name of the stage

    This must be ordered with the intended progression of stages (an example configuration is shown after this parameter list).

  • loss (function or str) –

    If a string, one of the following implemented losses:

    • 'bpr' / 'adaptive_bpr' (implicit data)

    • 'hinge' / 'adaptive_hinge' (implicit data)

    • 'warp' (implicit data)

    • 'mse' (explicit data)

    • 'mae' (explicit data)

    For implicit data, if train.num_negative_samples > 1, the adaptive version of the losses above will automatically be used (except for WARP loss, which is adaptive by nature).

    If a callable is passed, that function will be used for calculating the loss. For implicit models, the first two arguments passed will be the positive and negative predictions, respectively. Additional keyword arguments passed in order are num_items, positive_items, negative_items, metadata, and metadata_weights. For explicit models, the only two arguments passed in will be the prediction and actual rating values, in order.

  • metadata_for_loss (dict) – Keys should be strings identifying each metadata type that match keys in metadata_weights. Values should be a torch.tensor of shape (num_items x 1). Each tensor should contain categorical metadata information about items (e.g. a number representing the genre of the item)

  • metadata_for_loss_weights (dict) –

    Keys should be strings identifying each metadata type that match keys in metadata. Values should be the amount of weight to place on a match of that type of metadata, with the sum of all values <= 1. e.g. If metadata_for_loss_weights = {'genre': .3, 'director': .2}, then an item is:

    • a 100% match if it’s the same item,

    • a 50% match if it’s a different item with the same genre and same director,

    • a 30% match if it’s a different item with the same genre and different director,

    • a 20% match if it’s a different item with a different genre and same director,

    • a 0% match if it’s a different item with a different genre and different director, which is equivalent to the loss without any partial credit

  • load_model_path (str or Path) – To load a previously-saved model for inference, pass in path to output of model.save_model() method. Note that datasets and optimizers will NOT be restored. If None, will initialize model as normal

  • map_location (str or torch.device) – If load_model_path is provided, device specifying how to remap storage locations when torch.load-ing the state dictionary

  • **kwargs (keyword arguments) – All keyword arguments will be saved to self.hparams by default
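As an example, a two-stage configuration (stage names borrowed from ColdStartModel, parameter prefixes hypothetical) might look like:

optimizer_config_list = [
    {
        'lr': 1e-3,
        'optimizer': 'adam',
        'parameter_prefix_list': ['user_embeddings', 'item_bucket_embeddings'],
        'stage': 'item_buckets',
    },
    {
        'lr': 1e-4,
        'optimizer': 'adam',
        'parameter_prefix_list': ['user_embeddings', 'item_embeddings'],
        'stage': 'no_buckets',
    },
]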

Notes

  • With each call of trainer.fit, the optimizer and learning rate scheduler state will reset.

  • When loading a multi-stage model in, the state will be set to the last possible state. This state may have a different forward calculation than other states.

advance_stage() → None

Advance the stage to the next one in self.hparams.stage_list.

configure_optimizers() → Union[Tuple[List[Callable], List[Callable]], Tuple[Callable, Callable], Callable]

Configure optimizers and learning rate schedulers to use in optimization.

This method will be called after setup.

Creates an optimizer and learning rate scheduler for each configuration dictionary in self.hparams.optimizer_config_list.

optimizer_step(epoch: Optional[int] = None, batch_idx: Optional[int] = None, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, optimizer_idx: Optional[int] = None, optimizer_closure: Optional[Callable] = None, **kwargs) → None

Overriding Lightning’s optimizer step function to only step the optimizer associated with the relevant stage.

See here for more details: https://pytorch-lightning.readthedocs.io/en/stable/common/lightning_module.html#optimizer-step

Parameters
  • epoch (int) – Current epoch

  • batch_idx (int) – Index of current batch

  • optimizer (torch.optim.Optimizer) – A PyTorch optimizer

  • optimizer_idx (int) – If you used multiple optimizers, this indexes into that list

  • optimizer_closure (Callable) – Closure for all optimizers

set_stage(stage: str) → None

Set the model to the desired stage.

Layers

Scaled Embedding

class collie.model.ScaledEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None, device=None, dtype=None)

Bases: torch.nn.modules.sparse.Embedding

Embedding layer that initializes its values to use a truncated normal distribution.
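One plausible implementation of such an initialization (a sketch, not necessarily Collie’s exact code):

import torch

class ScaledEmbeddingSketch(torch.nn.Embedding):
    """Embedding whose weights start from a truncated normal scaled by the embedding size."""

    def reset_parameters(self) -> None:
        torch.nn.init.trunc_normal_(self.weight, std=1.0 / self.embedding_dim)
        if self.padding_idx is not None:
            with torch.no_grad():
                self.weight[self.padding_idx].fill_(0)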

reset_parameters() → None

Overriding default reset_parameters method.

Zero Embedding

class collie.model.ZeroEmbedding(num_embeddings: int, embedding_dim: int, padding_idx: Optional[int] = None, max_norm: Optional[float] = None, norm_type: float = 2.0, scale_grad_by_freq: bool = False, sparse: bool = False, _weight: Optional[torch.Tensor] = None, device=None, dtype=None)

Bases: torch.nn.modules.sparse.Embedding

Embedding layer with weights zeroed-out.
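Similarly, a plausible sketch of a zeroed-out initialization, commonly used for bias terms:

import torch

class ZeroEmbeddingSketch(torch.nn.Embedding):
    """Embedding whose weights start at exactly zero."""

    def reset_parameters(self) -> None:
        torch.nn.init.zeros_(self.weight)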

reset_parameters() → None

Overriding default reset_parameters method.