Utility Functions
Create Ratings Matrix
- class collie.utils.create_ratings_matrix(df: DataFrame, user_col: str = 'user_id', item_col: str = 'item_id', ratings_col: str = 'rating', sparse: bool = False)
Helper function to convert a Pandas DataFrame to a 2-dimensional matrix.
- Parameters
df (pd.DataFrame) – Dataframe with columns for user IDs, item IDs, and ratings
user_col (str) – Column name for the user IDs
item_col (str) – Column name for the item IDs
ratings_col (str) – Column name for the ratings column
sparse (bool) – Whether to return data as a sparse coo_matrix (True) or np.array (False)
- Returns
ratings_matrix – Data with users as rows, items as columns, and ratings as values
- Return type
np.array or scipy.sparse.coo_matrix, 2-d
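The conversion described above can be illustrated with plain pandas and NumPy. This is a minimal sketch of the idea, not the library's implementation; it assumes user and item IDs are zero-based integers so they can index the matrix directly, and the toy data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data; column names follow the documented defaults.
df = pd.DataFrame({
    'user_id': [0, 0, 1, 2],
    'item_id': [1, 2, 0, 2],
    'rating': [5, 3, 4, 1],
})

# Assumption: IDs are zero-based integers, so the largest ID + 1 gives each dimension.
num_users = df['user_id'].max() + 1
num_items = df['item_id'].max() + 1

# Users as rows, items as columns, ratings as values; unobserved pairs stay 0.
ratings_matrix = np.zeros((num_users, num_items))
ratings_matrix[df['user_id'].to_numpy(), df['item_id'].to_numpy()] = df['rating'].to_numpy()

print(ratings_matrix)
```

The same (rating, (user, item)) triplets could instead be passed to scipy.sparse.coo_matrix to build the sparse form without allocating the dense array.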
DataFrame to Interactions
- class collie.utils.df_to_interactions(df: DataFrame, user_col: str = 'user_id', item_col: str = 'item_id', ratings_col: Optional[str] = 'rating', **kwargs)
Helper function to convert a DataFrame to an Interactions object.
- Parameters
df (pd.DataFrame) – Dataframe with columns for user IDs, item IDs, and (optionally) ratings
user_col (str) – Column name for the user IDs
item_col (str) – Column name for the item IDs
ratings_col (str) – Column name for the ratings column. If None, will default to ratings of all 1s
**kwargs – Keyword arguments to pass to Interactions
- Returns
interactions
- Return type
Interactions
Convert to Implicit Ratings
- class collie.utils.convert_to_implicit(explicit_df: DataFrame, min_rating_to_keep: Optional[float] = 4, user_col: str = 'user_id', item_col: str = 'item_id', ratings_col: str = 'rating')
Convert explicit interactions data to implicit data.
Duplicate user ID and item ID pairs will be dropped, as well as all scores that are < min_rating_to_keep. All remaining interactions will have a rating of 1.
- Parameters
explicit_df (pd.DataFrame) – Dataframe with explicit ratings in the rating column
min_rating_to_keep (float) – Minimum rating to be considered a valid interaction
ratings_col (str) – Column name for the ratings column
- Returns
implicit_df – Dataframe that converts all ratings >= min_rating_to_keep to 1 and drops the rest with a reset index. Note that the order of implicit_df will not be equal to explicit_df
- Return type
pd.DataFrame
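The behavior described above (filter by minimum rating, drop duplicate user-item pairs, set every remaining rating to 1, reset the index) can be sketched with plain pandas. This is an illustration under assumptions, not the library's code: the toy data is invented, and the sketch filters before de-duplicating, which may differ from what convert_to_implicit does internally.

```python
import pandas as pd

# Toy explicit ratings; the (0, 1) pair appears twice on purpose.
explicit_df = pd.DataFrame({
    'user_id': [0, 0, 1, 1, 2],
    'item_id': [1, 1, 0, 2, 2],
    'rating': [5, 5, 3, 4, 2],
})
min_rating_to_keep = 4

implicit_df = (
    explicit_df[explicit_df['rating'] >= min_rating_to_keep]  # drop low scores
    .drop_duplicates(subset=['user_id', 'item_id'])           # one row per user-item pair
    .assign(rating=1)                                         # every kept interaction becomes 1
    .reset_index(drop=True)
)

print(implicit_df)
```

Unlike the documented function, this sketch happens to preserve the original row order, so it should not be relied on to reproduce the library's ordering.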
Remove Users With Fewer Than n Interactions
- class collie.utils.remove_users_with_fewer_than_n_interactions(df: DataFrame, min_num_of_interactions: int = 3, user_col: str = 'user_id')
Remove DataFrame rows with users who appear fewer than min_num_of_interactions times.
- Parameters
df (pd.DataFrame) –
min_num_of_interactions (int) – Minimum number of interactions a user can have while remaining in filtered_df
user_col (str) – Column name for the user IDs
- Returns
filtered_df
- Return type
pd.DataFrame
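The filtering described above amounts to counting rows per user and keeping only users at or above the threshold. A minimal pandas sketch of that idea follows; it is not the library's implementation, the toy data is invented, and whether the real function resets the index is not stated in this reference, so the sketch leaves the index untouched.

```python
import pandas as pd

# Toy interactions: user 0 has three rows, user 1 has one, user 2 has two.
df = pd.DataFrame({
    'user_id': [0, 0, 0, 1, 2, 2],
    'item_id': [1, 2, 3, 1, 2, 3],
})
min_num_of_interactions = 3

# Map each row to the total number of rows its user has.
interaction_counts = df['user_id'].map(df['user_id'].value_counts())

# Keep rows whose user meets the minimum.
filtered_df = df[interaction_counts >= min_num_of_interactions]

print(filtered_df)
```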
Pandas DataFrame to HDF5 Format
DataFrame to HTML
- class collie.utils.df_to_html(df: DataFrame, image_cols: List[str] = [], hyperlink_cols: List[str] = [], html_tags: Dict[str, Union[str, List[str]]] = {}, transpose: bool = False, image_width: Optional[int] = None, max_num_rows: int = 200, **kwargs)
Convert a Pandas DataFrame to HTML.
- Parameters
df (DataFrame) – DataFrame to convert to HTML
image_cols (str or list) – Column names that contain image urls or file paths. Columns specified as images will make all other transformations to those columns be ignored. Local files will display correctly in Jupyter if specified using relative paths but not if specified using absolute paths (see https://github.com/jupyter/notebook/issues/3810).
hyperlink_cols (str or list) – Column names that contain hyperlinks to open in a new tab
html_tags (dictionary) – A transformation to be inserted directly into the HTML tag.
Ex: {'col_name_1': 'strong'} becomes <strong>col_name_1</strong>
Ex: {'col_name_2': 'mark'} becomes <mark>col_name_2</mark>
Ex: {'col_name_3': 'h2'} becomes <h2>col_name_3</h2>
Ex: {'col_name_4': ['em', 'strong']} becomes <em><strong>col_name_4</strong></em>
transpose (bool) – Transpose the DataFrame before converting to HTML
image_width (int) – Set image width for each image generated
max_num_rows (int) – Maximum number of rows to display
**kwargs (keyword arguments) – Additional arguments sent to pandas.DataFrame.to_html, as listed in: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
- Returns
df_html – DataFrame converted to an HTML string, ready for displaying
- Return type
HTML
Examples
In a Jupyter notebook:
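from IPython.display import HTML, display
import pandas as pd

from collie.utils import df_to_html

# One-row DataFrame with a text column, a price column, and an image URL column
df = pd.DataFrame({
    'item': ['Beefy Fritos® Burrito'],
    'price': ['1.00'],
    'image_url': ['https://www.tacobell.com/images/22480_beefy_fritos_burrito_269x269.jpg'],
})

# Render the image column as images and wrap the other columns in HTML tags
display(
    HTML(
        df_to_html(
            df,
            image_cols='image_url',
            html_tags={'item': 'strong', 'price': 'em'},
            image_width=200,
        )
    )
)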
Note
Converted table will have CSS class ‘dataframe’, unless otherwise specified.
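The html_tags examples in the parameter list imply that a list of tags nests with the first tag outermost. The sketch below shows that nesting rule in isolation. wrap_in_tags is a hypothetical helper written for illustration and is not part of collie; how df_to_html applies the tags internally is not described here.

```python
from typing import List, Union


def wrap_in_tags(text: str, tags: Union[str, List[str]]) -> str:
    """Hypothetical helper: wrap ``text`` in one or more HTML tags, first tag outermost."""
    if isinstance(tags, str):
        tags = [tags]
    # Apply the last tag first so that the first tag in the list ends up outermost.
    for tag in reversed(tags):
        text = f'<{tag}>{text}</{tag}>'
    return text


print(wrap_in_tags('col_name_1', 'strong'))          # <strong>col_name_1</strong>
print(wrap_in_tags('col_name_4', ['em', 'strong']))  # <em><strong>col_name_4</strong></em>
```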
Timer Class
Truncated Normal Initialization
- class collie.utils.trunc_normal(embedding_weight: tensor, mean: float = 0.0, std: float = 1.0)
Truncated normal initialization (approximation).
Taken from FastAI: https://github.com/fastai/fastai/blob/master/fastai/layers.py
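As a rough illustration of what "approximation" can mean here, the sketch below assumes the FastAI-style approach: draw standard normal samples, take the remainder modulo 2 so every value falls within two standard deviations, then scale by std and shift by mean. It uses NumPy rather than PyTorch tensors to stay self-contained, and trunc_normal_sketch is a hypothetical name, not collie's function.

```python
import numpy as np


def trunc_normal_sketch(shape, mean=0.0, std=1.0, seed=None):
    """Hypothetical NumPy sketch of an fmod-based truncated normal approximation."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal(shape)
    # fmod keeps the sign and folds values into (-2, 2), so after scaling
    # every sample lies within two standard deviations of the mean.
    return np.fmod(samples, 2) * std + mean


weights = trunc_normal_sketch((1000, 8), mean=0.0, std=0.5, seed=0)
print(weights.min(), weights.max())
```

Folding the tails back toward the center is not the same as sampling a true truncated normal distribution, which is why this kind of initializer is described as an approximation.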