
Store Recommendation System


In this tutorial, we'll construct an inventory-focused recommendation system. A versatile recommendation system has broad applications, ranging from e-commerce platforms to brick-and-mortar supermarkets, enabling them to provide more intelligent product suggestions to customers. This tutorial was inspired by the winners of the XpressAi Tokyo Builders Hackathon 2023.

Tutorial Overview

Here's what we will be covering in this tutorial:

  1. Setting up the basic data ingestion: We'll begin by setting up our data, making sure it's in the right format and ready to use.

  2. Creating a smart search: After setting up our data, we will develop a smart search functionality. This will allow users to search through our inventory in a sophisticated and intuitive way.

  3. Building a recommendation system using analogies: Once we have our smart search in place, we'll start creating the core of our application - the recommendation system. We'll leverage the concept of analogies to find similarities between different items and recommend items that are similar to what users have shown interest in.

  4. Improving the recommendation system with prompt injection: Finally, we'll explore ways to improve our recommendation system even further by using prompt injection, a technique that can provide more specific and nuanced recommendations.

By the end of this tutorial, you will have a strong understanding of how to build a powerful recommendation system. So, let's roll up our sleeves and get started!


Creating a Vector Space and Usage Level Token

  1. Launch the Vecto login page at Vecto Login. Enter your Username and Password then proceed by clicking Sign In.

  2. Next, click on Vector Spaces in the menu bar and select the New Vector Space option. For the Vector Space name, let's go with recommendation_system. Next, we get to select a vectorization model. As we're primarily working with TEXT data, the SBERT model is a good choice. Wrap it up by clicking the Create Vector Space button. To view the specifics of your Vector Space, simply click on its name in the Vector Spaces list. Remember to jot down your Vector Space ID; we'll be needing it soon.

  3. To interact with our vector space, we need a unique Vector Space authentication token. Start by clicking on your username to expose the Tokens tab. Set the token name as recommendation_system_token. For our initial activities with this vector space, a USAGE access token will suffice. It grants us read-write privileges for our specific Vector Space. Having selected the recommendation_system Vector Space we previously crafted, proceed by clicking Create User Token.

Remember, the token will only be displayed once, so keep it safe! We'll need it for the upcoming steps.

As always, it is important to keep your token safe. A common practice is to set the token in an .env file or export it as a variable.
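
As a minimal sketch of that practice, the snippet below reads the token from the environment instead of hard-coding it in the notebook. VECTO_API_KEY is the variable the Vecto class looks for by default; the VECTO_VECTOR_SPACE_ID variable and the load_credentials helper are hypothetical names chosen for this illustration only.

```python
import os


def load_credentials(env=os.environ):
    """Return (token, vector_space_id) read from environment variables.

    VECTO_VECTOR_SPACE_ID is an assumed name for this sketch, not something
    the Vecto SDK defines.
    """
    token = env.get("VECTO_API_KEY")
    vector_space_id = env.get("VECTO_VECTOR_SPACE_ID")
    if not token:
        raise RuntimeError("Set VECTO_API_KEY before running the notebook.")
    return token, vector_space_id
```

With both variables exported in your shell (or loaded from an .env file), `token, vector_space_id = load_credentials()` keeps the secret out of the notebook itself.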

Setting Up Vecto Application

pip install ftfy tqdm ipywidgets==7.7.1 vecto-sdk

Initialize Vecto and Data Ingestion

We'll begin by initializing the Vecto class and supplying it with our recommendation_system Vector Space ID and authentication token.

By default, the Vecto class checks for the existence of VECTO_API_KEY in the environment. If it doesn't exist, you can directly supply the token parameter. Since we're going to interact with the newly created vector space, it's also essential to provide its ID.

Replace the placeholders with the actual values for token and vector_space_id, then run the cell.

from vecto import Vecto
import os

# token = os.environ['VECTO_API_KEY']
token = ""
vector_space_id = ""

vs = Vecto(token, vector_space_id)

Dataset

In this tutorial, we are using a synthetic inventory dataset that contains 159 items with the following fields: Item Name, Category, Description, Calories, and Expiry Date. Let's first create a CSV reader and inspect the file content.

import requests
import csv
import io

# Read the CSV from the URL
inventory_url = 'https://raw.githubusercontent.com/XpressAI/vecto-examples/main/Examples/supermarket-inventory.csv'
response = requests.get(inventory_url)

# Use io.StringIO to convert the text content into a file-like object for csv.reader
csv_data = io.StringIO(response.text)

# Create a CSV reader and specify the delimiter
csv_reader = csv.reader(csv_data, delimiter=';')

# Convert the reader into a list and print the first 5 rows
inventory_data = list(csv_reader)
for inventory_item in inventory_data[:5]:
    print(inventory_item)

Ingest Dataset

Next, let's arrange it into the format Vecto expects for ingestion: the data and its attributes. We'll use Item Name as the data and store each row's fields as its attributes.

from pprint import pprint 

# Extract item names and item attributes
data = [inventory_item[0] for inventory_item in inventory_data[1:]] # Only include item name, exclude header
attribute_names = inventory_data[0]
attributes = [{attribute_names[i]: attribute for i, attribute in enumerate(inventory_item)} for inventory_item in inventory_data[1:]]

# Print the first 3 elements of data and attributes
print(data[:3])
pprint(attributes[:3])

from tqdm.notebook import tqdm

vs.ingest_all_text(data, attributes, batch_size=128)

The batch size determines the number of items ingested in each batch. Here, we set the batch size to 128 to speed up the initial ingest process. Any positive integer works, even 1; the best value depends on the type and size of the dataset.
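
To make the effect of the batch size concrete, here is a small sketch of how a list of 159 entries splits into batches of 128. The split_into_batches helper is hypothetical and only illustrates the idea; it is not a description of how the Vecto SDK batches requests internally.

```python
def split_into_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 159 placeholder names stand in for the inventory rows.
placeholder_items = [f"item-{i}" for i in range(159)]
batch_lengths = [len(batch) for batch in split_into_batches(placeholder_items, 128)]
print(batch_lengths)  # [128, 31]
```

A smaller batch size means more, smaller requests; a larger one means fewer, bigger requests.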

You will need to wait for the vectorization process to finish before moving to the next section.

Let's run a lookup function and hook it up with ipywidgets.

from ipywidgets import interact_manual, IntSlider
from IPython.display import display
import io

def display_results(results):
    # Format each result's attributes and similarity score for printing
    output = []
    for result in results:
        formatted_result = f"Name: {result.attributes['Item Name']}\nCategory: {result.attributes['Category']}\nDescription: {result.attributes['Description']}\nSimilarity: {result.similarity}\n\n"
        output.append(formatted_result)
    print(*output, sep='\n')

def text_query(query, top_k=10):
    # Look up the top_k items most similar to the query text
    response = vs.lookup_text_from_str(query, top_k)
    display_results(response)

You can then perform searches using text queries. Using the interactive cell widget:

  • Type your query text in the available text box.
  • Use the top_k slider to choose how many of the most similar results to display.

interact_manual(text_query, query="Bread", top_k=IntSlider(min=1, max=50))

Vecto should return results that correspond to your search query.

Creating Recommendations using Analogies

As covered in a previous tutorial, vectors can solve analogies using an arithmetic-like logic. For example, using the 'dog is to puppy as cat is to kitten' analogy, we can use vector differences ('puppy vector - dog vector') to create an output so that a 'cat' query will display kittens. In essence, this is 'puppy - dog + cat = kitten'. By establishing a 'start' (Dog), 'end' (Puppy), and 'query' (Cat), we can deduce an 'Adult to Baby' relationship and apply it to other queries.
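
The arithmetic can be sketched with a few hand-made vectors. The three-dimensional values below are invented for illustration and are not real SBERT embeddings; solve_analogy and cosine_similarity are hypothetical helpers, not part of the Vecto SDK.

```python
import math

# Made-up 3-dimensional "embeddings": [dog-ness, cat-ness, youth].
# Real SBERT vectors have hundreds of dimensions; these are toy values.
toy_vectors = {
    "dog":    [1.0, 0.0, 0.0],
    "puppy":  [1.0, 0.0, 1.0],
    "cat":    [0.0, 1.0, 0.0],
    "kitten": [0.0, 1.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def solve_analogy(start, end, query, vectors):
    """Return the word whose vector is closest to end - start + query."""
    target = [e - s + q for s, e, q in zip(vectors[start], vectors[end], vectors[query])]
    # Exclude the query itself so it cannot be returned as its own answer.
    candidates = {word: vec for word, vec in vectors.items() if word != query}
    return max(candidates, key=lambda word: cosine_similarity(target, candidates[word]))


print(solve_analogy("dog", "puppy", "cat", toy_vectors))  # kitten
```

Here 'puppy - dog' isolates the youth dimension, and adding it to 'cat' lands exactly on the toy 'kitten' vector.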

In a similar manner, we can apply this concept to food and condiments. By understanding the analogous relationship between a dish and its condiments, we can create a vector that maps a dish to its ideal condiments. For example, if we understand 'burger' is to 'ketchup' as 'hotdog' is to 'mustard', we can make suitable condiment recommendations for different dishes.

Let's start with a basic setup.

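def text_analogy(dish, start, end, top_k=10):
    query = io.StringIO(dish)
    # Build a single start/end pair from the widget inputs
    analogy_start_end = [{"start": io.StringIO(start), "end": io.StringIO(end)}]
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)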

interact_manual(text_analogy, dish="hotdog", start="burger", end="ketchup", top_k=IntSlider(min=1, max=50))

Looks like Vecto has some ideas of what condiment could go with the dish, but it's not perfect. You can improve the recommendations by stacking multiple starts and ends.

Let's use the following synthetic dataset, which contains two fields: the Item, which refers to the condiment, and the Dish in which it is used. Let's start by reading the data into lists.

# Data source
url = 'https://raw.githubusercontent.com/XpressAI/vecto-python-examples/main/Examples/food-condiments.csv'

response = requests.get(url)
decoded_content = response.content.decode('utf-8')
csv_reader = csv.DictReader(io.StringIO(decoded_content), delimiter=';')

# Initialize lists to store items and dishes
items = []
dishes = []

for row in csv_reader:
    items.append(row['Item'])
    dishes.append(row['Dish'])

for dish, item in zip(dishes[:5], items[:5]):
    print({"Dish": dish, "Condiment": item})

Let's format it into a list of VectoAnalogyStartEnd dicts, each containing a start (the dish) and an end (the condiment).

analogy_start_end = []

for dish, item in zip(dishes, items):
    mapping = {"start": io.StringIO(dish), "end": io.StringIO(item)}
    analogy_start_end.append(mapping)

def improved_text_analogy(dish, top_k=10):
    query = io.StringIO(dish)
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)

interact_manual(improved_text_analogy, dish="spaghetti", top_k=IntSlider(min=1, max=50))

You should see more grounded recommendations. Take note that you can only output recommendations of items that you have ingested. Consequently, if an item, although contextually appropriate, isn't present in the inventory, it won't be featured in the recommendation outputs.

Analogies are especially valuable when dealing with a large product catalog but lacking historical data to build a typical recommendation system. This can often be the case for new businesses or for those expanding their product line, where customer behavior patterns or purchasing history are sparse. By employing analogies, businesses can lay out logical correlations among products, delivering relevant and situation-specific recommendations to their customers.

Recommendations using Prompt Injection

The last technique we'll cover is recommendations using prompt injection. The way it works is fairly simple: by adding context or specifying the type of results you want the model to prioritize, you can guide the recommendation process towards a desired outcome.

For example, if the base use case is suggesting condiment products for a selected dish, you can simply prepend "condiment for" to the search query.

def prompt_injection_text_query(query, top_k=10):
    updated_query = "condiment for " + query
    response = vs.lookup_text_from_str(updated_query, top_k)
    display_results(response)

interact_manual(prompt_injection_text_query, query="burgers", top_k=IntSlider(min=1, max=50))

As a result, you should receive recommendations such as Mayonnaise, Mustard, and Tomato sauce. And that's the basic concept! You have the flexibility to replace condiments with any category that suits your specific use case. For instance, in the context of an electronics store, you might substitute condiments with accessories.
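
To make the category swappable, the prefix can be turned into a parameter. The build_prompted_query helper below is a hypothetical sketch, not part of the Vecto SDK; its output is simply the string you would pass to vs.lookup_text_from_str in place of updated_query.

```python
def build_prompted_query(query, category="condiment"):
    """Prefix the user's query with the category of results to favour."""
    return f"{category} for {query.strip()}"


print(build_prompted_query("burgers"))                       # condiment for burgers
print(build_prompted_query("laptop", category="accessory"))  # accessory for laptop
```

Keeping the prefix in one place makes it easy to experiment with different wordings and compare the results they return.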