Store Recommendation System
In this tutorial, we'll construct an inventory-focused recommendation system. We're going to use the Python SDK for simplicity, but you should be able to adapt the concepts to other programming languages. A versatile recommendation system has broad applications, ranging from e-commerce platforms to brick-and-mortar supermarkets, enabling them to provide more intelligent product suggestions to customers. This tutorial was inspired by the winners of the XpressAi Tokyo Builders Hackathon 2023.
Here's what we will be covering in this tutorial:
- Setting up the basic data ingestion: We'll begin by setting up our data, making sure it's in the right format and ready to use.
- Creating a smart search: After setting up our data, we will develop a smart search functionality. This will allow users to search through our inventory in a sophisticated and intuitive way.
- Building a recommendation system using analogies: Once we have our smart search in place, we'll start creating the core of our application - the recommendation system. We'll leverage the concept of analogies to find similarities between different items and recommend items that are similar to what users have shown interest in.
- Improving the recommendation system with prompt injection: Finally, we'll explore ways to improve our recommendation system even further by using prompt injection, a technique that can provide more specific and nuanced recommendations.
By the end of this tutorial, you will have a strong understanding of how to build a powerful recommendation system. So, let's roll up our sleeves and get started!
Set Up Vecto Application
!pip install ftfy tqdm ipywidgets==7.7.1 git+https://github.com/XpressAI/vecto-python-sdk@v0.1.0
Creating a Vector Space and Usage-Level Token
Launch the Vecto login page at Vecto Login. Enter your Username and Password, then click Sign In.

Next, click on Vector Spaces in the menu bar and select the New Vector Space option. For the Vector Space name, let's go with recommendation_system. Next, we get to select a vectorization model. As we're primarily working with TEXT data, the SBERT model is a good choice. Wrap it up by clicking the Create Vector Space button. To view the specifics of your Vector Space, simply click on its name in the Vector Spaces list. Remember to jot down your Vector Space ID; we'll be needing it soon.

To interact with our vector space, we need a unique Vector Space authentication token. Start by clicking on your username to expose the Tokens tab. Set the token name as recommendation_system_token. For our initial activities with this vector space, a USAGE access token will suffice; it grants us read-write privileges for our specific Vector Space. With the recommendation_system Vector Space we previously created selected, click Create User Token.
Remember, the token will only be displayed once, so keep it safe! We'll need it for the upcoming steps.

As always, it is important to keep your token safe. A common practice is to store the token in a .env file or export it as an environment variable.
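For instance, here is a minimal sketch assuming you store the token in a local .env file and have the python-dotenv package installed (both are assumptions, not requirements of the Vecto SDK):

import os
from dotenv import load_dotenv  # assumes pip install python-dotenv has been run

# Reads a local .env file containing a line such as: VECTO_API_KEY=your-usage-token
load_dotenv()
token = os.environ.get("VECTO_API_KEY")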
Initializing Vecto and Ingesting Data
We'll begin by initializing the Vecto class and supplying it with our recommendation_system Vector Space ID and authentication token.

By default, the Vecto class checks for the existence of VECTO_API_KEY in the environment. If it doesn't exist, you can supply the token parameter directly. Since we're going to interact with the newly created vector space, we also need to provide its ID.

Replace the placeholders with the actual values for token and vector_space_id, then run the cell.
from vecto import Vecto
import os
# token = os.environ['VECTO_API_KEY']
token = ""
vector_space_id = ""
vs = Vecto(token, vector_space_id)
Dataset
In this tutorial, we are using a synthetic inventory dataset which contains 150 items with the following fields: Item Name, Category, Description, Calories, and Expiry Date. Let's first create a CSV reader and observe the file content.
import urllib.request
import csv
# Read the CSV from the URL
inventory_url = 'https://raw.githubusercontent.com/XpressAI/vecto-examples/main/Examples/supermarket-inventory.csv'
http_response = urllib.request.urlopen(inventory_url)
http_response_lines = [line.decode('utf-8') for line in http_response.readlines()]
csv_data_reader = csv.reader(http_response_lines, delimiter=';')
inventory_data = list(csv_data_reader)
# Print first 5 rows, including the header
for inventory_item in inventory_data[:5]:
    print(inventory_item)
Ingest Dataset
Next, let's format it into the ingest format that Vecto expects: data and attribute.

For the data, we're going to use the Item Name, and the rest of the fields can be its attributes.
from pprint import pprint
# Extract item names and item attributes
data = [inventory_item[0] for inventory_item in inventory_data[1:]] # Only include item name, exclude header
attribute_names = inventory_data[0]
attributes = [{attribute_names[i]: attribute for i, attribute in enumerate(inventory_item)} for inventory_item in inventory_data[1:]]
# Print the first 3 elements of data and attributes
print(data[:3])
pprint(attributes[:3])
The batch size determines the number of entries ingested in each batch. Here, we set the batch size to 128 to speed up the initial ingest process. However, the batch size can be set to any other integer value, even just 1, depending on the dataset type and size.
from tqdm.notebook import tqdm
vs.ingest_all_text(data, attributes, batch_size=128)
You will need to wait for the vectorization process to finish before moving to the next section.
Smart Search using Text Vector Search
Let's define a lookup function and hook it up with ipywidgets.
from ipywidgets import interact_manual, IntSlider, FileUpload
from IPython.display import display
import io
def display_results(results):
    output = []
    for result in results:
        formatted_result = f"Name: {result.attributes['Item Name']}\nCategory: {result.attributes['Category']}\nDescription: {result.attributes['Description']}\nSimilarity: {result.similarity}\n\n"
        output.append(formatted_result)
    print(*output, sep='\n')

def text_query(query, top_k=10):
    # lookup_text_from_str accepts the query string directly
    response = vs.lookup_text_from_str(query, top_k)
    display_results(response)
You can then perform searches using text queries. Using the interactive cell widget:
- Type your query text in the available text box.
- Select the number of results with the highest search similarity to view (top_k).
interact_manual(text_query, query="Bread", top_k=IntSlider(min=1, max=50))
Vecto should return results that correspond to your search query.
You can experiment with other search terms, for instance, breakfast. Even though items like Coffee or Cereal don't contain that term in their keywords or description, Vecto can still surface them. This highlights Vecto's capability to generate intelligent suggestions.
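If you prefer to skip the widget, the same lookup works as a plain function call (assuming the cells above have already been run):

# Direct call without the interactive widget
text_query("breakfast", top_k=5)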
Creating Recommendations using Analogies
As covered in a previous tutorial, vectors can solve analogies using an arithmetic-like logic. For example, using the 'dog is to puppy as cat is to kitten' analogy, we can use vector differences ('puppy vector - dog vector') to create an output so that a 'cat' query will display kittens. In essence, this is 'puppy - dog + cat = kitten'. By establishing a 'start' (Dog), 'end' (Puppy), and 'query' (Cat), we can deduce an 'Adult to Baby' relationship and apply it to other queries.
In a similar manner, we can apply this concept to food and condiments. By understanding the analogous relationship between a dish and its condiments, we can create a vector that maps a dish to its ideal condiments. For example, if we understand 'burger' is to 'ketchup' as 'hotdog' is to 'mustard', we can make suitable condiment recommendations for different dishes.
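To make the arithmetic concrete, here is a toy NumPy sketch with made-up two-dimensional vectors; real embeddings come from the vectorization model, and Vecto performs this computation for us in the cells below:

import numpy as np

# Toy, hand-picked 2D vectors purely for illustration; real embeddings have many more dimensions.
toy_vectors = {
    "burger":  np.array([1.0, 0.0]),
    "ketchup": np.array([1.0, 1.0]),
    "hotdog":  np.array([2.0, 0.0]),
    "mustard": np.array([2.0, 1.0]),
}

# 'burger' is to 'ketchup' as 'hotdog' is to ...?
relationship = toy_vectors["ketchup"] - toy_vectors["burger"]  # the "condiment-of" direction
query_vector = toy_vectors["hotdog"] + relationship

# The recommendation is the item whose vector is most similar to query_vector.
def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

best_match = max(toy_vectors, key=lambda name: cosine(toy_vectors[name], query_vector))
print(best_match)  # with these toy vectors: mustard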
Let's start with a basic setup.
def text_analogy(dish, start, end, top_k=10):
    query = io.StringIO(dish)
    # Build a single start/end pair from the function arguments
    analogy_start_end = [{"start": io.StringIO(start), "end": io.StringIO(end)}]
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)
interact_manual(text_analogy, dish="hotdog", start="burger", end="ketchup", top_k=IntSlider(min=1, max=50))
Looks like Vecto has some ideas of what condiment could go with the dish, but it's not perfect.

You can improve the recommendations by stacking multiple start and end pairs.
Let's use the following synthetic dataset, which contains two fields: the Item, which refers to the condiment, and the Dish in which it is used. Let's start by loading the data into lists.
import requests

# Data source
url = 'https://raw.githubusercontent.com/XpressAI/vecto-examples/main/Examples/food-condiments.csv'
response = requests.get(url)
decoded_content = response.content.decode('utf-8')
csv_reader = csv.DictReader(io.StringIO(decoded_content), delimiter=';')

# Initialize lists to store items and dishes
items = []
dishes = []

for row in csv_reader:
    items.append(row['Item'])
    dishes.append(row['Dish'])

# Preview the first 5 dish/condiment pairs
for dish, item in zip(dishes[:5], items[:5]):
    print({"Dish": dish, "Condiment": item})
Let's format it into a VectoAnalogyStartEnd: a list of dicts containing the start and end pairs.
analogy_start_end = []

for dish, item in zip(dishes, items):
    mapping = {"start": io.StringIO(dish), "end": io.StringIO(item)}
    analogy_start_end.append(mapping)

def improved_text_analogy(dish, top_k=10):
    query = io.StringIO(dish)
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)
interact_manual(improved_text_analogy, dish="spaghetti", top_k=IntSlider(min=1, max=50))
You should see more grounded recommendations. Note that Vecto can only recommend items that you have ingested; consequently, an item that is contextually appropriate but isn't present in the inventory won't appear in the recommendation outputs.
Analogies are especially valuable when dealing with a large product catalog but lacking historical data to build a typical recommendation system. This can often be the case for new businesses or for those expanding their product line, where customer behavior patterns or purchasing history are sparse. By employing analogies, businesses can lay out logical correlations among products, delivering relevant and situation-specific recommendations to their customers.
Recommendations using Prompt Injection
The last technique we'll cover is recommendations using prompt injection. The way it works is fairly simple: by adding context or specifying the type of results you want the model to prioritize, you can guide the recommendation process towards a desired outcome.
For example, if the base use case is recommending condiment products based on the selected dish, you can simply prepend condiment for to the search query.
def prompt_injection_text_query(query, top_k=10):
    updated_query = "condiment for " + query
    response = vs.lookup_text_from_str(updated_query, top_k)
    display_results(response)
interact_manual(prompt_injection_text_query, query="burgers", top_k=IntSlider(min=1, max=50))
As a result, you should receive recommendations such as Mayonnaise, Mustard, and Tomato sauce. And that's the basic concept! You have the flexibility to replace condiments with any category that suits your specific use case. For instance, in the context of an electronics store, you might substitute condiments with accessories.