Store Recommendation System
In this tutorial, we'll construct an inventory-focused recommendation system. A versatile recommendation system has broad applications, ranging from e-commerce platforms to brick-and-mortar supermarkets, enabling them to provide more intelligent product suggestions to customers. This tutorial was inspired by the winners of the XpressAi Tokyo Builders Hackathon 2023.
Tutorial Overview
Here's what we will be covering in this tutorial:
Setting up the basic data ingestion: We'll begin by setting up our data, making sure it's in the right format and ready to use.
Creating a smart search: After setting up our data, we will develop a smart search functionality. This will allow users to search through our inventory in a sophisticated and intuitive way.
Building a recommendation system using analogies: Once we have our smart search in place, we'll start creating the core of our application - the recommendation system. We'll leverage the concept of analogies to find similarities between different items and recommend items that are similar to what users have shown interest in.
Improving the recommendation system with prompt injection: Finally, we'll explore ways to improve our recommendation system even further by using prompt injection, a technique that can provide more specific and nuanced recommendations.
By the end of this tutorial, you will have a strong understanding of how to build a powerful recommendation system. So, let's roll up our sleeves and get started!
Creating a Vector Space and Usage Level Token
Launch the Vecto login page at Vecto Login. Enter your Username and Password, then proceed by clicking Sign In.
Next, click on Vector Spaces in the menu bar and select the New Vector Space option. For the Vector Space name, let's go with recommendation_system. Next, we get to select a vectorization model. As we're primarily working with TEXT data, the SBERT model is a good choice. Wrap it up by clicking the Create Vector Space button. To view the specifics of your Vector Space, simply click on its name in the Vector Spaces list. Remember to jot down your Vector Space ID; we'll be needing it soon.
To interact with our vector space, we need a unique Vector Space authentication token. Start by clicking on your username to expose the Tokens tab. Set the token name as recommendation_system_token. For our initial activities with this vector space, a USAGE access token will suffice. It grants us read-write privileges for our specific Vector Space. Having selected the recommendation_system Vector Space we previously crafted, proceed by clicking Create User Token.
Remember, the token will only be displayed once, so keep it safe! We'll need it for the upcoming steps.
As always, it is important to keep your token safe. A common practice is to set the token in an .env file or export it as a variable.
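As a minimal sketch of that practice, the snippet below reads the credentials from environment variables instead of hard-coding them. The variable names VECTO_API_KEY and VECTOR_SPACE_ID follow the names used elsewhere in this tutorial; the helper load_vecto_credentials is a hypothetical name introduced only for illustration and is not part of the Vecto SDK.

```python
import os


def load_vecto_credentials(env=os.environ):
    """Return (token, vector_space_id) read from environment variables.

    Hypothetical helper: raises if either value is missing so that a
    missing token is noticed early rather than at the first API call.
    """
    token = env.get("VECTO_API_KEY")
    vector_space_id = env.get("VECTOR_SPACE_ID")
    if not token or not vector_space_id:
        raise RuntimeError(
            "Set VECTO_API_KEY and VECTOR_SPACE_ID before running the tutorial."
        )
    return token, vector_space_id


# Example usage (after exporting both variables in your shell):
# token, vector_space_id = load_vecto_credentials()
```

Keeping the values out of the notebook or script means the token is less likely to end up in version control by accident.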
Setting Up Vecto Application
- Python
- TypeScript
pip install ftfy tqdm ipywidgets==7.7.1 vecto-sdk
We've set up a repository for you to run the demo. Otherwise, you can install the required packages using:
npm install @xpressai/vecto-client typescript @types/node dotenv csv-parse
Initialize Vecto and Data Ingestion
- Python
- TypeScript
We'll begin by initializing the Vecto class and supplying it with our recommendation_system Vector Space ID and authentication token.
By default, the Vecto class checks for the existence of VECTO_API_KEY in the environment. If it doesn't exist, you can directly supply the token parameter. Since we're going to interact with the newly created vector space, it's also essential to provide its ID.
Replace the placeholders with the actual values for token and vector_space_id, then run the cell.
from vecto import Vecto
import os
# token = os.environ['VECTO_API_KEY']
token = ""
vector_space_id = ""
vs = Vecto(token, vector_space_id)
Create a .env file in your project root and add your vector space ID and token.
VECTOR_SPACE_ID=
VECTO_USER_TOKEN=
Then create a new TypeScript file, for example ingest-inventory.tsx.
import fs from 'fs';
import { parse } from 'csv-parse';
import {
Configuration,
IndexApi,
IndexDataRequest,
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
Dataset
In this tutorial, we are using a synthetic inventory dataset which contains 159 items with the following fields: Item Name, Category, Description, Calories, and Expiry Date. Let's first create a CSV reader and observe the file content.
- Python
- TypeScript
import requests
import csv
import io
# Read the CSV from the URL
inventory_url = 'https://raw.githubusercontent.com/XpressAI/vecto-examples/main/Examples/supermarket-inventory.csv'
response = requests.get(inventory_url)
# Use io.StringIO to convert the text content into a file-like object for csv.reader
csv_data = io.StringIO(response.text)
# Create a CSV reader and specify the delimiter
csv_reader = csv.reader(csv_data, delimiter=';')
# Convert the reader into a list and print the first 5 rows
inventory_data = list(csv_reader)
for inventory_item in inventory_data[:5]:
    print(inventory_item)
import fetch from 'node-fetch';
import { parse } from 'csv-parse/sync';
import * as readline from 'readline';
// Function to fetch and parse the CSV data
async function fetchAndParseCSV(url: string): Promise<any[]> {
const response = await fetch(url);
const csvData = await response.text();
// Parse the CSV data
const records = parse(csvData, {
delimiter: ';',
columns: true, // Use the first line as header columns
skip_empty_lines: true,
});
return records;
}
// Function to print the first 5 rows of the parsed CSV data
function printFirst5Rows(data: any[]): void {
data.slice(0, 5).forEach((row, index) => {
console.log(`Row ${index + 1}:`, row);
});
}
// Main function to execute the fetch and print
async function main() {
const inventoryUrl = 'https://raw.githubusercontent.com/XpressAI/vecto-examples/main/Examples/supermarket-inventory.csv';
const inventoryData = await fetchAndParseCSV(inventoryUrl);
printFirst5Rows(inventoryData);
}
main().catch(error => console.error('Error fetching or parsing CSV:', error));
Ingest Dataset
Then let's format it into the ingest format that Vecto expects: the data and the attributes.
For the data, we're going to use Item Name, and the rest can be its attributes.
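To make the split between data and attributes concrete, here is a small self-contained sketch. The two rows are made-up placeholders that only mimic the semicolon-delimited layout described above; they are not taken from the actual dataset.

```python
import csv
import io

# Hypothetical sample rows in the same column layout as the tutorial's CSV.
sample_csv = (
    "Item Name;Category;Description;Calories;Expiry Date\n"
    "Example Bread;Bakery;A made-up loaf for illustration;250;2024-01-01\n"
    "Example Milk;Dairy;A made-up carton for illustration;150;2024-01-05\n"
)

rows = list(csv.DictReader(io.StringIO(sample_csv), delimiter=";"))

# "data" is the text that gets vectorized: one item name per row.
data = [row["Item Name"] for row in rows]

# "attributes" carries every column for each row, keyed by header name.
attributes = [dict(row) for row in rows]

print(data)
print(attributes[0])
```

Both lists have the same length and order, so the attributes at a given position describe the data entry at that position.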
- Python
- TypeScript
from pprint import pprint
# Extract item names and item attributes
data = [inventory_item[0] for inventory_item in inventory_data[1:]] # Only include item name, exclude header
attribute_names = inventory_data[0]
attributes = [{attribute_names[i]: attribute for i, attribute in enumerate(inventory_item)} for inventory_item in inventory_data[1:]]
# Print the first 3 elements of data and attributes
print(data[:3])
pprint(attributes[:3])
from tqdm.notebook import tqdm
vs.ingest_all_text(data, attributes, batch_size=128)
import fs from 'fs';
import { parse } from 'csv-parse';
import {
Configuration,
IndexApi,
IndexDataRequest,
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
type InventoryItem = {
itemName: string;
category: string;
description: string;
calories: number;
expiryDate: string;
};
async function indexInventoryTextData(inventoryItems: InventoryItem[]) {
const indexApi = new IndexApi(config);
const batchSize = 128;
// Start at 0: the header row was already skipped while parsing (fromLine: 2)
for (let i = 0; i < inventoryItems.length; i += batchSize) {
const batch = inventoryItems.slice(i, i + batchSize);
// Use the item name as the text to vectorize; the full record is stored as its attributes
const inputs = batch.map(item => new Blob([item.itemName]));
const attributes = batch.map(item => JSON.stringify(item));
const textDataParams: IndexDataRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'TEXT',
attributes: attributes,
input: inputs,
};
try {
const result = await indexApi.indexData(textDataParams);
console.log('Batch indexed successfully:', result);
} catch (error) {
console.error('Error indexing batch:', error);
}
}
}
const csvFilePath = 'supermarket-inventory.csv';
const headers = ['itemName', 'category', 'description', 'calories', 'expiryDate'];
const fileContent = fs.readFileSync(csvFilePath, { encoding: 'utf-8' });
parse(fileContent, {
delimiter: ';',
columns: headers,
fromLine: 2 // Start parsing from line 2, skipping the header row
}, (error, result) => {
indexInventoryTextData(result);
});
The batch size determines the number of items ingested in each batch. Here, we set the batch size to 128 to speed up the initial ingest process. However, the batch size can be set to any other positive integer, even just 1, depending on the dataset type and size.
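As a rough illustration of what a batch size of 128 means for this 159-item dataset, the sketch below splits a list into consecutive chunks. The make_batches helper is a hypothetical name used here for illustration; it shows the general idea of batching and is not a description of how the SDK batches internally.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Placeholder names standing in for the 159 inventory items.
items = [f"item-{n}" for n in range(159)]

batches = make_batches(items, 128)
print([len(batch) for batch in batches])  # [128, 31]
```

With 159 items, a batch size of 128 results in two requests, while a batch size of 1 would result in 159.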
You will need to wait for the vectorization process to finish before moving to the next section.
Smart Search using Text Vector Search
- Python
- TypeScript
Let's run a lookup function and hook it up with ipywidgets.
from ipywidgets import interact_manual, IntSlider, FileUpload
from IPython.display import display
import io
def display_results(results):
    # Format each result as a short block of name, category, description and similarity
    output = []
    for result in results:
        formatted_result = f"Name: {result.attributes['Item Name']}\nCategory: {result.attributes['Category']}\nDescription: {result.attributes['Description']}\nSimilarity: {result.similarity}\n\n"
        output.append(formatted_result)
    print(*output, sep='\n')

def text_query(query, top_k=10):
    # Look up the top_k items most similar to the query string
    response = vs.lookup_text_from_str(query, top_k)
    display_results(response)
You can then perform searches using text queries. Using the interactive cell widget:
- Type your query text in the available text box.
- Use the top_k slider to select how many of the most similar results to view.
interact_manual(text_query, query="Bread", top_k=IntSlider(min=1, max=50))
Vecto should return results that correspond to your search query.
Create a new TypeScript file for the lookup, for example lookup.tsx.
import {
Configuration,
LookupApi,
LookupRequest,
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
async function lookupTextData() {
const lookupApi = new LookupApi(config);
const textParams: LookupRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'TEXT',
topK: 3,
query: 'text query',
};
try {
const results = await lookupApi.lookup(textParams);
console.log("Text lookup results: ", JSON.stringify(results, null, 2));
} catch (error) {
console.error('Error lookup data:', error);
}
}
lookupTextData();
import fs from 'fs';
async function lookupImageData() {
const lookupApi = new LookupApi(config);
const fileContent = fs.readFileSync('bread.png');
const imageBlob = new Blob([fileContent]);
const ImageParams: LookupRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'IMAGE',
topK: 3,
query: imageBlob,
};
try {
const results = await lookupApi.lookup(ImageParams);
console.log("Image lookup results: ", JSON.stringify(results, null, 2));
} catch (error) {
console.error('Error lookup data:', error);
}
}
lookupImageData();
Creating Recommendations using Analogies
As covered in a previous tutorial, vectors can solve analogies using an arithmetic-like logic. For example, using the 'dog is to puppy as cat is to kitten' analogy, we can use vector differences ('puppy vector - dog vector') to create an output so that a 'cat' query will display kittens. In essence, this is 'puppy - dog + cat = kitten'. By establishing a 'start' (Dog), 'end' (Puppy), and 'query' (Cat), we can deduce an 'Adult to Baby' relationship and apply it to other queries.
In a similar manner, we can apply this concept to food and condiments. By understanding the analogous relationship between a dish and its condiments, we can create a vector that maps a dish to its ideal condiments. For example, if we understand 'burger' is to 'ketchup' as 'hotdog' is to 'mustard', we can make suitable condiment recommendations for different dishes.
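The arithmetic behind 'puppy - dog + cat = kitten' can be sketched with toy vectors. The three-dimensional values below are invented purely for illustration (roughly dog-ness, cat-ness, and youthfulness); real embeddings from a model such as SBERT have many more dimensions and are not this tidy. The solve_analogy helper is likewise a hypothetical name, not a Vecto API.

```python
import math

# Hypothetical toy "embeddings": [dog-ness, cat-ness, youthfulness].
vectors = {
    "dog":    [1.0, 0.0, 0.1],
    "puppy":  [1.0, 0.0, 0.9],
    "cat":    [0.0, 1.0, 0.1],
    "kitten": [0.0, 1.0, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def solve_analogy(start, end, query):
    """Return the stored word closest to end - start + query."""
    target = [
        e - s + q
        for s, e, q in zip(vectors[start], vectors[end], vectors[query])
    ]
    return max(vectors, key=lambda word: cosine(target, vectors[word]))


best = solve_analogy("dog", "puppy", "cat")
print(best)  # kitten
```

The difference 'puppy - dog' only changes the youthfulness component, so adding it to 'cat' lands next to 'kitten'.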
Let's start with a basic setup.
- Python
- TypeScript
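def text_analogy(dish, start, end, top_k=10):
    query = io.StringIO(dish)
    # Build a single start -> end pair from the widget inputs
    analogy_start_end = [{"start": io.StringIO(start), "end": io.StringIO(end)}]
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)

interact_manual(text_analogy, dish="hotdog", start="burger", end="ketchup", top_k=IntSlider(min=1, max=50))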
Create a new TypeScript file for the analogy, for example simple-analogy.tsx.
import {
Configuration,
LookupApi,
LookupWithDynamicAnalogyRequest
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
async function textAnalogy() {
const lookupApi = new LookupApi(config);
const params: LookupWithDynamicAnalogyRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'TEXT',
topK: 3,
query: "hotdog",
start: [new Blob(["burger"])],
end: [new Blob(["ketchup"])],
};
try {
const results = await lookupApi.lookupWithDynamicAnalogy(params);
console.log("Text analogy results: ", JSON.stringify(results, null, 2));
} catch (error) {
console.error('Error analogy data:', error);
}
}
textAnalogy();
Looks like Vecto has some ideas of what condiment could go with the dish, but it's not perfect.
You can improve the recommendations by stacking multiple starts and ends.
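How Vecto combines several start and end pairs internally is not covered here. One common way to think about it, assumed purely for illustration, is to average the offsets of all pairs so that no single pair dominates the direction. The two-dimensional vectors and the average_offset helper below are hypothetical.

```python
def average_offset(pairs):
    """Average the (end - start) offset over a list of (start, end) vector pairs."""
    dimensions = len(pairs[0][0])
    totals = [0.0] * dimensions
    for start, end in pairs:
        for i in range(dimensions):
            totals[i] += end[i] - start[i]
    return [total / len(pairs) for total in totals]


# Hypothetical toy vectors for two dish -> condiment pairs.
burger, ketchup = [1.0, 0.0], [1.0, 1.0]
fries, mayonnaise = [2.0, 0.0], [2.0, 3.0]

offset = average_offset([(burger, ketchup), (fries, mayonnaise)])

# Apply the averaged "dish -> condiment" direction to a new query vector.
hotdog = [3.0, 0.0]
target = [q + o for q, o in zip(hotdog, offset)]

print(offset)
print(target)
```

Under this assumption, more pairs give a steadier estimate of the relationship, which is one intuition for why stacking pairs tends to help.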
Let's use the following synthetic dataset, which contains two fields: the Item, which refers to the condiment, and the Dish in which it is used. Let's start by loading the data into lists.
- Python
- TypeScript
# Data source
url = 'https://raw.githubusercontent.com/XpressAI/vecto-python-examples/main/Examples/food-condiments.csv'
response = requests.get(url)
decoded_content = response.content.decode('utf-8')
csv_reader = csv.DictReader(io.StringIO(decoded_content), delimiter=';')
# Initialize lists to store items and dishes
items = []
dishes = []
for row in csv_reader:
    items.append(row['Item'])
    dishes.append(row['Dish'])

for dish, item in zip(dishes[:5], items[:5]):
    print({"Dish": dish, "Condiment": item})
Let's format it into a VectoAnalogyStartEnd, a list of dicts that contains the starts and ends.
# Keep the pairs as plain strings; fresh file-like objects are created on each call
# because a StringIO that has already been read is left at end-of-file
analogy_pairs = list(zip(dishes, items))

def improved_text_analogy(dish, top_k=10):
    query = io.StringIO(dish)
    analogy_start_end = [
        {"start": io.StringIO(start), "end": io.StringIO(end)}
        for start, end in analogy_pairs
    ]
    result = vs.compute_text_analogy(query, analogy_start_end, top_k)
    display_results(result)

interact_manual(improved_text_analogy, dish="spaghetti", top_k=IntSlider(min=1, max=50))
import {
Configuration,
LookupApi,
LookupWithDynamicAnalogyRequest
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
import { readFileSync } from 'fs';
import { parse } from 'csv-parse/sync';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
async function textAnalogy(query: string, starts: string[], ends: string[]) {
const lookupApi = new LookupApi(config);
const startBlobs = starts.map(start => new Blob([start]));
const endBlobs = ends.map(end => new Blob([end]));
const params: LookupWithDynamicAnalogyRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'TEXT',
topK: 3,
query: query,
start: startBlobs,
end: endBlobs,
};
try {
const results = await lookupApi.lookupWithDynamicAnalogy(params);
console.log("Text analogy results: ", JSON.stringify(results, null, 2));
} catch (error) {
console.error('Error analogy data:', error);
}
}
function processCSV() {
const fileContent = readFileSync('food-condiments.csv', { encoding: 'utf-8' });
const records = parse(fileContent, {
delimiter: ';',
columns: true
}) as { Item: string; Dish: string }[];
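// Dishes are the analogy starts and condiments (Item) are the ends, matching 'burger' -> 'ketchup'
const starts = records.map(record => record.Dish);
const ends = records.map(record => record.Item);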
textAnalogy("hotdog", starts, ends);
}
processCSV();
You should see more grounded recommendations. Take note that you can only output recommendations of items that you have ingested. Consequently, if an item, although contextually appropriate, isn't present in the inventory, it won't be featured in the recommendation outputs.
Analogies are especially valuable when dealing with a large product catalog but lacking historical data to build a typical recommendation system. This can often be the case for new businesses or for those expanding their product line, where customer behavior patterns or purchasing history are sparse. By employing analogies, businesses can lay out logical correlations among products, delivering relevant and situation-specific recommendations to their customers.
Recommendations using Prompt Injection
The last technique we'll cover is recommendations using prompt injection. The way it works is fairly simple: by adding context or specifying the type of results you want the model to prioritize, you can guide the recommendation process towards a desired outcome.
For example, if the base use case is recommending condiment products based on the selected dish, you can simply add condiment for before the search prompt.
- Python
- TypeScript
def prompt_injection_text_query(query, top_k=10):
updated_query = "condiment for " + query
response = vs.lookup_text_from_str(updated_query, top_k)
display_results(response)
interact_manual(prompt_injection_text_query, query="burgers", top_k=IntSlider(min=1, max=50))
import {
Configuration,
LookupApi,
LookupRequest,
} from '@xpressai/vecto-client';
import dotenv from 'dotenv';
dotenv.config();
const config = new Configuration({
accessToken: process.env.VECTO_USER_TOKEN,
});
async function textLookup(query: string, promptInjection: string = "condiment for") {
const lookupApi = new LookupApi(config);
// Prepend the prompt injection to the query, e.g. "condiment for hotdogs"
const updatedQuery = promptInjection + " " + query;
const params: LookupRequest = {
vectorSpaceId: Number(process.env.VECTOR_SPACE_ID),
modality: 'TEXT',
topK: 3,
query: updatedQuery,
};
try {
const results = await lookupApi.lookup(params);
console.log("Text lookup results: ", JSON.stringify(results, null, 2));
} catch (error) {
console.error('Error lookup data:', error);
}
}
// Example usage
textLookup("hotdogs");
As a result, you should receive recommendations such as Mayonnaise, Mustard, and Tomato sauce. And that's the basic concept! You have the flexibility to replace condiments with any category that suits your specific use case. For instance, in the context of an electronics store, you might substitute condiments with accessories.
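Swapping the category can be sketched as a small helper that builds the search text. The build_prompt function and the "accessory" and "laptop" values are hypothetical examples; the resulting string would then be passed to the same text lookup used earlier.

```python
def build_prompt(category, query):
    """Prefix the query with the category the search should favor."""
    return f"{category} for {query}"


print(build_prompt("condiment", "burgers"))  # condiment for burgers
print(build_prompt("accessory", "laptop"))   # accessory for laptop
```

Because only the query text changes, the same vector space and lookup call can serve several recommendation styles without re-ingesting anything.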