AI/ML pipeline

Recommendations, image recognition, and fraud detection

Overview

Current state

Item listings are served by basic queries, categorization is manual, and there is no fraud detection.

Target state

  • Personalized recommendations per user
  • Auto-tagging images (brand, condition, category)
  • Fraud detection for listings and users
  • Price suggestions based on market data

Tech stack

  • Embeddings: OpenAI / Cohere for text embeddings; image embeddings need a multimodal model, since OpenAI's embeddings endpoint accepts text only
  • Vector DB: Supabase pgvector or Pinecone
  • ML Models: Hugging Face for image classification
  • Processing: Vercel AI SDK, background jobs

Features

Personalized Recommendations

  • Track user views, likes, purchases
  • Generate user preference embeddings
  • Similar items based on browsing history
  • "You might like" carousel on the home page
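
One way to turn tracked views, likes, and purchases into a user preference embedding is a weighted average of the embeddings of the items the user interacted with. The sketch below assumes that approach; the `Interaction` shape, the function name, and the weights (1 / 3 / 5) are illustrative placeholders, not values taken from this plan.

```typescript
type InteractionKind = "view" | "like" | "purchase";

interface Interaction {
  kind: InteractionKind;
  itemEmbedding: number[]; // embedding of the item the user interacted with
}

// Hypothetical weights: stronger signals pull the profile harder.
const INTERACTION_WEIGHTS: Record<InteractionKind, number> = {
  view: 1,
  like: 3,
  purchase: 5,
};

function buildUserPreferenceEmbedding(interactions: Interaction[]): number[] | null {
  if (interactions.length === 0) return null; // no history yet

  const dims = interactions[0].itemEmbedding.length;
  const sum = new Array<number>(dims).fill(0);
  let totalWeight = 0;

  for (const { kind, itemEmbedding } of interactions) {
    if (itemEmbedding.length !== dims) {
      throw new Error("All item embeddings must share the same dimension");
    }
    const weight = INTERACTION_WEIGHTS[kind];
    for (let i = 0; i < dims; i++) sum[i] += weight * itemEmbedding[i];
    totalWeight += weight;
  }

  // Weighted mean, then L2-normalize so cosine comparisons behave consistently.
  const mean = sum.map((v) => v / totalWeight);
  const norm = Math.sqrt(mean.reduce((acc, v) => acc + v * v, 0));
  return norm === 0 ? mean : mean.map((v) => v / norm);
}
```

The resulting vector can be compared against item embeddings with the same cosine distance used for similar-item search.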

Image Recognition

  • Auto-detect brand from logo/tags
  • Suggest category from item photo
  • Condition assessment (new, used, worn)
  • Background removal for cleaner listings
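
Category and brand suggestions could be derived from classifier output roughly as follows. This is a sketch that assumes the image model returns label/score pairs; `suggestTags` and its thresholds are hypothetical and would need tuning against real listings.

```typescript
interface LabelScore {
  label: string;
  score: number; // classifier confidence in [0, 1]
}

interface TagSuggestion extends LabelScore {
  autoApply: boolean; // true when confident enough to pre-fill the field
}

interface SuggestOptions {
  suggestAt?: number;      // minimum score to show a suggestion (assumed value)
  autoApplyAt?: number;    // minimum score to pre-fill without asking (assumed value)
  maxSuggestions?: number; // cap on suggestions shown to the seller
}

function suggestTags(
  predictions: LabelScore[],
  { suggestAt = 0.3, autoApplyAt = 0.85, maxSuggestions = 3 }: SuggestOptions = {},
): TagSuggestion[] {
  return predictions
    .filter((p) => p.score >= suggestAt)
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so sorting is safe
    .slice(0, maxSuggestions)
    .map((p) => ({ ...p, autoApply: p.score >= autoApplyAt }));
}
```

Low-confidence labels are dropped rather than shown, so the seller only sees suggestions worth confirming.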

Fraud Detection

  • Duplicate listing detection (image similarity)
  • Suspicious pricing alerts
  • Account behavior scoring
  • Automated flagging for review
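
Duplicate listing detection by image similarity can be sketched as a cosine-similarity check between the new listing's image embedding and existing ones. The types, the function names, and the 0.95 threshold are assumptions for illustration; in production the comparison would more likely run as a pgvector query than in application memory.

```typescript
interface ListingImage {
  listingId: string;
  embedding: number[];
}

interface DuplicateMatch {
  listingId: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors carry no direction
  return dot / Math.sqrt(normA * normB);
}

// Returns existing listings whose image is near-identical to the candidate's,
// most similar first. The threshold is a placeholder to tune on real data.
function findLikelyDuplicates(
  candidate: ListingImage,
  existing: ListingImage[],
  threshold = 0.95,
): DuplicateMatch[] {
  return existing
    .filter((other) => other.listingId !== candidate.listingId)
    .map((other) => ({
      listingId: other.listingId,
      similarity: cosineSimilarity(candidate.embedding, other.embedding),
    }))
    .filter((match) => match.similarity >= threshold)
    .sort((a, b) => b.similarity - a.similarity);
}
```

Matches would feed the review queue rather than block the listing outright, since legitimate sellers can list several identical items.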

Price Suggestions

  • Market analysis for similar items
  • Historical price trends
  • "Price to sell fast" vs "maximize profit"
  • Alerts when items are underpriced
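
A simple starting point for the two pricing modes is to read them off the distribution of prices for comparable sold items. The sketch below assumes the 25th percentile for "price to sell fast" and the 75th for "maximize profit"; those cut-offs, the minimum sample size, and all names are hypothetical.

```typescript
interface PriceSuggestion {
  sellFast: number;       // assumed: 25th percentile of comparable prices
  market: number;         // median
  maximizeProfit: number; // assumed: 75th percentile
}

// Percentile with linear interpolation; expects a non-empty ascending array.
function percentile(sortedValues: number[], p: number): number {
  const rank = (p / 100) * (sortedValues.length - 1);
  const lower = Math.floor(rank);
  const upper = Math.ceil(rank);
  return sortedValues[lower] + (sortedValues[upper] - sortedValues[lower]) * (rank - lower);
}

function suggestPrices(comparableSoldPrices: number[]): PriceSuggestion | null {
  // Too few comparables make the percentiles meaningless (threshold is a guess).
  if (comparableSoldPrices.length < 3) return null;
  const sorted = [...comparableSoldPrices].sort((a, b) => a - b);
  return {
    sellFast: percentile(sorted, 25),
    market: percentile(sorted, 50),
    maximizeProfit: percentile(sorted, 75),
  };
}

// An item listed below the "sell fast" price is a candidate for an underpriced alert.
function isUnderpriced(listingPrice: number, suggestion: PriceSuggestion): boolean {
  return listingPrice < suggestion.sellFast;
}
```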

Architecture

Data flow

User action → Event tracking → Feature store
                                 ↓
                            ML Pipeline
                                 ↓
                            Predictions → Cache → API
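
The "Predictions → Cache → API" step can be read as a read-through cache: the API serves a cached prediction while it is fresh and recomputes it otherwise. A minimal in-memory sketch follows; `PredictionCache`, the TTL approach, and the injectable clock are assumptions, and a real deployment would more likely sit on a shared store.

```typescript
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class PredictionCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value if still fresh; otherwise computes and stores it.
  // T may itself be a Promise, in which case concurrent callers share one request.
  getOrCompute(key: string, compute: () => T): T {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return hit.value;

    const value = compute();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Drop a key when new events make its prediction stale.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```

An API handler might call `cache.getOrCompute(userId, () => computeRecommendations(userId))`, where `computeRecommendations` stands in for whatever the ML pipeline exposes.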

Embedding pipeline

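packages/features/items/embeddings.ts
import OpenAI from "openai";
import { eq } from "drizzle-orm";
// `db`, `items`, `Item`, and `generateImageEmbedding` come from the project;
// the import paths below are placeholders, not confirmed locations.
import { db } from "@/db";
import { items, type Item } from "@/db/schema";
import { generateImageEmbedding } from "./image-embeddings";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateItemEmbeddings(item: Item) {
  // Drop empty fields so a missing brand is not embedded as "undefined"
  const input = [item.title, item.description, item.brand].filter(Boolean).join(" ");

  const textEmbedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input,
  });

  const imageEmbedding = await generateImageEmbedding(item.mainImageUrl);

  // Store both vectors on the item row
  await db.update(items)
    .set({
      textEmbedding: textEmbedding.data[0].embedding,
      imageEmbedding,
    })
    .where(eq(items.id, item.id));
}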
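packages/features/items/queries/similar.ts
import { sql } from "drizzle-orm";
import { db } from "@/db"; // placeholder path to the project's Drizzle client

async function findSimilarItems(itemId: string, limit = 10) {
  // <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
  // The target item's embedding is loaded once in the CTE and reused.
  return db.execute(sql`
    WITH target AS (
      SELECT text_embedding FROM items WHERE id = ${itemId}
    )
    SELECT items.*,
      1 - (items.text_embedding <=> (SELECT text_embedding FROM target)) AS similarity
    FROM items
    WHERE items.id != ${itemId}
    ORDER BY items.text_embedding <=> (SELECT text_embedding FROM target)
    LIMIT ${limit}
  `);
}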

Implementation

Embeddings foundation

Set up pgvector in Supabase, generate embeddings for existing items, and build a similarity search API.
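
Generating embeddings for existing items is usually done in batches rather than one request per item. The sketch below shows one possible shape for that backfill; `chunk`, `backfillEmbeddings`, the injected `embedBatch` and `save` callbacks, and the batch size of 100 are all assumptions, with the actual embedding call and database write left to the caller.

```typescript
interface ItemText {
  id: string;
  text: string; // e.g. title, description, and brand joined together
}

// Takes a batch of inputs and returns one embedding per input, in order.
type EmbedBatch = (inputs: string[]) => Promise<number[][]>;
type SaveEmbedding = (itemId: string, embedding: number[]) => Promise<void>;

function chunk<T>(values: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("Chunk size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < values.length; i += size) {
    chunks.push(values.slice(i, i + size));
  }
  return chunks;
}

// Processes batches sequentially to stay gentle on API rate limits.
async function backfillEmbeddings(
  itemsToEmbed: ItemText[],
  embedBatch: EmbedBatch,
  save: SaveEmbedding,
  batchSize = 100, // placeholder; tune to the provider's limits
): Promise<number> {
  let processed = 0;
  for (const batch of chunk(itemsToEmbed, batchSize)) {
    const embeddings = await embedBatch(batch.map((item) => item.text));
    if (embeddings.length !== batch.length) {
      throw new Error("Embedding count does not match batch size");
    }
    for (let i = 0; i < batch.length; i++) {
      await save(batch[i].id, embeddings[i]);
    }
    processed += batch.length;
  }
  return processed;
}
```

Passing the embedding call in as a function keeps the backfill independent of the provider, which matters while the choice between OpenAI and Cohere is still open.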

Recommendations

Implement user preference tracking, recommendation API endpoints, and a "Similar items" section on the item detail page.

Image recognition

Integrate an image classification model, auto-suggest categories on upload, and detect brands from images.

Fraud detection

Add duplicate detection at listing time, pricing anomaly alerts, and user behavior scoring.
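
For the pricing anomaly alerts, one robust option is a modified z-score based on the median and the median absolute deviation (MAD) of comparable prices, which a few extreme listings cannot skew the way they skew a mean. The sketch below assumes that technique; the function names and the minimum of five comparables are hypothetical, and 3.5 is a commonly used cut-off for this score rather than a value from this plan. Behavior scoring is not covered here.

```typescript
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Modified z-score: 0.6745 * (x - median) / MAD.
// Returns null when there is not enough data to judge.
function priceAnomalyScore(price: number, comparablePrices: number[]): number | null {
  if (comparablePrices.length < 5) return null; // assumed minimum sample size
  const med = median(comparablePrices);
  const mad = median(comparablePrices.map((p) => Math.abs(p - med)));
  if (mad === 0) return null; // all comparables (nearly) identical
  return (0.6745 * (price - med)) / mad;
}

function isSuspiciousPrice(
  price: number,
  comparablePrices: number[],
  threshold = 3.5,
): boolean {
  const score = priceAnomalyScore(price, comparablePrices);
  return score !== null && Math.abs(score) > threshold;
}
```

A strongly negative score (far below the market) is the typical fraud signal, while a strongly positive one may only mean an optimistic seller, so the two directions could be routed differently.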

Checklist

Infrastructure

  • Enable pgvector extension in Supabase
  • Add embedding columns to items table
  • Set up background job for embedding generation
  • Create vector similarity indexes
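
The infrastructure items above map onto a handful of SQL statements. The sketch below generates them as strings so they can be dropped into whatever migration tool the project uses; the function name, index names, and column names other than `text_embedding` are assumptions. `text-embedding-3-small` returns 1536 dimensions by default, while the image dimension depends on the image model chosen.

```typescript
interface EmbeddingColumns {
  table: string;     // trusted constant, never user input
  textDims: number;  // 1536 for text-embedding-3-small at its default size
  imageDims: number; // depends on the image embedding model
}

function buildPgvectorMigration({ table, textDims, imageDims }: EmbeddingColumns): string[] {
  for (const dims of [textDims, imageDims]) {
    if (!Number.isInteger(dims) || dims <= 0) {
      throw new Error("Embedding dimensions must be positive integers");
    }
  }
  return [
    // 1. Enable the extension
    "CREATE EXTENSION IF NOT EXISTS vector;",
    // 2. Add the embedding columns
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS text_embedding vector(${textDims});`,
    `ALTER TABLE ${table} ADD COLUMN IF NOT EXISTS image_embedding vector(${imageDims});`,
    // 3. Index for cosine distance (the <=> operator)
    `CREATE INDEX IF NOT EXISTS ${table}_text_embedding_idx ON ${table} USING hnsw (text_embedding vector_cosine_ops);`,
    `CREATE INDEX IF NOT EXISTS ${table}_image_embedding_idx ON ${table} USING hnsw (image_embedding vector_cosine_ops);`,
  ];
}
```

For example, `buildPgvectorMigration({ table: "items", textDims: 1536, imageDims: 512 })` yields five statements, where 512 is only a placeholder for the image model's output size. The index operator class has to match the distance operator used in queries, otherwise the index is not used.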

Features

  • Similar items API
  • User recommendations API
  • Image auto-tagging
  • Price suggestion engine
  • Fraud detection alerts