
Embedding Models

Spring AI core v1.0.0

Overview

Embedding models convert text into dense vector representations (e.g., float[1536]) that capture semantic meaning. Spring AI’s EmbeddingModel abstraction enables vector-based similarity search, clustering, classification, and retrieval-augmented generation (RAG) pipelines. Embeddings are the foundation of semantic search and are essential for grounding LLM responses in domain-specific knowledge.


Key Concepts

EmbeddingModel Interface

public interface EmbeddingModel extends Model<EmbeddingRequest, EmbeddingResponse> {
    EmbeddingResponse call(EmbeddingRequest request);
    
    // Convenience method
    default List<Double> embed(String text) {
        return call(new EmbeddingRequest(List.of(text), null))
            .getResults()
            .get(0)
            .getOutput();
    }
}

Embedding Dimensionality Tradeoffs

┌──────────────────────────────────────────────────────────────────┐
│                    Embedding Model Comparison                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│ Model                    Dims   Cost/1M   Recall   Use Case      │
│ ─────                    ────   ───────   ──────   ────────      │
│                                                                  │
│ text-embedding-3-small   1536   $0.02     Good     High volume   │
│ text-embedding-3-large   3072   $0.13     Better   High accuracy │
│ text-embedding-ada-002   1536   $0.10     Good     Legacy        │
│ voyage-2                 1024   $0.10     Good     Specialized   │
│ e5-mistral-7b-instruct   4096   Free      Best     Local/private │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Dimensionality Impact:

  • Higher dimensions → Better recall, higher storage cost, slower search
  • Lower dimensions → Lower storage, faster search, potentially lower recall
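The storage side of this tradeoff is easy to quantify. A back-of-envelope sketch (assuming raw float32 storage and ignoring index overhead such as HNSW graph links or metadata):

```java
// Back-of-envelope index sizing (assumption: raw float32 vectors only).
public class EmbeddingStorage {

    // Bytes needed to store `count` vectors of `dims` float32 components.
    static long storageBytes(long count, int dims) {
        return count * dims * 4L; // 4 bytes per float32 component
    }
}
```

At 1M vectors, text-embedding-3-small (1536 dims) needs roughly 6.1 GB of raw vector storage, while text-embedding-3-large (3072 dims) doubles that — before any index overhead.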

Similarity Metrics

┌──────────────────────────────────────────────────────┐
│          Vector Similarity Metrics                    │
├──────────────────────────────────────────────────────┤
│                                                       │
│  COSINE SIMILARITY (Most Common)                      │
│  ─────────────────                                    │
│  Range: [-1, 1]   (1 = identical, 0 = orthogonal)     │
│  Use: Text, semantic search                           │
│  Formula: A · B / (||A|| * ||B||)                     │
│                                                       │
│  EUCLIDEAN DISTANCE                                   │
│  ──────────────────                                   │
│  Range: [0, ∞]    (0 = identical, ∞ = very different) │
│  Use: Image embeddings, when magnitude matters        │
│  Formula: sqrt(Σ(A_i - B_i)²)                         │
│                                                       │
│  DOT PRODUCT                                          │
│  ───────────                                          │
│  Range: [-∞, ∞]                                       │
│  Use: When vectors are normalized                     │
│  Formula: Σ(A_i * B_i)                                │
│                                                       │
└──────────────────────────────────────────────────────┘
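These formulas translate directly into plain Java. A sketch for spot checks and tests; production search should rely on the vector store's optimized implementations:

```java
// Plain-Java implementations of the three metrics above.
public class SimilarityMetrics {

    // Σ(A_i * B_i)
    static double dotProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // A · B / (||A|| * ||B||), range [-1, 1]
    static double cosineSimilarity(float[] a, float[] b) {
        double normA = Math.sqrt(dotProduct(a, a));
        double normB = Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / (normA * normB);
    }

    // sqrt(Σ(A_i - B_i)²), range [0, ∞)
    static double euclideanDistance(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Note that for unit-length vectors, cosine similarity and dot product coincide — which is why many APIs normalize embeddings before storage.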

Best Practices

1. Choose Embedding Model Based on Use Case

Match model dimensions and cost to retrieval requirements.

@Configuration
public class EmbeddingConfig {
    
    @Bean
    @ConditionalOnProperty(name = "embedding.mode", havingValue = "high-volume")
    public EmbeddingModel smallEmbeddingModel(OpenAiApi openAiApi) {
        return new OpenAiEmbeddingModel(
            openAiApi,
            MetadataMode.EMBED,
            OpenAiEmbeddingOptions.builder()
                .withModel("text-embedding-3-small") // 1536 dims, cheap
                .build()
        );
    }
    
    @Bean
    @ConditionalOnProperty(name = "embedding.mode", havingValue = "high-accuracy")
    public EmbeddingModel largeEmbeddingModel(OpenAiApi openAiApi) {
        return new OpenAiEmbeddingModel(
            openAiApi,
            MetadataMode.EMBED,
            OpenAiEmbeddingOptions.builder()
                .withModel("text-embedding-3-large") // 3072 dims, better recall
                .build()
        );
    }
}

2. Batch Embeddings for Efficiency

Embed multiple documents in a single API call to reduce latency and cost.

@Service
public class DocumentEmbeddingService {
    private final EmbeddingModel embeddingModel;
    
    public List<float[]> embedBatch(List<String> documents) {
        // Single API call for up to 2048 documents (OpenAI limit)
        EmbeddingResponse response = embeddingModel.call(
            new EmbeddingRequest(documents, null)
        );
        
        return response.getResults().stream()
            .map(result -> toFloatArray(result.getOutput()))
            .collect(Collectors.toList());
    }
}

3. Normalize Vectors for Cosine Similarity

Most vector databases expect normalized vectors for cosine similarity.

public float[] normalizeVector(List<Double> vector) {
    double magnitude = Math.sqrt(
        vector.stream().mapToDouble(v -> v * v).sum()
    );
    
    // Build the float[] directly; Stream#toArray cannot produce a float[]
    float[] normalized = new float[vector.size()];
    for (int i = 0; i < vector.size(); i++) {
        normalized[i] = (float) (vector.get(i) / magnitude);
    }
    return normalized;
}

4. Cache Embeddings to Avoid Recomputation

Embeddings for static content should be computed once and stored.

@Service
public class CachedEmbeddingService {
    private final EmbeddingModel embeddingModel;
    private final EmbeddingCache cache;
    
    public float[] embed(String text) {
        return cache.computeIfAbsent(text, key -> {
            List<Double> embedding = embeddingModel.embed(text);
            return toFloatArray(embedding);
        });
    }
}
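EmbeddingCache above is not a Spring AI type; it is an assumed helper. A minimal in-memory sketch backed by ConcurrentHashMap:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal in-memory embedding cache keyed by input text. Hypothetical helper
// (not part of Spring AI); unbounded, so suitable only for a fixed corpus.
public class EmbeddingCache {
    private final ConcurrentHashMap<String, float[]> cache = new ConcurrentHashMap<>();

    // Returns the cached vector for `text`, invoking `loader` at most once.
    public float[] computeIfAbsent(String text, Function<String, float[]> loader) {
        return cache.computeIfAbsent(text, loader);
    }

    public int size() {
        return cache.size();
    }
}
```

In production, bound the cache (e.g. a library like Caffeine with a maximum size) and consider hashing long texts to keep keys small.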

5. Monitor Embedding Drift for Document Updates

When documents change, re-embed and update the vector store.

@Scheduled(cron = "0 0 2 * * ?") // Daily at 2 AM
public void updateStaleEmbeddings() {
    List<Document> updatedDocs = documentRepository.findUpdatedSince(
        LocalDateTime.now().minusDays(1)
    );
    
    for (Document doc : updatedDocs) {
        float[] embedding = embeddingService.embed(doc.getContent());
        vectorStore.upsert(doc.getId(), embedding, doc.getMetadata());
    }
    
    log.info("Updated {} stale embeddings", updatedDocs.size());
}

Code Examples

Example 1: Basic Text Embedding

@Service
public class BasicEmbeddingService {
    private final EmbeddingModel embeddingModel;
    
    public float[] embedText(String text) {
        List<Double> embedding = embeddingModel.embed(text);
        return toFloatArray(embedding);
    }
    
    public double cosineSimilarity(String text1, String text2) {
        float[] vec1 = embedText(text1);
        float[] vec2 = embedText(text2);
        
        return dotProduct(vec1, vec2); // Assumes normalized vectors
    }
    
    private double dotProduct(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
    
    private float[] toFloatArray(List<Double> list) {
        float[] array = new float[list.size()];
        for (int i = 0; i < list.size(); i++) {
            array[i] = list.get(i).floatValue();
        }
        return array;
    }
}

✅ Good for: Simple similarity checks, one-off embeddings
❌ Not good for: High-volume production (no batching)


Example 2: Batch Embedding for RAG Ingestion

@Service
public class DocumentIngestionService {
    private final EmbeddingModel embeddingModel;
    private final VectorStore vectorStore;
    
    public void ingestDocuments(List<Document> documents) {
        // Chunk documents
        List<String> chunks = documents.stream()
            .flatMap(doc -> chunkDocument(doc, 512))
            .collect(Collectors.toList());
        
        // Batch embed (up to 2048 at a time for OpenAI)
        for (int i = 0; i < chunks.size(); i += 2048) {
            List<String> batch = chunks.subList(
                i, Math.min(i + 2048, chunks.size())
            );
            
            EmbeddingResponse response = embeddingModel.call(
                new EmbeddingRequest(batch, null)
            );
            
            // Store in vector database
            for (int j = 0; j < batch.size(); j++) {
                vectorStore.add(
                    new org.springframework.ai.vectorstore.Document(
                        batch.get(j),
                        Map.of("chunk_index", i + j),
                        toFloatArray(response.getResults().get(j).getOutput())
                    )
                );
            }
        }
    }
}

✅ Good for: RAG pipelines, bulk ingestion
❌ Not good for: Real-time embedding (use pre-computed vectors)
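Both this example and the batching best practice assume a chunkDocument helper. A rough whitespace-based sketch, using word count as a crude stand-in for token count (a real pipeline would count tokens with the model's tokenizer, e.g. a library like jtokkit):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Chunker {

    // Split text into chunks of at most maxWords whitespace-separated words.
    // Word count only approximates token count, so leave headroom below the
    // model's real token limit.
    static List<String> chunkDocument(String text, int maxWords) {
        String[] words = text.trim().split("\\s+");
        List<String> chunks = new ArrayList<>();
        for (int i = 0; i < words.length; i += maxWords) {
            int end = Math.min(i + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, i, end)));
        }
        return chunks;
    }
}
```

The flatMap call above would then use chunkDocument(...).stream(). Overlapping chunks (say 10-20% overlap) often improve retrieval at the cost of extra storage.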


Example 3: Semantic Deduplication

@Service
public class DeduplicationService {
    private final EmbeddingModel embeddingModel;
    private static final double SIMILARITY_THRESHOLD = 0.95;
    
    public List<Document> deduplicate(List<Document> documents) {
        List<Document> unique = new ArrayList<>();
        List<float[]> uniqueEmbeddings = new ArrayList<>(); // embed each kept doc once
        
        for (Document doc : documents) {
            float[] embedding = embedText(doc.getContent());
            
            boolean isDuplicate = uniqueEmbeddings.stream()
                .anyMatch(existing ->
                    cosineSimilarity(embedding, existing) > SIMILARITY_THRESHOLD);
            
            if (!isDuplicate) {
                unique.add(doc);
                uniqueEmbeddings.add(embedding);
            }
        }
        
        return unique;
    }
}

✅ Good for: Content moderation, duplicate detection
❌ Not good for: Large datasets (O(n²) complexity; use vector DB)


Example 4: Multi-Model Fallback

@Service
public class ResilientEmbeddingService {
    private final EmbeddingModel primaryModel;
    private final EmbeddingModel fallbackModel;
    
    @CircuitBreaker(name = "embedding", fallbackMethod = "fallbackEmbed")
    public List<Double> embed(String text) {
        return primaryModel.embed(text);
    }
    
    private List<Double> fallbackEmbed(String text, Exception e) {
        log.warn("Primary embedding model failed, using fallback", e);
        return fallbackModel.embed(text);
    }
}

✅ Good for: Production reliability
❌ Not good for: Mixed-dimension models — primary and fallback must produce the same dimensionality, or you must re-index everything


Example 5: Hybrid Search (Keyword + Semantic)

@Service
public class HybridSearchService {
    private final VectorStore vectorStore;
    private final FullTextSearchEngine fullTextSearch;
    
    public List<Document> search(String query, int topK) {
        // 1. Semantic search (the vector store embeds the query internally)
        List<Document> semanticResults = vectorStore.similaritySearch(
            SearchRequest.query(query).withTopK(topK)
        );
        
        // 2. Keyword search via full-text index
        List<Document> keywordResults = fullTextSearch.search(query, topK);
        
        // 3. Merge and rerank
        return mergeAndRerank(semanticResults, keywordResults, query, topK);
    }
    
    private List<Document> mergeAndRerank(
        List<Document> semantic,
        List<Document> keyword,
        String query,
        int topK
    ) {
        // Reciprocal Rank Fusion (RRF)
        Map<String, Double> scores = new HashMap<>();
        
        for (int i = 0; i < semantic.size(); i++) {
            String id = semantic.get(i).getId();
            scores.merge(id, 1.0 / (60 + i), Double::sum);
        }
        
        for (int i = 0; i < keyword.size(); i++) {
            String id = keyword.get(i).getId();
            scores.merge(id, 1.0 / (60 + i), Double::sum);
        }
        
        return scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
            .limit(topK)
            .map(entry -> findDocumentById(entry.getKey()))
            .collect(Collectors.toList());
    }
}

✅ Good for: Best of both worlds (semantic + keyword)
❌ Not good for: Real-time (needs caching/pre-computation)
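The Reciprocal Rank Fusion scoring inside mergeAndRerank can be isolated and unit-tested on its own. A standalone sketch of the same formula, score(doc) = Σ 1 / (k + rank) over the ranked lists, with the conventional k = 60:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReciprocalRankFusion {

    // Fuse any number of ranked ID lists; higher score = better.
    static Map<String, Double> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> list : rankedLists) {
            for (int rank = 0; rank < list.size(); rank++) {
                scores.merge(list.get(rank), 1.0 / (k + rank), Double::sum);
            }
        }
        return scores;
    }
}
```

A document ranked first in one list can still lose to one ranked well in both lists — which is the point of fusion.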


Anti-Patterns

❌ Re-Embedding Static Content on Every Query

// DON'T: Recompute embeddings for static documents
public List<Document> search(String query) {
    float[] queryEmbedding = embedText(query);
    
    List<Document> allDocs = documentRepository.findAll();
    for (Document doc : allDocs) {
        doc.setEmbedding(embedText(doc.getContent())); // Wasteful!
    }
    
    return findTopK(queryEmbedding, allDocs);
}

Why: Embeddings are deterministic; recomputing wastes time and money.

✅ DO: Pre-compute and store embeddings

// At ingestion time (the vector store embeds and persists the document)
vectorStore.add(List.of(new Document(content, metadata)));

// At query time
return vectorStore.similaritySearch(query);

❌ Using Wrong Similarity Metric

// DON'T: Use Euclidean distance on non-normalized vectors
double distance = euclideanDistance(vec1, vec2);

Why: Euclidean distance is sensitive to vector magnitude; cosine similarity is better for text.

✅ DO: Use cosine similarity for text embeddings

double similarity = cosineSimilarity(vec1, vec2);

❌ Embedding Full Documents Without Chunking

// DON'T: Embed entire 50-page document
String fullDoc = readEntireDocument(); // 50,000 tokens
float[] embedding = embedText(fullDoc); // Truncated at 8192 tokens!

Why: Embedding models have token limits (e.g., 8192 for OpenAI); long texts lose information.

✅ DO: Chunk documents before embedding

List<String> chunks = chunkDocument(fullDoc, 512); // 512 tokens each
for (String chunk : chunks) {
    vectorStore.add(new Document(chunk, metadata, embedText(chunk)));
}

❌ Ignoring Embedding Model Version Changes

// DON'T: Switch embedding models without re-indexing
// Old: text-embedding-ada-002 (1536 dims)
// New: text-embedding-3-large (3072 dims)
// Vectors are incompatible!

Why: Different models produce incomparable embeddings; similarity search fails.

✅ DO: Version your vector stores

@Bean("embeddingModelV2")
public EmbeddingModel newModel() { ... }

@Bean("vectorStoreV2")
public VectorStore newVectorStore() { ... }

// Migrate data with re-embedding
migrationService.reindexWithNewModel(embeddingModelV2, vectorStoreV2);

Testing Strategies

Unit Testing with Fixed Vectors

@Test
void shouldComputeCosineSimilarity() {
    float[] vec1 = {1.0f, 0.0f, 0.0f};
    float[] vec2 = {1.0f, 0.0f, 0.0f};
    
    double similarity = cosineSimilarity(vec1, vec2);
    
    assertEquals(1.0, similarity, 0.001); // Identical vectors
}

Integration Testing with Local Models

@SpringBootTest
@TestPropertySource(properties = {
    "spring.ai.ollama.base-url=http://localhost:11434",
    "spring.ai.ollama.embedding.model=nomic-embed-text"
})
class EmbeddingServiceIntegrationTest {
    @Autowired
    private EmbeddingModel embeddingModel;
    
    @Test
    void shouldEmbedText() {
        List<Double> embedding = embeddingModel.embed("Hello world");
        
        assertNotNull(embedding);
        assertEquals(768, embedding.size()); // nomic-embed-text dims
    }
}

Similarity Threshold Tuning

@Test
void shouldTuneSimilarityThreshold() {
    List<Pair<String, String>> goldenPairs = loadGoldenDataset();
    
    for (double threshold = 0.7; threshold <= 0.99; threshold += 0.01) {
        int truePositives = 0;
        int falsePositives = 0;
        int falseNegatives = 0;
        
        for (Pair<String, String> pair : goldenPairs) {
            double similarity = embeddingService.similarity(
                pair.getFirst(), pair.getSecond()
            );
            
            if (similarity >= threshold) {
                if (pair.isMatch()) truePositives++;
                else falsePositives++;
            } else if (pair.isMatch()) {
                falseNegatives++;
            }
        }
        
        // Precision alone rises with the threshold; track recall and F1
        // to find the actual sweet spot.
        double precision = (double) truePositives / (truePositives + falsePositives);
        double recall = (double) truePositives / (truePositives + falseNegatives);
        double f1 = 2 * precision * recall / (precision + recall);
        log.info("Threshold: {}, Precision: {}, Recall: {}, F1: {}",
            threshold, precision, recall, f1);
    }
}

Performance Considerations

Concern      Strategy
───────      ────────
Latency      Batch embeddings; use async calls; cache results
Cost         Choose smaller models (1536 vs 3072 dims); batch requests
Storage      Use dimensionality reduction (PCA) if recall permits
Recall       Use larger models or hybrid search (keyword + semantic)
Index Size   Shard vector store; use approximate NN (HNSW, IVF)
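On the storage point: OpenAI's text-embedding-3 models support shortened embeddings — truncate to the leading components and re-normalize (the API's dimensions parameter does this server-side). A client-side sketch:

```java
import java.util.Arrays;

public class EmbeddingTruncation {

    // Keep the first `dims` components, then re-normalize to unit length so
    // cosine/dot-product search still behaves correctly.
    static float[] truncateAndNormalize(float[] vector, int dims) {
        float[] out = Arrays.copyOf(vector, dims);
        double norm = 0.0;
        for (float x : out) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        for (int i = 0; i < dims; i++) {
            out[i] /= (float) norm;
        }
        return out;
    }
}
```

This works because the text-embedding-3 family is trained so the leading components carry most of the signal; truncating arbitrary models this way usually degrades recall badly.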

Observability

Metrics to Track

@Component
@Aspect
public class EmbeddingMetrics {
    private final MeterRegistry registry;
    private final AtomicInteger lastDimensions = new AtomicInteger();
    
    public EmbeddingMetrics(MeterRegistry registry) {
        this.registry = registry;
        // Register once; Micrometer holds gauge state weakly, so the backing
        // AtomicInteger must be a strongly referenced field.
        Gauge.builder("embedding.dimensions", lastDimensions, AtomicInteger::get)
            .register(registry);
    }
    
    @Around("execution(* org.springframework.ai.embedding.EmbeddingModel.call(..))")
    public Object trackEmbedding(ProceedingJoinPoint joinPoint) throws Throwable {
        Timer.Sample sample = Timer.start(registry);
        
        try {
            EmbeddingResponse response = (EmbeddingResponse) joinPoint.proceed();
            
            // Track batch size
            registry.counter("embedding.batch.size")
                .increment(response.getResults().size());
            
            // Track embedding dimensionality
            lastDimensions.set(response.getResults().get(0).getOutput().size());
            
            return response;
        } finally {
            sample.stop(registry.timer("embedding.duration"));
        }
    }
}
