An embedding turns text into a fixed-length list of numbers, a vector, that captures its meaning. Texts with similar meanings land close together in that vector space, which is what makes semantic search possible.
Why not just keywords?
Keyword search matches the literal words in the text. Embeddings match meaning, so "how do I cancel my plan" finds a doc titled "Ending a subscription" even though the two share no words.
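A toy illustration of that difference. This is a minimal sketch that assumes naive whitespace keyword matching and uses hand-picked 3-dimensional vectors in place of real model output (real embeddings usually have hundreds of dimensions or more). The names `keyword_overlap`, `query_vec`, `title_vec` and `unrelated_vec` are hypothetical.

```python
import numpy as np

def keyword_overlap(query, title):
    # Naive keyword matching: lowercase, split on whitespace, intersect the word sets.
    return set(query.lower().split()) & set(title.lower().split())

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the vector lengths.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = "how do I cancel my plan"
title = "Ending a subscription"
print(keyword_overlap(query, title))  # set(): no shared words at all

# Hypothetical embeddings, invented for illustration only (not from any model).
query_vec = np.array([0.9, 0.1, 0.3])
title_vec = np.array([0.8, 0.2, 0.35])       # same topic, different wording
unrelated_vec = np.array([0.1, 0.9, -0.2])   # a document about something else

print(round(cosine(query_vec, title_vec), 3))      # high: the vectors point the same way
print(round(cosine(query_vec, unrelated_vec), 3))  # low: the vectors point apart
```

Keyword matching scores the pair as a total miss, while the vectors for the query and the title are nearly parallel.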
Measuring closeness
Most systems compare vectors with cosine similarity, which looks at the angle between them rather than their length:
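```python
import numpy as np

def cosine(a, b):
    # Dot product over the product of the lengths: the cosine of the angle between a and b.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

Where it goes wrong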
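Embeddings inherit the biases and blind spots of the data their model was trained on. How you split documents into chunks before embedding them also matters a lot, since each chunk becomes a single vector. Test retrieval on real queries before trusting it.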
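One way to test retrieval is to collect a few real queries for which you already know the right document, then count how often that document appears in the top k results. This metric is often called hit rate or recall@k. The sketch below is a minimal version under stated assumptions: the vectors are hypothetical stand-ins for your embedding model's output, ranking is a brute-force cosine scan, and `top_k` and `hit_rate_at_k` are illustrative names rather than a library API.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k):
    # Brute-force ranking: normalize everything to unit length so a dot
    # product equals cosine similarity, then keep the k highest scores.
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = docs @ q
    return np.argsort(-scores)[:k].tolist()

def hit_rate_at_k(query_vecs, relevant_ids, doc_vecs, k):
    # Fraction of test queries whose known-relevant document is in the top k.
    hits = sum(rel in top_k(q, doc_vecs, k) for q, rel in zip(query_vecs, relevant_ids))
    return hits / len(relevant_ids)

# Hypothetical embeddings standing in for real model output.
doc_vecs = np.array([
    [0.8, 0.2, 0.35],
    [0.1, 0.9, -0.2],
    [0.3, 0.3, 0.9],
])
query_vecs = np.array([
    [0.9, 0.1, 0.3],  # should retrieve doc 0
    [0.2, 0.8, 0.0],  # should retrieve doc 1
    [0.7, 0.2, 0.5],  # should retrieve doc 2
])
relevant_ids = [0, 1, 2]

for k in (1, 2):
    print(k, hit_rate_at_k(query_vecs, relevant_ids, doc_vecs, k))
```

With these made-up vectors, the third query ranks doc 0 above its intended doc 2. It misses at k=1 and is only found at k=2, which is the kind of gap a check like this surfaces.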