FastEmbed is MemWire’s default embedding backend. Models run entirely on-device — no OpenAI key, no network calls, no per-token cost.
Two model types are used:
| Type | Purpose |
|---|---|
| Dense model | Semantic similarity — powers memory search and recall |
| Sparse model | Keyword matching — used alongside dense vectors in hybrid search |
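To make the division of labor concrete, here is a minimal sketch of how a dense ranking and a sparse ranking can be merged into one result list. The fusion scheme shown (reciprocal rank fusion) is an illustrative assumption for explanation — MemWire's internal merging strategy may differ:

```python
# Illustrative reciprocal rank fusion (RRF) of two ranked result lists.
# This is NOT MemWire's internal algorithm; it is a common, simple way
# to combine dense (semantic) and sparse (keyword) rankings.

def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    """Merge two ranked lists of document IDs into one fused ranking."""
    scores = {}
    for ranking in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranking):
            # Documents ranked highly in either list accumulate more score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["a", "b", "c"]   # order by semantic similarity
sparse = ["b", "d", "a"]  # order by keyword match
print(rrf_fuse(dense, sparse))  # → ['b', 'a', 'd', 'c']
```

Document `b` wins because it appears near the top of both lists, which is exactly the behavior hybrid search is meant to reward.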
## Default models
No configuration needed to get started — MemWire ships with sensible defaults.
```python
from memwire import MemWire, MemWireConfig

config = MemWireConfig(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    embedding_dim=384,
    sparse_model_name="prithivida/Splade_PP_en_v1",
)
memory = MemWire(config=config)
```
| Model | Type | Dimensions | Notes |
|---|---|---|---|
| `sentence-transformers/all-MiniLM-L6-v2` | Dense | 384 | Fast, multilingual-friendly, good general-purpose baseline |
| `prithivida/Splade_PP_en_v1` | Sparse | — | SPLADE sparse model for hybrid BM25+vector search |
## Changing the dense model

You can swap the dense model for any FastEmbed-compatible model. Make sure `embedding_dim` matches the model's output dimension.
```python
# Smaller — faster inference, slightly lower accuracy
config = MemWireConfig(
    model_name="BAAI/bge-small-en-v1.5",
    embedding_dim=384,
)

# Balanced — good quality, moderate size
config = MemWireConfig(
    model_name="BAAI/bge-base-en-v1.5",
    embedding_dim=768,
)

# Larger — best quality, slower inference
config = MemWireConfig(
    model_name="BAAI/bge-large-en-v1.5",
    embedding_dim=1024,
)
```
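A mismatched `embedding_dim` typically only surfaces later, at insert or search time. A small sanity check can catch it up front. The model-to-dimension lookup below uses the models listed on this page; the helper itself is illustrative and not part of MemWire's API:

```python
# Known output dimensions for the dense models mentioned above.
# Illustrative helper — not part of MemWire's API.
KNOWN_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "BAAI/bge-small-en-v1.5": 384,
    "BAAI/bge-base-en-v1.5": 768,
    "BAAI/bge-large-en-v1.5": 1024,
}

def check_embedding_dim(model_name, embedding_dim):
    """Raise early if embedding_dim contradicts the model's known output size."""
    expected = KNOWN_DIMS.get(model_name)
    if expected is not None and expected != embedding_dim:
        raise ValueError(
            f"{model_name} outputs {expected}-dim vectors, "
            f"but embedding_dim={embedding_dim}"
        )

check_embedding_dim("BAAI/bge-base-en-v1.5", 768)  # OK — no exception
```

Unknown model names pass through silently, since FastEmbed supports more models than this short list.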
If you change the embedding model after data has already been stored, existing vectors will be incompatible with new embeddings. Reset your Qdrant storage when switching models.
## Disabling hybrid search
If you want dense-only retrieval and don’t need the sparse model to load:
```python
config = MemWireConfig(
    use_hybrid_search=False,
)
```
## Enabling reranking
A cross-encoder reranker re-scores the top candidates after initial retrieval. This improves precision at the cost of slightly higher latency.
```python
config = MemWireConfig(
    use_reranking=True,
    reranker_model_name="Xenova/ms-marco-MiniLM-L-6-v2",
)
```
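The retrieve-then-rerank flow works in two stages: the embedding model returns a broad candidate set cheaply, then the cross-encoder scores each (query, candidate) pair jointly and the best survivors are kept. MemWire does this internally when `use_reranking=True`; the sketch below only illustrates the shape of the second stage, with a toy word-overlap function standing in for a real cross-encoder:

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Re-score candidates with score_fn and keep the top_k best."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]

# Toy stand-in for a cross-encoder: counts shared words.
# A real cross-encoder jointly encodes the query and candidate text.
def overlap_score(query, doc):
    return len(set(query.split()) & set(doc.split()))

hits = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
print(rerank("cat dog", hits, overlap_score, top_k=2))
# → ['a cat and a dog', 'the cat sat']
```

The extra pass over the top candidates is what adds the latency mentioned above — the reranker runs once per candidate rather than once per query.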
## Configuration reference
| Parameter | Default | Description |
|---|---|---|
| `model_name` | `sentence-transformers/all-MiniLM-L6-v2` | Dense embedding model. Must be FastEmbed-compatible. |
| `embedding_dim` | `384` | Output dimension of the dense model. Must match the model. |
| `sparse_model_name` | `prithivida/Splade_PP_en_v1` | Sparse model for hybrid BM25+vector search. |
| `use_hybrid_search` | `True` | Combine dense and sparse vectors. Set to `False` for dense-only search. |
| `use_reranking` | `False` | Apply a cross-encoder reranker to re-score top candidates. |
| `reranker_model_name` | `Xenova/ms-marco-MiniLM-L-6-v2` | Cross-encoder reranker model (used when `use_reranking=True`). |
| `embedding_cache_maxsize` | `10000` | LRU cache size for embedding vectors. Increase for large-scale workloads. |