BoEmbedder

A Python library for generating text embeddings using Google's Gemini API with batching, retry logic, and progress tracking.

Installation

From GitHub

pip install git+https://github.com/OpenPecha/BoEmbedder.git

From Source

# Clone the repository
git clone https://github.com/OpenPecha/BoEmbedder.git
cd BoEmbedder

# Install in development mode
pip install -e .

# Or install with development dependencies
pip install -e ".[dev]"

Usage

Basic Example

from BoEmbedder.gemini import embed_texts_batch

# Your texts to embed
texts = [
    "This is the first document.",
    "This is the second document.",
    "And this is the third one."
]

# Generate embeddings
embeddings = embed_texts_batch(
    texts=texts,
    api_key="your-gemini-api-key",
    batch_size=10,
    model="gemini-embedding-001",
    output_dimensionality=768
)

print(f"Generated {len(embeddings)} embeddings")
print(f"Each embedding has dimension: {len(embeddings[0])}")

Using Environment Variables

import os
from BoEmbedder.gemini import embed_texts_batch

# Set your API key as an environment variable
os.environ["GOOGLE_GEMINI_KEY"] = "your-gemini-api-key"

# Embed texts (API key will be read from environment)
embeddings = embed_texts_batch(
    texts=texts,
    api_key=os.getenv("GOOGLE_GEMINI_KEY")
)

Advanced Usage with Custom Parameters

from BoEmbedder.gemini import embed_texts_batch

embeddings = embed_texts_batch(
    texts=texts,
    api_key="your-gemini-api-key",
    batch_size=5,              # Process 5 texts per batch
    max_texts=100,             # Limit to first 100 texts
    model="gemini-embedding-001",
    task_type="RETRIEVAL_DOCUMENT",  # Or "RETRIEVAL_QUERY", "SEMANTIC_SIMILARITY"
    output_dimensionality=768,  # Embedding dimension
    max_retries=3,             # Retry failed batches up to 3 times
    show_progress=True         # Display progress bar
)

Configuration Only

from BoEmbedder.gemini import configure_gemini

# Configure Gemini API separately
configure_gemini(api_key="your-gemini-api-key")

# Or use environment variable
configure_gemini()  # Reads from GOOGLE_GEMINI_KEY

Features

Batch Processing: Process multiple texts efficiently in configurable batch sizes
Automatic Retries: Built-in retry logic with exponential backoff and jitter
Progress Tracking: Optional progress bar using tqdm
Flexible Configuration: Support for different models, task types, and embedding dimensions
Error Handling: Robust error handling with detailed error messages
Environment Variable Support: API keys can be provided directly or via environment variables

API Reference

`embed_texts_batch()`

Generate embeddings for a list of texts.

Parameters:

texts (List[str]): List of text strings to embed
api_key (str): Gemini API key
batch_size (int, optional): Number of texts per batch (default: 10)
max_texts (int, optional): Maximum number of texts to process (default: None, processes all)
model (str, optional): Gemini model name (default: "gemini-embedding-001")
task_type (str, optional): Task type for embeddings (default: "RETRIEVAL_DOCUMENT")
output_dimensionality (int, optional): Embedding dimension (default: 768)
max_retries (int, optional): Maximum retry attempts per batch (default: 3)
show_progress (bool, optional): Show progress bar (default: True)

Returns:

List[List[float]]: List of embeddings, where each embedding is a list of floats

`configure_gemini()`

Configure the Gemini API client.

Parameters:

api_key (str, optional): API key. If not provided, uses GOOGLE_GEMINI_KEY from environment

Contributing

We welcome contributions! Here's how you can help:

Fork the repository
Create a new branch (git checkout -b feature/your-feature)
Make your changes
Run tests if available (pytest)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin feature/your-feature)
Create a new Pull Request

Development Setup

# Install with development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=BoEmbedder

License

This project is licensed under the MIT License - see the LICENSE file for details.

Maintained by: OpenPecha

Questions or Issues? Please file an issue on our GitHub Issues page.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
docs		docs
src/BoEmbedder		src/BoEmbedder
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

BoEmbedder

Installation

From GitHub

From Source

Usage

Basic Example

Using Environment Variables

Advanced Usage with Custom Parameters

Configuration Only

Features

API Reference

`embed_texts_batch()`

`configure_gemini()`

Contributing

Development Setup

License

About

Uh oh!

Releases

Packages

Languages

License

OpenPecha/BoEmbedder

Folders and files

Latest commit

History

Repository files navigation

BoEmbedder

Installation

From GitHub

From Source

Usage

Basic Example

Using Environment Variables

Advanced Usage with Custom Parameters

Configuration Only

Features

API Reference

embed_texts_batch()

configure_gemini()

Contributing

Development Setup

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`embed_texts_batch()`

`configure_gemini()`

Packages