244 changes: 235 additions & 9 deletions README.md
# LlmEvalRuby

A Ruby gem for LLM evaluation that provides prompt management and tracing. It supports multiple adapters: a local adapter for development and a Langfuse adapter for production.

## Installation

Add this line to your application's Gemfile:

```ruby
gem 'llm_eval_ruby', git: 'https://github.com/test-IO/llm_eval_ruby'
```

And then execute:

    $ bundle install

## Configuration

Create an initializer file (e.g., `config/initializers/llm_eval_ruby.rb`):

```ruby
LlmEvalRuby.configure do |config|
  # Choose adapter: :langfuse or :local
  config.adapter = :langfuse

  # Langfuse configuration (for production)
  config.langfuse_options = {
    host: ENV['LANGFUSE_HOST'],
    username: ENV['LANGFUSE_USERNAME'],
    password: ENV['LANGFUSE_PASSWORD']
  }

  # Local configuration (for development)
  config.local_options = {
    prompts_path: Rails.root.join('lib', 'prompts'),
    traces_path: Rails.root.join('log', 'trace.log')
  }
end
```
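
In a Rails app you will typically want the `:local` adapter in development and `:langfuse` in production. One way to switch, as a sketch (the environment check is plain Rails, not something the gem provides):

```ruby
LlmEvalRuby.configure do |config|
  # Pick the adapter per environment; fall back to local files outside production.
  config.adapter = Rails.env.production? ? :langfuse : :local

  # Set langfuse_options and local_options exactly as in the example above.
end
```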

## Usage

### 1. Basic Tracing

```ruby
# Create a trace for a job or operation
trace = LlmEvalRuby::Tracer.trace(
  name: 'test_case_generation',
  session_id: 'session_123',
  input: { url: 'https://example.com' },
  user_id: 'test-generator',
  metadata: { workflow_id: 'wf_123' }
)

# Create spans within a trace
LlmEvalRuby::Tracer.span(name: :fetch_prompts, trace_id: trace.id) do
  # Your code here
end

# Track AI generations
generation = LlmEvalRuby::Tracer.generation(
  name: 'generate_test_cases',
  input: 'Generate test cases for login functionality',
  trace_id: trace.id
)

# ... make AI API call ...

generation.end(output: ai_response, usage: { tokens: 150 })
```

### 2. Observable Pattern for Services

```ruby
class OpenaiService
  include LlmEvalRuby::Observable

  attr_reader :trace_id

  # Automatically create spans for these methods
  observe :create_assistant, type: :span
  observe :create_file, type: :span
  observe :add_message, type: :span

  # Automatically track generation for this method
  observe :chat, type: :generation

  def initialize(session_id, trace_id = nil)
    @session_id = session_id
    @trace_id = trace_id
  end

  def create_assistant(params)
    # Method implementation
    # Automatically wrapped in a span
  end

  def chat(params)
    # Method implementation
    # Automatically tracked as a generation
  end
end
```
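
With the trace id passed into the constructor, calls to the observed methods are recorded against that trace. A minimal usage sketch (the method arguments are placeholders, since the class bodies above are stubs):

```ruby
# Assumes `trace` was created with LlmEvalRuby::Tracer.trace as in section 1
service = OpenaiService.new('session_123', trace.id)

service.create_assistant(name: 'qa-assistant')                # recorded as a span
service.chat(messages: [{ role: 'user', content: 'Hello' }])  # recorded as a generation
```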

### 3. Prompt Management

#### Chat Prompts (System + User)
```ruby
# Fetch chat prompts (returns [system_prompt, user_prompt])
system_prompt, user_prompt = LlmEvalRuby::PromptRepositories::Chat.fetch(
  name: :validate_test_case
)

# Compile prompts with variables
compiled_prompt = user_prompt.compile(
  variables: {
    test_case_content: "Login with valid credentials",
    out_of_scope: "Performance testing"
  }
)

# Use in AI call (`openai` stands for your AI client, e.g. an OpenAI client instance)
response = openai.chat(
  parameters: {
    model: 'gpt-4',
    messages: [
      { role: 'system', content: system_prompt.content },
      { role: 'user', content: compiled_prompt.content }
    ]
  }
)
```

#### Text Prompts (Single prompt)
```ruby
# Fetch and compile in one step
prompt = LlmEvalRuby::PromptRepositories::Text.fetch_and_compile(
  name: :precook_out_of_scope,
  variables: {
    feature_description: "User authentication",
    out_of_scope: "Load testing, Security scanning"
  }
)

# Use the compiled content (`openai` is your AI client, as above)
response = openai.chat(
  parameters: {
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt.content }]
  }
)
```
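
With the local adapter, the text prompt above is resolved from a single Liquid template (see section 5 for the directory layout). A hypothetical `lib/prompts/text/precook_out_of_scope.liquid` could look like this; only the variable names come from the call above, the wording is illustrative:

```liquid
Identify which parts of the feature below are out of scope:

Feature Description:
{{ feature_description }}

Out of Scope:
{{ out_of_scope }}
```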

### 4. Real-World Example: Background Job

```ruby
class ValidateTestCaseJob < ApplicationJob
  def perform(test_case_id)
    @test_case = TestCase.find(test_case_id)

    # Start trace
    @trace_id = LlmEvalRuby::Tracer.trace(
      name: 'validate_test_case',
      session_id: @test_case.session_id,
      input: { test_case_id: test_case_id },
      user_id: 'validator'
    ).id

    # Fetch prompts with span tracking
    LlmEvalRuby::Tracer.span(name: :fetch_prompts, trace_id: @trace_id) do
      @system_prompt, @user_prompt = LlmEvalRuby::PromptRepositories::Chat.fetch(
        name: :validate_test_case
      )
    end

    # Compile prompt with variables
    compiled_prompt = @user_prompt.compile(
      variables: {
        test_case_content: @test_case.content,
        requirements: @test_case.requirements
      }
    )

    # Track AI generation
    generation = LlmEvalRuby::Tracer.generation(
      name: 'validate_test_case',
      input: compiled_prompt.content,
      trace_id: @trace_id
    )

    # Make AI call
    response = openai_service.chat(
      parameters: {
        model: 'gpt-4',
        messages: [
          { role: 'system', content: @system_prompt.content },
          { role: 'user', content: compiled_prompt.content }
        ]
      }
    )

    # End generation tracking
    generation.end(
      output: response.dig('choices', 0, 'message', 'content'),
      usage: response['usage']
    )

    # Process response...
  end

  private

  def openai_service
    @openai_service ||= OpenaiService.new(@test_case.session_id, @trace_id)
  end
end
```

### 5. Prompt File Structure

For the local adapter, organize prompts under your `lib/prompts` directory:

```
lib/prompts/
├── chat/
│   ├── validate_test_case/
│   │   ├── system.liquid
│   │   └── user.liquid
│   └── generate_test_cases/
│       ├── system.liquid
│       └── user.liquid
└── text/
    └── precook_out_of_scope.liquid
```

Example prompt file (`lib/prompts/chat/validate_test_case/user.liquid`):
```liquid
Please validate the following test case:

Test Case Content:
{{ test_case_content }}

Out of Scope Items:
{{ out_of_scope }}

Determine if this test case is valid and in scope.
```
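
The matching system prompt sits next to it. A hypothetical `lib/prompts/chat/validate_test_case/system.liquid` (illustrative wording, not shipped with the gem):

```liquid
You are a QA reviewer. Given a test case and a list of out-of-scope items,
decide whether the test case is valid and in scope, and explain briefly why.
```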

## Development
