negotiation_platform.models package

class negotiation_platform.models.BaseLLMModel(model_name: str, config: Dict[str, Any])[source]

Bases: ABC

Abstract base class for all Large Language Model implementations.

This class defines the standardized interface that all LLM model wrappers must implement to ensure compatibility and interchangeability within the negotiation platform. It provides the foundation for supporting various model types and APIs through a unified abstraction layer.

The interface encompasses the complete model lifecycle from initialization and loading through response generation and resource cleanup. All methods are designed to support both synchronous and configuration-driven usage patterns common in negotiation research scenarios.

model_name

Unique identifier for the model (e.g., model path, API endpoint, or registry name).

Type:

str

config

Model-specific configuration parameters including generation settings, API keys, and optimization flags.

Type:

Dict[str, Any]

is_loaded

Current loading state of the model, used for resource management and error prevention.

Type:

bool

Example

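>>> class CustomModelWrapper(BaseLLMModel):
...     def load_model(self):
...         # Custom implementation
...         self.is_loaded = True
...     def generate_response(self, prompt, **kwargs):
...         # Custom response generation
...         return "Generated response"
...     def parse_action(self, response, game_type=None, player_name=None):
...         # Custom parsing; every abstract method must be overridden
...         return {'type': 'noop'}
...     def unload_model(self):
...         self.is_loaded = False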

Note

Concrete implementations should handle model-specific concerns such as authentication, memory management, GPU utilization, and error recovery while maintaining compatibility with this interface.

__init__(model_name: str, config: Dict[str, Any])[source]

Initialize the base LLM model with configuration.

Sets up the foundational attributes required by all model implementations including the model identifier, configuration parameters, and loading state tracking. Subclasses should call this constructor before performing any model-specific initialization.

Parameters:
  • model_name (str) – Unique identifier for the model, which may represent a file path, model registry name, API endpoint, or other model-specific identifier used for loading.

  • config (Dict[str, Any]) – Comprehensive configuration dictionary containing model-specific parameters such as:
      - Generation settings (temperature, max_tokens, top_p)
      - Hardware preferences (device, precision, memory_limits)
      - API credentials and endpoints (for cloud models)
      - Optimization flags (quantization, batching, caching)

Example

>>> config = {
...     'temperature': 0.7,
...     'max_new_tokens': 120,
...     'device': 'cuda',
...     'load_in_8bit': True
... }
>>> model = CustomModelWrapper('model-name', config)

Note

The model is initialized in an unloaded state. Call load_model() explicitly to prepare the model for inference operations.

abstract load_model()[source]

Load the model into memory and prepare for inference.

This method performs all necessary initialization steps to make the model ready for generating responses. Implementation should handle model downloading, memory allocation, device placement, and any required preprocessing or optimization steps.

The method should be idempotent - calling it multiple times should not cause errors or unnecessary resource consumption. Implementations should update the is_loaded attribute to reflect the current state.

Raises:
  • RuntimeError – If model loading fails due to insufficient resources, network issues, authentication problems, or other critical errors.

  • FileNotFoundError – If the specified model cannot be located or accessed from the configured source.

  • MemoryError – If insufficient memory is available for model loading.

Example

>>> model = HuggingFaceModelWrapper('model-name', config)
>>> model.load_model()  # Loads model into GPU/CPU memory
>>> assert model.is_loaded

Note

Implementations should handle device placement, quantization, memory optimization, and other hardware-specific concerns based on the configuration provided during initialization.
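
The idempotency requirement can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: SketchModel, _load_weights, and load_count are hypothetical names, and the class is a plain stand-in rather than a real BaseLLMModel subclass so that the snippet runs on its own.

```python
from typing import Any, Dict


class SketchModel:
    """Hypothetical stand-in with the same attributes as BaseLLMModel."""

    def __init__(self, model_name: str, config: Dict[str, Any]):
        self.model_name = model_name
        self.config = config
        self.is_loaded = False
        self.load_count = 0  # only here to make the behaviour observable

    def _load_weights(self) -> None:
        # Placeholder for the expensive work (download, device placement, ...).
        self.load_count += 1

    def load_model(self) -> None:
        # Idempotent: a second call returns early instead of reloading.
        if self.is_loaded:
            return
        self._load_weights()
        self.is_loaded = True


model = SketchModel("model-name", {"device": "cpu"})
model.load_model()
model.load_model()  # no-op, the weights are loaded only once
```

Guarding on is_loaded keeps repeated calls cheap; an implementation that needs a forced reload could expose that as a separate, explicit operation.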

abstract generate_response(prompt: str, **kwargs) → str[source]

Generate a text response from the loaded model.

This method performs inference using the loaded model to generate a text response based on the provided prompt. The response should be contextually appropriate for negotiation scenarios and follow any game-specific formatting requirements.

Parameters:
  • prompt (str) – Input text prompt for the model, typically containing negotiation context, game rules, player role, and current situation requiring a response.

  • **kwargs – Additional generation parameters that may override default configuration settings. Common parameters include:
      - temperature (float): Sampling temperature for randomness
      - max_new_tokens (int): Maximum tokens to generate
      - top_p (float): Nucleus sampling parameter
      - do_sample (bool): Whether to use sampling vs greedy decoding

Returns:

Generated text response from the model, which should be parseable into structured actions for the negotiation game. The response may contain JSON, natural language, or mixed formats depending on the model and prompt design.

Return type:

str

Raises:
  • RuntimeError – If the model is not loaded or generation fails due to resource constraints or model errors.

  • ValueError – If the prompt is empty, too long, or contains invalid characters that the model cannot process.

Example

>>> prompt = "You are a buyer in a car negotiation. Make an offer."
>>> response = model.generate_response(prompt, temperature=0.5)
>>> print(response)
{"type": "offer", "price": 25000}

Note

Implementations should handle prompt preprocessing, generation parameter validation, output post-processing, and error recovery while maintaining consistent response quality.
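
One way to let per-call keyword arguments override configuration defaults is sketched below. The helper name resolve_generation_params and the GENERATION_KEYS tuple are assumptions made for illustration; a concrete wrapper is free to choose a different precedence.

```python
from typing import Any, Dict

# Keys treated as generation parameters in this sketch (an assumption).
GENERATION_KEYS = ("temperature", "max_new_tokens", "top_p", "do_sample")


def resolve_generation_params(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Start from config defaults, then let per-call kwargs override them."""
    params = {k: config[k] for k in GENERATION_KEYS if k in config}
    params.update({k: v for k, v in kwargs.items() if k in GENERATION_KEYS})
    return params


config = {"temperature": 0.7, "max_new_tokens": 120, "device": "cuda"}
params = resolve_generation_params(config, temperature=0.5)
# params now holds temperature=0.5 and max_new_tokens=120; 'device' is filtered out
```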

abstract parse_action(response: str, game_type: str | None = None, player_name: str | None = None) → Dict[str, Any][source]

Parse LLM response into a structured negotiation action.

This method converts the raw text output from the model into a standardized action dictionary that can be processed by the game engine. It should handle various response formats, apply validation, and provide error recovery for malformed responses.

Parameters:
  • response (str) – Raw text response from the model, which may contain JSON, structured text, or natural language that needs to be extracted and parsed into action format.

  • game_type (str, optional) – Type of negotiation game being played (e.g., ‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’), used for game-specific parsing and validation rules.

  • player_name (str, optional) – Name or identifier of the player generating the action, used for debugging and error reporting.

Returns:

Structured action dictionary containing at minimum:
  • type (str): Action type (e.g., ‘offer’, ‘accept’, ‘reject’)

  • Additional fields specific to the action type and game:
      * price (float): For offer/counter actions in price bargaining
      * gpu_hours/cpu_hours (float): For resource allocation proposals
      * proposal (dict): For integrative negotiation proposals

Return type:

Dict[str, Any]

Raises:
  • ValueError – If the response cannot be parsed into a valid action or contains invalid values for the specified game type.

  • KeyError – If required action fields are missing from the parsed response for the given game context.

Example

>>> response = '{"type": "offer", "price": 28000}'
>>> action = model.parse_action(response, 'company_car', 'Buyer')
>>> print(action)
{'type': 'offer', 'price': 28000.0}

Note

Implementations should provide robust parsing with fallback strategies for common formatting issues, and should integrate with validation schemas when available for the game type.

abstract unload_model()[source]

Unload model from memory and free associated resources.

This method performs cleanup operations to release memory and other system resources used by the model. It should handle GPU memory cleanup, cache clearing, and any other resource deallocation needed to prepare for model switching or application shutdown.

The method should be idempotent - calling it multiple times should not cause errors. After successful unloading, the is_loaded attribute should be set to False to reflect the model state.

Raises:

RuntimeError – If unloading fails or resources cannot be properly released, though implementations should make best efforts to continue with partial cleanup in such cases.

Example

>>> model.load_model()
>>> assert model.is_loaded
>>> model.unload_model()
>>> assert not model.is_loaded

Note

This method is critical for memory management in multi-model scenarios where models are dynamically loaded and switched. Implementations should ensure thorough cleanup including GPU memory, cached tensors, and any background processes.
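
When models are switched dynamically, pairing every load with a guaranteed unload avoids leaking resources if inference raises. The sketch below shows one possible pattern using a context manager; StubModel and loaded are hypothetical names, not part of the platform.

```python
from contextlib import contextmanager
from typing import Iterator


class StubModel:
    """Hypothetical stand-in exposing load_model/unload_model/is_loaded."""

    def __init__(self) -> None:
        self.is_loaded = False

    def load_model(self) -> None:
        self.is_loaded = True

    def unload_model(self) -> None:
        # Idempotent: unloading an already unloaded model is a no-op.
        self.is_loaded = False


@contextmanager
def loaded(model: StubModel) -> Iterator[StubModel]:
    """Load on entry and always unload on exit, even if inference raises."""
    model.load_model()
    try:
        yield model
    finally:
        model.unload_model()


model = StubModel()
with loaded(model) as m:
    was_loaded_inside = m.is_loaded  # the model is usable inside the block
```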

get_model_info() → Dict[str, Any][source]

Retrieve comprehensive information about the model instance.

This method provides a standardized way to inspect the current state and configuration of a model instance. It returns information useful for debugging, monitoring, logging, and administrative purposes.

Returns:

Information dictionary containing:
  • name (str): The model identifier used during initialization

  • loaded (bool): Current loading state of the model

  • config (Dict[str, Any]): Complete configuration dictionary including all parameters used for model initialization and generation settings

Return type:

Dict[str, Any]

Example

>>> model = HuggingFaceModelWrapper('model-name', {'temp': 0.7})
>>> info = model.get_model_info()
>>> print(info)
{'name': 'model-name', 'loaded': False, 'config': {'temp': 0.7}}

Note

This method does not modify the model state and can be called safely at any time. Subclasses may extend this method to include additional model-specific information such as memory usage, device placement, or performance statistics.
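
A subclass can extend the returned dictionary by calling the parent method and adding its own keys. The following is a sketch under assumptions: BaseSketch mirrors only the three documented keys, and the extra 'device' key and the DeviceAwareSketch class are hypothetical.

```python
from typing import Any, Dict


class BaseSketch:
    """Stand-in mirroring the documented get_model_info() keys."""

    def __init__(self, model_name: str, config: Dict[str, Any]):
        self.model_name = model_name
        self.config = config
        self.is_loaded = False

    def get_model_info(self) -> Dict[str, Any]:
        return {"name": self.model_name, "loaded": self.is_loaded, "config": self.config}


class DeviceAwareSketch(BaseSketch):
    def __init__(self, model_name: str, config: Dict[str, Any]):
        super().__init__(model_name, config)
        self.device = config.get("device", "cpu")

    def get_model_info(self) -> Dict[str, Any]:
        # Extend, rather than replace, the base dictionary.
        info = super().get_model_info()
        info["device"] = self.device  # extra key is an assumption of this sketch
        return info


info = DeviceAwareSketch("model-name", {"device": "cpu"}).get_model_info()
```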

class negotiation_platform.models.HuggingFaceModelWrapper(model_name: str, config: Dict[str, Any])[source]

Bases: BaseLLMModel

Production-ready wrapper for Hugging Face Transformers models.

This class provides a comprehensive implementation of the BaseLLMModel interface specifically designed for Hugging Face Transformers models. It handles the complexities of modern language model deployment including memory optimization, device management, and robust inference pipelines.

The wrapper supports both research and production scenarios with features like automatic quantization, intelligent device placement, comprehensive error handling, and extensive logging for debugging and monitoring.

tokenizer

Hugging Face tokenizer instance for the model.

model

Loaded Hugging Face model instance ready for inference.

device

Target device for model execution (‘cuda’, ‘cpu’, or ‘auto’).

Type:

str

Configuration Options:
Device and Memory:
  • device (str): Target device (‘cuda’, ‘cpu’, ‘auto’)

  • device_map (str/dict): Advanced device placement strategy

  • load_in_8bit (bool): Enable 8-bit quantization for memory efficiency

  • load_in_4bit (bool): Enable 4-bit quantization for extreme efficiency

Generation Parameters:
  • temperature (float): Sampling temperature (0.0-2.0)

  • max_new_tokens (int): Maximum tokens to generate

  • top_p (float): Nucleus sampling parameter

  • do_sample (bool): Enable sampling vs greedy decoding

  • repetition_penalty (float): Penalty for repetitive text

Authentication:
  • api_token (str): Hugging Face API token for private models

  • trust_remote_code (bool): Allow execution of remote code

Example

>>> config = {
...     'device': 'auto',
...     'load_in_8bit': True,
...     'temperature': 0.7,
...     'max_new_tokens': 120,
...     'api_token': 'hf_token_here'
... }
>>> wrapper = HuggingFaceModelWrapper('model-name', config)
>>> wrapper.load_model()
>>> response = wrapper.generate_response('Your prompt here')

Note

This implementation includes production-ready optimizations and extensive error handling. For development, consider enabling verbose logging to monitor model behavior and performance.

__init__(model_name: str, config: Dict[str, Any])[source]

Initialize Hugging Face model wrapper with intelligent device detection.

Sets up the wrapper with automatic device detection and configuration normalization. Handles the complexity of device selection including CUDA availability checking and fallback strategies for various hardware configurations.

Parameters:
  • model_name (str) – Hugging Face model identifier, which can be:
      - Repository ID (e.g., ‘meta-llama/Llama-2-7b-chat-hf’)
      - Local model path for offline usage
      - Custom model name for privately hosted models

  • config (Dict[str, Any]) – Comprehensive configuration dictionary containing device preferences, generation parameters, memory optimization settings, and authentication credentials.

Example

>>> config = {
...     'device': 'auto',  # Automatic device detection
...     'load_in_8bit': True,  # Memory optimization
...     'temperature': 0.7,  # Generation creativity
...     'max_new_tokens': 150
... }
>>> wrapper = HuggingFaceModelWrapper('model-name', config)

Note

The device setting is intelligently resolved: ‘auto’ selects CUDA if available, otherwise falls back to CPU. Explicit device settings override automatic detection but should match available hardware.
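
The resolution rule described in this note can be sketched as a small pure function. The name resolve_device is hypothetical, and CUDA availability is passed in as a flag so the snippet runs without PyTorch installed.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Resolve 'auto' to a concrete device; pass explicit choices through."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


# In real code the flag would typically come from torch.cuda.is_available().
on_gpu_host = resolve_device("auto", cuda_available=True)
on_cpu_host = resolve_device("auto", cuda_available=False)
explicit = resolve_device("cpu", cuda_available=True)
```

Because explicit settings are passed through unchanged, a caller who requests 'cuda' on a CPU-only host would only see a failure later, at load time; this sketch does not attempt to validate that case.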

load_model()[source]

Load Hugging Face model and tokenizer with advanced optimization.

Performs comprehensive model loading with memory optimization, device placement, quantization support, and extensive error handling. This method implements production-ready loading strategies including automatic quantization, intelligent device mapping, and detailed progress monitoring.

The loading process includes:
  • Environment optimization and TorchDynamo configuration

  • Quantization setup (4-bit/8-bit) for memory efficiency

  • Tokenizer initialization with authentication handling

  • Model loading with optimal device placement

  • Resource monitoring and diagnostic reporting

Raises:
  • RuntimeError – If model loading fails due to insufficient resources, authentication issues, or hardware compatibility problems.

  • FileNotFoundError – If the specified model cannot be found in the Hugging Face Hub or local filesystem.

  • MemoryError – If insufficient GPU/CPU memory is available for the model with current quantization settings.

  • ImportError – If required dependencies (bitsandbytes) are missing for quantization features.

Example

>>> model = HuggingFaceModelWrapper('meta-llama/Llama-2-7b-chat-hf', config)
>>> model.load_model()  # Comprehensive loading with optimization
Loading meta-llama/Llama-2-7b-chat-hf...
✅ Model loaded successfully in 45.2s

Note

This method includes extensive logging and diagnostic output for debugging. It automatically handles quantization, device placement, and memory optimization based on the configuration provided during initialization. The loading process is optimized for both speed and memory efficiency.

generate_response(prompt: str, **kwargs) → str[source]

Generate contextually appropriate response using the loaded model.

Performs inference using the loaded Hugging Face model with intelligent parameter management, device handling, and response post-processing. The method prioritizes configuration-based parameters while providing robust error handling and response cleaning for negotiation contexts.

Parameters:
  • prompt (str) – Input prompt for the model, typically containing negotiation context, game rules, and situation requiring response.

  • **kwargs – Additional generation parameters (note: YAML configuration takes precedence over kwargs for consistency). Supported options:
      - early_stopping (bool): Enable early stopping for beam search
      - length_penalty (float): Length penalty for generation

Returns:

Generated response text, cleaned and processed for parsing.

The method attempts to extract clean JSON from model output and removes common prefixes/suffixes that interfere with action parsing.

Return type:

str

Raises:
  • RuntimeError – If the model is not loaded or generation fails due to resource constraints, device issues, or model errors.

  • ValueError – If the prompt is empty or contains characters that cause tokenization or generation failures.

Example

>>> prompt = "Make a negotiation offer for the company car."
>>> response = model.generate_response(prompt)
>>> print(response)
{"type": "offer", "price": 28000}

Note

This method includes intelligent response cleaning that attempts to extract clean JSON from model outputs and removes common model-generated prefixes. Generation parameters are strictly controlled by YAML configuration to ensure consistent behavior across experiments.
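
Response cleaning of this kind can be approximated with the standard library alone. The sketch below is not the wrapper's actual code: extract_first_json_object is a hypothetical helper that scans for the first decodable JSON object and discards any surrounding chatter.

```python
import json
from typing import Optional


def extract_first_json_object(text: str) -> Optional[str]:
    """Return the first decodable JSON object in text, re-serialised, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return json.dumps(obj)
        start = text.find("{", start + 1)
    return None


raw = 'Sure! Here is my move: {"type": "offer", "price": 28000} Let me know.'
cleaned = extract_first_json_object(raw)
```

Using JSONDecoder.raw_decode rather than a regular expression means nested objects are handled by the JSON parser itself.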

parse_action(response: str, game_type: str | None = None, player_name: str | None = None) → Dict[str, Any][source]

Parse LLM response into structured negotiation action with validation.

Implements a comprehensive parsing pipeline that converts raw model output into validated negotiation actions. The method employs multiple parsing strategies, extensive error handling, and optional Pydantic validation for game-specific action constraints.

Parameters:
  • response (str) – Raw text response from the model, which may contain JSON, natural language, or mixed formats requiring extraction.

  • game_type (str, optional) – Type of negotiation game for validation (‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’). Enables game-specific parsing rules and action validation.

  • player_name (str, optional) – Player identifier for enhanced logging and debugging output during parsing operations.

Returns:

Validated action dictionary containing:
  • type (str): Action type (‘offer’, ‘accept’, ‘reject’, ‘propose’)

  • Additional fields specific to action type and game context

  • Fallback ‘noop’ action if parsing fails completely

Return type:

Dict[str, Any]

Parsing Strategy:
  1. Response preprocessing to fix common JSON issues

  2. Priority extraction of JSON from response start

  3. Pattern-based JSON extraction with multiple strategies

  4. Manual key-value extraction as fallback

  5. Optional Pydantic validation for game-specific constraints

Example

>>> response = '{"type": "offer", "price": 28000}'
>>> action = model.parse_action(response, 'company_car', 'Buyer')
>>> print(action)
{'type': 'offer', 'price': 28000.0}

Note

This method includes extensive debugging output and graceful error handling. It integrates with the action_schemas module for Pydantic validation when game_type is specified, providing robust action parsing for research and production scenarios.
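
The first four stages of the parsing strategy, plus the ‘noop’ fallback, can be sketched as a chain of progressively looser attempts. This is an illustrative approximation only: parse_action_sketch and its regular expressions are assumptions, and the optional Pydantic validation step is omitted.

```python
import json
import re
from typing import Any, Dict


def parse_action_sketch(response: str) -> Dict[str, Any]:
    """Try progressively looser strategies; fall back to a 'noop' action."""
    # 1. Preprocessing: drop trailing commas before a closing brace.
    text = re.sub(r",\s*}", "}", response.strip())

    # 2. and 3. Try the whole text first, then any brace-delimited span.
    candidates = [text] + re.findall(r"\{[^{}]*\}", text)
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict) and "type" in parsed:
            return parsed

    # 4. Manual key-value extraction for output such as: type: offer, price: 28000
    type_match = re.search(r"type\W+(\w+)", text)
    if type_match:
        action: Dict[str, Any] = {"type": type_match.group(1)}
        price_match = re.search(r"price\W+(\d+(?:\.\d+)?)", text)
        if price_match:
            action["price"] = float(price_match.group(1))
        return action

    # Nothing recoverable.
    return {"type": "noop"}


a = parse_action_sketch('I offer this: {"type": "offer", "price": 28000,}')
b = parse_action_sketch("type: offer, price: 27500")
c = parse_action_sketch("I need more time to think.")
```

Returning a ‘noop’ action instead of raising keeps a negotiation round alive when a model produces unusable output, at the cost of hiding the failure from the caller unless it is logged.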

unload_model()[source]

Unload model from memory and perform comprehensive cleanup.

Performs thorough cleanup of model resources including GPU memory, cached tensors, and Python object references. This method is essential for memory management in multi-model scenarios and prevents memory leaks during model switching operations.

Cleanup Operations:
  • Deletion of model and tokenizer objects

  • GPU memory cache clearing (CUDA)

  • Python garbage collection triggering

  • Loading state reset

Example

>>> model.load_model()
>>> # ... use model for inference ...
>>> model.unload_model()  # Free all resources
🗑️  meta-llama/Llama-2-7b-chat-hf unloaded

Note

This method is idempotent and safe to call multiple times. It’s automatically called during model switching and should be called explicitly when done with a model to free resources for other models or applications.
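
The cleanup operations listed above might look roughly like the following. This is a sketch, not the wrapper's source: CleanupSketch is a hypothetical class, the model and tokenizer are placeholder objects, and PyTorch is imported lazily so the snippet also runs where it is not installed.

```python
import gc
from typing import Any, Optional


class CleanupSketch:
    """Hypothetical holder for a model and tokenizer."""

    def __init__(self) -> None:
        self.model: Optional[Any] = object()      # placeholder for a loaded model
        self.tokenizer: Optional[Any] = object()  # placeholder for a tokenizer
        self.is_loaded = True

    def unload_model(self) -> None:
        # Drop the references so the objects become collectable.
        self.model = None
        self.tokenizer = None
        gc.collect()
        try:
            import torch  # optional in this sketch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # release cached GPU memory blocks
        except ImportError:
            pass
        self.is_loaded = False


holder = CleanupSketch()
holder.unload_model()
holder.unload_model()  # second call is harmless
```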

Submodules

Action Schemas Module

Pydantic schemas for validating and constraining LLM action outputs

class negotiation_platform.models.action_schemas.BaseAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Base action schema

type: str

class negotiation_platform.models.action_schemas.OfferAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Price bargaining offer action

type: Literal['offer'] = Ellipsis

price: float = Ellipsis

class negotiation_platform.models.action_schemas.AcceptAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Accept action for any game type

type: Literal['accept'] = Ellipsis

class negotiation_platform.models.action_schemas.CounterAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Price bargaining counter-offer action

type: Literal['counter'] = Ellipsis

price: float = Ellipsis

class negotiation_platform.models.action_schemas.RejectAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Reject action for any game type

type: Literal['reject'] = Ellipsis

class negotiation_platform.models.action_schemas.ResourceProposalAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Resource allocation proposal with gpu_hours and cpu_hours

type: Literal['propose'] = Ellipsis

gpu_hours: float = Ellipsis

cpu_hours: float = Ellipsis

class negotiation_platform.models.action_schemas.ProposeTradeAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Alternative resource allocation trade proposal

type: Literal['propose_trade'] = Ellipsis

offer: Dict[str, float] = Ellipsis

request: Dict[str, float] = Ellipsis

class negotiation_platform.models.action_schemas.IntegrativeProposalAction(*args: Any, **kwargs: Any)[source]

Bases: BaseModel

Integrative negotiation proposal with constrained discrete values

type: Literal['propose'] = Ellipsis

server_room: int | float = Ellipsis

meeting_access: int | float = Ellipsis

cleaning: str = Ellipsis

branding: str = Ellipsis

validate_server_room

Class-method validator for the server_room field.

validate_meeting_access

Class-method validator for the meeting_access field.

validate_cleaning

Class-method validator for the cleaning field.

validate_branding

Class-method validator for the branding field.

negotiation_platform.models.action_schemas.GameAction

Union type encompassing all valid negotiation actions.

This type union provides comprehensive coverage of all supported action types across different negotiation game scenarios. It serves as the canonical reference for valid action schemas in the validation pipeline and enables type-safe action processing throughout the platform.

Included Action Types:
  • OfferAction: Price-based offers in bargaining scenarios

  • AcceptAction: Universal acceptance across all game types

  • CounterAction: Price-based counter-offers in bargaining

  • RejectAction: Universal rejection across all game types

  • ResourceProposalAction: Computing resource allocation proposals

  • ProposeTradeAction: Complex resource trading proposals

  • IntegrativeProposalAction: Multi-issue integrative negotiations

Usage:

This union is used internally by the validation system and should not typically be referenced directly in user code. Instead, use the validate_and_constrain_action() function for action processing.

alias of OfferAction | AcceptAction | CounterAction | RejectAction | ResourceProposalAction | ProposeTradeAction | IntegrativeProposalAction

negotiation_platform.models.action_schemas.validate_and_constrain_action(raw_response: str, game_type: str) → Dict[str, Any][source]

Validate and constrain LLM response to proper negotiation action format.

This function serves as the primary entry point for converting raw LLM responses into validated, game-appropriate action dictionaries. It applies comprehensive validation using Pydantic schemas, performs automatic error correction, and provides robust fallback handling for malformed responses.

The validation process includes JSON parsing, game-specific schema validation, automatic value correction, and intelligent error recovery to maximize the success rate of action parsing from LLM outputs.

Parameters:
  • raw_response (str) – Raw JSON string response from the LLM, which may contain formatting issues, invalid values, or non-standard structure requiring correction and validation.

  • game_type (str) – Type of negotiation game context for validation (‘price_bargaining’, ‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’). Determines which validation schemas and correction rules are applied.

Returns:

Validated and constrained action dictionary containing properly formatted action data with all required fields and valid values according to game-specific rules. Includes type field and game-appropriate additional fields.

Return type:

Dict[str, Any]

Raises:

ValueError – If the response cannot be parsed, validated, or corrected into a valid action format despite multiple correction attempts and fallback strategies.

Example

>>> response = '{"type": "offer", "price": 28000}'
>>> action = validate_and_constrain_action(response, 'company_car')
>>> print(action)
{'type': 'offer', 'price': 28000.0}
>>> # Auto-correction example
>>> response = '{"type": "propose", "server_room": 75}'
>>> action = validate_and_constrain_action(response, 'integrative_negotiations')
>>> print(action['server_room'])  # Auto-corrected to 50
50

Note

This function includes extensive error handling and automatic correction capabilities. It integrates with the auto_correct_action function to handle common LLM output mistakes and provides graceful degradation when validation fails completely.
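
The documentation does not state how out-of-range values are corrected. One plausible rule, shown here purely as an assumption, is to snap a value to the nearest permitted option. Both the helper snap_to_allowed and the ALLOWED_SERVER_ROOM set are hypothetical; the real constraints are defined in action_schemas.

```python
from typing import Sequence

# Hypothetical allowed values; the real constraints live in action_schemas.
ALLOWED_SERVER_ROOM = (50, 100, 150)


def snap_to_allowed(value: float, allowed: Sequence[float]) -> float:
    """Return the allowed value closest to `value` (first one wins on ties)."""
    return min(allowed, key=lambda candidate: abs(candidate - value))


corrected = snap_to_allowed(60, ALLOWED_SERVER_ROOM)    # nearest option is 50
unchanged = snap_to_allowed(100, ALLOWED_SERVER_ROOM)   # already valid
```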

negotiation_platform.models.action_schemas.auto_correct_action(parsed: Dict[str, Any], game_type: str) → Dict[str, Any] | None[source]

Auto-correct common LLM mistakes in action format with intelligent fallbacks.

This function implements sophisticated error correction for common mistakes made by language models when generating negotiation actions. It uses pattern matching, keyword detection, and game-specific knowledge to recover valid actions from malformed or non-standard LLM outputs.

Parameters:
  • parsed (Dict[str, Any]) – Parsed action dictionary that failed standard validation, potentially containing incorrect field names, invalid values, or missing required information.

  • game_type (str) – Type of negotiation game for context-specific correction rules (‘price_bargaining’, ‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’).

Returns:

Corrected action dictionary if successful correction is possible, None if the input cannot be meaningfully corrected into a valid action format. Corrected actions conform to the appropriate schema for the specified game type.

Return type:

Optional[Dict[str, Any]]

Correction Strategies:
  • Action type normalization (accept/agree/yes → accept)

  • Alternative field name handling (amount/value → price)

  • Game-specific field mapping (x/y → gpu_hours/cpu_hours)

  • Value constraint enforcement for integrative negotiations

  • Format conversion (nested proposal structures)

Example

>>> # Correct alternative acceptance terms
>>> parsed = {'type': 'agree'}
>>> corrected = auto_correct_action(parsed, 'company_car')
>>> print(corrected)
{'type': 'accept'}
>>> # Correct alternative field names
>>> parsed = {'type': 'offer', 'amount': 25000}
>>> corrected = auto_correct_action(parsed, 'price_bargaining')
>>> print(corrected)
{'type': 'offer', 'price': 25000.0}

Note

This function is designed to be permissive and attempts multiple correction strategies before giving up. It logs correction actions for debugging and research purposes when values are automatically adjusted to meet constraint requirements.
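
Two of the listed strategies, action type normalization and alternative field name handling, can be sketched as below. This is a simplified illustration rather than the module's code: auto_correct_sketch is a hypothetical name, it ignores game_type, and it only covers the aliases named in the strategy list.

```python
from typing import Any, Dict, Optional

# Alias tables for this sketch; the entries are the ones named in the strategy list.
ACCEPT_ALIASES = {"accept", "agree", "yes"}
PRICE_ALIASES = ("amount", "value")


def auto_correct_sketch(parsed: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Normalise the action type and price field; return None if hopeless."""
    action_type = str(parsed.get("type", "")).strip().lower()
    if action_type in ACCEPT_ALIASES:
        return {"type": "accept"}
    if action_type == "offer":
        for key in ("price",) + PRICE_ALIASES:
            if key in parsed:
                try:
                    return {"type": "offer", "price": float(parsed[key])}
                except (TypeError, ValueError):
                    return None
    return None


accepted = auto_correct_sketch({"type": "agree"})
offer = auto_correct_sketch({"type": "offer", "amount": 25000})
unknown = auto_correct_sketch({"type": "dance"})
```

Returning None for unrecognised input, instead of guessing, leaves the decision about a fallback action to the caller.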

Base Model Module

Base Model Interface

Abstract base class interface for plug-and-play Large Language Model (LLM) integration in the negotiation platform. This module defines the standardized API that all model implementations must follow to ensure compatibility and interchangeability within the platform architecture.

The interface supports various model types including Hugging Face transformers, commercial APIs, and custom implementations through a unified abstraction layer. All concrete implementations must provide model loading, response generation, action parsing, and resource cleanup capabilities.

Key Features:
  • Standardized model lifecycle management (load/unload)

  • Unified response generation interface with configurable parameters

  • Game-specific action parsing with validation support

  • Memory management and resource optimization hooks

  • Extensible configuration system for model-specific parameters

Example

>>> from negotiation_platform.models.hf_model_wrapper import HuggingFaceModelWrapper
>>> config = {'max_new_tokens': 120, 'temperature': 0.7}
>>> model = HuggingFaceModelWrapper('meta-llama/Llama-2-7b-chat-hf', config)
>>> model.load_model()
>>> response = model.generate_response('Your negotiation prompt here')
>>> action = model.parse_action(response, 'company_car', 'Player1')

Note

This is an abstract base class and cannot be instantiated directly. Use concrete implementations like HuggingFaceModelWrapper for actual model integration.

class negotiation_platform.models.base_model.BaseLLMModel(model_name: str, config: Dict[str, Any])[source]

Bases: ABC

Abstract base class for all Large Language Model implementations.

This class defines the standardized interface that all LLM model wrappers must implement to ensure compatibility and interchangeability within the negotiation platform. It provides the foundation for supporting various model types and APIs through a unified abstraction layer.

The interface encompasses the complete model lifecycle from initialization and loading through response generation and resource cleanup. All methods are designed to support both synchronous and configuration-driven usage patterns common in negotiation research scenarios.

model_name

Unique identifier for the model (e.g., model path, API endpoint, or registry name).

Type:

str

config

Model-specific configuration parameters including generation settings, API keys, and optimization flags.

Type:

Dict[str, Any]

is_loaded

Current loading state of the model, used for resource management and error prevention.

Type:

bool

Example

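>>> class CustomModelWrapper(BaseLLMModel):
...     def load_model(self):
...         # Custom implementation
...         self.is_loaded = True
...     def generate_response(self, prompt, **kwargs):
...         # Custom response generation
...         return "Generated response"
...     def parse_action(self, response, game_type=None, player_name=None):
...         # All four abstract methods must be overridden before instantiation
...         return {"type": "noop"}
...     def unload_model(self):
...         self.is_loaded = False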

Note

Concrete implementations should handle model-specific concerns such as authentication, memory management, GPU utilization, and error recovery while maintaining compatibility with this interface.

__init__(model_name: str, config: Dict[str, Any])[source]

Initialize the base LLM model with configuration.

Sets up the foundational attributes required by all model implementations including the model identifier, configuration parameters, and loading state tracking. Subclasses should call this constructor before performing any model-specific initialization.

Parameters:
  • model_name (str) – Unique identifier for the model, which may represent a file path, model registry name, API endpoint, or other model-specific identifier used for loading.

  • config (Dict[str, Any]) – Comprehensive configuration dictionary containing model-specific parameters such as:

      - Generation settings (temperature, max_tokens, top_p)

      - Hardware preferences (device, precision, memory_limits)

      - API credentials and endpoints (for cloud models)

      - Optimization flags (quantization, batching, caching)

Example

>>> config = {
...     'temperature': 0.7,
...     'max_new_tokens': 120,
...     'device': 'cuda',
...     'load_in_8bit': True
... }
>>> model = CustomModelWrapper('model-name', config)

Note

The model is initialized in an unloaded state. Call load_model() explicitly to prepare the model for inference operations.

abstract load_model()[source]

Load the model into memory and prepare for inference.

This method performs all necessary initialization steps to make the model ready for generating responses. Implementation should handle model downloading, memory allocation, device placement, and any required preprocessing or optimization steps.

The method should be idempotent - calling it multiple times should not cause errors or unnecessary resource consumption. Implementations should update the is_loaded attribute to reflect the current state.

Raises:
  • RuntimeError – If model loading fails due to insufficient resources, network issues, authentication problems, or other critical errors.

  • FileNotFoundError – If the specified model cannot be located or accessed from the configured source.

  • MemoryError – If insufficient memory is available for model loading.

Example

>>> model = HuggingFaceModelWrapper('model-name', config)
>>> model.load_model()  # Loads model into GPU/CPU memory
>>> assert model.is_loaded

Note

Implementations should handle device placement, quantization, memory optimization, and other hardware-specific concerns based on the configuration provided during initialization.
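
The idempotency requirement can be met with a simple guard on is_loaded. The following is a minimal sketch, not the platform's implementation: DummyWrapper and its load_count counter are hypothetical, and the class only mimics the documented attributes instead of subclassing the real BaseLLMModel.

```python
from typing import Any, Dict


class DummyWrapper:
    """Hypothetical stand-in that mimics the documented attributes."""

    def __init__(self, model_name: str, config: Dict[str, Any]) -> None:
        self.model_name = model_name
        self.config = config
        self.is_loaded = False
        self.load_count = 0  # illustration only: counts real load operations

    def load_model(self) -> None:
        # Guard: a repeated call returns early instead of reloading weights.
        if self.is_loaded:
            return
        self.load_count += 1  # a real wrapper would load weights here
        self.is_loaded = True


wrapper = DummyWrapper("model-name", {})
wrapper.load_model()
wrapper.load_model()  # no-op: the model is already loaded
```

With this guard, callers can invoke load_model() defensively before inference without paying the loading cost twice.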

abstract generate_response(prompt: str, **kwargs) → str[source]

Generate a text response from the loaded model.

This method performs inference using the loaded model to generate a text response based on the provided prompt. The response should be contextually appropriate for negotiation scenarios and follow any game-specific formatting requirements.

Parameters:
  • prompt (str) – Input text prompt for the model, typically containing negotiation context, game rules, player role, and current situation requiring a response.

  • **kwargs – Additional generation parameters that may override default configuration settings. Common parameters include: - temperature (float): Sampling temperature for randomness - max_new_tokens (int): Maximum tokens to generate - top_p (float): Nucleus sampling parameter - do_sample (bool): Whether to use sampling vs greedy decoding

Returns:

Generated text response from the model, which should be parseable into structured actions for the negotiation game. The response may contain JSON, natural language, or mixed formats depending on the model and prompt design.

Return type:

str

Raises:
  • RuntimeError – If the model is not loaded or generation fails due to resource constraints or model errors.

  • ValueError – If the prompt is empty, too long, or contains invalid characters that the model cannot process.

Example

>>> prompt = "You are a buyer in a car negotiation. Make an offer."
>>> response = model.generate_response(prompt, temperature=0.5)
>>> print(response)
{"type": "offer", "price": 25000}

Note

Implementations should handle prompt preprocessing, generation parameter validation, output post-processing, and error recovery while maintaining consistent response quality.
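
One way to let keyword arguments override configuration defaults, as described for **kwargs above, is to merge two dictionaries. This is a sketch under assumptions: resolve_generation_params and GENERATION_KEYS are hypothetical names, and the allow-list simply repeats the common parameters listed above.

```python
from typing import Any, Dict

# Hypothetical allow-list, copied from the common parameters listed above.
GENERATION_KEYS = ("temperature", "max_new_tokens", "top_p", "do_sample")


def resolve_generation_params(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Start from config defaults, then let per-call kwargs override them."""
    params = {key: config[key] for key in GENERATION_KEYS if key in config}
    params.update({key: value for key, value in kwargs.items() if key in GENERATION_KEYS})
    return params


config = {"temperature": 0.7, "max_new_tokens": 120, "device": "cuda"}
params = resolve_generation_params(config, temperature=0.5)
# 'device' is dropped because it is not a generation parameter.
```

A concrete wrapper may choose a different precedence. HuggingFaceModelWrapper, for example, documents that its YAML configuration takes precedence over kwargs.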

abstract parse_action(response: str, game_type: str | None = None, player_name: str | None = None) → Dict[str, Any][source]

Parse LLM response into a structured negotiation action.

This method converts the raw text output from the model into a standardized action dictionary that can be processed by the game engine. It should handle various response formats, apply validation, and provide error recovery for malformed responses.

Parameters:
  • response (str) – Raw text response from the model, which may contain JSON, structured text, or natural language that needs to be extracted and parsed into action format.

  • game_type (str, optional) – Type of negotiation game being played (e.g., ‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’), used for game-specific parsing and validation rules.

  • player_name (str, optional) – Name or identifier of the player generating the action, used for debugging and error reporting.

Returns:

Structured action dictionary containing at minimum:
  • type (str): Action type (e.g., ‘offer’, ‘accept’, ‘reject’)

  • Additional fields specific to the action type and game:

      - price (float): For offer/counter actions in price bargaining

      - gpu_hours/cpu_hours (float): For resource allocation proposals

      - proposal (dict): For integrative negotiation proposals

Return type:

Dict[str, Any]

Raises:
  • ValueError – If the response cannot be parsed into a valid action or contains invalid values for the specified game type.

  • KeyError – If required action fields are missing from the parsed response for the given game context.

Example

>>> response = '{"type": "offer", "price": 28000}'
>>> action = model.parse_action(response, 'company_car', 'Buyer')
>>> print(action)
{'type': 'offer', 'price': 28000.0}

Note

Implementations should provide robust parsing with fallback strategies for common formatting issues, and should integrate with validation schemas when available for the game type.
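
The error contract above (KeyError for missing fields, ValueError for invalid values) could be enforced by a small validation step after parsing. The sketch below is hypothetical: validate_action is not part of the platform, and the rule that a company_car offer needs a positive price is an assumption made for illustration.

```python
from typing import Any, Dict, Optional


def validate_action(action: Dict[str, Any], game_type: Optional[str] = None) -> Dict[str, Any]:
    """Check required fields and normalize values for one assumed game rule."""
    if "type" not in action:
        raise KeyError("type")
    # Assumed rule: an offer in 'company_car' must carry a positive price.
    if game_type == "company_car" and action["type"] == "offer":
        if "price" not in action:
            raise KeyError("price")
        price = float(action["price"])
        if price <= 0:
            raise ValueError(f"price must be positive, got {price}")
        return {**action, "price": price}
    return dict(action)


validated = validate_action({"type": "offer", "price": 28000}, "company_car")
```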

abstract unload_model()[source]

Unload model from memory and free associated resources.

This method performs cleanup operations to release memory and other system resources used by the model. It should handle GPU memory cleanup, cache clearing, and any other resource deallocation needed to prepare for model switching or application shutdown.

The method should be idempotent - calling it multiple times should not cause errors. After successful unloading, the is_loaded attribute should be set to False to reflect the model state.

Raises:

RuntimeError – If unloading fails or resources cannot be properly released, though implementations should make best efforts to continue with partial cleanup in such cases.

Example

>>> model.load_model()
>>> assert model.is_loaded
>>> model.unload_model()
>>> assert not model.is_loaded

Note

This method is critical for memory management in multi-model scenarios where models are dynamically loaded and switched. Implementations should ensure thorough cleanup including GPU memory, cached tensors, and any background processes.
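
For model switching, pairing load_model() with unload_model() in a context manager ensures cleanup even when inference raises. This is a sketch only: the loaded() helper and FakeModel are hypothetical and not part of the platform.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def loaded(model: Any) -> Iterator[Any]:
    """Load a model on entry and always unload it on exit."""
    model.load_model()
    try:
        yield model
    finally:
        model.unload_model()


class FakeModel:
    """Hypothetical stand-in exposing the documented lifecycle methods."""

    def __init__(self) -> None:
        self.is_loaded = False

    def load_model(self) -> None:
        self.is_loaded = True

    def unload_model(self) -> None:
        self.is_loaded = False  # idempotent: safe to call repeatedly


model = FakeModel()
with loaded(model) as active:
    was_loaded_inside = active.is_loaded
```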

get_model_info() → Dict[str, Any][source]

Retrieve comprehensive information about the model instance.

This method provides a standardized way to inspect the current state and configuration of a model instance. It returns information useful for debugging, monitoring, logging, and administrative purposes.

Returns:

Information dictionary containing:
  • name (str): The model identifier used during initialization

  • loaded (bool): Current loading state of the model

  • config (Dict[str, Any]): Complete configuration dictionary including all parameters used for model initialization and generation settings

Return type:

Dict[str, Any]

Example

>>> model = HuggingFaceModelWrapper('model-name', {'temp': 0.7})
>>> info = model.get_model_info()
>>> print(info)
{'name': 'model-name', 'loaded': False, 'config': {'temp': 0.7}}

Note

This method does not modify the model state and can be called safely at any time. Subclasses may extend this method to include additional model-specific information such as memory usage, device placement, or performance statistics.
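
A subclass can extend the returned dictionary by calling the parent method first. In this sketch, BaseSketch is a stand-in that reproduces only the three documented keys, and DeviceAwareWrapper with its device field is hypothetical.

```python
from typing import Any, Dict


class BaseSketch:
    """Stand-in reproducing only the documented get_model_info() keys."""

    def __init__(self, model_name: str, config: Dict[str, Any]) -> None:
        self.model_name = model_name
        self.config = config
        self.is_loaded = False

    def get_model_info(self) -> Dict[str, Any]:
        return {"name": self.model_name, "loaded": self.is_loaded, "config": self.config}


class DeviceAwareWrapper(BaseSketch):
    def __init__(self, model_name: str, config: Dict[str, Any]) -> None:
        super().__init__(model_name, config)
        self.device = config.get("device", "cpu")

    def get_model_info(self) -> Dict[str, Any]:
        info = super().get_model_info()
        info["device"] = self.device  # hypothetical extra field
        return info


info = DeviceAwareWrapper("model-name", {"device": "cpu"}).get_model_info()
```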

HF Model Wrapper Module

Hugging Face Model Wrapper

Comprehensive wrapper implementation for Hugging Face Transformers models, providing plug-and-play integration with the negotiation platform. This module handles the complexities of model loading, memory management, GPU utilization, and response generation for various Hugging Face language models.

The wrapper supports advanced features including quantization, device mapping, memory optimization, and robust error handling. It provides seamless integration with the platform’s action parsing system and includes extensive debugging capabilities for production deployment.

Key Features:
  • Automatic device detection and optimal GPU/CPU placement

  • Memory-efficient loading with quantization support (4-bit, 8-bit)

  • Robust response generation with configurable parameters

  • Intelligent action parsing with game-specific validation

  • Comprehensive error handling and recovery mechanisms

  • Production-ready logging and monitoring capabilities

Supported Models:
  • All Hugging Face Transformers causal language models

  • Meta Llama family (2, 3.1, 3.2)

  • Mistral family (7B, 8B variants)

  • Qwen family (3B, 7B variants)

  • Custom fine-tuned models with compatible architectures

Example

>>> config = {
...     'device': 'auto',
...     'load_in_8bit': True,
...     'temperature': 0.7,
...     'max_new_tokens': 150
... }
>>> model = HuggingFaceModelWrapper('meta-llama/Llama-2-7b-chat-hf', config)
>>> model.load_model()
>>> response = model.generate_response('Negotiate a car price.')
>>> action = model.parse_action(response, 'company_car')

Note

This implementation requires the transformers, torch, and optionally bitsandbytes libraries. GPU support requires CUDA-compatible hardware and appropriate driver installation.

class negotiation_platform.models.hf_model_wrapper.HuggingFaceModelWrapper(model_name: str, config: Dict[str, Any])[source]

Bases: BaseLLMModel

Production-ready wrapper for Hugging Face Transformers models.

This class provides a comprehensive implementation of the BaseLLMModel interface specifically designed for Hugging Face Transformers models. It handles the complexities of modern language model deployment including memory optimization, device management, and robust inference pipelines.

The wrapper supports both research and production scenarios with features like automatic quantization, intelligent device placement, comprehensive error handling, and extensive logging for debugging and monitoring.

tokenizer

Hugging Face tokenizer instance for the model.

model

Loaded Hugging Face model instance ready for inference.

device

Target device for model execution (‘cuda’, ‘cpu’, or ‘auto’).

Type:

str

Configuration Options:
Device and Memory:
  • device (str): Target device (‘cuda’, ‘cpu’, ‘auto’)

  • device_map (str/dict): Advanced device placement strategy

  • load_in_8bit (bool): Enable 8-bit quantization for memory efficiency

  • load_in_4bit (bool): Enable 4-bit quantization for extreme efficiency

Generation Parameters:
  • temperature (float): Sampling temperature (0.0-2.0)

  • max_new_tokens (int): Maximum tokens to generate

  • top_p (float): Nucleus sampling parameter

  • do_sample (bool): Enable sampling vs greedy decoding

  • repetition_penalty (float): Penalty for repetitive text

Authentication:
  • api_token (str): Hugging Face API token for private models

  • trust_remote_code (bool): Allow execution of remote code

Example

>>> config = {
...     'device': 'auto',
...     'load_in_8bit': True,
...     'temperature': 0.7,
...     'max_new_tokens': 120,
...     'api_token': 'hf_token_here'
... }
>>> wrapper = HuggingFaceModelWrapper('model-name', config)
>>> wrapper.load_model()
>>> response = wrapper.generate_response('Your prompt here')

Note

This implementation includes production-ready optimizations and extensive error handling. For development, consider enabling verbose logging to monitor model behavior and performance.

__init__(model_name: str, config: Dict[str, Any])[source]

Initialize Hugging Face model wrapper with intelligent device detection.

Sets up the wrapper with automatic device detection and configuration normalization. Handles the complexity of device selection including CUDA availability checking and fallback strategies for various hardware configurations.

Parameters:
  • model_name (str) – Hugging Face model identifier, which can be:

      - Repository ID (e.g., ‘meta-llama/Llama-2-7b-chat-hf’)

      - Local model path for offline usage

      - Custom model name for privately hosted models

  • config (Dict[str, Any]) – Comprehensive configuration dictionary containing device preferences, generation parameters, memory optimization settings, and authentication credentials.

Example

>>> config = {
...     'device': 'auto',  # Automatic device detection
...     'load_in_8bit': True,  # Memory optimization
...     'temperature': 0.7,  # Generation creativity
...     'max_new_tokens': 150
... }
>>> wrapper = HuggingFaceModelWrapper('model-name', config)

Note

The device setting is intelligently resolved: ‘auto’ selects CUDA if available, otherwise falls back to CPU. Explicit device settings override automatic detection but should match available hardware.
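
The resolution rule in the note above can be sketched as a small pure function. The names resolve_device and cuda_is_available are hypothetical, and the wrapper's actual logic may differ. The only library call used is torch.cuda.is_available(), and the sketch treats torch as optional.

```python
def cuda_is_available() -> bool:
    """Report CUDA availability, treating a missing torch install as 'no CUDA'."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(requested: str, cuda_available: bool) -> str:
    """Map 'auto' to a concrete device; pass explicit settings through unchanged."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


device = resolve_device("auto", cuda_is_available())
```

Passing the availability flag as an argument keeps the rule testable on machines without a GPU.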

load_model()[source]

Load Hugging Face model and tokenizer with advanced optimization.

Performs comprehensive model loading with memory optimization, device placement, quantization support, and extensive error handling. This method implements production-ready loading strategies including automatic quantization, intelligent device mapping, and detailed progress monitoring.

The loading process includes:
  • Environment optimization and TorchDynamo configuration

  • Quantization setup (4-bit/8-bit) for memory efficiency

  • Tokenizer initialization with authentication handling

  • Model loading with optimal device placement

  • Resource monitoring and diagnostic reporting

Raises:
  • RuntimeError – If model loading fails due to insufficient resources, authentication issues, or hardware compatibility problems.

  • FileNotFoundError – If the specified model cannot be found in the Hugging Face Hub or local filesystem.

  • MemoryError – If insufficient GPU/CPU memory is available for the model with current quantization settings.

  • ImportError – If required dependencies (bitsandbytes) are missing for quantization features.

Example

>>> model = HuggingFaceModelWrapper('meta-llama/Llama-2-7b-chat-hf', config)
>>> model.load_model()  # Comprehensive loading with optimization
Loading meta-llama/Llama-2-7b-chat-hf...
✅ Model loaded successfully in 45.2s

Note

This method includes extensive logging and diagnostic output for debugging. It automatically handles quantization, device placement, and memory optimization based on the configuration provided during initialization. The loading process is optimized for both speed and memory efficiency.
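
The quantization setup step might translate the load_in_4bit and load_in_8bit flags into a transformers BitsAndBytesConfig. The sketch below is an assumption about how that could look, not the wrapper's actual code: both function names are hypothetical, and preferring 4-bit when both flags are set is an arbitrary choice made for illustration.

```python
from typing import Any, Dict, Optional


def select_quantization(config: Dict[str, Any]) -> Optional[str]:
    """Return '4bit', '8bit', or None; 4-bit wins if both flags are set (assumption)."""
    if config.get("load_in_4bit"):
        return "4bit"
    if config.get("load_in_8bit"):
        return "8bit"
    return None


def build_quantization_config(mode: Optional[str]) -> Any:
    """Create a BitsAndBytesConfig for the chosen mode, or None when unquantized."""
    if mode is None:
        return None
    # Imported lazily so that unquantized setups do not need transformers here.
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(
        load_in_4bit=(mode == "4bit"),
        load_in_8bit=(mode == "8bit"),
    )


mode = select_quantization({"device": "auto", "load_in_8bit": True})
```

The resulting object would typically be passed as quantization_config to AutoModelForCausalLM.from_pretrained().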

generate_response(prompt: str, **kwargs) → str[source]

Generate contextually appropriate response using the loaded model.

Performs inference using the loaded Hugging Face model with intelligent parameter management, device handling, and response post-processing. The method prioritizes configuration-based parameters while providing robust error handling and response cleaning for negotiation contexts.

Parameters:
  • prompt (str) – Input prompt for the model, typically containing negotiation context, game rules, and situation requiring response.

  • **kwargs – Additional generation parameters (note: YAML configuration takes precedence over kwargs for consistency). Supported options:

      - early_stopping (bool): Enable early stopping for beam search

      - length_penalty (float): Length penalty for generation

Returns:

Generated response text, cleaned and processed for parsing.

The method attempts to extract clean JSON from model output and removes common prefixes/suffixes that interfere with action parsing.

Return type:

str

Raises:
  • RuntimeError – If the model is not loaded or generation fails due to resource constraints, device issues, or model errors.

  • ValueError – If the prompt is empty or contains characters that cause tokenization or generation failures.

Example

>>> prompt = "Make a negotiation offer for the company car."
>>> response = model.generate_response(prompt)
>>> print(response)
{"type": "offer", "price": 28000}

Note

This method includes intelligent response cleaning that attempts to extract clean JSON from model outputs and removes common model-generated prefixes. Generation parameters are strictly controlled by YAML configuration to ensure consistent behavior across experiments.
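
Response cleaning of this kind can be approximated by scanning for the first substring that decodes as a JSON object. This is a sketch under assumptions: extract_first_json and clean_response are hypothetical helpers built on the standard library's json.JSONDecoder.raw_decode, and the wrapper's real cleaning rules are not shown in this documentation.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[str]:
    """Return the first substring that decodes as a JSON object, if any."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char != "{":
            continue
        try:
            _, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this brace; try the next one
        return text[start:end]
    return None


def clean_response(raw: str) -> str:
    """Prefer embedded JSON; otherwise fall back to the stripped raw text."""
    found = extract_first_json(raw)
    return found if found is not None else raw.strip()


raw = 'Assistant: Sure, here is my offer: {"type": "offer", "price": 28000} Let me know.'
cleaned = clean_response(raw)
```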

parse_action(response: str, game_type: str | None = None, player_name: str | None = None) → Dict[str, Any][source]

Parse LLM response into structured negotiation action with validation.

Implements a comprehensive parsing pipeline that converts raw model output into validated negotiation actions. The method employs multiple parsing strategies, extensive error handling, and optional Pydantic validation for game-specific action constraints.

Parameters:
  • response (str) – Raw text response from the model, which may contain JSON, natural language, or mixed formats requiring extraction.

  • game_type (str, optional) – Type of negotiation game for validation (‘company_car’, ‘resource_allocation’, ‘integrative_negotiations’). Enables game-specific parsing rules and action validation.

  • player_name (str, optional) – Player identifier for enhanced logging and debugging output during parsing operations.

Returns:

Validated action dictionary containing:
  • type (str): Action type (‘offer’, ‘accept’, ‘reject’, ‘propose’)

  • Additional fields specific to action type and game context

  • Fallback ‘noop’ action if parsing fails completely

Return type:

Dict[str, Any]

Parsing Strategy:
  1. Response preprocessing to fix common JSON issues

  2. Priority extraction of JSON from response start

  3. Pattern-based JSON extraction with multiple strategies

  4. Manual key-value extraction as fallback

  5. Optional Pydantic validation for game-specific constraints

Example

>>> response = '{"type": "offer", "price": 28000}'
>>> action = model.parse_action(response, 'company_car', 'Buyer')
>>> print(action)
{'type': 'offer', 'price': 28000.0}

Note

This method includes extensive debugging output and graceful error handling. It integrates with the action_schemas module for Pydantic validation when game_type is specified, providing robust action parsing for research and production scenarios.
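
The first four stages of the parsing strategy, plus the ‘noop’ fallback, can be sketched with the standard library alone. The function name, the regular expressions, and the restriction of the key-value stage to type and price are all assumptions made for illustration. Pydantic validation (stage 5) is omitted.

```python
import json
import re
from typing import Any, Dict


def parse_action_sketch(response: str) -> Dict[str, Any]:
    # 1. Preprocess: trim whitespace and drop trailing commas before } or ].
    text = re.sub(r",\s*([}\]])", r"\1", response.strip())

    # 2. Try to decode a JSON object at the very start of the response.
    if text.startswith("{"):
        try:
            obj, _ = json.JSONDecoder().raw_decode(text)
            if isinstance(obj, dict) and "type" in obj:
                return obj
        except json.JSONDecodeError:
            pass

    # 3. Pattern-based extraction: brace-delimited spans without nesting.
    for match in re.finditer(r"\{[^{}]*\}", text):
        try:
            obj = json.loads(match.group(0))
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict) and "type" in obj:
            return obj

    # 4. Manual key-value extraction, e.g. "type: offer, price: 27500".
    type_match = re.search(r"type\W+(\w+)", text)
    if type_match:
        action: Dict[str, Any] = {"type": type_match.group(1)}
        price_match = re.search(r"price\W+(\d+(?:\.\d+)?)", text)
        if price_match:
            action["price"] = float(price_match.group(1))
        return action

    # Fallback when nothing could be recovered.
    return {"type": "noop"}


action = parse_action_sketch('I propose {"type": "offer", "price": 28000,} ok')
```

Ordering the stages from strictest to loosest means well-formed output is never reinterpreted by the more permissive fallbacks.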

unload_model()[source]

Unload model from memory and perform comprehensive cleanup.

Performs thorough cleanup of model resources including GPU memory, cached tensors, and Python object references. This method is essential for memory management in multi-model scenarios and prevents memory leaks during model switching operations.

Cleanup Operations:
  • Deletion of model and tokenizer objects

  • GPU memory cache clearing (CUDA)

  • Python garbage collection triggering

  • Loading state reset

Example

>>> model.load_model()
>>> # ... use model for inference ...
>>> model.unload_model()  # Free all resources
🗑️  meta-llama/Llama-2-7b-chat-hf unloaded

Note

This method is idempotent and safe to call multiple times. It’s automatically called during model switching and should be called explicitly when done with a model to free resources for other models or applications.
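
The cleanup operations listed above map onto a few standard calls. This sketch is illustrative only: CleanupSketch is hypothetical, the placeholder objects stand in for a real model and tokenizer, and torch is treated as optional so the example also runs without it.

```python
import gc


class CleanupSketch:
    """Hypothetical holder for model and tokenizer references."""

    def __init__(self) -> None:
        self.model = object()      # placeholder for a loaded model
        self.tokenizer = object()  # placeholder for a loaded tokenizer
        self.is_loaded = True

    def unload_model(self) -> None:
        # Drop references so the objects become collectable.
        self.model = None
        self.tokenizer = None
        gc.collect()
        try:
            import torch
        except ImportError:
            torch = None  # torch is optional for this sketch
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()  # release cached CUDA memory
        self.is_loaded = False


holder = CleanupSketch()
holder.unload_model()
holder.unload_model()  # second call is harmless
```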