Tabular and Deep RL for Gittins Index
Harshit Dhankhar, Kshitij Mishra, Tejas Bodas
WiOpt, 2025
@inproceedings{bib_Tabu_2025, AUTHOR = {Dhankhar, Harshit and Mishra, Kshitij and Bodas, Tejas}, TITLE = {Tabular and Deep RL for Gittins Index}, BOOKTITLE = {WiOpt}, YEAR = {2025}}
In the realm of multi-armed bandit problems, the
Gittins index policy is known to be optimal in maximizing
the expected total discounted reward obtained from pulling the
Markovian arms. In most realistic scenarios, however, the Markovian state transition probabilities are unknown, and therefore
the Gittins indices cannot be computed. One can then resort to
reinforcement learning (RL) algorithms that explore the state
space to learn these indices while exploiting to maximize the
reward collected. In this work, we propose tabular (QGI) and
Deep RL (DGN) algorithms for learning the Gittins index that are
based on the retirement formulation for the multi-armed bandit
problem. When compared with existing RL algorithms that learn
the Gittins index, our algorithms have a lower run time, require
less storage space (small Q-table size in QGI and smaller replay
buffer in DGN), and illustrate better empirical convergence to the
Gittins index. This makes our algorithms well suited for problems
with large state spaces and a viable alternative to existing
methods. As a key application, we demonstrate the use of our
algorithms in minimizing the mean flowtime in a job scheduling
problem when jobs are available in batches and have an unknown
service time distribution.
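The retirement formulation that QGI and DGN build on can be illustrated with a small model-based computation. The sketch below is not the paper's RL algorithm (which learns the indices without knowing the transition matrix): it assumes a known single-arm model (P, r) and recovers the Gittins index of each state via the characterization that the index equals (1 − β)M*(s), where M*(s) is the smallest retirement reward at which retiring in state s is optimal. The bisection-plus-value-iteration scheme and function name are illustrative assumptions.

```python
import numpy as np

def gittins_index_retirement(P, r, beta, tol=1e-6):
    """Gittins indices of a single Markovian arm via Whittle's retirement
    formulation: bisect over the retirement reward M, solving the
    M-retirement MDP by value iteration at each trial value of M."""
    n = len(r)
    idx = np.zeros(n)
    for s in range(n):
        lo, hi = r.min() / (1 - beta), r.max() / (1 - beta)
        while hi - lo > tol:
            M = 0.5 * (lo + hi)
            V = np.full(n, M)
            for _ in range(2000):  # value iteration: V = max(M, r + beta P V)
                V_new = np.maximum(M, r + beta * (P @ V))
                if np.abs(V_new - V).max() < tol:
                    break
                V = V_new
            if V[s] > M + tol:   # continuing beats retiring: raise M
                lo = M
            else:                # retiring is optimal in s: lower M
                hi = M
        idx[s] = (1 - beta) * 0.5 * (lo + hi)
    return idx
```

A quick sanity check: for an arm whose every state is absorbing (P = I), the index of a state reduces to its one-step reward.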
Bayesian Optimization for Dynamic Pricing and Learning (journal version)
Anush Anand, Pranav Agrawal, Tejas Bodas
Performance Evaluation, PeEv, 2025
@inproceedings{bib_baye_2025, AUTHOR = {Anand, Anush and Agrawal, Pranav and Bodas, Tejas}, TITLE = {Bayesian Optimization for Dynamic Pricing and Learning (journal version)}, BOOKTITLE = {Performance Evaluation}, YEAR = {2025}}
Dynamic pricing is the practice of adjusting the selling price of a product to maximize a firm’s revenue by responding to market demand. The literature typically distinguishes between two settings: infinite inventory, where the firm has unlimited stock and time to sell, and finite inventory, where both inventory and selling horizon are limited. In both cases, the central challenge lies in the fact that the demand function — how sales respond to price — is unknown and must be learned from data. Traditional approaches often assume a specific parametric form for the demand function, enabling the use of reinforcement learning (RL) to identify near-optimal pricing strategies. However, such assumptions may not hold in real-world scenarios, limiting the applicability of these methods.
In this work, we propose a Gaussian Process (GP) based nonparametric approach to dynamic pricing that avoids restrictive modeling assumptions. We treat the demand function as a black-box function of the price and develop pricing algorithms based on Bayesian Optimization (BO)—a sample-efficient method for optimizing unknown functions. We present BO-based algorithms tailored for both infinite and finite inventory settings and provide regret guarantees for both regimes, thereby quantifying the learning efficiency of our methods. Through extensive experiments, we demonstrate that our BO-based methods outperform several state-of-the-art RL algorithms in terms of revenue, while requiring fewer assumptions and offering greater robustness. This highlights Bayesian Optimization as a powerful and practical tool for dynamic pricing in complex, uncertain environments.
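As an illustration of the black-box viewpoint (not the paper's exact algorithms, whose acquisition rules and regret guarantees are developed in the text), a minimal GP-UCB pricing loop for the infinite-inventory setting might look as follows. The hand-rolled RBF-kernel GP, the price grid, the length scale, and the UCB weight `beta` are all illustrative choices.

```python
import numpy as np

def gp_posterior(X, y, Xs, ls=0.8, noise=1e-2):
    """Posterior mean/std of a zero-mean RBF-kernel GP (hand-rolled to keep
    the sketch dependency-free; any GP library would serve)."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def bo_pricing(revenue, grid, n_iters=30, beta=2.0, seed=0):
    """GP-UCB pricing loop: post a price, observe noisy revenue, refit the
    GP on standardized observations, move to the UCB-maximizing price."""
    rng = np.random.default_rng(seed)
    prices = [grid[len(grid) // 4]]          # arbitrary starting price
    revs = [revenue(prices[0], rng)]
    for _ in range(n_iters):
        y = np.array(revs)
        ys = (y - y.mean()) / (y.std() + 1e-9)
        mu, sd = gp_posterior(np.array(prices), ys, grid)
        p = grid[np.argmax(mu + beta * sd)]
        prices.append(p)
        revs.append(revenue(p, rng))
    return prices[int(np.argmax(revs))]      # best price observed
```

For a linear demand d(p) = 10 − 2p, revenue p·d(p) peaks at p = 2.5, and the loop converges to that neighborhood without ever being told the demand model.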
Bayesian Optimization for Dynamic Pricing and Learning
Anush Anand, Pranav Agrawal, Tejas Bodas
IFIP Performance, IFIP-P, 2025
@inproceedings{bib_Baye_2025, AUTHOR = {Anand, Anush and Agrawal, Pranav and Bodas, Tejas}, TITLE = {Bayesian Optimization for Dynamic Pricing and Learning}, BOOKTITLE = {IFIP Performance}, YEAR = {2025}}
Dynamic pricing is the practice of adjusting the selling price of a product to maximize a firm’s revenue by responding to market demand. The literature typically distinguishes between two settings: infinite inventory, where the firm has unlimited stock and time to sell, and finite inventory, where both inventory and selling horizon are limited. In both cases, the central challenge lies in the fact that the demand function — how sales respond to price — is unknown and must be learned from data. Traditional approaches often assume a specific parametric form for the demand function, enabling the use of reinforcement learning (RL) to identify near-optimal pricing strategies. However, such assumptions may not hold in real-world scenarios, limiting the applicability of these methods.
In this work, we propose a Gaussian Process (GP) based nonparametric approach to dynamic pricing that avoids restrictive modeling assumptions. We treat the demand function as a black-box function of the price and develop pricing algorithms based on Bayesian Optimization (BO)—a sample-efficient method for optimizing unknown functions. We present BO-based algorithms tailored for both infinite and finite inventory settings and provide regret guarantees for both regimes, thereby quantifying the learning efficiency of our methods. Through extensive experiments, we demonstrate that our BO-based methods outperform several state-of-the-art RL algorithms in terms of revenue, while requiring fewer assumptions and offering greater robustness. This highlights Bayesian Optimization as a powerful and practical tool for dynamic pricing in complex, uncertain environments.
CGD: Modifying the Loss Landscape by Gradient Regularization
Shikhar Saxena, Tejas Bodas, Arti Yardi
International Conference on Learning and Intelligent Optimization, LION, 2025
@inproceedings{bib_CGD:_2025, AUTHOR = {Saxena, Shikhar and Bodas, Tejas and Yardi, Arti}, TITLE = {CGD: Modifying the Loss Landscape by Gradient Regularization}, BOOKTITLE = {International Conference on Learning and Intelligent Optimization}, YEAR = {2025}}
Line-search methods are commonly used to solve optimization problems. The simplest line-search method is steepest descent, where one always moves in the direction of the negative gradient. Newton's method, on the other hand, is a second-order method that uses the curvature information in the Hessian to pick the descent direction. In this work, we propose a new line-search method called Constrained Gradient Descent (CGD) that implicitly changes the landscape of the objective function for efficient optimization. CGD is formulated as a solution to a constrained version of the original problem, where the constraint is on a function of the gradient. We optimize the corresponding Lagrangian function, thereby favourably changing the landscape of the objective function. This results in a line-search procedure where the Lagrangian penalty acts as a control over the descent direction, and can therefore be used to iterate over points that have smaller gradient values than the iterates of vanilla steepest descent. We establish global linear convergence rates for CGD and provide numerical experiments on synthetic test functions to illustrate its performance. We also provide two practical variants of CGD: CGD-FD, a Hessian-free variant, and CGD-QN, a quasi-Newton variant, and demonstrate their effectiveness.
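The Hessian-free idea can be sketched as follows: with the quadratic penalty g(∇f) = ½‖∇f‖², the Lagrangian f(x) + λ·½‖∇f(x)‖² has gradient ∇f + λH∇f, and the Hessian-vector product can be approximated by a forward finite difference of ∇f. The function name, the specific penalty, and the fixed step size are assumptions of this sketch; the paper formulates CGD as a line-search method.

```python
import numpy as np

def cgd_fd(grad_f, x0, lam=0.1, lr=0.04, eps=1e-6, iters=300):
    """Hessian-free sketch of a CGD-style iteration: gradient steps on the
    Lagrangian f(x) + (lam/2)*||grad f(x)||^2, whose gradient is
    grad f(x) + lam * H(x) grad f(x).  The Hessian-vector product is
    approximated by a forward difference of grad_f, so no Hessian is formed."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad_f(x)
        gn = np.linalg.norm(g)
        if gn < 1e-10:
            break
        u = g / gn                                   # unit direction along g
        Hg = gn * (grad_f(x + eps * u) - g) / eps    # FD approx of H g
        x = x - lr * (g + lam * Hg)
    return x
```

On an ill-conditioned quadratic f(x) = ½xᵀAx the penalty term adds λA² to the effective step, so the stiff coordinate contracts faster than under plain steepest descent at the same step size.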
Methods and systems for obtaining end-to-end classification model using mixture density network
Tejas Bodas
United States Patent, US Patent, 2024
@inproceedings{bib_Meth_2024, AUTHOR = {Bodas, Tejas}, TITLE = {Methods and systems for obtaining end-to-end classification model using mixture density network}, BOOKTITLE = {United States Patent}, YEAR = {2024}}
Load balancing policies without feedback using timed replicas
Rooji Jinan, Ajay Badita, Tejas Bodas, Parimal Parag
Performance Evaluation, PeEv, 2023
@inproceedings{bib_Load_2023, AUTHOR = {Jinan, Rooji and Badita, Ajay and Bodas, Tejas and Parag, Parimal}, TITLE = {Load balancing policies without feedback using timed replicas}, BOOKTITLE = {Performance Evaluation}, YEAR = {2023}}
Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing
Kunal Jain, Prabuchandran K. J., Tejas Bodas
International Conference on Learning and Intelligent Optimization, LION, 2023
@inproceedings{bib_Baye_2023, AUTHOR = {Jain, Kunal and J., Prabuchandran K. and Bodas, Tejas }, TITLE = {Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing}, BOOKTITLE = {International Conference on Learning and Intelligent Optimization}. YEAR = {2023}}
Bayesian Optimization (BO) is used to find the global optima of black-box functions. In this work, we propose a practical BO method for function compositions, where the form of the composition is known but the constituent functions are expensive to evaluate. Assuming an independent Gaussian process (GP) model for each constituent black-box function, we propose EI- and UCB-based BO algorithms and demonstrate their ability to outperform vanilla BO and the current state-of-the-art algorithms. We also demonstrate a novel application of the proposed methods to dynamic pricing in revenue management when the underlying demand function is expensive to evaluate.
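The independent-GP idea can be sketched as a Monte-Carlo acquisition: since the composition is known, posterior samples of each constituent can be pushed through it and an upper quantile used as a UCB-style score. The helper names, kernel, grid search, and quantile level below are illustrative simplifications, not the paper's EI/UCB constructions.

```python
import numpy as np

def gp_post(X, y, Xs, ls=0.5, noise=1e-4):
    """Posterior mean and covariance of a zero-mean RBF-kernel GP."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(Xs, X)
    mu = Ks @ np.linalg.solve(K, y)
    cov = k(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
    return mu, cov

def composition_ucb_point(X, y1, y2, grid, compose, q=0.9, n_samp=200, seed=0):
    """UCB-style acquisition for a known composition of two black-box
    constituents: draw joint samples from each constituent's independent GP
    posterior, push them through the composition, and pick the grid point
    maximizing an upper quantile of the composed samples."""
    rng = np.random.default_rng(seed)
    samples = []
    for y in (y1, y2):
        mu, cov = gp_post(np.asarray(X), np.asarray(y), grid)
        L = np.linalg.cholesky(cov + 1e-5 * np.eye(len(grid)))  # jitter for PD
        samples.append(mu + rng.standard_normal((n_samp, len(grid))) @ L.T)
    comp = compose(samples[0], samples[1])
    return grid[np.argmax(np.quantile(comp, q, axis=0))]
```

With constituents sin and cos and the sum as the composition, the composed function √2·sin(x + π/4) peaks at π/4, and the acquisition concentrates there once the constituents are well observed.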
Practical First-Order Bayesian Optimization Algorithms
Utkarsh Prakash, Aryan Chollera, Kushagra Khatwani, Prabuchandran K. J, Tejas Bodas
CODS-COMAD, 2023
@inproceedings{bib_Prac_2023, AUTHOR = {Prakash, Utkarsh and Chollera, Aryan and Khatwani, Kushagra and J, Prabuchandran K. and Bodas, Tejas}, TITLE = {Practical First-Order Bayesian Optimization Algorithms}, BOOKTITLE = {CODS-COMAD}, YEAR = {2023}}
First-Order Bayesian Optimization (FOBO) is a sample-efficient sequential approach to finding the global maxima of an expensive-to-evaluate black-box objective function by suitably querying for the function and its gradient evaluations. Such methods assume Gaussian process (GP) models for both the function and its gradient, and use them to construct an acquisition function that identifies the next query point. In this paper, we propose a class of practical FOBO algorithms that efficiently utilize the information from the gradient GP to identify potential query points with zero gradients. We construct a multi-level acquisition function where, in the first step, we optimize a lower-level acquisition function with multiple restarts to identify potential query points with zero gradient value. We then use the upper-level acquisition function to rank these query points based on their function values so as to identify the global maxima. As a final step, the potential point of maxima is chosen as the actual query point. We validate the performance of our proposed algorithms on several test functions and show that they outperform state-of-the-art FOBO algorithms. We also illustrate the application of our algorithms in finding optimal sets of hyper-parameters in machine learning and in learning the optimal policy in reinforcement learning tasks.
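The two-level acquisition described above can be sketched on a one-dimensional grid (the paper optimizes continuous acquisition functions with multiple restarts; the grid search, kernel, and shortlist size here are simplifying assumptions of this sketch):

```python
import numpy as np

def gp_mean(X, y, Xs, ls=0.3, noise=1e-4):
    """Posterior mean of a zero-mean RBF-kernel GP (minimal helper)."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls**2)
    return k(Xs, X) @ np.linalg.solve(k(X, X) + noise * np.eye(len(X)), y)

def fobo_next_query(X, f_vals, g_vals, grid, n_candidates=5):
    """One step of a two-level FOBO-style acquisition: the lower level
    shortlists grid points where the gradient-GP mean is closest to zero
    (stationary-point candidates); the upper level ranks the shortlist by
    the function-GP mean and returns the best candidate as the next query."""
    X = np.asarray(X)
    mu_g = gp_mean(X, np.asarray(g_vals), grid)           # gradient GP
    cand = grid[np.argsort(np.abs(mu_g))[:n_candidates]]  # lower level
    mu_f = gp_mean(X, np.asarray(f_vals), cand)           # function GP
    return cand[np.argmax(mu_f)]                          # upper level
```

For the concave test function f(x) = −(x − 0.6)² with gradient observations at a few scattered points, the gradient GP's zero crossing, and hence the suggested query, lands near the true maximizer x = 0.6.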
Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing
Kunal Jain, Prabuchandran K. J, Tejas Bodas
Technical Report, arXiv, 2023
@inproceedings{bib_Baye_2023, AUTHOR = {Jain, Kunal and J, Prabuchandran K. and Bodas, Tejas}, TITLE = {Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing}, BOOKTITLE = {Technical Report}, YEAR = {2023}}
Bayesian Optimization (BO) is used to find the global optima of black-box functions. In this work, we propose a practical BO method for function compositions, where the form of the composition is known but the constituent functions are expensive to evaluate. Assuming an independent Gaussian process (GP) model for each constituent black-box function, we propose EI- and UCB-based BO algorithms and demonstrate their ability to outperform vanilla BO and the current state-of-the-art algorithms. We also demonstrate a novel application of the proposed methods to dynamic pricing in revenue management when the underlying demand function is expensive to evaluate.