Recent presentations

• International Symposium on Mathematical Programming (ISMP), Bordeaux, 2018 (slides)

• International Conference on Machine Learning (ICML), Stockholm, 2018 (slides)

Research Interests

• robust control

• optimization

• multi-agent systems

Education

PhD in Electrical Engineering and Computer Science

Northwestern University, 2012–2017

MS in Applied Mathematics

University of Akron, 2007–2012

BS in Applied Mathematics

University of Akron, 2007–2012

BS in Electrical Engineering

University of Akron, 2007–2012

Convex Optimization

Consider a function $$f:\mathbb{R}^d\to\mathbb{R}$$. Suppose $$f$$ is $$L$$-smooth and $$\mu$$-strongly convex with $$0<\mu\leq L$$, and let $$x_\star\in\mathbb{R}^d$$ denote the unique minimizer of $$f$$. Denote the condition ratio of $$f$$ as $$\kappa = L/\mu$$.

Triple Momentum Method

Given initial conditions $$x_{-1},x_0\in\mathbb{R}^d$$, the Triple Momentum Method is given by the iteration

\begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned}

where the stepsizes are

\begin{aligned} \alpha &= \frac{1+\rho}{L} & \beta &= \frac{\rho^2}{2-\rho} & \gamma &= \frac{\rho^2}{(1+\rho)(2-\rho)} \end{aligned}

with $$\rho = 1-1/\sqrt{\kappa}$$. The iterate sequence $$\{x_k\}$$ converges linearly with rate $$\rho$$ to the optimizer. In particular, the bound

$\|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1$

is satisfied where $$c>0$$ is a constant that does not depend on $$k$$.

Robust Momentum Method

Given initial conditions $$x_{-1},x_0\in\mathbb{R}^d$$, the Robust Momentum Method is given by the iteration

\begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned}

where the stepsizes are

\begin{aligned} \alpha &= \frac{\kappa\,(1-\rho)^2(1+\rho)}{L} & \beta &= \frac{\kappa\,\rho^3}{\kappa-1} & \gamma &= \frac{\rho^3}{(\kappa-1)(1-\rho)^2(1+\rho)} \end{aligned}

with $$\rho\in[1-1/\sqrt{\kappa},1-1/\kappa]$$. The iterate sequence $$\{x_k\}$$ converges linearly with rate $$\rho$$ to the optimizer. In particular, the bound

$\|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1$

is satisfied where $$c>0$$ is a constant that does not depend on $$k$$.

The parameter $$\rho$$ directly controls the worst-case convergence rate of the algorithm. In particular, we have the following:

• The minimum value is $$\rho = 1-1/\sqrt{\kappa}$$. This is the fastest achievable convergence rate, but the resulting algorithm is very fragile to gradient noise. This choice recovers the Triple Momentum Method.

• The maximum value is $$\rho = 1-1/\kappa$$. This is the slowest achievable convergence rate, but the resulting algorithm is very robust to gradient noise. This choice recovers the Gradient Method with stepsize $$1/L$$.

Code

We can also analyze the worst-case linear convergence rate of various algorithms numerically. For example, the following code calculates the worst-case linear convergence rate of the following methods when applied to an $$L$$-smooth $$\mu$$-strongly convex function:

• Gradient method with step-size $$1/L$$

• Gradient method with step-size $$2/(L+\mu)$$

• Heavy-ball method

• Triple momentum method

mu = 1;   % strong convexity parameter
L  = 10;  % Lipschitz parameter of gradient

FixedStepMethod(mu,L);

% Results printed to screen:
%
%        Method     Rate
%      GM (1/L) : 0.9000
% GM (2/(L+mu)) : 0.8182
%           HBM : 0.8602
%           FGM : 0.7518
%           TMM : 0.6838

For more information, including the analysis of the gradient and heavy-ball methods with subspace searches and the fast gradient method with fixed restart, see our paper and our code.

Dynamic Average Consensus

Consider a group of $$n$$ agents connected in a network. Suppose the network is constant, connected, and undirected, and let $$L$$ be the weighted Laplacian matrix corresponding to the weighted adjacency matrix $$[a_{ij}]$$. Denote the minimum and maximum nonzero eigenvalues of $$L$$ as $$\lambda_\text{min}$$ and $$\lambda_\text{max}$$, respectively. Also, define the corresponding condition ratio $$\kappa = \lambda_\text{max}/\lambda_\text{min}$$.

Suppose each agent $$i\in\{1,\ldots,n\}$$ has a discrete-time input signal $$\{u_k^i\}$$, and denote the average signal $$u_k^\text{avg} = \frac{1}{n}\sum_{i=1}^n u_k^i$$.

Each of the following algorithms is designed such that each agent tracks the global average $$u_k^\text{avg}$$ with zero steady-state error when the input signals are constant. If the input signals are slowly time-varying, then the agents track the average with small steady-state error.

Non-Robust Estimators

First, consider the following integral estimators. The first estimator is comparable to gradient descent, while the second is comparable to the heavy-ball method in optimization.

Integral Estimator

Given initial conditions $$p_0^i\in\mathbb{R}$$ for $$i\in\{1,\ldots,n\}$$, the Accelerated Integral Estimator is given by the iteration

\begin{aligned} p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned}

where the stepsizes are

\begin{aligned} k_I &= \frac{2}{\lambda_\text{max}+\lambda_\text{min}} & &\text{and} & \rho &= \frac{\kappa-1}{\kappa+1}. \end{aligned}

For each agent $$i\in\{1,\ldots,n\}$$, the iterate sequence $$\{x_k^i\}$$ converges linearly with rate $$\rho$$ to the average signal $$u_k^\text{avg}$$ if the input signals $$\{u_k^i\}$$ are constant and the initial conditions satisfy $$\sum_{i=1}^n p_0^i = 0$$.

Accelerated Integral Estimator

Given initial conditions $$p_{-1}^i,p_0^i\in\mathbb{R}$$ for $$i\in\{1,\ldots,n\}$$, the Integral Estimator is given by the iteration

\begin{aligned} p_{k+1}^i &= p_k^i + \rho^2\,(p_k^i-p_{k-1}^i) + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned}

where the stepsizes are

\begin{aligned} k_I &= \frac{4}{(\sqrt{\lambda_\text{max}}+\sqrt{\lambda_\text{min}})^2} & &\text{and} & \rho &= \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}. \end{aligned}

For each agent $$i\in\{1,\ldots,n\}$$, the iterate sequence $$\{x_k^i\}$$ converges linearly with rate $$\rho$$ to the average signal $$u_k^\text{avg}$$ if the input signals $$\{u_k^i\}$$ are constant and the initial conditions satisfy $$\sum_{i=1}^n p_{-1}^i = 0$$ and $$\sum_{i=1}^n p_0^i = 0$$.

Robust Estimators

Both of the previous estimators require correct initialization. In situations where this is undesired, we can instead use estimators which are robust to initial conditions, meaning that the steady-state error is independent of the initial conditions.

Proportional Integral Estimator

Given initial conditions $$p_0^i,q_0^i\in\mathbb{R}$$ for $$i\in\{1,\ldots,n\}$$, the Proportional Integral Estimator is given by the iteration

\begin{aligned} q_{k+1}^i &= \gamma\,q_k^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned}

where the stepsizes are

\begin{aligned} k_I &= \frac{1-\rho}{\lambda_\text{min}} & k_p &= \frac{\rho\,(1-\rho)}{\kappa\,(\lambda_\text{min}-(1-\rho)\,\lambda_\text{max})} \end{aligned}

and the convergence rate is

\begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (3+\sqrt{5})/4 \\ \rho_2, & (3+\sqrt{5})/4\leq \kappa \end{cases} \end{aligned}

with

\begin{aligned} \rho_1 &= \frac{\sqrt{(\kappa-1)(4\kappa^3+5\kappa-1)} - (\kappa-1)}{2\,(\kappa^2+1)} \\ \rho_2 &= \frac{8\kappa^2-8\kappa+1}{8\kappa^2-1}. \end{aligned}

For each agent $$i\in\{1,\ldots,n\}$$, the iterate sequence $$\{x_k^i\}$$ converges linearly with rate $$\rho$$ to the average signal $$u_k^\text{avg}$$ if the input signals $$\{u_k^i\}$$ are constant.

Accelerated Proportional Integral Estimator

Given initial conditions $$p_{-1}^i,p_0^i,q_{-1}^i,q_0^i\in\mathbb{R}$$ for $$i\in\{1,\ldots,n\}$$, the Accelerated Proportional Integral Estimator is given by the iteration

\begin{aligned} q_{k+1}^i &= 2\,\rho\,q_k^i - \rho^2\,q_{k-1}^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= (1+\rho^2)\,p_k^i - \rho^2\,p_{k-1}^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned}

where the stepsizes are

\begin{aligned} k_I &= \frac{(1-\rho)^2}{\lambda_\text{min}} & k_p &= \beta\,k_I & \end{aligned}

and the convergence rate is

\begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (1+\sqrt{2})/2 \\ \rho_2, & (1+\sqrt{2})/2\leq \kappa \end{cases} \end{aligned}

with

\begin{aligned} \rho_1 &= \frac{1-\beta-2\,(1-\sqrt{\beta})}{1-\beta} \\ \rho_2 &= \frac{4-\beta+4\,(1-\sqrt{4-\beta})}{\beta} \end{aligned}

where $$\beta = 2+(2\sqrt{\kappa\,(\kappa-1)}-1)/\kappa$$.

For each agent $$i\in\{1,\ldots,n\}$$, the iterate sequence $$\{x_k^i\}$$ converges linearly with rate $$\rho$$ to the average signal $$u_k^\text{avg}$$ if the input signals $$\{u_k^i\}$$ are constant.