Postdoctoral Researcher

University of Wisconsin–Madison
Wisconsin Institute for Discovery

Supervisor: Laurent Lessard

# Contact Information

vanscoy@wisc.edu

# Recent presentations

International Symposium on Mathematical Programming (ISMP), Bordeaux, 2018 (slides)

International Conference on Machine Learning (ICML), Stockholm, 2018 (slides)

# Research Interests

robust control

optimization

multi-agent systems

# Education

- PhD in Electrical Engineering and Computer Science
Northwestern University, 2012–2017

Advisor: Randy Freeman

- MS in Applied Mathematics
University of Akron, 2007–2012

Advisor: Gerald Young

- BS in Applied Mathematics
University of Akron, 2007–2012

- BS in Electrical Engineering
University of Akron, 2007–2012

## Forthcoming Publications

## Journal Publications

## Peer-Reviewed Conference Proceedings

## Doctoral Dissertation

## Master's Thesis

# Convex Optimization

Consider a function \(f:\mathbb{R}^d\to\mathbb{R}\). Suppose \(f\) is \(L\)-smooth and \(\mu\)-strongly convex with \(0<\mu\leq L\), and let \(x_\star\in\mathbb{R}^d\) denote the unique minimizer of \(f\). Denote the condition ratio of \(f\) as \(\kappa = L/\mu\).

Given initial conditions \(x_{-1},x_0\in\mathbb{R}^d\), the *Triple Momentum Method* is given by the iteration

\[ \begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} \alpha &= \frac{1+\rho}{L} & \beta &= \frac{\rho^2}{2-\rho} & \gamma &= \frac{\rho^2}{(1+\rho)(2-\rho)} \end{aligned} \]

with \(\rho = 1-1/\sqrt{\kappa}\). The iterate sequence \(\{x_k\}\) converges linearly with rate \(\rho\) to the optimizer. In particular, the bound

\[ \|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1 \]

is satisfied where \(c>0\) is a constant that does not depend on \(k\).

For more information, see our paper.
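To make the iteration concrete, here is a minimal Python sketch that runs the Triple Momentum Method on a toy quadratic. The quadratic, its dimension, and the values \(\mu=1\), \(L=100\) are illustrative assumptions, not data from the paper.

```python
import numpy as np

# Triple Momentum Method on an illustrative strongly convex quadratic
# f(x) = 0.5 * sum_i eigs[i] * x[i]^2, whose unique minimizer is x* = 0.
mu, L = 1.0, 100.0            # assumed problem parameters
kappa = L / mu
rho = 1 - 1 / np.sqrt(kappa)
alpha = (1 + rho) / L
beta = rho**2 / (2 - rho)
gamma = rho**2 / ((1 + rho) * (2 - rho))

eigs = np.linspace(mu, L, 5)  # Hessian eigenvalues spanning [mu, L]
grad = lambda x: eigs * x     # gradient of the quadratic

x_prev = x = np.ones(5)       # initial conditions x_{-1} = x_0
for k in range(300):
    y = x + gamma * (x - x_prev)
    x_prev, x = x, x + beta * (x - x_prev) - alpha * grad(y)

print(np.linalg.norm(x))      # tiny: consistent with rate rho = 0.9
```

On this example the distance to the optimizer shrinks by roughly a factor of \(\rho = 1-1/\sqrt{\kappa} = 0.9\) per iteration, matching the bound above.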

Given initial conditions \(x_{-1},x_0\in\mathbb{R}^d\), the *Robust Momentum Method* is given by the iteration

\[ \begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} \alpha &= \frac{\kappa\,(1-\rho)^2(1+\rho)}{L} & \beta &= \frac{\kappa\,\rho^3}{\kappa-1} & \gamma &= \frac{\rho^3}{(\kappa-1)(1-\rho)^2(1+\rho)} \end{aligned} \]

with \(\rho\in[1-1/\sqrt{\kappa},1-1/\kappa]\). The iterate sequence \(\{x_k\}\) converges linearly with rate \(\rho\) to the optimizer. In particular, the bound

\[ \|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1 \]

is satisfied where \(c>0\) is a constant that does not depend on \(k\).

The parameter \(\rho\) directly controls the worst-case convergence rate of the algorithm. In particular, we have the following:

The minimum value is \(\rho = 1-1/\sqrt{\kappa}\). This is the fastest achievable convergence rate, but the resulting algorithm is very fragile to gradient noise. This choice recovers the Triple Momentum Method.

The maximum value is \(\rho = 1-1/\kappa\). This is the slowest achievable convergence rate, but the resulting algorithm is very robust to gradient noise. This choice recovers the Gradient Method with stepsize \(1/L\).

For more information, see our paper.
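The same kind of toy setup can be used to exercise the Robust Momentum Method. The following Python sketch picks \(\rho\) midway between the two extremes; the quadratic and the values \(\mu=1\), \(L=100\) are illustrative assumptions.

```python
import numpy as np

# Robust Momentum Method on an illustrative strongly convex quadratic,
# with rho chosen halfway between the fast and robust extremes.
mu, L = 1.0, 100.0
kappa = L / mu
rho = 0.5 * ((1 - 1 / np.sqrt(kappa)) + (1 - 1 / kappa))  # rho = 0.945
alpha = kappa * (1 - rho)**2 * (1 + rho) / L
beta = kappa * rho**3 / (kappa - 1)
gamma = rho**3 / ((kappa - 1) * (1 - rho)**2 * (1 + rho))

eigs = np.linspace(mu, L, 5)  # Hessian eigenvalues spanning [mu, L]
grad = lambda x: eigs * x     # gradient of the quadratic; minimizer x* = 0

x_prev = x = np.ones(5)       # initial conditions x_{-1} = x_0
for k in range(500):
    y = x + gamma * (x - x_prev)
    x_prev, x = x, x + beta * (x - x_prev) - alpha * grad(y)

print(np.linalg.norm(x))      # small: consistent with linear rate rho
```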

# Code

We can also analyze the worst-case linear convergence rate of various algorithms numerically. For example, the following code computes the worst-case linear convergence rate of each of these methods when applied to an \(L\)-smooth, \(\mu\)-strongly convex function:

Gradient method with stepsize \(1/L\)

Gradient method with stepsize \(2/(L+\mu)\)

Heavy-ball method

Fast gradient method

Triple momentum method

```matlab
mu = 1;   % strong convexity parameter
L  = 10;  % Lipschitz parameter of gradient
FixedStepMethod(mu,L);

% Results printed to screen:
%
%   Method            Rate
%   GM (1/L)        : 0.9000
%   GM (2/(L+mu))   : 0.8182
%   HBM             : 0.8602
%   FGM             : 0.7518
%   TMM             : 0.6838
```
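For the two gradient-method rows, the worst-case rate has a simple closed form, \(\max(|1-\alpha\mu|,\,|1-\alpha L|)\) for stepsize \(\alpha\), so those entries can be reproduced by hand. A short Python sketch (independent of the MATLAB code above):

```python
# Worst-case linear rate of the gradient method x_{k+1} = x_k - a*grad(x_k)
# over L-smooth, mu-strongly convex functions: max(|1 - a*mu|, |1 - a*L|).
mu, L = 1.0, 10.0
rates = {}
for a in (1 / L, 2 / (L + mu)):
    rates[a] = max(abs(1 - a * mu), abs(1 - a * L))
    print(f"stepsize {a:.4f}: worst-case rate {rates[a]:.4f}")
# stepsize 0.1000: worst-case rate 0.9000
# stepsize 0.1818: worst-case rate 0.8182
```

These match the first two rows of the table; the remaining rows require the numerical (LMI-based) analysis in the code referenced below.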

For more information, including the analysis of the gradient and heavy-ball methods with subspace searches and the fast gradient method with fixed restart, see our paper and our code.

# Dynamic Average Consensus

Consider a group of \(n\) agents connected in a network. Suppose the network is constant, connected, and undirected, and let \(L\) be the weighted Laplacian matrix corresponding to the weighted adjacency matrix \([a_{ij}]\). Denote the minimum and maximum nonzero eigenvalues of \(L\) as \(\lambda_\text{min}\) and \(\lambda_\text{max}\), respectively. Also, define the corresponding condition ratio \(\kappa = \lambda_\text{max}/\lambda_\text{min}\).

Suppose each agent \(i\in\{1,\ldots,n\}\) has a discrete-time input signal \(\{u_k^i\}\), and denote the average signal \(u_k^\text{avg} = \frac{1}{n}\sum_{i=1}^n u_k^i\).

Each of the following algorithms is designed such that each agent tracks the global average \(u_k^\text{avg}\) with zero steady-state error when the input signals are constant. If the input signals are slowly time-varying, then the agents track the average with small steady-state error.

For more information, see our tutorial paper.

### Non-Robust Estimators

First, consider the following integral estimators. The first estimator is comparable to gradient descent, while the second is comparable to the heavy-ball method in optimization.

Given initial conditions \(p_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Integral Estimator* is given by the iteration

\[ \begin{aligned} p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned} \]

where the gain and convergence rate are

\[ \begin{aligned} k_I &= \frac{2}{\lambda_\text{max}+\lambda_\text{min}} & &\text{and} & \rho &= \frac{\kappa-1}{\kappa+1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant and the initial conditions satisfy \(\sum_{i=1}^n p_0^i = 0\).

Given initial conditions \(p_{-1}^i,p_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Accelerated Integral Estimator* is given by the iteration

\[ \begin{aligned} p_{k+1}^i &= p_k^i + \rho^2\,(p_k^i-p_{k-1}^i) + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned} \]

where the gain and convergence rate are

\[ \begin{aligned} k_I &= \frac{4}{(\sqrt{\lambda_\text{max}}+\sqrt{\lambda_\text{min}})^2} & &\text{and} & \rho &= \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant and the initial conditions satisfy \(\sum_{i=1}^n p_{-1}^i = 0\) and \(\sum_{i=1}^n p_0^i = 0\).
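As a quick illustration, the following Python sketch runs the first (non-accelerated) integral estimator with constant inputs; the 4-agent cycle graph, unit edge weights, and input values are assumptions chosen for demonstration.

```python
import numpy as np

# Non-accelerated integral estimator on an assumed 4-agent cycle graph
# with unit edge weights and constant inputs.
n = 4
A = np.zeros((n, n))
for i in range(n):                       # cycle: agent i <-> agent i+1
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
Lap = np.diag(A.sum(axis=1)) - A         # weighted graph Laplacian

w = np.sort(np.linalg.eigvalsh(Lap))
lam_min, lam_max = w[1], w[-1]           # smallest/largest nonzero eigenvalues
kI = 2 / (lam_max + lam_min)

u = np.array([1.0, 2.0, 3.0, 4.0])       # constant inputs; average is 2.5
p = np.zeros(n)                          # sum of p_0^i is zero, as required
for k in range(200):
    x = u - p
    p = p + kI * (Lap @ x)               # (Lap @ x)_i = sum_j a_ij (x_i - x_j)
x = u - p

print(x)  # every agent's estimate approaches u_avg = 2.5
```

Note that the update uses only differences \(x_k^i - x_k^j\) over edges of the graph, so each agent needs information only from its neighbors.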

### Robust Estimators

Both of the previous estimators require correct initialization. When this is impractical, we can instead use estimators that are *robust to initial conditions*, meaning that the steady-state error is independent of the initial conditions.

Given initial conditions \(p_0^i,q_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Proportional Integral Estimator* is given by the iteration

\[ \begin{aligned} q_{k+1}^i &= \gamma\,q_k^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned} \]

where the gains are

\[ \begin{aligned} k_I &= \frac{1-\rho}{\lambda_\text{min}} & k_p &= \frac{\rho\,(1-\rho)}{\kappa\,(\lambda_\text{min}-(1-\rho)\,\lambda_\text{max})} \end{aligned} \]

and the convergence rate is

\[ \begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (3+\sqrt{5})/4 \\ \rho_2, & (3+\sqrt{5})/4\leq \kappa \end{cases} \end{aligned} \]

with

\[ \begin{aligned} \rho_1 &= \frac{\sqrt{(\kappa-1)(4\kappa^3+5\kappa-1)} - (\kappa-1)}{2\,(\kappa^2+1)} \\ \rho_2 &= \frac{8\kappa^2-8\kappa+1}{8\kappa^2-1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant.

For more information, see our paper.
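The piecewise rate formula can be checked numerically. This small Python sketch evaluates it near the switching point \(\kappa=(3+\sqrt{5})/4\) and confirms that the two branches agree there; the sample values of \(\kappa\) are arbitrary.

```python
import numpy as np

# Worst-case rate of the Proportional Integral Estimator as a function
# of the condition ratio kappa, using the piecewise formula above.
def pi_rate(kappa):
    if kappa <= (3 + np.sqrt(5)) / 4:
        return (np.sqrt((kappa - 1) * (4 * kappa**3 + 5 * kappa - 1))
                - (kappa - 1)) / (2 * (kappa**2 + 1))
    return (8 * kappa**2 - 8 * kappa + 1) / (8 * kappa**2 - 1)

# The two branches match at the switching point, where the rate is 1/3,
# so rho varies continuously with kappa.
k_star = (3 + np.sqrt(5)) / 4
print(pi_rate(0.999 * k_star), pi_rate(k_star), pi_rate(1.001 * k_star))
```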

Given initial conditions \(p_{-1}^i,p_0^i,q_{-1}^i,q_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Accelerated Proportional Integral Estimator* is given by the iteration

\[ \begin{aligned} q_{k+1}^i &= 2\,\rho\,q_k^i - \rho^2\,q_{k-1}^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= (1+\rho^2)\,p_k^i - \rho^2\,p_{k-1}^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned} \]

where the gains are

\[ \begin{aligned} k_I &= \frac{(1-\rho)^2}{\lambda_\text{min}} & k_p &= \beta\,k_I \end{aligned} \]

and the convergence rate is

\[ \begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (1+\sqrt{2})/2 \\ \rho_2, & (1+\sqrt{2})/2\leq \kappa \end{cases} \end{aligned} \]

with

\[ \begin{aligned} \rho_1 &= \frac{1-\beta-2\,(1-\sqrt{\beta})}{1-\beta} \\ \rho_2 &= \frac{4-\beta+4\,(1-\sqrt{4-\beta})}{\beta} \end{aligned} \]

where \(\beta = 2+(2\sqrt{\kappa\,(\kappa-1)}-1)/\kappa\).

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant.

For more information, see our paper.
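A similar Python check applies to the accelerated version: evaluating the formula near the switching point \(\kappa=(1+\sqrt{2})/2\) shows that the two branches meet there; the sample values of \(\kappa\) are arbitrary.

```python
import numpy as np

# Worst-case rate of the Accelerated Proportional Integral Estimator,
# using the piecewise formula above.
def api_rate(kappa):
    beta = 2 + (2 * np.sqrt(kappa * (kappa - 1)) - 1) / kappa
    if kappa <= (1 + np.sqrt(2)) / 2:
        return (1 - beta - 2 * (1 - np.sqrt(beta))) / (1 - beta)
    return (4 - beta + 4 * (1 - np.sqrt(4 - beta))) / beta

# At the switching point kappa*(kappa-1) = 1/4, so beta = 2, and both
# branches reduce to (sqrt(2)-1)/(sqrt(2)+1) = 3 - 2*sqrt(2).
k_star = (1 + np.sqrt(2)) / 2
print(api_rate(k_star), 3 - 2 * np.sqrt(2))
```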

## Summary

The following figure plots the number of iterations needed to converge as a function of the condition ratio of the graph Laplacian for each of the four algorithms. The integral estimator converges faster than the corresponding proportional integral estimator (in both the non-accelerated and accelerated versions). However, the proportional integral estimators are *robust to initial conditions*. Such algorithms can adapt to changes in the communication graph, which may occur due to dropped packets or when the agents are mobile and have a limited communication range.