Postdoctoral Researcher

University of Wisconsin–Madison, Wisconsin Institute for Discovery

Supervisor: Laurent Lessard

Contact Information

vanscoy@wisc.edu

# Recent Presentations

- Systems Information Learning Optimization (SILO), Madison, 2019 (slides)
- International Symposium on Mathematical Programming (ISMP), Bordeaux, 2018 (slides)
- International Conference on Machine Learning (ICML), Stockholm, 2018 (slides)
- Systems Information Learning Optimization (SILO), Madison, 2017 (slides)

# Research Interests

- robust control
- optimization
- multi-agent systems

# Education

- PhD in Electrical Engineering and Computer Science
  Northwestern University, 2012–2017
  Advisor: Randy Freeman

- MS in Applied Mathematics
  University of Akron, 2007–2012
  Advisor: Gerald Young

- BS in Applied Mathematics
  University of Akron, 2007–2012

- BS in Electrical Engineering
  University of Akron, 2007–2012

## Forthcoming Publications

## Journal Publications

## Peer-Reviewed Conference Proceedings

## Doctoral Dissertation

## Master's Thesis

# Convex Optimization

Consider a function \(f:\mathbb{R}^d\to\mathbb{R}\). Suppose \(f\) is \(L\)-smooth and \(\mu\)-strongly convex with \(0<\mu\leq L\), and let \(x_\star\in\mathbb{R}^d\) denote the unique minimizer of \(f\). Denote the condition ratio of \(f\) as \(\kappa = L/\mu\).

Given initial conditions \(x_{-1},x_0\in\mathbb{R}^d\), the *Triple Momentum Method* is given by the iteration

\[ \begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} \alpha &= \frac{1+\rho}{L} & \beta &= \frac{\rho^2}{2-\rho} & \gamma &= \frac{\rho^2}{(1+\rho)(2-\rho)} \end{aligned} \]

with \(\rho = 1-1/\sqrt{\kappa}\). The iterate sequence \(\{x_k\}\) converges linearly with rate \(\rho\) to the optimizer. In particular, the bound

\[ \|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1 \]

is satisfied where \(c>0\) is a constant that does not depend on \(k\).
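As a quick illustration (a sketch of ours, not code from the paper), the iteration can be run on a toy quadratic \(f(x)=\tfrac12 x^\top H x\), whose gradient is \(Hx\); the eigenvalues of \(H\) are chosen in \([\mu,L]\), so \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

```python
import numpy as np

# Triple Momentum Method on a toy quadratic f(x) = 0.5 * x' H x,
# whose gradient is H x. The eigenvalues of H lie in [mu, L].
mu, L = 1.0, 10.0
kappa = L / mu
rho = 1 - 1 / np.sqrt(kappa)
alpha = (1 + rho) / L
beta = rho**2 / (2 - rho)
gamma = rho**2 / ((1 + rho) * (2 - rho))

H = np.diag([mu, 5.0, L])
grad = lambda x: H @ x

x_prev = x = np.ones(3)   # initial conditions x_{-1} = x_0
for k in range(200):
    y = x + gamma * (x - x_prev)
    x, x_prev = x + beta * (x - x_prev) - alpha * grad(y), x

print(np.linalg.norm(x))  # distance to the minimizer x_star = 0
```

After 200 iterations the distance to the minimizer is far below \(\rho^{200}\approx 10^{-33}\), consistent with the linear rate \(\rho\approx 0.684\) for \(\kappa=10\).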

For more information, see our paper.

Given initial conditions \(x_{-1},x_0\in\mathbb{R}^d\), the *Robust Momentum Method* is given by the iteration

\[ \begin{aligned} x_{k+1} &= x_k + \beta\,(x_k-x_{k-1}) - \alpha\,\nabla\!f(y_k) \\ y_k &= x_k + \gamma\,(x_k-x_{k-1}) \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} \alpha &= \frac{\kappa\,(1-\rho)^2(1+\rho)}{L} & \beta &= \frac{\kappa\,\rho^3}{\kappa-1} & \gamma &= \frac{\rho^3}{(\kappa-1)(1-\rho)^2(1+\rho)} \end{aligned} \]

with \(\rho\in[1-1/\sqrt{\kappa},1-1/\kappa]\). The iterate sequence \(\{x_k\}\) converges linearly with rate \(\rho\) to the optimizer. In particular, the bound

\[ \|x_k-x_\star\| \leq c\,\rho^k \qquad \text{for }k\geq 1 \]

is satisfied where \(c>0\) is a constant that does not depend on \(k\).

The parameter \(\rho\) directly controls the worst-case convergence rate of the algorithm. In particular, we have the following:

The minimum value is \(\rho = 1-1/\sqrt{\kappa}\). This is the fastest achievable convergence rate, but the resulting algorithm is very fragile to gradient noise. This choice recovers the Triple Momentum Method.

The maximum value is \(\rho = 1-1/\kappa\). This is the slowest achievable convergence rate, but the resulting algorithm is very robust to gradient noise. This choice recovers the Gradient Method with stepsize \(1/L\).
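The recovery of the Triple Momentum Method can be checked numerically; the following sketch (ours) evaluates both parameter sets at \(\rho = 1-1/\sqrt{\kappa}\) and confirms they coincide.

```python
import numpy as np

# Check numerically that at rho = 1 - 1/sqrt(kappa) the Robust Momentum
# parameters coincide with the Triple Momentum parameters.
mu, L = 1.0, 10.0
kappa = L / mu
rho = 1 - 1 / np.sqrt(kappa)

# Robust Momentum Method stepsizes
alpha_rm = kappa * (1 - rho)**2 * (1 + rho) / L
beta_rm = kappa * rho**3 / (kappa - 1)
gamma_rm = rho**3 / ((kappa - 1) * (1 - rho)**2 * (1 + rho))

# Triple Momentum Method stepsizes
alpha_tm = (1 + rho) / L
beta_tm = rho**2 / (2 - rho)
gamma_tm = rho**2 / ((1 + rho) * (2 - rho))

match = np.allclose([alpha_rm, beta_rm, gamma_rm],
                    [alpha_tm, beta_tm, gamma_tm])
print(match)  # True
```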

For more information, see our paper.

# Code

We can also analyze the worst-case linear convergence rate of various algorithms numerically. For example, the code below calculates the worst-case linear convergence rate of each of the following methods when applied to an \(L\)-smooth and \(\mu\)-strongly convex function:

- Gradient method with stepsize \(1/L\)
- Gradient method with stepsize \(2/(L+\mu)\)
- Heavy-ball method
- Fast gradient method
- Triple momentum method

```matlab
mu = 1;   % strong convexity parameter
L  = 10;  % Lipschitz parameter of gradient
FixedStepMethod(mu,L);

% Results printed to screen:
%
%   Method            Rate
%   GM (1/L)        : 0.9000
%   GM (2/(L+mu))   : 0.8182
%   HBM             : 0.8602
%   FGM             : 0.7518
%   TMM             : 0.6838
```

For more information, including the analysis of the gradient and heavy-ball methods with subspace searches and the fast gradient method with fixed restart, see our paper and our code.
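Three of the printed rates also have simple closed forms: \(1-1/\kappa\) for the gradient method with stepsize \(1/L\), \((\kappa-1)/(\kappa+1)\) for stepsize \(2/(L+\mu)\), and \(1-1/\sqrt{\kappa}\) for the triple momentum method. A short cross-check (ours, independent of the linked code):

```python
# Cross-check three of the printed worst-case rates against their
# closed forms for mu = 1, L = 10 (kappa = 10).
mu, L = 1.0, 10.0
kappa = L / mu

rate_gm_L = 1 - 1 / kappa                 # GM with stepsize 1/L
rate_gm_opt = (kappa - 1) / (kappa + 1)   # GM with stepsize 2/(L+mu)
rate_tmm = 1 - 1 / kappa**0.5             # triple momentum method

print(round(rate_gm_L, 4))    # 0.9
print(round(rate_gm_opt, 4))  # 0.8182
print(round(rate_tmm, 4))     # 0.6838
```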

# Dynamic Average Consensus

Consider a group of \(n\) agents connected in a network. Suppose the network is constant, connected, and undirected, and let \(L\) be the weighted Laplacian matrix corresponding to the weighted adjacency matrix \([a_{ij}]\). Denote the minimum and maximum nonzero eigenvalues of \(L\) as \(\lambda_\text{min}\) and \(\lambda_\text{max}\), respectively. Also, define the corresponding condition ratio \(\kappa = \lambda_\text{max}/\lambda_\text{min}\).

Suppose each agent \(i\in\{1,\ldots,n\}\) has a discrete-time input signal \(\{u_k^i\}\), and denote the average signal \(u_k^\text{avg} = \frac{1}{n}\sum_{i=1}^n u_k^i\).

Each of the following algorithms is designed such that each agent tracks the global average \(u_k^\text{avg}\) with zero steady-state error when the input signals are constant. If the input signals are slowly time-varying, then the agents track the average with small steady-state error.

For more information, see our tutorial paper.

### Non-Robust Estimators

First, consider the following integral estimators. The first estimator is comparable to gradient descent, while the second is comparable to the heavy-ball method in optimization.

Given initial conditions \(p_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Integral Estimator* is given by the iteration

\[ \begin{aligned} p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned} \]

where the stepsize and convergence rate are

\[ \begin{aligned} k_I &= \frac{2}{\lambda_\text{max}+\lambda_\text{min}} & &\text{and} & \rho &= \frac{\kappa-1}{\kappa+1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant and the initial conditions satisfy \(\sum_{i=1}^n p_0^i = 0\).
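As an illustration (a toy example of ours, not code from the tutorial), the estimator can be simulated on a 4-node cycle with unit edge weights; note that \(\sum_{j} a_{ij}\,(x_k^i-x_k^j)\) is the \(i\)-th entry of \(L x_k\), where \(L\) is the graph Laplacian.

```python
import numpy as np

# Integral estimator on a 4-node cycle with unit weights (toy example).
# The neighbor sum, sum_j a_ij (x^i - x^j), is the i-th entry of Lap @ x.
n = 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
Lap = np.diag(A.sum(axis=1)) - A

eigs = np.sort(np.linalg.eigvalsh(Lap))
lmin, lmax = eigs[1], eigs[-1]      # min and max nonzero eigenvalues
kI = 2 / (lmax + lmin)

u = np.array([1.0, 2.0, 3.0, 4.0])  # constant inputs; the average is 2.5
p = np.zeros(n)                     # initialization satisfies sum(p) = 0
for k in range(100):
    x = u - p
    p = p + kI * (Lap @ x)

print(np.abs(x - u.mean()).max())   # each agent tracks the average
```

For this cycle, \(\lambda_\text{min}=2\) and \(\lambda_\text{max}=4\), so \(\kappa=2\) and the rate is \(\rho = 1/3\); after 100 iterations every agent's error is negligible.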

Given initial conditions \(p_{-1}^i,p_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Accelerated Integral Estimator* is given by the iteration

\[ \begin{aligned} p_{k+1}^i &= p_k^i + \rho^2\,(p_k^i-p_{k-1}^i) + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - p_k^i \end{aligned} \]

where the stepsize and convergence rate are

\[ \begin{aligned} k_I &= \frac{4}{(\sqrt{\lambda_\text{max}}+\sqrt{\lambda_\text{min}})^2} & &\text{and} & \rho &= \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant and the initial conditions satisfy \(\sum_{i=1}^n p_{-1}^i = 0\) and \(\sum_{i=1}^n p_0^i = 0\).
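The accelerated version can be sketched the same way on the same toy 4-node cycle (again our example, not code from the tutorial):

```python
import numpy as np

# Accelerated integral estimator on a 4-node cycle with unit weights.
n = 4
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
Lap = np.diag(A.sum(axis=1)) - A

eigs = np.sort(np.linalg.eigvalsh(Lap))
lmin, lmax = eigs[1], eigs[-1]
kappa = lmax / lmin
kI = 4 / (np.sqrt(lmax) + np.sqrt(lmin))**2
rho = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)

u = np.array([1.0, 2.0, 3.0, 4.0])  # constant inputs; the average is 2.5
p_prev = p = np.zeros(n)            # initializations satisfy sum(p) = 0
for k in range(100):
    x = u - p
    p, p_prev = p + rho**2 * (p - p_prev) + kI * (Lap @ x), p

print(np.abs(x - u.mean()).max())   # each agent tracks the average
```

With \(\kappa=2\) the accelerated rate is \(\rho=(\sqrt{2}-1)/(\sqrt{2}+1)\approx 0.17\), noticeably faster than the non-accelerated rate \(1/3\) on the same graph.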

### Robust Estimators

Both of the previous estimators require correct initialization. When correct initialization is impractical, we can instead use estimators that are *robust to initial conditions*, meaning that the steady-state error is independent of the initial conditions.

Given initial conditions \(p_0^i,q_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Proportional Integral Estimator* is given by the iteration

\[ \begin{aligned} q_{k+1}^i &= \gamma\,q_k^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= p_k^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} k_I &= \frac{1-\rho}{\lambda_\text{min}} & k_p &= \frac{\rho\,(1-\rho)}{\kappa\,(\lambda_\text{min}-(1-\rho)\,\lambda_\text{max})} \end{aligned} \]

and the convergence rate is

\[ \begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (3+\sqrt{5})/4 \\ \rho_2, & (3+\sqrt{5})/4\leq \kappa \end{cases} \end{aligned} \]

with

\[ \begin{aligned} \rho_1 &= \frac{\sqrt{(\kappa-1)(4\kappa^3+5\kappa-1)} - (\kappa-1)}{2\,(\kappa^2+1)} \\ \rho_2 &= \frac{8\kappa^2-8\kappa+1}{8\kappa^2-1}. \end{aligned} \]

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant.
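As a sanity check (ours), the two branches of the rate formula agree at the crossover point \(\kappa = (3+\sqrt{5})/4\), where both evaluate to \(\rho = 1/3\):

```python
import numpy as np

# The two branches of the Proportional Integral Estimator rate formula
# agree at the crossover kappa = (3 + sqrt(5)) / 4, where rho = 1/3.
kappa = (3 + np.sqrt(5)) / 4

rho1 = (np.sqrt((kappa - 1) * (4 * kappa**3 + 5 * kappa - 1))
        - (kappa - 1)) / (2 * (kappa**2 + 1))
rho2 = (8 * kappa**2 - 8 * kappa + 1) / (8 * kappa**2 - 1)

print(abs(rho1 - rho2) < 1e-9, abs(rho1 - 1/3) < 1e-9)  # True True
```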

For more information, see our paper.

Given initial conditions \(p_{-1}^i,p_0^i,q_{-1}^i,q_0^i\in\mathbb{R}\) for \(i\in\{1,\ldots,n\}\), the *Accelerated Proportional Integral Estimator* is given by the iteration

\[ \begin{aligned} q_{k+1}^i &= 2\,\rho\,q_k^i - \rho^2\,q_{k-1}^i + k_p \sum_{j=1}^n a_{ij}\,\bigl((x_k^i-x_k^j)+(p_k^i-p_k^j)\bigr) \\ p_{k+1}^i &= (1+\rho^2)\,p_k^i - \rho^2\,p_{k-1}^i + k_I \sum_{j=1}^n a_{ij}\,(x_k^i-x_k^j) \\ x_k^i &= u_k^i - q_k^i \end{aligned} \]

where the stepsizes are

\[ \begin{aligned} k_I &= \frac{(1-\rho)^2}{\lambda_\text{min}} & k_p &= \beta\,k_I \end{aligned} \]

and the convergence rate is

\[ \begin{aligned} \rho = \begin{cases} \rho_1, & 1\leq \kappa\leq (1+\sqrt{2})/2 \\ \rho_2, & (1+\sqrt{2})/2\leq \kappa \end{cases} \end{aligned} \]

with

\[ \begin{aligned} \rho_1 &= \frac{1-\beta-2\,(1-\sqrt{\beta})}{1-\beta} \\ \rho_2 &= \frac{4-\beta+4\,(1-\sqrt{4-\beta})}{\beta} \end{aligned} \]

where \(\beta = 2+(2\sqrt{\kappa\,(\kappa-1)}-1)/\kappa\).

For each agent \(i\in\{1,\ldots,n\}\), the iterate sequence \(\{x_k^i\}\) converges linearly with rate \(\rho\) to the average signal \(u_k^\text{avg}\) if the input signals \(\{u_k^i\}\) are constant.
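Similarly, at the crossover point \(\kappa = (1+\sqrt{2})/2\) we get \(\beta = 2\) and both rate branches evaluate to \(\rho = 3-2\sqrt{2}\); a quick check (ours):

```python
import numpy as np

# The two branches of the Accelerated Proportional Integral Estimator rate
# agree at kappa = (1 + sqrt(2)) / 2, where beta = 2 and rho = 3 - 2*sqrt(2).
kappa = (1 + np.sqrt(2)) / 2
beta = 2 + (2 * np.sqrt(kappa * (kappa - 1)) - 1) / kappa

rho1 = (1 - beta - 2 * (1 - np.sqrt(beta))) / (1 - beta)
rho2 = (4 - beta + 4 * (1 - np.sqrt(4 - beta))) / beta

print(abs(rho1 - rho2) < 1e-9, abs(rho1 - (3 - 2 * np.sqrt(2))) < 1e-9)
```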

For more information, see our paper.

## Summary

The following figure plots the number of iterations to converge as a function of the condition ratio of the graph Laplacian for each of the four algorithms. The integral estimator converges faster than the corresponding proportional integral estimator (both non-accelerated and accelerated versions). However, the proportional integral estimator is *robust to initial conditions*. Such algorithms can adapt to changes in the communication graph which may occur due to dropped packets, or if the agents are mobile and have a limited communication range.