Differential Theory

This post summarizes the differential theory of convex functions, covering directional derivatives, subgradients, and the Legendre transformation.

Characterization of Smoothness

Directional Derivatives

The directional derivative of $f$ at $x$ in the direction $d$ is defined as $$\nabla_d f(x)=\lim\limits_{t\rightarrow 0^+}\frac{f(x+td)-f(x)}{t}.$$

If $f$ is finite-valued and convex, then for fixed $x$ the directional derivative is finite and sublinear as a function of $d$.
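
As a concrete instance, take $f(x)=|x|$ on $\mathbb{R}$. At $x=0$, $$\nabla_d f(0)=\lim\limits_{t\to 0^+}\frac{|td|}{t}=|d|,$$ which is indeed finite, positively homogeneous, and subadditive in $d$, even though $f$ is not differentiable at $0$.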

Subgradients

The subdifferential $\partial f (x)$ of $f$ at $x$ is the nonempty compact convex subset of $\mathbb{R}^n$ whose support function is the directional derivative, i.e. $$\partial f(x)=\{s\in\mathbb{R}^n\mid \langle s, d\rangle \leq \nabla_d f(x),\ \forall\ d\in\mathbb{R}^n\}.$$ Any vector $s\in \partial f(x)$ is called a subgradient of $f$ at $x$.

The subgradient can also be defined via the subgradient inequality: $s$ is a subgradient of $f$ at $x$ if $$f(y)\geq f(x)+ \langle s, y-x\rangle,\quad \forall\ y\in\mathbb{R}^n.$$
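
For example, for $f(x)=|x|$ at $x=0$, both definitions give $$\partial f(0)=\{s\in\mathbb{R}\mid sd\leq |d|,\ \forall\ d\in\mathbb{R}\}=\{s\mid |s|\leq 1\}=[-1,1],$$ and indeed the subgradient inequality $|y|\geq s(y-0)$ holds for all $y$ exactly when $|s|\leq 1$.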

Properties

  • (Scalar Multiplication) If $f$ is a proper convex function on $\mathbb{R}^n$, then $\partial(\lambda f)(x)=\lambda\,\partial f(x),\ \forall\ x\in\mathbb{R}^n,\ \lambda>0.$
  • (Additivity) If $f_1$ and $f_2$ are proper convex functions on $\mathbb{R}^n$, then $\partial(f_1+f_2)(x)\supseteq \partial f_1(x)+ \partial f_2(x),\ \forall\ x\in\mathbb{R}^n$. If moreover $\text{ri}(\text{dom}(f_1))\cap \text{ri}(\text{dom}(f_2))\neq\varnothing$, then $\partial(f_1+f_2)(x)= \partial f_1(x)+ \partial f_2(x),\ \forall\ x\in\mathbb{R}^n$.
  • (Affine Composition) Let $f(x)=h(Ax)$, where $h$ is a proper convex function on $\mathbb{R}^m$ and $A$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Then $\partial f(x)\supseteq A^{\ast}\partial h(Ax),\ \forall\ x\in\mathbb{R}^n$. If $\text{Im}(A)\cap \text{ri}(\text{dom}(h))\neq\varnothing$, or if $h$ is polyhedral and $\text{Im}(A)\cap \text{dom}(h)\neq\varnothing$, then $\partial f(x)= A^{\ast}\partial h(Ax),\ \forall\ x\in\mathbb{R}^n$. A worked instance of these rules follows.
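
As a quick illustration of additivity, take $f(x)=|x|+\tfrac{1}{2}x^2$ on $\mathbb{R}$. Both summands have full domain, so the relative-interior condition holds and $$\partial f(0)=\partial |\cdot|(0)+\{0\}=[-1,1].$$ Similarly, for the affine composition $f(x)=|\langle a, x\rangle|$ with $h(t)=|t|$ and $Ax=\langle a, x\rangle$, the rule gives $\partial f(0)=\{ta\mid t\in[-1,1]\}$.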

Essential Smoothness and Essential Strict Convexity

A proper convex function $f$ on $\mathbb{R}^n$ is essentially smooth if it satisfies the following conditions for $C=\text{int}(\text{dom}(f))$:

  • $C\neq\varnothing$;
  • $f$ is differentiable on $C$;
  • $\lim\limits_{\substack{x\rightarrow x_0\\ x\in C}}|\nabla f(x)|=+\infty$ for all $x_0\in\text{bd}(C)$, which is equivalent to $\partial f(x_0)=\varnothing$ for all $x_0\in\text{bd}(C)$.

The subtle difference between essential smoothness and smoothness lies in the boundary behavior of the gradient. A smooth convex function on $\mathbb{R}^n$ is in particular essentially smooth.
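
A standard example with nontrivial boundary behavior is $f(x)=-\log x$ for $x>0$ (and $f(x)=+\infty$ otherwise): here $C=(0,+\infty)$, $f$ is differentiable on $C$, and $|\nabla f(x)|=1/x\rightarrow+\infty$ as $x\rightarrow 0^+$, so $f$ is essentially smooth. In contrast, $f(x)=|x|$ is not essentially smooth, since it fails to be differentiable at the interior point $0$.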

(Essential Smoothness and Subgradients) If $f$ is a closed proper convex function, then $f$ is essentially smooth if and only if $$\partial f(x)=\begin{cases}\{\nabla f(x)\},&\quad x\in\text{int}(\text{dom}(f)),\\\varnothing,&\quad x\not\in\text{int}(\text{dom}(f)).\end{cases}$$

Dually, a proper convex function $f$ on $\mathbb{R}^n$ is essentially strictly convex if $f$ is strictly convex on every convex subset of $\text{dom}(\partial f)=\{x\mid \partial f(x)\neq \varnothing\}$.

Smoothness and Strict Convexity

The duality between essential smoothness and essential strict convexity can be characterized as follows.

(Essential Smoothness and Essential Strict Convexity) A closed proper convex function $f$ is essentially strictly convex if and only if its convex conjugate $f^{\ast}$ is essentially smooth.
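
For instance, the negative entropy $f(x)=x\log x$ for $x\geq 0$ (with $f(0)=0$ and $f(x)=+\infty$ for $x<0$) is essentially strictly convex, and its conjugate $f^{\ast}(u)=e^{u-1}$ is finite and differentiable on all of $\mathbb{R}$, hence essentially smooth.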

(Subgradients) A closed proper convex function $f$ has a one-to-one subgradient mapping if and only if both $f$ and $f^{\ast}$ are essentially smooth, which holds if and only if $f$ is essentially smooth and strictly convex on $\text{int}(\text{dom}(f))$.

The Legendre Transformation

Note that from the definition of the convex conjugate, $$f^{\ast}(u) = \sup\limits_{x\in\mathbb{R}^n}\ \langle x, u\rangle - f(x).$$ If $f$ is differentiable on its domain, differentiating with respect to $x$ and setting the derivative to zero gives $$u-\nabla f(x)=0.$$ If $\nabla f$ is a one-to-one mapping, then $x=[\nabla f]^{-1}(u)$ attains the supremum. Thus, $$f^{\ast}(u) = \langle [\nabla f]^{-1}(u), u \rangle- f([\nabla f]^{-1}(u)),$$ which is called the Legendre conjugate of $f$.
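
As a worked example, take the quadratic $f(x)=\tfrac{1}{2}\langle Qx, x\rangle$ with $Q$ positive definite. Then $\nabla f(x)=Qx$ is one-to-one with $[\nabla f]^{-1}(u)=Q^{-1}u$, and the formula above yields $$f^{\ast}(u)=\langle Q^{-1}u, u\rangle - \tfrac{1}{2}\langle QQ^{-1}u, Q^{-1}u\rangle = \tfrac{1}{2}\langle Q^{-1}u, u\rangle.$$

The short sketch below checks this identity numerically; the quadratic example, the matrix `Q`, and the function names are all illustrative choices, not anything prescribed by the theory.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T + 3 * np.eye(3)      # positive definite by construction

def f(x):
    # f(x) = (1/2) <Qx, x>
    return 0.5 * x @ Q @ x

def f_star_closed_form(u):
    # Closed form of the conjugate: f*(u) = (1/2) <Q^{-1}u, u>
    return 0.5 * u @ np.linalg.solve(Q, u)

def f_star_legendre(u):
    # Legendre formula: f*(u) = <x, u> - f(x) at x = (grad f)^{-1}(u),
    # i.e. x solves Q x = u
    x = np.linalg.solve(Q, u)
    return x @ u - f(x)

u = rng.standard_normal(3)
print(f_star_closed_form(u), f_star_legendre(u))  # the two values agree
```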