
Functions, Operators, and Derivatives

A quick trip through some essential concepts and language.


Terminology in pure mathematics is not always in perfect agreement with terminology in applied mathematics and theoretical physics. This account seeks to bridge the gap between the two, so it is necessary to define terms. I lean toward the terminology and concepts of pure mathematics, because in pure mathematics everything is accurately and precisely defined. A proper understanding of physics requires a proper understanding of how mathematics is applied in physics, and using other, less precise language can lead to misapplication. The foundation of mathematics lies in definitions made in set theory. Set theory tends to reduce to hieroglyphics, designed to eliminate any possible trace of ambiguity. That can be unnecessarily difficult to read, but the more intuitive treatments of applied mathematics often misstate what is said in pure mathematics. To clarify what is said in a rigorous treatment, I will translate the formal language of pure mathematics into the less formal language appropriate to applied mathematics. It will be sufficient to work with an intuitive understanding that a set is a finite collection of objects.

Functions and Graphs

A function, or map, may be defined between any two sets, X and Y. Often, in applied mathematics, X and Y are sets of scalars. In pure mathematics, as a matter of policy, we generally do not specify what the elements of the sets are. I do not require a function to be a numerical function; any sets, X and Y, may be used. In pure mathematics, X and Y can be infinite sets, for example the real or complex numbers, but, in the application of mathematics to physical problems, it is sufficient to use finite sets containing a range and density of numbers appropriate to the accuracy of physical measurement.

Definition:  A function, f : X → Y, is a set of ordered pairs, (x, y), with x in X and y in Y, such that, for every x in X, there is a unique pair (x, y) in f.

It follows that, for each x, f specifies a particular y. We say that f is a function of x, and we write f : x → y, or y = f (x). We should avoid the common notation f = f (x). It has no merit over the correct notation, f : x → f (x), and literally means that a function is equal to one of its values. Although what is meant is usually clear enough in context, the use of such language leads to a lack of clarity of thought, which, in turn, leads to increasing difficulty with more sophisticated concepts. I have known highly reputable physicists to make absurd errors in differential geometry because they lacked the language to distinguish a function of spacetime from the value of that function at a particular point.
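The set-theoretic definition can be mirrored directly in Python. This is a minimal sketch that assumes finite sets. The helper names is_function and value_at are hypothetical. Here square is the function itself (a set of pairs), while value_at(square, 2) is one of its values, which is exactly the distinction drawn above.

```python
def is_function(pairs, X):
    """True if every x in X is the first element of exactly one pair."""
    firsts = [x for (x, _) in pairs]
    return set(firsts) <= set(X) and all(firsts.count(x) == 1 for x in X)

def value_at(pairs, x):
    """Return the unique y with (x, y) in pairs, i.e. the value f(x)."""
    (y,) = [b for (a, b) in pairs if a == x]
    return y

X = {0, 1, 2}
square = {(0, 0), (1, 1), (2, 4)}                  # the function x -> x**2 on X
not_a_function = {(0, 0), (0, 1), (1, 1), (2, 4)}  # 0 is paired twice

print(is_function(square, X))          # True
print(is_function(not_a_function, X))  # False
print(value_at(square, 2))             # 4
```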

[Figure: A function is continuous if its graph can be plotted without breaks. X and Y are not necessarily sets of scalars.]
Definition:  The graph of a function f : X → Y is the set of ordered pairs, (x, f (x)).
Definition:  If, for sufficiently small dx and some l in Y, |f (a + dx) − l| is negligible, then l is the limit of f (x) as x tends to a. We write f (x) → l as x → a, or $\lim_{x \to a} f(x) = l$.
Definition:  If, for sufficiently large x and some l in Y, |f (x) − l| is negligible, then l is the limit of f (x) as x tends to infinity. We write f (x) → l as x → ∞, or $\lim_{x \to \infty} f(x) = l$.
Definition:   f : X → Y is continuous at a, if f (x) → f (a) as x → a.
Definition:   f : X → Y is continuous, if it is continuous at all x in X.
Definition:  A curve is a continuous function from a one dimensional space to an n-dimensional coordinate space.
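The finite-resolution reading of these definitions can be sketched in Python. This minimal sketch reads "sufficiently small" as a fixed step dx and "negligible" as smaller than a fixed tolerance tol. The names limit_estimate and continuous_at, and the default values of dx and tol, are illustrative choices and do not come from the text.

```python
import math

# Assumption: "sufficiently small" is a fixed step dx, and "negligible"
# means smaller than a fixed tolerance tol. Both defaults are illustrative.
def limit_estimate(f, a, dx=1e-6):
    """Estimate lim_{x -> a} f(x) from f(a - dx) and f(a + dx).

    Averaging both sides assumes the limit exists, so the sides agree.
    """
    return 0.5 * (f(a - dx) + f(a + dx))

def continuous_at(f, a, dx=1e-6, tol=1e-6):
    """f is continuous at a if |f(a +/- dx) - f(a)| is negligible."""
    return max(abs(f(a - dx) - f(a)), abs(f(a + dx) - f(a))) < tol

def sinc(x):
    return math.sin(x) / x          # not defined at x = 0 itself

def step(x):
    return 0.0 if x < 0 else 1.0    # the graph breaks at x = 0

print(limit_estimate(sinc, 0.0))     # very close to 1.0
print(continuous_at(math.cos, 0.0))  # True
print(continuous_at(step, 0.0))      # False
```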

In spaces with curvature, coordinates are not vectors; a curve is not, in general, a vector valued function. A curve may be expressed in terms of coordinate functions of a parameter, t,
$x^i : t \to x^i(t)$,
or coordinates may be suppressed,
x : t → x(t).

Function Composition

Function composition means acting with one function followed by another.

Definition:  If f : X → Y with f : x → f (x), and g : Y → Z with g : y → g(y), then we define the composite function, g f : X → Z, by
g f : x → gf (x) = g(f (x)).

Sometimes the notation g ∘ f is used for g f, but I have never found it necessary. Note the reversal in order: g f means that f is performed first, then g, as indicated by the brackets in the definition.
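The order of composition can be checked with a short Python sketch. It is only an illustration: compose is a hypothetical helper name, and the particular f and g are arbitrary examples.

```python
def compose(g, f):
    """Return the composite g f : x -> g(f(x)); f acts first, then g."""
    return lambda x: g(f(x))

def f(x):
    return x + 1

def g(x):
    return 2 * x

print(compose(g, f)(3))  # g(f(3)) = 2 * 4 = 8
print(compose(f, g)(3))  # f(g(3)) = 6 + 1 = 7, so g f differs from f g
```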

Identity and Inverse

Definition:  The identity function, 1 : X → X, is the set of pairs of the form (x, x).

Sometimes I is used for the identity function. I wish to keep that notation for operators describing interactions between particles. In spite of the slight ambiguity, there is little prospect of confusion. In function composition, the identity 1 behaves like number 1 in multiplication; for any f : X → Y,
1f = f 1 = f.
For a function f : X → Y, the inverse function, $f^{-1}$, exists if and only if there is a unique pair (x, y) in f for every member y of Y.

Definition:  When the set of ordered pairs, (y, x), found by reversing the pairs in f : X → Y, is a function, it is the inverse function, $f^{-1} : Y \to X$.

If the functions f : X → Y and g : Y → Z have inverses, then
$$(g f)^{-1} = f^{-1} g^{-1}.$$
Clearly, the composition of any function f with its inverse, $f^{-1}$, when it exists, is the identity function,
$$f^{-1} f = f f^{-1} = 1.$$
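Reversing pairs and composing can be checked on finite functions stored as Python dicts of the form {x: f(x)}. This sketch is illustrative only. The helper names inverse and compose are hypothetical, and f and g are arbitrary invertible examples.

```python
def inverse(f, Y):
    """Invert a finite function given as a dict {x: f(x)}.

    Returns {y: x} if each y in Y has exactly one x with f(x) == y,
    otherwise None (the reversed pairs do not form a function on Y).
    """
    preimages = {y: [x for x, fx in f.items() if fx == y] for y in Y}
    if any(len(xs) != 1 for xs in preimages.values()):
        return None
    return {y: xs[0] for y, xs in preimages.items()}

def compose(g, f):
    """Composite g f of dict-based functions: apply f first, then g."""
    return {x: g[f[x]] for x in f}

f = {1: 'a', 2: 'b', 3: 'c'}        # f : X -> Y
g = {'a': 10, 'b': 20, 'c': 30}     # g : Y -> Z
X, Y, Z = set(f), set(g), {10, 20, 30}

f_inv, g_inv = inverse(f, Y), inverse(g, Z)
gf_inv = inverse(compose(g, f), Z)

print(gf_inv == compose(f_inv, g_inv))          # (g f)^-1 = f^-1 g^-1: True
print(compose(f_inv, f) == {x: x for x in X})   # f^-1 f = 1: True
print(inverse({1: 'a', 2: 'a'}, {'a'}))         # 'a' has two partners: None
```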

Functionals and Operators

Definition:  A functional is a function from a vector space to a set of scalars.

For example, bras are functionals on a vector space of kets.

Definition:  An operator is a function from a vector space to a vector space.

The vector spaces used to define an operator can be, but are not necessarily, the same. Since the scalars form a one-dimensional vector space, functionals, and functions between sets of scalars, can both be regarded as operators. For operators $O_1 : V_1 \to V_2$ and $O_2 : V_1 \to V_2$, there is a natural definition of addition and of multiplication by scalars, giving operators the structure of a vector space.

Definition:  For any vector, x in $V_1$, and for any scalars, a and b,
$$(aO_1 + bO_2)(x) = aO_1(x) + bO_2(x).$$

Thus functions, functionals and operators can be treated as vectors. In physics, we only require answers over a finite range and to a finite resolution, so it is sufficient to regard them as n-dimensional vectors, where n is a large, but unspecified, finite number. A lower bound on the value of n will depend on the problem in hand. If we need an actual value of n, for example in programming a computer solution, we simply choose some value greater than this lower bound (provided sufficient computer power is available).
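To illustrate treating functions as n-dimensional vectors, the sketch below samples functions on a grid of N points. The interval [0, 1] and N = 1000 are assumptions made purely for illustration. Sampling a linear combination of two functions gives the same vector as taking the same combination of their sample vectors.

```python
import math

N = 1000                                 # assumed resolution; problem-dependent
GRID = [k / (N - 1) for k in range(N)]   # N equally spaced points in [0, 1]

def sample(func):
    """Represent a function by its values on the grid: an N-vector."""
    return [func(x) for x in GRID]

def add(u, v):
    return [p + q for p, q in zip(u, v)]

def scale(c, u):
    return [c * p for p in u]

def square(x):
    return x * x

a, b = 2.0, -3.0
combined = sample(lambda x: a * math.sin(x) + b * square(x))
vector_sum = add(scale(a, sample(math.sin)), scale(b, sample(square)))
max_diff = max(abs(p - q) for p, q in zip(combined, vector_sum))
print(max_diff)  # zero, or of the order of rounding error
```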

Linear Operators

A linear operator is one which preserves vector addition and multiplication by scalars.

Definition:  (Linearity)  If $V_1$ and $V_2$ are vector spaces, then O : $V_1 \to V_2$ is linear if, for any vectors, x and y, and for any scalars, a and b,
$$O(ax + by) = aO(x) + bO(y).$$

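Let O be a linear operator from an n-dimensional Hilbert space, H1, with orthonormal basis elements $|e_i\rangle$ ($i = 1, \dots, n$), to an m-dimensional Hilbert space, H2, with orthonormal basis elements $|f_j\rangle$ ($j = 1, \dots, m$). For any $|u\rangle$ in H1 and $|v\rangle$ in H2, we can form the inner product in H2 between $|v\rangle$ and $O|u\rangle$. Using the resolution of unity, $\sum_i |e_i\rangle\langle e_i| = 1$ in H1 and $\sum_j |f_j\rangle\langle f_j| = 1$ in H2, together with linearity,
$$\langle v|O|u\rangle = \sum_{j=1}^{m} \sum_{i=1}^{n} \langle v|f_j\rangle \langle f_j|O|e_i\rangle \langle e_i|u\rangle.$$
Thus, to specify a linear operator, we only have to specify its action on a basis. Converting to coordinate notation, with $u_i = \langle e_i|u\rangle$, $v_j = \langle f_j|v\rangle$ and $O_{ji} = \langle f_j|O|e_i\rangle$, and using the summation convention,
$$\langle v|O|u\rangle = \bar{v}_j \, O_{ji} \, u_i.$$
Thus O can be regarded as an m × n matrix,
$$O = \begin{pmatrix} O_{11} & \cdots & O_{1n} \\ \vdots & \ddots & \vdots \\ O_{m1} & \cdots & O_{mn} \end{pmatrix}.$$
If H1 and H2 are the same Hilbert space (or have the same dimension), $(O_{ji})$ is a square matrix.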
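Building the matrix from the action on a basis can be sketched in Python. The sketch takes the standard basis vectors of C^n and C^m as the orthonormal bases of H1 and H2. Entry (j, i) is then the j-th component of O applied to the i-th basis vector. The helper names and the operator O are hypothetical, chosen only for illustration.

```python
def basis(n, i):
    """The i-th standard (orthonormal) basis vector of C^n."""
    return [1 if k == i else 0 for k in range(n)]

def matrix_of(op, n, m):
    """m x n matrix whose (j, i) entry is the j-th component of op(e_i)."""
    columns = [op(basis(n, i)) for i in range(n)]
    return [[columns[i][j] for i in range(n)] for j in range(m)]

def matvec(M, u):
    """Matrix-vector product: (M u)_j = sum_i M_ji u_i."""
    return [sum(M_ji * u_i for M_ji, u_i in zip(row, u)) for row in M]

# A hypothetical linear operator O : C^3 -> C^2, chosen only for illustration.
def O(u):
    return [u[0] + 2 * u[1], 3j * u[2] - u[0]]

M = matrix_of(O, n=3, m=2)
u = [1, 2 - 1j, 4j]
print(M)                      # [[1, 2, 0], [(-1+0j), 0j, 3j]]
print(matvec(M, u) == O(u))   # True: the matrix reproduces the operator
```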

The Commutator and Anticommutator

Commutation and anticommutation relations between operators play an important role in quantum theory.

Definition:  The commutator between operators A and B is [A, B] = AB − BA.
Definition:  The anticommutator between operators A and B is {A, B} = AB + BA.

In general, the order in which functions and operators act affects the result, so that [A, B] ≠ 0. It is straightforward to show that, for any scalars, a and b, and for any operators, A, B, C : H → H, the commutator satisfies the following relations, which, together with the fact that operators form a vector space, define a Lie algebra.

Bilinearity:
[aA + bB, C] = a[A, C] + b[B, C].
[C, aA + bB] = a[C, A] + b[C, B].
Anticommutativity or skew-symmetry:
[A, B] = −[B, A].
The Jacobi Identity:
[[A, B], C] + [[B, C], A] + [[C, A], B] = 0.
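These relations can be illustrated, though of course not proved, for particular matrices. The sketch below uses the Pauli matrices only as convenient non-commuting examples (they are not mentioned in the text) and stores matrices as Python lists of rows. The names matmul, lin, comm and anticomm are illustrative helpers. Every entry is a small Gaussian integer, so the floating-point arithmetic is exact and the comparisons can use ==.

```python
def matmul(A, B):
    """Product of square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lin(a, A, b, B):
    """The linear combination a A + b B, entry by entry."""
    return [[a * x + b * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def comm(A, B):
    """Commutator [A, B] = AB - BA."""
    return lin(1, matmul(A, B), -1, matmul(B, A))

def anticomm(A, B):
    """Anticommutator {A, B} = AB + BA."""
    return lin(1, matmul(A, B), 1, matmul(B, A))

# Pauli matrices: a standard example of operators that do not commute.
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]
ZERO = [[0, 0], [0, 0]]

A, B, C = X, Y, lin(1, Z, 2, X)   # C is an arbitrary combination
a, b = 2, 3

print(comm(X, Y) == lin(2j, Z, 0, ZERO))   # [X, Y] = 2i Z: True
print(anticomm(X, Y) == ZERO)              # {X, Y} = 0: True
print(comm(lin(a, A, b, B), C) == lin(a, comm(A, C), b, comm(B, C)))  # True
print(comm(C, lin(a, A, b, B)) == lin(a, comm(C, A), b, comm(C, B)))  # True
print(comm(A, B) == lin(-1, comm(B, A), 0, ZERO))                     # True
jacobi = lin(1, lin(1, comm(comm(A, B), C), 1, comm(comm(B, C), A)),
             1, comm(comm(C, A), B))
print(jacobi == ZERO)                                                 # True
```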

Hermitian Conjugation

For any $|v\rangle$ in a Hilbert space H2, and any linear operator O : H1 → H2, there is a unique bra, $\langle v|O$, in the dual space of H1, such that, for any $|u\rangle$ in H1,
$$(\langle v|O)\,|u\rangle = \langle v|\,(O|u\rangle) = \langle v|O|u\rangle.$$

Definition:  The Hermitian adjoint, Hermitian conjugate, or conjugate transpose of the linear operator O : H1 → H2 is the linear operator, $O^\dagger : H_2 \to H_1$, such that $O^\dagger|v\rangle$ is the ket corresponding to the bra $\langle v|O$.

Thus, the inner product between $|v\rangle$ and $O|u\rangle$ is the same as the inner product between $O^\dagger|v\rangle$ and $|u\rangle$. Clearly, for any operators $O_1 : H_1 \to H_2$ and $O_2 : H_2 \to H_3$, $(O_2 O_1)^\dagger = O_1^\dagger O_2^\dagger$.

Since reversing an inner product is complex conjugation, for orthonormal bases $|e_i\rangle$ of H1 and $|f_j\rangle$ of H2,
$$\langle e_i|O^\dagger|f_j\rangle = \overline{\langle f_j|O|e_i\rangle}.$$
So, the matrix form of $O^\dagger$ is found by taking the complex conjugate of each matrix element of O and then taking the transpose. The Hermitian conjugate may thus be regarded as a generalisation of the complex conjugate. A Hermitian conjugate is sometimes taken to imply that H1 and H2 are the same Hilbert space, but the definition is also useful when H1 and H2 are different Hilbert spaces.
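The following sketch computes O† as the conjugate transpose of a non-square matrix. It then checks numerically that ⟨v|O|u⟩ equals ⟨O†v|u⟩. The matrix and vectors are arbitrary illustrative values, and dagger, matvec and inner are hypothetical helper names. Note that the inner product is conjugate-linear in its first slot.

```python
def dagger(M):
    """Conjugate transpose: (M^dagger)[i][j] = conj(M[j][i])."""
    rows, cols = len(M), len(M[0])
    return [[M[j][i].conjugate() for j in range(rows)] for i in range(cols)]

def matvec(M, u):
    return [sum(m * x for m, x in zip(row, u)) for row in M]

def inner(v, u):
    """<v|u> = sum_k conj(v_k) u_k, conjugate-linear in the first slot."""
    return sum(p.conjugate() * q for p, q in zip(v, u))

# Illustrative values: O : C^3 -> C^2, so O^dagger : C^2 -> C^3.
O = [[1 + 2j, 0, 3], [4j, 1 - 1j, 2]]
u = [1, 1j, 2 - 1j]        # a vector in H1 = C^3
v = [3 - 2j, 1j]           # a vector in H2 = C^2

lhs = inner(v, matvec(O, u))            # <v|O|u>
rhs = inner(matvec(dagger(O), v), u)    # <O^dagger v|u>
print(lhs, rhs)                         # the same complex number
```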

Hermitian Operators

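Definition:  An operator, O : H → H, is Hermitian, or self-adjoint, if $O^\dagger = O$.

For a Hermitian operator, diagonal inner products (expectation values) are real, since, for any $|u\rangle$ in H,
$$\langle u|O|u\rangle = \langle O^\dagger u|u\rangle = \langle Ou|u\rangle = \overline{\langle u|O|u\rangle}.$$
This property gives Hermitian operators a central importance in the description of observable quantities in quantum theory.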
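The realness of ⟨u|H|u⟩ for a Hermitian H can be seen numerically. In this sketch the matrix H and the vector u are arbitrary illustrative values, and the helpers are hypothetical. All entries are small Gaussian integers, so the arithmetic here is exact.

```python
def dagger(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

def matvec(M, u):
    return [sum(m * x for m, x in zip(row, u)) for row in M]

def inner(v, u):
    """<v|u> = sum_k conj(v_k) u_k."""
    return sum(p.conjugate() * q for p, q in zip(v, u))

H = [[2, 1 - 1j], [1 + 1j, -1]]    # illustrative Hermitian matrix
u = [1 + 2j, 3 - 1j]               # an arbitrary vector

print(dagger(H) == H)              # True: H is Hermitian
value = inner(u, matvec(H, u))     # <u|H|u>
print(value)                       # (-12+0j): the imaginary part is zero
```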

Unitary Operators

Definition:  An operator, O : H1 → H2, is unitary if its Hermitian conjugate is also its inverse, $O^\dagger O = O O^\dagger = 1$.

For a unitary operator O, and any $|u\rangle$ and $|v\rangle$ in H1,
$$\langle Ov|Ou\rangle = \langle v|O^\dagger O|u\rangle = \langle v|u\rangle.$$
Thus a unitary operator preserves the inner product. In particular, vector magnitudes are preserved by unitary operators (unitary operators are isometries). A unitary transformation is a transformation described by a unitary operator.
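Here is a numerical illustration of inner-product preservation. The unitary U is built from a real rotation by angle theta multiplied by a phase $e^{i\phi}$. This is an arbitrary choice, not taken from the text, and the vectors are arbitrary too. Any differences are only of the order of floating-point rounding.

```python
import cmath
import math

def matvec(M, u):
    return [sum(m * x for m, x in zip(row, u)) for row in M]

def inner(v, u):
    """<v|u> = sum_k conj(v_k) u_k."""
    return sum(p.conjugate() * q for p, q in zip(v, u))

# Illustrative unitary: rotation by theta times the phase e^{i phi}.
theta, phi = 0.7, 0.3
phase = cmath.exp(1j * phi)
U = [[phase * math.cos(theta), -phase * math.sin(theta)],
     [phase * math.sin(theta), phase * math.cos(theta)]]

u = [1 + 2j, -0.5j]
v = [0.25, 3 - 1j]
inner_diff = abs(inner(matvec(U, v), matvec(U, u)) - inner(v, u))
norm_diff = abs(inner(matvec(U, u), matvec(U, u)) - inner(u, u))
print(inner_diff, norm_diff)   # both of the order of rounding error
```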

The Differential Operator

A derivative, when it exists, is an approximation to the slope, or gradient, of the tangent to a graph, using a value of dx so small that making it any smaller makes no practical difference to the result. The differential operator maps a function to its derivative.
Definition:  If X is a set of scalars, Y is a set of scalars or a vector space, and f : X → Y, then the derivative, f ' : X → Y, when it exists, is given by
$$f'(x) = \lim_{dx \to 0} \frac{f(x + dx) - f(x)}{dx}.$$
Definition:   f is differentiable, if its derivative exists for all x in X.
Definition:  The differential operator is $D_x : f \to f'$.

It is straightforward to show that $D_x$ is a linear operator. Newton’s dot notation for the derivative with respect to time may be used for a curve parameterised by t,
$$\dot{x}^i(t) = \frac{dx^i}{dt}.$$
It is easy to see that $\dot{x}$ is a tangent vector to the curve. The unit tangent vector for a curve is found by normalising the result of differentiating the curve with respect to the parameter,
$$\hat{\mathbf{t}} = \frac{\dot{x}}{|\dot{x}|}.$$
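The definition and the tangent-vector construction can be sketched numerically. The sketch uses a fixed small step DT in place of a "sufficiently small" dx, together with the forward difference from the definition. Its example curve is a helix in flat Euclidean space, so the coordinates can be treated as vector components. The value of DT and the helper names are assumptions made for illustration.

```python
import math

DT = 1e-7   # assumed "sufficiently small" step; illustrative

def derivative(f, x, dx=DT):
    """Forward-difference form of the definition: (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx

def helix(t):
    """A curve in flat Euclidean 3-space, given by coordinates x^i(t)."""
    return (math.cos(t), math.sin(t), t)

def velocity(curve, t, dt=DT):
    """x-dot: differentiate each coordinate function with respect to t."""
    here, there = curve(t), curve(t + dt)
    return tuple((b - a) / dt for a, b in zip(here, there))

def unit_tangent(curve, t):
    """Normalise x-dot to unit length."""
    v = velocity(curve, t)
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

slope = derivative(math.sin, 0.0)    # close to cos(0) = 1
T = unit_tangent(helix, 1.0)
r = 1 / math.sqrt(2)
exact = (-math.sin(1.0) * r, math.cos(1.0) * r, r)
tangent_error = max(abs(p - q) for p, q in zip(T, exact))
print(slope, tangent_error)          # about 1.0, and a small error
```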

Partial Derivatives

Partial differentiation generalises differentiation of scalar functions to functions of more than one variable. In particular, it generalises differentiation to operators on vector spaces. It is normal to distinguish a partial derivative using the symbol ∂ instead of d.

Definition:  When it exists, the partial derivative, $f_{,i}$, of an operator, f : X → Y, is
$$f_{,i}(x) = \lim_{dx^i \to 0} \frac{f(x + dx^i) - f(x)}{dx^i},$$
where $dx^i$ is a small vector along the i-axis (in the denominator, $dx^i$ stands for its component along that axis).
Definition:   f is differentiable, if its partial derivatives exist for all x in X.
Definition:  The partial differential operator is $\partial_i : f \to \partial_i f = f_{,i}$.

A second order (partial) derivative is found by (partial) differentiation of a (partial) derivative. An nth order (partial) derivative is found by (partial) differentiation of an (n − 1)th order (partial) derivative. The notation is usually abbreviated by removing all but the first comma, e.g.,
$$f_{,ij} \equiv f_{,i,j} = (f_{,i})_{,j}.$$

Definition:  A function f is smooth, if its (partial) derivatives exist to all orders for all x in X.

It is straightforward to show that $\partial_i$ is a linear operator, and that, if second order partial derivatives exist, the total derivative of f along a curve, x : t → x(t), is
$$\frac{d}{dt} f(x(t)) = f_{,i} \, \dot{x}^i,$$
where the summation convention has been used (proof). Since this is an invariant and $\dot{x}^i$ is a contravariant vector, $f_{,i}$ is a covariant vector, and this is an inner product. This result does not extend to tensors; the partial derivative of a vector or tensor valued function is not in general a tensor.
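The total-derivative formula can be checked numerically. The check compares the rate of change of f along a curve with $f_{,i}\dot{x}^i$, and both with the exact value worked out by hand. The scalar field, the curve and the step size H_STEP are hypothetical choices. Agreement is only to within the finite-difference error.

```python
import math

H_STEP = 1e-6   # assumed small step for the finite differences

def partial(f, x, i, h=H_STEP):
    """Forward-difference estimate of f_{,i}(x): step h along the i-th axis."""
    shifted = list(x)
    shifted[i] += h
    return (f(shifted) - f(x)) / h

def field(x):
    """A hypothetical scalar field on the plane."""
    return x[0] ** 2 * x[1] + math.sin(x[1])

def curve(t):
    """A hypothetical curve x : t -> x(t) in the plane."""
    return [math.cos(t), t ** 2]

t = 0.5
x = curve(t)
xdot = [(b - a) / H_STEP for a, b in zip(x, curve(t + H_STEP))]
lhs = (field(curve(t + H_STEP)) - field(x)) / H_STEP         # d f(x(t)) / dt
rhs = sum(partial(field, x, i) * xdot[i] for i in range(2))  # f_{,i} xdot^i
# Exact: f_{,0} = 2 x0 x1, xdot^0 = -sin t; f_{,1} = x0^2 + cos x1, xdot^1 = 2t
exact = 2 * x[0] * x[1] * -math.sin(t) + (x[0] ** 2 + math.cos(x[1])) * 2 * t
print(lhs, rhs, exact)   # agree to within an error of order H_STEP
```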

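For a coordinate transformation, $x \to x'$,
$$dx'^i = \frac{\partial x'^i}{\partial x^j} \, dx^j.$$
So, the transformation matrix can be written using partial derivatives,
$$\begin{pmatrix} \dfrac{\partial x'^1}{\partial x^1} & \cdots & \dfrac{\partial x'^1}{\partial x^n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x'^n}{\partial x^1} & \cdots & \dfrac{\partial x'^n}{\partial x^n} \end{pmatrix}.$$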
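Here is an example of a transformation matrix built from partial derivatives. The sketch takes polar coordinates (r, θ) as $x$ and Cartesian coordinates as $x'$, which is an assumed example. It estimates $\partial x'^i/\partial x^j$ by forward differences and compares the result with the exact matrix. It also checks that small coordinate changes are related by this matrix to first order. The step size and helper names are illustrative.

```python
import math

STEP = 1e-7   # assumed small step for finite differences

def to_cartesian(q):
    """Hypothetical example transformation x -> x': polar to Cartesian."""
    r, theta = q
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian(F, q, h=STEP):
    """Matrix J[i][j] ~ dF^i/dq^j, one forward difference per column j."""
    base = F(q)
    columns = []
    for j in range(len(q)):
        shifted = list(q)
        shifted[j] += h
        columns.append([(a - b) / h for a, b in zip(F(shifted), base)])
    return [[columns[j][i] for j in range(len(q))] for i in range(len(base))]

r, theta = 2.0, 0.3
J = jacobian(to_cartesian, [r, theta])
exact = [[math.cos(theta), -r * math.sin(theta)],
         [math.sin(theta), r * math.cos(theta)]]
jac_err = max(abs(J[i][j] - exact[i][j]) for i in range(2) for j in range(2))

# dx'^i = (dx'^i/dx^j) dx^j holds to first order for a small displacement
dq = [1e-4, -2e-4]
moved = to_cartesian([r + dq[0], theta + dq[1]])
dx_actual = [a - b for a, b in zip(moved, to_cartesian([r, theta]))]
dx_linear = [sum(J[i][j] * dq[j] for j in range(2)) for i in range(2)]
lin_err = max(abs(a - b) for a, b in zip(dx_actual, dx_linear))
print(jac_err, lin_err)   # both small
```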

Clairaut’s theorem (proof):  For a functional, f : X → Y, with continuous second derivatives, the partial derivatives commute,
$$f_{,ij} = f_{,ji}.$$
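For instance, take the arbitrarily chosen smooth function $f(x, y) = x^2 y^3$, writing the subscripts with the coordinate names x and y. Differentiating in either order gives the same result:

```latex
f(x, y) = x^2 y^3, \qquad
f_{,x} = 2 x y^3, \quad f_{,xy} = 6 x y^2, \qquad
f_{,y} = 3 x^2 y^2, \quad f_{,yx} = 6 x y^2 .
```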
