How Differentials Compose

Next problem, Frankel’s 2.3(1), part (i):

Let F:M^n \rightarrow W^r and G:W^r \rightarrow V^s be smooth maps. Let x, y and z be local coordinates near p \in M, F(p) \in W , and G(F(p)) \in V , respectively. We may consider the composite map G \circ F : M \rightarrow V. Show, by using bases \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, and \frac{\partial}{\partial z}, that (G \circ F)_* = G_* \circ F_*.

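Now, the differential of F, F_*, acts on the coordinate basis vectors like this: F_* \left( \frac{\partial}{\partial x^j} \right) = \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \frac{\partial}{\partial F^i\left(x\right)}, where x = (x^1, x^2, \cdots , x^n) \in M^n and F^i(x) is the i-th component of F(x) in the coordinates y, i.e. y^i = F^i(x). Here \frac{\partial}{\partial F^i\left(x\right)} is shorthand for the basis vector \frac{\partial}{\partial y^i} at F(p). A similar formula holds for G_* and (G \circ F)_*.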

The problem asks us to use the bases \frac{\partial}{\partial x}, \frac{\partial}{\partial y}, and \frac{\partial}{\partial z}. Both (G \circ F)_* and G_* \circ F_* are linear maps from the tangent space of M at p to the tangent space of V at G(F(p)), so it suffices to show that they agree on each basis vector of the first space: (G \circ F)_*\left(\frac{\partial}{\partial x^j}\right) = G_* \circ F_*\left(\frac{\partial}{\partial x^j}\right). The bases \frac{\partial}{\partial y} and \frac{\partial}{\partial z} are then used to express the intermediate and final results, in the tangent spaces at F(p) and at G(F(p)).

Thus, for each j, using the linearity of G_* and then, in the next-to-last equality, the chain rule:
G_* \circ F_* \left( \frac{\partial}{\partial x^j} \right)
= G_* \left( \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \frac{\partial}{\partial F^i\left(x\right)} \right)
= \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} G_* \left( \frac{\partial}{\partial F^i\left(x\right)} \right)
= \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \sum_k \frac{\partial G^k \left(F\left(x\right)\right)}{\partial F^i \left(x\right)} \frac{\partial}{\partial G^k\left(F\left(x\right)\right)}
= \sum_k \left(\sum_i \frac{\partial G^k \left(F\left(x\right)\right)}{\partial F^i \left(x\right)} \frac{\partial F^i\left(x\right)}{\partial x^j} \right) \frac{\partial}{\partial G^k\left(F\left(x\right)\right)}
= \sum_k \frac{\partial \left(G \circ F \right)^k \left(x\right)}{\partial x^j} \frac{\partial}{\partial G^k \left( F\left(x\right) \right)}
= (G \circ F)_* \left( \frac{\partial}{\partial x^j} \right)
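In matrix terms, the matrix of F_* in the bases \frac{\partial}{\partial x} and \frac{\partial}{\partial y} is the Jacobian \frac{\partial F^i}{\partial x^j} (its j-th column lists the components of F_*\left(\frac{\partial}{\partial x^j}\right)), so the identity above says that the Jacobian of G \circ F at p is the product of the Jacobian of G at F(p) and the Jacobian of F at p. A minimal numerical sketch in Python, assuming hypothetical maps F: \mathbb{R}^2 \rightarrow \mathbb{R}^3 and G: \mathbb{R}^3 \rightarrow \mathbb{R}^2 that are already written in local coordinates, with Jacobians approximated by central differences:

```python
import math

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian: entry [k][j] approximates d f^k / d x^j at x."""
    n_out = len(f(x))
    J = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for k in range(n_out):
            J[k][j] = (fp[k] - fm[k]) / (2 * h)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Hypothetical maps, written in local coordinates x -> y -> z.
def F(x):
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def G(y):
    y1, y2, y3 = y
    return [y1 + y2 * y3, math.exp(y1) - y3]

def G_after_F(x):
    return G(F(x))

p = [0.3, -1.2]
lhs = jacobian(G_after_F, p)                     # matrix of (G o F)_*
rhs = matmul(jacobian(G, F(p)), jacobian(F, p))  # matrix of G_* composed with F_*
print(max(abs(a - b) for rl, rr in zip(lhs, rhs) for a, b in zip(rl, rr)))
```

The printed maximum difference is at the level of finite-difference error; any other smooth F and G should behave the same way.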

Published on October 20, 2008 at 2:31 pm

Failed, again.

So, I have not had time, or have been persuading myself that I don’t have time, to do what I originally intended to do with this blog. In lieu of making the (not inconsiderable) effort to explain technical stuff in ordinary language, I will occasionally post solutions to math/physics problems I have been doing. Perhaps with a little explanation thrown in.

Today’s problem is Problem 2.1(1) from Frankel’s The Geometry of Physics:
If v is a vector and \alpha is a covector, compute directly in coordinates that \sum_i a_i^V v^i_V = \sum_j a_j^U v_U^j . (Here V and U are the open sets associated with charts on the manifold.) What happens if w is another vector and one considers \sum_i v^i w^i ?

The first part is a matter of applying the formulae for coordinate transformations for vectors and covectors. For covectors, a_j^U = \sum_i a_i^V \frac{\partial x_V^i}{\partial x_U^j} . For vectors, v_V^i = \sum_j \frac{\partial x_V^i}{\partial x_U^j} v^j_U .

Starting from the U side of what we want to prove:
\sum_j a_j^U v^j_U = \sum_j \sum_i a_i^V \frac{\partial x_V^i}{\partial x_U^j} v_U^j = \sum_i a_i^V \sum_j \frac{\partial x_V^i}{\partial x_U^j} v^j_U = \sum_i a_i^V v_V^i

This shows that the scalar you get from letting a covector ‘eat’ a vector is coordinate independent. A covector is often defined as an entity that maps vectors to scalars. This is a coordinate independent definition: if you change coordinate systems, the same vector-covector combination should give you the same scalar. So it’s not at all surprising that \sum_i a_i^V v^i_V = \sum_j a_j^U v_U^j : computing that scalar in the coordinate system V should give us the same result as computing it in U.
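A small numerical check of the first part, as a sketch: assume a hypothetical linear change of coordinates x_V = J x_U on \mathbb{R}^2, so that the Jacobian \frac{\partial x_V^i}{\partial x_U^j} is the same constant matrix J at every point. Vector components then transform with J, covector components with its transpose, and the pairing comes out the same in both charts:

```python
# Hypothetical linear change of coordinates x_V = J x_U on R^2, so the
# Jacobian J[i][j] = d x_V^i / d x_U^j is the same at every point.
J = [[2, 1],
     [0, 3]]

def vector_U_to_V(v_U):
    # v_V^i = sum_j (d x_V^i / d x_U^j) v_U^j
    return [sum(J[i][j] * v_U[j] for j in range(2)) for i in range(2)]

def covector_V_to_U(a_V):
    # a^U_j = sum_i a^V_i (d x_V^i / d x_U^j)
    return [sum(a_V[i] * J[i][j] for i in range(2)) for j in range(2)]

def pair(a, v):
    # The scalar obtained by letting the covector a 'eat' the vector v.
    return sum(a_i * v_i for a_i, v_i in zip(a, v))

v_U = [1, -2]   # components of v in chart U
a_V = [4, 5]    # components of alpha in chart V
v_V = vector_U_to_V(v_U)
a_U = covector_V_to_U(a_V)
print(pair(a_V, v_V), pair(a_U, v_U))   # -30 -30
```

Integer entries keep the arithmetic exact, so the two pairings agree exactly rather than just approximately.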

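The second part of the question asks whether the ‘scalar product’ of two vectors, \sum_i v^i w^i , is likewise independent of coordinate systems. Using the coordinate transformation formula for vectors (with the roles of U and V swapped), we get the following:
\sum_i v_U^i w_U^i = \sum_i \left( \sum_j \frac{\partial x_U^i}{\partial x_V^j} v_V^j \right) \left( \sum_k \frac{\partial x_U^i}{\partial x_V^k} w_V^k \right) = \sum_j \sum_k \left( \sum_i \frac{\partial x_U^i}{\partial x_V^j} \frac{\partial x_U^i}{\partial x_V^k} \right) v_V^j w_V^k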

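So \sum_i v^i w^i is not coordinate independent in general: it equals \sum_j v_V^j w_V^j for all v and w only when \sum_i \frac{\partial x_U^i}{\partial x_V^j} \frac{\partial x_U^i}{\partial x_V^k} = \delta_{jk}, that is, when the Jacobian matrix is orthogonal (as for a rotation of Cartesian coordinates). This, again, is not surprising. Unlike covectors, vectors are not defined as objects that take in other vectors as arguments and spit out a scalar. So we would not expect the coordinate independence of the above vector-covector combination to hold for vector-vector combinations.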
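A sketch of the second part, under the same kind of assumption (linear coordinate changes on \mathbb{R}^2 with a constant Jacobian): under a hypothetical non-orthogonal change the sum \sum_i v^i w^i changes, while under a rotation, whose Jacobian is orthogonal, it survives up to floating-point rounding:

```python
import math

def transform(J, v):
    # Vector rule: v_V^i = sum_j J[i][j] v_U^j, with J[i][j] = d x_V^i / d x_U^j.
    return [sum(J[i][j] * v[j] for j in range(len(v))) for i in range(len(J))]

def dot(v, w):
    # The chart-dependent 'scalar product' sum_i v^i w^i.
    return sum(a * b for a, b in zip(v, w))

v_U, w_U = [1, -2], [3, 1]

shear = [[2, 1],
         [0, 3]]                                     # hypothetical non-orthogonal Jacobian
theta = 0.7
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]     # orthogonal Jacobian

print(dot(v_U, w_U))                                            # 1
print(dot(transform(shear, v_U), transform(shear, w_U)))        # -18
print(dot(transform(rotation, v_U), transform(rotation, w_U)))  # 1, up to rounding
```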

Published at 12:04 pm

Where Do Irrational Numbers Come From?

I was recently involved in a philosophical exchange elsewhere about some of Wittgenstein’s remarks about whether in a space containing rational numbers, the existence of ‘irrational points’ is ‘prejudged’. I also recently helped a friend to do some of her Analysis in \mathbb{R}^n problems, and had a lot of fun. I realised that I’ve missed doing proof-based math in the last year or so, and also that I’m rather rusty in my proof skills. So I’m going to go through my battered copy of Fitzpatrick’s Advanced Calculus (1st Ed.) again and do some of the problems, and maybe explain some of the important axioms and theorems.

Today we’ll start off with the axioms that let us show the existence of irrational numbers. We build calculus by first defining the natural numbers inductively, then defining the integers and rational numbers ‘from’ the field axioms. (If we ‘start’ with only the natural numbers, the field axioms give us all the rational numbers.) In order to do calculus, we need a continuum, and hence we need irrational numbers. (This is not a logical deduction but a quick explanation for why mathematicians wanted to ‘get’ irrational numbers into their system.) It would be impossible to define the irrational numbers one by one until we have all of them, since there are uncountably many of them. But we can get all of them into the system at once by introducing the Completeness Axiom, which, together with the ordered field axioms, gives us all the real numbers.

The Completeness Axiom says that every nonempty set of real numbers that is bounded above has a least upper bound. An upper bound of a set, as its name suggests, is a number that is greater than or equal to every member of the set. A least upper bound is the smallest of all the upper bounds of the set. It is thus a sort of ‘lowest ceiling’ of the set. It may or may not itself be a member of the set: the least upper bound of \{x \in \mathbb{R} | x < 1\} is 1, which is not in the set, while the least upper bound of \{x \in \mathbb{R} | x \leq 1\} is also 1, which is.

I won’t go through the whole proof leading from the Completeness Axiom to the existence of irrational numbers (specifically, of irrational square roots). It’s rather involved and not particularly thrilling. But I can give you an intuitive feel for why the Axiom should have that consequence. It’s because we can conceive of sets like \{x\in\mathbb{R} | x^2 < c \} , where c > 0. Such a set is nonempty (it contains 0) and bounded above (by 1 + c, for instance), so by the Axiom it must have a least upper bound. The proof involves showing that if the least upper bound of the set is b, then b^2 = c. And since c can be any positive number, this means that every positive real number has a square root (and 0 has the square root 0). Since we know (from other, independent proofs) that some square roots, such as \sqrt{2}, are not rational, there exist irrational numbers.

The reason why b^2 = c can also be explained intuitively. Suppose that b^2 is larger than c. Then there is ‘space’ between the set and b: a number slightly smaller than b still has a square larger than c, so it is still an upper bound, and b can no longer be the least upper bound. Suppose b^2 is less than c. Then a number slightly larger than b still has a square less than c, so it belongs to the set, and b isn’t even an upper bound. So it must be the case that b^2 = c. (Clearly, I have left a lot out of the explanation, but I have sketched the main moves that one would have to make.)
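The squeeze in this argument can be imitated with exact rational arithmetic. The sketch below uses a hypothetical helper, sup_bracket, built on Python’s fractions module: it bisects between a member of \{x | x^2 < c\} and an upper bound of that set. Every number it produces is rational, and for c = 2 none of them squares to exactly 2, yet the bracket around the least upper bound shrinks as far as we like:

```python
from fractions import Fraction

def sup_bracket(c, steps):
    """Bracket the least upper bound b of S = {x : x^2 < c}, for rational c > 0.

    Invariant: lo is a member of S (it starts at 0), and hi is a nonnegative
    upper bound of S (hi^2 >= c). Each step halves the gap hi - lo."""
    c = Fraction(c)
    lo, hi = Fraction(0), 1 + c       # 1 + c is an upper bound, since (1 + c)^2 > c
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid < c:
            lo = mid                   # mid is in S, so b >= mid
        else:
            hi = mid                   # mid is an upper bound, so b <= mid
    return lo, hi

lo, hi = sup_bracket(2, 40)
print(float(lo), float(hi), hi - lo)   # both close to sqrt(2); the gap is 3/2^40
```

The rationals alone never close the bracket; the Completeness Axiom is what guarantees a real number sitting inside every such bracket.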

So, just by assuming that every nonempty set that is bounded above has a definite ‘ceiling’, we get a whole bunch of irrational numbers.

Published on May 16, 2008 at 2:24 pm

Determinants and Oriented Volume

This was one of the more delightful aspects of linear algebra, when I first learned it properly. It could be because my initial introduction to linear algebra was purely computational, with no discussion of the geometric meaning of the procedures we learned. Sure, we learned how to calculate determinants, eigenvectors, eigenvalues, reduced row echelon forms, and such, but it was a mere drilling of procedures till we could do them blindfolded.

So here’s an ‘intuitive’ explanation of why the determinant of a matrix represents the ‘oriented volume’ of the parallelepiped spanned by the column vectors that constitute the matrix. The rigorous proofs can be found in any good linear algebra text; I am more interested in offering an explanation of why we might want to relate determinants to oriented volume.

The standard definition of the determinant, taught to most students of subjects where mathematics is used as a tool and not understood for its own sake, is as follows:
\det A = \sum_{i=1}^{n} (-1)^{i+1} a_{i1} \det A^{i, 1}
Here, a_{i1} is the entry of the matrix in the i-th row and 1st column, and A^{i, 1} is the (n-1) \times (n-1) matrix obtained by striking out the i-th row and 1st column of A. For a 1 \times 1 matrix (a), \det (a) = a, which ends the recursion.
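The recursive definition translates almost line for line into code. A minimal sketch in Python, assuming matrices are given as lists of rows and indices start from 0, so that the sign (-1)^{i+1} becomes (-1)^i:

```python
def det(A):
    """Determinant by cofactor expansion along the first column.

    A is a square matrix given as a list of rows."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # The minor A^{i,1}: strike out row i and the first column.
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

print(det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))    # 1
print(det([[1, 0, 0], [0, 1, 0], [0, 0, -1]]))   # -1
```

This expansion takes time that grows factorially with n, so it is only sensible for small matrices like the ones in this post.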

We can start with the simplest example: the identity matrix \left( \begin{array} {ccc} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right) . The determinant of this matrix is 1, and the column vectors constituting it are \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), and \left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right). The volume that these three vectors span is the cube with vertices at (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1), and (1, 1, 1). It is, in short, the cube with a corner at the origin and unit lengths stretching along the x, y and z axes. Each side of this cube has length 1, so it has a volume of 1, the same value as the determinant.

Now consider a similar unit cube, but spanned instead by the vectors \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), and \left( \begin{array} {c} 0 \\ 0 \\ -1 \end{array} \right). This cube has vertices (0, 0, 0), (1, 1, 0), (1, 0, -1), (0, 1, -1), (1, 0, 0), (0, 1, 0), (0, 0, -1), and (1, 1, -1). If we consider volume to have the standard property of being a positive number, then clearly the volume of this cube is 1, just like that of the previous cube. Oriented volume, though, as its name suggests, is a kind of volume that depends on the ‘directions’ possessed by the object in question. In this case, since we have flipped one vector to its ‘negative’, the oriented volume is also ‘negated’: although the cube has conventional volume 1, its oriented volume is -1. And -1 also ‘happens’ to be the determinant of the matrix composed of \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), and \left( \begin{array} {c} 0 \\ 0 \\ -1 \end{array} \right).

We can say the same for the matrices composed of \left( \begin{array} {c} -1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right) and of \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ -1 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right). Both are something like the unit cube of \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right), \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), and \left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right) “flipped” (“negated”) once; both have an oriented volume of -1, and both have determinants of -1.

We can abstract this property of the determinant as follows: when one of the column vectors of the matrix is negated, the entire determinant is negated, while its absolute value is unaltered. This can be interpreted as showing that when you ‘flip’ a vector about the origin to point in the ‘opposite’ direction, it, together with its two companions, then ‘spans’ a parallelepiped ‘pointing’ in the opposite direction from the original but having the same absolute volume. The change in the ‘direction’ of the parallelepiped is manifested as a change in the sign of its oriented volume, just as the change in the sign of one of the matrix’s column vectors is manifested as a change in the sign of its determinant.

To summarise, we have seen that negating one of the column vectors of a matrix negates both our intuitive notion of its ‘oriented volume’ and its determinant. This is just a special case of a property shared by oriented volumes and determinants: multiplying any one of the vectors by a scalar c multiplies both the oriented volume and the determinant by c.
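In three dimensions we can compute an oriented volume without writing down any matrix, using the scalar triple product u \cdot (v \times w), a standard formula for the signed volume of the parallelepiped spanned by u, v and w (it equals the determinant of the matrix with columns u, v, w). The sketch below, with hypothetical helper names, checks the flipping and scaling behaviour for a few vectors:

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def oriented_volume(u, v, w):
    # Scalar triple product u . (v x w): the signed volume of the
    # parallelepiped spanned by u, v, w.
    return dot(u, cross(v, w))

def scale(c, v):
    return tuple(c * x for x in v)

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(oriented_volume(e1, e2, e3))             # 1
print(oriented_volume(e1, e2, scale(-1, e3)))  # -1: one edge flipped
print(oriented_volume(scale(-1, e1), e2, e3))  # -1
u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)
print(oriented_volume(u, v, w), oriented_volume(u, scale(3, v), w))  # 1 3
```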

Another common feature shared by the oriented volume and the determinant is that both are invariant under the addition of a scalar multiple of one constituent vector to any of the other vectors. For example, if we transform the matrix \left( \begin{array} {ccc} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right) to \left( \begin{array} {ccc} 1&1&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right) by adding \left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right) to the original middle vector \left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right), the determinant remains 1: the extra “1” in the x-direction is somehow irrelevant.

This irrelevance makes sense in the ‘oriented volume’ interpretation. There, the volume represented by the second matrix is simply the first volume, the unit cube along all the positive axes, with four of its faces ‘skewed’ so that they are no longer squares: it is a parallelepiped with a 45° skew in one direction but with two square faces in the ‘unskewed’ directions. However, since the y-value of the middle vector \left( \begin{array} {c} 1 \\ 1 \\ 0 \end{array} \right) is unchanged at 1, the ‘component’ that matters for calculating the parallelepiped’s volume is unchanged: this ‘skewed’ parallelepiped still has volume 1, in the same way that a right-angled triangle skewed into a scalene triangle with the same base length and height retains its original area. A parallelogram is really just two congruent triangles, with twice the area of each, so for parallelograms as well it doesn’t matter how we angle their sides as long as they retain the same ‘base’ and ‘height’. And parallelepipeds are just higher-dimensional analogues of parallelograms. So it should be intuitively plausible that their volume remains the same under ‘skew’ transformations in which the ‘components’ added to one of their ‘edges’ only make them ‘more skewed’ but leave them essentially the same ‘size’.
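The skew example can also be checked directly. A minimal sketch, assuming a hand-expanded 3 \times 3 determinant whose columns are passed as tuples, and a hypothetical helper add_multiple that adds t times one edge to another:

```python
def det3(c1, c2, c3):
    # Determinant of the 3x3 matrix whose columns are c1, c2, c3
    # (cofactor expansion along the first column).
    return (c1[0] * (c2[1] * c3[2] - c3[1] * c2[2])
            - c1[1] * (c2[0] * c3[2] - c3[0] * c2[2])
            + c1[2] * (c2[0] * c3[1] - c3[0] * c2[1]))

def add_multiple(target, source, t):
    # Skew the edge `target` by adding t times the edge `source`.
    return tuple(a + t * b for a, b in zip(target, source))

e1, e2, e3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(det3(e1, e2, e3))                       # 1: the unit cube
print(det3(e1, add_multiple(e2, e1, 1), e3))  # 1: the 45-degree skewed parallelepiped
u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)
print(det3(u, v, w), det3(u, add_multiple(v, u, 7), w))  # 1 1
```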

Published on December 21, 2007 at 10:09 am

Tangent Vectors (on Manifolds)

In mathematics, a tangent generally has to do with an entity touching something at a point and representing the rate of change of that something there: a curve, a surface, a function, whatever. For example, a tangent to a curve is a straight line that touches the curve at a point and whose gradient is equal to the gradient (= rate of change) of the curve at that point. Here I will try to explain the idea of tangent vectors on manifolds.

The most intuitive way to visualise them is as little tangents to curves on a manifold. A curve on a manifold is some (differentiable) function from an interval of the real numbers to a subset of the manifold: think of taking a straight line (representing the interval of real numbers) and twisting it whichever way you want (excluding crossings with itself) to fit somewhere on the manifold. The tangent vector to such a curve at a point is the little ‘arrow’ that touches the curve there and points along it, and the tangent vectors at a point of the manifold are the tangent vectors, at that point, of all the curves passing through it. The curve itself exists independently of whatever local coordinate system we choose to use on the manifold. Since the tangent vectors are simply auxiliary characteristics of the curve, we would expect the tangent vectors to also exist independently of local coordinate systems: that is, while their labels in different coordinate systems may be different, the labels must be related to each other in a determined way, so that we know two different labels are in fact referring to the same object. In mathematical parlance, this translates to the tangent vector being subject to the chain rule for transformations between coordinates. If V is one coordinate patch with coordinates x^i_V and U is another patch with coordinates x^i_U, and the curve passes through a point p_0 in their overlap, then the chain rule says the following:
\left(\frac{dx_V^i}{dt}\right)_0 = \sum^n_{j=1}\frac{\partial x_V^i}{\partial x^j_U}\left(p_0\right) \left(\frac{dx_U^j}{dt}\right)_0

\left(\frac{dx_V^i}{dt}\right)_0 and \left(\frac{dx_U^j}{dt}\right)_0 are the components of the tangent vector to the curve at p_0 in the coordinate systems x_V and x_U respectively. So all that scary equation says is that the components of a tangent vector must obey the chain rule under transformations between overlapping coordinate systems.
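To see this chain rule at work numerically, here is a sketch in Python that takes U to be Cartesian coordinates (x, y) and V to be polar coordinates (r, \theta) on part of the plane, with a hypothetical curve (1 + t, t^2). It computes \left(\frac{dx_V^i}{dt}\right)_0 directly, by differentiating the curve written in polar coordinates, and again by applying the Jacobian \frac{\partial x_V^i}{\partial x_U^j} to \left(\frac{dx_U^j}{dt}\right)_0:

```python
import math

def to_polar(x, y):
    # x_V = (r, theta) as functions of x_U = (x, y), valid away from the origin
    return (math.hypot(x, y), math.atan2(y, x))

def curve_U(t):
    # Hypothetical curve written in Cartesian coordinates x_U = (x, y)
    return (1.0 + t, t * t)

def curve_V(t):
    # The same curve written in polar coordinates x_V = (r, theta)
    return to_polar(*curve_U(t))

def velocity(curve, t, h=1e-6):
    # Central-difference approximation to (dx^i/dt) at time t
    a, b = curve(t + h), curve(t - h)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

t0 = 0.5
x, y = curve_U(t0)                   # the point p_0
dxU = velocity(curve_U, t0)          # (dx_U^j/dt)_0
dxV_direct = velocity(curve_V, t0)   # (dx_V^i/dt)_0, computed directly

# Jacobian d x_V^i / d x_U^j at p_0, from r = sqrt(x^2 + y^2), theta = atan2(y, x)
r2 = x * x + y * y
r = math.sqrt(r2)
J = [[x / r, y / r],
     [-y / r2, x / r2]]
dxV_chain = tuple(sum(J[i][j] * dxU[j] for j in range(2)) for i in range(2))
print(dxV_direct, dxV_chain)         # agree up to finite-difference error
```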

We can even start off defining tangent vectors to be just those entities that transform that way. Here is Frankel [1]:

Definition: A tangent vector, or contravariant vector, or simply a vector at p_0 \in M^n, call it X, assigns to each coordinate patch (U, x) holding p_0, an n-tuple of real numbers (X_U^i) = (X^1_U, \dots , X_U^n) such that if p_0 \in U \cap V, then X_V^i = \sum_j \left(\frac{\partial x_V^i}{\partial x_U^j}\left(p_0\right)\right) X^j_U

Thus we need not be restricted to tangent lines to curves on the manifold: this definition lets us include any vector entity that happens to follow the transformation rule between two coordinate systems as a tangent vector. \frac{dx_U^j}{dt} and \frac{dx_V^i}{dt} in the ‘curve’ definition of a tangent vector are replaced by X_U^j and X_V^i in the ‘general’ definition. The ‘curve’ definition, though, is still a better way to visualise tangent vectors, since it offers at least a picture of tiny little tangent lines nibbling at a curve on a manifold.
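One reason this definition hangs together is that the transformation rule is consistent: if an n-tuple transforms correctly from U to V and from V to W, it automatically transforms correctly from U to W, because the Jacobians multiply (the chain rule again). A small sketch with hypothetical linear charts on \mathbb{R}^2, where each Jacobian is a constant matrix:

```python
# Hypothetical linear charts on R^2: x_V = A x_U and x_W = B x_V, so that
# x_W = (B A) x_U. For a linear change of chart the Jacobian is just the matrix.
A = [[1, 2],
     [0, 1]]
B = [[3, 0],
     [1, -1]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

X_U = [5, -1]                            # components of a vector X in chart U
X_V = matvec(A, X_U)                     # transformation rule from U to V
X_W = matvec(B, X_V)                     # transformation rule from V to W
X_W_direct = matvec(matmul(B, A), X_U)   # transformation rule straight from U to W
print(X_V, X_W, X_W_direct)              # [3, -1] [9, 4] [9, 4]
```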

[1] Frankel, T. The Geometry of Physics. Cambridge: Cambridge University Press, 1997.

Published on September 2, 2007 at 6:45 am

Manifolds, Proper

Here’s the idea behind manifolds. We want manifolds to be like a general, structureless space, with none of those troublesome ‘special points’ like the origin and irrational points and whatnot. But how do we characterise such a space?

We are most used to working in flat space, \mathbb{R}^n . The thing about \mathbb{R}^n is that it can be nicely expressed in independent coordinates: each of x, y, and z in a position vector (x, y, z) is like an independent property of the point being referred to, and we can perform operations on the vector as though each coordinate is independent of the rest (unless specified otherwise).

It is hence natural to try to frame manifolds in the language of Euclidean space. We do this with the idea of transformations. We set up a system such that we can transform any given point in a manifold to a point in a corresponding Euclidean space. But what kind of transformations? It would not be of much use to have rigid transformations that leave curved and twisted surfaces essentially curved and twisted, and straight objects essentially straight. We could not characterise a meaningful array of non-Euclidean spaces with that. We also want maximal transfer of information about the manifold to the corresponding Euclidean structures. That is, we don’t want the ‘translation’ to Euclidean space (or Euclidean space-like, as we shall see) to butcher the original manifolds so much that we wouldn’t be handling the original manifolds in our operations on the Euclidean structures. After all, the whole point of translation was to be able to handle manifolds in convenient Euclidean notation.

The natural thing to do might be to map an entire manifold to a slice of Euclidean space. For example, we can map the whole of a spherical surface, minus a pole, to the infinite real plane \mathbb{R}^2 , as follows:
[Figure: Riemann sphere]
You place the sphere on the plane. Call the point on the sphere furthest from the plane the ‘north pole’. Draw a line from the north pole of the sphere towards any point on the plane. Besides the north pole itself, it will intersect the sphere at exactly one point. Thus each point on the plane maps to a unique point on the sphere, and every point on the sphere except the north pole maps to a unique point on the plane. (This map is known as stereographic projection.) So the spherical surface sans north pole is a manifold which maps whole and nicely onto the whole plane \mathbb{R}^2 .
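The construction can be written out explicitly. A minimal sketch, assuming a sphere of diameter 1 resting on the plane z = 0 at the origin, so that the north pole is N = (0, 0, 1); the function names are hypothetical. Solving for where the line from N to (u, v, 0) meets the sphere again gives the line parameter s = 1/(1 + u^2 + v^2):

```python
# Hypothetical set-up: a sphere of diameter 1 resting on the plane z = 0 at the
# origin, i.e. x^2 + y^2 + (z - 1/2)^2 = 1/4, with north pole N = (0, 0, 1).

def plane_to_sphere(u, v):
    # The line N + s * ((u, v, 0) - N) meets the sphere at s = 0 (that is N)
    # and at s = 1 / (1 + u^2 + v^2), the point we want.
    s = 1.0 / (1.0 + u * u + v * v)
    return (s * u, s * v, 1.0 - s)

def sphere_to_plane(x, y, z):
    # Inverse map, defined everywhere on the sphere except N (where z = 1).
    s = 1.0 - z
    return (x / s, y / s)

def on_sphere(p, tol=1e-12):
    x, y, z = p
    return abs(x * x + y * y + (z - 0.5) ** 2 - 0.25) < tol

for uv in [(0.0, 0.0), (3.0, -4.0), (100.0, 0.0)]:
    p = plane_to_sphere(*uv)
    print(uv, p, on_sphere(p), sphere_to_plane(*p))
```

Points far out on the plane land close to N, which is why the north pole is the one point of the sphere that the plane never reaches.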

But this is not always the case. That is, we want to deal with manifolds that do not map so nicely. We want to deal with the whole sphere, for instance, north pole and all. We don’t want to stop ourselves from calling funny twisty structures manifolds just because we cannot press them flat (metaphorically) into a continuous slice of Euclidean space. So we relax our demands and try instead to cover any given manifold with a patchwork of maps to separate slices of Euclidean space. The patchwork can be made from arbitrarily small (but still non-zero) patches. Now that we can have many patches, we can characterise even structures that are ‘very far’ from flatness as manifolds. So long as we find the right combination of patches.

Take the surface of a 3D sphere, for example. We can now cover it with six overlapping patches and not miss a point.
[Figure: sphere quadrants]
Going by the colour scheme above, we use the following (open) hemispherical patches: blue+pink, pink+green, green+brown, brown+blue, the hemisphere above the plane, and the hemisphere below the plane. They cover every point of the sphere. We can use a simple function, such as projecting each point of a hemisphere straight onto the flat disc at its base, to translate every point in a given hemisphere to a corresponding point in Euclidean space. Finally, each patch also overlaps with at least one other patch. This will turn out to be important.
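The six-patch cover can also be checked mechanically. The sketch below assumes the unit sphere centred at the origin, takes the six patches to be the open hemispheres where one coordinate is positive or negative, and uses as the ‘simple function’ the projection that forgets that coordinate; all names are hypothetical. It samples points on the sphere, confirms that each lies in at least one patch, and confirms that each chart can be inverted on its own patch:

```python
import math
import random

# Hypothetical set-up: the unit sphere x^2 + y^2 + z^2 = 1, covered by the six
# open hemispheres {x > 0}, {x < 0}, {y > 0}, {y < 0}, {z > 0}, {z < 0}.
# A chart (k, sign) is the hemisphere where sign * (coordinate k) > 0.
CHARTS = [(k, sign) for k in range(3) for sign in (1, -1)]

def in_patch(p, chart):
    k, sign = chart
    return sign * p[k] > 0

def chart_map(p, chart):
    # Forget coordinate k: the open hemisphere goes onto the open unit disc.
    k, _ = chart
    return tuple(p[i] for i in range(3) if i != k)

def chart_inverse(q, chart):
    # Recover the forgotten coordinate from x^2 + y^2 + z^2 = 1 and its sign.
    k, sign = chart
    missing = sign * math.sqrt(max(0.0, 1.0 - q[0] ** 2 - q[1] ** 2))
    coords = list(q)
    coords.insert(k, missing)
    return tuple(coords)

def random_point_on_sphere(rng):
    # Normalising a Gaussian vector gives a uniformly random point on the sphere.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 0:
            return tuple(c / n for c in v)

rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    p = random_point_on_sphere(rng)
    patches = [c for c in CHARTS if in_patch(p, c)]
    assert patches, "point not covered"   # some coordinate is always nonzero
    for c in patches:
        back = chart_inverse(chart_map(p, c), c)
        worst = max(worst, max(abs(a - b) for a, b in zip(back, p)))
print(worst)   # small: every chart inverts correctly on its own patch
```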

The manifold has no seams. It is supposed to be a structureless surface. So when we divide it into patches to translate into Euclidean space, we don’t want to make special conditions for the translation of points that happen to fall on the edges of the patches. So we insist on overlaps, so that there aren’t any ‘special’ points that necessarily end up on the edge of the \mathbb{R}^n slice that their patch is translated to. But we have to go further, for we can still see where the overlaps are. We insist that every possible combination of overlapping patches is a valid translation of the manifold. So no particular cutting-up is favoured over another, as long as it preserves the essential, structureless characteristics of the manifold. This is why the manifold isn’t just one particular translation of it into patches of Euclidean space. It must be all possible qualifying translations, for favouring any one over the others destroys its structureless nature. That is not to say that we never favour one over the others for pragmatic reasons, such as ease of computation. But we recognise that the others are mathematically as representative of the manifold as whichever one we choose to use. At this point, a term that my professor used to describe the different possible patchworks pops into my head: a democracy. One translation, one vote. None more important than the others.

I could, and perhaps should, spell out the mathematical formulations of ‘overlaps’, ‘maps’ and so on. In particular, there are niggling but important details about the kind of maps (differentiable) that are allowed. But I’m lazy, and I figure that anyone who has the mathematical background probably knows what I’m talking about or can use my intuitive description to help her learn from a dry axiomatized textbook description, and that anyone without the mathematical background wouldn’t benefit from reading technical details anyway.

Published on July 20, 2007 at 9:05 am

The Concept of Manifolds

\mathbb{R}^n is nice. By nice I mean that it is flat, that it contains (by definition) our intuitive notions of distance in flat spaces, that everything that one intuitively suspects is true turns out to be true in it, and so on.

Unfortunately, \mathbb{R}^n does not suffice as an arena for many physical phenomena, which can take place in spaces with different structures. In fact, one can argue that \mathbb{R}^n simply cannot be a true description of physical phenomena, because we don’t think the coordinate system one chooses is anything more than a tool for the physics; it is certainly not part of the physical world. By this I mean that we don’t think there are literally points in the spaces objects move in, or in the abstract spaces that represent their velocities and positions and various other properties, that have the property of having irrational coordinates. Or that it is important that a particular point is labelled as (1, 2) rather than (2.59, 0.98). We also don’t always need the idea of a distance as defined in \mathbb{R}^n . And when we throw in complex spaces, it becomes less clear that \mathbb{R}^n is adequate for everything.

Manifolds are a very general way of describing continuous spaces. Manifolds are essentially the most structureless possible spaces. And they are everywhere in physics. (83% of physics is manifolds, as my professor quipped.) It is, from an aesthetic and philosophical point of view, more appropriate to think of physics in terms of manifolds than in terms of \mathbb{R}^n or the complex plane or any of those structured things, because physics doesn’t really care what coordinates you put on it. Physics may involve distances, but you can easily do that in manifolds as well. Ultimately, physics involves things moving around in the most general possible spaces. Although these may often be translated into things moving about in \mathbb{R}^n , the \mathbb{R}^n part is strictly ornamental.

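Hence manifolds. The nub, though, and the part about manifolds that makes them so general and yet at bottom so easy to deal with, is that they look like \mathbb{R}^n on small scales: every point has a neighbourhood that can be identified with a piece of \mathbb{R}^n . One can think of this as saying that they are locally Euclidean, or, loosely, locally flat.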

In the next post, I will try to explain how all this is manifested in the usual rigorous definitions of manifolds.

Published on March 17, 2007 at 3:30 am