## How Differentials Compose

Next problem, Frankel’s 2.3(1), part (i):

Let $F:M^n \rightarrow W^r$ and $G:W^r \rightarrow V^s$ be smooth maps. Let x, y and z be local coordinates near $p \in M$, $F(p) \in W$, and $G(F(p)) \in V$, respectively. We may consider the composite map $G \circ F : M \rightarrow V$. Show, by using bases $\frac{\partial}{\partial x}, \frac{\partial}{\partial y}$, and $\frac{\partial}{\partial z}$, that $(G \circ F)_* = G_* \circ F_*$.

Now, the differential of $F$, $F_*$, acts like this: $F_* \left( \frac{\partial}{\partial x^j} \right) = \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \frac{\partial}{\partial y^i}$, where $x = (x^1, x^2, \cdots , x^n)$ are the local coordinates near $p$, $F^i(x)$ is the ith component of $F(x)$ expressed in the $y$ coordinates, and the $\frac{\partial}{\partial y^i}$ are evaluated at $F(p)$. Similar definitions hold for $G_*$ and $(G \circ F)_*$.

The problem asks us to consider the differentials acting on the bases $\frac{\partial}{\partial x}, \frac{\partial}{\partial y}$, and $\frac{\partial}{\partial z}$. Since both $(G \circ F)_*$ and $G_* \circ F_*$ are linear maps on the tangent space at $p$, it suffices to prove that $(G \circ F)_*\left(\frac{\partial}{\partial x^j}\right) = G_* \circ F_*\left(\frac{\partial}{\partial x^j}\right)$ for each basis vector; linearity then extends the equality to the whole tangent space. The bases $\frac{\partial}{\partial y}$ and $\frac{\partial}{\partial z}$ serve to express the images under $F_*$ and $G_*$.

Thus
$G_* \circ F_* \left( \frac{\partial}{\partial x^j} \right)$
$= G_* \left( \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \frac{\partial}{\partial y^i} \right)$
$= \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} G_* \left( \frac{\partial}{\partial y^i} \right)$
$= \sum_i \frac{\partial F^i\left(x\right)}{\partial x^j} \sum_k \frac{\partial G^k \left(y\right)}{\partial y^i} \frac{\partial}{\partial z^k}$
$= \sum_k \left(\sum_i \frac{\partial G^k \left(y\right)}{\partial y^i} \frac{\partial F^i\left(x\right)}{\partial x^j} \right) \frac{\partial}{\partial z^k}$
$= \sum_k \frac{\partial \left(G \circ F \right)^k \left(x\right)}{\partial x^j} \frac{\partial}{\partial z^k}$ (by the multivariable chain rule)
$= (G \circ F)_* \left( \frac{\partial}{\partial x^j} \right)$
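As a sanity check, the composition rule can also be verified numerically: in coordinates, the differentials are just Jacobian matrices, and the claim becomes "the Jacobian of $G \circ F$ is the product of the Jacobians of $G$ and $F$". The maps, the sample point, and the helper names below are arbitrary choices of mine, a minimal sketch in plain Python:

```python
import math

def jacobian(f, p, h=1e-6):
    """Approximate the Jacobian of f at p by central differences."""
    fp = f(p)
    J = []
    for i in range(len(fp)):
        row = []
        for j in range(len(p)):
            plus = list(p); plus[j] += h
            minus = list(p); minus[j] -= h
            row.append((f(plus)[i] - f(minus)[i]) / (2 * h))
        J.append(row)
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

F = lambda x: [x[0] * x[1], math.sin(x[0])]        # F : R^2 -> R^2
G = lambda y: [y[0] + y[1] ** 2, math.exp(y[0])]   # G : R^2 -> R^2
p = [0.7, -1.2]

lhs = jacobian(lambda x: G(F(x)), p)               # (G o F)_* at p
rhs = matmul(jacobian(G, F(p)), jacobian(F, p))    # G_* at F(p), composed with F_* at p

# The two 2x2 matrices agree up to finite-difference error.
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-4
           for i in range(2) for j in range(2))
```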

## Failed, again.

So, I have not had time, or have been persuading myself that I don’t have time, to do what I originally intended to do with this blog. In lieu of making the (not inconsiderable) effort to explain technical stuff in ordinary language, I will occasionally post solutions to math/physics problems I have been doing. Perhaps with a little explanation thrown in.

Today’s problem is Problem 2.1(1) from Frankel’s The Geometry of Physics:
If v is a vector and $\alpha$ is a covector, compute directly in coordinates that $\sum_i a_i^V v^i_V = \sum_j a_j^U v_U^j$. (Here V and U are the open sets associated with charts on the manifold.) What happens if w is another vector and one considers $\sum_i v^i w^i$?

The first part is a matter of applying the formulae for coordinate transformations for vectors and covectors. For covectors, $a_j^U = \sum_i a_i^V \frac{\partial x_V^i}{\partial x_U^j}$. For vectors, $v_V^i = \sum_j \frac{\partial x_V^i}{\partial x_U^j} v^j_U$.

Starting from the right-hand side of what we want to prove:
$\sum_j a_j^U v^j_U = \sum_j \sum_i a_i^V \frac{\partial x_V^i}{\partial x_U^j} v_U^j = \sum_i a_i^V \sum_j \frac{\partial x_V^i}{\partial x_U^j} v^j_U = \sum_i a_i^V v_V^i$

This shows that the scalar you get from letting a covector ‘eat’ a vector is coordinate independent. A covector is often defined as an entity that maps vectors to scalars. This is a coordinate independent definition — if you change coordinate systems, the same vector-covector combination should give you the same scalar. So it’s not at all surprising that $\sum_i a_i^V v^i_V = \sum_j a_j^U v_U^j$: computing that scalar in the coordinate system V should give us the same result as computing it in U.

The second part of the question asks if the ‘scalar product’ of two vectors, $\sum_i v^i w^i$, is likewise independent of coordinate systems. Using the coordinate transformation formula for vectors, we get the following:
$\sum_i v_U^i w_U^i = \sum_{j,k} \left( \sum_i \frac{\partial x_U^i}{\partial x_V^j} \frac{\partial x_U^i}{\partial x_V^k} \right) v^j_V w_V^k$
which in general does not reduce to $\sum_j v_V^j w_V^j$.

So $\sum_i v^i w^i$ is not coordinate independent. This, again, is not surprising. Unlike covectors, vectors are not defined as objects that take in other vectors as arguments and spit out a scalar. So we would not expect the coordinate independence of the above vector-covector combination to hold for vector-vector combinations.
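Both results can be checked concretely. The sketch below is my own illustration in plain Python: I take a linear change of coordinates $x_V = A\, x_U$ (so the Jacobian $\partial x_V / \partial x_U$ is just the constant matrix $A$), with the matrix and component values picked arbitrarily, and compare the two sums in each coordinate system:

```python
def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def transpose(A):
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2.0, 1.0], [1.0, 1.0]]    # Jacobian dx_V/dx_U (invertible)

v_U = [0.3, -1.1]               # vector components in U
a_U = [1.7, 0.4]                # covector components in U

v_V = matvec(A, v_U)                   # v_V^i = sum_j (dx_V^i/dx_U^j) v_U^j
a_V = matvec(inv2(transpose(A)), a_U)  # inverting a_j^U = sum_i a_i^V (dx_V^i/dx_U^j)

dot = lambda a, b: sum(x * y for x, y in zip(a, b))

# Covector-vector pairing: the same scalar in both coordinate systems.
assert abs(dot(a_U, v_U) - dot(a_V, v_V)) < 1e-12

# 'Scalar product' of two vectors: NOT invariant in general.
w_U = [0.5, 2.0]
w_V = matvec(A, w_U)
assert abs(dot(v_U, w_U) - dot(v_V, w_V)) > 1e-6
```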

## Where Do Irrational Numbers Come From?

I was recently involved in a philosophical exchange elsewhere about some of Wittgenstein’s remarks about whether in a space containing rational numbers, the existence of ‘irrational points’ is ‘prejudged’. I also recently helped a friend to do some of her Analysis in $\mathbb{R}^n$ problems, and had a lot of fun. I realised that I’ve missed doing proof-based math in the last year or so, and also that I’m rather rusty in my proof skills. So I’m going to go through my battered copy of Fitzpatrick’s Advanced Calculus (1st Ed.) again and do some of the problems, and maybe explain some of the important axioms and theorems.

Today we’ll start off with the axioms that let us show the existence of irrational numbers. We build calculus by first defining the natural numbers inductively, then by defining the integers and rational numbers ‘from’ the field axioms. (If we ‘start’ with only natural numbers the field axioms give us all rational numbers.) In order to do calculus, we need a continuum, and hence we need irrational numbers. (This is not a logical deduction but a quick explanation for why mathematicians wanted to ‘get’ irrational numbers in their system.) It would obviously be tedious and probably impossible to define the irrational numbers one by one until we have all of them. But we can get every irrational number into the system in one stroke by introducing the Completeness Axiom, thus giving us all the real numbers.

The Completeness Axiom says that every nonempty set of real numbers that is bounded above has a least upper bound. An upper bound, as its name suggests, is a number that is at least as large as every member of the set. A least upper bound is the smallest number amongst all the upper bounds of a set. It is thus a sort of ‘lowest ceiling’ of the set. It may or may not be itself a member of the set.

I won’t go through the whole proof leading from the Completeness Axiom to the existence of irrational numbers (specifically, of irrational square roots). It’s rather involved and not particularly thrilling. But I can give you an intuitive feel for why the Axiom should have that consequence. It’s because we can conceive of sets like $\{x \in \mathbb{R} \mid x^2 < c\}$, where $c \geq 0$. With the Axiom, this set must have a least upper bound. The proof involves showing that if the least upper bound of the set is $b$, then $b^2 = c$. And since $c$ can be any number greater than or equal to 0, this means that any nonnegative real number has a square root. Since we know (from other independent proofs) that some square roots are not rational, there exist irrational numbers.

The reason why $b^2 = c$ can also be explained intuitively. For suppose that $b^2$ is larger than $c$. Then it would seem that there is ‘space’ between the ‘ceiling’ of the set and $b$: some slightly smaller number would still be an upper bound, so $b$ can no longer be the least upper bound. Suppose $b^2$ is less than $c$. Then $b$ isn’t even an upper bound. So it must be the case that $b^2 = c$. (Clearly, I have left a lot out of the explanation, but I have sketched the main moves that one would have to make.)
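That squeezing argument can even be mimicked numerically. The sketch below is my own illustration (floating-point numbers stand in for the reals, so it is suggestive rather than a proof): it closes in on the least upper bound of $\{x \mid x^2 < c\}$ by bisection and checks that its square is $c$:

```python
def lub_of_squares_below(c, iters=60):
    """Approximate sup { x >= 0 : x^2 < c } by bisection."""
    lo, hi = 0.0, max(c, 1.0)   # hi is certainly an upper bound of the set
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid * mid < c:
            lo = mid            # mid is in the set, so not an upper bound
        else:
            hi = mid            # mid is an upper bound; try a smaller one
    return hi

b = lub_of_squares_below(2.0)
# b is (numerically) the irrational number sqrt(2): its square is c.
assert abs(b * b - 2.0) < 1e-12
```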

So, just by assuming that all bounded sets have to have a definite ‘ceiling’, we get a whole bunch of irrational numbers.

Published in: on May 16, 2008 at 2:24 pm

## Dainton on Unger on Phenomenal Truths

Since this blog is not supposed to presuppose any prior exposure to technical jargon or discipline-specific concepts, I shall first have a quick word on what phenomenal truths are. Roughly speaking, they are truths which are gathered from making phenomenal judgments. Now what are phenomenal judgments? They are judgments about the character and nature of our immediate conscious experiences. Although they are judgments of immediate experiences, they themselves need not be immediate. For example, I might reflect on the immediate sensory experiences I’d been having a moment ago (the cool air on my back, the queer smell of the room, etc.), and while this reflection and the accompanying judgments are not themselves immediate but are the products of a longer process, the experiences themselves are immediate.*

Peter Unger, in Identity, Consciousness and Value, suggests that we should be skeptical that our phenomenal experiences are as real as the less accessible “truths” that are discovered by natural science:

It cannot be nearly so easy as this to uncover deep truths about main aspects of reality. As with other psychological phenomena, an adequate understanding of conscious experience requires experiment, observation and theorizing that is both protracted and painstaking.

Barry Dainton, in Stream of Consciousness, rejects this argument. Firstly, he points out, truths need not be difficult to discover. We easily discover non-phenomenal truths all the time — water tends to flow downhill, pricking your skin with a needle draws blood, etc. But perhaps the crux lies in that ‘deep’ truths are difficult to discover? But even if we had any reason to think that, it still doesn’t stop us from accepting as reality the many ‘easy’ truths presented to us by phenomenal experience.

Secondly, it is not clear that all phenomenal truths must be ‘shallow’. Is it not conceivable that further investigation into the nature of conscious experience will uncover deep phenomenal truths?

I pretty much agree with Dainton’s criticisms. All the same, I’m not ready to abandon all skepticism about phenomenal truths. If I were to make an argument against their reality, it would be something along the lines of how there is rather more intersubjective confirmation of truths in natural science than there is of phenomenal truths. I’m not sure that that’s true, but my instinct is that that’s a potentially weak point of phenomenal truths compared to scientific truths. I am rather less certain that the sensation of ‘blue’ I am experiencing now is really the same as the sensation of ‘blue’ everyone else experiences. Or that everyone else experiences that sensation (and others) in the same way that I do. Less certain compared to my certainty that, for example, the Big Bang theory of cosmology is broadly true. Now, I may be completely unjustified in these intuitive judgments, but as philosophers know, in grains of intuition lie the beginnings of a half-respectable argument.

*The difficult reader might then ask, am I not in that case reflecting on memories rather than immediate conscious experiences? I actually think that’s a question that should be taken seriously, but since Dainton doesn’t consider it, let’s leave it aside for now.

## What is a Proposition?

One obvious answer could be something like the following:

A proposition is a statement of a state of affairs, which could be either true or false.

This seems to say exactly what we think a proposition is. It seems as though we could use these criteria (statement, possibility of being true/false) to determine if a given language form presented to us is a proposition. Not just that; it seems to us that the above definition tells us what a proposition is. That we can use our pre-existing ideas of ‘truth’ and ‘falsehood’, for example, to determine what entities out there are propositions, and what aren’t.

In the Philosophical Investigations, Wittgenstein points out that it is wrong-headed to speak of ‘truth’ and ‘falsehood’ as determining what a proposition is. For truth and falsehood themselves belong to the concept of a proposition. They are not, as it were, external criteria with which we use to judge possible propositions.

Wittgenstein compares the above ‘definition’ of a proposition with the following definition of what a king piece in chess is:

The piece that one can check.

In this case, would we say that we have in hand the concept ‘to check’, and the king is then whatever fits that concept — we go around ‘testing’ chess pieces for whether they can indeed be ‘checked’, and when we find one that can, we conclude that it is the king? This seems ridiculous, for we all know that the concept of ‘checking’ in chess is not independent of the concept of ‘king’; indeed, it is an intrinsic part of the concept ‘to check’ that one can check only the king and nothing else.

Wittgenstein means to suggest the same of the concept of a proposition. That is, we do not go around having independent notions of truth and falsehood that we then use to determine if certain linguistic entities are indeed propositions. Because we can understand truth and falsehood only if we also understand that these are concepts that apply to propositions.

This is often slightly disturbing to those with no prior exposure to Wittgenstein. If our apparently accurate definition of a proposition fails so fundamentally, then does there actually exist a definition of ‘proposition’ that is not, in some way, dependent on concepts that are internal to the concept of a proposition? No — it seems that ‘truth’ and ‘proposition’ must go together everywhere, as it were. Together, they form what Wittgenstein calls a language-game — a set of customs from which sprout many mutually defined objects.

Chess is a game; a set of customary rules, and from these rules spring the concepts of ‘king’, ‘checking’, and so on, which are intertwined with one another and hence cannot be given a standalone definition in words alone (because to explain ‘king’ in words we have to explain ‘check’ in words, but to explain the latter in words we have to use the former as well, and so on). Similarly, Wittgenstein suggests that our ordinary linguistic and logical concepts, like that of a ‘proposition’, are really just part of language-games, meaning that we can’t hope to give all-determining definitions of them. Their meanings do not lie in abstract linguistic formulations, but in how they relate, very organically, to the other ‘pieces’ (concepts) in their respective language-games.

But it would be a mistake to say that just because we cannot give them abstract linguistic definitions, that they are therefore ill-defined, or that we have a paradox. For it does not bother us that we cannot give an abstract definition of the ‘king’ in a chess game that is not implicitly dependent on a prior understanding of ‘king’. We accept that lack of definition as part of what it means to be a rule of a game. Similarly, Wittgenstein coaxes us to accept the lack of satisfactory definitions of many terms in ordinary language as simply part of the nature of those terms as pieces in language-games.

Published in: on January 3, 2008 at 5:05 pm

## Van Fraassen on Peirce’s “Scholastic Realism”

Because I need the practice, this will be in the mould of those short summaries one writes about one’s course readings.

The arguments of the “scholastic realists” van Fraassen attacks in Laws and Symmetry can be broken into two parts: the first being that there exist laws of nature, the second being that we must believe there exist laws of nature (or else sink into an abyss of skepticism).

Van Fraassen quotes a lecture demonstration by C. S. Peirce:

Here is a stone. Now I place that stone where there will be no obstacle between it and the floor, and I will predict with confidence that as soon as I let go my hold upon the stone it will fall to the floor. I will prove that I can make a correct prediction by actual trial if you like. But I see by your faces that you all think it will be a very silly experiment.

This is supposed to demonstrate that there are some things that we know will happen without having to have that demonstrated before our eyes.

Peirce argues that the fact that we can believe that the stone will fall without doing the experiment is proof that the assumed ‘law’ that the stone will fall to the floor corresponds to reality. The idea is that either the fact that the stone will fall to the floor is a matter of chance — it could have failed to fall, but it just didn’t happen to have failed to fall that one time. Or, more plausibly, the fact that the stone will fall is dictated by a law of nature, which is what justifies us in believing that it will fall even before we see it do so. After all, if it were merely a matter of chance, we wouldn’t feel justified in believing it. Van Fraassen points out that this corresponds to the second part of the scholastic realists’ argument: that given our other beliefs, we must believe there exist laws of nature.

To recap Peirce’s argument: IF we know that certain regularities in nature will occur without observing that they do, THEN we must believe there exist laws of nature.

Van Fraassen argues that the dichotomy Peirce draws between events that happen due to ‘sheer chance’ and events that happen due to a law of nature is a false one. What, he asks, does ‘by chance’ mean? In the most common interpretations of that phrase, it could mean ‘not due to any law’, or it could mean ‘no more probable than the other possibilities’.

If Peirce means to take the latter interpretation, then it is not true that we know that certain regularities in nature will occur. So the premise of Peirce’s argument is false already, and we can’t argue from that to the truth of its conclusion.

What if Peirce means to take the former interpretation, that ‘by chance’ means ‘not due to any law’? Van Fraassen simply says that that would be a strange use of the phrase ‘by chance’. (I’m not sure I agree with him on this.)

Van Fraassen then goes on to consider if Peirce had perhaps accepted the tacit premise that whatever happens either does so for a reason or else is no more likely to happen than its contraries. Van Fraassen rejects this premise because it would mean that if the universe contained no reasons for regularities, then it would have to be completely chaotic — there wouldn’t even be room for highly probable regularities. In fact, this premise is exactly the first part of the scholastic realist argument — not just that we must believe that laws of nature exist, but that there actually exist laws of nature. It is not clear, though, why we should accept the premise that events must either have a reason behind them or be instances of completely random outcomes.

Yet it is hard to deny the strong attraction of the Peircean intuition that laws of nature have a flavour of necessity to them that mere continuation of a regularity does not. As van Fraassen writes, “A law must be conceived as the reason which accounts for uniformity in nature, not the mere uniformity or regularity itself.” But how do we reconcile this intuitive notion of ‘law’ with our repeated inability (from Hume onwards) to prove that such reasons exist?

Frankly, I don’t have much of a problem with giving up the intuition that laws dictate necessity. It’s true that it’s really convenient, for scientists and even most ordinary people, to think of regularities like falling objects as due to natural laws. And when a mode of thinking becomes convenient enough, people start treating its objects as real existent things. In philosophical parlance, they start inventing an ontology to go with their mode of thinking, which may have started out as a metaphysically innocent heuristic. I tend to think, for example, that scientific realists have unwittingly bought into what started out as a heuristic. So I’m perfectly comfortable with the idea that the regularities we know of now are just there, free of metaphysical baggage. It may please scientists to think of them as caused by laws of nature, but the burden of proof is on them to show that they have to accept the ontology of laws of nature. Seems to me the language of laws of nature is near-indispensable in much of modern day science, but I think it’s quite possible to shift to a more metaphysically conservative language (although I don’t see a point in doing so). In other words, bugger our intuitions. They are often products of extended cultural marination that need not push our intuitions any closer to the truth.

## Determinants and Oriented Volume

This was one of the more delightful aspects of linear algebra, when I first learned it properly. It could be because my initial introduction to linear algebra was purely computational, with no discussion of the geometric meaning of the procedures we learned. Sure, we learned how to calculate determinants, eigenvectors, eigenvalues, reduced row echelon forms, and such, but it was a mere drilling of procedures till we could do them blindfolded.

So here’s an ‘intuitive’ explanation of why the determinant of a matrix represents the ‘oriented volume’ of the parallelepiped spanned by the column vectors that constitute the matrix. The rigorous proofs can be found in any good linear algebra text — I am more interested in offering an explanation of why we might want to relate determinants to oriented volume.

The standard definition of the determinant, taught to most students of subjects where mathematics is used as a tool and not understood for its own sake, is as follows:
$\det A = \sum_{i=1}^{n} (-1)^{i+1} a_{i1} \det A^{i, 1}$
Here, $a_{i1}$ is the entry of the matrix in the ith row and 1st column, and $A^{i, 1}$ is the $(n-1) \times (n-1)$ matrix obtained by striking out the ith row and 1st column of $A$.
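Transcribed directly into code, this definition reads as a recursion on minors. The sketch below is my own (and an $O(n!)$ algorithm, so only sensible for small matrices; practical libraries use Gaussian elimination instead):

```python
def det(A):
    """Determinant by cofactor expansion along the first column."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # A with row i and column 0 struck out; (-1)**i is the 0-indexed
        # version of the (-1)**(i+1) sign in the 1-indexed formula.
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += (-1) ** i * A[i][0] * det(minor)
    return total

assert det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1    # identity matrix
assert det([[1, 0, 0], [0, 1, 0], [0, 0, -1]]) == -1  # one axis flipped
```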

We can start with the simplest example: the identity matrix $\left( \begin{array} {ccc} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right)$. The determinant of this matrix is 1, and the column vectors constituting it are $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, and $\left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right)$. The volume that these three vectors span is the cube with vertices at (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), and (1, 1, 1). It is, in short, the cube with a corner at the origin and unit lengths stretching along the x, y and z axes. This cube has sides of length 1, so it has a volume of 1, the same value as the determinant.

Now consider a similar unit cube, but spanned instead by the vectors $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, and $\left( \begin{array} {c} 0 \\ 0 \\ -1 \end{array} \right)$. This cube has vertices (0, 0, 0), (1, 1, 0), (1, 0, -1), (0, 1, -1), (1, 0, 0), (0, 1, 0), (0, 0, -1), and (1, 1, -1). If we consider volume to have the standard property of being a positive number, then clearly the volume of this cube is 1, just like the previous cube. Oriented volume, though, as its name suggests, is a kind of volume that depends on the ‘directions’ possessed by the object in question. In this case, since we have flipped one vector to its ‘negative’, the oriented volume is also ‘negated’: although the cube has conventional volume 1, its oriented volume is -1. And -1 also ‘happens’ to be the determinant of the matrix composed of $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, and $\left( \begin{array} {c} 0 \\ 0 \\ -1 \end{array} \right)$.

We can say the same for the matrices composed of $\left( \begin{array} {c} -1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right)$ and of $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ -1 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right)$. Both are something like the unit cube of $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$, $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, and $\left( \begin{array} {c} 0 \\ 0 \\ 1 \end{array} \right)$ “flipped” (“negated”) once; both have an oriented volume of -1, and both have determinants of -1.

We can abstract this property of the determinant as such: when one of the column vectors of the matrix is negated, the entire determinant is negated. The absolute value of the determinant is unaltered. This can be interpreted as showing that when you ‘flip’ a vector about the origin to point in the ‘opposite’ direction, it together with its two other companions then ‘span’ a parallelepiped ‘pointing’ in the opposite direction from the original but having the same absolute volume. The change in the ‘direction’ of the parallelepiped is manifested as a change in the sign of its oriented volume, just as the change in the sign of one of the matrix’s column vectors is manifested as a change in the sign of its determinant.

To summarise, we have seen that negating one of the column vectors of a matrix also negates our intuitive notion of ‘oriented volume’, and negates its determinant. This is just a special case of a property shared by both oriented volumes and determinants: multiplying a vector constituent of either by a scalar $c$ results in a multiplication of either (oriented volume or determinant) by a factor of $c$.

Another common feature shared by both the oriented volume and the determinant is that both are invariant under the addition of a scalar multiple of a constituent vector to any of the other vectors. For example, if we transform the matrix $\left( \begin{array} {cccc} 1&0&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right)$ to $\left( \begin{array} {cccc} 1&1&0 \\ 0&1&0 \\ 0&0&1 \end{array} \right)$ by adding $\left( \begin{array} {c} 1 \\ 0 \\ 0 \end{array} \right)$ to the original middle vector $\left( \begin{array} {c} 0 \\ 1 \\ 0 \end{array} \right)$, the determinant remains the same: the extra “1” in the x-direction is irrelevant somehow.

This irrelevance makes sense in the ‘oriented volume’ interpretation. There, the volume represented by the second matrix is simply the first volume, the unit cube along all the positive axes, with four faces ‘skewed’ from squares to parallelograms, so that it’s a parallelepiped with a $45^\circ$ skew in one direction but with two square faces in the ‘unskewed’ directions. However, since the y-value of the middle vector $\left( \begin{array} {c} 1 \\ 1 \\ 0 \end{array} \right)$ is unchanged at 1, the ‘component’ that matters for calculating the parallelepiped’s volume is unchanged — this ‘skewed’ parallelepiped still has volume 1, in the same way that a right-angled triangle skewed to a scalene triangle of the same base length and height retains its original area. Parallelograms are really just triangles with twice the area, so for them as well, it doesn’t matter how we angle their sides as long as they retain the same ‘base’ and ‘height’. And parallelepipeds are just higher-dimensional analogues of parallelograms. So it should be intuitively plausible that their volume remains the same under certain ‘skew’ transformations in which the ‘components’ added to one of their ‘edges’ only make them ‘more skewed’ but leave them essentially the same ‘size’.
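The properties discussed so far — negating a column negates the determinant, scaling a column scales it, and adding a multiple of one column to another leaves it unchanged — can all be checked concretely. Here is a small sketch of my own, with the 3×3 determinant written out explicitly and the matrix stored as its three column vectors to match the picture in the text:

```python
def det3(u, v, w):
    """Determinant of the 3x3 matrix whose columns are u, v, w."""
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - v[0] * (u[1] * w[2] - u[2] * w[1])
            + w[0] * (u[1] * v[2] - u[2] * v[1]))

u, v, w = [1, 0, 0], [0, 1, 0], [0, 0, 1]
assert det3(u, v, w) == 1                # the unit cube has oriented volume 1

# Negating one column negates the determinant (the orientation flips) ...
assert det3(u, v, [0, 0, -1]) == -1
# ... and more generally, scaling one column by c scales the determinant by c.
assert det3(u, [0, 5, 0], w) == 5

# Adding a multiple of one column to another (a 'skew') leaves it unchanged.
assert det3(u, [1, 1, 0], w) == 1
```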

## Tangent Vectors (on Manifolds)

In mathematics, a tangent generally has to do with an entity touching something at only one point, and representing the rate of change of something — a curve, a surface, a function, whatever. For example, a tangent to a curve is a straight line touching the curve at only one point, and whose gradient is equal to the gradient (=rate of change) of the curve at that point. Here I will try to explain the idea of tangent vectors on manifolds.

The most intuitive way to visualise them is as little tangents to curves on a manifold. A curve on a manifold is some function from an interval of the real numbers to a subset of the manifold: think of taking a straight line (representing the interval of real numbers) and twisting it whichever way you want (excluding crossings with itself) to fit somewhere on the manifold. The tangent vectors at a point on the manifold are simply the “lines” that (locally) touch the curve on the manifold at only that point. The curve itself exists independently of whatever local coordinate system we choose to use on the manifold. Since the tangent vectors are simply auxiliary characteristics of the curve, we would expect the tangent vectors to also exist independently of local coordinate systems — that is, while their labels in different coordinate systems may be different, the labels must be related to each other in a determined way, so that we know two different labels are in fact referring to the same object. In mathematical parlance, this translates to the tangent vector being subject to the chain rule for transformations between coordinates. If V is one coordinate patch with coordinates $x^i_V$ and U is another patch with coordinates $x^i_U$, and the point $p_0$ lies on their overlap, then the chain rule says the following:
$\left(\frac{dx_V^i}{dt}\right)_0 = \sum^n_{j=1}\frac{\partial x_V^i}{\partial x^j_U}\left(p_0\right) \left(\frac{dx_U^j}{dt}\right)_0$

$\left(\frac{dx_V^i}{dt}\right)_0$ and $\left(\frac{dx_U^j}{dt}\right)_0$ are the tangent vectors to the curve at $p_0$ under coordinate systems $x_V$ and $x_U$ respectively. So all that scary equation says is that tangent vectors must obey the chain rule under transformations between overlapping coordinate systems.
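The transformation rule can be checked on a concrete curve. In the sketch below (my own example in plain Python), $U$ is polar coordinates $(r, \theta)$ on the plane and $V$ is Cartesian coordinates $(x, y)$: we transform the polar components of the tangent vector with the Jacobian and compare against differentiating the Cartesian components of the curve directly:

```python
import math

# A curve given in polar coordinates, with its derivative (the U-components
# of the tangent vector); the curve itself is an arbitrary choice.
r   = lambda t: 1.0 + t * t
th  = lambda t: 0.5 * t
dr  = lambda t: 2 * t
dth = lambda t: 0.5

t0 = 0.8
r0, th0 = r(t0), th(t0)

# Jacobian of the coordinate change: x = r cos(theta), y = r sin(theta).
J = [[math.cos(th0), -r0 * math.sin(th0)],
     [math.sin(th0),  r0 * math.cos(th0)]]

# Transform the U-components of the tangent vector with the Jacobian ...
v_U = [dr(t0), dth(t0)]
v_V = [sum(J[i][j] * v_U[j] for j in range(2)) for i in range(2)]

# ... and compare with differentiating the Cartesian components directly.
h = 1e-6
x = lambda t: r(t) * math.cos(th(t))
y = lambda t: r(t) * math.sin(th(t))
dx = (x(t0 + h) - x(t0 - h)) / (2 * h)
dy = (y(t0 + h) - y(t0 - h)) / (2 * h)

assert abs(v_V[0] - dx) < 1e-6 and abs(v_V[1] - dy) < 1e-6
```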

We can even start off defining tangent vectors to be just those entities that transform that way. Here is Frankel [1]:

Definition: A tangent vector, or contravariant vector, or simply a vector at $p_0 \in M^n$, call it X, assigns to each coordinate patch (U, x) holding $p_0$, an n-tuple of real numbers $(X_U^i) = (X^1_U, \dots , X_U^n)$ such that if $p_0 \in U \cap V$, then $X_V^i = \sum_j \left(\frac{\partial x_V^i}{\partial x_U^j}\left(p_0\right)\right) X^j_U$

Thus we need not be restricted to tangent lines to curves on the manifold: this definition lets us include any vector entity that happens to follow the transformation rule between two coordinate systems as a tangent vector. $\frac{dx_U^j}{dt}$ and $\frac{dx_V^i}{dt}$ in the ‘curve’ definition of a tangent vector are replaced by $X_U^j$ and $X_V^i$ in the ‘general’ definition. The ‘curve’ definition, though, is still a better way to visualise tangent vectors, since it offers at least a picture of tiny little tangent lines nibbling at a curve on a manifold.

[1] Frankel, T. The Geometry of Physics. Cambridge: Cambridge University Press, 1997.

## Manifolds, Proper

Here’s the idea behind manifolds. We want manifolds to be like a general structureless space, with none of those troublesome ‘special points’ like the origin and irrational points and whatnot. But how do we characterise such a space?

We are most used to working in flat space — $\mathbb{R}^n$. The thing about $\mathbb{R}^n$ is that it can be nicely expressed in independent coordinates: each of x, y, and z in a position vector (x, y, z) is like an independent property of the point being referred to, and we can perform operations on the vector as though each coordinate is independent of the rest (unless specified otherwise).

It is hence natural to try to frame manifolds in the language of Euclidean space. And we do this by the idea of transformations. We set up a system such that we can transform any given point in a manifold to any other point in a corresponding Euclidean space. But what kind of transformations? It would not be of much use to have rigid transformations that leave curved and twisted surfaces essentially curved and twisted, and straight objects essentially straight. We could not characterise a meaningful array of non-Euclidean spaces with that. We also want maximal transfer of information about the manifold to the corresponding Euclidean structures. That is, we don’t want the ‘translation’ to Euclidean space (or Euclidean space-like, as we shall see) to butcher the original manifolds so much that we wouldn’t be handling the original manifolds in our operations on the Euclidean structures. After all, the whole point of translation was to be able to handle manifolds in convenient Euclidean notation.

The natural thing to do might be to map an entire manifold to a slice of Euclidean space. For example, we can map the whole of a spherical surface, minus a pole, to the infinite real plane $\mathbb{R}^2$, as follows:

You place the sphere on the plane. Call the point on the sphere furthest from the plane the ‘north pole’. Draw a line from the north pole of the sphere towards any point on the plane. It will intersect the sphere at exactly one point. Thus each point on the plane maps to a unique point on the sphere, and every point on the sphere except the north pole maps to a unique point on the plane. So the spherical surface sans north pole is a manifold which maps whole and nicely to a whole slice of Euclidean space.
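This projection can be written down explicitly. Under one convenient choice of coordinates (an assumption on my part; the description above fixes none), let the sphere be the unit sphere resting on the plane $z = 0$, so that its south pole is the origin and its north pole is $N = (0, 0, 2)$. The line from $N$ through a point $(x, y, z)$ on the sphere meets the plane at

$(u, v) = \left( \frac{2x}{2 - z}, \frac{2y}{2 - z} \right)$,

which is defined for every point of the sphere except $N$ itself, where $z = 2$ and the denominator vanishes — exactly the missing north pole of the description above.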

But this is not always the case. That is, we want to deal with manifolds that do not map so nicely. We want to deal with the whole sphere, for instance, north pole and all. We don’t want to stop ourselves from calling funny twisty structures manifolds just because we cannot press them flat (metaphorically) into a continuous slice of Euclidean space. So we relax our demands and try instead to cover any given manifold with a patchwork of maps to separate slices of Euclidean space. The patchwork can be made from arbitrarily small (but still non-zero) patches. Now that we can have many patches, we can characterise even structures that are ‘very far’ from flatness as manifolds, so long as we find the right combination of patches.

Take the surface of an ordinary sphere in three-dimensional space, for example. We can now cover it with six overlapping patches and not miss a point.

Going by the colour scheme above, we use the following hemispherical patches: blue+pink, pink+green, green+brown, brown+blue, the hemisphere above the plane, and the hemisphere below the plane. Together they cover every point of the sphere. We can use a simple function to translate every point in a given hemisphere to a corresponding point in Euclidean space. Finally, each patch overlaps with at least one other. This will turn out to be important.
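The six-patch covering can be sketched numerically. The snippet below is only a sketch (the function names are mine, and I use the six axis-aligned open hemispheres $\pm x_i > 0$ rather than the colour-coded ones, which amounts to the same covering): it samples random points on the unit sphere, checks that each lies in at least one hemisphere, and uses the ‘simple function’ of dropping one coordinate as the chart map.

```python
import math
import random

def random_sphere_point():
    # Sample a uniformly random point on the unit sphere
    # by normalising a 3D Gaussian vector.
    while True:
        v = [random.gauss(0, 1) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-12:
            return tuple(c / n for c in v)

def patches_containing(p):
    # The six hemispherical patches: for each axis i and sign s,
    # the open set { q on the sphere : s * q[i] > 0 }.
    return [(i, s) for i in range(3) for s in (1, -1) if s * p[i] > 0]

def chart(p, i):
    # Chart map for a patch about axis i: drop the i-th coordinate,
    # projecting the open hemisphere onto an open disc in R^2.
    return tuple(c for j, c in enumerate(p) if j != i)

points = [random_sphere_point() for _ in range(10000)]
# Every point of the sphere has a nonzero coordinate, so it lies in at
# least one patch; a generic point lies in three (one per axis).
assert all(len(patches_containing(p)) >= 1 for p in points)
```

Dropping the $i$-th coordinate maps each open hemisphere one-to-one onto the open unit disc in $\mathbb{R}^2$ — precisely the kind of translation to a slice of Euclidean space described above.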

The manifold has no seams. It is supposed to be a structureless surface. So when we divide it into patches to translate into Euclidean space, we don’t want to make special conditions for translating points that happen to fall on the edges of the patches. We insist on overlaps precisely so that no point is forced to sit on the edge of the $\mathbb{R}^n$ slice its patch is translated to: any such point lies comfortably in the interior of some neighbouring patch. But we have to go further, for we can still see where the overlaps are. We insist that every possible combination of overlapping patches is a valid translation of the manifold, so that no particular cutting-up is favoured over another, as long as each preserves the essential, structureless character of the manifold.

This is why the manifold isn’t just one particular translation of it into patches of Euclidean space. It must be all possible qualified translations, for favouring any one over the others destroys its structureless nature. That is not to say that we never favour one over the others for pragmatic reasons, for ease of computation. But we recognise that the others are mathematically as representative of the manifold as whichever one we choose to use. At this point, a term my professor used to describe the different possible patchworks pops into my head: a democracy. One translation, one vote. None more important than the others.

I could, and perhaps should, mention the mathematical formulations of ‘overlaps’, ‘maps’, and so on. In particular, there are niggling but important details about the kind of maps (differentiable) that are allowed. But I’m lazy; I figure that anyone with the mathematical background probably knows what I’m talking about, or can apply my intuitive description to help her learn from a dry, axiomatised textbook description, and that anyone without the mathematical background wouldn’t benefit from reading technical details anyway.

## The Concept of Manifolds

$\mathbb{R}^n$ is nice. By nice I mean that it is flat, that it contains (by definition) our intuitive notions of distance in flat spaces, that everything that one intuitively suspects is true turns out to be true in it, and so on.

Unfortunately, $\mathbb{R}^n$ often does not suffice as an arena for many physical phenomena, which can take place in spaces with different structures. In fact, one can argue that $\mathbb{R}^n$ simply cannot be a true description of physical phenomena, because we don’t think the coordinate system one chooses is anything more than a tool for the physics — it is certainly not part of the physical world. By this I mean that we don’t think there are literally points in the spaces objects move in, or in the abstract spaces that represent their velocities and positions and various other properties, that have the property of having irrational coordinates. Nor do we think it matters that a particular point is labelled as (1, 2) rather than (2.59, 0.98). We also don’t always need the idea of a distance as defined in $\mathbb{R}^n$. And when we throw in complex spaces, it becomes less clear that $\mathbb{R}^n$ is adequate for everything.

Manifolds are a way of describing all continuous spaces. Manifolds are essentially the most structureless possible spaces. And they are everywhere in physics. (83% of physics is manifolds, as my professor quipped.) It is, from an aesthetic and philosophical point of view, more appropriate to think of physics in terms of manifolds than in terms of $\mathbb{R}^n$ or the complex plane or any of those structured things, because physics doesn’t really care what coordinates you put on it. Physics may involve distances, but distances can easily be added to manifolds as well, by equipping them with extra structure. Ultimately, physics involves things moving around in the most general possible spaces. Although these may often be translated into things moving about in $\mathbb{R}^n$, the $\mathbb{R}^n$ part is strictly ornamental.

Hence manifolds. The nub, though, and the part about manifolds that makes them so general and yet at bottom so easy to deal with, is that they look like $\mathbb{R}^n$ on small scales. One can think of this as saying that they are locally flat.

In the next post, I will try to explain how all this is manifested in the usual rigorous definitions of manifolds.