2025-09-14

Learning and Making Sense of Differential Geometry in General Relativity

I have been out of the field of physics for over 5 years. I didn't get to make much use of my training in quantitative analysis in my previous job as a postdoctoral researcher at UC Davis, and I make even less use of it in my current job as a transportation planner at Cambridge Systematics. During my PhD in physics, and as a student in school & college before graduate school, I enjoyed solving problems in math & physics and growing & applying my toolbox of quantitative skills, so since leaving the field, and especially more recently, I have felt a slight itch to recover some of those skills purely for my own personal satisfaction. To that end, I've resolved to learn or relearn some parts of math that I did not learn at all, or did not learn to the full extent that I should have, in college or graduate school (because I could ultimately manage my PhD work without knowing those things to that extent). Currently, I'm interested in learning differential geometry as well as complex analysis. It remains to be seen whether there are other topics in physics-relevant math that I become interested in; if there are, then I will certainly make an effort to learn at least a little bit about them. I've enjoyed learning these topics in math to broaden my knowledge & skills, and I've particularly enjoyed pondering the definitions & rules of these concepts almost like a lawyer (which also makes this useful for my current job in a very indirect way, because that job in part involves analyzing laws & regulations and creating intuitive explanations of them for public sector clients).

I can't guarantee that I'll write a blog post about every topic in math that I learn about. However, I am writing this post specifically about differential geometry in part because I feel that I have learned its basic ideas in the context of general relativity to my satisfaction (which was my original goal) and in part because I have some questions/concerns that I have not been able to satisfactorily resolve based on what I have read in lecture notes or textbooks. This post is a way to further flesh out those questions/concerns. Follow the jump to see more.

There are a few conventions & assumptions to note throughout this post.

  • The dimension of the manifold will generically be denoted \( N \). For most nonrelativistic physics, \( N = 3 \), while for most relativistic physics (including general relativity), \( N = 4 \); exceptions include constrained low-dimensional systems and physical models formulated with other numbers of dimensions.
  • I will consistently use Einstein summation unless otherwise specified. This means that an index repeated exactly twice in a term, once as an upper index and once as a lower index, is summed over; an index should never appear more than twice in a term, nor more than once in the same (upper versus lower) position.
  • Although the convention in general relativity is to use lowercase Greek letters for indices, I will keep things easier to read by using lowercase English letters for indices.
  • The motivation of general relativity means that I will only consider differentiable manifolds and specifically smooth (infinitely differentiable) scalar functions or tensor components on them.
  • The motivation of general relativity also means that I will only consider torsion-free connection coefficients that can be expressed in terms of the metric.

Intuitions for basis vectors and covectors on flat manifolds

On a flat manifold (without curvature), it is possible, even if it is technically an abuse of the notion of tangent and cotangent spaces, to visualize vectors as pointing from one place to another in the manifold with a certain magnitude & direction, compare vectors at different points in space, and even associate vectors with macroscopic (non-infinitesimal) displacements from one point to another (such that given vector fields \( \vec{u} \) & \( \vec{v} \), if the locations \( \vec{x} \) & \( \vec{x}' \) differ from each other, the quantity \( \vec{u}(\vec{x}) - \vec{v}(\vec{x}') \) is still mathematically well-defined, which is in practice most useful for defining displacements \( \vec{x} - \vec{x}' \)). This is the way that vectors are treated in most of physics outside of general relativity. The metric is trivial, and covectors can be treated as ultimately unnecessary formal baggage to define inner products.

Vectors can be expressed in a complete basis as \( \vec{v} = v^{i} \vec{e}_{i} \). Technically, the vector \( \vec{v} \), the components \( \{ v^{i} \} \), and the basis vectors \( \{ \vec{e}_{i} \} \) must be specified to exist at a point \( p \) in the manifold, but the reason for considering a flat manifold is that it is possible to choose an orthonormal basis \( \{ \vec{e}_{i} \} \) through the entire manifold such that each basis vector has the "same direction" at every point. The magnitude of the vector can be visualized as its generalized length from tail to head.

On a flat manifold, the trivial nature of the metric and use of a spatially invariant orthonormal basis allow for a little laxity in defining inner products and norms and in distinguishing contravariant versus covariant objects. However, it is useful to define the notion of a covector as an object which maps vectors to real numbers. At a specific point \( p \) in the manifold, a covector expressed in terms of a complete basis of covectors is written as \( \tilde{a} = a_{i} \tilde{\omega}^{i} \) and maps vectors to real numbers as the inner product \( \tilde{a}(\vec{v}) = a_{i} v^{i} \). This also means that \( \tilde{a}(\vec{e}_{i}) = a_{i} \). Furthermore, vectors can be seen equivalently as objects that map covectors to real numbers, such that \( \vec{v}(\tilde{a}) = \tilde{a}(\vec{v}) = v^{i} a_{i} = a_{i} v^{i} \) and \( \vec{v}(\tilde{\omega}^{i}) = v^{i} \).

It is a little harder, but still possible, to visualize covectors. Essentially, a basis covector \( \tilde{\omega}^{i} \) can be visualized as an infinite set of parallel \( (N - 1) \)-dimensional hyperplanes with unit spacing, in which the hyperplanes are perpendicular to the direction of increasing \( x^{i} \) for the given index \( i \). This means that a general covector \( \tilde{a} \) can be visualized as a set of parallel \( (N - 1) \)-dimensional hyperplanes in which the linear density (which is inversely proportional to the spacing) of parallel hyperplanes along the direction perpendicular to those hyperplanes indicates the "magnitude" of the covector (such that a covector with a larger magnitude has more densely packed hyperplanes with smaller spacing) and in which the direction of separation perpendicular to the parallel hyperplanes may deviate from that of a single coordinate.

Ultimately, this means that the inner product \( \tilde{a}(\vec{v}) = \vec{v}(\tilde{a}) \) can be visualized as the number of "parallel hyperplanes" of \( \tilde{a} \) that the "arrow" of \( \vec{v} \) "pierces". If the "magnitude" of \( \vec{v} \) increases but the "magnitude" of \( \tilde{a} \) is fixed, then the number of "parallel hyperplanes" "pierced" by the "arrow" increases as the "arrow" becomes "longer", while if the "magnitude" of \( \tilde{a} \) increases but the "magnitude" of \( \vec{v} \) is fixed, then the number of "parallel hyperplanes" "pierced" by the "arrow" increases as the "parallel hyperplanes" become "more densely packed". Additionally, \( \tilde{a}(\vec{v}) = 0 \) means that the "arrow" of \( \vec{v} \) is "parallel" to the "parallel hyperplanes" of \( \tilde{a} \) and "perpendicular" to the directional line of "spacing" of the "parallel hyperplanes".

Scalars, vectors, covectors, and higher-order tensors are themselves invariant, but the components of vectors, covectors, and higher-order tensors may be contravariant or covariant. Typically, changing to another coordinate system is taken to mean changing the basis vectors, so basis vectors are considered to be covariant. With this in mind, vector components must be contravariant (changing in the opposite way from the basis vectors) for vectors to be invariant, basis covectors must be contravariant so that the inner products of basis covectors with basis vectors remain the invariant \( \delta_{i}^{j} \), and covector components must be covariant for covectors to be invariant.

A simple example (which I find easier to visualize than orthogonal rotation or, in the case of special relativity, boosting) is rescaling basis vectors by an arbitrary factor, so the new basis vectors are related to the old basis vectors by \( \vec{q}_{i} = r_{i} \vec{e}_{i} \) (where Einstein summation is not used in that equation or in any other equations in this example about rescaling). New vector components are related to old vector components by \( u'^{i} = \frac{u^{i}}{r_{i}} \) so that \( \vec{u} = \sum_{i} u^{i} \vec{e}_{i} = u'^{i} \vec{q}_{i} \). New basis covectors are related to old basis covectors by \( \tilde{\chi}^{i} = \frac{\tilde{\omega}^{i}}{r_{i}} \) so that \( \tilde{\chi}^{i} (\vec{q}_{j}) = \delta^{i}_{j} \). Finally, new covector components are related to old covector components by \( a'_{i} = r_{i} a_{i} \) so that \( \tilde{a} = \sum_{i} a_{i} \tilde{\omega}^{i} = \sum_{i} a'_{i} \tilde{\chi}^{i} \). (After this paragraph, Einstein summation will be used again until further notice.)
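
As a quick sanity check of this rescaling example, here is a minimal NumPy sketch; the specific numbers & variable names are my own illustrative choices, not any standard convention:

```python
import numpy as np

# Old orthonormal basis vectors e_i as columns of the identity matrix.
E = np.eye(3)

# Arbitrary per-axis rescaling factors r_i (no Einstein summation here).
r = np.array([2.0, 0.5, 4.0])

# New basis vectors q_i = r_i e_i (each column scaled by its r_i).
Q = E * r

# A vector with old components u^i; its new components are u'^i = u^i / r_i.
u_old = np.array([1.0, 2.0, 3.0])
u_new = u_old / r

# A covector with old components a_i; its new components are a'_i = r_i a_i.
a_old = np.array([4.0, 5.0, 6.0])
a_new = a_old * r

# The vector itself is invariant: u^i e_i == u'^i q_i.
assert np.allclose(E @ u_old, Q @ u_new)

# The scalar pairing a(u) = a_i u^i is invariant under the rescaling.
assert np.isclose(a_old @ u_old, a_new @ u_new)
print("vector and pairing are invariant under rescaling")
```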

It is worth noting that Christopher Columbus's failure to reach Asia when sailing west from Europe could be explained as an error of units of distance. In particular, the maps that he used showed distances in Arabic miles, whereas he was familiar with Roman miles, and Arabic miles were longer than Roman miles. This in turn can be cast in terms of contravariance & covariance: the basis vectors \( \{ \vec{q}_{i} \} \) that he was using were longer than the basis vectors \( \{ \vec{e}_{i} \} \) that he was expecting to use, and he only saw the altered components \( \{ u'^{i} \} \), so he erroneously constructed a shorter vector \( u'^{i} \vec{e}_{i} \) instead of properly reconstructing \( u'^{i} \vec{q}_{i} \), which he could have equated to \( u^{i} \vec{e}_{i} \) in terms of his more familiar basis vectors by reconstructing \( u^{i} \) from the scale factor.

Different definitions of coordinates and vectors on curved manifolds, and problematic notation

If a curved manifold is considered and is not assumed to be embedded in a higher-dimensional flat manifold, then it is no longer possible to visualize vectors as extending across the manifold or having the "same direction" everywhere. This is specifically because the direction of a vector can change when attempting to "move" it from one point in the manifold to another, and the way that it changes can itself vary depending on the path of points taken in the manifold between those two points. (This idea can be formalized in the notion of parallel transport, which requires notions of Lie derivatives or covariant derivatives that will be discussed later in this post.) Thus, when working with curved manifolds, many of the assumptions implicit in the definitions of functions, vectors, and covectors on flat manifolds must be revisited & in many cases significantly revised (in ways that can still be consistently applied to flat manifolds as special cases of generally curved manifolds).

Points, coordinates, and differentiability of the manifold 

At an even more basic level, on a curved manifold (unlike the case for a flat manifold), there is not necessarily a "natural" choice for the set of coordinates to use. Therefore, it is necessary to be more aware than when working with flat manifolds that the fundamental elements of manifolds are points, which cannot be added to each other or multiplied by scalars. Coordinates are simply convenient labeling schemes for those points, there may be different coordinate systems for a manifold that have different pros & cons, and some manifolds might require multiple coordinate systems to cover (with overlaps) different parts of the manifold.

Traditionally, points in the manifold are labeled \( p \), a choice of coordinates is called a chart \( \phi \) such that \( (x^{1}, x^{2}, \ldots, x^{N}) = \phi(p) \) gives the coordinates for a portion of the manifold for which the chart is valid, the ordered tuple \( (x^{1}, x^{2}, \ldots, x^{N}) \) is labeled as either \( x \) or \( x^{i} \) as shorthand, and the mapping between points and coordinates is one-to-one such that points can be recovered from coordinates as \( p = \phi^{-1}(x) \). I am not a huge fan of this notation in practice because while the labeling of the chart as \( \phi \) is instructive when learning the definitions, it becomes cumbersome, and because \( x \) or \( x^{i} \) could in some cases ambiguously refer to either the full coordinate tuple or a single element of the coordinate tuple; the use of \( x^{i} \) as an explicit argument of various objects becomes even more confusing when taking derivatives with respect to a specific coordinate element. Instead, I prefer the abuses of notation \( \breve{x} = (x^{1}, x^{2}, \ldots, x^{N}) \) for the coordinates, \( \breve{x}(p) \) instead of \( \phi(p) \) when going from points to coordinates, and \( p(\breve{x}) \) instead of \( \phi^{-1}(\breve{x}) \) when going from coordinates to points. Crucially, it is worth noting that \( \breve{x} = (x^{1}, x^{2}, \ldots, x^{N}) \) is not a vector because it does not transform like a vector under a general coordinate transformation at a point in the manifold; it is a convenient notation for a tuple.

It is worth thinking more carefully too about what it means for the manifold to be differentiable. Differentiability of the manifold means that if there are two coordinate systems covering at least part of a manifold with a non-empty overlap, then for points in that overlap, the mapping from one coordinate system to the other is smooth, and that this holds for any pair of coordinate systems with a non-empty overlap covering any part of the manifold.
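
As a concrete (if elementary) illustration of what a smooth transition map looks like, here is a short SymPy sketch using Cartesian & polar coordinates as two charts on the plane minus the origin; the variable names are my own choices:

```python
import sympy as sp

# Chart 1: Cartesian coordinates (x, y); chart 2: polar coordinates (rho, theta).
x, y = sp.symbols('x y', real=True)
rho, theta = sp.symbols('rho theta', positive=True)

# Transition map from polar to Cartesian coordinates on the overlap.
x_of = rho * sp.cos(theta)
y_of = rho * sp.sin(theta)

# The Jacobian of the transition map is built from smooth functions, and its
# determinant rho is nonzero away from the origin, so the transition map is
# smooth (and smoothly invertible) on the overlap of the two charts.
J = sp.Matrix([[sp.diff(x_of, rho), sp.diff(x_of, theta)],
               [sp.diff(y_of, rho), sp.diff(y_of, theta)]])
print(J)                     # [[cos(theta), -rho*sin(theta)], [sin(theta), rho*cos(theta)]]
print(sp.simplify(J.det()))  # rho
```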

Scalar functions/fields and curves on the manifold

It is possible to define scalar functions, which can also be called scalar fields, on the manifold. A scalar field maps a subset of the manifold (which could be the entire manifold) to the real line, associating a real number to each point in the manifold. As mentioned before the jump, I am only considering smooth functions, which are infinitely differentiable (with the latter point interpreted in a chosen coordinate system but holding true for at least one such coordinate system). Thus, the argument of a scalar field \( f \) is a point \( p \), and the field depends only indirectly on coordinates defined for a specific part of the manifold via \( f(p(\breve{x})) \).

It is also possible to define curves on the manifold. A curve is a continuous sequence of points in the manifold, parameterized by some variable \( t \) which is considered to monotonically increase from 0 at the beginning of the curve to 1 at the end of the curve; this parameterization in terms of increasing \( t \) in turn specifies the direction of the curve at every point. A curve is often denoted as a function \( \gamma \) mapping each value of the real parameter \( t \) to a point on the manifold. The differentiability of the manifold implies differentiability of curves on the manifold. Note that in general relativity, \( t \) is often used for the time coordinate \( x^{0} \) (which is an index, not an exponent), so the parameter is usually taken to be \( \tau \), which is the proper time for a massive particle and may take values different from the interval \( [0, 1] \), or \( \lambda \) when considering a massless particle for which proper time is undefined (in which case \( \lambda \in [0, 1] \) can be assumed more easily). For this post, I will call the parameter \( t \) for simplicity (distinguishing it from the coordinate \( x^{0} \)).

Scalar fields may be evaluated along parameterized curves. A function \( f \) evaluated along a curve \( \gamma \) may be written as \( f(\gamma(t)) \) for a given parameter value \( t \). It is also convenient to define the coordinate representation of the curve as \( \breve{X}(t) = \breve{x}(\gamma(t)) \); this is a sequence of coordinates parameterized by \( t \) and given by the conversion of the curve \( \gamma \) to the coordinates via the function \( \breve{x}(p) \) (as \( \gamma(t) \) for each \( t \) is a point in the manifold). This may look redundant at first, as \( f(p(\breve{X}(t))) \) is exactly \( f(\gamma(t)) \), but the notation \( f(p(\breve{X}(t))) \) becomes helpful when attempting to take spatial derivatives of \( f \), which becomes foundational to the definitions of vectors & covectors on curved manifolds.

As a final point about the notation used for curves, I strongly dislike the notation \( \breve{x}(t) \) because it confuses the general coordinate system defined by the function \( \breve{x}(p) \), mapping from the manifold to coordinates, with a specific sequence of coordinates given by the parameter \( t \). I much prefer to distinguish the latter by using the uppercase letter \( \breve{X} \) for a specific curve/trajectory of a particle or other physical object. An example of my attempt to use similar notation to minimize ambiguity can be seen in this post from 5 years ago about classical phase space densities [LINK], and similar points have been made in the textbook Structure and Interpretation of Classical Mechanics by Gerald Sussman & Jack Wisdom about the rigorous formalism & computational implementation of classical mechanics [LINK from MIT].

Derivatives and vectors on the manifold

The description of vectors on a curved manifold can conceptually be seen to come in two ways. One is from the evaluation of scalar fields over trajectories. The other is from the way that the manifold being differentiable implies that at any point in the manifold, the manifold is close enough to being flat in an arbitrarily small neighborhood around that point that it makes sense to use the directions of increasing coordinates to specify vectors.

Perhaps the more fundamental description of vectors comes from the evaluation of scalar fields over trajectories. This can be very rigorously defined without depending too much on specific coordinate systems, but I find it to be more intuitively helpful to depend a little more on such coordinate descriptions. A scalar field evaluated along a curve \( \gamma \) parameterized by \( t \in [0, 1] \) can be written as \( f(\gamma(t)) \) or equivalently as \( f(p(\breve{X}(t))) \) where \( \breve{X}(t) = \breve{x}(\gamma(t)) \). Understanding how the scalar field changes over the trajectory means computing the total derivative \( \frac{\mathrm{d}}{\mathrm{d}t} f(p(\breve{X}(t))) \). Because the dependence of \( f \) on the parameter \( t \) is only via \( \breve{X}(t) \) and because the dependence of \( f \) on the coordinates \( \breve{x} \) is only via \( p(\breve{x}) \), this can be evaluated through the chain rule as \( \frac{\mathrm{d}X^{i}}{\mathrm{d}t} \left(\partial_{i} f(p(\breve{x})) \right)|_{\breve{X}(t)} \). Thus, the rate of change of the coordinates of the trajectory with respect to the parameter can be interpreted as the components of a generalized velocity vector (which is truly a vector, unlike the tuple of coordinates) \( v^{i}(p)|_{p(\breve{X}(t))} = \frac{\mathrm{d}X^{i}}{\mathrm{d}t} \).
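
This chain rule computation can be verified symbolically. The following SymPy sketch uses an arbitrary illustrative scalar field & curve of my own choosing:

```python
import sympy as sp

t, x1, x2 = sp.symbols('t x1 x2', real=True)

# An illustrative scalar field in a chosen chart, f(x^1, x^2).
f = x1**2 * sp.sin(x2)

# Coordinate representation of an illustrative curve, X^i(t).
X = {x1: sp.cos(t), x2: t**2}

# Chain rule: df/dt = (dX^i/dt) * (d_i f), evaluated along the curve.
chain = sum(sp.diff(X[xi], t) * sp.diff(f, xi).subs(X) for xi in (x1, x2))

# Direct route: substitute the curve into f, then differentiate in t.
direct = sp.diff(f.subs(X), t)

assert sp.simplify(chain - direct) == 0
print(sp.simplify(chain))
```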

Without yet addressing the question of what the basis vectors are, if one were simply to posit a vector field described in part by its components \( \{ u^{i}(p) \} \) at every point in the manifold, then that vector field can effect a curve on the manifold in the form of an integral curve satisfying the (usually coupled, often nonlinear) set of differential equations \( \frac{\mathrm{d}X^{i}}{\mathrm{d}t} = u^{i}(p(\breve{X}(t))) \). Thus, just like how curves on the manifold can be used to derive vectors or vector fields, vector fields can be used to derive curves on the manifold.
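
As an illustration, the following sketch numerically integrates the integral curve of a simple rotational vector field using SciPy; the field & starting point are my own illustrative choices:

```python
import numpy as np
from scipy.integrate import solve_ivp

# An illustrative vector field on the plane, given by its components u^i(x).
def u(t, x):
    return [-x[1], x[0]]  # a rigid rotation field

# Solve the integral curve equations dX^i/dt = u^i(X(t)) starting from (1, 0).
# For this field, the integral curve through (1, 0) is the unit circle,
# traversed once as t runs over [0, 2*pi].
sol = solve_ivp(u, (0.0, 2 * np.pi), [1.0, 0.0], rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])  # close to the starting point (1, 0)
```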

The question of the definition of basis vectors in this conceptual framing can in principle be addressed by considering only abstract curves & trajectories on the manifold, but I find it more intuitive to appeal to the differentiability of the manifold to work in a coordinate system in the neighborhood of a point in the manifold. In the neighborhood (which may properly be defined with an appropriate choice of coordinate system) of a point in the manifold, it is possible to identify parametric curves on the manifold corresponding to the directions of increasing coordinates for each coordinate. The rates of change of a scalar field along directions of increasing \( x^{i} \) at that point for each \( i \) are simply given by the partial derivatives \( \left(\partial_{i} f(p(\breve{x})) \right)|_{\breve{x}(p)} \). This in turn means that a general directional derivative with respect to a vector at that point in the manifold is given by \( u^{i}(p) \left(\partial_{i} f(p(\breve{x})) \right)|_{\breve{x}(p)} \). This does not make any reference to any scalar parameter \( t \).

From either standpoint (trajectories on the whole manifold or coordinates increasing locally around a point), vectors have no extrinsic definition if the curved manifold is assumed to not be embedded in a higher-dimensional flat manifold. Directionality is given only through spatial derivatives at specified points. Therefore, at a specified point \( p \) in the manifold, the vector with components \( \{ u^{i}(p) \} \) can be written as the differential operator \( \hat{u}(p) = u^{i}(p) \partial_{i}|_{p} \), thereby leading to the identification of the partial derivatives \( \partial_{i}|_{p} \) with the basis vectors at the point \( p \) as \( \hat{u}(p) \) is a linear combination (as expected for a general vector) of the basis vectors at that point. A vector maps a scalar field to a scalar at a specified point \( p \), and its action on a scalar field at such a point can be written as \( \hat{u}(p)(f(p)) \); note that this is not equivalent to \( f(p)\hat{u}(p) \), so I have chosen to not write \( \hat{u}(p)(f(p)) \) as \( \hat{u}(p) f(p) \) because the latter might misleadingly imply commutativity of \( \hat{u}(p) \) with \( f(p) \).

These vectors are called tangent vectors because they are derived in the same way and mathematically behave in the same ways as tangent vectors to curves embedded in higher-dimensional flat manifolds (and because tangents are naturally related to spatial derivatives and an integral curve is formed by following tangent vectors starting from a point in the manifold). A vector is defined at a particular point in the manifold, while a vector field is defined for a part of or the entire manifold. Because of the manifold's curvature, it is not possible to compare vectors that exist at different points (even if they are part of the same overall vector field). Instead, each point has associated with it a vector space of its own, called a tangent space, with vectors at that point existing only within the tangent space associated with that point, and the abstract union of tangent spaces over the manifold is known as the tangent bundle. Therefore, although it might not be wrong to visualize vectors even on curved manifolds as objects like arrows with a certain magnitude & direction, it is important to remember that those objects only exist at specific points and cannot be compared between points, and vectors only have mathematical meaning when acting as differential operators on curves (or when combined in specific ways with covectors or other tensors, as will be shown later in this post). This is also why \( \breve{x} \) is not actually a vector; on a curved manifold, there is no unique way to define it as a displacement from a specified origin, and it cannot be treated as an element of a tangent space associated with a specific point in the manifold.

It is tempting to say that the spatial variation in a vector field along a curve of increasing \( x^{i} \) for fixed \( i \) can be given by an expression like \( \partial_{i} \left(u^{j}(p(\breve{x}))\, \partial_{j}|_{p(\breve{x})}\right) \) and use the product rule (\(\partial_{i} (f_{1} f_{2}) = f_{2} \partial_{i} f_{1} + f_{1} \partial_{i} f_{2} \)) to apply the outer derivative operator to both the components and basis vectors forming the vector. However, that does not work because basis vectors, even at nearby points, exist in different tangent spaces and therefore cannot be directly subtracted from each other to form the traditional definition of a derivative. Similar caveats will apply on a curved manifold to covectors, including basis covectors, as well, and therefore to general higher-order tensors too. Later in this post, I will discuss more careful constructions of such notions of derivatives, namely covariant derivatives & Lie derivatives. For now, it is clear that \( \partial_{i}|_{p} \) only acts like a typical partial derivative operator on scalar fields and on "scalar" components of vector fields, covector fields, and tensor fields. To emphasize that its action on other objects will be very different and that it is a basis vector that should be notationally treated the same as any other vector, I will denote the basis vectors as \( \hat{\partial}_{i}|_{p} \). This means that a basis vector's action on a scalar field is \( \hat{\partial}_{i}|_{p} f(p(\breve{x})) = \left(\partial_{i} f(p(\breve{x}))\right)|_{\breve{x}(p)} \), but its action on other objects like basis vectors or basis covectors will be different, and in such cases the basis vectors will retain the notational distinction \( \hat{\partial}_{i}|_{p} \). Thus, vectors & vector fields will be written as \( \vec{u}(p) = u^{i}(p)\hat{\partial}_{i}|_{p} \).

Lie derivatives

The most natural & relevant generalization of the partial derivative to a differentiable manifold is the covariant derivative. However, that requires the existence of some sort of connection coefficient, and for the manifolds used in general relativity, that requires a metric. At this point in this post, tensors have not yet been introduced. Therefore, it is useful to discuss the notion of a Lie derivative in order to provide a sense of how a generalization of a partial derivative may look on a differentiable manifold where one can assign scalar & vector fields but one cannot necessarily assume the existence of a metric.

A Lie derivative "evaluates the change of a tensor field (including scalar [fields], vector fields and [covector fields]), along the flow defined by another vector field" [LINK from Wikipedia]. The Lie derivative operation is generically written as \( \mathcal{L}_{\hat{u}} \) with respect to the vector field \( \hat{u} \). Derivations of formulas for Lie derivatives are quite complicated; it can be instructive to go through those derivations, but this post will focus more on intuition. Note that the vector field \( \hat{u} \) must be defined on a substantial subset of the manifold, not just as a vector at a single point.

The change in a scalar field along the flow of a vector field can be mathematically formalized exactly according to the procedures laid out in previous sections of this post: define a trajectory (flow) whose derivative with respect to a parameter \( t \) is given by a vector field evaluated along that trajectory, such that the change in a scalar field for a particle following that trajectory is the directional derivative of the scalar field at every point along the trajectory with respect to the vector field along the trajectory. Thus, the Lie derivative of a scalar field \( f \) with respect to a vector field \( \hat{u} \) is \( \mathcal{L}_{\hat{u}} f = \hat{u}(f) \). This is easy to justify because scalars assigned to different points on the manifold can be directly compared to each other.

The change in a vector field along the flow of another vector field is harder to intuitively picture because of the fundamental restriction that vectors at different points on the manifold cannot be directly compared to each other. This in turn makes the mathematical derivation more difficult.

One attempt could be to see how a vector field \( \hat{v} \), once it has acted on a scalar field \( f \), changes along the flow of another vector field \( \hat{u} \). This would give the quantity \( \hat{u}(\hat{v}(f)) \), as \( \hat{v}(f) \) is another scalar field. However, this has two problems.

  • That quantity has "vector" components that together would not obey coordinate transformation rules as a single vector as desired for the Lie derivative of a vector field with respect to another vector field.
  • That quantity does not account for how \( f \) itself changes along the flow of \( \hat{u} \) before considering the action of \( \hat{v} \).

The latter point suggests the solution of subtracting the reverse ordering of the differential operators. Indeed, the correct expression for the Lie derivative of a vector field with respect to another vector field is a vector field acting on a scalar field: \( (\mathcal{L}_{\hat{u}} \hat{v})(f) = \hat{u}(\hat{v}(f)) - \hat{v}(\hat{u}(f)) \). This gives how the action of \( \hat{v} \) on \( f \) changes along the flow generated by \( \hat{u} \), accounting for how \( f \) itself is changed by the flow generated by \( \hat{u} \).
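
The cancellation of second-derivative terms in this expression, leaving a genuine first-order vector field with components \( u^{i} \partial_{i} v^{j} - v^{i} \partial_{i} u^{j} \), can be checked symbolically. Here is a SymPy sketch with arbitrary illustrative choices of vector fields & scalar field:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
coords = (x1, x2)

# Illustrative vector field components and a test scalar field.
u = [x1 * x2, x2]
v = [sp.sin(x2), x1]
f = sp.exp(x1) * x2

def act(w, g):
    """Action of a vector field with components w^i on a scalar field g."""
    return sum(wi * sp.diff(g, xi) for wi, xi in zip(w, coords))

# u(v(f)) - v(u(f)): the second-derivative terms cancel, leaving the action
# of a vector field whose components are u^i d_i v^j - v^i d_i u^j.
bracket_on_f = sp.expand(act(u, act(v, f)) - act(v, act(u, f)))
lie_uv = [sp.expand(act(u, vj) - act(v, uj)) for uj, vj in zip(u, v)]

assert sp.simplify(bracket_on_f - act(lie_uv, f)) == 0
print(lie_uv)
```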

It is common for the Lie derivative of a vector field with respect to another vector field to be simply defined, without reference to a scalar field as an argument, as \( \mathcal{L}_{\hat{u}} \hat{v} = \hat{u}\hat{v} - \hat{v}\hat{u} \). However, this raises the problems from earlier in this post about how vectors act on scalar fields versus other types of objects. These kinds of problems with consistent notation will come up later in this post in other contexts (particularly covectors, tensors, and covariant derivatives) and motivate me in this post to introduce slightly different notation & intuition to avoid these notational difficulties, knowing that in practice in general relativity, the existence of a metric makes the covariant derivative more important than the Lie derivative.

Gradients and covectors on the manifold

It has taken me a long time to understand why gradients naturally lead to covectors on curved manifolds and why these gradients should be distinguished from directional derivatives, and I still feel like my understanding is incomplete. In elementary multivariable calculus, I learned that the gradient is a vector \( \nabla f(\vec{x}) = (\partial_{i} f) \vec{e}_{i} \) (where Einstein summation is used without distinguishing contravariant versus covariant objects, as can be done slightly sloppily in a flat manifold with a trivial metric), so the directional derivative with respect to a vector \( \vec{v} \) at a given point was simply given by \( \vec{v} \cdot \nabla f(\vec{x}) = (\nabla f(\vec{x})) \cdot \vec{v} = v_{i} \partial_{i} f \) (where Einstein summation is again used without distinguishing contravariant versus covariant objects). I think that I have had to unlearn some of this intuition, and that unlearning is not complete, so this may at least in part explain my struggles with understanding these concepts. Having gone over what I have had to unlearn, the rest of this post, until further notice, will return to the proper use of Einstein summation with contravariant & covariant objects.

In elementary multivariable calculus, the notation \( \vec{v} \cdot \nabla f(\vec{x}) \) makes it tempting to treat the gradient \( \nabla f(\vec{x}) \) as a vector just like any other vector \( \vec{v} \). It is said to be oriented in the direction where \( f \) is increasing fastest and have a magnitude which captures how "fast" \( f \) is increasing in that direction. The notion of the covariance of the components of \( \nabla f(\vec{x}) \) is brushed aside as an annoyance in a flat manifold that only needs to be dealt with on occasion through appropriate application of the Jacobian of a coordinate transformation. However, the treatment of a gradient as a covector can be seen most naturally by considering that the gradient shows the density of level sets of \( f \). In particular, level sets are most dense (smallest spacing) when the gradient has the largest magnitude, and the direction perpendicular to the level sets at any given point is the generalized direction of that gradient covector.

Thus, the gradient covector can be defined for a scalar field \( f \) at a manifold point \( p \) as \( \tilde{\mathrm{d}}f(p) = (\partial_{i} f)|_{\breve{x}(p)} \tilde{\omega}^{i}(p) \) where the basis covectors \( \tilde{\omega}^{i}(p) \) are being used provisionally but will be replaced with a better definition very soon. If the scalar field is defined as \( f(p(\breve{x})) = x^{i}(p) \) for a fixed index \( i \), then one can show that \( \tilde{\mathrm{d}}x^{i}(p) = \tilde{\omega}^{i}(p) \) (because \( \partial_{j} x^{i} = \delta_{j}^{\, i} \)). The notation \( \tilde{\mathrm{d}}x^{i}(p) \) is much more common in differential geometry & general relativity for curved manifolds. Moreover, it has an intuitive interpretation: the basis covector in a given direction is the gradient covector of the coordinate in the same direction, with unit magnitude for the spacing of level sets (parallel hypersurfaces).

Any covector can be defined as a weighted sum of basis covectors: at a manifold point \( p \), \( \tilde{a} = a_{i} \tilde{\mathrm{d}}x^{i}(p) \). Gradient covectors are covectors. However, not every covector is a gradient covector! A covector \( \tilde{a} \) is (at least locally) a gradient covector iff \( \partial_{i} a_{j} - \partial_{j} a_{i} = 0 \), which in turn requires \( \tilde{a} \) to actually be a covector field defined over at least a neighborhood of \( p \) (not only at \( p \)) for these derivatives to make sense.
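
This test is easy to implement symbolically. The following SymPy sketch (with illustrative covector fields of my own choosing) checks the condition for one covector field that is a gradient and one that is not:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
coords = (x1, x2)

def is_gradient(a):
    """Check d_i a_j - d_j a_i = 0 for a covector field with components a_i."""
    return all(sp.simplify(sp.diff(a[j], coords[i]) - sp.diff(a[i], coords[j])) == 0
               for i in range(2) for j in range(2))

# The gradient of f = x1 * x2**2 has components (x2**2, 2*x1*x2): passes.
print(is_gradient([x2**2, 2 * x1 * x2]))  # True
# The "rotational" covector field with components (-x2, x1) is not a gradient.
print(is_gradient([-x2, x1]))             # False
```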

The basis covectors are related to the basis vectors such that for any manifold point \( p \), \( \tilde{\mathrm{d}}x^{i}(p) (\hat{\partial}_{j}|_{p}) = \delta^{i}_{\, j} \). The ordering of basis covectors acting on basis vectors, and not the reverse, becomes important in curved manifolds for reasons that will become clear later in this post. This means that at any manifold point \( p \), given a vector \( \hat{v} \) and covector \( \tilde{a} \), \( \tilde{a}(\hat{v}) = a_{i} v^{i} \) is an invariant scalar. Notably, this implies that for a scalar field \( f \) at a manifold point \( p \), \( \tilde{\mathrm{d}}f(p)(\hat{v}(p)) = \hat{v}(f)|_{p} = (\partial_{i} f)|_{p} v^{i}(p) \) is the directional derivative of \( f \) in the direction of the vector \( \hat{v} \).

Through the rest of this post, I will be a little less strict with the notation of specifying vectors, covectors, or tensors to be restricted to specific points on the manifold. This also means that the notation won't necessarily distinguish vectors, covectors, and tensors at specific points from field counterparts defined over at least one significant subset of the manifold, so the distinctions will only be clear from the text. 

My linear algebraic intuition for basis vectors and basis covectors

Although vectors in differential geometry are treated as differential operators that can act on scalar fields, they do also follow the usual rules of elements of a vector space. This has made me try to picture these vectors in terms of concepts from linear algebra. In particular, a scalar field may be written as a column vector in which each row corresponds to a point on the manifold and the value in each row is the value of the function at that point on the manifold. A vector at a point on the manifold is a differential operator that maps a scalar field to a single real number associated with that point. This suggests that a vector, as a differential operator, is a row vector, and the action of that differential operator on the scalar field is the inner product of the row vector with the column vector. In particular, in the limit of a coordinate displacement \( \Delta x^{i} \to 0 \) for a fixed direction \( i \), the basis vector \( \hat{\partial}_{i}|_{p} \) is a row vector such that the element at the column corresponding to the point \( p \) is \( \frac{1}{\Delta x^{i}} \), the element at the column corresponding to the point \( p(x^{1}, x^{2}, \ldots, x^{i} - \Delta x^{i}, \ldots, x^{N}) \) is \( -\frac{1}{\Delta x^{i}} \), and the elements at all other columns are zero. Any vector at a manifold point \( p \) would be a linear combination of those basis vectors at that manifold point.

If scalar functions and vectors are defined in these ways, then covectors can also be defined as column vectors. In particular, in the limit of a coordinate displacement \( \Delta x^{i} \to 0 \) for a fixed direction \( i \), the basis covector \( \tilde{\mathrm{d}}x^{i}(p) \) is a column vector such that the elements are zero for all rows corresponding to points \( p \) in which \( x^{i} \) is less than the specified value and \( \Delta x^{i} \) for all other rows. Any covector at a manifold point \( p \) would be a linear combination of those basis covectors at that manifold point. This has the following benefits.

  • The fact that both scalar fields and covectors are represented as column vectors shows why it is not possible to claim that a covector "acts" on a scalar field.
  • If the action of a covector on a vector, yielding a scalar, is defined as the trace of the outer product of a column vector with a row vector, then this formulation of basis vectors & basis covectors is a discrete statement of the equation \( \frac{\partial}{\partial x} \Theta(x - x') = \delta(x - x') \) where \( \Theta \) is the Heaviside step function. This is particularly helpful because differentiable manifolds consider smooth scalar fields (which are infinitely differentiable everywhere), whereas the Heaviside step function is not continuous, so covectors are inherently distinct from scalar functions despite both being represented as column vectors.

However, this formulation has the following drawbacks.

  • If a vector is treated as a row vector and a covector is treated as a column vector, then it may be more natural to simply take the inner product of the row vector with the column vector to yield the scalar instead of taking the more cumbersome trace of the outer product. However, this may erroneously lead to the conclusion that it is possible to evaluate a vector that takes a covector as its function argument; this is not possible.
  • In addition to following the usual rules for elements of a vector space (largely linearity), vectors on a differentiable manifold must follow the Leibniz rule, meaning that \( \hat{v}(f_{1} f_{2}) = f_{1} \hat{v}(f_{2}) + f_{2} \hat{v}(f_{1}) \). The way to implement this using row vectors, column vectors, and matrices is quite cumbersome & unintuitive by comparison.
  • The fact that vectors (and covectors) at different points on the manifold cannot be added to each other is much harder to keep track of when working with row vectors or column vectors.

Thus, this formulation should not be taken too seriously.

For completeness, though, it is worth noting that a vector field, defined at every point on the manifold, can be represented as a matrix, such that each row corresponds to a vector at a different point on the manifold, and a covector field, defined at every point on the manifold, can be represented as a matrix, such that each column corresponds to a covector at a different point on the manifold. The action of a vector field on a scalar field is another scalar field, which matches how the action of the matrix representing a vector field on the column vector representing a scalar field yields another column vector. This also has the benefit of letting the Lie derivative of a vector field with respect to another vector field be represented as the commutator of two matrices representing vector fields. However, this also presents the following problems.

  • A matrix representing a covector field must be multiplied to the left of a matrix representing a vector field for the result to make sense. The reverse is not allowed.
  • Even if a matrix representing a covector field is multiplied to the left of a matrix representing a vector field, the result is another matrix, whereas it should be a scalar field that should be represented as a column vector.
  • A covector field cannot act on a scalar field or a covector, even though a matrix can be multiplied to the left of a column vector.

These problems underscore why this formulation should not be taken too seriously.

Tensors

Tensors are essentially linear combinations of tensor products (which is a little tautological, but I use that term to avoid the notion of "outer products" in the linear algebraic perspective) of vectors and covectors with each other any number of times. Furthermore, a basis tensor of a certain type is simply the required outer product of basis vectors & basis covectors. A tensor can also be seen as a map from an ordered tuple of vectors and covectors to a scalar; the vectors, covectors, and tensor must all be defined at the same point on the manifold.

A general tensor of order \( (m, n) \) can be written as \( T =  T^{i_{1}i_{2}\ldots i_{m}}_{\;\;\;\;\; j_{1}j_{2}\ldots j_{n}} \bigotimes_{k = 1}^{m} \hat{\partial}_{i_{k}} \bigotimes_{l = 1}^{n} \tilde{\mathrm{d}}x^{j_{l}} \). This maps \( n \) vectors and \( m \) covectors to a scalar. For example, a tensor of order (3, 0) can be written as \( A = A^{ijk} \hat{\partial}_{i} \otimes \hat{\partial}_{j} \otimes \hat{\partial}_{k} \), while a tensor of order (1, 3) can be written as \( B = B^{i}_{\; jkl} \hat{\partial}_{i} \otimes \tilde{\mathrm{d}}x^{j} \otimes \tilde{\mathrm{d}}x^{k} \otimes \tilde{\mathrm{d}}x^{l} \).

An important tensor is the metric \( g = g_{ij} \tilde{\mathrm{d}}x^{i} \otimes \tilde{\mathrm{d}}x^{j} \). This object gives a notion of inner products between vectors or covectors by converting vectors into equivalent covectors & vice versa. Given a vector \( \hat{v} \), the equivalent covector \( \tilde{v} \) is defined such that for any vector \( \hat{u} \), \( \tilde{v}(\hat{u}) = g(\hat{v}, \hat{u}) \). In particular, \( \tilde{v} = v_{i} \tilde{\mathrm{d}}x^{i} \), so \( \tilde{v}(\hat{u}) = v_{j} u^{j} = g(\hat{v}, \hat{u}) = g_{ij} v^{i} u^{j} \) means that \( v_{j} = g_{ij} v^{i} \). In general relativity, the metric is symmetric, so \( g(\hat{u}, \hat{v}) = g(\hat{v}, \hat{u}) \) for any pair of vectors; this means that \( g_{ij} = g_{ji} \).

If \( \hat{u}(\gamma(t)) \) is a velocity vector field evaluated at a time parameter \( t \) along a trajectory \( \gamma \) and the manifold is Riemannian, then \( \int_{0}^{t} \sqrt{g(\hat{u}(\gamma(t')), \hat{u}(\gamma(t')))}~\mathrm{d}t' \) is the distance traveled at time \( t \); if the manifold is Lorentzian, then this interpretation of distance only holds if the trajectory is spacelike (in which case \( t \) is a parameter that cannot be interpreted as time), whereas if the trajectory is timelike, then the expression \( \int_{0}^{t} \sqrt{-g(\hat{u}(\gamma(t')), \hat{u}(\gamma(t')))}~\mathrm{d}t' \) is the proper time experienced by the particle along the trajectory.
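
As a trivially simple Riemannian example of this arc-length formula, here is a numerical sketch for a circular trajectory in polar coordinates on the Euclidean plane; the radius & parameterization are my own illustrative choices:

```python
import numpy as np
from scipy.integrate import quad

# Circle of radius R in polar coordinates: rho(t) = R, theta(t) = 2*pi*t,
# with t in [0, 1]; the velocity components are u = (0, 2*pi).
R = 2.0

# g(u, u) = (d rho/dt)^2 + rho^2 (d theta/dt)^2 for the Euclidean plane in
# polar coordinates, so the speed is constant along this trajectory.
def speed(t):
    return np.sqrt(0.0**2 + R**2 * (2 * np.pi)**2)

length, _ = quad(speed, 0.0, 1.0)
print(length, 2 * np.pi * R)  # both ~12.566: the circumference, as expected
```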

Getting the equivalent vector components of a covector requires the inverse metric \( g^{-1} = g^{ij} \hat{\partial}_{i} \otimes \hat{\partial}_{j} \), which is symmetric just like the metric. The inverse metric is such that if \( \tilde{u} \) & \( \tilde{v} \) are respectively the equivalent covectors of the vectors \( \hat{u} \) & \( \hat{v} \), then \( g^{-1}(\tilde{u}, \tilde{v}) = g(\hat{u}, \hat{v}) \). Moreover, the components of the inverse metric are such that \( g_{ik} g^{kj} = \delta_{i}^{\; j} \) as expected.
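
The following SymPy sketch illustrates the metric, the inverse metric, and index lowering & raising, using the Euclidean plane in polar coordinates as an illustrative example:

```python
import sympy as sp

rho, theta = sp.symbols('rho theta', positive=True)
v1, v2 = sp.symbols('v1 v2', real=True)

# Metric components g_ij of the Euclidean plane in polar coordinates.
g = sp.Matrix([[1, 0], [0, rho**2]])
g_inv = g.inv()  # components g^ij of the inverse metric

# g_ik g^kj = delta_i^j, as expected.
assert sp.simplify(g * g_inv) == sp.eye(2)

# Lowering an index: v_j = g_ij v^i (the metric is symmetric, so the
# matrix product implements the contraction either way).
v_up = sp.Matrix([v1, v2])
v_down = g * v_up
print(v_down.T)  # components (v1, rho**2 * v2)

# Raising the index back with the inverse metric recovers the original.
assert sp.simplify(g_inv * v_down) == v_up
```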

Thus, the metric effectively makes contravariant things covariant, while the inverse metric does the reverse. This means that the components of the metric can be used to create lowered-index versions of tensor components. As an example, if \( A = A^{ijk} \hat{\partial}_{i} \otimes \hat{\partial}_{j} \otimes \hat{\partial}_{k} \), then \( A^{ij}_{\; k} = g_{kl} A^{ijl} \); this can be obtained as \( A^{ijk} \hat{\partial}_{i} \otimes \hat{\partial}_{j} \otimes g(\hat{\partial}_{k}, \hat{\partial}_{l}) \tilde{\mathrm{d}}x^{l} \). Similarly, the components of the inverse metric can be used to create raised-index versions of tensor components. One must be careful though about the reuse of symbols to have raised versus lowered indices, especially if the lowered indices do not come after all of the raised indices.

Covariant derivatives

The most natural generalization of the partial or directional derivatives to act on vector fields, covector fields, or higher-order tensor fields (not merely scalar fields) on curved manifolds which have a metric is the covariant derivative. Many texts about differential geometry in the context of general relativity will use language like "the covariant derivative shows how to connect a tensor field at two different points on the manifold", but those same texts will start from the formula \( \nabla_{i} \hat{\partial}_{j} = \Gamma^{k}_{\; ij} \hat{\partial}_{k} \). The latter formula suggests that given a curve \( \gamma \) in the neighborhood of a point \( p \) (such that \( \gamma(0) = p \)) in which the curve corresponds to the flow generated by the vector \( \hat{u} \) at that point, the "change" in the basis vectors in that direction (\( \hat{u} \)) due to the curvature of the manifold (or at least the nontrivial nature of connection coefficients in the chosen coordinate chart) can be written as something like \( \mathrm{d}\hat{\partial}_{j}|_{\gamma(\mathrm{d}t)} = \nabla_{(\hat{u})~\mathrm{d}t} \hat{\partial}_{j}|_{p} = u^{i} \nabla_{i} \hat{\partial}_{j}|_{p}~\mathrm{d}t = u^{i} \Gamma^{k}_{\; ij} \hat{\partial}_{k}|_{p}~\mathrm{d}t \). This should raise alarm because basis vectors at different points on the manifold exist in different tangent spaces that cannot be compared. That in turn means that the only way to interpret the action of a covariant derivative on a tensor field with respect to a vector is to consider how the components change. Thus, a more careful exposition is needed to properly build the intuition of what is happening.

The formula \( \nabla_{i} \hat{\partial}_{j} = \Gamma^{k}_{\; ij} \hat{\partial}_{k} \) is indeed correct, but it does not provide much intuition at this stage. Instead, it is more helpful to consider a vector field \( \hat{y} = y^{i} \hat{\partial}_{i} \), so that \( (\nabla_{i} \hat{y})|_{p} = ((\partial_{i} y^{k})|_{p} + \Gamma^{k}_{\; ij} y^{j}(p)) \hat{\partial}_{k}|_{p} \) (after relabeling indices that have been summed over). Rather than attempting to claim that the basis vectors themselves are changing in a way that would invite comparison between basis vectors (that are incomparable because they exist in different tangent spaces corresponding to different points on the manifold), this equation simply shows how at a manifold point \( p \) in the direction of increasing \( x^{i} \) for fixed \( i \), the components of a vector field \( \hat{y} \) have changes not only from the changes in position as expected in \( (\partial_{i} y^{k})|_{p} \) but also from the curvature of the manifold as encoded by \( \Gamma^{k}_{\; ij} y^{j}(p) \). Together, the components of the whole object transform as the components of a vector should.
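
Because I am only considering torsion-free connection coefficients expressible in terms of the metric (per the conventions at the top of this post), these are the Levi-Civita connection coefficients \( \Gamma^{k}_{\; ij} = \frac{1}{2} g^{kl} \left( \partial_{i} g_{jl} + \partial_{j} g_{il} - \partial_{l} g_{ij} \right) \). Here is a SymPy sketch computing them for the illustrative case of polar coordinates on the Euclidean plane:

```python
import sympy as sp

rho, theta = sp.symbols('rho theta', positive=True)
coords = (rho, theta)
N = 2

# Euclidean plane in polar coordinates: g_ij = diag(1, rho^2).
g = sp.Matrix([[1, 0], [0, rho**2]])
g_inv = g.inv()

# Levi-Civita (torsion-free, metric-compatible) connection coefficients:
# Gamma^k_ij = (1/2) g^kl (d_i g_jl + d_j g_il - d_l g_ij).
Gamma = [[[sp.simplify(sum(sp.Rational(1, 2) * g_inv[k, l] *
                           (sp.diff(g[j, l], coords[i]) +
                            sp.diff(g[i, l], coords[j]) -
                            sp.diff(g[i, j], coords[l]))
                           for l in range(N)))
           for j in range(N)] for i in range(N)] for k in range(N)]

# The only nonzero coefficients: Gamma^rho_thetatheta = -rho and
# Gamma^theta_rhotheta = Gamma^theta_thetarho = 1/rho.
print(Gamma[0][1][1], Gamma[1][0][1], Gamma[1][1][0])
```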

This is most useful to show how a vector can be "moved" from one point to another. Given a parametric curve \( \gamma \) with parameter \( t \) such that \( u^{i}(\gamma(t)) = \frac{\mathrm{d}}{\mathrm{d}t} x^{i}(\gamma(t)) \), if the vector field \( \hat{y} \) takes on the vector value \( \hat{Y} \) at the point \( \gamma(0) \), then the components of what \( \hat{y} \) "would be" at another point along that curve \( \gamma(t) \) can be found by requiring that \( \hat{y} \) be invariant under parallel transport along that curve. Mathematically, this means that \( \nabla_{\hat{u}(\gamma(t))} \hat{y} = 0 \) must be solved, yielding the set of coupled ordinary differential equations \( \frac{\mathrm{d}}{\mathrm{d}t} y^{i}(\gamma(t)) + \Gamma^{i}_{\; jk} u^{j}(\gamma(t)) y^{k}(\gamma(t)) = 0 \) (having used the fact that \( u^{k} \partial_{k} y^{i}|_{\gamma(t)} = \frac{\mathrm{d}}{\mathrm{d}t} y^{i}(\gamma(t)) \) as \( \gamma \) is an integral curve of \( \hat{u} \)).
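
These parallel transport equations can be integrated numerically. The following sketch transports a vector around a circle of constant colatitude on the unit sphere (a standard example; the specific colatitude & initial vector are my own choices) and shows the net rotation by \( 2\pi \cos\theta_{0} \) that signals the sphere's curvature:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Unit sphere with coordinates (theta, phi) and metric diag(1, sin^2 theta).
# Curve: theta = theta0, phi = 2*pi*t for t in [0, 1], so the velocity
# components are u = (0, 2*pi). The nonzero Levi-Civita coefficients are
# Gamma^theta_phiphi = -sin(theta) cos(theta) and Gamma^phi_thetaphi = cot(theta).
theta0 = np.pi / 4
u_phi = 2 * np.pi

# Parallel transport: dy^i/dt = -Gamma^i_jk u^j y^k along the curve.
def rhs(t, y):
    y_th, y_ph = y
    dy_th = np.sin(theta0) * np.cos(theta0) * u_phi * y_ph
    dy_ph = -(np.cos(theta0) / np.sin(theta0)) * u_phi * y_th
    return [dy_th, dy_ph]

sol = solve_ivp(rhs, (0.0, 1.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
y_th, y_ph = sol.y[:, -1]

# In the orthonormal frame (y^theta, sin(theta0) * y^phi), the transported
# vector comes back rotated by w = 2*pi*cos(theta0) even though it "did not
# change" pointwise along the way: the holonomy of the sphere's curvature.
w = 2 * np.pi * np.cos(theta0)
print(y_th, np.sin(theta0) * y_ph)  # ~ (cos w, -sin w)
print(np.cos(w), -np.sin(w))
```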

Properties

The covariant derivative \( \nabla_{\hat{u}} T \) of a tensor field \( T \) in the direction of the vector \( \hat{u} \) at the manifold point \( p \) has the following properties.

  • The result \( \nabla_{\hat{u}} T \) is a tensor of the same order as \( T \).
  • It is linear in the direction vector \( \hat{u} \), meaning that if \( \hat{u} \) & \( \hat{v} \) are vectors while \( a \) & \( b \) are scalars, then \( \nabla_{a\hat{u} + b\hat{v}} T = a\nabla_{\hat{u}} T + b\nabla_{\hat{v}} T \).
  • It is linear in the tensor field \( T \) with respect to constant scalars, meaning that if \( S \) & \( T \) are tensor fields while \( a \) & \( b \) are uniform scalars (as opposed to scalar fields), then \( \nabla_{\hat{u}} (aS + bT) = a\nabla_{\hat{u}} S + b\nabla_{\hat{u}} T \).
  • It obeys the Leibniz rule, so if \( S \) & \( T \) are tensor fields, then \( \nabla_{\hat{u}} (S \otimes T) = (\nabla_{\hat{u}} S) \otimes T + S \otimes (\nabla_{\hat{u}} T) \). (If \( S \) is a scalar field, then the tensor product reduces to ordinary multiplication. This is how the formula for the covariant derivative of a vector field with respect to a vector is derived in terms of explicit components.)
  • For a fixed tensor field \( T \) of order \( (m, n) \), the quantity \( \nabla T \) is a tensor of order \( (m, n + 1) \), because it maps vectors \( \hat{u} \) to tensors \( \nabla_{\hat{u}} T \) of order \( (m, n) \). It can be written explicitly as \( \nabla T|_{p} = (\nabla_{j} T)|_{p} \otimes \tilde{\mathrm{d}}x^{j}(p) \).

Index notation and covariant derivatives

It is technically more proper to say that the subscript of the covariant derivative symbol \( \nabla \) should be a vector, not merely an index. In particular, for any tensor field \( T \) and vector \( \hat{u} = u^{i} \hat{\partial}_{i} \), \( \nabla_{\hat{u}} T = u^{i} \nabla_{\hat{\partial}_{i}} T \) holds by linearity. The objects \( \nabla_{\hat{\partial}_{i}} T \) are tensors of order \( (m, n) \) (the same as \( T \)), but they transform in an additional way due to the transformation of the basis vectors in the subscript of the covariant derivative that cancels the transformation of the vector components \( u^{i} \) (hence the name "covariant" derivative). In any case, it is common to abbreviate \( \nabla_{\hat{\partial}_{i}} T \) as \( \nabla_{i} T \). As long as one recognizes that these two objects are the same for each index \( i \) and are tensors of order \( (m, n) \), and as long as one remembers the transformation properties implied by the dangling basis vector \( \hat{\partial}_{i} \) in the subscript of the covariant derivative, I think that this is a forgivable abuse of notation.

The arguably more problematic abuse of notation is applying the covariant derivative notation \( \nabla_{i} \) to tensor components. For example, the covariant derivative of a vector \( \hat{y} \) in the direction of increasing \( x^{i} \) for fixed \( i \) is \( \nabla_{i} \hat{y} = (\partial_{i} y^{k} + \Gamma^{k}_{\; ij} y^{j}) \hat{\partial}_{k} \), and this is often abbreviated as \( \nabla_{i} y^{j} = \partial_{i} y^{j} + \Gamma^{j}_{\; ik} y^{k} \). The problem is that if the components of \( \hat{y} \) are treated as independent scalar fields, then \( \nabla_{i} y^{j} \) would just be \( \partial_{i} y^{j} \); the irony is that this very property (coming after application of the Leibniz rule to the product of the components \( y^{j} \) with the basis vectors \( \hat{\partial}_{j} \)) is what allows the derivation of the components \( \partial_{i} y^{j} + \Gamma^{j}_{\; ik} y^{k} \) of the covariant derivative vector \( \nabla_{i} \hat{y} \). At the same time, I recognize that the notation \( \nabla_{i} y^{j} \) emphasizes the way that \( \nabla \hat{y} \) is a tensor of order (1, 1), because for a fixed vector field \( \hat{y} \), it maps vectors \( \hat{u} \) to vectors \( \nabla_{\hat{u}} \hat{y} \). In particular, that tensor of order (1, 1) can be written explicitly as \( \nabla \hat{y} = (\nabla_{i} \hat{y}) \otimes \tilde{\mathrm{d}}x^{i} = (\partial_{i} y^{j} + \Gamma^{j}_{\; ik} y^{k}) \hat{\partial}_{j} \otimes \tilde{\mathrm{d}}x^{i} \), so following the usual abuse of notation would yield \( \nabla \hat{y} = (\nabla_{i} y^{j}) \hat{\partial}_{j} \otimes \tilde{\mathrm{d}}x^{i} \). Perhaps the better practice would be to use the notation \( \tilde{\mathrm{d}}x^{j} (\nabla_{i} \hat{y}) \) instead of \( \nabla_{i} y^{j} \), especially as the former shows that it is component \( j \) of the vector \( \nabla_{i} \hat{y} \), and generalize this appropriately to higher-order tensors.
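
For concreteness, here is a SymPy sketch of the components \( \partial_{i} y^{j} + \Gamma^{j}_{\; ik} y^{k} \) for an illustrative vector field in polar coordinates on the Euclidean plane, reusing the connection coefficients computed in the earlier sketch (the field itself is my own arbitrary choice):

```python
import sympy as sp

rho, theta = sp.symbols('rho theta', positive=True)
coords = (rho, theta)
N = 2

# Nonzero Levi-Civita coefficients for polar coordinates on the plane
# (from the earlier sketch), indexed as Gamma[j][i][k] for Gamma^j_ik.
Gamma = [[[0, 0], [0, -rho]],
         [[0, 1/rho], [1/rho, 0]]]

# An illustrative vector field with components (y^rho, y^theta).
y = [rho * sp.cos(theta), sp.sin(theta)]

# The components usually abbreviated nabla_i y^j = d_i y^j + Gamma^j_ik y^k;
# in the notation suggested above, these are dx^j(nabla_i y-hat).
cov = [[sp.simplify(sp.diff(y[j], coords[i]) +
                    sum(Gamma[j][i][k] * y[k] for k in range(N)))
        for j in range(N)] for i in range(N)]
print(sp.Matrix(cov))
```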

Problems with denoting vectors as partial derivatives

Throughout this post, I have illustrated aspects of typical notation in differential geometry that I have found to be problematic. At those points in this post, I have attempted to reconcile these problems with more fundamental findings in differential geometry and use similar but slightly more clear notation. However, in differential geometry as applied to general relativity, I find the use of differential operators to denote vectors to be extremely problematic for the following reasons.

  • If a vector \( \hat{v} \) is represented in terms of coordinate differential operators as \( \hat{v} = v^{i} \hat{\partial}_{i} \), then there is an easy interpretation of its action on a scalar field \( f \) as the directional derivative \( \hat{v}(f) = v^{i} \partial_{i} f \). However, what would be the action of a hypothetical tensor of order (2, 0) \( T = \hat{u} \otimes \hat{v} \) on a scalar field \( f \)? Would it be \( \hat{u}(f) \hat{v} \), \( \hat{v}(f) \hat{u} \), or \( \hat{u}(f)\hat{v} + \hat{v}(f) \hat{u} \)? The mathematical result and intuitive interpretation are both non-obvious. It may even be possible that a higher-order tensor, despite being made of tensor products involving vectors, cannot act on a scalar field in any meaningful way, which would negate the need for expressing vectors as differential operators.
  • The action of a vector \( \hat{v} \) on a scalar field \( f \) is easily written as \( \hat{v}(f) = v^{i} \partial_{i} f \). However, what is the action of a vector on anything other than a scalar field? Can such a thing be well-defined? The following points are salient.
    • The action of a covector on a vector is defined to be \( \tilde{a}(\hat{v}) = a_{i} v^{i} \), so \( \tilde{\mathrm{d}}x^{i} (\hat{\partial}_{j}) = \delta^{i}_{\; j} \). However, what is the action of a vector on a covector? It would have been nice if \( \hat{v}(\tilde{a}) = \tilde{a}(\hat{v}) = v^{i} a_{i} \). The problem is that consistent use of partial derivatives would imply that \( \hat{v}(\tilde{a}) = v^{j} (\partial_{j} a_{i}) \tilde{\mathrm{d}}x^{i} + v^{j} a_{i} \hat{\partial}_{j} (\tilde{\mathrm{d}}x^{i}) \), so even if the basis vectors are defined to act on the basis covectors such that \( \hat{\partial}_{j} (\tilde{\mathrm{d}}x^{i}) = \delta_{j}^{\; i} \) (in order to give meaning to the expression \( \hat{\partial}_{j} (\tilde{\mathrm{d}}x^{i}) \), because otherwise, it is impossible to take a derivative of a covector because covectors at different points on the manifold exist in different cotangent spaces and are therefore incomparable), the existence of an additional term involving \(\partial_{j} a_{i} \) becomes problematic.
    • Facing this problem, it is tempting to claim that the basis vectors only act on scalar fields as partial differential operators and on basis covectors to yield scalars (and on nothing else). However, this ignores the derivation of the formula for the Lie derivative of a vector field with respect to another vector field, which is often written as \( \mathcal{L}_{\hat{u}} \hat{v} = \hat{u}\hat{v} - \hat{v}\hat{u} \) and is written in coordinate components as \( \mathcal{L}_{\hat{u}} \hat{v} = (u^{i} \partial_{i} v^{j} - v^{i} \partial_{i} u^{j}) \hat{\partial}_{j} \); that derivation requires the application of the basis vectors as partial differential operators acting on vector field components. Thus, there are fundamental inconsistencies arising in the way that the basis vectors act on different objects in this formulation.
  • It is tempting to interpret the covariant derivative of a basis vector with respect to another basis vector in terms of its action on a scalar field, such that the result gives how the action of a basis vector on a scalar field changes locally in the direction of another basis vector. However, this would intuitively suggest the second derivative \( \partial_{i} \partial_{j} f \), whereas the covariant derivative of a basis vector with respect to another basis vector applied to a scalar field is the first derivative \( (\nabla_{i} \hat{\partial}_{j})(f) = \Gamma^{k}_{\; ij} \partial_{k} f \). Thus, when considering the covariant derivative (which is very important to general relativity), the interpretation of vectors as differential operators becomes fundamentally misleading.

I understand that in applications of differential geometry to systems where a metric might not be defined (so a covariant derivative cannot be defined either), where evaluating scalar & vector fields along curves (perhaps representing particle trajectories) is important (making the Lie derivative important in turn), and where higher-order tensors are much less important, these considerations are significantly outweighed by the benefits of casting vectors as differential operators. However, my arguments are made in the context of general relativity & similar topics in physics that use differential geometry, and I believe those arguments hold in that context. Thus, I propose somewhat different notation for vectors in differential geometry in those contexts.

Better notation for vectors in physics-relevant differential geometry contexts

Notational changes and implications are as follows. 

Coordinates

The notation \( \breve{x} \) for coordinates will not change. 

Covectors

Notation will not change for covectors and particularly gradient & basis covectors. However, gradient covectors will take on a much more important role than before, as they will constitute the only vehicle for the emergence of differential operators.

Vectors 

Vectors will be denoted \( \vec{v} \), as is more typical in physics. They will not be associated with differential operators. Instead, they will simply be elements of an abstract vector space, obeying the usual rules of linearity. The basis vector \( \vec{e}_{i} \) in each direction \( i \) will be associated with the direction of increasing coordinate \( x^{i} \). It can be pictured as an infinitesimal arrow of unit magnitude (with respect to the dot product with the corresponding basis covector, as defined below) oriented in that direction and attached to that manifold point; I am convinced that most people who use differential geometry in general relativity already have this picture in their heads for vectors rather than picturing vectors as differential operators. As before, vectors at different points on the manifold will exist in different tangent spaces and therefore be incomparable.

Dot products

Covectors can act on vectors to yield scalars, and vectors can act on covectors to yield scalars. These will be identical for the same vector-covector pair. In particular, \( \tilde{a}(\vec{v}) = \vec{v}(\tilde{a}) = v^{i} a_{i} = a_{i} v^{i} \). This will also be denoted as \( \tilde{a} \cdot \vec{v} = \vec{v} \cdot \tilde{a} \), noting that this dot product at this stage does not require the existence of a metric. The dot product relations for basis vectors & basis covectors are \( \vec{e}_{i} \cdot \tilde{\mathrm{d}}x^{j} = \tilde{\mathrm{d}}x^{j} \cdot \vec{e}_{i} = \delta_{i}^{\; j} \). The fact that vectors are no longer differential operators means that there is no longer any confusion about how a vector might act on a covector; a vector acts on a covector in an identical way to that covector acting on that vector (through the dot product). 

The dot product is the mechanism for extracting directional derivatives from gradient covectors. In particular, given a scalar field \( f \) and its gradient covector \( \tilde{\mathrm{d}}f \) at a given manifold point \( p \), the directional derivative with respect to a vector \( \vec{v} \) is \( \tilde{\mathrm{d}}f \cdot \vec{v} = \vec{v} \cdot \tilde{\mathrm{d}}f = v^{i} (\partial_{i} f)|_{p} \) as expected. This is now the only way to write a directional derivative with respect to a vector, as a vector cannot be said to act on a scalar field.
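As a concrete illustration, here is a minimal sympy sketch of this metric-free dot product; the scalar field \( f \) and all component values are illustrative choices, not anything canonical. A covector pairs with a vector by contraction of components, and pairing the gradient covector of \( f \) with a vector reproduces the directional derivative.

```python
import sympy as sp

x, y = sp.symbols('x y')
coords = [x, y]

f = x**2 * sp.sin(y)                    # a sample scalar field
df = [sp.diff(f, c) for c in coords]    # components (df)_i of the gradient covector

a = [5, 2]                              # components a_i of an arbitrary covector
v = [3, -1]                             # components v^i of a vector (constant for simplicity)

def dot(cov, vec):
    """The pairing a_i v^i; note that no metric is involved."""
    return sum(c_i * v_i for c_i, v_i in zip(cov, vec))

print(dot(a, v))     # a_i v^i = 5*3 + 2*(-1) = 13
print(dot(df, v))    # v^i d_i f = 6*x*sin(y) - x**2*cos(y)
```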

Lie derivatives

The fact that differential operators can now come only from the gradient covector significantly changes the appearance, though not the actual definitions, of Lie derivatives. This can be seen as follows.

From before, the Lie derivative of a scalar field with respect to a vector field was the directional derivative \( \mathcal{L}_{\hat{u}} f = \hat{u}(f) = u^{i} \partial_{i} f \). Now, it must be expressed as \( \mathcal{L}_{\vec{u}} f = \tilde{\mathrm{d}}f \cdot \vec{u} = u^{i} \partial_{i} f \).

From before, the Lie derivative of a vector field with respect to another vector field was the vector field \( \mathcal{L}_{\hat{u}} \hat{v} = \hat{u}\hat{v} - \hat{v}\hat{u} \), such that its action on a scalar field would be \( (\mathcal{L}_{\hat{u}} \hat{v})(f) = \hat{u}(\hat{v}(f)) - \hat{v}(\hat{u}(f)) \). In explicit coordinates, it would be written as \( \tilde{\mathrm{d}}x^{i} (\mathcal{L}_{\hat{u}} \hat{v}) = u^{j} (\partial_{j} v^{i}) - v^{j} (\partial_{j} u^{i}) \).

Now, the vector field obtained as the Lie derivative of a vector field with respect to another vector field looks more cumbersome to write in terms of gradient covectors. In particular, \( u^{j} (\partial_{j} v^{i}) - v^{j} (\partial_{j} u^{i}) = \vec{u} \cdot \tilde{\mathrm{d}}v^{i} - \vec{v} \cdot \tilde{\mathrm{d}}u^{i} \), so the resulting vector is \( \mathcal{L}_{\vec{u}} \vec{v} = (\vec{u} \cdot \tilde{\mathrm{d}}v^{i} - \vec{v} \cdot \tilde{\mathrm{d}}u^{i}) \vec{e}_{i} \). This bears some similarity to the expression of the Lie derivative of a scalar field with respect to a vector field in terms of gradient covectors. Additionally, it bears some similarity to the identity \( \nabla \cdot (\vec{u} \times \vec{v}) = (\nabla \times \vec{u}) \cdot \vec{v} - \vec{u} \cdot (\nabla \times \vec{v}) \) from flat 3-dimensional Riemannian manifolds, hinting at how those cross product & curl operations might generalize to \( N \) dimensions via the Lie derivative.

It may be possible to further define the notation \( \tilde{\mathrm{d}}\vec{u} = (\partial_{i} u^{j})\, \vec{e}_{j} \otimes \tilde{\mathrm{d}}x^{i} \), such that dot products with vectors act on the associated covectors, meaning \( \vec{v} \cdot \tilde{\mathrm{d}}\vec{u} = (\partial_{i} u^{j})\, (\vec{v} \cdot \tilde{\mathrm{d}}x^{i})\, \vec{e}_{j} = v^{i} (\partial_{i} u^{j}) \vec{e}_{j} \) as desired. This allows writing the Lie derivative as \( \mathcal{L}_{\vec{u}} \vec{v} = \vec{u} \cdot \tilde{\mathrm{d}}\vec{v} - \vec{v} \cdot \tilde{\mathrm{d}}\vec{u} \), making the parallel to that 3-dimensional formula even clearer. Note that just as before, components like \( \partial_{i} u^{j} \) which involve bare partial derivatives of vector components do not transform as vector or higher-order tensor components should; only the components of the antisymmetrized expression in the Lie derivative transform as desired (see the sketch below).
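To check that last claim symbolically, here is a sympy sketch in 2 dimensions comparing a Cartesian chart with a polar chart; the charts and the specific vector field components are illustrative choices of mine. The Lie bracket components transform as vector components should, while the bare matrix \( \partial_{i} u^{j} \) picks up inhomogeneous terms and fails to transform as a (1, 1) tensor.

```python
import sympy as sp

x, y, r, t = sp.symbols('x y r t', positive=True)
cart, pol = [x, y], [r, t]
x_of = [r*sp.cos(t), r*sp.sin(t)]    # x(r, t) and y(r, t)

# Jacobians: J[a, i] = dx^a/dx'^i and Jinv[i, a] = dx'^i/dx^a.
J = sp.Matrix(2, 2, lambda a, i: sp.diff(x_of[a], pol[i]))
Jinv = sp.simplify(J.inv())

def to_polar(expr):
    return sp.simplify(expr.subs({x: x_of[0], y: x_of[1]}))

def bracket(p, q, coords):
    return [sum(p[i]*sp.diff(q[j], coords[i]) - q[i]*sp.diff(p[j], coords[i])
                for i in range(2)) for j in range(2)]

u = [x*y, y**2]      # Cartesian components u^a
v = [x + y, x]       # Cartesian components v^a

# Vector components transform as u'^i = (dx'^i/dx^a) u^a.
u_p = [sp.simplify(sum(Jinv[i, a]*to_polar(u[a]) for a in range(2))) for i in range(2)]
v_p = [sp.simplify(sum(Jinv[i, a]*to_polar(v[a]) for a in range(2))) for i in range(2)]

# The bracket computed in polar coordinates matches the Cartesian bracket
# transformed as a vector:
w_c, w_p = bracket(u, v, cart), bracket(u_p, v_p, pol)
w_c_tr = [sum(Jinv[i, a]*to_polar(w_c[a]) for a in range(2)) for i in range(2)]
assert all(sp.simplify(w_p[i] - w_c_tr[i]) == 0 for i in range(2))

# By contrast, transforming d_a u^b as if it were a (1, 1) tensor does not
# reproduce the polar-chart partial derivatives d'_i u'^j:
M_c = sp.Matrix(2, 2, lambda a, b: sp.diff(u[b], cart[a]))
M_p = sp.Matrix(2, 2, lambda i, j: sp.diff(u_p[j], pol[i]))
M_tr = sp.Matrix(2, 2, lambda i, j: sum(J[a, i]*Jinv[j, b]*to_polar(M_c[a, b])
                                        for a in range(2) for b in range(2)))
print(sp.simplify(M_p - M_tr))   # nonzero entries: the naive transform fails
```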

It is arguably a defect of this formulation that the Lie derivative requires the use of covectors and particularly gradient covectors in addition to vectors, whereas the traditional formulation involves only vectors as differential operators. However, I find that this formulation provides insight on its own with clearer analogies to familiar formulas in multivariable calculus. Additionally, I am less worried about this defect because in practice, Lie derivatives are not used very much in the application of differential geometry to general relativity.

Tensors

Tensors are constructed the same way as before with tensor products, except using \( \vec{e}_{i} \) instead of \( \hat{\partial}_{i} \) for basis vectors. This has the benefit of removing any ambiguity about how tensors are supposed to act on scalar fields: they simply do not act on scalar fields at all. Additionally, the metric & its inverse work the same way as before (modulo these changes in notation for basis vectors).
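For instance, here is a small numpy sketch of this construction at a single manifold point (all component values are illustrative): the components of \( T = \vec{u} \otimes \vec{v} \) are \( T^{ij} = u^{i} v^{j} \), and contracting both slots with covectors yields the scalar \( a_{i} b_{j} T^{ij} = (a_{i} u^{i})(b_{j} v^{j}) \), with no action on scalar fields ever needed.

```python
import numpy as np

u = np.array([1.0, 2.0, 0.5, -1.0])   # vector components u^i (N = 4)
v = np.array([0.0, 3.0, 1.0, 2.0])    # vector components v^j
a = np.array([2.0, -1.0, 0.0, 1.0])   # covector components a_i
b = np.array([1.0, 1.0, -2.0, 0.5])   # covector components b_j

T = np.outer(u, v)                    # T^{ij} = u^i v^j
scalar = np.einsum('i,j,ij->', a, b, T)
assert np.isclose(scalar, (a @ u) * (b @ v))
print(scalar)
```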

Covariant derivatives

The definition of the covariant derivative of a scalar or of a basis vector with respect to another basis vector is the same as before (modulo these changes in notation for basis vectors). However, the fact that basis vectors are no longer associated with differential operators makes it much easier to avoid the problematic intuitive derivations & interpretations, discussed earlier, of how the covariant derivative of a vector field with respect to another vector field acts on a scalar field.

Additionally, the introduction of dot products makes it much easier to derive the formula for the covariant derivative of a covector with respect to a vector. In particular, if the covariant derivative is required to obey the Leibniz rule for dot products just as it does for scalar field multiplication or tensor products, then \( \nabla_{\vec{u}} (\tilde{a} \cdot \vec{v}) = (\nabla_{\vec{u}} \tilde{a}) \cdot \vec{v} + \tilde{a} \cdot \nabla_{\vec{u}} \vec{v} \), and as \( \tilde{a} \cdot \vec{v} \) is a scalar field, then \( \nabla_{\vec{u}} (\tilde{a} \cdot \vec{v}) = \vec{u} \cdot \tilde{\mathrm{d}}(\tilde{a} \cdot \vec{v}) \), so \( (\nabla_{\vec{u}} \tilde{a}) \cdot \vec{v} = \vec{u} \cdot \tilde{\mathrm{d}}(\tilde{a} \cdot \vec{v}) - \tilde{a} \cdot \nabla_{\vec{u}} \vec{v} \). This can be evaluated with basis vectors & basis covectors to show that \( \nabla_{i} \tilde{\mathrm{d}}x^{j} = -\Gamma^{j}_{\; ik} \tilde{\mathrm{d}}x^{k} \).
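For completeness, that evaluation can be spelled out as follows; this is just the preceding relation with \( \tilde{a} = \tilde{\mathrm{d}}x^{j} \), \( \vec{v} = \vec{e}_{k} \), and \( \vec{u} = \vec{e}_{i} \), together with the dot product relations and the definition of the connection coefficients from earlier in this post:

\[ (\nabla_{i} \tilde{\mathrm{d}}x^{j}) \cdot \vec{e}_{k} = \vec{e}_{i} \cdot \tilde{\mathrm{d}}(\tilde{\mathrm{d}}x^{j} \cdot \vec{e}_{k}) - \tilde{\mathrm{d}}x^{j} \cdot \nabla_{i} \vec{e}_{k} = \vec{e}_{i} \cdot \tilde{\mathrm{d}}(\delta_{k}^{\; j}) - \tilde{\mathrm{d}}x^{j} \cdot (\Gamma^{m}_{\; ik} \vec{e}_{m}) = 0 - \Gamma^{j}_{\; ik} \]

and expanding \( \nabla_{i} \tilde{\mathrm{d}}x^{j} \) in the basis of covectors then gives \( \nabla_{i} \tilde{\mathrm{d}}x^{j} = -\Gamma^{j}_{\; ik} \tilde{\mathrm{d}}x^{k} \).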