For many years, both during and since college, I have wondered whether it might be possible to consistently represent contravariant & covariant objects using vector & matrix notation. In particular, when I learned about the idea of covariant representations of [invariant] vectors being duals to contravariant representations of [invariant] vectors, meaning that if a contravariant representation of an [invariant] vector can be seen as a column vector, then a covariant representation of an [invariant] vector can be seen as a row vector, I wondered how it would be possible to represent the fully covariant metric tensor as a matrix if it multiplies a contravariant representation of an [invariant] vector (i.e. a column vector) to yield a covariant representation of an [invariant] vector (i.e. a row vector), especially as, traditionally in linear algebra, a matrix acting on a column vector yields another column vector (while transposition, though linear in the sense of respecting addition and scalar multiplication, cannot be represented simply as the action of another matrix). At various points, I've wondered whether this means that fully contravariant or fully covariant representations of multi-index tensors should be represented as columns of columns or rows of rows, and I've tried to play around with these ideas more. This post is not the first to explore such ideas even online, as I came across notes online by Viktor T. Toth [LINK], but it is my attempt to flesh out these ideas further. Follow the jump to see more.

Throughout this post, I will work with the notation of 2 spatial indices, in which the fully covariant representation of the metric tensor $g_{ij} = \vec{e}_{i} \cdot \vec{e}_{j}$ might not be Euclidean, where indices will use English letters $i, j, k, \ldots \in \{1, 2\}$, where superscripts do not imply exponents, and where multiple superscripts do not imply single numbers (for example, $g_{12}$ is the fully covariant component of the metric tensor with first index 1 and second index 2, not the covariant component at index 12 of a single-index tensor (vector)); extensions to spacetime (where the convention is to use indices labeled by Greek letters), and in particular to 3 spatial + 1 temporal dimensions, are trivial. Additionally, Einstein summation will be assumed, and all tensors (including vectors & scalars) are assumed to be real-valued. Finally, I will do my best to ensure that when indices are raised or lowered, the ordering of indices is clear (as an example, distinguishing $T^{i}{}_{j}$ from $T_{j}{}^{i}$ instead of ambiguously writing $T^{i}_{j}$), but this will depend on the quality of LaTeX rendering in this post.
Indices
A tensor represented with a contravariant index has that index raised. A tensor represented with a covariant index has that index lowered.
Metric and Kronecker delta tensors
The metric tensor is typically written in fully covariant form as $g_{ij}$. (This depends on the spatial coordinate, but because all manipulations in this post will be at a single point in space, that dependence will be implicit/ignored.) The inverse of the metric tensor is the fully contravariant representation $g^{ij}$, defined such that $g^{ik} g_{kj} = \delta^{i}{}_{j}$, where $\delta^{i}{}_{j}$ is a representation of the Kronecker delta tensor (1 when $i = j$, 0 otherwise) such that, in this case, the first index is raised & the second index is lowered. Additionally, because the metric tensor is the object that raises & lowers indices, such that $v_{i} = g_{ij} v^{j}$ and $v^{i} = g^{ij} v_{j}$, $g^{i}{}_{j}$ is identical to $\delta^{i}{}_{j}$ (and likewise $g_{j}{}^{i}$ is identical to $\delta_{j}{}^{i}$, which also is 1 when $i = j$ and 0 otherwise) regardless of what $g_{ij}$ is.
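To make these relations concrete, here is a minimal numerical sketch in Python (the specific non-Euclidean metric is an arbitrary example I chose purely for illustration):

```python
# A minimal numerical check, using an arbitrary non-Euclidean 2x2 example
# metric, that g^{ik} g_{kj} = delta^i_j, i.e. that raising one index of the
# metric yields the Kronecker delta regardless of what g_{ij} is.
import numpy as np

g_lo = np.array([[2.0, 1.0],
                 [1.0, 3.0]])    # g_{ij}: symmetric, but not the identity
g_hi = np.linalg.inv(g_lo)       # g^{ij}: defined as the inverse of g_{ij}

# Contract the second index of g^{ik} with the first index of g_{kj}.
mixed = np.einsum('ik,kj->ij', g_hi, g_lo)
print(np.allclose(mixed, np.eye(2)))   # True: g^i_j = delta^i_j
```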
Vectors and inner products
To be able to represent these objects using matrix notation, it is possible to start by saying that the contravariant representation of a vector $v^{i}$ can be represented in matrix notation as a column vector $\begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix}$ and the covariant representation of a vector $v_{i}$ can be represented in matrix notation as a row vector $\begin{bmatrix} v_{1} & v_{2} \end{bmatrix}$.
The inner product between two vectors, which is invariant, can be written as $\vec{u} \cdot \vec{v} = u_{i} v^{i} = u_{1} v^{1} + u_{2} v^{2}$. This can be written in matrix notation as a multiplication of a row by a column, namely $\begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix}$ (giving the same thing). This is the only matrix product whose result has fewer dimensions (indices) than the inputs and that involves explicit addition of different components; all other products to be discussed are either scalar products or outer products.
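Here is a small numerical sketch of this inner product (with made-up values for the metric and the vectors). Note that the transpose step, which restructures the column $g_{ij} u^{j}$ into the row $u_{i}$, is exactly the structural move that cannot be expressed as multiplication by another matrix:

```python
# A sketch of the invariant inner product u_i v^i as a row multiplying a
# column. All numbers, including the example metric, are arbitrary choices.
import numpy as np

g_lo = np.array([[2.0, 1.0],
                 [1.0, 3.0]])        # example metric g_{ij}
u_hi = np.array([[3.0], [-1.0]])     # u^i as a 2x1 column
v_hi = np.array([[1.0], [2.0]])      # v^i as a 2x1 column

u_lo = (g_lo @ u_hi).T               # u_i = g_{ij} u^j, restructured as a 1x2 row
print(u_lo @ v_hi)                   # 1x1 result: the invariant u_i v^i
```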
Tensor representations as matrices
How would this matrix notation generalize to tensors with more than 1 index, especially when more than 1 index is simultaneously raised or lowered? The discussion in the beginning shows that it clearly can't be an ordinary matrix with rows & columns. Let us start with the case of a tensor with 2 indices that are both raised, called $A^{ij}$. It will actually look like a column of column vectors with components arranged as $\begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix}$.
Why was this particular ordering chosen? Essentially, in a tensor like the one with representation $A^{ij}$, the first index should represent the innermost matrix structure (column if raised, row if lowered), and then each index to the right will represent a progressively more exterior matrix structure until the rightmost index represents the outermost matrix structure (in each case column if raised, row if lowered). Intuitively, this can be seen as representing $A^{ij}$ as $\begin{bmatrix} A^{i1} \\ A^{i2} \end{bmatrix}$ and expanding $A^{i1}$ as its own column $\begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix}$ and $A^{i2}$ as its own column $\begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix}$.
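This nesting rule is easy to state in code. Here is a sketch using nested Python lists (the layout and names are my own illustrative choices), where the first index labels the innermost structure and the last index the outermost:

```python
# A sketch of the nesting rule: represent a tensor as nested Python lists,
# where the FIRST index is the innermost structure and the LAST index is the
# outermost. For A^{ij}, the outer list is indexed by j and each inner list
# (a "column") is indexed by i.
A = {(1, 1): 'A11', (2, 1): 'A21', (1, 2): 'A12', (2, 2): 'A22'}

# Outer structure indexed by j; inner columns indexed by i.
A_rep = [[A[(i, j)] for i in (1, 2)] for j in (1, 2)]
print(A_rep)   # [['A11', 'A21'], ['A12', 'A22']]: a column of columns
```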
This still might not be entirely convincing. Why does this work? Consider the vector $v^{i} = A^{ij} u_{j}$. In this case, $u_{j}$ is contracted with the outermost index of $A^{ij}$, so $u_{j}$ must be represented as a row vector multiplying the outermost column vector of the representation of $A^{ij}$. In particular, this means $$\begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix}.$$ The first step is to use the rule of the inner product to say $$\begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = u_{1} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} + u_{2} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix}.$$ This says that the inner product on the outer index of $A^{ij}$ means that each element of the row vector representation of $u_{j}$ multiplies, as if it were a scalar, the inner entities of $A^{ij}$, which are the column vectors expanded from $A^{i1}$ and $A^{i2}$. The second step is to evaluate these scalar multiplications & additions to say that $$\begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} A^{11} u_{1} + A^{12} u_{2} \\ A^{21} u_{1} + A^{22} u_{2} \end{bmatrix},$$ so the result is indeed the same as computing this using only indexed quantities.
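A sketch of this outer-index contraction on the nested-list representation, with arbitrary illustrative numbers:

```python
# The outer-index contraction v^i = A^{ij} u_j on the nested representation:
# the row u_j weights the inner columns of A^{ij} as if they were scalars.
A = [[1.0, 2.0],    # inner column A^{i1} = (A^{11}, A^{21})
     [3.0, 4.0]]    # inner column A^{i2} = (A^{12}, A^{22})
u = [5.0, 6.0]      # row (u_1, u_2)

# u_1 * A^{i1} + u_2 * A^{i2}, done componentwise on the inner columns:
v = [sum(u_j * col[i] for u_j, col in zip(u, A)) for i in range(2)]
print(v)   # [1*5 + 3*6, 2*5 + 4*6] = [23.0, 34.0]
```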
This still seems a little too neat. What about for $w^{i} = A^{ji} u_{j}$? In this case, $u_{j}$ is contracted with the innermost index of $A^{ji}$. Intuitively, one may imagine that the row vector representation of $u_{j}$ must multiply, as an inner product, the inner columns of $A^{ij}$. How would this work in matrix notation? By enclosing the row vector in another set of matrix brackets, that row vector functions effectively as a scalar multiplying the inner entities in $A^{ij}$ in the same way. This means that $$\begin{bmatrix} w^{1} \\ w^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix}.$$ The first step is to carry out the effective scalar multiplication $$\begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix}.$$ The second step is to carry out the inner products implied by a row multiplied to the left of a column: $$\begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} u_{1} A^{11} + u_{2} A^{21} \\ u_{1} A^{12} + u_{2} A^{22} \end{bmatrix}.$$
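The corresponding sketch for the inner-index contraction, where the dummy brackets amount to descending one level before taking inner products:

```python
# The inner-index contraction w^i = A^{ji} u_j: the extra set of "dummy
# brackets" around the row u_j means it descends one level into the nested
# representation and takes an inner product with each inner column.
A = [[1.0, 2.0],    # inner column A^{i1}
     [3.0, 4.0]]    # inner column A^{i2}
u = [5.0, 6.0]      # row (u_1, u_2)

# [u] acts like a scalar at the outer level, then as a row on each inner column:
w = [sum(u_j * col_entry for u_j, col_entry in zip(u, col)) for col in A]
print(w)   # [1*5 + 2*6, 3*5 + 4*6] = [17.0, 39.0]
```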
In many cases, this procedure of taking the inner product of two tensors (potentially with different numbers of indices) can be generalized to more than 1 or 2 indices. However, there are cases where this might not be easily generalizable. For example, in $A^{jik} B_{jk}$, contraction is happening in the first & third indices simultaneously for $A^{jik}$, but $B_{jk}$ only has 2 indices (both of which are being contracted with indices of $A^{jik}$), so I think it will be hard to consistently represent $B_{jk}$ with the correct matrix structures to make this work. One solution is to use the properties of the Kronecker delta tensor to say that $A^{jik} B_{jk} = \delta^{j}{}_{p} \delta^{k}{}_{q} A^{piq} B_{lm} \delta^{l}{}_{j} \delta^{m}{}_{k}$; I will show how this works in practice in following sections with respect to use of the Kronecker delta tensor. More pressingly, this requires forming the outer product $A^{piq} B_{lm}$, which is a tensor with 5 indices (of which the first 3 are raised & the remaining 2 are lowered). This can be done effectively by multiplying each element of $B_{lm}$ with the full representation of $A^{piq}$, as a column of columns of columns, as if it were a scalar. As an example, if $A^{ij} = a^{i} b^{j}$, this can be represented as $$\begin{bmatrix} \begin{bmatrix} a^{1} b^{1} \\ a^{2} b^{1} \end{bmatrix} \\ \begin{bmatrix} a^{1} b^{2} \\ a^{2} b^{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b^{1} \\ \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b^{2} \end{bmatrix};$$ effectively, $b^{j}$, represented as the column $\begin{bmatrix} b^{1} \\ b^{2} \end{bmatrix}$, has multiplicatively absorbed the column $\begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix}$ representing $a^{i}$. This representation crucially depends on the consistent ordering of indices when contractions occur, so some tensor multiplications require more careful accounting of matrix structures at different levels. For example, if $A^{ijk} = B^{ik} c^{j}$, the contravariant representation of the vector $c^{j}$ is being multiplied by effectively being inserted between the indices of $B^{ik}$, yet there is no guarantee that $B^{ik}$ can be represented as the product of the contravariant representations of just two vectors. Instead, if one is to represent $A^{ijk}$ as $$\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{111} \\ A^{211} \end{bmatrix} \\ \begin{bmatrix} A^{121} \\ A^{221} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} A^{112} \\ A^{212} \end{bmatrix} \\ \begin{bmatrix} A^{122} \\ A^{222} \end{bmatrix} \end{bmatrix} \end{bmatrix},$$ then one must change the representation of $B^{ik}$ from $\begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \\ \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix}$ to be effectively 3 levels deep, in which the second level is a dummy/scalar structure, of the form $\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \end{bmatrix}$, and the representation of $c^{j}$ from $\begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix}$ to be 2 levels deep as $\begin{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix}$, such that the outer level treats $c^{j}$ like a scalar and only the level below that treats it as a column vector which can undergo an outer product. In particular, the product $$\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix}$$ can be evaluated by taking the effective scalar $\begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix}$ one level into the expanded 3-level representation of $B^{ik}$, yielding $$\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix} \end{bmatrix}.$$ Next, the column vectors $\begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix}$ and $\begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix}$ effectively function as scalars multiplying the individual components of $\begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix}$, yielding $\begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} c^{1} \\ \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} c^{2} \end{bmatrix}$ and $\begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} c^{1} \\ \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} c^{2} \end{bmatrix}$ respectively. Performing the final multiplications and putting these things together yields $$\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{111} \\ A^{211} \end{bmatrix} \\ \begin{bmatrix} A^{121} \\ A^{221} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} A^{112} \\ A^{212} \end{bmatrix} \\ \begin{bmatrix} A^{122} \\ A^{222} \end{bmatrix} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} c^{1} \\ B^{21} c^{1} \end{bmatrix} \\ \begin{bmatrix} B^{11} c^{2} \\ B^{21} c^{2} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} c^{1} \\ B^{22} c^{1} \end{bmatrix} \\ \begin{bmatrix} B^{12} c^{2} \\ B^{22} c^{2} \end{bmatrix} \end{bmatrix} \end{bmatrix}$$ as expected & desired.
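Here is a sketch of this "insert a dummy level" outer product in nested-list form, checked against the direct index computation (the values and layout are placeholders of my own choosing):

```python
# The outer product A^{ijk} = B^{ik} c^j in nested-list form: wrap each inner
# column of B in a dummy level, then let c expand inside it. The first index
# is innermost, the last outermost, so the nesting order is [k][j][i].
B = [[1.0, 2.0],    # inner column B^{i1}
     [3.0, 4.0]]    # inner column B^{i2}
c = [5.0, 6.0]      # column (c^1, c^2)

# Dummy level around each inner column of B; c expands one level down:
A_rep = [[[B_col[i] * c_j for i in range(2)] for c_j in c] for B_col in B]

# Direct check of the same components A^{ijk} = B^{ik} c^j:
check = [[[B[k][i] * c[j] for i in range(2)] for j in range(2)] for k in range(2)]
print(A_rep == check)   # True
```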
Mixed indices and self-contractions
By now, it should be clear that there are no traditional matrices here that are just rectangular arrays of numbers. Instead, any tensor represented using matrix notation should be a scalar, a row, or a column, in which the elements of the rows & columns may themselves be scalars, rows, or columns, to arbitrary depth. This means that there could be a difference, for example, between $A^{i}{}_{j}$ represented using matrix notation as $\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} & \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} \end{bmatrix}$ and $A_{j}{}^{i}$ represented using matrix notation as $\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} & A^{1}{}_{2} \end{bmatrix} \\ \begin{bmatrix} A^{2}{}_{1} & A^{2}{}_{2} \end{bmatrix} \end{bmatrix}$; the former is a row whose elements are columns, while the latter is a column whose elements are rows. This fact, as well as the use of dummy brackets to imply scalar-like multiplication of tensors by elements of other tensors (yielding outer products), ensures that if a row is multiplied by a column without any dummy brackets enclosing the outside of either one, that multiplication can be assumed to be an inner product regardless of ordering. This is because for the outer product $a^{i} b_{j}$ to be represented with the first index in the innermost structure, the multiplication must be written in matrix notation as $$\begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} b_{1} & b_{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b_{1} & \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b_{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} b_{1} \\ a^{2} b_{1} \end{bmatrix} & \begin{bmatrix} a^{1} b_{2} \\ a^{2} b_{2} \end{bmatrix} \end{bmatrix},$$ which is a row of columns, while for the outer product $a_{i} b^{j}$ to be represented with the first index in the innermost structure, the multiplication must be written in matrix notation as $$\begin{bmatrix} \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} b^{1} \\ b^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} b^{1} \\ \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} b^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{1} b^{1} & a_{2} b^{1} \end{bmatrix} \\ \begin{bmatrix} a_{1} b^{2} & a_{2} b^{2} \end{bmatrix} \end{bmatrix},$$ which is a column of rows; interpreting something like $\begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} \begin{bmatrix} b_{1} & b_{2} \end{bmatrix}$ (without dummy brackets) as an outer product (as would traditionally be done in matrix algebra), or as anything other than an inner product, would therefore lead to ambiguity.
This representation of 2-index tensors in which one index is raised & the other is lowered leads to a nice connection to ideas from traditional matrix algebra. In particular, $A^{i}{}_{j} v^{j}$ can be represented as $$\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} & \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} v^{1} + \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} v^{2} = \begin{bmatrix} A^{1}{}_{1} v^{1} + A^{1}{}_{2} v^{2} \\ A^{2}{}_{1} v^{1} + A^{2}{}_{2} v^{2} \end{bmatrix},$$ which is a more precise way to show how, in matrix algebra, multiplying a matrix by a column vector to the right of the matrix can be seen as adding the columns of the matrix weighted by the elements of the column vector. Similarly, $A_{j}{}^{i} u_{i}$ can be represented as $$\begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} & A^{1}{}_{2} \end{bmatrix} \\ \begin{bmatrix} A^{2}{}_{1} & A^{2}{}_{2} \end{bmatrix} \end{bmatrix} = u_{1} \begin{bmatrix} A^{1}{}_{1} & A^{1}{}_{2} \end{bmatrix} + u_{2} \begin{bmatrix} A^{2}{}_{1} & A^{2}{}_{2} \end{bmatrix} = \begin{bmatrix} A^{1}{}_{1} u_{1} + A^{2}{}_{1} u_{2} & A^{1}{}_{2} u_{1} + A^{2}{}_{2} u_{2} \end{bmatrix},$$ which is a more precise way to show how, in matrix algebra, multiplying a matrix by a row vector to the left of the matrix can be seen as adding the rows of the matrix weighted by the elements of the row vector. It is also possible to compute $A^{i}{}_{j} u_{i}$, which can be represented as $$\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} & \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} & \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} A^{1}{}_{1} u_{1} + A^{2}{}_{1} u_{2} & A^{1}{}_{2} u_{1} + A^{2}{}_{2} u_{2} \end{bmatrix},$$ as well as $A_{j}{}^{i} v^{j}$, which can be represented as $$\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} & A^{1}{}_{2} \end{bmatrix} \\ \begin{bmatrix} A^{2}{}_{1} & A^{2}{}_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} & A^{1}{}_{2} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \\ \begin{bmatrix} A^{2}{}_{1} & A^{2}{}_{2} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} A^{1}{}_{1} v^{1} + A^{1}{}_{2} v^{2} \\ A^{2}{}_{1} v^{1} + A^{2}{}_{2} v^{2} \end{bmatrix},$$ though perhaps as expected, these operations do not carry over neatly from traditional matrix algebra and therefore do not have a correspondingly easy interpretation.
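Since $A^{i}{}_{j} v^{j}$ reduces to the familiar matrix-vector product, here is a quick numerical sketch (with arbitrary made-up values) of the weighted-column-sum interpretation:

```python
# A numerical check that A^i_j v^j is the traditional matrix-vector product,
# i.e. a weighted sum of the matrix's columns.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # A^i_j: rows indexed by i, columns by j
v = np.array([5.0, 6.0])            # v^j

weighted_cols = v[0] * A[:, 0] + v[1] * A[:, 1]
print(np.allclose(A @ v, weighted_cols))   # True
```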
It is common for 2-index tensors in which one index is raised and the other is lowered to have those indices contracted with each other, yielding an [invariant] scalar. A contraction of $A^{i}{}_{j}$ would look like $A^{k}{}_{k}$, and a contraction of $A_{j}{}^{i}$ would look like $A_{k}{}^{k}$. Because these are not traditional matrices but are columns of rows or rows of columns, it is better to avoid falling back upon the traditional matrix definition of a trace as the sum of diagonal elements, even though $A^{k}{}_{k}$ looks like the expression $\sum_{k} \langle u^{(k)}, \hat{A} u^{(k)} \rangle$ for a linear operator $\hat{A}$ and orthonormal basis vectors $u^{(k)}$. As it turns out, the Kronecker delta tensors $\delta^{i}{}_{j}$ behave as an orthonormal basis of row vectors when $i$ is fixed, and the Kronecker delta tensors $\delta^{j}{}_{i}$ behave as an orthonormal basis of column vectors when $i$ is fixed. This means that $A^{k}{}_{k} = \delta^{k}{}_{i} A^{i}{}_{j} \delta^{j}{}_{k}$, and we can explicitly expand the summation as $\delta^{k}{}_{i} A^{i}{}_{j} \delta^{j}{}_{k} = \delta^{1}{}_{i} A^{i}{}_{j} \delta^{j}{}_{1} + \delta^{2}{}_{i} A^{i}{}_{j} \delta^{j}{}_{2}$. As usual, $A^{i}{}_{j}$ can be represented in matrix notation as $\begin{bmatrix} \begin{bmatrix} A^{1}{}_{1} \\ A^{2}{}_{1} \end{bmatrix} & \begin{bmatrix} A^{1}{}_{2} \\ A^{2}{}_{2} \end{bmatrix} \end{bmatrix}$. Meanwhile, the Kronecker delta tensors with one fixed index can be represented as follows: $\delta^{1}{}_{i}$ looks like $\begin{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \end{bmatrix}$ because its index $i$ is contracted with the inner index of $A^{i}{}_{j}$ (so a dummy set of brackets is needed at the outer level so that this row vector can be multiplied, as an inner product, with the interior column vectors inside $A^{i}{}_{j}$); $\delta^{2}{}_{i}$ looks like $\begin{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \end{bmatrix}$ for similar reasons as $\delta^{1}{}_{i}$; $\delta^{j}{}_{1}$ looks like $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ with no need for extra dummy brackets because the contracted index $j$ is the outer index of $A^{i}{}_{j}$; and $\delta^{j}{}_{2}$ looks like $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for similar reasons as $\delta^{j}{}_{1}$. Carrying out these products and sums yields $A^{k}{}_{k} = A^{1}{}_{1} + A^{2}{}_{2}$ as expected. This process of enclosing the Kronecker delta tensors' vector constituents in as many dummy brackets as needed to match the position of the corresponding contracted index in another tensor can be generalized to tensors with arbitrary numbers of indices in which an arbitrary index may be contracted. That can be helpful not only for self-contraction but also for contraction with other tensors.
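Here is a sketch of this delta-based self-contraction on the nested representation (the values are arbitrary), which reproduces the trace without ever invoking "diagonal elements":

```python
# The self-contraction A^k_k = delta^k_i A^i_j delta^j_k on the nested
# representation: each delta^k_i row (in dummy brackets) descends into the
# inner columns, and each delta^j_k column selects an outer slot.
A = [[1.0, 2.0],    # inner column A^i_1 = (A^1_1, A^2_1)
     [3.0, 4.0]]    # inner column A^i_2 = (A^1_2, A^2_2)

deltas_row = [[1.0, 0.0], [0.0, 1.0]]   # delta^1_i and delta^2_i as rows
deltas_col = [[1.0, 0.0], [0.0, 1.0]]   # delta^j_1 and delta^j_2 as columns

trace = 0.0
for k in range(2):
    # delta^k_i takes an inner product with each inner column of A^i_j...
    inner = [sum(d * a for d, a in zip(deltas_row[k], col)) for col in A]
    # ...then delta^j_k takes an inner product with the outer structure.
    trace += sum(d * x for d, x in zip(deltas_col[k], inner))
print(trace)   # A^1_1 + A^2_2 = 1.0 + 4.0 = 5.0
```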
Examples
The Riemann curvature tensor is typically written as $R^{i}{}_{jkl}$; it has 4 indices, but only the first index is raised (and some conventions quote this tensor with that index lowered too). The Ricci curvature tensor is written as a self-contraction of the Riemann curvature tensor between its first & third indices, so $R_{ij} = R^{k}{}_{ikj}$. This can be rewritten as $R^{k}{}_{ikj} = \delta^{k}{}_{m} R^{m}{}_{ilj} \delta^{l}{}_{k}$, where $R^{m}{}_{ilj}$ can be written in matrix form as $$\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} R^{1}{}_{111} \\ R^{2}{}_{111} \end{bmatrix} & \begin{bmatrix} R^{1}{}_{211} \\ R^{2}{}_{211} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} R^{1}{}_{121} \\ R^{2}{}_{121} \end{bmatrix} & \begin{bmatrix} R^{1}{}_{221} \\ R^{2}{}_{221} \end{bmatrix} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} R^{1}{}_{112} \\ R^{2}{}_{112} \end{bmatrix} & \begin{bmatrix} R^{1}{}_{212} \\ R^{2}{}_{212} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} R^{1}{}_{122} \\ R^{2}{}_{122} \end{bmatrix} & \begin{bmatrix} R^{1}{}_{222} \\ R^{2}{}_{222} \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix}.$$ The Kronecker delta tensor constituents $\delta^{k}{}_{m}$ are row vectors contracted with the innermost index constituents (column vectors) of $R^{m}{}_{ilj}$, so $\delta^{1}{}_{m}$ looks like $\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix}$ and $\delta^{2}{}_{m}$ looks like $\begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix}$. The Kronecker delta tensor constituents $\delta^{l}{}_{k}$ are column vectors contracted with the third-index constituents (row vectors) of $R^{m}{}_{ilj}$, so $\delta^{l}{}_{1}$ looks like $\begin{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{bmatrix}$ and $\delta^{l}{}_{2}$ looks like $\begin{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{bmatrix}$. Carrying out these inner products yields $R_{ij}$ represented as $$\begin{bmatrix} \begin{bmatrix} R^{1}{}_{111} + R^{2}{}_{121} & R^{1}{}_{211} + R^{2}{}_{221} \end{bmatrix} & \begin{bmatrix} R^{1}{}_{112} + R^{2}{}_{122} & R^{1}{}_{212} + R^{2}{}_{222} \end{bmatrix} \end{bmatrix}$$ as expected and desired.
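As a check on the index bookkeeping, here is a sketch using NumPy's einsum; the "Riemann" tensor here is just random numbers with no curvature symmetries, since only the contraction pattern matters:

```python
# The Ricci contraction R_ij = R^k_ikj done two ways: via Kronecker deltas as
# in the text, and via a direct einsum trace. Random placeholder components.
import numpy as np

rng = np.random.default_rng(0)
Riem = rng.normal(size=(2, 2, 2, 2))   # R^m_{ilj}, indices in order m, i, l, j
delta = np.eye(2)

# delta^k_m R^m_{ilj} delta^l_k, summing over k, m, and l:
ricci_via_deltas = np.einsum('km,milj,lk->ij', delta, Riem, delta)
ricci_direct = np.einsum('kikj->ij', Riem)
print(np.allclose(ricci_via_deltas, ricci_direct))   # True
```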
The Ricci scalar $R = R_{ij} g^{ij}$ can be computed quite simply as $$R = \begin{bmatrix} \begin{bmatrix} R_{11} & R_{21} \end{bmatrix} & \begin{bmatrix} R_{12} & R_{22} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} g^{11} \\ g^{21} \end{bmatrix} \\ \begin{bmatrix} g^{12} \\ g^{22} \end{bmatrix} \end{bmatrix}$$ because the relevant tensors have the same number of indices and the contracted indices are in the same places for both tensors (meaning the outermost column structure of $g^{ij}$ undergoes an inner product with the outermost row structure of $R_{ij}$, and then the innermost column structures of $g^{ij}$ undergo inner products with the innermost row structures of $R_{ij}$). As expected, this gives $R = R_{11} g^{11} + R_{21} g^{21} + R_{12} g^{12} + R_{22} g^{22}$. If instead one wanted to compute something like $A^{ij} B_{ji}$ without knowing whether those tensors are symmetric with respect to their indices (unlike $g^{ij}$, which is known to be symmetric in its indices), one must instead construct $C^{ij}{}_{kl} = A^{ij} B_{kl}$ and then perform the contraction $C^{ij}{}_{ji} = \delta^{i}{}_{m} \delta^{j}{}_{n} C^{mn}{}_{kl} \delta^{k}{}_{j} \delta^{l}{}_{i}$ using the methods described above, as that is a reliable method of doing such contractions.
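And here is a corresponding sketch of the outer-product-then-contract recipe for $A^{ij} B_{ji}$, again with random placeholder values:

```python
# The "outer product, then contract with deltas" recipe for A^{ij} B_{ji},
# checked against a direct contraction.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))   # A^{ij}
B = rng.normal(size=(2, 2))   # B_{kl}
delta = np.eye(2)

C = np.einsum('ij,kl->ijkl', A, B)   # C^{ij}_{kl} = A^{ij} B_{kl}
via_deltas = np.einsum('im,jn,mnkl,kj,li->', delta, delta, C, delta, delta)
direct = np.einsum('ij,ji->', A, B)  # A^{ij} B_{ji}
print(np.isclose(via_deltas, direct))   # True
```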
Conclusions
- It is technically possible to recast tensors with raised or lowered indices in terms of generalized row & column vectors whose constituents may be rows, columns, or numbers, and, with suitable definitions of outer products (including the use of dummy brackets) & inner products, to recover all of the relevant tensor manipulations.
- Doing this can help illuminate how application of the fully covariant representation of a metric tensor can change a contravariant representation of a vector or tensor to a covariant representation, specifically by changing a column vector into a row vector, in a way that traditional matrix algebra cannot do by itself.
- Matrix notation is ultimately much more helpful for objects with only 1 or 2 indices where the indices can take on arbitrarily large values. In practice, it isn't helpful for objects with 3 or more indices, and it becomes easier to just directly manipulate objects with indices instead of trying to force-fit this machinery into matrix notation for its own sake.