
2023-11-01

Contravariant and Covariant Objects in Matrix Notation

For many years, both when I was in college and since, I have wondered whether it might be possible to consistently represent contravariant & covariant objects using vector & matrix notation. In particular, when I learned that covariant representations of [invariant] vectors are duals to contravariant representations of [invariant] vectors, meaning that if a contravariant representation of an [invariant] vector can be seen as a column vector, then a covariant representation of an [invariant] vector can be seen as a row vector, I wondered how it would be possible to represent the fully covariant metric tensor as a matrix if it multiplies a contravariant representation of an [invariant] vector (i.e. a column vector) to yield a covariant representation of an [invariant] vector (i.e. a row vector). This puzzled me especially because, traditionally in linear algebra, a matrix acting on a column vector yields another column vector (while transposition, though linear in the sense of respecting addition and scalar multiplication, cannot be represented simply as the action of another matrix). At various points, I've wondered if this means that fully contravariant or fully covariant representations of multi-index tensors should be represented as columns of columns or rows of rows, and I've tried to play around with these ideas more. This post is not the first to explore such ideas even online, as I came across notes online by Viktor T. Toth [LINK], but this post is my attempt to flesh out these ideas further. Follow the jump to see more. Throughout this post, I will work in 2 spatial dimensions, in which the fully covariant representation of the metric tensor \( g_{ij} = \vec{e}_{i} \cdot \vec{e}_{j} \) might not be Euclidean. Indices will use English letters \( i, j, k, \ldots \in \{1, 2\} \); superscripts do not denote exponents, and multiple superscripts or subscripts do not denote single numbers (for example, \( g_{12} \) is the fully covariant component of the metric tensor with first index 1 and second index 2, not the covariant component at index 12 of a single-index tensor (vector)). Extensions to spacetime (where the convention is to use indices labeled by Greek letters), and in particular to 3 spatial + 1 temporal dimensions, are trivial. Additionally, Einstein summation will be assumed, and all tensors (including vectors & scalars) are assumed to be real-valued. Finally, I will do my best to ensure that when indices are raised or lowered, the ordering of indices is clear (as examples, distinguishing \( T^{i}_{\, j} \) from \( T_{i}^{\, j} \) instead of ambiguously using \( T^{i}_{j} \) or \( T^{j}_{i} \)), but this will depend on the quality of LaTeX rendering in this post.

Indices

A tensor represented with a contravariant index has that index raised. A tensor represented with a covariant index has that index lowered.

Metric and Kronecker delta tensors

The metric tensor is typically written in fully covariant form as \( g_{ij} \). (This depends on the spatial coordinate, but because all manipulations in this post will be at a single point in space, that dependence will be implicit/ignored). The inverse of the metric tensor is the fully contravariant representation \( g^{ij} \), defined such that \( g^{ik} g_{kj} = \delta^{i}_{\, j} \), where \( \delta^{i}_{\, j} \) is a representation of the Kronecker delta tensor (1 when \( i = j \), 0 otherwise) such that, in this case, the first index is raised & the second index is lowered. Additionally, because the metric tensor is the object that raises & lowers indices, such that \( v_{i} = g_{ij} v^{j} \) and \( v^{i} = g^{ij} v_{j} \), \( g^{i}_{\, j} \) is identical to \( \delta^{i}_{\, j} \) (and likewise \( g_{i}^{\, j} \) is identical to \( \delta_{i}^{\, j} \), which also is 1 when \( i = j \) and 0 otherwise) regardless of what \( g_{ij} \) is.
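
To make the relationship \( g^{ik} g_{kj} = \delta^{i}_{\, j} \) concrete, here is a minimal NumPy sketch; the sample metric components below are an arbitrary symmetric & invertible choice (nothing special), and the matrix inverse supplies the fully contravariant components.

    import numpy as np

    # an arbitrary symmetric, invertible (non-Euclidean) choice of g_{ij} in 2 dimensions
    g_lower = np.array([[2.0, 0.5],
                        [0.5, 1.0]])

    # g^{ij} is defined so that g^{ik} g_{kj} = delta^{i}_{j}, i.e. it is the matrix inverse
    g_upper = np.linalg.inv(g_lower)

    # contract over k and compare with the Kronecker delta (the 2x2 identity)
    delta = np.einsum('ik,kj->ij', g_upper, g_lower)
    assert np.allclose(delta, np.eye(2))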

Vectors and inner products

To be able to represent these objects using matrix notation, it is possible to start by saying that the contravariant representation of a vector \( v^{i} \) can be represented in matrix notation as a column vector \( \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \) and the covariant representation of a vector \( v_{i} \) can be represented in matrix notation as a row vector \( \begin{bmatrix} v_{1} & v_{2} \end{bmatrix} \).

The inner product between two vectors, which is invariant, can be written as \( \vec{u} \cdot \vec{v} = u_{i} v^{i} = u_{1} v^{1} + u_{2} v^{2} \). This can be written in matrix notation as a multiplication of a row by a column, namely \( \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \) (giving the same thing). This is the only matrix product whose result has fewer dimensions (indices) than the inputs and involves explicit addition of different components; all other products to be discussed are either scalar products or outer products.
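
As a quick numerical sanity check of this row-times-column rule, here is a minimal NumPy sketch (with arbitrary sample components and the same kind of hypothetical metric as above) showing that lowering an index with \( g_{ij} \) and then multiplying a row by a column reproduces \( u_{i} v^{i} \):

    import numpy as np

    g_lower = np.array([[2.0, 0.5],
                        [0.5, 1.0]])        # hypothetical g_{ij}

    u_up = np.array([1.0, 2.0])             # u^i (contravariant components)
    v_up = np.array([3.0, -1.0])            # v^i

    u_low = np.einsum('ij,j->i', g_lower, u_up)   # u_i = g_{ij} u^j

    # the invariant inner product u_i v^i as a row (1x2) times a column (2x1)
    row_times_column = u_low.reshape(1, 2) @ v_up.reshape(2, 1)
    assert np.isclose(row_times_column.item(), u_low[0] * v_up[0] + u_low[1] * v_up[1])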

Tensor representations as matrices

How would this matrix notation generalize to tensors with more than 1 index, especially when more than 1 index is simultaneously raised or lowered? The discussion at the beginning shows that it clearly can't be an ordinary matrix with rows & columns. Let us start with the case of a tensor with 2 indices that are both raised, called \( A^{ij} \). It will actually look like a column of column vectors with components arranged as \( \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} \).

Why was this particular ordering chosen? Essentially, in a tensor like the one with representation \( A^{ij} \), the first index should represent the innermost matrix structure (column if raised, row if lowered) and then each index to the right will represent a progressively more exterior matrix structure until the rightmost index represents the outermost matrix structure (in each case column if raised, row if lowered). Intuitively, this can be seen as representing \( A^{ij} \) as \( \begin{bmatrix} A^{i1} \\ A^{i2} \end{bmatrix} \) and expanding \( A^{i1} \) as its own column \( \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \) and \( A^{i2} \) as its own column \( \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \).
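
For readers who like to see this ordering rule operationally, here is a small hypothetical helper (a sketch, not standard library functionality) that nests a NumPy array so that the first index labels the innermost structure and the last index labels the outermost structure, matching the column-of-columns convention just described:

    import numpy as np

    def to_nested(T):
        # Nest a tensor so the FIRST index labels the innermost structure and the
        # LAST index labels the outermost structure (a hypothetical helper).
        if T.ndim == 1:
            return list(T)    # innermost level: plain components
        # peel off the outermost (last) index and recurse on what remains
        return [to_nested(T[..., k]) for k in range(T.shape[-1])]

    A = np.array([[11, 12],
                  [21, 22]])       # A[i-1, j-1] = A^{ij}
    print(to_nested(A))            # [[11, 21], [12, 22]]: outer entries indexed by j, inner by i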

This still might not be entirely convincing. Why does this work? Consider the vector \( v^{i} = A^{ij} u_{j} \). In this case, \( u_{j} \) is contracted with the outermost index of \( A^{ij} \), so \( u_{j} \) must be represented as a row vector multiplying the outermost column vector of the representation of \( A^{ij} \). In particular, this means \( \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} \). The first step is to use the rule of the inner product to say \( \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = u_{1} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} + u_{2} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \). This says that the inner product on the outer index of \( A^{ij} \) means that each element of the row vector representation of \( u_{j} \) is multiplying, as if it were a scalar, the inner entities of \( A^{ij} \) which are the column vectors expanded from \( A^{i1} \) and \( A^{i2} \). The second step is to evaluate these scalar multiplications & additions to say that \( \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} A^{11} u_{1} + A^{12} u_{2} \\ A^{21} u_{1} + A^{22} u_{2} \end{bmatrix} \), so the result is indeed the same as computing this using only indexed quantities.

This still seems a little too neat. What about for \( w^{i} = A^{ji} u_{j} \)? In this case, \( u_{j} \) is contracted with the innermost index of \( A^{ji} \). Intuitively, one may imagine that the row vector representation of \( u_{j} \) must multiply, as an inner product, the inner columns of \( A^{ij} \). How would this work in matrix notation? By enclosing the row vector in another set of matrix brackets, that row vector functions effectively as a scalar multiplying the inner entities in \( A^{ij} \) in the same way. This means that \( \begin{bmatrix} w^{1} \\ w^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} \). The first step is to carry out the effective scalar multiplication \( \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} \). The second step is to carry out the inner products implied by a row multiplied to the left of a column: \( \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{11} \\ A^{21} \end{bmatrix} \\ \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{12} \\ A^{22} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} u_{1} A^{11} + u_{2} A^{21} \\ u_{1} A^{12} + u_{2} A^{22} \end{bmatrix} \).
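
Both of these contractions can be checked numerically; the sketch below uses arbitrary sample components, with einsum standing in purely as an index-bookkeeping tool, and confirms that contracting the outer index versus the inner index gives the two different componentwise results worked out above.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])       # A[i-1, j-1] = A^{ij}
    u = np.array([5.0, 6.0])         # u_j (covariant components)

    v = np.einsum('ij,j->i', A, u)   # v^i = A^{ij} u_j: contracts the outer (second) index
    w = np.einsum('ji,j->i', A, u)   # w^i = A^{ji} u_j: contracts the inner (first) index

    assert np.allclose(v, [A[0, 0] * u[0] + A[0, 1] * u[1],
                           A[1, 0] * u[0] + A[1, 1] * u[1]])
    assert np.allclose(w, [A[0, 0] * u[0] + A[1, 0] * u[1],
                           A[0, 1] * u[0] + A[1, 1] * u[1]])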

In many cases, this procedure of taking the inner product of two tensors (potentially with different numbers of indices) can be generalized to more than 1 or 2 indices. However, there are cases where this might not be easily generalizable. For example, in \( A^{jik} B_{jk} \), contraction is happening in the first & third indices simultaneously for \( A^{jik} \), but \( B_{jk} \) only has 2 indices (both of which are being contracted with indices of \( A^{jik} \)), so I think it will be hard to consistently represent \( B_{jk} \) with the correct matrix structures to make this work. One solution is to use the properties of the Kronecker delta tensor to say that \( A^{jik} B_{jk} = \delta^{j}_{\, p} \delta^{k}_{\, q} A^{piq} B_{lm} \delta^{l}_{\, j} \delta^{m}_{\, k} \). I will show how this works in practice in following sections with respect to use of the Kronecker delta tensor. More pressingly, this requires forming the outer product \( A^{piq} B_{lm} \), which is a tensor with 5 indices (of which the first 3 are raised & the remaining 2 are lowered). This can be done effectively by multiplying each element of \( B_{lm} \) with the full representation of \( A^{piq} \), as a column of columns of columns, as if it were a scalar. As an example, if \( A^{ij} = a^{i} b^{j} \), this can be represented as \( \begin{bmatrix} \begin{bmatrix} a^{1} b^{1} \\ a^{2} b^{1} \end{bmatrix} \\ \begin{bmatrix} a^{1} b^{2} \\ a^{2} b^{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b^{1} \\ \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b^{2} \end{bmatrix} \); effectively, \( b^{j} \) represented as the column \( \begin{bmatrix} b^{1} \\ b^{2} \end{bmatrix} \) has multiplicatively absorbed the column \( a^{i} \) represented as \( \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} \). This representation crucially depends on the consistent ordering of indices when contractions occur, so some tensor multiplications require more careful accounting of matrix structures at different levels. For example, if \( A^{ijk} = B^{ik} c^{j} \), the contravariant representation of the vector \( c^{j} \) is being multiplied by effectively being inserted between the indices of \( B^{ik} \), yet there is no guarantee that \( B^{ik} \) can be represented as the product of the contravariant representations of just two vectors. 
Instead, if one is to represent \( A^{ijk} \) as \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{111} \\ A^{211} \end{bmatrix} \\ \begin{bmatrix} A^{121} \\ A^{221} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} A^{112} \\ A^{212} \end{bmatrix} \\ \begin{bmatrix} A^{122} \\ A^{222} \end{bmatrix} \end{bmatrix} \end{bmatrix} \), then one must change the representation of \( B^{ik} \) from \( \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \\ \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \) to be effectively 3 levels deep in which the second level is a dummy/scalar structure of the form \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \end{bmatrix} \) and the representation of \( c^{j} \) from \( \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \) to be 2 levels deep as \( \begin{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix} \) such that the outer level treats \( c^{j} \) like a scalar and only the level below that treats it as a column vector which can undergo an outer product. In particular, the product \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix} \) can be evaluated by taking the effective scalar \( \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \) one level into the expanded 3-level representation of \( B^{ik} \), yielding \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \end{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \end{bmatrix} \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \end{bmatrix} \). Next, the column vectors \( \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} \) and \( \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} \) effectively function as scalars multiplying the individual components of \( \begin{bmatrix} c^{1} \\ c^{2} \end{bmatrix} \), yielding \( \begin{bmatrix} \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} c^{1} \\ \begin{bmatrix} B^{11} \\ B^{21} \end{bmatrix} c^{2} \end{bmatrix} \) and \( \begin{bmatrix} \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} c^{1} \\ \begin{bmatrix} B^{12} \\ B^{22} \end{bmatrix} c^{2} \end{bmatrix} \) respectively. Performing the final multiplications and putting these things together yields \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} A^{111} \\ A^{211} \end{bmatrix} \\ \begin{bmatrix} A^{121} \\ A^{221} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} A^{112} \\ A^{212} \end{bmatrix} \\ \begin{bmatrix} A^{122} \\ A^{222} \end{bmatrix} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} B^{11} c^{1} \\ B^{21}  c^{1} \end{bmatrix} \\ \begin{bmatrix} B^{11} c^{2} \\ B^{21} c^{2} \end{bmatrix} \end{bmatrix} \\ \begin{bmatrix} \begin{bmatrix} B^{12} c^{1} \\ B^{22} c^{1} \end{bmatrix} \\ \begin{bmatrix} B^{12} c^{2} \\ B^{22} c^{2} \end{bmatrix} \end{bmatrix} \end{bmatrix} \) as expected & desired.
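
The same outer-product bookkeeping can be spot-checked numerically; in the sketch below (arbitrary sample components, with einsum standing in for the dummy-bracket manipulations), the middle index of \( A^{ijk} = B^{ik} c^{j} \) comes from \( c^{j} \), exactly as in the expansion above.

    import numpy as np

    B = np.array([[1.0, 2.0],
                  [3.0, 4.0]])           # B[i-1, k-1] = B^{ik}
    c = np.array([5.0, 6.0])             # c^j

    A = np.einsum('ik,j->ijk', B, c)     # A^{ijk} = B^{ik} c^{j} (outer product)

    # spot-check a couple of components against the expanded block result above
    assert np.isclose(A[0, 1, 0], B[0, 0] * c[1])   # A^{121} = B^{11} c^{2}
    assert np.isclose(A[1, 0, 1], B[1, 1] * c[0])   # A^{212} = B^{22} c^{1}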

Mixed indices and self-contractions

By now, it should be clear that there are no traditional matrices that are just arrays of numbers in rectangles. Instead, any tensor represented using matrix notation should be a scalar, a row, or a column, in which the elements of the rows & columns should themselves be scalars, rows, or columns, to arbitrary depth. This means that there could be a difference, for example, between \( A^{i}_{\, j} \) represented using matrix notation as \( \begin{bmatrix} \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} & \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} \end{bmatrix} \) and \( A_{i}^{\, j} \) represented using matrix notation as \( \begin{bmatrix} \begin{bmatrix} A_{1}^{\, 1} & A_{2}^{\, 1} \end{bmatrix} \\ \begin{bmatrix} A_{1}^{\, 2} & A_{2}^{\, 2} \end{bmatrix} \end{bmatrix} \); the former is a row whose elements are columns, while the latter is a column whose elements are rows. This fact as well as the use of dummy brackets to imply scalar-like multiplication of tensors by elements of other tensors (yielding outer products) ensures that if a row is multiplied by a column without any dummy brackets enclosing the outside of either one, that multiplication can be assumed to be an inner product regardless of ordering; this is because for the outer product \( a^{i} b_{j} \) to be represented with the first index in the innermost structure, the multiplication must be written in matrix notation as \( \begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} b_{1} & b_{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b_{1} & \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} b_{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a^{1} b_{1} \\ a^{2} b_{1} \end{bmatrix} & \begin{bmatrix} a^{1} b_{2} \\ a^{2} b_{2} \end{bmatrix} \end{bmatrix} \) which is a row of columns, while for the outer product \( a_{i} b^{j} \) to be represented with the first index in the innermost structure, the multiplication must be written in matrix notation as \( \begin{bmatrix} \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} b^{1} \\ b^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} b^{1} \\ \begin{bmatrix} a_{1} & a_{2} \end{bmatrix} b^{2} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{1} b^{1} & a_{2} b^{1} \end{bmatrix} \\ \begin{bmatrix} a_{1} b^{2} & a_{2} b^{2} \end{bmatrix} \end{bmatrix} \) which is a column of rows, so interpreting something like \( \begin{bmatrix} a^{1} \\ a^{2} \end{bmatrix} \begin{bmatrix} b_{1} & b_{2} \end{bmatrix} \) as an outer product (as would traditionally be done in matrix algebra) or anything other than an inner product leads to ambiguity.

This representation of 2-index tensors in which one index is raised & the other is lowered leads to a nice connection to ideas from traditional matrix algebra. In particular, \( A^{i}_{\, j} v^{j} \) can be represented as \( \begin{bmatrix} \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} & \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} = \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} v^{1} + \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} v^{2} = \) \( \begin{bmatrix} A^{1}_{\, 1} v^{1} + A^{1}_{\, 2} v^{2} \\ A^{2}_{\, 1} v^{1} + A^{2}_{\, 2} v^{2} \end{bmatrix} \) which is a more precise way to show how in matrix algebra, multiplying a matrix by a column vector to the right of a matrix can be seen as adding the columns of the matrix weighted by the elements of the column vector. Similarly, \( A_{i}^{\, j} u_{j} \) can be represented as \( \begin{bmatrix} \begin{bmatrix} A_{1}^{\, 1} & A_{2}^{\, 1} \end{bmatrix} \\ \begin{bmatrix} A_{1}^{\, 2} & A_{2}^{\, 2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} = u_{1} \begin{bmatrix} A_{1}^{\, 1} & A_{2}^{\, 1} \end{bmatrix} + u_{2} \begin{bmatrix} A_{1}^{\, 2} & A_{2}^{\, 2} \end{bmatrix} = \) \( \begin{bmatrix} A_{1}^{\, 1} u_{1}+ A_{1}^{\, 2} u_{2} & A_{2}^{\, 1} u_{1}+ A_{2}^{\, 2} u_{2} \end{bmatrix} \) which is a more precise way to show how in matrix algebra, multiplying a matrix by a row vector to the left of a matrix can be seen as adding the rows of the matrix weighted by the elements of the row vector. It is also possible to compute \( A^{i}_{\, j} u_{i} \), which can be represented as \( \begin{bmatrix} \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} & \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} & \begin{bmatrix} u_{1} & u_{2} \end{bmatrix} \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} \end{bmatrix} = \) \( \begin{bmatrix} A^{1}_{\, 1} u_{1} + A^{2}_{\, 1} u_{2} & A^{1}_{\, 2} u_{1} + A^{2}_{\, 2} u_{2} \end{bmatrix} \), as well as \( A_{i}^{\, j} v^{i} \), which can be represented as \( \begin{bmatrix} \begin{bmatrix} A_{1}^{\, 1} & A_{2}^{\, 1} \end{bmatrix} \\ \begin{bmatrix} A_{1}^{\, 2} & A_{2}^{\, 2} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} A_{1}^{\, 1} & A_{2}^{\, 1} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \\ \begin{bmatrix} A_{1}^{\, 2} & A_{2}^{\, 2} \end{bmatrix} \begin{bmatrix} v^{1} \\ v^{2} \end{bmatrix} \end{bmatrix} = \) \( \begin{bmatrix} A_{1}^{\, 1} v^{1} + A_{2}^{\, 1} v^{2} \\ A_{1}^{\, 2} v^{1} + A_{2}^{\, 2} v^{2} \end{bmatrix} \), though perhaps as expected, these equations do not carry over neatly from traditional matrix algebra and therefore do not have a corresponding easy interpretation.
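
The connection to traditional matrix algebra can also be phrased numerically: with arbitrary sample components, the mixed-index contraction \( A^{i}_{\, j} v^{j} \) is just the familiar matrix-times-column product, and \( A^{i}_{\, j} u_{i} \) is the familiar row-times-matrix product (a sketch, with the array M holding the mixed components):

    import numpy as np

    M = np.array([[1.0, 2.0],
                  [3.0, 4.0]])       # M[i-1, j-1] = A^{i}_{j} (mixed components)
    v = np.array([5.0, 6.0])         # v^j
    u = np.array([7.0, 8.0])         # u_i

    # A^{i}_{j} v^{j}: the familiar matrix-times-column-vector product
    assert np.allclose(np.einsum('ij,j->i', M, v), M @ v)

    # A^{i}_{j} u_{i}: the familiar row-vector-times-matrix product, yielding covariant components
    assert np.allclose(np.einsum('ij,i->j', M, u), u @ M)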

It is common for 2-index tensors, in which one index is raised and the other is lowered, to have those indices contracted with each other, yielding an [invariant] scalar. A contraction of \( A^{i}_{\, j} \) would look like \( A^{k}_{\, k} \), and a contraction of \( A_{i}^{\, j} \) would look like \( A_{k}^{\, k} \). Because these are not traditional matrices but are columns of rows or rows of columns, it is better to avoid falling back upon the traditional matrix definition of a trace as the sum of diagonal elements, even though \( A^{k}_{\, k} \) looks like the expression \( \sum_{k} \langle u^{(k)}, \hat{A} u^{(k)} \rangle \) for a linear operator \( \hat{A} \) and orthonormal basis vectors \( u^{(k)} \). As it turns out, the Kronecker delta tensors \( \delta^{i}_{\, j} \) behave as an orthonormal basis of row vectors when \( i \) is fixed, and the Kronecker delta tensors \( \delta_{i}^{\, j} \) behave as an orthonormal basis of column vectors when \( i \) is fixed. This means that \( A^{k}_{\, k} = \delta^{k}_{\, i} A^{i}_{\, j} \delta^{j}_{\, k} \) and we can explicitly expand the summation as \( \delta^{k}_{\, i} A^{i}_{\, j} \delta^{j}_{\, k} = \delta^{1}_{\, i} A^{i}_{\, j} \delta^{j}_{\, 1} + \delta^{2}_{\, i} A^{i}_{\, j} \delta^{j}_{\, 2} \). As usual, \( A^{i}_{\, j} \) can be represented in matrix notation as \( \begin{bmatrix} \begin{bmatrix} A^{1}_{\, 1} \\ A^{2}_{\, 1} \end{bmatrix} & \begin{bmatrix} A^{1}_{\, 2} \\ A^{2}_{\, 2} \end{bmatrix} \end{bmatrix} \). Meanwhile, the Kronecker delta tensors with one fixed index can be represented as follows: \( \delta^{1}_{\, i} \) looks like \( \begin{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \end{bmatrix} \) because the free index \( i \) is contracted with the inner index of \( A^{i}_{\, j} \) (so a dummy set of brackets is needed for the outer index so that this row vector can be multiplied as an inner product with the interior column vectors inside \( A^{i}_{\, j} \)), \( \delta^{2}_{\, i} \) looks like \( \begin{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \end{bmatrix} \) for similar reasons as \( \delta^{1}_{\, i} \), \( \delta^{j}_{\, 1} \) looks like \( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \) with no need for extra dummy brackets because the contracted index \( j \) is the outer index of \( A^{i}_{\, j} \), and \( \delta^{j}_{\, 2} \) looks like \( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \) for similar reasons as \( \delta^{j}_{\, 1} \). Carrying out these products and sums yields \( A^{k}_{\, k} = A^{1}_{\, 1} + A^{2}_{\, 2} \) as expected. This process of enclosing the Kronecker delta tensors' vector constituents in as many dummy brackets as needed to match the position of the corresponding contracted index in another tensor can be generalized to tensors with arbitrary numbers of indices in which an arbitrary index may be contracted. That can be helpful not only for self-contraction but also for contraction with other tensors.
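
As a numerical check of the delta-based contraction (a sketch with arbitrary sample components), contracting with the Kronecker delta components on both sides reproduces the sum \( A^{1}_{\, 1} + A^{2}_{\, 2} \):

    import numpy as np

    M = np.array([[1.0, 2.0],
                  [3.0, 4.0]])       # M[i-1, j-1] = A^{i}_{j}
    delta = np.eye(2)                # Kronecker delta components

    # A^{k}_{k} = delta^{k}_{i} A^{i}_{j} delta^{j}_{k}, with all repeated indices summed
    trace_via_deltas = np.einsum('ki,ij,jk->', delta, M, delta)
    assert np.isclose(trace_via_deltas, M[0, 0] + M[1, 1])   # = A^{1}_{1} + A^{2}_{2}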

Examples

The Riemann curvature tensor is typically written as \( R^{i}_{\, jkl} \); it has 4 indices, but only the first index is raised (and some conventions quote this tensor with that index lowered too). The Ricci curvature tensor is written as a self-contraction of the Riemann curvature tensor between its first & third indices, so \( R_{ij} = R^{k}_{\, ikj} \). This can be rewritten as \( R^{k}_{\, ikj} = \delta^{k}_{\, m} R^{m}_{\, ilj} \delta^{l}_{\, k} \) where \( R^{m}_{\, ilj} \) can be written in matrix form as \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} R^{1}_{\, 111} \\ R^{2}_{\, 111} \end{bmatrix} & \begin{bmatrix} R^{1}_{\, 211} \\ R^{2}_{\, 211} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} R^{1}_{\, 121} \\ R^{2}_{\, 121} \end{bmatrix} & \begin{bmatrix} R^{1}_{\, 221} \\ R^{2}_{\, 221} \end{bmatrix} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} R^{1}_{\, 112} \\ R^{2}_{\, 112} \end{bmatrix} & \begin{bmatrix} R^{1}_{\, 212} \\ R^{2}_{\, 212} \end{bmatrix} \end{bmatrix} & \begin{bmatrix} \begin{bmatrix} R^{1}_{\, 122} \\ R^{2}_{\, 122} \end{bmatrix} & \begin{bmatrix} R^{1}_{\, 222} \\ R^{2}_{\, 222} \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix} \). The Kronecker delta tensor constituents \( \delta^{k}_{\, m} \) are row vectors contracted with the innermost index constituents (column vectors) of \( R^{m}_{\, ilj} \), so \( \delta^{1}_{\, m} \) looks like \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} 1 & 0 \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix} \) and \( \delta^{2}_{\, m} \) looks like \( \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} \begin{bmatrix} 0 & 1 \end{bmatrix} \end{bmatrix} \end{bmatrix} \end{bmatrix} \). The Kronecker delta tensor constituents \( \delta^{l}_{\, k} \) are column vectors contracted with the third index constituents (row vectors) of \( R^{m}_{\, ilj} \), so \( \delta^{l}_{\, 1} \) looks like \( \begin{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} \end{bmatrix} \) and \( \delta^{l}_{\, 2} \) looks like \( \begin{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} \end{bmatrix} \). Carrying out these inner products yields \( R_{ij} \) represented as \( \begin{bmatrix} \begin{bmatrix} R^{1}_{\, 111} + R^{2}_{\, 121} & R^{1}_{\, 211} + R^{2}_{\, 221} \end{bmatrix} & \begin{bmatrix} R^{1}_{\, 112} + R^{2}_{\, 122} & R^{1}_{\, 212} + R^{2}_{\, 222} \end{bmatrix} \end{bmatrix} \) as expected and desired.
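
This self-contraction can be checked with index bookkeeping alone; the sketch below fills \( R^{k}_{\, ilj} \) with random numbers (so it is not an actual curvature tensor, just a stand-in for the index manipulations) and confirms that contracting the first and third indices reproduces the components written out above.

    import numpy as np

    rng = np.random.default_rng(0)
    Riem = rng.normal(size=(2, 2, 2, 2))   # stand-in components R^{k}_{ilj}; random, NOT a real curvature tensor

    # R_{ij} = R^{k}_{ikj}: contract the first index with the third
    Ricci = np.einsum('kikj->ij', Riem)

    # spot-check one component against the expansion above
    assert np.isclose(Ricci[0, 0], Riem[0, 0, 0, 0] + Riem[1, 0, 1, 0])   # R_{11} = R^{1}_{111} + R^{2}_{121}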

The Ricci scalar \( R = R_{ij} g^{ij} \) can be computed quite simply as \( R = \begin{bmatrix} \begin{bmatrix} R_{11} & R_{21} \end{bmatrix} & \begin{bmatrix} R_{12} & R_{22} \end{bmatrix} \end{bmatrix} \begin{bmatrix} \begin{bmatrix} g^{11} \\ g^{21} \end{bmatrix} \\ \begin{bmatrix} g^{12} \\ g^{22} \end{bmatrix} \end{bmatrix} \) because the relevant tensors have the same number of indices and the contracted indices are in the same places for the relevant tensors (meaning the outermost column structure of \( g^{ij} \) undergoes an inner product with the outermost row structure of \( R_{ij} \) and then the innermost column structure of \( g^{ij} \) undergoes an inner product with the innermost row structure of \( R_{ij} \)). As expected, this gives \( R = R_{11} g^{11} + R_{21} g^{21} + R_{12} g^{12} + R_{22} g^{22} \). If instead one wanted to compute something like \( A_{ij} B^{ji} \) without knowing whether those tensors are symmetric with respect to their indices (unlike \( g^{ij} \) which is known to be symmetric in its indices), one must instead construct \( C_{ij}^{\,\, kl} = A_{ij} B^{kl} \) and then perform the contraction \( C_{ij}^{\,\, ji} = \delta_{i}^{\, m} \delta_{j}^{\, n} C_{mn}^{\, \, kl} \delta_{k}^{\, j} \delta_{l}^{\, i} \) using the methods described above, as that is a reliable method of doing such contractions.
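
The double contraction at the end can likewise be verified numerically; the sketch below (arbitrary non-symmetric sample components) confirms that forming the outer product \( C_{ij}^{\,\, kl} = A_{ij} B^{kl} \) and then contracting \( i \) with \( l \) and \( j \) with \( k \) reproduces \( A_{ij} B^{ji} \).

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(2, 2))     # A_{ij} (generic, not symmetric)
    B = rng.normal(size=(2, 2))     # B^{kl} (generic, not symmetric)

    direct = np.einsum('ij,ji->', A, B)      # A_{ij} B^{ji} computed directly

    # the outer-product route: C_{ij}^{kl} = A_{ij} B^{kl}, then contract to C_{ij}^{ji}
    C = np.einsum('ij,kl->ijkl', A, B)
    via_outer = np.einsum('ijji->', C)

    assert np.isclose(direct, via_outer)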

Conclusions

  1. It is technically possible to recast tensors with raised or lowered indices in terms of generalized row & column vectors whose constituents may be rows, columns, or numbers, and with suitable definitions of outer products (including the use of dummy brackets) & inner products, recover all of the relevant tensor manipulations.
  2. Doing this can help illuminate how application of the fully covariant representation of a metric tensor can change a contravariant representation of a vector or tensor to a covariant representation, specifically by changing a column vector into a row vector, in a way that traditional matrix algebra cannot do by itself.
  3. Matrix notation is ultimately much more helpful for objects with only 1 or 2 indices where the indices can take on arbitrarily large values. In practice, it isn't helpful for objects with 3 or more indices, and it becomes easier to just directly manipulate objects with indices instead of trying to force-fit this machinery into matrix notation for its own sake.