In this post, I discuss Schur-Weyl duality, its proof and some of its applications to the field of quantum information. With all the background, this post ended up being longer than I anticipated. I might stick to shorter posts in the future. There is some material that may be new, such as the dual version for the Schur transform and the description of immanants. I have a feeling that this may be known to some experts and just not written up anywhere.
1. Statement of Schur-Weyl duality
Schur-Weyl duality (in one form) says that the diagonal action of the general linear group commutes with the action of the symmetric group on the tensor power of a finite dimensional vector space. Consider the $n$-fold tensor product $(\mathbb{C}^d)^{\otimes n}$ of a finite dimensional vector space $\mathbb{C}^d$ of dimension $d$. We have the diagonal action of the general linear group $GL(d)$, which applies an invertible operator $A$ to each tensor copy as $A^{\otimes n}$. For this action of the general linear group, we are interested in the centralizer algebra, i.e., $\{M \in \mathrm{End}((\mathbb{C}^d)^{\otimes n}) : [M, A^{\otimes n}] = 0 \text{ for all } A \in GL(d)\}$. It is easy to see that the operators that permute the tensor copies are in this algebra, and the question is: are there more? Schur-Weyl duality is the statement that there are no more operators, and so this centralizer algebra is isomorphic to the image of the symmetric group algebra $\mathbb{C}[S_n]$. We will actually consider the action of the so-called universal enveloping algebra $U(\mathfrak{gl}_d)$ of the Lie algebra $\mathfrak{gl}_d$ instead of the general linear group. Schur-Weyl duality can then be stated as
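Before the proof, here is a small numerical sanity check of this statement for $d = 2$, $n = 2$ (an illustrative sketch I am adding, not part of the proof): the span of $\{A \otimes A\}$ and the commutant of the swap operator have the same dimension, $3^2 + 1^2 = 10$, inside the 16-dimensional algebra $\mathrm{End}(\mathbb{C}^2 \otimes \mathbb{C}^2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

# Swap operator on C^d (x) C^d (the S_2 action permuting the tensor factors).
S = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        S[j * d + i, i * d + j] = 1.0

# Dimension of the commutant {X : XS = SX}, i.e. the nullspace of X -> XS - SX.
# With column-stacking vec: vec(XS) = (S^T kron I) vec(X), vec(SX) = (I kron S) vec(X).
L = np.kron(S.T, np.eye(d * d)) - np.kron(np.eye(d * d), S)
commutant_dim = (d * d) ** 2 - np.linalg.matrix_rank(L)

# Dimension of span{A kron A}: stack vectorized random samples, take the rank.
samples = []
for _ in range(30):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    samples.append(np.kron(A, A).flatten())
span_dim = np.linalg.matrix_rank(np.array(samples))

print(commutant_dim, span_dim)  # both are 10
```

The two dimensions agree, as the duality predicts: everything commuting with the swap is a limit of linear combinations of diagonal actions.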
Theorem 1 The image of $U(\mathfrak{gl}_d)$ and the image of $\mathbb{C}[S_n]$ in $\mathrm{End}((\mathbb{C}^d)^{\otimes n})$ are centralizers of each other.
Before we prove this theorem, let us recall some facts about universal enveloping algebras of Lie algebras.
2. Universal enveloping algebras
A universal enveloping algebra of a Lie algebra $\mathfrak{g}$ is a unital associative algebra whose associated Lie algebra (constructed below) is $\mathfrak{g}$. In order to motivate this, first recall that a Lie algebra is not associative; in fact, the Jacobi identity expresses this non-associativity. But why do we care about having an associative algebra? The main reason is that we are usually interested in representations of Lie algebras rather than the algebras themselves. This means that if $\rho$ is a representation of the Lie algebra, then one would like to say that $\rho(X \cdot Y) = \rho(X)\rho(Y)$, where $\cdot$ is the multiplication in the Lie algebra. But this is not the case with Lie algebra representations. We would like to have an associative algebra, so that we have this property for its representations.
Now, from any associative algebra $A$, one can construct a Lie algebra $L(A)$ by simply defining the Lie bracket to be the commutator, i.e., $[X, Y] = XY - YX$. It is easy to check that this satisfies the axioms of a Lie algebra. So, a natural question is the converse: given a Lie algebra $\mathfrak{g}$, what is an associative algebra $U$ such that $L(U)$ contains $\mathfrak{g}$? We do not know a priori whether such an object exists and, even if it exists, it may not be unique. For some problems, existence can often be shown by a direct construction, but uniqueness can become problematic. In order to solve the uniqueness problem, we require that the object we are searching for satisfy a universal property. This universal property says that if we find any other algebra $U'$ with the same property, then there exists a unique isomorphism between $U$ and $U'$ compatible with the embeddings of $\mathfrak{g}$. This guarantees that the algebra we find is the only such algebra (up to a unique isomorphism). This also ensures that the representations of the Lie algebra (which are essentially Lie algebra homomorphisms from $\mathfrak{g}$ to some $L(\mathrm{End}(V))$) are in one to one correspondence with the representations of $U(\mathfrak{g})$. This also means that Schur-Weyl duality, which we will prove for the action of $U(\mathfrak{gl}_d)$, holds for the action of $\mathfrak{gl}_d$ (and also for the Lie group $GL(d)$). Now for the formal definition of $U(\mathfrak{g})$.
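As a quick numerical illustration of the first claim (a sketch of my own, with random matrices standing in for a generic associative algebra): the commutator bracket is antisymmetric and satisfies the Jacobi identity.

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y, Z = (rng.normal(size=(4, 4)) for _ in range(3))

def bracket(a, b):
    # The commutator in the associative matrix algebra.
    return a @ b - b @ a

antisym = np.linalg.norm(bracket(X, Y) + bracket(Y, X))
jacobi = np.linalg.norm(
    bracket(bracket(X, Y), Z) + bracket(bracket(Y, Z), X) + bracket(bracket(Z, X), Y)
)
print(antisym, jacobi)  # both vanish up to rounding
```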
Definition 2 Given a Lie algebra $\mathfrak{g}$, a unital associative algebra $U(\mathfrak{g})$ (together with a Lie algebra homomorphism $\varphi: \mathfrak{g} \to L(U(\mathfrak{g}))$) is said to be a universal enveloping algebra of $\mathfrak{g}$ if it satisfies the following universal property: for any unital associative algebra $A$ and Lie algebra homomorphism $\psi: \mathfrak{g} \to L(A)$, there exists a unique algebra homomorphism $\pi: U(\mathfrak{g}) \to A$ such that $\psi = \pi \circ \varphi$.
Here we will construct the universal enveloping algebra, but we will not prove the universal property. This is proved, for instance, in chapter 3 of Knapp’s book. From any vector space $V$ (such as the underlying vector space of $\mathfrak{g}$), one can construct an associative algebra called the tensor algebra $T(V)$, which satisfies a universal property with respect to $V$. The tensor algebra is defined as

$\displaystyle T(V) = \bigoplus_{k \ge 0} T^k(V),$

where $T^k(V) = V^{\otimes k}$ is the $k$-fold tensor product of $V$ with itself. So, for instance, $T^0(V)$ is the ground field and $T^1(V)$ is $V$. The tensor algebra is a graded algebra, which means that it is the direct sum of the subspaces $T^k(V)$ and $T^i(V) \cdot T^j(V) \subseteq T^{i+j}(V)$. A filtered algebra is more general; it comes with subspaces $F^0 \subseteq F^1 \subseteq \cdots$ such that $F^i \cdot F^j \subseteq F^{i+j}$, but the algebra need not be the direct sum of the $F^k$. The tensor algebra has two interesting algebras as its quotients. The symmetric algebra $S(V)$ (which we will encounter again later in the proof of Schur-Weyl duality) and the exterior algebra $\Lambda(V)$ are quotients of the tensor algebra by the ideals generated by $x \otimes y - y \otimes x$ and $x \otimes y + y \otimes x$ respectively (for $x, y \in V$). It is easy to see that the symmetric and exterior algebras are graded with $S(V) = \bigoplus_k S^k(V)$ and $\Lambda(V) = \bigoplus_k \Lambda^k(V)$. It turns out that one can construct the universal enveloping algebra of $\mathfrak{g}$ as a quotient of $T(\mathfrak{g})$ as well. To include the Lie structure in this algebra, we impose the Lie bracket by modding out by the ideal generated by $x \otimes y - y \otimes x - [x, y]$ for $x, y \in \mathfrak{g}$. So, the universal enveloping algebra is $U(\mathfrak{g}) = T(\mathfrak{g})/\langle x \otimes y - y \otimes x - [x, y]\rangle$. Since this ideal is not generated by homogeneous elements as in the symmetric and exterior algebras, the quotient is filtered but not graded. The filtration is given by the images $U^k$ of $\bigoplus_{j \le k} T^j(\mathfrak{g})$. For any filtered algebra, one can construct the so-called associated graded algebra. In the case of $U(\mathfrak{g})$, the associated graded algebra turns out to be isomorphic to the symmetric algebra $S(\mathfrak{g})$. This is part of the content of the Poincaré-Birkhoff-Witt (PBW) theorem. The PBW theorem also gives a basis for $U(\mathfrak{g})$ in terms of an ordered basis for $\mathfrak{g}$. To construct the associated graded algebra of $U(\mathfrak{g})$, we define the grading as $\mathrm{gr}^k = U^k/U^{k-1}$. This can be shown to be isomorphic to the grading of the symmetric algebra. Using the PBW theorem, one can show that $U(\mathfrak{g})$ is spanned by the elements $X_{i_1} X_{i_2} \cdots X_{i_k}$, where $i_1 \le i_2 \le \cdots \le i_k$ and $\{X_i\}$ is an ordered basis for $\mathfrak{g}$. Now, we are ready to prove Schur-Weyl duality.
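As a quick numerical aside on these graded pieces (a sketch I am adding; the choices $d = 3$, $k = 2$ are mine): the degree-2 components $S^2(V)$ and $\Lambda^2(V)$ have dimensions $\binom{d+1}{2}$ and $\binom{d}{2}$, realized as ranks of the symmetrizer and antisymmetrizer on $V \otimes V$.

```python
import numpy as np
from math import comb

d = 3
S = np.zeros((d * d, d * d))  # swap operator on V (x) V
for i in range(d):
    for j in range(d):
        S[j * d + i, i * d + j] = 1.0

P_sym = (np.eye(d * d) + S) / 2   # projector onto S^2(V)
P_alt = (np.eye(d * d) - S) / 2   # projector onto Lambda^2(V)

sym_rank = np.linalg.matrix_rank(P_sym)
alt_rank = np.linalg.matrix_rank(P_alt)
print(sym_rank, comb(d + 1, 2))  # 6 6
print(alt_rank, comb(d, 2))      # 3 3
```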
3. Proof of Schur-Weyl duality
As mentioned earlier, we will prove this duality between $U(\mathfrak{gl}_d)$ and $\mathbb{C}[S_n]$. But this can be extended to the duality between $\mathfrak{gl}_d$ and $\mathbb{C}[S_n]$, as well as the one between $GL(d)$ and $\mathbb{C}[S_n]$.
Proof: Denote the space $(\mathbb{C}^d)^{\otimes n}$ by $V$, the image of $\mathbb{C}[S_n]$ in $\mathrm{End}(V)$ by $B$, and the image of $U(\mathfrak{gl}_d)$ by $A$. Now, observe that from Maschke’s theorem, it follows that $B$ is semisimple. This means that one can decompose $V$ into irreducible invariant spaces of $B$ in the following way:

$\displaystyle V \cong \bigoplus_\lambda S_\lambda \otimes \mathrm{Hom}_{S_n}(S_\lambda, V),$

where the $S_\lambda$ are (spaces corresponding to) irreducible representations of $S_n$ and $\mathrm{Hom}_{S_n}(S_\lambda, V)$ denotes the equivariant maps, i.e., the set of maps embedding $S_\lambda$ into $V$, or the multiplicity spaces. Applying Schur’s lemma, we see that one can identify the centralizer $B'$ of $B$ with the direct sum $\bigoplus_\lambda \mathbb{1}_{S_\lambda} \otimes \mathrm{End}(\mathrm{Hom}_{S_n}(S_\lambda, V))$. In particular, this means that $B'$ is semisimple.
The next step is to observe that, since we are dealing with the commutant of the symmetric group, $B'$ is the symmetric subspace $S^n(\mathrm{End}(\mathbb{C}^d)) \subseteq \mathrm{End}(\mathbb{C}^d)^{\otimes n}$. Now, let us look at the image of the universal enveloping algebra on $(\mathbb{C}^d)^{\otimes n}$. The image is generated by elements of the form

$\displaystyle \Delta(X) = \sum_{i=1}^{n} \mathbb{1}^{\otimes (i-1)} \otimes X \otimes \mathbb{1}^{\otimes (n-i)},$

where $X$ lies in $\mathfrak{gl}_d$. It is clear that this element lies in $S^n(\mathrm{End}(\mathbb{C}^d))$, so the image $A$ is contained in $B'$. If we show that the kernel of the map $U(\mathfrak{gl}_d) \to \mathrm{End}((\mathbb{C}^d)^{\otimes n})$ contains no more elements than necessary, then the image is all of $S^n(\mathrm{End}(\mathbb{C}^d))$, which matches the degree $n$ part of the associated graded algebra by the PBW theorem, and we would be done. In order to show that there are no more elements in the kernel, we can examine the elements in $S^n(\mathfrak{gl}_d)$. These are spanned by elements of the type $X^{\otimes n}$, where $X \in \mathfrak{gl}_d$. It can be seen that these are not in the kernel.
4. Irreducible representations of Lie algebras
For any simple Lie algebra $\mathfrak{g}$, there exists a set of elements $H_1, \dots, H_r$ in the algebra such that $[H_i, H_j] = 0$ (this is sometimes confusingly stated as commutation of $H_i$ and $H_j$. The bracket here is multiplication in the Lie algebra and not a commutator. But if one considers a matrix representation of $\mathfrak{g}$, then one could think of it as commutation. Perhaps a better way is to think of the $H_i$s as commuting operators in the adjoint action, as we will see below). The maximal such set spans the Cartan subalgebra $\mathfrak{h}$, whose dimension $r$ is called the rank of the Lie algebra. This plays a central role in the structure of a Lie algebra. Since $\mathfrak{g}$ is a vector space, there also exists an action of the algebra on itself via the Lie bracket. This action is called the adjoint representation and can be written as $\mathrm{ad}_X |Y\rangle = |[X, Y]\rangle$, for any $X, Y$ in the Lie algebra thought of as states. Now, in terms of the adjoint representation, we can see that the $\mathrm{ad}_{H_i}$ commute with each other (we need to use the Jacobi identity to see this). This means that they are simultaneously diagonalizable. Moreover, it can be seen that all the $H_j$ are in the null space of each of the $\mathrm{ad}_{H_i}$. If we call the eigenvectors spanning the rest of the eigenspaces $E_\alpha$ (where $\alpha$ ranges over some set), we have that $\mathrm{ad}_{H_i} |E_\alpha\rangle = \alpha_i |E_\alpha\rangle$ for some eigenvalue $\alpha_i$. It turns out that these eigenspaces are one-dimensional (if they are not, then it means that we have not chosen a maximal set of commuting operators for $\mathfrak{h}$). It is easy to show that the $[E_\alpha, E_\beta]$ are eigenvectors with eigenvalues $\alpha_i + \beta_i$ and that $[E_\alpha, E_{-\alpha}]$ has eigenvalue $0$. If we define $H_\alpha$ as $[E_\alpha, E_{-\alpha}]$, then $E_\alpha$, $E_{-\alpha}$ and $H_\alpha$ form an $\mathfrak{sl}_2$ subalgebra inside this Lie algebra. This is useful to determine the structure of finite dimensional representations of $\mathfrak{g}$. A last thing to note here before we move on to the dual space is that there exists a symmetric, bilinear form on the Lie algebra, called the Killing form, defined as $B(X, Y) = \frac{1}{2h^\vee}\,\mathrm{Tr}(\mathrm{ad}_X\,\mathrm{ad}_Y)$ (here $h^\vee$ is the dual Coxeter number of $\mathfrak{g}$, but let’s not worry about that now).
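The standard $\mathfrak{sl}_2$ triple makes all of this concrete; here is a small check (an illustration I am adding) of the relations above with $H$, $E = E_\alpha$, $F = E_{-\alpha}$:

```python
import numpy as np

H = np.array([[1., 0.], [0., -1.]])
E = np.array([[0., 1.], [0., 0.]])
F = np.array([[0., 0.], [1., 0.]])

def ad(x, y):
    # Adjoint action ad_X(Y) = [X, Y].
    return x @ y - y @ x

assert np.allclose(ad(H, E), 2 * E)   # E is a root vector with root +2
assert np.allclose(ad(H, F), -2 * F)  # F is a root vector with root -2
assert np.allclose(ad(E, F), H)       # [E_alpha, E_{-alpha}] lies in the Cartan
print("sl(2) relations verified")
```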
[It is interesting that the existence of SICs or a complete set of MUBs can be related to the existence of certain structures in Lie algebras. Boykin et al. show that the number of Cartan subalgebras of $\mathfrak{sl}_d(\mathbb{C})$ which are pairwise orthogonal with respect to the Killing form corresponds to the number of mutually unbiased bases in dimension $d$ (if the Cartan subalgebras are closed under Hermitian conjugation). Appleby et al. showed that the existence problem of SIC-POVMs can be related to the existence of a particular basis of the Lie algebra $\mathfrak{gl}_d(\mathbb{C})$.]
One can construct the dual space $\mathfrak{h}^*$ of $\mathfrak{h}$ as the space of linear combinations of the maps $\alpha: \mathfrak{h} \to \mathbb{C}$ defined by $\alpha(H_i) = \alpha_i$. This dual space is called the root space and the $\alpha$ are called roots. It turns out that the set of roots can be split into positive roots and negative roots of equal cardinality. One way to do this is to simply pick an arbitrary ordered basis for the root space and call positive those roots whose first non-zero coordinate is positive. For an arbitrary finite dimensional representation of $\mathfrak{g}$, one can, as before, simultaneously diagonalize the $H_i$ and consider the eigenvalues of the common eigenspaces. This gives us elements $\mu$ of the dual space, called weights, such that $H_i |\mu\rangle = \mu(H_i) |\mu\rangle = \mu_i |\mu\rangle$, where $\mu_i$ is the eigenvalue. Unlike in the adjoint representation, the common eigenspaces of the $H_i$ in an arbitrary finite dimensional representation need not be one-dimensional. In other words, the weights can have multiplicity. However, if the representation is irreducible (like the adjoint representation), the highest and the lowest weights are unique. Finally, one can define an inner product on the root space by using the Killing form, i.e., $(\alpha, \beta) = B(H_\alpha, H_\beta)$. This is clear when $\alpha$ and $\beta$ are roots and, by linearity, one can extend this to all of $\mathfrak{h}^*$.
Let us now discuss how to describe the weights of a given finite dimensional irreducible representation. Using the definition of positive roots, one can define a special class of roots called simple roots as those positive roots that cannot be written as a sum of positive roots. It turns out that the simple roots span the root space and essentially encode the structure of the Lie algebra. However, they are not always a convenient basis. Often one uses a different basis called the basis of fundamental weights. First, define the so-called coroots as $\alpha^\vee = 2\alpha/(\alpha, \alpha)$ for each simple root $\alpha$ (not to be confused with $\alpha_i$, which is the eigenvalue $\alpha(H_i)$). Now, the fundamental weights $\omega_i$ are defined as the duals of the coroots, i.e., $(\omega_i, \alpha_j^\vee) = \delta_{ij}$. In the basis of fundamental weights, the components of any weight are called Dynkin labels and there is a correspondence between the Dynkin labels of the highest weight of a finite dimensional irreducible representation and Young diagrams (which are partitions of some integer). Dynkin labels $(a_1, \dots, a_r)$ of highest weights are non-negative integers and one can create a partition $\lambda$ with $\lambda_i = a_i + a_{i+1} + \cdots + a_r$. As you can see, different highest weights need not give partitions of the same integer. For weights that are not the highest weight, the Dynkin labels can be negative and need not correspond to valid partitions. Corresponding to each weight, there are weight vectors, and all the weight vectors together span the space of the irreducible representation. For a given weight $\mu$, we label the weight vectors as $|\mu, m\rangle$, where $m$ labels any multiplicity it may have.
Now, we are ready to discuss the Clebsch-Gordan decomposition of a tensor product of two irreducible representations. When we take the tensor product of two irreducible representations, we obtain a representation of the group by what is called the diagonal action, i.e., if $\pi_1$ and $\pi_2$ are two representations, then $g \mapsto \pi_1(g) \otimes \pi_2(g)$ is the diagonal action. This is in general a reducible representation and its decomposition into irreducibles is called the Clebsch-Gordan (CG) decomposition. For the unitary group, there is a rule that tells us which irreducible representations appear in the decomposition. If we use Young diagrams to label the irreps, then the Littlewood-Richardson rule gives the different irreps and their multiplicities in the decomposition. (In using Young diagrams for $U(d)$, one can remove any column of $d$ boxes.) The weight vectors occurring in the decomposition are linear combinations of the form

$\displaystyle |\mu, m\rangle = \sum_{\mu_1, m_1, \mu_2, m_2} C^{\mu, m}_{\mu_1, m_1; \mu_2, m_2}\, |\mu_1, m_1\rangle \otimes |\mu_2, m_2\rangle,$

where $m$, $m_1$ and $m_2$ label the multiplicities of the weights and the coefficient $C^{\mu, m}_{\mu_1, m_1; \mu_2, m_2}$ is non-zero only if $\mu = \mu_1 + \mu_2$. There are systematic ways to calculate the CG coefficients for the unitary group using the Wigner-Eckart theorem and reduced Wigner coefficients.
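For the simplest unitary group, $SU(2)$, the CG decomposition is the familiar addition of spins, and the bookkeeping can be checked in a few lines (an illustrative sketch; the spin values are my choice):

```python
from fractions import Fraction

def cg_spins(j1, j2):
    """Spins appearing in the CG decomposition of V_{j1} (x) V_{j2} for SU(2)."""
    j1, j2 = Fraction(j1), Fraction(j2)
    js, j = [], abs(j1 - j2)
    while j <= j1 + j2:
        js.append(j)
        j += 1
    return js

j1, j2 = Fraction(3, 2), Fraction(1)
js = cg_spins(j1, j2)
total = sum(2 * j + 1 for j in js)   # dimensions must add up
print(js, total, (2 * j1 + 1) * (2 * j2 + 1))  # 12 = 12
```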
[The story of Littlewood-Richardson (LR) rule is quite interesting. In their paper, where Littlewood and Richardson introduce this famous rule, they have a proof for a special case (when one of the partitions has only two rows). However, it is stated as a theorem in general but with a note saying something like – no simple proof has been found for the general case. It seems unlikely that they had a proof (simple or otherwise), given that the first correct proof was found about 40 years later. Shortly after the LR paper, Robinson published a claimed proof. Even though it had several gaps, which were discovered after a long time, his technique eventually led to a correct proof. His method was rediscovered by Schensted and explored further by Knuth and has become the famous RSK correspondence. It appears that the first complete proof of the LR rule was given by Schützenberger using his jeu de taquin. Other correct proofs individually by Thomas, Macdonald and Zelevinsky followed. This interesting story is recounted at the end of this paper.]
5. Applications to quantum information

We are primarily interested in applications to quantum information theory. Here we will look at four applications.
5.1. Schur transform
The space $(\mathbb{C}^d)^{\otimes n}$ has the basis $|i_1\rangle \otimes \cdots \otimes |i_n\rangle$, where $i_j \in \{1, \dots, d\}$ (here $\mathbb{C}^d$ has the basis $\{|1\rangle, \dots, |d\rangle\}$). The action of the unitary and symmetric groups can be block diagonalized and this basis can be relabeled $|\lambda\rangle |q\rangle |p\rangle$, where $\lambda$ labels an irreducible representation of $S_n$ (or $U(d)$) and $|q\rangle$ and $|p\rangle$ are states in the irreducible spaces of $U(d)$ and $S_n$ respectively. The Schur transform, then, is the unitary matrix that rotates the computational basis to this block diagonal basis. In this paper, Bacon, Chuang and Harrow construct an efficient quantum circuit for the Schur transform. In order to do this, they first construct the Clebsch-Gordan (CG) transform for the unitary group and use it to decompose the action. To see how the CG transform can be used to construct the Schur transform, suppose that we have decomposed the action of $U^{\otimes n}$ into irreducible spaces and want to go to the action of $U^{\otimes (n+1)}$. Then, given any irreducible representation label $\lambda$ (at the $n$th level) and a state $|q\rangle$ in that irreducible space, we can use the CG transform to decompose the tensor product $|q\rangle \otimes |i\rangle$, where $|i\rangle$ stands for a state in the fundamental representation (i.e., the $(n+1)$st tensor copy). The CG transform gives us $|q\rangle \otimes |i\rangle = \sum_{\lambda'} |\lambda'\rangle |q'_{\lambda'}\rangle$, where the sum is over all unitary group irreducible representations $\lambda'$ that appear in the CG decomposition and $|q'_{\lambda'}\rangle$ is a (subnormalized) state in the irreducible space of $\lambda'$. It turns out that keeping track of the labels $\lambda$ at each level automatically gives us a basis of the irreducible space of the symmetric group.
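The smallest nontrivial Schur transform, $n = 2$ qubits, can be written down explicitly (a sketch of the structure I am adding, not the Bacon-Chuang-Harrow circuit itself): rotating the computational basis to the triplet/singlet basis simultaneously block diagonalizes the swap and the diagonal $U(2)$ action.

```python
import numpy as np

s = 1 / np.sqrt(2)
# Columns: |00>, (|01>+|10>)/sqrt2, |11>  (symmetric, lambda = (2)),
# then (|01>-|10>)/sqrt2              (antisymmetric, lambda = (1,1)).
V = np.array([
    [1, 0, 0, 0],
    [0, s, 0, s],
    [0, s, 0, -s],
    [0, 0, 1, 0],
])

SWAP = np.eye(4)[[0, 2, 1, 3]]
rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))

swap_hat = V.conj().T @ SWAP @ V          # diag(1, 1, 1, -1)
AA_hat = V.conj().T @ np.kron(A, A) @ V   # block diagonal: 3x3 plus 1x1

print(np.round(np.diag(swap_hat), 6))
print(np.round(AA_hat[3, :3], 6), np.round(AA_hat[:3, 3], 6))  # off-blocks vanish
```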
A dual version of the Schur transform
A natural question is whether there is a way to approach the Schur transform using the symmetric group rather than the unitary group. In order to do this, let us look at the representations in terms of the symmetric group. Recall that, as the computational basis for $(\mathbb{C}^d)^{\otimes n}$, we picked the basis $\{|1\rangle, \dots, |d\rangle\}$ for each tensor copy of $\mathbb{C}^d$ and used it to construct the product basis. Given any basis vector $|i_1\rangle \otimes \cdots \otimes |i_n\rangle$, define its type as the tuple $\mu = (\mu_1, \dots, \mu_d)$, where $\mu_k$ is the number of times $|k\rangle$ appears in the tensor product. Clearly, the action of the symmetric group permutes all the vectors of a given type in a transitive fashion. This means that the subspace spanned by the vectors of type $\mu$ is an induced representation of $S_n$ — induced from the trivial representation of the Young subgroup $S_\mu = S_{\mu_1} \times \cdots \times S_{\mu_d}$ (note that $S_\mu$ can be isomorphic for two different types). If we can block diagonalize these induced representations, we can use them to construct the Schur transform. Let us look at this in more detail. First, we want to convert the basis from $|i_1, \dots, i_n\rangle$ to $|\mu\rangle |\sigma\rangle$, where $\mu$ is the type of the vector and $\sigma$ is an element of the transversal of $S_\mu$ in $S_n$. This transform can be done efficiently since the type and the transversal element can be efficiently computed from the basis vector and vice versa. Now, we need to block diagonalize each of these induced representations. Fortunately, we can use the algorithm for the Fourier transform over $S_n$ by Beals (see here). There is still a problem, since the Fourier transform decomposes the regular representation of the symmetric group, which is the representation induced from the trivial subgroup, not from $S_\mu$. However, by using the QFTs over $S_n$ and $S_\mu$, we can construct the transform for induced representations.
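The orbit structure of types can be checked directly (a small sketch I am adding): the orbit of a basis vector of type $\mu$ under $S_n$ has size $n!/(\mu_1!\cdots\mu_d!)$, matching the size of the transversal of $S_\mu$ in $S_n$.

```python
from itertools import permutations
from math import factorial
from collections import Counter

word = (1, 1, 2, 3)   # the basis vector |1>|1>|2>|3> of (C^3)^(x4)
mu = Counter(word)    # its type: |1> appears twice, |2> once, |3> once

# Orbit of the word under the place-permutation action of S_4.
orbit = {tuple(word[i] for i in p) for p in permutations(range(len(word)))}

expected = factorial(len(word))
for count in mu.values():
    expected //= factorial(count)   # n! / (mu_1! mu_2! ...)

print(len(orbit), expected)  # 12 12
```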
We describe here an algorithm to block diagonalize the representation of a group $G$ induced from an irreducible representation $\rho$ of a subgroup $H$, provided one can perform a QFT over $G$ and over $H$ (and a couple of other conditions, which we explain below). Take a state in the induced representation, i.e., a state of the form $|\sigma\rangle |v\rangle$, where $\sigma$ is an element of the transversal and $|v\rangle$ is a state in the representation $\rho$. Append an ancilla so that, together with $|v\rangle$, we have a register of dimension $|H|$. Now apply the inverse QFT over $H$ to rotate these registers to the computational basis of $H$. The registers now contain states of the form $|\sigma\rangle |h\rangle$, where $h$ is an element of $H$. The first of the two conditions mentioned above is that we need the QFT over $G$ to rotate from this basis (the pair $(\sigma, h)$, which can be thought of as the computational basis of $G$ via $g = \sigma h$) to the block diagonal basis. Next we apply the QFT over $G$ to get a basis $|\rho'\rangle |m\rangle |w\rangle$, where $\rho'$ labels an irreducible representation of $G$, $m$ indexes its multiplicity and $|w\rangle$ is a state in the irreducible space. Now, we need the second condition: we need to be able to re-index the irrep labels $\rho'$ so that the label register runs compactly over the irreps that actually appear in the induced representation. Now, we can return the ancilla register and we have our transform. Let us check that the two conditions are satisfied in our case, i.e., when $G = S_n$ and $H = S_\mu$. In Beals’ algorithm for the QFT over $S_n$, the computational basis is assumed to be a word in terms of the generating set of $S_n$ consisting of adjacent transpositions, i.e., $\{(i, i+1)\}$. So, writing our pair $(\sigma, h)$ as such a word should be easy. The second condition is also satisfied since the irreducible representations of $S_n$ that appear in the representation induced from the trivial representation of $S_\mu$ are those labeled by partitions $\lambda$ that dominate the partition corresponding to $\mu$ (here “dominate” is with respect to the dominance order on partitions). In fact, the multiplicity of $\lambda$ in that induced representation is given by the so-called Kostka number $K_{\lambda\mu}$. Therefore, if we use a labeling of partitions that respects dominance order, then we automatically satisfy the second condition.
(We don’t really have to worry about the second condition since it would just lead to a slightly inefficient encoding.)
Now, the Schur transform is the following sequence of transformations. First, take the basis from $|i_1, \dots, i_n\rangle$ to $|\mu\rangle |\sigma\rangle$ (as discussed above). Then go to $|\mu\rangle |\lambda\rangle |m\rangle |w\rangle$, where $\lambda$ is a partition labeling an irreducible representation of $S_n$, $m$ labels the multiplicity space and $|w\rangle$ the representation space. We have our transform by rearranging the registers to $|\lambda\rangle |\mu, m\rangle |w\rangle$: the pairs $(\mu, m)$ run over an orthonormal basis for the irreducible space of the unitary group (the weight basis), and $|w\rangle$ is the state in the irreducible space of the symmetric group.
5.2. Quantum de Finetti theorem
A version of the quantum de Finetti theorem states that given a joint density operator on $n$ systems with permutational symmetry, the partial trace over $n - k$ systems can be approximated by a convex sum of product states. Quantum de Finetti theorems have been applied to several areas of quantum information science, such as security proofs of quantum key distribution schemes against coherent attacks. Here, we discuss a proof of the quantum de Finetti theorem which uses the representation theory of Lie algebras and Schur-Weyl duality. This is from this paper of Koenig and Mitchison.
Suppose that we have a tensor product $V_\lambda \otimes V_\mu$ of two irreducible representations of a Lie algebra and we are interested in states in an irreducible representation $V_\nu$ inside the tensor product. Here $\lambda$, $\mu$ and $\nu$ are the highest weights. We are specifically interested in the partial trace over the $V_\mu$ system of any state in $V_\nu$, and how we can write this as a convex sum over certain states in $V_\lambda$. Suppose that we have a state $|v\rangle$ in $V_\nu$ which is a weight vector. Since $V_\nu$ appears in the decomposition of $V_\lambda \otimes V_\mu$, the weight vectors of $V_\nu$ can be written as linear combinations of the type

$\displaystyle |v\rangle = \sum_{\alpha, \beta} c_{\alpha \beta}\, |\alpha\rangle \otimes |\beta\rangle,$

where the sum of the weights of $|\alpha\rangle$ and $|\beta\rangle$ is the weight of $|v\rangle$. Now, consider the partial trace

$\displaystyle \mathrm{Tr}_{V_\mu}(|v\rangle\langle v|) = \dim(V_\mu) \int dU\, (\mathbb{1} \otimes \langle \beta_U|)\, |v\rangle\langle v|\, (\mathbb{1} \otimes |\beta_U\rangle),$

where $|\beta_U\rangle = U |\beta_0\rangle$ for some fixed state $|\beta_0\rangle \in V_\mu$ and the integration is with respect to the Haar measure. The above equation follows from Schur’s lemma applied on the $V_\mu$ system. Notice that the integrand is proportional to a pure state, i.e.,

$\displaystyle (\mathbb{1} \otimes \langle \beta_U|)\, |v\rangle\langle v|\, (\mathbb{1} \otimes |\beta_U\rangle) = p(U)\, |v_U\rangle\langle v_U|,$

where $|v_U\rangle$ is the normalized version of $(\mathbb{1} \otimes \langle \beta_U|)\, |v\rangle$. This means that the partial trace is

$\displaystyle \mathrm{Tr}_{V_\mu}(|v\rangle\langle v|) = \int dm(U)\, |v_U\rangle\langle v_U|,$

where $m$ is the induced probability measure, $dm(U) = \dim(V_\mu)\, p(U)\, dU$.
Now consider the set of weight vectors $\{|\alpha\rangle : \alpha \in \mathrm{wt}(V_\lambda)\}$, where $\mathrm{wt}(V_\lambda)$ is the set of weights occurring in $V_\lambda$, and let $P$ be the projector onto the space spanned by these weight vectors. We have that

$\displaystyle \mathrm{Tr}_{V_\mu}(|v\rangle\langle v|) = \int dm(U)\, P\, |v_U\rangle\langle v_U|\, P,$

since each $|v_U\rangle$ already lies in $V_\lambda$, so that $P |v_U\rangle = |v_U\rangle$. This means that we can “approximate” the partial trace of $|v\rangle\langle v|$ exactly with states from $V_\lambda$. In general, suppose we have another set of states (instead of the ones in $V_\lambda$) and we start from another state $|w\rangle$ (instead of the weight vector $|v\rangle$). Let the projector onto (the span of) this new set of states be $P_S$; then we can define the states (normalized appropriately) $P_S |w_U\rangle$, where $|w_U\rangle$ is the normalized version of $(\mathbb{1} \otimes \langle \beta_U|)|w\rangle$ as before. The integral $\int dm(U)\, P_S |w_U\rangle\langle w_U| P_S$ can be used to approximate the partial trace, and the error (i.e., the trace norm of the difference) in the approximation can be upper bounded as follows. The error is first related (via the gentle measurement lemma and the triangle inequality) to

$\displaystyle \int dm(U)\, \langle w_U|\, (\mathbb{1} - P_S)\, |w_U\rangle.$

By plugging back the expression for $|w_U\rangle$ and applying Schur’s lemma, this can be calculated to be

$\displaystyle \mathrm{Tr}\big[(\mathbb{1} - P_S)\, \mathrm{Tr}_{V_\mu}(|w\rangle\langle w|)\big].$

Defining $\epsilon$ as the supremum of this quantity over all states $|w\rangle$, we get the upper bound. This expression for the error gives us the exponential de Finetti theorem: there, one takes the representation with highest weight $\nu$ to be the symmetric space $\mathrm{Sym}^n(\mathbb{C}^d)$, $V_\lambda$ to be $\mathrm{Sym}^{n-k}(\mathbb{C}^d)$, and the new set of states to be the “almost product” states.
5.3. Spectrum of a density operator
The spectrum of many copies of a density operator turns out to be related to Schur-Weyl duality. There are many consequences of this and several related problems. But here, we discuss only one aspect of it. It was originally proved by Keyl and Werner. Hayashi and Matsumoto gave a proof using Schur-Weyl duality. The following is from Christandl and Mitchison.
If $\rho$ is a density operator (i.e., a positive semidefinite self-adjoint operator with unit trace) on a Hilbert space of dimension $d$, then the spectrum of $\rho$ is closely related to Schur-Weyl duality. Since the operator $\rho^{\otimes n}$ commutes with permutations of the tensor copies, it “lives” in the irreducible spaces of the unitary group. It turns out that as $n$ increases it mostly lives in only one particular space, and that irreducible representation (or rather, the normalized version of its Young frame) tends to the spectrum of $\rho$. If $\lambda = (\lambda_1, \dots, \lambda_d)$ is a partition of $n$, then its normalized version is $\bar{\lambda} = (\lambda_1/n, \dots, \lambda_d/n)$. To state this result more precisely,
Theorem 3 Suppose the spectrum of $\rho$ is $r = (r_1, \dots, r_d)$ and $P_\lambda$ is the projection onto the isotypic space corresponding to the Young frame $\lambda$. Then

$\displaystyle \mathrm{Tr}(P_\lambda\, \rho^{\otimes n}) \le (n+1)^{d(d-1)/2}\, \exp\big(-n\, D(\bar{\lambda} \,\|\, r)\big),$

where $D$ is the usual Kullback-Leibler distance between probability distributions.
Proof: First pick the eigenbasis of $\rho$ as the basis of each tensor copy of $\mathbb{C}^d$. As basis vectors of $(\mathbb{C}^d)^{\otimes n}$, we have $|i_1\rangle \otimes \cdots \otimes |i_n\rangle$. Each such basis vector corresponds to a partition (its sorted type), obtained by counting the number of times each $|i\rangle$ occurs in it. Now, consider the projector onto the isotypic space of some $\lambda$ in Schur-Weyl duality. This can be given by picking a standard Young tableau of shape $\lambda$ and forming (up to normalization) the Young symmetrizer

$\displaystyle Y_\lambda = \sum_{p \in R_\lambda} \sum_{q \in C_\lambda} \mathrm{sgn}(q)\, P(p q),$

where $R_\lambda$ and $C_\lambda$ are the subgroups of permutations which permute the integers in each row and column of the tableau respectively. It can be seen that this operator has non-zero overlap only with those basis vectors whose corresponding partition is dominated by $\lambda$ (in the dominance order mentioned earlier). If not, then there would be two boxes in the same column which carry the same basis label, and the column antisymmetrization would vanish. Therefore, $\mathrm{Tr}(P_\lambda\, \rho^{\otimes n})$ only picks up products of eigenvalues coming from types dominated by $\lambda$. This is easy to bound by using standard bounds on the dimensions of these irreducible representations.
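For qubits, the isotypic projectors can be built from the total spin operator, so Theorem 3 can be checked numerically (a sketch I am adding; the choice $n = 6$, $r = (0.8, 0.2)$ is mine, and for $d = 2$ the frame is $\lambda = (n/2 + j,\, n/2 - j)$):

```python
import numpy as np

n, r = 6, np.array([0.8, 0.2])
rho = np.diag(r)

sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2

def total(op):
    # sum_i (op acting on the i-th tensor factor)
    out = np.zeros((2**n, 2**n), dtype=complex)
    for i in range(n):
        term = np.array([[1.0]])
        for k in range(n):
            term = np.kron(term, op if k == i else np.eye(2))
        out += term
    return out

S2 = sum(total(op) @ total(op) for op in (sx, sy, sz))  # total spin squared
evals, evecs = np.linalg.eigh(S2)

rho_n = np.array([[1.0]])
for _ in range(n):
    rho_n = np.kron(rho_n, rho)

def kl(p, q):  # Kullback-Leibler divergence in nats, with 0 log 0 = 0
    return sum(pi * np.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

total_prob = 0.0
for j2 in sorted({round(e, 6) for e in evals}):   # eigenvalues j(j+1)
    j = (-1 + np.sqrt(1 + 4 * j2)) / 2
    mask = np.isclose(evals, j2)
    P = evecs[:, mask] @ evecs[:, mask].conj().T  # isotypic projector P_lambda
    p = np.trace(P @ rho_n).real
    lam_bar = np.array([n / 2 + j, n / 2 - j]) / n
    # The bound of Theorem 3 (d = 2, so the prefactor is n + 1):
    assert p <= (n + 1) * np.exp(-n * kl(lam_bar, r)) + 1e-9
    total_prob += p

print(round(total_prob, 6))  # the probabilities over frames sum to 1
```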
5.4. Permanents, determinants and immanants
The last thing I want to mention is a way to describe immanants using Schur-Weyl duality. The permanent of an $n \times n$ matrix $A$ is defined as

$\displaystyle \mathrm{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^n A_{i, \sigma(i)}.$
In order to relate this to Schur-Weyl duality, let us rewrite the product in terms of an $n$-fold tensor power:

$\displaystyle \prod_{i=1}^n A_{i, \sigma(i)} = \langle e_1 \otimes \cdots \otimes e_n|\, A^{\otimes n} P(\sigma)\, |e_1 \otimes \cdots \otimes e_n\rangle,$

where $P(\sigma)$ is the representation of the symmetric group on $(\mathbb{C}^n)^{\otimes n}$ (i.e., the action that permutes the tensor factors, $P(\sigma)\, e_{i_1} \otimes \cdots \otimes e_{i_n} = e_{i_{\sigma(1)}} \otimes \cdots \otimes e_{i_{\sigma(n)}}$). Now recall that the projector onto the isotypic space of the trivial representation of the symmetric group is given by

$\displaystyle P_{\mathrm{triv}} = \frac{1}{n!} \sum_{\sigma \in S_n} P(\sigma).$

Using this, we have

$\displaystyle \mathrm{per}(A) = n!\, \langle e_1 \otimes \cdots \otimes e_n|\, A^{\otimes n} P_{\mathrm{triv}}\, |e_1 \otimes \cdots \otimes e_n\rangle.$
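This projector formula is easy to check numerically (an illustrative sketch; the random $3 \times 3$ matrix and the index conventions are mine):

```python
import numpy as np
from itertools import permutations
from math import factorial

n = 3
rng = np.random.default_rng(3)
A = rng.normal(size=(n, n))

def lin(idx, d):
    # Linear index of a tensor basis multi-index, most significant factor first
    # (matches the ordering of iterated np.kron).
    out = 0
    for i in idx:
        out = out * d + i
    return out

def perm_op(sigma, d, num):
    """Matrix of the operator permuting the tensor factors of (C^d)^(x num)."""
    P = np.zeros((d**num, d**num))
    for idx in np.ndindex(*(d,) * num):
        out = tuple(idx[sigma.index(m)] for m in range(num))
        P[lin(out, d), lin(idx, d)] = 1.0
    return P

P_triv = sum(perm_op(s, n, n) for s in permutations(range(n))) / factorial(n)

An = np.array([[1.0]])
for _ in range(n):
    An = np.kron(An, A)
e = np.zeros(n**n)
e[lin(tuple(range(n)), n)] = 1.0   # |e_1 (x) e_2 (x) e_3>

per_direct = sum(np.prod([A[i, s[i]] for i in range(n)]) for s in permutations(range(n)))
per_tensor = factorial(n) * (e @ An @ P_triv @ e)
print(np.isclose(per_direct, per_tensor))  # True
```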
We can generalize this to other isotypic spaces of the symmetric group to get the determinant and the immanants. For the determinant,

$\displaystyle \det(A) = n!\, \langle e_1 \otimes \cdots \otimes e_n|\, A^{\otimes n} P_{\mathrm{sgn}}\, |e_1 \otimes \cdots \otimes e_n\rangle, \qquad P_{\mathrm{sgn}} = \frac{1}{n!} \sum_{\sigma \in S_n} \mathrm{sgn}(\sigma)\, P(\sigma),$

where $P_{\mathrm{sgn}}$ is the projector onto the isotypic space of the alternating irrep of $S_n$. For the immanants $\mathrm{imm}_\lambda(A) = \sum_{\sigma} \chi_\lambda(\sigma) \prod_i A_{i, \sigma(i)}$, we have

$\displaystyle \mathrm{imm}_\lambda(A) = \frac{n!}{d_\lambda}\, \langle e_1 \otimes \cdots \otimes e_n|\, A^{\otimes n} P_\lambda\, |e_1 \otimes \cdots \otimes e_n\rangle, \qquad P_\lambda = \frac{d_\lambda}{n!} \sum_{\sigma \in S_n} \chi_\lambda(\sigma)\, P(\sigma),$

where $P_\lambda$ is the projector onto the isotypic space of the irrep $\lambda$ of $S_n$ and $d_\lambda$ is the dimension of the symmetric group irrep $\lambda$.
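These formulas can also be checked numerically before we go on (a sketch I am adding, for $n = 3$ and $\lambda = (2,1)$, whose character is $2$ on the identity, $0$ on transpositions and $-1$ on 3-cycles, with $d_\lambda = 2$):

```python
import numpy as np
from itertools import permutations
from math import factorial

n, d_lam = 3, 2
rng = np.random.default_rng(4)
A = rng.normal(size=(n, n))

def lin(idx, d):
    out = 0
    for i in idx:
        out = out * d + i
    return out

def perm_op(sigma, d, num):
    # Operator permuting the tensor factors of (C^d)^(x num).
    P = np.zeros((d**num, d**num))
    for idx in np.ndindex(*(d,) * num):
        out = tuple(idx[sigma.index(m)] for m in range(num))
        P[lin(out, d), lin(idx, d)] = 1.0
    return P

def sign(sigma):
    # Sign via cycle decomposition.
    s, seen = 1, set()
    for start in range(len(sigma)):
        if start in seen:
            continue
        m, length = start, 0
        while m not in seen:
            seen.add(m)
            m = sigma[m]
            length += 1
        s *= (-1) ** (length - 1)
    return s

def chi21(sigma):
    # Character of the (2,1) irrep of S_3, read off from the fixed-point count.
    fixed = sum(1 for i, v in enumerate(sigma) if i == v)
    return {3: 2, 1: 0, 0: -1}[fixed]

perms = list(permutations(range(n)))
An = np.array([[1.0]])
for _ in range(n):
    An = np.kron(An, A)
e = np.zeros(n**n)
e[lin(tuple(range(n)), n)] = 1.0

P_sgn = sum(sign(s) * perm_op(s, n, n) for s in perms) / factorial(n)
P_21 = d_lam * sum(chi21(s) * perm_op(s, n, n) for s in perms) / factorial(n)

det_direct = np.linalg.det(A)
det_tensor = factorial(n) * (e @ An @ P_sgn @ e)
imm_direct = sum(chi21(s) * np.prod([A[i, s[i]] for i in range(n)]) for s in perms)
imm_tensor = factorial(n) / d_lam * (e @ An @ P_21 @ e)
print(np.isclose(det_direct, det_tensor), np.isclose(imm_direct, imm_tensor))
```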
Let us now work only with immanants (since they are more general). The last equation can be written as

$\displaystyle \mathrm{imm}_\lambda(A) = \frac{n!}{d_\lambda}\, \mathrm{Tr}\big(A^{\otimes n} P_\lambda\, |e\rangle\langle e|\big), \qquad |e\rangle = e_1 \otimes \cdots \otimes e_n.$

First, we can rewrite this by averaging over the orbit of $|e\rangle$ under the symmetric group as

$\displaystyle \mathrm{imm}_\lambda(A) = \frac{1}{d_\lambda}\, \mathrm{Tr}\big(A^{\otimes n} P_\lambda\, \Pi_{M^{(1^n)}}\big),$

where $M^{(1^n)}$ is the permutation module corresponding to the partition $(1^n)$ and $\Pi_{M^{(1^n)}}$ is the projector onto it. It is the induced representation from the trivial representation of the Young subgroup corresponding to the partition. To obtain the above equation, we have used the fact that

$\displaystyle \sum_{\sigma \in S_n} P(\sigma)\, |e\rangle\langle e|\, P(\sigma)^{-1} = \Pi_{M^{(1^n)}},$

together with the fact that $P(\sigma)$ commutes with both $A^{\otimes n}$ and $P_\lambda$.
Now we can also block diagonalize $A^{\otimes n} P_\lambda$ using Schur-Weyl duality, $(\mathbb{C}^n)^{\otimes n} \cong \bigoplus_\nu S_\nu \otimes U_\nu$, to obtain

$\displaystyle A^{\otimes n} P_\lambda = \mathbb{1}_{S_\lambda} \otimes \rho_\lambda(A) \quad \text{(supported on the } \lambda \text{ block)},$

where $\mathbb{1}_{S_\lambda}$ is the identity on the symmetric group irrep space and $\rho_\lambda(A)$ is the part of $A^{\otimes n}$ in the unitary group irreducible space. This can finally be written as

$\displaystyle \mathrm{imm}_\lambda(A) = \mathrm{Tr}\big(\rho_\lambda(A)\, \Pi^{(1^n)}_\lambda\big),$

where $\Pi^{(1^n)}_\lambda$ is the projection onto the multiplicity space of the irrep $\lambda$ in the permutation module $M^{(1^n)}$ (equivalently, the weight $(1, \dots, 1)$ subspace of $U_\lambda$). This space has dimension equal to the Kostka number $K_{\lambda, (1^n)}$.
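Kostka numbers count semistandard Young tableaux of shape $\lambda$ and content $\mu$ (rows weakly increasing, columns strictly increasing); here is a brute-force sketch (my own, checked against two well-known values):

```python
from itertools import permutations

def kostka(shape, content):
    """K_{lambda,mu}: count semistandard tableaux of the given shape/content."""
    entries = []
    for val, cnt in enumerate(content, start=1):
        entries += [val] * cnt
    cells = [(r, c) for r, row_len in enumerate(shape) for c in range(row_len)]
    count = 0
    for filling in set(permutations(entries)):
        T = dict(zip(cells, filling))
        rows_ok = all(T[(r, c)] <= T[(r, c + 1)]
                      for (r, c) in cells if (r, c + 1) in T)
        cols_ok = all(T[(r, c)] < T[(r + 1, c)]
                      for (r, c) in cells if (r + 1, c) in T)
        count += rows_ok and cols_ok
    return count

print(kostka((2, 1), (1, 1, 1)))  # 2, the dimension d_lambda for lambda = (2,1)
print(kostka((2, 1), (2, 1)))     # 1
print(kostka((3,), (1, 1, 1)))    # 1
```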
If we replace the permutation module by another one, then we get immanants of replicated matrices. First, define, for any type $\mu = (\mu_1, \dots, \mu_d)$ (where $\mu_1 + \cdots + \mu_d = n$), the $n \times n$ matrix $A_\mu$ obtained from a $d \times d$ matrix $A$ by repeating the first row and column $\mu_1$ times, repeating the second row and column $\mu_2$ times, etc. Then the immanant of this matrix is

$\displaystyle \mathrm{imm}_\lambda(A_\mu) = \mu_1! \cdots \mu_d!\; \mathrm{Tr}\big(\rho_\lambda(A)\, \Pi^{\mu}_\lambda\big),$

where $\Pi^{\mu}_\lambda$ is the projection onto the multiplicity space of $\lambda$ in the permutation module corresponding to the type $\mu$ (the weight $\mu$ subspace of $U_\lambda$, of dimension $K_{\lambda\mu}$). To see this, note that the immanant of this replicated matrix is

$\displaystyle \mathrm{imm}_\lambda(A_\mu) = \frac{n!}{d_\lambda}\, \langle e_\mu|\, A^{\otimes n} P_\lambda\, |e_\mu\rangle, \qquad |e_\mu\rangle = e_1^{\otimes \mu_1} \otimes \cdots \otimes e_d^{\otimes \mu_d}.$

The rest of it follows in the same way as before; the only difference is that the stabilizer of $|e_\mu\rangle$ in $S_n$ now has order $\mu_1! \cdots \mu_d!$, which produces the extra factor when averaging over the orbit.