In a moment I will ask you a puzzle and it’s a pretty hard puzzle actually. But before I do, I want to lead with a spoiler, which is the fact that the way we’re going to solve this involves the use of complex numbers. And once you hear it you will agree that that seems absurd given that the puzzle is going to be purely a discrete question. It only asks about whole numbers and their sums, there’s not a whiff of the imaginary or even continuity anywhere on the horizon.

It’s certainly not the only time that complex numbers are unreasonably useful for discrete math to borrow a phrase. The more famous example that I could bring up would be how the modern way that mathematicians understand prime numbers, you know questions about how they’re distributed their density at certain regions things like that, well it involves studying specially designed functions whose inputs and outputs are complex numbers. Some of you may know that this is what the famous Riemann hypothesis is all about. Basically there’s a specially designed function and, on the face of it, it looks unrelated to the discrete world of primes – it’s smooth, it’s complex valued. But under the hood it encodes all of the information that you could ever want about those discrete prime numbers and, most importantly, certain questions about primes are easier to answer by analyzing this function than they would be by directly analyzing the primes themselves.

Of course our puzzle, which I promise I’ll share in just a moment, is a lot more innocent than the Riemann hypothesis; it’s a toy problem. But at the end of the video I’ll share how the techniques that we use to solve it, the real reason that we’re here, are actually pretty similar in spirit to the setup that leads to the Riemann hypothesis and the prime number theorem and that whole circle of thoughts around it. Our puzzle for today comes from this book here by Titu Andreescu and Zuming Feng. It’s basically a collection of problems used in training the USA team for the international math olympiad. And if we turn to chapter 2 advanced problems, problem number 10 asks this seemingly innocent question “find the number of subsets of the set one up to two thousand, the sum of whose elements is divisible by five”.

Okay, so that might take a little bit of a moment to parse. For example, something like the set three one four, that would be a subset, all of its elements are also elements in the big set. And its sum, three plus one plus four, is eight so that wouldn’t be considered, that’s not in our count. Whereas something like the set two three five, also a subset, has a sum of ten that is divisible by five so it’s one that we want to count. The preview animation that I had at the start is essentially a brute force program trying to answer this question. It will iterate through all of the different possible subsets finding the sum of each one along the way and it increments a counter each time that it finds a multiple of five.

And, you know what, a nice warm-up question here would be to pause and think about how many total subsets are there overall, forget this multiple of five stuff. How long will it take for this program to terminate? Many of you may know the answer is 2 to the power 2,000. The basic idea there is that when you’re constructing a subset you have 2,000 different binary choices you can make. Do you include an element? or do not? and all of those choices are independent of each other so the total number of choices you have in constructing a subset is 2 times 2 times 2 times 2 on and on 2,000 times.

And, thinking about our program, that is a monstrously huge number. So even if we gave this brute-forcing approach all the time in the universe, with all the physical resources the universe could conceivably provide, it wouldn’t even come close, it wouldn’t scratch the surface. Obviously, we have to be a lot cleverer than simply guessing what the answer should be, and making a rough approximation. It is true that there is probably a roughly even distribution of all the sums mod five, but the real challenge here is to get a precise answer. This cannot be the actual answer since it’s not an integer, but is the true answer a little bit more or a little bit less, or perhaps a lot more or a lot less? What tactics could you possibly use to figure out that error? To be clear, this lesson is more about the journey than the destination, and it is unlikely you will ever need to filter and count subsets in this way. However, it is a legitimately challenging question and navigating that challenge develops skills that are relevant to other sorts of challenging questions. There are at least two very surprising and beautiful twists and turns that the solution I’d like to share with you takes. I’ve already tipped my hand that complex numbers will make a surprise appearance, but before we even get to that there is another strange turn which is arguably even weirder and more unexpected. To set the stage, let’s just get our bearings with the puzzle and do what all good problem solvers should do and start with a simpler example, maybe just trying it with the set one two three four five. If you were solving this problem with pencil and paper, you could list out all two to the five subsets, since it is only 32. There are different ways to organize these in your mind, but since the thing we care about is their sum, the natural thing to do would be to go through all of them one by one and compute those sums. I have a computer, so I’ll cheat a little and show what all their sums are. I’ll also cheat a little bit and rearrange all of these, organizing them suggestively into collections that all have the same sum. For instance, there are three distinct subsets that add up to six and they’ll all sit in this little box and the three subsets adding up to 10 will all live in this little box. And all in all, the ones that we care about, the subsets with a sum divisible by five, have been put over here on the left and it looks like there’s a total of eight of them. Oh, and by the way, I should say we are counting the empty set, we consider its sum to be zero and we consider that to be a multiple of five. By the end, I hope you’ll agree all of those are abundantly natural choices to make. Take a moment to compare this answer to what you might expect heuristically. Out of all 32 total subsets, a fifth of that would have been 6.4, so at least in the small example the true answer is a little bit bigger than that. That’s something you want to tuck in the back of your mind. Okay and this is the part of the video where I’ll be honest with you, I’ve no idea how to motivate it personally. I like it when math feels like something you could have discovered yourself and if you and I were sitting down together solving this problem, I think there’s all sorts of natural steps that you might take. Maybe you try to understand if there’s some sort of structure to the subsets or you play around with how these sums are distributed mod 5 at many different iterations for other small examples and from that maybe you try to eke out some kind of proof by induction. When I shared an early version of this lesson with some patrons, people brought up some nice linear algebra approaches. All those are well and good, nothing wrong with those, but instead my goal here is to teach you about something called a generating function and it’s one of those tactics where after the fact you can think, okay, yeah I get that this works but how on earth would you have thought of that? Honestly, I don’t know. There’s a time in your life before you understand generating functions and a time after and I can’t think of anything that connects them other than a leap of faith.

I’m going to ask you to consider the polynomial 1 plus x times 1 plus x squared times 1 plus x cubed times 1 plus x to the fourth times 1 plus x to the fifth. Now I know you could rightfully ask where does this come from? What do polynomials have to do with things? What is the variable x even supposed to represent right now? And essentially x is purely a symbol, the only reason that we’ve written a polynomial here is that the act of algebraically expanding it is going to completely mirror the act of constructing subsets and importantly this grouping that we want, where subsets with the same sum are all bunched together kind of happens automatically when you do this and let me show you what I mean.

When you expand out this expression basically comes down to making five binary choices: which term from each parenthetical do you choose? If you choose the one from each of those parentheticals, that will correspond to the empty set where we don’t choose any of the elements. Whereas, if I choose the x to the one term and then ones from everything else, that will correspond to the singleton set that just contains the number one. Similarly, if I choose the x squared term, but ones from everything else, that corresponds to the set just containing two. Just choosing the x cubed term corresponds to the set just containing the number three.

But, interestingly, notice what happens if I choose the x to the one term and the x squared term and then ones from everything else. This corresponds to the choice of the subset that has one and two and nothing from everything else, but in the polynomial, the way it expands looks like x cubed. So we have two different x cubed terms each of which came from a subset whose sum was three.

And, honestly, the pattern that I’m going for here is one that’s probably easiest if you just take the time to pause and think through for yourself what happens when you expand everything here. Essentially every possible subset corresponds to one of the terms in this expansion and then the critical point is that the exponent in the term that you get from that expansion equals the sum of that corresponding subset. Kind of confusing when you say it out loud, but again if you just kind of think it through yourself I think you can see what I mean.

For example, when all of the dust settles and we collect all 32 terms here, three of those terms are x to the tenth and each of those came from a choice of elements whose sum was equal to ten. Now normally when we write a polynomial, we collect together all like terms so instead of having three copies of x to the tenth we would just see the coefficient 3 in front of x to the 10th. So each of these coefficients is a way of encoding the number of subsets with a particular sum.

So this, like I said at the start, is an example of something called a generating function where the idea is if you have some question with an answer associated with each positive integer. So, in our case, how many subsets add up to a particular value? When you construct a polynomial whose coefficients correspond to the answers to that question, you can get a surprising amount of insight from your original question by mathematically manipulating and analyzing the properties of this polynomial. But, if you can imagine a different problem   where you don’t know the coefficients and you  are trying to deduce them, this is a really nice 

There are tons and tons of examples of generating functions, but one especially fun example is to study Fibonacci numbers. In this case, it’s an infinite polynomial (or a power series) which can be expressed as an equation in terms of a function. Manipulating this equation, you can get an exact closed form expression for each individual Fibonacci number.

In our particular example, the generating function involves two thousand different binomial terms. If we were to expand this, the coefficients tell us all the information we want. It would be insane to actually expand it, but it is helpful to keep in the back of your mind in principle what that would look like.

If we evaluate the function at 0, we know how to evaluate it using the factored form above. All of the terms look like 1, so the answer is 1. In the expanded form, all of those terms involving an x will get killed, leaving us just with the first term c sub 0.

If we evaluate the function at 1, every term looks like a two, so in total we get two multiplied by itself two thousand times. On the other hand, in the expanded expression if you plug in x equals one, all of these powers of x go to 1, so we’re essentially adding up all of the coefficients.

By evaluating the function at a single number, we can deduce what the sum of all of the coefficients are. This is a really nice way to deduce the coefficients if we don’t already know them. Remember each coefficient counts how many subsets have a certain sum, so when you add them up we’re just counting all of the subsets which we know to be two to the two thousand. However, I can give you a genuinely new fact if I ask you to evaluate this function at negative 1; take a moment to think about what that means. If you plug in negative 1, all the first parenthetical goes to zero, so the whole expression has to be zero. This tells us that we have an oscillating sum between the even coefficients and the odd coefficients. We can interpret this as telling us that there is an equal balance between all the even coefficients and the odd coefficients. This expression is true for any generating function, and for our special generating function we know that this value should equal zero.

To answer our final question, we need to find a way to filter out all of the even coefficients and kill all of the odd coefficients. We can do this by taking the last two things we evaluated, adding them up and then dividing by one half. This is a way of filtering out all of the even coefficients and killing all of the odd coefficients, and it tells us that half of all the subsets have an even sum and half of them have an odd sum.

To get all the coefficients corresponding to multiples of five, we need to extend into the complex plane. We can find a value so that as we take successive powers of it, it will rotate by a fifth of a turn, giving us a process with a frequency of five. This will allow us to answer our final question, which is counting the total number of subsets whose sum is divisible by five. Then, when you square zeta to the one, it’s  the same thing as taking the square root of zeta  to the two, which is the same thing as zeta to  the four.So if I bring the tip of that vector  over to the tail of the last one, I get this. 

It’s discrete math, but hopefully it’s not all that wild. The trick we’re about to apply has a heavy resemblance to many other instances of using complex numbers to better understand discrete questions of integers. To be specific, the complex number that I care about is one that I’m going to label zeta and it sits a fifth of a turn around the unit circle. So its angle is two pi fifths radians and its magnitude is one. This means with the standard Euler’s formula notation we would write that number explicitly as e to the power 2 pi i divided by 5.

These numbers have a special name, they’re called the fifth roots of unity, essentially because they solve the equation z to the fifth equals one. If you just presented someone with this equation they would probably say the answer is clearly z equals one, but the idea is that there are four other answers in the complex plane. Four other numbers where when you raise them to the fifth you get one and considering them as a collective is often quite useful.

So in analogy with what we did earlier where we added together f of one and f of negative one to get this cancellation among the odd terms, what we’re going to do is evaluate f at all five of these numbers and then add them together and hopefully we get some cancellation. Another way to think about this is that all five of these terms are evenly balanced around the number zero; their center of mass is at the origin. Now it’s helpful to think about a slightly less trivial example, if f of x was x squared. So when you square zeta to the zero it stays zeta to the zero. This is just a fancy way of saying the number one. Then, when you square zeta to the one, it’s the same thing as taking the square root of zeta to the two, which is the same thing as zeta to the four. When you square $\zeta$, you get $\zeta^2$ itself, so you might imagine this dot up here moving over to the $\zeta^2$ dot when we do it. $\Theta^2$ moves to $\zeta^4$, so you might imagine this dot moving over to $\zeta^4$. $\Theta^3$ moves to $\zeta^6$, which because we loop around every five times, is the same thing as $\zeta^1$. So this dot will move up here. And finally $\zeta^4$ squares to give us $\theta^8$ which reduces to be the same as $\zeta^3$, which I might draw like this.

That might seem a little confusing to think about, especially with all the arrows I have drawn here, but it’s worth thinking through at least once in your life, because the idea here is that when we square this, I go to all of these different terms and I program them to double the angle that they have; the overall effect is to just shuffle those terms. We get the same numbers but written in a different order, so their sum is still going to be zero.

Similarly, if you go through this exercise with $x^3$, which I encourage you to do, and you follow around where each one of these dots will end up, you’ll be able to see that when we cube these terms, when we take each one and we multiply the angle that it has by three again, we just shuffle them around – same terms listed in a different order. Unsurprisingly, the same thing happens if our function was $x^4$, but critically where things change is if we consider the function $x^5$. In that case, when you raise $\zeta$ to the fifth power, by definition it goes to one. Similarly, $\zeta^2$ raised to the fifth power goes to one. All of these go to one, they are the roots of unity. This is, after all, their whole purpose in life.

So in this case, when we apply the function and add them all up, instead of going to zero and getting cancellation we get a kind of constructive interference. All of them equal one so their sum is equal to five. So if you step back and think about what all those examples mean, essentially this expression is something that will go to zero for powers of $x$ which are not divisible by five, but it goes to something non-zero for powers of $x$ which are divisible by five and that’s exactly the kind of filter that we’re looking for.

If you’re worried that our actual function is much more complicated than a simple power of $x$, essentially things play really nicely here because everything is linear. If $f$ is some massive polynomial and we want to evaluate this big sum, you could sort of think of going column by column, where each time you really are just adding up powers of $\zeta$ and in most cases all those powers cancel out with each other and you get zero, but when all of those powers are multiples of five they constructively interfere and instead you get five times whatever the corresponding coefficient is.

Deep in the weeds it’s easy to forget why we’re here in the first place, but remember each one of those coefficients tells us how many subsets add up to a certain value and so what we want is to add up all of the coefficients that are multiples of five and what we have right now is a way to explicitly do that.

If we evaluate this function on these five different routes of unity (which I know seems kind of weird), then all we have to do is divide by five and it gives us the sum that we want. That’s really cool if you ask me. We have a question that’s just about subsets, it’s a discrete math problem, and yet the way that we can answer it is to evaluate a crazy polynomial on some judiciously chosen complex numbers. The more math you do the less crazy that seems because complex numbers have this bizarre relationship with discrete math, but it really is wonderful – there’s no two ways about it. We can multiply  the lengths of the lines corresponding to the  conjugate pairs, and then multiply that product  by the length of the remaining line.

However, some of you might complain that the only way that this is useful is if we can actually evaluate this wild expression on our polynomial. Now, remember, the form of the polynomial we know the one we’re comfortable with is the factored form where you have this one plus x one plus x squared on and on all the way up to one plus x to the two thousand. Everything up to this point is just meaningless symbolic play pushing around one hard problem into another unless we can actually roll up our sleeves and do some honest calculation here. This is the final thrust in our argument so step back, take a deep breath. It’s actually not as bad as you might think, but let’s start just by thinking about how you might evaluate just one of the roots of unity that we need, maybe zeta itself.

So what that looks like is one plus zeta times one plus zeta squared times one plus a to cubed on and on except, importantly, after those first five terms everything starts repeating, because powers of zeta repeat the entire expression up to 2000 is basically just going to be a copy of this expression 400 times. It still might seem hard to evaluate this expression, but it’s way easier than multiplying out 2,000 different terms. A way you might visualize this is that we’re taking each one of those roots of unity but basically adding one, we’re shifting them all to the right. This picture actually lends itself to a really nice geometric intuition for the numerical answer that we might expect. The thing that we want is the product of these five different complex numbers, these five yellow dots and if you know a thing or two about complex numbers since these come in conjugate pairs all we really need is to multiply the lengths of these five yellow lines. For example, that dot furthest to the right corresponds to one plus zeta to the fifth, which in the diagram I’m labeling as zeta to the zero plus one, but it doesn’t matter in either case they’re both just fancy ways of writing the number two. Next to that, we have the values one plus zeta and one plus zeta to the fourth, both of which have the same magnitude. The lengths of these lines are the same and let’s just give that a name L1. So we need to multiply two different copies of that length L1 squared. Similarly, the remaining two values theta squared plus one and zeta cubed plus one, they also have the same length and they’re a conjugate pair so let’s just call that length L2. So our product needs to include two copies of that L2.

If we were just making a loose heuristic guess you might notice that L1 is a length that’s something a little bit longer than one and L2 is something a little bit shorter than one, so the final answer here probably comes to something around two ish. We’re not positive, but something in that ballpark. To turn this into an exact answer we could just expand out the full expression. It’s honestly not that bad there’s only 32 different terms.

Okay, you’ve hung with me for a long time now, and I know that it’s getting to be a lot. But there’s one final trick in this whole argument that makes our last step much simpler than you might think it should be. And let’s just recap to remind ourselves of where we are. We started with this question asking us to count the number of subsets of 1 up to 2 000 whose sum is divisible by five. We then constructed this polynomial whose coefficients tell us how many subsets have a particular sum for each value n. So what we want is to add up every fifth coefficient of that polynomial. Then we saw how evaluating this polynomial as a function on all of the fifth roots of unity, then adding them up, ends up giving us exactly this filter that we want. And here we’re evaluating just one of those terms, f(zeta), which essentially comes down to a product of five complex numbers.

As a super slick way to actually evaluate that product, here’s the final trick. We can multiply the lengths of the lines corresponding to the conjugate pairs, and then multiply that product by the length of the remaining line. Remember, I described these numbers as “roots of unity,” they solve the equation $z^5 = 1$. Another way to think about that is that they are roots of the polynomial $z^5 - 1$. Now what that means is we can factor the polynomial $z^5 - 1$ to look like this, where there’s one factor corresponding to each one of the roots; you take $z -$ each one of the roots. This expression is kind of magical when you think about all of the crazy cancellations that has to happen when you expand it all out, but it is true, and it’s super useful for us right now, because the expression on the right-hand side looks almost identical to the thing we need to evaluate up at the top here. It basically just has minus signs where we wish there were plus signs. The trick is to plug in $z = -1$. If you do that, you essentially have the negative of what we want, so if you multiply it by negative one, notice how the left-hand side here, which started out as $-1 - 1$, or $-2$, that just becomes two. And then the right-hand side turns into the thing that we want to evaluate. So just as our geometric intuition earlier might have suggested, not only is the answer around two, the answer quite magically turns out to be precisely two.

That is actually super nice, and very lovely, because it means this bigger expression that we want to evaluate, where we’re adding up $f$ on all of the different roots of unity, we know its value on the first root of unity. It will be $2^{400}$. Essentially identical reasoning shows that its value on the next three roots of unity is also $2^{400}$. Because remember, when you take powers of $\zeta^2$ or $\zeta^3$, you get the same list of numbers, they’re just shuffled in a different order. The only one that’s different is when we evaluate it at $\zeta^0$, but $\zeta^0$ is a fancy way of saying the number one, and we know how to evaluate this at one, that’s one of the easy things. We did this earlier. All of these parentheticals turn into two, so it looks like taking two multiplied by itself two thousand times.

And so finally with that, we have a highly explicit honest answer to our counting question. To add up all of these coefficients which are divisible by five, which remember is a way of counting how many total subsets have a sum divisible by five, the answer is one-fifth of this weird complex expression, which we just computed to be $2^{2000} + 4 \cdot 2^{400}$.

And here you might want to do just a quick sanity check on does this answer make any sense. For example, if you do it in the smaller case with the set ${1, 2, 3, 4, 5}$ and you walk through all the same reasoning that we just did, it tells you that the answer is one-fifth of $2^5$, the total number of subsets, plus four times two to the one, in this case, which is a fifth of 32 plus 8, which is 8. And if you’ll remember when we explicitly looked at them all that was in fact the answer.

Look, this is a hard puzzle, and when it’s worth putting in the time to solve a hard problem it’s also worth taking some time to reflect on it. What do you get out of this? What’s the takeaway? You could reflect on the answer itself, how the dominant part is indeed one-fifth of all the total subsets like we might have guessed, and how this error term came about from the not-quite-destructive interference in a massive combination of roots of unity. But again what makes this question interesting is not the answer, it’s the way that we solved it. Namely, taking a discrete sequence that we want to understand and treating it as the coefficients on a polynomial, then evaluating that polynomial on complex values. Both of those steps are probably highly unexpected at the outset. Both of these steps relate to some very general and powerful techniques found elsewhere in mathematics. At the beginning of the lesson, I promised that the technique we would use would be similar to the way primes are studied, and the set of ideas leading up to the Riemann hypothesis. This is a very beautiful topic, so much so that it seems a little criminal to rush it. The right thing to do is to make the video I promised a while back about the zeta function.

The two are parallel in that both involve a discrete sequence that carries information and a function whose coefficients are the terms in that sequence. In the case of primes, it is a Dirichlet series instead of a polynomial. Studying how the function behaves with complex-valued inputs reveals information about the coefficients.

The reason complex numbers are so useful is that they offer more power in making deductions about the coefficients. It is almost impossible to overstate how helpful the idea of using values on the unit circle, and roots of unity, to suss out frequency information is. This is the core idea underlying Fourier transforms and Fourier series.

To learn more, I highly recommend the book Generatingfunctionology by Herbert Wilf. I will also leave up a few fun puzzles for anyone who wants to flex their muscles with the idea.