WEBVTT
Kind: captions
Language: en
00:00:00.080 --> 00:00:03.920
Let me pull out an old differential
equations textbook I learned from in college,
00:00:03.920 --> 00:00:08.880
and let's turn to this funny little exercise
in here that asks the reader to compute e to
00:00:08.880 --> 00:00:15.120
the power of At, where A, we're told, is going to
be a matrix, and the insinuation seems to be that
00:00:15.120 --> 00:00:21.120
the result will also be a matrix. It then offers
several examples for what you might plug in for A.
00:00:22.000 --> 00:00:26.480
Now, taken out of context, putting a matrix in
an exponent like this probably seems like total
00:00:26.480 --> 00:00:31.840
nonsense, but what it refers to is an extremely
beautiful operation, and the reason it shows up in
00:00:31.840 --> 00:00:36.960
this book is that it’s useful, it's used to solve
a very important class of differential equations.
00:00:37.600 --> 00:00:42.320
In turn, given that the universe is often written
in the language of differential equations,
00:00:42.320 --> 00:00:46.400
you see it pop up in physics all the time
too, especially in quantum mechanics,
00:00:46.400 --> 00:00:51.120
where matrix exponents are littered throughout the
place, they play a particularly prominent role.
00:00:51.120 --> 00:00:55.120
This has a lot to do with Schrödinger’s equation,
which we’ll touch on a bit later. And it
00:00:55.120 --> 00:00:59.440
also may help in understanding your romantic
relationships, but again, all in due time.
00:01:05.520 --> 00:01:09.600
A big part of the reason I want to cover this
topic is that there’s an extremely nice way to
00:01:09.600 --> 00:01:14.720
visualize what matrix exponents are actually doing
using flow, that not a lot of people seem to talk
00:01:14.720 --> 00:01:19.360
about. But for the bulk of this chapter, let’s
start by laying out what exactly the operation
00:01:19.360 --> 00:01:23.440
actually is, and then see if we can get a feel
for what kinds of problems it helps us to solve.
00:01:24.240 --> 00:01:28.560
The first thing you should know is that this is
not some bizarre way to multiply the constant e
00:01:28.560 --> 00:01:33.920
by itself multiple times; you would be right
to call that nonsense. The actual definition
00:01:33.920 --> 00:01:38.400
is related to a certain infinite polynomial
describing real number powers of e, what
00:01:38.400 --> 00:01:44.240
we call its Taylor series. For example, if I took
the number 2 and plugged it into this polynomial,
00:01:44.240 --> 00:01:49.600
then as you add more and more terms, each of
which looks like some power of 2 divided by
00:01:49.600 --> 00:02:00.400
some factorial...the sum approaches a number
near 7.389, and this number is precisely e*e.
00:02:01.920 --> 00:02:07.040
If you increment this input by 1, then somewhat
miraculously no matter where you started from
00:02:07.040 --> 00:02:10.960
the effect on the output is always to
multiply it by another factor of e.
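As a rough illustration of the two claims above, here is a minimal Python sketch that adds up the polynomial term by term. The function name `exp_partial_sum` and the particular term counts are assumptions made for this example, not anything from the video.

```python
import math

def exp_partial_sum(x, n_terms):
    """Sum the first n_terms of 1 + x + x^2/2! + x^3/3! + ..."""
    total = 0.0
    term = 1.0  # the n = 0 term, x^0 / 0!
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turn x^n/n! into x^(n+1)/(n+1)!
    return total

# Partial sums for the input 2 settle toward e*e as more terms are added.
for n_terms in (1, 2, 5, 10, 20):
    print(n_terms, exp_partial_sum(2.0, n_terms))
print("e*e =", math.e * math.e)

# Incrementing the input by 1 multiplies the output by (roughly) e.
print(exp_partial_sum(3.0, 40) / exp_partial_sum(2.0, 40), "vs e =", math.e)
```

The running `term` is updated instead of recomputing powers and factorials from scratch, which keeps each step to one multiplication.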
00:02:12.160 --> 00:02:16.000
For reasons that you're going to see in a bit,
mathematicians became interested in plugging
00:02:16.000 --> 00:02:21.040
all kinds of things into this polynomial, things
like complex numbers, and for our purposes today,
00:02:21.040 --> 00:02:25.600
matrices, even when those objects do
not immediately make sense as exponents.
00:02:26.400 --> 00:02:31.040
What some authors do is give this infinite
polynomial the name “exp” when you plug in more
00:02:31.040 --> 00:02:36.320
exotic inputs. It's a gentle nod to the connection
that this has to exponential functions in the
00:02:36.320 --> 00:02:40.160
case of real numbers, even though obviously
these inputs don't make sense as exponents.
00:02:40.720 --> 00:02:45.680
However, an equally common convention is to
give a much less gentle nod to the connection
00:02:45.680 --> 00:02:50.240
and just abbreviate the whole thing as e to the
power of whatever object you’re plugging in,
00:02:50.240 --> 00:02:54.400
whether that’s a complex number, or a
matrix, or all sorts of more exotic objects.
00:02:54.960 --> 00:03:00.160
So while this equation is a theorem for real
numbers, it’s a definition for more exotic inputs.
00:03:00.800 --> 00:03:05.360
Cynically, you could call this a blatant
abuse of notation. More charitably,
00:03:05.360 --> 00:03:09.600
you might view it as an example of the beautiful
cycle between discovery and invention in math.
00:03:10.560 --> 00:03:14.640
In either case, plugging in a matrix, even
to a polynomial, might seem a little strange,
00:03:14.640 --> 00:03:20.320
so let’s be clear on what we mean here. The matrix
needs to have the same number of rows and columns.
00:03:20.320 --> 00:03:25.200
That way you can multiply it by itself according
to the usual rules of matrix multiplication.
00:03:25.200 --> 00:03:30.160
This is what we mean by squaring it.
Similarly, if you were to take that result,
00:03:30.160 --> 00:03:35.600
and then multiply it by the original matrix
again, this is what we mean by cubing the matrix.
00:03:37.760 --> 00:03:43.200
If you carry on like this, you can take any
whole power of a matrix, it's perfectly sensible.
00:03:43.200 --> 00:03:47.440
In this context, powers still mean exactly
what you’d expect, repeated multiplication.
00:03:54.000 --> 00:03:59.120
Each term of this polynomial is scaled by 1
divided by some factorial, and with matrices,
00:03:59.120 --> 00:04:04.560
all that means is that you multiply each component
by that number. Likewise, it always makes sense to
00:04:04.560 --> 00:04:10.960
add two matrices, this is something you again do
term-by-term. The astute among you might ask how
00:04:10.960 --> 00:04:15.200
sensible it is to take this out to infinity,
which would be a great question, one that I'm
00:04:15.200 --> 00:04:19.520
largely going to postpone the answer to, but I
can show you one pretty fun example here now.
00:04:20.320 --> 00:04:25.440
Take this 2x2 matrix that has -π and
π sitting off its diagonal entries.
00:04:25.440 --> 00:04:29.680
Let's see what the sum gives. The
first term is the identity matrix,
00:04:29.680 --> 00:04:33.440
this is actually what we mean by definition
when we raise a matrix to the 0th power.
00:04:34.240 --> 00:04:39.360
Then we add in the matrix itself, which gives
us the pi-off-the-diagonal terms. And then add
00:04:39.360 --> 00:04:44.880
half of the matrix squared, and continuing on I’ll
have the computer keep adding more and more terms,
00:04:44.880 --> 00:04:49.600
each of which requires taking one more matrix
product to get a new power and adding it to a
00:04:49.600 --> 00:04:56.720
running tally. And as it keeps going, it seems to
be approaching a stable value, which is around -1
00:04:56.720 --> 00:05:02.080
times the identity matrix. In this sense, we
say the sum equals that negative identity.
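The running tally described here can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `mat_mul` and `mat_exp` and the cutoff of 60 terms are choices made for this example, and a truncated sum like this is not how numerical libraries usually compute matrix exponentials.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, n_terms=60):
    """Add up the first n_terms of I + M + M^2/2! + M^3/3! + ..."""
    n = len(m)
    total = [[0.0] * n for _ in range(n)]
    # The 0th term is the identity matrix, M^0 / 0!.
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n_terms):
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
        # One more matrix product turns M^k/k! into M^(k+1)/(k+1)!.
        product = mat_mul(term, m)
        term = [[entry / (k + 1) for entry in row] for row in product]
    return total

# The matrix with -pi and pi off the diagonal.
A = [[0.0, -math.pi], [math.pi, 0.0]]
for row in mat_exp(A):
    print([round(entry, 6) for entry in row])
```

The printed entries should sit very close to -1 on the diagonal and 0 off it, matching the negative identity.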
00:05:02.800 --> 00:05:05.840
By the end of this video, my
hope is that this particular fact
00:05:05.840 --> 00:05:10.320
comes to make total sense to you, for any of
you familiar with Euler’s famous identity,
00:05:10.320 --> 00:05:14.800
this is essentially the matrix version
of that. It turns out that in general,
00:05:14.800 --> 00:05:19.120
no matter what matrix you start with, as
you add more and more terms you eventually
00:05:19.120 --> 00:05:23.840
approach some stable value. Though sometimes
it can take quite a while before you get there.
00:05:26.480 --> 00:05:31.680
Just seeing the definition like this, in
isolation, raises all kinds of questions.
00:05:31.680 --> 00:05:36.240
Most notably, why would mathematicians and
physicists be interested in torturing their
00:05:36.240 --> 00:05:41.360
poor matrices this way, what problems are they
trying to solve? And if you’re anything like me,
00:05:41.360 --> 00:05:45.840
a new operation is only satisfying when you
have a clear view of what it’s trying to do,
00:05:46.400 --> 00:05:51.200
some sense of how to predict the output based on
the input before you actually crunch the numbers.
00:05:51.200 --> 00:05:55.360
How on earth could you have predicted
that the matrix with pi off the diagonals
00:05:55.360 --> 00:05:57.600
results in the negative identity matrix like this?
00:05:58.880 --> 00:06:03.680
Often in math, you should view the definition
not as a starting point, but as a target.
00:06:03.680 --> 00:06:08.240
Contrary to the structure of textbooks,
mathematicians do not start by making definitions,
00:06:08.240 --> 00:06:12.720
and then listing a lot of theorems and proving
them, and then showing some examples. The process
00:06:12.720 --> 00:06:18.080
of discovering math typically goes the other way
around. They start by chewing on specific problems
00:06:18.080 --> 00:06:22.640
and then generalizing those problems, then coming
up with constructs that might be helpful in those
00:06:22.640 --> 00:06:28.080
general cases. And only then do you write
down a new definition, or extend an old one.
00:06:29.200 --> 00:06:34.320
As to what sorts of specific examples might
motivate matrix exponents, two come to mind,
00:06:34.320 --> 00:06:39.280
one involving relationships, and the other quantum
mechanics. Let’s start with relationships.
00:06:43.040 --> 00:06:47.920
Suppose we have two lovers, let's call them
Romeo and Juliet. And let's let x represent
00:06:47.920 --> 00:06:54.000
Juliet’s love for Romeo, and y represent
his love for her, both of which are going
00:06:54.000 --> 00:06:59.360
to be values that change with time. This is an
example that we actually touched on in chapter 1,
00:06:59.360 --> 00:07:02.880
it's based on a Steven Strogatz article,
but it’s okay if you didn’t see that.
00:07:03.440 --> 00:07:08.400
The way their relationship works is that the
rate at which Juliet’s love for Romeo changes,
00:07:08.400 --> 00:07:15.440
the derivative of this value, is equal to -1
times Romeo’s love for her. So in other words,
00:07:15.440 --> 00:07:20.640
when Romeo is expressing cool disinterest,
that's when Juliet’s feelings actually increase,
00:07:20.640 --> 00:07:24.640
whereas if he becomes too infatuated,
her interest will start to fade.
00:07:26.880 --> 00:07:30.000
Romeo, on the other hand, is the
opposite, the rate of change of his
00:07:30.000 --> 00:07:36.320
love is equal to Juliet’s love. So while Juliet
is mad at him, his affections tend to decrease,
00:07:38.880 --> 00:07:41.520
whereas if she loves him,
that's when his feelings grow.
00:07:42.480 --> 00:07:46.720
Of course, neither one of these numbers
is holding still; as Romeo’s love
00:07:46.720 --> 00:07:52.240
increases in response to Juliet, her equation
continues to apply and drives her love down.
00:07:53.200 --> 00:07:58.160
Both of these equations always apply, from
each infinitesimal point in time to the next,
00:07:58.160 --> 00:08:03.120
so every slight change to one value immediately
influences the rate of change of the other.
00:08:04.160 --> 00:08:08.960
This is a system of differential equations.
It’s a puzzle, where your challenge is to
00:08:08.960 --> 00:08:14.400
find explicit functions for x(t) and y(t)
that make both these expressions true.
00:08:15.440 --> 00:08:19.920
Now, as systems of differential equations
go, this one is on the simpler side,
00:08:19.920 --> 00:08:23.600
enough so that many calculus students
could probably just guess at an answer.
00:08:24.240 --> 00:08:28.400
But keep in mind, though, it’s not enough to
find some pair of functions that makes this true;
00:08:28.960 --> 00:08:33.600
if you want to actually predict where Romeo
and Juliet end up after some starting point,
00:08:33.600 --> 00:08:39.840
you have to make sure that your functions match
the initial set of conditions at time t=0. More to
00:08:39.840 --> 00:08:45.360
the point, our actual goal is to systematically
solve more general versions of this equation,
00:08:45.360 --> 00:08:49.440
without guessing and checking, and it's that
question that leads us to matrix exponents.
00:08:50.560 --> 00:08:54.640
Very often when you have multiple changing
values like this, it’s helpful to package
00:08:54.640 --> 00:08:58.000
them together as coordinates of a single
point in a higher-dimensional space.
00:08:58.560 --> 00:09:02.960
So for Romeo and Juliet, think of their
relationship as a point in a 2d space,
00:09:02.960 --> 00:09:10.240
the x-coordinate capturing Juliet’s
feelings, and the y-coordinate capturing Romeo’s.
00:09:13.120 --> 00:09:16.800
Sometimes it’s helpful to picture this
state as an arrow from the origin,
00:09:16.800 --> 00:09:22.080
other times just as a point; all that really
matters is that it encodes two numbers. And moving
00:09:22.080 --> 00:09:27.360
forward we'll be writing that as a column vector.
And of course, this is all a function of time.
00:09:28.320 --> 00:09:32.880
You might picture the rate of change of the state,
the thing that packages together the derivative of
00:09:32.880 --> 00:09:38.720
x and the derivative of y, as a kind of velocity
vector in this state space. Something that tugs
00:09:38.720 --> 00:09:43.280
on our vector in some direction, and with some
magnitude indicating how quickly it’s changing.
00:09:45.440 --> 00:09:52.000
Remember, the rule here is that the rate of
change of x is -y, and the rate of change of y is
00:09:52.000 --> 00:09:57.280
x. Set up as vectors like this, we could
rewrite the right-hand side of this equation
00:09:57.280 --> 00:10:04.640
as a product of this matrix with the original
vector [x, y]. The top row encodes Juliet’s rule,
00:10:04.640 --> 00:10:10.560
and the bottom row encodes Romeo’s rule. So what
we have here is a differential equation telling
00:10:10.560 --> 00:10:17.840
us that the rate of change of some vector
is equal to a certain matrix times itself.
00:10:18.800 --> 00:10:23.280
In a moment we’ll talk about how matrix
exponentiation solves this kind of equation,
00:10:23.280 --> 00:10:27.760
but before that let me show you a simpler way
that we can solve this particular system, one that
00:10:27.760 --> 00:10:32.560
uses pure geometry, and it helps set the stage
for visualizing matrix exponents a bit later.
00:10:34.000 --> 00:10:37.280
This matrix from our system is
a 90-degree rotation matrix.
00:10:38.800 --> 00:10:42.480
For any of you a bit rusty on how to
think of matrices as transformations,
00:10:42.480 --> 00:10:46.640
there’s a video all about it on this
channel, a series really. The basic
00:10:46.640 --> 00:10:52.000
idea is that when you multiply a matrix by the
vector [1, 0], it pulls out the first column.
00:10:53.840 --> 00:11:00.160
And similarly, if you multiply it by [0, 1]
that pulls out the second column. What this
00:11:00.160 --> 00:11:05.120
means is that when you look at a matrix, you
can read its columns as telling you what it
00:11:05.120 --> 00:11:11.200
does to these two vectors, known as the basis
vectors. The way it acts on any other vector
00:11:11.200 --> 00:11:16.480
is a result of scaling and adding these two
basis results by that vector’s coordinates.
00:11:17.360 --> 00:11:19.440
So, looking back at the matrix from our system,
00:11:19.440 --> 00:11:23.680
notice how from its columns we can tell
it takes the first basis vector to [0, 1],
00:11:23.680 --> 00:11:29.040
and the second to [-1, 0], hence why I’m
calling it the 90-degree rotation matrix.
00:11:31.040 --> 00:11:36.800
What it means for our equation is that it's saying
wherever Romeo and Juliet are in this state space,
00:11:36.800 --> 00:11:41.840
their rate of change has to look like a
90-degree rotation of this position vector.
00:11:42.480 --> 00:11:46.640
The only way velocity can be permanently
perpendicular to position like this
00:11:46.640 --> 00:11:50.560
is when you rotate about the origin
in circular motion. Never growing nor
00:11:50.560 --> 00:11:55.840
shrinking because the rate of change has
no component in the direction of position.
00:11:57.040 --> 00:12:02.160
More specifically, since the length of this
velocity vector equals the length of the
00:12:02.160 --> 00:12:08.560
position vector, then for each unit of time, the
distance that this covers is equal to one radius’s
00:12:08.560 --> 00:12:15.360
worth of arc length along that circle. In other
words, it rotates at one radian per unit time;
00:12:15.920 --> 00:12:20.640
so in particular, it would take 2pi
units of time to make a full revolution.
00:12:22.800 --> 00:12:26.880
If you want to describe this kind of
rotation with a formula, we can use a more
00:12:26.880 --> 00:12:32.560
general rotation matrix, which looks like this.
Again, we can read it in terms of the columns.
00:12:32.560 --> 00:12:37.520
Notice how the first column tells us that it
takes the first basis vector to [cos(t), sin(t)],
00:12:39.840 --> 00:12:43.040
and the second column tells us that
it takes the second basis vector
00:12:43.040 --> 00:12:48.560
to [-sin(t), cos(t)], both of which are
consistent with rotating by t radians.
00:12:49.600 --> 00:12:54.800
So to solve the system, if you want to predict
where Romeo and Juliet end up after t units of
00:12:54.800 --> 00:13:01.600
time, you can multiply this matrix by their
initial state. The active viewers among you
00:13:01.600 --> 00:13:06.880
may also enjoy taking a moment to pause and
confirm that the explicit formulas you get for x(t)
00:13:06.880 --> 00:13:11.200
and y(t) really do satisfy the system of
differential equations that we started with.
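For anyone who wants to compare notes after pausing, the check can be written out as follows. The symbols x_0 and y_0 for the initial state are notation assumed here for the example.

```latex
\begin{aligned}
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
  &= \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}
     \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
   = \begin{bmatrix} x_0 \cos t - y_0 \sin t \\ x_0 \sin t + y_0 \cos t \end{bmatrix} \\[4pt]
x'(t) &= -x_0 \sin t - y_0 \cos t = -y(t) \\
y'(t) &= \phantom{-}x_0 \cos t - y_0 \sin t = x(t) \\[4pt]
x(0) &= x_0, \qquad y(0) = y_0
\end{aligned}
```

So both rules hold at every time t, and the functions match the initial conditions at t = 0.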
00:13:17.840 --> 00:13:22.560
The mathematician in you might wonder if it’s
possible to solve not just this specific system,
00:13:22.560 --> 00:13:29.440
but equations like it for any matrix. To
ask this question is to set yourself up to
00:13:29.440 --> 00:13:34.960
rediscover matrix exponents. The main goal for
today is for you to understand how this equation
00:13:34.960 --> 00:13:40.000
lets you intuitively picture this operation
which we write as e raised to a matrix, and
00:13:40.000 --> 00:13:45.440
on the flip side how being able to compute matrix
exponents lets you explicitly solve this equation.
00:13:46.320 --> 00:13:49.760
A much less whimsical example
is Schrödinger’s famous equation,
00:13:49.760 --> 00:13:54.240
which is the fundamental equation describing
how systems in quantum mechanics change
00:13:54.240 --> 00:13:59.520
over time. It looks pretty intimidating, and I
mean, it’s quantum mechanics so of course it will,
00:13:59.520 --> 00:14:04.400
but it’s actually not that different from the
Romeo-Juliet setup. This symbol here refers to
00:14:04.400 --> 00:14:08.400
a certain vector. It's a vector that packages
together all the information you might care
00:14:08.400 --> 00:14:12.960
about in a system, like the various particles’
positions and momenta. It's analogous to our
00:14:12.960 --> 00:14:18.720
simpler 2d vector that encoded all the information
about Romeo and Juliet. The equation says that
00:14:18.720 --> 00:14:23.440
the rate at which this state vector changes
looks like a certain matrix times itself.
00:14:24.400 --> 00:14:28.880
There are a number of things making Schrödinger’s
equation notably more complicated. But in the
00:14:28.880 --> 00:14:33.040
back of your mind you might think of this as a
target point that you and I can build up to, with
00:14:33.040 --> 00:14:37.920
simpler examples like Romeo and Juliet offering
more friendly stepping stones along the way.
00:14:39.360 --> 00:14:43.760
Actually, the simplest example, which is
tied to ordinary real-number powers of e,
00:14:43.760 --> 00:14:48.320
is the one-dimensional case. This is when you
have a single changing value, and its rate of
00:14:48.320 --> 00:14:53.280
change equals some constant times itself. So
the bigger the value, the faster it grows.
00:14:54.960 --> 00:14:59.280
Most people are more comfortable visualizing
this with a graph, where the higher the value
00:14:59.280 --> 00:15:04.320
of the graph, the steeper its slope, resulting in
this ever-steepening upward curve. Just keep in
00:15:04.320 --> 00:15:08.000
mind that when we get to higher dimensional
variants, graphs are a lot less helpful.
00:15:08.960 --> 00:15:12.880
This is a highly important equation in its
own right. It's a very powerful concept
00:15:12.880 --> 00:15:16.640
when the rate of change of a value
is proportional to the value itself.
00:15:16.640 --> 00:15:19.840
This is the equation governing
things like compound interest,
00:15:19.840 --> 00:15:24.320
or the early stages of population growth before
the effects of limited resources kick in,
00:15:25.200 --> 00:15:28.960
or the early stages of an epidemic while
most of the population is susceptible.
00:15:31.680 --> 00:15:38.480
Calculus students all learn about how the
derivative of e^(rt) is r times itself. In
00:15:38.480 --> 00:15:43.680
other words, this self-reinforcing growth
is the same thing as exponential growth,
00:15:43.680 --> 00:15:51.040
and e^(rt) solves this equation. Actually, a
better way to think about it is that there are
00:15:51.040 --> 00:15:55.920
many different solutions to this equation, one
for each initial condition, something like an
00:15:55.920 --> 00:16:01.760
initial investment size or initial population,
which we'll just call x0. Notice, by the way,
00:16:01.760 --> 00:16:06.560
how the higher the value for x0, the higher
the initial slope of the resulting solution,
00:16:07.360 --> 00:16:09.760
which should make complete
sense given the equation.
00:16:11.440 --> 00:16:18.320
The function e^(rt) is just a solution when the
initial condition is 1. But! If you multiply by
00:16:18.320 --> 00:16:22.800
any other initial condition, you get a new
function that still satisfies the property,
00:16:22.800 --> 00:16:27.600
it still has a derivative which is r times
itself. But this time it starts at x0,
00:16:28.160 --> 00:16:33.680
since e^0 is 1. This is worth highlighting
before we generalize to more dimensions:
00:16:33.680 --> 00:16:37.600
Do not think of the exponential
part as a solution in and of itself.
00:16:37.600 --> 00:16:42.240
Think of it as something which acts on an
initial condition in order to give a solution.
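Written out for the one-dimensional case, with x(t) as an assumed name for the changing value, the point looks like this:

```latex
\begin{aligned}
x(t) &= e^{rt}\, x_0 \\
\frac{dx}{dt} &= r\, e^{rt}\, x_0 = r\, x(t) \\
x(0) &= e^{0}\, x_0 = x_0
\end{aligned}
```

The factor e^{rt} is the part that acts, and x_0 is what it acts on.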
00:16:46.320 --> 00:16:50.240
You see, up in the 2-dimensional case,
when we have a changing vector whose
00:16:50.240 --> 00:16:53.040
rate of change is constrained
to be some matrix times itself,
00:16:54.080 --> 00:16:58.720
what the solution looks like is also an
exponential term acting on a given initial
00:16:58.720 --> 00:17:04.320
condition. But the exponential part, in that case,
will produce a matrix that changes with time,
00:17:04.320 --> 00:17:08.800
and the initial condition is a vector. In
fact, you should think of the definition of
00:17:08.800 --> 00:17:13.840
matrix exponentiation as being heavily
motivated by making sure this is true.
00:17:14.800 --> 00:17:20.560
For example, if we look back at the system that
popped up with Romeo and Juliet, the claim now is
00:17:20.560 --> 00:17:27.360
that solutions look like e raised to this [[0,
-1], [1, 0]] matrix all times time, multiplied
00:17:27.360 --> 00:17:32.400
by some initial condition. But we’ve already seen
the solution in this case, we know it looks like a
00:17:32.400 --> 00:17:37.200
rotation matrix times the initial condition. So
let’s take a moment to roll up our sleeves and
00:17:37.200 --> 00:17:42.160
compute the exponential term using the definition
I mentioned at the start, and see if it lines up.
00:17:43.040 --> 00:17:48.400
Remember, writing e to the power of a matrix is
a shorthand, a shorthand for plugging it into
00:17:48.400 --> 00:17:54.560
this long infinite polynomial, the Taylor
series of e^x. I know it might seem pretty
00:17:54.560 --> 00:18:00.000
complicated to do this, but trust me, it’s very
satisfying how this particular one turns out.
00:18:00.000 --> 00:18:04.160
If you actually sit down and you compute
successive powers of this matrix,
00:18:04.160 --> 00:18:13.840
what you’d notice is that they fall into
a cycling pattern every four iterations.
00:18:27.360 --> 00:18:31.280
This should make sense, given that we
know it’s a 90-degree rotation matrix.
00:18:31.280 --> 00:18:34.720
So when you add together all
infinitely many matrices term-by-term,
00:18:35.360 --> 00:18:40.720
each term in the result looks like a polynomial
in t with some nice cycling pattern in the
00:18:40.720 --> 00:18:46.880
coefficients, all of them scaled by the relevant
factorial term. Those of you who are savvy with
00:18:46.880 --> 00:18:51.520
Taylor series might be able to recognize that
each one of these components is the Taylor
00:18:51.520 --> 00:18:56.800
series for either sine or cosine, though in that
top right corner's case it’s actually -sin(t).
00:18:58.800 --> 00:19:03.200
So what we get from the computation is exactly
the rotation matrix we had from before!
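The computation described above can be sketched in symbols. Writing M for the 90-degree rotation matrix and I for the identity is notation assumed here for the example.

```latex
\begin{aligned}
M &= \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \qquad
M^2 = -I, \qquad M^3 = -M, \qquad M^4 = I \\[4pt]
e^{Mt} &= I + Mt + \frac{(Mt)^2}{2!} + \frac{(Mt)^3}{3!} + \frac{(Mt)^4}{4!} + \cdots \\
&= \left(1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \cdots\right) I
 + \left(t - \frac{t^3}{3!} + \frac{t^5}{5!} - \cdots\right) M \\
&= \cos(t)\, I + \sin(t)\, M
 = \begin{bmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{bmatrix}
\end{aligned}
```

The even powers collect into the cosine series and the odd powers into the sine series, with the minus sign in the top right coming from the -1 entry of M.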
00:19:07.280 --> 00:19:12.320
To me, this is extremely beautiful. We have two
completely different ways of reasoning about the
00:19:12.320 --> 00:19:17.360
same system and they give us the same answer.
I mean it’s reassuring that they do, but it
00:19:17.360 --> 00:19:21.600
is wild just how different the mode of thought
is when you're chugging through the polynomial
00:19:21.600 --> 00:19:26.640
vs. when you're geometrically reasoning about what
a velocity perpendicular to a position must imply.
00:19:27.520 --> 00:19:30.560
Hopefully the fact that these line
up inspires a little confidence
00:19:30.560 --> 00:19:34.080
in the claim that matrix exponents
really do solve systems like this.
00:19:35.120 --> 00:19:38.880
This explains the computation we saw at the
start, by the way, with the matrix that had
00:19:38.880 --> 00:19:45.280
-π and π off the diagonal, producing the negative
identity. This expression is exponentiating
00:19:45.280 --> 00:19:51.120
a 90-degree rotation matrix times π, which is
another way to describe what the Romeo-Juliet
00:19:51.120 --> 00:19:56.240
setup does after π units of time. As we
now know, that has the effect of rotating
00:19:56.240 --> 00:20:01.280
everything by 180-degrees in this state space,
which is the same as multiplying everything by -1.
00:20:02.880 --> 00:20:06.160
Also, for any of you familiar
with imaginary number exponents,
00:20:06.160 --> 00:20:12.320
this example is probably ringing a ton of bells.
It is 100% analogous. In fact, we could have
00:20:12.320 --> 00:20:17.440
framed the entire example where Romeo and Juliet’s
feelings were packaged into a complex number,
00:20:17.440 --> 00:20:21.840
and the rate of change of that complex
number would have been i times itself since
00:20:21.840 --> 00:20:27.360
multiplication by i also acts like a 90-degree
rotation. The same exact line of reasoning,
00:20:27.360 --> 00:20:34.320
both analytic and geometric, would have led to
this whole idea that e^(it) describes rotations.
00:20:34.320 --> 00:20:37.920
These are actually two of many different
examples throughout math and physics
00:20:37.920 --> 00:20:42.400
when you find yourself exponentiating some
object which acts as a 90-degree rotation,
00:20:42.400 --> 00:20:47.840
times time. It shows up with quaternions or many
of the matrices that pop up in quantum mechanics.
00:20:48.640 --> 00:20:53.200
In all of these cases, we have this really neat
general idea that if you take some operation
00:20:53.200 --> 00:20:58.160
that rotates 90-degrees in some plane, often
it's a plane in some high-dimensional space
00:20:58.160 --> 00:21:03.360
we can’t visualize, then what we get by
exponentiating that operation times time
00:21:03.360 --> 00:21:07.040
is something that generates all
other rotations in that same plane.
00:21:09.360 --> 00:21:13.600
One of the more complicated variations on
this same theme is Schrödinger’s equation.
00:21:13.600 --> 00:21:15.440
It’s not just that it has the
00:21:15.440 --> 00:21:20.960
derivative-of-a-state-equals-some-matrix-times-that-state
form. The nature of the relevant matrix
00:21:20.960 --> 00:21:26.320
here is such that this equation also describes
a kind of rotation, though in many applications
00:21:26.320 --> 00:21:30.240
of Schrödinger's equation it will be a
rotation inside a kind of function space.
00:21:30.240 --> 00:21:34.080
It’s a little more involved, though, because
typically there's a combination of many different
00:21:34.080 --> 00:21:38.560
rotations. It takes time to really dig into
this equation, and I’d love to do that in a
00:21:38.560 --> 00:21:44.560
later chapter. But right now I cannot help but at
least allude to the fact that this imaginary unit
00:21:44.560 --> 00:21:49.680
i that sits so impishly in such a fundamental
equation for all of the universe, is playing
00:21:49.680 --> 00:21:55.840
basically the same role as the matrix from our
Romeo-Juliet example. What this i communicates
00:21:55.840 --> 00:22:01.680
is that the rate of change of a certain state
is, in a sense, perpendicular to that state,
00:22:01.680 --> 00:22:06.560
and hence that the way things have to evolve
over time will involve a kind of oscillation.
00:22:10.960 --> 00:22:14.880
But matrix exponentiation can do
so much more than just rotation.
00:22:14.880 --> 00:22:18.960
You can always visualize these sorts of
differential equations using a vector field.
00:22:20.400 --> 00:22:25.200
The idea is that this equation tells us that the
velocity of a state is entirely determined by its
00:22:25.200 --> 00:22:30.480
position. So what we do is go to every point in
this space, and draw a little vector indicating
00:22:30.480 --> 00:22:36.720
what the velocity of a state must be if it passes
through that point. For our type of equation,
00:22:36.720 --> 00:22:54.000
this means that we go to each point v in
space, and we attach the vector M times v.
00:22:54.000 --> 00:22:59.120
To intuitively understand how any given initial
condition will evolve, you let it flow along this
00:22:59.120 --> 00:23:04.160
field, with a velocity always matching whatever
vector it’s sitting on at any given point in time.
00:23:05.600 --> 00:23:12.000
So if the claim is that solutions to this equation
look like e^(Mt) times some initial condition,
00:23:12.000 --> 00:23:16.880
it means you can visualize what the matrix
e^(Mt) does by letting every possible
00:23:16.880 --> 00:23:26.480
initial condition flow along this field for
t units of time. The transition from start
00:23:26.480 --> 00:23:31.920
to finish is described by whatever matrix
pops out from the computation for e^(Mt).
00:23:33.440 --> 00:23:38.400
In our main example with the 90-degree rotation
matrix, the vector field looks like this,
00:23:38.400 --> 00:23:44.160
and as we saw e^(Mt) describes rotation in that
case, which lines up with flow along this field.
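To make the "flow along the field" picture concrete, here is a small Python sketch that nudges a point along the field in many tiny steps and compares the result with the rotation matrix applied to the starting point. The function names `flow` and `rotate`, the starting point, and the step count are all assumptions for this example, and the simple stepping scheme is only an approximation of true flow.

```python
import math

def flow(m, state, t, n_steps=100_000):
    """Follow the vector field v -> M v for t units of time in small steps."""
    dt = t / n_steps
    x, y = state
    for _ in range(n_steps):
        # Velocity at the current point is M times the current position.
        vx = m[0][0] * x + m[0][1] * y
        vy = m[1][0] * x + m[1][1] * y
        x, y = x + vx * dt, y + vy * dt
    return x, y

def rotate(state, t):
    """Apply the rotation-by-t-radians matrix to a point."""
    x, y = state
    return (math.cos(t) * x - math.sin(t) * y,
            math.sin(t) * x + math.cos(t) * y)

M = [[0.0, -1.0], [1.0, 0.0]]  # the 90-degree rotation matrix
start = (1.0, 0.5)             # an arbitrary initial condition
t = 2.0

print("flowed :", flow(M, start, t))
print("rotated:", rotate(start, t))
```

The two printed points should agree to several decimal places, with the small gap shrinking as `n_steps` grows.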
00:23:45.600 --> 00:23:50.080
As another example, the more Shakespearean
Romeo and Juliet might have equations
00:23:50.080 --> 00:23:54.320
that look a little more like this, where
Juliet’s rule is symmetric with Romeo’s,
00:23:54.320 --> 00:23:58.320
and both of them are inclined to get carried
away in response to one another's feelings.
00:23:59.200 --> 00:24:03.680
Again, the way the vector field you’re looking
at has been defined is to go to each point v in
00:24:03.680 --> 00:24:10.000
space and attach the vector M times v. This is the
pictorial way of saying that the rate of change of
00:24:10.000 --> 00:24:16.960
a state must always equal M times itself. But for
this example, flow along the vector field looks a
00:24:16.960 --> 00:24:22.240
lot different from how it did before. If Romeo
and Juliet start anywhere in this upper-right
00:24:22.240 --> 00:24:27.040
half of the plane, their feelings will feed off
of each other and they both tend towards infinity.
00:24:30.720 --> 00:24:33.040
If they're in the other half of the plane, well,
00:24:33.040 --> 00:24:36.800
let’s just say they stay more true to their
Capulet and Montague family traditions.
00:24:37.760 --> 00:24:42.960
So even before you try calculating the exponential
of this particular matrix, you can already have an
00:24:42.960 --> 00:24:47.840
intuitive sense of what the answer should look
like. The resulting matrix should describe the
00:24:47.840 --> 00:24:53.440
transition from time 0 to time t, which, if you
look at the field, seems to indicate that it will
00:24:53.440 --> 00:24:59.840
squish along one diagonal while stretching along
another, getting more extreme as t gets larger.
00:25:00.640 --> 00:25:06.160
Of course, all of this is presuming that e^(Mt)
times an initial condition actually solves these
00:25:06.160 --> 00:25:11.040
systems. This is one of those facts that’s easiest
to believe when you just work it out yourself.
00:25:12.320 --> 00:25:13.840
But I’ll run through a quick rough sketch.
00:25:15.920 --> 00:25:19.120
Write out the full polynomial that defines e^(Mt),
00:25:19.120 --> 00:25:22.000
and multiply by some initial
condition vector on the right.
00:25:26.720 --> 00:25:29.200
And then take the derivative
of this with respect to t.
00:25:30.000 --> 00:25:41.840
Because M is a constant, this just means
applying the power rule to each one of the terms.
00:25:43.520 --> 00:25:46.960
And that power rule really nicely
cancels out with the factorial terms.
00:25:53.440 --> 00:25:58.000
So what we're left with is an expression that
looks almost identical to what we had before,
00:25:58.000 --> 00:26:01.040
except that every term has
an extra M hanging on to it.
00:26:01.040 --> 00:26:05.680
But this can be factored out to the left.
So the derivative of this expression
00:26:05.680 --> 00:26:10.240
is M times the original expression,
and hence it solves the equation.
00:26:11.200 --> 00:26:15.680
This actually sweeps under the rug some details
required for rigor, mostly centered around the
00:26:15.680 --> 00:26:19.760
question of whether or not this thing actually
converges, but it does give the main idea.
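The rough sketch above can be written out in symbols. Here v_0 is an assumed name for the initial condition vector, and the step that differentiates the infinite sum term by term is taken on faith, which is exactly the kind of detail being swept under the rug.

```latex
\begin{aligned}
\frac{d}{dt}\left( e^{Mt}\, \vec{v}_0 \right)
  &= \frac{d}{dt} \sum_{n=0}^{\infty} \frac{t^n M^n}{n!}\, \vec{v}_0
   = \sum_{n=1}^{\infty} \frac{n\, t^{n-1} M^n}{n!}\, \vec{v}_0 \\
  &= \sum_{n=1}^{\infty} \frac{t^{n-1} M^n}{(n-1)!}\, \vec{v}_0
   = M \sum_{k=0}^{\infty} \frac{t^k M^k}{k!}\, \vec{v}_0
   = M \left( e^{Mt}\, \vec{v}_0 \right)
\end{aligned}
```

The n from the power rule cancels against the n! in the denominator, and relabeling k = n - 1 recovers the original series with one extra M factored out on the left.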
00:26:20.880 --> 00:26:25.040
In the next chapter, I would like to talk more
about the properties that this operation has,
00:26:25.040 --> 00:26:28.480
most notably its relationship
with eigenvectors and eigenvalues,
00:26:28.480 --> 00:26:33.200
which leads us to more concrete ways of thinking
about how you actually carry out the computation,
00:26:33.200 --> 00:26:37.680
which otherwise seems insane. Also,
time permitting, it might be fun to
00:26:37.680 --> 00:26:51.840
talk about what it means to raise e to
the power of the derivative operator.