3.5: Mathematics of Gradient Descent – Intelligence and Learning

In this video, I explain the mathematics behind Linear Regression with Gradient Descent, which was the topic of my previous machine learning video.

This video is part of session 3 of my Spring 2017 ITP “Intelligence and Learning” course.

3Blue1Brown’s Essence of Calculus:

My videos on calculus:
Power Rule:
Chain Rule:
Partial Derivative:

Links discussed in this video:
Session 3 of Intelligence and Learning:

Source code for all the video lessons:


28 responses to “3.5: Mathematics of Gradient Descent – Intelligence and Learning”

  1. Shadman Kudchikar

    Awesome, you really gave a different way to look at GD. I would love to see more ML videos by you. Awesome work, bro!

  2. Никита Богданов

    One of the best videos from you. Thanks! Could you make more videos about other algorithms and explain the math like in this video?

  3. Justin Nunez

    I have taken machine learning and I will be taking statistical machine learning. This is a really helpful video even though it isn't as advanced or fast.

    I love your enthusiasm and humor as a teacher as well. Please keep this up! It is awesome.

  4. John Walker

    Had to watch the video a few times to understand, but great job with the videos. They are all very detailed, unlike others that try to teach the same thing.

  5. satya Neelamraju

    An excellent video. The best video on the internet for the Gradient Descent algorithm. Thank you so much 🙂 Keep posting like this.

  6. Caveman Al Toraboran

    Worst explanation ever. You jump over so many steps, even from the start. E.g. you didn't even say what "guess" is. But I knew what you meant only because I know how this works. Are you teaching those who know? Or are you teaching those who don't? Poor teaching if it's the latter.

  7. m4n40

    I love your videos man, thanks for sharing!

  8. SM Raza Abidi

    Really awesome, what an elegant style of delivering the concepts, mind-boggling. I wish and dream that I could work and study under his supervision. Moreover, the gestures, tone, and humor were extra outstanding; I'm speechless. :) I must say that it is the best explanation of gradient descent I've seen so far. Thanks a lot.

  9. Wanawan

    I saw somewhere that the loss function has the sum multiplied by 1/N. Is that not important here?
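One way to look at the 1/N question is to compare a single update step with and without the averaging. The sketch below is only an illustration under stated assumptions: the data, the function names, and the learning rates are hypothetical and not taken from the video, and it uses the `guess - y` sign convention with a subtracted gradient. It suggests that dividing the gradient by N has the same effect as using a learning rate that is N times smaller, so the factor mainly changes how the learning rate is tuned.

```python
# Hypothetical data and names, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


def gradient(m, b, average):
    """Gradient of the squared-error cost with respect to (m, b).

    If average is True, the summed gradient is divided by N (the 1/N factor).
    """
    dm = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys))
    db = sum(2 * (m * x + b - y) for x, y in zip(xs, ys))
    if average:
        n = len(xs)
        dm, db = dm / n, db / n
    return dm, db


def step(m, b, learning_rate, average):
    """One gradient descent update starting from (m, b)."""
    dm, db = gradient(m, b, average)
    return m - learning_rate * dm, b - learning_rate * db


n = len(xs)
# Summed cost with learning rate 0.01 ...
summed = step(0.0, 0.0, 0.01, average=False)
# ... matches the averaged cost with a learning rate N times larger.
averaged = step(0.0, 0.0, 0.01 * n, average=True)
print(summed, averaged)
```

In practice the averaged form is often preferred because a given learning rate then behaves similarly whether the data set has ten points or ten thousand.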

  10. Peter Li

    At 12:01, the summation sign on the cost function (sum of error squared) just disappears. What happened?

  11. Dr. K

    Thank you, great introduction. My question: at this point do you have an advantage over OLS, or does gradient descent shine at higher dimensions? We are really just trying to find the equation of the best-fit prediction line, and the result should be the same for this simple example, right? I am looking forward to the advanced videos.

  12. soumya sarkar

    Very energetic presentation… loved it

  13. souslicer

    I don't get it, you're updating m and b in a loop, per data point. What if you wanted to optimize over multiple data points at once?
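Optimizing over all points at once is usually called batch gradient descent: the error terms are accumulated over the whole data set and m and b are updated once per pass. The following is a minimal sketch of that idea, not the code from the video. The data, the function name `batch_step`, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
# Hypothetical data: points that lie exactly on the line y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def batch_step(m, b, points, learning_rate):
    """One update of m and b using the error accumulated over every point."""
    dm = 0.0
    db = 0.0
    for x, y in points:
        error = y - (m * x + b)  # "error = y - guess" convention
        dm += error * x          # contribution to the slope update
        db += error              # contribution to the intercept update
    # Apply a single update after looking at all points.
    return m + learning_rate * dm, b + learning_rate * db


m, b = 0.0, 0.0
for _ in range(5000):
    m, b = batch_step(m, b, data, 0.05)

print(m, b)  # should approach m = 2, b = 1 for this data
```

Updating once per point (stochastic) and once per pass (batch) are both common; a middle ground is to accumulate over small groups of points (mini-batches).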

  14. NateZimer

    1. Start with a model: y = m * x + b.
    2. Find the Jacobian with respect to your parameters (m, b). The Jacobian is simply J = [x, 1], i.e. one row [x_i, 1] per data point.
    3. Plug the Jacobian and your outputs into the normal equation [m; b] = (J^T * J)^-1 * J^T * y.
    B00m, you have the optimal solution in 1 step. No 1+ hours of videos required. No magical factors going missing. You should start out with the Jacobian approach, since it is obvious how to apply this approach to any linear model. Furthermore, you can show how to formulate the Jacobian through finite differences. This allows one to optimize against a highly nonlinear function, like a neural network, easily, without having to take an analytical derivative.
    Seeing as you say the mathematics video is optional, it appears you are trying to create an army of ignorant numerical optimization programmers. This just creates a giant headache for the rest of us who have to deal with such individuals, because such people have no capacity to expand on their application/knowledge since they don't really understand what they are doing to begin with.
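The three steps in this comment can be sketched for the two-parameter line as follows. This is a hedged illustration, not code from the video or the commenter: the data and the function name `fit_line_normal_equation` are hypothetical, and the 2x2 system is solved by hand rather than with a linear algebra library so that the example stays self-contained.

```python
# Hypothetical data: four points that lie exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def fit_line_normal_equation(points):
    """Solve (J^T J) [m, b]^T = J^T y for the line y = m*x + b.

    Each row of J is [x_i, 1], so J^T J is the 2x2 matrix
    [[sum(x*x), sum(x)], [sum(x), N]] and J^T y is [sum(x*y), sum(y)].
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = sxx * n - sx * sx  # determinant of J^T J
    if det == 0:
        raise ValueError("all x values are identical; the line is not unique")
    # Apply the explicit inverse of the 2x2 matrix to J^T y.
    m = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return m, b


m, b = fit_line_normal_equation(points)
print(m, b)
```

For a model that is linear in its parameters, this closed-form solution and gradient descent on the same squared-error cost aim at the same minimum; gradient descent becomes the more practical choice when no closed form exists or the system is too large to solve directly.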

  15. ProGamerHD

    How do you detect a color in an if statement? I am trying to have it so if ellipse/x is greater than (color) the ball loops back to 30

  16. Marius Myburg

    Great video, thanks Daniel.

  17. Cre Henge

    It's interesting how different YouTube channels become different classes ^^ Khan gives you all the calculus you could ever want if you're a beginner.

  18. Danilo Marques

    Geez, you're very very smart.

  19. returnexitsuccess

    At 18:00, I think you don't really explain well why you're allowed to discount the 2. It is really only because the 2 appears in the m and the b derivative that you are able to cancel it, because then the direction of the gradient remains the same. Maybe this is what you were trying to get across but it sounded like your reasoning was just that since you're multiplying it by any small number that the 2 didn't matter.

    Thanks for these videos, they are pretty helpful!
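The point about the factor of 2 can be written out as a short derivation. The notation below is an assumption made for illustration (cost $J$, learning rate $\eta$, data points $(x_i, y_i)$) and may not match the symbols used on the whiteboard in the video.

```latex
% Squared-error cost for the line y = m x + b
J(m, b) = \sum_{i} \bigl( y_i - (m x_i + b) \bigr)^2

% Both partial derivatives carry the same factor of 2
\frac{\partial J}{\partial m} = -2 \sum_{i} \bigl( y_i - (m x_i + b) \bigr)\, x_i
\qquad
\frac{\partial J}{\partial b} = -2 \sum_{i} \bigl( y_i - (m x_i + b) \bigr)

% Gradient descent update with learning rate \eta
m \leftarrow m - \eta \frac{\partial J}{\partial m}
  = m + 2\eta \sum_{i} \bigl( y_i - (m x_i + b) \bigr)\, x_i
\qquad
b \leftarrow b - \eta \frac{\partial J}{\partial b}
  = b + 2\eta \sum_{i} \bigl( y_i - (m x_i + b) \bigr)

% Defining \eta' = 2\eta absorbs the constant in both updates at once
m \leftarrow m + \eta' \sum_{i} \bigl( y_i - (m x_i + b) \bigr)\, x_i
\qquad
b \leftarrow b + \eta' \sum_{i} \bigl( y_i - (m x_i + b) \bigr)
```

Because the same constant multiplies both components, the direction of the update is unchanged and only the step length is rescaled, which is why the 2 can be folded into the learning rate.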

  20. haha

    I just watched some kids building a neural network example using your tutorial (Nature of Code). Check it out: https://www.youtube.com/watch?v=Xfoa4EdbLcc This is amazing. I didn't know that your tutorials help kids tackle learning programming too, especially in learning these tough topics. You have done an excellent job, Dan.

  21. Eric Diaz

    Pure awesomeness!

  22. guestservice

    Your explanation is so good and easy to understand

  23. TheWeepingCorpse

    Thank you for everything you do. I'm a C++ guy, but your videos are very interesting.

  24. Oliva Midnight

    Hi! I have one question.
    I like this video: "Tutorial: How to render Processing sketch as a movie", but how can I do it with p5? Yesterday I started with Processing, today with p5, and I don't know how to save an animation as .mov.

    Any suggestions?

    By the way, this is what I made 🙂 (so happy with the result) https://www.youtube.com/watch?v=jU2R57J5VFc&t=2s


  25. Ilay köksal

    I cannot believe this video aired just when I needed it, thank you so much!

  26. BlazertronGames

    Did you have a different username around a year ago?
