master

分支 (5)

管理

管理

master

vdp_animation

lqr_lmi

playbook

lpdp

underactuated-2020
/
stochastic.html

<!DOCTYPE html>

<html>

  <head>
    <title>Underactuated Robotics: Model Systems
with Stochasticity</title>
    <meta name="Underactuated Robotics: Model Systems
with Stochasticity" content="text/html; charset=utf-8;" />
    <link rel="canonical" href="http://underactuated.mit.edu/stochastic.html" />

    <script src="https://hypothes.is/embed.js" async></script>
    <script type="text/javascript" src="htmlbook/book.js"></script>

    <script src="htmlbook/mathjax-config.js" defer></script>
    <script type="text/javascript" id="MathJax-script" defer
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
    </script>
    <script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>

    <link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
    <script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
    <script>hljs.initHighlightingOnLoad();</script>

    <link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
  </head>

<body onload="loadChapter('underactuated');">

<div data-type="titlepage">
  <header>
    <h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
    <p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
    <p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
    <p style="font-size: 14px; text-align: right;">
      &copy; Russ Tedrake, 2020<br/>
      <a href="tocite.html">How to cite these notes</a> &nbsp; | &nbsp;
      <a target="_blank" href="https://docs.google.com/forms/d/e/1FAIpQLSesAhROfLRfexrRFebHWLtRpjhqtb8k_iEagWMkvc7xau08iQ/viewform?usp=sf_link">Send me your feedback</a><br/>
    </p>
  </header>
</div>

<p><b>Note:</b> These are working notes used for <a
href="http://underactuated.csail.mit.edu/Spring2020/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2020 semester.  <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture  videos are available on YouTube</a>.</p>

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=humanoids.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=dp.html>Next Chapter</a></td>
</tr></table>


<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 5"><h1>Model Systems
with Stochasticity</h1>

  <!-- TODO: Adopt more of the 6.437 notation; it's excellent and compatible:
  https://stellar.mit.edu/S/course/6/sp19/6.437/courseMaterial/topics/topic5/readings/Preliminaries/Preliminaries.pdf
   -->

  <p>My goals for this chapter are to build intuition for the beautiful and rich
  behavior of nonlinear dynamical system that are subjected to random
  (noise/disturbance) inputs.  So far we have focused primarily on systems
  described by \[ \dot{\bx}(t) = f(\bx(t),\bu(t)) \quad \text{or} \quad \bx[n+1]
  = f(\bx[n],\bu[n]). \] In this chapter, I would like to broaden the scope to
  think about \[ \dot{\bx}(t) = f(\bx(t),\bu(t),\bw(t)) \quad \text{or} \quad
  \bx[n+1] = f(\bx[n],\bu[n],\bw[n]), \] where this additional input $\bw$ is
  the (vector) output of some random process.  In other words, we can begin
  thinking about stochastic systems by simply understanding the dynamics of our
  existing ODEs subjected to an additional random input. </p>

  <p> This form is extremely general as written.  $\bw(t)$ can represent
  time-varying random disturbances (e.g. gusts of wind), or even constant model
  errors/uncertainty.  One thing that we are not adding, yet, is measurement
  uncertainty.  There is a great deal of work on observability and state
  estimation that study the question of how you can infer the true state of the
  system given noise sensor readings.  For this chapter we are assuming perfect
  measurements of the full state, and are focused instead on the way that
  "process noise" shapes the long-term dynamics of the system. </p>

  <p> I will also stick primarily to discrete time dynamics for this chapter,
  simply because it is easier to think about the output of a discrete-time
  random process, $\bw[n]$, than a $\bw(t)$. But you should know that all of the
  ideas work in continuous time, too.  Also, most of our examples will take the
  form of <i>additive noise</i>: \[ \bx[n+1] = f(\bx[n],\bu[n]) + \bw[n], \]
  which is a particular useful and common specialization of our general form.
  And this form doesn't give up much -- the disturbance on step $k$ can pass
  through the nonlinear function $f$ on step $k+1$ giving rich results -- but is
  often much easier to work with. </p>

  <section><h1>The Master Equation</h1>

    <p>Let's start by looking at some simple examples.</p>

    <example><h1>A Bistable System + Noise</h1>

      <p>Let's consider (a time-reversed version of) one of my favorite
      one-dimensional systems: \[ \dot{x} = x - x^3. \] <todo>add plot</todo>
      This deterministic system has stable fixed points at $x^* = \{-1,1\}$ and
      an unstable fixed point at $x^* = 0$. </p> <p> A reasonable discrete-time
      approximation of these dynamics, now with additive noise, is \[ x[n+1] =
      x[n] + h (x[n] - x[n]^3 + w[n]). \] When $w[n]=0$, this system has the
      same fixed points and stability properties as the continuous time system.
      But let's examine the system when $w[n]$ is instead the result of a
      <i>zero-mean Gaussian white noise process</i>, defined by: \begin{gather*}
      \forall n, E\left[ w[n] \right] = 0,\\ E\left[ w[i]w[j] \right] =
      \begin{cases} \sigma^2, & \text{ if } i=j,\\ 0, & \text{ otherwise.}
      \end{cases} \end{gather*} Here $\sigma$ is the standard deviation of the
      Gaussian. </p> <p>When you simulate this system for small values of
      $\sigma$, you will see trajectories move roughly towards one of the two
      fixed points (for the deterministic system), but every step is modified by
      the noise.  In fact, even if the trajectory were to arrive exactly at what
      was once a fixed point, it is almost surely going to move again on the
      very next step.  In fact, if we plot many runs of the simulation from
      different initial conditions all on the same plot, we will see something
      like the figure below.</p> <figure> <img width="80%"
      src="figures/cubic_bistable_particles_time.svg"/>
      <figcaption>Simulation of the bistable system with noise ($\sigma = 0.01$)
      from many initial conditions.</figcaption> </figure>

      <p>During any individual simulation, the state jumps around randomly for
      all time, and could even transition from the vicinity of one fixed point
      to the other fixed point.  Another way to visualize this output is to
      animate a histogram of the particles over time.</p>

      <figure> <img width="80%"
      src="figures/cubic_bistable_histogram.svg"/> <figcaption>Histogram of
      the states of the bistable system with noise ($\sigma = 0.01$) after
      simulating from random initial conditions until $t=30$. <br/><a
      href="figures/cubic_bistable_histogram.swf">Click here to see the
      animation</a></figcaption> </figure>

      <p>You can run this demo for yourself:</p>
      <jupyter>examples/stochastic.ipynb</jupyter>

    </example>

    <p>Let's take a moment to appreciate the implications of what we've just
    observed. Every time that we've analyzed a system to date, we've asked
    questions like "given x[0], what is the long-term behavior of the system,
    $\lim_{n\rightarrow\infty} x[n]$?", but now $x[n]$ is a <i>random
    variable</i>. The trajectories of this system do not converge, and the
    system does not exhibit any form of stability that we've introduced so far.
    </p>

    <p>All is not lost.  If you watch the animation closely, you might notice
    the <i>distribution</i> of this random variable is actually very well
    behaved.  This is the key idea for this chapter.</p>

    <p>Let us use $p_n(x)$ to denote the <a
    href="https://en.wikipedia.org/wiki/Probability_density_function">
    probability density function</a> over the random variable $x$ at time $n$.
    Note that this density function must satisfy \[ \int_{-\infty}^\infty p_n(x)
    dx = 1.\] It is actually possible to write the <i>dynamics of the
    probability density</i> with the simple relation \[ p_{n+1}(x) =
    \int_{-\infty}^\infty p(x|x') p_n(x') dx', \] where $p(x|x')$ encodes the
    stochastic dynamics as a conditional distribution of the next state (here
    $x$) as a function of the current state (here $x'$).  Dynamical systems that
    can be encoded in this way are known as <i>continuous-state Markov
    Processes</i>, and the governing equation above is often referred to as the
    "<a href="https://en.wikipedia.org/wiki/Master_equation">master
    equation</a>" for the stochastic process.  In fact this update is even
    linear(!) ; a fact that can enable closed-form solutions to some impressive
    long-term statistics, like mean time to failure or first passage
    times<elib>Byl08a</elib>. Unfortunately, it is often difficult to perform
    the integral in closed form, so we must often resort to discretization or
    numerical analysis. </p>

    <example id="logistic"><h1>The Logistic Map</h1>

      <p>In fact, one does not actually need stochastic dynamics in order for
      the dynamics of the distribution to be the meaningful object of study;
      random initial conditions can be enough.  One of the best examples comes
      from perhaps the simplest and most famous example of a chaotic system: the
      logistic map.  This example is described beautifully in
      <elib>Lasota13</elib>.</p>

      <p>Consider the following difference equation: $$x[n+1] = 4
      x[n](1-x[n]),$$ which we will study over the (invariant) interval $x \in
      [0, 1]$. </p>

      <figure> <img width="80%" src="figures/logistic_map.svg"/>
      <figcaption>The discrete-time dynamics of the "logistic map" (plotted
      alongside the line $x[n+1] = x[n]$).</figcaption>
      </figure>

      <p>It takes only a moment of tracing your finger along the plot using the
      "<a href="simple_legs.html#staircase">staircase method</a>" to see what
      makes this system so interesting -- rollouts from a single initial
      condition end up bouncing all over the interval $(0,1)$, and neighboring
      initial conditions will end up taking arbitrarily different trajectories
      (this is the hallmark of a "chaotic" system).</p>

      <figure> <img width="80%" src="figures/logistic_map_rollouts.svg"/>
        <figcaption>Two simulations of the logistic map from different initial
        conditions.  Remember, there is no stochasticity here -- the dynamics
        are entirely deterministic!</figcaption>
        </figure>

      <p>Here's what's completely fascinating -- even though the dynamics of any
      one initial condition for this system are extremely complex, if we study
      the dynamics of a distribution of states through the system, they are
      surprisingly simple and well-behaved.  This system is one of the rare
      cases when we can write the master equation in closed
      form<elib>Lasota13</elib>: $$p_{n+1}(x) = \frac{1}{4\sqrt{1-x}} \left[
      p_n\left(\frac{1}{2}-\frac{1}{2}\sqrt{1-x}\right) + p_n\left(\frac{1}{2} +
      \frac{1}{2}\sqrt{1-x}\right) \right].$$  Moreover, this master equation
      has a steady-state solution: $$p_*(x) = \frac{1}{\pi\sqrt{x(1-x)}}.$$ </p>

      <figure> <img width="116%" style="margin-left:-8%" src="figures/logistic_map_master.svg"/>
        <figcaption>Plotting the (closed-form) evolution of the master equation
        for the logistic map, initialized with $p_0(x) = 1$, reveals surprising
        simplicity, and rapid convergence to a stationary
        distribution (dashed orange line).</figcaption>
        </figure>

      <p>For this system (and many chaotic systems), the dynamics of a single initial condition are complicated, but the dynamics a <i>distribution</i> of initial conditions are beautiful and simple!</p>

      <p>Note: when the dynamical system under study has deterministic dynamics
      (but a distribution of initial conditions), the linear map given by the
      master equation is known as the <i>Perron-Frobenius operator</i>, and it
      gives rise to the Liouville equation that we will study later in the
      chapter.</p>

    </example>

    <p>The slightly more general form of the master equation, which works for
    multivariate distributions with state-domain ${\cal X}$, and systems with
    control inputs $\bu$, is \[ p_{n+1}(\bx) = \int_{\cal X} p(\bx|\bx',\bu)
    p_n(\bx') d\bx'. \] This is a (continuous-state) Markov Decision
    Process.</p>

    <p>Continuous-time formulations are also possible -- these lead to the
    so-called <a
    href="https://en.wikipedia.org/wiki/Fokker%E2%80%93Planck_equation">
    Fokker-Planck equation</a>.</p>

    <todo>mention Chapman-Kolmogorov?</todo>
    <todo>mention Ito Calculus?</todo>

  </section>

  <section><h1>Stationary Distributions</h1>

    <p>In the example above, the histogram is our numerical approximation of the
    probability density. If you simulate it a few times, you will probably
    believe that, although the individual trajectories of the system <i>do
    not</i> converge, the probability distribution actually <i>does</i> converge
    to what's known as a <i>stationary distribution</i> -- a fixed point of the
    master equation.  Instead of thinking about the dynamics of the
    trajectories, we need to start thinking about the dynamics of the
    distribution.</p>

    <example><h1>The Linear Gaussian System</h1>

      <p>Let's consider the one-dimensional linear system with additive noise:
      \[ x[n+1] = a x[n] + w[n]. \] When $w[n]=0$, the system is stable (to the
      origin) for $-1 < a < 1$.  Let's make sure that we can understand the
      dynamics of the master equation for the case when $w[n]$ is Gaussian white
      noise with standard deviation $\sigma$. </p>

      <p>First, recall that the probability density function of a Gaussian with
      mean $\mu$ is given by \[ p(w) = \frac{1}{\sqrt{2 \pi \sigma^2}}
      e^{-\frac{(w-\mu)^2}{2\sigma^2}}. \] When conditioned on $x[n]$, the
      distribution given by the dynamics subjected to mean-zero Gaussian white
      noise is simply another Gaussian, with the mean given by $ax[n]$: \[
      p(x[n+1]|x[n]) = \frac{1}{\sqrt{2 \pi \sigma^2}}
      e^{-\frac{(x[n+1]-ax[n])^2}{2\sigma^2}}, \] yielding the master equation
      \[ p_{n+1}(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \int_{-\infty}^\infty
      e^{-\frac{(x-ax')^2}{2\sigma^2}} p_n(x') dx'. \] </p>

      <p>Now here's the magic.  Let's push a distribution, $p_n(x)$, which is
      zero-mean, with standard deviation $\sigma_n$, through the master
      equation: \begin{align*} p_{n+1}(x) =& \frac{1}{\sqrt{2 \pi
      \sigma^2}}\frac{1}{\sqrt{2 \pi \sigma_n^2}} \int_{-\infty}^\infty
      e^{-\frac{(x-ax')^2}{2\sigma^2}} e^{-\frac{(x')^2}{2\sigma_n^2}} dx',\\ =&
      \frac{1}{\sqrt{2 \pi (\sigma^2+a^2 \sigma_n^2)}} e^{-\frac{x^2}{2(\sigma^2
      + a^2 \sigma_n^2)}}. \end{align*} The result is another mean-zero Gaussian
      with $\sigma_{n+1}^2 = \sigma^2 + a^2 \sigma_n^2$.  This result
      generalizes to the multi-variate case, too, and might be familiar to you
      e.g. from the process update of the <a
      href="https://en.wikipedia.org/wiki/Kalman_filter">Kalman filter</a>. </p>

      <p>Taking it a step further, we can see that a stationary distribution for
      this system is given by a mean-zero Gaussian with \[ \sigma_*^2 =
      \frac{\sigma^2}{1-a^2}. \] Note that this distribution is well defined
      when $-1 < a < 1$ (only when the system is stable).</p>

    </example>

    <p>The stationary distribution of the linear Gaussian system reveals the
    fundamental and general balance between two terms in the governing equations
    of any stochastic dynamical system: the stability of the deterministic
    system is bringing trajectories together (smaller $a$ means faster
    convergence of the deterministic system and results in a more narrow
    distribution), but the noise in the system is forcing trajectories apart
    (larger $\sigma$ means larger noise and results in a wider distribution).
    </p>

    <p>Given how rich the dynamics can be for deterministic nonlinear systems,
    you can probably imagine that the possible long-term dynamics of the
    probability are also extremely rich.  If we simply flip the signs in the
    dynamics we examined above, we'll get our next example:</p>

    <example><h1>The Cubic Example + Noise</h1>

      <p>Now let's consider the discrete-time approximation of \[ \dot{x} = -x +
      x^3, \] again with additive noise: \[ x[n+1] = x[n] + h (-x[n] + x[n]^3 +
      w[n]). \] With $w[n]=0$, the system has only a single stable fixed point
      (at the origin), whose deterministic region of attraction is given by $x
      \in (-1,1)$.  If we again simulate the system from a set of random initial
      conditions and plot the histogram, we will see something like the figure
      below. </p>

      <figure> <img width="80%" src="figures/cubic_histogram.svg"/>
      <figcaption>Histogram of the states of the bistable system with noise
      ($\sigma = 0.01$) after simulating from random initial conditions until
      $t=30$. <br/><a href="figures/cubic_histogram.swf">Click here to see the
      animation</a></figcaption> </figure>

      <p>Be sure to watch the animation.  Or better yet, run the simulation for
      yourself by changing the sign of the derivative in the bistability example
      and re-running:</p>

      <p>You can run this demo for yourself:</p>
      <jupyter>examples/stochastic.ipynb</jupyter>

      <p>Now try increasing the noise (e.g. pre-multiply the noise input,
        $w$, in the dynamics by a scalar like 2).</p>

      <p style="text-align:center"><a
      href="figures/cubic_histogram_more_noise.swf">Click here to see the
      animation</a></p>

      <p>What is the stationary distribution for this system?  In fact, there
      isn't one.  Although we initially collect probability density around the
      stable fixed point, you should notice a slow leak -- on every step there
      is some probability of transitioning past unstable fixed points and
      getting driven by the unstable dynamics off towards infinity.  If we run
      the simulation long enough, there won't be any probability density left at
      $x=0$.</p>
    </example>

    <example><h1>The Stochastic Van der Pol Oscillator</h1>

      <p>One more example; this is a fun one.  Let's think about one of the
      simplest examples that we had for a system that demonstrates limit cycle
      stability -- the Van der Pol oscillator -- but now we'll add Gaussian
      white noise, $$\ddot{q} + (q^2 - 1)\dot{q} + q = w(t),$$ Here's the
      question: if we start with a small set of initial conditions centered
      around one point on the limit cycle, then what is the long-term behavior
      of this distribution? </p>

      <figure> <img width="80%" src="figures/vanderpol_particles.svg"/>
      <figcaption>Randomly sampled initial conditions pictured with the stable
      limit cycle of the Van der Pol oscillator.</figcaption> </figure>

      <p>Since the long-term behavior of the deterministic system is periodic,
      it would be very logical to think that the state distribution for this
      stochastic system would fall into a stable periodic solution, too.  But
      think about it a little more, and then watch the animation (or run the
      simulation yourself).</p>

      <jupyter>examples/stochastic.ipynb</jupyter>

      <p style="text-align:center"><a
      href="figures/vanderpol_particles.swf">Click here to see the animation
      (first 20 seconds)</a></p>

      <p style="text-align:center"><a
      href="figures/vanderpol_particles_mixed.svg">Click here to see the
      particles after 2000 seconds of simulation</a></p>

       <p>The explanation is simple: the periodic solution of the system is only
       <i>orbitally stable</i>; there is no stability along the limit cycle.  So
       any disturbances that push the particles along the limit cycle will go
       unchecked.  Eventually, the distribution will "mix" along the entire
       cycle. Perhaps surprisingly, this system that has a limit cycle stability
       when $w=0$ eventually reaches a stationary distribution (fixed point) in
       the master equation.</p>

    </example>

  </section>


  <section><h1>Extended Example: The Rimless Wheel on Rough
  Terrain</h1>

    <p>My favorite example of a meaningful source of randomness on a model
    underactuated system is the <a href="simple_legs.html#rimless_wheel">rimless
    wheel</a> rolling down stochastically "rough" terrain<elib>Byl08f</elib>.
    Generating interesting/relevant probabilistic models of terrain in general
    can be quite complex, but the rimless wheel makes it easy -- since the robot
    only contacts that ground at the point foot, we can model almost arbitrary
    rough terrain by simply taking the ramp angle, $\gamma$, to be a random
    variable. If we restrict our analysis to rolling only in one direction (e.g.
    only downhill), then we can even consider this ramp angle to be i.i.d.;
    after each footstep we will independently draw a new ramp angle $\gamma[n]$
    for the next step.</p>

    <figure>
      <img width="40%" src="figures/rimlessWheel.svg"/>
      <figcaption>The rimless wheel. The orientation of the stance leg,
      $\theta$, is measured clockwise from the vertical axis. </figcaption>
      <todo>Update this to have rough terrain.</todo>
    </figure>

    <p>In our original analysis of the rimless wheel, we derived the "post-collision" return map -- mapping the angular velocity from the beginning of the stance phase to the (post-collision) angular velocity at the next stance phase.  But now that the relative location of the ground is changing on every step, we instead write the "apex-to-apex" return map, which maps the angular velocity from one apex (the moment that the leg is vertical) to the next, which is given by: $$\dot\theta[n+1] = \sqrt{\cos^2\alpha \left( \dot\theta^2[n] + \frac{2g}{l}\left(1-\cos(\alpha + \gamma[n])\right)\right) - \frac{2g}{l}(1 - \cos(\alpha - \gamma[n]))}.$$ </p>

    <p>More coming soon.  Read the paper <elib>Byl08f</elib> and/or watch the <a
        href="https://youtu.be/TaaJgvjYBQc">video</a>.</p>

  </section>

  <section><h1>Noise models for real robots/systems.</h1>

    <p>Sensor models.  Beam model from probabilistic robotics.  RGB-D
    dropouts.</p>

    <p>Perception subsystem.  Output of a
    perception system is not Gaussian noise, it's missed
    detections/drop-outs...</p>

    <p>Distributions over tasks/environments.</p>

  </section>

  <section><h1>Analysis (a preview)</h1>

    <p>Coming soon.  Uncertainty quantification (e.g., linear guassian
    approximation; discretize then closed form solutions using the transition
    matrix....).  Monte-Carlo. Particle sim/filter. Rare event simulation </p>

    <p>L2-gain with dissipation inequalities.  Finite-time verification with
    sums of squares.</p>

    <p>Occupation Measures</p>

    <todo>Section on the liouville equation (ode describring the Perron-Frobenius along a path.</todo>

  </section>


  <section><h1>Control Design (a preview)</h1>


  </section>

</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->

<table style="width:100%;"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=humanoids.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=dp.html>Next Chapter</a></td>
</tr></table>

<div id="footer">
  <hr>
  <table style="width:100%;">
    <tr><td><em>Underactuated Robotics</em></td><td align="right">&copy; Russ
      Tedrake, 2020</td></tr>
  </table>
</div>


</body>
</html>