Taking a break from other things to pick up a thread on floating point and teaching with @five9a2, @ShriramKMurthi, and @khinsen from earlier this week (I got a bit distracted mid-discussion, but it was an interesting conversation).
When I talk about FP in my NA classes, I usually organize it around representations, operations, and exceptions, then talk about the common model used for error analysis and the benefits and shortcomings of that model.
There's a tendency to think of floating point numbers as fuzzy things, and that always rubs me the wrong way. Part of the reason it rubs me the wrong way is that the cases where FP operations are *exact* are really important.
I prefer to talk about floating point as exact arithmetic, correctly rounded, and then say there are approximate models that are useful for reasoning about special cases.
The "exact arithmetic, correctly rounded" is important if you want to explain how you can use standard floating point to emulate extra precision, or how you can get consistent views of geometric predicates using just floating point.
My preferred example for class (because I can do it in five minutes on the board) is the use of mixed precision to compute the sign of a 2-by-2 determinant that tells you whether two line segments cross or not.
If your inputs are in single precision (and limited in range) and intermediates are in double precision, you can compute the sign of the determinant exactly. If you use the same precision for inputs and intermediates, things can go wrong.
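A concrete instance, sketched with numpy types (the inputs are contrived so the failure is easy to see; this isn't the version I draw on the board):

```python
import numpy as np

# Sign of det([[a, b], [c, d]]) = a*d - b*c, as in a segment-orientation test.
a = np.float32(1 + 2**-23)   # exactly representable in float32
d = np.float32(1 - 2**-23)
b = np.float32(1.0)
c = np.float32(1.0)

# All-float32: a*d rounds to 1.0f, so the determinant comes out as 0 (sign lost).
det32 = a * d - b * c

# Mixed precision: float32 inputs, float64 intermediates. Each product of
# 24-bit significands fits exactly in a 53-bit significand, and correctly
# rounding the final subtraction cannot flip the sign.
det64 = np.float64(a) * np.float64(d) - np.float64(b) * np.float64(c)

print(det32)   # 0.0
print(det64)   # -2**-46, the correct (negative) sign
```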
I have no idea how to even start discussing why that works without first getting rid of the "floating point numbers are fuzzy numbers" mental model.
Once that is out of the way, I usually introduce the "1+delta" *model* of floating point arithmetic: for an elementary operation that yields a normalized floating point number zhat, we have zhat = z*(1+delta), where z is the exact result and delta is bounded in magnitude by machine epsilon.
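You can check the model for any single operation with exact rational arithmetic; a quick Python sketch (Fraction converts a double with no error):

```python
from fractions import Fraction
import sys

x, y = 0.1, 0.3                    # both already exact double-precision inputs
zhat = x + y                       # computed sum: the exact sum, correctly rounded
z = Fraction(x) + Fraction(y)      # the exact sum as a rational number
delta = (Fraction(zhat) - z) / z   # so zhat = z * (1 + delta)
print(float(delta), sys.float_info.epsilon)   # |delta| <= machine epsilon
```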
I point out that the 1+delta model overestimates error, and sometimes it's useful to explicitly remember that some ops (scaling by 2, differences of numbers within a factor of 2) are exact. But it's good for a lot of analyses.
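Both of those exact cases are easy to demo the same way; a sketch of scaling by two and of a Sterbenz-style subtraction (assuming no overflow or underflow):

```python
from fractions import Fraction

x, y = 0.7, 0.9          # within a factor of two of each other
d = y - x                # Sterbenz lemma: this subtraction incurs no rounding at all
assert Fraction(d) == Fraction(y) - Fraction(x)

s = 2.0 * x              # scaling by a power of two just shifts the exponent
assert Fraction(s) == 2 * Fraction(x)
```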
Then I usually point out that the different floating point exceptions correspond to events that might violate your expectations: inexact for rounding, underflow for potential loss of precision, div-by-zero for going into the extended reals, etc.
The 1+delta model is good if you're working with inexact normalized numbers, but can fail if you fall outside those representations; fortunately, exceptions tell you when that has happened (assuming you can get to the exception flags, but that's a different discussion).
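From Python, numpy's error-state machinery gives a rough view of those flags (in C you'd query fenv.h directly); a sketch, with the caveat that this traps per operation rather than reading the accrued flags:

```python
import numpy as np

with np.errstate(divide='raise', under='raise', invalid='raise'):
    try:
        np.array([1.0]) / np.array([0.0])         # div-by-zero: off into the extended reals
    except FloatingPointError as e:
        print("caught:", e)
    try:
        np.array([1e-300]) * np.array([1e-300])   # underflow: potential loss of precision
    except FloatingPointError as e:
        print("caught:", e)
```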
Nonetheless, the 1+delta model, together with the usual linearized approximation (1+d1)(1+d2) ≈ 1+d1+d2, is great for lots of error analyses, particularly combined with the idea of backward error analysis.
Backward error analysis is the idea that you express rounding errors as perturbations to your inputs. So instead of trying to say that you've computed slightly the wrong answer to your problem, you say that you've computed the correct answer to slightly the wrong problem.
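For a single addition the backward-error reading is almost a one-liner: zhat = (x+y)(1+delta) = x(1+delta) + y(1+delta), so the computed sum is the exact sum of slightly perturbed inputs. Reusing the exact-rational trick from above:

```python
from fractions import Fraction

x, y = 0.1, 0.3
zhat = x + y                         # computed sum
z = Fraction(x) + Fraction(y)        # exact sum
delta = (Fraction(zhat) - z) / z     # zhat = z * (1 + delta)

# Backward error view: the computed sum is the *exact* sum of perturbed inputs.
assert Fraction(x) * (1 + delta) + Fraction(y) * (1 + delta) == Fraction(zhat)
```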
A nice thing about backward error analysis is that one can often separate the issue of rounding error from the issue of sensitivity of the problem to small perturbations -- which is useful when reasoning about other types of errors, too.
Usually at this point in the class, I've talked about the concept of problem sensitivity and conditioning. I often have also talked about different sources of error other than rounding (modeling error, measurement error, Monte Carlo noise, termination of iterations, etc).
So we start with "floating point numbers aren't fuzzy," but then move on to "we can model rounding errors via bounded perturbations" (as close to the "fuzzy" mental model as I get), and then explain how backward error lets us treat roundoff in a similar way to other error terms.
Do I expect students to have all this dumped on them the first time they see a float? No! But I dislike the "fuzzy numbers" intro because there are things you can do with FP that would seem to be ruled out by the "fuzzy number" model.
Better to say "exact results, correctly rounded" on a first go, and then explain the 1+delta model and co later as students grow in sophistication (e.g. in an NA class).
The fact that a heck of a lot of folks use floating point with reckless abandon, having never learned any of this, bothers me, of course (and a lot of ML users fall into this category). But maybe that's a rant for another day. /fin