## Thursday, May 23, 2013

### The Data

or How Fast Does Google Think We Drive?

I picked this question up on Twitter a while back and really liked it. So, after we worked on making a plan, I thought it would be good to look at data.  Lots of data. And then come up with ways to make sense of it.

Process

So, we took to Google.  Students went to Maps, entered a starting location and a destination (I didn't realize that I'd need to explain that you can't drive from California to China, but whatever), and then entered their data into a Google Form.  We did this over five different classes and got a lot of "stuff" to sift through.

We dumped the data into GeoGebra and then took a look at a few different perspectives.  It's interesting to see how the data changes as we look at trips of different distances.

The applet below will really give you a good picture.

Takeaways

• Unit rates are valuable.
• When points don't line up perfectly, sometimes we can use a line to help us answer questions.
• As soon as we have a line we like, the actual data points can kind of get in the way.
• You can't drive to China.

These are 7th graders and they have some experience with linear relationships.  However, that experience has been limited to "the number in front of the x is the slope and the other number is the y-intercept" kind of stuff.  It really threw some kids that their line of "best fit" may not have been the same as everyone else's.  We are doing this very informally at this time.

Questions

I know that the formal process for determining a linear regression is pretty involved, but does it have to be for a proportional relationship?  That is, if we know a relationship (like distance : time)  is a proportion but the data doesn't line up exactly, is it appropriate to simply average the distance:time ratios to determine a "rate of best fit?"
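To make the "average the ratios" idea concrete, here is a minimal sketch. The trips are made-up numbers for illustration only (not the class data), and the variable names are my own:

```python
from statistics import mean

# Made-up (distance in miles, time in minutes) pairs, for illustration only.
trips = [(10, 15), (60, 65), (120, 115), (300, 270)]

# Unit rate for each trip, converted to miles per hour.
rates_mph = [miles / (minutes / 60) for miles, minutes in trips]

# The proposed "rate of best fit": the plain average of the unit rates.
average_rate_mph = mean(rates_mph)

print([round(r, 1) for r in rates_mph])
print(round(average_rate_mph, 1))
```

Every trip counts equally in this average, no matter how short or long it is.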

When informally drawing a line of best fit for a proportional relationship, should (0,0) always be the starting point?

Marshall Hampton said...

I think if you are modelling a proportional relationship, it is appropriate to use (0,0) as a starting point.

Averaging slopes will not give you the same answer as regression. Surprisingly, in many cases it is substantially worse at determining the slope than a linear regression that allows for a nonzero y-intercept, even if the data is truly proportional plus some noise. But I think it's statistically consistent and won't be too far off.

Marshall Hampton said...

If you have a set of points (x_i, y_i), and you do linear regression with a y-intercept of 0, then the least-square error slope is

sum(x_i * y_i) / sum(x_i^2)

Compared to averaging the slopes, you can see that points with large x and y will dominate the calculation.
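One way to see why the large points dominate: the least-squares slope through the origin is the same as an average of the individual slopes y_i / x_i in which each point is weighted by x_i^2. A small sketch, using made-up points and hypothetical function names:

```python
def slope_through_origin(xs, ys):
    """Least-squares slope for y = m*x: sum(x_i * y_i) / sum(x_i^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean_of_slopes(xs, ys):
    """Plain average of the individual slopes y_i / x_i (x_i must be nonzero)."""
    return sum(y / x for x, y in zip(xs, ys)) / len(xs)


def weighted_mean_of_slopes(xs, ys):
    """Average of the slopes y_i / x_i, weighting each point by x_i^2."""
    weights = [x * x for x in xs]
    weighted_sum = sum(w * (y / x) for w, x, y in zip(weights, xs, ys))
    return weighted_sum / sum(weights)


# Made-up points for illustration: slopes are 2, 1 and 1.
xs = [1.0, 2.0, 10.0]
ys = [2.0, 2.0, 10.0]

print(mean_of_slopes(xs, ys))           # about 1.333
print(slope_through_origin(xs, ys))     # about 1.0095
print(weighted_mean_of_slopes(xs, ys))  # matches the least-squares slope
```

In this example the small point with slope 2 pulls the plain average up to about 1.33, while the least-squares slope stays near 1 because the point at x = 10 carries a weight of 100.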