## Thursday, May 23, 2013

### The Data

or How Fast Does Google Think We Drive?

I picked this question up on Twitter a while back and really liked it. So, after we worked on making a plan, I thought it would be good to look at data. Lots of data. And then come up with ways to make sense of it.

Process

So, we took to Google.  Students went to Maps, entered starting location and destination (I didn't realize that I'd need to explain that you can't drive from California to China, but, whatever.) and then entered their data into a Google Form.  We did this over five different classes and got a lot of "stuff" to sift through.

We dumped the data into GeoGebra and then took a look at a few different perspectives.  It's interesting to see how the data changes as we look at trips of different distances.

The applet below will really give you a good picture.

Takeaways

• Unit rates are valuable.
• When points don't line up perfectly, sometimes we can use a line to help us answer questions.
• As soon as we have a line we like, the actual data points can kind of get in the way.
• You can't drive to China.

These are 7th graders and they have some experience with linear relationships.  However, that experience has been limited to "the number in front of the x is the slope and the other number is the y-intercept" kind of stuff.  It really threw some kids that their line of "best fit" may not have been the same as everyone else's.  We are doing this very informally at this time.

Questions

I know that the formal process for determining a linear regression is pretty involved, but does it have to be for a proportional relationship? That is, if we know a relationship (like distance : time) is a proportion but the data doesn't line up exactly, is it appropriate to simply average the distance:time ratios to determine a "rate of best fit"?

When informally drawing a line of best fit for a proportional relationship, should (0,0) always be the starting point?

Unknown said...

I think if you are modelling a proportional relationship, it is appropriate to use (0,0) as a starting point.

Averaging slopes will not give you the same answer as regression. Surprisingly, it is substantially worse for determining the slope in many cases than a linear regression which allows for a nonzero y-intercept, even if the data is truly proportional + some noise. But I think it's statistically consistent and won't be too far off.

Unknown said...

If you have a set of points (x_i, y_i), and you do linear regression with a y-intercept of 0, then the least-square error slope is

slope = sum(x_i * y_i) / sum(x_i^2)

Compared to averaging the slopes, you can see that points with large x and y will dominate the calculation.
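
To make that comparison concrete, here is a minimal sketch in Python. The function names and the trip data are made up for illustration (they are not the class's actual numbers); x is assumed to be hours and y miles, so both results are in miles per hour.

```python
def mean_of_ratios(xs, ys):
    """Average the individual y/x ratios (every trip counts equally)."""
    return sum(y / x for x, y in zip(xs, ys)) / len(xs)


def slope_through_origin(xs, ys):
    """Least-squares slope for a line forced through (0, 0):
    sum(x_i * y_i) / sum(x_i^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical trips: short ones are slower, long ones are faster.
hours = [0.5, 1.0, 2.0, 4.0]
miles = [20.0, 50.0, 110.0, 240.0]

print("mean of ratios:", mean_of_ratios(hours, miles))
print("least squares through origin:", slope_through_origin(hours, miles))
```

With these invented numbers the individual rates are 40, 50, 55, and 60 mph, so the average of the ratios is 51.25 mph, while the through-the-origin slope comes out near 58.35 mph. The 4-hour trip pulls the least-squares answer toward its own rate, which is the "large x and y dominate" effect described above.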

Mrs.Nehila said...

Hi David,
Awesome blog!
I have nominated you for a Liebster Award. The award is a way to recognize bloggers, offer encouragement, and possibly get more readers. Check out my blog post at http://fliplearnshare.blogspot.com/2013/07/an-awesome-surprise.html

Play along if you want to! I appreciate your work online!

Personal Finance said...

I love the idea of this post, but I couldn't get the applet to run in Safari or Chrome (my Java is updated). Not that you have time to troubleshoot, but what does the applet show and could you send the file my way?