## Finding a cheap calculation for the distance between two points in the United Kingdom

Posted : Friday, 02 September 2011 00:18:33

Recently I was working on a project whereby we had to be able to calculate the distance between two points (e.g. two sets of lat/lon coordinates) within the United Kingdom. The reason for doing so was so that we were able to rank a set of arbitrary search results by distance from a given location. This could potentially be very computationally expensive particularly when considering there could be hundreds or even thousands of results. By the time I was brought into the project a rough algorithm had been found to to do this but was as yet untested. The algorithm in question is known as the “Squared Euclidian” distance. It basically involves performing a Pythagoras equation to determine the square of a hypotenuse – given that in this scenario there is no requirement to calculate the actual distance as it was purely for ranking purposes there was no reason to find the square root of squared hypotenuse. As usual Wikipedia has information on the subject:

http://en.wikipedia.org/wiki/Euclidean_distance

So that’s the background. The equation was used to rank a set of search results which were then paginated and finally rendered onto a web page, at this point the actual distance was calculated in order to show the actual distance to the user. This was when some odd behaviour was observed. The ranked results (e.g. supposedly sorted by distance) were not appearing in the correct order on the page. For example the closest two results were 2.3 miles, these were then followed by one at 2.1 miles which was in turn followed by one 2.4 miles which was in turn followed by one at 2.2miles. Clearly they were not being ranked being distance! I spent a good deal of time investigating this issue and what follows is the process I used to solve the problem.

I began to run some calculations to try and find out the distance between a few sets of points and then compare the actual distance with the score that would have been generated by the Squared Euclidian equation to try and discover a flaw. Excel is great and I really like what it can do (more accurately I really what people far more skilled then me can do with it) but after a few hours staring at pages and pages of seemingly random numbers I decided on a different approach before my brain melted. I have a MCTS certification in Bing maps so it was a fairly straightforward task to build a little visualisation tool using the Bing Maps Ajax API.

I should point here that a huge amount of credit and thanks go to Ricky Brunditt who’s blog was a fantastic source of information and equations.

Firstly I plotted a ring of points all the same distance from a given location:

As you can see there is clearly a distortion whereby the further north a point is the larger it appears on the Mercator projection of the globe, also you can clearly see how the further north a point is the greater the horizontal distance appears between adjacent points. This is nothing ground-breaking but it did two things for me. First it helped me to visualise how the latitude and longitudes vary at different points on the globe. Second it made me think I really should have listened more in geography class – that would certainly have saved me some time in this instance.

So with a slightly clearer understanding I tried a different approach. I plotted a horizontal line of points and a vertical line of points all the same distance from the previous point:

Okay this was going somewhere. As we know that in the above picture all consecutive points along either axis are the same distance apart, the Squared Euclidian value for the first point on the latitudinal axis should be the same as that of the first point on the longitudinal axis. I took six points along each axis from a starting point just off Cornwall:

And then compared the values of the Squared Euclidian values:

Latitude | Longitude | Distance (miles) | Sq. Euclidian Distance |

51.5188 | -9.50419 | 100 | 2.097648419 |

52.96713 | -9.50419 | 200 | 8.390593674 |

54.41546 | -9.50419 | 300 | 18.87883577 |

55.86378 | -9.50419 | 400 | 33.5623747 |

57.31211 | -9.50419 | 500 | 52.44121046 |

58.76044 | -9.50419 | 600 | 75.51534307 |

50.07048 | -7.24837 | 100 | 5.088721445 |

50.07048 | -4.99665 | 200 | 20.31788589 |

50.07048 | -2.75308 | 300 | 45.57740415 |

50.07048 | -0.5216 | 400 | 80.68678436 |

50.07048 | 1.694005 | 500 | 125.3994645 |

50.07048 | 3.890164 | 600 | 179.4085853 |

Clearly Squared Euclidian values where the Longitude is constant increase far slower than those when the Latitude is constant. The goal of the exercise now became clear – to normalize these scores such that 100 miles west would result in the same Euclidian score as 100 miles north. I then calculated the difference in latitude between subsequent points along the horizontal (west) axis and the difference in longitude between the subsequent points along the vertical (north) axis and found the ratio between the two (dLat and dLon) for each point along the line in either direction. I then found the average of these values for the six sets of points in my dataset, lo and behold 1.64. I plugged this number into the Squared Euclidian equation for each of the points by multiplying the latitude value by 1.64 prior to squaring and was presented with the following results.

Latitude | Longitude | Distance (miles) | Sq. Euclidian Distance with 1.64 Factor |

51.5188 | -9.50419 | 100 | 4.974783 |

52.96713 | -9.50419 | 200 | 19.89913 |

54.41546 | -9.50419 | 300 | 44.77305 |

55.86378 | -9.50419 | 400 | 79.59653 |

57.31211 | -9.50419 | 500 | 124.3696 |

58.76044 | -9.50419 | 600 | 179.0922 |

50.07048 | -7.24837 | 100 | 5.088721445 |

50.07048 | -4.99665 | 200 | 20.31788589 |

50.07048 | -2.75308 | 300 | 45.57740415 |

50.07048 | -0.5216 | 400 | 80.68678436 |

50.07048 | 1.694005 | 500 | 125.3994645 |

50.07048 | 3.890164 | 600 | 179.4085853 |

As you can clearly see the numbers, while not exactly the same are much much closer. I plugged this into the ranking algorithm of the search system and hey presto the results were all presented in the correct order – e.g. ranked by ascending distance. It is essential to take note that this magic number of 1.64 will only work over the United Kingdom and the technique I outlined here would not be suitable for large areas, it does however provide a very cheap way of fairly reliably and very cheaply calculating the distance between two points within a smallish area.