;

Finding a cheap calculation for the distance between two points in the United Kingdom

Posted : Friday, 02 September 2011 00:18:33

Recently I was working on a project whereby we had to be able to calculate the distance between two points (e.g. two sets of lat/lon coordinates) within the United Kingdom. The reason for doing so was so that we were able to rank a set of arbitrary search results by distance from a given location. This could potentially be very computationally expensive particularly when considering there could be hundreds or even thousands of results. By the time I was brought into the project a rough algorithm had been found to to do this but was as yet untested. The algorithm in question is known as the “Squared Euclidian” distance. It basically involves performing a Pythagoras equation to determine the square of a hypotenuse – given that in this scenario there is no requirement to calculate the actual distance as it was purely for ranking purposes there was no reason to find the square root of squared hypotenuse. As usual Wikipedia has information on the subject:

http://en.wikipedia.org/wiki/Euclidean_distance

So that’s the background. The equation was used to rank a set of search results which were then paginated and finally rendered onto a web page, at this point the actual distance was calculated in order to show the actual distance to the user. This was when some odd behaviour was observed. The ranked results (e.g. supposedly sorted by distance) were not appearing in the correct order on the page. For example the closest two results  were 2.3 miles, these were then followed by one at 2.1 miles which was in turn followed by one 2.4 miles which was in turn followed by one at 2.2miles. Clearly they were not being ranked being distance! I spent a good deal of time investigating this issue and what follows is the process I used to solve the problem.

I began to run some calculations to try and find out the distance between a few sets of points and then compare the actual distance with the score that would have been generated by the Squared Euclidian equation to try and discover a flaw. Excel is great and I really like what it can do (more accurately I really what people far more skilled then me can do with it) but after a few hours staring at pages and pages of seemingly random numbers I decided on a different approach before my brain melted. I have a MCTS certification in Bing maps so it was a fairly straightforward task to build a little visualisation tool using the Bing Maps Ajax API.

I should point here that a huge amount of credit and thanks go to Ricky Brunditt who’s blog was a fantastic source of information and equations.

Firstly I plotted a ring of points all the same distance from a given location:

circle_distortion

As you can see there is clearly a distortion whereby the further north a point is the larger it appears on the Mercator projection of the globe, also you can clearly see how the further north a point is the greater the horizontal distance appears between adjacent points. This is nothing ground-breaking but it did two things for me. First it helped me to visualise how the latitude and longitudes vary at different points on the globe. Second it made me think I really should have listened more in geography class – that would certainly have saved me some time in this instance.

So with a slightly clearer understanding I tried a different approach. I plotted a horizontal line of points and a vertical line of points all the same distance from the previous point:

linear_distortion

Okay this was going somewhere. As we know that in the above picture all consecutive points along either axis are the same distance apart, the Squared Euclidian value for the first point on the latitudinal axis should be the same as that of the first point on the longitudinal axis. I took six points along each axis from a starting point just off Cornwall:

linear_for_calc

And then compared the values of the Squared Euclidian values:

Latitude Longitude Distance (miles) Sq. Euclidian Distance

51.5188

-9.50419

100 2.097648419

52.96713

-9.50419

200 8.390593674

54.41546

-9.50419

300

18.87883577

55.86378

-9.50419

400

33.5623747

57.31211

-9.50419

500

52.44121046

58.76044

-9.50419

600

75.51534307

       

50.07048

-7.24837

100

5.088721445

50.07048

-4.99665

200

20.31788589

50.07048

-2.75308

300

45.57740415

50.07048

-0.5216

400

80.68678436

50.07048

1.694005

500

125.3994645

50.07048

3.890164

600

179.4085853

Clearly Squared Euclidian values where the Longitude is constant increase far slower than those when the Latitude is constant. The goal of the exercise now became clear – to normalize these scores such that 100 miles west would result in the same Euclidian score as 100 miles north. I then calculated the difference in latitude between subsequent points along the horizontal (west) axis and the difference in longitude between the subsequent points along the vertical (north) axis and found the ratio between the two (dLat and dLon) for each point along the line in either direction. I then found the average of these values  for the six sets of points in my dataset, lo and behold 1.64. I plugged this number into the Squared Euclidian equation for each of the points by multiplying the latitude value by 1.64 prior to squaring and was presented with the following results.

Latitude Longitude Distance (miles) Sq. Euclidian Distance with 1.64 Factor

51.5188

-9.50419

100

4.974783

52.96713

-9.50419

200

19.89913

54.41546

-9.50419

300

44.77305

55.86378

-9.50419

400

79.59653

57.31211

-9.50419

500

124.3696

58.76044

-9.50419

600

179.0922

       

50.07048

-7.24837

100

5.088721445

50.07048

-4.99665

200

20.31788589

50.07048

-2.75308

300

45.57740415

50.07048

-0.5216

400

80.68678436

50.07048

1.694005

500

125.3994645

50.07048

3.890164

600

179.4085853

As you can clearly see the numbers, while not exactly the same are much much closer. I plugged this into the ranking algorithm of the search system and hey presto the results were all presented in the correct order – e.g. ranked by ascending distance. It is essential to take note that this magic number of 1.64 will only work over the United Kingdom and the technique I outlined here would not be suitable for large areas, it does however provide a very cheap way of fairly reliably and very cheaply calculating the distance between two points within a smallish area.

  • (This will not appear on the site)