hpc.social


High Performance Computing
Practitioners
and friends /#hpc
Share: 
This is a crosspost from   Cleve’s Corner: Cleve Moler on Mathematics and Computing Cleve Moler is the author of the first MATLAB, one of the founders of MathWorks, and is currently Chief Mathematician at the company. He writes here about MATLAB, scientific computing and interesting mathematics.. See the original post here.

Closest Pair of Points Problem

The Closest Pair of Points problem is a standard topic in an algorithms course today, but when I taught such a course fifty years ago, the algorithm was not yet known.

Contents

California Dreaming

Imagine you are driving a car on the Harbor Freeway in southern California with typical Los Angeles traffic conditions. Among the many things you might want to know is which pair of vehicles is nearest each other.

This is an instance of the Closest Pair of Points problem:

  • Given the location of n points in the plane, which pair of points is closest to each other?

Closest Pair of Points

It is convenient to represent the points by a vector of complex values. The distance between points z(k) and z(j) is then

  d = abs(z(k) - z(j))

Here are a few points in the unit square. The closest pair is highlighted.

Pairs

The first algorithm you might think of computes the distances between all possible pairs of points and finds the minimum. This is a brute force approach that requires only a few lines of code.

function d = Pairs(z)
    % Pairs.
    % d = Pairs(z) is the minimum distance between any two elements
    % of the complex vector z.
    n = length(z);
    d = Inf;
    for k = 1:n
        for j = k+1:n
            if abs(z(k) - z(j)) < d
                d = abs(z(k) - z(j));
            end
        end
    end
end

DivCon

DivCon stands for Divide and Conquer. In outline, the steps are:

  • Divide the set of points into two halves.
  • Recursively, find the closest pair in each half.
  • Consider the case when the closest pair has one point in each half.
  • Terminate the recursion with sets of two or three points.
function d = DivCon(z,sorted)
    % DivCon.
    % d = DivCon(z) is the minimum distance between any two elements
    % of the complex vector z.
    %
    % d = DivCon(z,true) is a recursive call with ascending real(z).
    n = length(z);
    if n <= 3
       d = Pairs(z);
    else
       if nargin < 2 || ~sorted
           [~,p] = sort(real(z));
           z = z(p);
       end
       m = floor(n/2);
       % Left half
       dl = DivCon(z(1:m),true)
       % Right half
       dr = DivCon(z(m+1:end),true);
       % Choose
       d = min(dl,dr);
       % Center strip
       ds = Center(z,d);
       d = min(ds,d);
    end
end

Center

The delicate case involves the strip of points near the center dividing line. The width of the strip is the closest distance found in the recursion. Any closer pair with one point in each half must be in this strip.

function d = Center(z,d)
    % Center(z,d) is used by DivCon to examine the
    % strip of half-width d about the center point.
    n = length(z)
    m = floor(n/2);
    xh = real(z(m));
    [~,p] = sort(imag(z));
    z = z(p);
    s = [];
    for i = 1:n
        if abs(real(z(i)) - xh) <  d
            s = [s; z(i)];
        end
    end
    ns = length(s);
    for k = 1:ns
        for j = k+1:ns
            if (imag(s(j)) - imag(s(k))) < d && abs(s(k) - s(j)) < d
                d = abs(s(k) - s(j));
            end
        end
    end
end

Complexity

Let n be the number of points. An asymptotic execution-time complexity analysis involves n approaching infinity.

It is not hard to see that the complexity of the brute force algorithm implemented in Pairs is O(n^2).

There are dozens of pages on the web devoted to showing that the complexity of the divide and conquer algorithm implemented in DivCon and Center is O(n*log(n)). The best page that I have seen is the YouTube video by Ling Qi. The key to the analysis is showing that the inner loop in Center is executed at most 7 times for any n.

Timing

We measured the execution time of Pairs(z) and DivCon(z) for n from 1,000 to 40,000 and computed the ratios of the two times. The complexity analysis predicts that this ratio is asymptotically

  O(n/log(n))

Here are the timing results and a least square fit by n/log(n).

Software

A self-extracting MATLAB archive is available at https://blogs.mathworks.com/cleve/files/TestDivCon_mzip.m

References

Ling Qi, IDeer7, Closest Pair of Points (Divide and Conquer) Explained. https://www.youtube.com/watch?v=6u_hWxbOc7E.

Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms (4th ed.). MIT Press and McGraw-Hill. ISBN 0-262-04630-X. 1312 pp.


Get the MATLAB code

Published with MATLAB® R2023a