limited in practical applications. Note that without this restriction on the message size, a single convergecast would
suffice to accumulate all elements at a single node, which
could subsequently solve the problem locally.
As both proposed algorithms are iterative in that they
continuously reduce the set of possible solutions, we need
to distinguish between nodes holding elements that are still
of interest from the other nodes. Henceforth, the first set of
nodes is referred to as candidate nodes or candidates. We call
the reduction of the search space by a certain factor a phase
of the algorithm. The number of candidate nodes in phase i
is denoted n(i).
We assume that all nodes know the diameter D of the
graph. Furthermore, it is assumed that a BFS spanning tree
rooted at the node initiating the algorithm has been computed beforehand. These assumptions are not critical as
both the diameter and the spanning tree can be computed
in 2D time. 14
The main complexity measure used is the time complexity which is, for deterministic algorithms, the time required
from the start of an execution to its completion in the worst-case for every legal input and every execution scenario. The
time complexity is normalized in that the slowest message is
assumed to reach its target after one time unit. As far as our
randomized algorithm is concerned, we determine the time
after which the execution of the algorithm has completed
with high probability, i.e., with probability at least 1 – 1 nc for a
constant c ≥ 1. Thus, in both cases, we do not assign any cost
to local computation.
4. alGoRithms
The algorithms presented in this article operate in sequential phases in which the space of candidates is steadily reduced. This pattern is quite natural for k-selection and
used in all other proposed algorithms including the non-distributed case. The best known deterministic distributed
algorithm for general graphs uses a distributed version of
the well-known median-of-median technique, which we
briefly outlined in Section 2, resulting in a time complexity of O(Dn0.9114) for a constant group size. A straightforward
modification of this algorithm in which the group size in
each phase i is set to O( n(i)) results in a much better time
complexity. It can be shown that the time complexity of this
variant of the algorithm is bounded by O(D(log n)log log n+O( 1)).
However, since our proposed algorithm is substantially better, we will not further discuss this median-of-median-based
algorithm. Due to the more complex nature of the deterministic algorithm, we will treat the randomized algorithm
first.
4. 1 Randomized algorithm
While the derivation of an expedient deterministic algorithm is somewhat intricate, it is remarkably simple to come
up with a fast randomized algorithm. An apparent solution,
proposed by Shrira et al., 20 is to choose a node randomly
and take its element as an initial guess. After computing
the number of nodes with smaller and larger elements,
it is likely that a considerable fraction of all nodes no longer need be considered. By iterating this procedure on the
remaining candidate nodes, the kth smallest element can be
found quickly for all k.
A node can be chosen randomly using the following
scheme: A message indicating that a random element is to
be selected is sent along a random path in the spanning tree
starting at the root. If the root has l children u , …, u where
1l
child ui is the root of a subtree with ni candidate nodes in-
cluding itself, the root chooses its own element with prob-
ability 1/( 1 + ∑lj=1nj). Otherwise, it sends a message to one
of its children. The message is forwarded to node u with
i
probability n l i/( 1 + ∑ j=1nj) for all i ∈ { 1,...,l}, and the recipi-
ent of the message proceeds in the same manner. It is easy
to see that this scheme selects a node uniformly at random
and that it requires at most 2D time, because the times to
reach any node and to report back are both bounded by D.
Note that after each phase the probabilities change as they
depend on the altered number of candidate nodes remain-
ing in each subtree. However, having determined the new
interval in which the solution must lie, the number of nodes
satisfying the new predicate in all subtrees can again be
computed in 2D time.
This straightforward procedure yields an algorithm that
finds the kth smallest element in O(D log n) expected time,
as O(log n) phases suffice in expectation to narrow down the
number of candidates to a small constant. It can even be
shown that the time required is O(D log n) with high probability. The key observation to improve this algorithm is that
picking a node randomly always takes O(D) time, therefore
several random elements ought to be chosen in a single phase
in order to further reduce the number of candidate nodes.
The method to select a single random element can easily
be modified to allow for the selection of several random elements by including the number of needed random elements
in the request message. A node receiving such a message locally determines whether its own element is chosen, and also
how many random elements each of its children’s subtrees
has to provide. Subsequently, it forwards the requests to all
of its children whose subtrees must produce at least one random element. Note that all random elements can be found
in D time independent of the number of random elements,
but due to the restriction that only a constant number of elements can be packed into a single message, it is likely that
not all elements can propagate back to the root in D time.
However, all elements still arrive at the root in O(D) time if
the number of random elements is bounded by O(D).
By using this simple pipelining technique to select O(D)
random elements in O(D) time, we immediately get a more
efficient algorithm, which we will henceforth refer to as Arand.
When selecting O(D) elements uniformly at random in each
phase, it can be shown that the number of candidates is reduced by a factor of O(D) in a constant number of phases with
high probability with respect to O(D), as opposed to merely a
constant factor in case only a single element is chosen. This
result, together with the observation that each phase costs
merely O(D) time, is used to prove the following theorem.
Theorem 4. 1. In a connected graph of diameter D ≥ 2
consisting of n nodes, the time complexity of algorithm Arand is O(D
log n) w.h.p.