then receives each reply in turn, gathering them in a list. The client-side code for a server call is reused entirely as is.
By using worker processes, libraries are free to use receive expressions as needed without worrying about blocking their caller. If the caller does not wish to block, it is always free to spawn a worker.
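As a minimal sketch of that idea, a caller can hand a potentially blocking function to a freshly spawned worker and collect the reply later. The module name worker_sketch and the functions async_call/1, await/2, and demo/0 are hypothetical names chosen for illustration; they are not part of the article's server module.

```erlang
-module(worker_sketch).
-export([async_call/1, await/2, demo/0]).

%% Run Fun in a freshly spawned worker so the caller is not blocked.
%% Returns a unique reference that identifies the eventual reply.
async_call(Fun) ->
    Caller = self(),
    Ref = make_ref(),
    spawn(fun() -> Caller ! {Ref, Fun()} end),
    Ref.

%% Collect the reply later; give up after Timeout milliseconds.
await(Ref, Timeout) ->
    receive
        {Ref, Result} -> {ok, Result}
    after Timeout ->
        timeout
    end.

demo() ->
    %% The slow function may block for as long as it likes inside the
    %% worker; the caller stays free to do other work in the meantime.
    Ref = async_call(fun() -> timer:sleep(50), 6 * 7 end),
    await(Ref, 1000).
```

Tagging the reply with a fresh reference lets the caller pick out this particular answer even if other messages are waiting in its mailbox.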
Though it eliminates shared state, Erlang is not immune to races. The server behaviour allows its application code to execute as a critical section accessing protected data, but it’s always possible to draw the lines of this protection incorrectly.
Figure 5, for example, illustrates that if we had implemented sequences with raw primitives to read and write the counter, we would be just as vulnerable to races as a shared-state implementation that forgot to take locks.
This code is insidious as it will pass simple unit tests and can perform reliably in the field for a long time before it silently encounters an error. Both the client-side wrappers and server-side callbacks, however, look quite different from those of the correct implementation. By contrast, an incorrect shared-state program would look nearly identical to a correct one. It takes a trained eye to inspect a shared-state program and notice the missing lock requests.
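For contrast with the race in Figure 5, the following is a minimal sketch of a sequence whose increment happens inside the server process, so the read and the write cannot be separated by another client's request. It assumes bare spawn and receive rather than the article's server module, and the names goodsequence_sketch and demo/0 are hypothetical.

```erlang
-module(goodsequence_sketch).
-export([make_sequence/0, get_next/1, demo/0]).

make_sequence() ->
    spawn(fun() -> loop(0) end).

%% One request, one reply: the server increments while handling the
%% message, so no other client can slip in between read and write.
get_next(Sequence) ->
    Ref = make_ref(),
    Sequence ! {get_next, self(), Ref},
    receive
        {Ref, N} -> N
    end.

loop(N) ->
    receive
        {get_next, From, Ref} ->
            From ! {Ref, N},
            loop(N + 1)
    end.

demo() ->
    Seq = make_sequence(),
    Parent = self(),
    Clients = lists:seq(1, 100),
    %% 100 concurrent clients each draw one number.
    lists:foreach(
        fun(_) -> spawn(fun() -> Parent ! {got, get_next(Seq)} end) end,
        Clients),
    Got = [receive {got, N} -> N end || _ <- Clients],
    exit(Seq, kill),
    %% Atomic increments mean every number is handed out exactly once.
    lists:sort(Got) =:= lists:seq(0, 99).
```

With separate read and write requests, as in the race-prone version, two clients could both read the same value before either writes, and the final check in demo/0 could fail intermittently.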
All standard errors in concurrent programming have their equivalents in Erlang: races, deadlock, livelock, starvation, and so on. Even with the help Erlang provides, concurrent programming is far from easy, and the nondeterminism of concurrency means that it is always difficult to know when the last bug has been removed.
Testing helps eliminate most gross errors, to the extent that the test cases model the behaviour encountered in the field. Injecting timing jitter and allowing long burn-in times will improve coverage, but the combinatorial explosion of possible event orderings in a concurrent system means that no nontrivial application can be tested for all possible cases.
When reasonable efforts at testing reach their end, the remaining bugs are usually heisenbugs, which occur nondeterministically but rarely. They can be seen only when some unusual timing pattern emerges in execution. They are the bane of debugging since they are difficult to reproduce, but this curse is also a blessing in disguise. If a heisenbug is difficult to reproduce, then if you rerun the computation, you might not see the bug. This suggests that flaws in concurrent programs, while unavoidable, can have their impact lessened with an automatic retry mechanism, as long as the impact of the initial bug event can be detected and constrained.
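One way such a retry mechanism might look is sketched below, assuming the computation can safely be rerun from scratch. Each attempt runs in its own monitored process, so a failure is both detected (through the monitor) and constrained (to the worker). The names retry_sketch, with_retry/2, and demo/0 are hypothetical, and the flaky function only simulates a heisenbug by failing on its first two runs.

```erlang
-module(retry_sketch).
-export([with_retry/2, demo/0]).

%% Run Fun in a monitored worker; if the worker fails, try again,
%% up to Attempts times in total.
with_retry(_Fun, 0) ->
    {error, retries_exhausted};
with_retry(Fun, Attempts) when Attempts > 0 ->
    Parent = self(),
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> Parent ! {Tag, Fun()} end),
    receive
        {Tag, Result} ->
            %% Success: drop the monitor and any pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, _Reason} ->
            %% The worker failed before replying: rerun the computation.
            with_retry(Fun, Attempts - 1)
    end.

demo() ->
    %% A public ETS counter records how many attempts have been made.
    Tab = ets:new(attempts, [public]),
    ets:insert(Tab, {count, 0}),
    Flaky = fun() ->
        case ets:update_counter(Tab, count, 1) of
            N when N < 3 -> exit(simulated_heisenbug);
            N -> {succeeded_on_attempt, N}
        end
    end,
    Result = with_retry(Flaky, 5),
    ets:delete(Tab),
    Result.
```

The sketch retries on any failure; a production version would likely distinguish faults worth retrying from those that will recur on every attempt.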
Figure 5: badsequence.erl.

    % BAD - race-prone implementation - do not use - BAD
    -module(badsequence).
    -export([make_sequence/0, get_next/1, reset/1]).
    -export([init/0, handle_call/2, handle_cast/2]).

    % API
    make_sequence() -> server:start(badsequence).

    get_next(Sequence) ->
        N = read(Sequence),
        write(Sequence, N + 1),  % BAD: race!
        N.

    reset(Sequence) -> write(Sequence, 0).

    read(Sequence) -> server:call(Sequence, read).

    write(Sequence, N) ->
        server:cast(Sequence, {write, N}).

    % Server callbacks
    init() -> 0.
    handle_call(read, N) -> {N, N}.
    handle_cast({write, N}, _) -> N.
Erlang is a safe language—all runtime faults, such as division by zero, an out-of-range index, or sending a message to a process that has terminated, result in clearly defined behavior, usually an exception. Application code can install exception handlers to contain and recover from expected faults, but an uncaught exception means that the process cannot continue to run. Such a process is said to have failed.
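The two outcomes described above can be sketched as follows: a handler that contains an expected fault, and an unhandled fault that causes the process to fail. The module fault_sketch and its functions are hypothetical names for illustration. Note that the failing worker in unhandled_demo/0 may also cause the runtime to log an error report.

```erlang
-module(fault_sketch).
-export([safe_div/2, unsafe_div/2, unhandled_demo/0]).

%% Contain an expected fault with an exception handler.
safe_div(A, B) when is_integer(A), is_integer(B) ->
    try
        {ok, A div B}
    catch
        error:badarith -> {error, division_by_zero}
    end.

%% No handler here: dividing by zero raises a badarith error.
unsafe_div(A, B) ->
    A div B.

%% An uncaught exception ends the process that raised it. The caller
%% observes the failure from outside through a monitor.
unhandled_demo() ->
    {Pid, MRef} = spawn_monitor(fun() -> unsafe_div(10, 0) end),
    receive
        {'DOWN', MRef, process, Pid, {Reason, _Stacktrace}} ->
            {process_failed, Reason}
    end.
```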
Sometimes a process can get stuck in an infinite loop instead of failing overtly. We can guard against stuck processes with internal watchdog processes. These watchdogs make periodic calls to various corners of the running application, ideally causing a chain of events that covers all long-lived processes, and fail if they don’t receive a response within a generous but finite timeout. Process failure is the uniform way of detecting errors in Erlang.
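A watchdog of this kind might be sketched as below. The ping/pong message protocol, the module name watchdog_sketch, and the interval and timeout values are all assumptions made for illustration; a real watchdog would call into the application's own interfaces.

```erlang
-module(watchdog_sketch).
-export([start/3, demo/0]).

%% Start a watchdog that pings Target every Interval milliseconds and
%% fails if no pong arrives within Timeout milliseconds.
start(Target, Interval, Timeout) ->
    spawn(fun() -> watch(Target, Interval, Timeout) end).

watch(Target, Interval, Timeout) ->
    Ref = make_ref(),
    Target ! {ping, self(), Ref},
    receive
        {pong, Ref} ->
            timer:sleep(Interval),
            watch(Target, Interval, Timeout)
    after Timeout ->
        %% Turn a silent hang into an ordinary process failure.
        exit({unresponsive, Target})
    end.

%% A target that answers N pings and then hangs forever.
answer(0) ->
    timer:sleep(infinity);
answer(N) ->
    receive
        {ping, From, Ref} ->
            From ! {pong, Ref},
            answer(N - 1)
    end.

demo() ->
    Target = spawn(fun() -> answer(2) end),
    Dog = start(Target, 10, 100),
    MRef = erlang:monitor(process, Dog),
    receive
        {'DOWN', MRef, process, Dog, {unresponsive, Target}} ->
            exit(Target, kill),
            watchdog_fired
    after 2000 ->
        no_failure_detected
    end.
```

Because the watchdog itself fails when the target stops answering, whatever already supervises the watchdog learns of the stuck process through the same channel as any other failure.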
Erlang’s error-handling philosophy stems from the observation that any robust cluster of hardware must consist of at least two machines, one of which can react to the failure of the other and take steps toward recovery [2]. If the recovery mechanism were on the broken machine, it would be broken, too. The recovery mechanism must be outside the range of the failure. In Erlang, the process is not only the unit of concurrency, but also the range of failure. Since processes share no state, a fatal error in a process makes its state unavailable but won’t corrupt the state of other processes.
Erlang provides two primitives for one process to notice the failure of another. Establishing monitoring of another process creates a one-way notification of failure, and linking two processes establishes mutual notification. Monitoring is used during temporary relationships, such as a client-server call, and mutual linking is used for more permanent relationships. By default, when a fault notification is delivered to a linked process, it causes the receiver to fail as well, but a process-local flag can be set to turn fault notification into an ordinary message that can be handled by a receive expression.
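The two primitives can be sketched with the standard built-in functions spawn_monitor/1, spawn_link/1, and process_flag/2; the process-local flag mentioned above is trap_exit. The module name notify_sketch and the exit reason boom are placeholders for illustration.

```erlang
-module(notify_sketch).
-export([monitor_demo/0, link_demo/0]).

%% Monitoring: one-way notification, delivered as a 'DOWN' message.
monitor_demo() ->
    {Pid, MRef} = spawn_monitor(fun() -> exit(boom) end),
    receive
        {'DOWN', MRef, process, Pid, Reason} ->
            {monitored_process_failed, Reason}
    end.

%% Linking: mutual notification. With trap_exit set, the notification
%% arrives as an ordinary {'EXIT', Pid, Reason} message instead of
%% causing this process to fail as well.
link_demo() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(boom) end),
    Result =
        receive
            {'EXIT', Pid, Reason} -> {linked_process_failed, Reason}
        end,
    %% Restore the previous setting of the flag.
    process_flag(trap_exit, Old),
    Result.
```

Without the trap_exit flag, the failure of the linked process in link_demo/0 would terminate the caller too, which is the default behaviour the text describes.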
In general application program-
References: