Showing posts with label erlang. Show all posts

Sunday, 22 June 2008

Day 14 - Finished the painting

The painting of the door frames is done, another step in our renovation. Tomorrow the craftsmen will continue their work. The wallpapering of the kitchen will be completed and the radiators will be installed. Due to my journey to Berlin tomorrow I'll write about it on Tuesday.

Besides all the renovation my Erlang development hasn't stopped. I've continued the implementation of a first service for the CELLMB using Mnesia. In principle everything works fine, but I'm still missing one thing in my implementation: the management of complex event-driven processes. So I've started to think about a more general successor of the CELLMB. It will be called CELEAS - the Tideland CEL Events and Services. You can read more about it in my wiki at Google Code.

Saturday, 14 June 2008

Day 6 - Time for something completely different

OK, today we've also done a bit for our renovation. We prepared the first radiators for painting and will continue tomorrow. But most of the time we went shopping, had a BBQ, and were visited by my sister and her husband.

But yesterday I read about the new release 4.0 of PLT Scheme. So I visited the site and once again I was fascinated by the Scheme style. Even if I currently prefer Erlang for my work, I like the Scheme syntax more. But I also like the keyword arguments of Smalltalk. So my favorite language would be a mix of those three: concurrency, distribution, pattern matching, and non-modifiable variables from Erlang, a syntax like Scheme, and arguments in the style of Smalltalk. This could look like

(define (insert customer: Customer into-database: Database)
  (let ((PCustomer (prepare customer: Customer))
        (OpenDatabase (open database: Database)))
    (...)))

Now one question would be what pattern matching would look like. And message sending and receiving. Erlang tuples could simply be lists. But I would also need atoms. And finally I would like Smalltalk's closures more than funs and lambdas.

(map fun: (|item: Item| (print Item)) on-list: MyList)

I'll never implement such a language, but it's fun to think about it.

Wednesday, 14 May 2008

Erlang development progress

After a short break - I've finished my article about Erlang for the German iX magazine - my Erlang development continued. I've spent most of my time on improving the Tideland CEL Lightweight Message Bus (CELLMB), which is now working really smoothly. The first approaches were too complex, but now it's simple and fast. I haven't really implemented a bus; Erlang already has a very good communication infrastructure, so I simply use it. The major component is a supervisor managing and monitoring the services. Those are locally registered processes based on gen_server with their own service-dependent callbacks. So e.g. the statement

{ok, Reply} = cellmb:request(Ctx, configuration, read, {my_app, my_group, path, to, my, var})

leads to the call of

my_cfg:read(Ctx, {my_app, my_group, path, to, my, var}, State).

Management and lifecycle are implemented in cellmb and cellmbsvc, so writing and using your own services is really simple. The next advantage is the semi-automatic distribution of messages. So imagine the following situation: due to a mix of load balancing, higher availability through redundancy, and access to local resources, the node alpha provides the services one and two, the node beta the services two and three, the node gamma the services three and four, and the node delta the services four and one. Each service is located on two nodes, but no node provides all services. In other configurations there may also be a different distribution of services, e.g. non-overlapping or some services on all nodes. Clients of the CELLMB don't have to care. So if a client on node alpha publishes the message

{ok, Reply} = cellmb:request(Ctx, three, do_something, Args)

the system will discover which nodes provide service three - here the nodes beta and gamma - and establish a proxy service to one of them. The message will then be resent through the proxy, like all following messages. If a connection breaks during operation the system tries to establish a new one or, if that fails, throws an exception. There are still some features missing, like a simple SOAP client interface, the migration of services between nodes during runtime, and pre- and post-processing of messages matching a defined pattern. The latter is intended for a kind of aspects. I'll add those features later when - or if - I need them.
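How the CELLMB actually discovers providers is not shown here. As a minimal sketch of the idea, assuming each service is a locally registered process named after the service, the lookup could ask every candidate node via rpc. The module name svc_lookup and both function names are hypothetical.

```erlang
-module(svc_lookup).
-export([providers/2, pick/2]).

%% Return all nodes from Nodes on which a process is locally
%% registered under the name Service. A failed rpc yields
%% {badrpc, Reason}, which is not a pid and is therefore skipped.
providers(Service, Nodes) ->
    [Node || Node <- Nodes,
             is_pid(rpc:call(Node, erlang, whereis, [Service]))].

%% Pick the first provider, or report that none is available.
pick(Service, Nodes) ->
    case providers(Service, Nodes) of
        [Node | _] -> {ok, Node};
        []         -> {error, no_provider}
    end.
```

A real implementation would cache the result and fall back to another provider when the connection breaks, as described above.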

Currently I'm implementing the first common services. The status will be documented in our wiki. Most of the standard implementations will use Mnesia for persistence. My first trials have been really promising.

Saturday, 26 April 2008

Erlang for the OO-minded

Somehow my thoughts about concurrency-oriented programming as a better way of object-orientation seem to have jolted the community. I've never had so much traffic on my site. The discussion remained undecided. Some had problems with the functional nature of Erlang, others with the single assignment of variables. Those who understood the ideas behind Erlang messaging and the generic server were able to follow me. So I'll try to make it clearer.

So let's take a small - and useless *smile* - Java class:

public class Adder {
    private int a, b;

    public void setA(int anA) {
        a = anA;
    }

    public void setB(int aB) {
        b = aB;
    }

    public int result() {
        return a + b;
    }
}

The usage would be really simple:

Adder myAdder = new Adder();

myAdder.setA(1);
myAdder.setB(2);

System.out.println(myAdder.result());

And now the same thing in Erlang. I will show it in two different ways. The first one works without processes. It is simple, straightforward, and can be realized this way in many languages. But it doesn't use the advantages of Erlang.

create() ->
    {0, 0}.

set_a(A, {_OldA, B}) ->
    {A, B}.

set_b(B, {A, _OldB}) ->
    {A, B}.

result({A, B}) ->
    A + B.

If the module of this code is called adder the usage would be

A1 = adder:create(),

A2 = adder:set_a(1, A1),
A3 = adder:set_b(2, A2),

io:format("~w", [adder:result(A3)]).

That's not a real object-oriented way; it only shows how many Erlang modules work, e.g. those to create and manage dictionaries. The data is managed using the basic and higher-level Erlang data types like tuples and lists. Every change creates a new value or data structure because variables in Erlang can be assigned only once. Besides optimization, the major reason is the prevention of side effects. More about that later.
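To make the point about new data structures concrete, here is a small self-contained sketch. It repeats the four functions from above inside a hypothetical module adder_demo and keeps every intermediate value around.

```erlang
-module(adder_demo).
-export([demo/0]).

create() -> {0, 0}.
set_a(A, {_OldA, B}) -> {A, B}.
set_b(B, {A, _OldB}) -> {A, B}.
result({A, B}) -> A + B.

%% Every "modification" returns a new tuple. The earlier values
%% A1 and A2 stay valid and unchanged after the later calls.
demo() ->
    A1 = create(),
    A2 = set_a(1, A1),
    A3 = set_b(2, A2),
    {result(A1), result(A2), result(A3)}.
```

Calling adder_demo:demo() returns {0, 1, 3}: each of the three tuples still holds the state it had when it was created.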

The implementation as a process looks a bit different:

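-module(adder2).
-export([create/0, set_a/2, set_b/2, result/1]).
% loop/2 has to be exported because spawn/3 calls it by module and name.
-export([loop/2]).

create() ->
    spawn(?MODULE, loop, [0, 0]).

set_a(Pid, A) ->
    Pid ! {set_a, A}.

set_b(Pid, B) ->
    Pid ! {set_b, B}.

result(Pid) ->
    Pid ! {result, self()},
    receive
        {response, Value} -> Value
    end.

% Variables start with an uppercase letter; newA would be an atom.
loop(A, B) ->
    receive
        {set_a, NewA} ->
            loop(NewA, B);
        {set_b, NewB} ->
            loop(A, NewB);
        {result, Pid} ->
            Pid ! {response, A + B},
            loop(A, B)
    end.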

The major parts are the creation of the process using spawn and the process function loop. Inside this function messages are received and processed. After that the function calls itself tail-recursively, so there'll be no stack overflow. The state of the process - in object-oriented languages called attributes, instance variables, or properties - is maintained in the arguments A and B. Instead of several single arguments, one tuple or record containing a complex data structure can be used. The other functions above are just helper functions, especially the result function. This is due to the asynchronous message handling, where the return of the result is also a message send and receive. So the usage will be

Pid = adder2:create(),

adder2:set_a(Pid, 1),
adder2:set_b(Pid, 2),

io:format("~w", [adder2:result(Pid)]).

Here you easily see how this time only one object - the process - is created and modified. Due to the fact that a process has only one message queue, all messages are handled sequentially. So there are no problems with synchronization, semaphores, or locks. Multiple processes can use this process with no problems. And here the old metaphor of sending messages to objects is really true. You may think "Nah, doing dispatching on my own, bullshit." But OTP modules like gen_server and the callback mechanism simplify that. They are generic, like abstract classes, and provide all the needed stuff so that you can concentrate on the business logic and some comfort functions.

So where's the advantage? Surely not in those tiny processes; I would implement them as standard modules like the first example. But the strength of Erlang is concurrency, the parallel processing. Spawned processes don't work sequentially but truly in parallel, on one core, on multiple cores, on multiple processors, and on multiple systems. And that's the big advantage. Think about a special architecture like pipes and filters for the processing of a larger amount of data.
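For comparison, the same adder could be written on top of gen_server. This is only a sketch: the module name adder_server is made up, the setters are modelled as casts and the result as a call, and the optional callbacks are left out.

```erlang
-module(adder_server).
-behaviour(gen_server).

%% Client API.
-export([start_link/0, set_a/2, set_b/2, result/1]).
%% gen_server callbacks.
-export([init/1, handle_call/3, handle_cast/2]).

start_link() -> gen_server:start_link(?MODULE, [], []).

%% The setters are asynchronous, the result is a synchronous call.
set_a(Pid, A) -> gen_server:cast(Pid, {set_a, A}).
set_b(Pid, B) -> gen_server:cast(Pid, {set_b, B}).
result(Pid)   -> gen_server:call(Pid, result).

%% The state is the tuple {A, B}, just like in the plain module.
init([]) -> {ok, {0, 0}}.

handle_cast({set_a, A}, {_, B}) -> {noreply, {A, B}};
handle_cast({set_b, B}, {A, _}) -> {noreply, {A, B}}.

handle_call(result, _From, {A, B} = State) ->
    {reply, A + B, State}.
```

The receive loop and the reply protocol disappear; only the handling of the individual messages remains in the callback module.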


In a typical sequential way each retrieved insurance holder would be processed step by step and typically on a single processor. Processing a large number of insurance holders and their contracts would take a long time. One solution could be the usage of multi-threading for the filters together with synchronized data queues for the pipes. Using inheritance simplifies the implementation. This solution would use multiple cores and processors. But there's still a limitation in the distribution of the filters or groups of filters. For example everything up to the premium filter could be on a first system, and both branches behind it on two further systems. With most languages you would need special pipes for the remote communication, which again would make the whole solution more complex.

To distribute processes in Erlang it would only be necessary to add another function:
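In Erlang the pipes come for free, because every process already has a message queue. A minimal sketch of a linear chain of filters, where each filter is a process applying one fun and forwarding the result; the module pipeline and its message format are assumptions for illustration, not part of the scenario above:

```erlang
-module(pipeline).
-export([start/2, run/2]).

%% Start one filter process which applies Fun to every item
%% and forwards the result to the process Next.
start(Fun, Next) ->
    spawn(fun() -> filter_loop(Fun, Next) end).

filter_loop(Fun, Next) ->
    receive
        {item, Item} ->
            Next ! {item, Fun(Item)},
            filter_loop(Fun, Next);
        stop ->
            Next ! stop
    end.

%% Build the chain from the last filter to the first one so that it
%% ends at the caller, feed all items into it, and collect the results.
run(Funs, Items) ->
    First = lists:foldr(fun start/2, self(), Funs),
    [First ! {item, Item} || Item <- Items],
    First ! stop,
    collect([]).

collect(Acc) ->
    receive
        {item, Result} -> collect([Result | Acc]);
        stop -> lists:reverse(Acc)
    end.
```

For example, pipeline:run([fun(X) -> X + 1 end, fun(X) -> X * 2 end], [1, 2, 3]) returns [4, 6, 8]. While the second filter doubles the first item, the first filter can already work on the next one.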

create(Node) ->
spawn(Node, ?MODULE, loop, [0, 0]).

This way the adder - or in the scenario above a filter - could be started on a different node and be used as if it were working locally.

Pid = adder2:create('my_node@my-server.in.my.net')

Beside that there's no need for more implementation. Only the VMs have to be started with a name, a cookie for securing the networking, and a host file with the names of all the nodes. It's funny how simple it is. The example above also shows how Erlang handles polymorphism. One way is the arity of the functions. That's the reason why the export of functions also contains the number of arguments. Here's one small example defining a function in the adder module.

-module(adder).
-export([add/1]).

add(List) ->
    add(List, 0).

add([Head|Tail], Acc) ->
    add(Tail, Acc + Head);
add([], Acc) ->
    Acc.

Only the add function with one argument will be exported; the other one is internal. It also shows the second way of polymorphism, pattern matching. While there are elements in the list the first of the two add clauses with two arguments will be executed. It adds the head element to the accumulator and continues recursively with the tail. If the list is empty the second one is called, which returns the accumulator as the result. Besides function clauses, pattern matching also works in case and receive expressions (if expressions only use guards). I've shown this already in the adder process function above. It's easy to see how the received tuples could contain the same command atom as the first element and then a different number of arguments as the further elements.

Another way to realize polymorphism is guards. Those are constraints which can be added to function definitions and pattern matches. One major task of guards is type checking. Erlang uses duck typing, so the arity is sometimes not enough, e.g. for a function that appends the string representation of anything to a string.

string_append(String, Float) when is_float(Float) ->
    ...;
string_append(String, Integer) when is_integer(Integer) ->
    ...;
string_append(String, Tuple) when is_tuple(Tuple) ->
    ...

Multiple guards can also be combined using a semicolon (or) and a comma (and). They are evaluated with short-circuiting to increase performance. Their flexible definition additionally allows a more powerful polymorphism than in traditional languages. Think about a process for withdrawals which shall handle different amounts differently:

loop(State) ->
    receive
        {withdraw, Amount, Account, Lo, Hi} when Amount =< Lo ->
            % Perform a standard withdraw.
            ...;
        {withdraw, Amount, Account, Lo, Hi} when Amount > Hi ->
            % Perform a special customer approval before the withdraw.
            ...;
        {withdraw, Amount, Account, Lo, Hi} ->
            % Perform a simple customer approval for withdrawals between lo and hi.
            ...
    end.
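The combination of guard tests with comma and semicolon mentioned before can be shown in a few lines. The module guard_demo and the returned atoms are only illustrative.

```erlang
-module(guard_demo).
-export([classify/1]).

%% Comma: both tests must hold.
classify(X) when is_integer(X), X > 0 -> positive_integer;
%% Semicolon: one of the tests is enough.
classify(X) when is_integer(X); is_float(X) -> other_number;
classify(_) -> not_a_number.
```

The clauses are tried from top to bottom, so 5 is a positive_integer, while -3 and 2.5 fall through to other_number and the atom foo ends up as not_a_number.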

One big part of object-orientation is still missing: the inheritance. Here Erlang has no real solution in the sense of deep hierarchies based on one root class. But with behaviours and callbacks you can at least realize something like abstract classes and their children. The OTP libraries use this for several powerful modules. Here's my very small implementation of the generic server. The original one is by far more sophisticated.

-module(server).
-export([start/2, stop/1, call/2]).
% loop/2 must be exported as well, because spawn/3 calls it by name.
-export([loop/2]).

start(Module, Args) ->
    % Call init/1 in Module.
    % It has to return an initial state.
    State = Module:init(Args),
    spawn(?MODULE, loop, [Module, State]).

stop(Pid) ->
    Pid ! stop,
    ok.

call(Pid, Msg) ->
    Pid ! {call, Msg, self()},
    receive
        {response, Value} -> Value
    end.

loop(Module, State) ->
    receive
        {call, Msg, Pid} ->
            % Call function handle/2 in Module.
            % It has to return {Value, NewState}.
            {Value, NewState} = Module:handle(Msg, State),
            Pid ! {response, Value},
            loop(Module, NewState);
        stop ->
            % Call function terminate/1 in Module.
            Module:terminate(State)
    end.

This way the developer just has to implement the three functions init/1, handle/2 for each message, and terminate/1.

-module(account_server).
-export([init/1, handle/2, terminate/1]).

init(Args) ->
% Create an initial state, e.g. a database connection.
...

handle({open, Account}, State) ->
...,
{0, NewState};
handle({withdraw, Amount, Account}, State) ->
...,
{Balance, NewState};
handle({deposit, Amount, Account}, State) ->
...,
{Balance, NewState};
handle({balance, Account}, State) ->
...,
{Balance, NewState}.

terminate(State) ->
...,
ok.

So a simple session could be:

Pid = server:start(account_server, DatabaseName),

server:call(Pid, {open, 4711}),
server:call(Pid, {deposit, 1000.0, 4711}),
server:call(Pid, {withdraw, 250.0, 4711}),

Balance = server:call(Pid, {balance, 4711}),

% Balance now should be 750.0.

server:stop(Pid).

As written above this is typically handled in a more powerful and elegant way, but this example should be enough to let you understand how Erlang processes can be seen as a kind of object. In addition to the features I mentioned here, the receive construct also knows a time-based action which is executed when no message has arrived for a given time. And through a simple mechanism parent processes can be notified if a child dies. Both features again allow more powerful solutions. Maybe this is reason enough for you to be as interested as I am in developing with Erlang.
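Both features fit into a few lines. The following sketch uses the message format of the small generic server above for the timeout, and a monitor (one of the available notification mechanisms) for the dying child. The module name watch_demo and the return values are assumptions.

```erlang
-module(watch_demo).
-export([call_with_timeout/3, watch/1]).

%% Send Msg to Pid and wait at most Timeout milliseconds for the
%% response. Without the after clause the caller would block
%% forever if the other process dies or never answers.
call_with_timeout(Pid, Msg, Timeout) ->
    Pid ! {call, Msg, self()},
    receive
        {response, Value} -> {ok, Value}
    after Timeout ->
        {error, timeout}
    end.

%% Spawn Fun as a monitored child and wait for the 'DOWN' message
%% which is delivered to the parent when the child terminates.
watch(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> {child_down, Reason}
    end.
```

A supervisor is essentially a more elaborate version of the second function: it waits for such notifications and restarts the child instead of just reporting the reason.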

Friday, 18 April 2008

Exciting days

These days are a bit exciting, with only little spare time for me. It started last Friday with a quick flight to Poland and the birthday party of my brother-in-law in the evening. On Sunday we celebrated the confirmation of our niece. Then from Monday till Thursday I had two courses in software architecture by the CMU SEI - Software Architecture: Principles and Practices (SAPP) and Documenting Software Architectures (DSA). On Monday and Tuesday Software Product Lines (SPL) and in May Software Architecture Design and Analysis (SADA) will follow. Last Tuesday I finished the work on my Erlang article, which will be published next month. Today our little daughter Vanessa has her 12th birthday and on Sunday our older daughter Janina has her confirmation. *phew*

But I also had the chance to start the next improvement on the Tideland CEL Lightweight Message Bus. I'm currently adding a registry for the dynamic resolution of service names in a set of networked nodes. So if a publish can't be addressed to a local service the broker will retrieve a reference to an instance from the other nodes and cache this information. In the next step I'll add some kind of aspect orientation. So functions for cross-cutting concerns can be assigned to services so that they are executed before, after or around a service function.

My postings on COP and OOP led to much interest and response. Some of the comments in other forums showed that the writers don't know Erlang at all. In their eyes functional programming and object-orientation are diametrically opposed. So how about CLOS? Hmmm. But others could follow. So I'll write a small introduction: Erlang for the OO-minded.

Monday, 7 April 2008

Ideas for an Erlang Object System

Using the Erlang concurrency-oriented style for object-oriented programming lacks the elegance of languages like Smalltalk. One way to solve this problem would be a pre-processor with its own syntax generating the Erlang code. But I don't like this solution because it would feel like a foreign body. Additionally this language would have to be complete, documented, and able to use the Erlang libs. So a different idea would be a simple library, almost like gen_server, but more with OO ideas in mind. It would rely on callback modules together with the dynamic invocation of functions. First a call of ObjRef = eos:new(my_module) or ObjRef = eos:new(my_module, Args) would create an instance as a new linked process. The initialization could use my_module:new(Args, InitialState). The initial state would be the result of a recursive initialization through calling my_module:parent_module() and initializing those modules. This behaviour, calling parent_module/0 until it returns undefined or doesn't exist in a module, would be the basis for inheritance.

To simplify life the EOS should only support synchronous method invocations and ignore all other Erlang messages. The call of Result = eos:invoke(ObjRef, my_method, Args) would lead to the call of my_module:my_method(Args, State) and has to be answered with Result or {Result, NewState}. The function invoke/3 would look for the function inside the module and, if it isn't exported in that module, recursively in the parent modules. If it can't be found it would try to invoke does_not_understand/2 the same way. If even this function can't be found the system should raise an error.
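The lookup through the parent modules could be sketched like this. The EOS is only an idea, so the module eos_lookup, the function resolve/3, and its return values are assumptions. Only the convention of an optional parent_module/0 comes from the description above.

```erlang
-module(eos_lookup).
-export([resolve/3]).

%% Find the first module in the parent chain which exports Fun/Arity.
resolve(undefined, _Fun, _Arity) ->
    {error, not_found};
resolve(Module, Fun, Arity) ->
    %% function_exported/3 only knows loaded modules.
    _ = code:ensure_loaded(Module),
    case erlang:function_exported(Module, Fun, Arity) of
        true  -> {ok, Module};
        false -> resolve(parent_of(Module), Fun, Arity)
    end.

%% A module without parent_module/0 is treated as a root.
parent_of(Module) ->
    case erlang:function_exported(Module, parent_module, 0) of
        true  -> Module:parent_module();
        false -> undefined
    end.
```

An invoke/3 built on top of it would call resolve/3 for the method with arity 2, then for does_not_understand/2, and raise an error if both return {error, not_found}.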

The disposal of the object could be done manually through eos:dispose(ObjRef) or automatically using the typical Erlang mechanism of linked processes and their notification. Altogether this system is really simple and it definitely doesn't compete with the standard OO languages. But it may help some experienced OO developers to feel more at home in the Erlang world. What do you say?

Sunday, 6 April 2008

COP - The better way of OOP?

The last weeks I haven't had as much time for the development of my Erlang software as I wanted, but I've been busy refactoring the Tideland CEL Lightweight Message Bus. The goal was to reach the beta state, but after some tests and the development of a first real service I realized where I still have to do some work. Now the system follows the OTP design principles more than before, supports stateful services, has an improved load behaviour, scales better, and the API is simpler. *smile* Currently I'm doing some stability tests where services are restarted automatically when they die. But that's not the focus of my entry today.

I've been developing software the object-oriented way for about 20 years now, mostly in Smalltalk, Python, and Java. The common paradigm in many tutorials has been that objects are a kind of things with some knowledge, communicating with each other by sending messages and receiving the answers. But when programming in those systems it feels more like calling functions with an invisible record as the first argument to access the record fields. The program itself is a kind of imperative programming. OK, there are inheritance, the overriding of methods, and polymorphism. So there's a bit more, yes. And with threads or processes there are classes which allow an asynchronous execution. But after all there's still no feeling of really independent objects populating a common world, living together, acting autonomously, communicating through real messages.

But now, after learning concurrency-oriented programming with Erlang, it's different. At first glance Erlang seems to be a strange language, with Prolog roots and a functional way of working. So you have to get accustomed to the pattern matching - it's great - and the fact that every variable can only be written once. A process is just a function that is spawned to work in the background. It can run once or endlessly through tail recursion. But the really important fact is that every process has a queue for the asynchronous receiving of real messages. The receive construct uses the typical Erlang pattern matching so that a process can handle different messages differently. Additionally the construct can contain a timeout clause for automatic tasks after some time of idling. Links and monitors allow processes to get notified if another process dies, once again through messages sent to the monitoring process.

my_object(State) ->
    receive
        {method_a, Arg1, Arg2} ->
            NewState = do_method_a(State, Arg1, Arg2),
            my_object(NewState);
        {method_b, Arg1} ->
            NewState = do_method_b(State, Arg1),
            my_object(NewState);
        {'EXIT', Pid, Reason} ->
            NewState = handle_exit(Pid, Reason),
            my_object(NewState)
    end.

This may look inconvenient. But the generic OTP modules like the gen_server and callbacks hide this mechanism and allow quick, convenient, and powerful implementations. Own generic modules can integrate other ones, so some kind of inheritance can be implemented. In case of my Lightweight Message Bus the services are simply modules subscribing to the bus through

cellmb:subscribe(service_name, my_service_module)

In case of a stateful service a call of

cellmb:publish(Ctx, service_name, do_something, Args)

would lead to the execution of

my_service_module:do_something(Ctx, Args, State).

This way the implementation of own services is really simple. After some time of learning, working with those kinds of objects gets more and more natural. You don't even have to do very much to distribute those processes over multiple cores, processors, nodes, or computers. But you've got to rethink your knowledge about application design to make optimal use of this concurrency-based behaviour. It is still not trivial to find out if and how problems can be solved through parallel execution.

What's missing: Not everything is an object, only those processes working with receive and tail recursion. So you can't ask a string for its length, like it can be done in Smalltalk. Instead Erlang provides helpful libraries for the work with the standard and higher-level data types. If a pure object-oriented language is required, Erlang doesn't fit. But for me this doesn't hurt. I'm a fan of the clean style of Smalltalk. Nevertheless Erlang is productive and expressive. So why care if everything is an object? And the question hasn't been about the language, it has been about concurrency-oriented programming as a way of object-oriented programming. A typical Erlang system consists of up to several thousand processes on each node, processes like the ones I've described above. They are based on the generic server, the event handler, the finite state machine, the supervisor, or own implementations. And they all work like objects in a real world, really parallel, communicating with each other when needed. For me this behaviour seems to be the better way of object-oriented programming.

Saturday, 16 February 2008

Erlang guards

Like matches, guards are a very powerful instrument to split a large, complex function into short, more maintainable clauses. Let's take for example a server internally using a process pool. This pool shall only hand out managed active processes up to a maximum number, and take them back until a maximum number of free ones is reached. So the functions may be

handle_cast({do_it, Task}, S) when S#state.actno >= ?MAX_ACT_NO ->
    % Too many active processes, resend message.
    gen_server:cast(?MODULE, {do_it, Task}),
    {noreply, S};
handle_cast({do_it, Task}, S) when S#state.freeno > 0 ->
    % Use a free process.
    ...
    {noreply, NewS};
handle_cast({do_it, Task}, S) ->
    % Create a new process and use it.
    ...
    {noreply, NewS};
handle_cast({return_it, Proc}, S) when S#state.freeno >= ?MAX_FREE_NO ->
    % Dispose process.
    ...
    {noreply, NewS};
handle_cast({return_it, Proc}, S) -> 
    % Return the worker to the pool.
    ...
    {noreply, NewS}.
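The clauses above rely on a state record and two macros which are not shown. A self-contained sketch of the same decisions as pure functions, with assumed limits and an assumed record layout (only the field names actno and freeno are taken from the guards above), could look like this:

```erlang
-module(pool_policy).
-export([new/2, on_request/1, on_return/1]).

%% Hypothetical limits and state layout for illustration.
-define(MAX_ACT_NO, 10).
-define(MAX_FREE_NO, 3).
-record(state, {actno = 0, freeno = 0}).

%% Build a state with the given numbers of active and free processes.
new(ActNo, FreeNo) ->
    #state{actno = ActNo, freeno = FreeNo}.

%% What to do when a task arrives.
on_request(S) when S#state.actno >= ?MAX_ACT_NO -> retry_later;
on_request(S) when S#state.freeno > 0 -> use_free_process;
on_request(_S) -> create_process.

%% What to do when a worker is handed back.
on_return(S) when S#state.freeno >= ?MAX_FREE_NO -> dispose_process;
on_return(_S) -> keep_in_pool.
```

Separating the decision from the side effects this way also makes the guard logic easy to test without starting a server.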

Saturday, 9 February 2008

Experiences made with Erlang while developing the CEL

Since the end of August last year I've been reading about and developing in Erlang. The current state is that I'll release the Tideland Common Erlang Library (CEL) as public beta in a few days. I just have to review those modules I haven't worked on in the last weeks, but then I'll tag it in our Subversion repository and also create a package for download. The modules inside the CEL are

  • the CELSML for parsing documents in the Simple Markup Language,

  • the CELSTH to convert SML into HTML,

  • the CELETM for execution time monitoring,

  • some helpful utility functions in CELUTL, and

  • the Lightweight Message Bus CELLMB.

Especially the development of the last one devoured some time. First trials without the right knowledge about the system led to bad designs. But through those experiments and through reading Joe Armstrong's book the concepts got better and the implementation more and more reliable. Especially those features of Erlang, the concentration on reliability, high availability, and scalability, fascinate me due to my professional background in mainframe and highly available Unix environments.

So one of the major experiences is that this Erlang support is really good and that mechanisms like unchangeable variables and lightweight processes which share nothing are extremely powerful. But in return the asynchronous message handling especially needs special attention. Processes may die and never send an answer, so working with timeouts is really helpful. But in times of very high load those timeouts may be exceeded. So your applications need to handle those situations properly. After my functional unit tests I made some stress tests where I discovered this behaviour and wondered why my tests failed. So I can recommend everyone to start those stress tests early. Another important point when handling high loads is to keep an eye on the number of active processes. If processes are spawned too fast, their number may exceed the configured VM limit. So this limit should be set to a high level, or the spawning of new processes should be stopped until others have finished their work. The CELLMB provides a configurable limit, and received messages are resent to the bus until there are free resources again.

Another feature of Erlang I really appreciate is the combination of pattern matching and guards, especially for the declaration of functions. So instead of a large and nested application logic inside one function, a set of small and handy function clauses is used. Those are far easier to maintain. Many of those matches and guards are based on single arguments. But very often tuples are used, a simple and powerful mechanism. While short tuples are easy to handle, it gets more and more awkward when they grow and are used across module borders. Modifying them is especially bad. So even if their syntax is a bit strange, this is the right situation to use records. They are helpful in both forms, externally included or just internally inside a module.
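The VM itself reports both the current number of processes and the limit, so a guard against spawning too fast can be sketched in a few lines. This is not how the CELLMB does it; the module spawn_guard and the Reserve parameter are assumptions.

```erlang
-module(spawn_guard).
-export([headroom/0, maybe_spawn/2]).

%% Number of processes which can still be created on this VM.
headroom() ->
    erlang:system_info(process_limit)
        - erlang:system_info(process_count).

%% Spawn Fun only while more than Reserve process slots are free.
%% Otherwise the caller has to retry later, e.g. by resending
%% the message to itself.
maybe_spawn(Fun, Reserve) ->
    case headroom() > Reserve of
        true  -> {ok, spawn(Fun)};
        false -> {error, too_many_processes}
    end.
```

The limit itself is set when the VM is started, so the check only protects against reaching it, it doesn't raise it.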

What's helpful for generic frameworks are callbacks. They provide an easy way to create extensible frameworks where the user can plug-in his own logic. So the subscribed services to the CELLMB are just modules providing defined callbacks for start, stop, the test if they are interested in a message, and a function for each command or one generic process function. When a service subscribes with

cellmb:subscribe(myshopmodule, SvcArgs)

it will be initialized with

myshopmodule:start(SvcArgs).

This function can be empty, e.g. for stateless services, but it can also be used to start a process. A message published by the client through

MH = cellmb:publish(shop, purchase, [MyBasket]).

leads to the call of

myshopmodule:purchase([MyBasket], Context, SvcArgs, Msg).

if the module answered with a well defined priority after being asked through

myshopmodule:processes(Msg, SvcArgs).

Why that priority? Several modules may be interested in processing this message. By returning a priority in a defined range they can control the order of processing and pass information between them using the context. This allows very flexible configurations. After all services have done their work their collected responses are sent back to the client process and can be retrieved with

Responses = cellmb:rcvres(MH, Timeout).

Additionally synchronous combinations of publish and receive, a unidirectional send function, and many convenience functions are provided.

What else? Funs are always really useful, like in other dynamic languages. Exception handling is easy, and the library is really great for distributed server-side applications. I still need more experience with the supervision tree and monitoring, and I'm looking forward to my first experiences with Mnesia now that I'm developing some standard services for authentication, role-based authorization, the management of access control lists, user management, and address management using the structures defined in the vCard standard.

Thursday, 24 January 2008

Lightweight Message Bus development and test progress

After testing synchronous requests for the Lightweight Message Bus, today I've tested asynchronous requests with responses received out of order. Everything works fine: simple services, multiple services answering a single request, and the orchestration of services. So I now only have to test some convenience functions, the timeout behaviour when the time is exceeded, and massively parallel regression testing.

My next step after those tests will be a simple web connector for the CELLMB. POST requests with serialized Erlang data structures will be converted into a message and published onto the bus. The responses will be serialized and sent back. URI parts control service, command, correlation id, and timeout.
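The serialization format is not decided here. One obvious candidate is the external term format of Erlang itself. A minimal sketch under that assumption, with a hypothetical module wire:

```erlang
-module(wire).
-export([encode/1, decode/1]).

%% Serialize any Erlang term into a binary, e.g. for a POST body.
encode(Term) ->
    term_to_binary(Term).

%% Decode a received body. The safe option refuses data which
%% would create new atoms, and garbage is reported as an error
%% instead of crashing the connector.
decode(Bin) when is_binary(Bin) ->
    try
        {ok, binary_to_term(Bin, [safe])}
    catch
        error:badarg -> {error, invalid_payload}
    end.
```

Since the data comes from outside the node, decoding it defensively matters more than on the internal bus.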

The idea behind it is a combination of using Seaside, which still is the best web framework, for the frontend and Erlang as the runtime environment for my backend services using the bus and Mnesia as database. So I'm able to use the right tools for the right job.

Sunday, 20 January 2008

Ah, I've got it

Yesterday my order of Joe Armstrong's "Programming Erlang - Software for a Concurrent World" arrived. Even if I've discovered many aspects of the language and OTP through the documentation and other web sources, this book is really a help. I've already read a third of it and found several useful hints or clarifications where I had been uncertain. It will even help me finish my Erlang article this week. It will cover the history, the language, the OTP, and some notes about existing Erlang solutions.

Thursday, 10 January 2008

Success through simplicity

After a good and promising start with the development of my Lightweight Message Bus (CELLMB) I got stuck in mid-December. I was unhappy with my architecture, so I used the holidays to refactor it. Now it works far better, with less code and better maintainability. The key was a simpler architecture.

My requirements for the CELLMB are to provide a node-internal bus using the publish/subscribe messaging paradigm, allowing asynchronous and synchronous requests. My first approach was composed of a central process for managing the subscriptions and passing the messages to one out of a pool of dispatcher processes. The subscribed services were single or pooled processes which received the message through the dispatcher. A simple API helped them to send their responses back to the dispatcher, which collected them. The calling client process could fetch those responses from the dispatcher by passing a message handle to the fetch function. As you may perceive, this is not very intuitive or simple. Additionally, the subscription mechanism was inflexible, and synchronous requests had the problem that you don't know whether all subscribed services have finished processing the message.

The refactored CELLMB also has a central managing process which controls the subscriptions, but now with a larger pool of worker processes. The size of this pool grows and shrinks based on parameters for the maximum size and a timeout for idling workers. Services are now just modules with a defined callback interface. It contains the four functions start, stop, is_processing and process. The messages are now passed round robin to a worker process, which filters the subscribed services by calling is_processing for the message and then does the real work by calling process on each interested service. The results are collected and, in case of a synchronous call, passed back to the caller. Additional optional arguments for the service start and for the subscription test of each message help to develop customizable services - e.g. two accounting services realizing different workflows, both subscribed but interested in different user-defined amounts - and a decorator for each call of a service can be defined. Altogether this straightforward approach needs far less code than the first try. Additionally, it's easier to follow the control flow.
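What a worker does with one message might look roughly like the following sketch. The four callback names come from the description above; their arities, the result format, and the demo service (the module doubles as its own example service) are assumptions.

```erlang
-module(cellmb_worker_sketch).
-export([dispatch/2]).
%% Callbacks of a demo service, arities assumed.
-export([start/0, stop/0, is_processing/1, process/1]).

%% Ask each service module whether it is interested in the message,
%% call process/1 on the interested ones and collect the results.
dispatch(Services, Message) ->
    [{Service, Service:process(Message)}
     || Service <- Services, Service:is_processing(Message)].

%% Demo service: an accounting step that only cares about large orders.
start() -> ok.
stop() -> ok.

is_processing({order, Amount}) -> Amount >= 1000;
is_processing(_Other) -> false.

process({order, Amount}) -> {approval_required, Amount}.
```

Because a service is only a module name here, subscribing is a matter of keeping a list of atoms, which is one reason this design can get by with little code.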

The first tests run fine and I will now add more complex test scenarios with a higher number of subscribing services and parallel calls. So I'll get a nice reliable infrastructure to run my business components.

Wednesday, 2 January 2008

Making it right

In a forum about programming Erlang a member documented his learning based on the problems of Project Euler. A very interesting approach. But after tackling problem 10 (Find the sum of all the primes below one million.) he was a bit disappointed. He wasn't able to find a solution running in less than a minute. So I started a quick approach of my own. It was a hack that doesn't use the lists library. A tail recursive function generates a list of all primes by testing every odd number greater than 2 up to the given number - here one million - and adding each prime it finds to the front of a list. The prime test is very simple too, checking the remainder for the odd numbers up to the square root of the number to check. The last step is just the addition of the results, again through a simple tail recursive function. Surely there's a more elegant and efficient approach.
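The original hack isn't shown here, so the following is only a reconstruction of the approach as described; module and function names are made up.

```erlang
-module(prime_sum_sketch).
-export([sum_primes_below/1]).

%% Sum of all primes below Limit.
sum_primes_below(Limit) when Limit =< 2 -> 0;
sum_primes_below(Limit) ->
    sum(collect(3, Limit, [2]), 0).

%% Test the odd candidates below Limit and prepend each prime to the list.
collect(N, Limit, Acc) when N >= Limit -> Acc;
collect(N, Limit, Acc) ->
    case is_prime(N, 3) of
        true  -> collect(N + 2, Limit, [N | Acc]);
        false -> collect(N + 2, Limit, Acc)
    end.

%% Trial division of an odd N by odd divisors up to the square root of N.
is_prime(N, D) when D * D > N -> true;
is_prime(N, D) when N rem D =:= 0 -> false;
is_prime(N, D) -> is_prime(N, D + 2).

%% Tail recursive addition of the collected primes.
sum([], Total) -> Total;
sum([H | T], Total) -> sum(T, Total + H).
```

For example, prime_sum_sketch:sum_primes_below(10) adds 2, 3, 5 and 7 and returns 17.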

The result was a runtime of 1.6 seconds on my MacBook and 6 seconds on his notebook. He then tried different algorithms, all of them with even worse results. His approaches were all based on the Sieve of Eratosthenes, which relies on creating new lists through filtering. So his algorithms traversed the lists over and over again, filtering the elements through fun expressions and creating the result lists by appending the elements.

Knowing about Erlang's concept of linked lists, the resulting performance problems when appending data at the end, and the worse performance of fun expressions compared to functions, the result isn't astounding. So what's the bottom line? For a given problem, regardless of whether it is to be solved in a functional, object-oriented, logical, sequential or concurrent, local or distributed way, you've got to know the boundaries of your tool and your environment to make it right. In this case, where he was learning the language, it was OK. But violating constraints in real-world solutions often leads to huge problems once the system is in production.
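The cost of appending to a linked list can be made visible with two ways of building the same list. This is a small illustration with made-up names, not code from the forum thread.

```erlang
-module(list_build_sketch).
-export([squares_append/1, squares_prepend/1]).

%% Appending with ++ copies the whole accumulator on every step,
%% so building N elements takes quadratic time.
squares_append(N) -> build_append(1, N, []).

build_append(I, N, Acc) when I > N -> Acc;
build_append(I, N, Acc) -> build_append(I + 1, N, Acc ++ [I * I]).

%% Prepending is constant time per step; a single reverse at the end
%% restores the order, so the whole build is linear.
squares_prepend(N) -> build_prepend(1, N, []).

build_prepend(I, N, Acc) when I > N -> lists:reverse(Acc);
build_prepend(I, N, Acc) -> build_prepend(I + 1, N, [I * I | Acc]).
```

Both functions return the same list; only the second one stays fast when N gets large.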

And how to choose the right tool for a given problem, that's another topic I'll treat later. *smile*

Tuesday, 13 November 2007

Lightweight Message Bus

Over the last days I've worked intensively on the Lightweight Message Bus (CELLMB). After a first approach with everything in one module it is now split up into three modules, one more general and two for the CELLMB. The first one is the Service Process Pool (CELSPP), which starts service processes up to a defined number. The start function returns a process ID which can be used as if there were only one single process. But instead of handling the received messages itself, the pool resends them round robin to the service processes. So here, and in its lazy automatic process creation, it differs from pg2. A normal exit is resent to all service processes so that they terminate gracefully, and dead processes are removed and recreated automatically on demand. A later extension will allow the processes to opt out and terminate after an idle time.
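A much reduced sketch of such a pool could look as follows. It only shows the lazy creation and the round robin forwarding; removing and recreating dead processes is left out. The stop message and the worker fun are assumptions, and this is not the CELSPP code.

```erlang
-module(celspp_sketch).
-export([start/2]).

%% Returns one pid that stands for the whole pool. WorkerFun is a
%% hypothetical fun/1 that a worker calls for each message.
start(WorkerFun, Max) when Max > 0 ->
    spawn(fun() -> loop(WorkerFun, Max, [], []) end).

%% Idle holds the workers not yet used in this round, Used the others.
loop(WorkerFun, Max, Idle, Used) ->
    receive
        stop ->
            %% Pass the stop on so that all workers terminate gracefully.
            [W ! stop || W <- Idle ++ Used],
            ok;
        Message ->
            {Worker, Idle1, Used1} = next(WorkerFun, Max, Idle, Used),
            Worker ! Message,
            loop(WorkerFun, Max, Idle1, Used1)
    end.

%% Create workers lazily until Max is reached, then rotate round robin.
next(WorkerFun, Max, Idle, Used) when length(Idle) + length(Used) < Max ->
    W = spawn(fun() -> worker(WorkerFun) end),
    {W, Idle, [W | Used]};
next(_WorkerFun, _Max, [W | Idle], Used) ->
    {W, Idle, [W | Used]};
next(WorkerFun, Max, [], Used) ->
    next(WorkerFun, Max, lists:reverse(Used), []).

worker(WorkerFun) ->
    receive
        stop -> ok;
        Message -> WorkerFun(Message), worker(WorkerFun)
    end.
```

With a pool of two, the first and the third message end up at the same worker, the second at the other one.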

The CELLMB now uses this pool for the internal dispatching of messages with a given context, verb, and noun to subscribed processes. Those can be single processes or pooled processes using the CELSPP. To keep the development of those processes simple, the module CELLMBSVC provides a generic behaviour. Only the callbacks init, handle and terminate have to be implemented - besides any helper functions that may be needed, of course.

The next step will be the implementation of the unit tests for the CELLMB and then the first services for my Erlang Business Library (EBL), which I'll use in my different portal projects. Those will handle configuration, authentication, authorization, client management, user management, and addresses based on vCards. They will all use Mnesia as their persistence backend and will be loosely coupled, using the CELLMB for communication.

Saturday, 27 October 2007

Ending the silence

The time between today and my last entry is about three weeks. Over twenty days in which I haven't had the time to write even a word. *sigh* But now it can go on.

During this time I've mostly been busy starting a larger project at my employer. My role here is system architect, and I'm also taking part in rolling out requirements engineering together with the standards definition and tool evaluation. We decided on IRqA, a really nice tool, especially in its next release. Visure gave us a little preview. The tool is more powerful than DOORS, cheaper, and the service is really, really good. So now we're on our way, analysing our customers' requirements and writing fine use cases. *smile*

A large part of my time at home went into my current article about continuations. Even though it is a small article it cost a lot of work. I had a lot of trouble finding the right words to demonstrate this technique in a practical way. But now it is finished and delivered and I can concentrate on my next article about Erlang.

Besides doing serious things I've also wasted some time playing around with my new gadget. I wanted my new mobile phone to be a smartphone, and after comparing several models to find out what exactly I want, I decided on the Nokia E61i. It's a really nice device, not too large, powerful, and it plays wonderfully with my Mac - I've already installed a neat OS X theme - and with some WLANs where I've tried it.

So now I can go on developing in Erlang and write about it - here and in my next article. This system still fascinates me and I see a very large potential for future applications. Even if it is not Erlang itself, its technology will influence many others. The current discussions on the Squeak lists already demonstrate this.

Sunday, 7 October 2007

Comments wanted

An important part of the CEL will be the lightweight message bus (LMB), an infrastructure providing a publish/subscribe mechanism. I've described the concept in our wiki and would like to receive some comments from experienced Erlang developers here.

Sunday, 30 September 2007

Small features leading to more reliability

For some years I've always ported one helpful library to the programming languages I use. So I've got it in Java, Python, Smalltalk and now Erlang. It is the execution time monitor CELETM. It gathers useful information about how often blocks are called, the minimum execution time, the maximum time, the average time and the total time. The two ways to use it are

Measuring = celetm:begin_measuring({?MODULE, my_block}),
do_this(),
do_that(),
celetm:end_measuring(Measuring).

or
celetm:measure({?MODULE, my_func}, fun() -> my_func() end).
as a convenience function. The accumulated measurings can be retrieved with celetm:retrieve() or displayed as a table in the shell with celetm:io(). Options for sorting and filtering or output to an i/o device exist. So what's the special feature here? My old solutions always had two central tables, one as a buffer for collecting the single measurings, one for the accumulated measurings. The access to both was controlled by synchronized blocks or semaphores. Under high load the calling processes, e.g. in a server, would have to wait, a typical bottleneck. The operation here is fast enough, so I never had this problem. But it's easy to imagine similar situations with longer lasting central functions. The Erlang solution uses a gen_server as a backend, and celetm:end_measuring() sends an asynchronous cast message. So it's fire-and-forget for the calling process. The only synchronous functions are celetm:retrieve() and celetm:io(). So only they would have to wait. And thanks to the Erlang messaging the implementation has been a really easy job with the shortest code ever.
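The split between an asynchronous cast and a synchronous call can be sketched with a minimal gen_server. The function names begin_measuring, end_measuring and retrieve come from the text above; the state layout, the result format and everything else are assumptions, not the CELETM implementation.

```erlang
-module(celetm_sketch).
-behaviour(gen_server).
-export([start_link/0, begin_measuring/1, end_measuring/1, retrieve/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Runs entirely in the calling process; no message is sent.
begin_measuring(Id) ->
    {Id, erlang:monotonic_time(microsecond)}.

%% Fire-and-forget: the caller does not wait for the server.
end_measuring({Id, Start}) ->
    Elapsed = erlang:monotonic_time(microsecond) - Start,
    gen_server:cast(?MODULE, {measured, Id, Elapsed}).

%% Synchronous: returns [{Id, {Count, Min, Max, Total}}] (assumed format).
retrieve() ->
    gen_server:call(?MODULE, retrieve).

init([]) ->
    {ok, #{}}.

handle_call(retrieve, _From, Stats) ->
    {reply, maps:to_list(Stats), Stats}.

%% Accumulate one measuring into the entry of its id.
handle_cast({measured, Id, T}, Stats) ->
    Entry = case maps:find(Id, Stats) of
                {ok, {Count, Min, Max, Total}} ->
                    {Count + 1, min(Min, T), max(Max, T), Total + T};
                error ->
                    {1, T, T, T}
            end,
    {noreply, Stats#{Id => Entry}}.
```

The average is not stored; it can be derived from Total and Count when the data is displayed.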

Thursday, 27 September 2007

Future plans

In a comment I've been asked about my plans regarding the Common Erlang Library (CEL). So I'll now write about the current status and my future plans for the Erlang development. Before I talk about the products I've got to say that right now I'm just evaluating Erlang. I've developed lots of stuff in Smalltalk and I still love that language. But as you may know I'm focused on server-side applications and I'm interested in scalability and reliability. So here Erlang is really interesting for me and I may switch. The evaluation experiences so far are really good. And I'll definitely release most of my code as open source when it reaches beta status.

The first project I'll realize in Erlang is the above mentioned CEL. I've got such libraries for almost every language I use, as a stack of useful features I need. You've already read about the simple markup language (CELSML), which is a library for the creation, parsing, searching and later manipulation of SML documents, a lightweight alternative to XML. CELSTH is based on this library and contains an extensible SML-to-HTML converter. Another helpful library is the execution time monitor CELETM, which I've also implemented in almost every environment I use. *smile* It allows simple execution time measuring and accumulation for logical blocks. The biggest part of the CEL will be the lightweight message bus CELLMB. It will provide the asynchronous distribution of messages to registered processes, realized using the publish/subscribe paradigm and based on a context, a verb, a noun, and optional metadata. So a request check permission sent in a context authorization will be answered by a registered authorization service process (it may catch all verb-noun combinations or they may be split across different processes, depending on the individual needs), but also listened to and logged by an audit service process. So flexible, loosely coupled architectures are possible. They benefit very much from the Erlang features (processes, messaging, distribution). The last currently known part of the CEL is a set of utilities in the CELUTL module. It provides, for example, a ping/pong to monitor processes and a GUID generator.
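How a message addressed by context, verb and noun might find its subscribers can be sketched as a simple filter. The subscription tuple and the use of the atom '_' as a catch-all are assumptions made for this illustration; the real CELLMB may represent this differently.

```erlang
-module(cellmb_topic_sketch).
-export([subscribers/2]).

%% A subscription is {Context, Verb, Noun, Subscriber}. The atom '_'
%% serves here as a hypothetical catch-all for verb or noun.
subscribers({Context, Verb, Noun}, Subscriptions) ->
    [Subscriber || {C, V, N, Subscriber} <- Subscriptions,
                   C =:= Context,
                   matches(V, Verb),
                   matches(N, Noun)].

matches('_', _Value) -> true;
matches(Value, Value) -> true;
matches(_Pattern, _Value) -> false.
```

With an authorization service subscribed to check permission and an audit service subscribed to everything in the authorization context, a check permission request reaches both, while any other request in that context reaches only the audit service.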

The first larger application, developed alongside the CEL, is the Dynamic Content Processor (DCP) Release 2. This is a small content management system containing a blog and some more dynamic features I need. It will be based on Yaws and Mnesia and replace the software behind this blog, www.tideland.biz, and some wikis like wiki.tideland.biz. Here I'll see whether Erlang is the right tool for my future projects. Two planned ones - I'm currently compiling the requirements - are the Train of Thoughts (TOT) and a portal for collecting, searching, viewing, and comparing whisky tasting notes. All three, the DCP, TOT, and the whisky portal, need a reliable platform.

Tuesday, 25 September 2007

Power of Erlang pattern matching

One thing I really like in Erlang is the pattern matching. It allows small and powerful code which would be much longer in other languages. I've got one example in my SML code. Here I need a function for the recursive search of matching nodes. A node is a tuple like
{node, Tag, Qualifier, Childs}
where Childs is a list with more nodes or text entries like
{text, Data}
A node matches if the given tuple is a node and its Tag and Qualifier equal their search counterparts or those counterparts are wildcards ("*"). In simple pseudocode - without casting - this would look like
FUNCTION node_matches(Test, Tag, Qualifier)
    IF Test HAS TYPE node THEN
        IF (Test.Tag EQUALS Tag) OR (Tag EQUALS "*") THEN
            IF (Test.Qualifier EQUALS Qualifier) OR (Qualifier EQUALS "*") THEN
                RETURN true
            END
        END
    END

    RETURN false
END
It works, and is standard. But I think the Erlang way is much more elegant here
node_matches({node, Tag, Qualifier, _Childs}, Tag, Qualifier) ->
    true;
node_matches({node, Tag, _Qualifier, _Childs}, Tag, ?WILDCARD) ->
    true;
node_matches({node, _Tag, Qualifier, _Childs}, ?WILDCARD, Qualifier) ->
    true;
node_matches({node, _Tag, _Qualifier, _Childs}, ?WILDCARD, ?WILDCARD) ->
    true;
node_matches(_Node, _Tag, _Qualifier) ->
    false.
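The recursive search built on top of node_matches isn't shown in the post. A possible version, assuming that tags and qualifiers are strings and that ?WILDCARD is defined as "*", might look like this; find/3 and the module name are made up.

```erlang
-module(sml_find_sketch).
-export([find/3]).
-define(WILDCARD, "*").

%% Collect all matching nodes, the node itself first, then matches
%% from its children in document order. Text entries never match.
find({node, _Tag, _Qualifier, Childs} = Node, Tag, Qualifier) ->
    Below = lists:append([find(Child, Tag, Qualifier) || Child <- Childs]),
    case node_matches(Node, Tag, Qualifier) of
        true  -> [Node | Below];
        false -> Below
    end;
find(_Text, _Tag, _Qualifier) ->
    [].

%% Same clauses as in the post.
node_matches({node, Tag, Qualifier, _Childs}, Tag, Qualifier) ->
    true;
node_matches({node, Tag, _Qualifier, _Childs}, Tag, ?WILDCARD) ->
    true;
node_matches({node, _Tag, Qualifier, _Childs}, ?WILDCARD, Qualifier) ->
    true;
node_matches({node, _Tag, _Qualifier, _Childs}, ?WILDCARD, ?WILDCARD) ->
    true;
node_matches(_Node, _Tag, _Qualifier) ->
    false.
```

Here too the pattern in the function head does the type test: anything that is not a node tuple simply falls through to the last clause.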

Monday, 24 September 2007

Erlang development continued

Over the last week I've continued to develop my first project in Erlang. As you already know it will be the Dynamic Content Processor (DCP) using Yaws and Mnesia. So far I haven't got to those two libraries, but my own first one, the Common Erlang Library (CEL), is growing.

I've started with the simple markup language, SML. I already wrote about it. The approach is really nice: the parser reads the document and sends events in the form of messages to a background builder process. The last message returns the constructed document. In the standard library this is a structure almost like the XML DOM, but much more lightweight. It really astonished me how easy it is to parallelize typical serial problems.
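The idea of a parser feeding a builder process can be sketched as follows. The event format and all names are assumptions; a plain list of events stands in for the parser, and the result uses the {node, Tag, Qualifier, Childs} and {text, Data} tuples of the SML library.

```erlang
-module(sml_builder_sketch).
-export([build/1]).

%% Events are hypothetical: {start, Tag, Qualifier}, {text, Data}, 'end'.
%% They are sent to a background builder; the final message asks for
%% the constructed document.
build(Events) ->
    Builder = spawn(fun() -> loop([[]]) end),
    [Builder ! Event || Event <- Events],
    Builder ! {done, self()},
    receive {document, Builder, Doc} -> Doc end.

%% The stack alternates between child lists (in reverse order) and the
%% headers of the nodes that are still open.
loop(Stack) ->
    receive
        {start, Tag, Qualifier} ->
            loop([[], {Tag, Qualifier} | Stack]);
        {text, Data} ->
            [Childs | Rest] = Stack,
            loop([[{text, Data} | Childs] | Rest]);
        'end' ->
            [Childs, {Tag, Qualifier}, Parent | Rest] = Stack,
            Node = {node, Tag, Qualifier, lists:reverse(Childs)},
            loop([[Node | Parent] | Rest]);
        {done, From} ->
            [[Root]] = Stack,
            From ! {document, self(), Root}
    end.
```

The parser never waits for the builder; it just keeps sending events while the builder assembles the tree in its own process.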

The next step was the HTML builder. It uses the same approach as above to produce HTML. Not full-blown HTML, but fragments to be inserted into templates. It will be used in the DCP. This library is part of the CEL and can be extended with a callback function. This allows the DCP - or other apps - to extend the SML-to-HTML converter easily.

Besides the processes and the asynchronous messages, the pattern matching really helps. It makes receiving the messages simple, and also the definition of small, neat and powerful handling and callback functions.

I'll now continue to develop the converter together with some more unit tests. Once again they help a lot. After that I'll develop something similar to my Smalltalk Lightweight Application Server, based on the Erlang processes and asynchronous messaging. So again, stay tuned.