While figuring out what went wrong with reverse threading support in the Phorum PHP Extension, I drew up some information on how the tree sorting code does its job, to find out where my logic went wrong in the extension code. I figured that it might be useful to others, so here’s the annotated schematics. This is not the full code solution, but it explains how Phorum’s non-recursive tree sorting algorithm works.

The message tree that has to be sorted could look like this (multiple threads in a forum):

0 (virtual root node: forum)
|
+--1
|  |
|  +--5
|  |  |
|  |  +--6
|  |
|  +--8
|  .  |
|  .  +--9
|  .
|  ..(10)
|     .        \
|     ...11     |
|        |       > some broken stale data, e.g. because node
|        +--12  |  10 was removed incorrectly.
|              /
+--2
|  |
|  +--3
|
+--4
   |
   +--7

First pass: build a parent -> childs overview

First tree ordering step: build parent -> childs relations. This must work for any input array order. During this step, we also keep track of messages for which we do not encounter a parent (stale messages).

For the above tree, we get this relation table:

+--------+-------+--------+
| parent | stale | childs |
+--------+-------+--------+
| 0      |   *   | 1/2/4  |
| 1      |       | 5/8    |
| 2      |       | 3      |
| 3      |       | -      |
| 4      |       | 7      |
| 5      |       | 6      |
| 6      |       | -      |
| 7      |       | -      |
| 8      |       | 9      |
| 9      |       | -      |
| 10     |   *   | 11     |
| 11     |       | 12     |
| 12     |       | -      |
+--------+-------+--------+

Since there is no real parent 0 (that is the virtual parent id for all
thread starter messages), that one will be marked at stale too. But we
will not do any processing on that one.

The other stale items have to be taken care of, to ensure that our message tree is consistent. The handling for stale items is that they get promoted to a higher parent node. In case the starter message for the stale message’s thread exists, the message (and its childs) gets linked to that thread starter. In case the start message does not exist, the message gets linked to parent id 0 and thus effectively represents a separate thread from here on. This is wrong but, but it is the best way to prevent problems. And remember: this shouldn’t happen anyway.

After fixing the stale data, our tree would look like this (asuming that node 10 had thread id 1):

+--------+-------+--------+
| parent | stale | childs |
+--------+-------+--------+
| 0      |       | 1/2/4  |
| 1      |       | 5/8/10 | <-- 10 added to node 1
| 2      |       | 3      |
| 3      |       | -      |
| 4      |       | 7      |
| 5      |       | 6      |
| 6      |       | -      |
| 7      |       | -      |
| 8      |       | 9      |
| 9      |       | -      |
| 10     |       | 11     |
| 11     |       | 12     |
| 12     |       | -      |
+--------+-------+--------+

Second pass: generate the sorted tree

We start at parent node 0. From there we go into the child messages one by one. For each child record, we first go into that child’s children (so it’s a depth first tree walk). It is very tempting to handle this in a classic recursive call solution, however this is not good for performance.

The Phorum team hacked up a non recursive tree sort after some beers at the MySQL conference 2007. This method uses a single loop over the messages and a stack to keep track of the tree. Here’s what this looks like for our example tree (I kept the stale data out of here, but that would work the same way once it’s tied to an existing parent). I hope you can follow this somewhat hard looking schema.

Legenda: 

  * = add a new item to the tree
  > = move down to a child node
  < = move up to a parent node 

  (N):X:Y:Z = Node N, children to process are X, Y and Z. 

action  stack0     stack1     stack2     stack3    tree order
----------------------------------------------------------------------------
  *0    (0):1/2/4                                  0
  >1    (0):2/4    (1):5/8                         0
  *1    (0):2/4    (1):5/8                         0,1
  >5    (0):2/4    (1):8      (5):6                0
  *5    (0):2/4    (1):8      (5):6                0,1,5
  >6    (0):2/4    (1):8      (5)        (6)       0,1,5
  *6    (0):2/4    (1):8      (5)        (6)       0,1,5,6
  <5    (0):2/4    (1):8      (5)                  0,1,5,6
  <1    (0):2/4    (1):8                           0,1,5,6
  >8    (0):2/4    (1)        (8):9                0,1,5,6
  *8    (0):2/4    (1)        (8):9                0,1,5,6,8
  >9    (0):2/4    (1)        (8):9                0,1,5,6,8
  >9    (0):2/4    (1)        (8)        (9)       0,1,5,6,8
  *9    (0):2/4    (1)        (8)        (9)       0,1,5,6,8,9
  <8    (0):2/4    (1)        (8)                  0,1,5,6,8,9
  <1    (0):2/4    (1)                             0,1,5,6,8,9
  <0    (0):2/4                                    0,1,5,6,8,9
  >2    (0):4      (2):3                           0,1,5,6,8,9
  *2    (0):4      (2):3                           0,1,5,6,8,9,2
  >3    (0):4      (2)        (3)                  0,1,5,6,8,9,2
  *3    (0):4      (2)        (3)                  0,1,5,6,8,9,2,3
  <2    (0):4      (2)                             0,1,5,6,8,9,2,3
  <0    (0):4                                      0,1,5,6,8,9,2,3
  >4    (0)        (4):7                           0,1,5,6,8,9,2,3
  *4    (0)        (4):7                           0,1,5,6,8,9,2,3,4
  >7    (0)        (4)        (7)                  0,1,5,6,8,9,2,3,4
  *7    (0)        (4)        (7)                  0,1,5,6,8,9,2,3,4,7
  <4    (0)        (4)                             0,1,5,6,8,9,2,3,4,7
  <0    (0)                                        0,1,5,6,8,9,2,3,4,7
----------------------------------------------------------------------------

While this was an okay view for me when tracking the algorithm, this might not be very appealing to others. So I drew up an image of the path that we take while walking the tree. Click the image below for a full view:

Tree walk path

The resulting tree order corresponds to the top to bottom order for the nodes in the example tree at the start of this document. The stack depth at the points where we add an item to the tree can be used to track at what level the node in the tree is situated. This data combined, is enough to build the sorted tree.

So, what about the reverse threading?

If reverse threading is enabled, then the nodes would be added in reverse order. So for node 0, the child nodes would be processed in the order 4, 2, 1 instead of 1, 2, 4 that we showed above. The whole tree walking schema is the same for the rest of the processing.

For all software that I write, I am an immense fan of the “Eat your own dog food” dogma. It tells that you should be willing to run your own software on your own systems. It’s the perfect way of finding out if this new dog food that you invented really is as good as you think it is. As long as you do not want to eat your own dog food, then there is probably something wrong with it.

Applying this rule to Phorum development, it means that I always try to keep up with the bleeding edge development tree for my live forum web site. This is of course not for the faint of heart and I do not recommend anybody to start doing this, unless:

  • You are an experienced Phorum developer;
  • You do not panic when things go “boom!“;
  • You have a user base which is prepared for the worst.

Especially the users are an important factor here, since forums involve both a lot of users and a lot of user functionality. If the users are not prepared well for the fact that occasional bugs can occur, they might get angry with you and leave. To keep this from happening, here are a few things that you can do:

  • Let your users know that they are used as lab rats.
    Before I switched my forums from some old Phorum 3 version to the active 5.2 development tree, I explained to my users that they would become an important part of Phorum development, because they would always be using the most up-to-date Phorum system in the world. Therefore they would be the first ones to run into development bugs.
  • Let your users know that being a lab rat is cool.
    Users will not only be the first ones to run into bugs, they will also be the ones to run into brand new features. Since development is spread over time, these new features will not arrive all at once. We all know that in the end it is more fun to slowly eat a bag of candy than to eat a full bag in one go. The users will also be the ones to be using new features, long before they become available through the stable software releases.
  • Always inform about upgrades that you have done.
    I always let my users know when I upgrade some part of the system. If the upgrade involves a visible change I tell them what changes for them. In my experience, users tend to report more bugs if they can respond to an upgrade notification. I often have received bug reports as response to an upgrade notification, which did not at all relate to the specific upgrade.
  • Always respond to bug reports.
    Do not let your users linger after they reported a bug to you. Users that report bugs are valuable and you should treat them as such. Try to solve bugs on short notice and report back about the progress. If you cannot solve a bug immediately, be sure to let the users know that you are working on it.

A lot of this info might seem rather obvious. I have seen projects however, where user care was not really high on the agenda, causing all kinds of user frustration. I hope this info might be useful to others who want to involve their user base in their software development process.

The last couple of days, I have been working on the “sphinx_search” module. This is a Phorum module which implements the Sphinx search engine as the backend for the Phorum search interface. This search engine is truly amazing for its search speed. Its query parser however, is not yet fully matured and has some shortcomings therefore. This caused us some troubles when we tried to use the extended query syntax for defining searches on both the body and subject of the forum messages. In this article I will show what work arounds were implemented in the “sphinx_search” module to get around this. Maybe these examples are useful for other who are using Spinx as well.

Search on all words in the query

We started out with the following format for defining a query “search for bodies that have both word1 and word2 in it or subjects that have both word1 and word2 in it”:

@body word1 word2 | @subject word1 word2

This appeared to be based on a wrong assumption. We found out that the OR symbol “|” takes precedence, making this query look like this for Sphinx:

@body word1 & (@body word2 | @subject word1) & @subject word2

This is not what we were looking for. So to make the query work like we wanted it to, we needed to group the query terms in the query like this:

(@body word1 word2) | (@subject word1 word2)

Unfortunately, this query syntax is not allowed by Sphinx (yet), so we had to find a different solution. After some experimenting, we ended up with the following query:

@body word1 | @subject word1 & @body word2 | @subject word2

This query looks like this for Shpinx:

(@body word1 | @subject word1) & (@body word2 | @subject word2)

Using this query we were able to get some good results. Note that this query is a little bit different from the query that we tried at first. For our application, this difference is no problem however. In fact, this latter query syntax makes the “sphinx_search” module behave the same as the standard Phorum search (which searches globally over the subject and body), so it is actually better in our case.

Search on any words in the query

When searching on any word in the query, things are easier. For this query it is possible to simply use one of these two syntaxes:

@body word1 | word2 | @subject word1 | word2
@body word1 | @subject word1 | @body word2 | @subject word2

Although the first notation is somewhat more elegant, I settled for the second one. That one is closest to the syntax that I ended up using for the “all words” query, which made it easier to program the quiry builder in the module.

Search on negated words

Using negated words, some words can be excluded from the search. So you could for example search for “word1 -word2” in the Phorum search. Again, we missed the option to group parts of the query, disallowing us to do the following query:

(@body word1 -word2) | (@subject word1 -word2)

To make this query work, we finally came up with the following syntax:

@body -word2 & @subject -word2 & @body word1 | @subject word1

This query will be seen by Sphinx as:

@body -word2 & @subject -word2 & (@body word1 | @subject word1)

Again, not a really elegant query syntax, but at least it can be parsed correctly by Sphinx this way and it delivers the required results.

Secret Sauce

October 13, 2007

Hello. My name is Maurice.
I am addicted to Phorum programming.
“Hi Maurice!”.

Finally, as the last one of the Phorum core development team (you can now stop pointing at me and calling me names like “troglodyte” and “cavemen” guys), I started a blog. This is where I will be blogging about my Phorum addiction and about my attempts to stay addicted as much as possible.

I am a huge fan of the Phorum module system and have built many modules that make use of it already. So one of the things that I want to talk about here, is the way in which I solved certain module writing issues, to make other module writers benefit from it. Also, if you are a module writer that is running into troubles with implementing a certain feature as a module, then please drop me a line and explain your problem to me. If your problem is a general module design issue, then I will be most happy to write an article about it to help you out.

Secret sauce?

So what is “Secret Sauce”? Well, it’s a concept that I picked up last year when the Phorum team was able to meet up at the MySQL conference in the U.S. One night we went out to grab a bite and I was totally stuck at choosing something to eat from the menu. Just when I was prepared to order anything at random, my eyes were drawn towards two words that seemed to be highlighted by some devine glow: SECRET SAUCE. For no other reason than me liking the idea of secret sauce, I ordered the sandwich that went with it and later on adopted the name for referring to my Phorum development activities.

Phorum 5.1.16 has just been released. In this new version, a couple of simple enhancements were implemented to facilitate module writing. Most of the changes were made to allow for writing more self-contained modules that do not need manual intervention at installation/upgrade time for moving script and template files to the right locations within the Phorum directory.

In this post I will explain the new features that can be used.

Templates that are stored within the module directory

It’s now possible to easily access the Phorum template system from within your modules. Before, using your own templates could only be done by manually copying the required template files to the appropriate Phorum templates directories. This is no longer needed. Storing and using templates can now be fully contained in a module.

To use this new feature, you have to create a templates directory in your module’s directory, which uses the same template structure as Phorum’s main templates directory. If you write a module named “foo”, which needs a template called “bar.tpl”, then you can store this in the file:

./mods/foo/templates/default/bar.tpl

If you have a main Phorum template named “yourtemplate”, for which you need a modified “bar.tpl”, you can put the modified template in:

./mods/foo/templates/yourtemplate/bar.tpl

If a module template is requested, Phorum will automatically check if there is a template file available for the active template. If there is none, then the template file in templates/default will be used. This is a huge improvement over copying the template files to Phorum’s templates dirctory, because now you do not have to copy the file to each template directory that is in use.

After you have created the templates that are used by your module, you can access them by prefixing the template name by “<module name>::”. So for the example template from above, you could use the following methods to access the template:

Loading the template from (a module’s) PHP code:
$template = phorum_get_template(“foo::bar”);
print $template;

Including the template from another template file:
{INCLUDE foo::bar}

Addon scripts that can be contained in a module

Addon scripts are scripts that implement functionality that cannot be implemented through hooking into the standard module hook system. Mostly, these are scripts that implement totally new functionlity instead of tweaking exiting functionality.

A special script was added to the Phorum release, to make it possible to write scripts that act as addon scripts, but which are implemented through a module. The big advantage of doing this is that installing and maintaining becomes much easier for admins that are using your addon. Installing and upgrading will no longer include moving files to the right directories in Phorum. Instead, installing becomes as easy as installing and activating a standard module.

For information on how to use the new addon.php script for your addon, please read the addon.php script itself. In that file I documented the use of this script.

Better module support for the posting editor

The hook “posting_custom_action” was added to the editor code. This hook is at a convenient place for doing all kinds of processing in the message editing process. If you need to handle custom fields that were added to the editor, then this is the hook to use for processing them. Also, if you want to change meta data for a message, then this is the designated hook for doing that.

Which template the editor code shows, is now determined by the variable $PHORUM[“posting_template”]. This allows for using your own templates for separate posting screens.

I wrote a new poll module from scratch, as a showcase for the new features in the posting editor code and the use of module templates. This module implements the poll editor as a fully separate screen in the posting process. Check out the code of the Topic Poll Module for an example of how to use the new possibilities.

Future changes

Phorum module writing is still in development and module authors will probably hit some (possibly brick) walls now and then. Those authors are invited to tell us what those walls are, so we can see if adjustments are needed to make Phorum more module friendy. Some things that I have already been thinking about are:

– better documentation (will be done for Phorum 5.2);
– a system for scheduling hook priorities (e.g. to always run the bbcode hook as the last hook). A beta system for this is already implemented in the Phorum 5.2 development tree;
– a library containing functions for recurring module tasks;
– more information in the module’s info.txt (like module version and compatibility).

More ideas for module writing improvements are welcome at all time.

This week I have been looking for better ways to document Phorum than using the Wiki on the website and text-files in the distribution. I am very enthousiastic about what Docbook has to offer for this. Therefore I have started a project to put all Phorum documentation in the Docbook format.

Those who are interested can take a peek at the results so far. The first thing that is documented is the template system and the new version of the template language in Phorum 5.2.

Since you will probably be one of the users of the Phorum Manual, suggestions for what you want to see in there are of course welcome.

Beware that the documentation is for Phorum 5.2, so there might be things in there that do not apply to the current stable 5.1.x release of Phorum!