Before we eagerly open up our text editors/IDEs and start throwing code around in the hope it will work, it’s worth setting things up and planning things first. Setting up involves getting the existing code, and planning involves working out what you are going to do to create your language file(s). Planning also involves a bit of background about how GeSHi works, so that you understand why the language files are structured the way they are and know what to do to get feature X working.

Setting Things Up

In order to be able to write language files, you will have to have access to a few things:

  • The GeSHi code
  • A web server/PHP support
  • An IDE/text editor

The GeSHi Code

While you could develop against the latest 1.1.X release, 1.1.X is still quite alpha and is under constant development. Therefore, I would suggest for now that you develop against an SVN checkout of GeSHi. This way, if you hit bugs or missing features in the core codebase, I can do a fix while-u-wait and you can simply update from SVN instead of waiting for the next release. It also saves me sending files around in random fashion :)

Another advantage of SVN is that, should you choose to, you could become the official maintainer of your chosen language. This means that you would be the person in control of that language, and any bugs and feature requests for it would be assigned to you. The upside is that you would gain publicity for yourself, and for any other cause you choose, on the GeSHi website and elsewhere.

To get GeSHi from SVN, check it out from SourceForge using the following parameters:

Path: /svnroot/geshi/trunk/geshi-src

An SVN checkout line as per the following should work, if you’re using *nix:

svn co geshi

Alternatively, you could write your files against the latest released version, albeit at the cost mentioned above. Simply download the latest version from SourceForge and set it up.

A web server/PHP support

I won’t insult you by suggesting you don’t know how to get this working ;). There is a raft of information available on the web for this. GeSHi should not care which operating system or web server you use, and it is supposed to work in any PHP version above 4.1, so try with any of these. If there are problems then they are valid bugs that should be submitted on the bug tracker.
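If you want a quick sanity check that your PHP install meets the version requirement above, something like the following sketch would do. The function name php_is_supported is made up for this example, and treating "above 4.1" as "4.1.0 or later" is my assumption; version_compare() and PHP_VERSION are standard PHP.

```php
<?php
// Hypothetical helper, not part of GeSHi.
// Assumption: "above 4.1" is read as "4.1.0 or later".
function php_is_supported($version, $minimum = '4.1.0')
{
    // version_compare() with an operator argument returns a boolean.
    return version_compare($version, $minimum, '>=');
}

if (php_is_supported(PHP_VERSION)) {
    echo 'PHP ', PHP_VERSION, " should be new enough for GeSHi\n";
} else {
    echo 'PHP ', PHP_VERSION, " is older than 4.1\n";
}
```

Drop it into a file under your web root and load it in a browser to confirm that the server really is running PHP.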

An IDE/text editor

I used to flog Eclipse to the masses here, but now I use vim. Love me or hate me. Use whatever you feel like, as long as it saves with Unix line endings, uses four spaces for indentation and doesn’t break the GeSHi coding standards (which are ill-defined at this point, but just make sure your files look like the others :)).


The Old Way...

One of the lessons I learned from GeSHi 1.0.X is that although it may have its merits, and work 95% of the time, parsing code by string matching and replacement is not a good idea. In 1.0.X, keywords are highlighted by preg_replace, where a complicated regular expression is made to work out whether parts of the code are actually allowed to be keywords, etc. This is bad, because in some languages keywords could come after > symbols, yet in other languages this may be disallowed.

In addition, rather than build a new string based on the contents of the source code to be highlighted, things that are interesting are replaced inline with the HTML for source highlighting, so for example if the source looks like this:


Then this gets replaced with:


This is dangerous, because what if one of the keywords to highlight was class? We would have to take great care that the existing <span>s in the highlighted result did not get affected.
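To make the hazard concrete, here is a minimal sketch of inline replacement going wrong. It is not the actual 1.0.X code: the sample source, the two keywords and the kw class name are all made up for illustration. Only preg_replace() itself is real.

```php
<?php
// Hypothetical input and markup, chosen only to show the problem.
$source = 'class Foo extends Bar';

// Pass 1: wrap one keyword in a span, replacing it inline.
$pass1 = preg_replace('/\bextends\b/', '<span class="kw">$0</span>', $source);

// Pass 2: wrap another keyword. The word "class" now also occurs inside
// the HTML attribute added by pass 1, so it gets wrapped there as well.
$pass2 = preg_replace('/\bclass\b/', '<span class="kw">$0</span>', $pass1);

echo $pass2, "\n";
```

The second pass cannot tell source code from markup that an earlier pass wrote, so the result contains a span nested inside another span's opening tag. Avoiding that means making every regular expression aware of the HTML around it, which is where the complexity in the old approach comes from.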

So, for 1.2 I have thrown away the old model, and exchanged it for a new model based on the idea of contexts and trees.

The New Way

In 1.2, instead of thinking of source code as a string with some interesting bits in it, we think of source code a little more like the way a compiler would parse it. That is, there are various contexts in the code, like strings and comments, and each context has its own set of rules for what should be highlighted and what should not.

For example, PHP has three types of comments and three types of string:

  • "single" comments - these comments traverse a single line only, e.g. // and # comments.
  • "multi" comments - these comments traverse multiple lines until they hit their end string, e.g. /* ... */ comments
  • "phpdoc" comments - these are like multi comments but have phpdoc tags inside them highlighted, e.g. /** ... */ comments
  • "single" strings - the strings that start and end with a ' (e.g.: 'hello, world!')
  • "double" strings - the strings that start and end with a " (e.g.: "goodbye, world!"). These have variables inside them, and some quite complicated escape patterns like \x34
  • "heredoc" strings - these start with <<<MARKER and finish where MARKER occurs again.

As you can see, current GeSHi support for PHP can highlight them all:

/**
 * <b>Silly program</b>
 * Uses {@link sillyStuff}
 * @author Horace P. Quagmire
 */

// silly string
$str = 'Hello, world!';
# Another silly string
$foo = "\"Goodbye,\x20World!\"\n";
/* A heredoc string */
$str = <<<EOF
Good $afternoon!
EOF;

Each context can have various things associated with it, such as what keywords it knows about, what symbols should be highlighted and whether numbers should be highlighted. As you can imagine, this is much more powerful than the old way of doing things.
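As a rough illustration of the idea (and emphatically not GeSHi's real parser or language file format), the sketch below splits a piece of source into contexts by scanning for start and end delimiters. The context names, the $contexts array layout and the split_into_contexts function are all invented for this example, and it ignores escape sequences inside strings to stay short.

```php
<?php
// Hypothetical sketch: NOT GeSHi's real API or language file format.
// Each context is described by a start delimiter and an end delimiter.
$contexts = array(
    'comment/multi'  => array('/*', '*/'),
    'comment/single' => array('//', "\n"),
    'string/single'  => array("'", "'"),
    'string/double'  => array('"', '"'),
);

/**
 * Split $source into array(contextName, text) pairs, scanning left to right.
 * Text outside every listed context is reported as 'code'.
 * Simplification: escaped quotes inside strings are not handled.
 */
function split_into_contexts($source, $contexts)
{
    $parts = array();
    $pos = 0;
    $length = strlen($source);
    while ($pos < $length) {
        // Find the context whose start delimiter occurs earliest.
        $bestName = null;
        $bestStart = $length;
        foreach ($contexts as $name => $delims) {
            $start = strpos($source, $delims[0], $pos);
            if ($start !== false && $start < $bestStart) {
                $bestName = $name;
                $bestStart = $start;
            }
        }
        // Everything before that delimiter is plain code.
        if ($bestStart > $pos) {
            $parts[] = array('code', substr($source, $pos, $bestStart - $pos));
        }
        if ($bestName === null) {
            break;
        }
        // Consume up to and including the end delimiter (or to end of input).
        list($open, $close) = $contexts[$bestName];
        $end = strpos($source, $close, $bestStart + strlen($open));
        $end = ($end === false) ? $length : $end + strlen($close);
        $parts[] = array($bestName, substr($source, $bestStart, $end - $bestStart));
        $pos = $end;
    }
    return $parts;
}

$source = "\$a = 'class'; // not a class\n\$b = 1;";
$parts = split_into_contexts($source, $contexts);
foreach ($parts as $part) {
    printf("%-15s %s\n", $part[0], json_encode($part[1]));
}
```

Because the output is a new list of pieces rather than the original string edited in place, a keyword highlighter can be applied to the code pieces only. The word class inside the string and the comment is never looked at, so it cannot be mistaken for a keyword.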


lang/dev/tutorial/1.txt · Last modified: 2011/09/01 13:03