Basically, GeSHi already uses the results gathered while executing the delimiter search. However, you can't fully trust that information alone to detect a context that has been split into parts.
A nice example would be:
Text := 'Hello' + // '
  'World!';
Splitting this source by its delimiters (the single quotes of a string) would break the highlighting, since you need to account for the // introducing a single-line comment.
Anyway, you could optimize the use of the information gathered here by avoiding subsequent calls to geshi_get_position (@nigel: remember the mail about that function ;-)) and by calculating offsets relative to the current parsing position.
Given the above source you get:
0x00 Text unknown
0x05 := symbol
0x08 ' single_string
0x09 Hello unknown
0x0E ' single_string
0x10 + symbol
0x12 // single_comment
0x15 ' single_string
0x19 ' single_string
0x1A World unknown
0x20 ! unknown
0x21 ' single_string
0x22 ; symbol
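A list like the one above can be built in a single pass over the buffer. Here is a minimal sketch in Python (GeSHi itself is PHP, and the delimiter set here is a made-up subset just for this example): scan once with a combined pattern of all delimiters and record each match together with its absolute offset, instead of re-searching the source repeatedly.

```python
import re

# Hypothetical delimiter set for the Pascal-like example above.
DELIMITERS = re.compile(r"//|:=|[';+]")

def delimiter_positions(source):
    """Return (offset, matched_text) for every delimiter hit, in order."""
    return [(m.start(), m.group()) for m in DELIMITERS.finditer(source)]

source = "Text := 'Hello' + // '\n  'World!';"
for offset, token in delimiter_positions(source):
    print(hex(offset), token)
```

Note that the quote inside the comment is recorded too; deciding which hits actually open or close a context is left to the parsing pass that walks this list.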
Getting this list shouldn't be a problem (for long sources it should be limited to a reasonable size; cf. my mail).
Now you can use this:
a) to get the first position (side effect);
b) to get the next starter without searching again (if there is a context beginning at a higher offset, the current one has ended).
If, for example, GeSHi finds the token at 0x00 and completes its highlighting, it steps on to 0x05 and finds that to be the next token to render (I'm currently not sure whether the first step already includes 0x04, but I assume so). Now you can skip all positions before 0x05 (i.e. 0x00 and 0x04). For now, not much gain ...
But let's go on for 0x12:
GeSHi finds it, parses it as a comment (as the starter implies), and returns at location 0x17. Without searching again for splitters, GeSHi can now skip the space at 0x14, the ' at 0x15, and the \n (the ender) at 0x16, finding its next item to care about at location 0x17 (or 0x19 if spaces are ignored, as they belong to no context). What we gain here is quite simple: although we only looked into the string once, we gathered enough information to not have to look into it again (until we reach the end of the buffer we already analyzed).
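The skip-forward step above can be sketched as follows (a hypothetical helper, not GeSHi's actual API): once a context has been consumed up to some end offset, simply advance through the recorded delimiter list and drop every entry the context already swallowed, with no new search over the source.

```python
def next_starter(positions, context_end):
    """Return the first recorded (offset, token) at or after context_end,
    or None if the analyzed buffer is exhausted."""
    for offset, token in positions:
        if offset >= context_end:
            return (offset, token)
    return None

# The comment found at 0x12 runs to the newline; parsing resumes at 0x17,
# so the quote at 0x15 inside the comment is skipped without re-searching
# and the next starter reported is the one at 0x19.
positions = [(0x12, '//'), (0x15, "'"), (0x19, "'"), (0x22, ';')]
nxt = next_starter(positions, 0x17)
```

A linear scan is enough here because the list is ordered and the parser only ever moves forward; a real implementation could keep an index into the list instead of rescanning it.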
The prerequisites for this to work would be:
a) all splitters and starters in one big regexp (cf. your mail ;-))
b) an intelligent "look-ahead" system to guess up to which position an analysis session should go (if it reaches too far into a long context, it wastes time; if it's too short, it moves our performance gain to /dev/null ;-)).
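Prerequisite (b) could look roughly like this (an illustrative Python sketch with invented names, not GeSHi code): search only a window of guessed size, and grow the window whenever it turns out to be too short, so a huge buffer is never scanned further than needed.

```python
import re

# Hypothetical combined starter pattern, per prerequisite (a).
STARTERS = re.compile(r"//|['\"]")

def find_in_window(source, start, window=64):
    """Search for the next starter from `start`, widening the look-ahead
    window until a match is found or the end of the source is reached."""
    end = min(start + window, len(source))
    while True:
        m = STARTERS.search(source, start, end)
        if m or end >= len(source):
            return m
        end = min(end + window, len(source))  # guess was too short: extend
```

The trade-off the mail describes lives in the `window` parameter: too large and a long context makes the scan wasteful, too small and the repeated extensions eat the gain.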
Splitting the source is not the idea. What I meant was that the preg_match results are recorded so they can be stored. Example:
function geshi_foo_foo (&$context)
function geshi_foo_foo_bar (&$context)
The source fed to it is:
s // Commentah
somefoo some content blah foobar
Then the matches will be stored as array('somefoo', 'foobar'), for example.
But, if you feed it the source:
somebar dgdfg foobar
The matches will be stored as array('somebar', 'foobar').
Even though it's the same context, really.
Get the idea?
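The point above can be sketched like this (in Python for illustration; the pattern is invented and GeSHi itself would use PHP's preg_match_all): the per-context pattern is identical both times, but the matches recorded for it depend entirely on which source was fed in.

```python
import re

# Hypothetical per-context pattern: one fixed context definition.
CONTEXT_PATTERN = re.compile(r"\b(some\w+|foobar)\b")

def record_matches(source):
    """Record what this context actually matched in the given source."""
    return CONTEXT_PATTERN.findall(source)

print(record_matches("somefoo some content blah foobar"))  # ['somefoo', 'foobar']
print(record_matches("somebar dgdfg foobar"))              # ['somebar', 'foobar']
```

Same context, two different stored match lists — so the cache has to be keyed by the source, not just by the context.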