GeSHi Bug Tracker - GeSHi
ID: 93 | Category: optimisations | Severity: minor | Reproducibility: always | Date Submitted: 09-17-06 22:13 | Last Update: 09-19-06 06:11
nigel  
nigel  
low  
assigned 1.1.2alpha2  
open  
none    
none  
0000093: Try to avoid passing long strings around
geshi_get_position is currently passed some very long strings (quite easily upwards of 50K on large sources), and it is called often. It would be better if the callers passed shorter strings where they can.
The performance benefit of this is unknown, and will have to be tested.

Notes
(0000455)
BenBE   
09-19-06 06:11   
There are two possible things that could be done about this issue:

1. Pass geshi_get_position only the part of the string up to the first known match (plus a few characters beyond it) for further searching, since any match that starts AFTER a known match is irrelevant for finding the next delimiter.

2. Do first-chance matching: look for the next delimiter in a shortened version of the original code string (if it is above a given size*). This should significantly speed up RegExps that are known to be slow. Only if no match is found in the shortened string is the whole remaining source passed in for searching. This should help because GeSHi mostly looks for short context strings, so the found offset is usually near the beginning of the passed string. The disadvantage of this method is that long contexts (e.g. a comment of multiple KB) would be significantly slower to process, as they have to be searched twice. This should be compensated by the fact that such long contexts are otherwise very fast to highlight (only a few searches of this kind will ever appear in a source).
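
To make the two options easier to compare, here is a rough sketch of both. It is written in Python rather than PHP and is not GeSHi code: the function names, the slack of 16 characters, the 4096-character threshold and the 512-character window are all assumptions made up for illustration. It also assumes the delimiter patterns do not rely on end-of-string anchors, since searching a shortened range makes the range end look like the end of the string.

```python
import re

def earliest(patterns, text, start=0, end=None):
    """Baseline: search the whole range for every pattern, keep the earliest hit.
    Returns (offset, pattern_index) or None."""
    end = len(text) if end is None else end
    best = None
    for i, pat in enumerate(patterns):
        m = pat.search(text, start, end)
        if m and (best is None or m.start() < best[0]):
            best = (m.start(), i)
    return best

def earliest_bounded(patterns, text, start=0, slack=16):
    """Option 1: once a match is known, later patterns only search up to
    that match plus `slack` characters (hypothetical value)."""
    end = len(text)
    best = None
    for i, pat in enumerate(patterns):
        m = pat.search(text, start, end)
        if m and (best is None or m.start() < best[0]):
            best = (m.start(), i)
            # Matches starting after this one cannot win, so shrink the range.
            end = min(end, m.start() + slack)
    return best

def earliest_first_chance(patterns, text, start=0,
                          threshold=4096, window=512, slack=16):
    """Option 2: on long input, try a short window first and fall back to
    the full remaining source only if that gives no usable hit."""
    if len(text) - start > threshold:
        hit = earliest(patterns, text, start, start + window)
        # Only trust hits that are not too close to the window edge, where a
        # longer delimiter starting earlier could have been cut off.
        if hit is not None and hit[0] + slack <= start + window:
            return hit
    return earliest(patterns, text, start, len(text))
```

In option 1 a delimiter longer than the slack that starts just before the known match would be missed, so the slack has to cover the longest delimiter. In option 2 the fallback is what causes the double search on long contexts described above.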