GeSHi - Generic Syntax Highlighter Syntax Coloriser for PHP
  

Viewing Issue Simple Details
ID: 0000092
Category: [GeSHi] optimisations
Severity: minor
Reproducibility: always
Date Submitted: 09-17-06 22:06
Last Update: 09-21-06 09:11
Reporter: nigel
View Status: public
Assigned To: nigel
Priority: low
Resolution: open
Status: assigned
Product Version: 1.1.2alpha2
Summary: 0000092: Add strpos-check optimisation for delimiter matching
Description: A lot of time is spent in geshi_get_position, matching regular expressions with preg_split. This could be reduced if regex delimiters were allowed to specify a string to check with strpos first - if it is not found then the regex can be guaranteed not to match.
Additional Information: This will possibly require an API change - addDelimiters will need to somehow convey the information. I'm tempted to just change the format of the 'REGEX#$regex#modifiers' strings to also somehow convey the extra information, then it would always work (and addRegexGroup would actually LOSE a parameter).

This should be benchmark tested to see how much of a gain is won.
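
A minimal sketch of the proposed pre-check, for illustration only. The function name find_delimiter and its signature are hypothetical and not part of GeSHi, and preg_match with PREG_OFFSET_CAPTURE is used here in place of the preg_split call that geshi_get_position is described as using:

```php
<?php
// Hypothetical helper, not GeSHi's actual API.
// Returns the offset of the first regex match in $code, or false.
// If a $needle is supplied and strpos() cannot find it, the regex is
// skipped, on the assumption that every match must contain the needle.
function find_delimiter(string $code, string $regex, ?string $needle = null)
{
    if ($needle !== null && strpos($code, $needle) === false) {
        return false; // needle absent, so the regex cannot match
    }
    if (preg_match($regex, $code, $m, PREG_OFFSET_CAPTURE)) {
        return $m[0][1]; // byte offset of the whole match
    }
    return false;
}
```

The gain depends on how often the needle is absent, so this would still need the benchmark mentioned above.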
Attached Files

- Relationships

- Notes
(0000456)
BenBE
09-19-06 08:31

I wouldn't change those strings themselves, but rather their container:

Normal Pos:
'needle'

RegExp:
Array(
    'regexp',
    'needle', /*optional*/
    TRY_OFFSET /* Offset that should be added to the found match */
);

Thus the check for a regexp should become simpler with this change. Also, this won't require much change inside the API (only for geshi_get_pos and the language files, where the latter could still use the old format 'REGEX#...' if the "needle" feature is not required for them).

The TRY_OFFSET is the offset that is added to a found match of the needle before using the regexp. This can be useful in situations where you want to look for RegExps having lookaheads or lookbehinds, OR where the preview needle is not the start of the regexp match (if looking for numbers with '.' as needle, the offset would be -10, as there are no more than 9 digits expected before the '.' ... little example, no need to fit any purpose :P).
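
A rough sketch of how such a container could be consumed. The function name delimiter_position is hypothetical (it is not GeSHi's geshi_get_position), and the array layout is assumed to be exactly the one above: regexp, optional needle, optional try offset.

```php
<?php
// Sketch only: hypothetical function, not GeSHi's real implementation.
// $delimiter is either a plain string needle or
// array(regexp, needle /* optional */, try_offset /* optional */).
function delimiter_position(string $code, $delimiter)
{
    if (!is_array($delimiter)) {
        return strpos($code, $delimiter); // plain needle, no regex needed
    }
    $regexp    = $delimiter[0];
    $needle    = $delimiter[1] ?? null;
    $tryOffset = $delimiter[2] ?? 0;
    $start     = 0;
    if ($needle !== null) {
        $found = strpos($code, $needle);
        if ($found === false) {
            return false; // needle absent: skip the regex
        }
        // Begin the regex search at the needle position plus the try
        // offset, clamped so it never falls before the start of $code.
        $start = max(0, $found + $tryOffset);
    }
    if (preg_match($regexp, $code, $m, PREG_OFFSET_CAPTURE, $start)) {
        return $m[0][1];
    }
    return false;
}
```

With the number example, array('/\d+\.\d+/', '.', -10) finds the '.' with strpos first and then starts the regex up to ten bytes earlier, so the digits before the '.' are still part of the match.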
 
(0000457)
nigel
09-19-06 10:51
edited on: 09-19-06 10:52

Sounds alright, tho if we put regexes in an array then we may as well put *all* of them in, which will make the check for regex in geshi_get_position a simple is_array rather than an is_array OR substr(0, 5) == 'REGEX'.

The try offset seems useful too. When matching HTML tags for example, the needle could be < and the offset could be 0, meaning anything before the tag could be ignored. Useful when you have this:

<p>
... lots of text
</p>

And after the start tag you're looking for the end. It will just do the regex match on the </p> and match immediately.

 
(0000458)
BenBE
09-19-06 23:03

This conversion from 'REGEX#...' to Array(...) could be done by the language file API, to stay compatible with the current language files without having to change too much.
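
Such a conversion might look roughly like the following. This is only a sketch: normalise_delimiter is a hypothetical name, and it assumes that whatever follows the 'REGEX' prefix is already a complete PCRE pattern (for example '#<[a-z]+#i').

```php
<?php
// Hypothetical converter, not an existing GeSHi function.
// Assumption: the text after the 'REGEX' prefix is a full PCRE pattern,
// delimiters and modifiers included.
function normalise_delimiter($delimiter)
{
    if (is_array($delimiter)) {
        return $delimiter; // already in the new array format
    }
    if (substr($delimiter, 0, 5) === 'REGEX') {
        // Old-style regex string: wrap the pattern in an array with no
        // needle and no try offset.
        return array(substr($delimiter, 5));
    }
    return $delimiter; // plain needle stays a string
}
```

Old-style entries converted this way carry no needle, so they would not benefit from the strpos check.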

Regarding the ender matching, I haven't looked into yours so closely ... Compare 0000093 too for a possible slowdown (cf. Note: 0000455).
 
(0000459)
nigel
09-20-06 02:00

I'm not interested in compatibility with existing language file features - GeSHi is in alpha. After 1.2 it will be a different story. It's not hard for me to change all the languages at the moment.
 
(0000460)
BenBE
09-21-06 05:28

I know it's easy to change them, but it's easier to have the REGEX-Format still accepted and converted to the Array-Format automatically if necessary.
 
(0000462)
nigel
09-21-06 09:11

For speed reasons there's no reason to support the old way. I don't want to worry about BC in an alpha release, and offering two ways to do something will cause BC problems in the future if one is removed. Not to mention additional code to detect both cases - I know that the single character context for one does its own checking of whether something is a regex.
 

- Issue History
Date Modified Username Field Change
09-17-06 22:06 nigel New Issue
09-17-06 22:06 nigel Status new => assigned
09-17-06 22:06 nigel Assigned To  => nigel
09-17-06 22:06 nigel Relationship added child of 0000091
09-18-06 00:10 nigel Category core => optimisations
09-18-06 00:11 nigel Relationship deleted child of 0000091
09-19-06 08:31 BenBE Note Added: 0000456
09-19-06 10:51 nigel Note Added: 0000457
09-19-06 10:51 nigel Note Edited: 0000457
09-19-06 10:52 nigel Note Edited: 0000457
09-19-06 23:03 BenBE Note Added: 0000458
09-20-06 02:00 nigel Note Added: 0000459
09-21-06 05:28 BenBE Note Added: 0000460
09-21-06 09:11 nigel Note Added: 0000462

  

