GeSHi - Generic Syntax Highlighter Syntax Coloriser for PHP
  

Viewing Issue Simple Details
ID: 0000086
Category: [GeSHi] core
Severity: minor
Reproducibility: always
Date Submitted: 07-28-06 23:17
Last Update: 09-14-06 00:05
Reporter: nigel
View Status: public
Assigned To: Netocrat
Priority: normal
Resolution: open
Status: assigned
Product Version: 1.1.2alpha2
Summary: 0000086: Single character context should support proper single characters
Description: The single character context class does not support "wide" single characters such as '\xFFFF' (in C). Although \xFFFF spans several characters in the source, it denotes a character of length one.

Characters of length over 1 can only be specified via regex, which may help in detecting whether the single char context should start: after finding a ', compare the text that follows against all the characters/regexes specified.

This leads to an optimisation: once the single char context has started we may already know its length, so a single substr() call retrieves the contents of the context.
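
The detection idea above can be sketched as follows. This is a minimal illustration in Python rather than GeSHi's PHP, and everything in it is an assumption: the pattern list, the function name match_char_literal and the returned (start, length) pair are hypothetical and do not come from the GeSHi source. The point is only that once a pattern matches and the closing ' follows, the length is known and the contents can be sliced out directly.

```python
import re

# Hypothetical patterns for what may sit between the quotes of a C
# character literal; illustrative only, not GeSHi's language data.
CHAR_PATTERNS = [
    re.compile(r"\\x[0-9A-Fa-f]+"),     # hex escape such as \xFFFF
    re.compile(r"\\[0-7]{1,3}"),        # octal escape such as \101
    re.compile(r"\\[abfnrtv\\'\"?]"),   # simple escape such as \n
    re.compile(r"[^'\\\n]"),            # any ordinary single character
]


def match_char_literal(source, quote_pos):
    """Return (start, length) of the literal opening at quote_pos, or None."""
    if not source.startswith("'", quote_pos):
        return None
    body_start = quote_pos + 1
    for pattern in CHAR_PATTERNS:
        m = pattern.match(source, body_start)
        # The context only starts if the closing quote follows directly.
        if m and source.startswith("'", m.end()):
            return quote_pos, m.end() + 1 - quote_pos
    return None


source = "x = '\\xFFFF';"
start, length = match_char_literal(source, source.find("'"))
# With the length known, one slice (substr() in PHP) yields the literal.
literal = source[start:start + length]
```

An empty literal ('') and a two-character body ('ab') both return None here, because no pattern is followed immediately by the closing quote.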
Additional Information
Attached Files  class.geshisinglecharcontext.php.patch (6,225 bytes) 07-30-06 03:29
 functions.geshi.php.patch (929 bytes) 07-30-06 03:30
 class.geshisinglecharcontext.php.v2.patch (6,738 bytes) 08-01-06 12:50
 functions.geshi.php.v2.patch (2,829 bytes) 08-01-06 12:51

- Relationships

- Notes
(0000415)
Netocrat
07-30-06 03:29

I've generated patches that attempt to deal with this. They are against the files geshi/functions.geshi.php and geshi/classes/class.geshisinglecharcontext.php.

The new code performs full checking for validity, including the end delimiter, in getContextStartData(), and stores the data so that _getContextEndData() simply returns the stored data. This seems to be as close as possible to the optimisation you were hoping for. There is potential for further optimisation, though.
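
For readers without the patch to hand, the shape of that approach might look roughly like the sketch below. It is written in Python rather than PHP and is not the patch itself: the class, the BODY regex and the dictionary layout are hypothetical, and the method names merely mirror getContextStartData() and _getContextEndData(). The start lookup validates the whole literal, end delimiter included, and caches the end data so the end lookup does no further scanning.

```python
import re

# Assumed body syntax: a hex escape, any other backslash escape, or one
# ordinary character. The end delimiter is assumed to be a single '.
BODY = re.compile(r"\\x[0-9A-Fa-f]+|\\.|[^'\\\n]")


class SingleCharContext:
    """Toy stand-in for a single-character context (not GeSHi's class)."""

    def __init__(self, start_delimiter="'", disallow_empty=True):
        self.start_delimiter = start_delimiter
        self.disallow_empty = disallow_empty
        self._end_data = None  # filled in by get_context_start_data()

    def get_context_start_data(self, code, start=0):
        """Find the next fully valid literal; return its start data or None."""
        self._end_data = None
        pos = code.find(self.start_delimiter, start)
        while pos != -1:
            body_start = pos + len(self.start_delimiter)
            m = BODY.match(code, body_start)
            if m is not None:
                body_end = m.end()
            elif self.disallow_empty:
                body_end = None          # '' is rejected outright
            else:
                body_end = body_start    # empty body is allowed
            if body_end is not None and code.startswith("'", body_end):
                # Cache the end data now so no second scan is needed.
                self._end_data = {"pos": body_end, "len": 1}
                return {"pos": pos, "len": len(self.start_delimiter)}
            pos = code.find(self.start_delimiter, pos + 1)
        return None

    def get_context_end_data(self):
        """Return the end data stored by the last successful start lookup."""
        return self._end_data


ctx = SingleCharContext()
start_data = ctx.get_context_start_data("c = '\\xFFFF';")
end_data = ctx.get_context_end_data()
```

Passing start_delimiter="L'" covers the wide-character case, and disallow_empty=False accepts '' for languages where that is legal.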

As well as resolving this issue, the patches:
* remove any assumptions on the length of the start delimiter, to support e.g. C's wide characters that begin with L' (the end delimiter is still assumed to have length 1).
* add a setDisallowEmptyChars() method to specify that empty characters are illegal, as they are in C: '' is a syntax error
* introduce recognition of $offset to geshi_get_position() when $needle is a REGEX: this is required to support the first patch.
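
To illustrate the last point, here is a minimal sketch of an offset-aware position lookup that treats plain strings and regexes alike. It is Python, not the patched PHP, and the name get_position, its parameters and its (position, length) return value are assumptions made for this example; the real geshi_get_position() may differ in all of them.

```python
import re


def get_position(haystack, needle, offset=0, is_regex=False):
    """Return (position, length) of the first match at or after offset, or None."""
    if is_regex:
        # Pattern.search() accepts a start position, so matches that begin
        # before offset are skipped.
        m = re.compile(needle).search(haystack, offset)
        if m is None:
            return None
        return m.start(), m.end() - m.start()
    pos = haystack.find(needle, offset)
    if pos == -1:
        return None
    return pos, len(needle)


code = "'a' 'b'"
plain_hit = get_position(code, "'", 1)
regex_hit = get_position(code, r"'[a-z]'", 1, is_regex=True)
```

Without the offset, the regex search would keep returning the literal at position 0; with it, the search resumes after the delimiter that has already been consumed.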

I'll attach the patches to this bug report.
 
(0000416)
nigel
07-30-06 14:13

Looks good so far. I'm guessing that supporting a delimiter longer than length one would not be too hard now if it was required.
 
(0000417)
Netocrat
07-31-06 00:40

It would be easier, and the places where that assumption has been made are easier to spot.
 
(0000423)
Netocrat
08-01-06 12:56

...and I've done that, as well as allowing for arbitrary-length escape characters, and also fixing the issue indicated by a // WARN comment: now the most inclusive matching escape sequence is found, rather than the first one encountered. The changes are in the v2 patches that I've just uploaded.
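
The "most inclusive match" rule can be sketched like this. The example is Python rather than PHP, and the function name and the pattern list are hypothetical, chosen only to show the difference between taking the first matching escape sequence and taking the longest one.

```python
import re


def longest_escape_match(code, pos, escape_patterns):
    """Return the longest escape sequence matching at pos, or None."""
    best = None
    for pattern in escape_patterns:
        m = re.compile(pattern).match(code, pos)
        # Keep the candidate that consumes the most text, not the first hit.
        if m and (best is None or m.end() > best.end()):
            best = m
    return best.group(0) if best else None


# Illustrative patterns; the single-digit octal form is listed first on purpose.
patterns = [r"\\[0-7]", r"\\[0-7]{1,3}", r"\\x[0-9A-Fa-f]+"]
found = longest_escape_match("\\101'", 0, patterns)
```

A first-match strategy would stop at \1 here and leave 01 behind as ordinary text; comparing all candidates picks \101 instead.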
 
(0000425)
nigel
08-08-06 00:00

As mentioned in e-mail: you may add them :)
 
(0000447)
nigel
09-14-06 00:05

Netocrat: As far as I can tell (after finally doing escape character grouping and updating the C language file) this seems to be fixed. The only thing I haven't done is to put the test cases in my test system, but that's another bug. Is this OK to be marked as resolved?
 

- Issue History
Date Modified Username Field Change
07-28-06 23:17 nigel New Issue
07-28-06 23:17 nigel Status new => assigned
07-28-06 23:17 nigel Assigned To  => Netocrat
07-30-06 03:29 Netocrat Note Added: 0000415
07-30-06 03:29 Netocrat File Added: class.geshisinglecharcontext.php.patch
07-30-06 03:30 Netocrat File Added: functions.geshi.php.patch
07-30-06 14:13 nigel Note Added: 0000416
07-31-06 00:40 Netocrat Note Added: 0000417
08-01-06 12:50 Netocrat File Added: class.geshisinglecharcontext.php.v2.patch
08-01-06 12:51 Netocrat File Added: functions.geshi.php.v2.patch
08-01-06 12:56 Netocrat Note Added: 0000423
08-08-06 00:00 nigel Note Added: 0000425
09-14-06 00:05 nigel Note Added: 0000447

  

