GeSHi - Generic Syntax Highlighter Syntax Coloriser for PHP
  

Viewing Issue Simple Details
ID: 0000038
Category: [GeSHi] core
Severity: feature
Reproducibility: always
Date Submitted: 12-12-05 09:16
Last Update: 02-18-06 11:47
Reporter: BenBE
View Status: public
Assigned To: nigel
Priority: normal
Resolution: fixed
Status: closed
Product Version: 1.1.1alpha3
Summary: 0000038: Parse comment contexts as single regexps
Description: For speed improvements, it should be possible to tell comment contexts not to be subdivided into their words.

Thus "// Test\n" gets detected as a comment and will not be subdivided into //, whitespace, Test and \n, saving a lot of calls and improving performance significantly.

This step might involve introducing an extra flag for "rubbish" contexts that are not to be highlighted as is.
Additional Information: Such rubbish contexts might be introduced by the context loader into the parent's context as "/$starter.*$ender/i" or something similar (especially for one-char enders). Thus single- and multi-line comments for Delphi, for example, would reduce to 3 regexps, with nothing more to do in between.
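As a rough illustration of the idea (a Python sketch, not GeSHi's PHP code), the following builds one regular expression per comment form from its starter and ender, so each comment is found as a single unsplit match. The Delphi delimiters (`//` to line break, `{` to `}`, `(*` to `*)`) and the helper names are assumptions for illustration. The body is matched non-greedily (`.*?`), because a greedy `.*` would run past the first ender to the last one it can reach, and the three forms are combined into one left-to-right scan so that a `{` inside a `//` comment is not reported as a comment of its own.

```python
import re

def comment_pattern(starter: str, ender: str) -> str:
    # Hypothetical helper: escape the delimiters and match the body
    # non-greedily, so "{a} x {b}" yields two comments rather than one.
    return re.escape(starter) + r".*?" + re.escape(ender)

# Assumed Delphi comment forms: line comments and both block styles.
DELPHI_COMMENT_RE = re.compile(
    "|".join([
        comment_pattern("//", "\n"),
        comment_pattern("{", "}"),
        comment_pattern("(*", "*)"),
    ]),
    re.DOTALL,  # let block comments span several lines
)

def find_comments(source: str) -> list[str]:
    # One left-to-right scan over the source; each comment comes back whole.
    # String literals are not handled in this sketch.
    return [m.group(0) for m in DELPHI_COMMENT_RE.finditer(source)]

src = "x := 1; // Test\n{ block\n  comment } y := (* inline *) 2;\n"
print(find_comments(src))  # ['// Test\n', '{ block\n  comment }', '(* inline *)']
```

A line comment at the very end of a file with no trailing newline would not match this sketch, and a real context loader would also have to skip string literals.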

- Relationships
related to 0000037 (closed, BenBE): Delphi CP: Block Starter\Ender Detection irritated by comments
child of 0000060 (assigned, nigel): CP-Dependend isComplex-Flag

- Notes
(0000132)
BenBE
12-12-05 09:22

Before you ask about the connection between those bugs:
0000037 gave the initial idea for this performance optimization ;-)
I could already solve it for Delphi another way, but with this change it would probably get a lot faster.
 
(0000135)
nigel
12-12-05 09:35

Rather than using regular expressions and thus losing the advantage of the context tree, such "rubbish" contexts just need a flag marking them as not to be split. Then if the flag is set I can just pass it all in at once.

This fix should be trivial and performance-enhancing, and should not break every other language ;).
 
(0000136)
BenBE
12-12-05 11:46

At least you hope so for the last one ;-) I might have to make some changes in the DCP, but basically it should work out properly.
 
(0000153)
nigel
12-18-05 12:57

I have made some changes to implement this.

There is now an _isComplex flag for contexts. If true, stuff gets split by whitespace and handed in separately. If false (the default), it gets handed in all at once.

I marked the Delphi root context as complex, as well as the PHP root contexts.

You may have to change the Delphi code parser because of this change.
 
(0000154)
BenBE
12-18-05 13:12
edited on: 12-18-05 13:18

Why didn't you do it the other way round, as an "isSimple" flag? An isSimple flag with default false would have been much more logical in my eyes.

Anyway, I'll look into any changes that have to be made. What should I take special care with?

Does isComplex only have an effect on the execution of the CP in one single instance, or does isComplex=false deactivate all subparsing?

 
(0000155)
nigel
12-18-05 18:17

Because the vast majority of contexts are simple: all comments and strings, for example.

We want the default to be simple and thus fast, because for most languages the code parser won't be used for most of the contexts.

If you think about it, I've only marked four contexts as "complex" at this time. Four out of over 20.

As for special care: comments and strings are passed in wholesale, as your ASM stuff will be. Your root Delphi stuff will be broken up.

How isComplex works:

  if false then
    pass the whole token (i.e. the whole comment/string/whatever) to the codeparser at once
  else
    break up into pieces by whitespace and send them in one by one
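The pseudocode above could look roughly like this in Python. `Context`, `is_complex` and the `code_parser` callback are hypothetical stand-ins, not GeSHi's actual PHP internals. Splitting with a capturing group keeps the whitespace runs as pieces of their own, matching the //, whitespace, Test, \n breakdown from the issue description.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class Context:
    # Hypothetical context record; only the flag matters here.
    name: str
    is_complex: bool = False  # default: simple, pass the token whole

def feed_code_parser(ctx: Context, token: str,
                     code_parser: Callable[[str], None]) -> None:
    if not ctx.is_complex:
        # Simple context: one call with the whole comment/string/etc.
        code_parser(token)
    else:
        # Complex context: split on whitespace, keeping the whitespace
        # runs as separate pieces, and send them in one by one.
        for piece in re.split(r"(\s+)", token):
            if piece:  # re.split yields "" at the edges
                code_parser(piece)

calls = []
feed_code_parser(Context("comment"), "// Test\n", calls.append)
feed_code_parser(Context("root", is_complex=True), "// Test\n", calls.append)
print(calls)  # ['// Test\n', '//', ' ', 'Test', '\n']
```

The simple path costs one call per token, while the complex path costs one call per piece, which is where the savings for comments and strings come from.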
 
(0000156)
nigel
12-18-05 18:19

Actually, your comment gives me an idea.

Rather than a boolean, it could be an enum.

If 0, don't even send to the code parser (could do this for comments/strings)
If 1, send as one big thing to the code parser (can't think of where this is needed currently, but you never know)
If 2, send broken up by whitespace.
 
(0000192)
BenBE
12-24-05 01:07

What is the current state of this one?
 
(0000195)
nigel
12-24-05 15:16

It should have gone back to assigned. I guess you have no quarrel with the idea I put in the post before this then?
 
(0000200)
BenBE
12-24-05 15:24

No, I don't ;-)

For point 1: it might be required for Delphi/ASM/comments... I'll have to look into that. The ASM needs to know whether the instruction was finished (line break) or not. If the comment was left out completely, it would cause problems...
 
(0000206)
nigel
12-24-05 21:50

Yes well that's something for you to work out for your parser. I will make that change soonish though, and you'll know about it here when I do.
 
(0000223)
nigel
01-02-06 18:45

Okay, I have made this change. The _isComplex flag is now _complexFlag, and can have three values:

  * GESHI_COMPLEX_NO: don't even bother passing to code parser
  * GESHI_COMPLEX_PASSALL: pass the whole match to the code parser
  * GESHI_COMPLEX_TOKENISE: break up by whitespace and pass the bits in to the code parser

The default is GESHI_COMPLEX_NO. I updated all of the current files that use complex to use this new stuff. For your Delphi stuff, I set delphi.php to GESHI_COMPLEX_TOKENISE and the comment contexts to GESHI_COMPLEX_PASSALL, though you'll probably need to review that, since your stuff is probably broken now.
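A sketch of the three-valued flag in Python follows. The member names mirror the GESHI_COMPLEX_* constants, but the numeric values (taken from the 0/1/2 proposal in note 0000156), the context names and the `feed_code_parser`/`flag_for` helpers are assumptions for illustration; the per-context settings follow the Delphi setup described in this note.

```python
import re
from enum import IntEnum
from typing import Callable

class ComplexFlag(IntEnum):
    # Names mirror GESHI_COMPLEX_*; the values 0/1/2 are assumed.
    NO = 0        # don't pass to the code parser at all
    PASSALL = 1   # pass the whole match in one call
    TOKENISE = 2  # split by whitespace and pass the pieces in one by one

def feed_code_parser(flag: ComplexFlag, match: str,
                     code_parser: Callable[[str], None]) -> int:
    """Dispatch one context match to the code parser; return the call count."""
    if flag is ComplexFlag.NO:
        return 0
    if flag is ComplexFlag.PASSALL:
        code_parser(match)
        return 1
    # TOKENISE: keep whitespace runs as pieces, drop empty edge strings.
    pieces = [p for p in re.split(r"(\s+)", match) if p]
    for piece in pieces:
        code_parser(piece)
    return len(pieces)

# Hypothetical per-context settings (context names are made up).
DELPHI_FLAGS = {
    "delphi": ComplexFlag.TOKENISE,         # root context
    "delphi/comment": ComplexFlag.PASSALL,  # comment contexts
}

def flag_for(context: str) -> ComplexFlag:
    # Anything not listed falls back to the default, NO.
    return DELPHI_FLAGS.get(context, ComplexFlag.NO)

seen = []
for ctx, text in [("delphi", "begin x := 1;"),
                  ("delphi/comment", "{ note }"),
                  ("delphi/string", "'abc'")]:
    feed_code_parser(flag_for(ctx), text, seen.append)
print(seen)  # ['begin', ' ', 'x', ' ', ':=', ' ', '1;', '{ note }']
```

With NO as the default, contexts that nobody configured (the string context above) never reach the code parser at all, which is the cheapest possible path.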
 
(0000228)
BenBE
01-03-06 01:38

So far I haven't noticed any problems, as the source simply tests for comments, not for the stuff they consist of ;-)

Maybe there's still a bug in my DCP, but if so it isn't related to this issue; more likely it's something I forgot to include.
 
(0000234)
nigel
01-03-06 11:08

Yes, well, I only suggested that there might be a problem because I'm unfamiliar with how your code parser works and haven't got any ASM test cases lying around, so there may well not be any problems.

If there are problems, they're related to this bug but not directly covered by it, so this bug is now resolved.
 
(0000331)
nigel
02-18-06 11:47

Issue closed.
 

- Issue History
Date Modified   Username  Field                 Change
12-12-05 09:16  BenBE     New Issue
12-12-05 09:16  BenBE     Status                new => assigned
12-12-05 09:16  BenBE     Assigned To           => nigel
12-12-05 09:17  BenBE     Relationship added    related to 0000037
12-12-05 09:22  BenBE     Note Added: 0000132
12-12-05 09:35  nigel     Note Added: 0000135
12-12-05 11:46  BenBE     Note Added: 0000136
12-18-05 12:57  nigel     Note Added: 0000153
12-18-05 12:57  nigel     Status                assigned => feedback
12-18-05 13:12  BenBE     Note Added: 0000154
12-18-05 13:18  BenBE     Note Edited: 0000154
12-18-05 18:17  nigel     Note Added: 0000155
12-18-05 18:19  nigel     Note Added: 0000156
12-24-05 01:07  BenBE     Note Added: 0000192
12-24-05 15:16  nigel     Note Added: 0000195
12-24-05 15:16  nigel     Status                feedback => assigned
12-24-05 15:24  BenBE     Note Added: 0000200
12-24-05 21:50  nigel     Note Added: 0000206
01-02-06 18:45  nigel     Note Added: 0000223
01-03-06 01:38  BenBE     Note Added: 0000228
01-03-06 11:08  nigel     Note Added: 0000234
01-03-06 11:08  nigel     Status                assigned => resolved
01-03-06 11:08  nigel     Resolution            open => fixed
01-03-06 11:08  nigel     Fixed in Version      => 1.1.1alpha4
01-10-06 00:38  BenBE     Relationship added    child of 0000060
02-18-06 11:47  nigel     Status                resolved => closed
02-18-06 11:47  nigel     Note Added: 0000331

  


Mantis 1.0.0rc2
Copyright © 2000 - 2005 Mantis Group