GeSHi - Generic Syntax Highlighter Syntax Coloriser for PHP

Viewing Issue Advanced Details
ID: 0000068
Category: [GeSHi] core
Severity: minor
Reproducibility: always
Date Submitted: 02-10-06 09:44
Last Update: 02-18-06 11:44
Reporter: nigel
View Status: public
Assigned To: nigel
Priority: normal
Resolution: not fixable
Platform:
Status: closed
OS:
Projection: none
OS Version:
ETA: none
Fixed in Version: 1.1.1alpha4
Product Version: 1.1.1alpha3
Product Build:
Summary: 0000068: Support two-stage parsing
Description: The code parser should support two-stage parsing, for cases where it is required. Tim requires this for java.
Steps To Reproduce:
Additional Information:
Attached Files:

- Relationships
related to 0000039 (assigned, tim-w): Java support
child of 0000033 (assigned, BenBE): Highlight Labels after certain instructions

- Notes
02-10-06 10:01

What exactly do you mean by this?

What does he need it for?
02-10-06 13:10

Ben b = new Ben();

How do we detect the first Ben as a user-defined class?

We can detect it once we reach the second one, because that one comes after "new", but by then we've already gone past the first one, and it doesn't make sense to keep three or four contexts in the store.

However, it's quite natural to catch that Ben the next time if we go through again.

Don't worry - second stage won't be used by most languages, and so there won't be any performance hit, I'll see to that.
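
To make the idea concrete, here is a minimal sketch of what a second stage could look like for the `Ben b = new Ben();` case. It is written in Python purely for illustration and is not GeSHi code; the tokenizer and the function names (`tokenize`, `collect_class_names`, `classify`, `highlight_two_pass`) are hypothetical. The first pass remembers every identifier that follows `new`, and the second pass marks all occurrences of those names, including ones that appeared earlier.

```python
import re

# Hypothetical tokenizer: identifiers, whitespace runs, or any single character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\s+|.", re.DOTALL)


def tokenize(source):
    return TOKEN_RE.findall(source)


def collect_class_names(tokens):
    """First pass: remember every identifier that directly follows `new`."""
    names = set()
    previous = None
    for tok in tokens:
        if tok.isspace():
            continue  # whitespace does not separate `new` from the class name
        if previous == "new" and tok.isidentifier():
            names.add(tok)
        previous = tok
    return names


def classify(tokens, class_names):
    """Second pass: mark every occurrence of a known class name."""
    return [(tok, "class" if tok in class_names else "plain") for tok in tokens]


def highlight_two_pass(source):
    tokens = tokenize(source)  # note: the whole token list stays in memory
    return classify(tokens, collect_class_names(tokens))
```

Calling `highlight_two_pass("Ben b = new Ben();")` tags both occurrences of `Ben` as `class`. The cost is visible in `highlight_two_pass`: every token has to be kept around until the first pass has finished.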
02-11-06 03:05

Well it seems as if this might be an option for Label detection for delphi/asm ;-)

i.e. for Delphi

    MOV EAX, 1337 //Be 1337
    INC EAX //Get more 1337
    JMP foo //<-- Label

A problem I see arising is the performance *g*
02-11-06 11:18

As I said, performance won't be a problem if you don't use it.

Also, I don't think that's a good way of detecting labels, since it will only detect labels that are jumped to, not ones that exist but are never jumped to ;). You're better off detecting [label syntax]:. Though a second pass would let you catch this:

   JMP label
02-11-06 12:56
edited on: 02-11-06 13:11

No. This way I could detect labels correctly in every case, and your source is the actual proof ;-)

The detection of labels would (as you mention) look for contexts already highlighted as labels and store them on the first pass. The second pass would then recheck each unknown context block to see whether it might be a label and, if it is, turn it into one.

Having my source as input would just highlight everything correctly, as all labels are known before they get referenced (e.g. in a MOV, JMP, ADD or other ASM instruction argument).

If the input source were the one you specified, the first pass would not know that "label" has to be highlighted as a label, AND trying to detect it with a preg_match beforehand would probably cause trouble. Thus multi-pass would be the solution.

Maybe you should combine the multi-pass with the CP-dependent IsComplex flag I suggested a while ago. This way you'd save a lot of time, as you won't have to go through a block twice when you needn't (e.g. an ASM block contains no labels to highlight).

02-11-06 16:16

I have now looked at this. Unfortunately, it would be impossible to implement :(.

This is because of the way things work, to reduce memory usage:

token found - passed to code parser - passed to renderer - final result stored
token found - passed to code parser - passed to renderer - final result stored
token found - passed to code parser - passed to renderer - final result stored

So each time, the memory for the token is cleared. There may be a bit of a hold-up in the code parser if the parser keeps some tokens on the stack, but memory usage isn't high.

But in order to pass everything through again, we would have to store all the tokens in a huge array again, which defeats the point of doing it this way in the first place.
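
The trade-off can be illustrated with a small sketch. This is Python for illustration only, not GeSHi's actual PHP pipeline; `KEYWORDS`, `tokens`, `parse`, `render` and `highlight_streaming` are all hypothetical names. Each token is produced lazily, parsed, rendered and then dropped, so only the rendered output accumulates.

```python
import html
import re

KEYWORDS = {"new"}  # assumption: a tiny keyword set, just for the demo


def tokens(source):
    """Yield one token at a time; nothing is collected into a list."""
    for match in re.finditer(r"\S+|\s+", source):
        yield match.group()


def parse(token):
    """Stand-in for the code parser: decide what kind of token this is."""
    return ("keyword" if token in KEYWORDS else "plain", token)


def render(kind, text):
    """Stand-in for the renderer: produce the final markup for one token."""
    escaped = html.escape(text)
    if kind == "keyword":
        return f'<span class="kw">{escaped}</span>'
    return escaped


def highlight_streaming(source):
    out = []
    for token in tokens(source):
        kind, text = parse(token)
        out.append(render(kind, text))  # only the final result is stored
        # `token`, `kind` and `text` are rebound on the next iteration,
        # so the parsed form of earlier tokens is no longer reachable
    return "".join(out)
```

A second pass over this design would need the `(kind, text)` pairs for the whole input to still exist after the loop, which is exactly the large array the streaming approach is meant to avoid.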

So instead, I have hacked the java code parser to use the stack and some handy references to highlight those class names that would be hard to catch, and also highlight variables, interfaces and methods.

You'll have to think of some other way to do the label thing :p. You might have to just be simple about it:

if matches label regex (so is like foo:) then it is label
if after JMP and other such statements then it is label

Don't forget, we don't care if the source is not correct for the language. So we can use that as an advantage when writing code parsers, and not bother about little things like the method/function/label actually existing.
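
The two simple rules above might be sketched as follows. Again this is Python for illustration, not GeSHi code; the label pattern, the list of jump mnemonics and the name `find_labels` are assumptions chosen for the example, and a real code parser would use whatever the language actually allows. Both rules look only at the current line, so no second pass and no stored token array are needed.

```python
import re

# Assumption: a label name starts with a letter, _, @, . or $.
LABEL_NAME = r"[A-Za-z_@.$][\w@.$]*"

# Rule 1: something like `foo:` at the start of a line is a label.
LABEL_DEF_RE = re.compile(r"^\s*(" + LABEL_NAME + r"):")

# Rule 2: the operand after JMP and similar statements is a label.
# Assumption: this short mnemonic list is only an example.
JUMP_RE = re.compile(
    r"^\s*(?:JMP|JE|JNE|JZ|JNZ|CALL)\s+(" + LABEL_NAME + r")",
    re.IGNORECASE,
)


def find_labels(line):
    """Return label names found on one line, using only local context."""
    found = []
    for pattern in (LABEL_DEF_RE, JUMP_RE):
        match = pattern.match(line)
        if match:
            found.append(match.group(1))
    return found
```

For example, `find_labels("    JMP foo //<-- Label")` returns `["foo"]` whether or not `foo:` is ever defined, which matches the point that the highlighter does not need to check that the source is correct.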
02-18-06 11:44

Issue closed.

- Issue History
Date Modified Username Field Change
02-10-06 09:44 nigel New Issue
02-10-06 09:44 nigel Status new => assigned
02-10-06 09:44 nigel Assigned To  => nigel
02-10-06 09:45 nigel Relationship added related to 0000039
02-10-06 10:01 BenBE Note Added: 0000301
02-10-06 13:10 nigel Note Added: 0000302
02-11-06 03:05 BenBE Note Added: 0000303
02-11-06 03:08 BenBE Relationship added child of 0000033
02-11-06 11:18 nigel Note Added: 0000304
02-11-06 12:56 BenBE Note Added: 0000305
02-11-06 13:11 BenBE Note Edited: 0000305
02-11-06 16:16 nigel Note Added: 0000306
02-11-06 16:16 nigel Status assigned => resolved
02-11-06 16:16 nigel Resolution open => not fixable
02-11-06 16:16 nigel Fixed in Version  => 1.1.1alpha4
02-18-06 11:44 nigel Status resolved => closed
02-18-06 11:44 nigel Note Added: 0000325

