GeSHi Bug Tracker - GeSHi
Viewing Issue Advanced Details
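ID: 68
Category: core
Severity: minor
Reproducibility: always
Date Submitted: 02-10-06 09:44
Last Update: 02-18-06 11:44
Reporter: nigel
Assigned To: nigel
Priority: normal
Status: closed
Product Version: 1.1.1alpha3
Resolution: not fixable
Projection: none
ETA: none
Fixed in Version: 1.1.1alpha4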
0000068: Support two-stage parsing
The code parser should support two-stage parsing for the cases that require it. Tim needs this for Java.
related to 0000039 (assigned, tim-w): Java support
child of 0000033 (assigned, BenBE): Highlight Labels after certain instructions

Notes
(0000301)
BenBE   
02-10-06 10:01   
What exactly do you mean by this?

What does he need it for?
(0000302)
nigel   
02-10-06 13:10   
Ben b = new Ben();

How do we detect the first Ben as a user-defined class?

We can detect it after the second one because it's after "new", but by then we've already gone past the first one and it doesn't make sense to store three or four contexts in the store.

However, it's quite natural to catch that Ben the next time if we go through again.

Don't worry - second stage won't be used by most languages, and so there won't be any performance hit, I'll see to that.
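
A minimal sketch of the two-stage idea described in this note, written in Python for illustration only. It is not GeSHi code: the tokenizer pattern and the function names `collect_class_names` and `classify` are hypothetical. The first stage remembers every identifier that follows `new`; the second stage goes through the tokens again and marks every occurrence of those names, including the ones that came before the `new`.

```python
import re

# Hypothetical token patterns: an identifier, or any single non-space character.
IDENT = re.compile(r"[A-Za-z_]\w*")
TOKEN = re.compile(r"[A-Za-z_]\w*|\S")


def collect_class_names(source):
    """Stage one: remember each identifier that directly follows `new`."""
    tokens = TOKEN.findall(source)
    return {
        nxt
        for prev, nxt in zip(tokens, tokens[1:])
        if prev == "new" and IDENT.fullmatch(nxt)
    }


def classify(source):
    """Stage two: walk the tokens again and tag every known class name."""
    known = collect_class_names(source)
    return [
        (tok, "class" if tok in known else "other")
        for tok in TOKEN.findall(source)
    ]


result = classify("Ben b = new Ben();")
```

Here both occurrences of `Ben` end up tagged as `class`, while `b` stays `other`. Note that the sketch tokenizes the source twice, which is the cost the rest of this thread argues about.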
(0000303)
BenBE   
02-11-06 03:05   
Well, it seems this might be an option for label detection in Delphi/asm ;-)

i.e. for Delphi:

asm
    MOV EAX, 1337 //Be 1337
foo:
    INC EAX //Get more 1337
    JMP foo //<-- Label
end;

A problem I see arising is the performance *g*
(0000304)
nigel   
02-11-06 11:18   
As I said, performance won't be a problem if you don't use it.

Also, I don't think that's a good way of detecting labels, since it will only detect labels that are jumped to, not ones that exist but are never jumped to ;). You're better off detecting [label syntax]:. Though a second pass would let you catch this:

...
   JMP label
...
label:
(0000305)
BenBE   
02-11-06 12:56   
No. This way I could detect labels correctly in the first place, and your source is the actual proof ;-)

The label detection would (as you mention) look for contexts already highlighted as labels and store them on the first pass. The second pass would then recheck each unknown context block to see whether it might be a label and, if so, turn it into one.

Having my source as input would just highlight everything correctly, as all labels are known before they get referenced (e.g. in a MOV, JMP, ADD or other ASM instruction argument).

If the input were the source you posted, the first pass would not yet know that "label" has to be highlighted as a label, AND trying to detect it beforehand with a preg_match would probably cause trouble. So multi-pass would be the solution.

Maybe you should combine the multi-pass with the CP-dependent IsComplex flag I suggested a while ago. That way you'd save a lot of time, since you wouldn't have to go through a block twice when you don't need to (e.g. when an ASM block contains no labels to highlight).
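
The scheme proposed in this note could be sketched roughly as follows. This is an illustrative Python sketch, not GeSHi's implementation: the regular expressions, the `//` comment handling, and the names `find_labels` and `label_references` are assumptions. The first pass stores every `name:` definition; the second pass rechecks the remaining words and reports those that match a stored label. The early return when no labels were found stands in for the kind of "skip the second pass" flag suggested above.

```python
import re

# Assumed label syntax: an identifier followed by a colon at the start of a line.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_]\w*):", re.MULTILINE)
WORD = re.compile(r"[A-Za-z_]\w*")


def find_labels(asm):
    """First pass: collect every name defined as `name:`."""
    return set(LABEL_DEF.findall(asm))


def label_references(asm):
    """Second pass: report (line number, name) for each use of a known label."""
    labels = find_labels(asm)
    if not labels:
        # Nothing to resolve, so the second pass is skipped entirely.
        return []
    refs = []
    for lineno, line in enumerate(asm.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing // comments
        if LABEL_DEF.match(code):
            continue  # a definition line (skipped whole, for simplicity)
        refs.extend((lineno, w) for w in WORD.findall(code) if w in labels)
    return refs


forward = "    JMP label\n    INC EAX\nlabel:\n"
refs = label_references(forward)
```

With the forward-reference input from the previous note, `refs` contains the use of `label` on line 1 even though the definition only appears on line 3.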

(0000306)
nigel   
02-11-06 16:16   
I have now looked at this. Unfortunately, it would be impossible to implement :(.

This is because of the way things work, to reduce memory usage:

token found - passed to code parser - passed to renderer - final result stored
token found - passed to code parser - passed to renderer - final result stored
token found - passed to code parser - passed to renderer - final result stored
....

So each time the memory for the token is cleared. There may be a bit of a hold up in the code parser if the parser stores some on the stack, but memory usage isn't high.

But in order to pass everything through again, we would have to store all the tokens in a huge array again, which defeats the point of doing it this way in the first place.
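
To make the trade-off concrete, here is a rough Python sketch of the two shapes, using generators. It only illustrates the memory argument; the whitespace tokenizer, the tag output, and all function names are hypothetical and have nothing to do with GeSHi's real classes. In the streaming version each token flows through parser and renderer and only the rendered result is kept. In the two-pass version the full token list has to be held in memory before anything can be rendered.

```python
def tokens(source):
    """Yield one whitespace-separated token at a time (toy tokenizer)."""
    for tok in source.split():
        yield tok


def parse(stream):
    """Tag each token as it arrives; nothing is kept between tokens."""
    for tok in stream:
        yield (tok, "keyword" if tok == "new" else "text")


def render(stream):
    """Turn each (token, kind) pair into its final output form."""
    for tok, kind in stream:
        yield f"<{kind}>{tok}</{kind}>"


def highlight_streaming(source):
    # token found -> code parser -> renderer -> final result stored
    return " ".join(render(parse(tokens(source))))


def highlight_two_pass(source):
    buffered = list(tokens(source))  # the "huge array": every token kept
    known = {b for a, b in zip(buffered, buffered[1:]) if a == "new"}
    parsed = ((t, "class" if t in known else "text") for t in buffered)
    return " ".join(render(parsed))
```

The second function gets the early `B` in `B x new B` right, but only because `buffered` holds the whole input at once.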

So instead, I have hacked the Java code parser to use the stack and some handy references to highlight those class names that would be hard to catch, and also to highlight variables, interfaces and methods.

You'll have to think of some other way to do the label thing :p. You might have to just be simple about it:

if matches label regex (so is like foo:) then it is label
if after JMP and other such statements then it is label

Don't forget, we don't care if the source is not correct for the language. So we can use that as an advantage when writing code parsers, and not bother about little things like the method/function/label actually existing.
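
The two simple rules above fit in a single pass. A small Python sketch under stated assumptions: the mnemonic list in `JUMP_ARG`, the `//` comment handling, and the function name `labels_on_line` are made up for illustration and are not taken from GeSHi.

```python
import re

# Rule 1: the line looks like `foo:`.
LABEL_DEF = re.compile(r"^\s*([A-Za-z_]\w*):")
# Rule 2: an identifier directly after a jump-like instruction.
# The mnemonic list is an assumption, not a complete set.
JUMP_ARG = re.compile(
    r"^\s*(JMP|JE|JNE|JZ|JNZ|CALL)\s+([A-Za-z_]\w*)", re.IGNORECASE
)


def labels_on_line(line):
    """Return the names on this line that should be highlighted as labels."""
    code = line.split("//", 1)[0]  # ignore trailing // comments
    m = LABEL_DEF.match(code)
    if m:
        return [m.group(1)]
    m = JUMP_ARG.match(code)
    if m:
        return [m.group(2)]
    return []
```

As the note says, this does not check that the label exists: `JMP nowhere` still highlights `nowhere`. It will also treat a register operand such as `JMP EAX` as a label, which a real code parser would need to exclude.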
(0000325)
nigel   
02-18-06 11:44   
Issue closed.