GeSHi Bug Tracker - GeSHi
109 lang feature N/A 02-11-07 11:20 02-15-07 05:35
assigned 1.1.2alpha3  
0000109: Common language names for ASM class languages
As there are multiple compilers\assembler notations for most target platforms (x86, z80, ...) we should create a unique naming scheme for all of them.
This grouping should be based on the target platform rather than on the program name or on a single general "asm" language.
parent of 0000107 assigned BenBE Z80 Assembly language (1.1.2) 
Not all the children of this issue are yet resolved or closed.

02-11-07 11:38   
Quote of #geshi IRC channel @

[23:00:17] BenBE: Regarding ASM I've a short question ...
[23:01:41] waawaamilk: yo
[23:02:08] BenBE: As ASM is mainly a family of languages and mostly its name represents the class of processors it is targeted on instead of the compiler it is run with, there are mostly multiple flavours of one and the same ASM\CPU target, but multiple compilers with multiple, different styles of noting the commands\instructions ...
[23:02:36] BenBE: I might have asked this before, but what would you suggest on the handling inside the language files?
[23:02:57] waawaamilk: err
[23:03:01] waawaamilk: you lost me :)
[23:03:05] BenBE: As of x86 assembly I know of 3 different notations already ...
[23:03:18] waawaamilk: well
[23:03:32] waawaamilk: asm/x86_notation1, asm/x86_notation2 ...
[23:04:00] BenBE: It's that stuff of:
 and _mov $eax, $eax on the next compiler ...
[23:04:26] BenBE: Well. I mainly prefer x86/programname ...
[23:04:58] BenBE: thus x86/nasm, x86/tasm, x86/masm ...
[23:05:32] waawaamilk: I'm not adding sub sub languages, but it might already be supported
[23:05:39] waawaamilk: if you want that, you can poke the source
[23:05:58] BenBE: I ask, because Knut already noted something regarding this in Note: 0000494.
[23:06:08] waawaamilk: 494?
[23:06:15] BenBE: Bug note 494.
[23:06:27] BenBE: # gives the bug ID, ~ the Bug note ID.
[23:06:44] waawaamilk: oh...
[23:07:15] BenBE: It's 0000106 if you need.
[23:07:29] waawaamilk: yep, found it
[23:07:44] waawaamilk: I don't see how he's asking for sub sub languages there...
[23:07:50] waawaamilk: he's asking which should be the default
[23:07:54] * waawaamilk replied below...
[23:09:05] BenBE: As of GeSHi 1.2 I'd deprecate using asm as the language name as ASM is only a class of languages, rather than a language itself.
[23:09:39] waawaamilk: oh, so rather than sub sub languages
[23:10:03] BenBE: Thus I'd promote having asm class languages called by their target platform instead.
[23:10:11] waawaamilk: right
[23:10:24] BenBE: Should we add a suffix 'asm'???
[23:10:31] waawaamilk: example names being as above by you
[23:10:35] waawaamilk: 11:06 <BenBE> thus x86/nasm, x86/tasm, x86/masm ...
[23:10:47] BenBE: i.e. x86asm, z80asm, ...???
[23:10:52] BenBE: Or use it without?
[23:11:02] waawaamilk: what is z80?
[23:11:09] waawaamilk: it's a platform?
[23:11:55] BenBE: z80 is an 8-bit CPU of ZiLOG Inc. that's used e.g. in the Texas Instruments TI83+ calculator.
[23:12:19] waawaamilk: aah
[23:12:31] waawaamilk: okay, I think it should be with the asm
(Times are GMT+0100 if one cares :P)

Thus the following points should be remarked:
- The general language names like asm are deprecated as of GeSHi 1.2
- ASM languages should be called by their target platform if applicable
- using the program name as language name is deprecated
- The common language of a target platform should be the notation that has the most syntax in common with the others (i.e. shares the most properties in its notation).
- The name of the Assembly language should be its target platform with the string 'asm' appended

Open questions:
- How to handle platforms with lots of different subsets of supported instructions? (i.e. x86: 16bit vs. 32bit vs. 64bit; Intel vs. AMD)

Proposed platform names:
- x86: 8086, 80286, 80386, 80486, Pentium, AMD K?, AMD XP, AMD X2, ...
- z80: ZiLOG Z80180, Z80280, Z80380
- 68k: 68000, 68020
- ppc: PowerPC (G1 to G5)
- arm: ARM based CPUs and instruction sets
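
The naming rule above (target platform with 'asm' appended) can be sketched roughly as follows. This is only an illustration in Python, not GeSHi code: the table `PLATFORMS` and the function `language_name` are hypothetical names, and the CPU lists are a partial copy of the proposal above.

```python
# Hypothetical lookup table built from part of the proposed platform list.
# Neither the table nor the function exists in GeSHi.
PLATFORMS = {
    "x86": ["8086", "80286", "80486"],
    "z80": ["Z80180", "Z80280", "Z80380"],
    "68k": ["68000", "68020"],
}


def language_name(cpu):
    """Return the proposed language name: target platform plus 'asm'."""
    for platform, cpus in PLATFORMS.items():
        if cpu in cpus:
            return platform + "asm"
    raise KeyError("no platform known for CPU %r" % cpu)


print(language_name("80286"))   # x86asm
print(language_name("Z80180"))  # z80asm
```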
02-11-07 11:41   
Looks good. One minor note is:

11:24 <waawaamilk> just a note: it's very likely I'll add aliasing supports, so people can specify 'x86/masm' as a language and get some other language

So if the language names appear to get quite long, in future it will be possible that a bunch of shorter names are made available that alias to the longer ones. This bug is about laying the languages out on disk as much as about what they are called by GeSHi users.
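
A minimal sketch of how such aliasing might work, written in Python for illustration only. The alias table, the target names and `resolve_language` are assumptions made up for this example; the actual aliasing support had not been written when this note was posted.

```python
# Hypothetical alias table: the name a user types -> the full language name.
# The entries are invented for illustration.
ALIASES = {
    "x86/masm": "x86asm/masm",
    "nasm": "x86asm/nasm",
}


def resolve_language(name):
    """Return the full language name for an alias, or the name unchanged."""
    return ALIASES.get(name, name)


print(resolve_language("x86/masm"))  # x86asm/masm
print(resolve_language("z80asm"))    # z80asm (not an alias, passed through)
```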
02-11-07 11:59   
Two further notes:

[23:41:19] BenBE: In ASM it might become common usage to have a common.php define all instructions, registers, ... group them as supported by the CPU and have the final languages that are called only assemble the actual language using these requisites ...
[23:42:15] waawaamilk: that seems reasonable
[23:43:40] BenBE: I.e. for x86 you had a file x86asm/common.php only defining one big array $langdata or something and the actual language file e.g. x86asm/nasm/80286.php copies values as the CPU supports them ...
[23:45:12] waawaamilk: sure, though you should probably get the big array from a function instead of an actual array, for namespace safety
(Times as usual in GMT+0100 ;-))

[23:46:16] BenBE: BTW: do you implement any means protecting files called common.php against being used as language files directly?
[23:46:43] waawaamilk: I think that's protected already
[23:46:52] waawaamilk: and if not, I'm sure there's a todo in the source about it
[23:48:09] BenBE: k, just asking for it so that language files can use the common.php to store such settings without risking it to be abused as a totally screwed language file ...

What's this note about:
This note makes two points. First, to reduce file size and redundancy, assembler-class languages should make heavy use of the common.php file: keywords should be defined only in common.php, and each sublanguage (ASM flavour) only copies them from there.

Second, the languages should be organized to reflect CPUs and their instruction set differences in as much detail as possible.
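
The common-data idea discussed above could look roughly like this. It is a Python sketch for illustration, not the PHP that GeSHi language files would use. `common_langdata`, `GENERATIONS` and `keywords_for` are hypothetical names, the instruction lists are a tiny sample, and treating each CPU as a strict superset of its predecessors is a simplifying assumption.

```python
def common_langdata():
    """Return sample x86 keywords grouped by the CPU that introduced them.

    The data comes from a function rather than a shared global, so every
    caller gets its own copy (the "namespace safety" point made on IRC).
    """
    return {
        "8086": ["mov", "add", "jmp"],
        "80286": ["lmsw", "arpl"],
        "80386": ["bsf", "movzx"],
    }


# Assumed ordering: each CPU supports everything its predecessors do.
GENERATIONS = ["8086", "80286", "80386"]


def keywords_for(cpu):
    """Collect the keywords a given CPU supports from the common data."""
    data = common_langdata()
    last = GENERATIONS.index(cpu)
    keywords = []
    for generation in GENERATIONS[: last + 1]:
        keywords.extend(data[generation])
    return keywords


# What a flavour file for the 80286 would copy from the common data.
print(keywords_for("80286"))
```

A per-CPU language file would then hold only the call to `keywords_for`, plus whatever is specific to its notation.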
02-13-07 07:56   
Reminder sent to: Knut


As you are working on the ASM languages, I'd ask you to read through this issue and its comments and comment on it. It would be nice to have an answer ASAP to see if you agree with me and nigel on this.

02-14-07 10:30   

I mostly agree, but another option might be to have the languages in an asm folder, like:

and so on...

But still have an extension to geshi, which makes "asm" illegal as a language, with an error message that you must specify asm/*.

Any opinions?
02-14-07 10:48   
I'm not a fan of that idea, it means writing extra core code. How do you tell when a language name is illegal, for example? I think the existing infrastructure should just be leveraged where possible.

(I am lazy :) )
02-15-07 05:35   
I'm against this solution with a "virtual" ASM language too, as ASM itself is no language, but a class of languages. Furthermore, the solution you suggested requires unnecessary code that could be avoided by organizing the languages as mentioned before, using the target platform as the language name. I also oppose it because it creates language identifiers that are unnecessarily long when you think of instruction subsets like asm/x86/amd64/tasm32 - remark: every slash is one directory lookup including a security check!!!
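
To make the cost argument concrete, here is a rough Python sketch that counts the lookups a nested language name implies. The per-segment rule (lowercase letters, digits and underscores) is an assumption standing in for whatever security check GeSHi actually performs, and `directory_lookups` is a hypothetical helper.

```python
import re

# Assumed per-segment check; GeSHi's real validation may differ.
SEGMENT = re.compile(r"[a-z0-9_]+")


def directory_lookups(name):
    """Validate every path segment and return the number of slashes,
    i.e. the number of directory lookups the name requires."""
    segments = name.split("/")
    for segment in segments:
        if not SEGMENT.fullmatch(segment):
            raise ValueError("illegal segment %r in %r" % (segment, name))
    return len(segments) - 1


print(directory_lookups("asm/x86/amd64/tasm32"))  # 3
print(directory_lookups("x86asm/tasm"))           # 1
```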