Frankenstella, a PHP compiler
Friday, September 03, 2004
 
Parser Generators - For Posterity and all that
Firstly, it must be said that there is no substitute for the Dragon Book. Having said that, reading the Dragon Book cover to cover is not the way to go either. It helps to try things out at a black-box level rather than delve into the mechanics of, say, Lex & Yacc.

What follows will only be helpful once you have at least Chapter 2 knowledge of the Dragon Book.

The Free Compiler Construction Tools page is a one-stop shop for all your open-source and free parser generators.

ANTLR is probably the most widely used parser generator, and it improves on the lex/yacc combo in 2 ways:
1. It uses EBNF-style grammars.
2. The generated code is more intuitive and understandable. The messy numeric states produced by lex/yacc are thankfully missing.
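To give a feel for point 2: ANTLR builds recursive-descent parsers, where each grammar rule becomes a function and an EBNF repetition `( ... )*` becomes a plain loop. Below is a hand-written C sketch of that shape for a tiny, made-up expression grammar. It is not actual ANTLR output, and it skips error handling entirely.

```c
#include <ctype.h>

/* Made-up grammar, written EBNF-style:
 *   expr : term (('+' | '-') term)* ;
 *   term : NUMBER ;
 * Each rule becomes one function; the ( ... )* becomes a while loop. */

static const char *cursor;  /* current position in the input */

static void skip_spaces(void) {
    while (*cursor == ' ') cursor++;
}

/* term : NUMBER ; (reads a run of decimal digits) */
static long term(void) {
    long v = 0;
    skip_spaces();
    while (isdigit((unsigned char)*cursor)) {
        v = v * 10 + (*cursor - '0');
        cursor++;
    }
    return v;
}

/* expr : term (('+' | '-') term)* ; */
static long expr(void) {
    long v = term();
    skip_spaces();
    while (*cursor == '+' || *cursor == '-') {  /* the ( ... )* part */
        char op = *cursor++;
        long rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
        skip_spaces();
    }
    return v;
}

/* Entry point: parse and evaluate a whole string. */
static long parse(const char *input) {
    cursor = input;
    return expr();
}
```

Compare that with a yacc-generated parser, where the same grammar ends up as numbered states in a table.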

The ANTLR link given above points to some pretty nifty resources as well.

This article discusses parser generators. It is heavily biased towards ANTLR, but gives good insight into the playing field.

This article on Compiler Front End and Infrastructure Software gives a rundown of infrastructures aimed at research applications as well as commercial compiler construction toolkits.

Incidentally, ANTLR is a second-generation compiler toolkit that evolved from the older PCCTS.

The techniques used in the parser can be demystified somewhat by using Visual YACC.

For those who want to experiment with the grandfather of all toolkits, A Compact Guide to Lex & Yacc is the way to go.

That's a distillation of the links that should give you an overall idea of what is out there in terms of compiler toolkits. But all of this is relevant only to the front end of a compiler. Back-end code generation is much harder to get at, though it is less complicated. But that will follow later.
Friday, July 23, 2004
 
Structure of zend_op_array
The things that seem important to us:
This is the biggy; this is where all the goodies are hidden!
Note that although vld_dump_oparray prints everything out nicely, it calls vld_dump_op to do the real work, and that is what accesses this.

definition:
  • opcode -> the op itself, stored as a number (VLD maps it to a name)
  • result -> the result, duh
  • op1 -> first operand
  • op2 -> second operand, double duh
  • extended_value -> extra info, if the op has any
  • lineno -> the source line number, need I say it?
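To make the field list concrete, here is a sketch with simplified stand-in structs. All the `_sketch` names are hypothetical; the real zend_op, znode and zend_op_array are declared in the Zend headers and carry more fields. The op array holds a pointer to its ops plus a count (the `last` field is an assumption of this sketch), and the dump walks them one at a time, in the spirit of vld_dump_oparray handing each op to vld_dump_op. The opcode numbers and names are invented.

```c
#include <stdio.h>

typedef unsigned char zend_uchar;   /* as in zend_types.h */

/* Hypothetical stand-in for znode; the real one is a tagged union. */
typedef struct {
    int op_type;    /* which kind of operand this is */
    long value;     /* stand-in for the operand's payload */
} znode_sketch;

/* Stand-in for zend_op, with the fields listed above. */
typedef struct {
    zend_uchar opcode;              /* numeric code, not a name */
    znode_sketch result;
    znode_sketch op1;
    znode_sketch op2;
    unsigned long extended_value;
    unsigned int lineno;
} zend_op_sketch;

/* Stand-in for zend_op_array: the ops plus an assumed count. */
typedef struct {
    zend_op_sketch *opcodes;
    unsigned int last;              /* number of ops in opcodes[] */
} zend_op_array_sketch;

/* Invented opcode numbers and names, for illustration only. */
static const char *op_name_sketch(zend_uchar opcode) {
    switch (opcode) {
    case 1:  return "ADD";
    case 2:  return "ECHO";
    case 3:  return "RETURN";
    default: return "UNKNOWN";
    }
}

/* One op per line, in the spirit of vld_dump_op. */
static void dump_op_sketch(unsigned int nr, const zend_op_sketch *op) {
    printf("%3u line %-4u %-8s result=%ld op1=%ld op2=%ld ext=%lu\n",
           nr, op->lineno, op_name_sketch(op->opcode),
           op->result.value, op->op1.value, op->op2.value,
           op->extended_value);
}

/* Walk every op, in the spirit of vld_dump_oparray; returns ops printed. */
static unsigned int dump_op_array_sketch(const zend_op_array_sketch *opa) {
    unsigned int i;
    for (i = 0; i < opa->last; i++) {
        dump_op_sketch(i, &opa->opcodes[i]);
    }
    return i;
}
```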

Monday, July 12, 2004
 
Hack for user-defined functions & classes
At the original incision point in zend.c, right after the call to vld_dump_oparray, add:
zend_hash_apply (CG(function_table), (apply_func_t) vld_dump_fe TSRMLS_CC);
zend_hash_apply (CG(class_table), (apply_func_t) vld_dump_cle TSRMLS_CC);

#include zend_hash.c (maybe this isn't really necessary)

In srm_oparray.c, include definitions for:
int vld_check_fe (zend_op_array *fe, zend_bool *have_fe TSRMLS_DC);
int vld_dump_fe (zend_op_array *fe TSRMLS_DC);
int vld_dump_cle (zend_class_entry *class_entry TSRMLS_DC);

Don't forget to declare them in srm_oparray.h.
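The zend_hash_apply calls above walk a table and hand every entry to a callback. Here is a minimal sketch of that shape with hypothetical stand-ins (function_sketch, table_apply, dump_fe_sketch); the real HashTable and zend_hash_apply are much richer. As far as I can tell, CG(function_table) holds built-in (internal) functions right next to user-defined ones, and only user functions have an op array, so a vld_dump_fe-style callback has to check the type before dumping. Built-ins are compiled C and have no bytecode to show, which bears on issue 2 below. Returning 0 from the callback stands in for "keep the entry and carry on"; that is an assumption of this sketch.

```c
#include <stdio.h>

#define SKETCH_APPLY_KEEP 0              /* hypothetical "keep going" code */

enum fn_kind { FN_INTERNAL, FN_USER };   /* made-up stand-in tags */

/* Hypothetical stand-in for a function-table entry. */
typedef struct {
    const char *name;
    enum fn_kind kind;
    unsigned int op_count;   /* only meaningful for user functions */
} function_sketch;

typedef int (*apply_func_sketch)(void *entry);

/* Call func on every entry, the way zend_hash_apply walks a table. */
static void table_apply(function_sketch *table, size_t n,
                        apply_func_sketch func) {
    for (size_t i = 0; i < n; i++) {
        func(&table[i]);
    }
}

static unsigned int dumped = 0;   /* how many functions were dumped */

/* Callback in the spirit of vld_dump_fe: skip built-ins, dump user code. */
static int dump_fe_sketch(void *entry) {
    function_sketch *fe = entry;
    if (fe->kind == FN_USER) {
        printf("function %s: %u ops\n", fe->name, fe->op_count);
        dumped++;
    }
    return SKETCH_APPLY_KEEP;
}
```

With a table holding one built-in and two user functions, only the two user functions get printed.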

Issues
1. Had to take the 3 static declarations out; don't know why. Maybe I should not have them declared in the header?
2. This only gives user-defined functions... What about the built-in functions and ...??
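About issue 1, a likely explanation based on how C linkage works (a guess, not something checked against this build): a prototype in srm_oparray.h without `static` gives the function external linkage, and the C standard makes it undefined for the same name to get internal linkage (`static`) later in the same file. GCC rejects it outright with an error along the lines of "static declaration of ... follows non-static declaration". Keeping `static` and dropping the header prototype would not work either, because zend.c passes vld_dump_fe to zend_hash_apply and therefore needs a function with external linkage. The name vld_dump_fe_sketch is hypothetical.

```c
#include <stdio.h>

/* What srm_oparray.h would contain: a plain prototype. With no
 * storage class it has external linkage, so zend.c can use it. */
int vld_dump_fe_sketch(int op_count);

/* The definition must match. Writing "static int vld_dump_fe_sketch"
 * here, after the non-static prototype, mixes internal and external
 * linkage for one name in one file; GCC refuses to compile that. */
int vld_dump_fe_sketch(int op_count) {
    printf("dumping %d ops\n", op_count);
    return 0;
}
```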


Thursday, July 08, 2004
 
Big Picture Again
An unfortunate spate of googling has uncovered the ugly fact that various twisted guys have even more convoluted notions of how to make a PHP encoder. It would be worth your while to summarise their methods for getting at bytecode.

The whole thing was a bit off-putting, so I closed the Google windows and have lost all the refs. Good luck.

Oh, and that URL, btw:
http://www.coggeshall.org/projects.php
 
Ugly Hack for byte code
Used Derick's VLD code:
1. Add srm_oparray.c & srm_oparray.h to the Zend source files.
2. The fella you want is vld_dump_oparray in srm_oparray.c, so plug him into zend.c.

Derick has used the not-so-good method of #include php.h for everything, which causes heaps of problems, possibly because the config has changed. So I spent a luvrly 2 hours figuring out the exact references for the Zend constants he has used.

It turns out that only zend.h, zend_compile.h and zend_types.h are needed, so include those fellas instead of php.h in both srm_oparray.c & srm_oparray.h.

Don't forget to include srm_oparray.h in zend.h

VLD uses an extension, ext/standard/url.h, for string processing, which is needed for var names and such. This is problematic...

Not any more...

Lessons you can avoid learning:

1. It's bad to have duplicate definitions of functions, really really bad...
2. When someone has gone to the trouble of making efree, respect him and use it!

Yet another ugly hack: renaming php_url_encode to php_url_encode_hack & hexchars[] to hexchars2[]... wot to do.
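Renaming works. Another way to keep a private copy from colliding with ext/standard (lesson 1) is to give it `static` linkage, so the name never leaves its file. Below is a minimal standalone sketch that assumes the rules PHP documents for urlencode(): a space becomes `+`, letters, digits and `-_.` pass through, and everything else becomes `%XX`. The names url_encode_hack and hexchars_hack are hypothetical, and malloc/free stand in for emalloc/efree, which is what you would use inside Zend (lesson 2).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Private hex table; "static" keeps it out of the global symbol space,
 * so it cannot clash with a hexchars defined elsewhere. */
static const unsigned char hexchars_hack[] = "0123456789ABCDEF";

/* Space -> '+', alphanumerics and "-_." unchanged, anything else -> %XX.
 * Returns a malloc'd string; the caller frees it. */
static char *url_encode_hack(const char *s, size_t len, size_t *new_len) {
    char *out = malloc(len * 3 + 1);   /* worst case: every byte -> %XX */
    size_t y = 0;
    if (out == NULL) return NULL;
    for (size_t x = 0; x < len; x++) {
        unsigned char c = (unsigned char)s[x];
        if (c == ' ') {
            out[y++] = '+';
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            out[y++] = (char)c;
        } else {
            out[y++] = '%';
            out[y++] = (char)hexchars_hack[c >> 4];   /* high nibble */
            out[y++] = (char)hexchars_hack[c & 15];   /* low nibble */
        }
    }
    out[y] = '\0';
    if (new_len != NULL) *new_len = y;
    return out;
}
```

For example, encoding "a b&c" gives "a+b%26c", since '&' is byte 0x26.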
Sunday, June 27, 2004
 
Maybe Not... Chance: Advance to byte code
http://www.derickrethans.nl/vld.php
Thursday, June 24, 2004
 
Back to square one
Sighhhh... it's not workin :'(

aaaaarrghaarghhhhhh crap crap crap!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! < tears out hair >

It seems as if the zend.c file is not involved whatsoever!
Forget printf("%d", bc_op_array->type); for starters: it's not even seeing my lovely printf("can u seee meeeee"); statement.

The modifications were done and the CLI version was compiled. Filled with anticipation, I got nothing. N-O-T-H-I-N-G. It just carried on obliviously, as if I had never surgically altered its very source.

horrible horrible PHP developers...
Tuesday, June 22, 2004
 
Altering zend.c: Getting PHP bytecode
1. zend.c calls the compile_file function.
2. The PHP bytecode is returned...
3. ...and stored in active_op_array.

Print active_op_array to get the bytecode.

active_op_array is a pointer to a struct (a zend_op_array).
Next step: find where active_op_array is defined so it can be printed.

OK, so active_op_array is defined in zend.h. More variable types, such as zend_uchar, are clarified further in zend_types.h.
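Pulling the three steps together, here is a sketch with hypothetical stand-ins (op_array_sketch, compile_file_sketch, compile_and_print). The real zend_op_array and compile_file are far bigger, and the type value 2 and op count 5 below are invented. It also shows how to print a zend_uchar field such as type: it is a small integer, so it needs %d (it is promoted to int when passed to printf). Passing it to %s makes printf treat a number as a string pointer, which is undefined behaviour and will most likely crash.

```c
#include <stdio.h>
#include <stdlib.h>

typedef unsigned char zend_uchar;   /* as in zend_types.h */

/* Hypothetical stand-in for zend_op_array: just enough to print. */
typedef struct {
    zend_uchar type;        /* small integer code, NOT a string */
    unsigned int last;      /* assumed op count */
} op_array_sketch;

/* Stand-in for the global; note that it is a pointer to a struct. */
static op_array_sketch *active_op_array = NULL;

/* Stand-in for compile_file: type 2 and 5 ops are invented values. */
static op_array_sketch *compile_file_sketch(const char *filename) {
    op_array_sketch *opa = malloc(sizeof *opa);
    (void)filename;         /* a real compiler would read this file */
    if (opa == NULL) return NULL;
    opa->type = 2;
    opa->last = 5;
    return opa;
}

/* Steps 1-3, then print: compile, keep the result, dump a summary. */
static int compile_and_print(const char *filename) {
    active_op_array = compile_file_sketch(filename);
    if (active_op_array == NULL) return -1;
    printf("type=%d ops=%u\n", active_op_array->type, active_op_array->last);
    return 0;
}
```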
