
蒙奇D小豌豆's Blog

蒙奇D小豌豆's study notes

 
 
 


 
 

Compiler Principles

2011-12-02 21:27:15 | Category: others


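Most compilers contain three main parts: a lexical analyzer, a parser, and an optimizer. The lexical analyzer and the parser are often built with generator tools: a lexical-analyzer generator and a parser generator.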

 1. lexical-analyzer-generator

It reads the input characters from the source code (or from the command line), groups them into morphemes, and matches each morpheme against the patterns to produce lexical-units, which are then passed to the parser.

Morpheme (usually called a lexeme): a sequence of characters from the source code that matches a pattern.

Pattern: a description, usually a regular expression, of all the forms that the morphemes of one kind of lexical-unit can take.

Lexical-unit (usually called a token): a symbol plus an optional property value (attribute). Many different morphemes can belong to the same symbol and have different property values.

Action: the code that is run when a morpheme matches a pattern. An action is written in C and usually returns a symbol and stores its property value, which together form the lexical-unit.

 

Property value of a lexical-unit: the morphemes 0 and 1 both belong to the lexical-unit symbol NUM (number), so the property value is what distinguishes the two morphemes.

 

Lexical-analyzer-generator tool: Flex

$ flex example.l  

Flex generates the C source file lex.yy.c, which defines the scanner function yylex().

A Flex file contains three parts, separated by %% lines:

  (definitions)

%%

 (rules)

%%

 (user code)

 definitions section:

It contains C header includes and variable declarations, plus name definitions: named regular-expression fragments that describe parts of morphemes.

Example:

%{

#include <ctype.h>

#include <string.h>

%}

All C code in this section should be wrapped in %{ and %}; Flex copies that code verbatim into lex.yy.c.

DIGIT    [0-9]

This is a name definition: it gives the name DIGIT to a pattern fragment, which the rules section can then use as {DIGIT} (for example {DIGIT}+ for a run of digits).

  

rules section:

It contains pairs of a pattern and an action. The pattern is a regular expression and the action is C code. When a morpheme matches a pattern, the corresponding action is run. The action should be wrapped in { and }.

Example: when the input is a number matching the {DIGIT}+ pattern, the action returns the NUM symbol to the parser and stores the property value in yylval (a union with one member for each property value type).

{DIGIT}+                                { yylval.i = atoi(yytext); return NUM; }

 

user code section:

All the code in this section is copied verbatim to lex.yy.c. It usually contains helper functions, such as yywrap() or a main() used for testing.


 2. parser-generator

There are two main kinds of parsing: LR parsing (bottom-up) and LL parsing (top-down). LR parsing is used by automatic parser generators such as Bison, which builds LALR(1) parsers by default. LL parsing is common in hand-written parsers, such as recursive-descent parsers.

Context-Free Grammar

A context-free grammar (CFG) is a formal grammar in which every production rule has the form

V → w

where V is a single non-terminal symbol and w is a string of terminal and/or non-terminal symbols (w can be empty). In a Bison file the same rule is written V : w ; for example V : NUM ;
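A small worked example, with a grammar chosen for illustration rather than taken from the text: two productions for sums of numbers (in Bison: E : E '+' NUM | NUM ;), followed by a derivation of the lexical-unit string NUM + NUM, which an input such as 1 + 2 would produce. The first step applies E → E + NUM, the second replaces the remaining E using E → NUM.

```latex
\begin{aligned}
&E \rightarrow E \;\texttt{+}\; \mathrm{NUM} \;\mid\; \mathrm{NUM} \\
&E \Rightarrow E \;\texttt{+}\; \mathrm{NUM} \Rightarrow \mathrm{NUM} \;\texttt{+}\; \mathrm{NUM}
\end{aligned}
```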

Non-terminal symbol: a syntactic variable. Like a terminal symbol, it can have a property value type.

Terminal symbol: a kind of lexical-unit (a symbol that can have a property value type), declared in the parser.

 

Example: the parser receives every lexical-unit from the lexical analyzer as a symbol plus a property value, but the token (symbol) itself is declared in the parser:

 

%union {
    int i;
    /* …… other property value types */
}

%token <i> NUM
%type  <i> V

 

 

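Parser-generator tool: Bison

$ bison test.y

Bison generates the file test.tab.c, which defines the parser function yyparse(). With the -d option it also writes test.tab.h, which declares the token codes (such as NUM) and yylval for use by the lexer.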

 

A Bison file also contains three parts, separated by %% lines:

 

 (definitions)

%%

 (rules)

%%

 (user code)

 

definitions section:

It contains C header includes and variable declarations, and declares the terminal symbols, the non-terminal symbols, and the union of property value types.

Example:

%{

#include <ctype.h>

#include <string.h>

%}

 

%union {
    int i;
    /* …… other property value types */
}

%token <i> NUM
%type  <i> V

 

All C code in this section should be wrapped in %{ and %}; Bison copies that code verbatim into test.tab.c.

 
 rules section:

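It contains the context-free grammar rules and their actions. Each action is C code wrapped in { and }. When the parser recognizes a rule (the lexical-units read from the lexical analyzer, plus any non-terminals already recognized, match the rule's right-hand side), it runs that rule's action. Inside an action, $$ stands for the property value of the left-hand side and $1, $2 and so on for the values of the right-hand-side symbols.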

 

user code section:

All the code in this section is copied verbatim to test.tab.c. It usually contains helper functions, such as yyerror() and main().
