BEGIN and END are somewhat like what is above the first %% and below the second ...

kazinator · on Jan 16, 2015

The %% in lex and yacc statically organizes the file into different areas. BEGIN and END have run-time semantics: do these things before applying the pattern-action rules to the input, and do these things afterward.
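To make "run-time semantics" concrete, here is a minimal C sketch of an AWK-like driver. It is not how any real AWK is implemented, and `awk_like_run`, `on_begin`, `on_line`, `on_end`, and `struct counters` are hypothetical names. The BEGIN hook runs once before any input, the rule runs once per line, and the END hook runs once after the last line. The driver imposes that order when it runs, not the position of the hooks in the source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical state shared by the BEGIN hook, the rule, and the END hook. */
struct counters {
    int lines;      /* total input lines seen */
    int matches;    /* lines containing the pattern */
    int begin_ran;  /* set once by BEGIN */
    int end_ran;    /* set once by END */
};

/* BEGIN: runs once, before any input line is examined. */
static void on_begin(struct counters *c)
{
    c->lines = 0;
    c->matches = 0;
    c->begin_ran = 1;
    c->end_ran = 0;
}

/* Pattern-action rule, roughly  $0 ~ pattern { matches++ }  (plain substring match here). */
static void on_line(struct counters *c, const char *line, const char *pattern)
{
    c->lines++;
    if (strstr(line, pattern) != NULL)
        c->matches++;
}

/* END: runs once, after the last input line. */
static void on_end(struct counters *c)
{
    c->end_ran = 1;
    printf("%d of %d lines matched\n", c->matches, c->lines);
}

/* Driver: the BEGIN -> per-line rules -> END order is imposed here, at run time. */
static void awk_like_run(const char *const lines[], int n, const char *pattern,
                         struct counters *c)
{
    on_begin(c);
    for (int i = 0; i < n; i++)
        on_line(c, lines[i], pattern);
    on_end(c);
}
```

Called on the lines `"foo bar"`, `"baz"`, and `"foo"` with the pattern `"foo"`, this prints `2 of 3 lines matched`.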

101914 · on Jan 17, 2015

1. I never mentioned yacc. What relevance does it have to my comment? I typically use (f)lex without yacc/bison for the same kind of job I would use AWK for: text processing.

2. "statically organize the file into different areas": do you mean that one is a code generator and the other is a scripting language with an interpreter? In practice, that difference means little to me (except for speed of execution). I store my (f)lex programs as source files that I feed to the (f)lex code generator, then I compile the generated C code. I store my AWK scripts as source files that I feed to the AWK interpreter. I use both flex and AWK for a similar task: text processing.

For whatever it is worth, I get better performance from my compiled flex scanners than from my interpreted AWK scripts. But I sometimes use both for the very same text processing jobs.

AWK:

  BEGIN { define variables }
  pattern-action rules
  END { stuff to do after EOF }

(f)lex:

  { definitions } user variables
  %%
  { rules } pattern-action rules
  %%
  { user routines } stuff to do after EOF

From the blog: "_BEGIN_, which matches only before any line has been input to the file. This is basically where you can _initiate variables_ and all other kinds of state in your script."

From Lesk and Schmidt: "So far only the rules have been described. The user needs additional options, though, to _define variables_ for use in his program and for use by Lex. These can go ... in the _definitions_ section..."

From the blog: "There is also END, which as you may have guessed, will match after the whole input has been handled. This lets you clean up or do some final output before exiting."

From Lesk and Schmidt: "Another Lex library routine that the user will sometimes want to redefine is yywrap() which is called whenever Lex reaches an end-of-file."

I regularly use yywrap in the "user routines" section. It serves much the same purpose as the commands I put in the END section of an AWK script.
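The flex manual describes the yywrap() contract: when the scanner reaches end-of-file it calls yywrap(). A return of 0 means yyin has been pointed at new input and scanning continues. A nonzero return ends the scan, and yylex() returns 0 to its caller. The sketch below is a hand-written imitation in plain C, not flex-generated code. `toy_yylex`, `toy_yywrap`, and `toy_start` are hypothetical names, and the "files" are just strings. It shows how the branch that returns 1 can do the kind of final output an AWK END block would.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical input "files": an array of strings scanned one after another. */
static const char *const *toy_inputs;
static int toy_ninputs;
static int toy_current;       /* index of the input now being scanned */
static const char *toy_yyin;  /* read cursor into the current input */

static int word_count;        /* user variable updated by the scanning rule */
static int summary_printed;   /* set by the END-like branch of toy_yywrap */

/* Prepare to scan inputs[0..n-1]; n must be at least 1. */
static void toy_start(const char *const *inputs, int n)
{
    toy_inputs = inputs;
    toy_ninputs = n;
    toy_current = 0;
    toy_yyin = inputs[0];
    word_count = 0;
    summary_printed = 0;
}

/* Follows the yywrap() contract: switch to more input and return 0,
   or return 1 when input is exhausted (a natural spot for END-style output). */
static int toy_yywrap(void)
{
    if (toy_current + 1 < toy_ninputs) {
        toy_yyin = toy_inputs[++toy_current];
        return 0;
    }
    printf("words: %d\n", word_count);
    summary_printed = 1;
    return 1;
}

/* Toy scanner with one rule: count each run of letters as a word. */
static int toy_yylex(void)
{
    for (;;) {
        while (*toy_yyin != '\0') {
            if (isalpha((unsigned char)*toy_yyin)) {
                word_count++;
                while (isalpha((unsigned char)*toy_yyin))
                    toy_yyin++;
            } else {
                toy_yyin++;
            }
        }
        /* End of the current input: let toy_yywrap decide what happens next. */
        if (toy_yywrap())
            return 0;  /* like yylex(), return 0 once wrapping reports no more input */
    }
}
```

With the inputs `"hello world"`, `""`, and `"one two three"`, `toy_yylex()` prints `words: 5` and returns 0.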

I guess one can focus either on the differences or on the similarities. I choose the latter.

I care little about the "intended purpose" of a program. I care more about what a program can actually do.

kazinator · on Jan 17, 2015

I know, but both lex and yacc use the %% division in similar ways; that is why I mentioned it.

Simply put, your "definitions" are not stuff that is done before the pattern-action rules, and "user routines" are not stuff that is done after EOF. It's all just stuff that is declared. Both sections can contain code, and that code can be called from the pattern actions. Either section could contain a main function that calls yylex. If the lexer is reentrant, it could be re-entered from any of those places. And so on. Fact is, the %% division has nothing to do with processing order, unlike BEGIN and END in Awk.
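Here is a plain C sketch of that point. It is ordinary C standing in for the C code a lex file carries, not a real lex program, and `run_driver`, `setup`, `handle_token`, `finish`, `note`, and `trace` are hypothetical names. With prototypes at the top, the driver can come first and the routines it calls can sit at the bottom in any order. The trace records the order in which the code actually runs. The calls fix that order, not the position of the definitions.

```c
#include <string.h>

/* ---- Top of the file (where a lex definitions section would be) ---- */

/* Prototypes only; the bodies are at the bottom of the file. */
static void setup(void);
static void handle_token(const char *tok);
static void finish(void);

static char trace[256];  /* records the order in which code actually runs */

/* Append s to trace without overflowing the buffer. */
static void note(const char *s)
{
    strncat(trace, s, sizeof trace - strlen(trace) - 1);
}

/* A driver written near the TOP that calls routines defined further down.
   In a lex program, main() calling yylex() could likewise sit in either section. */
static void run_driver(const char *const *tokens, int n)
{
    trace[0] = '\0';
    setup();
    for (int i = 0; i < n; i++)
        handle_token(tokens[i]);
    finish();
}

/* ---- Bottom of the file (where lex user routines would be) ---- */

/* Runs last only because run_driver calls it last. */
static void finish(void)
{
    note("finish;");
}

static void handle_token(const char *tok)
{
    note("tok:");
    note(tok);
    note(";");
}

/* Defined at the very bottom, yet it runs first. */
static void setup(void)
{
    note("setup;");
}
```

Running `run_driver` on the tokens `"a"` and `"b"` leaves `trace` holding `setup;tok:a;tok:b;finish;`.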

101914 · on Jan 17, 2015

The %% division can be used to do exactly what BEGIN and END do, and that is how I use it. Moreover, as I correctly recalled, the Lesk and Schmidt paper specifically mentions such usage.

My comment is not about the internal behavior of the two programs (as yours is). And the Lesk and Schmidt paper does not set down hard and fast rules; it only makes suggestions. My comment was about how the two programs can be used to do similar work, i.e., text processing.

If you do a lot of text processing, at some point AWK is not fast enough. I use other programs for that, and flex is one of them; specifically, scanners (filters) generated with flex.

kazinator · on Jan 19, 2015

I don't disagree that you can put stuff that is done first above the first %%, and stuff that is done after scanning below the second %%. I just don't think that this makes %% analogous to BEGIN and END. For one thing, stuff can be moved from one of those sections to the other without changing the basic organization of the program. For instance, you can put only prototype declarations before the first %% and move everything else to the bottom.