Monadic Parsing Theory2
What is parsing?
Define grammer;
Takes input string;
Generate data with structure you want.
What do we need?
LL, LR, LALR, CYK? No.
Manually? No.
YACC, Antlr? No.
Monadic parsers & combinations! Yes.
Simple:
type Parser a = String -> [( a , String )]
Parser abstraction and Monads:
;; takes any input and "consume" first char from it
( defn any [ input ]
( if ( empty? input ) ' ()
( list [( first input )
( apply str ( rest input ))])))
;; this one doesn't accept any input
( defn failure [ _ ] ' ())
( defn parse [ parser input ]
( parser input ))
( defn parse-all [ parser input ]
( ->> input
( parse parser )
( filter # ( = "" ( second % )))
ffirst ))
;; builds parser that always returns given element without consuming (changing) input
( defn return [ v ]
( fn [ input ] ( list [ v input ])))
;; takes parser and function that builds new parsers from (each) result of applying first one
( defn >>= [ m f ]
( fn [ input ]
( ->> input
( parse m )
( mapcat ( fn [[ v tail ]] ( parse ( f v ) tail ))))))
( defn merge-bind [ body bind ]
( if ( and ( not= clojure.lang.Symbol ( type bind ))
( = 3 ( count bind ))
( = '<- ( second bind )))
` ( >>= ~ ( last bind ) ( fn [ ~ ( first bind )] ~ body ))
` ( >>= ~ bind ( fn [ ~ '_ ] ~ body ))))
( defmacro do* [ & forms ]
( reduce merge-bind ( last forms ) ( reverse ( butlast forms ))))
Basic Parsers and Combinators
( defn sat [ pred ]
( >>= any ( fn [ v ] ( if ( pred v ) ( return v ) failure ))))
;; just a helper
( defn char-cmp [ f ]
( fn [ c ] ( sat ( partial f ( first c )))))
;; recognizes given char
( def match ( char-cmp = ))
;; rejects given char
( def noneOf ( char-cmp not= ))
;; just a helper
( defn from-re [ re ]
( sat ( fn [ v ] ( not ( nil? ( re-find re ( str v )))))))
;; recognizes any digit
( def digit ( from-re # "[0-9]" ))
;; recognizes any letter
( def letter ( from-re # "[a-zA-Z]" ))
;; (ab)
( defn and-then [ p1 p2 ]
( do*
( r1 <- p1 )
( r2 <- p2 )
;; xxx: note, that it's dirty hack to use STR to concat outputs
;; Full functional implementation should use MonadPlus protocol
( return ( str r1 r2 ))))
;; (a|b)
( defn or-else [ p1 p2 ]
( fn [ input ]
( lazy-cat ( parse p1 input ) ( parse p2 input ))))
( declare plus )
( declare optional )
;; (a*)
( defn many [ parser ] ( optional ( plus parser )))
;; (a+) equals to (aa*)
( defn plus [ parser ]
( do*
( a <- parser )
( as <- ( many parser ))
( return ( cons a as ))))
;; (a?)
( defn optional [ parser ] ( or-else parser ( return "" )))
What is INI file?
The INI file format is an informal standard for configuration files for some platforms or software.
INI files are simple text files with a basic structure composed of “sections” and “properties”.1
It is human-readable and simple to parse, so it is a usable format for configuration files that do not require much greater complexity.
The basic element contained in an INI file is the key or property. Every key has a name and a value, delimited by an equals sign (=). The name appears to the left of the equals sign.
Keys may (but need not) be grouped into arbitrarily named sections. The section name appears on a line by itself, in square brackets ([ and ]).
All keys after the section declaration are associated with that section.
There is no explicit “end of section” delimiter; sections end at the next section declaration, or the end of the file.
Sections may not be nested.
Case Not Sensitive
Comments
Semicolons (;) at the beginning of the line indicate a comment. Comment lines are ignored.
Example
; Development configuration
[database]
host = localhost
port = 3306
db = test_vagrant9200
user = eye
passwd = sauron
What do we want to get?
{
:database {
:host "localhost" ,
:port "3306" ,
:db "test_vagrant9200" ,
:user "eye" ,
:passwd "sauron"
}
}
INI Rule
From wikipedia ini definition:
data Property = Property String String deriving Show
data Section = Section String [ Property ] deriving Show
In Clojure we can use records to represent data types:
( defrecord property [ key value ])
( defrecord section [ name properties ])
Now lets define property
and section
parsers:
( def property-parser
( do*
( k <- ( many ( noneOf "[" )))
spaces
( match "=" )
spaces
( v <- ( many ( noneOf "\n" )))
spaces
( return ( Property. ( trim ( apply str k )) ( trim ( apply str v )) ))))
( def section-parser
( do*
spaces
( match "[" )
( name <- ( plus ( noneOf "]" )))
( match "]" )
spaces
( properties <- ( plus property-parser ))
( return ( Section. ( apply str name ) properties ))))
( parse-all property-parser "port = 3306 " )
;= #user.Property{:key "port", :value "3306"}
( parse-all section-parser "
[database]
host=127.0.0.1
port=3306
db=test_vagrant9200
user=eye
passwd=sauron
" )
;= #user.Section{:name "database", :properties (#user.Property{:key "host", :value "127.0.0.1"} #user.Property{:key "port", :value "3306"} #user.Property{:key "db", :value "test_vagrant9200"} #user.Property{:key "user", :value "eye"} #user.Property{:key "passwd", :value "sauron"})}
Resource
Monadic Parsing in Python by Alexey Kachayev