From: Kaz Wesley Date: Mon, 12 Mar 2018 06:52:52 +0000 (-0700) Subject: initial CHARTABLE and parser rules for Muddle X-Git-Url: https://jxself.org/git/?a=commitdiff_plain;h=da54afbd677beaaf96bb019858502d77b182ddd4;p=muddle.git initial CHARTABLE and parser rules for Muddle Signed-off-by: Kaz Wesley --- diff --git a/stdlib/parser/chartable.mud b/stdlib/parser/chartable.mud new file mode 100644 index 0000000..03fa902 --- /dev/null +++ b/stdlib/parser/chartable.mud @@ -0,0 +1,274 @@ +;"Copyright (C) 2018 Keziah Wesley + +You can redistribute and/or modify this file under the terms of the +GNU Affero General Public License as published by the Free Software +Foundation, either version 3 of the License, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public +License along with this file. If not, see +." + +;"CHARTABLE" + +;"A CHARTABLE maps reader-chars (Unicode chars optionally prefixed by + a !) to properties. All special handling of any character is defined + through CHARTABLE properties." + +;"Properties are of two varieties: + - Lexical classes. Each character is a member of 0 or more classes. + Class membership alone determines tokenization. + - Parse functions. Every SPECIAL character has an associated + function that handles creating any resultant object, including + PARSEing any children and matching delimiters, in + recursive-descent fashion." + +;"Lexical class combinations used by the standard parser: + [.] Special non-breaking: (SP) + [,] Special breaking: (SP BR) + [A] Unspecial: () +The behavior of a hypothetical non-special breaker is currently +unspecified: e.g. defining * as (BR) may cause *a* to parse as 2 or 3 +unspecial tokens." + +;"One category is particularly interesting: unspecial tokens, parsed + by DEFAULT-HANDLER. The parser for unspecials has to distinguish + between atoms and the various numeric encodings, but in all cases + its outward behavior is the same: it consumes a non-SP followed by + zero or more non-BR, and then pushes some kind of object onto the + result stack. It's important to note that this scalar-decoding + mechanism is independent of lexical classes, other than those + necessary to delimit the sequence--a character may have a different + meaning as part of an unspecial token." + +;"The current implementation is simple, and reasonably efficient under + typical circumstances (lots of low codepoints in input, not too many + characters with properties defined). Rules are 'compiled' from an + editable format into a query-optimized data structure, which is + transparent to the user except that you'll want to use a BULK-UPDATE + block if you have many edits to make and want to avoid spending time + on unneeded intermediate recompiles." + + + +;"Querying table contents" +BREAKER?!-CHARTABLE +SPECIAL?!-CHARTABLE +DEFAULT-HANDLER!-CHARTABLE + +;"Creating / updating tables" +CHARTABLE!-CHARTABLE +SET-SPECIAL!-CHARTABLE +SET-UNSPECIAL!-CHARTABLE +SET-DEFAULT!-CHARTABLE +BULK-UPDATE!-CHARTABLE + +;"Confusion can't %%" +% )> <>> + +;"Lowest 21-bits: Unicode codepoint" +;"Highest non-sign bit: flag indicating ! prefix" + + + +;"SRC: LIST (rchar)" +;"compiled: range table [start . post]" + +;"SRC: LIST (rchar)" +;"compiled: extended range table [start . post . index]" + +;"SRC: LIST ((rchar handler) !rest)" +;"compiled: vector of functions, ordered by char" + +;"Compiled: points to read table data in editable data structures." +;"Simpler to keep the source data than decompile the range tables." + +;"Keep track of BULK-UPDATEs in progress." + + + + +;"--- edit operations (SRC tables) ---" + +;"TODO: factor out redundancy between all these list-removers. Macro?" + + ) (ELSE )>> + >> + ;"(Re-)add to breakers if setting." + )>)>> + + ) (ELSE )>> + >>> + + .char> ) (ELSE )>> + >>> + +)) + + + + > + +)) + ;"remove any previous specialness properties and update is-breaker" + + )> + )> + > + +> +> + +;"--- query operations (compiled tables) ---" + +;"Simple linear search implementation." + +;"Look up whether the character is a breaker." +) "NAME" act) + <>) + (> <>) + (> T) + (ELSE > )>> + +;"Look up whether the character is special. If it is, return its handler." +) "NAME" act) + <>) + (> <>) + (> <- .c <3 .specials>>>) + (ELSE > )>> + +;"--- other ---" + +;"Create a new empty CHARTABLE." + <>] CTABLE>> + + 1>> + + + .tab> + +;"defer any recompiles to the end of the block" + 1>> + ,EVAL .stmts> > + > + +;"Call after modifying the table. Will recompile now, or later if + there's a BULK-UPDATE in progress." +> ) + (ELSE )>> + + ![!.xs!]>> + + ) + ( .pv>> >)> + > + > + > + + + + +;"Given a list of values, return a rangetable: [begin end...]" + ![]) + (ELSE >)>> + +;"Given a list of values, return an extended rangetable: [begin end running-excluded...]" + ![]) + (ELSE >)>> + +> 0>) + (t .rt) + (gaps <- <1 .v> 1>) + pv + "NAME" act) + ;"Begin a span." + >> + > + ;"Advance v until it points to a value beyond the span." + > ;"increment prev in advance" + .span>)>) + (<==? <1 .v> .pv> >) + (ELSE .span>)>> + ;"End the span. pv is the first value missing. <1 .v> will begin the next span." + + )> + > + )> + .pv>>> + > + +;"XXX: Confusion's sort is crashy with predicates." +; <1 .b>>> ![!.xs]>>> +;"XXX: Confusion's sort ignores the second sequence." +;) (vals )) + .keys 1 0 .vals> + .vals> + +;"Given a sequence of (key value) pairs, return a vector of the values + sorted by their keys." + >)) + ;"Sort zipped inputs." + .kv 2 0> + ;"Gather every 2nd value." + >> ) + (ELSE >)>> .kv>> + +;"Rebuild the table to incorporate any new modifications." +)) + >> + >> + >> + > + +% <>> + +;)) )>> +;) (ey )) + )>> + +;)) + + >> + >> + > + > + + > + > + > + + >> + >> + > + +;"Confusion needs this." +<> diff --git a/stdlib/parser/parse.mud b/stdlib/parser/parse.mud new file mode 100644 index 0000000..6268d7d --- /dev/null +++ b/stdlib/parser/parse.mud @@ -0,0 +1,322 @@ +;"Copyright (C) 2018 Keziah Wesley + +You can redistribute and/or modify this file under the terms of the +GNU Affero General Public License as published by the Free Software +Foundation, either version 3 of the License, or (at your option) any +later version. + +This file is distributed in the hope that it will be useful, but +WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Affero General Public License for more details. + +You should have received a copy of the GNU Affero General Public +License along with this file. If not, see +." + + + + + +parse!- +PARSE-TABLE!- + +DEFAULT-TABLE!-PARSER + +read-all!-PARSER + +% )> <>> + +;"Operation: + Input string is provided to parser. Parser consumes from input and updates state. + Beginning and end of input will be treated as token boundaries. + Parser will return producing a value after reading 1 complete expression + (possibly leaving some input leftover)." + +;"If PARSE receives a partial input, it emits an ERROR. READ installs a + handler for the ERROR INTERRUPT that attempts to get more data and + ERRET." + + +> +> > >> +>> > + <3 .x>)> +>> >>> +>>> +)) + + >> )> + .c> +;"TODO: eliminate this and rely on peeking" + <1 >>> 1>>)> + >>> + +> )> + > + > + ) + (ELSE .t .c .s .a>)>> + +;"BLOCK is documented as modifying the value of .OBLIST, so I'm + assuming passing a lookup param to READ has a similar effect rather + than setting an internal variable." + +>> + +> + > ) + (ELSE > + ) (ELSE .res)>)>>>> + +;"Don't try to READ in objects of these types." + + +;"TODO: set appropriate ERROR as closer's EVALTYPE." + +>> +>> +>> + >) (ELSE )>> + +;"Create the appropriate ERROR if there's a brace mismatch." +;"Otherwise do nothing. Return value is ignored." + + .closers>) + ;"TODO: position info" + (ELSE + EXPECTED + FOUND >)>)>) + (<==? .c EOF> .pos>)>> + +;"TODO: accept radix in PARSE" + + +;"Create a standard Muddle CHARTABLE." +)) + >> ;"HT" + >> ;"LF" + >> ;"VT" + >> ;"FF" + >> ;"CR" + >> ;"space" + <> >>> + T >>> + > <> > SEGMENT>>> + > T > SEGMENT>>> + T >>> + T >> + T > > )> + >>> + T )) .type>>> + > T >> + <> >> + + T )) + > CHARACTER>>) + ( CHARACTER>>) + (<==? .c %> ) + (<==? .c 10> >) + (ELSE >)>>>>> + + ;"TODO: macros for reducing the redundancy of all these openers and closers." + + ;"A closer is an illegal object for anything except the corresponding opener." + > T > closer>>> + T closer>>> + T closer>>> + T closer>>> + >> T >> closer>>> + > T > closer>>> + > T > closer>>> + > T > closer>>> + + T > + )) + .s > ) + (ELSE )>>>>> + + T > + )) + .s >> ) + (ELSE )>>>>> + + T > + )) + .s > ) + (ELSE )>>>>> + + > T > + )) + > .s > > ) + (ELSE )>>>>> + + > T > + )) + > .s >> >> ) + (ELSE )>>> SEGMENT>>> + + ;"The prototemplate returned is an illegal object for anything except #-syntax." + T > + )) + .s > ) + (ELSE )>>> prototemplate>>> + + ;"Plain backslash just gets default handling in the lexer. It's only + special to the scalar sublexer." + + >> + +;"Lookup a string in an OBLIST / OBLIST list. If it isn't found, + insert it into the OBLIST / head of the OBLIST list." + >) + (ELSE )) + )>> .os> + >>)>> + +;"TODO: BLOCK..." +> + )) + ) + (<==? .k %>> ) + ( CHARACTER>>) + (ELSE >)>>>> + ;"Gather trailer-separated PNAMEs into a stack." + )> + ;"Now we have a LIST of STRINGs containing our trailed PNAMEs, with + the last still in .pn. The final PNAME is special: because it has + no trailer, we look it up in our oblist list." + + ;"Special case: we have an OBLIST, not an ATOM identifying an OBLIST." + .atm> .atm>>> + >) + (ELSE >)> + )> + ;"Lookup each atom in the oblist denoted the previous (trailer)," + ;"except the last, which is our ATOM." + + > + >>> + .atoms>> + +) c (n 0) (havedigits <>) "NAME" oct) + > + ) + ;"Made it to *?" + (<==? .c %> + .a>) + (ELSE )>) + ;"Got another octit?" + ( .c> >> + > >> + + ) + ;"Not octal after all." + (ELSE )>> + +;"If current radix isn't 10, pre-scan to see whether there's a decimal point." + 10) + (ELSE + ) "NAME" rdx) + > + ) + ;"Docs aren't clear on exponent-notation fixes without decimal points." + ;"Decimal makes more sense..." + (<==? % .c> ) + (<==? % .c> ) + (<==? % .c> )>>)>> + +) (n 0) (sgn 1) (z ) (rad )) + > ) + (ELSE )> + > + + .a>) + (ELSE .a>)>) + ;"Got a digit?" + ( .c> .rad>>> + <- .c %>>> + ) + ;".?" + ;"[eE]?" + ;"Not a decimal." + (ELSE .a>)>>> + +;"The Scalar Sublexer." +;"Parse an object that begins with a non-special character: FIX/FLOAT/ATOM." +> .c> >>> + ) + (ELSE )>> + +;"Fallback to use if PARSE-TABLE!- is empty and no argument is provided." +> + + +% <>> + +;"Like MAPF ,AND with early termination." + > .all>)>> !.argses> + T> + +;"This belongs in a PRINT module. Here for testing." + > ) + ( <==? <1 .x> LVAL> <==? <1 .x> GVAL>>>> + > >) + (ELSE )>> +) + ;"TODO: more aggressive single-lining" + (> + + <==? <1 .y> LVAL> + <==? <1 .y> GVAL>>> .x> + > + > >) + (ELSE + .lvl> + > >)>> +>> + +> + >>> + > + > + +