;"Copyright (C) 2018 Keziah Wesley

You can redistribute and/or modify this file under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this file. If not, see ."

;"CHARTABLE"

;"A CHARTABLE maps reader-chars (Unicode chars optionally prefixed by a !) to properties. All special handling of any character is defined through CHARTABLE properties."

;"Properties are of two varieties:
- Lexical classes. Each character is a member of 0 or more classes. Class membership alone determines tokenization.
- Parse functions. Every SPECIAL character has an associated function that handles creating any resultant object, including PARSEing any children and matching delimiters, in recursive-descent fashion."

;"Lexical class combinations used by the standard parser:
[.] Special non-breaking: (SP)
[,] Special breaking: (SP BR)
[A] Unspecial: ()
The behavior of a hypothetical non-special breaker is currently unspecified: e.g. defining * as (BR) may cause *a* to parse as 2 or 3 unspecial tokens."

;"One category is particularly interesting: unspecial tokens, parsed by DEFAULT-HANDLER. The parser for unspecials has to distinguish between atoms and the various numeric encodings, but in all cases its outward behavior is the same: it consumes a non-SP followed by zero or more non-BR, and then pushes some kind of object onto the result stack. It's important to note that this scalar-decoding mechanism is independent of lexical classes, other than those necessary to delimit the sequence--a character may have a different meaning as part of an unspecial token."

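;"Illustration only (not part of the original module): a minimal sketch of the delimiting rule described above, in which an unspecial token is one non-SP character followed by zero or more non-BR characters. UNSPECIAL-LENGTH and its BR? argument are hypothetical names; BR? stands for any one-argument predicate that reports whether a character is a breaker. The form is commented out so that loading this file does not define it."

;<DEFINE UNSPECIAL-LENGTH (CHARS BR?)
   ;"CHARS: non-empty sequence whose first element is assumed to be non-SP."
   <REPEAT ((N 1) (V <REST .CHARS>))
      ;"Stop at end of input or at the first breaker, which is left unconsumed."
      <COND (<OR <EMPTY? .V> <APPLY .BR? <1 .V>>> <RETURN .N>)>
      <SET N <+ .N 1>>
      <SET V <REST .V>>>>

;"Under these assumptions, with , as the only breaker, the input foo.bar,baz gives 7: the token foo.bar ends just before the comma, even if . is special, because . is non-breaking."
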
;"The current implementation is simple, and reasonably efficient under typical circumstances (lots of low codepoints in input, not too many characters with properties defined). Rules are 'compiled' from an editable format into a query-optimized data structure, which is transparent to the user except that you'll want to use a BULK-UPDATE block if you have many edits to make and want to avoid spending time on unneeded intermediate recompiles."

;"Querying table contents"
BREAKER?!-CHARTABLE SPECIAL?!-CHARTABLE DEFAULT-HANDLER!-CHARTABLE

;"Creating / updating tables"
CHARTABLE!-CHARTABLE SET-SPECIAL!-CHARTABLE SET-UNSPECIAL!-CHARTABLE SET-DEFAULT!-CHARTABLE BULK-UPDATE!-CHARTABLE

;"Confusion can't %%"
% )> <>>

;"Lowest 21-bits: Unicode codepoint"
;"Highest non-sign bit: flag indicating ! prefix"

;"SRC: LIST (rchar)"
;"compiled: range table [start . post]"
;"SRC: LIST (rchar)"
;"compiled: extended range table [start . post . index]"
;"SRC: LIST ((rchar handler) !rest)"
;"compiled: vector of functions, ordered by char"
;"Compiled: points to read table data in editable data structures."
;"Simpler to keep the source data than decompile the range tables."
;"Keep track of BULK-UPDATEs in progress."

;"--- edit operations (SRC tables) ---"

;"TODO: factor out redundancy between all these list-removers. Macro?"
) (ELSE )>> >>
;"(Re-)add to breakers if setting."
)>)>> ) (ELSE )>> >>> .char> ) (ELSE )>> >>> )) > ))
;"remove any previous specialness properties and update is-breaker"
)> )> > > >

;"--- query operations (compiled tables) ---"

;"Simple linear search implementation."

;"Look up whether the character is a breaker."
) "NAME" act) <>) (> <>) (> T) (ELSE > )>>

;"Look up whether the character is special. If it is, return its handler."
) "NAME" act) <>) (> <>) (> <- .c <3 .specials>>>) (ELSE > )>>

;"--- other ---"

;"Create a new empty CHARTABLE."
<>] CTABLE>> 1>> .tab>
;"defer any recompiles to the end of the block"
1>> ,EVAL .stmts> > >

;"Call after modifying the table. Will recompile now, or later if there's a BULK-UPDATE in progress."
> ) (ELSE )>> ![!.xs!]>> ) ( .pv>> >)> > > >

;"Given a list of values, return a rangetable: [begin end...]"
![]) (ELSE >)>>

;"Given a list of values, return an extended rangetable: [begin end running-excluded...]"
![]) (ELSE >)>>
> 0>) (t .rt) (gaps <- <1 .v> 1>) pv "NAME" act)
;"Begin a span."
>> >
;"Advance v until it points to a value beyond the span."
>
;"increment prev in advance"
.span>)>) (<==? <1 .v> .pv> >) (ELSE .span>)>>
;"End the span. pv is the first value missing. <1 .v> will begin the next span."
)> > )> .pv>>> >

;"XXX: Confusion's sort is crashy with predicates."
; <1 .b>>> ![!.xs]>>>

;"XXX: Confusion's sort ignores the second sequence."
;) (vals )) .keys 1 0 .vals> .vals>

;"Given a sequence of (key value) pairs, return a vector of the values sorted by their keys."
>))
;"Sort zipped inputs."
.kv 2 0>
;"Gather every 2nd value."
>> ) (ELSE >)>> .kv>>

;"Rebuild the table to incorporate any new modifications."
)) >> >> >> > % <>>
;)) )>>
;) (ey )) )>>
;)) >> >> > > > > > >> >> >

;"Confusion needs this."
<>