Modulo:mlawc
A self-test is available on the documentation subpage.
--[===[

MODULE "MLAWC" (language and word class)

"eo.wiktionary.org/wiki/Modulo:mlawc" <!--2021-Mar-10-->
"id.wiktionary.org/wiki/Modul:mlawc"

Purpose: shows the lemma in bold text and brews 2 or 3 tooltip texts
         and 3 or 5 locally invisible category includes from the
         language code and 1 or 2 word class codes, creates 2...5
         invisible "anchors" for linking to a section, optionally
         splits a multiword lemma into links to the parts, optionally
         categorizes the parts

Used by templates:
- livs (EO) , bakk (ID)

Required submodules:
- "mpiktbllki" in turn requiring "mbllingvoj" (EO) or "mtblbahasa" (ID)
- "msplitter"
- "mtbllingvoj" in turn requiring template "tbllingvoj" (EO)
- "mtblbahasa" in turn requiring template "tblbahasa" (ID)

This module can accept parameters whether sent to itself (own frame) or
to the caller (caller's frame). If there is a parameter "caller=true"
on the own frame then that own frame is discarded in favor of the
caller's one.

Empty parameters and parameters longer than 120 octets are inherently
invalid (#E02), further checks follow.
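The frame choice and the early length check described above can be sketched as follows. This is a minimal sketch, not the module's actual code: the helper names "pickargs" and "lengthok" are hypothetical, and plain tables stand in for the Scribunto frame objects so that the snippet also runs outside a wiki. Only "frame.args" and "frame:getParent()" are real Scribunto interfaces.

```lua
-- Hypothetical helpers illustrating the rules above, not taken from "mlawc".

-- Return the parameter table to use: the caller's one if "caller=true"
-- is present on the own frame, otherwise the own one.
local function pickargs(frame)
  local args = frame.args
  if args["caller"] == "true" then
    local parent = frame:getParent() -- nil if there is no caller
    if parent then
      args = parent.args
    end
  end
  return args
end

-- Early check behind #E02: a parameter must be a string of 1...120 octets.
-- The "#" operator counts bytes (octets), not characters.
local function lengthok(value)
  return type(value) == "string" and #value >= 1 and #value <= 120
end

-- Mock frames (assumption: only the two fields used above are needed).
local callerframe = { args = { "sv", "VE" } }
local ownframe = {
  args = { caller = "true" },
  getParent = function () return callerframe end,
}

local args = pickargs(ownframe)
print(args[1], args[2], lengthok(args[1]), lengthok(""))
```

Counting octets with "#" matches the wording of the limit, which is given in octets and not in characters.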

Incoming:
- 2 anonymous obligatory parameters (one of them can be "??" but NOT both)
  - language code (2 or 3 lowercase letters, use "??" if unknown)
  - word class code (2 UPPERCASE letters, use "??" if unknown) or
    2 word class codes (4 UPPERCASE letters, no "??" then)
- 1 or 4 named optional parameters (depends on semi-hardcoded configuration)
  - "dst=" (2...40 octets) distinction hint for word class, for example
    "koleg+o", "kol+eg+o", "fleksia bana+n", "baza bana", "en", "ett"
    (all brackets prohibited, apostrophe "'" prohibited, OTOH plus "+"
    permitted and recommended), not shown but built into the "anchor"
    (related to word class, not to compound split)
  - "fra=" (1...120 octets) split control string, the pagename is always
    used as lemma, if it is multiword then it is split automatically,
    this parameter can request assisted automatic split or manual split
    or no split, can generate #E07 #E08 #E09 if faulty, this parameter
    is NOT supported (thus fully ignored) if splitting or even showing
    the lemma is deactivated in the source code, see below and
    "spec-splitter-en.txt" for details
  - "ext=" extra parameter for additional compound categories, can
    contain 1...4 fragments of type F210 only (":" or "!", no "L"), see
    "spec-splitter-en.txt" for details, or "&"-syntax
  - "scr=" script code, one uppercase letter, copied to the category
    name as-is, bypasses the splitter and is added to its output
- 3 hidden parameters
  - "pagenameoverridetestonly=" can cause #E01
  - "nocat=" no error possible
  - "detrc=" no error possible

Returned:
- one string intended to be shown alone in a line below the h3-heading,
  consisting of:
  - the word in bold and enclosed in <bdi>...</bdi>
  - space
  - short summary with word classes
    - 1 word class (example: "( sv , VE )") with 2 tooltips (example:
      "Bahasa: Swedia (svenska)" and "Kelas kata: verba (kata kerja)")
      and 2 invisible anchors and 3 base categories or
    - 2 word classes (example: "( sv , VE , GR )") with 3 tooltips and
      3 invisible anchors and 5 categories
  - up to 18 optional compound categories

This module is unbreakable (when called with correct module name and
function name). Every imaginable input from the caller and from the
imported modules will output either a useful result or at least a
helpful error string.

Following errors are possible:
- <<#E01 Internal error in module "mlawc">>
  Possible causes:
  - strings not uncommented
  - function "mw.title.getCurrentTitle().text" AKA "{{PAGENAME}}" failed
  - pagename is invalid such as empty or too long or contains invalid
    brackets []{} or more than 1 consecutive apostrophe, even if coming
    from "pagenameoverridetestonly="
- <<#E02 Erara uzo de sxablono "livs", legu gxian dokumentajxon>>
  Possible causes (early detected obvious problems with parameters):
  - less than 2 or more than 3 parameters, or holes
  - empty parameters or parameters longer than 120 octets
- <<#E03 Eraro en subsxablonoj uzataj far sxablono "livs">>
  Possible causes:
  - submodule failure (or not found ??)
  - the 2 required columns c0 and c1 (soon c0 and c2) are missing or
    returned "=" (but "-" is tolerable in c2)
- <<#E04 Evidente nevalida lingvokodo en sxablono "livs">>
- <<#E05 Nekonata lingvokodo en sxablono "livs">>
- <<#E06 Erara uzo de sxablono "livs" pro vortospeco>>
  Possible causes (later detected more clandestine problems with parameters):
  - invalid word class code
  - "??" used inside 4-char string
  - both language and word class given as "??"
- <<#E07 Erara uzo de sxablono "livs" pro "fra=" apartigo>>
  Possible causes (later detected more clandestine problems with parameters):
  - split control parameter is faulty (assisted or manual, excluding
    the "sum check", see below and spec)
- <<#E08 Erara uzo de sxablono "livs" pro pagxonomo por "$S" "$H">>
  - "$S" used with wrong pagename (must end with "a"..."z")
  - "$H" used with wrong pagename (must not contain spaces etc)
- <<#E09 Erara uzo de sxablono "livs" pro "sumkontrolo">>
  Possible causes:
  - "sum check" failure with manual split
- <<#E11 Erara uzo de sxablono "livs" pro "dst=" distingo>>
  Possible causes (later detected more clandestine problems with parameters):
  - distinction hint parameter is faulty
- <<#E13 Erara uzo de sxablono "livs" pro "ext=" kroma parametro>>
  Possible causes (later detected more clandestine problems with parameters):
  - extra parameter is faulty
- <<#E14 Erara uzo de sxablono "livs" pro "scr=" skriba parametro>>
  Possible causes (later detected more clandestine problems with parameters):
  - script parameter is faulty (not one uppercase letter)

The 25 word classes are:

Main big classes (3):
- SB noun - substantivo (O-vorto) - nomina (kata benda)
- VE verb - verbo (I-vorto) - verba (kata kerja)
- AJ adjective - adjektivo (A-vorto) - adjektiva (kata sifat)

Further smaller classes (12):
- PN pronoun - pronomo - pronomina (kata pengganti)
- NV numeral - numeralo (nombrovorto) - numeralia (kata bilangan)
- AV adverb - adverbo (E-vorto) - adverbia (kata keterangan)
- PV verb particle (EN,SV) - verbpartiklo - partikel verba
- QV question word - demandvorto - kata tanya
- KJ coordinator - konjunkcio - konjungsi
- SJ subordinator - subjunkcio (subfrazenkondukilo) - subjungsi (pengaju klausa terikat)
- PP preposition - prepozicio (antauxlokigita rolvorteto) - preposisi (kata depan)
- PO postposition (EN,SV) - postpozicio - postposisi (kata belakang)
- PC circumposition (SV) - cirkumpozicio - sirkumposisi
- AR article (EN,EO,SV) - artikolo - artikel (kata sandang)
- IN interjection - interjekcio - interjeksi

Nonstandalone elements (5):
- PF prefix - prefikso - prefiks (awalan)
- UF suffix - sufikso (postfikso, finajxo) - sufiks (akhiran)
- KF circumfix - cirkumfikso (konfikso) - sirkumfiks (konfiks)
- IF infix - infikso - infiks (sisipan)
- NR nonstandalone root - nememstara radiko - akar kata terikat (prakategorial)

Misc (2):
- KA sentence - frazo - kalimat
- KK character - signo - karakter

Additional classes (3):
- KU abbreviation - mallongigo (kurtigo) - singkatan (abreviasi)
- GR group of words - vortgrupo - kumpulan kata
- TV table word - tabelvorto - kata tabel

Class "NR" is exclusive and may NOT be combined with anything else
(violation gives #E06). It affects the "$S" simple bare root split.

Class "KA" is almost exclusive and may NOT be combined with anything
other than "KU" (violation gives #E06). It is also special in that it
affects morpheme categories (changes them from "vortgrupo" to "frazo")
if they are enabled.

Here we do NOT care about the "base word" property, it is categorized
by module "tagg" / "k" instead. Similarly we do not care about
"kofrovorto", "blandajxo", "derivajxo de tabelvorto" here. And we do
NOT care about "Proverbo" (subclass of KA) and "Esprimo" (subclass of
GR) either.

We theoretically could autodetect the word classes KA and GR but don't.
The chief trouble with autodetecting KA is that some multiword
abbreviations begin with uppercase and end with a dot, GR is probably
less problematic. Still both would cause several problems:
* how to override or suppress autodetection
* how many word classes are permitted at the same time given that an
  additional one can be autodetected

List of 6+1+1+1 selectable morpheme types:
  C   circumfix            cirkumfikso
  I   infix                infikso (EO: -o- -et- -il- ...)
  M   standalone root      memstara radiko (EO: tri dek post ...)
  N   nonstandalone root   nememstara radiko (EO: fer voj ...)
  P   prefix               prefikso
  U   suffix               sufikso (postfikso, finajxo, EO: -a -j -n)
  -------
  W   word                 vorto
  -------
  L   same as "N" but changes linking behavior (only in F210)
  -------
  X   only after "&" in the extra parameter (convert it to 1 or 2 fragments)

Note that 5 of those 9 are also word classes, but "M" and "W" aren't
and reasonably shouldn't be.

These morpheme types ("mortyp") can be used in the split control
parameter before the colon ":" with manual split, and in the extra
parameter, but there "L" is prohibited (thus C I M N P U W are left
plus maybe X), either after "&", or in fragments before ":" or "!"
(see "spec-splitter-en.txt" for syntax details).

We put only the letter symbol into the category name (except for the
type word) as it otherwise would become unreasonably long. It must
contain 3 pieces of information:
- language (consider "-an" in SV and ID)
- "mortyp" (consider "-an" and "an-" and "an" in SV)
- the morpheme / affix / word itself

Categories:

There are obligatory base categories constructed from language and word
class (3 or 5), and optional compound (morpheme) categories (1...18)
that can arise from the fragments generated by the splitter if
requested so. The structure of categories from both those groups is
defined by "contabkatoj" (see submodule).

EO:
  Kategorio:Kapvorto (angla)    Kategorio:Kapvorto (Esperanto)
  Kategorio:Verbo               Kategorio:Verbo
  Kategorio:Verbo (angla)       Kategorio:Verbo (Esperanto)

ID:
  Kategori:Kata bahasa Indonesia
  Kategori:Nomina
  Kategori:id:Nomina

Notes:
- we auto-remove the part of the word class in brackets and auto-adjust
  the letter case, thus "adverbo (E-vorto)" becomes "Adverbo" and
  "nomina (kata benda)" becomes "Nomina"
- "angla" is lowercase when in brackets, but begins uppercase when
  separate (pagename in category namespace), we can auto-adjust the
  letter case as needed

Anchors:
* Qsekt-en (lang only)
* Qsekt-en-SB (lang and word class) (2 such created if 2 word classes)
* Qsekt-sv-SB-ett (lang and word class and hint) (2 such created if 2 word classes)

With 1 word class we brew 2 or 3 anchors. With 2 word classes we brew
3 or 5 anchors. With the hint provided we brew both an anchor without
it and an anchor with it built in.

There are 2 ways to brew "anchors" in HTML:
* <span id="tujuh"></span> HTML5 and works from wikitext, used here
* <a name="tujuh"></a> HTML2 but does NOT work from wikitext, shown as plain text

Semi-hardcoded configuration in the source:
* "constrmainctl" type string 2 digits :
  * show image (0 or 1) the image is in "contabscrmisc[1]"
  * show lemma (0 none 1 raw 2 maybe split 3 maybe split and morpheme categories)
* "conboomiddig" type "boolean" :
  * "true" to allow middle digit "s7a" in lng codes

The splitter (see "spec-splitter-en.txt" for syntax details):

The base split strategies available (selectable with the "fra=" split
control parameter, var "numsplit") are:
- #S0 automatic multiword split (default if splitter active)
- #S1 assisted split
- #S2 manual split
- #S3 simple root split
- #S4 simple bare root
- #S5 large letter split
- #S6 reserved
- #S7 no split (only choice if splitter inactive)

It is possible to deactivate (semi-hardcoded configuration in the
source code of this "mlawc") only the compound categories, or to
deactivate the splitter resulting in the raw lemma shown without
linking, or to deactivate showing the lemma altogether. In both latter
cases the splitter is inactive and the submodule "msplitter" is not
called at all. Some or all of the parameters "fra=" "ext=" "scr=" are
NOT supported then (thus fully ignored, no error can arise from them).

The "fra=" split control parameter is subject to strict prevalidation
(unless the splitter is inactive) and can generate #E07. For manual
split the prevalidation includes the "sum check" against the pagename
that can give #E09. Later when the split is carried out no error can
occur anymore, possible problems (with assisted split) are safely
ignored instead.

The automatic multiword splitter ("numsplit" = 0 and "lfsplitaa"
"msplitter" "qsplitter") is fully automatic and the 2 tables "tabblock"
and "tablinx" must be empty then. No error can occur here, but there is
a risk of failure in that no split boundaries can be applied, and the
output is identical to the input.

The assisted splitter ("numsplit" = 1 and "lfsplitaa" "qsplitter") is
controlled by 2 prevalidated tables generated from the "fra=" parameter.

* Table "tabblock" contains up to 16 values indexed by integers 0 to 15,
  value type string "1" means do block, type "nil" means do not block
  (the default). Other values should not occur and evaluate to do not
  block like "nil" does.
* Table "tablinx" contains up to 16 values indexed by integers 0 to 15,
  value:
  * type string:
    * "N" or "I" or "A" (as described in "spec-splitter-en.txt")
    * colon ":" followed by the link target (length 1...40 octets
      NOT checked anymore here)
    A beginning char other than "N" or "I" or "A" or ":" should not
    occur and evaluates to do nothing unusual like "nil" does.
  * type "nil" means do nothing unusual (the default)

No error can occur in the assisted splitter, but there is a risk of
failure in that no split boundaries can be applied, and the output is
identical to the input.
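The way the two assisted-split tables are read can be sketched as follows. This is a minimal sketch under stated assumptions, not code from "mlawc" or "msplitter": the helper names "isblocked" and "readlinx" are hypothetical, and the sample table contents are only a guess at what prevalidation of a control string like "%1 #0I #6:Roman" would produce. The meaning of the letters "N" "I" "A" is defined in "spec-splitter-en.txt" and is not interpreted here.

```lua
-- Hypothetical helpers, not taken from "mlawc" or "msplitter".

-- A boundary is blocked only by the string "1"; nil and any unexpected
-- value mean "do not block".
local function isblocked(tabblock, index)
  return tabblock[index] == "1"
end

-- Interpret one "tablinx" entry by its beginning char.
-- Returns "letter" and the letter for "N" "I" "A", "target" and the link
-- target for a colon entry, "none" for nil and for unexpected values.
local function readlinx(tablinx, index)
  local entry = tablinx[index]
  if type(entry) ~= "string" then
    return "none"
  end
  local first = string.sub(entry, 1, 1)
  if first == "N" or first == "I" or first == "A" then
    return "letter", first
  end
  if first == ":" then
    return "target", string.sub(entry, 2) -- length is NOT checked here
  end
  return "none"
end

-- Assumed table contents for a control string like "%1 #0I #6:Roman".
local tabblock = { [1] = "1" }
local tablinx = { [0] = "I", [6] = ":Roman" }

for index = 0, 15 do
  print(index, isblocked(tabblock, index), readlinx(tablinx, index))
end
```

Treating every unexpected value like "nil" mirrors the rule that no error can occur once prevalidation has passed.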

The manual splitter ("numsplit" = 2 and "lfsplitmn" "qsplitter") is
controlled by one prevalidated table generated from the "fra="
parameter, the pagename does not even enter the split process.

* Table "tabmnfragments" contains 1 to 16 strings indexed by integers
  0 to 15, one string for every fragment. The 5 legal types are:
  * F000 : no brackets, no colon, no slash (visible text no link)
  * F200 : 2 brackets, no colon, no slash (combo target visible text)
  * F201 : 2 brackets, no colon, 1 slash (target / visible text)
  * F210 : 2 brackets, 1 colon, no slash (mortyp : combo target visible text)
  * F211 : 2 brackets, 1 colon, 1 slash (mortyp : target / visible text)

No error can occur in the manual splitter and no failure due to lack of
boundaries either, the "sum check" is part of the prevalidation.

Note that we use slashes and single square brackets "+[I:bug/BUG]"
instead of wikisyntax "[[bug|BUG]]", beware that "[bug|BUG]" would
NOT work.

The tooltips:

There are some difficulties with the tooltip to be displayed via the
"title=" attribute. HTML tags cannot be nested inside an attribute
value, thus neither <br> nor <bdi>...</bdi> can be used. We have no
solution for <br> (apart from splitting the tooltip into 2 fragments
shown separately from different positions), and for <bdi>...</bdi> we
use the Unicode explicit isolator "FIRST STRONG ISOLATE (FSI)" which
does have the expected effect but may as a side effect show as a
rectangle in some browsers. Alternatively, an advanced tooltip can be
achieved using CSS and the "hover" selector but this is not accessible
from inside wikitext. Even an extension for such advanced tooltips
exists but is not enabled on most public wikis.

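The FSI workaround for tooltips can be sketched as follows. This is only an illustration under assumptions, not the module's code: the helper names are hypothetical, the use of a <span> element is a guess, and closing the isolate with "POP DIRECTIONAL ISOLATE (PDI)" is an addition that the text above does not mention. The control characters are written as decimal octet escapes because Lua 5.1, which Scribunto uses, has no "\u{...}" escape.

```lua
-- Hypothetical helpers, not taken from "mlawc".

local FSI = "\226\129\168" -- U+2068 FIRST STRONG ISOLATE, UTF-8 octets E2 81 A8
local PDI = "\226\129\169" -- U+2069 POP DIRECTIONAL ISOLATE, UTF-8 octets E2 81 A9

-- Wrap text in an explicit isolate pair (closing with PDI is an assumption).
local function isolate(text)
  return FSI .. text .. PDI
end

-- Escape the chars that must not appear raw inside an attribute value.
local function escapeattr(text)
  text = string.gsub(text, "&", "&amp;")
  text = string.gsub(text, '"', "&quot;")
  text = string.gsub(text, "<", "&lt;")
  return text
end

-- Build a <span> whose tooltip is "label: name", with the name isolated.
local function tooltipspan(visible, label, name)
  local title = escapeattr(label .. ": ") .. isolate(escapeattr(name))
  return '<span title="' .. title .. '">' .. visible .. '</span>'
end

print(tooltipspan("sv", "Bahasa", "Swedia (svenska)"))
```

Both control characters are invisible in browsers that support them, which is consistent with the remark that unsupported ones may show a rectangle instead.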
: ---------------------------------------
* #T00 (no params, evil)
* expected result: #E02
* actual result: "{{#invoke:mlawc|ek}}"
::* #T01 ("eo", one param, evil)
::* expected result: #E02
::* actual result: "{{#invoke:mlawc|ek|eo}}"
* #T02 ("en|SB", page "hole", simplest example)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T03 ("en|??", page "hole")
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|??|pagenameoverridetestonly=hole|nocat=true}}"
* #T04 ("??|SB", page "hole")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|??|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T05 ("??|??", page "mojosa")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|??|??|pagenameoverridetestonly=mojosa|nocat=true}}"
* #T06 ("id|SBGR", page "pembangkit listrik", default split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|pagenameoverridetestonly=pembangkit listrik|nocat=true}}"
::* #T07 ("en|SB|tria", page "hole", too many params)
::* expected result: #E02
::* actual result: "{{#invoke:mlawc|ek|en|SB|tria|pagenameoverridetestonly=hole|nocat=true}}"
* #T08 ("en|SB|tria|kvara", page "hole", too many params)
* expected result: #E02
* actual result: "{{#invoke:mlawc|ek|en|SB|tria|kvara|pagenameoverridetestonly=hole|nocat=true}}"
: ---------------------------------------
* #T10 ("id|SBGR|fra=-", page "pembangkit listrik", no split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=-|pagenameoverridetestonly=pembangkit listrik|nocat=true}}"
::* #T11 ("id|SBGR", page "pembangkit listrik tenaga surya", default split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T12 ("id|SBGR|fra=-", page "pembangkit listrik tenaga surya", no split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=-|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T13 ("id|SBGR|fra=%0", page "pembangkit listrik tenaga surya", auto split except ZERO)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%0|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T14 ("id|SBGR|fra=%1", page "pembangkit listrik tenaga surya", auto split except ONE)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%1|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T15 ("id|SBGR|fra=%2", page "pembangkit listrik tenaga surya", auto split except 2)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%2|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
: ---------------------------------------
* #T20 ("id|SBGR|fra=%3", page "pembangkit listrik tenaga surya", auto split except 3, ignored)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%3|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T21 ("id|SBGR|fra=%F", page "pembangkit listrik tenaga surya", auto split except "F" AKA 15, ignored)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%F|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T22 ("id|SBGR|fra=%G", page "pembangkit listrik tenaga surya", invalid split control string, bad char)
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%G|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T23 ("id|SBGR|fra=%12", page "pembangkit listrik tenaga surya", auto split except 1 and 2)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%12|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T24 ("id|SBGR|fra=%23456789", page "pembangkit listrik tenaga surya", auto split except 2...9, junk ignored)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%23456789|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T25 ("id|SBGR|fra=%123456789", page "pembangkit listrik tenaga surya", auto split except 1...9, too long)
::* expected result: #E07
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%123456789|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T26 ("id|SBGR|fra=%23456781", page "pembangkit listrik tenaga surya", auto split except nonsense, not ascending)
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%23456781|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
: ---------------------------------------
* #T30 ("en|KA", page "When in a hole, stop digging.", default auto split but suboptimal result)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
::* #T31 ("en|KA|fra=-", page "When in a hole, stop digging.", no split, no link)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=-|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
* #T32 ("en|KA|fra=#0I", page "When in a hole, stop digging.", assi auto split, lowercase frag index 0)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
::* #T33 ("id|SBGR|fra=%1 #2A", page "pembangkit listrik tenaga surya", assi auto split, block boun ONE and uppercase frag index 2)
::* expected result: OK (silly with "listrik tenaga" together and "surya" linking to "Surya")
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%1 #2A|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T34 ("en|KA|fra=#0I", page "When In A Hole, Stop Digging.", assi auto split, German style, lowercase frag index 0)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When In A Hole, Stop Digging.|nocat=true}}"
::* #T35 ("en|KA|fra=#0I #3I #4I #5I", page "When In A Hole, Stop Digging.", assi auto split, German style, lowercase frag index 0 3 4 5)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #3I #4I #5I|pagenameoverridetestonly=When In A Hole, Stop Digging.|nocat=true}}"
: ---------------------------------------
* #T40 ("en|KA|fra=#0I", page "Digging", assi auto split and fix case requested index 0 but no split boundaries available)
* expected result: OK (raw text "Digging" and no link to "digging" nor "Digging")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=Digging|nocat=true}}"
::* #T41 ("sv|KA", page "?va?", default split)
::* expected result: OK (link to "va")
::* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=?va?|nocat=true}}"
* #T42 ("sv|KA", page "?va", default split)
* expected result: OK (link to "va")
* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=?va|nocat=true}}"
::* #T43 ("sv|KA", page "va?", default split)
::* expected result: OK (link to "va")
::* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=va?|nocat=true}}"
* #T44 ("sv|KA", page "va", default auto split but no split boundaries available)
* expected result: OK (no link)
* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=va|nocat=true}}"
::* #T45 ("sv|KA|fra=%01", page "?va?", assi auto split, 2 boundaries available but both are blocked)
::* expected result: OK (raw text "?va?" and no link)
::* actual result: "{{#invoke:mlawc|ek|sv|KA|fra=%01|pagenameoverridetestonly=?va?|nocat=true}}"
: ---------------------------------------
* #T50 ("en|KA|fra=#0I", page "When in Rome, do as the Romans do.", assi auto split and fix case frag 0, suboptimal result due to word "Romans")
* expected result: OK (links to "when" and "Romans")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When in Rome, do as the Romans do.|nocat=true}}"
::* #T51 ("en|KA|fra=#0I #6:Roman", page "When in Rome, do as the Romans do.", assi auto split and fix case frag 0, good result, fixed word "Romans" index 6)
::* expected result: OK (links to "when" and "Roman")
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #6:Roman|pagenameoverridetestonly=When in Rome, do as the Romans do.|nocat=true}}"
* #T52 ("en|KA|fra=#0I #6:Roman", page "When in,, , Rome, do as the Romans do.", assi auto split, fix case frag 0, fix word "Romans" idx 6)
* expected result: silly OK (links to "when" and "Roman")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #6:Roman|pagenameoverridetestonly=When in,, , Rome, do as the Romans do.|nocat=true}}"
::* #T53 ("en|KA|fra=%01 #0I #4:Romania", page "When in,, , Rome, do as the Romans do.", assi auto split, block 0&1, fix 0, fix word "Romans" idx 4 now)
::* expected result: very silly OK (links to "when in,, , Rome" and "Romania")
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=%01 #0I #4:Romania|pagenameoverridetestonly=When in,, , Rome, do as the Romans do.|nocat=true}}"
* #T54 ("eo|KA", page "!!!Mi jam,? estas fin-venkisto!!!", default auto split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
::* #T55 ("eo|KA|fra=-", page "!!!Mi jam,? estas fin-venkisto!!!", no split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=-|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
* #T56 ("eo|KA|fra=#3:fino", page "!!!Mi jam,? estas fin-venkisto!!!", assi auto split, link "fin-venkisto" to "fino")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=#3:fino|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
: ---------------------------------------
* #T60 ("deu|SB", page "hole", invalid lng)
* expected result: #E04
* actual result: "{{#invoke:mlawc|ek|deu|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T61 ("xxx|SB", page "hole", unknown lng)
::* expected result: #E05
::* actual result: "{{#invoke:mlawc|ek|xxx|SB|pagenameoverridetestonly=hole|nocat=true}}"
* #T62 ("en|SS", page "hole", invalid word class)
* expected result: #E06
* actual result: "{{#invoke:mlawc|ek|en|SS|pagenameoverridetestonly=hole|nocat=true}}"
::* #T63 ("en|SB??", page "move", invalid use of "??")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|en|SB??|pagenameoverridetestonly=move|nocat=true}}"
* #T64 ("en|??SB", page "move", invalid use of "??")
* expected result: #E06
* actual result: "{{#invoke:mlawc|ek|en|??SB|pagenameoverridetestonly=move|nocat=true}}"
::* #T65 ("en|????", page "move", invalid use of "??")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|en|????|pagenameoverridetestonly=move|nocat=true}}"
* #T66 ("en|KAKU", page "PEBKAC")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KAKU|pagenameoverridetestonly=PEBKAC|nocat=true}}"
::* #T67 ("en|KAAV", page "ASAP", "KA" is almost exclusive and "ASAP" is NOT a sentence)
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|en|KAAV|pagenameoverridetestonly=ASAP|nocat=true}}"
: ---------------------------------------
* #T70 ("eo|KA", page "Mi estas fin-venkisto!!!", default auto split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
::* #T71 ("eo|KA|fra=-", page "Mi estas fin-venkisto!!!", no split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=-|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
* #T72 ("eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist/isto]!!!", page "Mi estas fin-venkisto!!!", manual split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist/isto]!!!|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
::* #T73 ("eo|KA|fra=[ri/Mi] [estas fin-v]]enk[-ist/isto]!!!", page "Mi estas fin-venkisto!!!", broken manual split, double bracket)
::* expected result: #E07
::* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=[ri/Mi] [estas fin-v]]enk[-ist/isto]!!!|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
* #T74 <nowiki>("eo|KA|fra=[mi/Mi] [estas fin-v]e''nki''sto!!!", page "Mi estas fin-venkisto!!!", broken manual split, apo:s)</nowiki>
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=[mi/Mi] [estas fin-v]e''nki''sto!!!|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
::* #T75 ("eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist/i[s]to]!!!", page "Mi estas fin-venkisto!!!", broken manual split, nested brackets)
::* expected result: #E07
::* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist/i[s]to]!!!|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
* #T76 ("eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist /isto]!!!", page "Mi estas fin-venkisto!!!", broken manual split, illegal space)
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=[ri/Mi] [estas fin-v]enk[-ist /isto]!!!|pagenameoverridetestonly=Mi estas fin-venkisto!!!|nocat=true}}"
: ---------------------------------------
* #T80 ("sv|AJ", page "icke-binaer", default auto split does nothing due to no boundary)
* expected result: OK (suboptimal)
* actual result: "{{#invoke:mlawc|ek|sv|AJ|pagenameoverridetestonly=icke-binaer|nocat=true}}"
::* #T81 ("sv|AJ|fra=[P:icke-][M:binaer]", page "icke-binaer", manual split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|sv|AJ|fra=[P:icke-][M:binaer]|pagenameoverridetestonly=icke-binaer|nocat=true}}"
* #T82 ("sv|AJ|fra=[P:icke][M:binaer]", page "icke-binaer", broken manual split)
* expected result: #E09
* actual result: "{{#invoke:mlawc|ek|sv|AJ|fra=[P:icke][M:binaer]|pagenameoverridetestonly=icke-binaer|nocat=true}}"
::* #T83 ("id|SB|fra=[C:per-...-an/per][M:tidak][M:sama][C:per-...-an/an]", page "pertidaksamaan", manual split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SB|fra=[C:per-...-an/per][M:tidak][M:sama][C:per-...-an/an]|pagenameoverridetestonly=pertidaksamaan|nocat=true}}"
* #T84 ("id|SB|fra=[C:per-...-an/per]+[M:tidak]+[M:sama]+[C:per-...-an/an]", page "pertidaksamaan", manual split, plussed)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SB|fra=[C:per-...-an/per]+[M:tidak]+[M:sama]+[C:per-...-an/an]|pagenameoverridetestonly=pertidaksamaan|nocat=true}}"
::* #T85 ("id|SB|fra=[C:per-...-an/per]+[M:kereta( )api]+[C:per-...-an/an]", page "perkeretaapian", manual split, plussed, deleted space)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SB|fra=[C:per-...-an/per]+[M:kereta( )api]+[C:per-...-an/an]|pagenameoverridetestonly=perkeretaapian|nocat=true}}"
* #T86 ("eo|SB|fra=[L:polv(o)]+[I:o]+[L:sucx(i)]+[I:il]+[U:o]", page "polvosucxilo", manual split, deleted letter, "L"-trick, plussed)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|SB|fra=[L:polv(o)]+[I:o]+[L:sucx(i)]+[I:il]+[U:o]|pagenameoverridetestonly=polvosucxilo|nocat=true}}"
::* #T87 ("sv|SB|fra=[M:vara/var(a)u][M:maerke]", page "varumaerke", manual split, deleted and replaced letter)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|sv|SB|fra=[M:vara/var(a)u][M:maerke]|pagenameoverridetestonly=varumaerke|nocat=true}}"
* #T88 ("id|VE|fra=[P:meN-/meng][M:(k)irim]", page "mengirim", manual split, deleted letter, plussed)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|VE|fra=[P:meN-/meng][M:(k)irim]|pagenameoverridetestonly=mengirim|nocat=true}}"
::* #T89 ("sv|SB|fra=[M:kung]+a+[M:doeme]", page "kungadoeme", manual split, plusses around "F000" fragment)
::* expected result: OK (see categories)
::* actual result nocat: "{{#invoke:mlawc|ek|sv|SB|fra=[M:kung]+a+[M:doeme]|pagenameoverridetestonly=kungadoeme|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|sv|SB|fra=[M:kung]+a+[M:doeme]|pagenameoverridetestonly=kungadoeme}}|nw}}"
: ---------------------------------------
* #T90 ("en|SB", page "sun", default auto split does nothing due to no boundary)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|SB|pagenameoverridetestonly=sun|nocat=true}}"
::* #T91 ("en|SB|fra=$B", page "sun", simple bare root strategy)
::* expected result: OK (no link, see categories, cat it as "M" under "sun" and main "-")
::* actual result nocat: "{{#invoke:mlawc|ek|en|SB|fra=$B|pagenameoverridetestonly=sun|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|SB|fra=$B|pagenameoverridetestonly=sun}}|nw}}"
* #T92 ("en|SB|fra=$B", page "Sun", simple bare root strategy)
* expected result: OK (link to "sun" and see categories, cat it as "M" under "sun" and main "-")
* actual result nocat: "{{#invoke:mlawc|ek|en|SB|fra=$B|pagenameoverridetestonly=Sun|nocat=true}}"
* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|SB|fra=$B|pagenameoverridetestonly=Sun}}|nw}}"
::* #T93 ("en|SB|ext=&M", page "Inverness", extra parameter)
::* expected result: OK (no link, see categories, cat it as "M" under "Inverness" and main "-")
::* actual result nocat: "{{#invoke:mlawc|ek|en|SB|ext=&M|pagenameoverridetestonly=Inverness|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|SB|ext=&M|pagenameoverridetestonly=Inverness}}|nw}}"
* #T94 ("eo|SB", page "suno", default auto split does nothing due to no boundary)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|SB|pagenameoverridetestonly=suno|nocat=true}}"
::* #T95 ("eo|SB|fra=$S", page "suno", simple root split)
::* expected result: OK (no link, see categories, cat it as "N" under "sun" and main "-")
::* actual result nocat: "{{#invoke:mlawc|ek|eo|SB|fra=$S|pagenameoverridetestonly=suno|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|eo|SB|fra=$S|pagenameoverridetestonly=suno}}|nw}}"
* #T96 ("eo|SB|fra=$S", page "Suno", simple root split)
* expected result: OK (link to "suno" and see categories, cat it as "N" under "sun" and main "-")
* actual result nocat: "{{#invoke:mlawc|ek|eo|SB|fra=$S|pagenameoverridetestonly=Suno|nocat=true}}"
* actual result via debu: "{{debu|{{#invoke:mlawc|ek|eo|SB|fra=$S|pagenameoverridetestonly=Suno}}|nw}}"
::* #T97 ("eo|SB|fra=GXakart+[U:o]|ext=[N!gxakart]", page "GXakarto", extra parameter)
::* expected result: OK (no link, see categories, cat it as "N" under "gxakart" and main "-")
::* actual result nocat: "{{#invoke:mlawc|ek|eo|SB|fra=GXakart+[U:o]|ext=[N!gxakart]|pagenameoverridetestonly=GXakarto|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|eo|SB|fra=GXakart+[U:o]|ext=[N!gxakart]|pagenameoverridetestonly=GXakarto}}|nw}}"
* #T98 ("eo|SB|fra=GXakart+[U:o]|ext=[N:gxakart/Jakarta]", page "GXakarto", faulty extra parameter)
* expected result: #E13
* actual result nocat: "{{#invoke:mlawc|ek|eo|SB|fra=GXakart+[U:o]|ext=[N:gxakart/Jakarta]|pagenameoverridetestonly=GXakarto}}"
::* #T99 ("sv|SB|fra=[M:loep(a)]+[U:-are/ar(e)]+[M:sko]", page "loeparsko", 2 stolen letters)
::* expected result: OK (see categories)
::* actual result nocat: "{{#invoke:mlawc|ek|sv|SB|fra=[M:loep(a)]+[U:-are/ar(e)]+[M:sko]|pagenameoverridetestonly=loeparsko|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|sv|SB|fra=[M:loep(a)]+[U:-are/ar(e)]+[M:sko]|pagenameoverridetestonly=loeparsko}}|nw}}"
: ---------------------------------------
* #TA0 ("en|AVKU", page "ASAP")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP|nocat=true}}" ::* #TA1 ("en|SJ", page "when") ::* expected result: OK ::* actual result: "{{#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when|nocat=true}}" * #TA2 ("sv|SB|dst=baza banan", page "banan", try to link to this one) * expected result: OK * actual result: "{{#invoke:mlawc|ek|sv|SB|dst=baza banan|pagenameoverridetestonly=banan|nocat=true}}" ::* #TA3 ("sv|SB|dst=fleksia", page "banan", try to link to this one) ::* expected result: OK (see categories) ::* actual result nocat: "{{#invoke:mlawc|ek|sv|SB|dst=fleksia|pagenameoverridetestonly=banan|nocat=true}}" ::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|sv|SB|dst=fleksia|pagenameoverridetestonly=banan}}|nw}}" * #TA4 ("sv|SB|dst=baza [ba]nan", page "banan", illegal brackets) * expected result: #E11 * actual result: "{{#invoke:mlawc|ek|sv|SB|dst=baza [ba]nan|pagenameoverridetestonly=banan|nocat=true}}" ::* #TA5 ("sv|SB|dst=banan'", page "banan", illegal apo) ::* expected result: #E11 ::* actual result: "{{#invoke:mlawc|ek|sv|SB|dst=banan'|pagenameoverridetestonly=banan|nocat=true}}" * #TA6 ("en|AVKU", page "ASAP", see categories) * expected result: OK * actual result: "{ {#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP} }" (blocked) * actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP}}|nw}}" ::* #TA7 ("en|SJ", page "when", see categories) ::* expected result: OK ::* actual result: "{ {#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when} }" (blocked) ::* actual result via debu "{{debu|{{#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when}}|nw}}" * #TA8 ("en|AVKU|dst=test", page "ASAP", silly maximal test for anchors and categories) * expected result: OK * actual result: "{ {#invoke:mlawc|ek|en|AVKU|dst=test|pagenameoverridetestonly=ASAP} }" (blocked) * actual result via debu "{{debu|{{#invoke:mlawc|ek|en|AVKU|dst=test|pagenameoverridetestonly=ASAP}}|nw}}" : 
--------------------------------------- * note that tests #T89 #T91...#T93 #T95...#T97 #T99 #TA3 #TA6 #TA7 and #TA8 depend on "debu" * note that tests #TA6 #TA7 and #TA8 cannot be reasonably executed on the docs subpage without help of "pate" or "debu" : --------------------------------------- ]===] local lawc = {} ------------------------------------------------------------------------ ---- CONSTANTS ---- ------------------------------------------------------------------------ -- uncommentable EO vs ID constant strings (core site-related features) local constrpriv = "eo" -- EO (privileged site language) -- local constrpriv = "id" -- ID (privileged site language) local constrplki = "Modulo:mpiktbllki" -- EO -- local constrplki = "Modul:mpiktbllki" -- ID local constrsplt = "Modulo:msplitter" -- EO -- local constrsplt = "Modul:msplitter" -- ID local constrkatp = "Kategorio:" -- EO -- local constrkatp = "Kategori:" -- ID -- constant table (ban list) -- add only obviously invalid access codes (2-letter or 3-letter) -- length of the list is NOT stored anywhere, the processing stops -- when type "nil" is encountered -- "en.wiktionary.org/wiki/Wiktionary:Language_treatment" excluded languages -- "en.wikipedia.org/wiki/Spurious_languages" -- "iso639-3.sil.org/code/art" only valid in ISO 639-2 -- "iso639-3.sil.org/code/zxx" "No linguistic content" local contabisbanned = {} contabisbanned = {'dc', 'll', 'art','deu','eng','epo','fra','lat','por','rus','spa','swe','tup','zxx'} -- 1...14 -- constant table (surrogate transcoding table, only needed for EO) local contabtransiltable = {} contabtransiltable[ 67] = 0xC488 -- CX contabtransiltable[ 99] = 0xC489 -- cx contabtransiltable[ 71] = 0xC49C -- GX contabtransiltable[103] = 0xC49D -- gx contabtransiltable[ 74] = 0xC4B4 -- JX contabtransiltable[106] = 0xC4B5 -- jx contabtransiltable[ 83] = 0xC59C -- SX contabtransiltable[115] = 0xC59D -- sx contabtransiltable[ 85] = 0xC5AC -- UX breve contabtransiltable[117] = 0xC5AD -- ux breve 
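-- Illustrative sketch only, NOT part of the original module: how a table
-- like "contabtransiltable" above could be applied. Assumptions made here:
-- the index is the ASCII code of the base letter, the base letter must be
-- followed by "x" or "X", and the UINT16 value holds the 2 UTF8 octet:s
-- (BIG ENDIAN) of the resulting letter. The name "lfdemotransil" and both
-- parameters are hypothetical.

local function lfdemotransil (strxsurro, tabtransil)
  local strtransout = ''
  local numxlen = 0
  local numxpos = 1 -- ONE-based
  local numxcur = 0
  local numxnxt = 0
  local numxpair = nil
  numxlen = string.len (strxsurro)
  while (numxpos<=numxlen) do
    numxcur = string.byte (strxsurro,numxpos,numxpos)
    numxnxt = 0 -- nothing follows the last char
    if (numxpos<numxlen) then
      numxnxt = string.byte (strxsurro,(numxpos+1),(numxpos+1))
    end--if
    numxpair = tabtransil[numxcur] -- "nil" if letter takes no surrogate
    if ((numxpair~=nil) and ((numxnxt==88) or (numxnxt==120))) then
      strtransout = strtransout .. string.char (math.floor(numxpair/256),(numxpair%256))
      numxpos = numxpos + 2 -- eat base letter and the "x" or "X"
    else
      strtransout = strtransout .. string.char (numxcur) -- copy as-is
      numxpos = numxpos + 1
    end--if
  end--while
  return strtransout
end--function lfdemotransil

-- for example lfdemotransil ("sxipo",contabtransiltable) would give the
-- octet:s $C5,$9D followed by "ipo", whereas "sun" is returned unchanged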
-- constant strings (anchor HTML code and prefix) local constrankkom = '<span id="Qsekt' -- do NOT add the dash "-" here local constaankend = '"></span>' -- constant strings (error circumfixes) local constrelabg = '<span class="error"><b>' -- lagom whining begin local constrelaen = '</b></span>' -- lagom whining end local constrlaxhu = ' [] ' -- lagom -> huge circumfix " [] " -- uncommentable EO vs ID string (caller name for error messages) local constrkoll = 'sxablono "livs"' -- EO augmented name of the caller (semi-hardcoded, we do NOT peek it) -- local constrkoll = 'templat "bakk"' -- ID augmented name of the caller (semi-hardcoded, we do NOT peek it) -- uncommentable EO vs ID constant table (error messages) -- #E02...#E99, holes permitted (max 3 consecutive) -- note that #E00 and #E01 are NOT supposed to be included here -- separate "constrkoll" needed for "\\@" local contaberaroj = {} contaberaroj[2] = 'Erara uzo de \\@, legu gxian dokumentajxon' -- EO #E02 -- contaberaroj[2] = 'Penggunaan salah \\@, bacalah dokumentasinya' -- ID #E02 contaberaroj[3] = 'Eraro en subsxablonoj uzataj far \\@' -- EO #E03 -- contaberaroj[3] = 'Kesalahan dalam subtemplat digunakan oleh \\@' -- ID #E03 contaberaroj[4] = 'Evidente nevalida lingvokodo en \\@' -- EO #E04 -- contaberaroj[4] = 'Kode bahasa jelas-jelas salah dalam \\@' -- ID #E04 contaberaroj[05] = 'Nekonata lingvokodo en \\@' -- EO #E05 -- contaberaroj[05] = 'Kode bahasa tidak dikenal dalam \\@' -- ID #E05 contaberaroj[06] = 'Erara uzo de \\@ pro vortospeco' -- EO #E06 -- contaberaroj[06] = 'Penggunaan salah \\@ oleh karena kelas kata' -- ID #E06 contaberaroj[07] = 'Erara uzo de \\@ pro "fra=" apartigo' -- EO #E07 -- contaberaroj[07] = 'Penggunaan salah \\@ oleh karena "fra=" pemotongan' -- ID #E07 contaberaroj[08] = 'Erara uzo de \\@ pro pagxonomo por "$S" "$H"' -- EO #E08 -- contaberaroj[08] = 'Penggunaan salah \\@ oleh karena nama halaman untuk "$S" "$H"' -- ID #E08 contaberaroj[09] = 'Erara uzo de \\@ pro 
"sumkontrolo"' -- EO #E09 -- contaberaroj[09] = 'Penggunaan salah \\@ oleh karena "pemeriksaan jumlah"' -- ID #E09 contaberaroj[11] = 'Erara uzo de \\@ pro "dst=" distingo' -- EO #E11 -- contaberaroj[11] = 'Penggunaan salah \\@ oleh karena "dst=" pembedaan' -- ID #E11 contaberaroj[13] = 'Erara uzo de \\@ pro "ext=" kroma parametro' -- EO #E13 -- contaberaroj[13] = 'Penggunaan salah \\@ oleh karena "ext=" parameter ekstra' -- ID #E13 contaberaroj[14] = 'Erara uzo de \\@ pro "scr=" skriba parametro' -- EO #E14 -- contaberaroj[14] = 'Penggunaan salah \\@ oleh karena "scr=" parameter aksara' -- ID #E14 -- constant strings and table (tooltip and misc to be sent to the screen) local constrtoolt = 'style="border-bottom:1px dotted; cursor:help;"' -- lousy tooltip local constrisobg = '(⁨ ' -- isolator for "strange" (RTL, submicroscopic) text begin local constrisoen = ' ⁨)' -- isolator for "strange" (RTL, submicroscopic) text end local contabscrmisc = {} contabscrmisc[0] = '<div style="margin:0.2em;"></div>' -- must be empty, tiny EOL contabscrmisc[1] = '[[File:Gartoon apps kopete all away replaced.svg|24px|link=]]' -- uncommentable EO vs ID constant table (lng and wc stuff) local contablaxwc = {} contablaxwc [0] = "Lingvo: " -- EO tooltip only 1 -- contablaxwc [0] = "Bahasa: " -- ID tooltip only 1 contablaxwc [1] = "Vortospeco: " -- EO tooltip can be 2 -- contablaxwc [1] = "Kelas kata: " -- ID tooltip can be 2 contablaxwc [2] = "nekonata lingvo" -- EO placeholder -- contablaxwc [2] = "bahasa yang tidak dikenal" -- ID placeholder contablaxwc [3] = "nekonata vortospeco" -- EO placeholder -- contablaxwc [3] = "kelas kata yang tidak dikenal" -- ID placeholder -- uncommentable EO vs ID constant table (categories) -- syntax of insertion and discarding magic string: -- "@" followed by 2 uppercase letters and 2 hex numbers -- otherwise the hit is not processed, but copied as-is instead -- 2 letters select the insertable item from table supplied by the caller -- 2 hex numbers 
control discarding left and right (0...15 char:s) -- empty item is legal and results in discarding if some number is non-ZERO -- if uppercasing or other adjustment is needed then the caller must take -- care of it in the form of 2 or more separate items provided in the table -- insertable items defined: -- constant: -- * LK lng code (unknown "??" legal but take care elsewhere) -- * LN lng name (unknown legal, for example "dana" or "Ido") -- * LU lng name uppercased (unknown legal, for example "Dana" or "Ido") -- * LO lng name not own (empty or nil if own) -- * LV lng name uppercased not own (empty or nil if own) -- * LY lng name long (for example "bahasa Swedia") -- * LZ lng name long not own (empty or nil if own) -- * SC script code (for example "T", "S", "P" for ZH, "C" "L" for SH) -- variable (we can have 2 word classes): -- * WC word class name (for example "substantivo") -- * WU word class name uppercased (for example "Substantivo") -- * MT mortyp code (for example "C") -- * FR fragment (for example "peN-...-an" or "abelujo") -- see "lfinsertultim" and "tabstuff" use space here and avoid "_" -- note the malicious false friendship between EO:frazo kaj ID:frasa local contabkatoj = {} contabkatoj[0] = "Kapvorto (@LN00)" -- EO always (except "nocat=true") only 1 piece -- contabkatoj[0] = "Kata @LY00" -- ID always (except "nocat=true") only 1 piece contabkatoj[1] = "@WU00 (@LN00)" -- EO always (except "nocat=true") 1 or 2 pieces -- contabkatoj[1] = "@LK00:@WU00" -- ID always (except "nocat=true") 1 or 2 pieces contabkatoj[2] = "@WU00" -- EO always (except "nocat=true") 1 or 2 pieces -- contabkatoj[2] = "@WU00" -- ID always (except "nocat=true") 1 or 2 pieces -- uncommentable EO vs ID constant table (25 word classes) local contabwc = {} contabwc["SB"] = "substantivo (O-vorto)" -- EO | -- contabwc["SB"] = "nomina (kata benda)" -- ID | contabwc["VE"] = "verbo (I-vorto)" -- EO | main big (3) -- contabwc["VE"] = "verba (kata kerja)" -- ID | contabwc["AJ"] = "adjektivo 
(A-vorto)" -- EO | -- contabwc["AJ"] = "adjektiva (kata sifat)" -- ID | contabwc["PN"] = "pronomo" -- EO % -- contabwc["PN"] = "pronomina (kata pengganti)" -- ID % contabwc["NV"] = "numeralo (nombrovorto)" -- EO % -- contabwc["NV"] = "numeralia (kata bilangan)" -- ID % contabwc["AV"] = "adverbo (E-vorto)" -- EO % -- contabwc["AV"] = "adverbia (kata keterangan)" -- ID % contabwc["PV"] = "verbpartiklo" -- EO % -- contabwc["PV"] = "partikel verba" -- ID % contabwc["QV"] = "demandvorto" -- EO % -- contabwc["QV"] = "kata tanya" -- ID % contabwc["KJ"] = "konjunkcio" -- EO % -- contabwc["KJ"] = "konjungsi" -- ID % contabwc["SJ"] = "subjunkcio (subfrazenkondukilo)" -- EO % -- contabwc["SJ"] = "subjungsi (pengaju klausa terikat)" -- ID % further smaller (12) contabwc["PP"] = "prepozicio (antauxlokigita rolvorteto)" -- EO % -- contabwc["PP"] = "preposisi (kata depan)" -- ID % contabwc["PO"] = "postpozicio" -- EO % -- contabwc["PO"] = "postposisi (kata belakang)" -- ID % contabwc["PC"] = "cirkumpozicio" -- EO % -- contabwc["PC"] = "sirkumposisi" -- ID % contabwc["AR"] = "artikolo" -- EO % -- contabwc["AR"] = "artikel (kata sandang)" -- ID % contabwc["IN"] = "interjekcio" -- EO % -- contabwc["IN"] = "interjeksi" -- ID % contabwc["PF"] = "prefikso" -- EO # -- contabwc["PF"] = "prefiks (awalan)" -- ID # contabwc["UF"] = "sufikso (postfikso, finajxo)" -- EO # -- contabwc["UF"] = "sufiks (akhiran)" -- ID # nonstandalone (5) contabwc["KF"] = "cirkumfikso (konfikso)" -- EO # -- contabwc["KF"] = "sirkumfiks (konfiks)" -- ID # contabwc["IF"] = "infikso" -- EO # -- contabwc["IF"] = "infiks (sisipan)" -- ID # contabwc["NR"] = "nememstara radiko" -- EO # -- contabwc["NR"] = "akar kata terikat (prakategorial)" -- ID # contabwc["KA"] = "frazo" -- EO $ -- contabwc["KA"] = "kalimat" -- ID $ contabwc["KK"] = "signo" -- EO $ misc (2) -- contabwc["KK"] = "karakter" -- ID $ contabwc["KU"] = "mallongigo (kurtigo)" -- EO & -- contabwc["KU"] = "singkatan (abreviasi)" -- ID & contabwc["GR"] = 
"vortgrupo" -- EO & additional (3) -- contabwc["GR"] = "kumpulan kata" -- ID & contabwc["TV"] = "tabelvorto" -- EO & -- contabwc["TV"] = "kata tabel" -- ID & -- constant table (3 integers for preliminary parameter check) local contabparam = {} contabparam[0] = 2 -- minimal number of anon parameters contabparam[1] = 2 -- maximal number of anon parameters contabparam[2] = 160 -- maximal length of single para (min is hardcoded ONE) -- constants related to submodules local constrtblc0 = "0" -- in site language local constrtblc2 = "1" -- propralingve !!! SOON 2 !!! -- constants to control behaviour from source AKA semi-hardcoded parameters local constrmainctl = "13" -- image (0 or 1) lemma (0 none 1 raw 2 maybe split 3 maybe split and morpheme cat:s) local conboomiddig = false -- controls lng code checking, assign to "true" to allow middle digit "s7a" ------------------------------------------------------------------------ ---- SPECIAL STUFF OUTSIDE MAIN FUNCTION ---- ------------------------------------------------------------------------ ---- VAR:S ---- local qpiktbllki = {} -- type "table" with type "function" inside local qsplitter = {} -- type "table" with type "function" inside local qbooguard = false -- only for the guard test, pass to other var ASAP local qstrtrace = '' -- for main & sub:s, debug report request by "detrc=" local qtabkatoj = {} -- global for compound categories [0]...[41] ---- GUARD AGAINST INTERNAL ERROR & ONE IMPORT VIA REQUIRE ---- if ((type(constrpriv)~="string") or (type(constrplki)~="string") or (type(constrkatp)~="string")) then qbooguard = true else qpiktbllki = require(constrplki) -- can crash here despite guarding ?? qsplitter = require(constrsplt) -- can crash here despite guarding ?? 
if ((type(qpiktbllki)~="table") or (type(qsplitter)~="table")) then qbooguard = true end--if end--if ------------------------------------------------------------------------ ---- ORDINARY LOCAL DEBUG FUNCTIONS ---- ------------------------------------------------------------------------ -- Local function LFTRACEMSG -- for variables the other sub "lfshowvar" is preferable but in exceptional -- cases it can be justified to send text containing variables to this sub -- enhances global "qstrtrace" (may NOT be type "nil") local function lftracemsg (strbigcrap) qstrtrace = qstrtrace .. "<br>" .. strbigcrap .. '.' end--function lftracemsg ------------------------------------------------------------------------ -- Local function LFSHOWVAR -- Show content of a variable or list content of a table with -- integer indexes starting from ZERO. -- "vardescri" (def empty) and "vartablim" (def 0) are optional -- enhances global "qstrtrace" (may NOT be type "nil") local function lfshowvar (vardubious, strname, vardescri, vartablim) local strtype = '' local strreport = '' local numindax = 0 local numlencx = 0 local numsigno = 0 if (type(vardescri)~="string") then vardescri = '' else vardescri = ' (' .. vardescri .. ')' end--if if (type(vartablim)~="number") then vartablim = 0 end--if strname = '"' .. strname .. '"' .. vardescri -- combo strtype = type(vardubious) if ((strtype=="table") and (vartablim~=0)) then strreport = 'Table ' .. strname .. ' :' numindax = 0 while (true) do if (numindax>vartablim) then break -- done table end--if strreport = strreport .. ' ' .. tostring (numindax) .. ' -> "' .. tostring (vardubious[numindax]) .. '"' numindax = numindax + 1 end--while else strreport = 'Variable ' .. strname .. ' has type "' .. strtype .. '"' if (strtype=="string") then numlencx = string.len (vardubious) strreport = strreport .. ' and length ' .. tostring (numlencx) if (numlencx~=0) then strreport = strreport .. 
          ' and content "'
        numindax = 1
        while (true) do
          if (numindax>numlencx) then
            break -- done string char after char
          end--if
          numsigno = string.byte (vardubious,numindax,numindax)
          if ((numsigno<36) or (numsigno>122) or (numsigno==91) or (numsigno==93) or (numsigno==42) or (numsigno==58)) then
            strreport = strreport .. '{' .. tostring (numsigno) .. '}'
          else
            strreport = strreport .. string.char (numsigno)
          end--if
          if ((numlencx>50) and (numindax==20)) then
            numindax = numlencx - 20 -- jump
            strreport = strreport .. '" ... "'
          else
            numindax = numindax + 1
          end--if
        end--while
        strreport = strreport .. '"' -- don't forget final quot
      end--if (numlencx~=0) then
    end--if
    if ((strtype~="string") and (strtype~="nil")) then
      strreport = strreport .. ' and content "' .. tostring (vardubious) .. '"'
    end--if
  end--if
  qstrtrace = qstrtrace .. "<br>" .. strreport .. '.'
end--function lfshowvar

------------------------------------------------------------------------
---- ORDINARY LOCAL MATH FUNCTIONS ----
------------------------------------------------------------------------

local function mathdiv (xdividend, xdivisor)
  local resultdiv = 0 -- Lua as used here has no DIV operator :-(
  resultdiv = math.floor (xdividend / xdivisor)
  return resultdiv
end--function mathdiv

local function mathmod (xdividendo, xdivisoro)
  local resultmod = 0 -- the MOD operator is "%", a bitwise AND operator is lacking too
  resultmod = xdividendo % xdivisoro
  return resultmod
end--function mathmod

-- Test whether the bit selected by ZERO-based index is set, return boolean.
-- Assumed reconstruction: "lfcasegene" calls "mathbittest" and lists it
-- under "MATH FUNCTIONS" but no definition is present in this section.
-- This sub depends on "MATH FUNCTIONS"\"mathdiv"
-- and "MATH FUNCTIONS"\"mathmod".

local function mathbittest (xnumber, xbitindex)
  local boobitset = false
  boobitset = (mathmod(mathdiv(xnumber,(2^xbitindex)),2)==1)
  return boobitset
end--function mathbittest

------------------------------------------------------------------------
---- ORDINARY LOCAL STRING FUNCTIONS ----
------------------------------------------------------------------------

-- test whether char is an ASCII digit "0"..."9", return boolean

local function lftestnum (numkaad)
  local boodigit = false
  boodigit = ((numkaad>=48) and (numkaad<=57))
  return boodigit
end--function lftestnum

------------------------------------------------------------------------

-- test whether char is an ASCII uppercase letter, return boolean

local function lftestuc
(numkode) local booupperc = false booupperc = ((numkode>=65) and (numkode<=90)) return booupperc end--function lftestuc ------------------------------------------------------------------------ -- test whether char is an ASCII lowercase letter, return boolean local function lftestlc (numcode) local boolowerc = false boolowerc = ((numcode>=97) and (numcode<=122)) return boolowerc end--function lftestlc ------------------------------------------------------------------------ -- Test whether incoming string consists of given number of -- ASCII uppercase letters, return boolean. -- return "true" on success -- This sub depends on "STRING FUNCTIONS"\"lftestuc". local function lfmultestuc (strinputi, numlenc) local booallupper = false local numtestindexx = 1 -- ONE-based local numtestedchar = 0 booallupper = (string.len(strinputi)==numlenc) if (booallupper) then while (true) do if (numtestindexx>numlenc) then break end--if numtestedchar = string.byte (strinputi,numtestindexx,numtestindexx) booallupper = booallupper and (lftestuc(numtestedchar)) numtestindexx = numtestindexx + 1 end--while end--if return booallupper end--function lfmultestuc ------------------------------------------------------------------------ -- Local function LFBANMULTI -- Ban listed single char:s by multiplicity (test for validity). -- Incoming control string "strkoneven" with pairs of char:s, for -- example "'2&0" will tolerate 2 consecutive apo:s but -- not 3, and completely ban the and-sign "&". -- Input : * "strkoneven" -- even and 2...24, wrong length gives -- "true", tolerated multiplicity "0"..."9" -- * "strsample" -- 0...1'024, empty gives "false", -- too long gives "true" -- Output : * "booisevil" -- "true" if evil -- This sub depends on "MATH FUNCTIONS"\"mathmod" -- and "STRING FUNCTIONS"\"lftestnum". 
local function lfbanmulti (strkoneven, strsample)
  local booisevil = false
  local numkonlen = 0 -- length of control string
  local numsamlen = 0 -- length of sample string
  local numinndex = 0 -- ZERO-based outer index
  local numinneri = 0 -- ZERO-based inner index
  local numchear = 0
  local numnexxt = 0
  local nummultiq = 1 -- counted multiplicity
  local numcrapp = 0 -- from "strkoneven" char to test
  local numvrapp = 0 -- from "strkoneven" multiplicity limit
  numsamlen = string.len (strsample)
  if (numsamlen~=0) then
    numkonlen = string.len (strkoneven)
    booisevil = (numkonlen<2) or (numkonlen>24) or (mathmod(numkonlen,2)~=0) or (numsamlen>1024)
    if (booisevil==false) then
      while (true) do -- outer loop
        if (numinndex==numsamlen) then
          break
        end--if
        numchear = string.byte (strsample,(numinndex+1),(numinndex+1))
        if (numchear==0) then
          booisevil = true -- ZERO is unconditionally prohibited
          break
        end--if
        numinndex = numinndex + 1
        numnexxt = 0
        if (numinndex~=numsamlen) then
          numnexxt = string.byte (strsample,(numinndex+1),(numinndex+1))
        end--if
        if (numchear==numnexxt) then
          nummultiq = nummultiq + 1
        end--if
        if ((numchear~=numnexxt) or (numinndex==numsamlen)) then
          numinneri = 0
          while (true) do -- inner loop
            if (numinneri==numkonlen) then
              break
            end--if
            numcrapp = string.byte (strkoneven,(numinneri+1),(numinneri+1))
            numvrapp = string.byte (strkoneven,(numinneri+2),(numinneri+2))
            if (lftestnum(numvrapp)==false) then
              booisevil = true -- crime in control string detected
              break
            end--if
            if ((numchear==numcrapp) and (nummultiq>(numvrapp-48))) then
              booisevil = true -- multiplicity crime in sample string detected
              break
            end--if
            numinneri = numinneri + 2 -- ZERO-based inner index and STEP 2
          end--while -- inner loop
          if (booisevil) then
            break
          end--if
          nummultiq = 1
        end--if ((numchear~=numnexxt) or (numinndex==numsamlen)) then
      end--while -- outer loop
    end--if (booisevil==false) then
  end--if (numsamlen~=0) then
  return booisevil
end--function lfbanmulti
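
-- Illustrative sketch only, NOT part of the original module: the control
-- string accepted by "lfbanmulti" above is a sequence of pairs, each made of
-- 1 char to watch and 1 decimal digit giving the tolerated multiplicity.
-- The hypothetical helper "lfdemoparsekon" decodes such a string into a
-- table indexed by char code, or returns "nil" if the string is malformed
-- (wrong length or multiplicity that is not a digit).

local function lfdemoparsekon (strkontrol)
  local tablimits = {}
  local numkonlong = 0
  local numkonpos = 1 -- ONE-based and STEP 2
  local numwatched = 0
  local numlimit = 0
  numkonlong = string.len (strkontrol)
  if ((numkonlong<2) or (numkonlong>24) or ((numkonlong%2)~=0)) then
    return nil -- length must be even and 2...24
  end--if
  while (numkonpos<numkonlong) do
    numwatched = string.byte (strkontrol,numkonpos,numkonpos)
    numlimit = string.byte (strkontrol,(numkonpos+1),(numkonpos+1)) - 48
    if ((numlimit<0) or (numlimit>9)) then
      return nil -- multiplicity is not a digit "0"..."9"
    end--if
    tablimits[numwatched] = numlimit
    numkonpos = numkonpos + 2
  end--while
  return tablimits
end--function lfdemoparsekon

-- for example lfdemoparsekon ("'2&0") gives a table with [39] = 2 (up to
-- 2 consecutive apo:s tolerated) and [38] = 0 (and-sign completely banned)
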
------------------------------------------------------------------------

-- Local function LFTESTSPACE

-- Detect leading or trailing space in a string.

local function lftestspace (strsuspectofspacing)
  local boospacedetected = false
  local numspacelength = 0
  numspacelength = string.len(strsuspectofspacing)
  if (numspacelength~=0) then
    boospacedetected = (string.byte(strsuspectofspacing,1,1)==32) or (string.byte(strsuspectofspacing,numspacelength,numspacelength)==32)
  end--if
  return boospacedetected
end--function lftestspace

------------------------------------------------------------------------

-- Local function LFDEBRACKET

-- Separate bracketed part of a string and return the inner or outer
-- part. On failure the string is returned complete and unchanged.

-- Note that for length of hit ZERO ie "()" we have "numbegg" + 1 = "numendd"
-- and for length of hit ONE ie "(x)" we have "numbegg" + 2 = "numendd".

-- "numxminlencz" must be >= 1 !!!

local function lfdebracket (strdeath, boooutside, numxminlencz)
  local numindoux = 1 -- ONE-based
  local numdlong = 0
  local numwesel = 0
  local numbegg = 0 -- ONE-based, ZERO invalid
  local numendd = 0 -- ONE-based, ZERO invalid
  numdlong = string.len (strdeath)
  while (true) do
    if (numindoux>numdlong) then
      break -- ONE-based -- if both "numbegg" "numendd" non-ZERO then maybe
    end--if
    numwesel = string.byte(strdeath,numindoux,numindoux)
    if (numwesel==40) then -- "("
      if (numbegg==0) then
        numbegg = numindoux -- pos of "("
      else
        numbegg = 0
        break -- damn: more than 1 "(" present
      end--if
    end--if
    if (numwesel==41) then -- ")"
      if ((numendd==0) and (numbegg~=0) and ((numbegg+numxminlencz)<numindoux)) then
        numendd = numindoux -- pos of ")"
      else
        numendd = 0
        break -- damn: more than 1 ")" present or ")" precedes "("
      end--if
    end--if
    numindoux = numindoux + 1
  end--while
  if ((numbegg~=0) and (numendd~=0)) then
    if (boooutside) then
      strdeath = string.sub(strdeath,1,(numbegg-1)) ..
string.sub(strdeath,(numendd+1),numdlong) else strdeath = string.sub(strdeath,(numbegg+1),(numendd-1)) -- separate substring end--if end--if return strdeath -- same string variable end--function lfdebracket ------------------------------------------------------------------------ ---- ORDINARY LOCAL CONVERSION FUNCTIONS ---- ------------------------------------------------------------------------ -- Local function LFDEC1DIGCL -- Convert 1 decimal ASCII digit to UINT8 with clamp. local function lfdec1digcl (num1dugyt, num1clim) num1dugyt = num1dugyt - 48 -- may become invalid ie negative if ((num1dugyt<0) or (num1dugyt>num1clim)) then num1dugyt = 0 -- valid ZERO output on invalid input digit end--if return num1dugyt end--function lfdec1digcl ------------------------------------------------------------------------ -- Local function LFONEHEXTOINT -- Convert 1 ASCII code of a hex digit to an UINT4 ie 0...15 (255 invalid). -- Only uppercase accepted local function lfonehextoint (numdigit) local numresult = 255 if ((numdigit>47) and (numdigit<58)) then numresult = numdigit-48 end--if if ((numdigit>64) and (numdigit<71)) then numresult = numdigit-55 end--if return numresult end--function lfonehextoint ------------------------------------------------------------------------ -- This sub depends on "MATH FUNCTIONS"\"mathdiv" -- and "MATH FUNCTIONS"\"mathmod". local function lfnumto2digit (numzerotoninetynine) local strtwodig = '' strtwodig = mathdiv(numzerotoninetynine,10) .. mathmod(numzerotoninetynine,10) return strtwodig end--function lfnumto2digit ------------------------------------------------------------------------ ---- ORDINARY LOCAL UTF8 FUNCTIONS ---- ------------------------------------------------------------------------ -- Local function LFUTF8LENGTH -- Measure length of a single UTF8 char, return ZERO if invalid. 
-- Does NOT thoroughly check the validity, looks at 1 octet only -- Input : - numbgoctet (beginning octet of a UTF8 char) -- Output : - numlen1234x (1...4 or ZERO if invalid) local function lfutf8length (numbgoctet) local numlen1234x = 0 if (numbgoctet<128) then numlen1234x = 1 -- $00...$7F -- ANSI/ASCII end--if if ((numbgoctet>=194) and (numbgoctet<=223)) then numlen1234x = 2 -- $C2 to $DF end--if if ((numbgoctet>=224) and (numbgoctet<=239)) then numlen1234x = 3 -- $E0 to $EF end--if if ((numbgoctet>=240) and (numbgoctet<=244)) then numlen1234x = 4 -- $F0 to $F4 end--if return numlen1234x end--function lfutf8length ------------------------------------------------------------------------ -- Local function LFCASEGENE -- Adjust case of a single letter (generous), limited unicode support -- with some common UTF8 ranges. -- Input : * strucinut : single unicode letter (1 or 2 octet:s) -- * booucas : for desired uppercase "true" and for -- lowercase "false" -- Output : * strucinut : (same var, unchanged if input is -- empty or unknown or invalid) -- * in ASCII lowercase is $20 above uppercase, b5 reveals -- the case (1 is upper) -- * the same is valid in $C3-block -- * this is NOT valid in $C4-$C5-block, lowercase is usually 1 above -- uppercase and nothing reveals the case reliably -- * case delta can be 1 or $20 or $50 other -- * lowercase is usually above uppercase but not always -- * case pair distance can span $40-boundary or even $0100-boundary -- $C2-block $0080 $C2,$80 ... $00BF $C2,$BF no letters (OTOH NBSP mm) -- $C3-block $00C0 $C3,$80 ... $00FF $C3,$BF (SV mm) delta $20 UC-LC-UC-LC -- upper $00C0 $C3,$80 ... $00DF $C3,$9F -- lower $00E0 $C3,$A0 ... $00FF $C3,$BF -- AA AE EE NN OE UE mm -- $D7 $DF $F7 excluded (not letters) -- $FF excluded (here LC, UC is $0178) -- $C4-$C5-block $0100 $C4,$80 ... $017F $C5,$BF (EO mm) -- delta 1 and UC even but messy with many exceptions -- EO $0108 ... 
$016D case delta 1
-- for example SX upper $015C $C5,$9C - lower $015D $C5,$9D
-- $0138 $0149 $017F excluded (not letters)
-- $0178 excluded (here UC, LC is $FF)
-- $0100 ... $0137 UC even
-- $0139 ... $0148 reversed (UC odd) note that case delta is NOT reversed
-- $014A ... $0177 UC even again
-- $0179 ... $017E reversed (UC odd) note that case delta is NOT reversed

-- $CC-$CF-block $0300 $CC,$80 ... $03FF $CF,$BF (EL mm) delta $20
-- EL $0370 ... $03FF (officially)
-- strict EL base range $0391 ... $03C9 case delta $20
-- $0391 $CE,$91 ... $03AB $CE,$AB upper
-- $03B1 $CE,$B1 ... $03CB $CF,$8B lower
-- for example "omega" upper $03A9 $CE,$A9 - lower $03C9 $CF,$89

-- $D0-$D3-block $0400 $D0,$80 ... $04FF $D3,$BF (RU mm) delta $20 $50
-- strict RU base range $0410 ... $044F case delta $20 but 1 extra char !!!
-- $0410 $D0,$90 ... $042F $D0,$AF upper
-- $0430 $D0,$B0 ... $044F $D1,$8F lower
-- for example "CCCP-gamma" upper $0413 $D0,$93 - lower $0433 $D0,$B3
-- extra base char and exception is special "E" with horizontal doubledot
-- case delta $50 (upper $0401 $D0,$81 - lower $0451 $D1,$91)
-- same applies for ranges $0400 $D0,$80 ... $040F $D0,$8F upper
-- and $0450 $D1,$90 ... $045F $D1,$9F lower

-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length".

local function lfcasegene (strucinut, booucas)

  local numlaengden = 0 -- length from "string.len"
  local numchaer = 0 -- UINT8 beginning char
  local numchaes = 0 -- UINT8 later char (BIG ENDIAN, lower value here)
  local numcharel = 0 -- UINT8 code relative to beginning of block $00...$FF
  local numdelabs = 0 -- UINT8 absolute positive delta
  local numdelta = 0 -- SINT16 signed, can be negative
  local numdelcarry = 0 -- SINT8 signed, can be negative

  local boowantlower = false
  local booisuppr = false
  local booislowr = false
  local boopending = false

  local booc3blok = false -- $C3 only $00C0...$00FF SV mm delta 32
  local booc4c5bl = false -- $C4 $C5 $0100...$017F EO mm delta 1
  local boocccfbl = false -- $CC $CF $0300...$03FF EL mm delta 32
  local bood0d3bl = false -- $D0 $D3 $0400...$04FF RU mm delta 32 80

  while (true) do -- fake loop

    numlaengden = string.len (strucinut)
    if ((numlaengden==0) or (numlaengden>2)) then
      break -- to join mark
    end--if
    numchaer = string.byte (strucinut,1,1)
    if ((lfutf8length(numchaer))~=numlaengden) then
      break -- to join mark -- mismatch with length from sub "lfutf8length"
    end--if

    boowantlower = (not booucas)

    if (numlaengden==1) then
      booisuppr = lftestuc(numchaer)
      booislowr = lftestlc(numchaer)
      if (booisuppr and boowantlower) then
        numdelta = 32 -- ASCII UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -32 -- ASCII lower->UPPER
      end--if
      break -- to join mark
    end--if

    numchaes = string.byte (strucinut,2,2)

    booc3blok = (numchaer==195) -- case delta is 32
    booc4c5bl = ((numchaer==196) or (numchaer==197)) -- case delta is 1
    boocccfbl = ((numchaer>=204) and (numchaer<=207)) -- case delta is 32
    bood0d3bl = ((numchaer>=208) and (numchaer<=211)) -- case delta is 32 80

    if (booc3blok) then
      boopending = true
      numcharel = numchaes + 64 -- simplified calculation here (begins at $C0)
      if ((numcharel==215) or (numcharel==223) or (numcharel==247)) then
        boopending = false -- not a letter, we are done
      end--if
      if (numcharel==255) then
        boopending = false -- special LC silly "Y" with horizontal doubledot
        if (booucas) then
          numdelta = 121 -- lower->UPPER (distant and reversed)
        end--if
      end--if
      if (boopending) then
        booislowr = (mathbittest(numcharel,5)) -- mostly regular block
        booisuppr = not booislowr
        if (booisuppr and boowantlower) then
          numdelta = 32 -- UPPER->lower
        end--if
        if (booislowr and booucas) then
          numdelta = -32 -- lower->UPPER
        end--if
      end--if (boopending) then
      break -- to join mark
    end--if

    if (booc4c5bl) then
      boopending = true
      numcharel = (numchaer-196)*64 + (numchaes-128) -- begins at $C4
      if ((numcharel==56) or (numcharel==73) or (numcharel==127)) then
        boopending = false -- not a letter, we are done
      end--if
      if (numcharel==120) then
        boopending = false -- special UC silly "Y" with horizontal doubledot
        if (boowantlower) then
          numdelta = -121 -- UPPER->lower (distant and reversed)
        end--if
      end--if
      if (boopending) then
        if (((numcharel>=57) and (numcharel<=73)) or (numcharel>=121)) then
          booislowr = ((mathmod(numcharel,2))==0) -- UC odd (reversed)
        else
          booislowr = ((mathmod(numcharel,2))==1) -- UC even (ordinary)
        end--if
        booisuppr = not booislowr
        if (booisuppr and boowantlower) then
          numdelta = 1 -- UPPER->lower
        end--if
        if (booislowr and booucas) then
          numdelta = -1 -- lower->UPPER
        end--if
      end--if (boopending) then
      break -- to join mark
    end--if

    if (boocccfbl) then
      numcharel = (numchaer-204)*64 + (numchaes-128) -- begins at $CC
      booisuppr = ((numcharel>=145) and (numcharel<=171))
      booislowr = ((numcharel>=177) and (numcharel<=203))
      if (booisuppr and boowantlower) then
        numdelta = 32 -- UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -32 -- lower->UPPER
      end--if
      break -- to join mark
    end--if

    if (bood0d3bl) then
      numcharel = (numchaer-208)*64 + (numchaes-128) -- begins at $D0
      booisuppr = (numcharel<=47) -- delta $20 $50
      booislowr = ((numcharel>=48) and (numcharel<=95)) -- delta $20 $50
      if (booisuppr or booislowr) then
        numdelabs = 32
        if ((numcharel<=15) or (numcharel>=80)) then
          numdelabs = 80
        end--if
      end--if
      if (booisuppr and boowantlower) then
        numdelta = numdelabs -- UPPER->lower
      end--if
      if (booislowr and booucas) then
        numdelta = -numdelabs -- lower->UPPER
      end--if
      break -- to join mark
    end--if

    break -- finally to join mark
  end--while -- fake loop -- join mark

  if ((numlaengden==1) and (numdelta~=0)) then
    strucinut = string.char (numchaer + numdelta) -- no risk of carry here
  end--if
  if ((numlaengden==2) and (numdelta~=0)) then
    numdelcarry = 0
    while ((numchaes+numdelta)>=192) do
      numdelta = numdelta - 64
      numdelcarry = numdelcarry + 1 -- add BIG ENDIAN 6 bits with carry
    end--while
    while ((numchaes+numdelta)<=127) do
      numdelta = numdelta + 64
      numdelcarry = numdelcarry - 1 -- negat add BIG ENDIAN 6 bits with carry
    end--while
    strucinut = string.char (numchaer + numdelcarry) .. string.char (numchaes + numdelta)
  end--if

  return strucinut -- same var for input and output !!!

end--function lfcasegene

------------------------------------------------------------------------

-- Local function LFXCASEULT

-- Adjust letter case of beginning letter or all letters in a word or group of
-- words to upper or lower, limited unicode support (generous LFCASEGENE).

-- See LFFIXCASE for ASCII-only version.

-- Input : * strenigo : word or group of words (may be empty)
--         * booupcas : "true" for uppercase and "false" for lowercase
--         * boodoall : "true" to adjust all letters, "false" only beginning

-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length" and
-- "UTF8 FUNCTIONS"\"lfcasegene" (generous LFCASEGENE).
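
-- Usage sketch (NOT part of the original documentation; results are traced
-- by hand from the code and assume that "lfutf8length" returns 2 for the
-- leading octets $C3...$D1 and that "mathbittest" tests the bit of value 32):
--   lfxcaseult ("ĉevalo", true, false)  should give "Ĉevalo" ($C4,$89 -> $C4,$88)
--   lfxcaseult ("ÅLAND", false, true)   should give "åland"  ($C3,$85 -> $C3,$A5)
--   lfxcaseult ("ёлка", true, false)    should give "Ёлка"   ($D1,$91 -> $D0,$81)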

local function lfxcaseult (strenigo, booupcas, boodoall)
  local numlein = 0
  local numposi = 1 -- octet position ONE-based
  local numcut = 0 -- length of an UTF8 char
  local bootryadj = false -- try to adjust single char
  local strte7mp = ""
  local strelygo = ""
  numlein = string.len (strenigo)
  while (true) do
    if (numposi>numlein) then
      break -- done
    end--if
    bootryadj = (boodoall or (numposi==1))
    numcut = lfutf8length(string.byte(strenigo,numposi,numposi))
    if ((numcut==0) or ((numposi+numcut-1)>numlein)) then
      numcut = 1 -- skip ie copy one faulty octet
      bootryadj = false
    end--if
    strte7mp = string.sub (strenigo,numposi,(numposi+numcut-1)) -- 1...4 oct
    if (bootryadj) then
      strte7mp = lfcasegene(strte7mp,booupcas) -- (generous LFCASEGENE)
    end--if
    strelygo = strelygo .. strte7mp -- this can be slow
    numposi = numposi + numcut
  end--while
  return strelygo
end--function lfxcaseult

------------------------------------------------------------------------

---- ORDINARY LOCAL HIGH LEVEL FUNCTIONS ----

------------------------------------------------------------------------

-- Local function LFBREWERROR

-- #E02...#E99, note that #E00 and #E01 are NOT supposed to be included here.

-- We need 3 const strings "constrelabg", "constrelaen",
-- "constrlaxhu" and const table "contaberaroj".

-- This sub depends on "CONVERSION FUNCTIONS"\"lfnumto2digit".

local function lfbrewerror (numerrorcode)
  local vardeskrip = 0
  local strytsux = '#E'
  vardeskrip = contaberaroj[numerrorcode] -- risk of type "nil"
  if (type(vardeskrip)=="string") then
    strytsux = strytsux .. lfnumto2digit(numerrorcode) .. ' ' .. vardeskrip
  else
    strytsux = strytsux .. '??'
  end--if
  strytsux = constrlaxhu .. constrelabg .. strytsux .. constrelaen .. constrlaxhu
  return strytsux
end--function lfbrewerror

------------------------------------------------------------------------

-- Local function LFINSERTULTIM

-- Insert selected extra strings into a given string at given positions
-- with optional discarding if the insertable item is empty. Discarding is
-- protected from access out of range by clamping.

-- Input : * strmdata -- main data string with control codes (syntax see below)
--         * tabinseert -- not-string is safe and has same effect as empty
--                         string, "nil" or empty string "" are preferred

-- Output : * strhazil

-- syntax of insertion and discarding magic string:
-- "@" followed by 2 uppercase letters and 2 hex numbers
-- otherwise the hit is not processed, but copied as-is instead
-- 2 letters select the insertable item from table supplied by the caller
-- 2 hex numbers control discarding left and right (0...15 char:s)
-- empty item is legal and results in discarding if some number is non-ZERO
-- if uppercasing or other adjustment is needed then the caller must take
-- care of it in the form of 2 or more separate items provided in the table

-- This sub depends on "STRING FUNCTIONS"\"lftestuc"
-- and "CONVERSION FUNCTIONS"\"lfonehextoint".
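
-- Usage sketch (NOT part of the original documentation; the item names "LN"
-- and "XX" are hypothetical, results are traced by hand from the code and
-- assume that "lfonehextoint" converts ASCII "0"..."9" "A"..."F" to 0...15
-- and returns 255 for anything else):
--   lfinsertultim ('la @LN00 lingvo', {LN='sveda'})  should give 'la sveda lingvo'
--   lfinsertultim ('vorto (@XX21) fino', {})         should give 'vorto fino'
-- in the latter call the item is missing thus empty, therefore 2 char:s
-- to the left " (" and 1 char to the right ")" are discarded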

local function lfinsertultim (strmdata,tabinseert)

  local varduahuruf = 0
  local strhazil = ''
  local numdatalen = 0
  local numdatainx = 0
  local numdataoct = 0 -- maybe @
  local numdataodt = 0 -- UC
  local numdataoet = 0 -- UC
  local numammlef = 0 -- hex and discard left
  local numammrig = 0 -- hex and discard right
  local boogotmagic = false

  numdatalen = string.len(strmdata)
  numdatainx = 1 -- ONE-based

  while (true) do -- genuine loop, "numdatainx" is the counter
    if (numdatainx>numdatalen) then -- beware of risk of overflow below
      break -- done (ZERO iterations possible)
    end--if
    boogotmagic = false
    numdataoct = string.byte(strmdata,numdatainx,numdatainx)
    numdatainx = numdatainx + 1
    while (true) do -- fake loop
      if ((numdataoct~=64) or ((numdatainx+3)>numdatalen)) then
        break -- no hit here
      end--if
      numdataodt = string.byte(strmdata, numdatainx , numdatainx )
      numdataoet = string.byte(strmdata,(numdatainx+1),(numdatainx+1))
      if ((lftestuc(numdataodt)==false) or (lftestuc(numdataoet)==false)) then
        break -- no hit here
      end--if
      numammlef = string.byte(strmdata,(numdatainx+2),(numdatainx+2))
      numammrig = string.byte(strmdata,(numdatainx+3),(numdatainx+3))
      numammlef = lfonehextoint (numammlef)
      numammrig = lfonehextoint (numammrig)
      boogotmagic = ((numammlef~=255) and (numammrig~=255))
      break
    end--while -- fake loop
    if (boogotmagic) then
      numdatainx = numdatainx + 4 -- consumed 5 char:s, cannot overflow here
      varduahuruf = string.char (numdataodt,numdataoet)
      varduahuruf = tabinseert[varduahuruf] -- risk of type "nil"
      if (type(varduahuruf)~="string") then
        varduahuruf = '' -- type "nil" or invalid type gives empty string
      end--if
      if (varduahuruf=='') then
        numdataoct = string.len(strhazil) - numammlef -- this can underflow
        if (numdataoct<=0) then
          strhazil = ''
        else
          strhazil = string.sub(strhazil,1,numdataoct) -- discard left
        end--if
        numdatainx = numdatainx + numammrig -- discard right this can overflow
      else
        strhazil = strhazil .. varduahuruf -- augment
      end--if
    else
      strhazil = strhazil .. string.char(numdataoct) -- copy char as-is
    end--if (boogotmagic) else
  end--while

  return strhazil

end--function lfinsertultim

------------------------------------------------------------------------

-- Local function LFFINDITEMS

-- Input : * long string where to search
--         * even number of char:s fe "WCWU" what to search

-- Output : * boolean

local function lffinditems (strwhere, strandevenwhat)
  local strcxztvaa = ''
  local numcxzlen = 0
  local numcxzind = 1 -- ONE-based step TWO
  local boofoundthecrap = false
  numcxzlen = string.len(strandevenwhat)
  while (true) do
    if ((numcxzind+1)>numcxzlen) then
      break -- not found
    end--if
    strcxztvaa = "@" .. string.sub(strandevenwhat,numcxzind,(numcxzind+1))
    boofoundthecrap = (string.find(strwhere,strcxztvaa,1,true)~=nil)
    if (boofoundthecrap) then
      break -- found
    end--if
    numcxzind = numcxzind + 2
  end--while
  return boofoundthecrap
end--function lffinditems

------------------------------------------------------------------------

-- Local function LFLONGNAME

local function lflongname (strlingvonomo, strctlcode)
  local numsepanjang = 0
  local numsejuta = 0
  numsepanjang = string.len(strlingvonomo)
  if ((numsepanjang>=1) and (strctlcode=="eo")) then
    numsejuta = string.byte (strlingvonomo,numsepanjang,numsepanjang)
    if (numsejuta==97) then
      strlingvonomo = "la " .. strlingvonomo
    end--if
  end--if
  if ((numsepanjang>=1) and (strctlcode=="id")) then
    strlingvonomo = "bahasa " .. strlingvonomo
  end--if
  return strlingvonomo
end--function lflongname

------------------------------------------------------------------------

-- Local function LFSTRIPPARENT

-- Strip part of string hidden in parentheses.

-- copy from "strwithparent" to "strystripped" until string " (" found

local function lfstripparent (strwithparent)
  local strystripped = ''
  local numloongwy = 0
  local numiindexx = 0 -- ZERO-based
  local numocct = 0
  local numoddt = 0
  numloongwy = string.len(strwithparent)
  while (true) do
    if (numiindexx==numloongwy) then
      break -- copied whole string
    end--if
    numocct = string.byte(strwithparent,(numiindexx+1),(numiindexx+1))
    numoddt = 0
    if ((numiindexx+1)<numloongwy) then
      numoddt = string.byte(strwithparent,(numiindexx+2),(numiindexx+2))
    end--if
    if (numoddt==40) then
      break -- stop copying at " (" (2 char:s but only 1 checked)
    end--if
    strystripped = strystripped .. string.char(numocct)
    numiindexx = numiindexx + 1
  end--while
  return strystripped
end--function lfstripparent

------------------------------------------------------------------------

local function lfchk789ucase (numasciicode, bookaccepted, booxxaccepted)
  local boopositiveverdict = false
  if (numasciicode==88) then
    boopositiveverdict = booxxaccepted
  else
    if (numasciicode==76) then
      boopositiveverdict = bookaccepted
    else
      boopositiveverdict = ((numasciicode==67) or (numasciicode==73) or
        (numasciicode==77) or (numasciicode==78) or (numasciicode==80) or
        (numasciicode==85) or (numasciicode==87))
    end--if
  end--if
  return boopositiveverdict
end--function lfchk789ucase

------------------------------------------------------------------------

-- Local function LFCHKKODINV

-- Check whether a string (intended to be a language code) contains only 2
-- or 3 lowercase letters or maybe a digit in middle position or maybe
-- instead equals "-" or "??" and maybe additionally is not
-- included on the ban list.

-- Input : * strqooq -- string (empty is useless and returns
--                      "true" ie "bad" but can't cause any harm)
--         * numnokod -- "0" ... "3" how special codes "-" "??" should pass
--         * boodigit -- "true" to allow digit in middle position
--         * boonoban -- "true" to skip test against ban table

-- Output : * booisbadlk -- boolean "true" if the string is evil

-- This sub depends on "STRING FUNCTIONS"\"lftestnum"
-- and "STRING FUNCTIONS"\"lftestlc".

-- We need const table "contabisbanned".

local function lfchkkodinv (strqooq, numnokod, boodigit, boonoban)

  local varomongkosong = 0 -- for check against the ban list
  local booisbadlk = false -- pre-assume good
  local numchiiar = 0
  local numukurran = 0
  local numindeex = 0 -- ZERO-based

  while (true) do -- fake (outer) loop

    if ((strqooq=="-") and ((numnokod==1) or (numnokod==3))) then
      break -- to join mark -- good
    end--if
    if ((strqooq=="??") and ((numnokod==2) or (numnokod==3))) then
      break -- to join mark -- good
    end--if
    numukurran = string.len (strqooq)
    if ((numukurran<2) or (numukurran>3)) then
      booisbadlk = true
      break -- to join mark -- evil
    end--if
    numchiiar = string.byte (strqooq,1,1)
    if (lftestlc(numchiiar)==false) then
      booisbadlk = true
      break -- to join mark -- evil
    end--if
    numchiiar = string.byte (strqooq,numukurran,numukurran)
    if (lftestlc(numchiiar)==false) then
      booisbadlk = true
      break -- to join mark -- evil
    end--if
    if (numukurran==3) then
      numchiiar = string.byte (strqooq,2,2)
      if ((boodigit==false) or (lftestnum(numchiiar)==false)) then
        if (lftestlc(numchiiar)==false) then
          booisbadlk = true
          break -- to join mark -- evil
        end--if
      end--if ((boodigit==false) or (lftestnum(numchiiar)==false))
    end--if
    if (boonoban==false) then
      while (true) do -- ordinary inner loop
        varomongkosong = contabisbanned[numindeex+1] -- number of elem unknown
        if (type(varomongkosong)~="string") then
          break -- abort inner loop (then fake outer loop) due to end of table
        end--if
        numukurran = string.len (varomongkosong)
        if ((numukurran<2) or (numukurran>3)) then
          break -- abort inner loop (then fake outer loop) due to faulty table
        end--if
        if (strqooq==varomongkosong) then
          booisbadlk = true
          break -- abort inner loop (then fake outer loop) due to violation
        end--if
        numindeex = numindeex + 1 -- ZERO-based
      end--while -- ordinary inner loop
    end--if (boonoban==false) then

    break -- finally to join mark
  end--while -- fake loop -- join mark

  return booisbadlk

end--function lfchkkodinv

------------------------------------------------------------------------

-- Local function LFFILLNAME

-- Replace placeholder "\@" "\\@" by augmented name of the caller.

-- The caller name is submitted to us as a parameter thus we
-- do NOT access any constants and do NOT have to peek it either.

local function lffillname (strmessage,strcaller)
  local strhasill = ''
  local numstrloen = 0
  local numindfx = 1 -- ONE-based
  local numcjar = 0
  local numcjnext = 0
  numstrloen = string.len (strmessage)
  while (true) do
    if (numindfx>numstrloen) then
      break -- empty input is useless but cannot cause major harm
    end--if
    numcjar = string.byte (strmessage,numindfx,numindfx)
    numindfx = numindfx + 1
    numcjnext = 0
    if (numindfx<=numstrloen) then
      numcjnext = string.byte (strmessage,numindfx,numindfx)
    end--if
    if ((numcjar==92) and (numcjnext==64)) then
      strhasill = strhasill .. strcaller -- invalid input is caller's risk
      numindfx = numindfx + 1 -- skip 2 octet:s of the placeholder
    else
      strhasill = strhasill .. string.char (numcjar)
    end--if
  end--while
  return strhasill
end--function lffillname

------------------------------------------------------------------------

-- Local function LFKODEOSG

-- Transcode X-surrogates (without "\", thus for example "kacxo",
-- NOT "ka\cxo") to UTF-8 cxapeloj in a string (EO only).

-- Input : * strsurr -- string (empty is useless but can't cause major harm)

-- Output : * strcxapeloj

-- We need const table "contabtransiltable".

-- This sub depends on "MATH FUNCTIONS"\"mathdiv"
-- and "MATH FUNCTIONS"\"mathmod".

local function lfkodeosg (strsurr)
  local varpeek = 0
  local strcxapeloj = ''
  local numinputl = 0
  local numininx = 0 -- ZERO-based source index
  local numknark = 0 -- current char (ZERO is NOT valid)
  local numknarp = 0 -- previous char (ZERO is NOT valid)
  local numlow = 0
  local numhaj = 0
  numinputl = string.len(strsurr)
  while (true) do
    if (numininx==numinputl) then
      break
    end--if
    numknark = string.byte(strsurr,(numininx+1),(numininx+1))
    numininx = numininx + 1
    numhaj = 0 -- pre-assume no translation
    if ((numknarp~=0) and ((numknark==88) or (numknark==120))) then -- got "x"
      varpeek = contabtransiltable[numknarp] -- UINT16 or nil
      if (varpeek~=nil) then
        numlow = mathmod (varpeek,256)
        numhaj = mathdiv (varpeek,256)
      end--if
    end--if
    if (numhaj~=0) then
      strcxapeloj = strcxapeloj .. string.char(numhaj,numlow)
      numknark = 0 -- invalidate current char
    else
      if (numknarp~=0) then -- add previous char only if valid
        strcxapeloj = strcxapeloj .. string.char(numknarp) -- add it
      end--if
    end--if
    numknarp = numknark -- copy to previous even if invalid
  end--while
  if (numknarp~=0) then -- add previous and last char only if valid
    strcxapeloj = strcxapeloj .. string.char(numknarp) -- add it
  end--if
  return strcxapeloj
end--function lfkodeosg

------------------------------------------------------------------------

---- MAIN EXPORTED FUNCTION ----

------------------------------------------------------------------------

function lawc.ek (arxframent)

  -- general unknown type

  local varkantctl = 0 -- picked from "contabkatoj"
  local vartmp = 0 -- variable without type multipurpose
  local vartpm = 0 -- variable without type multipurpose

  -- special type "args" AKA "arx"

  local arxsomons = 0 -- metaized "args" from our own or caller's "frame"

  -- general tab ("qtabkatoj" is elsewhere)

  local tabblock = {} -- from "%"-syntax assi
  local tablinx = {} -- from "#"-syntax assi filled by "strkrositem"
  local tabmnfragments = {} -- for manual split
  local tabextfrog = {} -- from "ext="
  local tabstuff = {} -- double-letter indexes

  -- general str ("qstrtrace" is elsewhere)

  local strtmp = "" -- temp (fix "contaberaroj", fill insane table, ...)
  local strviserr = "" -- visible error
  local strvisgud = "" -- visible good output
  local strinvank = "" -- invisible "anchor" part
  local strinvkat = "" -- invisible category part
  local strret = "" -- final result string

  -- str for prevalidation of split control string

  local strkrositem = "" -- assi: prevalidated item from the cross "#"-syntax
  local strreconl = "" -- manu: reconstructed complete lemma for "sum check"
  local strfragtbl = "" -- manu: prevalidated fragment to be stored in table
  local strinnertst = "" -- manu: inner content of brackets to be checked
  local str2field = "" -- manu

  -- str specific to language processing

  local strfrafra = "" -- split control string from "fra=" before conversion
  local strextext = "" -- extra param
  local strdstdst = "" -- distinction hint from "dst="
  local strscrscr = "" -- script code from "scr="
  local strpagenam = "" -- from "{{PAGENAME}}" or "pagenameoverridetestonly"
  local strlemma = "" -- bold lemma (maybe split) from pagename
  local strkodbah = "" -- language code (2 or 3 lowercase) from arxsomons[1]
  local strkodkek6 = "" -- word class code (2 uppercase) from arxsomons[2]
  local strkodkek7 = "" -- further word class
  local strnambah = "" -- language name (without prefix "Bahasa")
  local strnambalo = "" -- long language name (with prefix "la" or "bahasa")
  local strnamasin = "" -- language name in the language (propralingve)
  local strnamke6 = "" -- word class full
  local strnamco6 = "" -- word class stripped
  local strnamke7 = "" -- word class full
  local strnamco7 = "" -- word class stripped

  -- general num

  local numerr = 0 -- 1 in 2 pa 3 sub 4 neva 5 neko 6 wc 7 fra 8 $S$H 9 chk
  local numpindex = 0 -- number of anon params
  local numsplit = 0 -- split strategy (0 auto 1 assisted auto 2 manu 7 none)
  local numlong = 0 -- for parameter processing
  local numtamp = 0 -- for parameter processing and split processing
  local numtump = 0 -- for parameter processing and split processing
  local numoct = 0
  local numodt = 0
  local numoet = 0
  local numkindex = 0

  -- num for prevalidation of split control string

  local numlaong = 0
  local numogt = 0 -- assi and manu
  local numoht = 0
  local numtbindx = 0 -- current index
  local numprevdx = 0 -- previous index
  local numhelpcn = 0 -- help counter (assi) and fragment counter (manu)
  local numnestin = 0 -- number of source opened '[' (manu)
  local numofslhs = 0 -- number of source slashes (manu)

  -- quasi-constant num from "constrmainctl"

  local numshowlemma = 0 -- four-state 0...3

  -- general boo

  local boonocat = false -- from "nocat=true"
  local bootrace = false -- from "detrc=true"
  local bookonata = false -- true if "qpiktbllki" index 0 returns valid name
  local boohavasi = false -- true if we have valid name in "strnamasin" too
  local boohavdua = false -- true if we have 2 word classes
  local boohavdst = false
  local boohavext = false -- true if we have "ext="
  local boomo3kat = false -- true if "numshowlemma"=3
  local boohavnyr = false -- true if we got "NR" (ultimately exclusive)
  local boohavkal = false -- true if we got "KA" (almost exclusive)
  local boohavkur = false -- true if we got "KU"
  local bootimp = false

  -- boo for prevalidation of split control string

  local boocaught = false -- temp
  local boo210kl = false -- got "L:" thus slash is prohibited
  local booslshxx = false -- at least one slash "/" in complete manu "fra="
  local boohavepl = false -- fragment is preceded by plus "+"
  local boohvtext = false -- have ordinary text char:s
  local boohvpref = false -- have prefix "M:" or similar inside []

  -- quasi-constant boo from "constrmainctl"

  local booshowimage = false

  ---- GUARD AGAINST INTERNAL ERROR AGAIN ----

  -- later reporting of #E01 may NOT depend on uncommentable strings

  qstrtrace = '<br>This is "mlawc", requested "detrc" report.' -- unconditional
  lfshowvar (constrmainctl,'constrmainctl') -- "qstrtrace"
  lfshowvar (conboomiddig,'conboomiddig') -- "qstrtrace"
  if (qbooguard) then
    numerr = 1 -- #E01 internal
  end--if

  ---- FILL IN ERROR MESSAGES AND TRANSCODE EO IF NEEDED ----

  -- placeholder "\@" "\\@" is replaced by augmented name of the caller
  -- from "constrkoll" in any case, for example 'sxablono "test"'
  -- or 'templat "test"'

  -- only for EO the X-substitution is subsequently performed

  if (numerr==0) then
    numtamp = 2 -- start with #E02
    numtump = 0 -- tolerance for holes
    while (true) do
      vartmp = contaberaroj[numtamp] -- risk of type "nil"
      if (type(vartmp)=="string") then -- number of messages is NOT hardcoded
        numtump = 0
        strtmp = lffillname (vartmp,constrkoll)
        if (constrpriv=="eo") then
          strtmp = lfkodeosg (strtmp)
        end--if
        contaberaroj[numtamp] = strtmp
      else
        numtump = numtump + 1
      end--if
      if (numtump==4) then -- max 3 consecutive holes
        break
      end--if
      numtamp = numtamp + 1 -- TWO-based
    end--while
    if (constrpriv=="eo") then
      contabwc["PP"] = lfkodeosg(contabwc["PP"])
      contabwc["UF"] = lfkodeosg(contabwc["UF"])
    end--if
  end--if

  ---- FILL IN 2 SEMI-HARDCODED PARAMETERS TO 3 VAR:S ----

  numoct = string.byte (constrmainctl,1,1) -- "0" or "1"
  booshowimage = (numoct==49)
  numoct = string.byte (constrmainctl,2,2) -- "0" or "1" or "2" or "3"
  numshowlemma = lfdec1digcl (numoct,3)
  boomo3kat = (numshowlemma==3) -- needed for 2 sub:s and final categoriz

  ---- GET THE ARX (ONE OF TWO) ----

  -- must be seized independently of "numerr" even if we already suck

  arxsomons = arxframent.args -- "args" from our own "frame"
  if (type(arxsomons)~="table") then
    arxsomons = {} -- guard against indexing error
    numerr = 1 -- #E01 internal
  end--if
  if (arxsomons['caller']=="true") then
    arxsomons = arxframent:getParent().args -- "args" from caller's "frame"
  end--if
  if (type(arxsomons)~="table") then
    arxsomons = {} -- guard against indexing error again
    numerr = 1 -- #E01 internal
  end--if

  ---- PROCESS 3 HIDDEN NAMED PARAMS INTO 1 STRING AND 2 BOOLEAN:S ----

  -- this may override "mw.title.getCurrentTitle().text" and
  -- stipulate content in "strpagenam", empty is NOT valid

  -- bad "pagenameoverridetestonly=" can give #E01
  -- no error is possible from other hidden parameters

  -- "detrc=" and "nocat=" must be seized independently of "numerr"
  -- even if we already suck, but type "table" must be ensured !!!

  strpagenam = ""
  if (numerr==0) then
    vartmp = arxsomons['pagenameoverridetestonly']
    if (type(vartmp)=="string") then
      numtamp = string.len(vartmp)
      if ((numtamp>=1) and (numtamp<=120)) then
        strpagenam = vartmp -- empty is not legal
      else
        numerr = 1 -- #E01 internal
      end--if
    end--if
  end--if

  if (arxsomons['nocat']=='true') then
    boonocat = true
  end--if
  if (arxsomons['detrc']=='true') then
    bootrace = true
  end--if

  if (bootrace) then
    lfshowvar (numerr,'numerr','done with hidden parameters') -- "qstrtrace"
  end--if

  ---- SEIZE THE PAGENAME FROM MW ----

  -- later reporting of #E01 may NOT depend on uncommentable strings

  -- must be 1...120 octet:s keep consistent with "pagenameoverridetestonly="

  if ((numerr==0) and (strpagenam=='')) then
    vartmp = mw.title.getCurrentTitle().text -- without namespace prefix
    if (type(vartmp)=="string") then
      numtamp = string.len(vartmp)
      if ((numtamp>=1) and (numtamp<=120)) then
        strpagenam = vartmp -- pagename here (empty is NOT legal)
      else
        numerr = 1 -- #E01 internal
      end--if
    end--if
  end--if

  ---- STRICTLY CHECK THE PAGENAME ----

  -- for example "o'clock" is legal "o'clock's" is legal
  -- but "o''clock" is a crime

  if (numerr==0) then
    if (strpagenam=='') then
      numerr = 1 -- #E01 internal
    else
      if (lfbanmulti("'1[0]0{0}0",strpagenam)) then
        numerr = 1 -- #E01 internal
      end--if
    end--if
  end--if

  ---- WHINE IF YOU MUST #E01 ----

  -- reporting of this error #E01 may NOT depend on
  -- uncommentable strings as "constrkoll" and "contaberaroj"

  -- do NOT use sub "lfbrewerror", report our name (NOT of template) and in EN

  if (numerr==1) then
    strtmp = '#E01 Internal error in module "mlawc".'
    strviserr = constrlaxhu .. constrelabg .. strtmp .. constrelaen .. constrlaxhu
  end--if

  ---- PRELIMINARILY ANALYZE ANONYMOUS PARAMETERS ----

  -- this will catch holes, empty parameters, too long parameters,
  -- and wrong number of parameters

  -- below on exit var "numpindex" will contain number of
  -- prevalidated anonymous params

  -- this depends on 3 constants:
  -- * contabparam[0] minimal number
  -- * contabparam[1] maximal number
  -- * contabparam[2] maximal length (default 160)

  if (numerr==0) then
    numpindex = 0 -- ZERO-based
    numtamp = contabparam[1] -- maximal number of params
    while (true) do
      vartmp = arxsomons [numpindex+1] -- can be "nil"
      if ((type(vartmp)~="string") or (numpindex>numtamp)) then
        break -- good or bad
      end--if
      numlong = string.len (vartmp)
      if ((numlong==0) or (numlong>contabparam[2])) then
        numerr = 2 -- #E02 param/RTFD
        break -- only bad here
      end--if
      numpindex = numpindex + 1 -- on exit has number of valid parameters
    end--while
    if ((numpindex<contabparam[0]) or (numpindex>numtamp)) then
      numerr = 2 -- #E02 param/RTFD
    end--if
  end--if

  ---- PROCESS 2 OBLIGATORY ANONYMOUS PARAMS INTO 3 STRINGS ----

  -- now var "numpindex" already contains number of prevalidated params, always
  -- 2, and is useless

  -- here we validate and assign "strkodbah", "strkodkek6",
  -- "boohavdua", "strkodkek7" (can be empty), "boohavkal", "boohavnyr"

  -- note that "lfchkkodinv" returns "true" on failure and natively supports
  -- "??" whereas "lfmultestuc" returns "true" on success and does
  -- NOT natively support "??"

  -- this depends directly on const boolean "conboomiddig"
  -- this depends indirectly on const table "contabisbanned" via "lfchkkodinv"

  if (numerr==0) then
    while (true) do -- fake loop

      strkodbah = arxsomons[1] -- language code (obligatory)
      if (lfchkkodinv(strkodbah,2,conboomiddig,false)) then
        numerr = 4 -- #E04 -- "??" is tolerable but "-" is NOT in "lfchkkodinv"
        break -- to join mark
      end--if

      boohavdua = false
      strkodkek6 = arxsomons[2] -- 2 UC or 4 UC (obligatory)
      numlong = string.len (strkodkek6)
      strkodkek7 = ""
      if (numlong==4) then -- maybe 2 word classes
        strkodkek7 = string.sub (strkodkek6,3,4)
        strkodkek6 = string.sub (strkodkek6,1,2)
        if ((strkodkek6=='??') or (strkodkek7=='??')) then
          numerr = 6 -- #E06 -- if both are specified then no "??" tolerable
          break -- to join mark
        end--if
        boohavdua = true
      end--if
      if (strkodkek6~='??') then -- "??" is unknown but not faulty
        if (lfmultestuc(strkodkek6,2)==false) then
          numerr = 6 -- #E06
          break -- to join mark
        end--if
      end--if
      if (boohavdua) then -- here "??" for unknown is NOT permitted
        if (lfmultestuc(strkodkek7,2)==false) then
          numerr = 6 -- #E06
          break -- to join mark
        end--if
      end--if
      if ((strkodkek6=='NR') or (strkodkek7=='NR')) then
        boohavnyr = true -- needed far below
      end--if
      if ((strkodkek6=='KA') or (strkodkek7=='KA')) then
        boohavkal = true -- needed far below
      end--if
      if ((strkodkek6=='KU') or (strkodkek7=='KU')) then
        boohavkur = true -- only for exclusivity test
      end--if
      if (boohavdua and boohavnyr) then
        numerr = 6 -- #E06 -- "NR" is ultimately exclusive
        break -- to join mark
      end--if
      if (boohavdua and boohavkal and (boohavkur==false)) then
        numerr = 6 -- #E06 -- "KA" is almost exclusive
        break -- to join mark
      end--if
      if ((strkodbah=='??') and (strkodkek6=='??')) then
        numerr = 6 -- #E06 -- both unknown is illegal
        break -- to join mark
      end--if

      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if

  ---- PROCESS 1 OPTIONAL NAMED PARAM INTO 1 STRING ----

  -- here we validate and assign "boohavdst" and "strdstdst"
  -- (2...40 or empty) from "dst=" regardless of "numshowlemma"

  boohavdst = false
  strdstdst = ''

  if (numerr==0) then
    while (true) do -- fake loop -- abort on both success or failure

      -- "dst"

      vartmp = arxsomons['dst'] -- optional, NOT yet prevalidated
      if (type(vartmp)~="string") then
        break -- parameter not specified
      end--if
      numtamp = string.len(vartmp)
      if ((numtamp<2) or (numtamp>40)) then
        numerr = 11 -- #E11 -- "dst=" is bad
        break
      end--if
      boohavdst = true
      strdstdst = vartmp
      if (lfbanmulti("'0[0]0{0}0(0)0",strdstdst)) then
        numerr = 11 -- #E11 -- "dst=" is bad -- all brackets prohibited
        if (bootrace) then
          lftracemsg ('Illegal bracket in parameter "dst=" found')
        end--if
      end--if

      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if

  if (bootrace) then
    lfshowvar (strdstdst,'strdstdst','"dst=" maybe seized') -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  ---- PROCESS 3 OPTIONAL NAMED PARAMS INTO 3 STRINGS ----

  -- here we (only if "numshowlemma" >=2 or is 3) prevalidate and store
  -- 3 parameters "fra=" "ext=" "scr="

  -- from "fra=" to string "strfrafra" (1...120 or empty) and to
  -- "numsplit" (0...5 or 7) #S5 #S7
  -- min length is: 1 for "-" no split | 2 for assi split | 4 for manual split

  -- "numsplit" must be 7 if "numshowlemma" is 0 or 1 !!!

  -- tables "tabblock" and "tablinx" must be empty for "numsplit" other than 1

  -- "strfrafra" is needed after end of this block only for "numsplit" 1 or 2

  -- here we validate and assign "strextext" 2 char:s (8 possible
  -- values) or 5...120 char:s and assign "boohavext"

  -- here we validate and assign "strscrscr" 1 uppercase

  strfrafra = ''
  numsplit = 7 -- preliminary default strategy is no split #S7

  if ((numerr==0) and (numshowlemma>=2)) then
    while (true) do -- fake loop -- abort on both success or failure

      -- "fra"

      numsplit = 0 -- default strategy is auto #S0
      vartmp = arxsomons['fra'] -- optional, NOT yet prevalidated
      if (type(vartmp)~="string") then
        break -- parameter not specified, stick with default strategy 0 or 7
      end--if
      numtamp = string.len(vartmp)
      if ((numtamp<1) or (numtamp>120)) then
        numerr = 7 -- #E07 -- "fra=" is bad -- illegal length
        break
      end--if
      strfrafra = vartmp
      if (lfbanmulti("/1(1)1+1'1[1]1{0}0|0",strfrafra)) then
        numerr = 7 -- #E07 -- "fra=" is bad -- illegal char:s detected
        break
      end--if
      vartmp = string.find (strfrafra, "[]", 1, true) -- plain text search
      if (vartmp) then
        numerr = 7 -- #E07 -- apartigo
        break
      end--if
      vartmp = string.find (strfrafra, "[http://", 1, true) -- plain text search
      if (vartmp) then
        numerr = 7 -- #E07 -- apartigo
        break
      end--if
      vartmp = string.find (strfrafra, "[https://", 1, true) -- plain text search
      if (vartmp) then
        numerr = 7 -- #E07 -- apartigo
        break
      end--if
      if (string.len(strfrafra)==2) then
        numoct = string.byte (strfrafra,1,1) -- maybe "$" ("&" belongs "ext=")
        numodt = string.byte (strfrafra,2,2) -- only 3 letters tolerable S B H
        if ((numoct==36) and ((numodt==83) or (numodt==66) or (numodt==72))) then
          numsplit = 3 -- 83 "$S" : simple root split #S3 -> frag type N+U
          if (numodt==66) then
            numsplit = 4 -- 66 "$B" : simple bare root #S4 -> frag type M or N
          end--if
          if (numodt==72) then
            numsplit = 5 -- 72 "$H" : simple zh letter #S5 -> frag type M
          end--if
          if (numsplit==3) then
            numtamp = string.len (strpagenam) -- at least 1 but 1 is too low
            numoet = string.byte (strpagenam,numtamp,numtamp)
            if ((numtamp==1) or (lftestlc(numoet)==false)) then
              numerr = 8 -- #E08 -- apartigo -- illegal pagename for "$S" #S3
              if (bootrace) then
                lftracemsg ('Illegal pagename for "$S" (eo) in parameter "fra="')
              end--if
            end--if
          end--if (numsplit==3) then
          if (numsplit==5) then
            -- the pagename is tested here (testing "strfrafra", which
            -- equals "$H" at this point, could never detect anything)
            if (lfbanmulti("!0,0.0;0?0 0-0'0",strpagenam)) then
              numerr = 8 -- #E08 -- apartigo -- illegal pagename for "$H" #S5
              if (bootrace) then
                lftracemsg ('Illegal pagename for "$H" (zh) in parameter "fra="')
              end--if
            end--if
          end--if (numsplit==5) then
          break -- done success 345 and "strfrafra" not needed anymore or #E08
        end--if ((numoct==36) and ...
      end--if (string.len(strfrafra)==2) then
      if (strfrafra=="-") then
        numsplit = 7 -- no split #S7
        break -- done success 7 and "strfrafra" not needed anymore
      end--if
      numoct = string.byte (strfrafra,1,1)
      if ((numoct==35) or (numoct==37)) then -- "#" or "%"
        numsplit = 1 -- assi auto #S1
      else
        numsplit = 2 -- manual #S2
      end--if
      numtamp = string.len(strfrafra)
      if ((numtamp<2) or ((numsplit==2) and (numtamp<4))) then
        numerr = 7 -- #E07 -- "fra=" is bad -- apartigo too short
      end--if
      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if
  if (bootrace) then
    lfshowvar (strfrafra,'strfrafra','"fra=" maybe seized') -- "qstrtrace"
    lfshowvar (numsplit,'numsplit') -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  strextext = ''
  boohavext = false
  if ((numerr==0) and (numshowlemma==3)) then
    while (true) do -- fake loop -- abort on both success or failure -- "ext"
      vartmp = arxsomons['ext'] -- optional, NOT yet prevalidated
      if (type(vartmp)~="string") then
        break -- parameter not specified
      end--if
      numtamp = string.len(vartmp)
      if ((numtamp<2) or (numtamp>120)) then
        numerr = 13 -- #E13 -- "ext=" is bad
        break
      end--if
      strextext = vartmp -- pick it (further validation pending)
      boohavext = true
      if (lfbanmulti("/0(0)0+0'1[1]1{0}0|0",strextext)) then
        numerr = 13 -- #E13 -- "ext=" is bad
        break
      end--if
      if (string.len(strextext)==2) then
        numoct = string.byte (strextext,1,1) -- maybe "&"
        numodt = string.byte (strextext,2,2) -- only 8 letters tolerable
        if ((numoct==38) and lfchk789ucase(numodt,false,true)) then
          break -- success with "&"-syntax C I M N P U W X
        end--if
      end--if
      if (numtamp<5) then
        numerr = 13 -- #E13 -- "ext=" is bad
        if (bootrace) then
          lftracemsg ('Parameter "ext=" has 2...4 char:s but not valid "&"-syntax') -- "qstrtrace"
        end--if
      end--if
      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if
  if (bootrace) then
    lfshowvar (strextext,'strextext','"ext=" maybe seized') -- "qstrtrace"
    lfshowvar (boohavext,'boohavext','from "ext=" too') -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  strscrscr = ''
  if ((numerr==0) and (numshowlemma==3)) then
    while (true) do -- fake loop -- abort on both success or failure -- "scr"
      vartmp = arxsomons['scr'] -- optional, NOT yet prevalidated
      if (type(vartmp)~="string") then
        break -- parameter not specified
      end--if
      numtamp = string.len(vartmp)
      if (numtamp~=1) then
        numerr = 14 -- #E14 -- "scr=" is bad
        break
      end--if
      strscrscr = vartmp -- pick it (further validation pending)
      numtamp = string.byte(strscrscr,1,1)
      if (lftestuc(numtamp)==false) then
        numerr = 14 -- #E14 -- "scr=" is bad
        break
      end--if
      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if
  if (bootrace) then
    lfshowvar (strscrscr,'strscrscr','"scr=" maybe seized') -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  ---- STRATE 1 -- PROCESS VALIDATE SPLIT CONTROL STRING TO 2 TABLES ----

  -- process from "strfrafra" to "tabblock" (from "%") and
  -- to "tablinx" (from "#") both later processed in "lfsplitaa" "qsplitter"

  -- "numsplit" equal 1 means only that "strfrafra" is
  -- 2...120 octet's and begins with "#" or "%" and is free from some
  -- evil stuff such as "++" "''" "[[" "]]" "[]" "[http" "[https" but not more

  -- example of valid syntax "%3A #2N #5A #7N #8:test"
  -- note that "%" may not be alone ie empty nor followed by SPACE ie "% "
  -- any SPACE must be followed by "#" by syntax rules

  -- this can brew #E07

  if ((numerr==0) and (numsplit==1)) then
    while (true) do -- outer fake loop
      numlaong = string.len (strfrafra)
      numtamp = 1 -- ONE-based index
      numprevdx = - 1 -- must be ascending, index ZERO valid
      numogt = string.byte (strfrafra,1,1) -- got "%" or NOT ??
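      -- ILLUSTRATION ONLY (not part of the original module, never called):
      -- a compact sketch of what the assisted split syntax described above
      -- ("%3A #2N #5A #7N #8:test") is turned into, namely one table of
      -- blocked boundaries and one table of link hints, both indexed by a
      -- HEX digit. The names "lfdemohex" and "lfdemoassi" are hypothetical.
      -- Assumptions: only "0"..."9" and uppercase "A"..."F" count as HEX,
      -- and runs of spaces are tolerated here (the real code below is
      -- stricter and walks the string octet by octet).
      local function lfdemohex (strchar)
        if (string.len(strchar)~=1) then
          return 255 -- same "invalid" marker as used by the code below
        end--if
        local numpos = string.find ("0123456789ABCDEF", strchar, 1, true)
        if (numpos) then
          return numpos - 1
        end--if
        return 255
      end--function
      local function lfdemoassi (strctl)
        local tabblk = {} -- from "%" : index -> '1'
        local tablnk = {} -- from "#" : index -> "N" "I" "A" or ":target"
        local numprev = -1 -- indexes after "#" must be ascending
        local boofirst = true
        for strtok in string.gmatch (strctl, "[^ ]+") do
          local strlead = string.sub (strtok,1,1)
          if (boofirst and (strlead=="%")) then
            local numlast = -1 -- indexes after "%" must be ascending too
            local numcnt = string.len (strtok) - 1
            if ((numcnt<1) or (numcnt>8)) then
              return nil -- "%" may not be empty, max 8 blocked boundaries
            end--if
            for numpos = 2, string.len (strtok) do
              local numidx = lfdemohex (string.sub (strtok,numpos,numpos))
              if ((numidx==255) or (numidx<=numlast)) then
                return nil
              end--if
              tabblk[numidx] = '1'
              numlast = numidx
            end--for
          elseif (strlead=="#") then
            local numidx = lfdemohex (string.sub (strtok,2,2))
            local strrest = string.sub (strtok,3) -- "N" "I" "A" or ":target"
            local numlen = string.len (strrest)
            if ((numidx==255) or (numidx<=numprev)) then
              return nil
            end--if
            if ((strrest=="N") or (strrest=="I") or (strrest=="A")) then
              tablnk[numidx] = strrest
            elseif ((string.sub (strrest,1,1)==":") and (numlen>=2) and (numlen<=41)) then
              tablnk[numidx] = strrest -- link target 1...40 char:s plus ":"
            else
              return nil
            end--if
            numprev = numidx
          else
            return nil
          end--if
          boofirst = false
        end--for
        return tabblk, tablnk
      end--function
      -- possible use: lfdemoassi ("%3A #2N #5A #7N #8:test") gives a first
      -- table with indexes 3 and 10 set to '1' and a second table with
      -- [2]="N" [5]="A" [7]="N" [8]=":test", whereas a faulty string gives nil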
      if (numogt==37) then
        if (numlaong==1) then
          numerr = 7 -- #E07 -- "fra=" is bad -- "%" may not be empty
          break -- outer fake loop
        end--if
        numodt = string.byte (strfrafra,2,2) -- "% " is illegal
        if (numodt==32) then
          numerr = 7 -- #E07 -- "fra=" is bad -- "%" may not be empty
          break -- outer fake loop
        end--if
        numtamp = 2 -- ONE-based index -- check after "%"
        numhelpcn = 0 -- counts blocked boundaries (max 8)
        while (true) do -- inner honest loop
          if ((numtamp>numlaong) or (numhelpcn>8)) then
            break -- inner loop only -- good or bad
          end--if
          numogt = string.byte (strfrafra,numtamp,numtamp) -- SPACE or HEX req
          numtamp = numtamp + 1
          if (numogt==32) then
            numoet = 0
            if (numtamp<=numlaong) then
              numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
            end--if
            if (numoet~=35) then
              numerr = 7 -- #E07 -- "fra=" is bad
            end--if
            break -- inner loop only -- good or bad
          end--if
          numtbindx = lfonehextoint (numogt)
          if ((numtbindx==255) or (numtbindx<=numprevdx)) then
            numerr = 7 -- #E07 -- "fra=" is bad -- not ascending
            break -- inner loop only
          end--if
          tabblock [numtbindx] = '1' -- type "string"
          numhelpcn = numhelpcn + 1
          numprevdx = numtbindx
        end--while
      end--if
      if (numhelpcn>8) then
        numerr = 7 -- #E07 -- "fra=" is bad
      end--if
      if (numerr~=0) then
        break -- outer loop with #E07
      end--if
      if (numtamp>numlaong) then
        break -- outer fake loop -- OK
      end--if
      numprevdx = - 1 -- must be ascending, index ZERO valid, restart from it
      while (true) do -- inner honest loop
        if (numtamp>numlaong) then
          break -- inner loop only -- good end of string
        end--if
        numogt = string.byte (strfrafra,numtamp,numtamp) -- "#" required
        numtamp = numtamp + 1
        if (numogt~=35) then
          numerr = 7 -- #E07 -- "fra=" is bad
          break -- inner loop only
        end--if
        if (numtamp>numlaong) then
          numerr = 7 -- #E07 -- "fra=" is bad
          break -- inner loop only
        end--if
        numogt = string.byte (strfrafra,numtamp,numtamp) -- HEX required
        numtamp = numtamp + 1
        numtbindx = lfonehextoint (numogt)
        if ((numtbindx==255) or (numtbindx<=numprevdx)) then
          numerr = 7 -- #E07 -- "fra=" is bad -- not ascending
          break -- inner loop only
        end--if
        strkrositem = "" -- no valid hit yet -- prevalidated from "#"-syntax
        if (numtamp>numlaong) then
          numerr = 7 -- #E07 -- "fra=" is bad
          break -- inner loop only
        end--if
        numodt = string.byte (strfrafra,numtamp,numtamp) -- one of 4 required
        numtamp = numtamp + 1
        if ((numodt==78) or (numodt==73) or (numodt==65)) then
          strkrositem = string.char (numodt) -- "string" of "N" or "I" or "A"
          if (numtamp<=numlaong) then
            numoet = string.byte (strfrafra,numtamp,numtamp) -- SPACE required
            numtamp = numtamp + 1 -- SPACE must be eaten away here !!!
            if (numoet~=32) then
              numerr = 7 -- #E07 -- "fra=" is bad
            end--if
            if (numtamp<=numlaong) then
              numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
            end--if
            if (numoet~=35) then
              numerr = 7 -- #E07 -- "fra=" is bad
            end--if
          end--if
        end--if ((numodt==78) or (numodt==73) or (numodt==65)) then
        if (numodt==58) then -- ":"
          numhelpcn = 0 -- counts char:s in the link target
          while (true) do -- deep honest loop
            if ((numtamp>numlaong) or (numhelpcn==41)) then
              break -- deep loop only -- good or bad
            end--if
            numodt = string.byte (strfrafra,numtamp,numtamp) -- trash "numodt"
            numtamp = numtamp + 1
            if (numodt==32) then
              numoet = 0 -- SPACE must be eaten away here !!! INC is above
              if (numtamp<=numlaong) then
                numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
              end--if
              if (numoet~=35) then
                numerr = 7 -- #E07 -- "fra=" is bad
              end--if
              break -- deep loop only -- good or bad
            end--if
            strkrositem = strkrositem .. string.char(numodt) -- no ":" prf yet
            numhelpcn = numhelpcn + 1
          end--while
          if ((numhelpcn==0) or (numhelpcn>40)) then
            numerr = 7 -- #E07 -- "fra=" is bad
          end--if
          if (numerr~=0) then
            break -- inner loop with #E07
          end--if
          strkrositem = ":" .. strkrositem -- add the prefix
        end--if (numodt==58) then
        if (strkrositem=='') then
          numerr = 7 -- #E07 -- "fra=" is bad
        end--if
        if (numerr~=0) then
          break -- inner loop with #E07
        end--if
        tablinx [numtbindx] = strkrositem
        numprevdx = numtbindx
      end--while
      break -- finally to join mark
    end--while -- fake loop -- join mark
  end--if ((numerr==0) and (numsplit==1)) then
  if (bootrace) then
    lfshowvar (tabblock,'tabblock','from "%" assi',17) -- "qstrtrace"
    lfshowvar (tablinx,'tablinx','from "#" assi',17) -- "qstrtrace"
    lfshowvar (numerr,'numerr','done with 2 tables') -- "qstrtrace"
  end--if

  ---- STRATE 2 -- PROCESS VALIDATE SPLIT CONTROL STRING TO 1 TABLE ----

  -- process from "strfrafra" to "tabmnfragments" later processed
  -- in "lfsplitmn" "qsplitter" and we need "strpagenam" too

  -- so far "numsplit" equal 2 means only that "strfrafra" is
  -- 4...120 octet's and does NOT begin with "#" or "%" and is free from some
  -- evil stuff such as "++" "''" "[[" "]]" "[]" "[http" "[https" but not more

  -- examples of valid syntax:
  -- "[C:per-...-an/per][M:tidak][M:sama][C:per-...-an/an]"
  -- "[C:per-...-an/per]+[M:kereta( )api]+[C:per-...-an/an]"
  -- "[M:loep(a)]+[U:-are/ar(e)]+[M:sko]"
  -- "[M:kung]+a+[M:doeme]"
  -- "[I:et]+[L:fingr(o)]+[U:o]"

  -- spaces are restricted:
  -- * a field may not begin nor end with a space ("[U:-are /ar(e)]" is bad)
  -- * deleted substring may not begin nor end with
  --   a space ("[M:loep( a)]" is bad)
  -- * deleted single spaces are prohibited after "L:" but otherwise
  --   permitted ("[L:fingr( )]" is bad but "[M:kereta( )api]" is good)

  -- we have to count slashes to make sure not to get more
  -- than 1 in a single fragment

  -- we do NOT have to count colons because they are ignored if
  -- not in the beginning, thus we cannot get more than 1 in a fragment ;-)

  -- colon is only regarded and can cause an error if:
  -- * preceded by an uppercase letter
  -- * those 2 char:s are located in the beginning of fragment and inside [...]
  -- otherwise it is considered to be an ordinary letter
  -- * for example "+[M:crap]" is regarded and valid (although maybe useless)
  -- * for example "+[A:crap]" is regarded and an error
  -- * for example "+[m:A:crap]" and "A:crap" is maybe nonsense but ignored
  --   and not an error against the spec

  -- here we do NOT YET introduce wikilinks with double brackets and walls
  -- here we do NOT YET expand "+" to " + "
  -- here we do NOT YET add dashes to some affixes
  -- here we DO CARRY OUT the "sum check"

  -- all 8 letters C I M N P U W L permitted here (but L restricted)

  -- "strfragtbl" bunches the fragment EXCLUDING possible "+" and "[" and "]"
  -- but they are RE-ADDED before it is stored in "tabmnfragments" !!!

  -- this can brew #E07 except for "sum check" carried out here giving #E09

  -- "STRING FUNCTIONS"\"lftestspace" and "STRING FUNCTIONS"\"lfdebracket"

  if ((numerr==0) and (numsplit==2)) then
    numlaong = string.len (strfrafra)
    numtamp = 1 -- ONE-based source char index
    numhelpcn = 0 -- number valid fragments defined (less than 1 or 2 illegal)
    numnestin = 0 -- number of source opened '[' (only ZERO or ONE is legal)
    numofslhs = 0 -- number of source slashes '/' (only ZERO or ONE is legal)
    strfragtbl = '' -- fragment incl prefix ("M:") and "/"
    str2field = '' -- visible part of fragment after slash for "sum check"
    strreconl = '' -- reconstructed complete lemma for "sum check"
    boohvtext = false -- have ordinary text char:s in field incl () excl +[/]
    boohvpref = false -- have prefix "M:" or similar inside []
    boo210kl = false -- separate verdict for every fragment "L:" used
    booslshxx = false -- true if we got slash inside complete control string
    boohavepl = false -- bracketed fragment is preceded by plus "+"
    while (true) do -- genuine loop, "numtamp" is the counter
      if (numtamp>numlaong) then
        if (numnestin==1) then
          numerr = 7 -- #E07 -- "fra=" is bad due to unclosed '['
          break -- damn
        end--if
        if (boohvtext) then -- flush: do add but no need to erase anymore
          strreconl = strreconl ..
              str2field -- same thing if type "000"
          if (boohavepl) then
            str2field = "+" .. str2field -- no [] and no spaces yet here
          end--if
          tabmnfragments[numhelpcn] = str2field -- same thing if type "000"
          numhelpcn = numhelpcn + 1
        end--if
        break -- done (some checks pending)
      end--if
      if (numhelpcn==16) then
        numerr = 7 -- #E07 -- "fra=" is bad due to more than 16 fragments
        break -- damn
      end--if
      numoht = 0 -- previous char
      if (numtamp~=1) then
        numoht = string.byte (strfrafra,(numtamp-1),(numtamp-1)) -- can get it
      end--if
      numoct = string.byte (strfrafra,numtamp,numtamp)
      numtamp = numtamp + 1
      numogt = 0 -- pre-peeked following char
      if (numtamp<=numlaong) then
        numogt = string.byte (strfrafra,numtamp,numtamp) -- we can pre-peek
      end--if
      boocaught = false -- becomes true if char already caught (kaptiloj ...)
      if (numoct==32) then -- space -- keep "boocaught" false
        if ((boohvtext==false) and (numnestin==1)) then -- chk only inside []
          numerr = 7 -- #E07 -- broken "fra=" due to field beginning wth space
          break -- damn
        end--if
      end--if
      if (numoct==43) then -- plus "+" is fragment separator
        boocaught = true
        if ((numoht==32) or (numogt==32) or (numoht==43) or (numogt==43)) then
          numerr = 7 -- #E07 -- broken "fra=" due to space or double plus
          break -- damn
        end--if
        if ((numoht~=93) and (numogt~=91)) then
          numerr = 7 -- #E07 -- broken "fra=" due to bad use of '+' no "[","]"
          break -- damn
        end--if
        if (numnestin==1) then
          numerr = 7 -- #E07 -- broken "fra=" due to bad use of '+' inside []
          break -- damn
        end--if
        if (boohvtext) then -- flush: do add and do erase then
          strreconl = strreconl .. str2field -- same thing if type "F000"
          if (boohavepl) then -- possible previous plus, not this one !!!
            str2field = "+" ..
                str2field -- no [] and no spaces yet here
          end--if
          tabmnfragments[numhelpcn] = str2field -- same thing if type "F000"
          numhelpcn = numhelpcn + 1
        end--if
        strfragtbl = '' -- for the table (not yet including rectangular bra)
        str2field = '' -- visible for "sum check"
        boohvtext = false -- empty field ready to be filled with garbage
        boohvpref = false -- empty field ready to be filled with garbage
        boo210kl = false -- separate verdict for every fragment
        boohavepl = true -- need this later when adding or flushing
      end--if
      if (numoct==91) then
        boocaught = true -- do NOT touch "boohavepl" !!! needed later
        if (numnestin==1) then
          numerr = 7 -- #E07 -- "fra=" is bad due to nesting of '['
          break -- damn
        end--if
        numnestin = 1 -- after opening '[' and keep "boohavepl" untouched
        if (boohvtext) then -- flush: do add and do erase then
          strreconl = strreconl .. str2field -- same thing if type "F000"
          if (boohavepl) then
            str2field = "+" .. str2field -- no [] and no spaces yet here
          end--if
          tabmnfragments[numhelpcn] = str2field -- same thing if type "F000"
          numhelpcn = numhelpcn + 1
        end--if
        strfragtbl = '' -- for the table (not yet including rectangular bra)
        str2field = '' -- visible for "sum check"
        boohvtext = false -- empty field ready to be filled with garbage
        boohvpref = false -- empty field ready to be filled with garbage
        boo210kl = false -- separate verdict for every fragment
      end--if
      if (numoct==93) then
        boocaught = true
        if ((numnestin==0) or (boohvtext==false)) then
          numerr = 7 -- #E07 -- "fra=" bad due to nesting of ']' or empty '[]'
          break -- damn
        end--if
        if (lftestspace(str2field)) then -- test visible part only here
          numerr = 7 -- #E07 "fra=" bad due to criminal spaces
          break -- damn
        end--if
        if (boo210kl) then
          strinnertst = lfdebracket (strfragtbl,false,1) -- inner part, no "/"
          if (lftestspace(strinnertst)) then
            numerr = 7 -- #E07 "fra=" bad criminal spaces inside ( ) aftr "L:"
            break -- damn
          end--if
        end--if
        strreconl = strreconl ..
            lfdebracket (str2field,true,1) -- visible part
        strfragtbl = '[' .. strfragtbl .. ']' -- plus "+" outside of [] !!!
        if (boohavepl) then
          strfragtbl = "+" .. strfragtbl -- outside and no spaces yet here
        end--if
        tabmnfragments[numhelpcn] = strfragtbl -- complete fragment
        numhelpcn = numhelpcn + 1
        strfragtbl = '' -- for the table (not yet including rectangular bra)
        str2field = '' -- visible for "sum check"
        boohvtext = false -- empty field ready to be filled with garbage
        boohvpref = false -- empty field ready to be filled with garbage
        boohavepl = false -- separate verdict for every fragment
        boo210kl = false -- separate verdict for every fragment
        numnestin = 0 -- again ZERO after closing ']'
        numofslhs = 0 -- reset number of slashes to ZERO
      end--if
      if ((numogt==58) and lftestuc(numoct) and (numnestin==1) and (numofslhs==0) and (boohvtext==false) and (boohvpref==false)) then
        boocaught = true
        if (lfchk789ucase(numoct,true,false)) then
          numtamp = numtamp + 1 -- OK, eat it away for now C I M N P U W L
          strfragtbl = string.char (numoct) .. ':' -- begin fragment for table
          boo210kl = (numoct==76) -- "L"
          boohvpref = true
        else
          numerr = 7 -- #E07 -- "fra=" bad due to wrong uppercase before ":"
          break -- damn
        end--if
      end--if
      if (numoct==47) then
        boocaught = true -- slash "/"
        if ((numofslhs==1) or (lftestspace(str2field)) or boo210kl) then
          numerr = 7 -- #E07 -- bad due to space or excess slash or "L"
          break -- damn
        end--if
        booslshxx = true -- YES -- exists in complete split control string
        numofslhs = 1 -- number of "/" in this fragment
        strfragtbl = strfragtbl .. '/' -- add "/" to fragment no wall yet
        str2field = '' -- OTOH clear the visible part
        boohvtext = false -- in 1 field, note that empty after slash is NOT LEGAL
      end--if
      if (boocaught==false) then
        strfragtbl = strfragtbl .. string.char (numoct) -- add to frag for tbl
        str2field = str2field ..
            string.char (numoct) -- add for "sum check"
        boohvtext = true
      end--if
    end--while -- genuine loop, "numtamp" is the counter
    if ((numhelpcn==0) or ((numhelpcn==1) and (booslshxx==false))) then
      numerr = 7 -- #E07 -- "fra=" is bad -- at least 1 or 2 fragments required
    end--if
    if ((numerr==0) and (strpagenam~=strreconl)) then
      numerr = 9 -- #E09 -- "fra=" is bad -- "sum check"
      if (bootrace) then
        lftracemsg ('Failed "sum check" in manual split : "' .. strpagenam .. '" <> "' .. strreconl .. '"')
      end--if
    end--if
  end--if ((numerr==0) and (numsplit==2)) then
  if (bootrace) then
    lfshowvar (tabmnfragments,'tabmnfragments','from manu done with one table',17) -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  ---- PROCESS AND VALIDATE EXTRA PARAMETER ----

  -- process fragments from "strextext" to "tabextfrog" removing rectangular
  -- brackets and carrying out full validation (as opposed to above where
  -- rectangular brackets are preserved in "tabmnfragments")

  -- only type F210 is permitted and only C I M N P U W available
  -- and ":" or "!" is required
  -- no arc brackets "(" ")" no plus "+" no slash "/" (this is already checked)

  -- alternatively expand "&"-syntax from "strextext" to "tabextfrog"
  -- getting 1 or 2 "!"-fragments, even "X" permitted

  numlaong = 0 -- pre-ass'ume for empty parameter
  if ((numerr==0) and boohavext) then
    numlaong = string.len (strextext)
  end--if
  if (numlaong==2) then
    numoct = string.byte (strextext,2,2) -- only 8 possible letters tolerable
    if (numoct==88) then
      tabextfrog[0] = 'M!' .. strpagenam
      tabextfrog[1] = 'W!' .. strpagenam
    else
      tabextfrog[0] = string.char(numoct,33) .. strpagenam -- "!" too
    end--if
  end--if
  if (numlaong>=5) then -- skip this for "&"-syntax or empty
    numtamp = 1 -- ONE-based source char index
    numhelpcn = 0 -- number of valid fragments defined
    numnestin = 0 -- number of source opened '[' (only ZERO or ONE is legal)
    strfragtbl = '' -- fragment incl prefix (fe "M:" "U!") excl "[" and "]"
    boohvtext = false -- have ordinary text char:s in field excl [] and prefix
    while (true) do -- genuine loop, "numtamp" is the counter
      if (numtamp>numlaong) then
        if (numnestin==1) then
          numerr = 13 -- #E13 -- "ext=" is bad due to unclosed '['
        end--if
        break -- done good or evil
      end--if
      if (numhelpcn==4) then
        numerr = 13 -- #E13 -- "ext=" is bad due to more than 4 fragments
        break -- damn
      end--if
      numoct = string.byte (strextext,numtamp,numtamp)
      numtamp = numtamp + 1
      boocaught = false -- becomes true if char already caught (kaptiloj ...)
      if (numoct==91) then
        boocaught = true
        if (numnestin==1) then
          numerr = 13 -- #E13 -- "ext=" is bad due to nesting of '['
          break -- damn
        end--if
        numoct = string.byte (strextext,numtamp,numtamp) -- mortyp prefix
        numtamp = numtamp + 1
        if (lfchk789ucase(numoct,false,false)==false) then
          numerr = 13 -- #E13 -- "ext=" is bad due to lack of valid prefix
          break -- damn need C I M N P U W even in "ext=" no "X" here
        end--if
        numodt = string.byte (strextext,numtamp,numtamp) -- preserve "numoct"
        numtamp = numtamp + 1
        if ((numodt~=58) and (numodt~=33)) then -- ":" or "!"
          numerr = 13 -- #E13 -- "ext=" is bad due to lack of valid prefix
          break -- damn
        end--if
        strfragtbl = string.char (numoct,numodt) -- begin fragment for table
        numnestin = 1 -- after opening '['
        boohvtext = false -- empty field ready to be filled with garbage
      end--if
      if (numoct==93) then
        boocaught = true
        if ((numnestin==0) or (boohvtext==false)) then
          numerr = 13 -- #E13 -- "ext=" bad due nesting of ']' or empty '[]'
          break -- damn
        end--if
        tabextfrog[numhelpcn] = strfragtbl -- store complete fragment
        numhelpcn = numhelpcn + 1
        strfragtbl = '' -- for the table (not including rectangular bra)
        boohvtext = false -- empty field ready to be filled with garbage
        numnestin = 0 -- again ZERO after closing ']'
      end--if
      if (boocaught==false) then
        if (numnestin==0) then
          numerr = 13 -- #E13 -- "ext=" bad due text outside [] ie type F000
          break -- damn
        end--if
        strfragtbl = strfragtbl .. string.char (numoct) -- add to frag for tbl
        boohvtext = true
      end--if
    end--while -- genuine loop, "numtamp" is the counter
  end--if (numlaong>=5) then
  if (bootrace) then
    lfshowvar (tabextfrog,'tabextfrog','from "ext=" done with one table',5) -- "qstrtrace"
    lfshowvar (numerr,'numerr') -- "qstrtrace"
  end--if

  ---- PEEK THE LANGUAGE NAMES VIA SUBMODULE ----

  -- for lng name in site language ("c0" -- "constrtblc0"):
  -- * "-" is unconditionally evil with #E03 (broken submodule)
  -- * "=" can be #E05 (unknown code) if the site language
  --   code works, otherwise #E03 (broken submodule) too

  -- for lng name "propralingve" ("c1" -- "constrtblc2"):
  -- * "=" is unconditionally evil with #E03 (since the code used
  --   to work just before)
  -- * "-" is silently ignored (name not available)

  -- silly isolators "constrisobg" and "constrisoen" are needed for
  -- "strnamasin" (valid only if "boohavasi" is true) but not for "strnambah"

  bookonata = false -- this is less evil than (numerr>0), needed below
  boohavasi = false -- this is barely bad, needed far below
  if (numerr==0) then
    if (strkodbah=='??') then -- "??" is unknown but not faulty
      strnambah = contablaxwc [2] -- unknown lang
    else
      bookonata = true
      strnambah = qpiktbllki.ek { args = { strkodbah , constrtblc0 } } -- l nam no "rl"
      if (strnambah=="=") then
        strtmp = qpiktbllki.ek { args = { constrpriv , "-" } } -- test site code
        if (strtmp=="1") then
          numerr = 5 -- #E05 unknown code (since site code works)
        else
          numerr = 3 -- #E03 broken submodule (site code does NOT work either)
        end--if
      end--if (strnambah=="=")
      if (strnambah=="-") then -- better content in "c0" absolutely required
        numerr = 3 -- #E03 broken submodule
      end--if
    end--if (strkodbah=='??') else
  end--if
  if ((numerr==0) and bookonata) then
    strnamasin = qpiktbllki.ek { args = { strkodbah , constrtblc2 , "1" } } -- lng asing
    if (strnamasin=="=") then -- content not absolutely requ but this is error
      numerr = 3 -- #E03 error
    else
      if (strnamasin~="-") then
        boohavasi = true -- have valid name better than "-" to display
        strnamasin = constrisobg .. strnamasin .. constrisoen -- add the isola
      end--if
    end--if
  end--if

  ---- TRANSLATE WORD CLASS CODE VIA LUA TABLE ----

  -- "strnamke6" and "strnamke7" is the long word class with possible (...)

  if (numerr==0) then
    if (strkodkek6=='??') then -- "??" is unknown but not faulty
      strnamke6 = contablaxwc [3] -- word class full -- unknown word class
    else
      vartmp = contabwc [strkodkek6]
      if (vartmp==nil) then
        numerr = 6 -- #E06 -- unknown word class
      else
        strnamke6 = vartmp -- word class full -- found it in the table
      end--if
    end--if (strkodkek6=='??') else
  end--if
  if ((numerr==0) and boohavdua) then
    vartmp = contabwc[strkodkek7] -- no "??" possible here
    if (vartmp==nil) then
      numerr = 6 -- #E06 -- unknown word class
    else
      strnamke7 = vartmp -- word class full -- found it in the table
    end--if
  end--if

  ---- PARTIALLY FILL INSANE TABLE ----

  -- base categories are created even for unknown lng or wc
  -- compound categories are available only if lng is known

  -- insertable items defined:
  -- constant:
  -- * LK lng code (unknown "??" legal but take care elsewhere)
  -- * LN lng name (unknown legal, for example "dana" or "Ido")
  -- * LU lng name uppercased (unknown legal, for example "Dana" or "Ido")
  -- * LO lng name not own (empty or nil if own)
  -- * LV lng name uppercased not own (empty or nil if own)
  -- * LY lng name long (for example "bahasa Swedia")
  -- * LZ lng name long not own (empty or nil if own)
  -- * SC script code (for example "T", "S", "P" for ZH, "C" "L" for SH)
  -- variable (we can have 2 word classes):
  -- * WC word class name (for example "substantivo")
  -- * WU word class name uppercased (for example "Substantivo")
  -- * MT mortyp code (for example "C")
  -- * FR fragment (for example "peN-...-an" or "abelujo")

  if (numerr==0) then
    if (bookonata) then
      strnambalo = lflongname (strnambah,constrpriv) -- brew long, maybe needed
    else
      strnambalo = strnambah -- not longer than that
    end--if
    strtmp = lfxcaseult(strnambah,true,false) -- short uppercased
    tabstuff = {} -- bugger all inside
    tabstuff["LK"] = strkodbah
    tabstuff["LN"] = strnambah -- short name
    tabstuff["LU"] = strtmp -- uppercased name
    if (strkodbah~=constrpriv) then
      tabstuff["LO"] = strnambah -- maybe short name
    end--if
    if (strkodbah~=constrpriv) then
      tabstuff["LV"] = strtmp -- maybe uppercased name
    end--if
    tabstuff["LY"] = strnambalo
    if (strkodbah~=constrpriv) then
      tabstuff["LZ"] = strnambalo
    end--if
    tabstuff["SC"] = strscrscr -- script (may be empty)
  end--if

  ---- SPLIT THE LEMMA + EXTRA IF NEEDED VIA SUBMODULE ----

  -- process from "strpagenam" (already guaranteed to be
  -- non-empty) to "strlemma" (actually NOT for manual split)

  -- "numshowlemma" : 0 none 1 raw 2 maybe spl 3 maybe spl and morpheme cat:s
  -- "numsplit" : 0 auto 1 assi auto 2 manu 3 srs 4 sbr 5 zh 7 none
  -- "numsplit" must be 7 #S7 if "numshowlemma" is 0 or 1 !!!
  -- we do exactly nothing (and leave "strlemma" empty) if:
  -- * we already suck ie "numerr"<>0
  -- * "numshowlemma" is ZERO
  -- we skip the split and copy only if:
  -- * "numsplit" is 7 (#S7 no split, can be due to "numshowlemma" equal 1)

  -- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
  -- dash "-" and apo "'" do NOT count as punctuation (for auto and assi auto)

  -- we depend on "boomo3kat" and "bookonata" (they can switch off some cat:s)
  -- we depend on "boohavkal" (switches between "vortgrupo" and "frazo")

  -- "qtabkatoj" is very global
  -- 0...17 cat names without "Category:" prefix, unused "nil"
  -- 20...37 "true" if main page, otherwise "nil"
  -- "qsplitter" will fill it from split and from "ext=" too

  if ((numerr==0) and (numshowlemma~=0)) then
    qtabkatoj [0] = (boomo3kat and bookonata) -- do we want compound cat:s ???
    qtabkatoj [1] = strpagenam
    qtabkatoj [2] = numsplit
    qtabkatoj [3] = tabblock
    qtabkatoj [4] = tablinx
    qtabkatoj [5] = tabmnfragments
    qtabkatoj [6] = tabextfrog -- from "ext=" fragments
    qtabkatoj [7] = boohavext -- from "ext=" is true if "tabextfrog" is valid
    qtabkatoj [8] = tabstuff
    qtabkatoj [9] = boohavnyr
    qtabkatoj[10] = boohavkal
    if (bootrace) then
      lfshowvar (qtabkatoj,'qtabkatoj','before firing submodule',40) -- "qstrtrace"
    end--if
    qtabkatoj = qsplitter.ek { args = qtabkatoj }
    if (bootrace) then
      lfshowvar (qtabkatoj,'qtabkatoj','after return from submodule',40) -- "qstrtrace"
      lftracemsg ('Report from submodule:<br><br>"' .. tostring (qtabkatoj[41]) ..
          '<br>"') -- "qstrtrace"
    end--if
    strlemma = qtabkatoj[40]
    if (type(strlemma)=="string") then
      if (strlemma=="//") then
        numerr = 3 -- #E03 broken submodule
      end--if
    else
      numerr = 3 -- #E03 broken submodule
    end--if
  end--if

  ---- WHINE IF YOU MUST #E02...#E99 ----

  -- reporting of errors #E02...#E99 depends on uncommentable strings
  -- and name of the caller filled in from "constrkoll"

  if (numerr>1) then
    strviserr = lfbrewerror(numerr)
  end--if

  ---- BREW 1 OR 2 EXTRA STRING:S ONLY FOR CATEGORIES ----

  -- content in "strnamco6" and "strnamco7" is word class stripped

  if (numerr==0) then
    strnamco6 = lfstripparent(strnamke6)
    if (boohavdua) then
      strnamco7 = lfstripparent(strnamke7)
    end--if
  end--if

  ---- BREW THE INVISIBLE ANCHOR PART ----

  -- uses "constrankkom" (does NOT end with a dash) and "constaankend"
  -- '<span id="' .. anchor name .. '"></span>'

  -- we can brew 2 or 3 or 5 anchors

  strinvank = ''
  if (numerr==0) then
    strinvank = constrankkom .. "-" .. strkodbah .. constaankend
    strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek6 .. constaankend
    if (boohavdst) then
      strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek6 .. '-' .. strdstdst .. constaankend
    end--if
    if (boohavdua) then
      strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek7 .. constaankend
      if (boohavdst) then
        strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek7 .. '-' .. strdstdst .. constaankend
      end--if
    end--if (boohavdua) then
  end--if

  ---- BREW THE VISIBLE PART ----

  -- "strlemma" is the lemma with or without separation links
  -- "numshowlemma" is four-state but here we bother only abo ZERO vs non-ZERO

  strvisgud = ''
  if (numerr==0) then
    strvisgud = contabscrmisc[0] -- <div...></div> must be empty -- tiny EOL
    if (booshowimage) then
      strvisgud = strvisgud .. contabscrmisc[1] -- "File:Garto" ... with [[]]
    end--if
    strvisgud = strvisgud .. ' '
    if (numshowlemma~=0) then
      strvisgud = strvisgud .. '<b><bdi>' .. strlemma ..
          '</bdi></b> ' -- lemma and space
    end--if
    strvisgud = strvisgud .. '( <span ' .. constrtoolt .. ' title="' .. contablaxwc [0] .. strnambah
    if (boohavasi) then
      strvisgud = strvisgud .. ' ' .. strnamasin -- lang name in the lang with isola
    end--if
    strvisgud = strvisgud .. '"> ' .. strkodbah .. ' </span>'
    strvisgud = strvisgud .. ' , <span ' .. constrtoolt .. ' title="' .. contablaxwc [1] .. strnamke6
    strvisgud = strvisgud .. '"> ' .. strkodkek6 .. ' </span>'
    if (boohavdua) then
      strvisgud = strvisgud .. ' , <span ' .. constrtoolt .. ' title="' .. contablaxwc [1] .. strnamke7
      strvisgud = strvisgud .. '"> ' .. strkodkek7 .. ' </span>'
    end--if
    strvisgud = strvisgud .. ' )'
  end--if

  ---- BREW THE INVISIBLE CATEGORY LIST BASE PART ----

  -- Need string "constrkatp" and it is cat prefix and includes the colon ":".
  -- We need sub "lfinsertultim" (2 para) and table "contabkatoj" controlling
  -- the structure of the cat name.

  -- Note that these categories are unique as they:
  -- do NOT pass through "qtabkatoj"
  -- contain a word class for 2 of 3
  -- are created even for unknown lng or wc

  strinvkat = ''
  if ((numerr==0) and (boonocat==false)) then
    tabstuff["MT"] = nil -- no stupid morpheme here
    tabstuff["FR"] = nil -- no stupid fragment here
    numkindex = 0 -- index 0...2
    while (true) do
      if (numkindex==3) then
        break
      end--if
      varkantctl = contabkatoj[numkindex] -- 0...2 pick main data string no "nil"
      if (type(varkantctl)=="string") then
        numtamp = string.len(varkantctl)
        if (numtamp>=2) then
          bootimp = lffinditems(varkantctl,"WCWU") -- word class
          if (bootimp) then
            tabstuff["WC"] = strnamco6
            tabstuff["WU"] = lfxcaseult(strnamco6,true,false)
            strinvkat = strinvkat .. '[[' .. constrkatp .. lfinsertultim (varkantctl,tabstuff) .. ']]'
            if (boohavdua) then
              tabstuff["WC"] = strnamco7
              tabstuff["WU"] = lfxcaseult(strnamco7,true,false)
              strinvkat = strinvkat .. '[[' .. constrkatp .. lfinsertultim (varkantctl,tabstuff) ..
                  ']]'
            end--if
          else
            tabstuff["WC"] = nil -- no word class, only lng
            tabstuff["WU"] = nil -- no word class, only lng
            strinvkat = strinvkat .. '[[' .. constrkatp .. lfinsertultim (varkantctl,tabstuff) .. ']]'
          end--if
        end--if (numtamp>=2) then
      end--if (type(varkantctl)=="string") then
      numkindex = numkindex + 1
    end--while
  end--if

  ---- ENHANCE THE INVISIBLE CATEGORY LIST WITH COMPOUND PART ----

  -- Need string "constrkatp" and it is cat prefix and includes the colon ":".
  -- List of cat names without NS prefix was maybe brewed by submodule "qsplitter"
  -- and is stored in global "qtabkatoj" ([0]...[17]). At +20 we have requests
  -- for main page of the category using "|-" as "key".

  if ((numerr==0) and (boonocat==false) and boomo3kat) then
    numkindex = 0 -- 0...17 max but stop at type "nil" guaranteed to occur
    while (true) do
      vartmp = qtabkatoj[numkindex] -- risk of type "nil"
      vartpm = qtabkatoj[numkindex+20] -- risk of type "nil"
      bootimp = (vartpm==true) -- main flag
      if (type(vartmp)=="string") then
        strinvkat = strinvkat .. '[[' .. constrkatp .. vartmp
        if (bootimp) then
          strinvkat = strinvkat .. '|-' -- main page of category
        end--if
        strinvkat = strinvkat .. ']]'
      else
        break -- abort at "nil"
      end--if
      numkindex = numkindex + 1
    end--while
  end--if

  ---- RETURN THE JUNK STRING ----

  strret = strviserr .. strinvank .. strvisgud .. strinvkat
  if (bootrace) then
    strret = "<br>" .. qstrtrace .. "<br><br>" .. strret
  end--if
  return strret

end--function

---- RETURN THE JUNK LUA TABLE ----

return lawc