Modulo:mlawc
A self-test is available on the documentation subpage.
--[===[
MODULE "MLAWC" (language and word class)
"eo.wiktionary.org/wiki/Modulo:mlawc" <!--2021-Jan-16-->
"id.wiktionary.org/wiki/Modul:mlawc"
Purpose: shows the lemma in bold text and generates 2 or 3
tooltip texts and 3 or 5 locally invisible category inclusions
from a language code and 1 or 2 word class codes,
creates 2...5 invisible "anchors" for linking to a section,
and optionally splits a multiword lemma into links to its parts
Used by templates:
- livs (EO) , bakk (ID)
Required submodules:
- "mpiktbllki" in turn requiring "mtbllingvoj" (EO) or "mtblbahasa" (ID)
- "mtbllingvoj" in turn requiring template "tbllingvoj" (EO)
- "mtblbahasa" in turn requiring template "tblbahasa" (ID)
This module accepts parameters sent either to itself (own frame) or
to the caller (caller's frame). If the own frame carries the parameter
"caller=true", then the own frame is discarded in favor of the
caller's frame. Empty parameters and parameters longer than 120
octets are inherently invalid (#E02); further checks follow.
Incoming: - 2 obligatory anonymous parameters (one of them
can be "??" but NOT both)
- language code (2 or 3 lowercase letters, use "??" if unknown)
- word class code (2 UPPERCASE letters, use "??" if unknown)
or 2 word class codes (4 UPPERCASE letters, "??" not allowed then)
- 1 or 2 optional named parameters (depending on the semi-hardcoded
settings)
- "fra=" (1...120 octets) split control string; the pagename
is always used as the lemma and, if it is multiword, it is split
automatically; can generate #E07; this parameter is NOT
supported (thus fully ignored) if splitting or showing the
lemma is deactivated in the source code, see below for details
- "dst=" (2...40 octets) distinction hint for the word class, for
example "koleg+o", "kol+eg+o", "fleksia bana+n", "baza bana",
"en", "ett" (all brackets and the apostrophe "'" are prohibited,
whereas plus "+" is permitted and recommended); not shown but built
into the "anchor"
- 3 hidden parameters
- "pagenameoverridetestonly="
- "nocat="
- "detrc="
Returned: - one string intended to be shown alone on a line below the
h3-heading, consisting of the word in bold enclosed in
<bdi>...</bdi>, a space, and a short summary (example:
"( sv , VE )") with 2 tooltips (example:
"Bahasa: Swedia (svenska)" and "Kelas kata: verba (kata kerja)"),
2 invisible anchors and 3 categories;
with 2 word classes the summary becomes "( sv , VE , GR)" with 3 tooltips,
3 invisible anchors and 5 categories
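The shape rules for the 2 anonymous incoming parameters described above can be
sketched as follows. This is only a minimal illustration and not the code of
this module: the function names are hypothetical, the sketch returns a plain
boolean instead of distinguishing #E02, #E04 and #E06, and it ignores the
ban list, the lookup of unknown codes and the optional middle digit.

```lua
-- Hypothetical helpers for illustration, not taken from the module source.

-- Length precheck: empty or longer than 120 octets is invalid.
-- The "#" operator on a Lua string counts octets, not characters.
local function is_length_ok(param)
  return type(param) == "string" and #param >= 1 and #param <= 120
end

-- Language code: 2 or 3 lowercase ASCII letters, or "??" if unknown.
local function is_lang_code_shape(code)
  if code == "??" then return true end
  return code:match("^[a-z][a-z][a-z]?$") ~= nil
end

-- Word class: 2 uppercase letters, or "??" if unknown,
-- or 4 uppercase letters (2 classes, "??" not allowed then).
local function is_wc_code_shape(code)
  if code == "??" then return true end
  return code:match("^[A-Z][A-Z]$") ~= nil
      or code:match("^[A-Z][A-Z][A-Z][A-Z]$") ~= nil
end

-- Combined check, including the rule that "??" must not be used for both.
local function check_anon_params(lang, wc)
  if not (is_length_ok(lang) and is_length_ok(wc)) then return false end
  if lang == "??" and wc == "??" then return false end
  return is_lang_code_shape(lang) and is_wc_code_shape(wc)
end
```

Note that a code such as "deu" passes this shape check; rejecting it is the
job of a later step (ban list, #E04).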
This module is unbreakable (when called with the correct module name
and function name). Every imaginable input from the caller and
from the imported modules will produce either a useful result or
at least a helpful error string.
The following errors are possible:
- <<#E01 Internal error in module "mlawc">>
Possible causes:
- strings not uncommented
- function "mw.title.getCurrentTitle().text" AKA "{{PAGENAME}}" failed
- <<#E02 Erara uzo de sxablono "livs", legu gxian dokumentajxon>>
Possible causes (obvious problems with parameters, detected early):
- fewer than 2 or more than 3 parameters, or holes
- empty parameters or parameters longer than 120 octets
- <<#E03 Eraro en subsxablonoj uzataj far sxablono "livs">>
Possible causes:
- submodule failure (or not found ??)
- <<#E04 Evidente nevalida lingvokodo en sxablono "livs">>
- <<#E05 Nekonata lingvokodo en sxablono "livs">>
- <<#E06 Erara uzo de sxablono "livs" pro vortospeco>>
Possible causes (subtler problems with parameters, detected later):
- invalid word class code
- "??" used inside 4-char string
- both language and word class given as "??"
- <<#E07 Erara uzo de sxablono "livs" pro "fra=" apartigo>>
Possible causes (subtler problems with parameters, detected later):
- split control parameter is faulty (see below)
- <<#E08 Erara uzo de sxablono "livs" pro "dst=" distingo>>
Possible causes (subtler problems with parameters, detected later):
- distinction hint parameter is faulty
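How an error string like the ones listed above could be assembled is sketched
here. This is an assumption-laden illustration, not the module's code: it
assumes that a message template carries the 2-octet placeholder "\@" for the
caller's name (as the message table in the source suggests), the names
"messages", "caller" and "brew_error" are hypothetical, and the HTML wrapping
and the transcoding of "gx" and similar surrogates are left out.

```lua
-- Hypothetical data for illustration; only message #E02 is included.
local caller = 'sxablono "livs"'
local messages = { [2] = 'Erara uzo de \\@, legu gxian dokumentajxon' }

-- Build "#Enn text", falling back to the internal error #E01
-- when no message exists for the given code.
local function brew_error(code)
  local template = messages[code]
  if not template then
    return '#E01 Internal error in module "mlawc"'
  end
  local text = template
  -- plain (non-pattern) search for the placeholder "\@"
  local s, e = template:find("\\@", 1, true)
  if s then
    text = template:sub(1, s - 1) .. caller .. template:sub(e + 1)
  end
  return string.format("#E%02d %s", code, text)
end
```

A plain search is used on purpose, so that no character of the template is
interpreted as a Lua pattern.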
The 25 word classes are:
Main big classes (3):
- SB noun - substantivo (O-vorto) - nomina (kata benda)
- VE verb - verbo (I-vorto) - verba (kata kerja)
- AJ adjective - adjektivo (A-vorto) - adjektiva (kata sifat)
Further smaller classes (12):
- PN pronoun - pronomo - pronomina (kata pengganti)
- NV numeral - numeralo (nombrovorto) - numeralia (kata bilangan)
- AV adverb - adverbo (E-vorto) - adverbia (kata keterangan)
- PV verb particle (EN,SV) - verbpartiklo - partikel verba
- QV question word - demandvorto - kata tanya
- KJ coordinator - konjunkcio - konjungsi
- SJ subordinator - subjunkcio (subfrazenkondukilo) - subjungsi (pengaju klausa terikat)
- PP preposition - prepozicio (antauxlokigita rolvorteto) - preposisi (kata depan)
- PO postposition (EN,SV) - postpozicio - postposisi (kata belakang)
- PC circumposition (SV) - cirkumpozicio - sirkumposisi
- AR article (EN,EO,SV) - artikolo - artikel (kata sandang)
- IN interjection - interjekcio - interjeksi
Nonstandalone elements (5):
- PF prefix - prefikso - prefiks (awalan)
- UF suffix - sufikso (postfikso, finajxo) - sufiks (akhiran)
- KF circumfix - cirkumfikso (konfikso) - sirkumfiks (konfiks)
- IF infix - infikso - infiks (sisipan)
- NR nonstandalone root - nememstara radiko - akar kata terikat
Misc (2):
- KA sentence - frazo - kalimat
- KK character - signo - karakter
Additional classes (3) :
- KU abbreviation - mallongigo (kurtigo) - singkatan (abreviasi)
- GR group of words - vortgrupo - kumpulan kata
- TV table word - tabelvorto - kata tabel
Here we do NOT care about the "base word" property, it is categorized by
module "tagg" / "k" instead. Similarly we do not care about "kofrovorto",
"blandajxo", "derivajxo de tabelvorto" here. And we do NOT care about
"Proverbo" (subclass of KA) and "Esprimo" (subclass of GR) either.
We could theoretically autodetect the word classes KA and GR but do not. The
chief trouble with autodetecting KA is that some multiword abbreviations
begin with an uppercase letter and end with a dot; GR is probably
less problematic. Still, both would raise several questions:
* how to override or suppress the autodetection
* how many word classes are permitted at the same time, given that an
additional one can be autodetected
Categories EO:
Kategorio:Kapvorto (angla) Kategorio:Kapvorto (Esperanto)
Kategorio:Verbo
Kategorio:Verbo (angla) Kategorio:Verbo (Esperanto)
Notes: - we auto-remove the part of word class in brackets and auto-adjust
the letter case, thus "adverbo (E-vorto)" becomes "Adverbo"
- "angla" is lowercase when in brackets, but begins uppercase when
separate (pagename in category namespace), we auto-adjust
the letter case (DEPRECATED)
Categories ID:
Kategori:Kata bahasa Indonesia
Kategori:Nomina
Kategori:id:Nomina
Notes: - we auto-remove the part of word class in brackets and auto-adjust
the letter case, thus "nomina (kata benda)" becomes "Nomina"
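The derivation of the category names from the word class text, as described in
the notes above, can be sketched like this. The function names are
hypothetical and the sketch is not the module's code; it handles only an
ASCII beginning letter, takes the language name ("angla") as a ready-made
argument, and covers only the EO naming scheme.

```lua
-- Remove the part in brackets and uppercase the beginning letter,
-- for example "adverbo (E-vorto)" becomes "Adverbo".
local function wc_to_category_word(wcname)
  -- cut from optional whitespace before the first "(" to the end
  local base = wcname:gsub("%s*%(.*$", "")
  -- uppercase the beginning letter (ASCII only in this sketch);
  -- outer parentheses drop the 2nd return value of "gsub"
  return (base:gsub("^[a-z]", string.upper))
end

-- The 3 EO category names for one language and one word class.
local function eo_categories(langname, wcname)
  local word = wc_to_category_word(wcname)
  return {
    "Kategorio:Kapvorto (" .. langname .. ")",
    "Kategorio:" .. word,
    "Kategorio:" .. word .. " (" .. langname .. ")",
  }
end
```

With 2 word classes the last 2 entries would be produced once per class,
which gives the 5 categories mentioned earlier.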
Anchors:
* Qsekt-en (lang only)
* Qsekt-en-SB (lang and word class) (2 such created if 2 word classes)
* Qsekt-sv-SB-ett (lang and word class and hint) (2 such created
if 2 word classes)
With 1 word class we brew 2 or 3 anchors.
With 2 word classes we brew 3 or 5 anchors.
With the hint provided we brew both an anchor without it and an anchor with it built in.
There are 2 ways to brew "anchors" in HTML:
* <span id="tujuh"></span> HTML5, works from wikitext, used here
* <a name="tujuh"></a> HTML2, does NOT work from wikitext, is shown
as plain text
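The anchor scheme can be sketched as follows. The function name "brew_anchors"
is hypothetical and this is not the module's code; the sketch assumes that
the word classes are known (no "??") and it does not sanitize or encode the
hint in any way.

```lua
-- lang    : language code, for example "sv"
-- classes : array with 1 or 2 word class codes, for example { "SB" }
-- hint    : optional distinction hint, for example "ett", or nil
-- Returns the array of anchor ids and the concatenated HTML.
local function brew_anchors(lang, classes, hint)
  local ids = { "Qsekt-" .. lang }
  for _, wc in ipairs(classes) do
    ids[#ids + 1] = "Qsekt-" .. lang .. "-" .. wc
    if hint then
      ids[#ids + 1] = "Qsekt-" .. lang .. "-" .. wc .. "-" .. hint
    end
  end
  local html = {}
  for i, id in ipairs(ids) do
    html[i] = '<span id="' .. id .. '"></span>'
  end
  return ids, table.concat(html)
end
```

This reproduces the counts given above: 2 or 3 anchors with 1 word class and
3 or 5 anchors with 2 word classes.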
Semi-hardcoded parameters in the source:
* "constrmainctl" type string 2 digits :
* show image (0 or 1)
* show lemma (0 none 1 raw 2 maybe split)
the image is "[[File:Gartoon apps kopete all away replaced.svg|24px|link=]]"
* "conboomiddig" type bool :
* "true" to allow middle digit "s7a" in lng codes
The splitter:
The lemma is read from the pagename and, if it is multiword, it is
automatically split at split boundaries. Such a boundary consists of
one or more qualifying characters, namely space or punctuation
(5 characters: ! , . ; ?). Note in particular that the dash "-" and the
apostrophe "'" do NOT count as punctuation, thus for example
"berjalan-jalan" or "o'clock" will remain together. The fragments are
linked by default and there are some options to tune the result. It is
possible to deactivate (semi-hardcoded in the source code) either only
the splitter, resulting in the raw lemma shown without link, or the
showing of the lemma altogether. In both cases the splitter is inactive
and the parameter "fra=" is NOT supported (thus fully ignored, no error
from it is possible then).
The parameter "fra=" controls the splitter. Bad content can
generate #E07, but some problems are ignored instead.
During the split work 2 separate ZERO-based counters are maintained and
commands in the split control string refer to those.
- input boundary counter : Counts boundaries between incoming words forming
the lemma. Multiple consecutive qualifying characters count as one boundary,
and this applies even in leading and trailing position. For example the text
"Apples, ? bananas and beer." contains 4 boundaries numbered from 0 to 3,
and the string ", ? " (4 characters) receives index 0. The string "?va?"
contains 2 boundaries.
- output fragment counter : Counts generated fragments. For example
"pembangkit listrik tenaga surya" contains 3 boundaries (see above) and will
by default generate 4 fragments. If you disable breaking at boundaries 0 and
2 then the result will be only 2 fragments "pembangkit listrik" and "tenaga
surya" instead, numbered 0 and 1.
The counters are referenced by one-digit numbers, "0" to "9" and "A" to "F"
(uppercase) for rarely needed indexes "10"..."15", thus actually HEX numbers.
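The 2 counters can be illustrated with a small sketch. It is not the module's
splitter: the names "count_boundaries" and "split_lemma" are hypothetical,
the blocked boundaries are passed as a plain set of ZERO-based indexes, and
only the fragments are returned (the boundary text between 2 fragments is
dropped, whereas a blocked boundary stays inside its fragment).

```lua
-- One boundary is a run of one or more of: space ! , . ; ?
local BOUNDARY = "[ !,%.;%?]+"

-- Input boundary counter: number of boundaries in the lemma.
local function count_boundaries(lemma)
  local _, n = lemma:gsub(BOUNDARY, "")
  return n
end

-- Split the lemma into fragments. "blocked" is a set of ZERO-based
-- boundary indexes where no split happens, e.g. { [0] = true, [2] = true }.
local function split_lemma(lemma, blocked)
  blocked = blocked or {}
  local frags, current, idx, pos = {}, "", 0, 1
  while pos <= #lemma do
    local s, e = lemma:find(BOUNDARY, pos)
    if not s then
      current = current .. lemma:sub(pos) -- trailing word, no more boundaries
      break
    end
    current = current .. lemma:sub(pos, s - 1)
    if blocked[idx] then
      current = current .. lemma:sub(s, e) -- keep boundary text, do not split
    elseif current ~= "" then
      frags[#frags + 1] = current -- close the fragment at this boundary
      current = ""
    end
    idx = idx + 1
    pos = e + 1
  end
  if current ~= "" then frags[#frags + 1] = current end
  return frags
end
```

For example split_lemma("pembangkit listrik tenaga surya", { [0] = true, [2] = true })
gives the 2 fragments "pembangkit listrik" and "tenaga surya", numbered 0 and 1
in the terms used above (Lua arrays themselves begin at index 1).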
Limits:
- length of lemma : 1...120 octets
- length of parameter "fra=" : 1...120 octets
- length of explicitly provided link target : 1...40 octets
- number of blocked input boundaries : max 8
- number of accessible output fragments : max 16 (numbered "0"..."F")
Syntax of the "fra=" parameter:
- special value "-" : completely disable splitting
- sequence of tuning commands separated by spaces, or even only 1 command:
- "%" followed by 1...8 ascending HEX digits : do not split at the listed
input word boundaries
- "#" followed by a HEX digit followed by "N" or "I" or "A" : tune
at the pointed output fragment index
- "N" do not link the fragment
- "I" convert beginning letter to lowercase ("I" minusklo) for link target
- "A" convert beginning letter to uppercase ("A" majusklo) for link target
- "#" followed by a HEX digit followed by colon ":" followed by
1...40 characters : link to that target instead
The "#"-items (ZERO, ONE or more permitted) must be ascending but need not
be consecutive, and they must follow the single "%"-item if it is present.
For example "%3A #2N #5A #7N #8:test" will:
- avoid breaking at input boundaries 3 and 10
- avoid linking of fragments 2 and 7
- link fragment 5 to the target with its beginning letter converted to uppercase
- link fragment 8 to "test"
The most common use will be "#0I" fixing the case of the word at
beginning of a sentence lemma, for example "Yes we can." will
link to "yes", not to "Yes", besides "we" and "can".
Too high positions of boundaries and fragments are ignored but other
errors are not and result in #E07, most notably:
- messing up the order of "%" and "#", ie putting "#" before "%",
for example "#2N %3A #5A #7N #8:test"
- numbers after "%" are not ascending, for example "%A3 #2N #5A #7N #8:test"
- "#"-items are not ascending, for example "%3A #2N #5A #7N #6:test"
- invalid characters or missing spaces, for example "%3A #2N#5A #7N #8:test"
Too high number of boundaries or fragments occurring in the lemma does not
cause an error but it is not possible to tune those with index >= 16 anymore.
The splitter is controlled by 2 prevalidated tables generated from the
"fra=" parameter.
* Table "tabblock" contains up to 16 values indexed by integers 0 to 15,
value type string "1" means do block, type "nil" means do not
block (the default). Other values should not occur and evaluate to
"do not block", like "nil" does.
* Table "tablinx" contains up to 16 values indexed by integers 0 to 15,
each value being one of:
* type string:
* "N" or "I" or "A" (as described above)
* colon ":" followed by the link target (length 1...40 octets NOT
checked anymore here)
A beginning character other than "N" or "I" or "A" or ":" should not
occur and evaluates to "do nothing unusual", like "nil" does.
* type "nil" means do nothing unusual (the default)
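How a "fra=" string could be turned into the 2 tables is sketched below. This
is a simplified illustration and not the module's parser: the name
"parse_fra" and the shape of the returned table are hypothetical, a syntax
error is reported by returning nil (the #E07 case), and the sketch assumes
that items are separated by exactly 1 space and that a link target contains
no space.

```lua
-- Returns { nosplit = bool, tabblock = {...}, tablinx = {...} }
-- or nil if the control string is faulty.
local function parse_fra(fra)
  if fra == "-" then
    return { nosplit = true, tabblock = {}, tablinx = {} }
  end
  local res = { nosplit = false, tabblock = {}, tablinx = {} }
  local items, lastfrag = 0, -1
  -- walk over the space-separated items; an empty item (doubled
  -- space or empty string) matches neither form and is an error
  for item in (fra .. " "):gmatch("(.-) ") do
    items = items + 1
    local digits = item:match("^%%([0-9A-F]+)$")
    local idx, cmd = item:match("^#([0-9A-F])(.+)$")
    if digits then
      -- "%"-item: must come first, 1...8 strictly ascending HEX digits
      if items > 1 or #digits > 8 then return nil end
      local prev = -1
      for d in digits:gmatch(".") do
        local n = tonumber(d, 16)
        if n <= prev then return nil end
        res.tabblock[n] = "1"
        prev = n
      end
    elseif idx then
      -- "#"-item: fragment indexes must be strictly ascending
      local n = tonumber(idx, 16)
      if n <= lastfrag then return nil end
      lastfrag = n
      if cmd == "N" or cmd == "I" or cmd == "A" then
        res.tablinx[n] = cmd
      elseif cmd:sub(1, 1) == ":" and #cmd >= 2 and #cmd <= 41 then
        res.tablinx[n] = cmd -- colon plus target of 1...40 octets
      else
        return nil
      end
    else
      return nil
    end
  end
  return res
end
```

With the example "%3A #2N #5A #7N #8:test" this gives "1" in "tabblock" at
indexes 3 and 10, and "N", "A", "N", ":test" in "tablinx" at indexes 2, 5, 7
and 8.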
The tooltips:
There are some difficulties with the tooltip to be displayed via the "title="
attribute. HTML tags cannot be nested inside an attribute value, thus neither
<br> nor <bdi>...</bdi> can be used. We have no solution for <br> (apart from
splitting the tooltip into 2 fragments shown separately from different
positions), and for <bdi>...</bdi> we use the Unicode explicit isolator "FIRST
STRONG ISOLATE (FSI)", which does have the expected effect but may as a side
effect show as a rectangle in some browsers. Alternatively, an advanced tooltip
can be achieved using CSS and the "hover" selector, but this is not accessible
from inside wikitext. There is even an extension for such advanced tooltips,
but it is not enabled on most public wikis.
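A tooltip of the kind described above could be brewed roughly like this. The
function name "tooltip_span" is hypothetical and this is not the module's
code. Closing the isolated text with "POP DIRECTIONAL ISOLATE (PDI)" is an
assumption of this sketch, as is the escaping of "&" and of the double quote.

```lua
local FSI = "\226\129\168" -- U+2068 FIRST STRONG ISOLATE, UTF-8 octets E2 81 A8
local PDI = "\226\129\169" -- U+2069 POP DIRECTIONAL ISOLATE, UTF-8 octets E2 81 A9

-- visible : text shown on the page, for example "sv"
-- label   : tooltip prefix, for example "Bahasa: "
-- name    : name in the site language, for example "Swedia"
-- native  : optional "strange" (maybe RTL) native name, or nil
local function tooltip_span(visible, label, name, native)
  local title = label .. name
  if native then
    -- no <bdi> possible inside an attribute, use isolator characters
    title = title .. " (" .. FSI .. native .. PDI .. ")"
  end
  -- keep the attribute value well-formed
  title = title:gsub("&", "&amp;"):gsub('"', "&quot;")
  return '<span title="' .. title
      .. '" style="border-bottom:1px dotted; cursor:help;">'
      .. visible .. '</span>'
end
```

The call tooltip_span("sv", "Bahasa: ", "Swedia", "svenska") gives a span
whose tooltip reads "Bahasa: Swedia (svenska)" with the native name isolated.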
: ---------------------------------------
* #T00 (no params, evil)
* expected result: #E02
* actual result: "{{#invoke:mlawc|ek}}"
::* #T01 ("eo", one param, evil)
::* expected result: #E02
::* actual result: "{{#invoke:mlawc|ek|eo}}"
* #T02 ("en|SB", page "hole", simplest example)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T03 ("en|??", page "hole")
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|??|pagenameoverridetestonly=hole|nocat=true}}"
* #T04 ("??|SB", page "hole")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|??|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T05 ("??|??", page "mojosa")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|??|??|pagenameoverridetestonly=mojosa|nocat=true}}"
* #T06 ("id|SBGR", page "pembangkit listrik", default split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|pagenameoverridetestonly=pembangkit listrik|nocat=true}}"
::* #T07 ("en|SB|tria", page "hole", too many params)
::* expected result: #E02
::* actual result: "{{#invoke:mlawc|ek|en|SB|tria|pagenameoverridetestonly=hole|nocat=true}}"
* #T08 ("en|SB|tria|kvara", page "hole", too many params)
* expected result: #E02
* actual result: "{{#invoke:mlawc|ek|en|SB|tria|kvara|pagenameoverridetestonly=hole|nocat=true}}"
: ---------------------------------------
* #T10 ("id|SBGR|fra=-", page "pembangkit listrik", no split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=-|pagenameoverridetestonly=pembangkit listrik|nocat=true}}"
::* #T11 ("id|SBGR", page "pembangkit listrik tenaga surya", default split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T12 ("id|SBGR|fra=-", page "pembangkit listrik tenaga surya", no split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=-|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T13 ("id|SBGR|fra=%0", page "pembangkit listrik tenaga surya", auto split except ZERO)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%0|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T14 ("id|SBGR|fra=%1", page "pembangkit listrik tenaga surya", auto split except ONE)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%1|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T15 ("id|SBGR|fra=%2", page "pembangkit listrik tenaga surya", auto split except 2)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%2|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
: ---------------------------------------
* #T20 ("id|SBGR|fra=%3", page "pembangkit listrik tenaga surya", auto split except 3, ignored)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%3|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T21 ("id|SBGR|fra=%F", page "pembangkit listrik tenaga surya", auto split except "F" AKA 15, ignored)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%F|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T22 ("id|SBGR|fra=%G", page "pembangkit listrik tenaga surya", invalid split control string, bad char)
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%G|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T23 ("id|SBGR|fra=%12", page "pembangkit listrik tenaga surya", auto split except 1 and 2)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%12|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T24 ("id|SBGR|fra=%23456789", page "pembangkit listrik tenaga surya", auto split except 2...9, junk ignored)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%23456789|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
::* #T25 ("id|SBGR|fra=%123456789", page "pembangkit listrik tenaga surya", auto split except 1...9, too long)
::* expected result: #E07
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%123456789|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T26 ("id|SBGR|fra=%23456781", page "pembangkit listrik tenaga surya", auto split except nonsense, not ascending)
* expected result: #E07
* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%23456781|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
: ---------------------------------------
* #T30 ("en|KA", page "When in a hole, stop digging.", default but suboptimal split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
::* #T31 ("en|KA|fra=-", page "When in a hole, stop digging.", no split, no link)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=-|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
* #T32 ("en|KA|fra=#0I", page "When in a hole, stop digging.", auto split, lowercase frag index 0)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When in a hole, stop digging.|nocat=true}}"
::* #T33 ("id|SBGR|fra=%1 #2A", page "pembangkit listrik tenaga surya", auto split except boundary ONE and uppercase frag index 2)
::* expected result: OK (silly with "listrik tenaga" together and "surya" linking to "Surya")
::* actual result: "{{#invoke:mlawc|ek|id|SBGR|fra=%1 #2A|pagenameoverridetestonly=pembangkit listrik tenaga surya|nocat=true}}"
* #T34 ("en|KA|fra=#0I", page "When In A Hole, Stop Digging.", auto split, German style, lowercase frag index 0)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When In A Hole, Stop Digging.|nocat=true}}"
::* #T35 ("en|KA|fra=#0I #3I #4I #5I", page "When In A Hole, Stop Digging.", auto split, German style, lowercase frag index 0 3 4 5)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #3I #4I #5I|pagenameoverridetestonly=When In A Hole, Stop Digging.|nocat=true}}"
: ---------------------------------------
* #T40 ("en|KA|fra=#0I", page "Digging", auto split and fix case requested index 0 but no split boundaries available)
* expected result: OK (raw text "Digging" and no link to "digging" nor "Digging")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=Digging|nocat=true}}"
::* #T41 ("sv|KA", page "?va?", default split)
::* expected result: OK (link to "va")
::* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=?va?|nocat=true}}"
* #T42 ("sv|KA", page "?va", default split)
* expected result: OK (link to "va")
* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=?va|nocat=true}}"
::* #T43 ("sv|KA", page "va?", default split)
::* expected result: OK (link to "va")
::* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=va?|nocat=true}}"
* #T44 ("sv|KA", page "va", default split but no split boundaries available)
* expected result: OK (no link)
* actual result: "{{#invoke:mlawc|ek|sv|KA|pagenameoverridetestonly=va|nocat=true}}"
::* #T45 ("sv|KA|fra=%01", page "?va?", 2 boundaries available but both are blocked)
::* expected result: OK (raw text "?va?" and no link)
::* actual result: "{{#invoke:mlawc|ek|sv|KA|fra=%01|pagenameoverridetestonly=?va?|nocat=true}}"
: ---------------------------------------
* #T50 ("en|KA|fra=#0I", page "When in Rome, do as the Romans do.", auto split and fix case frag 0, suboptimal result due to word "Romans")
* expected result: OK (links to "when" and "Romans")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I|pagenameoverridetestonly=When in Rome, do as the Romans do.|nocat=true}}"
::* #T51 ("en|KA|fra=#0I #6:Roman", page "When in Rome, do as the Romans do.", auto split and fix case frag 0, good result, fixed word "Romans" index 6)
::* expected result: OK (links to "when" and "Roman")
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #6:Roman|pagenameoverridetestonly=When in Rome, do as the Romans do.|nocat=true}}"
* #T52 ("en|KA|fra=#0I #6:Roman", page "When in,, , Rome, do as the Romans do.", auto split and fix case frag 0, fixed word "Romans" index 6)
* expected result: silly OK (links to "when" and "Roman")
* actual result: "{{#invoke:mlawc|ek|en|KA|fra=#0I #6:Roman|pagenameoverridetestonly=When in,, , Rome, do as the Romans do.|nocat=true}}"
::* #T53 ("en|KA|fra=%01 #0I #4:Roman", page "When in,, , Rome, do as the Romans do.", auto split and fix case frag 0, fixed word "Romans" index 4 now)
::* expected result: silly OK (links to "when" and "Roman")
::* actual result: "{{#invoke:mlawc|ek|en|KA|fra=%01 #0I #4:Roman|pagenameoverridetestonly=When in,, , Rome, do as the Romans do.|nocat=true}}"
* #T54 ("eo|KA", page "!!!Mi jam,? estas fin-venkisto!!!", default split)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
::* #T55 ("eo|KA|fra=-", page "!!!Mi jam,? estas fin-venkisto!!!", no split)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=-|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
* #T56 ("eo|KA|fra=#3:fino", page "!!!Mi jam,? estas fin-venkisto!!!", default split, and link "fin-venkisto" to "fino")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|eo|KA|fra=#3:fino|pagenameoverridetestonly=!!!Mi jam,? estas fin-venkisto!!!|nocat=true}}"
: ---------------------------------------
* #T60 ("deu|SB", page "hole", invalid lng)
* expected result: #E04
* actual result: "{{#invoke:mlawc|ek|deu|SB|pagenameoverridetestonly=hole|nocat=true}}"
::* #T61 ("xxx|SB", page "hole", unknown lng)
::* expected result: #E05
::* actual result: "{{#invoke:mlawc|ek|xxx|SB|pagenameoverridetestonly=hole|nocat=true}}"
* #T62 ("en|SS", page "hole", invalid word class)
* expected result: #E06
* actual result: "{{#invoke:mlawc|ek|en|SS|pagenameoverridetestonly=hole|nocat=true}}"
::* #T63 ("en|SB??", page "move", invalid use of "??")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|en|SB??|pagenameoverridetestonly=move|nocat=true}}"
* #T64 ("en|??SB", page "move", invalid use of "??")
* expected result: #E06
* actual result: "{{#invoke:mlawc|ek|en|??SB|pagenameoverridetestonly=move|nocat=true}}"
::* #T65 ("en|????", page "move", invalid use of "??")
::* expected result: #E06
::* actual result: "{{#invoke:mlawc|ek|en|????|pagenameoverridetestonly=move|nocat=true}}"
: ---------------------------------------
* #T70 ("en|AVKU", page "ASAP")
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP|nocat=true}}"
::* #T71 ("en|SJ", page "when")
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when|nocat=true}}"
* #T72 ("sv|SB|dst=baza banan", page "banan", try to link to this one)
* expected result: OK
* actual result: "{{#invoke:mlawc|ek|sv|SB|dst=baza banan|pagenameoverridetestonly=banan|nocat=true}}"
::* #T73 ("sv|SB|dst=fleksia", page "banan", try to link to this one, no categories)
::* expected result: OK
::* actual result: "{{#invoke:mlawc|ek|sv|SB|dst=fleksia|pagenameoverridetestonly=banan|nocat=true}}"
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|sv|SB|dst=fleksia|pagenameoverridetestonly=banan|nocat=true}}|nw}}"
* #T74 ("sv|SB|dst=baza [ba]nan", page "banan", illegal brackets)
* expected result: #E08
* actual result: "{{#invoke:mlawc|ek|sv|SB|dst=baza [ba]nan|pagenameoverridetestonly=banan|nocat=true}}"
::* #T75 ("sv|SB|dst=banan'", page "banan", illegal apo)
::* expected result: #E08
::* actual result: "{{#invoke:mlawc|ek|sv|SB|dst=banan'|pagenameoverridetestonly=banan|nocat=true}}"
* #T76 ("en|AVKU", page "ASAP", see categories)
* expected result: OK
* actual result: "{ {#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP} }" (blocked)
* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|AVKU|pagenameoverridetestonly=ASAP}}|nw}}"
::* #T77 ("en|SJ", page "when", see categories)
::* expected result: OK
::* actual result: "{ {#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when} }" (blocked)
::* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|SJ|pagenameoverridetestonly=when}}|nw}}"
* #T78 ("en|AVKU|dst=test", page "ASAP", silly maximal test for anchors and categories)
* expected result: OK
* actual result: "{ {#invoke:mlawc|ek|en|AVKU|dst=test|pagenameoverridetestonly=ASAP} }" (blocked)
* actual result via debu: "{{debu|{{#invoke:mlawc|ek|en|AVKU|dst=test|pagenameoverridetestonly=ASAP}}|nw}}"
: ---------------------------------------
* note that tests #T73 #T76 #T77 and #T78 depend on "debu"
* note that tests #T73 #T76 #T77 and #T78 cannot be reasonably executed on the docs subpage without help of "pate" or "debu"
: ---------------------------------------
]===]
local lawc = {}
---- CONSTANTS ----
-- uncommentable EO vs ID constant strings (core site-related features)
local constrpriv = "eo" -- EO (privileged site language)
-- local constrpriv = "id" -- ID (privileged site language)
local constrplki = "Modulo:mpiktbllki" -- EO
-- local constrplki = "Modul:mpiktbllki" -- ID
local constrkatp = "Kategorio:" -- EO
-- local constrkatp = "Kategori:" -- ID
-- constant table (ban list)
-- add only obviously invalid access codes (2-letter or 3-letter)
-- length of the list is NOT stored anywhere, the processing stops
-- when type "nil" is encountered
-- "en.wiktionary.org/wiki/Wiktionary:Language_treatment" excluded languages
-- "en.wikipedia.org/wiki/Spurious_languages"
-- "iso639-3.sil.org/code/art" only valid in ISO 639-2
-- "iso639-3.sil.org/code/zxx" "No linguistic content"
local contabisbanned = {'dc', 'll', 'art','deu','eng','epo','fra','lat','por','rus','spa','swe','tup','zxx'} -- 1...14
-- constant table (surrogate transcoding table, only needed for EO)
local contabtransiltable = {}
contabtransiltable[ 67] = 0xC488 -- CX
contabtransiltable[ 99] = 0xC489 -- cx
contabtransiltable[ 71] = 0xC49C -- GX
contabtransiltable[103] = 0xC49D -- gx
contabtransiltable[ 74] = 0xC4B4 -- JX
contabtransiltable[106] = 0xC4B5 -- jx
contabtransiltable[ 83] = 0xC59C -- SX
contabtransiltable[115] = 0xC59D -- sx
contabtransiltable[ 85] = 0xC5AC -- UX breve
contabtransiltable[117] = 0xC5AD -- ux breve
-- constant strings (tooltip)
local constrtoolt = 'style="border-bottom:1px dotted; cursor:help;"' -- lousy tooltip
local constrisobg = '(⁨ ' -- isolator for "strange" (RTL, submicroscopic) text begin
local constrisoen = ' ⁨)' -- isolator for "strange" (RTL, submicroscopic) text end
-- constant strings (anchor HTML code and prefix)
local constrankkom = '<span id="Qsekt' -- do NOT add the dash "-" here
local constaankend = '"></span>'
-- constant strings (error circumfixes)
local constrkros = ' [] ' -- lagom -> huge " [] "
local constrelabg = '<span class="error"><b>' -- lagom whining begin
local constrelaen = '</b></span>' -- lagom whining end
-- uncommentable EO vs ID (caller name for error messages)
local constrkoll = 'sxablono "livs"' -- EO augmented name of the caller (semi-hardcoded, we do NOT peek it)
-- local constrkoll = 'templat "bakk"' -- ID augmented name of the caller (semi-hardcoded, we do NOT peek it)
-- uncommentable EO vs ID constant table (error messages)
-- #E02...#E08, note that #E00 and #E01 are NOT supposed to be included here
local contaberaroj = {}
contaberaroj[2] = 'Erara uzo de \\@, legu gxian dokumentajxon' -- EO #E02
-- contaberaroj[2] = 'Penggunaan salah \\@, bacalah dokumentasinya' -- ID #E02
contaberaroj[3] = 'Eraro en subsxablonoj uzataj far \\@' -- EO #E03
-- contaberaroj[3] = 'Kesalahan dalam subtemplat digunakan oleh \\@' -- ID #E03
contaberaroj[4] = 'Evidente nevalida lingvokodo en \\@' -- EO #E04
-- contaberaroj[4] = 'Kode bahasa jelas-jelas salah dalam \\@' -- ID #E04
contaberaroj[5] = 'Nekonata lingvokodo en \\@' -- EO #E05
-- contaberaroj[5] = 'Kode bahasa tidak dikenal dalam \\@' -- ID #E05
contaberaroj[6] = 'Erara uzo de \\@ pro vortospeco' -- EO #E06
-- contaberaroj[6] = 'Penggunaan salah \\@ oleh karena kelas kata' -- ID #E06
contaberaroj[7] = 'Erara uzo de \\@ pro "fra=" apartigo' -- EO #E07
-- contaberaroj[7] = 'Penggunaan salah \\@ oleh karena "fra=" pemotongan' -- ID #E07
contaberaroj[8] = 'Erara uzo de \\@ pro "dst=" distingo' -- EO #E08
-- contaberaroj[8] = 'Penggunaan salah \\@ oleh karena "dst=" pembedaan' -- ID #E08
-- uncommentable EO vs ID constant strings (misc to be showed on the screen)
local constrneli = "nekonata lingvo" -- EO placeholder
-- local constrneli = "bahasa yang tidak dikenal" -- ID placeholder
local constrnekk = "nekonata vortospeco" -- EO placeholder
-- local constrnekk = "kelas kata yang tidak dikenal" -- ID placeholder
local constrbaha = "Lingvo: " -- EO tooltip
-- local constrbaha = "Bahasa: " -- ID tooltip
local constrkeka = "Vortospeco: " -- EO tooltip
-- local constrkeka = "Kelas kata: " -- ID tooltip
-- uncommentable EO vs ID constant table (25 word classes)
local contabwc = {}
contabwc["SB"] = "substantivo (O-vorto)" -- EO |
-- contabwc["SB"] = "nomina (kata benda)" -- ID |
contabwc["VE"] = "verbo (I-vorto)" -- EO | main big (3)
-- contabwc["VE"] = "verba (kata kerja)" -- ID |
contabwc["AJ"] = "adjektivo (A-vorto)" -- EO |
-- contabwc["AJ"] = "adjektiva (kata sifat)" -- ID |
contabwc["PN"] = "pronomo" -- EO %
-- contabwc["PN"] = "pronomina (kata pengganti)" -- ID %
contabwc["NV"] = "numeralo (nombrovorto)" -- EO %
-- contabwc["NV"] = "numeralia (kata bilangan)" -- ID %
contabwc["AV"] = "adverbo (E-vorto)" -- EO %
-- contabwc["AV"] = "adverbia (kata keterangan)" -- ID %
contabwc["PV"] = "verbpartiklo" -- EO %
-- contabwc["PV"] = "partikel verba" -- ID %
contabwc["QV"] = "demandvorto" -- EO %
-- contabwc["QV"] = "kata tanya" -- ID %
contabwc["KJ"] = "konjunkcio" -- EO %
-- contabwc["KJ"] = "konjungsi" -- ID %
contabwc["SJ"] = "subjunkcio (subfrazenkondukilo)" -- EO %
-- contabwc["SJ"] = "subjungsi (pengaju klausa terikat)" -- ID % further smaller (12)
contabwc["PP"] = "prepozicio (antauxlokigita rolvorteto)" -- EO %
-- contabwc["PP"] = "preposisi (kata depan)" -- ID %
contabwc["PO"] = "postpozicio" -- EO %
-- contabwc["PO"] = "postposisi (kata belakang)" -- ID %
contabwc["PC"] = "cirkumpozicio" -- EO %
-- contabwc["PC"] = "sirkumposisi" -- ID %
contabwc["AR"] = "artikolo" -- EO %
-- contabwc["AR"] = "artikel (kata sandang)" -- ID %
contabwc["IN"] = "interjekcio" -- EO %
-- contabwc["IN"] = "interjeksi" -- ID %
contabwc["PF"] = "prefikso" -- EO #
-- contabwc["PF"] = "prefiks (awalan)" -- ID #
contabwc["UF"] = "sufikso (postfikso, finajxo)" -- EO #
-- contabwc["UF"] = "sufiks (akhiran)" -- ID # nonstandalone (5)
contabwc["KF"] = "cirkumfikso (konfikso)" -- EO #
-- contabwc["KF"] = "sirkumfiks (konfiks)" -- ID #
contabwc["IF"] = "infikso" -- EO #
-- contabwc["IF"] = "infiks (sisipan)" -- ID #
contabwc["NR"] = "nememstara radiko" -- EO #
-- contabwc["NR"] = "akar kata terikat" -- ID #
contabwc["KA"] = "frazo" -- EO $
-- contabwc["KA"] = "kalimat" -- ID $
contabwc["KK"] = "signo" -- EO $ misc (2)
-- contabwc["KK"] = "karakter" -- ID $
contabwc["KU"] = "mallongigo (kurtigo)" -- EO &
-- contabwc["KU"] = "singkatan (abreviasi)" -- ID &
contabwc["GR"] = "vortgrupo" -- EO & additional (3)
-- contabwc["GR"] = "kumpulan kata" -- ID &
contabwc["TV"] = "tabelvorto" -- EO &
-- contabwc["TV"] = "kata tabel" -- ID &
-- constant table (3 integers for preliminary parameter check)
local contabparam = {}
contabparam[0] = 2 -- minimal number of anon parameters
contabparam[1] = 2 -- maximal number of anon parameters
contabparam[2] = 160 -- maximal length of single para (min is hardcoded ONE)
-- constants to control behaviour from source AKA semi-hardcoded parameters
local constrmainctl = "12" -- image (0 or 1) lemma (0 none 1 raw 2 maybe split)
local conboomiddig = false -- controls lng code checking, assign to "true" to allow middle digit "s7a"
------------------------------------------------------------------------
---- SPECIAL STUFF OUTSIDE MAIN FUNCTION ----
------------------------------------------------------------------------
---- VAR:S ----
local piktbllki = 0 -- type "function"
local strvistrc = "" -- for main & sub:s, debug report request by "detrc="
local booguard = false -- only for the guard test, pass to other var ASAP
---- GUARD AGAINST INTERNAL ERROR & IMPORTS ----
if ((type(constrpriv)~="string") or (type(constrplki)~="string") or (type(constrkatp)~="string")) then
booguard = true
else
piktbllki = require(constrplki) -- can crash here despite guarding ??
if (type(piktbllki)=="nil") then -- test the imported module, NOT its name
booguard = true
end--if
end--if
------------------------------------------------------------------------
---- ORDINARY LOCAL MATH FUNCTIONS ----
------------------------------------------------------------------------
local function mathdiv (xdividend, xdivisor)
local resultdiv = 0 -- Lua lacks an integer DIV operator :-(
resultdiv = math.floor (xdividend / xdivisor)
return resultdiv
end--function mathdiv
local function mathmod (xdividendo, xdivisoro)
local resultmod = 0 -- MOD operator is "%" (a bitwise AND operator is lacking too)
resultmod = xdividendo % xdivisoro
return resultmod
end--function mathmod
-- test whether bit "xbitpos" (ZERO-based, ZERO is the LSB) is set in a
-- non-negative integer, return bool (needed by "lfcasegene")
local function mathbittest (xvalue, xbitpos)
local boobitset = false
boobitset = ((math.floor (xvalue / (2^xbitpos)) % 2)==1)
return boobitset
end--function mathbittest
------------------------------------------------------------------------
---- ORDINARY LOCAL STRING FUNCTIONS ----
------------------------------------------------------------------------
-- test whether char is an ASCII digit "0"..."9", return bool
local function lftestnum (numkaad)
local boodigit = false
boodigit = ((numkaad>=48) and (numkaad<=57))
return boodigit
end--function lftestnum
------------------------------------------------------------------------
-- test whether char is an ASCII uppercase letter, return bool
local function lftestuc (numkode)
local booupperc = false
booupperc = ((numkode>=65) and (numkode<=90))
return booupperc
end--function lftestuc
------------------------------------------------------------------------
-- test whether char is an ASCII lowercase letter, return bool
local function lftestlc (numcode)
local boolowerc = false
boolowerc = ((numcode>=97) and (numcode<=122))
return boolowerc
end--function lftestlc
------------------------------------------------------------------------
-- test whether char is a punctuation sign, return bool
-- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
-- dash "-" and apo "'" do NOT count as punctuation
-- here we do NOT include SPACE in the list
local function lftestpuncture (numcorde)
local boopunk = false
boopunk = ((numcorde==33) or (numcorde==44) or (numcorde==46) or (numcorde==59) or (numcorde==63))
return boopunk
end--function lftestpuncture
------------------------------------------------------------------------
-- Test whether incoming string consists of given number of
-- ASCII uppercase letters, return bool.
-- return "true" on success
-- This sub depends on "STRING FUNCTIONS"\"lftestuc".
local function lfmultestuc (strinputi, numlenc)
local booallupper = false
local numtestindexx = 1 -- ONE-based
local numtestedchar = 0
booallupper = (string.len(strinputi)==numlenc)
if (booallupper) then
while (true) do
if (numtestindexx>numlenc) then
break
end--if
numtestedchar = string.byte (strinputi,numtestindexx,numtestindexx)
booallupper = booallupper and (lftestuc(numtestedchar))
numtestindexx = numtestindexx + 1
end--while
end--if
return booallupper
end--function lfmultestuc
------------------------------------------------------------------------
-- Local function LFBANMULTI
-- Ban single char:s by multiplicity
-- Incoming control string "strkoneven" with pairs of char:s, for
-- example "'2&0" will tolerate 2 consecutive apo:s but
-- not 3, and completely ban the and-sign "&"
-- Input : - "strkoneven" (even and 2...24, wrong length gives
-- "true", tolerated multiplicity "0"..."9")
-- - "strsample" (0...1'024, empty gives "false",
-- too long gives "true")
-- Output : - "booisevil" (true if evil)
-- This sub depends on "MATH FUNCTIONS"\"mathmod"
-- and "STRING FUNCTIONS"\"lftestnum".
local function lfbanmulti (strkoneven, strsample)
local booisevil = false
local numkonlen = 0 -- length of control string
local numsamlen = 0 -- length of sample string
local numinndex = 0 -- ZERO-based outer index
local numinneri = 0 -- ZERO-based inner index
local numchear = 0
local numnexxt = 0
local nummultiq = 1 -- counted multiplicity
local numcrapp = 0 -- from "strkoneven" char to test
local numvrapp = 0 -- from "strkoneven" multiplicity limit
numsamlen = string.len (strsample)
if (numsamlen~=0) then
numkonlen = string.len (strkoneven)
booisevil = (numkonlen<2) or (numkonlen>24) or (mathmod(numkonlen,2)~=0) or (numsamlen>1024)
if (booisevil==false) then
while (true) do -- outer loop
if (numinndex==numsamlen) then
break
end--if
numchear = string.byte (strsample,(numinndex+1),(numinndex+1))
if (numchear==0) then
booisevil = true -- ZERO is unconditionally prohibited
break
end--if
numinndex = numinndex + 1
numnexxt = 0
if (numinndex~=numsamlen) then
numnexxt = string.byte (strsample,(numinndex+1),(numinndex+1))
end--if
if (numchear==numnexxt) then
nummultiq = nummultiq + 1
end--if
if ((numchear~=numnexxt) or (numinndex==numsamlen)) then
numinneri = 0
while (true) do -- inner loop
if (numinneri==numkonlen) then
break
end--if
numcrapp = string.byte (strkoneven,(numinneri+1),(numinneri+1))
numvrapp = string.byte (strkoneven,(numinneri+2),(numinneri+2))
if (lftestnum(numvrapp)==false) then
booisevil = true -- crime in control string detected
break
end--if
if ((numchear==numcrapp) and (nummultiq>(numvrapp-48))) then
booisevil = true -- multiplicity crime in sample string detected
break
end--if
numinneri = numinneri + 2 -- ZERO-based inner index and STEP 2
end--while -- inner loop
if (booisevil) then
break
end--if
nummultiq = 1
end--if ((numchear~=numnexxt) or (numinndex==numsamlen)) then
end--while -- outer loop
end--if (booisevil==false) then
end--if (numsamlen~=0) then
return booisevil
end--function lfbanmulti
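--[==[
The multiplicity ban described above can be sketched as follows. This is a
minimal standalone illustration, NOT the code used by this module: the helper
name "sketchbanmulti" and the table-shaped limits are assumptions made for
the example, and the sketch skips the length checks and the ban on ZERO
octets that "lfbanmulti" performs.

```lua
-- Hypothetical helper: limits are given as { [char] = longest tolerated run }
local function sketchbanmulti (tablimits, strsample)
  local numrun = 0   -- length of the current run of equal chars
  local strprev = "" -- previous char
  for numpos = 1, #strsample do
    local strchar = string.sub (strsample, numpos, numpos)
    if (strchar == strprev) then
      numrun = numrun + 1
    else
      numrun = 1
      strprev = strchar
    end
    local varlimit = tablimits[strchar] -- "nil" means the char is not restricted
    if ((varlimit ~= nil) and (numrun > varlimit)) then
      return true -- run is longer than tolerated
    end
  end
  return false
end

-- corresponds to the control string "'2&0" from the description above
local tablimits = { ["'"] = 2, ["&"] = 0 }
print (sketchbanmulti (tablimits, "it''s"))  -- false (2 apo:s are tolerated)
print (sketchbanmulti (tablimits, "it'''s")) -- true (3 apo:s are not)
print (sketchbanmulti (tablimits, "a&b"))    -- true ("&" is banned completely)
```

A table of limits is easier to read than a packed control string, but the
packed string used by "lfbanmulti" needs no table constructor at the call site.
]==]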
------------------------------------------------------------------------
---- ORDINARY LOCAL CONVERSION FUNCTIONS ----
------------------------------------------------------------------------
-- Local function LFDEC1DIGCL
-- Convert 1 decimal ASCII digit to UINT8 with clamp.
local function lfdec1digcl (num1dugyt, num1clim)
num1dugyt = num1dugyt - 48 -- may become invalid ie negative
if ((num1dugyt<0) or (num1dugyt>num1clim)) then
num1dugyt = 0 -- valid ZERO output on invalid input digit
end--if
return num1dugyt
end--function lfdec1digcl
------------------------------------------------------------------------
-- Local function LFONEHEXTOINT
-- Convert 1 ASCII code of a hex digit to a UINT4 ie 0...15 (255 invalid)
-- Only uppercase accepted
local function lfonehextoint (numdigit)
local numresult = 255
if ((numdigit>47) and (numdigit<58)) then
numresult = numdigit-48
end--if
if ((numdigit>64) and (numdigit<71)) then
numresult = numdigit-55
end--if
return numresult
end--function lfonehextoint
------------------------------------------------------------------------
---- ORDINARY LOCAL UTF8 FUNCTIONS ----
------------------------------------------------------------------------
-- Local function LFUTF8LENGTH
-- Measure length of a single UTF8 char, return ZERO if invalid.
-- Does NOT thoroughly check the validity, looks at 1 octet only
-- Input : - numbgoctet (beginning octet of a UTF8 char)
-- Output : - numlen1234x (1...4 or ZERO if invalid)
local function lfutf8length (numbgoctet)
local numlen1234x = 0
if (numbgoctet<128) then
numlen1234x = 1 -- $00...$7F -- ANSI/ASCII
end--if
if ((numbgoctet>=194) and (numbgoctet<=223)) then
numlen1234x = 2 -- $C2 to $DF
end--if
if ((numbgoctet>=224) and (numbgoctet<=239)) then
numlen1234x = 3 -- $E0 to $EF
end--if
if ((numbgoctet>=240) and (numbgoctet<=244)) then
numlen1234x = 4 -- $F0 to $F4
end--if
return numlen1234x
end--function lfutf8length
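--[==[
How a length taken from the beginning octet is typically used to walk through
a string can be sketched as below. The helper names "sketchutf8len" and
"sketchcountchars" are hypothetical and exist only in this example. The first
one repeats the ranges of "lfutf8length", the second one counts chars and
treats a faulty octet as one char (the same recovery that "lfxcaseult" uses).

```lua
-- Hypothetical helper: length of a UTF8 char from its beginning octet (0 if invalid)
local function sketchutf8len (numoctet)
  if (numoctet < 128) then return 1 end
  if ((numoctet >= 194) and (numoctet <= 223)) then return 2 end
  if ((numoctet >= 224) and (numoctet <= 239)) then return 3 end
  if ((numoctet >= 240) and (numoctet <= 244)) then return 4 end
  return 0 -- continuation octet or otherwise invalid beginning octet
end

-- Hypothetical helper: count chars, a faulty octet counts as one char
local function sketchcountchars (strtext)
  local numpos = 1 -- octet position ONE-based
  local numchars = 0
  while (numpos <= #strtext) do
    local numstep = sketchutf8len (string.byte (strtext, numpos))
    if (numstep == 0) then
      numstep = 1 -- skip one faulty octet
    end
    numpos = numpos + numstep
    numchars = numchars + 1
  end
  return numchars
end

print (sketchcountchars ("ka\196\137o")) -- 4 (5 octets, "c" with circumflex takes 2)
```
]==]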
------------------------------------------------------------------------
-- Local function LFCASEGENE
-- Adjust case of a single letter (generous), limited unicode support
-- with some common UTF8 ranges.
-- Input : * strucinut : single unicode letter (1 or 2 octet:s)
-- * booucas : for desired uppercase "true" and for
-- lowercase "false"
-- Output : * strucinut : (same var, unchanged if input is
-- empty or unknown or invalid)
-- * in ASCII lowercase is $20 above uppercase, b5 reveals
-- the case (1 is lower)
-- * the same is valid in $C3-block
-- * this is NOT valid in $C4-$C5-block, lowercase is usually 1 above
-- uppercase and nothing reveals the case reliably
-- * case delta can be 1 or $20 or $50 or other
-- * lowercase is usually above uppercase but not always
-- * case pair distance can span $40-boundary or even $0100-boundary
-- $C2-block $0080 $C2,$80 ... $00BF $C2,$BF no letters (OTOH NBSP mm)
-- $C3-block $00C0 $C3,$80 ... $00FF $C3,$BF (SV mm) delta $20 UC-LC-UC-LC
-- upper $00C0 $C3,$80 ... $00DF $C3,$9F
-- lower $00E0 $C3,$A0 ... $00FF $C3,$BF
-- AA AE EE NN OE UE mm
-- $D7 $DF $F7 excluded (not letters)
-- $FF excluded (here LC, UC is $0178)
-- $C4-$C5-block $0100 $C4,$80 ... $017F $C5,$BF (EO mm)
-- delta 1 and UC even but messy with many exceptions
-- EO $0108 ... $016D case delta 1
-- for example SX upper $015C $C5,$9C - lower $015D $C5,$9D
-- $0138 $0149 $017F excluded (not letters)
-- $0178 excluded (here UC, LC is $FF)
-- $0100 ... $0137 UC even
-- $0139 ... $0148 reversed (UC odd) note that case delta is NOT reversed
-- $014A ... $0177 UC even again
-- $0179 ... $017E reversed (UC odd) note that case delta is NOT reversed
-- $CC-$CF-block $0300 $CC,$80 ... $03FF $CF,$BF (EL mm) delta $20
-- EL $0370 ... $03FF (officially)
-- strict EL base range $0391 ... $03C9 case delta $20
-- $0391 $CE,$91 ... $03AB $CE,$AB upper
-- $03B1 $CE,$B1 ... $03CB $CF,$8B lower
-- for example "omega" upper $03A9 $CE,$A9 - lower $03C9 $CF,$89
-- $D0-$D3-block $0400 $D0,$80 ... $04FF $D3,$BF (RU mm) delta $20 $50
-- strict RU base range $0410 ... $044F case delta $20 but 1 extra char !!!
-- $0410 $D0,$90 ... $042F $D0,$AF upper
-- $0430 $D0,$B0 ... $044F $D1,$8F lower
-- for example "CCCP-gamma" upper $0413 $D0,$93 - lower $0433 $D0,$B3
-- extra base char and exception is special "E" with horizontal doubledot
-- case delta $50 (upper $0401 $D0,$81 - lower $0451 $D1,$91)
-- same applies for ranges $0400 $D0,$80 ... $040F $D0,$8F upper
-- and $0450 $D1,$90 ... $045F $D1,$9F lower
-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length".
local function lfcasegene (strucinut, booucas)
local numlaengden = 0 -- length from "string.len"
local numchaer = 0 -- UINT8 beginning char
local numchaes = 0 -- UINT8 later char (BIG ENDIAN, lower value here)
local numcharel = 0 -- UINT8 code relative to beginning of block $00...$FF
local numdelabs = 0 -- UINT8 absolute positive delta
local numdelta = 0 -- SINT16 signed, can be negative
local numdelcarry = 0 -- SINT8 signed, can be negative
local boowantlower = false
local booisuppr = false
local booislowr = false
local boopending = false
local booc3blok = false -- $C3 only $00C0...$00FF SV mm delta 32
local booc4c5bl = false -- $C4 $C5 $0100...$017F EO mm delta 1
local boocccfbl = false -- $CC $CF $0300...$03FF EL mm delta 32
local bood0d3bl = false -- $D0 $D3 $0400...$04FF RU mm delta 32 80
while (true) do -- fake loop
numlaengden = string.len (strucinut)
if ((numlaengden==0) or (numlaengden>2)) then
break -- to join mark
end--if
numchaer = string.byte (strucinut,1,1)
if ((lfutf8length(numchaer))~=numlaengden) then
break -- to join mark -- mismatch with length from sub "lfutf8length"
end--if
boowantlower = (not booucas)
if (numlaengden==1) then
booisuppr = lftestuc(numchaer)
booislowr = lftestlc(numchaer)
if (booisuppr and boowantlower) then
numdelta = 32 -- ASCII UPPER->lower
end--if
if (booislowr and booucas) then
numdelta = -32 -- ASCII lower->UPPER
end--if
break -- to join mark
end--if
numchaes = string.byte (strucinut,2,2)
booc3blok = (numchaer==195) -- case delta is 32
booc4c5bl = ((numchaer==196) or (numchaer==197)) -- case delta is 1
boocccfbl = ((numchaer>=204) and (numchaer<=207)) -- case delta is 32
bood0d3bl = ((numchaer>=208) and (numchaer<=211)) -- case delta is 32 80
if (booc3blok) then
boopending = true
numcharel = numchaes + 64 -- simplified calculation here (begins at $C0)
if ((numcharel==215) or (numcharel==223) or (numcharel==247)) then
boopending = false -- not a letter, we are done
end--if
if (numcharel==255) then
boopending = false -- special LC silly "Y" with horizontal doubledot
if (booucas) then
numdelta = 121 -- lower->UPPER (distant and reversed)
end--if
end--if
if (boopending) then
booislowr = (mathbittest(numcharel,5)) -- mostly regular block
booisuppr = not booislowr
if (booisuppr and boowantlower) then
numdelta = 32 -- UPPER->lower
end--if
if (booislowr and booucas) then
numdelta = -32 -- lower->UPPER
end--if
end--if (boopending) then
break -- to join mark
end--if
if (booc4c5bl) then
boopending = true
numcharel = (numchaer-196)*64 + (numchaes-128) -- begins at $C4
if ((numcharel==56) or (numcharel==73) or (numcharel==127)) then
boopending = false -- not a letter, we are done
end--if
if (numcharel==120) then
boopending = false -- special UC silly "Y" with horizontal doubledot
if (boowantlower) then
numdelta = -121 -- UPPER->lower (distant and reversed)
end--if
end--if
if (boopending) then
if (((numcharel>=57) and (numcharel<=73)) or (numcharel>=121)) then
booislowr = ((mathmod(numcharel,2))==0) -- UC odd (reversed)
else
booislowr = ((mathmod(numcharel,2))==1) -- UC even (ordinary)
end--if
booisuppr = not booislowr
if (booisuppr and boowantlower) then
numdelta = 1 -- UPPER->lower
end--if
if (booislowr and booucas) then
numdelta = -1 -- lower->UPPER
end--if
end--if (boopending) then
break -- to join mark
end--if
if (boocccfbl) then
numcharel = (numchaer-204)*64 + (numchaes-128) -- begins at $CC
booisuppr = ((numcharel>=145) and (numcharel<=171))
booislowr = ((numcharel>=177) and (numcharel<=203))
if (booisuppr and boowantlower) then
numdelta = 32 -- UPPER->lower
end--if
if (booislowr and booucas) then
numdelta = -32 -- lower->UPPER
end--if
break -- to join mark
end--if
if (bood0d3bl) then
numcharel = (numchaer-208)*64 + (numchaes-128) -- begins at $D0
booisuppr = (numcharel<=47) -- delta $20 $50
booislowr = ((numcharel>=48) and (numcharel<=95)) -- delta $20 $50
if (booisuppr or booislowr) then
numdelabs = 32
if ((numcharel<=15) or (numcharel>=80)) then
numdelabs = 80
end--if
end--if
if (booisuppr and boowantlower) then
numdelta = numdelabs -- UPPER->lower
end--if
if (booislowr and booucas) then
numdelta = -numdelabs -- lower->UPPER
end--if
break -- to join mark
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
if ((numlaengden==1) and (numdelta~=0)) then
strucinut = string.char (numchaer + numdelta) -- no risk of carry here
end--if
if ((numlaengden==2) and (numdelta~=0)) then
numdelcarry = 0
while ((numchaes+numdelta)>=192) do
numdelta = numdelta - 64
numdelcarry = numdelcarry + 1 -- add BIG ENDIAN 6 bits with carry
end--while
while ((numchaes+numdelta)<=127) do
numdelta = numdelta + 64
numdelcarry = numdelcarry - 1 -- negat add BIG ENDIAN 6 bits with carry
end--while
strucinut = string.char (numchaer + numdelcarry) .. string.char (numchaes + numdelta)
end--if
return strucinut -- same var for input and output !!!
end--function lfcasegene
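--[==[
The "add 6 bits with carry" step at the end of "lfcasegene" is equivalent to
decoding the 2 octets to a codepoint, adding the signed delta and encoding
again. A minimal sketch of that equivalence follows. The helper name
"sketchshift2" is hypothetical, and the sketch assumes that both the input and
the result stay inside the 2-octet range $0080...$07FF.

```lua
-- Hypothetical helper: shift a 2-octet UTF8 char by a signed codepoint delta
local function sketchshift2 (strchar, numdelta)
  local numlead, numtrail = string.byte (strchar, 1, 2)
  local numcode = (numlead - 192) * 64 + (numtrail - 128) -- decode 5+6 bits
  numcode = numcode + numdelta
  -- encode again: upper 5 bits to the beginning octet, lower 6 bits to the later one
  return string.char (192 + math.floor (numcode / 64), 128 + (numcode % 64))
end

local strupper = sketchshift2 ("\197\157", -1) -- $015D -> $015C (EO "SX", delta 1)
local strfar = sketchshift2 ("\195\191", 121)  -- $00FF -> $0178 (distant pair)
print (string.byte (strupper, 1, 2)) -- 197 156
print (string.byte (strfar, 1, 2))   -- 197 184
```

The second call crosses two $40-boundaries, which is the case that the two
carry loops in "lfcasegene" handle octet by octet.
]==]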
------------------------------------------------------------------------
-- Local function LFXCASEULT
-- Adjust letter case of beginning letter or all letters in a word or group of
-- words to upper or lower, limited unicode support (generous LFCASEGENE).
-- See LFFIXCASE for ASCII-only version.
-- Input : * strenigo : word or group of words (may be empty)
-- * booupcas : "true" for uppercase and "false" for lowercase
-- * boodoall : "true" to adjust all letters, "false" only beginning
-- This sub depends on "MATH FUNCTIONS"\"mathmod" and
-- "MATH FUNCTIONS"\"mathbittest" and "STRING FUNCTIONS"\"lftestuc" and
-- "STRING FUNCTIONS"\"lftestlc" and "UTF8 FUNCTIONS"\"lfutf8length" and
-- "UTF8 FUNCTIONS"\"lfcasegene" (generous LFCASEGENE).
local function lfxcaseult (strenigo, booupcas, boodoall)
local numlein = 0
local numposi = 1 -- octet position ONE-based
local numcut = 0 -- length of an UTF8 char
local bootryadj = false -- try to adjust single char
local strte7mp = ""
local strelygo = ""
numlein = string.len (strenigo)
while (true) do
if (numposi>numlein) then
break -- done
end--if
bootryadj = (boodoall or (numposi==1))
numcut = lfutf8length(string.byte(strenigo,numposi,numposi))
if ((numcut==0) or ((numposi+numcut-1)>numlein)) then
numcut = 1 -- skip ie copy one faulty octet
bootryadj = false
end--if
strte7mp = string.sub (strenigo,numposi,(numposi+numcut-1)) -- 1...4 oct
if (bootryadj) then
strte7mp = lfcasegene(strte7mp,booupcas) -- (generous LFCASEGENE)
end--if
strelygo = strelygo .. strte7mp -- this can be slow
numposi = numposi + numcut
end--while
return strelygo
end--function lfxcaseult
------------------------------------------------------------------------
---- ORDINARY LOCAL HIGH LEVEL FUNCTIONS ----
------------------------------------------------------------------------
-- Local function LFSTRIPPARENT
-- Strip part of string hidden in parentheses.
-- copy from "strwithparent" to "strystripped" until string " (" found
local function lfstripparent (strwithparent)
local strystripped = ''
local numloongwy = 0
local numiindexx = 0 -- ZERO-based
local numocct = 0
local numoddt = 0
numloongwy = string.len(strwithparent)
while (true) do
if (numiindexx==numloongwy) then
break -- copied whole string
end--if
numocct = string.byte(strwithparent,(numiindexx+1),(numiindexx+1))
numoddt = 0
if ((numiindexx+1)<numloongwy) then
numoddt = string.byte(strwithparent,(numiindexx+2),(numiindexx+2))
end--if
if (numoddt==40) then
break -- stop copying at " (" (2 char:s but only 1 checked)
end--if
strystripped = strystripped .. string.char(numocct)
numiindexx = numiindexx + 1
end--while
return strystripped
end--function lfstripparent
------------------------------------------------------------------------
-- Local function LFSPLIT
-- do NOT call this if no auto split is needed due
-- to wiki param "fra=-" or semi-hardcoded control param
-- Note that the split can sort of fail and return same string, most notably
-- if no split boundaries exist, or some do exist but all are blocked.
-- Counting of the boundaries is tricky. We DO count the suppressed ones but
-- do NOT count multiple consecutive non-letters more than once. Thus the
-- boundaries are between words only and at begin and end, there CANNOT
-- be empty content between 2 boundaries. We usually have 2 faked empty
-- boundaries at begin and end, but they can also be real and count then.
-- For example "AND YES, we !,definit-ely,! can." contains 5 words (that can
-- become 5 output fragments numbered 0...4) and 5 input boundaries
-- (numbered 0...4). In the text "?va?" there are 2 boundaries at begin
-- and end.
-- "strlongtext" : input text
-- "tabblokr" : index 0...15 holes permitted from "%"
-- "tablinker" : index 0...15 holes permitted from "#"
local function lfsplit (strlongtext, tabblokr, tablinker)
local varrisktabl = 0 -- can be type "nil"
local strfragment = ''
local stromong = '' -- final result
local numloonginp = 0 -- length of input
local numinxed = 0 -- ZERO-based index of input char:s
local numboundrinp = 0 -- counter of detected boundaries include suppressed
local numoutfrag = 0 -- counter of produced fragments
local numotcot = 0
local numotcuu = 0 -- control code from "tablinker" (ZERO is "nil" ie none)
local boohavechar = false
local booqboueof = false -- combo status: boundary char or end of string
local booprevqbe = false -- previous combo status
local boosuppress = false -- suppress split but still do count the boundary
numloonginp = string.len(strlongtext)
while (true) do
if (numinxed==numloonginp) then
boohavechar = false
booqboueof = true -- copied whole string and end of fragment
boosuppress = false -- last chance, we must output accumulated fragment
else
boohavechar = true -- can be part of word or boundary !!!
numotcot = string.byte (strlongtext,(numinxed+1),(numinxed+1))
numinxed = numinxed + 1 -- ZERO-based
booqboueof = ((numotcot==32) or lftestpuncture(numotcot))
boosuppress = (tabblokr[numboundrinp]=="1")
end--if
if (booprevqbe and (booqboueof==false)) then
numboundrinp = numboundrinp + 1 -- count even suppressed boundaries
end--if
booprevqbe = booqboueof -- assign previous status for next round
if ((booqboueof) and (boosuppress==false) and (strfragment~='')) then
if ((stromong~='') or boohavechar) then -- avoid selflink to page
varrisktabl = tablinker[numoutfrag]
numotcuu = 0
if (type(varrisktabl)=="string") then
numotcuu = string.byte (varrisktabl,1,1)
end--if
if (numotcuu==73) then -- "I" -- link target with lowercased beginning letter
strfragment = lfxcaseult (strfragment,false,false) .. '|' .. strfragment
end--if
if (numotcuu==65) then -- "A" -- link target with uppercased beginning letter
strfragment = lfxcaseult (strfragment,true,false) .. '|' .. strfragment
end--if
if (numotcuu==58) then -- ":"
strfragment = string.sub (varrisktabl,2,string.len(varrisktabl)) .. '|' .. strfragment
end--if
if (numotcuu~=78) then -- "N" -- suppress linking
strfragment = '[[' .. strfragment .. ']]'
end--if
end--if ((stromong~='') or boohavechar) then
stromong = stromong .. strfragment
strfragment = ''
numoutfrag = numoutfrag + 1 -- count produced fragments
end--if ((booqboueof) and (boosuppress==false) and (strfragment~='')) then
if (boohavechar) then
if (booqboueof and (boosuppress==false)) then
stromong = stromong .. string.char(numotcot)
else
strfragment = strfragment .. string.char(numotcot)
end--if
else
break -- done all
end--if
end--while
return stromong
end--function lfsplit
------------------------------------------------------------------------
-- Local function LFBREWERROR
-- #E02...#E08, note that #E00 and #E01 are NOT supposed to be included here
-- We need const strings "constrkros", "constrelabg",
-- "constrelaen" and const table "contaberaroj"
local function lfbrewerror (numerrorcode)
local stritsucks = '#E'
if ((numerrorcode>=2) and (numerrorcode<=8)) then
stritsucks = stritsucks .. '0' .. string.char(numerrorcode+48) .. ' ' .. contaberaroj[numerrorcode]
else
stritsucks = stritsucks .. '??'
end--if
stritsucks = constrkros .. constrelabg .. stritsucks .. constrelaen .. constrkros
return stritsucks
end--function lfbrewerror
------------------------------------------------------------------------
-- Local function LFCHKKODINV
-- Check whether a string (intended to be a language code) contains only 2
-- or 3 lowercase letters or maybe a digit in middle position or maybe
-- instead equals "-" or "??" and maybe additionally is not
-- included on the ban list.
-- Input : * strqooq -- string (empty is useless and returns
-- "true" ie "bad" but can't cause any harm)
-- * numnokod -- "0" ... "3" how special codes "-" "??" should pass
-- * boodigit -- "true" to allow digit in middle position
-- * boonoban -- "true" to skip test against ban table
-- Output : * booisbadlk -- bool "true" if the string is evil
-- This sub depends on "STRING FUNCTIONS"\"lftestnum"
-- and "STRING FUNCTIONS"\"lftestlc".
-- We need const table "contabisbanned".
local function lfchkkodinv (strqooq, numnokod, boodigit, boonoban)
local varomongkosong = "" -- for check against the ban list
local booisbadlk = false -- pre-assume good
local numchiiar = 0
local numukurran = 0
local numindeex = 0 -- ZERO-based
while (true) do -- fake (outer) loop
if ((strqooq=="-") and ((numnokod==1) or (numnokod==3))) then
break -- to join mark -- good
end--if
if ((strqooq=="??") and ((numnokod==2) or (numnokod==3))) then
break -- to join mark -- good
end--if
numukurran = string.len (strqooq)
if ((numukurran<2) or (numukurran>3)) then
booisbadlk = true
break -- to join mark -- evil
end--if
numchiiar = string.byte (strqooq,1,1)
if (lftestlc(numchiiar)==false) then
booisbadlk = true
break -- to join mark -- evil
end--if
numchiiar = string.byte (strqooq,numukurran,numukurran)
if (lftestlc(numchiiar)==false) then
booisbadlk = true
break -- to join mark -- evil
end--if
if (numukurran==3) then
numchiiar = string.byte (strqooq,2,2)
if ((boodigit==false) or (lftestnum(numchiiar)==false)) then
if (lftestlc(numchiiar)==false) then
booisbadlk = true
break -- to join mark -- evil
end--if
end--if ((boodigit==false) or (lftestnum(numchiiar)==false))
end--if
if (boonoban==false) then
while (true) do -- ordinary inner loop
varomongkosong = contabisbanned[numindeex+1] -- number of elem unknown
if (type(varomongkosong)~="string") then
break -- abort inner loop (then fake outer loop) due to end of table
end--if
numukurran = string.len (varomongkosong)
if ((numukurran<2) or (numukurran>3)) then
break -- abort inner loop (then fake outer loop) due to faulty table
end--if
if (strqooq==varomongkosong) then
booisbadlk = true
break -- abort inner loop (then fake outer loop) due to violation
end--if
numindeex = numindeex + 1 -- ZERO-based
end--while -- ordinary inner loop
end--if (boonoban==false) then
break -- finally to join mark
end--while -- fake loop -- join mark
return booisbadlk
end--function lfchkkodinv
------------------------------------------------------------------------
-- Local function LFFILLNAME
-- Replace placeholder "\@" "\\@" by augmented name of the caller.
-- The caller name is submitted to us as a parameter thus we
-- do NOT access any constants and do NOT have to peek it either.
local function lffillname (strmessage,strcaller)
local strhasill = ''
local numstrloen = 0
local numindfx = 1 -- ONE-based
local numcjar = 0
local numcjnext = 0
numstrloen = string.len (strmessage)
while (true) do
if (numindfx>numstrloen) then
break -- empty input is useless but cannot cause major harm
end--if
numcjar = string.byte (strmessage,numindfx,numindfx)
numindfx = numindfx + 1
numcjnext = 0
if (numindfx<=numstrloen) then
numcjnext = string.byte (strmessage,numindfx,numindfx)
end--if
if ((numcjar==92) and (numcjnext==64)) then
strhasill = strhasill .. strcaller -- invalid input is caller's risk
numindfx = numindfx + 1 -- skip 2 octet:s of the placeholder
else
strhasill = strhasill .. string.char (numcjar)
end--if
end--while
return strhasill
end--function lffillname
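--[==[
For comparison, the same placeholder replacement can be sketched with
"string.gsub". This is an alternative illustration only, NOT what this module
does: the helper name "sketchfillname" is hypothetical, and the caller name in
the example is made up. A backslash is not magic in Lua patterns, thus the
pattern "\\@" matches the 2 octets of the placeholder literally.

```lua
-- Hypothetical helper: replace every placeholder "\@" by the caller name
local function sketchfillname (strmessage, strcaller)
  -- a function as replacement keeps any "%" in the caller name literal
  local strfilled = string.gsub (strmessage, "\\@", function () return strcaller end)
  return strfilled
end

print (sketchfillname ('Nekonata lingvokodo en \\@', 'sxablono "livs"'))
-- Nekonata lingvokodo en sxablono "livs"
```

The octet-by-octet loop in "lffillname" avoids pattern matching entirely,
which fits the style of the other subs here.
]==]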
------------------------------------------------------------------------
-- Local function LFKODEOSG
-- Transcode X-surrogates (without "\", thus for example "kacxo",
-- NOT "ka\cxo") to cxapeloj in a string (EO only)
-- Input : - strsurr -- string (empty is useless but can't cause major harm)
-- Output : - strcxapeloj
-- We need const table "contabtransiltable".
-- This sub depends on "MATH FUNCTIONS"\"mathdiv"
-- and "MATH FUNCTIONS"\"mathmod".
local function lfkodeosg (strsurr)
local varpeek = 0
local strcxapeloj = ''
local numinputl = 0
local numininx = 0 -- ZERO-based source index
local numknark = 0 -- current char (ZERO is NOT valid)
local numknarp = 0 -- previous char (ZERO is NOT valid)
local numlow = 0
local numhaj = 0
numinputl = string.len(strsurr)
while (true) do
if (numininx==numinputl) then
break
end--if
numknark = string.byte(strsurr,(numininx+1),(numininx+1))
numininx = numininx + 1
numhaj = 0 -- pre-assume no translation
if ((numknarp~=0) and ((numknark==88) or (numknark==120))) then -- got "x"
varpeek = contabtransiltable[numknarp] -- UINT16 or nil
if (varpeek~=nil) then
numlow = mathmod (varpeek,256)
numhaj = mathdiv (varpeek,256)
end--if
end--if
if (numhaj~=0) then
strcxapeloj = strcxapeloj .. string.char(numhaj,numlow)
numknark = 0 -- invalidate current char
else
if (numknarp~=0) then -- add previous char only if valid
strcxapeloj = strcxapeloj .. string.char(numknarp) -- add it
end--if
end--if
numknarp = numknark -- copy to previous even if invalid
end--while
if (numknarp~=0) then -- add previous and last char only if valid
strcxapeloj = strcxapeloj .. string.char(numknarp) -- add it
end--if
return strcxapeloj
end--function lfkodeosg
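--[==[
The X-surrogate transcoding can be sketched with a lookup table and
"string.gsub". The names "tabsketchmap" and "sketchxtocxapeloj" are
hypothetical, and the table below is an assumption for this example: the real
content of "contabtransiltable" (UINT16 values indexed by ASCII code) is
defined elsewhere in this module.

```lua
-- Hypothetical table: base letter -> UTF8 octets of the letter with cxapelo
local tabsketchmap = {
  c = "\196\137", C = "\196\136", -- $0109 $0108
  g = "\196\157", G = "\196\156", -- $011D $011C
  h = "\196\165", H = "\196\164", -- $0125 $0124
  j = "\196\181", J = "\196\180", -- $0135 $0134
  s = "\197\157", S = "\197\156", -- $015D $015C
  u = "\197\173", U = "\197\172", -- $016D $016C
}

-- Hypothetical helper: base letter followed by "x" or "X" becomes 1 letter
local function sketchxtocxapeloj (strsurr)
  -- the capture is the key into the table, the "x" is dropped with the match
  local strout = string.gsub (strsurr, "([cghjsuCGHJSU])[xX]", tabsketchmap)
  return strout
end

print (sketchxtocxapeloj ("kacxo") == "ka\196\137o")                    -- true
print (sketchxtocxapeloj ("antauxlokigita") == "anta\197\173lokigita") -- true
```

An "x" after any other letter stays untouched, for example in "ekzemplo x".
]==]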
------------------------------------------------------------------------
---- MAIN EXPORTED FUNCTION ----
------------------------------------------------------------------------
function lawc.ek (arxframent)
-- general unknown type
local vartmp = 0 -- variable without type multipurpose
-- special type "args" AKA "arx"
local arxsomons = 0 -- metaized "args" from our own or caller's "frame"
-- general tab
local tabblock = {} -- from "%"
local tablinx = {} -- from "#"
-- general str
local strtmp = "" -- temp (fix "contaberaroj", fill "strvistrc", ...)
local strtpm = "" -- temp
local strviserr = "" -- visible error
local strvisgud = "" -- visible good output
local strinvank = "" -- invisible "anchor" part
local strinvkat = "" -- invisible category part
local strret = "" -- final result string
-- str specific to language processing
local strfrafra = "" -- split control string from "fra=" before conversion
local strdstdst = "" -- distinction hint from "dst="
local strpagenam = "" -- from "{{PAGENAME}}" or "pagenameoverridetestonly"
local strlemma = "" -- bold lemma (maybe split) from pagename
local strkodbah = "" -- language code (2 or 3 lowercase) from arxsomons[1]
local strkodkek6 = "" -- word class code (2 uppercase) from arxsomons[2]
local strkodkek7 = "" -- further word class
local strnambah = "" -- language name (without prefix "Bahasa")
local strnambauc = "" -- language name uppercased begin ("Angla")
local strnamasin = "" -- language name in the language (propralingve)
local strnamke6 = "" -- word class full
local strnamco6 = "" -- word class truncated and uppercased begin
local strnamke7 = "" -- word class full
local strnamco7 = "" -- word class truncated and uppercased begin
-- general num
local numerr = 0 -- 1 inter 2 para 3 sub 4 neval 5 nekon 6 wc 7 fra 8 dst
local numpindex = 0 -- number of anon params
local numlong = 0 -- for parameter processing
local numtamp = 0 -- for parameter processing and split processing
local numoct = 0
-- num for split control string processing
local numlaong = 0
local numodt = 0
local numoet = 0
local numoft = 0
local numtbindx = 0 -- current index
local numprevdx = 0 -- previous index
local numhelpcn = 0 -- help counter
-- quasi-constant num from "constrmainctl"
local numshowlemma = 0 -- tri-state
-- general boo
local boonocat = false -- from "nocat=true"
local bootrace = false -- from "detrc=true"
local bookonata = false -- true if "piktbllki" index 0 returns valid name
local boohavasi = false -- true if we have valid name in "strnamasin" too
local boohavdua = false -- true if we have 2 word classes
local boohavfra = false
local boohavdst = false
-- quasi-constant boo from "constrmainctl"
local booshowimage = false
---- GUARD AGAINST INTERNAL ERROR AGAIN ----
-- later reporting of #E01 may NOT depend on uncommentable strings
if (booguard) then
numerr = 1 -- #E01 internal
end--if
---- FILL IN ERROR MESSAGES AND TRANSCODE EO IF NEEDED ----
-- placeholder "\@" "\\@" is replaced by augmented name of the caller
-- from "constrkoll" in any case, for example 'sxablono "test"'
-- or 'templat "test"'
-- only for EO the X-substitution is subsequently performed
if (numerr==0) then
numtamp = 2 -- start with #E02
while (true) do
vartmp = contaberaroj[numtamp]
if ((type(vartmp))=="nil") then -- number of messages is NOT hardcoded
break
end--if
strtmp = lffillname (vartmp,constrkoll)
if (constrpriv=="eo") then
strtmp = lfkodeosg (strtmp)
end--if
contaberaroj[numtamp] = strtmp
numtamp = numtamp + 1 -- TWO-based
end--while
if (constrpriv=="eo") then
contabwc["PP"] = lfkodeosg(contabwc["PP"])
contabwc["UF"] = lfkodeosg(contabwc["UF"])
end--if
end--if
---- FILL IN 2 SEMI-HARDCODED PARAMETERS ----
numoct = string.byte (constrmainctl,1,1) -- "0" or "1"
booshowimage = (numoct==49)
numoct = string.byte (constrmainctl,2,2) -- "0" or "1" or "2"
numshowlemma = lfdec1digcl (numoct,2)
---- SEIZE THE PAGENAME ----
-- later reporting of #E01 may NOT depend on uncommentable strings
-- must be 1...120 octet:s, keep consistent with "pagenameoverridetestonly="
strpagenam = ""
if (numerr==0) then
vartmp = mw.title.getCurrentTitle().text -- without namespace prefix
if (type(vartmp)=="string") then
numtamp = string.len(vartmp)
if ((numtamp>=1) and (numtamp<=120)) then
strpagenam = vartmp -- pagename here (empty NOT legal see below)
end--if
end--if
if (strpagenam=="") then
numerr = 1 -- #E01 internal
end--if
end--if
---- WHINE IF YOU MUST #E01 ----
-- reporting of this error #E01 may NOT depend on
-- uncommentable strings as "constrkoll" and "contaberaroj"
-- do NOT use sub "lfbrewerror", report our name (NOT of template) and in EN
if (numerr==1) then
strtpm = '#E01 Internal error in module "mlawc".'
strviserr = constrkros .. constrelabg .. strtpm .. constrelaen .. constrkros
end--if
---- GET THE ARX (ONE OF TWO) ----
if (numerr==0) then
arxsomons = arxframent.args -- "args" from our own "frame"
if (arxsomons['caller']=="true") then
arxsomons = arxframent:getParent().args -- "args" from caller's "frame"
end--if
end--if
---- PRELIMINARILY ANALYZE ANONYMOUS PARAMETERS ----
-- this will catch holes, empty parameters, too long parameters,
-- and wrong number of parameters
-- below on exit var "numpindex" will contain number of
-- prevalidated anonymous params
-- this depends on 3 constants:
-- * contabparam[0] minimal number
-- * contabparam[1] maximal number
-- * contabparam[2] maximal length (default 160)
if (numerr==0) then
numpindex = 0 -- ZERO-based
numtamp = contabparam[1] -- maximal number of params
while (true) do
vartmp = arxsomons [numpindex+1] -- can be "nil"
if ((type(vartmp)~="string") or (numpindex>numtamp)) then
break -- good or bad
end--if
numlong = string.len (vartmp)
if ((numlong==0) or (numlong>contabparam[2])) then
numerr = 2 -- #E02 param/RTFD
break -- only bad here
end--if
numpindex = numpindex + 1 -- on exit has number of valid parameters
end--while
if ((numpindex<contabparam[0]) or (numpindex>numtamp)) then
numerr = 2 -- #E02 param/RTFD
end--if
end--if
---- PROCESS 2 OBLIGATORY ANONYMOUS PARAMS INTO 3 STRINGS ----
-- now var "numpindex" already contains the number of prevalidated params,
-- always 2, and is useless
-- here we validate and assign "strkodbah", "strkodkek6",
-- "boohavdua", "strkodkek7" (can be empty)
-- note that "lfchkkodinv" returns "true" on failure and natively supports
-- "??" whereas "lfmultestuc" returns "true" on success and does
-- NOT natively support "??"
-- this depends directly on const bool "conboomiddig"
-- this depends indirectly on const table "contabisbanned" via "lfchkkodinv"
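-- examples of the second parameter (a sketch derived from the checks
-- below, assuming "lfmultestuc" accepts exactly 2 uppercase letters):
-- * "PP" -> strkodkek6="PP", strkodkek7="", boohavdua=false
-- * "PPUF" -> strkodkek6="PP", strkodkek7="UF", boohavdua=true
-- * "??" -> tolerated, unless the language code is "??" too (#E06)
-- * "PP??" or "??UF" -> #E06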
if (numerr==0) then
while (true) do -- fake loop
strkodbah = arxsomons[1] -- language code (obligatory)
if (lfchkkodinv(strkodbah,2,conboomiddig,false)) then
numerr = 4 -- #E04 -- "??" is tolerable but "-" is NOT in "lfchkkodinv"
break -- to join mark
end--if
boohavdua = false
strkodkek6 = arxsomons[2] -- 2 UC or 4 UC (obligatory)
numlong = string.len (strkodkek6)
strkodkek7 = ""
if (numlong==4) then -- maybe 2 word classes
strkodkek7 = string.sub (strkodkek6,3,4)
strkodkek6 = string.sub (strkodkek6,1,2)
if ((strkodkek6=='??') or (strkodkek7=='??')) then
numerr = 6 -- #E06 -- if both are specified then no "??" tolerable
break -- to join mark
end--if
boohavdua = true
end--if
if (strkodkek6~='??') then -- "??" is unknown but not faulty
if (lfmultestuc(strkodkek6,2)==false) then
numerr = 6 -- #E06
break -- to join mark
end--if
end--if
if (boohavdua) then -- here "??" for unknown is NOT permitted
if (lfmultestuc(strkodkek7,2)==false) then
numerr = 6 -- #E06
break -- to join mark
end--if
end--if
if ((strkodbah=='??') and (strkodkek6=='??')) then
numerr = 6 -- #E06 -- both unknown is illegal
break -- to join mark
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
end--if
---- PROCESS 2 OPTIONAL NAMED PARAMS INTO 2 STRINGS ----
-- here we prevalidate and assign "boohavfra" and "strfrafra"
-- (1...120) from "fra="
-- here we validate and assign "boohavdst" and "strdstdst"
-- (2...40) from "dst="
if (numerr==0) then
while (true) do -- fake loop
boohavfra = false
strfrafra = ''
vartmp = arxsomons['fra'] -- optional, NOT prevalidated
if (type(vartmp)=="string") then
numtamp = string.len(vartmp)
if ((numtamp>=1) and (numtamp<=120)) then
boohavfra = true -- even true with "-"
strfrafra = vartmp
else
numerr = 7 -- #E07 -- "fra=" is bad -- split control
break
end--if
end--if
boohavdst = false
strdstdst = ''
vartmp = arxsomons['dst'] -- optional, NOT prevalidated
if (type(vartmp)=="string") then
numtamp = string.len(vartmp)
if ((numtamp>=2) and (numtamp<=40)) then
boohavdst = true
strdstdst = vartmp
strtmp = 'Parameter "dst=" preliminarily seized, length is ' .. tostring (numtamp)
strvistrc = strvistrc .. "<br>" .. strtmp
if (lfbanmulti("'0[0]0{0}0(0)0",strdstdst)) then
strtmp = 'Illegal bracket in parameter "dst=" found'
strvistrc = strvistrc .. "<br>" .. strtmp
numerr = 8 -- #E08 -- "dst=" is bad -- all brackets prohibited
break
end--if
else
numerr = 8 -- #E08 -- "dst=" is bad
break
end--if
end--if
break -- finally to join mark
end--while -- fake loop -- join mark
end--if
---- PROCESS 3 HIDDEN NAMED PARAMS INTO 1 STRING AND 2 BOOL:S ----
-- this may "override mw.title.getCurrentTitle().text" and
-- replace content in "strpagenam", empty is NOT valid
-- no error is possible here
-- "detrc=" must be seized independently of "numerr" even if we
-- already suck, but type must be checked !!!
if (numerr==0) then
vartmp = arxsomons['pagenameoverridetestonly']
if (type(vartmp)=="string") then
numtamp = string.len(vartmp)
if ((numtamp>=1) and (numtamp<=120)) then
strpagenam = vartmp -- ignore if empty
end--if
end--if
if (arxsomons['nocat']=='true') then
boonocat = true
end--if
end--if
if (type(arxsomons)=="table") then
if (arxsomons['detrc']=='true') then
bootrace = true
end--if
end--if
if (bootrace) then
strtmp = 'Done with parameters, "numerr" is ' .. tostring (numerr)
strvistrc = strvistrc .. "<br>" .. strtmp
end--if
---- PROCESS AND VALIDATE SPLIT CONTROL STRING TO 2 TABLES ----
-- process from "boohavfra" and "strfrafra" to
-- "tabblock" (from "%") and "tablinx" (from "#")
-- "boohavfra" equal "true" means only that "strfrafra" is non-empty
-- 1...120 octet:s but not more
-- beware of possible special value of "strfrafra" equal "-" still
-- valid below ("boohavfra" is "true", tables empty)
-- example of valid syntax "%3A #2N #5A #7N #8:test"
-- note that "%" may not be alone ie empty nor followed by SPACE ie "% "
-- any SPACE must be followed by "#" by syntax rules
-- this can brew #E07
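-- worked example (a sketch, assuming "lfonehextoint" maps "3" to 3
-- and "A" to 10 and returns 255 for anything that is not a HEX digit):
-- "%3A #2N #5A #7N #8:test" would give
-- * tabblock[3]='1' and tabblock[10]='1' (from "%3A")
-- * tablinx[2]='N' tablinx[5]='A' tablinx[7]='N' tablinx[8]=':test'
-- whereas "%", "% #2N", "#2N#5A" and "#5A #2N" would all end in #E07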
if ((numerr==0) and boohavfra and (strfrafra~="-")) then
while (true) do -- outer fake loop
numlaong = string.len (strfrafra)
numtamp = 1 -- ONE-based index
numprevdx = - 1 -- must be strictly ascending, index ZERO valid
numoft = string.byte (strfrafra,1,1) -- got "%" or NOT ??
if (numoft==37) then
if (numlaong==1) then
numerr = 7 -- #E07 -- "fra=" is bad -- "%" may not be empty
break -- outer fake loop
end--if
numodt = string.byte (strfrafra,2,2) -- "% " is illegal
if (numodt==32) then
numerr = 7 -- #E07 -- "fra=" is bad -- "%" may not be followed by SPACE
break -- outer fake loop
end--if
numtamp = 2 -- ONE-based index -- check after "%"
numhelpcn = 0 -- counts blocked boundaries (max 8)
while (true) do -- inner honest loop
if ((numtamp>numlaong) or (numhelpcn>8)) then
break -- inner loop only -- good or bad
end--if
numoft = string.byte (strfrafra,numtamp,numtamp) -- SPACE or HEX req
numtamp = numtamp + 1
if (numoft==32) then
numoet = 0
if (numtamp<=numlaong) then
numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
end--if
if (numoet~=35) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
break -- inner loop only -- good or bad
end--if
numtbindx = lfonehextoint (numoft)
if ((numtbindx==255) or (numtbindx<=numprevdx)) then
numerr = 7 -- #E07 -- "fra=" is bad
break -- inner loop only
end--if
tabblock [numtbindx] = '1' -- type "string"
numhelpcn = numhelpcn + 1
numprevdx = numtbindx
end--while
end--if
if (numhelpcn>8) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
if (numerr~=0) then
break -- outer loop with #E07
end--if
if (numtamp>numlaong) then
break -- outer loop -- OK
end--if
numprevdx = - 1 -- must be strictly ascending, index ZERO valid, restart from it
while (true) do -- inner honest loop
if (numtamp>numlaong) then
break -- inner loop only -- good end of string
end--if
numoft = string.byte (strfrafra,numtamp,numtamp) -- "#" required
numtamp = numtamp + 1
if (numoft~=35) then
numerr = 7 -- #E07 -- "fra=" is bad
break -- inner loop only
end--if
if (numtamp>numlaong) then
numerr = 7 -- #E07 -- "fra=" is bad
break -- inner loop only
end--if
numoft = string.byte (strfrafra,numtamp,numtamp) -- HEX required
numtamp = numtamp + 1
numtbindx = lfonehextoint (numoft)
if ((numtbindx==255) or (numtbindx<=numprevdx)) then
numerr = 7 -- #E07 -- "fra=" is bad
break -- inner loop only
end--if
strtmp = "" -- no valid hit yet
if (numtamp>numlaong) then
numerr = 7 -- #E07 -- "fra=" is bad
break -- inner loop only
end--if
numodt = string.byte (strfrafra,numtamp,numtamp) -- one of 4 required
numtamp = numtamp + 1
if ((numodt==78) or (numodt==73) or (numodt==65)) then
strtmp = string.char (numodt) -- type "string"
if (numtamp<=numlaong) then
numoet = string.byte (strfrafra,numtamp,numtamp) -- SPACE required
numtamp = numtamp + 1 -- SPACE must be eaten away here !!!
if (numoet~=32) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
if (numtamp<=numlaong) then
numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
end--if
if (numoet~=35) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
end--if
end--if ((numodt==78) or (numodt==73) or (numodt==65)) then
if (numodt==58) then -- ":"
numhelpcn = 0 -- counts char:s in the link target
while (true) do -- deep honest loop
if ((numtamp>numlaong) or (numhelpcn==41)) then
break -- deep loop only -- good or bad
end--if
numodt = string.byte (strfrafra,numtamp,numtamp) -- trash "numodt"
numtamp = numtamp + 1
if (numodt==32) then
numoet = 0 -- SPACE must be eaten away here !!! INC is above
if (numtamp<=numlaong) then
numoet = string.byte (strfrafra,numtamp,numtamp) -- "#" required
end--if
if (numoet~=35) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
break -- deep loop only -- good or bad
end--if
strtmp = strtmp .. string.char (numodt) -- no ":" prefix yet
numhelpcn = numhelpcn + 1
end--while
if ((numhelpcn==0) or (numhelpcn>40)) then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
if (numerr~=0) then
break -- inner loop with #E07
end--if
strtmp = ":" .. strtmp -- add the prefix
end--if (numodt==58) then
if (strtmp=="") then
numerr = 7 -- #E07 -- "fra=" is bad
end--if
if (numerr~=0) then
break -- inner loop with #E07
end--if
tablinx [numtbindx] = strtmp
numprevdx = numtbindx
end--while
break -- finally to join mark
end--while -- fake loop -- join mark
end--if
if (bootrace) then
strtmp = 'Table "tabblock" from "%" :'
numtamp = 0
while (true) do
if (numtamp==17) then
break -- deliberately 0...16 incl
end--if
strtmp = strtmp .. ' ' .. tostring (numtamp) .. ' -> "' .. tostring (tabblock[numtamp]) .. '"'
numtamp = numtamp + 1
end--while
strvistrc = strvistrc .. "<br>" .. strtmp
strtmp = 'Table "tablinx" from "#" :'
numtamp = 0
while (true) do
if (numtamp==17) then
break -- deliberately 0...16 incl
end--if
strtmp = strtmp .. ' ' .. tostring (numtamp) .. ' -> "' .. tostring (tablinx[numtamp]) .. '"'
numtamp = numtamp + 1
end--while
strvistrc = strvistrc .. "<br>" .. strtmp
end--if
---- AUTOSPLIT THE LEMMA IF NEEDED ----
-- process from "strpagenam" (already guaranteed to be
-- non-empty) to "strlemma"
-- we do exactly nothing if:
-- * we already suck ie "numerr"<>0
-- * "numshowlemma" is ZERO
-- we skip the split and copy only if:
-- * "numshowlemma" is 1
-- * "strfrafra" is "-"
-- punctuation (5 char:s: ! , . ; ?) 21 33 | 2C 44 | 2E 46 | 3B 59 | 3F 63
-- dash "-" and apo "'" do NOT count as punctuation
if ((numerr==0) and (numshowlemma~=0)) then
if ((numshowlemma==2) and (strfrafra~="-")) then
strlemma = lfsplit (strpagenam, tabblock, tablinx)
else
strlemma = strpagenam
end--if
end--if
---- PEEK THE LANGUAGE NAMES VIA SUBMODULE ----
-- for lng name in site language ("c0"):
-- * "-" is unconditionally evil with #E03 (broken submodule)
-- * "=" can be #E05 (unknown code) if the site language
-- code works, otherwise #E03 (broken submodule) too
-- for lng name "propralingve" ("c1"):
-- * "=" is unconditionally evil with #E03 (since the code used
-- to work just before)
-- * "-" is silently ignored (name not available)
if (numerr==0) then
if (strkodbah=='??') then -- "??" is unknown but not faulty
bookonata = false -- less evil than (numerr>0)
strnambah = constrneli -- unknown lang
else
bookonata = true
strnambah = piktbllki.ek { args = { strkodbah , "0" } } -- l nam no "rl"
if (strnambah=="=") then
strtpm = piktbllki.ek { args = { constrpriv , "-" } } -- test own lang code
if (strtpm=="1") then
numerr = 5 -- #E05 unknown code (since our own code found)
else
numerr = 3 -- #E03 broken submodule (site code does NOT work either)
end--if
end--if (strnambah=="=")
if (strnambah=="-") then -- better content in "c0" absolutely required
numerr = 3 -- #E03 broken submodule
end--if
end--if (strkodbah=='??') else
end--if
if ((numerr==0) and bookonata) then
boohavasi = false
strnamasin = piktbllki.ek { args = { strkodbah, "1" , "1" } } -- lng asing
if (strnamasin=="=") then -- content not absolutely required but this is an error
numerr = 3 -- #E03 error
else
if (strnamasin~="-") then
boohavasi = true -- have valid name better than "-" to display
strnamasin = constrisobg .. strnamasin .. constrisoen -- add the isola
end--if
end--if
end--if
---- TRANSLATE WORD CLASS CODE VIA LUA TABLE ----
-- "strnamke6" and "strnamke7" are the long word classes with possible (...)
if (numerr==0) then
if (strkodkek6=='??') then -- "??" is unknown but not faulty
strnamke6 = constrnekk -- word class full -- unknown word class
else
vartmp = contabwc[strkodkek6]
if (vartmp==nil) then
numerr = 6 -- #E06 -- unknown word class
else
strnamke6 = vartmp -- word class full -- found it in the table
end--if
end--if (strkodkek6=='??') else
end--if
if ((numerr==0) and boohavdua) then
vartmp = contabwc[strkodkek7] -- no "??" possible here
if (vartmp==nil) then
numerr = 6 -- #E06 -- unknown word class
else
strnamke7 = vartmp -- word class full -- found it in the table
end--if
end--if
---- BREW 2 OR 3 EXTRA STRING:S ONLY FOR CATEGORIES ----
-- "strnambauc" is language name with uppercased begin
-- "strnamco6" and "strnamco7" is word class stripped with uppercased begin
if (numerr==0) then
strnambauc = lfxcaseult (strnambah,true,false)
strnamco6 = lfxcaseult (lfstripparent(strnamke6),true,false)
if (boohavdua) then
strnamco7 = lfxcaseult (lfstripparent(strnamke7),true,false)
end--if
end--if
---- WHINE IF YOU MUST #E02...#E08 ----
-- reporting of errors #E02...#E08 depends on uncommentable strings
-- and name of the caller filled in from "constrkoll"
if (numerr>1) then
strviserr = lfbrewerror(numerr)
end--if
---- BREW THE INVISIBLE ANCHOR PART ----
-- uses "constrankkom" (does NOT end with a dash) and "constaankend"
-- '<span id="' .. anchor name .. '"></span>'
-- we can brew 2 or 3 or 5 anchors
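-- for example (a sketch, with "X" standing in for the anchor name
-- prefix held in "constrankkom", whose real value is defined elsewhere,
-- and "en" as a hypothetical language code), word classes "PPUF"
-- together with "dst=abc" give the 5 anchor names
-- X-en , X-en-PP , X-en-PP-abc , X-en-UF , X-en-UF-abc
-- without "dst=" only X-en , X-en-PP , X-en-UF remain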
strinvank = ''
if (numerr==0) then
strinvank = constrankkom .. "-" .. strkodbah .. constaankend
strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek6 .. constaankend
if (boohavdst) then
strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek6 .. '-' .. strdstdst .. constaankend
end--if
if (boohavdua) then
strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek7 .. constaankend
if (boohavdst) then
strinvank = strinvank .. constrankkom .. "-" .. strkodbah .. "-" .. strkodkek7 .. '-' .. strdstdst .. constaankend
end--if
end--if (boohavdua) then
end--if
---- BREW THE VISIBLE PART ----
-- "strlemma" is the lemma with or without separation links
-- "numshowlemma" is tri-state but here we bother only about ZERO vs non-ZERO
strvisgud = ''
if (numerr==0) then
strvisgud = '<div style="margin:0.2em;"></div>' -- must be empty
if (booshowimage) then
strvisgud = strvisgud .. '[[File:Gartoon apps kopete all away replaced.svg|24px|link=]]'
end--if
strvisgud = strvisgud .. ' '
if (numshowlemma~=0) then
strvisgud = strvisgud .. '<b><bdi>' .. strlemma .. '</bdi></b> ' -- lemma and space
end--if
strvisgud = strvisgud .. '( <span ' .. constrtoolt .. ' title="' .. constrbaha .. strnambah
if (boohavasi) then
strvisgud = strvisgud .. ' ' .. strnamasin -- lang name in the lang with isola
end--if
strvisgud = strvisgud .. '"> ' .. strkodbah .. ' </span>'
strvisgud = strvisgud .. ' , <span ' .. constrtoolt .. ' title="' .. constrkeka .. strnamke6
strvisgud = strvisgud .. '"> ' .. strkodkek6 .. ' </span>'
if (boohavdua) then
strvisgud = strvisgud .. ' , <span ' .. constrtoolt .. ' title="' .. constrkeka .. strnamke7
strvisgud = strvisgud .. '"> ' .. strkodkek7 .. ' </span>'
end--if
strvisgud = strvisgud .. ' )'
end--if
---- BREW THE INVISIBLE CATEGORY PART ----
-- string "constrkatp" is cat prefix and includes the colon ":"
if ((numerr==0) and (boonocat==false)) then
if (constrpriv=="eo") then
strinvkat = '[[' .. constrkatp .. 'Kapvorto (' .. strnambah .. ')]]' -- lang
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco6 .. ']]' -- wc
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco6 .. ' (' .. strnambah .. ')]]'
if (boohavdua) then
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco7 .. ']]' -- wc
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco7 .. ' (' .. strnambah .. ')]]'
end--if
end--if
if (constrpriv=="id") then
strinvkat = '[[' .. constrkatp .. 'Kata bahasa ' .. strnambah .. ']]' -- lang
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco6 .. ']]' -- wc
strinvkat = strinvkat .. '[[' .. constrkatp .. strkodbah .. ':' .. strnamco6 .. ']]'
if (boohavdua) then
strinvkat = strinvkat .. '[[' .. constrkatp .. strnamco7 .. ']]' -- wc
strinvkat = strinvkat .. '[[' .. constrkatp .. strkodbah .. ':' .. strnamco7 .. ']]'
end--if
end--if
end--if
---- RETURN THE JUNK STRING ----
strret = strviserr .. strinvank .. strvisgud .. strinvkat
if (bootrace) then
strret = "<br>" .. strvistrc .. "<br><br>" .. strret
end--if
return strret
end--function
---- RETURN THE JUNK LUA TABLE ----
return lawc