--[===[
MODULE "MTMPLLOADDATA" (multiple private template to load data)
"eo.wiktionary.org/wiki/Modulo:mtbllingvoj" <!--2022-Jul-11-->
"id.wiktionary.org/wiki/Modul:mtblbahasa"
Purpose: translate (the transcludable part of) the wikitext of a template
         (name is hardcoded, "tbllingvoj" at eo.wiktionary or "tblbahasa"
         at id.wiktionary) to 4 LUA tables that can be used repeatedly
         via the infamous "mw.loadData" command
Incoming: * nothing (imported via "mw.loadData", no ordinary caller)
Returned: * LUA table containing 2 ... 4 inner LUA tables and
            7 items of status data :
            * [0] (status, integer) status code (ZERO or 2...13)
            * [1] (status, integer) full size (in octets) of the source
                  text from template
            * [2] (status, integer) remaining size (in octets) of the source
                  text from template after removing both outer areas but
                  still keeping all excessive inner whitespace
            * [3] (status, integer) octet position of error (excessive
                  whitespace does count, two outer areas don't), equal
                  to [2] on success
            * [4] (status, integer) number of done lines, or number of line
                  where an error occurred (empty lines do NOT count)
            * [5] (status, string) error string on some
                  errors, otherwise empty string
                  * on #E06 #E12 complete faulty line (spaces reduced)
                  * on #E07 early participant in sorting crime
                  * on #E08...#E10 "FORWARD DUPE" "REVERSE DUPE" "EXTRA DUPE"
                  * on #E11 "INVALID LANG CODE"
            * [6] (status, string) error string on some
                  errors, otherwise empty string
                  * on #E06 #E12 report with string length and earliest and
                    last char and reverse and extra settings
                  * on #E07 latter participant in sorting crime
                  * on #E08 offending dupe "cy", on #E09 #E10 the "cy"
                    stored earlier under the offending dupe key
                  * on #E11 offending invalid code
            * [7] (main, table) list table with key/index ZERO-based
                  and value "cy"
            * [8] (main, table) forward table with key/index "cy"
                  and value complete line
            * [9] (main, table) reverse table with key/index "c0"
                  and value "cy"
            * [10] (main, table) extra table with key/index "c1"
                   and value "cy"
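Example (a hedged sketch, NOT part of this module): how a caller might
consume the returned table. The helper name "lfdemolookup", the mock table
and its line contents are hypothetical, a real caller would obtain the table
via mw.loadData ("Module:mtbllingvoj") instead of building it by hand.

```lua
-- Hypothetical caller-side helper: return the complete line stored for
-- a "cy" code, or nil plus a short status string on failure.
local function lfdemolookup (tabdata, strcode)
  if (tabdata[0]~=0) then
    return nil, string.format ("#E%02d", tabdata[0]) -- module reported an error
  end
  local strline = tabdata[8][strcode] -- forward table: "cy" -> complete line
  if (not strline) then
    return nil, "unknown code"
  end
  return strline
end

-- Mock of the returned structure (invented content, for illustration only)
local tabmock = {
  [0] = 0,
  [7] = { [0] = "en", [1] = "sv" },
  [8] = { en = "[[en]] a0, a1", sv = "[[sv]] b0, b1" },
}

print (lfdemolookup (tabmock, "sv"))
print (lfdemolookup ({ [0] = 2 }, "sv"))
```

Checking the status code at [0] before touching [7]...[10] matters because
the main tables are present only on success.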
Note that this module is NOT generic and CANNOT be made generic due to the
principle that data received by "mw.loadData" must be static and thus it is
not possible to submit parameters (namely name of the template to be seized)
to this module.
The name of this module is the name of the addressed template prefixed
by "m", for example "Template:tblgods" -> "Module:mtblgods".
It is permissible to read several templates by one module and merge the
content in order to facilitate editing, for example "Module:mtblgods" reads
"Template:tblgodsaf" + "Template:tblgodsgr" + "Template:tblgodssz".
Law and order rules:
* cy forward
  * sorting is obligatory for all items except the earliest and last,
    optional for the earliest and last
  * dupes prohibited without exception (note that the sorting requirement
    makes a forward dupe mostly but not absolutely impossible, such an issue
    is usually caught in the sort check, but we allow the earliest and last
    entry to be excluded from sorting and thus must check for dupes
    nevertheless)
  * "-" prohibited
* c0 reverse and c1 extra
  * output table can be disabled, no checks then
  * no sorting requirement
  * if table is enabled then dupes are prohibited
* c0 reverse
  * "-" is totally prohibited, thus not even one occurrence will be tolerated
* c1 extra
  * optionally "-" can be enabled, such values are NOT stored and thus
    bypass the check for dupes
  * if "-" is not enabled then it is totally prohibited, thus not even one
    occurrence will be tolerated
Prohibition against both "-" and dupes in "cy" and "c0" is essential
for the system to work. Disabling the reverse output table makes dupes in "c0"
pass undetected but still not legal. !!!FIXME!!!
Status codes:
* #E00: OK
* #E01: code reserved for caller, cannot occur here
* #E02: template not found
* #E03: bad template size, must be 10...1'000'000 octets
* #E04: failure removing two areas, or expected string "[[" not found
* #E05: bad length of line, must be 6...10'000 octets
* #E06: failed to extract elements "cy" and "c0" and "c1" from the CSV line
* #E07: sorting crime in "cy"
* #E08: forward dupe in "cy"
* #E09: reverse dupe in "c0"
* #E10: extra dupe in "c1"
* #E11: invalid lang code in "cy"
* #E12: "-" detected in "c0" or "-" detected in "c1" while not enabled
* #E13: bad number of lines, must be 2...10'000
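Example (a hedged sketch, NOT part of this module): a caller could map the
status code at [0] to a readable message. The names "tabdemostatus" and
"lfdemodescribe" are hypothetical, the texts are copied from the list above.

```lua
-- Hypothetical lookup table: status code -> short description
local tabdemostatus = {
  [0]  = 'OK',
  [2]  = 'template not found',
  [3]  = 'bad template size',
  [4]  = 'failure removing two areas, or "[[" not found',
  [5]  = 'bad length of line',
  [6]  = 'failed to extract elements from the CSV line',
  [7]  = 'sorting crime in "cy"',
  [8]  = 'forward dupe in "cy"',
  [9]  = 'reverse dupe in "c0"',
  [10] = 'extra dupe in "c1"',
  [11] = 'invalid lang code in "cy"',
  [12] = '"-" detected in "c0" or "c1" while not enabled',
  [13] = 'bad number of lines',
}

-- Format a status code the way it is written in this documentation
local function lfdemodescribe (numcode)
  local strtext = tabdemostatus[numcode] or 'unknown status code'
  return string.format ('#E%02d: %s', numcode, strtext)
end

print (lfdemodescribe (7))
```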
Two outer areas are cut off BEFORE the core processing loop. Empty lines are
skipped and multiple spaces are reduced to single ones INSIDE the core loop
when parsing a line. This is essential for reporting back a useful position
on error.
Note that the "#ifexist:" parser function used here expects a page title as
in a wikilink, not a URL. This means among other things that percent-encoding
CANNOT be used, UTF8 is required. !!!FIXME!!! use native LUA
]===]
------------------------------------------------------------------------
---- CONSTANTS [O] ----
------------------------------------------------------------------------
-- uncommentable constant strings EO vs ID
local construstm = string.char(0xC5,0x9C) .. "ablono:tbllingvoj" -- EO -- "SXablono"
-- local construstm = "Templat:tblbahasa" -- ID
-- constant table -- ban list -- add obviously invalid access codes (2-letter or 3-letter) only
-- length of the list is NOT stored anywhere, the processing stops
-- when type "nil" is encountered, used by "lfivalidatelnkoadv" only
-- "en.wiktionary.org/wiki/Wiktionary:Language_treatment" excluded languages
-- "en.wikipedia.org/wiki/Spurious_languages"
-- "iso639-3.sil.org/code/art" only valid in ISO 639-2
-- "iso639-3.sil.org/code/zxx" "No linguistic content"
local contabisbanned = {'by','dc','ll','jp','art','cmn','deu','eng','epo','fra','gem','ger','ido','lat','por','rus','spa','swe','tup','zxx'} -- 1...20
-- control flags
local conboohavreverse = true -- change to "false" to disable reverse table
local conboohavextra = true -- change to "false" to disable extra table
local conbootwocynoso = true -- change to "false" to enforce sort all "cy"
local conbooc1allowda = true -- change to "false" to disallow "-" in "c1"
------------------------------------------------------------------------
---- LOW LEVEL STRING FUNCTIONS [G] ----
------------------------------------------------------------------------
local function lftestnum (numkaad)
local boodigit = false
boodigit = ((numkaad>=48) and (numkaad<=57))
return boodigit
end--function lftestnum
local function lftestlc (numcode)
local boolowerc = false
boolowerc = ((numcode>=97) and (numcode<=122))
return boolowerc
end--function lftestlc
------------------------------------------------------------------------
---- HIGH LEVEL STRING FUNCTIONS [I] ----
------------------------------------------------------------------------
-- Local function LFIVALIDATELNKOADV
-- Advanced test whether a string (intended to be a language code) is valid:
-- either 2 or 3 lowercase letters, or 4...10 char:s with one or two dashes,
-- or 3 char:s with a digit in middle position, or a special code "-" or
-- "??", and additionally (unless disabled) not included on the ban list.
-- Input : * strqooq -- string (empty is useless and returns
-- "false" ie "invalid" but cannot cause any major harm)
-- * numnokod -- "0" ... "3" how special codes "-" "??" should pass
-- (0: neither passes, 1: only "-", 2: only "??", 3: both)
-- * boolonkg -- "true" to allow long codes such as "zh-min-nan"
-- * boodigit -- "true" to allow digit in middle position
-- * boonoban -- "true" to skip test against ban table
-- Output : * booisvaladv -- true if string is valid
-- Depends on functions :
-- [G] lftestnum lftestlc
-- Depends on constants :
-- * table "contabisbanned"
-- Incoming empty string is safe but type "nil" is NOT.
-- Digit is tolerable only ("and" applies):
-- * if boodigit is "true"
-- * if length is 3 char:s
-- * in middle position
-- Dashes are tolerable (except special code "-") only ("and" applies):
-- * if length is at least 4 char:s (if this is permitted at all)
-- * in inner positions
-- * NOT adjacent
-- * maximally TWO totally
-- There may be maximally 3 adjacent letters, this makes at least ONE dash
-- obligatory for length 4...7, and TWO dashes for length 8...10.
local function lfivalidatelnkoadv (strqooq, numnokod, boolonkg, boodigit, boonoban)
local varomongkosong = 0 -- for check against the ban list
local numchiiar = 0
local numukurran = 0
local numindeex = 0 -- ZERO-based -- two loops
local numadjlet = 0 -- number of adjacent letters (max 3)
local numadjdsh = 0 -- number of adjacent dashes (max 1)
local numtotdsh = 0 -- total number of dashes (max 2)
local booislclc = false
local booisdigi = false
local booisdash = false
local booisvaladv = true -- preASSume innocence -- later final verdict here
while (true) do -- fake (outer) loop
if ((strqooq=="-") and ((numnokod==1) or (numnokod==3))) then
break -- to join mark -- good
end--if
if ((strqooq=="??") and ((numnokod==2) or (numnokod==3))) then
break -- to join mark -- good
end--if
numukurran = string.len (strqooq)
if ((numukurran<2) or (numukurran>10)) then
booisvaladv = false
break -- to join mark -- evil
end--if
if (not boolonkg and (numukurran>3)) then
booisvaladv = false
break -- to join mark -- evil
end--if
numindeex = 0
while (true) do -- ordinary inner loop over char:s
if (numindeex>=numukurran) then
break -- done -- good
end--if
numchiiar = string.byte (strqooq,(numindeex+1),(numindeex+1))
booislclc = lftestlc(numchiiar)
booisdigi = lftestnum(numchiiar)
booisdash = (numchiiar==45)
if (not (booislclc or booisdigi or booisdash)) then
booisvaladv = false
break -- to join mark -- inherently bad char
end--if
if (booislclc) then
numadjlet = numadjlet + 1
else
numadjlet = 0
end--if
if (booisdigi and ((numukurran~=3) or (numindeex~=1) or (not boodigit))) then
booisvaladv = false
break -- to join mark -- illegal digit
end--if
if (booisdash) then
if ((numukurran<4) or (numindeex==0) or ((numindeex+1)==numukurran)) then
booisvaladv = false
break -- to join mark -- illegal dash
end--if
numadjdsh = numadjdsh + 1
numtotdsh = numtotdsh + 1 -- total
else
numadjdsh = 0 -- do NOT zeroize the total !!!
end--if
if ((numadjlet>3) or (numadjdsh>1) or (numtotdsh>2)) then
booisvaladv = false
break -- to join mark -- evil
end--if
numindeex = numindeex + 1 -- ZERO-based
end--while -- ordinary inner loop over char:s
if (not boonoban) then -- if "yesban" then
numindeex = 0
while (true) do -- ordinary lower inner loop
varomongkosong = contabisbanned[numindeex+1] -- number of elem unknown
if (type(varomongkosong)~="string") then
break -- abort inner loop (then fake outer loop) due to end of table
end--if
numukurran = string.len (varomongkosong)
if ((numukurran<2) or (numukurran>3)) then
break -- abort inner loop (then fake outer loop) due to faulty table
end--if
if (strqooq==varomongkosong) then
booisvaladv = false
break -- abort inner loop (then fake outer loop) due to violation
end--if
numindeex = numindeex + 1 -- ZERO-based
end--while -- ordinary lower inner loop
end--if (not boonoban) then
break -- finally to join mark
end--while -- fake loop -- join mark
return booisvaladv
end--function lfivalidatelnkoadv
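--[==[
Example (a hedged sketch, NOT used by this module): the core rules of
"lfivalidatelnkoadv" expressed with LUA patterns instead of a char loop.
This is a swapped-in technique for illustration, it ignores the special
codes "-" and "??" and uses a hypothetical small ban table. The names
"tabdemobanned", "strdemoseg" and "lfdemovalidate" are hypothetical.

```lua
-- Hypothetical subset of the ban list, keyed for direct lookup
local tabdemobanned = { eng = true, deu = true, zxx = true }

-- Pattern for one group of 1...3 adjacent lowercase letters
local strdemoseg = "[a-z][a-z]?[a-z]?"

local function lfdemovalidate (strcode, boolong, boodigit)
  local numlen = string.len (strcode)
  if ((numlen==2) or (numlen==3)) then
    if (string.find (strcode, "^[a-z][a-z][a-z]?$")) then
      return (not tabdemobanned[strcode]) -- plain 2 or 3 letters
    end
    -- a digit is tolerated only in the middle of a 3-char code
    return ((boodigit==true) and (string.find (strcode, "^[a-z]%d[a-z]$")~=nil))
  end
  if (boolong and (numlen>=4) and (numlen<=10)) then
    -- one or two single inner dashes between groups of letters
    local strone = "^" .. strdemoseg .. "%-" .. strdemoseg .. "$"
    local strtwo = "^" .. strdemoseg .. "%-" .. strdemoseg .. "%-" .. strdemoseg .. "$"
    return ((string.find (strcode, strone)~=nil) or (string.find (strcode, strtwo)~=nil))
  end
  return false
end

print (lfdemovalidate ("en", false, false))
print (lfdemovalidate ("zh-min-nan", true, false))
print (lfdemovalidate ("ab--cd", true, false))
```

The explicit class "[a-z]" is used rather than "%l" in order to stay
independent of the locale.
]==]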
------------------------------------------------------------------------
---- HIGH LEVEL FUNCTIONS [H] ----
------------------------------------------------------------------------
-- Local function LFHEXTRACT
-- Extract 3 substrings ("cy", "c0", "c1") from a CSV line beginning with the
-- "cy" part in double square brackets WITHOUT a following comma. Note that
-- excessive spaces have already been reduced, but an optional space before
-- and after every comma and after the "cy" part can still occur. There is
-- NO EOL char at the end.
-- Example of minimal imaginable line : "[[a]]0"
local function lfhextract (strline)
local strcky = ''
local strck0 = ''
local strck1 = ''
local strtump = ''
local numpanjang = 0
local numindxe = 0
local numfas = 0 -- 0:cy -- 1:c0 -- 2:c1 -- 3:abort
local numchch = 0
local numchnx = 0
local boocrap = false
local boopopospace = false
local boononempty = false
numpanjang = string.len (strline)
if (numpanjang<6) then
boocrap = true
else
boocrap = (string.sub(strline,1,2)~="[[")
end--if
if (not boocrap) then
numindxe = 2 -- ZERO-based and skipping "[["
while (true) do
if (numindxe==numpanjang) then
break -- end of string, LF EOL not used here
end--if
numchnx = 0
numchch = string.byte(strline,(numindxe+1),(numindxe+1)) -- pick
numindxe = numindxe + 1
if (numindxe<numpanjang) then
numchnx = string.byte(strline,(numindxe+1),(numindxe+1)) -- prepeek
end--if
if ((numchch==44) or ((numchch==93) and (numchnx==93))) then -- "," or "]]"
if (numchch~=44) then
numindxe = numindxe + 1 -- skip both
end--if
numfas = numfas + 1
if (numfas==3) then
break
end--if
boopopospace = false -- !!!CRUCIAL!!!
boononempty = false -- !!!CRUCIAL!!!
else
if (numchch==32) then
boopopospace = boononempty -- here trim away leading spaces too
else
if (boopopospace) then
strtump = string.char(32,numchch)
boopopospace = false -- !!!CRUCIAL!!!
else
strtump = string.char(numchch)
end--if
if (numfas==0) then
strcky = strcky .. strtump -- this is slow
end--if
if (numfas==1) then
strck0 = strck0 .. strtump -- this is slow
end--if
if (numfas==2) then
strck1 = strck1 .. strtump -- this is slow
end--if
boononempty = true -- !!!CRUCIAL!!!
end--if (numchch==32) else
end--if
end--while
if (strcky=='') then
strck0 = '' -- broken "strcky" ruins "strck0" but NOT vice-versa
end--if
end--if
return strcky, strck0, strck1
end--function lfhextract
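--[==[
Example (a hedged sketch, NOT used by this module): a simplified extraction
of "cy", "c0" and "c1" using LUA patterns instead of the char loop above.
This is a swapped-in technique for illustration, it does not reproduce every
edge case of "lfhextract" (for example a comma inside the brackets). The
names "lfdemotrim" and "lfdemoextract" and the sample lines are hypothetical.

```lua
-- Strip leading and trailing whitespace
local function lfdemotrim (strin)
  return (string.match (strin, "^%s*(.-)%s*$"))
end

local function lfdemoextract (strline)
  -- "cy" sits between "[[" and the earliest "]]", the rest is CSV
  local strcy, strrest = string.match (strline, "^%[%[(.-)%]%](.*)$")
  if (not strcy) then
    return "", "", "" -- no "[[...]]" at the beginning
  end
  local tabparts = {}
  -- append a comma so that the last field is terminated like the others
  for strpart in string.gmatch (strrest .. ",", "(.-),") do
    tabparts[#tabparts+1] = lfdemotrim (strpart)
  end
  return lfdemotrim (strcy), (tabparts[1] or ""), (tabparts[2] or "")
end

print (lfdemoextract ("[[sv]] b0 , b1"))
print (lfdemoextract ("[[a]]0"))
```
]==]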
------------------------------------------------------------------------
---- VARIABLES [R] ----
------------------------------------------------------------------------
-- general table --
local tld = {} -- outer table
local tbllist = {}
local tblforward = {}
local tblreverse = {}
local tblextra = {}
-- general unknown type
local vartmp = 0 -- variable without type
-- general type "frame"
local arxframent = 0
-- general str
local strbig = '' -- big string
local strerrearl = '' -- early participant in sort crime or ...
local strerrlatt = '' -- latter participant in sort crime or ...
local strcy = ''
local strc0 = ''
local strc1 = ''
local strttmp = '' -- temp
local strliine = '' -- complete line with spaces reduced
local strprevious = ''
-- general num
local numerr = 0 -- status 0: OK -- 2: template not found ...
local numbiglen = 0 -- full
local numbeg = 0
local numend = 0
local numtrmlen = 0 -- after trimming away 2 areas but not more
local numtrmpos = 0
local numlnjlen = 0
local numchv = 0
local numchw = 0
local numcntlin = 0 -- processed lines
-- general boo
local boogotchar = false
local boopostpone = false
local boochecksort = false
------------------------------------------------------------------------
---- MAIN [Z] ----
------------------------------------------------------------------------
---- SEIZE THE INFAMOUS "FRAME" OBJECT (THERE IS NO ORDINARY CALLER) ----
arxframent = mw.getCurrentFrame () -- use this if no main function exists
---- CHECK WHETHER THE POINTED TEMPLATE EXISTS AT ALL & EXPAND IT IF SO ----
strttmp = arxframent:callParserFunction ('#ifexist:'..construstm,'1','0') -- !!!FIXME!!!
if (strttmp=='1') then
vartmp = arxframent:expandTemplate { title = construstm } -- !!!FIXME!!!
if ((type(vartmp))=='string') then
strbig = vartmp -- may be empty
end--if
else
numerr = 2 -- #E02
end--if
---- CHECK LENGTH ----
if (numerr==0) then
numbiglen = string.len(strbig)
if ((numbiglen<10) or (numbiglen>1000000)) then
numerr = 3 -- #E03
end--if
end--if
---- TRIM AWAY TWO AREAS ----
if (numerr==0) then
vartmp = 0 -- ONE-based or type "nil"
numbeg = 0 -- ONE-based thus ZERO is invalid
numend = 0 -- ONE-based thus ZERO is invalid
while (true) do -- search for all "[["
vartmp = string.find(strbig, "[[" , (vartmp+1), true)
if (not vartmp) then
break -- no more hit
end--if
if (numbeg==0) then
numbeg = vartmp
else
numend = vartmp
end--if
end--while
if ((numbeg==0) or (numend==0) or ((numbeg+5)>numend)) then -- HALF size
numerr = 4 -- #E04 failure removing two areas
else
while (true) do -- search for next LF EOL
if (numend>numbiglen) then -- "numend" is ONE-based
break -- "numend" is ONE-based and OFF-BY-ONE now
end--if
numchw = string.byte(strbig,numend,numend) -- pick
numend = numend + 1
if (numchw==10) then
break -- "numend" is ONE-based and OFF-BY-ONE now
end--if
end--while
if ((numbeg+10)>numend) then
numerr = 4 -- #E04 failure removing two areas
else
strbig = string.sub (strbig,numbeg,(numend-1))
numtrmlen = string.len (strbig)
end--if
end--if ((numbeg==0) or (numend==0) or ((numbeg+5)>numend)) else
end--if (numerr==0) then
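--[==[
Example (a hedged sketch, NOT used by this module): the trimming step above
as a standalone function. It keeps the text from the earliest "[[" up to and
including the LF EOL that follows the last "[[" (or up to the end of text if
there is no such EOL). The minimum size checks of the real code are left out.
The name "lfdemocut" and the sample text are hypothetical.

```lua
local function lfdemocut (strtext)
  local numfirst = 0 -- ONE-based position of earliest "[[", ZERO if none
  local numlast = 0 -- ONE-based position of last "[[", ZERO if fewer than two
  local varpos = 0
  while (true) do
    varpos = string.find (strtext, "[[", (varpos+1), true) -- plain search
    if (not varpos) then
      break -- no more hit
    end
    if (numfirst==0) then
      numfirst = varpos
    else
      numlast = varpos
    end
  end
  if ((numfirst==0) or (numlast==0)) then
    return nil -- at least two "[[" are required
  end
  local numeol = string.find (strtext, "\n", numlast, true) or string.len (strtext)
  return string.sub (strtext, numfirst, numeol)
end

print (lfdemocut ("intro\n[[aa]] x, y\n[[bb]] z, w\noutro"))
```
]==]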
---- PARSE THE BIG TEXT TABLE AND FILL 2 ... 4 LUA TABLES ----
if (numerr==0) then
strprevious = '' -- used to detect sorting crimes in "cy"
numtrmpos = 0 -- ZERO-based: position in the text
numcntlin = 0 -- ZERO-based: number of processed lines, index in "tbllist"
while (true) do -- outer loop over all lines
strliine = ''
boogotchar = false
while (true) do -- upper inner loop to find a line
if (numtrmpos>=numtrmlen) then
break -- upper inner loop, EOF
end--if
numchw = string.byte(strbig,(numtrmpos+1),(numtrmpos+1)) -- pick
numtrmpos = numtrmpos + 1
if ((numchw~=10) and (numchw~=32)) then -- trim leading spaces too
boogotchar = true
break -- upper inner loop, found line
end--if
end--while -- upper inner
if (not boogotchar) then
break -- outer loop, EOF
end--if
boopostpone = false
while (true) do -- inner loop to seize a line
if (numchw==10) then
break -- inner loop, captured an EOL, do NOT store it anywhere
end--if
if (numchw==32) then
boopostpone = true -- no need to trim away leading spaces here
else
if (boopostpone) then
strliine = strliine .. string.char(32,numchw) -- build the line
boopostpone = false -- !!!CRUCIAL!!!
else
strliine = strliine .. string.char(numchw) -- build the line
end--if
end--if
if (numtrmpos>=numtrmlen) then
break -- inner loop, EOF
end--if
numchw = string.byte(strbig,(numtrmpos+1),(numtrmpos+1)) -- pick
numtrmpos = numtrmpos + 1
end--while
numlnjlen = string.len(strliine)
if ((numlnjlen<6) or (numlnjlen>10000)) then
numerr = 5 -- #E05 -- bad length of single line
break -- outer loop
end--if
strcy, strc0, strc1 = lfhextract (strliine) -- !!! 3 results !!!
if ((strcy=='') or ((strc0=='') and conboohavreverse) or ((strc1=='') and conboohavextra)) then
numerr = 6 -- #E06 -- failed to extract elements
end--if
if ((strc0=='-') or ((strc1=='-') and conboohavextra and (not conbooc1allowda))) then
numerr = 12 -- #E12 -- illegal dash "-"
end--if
if (numerr~=0) then -- #E06 or #E12
numchv = string.byte(strliine,1,1)
numchw = string.byte(strliine,numlnjlen,numlnjlen)
strerrearl = strliine -- complete line
strerrlatt = "len=" .. tostring(numlnjlen) ..
" beg=" .. tostring(numchv) .. -- decimal code
" end=" .. tostring(numchw) .. -- decimal code
" rev=" .. tostring(conboohavreverse) ..
" ext=" .. tostring(conboohavextra)
break -- outer loop
end--if
if (not lfivalidatelnkoadv(strcy,0,false,false,false)) then
strerrearl = "INVALID LANG CODE" -- !!!FIXME!!! allow other types of indexes
strerrlatt = strcy
numerr = 11 -- #E11 -- illegal lang code
break -- outer loop
end--if
boochecksort = (not conbootwocynoso) or ((numtrmpos<numtrmlen) and (numcntlin>1))
if (boochecksort and (strcy<=strprevious)) then
strerrearl = strprevious
strerrlatt = strcy -- "strcy" should be bigger but is not :-(
numerr = 7 -- #E07 -- sorting crime in "cy"
break -- outer loop
end--if
if (tblforward [strcy]) then -- always prohibited, no override
strerrearl = "FORWARD DUPE"
strerrlatt = strcy
numerr = 8 -- #E08 -- forward dupe in "cy"
break -- outer loop
end--if
tbllist [numcntlin] = strcy -- always store "cy" at key/index ZERO-based
tblforward [strcy] = strliine -- always store line at key/index "cy"
if (conboohavreverse) then -- store all items, "-" prohibited
vartmp = tblreverse [strc0]
if (vartmp) then
strerrearl = "REVERSE DUPE"
strerrlatt = vartmp
numerr = 9 -- #E09 -- reverse dupe in "c0"
break -- outer loop
end--if
tblreverse [strc0] = strcy -- store "cy" at key/index "c0"
end--if
if (conboohavextra and (strc1~="-")) then -- exclude "-" from storing
vartmp = tblextra [strc1]
if (vartmp) then
strerrearl = "EXTRA DUPE"
strerrlatt = vartmp
numerr = 10 -- #E10 -- extra dupe in "c1"
break -- outer loop
end--if
tblextra [strc1] = strcy -- store "cy" at key/index "c1"
end--if
strprevious = strcy -- sorting is OBLIGATORY (with exception, see above)
numcntlin = numcntlin + 1 -- successfully stored one line
end--while -- outer loop over all lines
end--if (numerr==0) then
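--[==[
Example (a hedged sketch, NOT used by this module): the sort check and the
forward dupe check from the loop above, applied to a plain ONE-based list of
"cy" codes. With "booexempt" set, the earliest and the last item are excluded
from the sort check but never from the dupe check. The name "lfdemoorder" and
the sample lists are hypothetical.

```lua
-- Returns 0 if OK, otherwise 7 (sorting crime) or 8 (forward dupe)
-- followed by the ONE-based index of the offending item
local function lfdemoorder (tabcodes, booexempt)
  local tabseen = {}
  local numtotal = #tabcodes
  for numidx, strcode in ipairs (tabcodes) do
    -- item 2 is compared against the exempted item 1, so it is skipped too
    local booskip = booexempt and ((numidx<=2) or (numidx==numtotal))
    if ((numidx>1) and (not booskip) and (strcode<=tabcodes[numidx-1])) then
      return 7, numidx
    end
    if (tabseen[strcode]) then
      return 8, numidx
    end
    tabseen[strcode] = true
  end
  return 0
end

print (lfdemoorder ({"zz","aa","bb","cc","ab"}, true))
print (lfdemoorder ({"aa","cc","bb","dd","ee"}, true))
print (lfdemoorder ({"zz","aa","bb","zz"}, true))
```

The separate dupe check is what catches a repeated code in an exempted
position, which the sort check alone would miss.
]==]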
if (numerr==0) then
if ((numcntlin<2) or (numcntlin>10000)) then
numerr = 13 -- #E13 -- bad number of lines
end--if
end--if
---- RETURN THE JUNK ----
tld[0] = numerr
tld[1] = numbiglen
tld[2] = numtrmlen
tld[3] = numtrmpos -- equal "numtrmlen" on success
tld[4] = numcntlin
tld[5] = strerrearl -- empty or whining
tld[6] = strerrlatt -- empty or whining
if (numerr==0) then
tld[7] = tbllist -- always on success
tld[8] = tblforward -- always on success
if (conboohavreverse) then
tld[9] = tblreverse -- can be disabled
end--if
if (conboohavextra) then
tld[10] = tblextra -- can be disabled
end--if
end--if
return tld -- only allowed data type is "table"