Unicode::LineBreak - UAX #14 Unicode Line Breaking Algorithm
    use Unicode::LineBreak;
    $lb = Unicode::LineBreak->new();
    $broken = $lb->break($string);
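Unicode::LineBreak performs the Unicode line breaking algorithm described in Unicode Standard Annex #14 [UAX #14]. When determining breaking positions, it also takes into account the East_Asian_Width informative property defined in Annex #11 [UAX #11].
The following terms are used for convenience.
A mandatory break is a line breaking behavior prescribed by the basic rules; it is performed obligatorily, regardless of the surrounding characters. An arbitrary break is a line breaking behavior allowed by the basic rules and performed only if the user chooses to. The arbitrary breaks defined by [UAX #14] are of two kinds: direct breaks and indirect breaks.
Alphabetic characters usually do not allow line breaks between one another; break opportunities for them are provided only by other characters. Ideographic characters usually allow line breaks both before and after themselves. [UAX #14] classifies most alphabetic characters as AL and most ideographic characters as ID (these terms are inaccurate from the standpoint of the study of writing systems). In some scripts, breaking positions cannot be determined from individual characters alone, so heuristic methods based on dictionaries are used.
The number of columns of a string is not always equal to the number of characters it contains. Each character is wide, narrow, or nonspacing, occupying 2, 1, or 0 columns respectively. Some characters may be either wide or narrow depending on the context in which they are used. Through customization, characters may take a wider variety of widths.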
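As a rough illustration of the column counts described above, the sketch below measures three arbitrary strings with Unicode::GCString, the grapheme cluster string class used by this module. It assumes the default settings (the "NONEASTASIAN" context). Note that the length() method of Unicode::GCString counts grapheme clusters, whereas Perl's built-in length counts code points.

```perl
use strict;
use warnings;
use Unicode::GCString;

# Arbitrary samples: wide ideographs, narrow Latin letters, and a letter
# followed by a nonspacing combining mark.
my %samples = (
    wide      => "\x{6F22}\x{5B57}",   # two CJK ideographs (East_Asian_Width W)
    narrow    => "ab",                 # two Latin letters (East_Asian_Width Na)
    combining => "e\x{301}",           # 'e' + COMBINING ACUTE ACCENT (nonspacing)
);

my %cols;
for my $name (sort keys %samples) {
    my $gcs = Unicode::GCString->new($samples{$name});
    $cols{$name} = $gcs->columns;      # display columns
    printf "%-9s code points=%d clusters=%d columns=%d\n",
        $name,
        length($samples{$name}),       # Perl's length: code points
        $gcs->length,                  # grapheme clusters
        $gcs->columns;
}
```

Under these assumptions the two ideographs should take 4 columns, the two Latin letters 2, and the accented letter 1, since its two code points form a single grapheme cluster.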
new ([KEY => VALUE, ...])
Constructor. For the KEY => VALUE pairs, see "Options".
break (STRING)
Instance method. Breaks the Unicode string STRING and returns the result. In array context, returns an array of the resulting lines.
break_partial (STRING)
Instance method. Same as break(), but for input that is supplied a little at a time. To indicate that the input is complete, pass "undef" as the STRING argument.
config (KEY)
config (KEY => VALUE, ...)
Instance method. Gets or changes settings. For the KEY => VALUE pairs, see "Options".
copy
Copy constructor. Creates a clone of the object instance.
breakingRule (BEFORESTR, AFTERSTR)
Instance method. Gets the line breaking behavior between the strings BEFORESTR and AFTERSTR. For the return values, see "Constants".
Note: This method only returns a value that roughly represents the line breaking behavior. To actually fold text, use break() or a similar method.
context ([Charset => CHARSET], [Language => LANGUAGE])
Function. Gets the language/region context in which the character set CHARSET and the language code LANGUAGE are used.
"new"ã"config"
ã®ä¸¡ã¡ã½ããã«ã¯ä»¥ä¸ã®å¯¾ãæå®ã§ããã
æ¡æ°ã®ç®åº
([E])ãæ¸è¨ç´ ã¯ã©ã¹ã¿åç¯
([G]) (Unicode::GCString˜[ja]
ãåç§)ãè¡åå²åä½
([L])
ã«å½±é¿ãããã®ãããã
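BreakIndent => "YES" | "NO"
[L] Always allows a break after a sequence of SPACEs at the beginning of a line (indentation). [UAX #14] does not take this usage of SPACE into account. The default is "YES".
Note: This option was introduced in release 1.011.
CharMax => NUMBER
[L] Maximum number of characters a line may contain, excluding trailing spaces and the newline string. Note that the number of characters generally does not represent the length of a line. The default is 998. It cannot be set to 0.
ColMin => NUMBER
[L] Minimum number of columns of a line broken at an arbitrary break, not including the newline string and trailing spaces. The default is 0.
ColMax => NUMBER
[L] Maximum number of columns of a line, not including the newline string and trailing spaces; in other words, the maximum length of a line. The default is 76.
See also the "Urgent" option and "User-defined line breaking behavior".
ComplexBreaking => "YES" | "NO"
[L] Performs heuristic line breaking in complex South East Asian contexts. The default is "YES" if word segmentation for South East Asian writing systems is enabled.
Context => CONTEXT
[E][L] Specifies the language/region context. Currently available contexts are "EASTASIAN" and "NONEASTASIAN". The default context is "NONEASTASIAN".
In the "EASTASIAN" context, characters whose East_Asian_Width property is ambiguous (A) are regarded as "wide", and characters whose line breaking class is AI are regarded as ideographic (ID).
In the "NONEASTASIAN" context, characters whose East_Asian_Width property is ambiguous (A) are regarded as "narrow", and characters whose line breaking class is AI are regarded as alphabetic (AL).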
EAWidth => [ ORD => PROPERTY ]
EAWidth => undef
[E] Tailors the East_Asian_Width property of individual characters. ORD is the UCS index value of a character, or a reference to an array of such values. PROPERTY is one of the East_Asian_Width property values or extended values (see "Constants"). This option may be specified more than once. Specifying "undef" cancels all tailoring made so far.
By default, the East_Asian_Width property is not tailored. See "Tailoring character properties".
Format => METHOD
[L] Specifies how to format broken lines.
"SIMPLE"
Default method. Just inserts newlines at arbitrary breaking positions.
"NEWLINE"
Replaces newlines with the string specified by the "Newline" option. Removes spaces before newlines and at the end of the text. Adds a newline at the end of the text if there is none.
"TRIM"
Inserts newlines at arbitrary breaking positions. Removes spaces before newlines.
"undef"
Does nothing (not even inserting newlines).
Subroutine reference
See "Formatting lines".
HangulAsAL => "YES" | "NO"
[L] Treats Hangul syllables and conjoining jamo as alphabetic characters (AL). The default is "NO".
LBClass => [ ORD => CLASS ]
LBClass => undef
[G][L] Tailors the line breaking property (class) of individual characters. ORD is the UCS index value of a character, or a reference to an array of such values. CLASS is one of the line breaking property values (see "Constants"). This option may be specified more than once. Specifying "undef" cancels all tailoring made so far.
By default, the line breaking property is not tailored. See "Tailoring character properties".
LegacyCM => "YES" | "NO"
[G][L] Treats a combining character preceded by a space as an isolated combining character (ID). Since Unicode 5.0, this use of the space character is not recommended. The default is "YES".
Newline => STRING
[L] Unicode string to be used as the newline string. The default is "\n".
Prep => METHOD
[L] Adds user-defined line breaking behavior.
This option may be specified more than once.
METHOD may be one of the following:
"NONBREAKURI"
Does not break URIs.
"BREAKURI"
Breaks URIs according to rules suited to printed material. For details, see sections 6.17 and 17.11 of [CMOS].
[ REGEX, SUBREF ]
Breaks strings matching the regular expression REGEX with the subroutine referenced by SUBREF. For details, see "User-defined line breaking behavior".
"undef"
Cancels all behaviors added so far.
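Sizing => METHOD
[L] Specifies how to calculate the length of a string.
The following options are available:
"UAX11"
Default method. Calculates the number of columns of characters using the built-in character database.
"undef"
Returns the number of grapheme clusters (see Unicode::GCString) contained in the string.
Subroutine reference
See "Calculating string size".
See also the "ColMax", "ColMin", and "EAWidth" options.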
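Urgent => METHOD
[L] Specifies how to handle excessively long lines.
The following options are available:
"CROAK"
Prints an error message and dies.
"FORCE"
Forcibly breaks excessively long strings.
"undef"
Default method. Does not break excessively long strings.
Subroutine reference
See "User-defined line breaking behavior".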
ViramaAsJoiner => "YES" | "NO"
[G] Does not separate a virama sign ("halant" in Hindi, "coeng" in Khmer script) from the character that follows it. The default is "YES". Note: This option was introduced in release 2011.001_29. In earlier releases it was fixed to "NO". This behavior is not part of the "default" grapheme clusters defined in [UAX #29].
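As a sketch of how a few of these options combine, the following folds an arbitrary sample that contains one word longer than ColMax, comparing the default Urgent behavior with "FORCE" and "CROAK". "NEWLINE" formatting is used throughout, so trailing spaces are removed and the result ends with a newline.

```perl
use strict;
use warnings;
use Unicode::LineBreak;

# One 34-letter word is longer than ColMax; everything else is short.
my $text = "Supercalifragilisticexpialidocious is long. " . "word " x 8;
my %opt  = (ColMax => 20, Format => 'NEWLINE');

# Default Urgent (undef): the long word is left unbroken on its own line.
my $out = Unicode::LineBreak->new(%opt)->break($text);

# Urgent => "FORCE": the long word is cut so that no line exceeds ColMax.
my $forced = Unicode::LineBreak->new(%opt, Urgent => 'FORCE')->break($text);

# Urgent => "CROAK": the long word is fatal; eval traps the error here.
my $croaked = !eval {
    Unicode::LineBreak->new(%opt, Urgent => 'CROAK')->break($text);
    1;
};

print $out, "----\n", $forced;
print "CROAK died: $@" if $croaked;
```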
"EA_Na", "EA_N", "EA_A", "EA_W", "EA_H", "EA_F"
The six East_Asian_Width property values defined by [UAX #11]: narrow (Na), neutral (N), ambiguous (A), wide (W), halfwidth (H), and fullwidth (F).
"EA_Z"
Value of the East_Asian_Width property for nonspacing characters.
Note: This "nonspacing" value is an extension by this module and is not part of [UAX #11].
"LB_BK", "LB_CR", "LB_LF", "LB_NL", "LB_SP", "LB_OP", "LB_CL", "LB_CP",
"LB_QU", "LB_GL", "LB_NS", "LB_EX", "LB_SY", "LB_IS", "LB_PR", "LB_PO",
"LB_NU", "LB_AL", "LB_HL", "LB_ID", "LB_IN", "LB_HY", "LB_BA", "LB_BB",
"LB_B2", "LB_CB", "LB_ZW", "LB_CM", "LB_WJ", "LB_H2", "LB_H3", "LB_JL",
"LB_JV", "LB_JT", "LB_SG", "LB_AI", "LB_CJ", "LB_SA", "LB_XX", "LB_RI"
The 40 line breaking property values (classes) defined by [UAX #14].
Note: The property value CP was introduced in Unicode 5.2.0. The property values HL and CJ were introduced in Unicode 6.1.0. The property value RI was introduced in Unicode 6.2.0.
"MANDATORY", "DIRECT", "INDIRECT", "PROHIBITED"
Four values representing line breaking behavior: mandatory break; both direct and indirect breaks allowed; indirect break allowed but direct break prohibited; break prohibited.
"Unicode::LineBreak::SouthEastAsian::supported"
A flag indicating whether word segmentation for South East Asian writing systems is enabled. If this feature is enabled, it is a non-empty string; otherwise it is "undef".
Note: The current release supports only the Thai script as used for modern Thai.
"UNICODE_VERSION"
A string indicating the version of the Unicode Standard this module refers to.
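A small sketch of breakingRule() together with the behavior constants above, using two arbitrary character pairs. It assumes that importing the ":all" tag makes MANDATORY, DIRECT, INDIRECT, PROHIBITED, and UNICODE_VERSION available. Under the default rules, two Latin letters (AL followed by AL) are expected to give PROHIBITED, and two CJK ideographs (ID followed by ID) DIRECT.

```perl
use strict;
use warnings;
use Unicode::LineBreak qw(:all);

# Names for the four behavior constants, keyed by their values.
my %name = (
    MANDATORY()  => 'MANDATORY',
    DIRECT()     => 'DIRECT',
    INDIRECT()   => 'INDIRECT',
    PROHIBITED() => 'PROHIBITED',
);

my $lb = Unicode::LineBreak->new();

# Arbitrary pairs: two Latin letters and two CJK ideographs.
my %rule = (
    latin => $lb->breakingRule("a", "b"),
    cjk   => $lb->breakingRule("\x{6F22}", "\x{5B57}"),
);

for my $pair (sort keys %rule) {
    my $r = $rule{$pair};
    printf "%-5s => %s\n", $pair, defined $r ? $name{$r} : 'undef';
}
printf "UNICODE_VERSION: %s\n", UNICODE_VERSION();
```

As the note on breakingRule() says, these values only approximate the behavior; break() remains the way to fold real text.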
"Format" ãªãã·ã§ã³ã«ãµãã«ã¼ãã³ã¸ã®åç§ãæå®ããå ´åããã®ãµãã«ã¼ãã³ã¯ 3 ã¤ã®å¼æ°ãåããªããã°ãªããªãã
    $modified = &subroutine(SELF, EVENT, STR);
SELF is the Unicode::LineBreak object, EVENT is a string describing the context in which the subroutine was called, and STR is a fragment of the Unicode string before or after the breaking position.
    EVENT |When fired             |Value of STR
    -----------------------------------------------------------------
    "sot" |Beginning of text      |Fragment of first line
    "sop" |After mandatory break  |Fragment of next line
    "sol" |After arbitrary break  |Fragment of continuing line
    ""    |Just before a break    |Complete line (without trailing SPACEs)
    "eol" |Arbitrary break        |SPACEs before breaking position
    "eop" |Mandatory break        |Newline and SPACEs before it
    "eot" |End of text            |SPACEs (and newline) at end of text
    -----------------------------------------------------------------
The subroutine must return the fragment of text, modified as needed; to indicate that nothing was modified, it may return "undef". Note that modifications made in the "sot", "sop", and "sol" contexts affect the determination of subsequent breaking positions, while modifications made in the other contexts do not.
Caution: The string arguments are actually grapheme cluster strings. See Unicode::GCString.
For example, the following code folds lines while removing trailing spaces:
    sub fmt {
        if ($_[1] =~ /^eo/) {
            return "\n";
        }
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&fmt);
    $output = $lb->break($text);
When a line produced by an arbitrary break is expected to violate one of the limits set by CharMax, ColMax, or ColMin, an urgent break can be performed on the following string. If a subroutine reference is given to the "Urgent" option, the subroutine must accept two arguments:
    @broken = &subroutine(SELF, STR);
SELF is the Unicode::LineBreak object, and STR is the Unicode string to be broken.
The subroutine must return an array of the strings resulting from breaking STR.
Caution: The string argument is actually a grapheme cluster string. See Unicode::GCString.
For example, the following code inserts hyphens into the names of certain chemical substances (such as titin) so that they can be folded:
    sub hyphenize {
        # Split after each "...yl" unit followed by more letters; hyphenate those units.
        return map {$_ =~ s/yl$/yl-/; $_} split /(\w+?yl(?=\w))/, $_[1];
    }
    my $lb = Unicode::LineBreak->new(Urgent => \&hyphenize);
    $output = $lb->break("Methionylthreonylthreonylglutaminylarginyl...");
If an array reference [REGEX, SUBREF] is given to the "Prep" option, the subroutine must accept two arguments:
    @broken = &subroutine(SELF, STR);
SELF is the Unicode::LineBreak object, and STR is the Unicode string, matched by REGEX, that is to be broken.
The subroutine must return an array of the strings resulting from breaking STR.
For example, the following code breaks HTTP URLs using the [CMOS] rules:
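    my $url = qr{http://[\x21-\x7E]+}i;

    sub breakurl {
        my $self = shift;
        my $str = shift;
        # Break after a "/" not followed by another "/", before "-", "~", ".",
        # ",", "_", "?", "#", "%", "=" or "&" (unless preceded by "-" or "."),
        # and after "=" or "&".
        return split m{(?<=[/]) (?=[^/]) |
                       (?<=[^-.]) (?=[-~.,_?\#%=&]) |
                       (?<=[=&]) (?=.)}x, $str;
    }
    my $lb = Unicode::LineBreak->new(Prep => [$url, \&breakurl]);
    $output = $lb->break($string);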
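For comparison with the custom [REGEX, SUBREF] example above, the built-in "NONBREAKURI" method can be selected by name. This sketch uses an arbitrary example.com URL and a small ColMax, and checks only that the URL stays intact under NONBREAKURI; without it, the default rules may break the URL, for instance after a slash.

```perl
use strict;
use warnings;
use Unicode::LineBreak;

my $url  = "http://example.com/a/rather/long/path/to/some/resource.html";
my $text = "See $url for details.";

# Default rules: the URL may be broken wherever UAX #14 allows (e.g. after "/").
my $plain = Unicode::LineBreak->new(ColMax => 20)->break($text);

# Prep => "NONBREAKURI": the URL is kept together as one unit.
my $kept = Unicode::LineBreak->new(ColMax => 20, Prep => 'NONBREAKURI')
    ->break($text);

print $plain, "\n----\n", $kept, "\n";
```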
Preserving state
A Unicode::LineBreak object behaves as a hash reference. Arbitrary elements can be stored in it for the lifetime of the object.
For example, the following code separates paragraphs with blank lines:
    sub paraformat {
        my $self = shift;
        my $action = shift;
        my $str = shift;

        if ($action eq 'sot' or $action eq 'sop') {
            $self->{'line'} = '';          # new paragraph: no line seen yet
        } elsif ($action eq '') {
            $self->{'line'} = $str;        # remember the current line
        } elsif ($action eq 'eol') {
            return "\n";
        } elsif ($action eq 'eop') {
            # After a non-empty line, end the paragraph with a blank line.
            if (length $self->{'line'}) {
                return "\n\n";
            } else {
                return "\n";
            }
        } elsif ($action eq 'eot') {
            return "\n";
        }
        return undef;
    }
    my $lb = Unicode::LineBreak->new(Format => \&paraformat);
    $output = $lb->break($string);
If a subroutine reference is given to the "Sizing" option, the subroutine must accept five arguments:
    $cols = &subroutine(SELF, LEN, PRE, SPC, STR);
SELF is the Unicode::LineBreak object, LEN is the length of the preceding string, PRE is the preceding Unicode string, SPC is the SPACEs to be appended, and STR is the Unicode string to be processed.
The subroutine must calculate and return the number of columns of "PRE.SPC.STR". The number of columns need not be an integer, and its unit may be chosen freely, but it must agree with the unit used by the "ColMin" and "ColMax" options.
Caution: The string arguments are actually grapheme cluster strings. See Unicode::GCString.
For example, the following code processes lines as if they had a tab stop every 8 columns:
    use Unicode::LineBreak qw(:all);    # for the LB_SP constant

    sub tabbedsizing {
        my ($self, $cols, $pre, $spc, $str) = @_;
        my $spcstr = $spc.$str;
        # Consume leading SPACE-class clusters, advancing tabs to the next stop.
        while ($spcstr->lbc == LB_SP) {
            my $c = $spcstr->item(0);
            if ($c eq "\t") {
                $cols += 8 - $cols % 8;
            } else {
                $cols += $c->columns;
            }
            $spcstr = $spcstr->substr(1);
        }
        $cols += $spcstr->columns;
        return $cols;
    }
    my $lb = Unicode::LineBreak->new(LBClass => [ord("\t") => LB_SP],
                                     Sizing => \&tabbedsizing);
    $output = $lb->break($string);
The "LBClass" and "EAWidth" options can tailor the line breaking property (class) and the East_Asian_Width property of individual characters. Several constants are defined to make such tailoring convenient.
Line breaking properties
Kana and other non-starters
By default, some kana and kana-like characters are treated as non-starters (NS or CJ). Specifying the following pairs in the LBClass option makes these characters be treated as ordinary ideographic characters (ID).
"KANA_NONSTARTERS() => LB_ID"
All of the characters below.
"IDEOGRAPHIC_ITERATION_MARKS() => LB_ID"
Ideographic iteration marks: U+3005 IDEOGRAPHIC ITERATION MARK, U+303B VERTICAL IDEOGRAPHIC ITERATION MARK, U+309D HIRAGANA ITERATION MARK, U+309E HIRAGANA VOICED ITERATION MARK, U+30FD KATAKANA ITERATION MARK, U+30FE KATAKANA VOICED ITERATION MARK.
Note: Some of these are not kana.
"KANA_SMALL_LETTERS() => LB_ID"
"KANA_PROLONGED_SOUND_MARKS() => LB_ID"
Small kana. Small hiragana: U+3041 ぁ, U+3043 ぃ, U+3045 ぅ, U+3047 ぇ, U+3049 ぉ, U+3063 っ, U+3083 ゃ, U+3085 ゅ, U+3087 ょ, U+308E ゎ, U+3095 ゕ, U+3096 ゖ. Small katakana: U+30A1 ァ, U+30A3 ィ, U+30A5 ゥ, U+30A7 ェ, U+30A9 ォ, U+30C3 ッ, U+30E3 ャ, U+30E5 ュ, U+30E7 ョ, U+30EE ヮ, U+30F5 ヵ, U+30F6 ヶ. Katakana Phonetic Extensions: U+31F0 ㇰ - U+31FF ㇿ. Halfwidth small katakana: U+FF67 ｧ - U+FF6F ｯ.
Prolonged sound marks: U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK, U+FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK.
Note: These characters are treated sometimes as non-starters and sometimes as ordinary ideographic characters. See [JIS X 4051] 6.1.1, [JLREQ] 3.1.7, and [UAX #14].
Note: U+3095 ゕ, U+3096 ゖ, U+30F5 ヵ, and U+30F6 ヶ are considered not to be kana.
"MASU_MARK() => LB_ID"
U+303C MASU MARK.
Note: This character is not kana, but it is usually used as an abbreviation of "masu" (ます or マス).
Note: [UAX #14] classifies this character as a non-starter (NS), whereas [JIS X 4051] classifies it as character class (13) and [JLREQ] as cl-19, both of which correspond to ID.
Ambiguous quotation marks
By default, some marks are treated as ambiguous quotation marks (QU).
"BACKWARD_QUOTES() => LB_OP, FORWARD_QUOTES() => LB_CL"
Some languages (Dutch, English, Italian, Portuguese, Spanish, Turkish, and most East Asian languages) use rotated-9-shaped quotation marks (‘ “) as opening marks and 9-shaped quotation marks (’ ”) as closing marks.
"FORWARD_QUOTES() => LB_OP, BACKWARD_QUOTES() => LB_CL"
Some other languages (Czech, German, and Slovak) use 9-shaped quotation marks (’ ”) as opening marks and rotated-9-shaped quotation marks (‘ “) as closing marks.
"BACKWARD_GUILLEMETS() => LB_OP, FORWARD_GUILLEMETS() => LB_CL"
French, Greek, Russian, and others use left-pointing guillemets (« ‹) as opening marks and right-pointing guillemets (» ›) as closing marks.
"FORWARD_GUILLEMETS() => LB_OP, BACKWARD_GUILLEMETS() => LB_CL"
German and Slovak use right-pointing guillemets (» ›) as opening marks and left-pointing guillemets (« ‹) as closing marks.
Danish, Finnish, Norwegian, and Swedish use 9-shaped quotation marks and right-pointing guillemets (’ ” » ›) as both opening and closing marks.
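Ideographic space
"IDEOGRAPHIC_SPACE() => LB_BA"
Prevents U+3000 IDEOGRAPHIC SPACE from appearing at the beginning of a line. This is the default behavior.
"IDEOGRAPHIC_SPACE() => LB_ID"
Allows the ideographic space to appear at the beginning of a line. This was the default behavior through Unicode 6.2.
"IDEOGRAPHIC_SPACE() => LB_SP"
Prevents the ideographic space from appearing at the beginning of a line and lets it hang past the end of a line.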
East_Asian_Width property
Some characters of the Latin, Greek, and Cyrillic scripts have the ambiguous (A) East_Asian_Width property, so they are treated as wide characters in the "EASTASIAN" context. Specifying "EAWidth => [ AMBIGUOUS_*() => EA_N ]" makes such characters always be treated as narrow.
"AMBIGUOUS_ALPHABETICS() => EA_N"
Treats all of the characters below as having the East_Asian_Width property N (neutral).
"AMBIGUOUS_CYRILLIC() => EA_N"
"AMBIGUOUS_GREEK() => EA_N"
"AMBIGUOUS_LATIN() => EA_N"
Treats characters of the Cyrillic, Greek, and Latin scripts whose width is ambiguous (A) as neutral (N).
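The following sketch shows how the Context option and AMBIGUOUS_GREEK interact for U+03B1 GREEK SMALL LETTER ALPHA, whose East_Asian_Width is ambiguous (A). It measures the character with Unicode::GCString, passing a Unicode::LineBreak object as the second constructor argument so that the object's settings apply; it also assumes the ":all" tag exports AMBIGUOUS_GREEK and EA_N. The expected widths are 1 column in "NONEASTASIAN", 2 in "EASTASIAN", and 1 again in "EASTASIAN" once the tailoring is applied.

```perl
use strict;
use warnings;
use Unicode::LineBreak qw(:all);
use Unicode::GCString;

my $alpha = "\x{3B1}";    # GREEK SMALL LETTER ALPHA, East_Asian_Width A

my %lb = (
    noneastasian => Unicode::LineBreak->new(Context => 'NONEASTASIAN'),
    eastasian    => Unicode::LineBreak->new(Context => 'EASTASIAN'),
    tailored     => Unicode::LineBreak->new(
        Context => 'EASTASIAN',
        EAWidth => [AMBIGUOUS_GREEK() => EA_N],   # ambiguous Greek -> neutral
    ),
);

my %cols;
for my $k (sort keys %lb) {
    $cols{$k} = Unicode::GCString->new($alpha, $lb{$k})->columns;
    printf "%-12s %d\n", $k, $cols{$k};
}
```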
On the other hand, a few characters are classified as narrow (Na) by the Unicode Standard because they have fullwidth (F) compatibility characters, even though many implementations of East Asian coded character sets have often rendered them as wide. Specifying the following in the EAWidth option makes these characters be treated as wide in the "EASTASIAN" context as well.
"QUESTIONABLE_NARROW_SIGNS() => EA_A"
U+00A2 CENT SIGN, U+00A3 POUND SIGN, U+00A5 YEN SIGN (or yuan sign), U+00A6 BROKEN BAR, U+00AC NOT SIGN, U+00AF MACRON.
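A sketch of QUESTIONABLE_NARROW_SIGNS for U+00A5 YEN SIGN, whose East_Asian_Width is narrow (Na). In the "EASTASIAN" context it is expected to occupy 1 column by default and 2 once it is tailored to EA_A. The sketch assumes that the Unicode::GCString constructor accepts a Unicode::LineBreak object as its second argument and that the ":all" tag exports the constants used.

```perl
use strict;
use warnings;
use Unicode::LineBreak qw(:all);
use Unicode::GCString;

my $yen = "\x{A5}";    # YEN SIGN, East_Asian_Width Na

my $eastasian = Unicode::LineBreak->new(Context => 'EASTASIAN');
my $tailored  = Unicode::LineBreak->new(
    Context => 'EASTASIAN',
    EAWidth => [QUESTIONABLE_NARROW_SIGNS() => EA_A],   # Na -> ambiguous
);

my $cols_default  = Unicode::GCString->new($yen, $eastasian)->columns;
my $cols_tailored = Unicode::GCString->new($yen, $tailored)->columns;
print "default: $cols_default, tailored: $cols_tailored\n";
```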
"new" ã¡ã½ããããã³ "config" ã¡ã½ããã®ãªãã·ã§ã³å¼æ°ã®çµã¿è¾¼ã¿åæå¤ã¯ã è¨- å®ãã¡ã¤ã«ã§ä¸æ¸ãã§ããã Unicode/LineBreak/Defaults.pmã 詳細㯠Unicode/LineBreak/Defaults.pm.sample ãèªãã§ã»ããã
Please report bugs or buggy behavior to the developer.
CPAN Request Tracker: <http://rt.cpan.org/Public/Dist/Display.html?Name=Unicode-LineBreak>.
Consult the $VERSION variable.
2012.06
• The eawidth() method was discontinued. "columns" in Unicode::GCString may be used instead.
• The lbclass() method was discontinued. Use "lbc" in Unicode::GCString and "lbcext" in Unicode::GCString instead.
The character property values used in this module are based on version 8.0.0 of the Unicode Standard.
This module is intended to implement conformance level UAX14-C2.
• Some ideographic characters may be treated either as NS or as ID, by choice.
• Hangul syllables and conjoining jamo may be treated either as ID or as AL, by choice.
• Characters classified as AI may be resolved either to AL or to ID, by choice.
• Characters classified as CB are not resolved.
• Characters classified as CJ are always resolved to NS. A more flexible tailoring mechanism is provided.
• If word segmentation for South East Asian writing systems is not supported, characters classified as SA are resolved to AL. However, characters whose Grapheme_Cluster_Break property value is Extend or SpacingMark are resolved to CM.
• Characters classified as SG or XX are resolved to AL.
• Code points in the following UCS ranges have fixed property values even if no characters are assigned to them.
    Range              | UAX #14  | UAX #11    | Description
    ---------------------------------------------------------------------
    U+20A0..U+20CF     | PR [*1]  | N [*2]     | Currency symbols
    U+3400..U+4DBF     | ID       | W          | CJK ideographs
    U+4E00..U+9FFF     | ID       | W          | CJK ideographs
    U+D800..U+DFFF     | AL (SG)  | N          | Surrogates
    U+E000..U+F8FF     | AL (XX)  | F or N (A) | Private use
    U+F900..U+FAFF     | ID       | W          | CJK ideographs
    U+20000..U+2FFFD   | ID       | W          | CJK ideographs
    U+30000..U+3FFFD   | ID       | W          | Old hanzi
    U+F0000..U+FFFFD   | AL (XX)  | F or N (A) | Private use
    U+100000..U+10FFFD | AL (XX)  | F or N (A) | Private use
    Other unassigned   | AL (XX)  | N          | Unassigned, reserved, or noncharacters
    ---------------------------------------------------------------------
    [*1] Except U+20A7 PESETA SIGN (PO), U+20B6 LIVRE TOURNOIS SIGN (PO), U+20BB NORDIC MARK SIGN (PO), and U+20BE LARI SIGN (PO).
    [*2] Except U+20A9 WON SIGN (H) and U+20AC EURO SIGN (F or N (A)).
• Characters whose General_Category property is Mn, Me, Cc, Cf, Zl, or Zp are treated as nonspacing characters.
[CMOS]
The Chicago Manual of Style, 15th edition. University of Chicago Press, 2003.
[JIS X 4051]
JIS X 4051:2004 日本語文書の組版方法 (Formatting rules for Japanese documents). Japanese Standards Association, 2004.
[JLREQ]
Yasuhiro Anan et al. 日本語組版処理の要件 (Requirements for Japanese Text Layout), W3C Working Group Note, 3 April 2012. <http://www.w3.org/TR/2012/NOTE-jlreq-20120403/ja/>.
[UAX #11]
A. Freytag (ed.) (2008-2009). Unicode Standard Annex #11: East Asian Width, Revisions 17-19. <http://unicode.org/reports/tr11/>.
[UAX #14]
A. Freytag and A. Heninger (eds.) (2008-2015). Unicode Standard Annex #14: Unicode Line Breaking Algorithm, Revisions 22-35. <http://unicode.org/reports/tr14/>.
[UAX #29]
Mark Davis (ed.) (2009-2013). Unicode Standard Annex #29: Unicode Text Segmentation, Revisions 15-23. <http://www.unicode.org/reports/tr29/>.
Text::LineFold, Text::Wrap, Unicode::GCString.
Copyright (C) 2009-2018 Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.