Unicode::GCString~[ja] - String as a Sequence of UAX #29 Grapheme Clusters
use Unicode::GCString;
$gcstring = Unicode::GCString->new($string);
Unicode::GCString treats a Unicode string as a sequence of extended grapheme clusters, as defined by Unicode Standard Annex #29 [UAX #29].
A grapheme cluster is a sequence of one or more Unicode characters consisting of one grapheme base plus optional grapheme extenders and/or "prepend" characters. It is close to what people perceive as a single character.
Constructors
new (STRING, [KEY => VALUE, ...])
new (STRING, [LINEBREAK])
Constructor. Creates a new grapheme cluster string (a Unicode::GCString object) from the Unicode string STRING.
For the KEY => VALUE pairs, see "Options" in Unicode::LineBreak~[ja]. In the second form, the Unicode::LineBreak~[ja] object LINEBREAK determines how the string is segmented.
Note: The first form was introduced in release 2012.10.
copy
Copy constructor. Creates a copy of the grapheme cluster string. In the new string, the next position is set to the beginning.
Sizes
chars
Instance method. Returns the number of Unicode characters in the grapheme cluster string, that is, its length as a Unicode string.
columns
Instance method. Returns the number of columns of the grapheme cluster string, as determined by the built-in character database. For details, see "DESCRIPTION" in Unicode::LineBreak~[ja].
length
Instance method. Returns the number of grapheme clusters in the grapheme cluster string.
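The three size methods answer different questions, which a short sketch may make clearer. The sample string is hypothetical; the column count depends on the built-in character database, so it is printed rather than asserted here.

```perl
use strict;
use warnings;
use Unicode::GCString;

# "e" followed by COMBINING ACUTE ACCENT, then two CJK ideographs.
my $gc = Unicode::GCString->new("e\x{0301}\x{6F22}\x{5B57}");

my $chars   = $gc->chars;      # Unicode characters (code points): 4
my $length  = $gc->length;     # grapheme clusters: 3
my $columns = $gc->columns;    # display columns per the built-in database

print "chars=$chars length=$length columns=$columns\n";
```

Use length() when counting what a reader sees as characters, and columns() when laying text out in a fixed-width display.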
Operations as a String
as_string
"OBJECT"
Instance method. Explicitly converts the grapheme cluster string to a Unicode string.
cmp (STRING)
STRING cmp STRING
Instance method. Compares the strings. There is nothing unusual about it. Either STRING may be a Unicode string.
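A small sketch of explicit conversion and comparison, using hypothetical sample strings. It assumes only the as_string and cmp methods and the overloaded cmp operator described above.

```perl
use strict;
use warnings;
use Unicode::GCString;

my $gc = Unicode::GCString->new("abc");

# Explicit conversion gives back an ordinary Perl string.
my $plain = $gc->as_string;

# Method form, compared against a plain string.
my $same = $gc->cmp("abc");

# Operator form; the right-hand side may also be a plain string.
my $order = $gc cmp "abd";

print "plain=$plain same=$same order=", ($order < 0 ? "before" : "not before"), "\n";
```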
concat (STRING)
STRING . STRING
Instance method. Concatenates grapheme cluster strings. Either STRING may be a Unicode string. Note that the number of columns (see columns()) or grapheme clusters (see length()) of the resulting string is not necessarily the sum of those of the two strings. In the new string, the next position is the one that was set on the left-hand string.
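To see why cluster counts need not add up, consider joining a base letter and a combining mark. This is a minimal sketch with made-up sample strings: each piece is one cluster on its own, but the concatenation merges them into a single cluster.

```perl
use strict;
use warnings;
use Unicode::GCString;

my $base = Unicode::GCString->new("e");
my $mark = Unicode::GCString->new("\x{0301}");   # COMBINING ACUTE ACCENT

# The overloaded "." operator; $base->concat($mark) is the method form.
my $joined = $base . $mark;

printf "%d + %d -> %d\n", $base->length, $mark->length, $joined->length;
```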
join ([STRING, ...])
Instance method. Joins the STRINGs, inserting the grapheme cluster string between them. Any of the STRINGs may be Unicode strings.
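In join(), the object the method is called on acts as the separator. A brief sketch with hypothetical values, mixing plain strings and a Unicode::GCString object among the arguments:

```perl
use strict;
use warnings;
use Unicode::GCString;

# The invocant is the separator, much like the first argument of
# Perl's built-in join().
my $separator = Unicode::GCString->new(", ");

my $joined = $separator->join("a", Unicode::GCString->new("b"), "c");

print $joined->as_string, "\n";
```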
substr (OFFSET, [LENGTH, [REPLACEMENT]])
Instance method. Returns a substring of the grapheme cluster string. OFFSET and LENGTH are counted in grapheme clusters. If REPLACEMENT is specified, the substring is replaced with it. REPLACEMENT may be a Unicode string.
Note: Unlike the built-in substr() function, this method does not return an lvalue.
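The difference from the built-in substr() shows up as soon as a cluster spans several code points. The following is a sketch with a hypothetical sample string; the last step passes REPLACEMENT on a copy and simply prints whatever the result is, since the exact outcome is not spelled out here.

```perl
use strict;
use warnings;
use Unicode::GCString;

binmode STDOUT, ':encoding(UTF-8)';

# "n", then "e" + COMBINING ACUTE ACCENT (one cluster), then "e".
my $plain = "ne\x{0301}e";
my $gc    = Unicode::GCString->new($plain);

# Offsets and lengths count grapheme clusters, so this takes the whole
# accented letter (two code points).
my $cluster = $gc->substr(1, 1);

# The built-in substr() on the plain string counts code points instead.
my $codepoint = substr($plain, 1, 1);

printf "method: %d code point(s), built-in: %d code point(s)\n",
    length($cluster->as_string), length($codepoint);

# With REPLACEMENT, work on a copy so that $gc itself is left alone.
my $edited = $gc->copy;
$edited->substr(1, 1, "o");
print $edited->as_string, "\n";
```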
Operations as a Sequence of Grapheme Clusters
as_array
@{OBJECT}
as_arrayref
Instance method. Converts the grapheme cluster string into an array of grapheme cluster information.
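A sketch of splitting a string into its clusters. It assumes that each element returned by as_array is itself a Unicode::GCString object, so that as_string can be called on it; the sample string is hypothetical.

```perl
use strict;
use warnings;
use Unicode::GCString;

# "a" + COMBINING DIAERESIS (one cluster), then "b" and "c".
my $gc = Unicode::GCString->new("a\x{0308}bc");

# List form: one element per grapheme cluster.
# Assumption: elements are Unicode::GCString objects.
my @clusters = map { $_->as_string } $gc->as_array;

# Reference form of the same data.
my $ref = $gc->as_arrayref;

printf "%d clusters, %d via reference\n", scalar(@clusters), scalar(@$ref);
```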
eos
Instance method. Tests whether the current position is at the end of the grapheme cluster string.
item ([OFFSET])
Instance method. Returns the OFFSET-th grapheme cluster. If OFFSET is not specified, returns the grapheme cluster at the next position.
next
<OBJECT>
Instance method, iterative. Returns the grapheme cluster at the next position and advances the next position by one.
pos ([OFFSET])
Instance method. If OFFSET is specified, sets the next position to it. Returns the next position of the grapheme cluster string.
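The position-based methods combine into a simple iteration loop. This is a minimal sketch with a hypothetical sample string, using only eos, next, pos, and item as described above.

```perl
use strict;
use warnings;
use Unicode::GCString;

# "a" + COMBINING DIAERESIS (one cluster), then "b" and "c".
my $gc = Unicode::GCString->new("a\x{0308}bc");

my @sizes;
$gc->pos(0);                         # start from the beginning
until ($gc->eos) {
    my $cluster = $gc->next;         # returns a cluster and advances
    push @sizes, $cluster->chars;    # code points in this cluster
}

my $end   = $gc->pos;                # now equal to the cluster count
my $first = $gc->item(0);            # random access by cluster offset

print join(",", @sizes), " end=$end\n";
```

Because copy() resets the next position to the beginning, iterating over a copy leaves the position of the original untouched.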
Miscellaneous
lbc
Instance method. Returns the line breaking class (see Unicode::LineBreak~[ja]) of the first character of the first grapheme cluster.
lbcext
Instance method. Returns the line breaking class (see Unicode::LineBreak~[ja]) of the last grapheme extender of the last grapheme cluster. If there is no grapheme extender, or if its class is CM, the line breaking class of the last grapheme base is returned.
• A grapheme cluster should not be called a "grapheme" (even though Larry does).
• On Perl around version 5.10.1, implicit conversion from a Unicode::GCString object to a Unicode string can sometimes confuse the "utf8_mg_pos_cache_update" cache.
For example, instead of
$sub = substr($gcstring, $i, $j);
write one of the following:
$sub = substr("$gcstring", $i, $j);
$sub = substr($gcstring->as_string, $i, $j);
• This module implements the default algorithm for determining grapheme cluster boundaries. Tailoring mechanisms are not yet supported.
Consult the $VERSION variable.
2013.10
• The new() method can now take a non-Unicode string as its argument. In that case, the string is decoded using the iso-8859-1 (Latin 1) character set. In earlier releases, this method would die when given non-Unicode input.
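The practical consequence is that undecoded UTF-8 octets are accepted but read as Latin 1. A sketch of the difference, assuming release 2013.10 or later and the core Encode module; the sample strings are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::GCString;

# A non-Unicode (byte) string: "caf" followed by the single byte 0xE9,
# which is "e with acute" in Latin 1.
my $latin1 = Unicode::GCString->new("caf\xE9");

# The same word as UTF-8 octets. Passed as is, the two octets
# 0xC3 0xA9 are read as two separate Latin 1 characters.
my $octets    = "caf\xC3\xA9";
my $undecoded = Unicode::GCString->new($octets);

# Decoding first yields the intended four characters.
my $decoded = Unicode::GCString->new(decode('UTF-8', $octets));

printf "%d %d %d\n", $latin1->chars, $undecoded->chars, $decoded->chars;
```

Decoding input explicitly before calling new() avoids depending on this fallback.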
[UAX #29] Mark Davis (ed.) (2009-2013). Unicode Standard Annex #29: Unicode Text Segmentation, Revisions 15-23. <http://www.unicode.org/reports/tr29/>.
Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>
Copyright (C) 2009-2013 Hatuka*nezumi - IKEDA Soji.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.