This module segments Unicode text into grapheme clusters following the UAX #29 grapheme cluster boundary rules.

Its imports are:

* ``sys``, ``enum.IntEnum``, ``functools.lru_cache`` and ``typing`` (``TYPE_CHECKING``, ``NamedTuple``, ``Optional``).
* The binary-search helper ``bisearch``, bound as ``_bisearch``.
* Codepoint range tables from a local ``table_grapheme`` module.
* ``collections.abc.Iterator``, imported only when ``TYPE_CHECKING`` is true.

``_HAS_PYTHON315_ITER_GRAPHEMES`` is true on Python 3.15 or later when ``unicodedata.iter_graphemes`` is available. ``MAX_GRAPHEME_SCAN`` bounds the backward scan performed by ``_find_cluster_start``.

_grapheme_cluster_break(ucs: int)
    Return the Grapheme_Cluster_Break property for a codepoint.

_is_extended_pictographic(ucs: int) -> bool
    Check if codepoint has Extended_Pictographic property.

_is_incb_linker(ucs: int) -> bool
    Check if codepoint has InCB=Linker property.

_is_incb_consonant(ucs: int) -> bool
    Check if codepoint has InCB=Consonant property.

_is_incb_extend(ucs: int) -> bool
    Check if codepoint has InCB=Extend property.

These five lookups are memoized with ``functools.lru_cache`` and test table membership with ``_bisearch``.

class BreakResult(NamedTuple)
    Result of grapheme cluster break decision.

    should_break: bool
    ri_count: int

_simple_break_check(prev_gcb, curr_gcb) -> Optional[BreakResult]
    Check simple GCB-pair-based break rules (cacheable; memoized with ``functools.lru_cache``).

    Returns a BreakResult for rules that can be determined from GCB properties alone, or None if the complex lookback rules (GB9c, GB11) need to be checked.

_should_break(prev_gcb, curr_gcb, text: str, curr_idx: int, ri_count: int) -> BreakResult
    Determine whether there should be a grapheme cluster break between prev and curr.

    Implements the UAX #29 grapheme cluster boundary rules.

_iter_graphemes_stdlib(unistr: str, start: int = 0, end: Optional[int] = None) -> Iterator[str]
    Iterate over grapheme clusters using :func:`unicodedata.iter_graphemes`.

    Grapheme clusters are "user-perceived characters". Each is what a user would consider a single character, but it may consist of multiple Unicode codepoints (e.g., a base character with combining marks, or an emoji sequence).

    :param unistr: The Unicode string to segment.
    :param start: Starting index (default 0).
    :param end: Ending index (default len(unistr)).
    :yields: Grapheme cluster substrings.

    Example::

        >>> list(iter_graphemes('cafe\u0301'))
        ['c', 'a', 'f', 'e\u0301']
        >>> list(iter_graphemes('ok\U0001F468\u200D\U0001F469\u200D\U0001F467'))
        ['o', 'k', '\U0001F468\u200D\U0001F469\u200D\U0001F467']
        >>> list(iter_graphemes('ok\U0001F1FA\U0001F1F8'))
        ['o', 'k', '\U0001F1FA\U0001F1F8']

    .. versionadded:: 0.3.0

_iter_graphemes_python(unistr: str, start: int = 0, end: int | None = None) -> Iterator[str]
    Pure-Python counterpart of ``_iter_graphemes_stdlib`` with the same parameters. It classifies each codepoint with ``_grapheme_cluster_break`` and decides each boundary with ``_should_break``.

_find_cluster_start(text: str, pos: int) -> int
    Find the start of the grapheme cluster containing the character before pos.

    Scans backwards from pos to find a safe starting point, then iterates forward using the standard break rules to find the actual cluster boundary.

    :param text: The Unicode string.
    :param pos: Position to search before (exclusive).
    :returns: Start position of the grapheme cluster.

grapheme_boundary_before(unistr: str, pos: int) -> int
    Find the grapheme cluster boundary immediately before a position.

    :param unistr: The Unicode string to search.
    :param pos: Position in the string (0 < pos <= len(unistr)).
    :returns: Start index of the grapheme cluster containing the character at pos-1.

    Example::

        >>> grapheme_boundary_before('Hello \U0001F44B\U0001F3FB', 8)
        6
        >>> grapheme_boundary_before('a\r\nb', 3)
        1

    .. versionadded:: 0.3.6

iter_graphemes_reverse(unistr: str, start: int = 0, end: int | None = None) -> Iterator[str]
    Iterate over grapheme clusters in reverse order (last to first).

    :param unistr: The Unicode string to segment.
    :param start: Starting index (default 0).
    :param end: Ending index (default len(unistr)).
    :yields: Grapheme cluster substrings in reverse order.

    Example::

        >>> list(iter_graphemes_reverse('cafe\u0301'))
        ['e\u0301', 'f', 'a', 'c']

    .. versionadded:: 0.3.6
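Each of the property helpers above answers the same question: does a codepoint fall inside one of a table's ranges? The sketch below shows that lookup. It assumes, without confirmation from the module, that a table is a sorted tuple of non-overlapping, inclusive ``(start, end)`` pairs. ``in_ranges`` and ``DEMO_RANGES`` are hypothetical names, and the two ranges are illustrative rather than real Grapheme_Cluster_Break data.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]

# Illustrative ranges only (Combining Diacritical Marks and the
# Combining Diacritical Marks Extended block); not the module's real tables.
DEMO_RANGES: Tuple[Range, ...] = ((0x0300, 0x036F), (0x1AB0, 0x1AFF))


def in_ranges(ucs: int, table: Sequence[Range]) -> bool:
    """Binary search for ucs in sorted, non-overlapping, inclusive ranges."""
    if not table or ucs < table[0][0] or ucs > table[-1][1]:
        return False  # fast reject: outside the table's overall span
    lo, hi = 0, len(table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end = table[mid]
        if ucs > end:
            lo = mid + 1      # target lies in a later range
        elif ucs < start:
            hi = mid - 1      # target lies in an earlier range
        else:
            return True       # start <= ucs <= end
    return False
```

Real text repeats the same few codepoints constantly. That is why memoizing a lookup like this with ``functools.lru_cache``, as the module does for its helpers, pays off.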
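``_simple_break_check`` settles some rules from two properties alone. ``BreakResult.ri_count`` carries the regional-indicator counting. Both can be sketched over bare property values under these assumptions:

* ``Gcb`` is a hypothetical enum. Its member names and values are not taken from the module.
* ``ri_count`` is assumed to mean the length of the run of Regional_Indicator codepoints ending at the current position.
* GB9c and GB11 are left out.

The module separates a cacheable pair check from the lookback logic in ``_should_break``. This sketch instead folds GB3 to GB9b, GB12/GB13 and the GB999 default into one function.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Gcb(IntEnum):
    """Hypothetical Grapheme_Cluster_Break values (names and numbers assumed)."""
    OTHER = 0
    CR = 1
    LF = 2
    CONTROL = 3
    EXTEND = 4
    ZWJ = 5
    REGIONAL_INDICATOR = 6
    PREPEND = 7
    SPACING_MARK = 8
    L = 9
    V = 10
    T = 11
    LV = 12
    LVT = 13


class BreakResult(NamedTuple):
    should_break: bool
    ri_count: int  # assumed: length of the RI run ending at the current codepoint


_HARD = (Gcb.CONTROL, Gcb.CR, Gcb.LF)


def pair_rules(prev: Gcb, curr: Gcb, ri_count: int) -> BreakResult:
    """Break decision from two properties plus the RI run length ending at prev.

    GB9c and GB11 need lookback into the text and are not handled here.
    """
    new_ri = ri_count + 1 if curr == Gcb.REGIONAL_INDICATOR else 0
    if prev == Gcb.CR and curr == Gcb.LF:                            # GB3
        brk = False
    elif prev in _HARD or curr in _HARD:                             # GB4, GB5
        brk = True
    elif prev == Gcb.L and curr in (Gcb.L, Gcb.V, Gcb.LV, Gcb.LVT):  # GB6
        brk = False
    elif prev in (Gcb.LV, Gcb.V) and curr in (Gcb.V, Gcb.T):         # GB7
        brk = False
    elif prev in (Gcb.LVT, Gcb.T) and curr == Gcb.T:                 # GB8
        brk = False
    elif curr in (Gcb.EXTEND, Gcb.ZWJ, Gcb.SPACING_MARK):            # GB9, GB9a
        brk = False
    elif prev == Gcb.PREPEND:                                        # GB9b
        brk = False
    elif prev == curr == Gcb.REGIONAL_INDICATOR:                     # GB12, GB13
        brk = ri_count % 2 == 0  # join prev only if its run has odd length
    else:                                                            # GB999
        brk = True
    return BreakResult(brk, new_ri)


def split_props(props: List[Gcb]) -> List[List[Gcb]]:
    """Group a sequence of property values into clusters with pair_rules."""
    if not props:
        return []
    clusters: List[List[Gcb]] = []
    current = [props[0]]
    ri = 1 if props[0] == Gcb.REGIONAL_INDICATOR else 0
    for prev, curr in zip(props, props[1:]):
        result = pair_rules(prev, curr, ri)
        ri = result.ri_count
        if result.should_break:
            clusters.append(current)
            current = []
        current.append(curr)
    clusters.append(current)
    return clusters
```

Flags pair correctly because the sketch counts the RI run rather than the cluster. For example, ``[R, R, R]`` splits as one flag plus a lone indicator.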
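GB11 is one of the rules ``_simple_break_check`` defers, because a ZWJ alone does not decide it. After skipping any Extend codepoints, the codepoint before the ZWJ must be Extended_Pictographic. The sketch below shows that lookback using tiny hypothetical stand-in tables (``DEMO_EXT_PICT``, ``DEMO_EXTEND``) in place of the real property data. ``gb11_no_break`` is likewise a hypothetical name.

```python
# Hypothetical stand-ins for the real property tables: only the codepoints
# needed by the examples are listed.
DEMO_EXT_PICT = {0x1F468, 0x1F469, 0x1F467}  # MAN, WOMAN, GIRL
DEMO_EXTEND = {0x1F3FB, 0xFE0F}              # skin-tone modifier, VS16
ZWJ = 0x200D


def gb11_no_break(text: str, curr_idx: int) -> bool:
    """GB11: ExtPict Extend* ZWJ x ExtPict.

    True when text[curr_idx] is a pictograph joined by ZWJ to an earlier
    pictograph (with any Extend codepoints in between), so no break is allowed.
    """
    if curr_idx < 2 or ord(text[curr_idx]) not in DEMO_EXT_PICT:
        return False
    if ord(text[curr_idx - 1]) != ZWJ:
        return False
    i = curr_idx - 2
    while i >= 0 and ord(text[i]) in DEMO_EXTEND:
        i -= 1  # skip the Extend* run between the pictograph and the ZWJ
    return i >= 0 and ord(text[i]) in DEMO_EXT_PICT
```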
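GB9c is the other deferred rule, added in Unicode 15.1. It keeps Indic conjuncts such as Devanagari क्ष together: a consonant does not start a new cluster when the text before it matches ``Consonant [Extend Linker]* Linker [Extend Linker]*``. That means an earlier consonant followed by at least one linker (a virama), possibly mixed with InCB=Extend codepoints. The sketch scans backwards the same way. It uses hypothetical stand-in tables covering only a handful of Devanagari codepoints, and ``gb9c_no_break`` is an illustrative name.

```python
# Hypothetical, tiny stand-ins for the Indic_Conjunct_Break (InCB) tables.
DEMO_INCB_CONSONANT = {0x0915, 0x0924, 0x0937}  # KA, TA, SSA
DEMO_INCB_LINKER = {0x094D}                      # VIRAMA
DEMO_INCB_EXTEND = {0x093C, 0x200D}              # NUKTA, ZWJ


def gb9c_no_break(text: str, curr_idx: int) -> bool:
    """GB9c: Consonant [Extend Linker]* Linker [Extend Linker]* x Consonant."""
    if ord(text[curr_idx]) not in DEMO_INCB_CONSONANT:
        return False
    has_linker = False
    i = curr_idx - 1
    while i >= 0:
        cp = ord(text[i])
        if cp in DEMO_INCB_LINKER:
            has_linker = True
        elif cp not in DEMO_INCB_EXTEND:
            # First codepoint that is neither Linker nor Extend: it must be
            # the opening consonant, and at least one linker must have been seen.
            return has_linker and cp in DEMO_INCB_CONSONANT
        i -= 1
    return False  # reached the start of the text without finding a consonant
```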
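``iter_graphemes`` walks the string once, asks at every position whether a boundary falls there, and yields the text between consecutive boundaries. The driver loop can be sketched independently of the rules. Here it is paired with ``toy_rule``, a deliberately tiny, hypothetical rule set that stands in for full UAX #29:

* CR LF stays together.
* There is always a break around line breaks.
* Combining marks, detected with ``unicodedata.combining``, attach to what precedes them.

``iter_clusters``, ``toy_rule`` and ``BreakRule`` are illustrative names, not the module's API.

```python
import unicodedata
from typing import Callable, Iterator, Optional

# Hypothetical rule signature: "is there a boundary before text[idx]?"
BreakRule = Callable[[str, int], bool]


def toy_rule(text: str, idx: int) -> bool:
    """A deliberately tiny subset of UAX #29, for illustration only."""
    prev, curr = text[idx - 1], text[idx]
    if prev == '\r' and curr == '\n':        # like GB3: keep CR LF together
        return False
    if prev in '\r\n' or curr in '\r\n':     # like GB4/GB5: break around line breaks
        return True
    return unicodedata.combining(curr) == 0  # rough GB9: marks attach backwards


def iter_clusters(text: str, start: int = 0, end: Optional[int] = None,
                  rule: BreakRule = toy_rule) -> Iterator[str]:
    """Yield the clusters of text[start:end], asking rule at each position."""
    end = len(text) if end is None else min(end, len(text))
    start = max(start, 0)
    if start >= end:
        return
    cluster_start = start
    for idx in range(start + 1, end):
        if rule(text, idx):
            yield text[cluster_start:idx]
            cluster_start = idx
    yield text[cluster_start:end]  # the final cluster runs to end
```

Passing the rule in as a parameter keeps the ``start``/``end`` clamping and slicing separate from the boundary logic. With the toy rules, ``list(iter_clusters('cafe\u0301'))`` gives ``['c', 'a', 'f', 'e\u0301']``, matching the docstring example.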
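``iter_graphemes_reverse`` and ``grapheme_boundary_before`` rest on ``_find_cluster_start``. It scans backwards to a safe restart point and then re-segments forward, because the UAX #29 rules are stated left to right. The sketch below applies that strategy with the same toy rules. It makes two assumptions:

* The position right after a line feed is always a boundary, which holds under GB4.
* A hypothetical window ``MAX_SCAN`` plays the role of ``MAX_GRAPHEME_SCAN``.

The function names are illustrative.

```python
import unicodedata
from typing import Iterator, Optional

MAX_SCAN = 32  # hypothetical window, similar in spirit to MAX_GRAPHEME_SCAN


def toy_rule(text: str, idx: int) -> bool:
    """Same tiny rule subset, standing in for the full UAX #29 rules."""
    prev, curr = text[idx - 1], text[idx]
    if prev == '\r' and curr == '\n':
        return False
    if prev in '\r\n' or curr in '\r\n':
        return True
    return unicodedata.combining(curr) == 0


def boundary_before(text: str, pos: int) -> int:
    """Start index of the cluster that contains text[pos - 1]."""
    pos = min(pos, len(text))
    if pos <= 0:
        return 0
    # 1. Walk back toward a restart point that is always a boundary (index 0,
    #    or just after LF per GB4); stop early after MAX_SCAN codepoints,
    #    accepting that a longer cluster will be cut.
    safe = pos - 1
    while safe > 0 and pos - safe < MAX_SCAN and text[safe - 1] != '\n':
        safe -= 1
    # 2. Segment forward from there; the last boundary before pos wins.
    cluster_start = safe
    for idx in range(safe + 1, pos):
        if toy_rule(text, idx):
            cluster_start = idx
    return cluster_start


def iter_reverse(text: str, start: int = 0,
                 end: Optional[int] = None) -> Iterator[str]:
    """Yield the clusters of text[start:end] from last to first."""
    end = len(text) if end is None else min(end, len(text))
    start = max(start, 0)
    pos = end
    while pos > start:
        # Clamp so no text before start is ever yielded.
        cluster_start = max(boundary_before(text, pos), start)
        yield text[cluster_start:pos]
        pos = cluster_start
```

The window is the design trade-off. Each lookup costs at most ``MAX_SCAN`` steps instead of scanning back to the start of a long line. The cost is that a cluster longer than the window is reported as starting inside itself. For example, for ``'e'`` followed by 40 combining acute accents, ``boundary_before(text, 41)`` returns 9 rather than 0.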