- # These test special UTF and UCP features of DFA matching. The output is
- # different for the different widths.
- #subject dfa
- # ----------------------------------------------------
- # These are a selection of the more comprehensive tests that are run for
- # non-DFA matching.
- /X/utf
- XX\x{d800}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
- XX\x{d800}\=offset=3
- Error -36 (bad UTF-8 offset)
- XX\x{d800}\=no_utf_check
- 0: X
- XX\x{da00}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
- XX\x{da00}\=no_utf_check
- 0: X
- XX\x{dc00}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
- XX\x{dc00}\=no_utf_check
- 0: X
- XX\x{de00}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
- XX\x{de00}\=no_utf_check
- 0: X
- XX\x{dfff}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
- XX\x{dfff}\=no_utf_check
- 0: X
- XX\x{110000}
- Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
- XX\x{d800}\x{1234}
- Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
-
- /badutf/utf
- X\xdf
- Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
- XX\xef
- Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
- XXX\xef\x80
- Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
- X\xf7
- Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
- XX\xf7\x80
- Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
- XXX\xf7\x80\x80
- Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
- /shortutf/utf
- XX\xdf\=ph
- Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
- XX\xef\=ph
- Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
- XX\xef\x80\=ph
- Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
- \xf7\=ph
- Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
- \xf7\x80\=ph
- Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
-
- # ----------------------------------------------------
- # UCP and casing tests - except for the first two, these will all fail in 8-bit
- # mode because they are testing UCP without UTF and use characters > 255.
- /\x{c1}/i,no_start_optimize
- \= Expect no match
- \x{e1}
- No match
- /\x{c1}+\x{e1}/iB,ucp
- ------------------------------------------------------------------
- Bra
- /i \x{c1}+
- /i \x{e1}
- Ket
- End
- ------------------------------------------------------------------
- \x{c1}\x{c1}\x{c1}
- 0: \xc1\xc1\xc1
- 1: \xc1\xc1
- \x{e1}\x{e1}\x{e1}
- 0: \xe1\xe1\xe1
- 1: \xe1\xe1
- /\x{120}\x{c1}/i,ucp,no_start_optimize
- Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
- \x{121}\x{e1}
- /\x{120}\x{c1}/i,ucp
- Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
- \x{121}\x{e1}
- /[^\x{120}]/i,no_start_optimize
- Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
- \x{121}
- /[^\x{120}]/i,ucp,no_start_optimize
- Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
- \= Expect no match
- \x{121}
- /[^\x{120}]/i
- Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
- \x{121}
- /[^\x{120}]/i,ucp
- Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
- \= Expect no match
- \x{121}
-
- /\x{120}{2}/i,ucp
- Failed: error 134 at offset 6: character code point value in \x{} or \o{} is too large
- \x{121}\x{121}
- /[^\x{120}]{2}/i,ucp
- Failed: error 134 at offset 8: character code point value in \x{} or \o{} is too large
- \= Expect no match
- \x{121}\x{121}
- # ----------------------------------------------------
- # ----------------------------------------------------
- # Tests for handling 0xffffffff in caseless UCP mode. They only apply to 32-bit
- # mode; for the other widths they will fail.
- /k*\x{ffffffff}/caseless,ucp
- Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
- \x{ffffffff}
- /k+\x{ffffffff}/caseless,ucp,no_start_optimize
- Failed: error 134 at offset 13: character code point value in \x{} or \o{} is too large
- K\x{ffffffff}
- \= Expect no match
- \x{ffffffff}\x{ffffffff}
- /k{2}\x{ffffffff}/caseless,ucp,no_start_optimize
- Failed: error 134 at offset 15: character code point value in \x{} or \o{} is too large
- \= Expect no match
- \x{ffffffff}\x{ffffffff}\x{ffffffff}
- /k\x{ffffffff}/caseless,ucp,no_start_optimize
- Failed: error 134 at offset 12: character code point value in \x{} or \o{} is too large
- K\x{ffffffff}
- \= Expect no match
- \x{ffffffff}\x{ffffffff}\x{ffffffff}
- /k{2,}?Z/caseless,ucp,no_start_optimize,no_auto_possess
- \= Expect no match
- Kk\x{ffffffff}\x{ffffffff}\x{ffffffff}Z
- ** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
- ** Truncation will probably give the wrong result.
- ** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
- ** Truncation will probably give the wrong result.
- ** Character \x{ffffffff} is greater than 255 and UTF-8 mode is not enabled.
- ** Truncation will probably give the wrong result.
- No match
- # ----------------------------------------------------
- # End of testinput14
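
The "at offset N" values in the UTF-8 validity failures above are the byte position of the first malformed sequence in the subject. As a rough cross-check outside PCRE2, the sketch below uses Python's strict UTF-8 decoder, which also rejects surrogates, code points above 0x10ffff, and truncated sequences. The helper name `first_invalid_offset` is hypothetical, and the byte strings assume that `\x{d800}` and `\x{110000}` are encoded by pcre2test as `ed a0 80` and `f4 90 80 80` in UTF mode. Python's error categories and messages differ from PCRE2's, so only the offsets are compared.

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first malformed UTF-8 sequence, or None.

    Hypothetical helper for illustration only: it relies on Python's strict
    decoder, not on PCRE2's own validity check.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte of the offending sequence.
        return exc.start
    return None


# Subjects mirroring some of the cases above (byte encodings are assumptions).
samples = {
    "surrogate d800": b"XX\xed\xa0\x80",
    "beyond 10ffff": b"XX\xf4\x90\x80\x80",
    "1 byte missing": b"X\xdf",
    "2 bytes missing": b"XX\xef",
    "truncated 3-byte": b"XXX\xef\x80",
}

for label, subject in samples.items():
    print(label, first_invalid_offset(subject))
```

For these subjects the reported offsets line up with the ones in the expected output above (2, 2, 1, 2, 3), which is a quick way to sanity-check a new test subject before adding it.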