- Strict
- Rem
- bbdoc: Streams/Text streams
- about:
- The Text Stream module allows you to load and save text in a number
- of formats: LATIN1, UTF8 and UTF16.
- The LATIN1 format uses a single byte to represent each character, and
- is therefore only capable of manipulating 256 distinct character values.
- The UTF8 and UTF16 formats are capable of manipulating up to 1114112
- character values, but will generally use greater storage space. In addition,
- many text processing applications are unable to handle UTF8 and UTF16 files.
- End Rem
- Module BRL.TextStream
- ModuleInfo "Version: 1.03 "
- ModuleInfo "Author: Mark Sibly"
- ModuleInfo "License: zlib/libpng"
- ModuleInfo "Copyright: Blitz Research Ltd"
- ModuleInfo "Modserver: BRL"
- ModuleInfo "History: 1.03 Release"
- ModuleInfo "History: Modified LoadText to handle stream URLs"
- ModuleInfo "History: 1.02 Release"
- ModuleInfo "History: Added LoadText, SaveText"
- ModuleInfo "History: Fixed UTF16LE=4"
- ModuleInfo "History: 1.01 Release"
- ModuleInfo "History: 1.00 Release"
- ModuleInfo "History: Added TextStream module"
- Import BRL.Stream
- Type TTextStream Extends TStreamWrapper
- '***** PUBLIC *****
- Const LATIN1=1
- Const UTF8=2
- Const UTF16BE=3
- Const UTF16LE=4
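- ' Reads count bytes that were stored as pairs of uppercase hex digits,
- ' 32 encoded bytes (64 digits) per line.
- Method Read( buf:Byte Ptr,count )
- For Local i=0 Until count
- If _bufcount=32 _FlushRead
- Local hi=_ReadByte()
- Local lo=_ReadByte()
- ' Convert the ASCII digits "0".."9" and "A".."F" to the values 0..15
- hi:-48;If hi>9 hi:-7
- lo:-48;If lo>9 lo:-7
- buf[i]=hi Shl 4 | lo
- _bufcount:+1
- Next
- Return count
- End Method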
-
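- ' Writes count bytes as pairs of uppercase hex digits, starting a new
- ' line after every 32 encoded bytes.
- Method Write( buf:Byte Ptr,count )
- For Local i=0 Until count
- Local hi=buf[i] Shr 4
- Local lo=buf[i] & $f
- ' Convert the values 0..15 to the ASCII digits "0".."9" and "A".."F"
- hi:+48;If hi>57 hi:+7
- lo:+48;If lo>57 lo:+7
- _WriteByte hi
- _WriteByte lo
- _bufcount:+1
- If _bufcount=32 _FlushWrite
- Next
- Return count
- End Method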
-
- Method ReadByte()
- _FlushRead
- Return Int( ReadLine() )
- End Method
-
- Method WriteByte( n )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadShort()
- _FlushRead
- Return Int( ReadLine() )
- End Method
-
- Method WriteShort( n )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadInt()
- _FlushRead
- Return Int( ReadLine() )
- End Method
-
- Method WriteInt( n )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadLong:Long()
- _FlushRead
- Return Long( ReadLine() )
- End Method
-
- Method WriteLong( n:Long )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadFloat:Float()
- _FlushRead
- Return Float( ReadLine() )
- End Method
-
- Method WriteFloat( n:Float )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadDouble:Double()
- _FlushRead
- Return Double( ReadLine() )
- End Method
-
- Method WriteDouble( n:Double )
- _FlushWrite
- WriteLine n
- End Method
-
- Method ReadLine$()
- _FlushRead
- Local buf:Short[1024],i
- While Not Eof()
- Local n=ReadChar()
- If n=0 Exit
- If n=10 Exit
- If n=13 Continue
- If buf.length=i buf=buf[..i+1024]
- buf[i]=n
- i:+1
- Wend
- Return String.FromShorts(buf,i)
- End Method
-
- Method ReadFile$()
- _FlushRead
- Local buf:Short[1024],i
- While Not Eof()
- Local n=ReadChar()
- If buf.length=i buf=buf[..i+1024]
- buf[i]=n
- i:+1
- Wend
- Return String.FromShorts( buf,i )
- End Method
-
- Method WriteLine( str$ )
- _FlushWrite
- WriteString str
- WriteString "~r~n"
- End Method
-
- Method ReadString$( length )
- _FlushRead
- Local buf:Short[length]
- For Local i=0 Until length
- buf[i]=ReadChar()
- Next
- Return String.FromShorts(buf,length)
- End Method
-
- Method WriteString( str$ )
- _FlushWrite
- For Local i=0 Until str.length
- WriteChar str[i]
- Next
- End Method
-
- Method ReadChar()
- Local c=_ReadByte()
- Select _encoding
- Case LATIN1
- Return c
- Case UTF8
- ' Only 1, 2 and 3 byte sequences are decoded. Lead bytes of 240 or more
- ' (4 byte sequences) fall through without an explicit Return.
- If c<128 Return c
- Local d=_ReadByte()
- If c<224 Return (c-192)*64+(d-128)
- Local e=_ReadByte()
- If c<240 Return (c-224)*4096+(d-128)*64+(e-128)
- Case UTF16BE
- ' High byte first
- Local d=_ReadByte()
- Return c Shl 8 | d
- Case UTF16LE
- ' Low byte first
- Local d=_ReadByte()
- Return d Shl 8 | c
- End Select
- End Method
-
- Method WriteChar( char )
- ' Only character codes up to $ffff are supported
- Assert char>=0 And char<=$ffff
- Select _encoding
- Case LATIN1
- _WriteByte char
- Case UTF8
- If char<128
- ' 1 byte: 0xxxxxxx
- _WriteByte char
- Else If char<2048
- ' 2 bytes: 110xxxxx 10xxxxxx
- _WriteByte char/64 | 192
- _WriteByte char Mod 64 | 128
- Else
- ' 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
- _WriteByte char/4096 | 224
- _WriteByte char/64 Mod 64 | 128
- _WriteByte char Mod 64 | 128
- EndIf
- Case UTF16BE
- _WriteByte char Shr 8
- _WriteByte char
- Case UTF16LE
- _WriteByte char
- _WriteByte char Shr 8
- End Select
- End Method
- Function Create:TTextStream( stream:TStream,encoding )
- Local t:TTextStream=New TTextStream
- t._encoding=encoding
- t.SetStream stream
- Return t
- End Function
- '***** PRIVATE *****
-
- Method _ReadByte()
- Return Super.ReadByte()
- End Method
-
- Method _WriteByte( n )
- Super.WriteByte n
- End Method
-
- Method _FlushRead()
- If Not _bufcount Return
- Local n=_ReadByte()
- If n=13 n=_ReadByte()
- If n<>10 Throw "Malformed line terminator"
- _bufcount=0
- End Method
-
- Method _FlushWrite()
- If Not _bufcount Return
- _WriteByte 13
- _WriteByte 10
- _bufcount=0
- End Method
-
- Field _encoding,_bufcount
-
- End Type
-
- Type TTextStreamFactory Extends TStreamFactory
- Method CreateStream:TStream( url:Object,proto$,path$,readable,writeable )
- Local encoding
- Select proto$
- Case "latin1"
- encoding=TTextStream.LATIN1
- Case "utf8"
- encoding=TTextStream.UTF8
- Case "utf16be"
- encoding=TTextStream.UTF16BE
- Case "utf16le"
- encoding=TTextStream.UTF16LE
- End Select
- If Not encoding Return
- Local stream:TStream=OpenStream( path,readable,writeable )
- If stream Return TTextStream.Create( stream,encoding )
- End Method
- End Type
- New TTextStreamFactory
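- Rem
- Usage sketch (illustrative only, not part of the module). The factory
- registered above lets a caller pick an encoding with a "proto::path"
- stream URL. The file name "notes.txt" is a hypothetical example, and no
- byte order mark is written when a stream is opened this way.
-
- Local dst:TStream=WriteStream( "utf8::notes.txt" )
- If dst
- 	dst.WriteLine "caf"+Chr($e9)	' $e9 is written as the two UTF8 bytes $c3 $a9
- 	dst.Close
- EndIf
-
- Local src:TStream=ReadStream( "utf8::notes.txt" )
- If src
- 	Print src.ReadLine()
- 	src.Close
- EndIf
- End Rem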
- Rem
- bbdoc: Load text from a stream
- returns: A string containing the text
- about:
- #LoadText loads LATIN1, UTF8 or UTF16 text from @url.
- The first bytes read from the stream control the format of the text:
- [ &$fe $ff | Text is big endian UTF16
- * &$ff $fe | Text is little endian UTF16
- * &$ef $bb $bf | Text is UTF8
- ]
- If the first bytes don't match any of the above values, the stream
- is assumed to contain LATIN1 text.
- A #TStreamReadException is thrown if not all bytes could be read.
- End Rem
- Function LoadText$( url:Object )
- Local stream:TStream=ReadStream( url )
- If Not stream Throw New TStreamReadException
- Local format,size,c,d,e
- ' Read up to three bytes and check them against the known byte order marks
- If Not stream.Eof()
- c=stream.ReadByte()
- size:+1
- If Not stream.Eof()
- d=stream.ReadByte()
- size:+1
- If c=$fe And d=$ff
- format=TTextStream.UTF16BE
- Else If c=$ff And d=$fe
- format=TTextStream.UTF16LE
- Else If c=$ef And d=$bb
- If Not stream.Eof()
- e=stream.ReadByte()
- size:+1
- If e=$bf format=TTextStream.UTF8
- EndIf
- EndIf
- EndIf
- EndIf
- ' No byte order mark: the bytes already read are part of the text, so keep
- ' them and load the rest of the stream as LATIN1
- If Not format
- Local data:Byte[1024]
- data[0]=c;data[1]=d;data[2]=e
- While Not stream.Eof()
- ' Double the buffer whenever it fills up
- If size=data.length data=data[..size*2]
- size:+stream.Read( (Byte Ptr data)+size,data.length-size )
- Wend
- stream.Close
- Return String.FromBytes( data,size )
- EndIf
-
- Local textStream:TTextStream=TTextStream.Create( stream,format )
- Local str$=textStream.ReadFile()
- textStream.Close
- stream.Close
- Return str
- End Function
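- Rem
- Usage sketch (illustrative only; "readme.txt" is a hypothetical file name).
- LoadText picks the encoding from the byte order mark, so the caller does
- not pass a format. A failed open is reported with TStreamReadException.
-
- Try
- 	Local txt$=LoadText( "readme.txt" )
- 	Print txt.length+" characters loaded"
- Catch ex:TStreamReadException
- 	Print "Could not read readme.txt"
- End Try
- End Rem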
- Rem
- bbdoc: Save text to a stream
- about:
- #SaveText saves the characters in @str to @url.
- If @str contains any characters with a character code greater than 255,
- then @str is saved in UTF16 format. Otherwise, @str is saved in LATIN1 format.
- A #TStreamWriteException is thrown if not all bytes could be written.
- End Rem
- Function SaveText( str$,url:Object )
- Local format
- For Local i=0 Until str.length
- If str[i]>255
- ?BigEndian
- format=TTextStream.UTF16BE
- ?LittleEndian
- format=TTextStream.UTF16LE
- ?
- Exit
- EndIf
- Next
-
- If Not format
- SaveString str,url
- Return True
- EndIf
- Local stream:TStream=WriteStream( url )
- If Not stream Throw New TStreamWriteException
-
- Select format
- Case TTextStream.UTF8
- ' The UTF8 byte order mark is three bytes long: $ef $bb $bf
- stream.WriteByte $ef
- stream.WriteByte $bb
- stream.WriteByte $bf
- Case TTextStream.UTF16BE
- stream.WriteByte $fe
- stream.WriteByte $ff
- Case TTextStream.UTF16LE
- stream.WriteByte $ff
- stream.WriteByte $fe
- End Select
-
- Local textStream:TTextStream=TTextStream.Create( stream,format )
- textStream.WriteString str
- textStream.Close
- stream.Close
- Return True
- End Function
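- Rem
- Usage sketch (illustrative only; "hello.txt" is a hypothetical file name).
- A string containing a character code above 255 is saved as UTF16 with a
- byte order mark, which LoadText then uses to select the matching decoder.
-
- Local s$="Hello "+Chr($263a)	' $263a is above 255, so UTF16 is chosen
- SaveText s,"hello.txt"
- If LoadText( "hello.txt" )=s Print "Round trip OK"
- End Rem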