| 12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182 |
- <!DOCTYPE html> <HTML lang=en> <HEAD> <STYLE>
- body { background-color: #EEFFEE; font-size: 1.0rem; font-family: Arial; max-width: 60rem;
- color: #000000; margin: 0px;
- padding-left: 0px; padding-right: 0px; padding-top: 0px; padding-bottom: 0px; }
- H1 { padding-left: 10px; padding-right: 0px; padding-top: 10px; padding-bottom: 10px; font-size: 1.4rem; }
- H2 { padding-left: 10px; padding-right: 0px; padding-top: 10px; padding-bottom: 0px; font-size: 1.2rem; }
- blockquote {
- tab-size: 3rem;
- color: #88FF88; background: #000000;
- font-size: 0.95rem; font-family: monospace;
- padding-left: 5px; padding-right: 5px;
- padding-top: 5px; padding-bottom: 5px;
- }
- P { padding-left: 20px; padding-right: 0px; padding-top: 0px; padding-bottom: 0px; }
- IMG { padding-left: 0px; padding-right: 0px; padding-top: 2px; padding-bottom: 0px;
- max-width: 100%; }
- A { display: inline; border-radius: 4px;
- font-size: 1.0rem; font-family: Arial; color: #000044; text-decoration: none;
- padding-left: 4px; padding-right: 4px; padding-top: 4px; padding-bottom: 4px; }
- A:hover { color: #FFFF00; background: #000044; }
- A:active { color: #FFFFFF; background: #444444; }
- </STYLE> </HEAD> <BODY>
- <IMG SRC="Images/Title.png" ALT="Images/Title.png">
- <P>
- <A href="Manual.html">Back to main page</A>
- </P><P>
- </P><H1> Basic text operations</H1><P>The string API can be found in Source/DFPSR/api/stringAPI.h, where you can read the specific documentation for each method.
- Element access it read-only even for writable strings, so you're supposed to create strings by clearing and appending.
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> Why have a custom string type instead of using std::string?</H2><P>Developers often wonder why many C++ frameworks have their own string types instead of using std::string.
- While std::string might be enough when simply passing on a filename in a native format and maybe merge some strings, it is not nearly powerful enough for parsing text files.
- The problem comes from the simple fact that std::string is an entirely undefined type across platforms.
- All you know about the std::string format is that it uses the char type, which is defined as a signed or unsigned integer of 8 or more bits, using any character encoding.
- If you for example attempt to search for the letter 'A' in std::string, you don't know if an element matching 'A' represents the letter as a whole, or is a part of a Chinese character represented using multiple char elements in UTF-8.
- You could spend the rest of your life implementing your most basic parsing functions for every character encoding on earth, just to handle std::string correctly.
- Another option would be to use std:u32string, which does have a fixed format, but then other people will insist on using std::u8string instead.
- Then you might as well just use a custom string type and get both a standard representation and modern optimizations.
- dsr::String will save you heap allocations when passed by value, because the buffer is reference counted and reused until actually modified by appending data to it.
- It will behave just as if the content was cloned every time you pass it by value, but it automatically reuses allocations when doing so will save time and memory.
- Because dsr::ReadableString stores the length as an integer per view instead of writing a null terminator into the character data, splitting a string will create many views to the same allocation instead of creating lots of small heap allocations just to insert null terminators.
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> Encoding</H2><P>Both dsr::String and dsr::ReadableString are encoded in the UTF-32 format using only line-feed for line-breaks.
- This takes more memory but guarantees that each character is one element which makes algorithms a lot easier to implement when you cannot get corrupted characters or line-breaks by mistake.
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> string_load</H2><P>Loading text from a file using string_load supports UTF-8 and UTF-16.
- If no byte order mark is detected, the content is loaded as raw Latin-1 by treating each byte in the file as U+00 to U+FF.
- Loading a string from a file using string_load removes carriage-return (U'\r' or 13) and null terminators (U'\0' or 0).
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> string_save</H2><P>Saving text to a file using string_save lets you select the encodings for characters and line-breaks.
- By default, text is stored as UTF-8 (only takes more space when needed) with a byte order mark (so that other programs know that it's UTF-8) and CR-LF line-breaks (so that it can be read on all major desktop systems).
- </P><P>
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> dsr::String</H2><P>String is the dynamic text container based on reference counting and immutability.
- It guarantees that a head allocated buffer exists when length > 0.
- Assigning a String to another will make a shallow copy and increase the buffer's reference count.
- Appending more text to or clearing a String sharing its buffers with others will clone the buffer to prevent it from overwriting other strings.
- Splitting a String will use reference counting to refer to the same allocation.
- Splitting a literal will first create a new heap allocation and then refer to it from all new elements.
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> dsr::ReadableString</H2><P>ReadableString is used instead of String as an input argument so that U"" literals can be given without creating a new allocation.
- Accidentally giving a regular "" literal (not UTF-32) will be stopped instead of automatically converted.
- If you want to accept giving "" and automatically allocate a buffer for the UTF-32 conversion, then just use String.
- See the String as a value and ReadableString as a constant reference.
- </P><IMG SRC="Images/Border.png"><P>
- </P><H2> dsr::Printable</H2><P>Inheriting from Printable and defining toStreamIndented allow printing your type using printText (prints to standard output), debugText (only prints in debug mode) and throwError (calls std::runtime_error).
- </P><P>
- For non-virtual types, you can define string_toStreamIndented with an overload to keep the type simple.
- </P><P>
- Each of these printing methods allow passing multiple arguments separated by commas.
- To print to a new String, give a number of arguments to string_combine.
- If you want to keep the existing content and add more text at the end, use string_append.
- If appending a character, you probably don't want to print its numerical value, so call string_appendChar for each character being added.
- </P><P>
- Unlike the << operation, toStreamIndented can take an indentation argument which makes it faster and easier to serialize types into files.
- Just let each line begin with the given indentation and then add your own, which can be given to the child components' indentation arguments recursively.
- </P><IMG SRC="Images/Border.png"><P>
- </P>
- </BODY> </HTML>
|