<!DOCTYPE html> <HTML lang=en> <HEAD> <STYLE>
body { background-color: #EEFFEE; font-size: 1.0rem; font-family: Arial; max-width: 60rem;
color: #000000; margin: 0px;
padding-left: 0px; padding-right: 0px; padding-top: 0px; padding-bottom: 0px; }
H1 { padding-left: 10px; padding-right: 0px; padding-top: 10px; padding-bottom: 10px; font-size: 1.4rem; }
H2 { padding-left: 10px; padding-right: 0px; padding-top: 10px; padding-bottom: 0px; font-size: 1.2rem; }
blockquote {
tab-size: 3rem;
color: #88FF88; background: #000000;
font-size: 0.95rem; font-family: monospace;
padding-left: 5px; padding-right: 5px;
padding-top: 5px; padding-bottom: 5px;
}
P { padding-left: 20px; padding-right: 0px; padding-top: 0px; padding-bottom: 0px; }
IMG { padding-left: 0px; padding-right: 0px; padding-top: 2px; padding-bottom: 0px;
max-width: 100%; }
A { display: inline; border-radius: 4px;
font-size: 1.0rem; font-family: Arial; color: #000044; text-decoration: none;
padding-left: 4px; padding-right: 4px; padding-top: 4px; padding-bottom: 4px; }
A:hover { color: #FFFF00; background: #000044; }
A:active { color: #FFFFFF; background: #444444; }
</STYLE> </HEAD> <BODY>
<IMG SRC="Images/Title.png" ALT="Images/Title.png">
<P>
<A href="Manual.html">Back to main page</A>
</P><P>
</P><H1> Basic text operations</H1><P>The string API can be found in Source/DFPSR/api/stringAPI.h, where you can read the specific documentation for each method.
Element access is read-only even for writable strings, so you're supposed to create strings by clearing and appending.
</P><IMG SRC="Images/Border.png"><P>
</P><H2> Why have a custom string type instead of using std::string?</H2><P>Developers often wonder why many C++ frameworks have their own string types instead of using std::string.
While std::string might be enough for simply passing on a filename in a native format and maybe merging some strings, it is not nearly powerful enough for parsing text files.
The problem is that the content of std::string has no defined encoding across platforms.
All you know about the std::string format is that it uses the char type, which is defined as a signed or unsigned integer of 8 or more bits, using any character encoding.
If you for example search for the letter 'A' in a std::string, you don't know if an element matching 'A' represents the letter as a whole or is one part of a character stored in multiple char elements, which can happen in multi-byte encodings such as Shift JIS.
Even in UTF-8, where that specific collision cannot occur, a Chinese character spans several char elements, so elements cannot be indexed or counted as characters.
You could spend the rest of your life implementing your most basic parsing functions for every character encoding on earth, just to handle std::string correctly.
Another option would be to use std::u32string, which does have a fixed format, but then other people will insist on using std::u8string instead.
Then you might as well just use a custom string type and get both a standard representation and modern optimizations.
dsr::String will save you heap allocations when passed by value, because the buffer is reference counted and reused until actually modified by appending data to it.
It will behave just as if the content was cloned every time you pass it by value, but it automatically reuses allocations when doing so will save time and memory.
Because dsr::ReadableString stores the length as an integer per view instead of writing a null terminator into the character data, splitting a string will create many views to the same allocation instead of creating lots of small heap allocations just to insert null terminators.
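</P><P>
The idea of splitting into views can be sketched in standard C++, using std::u32string_view as a stand-in for a view that stores a pointer and a length.
This is only an illustration under that assumption and not how DFPSR implements it; splitIntoViews is a hypothetical name, and the stand-in does no reference counting, so the source buffer must outlive the views.
</P>

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper, not part of DFPSR.
// Splits text at each separator without copying any characters.
// Every returned view is a pointer and a length into the original buffer,
// so no null terminators have to be written anywhere.
std::vector<std::u32string_view> splitIntoViews(std::u32string_view text, char32_t separator) {
    std::vector<std::u32string_view> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= text.size(); i++) {
        // The end of the text closes the last part, just like a separator.
        if (i == text.size() || text[i] == separator) {
            parts.push_back(text.substr(start, i - start));
            start = i + 1;
        }
    }
    return parts;
}
```

<P>
Splitting U"a,bc,,d" at U',' with this sketch gives four views, including one empty view, and all of them point into the same character data.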
</P><IMG SRC="Images/Border.png"><P>
</P><H2> Encoding</H2><P>Both dsr::String and dsr::ReadableString are encoded in the UTF-32 format using only line-feed for line-breaks.
This takes more memory, but guarantees that each character is one element, which makes algorithms a lot easier to implement because you cannot get corrupted characters or line-breaks by mistake.
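</P><P>
To see why one element per character matters, the following sketch decodes UTF-8 into UTF-32 using only the standard library.
decodeUtf8 is a hypothetical helper that assumes well-formed input and does no validation; it is not the decoder used by DFPSR.
</P>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of DFPSR.
// Decodes UTF-8 into UTF-32, assuming well-formed input (no validation).
std::u32string decodeUtf8(const std::string &bytes) {
    std::u32string result;
    std::size_t i = 0;
    while (i < bytes.size()) {
        std::uint8_t lead = static_cast<std::uint8_t>(bytes[i]);
        int extra = 0;        // Number of continuation bytes after the lead byte.
        char32_t code = lead; // Plain ASCII is stored as is.
        if (lead >= 0xF0) { extra = 3; code = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; code = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; code = lead & 0x1F; }
        // Each continuation byte contributes its lower six bits.
        for (int k = 1; k <= extra && i + k < bytes.size(); k++) {
            code = (code << 6) | (static_cast<std::uint8_t>(bytes[i + k]) & 0x3F);
        }
        result.push_back(code);
        i += 1 + extra;
    }
    return result;
}
```

<P>
The three characters U+0041, U+00C4 and U+4E2D take six char elements in UTF-8, but exactly three elements after decoding, so indexing the UTF-32 result always lands on a whole character.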
</P><IMG SRC="Images/Border.png"><P>
</P><H2> string_load</H2><P>Loading text from a file using string_load supports UTF-8 and UTF-16, which are recognized by their byte order mark.
If no byte order mark is detected, the content is loaded as raw Latin-1 by treating each byte in the file as U+00 to U+FF.
Loading a string from a file using string_load removes carriage-returns (U'\r' or 13) and null terminators (U'\0' or 0).
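</P><P>
The rules above can be sketched roughly as follows.
The names DetectedEncoding, detectByteOrderMark and decodeLatin1 are made up for this illustration, and the real string_load may be organized differently; only the byte order mark values are standard.
</P>

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names for illustration, not part of DFPSR.
enum class DetectedEncoding { Latin1, Utf8, Utf16BigEndian, Utf16LittleEndian };

// Looks for a byte order mark at the start of the file content.
DetectedEncoding detectByteOrderMark(const std::vector<std::uint8_t> &data) {
    if (data.size() >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return DetectedEncoding::Utf8;
    }
    if (data.size() >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return DetectedEncoding::Utf16BigEndian;
    }
    if (data.size() >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return DetectedEncoding::Utf16LittleEndian;
    }
    // No byte order mark: fall back on raw Latin-1.
    return DetectedEncoding::Latin1;
}

// Maps each byte directly to U+00..U+FF, dropping carriage-returns (13) and null terminators (0).
std::u32string decodeLatin1(const std::vector<std::uint8_t> &data) {
    std::u32string result;
    for (std::uint8_t byte : data) {
        if (byte != 0x0D && byte != 0x00) {
            result.push_back(static_cast<char32_t>(byte));
        }
    }
    return result;
}
```

<P>
With this sketch, a CR-LF line-break in the file becomes a single line-feed in memory, which matches the line-feed only convention of the string types.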
</P><IMG SRC="Images/Border.png"><P>
</P><H2> string_save</H2><P>Saving text to a file using string_save lets you select the encodings for characters and line-breaks.
By default, text is stored as UTF-8 (only takes more space when needed) with a byte order mark (so that other programs know that it's UTF-8) and CR-LF line-breaks (so that it can be read on all major desktop systems).
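</P><P>
As a rough picture of what the default settings produce, this sketch turns a UTF-32 string with line-feed line-breaks into UTF-8 bytes with a byte order mark and CR-LF line-breaks.
appendUtf8 and encodeForSaving are hypothetical helpers that assume valid code points; they are not the functions behind string_save.
</P>

```cpp
#include <string>

// Hypothetical helper, not part of DFPSR.
// Appends one code point as UTF-8, assuming a valid code point up to U+10FFFF.
void appendUtf8(std::string &out, char32_t c) {
    if (c < 0x80) {
        out.push_back(static_cast<char>(c));
    } else if (c < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (c >> 6)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else if (c < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (c >> 12)));
        out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (c >> 18)));
        out.push_back(static_cast<char>(0x80 | ((c >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
    }
}

// Hypothetical helper, not part of DFPSR.
// Produces the file content: UTF-8 byte order mark, then the text with CR-LF line-breaks.
std::string encodeForSaving(const std::u32string &text) {
    std::string out = "\xEF\xBB\xBF";
    for (char32_t c : text) {
        if (c == U'\n') {
            out += "\r\n"; // A line-feed in memory becomes CR-LF in the file.
        } else {
            appendUtf8(out, c);
        }
    }
    return out;
}
```

<P>
Characters in the ASCII range still take one byte each in the output, which is what "only takes more space when needed" refers to.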
</P><P>
</P><IMG SRC="Images/Border.png"><P>
</P><H2> dsr::String</H2><P>String is the dynamic text container based on reference counting and immutability.
It guarantees that a heap-allocated buffer exists when length > 0.
Assigning a String to another will make a shallow copy and increase the buffer's reference count.
Appending more text to, or clearing, a String that shares its buffer with others will clone the buffer to prevent it from overwriting other strings.
Splitting a String will use reference counting to refer to the same allocation.
Splitting a literal will first create a new heap allocation and then refer to it from all new elements.
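</P><P>
The copy-on-write behavior described above can be imitated with std::shared_ptr.
SharedText is a hypothetical class written for this page, it is not how dsr::String is implemented, and the use_count check is only meaningful in single-threaded code.
</P>

```cpp
#include <memory>
#include <string>

// Hypothetical class for illustration, not part of DFPSR.
class SharedText {
public:
    SharedText() : buffer(std::make_shared<std::u32string>()) {}
    explicit SharedText(const std::u32string &content)
    : buffer(std::make_shared<std::u32string>(content)) {}
    // The implicit copy constructor and assignment make shallow copies,
    // which only increases the reference count of the shared buffer.

    // Clones the buffer first if someone else is also referring to it.
    void append(const std::u32string &more) {
        if (buffer.use_count() > 1) {
            buffer = std::make_shared<std::u32string>(*buffer);
        }
        *buffer += more;
    }
    const std::u32string &content() const { return *buffer; }
    bool sharesBufferWith(const SharedText &other) const { return buffer == other.buffer; }
private:
    std::shared_ptr<std::u32string> buffer;
};
```

<P>
Copying a SharedText is cheap, and the copy only pays for a new allocation at the moment it is modified, so the original never sees the appended text.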
</P><IMG SRC="Images/Border.png"><P>
</P><H2> dsr::ReadableString</H2><P>ReadableString is used instead of String as an input argument so that U"" literals can be given without creating a new allocation.
Accidentally giving a regular "" literal (not UTF-32) will be rejected instead of automatically converted.
If you want to accept "" literals and automatically allocate a buffer for the UTF-32 conversion, then just use String.
Think of String as a value and ReadableString as a constant reference.
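</P><P>
One way to get this kind of behavior in plain C++ is a view type that can only be constructed from char32_t arrays.
TextView and countOccurrences are hypothetical names for this sketch, and dsr::ReadableString may achieve the same effect by other means.
</P>

```cpp
#include <cstddef>

// Hypothetical type for illustration, not part of DFPSR.
// A read-only view that refers to existing UTF-32 data without allocating.
struct TextView {
    const char32_t *data;
    std::size_t length;
    // Accepts U"" literals. N counts the null terminator, which the view leaves out.
    template <std::size_t N>
    TextView(const char32_t (&literal)[N]) : data(literal), length(N - 1) {}
};

// Takes the view by constant reference, so calling it with a U"" literal allocates nothing.
std::size_t countOccurrences(const TextView &text, char32_t wanted) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.length; i++) {
        if (text.data[i] == wanted) {
            count++;
        }
    }
    return count;
}
```

<P>
Calling countOccurrences(U"banana", U'a') compiles and counts three matches, while countOccurrences("banana", U'a') fails to compile in this sketch because there is no conversion from a regular "" literal to TextView.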
</P><IMG SRC="Images/Border.png"><P>
</P><H2> dsr::Printable</H2><P>Inheriting from Printable and defining toStreamIndented allows printing your type using printText (prints to standard output), debugText (only prints in debug mode) and throwError (throws std::runtime_error).
</P><P>
For non-virtual types, you can define string_toStreamIndented with an overload to keep the type simple.
</P><P>
Each of these printing methods allows passing multiple arguments separated by commas.
To print to a new String, give a number of arguments to string_combine.
If you want to keep the existing content and add more text at the end, use string_append.
If appending a character, you probably don't want to print its numerical value, so call string_appendChar for each character being added.
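</P><P>
The reason for a separate character function can be shown with a small variadic sketch, in which a character passed as a general argument is treated as an integer.
The names appendOne, appendAll, combine and appendChar are hypothetical stand-ins built on std::u32string; they only mimic the roles of string_append, string_combine and string_appendChar and say nothing about how DFPSR resolves its overloads.
</P>

```cpp
#include <string>

// Hypothetical helpers for illustration, not part of DFPSR.
void appendOne(std::u32string &target, const std::u32string &text) { target += text; }
void appendOne(std::u32string &target, const char32_t *text) { target += text; }
// Integers are printed as decimal digits.
void appendOne(std::u32string &target, int value) {
    for (char digit : std::to_string(value)) {
        target.push_back(static_cast<char32_t>(digit));
    }
}

// Adds the character itself instead of its numerical value.
void appendChar(std::u32string &target, char32_t character) { target.push_back(character); }

// Appends any number of arguments to the end of existing content.
template <typename... Args>
void appendAll(std::u32string &target, const Args &... args) {
    (appendOne(target, args), ...);
}

// Prints any number of arguments into a new string.
template <typename... Args>
std::u32string combine(const Args &... args) {
    std::u32string result;
    appendAll(result, args...);
    return result;
}
```

<P>
In this sketch, appendAll(text, U'A') picks the integer overload and adds the digits 65, while appendChar(text, U'A') adds the letter.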
</P><P>
Unlike the << operation, toStreamIndented can take an indentation argument, which makes it faster and easier to serialize types into files.
Just let each line begin with the given indentation, and pass that indentation plus your own on to the child components' indentation arguments recursively.
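</P><P>
The recursive indentation pattern can be sketched with a mock interface.
The Printable struct below is a simplified stand-in that uses std::string for brevity, and its method signature is an assumption; see stringAPI.h for the real declaration. Point and Line are hypothetical example types.
</P>

```cpp
#include <string>

// Mock interface for illustration only. The signature is an assumption
// and std::string stands in for the library's own string types.
struct Printable {
    virtual ~Printable() = default;
    virtual std::string &toStreamIndented(std::string &target, const std::string &indentation) const = 0;
};

// Hypothetical leaf type.
struct Point : Printable {
    int x, y;
    Point(int x, int y) : x(x), y(y) {}
    std::string &toStreamIndented(std::string &target, const std::string &indentation) const override {
        // Every line begins with the indentation given by the caller.
        target += indentation + "Point(" + std::to_string(x) + ", " + std::to_string(y) + ")\n";
        return target;
    }
};

// Hypothetical parent type with two child components.
struct Line : Printable {
    Point start, end;
    Line(const Point &start, const Point &end) : start(start), end(end) {}
    std::string &toStreamIndented(std::string &target, const std::string &indentation) const override {
        target += indentation + "Line\n";
        // The children get the given indentation plus one extra tab.
        start.toStreamIndented(target, indentation + "\t");
        end.toStreamIndented(target, indentation + "\t");
        return target;
    }
};
```

<P>
Printing a Line with a given indentation puts that indentation before "Line" and one more tab before each Point, without the Point type knowing how deeply it is nested.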
</P><IMG SRC="Images/Border.png"><P>
</P>
</BODY> </HTML>