Implemented support for Unicode in filenames (#35)

* Implemented support for Unicode file paths.

* Updated information on what needs to change when porting.

* Added the option of ending converted strings with a null terminator.

* Using a C API for file access to work with MinGW on Windows.
Dawoodoz 3 years ago
commit 14bab0e68d
5 changed files with 78 additions and 64 deletions
  1. README.md (+2 -2)
  2. Source/DFPSR/api/fileAPI.cpp (+34 -36)
  3. Source/DFPSR/api/fileAPI.h (+2 -1)
  4. Source/DFPSR/api/stringAPI.cpp (+35 -24)
  5. Source/DFPSR/api/stringAPI.h (+5 -1)

+ 2 - 2
README.md

@@ -1,5 +1,5 @@
 # DFPSR
-A modern software rendering library for C++14 using SSE/NEON created by David Forsgren Piuva. If you're looking for the latest mainstream fad, look elsewhere. This is a library for quality software meant to be developed over multiple decades and survive your grandchildren with minimal maintenance. Just like carving your legacy into stone, it takes more effort to master the skill but gives a more robust result by not relying on a far away library.
+A modern software rendering library for C++14 using SSE/NEON created by David Forsgren Piuva. If you're looking for the latest mainstream fad, look elsewhere. This is a library for quality software meant to be developed over multiple decades and survive your grandchildren with minimal maintenance. Just like carving your legacy into stone, it takes more effort to master the skill but gives a more robust result by not relying on a far away library. Maximum user experience and minimum system dependency.
 
 The official website:
 [dawoodoz.com](https://dawoodoz.com)
@@ -16,7 +16,7 @@ Real-time dynamic light with depth-based casted shadows and normal mapping at 45
 * **Determinism down to machine precision** means that if it worked on one computer, it will probably work the same on another computer.
 * **Often faster than the monitor's refresh rate** for isometric graphics with dynamic light. Try the Sandbox SDK example compiled in release mode on Ubuntu or Manjaro to check if it's smooth on your CPU. Quad-core Intel Core I5 should be fast enough in reasonable resolutions, hexa-core I5 will have plenty of performance and octa-core I7 is butter smooth even in high resolutions.
 * **Low latency for retro 2D graphics** using the CPU's higher frequency for low resolutions. There are no hardware limits other than CPU cycles and memory. Render to textures, apply deferred light filters or write your modified rendering pipeline for custom geometry formats.
-* **Create your legacy.** Make software that future generations might be able to port, compile and run natively without the need for emulators. Each new operating system is supported by implementing a single module with basic mouse, keyboard and window interaction. Portable key-codes are mapped to equivalents of the most essential keys. Just make sure to keep file paths and external dependencies to a minimum.
+* **Create your legacy.** Make software that future generations might be able to port, compile and run natively without the need for emulators. Each new operating system is supported by entering some information about the file system in fileAPI.cpp (path encoding, separator syntax...) and implementing a wrapper module in Source/windowManagers (mouse, keyboard, canvas, full-screen...). This standardization of minimal system dependency makes it easy to repair things and port to new systems on your own, rather than having everyone porting their own subset of a feature bloated media layer when popularity dies out.
 
 ## More than a graphics API, less than a graphics engine
 It is a rendering API, image processing framework and graphical user interface system in a static C++14 library meant to minimize the use of dynamic dependencies in long-term projects while still offering the power to make your own abstractions on top of low-level rendering operations. The core library itself is pure math on a hardware abstraction and can be compiled on most systems using GNU's C++14.

+ 34 - 36
Source/DFPSR/api/fileAPI.cpp

@@ -1,6 +1,6 @@
 // zlib open source license
 //
-// Copyright (c) 2020 David Forsgren Piuva
+// Copyright (c) 2020 to 2022 David Forsgren Piuva
 // 
 // This software is provided 'as-is', without any express or implied
 // warranty. In no event will the authors be held liable for any damages
@@ -28,32 +28,36 @@
 
 namespace dsr {
 
-// TODO: Try converting to UTF-8 for file names, which would only have another chance at working
-static char toAscii(DsrChar c) {
-	if (c > 127) {
-		return '?';
-	} else {
-		return c;
+// If porting to a new operating system that is not following Posix standard, list how the file system works here.
+// * pathSeparator is the token used to separate folders in the system, expressed as a UTF-32 string literal.
+// * accessFile is the function for opening a file using the UTF-32 filename, for reading or writing.
+//   The C API is used for access, because some C++ standard library implementations don't support wide strings for MS-Windows.
+#if defined(WIN32) || defined(_WIN32)
+	#include <windows.h>
+	static const char32_t* pathSeparator = U"\\";
+	static FILE* accessFile(const ReadableString &filename, bool write) {
+		Buffer pathBuffer = string_saveToMemory(filename, CharacterEncoding::BOM_UTF16LE, LineEncoding::CrLf, false, true);
+		return _wfopen((const wchar_t*)buffer_dangerous_getUnsafeData(pathBuffer), write ? L"wb" : L"rb");
 	}
-}
-#define TO_RAW_ASCII(TARGET, SOURCE) \
-	char TARGET[string_length(SOURCE) + 1]; \
-	for (int i = 0; i < string_length(SOURCE); i++) { \
-		TARGET[i] = toAscii(SOURCE[i]); \
-	} \
-	TARGET[string_length(SOURCE)] = '\0';
+#else
+	static const char32_t* pathSeparator = U"/";
+	static FILE* accessFile(const ReadableString &filename, bool write) {
+		Buffer pathBuffer = string_saveToMemory(filename, CharacterEncoding::BOM_UTF8, LineEncoding::CrLf, false, true);
+		return fopen((const char*)buffer_dangerous_getUnsafeData(pathBuffer), write ? "wb" : "rb");
+	}
+#endif
 
 Buffer file_loadBuffer(const ReadableString& filename, bool mustExist) {
-	// TODO: Load files using Unicode filenames when available
-	TO_RAW_ASCII(asciiFilename, filename);
-	std::ifstream fileStream(asciiFilename, std::ios_base::in | std::ios_base::binary);
-	if (fileStream.is_open()) {
-		// Get the file's length and allocate an array for the raw encoding
-		fileStream.seekg (0, fileStream.end);
-		int64_t fileLength = fileStream.tellg();
-		fileStream.seekg (0, fileStream.beg);
-		Buffer buffer = buffer_create(fileLength);
-		fileStream.read((char*)buffer_dangerous_getUnsafeData(buffer), fileLength);
+	FILE *file = accessFile(filename, false);
+	if (file != nullptr) {
+		// Get the file's size by going to the end, measuring, and going back
+		fseek(file, 0L, SEEK_END);
+		int64_t fileSize = ftell(file);
+		rewind(file);
+		// Allocate a buffer of the file's size
+		Buffer buffer = buffer_create(fileSize);
+		fread((void*)buffer_dangerous_getUnsafeData(buffer), fileSize, 1, file);
+		fclose(file);
 		return buffer;
 	} else {
 		if (mustExist) {
@@ -65,27 +69,21 @@ Buffer file_loadBuffer(const ReadableString& filename, bool mustExist) {
 }
 
 void file_saveBuffer(const ReadableString& filename, Buffer buffer) {
-	// TODO: Save files using Unicode filenames
 	if (!buffer_exists(buffer)) {
 		throwError(U"buffer_save: Cannot save a buffer that doesn't exist to a file.\n");
 	} else {
-		TO_RAW_ASCII(asciiFilename, filename);
-		std::ofstream fileStream(asciiFilename, std::ios_base::out | std::ios_base::binary);
-		if (fileStream.is_open()) {
-			fileStream.write((char*)buffer_dangerous_getUnsafeData(buffer), buffer_getSize(buffer));
-			fileStream.close();
+		FILE *file = accessFile(filename, true);
+		if (file != nullptr) {
+			fwrite((void*)buffer_dangerous_getUnsafeData(buffer), buffer_getSize(buffer), 1, file);
+			fclose(file);
 		} else {
-			throwError("Failed to save ", filename, "\n");
+			throwError("Failed to save ", filename, ".\n");
 		}
 	}
 }
 
 const char32_t* file_separator() {
-	#if defined(WIN32) || defined(_WIN32)
-		return U"\\";
-	#else
-		return U"/";
-	#endif
+	return pathSeparator;
 }
 
 }
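
Review note: the new file_loadBuffer measures the file with fseek and ftell and then reads it with a single fread, without inspecting any return value. The sketch below shows the same measure-then-read pattern with the checks added. It is not code from this commit: loadWholeFile is a hypothetical helper, it uses std::vector instead of the library's Buffer, and it takes a path that is assumed to be already encoded for fopen. Note that ftell returns a long, which is 32 bits on MS-Windows, so very large files would need a platform-specific alternative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper, not part of DFPSR: reads a whole file into target.
// Returns false if the file cannot be opened, measured or read completely.
static bool loadWholeFile(const char *path, std::vector<uint8_t> &target) {
	assert(path != nullptr);
	FILE *file = std::fopen(path, "rb");
	if (file == nullptr) { return false; }
	bool success = false;
	// Measure the size by seeking to the end, then go back to the start.
	if (std::fseek(file, 0L, SEEK_END) == 0) {
		long fileSize = std::ftell(file);
		if (fileSize >= 0 && std::fseek(file, 0L, SEEK_SET) == 0) {
			target.resize(static_cast<std::size_t>(fileSize));
			// Ask for elements of one byte each, so that the return value is the number of bytes read.
			std::size_t bytesRead = 0;
			if (fileSize > 0) {
				bytesRead = std::fread(target.data(), 1, target.size(), file);
			}
			success = (bytesRead == target.size());
		}
	}
	std::fclose(file);
	return success;
}
```

Passing an element size of 1 and the byte count as the element count lets the caller detect a short read, whereas the (fileSize, 1) order used in the diff can only report zero or one complete element.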

+ 2 - 1
Source/DFPSR/api/fileAPI.h

@@ -1,6 +1,6 @@
 // zlib open source license
 //
-// Copyright (c) 2020 David Forsgren Piuva
+// Copyright (c) 2020 to 2022 David Forsgren Piuva
 // 
 // This software is provided 'as-is', without any express or implied
 // warranty. In no event will the authors be held liable for any damages
@@ -42,6 +42,7 @@ namespace dsr {
 	void file_saveBuffer(const ReadableString& filename, Buffer buffer);
 
 	// Get a path separator for the target operating system.
+	//   Can be used to construct a file path that works for both forward and backward slash separators.
 	const char32_t* file_separator();
 }
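
Review note: the added comment says the separator can be used to construct portable paths. A minimal sketch of what that could look like follows. joinPath is a hypothetical helper that does not exist in the library, and it uses std::u32string in place of the library's string types. The separator is passed in as an argument so that the example does not depend on the platform it is compiled for; in real code it would presumably come from file_separator().

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of DFPSR: joins a folder and a name using the given separator.
static std::u32string joinPath(const std::u32string &folder, const std::u32string &name, const char32_t *separator) {
	assert(separator != nullptr);
	if (folder.empty()) { return name; }
	std::u32string sep(separator);
	// Avoid doubling the separator when the folder already ends with it.
	bool alreadySeparated = folder.size() >= sep.size()
	  && folder.compare(folder.size() - sep.size(), sep.size(), sep) == 0;
	return alreadySeparated ? folder + name : folder + sep + name;
}
```

With a forward slash, joinPath(U"media", U"image.png", U"/") gives U"media/image.png", and a folder that already ends with the separator is not given a second one.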
 

+ 35 - 24
Source/DFPSR/api/stringAPI.cpp

@@ -620,18 +620,20 @@ static void encodeCharacter(const ByteWriterFunction &receiver, DsrChar characte
 
 // Template for encoding a whole string
 template <CharacterEncoding characterEncoding, LineEncoding lineEncoding>
-static void encodeText(const ByteWriterFunction &receiver, String content) {
-	// Write byte order marks
-	if (characterEncoding == CharacterEncoding::BOM_UTF8) {
-		receiver(0xEF);
-		receiver(0xBB);
-		receiver(0xBF);
-	} else if (characterEncoding == CharacterEncoding::BOM_UTF16BE) {
-		receiver(0xFE);
-		receiver(0xFF);
-	} else if (characterEncoding == CharacterEncoding::BOM_UTF16LE) {
-		receiver(0xFF);
-		receiver(0xFE);
+static void encodeText(const ByteWriterFunction &receiver, String content, bool writeBOM, bool writeNullTerminator) {
+	if (writeBOM) {
+		// Write byte order marks
+		if (characterEncoding == CharacterEncoding::BOM_UTF8) {
+			receiver(0xEF);
+			receiver(0xBB);
+			receiver(0xBF);
+		} else if (characterEncoding == CharacterEncoding::BOM_UTF16BE) {
+			receiver(0xFE);
+			receiver(0xFF);
+		} else if (characterEncoding == CharacterEncoding::BOM_UTF16LE) {
+			receiver(0xFF);
+			receiver(0xFE);
+		}
 	}
 	// Write encoded content
 	for (int64_t i = 0; i < string_length(content); i++) {
@@ -647,33 +649,42 @@ static void encodeText(const ByteWriterFunction &receiver, String content) {
 			encodeCharacter<characterEncoding>(receiver, character);
 		}
 	}
+	if (writeNullTerminator) {
+		// Terminate internal strings with \0 to prevent getting garbage data after unpadded buffers
+		if (characterEncoding == CharacterEncoding::BOM_UTF16BE || characterEncoding == CharacterEncoding::BOM_UTF16LE) {
+			receiver(0);
+			receiver(0);
+		} else {
+			receiver(0);
+		}
+	}
 }
 
 // Macro for converting run-time arguments into template arguments for encodeText
-#define ENCODE_TEXT(RECEIVER, CONTENT, CHAR_ENCODING, LINE_ENCODING) \
+#define ENCODE_TEXT(RECEIVER, CONTENT, CHAR_ENCODING, LINE_ENCODING, WRITE_BOM, WRITE_NULL_TERMINATOR) \
 	if (CHAR_ENCODING == CharacterEncoding::Raw_Latin1) { \
 		if (LINE_ENCODING == LineEncoding::CrLf) { \
-			encodeText<CharacterEncoding::Raw_Latin1, LineEncoding::CrLf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::Raw_Latin1, LineEncoding::CrLf>(RECEIVER, CONTENT, false, WRITE_NULL_TERMINATOR); \
 		} else if (LINE_ENCODING == LineEncoding::Lf) { \
-			encodeText<CharacterEncoding::Raw_Latin1, LineEncoding::Lf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::Raw_Latin1, LineEncoding::Lf>(RECEIVER, CONTENT, false, WRITE_NULL_TERMINATOR); \
 		} \
 	} else if (CHAR_ENCODING == CharacterEncoding::BOM_UTF8) { \
 		if (LINE_ENCODING == LineEncoding::CrLf) { \
-			encodeText<CharacterEncoding::BOM_UTF8, LineEncoding::CrLf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF8, LineEncoding::CrLf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} else if (LINE_ENCODING == LineEncoding::Lf) { \
-			encodeText<CharacterEncoding::BOM_UTF8, LineEncoding::Lf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF8, LineEncoding::Lf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} \
 	} else if (CHAR_ENCODING == CharacterEncoding::BOM_UTF16BE) { \
 		if (LINE_ENCODING == LineEncoding::CrLf) { \
-			encodeText<CharacterEncoding::BOM_UTF16BE, LineEncoding::CrLf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF16BE, LineEncoding::CrLf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} else if (LINE_ENCODING == LineEncoding::Lf) { \
-			encodeText<CharacterEncoding::BOM_UTF16BE, LineEncoding::Lf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF16BE, LineEncoding::Lf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} \
 	} else if (CHAR_ENCODING == CharacterEncoding::BOM_UTF16LE) { \
 		if (LINE_ENCODING == LineEncoding::CrLf) { \
-			encodeText<CharacterEncoding::BOM_UTF16LE, LineEncoding::CrLf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF16LE, LineEncoding::CrLf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} else if (LINE_ENCODING == LineEncoding::Lf) { \
-			encodeText<CharacterEncoding::BOM_UTF16LE, LineEncoding::Lf>(RECEIVER, CONTENT); \
+			encodeText<CharacterEncoding::BOM_UTF16LE, LineEncoding::Lf>(RECEIVER, CONTENT, WRITE_BOM, WRITE_NULL_TERMINATOR); \
 		} \
 	}
 
@@ -686,19 +697,19 @@ void dsr::string_save(const ReadableString& filename, const ReadableString& cont
 	}
 }
 
-Buffer dsr::string_saveToMemory(const ReadableString& content, CharacterEncoding characterEncoding, LineEncoding lineEncoding) {
+Buffer dsr::string_saveToMemory(const ReadableString& content, CharacterEncoding characterEncoding, LineEncoding lineEncoding, bool writeByteOrderMark, bool writeNullTerminator) {
 	int64_t byteCount = 0;
 	ByteWriterFunction counter = [&byteCount](uint8_t value) {
 		byteCount++;
 	};
-	ENCODE_TEXT(counter, content, characterEncoding, lineEncoding);
+	ENCODE_TEXT(counter, content, characterEncoding, lineEncoding, writeByteOrderMark, writeNullTerminator);
 	Buffer result = buffer_create(byteCount);
 	SafePointer<uint8_t> byteWriter = buffer_getSafeData<uint8_t>(result, "Buffer for string encoding");
 	ByteWriterFunction receiver = [&byteWriter](uint8_t value) {
 		*byteWriter = value;
 		byteWriter += 1;
 	};
-	ENCODE_TEXT(receiver, content, characterEncoding, lineEncoding);
+	ENCODE_TEXT(receiver, content, characterEncoding, lineEncoding, writeByteOrderMark, writeNullTerminator);
 	return result;
 }
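
Review note: string_saveToMemory runs the encoder twice, first with a receiver that only counts bytes and then with one that writes into a buffer of exactly that size. The sketch below isolates that two-pass pattern for UTF-8 with the two new flags. It is a simplified stand-in, not the library's implementation: ByteReceiver, encodeUtf8, encodeAll and saveToMemory are hypothetical names, std::vector replaces Buffer, line-ending conversion is left out, and surrogate code points are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using ByteReceiver = std::function<void(uint8_t)>;

// Sends the UTF-8 encoding of one code point to the receiver.
static void encodeUtf8(const ByteReceiver &receiver, char32_t c) {
	assert(c <= 0x10FFFF);
	if (c < 0x80) {
		receiver(uint8_t(c));
	} else if (c < 0x800) {
		receiver(uint8_t(0xC0 | (c >> 6)));
		receiver(uint8_t(0x80 | (c & 0x3F)));
	} else if (c < 0x10000) {
		receiver(uint8_t(0xE0 | (c >> 12)));
		receiver(uint8_t(0x80 | ((c >> 6) & 0x3F)));
		receiver(uint8_t(0x80 | (c & 0x3F)));
	} else {
		receiver(uint8_t(0xF0 | (c >> 18)));
		receiver(uint8_t(0x80 | ((c >> 12) & 0x3F)));
		receiver(uint8_t(0x80 | ((c >> 6) & 0x3F)));
		receiver(uint8_t(0x80 | (c & 0x3F)));
	}
}

// Optional byte order mark, then the content, then an optional null terminator.
static void encodeAll(const ByteReceiver &receiver, const std::u32string &content, bool writeBOM, bool writeNullTerminator) {
	if (writeBOM) { receiver(0xEF); receiver(0xBB); receiver(0xBF); }
	for (char32_t c : content) { encodeUtf8(receiver, c); }
	if (writeNullTerminator) { receiver(0); }
}

static std::vector<uint8_t> saveToMemory(const std::u32string &content, bool writeBOM, bool writeNullTerminator) {
	// First pass: count the bytes without storing them.
	std::size_t byteCount = 0;
	encodeAll([&byteCount](uint8_t) { byteCount++; }, content, writeBOM, writeNullTerminator);
	// Second pass: write into an allocation of exactly the counted size.
	std::vector<uint8_t> result(byteCount);
	std::size_t writeIndex = 0;
	encodeAll([&result, &writeIndex](uint8_t value) { result[writeIndex++] = value; }, content, writeBOM, writeNullTerminator);
	return result;
}
```

Because both passes go through the same function with the same flags, the counted size and the written size cannot drift apart when a flag such as the null terminator is added.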
 

+ 5 - 1
Source/DFPSR/api/stringAPI.h

@@ -290,9 +290,13 @@ void string_save(const ReadableString& filename, const ReadableString& content,
   LineEncoding lineEncoding = LineEncoding::CrLf
 );
 // Encode the string and keep the raw buffer instead of saving it to a file.
+// Disabling writeByteOrderMark can be done when the result is cast to a native string for platform specific APIs, where a BOM is not allowed.
+// Enabling writeNullTerminator should be done when using the result as a pointer, so that the length is known when the buffer does not have padding.
 Buffer string_saveToMemory(const ReadableString& content,
   CharacterEncoding characterEncoding = CharacterEncoding::BOM_UTF8,
-  LineEncoding lineEncoding = LineEncoding::CrLf
+  LineEncoding lineEncoding = LineEncoding::CrLf,
+  bool writeByteOrderMark = true,
+  bool writeNullTerminator = false
 );
 
 // Post-condition: Returns true iff strings a and b are exactly equal.
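
Review note: the two new parameters matter most for the MS-Windows path in fileAPI.cpp, where the result is handed to a wide-character API as UTF-16LE without a byte order mark and with a two-byte terminator. The sketch below illustrates the resulting byte layout, including surrogate pairs for code points above U+FFFF. toUtf16LE is a hypothetical helper written for this note, not the library's encoder, and it returns a std::vector instead of a Buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of DFPSR: encodes UTF-32 content as UTF-16LE bytes.
static std::vector<uint8_t> toUtf16LE(const std::u32string &content, bool writeByteOrderMark, bool writeNullTerminator) {
	std::vector<uint8_t> bytes;
	// Little-endian: the least significant byte of each 16-bit unit comes first.
	auto writeUnit = [&bytes](uint16_t unit) {
		bytes.push_back(uint8_t(unit & 0xFF));
		bytes.push_back(uint8_t(unit >> 8));
	};
	if (writeByteOrderMark) { writeUnit(0xFEFF); }
	for (char32_t c : content) {
		assert(c <= 0x10FFFF);
		if (c < 0x10000) {
			writeUnit(uint16_t(c));
		} else {
			// Code points above U+FFFF are split into a high and a low surrogate.
			char32_t offset = c - 0x10000;
			writeUnit(uint16_t(0xD800 | (offset >> 10)));
			writeUnit(uint16_t(0xDC00 | (offset & 0x3FF)));
		}
	}
	// A UTF-16 terminator is one whole 16-bit unit, so two zero bytes.
	if (writeNullTerminator) { writeUnit(0); }
	return bytes;
}
```

Calling toUtf16LE(U"A", false, true) yields the four bytes 41 00 00 00, which is the layout a null-terminated wide string is expected to have on a system with 16-bit wchar_t.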