pascal/laz-doublcmd (v1.1.32): Double Commander is a free, cross-platform, open source file manager with two panels side by side. Website: doublecmd.sourceforge.io

Alexander Koblov 4363a1eb9b UPD: Component address		3 years ago
src	80d795cfaf FIX: Disable UCS2BELangModel and UCS2LELangModel, it is often mistaken	6 years ago
Licence.txt	aa6f55a44a UPD: Set svn:eol-style property to "native"	13 years ago
ReadMe.txt	4363a1eb9b UPD: Component address	3 years ago
chsdet.lpk	877ea0897a UPD: Charset Detector	6 years ago
chsdet.pas	877ea0897a UPD: Charset Detector	6 years ago

ReadMe.txt

-----------Summary
Charset Detector, as the name says, is a stand-alone component for automatic charset detection of a given text.
It can be useful for internationalisation support in multilingual applications such as web-script editors or Unicode editors.
The given input buffer is analysed to guess the encoding it uses. The result can be used as a control parameter for a charset conversion procedure.
Based on Mozilla's i18n component - https://dxr.mozilla.org/mozilla/source/extensions/universalchardet/.
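
Since the detected code page is meant to drive a conversion step, below is a minimal sketch of such a step in Free Pascal. It assumes FPC 3.x, where RawByteString carries a code page and the System unit's SetCodePage can relabel or convert it; this is plain RTL functionality, not part of Charset Detector, and ToUTF8 is a hypothetical helper name. On Unix the cwstring unit supplies the conversion support. UTF-16 and UTF-32 results are rejected here because they are not byte-oriented text and need separate handling.

```pascal
program ConvertSketch;

{$mode objfpc}{$H+}

uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager used for code page conversion on Unix
  SysUtils;

// Hypothetical helper: label Raw with the detected Windows code page,
// then let the FPC RTL convert the bytes to UTF-8.
function ToUTF8(const Raw: RawByteString; CodePage: Integer): RawByteString;
begin
  Result := Raw;
  case CodePage of
    0, 65001:
      // ASCII (pseudo code page 0) and UTF-8 need no byte conversion;
      // just label the string as UTF-8.
      SetCodePage(Result, CP_UTF8, False);
    1200, 1201, 12000, 12001:
      // UTF-16/UTF-32 data is not byte-oriented text; not handled here.
      raise EConvertError.CreateFmt('Code page %d is not handled by this sketch', [CodePage]);
  else
    SetCodePage(Result, TSystemCodePage(CodePage), False); // declare the detected encoding
    SetCodePage(Result, CP_UTF8, True);                    // convert the bytes to UTF-8
  end;
end;

var
  Utf8: RawByteString;
begin
  // The Cyrillic word "При" encoded in windows-1251 ($CF $F0 $E8)
  Utf8 := ToUTF8(#$CF#$F0#$E8, 1251);
  WriteLn('UTF-8 length in bytes: ', Length(Utf8));
end.
```

In real use, the CodePage argument would come from the CodePage field of the detection result.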

-----------State
Version 0.2.9 stable.
Copyright (C) 2011-2019 Alexander Koblov
The latest version can be found at https://github.com/doublecmd/doublecmd/tree/master/components/chsdet/.

-----------Original
Based on
Charset Detector - http://chsdet.sourceforge.net
Copyright (C) 2006-2013 Nikolaj Yakowlew

-----------Requirements
Charset Detector doesn't need any external components.

-----------Output
As a result, you get the guessed charset as an MS Windows code page ID and a charset name.

-----------Licence
Charset Detector is an open source project distributed under the GNU LGPL.
See the GNU Lesser General Public License for more details - https://opensource.org/licenses/LGPL-2.1

-----------Supported charsets

 +-----------+---------------------------+------------------------+
 | Code page |           Name            |      Note              |
 +-----------+---------------------------+------------------------+
 |      0    |  ASCII                    |   Pseudo code page.    |
 |    855    |  IBM855                   |                        |
 |    866    |  IBM866                   |                        |
 |    932    |  Shift_JIS                |                        |
 |    950    |  Big5                     |                        |
 |   1200    |  UTF-16LE                 |                        |
 |   1201    |  UTF-16BE                 |                        |
 |   1251    |  windows-1251             |                        |
 |   1252    |  windows-1252             |                        |
 |   1253    |  windows-1253             |                        |
 |   1255    |  windows-1255             |                        |
 |  10007    |  x-mac-cyrillic           |                        |
 |  12000    |  X-ISO-10646-UCS-4-2143   |                        |
 |  12000    |  UTF-32LE                 |                        |
 |  12001    |  X-ISO-10646-UCS-4-3412   |                        |
 |  12001    |  UTF-32BE                 |                        |
 |  20866    |  KOI8-R                   |                        |
 |  28595    |  ISO-8859-5               |                        |
 |  28597    |  ISO-8859-7               |                        |
 |  28598    |  ISO-8859-8               |                        |
 |  50222    |  ISO-2022-JP              |                        |
 |  50225    |  ISO-2022-KR              |                        |
 |  50227    |  ISO-2022-CN              |                        |
 |  51932    |  EUC-JP                   |                        |
 |  51936    |  x-euc-tw                 |                        |
 |  51949    |  EUC-KR                   |                        |
 |  52936    |  HZ-GB-2312               |                        |
 |  54936    |  GB18030                  |                        |
 |  65001    |  UTF-8                    |                        |
 +-----------+---------------------------+------------------------+
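
Code that consumes the detector's result often only needs to know whether the text is already Unicode. The sketch below classifies code page IDs taken from the table above; IsUnicodeCodePage and IsUTF8Compatible are hypothetical helper names, not part of the library.

```pascal
program CodePageKinds;

{$mode objfpc}{$H+}

// Hypothetical helper: True if the code page ID denotes one of the
// Unicode encodings listed in the table (UTF-8, UTF-16, UTF-32/UCS-4).
function IsUnicodeCodePage(CodePage: Integer): Boolean;
begin
  case CodePage of
    1200, 1201,   // UTF-16LE, UTF-16BE
    12000, 12001, // UTF-32LE/BE and the unusual UCS-4 byte orders
    65001:        // UTF-8
      Result := True;
  else
    Result := False;
  end;
end;

// Hypothetical helper: True if the bytes can be used as UTF-8 unchanged.
// ASCII (pseudo code page 0) is a subset of UTF-8.
function IsUTF8Compatible(CodePage: Integer): Boolean;
begin
  Result := (CodePage = 0) or (CodePage = 65001);
end;

begin
  WriteLn(IsUnicodeCodePage(1201)); // TRUE
  WriteLn(IsUnicodeCodePage(1251)); // FALSE
  WriteLn(IsUTF8Compatible(0));     // TRUE
end.
```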

-----------Types
Return values

  NS_OK = 0;
  NS_ERROR_OUT_OF_MEMORY = $8007000e;
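
A sketch of a detection call that checks the result of HandleData. It assumes, as these constants suggest, that HandleData is a function returning NS_OK on success, and that NS_OK, rCharsetInfo and TnsUniversalDetector are declared in units named nsCore and nsUniversalDetector; check the src directory for the actual unit names. DetectEncodingChecked is a hypothetical name.

```pascal
program CheckedDetect;

{$mode objfpc}{$H+}

uses
  SysUtils,
  nsCore,              // assumed unit declaring NS_OK and rCharsetInfo
  nsUniversalDetector; // assumed unit declaring TnsUniversalDetector

// Hypothetical helper: one-shot detection that treats any HandleData
// result other than NS_OK (e.g. NS_ERROR_OUT_OF_MEMORY) as a failure.
function DetectEncodingChecked(const S: AnsiString): rCharsetInfo;
var
  Detector: TnsUniversalDetector;
begin
  Detector := TnsUniversalDetector.Create;
  try
    Detector.Reset;
    if Detector.HandleData(PAnsiChar(S), Length(S)) <> NS_OK then
      raise Exception.Create('Charset detection failed');
    // Force a decision if the detector has not reached one yet
    if not Detector.Done then
      Detector.DataEnd;
    Result := Detector.GetDetectedCharsetInfo;
  finally
    Detector.Free;
  end;
end;

var
  Info: rCharsetInfo;
begin
  Info := DetectEncodingChecked('plain ASCII text');
  WriteLn(AnsiString(Info.Name), ' (code page ', Info.CodePage, ')');
end.
```

Comparing against NS_OK instead of listing each error constant keeps the check independent of the exact integer type the library uses for its results.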

Returned types

  rCharsetInfo = record
    Name: PAnsiChar;      // Charset GNU canonical name
    CodePage: Integer;    // MS Windows CodePage ID
    Language: PAnsiChar;
  end;
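
A small sketch of turning the record fields into a log message. To compile on its own it declares a local copy of rCharsetInfo; in real code, use the type from the library. DescribeCharset is a hypothetical helper. Name and Language are PAnsiChar, so they are converted to strings explicitly and treated as possibly nil (a nil PAnsiChar converts to an empty string).

```pascal
program DescribeSketch;

{$mode objfpc}{$H+}

uses
  SysUtils;

type
  // Local copy of the record documented above, so this sketch compiles alone.
  // In real code, use rCharsetInfo from the library instead.
  rCharsetInfo = record
    Name: PAnsiChar;      // Charset GNU canonical name
    CodePage: Integer;    // MS Windows CodePage ID
    Language: PAnsiChar;
  end;

// Hypothetical helper: format a detection result for logging.
function DescribeCharset(const Info: rCharsetInfo): string;
var
  Lang: string;
begin
  Lang := string(Info.Language); // nil PAnsiChar converts to ''
  Result := Format('%s (code page %d)', [string(Info.Name), Info.CodePage]);
  if Lang <> '' then
    Result := Result + ', language: ' + Lang;
end;

var
  Info: rCharsetInfo;
begin
  Info.Name := 'windows-1251';
  Info.CodePage := 1251;
  Info.Language := nil;
  WriteLn(DescribeCharset(Info)); // windows-1251 (code page 1251)
end.
```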

-----------Usage sample

Below is a small usage sample in Free Pascal.

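uses
  SysUtils,            // FreeAndNil
  nsCore,              // rCharsetInfo (unit name assumed, check src)
  nsUniversalDetector; // TnsUniversalDetector (unit name assumed, check src)

// Guess the charset of the bytes stored in S.
function DetectEncoding(const S: String): rCharsetInfo;
var
  Detector: TnsUniversalDetector;
begin
  Detector:= TnsUniversalDetector.Create;
  try
    Detector.Reset;
    // Feed the whole buffer at once
    Detector.HandleData(PAnsiChar(S), Length(S));
    // Force a decision if the detector has not reached one yet
    if not Detector.Done then Detector.DataEnd;
    Result:= Detector.GetDetectedCharsetInfo;
  finally
    FreeAndNil(Detector);
  end;
end;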
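
For large files it may be unnecessary to read everything. The sketch below feeds a file to the detector in 64 KiB chunks and stops once Done is True. It assumes HandleData can be called repeatedly on consecutive chunks, as in Mozilla's universalchardet from which Charset Detector is derived, and that the types are declared in units named nsCore and nsUniversalDetector (an assumption; check src). DetectFileEncoding and the chunk size are hypothetical choices.

```pascal
program DetectFile;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils,
  nsCore,              // assumed unit declaring rCharsetInfo
  nsUniversalDetector; // assumed unit declaring TnsUniversalDetector

// Hypothetical helper: detect a file's charset, reading it chunk by chunk
// and stopping early once the detector has made a decision.
function DetectFileEncoding(const FileName: string): rCharsetInfo;
const
  ChunkSize = 64 * 1024;
var
  Detector: TnsUniversalDetector;
  Stream: TFileStream;
  Buffer: array of AnsiChar;
  Count: Longint;
begin
  SetLength(Buffer, ChunkSize);
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Detector := TnsUniversalDetector.Create;
    try
      Detector.Reset;
      repeat
        Count := Stream.Read(Buffer[0], ChunkSize);
        if Count > 0 then
          Detector.HandleData(PAnsiChar(@Buffer[0]), Count);
      until (Count <= 0) or Detector.Done;
      // Force a decision if the whole file was read without one
      if not Detector.Done then
        Detector.DataEnd;
      Result := Detector.GetDetectedCharsetInfo;
    finally
      Detector.Free;
    end;
  finally
    Stream.Free;
  end;
end;

var
  Info: rCharsetInfo;
begin
  if ParamCount >= 1 then
  begin
    Info := DetectFileEncoding(ParamStr(1));
    WriteLn(string(Info.Name), ' (code page ', Info.CodePage, ')');
  end
  else
    WriteLn('Usage: detectfile <file>');
end.
```

Stopping at Done avoids reading the rest of a large file once the result is settled; DataEnd is still called when the file ends first, mirroring the usage sample.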