Browse Source

* Patch from Sergei Gorelkin:
fcl-xml/src/dom.pp: resolved a number of Level 1 conformance issues:

* Node.Normalize always deletes empty text nodes
* Node.Normalize is recursive into Attributes
* Node.InsertBefore corrected exception code in case when RefChild is
not one of node's children
+ Node.InsertBefore added missing check for possible cycle in tree
+ Node.AppendChild and Node.InsertBefore added checking type of NewChild
+ CloneNode enabled for Fragment and Entity
- CloneNode deleted for DocumentType (w3 specs directly prohibit cloning
it between documents, and cloning within one document is claimed
'implementation specific' - but makes no sense).
+ Node.ImportNode is now working

* Uncommented Level 2 node properties (NamespaceURI, localName and
Prefix), this caused a name clash and a lot of function argument
renames.

fcl-xml/src/xmlutils.pp:

+ overloaded IsXmlName() that accepts PWideChars

fcl-xml/src/xmlconf.pp

* Applied a fix similar to xmlcfg.pp for Mantis #10554

fcl-xml/src/xmlread.pp:

* Major: Got errors reported at correct locations for all 1600+ negative
tests. Easy to say, but it required modifying almost every second
line of code.
* TContentParticle references an existing element definition instead of
storing its own name (this allows content model matching without
string comparisons).
* Resorted to old-style 'object' for TElementValidator and to plain
procedures for decoders (allows to drop almost all related memory
management).
* Moved parameter entity detection from char to token level, this
simplifies things a bit.
+ Added second level of buffering to input source (a step towards
supporting arbitrary encodings).
* The main parsing loop contains no implicit exception frames now.


fcl-xml/src/xmlwrite.pp

* Replaced the stupid indenting algorithm with a simple rule: "Do not
write indents adjacent to text nodes". Now it does not make a mess
out of the documents which were parsed with PreserveWhitespace=True.
* Use specialized node properties instead of generic ones, this
eliminates WideString copies and results in almost 2x performance
boost in Windows.
* Even more performance:
* Write line endings together with indents -> twice less calls.
* Increase slack in buffer and write strings with known length (i.e.
most of markup) without overflow checking.

fcl-xml/tests/xmlts.pp:

* Use parser options instead of dedicated procedure to 'canonicalize'
documents, the parser has become mature enough to do that.
* Fatal error in non-valid category is a test failure, as well as
validation error alone in not-wellformed category.

fcl-xml/src/README

* Brought a bit up to date

fcl-xml/tests/README

+ Added testsuite errata/issues

git-svn-id: trunk@10314 -

michael 17 years ago
parent
commit
77b38b6be5

+ 14 - 13
packages/fcl-xml/src/README

@@ -9,33 +9,34 @@ DOM level 2 extensions.
 
 XMLRead
 -------
-Provides a simple XML reader, which can read XML data from a file or stream.
-This simple parser will be replaced by a much improved one soon, which will be
-able to handle different character encodings, namespaces and entity references.
-(This parser reads the file directly, i.e. it doesn't support Unicode or
-different charsets yet.)
+Provides an XML reader, which can read XML data from a file or stream.
+The parser can read files encoded in UTF-8, UTF-16 (both endianness),
+and ISO-8859-1. It supports DTD validation.
 Regarding entity references: The pre-defined entities "lt", "gt", "amp", "apos"
-and "quot" are replaced by their normal value during reading. Other entity
-references are stored as TDOMEntityReference nodes in the DOM tree.
-Regarding whitespace handling: Whitespace directly after the beginning of a
+and "quot", and internal entities declared in DTD, are replaced by their
+defined values during reading. Ability to resolve external entities is
+currently limited to the file system.
+Regarding whitespace handling: By default, whitespace directly after the beginning of a
 tag is discarded, and sections of the XML file which contain only whitespace and
-no other text content are discarded as well.
+no other text content are discarded as well. However, whitespace-preserving
+mode can be enabled by setting TDOMParser.Options.PreserveWhitespace property to
+True.
 
 
 XMLWrite
 --------
 Writes a DOM structure as XML data into a file or stream. It can deal both with
 XML files and XML fragments.
-At the moment it supports only the node types which can be read by XMLRead.
+At the moment it supports only the UTF-8 output endcoding.
 Please note that the writer replaces some characters by entity references
 automatically:
 For normal text nodes, the following replacements will be done:
 '<' => '&lt;'
 '>' => '&gt;'
 '&' => '&amp;'
-For attribute values, additionally '"' gets replaced by '&quot;'. Single
-apostrophes (') don't need to get converted, as values are already written using
-"" quotes.
+For attribute values, additionally '"' gets replaced by '&quot;', and characters
+#9, #10 and #13 are escaped using numerical references. Single apostrophes (')
+don't need to get converted, as values are already written using "" quotes.
 The XML reader (in xmlread.pp) will convert these entity references back to
 their original characters.
 

+ 212 - 118
packages/fcl-xml/src/dom.pp

@@ -197,7 +197,7 @@ type
     FPreviousSibling, FNextSibling: TDOMNode;
     FOwnerDocument: TDOMDocument;
 
-    function  GetNodeName: DOMString; virtual;
+    function  GetNodeName: DOMString; virtual; abstract;
     function  GetNodeValue: DOMString; virtual;
     procedure SetNodeValue(const AValue: DOMString); virtual;
     function  GetFirstChild: TDOMNode; virtual;
@@ -209,6 +209,8 @@ type
     procedure SetTextContent(const AValue: DOMString); virtual;
     function GetLocalName: DOMString; virtual;
     function GetNamespaceURI: DOMString; virtual;
+    function GetPrefix: DOMString; virtual;
+    procedure SetPrefix(const Value: DOMString); virtual;
   public
     constructor Create(AOwner: TDOMDocument);
     destructor Destroy; override;
@@ -242,16 +244,11 @@ type
     function Supports(const Feature, Version: DOMString): Boolean;
     *)
     function HasAttributes: Boolean; virtual;
-    procedure Normalize;
+    procedure Normalize; virtual;
 
-    // always '' for nodes other than ELEMENT and ATTRIBUTE
-    // as well as for nodes created with DOM 1 methods
-    //property NamespaceURI: DOMString read GetNamespaceURI;
-    //property LocalName: DOMString read GetLocalName;
-    (*
-    // Prefix may only be changed if it was specified at creation time.
-    property Prefix: DOMString read FPrefix (write SetPrefix?);
-    *)
+    property NamespaceURI: DOMString read GetNamespaceURI;
+    property LocalName: DOMString read GetLocalName;
+    property Prefix: DOMString read GetPrefix write SetPrefix;
     // DOM level 3
     property TextContent: DOMString read GetTextContent write SetTextContent;
     // Extensions to DOM interface:
@@ -327,13 +324,14 @@ type
 
   TDOMNamedNodeMap = class(TObject)
   protected
-    FOwnerElement: TDOMNode;
+    FOwner: TDOMNode;
     FNodeType: Integer;
     FList: TList;
     function GetItem(index: LongWord): TDOMNode;
     function GetLength: LongWord;
     function Find(const name: DOMString; out Index: LongWord): Boolean;
     function InternalRemove(const name: DOMString): TDOMNode;
+    function ValidateInsert(arg: TDOMNode): Integer;
   public
     constructor Create(AOwner: TDOMNode; ANodeType: Integer);
     destructor Destroy; override;
@@ -399,6 +397,8 @@ type
   protected
     function GetNodeType: Integer; override;
     function GetNodeName: DOMString; override;
+  public
+    function CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode; overload; override;
   end;
 
 
@@ -445,12 +445,11 @@ type
 
     // DOM level 2 methods
     function ImportNode(ImportedNode: TDOMNode; Deep: Boolean): TDOMNode;
-    function CreateElementNS(const NamespaceURI, QualifiedName: DOMString): TDOMElement;
-    function CreateAttributeNS(const NamespaceURI, QualifiedName: DOMString): TDOMAttr;
-    function GetElementsByTagNameNS(const namespaceURI, localName: DOMString): TDOMNodeList;
+    function CreateElementNS(const nsURI, QualifiedName: DOMString): TDOMElement;
+    function CreateAttributeNS(const nsURI, QualifiedName: DOMString): TDOMAttr;
+    function GetElementsByTagNameNS(const nsURI, alocalName: DOMString): TDOMNodeList;
     function GetElementById(const ElementID: DOMString): TDOMElement;
     // Extensions to DOM interface:
-    // TODO: obsolete now, but must check for usage dependencies
     constructor Create;
     destructor Destroy; override;
     function AddID(Attr: TDOMAttr): Boolean;
@@ -492,9 +491,8 @@ type
   protected
     FName: DOMString;
     FOwnerElement: TDOMElement;
-    // TODO: following 3 - replace with a link to AttDecl ??
-    // ('specified' isn't related...)
     FSpecified: Boolean;
+    // TODO: following 2 - replace with a link to AttDecl ??    
     FDeclared: Boolean;
     FDataType: TAttrDataType;
     function  GetNodeValue: DOMString; override;
@@ -528,6 +526,7 @@ type
     destructor Destroy; override;
     function  CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode; overload; override;
     property  TagName: DOMString read FNodeName;
+    procedure Normalize; override;
     function  GetAttribute(const name: DOMString): DOMString;
     procedure SetAttribute(const name, value: DOMString);
     procedure RemoveAttribute(const name: DOMString);
@@ -538,14 +537,14 @@ type
     function  GetElementsByTagName(const name: DOMString): TDOMNodeList;
 
     // Introduced in DOM Level 2:
-    function GetAttributeNS(const namespaceURI, localName: DOMString): DOMString;
-    procedure SetAttributeNS(const namespaceURI, qualifiedName, value: DOMString);
-    procedure RemoveAttributeNS(const namespaceURI, localName: DOMString);
-    function GetAttributeNodeNS(const namespaceURI, localName: DOMString): TDOMAttr;
+    function GetAttributeNS(const nsURI, aLocalName: DOMString): DOMString;
+    procedure SetAttributeNS(const nsURI, qualifiedName, value: DOMString);
+    procedure RemoveAttributeNS(const nsURI, aLocalName: DOMString);
+    function GetAttributeNodeNS(const nsURI, aLocalName: DOMString): TDOMAttr;
     function SetAttributeNodeNS(newAttr: TDOMAttr): TDOMAttr;
-    function GetElementsByTagNameNS(const namespaceURI, localName: DOMString): TDOMNodeList;
+    function GetElementsByTagNameNS(const nsURI, aLocalName: DOMString): TDOMNodeList;
     function hasAttribute(const name: DOMString): Boolean;
-    function hasAttributeNS(const namespaceURI, localName: DOMString): Boolean;
+    function hasAttributeNS(const nsURI, aLocalName: DOMString): Boolean;
     function HasAttributes: Boolean; override;
     // extension
     function CompareName(const name: DOMString): Integer; override;
@@ -619,7 +618,6 @@ type
     function GetElementDefs: TDOMNamedNodeMap;
   public
     destructor Destroy; override;
-    function CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode; overload; override;
     property Name: DOMString read FName;
     property Entities: TDOMNamedNodeMap read GetEntities;
     property Notations: TDOMNamedNodeMap read GetNotations;
@@ -660,6 +658,7 @@ type
     function GetNodeType: Integer; override;
     function GetNodeName: DOMString; override;
   public
+    function CloneNode(deep: Boolean; aCloneOwner: TDOMDocument): TDOMNode; override;
     property PublicID: DOMString read FPublicID;
     property SystemID: DOMString read FSystemID;
     property NotationName: DOMString read FNotationName;
@@ -820,11 +819,6 @@ begin
   inherited Destroy;
 end;
 
-function TDOMNode.GetNodeName: DOMString;
-begin
-  Result := '';
-end;
-
 function TDOMNode.GetNodeValue: DOMString;
 begin
   Result := '';
@@ -899,7 +893,8 @@ end;
 
 function TDOMNode.CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode;
 begin
-  raise EDOMNotSupported.CreateFmt('CloneNode not implemented for %s', [ClassName]);
+// !! CreateFmt() does not set Code property !!
+  raise EDOMNotSupported.Create(Format('Cloning/importing of %s is not supported', [ClassName]));
   Result:=nil;
 end;
 
@@ -919,7 +914,6 @@ begin
   Result := False;
 end;
 
-// DONE: moved to TDOMNode and implemented
 procedure TDOMNode.Normalize;
 var
   Child, tmp: TDOMNode;
@@ -932,19 +926,23 @@ begin
   begin
     if Child.NodeType = TEXT_NODE then
     begin
-      if Assigned(Txt) then
-      begin
-        tmp := Child.NextSibling;
-        Txt.AppendData(TDOMText(Child).Data);
-        Txt.FMayBeIgnorable := Txt.FMayBeIgnorable and TDOMText(Child).FMayBeIgnorable;
-        RemoveChild(Child);
-        Child := tmp;
-      end
-      else
+      tmp := Child.NextSibling;
+      if TDOMText(Child).Data <> '' then
       begin
-        Txt := TDOMText(Child);
-        Child := Child.NextSibling;
-      end
+        if Assigned(Txt) then
+        begin
+          Txt.AppendData(TDOMText(Child).Data);
+          Txt.FMayBeIgnorable := Txt.FMayBeIgnorable and TDOMText(Child).FMayBeIgnorable;
+        end
+        else
+        begin
+          Txt := TDOMText(Child);
+          Child := Child.NextSibling;
+          Continue;
+        end;
+      end;
+      Child.Free;
+      Child := tmp;
     end
     else
     begin
@@ -975,6 +973,16 @@ begin
   Result := '';
 end;
 
+function TDOMNode.GetPrefix: DOMString;
+begin
+  Result := '';
+end;
+
+procedure TDOMNode.SetPrefix(const Value: DOMString);
+begin
+  // do nothing, override for Elements and Attributes
+end;
+
 function CompareDOMStrings(const s1, s2: DOMPChar; l1, l2: integer): integer;
 var i: integer;
 begin
@@ -1008,6 +1016,28 @@ begin
   Result := TDOMNode(ANode).CompareName(PDOMString(AKey)^);
 end;
 
+type
+  TNodeTypeEnum = ELEMENT_NODE..NOTATION_NODE;
+  TNodeTypeSet = set of TNodeTypeEnum;
+
+const
+  stdChildren = [TEXT_NODE, ENTITY_REFERENCE_NODE, PROCESSING_INSTRUCTION_NODE,
+                 COMMENT_NODE, CDATA_SECTION_NODE, ELEMENT_NODE];
+
+  ValidChildren: array [TNodeTypeEnum] of TNodeTypeSet = (
+   stdChildren, { element }
+   [TEXT_NODE, ENTITY_REFERENCE_NODE], { attribute }
+   [], { text }
+   [], { cdata }
+   stdChildren, { ent ref }
+   stdChildren, { entity }
+   [], { pi }
+   [], { comment }
+   [ELEMENT_NODE, DOCUMENT_TYPE_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE], { document }
+   [], { doctype }
+   stdChildren, { fragment }
+   []  { notation }
+  );
 
 function TDOMNode_WithChildren.GetFirstChild: TDOMNode;
 begin
@@ -1030,6 +1060,7 @@ function TDOMNode_WithChildren.InsertBefore(NewChild, RefChild: TDOMNode):
   TDOMNode;
 var
   Tmp: TDOMNode;
+  NewChildType: Integer;
 begin
   Result := NewChild;
 
@@ -1043,15 +1074,24 @@ begin
     raise EDOMWrongDocument.Create('NodeWC.InsertBefore');
 
   if RefChild.ParentNode <> Self then
-    raise EDOMHierarchyRequest.Create('NodeWC.InsertBefore');
+    raise EDOMNotFound.Create('NodeWC.InsertBefore');
 
-  Inc(FOwnerDocument.FRevision); // invalidate nodelists
+  NewChildType := NewChild.NodeType;
+  if not (NewChildType in [TEXT_NODE, CDATA_SECTION_NODE, COMMENT_NODE, PROCESSING_INSTRUCTION_NODE]) and (NewChild.FirstChild <> nil) then
+  begin
+    Tmp := Self;
+    while Assigned(Tmp) do
+    begin
+      if Tmp = NewChild then
+        raise EDOMHierarchyRequest.Create('NodeWC.InsertBefore (cycle in tree)');
+      Tmp := Tmp.ParentNode;
+    end;
+  end;
 
-  if Assigned(NewChild.FParentNode) then
-    NewChild.FParentNode.DetachChild(NewChild);
+  Inc(FOwnerDocument.FRevision); // invalidate nodelists
 
   // DONE: Implemented InsertBefore for DocumentFragments (except ChildNodeTree)
-  if NewChild.NodeType = DOCUMENT_FRAGMENT_NODE then
+  if NewChildType = DOCUMENT_FRAGMENT_NODE then
   begin
     // Is fragment empty?
     Tmp := NewChild.FirstChild;
@@ -1068,8 +1108,8 @@ begin
     if (RefChild = nil) or (RefChild.FPreviousSibling = nil) then
     begin  // insert at the beginning  <- AppendChild ??? ->
       // no, AppendChild will insert after RefChild and we need it before
-      if Assigned(FirstChild) then
-        FirstChild.FPreviousSibling := NewChild.LastChild;
+      if Assigned(FFirstChild) then
+        FFirstChild.FPreviousSibling := NewChild.LastChild;
       NewChild.LastChild.FNextSibling := FirstChild;
       if not Assigned(FLastChild) then
         FLastChild := NewChild.LastChild;
@@ -1089,6 +1129,12 @@ begin
     Exit;
   end;
 
+  if not (NewChildType in ValidChildren[NodeType]) then
+    raise EDOMHierarchyRequest.Create('NodeWC.InsertBefore');
+
+  if Assigned(NewChild.FParentNode) then
+    NewChild.FParentNode.DetachChild(NewChild);
+
   NewChild.FNextSibling := RefChild;
   if RefChild = FFirstChild then
     FFirstChild := NewChild
@@ -1142,25 +1188,28 @@ end;
 function TDOMNode_WithChildren.AppendChild(NewChild: TDOMNode): TDOMNode;
 var
   Tmp: TDOMNode;
+  NewChildType: Integer;
 begin
   if NewChild.FOwnerDocument <> FOwnerDocument then
     raise EDOMWrongDocument.Create('NodeWC.AppendChild');
 
-  Tmp := Self;
-  while Assigned(Tmp) do
+  // Don't walk the tree if NewChild apriori cannot be our parent.
+  // This saves a lot of CPU cache misses.
+  NewChildType := NewChild.NodeType;
+  if not (NewChildType in [TEXT_NODE, CDATA_SECTION_NODE, COMMENT_NODE, PROCESSING_INSTRUCTION_NODE]) and (NewChild.FirstChild <> nil) then
   begin
-    if Tmp = NewChild then
-      raise EDOMHierarchyRequest.Create('NodeWC.AppendChild (cycle in tree)');
-    Tmp := Tmp.ParentNode;
+    Tmp := Self;
+    while Assigned(Tmp) do
+    begin
+      if Tmp = NewChild then
+        raise EDOMHierarchyRequest.Create('NodeWC.AppendChild (cycle in tree)');
+      Tmp := Tmp.ParentNode;
+    end;
   end;
-
   Inc(FOwnerDocument.FRevision); // invalidate nodelists
 
-  if Assigned(NewChild.FParentNode) then
-    NewChild.FParentNode.DetachChild(NewChild);
-
   // DONE: supported AppendChild for DocumentFragments (except ChildNodeTree)
-  if NewChild.NodeType = DOCUMENT_FRAGMENT_NODE then
+  if NewChildType = DOCUMENT_FRAGMENT_NODE then
   begin
     Tmp := NewChild.FirstChild;
     // Is fragment empty?
@@ -1187,6 +1236,12 @@ begin
   end
   else
   begin
+    if not (NewChildType in ValidChildren[NodeType]) then
+      raise EDOMHierarchyRequest.Create('NodeWC.AppendChild');
+
+    if Assigned(NewChild.FParentNode) then
+      NewChild.FParentNode.DetachChild(NewChild);
+
     if Assigned(FFirstChild) then
     begin
       FLastChild.FNextSibling := NewChild;
@@ -1243,7 +1298,7 @@ begin
   begin
     next := child.NextSibling;
     child.FParentNode := nil;
-    child.Free;
+    child.Destroy;   // we know it's not nil, so save a call
     child := next;
   end;
   FFirstChild := nil;
@@ -1369,7 +1424,7 @@ begin
   while Assigned(Child) and (Child <> FNode) do
   begin
     if (Child.NodeType = ELEMENT_NODE) and (not UseFilter or (TDOMElement(Child).TagName = filter)) then
-      FList.Add(Child);
+          FList.Add(Child);
     // recursive track node hierarchy  
     if Assigned(Child.FirstChild) then
       Child := Child.FirstChild
@@ -1395,7 +1450,7 @@ end;
 constructor TDOMNamedNodeMap.Create(AOwner: TDOMNode; ANodeType: Integer);
 begin
   inherited Create;
-  FOwnerElement := AOwner;
+  FOwner := AOwner;
   FNodeType := ANodeType;
   FList := TList.Create;
 end;
@@ -1461,28 +1516,40 @@ function TDOMNamedNodeMap.GetNamedItemNS(const namespaceURI, localName: DOMStrin
 begin
   // TODO: implement TDOMNamedNodeMap.GetNamedItemNS
   raise EDOMNotSupported.Create('TDOMNamedNodeMap.GetNamedItemNS');
-  Result := nil;
+    Result := nil;
+end;
+
+function TDOMNamedNodeMap.ValidateInsert(arg: TDOMNode): Integer;
+var
+  AttrOwner: TDOMNode;
+begin
+  Result := 0;
+  if arg.FOwnerDocument <> FOwner.FOwnerDocument then
+    Result := WRONG_DOCUMENT_ERR
+  else if arg.NodeType <> FNodeType then
+    Result := HIERARCHY_REQUEST_ERR
+  else if (FNodeType = ATTRIBUTE_NODE) then
+  begin
+    AttrOwner := TDOMAttr(arg).ownerElement;
+    if Assigned(AttrOwner) and (AttrOwner <> FOwner) then
+      Result := INUSE_ATTRIBUTE_ERR;
+  end;
 end;
 
 function TDOMNamedNodeMap.SetNamedItem(arg: TDOMNode): TDOMNode;
 var
   i: Cardinal;
-  AttrOwner: TDOMElement;
   Exists: Boolean;
+  res: Integer;
 begin
-  if arg.FOwnerDocument <> FOwnerElement.FOwnerDocument then
-    raise EDOMWrongDocument.Create('NamedNodeMap.SetNamedItem');
-
-  if arg.NodeType <> FNodeType then
-    raise EDOMHierarchyRequest.Create('NamedNodeMap.SetNamedItem');
+  res := ValidateInsert(arg);
+  if res <> 0 then
+    raise EDOMError.Create(res, 'NamedNodeMap.SetNamedItem');
 
   if FNodeType = ATTRIBUTE_NODE then
   begin
-    AttrOwner := TDOMAttr(arg).ownerElement;
-    if Assigned(AttrOwner) and (AttrOwner <> FOwnerElement) then
-      raise EDOMInUseAttribute.Create('NamedNodeMap.SetNamedItem');
-    TDOMAttr(arg).FOwnerElement := TDOMElement(FOwnerElement);
-    Exists := Find(TDOMAttr(arg).FName, i); // optimization
+    TDOMAttr(arg).FOwnerElement := TDOMElement(FOwner);
+    Exists := Find(TDOMAttr(arg).Name, i); // optimization
   end
   else
     Exists := Find(arg.NodeName, i);
@@ -1498,9 +1565,15 @@ begin
 end;
 
 function TDOMNamedNodeMap.SetNamedItemNS(arg: TDOMNode): TDOMNode;
+var
+  res: Integer;
 begin
   // TODO: implement TDOMNamedNodeMap.SetNamedItemNS
-  Result := nil;
+  res := ValidateInsert(arg);
+  if res <> 0 then
+    raise EDOMError.Create(res, 'NamedNodeMap.SetNamedItemNS');
+
+    Result := nil;
 end;
 
 function TDOMNamedNodeMap.InternalRemove(const name: DOMString): TDOMNode;
@@ -1601,6 +1674,13 @@ begin
   Result := '#document-fragment';
 end;
 
+function TDOMDocumentFragment.CloneNode(deep: Boolean; aCloneOwner: TDOMDocument): TDOMNode;
+begin
+  Result := aCloneOwner.CreateDocumentFragment;
+  if deep then
+    CloneChildren(Result, aCloneOwner);
+end;
+
 // -------------------------------------------------------
 //   DOMImplementation
 // -------------------------------------------------------
@@ -1641,17 +1721,22 @@ var
   Root: TDOMNode;
 begin
   // TODO: This method is not usable yet due to CreateElementNS...
-  Result := TDOMDocument.Create;
+  Result := TXMLDocument.Create;
   Result.FImplementation := Self;
-  if Assigned(doctype) then
-  begin
-    if Assigned(doctype.OwnerDocument) then
-      raise EDOMWrongDocument.Create('DOMImplementation.CreateDocument');
-    Doctype.FOwnerDocument := Result;
-    Result.AppendChild(doctype);
-  end;  
-  Root := Result.CreateElementNS(NamespaceURI, QualifiedName);
-  Result.AppendChild(Root);
+  try
+    if Assigned(doctype) then
+    begin
+      if Assigned(doctype.OwnerDocument) then
+        raise EDOMWrongDocument.Create('Implementation.CreateDocument');
+      Doctype.FOwnerDocument := Result;
+      Result.AppendChild(doctype);
+    end;
+    Root := Result.CreateElementNS(NamespaceURI, QualifiedName);
+    Result.AppendChild(Root);
+  except
+    Result.Free;
+    raise;
+  end;
 end;
 
 
@@ -1867,12 +1952,12 @@ begin
   Result := TDOMElementList.Create(Self, tagname);
 end;
 
-function TDOMDocument.GetElementsByTagNameNS(const namespaceURI, localName: DOMString): TDOMNodeList;
+function TDOMDocument.GetElementsByTagNameNS(const nsURI, aLocalName: DOMString): TDOMNodeList;
 begin
-  Result := TDOMElementList.Create(Self, namespaceURI, localName);
+  Result := TDOMElementList.Create(Self, nsURI, aLocalName);
 end;
 
-function TDOMDocument.CreateAttributeNS(const NamespaceURI,
+function TDOMDocument.CreateAttributeNS(const nsURI,
   QualifiedName: DOMString): TDOMAttr;
 begin
   // TODO: Implement TDOMDocument.CreateAttributeNS
@@ -1880,7 +1965,7 @@ begin
   Result := nil;
 end;
 
-function TDOMDocument.CreateElementNS(const NamespaceURI,
+function TDOMDocument.CreateElementNS(const nsURI,
   QualifiedName: DOMString): TDOMElement;
 begin
   // TODO: Implement TDOMDocument.CreateElementNS
@@ -1895,21 +1980,19 @@ begin
   if Assigned(FIDList) and FindID(ElementID, I) then
     Result := PIDItem(FIDList.List^[I])^.Element
   else
-    Result := nil;
+  Result := nil;
 end;
 
 function TDOMDocument.ImportNode(ImportedNode: TDOMNode;
   Deep: Boolean): TDOMNode;
 begin
-  // TODO: Implement TDOMDocument.ImportNode
-  raise EDOMNotSupported.Create('TDOMDocument.ImportNode');
-  Result := nil;
+  Result := ImportedNode.CloneNode(Deep, Self);
 end;
 
 function TDOMDocument.IndexOfNS(const nsURI: DOMString): Integer;
 begin
   // TODO: implement
-  Result := -1;
+    Result := -1;
 end;
 
 
@@ -1937,6 +2020,7 @@ begin
     raise EDOMError.Create(INVALID_CHARACTER_ERR, 'XMLDocument.CreateEntityReference');
   Result := TDOMEntityReference.Create(Self);
   Result.FName := name;
+  // TODO: if entity is known, its child list must be cloned or so.
 end;
 
 procedure TXMLDocument.SetXMLVersion(const aValue: DOMString);
@@ -2024,6 +2108,16 @@ begin
     CloneChildren(Result, ACloneOwner);
 end;
 
+procedure TDOMElement.Normalize;
+var
+  I: Integer;
+begin
+  if Assigned(FAttributes) then
+    for I := 0 to FAttributes.Length - 1 do
+      FAttributes[I].Normalize;
+  inherited Normalize;    
+end;
+
 function TDOMElement.GetAttributes: TDOMNamedNodeMap;
 begin
   if FAttributes=nil then
@@ -2044,14 +2138,14 @@ begin
   end;
 end;
 
-function TDOMElement.GetAttributeNS(const namespaceURI, localName: DOMString): DOMString;
+function TDOMElement.GetAttributeNS(const nsURI, aLocalName: DOMString): DOMString;
 var
   Attr: TDOMNode;
 begin
   SetLength(Result, 0);
   if Assigned(FAttributes) then
   begin
-    Attr := FAttributes.GetNamedItemNS(namespaceURI, localName);
+    Attr := FAttributes.GetNamedItemNS(nsURI, aLocalName);
     if Assigned(Attr) then
       Result := Attr.NodeValue;
   end;
@@ -2079,22 +2173,22 @@ begin
     FAttributes.InternalRemove(name).Free;
 end;
 
-procedure TDOMElement.RemoveAttributeNS(const namespaceURI,
-  localName: DOMString);
+procedure TDOMElement.RemoveAttributeNS(const nsURI,
+  aLocalName: DOMString);
 begin
   // TODO: Implement TDOMElement.RemoveAttributeNS
   raise EDOMNotSupported.Create('TDOMElement.RemoveAttributeNS');
 end;
 
-procedure TDOMElement.SetAttributeNS(const namespaceURI, qualifiedName,
+procedure TDOMElement.SetAttributeNS(const nsURI, qualifiedName,
   value: DOMString);
 var
   Attr: TDOMAttr;
 begin
-  Attr := Attributes.GetNamedItemNS(namespaceURI, qualifiedName) as TDOMAttr;
+  Attr := Attributes.GetNamedItemNS(nsURI, qualifiedName) as TDOMAttr;
   if attr = nil then
   begin
-    attr := FOwnerDocument.CreateAttributeNS(namespaceURI, qualifiedName);
+    attr := FOwnerDocument.CreateAttributeNS(nsURI, qualifiedName);
     // TODO 5: keep sorted!
     FAttributes.FList.Add(attr);
   end;
@@ -2110,10 +2204,10 @@ begin
     Result := nil;
 end;
 
-function TDOMElement.GetAttributeNodeNS(const namespaceURI, localName: DOMString): TDOMAttr;
+function TDOMElement.GetAttributeNodeNS(const nsURI, aLocalName: DOMString): TDOMAttr;
 begin
   if Assigned(FAttributes) then
-    Result := FAttributes.GetNamedItemNS(namespaceURI, localName) as TDOMAttr
+    Result := FAttributes.GetNamedItemNS(nsURI, aLocalName) as TDOMAttr
   else
     Result := nil;
 end;
@@ -2154,9 +2248,9 @@ begin
   Result := TDOMElementList.Create(Self, name);
 end;
 
-function TDOMElement.GetElementsByTagNameNS(const namespaceURI, localName: DOMString): TDOMNodeList;
+function TDOMElement.GetElementsByTagNameNS(const nsURI, aLocalName: DOMString): TDOMNodeList;
 begin
-  Result := TDOMElementList.Create(Self, namespaceURI, localName);
+  Result := TDOMElementList.Create(Self, nsURI, aLocalName);
 end;
 
 function TDOMElement.hasAttribute(const name: DOMString): Boolean;
@@ -2165,10 +2259,10 @@ begin
     Assigned(FAttributes.GetNamedItem(name));
 end;
 
-function TDOMElement.hasAttributeNS(const namespaceURI, localName: DOMString): Boolean;
+function TDOMElement.hasAttributeNS(const nsURI, aLocalName: DOMString): Boolean;
 begin
   Result := Assigned(FAttributes) and
-    Assigned(FAttributes.getNamedItemNS(namespaceURI, localName));
+    Assigned(FAttributes.getNamedItemNS(nsURI, aLocalName));
 end;
 
 function TDOMElement.HasAttributes: Boolean;
@@ -2283,16 +2377,6 @@ begin
   inherited Destroy;
 end;
 
-function TDOMDocumentType.CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode;
-begin
-  Result := TDOMDocumentType.Create(ACloneOwner);
-  TDOMDocumentType(Result).FName := FName;
-  TDOMDocumentType(Result).FSystemID := FSystemID;
-  TDOMDocumentType(Result).FPublicID := FPublicID;
-  // ignore deep - DocumentType cannot have children
-  // TODO: Clone Entities and Notations 
-end;
-
 function TDOMDocumentType.GetEntities: TDOMNamedNodeMap;
 begin
   if FEntities = nil then
@@ -2352,6 +2436,15 @@ begin
   Result := FName;
 end;
 
+function TDOMEntity.CloneNode(deep: Boolean; aCloneOwner: TDOMDocument): TDOMNode;
+begin
+  Result := TDOMEntity.Create(aCloneOwner);
+  TDOMEntity(Result).FName := FName;
+  TDOMEntity(Result).FSystemID := FSystemID;
+  TDOMEntity(Result).FPublicID := FPublicID;
+  TDOMEntity(Result).FNotationName := FNotationName;  
+end;
+
 // -------------------------------------------------------
 //   EntityReference
 // -------------------------------------------------------
@@ -2369,6 +2462,7 @@ end;
 function TDOMEntityReference.CloneNode(deep: Boolean; ACloneOwner: TDOMDocument): TDOMNode;
 begin
   Result := ACloneOwner.CreateEntityReference(FName);
+  // TODO -cConformance: this is done in CreateEntityReference?
   CloneChildren(Result, ACloneOwner);
 end;
 

+ 5 - 4
packages/fcl-xml/src/xmlconf.pp

@@ -124,7 +124,7 @@ end;
 
 procedure TXMLConfig.Flush;
 begin
-  if Modified then
+  if Modified and (Filename <> '') then
   begin
     WriteXMLFile(Doc, Filename);
     FModified := False;
@@ -348,14 +348,15 @@ procedure TXMLConfig.DoSetFilename(const AFilename: String; ForceReload: Boolean
 begin
   if (not ForceReload) and (FFilename = AFilename) then
     exit;
+    
+  Flush;
+  FreeAndNil(Doc);
+    
   FFilename := AFilename;
 
   if csLoading in ComponentState then
     exit;
 
-  Flush;
-  FreeAndNil(Doc);
-
   if FileExists(AFilename) and not FStartEmpty then
     ReadXMLFile(Doc, AFilename);
 

File diff suppressed because it is too large
+ 365 - 391
packages/fcl-xml/src/xmlread.pp


+ 22 - 5
packages/fcl-xml/src/xmlutils.pp

@@ -22,7 +22,8 @@ interface
 uses
   SysUtils;
 
-function IsXmlName(const Value: WideString; Xml11: Boolean = False): Boolean;
+function IsXmlName(const Value: WideString; Xml11: Boolean = False): Boolean; overload;
+function IsXmlName(Value: PWideChar; Len: Integer; Xml11: Boolean = False): Boolean; overload;
 function IsXmlNames(const Value: WideString; Xml11: Boolean = False): Boolean;
 function IsXmlNmToken(const Value: WideString; Xml11: Boolean = False): Boolean;
 function IsXmlNmTokens(const Value: WideString; Xml11: Boolean = False): Boolean;
@@ -64,7 +65,18 @@ begin
   Result := Xml11Pg;
 end;
 
-function IsXml11Char(const Value: WideString; var Index: Integer): Boolean;
+function IsXml11Char(Value: PWideChar; var Index: Integer): Boolean; overload;
+begin
+  if (Value[Index] >= #$D800) and (Value[Index] <= #$DB7F) then
+  begin
+    Inc(Index);
+    Result := (Value[Index] >= #$DC00) and (Value[Index] <= #$DFFF);
+  end
+  else
+    Result := False;
+end;
+
+function IsXml11Char(const Value: WideString; var Index: Integer): Boolean; overload;
 begin
   if (Value[Index] >= #$D800) and (Value[Index] <= #$DB7F) then
   begin
@@ -76,6 +88,11 @@ begin
 end;
 
 function IsXmlName(const Value: WideString; Xml11: Boolean): Boolean;
+begin
+  Result := IsXmlName(PWideChar(Value), Length(Value), Xml11);
+end;
+
+function IsXmlName(Value: PWideChar; Len: Integer; Xml11: Boolean = False): Boolean; overload;
 var
   Pages: PByteArray;
   I: Integer;
@@ -86,12 +103,12 @@ begin
   else
     Pages := @NamePages;
 
-  I := 1;
-  if (Value = '') or not ((Byte(Value[I]) in NamingBitmap[Pages^[hi(Word(Value[I]))]]) or
+  I := 0;
+  if (Len = 0) or not ((Byte(Value[I]) in NamingBitmap[Pages^[hi(Word(Value[I]))]]) or
     (Xml11 and IsXml11Char(Value, I))) then
       Exit;
   Inc(I);
-  while I <= Length(Value) do
+  while I < Len do
   begin
     if not ((Byte(Value[I]) in NamingBitmap[Pages^[$100+hi(Word(Value[I]))]]) or
       (Xml11 and IsXml11Char(Value, I))) then

+ 51 - 74
packages/fcl-xml/src/xmlwrite.pp

@@ -56,7 +56,6 @@ type
     procedure DecIndent; {$IFDEF HAS_INLINE} inline; {$ENDIF}
     procedure wrtStr(const ws: WideString); {$IFDEF HAS_INLINE} inline; {$ENDIF}
     procedure wrtChr(c: WideChar); {$IFDEF HAS_INLINE} inline; {$ENDIF}
-    procedure wrtLineEnd; {$IFDEF HAS_INLINE} inline; {$ENDIF}
     procedure wrtIndent; {$IFDEF HAS_INLINE} inline; {$ENDIF}
     procedure wrtQuotedLiteral(const ws: WideString);
     procedure ConvWrite(const s: WideString; const SpecialChars: TSetOfChar;
@@ -154,7 +153,8 @@ begin
   FCapacity := 512;
   // Initialize Indent string
   SetLength(FIndent, 100);
-  for I := 1 to 100 do FIndent[I] := ' ';
+  FIndent[1] := #10;
+  for I := 2 to 100 do FIndent[I] := ' ';
   FIndentCount := 0;
   // Later on, this may be put under user control
   // for now, take OS setting
@@ -175,7 +175,6 @@ var
   pb: PChar;
   wc: Cardinal;
   SrcEnd: PWideChar;
-  I: Integer;
 begin
   pb := FBufPos;
   SrcEnd := Src + Length;
@@ -191,30 +190,29 @@ begin
 
     wc := Cardinal(Src^);  Inc(Src);
     case wc of
-      $0A:  for I := 1 to System.Length(FLineBreak) do
-            begin
-              pb^ := FLineBreak[I]; Inc(pb);
-            end;
+      $0A: pb := StrECopy(pb, PChar(FLineBreak));
 
       0..$09, $0B..$7F:  begin
         pb^ := char(wc); Inc(pb);
       end;
 
       $80..$7FF: begin
-        pb^ := Char($C0 or (wc shr 6));   Inc(pb);
-        pb^ := Char($80 or (wc and $3F)); Inc(pb);
+        pb^ := Char($C0 or (wc shr 6));
+        pb[1] := Char($80 or (wc and $3F));
+        Inc(pb,2);
       end;
 
       $D800..$DBFF: begin
         if (Src < SrcEnd) and (Src^ >= #$DC00) and (Src^ <= #$DFFF) then
         begin
-          wc := ((wc - $D7C0) shl 10) + (word(Src^) xor $DC00);
+          wc := ((LongInt(wc) - $D7C0) shl 10) + LongInt(word(Src^) xor $DC00);
           Inc(Src);
 
-          pb^ := Char($F0 or (wc shr 18));           Inc(pb);
-          pb^ := Char($80 or ((wc shr 12) and $3F)); Inc(pb);
-          pb^ := Char($80 or ((wc shr 6) and $3F));  Inc(pb);
-          pb^ := Char($80 or (wc and $3F));          Inc(pb);
+          pb^ := Char($F0 or (wc shr 18));
+          pb[1] := Char($80 or ((wc shr 12) and $3F));
+          pb[2] := Char($80 or ((wc shr 6) and $3F));
+          pb[3] := Char($80 or (wc and $3F));
+          Inc(pb,4);
         end
         else
           raise EConvertError.Create('High surrogate without low one');
@@ -223,9 +221,10 @@ begin
         raise EConvertError.Create('Low surrogate without high one');
       else   // $800 >= wc > $FFFF, excluding surrogates
       begin
-        pb^ := Char($E0 or (wc shr 12));          Inc(pb);
-        pb^ := Char($80 or ((wc shr 6) and $3F)); Inc(pb);
-        pb^ := Char($80 or (wc and $3F));         Inc(pb);
+        pb^ := Char($E0 or (wc shr 12));
+        pb[1] := Char($80 or ((wc shr 6) and $3F));
+        pb[2] := Char($80 or (wc and $3F));
+        Inc(pb,3);
       end;
     end;
   end;
@@ -237,20 +236,16 @@ begin
   wrtChars(PWideChar(ws), Length(ws));
 end;
 
+{ No checks here - buffer always has 32 extra bytes }
 procedure TXMLWriter.wrtChr(c: WideChar); { inline }
 begin
-  wrtChars(@c,1);
-end;
-
-procedure TXMLWriter.wrtLineEnd; { inline }
-begin
-  // line endings now handled in WrtStr!
-  wrtChr(#10);
+  FBufPos^ := char(ord(c));
+  Inc(FBufPos);
 end;
 
 procedure TXMLWriter.wrtIndent; { inline }
 begin
-  wrtChars(PWideChar(FIndent), FIndentCount*2);
+  wrtChars(PWideChar(FIndent), FIndentCount*2+1);
 end;
 
 procedure TXMLWriter.IncIndent;
@@ -366,75 +361,56 @@ end;
 procedure TXMLWriter.VisitElement(node: TDOMNode);
 var
   i: Integer;
-  attr, child: TDOMNode;
+  child: TDOMNode;
   SavedInsideTextNode: Boolean;
-  IsLeaf: Boolean;
-  MixedContent: Boolean;
 begin
   if not FInsideTextNode then
     wrtIndent;
   wrtChr('<');
-  wrtStr(node.NodeName);
+  wrtStr(TDOMElement(node).TagName);
   // FIX: Accessing Attributes was causing them to be created for every element :(
   if node.HasAttributes then
     for i := 0 to node.Attributes.Length - 1 do
     begin
-      attr := node.Attributes.Item[i];
-      if TDOMAttr(attr).Specified then
-        VisitAttribute(attr);
+      child := node.Attributes.Item[i];
+      if TDOMAttr(child).Specified then
+        VisitAttribute(child);
     end;
   Child := node.FirstChild;
   if Child = nil then
-    wrtStr('/>')
+    wrtChars('/>', 2)
   else
   begin
     SavedInsideTextNode := FInsideTextNode;
     wrtChr('>');
-    MixedContent := False;
-    repeat
-      if Assigned(Child.PreviousSibling) and
-        (Child.PreviousSibling.InheritsFrom(TDOMText) <> Child.InheritsFrom(TDOMText)) then
-        MixedContent := True;
-      Child := Child.NextSibling;
-    until Child = nil;
-    Child := node.FirstChild; // restore
-
-    IsLeaf := (Child = node.LastChild) and (Child.FirstChild = nil);
-    if not (FInsideTextNode or MixedContent or IsLeaf) then
-      wrtLineEnd;
-
-    FInsideTextNode := {FInsideTextNode or} MixedContent or IsLeaf;
+    FInsideTextNode := Child.NodeType in [TEXT_NODE, CDATA_SECTION_NODE];
     IncIndent;
     repeat
       WriteNode(Child);
       Child := Child.NextSibling;
     until Child = nil;
     DecIndent;
-    if not FInsideTextNode then
+    if not (node.LastChild.NodeType in [TEXT_NODE, CDATA_SECTION_NODE]) then
       wrtIndent;
     FInsideTextNode := SavedInsideTextNode;
-    wrtStr('</');
-    wrtStr(Node.NodeName);
+    wrtChars('</', 2);
+    wrtStr(TDOMElement(Node).TagName);
     wrtChr('>');
   end;
-  if not FInsideTextNode then
-    wrtLineEnd;
 end;
 
 procedure TXMLWriter.VisitText(node: TDOMNode);
 begin
-  ConvWrite(node.NodeValue, TextSpecialChars, {$IFDEF FPC}@{$ENDIF}TextnodeSpecialCharCallback);
+  ConvWrite(TDOMCharacterData(node).Data, TextSpecialChars, {$IFDEF FPC}@{$ENDIF}TextnodeSpecialCharCallback);
 end;
 
 procedure TXMLWriter.VisitCDATA(node: TDOMNode);
 begin
   if not FInsideTextNode then
     wrtIndent;
-  wrtStr('<![CDATA[');
-  wrtStr(node.NodeValue);
-  wrtStr(']]>');
-  if not FInsideTextNode then
-    wrtLineEnd;
+  wrtChars('<![CDATA[', 9);
+  wrtStr(TDOMCharacterData(node).Data);
+  wrtChars(']]>', 3);
 end;
 
 procedure TXMLWriter.VisitEntityRef(node: TDOMNode);
@@ -452,16 +428,14 @@ begin
   wrtChr(' ');
   wrtStr(TDOMProcessingInstruction(node).Data);
   wrtStr('?>');
-  if not FInsideTextNode then wrtLineEnd;
 end;
 
 procedure TXMLWriter.VisitComment(node: TDOMNode);
 begin
   if not FInsideTextNode then wrtIndent;
-  wrtStr('<!--');
-  wrtStr(node.NodeValue);
-  wrtStr('-->');
-  if not FInsideTextNode then wrtLineEnd;
+  wrtChars('<!--', 4);
+  wrtStr(TDOMCharacterData(node).Data);
+  wrtChars('-->', 3);
 end;
 
 procedure TXMLWriter.VisitDocument(node: TDOMNode);
@@ -486,16 +460,16 @@ begin
     wrtChr('"');
   end;
 *)
-  wrtStr('?>'#10);
+  wrtStr('?>');
 
   // TODO: now handled as a regular PI, remove this?
   if Length(TXMLDocument(node).StylesheetType) > 0 then
   begin
-    wrtStr('<?xml-stylesheet type="');
+    wrtStr(#10'<?xml-stylesheet type="');
     wrtStr(TXMLDocument(node).StylesheetType);
     wrtStr('" href="');
     wrtStr(TXMLDocument(node).StylesheetHRef);
-    wrtStr('"?>'#10);
+    wrtStr('"?>');
   end;
 
   child := node.FirstChild;
@@ -504,6 +478,7 @@ begin
     WriteNode(Child);
     Child := Child.NextSibling;
   end;
+  wrtChars(#10, 1);
 end;
 
 procedure TXMLWriter.VisitAttribute(Node: TDOMNode);
@@ -511,15 +486,17 @@ var
   Child: TDOMNode;
 begin
   wrtChr(' ');
-  wrtStr(Node.NodeName);
-  wrtStr('="');
+  wrtStr(TDOMAttr(Node).Name);
+  wrtChars('="', 2);
   Child := Node.FirstChild;
   while Assigned(Child) do
   begin
-    if Child.NodeType = ENTITY_REFERENCE_NODE then
-      VisitEntityRef(Child)
-    else
-      ConvWrite(Child.NodeValue, AttrSpecialChars, {$IFDEF FPC}@{$ENDIF}AttrSpecialCharCallback);
+    case Child.NodeType of
+      ENTITY_REFERENCE_NODE:
+        VisitEntityRef(Child);
+      TEXT_NODE:
+        ConvWrite(TDOMCharacterData(Child).Data, AttrSpecialChars, {$IFDEF FPC}@{$ENDIF}AttrSpecialCharCallback);
+    end;
     Child := Child.NextSibling;
   end;
   wrtChr('"');
@@ -527,7 +504,7 @@ end;
 
 procedure TXMLWriter.VisitDocumentType(Node: TDOMNode);
 begin
-  wrtStr('<!DOCTYPE ');
+  wrtStr(#10'<!DOCTYPE ');
   wrtStr(Node.NodeName);
   wrtChr(' ');
   with TDOMDocumentType(Node) do
@@ -551,7 +528,7 @@ begin
       wrtChr(']');
     end;
   end;
-  wrtStr('>'#10);
+  wrtChr('>');
 end;
 
 procedure TXMLWriter.VisitFragment(Node: TDOMNode);

+ 92 - 0
packages/fcl-xml/tests/README

@@ -33,3 +33,95 @@ two lines at the bottom which reference 'eduni-ns10' and 'eduni-ns11' testsuites
 
 </TESTSUITE>
 )
+
+
+Testsuite errata
+--------------------------------------------
+The following issues were encountered while testing the parser. Fortunately, none
+of these change the category of any test, but in some cases cause incorrect error
+message and/or postion to be reported.
+
+1) xmltest/not-wf/sa/081.xml
+   xmltest/not-wf/sa/082.xml
+   xmltest/not-wf/sa/083.xml
+   xmltest/not-wf/sa/084.xml
+
+All four reference an external entity with SystemID 'nul', which is a reserved
+name under Windows (you won't be able to create such file). The archive contains
+a file named 'nul.ent' that differs from entity's SystemID, so it won't resolve
+anyway even in non-Windows.
+This issue does not have any effect on FCL parser.
+Additionally, tests 083.xml and 084.xml contain a reference to undefined notation.
+This cause an extra validation error to be reported before the fatal error.
+
+2) oasis/p49fail1.xml
+   oasis/p50fail1.xml
+
+Both tests are missing ']' that should close the internal DTD subset.
+
+3) oasis/p58fail1.xml
+   oasis/p58fail2.xml
+   oasis/p58fail3.xml
+
+All three have a NOTATION attribute declared on EMPTY element. This causes an extra
+validation error to be reported before the fatal one.
+
+4) ibm/xml-1.1/not-wf/p02/ibm02n66.ent
+
+Presumably, missing '<' at start of CDATA. Does not change the diagnostic, though.
+
+5) ibm/not-wf/p23/ibm23n05.xml
+
+Contains encoding name 'ASCII' which is not supported by the parser. As a result, it aborts
+before detecting the illegal XML declaration closing sequence.
+
+6) ibm/not-wf/p72/ibm72n09.xml
+
+Missing whitespace between 'ENTITY' and '%' at line 6 is detected before the bad tag closing
+sequence.
+
+7) ibm/not-wf/p77/ibm77n01.ent
+
+Invalid encoding name 'UTF8' is detected before the wrong token order.
+
+8) sun/invalid/attr03.xml
+   sun/invalid/attr04.xml
+   sun/invalid/attr15.xml
+
+Have a NOTATION attribute is declared on EMPTY element. Diagnostics incorrect.
+
+9) ibm/invalid/p56/ibm56i11.xml
+   ibm/invalid/p56/ibm56i12.xml
+   ibm/invalid/p56/ibm56i14.xml
+   ibm/invalid/p56/ibm56i15.xml
+
+Contain a reference to undeclared notation 'gif'. Diagnostics incorrect.
+
+10) eduni/xml-1.1/052.xml
+    eduni/xml-1.1/053.xml
+
+Intended to test handling of NEL and LSEP chars as element content whitespace, these
+tests enclose NEL and LSEP within ordinary ascii chars ('abc_def') that are clearly not
+a whitespace. A 'correct' error is therefore reported regardless of actual NEL/LSEP handling.
+
+11) ibm/not-wf/p69/ibm69n06.xml
+    ibm/not-wf/p69/ibm69n07.xml
+
+Designed to check parameter entity recursion, both tests contain PE references within entity
+value declarations in internal DTD subset, which is a fatal error by itself.
+
+12) ibm/not-wf/p21/ibm21n01.xml
+
+Tests illegal CDEnd, but has an extra '[' in CDStart, which is detected earlier.
+
+13) ibm/not-wf/p21/ibm21n02.xml
+
+Tests illegal CDEnd, but has lowercase 'cdata' in CDStart, which is detected earlier.
+
+14) ibm/xml-1.1/not-wf/p02/ibm02n58.xml
+
+The first illegal character 0x99 is at position (2, 24), but another one at position (4,7) is
+represented with malformed UTF-8 sequence (0xC1 0xA3, while correct one is 0xC2 0x99).
+An 'xml-unaware' decoder can detect this before processing any 'normal' characters,
+so diagnostics may be wrong.
+

+ 32 - 10
packages/fcl-xml/tests/xmlts.pp

@@ -49,7 +49,6 @@ type
     FPassed, FFailCount: Integer;
     FFalsePasses: Integer;
     FRootUri: string;
-    FTemplateName: string;
     FSuiteName: string;
     FDoc: TXMLDocument;
     FValidating: Boolean;
@@ -64,12 +63,12 @@ type
     table_informative: TDOMNode;
     FValError: string;
     FTestID: DOMString;
+    FErrLine, FErrCol: Integer;
     procedure LoadTemplate(const Name: string);
     procedure HandleTemplatePIs(Element: TDOMNode);
     procedure Diagnose(Element, Table: TDOMNode; Category: TDiagCategory; const Error: DOMString);
     procedure DiagnoseOut(const ErrorMsg: DOMString);
     function CompareNodes(actual, correct: TDOMNode; out Msg: string): Boolean;
-    procedure Canonicalize(node: TDOMNode);
     procedure ErrorHandler(Error: EXMLReadError);
   public
     constructor Create;
@@ -133,10 +132,20 @@ begin
   inherited Create;
   FParser := TDOMParser.Create;
   FParser.Options.PreserveWhitespace := True;
+  FParser.Options.ExpandEntities := True;
+  FParser.Options.IgnoreComments := True;
+  FParser.Options.CDSectionsAsText := True;
 end;
 
 procedure TTestSuite.ErrorHandler(Error: EXMLReadError);
 begin
+  // allow fatal error position to override that of validation error
+  if (FErrLine < 0) or (Error.Severity = esFatal) then
+  begin
+    FErrLine := Error.Line;
+    FErrCol := Error.LinePos;
+  end;  
+
   if Error.Severity = esError then
   begin
     if FValError = '' then // fetch the _first_ message
@@ -345,6 +354,8 @@ var
   I: Integer;
   root: UTF8String;
 begin
+  FErrLine := -1;
+  FErrCol := -1;
   FTestID := Element['ID'];
   TestType := Element['TYPE'];
   root := GetBaseURI(Element, FRootUri);
@@ -382,12 +393,16 @@ begin
   try
     try
       FParser.Options.Validate := FValidating;
+//      FParser.Options.Namespaces := (Element['NAMESPACE'] <> 'no');
       FParser.OnError := {$IFDEF FPC}@{$ENDIF}ErrorHandler;
       FParser.ParseUri(s, TempDoc);
     except
       on E: Exception do
         if E.ClassType <> EAbort then
+        begin
           FailMsg := E.Message;
+          FValError := '';
+        end;
     end;
 
     if table = table_informative then
@@ -412,12 +427,22 @@ begin
       begin
         if FailMsg <> '' then  // Fatal error
         begin
-          Inc(FFalsePasses);
-          Diagnose(Element, table, dcPass, FailMsg);
+          { outside not-wf category it is a test failure }
+          if table <> table_not_wf then
+          begin
+            Inc(FFailCount);
+            Diagnose(Element, table, dcFail, FailMsg);
+          end
+          else
+          begin
+            Inc(FFalsePasses);
+            Diagnose(Element, table, dcPass, FailMsg);
+          end;
         end
         else
         begin
-          if table = table_not_wf then  // validation error here is a test failure!
+          { outside invalid category it is a test failure }
+          if table = table_not_wf then
           begin
             Inc(FFailCount);
             Diagnose(Element, table, dcFail, FValError);
@@ -448,7 +473,6 @@ begin
       end;
 
     if outURI = '' then Exit;
-    Canonicalize(TempDoc);
     TempDoc.DocumentElement.Normalize;
     try
       // reference data must be parsed in non-validating mode because it contains DTDs
@@ -627,8 +651,7 @@ begin
   table_output.AppendChild(tr);
 end;
 
-
-procedure TTestSuite.Canonicalize(node: TDOMNode);
+procedure Canonicalize(node: TDOMNode);
 var
   child, work: TDOMNode;
   Frag: TDOMDocumentFragment;
@@ -772,9 +795,8 @@ begin
   with TTestSuite.Create do
   try
     FSuiteName := SuiteName;
-    FTemplateName := TemplateName;
     FValidating := Validation;
-    LoadTemplate(FTemplateName);
+    LoadTemplate(TemplateName);
     if Assigned(FTemplate) then
     begin
       Run(FSuiteName);

Some files were not shown because too many files changed in this diff