  1. * XML Classes
  2. ** Abstract
  3. The XML library is used by several areas of Mono, such as ADO.NET and XML
  4. Digital Signature (xmldsig). This page covers System.Xml.dll and
  5. related tools. It does not cover classes that live in other
  6. assemblies, such as XmlDataDocument.
  7. Note that corlib currently has its own XML parser class (Mono.Xml.MiniParser).
  8. System.Xml.dll is almost feature complete, so this document
  9. mainly lists known bugs and hints for improvement.
  10. ** System.Xml namespace
  11. *** Document Object Model (Core)
  12. The DOM is already implemented, but one feature is still missing:
  13. <ul>
  14. * ID constraint support is problematic because W3C DOM does not
  15. specify how to handle ID attributes on non-adapted elements.
  16. (MS.NET also looks incomplete in this area.)
  17. </ul>
  18. *** Xml Writer
  19. Here XmlWriter is almost synonymous with XmlTextWriter. If you want to see
  20. another implementation, check XmlNodeWriter.cs and DTMXPathDocumentWriter.cs
  21. in the System.XML sources.
  22. XmlTextWriter is complete, though it looks a bit slower than MS.NET (I
  23. tried 1.1).
  24. *** XmlResolver
  25. Currently XmlTextReader uses the XmlResolver it is given. If none is supplied,
  26. it uses XmlUrlResolver. XmlResolver is used to load external DTDs,
  27. imported XSL stylesheets, schemas and so on.
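As a minimal sketch of how a resolver is supplied, the helper below assigns one to XmlTextReader before reading. The class and method names (ResolverSketch, ReadRootName) are made up for illustration; only XmlTextReader, its XmlResolver property and XmlUrlResolver come from System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public class ResolverSketch
{
	// Reads the document with the given resolver and returns the
	// local name of the root element.
	public static string ReadRootName (TextReader input, XmlResolver resolver)
	{
		XmlTextReader reader = new XmlTextReader (input);
		// The resolver is consulted for external resources such as an
		// external DTD subset.
		reader.XmlResolver = resolver;
		// Skip the XML declaration, DOCTYPE, comments and whitespace.
		reader.MoveToContent ();
		string name = reader.LocalName;
		reader.Close ();
		return name;
	}
}
```

A call such as ResolverSketch.ReadRootName (new StringReader (xml), new XmlUrlResolver ()) spells out the default described above; a custom XmlResolver subclass could be passed in the same way.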
  28. However, XmlUrlResolver is still buggy (mainly because System.Uri is also
  29. still incomplete), and this results in several loading errors.
  30. XmlSecureResolver, which was introduced in MS .NET Framework 1.1, is basically
  31. implemented, but it requires the CAS (code access security) feature. We need to
  32. fix up this class once the ongoing CAS effort is working.
  33. You might also be interested in an improved <a href="http://codeblogs.ximian.com/blogs/benm/archives/000039.html">XmlCachingResolver</a> by Ben Maurer.
  34. *** XmlNameTable
  35. NameTable itself is implemented. However, several classes should actually
  36. make use of it. When both of the names being compared are already in
  37. the table, they can simply be compared using ReferenceEquals(). We
  38. have done this where it seems possible, e.g. in XmlNamespaceManager (in .NET
  39. 1.2 methods; if the build is not NET_1_2, they are used internally).
  40. NameTable also needs performance improvement.
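The reference comparison described above can be sketched with the public NameTable API alone. NameTableSketch and its methods are hypothetical names; NameTable.Add() and Object.ReferenceEquals() are the real calls.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public class NameTableSketch
{
	// Builds a string equal to "item" at run time, so that it is a
	// separate object from any string literal.
	public static string FreshCopy ()
	{
		return new string (new char [] {'i', 't', 'e', 'm'});
	}

	// Returns true when both names map to the same atomized instance.
	public static bool SameAtom (XmlNameTable table, string a, string b)
	{
		// Add() returns the atomized string, adding it first if it is
		// not in the table yet.
		string atomA = table.Add (a);
		string atomB = table.Add (b);
		// Both names are now in the table, so a pointer comparison is
		// enough and no character-by-character comparison is needed.
		return Object.ReferenceEquals (atomA, atomB);
	}
}
```

Two equal strings that are different objects, for example "item" and FreshCopy (), compare as the same atom once both have gone through Add(). That is what makes the cheap comparison safe.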
  41. *** Xml Stream Reader
  42. When we read an ASCII document, it does not matter which encoding is in use.
  43. However, XmlTextReader must honor the encoding specified in the XML
  44. declaration. So we have an internal XmlStreamReader class (and currently an
  45. XmlInputStream class, which may disappear since XmlStreamReader is enough to
  46. handle this problem).
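To illustrate why the declared encoding matters, here is a small sketch that feeds XmlTextReader a raw byte stream whose only encoding hint is the XML declaration. EncodingSketch and its methods are hypothetical names, and the sample document is made up; the internal XmlStreamReader class is not touched directly.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, for illustration only.
public class EncodingSketch
{
	// Encodes a small document as ISO-8859-1 bytes. The e-acute becomes
	// the single byte 0xE9, which is not valid UTF-8 in this position.
	public static byte [] BuildLatin1Document ()
	{
		string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" +
			"<name>caf\u00E9</name>";
		return Encoding.GetEncoding ("iso-8859-1").GetBytes (xml);
	}

	// Reads the text of the root element from raw bytes. The reader has
	// to pick the decoder from the XML declaration.
	public static string ReadRootText (byte [] data)
	{
		XmlTextReader reader = new XmlTextReader (new MemoryStream (data));
		reader.MoveToContent ();
		string text = reader.ReadElementString ();
		reader.Close ();
		return text;
	}
}
```

If the reader ignored the declaration and assumed UTF-8, the 0xE9 byte would be rejected or decoded wrongly. Round-tripping a non-ASCII character like this makes a handy regression check.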
  47. However, these classes seem to have some problems when reading from a network
  48. stream (especially on Linux). This should be fixed soon, once we find the
  49. actual cause.
  50. *** XML Reader
  51. XmlTextReader, XmlNodeReader and XmlValidatingReader are almost finished.
  52. <ul>
  53. * All OASIS conformance tests pass, as they do with Microsoft's implementation. Some
  54. W3C tests fail, but the results look better.
  55. * Entity expansion and its well-formedness checks are incomplete.
  56. It incorrectly allows divided content models. It also handles
  57. the Base URI incorrectly, so some DTDs fail.
  58. * I won't add any XDR support to XmlValidatingReader. (I have never
  59. seen XDR used outside Microsoft's BizTalk Server 2000,
  60. and now they have the 2002 version with XML Schema support.)
  61. </ul>
  62. XmlTextReader and XmlValidatingReader should be faster than they are now. Currently
  63. XmlTextReader looks nearly twice as slow as MS.NET, and XmlValidatingReader
  64. (which uses this slow XmlTextReader) looks nearly three times slower. (Note
  65. that XmlValidatingReader is not slow in itself. It uses the schema validating
  66. reader and the DTD validating reader.)
  67. **** Some Advantages
  68. The design of Mono's XmlValidatingReader is radically different from
  69. that of Microsoft's implementation. Under MS.NET, the DTD content validation
  70. engine is in fact a simple replacement of the XML Schema validation engine.
  71. Mono's DTD validation is designed to be fully separate and validates
  72. as a normal XML parser does. For example, Mono allows non-deterministic DTDs.
  73. Another advantage of this XmlValidatingReader is support for *any* XmlReader.
  74. Microsoft supports only XmlTextReader.
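The wrapping use of XmlValidatingReader can be sketched as below for DTD validation. DtdValidationSketch is a hypothetical name. The wrapped reader is an XmlTextReader so that the sample behaves the same on both implementations; according to the text above, Mono would accept other XmlReader implementations here as well.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, for illustration only.
public class DtdValidationSketch
{
	ArrayList messages = new ArrayList ();

	// Called for every validation problem instead of an exception
	// being thrown.
	void OnValidation (object sender, ValidationEventArgs e)
	{
		messages.Add (e.Message);
	}

	// Validates the document against its own DTD and returns the
	// number of validation events that were raised.
	public static int CountErrors (string xml)
	{
		DtdValidationSketch sketch = new DtdValidationSketch ();
		XmlTextReader source = new XmlTextReader (new StringReader (xml));
		// The validating reader wraps the underlying reader.
		XmlValidatingReader reader = new XmlValidatingReader (source);
		reader.ValidationType = ValidationType.DTD;
		reader.ValidationEventHandler +=
			new ValidationEventHandler (sketch.OnValidation);
		while (reader.Read ()) {
			// Reading drives validation; the nodes are not needed.
		}
		reader.Close ();
		return sketch.messages.Count;
	}
}
```

A document whose content matches its internal DTD subset should report no events, while one with an undeclared element should report at least one.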
  75. I added an extra support interface named "IHasXmlParserContext", which is
  76. taken into account in XmlValidatingReader.ResolveEntity(). Microsoft failed to
  77. design XmlReader to support pluggable use of XmlReader (i.e. wrapping
  78. another XmlReader), since XmlParserContext is required to support both
  79. entity resolution and the namespace manager. (In .NET 1.2, Microsoft also
  80. added an interface similar to IHasXmlParserContext, named IXmlNamespaceResolver,
  81. but it still does not provide any DTD information.)
  82. We also have a RELAX NG validating reader. See mcs/class/Commons.Xml.Relaxng.
  83. ** System.Xml.Schema
  84. *** Summary
  85. Basically it is complete. We can compile complex and simple types, refer to
  86. external schemas, extend or restrict other types, or use substitution groups.
  87. You can test how (in)complete the current schema validation engine is by using
  88. the standalone test module
  89. (see mcs/class/System.XML/Test/System.Xml.Schema/standalone_tests).
  90. At least on my box, msxsdtest fails only 30 cases with the bugfixed catalog.
  91. *** Schema Object Model
  92. Completed, except for some things to be fixed:
  93. <ul>
  94. * Complete facet support. Currently some of the facets are missing.
  95. David Sheldon has recently been making several fixes to them.
  96. * ContentTypeParticle for pointless xs:choice is incomplete.
  97. (This is because fixing it gave rise to other bugs in
  98. compilation. Interestingly, MS.NET also fails around here,
  99. so it might be in the nature of the ContentTypeParticle design.)
  100. * Some derivation by restriction (DBR) handling is incorrect.
  101. </ul>
  102. *** Validating Reader
  103. The XML Schema validation feature is (currently) implemented in
  104. Mono.Xml.Schema.XsdValidatingReader, which is used internally by
  105. XmlValidatingReader.
  106. Basically this is implemented and its feature set is almost complete,
  107. but I have only tested the validation feature. So we have to write more
  108. tests for properties, methods, and events (validation errors).
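A test of the validation events could start from something like the following sketch. It goes through the public XmlValidatingReader only, not the internal XsdValidatingReader. SchemaValidationSketch and the one-element schema are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper, for illustration only.
public class SchemaValidationSketch
{
	// A made-up schema: a single root element holding an integer.
	const string Xsd =
		"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
		"<xs:element name='count' type='xs:int'/>" +
		"</xs:schema>";

	int errors;

	// Counts only events reported with Error severity.
	void OnValidation (object sender, ValidationEventArgs e)
	{
		if (e.Severity == XmlSeverityType.Error)
			errors++;
	}

	// Validates the instance against the schema above and returns the
	// number of validation errors raised as events.
	public static int CountErrors (string instance)
	{
		SchemaValidationSketch sketch = new SchemaValidationSketch ();
		XmlValidatingReader reader = new XmlValidatingReader (
			new XmlTextReader (new StringReader (instance)));
		reader.ValidationType = ValidationType.Schema;
		// A null namespace lets the schema's own targetNamespace apply.
		reader.Schemas.Add (null, new XmlTextReader (new StringReader (Xsd)));
		reader.ValidationEventHandler +=
			new ValidationEventHandler (sketch.OnValidation);
		while (reader.Read ()) {
			// Reading drives validation.
		}
		reader.Close ();
		return sketch.errors;
	}
}
```

The same shape extends to property checks, for example inspecting reader.SchemaType while positioned on an element. That is the kind of coverage the paragraph above asks for.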
  109. ** System.Xml.Serialization
  110. Lluis rules ;-)
  111. Well, in fact XmlSerializer is almost finished and is in the bugfix phase.
  112. However, we would appreciate more tests. Please try
  113. <ul>
  114. * System.Web.Services to invoke SOAP services.
  115. * xsd.exe and wsdl.exe to create classes.
  116. </ul>
  117. If you find any problems, please file them in bugzilla.
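Besides the web service tools listed above, a plain round trip through XmlSerializer is a quick way to produce a reproducible test case for a bug report. The Item type and the SerializerSketch class are hypothetical and only serve as an example.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sample type; public fields are serialized as elements.
public class Item
{
	public string Name;
	public int Count;
}

// Hypothetical helper, for illustration only.
public class SerializerSketch
{
	// Serializes the item to an XML string.
	public static string ToXml (Item item)
	{
		XmlSerializer serializer = new XmlSerializer (typeof (Item));
		StringWriter writer = new StringWriter ();
		serializer.Serialize (writer, item);
		return writer.ToString ();
	}

	// Reads an item back from an XML string.
	public static Item FromXml (string xml)
	{
		XmlSerializer serializer = new XmlSerializer (typeof (Item));
		return (Item) serializer.Deserialize (new StringReader (xml));
	}
}
```

If FromXml (ToXml (item)) does not give back the same field values, or the XML differs from what MS.NET writes for the same type, that difference is worth reporting.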
  118. Lluis also built an interesting standalone test system, placed under
  119. mcs/class/System.Web.Services/Test/standalone.
  120. You might also be interested in genxs, which enables you to create a custom
  121. XML serializer. This is not included in Microsoft.NET.
  122. See <a
  123. href="http://primates.ximian.com/~lluis/blog/archives/000120.html">here</a>
  124. and mcs/tools/genxs for the details.
  125. ** System.Xml.XPath and System.Xml.Xsl
  126. There are two implementations of XSLT. One (the historical) implementation
  127. is based on libxslt (aka Unmanaged XSLT). We now use a fully implemented
  128. managed XSLT. To use Unmanaged XSLT, set the MONO_UNMANAGED_XSLT environment
  129. variable (any value is acceptable).
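Both implementations sit behind the same XslTransform class, so a small program like the sketch below can be used to compare them. XsltSketch is a hypothetical name, and the stylesheet and input are supplied by the caller.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

// Hypothetical helper, for illustration only.
public class XsltSketch
{
	// Applies the stylesheet to the input document and returns the
	// result as a string.
	public static string Transform (string stylesheet, string input)
	{
		XslTransform xslt = new XslTransform ();
		xslt.Load (new XmlTextReader (new StringReader (stylesheet)));
		XPathDocument document = new XPathDocument (new StringReader (input));
		StringWriter output = new StringWriter ();
		// No XSLT arguments and no resolver for document() are needed.
		xslt.Transform (document, null, output, null);
		return output.ToString ();
	}
}
```

Running the same program once normally and once with MONO_UNMANAGED_XSLT set in the environment should, going by the description above, exercise the managed and the libxslt-based implementation respectively.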
  130. As for Managed XSLT, we support msxsl:script.
  131. It would be nice if we could support <a href="http://www.exslt.org/">EXSLT</a>.
  132. <a href="http://msdn.microsoft.com/WebServices/default.aspx?pull=/library/en-us/dnexxml/html/xml05192003.asp">Microsoft has already done it</a>, but it
  133. is not good code since it depends on internal concrete derivatives of
  134. XPathNodeIterator classes.
  135. In general, .NET's "extension objects" (including msxsl:script) are not
  136. usable for returning node-sets, so if we support EXSLT, it has to be done
  137. internally inside our System.XML.dll. Volunteers are welcome.
  138. Our managed XSLT implementation is still inefficient. For some kinds of
  139. transformation, XslTransform.Load() and .Transform() look slower than MS.
  140. ** System.Xml and ADO.NET v2.0
  141. Microsoft introduced the first beta version of the .NET Framework 1.2 runtime
  142. and SDK (and Visual Studio Whidbey). They are now available as an MSDN
  143. _subscriber_ download (i.e. they are not publicly downloadable yet). The beta
  144. contains several new classes.
  145. There are two assemblies related to System.Xml v2.0: System.Xml.dll and
  146. System.Data.SqlXml.dll (here I treat sqlxml.dll as part of System.Xml v2.0,
  147. but note that it is also one of the ADO.NET 2.0 features). There are several
  148. namespaces such as MS.Internal.Xml and System.Xml. Note that this .NET Framework
  149. is a pre-release version, and the MS.Internal.Xml namespace apparently shows that
  150. it is not yet in a stable state.
  151. System.Xml 2.0 contains several features such as:
  152. <ul>
  153. * XPathNavigator2 and XPathDocument2
  154. * XML Query
  155. * XmlAdapter
  156. * XSLT IL generator (similar to Apache XSLTC) - it is
  157. for internal use
  158. </ul>
  159. Tim Coleman has started ADO.NET 2.0 related work. Currently I have no immediate plan to
  160. implement System.Xml v2.0 classes and won't touch them right away,
  161. but I will start in some months. If any of you want to try this frontier,
  162. we welcome your effort.
  163. *** XPathNavigator2
  164. The System.Xml v2.0 implementation will start with the XPathDocument2 and
  165. XPathNavigator2 implementations. First, the document structure and basic
  166. navigation features will be implemented. Next, the XPath2 engine should
  167. be implemented (XPathNavigator2 looks very different from XPathNavigator).
  168. It was once described as containing a schema validation feature, but MS
  169. guys said that they have removed that feature. (It is just a beta version,
  170. so anything might happen.)
  171. *** XML Query
  172. XML Query is a new XML data manipulation language (well, at least new
  173. to the .NET world). It is similar to SQL, but intended to manipulate and
  174. support XML. It is similar to XSLT, but extended to support new features
  175. such as XML Schema based datatypes.
  176. The XML Query implementation can be found mainly in the System.Xml.Query and
  177. MS.Internal.Xml.Query namespaces. Note that they are in
  178. System.Data.SqlXml.dll.
  179. The MSDN documentation says that there are two kinds of API for XML Query: the High
  180. Level API and the Low Level API. As of this beta version, the Low Level
  181. API is described as not released yet (though it may be the MS.Internal.Xml.*
  182. classes). However, implementing the High Level API will require the Low Level
  183. API. The MS.Internal.Xml related stuff looks to have interesting class
  184. structures, so it would be nice to (and I will) start learning about them.
  185. They look to have IL generator classes, but it would be difficult to
  186. start from them.
  187. *** System.Data.Mapping
  188. System.Data.Mapping and System.Data.Mapping.RelationalSchema are the
  189. namespaces for mapping support between databases and XML. This is at
  190. the stubbing phase (still incomplete).
  191. *** XmlAdapter
  192. XmlAdapter is used to support XML based query and update using
  193. XPathDocument2 and XPathNavigator2. This class is designed to bring together
  194. ADO.NET and System.Xml. It connects to databases and queries data in XML
  195. shape into XPathDocument2, using the Mapping schema above. This must be
  196. done after several classes such as XPathDocument2 and MappingSchema.
  197. ** Miscellaneous Class Libraries
  198. *** RELAX NG
  199. I implemented an experimental RelaxngValidatingReader. It is still not
  200. complete; for example, it lacks some simplification steps (see RELAX NG spec
  201. chapter 4; especially 4.17-19) and some constraints (especially 7.3).
  202. It now supports custom datatype handling. Right now, you can use XML
  203. schema datatypes ( http://www.w3.org/2001/XMLSchema-datatypes ) as well
  204. as RELAX NG default datatypes (as used in relaxng.rng).
  205. I am planning improvements (giving kinder error messages, supporting
  206. compact syntax and even object mapping), but these are still on my wishlist.
  207. ** Tools
  208. *** xsd.exe
  209. See <a href="ado-net.html">ADO.NET page</a>.
  210. Microsoft has another inference class from XmlReader to XmlSchemaCollection
  211. (Microsoft.XsdInference). It may be useful, but implementing it won't be so easy.
  212. ** Miscellaneous
  213. *** Mutual assembly dependency
  214. Sometimes I hear complaints about the mutual dependency between System.dll and
  215. System.Xml.dll: System.dll references System.Xml.dll (e.g.
  216. System.Configuration.ConfigXmlDocument extends XmlDocument), while
  217. System.Xml.dll does the reverse (e.g. XmlUrlResolver.ResolveUri takes System.Uri).
  218. Since these appear in public method signatures, we cannot, at least, get rid
  219. of these mutual references.
  220. Nowadays System.Xml.dll is built using an incomplete System.dll (lacking
  221. System.Xml dependent classes such as ConfigXmlDocument). The full System.dll
  222. is built after System.Xml.dll is done.
  223. Also note that you still need System.dll to run mcs.