|
- * XML Classes Status and Tasks
- ** Abstract
- The XML library is used by several areas of Mono, such as ADO.NET and XML
- Digital Signature (xmldsig). Here I write about System.Xml.dll and
- related tools. This page does not cover classes that live in other
- assemblies, such as XmlDataDocument.
- Note that corlib currently has its own XML parser class (Mono.Xml.MiniParser).
- System.Xml.dll is basically feature-complete, so I write this
- document mainly to track bugs and improvement hints.
- ** Status
- *** System.Xml namespace
- **** Document Object Model (Core)
- The DOM implementation is finished, and it scores better than MS.NET
- on the NIST DOM test suite (the suite was ported by Mainsoft
- hackers and is part of our unit tests).
- **** Xml Writer
- Here XmlWriter is almost equivalent to XmlTextWriter. If you want to see
- other implementations, check XmlNodeWriter.cs and DTMXPathDocumentWriter.cs
- in the System.XML sources.
- XmlTextWriter is complete, though it looks a bit slower than MS.NET (I
- tried 1.1).
- **** XmlResolver
- XmlUrlResolver is implemented.
- XmlSecureResolver, which was introduced in MS .NET Framework 1.1, is basically
- implemented, but it requires the CAS (code access security) feature. We need to
- fix up this class once the ongoing CAS effort is done.
- You might also be interested in an improved <a href="http://codeblogs.ximian.com/blogs/benm/archives/000039.html">XmlCachingResolver</a> by Ben Maurer.
- If even a one-time download is not acceptable, you can use <a href="http://primates.ximian.com/~atsushi/XmlStoredResolver.cs">this one</a>.
- [2.0] XmlDataSourceResolver is not implemented yet.
- **** XmlNameTable
- NameTable is implemented, but it also needs performance improvement.
- It has a large effect on overall XML processing performance.
- Optimization hacks are welcome. There is also a <a
- href="http://bugzilla.ximian.com/show_bug.cgi?id=59537">bugzilla entry</a>
- for this matter.
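- To illustrate why the name table sits on the hot path: a name table atomizes names, so equal names share one string instance and can be compared by reference. The sketch below assumes only the public NameTable.Add() method. The class and method names (NameTableSketch, AddReturnsSameInstance) are hypothetical, and it says nothing about how Mono implements the table internally.

```csharp
using System;
using System.Xml;

public static class NameTableSketch
{
    // Returns true when two Add() calls with equal names yield one shared instance.
    public static bool AddReturnsSameInstance()
    {
        NameTable table = new NameTable();
        string first = table.Add("item");
        // Build the second string at run time so it is a distinct object
        // before it goes through the table.
        string second = table.Add(new string(new char[] { 'i', 't', 'e', 'm' }));
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine(AddReturnsSameInstance());
    }
}
```

- Every element and attribute name a reader sees goes through a lookup like this, which is why a slow table slows down all XML processing.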
- **** XML Reader
- XmlTextReader, XmlNodeReader and XmlValidatingReader are almost finished.
- <ul>
- * All OASIS conformance tests pass, as they do with Microsoft's
- implementation. Some W3C tests fail, but our results look better.
- * Entity expansion and its well-formedness check are incomplete.
- It incorrectly allows divided content models. It also handles
- its Base URI incorrectly, so some DTDs fail to parse.
- * I won't add any XDR support to XmlValidatingReader. (I have
- never seen XDR used other than in Microsoft's BizTalk Server 2000,
- and now they have 2002 with XML Schema support.) If anyone
- contributes an implementation, it would still be nice.
- </ul>
- XmlTextReader and XmlValidatingReader should be faster than they are now.
- Currently XmlTextReader looks nearly twice as slow as MS.NET, and
- XmlValidatingReader (which uses this slow XmlTextReader) looks nearly three
- times slower. (Note that XmlValidatingReader is not so slow in itself. It
- uses a schema validating reader and a DTD validating reader.)
- **** Some Advantages
- The design of Mono's XmlValidatingReader is radically different from
- that of Microsoft's implementation. Under MS.NET, the DTD content validation
- engine is in fact a simple replacement of the XML Schema validation engine.
- Mono's DTD validation is designed to be fully separate and validates
- as a normal XML parser does. For example, Mono allows non-deterministic DTDs.
- Another advantage of this XmlValidatingReader is support for *any* XmlReader.
- Microsoft supports only XmlTextReader (this bug is fixed in .NET 2.0 beta,
- in the shape of XmlReader.Create()).
- <del>I added an extra support interface named "IHasXmlParserContext", which is
- considered in XmlValidatingReader.ResolveEntity(). </del><ins>This is now
- an internal interface.</ins> Microsoft failed to design XmlReader
- to be subtree-pluggable (i.e. usable as a wrapper around another
- XmlReader), since an XmlParserContext should be supplied for DTD information
- support (without it, e.g., entity references cannot be expanded) and for the
- namespace manager.
- (In .NET 2.0, Microsoft also added an interface similar to IHasXmlParserContext,
- named IXmlNamespaceResolver, but it still does not provide DTD information.)
- We also have a RELAX NG validating reader (described later).
- *** System.Xml.Schema
- **** Summary
- Basically it is complete. You can test how complete (or incomplete) the
- current schema validation engine is by using the standalone test module (see
- mcs/class/System.XML/Test/System.Xml.Schema/standalone_tests).
- At least on my box, msxsdtest fails only 30 cases with the bug-fixed catalog;
- this score is better than that of Microsoft's implementation. On the other
- hand, we need a performance boost. There should be many places where
- schema compilation and validation can be improved.
- **** Schema Object Model
- Complete, except for some things to be fixed:
- <ul>
- * Complete facet support. Currently some facets are missing.
- David Sheldon has recently been making several fixes to them.
- * ContentTypeParticle for pointless xs:choice is incomplete
- (fixing this gave rise to other bugs in compilation.
- Interestingly, MS.NET also fails around here, so it might
- be in the nature of the ContentTypeParticle design).
- * Some derivation by restriction (DBR) handling is incorrect.
- </ul>
- **** Validating Reader
- Basically this is implemented and is actually feature-complete,
- but I have only tested the validation feature. So we have to write more
- tests on properties, methods, and events (validation errors).
- *** System.Xml.Serialization
- Lluis rules ;-)
- Well, in fact XmlSerializer is almost finished and is in the bugfix phase.
- However, we appreciate more tests. Please try
-
- <ul>
- * System.Web.Services to invoke SOAP services.
- * xsd.exe and wsdl.exe to create classes.
- </ul>
- And if you find any problems, please file them in bugzilla.
- Lluis also built an interesting standalone test system, placed under
- mcs/class/System.Web.Services/Test/standalone.
- You might also be interested in "genxs", which enables you to create custom
- XML serializers. This is not included in Microsoft.NET.
- See <a
- href="http://primates.ximian.com/~lluis/blog/archives/000120.html">here</a>
- and the manpages for details. Code files are in mcs/tools/genxs.
- Lluis also created "sgen", which is based on XmlSerializer.GenerateSerializer().
- Code files are in mcs/tools/sgen.
- *** System.Xml.XPath and System.Xml.Xsl
- There are two XSLT implementations. One, the historical implementation, is
- based on libxslt (aka Unmanaged XSLT). Now we use the fully implemented
- managed XSLT by default. To use Unmanaged XSLT, set the MONO_UNMANAGED_XSLT
- environment variable (any value is acceptable).
- As for Managed XSLT, we support msxsl:script.
- It would be nice if we could support <a href="http://www.exslt.org/">EXSLT</a>.
- <a href="http://msdn.microsoft.com/WebServices/default.aspx?pull=/library/en-us/dnexxml/html/xml05192003.asp">Microsoft has tried to implement some of it</a>,
- but it was not successful because of System.Xml.Xsl design problems:
- <ul>
- * In general, .NET's "extension objects" (including
- msxsl:script) are not useful for returning node-sets (the MS XSLT
- implementation rejects a plainly overridden XPathNodeIterator
- and accepts only its own hidden classes. The same holds
- for Mono, though the classes are different).
- * In .NET's extension object design, an extension function name
- must be a valid method name, which cannot contain some characters
- such as '-'. That is, implementing EXSLT in C# is impossible.
- </ul>
- So if we support EXSLT, it has to be done inside our System.Xml.dll.
- Microsoft developers are also aware of this problem, and some of them wish
- to have EXSLT support in WinFX (not Whidbey). If anyone is interested
- in working on it, it would be nice.
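- The method-name constraint described above can be seen in a minimal sketch of an extension object. It assumes XslCompiledTransform, which shipped in the final .NET 2.0 and is not one of the classes discussed on this page. The TextFunctions class, its Shout method and the urn:example:text namespace are all made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical extension object: its public methods become callable from XSLT.
public class TextFunctions
{
    public string Shout(string value)
    {
        return value.ToUpperInvariant();
    }
}

public static class ExtensionObjectSketch
{
    // Made-up namespace URI that binds the "text" prefix to the extension object.
    const string ExtensionNamespace = "urn:example:text";

    const string Stylesheet =
        "<xsl:stylesheet version='1.0'" +
        " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'" +
        " xmlns:text='urn:example:text'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:value-of select='text:Shout(string(/greeting))'/>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Run(string inputXml)
    {
        XslCompiledTransform transform = new XslCompiledTransform();
        using (XmlReader sheet = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transform.Load(sheet);
        }

        // The function name used in the stylesheet is looked up as a method name.
        XsltArgumentList arguments = new XsltArgumentList();
        arguments.AddExtensionObject(ExtensionNamespace, new TextFunctions());

        StringWriter output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        {
            transform.Transform(input, arguments, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Run("<greeting>hello</greeting>"));
    }
}
```

- A function called text:Shout works because Shout is a legal C# method name. An EXSLT name such as exsl:node-set has no legal C# counterpart, so it cannot be supplied this way.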
- Our managed XSLT implementation is slower than MS XSLT for some kinds of
- stylesheets, and faster for others.
- *** RELAX NG
- I implemented an experimental RelaxngValidatingReader. It is still not
- complete; for example, some simplification steps (see RELAX NG spec
- chapter 4; especially 4.17-19) and some constraints (especially 7.3) are missing.
- See mcs/class/Commons.Xml.Relaxng/README for details.
- Currently we have
- <ul>
- * Custom datatype support. Right now, you can use XML schema
- datatypes ( http://www.w3.org/2001/XMLSchema-datatypes )
- as well as RELAX NG default datatypes (as used in relaxng.rng).
- * RELAX NG Compact Syntax support, though it is not yet stable.
- See Commons.Xml.Relaxng.Rnc.RncParser class.
- </ul>
- ** System.Xml v2.0
- Microsoft released the first public beta version of .NET Framework 2.0,
- available from <a href="http://www.microsoft.com/downloads/details.aspx?familyid=916EC067-8BDC-4737-9430-6CEC9667655C&displaylang=en">MSDN</a>.
- It contains several new classes.
- There are two assemblies related to System.Xml v2.0: System.Xml.dll and
- System.Data.SqlXml.dll. System.Data.SqlXml.dll is now of little importance.
- It contains only the XQueryCommand class in the System.Xml.* namespaces.
- Most of the important parts are in System.Xml.dll.
- Note that this .NET Framework is a pre-release version, so its classes are
- subject to change. Actually many of the pre-release classes have vanished.
- System.Xml 2.0 contains several features such as:
- <ul>
- * new XPathNavigator <del>and XPathDocument</del><ins>XPathDocument is <a href="http://blogs.msdn.com/dareobasanjo/archive/2004/08/25/220251.aspx">being reverted</a></ins>
- * XmlReaderSettings, XmlWriterSettings and factory methods
- * Strongly typed XmlReader and XmlWriter.
- * XML Schema design changes
- * XSD Inference
- * Well-documented and improved XmlSerializer.
- * XQuery execution engine
- * XQuery and XSLT per-stylesheet assembly generator
- </ul>
- *** System.Xml 2.0
- **** XmlReader/XmlWriter Factory methods
- In .NET 2.0, XmlTextReader, XmlNodeReader and XmlValidatingReader are
- obsolete and XmlReader.Create() is recommended (there is, however, no
- alternative way to create an XmlNodeReader). Similarly, there are
- XmlWriter.Create() overloads.
- Currently, Microsoft's XmlWriter.Create() is unreliable and it may still
- change. So XmlWriter.Create() is basically supposed to be implemented
- after the next beta version of .NET 2.0.
- Some of the XmlReader.Create() overloads are implemented, with limited
- XmlReaderSettings support.
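- As a rough illustration of the factory style, the sketch below creates a reader through XmlReader.Create() with an XmlReaderSettings instance. It assumes the API as it shipped in the final .NET 2.0, which may differ from the beta described here. The class and method names (ReaderFactorySketch, CountElements) are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderFactorySketch
{
    // Counts element nodes, letting the settings object filter out noise nodes.
    public static int CountElements(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;
        settings.IgnoreComments = true;

        int count = 0;
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(CountElements("<a><b/><!-- c --><b/></a>"));
    }
}
```

- The caller never names a concrete reader class. The settings object decides what kind of reader the factory hands back.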
- **** Typed XmlReader/XmlWriter
- In .NET 2.0, XmlReader is supposed to support strongly typed data reading.
- This is based on the W3C "XML Schema Datatypes" Recommendation and the
- "XQuery 1.0 and XPath 2.0 Data Model" Working Draft.
- Some of the XmlReader.ReadValueAsXxx() and XmlWriter.WriteValue() overloads are
- implemented, though incompletely. They are based on the internal XQueryConvert class.
- **** Sub-tree handling in XmlReader/XmlWriter/XPathNavigator
- Currently XmlReader.ReadSubtree(), XmlWriter.WriteSubtree() and
- XPathNavigator.ReadSubtree() are implemented, though not well tested.
- They are based on the Mono.Xml.SubtreeXmlReader and
- Mono.Xml.XPath.XPathNavigatorReader classes.
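- A minimal sketch of what a subtree reader does, assuming XmlReader.ReadSubtree() and XmlReader.ReadToFollowing() as they shipped in the final .NET 2.0. The class name, method name and sample document are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

public static class SubtreeSketch
{
    // Counts <item> elements inside the first <section> only.
    public static int CountItemsInFirstSection(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            if (!reader.ReadToFollowing("section"))
                return 0;

            int count = 0;
            // The subtree reader ends at the close of the current element.
            using (XmlReader subtree = reader.ReadSubtree())
            {
                while (subtree.Read())
                {
                    if (subtree.NodeType == XmlNodeType.Element && subtree.LocalName == "item")
                        count++;
                }
            }
            return count;
        }
    }

    public static void Main()
    {
        string xml = "<doc><section><item/><item/></section><section><item/></section></doc>";
        Console.WriteLine(CountItemsInFirstSection(xml));
    }
}
```

- The item in the second section is never seen, because the subtree reader is bounded by the element it was created on.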
- *** System.Xml.Schema 2.0
- Since .NET 1.x is not very compliant with the W3C XML Schema specification,
- Microsoft had to redesign the System.Xml.Schema classes. We also have to
- change many things.
- 1) It does not expose XmlSchemaDatatype anymore (except for obsolete
- members). Primitive types are represented as XmlSchemaSimpleType
- instances (thus there are ElementSchemaType, AttributeSchemaType and
- BaseXmlSchemaType, which replace some existing properties).
- 2) "XQuery 1.0 and XPath 2.0 Data Model" datatypes (such as
- xdt:dayTimeDuration) are newly supported. They are only partially
- implemented so far. This task is partly done.
- 3) Schema structures are now bound in a parent-child relationship. This is
- not yet implemented. Related to it, there seem to be a bunch of schema
- compilation bugfixes.
- 4) XmlSchemaCollection is no longer used to represent the effective set of
- schemas. Instead, the new XmlSchemaSet class is used. It should affect the
- schema compilation design. In fact, I've implemented XmlSchemaCollection
- to be more conformant to the W3C specification, but many changes are still
- required. This task is partly done.
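- For point 4, a short sketch of how a schema set is filled and compiled. It assumes XmlSchemaSet as it shipped in the final .NET 2.0. The class name, method name and the one-element schema are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaSetSketch
{
    // A made-up schema with a single global element and no target namespace.
    const string Schema =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
        "<xs:element name='note' type='xs:string'/>" +
        "</xs:schema>";

    // Adds one schema to the set, compiles it and counts the global elements.
    public static int CountGlobalElements()
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(Schema)))
        {
            // A null target namespace means "use the one declared in the schema".
            schemas.Add(null, reader);
        }
        schemas.Compile();
        return schemas.GlobalElements.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountGlobalElements());
    }
}
```

- Compilation happens on the set as a whole, not per schema, which is why the change affects the schema compilation design.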
- **** XSD Inference
- In .NET 2.0, there is an XML Schema inference implementation. Now that
- XmlSchemaSet is basically implemented, inference can be done separately
- by anyone. Volunteer efforts are welcome here.
- *** System.Xml.XPath 2.0
- **** Editable XPathDocument
- <del>
- In .NET 2.0, XPathDocument is supposed to be editable. Currently we provide
- a fast document table model based implementation (DTMXPathNavigator), but
- with that design change, we (and they) can no longer provide a fast
- read-only XPathNavigator from XPathDocument.
- </del><ins>
- It is being reverted to the original (.NET 1.x) XPathDocument. We still have
- the editable version, but we'll revert it too in the future. So our XPathDocument will still be the faster one.
- </ins>
- Currently, a new XPathDocument implementation is provided. The actual
- implementation is Mono.Xml.XPath.XPathDocument2, which is a simple DOM-like
- tree model. XPathDocument2 implements the same interfaces as XPathDocument
- does, and XPathDocument delegates most of its methods to that class (for
- example, XPathDocument.CreateEditor() calls XPathDocument2.CreateEditor()).
- Currently Mono.Xml.XPath.XPathDocument2 is unstable (it does not pass
- the standalone XSLT tests, unlike the existing DTMXPathDocument). So
- it has not replaced the existing XPathDocument implementation, but you can
- use the new implementation by explicitly setting the environment variable
- USE_XPATH_DOCUMENT_2 = yes. Currently it supports (well, is supposed
- to support) basic editor features such as AppendChild(). Other members
- (such as RejectChanges()) are untested.
- **** Extra stuff - XPathEditableDocument
- Currently we provide another IXPathEditable: XPathEditableDocument. It is
- based on the idea of handling XmlDocument as the editing target. It is
- implemented as Mono.Xml.XPath.XPathEditableDocument. We might provide this
- class as an extra (possibly in a separate Mono-specific XML assembly).
- **** System.Xml.Query
- In this namespace, there are two significant classes: XsltCommand and
- XQueryCommand.
- XsltCommand implements XSLT transformation. It is almost the same as
- System.Xml.Xsl.XslTransform, but this class transforms documents two
- to four times as fast as XslTransform. In exchange, stylesheet compilation
- is much slower, because it generates a compiled stylesheet assembly.
- XQueryCommand implements XQuery. XQuery is a new XML document
- manipulation language (at least new in the .NET world). It is similar
- to XSLT, but extended to support XML Schema based datatypes (and it is
- not an XML-based language). It is similar to XPath, but it can construct
- XML nodes. It has no complicated template resolution, but works like
- a functional language.
- Under MS.NET, the XQuery implementation is mainly in the System.Xml.Query and
- MS.Internal.Xml.* namespaces. The implementation is mostly
- in System.Xml.dll. This is also true of our System.Xml.dll. Our XQueryCommand
- in System.Data.SqlXml.dll just invokes, via reflection, the actual XQuery
- processor (Mono.Xml.XPath2.XQueryCommandImpl), which resides in
- System.Xml.dll.
- Currently we are not implementing the MS.Internal.Xml.* classes. The MS
- implementation is based on an old version of the W3C specification, while
- our implementation is currently based on the
- <a href="http://www.w3.org/TR/2004/WD-xquery-20040723/">23 July 2004
- version</a> (the latest as of now) of the working draft.
- XQuery implementation tasks are:
- <ul>
- * XQuery syntax parser that parses an XQuery string into an AST
- (abstract syntax tree). -> partly not done.
- * XQuery AST compiler into static context -> partly not done.
- * XQuery (dynamic context) runtime = XQuery expression evaluator
- + sequence iterator. -> partly not done.
- * XPathItem data model and (mainly) conversion support.
- -> partly done.
- * Applied expression classes for XQuery/XPath 2.0 functions and
- operators. -> partly done.
- * Optimization, and the design of a per-query assembly code generator (later)
- </ul>
- It already handles some queries, while major parts of the implementation are
- missing or buggy (such as FLWOR, expressions for sequence type handling,
- built-in function support etc.).
- *** Relax NG and DSDL in Mono 1.2
- Currently we support only RELAX NG as one part of the ISO DSDL effort. There
- is an existing Schematron implementation (NMatrix Project: <a
- href="http://sourceforge.net/projects/dotnetopensrc/">
- http://sourceforge.net/projects/dotnetopensrc/</a>). With a few changes,
- it can be used with Mono.
-
- We also don't have multi-language validation support, namely
- Namespace-based Validation Dispatch Language (NVDL). To support unwrapping,
- one special XmlReader implementation is required (other schema validation
- support can be done with ReadSubtree()). Note that we have already seen RELAX
- Namespace, Modular Namespace (MNS) and Namespace Routing Language (NRL);
- that is, the standardization effort is still ongoing (though NVDL looks
- mostly the same as NRL).
- In Mono 1.2, there might be improvements to Commons.Xml.Relaxng.
- <ul>
- * Currently RelaxngPattern.Compile() provides poor compilation
- error information. At least it could provide the error location.
- Also, the error should be some kind of
- RelaxngGrammarException.
- * Right now there is no ambiguity detection implementation, which
- would be useful for RelaxngPattern-based XML serialization (if
- there is a need).
- * Because of the lack of ambiguity detection, there is no way to
- provide XmlMapping (XmlTypeMapping/XmlMemberMapping). But
- if anyone is interested in such an effort, integration with
- XmlSerializer would be an interesting task.
- </ul>
- ** Tools
- *** xsd.exe
- See <a href="ado-net.html">ADO.NET page</a>.
- ** Miscellaneous
- *** Mutual assembly dependency
- Sometimes I hear complaints about the mutual dependency between System.dll
- and System.Xml.dll: System.dll references System.Xml.dll (e.g.
- System.Configuration.ConfigXmlDocument extends XmlDocument), and
- System.Xml.dll references System.dll (e.g. XmlUrlResolver.ResolveUri takes
- System.Uri). Since these types appear in public method signatures, we cannot
- get rid of these mutual references.
- Nowadays System.Xml.dll is built against an incomplete System.dll (lacking
- System.Xml dependent classes such as ConfigXmlDocument). The full System.dll
- is built after System.Xml.dll is done.
- Note that you still need System.dll to run mcs.
- Atsushi Eno <[email protected]>
- last updated 09/02/2004
|