mirror of https://github.com/mono/mono.git


			
							* INCOMPLETE

* XML Schema Inference Rules

** Requirements

	XmlReader:
	<ul>
		- It must not expose EntityReference nodes.
		- It must not contain xsd:* elements.
	</ul>

	XmlSchemaSet: only one that was generated by this utility class is
	supported. See the particle inference section described later.

	In practice the MS implementation does not check this input
	sufficiently, so it accepts more than it expects.

*** Allowed schema components

	Before inferring particles that are merged with the particles
	already present in the XmlSchemaSet, we have to know what is
	expected and what is not:

	<ul>
		- facets are not supported. [a014.xsd]
		- xs:all is not supported. [a003.xsd]
		- xs:group (ref) is not supported. [a004.xsd]
		- xs:choice that does not contain xs:sequence is not
		  supported [a005.xsd].
		- xs:any is not supported. Only xs:element are expected
		  to be contained in xs:sequence. [a011.xsd]
		- particles with the same name that are still not
		  ambiguous are computed into invalid particles. This
		  looks like an unintended MS bug. [a010.xsd]
		- attributeGroup does not seem to be expected there (MS
		  has a bug around here). [a006.xsd]
		- anyAttribute is not regarded as a valid particle, and
		  the output complexType definition just drops it.
		  [a013.xsd]
		- substitutionGroup, however, is not rejected and remains
		  in the output. [a001.xsd]
		  -> It must be rejected. It breaks choice compatibility.
	</ul>


** Processing model

	First, the XmlSchemaSet parameter is compiled[*1] and interpreted
	into the internal schema representation that is used to examine
	the XmlReader input. The resulting XmlSchemaSet is the same as the
	input XmlSchemaSet.

	[*1] FIXME: this design might change.
	The XmlSchemaSet is compiled and interpreted because 1) it might
	contain XmlSchemaInclude items, so it would not be possible to
	process inference inside the input schema set directly. However,
	reusing the input reduces some annoyance, e.g. elementFormDefault
	etc. are preserved.

	Second, XmlReader is moved to content (document element) and
	"element inference" starts from here (described later).

	The resulting XmlSchemaSet keeps the original XmlSchemas in itself.
	For example, it keeps elementFormDefault and attributeFormDefault.

	Basically it processes the XmlReader against the existing
	XmlSchemaSet; it does not "merge" two XmlSchemaSets, one of which
	would be newly inferred from this XmlReader, because the XmlReader
	has to infer sequential nodes (siblings) anyway.

	Once the element definition is determined (or created), any other
	branches in the schema are ignored.


** Attributes

*** attribute component definitions and references.

**** ignored attributes

	xsi:type, xsi:schemaLocation and xsi:noNamespaceSchemaLocation
	attributes are ignored.

**** special attributes

	If xsi:nil exists on an element, then the element's content is not
	handled, while its attributes are handled.

	The schema for xml:* attributes is predetermined; there is a fixed
	schema for that namespace.

**** namespaced attributes

	Other attributes that reside in a namespace are referenced as
	<attribute ref="qualified-name" />

**** local attributes

	Other attributes (those without a namespace) are represented as
	<attribute name="blah" />


*** attribute occurrence

	When defining a complexType for a newly created element, the
	attribute can be set as "required". Otherwise, it must be set as
	"optional".

	For every occurrence of an element instance, all attributes are
	tested for existence, and an attribute that does not exist must be
	set as "optional".
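
	The occurrence rule above can be sketched in a few lines of Python.
	This is only an illustration under assumptions: the function name
	infer_attribute_use and the sample document are hypothetical, and
	the sketch starts from no existing schema, so the first occurrence
	of the element plays the role of the "newly created" definition.

```python
import xml.etree.ElementTree as ET

def infer_attribute_use(xml_text, element_name):
    """Return {attribute name: "required" | "optional"} for one element name."""
    use = {}
    seen_before = False
    for el in ET.fromstring(xml_text).iter(element_name):
        present = set(el.attrib)
        if not seen_before:
            # First occurrence: the element definition is new, so every
            # attribute found on it may start out as "required".
            use = {name: "required" for name in present}
            seen_before = True
            continue
        # Attributes that earlier occurrences did not carry.
        for name in present - set(use):
            use[name] = "optional"
        # Attributes that this occurrence does not carry.
        for name in set(use) - present:
            use[name] = "optional"
    return use

doc = '<root><item id="1" lang="en"/><item id="2"/><item id="3" note="x"/></root>'
result = infer_attribute_use(doc, "item")
print(result)
```

	In this sample only "id" appears on every item, so it is the only
	attribute that stays "required".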

*** attribute value types

	FIXME: need to describe the relaxation of attribute value types.


** Content model inference

*** inference processing model

	A content model consists of two parts:

		- content type : empty | elementOnly | textOnly | mixed
		- particle : sequence | choice | all | groupRef

	On each reader.Read(), the node is first "tested" against the
	current schema content model. If the current node on the XmlReader
	is not acceptable, then "content model expansion" happens.

	<ul>
		- If the current node is text content, then process the
		  text node according to "evaluating text content".
		- If the current node is an element, then process it
		  in accordance with "evaluating element".
	</ul>


*** evaluating element

	When an element occurs, it must be accepted as a particle.
	First, content type must be examined:

	<ul>
		- If the content type was simpleType, then it is changed
		  into complexType with complexContent and mixed='true'.
		  The inferred content particle must be optional.
		- If the content type was empty, then it is changed into
		  complexType with complexContent (it is not mixed, unlike
		  above). The inferred content particle must be optional.
		- If the content type was elementOnly or mixed, no need
		  to change.
	</ul>
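
	The three cases above amount to a small transition table. A minimal
	sketch follows; the function name accept_element is hypothetical,
	and "textOnly" stands for the simpleType case, using the content
	type names from the inference processing model.

```python
def accept_element(content_type):
    """Return (new content type, whether the new particle must be optional)."""
    if content_type == "textOnly":
        # simpleType becomes complexType with complexContent, mixed='true'.
        return "mixed", True
    if content_type == "empty":
        # complexType with complexContent, not mixed.
        return "elementOnly", True
    if content_type in ("elementOnly", "mixed"):
        # Nothing to change; particle evaluation decides the occurrence.
        return content_type, False
    raise ValueError("unknown content type: " + content_type)

for kind in ("textOnly", "empty", "elementOnly", "mixed"):
    print(kind, "->", accept_element(kind))
```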

	Next, the content particle must be evaluated. 

	According to the input XmlSchemaSet limitations, there will be
	only these patterns listed here:

		- empty content

		- simple content

		- sequence (of element particles)

		- choice of sequences

**** Reader progress

	Every element is tested against current element candidates.

	<ul>
		- When the target element is a document element, then all
		  the global elements in XmlSchemaSet are the candidates.

		<ul>
			- If there is a matching name, then that element
			  definition is used as the context element for
			  the node's content, and current particle is
			  in front of the first particle.
		 	- If there isn't, then the inference engine creates
			  a new element definition, and content is none
			  (none != empty).
		</ul>

		- When the target element is inferred in a new element
		  definition, then 
	</ul>


**** Particle inference

	IMPORTANT: Here I tried to formalize the inference, but these
	notes are incomplete.

	Target {particle} to add:
		isNew  -> <xs:element name={name}> ... </xs:element>
		!isNew -> <xs:element name={name} minOccurs="0"> ... </xs:element>

	no definition
	//	define complexType and add {particle} to .Particle
		toComplexType()
		processContent(ct.Particle, isNew)

	simpleType
		makeComplexContent()

	complexType
		empty definition (no content model, no particle)
	//		-> add xs:element name={name} minOccurs="0" to .Particle
			-> processContent(ct.Particle, isNew)

		simple content
			-> makeComplexContent()

		complex content / extension
			-> processContent(cce.Particle, isNew)

		complex content / restriction
			-> processContent(ccr.Particle, isNew)

		.Particle
			-> processContent(ct.Particle, isNew)

	makeComplexContent()
		change to complexType which has complex content mixed="true" and
		extension. Discard simple type information. Add {particle} to
		extension's .Particle.

	processContent(Particle particle, isNew)
		if particle is either empty or sequence
			processSequential(particle, 0, false, isNew)
		else if particle is sequence of choices
			processLax(particle, 0)
		else
			error.

	processSequential(Sequence particle, int index, bool consumed, bool isNew)
		particle.Count <= index
			-> appendSequential(particle, isNew)
		sequence
			if (particle[index] has the same name)
			     -> if (consumed) then sequence[index].maxOccurs = inf.
				InferElement (sequence[index])
				processSequential(particle, index, true, isNew)
			else
			     -> if (!consumed)
					sequence[index].minOccurs = 0.
					processSequential(particle, index+1, false, isNew)
				else
					particle = toSequenceOfChoice(particle)
					processLax(particle, index)

	processLax(choice, index)
		foreach (element el in choice.Items)
			if (el has the same name)
				InferElement (el)
				processLax(choice, index + 1)
				return;
		appendLax(choice)

	appendSequential(particle)
		if (particle is empty)
			make particle as sequence
		sequence.Items.Add(InferElement(null))

	appendLax(choice)
		choice.Items.Add(InferElement(null))
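
	To make the notes above easier to follow, here is one possible
	reading of the sequential case as runnable Python. It is a
	simplified sketch, not the formalization itself, and it makes
	several assumptions that the notes do not state: particles are
	plain dicts, the children of one element instance arrive as a list
	of names, a consumed particle followed by a different name simply
	advances to the next particle, and the lax fallback (a repeatable
	choice) is taken only when a name reappears out of order. All
	function names are hypothetical.

```python
UNBOUNDED = "unbounded"

def to_choice(particles, names):
    """Lax fallback: one repeatable choice over every element name seen."""
    ordered = [p["name"] for p in particles] + list(names)
    return "choice", list(dict.fromkeys(ordered))

def merge_children(particles, names, is_new):
    """Merge the child names of one element instance into a sequence."""
    particles = [dict(p) for p in particles]   # do not mutate the input
    index, consumed = 0, False
    for pos, name in enumerate(names):
        while True:
            if index >= len(particles):
                if any(p["name"] == name for p in particles):
                    return to_choice(particles, names[pos:])
                # Append: required for a new definition, optional otherwise.
                particles.append({"name": name,
                                  "min": 1 if is_new else 0, "max": 1})
                consumed = True
                break
            current = particles[index]
            if current["name"] == name:
                if consumed:
                    current["max"] = UNBOUNDED   # same name repeated
                consumed = True
                break
            if any(p["name"] == name for p in particles[:index]):
                return to_choice(particles, names[pos:])   # order broken
            if not consumed:
                current["min"] = 0   # particle skipped by this instance
            index += 1
            consumed = False
    # Particles never reached are absent from this instance.
    for p in particles[index + (1 if consumed else 0):]:
        p["min"] = 0
    return "sequence", particles

existing = [{"name": n, "min": 1, "max": 1} for n in ("a", "b", "c")]
print(merge_children([], ["a", "b", "b"], True))
print(merge_children(existing, ["a", "c"], False))
print(merge_children(existing, ["b", "a"], False))
```

	The first call builds a sequence from scratch, the second makes
	"b" optional because the instance skipped it, and the third falls
	back to a choice because "a" follows "b".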


*** evaluating text content

	When text content occurs, it must be accepted as simple content.

	<ul>
		- If the content type was textOnly, then "type relaxation"
		  happens (described later).
		- If the content type was already mixed, then it is skipped.
		- If the content type was elementOnly, then the content type
		  becomes mixed and then skipped.
		- If the content type was empty, then its content type
		  becomes text and then skipped. The type is xs:string (no
		  type promotion will happen, since an empty value cannot
		  be accepted as any other type handled in this design).
	</ul>
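
	The four cases above can be written as a small function. This is
	a sketch under assumptions: the name accept_text and the shape of
	the returned tuple are hypothetical.

```python
def accept_text(content_type):
    """Return (new content type, run type relaxation?, fixed simple type)."""
    if content_type == "textOnly":
        return "textOnly", True, None          # relax the existing type
    if content_type == "mixed":
        return "mixed", False, None            # text is skipped
    if content_type == "elementOnly":
        return "mixed", False, None            # becomes mixed, then skipped
    if content_type == "empty":
        return "textOnly", False, "xs:string"  # no type promotion
    raise ValueError("unknown content type: " + content_type)

for kind in ("textOnly", "mixed", "elementOnly", "empty"):
    print(kind, "->", accept_text(kind))
```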

	(Actually, inference is done from the information available before
	schema compilation, not from post-compilation information.)

	Note that type relaxation happens only when the content is inferred
	as textOnly, and in that case it always occurs.


** Type inference

	All data types are inferred from a string value: either element
	content or an attribute value.


*** primitive type inference

	When a string is being evaluated as xs:blahblah typed value, it is
	tried against several types.

	<ul>
		- First, it is evaluated as xs:boolean; true, false<del>, 1 or 0</del>.

		- Next, its integer value is computed. If that is
		  successful, then its value range is examined to see
		  whether it matches unsignedByte, byte, unsignedShort,
		  short, unsignedInt, int, unsignedLong, long, or integer.

		- If it was not an integer, then it is evaluated as a float
		  number, as a double number, and then as a decimal number
		  as well.

		- Next, it is examined as xs:dateTime, xs:duration and
		  related schema types.

		- If it did not match any of the predefined types, then
		  xs:string is inferred. No other string-based types (such
		  as xs:token) are inferred.
	</ul>
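
	The order of the tests above can be illustrated with a short
	Python sketch. Several details are assumptions, because the notes
	do not fix them: the regular expressions only approximate the XSD
	lexical forms, only xs:dateTime and xs:duration are checked out of
	the "related schema types", and the choice between float, double
	and decimal is arbitrary here (a plain decimal literal is reported
	as xs:decimal, an exponent form as xs:double). The name
	infer_primitive is hypothetical.

```python
import re

# Integer types in the order listed above; the first matching range wins.
INTEGER_TYPES = [
    ("unsignedByte", 0, 2**8 - 1), ("byte", -2**7, 2**7 - 1),
    ("unsignedShort", 0, 2**16 - 1), ("short", -2**15, 2**15 - 1),
    ("unsignedInt", 0, 2**32 - 1), ("int", -2**31, 2**31 - 1),
    ("unsignedLong", 0, 2**64 - 1), ("long", -2**63, 2**63 - 1),
]
_INT = re.compile(r"[+-]?\d+")
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")
_DOUBLE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)[eE][+-]?\d+|-?INF|NaN")
_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?")
_DURATION = re.compile(
    r"-?P(?!$)(\d+Y)?(\d+M)?(\d+D)?(T(?=\d)(\d+H)?(\d+M)?(\d+(\.\d+)?S)?)?")

def infer_primitive(text):
    """Infer a schema type name from a string value."""
    s = text.strip()
    if s in ("true", "false"):          # 1 and 0 are not treated as boolean
        return "xs:boolean"
    if _INT.fullmatch(s):
        value = int(s)
        for name, low, high in INTEGER_TYPES:
            if low <= value <= high:
                return "xs:" + name
        return "xs:integer"
    if _DECIMAL.fullmatch(s):
        return "xs:decimal"             # assumption, see the text above
    if _DOUBLE.fullmatch(s):
        return "xs:double"              # assumption, see the text above
    if _DATETIME.fullmatch(s):
        return "xs:dateTime"
    if _DURATION.fullmatch(s):
        return "xs:duration"
    return "xs:string"                  # never xs:token or similar

for sample in ("true", "1", "-1", "12345", "1.5", "1e3", "P1D", "abc"):
    print(sample, "->", infer_primitive(sample))
```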


*** type relaxation

	When a string value is being accepted with existing type, the type
	might have to change to accept it.
	
	For example:
	<ul>
		- xs:int cannot accept "abc"
		- <del>string with maxLength="3" cannot accept "abcd"</del>
		  facets are not created anyway and thus are not supported
		  by this inference engine.
		- 12345 is not acceptable for xs:unsignedByte, but acceptable
		  for unsignedShort
	</ul>

	Here, the new string value is inferred into a simpleType, and then
	the processor computes the most specific common type between the
	existing type and the newly inferred type.
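
	As an illustration of "most specific common type", the sketch
	below walks two widening chains until they meet. The WIDER table
	is an assumption made for this example only: these notes do not
	define the widening order, and the table is a simplified chain
	loosely modeled on the XSD built-in integer types, with xs:string
	as the final fallback. The names WIDER, chain and common_type are
	hypothetical.

```python
# Hypothetical widening table: each type maps to the next wider type.
WIDER = {
    "unsignedByte": "unsignedShort", "unsignedShort": "unsignedInt",
    "unsignedInt": "unsignedLong", "unsignedLong": "integer",
    "byte": "short", "short": "int", "int": "long", "long": "integer",
    "integer": "decimal", "decimal": "string",
    "double": "string", "boolean": "string",
    "dateTime": "string", "duration": "string",
}

def chain(type_name):
    """List a type followed by every wider type, most specific first."""
    out = [type_name]
    while type_name in WIDER:
        type_name = WIDER[type_name]
        out.append(type_name)
    return out

def common_type(existing, inferred):
    """Return the most specific type that accepts both arguments."""
    candidates = chain(inferred)
    for type_name in chain(existing):
        if type_name in candidates:
            return type_name
    return "string"

print(common_type("unsignedByte", "unsignedShort"))
print(common_type("int", "string"))
print(common_type("byte", "unsignedByte"))
```

	With this table, an existing xs:unsignedByte that meets the value
	12345 (inferred as unsignedShort) relaxes to unsignedShort, and an
	existing xs:int that meets "abc" relaxes to string.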