- * INCOMPLETE
- * XML Schema Inference Rules
- ** Requirements
- XmlReader:
- <ul>
- - that does not expose EntityReference.
- - that does not contain xsd:* elements.
- </ul>
- XmlSchemaSet: only one that was generated by this utility class. See
- the particle inference section described later.
- The MS implementation does not check this input sufficiently,
- so it accepts more than it is designed to handle.
- *** Allowed schema components
- Before inferring merged particles from the particles already present
- in the XmlSchemaSet, we have to know what is expected and what is not:
- <ul>
- - facets are not supported. [a014.xsd]
- - xs:all is not supported. [a003.xsd]
- - xs:group (ref) is not supported. [a004.xsd]
- - xs:choice that does not contain xs:sequence is not
- supported [a005.xsd].
- - xs:any is not supported. Only xs:element are expected
- to be contained in xs:sequence. [a011.xsd]
- - particles with the same name that are still not ambiguous
- are computed into invalid particles. This looks
- like an unintended bug in MS. [a010.xsd]
- - attributeGroup does not seem to be expected there (MS has a
- bug around here). [a006.xsd]
- - anyAttribute is not regarded as a valid particle, and
- the output complexType definition just rips them out.
- [a013.xsd]
- - but substitutionGroup is not rejected and it will remain
- in the output. [a001.xsd]
- -> It must be rejected. It breaks choice compatibility.
- </ul>
- ** Processing model
- First, the XmlSchemaSet parameter is compiled[*1] and interpreted into
- an internal schema representation that is used to examine the
- XmlReader input. The resulting XmlSchemaSet is the same
- as the input XmlSchemaSet.
- [*1] FIXME: this design might change.
- The XmlSchemaSet is compiled because it might contain
- XmlSchemaInclude items, so it would not be possible to process inference
- inside the input schema set as is. However, reusing the input avoids
- some annoyances, such as having to preserve elementFormDefault.
- Second, XmlReader is moved to content (document element) and
- "element inference" starts from here (described later).
- The resulting XmlSchemaSet keeps the original XmlSchemas in itself.
- For example, it keeps elementFormDefault and attributeFormDefault.
- Basically, it processes the XmlReader against the existing XmlSchemaSet
- and does not "merge" two XmlSchemaSets, one of which is newly inferred
- from this XmlReader, because the XmlReader has to
- infer sequential nodes (siblings) anyway.
- Once the element definition is determined (or created), any other
- branches in the schema are ignored.
- ** Attributes
- *** attribute component definitions and references.
- **** ignored attributes
- xsi:type, xsi:schemaLocation and xsi:noNamespaceSchemaLocation
- attributes are ignored.
- **** special attributes
- If xsi:nil exists, the element's content is not handled, but its
- attributes are.
- xml:* attributes are predetermined; there is a fixed schema for that
- namespace.
- **** namespaced attributes
- miscellaneous attributes that reside in a certain namespace are
- referenced as <attribute ref="qualified-name" />
- **** local attributes
- miscellaneous attributes are represented as <attribute name="blah" />
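The four attribute categories above can be summarized as a small classifier. This is only an illustrative sketch in Python: `classify_attribute` and the category labels it returns are hypothetical names, not part of any implementation described in these notes.

```python
# Namespace URIs fixed by the XML Schema and XML specifications.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
XML_NS = "http://www.w3.org/XML/1998/namespace"
IGNORED_XSI = {"type", "schemaLocation", "noNamespaceSchemaLocation"}


def classify_attribute(namespace, local_name):
    """Return a (hypothetical) category label for one attribute."""
    if namespace == XSI_NS:
        if local_name in IGNORED_XSI:
            return "ignored"      # skipped entirely
        if local_name == "nil":
            return "special"      # content not handled, attributes are
    if namespace == XML_NS:
        return "predefined"       # fixed schema for the xml namespace
    if namespace:
        return "ref"              # <attribute ref="qualified-name" />
    return "local"                # <attribute name="blah" />


print(classify_attribute(XSI_NS, "type"))
print(classify_attribute("urn:example", "id"))
print(classify_attribute("", "id"))
```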
- *** attribute occurrence
- when defining a complexType for a newly created element, the attribute
- can be set as "required". Otherwise, it must be set as "optional".
- For every occurrence of an element instance, each known attribute is
- tested for existence; if it is absent, it must be set as "optional".
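A minimal Python sketch of this occurrence rule follows. The function name `merge_attribute_use` and the dict-based representation are assumptions made for illustration only.

```python
def merge_attribute_use(known, instance_attrs, is_new_type):
    """Update attribute 'use' values after seeing one element instance.

    known          -- dict: attribute name -> "required" | "optional"
    instance_attrs -- set of attribute names present on this instance
    is_new_type    -- True when the complexType is being created right now
    """
    result = dict(known)
    for name in instance_attrs:
        if name not in result:
            # Only a newly created type may mark an attribute as required;
            # an attribute first seen later was absent on earlier instances.
            result[name] = "required" if is_new_type else "optional"
    for name in result:
        if name not in instance_attrs:
            # Missing on this instance, so it can no longer be required.
            result[name] = "optional"
    return result


first = merge_attribute_use({}, {"id", "lang"}, is_new_type=True)
second = merge_attribute_use(first, {"id", "title"}, is_new_type=False)
print(second)
```

Here `id` stays required because it appears on both instances, while `lang` and `title` each miss one instance and end up optional.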
- *** attribute value types
- FIXME: need to describe the relaxation of attribute value types.
- ** Content model inference
- *** inference processing model
- Content model consists of two parts;
- - content type : empty | elementOnly | textOnly | mixed
- - particle : sequence | choice | all | groupRef
- On processing reader.Read(), the node is first "tested" against
- current schema content model. If the current node on the XmlReader
- is not acceptable, then "content model expansion" happens.
- <ul>
- - If the current node is text content, then process the
- text node according to "evaluating text content".
- - If the current node is an element, then process it
- in accordance with "evaluating particle".
- </ul>
- *** evaluating element
- When an element occurs, it must be accepted as a particle.
- First, content type must be examined:
- <ul>
- - If the content type was simpleType, then it is changed
- into complexType with complexContent and mixed='true'.
- The inferred content particle must be optional.
- - If the content type was empty, then it is changed into
- complexType with complexContent (it is not mixed, unlike
- above). The inferred content particle must be optional.
- - If the content type was elementOnly or mixed, no need
- to change.
- </ul>
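The three cases above amount to a small transition table. The Python sketch below is illustrative only: `content_type_after_element` is a hypothetical name, and it assumes that "simpleType" corresponds to the textOnly content type listed earlier.

```python
def content_type_after_element(content_type):
    """Return (new_content_type, particle_must_be_optional).

    The second value only reflects what this step forces; for elementOnly
    and mixed, occurrence is decided later by particle inference.
    """
    if content_type == "textOnly":
        # simpleType -> complexType with complexContent, mixed='true'
        return "mixed", True
    if content_type == "empty":
        # empty -> complexType with complexContent, not mixed
        return "elementOnly", True
    if content_type in ("elementOnly", "mixed"):
        # no need to change
        return content_type, False
    raise ValueError(f"unknown content type: {content_type}")


for ct in ("textOnly", "empty", "elementOnly", "mixed"):
    print(ct, "->", content_type_after_element(ct))
```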
- Next, the content particle must be evaluated.
- According to the input XmlSchemaSet limitations, there will be
- only these patterns listed here:
- - empty content
- - simple content
- - sequence (of element particles)
- - choice of sequences
- **** Reader progress
- Every element is tested against current element candidates.
- <ul>
- - When the target element is a document element, then all
- the global elements in XmlSchemaSet are the candidates.
- <ul>
- - If there is a matching name, then that element
- definition is used as the context element for
- the node's content, and current particle is
- in front of the first particle.
- - If there isn't, then the inference engine creates
- a new element definition, and content is none
- (none != empty).
- </ul>
- - When the target element is inferred in a new element
- definition, then
- </ul>
- **** Particle inference
- IMPORTANT: Here I tried to formalize the inference, but these notes
- are incomplete.
- Target {particle} to add:
-     isNew  -> <xs:element name={name}> ... </xs:element>
-     !isNew -> <xs:element name={name} minOccurs="0"> ... </xs:element>
- no definition
-     // define complexType and add {particle} to .Particle
-     toComplexType()
-     processContent(ct.Particle, isNew)
- simpleType
-     makeComplexContent()
- complexType
-     empty definition (no content model, no particle)
-         // -> add xs:element name={name} minOccurs="0" to .Particle
-         -> processContent(ct.Particle, isNew)
-     simple content
-         -> makeComplexContent()
-     complex content / extension
-         -> processContent(cce.Particle, isNew)
-     complex content / restriction
-         -> processContent(ccr.Particle, isNew)
-     .Particle
-         -> processContent(ct.Particle, isNew)
- makeComplexContent()
-     change to complexType which has complex content mixed="true" and
-     extension. Discard simple type information. Add {particle} to
-     extension's .Particle.
- processContent(Particle particle, isNew)
-     if particle is either empty or sequence
-         processSequential(particle, 0, false, isNew)
-     else if particle is sequence of choices
-         processLax(particle, 0)
-     else
-         error.
- processSequential(Sequence particle, int index, bool consumed, bool isNew)
-     particle.Count <= index
-         -> appendSequential(particle, isNew)
-     sequence
-         if (particle[index] has the same name)
-             -> if (consumed) then sequence[index].maxOccurs = inf.
-                InferElement (sequence[index])
-                processSequential(particle, index, true, isNew)
-         else
-             -> if (!consumed)
-                    sequence[index].minOccurs = 0.
-                processSequential(particle, index+1, false, isNew)
-     else
-         particle = toSequenceOfChoice(particle)
-         processLax(particle, index)
- processLax(choice, index)
-     foreach (element el in choice.Items)
-         if (el has the same name)
-             InferElement (el)
-             processLax(choice, index + 1)
-             return;
-     appendLax(particle)
- appendSequential(particle)
-     if (particle is empty)
-         make particle as sequence
-     sequence.Items.Add(InferElement(null))
- appendLax(choice)
-     choice.Items.Add(InferElement(null))
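As a rough illustration of the sequential path only, the following Python sketch walks a list of child element names against an existing sequence. It is a simplified reading of the incomplete notes above, not the actual algorithm: the name `process_sequential`, the dict-based particles, and the `"unbounded"` marker are all assumptions, and the fallback to a choice (processLax) is deliberately left out, so out-of-order children are not handled.

```python
UNBOUNDED = "unbounded"  # hypothetical marker for maxOccurs="unbounded"


def process_sequential(sequence, child_names, is_new):
    """Adjust `sequence` in place so that it accepts `child_names` in order.

    sequence    -- list of dicts with "name", "min" and "max" keys
    child_names -- element names in document order
    is_new      -- True if the owning element definition was just created
    """
    index, consumed = 0, False
    for name in child_names:
        while True:
            if index >= len(sequence):
                # appendSequential: a new particle goes at the end; it is
                # optional unless the whole definition is new.
                sequence.append(
                    {"name": name, "min": 1 if is_new else 0, "max": 1})
                consumed = True
                break
            particle = sequence[index]
            if particle["name"] == name:
                if consumed:
                    # Seen again at the same position: allow repetition.
                    particle["max"] = UNBOUNDED
                consumed = True
                break
            if not consumed:
                # Skipped without ever matching: it becomes optional.
                particle["min"] = 0
            index += 1
            consumed = False
    return sequence


seq = [{"name": "a", "min": 1, "max": 1}, {"name": "b", "min": 1, "max": 1}]
process_sequential(seq, ["a", "a", "c"], is_new=False)
print(seq)
```

In this run `a` repeats and becomes unbounded, `b` is skipped and becomes optional, and `c` is appended as an optional particle because the definition already existed.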
- *** evaluating text content
- When text content occurs, it must be accepted as simple content.
- <ul>
- - If the content type was textOnly, then "type relaxation"
- happens (described later).
- - If the content type was already mixed, then it is skipped.
- - If the content type was elementOnly, then the content type
- becomes mixed and then skipped.
- - If the content type was empty, then its content type
- becomes textOnly and then skipped. The type is xs:string (no
- type promotion will happen, since an empty value cannot be
- accepted as any of the other types handled in this design).
- </ul>
- (Actually, inference is done from pre-compilation information, not
- from post-compilation information.)
- Note that type relaxation happens only when the content is inferred
- as textOnly, and in that case it always occurs.
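The text-node cases can likewise be written as a lookup. This Python sketch is illustrative only; `content_type_after_text` is a hypothetical name, and the second return value simply records whether type relaxation applies.

```python
def content_type_after_text(content_type):
    """Return (new_content_type, relax_type) for a text node."""
    transitions = {
        # already text: the value type may need relaxing
        "textOnly": ("textOnly", True),
        # mixed content already accepts text
        "mixed": ("mixed", False),
        # elements seen before, now text as well
        "elementOnly": ("mixed", False),
        # first text in an empty element; the type is fixed to xs:string
        "empty": ("textOnly", False),
    }
    return transitions[content_type]


for ct in ("textOnly", "mixed", "elementOnly", "empty"):
    print(ct, "->", content_type_after_text(ct))
```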
- ** Type inference
- All data types are inferred from a string value: either element content
- or an attribute value.
- *** primitive type inference
- When a string is being evaluated as an xs:* typed value, it is
- tried against several types in turn.
- <ul>
- - First, it is evaluated as xs:boolean; true, false<del>, 1 or 0</del>.
- - Next, its integer value is computed. If that is
- successful, then its value range is examined to see whether it
- matches unsignedByte, byte, unsignedShort, short,
- unsignedInt, int, unsignedLong, long, or integer.
- - If it was not an integer, then it is evaluated as a float
- number, as a double number, and then as a decimal number
- as well.
- - Next, it is examined as xs:dateTime, xs:duration and
- related schema types.
- - If it did not match any of the predefined types, then
- xs:string is inferred. No other string-based types (such
- as xs:token) are inferred.
- </ul>
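The order of tests above might look like this in Python. It is a sketch under several assumptions: `infer_type` is a hypothetical name, the first matching integer range in the listed order wins, the float/double/decimal checks are collapsed into a single xs:double result because the notes do not say how the three are ranked, and only xs:dateTime and xs:duration are tried from the date-related types. Whitespace handling is ignored.

```python
import re

_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
_DATETIME = re.compile(
    r"-?[0-9]{4,}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}"
    r"(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?")
_DURATION = re.compile(
    r"-?P(?=.)([0-9]+Y)?([0-9]+M)?([0-9]+D)?"
    r"(T(?=.)([0-9]+H)?([0-9]+M)?([0-9]+(\.[0-9]+)?S)?)?")

# Integer types in the order listed in the notes, with inclusive bounds.
_INT_RANGES = [
    ("xs:unsignedByte", 0, 2**8 - 1),
    ("xs:byte", -2**7, 2**7 - 1),
    ("xs:unsignedShort", 0, 2**16 - 1),
    ("xs:short", -2**15, 2**15 - 1),
    ("xs:unsignedInt", 0, 2**32 - 1),
    ("xs:int", -2**31, 2**31 - 1),
    ("xs:unsignedLong", 0, 2**64 - 1),
    ("xs:long", -2**63, 2**63 - 1),
]


def infer_type(text):
    """Return the name of the first type in the test order that fits."""
    if text in ("true", "false"):      # "1" and "0" are not booleans here
        return "xs:boolean"
    if _INT.fullmatch(text):
        value = int(text)
        for name, low, high in _INT_RANGES:
            if low <= value <= high:
                return name
        return "xs:integer"
    if _FLOAT.fullmatch(text):
        return "xs:double"             # simplification, see the lead-in
    if _DATETIME.fullmatch(text):
        return "xs:dateTime"
    if _DURATION.fullmatch(text):
        return "xs:duration"
    return "xs:string"


for sample in ("true", "1", "-5", "12345", "3.14", "P1D", "abc"):
    print(sample, "->", infer_type(sample))
```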
- *** type relaxation
- When a string value has to be accepted by an existing type, the type
- might have to change to accept it.
-
- For example:
- <ul>
- - xs:int cannot accept "abc"
- - <del>string with maxLength="3" cannot accept "abcd"</del>
- facets are not created anyways and thus not supported
- by this inference engine.
- - 12345 is not acceptable for xs:unsignedByte, but acceptable
- for unsignedShort
- </ul>
- Here, the new string value is inferred into a simpleType, and then
- the processor computes the most specific common type between
- the existing type and the newly inferred type.
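One way to picture the common-type computation for integer types is by value range. The Python sketch below is an assumption-laden illustration, not the engine's actual rule: `common_type` is a hypothetical name, integer types are compared by the range of values they accept, and every non-integer mismatch falls back to xs:string, which is always a valid common type but may be less specific than what a real engine would choose.

```python
# Integer types with inclusive bounds, narrowest candidates first.
_INT_RANGES = [
    ("xs:unsignedByte", 0, 2**8 - 1),
    ("xs:byte", -2**7, 2**7 - 1),
    ("xs:unsignedShort", 0, 2**16 - 1),
    ("xs:short", -2**15, 2**15 - 1),
    ("xs:unsignedInt", 0, 2**32 - 1),
    ("xs:int", -2**31, 2**31 - 1),
    ("xs:unsignedLong", 0, 2**64 - 1),
    ("xs:long", -2**63, 2**63 - 1),
]
_BOUNDS = {name: (low, high) for name, low, high in _INT_RANGES}


def common_type(existing, inferred):
    """Return a type that accepts the values of both given types."""
    if existing == inferred:
        return existing
    if existing in _BOUNDS and inferred in _BOUNDS:
        low = min(_BOUNDS[existing][0], _BOUNDS[inferred][0])
        high = max(_BOUNDS[existing][1], _BOUNDS[inferred][1])
        for name, type_low, type_high in _INT_RANGES:
            if type_low <= low and high <= type_high:
                return name
        return "xs:integer"
    integer_family = set(_BOUNDS) | {"xs:integer"}
    if existing in integer_family and inferred in integer_family:
        return "xs:integer"
    return "xs:string"   # safe fallback for everything else


print(common_type("xs:unsignedByte", "xs:unsignedShort"))
print(common_type("xs:int", "xs:string"))
```

The first call mirrors the 12345 example: an existing xs:unsignedByte that meets a value inferred as xs:unsignedShort is widened to xs:unsignedShort.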