- ## What is XML?
- XML (eXtensible Markup Language) is a markup language that defines a set of rules for encoding
- documents in a format that is both human-readable and machine-readable. It is designed to store
- and transport data, and it is widely used as a standard for data interchange between different
- applications and platforms.
- XML files have a hierarchical structure, where data is organized into elements, attributes,
- and text nodes. Elements are the primary building blocks of XML, and they can be nested within each
- other to create a tree-like structure. Attributes provide additional information about elements and
- are defined within the opening tag of an element. Text nodes store the actual content of the elements.
- Similar to XML, JSON (JavaScript Object Notation) is another popular format for data interchange,
- which also uses a hierarchical structure to organize data. Both XML and JSON are designed to be
- human-readable and machine-readable, making them well-suited for exchanging data between different
- programming languages and platforms.
- However, there are some differences between XML and JSON. While XML relies on tags and attributes to
- define and structure data, JSON uses key-value pairs and objects (denoted by curly braces) or arrays
- (denoted by square brackets) to represent data. In general, JSON is considered to be more compact and less
- verbose than XML, which can result in faster parsing and reduced data size for transmission.
- Here is a comparison of XML and JSON representations for the same data:
- XML:
- ```xml
- <bookstore>
- <book id="1">
- <title>Book Title 1</title>
- <author>Author 1</author>
- <price>19.99</price>
- </book>
- <book id="2">
- <title>Book Title 2</title>
- <author>Author 2</author>
- <price>24.99</price>
- </book>
- </bookstore>
- ```
- JSON:
- ```json
- {
- "bookstore": {
- "book": [
- {
- "id": "1",
- "title": "Book Title 1",
- "author": "Author 1",
- "price": 19.99
- },
- {
- "id": "2",
- "title": "Book Title 2",
- "author": "Author 2",
- "price": 24.99
- }
- ]
- }
- }
- ```
- ## XML by Example
- We are going to explore the fundamentals of XML by creating an example book store application.
- This application will manage a list of books, including their titles, authors, publication years,
- prices, and other relevant details. Through this example, we will demonstrate how XML is used to
- organize and structure data, making it easier to store, exchange, and process information.
- Our example book store will maintain an XML file, books.xml, which contains the following book data:
- ```xml
- <?xml version="1.0" encoding="UTF-8"?>
- <bookstore>
- <book category="cooking">
- <title lang="en">Everyday Italian</title>
- <author>Giada De Laurentiis</author>
- <year>2005</year>
- <price>30.00</price>
- </book>
- <book category="children">
- <title lang="en">Harry Potter</title>
- <author>J K. Rowling</author>
- <year>2005</year>
- <price>29.99</price>
- </book>
- <book category="web">
- <title lang="en">XQuery Kick Start</title>
- <author>James McGovern</author>
- <author>Per Bothner</author>
- <author>Kurt Cagle</author>
- <author>James Linn</author>
- <author>Vaidyanathan Nagarajan</author>
- <year>2003</year>
- <price>49.99</price>
- </book>
- <book category="web" cover="paperback">
- <title lang="en">Learning XML</title>
- <author>Erik T. Ray</author>
- <year>2003</year>
- <price>39.95</price>
- </book>
- </bookstore>
- ```
- __Source:__ https://www.w3schools.com/xml/books.xml
- In this XML file, each book is represented as a separate element node, with the books nested within
- the parent "bookstore" element. Element nodes can contain attributes, such as "category" or "lang",
- and they can wrap either text or child element nodes.
- For instance, the last book in our example has a "category" attribute with the value "web" and a
- "cover" attribute with the value "paperback". The third book, "XQuery Kick Start", has multiple authors, which
- demonstrates that element nodes can contain multiple child nodes of the same type.
- #### Nodes
- In XML, the DOM (Document Object Model) represents everything within an XML file as a node.
- A node is a fundamental component of an XML document, and there are various types of nodes,
- including element nodes, attribute nodes, and text nodes. Understanding these different node
- types is crucial for effectively working with XML data.
- * **Element nodes** : These are the primary building blocks of an XML document, representing
- the XML tags and their content. In our example, `<book>` is an element node. Element nodes can
- be nested within one another, creating a tree-like structure that organizes the data.
- * **Attribute nodes** : These nodes provide additional information about an element node and are
- defined within the opening tag of the element. For instance, the category attribute in the `<book>`
- element is an attribute node. Attribute nodes are associated with a specific element and cannot exist
- on their own.
- * **Text nodes** : These nodes store the actual content within element nodes. In our example, the
- text "Everyday Italian" within the `<title>` element is a text node.
- For simplicity, we will often refer to them as elements, attributes, and text.
- #### Analyzing an XML file
- Examining an XML file allows us to extract various valuable information, such as:
- * **File encoding, XML version, and root node name** : These metadata details are essential for processing
- and interpreting the XML file. In our example, the file encoding is UTF-8, the XML version is 1.0,
- and the root node name is "bookstore."
- * **Attributes of nodes** : Attributes provide extra information about elements, such as
- categorization, language, or other properties. In our book store example, the "category" and
- "lang" attributes are used to classify books and specify the language of the titles, respectively.
- * **Content of nodes** : The content within nodes can be either text or other nodes, representing
- data or further organization of the XML structure. In our example, the `<book>` element contains
- child element nodes (e.g., `<title>` and `<author>`), which in turn contain text nodes (e.g., the book title).
- * **Number of nodes, attributes in a node, and children of a node** : Counting and understanding the
- relationships between nodes, their attributes, and their children is vital for navigating and manipulating
- XML data effectively. For instance, knowing the number of books in the bookstore or the number of
- authors for a particular book can be useful for various operations and analyses.
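- The kind of information listed above could be gathered with a sketch like the following. It relies only on
- #TxmlDoc and #TxmlNode methods that are introduced in the next section. The file location and the `Null`
- check after `parseFile()` are assumptions made for illustration and are not confirmed by this guide.
- ```blitzmax
- SuperStrict
- Import text.xml
-
- ' Sketch: inspect books.xml (assumed to be in the working directory)
- Local doc:TxmlDoc = TxmlDoc.parseFile("books.xml")
- ' Assumption: parseFile() yields Null if the file cannot be read
- If Not doc
-     Print "books.xml could not be parsed"
-     End
- EndIf
-
- Local root:TxmlNode = doc.getRootElement()
- Print "Root node: " + root.getName()
-
- Local bookCount:Int = 0
- For Local bookNode:TxmlNode = EachIn root.getChildren()
-     bookCount :+ 1
-     ' count the <author> children of this book
-     Local authorCount:Int = 0
-     For Local child:TxmlNode = EachIn bookNode.getChildren()
-         If child.getName() = "author" Then authorCount :+ 1
-     Next
-     Print "Book " + bookCount + " (" + bookNode.getAttribute("category") + "): " + authorCount + " author(s)"
- Next
- Print "Number of books: " + bookCount
-
- doc.Free()
- ```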
- ### Reading XML Files
- To read data from an existing XML file, you first need to create a #TxmlDoc instance and parse
- the `books.xml` file.
- ```blitzmax
- Local xmlDoc:TxmlDoc = TxmlDoc.parseFile("books.xml")
- ```
- However, we also want to store the book data as objects in BlitzMax,
- so we will create a custom #Type for it first. As you observed in the XML file, each title carries
- a `lang` attribute, so a book can have titles in different languages. We need to ensure that we can
- identify them by that attribute, which we can achieve with a key-value mapping using a #TStringMap.
- For authors, since they are not identified by a key, we can use a simple #String array to store them.
- ```blitzmax
- Type TBook
-     Field category:String
-     Field cover:String
-     ' the map must be created up front, otherwise Insert() would be called on Null
-     Field title:TStringMap = New TStringMap
-     Field authors:String[]
-     Field year:Int
-     Field price:Float
- End Type
- ```
- _(Note: Using floating-point numbers for financial values should generally be avoided,
- as it can lead to rounding errors, which is undesirable in financial contexts. For the sake
- of simplicity, we will use `Float` in this example.)_
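- One common alternative, which the rest of this guide does not use, is to store prices as integer cents.
- The following is only a minimal sketch: `PriceToCents` is a hypothetical helper, and it assumes
- non-negative values that use "." as the decimal separator.
- ```blitzmax
- SuperStrict
-
- ' Hypothetical helper: converts a price string such as "29.99" to a whole number of cents.
- ' Assumes a non-negative value with "." as the decimal separator.
- Function PriceToCents:Int(value:String)
-     Local parts:String[] = value.Trim().Split(".")
-     Local cents:Int = Int(parts[0]) * 100
-     If parts.Length > 1
-         ' pad or cut the fractional part to exactly two digits
-         Local padded:String = parts[1] + "00"
-         cents :+ Int(padded[..2])
-     EndIf
-     Return cents
- End Function
-
- Print "cents: " + PriceToCents("29.99")
- ```
- With this approach the `price` field would be declared as `Int`, and the value would only be formatted
- back into a decimal string when it is displayed or saved.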
- Now that we have parsed the XML file and created the `TBook` type to store the information,
- we are ready to read the data from the XML and store it in our custom type.
- To extract information from the nodes in the parsed XML file, we can start by retrieving
- the root node using the `getRootElement()` method provided by the #TXmlDoc instance.
- ```blitzmax
- Local rootNode:TxmlNode = xmlDoc.getRootElement()
- ```
- #TxmlNode objects have various methods for setting and retrieving values. While reading our
- XML file, the following methods might prove useful:
- * **Finding elements**: `getAttribute()`, `getAttributeList()`, `hasAttribute()`, `getChildren()`,
- `nextSibling()`, `previousSibling()`, and `findElement()` (if you know what you're looking for).
- * **Retrieving content**: `getContent()`.
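- As a small illustration of the methods listed above, the following sketch uses `hasAttribute()` to
- distinguish a missing attribute from an empty one before reading it with `getAttribute()`. It assumes
- that `books.xml` is in the working directory and that parsing succeeds.
- ```blitzmax
- SuperStrict
- Import text.xml
-
- ' Assumption: books.xml exists and parses successfully
- Local doc:TxmlDoc = TxmlDoc.parseFile("books.xml")
- Local root:TxmlNode = doc.getRootElement()
-
- For Local bookNode:TxmlNode = EachIn root.getChildren()
-     ' only the last book in books.xml defines a "cover" attribute
-     If bookNode.hasAttribute("cover")
-         Print "cover: " + bookNode.getAttribute("cover")
-     Else
-         Print "no cover attribute"
-     EndIf
- Next
-
- doc.Free()
- ```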
- Now that we have the root node - which is the `<bookstore>` element - we can access its
- children, which in this case are the `<book>` elements. To iterate over all the books,
- we can use the root node's `getChildren()` method, which returns a #TList. We can then
- iterate over this list using a `For EachIn` loop.
- ```blitzmax
- For Local childNode:TxmlNode = EachIn rootNode.getChildren()
-     ' read data of interest from childNode
- Next
- ```
- Let's take a closer look at a typical book entry in our XML file:
- ```xml
- <book category="web">
- <title lang="en">XQuery Kick Start</title>
- <author>James McGovern</author>
- <author>Per Bothner</author>
- <author>Kurt Cagle</author>
- <author>James Linn</author>
- <author>Vaidyanathan Nagarajan</author>
- <year>2003</year>
- <price>49.99</price>
- </book>
- ```
- To extract information from this book entry, we will use the following approaches:
- * **Category attribute** : Use `getAttribute()` to retrieve the category value.
- * **Title element and `lang` attribute** : Use `getAttribute()` to obtain the language key of the
- title, and `getContent()` to get the title text.
- * **Author, year, and price elements** : These elements only contain text, so we can use `getContent()`
- to read their values.
- For each book, we will create a new `TBook` instance and add it to a #TList that stores all our books.
- In your own projects, you might want to add conditional checks before adding a book to the list.
- For example, you could skip adding a book if certain required data is missing from the XML file
- (e.g., both "title" and "author" are absent). For the purpose of this example, we will skip checking
- for duplicates or incomplete entries.
- ```blitzmax
- Local allBooks:TList = New TList
- For Local childNode:TxmlNode = EachIn rootNode.getChildren()
-     ' create a new book instance
-     Local book:TBook = New TBook
- 
-     book.category = childNode.getAttribute( "category" )
-     ' this stays empty if the attribute is not set
-     book.cover = childNode.getAttribute( "cover" )
- 
-     ' loop over all child nodes and handle them according to
-     ' their name
-     For Local subNode:TxmlNode = EachIn childNode.getChildren()
-         Select subNode.getName()
-             Case "title"
-                 Local key:String = subNode.getAttribute( "lang" ).ToLower()
-                 Local value:String = subNode.getContent()
-                 book.title.Insert(key, value)
-             Case "author"
-                 ' add the "single entry array" to the authors array
-                 book.authors :+ [ subNode.getContent() ]
-             Case "year"
-                 ' cast the string content value to Int
-                 book.year = Int( subNode.getContent() )
-             Case "price"
-                 ' cast the string content value to Float
-                 book.price = Float( subNode.getContent() )
-         End Select
-     Next
- 
-     ' add the book to our book storage list
-     allBooks.AddLast( book )
- Next
- ' close the TxmlDoc instance
- xmlDoc.Free()
- ```
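- The conditional check mentioned before the loading code could look roughly like the sketch below.
- `IsComplete` is a hypothetical helper, and the rule it applies (at least one title and one author)
- is only one possible definition of a complete entry. The `TBook` type is reduced to the two fields
- the check needs.
- ```blitzmax
- SuperStrict
- Import brl.map
-
- ' Reduced version of TBook, containing only the fields needed for the check
- Type TBook
-     Field title:TStringMap = New TStringMap
-     Field authors:String[]
- End Type
-
- ' Hypothetical helper: a book counts as complete if it has
- ' at least one title and at least one author
- Function IsComplete:Int(book:TBook)
-     If book.title.IsEmpty() Then Return False
-     If book.authors.Length = 0 Then Return False
-     Return True
- End Function
-
- Local book:TBook = New TBook
- Print "empty book complete: " + IsComplete(book)
-
- book.title.Insert("en", "Learning XML")
- book.authors :+ ["Erik T. Ray"]
- Print "filled book complete: " + IsComplete(book)
- ```
- In the loading loop, the call `allBooks.AddLast( book )` would then be wrapped in `If IsComplete(book)`.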
- Now that we have successfully extracted all the book information from the XML file and stored
- it in our application, it is time to take the next step. Let's explore how to save our book data back
- to a new XML file.
- ### Saving Changes to XML Files
- Reading data is only half of the work: changes made in the application are lost unless they are written back to a file.
- In this section, we will explore the process of saving objects to XML files, otherwise known as
- "serialization." This involves converting the desired data into text format, making it compatible
- with XML file structures.
- In our book example, we only have to handle numbers and text data, which simplifies the serialization process.
- To save our data, we will need a #TxmlDoc object as the foundation and a root node to add our book nodes.
- When creating a new #TxmlDoc, it is a good practice to use the `newDoc()` helper function, which allows you
- to specify the XML version to use. This ensures that your output file is consistent with the original
- `books.xml` file format.
- For demonstration purposes, we will also print the content of the document using the #TxmlDoc's `saveFile()`
- method (passing "-" skips saving and redirects content to the output):
- ```blitzmax
- Local exportXmlDoc:TxmlDoc = TxmlDoc.newDoc("1.0")
- Local exportRootNode:TxmlNode = TxmlNode.newNode("bookstore")
- ' Set the newly created node as the root for the empty document
- exportXmlDoc.setRootElement(exportRootNode)
- ' Print the content in a formatted manner
- exportXmlDoc.saveFile("-", True, True)
- ```
- This code will output the following:
- ```xml
- <?xml version="1.0" encoding="utf-8"?>
- <bookstore />
- ```
- As you can see, we've successfully created an empty XML file with the correct root element.
- In order to save each book, we need to create a new #TxmlNode for each book, along with additional
- nodes or attributes to represent the properties. The BlitzMax XML module simplifies this process
- by allowing you to pass values directly when creating a new child node using `addChild(name, value)`.
- ```blitzmax
- ' Create and attach a new book node under the root node
- Local bookNode:TxmlNode = exportRootNode.addChild("book")
- bookNode.setAttribute("category", book.category)
- ' Iterate over title keys and store the corresponding values
- ' Using method chaining, we can save storing the node first
- For Local lang:String = EachIn book.title.Keys()
-     Local title:String = String(book.title.ValueForKey(lang))
-     bookNode.addChild("title", title).addAttribute("lang", lang)
- Next
- For Local author:String = EachIn book.authors
-     bookNode.addChild("author", author)
- Next
- bookNode.addChild("year", book.year)
- bookNode.addChild("price", book.price)
- ```
- Now, we can combine all of the code blocks we've discussed so far to accomplish the following tasks:
- 1. Load the `books.xml` file.
- 2. Save all the books as a new file named `books.new.xml`.
- By combining these code blocks, you can create a complete solution to read, manipulate, and save
- XML data using BlitzMax.
- ```blitzmax
- SuperStrict
- ' xml.mod is not automatically imported like brl.mod and pub.mod, so you need to manually import it
- Import text.xml
- Type TBook
-     Field category:String
-     Field cover:String
-     Field title:TStringMap = New TStringMap
-     Field authors:String[]
-     Field year:Int
-     Field price:Float
- End Type
- Local allBooks:TList = New TList
- ' === LOADING ===
- ' Load and parse the books.xml file
- Local xmlDoc:TxmlDoc = TxmlDoc.parseFile("books.xml")
- ' Retrieve the root element (bookstore)
- Local rootNode:TxmlNode = xmlDoc.getRootElement()
- ' Loop through all child nodes (books)
- For Local childNode:TxmlNode = EachIn rootNode.getChildren()
-     ' Create a new TBook instance
-     Local book:TBook = New TBook
-     ' Get the book attributes
-     book.category = childNode.getAttribute("category")
-     book.cover = childNode.getAttribute("cover") ' This remains empty if the attribute is not set
-     ' Loop through all child nodes of the current book and handle them according to their name
-     For Local subNode:TxmlNode = EachIn childNode.getChildren()
-         Select subNode.getName()
-             Case "title"
-                 ' Store the title in the TStringMap using the language as the key
-                 Local key:String = subNode.getAttribute("lang").ToLower()
-                 Local value:String = subNode.getContent()
-                 book.title.Insert(key, value)
-             Case "author"
-                 ' Add the author to the authors array
-                 book.authors :+ [subNode.getContent()]
-             Case "year"
-                 ' Convert the year from a string to an integer
-                 book.year = Int(subNode.getContent())
-             Case "price"
-                 ' Convert the price from a string to a float
-                 book.price = Float(subNode.getContent())
-         End Select
-     Next
-     ' Add the TBook instance to the allBooks list
-     allBooks.AddLast(book)
- Next
- ' Release resources associated with xmlDoc
- xmlDoc.Free()
- ' === SAVING ===
- ' Create a new TxmlDoc with version 1.0
- Local exportXmlDoc:TxmlDoc = TxmlDoc.newDoc("1.0")
- ' Create a new root node (bookstore) for the export document
- Local exportRootNode:TxmlNode = TxmlNode.newNode("bookstore")
- ' Set the new root node as the document's root element
- exportXmlDoc.setRootElement(exportRootNode)
- ' Loop through all books in the allBooks list
- For Local book:TBook = EachIn allBooks
-     ' Create a new book node under the root node
-     Local bookNode:TxmlNode = exportRootNode.addChild("book")
-     ' Set the book's attributes
-     bookNode.setAttribute("category", book.category)
-     If book.cover
-         bookNode.setAttribute("cover", book.cover)
-     EndIf
-     ' Add title nodes with their corresponding language attributes
-     For Local lang:String = EachIn book.title.Keys()
-         Local title:String = String(book.title.ValueForKey(lang))
-         bookNode.addChild("title", title).addAttribute("lang", lang)
-     Next
-     ' Add author nodes
-     For Local author:String = EachIn book.authors
-         bookNode.addChild("author", author)
-     Next
-     ' Add year and price nodes
-     bookNode.addChild("year", book.year)
-     bookNode.addChild("price", book.price)
- Next
- ' Save the XML file as books.new.xml with formatting
- exportXmlDoc.saveFile("books.new.xml", True, True)
- ' Release resources associated with exportXmlDoc
- exportXmlDoc.Free()
- ' Now, the new XML file has been created with the processed book data from the original XML file.
- ' You can check the output in the "books.new.xml" file.
- ```
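- This is the content of the generated `books.new.xml` file. Note that the prices no longer match the
- input exactly (for example, 29.99 became 29.9899998), which is the `Float` rounding issue mentioned earlier: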
- ```xml
- <?xml version="1.0" encoding="utf-8"?>
- <bookstore>
- <book category="cooking">
- <title lang="en">Everyday Italian</title>
- <author>Giada De Laurentiis</author>
- <year>2005</year>
- <price>30.0000000</price>
- </book>
- <book category="children">
- <title lang="en">Harry Potter</title>
- <author>J K. Rowling</author>
- <year>2005</year>
- <price>29.9899998</price>
- </book>
- <book category="web">
- <title lang="en">XQuery Kick Start</title>
- <author>James McGovern</author>
- <author>Per Bothner</author>
- <author>Kurt Cagle</author>
- <author>James Linn</author>
- <author>Vaidyanathan Nagarajan</author>
- <year>2003</year>
- <price>49.9900017</price>
- </book>
- <book category="web" cover="paperback">
- <title lang="en">Learning XML</title>
- <author>Erik T. Ray</author>
- <year>2003</year>
- <price>39.9500008</price>
- </book>
- </bookstore>
- ```
- ### Summary
- Throughout this short guide, we have demonstrated how to create a bookstore application using XML
- files to store and manipulate book data. Here's a recap of the key steps we covered:
- 1. **Understanding XML Structure** : We examined the hierarchical structure of XML files, which consists
- of elements, attributes, and text nodes. We also touched upon the similarities between XML and JSON.
- 2. **Creating a Bookstore Example** : We designed an XML file representing a bookstore that contains
- information about various books, including their categories, titles, authors, publication years, and prices.
- 3. **XML Nodes** : We explored the concept of nodes in XML and their various types, including element
- nodes, attribute nodes, and text nodes.
- 4. **Reading XML Files** : We demonstrated how to read an existing XML file using the #TxmlDoc class
- and parse the book data into our custom `TBook` type.
- 5. **Storing Book Data** : We created a custom `TBook` type to store book data within our application and
- added the book instances to a #TList.
- 6. **Accessing XML Elements** : We retrieved specific elements and their attributes from the XML file
- using various #TxmlNode methods.
- 7. **Saving XML Files** : We discussed how to serialize our `TBook` objects and save them into a new
- XML file, creating a new #TxmlNode for each book and setting its attributes and child nodes accordingly.
- By following these steps, we successfully built a bookstore application that reads, processes, and saves book
- data using XML files.