\documentclass[11pt]{article}
%\decimalpoint
\tolerance 10000
\textheight 24cm
\textwidth 16cm
\oddsidemargin 1mm
\topmargin -20mm
\parindent 0mm
\begin{document}
\title{xmlf90: A parser for XML in Fortran90}
\author{Alberto Garc\'{\i}a \\
Departamento de F\'{\i}sica de la Materia Condensada \\
Facultad de Ciencia y Tecnolog\'{\i}a \\
Universidad del Pa\'{\i}s Vasco\\
Apartado 644, 48080 Bilbao, Spain\\
http://lcdx00.wm.lc.ehu.es/ag/xml/}
\date{30 January 2004 --- xmlf90 Version 1.1}

\maketitle
\section{Introduction}

{\bf NOTE: This version of the User Guide and Tutorial does not
cover either the WXML printing library or the new DOM API
conceived by Jon Wakelin. See the HTML reference material and the
relevant example subdirectories.}
\bigskip

This tutorial documents the user interface of \texttt{xmlf90}, a
native Fortran90 XML parser. The parser was designed as a tool for
the extraction and analysis of data in scientific computing, so the
priorities were efficiency and the ability to deal with very large XML
files while maintaining a small memory footprint. There are two
programming interfaces. The first is based on the very successful SAX
(Simple API for XML) model: the parser calls routines provided by the
user to handle certain events, such as the beginning of an element,
the end of an element, or the reading of character data. The other is
based on the XPATH standard. Only a very limited part of the full
XPATH specification is offered, but it is already quite useful.

Some familiarity with XML is assumed. Apart from the examples discussed
in this tutorial (chosen for their simplicity), the interested reader
can refer to the \texttt{Examples/} directory in the \texttt{xmlf90}
distribution.

\section{The SAX interface}
\subsection{A simple example}

To illustrate the working of the SAX interface, consider the following
XML snippet:

\begin{verbatim}
<item id="003">
   <description>Washing machine</description>
   <price currency="euro">1500.00</price>
</item>
\end{verbatim}
%
When the parser processes this snippet, it carries out the following
sequence of calls:

\begin{enumerate}
\item call to \texttt{begin\_element\_handler} with name="item" and
  attributes=(Dictionary with the pair (id,003))
\item call to \texttt{begin\_element\_handler} with name="description" and an
  empty attribute dictionary
\item call to \texttt{pcdata\_chunk\_handler} with pcdata="Washing machine"
\item call to \texttt{end\_element\_handler} with name="description"
\item call to \texttt{begin\_element\_handler} with name="price" and
  attributes=(Dictionary with the pair (currency,euro))
\item call to \texttt{pcdata\_chunk\_handler} with pcdata="1500.00"
\item call to \texttt{end\_element\_handler} with name="price"
\item call to \texttt{end\_element\_handler} with name="item"
\end{enumerate}

The handler routines are written by the user and passed to the parser
as procedure arguments. A simple program that parses the above XML
fragment (assuming it resides in file \textsl{inventory.xml}) and
prints out the names of the elements and any \textsl{id} attributes as
they are found is:

\begin{verbatim}
program simple
use flib_sax

type(xml_t)        :: fxml      ! XML file object (opaque)
integer            :: iostat    ! Return code (0 if OK)

call open_xmlfile("inventory.xml",fxml,iostat)
if (iostat /= 0) stop "cannot open xml file"

call xml_parse(fxml, begin_element_handler=begin_element_print)

contains !---------------- handler subroutine follows

subroutine begin_element_print(name,attributes)
   character(len=*), intent(in)   :: name
   type(dictionary_t), intent(in) :: attributes

   character(len=3)  :: id
   integer           :: status

   print *, "Start of element: ", name
   if (has_key(attributes,"id")) then
      call get_value(attributes,"id",id,status)
      print *, " Id attribute: ", id
   endif
end subroutine begin_element_print

end program simple
\end{verbatim}
%
To access the XML parsing functionality, the user only needs to \texttt{use}
the module \texttt{flib\_sax}, open the XML file, and call the main routine
\texttt{xml\_parse}, providing it with the appropriate event handlers.

The subroutine interfaces are:

\begin{verbatim}
subroutine open_xmlfile(fname,fxml,iostat)
character(len=*), intent(in)      :: fname     ! File name
type(xml_t), intent(out)          :: fxml      ! XML file object (opaque)
integer, intent(out)              :: iostat    ! Return code (0 if OK)


subroutine xml_parse(fxml,                   &
                     begin_element_handler,  &
                     end_element_handler,    &
                     pcdata_chunk_handler ....
                     .... MORE OPTIONAL HANDLERS )

\end{verbatim}

The handlers are OPTIONAL arguments (in the above example we specify
only \texttt{begin\_element\_handler}). If no handlers are given,
nothing useful will happen, except that any errors are detected and
reported. The interfaces for the most useful handlers are:

\begin{verbatim}
subroutine begin_element_handler(name,attributes)
   character(len=*), intent(in)   :: name
   type(dictionary_t), intent(in) :: attributes
end subroutine begin_element_handler

subroutine end_element_handler(name)
   character(len=*), intent(in)   :: name
end subroutine end_element_handler

subroutine pcdata_chunk_handler(chunk)
   character(len=*), intent(in)   :: chunk
end subroutine pcdata_chunk_handler
\end{verbatim}
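The way optional handlers are dispatched can be tried out without the
library. The following stand-alone sketch is an illustration only: the
module \texttt{toy\_events} and the driver \texttt{toy\_parse} are
hypothetical names, not part of \texttt{xmlf90}, and the ``parsing'' is
reduced to walking over a list of element names. It shows a routine
that calls a handler only if the caller has supplied it:
%
\begin{verbatim}
module toy_events
  implicit none
  private
  public :: toy_parse, print_begin, print_end

contains

  ! Hypothetical stand-in for a SAX driver: it "opens" every name in
  ! turn and then "closes" them in reverse order, as for nested elements.
  subroutine toy_parse(names, begin_element_handler, end_element_handler)
    character(len=*), dimension(:), intent(in) :: names
    interface
      subroutine begin_element_handler(name)
        character(len=*), intent(in) :: name
      end subroutine begin_element_handler
      subroutine end_element_handler(name)
        character(len=*), intent(in) :: name
      end subroutine end_element_handler
    end interface
    optional :: begin_element_handler, end_element_handler

    integer :: i

    do i = 1, size(names)
      if (present(begin_element_handler)) &
          call begin_element_handler(trim(names(i)))
    end do
    do i = size(names), 1, -1
      if (present(end_element_handler)) &
          call end_element_handler(trim(names(i)))
    end do
  end subroutine toy_parse

  subroutine print_begin(name)
    character(len=*), intent(in) :: name
    print *, "Start of element: ", name
  end subroutine print_begin

  subroutine print_end(name)
    character(len=*), intent(in) :: name
    print *, "End of element: ", name
  end subroutine print_end

end module toy_events

program toy_demo
  use toy_events
  implicit none
  ! Only the begin handler is given; end-of-element events are ignored.
  call toy_parse((/ "item ", "price" /), begin_element_handler=print_begin)
end program toy_demo
\end{verbatim}
%
Since only the begin handler is passed, the program prints a
``Start of element'' line for each name and silently skips the
end-of-element events.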

The attribute information in an element tag is represented as a
dictionary of name/value pairs, held in a \texttt{dictionary\_t}
abstract type. The information in it can be accessed through a set of
dictionary methods such as \texttt{has\_key} and \texttt{get\_value}
(the full interfaces can be found in Sect.~\ref{sec:reference}).

\subsection{Monitoring the sequence of events}
The above example is too simple to be of much use if what we want is
to extract information in a coherent manner. For example, assume we
have a more complete inventory of appliances such as
%
\begin{verbatim}
<inventory>
<item id="003">
   <description>Washing machine</description>
   <price currency="euro">1500.00</price>
</item>
<item id="007">
   <description>Microwave oven</description>
   <price currency="euro">300.00</price>
</item>
<item id="011">
   <description>Dishwasher</description>
   <price currency="swedish crown">10000.00</price>
</item>
</inventory>
\end{verbatim}
%
and we want to print the items with their prices in the form:
%
\begin{verbatim}
003 Washing machine : 1500.00 euro
007 Microwave oven : 300.00 euro
011 Dishwasher : 10000.00 swedish crown
\end{verbatim}

We begin by writing the following module:

\begin{verbatim}
module m_handlers
use flib_sax
private
public :: begin_element, end_element, pcdata_chunk
!
! The state flags start as .false. (we are outside every element)
logical, private  :: in_item = .false., in_description = .false., &
                     in_price = .false.
character(len=40), private :: what, price, currency, id
!
contains !-----------------------------------------
!
subroutine begin_element(name,attributes)
   character(len=*), intent(in)   :: name
   type(dictionary_t), intent(in) :: attributes

   integer  :: status

   select case(name)
     case("item")
       in_item = .true.
       call get_value(attributes,"id",id,status)

     case("description")
       in_description = .true.

     case("price")
       in_price = .true.
       call get_value(attributes,"currency",currency,status)

   end select

end subroutine begin_element
!---------------------------------------------------------------
subroutine pcdata_chunk(chunk)
   character(len=*), intent(in) :: chunk

   if (in_description) what = chunk
   if (in_price) price = chunk

end subroutine pcdata_chunk
!---------------------------------------------------------------
subroutine end_element(name)
   character(len=*), intent(in)   :: name

   select case(name)
     case("item")
       in_item = .false.
       write(unit=*,fmt="(5(a,1x))") trim(id), trim(what), ":", &
                                     trim(price), trim(currency)

     case("description")
       in_description = .false.

     case("price")
       in_price = .false.

   end select

end subroutine end_element
!---------------------------------------------------------------
end module m_handlers
\end{verbatim}
%
PCDATA chunks are passed back as simple Fortran character variables,
and we assign them to \texttt{what} or \texttt{price} depending on the
context, which we monitor through the logical variables
\texttt{in\_description} and \texttt{in\_price}, updated as we enter
and leave different elements. (The variable \texttt{in\_item} is not
strictly necessary.)
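Because the chunks arrive as character data, a value such as the price
has to be converted before it can be used in arithmetic. A minimal
stand-alone sketch of such a conversion, using a standard internal
read, follows. The module \texttt{m\_convert} and the routine
\texttt{to\_real} are hypothetical names introduced here for
illustration (they are not part of \texttt{xmlf90}), and the rate is
the approximate figure quoted in the exercises below:
%
\begin{verbatim}
module m_convert
  implicit none
  private
  public :: to_real, crowns_per_euro

  ! Approximate rate quoted in the exercises (illustration only)
  real, parameter :: crowns_per_euro = 9.2

contains

  ! Convert character data such as "1500.00" to a default real.
  ! ok is .false. if the string does not hold a valid number.
  subroutine to_real(str, value, ok)
    character(len=*), intent(in) :: str
    real, intent(out)            :: value
    logical, intent(out)         :: ok

    integer :: ios

    value = 0.0
    read(unit=str, fmt=*, iostat=ios) value   ! internal list-directed read
    ok = (ios == 0)
    if (.not. ok) value = 0.0
  end subroutine to_real

end module m_convert

program convert_demo
  use m_convert
  implicit none
  real    :: amount
  logical :: ok

  call to_real("10000.00", amount, ok)
  if (ok) print "(a,f10.2)", "Price in euros: ", amount / crowns_per_euro
end program convert_demo
\end{verbatim}
%
Returning a logical flag instead of stopping lets the calling handler
decide what to do with a chunk that is not a number.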

The program to parse the file just needs to use the functionality in
the module \texttt{m\_handlers}:
%
\begin{verbatim}
program inventory
use flib_sax
use m_handlers

type(xml_t)        :: fxml      ! XML file object (opaque)
integer            :: iostat

call open_xmlfile("inventory.xml",fxml,iostat)
if (iostat /= 0) stop "cannot open xml file"

call xml_parse(fxml, begin_element_handler=begin_element, &
                     end_element_handler=end_element,     &
                     pcdata_chunk_handler=pcdata_chunk )

end program inventory

\end{verbatim}
%
\subsubsection{Exercises}
\begin{enumerate}
\item Code the above Fortran files and the XML file on your
  computer. Compile and run the program and check that the output is
  correct. (Compilation instructions are provided in
  Sect.~\ref{sec:compiling}.)
\item Edit the XML file and remove one of the \texttt{</item>}
  lines. What happens? This is an example of a \textsl{malformed} XML
  file. The parser can detect it and complain about it.
\item Edit the XML file and remove the \texttt{currency} attribute
  from one of the elements. What happens? In this case, the parser
  cannot detect the missing attribute (it is not a \textsl{validating
  parser}). However, the user can detect early that something is wrong
  by checking the value of the \texttt{status} variable after the call
  to \texttt{get\_value}.
\item Modify the program to print the prices in euros (1 euro buys
  approximately 9.2 Swedish crowns).
\end{enumerate}

\subsection{Other tags and their handlers}

The parser can also process comments, XML declarations (formally known
as ``processing instructions''), and SGML declarations, although the
latter two are not acted upon in any way (in particular, no attempt is
made to validate the XML document).

\begin{itemize}

\item
An \textbf{empty element} tag of the form
%
\begin{verbatim}
   <name att="value"... />
\end{verbatim}
%
can be handled as successive calls to \texttt{begin\_element\_handler}
and \texttt{end\_element\_handler}. However, if the optional handler
\texttt{empty\_element\_handler} is present, it is called instead. Its
interface is exactly the same as that of
\texttt{begin\_element\_handler}:
%
\begin{verbatim}
subroutine empty_element_handler(name,attributes)
   character(len=*), intent(in)   :: name
   type(dictionary_t), intent(in) :: attributes
end subroutine empty_element_handler
\end{verbatim}
%
\item
\textbf{Comments} are sections of the XML file contained between the markup
\texttt{<!{-}-} and \texttt{{-}->},
and are handled by the optional argument \texttt{comment\_handler}:
%
\begin{verbatim}
subroutine comment_handler(comment)
   character(len=*), intent(in) :: comment
end subroutine comment_handler
\end{verbatim}
%
\item
\textbf{XML declarations} can be processed
in the same way as elements, with the ``target'' being the element name, etc.
For example, in
%
\begin{verbatim}
   <?xml version="1.0"?>
\end{verbatim}
%
\textsl{xml} would be the ``element name'', \textsl{version} an
attribute name, and \textsl{1.0} its value. The optional handler
interface is:
%
\begin{verbatim}
subroutine xml_declaration_handler(name,attributes)
   character(len=*), intent(in)   :: name
   type(dictionary_t), intent(in) :: attributes
end subroutine xml_declaration_handler
\end{verbatim}
%
\item
\textbf{SGML declarations} such as entity declarations or doctype
specifications are treated basically as comments. Interface:
%
\begin{verbatim}
subroutine sgml_declaration_handler(sgml_declaration)
   character(len=*), intent(in) :: sgml_declaration
end subroutine sgml_declaration_handler
\end{verbatim}
%
\end{itemize}
In the current version of the parser, overly long comments and SGML
declarations might be truncated.


\section{The XPATH interface}

\textsl{NOTE: The current implementation gets its inspiration from
XPATH, but it is by no means a complete implementation of the
standard, or even of a subset of it. Since it is built on top of the SAX
interface, it uses a ``stream'' paradigm which is completely alien to
the XPATH specification. It is nevertheless still quite useful. The
author is open to suggestions to refine the interface.}

\bigskip

This API is based on the concept of an XML path. For example:
%
\begin{verbatim}
/inventory/item
\end{verbatim}
%
represents an 'item' element which is a child of the root element
'inventory'. Paths can contain special wildcard markers such as
\texttt{//} and \texttt{*}. The following are examples of valid paths:
%
\begin{verbatim}
//a         : Any occurrence of element 'a', at any depth.
/a/*/b      : Any 'b' which is a grand-child of 'a'
./a         : A relative path (with respect to the current path)
a           : (same as above)
/a/b/./c    : Same as /a/b/c (the dot (.) is a dummy)
//*         : Any element.
//a/*//b    : Any 'b' under any child of 'a'.

\end{verbatim}
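To make the meaning of the \texttt{*} wildcard concrete, here is a
minimal stand-alone sketch of segment-by-segment matching. It is not
the matching code used by \texttt{xmlf90}: the module
\texttt{toy\_paths} and the function \texttt{path\_matches} are
hypothetical, only absolute paths are handled, and the \texttt{//} and
\texttt{.} markers are deliberately left out.
%
\begin{verbatim}
module toy_paths
  implicit none
  private
  public :: path_matches

contains

  ! .true. if the absolute path (e.g. "/a/x/b") matches the pattern
  ! (e.g. "/a/*/b"), where "*" stands for exactly one element name.
  function path_matches(pattern, path) result(match)
    character(len=*), intent(in) :: pattern, path
    logical :: match

    integer :: ip, ix, ep, ex, j, lp, lx

    lp = len_trim(pattern)
    lx = len_trim(path)
    ip = 1
    ix = 1
    match = .false.
    do
      if (ip > lp .and. ix > lx) then   ! both exhausted together
        match = .true.
        return
      end if
      if (ip > lp .or. ix > lx) return  ! different depths
      if (pattern(ip:ip) /= "/" .or. path(ix:ix) /= "/") return

      ! Find the last character of the current segment in each string
      j = index(pattern(ip+1:lp), "/")
      if (j == 0) then
        ep = lp
      else
        ep = ip + j - 1
      end if
      j = index(path(ix+1:lx), "/")
      if (j == 0) then
        ex = lx
      else
        ex = ix + j - 1
      end if

      if (pattern(ip+1:ep) /= "*") then
        if (pattern(ip+1:ep) /= path(ix+1:ex)) return
      end if
      ip = ep + 1
      ix = ex + 1
    end do
  end function path_matches

end module toy_paths

program paths_demo
  use toy_paths
  implicit none
  print *, path_matches("/a/*/b", "/a/x/b")   ! b is a grand-child of a
  print *, path_matches("/a/*/b", "/a/b")     ! b is only a child of a
end program paths_demo
\end{verbatim}
%
The first call reports a match and the second does not, since
\texttt{*} has to consume exactly one level of the hierarchy.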
%
\subsection{Simple example}
Using the XPATH interface it is possible to search for any element
directly, and to recover its attributes or character content. For
example, to print the names of all the appliances in the inventory:
%
\begin{verbatim}
program simple
use flib_xpath

type(xml_t) :: fxml

integer  :: status
character(len=100)  :: what

call open_xmlfile("inventory.xml",fxml,status)
!
do
      call get_node(fxml,path="//description",pcdata=what,status=status)
      if (status < 0)   exit
      print *, "Appliance: ", trim(what)
enddo
end program simple
\end{verbatim}
%
Repeated calls to \texttt{get\_node} return the character content of
the 'description' elements (at any depth). We exit the loop when the
\texttt{status} variable is negative on return from the call. This
indicates that there are no more elements matching the
\texttt{//description} path pattern.\footnote{Returning a negative
value for an end-of-file or end-of-record condition follows the
standard practice. Positive return values signal malfunctions.}
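The same sign convention is used by the \texttt{iostat} specifier of
standard Fortran input statements, so the loop above has the same
shape as an ordinary read-until-end-of-file loop. A minimal sketch,
independent of \texttt{xmlf90}, follows; the module
\texttt{toy\_records} and the routine \texttt{count\_records} are
hypothetical names, and a scratch file stands in for real data:
%
\begin{verbatim}
module toy_records
  implicit none
  private
  public :: count_records

contains

  ! Read records from an already opened unit until end of file.
  ! n returns the number of records read.
  subroutine count_records(lun, n)
    integer, intent(in)  :: lun
    integer, intent(out) :: n

    character(len=80) :: line
    integer :: ios

    n = 0
    do
      read(unit=lun, fmt="(a)", iostat=ios) line
      if (ios < 0) exit                ! end of file: normal termination
      if (ios > 0) stop "read error"   ! positive value: malfunction
      n = n + 1
    end do
  end subroutine count_records

end module toy_records

program eof_demo
  use toy_records
  implicit none
  integer :: n

  open(unit=10, status="scratch", action="readwrite")
  write(unit=10, fmt="(a)") "Washing machine"
  write(unit=10, fmt="(a)") "Microwave oven"
  write(unit=10, fmt="(a)") "Dishwasher"
  rewind(unit=10)

  call count_records(10, n)
  print *, "Records read: ", n
  close(unit=10)
end program eof_demo
\end{verbatim}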

Apart from path patterns, we can narrow our search by specifying
conditions on the attribute list of the element. For example, to print
only the prices which are given in euros we can use the
\texttt{att\_name} and \texttt{att\_value} optional arguments:
%
\begin{verbatim}
program euros
use flib_xpath

type(xml_t) :: fxml

integer  :: status
character(len=100)  :: price

call open_xmlfile("inventory.xml",fxml,status)
!
do
      call get_node(fxml,path="//price", &
                    att_name="currency",att_value="euro", &
                    pcdata=price,status=status)
      if (status < 0)   exit
      print *, "Price (euro): ", trim(price)
enddo
end program euros
\end{verbatim}
%
We can zero in on any element in this fashion, but we apparently give
up the all-important context. What happens if we want to print
\textsl{both} the appliance description and its price?
%
\begin{verbatim}
program twoelements
use flib_xpath

type(xml_t) :: fxml
type(dictionary_t) :: attributes   ! attributes of the 'price' element

integer  :: status
character(len=100)  :: what, price, currency

call open_xmlfile("inventory.xml",fxml,status)
!
do
      call get_node(fxml,path="//description", &
                    pcdata=what,status=status)
      if (status < 0)   exit                   ! No more items
      !
      ! Price comes right after description...
      !
      call get_node(fxml,path="//price", &
                    attributes=attributes,pcdata=price,status=status)
      if (status /= 0) stop "missing price element!"

      call get_value(attributes,"currency",currency,status)
      if (status /= 0) stop "missing currency attribute!"

      write(unit=*,fmt="(6a)") "Appliance: ", trim(what), &
                               ". Price: ", trim(price), " ", trim(currency)
enddo
end program twoelements
\end{verbatim}
%
\subsubsection{Exercises}
\begin{enumerate}
\item Modify the above programs to print only the appliances priced in
  euros.
\item Modify the order of the 'description' and 'price' elements in an
  item. What happens to the 'twoelements' program output?
\item The full XPATH specification allows the query for a particular
  element among a set of elements with the same path, based on the
  ordering of the element. For example, \texttt{/inventory/item[2]} refers
  to the second 'item' element in the XML file. Write a routine that
  implements this feature and returns the element's attribute
  dictionary.
\item Queries for paths can be issued in any order, and so some
  mechanism for ``rewinding'' the XML file is necessary. It is provided by
  the appropriately named \texttt{rewind\_xmlfile} subroutine (see the full
  interface in the Reference section). Use it to implement a silly
  program that prints items from the inventory at random. (Extra points
  for including logic to minimize the number of rewinds.)
\end{enumerate}
%

\subsection{Contexts and restricted searches}

The logic of the \texttt{twoelements} program in the previous section
follows from the assumption that the 'price' element follows the
'description' element in a typical 'item'. If the DTD says so, and
the XML file is valid (in the technical sense of conforming to the
DTD), the assumption should be correct. However, since the parser is
non-validating, it might be unreasonable to expect the proper
ordering in all cases. What we should expect (as a minimum) is that
both the price and description elements are children of the 'item'
element. In the following version we make use of the \textbf{context}
concept to achieve a more robust solution.
%
\begin{verbatim}
program item_context
use flib_xpath

type(xml_t) :: fxml, context
type(dictionary_t) :: attributes   ! attributes of the 'price' element

integer  :: status
character(len=100)  :: what, price, currency

call open_xmlfile("inventory.xml",fxml,status)
!
do
      call mark_node(fxml,path="//item",status=status)
      if (status < 0)   exit       ! No more items
      context = fxml               ! Save item context
      !
      ! Search relative to context
      !
      call get_node(fxml,path="price", &
                    attributes=attributes,pcdata=price,status=status)
      call get_value(attributes,"currency",currency,status)
      if (status /= 0) stop "missing currency attribute!"
      !
      ! Rewind to beginning of context
      !
      fxml = context
      call sync_xmlfile(fxml)
      !
      ! Search relative to context
      !
      call get_node(fxml,path="description",pcdata=what,status=status)
      write(unit=*,fmt="(6a)") "Appliance: ", trim(what), &
                               ". Price: ", trim(price), " ", trim(currency)
enddo
end program item_context
\end{verbatim}
%
The call to \texttt{mark\_node} positions the parser's file handle
\texttt{fxml} right after the end of the starting tag of the next
'item' element. We save that position as a ``context marker'' to which
we can return later on. The calls to \texttt{get\_node} use path
patterns that do not start with a \texttt{/}: they are
\textbf{searches relative to the current context}. After getting the
information about the 'price' element, we restore the parser's file
handle to the appropriate position at the beginning of the 'item'
context, and search for the 'description' element. In the following
iteration of the loop, the parser will find the next 'item' element,
and the process will be repeated until there are no more 'item'
elements.


Contexts come in handy to encapsulate parsing tasks in re-usable
subroutines. Suppose you are going to find the basic 'item' element
content in many different XML files. The following
subroutine extracts the description and price information:
%
\begin{verbatim}
subroutine get_item_info(context,what,price,currency)
type(xml_t), intent(in)       :: context
character(len=*), intent(out) :: what, price, currency

!
! Local variables
!
type(xml_t)        :: ff
integer            :: status
type(dictionary_t) :: attributes

      !
      ! context is read-only, so make a copy and sync just in case
      !
      ff = context
      call sync_xmlfile(ff)
      !
      call get_node(ff,path="price", &
                    attributes=attributes,pcdata=price,status=status)
      call get_value(attributes,"currency",currency,status)
      if (status /= 0) stop "missing currency attribute!"
      !
      ! Rewind to beginning of context
      !
      ff = context
      call sync_xmlfile(ff)
      !
      call get_node(ff,path="description",pcdata=what,status=status)

end subroutine get_item_info
\end{verbatim}
%
Using this routine, the parsing is much more compact:
%
\begin{verbatim}
program item_context
use flib_xpath

type(xml_t) :: fxml

integer  :: status
character(len=100)  :: what, price, currency

call open_xmlfile("inventory.xml",fxml,status)
!
do
      call mark_node(fxml,path="//item",status=status)
      if (status /= 0)   exit      ! No more items
      call get_item_info(fxml,what,price,currency)
      write(unit=*,fmt="(6a)") "Appliance: ", trim(what), &
                               ". Price: ", trim(price), " ", trim(currency)
      call sync_xmlfile(fxml)
enddo
end program item_context
\end{verbatim}
%
It is extremely important to understand the meaning of the call to
\texttt{sync\_xmlfile}. The file handle \texttt{fxml} holds the parsing
context \textbf{and} a physical pointer to the file position
(basically a variable counting the number of characters read so
far). When the context is passed to the subroutine and the parsing is
carried out, the context and the file position get out of
sync. Synchronization means repositioning the physical file pointer
to the place where it was when the context was first created.
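The role of the synchronization step can be seen in a toy model that
does not use \texttt{xmlf90} at all. In the sketch below the type
\texttt{toy\_handle} and the routines \texttt{toy\_next} and
\texttt{toy\_sync} are hypothetical, and the handle counts records
rather than characters to keep the code short. Copying the handle
restores only the counter, so the file itself has to be repositioned
explicitly:
%
\begin{verbatim}
module toy_handle_m
  implicit none
  private
  public :: toy_handle, toy_next, toy_sync

  ! Hypothetical handle: a unit number plus a count of records consumed
  type toy_handle
    integer :: unit
    integer :: nread
  end type toy_handle

contains

  ! Read the next record and update the logical position in the handle
  subroutine toy_next(h, line, status)
    type(toy_handle), intent(inout) :: h
    character(len=*), intent(out)   :: line
    integer, intent(out)            :: status
    read(unit=h%unit, fmt="(a)", iostat=status) line
    if (status == 0) h%nread = h%nread + 1
  end subroutine toy_next

  ! Move the physical file position to the one recorded in the handle
  subroutine toy_sync(h)
    type(toy_handle), intent(in) :: h
    integer :: i
    rewind(unit=h%unit)
    do i = 1, h%nread
      read(unit=h%unit, fmt="(a)")     ! skip one record
    end do
  end subroutine toy_sync

end module toy_handle_m

program sync_demo
  use toy_handle_m
  implicit none
  type(toy_handle)  :: h, saved
  character(len=40) :: line
  integer           :: status

  open(unit=10, status="scratch", action="readwrite")
  write(unit=10, fmt="(a)") "first"
  write(unit=10, fmt="(a)") "second"
  write(unit=10, fmt="(a)") "third"
  rewind(unit=10)

  h = toy_handle(10, 0)
  call toy_next(h, line, status)       ! consumes "first"
  saved = h                            ! save the context
  call toy_next(h, line, status)       ! consumes "second"

  h = saved                            ! restores the counter only ...
  call toy_sync(h)                     ! ... so reposition the file too
  call toy_next(h, line, status)
  print *, "After sync: ", trim(line)  ! "second" again
  close(unit=10)
end program sync_demo
\end{verbatim}
%
Without the call to \texttt{toy\_sync}, the last read would return the
third record, because the file position would still be where the
second call to \texttt{toy\_next} left it.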


\subsubsection{Exercises}
\begin{enumerate}
\item Modify the above programs to print only the appliances priced in
  euros.
\item Write a program that prints only the most expensive
  item. (Assume that the inventory is very large and it is not feasible
  to hold everything in memory...)
\item Use the \texttt{get\_item\_info} subroutine to print
  descriptions and price information from the following XML file:
%
\begin{verbatim}
<vacations>
<trip>
   <description>Mediterranean cruise</description>
   <price currency="euro">1500.00</price>
</trip>
<trip>
   <description>Week in Majorca</description>
   <price currency="euro">300.00</price>
</trip>
<trip>
   <description>Wilderness Route</description>
   <price currency="swedish crown">10000.00</price>
</trip>
</vacations>
\end{verbatim}
%
(Note that the routine does not care what the context name is: it
could be 'item' or 'trip'. It is only the fact that the children
('description' and 'price') are the same that matters.)
\end{enumerate}

\section{Handling of scientific data}

\subsection{Numerical datasets}

While the ASCII form is not the most efficient for the storage of
numerical data, the portability and flexibility offered by the XML
format make it attractive for the interchange of scientific
datasets. There are a number of efforts under way to standardize this
area, and presumably we will have nifty tools for the creation and
visualization of files in the near future. Even then, however, it will
be necessary to be able to read numerical information into Fortran
programs. The \texttt{xmlf90} package offers limited but useful
functionality in this regard, making it possible to build numerical
arrays on the fly as the XML file containing the data is parsed. As an
example, consider the dataset:
%
\begin{verbatim}
<data>
8.90679398599 8.90729421510 8.90780189594 8.90831710494
8.90883991832 8.90937041202 8.90990866166 8.91045474255
8.91100872963 8.91157069732 8.91214071958 8.91271886986
8.91330522098 8.91389984506 8.91450281355 8.91511419713
8.91573406560 8.91636248785 8.91699953183 8.91764526444
8.91829975142 8.91896305734 8.91963524555 8.92031637799
8.92100651514 8.92170571605 8.92241403816 8.92313153711
8.92385826683 8.92459427943 8.92533962491 8.92609435120
8.92685850416 8.92763212726 8.92841526149 8.92920794545
</data>
\end{verbatim}
%
720 | and the following fragment of a \texttt{m\_handlers} module for SAX parsing: |
---|
721 | % |
---|
722 | \begin{verbatim} |
---|
723 | |
---|
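real, dimension(1000) :: x                  ! numerical array to hold data
integer               :: ndata              ! number of entries of x set so far
logical               :: in_data = .false.  ! true while inside a <data> element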
---|
725 | |
---|
726 | subroutine begin_element(name,attributes) |
---|
727 | ... |
---|
728 | select case(name) |
---|
729 | case("data") |
---|
730 | in_data = .true. |
---|
731 | ndata = 0 |
---|
732 | ... |
---|
733 | end select |
---|
734 | |
---|
735 | end subroutine begin_element |
---|
736 | !--------------------------------------------------------------- |
---|
737 | subroutine pcdata_chunk_handler(chunk) |
---|
738 | character(len=*), intent(in) :: chunk |
---|
739 | |
---|
740 | if (in_data) call build_data_array(chunk,x,ndata) |
---|
741 | ... |
---|
742 | |
---|
743 | end subroutine pcdata_chunk_handler |
---|
744 | !------------------------------------------------------------- |
---|
745 | subroutine end_element(name) |
---|
746 | ... |
---|
747 | select case(name) |
---|
748 | case("data") |
---|
749 | in_data = .false. |
---|
750 | print *, "Read ", ndata, " data elements." |
---|
751 | print *, "X: ", x(1:ndata) |
---|
752 | ... |
---|
753 | end select |
---|
754 | |
---|
755 | end subroutine end_element |
---|
756 | \end{verbatim} |
---|
757 | % |
---|
758 | When the \texttt{<data>} tag is encountered by the parser, the |
---|
759 | variable \texttt{ndata} is initialized. Any PCDATA chunks found from |
---|
760 | then on and until the \texttt{</data>} tag is seen are passed to the |
---|
761 | \texttt{build\_data\_array} generic subroutine, which converts the |
---|
762 | character data to the numerical format (integer, default real, double |
---|
763 | precision) implied by the array \texttt{x}. The array is filled with |
---|
764 | data and the \texttt{ndata} variable increased accordingly. |
---|
765 | |
---|
766 | If the data is known to represent a multi-dimensional array (something |
---|
767 | that could be encoded in the XML as attributes to the 'data' element, |
---|
for example), the user can employ the Fortran \texttt{reshape}
---|
769 | intrinsic to obtain the final form. |
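
As an illustration of the \texttt{reshape} step, the following
self-contained sketch turns a one-dimensional array (such as
\texttt{x(1:ndata)} above) into a two-dimensional one. The module and
function names are hypothetical and not part of \texttt{xmlf90}, and the
array dimensions are assumed to have been obtained elsewhere, for
example from attributes of the 'data' element. Two variants are shown
because \texttt{reshape} fills the result column by column, whereas a
file might well list the values row by row.
%
\begin{verbatim}
module m_reshape_demo
  ! Hypothetical helper module (not part of xmlf90)
  implicit none
  private
  public :: to_matrix, to_matrix_by_rows

contains

  ! Values in x are taken in Fortran (column-major) order:
  ! the first nrows values form the first column, and so on.
  function to_matrix(x, nrows, ncols) result(a)
    real, dimension(:), intent(in) :: x
    integer, intent(in)            :: nrows, ncols
    real, dimension(nrows, ncols)  :: a

    ! If x is too short, the missing entries are padded with zeros
    a = reshape(x, (/ nrows, ncols /), pad=(/ 0.0 /))
  end function to_matrix

  ! Values in x are taken row by row: the first ncols values
  ! form the first row, and so on.
  function to_matrix_by_rows(x, nrows, ncols) result(a)
    real, dimension(:), intent(in) :: x
    integer, intent(in)            :: nrows, ncols
    real, dimension(nrows, ncols)  :: a

    a = transpose(reshape(x, (/ ncols, nrows /), pad=(/ 0.0 /)))
  end function to_matrix_by_rows

end module m_reshape_demo
\end{verbatim}
%
With \texttt{x} holding the values 1 to 6, \texttt{to\_matrix(x,2,3)}
places 1 and 2 in the first column, while
\texttt{to\_matrix\_by\_rows(x,2,3)} places 1, 2 and 3 in the first row.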
---|
770 | |
---|
771 | There is absolutely no limit to the size of the data (apart from |
---|
772 | filesystem size and total memory constraints) since the parser only |
---|
773 | holds in memory at any given time a small chunk of character data (the |
---|
774 | default is to split the character data stream and call the |
---|
775 | \texttt{pcdata\_chunk\_handler} routine at the end of a line, or at |
---|
776 | the end of a token if the line is too long). This is one of the most |
---|
777 | useful features of the SAX approach to XML parsing. |
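
Because the data arrives in small chunks, quantities such as the mean
and the standard deviation can be accumulated without ever storing the
whole dataset. The sketch below is one possible way to do this and is
not taken from \texttt{xmlf90}: the module and routine names are
hypothetical, the numbers are assumed to be separated by whitespace,
and the update formula is Welford's running-variance method. It relies
on the fact stated above that chunks are split at the end of a line or
of a token, so that no number is cut in two.
%
\begin{verbatim}
module m_running_stats
  ! Hypothetical helper module (not part of xmlf90)
  implicit none
  private
  public :: add_chunk, std_dev
  integer, parameter, public :: dp = selected_real_kind(14)

contains

  ! Update the running count (n), mean, and sum of squared
  ! deviations (m2) with every number found in one chunk.
  subroutine add_chunk(chunk, n, mean, m2)
    character(len=*), intent(in) :: chunk
    integer, intent(inout)       :: n
    real(kind=dp), intent(inout) :: mean, m2

    integer       :: i, first, ios
    real(kind=dp) :: value, delta

    i = 1
    do
       do                         ! skip separators before a token
          if (i > len(chunk)) exit
          if (.not. is_separator(chunk(i:i))) exit
          i = i + 1
       end do
       if (i > len(chunk)) exit
       first = i
       do                         ! advance to the end of the token
          if (i > len(chunk)) exit
          if (is_separator(chunk(i:i))) exit
          i = i + 1
       end do
       read(unit=chunk(first:i-1), fmt=*, iostat=ios) value
       if (ios /= 0) cycle        ! not a number: ignore the token
       n = n + 1
       delta = value - mean
       mean = mean + delta / real(n, kind=dp)
       m2 = m2 + delta * (value - mean)
    end do
  end subroutine add_chunk

  ! Sample standard deviation from the accumulated values
  function std_dev(n, m2) result(s)
    integer, intent(in)       :: n
    real(kind=dp), intent(in) :: m2
    real(kind=dp)             :: s

    if (n > 1) then
       s = sqrt(m2 / real(n - 1, kind=dp))
    else
       s = 0.0_dp
    end if
  end function std_dev

  ! Blank, tab, line feed and carriage return separate tokens
  function is_separator(c) result(sep)
    character(len=1), intent(in) :: c
    logical                      :: sep

    sep = (c == " " .or. c == achar(9) .or. &
           c == achar(10) .or. c == achar(13))
  end function is_separator

end module m_running_stats
\end{verbatim}
%
In a SAX program, \texttt{n}, \texttt{mean} and \texttt{m2} would be
set to zero when the \texttt{<data>} tag is seen, and
\texttt{add\_chunk} would be called from
\texttt{pcdata\_chunk\_handler} in place of
\texttt{build\_data\_array}.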
---|
778 | |
---|
779 | In order to read numerical data with the XPATH interface in its |
---|
780 | current implementation, one must first read the PCDATA into the |
---|
781 | \texttt{pcdata} optional argument of \texttt{get\_node}, and then call |
---|
782 | \texttt{build\_data\_array}. However, there is an internal limit to |
---|
783 | the size of the PCDATA buffer, so this method cannot be safely used |
---|
784 | for large datasets at this point. In a forthcoming version there will |
---|
785 | be a generic subroutine \texttt{get\_node} with a \texttt{data} |
---|
786 | numerical array optional argument which will be filled by the parser |
---|
787 | on the fly. |
---|
788 | |
---|
789 | |
---|
790 | |
---|
791 | |
---|
792 | \subsubsection{Exercises} |
---|
793 | \begin{enumerate} |
---|
794 | \item Generate an XML file containing a large dataset, and write a |
---|
795 | program to read the information back. You might want to include |
---|
796 | somewhere in the XML file information about the number of data |
---|
797 | elements, so that an array of the proper size can be used. |
---|
798 | \item Devise a strategy to read a dataset without knowing in advance |
---|
799 | the number of data elements. (Some possibilities: re-sizable |
---|
800 | allocatable arrays, two-pass parsing...). |
---|
801 | \item Suggest a possible encoding for the storage of two-dimensional |
---|
802 | arrays, and write a program to read the information from the XML file |
---|
803 | and create the appropriate array. |
---|
\item Write a program that could read a 10~GB Monte Carlo simulation
dataset and print the average and standard deviation of the data. (We
are not advocating the use of XML for such large datasets. NetCDF
would be much more efficient in this case.)
---|
808 | \end{enumerate} |
---|
809 | |
---|
810 | \subsection{Mapping of XML elements to derived types} |
---|
811 | |
---|
812 | After the parsing, the data has to be put somewhere. A good strategy |
---|
813 | to handle structured content is to try to replicate it within data |
---|
814 | structures inside the user program. For example, an element of the |
---|
815 | form |
---|
816 | % |
---|
817 | \begin{verbatim} |
---|
818 | <table units="nm" npts="100"> |
---|
819 | <description>Cluster diameters</description> |
---|
820 | <data> |
---|
821 | 2.3 4.5 5.6 3.4 2.3 1.2 ... |
---|
822 | ... |
---|
823 | ... |
---|
824 | </data> |
---|
825 | </table> |
---|
826 | \end{verbatim} |
---|
827 | % |
---|
828 | could be mapped onto a derived type of the form: |
---|
829 | % |
---|
830 | \begin{verbatim} |
---|
831 | type :: table |
---|
832 | character(len=50) :: description |
---|
833 | character(len=20) :: units |
---|
834 | integer :: npts |
---|
835 | real, dimension(:), pointer :: data |
---|
836 | end type table |
---|
837 | \end{verbatim} |
---|
838 | % |
---|
There could even be parsing and output subroutines associated with this
derived type, so that the user can handle the XML production and
reading transparently. The directory \texttt{Examples/} in the
\texttt{xmlf90} distribution contains some code along these lines.
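
A minimal sketch of what such a module might look like is given next.
It is not the code found in \texttt{Examples/}: the routine names
\texttt{new\_table}, \texttt{append\_value} and
\texttt{write\_table\_xml} are hypothetical, as are the initial capacity
and the output format. The \texttt{data} component grows on demand, so
the number of points need not be known in advance. The \texttt{i0} edit
descriptor requires a Fortran95 compiler.
%
\begin{verbatim}
module m_table
  ! Hypothetical helper module (not part of xmlf90)
  implicit none
  private
  public :: table, new_table, append_value, write_table_xml

  type :: table
     character(len=50)           :: description
     character(len=20)           :: units
     integer                     :: npts
     real, dimension(:), pointer :: data
  end type table

contains

  subroutine new_table(t, description, units)
    type(table), intent(out)     :: t
    character(len=*), intent(in) :: description, units

    t%description = description
    t%units = units
    t%npts = 0
    allocate(t%data(4))          ! small initial capacity
  end subroutine new_table

  ! Store one more value, doubling the storage when it is full
  subroutine append_value(t, value)
    type(table), intent(inout) :: t
    real, intent(in)           :: value
    real, dimension(:), pointer :: tmp

    if (t%npts == size(t%data)) then
       allocate(tmp(2*size(t%data)))
       tmp(1:t%npts) = t%data(1:t%npts)
       deallocate(t%data)
       t%data => tmp
    end if
    t%npts = t%npts + 1
    t%data(t%npts) = value
  end subroutine append_value

  ! Write the table as XML to an already opened unit
  subroutine write_table_xml(t, unit)
    type(table), intent(in) :: t
    integer, intent(in)     :: unit

    write(unit=unit, fmt="(3a,i0,a)") '<table units="', &
         trim(t%units), '" npts="', t%npts, '">'
    write(unit=unit, fmt="(3a)") "<description>", &
         trim(t%description), "</description>"
    write(unit=unit, fmt="(a)") "<data>"
    write(unit=unit, fmt="(4(1x,es14.6))") t%data(1:t%npts)
    write(unit=unit, fmt="(a)") "</data>"
    write(unit=unit, fmt="(a)") "</table>"
  end subroutine write_table_xml

end module m_table
\end{verbatim}
%
A parsing routine for the same type would set \texttt{units} from the
attributes of the 'table' element and call \texttt{append\_value} (or
\texttt{build\_data\_array}, if the size is known) while inside the
'data' element.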
---|
843 | |
---|
844 | \subsubsection{Exercises} |
---|
845 | % |
---|
846 | \begin{enumerate} |
---|
847 | \item Study the \texttt{pseudo} example in \texttt{Examples/sax/} and |
---|
848 | \texttt{Examples/xpath/}. Now, with your own application in mind, |
---|
849 | write derived-type definitions and parsing routines to handle your XML |
---|
850 | data (which would also need to be \textsl{designed} somehow). |
---|
851 | |
---|
852 | \end{enumerate} |
---|
853 | % |
---|
854 | |
---|
855 | |
---|
856 | \section{REFERENCE: Subroutine interfaces} |
---|
857 | \label{sec:reference} |
---|
858 | |
---|
859 | \subsection{Dictionary handling} |
---|
860 | |
---|
861 | Attribute lists are handled as instances of a derived type |
---|
862 | \texttt{dictionary\_t}, loosely inspired by the Python type. The |
---|
863 | terminology is more general: keys and entries instead of names and |
---|
864 | attributes. |
---|
865 | |
---|
866 | \begin{itemize} |
---|
867 | \item |
---|
868 | % |
---|
869 | \begin{verbatim} |
---|
870 | function number_of_entries(dict) result(n) |
---|
871 | ! |
---|
872 | ! Returns the number of entries in the dictionary |
---|
873 | ! |
---|
874 | type(dictionary_t), intent(in) :: dict |
---|
875 | integer :: n |
---|
876 | \end{verbatim} |
---|
877 | % |
---|
878 | \item |
---|
879 | % |
---|
880 | \begin{verbatim} |
---|
881 | function has_key(dict,key) result(found) |
---|
882 | ! |
---|
883 | ! Checks whether there is an entry with |
---|
884 | ! the given key in the dictionary |
---|
885 | ! |
---|
886 | type(dictionary_t), intent(in) :: dict |
---|
887 | character(len=*), intent(in) :: key |
---|
888 | logical :: found |
---|
889 | \end{verbatim} |
---|
890 | \item |
---|
891 | % |
---|
892 | \begin{verbatim} |
---|
893 | subroutine get_value(dict,key,value,status) |
---|
894 | ! |
---|
895 | ! Gets values by key |
---|
896 | ! |
---|
897 | type(dictionary_t), intent(in) :: dict |
---|
898 | character(len=*), intent(in) :: key |
---|
899 | character(len=*), intent(out) :: value |
---|
900 | integer, intent(out) :: status |
---|
901 | \end{verbatim} |
---|
902 | % |
---|
903 | \item |
---|
904 | % |
---|
905 | \begin{verbatim} |
---|
906 | subroutine get_key(dict,i,key,status) |
---|
907 | ! |
---|
908 | ! Gets keys by their order in the dictionary |
---|
909 | ! |
---|
910 | type(dictionary_t), intent(in) :: dict |
---|
911 | integer, intent(in) :: i |
---|
912 | character(len=*), intent(out) :: key |
---|
913 | integer, intent(out) :: status |
---|
914 | |
---|
915 | \end{verbatim} |
---|
916 | % |
---|
917 | \item |
---|
918 | % |
---|
919 | \begin{verbatim} |
---|
920 | subroutine print_dict(dict) |
---|
921 | ! |
---|
922 | ! Prints the contents of the dictionary to stdout |
---|
923 | ! |
---|
924 | type(dictionary_t), intent(in) :: dict |
---|
925 | \end{verbatim} |
---|
926 | \end{itemize} |
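
The routines above can be combined to walk through all the attributes
of an element. The following sketch of a \texttt{begin\_element}
handler does so. It rests on several assumptions that are not stated in
this reference section: that the dictionary routines are made
available by the \texttt{flib\_sax} module, that a zero \texttt{status}
signals success, and that 100 characters are enough to hold any key or
value.
%
\begin{verbatim}
module m_handlers
  use flib_sax                 ! assumed module name
  implicit none
  private
  public :: begin_element

contains

  subroutine begin_element(name, attributes)
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    character(len=100) :: key, value   ! assumed to be long enough
    integer            :: i, status

    print *, "Element: ", trim(name)
    do i = 1, number_of_entries(attributes)
       call get_key(attributes, i, key, status)
       if (status /= 0) cycle          ! assumed: 0 means success
       call get_value(attributes, trim(key), value, status)
       if (status == 0) print *, "  ", trim(key), " = ", trim(value)
    end do
  end subroutine begin_element

end module m_handlers
\end{verbatim}
%
When only one known attribute is needed, \texttt{has\_key} followed by
a single call to \texttt{get\_value} avoids the loop.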
---|
927 | |
---|
928 | \subsection{SAX interface} |
---|
929 | |
---|
930 | \begin{itemize} |
---|
931 | \item |
---|
932 | \begin{verbatim} |
---|
933 | subroutine open_xmlfile(fname,fxml,iostat) |
---|
934 | ! |
---|
935 | ! Opens the file "fname" and creates an xml handle fxml |
---|
936 | ! iostat /= 0 on error. |
---|
937 | ! |
---|
938 | character(len=*), intent(in) :: fname |
---|
939 | integer, intent(out) :: iostat |
---|
940 | type(xml_t), intent(out) :: fxml |
---|
941 | \end{verbatim} |
---|
942 | \item |
---|
943 | \begin{verbatim} |
---|
944 | subroutine xml_parse(fxml, begin_element_handler, & |
---|
945 | end_element_handler, & |
---|
946 | pcdata_chunk_handler, & |
---|
947 | comment_handler, & |
---|
948 | xml_declaration_handler, & |
---|
949 | sgml_declaration_handler, & |
---|
950 | error_handler, & |
---|
951 | signal_handler, & |
---|
952 | verbose, & |
---|
953 | empty_element_handler) |
---|
954 | |
---|
955 | type(xml_t), intent(inout), target :: fxml |
---|
956 | |
---|
957 | optional :: begin_element_handler |
---|
958 | optional :: end_element_handler |
---|
959 | optional :: pcdata_chunk_handler |
---|
960 | optional :: comment_handler |
---|
961 | optional :: xml_declaration_handler |
---|
962 | optional :: sgml_declaration_handler |
---|
963 | optional :: error_handler |
---|
964 | optional :: signal_handler ! see XPATH code |
---|
965 | logical, intent(in), optional :: verbose |
---|
966 | optional :: empty_element_handler |
---|
967 | |
---|
968 | \end{verbatim} |
---|
969 | \item Interfaces for handlers follow: |
---|
970 | |
---|
971 | \begin{verbatim} |
---|
972 | subroutine begin_element_handler(name,attributes) |
---|
973 | character(len=*), intent(in) :: name |
---|
974 | type(dictionary_t), intent(in) :: attributes |
---|
975 | end subroutine begin_element_handler |
---|
976 | |
---|
977 | subroutine end_element_handler(name) |
---|
978 | character(len=*), intent(in) :: name |
---|
979 | end subroutine end_element_handler |
---|
980 | |
---|
981 | subroutine pcdata_chunk_handler(chunk) |
---|
982 | character(len=*), intent(in) :: chunk |
---|
983 | end subroutine pcdata_chunk_handler |
---|
984 | |
---|
985 | subroutine comment_handler(comment) |
---|
986 | character(len=*), intent(in) :: comment |
---|
987 | end subroutine comment_handler |
---|
988 | |
---|
989 | subroutine xml_declaration_handler(name,attributes) |
---|
990 | character(len=*), intent(in) :: name |
---|
991 | type(dictionary_t), intent(in) :: attributes |
---|
992 | end subroutine xml_declaration_handler |
---|
993 | |
---|
994 | subroutine sgml_declaration_handler(sgml_declaration) |
---|
995 | character(len=*), intent(in) :: sgml_declaration |
---|
996 | end subroutine sgml_declaration_handler |
---|
997 | |
---|
998 | subroutine error_handler(error_info) |
---|
999 | type(xml_error_t), intent(in) :: error_info |
---|
1000 | end subroutine error_handler |
---|
1001 | |
---|
1002 | subroutine signal_handler(code) |
---|
1003 | logical, intent(out) :: code |
---|
1004 | end subroutine signal_handler |
---|
1005 | |
---|
1006 | subroutine empty_element_handler(name,attributes) |
---|
1007 | character(len=*), intent(in) :: name |
---|
1008 | type(dictionary_t), intent(in) :: attributes |
---|
1009 | end subroutine empty_element_handler |
---|
1010 | \end{verbatim} |
---|
1011 | \end{itemize} |
---|
1012 | |
---|
1013 | Other file handling routines (some of them really only useful within |
---|
1014 | the XPATH interface): |
---|
1015 | |
---|
1016 | \begin{itemize} |
---|
1017 | \item |
---|
1018 | \begin{verbatim} |
---|
1019 | subroutine REWIND_XMLFILE(fxml) |
---|
1020 | ! |
---|
1021 | ! Rewinds the physical file associated to fxml and clears the data |
---|
1022 | ! structures used in parsing. |
---|
1023 | ! |
---|
1024 | type(xml_t), intent(inout) :: fxml |
---|
1025 | \end{verbatim} |
---|
1026 | |
---|
1027 | \item |
---|
1028 | \begin{verbatim} |
---|
1029 | subroutine SYNC_XMLFILE(fxml,status) |
---|
1030 | ! |
---|
1031 | ! Synchronizes the physical file associated to fxml so that reading |
---|
1032 | ! can resume at the exact point in the parsing saved in fxml. |
---|
1033 | ! |
---|
1034 | type(xml_t), intent(inout) :: fxml |
---|
1035 | integer, intent(out) :: status |
---|
1036 | |
---|
1037 | \end{verbatim} |
---|
1038 | \item |
---|
1039 | \begin{verbatim} |
---|
1040 | subroutine CLOSE_XMLFILE(fxml) |
---|
1041 | ! |
---|
! Closes the file handle fxml (and the associated OS file object)
---|
1043 | ! |
---|
1044 | type(xml_t), intent(inout) :: fxml |
---|
1045 | \end{verbatim} |
---|
1046 | \end{itemize} |
---|
1047 | |
---|
1048 | \subsection{XPATH interface} |
---|
1049 | % |
---|
1050 | \begin{itemize} |
---|
1051 | \item |
---|
1052 | \begin{verbatim} |
---|
1053 | subroutine MARK_NODE(fxml,path,att_name,att_value,attributes,status) |
---|
1054 | ! |
---|
1055 | ! Performs a search of a given element (by path, and/or presence of |
---|
1056 | ! a given attribute and/or value of that attribute), returning optionally |
---|
1057 | ! the element's attribute dictionary, and leaving the file handle fxml |
---|
1058 | ! ready to process the rest of the element's contents (child elements |
---|
1059 | ! and/or pcdata). |
---|
1060 | ! |
---|
1061 | ! Side effects: it sets a "path_mark" in fxml to enable its use as a |
---|
1062 | ! context. |
---|
1063 | ! |
---|
1064 | ! If the argument "path" is present and evaluates to a relative path (a |
---|
1065 | ! string not beginning with "/"), the search is interrupted after the end |
---|
1066 | ! of the "ancestor_element" set by a previous call to "mark_node". |
---|
1067 | ! If not earlier, the search ends at the end of the file. |
---|
1068 | ! |
---|
1069 | ! The status argument, if present, will hold a return value, |
---|
1070 | ! which will be: |
---|
1071 | ! |
---|
1072 | ! 0 on success, |
---|
1073 | ! negative in case of end-of-file or end-of-ancestor-element, or |
---|
1074 | ! positive in case of other malfunction |
---|
1075 | ! |
---|
1076 | type(xml_t), intent(inout), target :: fxml |
---|
1077 | character(len=*), intent(in), optional :: path |
---|
1078 | character(len=*), intent(in), optional :: att_name |
---|
1079 | character(len=*), intent(in), optional :: att_value |
---|
1080 | type(dictionary_t), intent(out), optional :: attributes |
---|
1081 | integer, intent(out), optional :: status |
---|
1082 | \end{verbatim} |
---|
1083 | |
---|
1084 | \item |
---|
1085 | \begin{verbatim} |
---|
1086 | subroutine GET_NODE(fxml,path,att_name,att_value,attributes,pcdata,status) |
---|
1087 | ! |
---|
1088 | ! Performs a search of a given element (by path, and/or presence of |
---|
1089 | ! a given attribute and/or value of that attribute), returning optionally |
---|
1090 | ! the element's attribute dictionary and any PCDATA characters contained |
---|
1091 | ! in the element's scope (but not child elements). It leaves the file handle |
---|
1092 | ! physically and logically positioned: |
---|
1093 | ! |
---|
1094 | ! after the end of the element's start tag if 'pcdata' is not present |
---|
1095 | ! after the end of the element's end tag if 'pcdata' is present |
---|
1096 | ! |
---|
1097 | ! If the argument "path" is present and evaluates to a relative path (a |
---|
1098 | ! string not beginning with "/"), the search is interrupted after the end |
---|
1099 | ! of the "ancestor_element" set by a previous call to "mark_node". |
---|
1100 | ! If not earlier, the search ends at the end of the file. |
---|
1101 | ! |
---|
1102 | ! The status argument, if present, will hold a return value, |
---|
1103 | ! which will be: |
---|
1104 | ! |
---|
1105 | ! 0 on success, |
---|
1106 | ! negative in case of end-of-file or end-of-ancestor-element, or |
---|
1107 | ! positive in case of a malfunction (such as the overflow of the |
---|
1108 | ! user's pcdata buffer). |
---|
1109 | ! |
---|
1110 | type(xml_t), intent(inout), target :: fxml |
---|
1111 | character(len=*), intent(in), optional :: path |
---|
1112 | character(len=*), intent(in), optional :: att_name |
---|
1113 | character(len=*), intent(in), optional :: att_value |
---|
1114 | type(dictionary_t), intent(out), optional :: attributes |
---|
1115 | character(len=*), intent(out), optional :: pcdata |
---|
1116 | integer, intent(out), optional :: status |
---|
1117 | \end{verbatim} |
---|
1118 | \end{itemize} |
---|
1119 | % |
---|
1120 | \subsection{PCDATA conversion routines} |
---|
1121 | \begin{itemize} |
---|
1122 | \item |
---|
1123 | |
---|
1124 | \begin{verbatim} |
---|
1125 | subroutine build_data_array(str,x,n) |
---|
1126 | ! |
---|
1127 | ! Incrementally builds the data array x from |
---|
1128 | ! character data contained in str. n holds |
---|
1129 | ! the number of entries of x set so far. |
---|
1130 | ! |
---|
1131 | character(len=*), intent(in) :: str |
---|
1132 | NUMERIC TYPE, dimension(:), intent(inout) :: x |
---|
1133 | integer, intent(inout) :: n |
---|
1134 | ! |
---|
1135 | ! NUMERIC TYPE can be any of: |
---|
1136 | ! integer |
---|
1137 | ! real |
---|
1138 | ! real(kind=selected_real_kind(14)) |
---|
1139 | ! |
---|
1140 | \end{verbatim} |
---|
1141 | \end{itemize} |
---|
1142 | |
---|
1143 | \subsection{Other utility routines} |
---|
1144 | \begin{itemize} |
---|
1145 | \item |
---|
1146 | |
---|
1147 | \begin{verbatim} |
---|
1148 | function xml_char_count(fxml) result (nc) |
---|
1149 | ! |
---|
1150 | ! Provides the value of the processed-characters counter |
---|
1151 | ! |
---|
1152 | type(xml_t), intent(in) :: fxml |
---|
1153 | integer :: nc |
---|
1154 | |
---|
1155 | nc = nchars_processed(fxml%fb) |
---|
1156 | |
---|
1157 | end function xml_char_count |
---|
1158 | \end{verbatim} |
---|
1159 | \end{itemize} |
---|
1160 | |
---|
1161 | \section{Other parser features, limitations, and design issues} |
---|
1162 | |
---|
1163 | \subsection{Features} |
---|
1164 | \begin{itemize} |
---|
1165 | \item |
---|
The parser can detect badly formed documents, giving by default an
error report including the line and column where the error occurred. It
also accepts an \texttt{error\_handler} routine as another optional
argument, for finer control by the user. In the SAX interface, if the
optional logical argument \texttt{verbose} is present and
\texttt{.true.}, the parser will offer detailed information about its
inner workings. In the XPATH interface, there is a pair of routines,
\texttt{enable\_debug} and \texttt{disable\_debug}, to control
verbosity. See \texttt{Examples/xpath/} for examples.
---|
1175 | |
---|
1176 | \item |
---|
It ignores PCDATA outside of element context (and warns about it).
---|
1178 | |
---|
1179 | \item |
---|
1180 | Attribute values can be specified using both single and double |
---|
1181 | quotes (as per the XML specs). |
---|
1182 | |
---|
1183 | \item |
---|
It processes the default entities (\texttt{\&gt;}, \texttt{\&amp;},
\texttt{\&lt;}, \texttt{\&apos;}, and \texttt{\&quot;}) as well as
decimal and hex character entities (for example, \texttt{\&\#123;} and
\texttt{\&\#x4E;}). The processing is not done ``on the fly'', but
after reading chunks of PCDATA.
---|
1188 | |
---|
1189 | \item |
---|
It understands and processes CDATA sections (transparently passed as
PCDATA to the handler).
---|
1192 | |
---|
1193 | \end{itemize} |
---|
1194 | |
---|
1195 | See \texttt{Examples/sax/features} for an illustration of the above |
---|
1196 | features. |
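
To make the meaning of decimal and hex character entities concrete,
the following self-contained sketch decodes a single reference such as
\texttt{\&\#123;} or \texttt{\&\#x4E;} into the character it stands
for. It is an illustration only and not the parser's own code: the
module and routine names are hypothetical, and the sketch restricts
itself to the ASCII range (codes 0 to 127).
%
\begin{verbatim}
module m_charref
  ! Hypothetical helper module (not part of xmlf90)
  implicit none
  private
  public :: decode_char_ref

contains

  ! Decode "&#NNN;" (decimal) or "&#xHH;" (hex) into one character.
  ! status is 0 on success and -1 if ref is malformed or the
  ! code is outside the ASCII range.
  subroutine decode_char_ref(ref, ch, status)
    character(len=*), intent(in)  :: ref
    character(len=1), intent(out) :: ch
    integer, intent(out)          :: status

    character(len=*), parameter :: digits = "0123456789abcdef"
    integer :: i, n, base, first, code, d

    ch = " "
    status = -1
    n = len_trim(ref)
    if (n < 4) return
    if (ref(1:2) /= "&#" .or. ref(n:n) /= ";") return
    if (ref(3:3) == "x") then
       base = 16
       first = 4
    else
       base = 10
       first = 3
    end if
    if (first > n - 1) return       ! no digits at all
    code = 0
    do i = first, n - 1
       d = index(digits(1:base), to_lower(ref(i:i))) - 1
       if (d < 0) return            ! not a digit in this base
       code = code*base + d
       if (code > 127) return       ! outside the ASCII range
    end do
    ch = achar(code)
    status = 0
  end subroutine decode_char_ref

  ! Lower-case version of the hex digits A-F; other
  ! characters are returned unchanged.
  function to_lower(c) result(l)
    character(len=1), intent(in) :: c
    character(len=1)             :: l

    if (c >= "A" .and. c <= "F") then
       l = achar(iachar(c) + 32)
    else
       l = c
    end if
  end function to_lower

end module m_charref
\end{verbatim}
%
With this routine, \texttt{\&\#123;} decodes to an opening brace and
\texttt{\&\#x4E;} to the letter N, while a reference with hex digits
but no \texttt{x} marker is rejected.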
---|
1197 | |
---|
1198 | \subsection{Limitations} |
---|
1199 | \begin{itemize} |
---|
1200 | |
---|
1201 | \item It is not a validating parser. |
---|
1202 | |
---|
1203 | \item It accepts only single-byte encodings for characters. |
---|
1204 | |
---|
1205 | \item Currently, there are hard-wired limits on the length of element |
---|
1206 | and attribute identifiers, and the length of attribute values and |
---|
1207 | unbroken (i.e., without whitespace) PCDATA sections. The limit is |
---|
1208 | set in \texttt{sax/m\_buffer.f90} to \texttt{MAX\_BUFF\_SIZE=300}. |
---|
1209 | |
---|
1210 | \item Overly long comments and SGML declarations can also be |
---|
1211 | truncated, but the effect is currently harmless since the parser does |
---|
1212 | not make use of that information. In a future version there could be a |
---|
1213 | more robust retrieval mechanism. |
---|
1214 | |
---|
\item The number of attributes is limited to \texttt{MAX\_ITEMS=20}
  in \texttt{sax/m\_dictionary.f90}.
---|
1217 | |
---|
1218 | |
---|
\item In the XPATH interface, returned PCDATA character buffers
  cannot be larger than an internal size of
  \texttt{MAX\_PCDATA\_SIZE=65536}, set in \texttt{xpath/m\_path.f90}.
---|
1222 | |
---|
1223 | |
---|
1224 | \end{itemize} |
---|
1225 | |
---|
1226 | \subsection{Design Issues} |
---|
1227 | |
---|
1228 | See \texttt{\{sax,xpath\}/Developer.Guide}. |
---|
1229 | |
---|
1230 | The parser is actually written in the \texttt{F} subset of Fortran90, |
---|
1231 | for which inexpensive compilers are available. (See |
---|
1232 | \texttt{http://fortran.com/imagine1/}). |
---|
1233 | |
---|
1234 | There are two other projects aimed at parsing XML in Fortran: those of |
---|
1235 | Mart Rentmeester (\texttt{http://nn-online.sci.kun.nl/fortran/}) and |
---|
1236 | Arjen Markus (\texttt{http://xml-fortran.sourceforge.net/}). Up to |
---|
1237 | this point the three projects have progressed independently, but it is |
---|
1238 | anticipated that there will be a pooling of efforts in the near |
---|
1239 | future. |
---|
1240 | |
---|
1241 | \newpage |
---|
1242 | \section{Installation Instructions} |
---|
1243 | % |
---|
There is extensible built-in support for arbitrary compilers. The
setup discussed below is taken from the author's \texttt{flib}
project.\footnote{There seem to be other projects with that very obvious
name...} The idea is to have a configurable repository of useful
modules and library objects which can be accessed by Fortran
programs. Different compilers are supported by tailored macros.
---|
1250 | |
---|
1251 | \texttt{xmlf90} is just one of several packages in \texttt{flib}, |
---|
1252 | hence the \texttt{flib\_} prefix in the package's visible module |
---|
1253 | names. |
---|
1254 | |
---|
To install the package, follow these steps:
---|
1256 | |
---|
1257 | \begin{verbatim} |
---|
1258 | |
---|
1259 | * Create a directory somewhere containing a copy of the stuff in the |
---|
1260 | subdirectory 'macros': |
---|
1261 | |
---|
1262 | cp -rp macros $HOME/flib |
---|
1263 | |
---|
1264 | * Define the environment variable FLIB_ROOT to point to that directory. |
---|
1265 | |
---|
1266 | FLIB_ROOT=$HOME/flib ; export FLIB_ROOT (sh-like shells) |
---|
1267 | setenv FLIB_ROOT $HOME/flib (csh-like shells) |
---|
1268 | |
---|
1269 | |
---|
1270 | * Go into $FLIB_ROOT, look through the fortran-XXXX.mk files, |
---|
1271 | and see if one of them applies to your computer/compiler combination. |
---|
1272 | If so, copy it or make a (symbolic) link to 'fortran.mk': |
---|
1273 | |
---|
1274 | ln -sf fortran-lf95.mk fortran.mk |
---|
1275 | |
---|
1276 | If none of the .mk files look useful, write your own, using the |
---|
1277 | files provided as a guide. Basically you need to figure out the |
---|
1278 | name and options for the compiler, the extension assigned to |
---|
1279 | module files, and the flag used to identify the module search path. |
---|
1280 | |
---|
1281 | The above steps need only be done once. |
---|
1282 | |
---|
1283 | * Go into subdirectory 'sax' and type 'make'. |
---|
1284 | * Go into subdirectory 'xpath' and type 'make'. |
---|
1285 | * Go into subdirectory 'Tutorial' and try the exercises in this guide |
---|
1286 | (see the next section for compilation details). |
---|
1287 | * Go into subdirectory 'Examples' and explore. |
---|
1288 | |
---|
1289 | \end{verbatim} |
---|
1290 | % |
---|
1291 | \section{Compiling user programs} |
---|
1292 | \label{sec:compiling} |
---|
1293 | |
---|
1294 | After installation, the appropriate modules and library files should |
---|
1295 | already be in \texttt{\$FLIB\_ROOT/modules} and |
---|
1296 | \texttt{\$FLIB\_ROOT/lib}, respectively. To compile user programs, it |
---|
1297 | is suggested that the user create a separate directory to hold the |
---|
1298 | program files and prepare a \texttt{Makefile} following the template |
---|
1299 | (taken from \texttt{Examples/sax/simple/}): |
---|
1300 | |
---|
1301 | \begin{verbatim} |
---|
1302 | #--------------------------------------------------------------- |
---|
1303 | # |
---|
1304 | default: example |
---|
1305 | # |
---|
1306 | #--------------------------- |
---|
1307 | MK=$(FLIB_ROOT)/fortran.mk |
---|
1308 | include $(MK) |
---|
1309 | #--------------------------- |
---|
1310 | # |
---|
# Comment out the following line to turn off debugging support
---|
1312 | # |
---|
1313 | FFLAGS=$(FFLAGS_DEBUG) |
---|
1314 | # |
---|
1315 | LIBS=$(LIB_PREFIX)$(LIB_STD) -lflib |
---|
1316 | # |
---|
1317 | OBJS= m_handlers.o example.o |
---|
1318 | |
---|
1319 | example: $(OBJS) |
---|
1320 | $(FC) $(LDFLAGS) -o $@ $(OBJS) $(LIBS) |
---|
1321 | # |
---|
1322 | clean: |
---|
1323 | rm -f *.o example *$(MOD_EXT) |
---|
1324 | # |
---|
1325 | #--------------------------------------------------------------- |
---|
1326 | \end{verbatim} |
---|
1327 | % |
---|
1328 | Here it is assumed that the user has two source files, |
---|
1329 | \texttt{example.f90} and \texttt{m\_handlers.f90}. Simply typing |
---|
1330 | \texttt{make} will compile \texttt{example}, pulling in all the needed |
---|
1331 | modules and library objects. |
---|
1332 | |
---|
1333 | |
---|
1334 | \end{document} |
---|