CommonDoc » Operations

As computers capable of constructing concordances become more and more accessible, the task of compiling such an index becomes less and less significant. What was once the work of a lifetime – or longer – is now a relatively modest project. In 1875, Mary Cowden Clarke proudly wrote in the preface to her concordance of Shakespeare that "to furnish a faithful guide to this rich mine of intellectual treasure... has been the ambition of a life; and it is hoped that the sixteen years' assiduous labour... may be found to have accomplished that ambition". It may have been hard for Mrs. Clarke to imagine that a century later, just one person, Todd K. Bender, professor of English at the University of Wisconsin, would produce nine concordances in the time it took her to construct one.

– Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images

Representing documents is half the battle: now we need ways to traverse, edit, and filter them.

CommonDoc, on top of providing the representation of documents, also provides operations that can be applied to all documents. These range from the simple operation of traversing every node in the document to more complex tasks like generating a table of contents or ensuring every section in the document has a unique ID.

Document Traversal

traverse-document(node function &optional depth)
Apply a side-effectful function recursively to every element in the document. Depth-first. Doesn't apply the function to the document itself.
with-document-traversal((doc node &optional (depth (quote depth))) &body body)
Execute body once for each node of the document, with node bound to the current node.

Examples

(defpackage traverse-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :with-document-traversal))
(in-package :traverse-example)

(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-bold
                   (list
                    (make-italic
                     (list
                      (make-underline
                       (list (make-text "Hello, world!"))))))))))

(with-document-traversal (*document* node)
  (print node))

;; #<DOCUMENT "test"> 
;; #<BOLD children: ITALIC> 
;; #<ITALIC children: UNDERLINE> 
;; #<UNDERLINE children: TEXT-NODE> 
;; #<TEXT-NODE text: Hello, world!> 
;; NIL
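
The optional third element of the binding list names the depth variable. The following is a minimal sketch, assuming the body runs with the current nesting depth bound to that variable as an integer. The package name depth-example and the variable *visited* are made up for illustration, and the common-doc system is assumed to be loaded already (for example through Quicklisp).

```lisp
;; Assumes the common-doc system is already loaded,
;; e.g. with (ql:quickload :common-doc).
(defpackage depth-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :with-document-traversal))
(in-package :depth-example)

(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-bold
                   (list
                    (make-italic
                     (list (make-text "Hello, world!"))))))))

;; Hypothetical accumulator: (depth . class-name) pairs in visiting order.
(defvar *visited* '())

(with-document-traversal (*document* node depth)
  (push (cons depth (class-name (class-of node))) *visited*))

;; PUSH builds the list back to front, so restore visiting order.
(setf *visited* (nreverse *visited*))

;; Print one line per visited node, indented two spaces per level.
;; Assumption: DEPTH is a non-negative integer.
(loop for (depth . name) in *visited*
      do (format t "~&~A~A~%"
                 (make-string (* 2 depth) :initial-element #\Space)
                 name))
```

Collecting into a list first and printing afterwards keeps the traversal body free of formatting concerns, which is handy when the same pass feeds several reports.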

Extraction

Many textbooks include lists of figures and tables. These operations make this kind of document preparation task easier.

collect-figures(doc-or-node)
Return a list of figures in the document.
collect-images(doc-or-node)
Return a list of images in the document.
collect-tables(doc-or-node)
Return a list of tables in the document.
collect-external-links(doc-or-node)
Return a list of external links in the document.
collect-all-text(doc-or-node)
Return all the text from a node or document.

Examples

(defpackage extraction-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :collect-figures))
(in-package :extraction-example)

(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-section
                   (list (make-text "Section 1"))
                   :children
                   (list
                    (make-figure
                     (make-image "fig1.jpg")
                     (list
                      (make-text "Fig 1")))))
                  (make-section
                   (list (make-text "Section 2"))
                   :children
                   (list
                    (make-figure
                     (make-image "fig2.jpg")
                     (list
                      (make-text "Fig 2"))))))))

(collect-figures *document*) ;; => (#<FIGURE {1009913D83}> #<FIGURE {1009A98923}>)
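
The other collectors are called the same way. The sketch below pulls the plain text and counts images for a small document. The package name text-extraction-example is invented for illustration, common-doc is assumed to be loaded already, and the exact shape of the value returned by collect-all-text (for instance, how pieces of text are joined) is not specified above, so it is printed as-is rather than assumed.

```lisp
;; Assumes the common-doc system is already loaded,
;; e.g. with (ql:quickload :common-doc).
(defpackage text-extraction-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :collect-figures
                :collect-images
                :collect-all-text))
(in-package :text-extraction-example)

(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-section
                   (list (make-text "Section 1"))
                   :children
                   (list
                    (make-text "Some body text.")
                    (make-figure
                     (make-image "fig1.jpg")
                     (list (make-text "Fig 1"))))))))

;; All the text of the document, in whatever form the library returns it.
(format t "~&Text: ~S~%" (collect-all-text *document*))

;; Each collector returns a list, so LENGTH gives a quick summary.
(format t "~&Figures: ~D~%" (length (collect-figures *document*)))
(format t "~&Images: ~D~%" (length (collect-images *document*)))
```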

Filling References

This operation goes through a document, ensuring every section has a unique reference ID. Each ID is the 'slug' of the title's text (the text is extracted using the collect-all-text operation), optionally with a number prepended if this slug is not unique.

fill-unique-refs(doc-or-node)
Recur through a document, giving unique reference IDs to each section.
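
A minimal sketch of the operation on two sections that share a title, so their slugs would otherwise collide. It assumes that fill-unique-refs updates the sections in place and that reference is the accessor exported by common-doc for a node's reference ID; neither is stated above. The package and variable names are made up for illustration, and common-doc is assumed to be loaded already.

```lisp
;; Assumes the common-doc system is already loaded,
;; e.g. with (ql:quickload :common-doc).
(defpackage refs-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :fill-unique-refs))
(in-package :refs-example)

;; Two sections with the same title and no explicit :reference.
(defvar *section-a*
  (make-section (list (make-text "Overview"))
                :children (list (make-text "First part."))))

(defvar *section-b*
  (make-section (list (make-text "Overview"))
                :children (list (make-text "Second part."))))

(defvar *document*
  (make-document "test"
                 :children (list *section-a* *section-b*)))

;; Assumption: the sections are modified in place.
(fill-unique-refs *document*)

;; Assumption: REFERENCE reads a node's reference ID.
(format t "~&~S~%~S~%"
        (reference *section-a*)
        (reference *section-b*))
```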

Table of Contents

table-of-contents(doc-or-node &key max-depth)
Extract a tree of document links representing the table of contents of a document. All the sections in the document must have references, so you should call fill-unique-refs first.

Examples

(defpackage toc-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :table-of-contents))
(in-package :toc-example)

(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-section
                   (list (make-text "Section 1"))
                   :reference "sec1"
                   :children
                   (list
                    (make-content
                     (list
                      (make-content
                       (list
                        (make-section
                         (list (make-text "Section 1.1"))
                         :reference "sec11")))))))
                  (make-section
                   (list (make-text "Section 2"))
                   :reference "sec2"
                   :children
                   (list
                    (make-text "sec2 contents"))))))

(defvar *toc* (table-of-contents *document*))

(dump *toc*)
;; ordered-list [class=toc]
;;   list-item
;;     content-node
;;       document-link
;;         text-node
;;           "Section 1"
;;       ordered-list
;;         list-item
;;           content-node
;;             document-link
;;               text-node
;;                 "Section 1.1"
;;   list-item
;;     content-node
;;       document-link
;;         text-node
;;           "Section 2"
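
When the sections carry no explicit references, the two operations can be chained. The sketch below assumes fill-unique-refs modifies the document in place and that :max-depth limits how many levels of nested sections appear in the result; the precise meaning of the depth value is not documented above, so no output is shown. The package name toc-refs-example is invented, and common-doc is assumed to be loaded already.

```lisp
;; Assumes the common-doc system is already loaded,
;; e.g. with (ql:quickload :common-doc).
(defpackage toc-refs-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :fill-unique-refs
                :table-of-contents))
(in-package :toc-refs-example)

;; No :reference arguments here: the IDs are generated below.
(defvar *document*
  (make-document "test"
                 :children
                 (list
                  (make-section
                   (list (make-text "Section 1"))
                   :children
                   (list
                    (make-section
                     (list (make-text "Section 1.1"))
                     :children
                     (list (make-text "sec1.1 contents")))))
                  (make-section
                   (list (make-text "Section 2"))
                   :children
                   (list (make-text "sec2 contents"))))))

;; Every section needs a reference before the table can be built.
(fill-unique-refs *document*)

;; Full table of contents.
(dump (table-of-contents *document*))

;; Assumption: :max-depth restricts the nesting of the generated list.
(dump (table-of-contents *document* :max-depth 1))
```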

Equality

node-equal(node-a node-b)
Recursively check whether two nodes are equal.
node-specific-equal(node-a node-b)
Use this method to make node equality more specific.
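
Examples

A minimal sketch of comparing two separately built trees. It assumes node-equal compares structure and content rather than object identity, which is what "recursively" suggests. The helper make-greeting and the package name are made up for illustration, and common-doc is assumed to be loaded already.

```lisp
;; Assumes the common-doc system is already loaded,
;; e.g. with (ql:quickload :common-doc).
(defpackage equality-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :node-equal))
(in-package :equality-example)

;; Hypothetical helper: a bold node wrapping a single text node.
(defun make-greeting (string)
  (make-bold (list (make-text string))))

;; Two distinct objects with the same shape and text.
(format t "~&Same text: ~A~%"
        (node-equal (make-greeting "Hello")
                    (make-greeting "Hello")))

;; Same shape, different text.
(format t "~&Different text: ~A~%"
        (node-equal (make-greeting "Hello")
                    (make-greeting "Goodbye")))
```

This kind of comparison is mostly useful in tests, for example to check that a parser and a hand-built document agree.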