As computers capable of constructing concordances become more and more accessible, the task of compiling such an index becomes less and less significant. What was once the work of a lifetime -- or longer -- is now a relatively modest project. In 1875, Mary Cowden Clarke proudly wrote in the preface to her concordance of Shakespeare that "to furnish a faithful guide to this rich mine of intellectual treasure... has been the ambition of a life; and it is hoped that the sixteen years' assiduous labour... may be found to have accomplished that ambition". It may have been hard for Mrs. Clarke to imagine that a century later, just one person, Todd K. Bender, professor of English at the University of Wisconsin, would produce nine concordances in the time it took her to construct one.
-- Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images
Representing documents is half the battle: Now we need ways to traverse, edit and filter them.
CommonDoc, on top of providing the representation of documents, also provides operations that can be applied to all documents. These range from the simple operation of traversing every node in the document to more complex tasks like generating a table of contents or ensuring every section in the document has a unique ID.
Document Traversal
traverse-document
(node function &optional depth)
with-document-traversal
((doc node &optional (depth (quote depth))) &body body)
Evaluates body in each node of the document.
Examples
(defpackage traverse-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :with-document-traversal))
(in-package :traverse-example)

(defvar *document*
  (make-document "test"
    :children
    (list
      (make-bold
        (list
          (make-italic
            (list
              (make-underline
                (list (make-text "Hello, world!"))))))))))

(with-document-traversal (*document* node)
  (print node))
;; #<DOCUMENT "test">
;; #<BOLD children: ITALIC>
;; #<ITALIC children: UNDERLINE>
;; #<UNDERLINE children: TEXT-NODE>
;; #<TEXT-NODE text: Hello, world!>
;; NIL
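The traversal macro also accepts an optional name for the depth variable, and traverse-document offers the same walk as a plain function call. The sketch below shows both. It assumes Quicklisp is available to load the common-doc system, that traverse-document is exported from common-doc.ops alongside the macro, that depth is an integer, and that the function passed to traverse-document receives the node and its depth as two arguments (as the macro's signature suggests). The package name and the *doc* and *visited* variables are made up for this example.

```lisp
;; Assumption: Quicklisp is installed and can load the common-doc system.
(ql:quickload :common-doc :silent t)

(defpackage traverse-depth-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :traverse-document
                :with-document-traversal))
(in-package :traverse-depth-example)

(defparameter *doc*
  (make-document "test"
    :children
    (list
      (make-bold
        (list
          (make-italic
            (list (make-text "Hello, world!"))))))))

;; Print an indented outline, naming the depth variable explicitly.
;; Assumption: depth is a non-negative integer.
(with-document-traversal (*doc* node depth)
  (format t "~A~A~%"
          (make-string (* 2 depth) :initial-element #\Space)
          (type-of node)))

;; The same walk using the function form.
;; Assumption: the function is called with the node and its depth.
(defparameter *visited* '())
(traverse-document *doc*
                   (lambda (node depth)
                     (declare (ignore depth))
                     (push (type-of node) *visited*)))
(setf *visited* (nreverse *visited*))
```

The macro form reads better for short bodies; the function form is convenient when the visitor is already a named function or closure.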
Extraction
Many textbooks include lists of figures and tables. These operations make this kind of document preparation task easier.
collect-figures
(doc-or-node)
collect-images
(doc-or-node)
collect-tables
(doc-or-node)
collect-external-links
(doc-or-node)
collect-all-text
(doc-or-node)
Examples
(defpackage extraction-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :collect-figures))
(in-package :extraction-example)

(defvar *document*
  (make-document "test"
    :children
    (list
      (make-section
        (list (make-text "Section 1"))
        :children
        (list
          (make-figure
            (make-image "fig1.jpg")
            (list
              (make-text "Fig 1")))))
      (make-section
        (list (make-text "Section 2"))
        :children
        (list
          (make-figure
            (make-image "fig2.jpg")
            (list
              (make-text "Fig 2"))))))))

(collect-figures *document*) ;; => (#<FIGURE {1009913D83}> #<FIGURE {1009A98923}>)
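Since each collector takes a doc-or-node argument, it should also be possible to restrict a search to one part of a document. The following sketch collects figures from the whole document and from a single section, then prints the text found by collect-all-text. It assumes Quicklisp is available to load the common-doc system and that passing a node limits the search to that node's subtree. The package and variable names are made up for this example, and no particular format for the collected text is assumed.

```lisp
;; Assumption: Quicklisp is installed and can load the common-doc system.
(ql:quickload :common-doc :silent t)

(defpackage collectors-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :collect-figures
                :collect-all-text))
(in-package :collectors-example)

;; Keep a handle on each section so one can be searched on its own.
(defparameter *section-1*
  (make-section
    (list (make-text "Section 1"))
    :children
    (list
      (make-figure
        (make-image "fig1.jpg")
        (list (make-text "Fig 1"))))))

(defparameter *section-2*
  (make-section
    (list (make-text "Section 2"))
    :children
    (list
      (make-figure
        (make-image "fig2.jpg")
        (list (make-text "Fig 2"))))))

(defparameter *doc*
  (make-document "test"
    :children (list *section-1* *section-2*)))

;; Search the whole document, then only the first section.
(format t "Figures in document: ~D~%"
        (length (collect-figures *doc*)))
(format t "Figures in section 1: ~D~%"
        (length (collect-figures *section-1*)))

;; Print whatever text the library collects for the document.
(format t "Collected text: ~S~%" (collect-all-text *doc*))
```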
Filling References
This operation goes through a document, ensuring every section has a unique
reference ID. Each ID is the 'slug' of the title's text (the text is extracted
using the collect-all-text operation), optionally with a number prepended
if this slug is not unique.
fill-unique-refs
(doc-or-node)
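To make the ID scheme described above concrete, here is a minimal sketch in plain Common Lisp that needs no libraries. It is not CommonDoc's implementation: slugify and unique-refs are hypothetical helpers, and the exact slug rules (lowercasing, hyphens between words) and the counter format are assumptions chosen for illustration.

```lisp
;; Hypothetical helpers for illustration; not part of CommonDoc.

(defun slugify (title)
  "Lowercase TITLE and join its alphanumeric runs with single hyphens."
  (with-output-to-string (out)
    (let ((pending-hyphen nil)
          (wrote-any nil))
      (loop for ch across title
            do (cond ((alphanumericp ch)
                      ;; Emit a separator only between two runs.
                      (when (and pending-hyphen wrote-any)
                        (write-char #\- out))
                      (write-char (char-downcase ch) out)
                      (setf pending-hyphen nil
                            wrote-any t))
                     (t
                      (setf pending-hyphen t)))))))

(defun unique-refs (titles)
  "Return one ID per title, prepending a counter when a slug repeats.
This sketch does not guard against a numbered ID colliding with the
slug of a title that itself starts with a number."
  (let ((seen (make-hash-table :test #'equal)))
    (loop for title in titles
          for slug = (slugify title)
          for n = (incf (gethash slug seen 0))
          collect (if (= n 1)
                      slug
                      (format nil "~D-~A" n slug)))))

(defparameter *refs*
  (unique-refs '("Overview" "Getting Started" "Overview")))
;; *refs* => ("overview" "getting-started" "2-overview")
```

The first occurrence of a title keeps the bare slug, so existing links to it stay stable when a later section reuses the same title.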
Table of Contents
table-of-contents
(doc-or-node &key max-depth)
Examples
(defpackage toc-example
  (:use :cl :common-doc)
  (:import-from :common-doc.ops
                :table-of-contents))
(in-package :toc-example)

(defvar *document*
  (make-document "test"
    :children
    (list
      (make-section
        (list (make-text "Section 1"))
        :reference "sec1"
        :children
        (list
          (make-content
            (list
              (make-content
                (list
                  (make-section
                    (list (make-text "Section 1.1"))
                    :reference "sec11")))))))
      (make-section
        (list (make-text "Section 2"))
        :reference "sec2"
        :children
        (list
          (make-text "sec2 contents"))))))

(defvar *toc* (table-of-contents *document*))

(dump *toc*)
;; ordered-list [class=toc]
;; list-item
;; content-node
;; document-link
;; text-node
;; "Section 1"
;; ordered-list
;; list-item
;; content-node
;; document-link
;; text-node
;; "Section 1.1"
;; list-item
;; content-node
;; document-link
;; text-node
;; "Section 2"
Equality
node-equal
(node-a node-b)
node-specific-equal
(node-a node-b)