The DATAMP Patent XML Specification
Table Of Contents
XML Basics
DATAMP Patent XML
Examples

Describing Patent Data in XML:

The following documents the XML format the DATAMP project uses for input of patent data. All XML input will be checked against this specification, and only uploaded if it adheres to it.

XML Basics:

For those unfamiliar with XML, there are several very good tutorials available on the web. My personal favorite is the one from W3Schools.org.

XML Overview: Elements and Attributes

XML is a text format very similar to HTML, but with stricter formatting rules. Like HTML, XML is composed of elements (sometimes with attributes) and text. Each element has a start tag and an end tag, which consist of the element name enclosed in angle brackets, as such:

<element>(element contents)</element>

Note that the end tag uses a '/' character to indicate that it is the end of the element. Between the start and end tags you can include other elements or text, depending on what data you are representing. Each XML format defines its own rules for what is valid content in each element.

Unlike HTML, XML does not define specific tags that can be used-- it allows different systems to define tags that make sense for their specific task. For example, in DATAMP, we define tags for things like "Patents" and "Patentees".

If an element has no sub-elements or text content, you can use a special shorthand notation to indicate this. Rather than using a separate start and end tag for these elements, you can use an empty tag, as such:

<element/>

The trailing slash on the name indicates that there is no closing tag.

Certain elements may also support attributes, which give additional information about the element. Attributes are specified as a set of "name=value" fields inside the start tag, as such:

<element attr1='Hello' attr2="123">(element contents)</element>

Note that the attribute values must be enclosed in quotes (single or double, it doesn't matter). Each element will define the set of attributes that it allows.

XML Overview: Special Characters

You will notice that there are certain characters that are "important" to XML: less-than and greater-than symbols (<, >), quotation marks (', "), and the ampersand (&). If these symbols show up in your data, you need to take some special steps to prevent them from being interpreted as XML markup (and causing errors).

XML handles this by defining entities for special characters. An entity is specified using the format "&name;", where "name" is a fixed value representing the character. The following entities are defined by default:

Entity | Encoding | Example
Apostrophe (') | &apos; | Scarlett O'Hara => Scarlett O&apos;Hara
Quotes (") | &quot; | This is "unusual" => This is &quot;unusual&quot;
Less than (<) | &lt; | 5 < 10 => 5 &lt; 10
Greater than (>) | &gt; | 15 > 10 => 15 &gt; 10
Ampersand (&) | &amp; | Rock & Roll => Rock &amp; Roll

These special encodings can be used in element attribute values, as well as in element text, as such:

<name first='Scarlett' last='O&apos;Hara'>
  Proprietor of &quot;Tara&quot;
</name>

This example shows one of the key reasons we need to worry about these characters. If we were to use a plain apostrophe in O'Hara, the value would look like 'O'Hara'-- the XML parser would assume the value stopped after the O.
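As a further illustration (a sketch of general XML behavior, not something specific to the DATAMP schema), a quote character only needs to be escaped when it matches the quote character that delimits the attribute value. The two fragments below are equivalent; the name element is the same illustrative element used above.

```xml
<!-- Value delimited by single quotes: the apostrophe must be escaped -->
<name first='Scarlett' last='O&apos;Hara'/>

<!-- Value delimited by double quotes: a literal apostrophe is allowed -->
<name first="Scarlett" last="O'Hara"/>
```

The less-than sign and the ampersand are different: they need to be written as &lt; and &amp; wherever they appear in attribute values or element text.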

XML Overview: Preamble

All XML files start with a fixed preamble, indicating that this is an XML file, and what kind of characters it uses. For all DATAMP XML files, we use the following preamble:

<?xml version="1.0" encoding="UTF-8"?>

This tells the system that this is an XML version 1.0 file, and that it uses UTF-8 characters (don't worry if you don't know what that means-- just include the line exactly as written and you'll be fine).

XML Overview: The Document Element

Every XML file has exactly one top-level element, which is called the document element. It must follow immediately after the preamble. All other elements in the file are children of this top-level element.

<?xml version="1.0" encoding="UTF-8"?>
<documentElement>
   ...contents of the XML file...
</documentElement>

The actual name of the document element is specified by the type of XML file you are using. For DATAMP, our document element is called "Patents".

XML Overview: The Schema

As you can see from the preceding sections, XML puts very little restriction on what the individual elements contain, and what nesting of elements is permitted. This allows a great deal of flexibility, but it also means that there is no way for XML to validate that the contents of the file are actually meaningful.

To allow a system to define a structure for its particular "flavor" of XML, a special XML file called an XML Schema is used. This contains rules about what elements and attributes are valid, and how they are nested. The schema can be used to validate an XML file's contents to make sure it is correct and meaningful.

The schema definition is included in a file using a special attribute on the document element, called an XML namespace (xmlns for short). The namespace contains the URL of the namespace definition, as such:

<?xml version="1.0" encoding="UTF-8"?>
<documentElement xmlns="http://...">
   ...contents of the XML file...
</documentElement>

For DATAMP, we use the URL "http://www.datamp.org/2003/xsd/patentInput.xsd" for our schema definition. This is a normal XML file, so if you are interested in exactly what the rules are you can refer to this file (if you know how to read a schema!).

DATAMP Patent XML:

The following sections describe the various elements which are used by the DATAMP XML schema, what attributes they support, and how they are nested.

DATAMP XML: Patents Element

The Patents element is the highest-level (document) element. There must be exactly one Patents element, and it must be the first element defined in the XML file.

The following attributes may be defined for the Patents element:

Attribute Name | Required? | Default | Comments
xmlns | Yes | None | This specifies the schema to use for validation of the XML file contents. It MUST contain the URL: http://www.datamp.org/2003/xsd/patentInput.xsd

The following elements may be nested under the Patents element:

Element Name | Required? | Allow Multiples? | Comments
Patent | Yes | Yes | There will be one Patent element for each patent in the data set.

This element may not define any non-element content (PCDATA).
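Putting the preamble, the document element, and the namespace together, a minimal input file might look like the sketch below. The patent data is borrowed from the example at the end of this document, and only the required attributes and elements are shown. This document does not say whether the schema enforces a particular order for the child elements, so the sketch simply follows the order used in that example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Patents xmlns="http://www.datamp.org/2003/xsd/patentInput.xsd">
    <!-- One Patent element per patent in the data set -->
    <Patent number="17403" title="Compound Gage">
        <GrantDate month="5" day="26" year="1857"/>
        <Classification class="33" subclass="44"/>
        <Patentee>
            <Name first="Albert" last="Williams"/>
        </Patentee>
        <Type>layout tools:marking gauges</Type>
    </Patent>
</Patents>
```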

DATAMP XML: Patent Element

The Patent element defines the information for a single patent. There will be one Patent element for each patent in the file.

The following attributes may be defined for the Patent element:

Attribute Name | Required? | Default | Comments
number | Yes | None | Specifies the number of the patent. This number should not include any type specification, so patent "X1,234" will have the number "1234".
type | No | 'UT' | Specifies the type of patent. Valid values for this field are: UT (Utility Patent, the default), AI (Additional Improvement), D (Design Patent), PP (Plant Patent), X (Pre-1830 Patent).
country | No | 'US' | Specifies the country that granted the patent. If not explicitly specified otherwise, it is assumed to be a US patent.
title | Yes | None | Specifies the title of the patent, as taken from the specification.
alt | No | None | Specifies a more descriptive title for the patent. Many patents have very general titles (such as "Woodworking Machine"), so this field allows us to put something more descriptive (like "Surface Planer").

The following elements may be nested under the Patent element:

Element Name | Required? | Allow Multiples? | Comments
Description | No | No | Textual description of the tool.
GrantDate | Yes | No | Specifies the grant date for the patent.
ApplicationDate | No | No | Specifies the application date for the patent.
EffectiveDate | No | No | For antedated patents, specifies the date when the patent became effective. If not specified, the effective date is the same as the grant date.
Patentee | Yes | Yes | Specifies information about a person the patent was granted to. If the patent was granted to more than one person, each will have its own Patentee element.
Classification | Yes | Yes | Specifies information about the patent classification. If the patent is in multiple classes, there will be one element for each classification.
Type | Yes | Yes | Specifies information about the type of tool covered by the patent, taken from a standardized list. A tool can be of more than one type (e.g. a combination tool).
Witness | No | Yes | Specifies information about a person who witnessed the patent. Each witness (there are usually at least 2) will have its own element.
Assignee | No | Yes | Specifies information about a person or company the patent was assigned to.
Manufacturer | No | Yes | Specifies information about a company that manufactured the patented tool. In the case of long-lived or widely copied patents, only those who manufactured the tool while under patent will be listed.
Link | No | Yes | Specifies the URL of a web page or other external source of information about the patented tool.
ReissuedAs | No | No | Specifies the reissued patent for this patent, if any.
ReissueOf | No | No | For reissued patents, specifies the patent which this patent is a reissue of.
KnownExamples | No | No | This element will be present if there are known extant examples of tools using the patent.
Gizmo | No | No | This element will be present if the patent is unusual or amusing.
Notable | No | No | This element will be present if the patent is notable for some reason.
Extended | No | No | This element will be present if the patent was ever granted an extension. Details of the extension should be in the Description element.

This element may not define any non-element content (PCDATA).
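To illustrate how the number and type attributes split up a patent number, here is a sketch for the patent "X1,234" mentioned above. The title is a placeholder, and the required child elements are left out for brevity, so this fragment would not pass validation as written.

```xml
<!-- Patent "X1,234": the "X" goes in type, the digits (no comma) in number -->
<Patent number="1234" type="X" title="Placeholder Title">
    <!-- Required child elements (GrantDate, Patentee,
         Classification, Type) would go here -->
</Patent>
```

Since country is omitted, this would be treated as a US patent.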

DATAMP XML: GrantDate Element

The GrantDate element contains information about the date the patent was granted.

The following attributes may be defined for the GrantDate element:

Attribute Name | Required? | Default | Comments
month | Yes | None | Month component of date (1-12).
day | Yes | None | Day component of date (1-31).
year | Yes | None | Year component of date.

The following elements may be nested under the GrantDate element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: Name Element

The Name element contains information about a person. It is used inside other elements, such as Patentee, Witness, Assignee, and Manufacturer.

The following attributes may be defined for the Name element:

Attribute Name | Required? | Default | Comments
first | Yes | None | Person's first name.
middle | No | None | Person's middle name or initial (if any).
last | Yes | None | Person's last name.
suffix | No | None | Suffix for the person's name ("Jr.", "Esq.", etc.)

The following elements may be nested under the Name element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
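As a quick sketch, a Name element using all four attributes might look like the following. The person is made up for illustration.

```xml
<!-- Illustrative values only; middle and suffix are optional -->
<Name first="John" middle="Q." last="Smith" suffix="Jr."/>
```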

DATAMP XML: Company Element

The Company element contains information about a company. It is used inside other elements, such as Assignee and Manufacturer.

The following attributes may be defined for the Company element:

Attribute Name | Required? | Default | Comments
name | Yes | None | Name of the company.
sortby | No | None | Name to use for sorting the company. If this is not specified, the company name will be used. This allows us to sort something like "E.A. Fay and Co." as "Fay and Co."

The following elements may be nested under the Company element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
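Using the company mentioned in the sortby comments, the element might be written like this (a sketch; the values come straight from that comment):

```xml
<!-- Displayed as "E.A. Fay and Co.", but sorted under "Fay and Co." -->
<Company name="E.A. Fay and Co." sortby="Fay and Co."/>
```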

DATAMP XML: Location Element

The Location element contains information about where a company or person is located.

The following attributes may be defined for the Location element:

Attribute Name | Required? | Default | Comments
city | Yes | None | Name of the city the person/company is located in.
state | No | None | Name of the state the person/company is located in.
country | No | 'US' | Name of the country the person/company is located in. If not specified, 'US' is assumed.

The following elements may be nested under the Location element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
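The sketch below shows a Location written two ways, using the city and state from the example at the end of this document. We are assuming here that leaving out country is the same as writing the 'US' default explicitly.

```xml
<!-- country omitted, so the 'US' default applies -->
<Location city="New Britain" state="CT"/>

<!-- the same location with the country written out -->
<Location city="New Britain" state="CT" country="US"/>
```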

DATAMP XML: Classification Element

The Classification element contains information about a patent classification.

The following attributes may be defined for the Classification element:

Attribute Name | Required? | Default | Comments
class | Yes | None | Main patent classification for the patent.
subclass | Yes | None | Subclass for the patent classification.

The following elements may be nested under the Classification element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: Link Element

The Link element contains information about an external URL associated with the patent.

The following attributes may be defined for the Link element:

Attribute Name | Required? | Default | Comments
href | Yes | None | URL of the associated file.

The following elements may be nested under the Link element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

The non-element content specifies the text of the hyperlink.
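For instance, a Link might be written as in the sketch below. Both the URL and the link text are placeholders, not a real page.

```xml
<!-- href holds the URL; the element text becomes the visible link text -->
<Link href="http://www.example.com/marking-gages.html">Article on marking gages</Link>
```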

DATAMP XML: Patentee Element

The Patentee element contains information about a single patentee. There can be any number of patentees defined for a patent.

The following attributes may be defined for the Patentee element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Patentee element:

Element Name | Required? | Allow Multiples? | Comments
Name | Yes | No | Contains information about the patentee name.
Location | No | No | Contains information about where the patentee lived when the patent was issued.

This element may not define any non-element content (PCDATA).

DATAMP XML: Witness Element

The Witness element contains information about a person who witnessed the patent. There will be one Witness element for each person (at least two witnesses are required for a patent).

The following attributes may be defined for the Witness element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Witness element:

Element Name | Required? | Allow Multiples? | Comments
Name | Yes | No | Contains information about the witness name.

This element may not define any non-element content (PCDATA).

DATAMP XML: Assignee Element

The Assignee element contains information about a single assignee for the patent. There can be any number of assignees defined for a patent.

The following attributes may be defined for the Assignee element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Assignee element:

Element Name | Required? | Allow Multiples? | Comments
Name | Yes* | No | If the assignee is a person, this element will contain information about his name. There must be either a Name or Company element defined for each Assignee, but not both.
Company | Yes* | No | If the assignee is a company, this element will contain information about the company name. There must be either a Name or Company element defined for each Assignee, but not both.
Location | No | No | Contains information about where the assignee (person or company) was located at the time the patent was issued.

This element may not define any non-element content (PCDATA).
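The "Name or Company, but not both" rule is easiest to see side by side. The two fragments below are sketches: the person is made up, and the company and location are borrowed from the example at the end of this document purely for illustration, not as a claim about any actual assignment.

```xml
<!-- Assignee that is a person: uses Name (hypothetical person) -->
<Assignee>
    <Name first="John" last="Smith"/>
    <Location city="New Britain" state="CT"/>
</Assignee>

<!-- Assignee that is a company: uses Company instead of Name -->
<Assignee>
    <Company name="Stanley Rule &amp; Level Co."/>
    <Location city="New Britain" state="CT"/>
</Assignee>
```

Note the &amp; entity in the company name, as described in the section on special characters.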

DATAMP XML: Manufacturer Element

The Manufacturer element contains information about a single manufacturer of the tool covered by the patent. There can be any number of manufacturers defined for a patent.

The following attributes may be defined for the Manufacturer element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Manufacturer element:

Element Name | Required? | Allow Multiples? | Comments
Company | Yes* | No | This element will contain information about a company that manufactured the patented piece. There must be either a Name or Company element defined, but not both.
Name | Yes* | No | This element will contain information about a person who manufactured the patented piece. There must be either a Name or Company element defined, but not both.
Location | No | No | Contains information about where the company/person was located at the time they produced the tool.

This element may not define any non-element content (PCDATA).

DATAMP XML: Type Element

The Type element contains information about the type of tool described by the patent. A patent must define at least one Type element; if a tool embodies more than one type (a combination tool), the patent defines one Type element for each.

The set of valid types is fixed by the system, although the list is fairly large.

The following attributes may be defined for the Type element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Type element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

The non-element content will contain the name of the type. This type name consists of a colon-separated list in the format "class:category:type", and must be one defined in our category taxonomy.
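As a sketch, a combination tool would carry one Type element per type. The first type name below is taken from the example at the end of this document; the second is a made-up placeholder and may not exist in the actual taxonomy.

```xml
<!-- One Type element for each type the tool embodies -->
<Type>layout tools:marking gauges</Type>
<!-- Placeholder type name, for illustration only -->
<Type>layout tools:mortise gauges</Type>
```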

DATAMP XML: Description Element

The Description element contains a brief description of the tool covered by the patent, and any other pertinent information. A patent may define at most one Description element.

The Description text can contain basic XHTML markup if desired.

The following attributes may be defined for the Description element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Description element:

Element Name | Required? | Allow Multiples? | Comments
Only Simple XHTML Markup Elements Allowed

The non-element content will contain the descriptive text.

Two special non-XML tags are allowed in the description text: links to other patents, and line breaks.

To insert a link to another patent in DATAMP, use the syntax "{xxx}", where "xxx" is the number of the patent you are linking to. For example, to insert a link to patent D1234, you could do something like:

<Description>This is similar to patent {D1,234}</Description>

Note that the text inside the braces will become the link, so it is normally wise to include basic formatting as shown above.

To break a long description into paragraphs, you can use the "{br}" tag to insert a break. This will add a blank line to the description.
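The two special tags can be combined with ordinary text and entities, as in the sketch below. The wording of the description is made up for illustration, and D1,234 is the same placeholder patent number used above.

```xml
<!-- {br} starts a new paragraph; {D1,234} becomes a link to that patent -->
<Description>Combination gage containing 3 marking points.{br}
This is similar to patent {D1,234}.
Produced by Stanley Rule &amp; Level Co.</Description>
```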

DATAMP XML: ApplicationDate Element

The ApplicationDate element contains information about the date the patent was applied for. If the application date is not known, this element should not be present.

The following attributes may be defined for the ApplicationDate element:

Attribute Name | Required? | Default | Comments
month | Yes | None | Month component of date (1-12).
day | Yes | None | Day component of date (1-31).
year | Yes | None | Year component of date.

The following elements may be nested under the ApplicationDate element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: EffectiveDate Element

The EffectiveDate element contains information about the date the patent went into effect. If the patent is not explicitly antedated this element is not necessary, since the patent became effective on the grant date.

The following attributes may be defined for the EffectiveDate element:

Attribute Name | Required? | Default | Comments
month | Yes | None | Month component of date (1-12).
day | Yes | None | Day component of date (1-31).
year | Yes | None | Year component of date.

The following elements may be nested under the EffectiveDate element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
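As a sketch of how the three date elements fit together, the fragment below shows the dates for a hypothetical antedated patent. All of the dates are made up; the only point is that the effective date falls before the grant date.

```xml
<!-- Hypothetical antedated patent: every date here is illustrative -->
<GrantDate month="5" day="26" year="1857"/>
<ApplicationDate month="1" day="12" year="1857"/>
<EffectiveDate month="3" day="2" year="1857"/>
```

For a patent that is not antedated, the EffectiveDate line would simply be left out.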

DATAMP XML: ReissueOf Element

In a reissued patent, the ReissueOf element specifies the patent that is being reissued. A reissued patent will have exactly one ReissueOf element.

The following attributes may be defined for the ReissueOf element:

Attribute Name | Required? | Default | Comments
number | Yes | None | Specifies the number of the original patent.
type | No | 'UT' | Specifies the type of the original patent. Valid values for this field are: UT (Utility Patent, the default), AI (Additional Improvement), D (Design Patent), PP (Plant Patent), X (Pre-1830 Patent).
country | No | 'US' | Specifies the country that granted the original patent. If not explicitly specified otherwise, it is assumed to be a US patent.

The following elements may be nested under the ReissueOf element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: ReissuedAs Element

If a patent is reissued, the ReissuedAs element identifies the reissued patent by its number, type, and country.

The following attributes may be defined for the ReissuedAs element:

Attribute Name | Required? | Default | Comments
number | Yes | None | Specifies the number of the reissued patent.
type | No | 'RE' | Specifies the type of the reissued patent. In this case, the only valid type for the patent is "RE".
country | No | 'US' | Specifies the country that reissued the patent. If not explicitly specified otherwise, it is assumed to be a US patent.

The following elements may be nested under the ReissuedAs element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
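ReissuedAs and ReissueOf are two halves of the same relationship. The sketch below uses patent 15556 and its reissue 448 from the example at the end of this document. The first fragment appears in that example; the second is our assumption of what the matching back-reference would look like in the entry for the reissue.

```xml
<!-- Inside the Patent element for original patent 15556 -->
<ReissuedAs number="448" type="RE"/>

<!-- Inside the Patent element for the reissue (assumed counterpart);
     type and country are omitted, so 'UT' and 'US' apply -->
<ReissueOf number="15556"/>
```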

DATAMP XML: KnownExamples Element

The KnownExamples element should be specified if examples of the tool covered by the patent are known to be extant.

The following attributes may be defined for the KnownExamples element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the KnownExamples element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: Notable Element

The Notable element should be specified if the patent is notable for some reason.

The following attributes may be defined for the Notable element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Notable element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: Gizmo Element

The Gizmo element should be specified if the patent is particularly amusing or unusual.

The following attributes may be defined for the Gizmo element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Gizmo element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).

DATAMP XML: Extended Element

The Extended element should be specified if the patent was extended during its history.

The following attributes may be defined for the Extended element:

Attribute Name | Required? | Default | Comments
No Attributes Defined

The following elements may be nested under the Extended element:

Element Name | Required? | Allow Multiples? | Comments
No Subelements Allowed

This element may not define any non-element content (PCDATA).
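KnownExamples, Notable, Gizmo, and Extended all work as simple flags: they carry no attributes, subelements, or text, so their presence alone conveys the meaning. A sketch of how they might appear inside a Patent element, using the empty-tag shorthand described in the XML overview:

```xml
<!-- Flag elements: include only the ones that apply to the patent -->
<KnownExamples/>
<Notable/>
<Gizmo/>
<Extended/>
```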


Example: Patent Input XML

Here is a very basic example of the DATAMP XML input file format. It specifies the data for a pair of marking gage patents:

<?xml version="1.0" encoding="UTF-8"?>
<Patents xmlns='http://www.datamp.org/2003/xsd/patentInput.xsd'>

    <Patent number="15556" title="Carpenter's Gage"
                alt="Replaceable points for marking gages">
        <GrantDate month="8" day="19" year="1856"/>
        <Classification class="33" subclass="44"/>
        <Classification class="33" subclass="486"/>
        <Patentee>
            <Name first="Joel" last="Bryant"/>
            <Location city="Brooklyn" state="NY"/>
        </Patentee>
        <Type>layout tools:marking gauges</Type>
        <Description>Replaceable points for marking gages.</Description>
        <Witness>
            <Name first="J." middle="L." last="Marcellus"/>
        </Witness>
        <Witness>
            <Name first="John" middle="C." last="Schencke"/>
        </Witness>
        <ReissuedAs number="448" type="RE"/>
    </Patent>

    <Patent number="17403" title="Compound Gage">
        <GrantDate month="5" day="26" year="1857"/>
        <Classification class="33" subclass="44"/>
        <Patentee>
            <Name first="Albert" last="Williams"/>
            <Location city="Philadelphia" state="PA"/>
        </Patentee>
        <Manufacturer>
            <Company name="Stanley Rule &amp; Level Co."/>
            <Location city="New Britain" state="CT"/>
        </Manufacturer>
        <Type>layout tools:marking gauges</Type>
        <Description>Combination Gage containing 3
         marking points and a pair of mortise points.
         Produced by Stanley Rule &amp; Level Co.</Description>
        <KnownExamples/>
        <Witness>
            <Name first="Enoch" last="Remick"/>
        </Witness>
        <Witness>
            <Name first="James" last="Nichol"/>
        </Witness>
    </Patent>
    
</Patents>