Google Code Search

  Google Code Search packagemap file definition


Contents

Overview
Syntax

Overview

Google Code Search enables users to search the web for archives containing source code. Our software locates source code files within those archives, and detects the language and licence. Just as you can use a regular Sitemap to give us information about the pages on your site, you can use a packagemap file to tell us the language and licence of the source code in your archive files.

Syntax

A packagemap file is written in XML. Here is an example:

<?xml version="1.0" encoding="UTF-8"?>
<fileset>
<file>
   <path>source/myfile.cpp</path>
   <type>C++</type>
   <license>LGPL</license>
</file>

<file>
   <path>messages/messages.tgz</path>
   <type>archive</type>
   <license>BSD</license>
   <packagemap>info/PackageMap.xml</packagemap>
</file>
</fileset>

File names

In a Code Search Sitemap, specify the name of the packagemap with the <packagemap> tag. If you don't specify the packagemap file, we will check the top directory in the archive for the following files, and use the first one that is found:

  • PACKAGEMAP.XML
  • PACKAGEMAP.xml
  • Packagemap.xml
  • packagemap.xml
  • PACKAGEMAP
  • Packagemap
  • packagemap
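The lookup order above can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not the code Code Search runs; the function name find_default_packagemap and its input (a list of the names found in the archive's top directory) are assumptions made for the example.

```python
# Candidate names, in the order the list above gives them.
DEFAULT_PACKAGEMAP_NAMES = [
    "PACKAGEMAP.XML",
    "PACKAGEMAP.xml",
    "Packagemap.xml",
    "packagemap.xml",
    "PACKAGEMAP",
    "Packagemap",
    "packagemap",
]


def find_default_packagemap(top_level_names):
    """Return the first default packagemap name that is present, or None.

    top_level_names: names of the entries in the archive's top directory
    (hypothetical input for this sketch). Matching is exact, so a name
    spelled differently from every candidate is never picked up.
    """
    present = set(top_level_names)
    for candidate in DEFAULT_PACKAGEMAP_NAMES:
        if candidate in present:
            return candidate
    return None
```

Under this reading, a file such as info/PackageMap.xml from the example is not found automatically, because it is neither in the top directory nor spelled like any candidate. That is why the example names it explicitly with the <packagemap> tag.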

XML tag definitions

The available XML tags are described below.

<fileset>
required

Encapsulates the <file> entries and references the current protocol standard.

<file>
required

Child of <fileset>.

<path>
required

Child of <file>. Gives the path of the file within the archive. The path is case-sensitive and can contain any characters.

<type>
required

Child of <file>. Value can be a language name or "archive". Examples for the language name include: "C", "Python", "C#", "Java", "Vim".

Case is ignored: "Java", "JAVA" and "java" are equivalent.

The value must be printable ASCII characters, no white space.

Only files whose language is supported are indexed; all other files are ignored.

You can still use a language name that we do not support yet, in which case we may index the file in the future.

The special value "archive" can be used for an archive inside an archive. This is only useful if the nested archive contains source code.

Because Code Search indexes only source code, there is no need to add an entry for an archive that contains only text, HTML, and similar non-source files.
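As a rough sketch of the <type> value rules above (case is ignored; printable ASCII only, with no white space), the check below normalises a value before it is compared. The helper name normalize_type is hypothetical, and the check does not know which languages are actually supported.

```python
import string

# Printable ASCII without white space: letters, digits and punctuation.
_ALLOWED = set(string.ascii_letters + string.digits + string.punctuation)


def normalize_type(value):
    """Return the lower-cased <type> value, or raise ValueError if malformed."""
    # Leading and trailing white space is ignored (see "Entity escaping").
    value = value.strip()
    if not value or any(ch not in _ALLOWED for ch in value):
        raise ValueError(f"invalid <type> value: {value!r}")
    # Case is ignored, so compare everything in lower case.
    return value.lower()
```

With this helper, "Java", "JAVA" and "java" all normalise to the same string, and the special value "archive" can be tested with normalize_type(value) == "archive".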

<license>
optional

Child of <file>. Value should be the name of the copyright licence. Examples include: "GPL", "BSD", "Python", "disclaimer".

Case is ignored: "LGPL", "Lgpl" and "lgpl" are equivalent.

When <type> is "archive", the value of <license> is the default licence for the files in the archive. A different licence can be specified for specific files with a packagemap in the archive.

The licence must be one of the supported licences. We ignore unrecognised licences, and list the licence value as "unknown".
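The handling of unrecognised licences can be pictured as follows. The set SUPPORTED_LICENCES is a made-up subset built from the names mentioned on this page, since the full list of supported licences is not given here; resolve_licence is likewise a hypothetical helper.

```python
# Hypothetical subset: this page does not list the supported licences.
SUPPORTED_LICENCES = {"gpl", "lgpl", "bsd", "python", "disclaimer"}


def resolve_licence(value):
    """Map a <license> value to a supported name, or to "unknown"."""
    if value is None:
        # <license> is optional, so a file may carry no value at all.
        return "unknown"
    key = value.strip().lower()  # case is ignored
    return key if key in SUPPORTED_LICENCES else "unknown"
```

For example, resolve_licence("Lgpl") and resolve_licence("LGPL") give the same result, while a value outside the set comes back as "unknown".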

<packagemap>
optional

Child of <file>. The name of the packagemap file inside the archive. We recommend "PACKAGEMAP.xml": a file with that name in the top directory of the archive is detected automatically, so you do not need to include the tag.

Case-sensitive.

This tag can be used only for <file> entries where the value of <type> is "archive".
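To make the tag rules concrete, here is a minimal reader that uses Python's standard xml.etree.ElementTree module. It is a sketch of the constraints described above (required <path> and <type>, <packagemap> only on archive entries), not the parser Code Search uses; the function name read_packagemap and the dictionary layout are assumptions.

```python
import xml.etree.ElementTree as ET


def read_packagemap(xml_text):
    """Parse packagemap XML into a list of dicts, checking the tag rules."""
    root = ET.fromstring(xml_text)
    if root.tag != "fileset":
        raise ValueError("root element must be <fileset>")

    entries = []
    for file_el in root.findall("file"):
        entry = {}
        for tag in ("path", "type", "license", "packagemap"):
            text = file_el.findtext(tag)
            if text is not None:
                # Leading and trailing white space is ignored.
                entry[tag] = text.strip()

        # <path> and <type> are required; the other two tags are optional.
        for required in ("path", "type"):
            if not entry.get(required):
                raise ValueError(f"<file> is missing required <{required}>")

        # <packagemap> is valid only when <type> is "archive" (any case).
        if "packagemap" in entry and entry["type"].lower() != "archive":
            raise ValueError("<packagemap> requires <type> to be archive")

        entries.append(entry)
    return entries
```

Fed the example at the top of this section, the function would return two entries, the second carrying the packagemap value "info/PackageMap.xml".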

Entity escaping

Leading and trailing white space is ignored. UTF-8 encoding is mandatory. As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.

Character         Escape Code
Ampersand     &   &amp;
Single Quote  '   &apos;
Double Quote  "   &quot;
Greater Than  >   &gt;
Less Than     <   &lt;
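When a packagemap is generated by a script, the escaping in the table can be applied with the escape function from Python's standard xml.sax.saxutils module. That function replaces &, < and > by default; the two quote characters are passed in as extra entities. The wrapper name escape_value is an assumption made for this sketch.

```python
from xml.sax.saxutils import escape

# escape() already handles &, < and >; the quotes are added explicitly.
_QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_value(value):
    """Escape a data value for use inside a packagemap tag."""
    # Leading and trailing white space is ignored, so drop it up front.
    return escape(value.strip(), _QUOTE_ENTITIES)


# Example: a path containing several of the characters from the table.
escaped_path = escape_value("docs/R&D <old>'s.c")
# escaped_path is now "docs/R&amp;D &lt;old&gt;&apos;s.c"
```

The ampersand is replaced first, so the entities produced for the other characters are not escaped a second time.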


©2009 Google