Common Malware Enumeration (CME)

The CME Process:
Scope, Identifiers, and Guidelines for Deconfliction

Document version: 0.0.3    Date: July 24, 2006

This is a draft report and does not represent an official position of The MITRE Corporation. Copyright © 2006, The MITRE Corporation. All rights reserved. Permission is granted to redistribute this document if this paragraph is not removed. This document is subject to change without notice.

Author: Desiree Beck, CME Team Member

Table of Contents

  1. Introduction
  2. Scope
  3. Identifiers
  4. Identifier Assignment Process
  5. Deconfliction
  6. Other Notes
  7. Glossary

1. Introduction

CME identifiers will identify "malware threats". At the most basic level, a "malware threat" is anything that has the potential to damage a computer system or network. Furthermore,

  • A malware threat can be identified by a signature.
  • A malware threat may or may not exploit a vulnerability.
  • A malware threat may or may not rely on user action in order to be effective.

The CME initiative assumes that it is possible to protect against a malware threat. Examples of malware threats include viruses, worms, and Trojan horses.

Malware threats will be represented by a collection of one or more "samples." A malware threat sample will likely contain multiple files (i.e., not consist of a single executable binary file). A CME identifier will be associated with one or more representative samples. Each sample in a CME identifier sample set should be equivalent with respect to deconfliction (see Section 5), but should illustrate an aspect of the malware threat not illustrated by any other sample.

As this paper will discuss, it is not necessarily possible to define malware threat attributes so that someone with their own threat sample will be able to find the correct CME identifier associated with the sample. More likely, the value of a CME identifier will be in the coordination of different security devices (e.g., host-based anti-virus (AV) products, gateway devices).

This document addresses operational aspects of the CME initiative, including the purpose and scope of CME, how CME identifiers are assigned, and the initial process for the deconfliction of malware threat samples. The initiative expects this document to evolve as a result of discussions among the CME Editorial Board and as additional identifiers are assigned over time.

2. Scope

The objective of the CME initiative is to provide common identifiers for those malware threats that are of primary significance from the perspective of anti-virus vendors, IT security managers, and the general public. CME identifiers will not be assigned to all malware threats, but a CME identifier should be assigned to any malware threat for which one or more of the following statements is true:

  1. The malware threat is considerable or notable and could confuse users.
    1. Minor variants that quickly emerge following a related CME-identified threat, where an assignment aids deconfliction of multiple samples.
    2. Notable threats that have multiple names and/or variants.
    3. Any aggregate threat from multiple variants, where individual variants do not qualify for a CME assignment but the aggregate threat does. For example, CME-540 relates to the rapid exploitation of the vulnerability described in Microsoft Security Bulletin MS05-039 (Vulnerability in Plug and Play Could Allow Remote Code Execution and Elevation of Privilege) and to the dozens of bots that quickly emerged in the wild.
  2. The malware threat poses a considerable risk to a user.
    1. Likelihood.
      • Global prevalence.
      • Exploitation of a vulnerability.
      • Automated spreading capabilities.
    2. Impact.
      • Severe payload or incidental impact.
      • New major families of code or significant changes in functionality.
      • Difficulty of mitigating the malware.
  3. The malware threat has considerable media exposure.
    1. Multiple media sources reporting on the threat.
    2. A significant global media source reports on the threat.
    3. Consumers are found to have a high interest in a specific threat.

Spyware, adware, and phishing attacks are not currently within the scope of CME. The scope of CME may be expanded to encompass these and other types of "security risks" in the future, however.

3. Identifiers

Initially, CME identifiers will be in the format CME-N where N is an integer between 1 and 999, such as "CME-123". Digits will be added when the remaining unused identifier space becomes too small.

Furthermore:

  • When necessary, CME IDs can be abbreviated (e.g., M123), but the official format (e.g., CME-123) should be used in places such as Web pages, alerts, encyclopedias, etc.
  • For the sake of successful text-based comparisons, leading zeros will always be omitted in an identifier. For example, CME-00123 will always be written as CME-123.
  • Identifiers will be randomly generated within each size range (e.g., CME-439 might be issued before CME-28). In this way it will not be possible for someone to assign their own identifier by guessing the next in sequence.
  • When more identifiers are needed, the CME Editorial Board will decide how many digits to add. Eventually, CME will use up to seven digits.
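
The formatting rules above can be illustrated with a minimal Python sketch. It is not the CME Submission Server's implementation. The function names normalize_cme_id and allocate_cme_id are hypothetical, and the sketch assumes that a "size range" of d digits means the numbers 1 through 10^d - 1.

```python
import re
import secrets

# Official format: "CME-" followed by up to seven digits (the eventual maximum).
_CME_PATTERN = re.compile(r"^CME-(\d{1,7})$")


def normalize_cme_id(text):
    """Return the canonical form with leading zeros removed."""
    match = _CME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a CME identifier: {text!r}")
    number = int(match.group(1))
    if number < 1:
        raise ValueError("CME numbers start at 1")
    return f"CME-{number}"


def allocate_cme_id(used, digits=3):
    """Pick an unused number at random within the current size range.

    `used` is a set of numbers already issued; it is updated in place.
    The range 1 .. 10**digits - 1 is an assumption of this sketch.
    """
    upper = 10 ** digits - 1
    free = [n for n in range(1, upper + 1) if n not in used]
    if not free:
        raise RuntimeError("identifier space exhausted; digits must be added")
    number = secrets.choice(free)
    used.add(number)
    return f"CME-{number}"
```

Normalizing before any comparison means that "CME-00123" and "CME-123" compare as equal strings, and drawing from the unused numbers with a cryptographic random source keeps the next identifier unpredictable.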

4. Identifier Assignment Process

The secure CME Submission Server, which is used only by authorized members of the CME Sample Redistribution Group, went online in April 2005 for assigning CME identifiers. Highlights of this portion of the process include:

  • CME identifiers are assigned to "malware threats" and not to individual threat components.
  • CME identifier distribution is largely automated.
  • Samples from participants should be submitted as close to signature generation time as possible.
  • Deconfliction will be done by the existing 24x7 malware analysis teams among participating vendor organizations. Final deconfliction decisions will be made by consensus of the CME Sample Redistribution Group. Note that deconfliction is the most difficult aspect of the CME identifier assignment process; see Section 5 for a detailed discussion.
  • Samples will only be shared among the trusted CME Participants. A submitter will bundle all files of the sample into a zip archive. The archive will be encrypted with all keys contained on the CME PGP key ring so that all members of the Sample Redistribution Group will have access to the sample. Samples are not stored on the submission server, but are immediately redistributed to the Sample Redistribution Group. Initially, Sample Redistribution Group membership will overlap with the CME Editorial Board.
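
The bundling step in the last item can be sketched with the Python standard library. The function name bundle_sample and the file names in the usage note are hypothetical. The PGP encryption step is deliberately left out, because it depends on tooling and key management that this document does not specify.

```python
import io
import zipfile


def bundle_sample(files):
    """Bundle all files of one sample into a single zip archive.

    `files` maps an archive member name to that file's bytes.
    Returns the archive as bytes; encryption is a separate, later step.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Sorted order makes the archive layout reproducible.
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()
```

For example, bundle_sample({"dropper.bin": b"...", "notes.txt": b"..."}) returns one archive holding both files, which would then be encrypted before redistribution.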

The process for a CME Participant to acquire a CME identifier is:

  1. A participant identifies a sample that (a) is critical in nature and (b) does not yet have a CME identifier. (The second condition is difficult to establish and must be verified as part of the deconfliction process.)
  2. The participant requests a CME identifier through an automated, Web-based interface. The request includes the sample and any available supporting data.
  3. If there have not been any other requests for CME identifiers in a 2-hour time window, then the automatic program responds with a CME identifier and sends an email notification to all CME participants.
  4. If there have been earlier requests within a 2-hour time window, then a "moratorium" period is entered. The automatic program does not provide a CME identifier; instead, it lists all recent requests. The current Sample Redistribution Group must review the recent requests to determine whether the new sample is a duplicate of an existing request (see Section 5 for more on deconfliction). If the deconfliction process indicates that the sample is equivalent to the earlier request, then the CME identifier for the original request is used. If the sample is a new threat, then the current requester "overrides" the moratorium and forces a new CME identifier to be obtained. To prevent abuse of this system, only trusted users can override requests.
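
Steps 3 and 4 can be sketched as a small piece of Python. This is an illustration under stated assumptions, not the behavior of the actual submission server: the names Request, SubmissionLog, and submit are hypothetical, the new identifier is supplied by the caller, and a request that is held by the moratorium is not itself recorded as a request.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

MORATORIUM_WINDOW = timedelta(hours=2)


@dataclass
class Request:
    requester: str
    received: datetime
    cme_id: str


@dataclass
class SubmissionLog:
    requests: list = field(default_factory=list)

    def recent(self, now):
        """Requests that were granted within the 2-hour window before `now`."""
        return [r for r in self.requests if now - r.received < MORATORIUM_WINDOW]

    def submit(self, requester, now, new_id, override=False, trusted=False):
        """Return (cme_id, recent_requests).

        cme_id is None when a moratorium applies; the caller must then
        review recent_requests and either reuse an existing identifier
        or resubmit with override=True (trusted users only).
        """
        pending = self.recent(now)
        if pending and not (override and trusted):
            return None, pending
        self.requests.append(Request(requester, now, new_id))
        return new_id, []
```

If the review concludes that the sample is equivalent to an earlier request, the requester takes the identifier from the returned list instead of overriding.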

At present, all sample submission is performed by members of the CME Sample Redistribution Group.

5. Deconfliction

The deconfliction process answers the question, when are two malware threats equivalent? If one malware threat is equivalent to another malware threat, they will both have the same CME identifier. If they are not equivalent, they will have different CME identifiers.

Deconfliction is decided by group consensus of the Board, which follows the current technical CME Deconfliction Guidelines in Section 5.1. When an appropriate guideline does not yet exist, the Board will formally define one when possible.

The general deconfliction process for CME is as follows:

  1. The submitter provides a malware threat sample and as much analysis information as possible. Analysis information should indicate why the submitter thinks that the threat is different from other threats that have CME identifiers. Specific guidelines (Section 5.1) should be referenced when possible.
  2. Members of the CME Editorial Board will evaluate the deconfliction information and will submit any information that indicates whether or not the sample is equivalent to a malware threat previously identified.
  3. If there is disagreement over whether a sample requires a new CME identifier, email will be exchanged or a teleconference will be held so that consensus can be reached.

When multiple outbreaks are underway, it may take time for samples to be submitted. It will be crucial for the submitter, as well as others on the Board, to modify the request (e.g., add additional samples, add new supporting files, modify previously submitted analysis notes) to ensure deconfliction is completed accurately.

5.1 CME Deconfliction Guidelines

This is the initial list of guidelines for deconfliction as defined by the Board. Additional guidelines will be identified based on operational content decisions of the Sample Redistribution Group as additional CME identifiers are assigned. Ideally, the deconfliction process will evolve over time to depend more and more on technical characteristics of the malware threat samples and to follow more and more explicit guidelines.

For some guidelines, example cases are provided that refer to malware threats by name. After each name, the vendor(s) using the particular name is given in parentheses.

GUIDELINES:
G1. Every file or component of a malware threat will be assigned the same CME identifier.

If a new outbreak downloads additional files from an external website, the downloaded files will get the same CME identifier as the file that initiated the download. In the cases where there is more than one downloaded file, or when the downloaded file changes (e.g., a modified version is uploaded), the CME identifier is assigned to all the additional files.

This will mean that a single file (e.g., a file downloaded by multiple threats) might be associated with more than one CME identifier. In their product, a vendor may only be able to identify the first CME identifier assigned to a file. However, other assigned CME identifiers should be provided in the vendor's encyclopedia.
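
One way to picture this many-to-one relationship is an index from file content to the identifiers assigned to it, kept in assignment order. The sketch below is hypothetical: the class name FileIndex is invented for illustration, keying files by SHA-256 digest is an assumption (this document does not say how files are matched), and the identifiers in the usage note are placeholders.

```python
import hashlib
from collections import defaultdict


class FileIndex:
    """Maps file content to every CME identifier associated with it."""

    def __init__(self):
        # digest -> CME identifiers in the order they were assigned
        self._ids = defaultdict(list)

    @staticmethod
    def digest(data):
        return hashlib.sha256(data).hexdigest()

    def associate(self, data, cme_id):
        ids = self._ids[self.digest(data)]
        if cme_id not in ids:
            ids.append(cme_id)

    def first_id(self, data):
        """The identifier a product might report: the first one assigned."""
        ids = self._ids.get(self.digest(data), [])
        return ids[0] if ids else None

    def all_ids(self, data):
        """Every identifier, as would be listed in a vendor encyclopedia."""
        return list(self._ids.get(self.digest(data), []))
```

After associating the same file with the placeholders "CME-111" and then "CME-222", first_id returns "CME-111" while all_ids returns both.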

Some files that are associated with a malware threat are excluded from CME identifier coverage. For example, valid and harmless .com files that might be used as part of a malware threat should not be assigned a CME identifier. See other guidelines for specific details.

Example case: Bagle.BE (Trend) outbreak in Feb. 2005 arrived as a downloader file, which downloaded additional files from several URLs included in the malware code. The downloader file and all of the additional downloaded files would have the same CME identifier.

G2. New files uploaded to a download site more than 48 hours after an initial outbreak will not be associated with any CME identifier.

There must be a limit on the number of files associated with a CME identifier.

Example case: None
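
As an illustration only, the 48-hour cutoff can be expressed as a simple time comparison. The function name and the way the outbreak start time is determined are assumptions of this sketch; the guideline itself does not define how the start of an outbreak is established.

```python
from datetime import datetime, timedelta, timezone

UPLOAD_CUTOFF = timedelta(hours=48)


def covered_by_outbreak_id(outbreak_start, uploaded_at):
    """True if a file uploaded to a download site still falls under the
    outbreak's CME identifier, i.e. it appeared no more than 48 hours
    after the initial outbreak."""
    return uploaded_at - outbreak_start <= UPLOAD_CUTOFF
```

Both timestamps should be expressed in the same time zone, preferably UTC, so that the difference is meaningful.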

G3. Log files generated by a malware threat and stored on the victim hard drive are not associated with the CME identifier.

A description of the log file would be an attribute contained in the CME identifier profile.

G4. A system file that is modified by a malware threat is not assigned a CME identifier directly. Rather, the fact that the system file is modified is an attribute of the CME-identified malware threat.

A description of the modification would be an attribute contained in the CME identifier profile.

Example case: Matcher.A (Trend) outbreak in July 2001 made a harmless modification to autoexec.bat. This file would not be identified by the CME identifier.

G5. Any file that is dropped by a malware threat is associated with the CME identifier, whether or not the file is malicious (subject to guideline G6).
 
G6. Code that exploits a vulnerability that can be detected with a scanner will be assigned a CME identifier, along with any related files.

Example case: Nimda.A (Trend) outbreak in September 2001 arrived as an email attachment, dropped several files on the hard drive, infected files, and spread as a network worm. A CME identifier would be assigned to the byte sequence captured by a scanner, as well as to the email attachment, dropped files, infected files, and downloaded files.

G7. Memory dumps will be assigned a CME identifier, along with any related files.

Example case: The CodeRed outbreak in July 2001 caused a buffer overflow and never dropped any files to the hard drive. The memory dump of CodeRed would be assigned a CME identifier.

G8. Some tangible file (e.g., a packet capture) is required before a CME identifier can be assigned.

Example case: The Slammer worm (outbreak January 2003) was contained in a single UDP packet. Until a packet capture was available, a CME identifier could not be assigned.

G9. Malware threats that have functional differences will be assigned different CME identifiers.

A functional difference is defined to be any byte difference in the code. Examples include a difference of port number or email subject line. Vendors do not always report all functionally different malware threats to customers, choosing instead to associate multiple threats with a single name. In these cases, the single name would be associated in the vendor encyclopedia with multiple CME identifiers.

Example case: Many files were associated with Bagle activity on 3/1/05. Because of string differences and a difference of downloaded files, five different CME identifiers would have been assigned.

G10. A difference of attributes that are randomly generated by a malware threat (e.g., randomly generated email subject lines) does not constitute a functional difference.
 
G11. The packing method of a malware threat does not constitute a functional difference.
 
G12. Each malware threat created by a single malware threat "construction kit" will be given a separate CME identifier if it is functionally different from the others. A separate CME identifier will be assigned to the construction kit itself.

6. Other Notes

Other items that have resulted from discussion of the CME Editorial Board:

  • The date a threat first appears will be given in Coordinated Universal Time (UTC).
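
A minimal sketch of recording a first-seen time in UTC follows. The function name first_seen_utc is hypothetical, and the ISO 8601 output format is an assumption; this document does not prescribe a date format.

```python
from datetime import datetime, timedelta, timezone


def first_seen_utc(observed):
    """Convert a time-zone-aware observation time to a UTC ISO 8601 string."""
    if observed.tzinfo is None:
        # A naive timestamp cannot be converted reliably.
        raise ValueError("timestamp must carry a time zone offset")
    return observed.astimezone(timezone.utc).isoformat()
```

Note that the conversion can change the calendar date: an observation late in the evening in a zone behind UTC falls on the following day in UTC.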

7. Glossary

Adware - A form of spyware that collects information about the user in order to display advertisements. The term can also refer to software that contains embedded advertisements.

High-profile, high-impact - Malware threats that satisfy outbreak conditions.

Malware threat - Anything that has the potential to damage a computer system or network.

Security risk - Software that may pose a security risk, depending on the policies, expectations, and knowledge of the user.

Sufficiently high - This term is intentionally vague and simply means that the particular characteristic of the malware threat warrants the assignment of a CME identifier.

Spyware - Any software that covertly gathers user information.

Trojan - Code that does something that is not expected by the executor of the code.

Virus - A program that infects a computer by attaching itself to another program, and propagating itself when that program is executed.

Worm - A computer program that can make copies of itself, and spreads through connected systems, using up resources in affected computers or causing other damage.
