Common Malware Enumeration (CME)

This is a draft report and does not represent an official position of The MITRE Corporation. Copyright © 2006, The MITRE Corporation. All rights reserved. Permission is granted to redistribute this document if this paragraph is not removed. This document is subject to change without notice.

1. Introduction

CME identifiers will identify "malware threats." At the most basic level, a "malware threat" is anything that has the potential to damage a computer system or network. Furthermore, the CME initiative assumes that it is possible to protect against a malware threat. Examples of malware threats include viruses, worms, and Trojan horses.

Malware threats will be represented by a collection of one or more "samples." A malware threat sample will likely contain multiple files (i.e., it will not consist of a single executable binary file). A CME identifier will be associated with one or more representative samples. Each sample in a CME identifier sample set should be equivalent with respect to deconfliction (see Section 5), but should illustrate an aspect of the malware threat not illustrated by any other sample.

As this paper will discuss, it is not necessarily possible to define malware threat attributes so that someone with their own threat sample will be able to find the correct CME identifier associated with the sample. More likely, the value of a CME identifier will be in the coordination of different security devices (e.g., host-based anti-virus (AV) products, gateway devices).

This document addresses operational aspects of the CME initiative, including the purpose and scope of CME, how CME identifiers are assigned, and the initial process for the deconfliction of malware threat samples. The initiative expects this document to evolve as a result of discussions among the CME Editorial Board and as additional identifiers are assigned over time.

2. Scope

The objective of the CME initiative is to provide common identifiers for those malware threats that are of primary significance from the perspective of anti-virus vendors, IT security managers, and the general public. CME identifiers will not be assigned to all malware threats, but a CME identifier should be assigned to any malware threat for which one or more of the following statements is true:

Spyware, adware, and phishing attacks are not currently within the scope of CME. The scope of CME may be expanded to encompass these and other types of "security risks" in the future, however.

3. Identifiers

Initially, CME identifiers will be in the format CME-N where N is an integer between 1 and 999, such as "CME-123". Digits will be added when the remaining unused identifier space becomes too small.
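The initial identifier format can be checked mechanically. The following is a minimal sketch only: the function names are hypothetical, and the rejection of leading zeros (e.g., "CME-007") is an assumption, since the format above does not address it.

```python
import re

# Assumed pattern: "CME-" followed by an integer from 1 to 999.
# Leading zeros are rejected here; the document does not say either way.
_CME_PATTERN = re.compile(r"CME-([1-9][0-9]{0,2})")


def parse_cme_id(text):
    """Return the integer N from a string of the form 'CME-N'."""
    match = _CME_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid CME identifier: {text!r}")
    return int(match.group(1))


def format_cme_id(n):
    """Return the identifier string for an integer between 1 and 999."""
    if not 1 <= n <= 999:
        raise ValueError(f"identifier number out of range: {n}")
    return f"CME-{n}"
```

If digits are added later, as anticipated above, only the pattern and the range check would need to change.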

4. Identifier Assignment Process

The secure CME Submission Server, which is used only by authorized members of the CME Sample Redistribution Group, went online in April 2005 for assigning CME identifiers. Highlights of this portion of the process include:

At present, all sample submission is performed by members of the CME Sample Redistribution Group.

5. Deconfliction

The deconfliction process answers the question, when are two malware threats equivalent? If one malware threat is equivalent to another malware threat, they will both have the same CME identifier. If they are not equivalent, they will have different CME identifiers.

Deconfliction is decided by consensus of the Board, which follows the current technical CME Deconfliction Guidelines in Section 5.1. When an appropriate guideline does not yet exist, the Board will formally define one when possible.

When multiple outbreaks are underway, it may take time for samples to be submitted. It will be crucial for the submitter, as well as others on the Board, to modify the request (e.g., add additional samples, add new supporting files, modify previously submitted analysis notes) to ensure deconfliction is completed accurately.

5.1 CME Deconfliction Guidelines

This is the initial list of guidelines for deconfliction as defined by the Board. Additional guidelines will be identified based on operational content decisions of the Sample Redistribution Group as additional CME identifiers are assigned. Ideally, the deconfliction process will evolve over time to depend more and more on technical characteristics of the malware threat samples and to follow more and more explicit guidelines.

For some guidelines, example cases are provided that refer to malware threats by name. After each name, the vendor(s) using the particular name is given in parentheses.

GUIDELINES:

G1. Every file or component of a malware threat will be assigned the same CME identifier.

If a new outbreak downloads additional files from an external website, the downloaded files will get the same CME identifier as the file that initiated the download. In the cases where there is more than one downloaded file, or when the downloaded file changes (e.g., a modified version is uploaded), the CME identifier is assigned to all the additional files.

This will mean that a single file (e.g., a file downloaded by multiple threats) might be associated with more than one CME identifier. In their product, a vendor may only be able to identify the first CME identifier assigned to a file. However, other assigned CME identifiers should be provided in the vendor's encyclopedia.

Some files that are associated with a malware threat are excluded from CME identifier coverage. For example, valid and harmless .com files that might be used as part of a malware threat should not be assigned a CME identifier. See other guidelines for specific details.

Example case: Bagle.BE (Trend) outbreak in Feb. 2005 arrived as a downloader file, which downloaded additional files from several URLs included in the malware code. The downloader file and all of the additional downloaded files would have the same CME identifier.
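Because a single file may be associated with more than one CME identifier under G1, a lookup structure needs to keep every identifier per file while still exposing the first one assigned. A minimal sketch follows; the class name, the method names, and the use of a file digest as the key are illustrative assumptions, not part of the CME process.

```python
from collections import defaultdict


class SampleIndex:
    """Maps a file (keyed by a digest string) to its CME identifiers."""

    def __init__(self):
        # Identifiers are kept in the order in which they were assigned.
        self._ids_by_file = defaultdict(list)

    def associate(self, file_digest, cme_id):
        """Record that a file belongs to the threat named by cme_id."""
        ids = self._ids_by_file[file_digest]
        if cme_id not in ids:
            ids.append(cme_id)

    def first_id(self, file_digest):
        """Identifier a product might report: the first one assigned."""
        ids = self._ids_by_file.get(file_digest)
        return ids[0] if ids else None

    def all_ids(self, file_digest):
        """Identifiers an encyclopedia entry would list for the file."""
        return list(self._ids_by_file.get(file_digest, []))
```

For example, a file downloaded by two threats would be passed to associate() twice with different identifiers; first_id() then returns the earlier one and all_ids() returns both.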

G2. New files uploaded to a download site more than 48 hours after an initial outbreak will not be associated with any CME identifier.

There must be a limit on the number of files associated with a CME identifier.

Example case: None
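The 48-hour limit in G2 amounts to a simple time comparison. The sketch below assumes that the start of the outbreak and the upload time of the file are both known as timestamps; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# G2: files uploaded more than 48 hours after the initial outbreak
# are not associated with any CME identifier.
ASSOCIATION_WINDOW = timedelta(hours=48)


def within_association_window(outbreak_start, uploaded_at):
    """Return True if the upload falls inside the 48-hour window."""
    return uploaded_at - outbreak_start <= ASSOCIATION_WINDOW
```

A file uploaded exactly 48 hours after the outbreak is treated here as still inside the window, since G2 excludes only files uploaded "more than" 48 hours later.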

G3. Log files generated by a malware threat and stored on the victim hard drive are not associated with the CME identifier.

A description of the log file would be an attribute contained in the CME identifier profile.

G4. A system file that is modified by a malware threat is not assigned a CME identifier directly. Rather, the fact that the system file is modified is an attribute of the CME-identified malware threat.

A description of the modification would be an attribute contained in the CME identifier profile.

Example case: Matcher.A (Trend) outbreak in July 2001 made a harmless modification to autoexec.bat. This file would not be identified by the CME identifier.

G5. Any file that is dropped by a malware threat is associated with the CME identifier, whether or not the file is malicious (subject to guideline G6).
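Guidelines G3 through G5 distinguish files that are associated with the CME identifier from files that are only described as attributes in the identifier profile. One way to sketch that distinction is shown below; the role names and the function are hypothetical, and the sketch does not model the exclusions described under other guidelines.

```python
from enum import Enum


class FileRole(Enum):
    """Hypothetical roles a file can play in a malware threat."""
    LOG = "log"                                    # G3
    MODIFIED_SYSTEM_FILE = "modified_system_file"  # G4
    DROPPED = "dropped"                            # G5


# Roles that are described in the profile rather than being
# associated with the identifier themselves (G3 and G4).
_PROFILE_ATTRIBUTE_ROLES = {FileRole.LOG, FileRole.MODIFIED_SYSTEM_FILE}


def is_associated_with_identifier(role):
    """Return True if a file in this role is associated with the CME identifier."""
    return role not in _PROFILE_ATTRIBUTE_ROLES
```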

G6. Code that exploits a vulnerability that can be detected with a scanner will be assigned a CME identifier, along with any related files.

Example case: Nimda.A (Trend) outbreak in September 2001 arrived as an email attachment, dropped several files on the hard drive, infected files, and spread as a network worm. A CME identifier would be assigned to the byte sequence captured by a scanner, as well as to the email attachment, dropped files, infected files, and downloaded files.

G7. Memory dumps will be assigned a CME identifier, along with any related files.

Example case: The CodeRed outbreak in July 2001 caused a buffer overflow and never dropped any files to the hard drive. The memory dump of CodeRed would be assigned a CME identifier.

G8. Some tangible file (e.g., a packet capture) is required before a CME identifier can be assigned.

Example case: The Slammer worm (outbreak January 2003) was contained in a single UDP packet. Until a packet capture was available, a CME identifier could not be assigned.

G9. Malware threats that have functional differences will be assigned different CME identifiers.

A functional difference is defined to be any byte difference in the code. Examples include a difference of port number or email subject line. Vendors do not always report all functionally different malware threats to customers, choosing instead to associate multiple threats with a single name. In these cases, the single name would be associated in the vendor encyclopedia with multiple CME identifiers.

Example case: Many files were associated with Bagle activity on 3/1/05. Because of string differences and a difference of downloaded files, five different CME identifiers would have been assigned.

G10. A difference of attributes that are randomly generated by a malware threat (e.g., randomly generated email subject lines) does not constitute a functional difference.

G11. The packing method of a malware threat does not constitute a functional difference.
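Taken together, G9 through G11 suggest a comparison of the code bytes after packing has been removed and randomly generated attributes have been set aside. The following is a rough sketch under strong assumptions: both samples are already unpacked (G11), and an analyst has already identified the byte ranges that hold randomly generated attributes (G10). The function names and the range representation are hypothetical; this is not the Board's procedure.

```python
def strip_ranges(code, random_ranges):
    """Return code with the given (start, end) byte ranges removed.

    Ranges are half-open and mark randomly generated attributes.
    """
    kept = bytearray()
    cursor = 0
    for start, end in sorted(random_ranges):
        kept += code[cursor:start]   # empty if the range overlaps the last one
        cursor = max(cursor, end)
    kept += code[cursor:]
    return bytes(kept)


def functionally_different(code_a, ranges_a, code_b, ranges_b):
    """Compare two unpacked samples, ignoring randomly generated bytes.

    Any remaining byte difference counts as a functional difference (G9).
    """
    return strip_ranges(code_a, ranges_a) != strip_ranges(code_b, ranges_b)
```

Removing the random ranges, rather than overwriting them, lets the comparison tolerate random attributes of different lengths in the two samples.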

G12. Malware threats created by a single malware threat "construction kit" will be given separate CME identifiers if they are functionally different. A separate CME identifier will be assigned to the construction kit itself.

6. Other notes

7. Glossary

Adware - A form of spyware that collects information about the user in order to display advertisements. The term can also refer to software that contains embedded advertisements.

High-profile, high-impact - Malware threats that satisfy outbreak conditions.

Malware threat - Anything that has the potential to damage a computer system or network.

Security risk - Software that may pose a security risk, depending on the policies, expectations, and knowledge of the user.

Sufficiently high - This term is intentionally vague and simply means that the particular characteristic of the malware threat warrants the assignment of a CME identifier.

Trojan - Code that does something that is not expected by the executor of the code.

Virus - A program that infects a computer by attaching itself to another program, and propagating itself when that program is executed.

Worm - A computer program that can make copies of itself, and spreads through connected systems, using up resources in affected computers or causing other damage.

The CME Process:
Scope, Identifiers, and Guidelines for Deconfliction

